VMware resource reservations and HA

vSphere HA is the feature that restarts VMs from a failed host on another host in your HA cluster. Brutal, but if a host dies in the middle of the night it might be better than leaving all the VMs down until somebody is able to manually restart them on a different host.

You can see the status of the HA cluster from the vSphere client: click on the cluster, and on the Summary tab there is a vSphere HA box that’ll say something like:

Admission Control:               Enabled
Current Failover Capacity:       2 hosts
Configured Failover Capacity:    N/A

Host Monitoring:                 Enabled
VM Monitoring:                   Disabled
Application Monitoring:          Disabled

Under this are three hyperlinks:

  • Advanced Runtime Info
  • Cluster Status
  • Configuration Issues

Advanced Runtime Info gives you interesting stats about the cluster:

Advanced runtime info for:       rcm-ha-cluster
Slot size:                       256 MHz
                                 8 virtual CPUs
                                 427 MB
Total slots in cluster:          371
Used slots:                      51
Available slots:                 173
Failover slots:                  147
Total powered on vms in cluster: 51
Total hosts in cluster:          3
Total good hosts in cluster:     3

So it all looks good on my cluster at the moment.
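The slot figures in that panel appear to obey a simple bit of bookkeeping: total slots, minus used slots, minus failover slots, leaves the available slots. That identity is inferred from the numbers shown here rather than taken from VMware documentation, and the function name is made up for illustration. A quick Python check:

```python
def available_slots(total, used, failover):
    """Slots left for powering on new VMs once failover slots are set aside."""
    return total - used - failover


# Figures copied from the Advanced Runtime Info panel above.
print(available_slots(total=371, used=51, failover=147))  # 173
```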

Now let’s say you get an application admin getting twitchy about performance. They want to guarantee that their application will work at a consistent speed. You tell them that you have plenty of capacity in the cluster anyway: average CPU usage is about 15%, host RAM allocation is at about 60%, so they have nothing to worry about. But they’re not buying it. Perhaps they’ve had some performance issues, possibly related to storage, and they’re piling on the pressure. You tell them to go and talk to the storage admins, but to “be nice” you set a reservation on their VM to guarantee them some CPU and some RAM (even though, as you’re not even close to being overcommitted on either, it’ll have no effect). You have 105GHz of CPU available in the cluster, and 432GB RAM, so you set reservations on the VM of 4GHz and 4096MB.

Some time later you glance at the cluster summary tab, and hang on, what’s this:

Current Failover Capacity:       0 hosts

Erm, that’s not right… You click into the Advanced Runtime Info and:

Advanced runtime info for:       rcm-ha-cluster
Slot size:                       4000 MHz
                                 8 virtual CPUs
                                 4188 MB
Total slots in cluster:          22
Used slots:                      51
Available slots:                 0
Failover slots:                  0
Total powered on vms in cluster: 51
Total hosts in cluster:          3
Total good hosts in cluster:     3

That’s not very good, is it? You used to have 371 slots, and now you only have 22. Notice how the slot size closely resembles the reservations you assigned to the VM that didn’t need them? Notice how you broke your HA by letting somebody who doesn’t understand how virtualisation operates guilt-trip you into making an unnecessary configuration change? ;-)
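To make the jump in slot size concrete, here is a minimal Python sketch of how the slot appears to be derived: the largest CPU reservation among powered-on VMs, and the largest memory reservation plus overhead. It is a model built from the two panels in this post, not VMware's implementation. The VM names, the per-VM overhead figures and the 256 MHz fallback are assumptions chosen so the output lines up with the panels.

```python
from dataclasses import dataclass

# Assumption: CPU slot size falls back to this when no powered-on VM has a
# CPU reservation. 256 MHz is the value shown in the first panel.
DEFAULT_CPU_SLOT_MHZ = 256


@dataclass
class VM:
    name: str                 # hypothetical VM names
    powered_on: bool
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int      # illustrative overhead figures, not measured


def slot_size(vms):
    """Return (cpu_mhz, mem_mb) driven by the largest powered-on reservation."""
    running = [vm for vm in vms if vm.powered_on]
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [vm.cpu_reservation_mhz for vm in running])
    mem = max([0] + [vm.mem_reservation_mb + vm.mem_overhead_mb for vm in running])
    return cpu, mem


# No reservations on running VMs: CPU default plus the largest memory overhead.
before = [
    VM("web01", True, 0, 0, 427),
    VM("app01", True, 0, 0, 92),
    VM("old01", False, 8000, 8192, 120),  # powered off, so ignored
]
# Same cluster after app01 gets the 4GHz / 4096MB reservation.
after = [
    VM("web01", True, 0, 0, 427),
    VM("app01", True, 4000, 4096, 92),
    VM("old01", False, 8000, 8192, 120),
]

print(slot_size(before))  # (256, 427)
print(slot_size(after))   # (4000, 4188)
```

One VM with a reservation is enough to set the slot size for every VM in the cluster, which is why a single "be nice" change has such a large effect.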

All is not lost. You can do several things:

  • Ask the application admin if they’ve noticed any changes since you gave them the reservations (answer will be “no”) and remove the VM reservations.
  • Create a Resource Pool for the application server and set the reservations on that. Remove the VM reservations and then move the VM into the Resource Pool.
  • Add manual overrides on the HA slot size.
  • Not set unnecessary reservations in the first place. If you have plenty of resources available, a reservation will have no effect.

This is yet another reason why setting resource reservations on VMs directly is a bad idea. If you need reservations, do them at the Resource Pool (or vApp) level.
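To see why taking the reservation off the VM brings the failover capacity back, here is a rough Python model of the slot arithmetic. It assumes three identical hosts sharing the 105GHz and 432GB evenly, and that failover capacity is worked out by failing the largest hosts first. Real HA also subtracts hypervisor overhead and the hosts may not be identical, so the per-host counts land near the panel figures (371 and 22 slots) rather than exactly on them. The function names are made up for illustration.

```python
# Assumption: three equal hosts sharing the cluster totals quoted in this post.
HOST_CPU_MHZ = 105_000 // 3       # 35000 MHz per host
HOST_MEM_MB = 432 * 1024 // 3     # 147456 MB per host


def slots_per_host(cpu_mhz, mem_mb, slot_cpu_mhz, slot_mem_mb):
    """A host holds as many slots as its scarcer resource allows."""
    return min(cpu_mhz // slot_cpu_mhz, mem_mb // slot_mem_mb)


def failover_capacity(host_slots, used_slots):
    """Hosts that can fail, largest first, while survivors still hold used_slots."""
    remaining = sorted(host_slots)    # ascending, so pop() drops the largest
    tolerated = 0
    while len(remaining) > 1:         # at least one host has to survive
        remaining.pop()
        if sum(remaining) < used_slots:
            break
        tolerated += 1
    return tolerated


# Slot sizes taken from the two Advanced Runtime Info panels.
before_hosts = [slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, 256, 427)] * 3
after_hosts = [slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, 4000, 4188)] * 3

print(before_hosts, failover_capacity(before_hosts, used_slots=51))  # [136, 136, 136] 2
print(after_hosts, failover_capacity(after_hosts, used_slots=51))    # [8, 8, 8] 0
```

With the small slot size, any one host can carry all 51 VMs, so two hosts can fail. With the 4000 MHz slot, even all three hosts together hold fewer than 51 slots, so the cluster can tolerate no failures at all.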

Note that the number of virtual CPUs in the advanced runtime info seems to have little effect on the slot size; it’s more down to the MHz and MB. The calculation also only takes into account currently powered-on VMs.

If you’ve just read this and have noticed that your HA cluster has zero failover capacity, your hosts are not overprovisioned, and your slot size is somewhat higher than you expected then I have a PowerCLI script to help you identify VMs with reservations set on them directly. This assumes you’ve already run Connect-VIServer.

# Collect the reservation and limit settings of every VM in the named cluster.
$Result = @()
# @() forces an array so that .Count works even when only one VM is returned.
$VMs = @(Get-VM -Location (Read-Host "Enter name of cluster to enumerate"))
$VMsProcessed = 0
foreach($VM in $VMs){
    $obj = New-Object system.object
    $obj | Add-Member -MemberType NoteProperty -Name VMName -Value $VM.Name
    $obj | Add-Member -MemberType NoteProperty -Name PowerState -Value $VM.PowerState
    # Reservations and limits set directly on the VM
    $obj | Add-Member -MemberType NoteProperty -Name CPUReservationMHz -Value $VM.VMResourceConfiguration.CpuReservationMhz
    $obj | Add-Member -MemberType NoteProperty -Name CpuLimitMhz -Value $VM.VMResourceConfiguration.CpuLimitMhz
    $obj | Add-Member -MemberType NoteProperty -Name MemReservationMB -Value $VM.VMResourceConfiguration.MemReservationMB
    $obj | Add-Member -MemberType NoteProperty -Name MemLimitMB -Value $VM.VMResourceConfiguration.MemLimitMB
    $Result += $obj
    # Count this VM before reporting progress so the bar reaches 100%.
    $VMsProcessed++
    Write-Progress -Activity "Getting VM details" -PercentComplete ($VMsProcessed/$VMs.Count*100)
}
# Show the results in a sortable grid; sort on the reservation columns to find the culprits.
$Result | Out-GridView