I needed to install Microsoft Cluster Service (MSCS) on a pair of virtual machines configured as an Active-Passive cluster, with a rule keeping the two VMs on different hosts, running on VMware ESX 3.5 Update 1.
Some may ask (as I did) why you would do this when you have the wonders that are HA and VMotion. The short answer was that, in the event of a hardware failure, MSCS could fail over to the virtual machine on the other host slightly faster than HA could detect the failure, restart the VM on another host and wait for it to boot.
And as this was a critical application, running an MSCS cluster on top of VMware seemed to be the way to go.
So I read the fantastic white paper by VMware:
Setup for Microsoft Cluster Service
Update 2 Release for ESX Server 3.5, ESX Server 3i version 3.5, VirtualCenter 2.5
(available for download from VMware).
Unfortunately, I hit a problem with my planned design. Page 16 of the document summarises the caveats for setting up an MSCS cluster on VMware, and one of the bullet points reads:
- Clustered virtual machines cannot be part of VMware clusters (DRS or HA)
After thinking about how this works, and posting on the VMware Communities forums to confirm my reasoning (a great bunch of guys on there), I came up with the following...
Because the MSCS cluster uses a private network as its heartbeat LAN, VMotioning one of the nodes to another host could cause problems. As we know, a VMotion can cause a slight blip in network communication; if heartbeats are missed during that blip, MSCS may conclude the node has failed and initiate a failover, and suddenly the service would be unavailable.
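The risk can be reasoned about with some simple arithmetic: if heartbeats are sent every I seconds and the cluster fails over after N consecutive misses, a network pause of P seconds can swallow at most ceil(P / I) heartbeats. The sketch below assumes a fixed heartbeat interval and a fixed miss threshold; the values used in the example (1.0 s and 5 misses) are hypothetical and are not taken from the MSCS documentation.

```python
import math


def worst_case_missed(pause_s: float, interval_s: float) -> int:
    """Maximum heartbeats that can fall inside a network pause of pause_s seconds.

    Assumes heartbeats arrive on a fixed grid every interval_s seconds and the
    pause is a half-open window positioned to swallow as many as possible.
    """
    return math.ceil(pause_s / interval_s)


def would_fail_over(pause_s: float, interval_s: float, miss_threshold: int) -> bool:
    """True if a pause could cause enough consecutive misses to trigger failover.

    miss_threshold is a hypothetical parameter: the number of consecutive
    missed heartbeats after which the cluster declares a node dead.
    """
    return worst_case_missed(pause_s, interval_s) >= miss_threshold


# Hypothetical values: 1.0 s heartbeat interval, failover after 5 misses.
print(would_fail_over(pause_s=3.0, interval_s=1.0, miss_threshold=5))
print(would_fail_over(pause_s=5.5, interval_s=1.0, miss_threshold=5))
```

A short VMotion blip is usually well under the threshold, but the point of the exercise is that the margin depends entirely on the heartbeat settings, which is why VMware errs on the side of caution.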
OK, that makes sense, but "Clustered virtual machines cannot be part of VMware clusters (DRS or HA)" is quite a broad statement to make. Why couldn't I just edit the DRS cluster and set the automation level for these virtual machines to Disabled? Surely this would stop the machines from being VMotioned and everyone would be happy?
Whilst in theory this would work forever and a day without any issues, I decided to find out why the document did not cover this setup: even if I did get it working, it would be flatly "Not Supported" by VMware, and as we know, in a corporate environment that is a big NO NO.
Next step: I raised a call with VMware to ask why this was the case, and after a few games of ping pong I received my answer.....
You are correct in your assumption. It is not recommended to have MSCS VM's within a Vmware Cluster. This is for the reasons you mentioned above regarding interrupted heartbeats. You could set individual HA/DRS settings on these individual MSCS VM's in order to keep them running on the specific hosts. However, there would be issues in the future if these values were ever changed unknowingly by another administrator with access to the system.
Yes, I can confirm that this would be supported by VMware. However, if DRS ever tried to move these VM's even with the DRS setting disabled then this would be a bug and would need to be investigated.
I would recommend that you test this setup using a network monitoring tool such as IO Meter to ensure that no heartbeats are dropped from the VM.
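Whatever tool is used, the check itself boils down to looking for gaps in the heartbeat stream while the VMs are being exercised (for example, during a test VMotion). The sketch below is a hypothetical helper, not part of any VMware or Microsoft tool: it assumes you have already exported heartbeat arrival times, in seconds, from a packet capture, and it flags any gap longer than 1.5 times the expected interval, estimating how many heartbeats were lost.

```python
from typing import List, Tuple


def find_dropped_heartbeats(
    timestamps: List[float], interval_s: float, tolerance: float = 1.5
) -> List[Tuple[float, float, int]]:
    """Return (previous_arrival, next_arrival, estimated_missed) for each suspicious gap.

    timestamps: heartbeat arrival times in seconds (assumed exported from a capture).
    interval_s: expected heartbeat interval (a hypothetical value you supply).
    tolerance:  a gap longer than interval_s * tolerance counts as a drop.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, curr in zip(ordered, ordered[1:]):
        gap = curr - prev
        if gap > interval_s * tolerance:
            # A gap of roughly k intervals means about k - 1 heartbeats never arrived.
            missed = round(gap / interval_s) - 1
            gaps.append((prev, curr, missed))
    return gaps


# Hypothetical capture: heartbeats every second, with a pause between 3 s and 7 s.
print(find_dropped_heartbeats([0.0, 1.0, 2.0, 3.0, 7.0, 8.0], interval_s=1.0))
```

An empty result over a test VMotion would be the "no heartbeats dropped" outcome the support engineer suggested looking for.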