A strange vSphere and Windows NLB communication failure

I have two VMware vSphere HA clusters, and my Exchange 2007 system is hosted within this vSphere environment. It includes two Exchange 2007 Hub Transport servers, which use Windows Network Load Balancing (NLB) to (unsurprisingly) balance load between them. One of these VMs was unable to talk to a third VM in the same vSphere environment: my WSUS server. All three servers run Windows Server 2008.
All three VMs were on the same VLAN, and all three could ping other servers on that VLAN. However, while Hub01 could ping WSUS, Hub02 could not. The failure was bi-directional: WSUS could not ping Hub02 either. It was not a DNS issue; name resolution was working fine. The vSwitches are configured as per VMware KB article 1006778.
The problem turned out to occur when one of the two NLB servers, Hub01, was running on the same vSphere host as the WSUS server. Under that condition the other NLB server, Hub02, could not talk to the WSUS box.
The solution has been to move the WSUS box to a different host from either of the two NLB servers. I'll probably create an anti-affinity rule to ensure that this remains the case.
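The placement constraint behind that rule can be expressed as a quick check. This is a minimal sketch, not a vSphere API call: in practice DRS enforces VM-VM anti-affinity rules itself, and the function name `anti_affinity_violations` and the host names `esx01`, `esx02` and `esx03` are hypothetical. It assumes two separate two-VM rules (Hub01/WSUS and Hub02/WSUS), because a single anti-affinity rule listing all three VMs would also keep Hub01 and Hub02 apart and so would need at least three hosts.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def anti_affinity_violations(
    placement: Dict[str, str], rules: Iterable[Set[str]]
) -> List[Tuple[str, str, str]]:
    """Return (vm_a, vm_b, host) for every pair within a rule that shares a host.

    placement maps VM name -> host name; each rule is a set of VMs that
    must all run on different hosts (VM-VM anti-affinity semantics).
    """
    found = []
    for rule in rules:
        # Compare every pair of VMs named in this rule
        for vm_a, vm_b in combinations(sorted(rule), 2):
            if placement[vm_a] == placement[vm_b]:
                found.append((vm_a, vm_b, placement[vm_a]))
    return found


# Hypothetical rules: keep WSUS away from each NLB member separately
rules = [{"Hub01", "WSUS"}, {"Hub02", "WSUS"}]

# The failing layout: Hub01 and WSUS share a host
before = {"Hub01": "esx01", "Hub02": "esx02", "WSUS": "esx01"}
print(anti_affinity_violations(before, rules))  # [('Hub01', 'WSUS', 'esx01')]

# After moving WSUS to a host running neither NLB member
after = {"Hub01": "esx01", "Hub02": "esx02", "WSUS": "esx03"}
print(anti_affinity_violations(after, rules))  # []
```

Using two pairwise rules rather than one three-VM rule is a design choice that leaves DRS free to place both Hub Transport servers on the same host if it needs to.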
I can only assume the root cause lies in the way NLB shares a single MAC address between member servers, and in how the vSphere vSwitch code copes with that.
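For context on the shared-MAC theory: Windows NLB derives its cluster MAC address from the cluster IP address W.X.Y.Z. The sketch below assumes the commonly documented NLB conventions (unicast 02-BF-W-X-Y-Z, multicast 03-BF-W-X-Y-Z, IGMP multicast 01-00-5E-7F-Y-Z), and that in unicast mode each member rewrites the source MAC of outgoing frames as 02-<host priority>-W-X-Y-Z, so switches never learn which port owns the cluster MAC and flood traffic for it instead. The post does not say which NLB mode the Hub Transport servers use, and the cluster IP below is hypothetical. The sketch only illustrates the addressing; it does not show that this is what tripped up the vSwitch.

```python
import ipaddress


def _fmt(mac: bytes) -> str:
    # Format six bytes as hyphenated hex, e.g. 02-BF-C0-A8-01-0A
    return "-".join(f"{b:02X}" for b in mac)


def nlb_cluster_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """Cluster MAC that NLB derives from the cluster IP W.X.Y.Z (assumed conventions)."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 bytes: W, X, Y, Z
    if mode == "unicast":
        return _fmt(bytes([0x02, 0xBF]) + octets)  # 02-BF-W-X-Y-Z
    if mode == "multicast":
        return _fmt(bytes([0x03, 0xBF]) + octets)  # 03-BF-W-X-Y-Z
    if mode == "igmp":
        return _fmt(bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:])  # 01-00-5E-7F-Y-Z
    raise ValueError(f"unknown NLB mode: {mode!r}")


def nlb_masked_source_mac(cluster_ip: str, host_priority: int) -> str:
    """Source MAC a unicast-mode member uses on outgoing frames: 02-<priority>-W-X-Y-Z."""
    if not 1 <= host_priority <= 32:
        raise ValueError("NLB host priority must be between 1 and 32")
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return _fmt(bytes([0x02, host_priority]) + octets)


cluster_ip = "192.168.1.10"  # hypothetical cluster address
print(nlb_cluster_mac(cluster_ip))               # 02-BF-C0-A8-01-0A
print(nlb_cluster_mac(cluster_ip, "multicast"))  # 03-BF-C0-A8-01-0A
print(nlb_cluster_mac(cluster_ip, "igmp"))       # 01-00-5E-7F-01-0A
print(nlb_masked_source_mac(cluster_ip, 1))      # 02-01-C0-A8-01-0A
print(nlb_masked_source_mac(cluster_ip, 2))      # 02-02-C0-A8-01-0A
```

Because every member answers for the same cluster MAC while sending from a different masked address, the forwarding behaviour depends on how each switch (physical or virtual) treats an address it never learns, which is where the author's suspicion of the vSwitch comes from.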

This entry was posted in vSphere.
