Host disconnected in vSphere client

I've had this a few times, but today the host wouldn't come back after the usual restart of the mgmt-vmware service.

The details: I had a vSphere 4.0 host showing as disconnected in the vSphere client, with the host and all of its VMs greyed out. Right-clicking the host and choosing Connect didn't work.

I made an SSH connection to the host, used su - to gain root access and tried

service mgmt-vmware restart

which hung while trying to restart the VMware ESX Server Host Agent. I opened a second SSH session (again using su - to become root), and within it used

ps -ef | grep vmware-hostd

to get the process ID (PID) of the vmware-hostd process, then killed it with

kill -9 PID
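The find-and-kill step can be scripted. This is only an illustrative sketch, assuming the standard `ps -ef` layout (PID in the second column, the command starting in the eighth) and that the host agent's executable is named vmware-hostd. It matches on the executable name rather than anywhere in the line, because the grep itself, and possibly a watchdog process whose arguments mention vmware-hostd, would otherwise match too. `hostd_pids` is a hypothetical helper name, not a VMware tool.

```shell
#!/bin/sh
# Sketch: print the PID(s) of vmware-hostd from `ps -ef` output read on stdin.
# Assumption: ps -ef columns are UID PID PPID C STIME TTY TIME CMD...
hostd_pids() {
    awk '{
        # $8 is the first word of the CMD column; compare only its basename
        n = split($8, part, "/")
        if (n > 0 && part[n] == "vmware-hostd") print $2
    }'
}

# Usage on the host, as root, once the restart has hung on the host agent:
#   ps -ef | hostd_pids                     # check what would be killed first
#   for pid in $(ps -ef | hostd_pids); do kill -9 "$pid"; done
```

Reading from stdin keeps the matching logic separate from the kill, so you can eyeball the PIDs before doing anything destructive.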

After a few minutes the restart command in the first session finished.

I tried connecting to the host from the vSphere client again, but still no joy:

Cannot contact the specified host (xxxxx). The host may not be available on the network, a network configuration problem may exist, or the management services on this host may not be responding.

Interestingly, right after the restart command finished this message came back almost immediately, but a minute or so later it took much longer to appear.

So next I tried listing the VMFS volumes:

ls /vmfs/volumes

This took ages to return, and when it did, three of the LUNs were highlighted. I use PuTTY for SSH with what I believe are its default colours, so most of the LUNs were listed in pale blue (preceded by a GUID in dark blue). The three highlighted ones were instead shown as white text on a pink background.
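A likely explanation for the white-on-pink entries, though I can't confirm it from the colours alone: with the usual Linux `ls` colour defaults, cyan names are symbolic links, dark blue names are directories, and white on a red background marks an orphaned symlink (one whose target can't be reached). In /vmfs/volumes the friendly datastore names are symlinks to the volume UUID directories, so an orphaned link would mean the host could no longer see that volume. The sketch below lists such links without relying on terminal colours; `broken_links` is a hypothetical helper, and on a host with hung storage the existence check may itself be slow.

```shell
#!/bin/sh
# Sketch: list symlinks in a directory whose targets cannot be resolved.
# Assumption: the white-on-pink names in /vmfs/volumes are orphaned links,
# i.e. datastore names pointing at a volume UUID the host can no longer see.
broken_links() {
    dir=${1:-/vmfs/volumes}
    for entry in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: following it does not reach anything
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            # print the link name and the target it points at
            printf '%s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
        fi
    done
}

# Usage on the host (may be slow if the storage is unresponsive):
#   broken_links /vmfs/volumes
```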

The same command worked fine on a different host that could see the same LUNs, so I tried rescanning one of the HBAs:

esxcfg-rescan vmhba1

To list all of your HBAs (on ESX 4.0 and later), use

esxcfg-scsidevs -a

The esxcfg-scsidevs command replaces the esxcfg-vmhbadevs command from older versions of ESX.
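To rescan every adapter rather than one at a time, the two commands above can be combined. This is a sketch that assumes the adapter name (vmhbaN) is the first whitespace-separated column of each `esxcfg-scsidevs -a` line; check that against your own output before relying on it. `hba_names` and `rescan_all_hbas` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: rescan every HBA reported by esxcfg-scsidevs -a (ESX 4.0 and later).
# Assumption: each output line starts with the adapter name, e.g. vmhba1.

# Filter: print the first column of lines whose first field looks like vmhbaN
hba_names() {
    awk '$1 ~ /^vmhba[0-9]+$/ { print $1 }'
}

# Run on the host as root; rescans the adapters one after another
rescan_all_hbas() {
    for hba in $(esxcfg-scsidevs -a | hba_names); do
        echo "Rescanning $hba"
        esxcfg-rescan "$hba"
    done
}
```

Calling `rescan_all_hbas` touches every adapter; rescanning just the one in front of the affected LUNs, as above, is the narrower option.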

After rescanning the HBA I issued the service restart command again, and this time it finished quickly. I left everything for a minute, then right-clicked the host in the vSphere client and chose Connect. After a short delay a progress bar appeared in the Recent Tasks pane, and the host connected.

Several of the VMs on the host that live on the affected LUNs were powered off. One had blue-screened with a KERNEL_STACK_INPAGE_ERROR, and another was logging these two errors alternately in its Windows event log:

Event Type: Error
Event Source: symmpi
Event Category: None
Event ID: 15
Date: 03/12/2010
Time: 08:59:14
User: N/A
Computer: xxxxxxxx
Description:
The device, \Device\Scsi\symmpi1, is not ready for access yet.

Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 03/12/2010
Time: 08:59:14
User: N/A
Computer: xxxxxxxx
Description:
The driver detected a controller error on \Device\Harddisk0.

So it looks like, for some reason, the host lost connectivity to some of its LUNs. I'm still investigating what happened. The main thing is that I was able to recover connectivity to the host without rebooting it, and so did not have to kill the VMs still running on it.

This entry was posted in vSphere.

2 Responses to Host disconnected in vSphere client

  1. rcmtech says:

    Think this was caused by a CLARiiON fault, see EMC239941 – Access to thin LUNs is lost.

  2. rcmtech says:

    See another issue I had that caused a host to be disconnected in the vSphere client here: https://rcmtech.wordpress.com/2012/10/04/host-disconnected-in-vsphere-client-ii/
