VMFS datastore showing greyed out, inactive and unmounted

I’ve been having “fun” recently with VMware storage issues following a temporary loss of power to one of my two EMC Clariion CX4-480 SANs. The power outage was expected, and the SAN was shut down cleanly before the electricity was lost. Ever since, though, hosts have been periodically greying out in the vSphere client. They’d eventually come back to life, and the VMs running on them didn’t seem affected – though they too greyed out in the vSphere client and so couldn’t be managed. It was worrying; something wasn’t happy.

One host (they’re all running ESXi 5.0 update 2, build 914586) in particular had had several of these issues, more so than the others. So this morning, when it was looking normal, I decided to try putting it into maintenance mode (which worked) and then rebooted it once all the VMs had been moved off it.

When it came back up again, I noticed that one of the VMFS datastores was greyed out. Looking in the host’s Configuration – Hardware – Storage tab, that datastore was listed in the Identification column as <Name> (inactive) (unmounted), and the Capacity, Free and Type columns were all showing N/A:

[Screenshot: the VMFS datastore shown as inactive and unmounted]

I found the following entries in the /var/log/vmkernel.log file by looking for the naa reference for the problem LUN/datastore:

~ # cat /var/log/vmkernel.log | grep naa.600601607c7028006891d164db7be011
2013-02-21T08:31:55.213Z cpu26:4122)ScsiDeviceIO: 2324: Cmd(0x4124425654c0) 0x16, CmdSN 0x99f1 from world 0 to dev "naa.600601607c7028006891d164db7be011" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2013-02-21T08:31:55.213Z cpu4:5147)LVM: 11918: Failed to open device naa.600601607c7028006891d164db7be011:1
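The interesting parts of those ScsiDeviceIO lines are the three status bytes – H: (host/initiator), D: (device) and P: (plugin). A small sketch of pulling just those out with sed, using the log line above as sample input rather than a live log (on an actual host you’d pipe the grep output from /var/log/vmkernel.log into the same sed instead):

```shell
#!/bin/sh
# Sample log line from the post (with "failed" as it appears on a real host);
# on a live ESXi box you would grep /var/log/vmkernel.log for the naa id instead.
LINE='2013-02-21T08:31:55.213Z cpu26:4122)ScsiDeviceIO: 2324: Cmd(0x4124425654c0) 0x16, CmdSN 0x99f1 from world 0 to dev "naa.600601607c7028006891d164db7be011" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.'

# Extract the host/device/plugin status triple from the line.
echo "$LINE" | sed -n 's/.*failed \(H:0x[0-9a-f]*\) \(D:0x[0-9a-f]*\) \(P:0x[0-9a-f]*\).*/\1 \2 \3/p'
# prints: H:0x5 D:0x0 P:0x0
```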

Doing anything disk-intensive on the VMs stored on that datastore caused the VM to lock up and stop responding, even to pings. The host running the VM would then grey out. This eventually happened to about half the VMs on the datastore.

Trying to do an ls (directory listing) on the datastore via SSH to any host also failed after a few minutes.

According to VMware KB 289902, the H:0x5 is the Host (initiator) status and means SG_ERR_DID_ABORT, a.k.a. “Told to abort for some other reason” – which is less than helpful.
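Those H: values map onto the Linux SG_ERR_DID_* host status codes. A small lookup sketch for the ones I’ve seen come up most – only a subset of the full table in the KB, so anything not listed falls through to “unknown”:

```shell
#!/bin/sh
# Decode the SCSI host-status byte (the H: value from vmkernel.log) into its
# SG_ERR_DID_* name. Partial table only; see the VMware KB for the full list.
decode_host_status() {
    case "$1" in
        0x0) echo "DID_OK (no error)" ;;
        0x1) echo "DID_NO_CONNECT (could not connect before timeout)" ;;
        0x2) echo "DID_BUS_BUSY (bus stayed busy through timeout)" ;;
        0x3) echo "DID_TIME_OUT (timed out for other reason)" ;;
        0x5) echo "DID_ABORT (told to abort for some other reason)" ;;
        0x7) echo "DID_ERROR (internal error)" ;;
        *)   echo "unknown host status: $1" ;;
    esac
}

decode_host_status 0x5   # the status from the log above
```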

Searching again turned up this blog post, which mentions failing over the SAN storage controller as one of the early steps in the resolution. Trespassing a LUN on the Clariion is quick and easy to do. The problem LUN was owned by SPB, its default SP, so I trespassed it, which moved it to SPA. A few seconds later the VMs stored on the datastore started springing back to life, and the hosts recovered too, so that seems to have fixed it. I didn’t need to do any of the other steps listed in Shabir Yusuf’s blog.
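I did the trespass from Navisphere Manager, but it can also be done from the CLI. A dry-run sketch (it only builds and prints the command rather than running it, since this isn’t executing against a SAN): the SP address and LUN number below are placeholders, and you should check your FLARE release’s naviseccli reference for the exact trespass syntax before relying on it.

```shell
#!/bin/sh
# Sketch only: assemble (don't run) a Navisphere CLI trespass command.
# Both values below are hypothetical -- substitute your own.
SP_ADDR="spa.example.local"   # the SP you want to take ownership of the LUN
LUN_NUMBER="42"               # hypothetical LUN number

CMD="naviseccli -h $SP_ADDR trespass lun $LUN_NUMBER"
echo "$CMD"
# prints: naviseccli -h spa.example.local trespass lun 42
```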

The host that had been rebooted was still showing the LUN as inactive and unmounted, so I right-clicked it and chose “Mount”, and it seems fine now.
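The right-click “Mount” has a shell equivalent on ESXi 5.x via esxcli, which is handy when the vSphere client is playing up. Another dry-run sketch (printed rather than executed, since this isn’t running on a host), with a placeholder volume label – you’d take the real label from the filesystem list output, where the unmounted volume shows Mounted: false:

```shell
#!/bin/sh
# Sketch only: the CLI equivalent of right-click "Mount" on ESXi 5.x.
VOLUME_LABEL="my_datastore"   # placeholder; use your datastore's label

# First list filesystems to confirm the volume is unmounted, then mount it.
LIST_CMD="esxcli storage filesystem list"
MOUNT_CMD="esxcli storage filesystem mount --volume-label=$VOLUME_LABEL"

echo "$LIST_CMD"
echo "$MOUNT_CMD"
```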

This entry was posted in Storage, vSphere.

4 Responses to VMFS datastore showing greyed out, inactive and unmounted

  1. I had the same issue, but I decided to try restarting the management agents on the ESXi first. So, I did a services.sh restart on the hosts with the troublesome LUNs and that fixed it.

  2. Noel says:

    Thank you so much! I have been pulling my hair out for two days with exactly this issue, EMC’s support next to useless, and this worked right away! Thanks!

  3. This definitely worked awesome thanks.

  4. sahil says:

    you are a life saver…..
