Just updated one of my R720 ESXi 5.0 hosts. I updated the firmware using the Dell OM_711_SUU_FULL_ISO_A00.iso, which included BIOS 1.3.6 and also updated the Broadcom 5719 NIC firmware from version 7.2.14 to 7.2.20.
I then updated the host with 14 updates via VMware Update Manager. One of these was VMware ESXi 5.0 Complete Update 2 (ESXi500-Update02 – KB2033751), and another was the update to the VMware ESXi 5.0 net-tg3 vib (ESXi500-201212210-UG – KB2033752).
Upon rebooting the host, all the 5719/tg3 vmnics were missing from the Configuration – Hardware – Network Adapters section in vSphere Client. Not good.
I thought I’d try unloading and reloading the tg3 driver so I ran the following commands via SSH on the host:
~ # vmkload_mod -u tg3
vmkload_mod: Can not remove module tg3: Module not found
~ # vmkload_mod tg3
Module tg3 loaded successfully
So the tg3 driver wasn’t loaded, but did load when I asked it to. Now I’m seeing all the vmnics in the vSphere Client. But will they stick after a reboot?
No. I SSH’d back into the host and ran
~ # esxcfg-module -l
...
tg3 Not Loaded
Had a look in /var/log/syslog.log and found eight of these messages each time the host boots:
2013-02-04T12:57:37Z jumpstart: VmkCtl: Loading module tg3 failed. Unable to load module /usr/lib/vmware/vmkmod/tg3: Bad parameter
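To check how many times the load failed, the log can be searched for that message. This is only a minimal sketch, assuming the message text matches the line above; the function name count_tg3_load_failures is my own, and the log path is passed as an argument so it can also be run against a copy of the file:

```shell
# Hypothetical helper: count "Loading module tg3 failed" lines in a log file.
# $1 is the path to the log, e.g. /var/log/syslog.log on the host.
# grep -c prints the number of matching lines (0 if there are none).
count_tg3_load_failures() {
    grep -c 'Loading module tg3 failed' "$1"
}
```

On the host it would be called as count_tg3_load_failures /var/log/syslog.log. Note that grep returns a non-zero exit status when nothing matches, even though it still prints 0.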
Per KB1038247, I had a look for “tg3” in /etc/vmware/esx.conf and found the line:
/vmkernel/module/tg3/options = "force_netq=0,0,0,0,0,0,0,0"
which relates to the current issues with the tg3 driver and NetQueue. Alternatively, you can view the same setting with:
~ # esxcfg-module -g tg3
tg3 enabled = 1 options = 'force_netq=0,0,0,0,0,0,0,0'
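If you want just the options string, for example to check several hosts, the output can be trimmed down with sed. A rough sketch, assuming the output has the single-line format shown above; tg3_module_options is a name I made up:

```shell
# Hypothetical helper: read `esxcfg-module -g tg3` output on stdin and
# print only the text between the single quotes after "options = ".
# Prints an empty line when the options are set to ''.
tg3_module_options() {
    sed -n "s/.*options = '\(.*\)'.*/\1/p"
}
```

On the host this would be used as esxcfg-module -g tg3 | tg3_module_options.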
The KB article says that editing the esx.conf file by hand is a bad idea, so I cleared the option with the command:
~ # esxcfg-module -s "" tg3
~ # esxcfg-module -g tg3
tg3 enabled = 1 options = ''
and reboot. Will it load now? Yes, I’m seeing all the vmnics I’d expect.
So does the newer 3.123b.v50.1 tg3 driver not like the force_netq option? I just issued the following command to get the driver version on a host that I’ve not applied the ESXi500-201212210-UG patch to, and it’s showing the same driver version!
~ # ethtool -i vmnic4
driver: tg3
version: 3.123b.v50.1
firmware-version: FFV7.2.14 bc 5719-v1.29
bus-info: 0000:43:00.0
Yet when I scan the host with VUM it is showing it as “Missing” the ESXi500-201212210-UG patch. Erm? Perhaps VUM looks at the patch rather than the driver version within the patch. But if the driver version before and after the patch install is the same, why is the new driver not starting with the force_netq option present?
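For comparing the driver version between patched and unpatched hosts, the ethtool -i output can be reduced to a single line. This is just a sketch, assuming awk is available and that the output uses the "key: value" layout ethtool normally prints; nic_driver_version is a hypothetical name:

```shell
# Hypothetical helper: read `ethtool -i <vmnic>` output on stdin and print
# the driver name and driver version on one line.
# Fields are split on ": ", so "firmware-version" is not mistaken for "version".
nic_driver_version() {
    awk -F': ' '
        $1 == "driver"  { driver = $2 }
        $1 == "version" { version = $2 }
        END { print driver, version }
    '
}
```

On the host this would be used as ethtool -i vmnic4 | nic_driver_version, and the resulting lines from each host can then be compared directly.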
The correct way to test this would be to put the option back and reboot. I’m not sure I can be bothered today, this wasn’t supposed to take all day… I think I might try the alternative suggestion and just disable NetQueue for the entire host by using:
esxcfg-advcfg -k FALSE netNetqueueEnabled
as I just noticed that (according to KB2035701) NetQueue is pointless with 1Gb NICs anyway, and whilst I do have two 10Gb NICs in these hosts, they’re not currently connected to anything.
Update 2013-02-05: I’ve just tried installing the tg3 “update” patch onto another host via VUM without doing any other updates, and all the tg3 vmnics have disappeared. So that seems to be the culprit.
I should point out that I originally installed ESXi onto these R720 servers using the Dell ESXi installer VMware-VMvisor-Installer-5.0.0.update1-623860.x86_64-Dell_Customized_RecoveryCD_A00.iso, which must be where I got the 3.123b.v50.1 tg3 driver from.