Do not update firmware/BIOS from within the ESX console
Today I learnt this the hard way. With the help of Dell ProSupport, I was able to rectify a problem that almost rendered my PowerEdge R710 non-bootable.
- Today I saw a new BIOS update (FW 2.1.15, released on September 13, 2010) on Dell’s web site for the PowerEdge R710/R610. According to the description, it fixes some serious problems that can hang the server, especially if you are using the Xeon Westmere 5600 series, so it’s strongly recommended. I upgraded the BIOS on the R610 without any problem, as it runs Windows Server 2008 R2.
- Now the big problem: how do you update BIOS/firmware on servers running ESX? The answer sounds simple: put the host into Maintenance Mode (VMotion all the VMs off that host, of course), reboot, press F10 to enter the Unified Server Configurator (USC), set up the IP and DNS server addresses, choose Update System, and use FTP to grab the updates directly from Dell’s FTP server. Easy, right? It should be, but what if Dell has not updated the catalog.xml that contains the path to the latest BIOS? That is our situation: today is Oct 2, 2010, and Dell still hasn’t updated that important file, so every R710 shows the installed and the available BIOS as the same 2.1.09. What the heck! So you are stuck, with no easy way to update the BIOS. There is no Virtual Floppy in iDRAC6 any more; if there were, I could simply boot into DOS and then attach another ISO containing the BIOS. Or should I say, I do not know where to download a bootable DOS image (ISO).
- With the USC method having failed, I booted ESX again, started Googling and called Pro-Support. They suggested running the Linux version of the BIOS update program (a .BIN file) directly from the ESX console. Some sources on Google said it was doable, so I used FastSCP to upload the BIN to /tmp, connected to the server with PuTTY, and ran chmod u+x BIOS.BIN followed by ./BIOS.BIN. After I pressed “q”, it asked whether I wanted to continue with the BIOS update (Y/N). I pressed Y, and after 5 seconds it stopped, saying Update failed!
- Then the “BEAUTY” CAME! When I issued a reboot from vCenter, the server just hung. The iDRAC6 console showed “Waiting for Inventory Collector to finish” followed by a growing row of dots, and only after 20 minutes did the server finally reboot. I rebooted it again and it hung again, so this time I used Reset from iDRAC6. Then I found that F10 was no longer available; it said System Service is NOT AVAILABLE! What!!! Dell Pro-Support told me to enter the iDRAC configuration with Ctrl+E and set Cancel System Service to YES, which clears the failed state and brings back F10 after exiting iDRAC. THIS IS DEFINITELY NOT GOOD! SOMETHING in the ./BIOS.BIN script MUST HAVE changed my server settings!!!
- I searched through Google and luckily found Dell’s official KB: “After OpenManage Server Administrator 6.3 is installed on ESX 4.1, when the system is rebooted, the system may not reboot until the Inventory Collector has completed. A message may be displayed that states ‘Waiting for Inventory Collector to Finish’. The system will not reboot for approximately 15 to 20 minutes. Note: This issue can also affect the Server Update Utility (SUU) and Dell Update Packages (DUPs).” The key to fixing it is to issue the command “chkconfig usbarbitrator off” to turn off usbarbitrator.
- The Dell Pro-Support Level 2 engineer told me to type the following:
- “chkconfig --list” to show the Linux service configuration
- “cat /etc/redhat-distro” to show that the service console is actually RHEL 5.0. Googling around, I found that others had also failed when updating server firmware directly, maybe because the service console is not compatible with general Red Hat Linux.
- “service usbarbitrator stop” to stop the usbarbitrator service
- “ps aux | grep usb” again to show that usbarbitrator is no longer running
- finally “chkconfig usbarbitrator off” to permanently disable the usbarbitrator service.
- Finally, I compared the system configuration from “chkconfig --list” with my other untouched R710s and found that the only line that had changed was usbarbitrator: it showed 3:on when it should be 3:off!!! So ./BIOS.BIN must have changed that along the way, then failed to update the BIOS and didn’t roll back, leaving my system configuration changed! Dell’s KB 374107 doesn’t mention that the original ESX 4.1 configuration indeed has usbarbitrator at 3:off!
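Comparing “chkconfig --list” output between hosts by eye is error-prone, so here is a minimal Python sketch of that comparison, meant to be run on an admin workstation against saved output rather than on the ESX host itself. The function names and the two sample listings are my own illustration (modelled on the 3:on versus 3:off difference described above), not real captures from these servers.

```python
def parse_chkconfig(text):
    """Map service name -> {runlevel: state} from saved `chkconfig --list` output."""
    services = {}
    for line in text.splitlines():
        fields = line.split()
        # SysV service lines look like: name 0:off 1:off ... 6:off
        # Anything else (blank lines, xinetd section) is skipped.
        if len(fields) < 2 or not all(":" in f for f in fields[1:]):
            continue
        levels = {}
        for field in fields[1:]:
            level, _, state = field.partition(":")
            levels[level] = state
        services[fields[0]] = levels
    return services


def diff_services(reference, suspect):
    """List (service, runlevel, reference_state, suspect_state) for every mismatch."""
    changes = []
    for name in sorted(set(reference) | set(suspect)):
        ref_levels = reference.get(name, {})
        sus_levels = suspect.get(name, {})
        for level in sorted(set(ref_levels) | set(sus_levels)):
            ref_state = ref_levels.get(level, "missing")
            sus_state = sus_levels.get(level, "missing")
            if ref_state != sus_state:
                changes.append((name, level, ref_state, sus_state))
    return changes


# Illustrative data only: an untouched host and the host where ./BIOS.BIN was run.
UNTOUCHED_HOST = """\
sshd            0:off  1:off  2:on   3:on   4:on   5:on   6:off
usbarbitrator   0:off  1:off  2:off  3:off  4:off  5:off  6:off
"""
AFFECTED_HOST = """\
sshd            0:off  1:off  2:on   3:on   4:on   5:on   6:off
usbarbitrator   0:off  1:off  2:off  3:on   4:off  5:off  6:off
"""

if __name__ == "__main__":
    reference = parse_chkconfig(UNTOUCHED_HOST)
    suspect = parse_chkconfig(AFFECTED_HOST)
    for name, level, ref_state, sus_state in diff_services(reference, suspect):
        print(f"{name} runlevel {level}: reference={ref_state} suspect={sus_state}")
```

To use it with real hosts, save the output of “chkconfig --list” from each server to a text file and pass the file contents to parse_chkconfig.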
Why hasn’t Dell updated the catalog.xml on their FTP servers (both ftp.dell.com and ftp.us.dell.com) when the BIOS has been out for two weeks? Anyway, I will wait until the end of October and try to update it through USC again.
The following is quoted from the official Dell Update Packages README for Linux:
* Due to the USB arbitration services of VMWare ESX 4.1, the USB devices appear invisible to the Hypervisor. So, when DUPs or the Inventory Collector runs on the Managed Node, the partitions exposed as USB devices are not shown, and it reaches the timeout after 15 to 20 minutes.
This timeout occurs in the following cases:
* If you run DUPs or Inventory Collector on VMware ESX 4.1, the partitions exposed as USB devices are not visible due to the USB arbitration service of VMware ESX 4.1 and timeout occurs.
The timeout occurs in the following instances:
• When you start “DSM SA Shared Service” on the VMware ESX 4.1 managed node, it runs Inventory Collector. To work around this issue, uninstall Server Administrator or wait until the Inventory Collector completes execution before attempting to stop the “DSM SA Shared Service”.
• When you manually try to run DUPs or the Inventory Collector on the VMware ESX 4.1 managed node while USB arbitration service is running. To fix the issue, stop the USB arbitration service and run the DUPs or the Inventory Collector.
To stop the USB arbitration service:
1. Use the “ps aux | grep usb” command to check if the USB arbitration service is running.
2. Use the “chkconfig usbarbitrator off” command to prevent the USB arbitration service from starting during boot.
3. After you stop the usbarbitrator, reboot the server to allow the DUPs and/or the Inventory Collector to run.
Note: If you require the usbarbitrator, enable it manually. To enable the usbarbitrator, run the command “chkconfig usbarbitrator on”.
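One catch with step 1 is that “ps aux | grep usb” also matches the grep process itself, so a hit does not always mean the service is running. The following is a rough Python sketch, for use on saved “ps aux” output, that only counts lines whose executable name contains usbarbitrator. The function name and the sample process listing (including the binary path) are hypothetical and only illustrate the idea.

```python
def usbarbitrator_running(ps_output):
    """Return True if saved `ps aux` output lists a usbarbitrator process."""
    for line in ps_output.splitlines():
        # `ps aux` has 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        executable = fields[10].split()[0]
        # Checking only the executable ignores the "grep usb" helper process.
        if "usbarbitrator" in executable:
            return True
    return False


# Hypothetical listing: the path of the service binary is an assumption.
SAMPLE_RUNNING = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      4312  0.0  0.1  12345  1024 ?        Ss   Oct02   0:00 /usr/lib/vmware/bin/vmware-usbarbitrator
root      9876  0.0  0.0   3912   680 pts/0    S+   10:15   0:00 grep usb
"""
SAMPLE_STOPPED = """\
root      9876  0.0  0.0   3912   680 pts/0    S+   10:15   0:00 grep usb
"""

if __name__ == "__main__":
    print(usbarbitrator_running(SAMPLE_RUNNING))
    print(usbarbitrator_running(SAMPLE_STOPPED))
```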
Update: April 6, 2012
* The USB arbitration service of VMWare ESX 4.1 makes the USB devices invisible to the Hypervisor. So, when DUPs or the Inventory Collector runs on the MN, the partitions exposed as USB devices are not shown, and it reaches the timeout after 15 to 20 minutes. This timeout occurs in the following cases:
• When you start “DSM SA Shared Service” on the VMware ESX 4.1 managed node, it runs the Inventory Collector. While the USB arbitration service is running, you must wait for 15 to 20 minutes for the Inventory collector to complete the execution before attempting to stop this service, or uninstall Server Administrator.
• When you manually run the Inventory Collector (invcol) on the VMware ESX 4.1 managed node while the USB arbitration service is running, you must wait for 15 to 20 minutes before the operations end. The invcol output file has the following:
<InventoryError lang="en">
<SPStatus result="false" module="MaserIE -i">
<Message> Inventory Failure: Partition Failure - Attach
partition has failed</Message>
</SPStatus><SPStatus result="false" module="MaserIE -i">
<Message>Invalid inventory results.</Message>
</SPStatus><SPStatus result="false">
To fix the issue, stop the USB arbitration service and run the DUPs, or Inventory Collector.
Do the following to stop the USB arbitration service:
1. Use ps aux | grep usb to find out if the USB arbitration service is running.
2. To stop the USB arbitration service from starting up at bootup, use chkconfig usbarbitrator off.
3. Reboot the server after stopping the usbarbitrator to allow the DUPs and/or the Inventory Collector to run.
If you require the usbarbitrator, enable it manually. To enable the usbarbitrator, run the command chkconfig usbarbitrator on. (373924)
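If you want to check a saved invcol output file for the failure quoted above, a small Python sketch like this could pull out the error messages. It assumes the real file uses plain ASCII quotes, and it uses a regular expression rather than an XML parser because the fragment in the README is cut off. The function name and the sample text are my own illustration, based only on the quoted fragment.

```python
import re

# Matches a failed status entry and captures the text of its <Message>.
FAILED_STATUS = re.compile(
    r'<SPStatus\s+result="false"[^>]*>\s*<Message>(.*?)</Message>',
    re.DOTALL,
)


def failure_messages(invcol_output):
    """Return the messages of all failed SPStatus entries, whitespace-normalised."""
    return [" ".join(m.split()) for m in FAILED_STATUS.findall(invcol_output)]


# Sample modelled on the fragment quoted in the README (truncated there as well).
SAMPLE = """<InventoryError lang="en">
<SPStatus result="false" module="MaserIE -i">
<Message> Inventory Failure: Partition Failure - Attach
partition has failed</Message>
</SPStatus><SPStatus result="false" module="MaserIE -i">
<Message>Invalid inventory results.</Message>
</SPStatus><SPStatus result="false">"""

if __name__ == "__main__":
    for message in failure_messages(SAMPLE):
        print(message)
```

An “Attach partition has failed” message is the sign, according to the README text above, that the USB arbitration service was still running when the Inventory Collector ran.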