Different Methods to Get ESX Host Hardware Alerts via Email
There are basically three methods to get instant email alerts for ESX host hardware failures: VMware vCenter, Dell iDRAC and Dell IT Assistant (ITA). I will focus most on ITA. Two of the three (iDRAC and ITA) are specific to Dell PowerEdge servers.
Method 1: How to get hardware failure alert with vCenter
This is the easiest method, but you do need vCenter, so it may not be viable for those running the free ESXi (there are scripts that can raise alerts on free ESXi, but they are not today’s topic).
At the top of the hierarchy in vCenter, click Alarms, then New Alarm, and give it a name, say “Host Hardware Health Monitor”. Under Triggers, click Add and select “Hardware Health Changed” as the Event and “Warning” as the Status. Add a second trigger with the same parameters except “Alert” as the Status. Finally, under Actions, choose “Send a notification email” as the Action and enter your email address.
Of course, you need to configure the SMTP settings in vCenter Server Settings first.
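To make the two-trigger setup concrete, here is a toy Python model of how such an alarm decides whether to fire. It is not a vCenter API: the class and function names are hypothetical, and the Warning/Alert to Yellow/Red mapping is taken from the alarm definition in the vCenter email shown later in this post.

```python
from dataclasses import dataclass

# Toy model only; HealthEvent and alarm_fires are hypothetical names, not vCenter APIs.
ALARM_EVENT = "Hardware Health Changed"
# The "Warning" and "Alert" triggers show up as Yellow and Red in the alarm definition.
ALARM_STATUSES = ("Yellow", "Red")


@dataclass(frozen=True)
class HealthEvent:
    event: str   # event type, e.g. "Hardware Health Changed"
    status: str  # new status: "Green", "Yellow", "Red" or "Gray"


def alarm_fires(ev: HealthEvent) -> bool:
    # Each trigger matches one (event, status) pair and the triggers are OR-ed,
    # so the alarm fires on either a Warning (Yellow) or an Alert (Red).
    return any(ev.event == ALARM_EVENT and ev.status == s for s in ALARM_STATUSES)
```

For example, `alarm_fires(HealthEvent("Hardware Health Changed", "Red"))` is `True`, while a change back to Green does not fire; with only the Warning trigger configured, the Red case would be missed.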
Method 2: How to get hardware failure alert with Dell iDRAC
This is probably even simpler than the above, but it does not report every hardware failure on the ESX host. As far as I can tell, it does not report hard disk failures, which are critical for many people, so I would call this a half-working solution.
Log in to iDRAC and, under Alerts, set up Email Alerts and the SMTP server. You will need an SMTP server on your dedicated DRAC network to receive these alerts and forward them to your main external email server. Under Platform Events, CHECK Enable Platform Events Filter Alerts and leave all the defaults as they are. As you have probably noticed already, and are now scratching your head over, Dell didn’t include a Storage Warning/Critical Assert Filter. For that question, you need to ask Michael Dell directly.
By the way, I am using iDRAC6, so I’m not sure whether your firmware includes this feature.
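If the relay on the DRAC network is a small script rather than a full mail server, its forwarding step could look like the minimal Python sketch below. It assumes the iDRAC alert has already been received as raw bytes; the function names, addresses and port 25 are assumptions for illustration, not part of iDRAC.

```python
import smtplib
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical relay helpers: something on the DRAC network is assumed to have
# already received the iDRAC alert and handed it over as raw bytes.


def build_forward(raw: bytes, relay_from: str, to: str) -> EmailMessage:
    """Re-address an iDRAC alert so the main mail server will deliver it."""
    original = message_from_bytes(raw, policy=policy.default)
    body = original.get_body(preferencelist=("plain",))  # the plain-text part
    fwd = EmailMessage()
    fwd["From"] = relay_from
    fwd["To"] = to
    fwd["Subject"] = "FW: " + str(original.get("Subject", "iDRAC alert"))
    fwd.set_content(body.get_content() if body is not None else "")
    return fwd


def forward(raw: bytes, relay_from: str, to: str, smarthost: str, port: int = 25) -> None:
    """Send the re-addressed alert to the main (external) mail server."""
    msg = build_forward(raw, relay_from, to)
    with smtplib.SMTP(smarthost, port) as smtp:
        smtp.send_message(msg)
```

Keeping message building separate from sending makes it easy to check the rewritten alert before pointing the script at the real mail server.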
Method 3: How to get hardware failure alert with Dell IT Assistant (ITA)
This is the main topic I want to focus on today. It is the proper way to implement host alerts via SNMP and SNMP traps, and it provides a complete solution, but it is quite time-consuming and somewhat difficult to set up. I have tried to consolidate the difficult parts, eliminate the unnecessary steps, and use the GUI as much as possible instead of the CLI.
- Install the latest version of ITA, which is 8.8 (8.9 is coming but not yet available for download). One thing to take care of: put the ITA server on the same management network as the ESX hosts, or add a NIC that connects it to the server network that needs to be monitored.
- Install OMSA 6.3 or above (6.5 is on the way) on the ESX 4.1 hosts. I found that OMSA 6.3 already comes with some of the necessary settings, such as the SNMP trap configuration used later.
- Edit the SNMP config file /etc/snmp/snmpd.conf and replace public with your own community string, e.g. in the line com2sec notConfigUser default public
- Restart the SNMP daemon with /sbin/service snmpd restart.
- Enable the SNMP Server under Security Profile in the vSphere Client GUI. This opens UDP port 161 for receiving SNMP requests and UDP port 162 for sending out SNMP traps.
- Start discovery and inventory in ITA; you will find the ESX hosts added to the Server section. This completes the pull side (i.e., ITA pulls data from the ESX hosts). Next we need to set up the push side (i.e., the ESX hosts push alerts to ITA).
- Done? Not yet. For the ESX host to send SNMP traps to ITA, you need to specify the trap targets and their community with the vicfg-snmp.pl command from the VMware vSphere CLI (vCLI):
vicfg-snmp.pl --server <hostname> --username <username> --password <password> -t <target hostname>@<port>/<community>
For example, to send SNMP traps from the host esx_host_ip to port 162 on ita_ip using ita_community_string, use the command:
vicfg-snmp.pl --server esx_host_ip --username root --password password -t ita_ip@162/ita_community_string
For multiple targets, separate the trap targets with commas (no spaces):
vicfg-snmp.pl --server esx_host_ip --username root --password password -t ita_ip@162/ita_community_string,ita_ip2@162/ita_community_string
To show the configuration and test whether it is working:
vicfg-snmp.pl --server esx_host_ip --username root --password password --show
vicfg-snmp.pl --server esx_host_ip --username root --password password --test
- Remove all VM-related alerts from Alert Categories in ITA, leaving ONLY vmwEnvHardwareEvent, as I only want ITA to report ESX host server hardware warnings or critical alerts. The reason is that ESX sometimes generates many useless false alarms about VM heartbeats (e.g., “Virtual machine detects a loss in guest heartbeat”), which relate to the VMware Tools installed in the VM.
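The host-side pieces of the steps above (the community string in snmpd.conf, the host@port/community syntax of the -t option, and keeping only vmwEnvHardwareEvent in ITA) can be sketched in Python as below. All helper names are hypothetical; the code only edits text and builds an argument list, and does not talk to ESX or ITA.

```python
import re

# Hypothetical helpers; not part of any Dell or VMware tool.

# Matches lines like "com2sec notConfigUser default public" in /etc/snmp/snmpd.conf;
# group 1 keeps everything before the community field.
_COM2SEC = re.compile(r"^([ \t]*com2sec[ \t]+\S+[ \t]+\S+[ \t]+)\S+", re.MULTILINE)


def set_community(conf_text: str, community: str) -> str:
    """Return snmpd.conf text with the community on every com2sec line replaced."""
    return _COM2SEC.sub(lambda m: m.group(1) + community, conf_text)


def trap_targets(targets: list[tuple[str, int, str]]) -> str:
    """Build the -t value: host@port/community entries joined by commas, no spaces."""
    return ",".join(f"{host}@{port}/{community}" for host, port, community in targets)


def vicfg_snmp_args(server: str, username: str, password: str,
                    targets: list[tuple[str, int, str]]) -> list[str]:
    """Argument list for setting the trap targets; could be passed to subprocess.run()."""
    return ["vicfg-snmp.pl", "--server", server, "--username", username,
            "--password", password, "-t", trap_targets(targets)]


# Mirrors the ITA Alert Categories choice: keep only ESX hardware traps.
KEEP_CATEGORIES = {"vmwEnvHardwareEvent"}


def keep_alert(category: str) -> bool:
    return category in KEEP_CATEGORIES
```

Building an argument list instead of a shell string means the comma-separated target list stays a single argument, which is exactly what goes wrong when a space slips in after the comma.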
Remember to open UDP port 162 on the ITA server’s firewall. Simply treat ITA as a software device that receives SNMP traps sent from the various monitored hosts.
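Since a trap receiver is, at the network level, just a UDP listener on port 162, a minimal Python sketch like the one below could be used to check that traps actually get through the firewall. It assumes nothing else is bound to port 162 at the time (normally only one listener can own that port, so ITA’s listener would have to be off), and it does not decode the BER-encoded SNMP payload. The helper names are hypothetical.

```python
import socket

# Hypothetical helpers for checking that traps reach this machine; no SNMP decoding.


def open_trap_socket(port: int = 162, bind_addr: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on the trap port (port 162 normally needs admin/root rights)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def receive_traps(sock: socket.socket, count: int) -> list[tuple[str, bytes]]:
    """Collect `count` datagrams as (sender IP, raw payload) pairs."""
    received = []
    for _ in range(count):
        data, addr = sock.recvfrom(65535)
        received.append((addr[0], data))
    return received
```

Running `receive_traps(open_trap_socket(), 1)` and then issuing the vicfg-snmp.pl --test command from the ESX side should show one datagram from the ESX host if the firewall rule is in place.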
Also, for Windows hosts to send out SNMP traps, open the SNMP Service properties, go to the Traps tab, and configure the community name (ita_community_string) and the IP address of the trap destination, which should be ita_ip.
So I ran a test by pulling one of the power supplies on an ESX host, and got the following alerts in my inbox.
From ITA:
Device:sXXX ip address, Service Tag: XXXXXXX, Asset Tag:, Date:03/22/11, Time:23:18:38:000, Severity:Warning, Message:Name: System Board 1 PS Redundancy 0 Status: Redundancy
From iDRAC:
Message: iDRAC Alert (s002)
Event: PS 2 Status: Power Supply sensor for PS 2, input lost was deasserted
Date/Time: Tue Mar 22 2011 23:26:18
Severity: Normal
Model: PowerEdge RXXX
Service Tag: XXXXXXX
BIOS version: 2.1.15
Hostname: sXXX
OS Name: VMware ESX 4.1.0 build-XXXXXXXX
iDrac version: 1.54
From vCenter:
Target: xxx.xxx.xxx.xxx Previous Status: Gray New Status: Yellow Alarm Definition: ([Event alarm expression: Hardware Health Changed; Status = Yellow] OR [Event alarm expression: Hardware Health Changed; Status = Red]) Event details: Health of Power changed from green to red.
What’s More
Actually, there is a Method 4, which uses Veeam Monitor (free version) to send email, but I haven’t had time to check it out. If you know how to do it, please drop me a line, thanks.
Finally, I would strongly suggest that Dell implement a trigger that sends email alerts directly from OpenManage itself. It’s simple and would work for most SMB ESX scenarios, which generally have fewer than 10 hosts; you could call this Method 5.
Update Mar-24:
I got ITA working for PowerConnect switches as well, so my PowerConnect can now send SNMP traps to ITA and generate an email if there is a warning or critical issue. It’s really simple to set up PowerConnect’s SNMP community and SNMP trap settings. I’m starting to like ITA now, and I’m glad I am no longer struggling with DMC 2.0.
Finally, there is a very good document from Dell about setting up SNMP and SNMP traps.
Update Aug-24:
If you only want to know whether any of your server hard disks has failed, you can install LSI MegaRAID Storage Manager, which has built-in email alert capability.