Alexander Schwartz

Checking Dell PowerEdge Servers with Nagios

As of version 5.1, Dell OpenManage Server Administrator (OMSA) can be installed on a wide variety of Dell PowerEdge servers running Windows 2000/2003, RHEL 3/4 or SLES 9/10. In my opinion, the best way to monitor this server hardware is remotely via SNMP, and the SNMP subagent delivered with OMSA fits in here perfectly. Most of the plugin logic is based on the MIB documentation provided by Dell. [...]

(Steffen Roegner, original maintainer)

I have taken over the task of maintaining and improving this piece of software, as Steffen doesn't have access to Dell hardware at his new job.

(Alexander Schwartz, current maintainer)

Synopsis

The check_omsa_snmp plugin uses a number of SNMP requests to check certain areas of system health, which are referred to as groups. The groups currently covered are:
  • Memory
  • Power Group
    • Power Supply Table
    • Voltage Probe Table
  • Thermal Group
    • Cooling Device Table
    • Temperature Probe Table
The plugin reads the main status OIDs of these groups and, if they report no problem, returns the OK status to Nagios. If one of the groups fails, the plugin tries to get more in-depth information from the group's status entries, so that the number and status of the failed component are returned as plugin output.

What turned out to be rather difficult is mapping the DellStatus and DellStatusProbe values to a matching Nagios status. The problem is that verifying this would (IMO) require testing on real hardware rather than against, say, a mock SNMP agent. Here is the mapping currently in use; suggestions are appreciated:


Nagios Status   Dell Status              Dell Status Description
CRITICAL        other(1)                 The object's status is not one of the following:
UNKNOWN         unknown(2)               The status of the object is unknown.
OK              ok(3)                    The status of the object is OK.
WARNING         nonCriticalUpper(4)      The object is at the noncritical upper limit.
CRITICAL        criticalUpper(5)         The object is at the critical upper limit.
CRITICAL        nonRecoverableUpper(6)   The object is at the nonrecoverable upper limit.
WARNING         nonCriticalLower(7)      The object is at the noncritical lower limit.
CRITICAL        criticalLower(8)         The object is at the critical lower limit.
CRITICAL        nonRecoverableLower(9)   The object is at the nonrecoverable lower limit.
CRITICAL        failed(10)               The status of the object is failed.
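
The mapping and the drill-down behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's actual Perl code: the names DELL_TO_NAGIOS, to_nagios and check_group are made up for the example, the status values are passed in directly instead of being fetched via SNMP, and treating unmapped values as UNKNOWN is an assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAGIOS_NAMES = {OK: "ok", WARNING: "warning", CRITICAL: "critical", UNKNOWN: "unknown"}

# DellStatus / DellStatusProbe value -> Nagios state, following the table above.
DELL_TO_NAGIOS = {
    1: CRITICAL,   # other
    2: UNKNOWN,    # unknown
    3: OK,         # ok
    4: WARNING,    # nonCriticalUpper
    5: CRITICAL,   # criticalUpper
    6: CRITICAL,   # nonRecoverableUpper
    7: WARNING,    # nonCriticalLower
    8: CRITICAL,   # criticalLower
    9: CRITICAL,   # nonRecoverableLower
    10: CRITICAL,  # failed
}


def to_nagios(dell_status):
    """Translate a Dell status integer; unmapped values become UNKNOWN (assumption)."""
    return DELL_TO_NAGIOS.get(dell_status, UNKNOWN)


def check_group(name, group_status, entry_statuses):
    """Return (exit_code, message) for one group.

    group_status   -- Dell status read from the group's main status OID
    entry_statuses -- per-component Dell statuses, only consulted on failure
    """
    state = to_nagios(group_status)
    if state == OK:
        return OK, f"{name} is ok"
    # The group is not ok: name each failed component by its number.
    failed = [
        f"{name} {index} is {NAGIOS_NAMES[to_nagios(status)]}"
        for index, status in enumerate(entry_statuses, start=1)
        if to_nagios(status) != OK
    ]
    # Fall back to a group-level message if no single entry stands out.
    return state, ", ".join(failed) or f"{name} is {NAGIOS_NAMES[state]}"
```

For example, check_group("Power Supply", 5, [5, 3]) returns (2, "Power Supply 1 is critical"): the group status criticalUpper(5) maps to CRITICAL, and only the first entry is reported because the second one is ok(3).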

Samples

Here are some examples of how the plugin is used and what its output looks like:
[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl
Host name not specified

[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl -H ne-01 -C correctcommunity
Error connecting ne-01: Unable to resolve UDP/IPv4 address 'ne-01'

[ntest@pls-2 nagios]$ echo $?
3

[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl -H pe-01 -C wrongcommunity
No response from remote host 'pe-01', check the host and SNMP community parameters

[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl -H pe-01 -C correctcommunity -G WrongGroup
Group unknown: 'WrongGroup'. Avaliable Groups are PowerSupply, TemperaturProbe, VoltageProbe, CoolingDevice, All (default)

[ntest@pls-2 nagios]$ echo $?
3

[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl -H pe-01 -C correctcommunity -G TemperatureProbe
TemperatureProbe is ok
[ntest@pls-2 nagios]$ echo $?
0

[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl -H pe-01 -C correctcommunity
PowerSupply, TemperatureProbe, VoltageProbe, CoolingDevice are ok
[ntest@pls-2 nagios]$ echo $?
0

[ntest@pls-2 nagios]$ ./plugins/check_omsa_snmp.pl -H pe-01 -C correctcommunity
Power Supply 1 is critical
[ntest@pls-2 nagios]$ echo $?
2
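
The values printed by echo $? in the samples follow the standard Nagios plugin convention: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. As a rough sketch of how a caller might interpret a run, the following Python helper executes a plugin command line and labels the result. The helper name run_plugin is hypothetical, and treating exit codes outside 0 to 3 as UNKNOWN is an assumption.

```python
import subprocess

# Standard Nagios plugin exit codes and their state names.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv):
    """Run a plugin command line and return (state_name, first_output_line)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    # Exit codes outside 0..3 are treated as UNKNOWN (assumption).
    state = STATE_NAMES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return state, first_line
```

It could be called as run_plugin(["./plugins/check_omsa_snmp.pl", "-H", "pe-01", "-C", "correctcommunity"]), using the host and community placeholders from the samples above.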

Download

To download the plugin, please go here: Link to MonitorExchange