Collect metrics from the StorCLI utility on the health of MegaRAID
hardware RAID controllers and write them to stdout so that they can be
used by the textfile collector.
We parse the JSON output that StorCLI provides.
Script must be run as root or with appropriate capabilities for storcli
to access the RAID card.
Designed to run under Python 2.7, using the system Python provided with
many Linux distributions.
The metrics look like this:
mbostock@host:~$ sudo ./storcli.py
megaraid_status_code 0
megaraid_controllers_count 1
megaraid_emergency_hot_spare{controller="0"} 1
megaraid_scheduled_patrol_read{controller="0"} 1
megaraid_virtual_drives{controller="0"} 1
megaraid_drive_groups{controller="0"} 1
megaraid_virtual_drives_optimal{controller="0"} 1
megaraid_degraded{controller="0"} 0
megaraid_battery_backup_healthy{controller="0"} 1
megaraid_ports{controller="0"} 8
megaraid_failed{controller="0"} 0
megaraid_drive_groups_optimal{controller="0"} 1
megaraid_healthy{controller="0"} 1
megaraid_physical_drives{controller="0"} 24
megaraid_controller_info{controller="0", model="AVAGOMegaRAIDSASPCIExpressROMB"} 1
mbostock@host:~$
This change adds a new collector called "nfs" that parses the contents
of /proc/net/rpc/nfs and turns it into metrics. It can be used to
inspect the number of operations per type, but also to keep an eye on an
extraneous number of retransmissions, which may indicate connectivity
issues.
I've picked the name "nfs", as most operating systems use "nfs" for the
client component and "nfsd" as the server component. If we want to add
stats for the NFS server as well, we'd better call such a collector
"nfsd".
Add a utility to parse the output of `smartctl`.
* Scans all disks.
* Prints metrics for `smartctl --info`.
* Prints metrics for `smartctl --attributes`.
We seem to have a small number of Linux servers here that have lines in
/proc/mdstat that cannot be parsed by the node exporter, due to them
containing attributes that are not matched by the regular expression
("super 1.2").
Extend the regular expression to skip this data, just like we do for all
of the other status lines.