Exporter for machine metrics
Christopher Blum 1b98db9fa7 textfile example storcli enhancements (#1145)
* storcli.py: Remove IntEnum

This removes an external dependency.
Moved VD state to VD info labels

* storcli.py: Fix BBU health detection

BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.

* storcli.py: Strip all strings from PD

Strip all strings that we get from PDs.
They often contain whitespaces....

* storcli.py: Add formatting options

Add help text explaining how this document was formatted

* storcli.py: Add DG to pd_info label

Add disk group to pd_info.
That way we can relate to PDs in the same DG.
For example to check if all disks in one RAID
use the same interface...

* storcli.py: Fix promtool issues

Fix linting issues reported by promtool check-metrics
* storcli.py: Exit if storcli reports issues

storcli reports if the command was a success.
We should not continue if there are issues.

* storcli.py: Try to parse metrics to float

This will sanitize the values we hand over to
node_exporter - eliminating any unforeseen values we read out...

* storcli.py: Refactor code to implement handle_sas_controller()

Move code into methods so that we can now also support HBA queries.
* storcli.py: Sort inputs

"...like a good python developer"
  - Daniel Swarbrick

* storcli.py: Replace external dateutil library with internal datetime

Removes external dependency...

* storcli.py: Also collect temperature on megaraid cards

We have already collected them on mpt3sas cards...

* storcli.py: Clean up old code

Removed dead code that is not used any more.

* storcli.py: strip() all information for labels

They often contain whitespaces...

* storcli.py: Try to catch KeyErrors generally

If some key we expect is not there, we will want to
still print whatever we have collected so far...

* storcli.py: Increment version number

We have made some changes here and there.
The general look of the data has not been changed.

* storcli.py: Fix CodeSpell issue

Split string to avoid issues with Codespell due to Celcius in JSON Key

Signed-off-by: Christopher Blum <zeichenanonym@web.de>
2018-11-07 17:12:23 +01:00
.circleci Fix promu config (#1119) 2018-10-20 08:21:51 +02:00
.github Add additional field to github issue template. (#645) 2017-08-17 12:44:26 +02:00
collector Merge pull request #1131 from prometheus/mdl-collector-export 2018-10-31 12:38:48 -04:00
docs Add compat rules for node_time, node_memory_ShmemHugePages and node_memory_ShmemPmdMapped (#1138) 2018-11-05 16:40:19 +01:00
examples launchctl example based on LaunchDaemons (#1102) 2018-10-10 12:44:05 +02:00
text_collector_examples textfile example storcli enhancements (#1145) 2018-11-07 17:12:23 +01:00
vendor Update vendor github.com/prometheus/client_golang/...@v0.9.0 (#1111) 2018-10-15 20:40:34 +02:00
.dockerignore New release process using docker, circleci and a centralized 2016-04-28 22:07:21 +02:00
.gitignore Ignore extracted sysfs fixture files from git 2017-07-20 14:36:48 -04:00
.promu-cgo.yml Update build (#1081) 2018-09-25 16:02:42 +02:00
.promu.yml Fix promu config (#1119) 2018-10-20 08:21:51 +02:00
CHANGELOG.md Collect additional common Infiniband counters (#1120) 2018-10-30 21:54:09 +01:00
checkmetrics.sh Makefile: add checkmetrics target, use in CI (#797) 2018-02-13 18:04:03 +01:00
CONTRIBUTING.md Document DCO in CONTRIBUTING.md 2018-04-16 12:51:12 +02:00
Dockerfile Using the recommended syntax for maintainer label (#1053) 2018-08-28 19:28:58 +02:00
Dockerfile.ppc64le Add dockerfile for ppc64le (#638) 2017-08-17 11:53:04 +02:00
end-to-end-test.sh Add sys/class/net parsing from procfs and expose its metrics (#851) 2018-07-16 15:08:18 +02:00
example-rules.yml Fix cpu utilization rule. 2018-05-17 18:15:07 +02:00
LICENSE License cleanup 2015-01-22 17:11:26 +01:00
MAINTAINERS.md Remove continue with label (#1084) 2018-10-05 13:20:30 +02:00
Makefile Makefile: add target for checking Prometheus rules (#1126) 2018-10-30 18:44:17 +01:00
Makefile.common Update build (#1010) 2018-07-23 09:38:39 +02:00
node_exporter.go Sort collector names in startup logs (#857) 2018-03-29 13:42:44 +01:00
node_exporter_test.go Remove unnecessary select statement (#692) 2017-10-18 07:38:48 +02:00
NOTICE Vendor github.com/mdlayher/wifi and dependencies 2017-01-10 11:29:00 -05:00
README.md Update readme (#1107) 2018-10-11 18:27:41 +02:00
test_image.sh Resolves prometheus/node_exporter#585 (#586) 2017-07-07 07:26:11 +02:00
ttar Vendor ttar from github.com/ideaship/ttar 2018-03-10 15:19:44 +01:00
VERSION Update for 0.17.0-rc.0 release (#1118) 2018-10-19 17:29:19 +02:00

Node exporter


Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

The WMI exporter is recommended for Windows users.

Collectors

There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.

Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag.
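
The flag naming scheme above can be illustrated with a short Python sketch. It is not part of node_exporter; collector_flags is a hypothetical helper that only builds the argument list, and the collector names are taken from the tables below.

```python
def collector_flags(enable=(), disable=()):
    """Build node_exporter command-line flags for the given collector names.

    Hypothetical helper for illustration: names in `enable` become
    --collector.<name>, names in `disable` become --no-collector.<name>.
    """
    flags = ["--collector.{}".format(name) for name in enable]
    flags += ["--no-collector.{}".format(name) for name in disable]
    return flags


# Enable two collectors that are off by default and disable one that is on.
args = ["./node_exporter"] + collector_flags(
    enable=["systemd", "processes"], disable=["arp"]
)
print(" ".join(args))
```

The resulting list can be handed to a process launcher or service definition as-is.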

Enabled by default

| Name | Description | OS |
| --- | --- | --- |
| arp | Exposes ARP statistics from /proc/net/arp. | Linux |
| bcache | Exposes bcache statistics from /sys/fs/bcache/. | Linux |
| bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
| boottime | Exposes system boot time derived from the kern.boottime sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD |
| conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux |
| cpu | Exposes CPU statistics. | Darwin, Dragonfly, FreeBSD, Linux |
| diskstats | Exposes disk I/O statistics. | Darwin, Linux |
| edac | Exposes error detection and correction statistics. | Linux |
| entropy | Exposes available entropy. | Linux |
| exec | Exposes execution statistics. | Dragonfly, FreeBSD |
| filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux |
| filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| hwmon | Exposes hardware monitoring and sensor data from /sys/class/hwmon/. | Linux |
| infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
| ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. | Linux |
| loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
| mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux |
| meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netclass | Exposes network interface info from /sys/class/net/. | Linux |
| netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netstat | Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. | Linux |
| nfs | Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. | Linux |
| nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. | Linux |
| sockstat | Exposes various statistics from /proc/net/sockstat. | Linux |
| stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux |
| textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
| time | Exposes the current system time. | any |
| timex | Exposes selected adjtimex(2) system call stats. | Linux |
| uname | Exposes system information as provided by the uname system call. | Linux |
| vmstat | Exposes statistics from /proc/vmstat. | Linux |
| xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
| zfs | Exposes ZFS performance statistics. | Linux |

Disabled by default

| Name | Description | OS |
| --- | --- | --- |
| buddyinfo | Exposes statistics of memory fragments as reported by /proc/buddyinfo. | Linux |
| devstat | Exposes device statistics. | Dragonfly, FreeBSD |
| drbd | Exposes Distributed Replicated Block Device statistics (to version 8.4). | Linux |
| interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD |
| ksmd | Exposes kernel and system statistics from /sys/kernel/mm/ksm. | Linux |
| logind | Exposes session counts from logind. | Linux |
| meminfo_numa | Exposes memory statistics from /proc/meminfo_numa. | Linux |
| mountstats | Exposes filesystem statistics from /proc/self/mountstats. Exposes detailed NFS client statistics. | Linux |
| ntp | Exposes local NTP daemon health to check time. | any |
| processes | Exposes aggregate process statistics from /proc. | Linux |
| qdisc | Exposes queuing discipline statistics. | Linux |
| runit | Exposes service status from runit. | any |
| supervisord | Exposes service status from supervisord. | any |
| systemd | Exposes service and system status from systemd. | Linux |
| tcpstat | Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6. (Warning: the current version has potential performance issues in high load situations.) | Linux |
| wifi | Exposes WiFi device and station statistics. | Linux |

Textfile Collector

The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.

To use it, set the --collector.textfile.directory flag on the Node exporter. The collector will parse all files in that directory matching the glob *.prom using the text format.

To atomically push completion time for a cron job:

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set roles for a machine using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
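
The same write-then-rename pattern can be sketched in Python, the language of the storcli.py textfile example. This is a minimal sketch rather than anything shipped with node_exporter: write_prom_file and batch_job_completion are hypothetical helper names, and the 0644 mode is an assumption that the exporter runs as a different user than the job writing the file.

```python
import os
import tempfile
import time


def write_prom_file(directory, filename, content):
    """Write content to directory/filename via a temporary file and a rename."""
    # Create the temporary file inside the target directory so the rename
    # never crosses a filesystem boundary. Its name ends in ".tmp", so it
    # does not match the *.prom glob while it is being written.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=filename + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        # mkstemp creates the file with mode 0600; widen it on the assumption
        # that the exporter reads the directory as another user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        # Leave no half-written temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def batch_job_completion(now=None):
    """Return one text-format line with the job completion time in seconds."""
    timestamp = int(time.time() if now is None else now)
    return "my_batch_job_completion_time {}\n".format(timestamp)


# Example call, using the placeholder directory from the shell snippets:
# write_prom_file("/path/to/directory", "my_batch_job.prom", batch_job_completion())
```

Because the final name only appears through the rename, a scrape sees either the previous complete file or the new complete file.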

Filtering enabled collectors

The node_exporter will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.

For advanced use, the node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In the Prometheus configuration, you can use this syntax under the scrape config.

  params:
    collect[]:
      - foo
      - bar

This can be useful for having different Prometheus servers collect specific metrics from nodes.
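
For a manual check outside Prometheus, the repeated collect[] parameter can be put into the query string of the scrape URL. The Python sketch below only builds that URL. The scrape_url helper is hypothetical, and the base address http://localhost:9100 and the /metrics path are assumptions about a default local setup, not something stated above.

```python
from urllib.parse import urlencode


def scrape_url(base, collectors):
    """Return a metrics URL that repeats collect[] once per collector name."""
    # A list of pairs lets the same key appear several times; urlencode
    # percent-encodes the brackets in "collect[]" as %5B%5D.
    query = urlencode([("collect[]", name) for name in collectors])
    url = base + "/metrics"
    return url + "?" + query if query else url


# Assumed local exporter address; cpu and meminfo are collector names
# from the tables in this README.
print(scrape_url("http://localhost:9100", ["cpu", "meminfo"]))
```

With an empty list the helper returns the plain metrics URL, which matches the default of exposing all enabled collectors.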

Building and running

Prerequisites:

Building:

go get github.com/prometheus/node_exporter
cd ${GOPATH-$HOME/go}/src/github.com/prometheus/node_exporter
make
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

Running tests

make test

Using Docker

The node_exporter is designed to monitor the host system. It's not recommended to deploy it as a Docker container because it requires access to the host system. Be aware that any non-root mount points you want to monitor will need to be bind-mounted into the container. If you start a container for host monitoring, specify the path.rootfs argument. This argument must match the path used in the bind-mount of the host root. The node_exporter will use path.rootfs as a prefix to access the host filesystem.

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter \
  --path.rootfs /host
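
The effect of the path.rootfs prefix can be pictured with a small Python sketch. It is an illustration of the idea only, not node_exporter's actual code: rootfs_path is a hypothetical helper, and /var/lib stands in for any host mount point.

```python
import posixpath


def rootfs_path(rootfs, path):
    """Map a host path to where it appears under the bind-mounted root."""
    # Drop the leading slash so join() appends to rootfs instead of
    # discarding it, then normalise away any trailing or doubled slashes.
    return posixpath.normpath(posixpath.join(rootfs, path.lstrip("/")))


# With -v "/:/host:ro,rslave" and --path.rootfs /host, a host mount point
# such as /var/lib is reached inside the container at /host/var/lib.
print(rootfs_path("/host", "/var/lib"))

# Without a bind-mounted root (rootfs is "/"), paths are unchanged.
print(rootfs_path("/", "/var/lib"))
```

This is why the argument has to match the container-side path of the bind-mount: a mismatch would point every lookup at a directory that does not mirror the host.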

On some systems, the timex collector requires an additional Docker flag, --cap-add=SYS_TIME, in order to access the required syscalls.

Using a third-party repository for RHEL/CentOS/Fedora

There is a community-supplied COPR repository which closely follows upstream releases.