# Node exporter

[![CircleCI](https://circleci.com/gh/prometheus/node_exporter/tree/master.svg?style=shield)][circleci]
[![Buildkite status](https://badge.buildkite.com/94a0c1fb00b1f46883219c256efe9ce01d63b6505f3a942f9b.svg)](https://buildkite.com/prometheus/node-exporter)
[![Docker Repository on Quay](https://quay.io/repository/prometheus/node-exporter/status)][quay]
[![Docker Pulls](https://img.shields.io/docker/pulls/prom/node-exporter.svg?maxAge=604800)][hub]
[![Go Report Card](https://goreportcard.com/badge/github.com/prometheus/node_exporter)][goreportcard]

Prometheus exporter for hardware and OS metrics exposed by \*NIX kernels, written
in Go with pluggable metric collectors.

The [Windows exporter](https://github.com/prometheus-community/windows_exporter) is recommended for Windows users.
To expose NVIDIA GPU metrics, [prometheus-dcgm](https://github.com/NVIDIA/gpu-monitoring-tools#dcgm-exporter)
can be used.

## Installation and Usage

If you are new to Prometheus and `node_exporter`, there is a [simple step-by-step guide](https://prometheus.io/docs/guides/node-exporter/).

The `node_exporter` listens on HTTP port 9100 by default. See the `--help` output for more options.

### Ansible

For automated installs with [Ansible](https://www.ansible.com/), there is the [Cloud Alchemy role](https://github.com/cloudalchemy/ansible-node-exporter).

### RHEL/CentOS/Fedora

There is a [community-supplied COPR repository](https://copr.fedorainfracloud.org/coprs/ibotty/prometheus-exporters/) which closely follows upstream releases.

### Docker
The `node_exporter` is designed to monitor the host system. It's not recommended
to deploy it as a Docker container because it requires access to the host system.

For situations where Docker deployment is needed, some extra flags must be used to allow
the `node_exporter` access to the host namespaces.

Be aware that any non-root mount points you want to monitor will need to be bind-mounted
into the container.

If you start a container for host monitoring, specify the `path.rootfs` argument.
This argument must match the path in the bind-mount of the host root. The `node_exporter` will use
`path.rootfs` as a prefix to access the host filesystem.

```bash
docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host
```
For Docker compose, similar flag changes are needed.
```yaml
---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
```
On some systems, the `timex` collector requires an additional Docker flag,
`--cap-add=SYS_TIME`, in order to access the required syscalls.

## Collectors

There is varying support for collectors on each operating system. The tables
below list all existing collectors and the supported systems.

Collectors are enabled by providing a `--collector.<name>` flag.
Collectors that are enabled by default can be disabled by providing a `--no-collector.<name>` flag.
To enable only some specific collector(s), use `--collector.disable-defaults --collector.<name> ...`.

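As a rough illustration of how these flags combine, the sketch below builds the argument list for running only a chosen set of collectors. The helper name `only_collectors` is hypothetical and not part of `node_exporter`; `cpu` and `meminfo` are used purely as example collector names.

```bash
# only_collectors NAME...: print the flags that disable the defaults and
# enable only the named collectors. The function body is a subshell so its
# variables do not leak into the caller. (Hypothetical helper.)
only_collectors() (
  flags="--collector.disable-defaults"
  for name in "$@"; do
    flags="$flags --collector.$name"
  done
  printf '%s\n' "$flags"
)

# Show the flags that would be passed.
only_collectors cpu meminfo

# Possible usage (left unquoted on purpose so the flags are split into words):
# ./node_exporter $(only_collectors cpu meminfo)
```
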
### Enabled by default

Name     | Description | OS
---------|-------------|----
arp | Exposes ARP statistics from `/proc/net/arp`. | Linux
bcache | Exposes bcache statistics from `/sys/fs/bcache/`. | Linux
bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux
btrfs | Exposes btrfs statistics | Linux
boottime | Exposes system boot time derived from the `kern.boottime` sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack | Shows conntrack statistics (does nothing if no `/proc/sys/net/netfilter/` present). | Linux
cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD
cpufreq | Exposes CPU frequency statistics | Linux, Solaris
diskstats | Exposes disk I/O statistics. | Darwin, Linux, OpenBSD
edac | Exposes error detection and correction statistics. | Linux
entropy | Exposes available entropy. | Linux
exec | Exposes execution statistics. | Dragonfly, FreeBSD
fibrechannel | Exposes fibre channel information and statistics from `/sys/class/fc_host/`. | Linux
filefd | Exposes file descriptor statistics from `/proc/sys/fs/file-nr`. | Linux
filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon | Exposes hardware monitoring and sensor data from `/sys/class/hwmon/`. | Linux
infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux
ipvs | Exposes IPVS status from `/proc/net/ip_vs` and stats from `/proc/net/ip_vs_stats`. | Linux
loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm | Exposes statistics about devices in `/proc/mdstat` (does nothing if no `/proc/mdstat` present). | Linux
meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass | Exposes network interface info from `/sys/class/net/` | Linux
netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat | Exposes network statistics from `/proc/net/netstat`. This is the same information as `netstat -s`. | Linux
nfs | Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`. | Linux
nfsd | Exposes NFS kernel server statistics from `/proc/net/rpc/nfsd`. This is the same information as `nfsstat -s`. | Linux
nvme | Exposes NVMe info from `/sys/class/nvme/` | Linux
powersupplyclass | Exposes Power Supply statistics from `/sys/class/power_supply` | Linux
pressure | Exposes pressure stall statistics from `/proc/pressure/`. | Linux (kernel 4.20+ and/or [CONFIG\_PSI](https://www.kernel.org/doc/html/latest/accounting/psi.html))
rapl | Exposes various statistics from `/sys/class/powercap`. | Linux
schedstat | Exposes task scheduler statistics from `/proc/schedstat`. | Linux
sockstat | Exposes various statistics from `/proc/net/sockstat`. | Linux
softnet | Exposes statistics from `/proc/net/softnet_stat`. | Linux
stat | Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts. | Linux
tapestats | Exposes statistics from `/sys/class/scsi_tape`. | Linux
textfile | Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set. | _any_
thermal\_zone | Exposes thermal zone & cooling device statistics from `/sys/class/thermal`. | Linux
time | Exposes the current system time. | _any_
timex | Exposes selected adjtimex(2) system call stats. | Linux
udp_queues | Exposes UDP total lengths of the rx_queue and tx_queue from `/proc/net/udp` and `/proc/net/udp6`. | Linux
uname | Exposes system information as provided by the uname system call. | Darwin, FreeBSD, Linux, OpenBSD
vmstat | Exposes statistics from `/proc/vmstat`. | Linux
xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+)
zfs | Exposes [ZFS](http://open-zfs.org/) performance statistics. | [Linux](http://zfsonlinux.org/), Solaris

### Disabled by default

`node_exporter` also implements a number of collectors that are disabled by default. Reasons for this vary by
collector, and may include:

* High cardinality
* Prolonged runtime that exceeds the Prometheus `scrape_interval` or `scrape_timeout`
* Significant resource demands on the host

You can enable additional collectors as desired by adding them to your
init system's or service supervisor's startup configuration for
`node_exporter`, but caution is advised. Enable at most one at a time,
testing first on a non-production system, then by hand on a single
production node. When enabling additional collectors, you should
carefully monitor the change by observing the
`scrape_duration_seconds` metric to ensure that collection completes
and does not time out. In addition, monitor the
`scrape_samples_post_metric_relabeling` metric to see the changes in
cardinality.

The `perf` collector may not work out of the box on some Linux systems due to kernel
configuration and security settings. To allow access, set the following `sysctl`
parameter:

```
sysctl -w kernel.perf_event_paranoid=X
```
- 2 allow only user-space measurements (default since Linux 4.6).
- 1 allow both kernel and user measurements (default before Linux 4.6).
- 0 allow access to CPU-specific data but not raw tracepoint samples.
- -1 no restrictions.

Depending on the configured value, different metrics will be available; for most
cases, `0` will provide the most complete set. For more information, see
[`man 2 perf_event_open`](http://man7.org/linux/man-pages/man2/perf_event_open.2.html).

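To check the current setting before changing it, something like the following sketch may help. It reads `/proc/sys/kernel/perf_event_paranoid` (the procfs view of the same sysctl) and maps the value onto the levels listed above. The function name `describe_perf_paranoid` is hypothetical, and any value outside that list is simply reported as unknown.

```bash
# describe_perf_paranoid LEVEL: print a short description of a
# kernel.perf_event_paranoid level. (Hypothetical helper.)
describe_perf_paranoid() {
  case "$1" in
    2)  echo "user-space measurements only" ;;
    1)  echo "kernel and user measurements" ;;
    0)  echo "CPU-specific data, no raw tracepoint samples" ;;
    -1) echo "no restrictions" ;;
    *)  echo "unknown level: $1" ;;
  esac
}

# Only attempt the read where the procfs file exists (Linux).
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
  describe_perf_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid)"
fi
```
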
By default, the `perf` collector will only collect metrics of the CPUs that
`node_exporter` is running on (i.e.
[`runtime.NumCPU`](https://golang.org/pkg/runtime/#NumCPU)). If this is
insufficient (e.g. if you run `node_exporter` with its CPU affinity set to
specific CPUs), you can specify a list of alternate CPUs by using the
`--collector.perf.cpus` flag. For example, to collect metrics on CPUs 2-6, you
would specify: `--collector.perf --collector.perf.cpus=2-6`. The CPU
configuration is zero indexed and can also take a stride value; e.g.
`--collector.perf --collector.perf.cpus=1-10:5` would collect on CPUs
1, 5, and 10.

The `perf` collector is also able to collect
[tracepoint](https://www.kernel.org/doc/html/latest/core-api/tracepoint.html)
counts when using the `--collector.perf.tracepoint` flag. Tracepoints can be
found using [`perf list`](http://man7.org/linux/man-pages/man1/perf.1.html) or
from debugfs. An example usage of this would be
`--collector.perf.tracepoint="sched:sched_process_exec"`.

Name     | Description | OS
---------|-------------|----
buddyinfo | Exposes statistics of memory fragments as reported by `/proc/buddyinfo`. | Linux
devstat | Exposes device statistics | Dragonfly, FreeBSD
drbd | Exposes Distributed Replicated Block Device statistics (to version 8.4) | Linux
ethtool | Exposes network interface and network driver statistics equivalent to `ethtool -S`. | Linux
interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD
ksmd | Exposes kernel and system statistics from `/sys/kernel/mm/ksm`. | Linux
logind | Exposes session counts from [logind](http://www.freedesktop.org/wiki/Software/systemd/logind/). | Linux
meminfo\_numa | Exposes memory statistics from `/proc/meminfo_numa`. | Linux
mountstats | Exposes filesystem statistics from `/proc/self/mountstats`. Exposes detailed NFS client statistics. | Linux
network_route | Exposes the routing table as metrics | Linux
ntp | Exposes local NTP daemon health to check [time](./docs/TIME.md) | _any_
perf | Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings). | Linux
processes | Exposes aggregate process statistics from `/proc`. | Linux
qdisc | Exposes [queuing discipline](https://en.wikipedia.org/wiki/Network_scheduler#Linux_kernel) statistics | Linux
runit | Exposes service status from [runit](http://smarden.org/runit/). | _any_
supervisord | Exposes service status from [supervisord](http://supervisord.org/). | _any_
systemd | Exposes service and system status from [systemd](http://www.freedesktop.org/wiki/Software/systemd/). | Linux
tcpstat | Exposes TCP connection status information from `/proc/net/tcp` and `/proc/net/tcp6`. (Warning: the current version has potential performance issues in high load situations.) | Linux
wifi | Exposes WiFi device and station statistics. | Linux
zoneinfo | Exposes NUMA memory zone metrics. | Linux

### Textfile Collector

The `textfile` collector is similar to the [Pushgateway](https://github.com/prometheus/pushgateway),
in that it allows exporting of statistics from batch jobs. It can also be used
to export static metrics, such as what role a machine has. The Pushgateway
should be used for service-level metrics. The `textfile` module is for metrics
that are tied to a machine.

To use it, set the `--collector.textfile.directory` flag on the `node_exporter` command line. The
collector will parse all files in that directory matching the glob `*.prom`
using the [text
format](http://prometheus.io/docs/instrumenting/exposition_formats/). **Note:** Timestamps are not supported.

To atomically push completion time for a cron job:
```
echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom
```
To statically set roles for a machine using labels:
```
echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
```
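
The write-then-rename pattern used in both examples can be wrapped in a small helper. The following is a minimal sketch, not something shipped with `node_exporter`: the function name `write_prom` is hypothetical, and it assumes `mktemp` is available. The temporary file is created inside the target directory so that `mv` is a rename on the same filesystem, and its name does not match `*.prom`, so the collector should not pick up a half-written file.

```bash
# write_prom DIR NAME: read metric lines on stdin and publish them as
# DIR/NAME.prom. The function body is a subshell so its variables stay local.
# (Hypothetical helper.)
write_prom() (
  dir="$1"
  name="$2"
  # Temporary file in the same directory; the suffix keeps it out of *.prom.
  tmp="$(mktemp "$dir/$name.prom.XXXXXX")" || exit 1
  # mktemp creates the file with mode 0600; widen it so that an exporter
  # running as a different user can read the result.
  cat > "$tmp" && chmod 644 "$tmp" && mv "$tmp" "$dir/$name.prom"
)

# Possible usage from a cron job:
# echo "my_batch_job_completion_time $(date +%s)" | write_prom /path/to/directory my_batch_job
```
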
### Filtering enabled collectors

The `node_exporter` will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.

For advanced use, the `node_exporter` can be passed an optional list of collectors to filter metrics. The `collect[]` parameter may be used multiple times. In the Prometheus configuration you can use this syntax under the [scrape config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

```
  params:
    collect[]:
      - foo
      - bar
```
This can be useful for having different Prometheus servers collect specific metrics from nodes.
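
The same filter can be tried by hand against the exporter's HTTP endpoint. The sketch below builds such a URL; it assumes the exporter is reachable on `localhost:9100` under the `/metrics` path, and the helper name `metrics_url` is hypothetical. Note that `curl` treats square brackets as glob ranges, so globbing has to be switched off.

```bash
# metrics_url BASE NAME...: print a metrics URL that repeats the collect[]
# parameter once per collector name. (Hypothetical helper.)
metrics_url() (
  base="$1"
  shift
  query=""
  for name in "$@"; do
    # Join parameters with "&" once the query string is non-empty.
    query="${query:+$query&}collect[]=$name"
  done
  printf '%s/metrics%s\n' "$base" "${query:+?$query}"
)

# Show the URL that would be requested.
metrics_url http://localhost:9100 cpu meminfo

# Possible usage (needs a running exporter):
# curl --globoff "$(metrics_url http://localhost:9100 cpu meminfo)"
```
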
## Development building and running

Prerequisites:

* [Go compiler](https://golang.org/dl/)
* RHEL/CentOS: `glibc-static` package.

Building:

    git clone https://github.com/prometheus/node_exporter.git
    cd node_exporter
    make
    ./node_exporter <flags>

To see all available configuration flags:

    ./node_exporter -h

## Running tests

    make test

## TLS endpoint

**EXPERIMENTAL**

The exporter supports TLS via a new web configuration file.

```console
./node_exporter --web.config=web-config.yml
```

See the [exporter-toolkit https package](https://github.com/prometheus/exporter-toolkit/blob/v0.1.0/https/README.md) for more details.

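A minimal web configuration file might look like the sketch below, written here from the shell. The `tls_server_config`, `cert_file`, and `key_file` keys are taken from the exporter-toolkit documentation linked above; the certificate and key file names are placeholders and need to point at files you have generated yourself.

```bash
# Write a minimal TLS configuration into the current directory.
# node_exporter.crt and node_exporter.key are placeholder file names.
cat > web-config.yml <<'EOF'
tls_server_config:
  cert_file: node_exporter.crt
  key_file: node_exporter.key
EOF

# Possible usage:
# ./node_exporter --web.config=web-config.yml
```
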
[travis]: https://travis-ci.org/prometheus/node_exporter
[hub]: https://hub.docker.com/r/prom/node-exporter/
[circleci]: https://circleci.com/gh/prometheus/node_exporter
[quay]: https://quay.io/repository/prometheus/node-exporter
[goreportcard]: https://goreportcard.com/report/github.com/prometheus/node_exporter