TCP "OutRsts" is the number of TCP Resets sent by the node. This can be
useful for monitoring connection failures and flooding.
Signed-off-by: Ben Kochie <superq@gmail.com>
* The netdev collector CLI argument `--collector.netdev.ignored-devices` was renamed to `--collector.netdev.device-blacklist` in order to conform with the systemd collector. #1279
* The label named `state` on `node_systemd_service_restart_total` metrics was changed to `name` to better describe the metric. #1393
* Refactoring of the mdadm collector changes several metrics
- `node_md_disks_active` is removed
- `node_md_disks` now has a `state` label for "fail", "spare", "active" disks.
- `node_md_is_active` is replaced by `node_md_state` with a state set of "active", "inactive", "recovering", "resync".
* Additional label `mountaddr` added to NFS device metrics to distinguish mounts from the same URL, but different IP addresses. #1417
* Metrics node_cpu_scaling_frequency_min_hrts and node_cpu_scaling_frequency_max_hrts of the cpufreq collector were renamed to node_cpu_scaling_frequency_min_hertz and node_cpu_scaling_frequency_max_hertz. #1510
* Collectors that are enabled, but are unable to find data to collect, now return 0 for `node_scrape_collector_success`.
* [CHANGE] Add `--collector.netdev.device-whitelist`. #1279
* [CHANGE] Ignore iso9660 filesystem on Linux #1355
* [CHANGE] Refactor mdadm collector #1403
* [CHANGE] Add `mountaddr` label to NFS metrics. #1417
* [CHANGE] Don't count empty collectors as success. #1613
* [FEATURE] New flag to disable default collectors #1276
* [FEATURE] Add experimental TLS support #1277, #1687, #1695
* [FEATURE] Add collector for Power Supply Class #1280
* [FEATURE] Add new schedstat collector #1389
* [FEATURE] Add FreeBSD zfs support #1394
* [FEATURE] Add uname support for Darwin and OpenBSD #1433
* [FEATURE] Add new metric node_cpu_info #1489
* [FEATURE] Add new thermal_zone collector #1425
* [FEATURE] Add new cooling_device metrics to thermal zone collector #1445
* [FEATURE] Add swap usage on darwin #1508
* [FEATURE] Add Btrfs collector #1512
* [FEATURE] Add RAPL collector #1523
* [FEATURE] Add new softnet collector #1576
* [FEATURE] Add new udp_queues collector #1503
* [FEATURE] Add basic authentication #1673
* [ENHANCEMENT] Log pid when there is a problem reading the process stats #1341
* [ENHANCEMENT] Collect InfiniBand port state and physical state #1357
* [ENHANCEMENT] Include additional XFS runtime statistics. #1423
* [ENHANCEMENT] Report non-fatal collection errors in the exporter metric. #1439
* [ENHANCEMENT] Expose IPVS firewall mark as a label #1455
* [ENHANCEMENT] Add check for systemd version before attempting to query certain metrics. #1413
* [ENHANCEMENT] Add a flag to adjust mount timeout #1486
* [ENHANCEMENT] Add new counters for flush requests in Linux 5.5 #1548
* [ENHANCEMENT] Add metrics and tests for UDP receive and send buffer errors #1534
* [ENHANCEMENT] The sockstat collector now exposes IPv6 statistics in addition to the existing IPv4 support. #1552
* [ENHANCEMENT] Add infiniband info metric #1563
* [ENHANCEMENT] Add unix socket support for supervisord collector #1592
* [ENHANCEMENT] Implement loadavg on all BSDs without cgo #1584
* [ENHANCEMENT] Add model_name and stepping to node_cpu_info metric #1617
* [ENHANCEMENT] Add `--collector.perf.cpus` to allow setting the CPU list for perf stats. #1561
* [ENHANCEMENT] Add metrics for IO errors and retries on Darwin. #1636
* [ENHANCEMENT] Add perf tracepoint collection flag #1664
* [ENHANCEMENT] ZFS: read contents of objset file #1632
* [ENHANCEMENT] Linux CPU: Cache CPU metrics to make them monotonically increasing #1711
* [BUGFIX] Read /proc/net files with a single read syscall #1380
* [BUGFIX] Renamed label `state` to `name` on `node_systemd_service_restart_total`. #1393
* [BUGFIX] Fix netdev nil reference on Darwin #1414
* [BUGFIX] Strip path.rootfs from mountpoint labels #1421
* [BUGFIX] Fix seconds reported by schedstat #1426
* [BUGFIX] Fix empty string in path.rootfs #1464
* [BUGFIX] Fix typo in cpufreq metric names #1510
* [BUGFIX] Read /proc/stat in one syscall #1538
* [BUGFIX] Fix OpenBSD cache memory information #1542
* [BUGFIX] Refactor textfile collector to avoid looping defer #1549
* [BUGFIX] Fix network speed math #1580
* [BUGFIX] collector/systemd: use regexp to extract systemd version #1647
* [BUGFIX] Fix initialization in perf collector when using multiple CPUs #1665
* [BUGFIX] Fix accidentally empty lines in meminfo_linux #1671
Signed-off-by: Ben Kochie <superq@gmail.com>
* Update CHANGELOG with fixes and improvements from rc.0
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Richard Hartmann <richih@richih.org>
* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes.
For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish this change gives us the option to check for overloaded UDP + TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:
I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
I chose the name 'udp_queues' instead of 'udpstat' as UDP has no state.
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different steppings
of a processor might behave differently.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Update CHANGELOG/VERSION for 1.0.0-rc.0 release.
* Add a note about new https settings to top-level README.
* Mark --web.config flag as experimental.
Signed-off-by: Ben Kochie <superq@gmail.com>
* add a map of profilers to CPUids
`runtime.NumCPU()` returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.
This change maintains the current behavior as default, but also allows
the user to specify a range of CPUids to use instead.
The CPU id is stored as the value of a map keyed on the profiler
object's address.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodges@uber.com>
Co-authored-by: jdamato-fsly <55214354+jdamato-fsly@users.noreply.github.com>
Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.
* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in InfiniBand debug message.
Closes: https://github.com/prometheus/node_exporter/issues/1323
Signed-off-by: Ben Kochie <superq@gmail.com>
This exposes RAPL statistics from /sys/class/powercap.
Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
* Add unix socket support for supervisord collector
For example:
--collector.supervisord.url=unix:///var/run/supervisor.sock
Fixes prometheus/node_exporter#262
Signed-off-by: Paul Cameron <cameronpm@gmail.com>
Integer division and the order of operations when converting Mbps to Bps
results in a loss of accuracy if the interface speeds are set low.
e.g. 100 Mbps is reported as 12000000 Bps, should be 12500000
10 Mbps is reported as 1000000 Bps, should be 1250000
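
The truncation can be reproduced with a small sketch. The function names are
made up for this example; only the arithmetic mirrors the description above.

```go
package main

import "fmt"

// mbpsToBytesNaive divides first, so the integer division truncates
// before the result is scaled up (for example 100/8 == 12).
func mbpsToBytesNaive(mbps int64) int64 {
	return mbps / 8 * 1000 * 1000
}

// mbpsToBytes scales to bits per second first and divides last, so the
// division no longer discards a remainder for whole-Mbps inputs.
func mbpsToBytes(mbps int64) int64 {
	return mbps * 1000 * 1000 / 8
}

func main() {
	for _, mbps := range []int64{10, 100} {
		fmt.Printf("%d Mbps: naive=%d fixed=%d\n",
			mbps, mbpsToBytesNaive(mbps), mbpsToBytesNaive(mbps)+(mbpsToBytes(mbps)-mbpsToBytesNaive(mbps)))
	}
}
```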
Signed-off-by: Thomas Lin <t.lin@mail.utoronto.ca>
The dbus property 'SystemState' and the timer property 'LastTriggerUSec'
were added in version 212 of systemd.
Check that the version of systemd is at least 212 before attempting
to query these properties.
f755e3b74bdedabea4b3
Resolves issue #291
Signed-off-by: Paul Gier <pgier@redhat.com>
Use the extra information gleaned from the mountinfo file to add
a 'mountaddr' field for NFS metrics. This helps prevent prometheus from
ignoring mounts that come from the same URL, but are actually from
different IP addresses.
This commit also rebases to current master
Signed-off-by: Dipack P Panjabi <dpanjabi@hudson-trading.com>
Include directory operation, read/write system call, and vnode runtime
statistics for XFS filesystems.
Signed-off-by: Steven Kreuzer <skreuzer@FreeBSD.org>
* Closes issue #261 on node_exporter.
Delegated mdstat parsing to procfs project. mdadm_linux.go now only exports the metrics.
-> Added disk labels: "fail", "spare", "active" to indicate disk status
-> Changed metric node_md_disks_total ==> node_md_disks_required
-> Removed test cases for mdadm_linux.go, as the functionality they tested for has been moved to procfs project.
Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>
* Add --collector.netdev.device-whitelist flag
Sometimes it is desired to monitor only one netdev. The golang regexp
does not support a negated regex, so the ignored-devices flag is too
cumbersome for this task.
This change introduces a new flag: accept-devices, which is mutually
exclusive to ignored-devices. This flag allows specifying ONLY the
netdev you'd like.
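
One possible shape for such a pair of mutually exclusive filters is sketched
below. The `deviceFilter` type and its method names are invented for this
example and are not the collector's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// deviceFilter holds at most one of the two patterns.
type deviceFilter struct {
	ignore *regexp.Regexp // devices matching this are skipped
	accept *regexp.Regexp // only devices matching this are kept
}

// newDeviceFilter rejects configurations that set both patterns.
func newDeviceFilter(ignorePattern, acceptPattern string) (*deviceFilter, error) {
	if ignorePattern != "" && acceptPattern != "" {
		return nil, errors.New("ignore and accept patterns are mutually exclusive")
	}
	f := &deviceFilter{}
	var err error
	if ignorePattern != "" {
		if f.ignore, err = regexp.Compile(ignorePattern); err != nil {
			return nil, err
		}
	}
	if acceptPattern != "" {
		if f.accept, err = regexp.Compile(acceptPattern); err != nil {
			return nil, err
		}
	}
	return f, nil
}

// ignored reports whether a device should be left out of the metrics.
func (f *deviceFilter) ignored(name string) bool {
	if f.ignore != nil {
		return f.ignore.MatchString(name)
	}
	if f.accept != nil {
		return !f.accept.MatchString(name)
	}
	return false
}

func main() {
	f, err := newDeviceFilter("", "^eth0$")
	if err != nil {
		panic(err)
	}
	for _, dev := range []string{"lo", "eth0", "docker0"} {
		fmt.Printf("%s ignored=%v\n", dev, f.ignored(dev))
	}
}
```

Negating the match in code sidesteps the missing negative lookahead in Go's
regexp package.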
Signed-off-by: Noam Meltzer <noam@cynerio.co>
Previously, the node_textfile_mtime_seconds metric was based on the
Fileinfo.ModTime() of the ioutil.ReadDir() return value. This is based
on lstat() and therefore has unintended consequences for symlinks
(modification time of the symlink instead of the symlink target is
returned). It is also racy as the lstat() is performed before reading
the file.
This commit changes the node_textfile_mtime_seconds metric to be based
on a fresh Stat() call on the open file. This eliminates the race and
works as expected for symlinks. Fixes #1324.
Signed-off-by: Christian Hoffmann <mail@hoffmann-christian.info>
This enables the collection of pressure stall information as exposed
by the `/proc/pressure` interface added in the 4.20 release of the
Linux kernel.
Closes #1174
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.
Fixes #1241
Signed-off-by: Paul Gier <pgier@redhat.com>
This reduces the systemd metric collection time by using a wait group
and goroutines to allow the systemd metric calls to happen concurrently.
It also disables the start time, restarts, tasks_max, and tasks_current metrics by default
because these can be time-consuming to gather.
Signed-off-by: Paul Gier <pgier@redhat.com>
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.
When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.
When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking. This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.
If a bond is not monitored via miimon or arp, the `mii_status` should
likely be always `up`, however I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`. Kernel
bond documentation stresses, however, that a bond should not be configured
without one of `miimon` or `arp_interval`.
This change results in the metric 'node_bonding_active' matching the
up/down state of the bond's point of view rather than operstate.
Signed-off-by: Sachi King <nakato@nakato.io>
* netclass_linux: remove varying labels from the 'up' metric
This moves the variable label values such as 'operstate' out of
the 'network_up' metric and into a separate metric called '_info'.
This allows the 'up' metric to remain continuous over state changes.
Fixes #1236
Signed-off-by: Paul Gier <pgier@redhat.com>
* Rename interface to device in netclass collector
This makes it consistent with other networking metrics like node_network_receive_bytes_total
This closes #1223
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
* Add diskstats collector for OpenBSD
Tested on i386 and amd64, OpenBSD 6.4 and -current.
* Refactor diskstats collectors
This moves common descriptors from Linux, Darwin, OpenBSD
diskstats collectors into diskstats_common.go
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
* Add fallback for missing /proc/1/mounts
On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fall back to `/proc/mounts` if
`/proc/1/mounts` is not found.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add tests.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add CHANGELOG entry.
Signed-off-by: Ben Kochie <superq@gmail.com>
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.
Signed-off-by: Ben Kochie <superq@gmail.com>
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.
SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.
For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
* Change systemd unit filtering
Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Correctly cast Darwin memory info
* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.
Signed-off-by: Ben Kochie <superq@gmail.com>
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available
Signed-off-by: James Hartig <james@getadmiral.com>
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.
* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Clean up some error style issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric
* Use Prometheus best practices for uptime metric.
* Use "start time" rather than "uptime".
* Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Add support for NRestarts counter introduced in systemd 235
`.service` units increment this counter any time the Restart= condition is
triggered.
Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total
This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them separately.
Rename the node_package_throttles_total metric label `node` to `package`.
Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Use the direct /sys path to the cpu files.
Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.
The latter path also does not exist e.g. on RHEL 6.9's kernel.
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Reverse core+package throttle processing order
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Add documentation URLs
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics. Separate these into their own metric to
avoid duplication of data.