Add a collector for NVMes to expose the firmware versions. This requires
procfs >= 0.7.0.
Fixes#1891
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Some devices (ex virtual) don't have a speed and report `-1` as the
speed value. Add a flag to allow ignoring speed on these devices.
Fixes: https://github.com/prometheus/node_exporter/issues/1967
Signed-off-by: Ben Kochie <superq@gmail.com>
Avoid panic on invalid UTF-8 from /sys/class/power_supply by
sanitizing strings parsed from the kernel.
* Add a broken string to the test fixtures.
Fixes: https://github.com/prometheus/node_exporter/issues/1979
Signed-off-by: Ben Kochie <superq@gmail.com>
This exposes RAPL statistics from /sys/class/powercap.
Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
* Add makefile target to update sysfs fixtures.
* Use similar style for fixtures from procfs.
* Re-pack fixtures ttar file.
Signed-off-by: Ben Kochie <superq@gmail.com>
commit 5ef96388a978c54173e1b1ec8e7bcb41fc7d130d
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 18 20:45:23 2019 +0200
block variables
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c1177382e241994618a8ab7dd9842027d597b0df
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 18 20:38:33 2019 +0200
Use SI Units
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 04e4f99c423872d3094f21f89a8235b233a01941
Merge: 5417c98 f3538e1
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 18 19:20:17 2019 +0200
Merge branch 'master' into power_supply_class
commit 5417c9820a40b37b490caedeaa3526883380b9bf
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 4 23:02:39 2019 +0200
Drop averages
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 1f1447dbe7bbdcdabebf4c968beb14c67d89dd9f
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 4 22:56:00 2019 +0200
Update Copyright
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 9677425059a3bf61cd7498cf7b5f05d5af7a626b
Merge: 0b51589 d3478a2
Author: Sven Haardiek <sven@haardiek.de>
Date: Mon Sep 2 22:02:53 2019 +0200
Merge branch 'master' into power_supply_class
commit 0b51589f390cc1b33ea4728d85fca3a3b231cf3f
Author: PrometheusBot <prometheus-team@googlegroups.com>
Date: Fri Aug 30 13:32:17 2019 +0200
makefile: update Makefile.common with newer version (#1466)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
commit af2b9e849c7b69237b7fa0e9a289c929ec7173a0
Author: Boris Momčilović <boris.momcilovic@gmail.com>
Date: Tue Aug 27 14:24:11 2019 +0200
Ipvs firewall mark (#1455)
* IPVS: include firewall mark label
Signed-off-by: Boris Momčilović <boris@firstbeatmedia.com>
commit 773f99de7f699900a00b4d35340e356fe7098ee7
Author: Paul Gier <pgier@redhat.com>
Date: Tue Aug 27 02:26:19 2019 -0500
update procfs to v0.0.4 (#1457)
Signed-off-by: Paul Gier <pgier@redhat.com>
commit 6f8a4f4348f62700cbf7eeb2657851237e13c35d
Author: beorn7 <beorn@grafana.com>
Date: Tue Aug 20 18:49:12 2019 +0200
Update legendLink
This still had the 'k8s' in as it was copied and pasted from the
kubernetes-mixin.
Signed-off-by: beorn7 <beorn@grafana.com>
commit d758cf394cfbed9e87e116a24d72050066cd039a
Author: beorn7 <beorn@grafana.com>
Date: Wed Aug 14 22:24:24 2019 +0200
Make the severity of "critical" alerts configurable
This addresses the blissful scenario where single-node failures are
unproblematic. No reason to wake somebody up if a node is about to
screw itself up by filling the disk.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 041b9e1e785f5f43bbef97c0c76d205181d08890
Author: beorn7 <beorn@grafana.com>
Date: Thu Aug 15 16:43:57 2019 +0200
Add line for number of cores to load graph
Backported from the node dashboard in the kubernetes-mixin.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 5552bb3a6b2be1e3dd1a93dbdb9650bd0363a922
Author: beorn7 <beorn@grafana.com>
Date: Thu Aug 15 16:36:10 2019 +0200
Fix title of CPU panel to usage
We use the `mode="idle"` metric, but we are inverting it, so this is
usage, and that's intended.
Signed-off-by: beorn7 <beorn@grafana.com>
commit db0571b402233323ed7e222e53f7ef7738520f49
Author: beorn7 <beorn@grafana.com>
Date: Thu Aug 15 16:32:54 2019 +0200
node-mixin: Improve disk usage panel
- Use a stacked graph instead of a gauge as development over time is
especially useful for disk space usage.
- By only taking one metric per device into account, we avoid
double-counting for devices that are mounted multiple times.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 3822e096c5d27d06b9c9a68beff81ef23f12eb36
Author: Björn Rabenstein <beorn@grafana.com>
Date: Thu Aug 15 00:40:51 2019 +0200
node-mxin: Improve nodes dashboard (#1448)
* node-mixin: Improve nodes dashboard
- Use stacking where it makes sense.
- Normalize idle CPU so that stacking is more meaningful.
- Consistently fill where stacking is used but don't fill where not.
- Fix y axis max value for Idle CPU panel.
- Fix y axis min value for memory usage panel.
- Use `$__interval` for range where applicable (and set min step
to 1m).
- Make the right Y axis for disk I/O actually work.
This is just an incremental improvements. It doesn't touch the more
involved TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
commit fbced86b9835e1b196c15ddcac01ba3cfcf369cc
Author: beorn7 <beorn@grafana.com>
Date: Tue Aug 13 21:54:28 2019 +0200
node-mixin: Fix various straight-forward issues in the USE dashboards
- Normalize cluster memory utilisation.
- Fix missing `1m` in memory saturation.
- Have both disk-related row next to each other instead with the
network row in between.
- Correctly render transmit network traffic as negative, using
`seriesOverrides` and `min: null` for the y-axis.
- Make panel and row naming consistent.
- Remove legend where it would just display a single entry with
exactly the title of the panel.
- Fix metric name in individual node CPU Saturation panel.
- Break up disk space utilisation by device in the panel for an
individual node.
NB: All of that doesn't touch any more subtle issues captured in the
various TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 5bdf0625023cf7d05e0f65c6b6a1303637772ca6
Author: Sandro Jäckel <sandro.jaeckel@gmail.com>
Date: Wed Aug 7 09:19:20 2019 +0200
Update rootfs syntax in Docker example (#1443)
Signed-off-by: Sandro Jäckel <sandro.jaeckel@gmail.com>
commit b59f081d45a3ca65957900ec33772dca25a3066f
Author: Phil Frost <phil@postmates.com>
Date: Tue Aug 6 13:08:06 2019 -0400
Fix seconds reported by schedstat (#1426)
Upstream bugfix: https://github.com/prometheus/procfs/pull/191
Signed-off-by: Phil Frost <phil@postmates.com>
commit ac9a059ae81fa31f9963614483af3b5e3bfd672c
Author: Sven Haardiek <sven@haardiek.de>
Date: Sun Aug 4 20:15:36 2019 +0200
Try to make it work for PowerPC
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c81acf3b009e8538783489d1468f33faf65d8b01
Merge: c064116 75462bf
Author: Sven Haardiek <sven@haardiek.de>
Date: Sun Aug 4 20:14:16 2019 +0200
Merge remote-tracking branch 'upstream/master' into power_supply_class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c0641162c3a432f29df30c8d0632a7756d7d2bff
Merge: 06f6e3e 0b710bb
Author: Sven Haardiek <sven@haardiek.de>
Date: Fri Aug 2 18:30:28 2019 +0200
Merge branch 'master' into power_supply_class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 06f6e3e8b2a9b2e3f345b6d312a777731bb4b403
Author: Sven Haardiek <sven.haardiek@iotec-gmbh.de>
Date: Fri Mar 22 15:36:03 2019 +0100
Fix Pull Request comments
* concise metric conditions
* combine info about power supply to one metric
Signed-off-by: Sven Haardiek <sven.haardiek@iotec-gmbh.de>
commit 785c3735c4626de56f8341f800ab7bb5e2594d08
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:47:52 2019 +0100
Use sys.ttar instead of uploading the files
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit e07bff5d938457147b9009aef7d42d763018cd66
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:34:50 2019 +0100
Add information about from /sys/class/power_supply
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 55b3e34840c9dfc6513ae8e69b6479d5842a3091
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:09:45 2019 +0100
Use cyclecount instead of cycle_count since it is a gauge
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 602350b333cf9353d2cd0ffd40206c96ffe29941
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:09:25 2019 +0100
other build options
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 5aa38f678451d5b63ffdc32336345a1ff6703725
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:08:56 2019 +0100
Update fixtures
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c6acc474a4224b8d9f7b178d0d2e02636d8629ea
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 17:20:30 2019 +0100
Update command line parameter flag
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit f5a329e6ae5ed3b16aa866d67b944f1a73edfe42
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 17:20:06 2019 +0100
Update procfs dependency
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 38d5fa5165643d6a44dc863b3a1696774259ac0d
Merge: 5a7ce69 28f3582
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 16:28:29 2019 +0100
Merge branch 'power_supply_class' of github.com:shaardie/node_exporter into power_supply_class
commit 5a7ce69505079c9c090e44448cfbd7ffb2b04df7
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Oct 20 18:55:49 2018 +0200
Updated Metrics of Power Supply Class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 690ab1b9c1f2e183b7088cf81c7f266d85ee6df6
Author: Sven Haardiek <sven@haardiek.de>
Date: Fri Oct 19 20:03:42 2018 +0200
Start work on Power Supply Collector
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 28f358222bbac4315fbf44d94da36d4b0ff2ed55
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Oct 20 18:55:49 2018 +0200
Updated Metrics of Power Supply Class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 751d99b818503e9a4430b10c39760f180349b294
Author: Sven Haardiek <sven@haardiek.de>
Date: Fri Oct 19 20:03:42 2018 +0200
Start work on Power Supply Collector
Signed-off-by: Sven Haardiek <sven@haardiek.de>
Signed-off-by: Sven Haardiek <sven@haardiek.de>
Parsing the sysfs files for InfiniBand was added to the procfs library
(see https://github.com/prometheus/procfs/pull/164).
Therefore use `InfiniBandClass` from the procfs library instead of
parsing sysfs itself.
If the port counter return `N/A (no PMA)` no metric will be returned
(instead of returning 0 for this metric.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.
When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.
When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking. This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.
If a bond is not monitored via miimon or arp, the `mii_status` should
likely be always `up`, however I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`. Kernel
bond documentation stresses that a bond should not be configured without
one of `mii_mon` or `arp_interval` configured however.
This change results in the metric 'node_bonding_active' matching the
up/down state of the bond's point of view rather than operstate.
Signed-off-by: Sachi King <nakato@nakato.io>
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.
Signed-off-by: Ben Kochie <superq@gmail.com>
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available
This is related to #966, and handle this error,
Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
* add sys/class/net parsing from procfs and expose its metrics
Signed-off-by: Jan Klat <jenik@klatys.cz>
* change code to use int pointers per procfs change, move netclass to separate collector, change metric naming
Signed-off-by: Jan Klat <jenik@klatys.cz>
* bump year in licence, remove redundant newline, correct fixtures
Signed-off-by: Jan Klat <jenik@klatys.cz>
* fix style
Signed-off-by: Jan Klat <jenik@klatys.cz>
* change carrier changes to counter type
Signed-off-by: Jan Klat <jenik@klatys.cz>
* fix e2e output
Signed-off-by: Jan Klat <jenik@klatys.cz>
* add fixtures
Signed-off-by: Jan Klat <jenik@klatys.cz>
* update vendor, use fixtures correctly
Signed-off-by: Jan Klat <jenik@klatys.cz>
* change fixtures (device in /sys/class/net should be symlinked)
Signed-off-by: Jan Klat <jenik@klatys.cz>
* correct fixtures for 64k page, updated readme
Signed-off-by: Jan Klat <jenik@klatys.cz>
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total
This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them seperately.
Rename the node_package_throttles_total metric label `node` to `package`.
Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Use the direct /sys path to the cpu files.
Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.
The latter path also does not exist e.g. on RHEL 6.9's kernel.
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Reverse core+package throttle processing order
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Add documentation URLs
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* Only report core throttles per core, not per cpu
* Add topology/core_id to the cpu sysfs fixtures
* Add new cpu fixtures to ttar file
* Merge core_id reading and thermal throttle accounting
* Declare core_id
* cpu: Support processor-less (memory-only) NUMA nodes
Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).
IMDT RAM expansion supports two modes:
* "Unify Remote Memory domains": present a processor-less (memory-only)
NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
with a portion of the memory made available by Optane and IMDT
This commit fixes a crash in the first case (when "cpulist" is empty).
Here's an example of such a system:
$ numastat -m|head -n5
Per-node system memory usage (in MBs):
Node 0 Node 1 Node 2 Total
--------------- --------------- --------------- ---------------
MemTotal 118239.56 130816.00 464384.00 713439.56
$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:
$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
Boards: 3
1 x Proc. + I/O + Memory
2 x NVM devices (Intel SSDPED1K375GAQ)
Processors: 2, Cores: 16, Threads: 32
Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
1 x 249088MB [262036/ 678/12270]
1 x 232192MB [357707/125369/ 146] 82:00.0#1
1 x 232192MB [357707/125369/ 146] 83:00.0#1
* cpu: rename some variables (pkg => node)
* cpu: Use %v not %q in log.Debugf() format strings
* cpu: Metric 'package_throttles_total' is per package.
'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).
* cpu: Better handling of a cpulist edge-case.
* cpu: Extract the package number from the directory name.
Do not rely on the range index.
* cpu: Add package_throttle_count for node0 cpu1
This file must be ignored by the cpu collector.
* Add bcache collector for Linux
This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.
* Removed commented out code
* Use project comment style
* Add _sectors to metric name to indicate unit
* Really use project comment style
* Rename bcache.go to bcache_linux.go
* Keep collector namespace clean
Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric
* Shorten slice initialization
* Change label names to backing_device, cache_device
* Remove five minute metrics (keep only total)
* Include units in additional metric names
* Enable bcache collector by default
* Provide metrics in seconds, not nanoseconds
* remove metrics with label "all"
* Add fixtures, update end-to-end for bcache collector
* Move fixtures/sys into tar.gz
This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.
The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not the turn up in regular
file names).
* Add ttar: plain text archive, replacement for tar
This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).
Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.
The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.
The programm also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.