The CPU vulnerabilities collector was added in https://github.com/prometheus/node_exporter/pull/2721 , but it does not currently include information about the mitigation strategy used for a given vulnerability.
This information can be quite valuable, as different mitigation strategies often come with a different performance impact.
This commit adds a third label to the cpu_vulnerabilities_info metric to expose the "mitigation" used for a given vulnerability. If a vulnerability does not affect a node, or the node is still vulnerable, the mitigation is expected to be empty.
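As a rough illustration of the label split described above, here is a minimal Python sketch (not the collector's Go code). It assumes the kernel reports status strings such as `Mitigation: PTI`, `Vulnerable` or `Not affected`; the function name and the state values are hypothetical.

```python
def parse_vulnerability(raw):
    """Split a kernel vulnerability status string into (state, mitigation).

    The state names returned here are illustrative assumptions, not the
    label values used by the real collector.
    """
    text = raw.strip()
    if text.startswith("Mitigation:"):
        # Everything after the prefix describes the mitigation in use.
        return "mitigation", text[len("Mitigation:"):].strip()
    if text.startswith("Vulnerable"):
        # Still vulnerable: the mitigation label stays empty.
        return "vulnerable", ""
    if text == "Not affected":
        # Not affected: the mitigation label stays empty.
        return "not affected", ""
    return "unknown", ""
```

Only the "mitigation" state yields a non-empty third label, which matches the behaviour described in this commit.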
Signed-off-by: João Lima <jlima@cloudflare.com>
Adds a count for TCP packets received out of order. This can be an
indication that there is packet loss on the path packets travel towards
this server. In that case, the sender will retransmit (and we can
already monitor Tcp_RetransSegs there), but we have no way to
monitor the packet loss on the receiver side. When a packet is received
and the receiver detects that a previous one is missing, it increases the
TCPOFOQueue counter and replies with a selective ACK to the sender, both
possible indications of packet loss. Confirmation of packet loss can be
achieved by taking packet captures, ignoring Wireshark analysis, and
carefully looking at the data being retransmitted based on the TCP seq.
Just like RetransSegs, TCPOFOQueue should be interesting for any
deployment as a means to detect packet loss, so this suggests adding it
to the default list.
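For readers unfamiliar with where this counter lives, the following Python sketch shows how a TCPOFOQueue value could be read from text in the `/proc/net/netstat` layout (a header line of names followed by a line of values). The sample text and the `parse_netstat` helper are made up for illustration and are not the exporter's implementation.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"Proto_Name": value}."""
    stats = {}
    lines = text.strip().splitlines()
    # Lines come in pairs: a header with field names, then the values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            stats[f"{proto}_{name}"] = int(number)
    return stats


# Invented sample in the same layout as /proc/net/netstat.
SAMPLE = """TcpExt: SyncookiesSent TCPOFOQueue TCPSACKReneging
TcpExt: 0 42 0
IpExt: InNoRoutes InOctets
IpExt: 0 6286396970
"""

ofo_queue = parse_netstat(SAMPLE)["TcpExt_TCPOFOQueue"]
```

A steadily increasing value on the receiver would then be a hint to look for loss upstream.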
Signed-off-by: François Rigault <frigo@amadeus.com>
Co-authored-by: François Rigault <frigo@amadeus.com>
This attribute was introduced in v6.6-rc1.
The relevant changes in procfs were merged here:
https://github.com/prometheus/procfs/pull/574
and are part of procfs v0.11.2
I have also figured out that the stat should be part of the v4 ops
counters struct, but that will need changes to both procfs and this
code. Since people are already using 6.6-rc1, I think it's better to get
the code out there --- even if they don't care about wdeleg_getattr,
currently they get _no_ nfsd stats with 6.6-rc1.
I will make two follow-up PRs to clean this up in the next releases of
procfs and node-exporter.
Signed-off-by: Tobias Klausmann <klausman@schwarzvogel.de>
* bcache: remove cache_readaheads_totals metrics #2103
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Append bcacheReadaheadMetrics when CacheReadaheads value exists
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Update test cases for cachereadahead greater than zero
Signed-off-by: Saleh Sal <0xack13@gmail.com>
---------
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Refactor netclass_rtnl collector
Merge the netclass_rtnl collector into the netclass collector.
* Disabled by default
* Followup to #2492
Signed-off-by: Ben Kochie <superq@gmail.com>
We don't need to fully sanitize the hwmon label values to metric/label
name strings.
* Just make sure they're valid UTF-8.
* Always include the label metric to avoid group_left failures.
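
A minimal Python sketch of the idea, assuming the raw label is read as bytes from a hwmon `*_label` file. The helper name `label_value` is hypothetical, and the real collector is written in Go.

```python
def label_value(raw):
    """Return a label value that is always present and always valid UTF-8.

    `raw` is the bytes content of a sensor label file, or None when the
    sensor has no label file (an assumption made for this sketch).
    """
    if raw is None:
        # Still return a value so the label metric is always emitted and
        # group_left joins against it do not fail.
        return ""
    # Invalid byte sequences become U+FFFD instead of being rewritten
    # into metric-name-safe characters.
    return raw.decode("utf-8", errors="replace")
```

Spaces and punctuation in the label survive unchanged; only bytes that are not valid UTF-8 are replaced.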
Signed-off-by: Ben Kochie <superq@gmail.com>
Since netdev metrics are now read from netlink instead of `/proc/net/dev`, we
can't easily spoof them for the end-to-end tests by reading a fixture file in
place of `/proc/net/dev`.
Therefore, we only get metrics for `lo` and ignore those that would return
unpredictable values (i.e. the byte and packet counters).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Set the `--path.udev.data` flag to point to the udev fixture, and update the
output fixture with
```console
$ ./end-to-end-test.sh -u
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Fix up handling of CPU info collector on non-x86_64 systems due to
fixtures containing `/proc/cpuinfo` from x86_64.
* Update e2e 64k page test fixture from an arm64 system.
* Enable ARM testing in CircleCI.
Fixes: https://github.com/prometheus/node_exporter/issues/1959
Signed-off-by: Ben Kochie <superq@gmail.com>
Add a DMI collector to expose the Desktop Management Interface (DMI)
info from `/sys/class/dmi/id/`. This will expose information about the
BIOS, mainboard, chassis, and product.
Closes: https://github.com/prometheus/node_exporter/issues/303
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Currently Node Exporter has a metric called `node_uname_info` which of
course exposes uname info. While this is nice, it does not help if you
are running different OSes which could have similar uname info.
Therefore parse `/etc/os-release` or `/usr/lib/os-release` and expose a
`node_os_info` metric which provides information regarding the OS
release/version of the node. Also expose the major.minor part of the OS
release version as `node_os_version`.
Since the os-release files will not change often, cache the parsed
content and only refresh the cache if the modification time changes.
This `os` collector will read files outside of `/proc` and `/sys`, but
the os-release file is widely used and the format is standardized:
https://www.freedesktop.org/software/systemd/man/os-release.html
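
The parsing and caching described above could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `parse_os_release`, `major_minor` and `OSReleaseCache` are hypothetical names, shell-style quoting is handled with `shlex`, and the collector itself is implemented in Go.

```python
import os
import shlex


def parse_os_release(text):
    """Parse KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips surrounding quotes
        fields[key] = parts[0] if parts else ""
    return fields


def major_minor(version_id):
    """Return the major.minor part of a version as a float, or None."""
    try:
        return float(".".join(version_id.split(".")[:2]))
    except ValueError:
        return None


class OSReleaseCache:
    """Re-parse the file only when its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._fields = {}

    def get(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path, encoding="utf-8") as f:
                self._fields = parse_os_release(f.read())
            self._mtime = mtime
        return self._fields
```

For example, `OSReleaseCache("/etc/os-release").get()` would re-read the file only after its modification time changes.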
Bug: https://github.com/prometheus/node_exporter/issues/1574
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Add a collector for NVMes to expose the firmware versions. This requires
procfs >= 0.7.0.
Fixes #1891
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.
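
To illustrate the regexp filter, here is a small Python sketch that keeps only the flags matching a user-supplied pattern. The function name is hypothetical, and anchoring the pattern so that it must match the whole flag is an assumption of this sketch rather than a statement about the collector.

```python
import re


def filter_flags(flags_line, pattern):
    """Return the flags from a space-separated list that match pattern."""
    # Assumption: the pattern must match the whole flag name.
    regex = re.compile(f"^(?:{pattern})$")
    return [flag for flag in flags_line.split() if regex.match(flag)]


selected = filter_flags("fpu vme aes avx avx2 sse4_2", "aes|avx.*")
```

Each selected flag would then become one info series, which keeps cardinality under the operator's control.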
Signed-off-by: domgoer <domdoumc@gmail.com>
TCP "OutRsts" is the number of TCP Resets sent by the node. This can be
useful for monitoring connection failures and flooding.
Signed-off-by: Ben Kochie <superq@gmail.com>
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human-readable model name. The
stepping of the processor might also be interesting, since different
steppings of a processor might behave differently.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Let the node exporter collect the non-numeric data from
/sys/class/infiniband: board ID, firmware version, and HCA type.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
This exposes RAPL statistics from /sys/class/powercap.
Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Collect the InfiniBand port state, the physical state, and the maximum
signal transfer rate.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
commit 5ef96388a978c54173e1b1ec8e7bcb41fc7d130d
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 18 20:45:23 2019 +0200
block variables
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c1177382e241994618a8ab7dd9842027d597b0df
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 18 20:38:33 2019 +0200
Use SI Units
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 04e4f99c423872d3094f21f89a8235b233a01941
Merge: 5417c98 f3538e1
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 18 19:20:17 2019 +0200
Merge branch 'master' into power_supply_class
commit 5417c9820a40b37b490caedeaa3526883380b9bf
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 4 23:02:39 2019 +0200
Drop averages
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 1f1447dbe7bbdcdabebf4c968beb14c67d89dd9f
Author: Sven Haardiek <sven@haardiek.de>
Date: Wed Sep 4 22:56:00 2019 +0200
Update Copyright
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 9677425059a3bf61cd7498cf7b5f05d5af7a626b
Merge: 0b51589 d3478a2
Author: Sven Haardiek <sven@haardiek.de>
Date: Mon Sep 2 22:02:53 2019 +0200
Merge branch 'master' into power_supply_class
commit 0b51589f390cc1b33ea4728d85fca3a3b231cf3f
Author: PrometheusBot <prometheus-team@googlegroups.com>
Date: Fri Aug 30 13:32:17 2019 +0200
makefile: update Makefile.common with newer version (#1466)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
commit af2b9e849c7b69237b7fa0e9a289c929ec7173a0
Author: Boris Momčilović <boris.momcilovic@gmail.com>
Date: Tue Aug 27 14:24:11 2019 +0200
Ipvs firewall mark (#1455)
* IPVS: include firewall mark label
Signed-off-by: Boris Momčilović <boris@firstbeatmedia.com>
commit 773f99de7f699900a00b4d35340e356fe7098ee7
Author: Paul Gier <pgier@redhat.com>
Date: Tue Aug 27 02:26:19 2019 -0500
update procfs to v0.0.4 (#1457)
Signed-off-by: Paul Gier <pgier@redhat.com>
commit 6f8a4f4348f62700cbf7eeb2657851237e13c35d
Author: beorn7 <beorn@grafana.com>
Date: Tue Aug 20 18:49:12 2019 +0200
Update legendLink
This still had the 'k8s' in as it was copied and pasted from the
kubernetes-mixin.
Signed-off-by: beorn7 <beorn@grafana.com>
commit d758cf394cfbed9e87e116a24d72050066cd039a
Author: beorn7 <beorn@grafana.com>
Date: Wed Aug 14 22:24:24 2019 +0200
Make the severity of "critical" alerts configurable
This addresses the blissful scenario where single-node failures are
unproblematic. No reason to wake somebody up if a node is about to
screw itself up by filling the disk.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 041b9e1e785f5f43bbef97c0c76d205181d08890
Author: beorn7 <beorn@grafana.com>
Date: Thu Aug 15 16:43:57 2019 +0200
Add line for number of cores to load graph
Backported from the node dashboard in the kubernetes-mixin.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 5552bb3a6b2be1e3dd1a93dbdb9650bd0363a922
Author: beorn7 <beorn@grafana.com>
Date: Thu Aug 15 16:36:10 2019 +0200
Fix title of CPU panel to usage
We use the `mode="idle"` metric, but we are inverting it, so this is
usage, and that's intended.
Signed-off-by: beorn7 <beorn@grafana.com>
commit db0571b402233323ed7e222e53f7ef7738520f49
Author: beorn7 <beorn@grafana.com>
Date: Thu Aug 15 16:32:54 2019 +0200
node-mixin: Improve disk usage panel
- Use a stacked graph instead of a gauge as development over time is
especially useful for disk space usage.
- By only taking one metric per device into account, we avoid
double-counting for devices that are mounted multiple times.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 3822e096c5d27d06b9c9a68beff81ef23f12eb36
Author: Björn Rabenstein <beorn@grafana.com>
Date: Thu Aug 15 00:40:51 2019 +0200
node-mixin: Improve nodes dashboard (#1448)
* node-mixin: Improve nodes dashboard
- Use stacking where it makes sense.
- Normalize idle CPU so that stacking is more meaningful.
- Consistently fill where stacking is used but don't fill where not.
- Fix y axis max value for Idle CPU panel.
- Fix y axis min value for memory usage panel.
- Use `$__interval` for range where applicable (and set min step
to 1m).
- Make the right Y axis for disk I/O actually work.
These are just incremental improvements. They don't touch the more
involved TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
commit fbced86b9835e1b196c15ddcac01ba3cfcf369cc
Author: beorn7 <beorn@grafana.com>
Date: Tue Aug 13 21:54:28 2019 +0200
node-mixin: Fix various straight-forward issues in the USE dashboards
- Normalize cluster memory utilisation.
- Fix missing `1m` in memory saturation.
- Have both disk-related rows next to each other instead of with the
network row in between.
- Correctly render transmit network traffic as negative, using
`seriesOverrides` and `min: null` for the y-axis.
- Make panel and row naming consistent.
- Remove legend where it would just display a single entry with
exactly the title of the panel.
- Fix metric name in individual node CPU Saturation panel.
- Break up disk space utilisation by device in the panel for an
individual node.
NB: All of that doesn't touch any more subtle issues captured in the
various TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
commit 5bdf0625023cf7d05e0f65c6b6a1303637772ca6
Author: Sandro Jäckel <sandro.jaeckel@gmail.com>
Date: Wed Aug 7 09:19:20 2019 +0200
Update rootfs syntax in Docker example (#1443)
Signed-off-by: Sandro Jäckel <sandro.jaeckel@gmail.com>
commit b59f081d45a3ca65957900ec33772dca25a3066f
Author: Phil Frost <phil@postmates.com>
Date: Tue Aug 6 13:08:06 2019 -0400
Fix seconds reported by schedstat (#1426)
Upstream bugfix: https://github.com/prometheus/procfs/pull/191
Signed-off-by: Phil Frost <phil@postmates.com>
commit ac9a059ae81fa31f9963614483af3b5e3bfd672c
Author: Sven Haardiek <sven@haardiek.de>
Date: Sun Aug 4 20:15:36 2019 +0200
Try to make it work for PowerPC
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c81acf3b009e8538783489d1468f33faf65d8b01
Merge: c064116 75462bf
Author: Sven Haardiek <sven@haardiek.de>
Date: Sun Aug 4 20:14:16 2019 +0200
Merge remote-tracking branch 'upstream/master' into power_supply_class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c0641162c3a432f29df30c8d0632a7756d7d2bff
Merge: 06f6e3e 0b710bb
Author: Sven Haardiek <sven@haardiek.de>
Date: Fri Aug 2 18:30:28 2019 +0200
Merge branch 'master' into power_supply_class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 06f6e3e8b2a9b2e3f345b6d312a777731bb4b403
Author: Sven Haardiek <sven.haardiek@iotec-gmbh.de>
Date: Fri Mar 22 15:36:03 2019 +0100
Fix Pull Request comments
* concise metric conditions
* combine info about power supply to one metric
Signed-off-by: Sven Haardiek <sven.haardiek@iotec-gmbh.de>
commit 785c3735c4626de56f8341f800ab7bb5e2594d08
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:47:52 2019 +0100
Use sys.ttar instead of uploading the files
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit e07bff5d938457147b9009aef7d42d763018cd66
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:34:50 2019 +0100
Add information from /sys/class/power_supply
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 55b3e34840c9dfc6513ae8e69b6479d5842a3091
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:09:45 2019 +0100
Use cyclecount instead of cycle_count since it is a gauge
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 602350b333cf9353d2cd0ffd40206c96ffe29941
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:09:25 2019 +0100
other build options
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 5aa38f678451d5b63ffdc32336345a1ff6703725
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 18:08:56 2019 +0100
Update fixtures
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit c6acc474a4224b8d9f7b178d0d2e02636d8629ea
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 17:20:30 2019 +0100
Update command line parameter flag
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit f5a329e6ae5ed3b16aa866d67b944f1a73edfe42
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 17:20:06 2019 +0100
Update procfs dependency
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 38d5fa5165643d6a44dc863b3a1696774259ac0d
Merge: 5a7ce69 28f3582
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Mar 9 16:28:29 2019 +0100
Merge branch 'power_supply_class' of github.com:shaardie/node_exporter into power_supply_class
commit 5a7ce69505079c9c090e44448cfbd7ffb2b04df7
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Oct 20 18:55:49 2018 +0200
Updated Metrics of Power Supply Class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 690ab1b9c1f2e183b7088cf81c7f266d85ee6df6
Author: Sven Haardiek <sven@haardiek.de>
Date: Fri Oct 19 20:03:42 2018 +0200
Start work on Power Supply Collector
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 28f358222bbac4315fbf44d94da36d4b0ff2ed55
Author: Sven Haardiek <sven@haardiek.de>
Date: Sat Oct 20 18:55:49 2018 +0200
Updated Metrics of Power Supply Class
Signed-off-by: Sven Haardiek <sven@haardiek.de>
commit 751d99b818503e9a4430b10c39760f180349b294
Author: Sven Haardiek <sven@haardiek.de>
Date: Fri Oct 19 20:03:42 2018 +0200
Start work on Power Supply Collector
Signed-off-by: Sven Haardiek <sven@haardiek.de>
Signed-off-by: Sven Haardiek <sven@haardiek.de>
Parsing the sysfs files for InfiniBand was added to the procfs library
(see https://github.com/prometheus/procfs/pull/164).
Therefore use `InfiniBandClass` from the procfs library instead of
parsing sysfs itself.
If a port counter returns `N/A (no PMA)`, no metric will be returned
for it (instead of returning 0 for this metric).
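
The "no metric instead of 0" behaviour can be sketched in Python as follows. This is not the procfs code; `parse_counter` and `collect` are hypothetical helpers and the counter names in the usage are only examples.

```python
def parse_counter(raw):
    """Return the counter as an int, or None when it is not available."""
    text = raw.strip()
    if text.startswith("N/A"):
        # e.g. "N/A (no PMA)": report nothing rather than a misleading 0.
        return None
    return int(text.split()[0])


def collect(counters):
    """Keep only counters that have a real value."""
    metrics = {}
    for name, raw in counters.items():
        value = parse_counter(raw)
        if value is not None:
            metrics[name] = value
    return metrics


metrics = collect({"port_rcv_data": "1234\n", "symbol_error": "N/A (no PMA)\n"})
```

Dropping the series keeps a missing counter distinguishable from a genuine zero.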
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>