* Add collector for PCIe devices with link information
The link status of PCIe devices sometimes changes, for example through
link width or speed downgrades, and devices can disappear.
This patch collects PCIe devices' link information to detect such failures.
As a first step, this collector exports PCIe devices'
- Device information (vendor_id, device_id, etc.)
- Parent PCIe device (e.g. PCIe bridge, PCIe switch)
- Link status (max_link_{transfers_per_second|width}, current_link_{transfers_per_second|width})
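A minimal sketch of how the exported link values could be used to spot a downgrade. It assumes the link speed arrives as a string of the form `8.0 GT/s PCIe`; the helper names `parseLinkSpeed` and `linkDowngraded` are hypothetical and are not the collector's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLinkSpeed converts a link speed string such as "8.0 GT/s PCIe" into
// transfers per second. The string layout is an assumption about sysfs output.
func parseLinkSpeed(s string) (float64, error) {
	fields := strings.Fields(s)
	if len(fields) < 2 || fields[1] != "GT/s" {
		return 0, fmt.Errorf("unexpected link speed %q", s)
	}
	gt, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, fmt.Errorf("parsing link speed %q: %w", s, err)
	}
	return gt * 1e9, nil
}

// linkDowngraded reports whether the current link is slower or narrower than
// the maximum the device supports.
func linkDowngraded(maxSpeed, curSpeed float64, maxWidth, curWidth int) bool {
	return curSpeed < maxSpeed || curWidth < maxWidth
}

func main() {
	maxSpeed, err := parseLinkSpeed("16.0 GT/s PCIe")
	if err != nil {
		panic(err)
	}
	curSpeed, err := parseLinkSpeed("8.0 GT/s PCIe")
	if err != nil {
		panic(err)
	}
	// Same width, half the speed: this counts as a downgrade.
	fmt.Println(linkDowngraded(maxSpeed, curSpeed, 16, 16))
}
```

In a monitoring setup the same comparison would more likely be written as an alerting rule over the max and current metrics; the Go version only illustrates the logic.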
---------
Signed-off-by: Naoki MATSUMOTO <m.naoki9911@gmail.com>
* chore: ignore/include metrics for FreeBSD
Ignore non-deterministic metrics and include deterministic ones.
Install go123 for NetBSD from the upstream release channel rather than
the package manager, as the package does not exist there.
* https://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/x86_64/10.0_2024Q4/All/
---------
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
For integration tests.
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: support non-linux GOOS in e2e tests
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: support e2e tests on freebsd
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: support e2e tests on openbsd
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: support e2e tests on netbsd
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: support e2e tests on solaris
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: support e2e tests on dragonfly
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
chore: drop support for e2e tests on solaris
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
* Optionally fetch ARP stats via rtnetlink instead of procfs
Implement collection of ARP stats via rtnetlink to work around
shortcomings in the output of /proc/net/arp, which truncates InfiniBand
link-layer addresses.
Fixes: #2776
---------
Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
This collects per-user usage information on the Linux kernel keyring.
See: https://man7.org/linux/man-pages/man7/keyrings.7.html
It is disabled by default because cardinality is one per user.
Signed-off-by: Robert Zimmerman <rzimmerman@cloudflare.com>
Pass the correct include value to the device filter function.
* Add new bogus hwmon fixture.
* Update end-to-end test to use hwmon chip include flag.
Signed-off-by: Ben Kochie <superq@gmail.com>
Since netdev metrics are now read from netlink instead of `/proc/net/dev`, we
can't easily spoof them for the end-to-end tests by reading a fixture file in
place of `/proc/net/dev`.
Therefore, we only get metrics for `lo` and ignore those that would return
unpredictable values (i.e. the byte and packet counters).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Set the `--path.udev.data` flag to point to the udev fixture, and update the
output fixture with
```console
$ ./end-to-end-test.sh -u
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Fix up handling of CPU info collector on non-x86_64 systems due to
fixtures containing `/proc/cpuinfo` from x86_64.
* Update e2e 64k page test fixture from an arm64 system.
* Enable ARM testing in CircleCI.
Fixes: https://github.com/prometheus/node_exporter/issues/1959
Signed-off-by: Ben Kochie <superq@gmail.com>
* Correctly name collector file.
* Fix cgroup summary type as gauge.
* Use a boolean metric rather than a label for enabled.
Signed-off-by: Ben Kochie <superq@gmail.com>
Allow filtering ARP entries based on device. Useful for ignoring
entries for network namespaces (containers).
Signed-off-by: Ben Kochie <superq@gmail.com>
This adds a new Linux metric, node_softirqs_total, which corresponds
to the 'softirq' line in /proc/stat. This metric is disabled by
default and it can be enabled with '--collector.stat.softirq'.
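As a rough illustration of what reading the 'softirq' line involves, the sketch below parses it from `/proc/stat` content. The first column is the total across all softirq types and the following columns are per-type counts. The function name `parseSoftIRQ` and the lowercase type names are assumptions made for this example, not the collector's actual code or label values.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// softirqTypes lists the per-type columns in kernel order. The lowercase
// names are illustrative label values, not taken from the commit.
var softirqTypes = []string{
	"hi", "timer", "net_tx", "net_rx", "block",
	"irq_poll", "tasklet", "sched", "hrtimer", "rcu",
}

// parseSoftIRQ finds the "softirq" line in /proc/stat content and returns the
// grand total plus the per-type counts.
func parseSoftIRQ(stat string) (uint64, map[string]uint64, error) {
	scanner := bufio.NewScanner(strings.NewReader(stat))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] != "softirq" {
			continue
		}
		total, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, nil, fmt.Errorf("parsing softirq total: %w", err)
		}
		counts := make(map[string]uint64)
		for i, raw := range fields[2:] {
			if i >= len(softirqTypes) {
				break // ignore columns this sketch has no name for
			}
			n, err := strconv.ParseUint(raw, 10, 64)
			if err != nil {
				return 0, nil, fmt.Errorf("parsing softirq %s: %w", softirqTypes[i], err)
			}
			counts[softirqTypes[i]] = n
		}
		return total, counts, nil
	}
	if err := scanner.Err(); err != nil {
		return 0, nil, err
	}
	return 0, nil, fmt.Errorf("no softirq line found")
}

func main() {
	stat := "cpu  1 2 3 4\nsoftirq 21 1 2 3 4 5 0 6 0 0 0\n"
	total, counts, err := parseSoftIRQ(stat)
	if err != nil {
		panic(err)
	}
	fmt.Println(total, counts["net_rx"])
}
```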
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Currently Node Exporter has a metric called `node_uname_info` which of
course exposes uname info. While this is nice, it does not help if you
are running different OSes which could have similar uname info.
Therefore parse `/etc/os-release` or `/usr/lib/os-release` and expose a
`node_os_info` metric which provides information regarding the OS
release/version of the node. Also expose the major.minor part of the OS
release version as `node_os_version`.
Since the os-release files will not change often, cache the parsed
content and only refresh the cache if the modification time changes.
This `os` collector will read files outside of `/proc` and `/sys`, but
the os-release file is widely used and the format is standardized:
https://www.freedesktop.org/software/systemd/man/os-release.html
Bug: https://github.com/prometheus/node_exporter/issues/1574
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Some devices (e.g. virtual ones) don't have a speed and report `-1` as the
speed value. Add a flag to allow ignoring speed on these devices.
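A small sketch of the behaviour this describes, assuming the speed is read from sysfs in Mbit/s and exported in bytes per second. The function `speedBytes` and its `ignoreInvalid` parameter are hypothetical stand-ins for the flag; the actual flag name is not shown here.

```go
package main

import "fmt"

// speedBytes converts a link speed in Mbit/s to bytes per second.
// It returns ok=false when the value is negative and ignoreInvalid is set,
// so the caller can skip exporting the metric for that device.
func speedBytes(speedMbit int64, ignoreInvalid bool) (float64, bool) {
	if speedMbit < 0 && ignoreInvalid {
		return 0, false
	}
	return float64(speedMbit) * 1000 * 1000 / 8, true
}

func main() {
	for _, s := range []int64{1000, -1} {
		if v, ok := speedBytes(s, true); ok {
			fmt.Printf("speed %d Mbit/s -> %.0f bytes/s\n", s, v)
		} else {
			fmt.Printf("speed %d skipped\n", s)
		}
	}
}
```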
Fixes: https://github.com/prometheus/node_exporter/issues/1967
Signed-off-by: Ben Kochie <superq@gmail.com>
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.
Signed-off-by: domgoer <domdoumc@gmail.com>
* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes.
For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish, this change gives us the option to check for overloaded UDP and TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:
I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
I chose the name 'udp_queue' instead of 'udpstat' as UDP has no state.
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
This exposes RAPL statistics from /sys/class/powercap.
Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
This enables the collection of pressure stall information as exposed
by the `/proc/pressure` interface added in the 4.20 release of the
Linux kernel.
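For illustration, the files under `/proc/pressure` contain lines such as `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`, where `total` is a stall time in microseconds. The sketch below parses that format; the names `psiLine`, `parsePSI` and `stallSeconds` are hypothetical and this is not the collector's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine holds one "some" or "full" line of a pressure file.
type psiLine struct {
	Avg10, Avg60, Avg300 float64
	Total                uint64 // cumulative stall time in microseconds
}

// parsePSI parses pressure file content into a map keyed by "some" or "full".
func parsePSI(data string) (map[string]psiLine, error) {
	result := make(map[string]psiLine)
	for _, line := range strings.Split(strings.TrimSpace(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		var p psiLine
		for _, kv := range fields[1:] {
			key, value, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				p.Avg10, err = strconv.ParseFloat(value, 64)
			case "avg60":
				p.Avg60, err = strconv.ParseFloat(value, 64)
			case "avg300":
				p.Avg300, err = strconv.ParseFloat(value, 64)
			case "total":
				p.Total, err = strconv.ParseUint(value, 10, 64)
			}
			if err != nil {
				return nil, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
		result[fields[0]] = p
	}
	return result, nil
}

// stallSeconds converts the microsecond total into seconds.
func stallSeconds(p psiLine) float64 {
	return float64(p.Total) / 1e6
}

func main() {
	data := "some avg10=1.50 avg60=0.25 avg300=0.00 total=2500000\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
	psi, err := parsePSI(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(psi["some"].Avg10, stallSeconds(psi["some"]))
}
```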
Closes #1174
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.
Fixes #1241
Signed-off-by: Paul Gier <pgier@redhat.com>
* add sys/class/net parsing from procfs and expose its metrics
Signed-off-by: Jan Klat <jenik@klatys.cz>
* change code to use int pointers per procfs change, move netclass to separate collector, change metric naming
Signed-off-by: Jan Klat <jenik@klatys.cz>
* bump year in licence, remove redundant newline, correct fixtures
Signed-off-by: Jan Klat <jenik@klatys.cz>
* fix style
Signed-off-by: Jan Klat <jenik@klatys.cz>
* change carrier changes to counter type
Signed-off-by: Jan Klat <jenik@klatys.cz>
* fix e2e output
Signed-off-by: Jan Klat <jenik@klatys.cz>
* add fixtures
Signed-off-by: Jan Klat <jenik@klatys.cz>
* update vendor, use fixtures correctly
Signed-off-by: Jan Klat <jenik@klatys.cz>
* change fixtures (device in /sys/class/net should be symlinked)
Signed-off-by: Jan Klat <jenik@klatys.cz>
* correct fixtures for 64k page, updated readme
Signed-off-by: Jan Klat <jenik@klatys.cz>
* Do not rely on AArch64 CPUs to support 32-bit ARM for cross-testing.
Signed-off-by: Alexey Kopytov <akopytov@gmail.com>
* aarch64, like ppc64le, reports 64k node_sockstat_TCP_mem_bytes due to 64k pages.
Signed-off-by: Alexey Kopytov <akopytov@gmail.com>
Vmstat has over 100 fields, most of which are highly
detailed debug information. Trim this down to only
essential fields by default, configurable by flag.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>