Sanitizing the metric names can lead to duplicate metric names:
```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```
Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.
Fixes: https://github.com/prometheus/node_exporter/issues/2185
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Use SysctlTimeval from the golang.org/x/sys/unix package to
simplify the implementation of the boottime collector for the BSDs and
allows to build it without cgo.
Tested on macOS 11.6, FreeBSD 13 and OpenBSD 7.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Add a DMI collector to expose the Desktop Management Interface (DMI)
info from `/sys/class/dmi/id/`. This will expose information about the
BIOS, mainboard, chassis, and product.
Closes: https://github.com/prometheus/node_exporter/issues/303
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling.
Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
The ethtool_cmd struct from the linux kernel contains information about the speeds and features supported by a
network device. This includes speeds and duplex but also features like autonegotiate and 802.3x pause frames.
Closes#1444
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
* collector: Unwrap glob textfile directories
* collector: Store full path in mtime's file label
The point is to avoid duplicated gauges from files with the same name in
different directories.
This introduces support for exporting from multiple directories matching
given pattern (e.g. `/home/*/metrics/`).
Signed-off-by: Kiril Vladimirov <kiril@vladimiroff.org>
Expose GPU metrics using `sysfs/drm`.
`amdgpu` is the only driver which exposes this information through DRM.
Signed-off-by: Siavash Safi <siavash.safi@gmail.com>
Use the same flag pattern as netdev to make filtering methods the same.
* Move SanitizeMetricName to helper.go
Signed-off-by: Ben Kochie <superq@gmail.com>
* Refactor diskstats_linux to use procfs.
* Add `node_disk_info` metric.
Signed-off-by: W. Andrew Denton <git@flying-snail.net>
Co-authored-by: W. Andrew Denton <git@flying-snail.net>
Currently Node Exporter has a metric called `node_uname_info` which of
course exposes uname info. While this is nice, it does not help if you
are running different OSes which could have similar uname info.
Therefore parse `/etc/os-release` or `/usr/lib/os-release` and expose a
`node_os_info` metric which provide information regarding the OS
release/version of the node. Also expose the major.minor part of the OS
release version as `node_os_version`.
Since the os-release files will not change often, cache the parsed
content and only refresh the cache if the modification time changes.
This `os` collector will read files outside of `/proc` and `/sys`, but
the os-release file is widely used and the format is standardized:
https://www.freedesktop.org/software/systemd/man/os-release.html
Bug: https://github.com/prometheus/node_exporter/issues/1574
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
In high scale virtualized / cloud environments there are typically
no guest VMs. Add a boolean flag to allow disabling the Linux guest
CPU metrics.
Signed-off-by: Ben Kochie <superq@gmail.com>
Add a `node_ethtool_info` metric to all ethtool devices to expose driver
information with following labels:
* bus_info
* driver
* expansion_rom_version
* firmware_version
* version
This metric is useful to monitor the firmware version to be up-to-date.
Note: The version label might be malformed due to bug #39 in ethtool:
https://github.com/safchain/ethtool/issues/39
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
OpenMetrics and the Prometheus exposition format require the metric name
to consist only of alphanumericals and "_", ":" and they must not start
with digits. The metric names from the ethtool stats might contain
spaces, brackets, and dots. Converting them directly to metric names
will produce invalid metric names.
Therefore sanitize the metric names and convert them to lower case.
Fixes: https://github.com/prometheus/node_exporter/issues/2083
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
This adds a new flag --collector.ethtool.metrics-include to the ethtool
collector. Only metrics matching this regexp will be collected.
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
Other network related collectors allow to filter out unwanted devices.
Add this support to the new ethtool collector as well.
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Update procfs library to include ignored fields ParseInt handling.
Wrap error returns so that the user can know more about what failed.
Returns from getAllocatedThreads() are errors anyway.
Fixes: https://github.com/prometheus/node_exporter/issues/2110
Signed-off-by: Ben Kochie <superq@gmail.com>
The Linux CPU idle stat can also jump backwards slightly in some cases.
Allow the jump back up to 3 seconds before we attempt to reset the CPU
counter cache.
Fixes: https://github.com/prometheus/node_exporter/issues/1903
Signed-off-by: Ben Kochie <superq@gmail.com>
Add a collector for NVMes to expose the firmware versions. This requires
procfs >= 0.7.0.
Fixes#1891
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
We should filter excluded interfaces before parsing the interface
details.
This change is based on https://github.com/prometheus/procfs/pull/376
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
When /metrics is called for specific collectors, the collectors are
initialed every time. Which means that we spend a lot of time
re-initiating the same collectors again and again. Especially, some
collectors make the assumptions that that are initiated once - e.g.
systemd collector generates info message upon initiation.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Fix the error logging of the promhttp handler by connecting it to the
promlog setup.
* Switch to go-kit/log.
* Cleanup CHANGELOG.
Fixes: https://github.com/prometheus/node_exporter/issues/1886
Signed-off-by: Ben Kochie <superq@gmail.com>
Add darwin/arm64 to the CGO crossbuilder list.
* Update Makefile.common to pick up new promu.
* Fix possible nil pointer caught by staticcheck.
* Update collector build tags.
https://github.com/prometheus/node_exporter/issues/1997
Signed-off-by: Ben Kochie <superq@gmail.com>
Some devices (ex virtual) don't have a speed and report `-1` as the
speed value. Add a flag to allow ignoring speed on these devices.
Fixes: https://github.com/prometheus/node_exporter/issues/1967
Signed-off-by: Ben Kochie <superq@gmail.com>
Avoid panic on invalid UTF-8 from /sys/class/power_supply by
sanitizing strings parsed from the kernel.
* Add a broken string to the test fixtures.
Fixes: https://github.com/prometheus/node_exporter/issues/1979
Signed-off-by: Ben Kochie <superq@gmail.com>
When CONFIG_PSI_DEFAULT_DISABLED=y, the pressure system returns
"operation not supported", rather than permission denied or not
exposing the /proc/pressure files.
Fixes: https://github.com/prometheus/node_exporter/issues/1961
Signed-off-by: Ben Kochie <superq@gmail.com>
* Update Build
- Update CircleCI orb.
- Update CIrcleCI Machine image.
- Use golang-builder 1.15.
* Update Go modules.
* Fixup fixtures for XFS bug.
NOTE: We have improved some of the flag naming conventions (PR #1743). The old names are
deprecated and will be removed in 2.0. They will continue to work for backwards
compatibility.
* [CHANGE] Improve filter flag names #1743
* [CHANGE] Add btrfs and powersupplyclass to list of exporters enabled by default #1897
* [FEATURE] Add fibre channel collector #1786
* [FEATURE] Expose cpu bugs and flags as info metrics. #1788
* [FEATURE] Add network_route collector #1811
* [FEATURE] Add zoneinfo collector #1922
* [ENHANCEMENT] Add more InfiniBand counters #1694
* [ENHANCEMENT] Add flag to aggr ipvs metrics to avoid high cardinality metrics #1709
* [ENHANCEMENT] Adding backlog/current queue length to qdisc collector #1732
* [ENHANCEMENT] Include TCP OutRsts in netstat metrics #1733
* [ENHANCEMENT] Add pool size to entropy collector #1753
* [ENHANCEMENT] Remove CGO dependencies for OpenBSD amd64 #1774
* [ENHANCEMENT] bcache: add writeback_rate_debug stats #1658
* [ENHANCEMENT] Add check state for mdadm arrays via node_md_state metric #1810
* [ENHANCEMENT] Expose XFS inode statistics #1870
* [ENHANCEMENT] Expose zfs zpool state #1878
* [ENHANCEMENT] Added an ability to pass collector.supervisord.url via SUPERVISORD_URL environment variable #1947
* [BUGFIX] filesystem_freebsd: Fix label values #1728
* [BUGFIX] Fix various procfs parsing errors #1735
* [BUGFIX] Handle no data from powersupplyclass #1747
* [BUGFIX] udp_queues_linux.go: change upd to udp in two error strings #1769
* [BUGFIX] Fix node_scrape_collector_success behaviour #1816
* [BUGFIX] Fix NodeRAIDDegraded to not use a string rule expressions #1827
* [BUGFIX] Fix node_md_disks state label from fail to failed #1862
* [BUGFIX] Handle EPERM for syscall in timex collector #1938
* [BUGFIX] bcache: fix typo in a metric name #1943
* [BUGFIX] Fix XFS read/write stats (https://github.com/prometheus/procfs/pull/343)
Signed-off-by: Ben Kochie <superq@gmail.com>
* Use `device` label to match other `node_network_...` metics.
* Fix naming convention to match Promehteus best practices.
Signed-off-by: Ben Kochie <superq@gmail.com>
I have rewritten all CGO dependencies for OpenBSD amd64
using pure go, be able to crosscompile node_exporter.
Signed-off-by: ston1th <ston1th@giftfish.de>
this change fixes the logging message for the filesystem ignored-fs-types flag to output the flag instead of the mountpoints flag.
Signed-off-by: xinau <felix.ehrenpfort@protonmail.com>
We've gathered enough evidence that the CPU counter bug workaround is
working as intended. Downgrade the message from Warning to Debug.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Expose metric for state=check for node_md_state
* Added new e2e output fixture including md201 which is in checking state and a the new state=check labeled metric for all other md
Signed-off-by: Christian Rohmann <github@frittentheke.de>
* Check interface name before loading interface data
* Reduce indentation
* Optimize nested netdev map
This change avoids conversion to strings and back.
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.
Signed-off-by: domgoer <domdoumc@gmail.com>
Handle the case when /sys/class/power_supply doesn't exist. Fixes
logging error spam.
Requires https://github.com/prometheus/procfs/pull/308
Signed-off-by: Ben Kochie <superq@gmail.com>
TCP "OutRsts" is the number of TCP Resets sent by the node. This can be
useful for monitoring connection failures and flooding.
Signed-off-by: Ben Kochie <superq@gmail.com>
We must know the length of the various filesystem C strings before
turning them from a byte array into a Go string, otherwise our Go
strings could contain null bytes, corrupting the label values.
Signed-off-by: David O'Rourke <david.orourke@gmail.com>
* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes.
For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish this changes gives us the option to check for overloaded UDP + TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:
I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
I chose the name 'udp_queue' instead of 'udpstat' as UDP has no state.
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different stepping
of a processor might behave differently.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
* Use `strconv.Itoa()` instead of `fmt.Sprintf()` for simple conversion.
* Eliminate copy-paste in collector setup.
Signed-off-by: Ben Kochie <superq@gmail.com>
* add a map of profilers to CPUids
`runtime.NumCPU()` returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.
This change maintains the current behavior as default, but also allows
the user to specify a range of CPUids to use instead.
The CPU id is stored as the value of a map keyed on the profiler
object's address.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodges@uber.com>
Co-authored-by: jdamato-fsly <55214354+jdamato-fsly@users.noreply.github.com>
Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.
* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in infiband debug message.
Closes: https://github.com/prometheus/node_exporter/issues/1323
Signed-off-by: Ben Kochie <superq@gmail.com>
Let the node exporter collect the non-numeric data from
/sys/class/infiniband: board ID, firmware version, and HCA type.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
Reuse the Go-only implementation already in place for FreeBSD (#385) on
Darwin, DragonflyBSD, NetBSD and OpenBSD.
Tested on all affected platforms.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
This exposes RAPL statistics from /sys/class/powercap.
Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
* Add unix socket support for supervisord collector
For example:
--collector.supervisord.url=unix:///var/run/supervisor.sock
Fixesprometheus/node_exporter#262
Signed-off-by: Paul Cameron <cameronpm@gmail.com>
Integer division and the order of operations when converting Mbps to Bps
results in a loss of accuracy if the interface speeds are set low.
e.g. 100 Mbps is reported as 12000000 Bps, should be 12500000
10 Mbps is reported as 1000000 Bps, should be 1250000
Signed-off-by: Thomas Lin <t.lin@mail.utoronto.ca>
This will now use `bcstats.numbufpages` instead of `uvmexp.vnodepages`.
Inspired by OpenBSD's `src/usr.bin/top`
Signed-off-by: Matthieu Guegan <matthieu.guegan@deindeal.ch>
* Add diskstat flush request counters for Linux 5.5+
* Update tests for diskstat flush request counters with Linux 5.5+
Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
* Add makefile target to update sysfs fixtures.
* Use similar style for fixtures from procfs.
* Re-pack fixtures ttar file.
Signed-off-by: Ben Kochie <superq@gmail.com>