In certain instances on heavily loaded nodes with many network
devices, there may be concurrent access to the netdev collector's
`metricDescs` map, resulting in a panic. This adds a mutex to prevent
concurrent reads and writes to the map.
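A minimal sketch of the locking pattern, assuming a hypothetical `descCache` type in place of the collector's real structure. The key point is that one lock covers both the lookup and the insert:

```go
package main

import (
	"fmt"
	"sync"
)

// descCache is a hypothetical stand-in for a collector that lazily
// fills a map of metric descriptions during scrapes.
type descCache struct {
	mu    sync.Mutex
	descs map[string]string
}

func newDescCache() *descCache {
	return &descCache{descs: make(map[string]string)}
}

// get returns the cached description for key, creating it on first use.
// Holding the mutex across the read and the write prevents the
// "concurrent map read and map write" panic.
func (c *descCache) get(key string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	d, ok := c.descs[key]
	if !ok {
		d = "node_network_" + key + "_total"
		c.descs[key] = d
	}
	return d
}

func main() {
	c := newDescCache()
	var wg sync.WaitGroup
	// Simulate several overlapping scrapes touching the same key.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("receive_bytes")
		}()
	}
	wg.Wait()
	fmt.Println(len(c.descs))
}
```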
Signed-off-by: Brad Ison <bison@xvdf.io>
Move the systemd version function to an exporter method. This way we can
update the version information at every scrape, in case the underlying
version changes.
Signed-off-by: Ben Kochie <superq@gmail.com>
systemd patch versions are as important as the major version number;
they indicate security or bug fixes or other behavioural changes between
versions.
Use float64 over float32 as the rounding error with float32 rendered
250.3 as 250.3000030517578 in my testing.
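The rounding effect can be reproduced with a short sketch. `parseVersion` is a hypothetical helper, not the exporter's code; it parses the same string at 32-bit and 64-bit precision so the two results can be compared:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseVersion parses a version string such as "250.3". With bitSize 32
// the result is rounded to float32 precision before being returned as a
// float64, which is what exposes the extra digits.
func parseVersion(s string, bitSize int) (float64, error) {
	return strconv.ParseFloat(s, bitSize)
}

func main() {
	v32, _ := parseVersion("250.3", 32)
	v64, _ := parseVersion("250.3", 64)
	fmt.Println(v32) // float32-rounded value, widened to float64
	fmt.Println(v64) // full float64 precision
}
```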
Signed-off-by: Joe Groocock <jgroocock@cloudflare.com>
Signed-off-by: Joe Groocock <me@frebib.net>
Analogous to the /var/lib/docker exclude added in
https://github.com/prometheus/node_exporter/pull/814
Rootful podman containers mount e.g. shm filesystems at
/var/lib/containers/storage/*-containers/*/userdata/shm. These should be
treated like mounts under /var/lib/docker by default.
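As an illustration only, an exclude check covering both directories could look like the sketch below. The pattern and the `isExcluded` helper are assumptions made for this example and are not the exporter's actual default; the sample mount point is likewise made up:

```go
package main

import (
	"fmt"
	"regexp"
)

// excluded is an illustrative pattern (an assumption, not the real
// default) that treats podman storage like /var/lib/docker.
var excluded = regexp.MustCompile(`^/var/lib/(docker|containers/storage)/.+`)

// isExcluded reports whether a mount point should be skipped.
func isExcluded(mountPoint string) bool {
	return excluded.MatchString(mountPoint)
}

func main() {
	for _, mp := range []string{
		"/var/lib/docker/overlay2/abc/merged",
		"/var/lib/containers/storage/overlay-containers/abc/userdata/shm",
		"/home",
	} {
		fmt.Println(mp, isExcluded(mp))
	}
}
```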
Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Allow filtering ARP entries based on device. Useful for ignoring
entries for network namespaces (containers).
Signed-off-by: Ben Kochie <superq@gmail.com>
* [BUGFIX] Handle nil CPU thermal power status on M1 #2218
* [BUGFIX] bsd: Ignore filesystems flagged as MNT_IGNORE. #2227
* [BUGFIX] Sanitize UTF-8 in dmi collector #2229
Signed-off-by: Ben Kochie <superq@gmail.com>
This adds a new Linux metric, node_softirqs_total, which corresponds
to the 'softirq' line in /proc/stat. This metric is disabled by
default and it can be enabled with '--collector.stat.softirq'.
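For reference, a rough sketch of parsing the 'softirq' line. The first number is the total and the rest are per-type counters in kernel order. The function name and the lower-case type names are assumptions for this example and need not match the labels the collector uses:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// softIRQNames lists the per-type counters in the order they follow the
// total on the 'softirq' line. The spelling here is illustrative.
var softIRQNames = []string{
	"hi", "timer", "net_tx", "net_rx", "block",
	"irq_poll", "tasklet", "sched", "hrtimer", "rcu",
}

// parseSoftIRQLine splits a 'softirq' line into the total and a map of
// per-type counts. Extra trailing fields are ignored.
func parseSoftIRQLine(line string) (uint64, map[string]uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 || fields[0] != "softirq" {
		return 0, nil, fmt.Errorf("not a softirq line: %q", line)
	}
	total, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return 0, nil, err
	}
	perType := make(map[string]uint64)
	for i, f := range fields[2:] {
		if i >= len(softIRQNames) {
			break
		}
		v, perr := strconv.ParseUint(f, 10, 64)
		if perr != nil {
			return 0, nil, perr
		}
		perType[softIRQNames[i]] = v
	}
	return total, perType, nil
}

func main() {
	// Made-up sample values.
	total, perType, err := parseSoftIRQLine("softirq 55 1 2 3 4 5 6 7 8 9 10")
	fmt.Println(total, perType["net_rx"], err)
}
```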
Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Use the non-cgo version for all openbsd architectures.
The old code only pulled some defines from header files. Just add them
as enumerations in native Go. Also be careful about what SysctlRaw returns.
Implement a way that supports both recent and old pre-6.4 OpenBSD systems.
With Go 1.16, OpenBSD binaries link to libc, and because of this binaries
built on OpenBSD 6.9-current do not run on OpenBSD 6.3. OpenBSD 6.3 has also
not been supported for more than 2 years, so maybe the compat code is not
needed. Still, validating the object length before doing an unsafe pointer
conversion is probably reasonable, but I'm no golang expert.
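A sketch of the length check, with assumptions: `cpuTimes` is a hypothetical fixed-size struct rather than a real OpenBSD kernel type, and the byte slice stands in for whatever a raw sysctl call returned. The bytes are copied into the struct instead of casting the slice pointer, which also sidesteps alignment concerns:

```go
package main

import (
	"fmt"
	"unsafe"
)

// cpuTimes is a hypothetical fixed-size object used only for this sketch.
type cpuTimes struct {
	User, Nice, Sys, Intr, Idle uint64
}

const cpuTimesSize = int(unsafe.Sizeof(cpuTimes{}))

// decodeCPUTimes refuses to reinterpret raw unless its length matches
// the struct size exactly, then copies the bytes into a cpuTimes value.
func decodeCPUTimes(raw []byte) (cpuTimes, error) {
	var ct cpuTimes
	if len(raw) != cpuTimesSize {
		return ct, fmt.Errorf("unexpected size %d, want %d", len(raw), cpuTimesSize)
	}
	copy((*[cpuTimesSize]byte)(unsafe.Pointer(&ct))[:], raw)
	return ct, nil
}

func main() {
	_, err := decodeCPUTimes(make([]byte, cpuTimesSize-1))
	fmt.Println(err != nil) // a short buffer is rejected

	ct, err := decodeCPUTimes(make([]byte, cpuTimesSize))
	fmt.Println(ct, err)
}
```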
Signed-off-by: Claudio Jeker <claudio@openbsd.org>
NOTE: In order to support globs in the textfile collector path, filenames exposed by
`node_textfile_mtime_seconds` now contain the full path name.
* [CHANGE] Add path label to rapl collector #2146
* [CHANGE] Exclude filesystems under /run/credentials #2157
* [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ #1771
* [FEATURE] Add darwin powersupply collector #1777
* [FEATURE] Add support for monitoring GPUs on Linux #1998
* [FEATURE] Add Darwin thermal collector #2032
* [FEATURE] Add os release collector #2094
* [FEATURE] Add netdev.address-info collector #2105
* [ENHANCEMENT] Support glob textfile collector directories #1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics #2123
* [ENHANCEMENT] Add DMI collector #2131
* [ENHANCEMENT] Add threads metrics to processes collector #2164
* [ENHANCEMENT] Reduce timer GC delays in the Linux filesystem collector #2169
* [BUGFIX] ethtool: Sanitize metric names #2093
* [BUGFIX] Fix ethtool collector for multiple interfaces #2126
* [BUGFIX] Fix possible panic on macOS #2133
* [BUGFIX] Collect flag_info and bug_info only for one core #2156
Signed-off-by: Ben Kochie <superq@gmail.com>
TCP timeouts count is a useful signal to show
abnormal network performance and is another
signal to aid debugging. This metric can be
used to generate proactive alerts for host
network namespace workloads.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
The new `lnstat` collector produces a high number of metrics, per-cpu,
and results in approximately double the number of metrics previously
scraped. For example, a typical server with 64 cores produces 3832
lnstat metrics compared to 4147 metrics for the remaining collectors.
Therefore disable the `lnstat` collector by default.
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Sanitizing the metric names can lead to duplicate metric names:
```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```
Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.
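A minimal sketch of the approach described above. The `sanitize` rule and the function names are assumptions for illustration and may differ from the collector's real sanitizer:

```go
package main

import (
	"fmt"
	"regexp"
)

var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitize replaces characters that are not valid in a metric name.
// This rule is an assumption made for the example.
func sanitize(name string) string {
	return invalidChars.ReplaceAllString(name, "_")
}

// uniqueSanitized maps sanitized names to their original names. When
// two or more originals collide on the same sanitized name, all of them
// are dropped because there is no way to pick a winner.
func uniqueSanitized(names []string) map[string]string {
	out := make(map[string]string)
	dropped := make(map[string]bool)
	for _, n := range names {
		s := sanitize(n)
		if dropped[s] {
			continue
		}
		if _, seen := out[s]; seen {
			delete(out, s)
			dropped[s] = true
			continue
		}
		out[s] = n
	}
	return out
}

func main() {
	// "giant hdr" and "giant_hdr" collide, so only rx-bytes survives.
	fmt.Println(uniqueSanitized([]string{"giant hdr", "giant_hdr", "rx-bytes"}))
}
```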
Fixes: https://github.com/prometheus/node_exporter/issues/2185
Signed-off-by: Benjamin Drung <benjamin.drung@ionos.com>
Use SysctlTimeval from the golang.org/x/sys/unix package to
simplify the implementation of the boottime collector for the BSDs and
allow building it without cgo.
Tested on macOS 11.6, FreeBSD 13 and OpenBSD 7.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
'iowait' and 'steal' indicate specific idle/wait states, which shouldn't
be counted towards CPU utilisation. Also see
https://github.com/prometheus-operator/kube-prometheus/pull/796 and
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667.
Per the iostat man page:
  %idle
    Show the percentage of time that the CPU or CPUs were idle and the
    system did not have an outstanding disk I/O request.
  %iowait
    Show the percentage of time that the CPU or CPUs were idle during
    which the system had an outstanding disk I/O request.
  %steal
    Show the percentage of time spent in involuntary wait by the
    virtual CPU or CPUs while the hypervisor was servicing another
    virtual processor.
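Expressed as arithmetic, the intent is roughly the sketch below: time spent in 'idle', 'iowait' and 'steal' is left out of the busy share. The function name and the sample numbers are made up for illustration; the actual change is to the dashboard and rule expressions, not to Go code:

```go
package main

import "fmt"

// notBusy lists the CPU modes treated as idle/wait states.
var notBusy = map[string]bool{"idle": true, "iowait": true, "steal": true}

// utilisation returns the busy fraction of CPU time given the per-mode
// seconds accumulated over some interval.
func utilisation(deltas map[string]float64) float64 {
	var busy, total float64
	for mode, d := range deltas {
		total += d
		if !notBusy[mode] {
			busy += d
		}
	}
	if total == 0 {
		return 0
	}
	return busy / total
}

func main() {
	// Made-up sample interval, in CPU seconds per mode.
	fmt.Println(utilisation(map[string]float64{
		"user": 30, "system": 10, "idle": 40, "iowait": 15, "steal": 5,
	}))
}
```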
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>