* Add include and exclude flags chip name flags to hwmon collector, following example in systemd collector
---------
Signed-off-by: Conall O'Brien <conall@conall.net>
Co-authored-by: Ben Kochie <superq@gmail.com>
This change adds the ability to process multiple stat calls in parallel.
Processing is rate-limited based on the new flag
`collector.filesystem.stat-workers` (default 4).
Caveat: filesystem stats information is no longer in the same order as
returned by `/proc/1/mounts`. This should not be an issue.
Caveat: This change currently uses unbuffered channels to prove
correctness without reliance on buffers. Buffered channels will yield
superior performance.
Signed-off-by: Erica Mays <erica@emays.dev>
Read missing dev_id, name_assign_type, and addr_assign_type
from sysfs, since they only take a device-specific lock and
not the whole RTNL lock. This means reading them is much less
impactful on other system processes than many of the other
attributes in sysfs that do take the RTNL lock.
Signed-off-by: Dan Williams <dcbw@redhat.com>
On most hard drives, `ID_SERIAL_SHORT` and `SCSI_IDENT_SERIAL` are identical,
but on some SAS drives they do differ. In that case, `SCSI_IDENT_SERIAL`
corresponds to the serial number printed on the drive label, and to the value
returned by `smartctl -i`.
So use that value by default for the `serial` label on the `node_disk_info`
metric, and fallback to `ID_SERIAL_SHORT` only if it's undefined.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Mark the `supervisord` as deprecated. This process
supevisor, like `runit`, is of scope for the node_exporter.
Signed-off-by: Ben Kochie <superq@gmail.com>
* bcache: remove cache_readaheads_totals metrics #2103
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Append bcacheReadaheadMetrics when CacheReadaheads value exists
Signed-off-by: Saleh Sal <0xack13@gmail.com>
* Update test cases for cachereadahead greater than zero
Signed-off-by: Saleh Sal <0xack13@gmail.com>
---------
Signed-off-by: Saleh Sal <0xack13@gmail.com>
Use the filesystem collector for all OpenBSD archs, there is no reason to
only use it on amd64 systems.
Signed-off-by: Claudio Jeker <claudio@openbsd.org>
* Bump exporter-toolkit to the latest release.
* Use new toolkit landing page function.
* Update kingpin flags.
Signed-off-by: Ben Kochie <superq@gmail.com>
The ntp collector has always been a source of confusion and problems.
The data it produces is more of a blackbox probe against an NTP server.
The time sync / offset data produced is not what users expect.
Mark this collector as deprecated to be removed in v2.0.0
Signed-off-by: Ben Kochie <superq@gmail.com>
Move metric descriptiions to package vars to avoid allocating them every
time `NewCPUFreqCollector()` is called.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Refactor netclass_rtnl collector
Merge the netclass_rtnl collector into the netclass collector.
* Disabled by default
* Followup to #2492
Signed-off-by: Ben Kochie <superq@gmail.com>
* update rtnetlink package to v1.2.3
* add RTNL version of netclass collector that have all the metrics that netdev collector provides, too.
Signed-off-by: Haoyu Sun <hasun@redhat.com>
Some systems have broken netlink messages due to patched kernels. Since
these messages can not be parsed, add a flag to fall back to parsing
from `/proc/net/dev`.
Fixes: https://github.com/prometheus/node_exporter/issues/2502
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Note however that the InetDiagMsg struct contains a InetDiagSockID
member, which itself contains some members which are explicitly
specified as big-endian in Linux kernel source:
struct inet_diag_sockid {
__be16 idiag_sport;
__be16 idiag_dport;
__be32 idiag_src[4];
__be32 idiag_dst[4];
__u32 idiag_if;
__u32 idiag_cookie[2];
};
node_exporter currently does not use these members for anything, so this
is acceptable (for now).
Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
We don't need to fully sanitize the hwmon label values to metric/label
name strings.
* Just make sure they're valid UTF-8.
* Always included the label metric to avoid group_left failures.
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Correctly handle the new `collector.diskstats.device-exclude` flag to
avoid errors when using the old `collector.diskstats.ignored-devices`
flag.
Fixes: https://github.com/prometheus/node_exporter/issues/2486
Signed-off-by: Ben Kochie <superq@gmail.com>
* [CHANGE] Merge metrics descriptions in textfile collector #2475
* [FEATURE] [node-mixin] Add darwin dashboard to mixin #2351
* [FEATURE] Add "isolated" metric on cpu collector on linux #2251
* [FEATURE] Add cgroup summary collector #2408
* [FEATURE] Add selinux collector #2205
* [FEATURE] Add slab info collector #2376
* [FEATURE] Add sysctl collector #2425
* [FEATURE] Also track the CPU Spin time for OpenBSD systems #1971
* [FEATURE] Add support for MacOS version #2471
* [ENHANCEMENT] [node-mixin] Add missing selectors #2426
* [ENHANCEMENT] [node-mixin] Change current datasource to grafana's default #2281
* [ENHANCEMENT] [node-mixin] Change disk graph to disk table #2364
* [ENHANCEMENT] [node-mixin] Change io time units to %util #2375
* [ENHANCEMENT] Ad user_wired_bytes and laundry_bytes on *bsd #2266
* [ENHANCEMENT] Add additional vm_stat memory metrics for darwin #2240
* [ENHANCEMENT] Add device filter flags to arp collector #2254
* [ENHANCEMENT] Add diskstats include and exclude device flags #2417
* [ENHANCEMENT] Add node_softirqs_total metric #2221
* [ENHANCEMENT] Add rapl zone name label option #2401
* [ENHANCEMENT] Add slabinfo collector #1799
* [ENHANCEMENT] Allow user to select port on NTP server to query #2270
* [ENHANCEMENT] collector/diskstats: Add labels and metrics from udev #2404
* [ENHANCEMENT] Enable builds against older macOS SDK #2327
* [ENHANCEMENT] qdisk-linux: Add exclude and include flags for interface name #2432
* [ENHANCEMENT] systemd: Expose systemd minor version #2282
* [ENHANCEMENT] Use netlink for tcpstat collector #2322
* [ENHANCEMENT] Use netlink to get netdev stats #2074
* [ENHANCEMENT] Add additional perf counters for stalled frontend/backend cycles #2191
* [ENHANCEMENT] Add btrfs device error stats #2193
* [BUGFIX] [node-mixin] Fix fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarning #2352
* [BUGFIX] Fix concurrency issue in ethtool collector #2289
* [BUGFIX] Fix concurrency issue in netdev collector #2267
* [BUGFIX] Fix diskstat reads and write metrics for disks with different sector sizes #2311
* [BUGFIX] Fix iostat on macos broken by deprecation warning #2292
* [BUGFIX] Fix NodeFileDescriptorLimit alerts #2340
* [BUGFIX] Sanitize rapl zone names #2299
* [BUGFIX] Add file descriptor close safely in test #2447
* [BUGFIX] Fix race condition in os_release.go #2454
* [BUGFIX] Skip ZFS IO metrics if their paths are missing #2451
Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
* Improve metrics filesystem scanning logic
* Makes ioctl syscalls to load the device error stats.
* Adds filesystem mountpoint labels to existing metrics for ease of use.
Signed-off-by: Marcus Cobden <leth@users.noreply.github.com>
The textfile collector will now provide a unified metric description
(that will look like "Metric read from file/a.prom, file/b.prom")
for metrics collected accross several text-files that don't already
have a description.
Also change the error handling in the textfile collector tests to
ContinueOnError to better mirror the real-life use-case.
Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
Signed-off-by: Guillaume Espanel <guillaume.espanel.ext@ovhcloud.com>
* Allow user to select port on NTP server to query
Some people (me!) run NTP servers on non-privileged ports. The `github.com/beevik/ntp` package allows overriding the port, so this change just adds a flag `collector.ntp.server-port` (defaults to 123) and then passes that value through to the query via the `QueryOptions`.
Signed-off-by: Andrew Rowson <github@growse.com>
On Linux, we get more detailed interface statistics from netlink than we did
from `/proc/net/dev`.
This commit adds a new flag (`--collector.netdev.enable-detailed-metrics`) to
expose those statistics under new (incompatible) metric names. When enabled,
the metric names are also changed on Darwin and BSD platforms to keep
everything consistent, but it doesn't provide more detailed statistics on those
platforms.
The old metrics can be derived from the new ones using the following rules
([dev_seq_printf_stats]):
- `receive_errs` = `receive_errors`
- `receive_drop` = `receive_dropped` + `receive_missed_errors`
- `receive_fifo` = `receive_fifo_errors`
- `receive_frame` = `receive_length_errors` + `receive_over_errors` + `receive_crc_errors` + `receive_frame_errors`
- `receive_multicast` = `multicast`
- `transmit_errs` = `transmit_errors`
- `transmit_drop` = `transmit_dropped`
- `transmit_fifo` = `transmit_fifo_errors`
- `transmit_colls` = `collisions`
- `transmit_carrier` = `transmit_aborted_errors` + `transmit_carrier_errors` + `transmit_heartbeat_errors` + `transmit_window_errors`
[dev_seq_printf_stats]: https://github.com/torvalds/linux/blob/master/net/core/net-procfs.c#L75-L97
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
To prepare for the introduction of new metric names, add tests for the legacy
metric names and values. This will make it easier to ensure that the code that
converts the new metrics to the old ones (for compatibility) behaves correctly.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Since netdev metrics are now read from netlink instead of `/proc/net/dev`, we
can't easily spoof them for the end-to-end tests by reading a fixture file in
place of `/proc/net/dev`.
Therefore, we only get metrics for `lo` and ignore those that would return
unpredictable values (i.e. the byte and packet counters).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Instead of parsing `/proc/net/dev` to get network interface statistics, get
them from a netlink call.
Internally, both come from the [rtnl_link_stats64] struct, but with
`/proc/net/dev`, some of the values are aggregated together in
[dev_seq_printf_stats], so we get less information out of them.
This commit maintains compatibility by aggregating those stats back into the
same metrics.
[rtnl_link_stats64]: https://github.com/torvalds/linux/blob/master/include/uapi/linux/if_link.h#L42-L246
[dev_seq_printf_stats]: https://github.com/torvalds/linux/blob/master/net/core/net-procfs.c#L75-L97
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
These two memory classes have been here for a while now in FreeBSD,
adding them allows having information for all memory classes.
Signed-off-by: François Charlier <fcharlier@ploup.net>
Log a single error message when the udev data directory (`/run/udev/data` by
default) is unreadable, and then don't try to get device properties out of it.
Also lower the log level from error to debug when we can't parse the udev files
properly, since these messages would be sent every time the node exporter gets
scraped.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
When parsing udev data, skip lines that don't start with `E:`.
Lines prefixed with `E:` represent device properties, as documented in
udevadm(8).
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Set the `--path.udev.data` flag to point to the udev fixture, and update the
output fixture with
```console
$ ./end-to-end-test.sh -u
```
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Now that we read some data from `/run/udev/data`, add the corresponding
fixtures and update the expected test results accordingly.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Instead of hard-coding the path to `/run/udev/data`, intoduce a
`--path.udev.data` flag that defaults to that value.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Add labels to the `node_disk_info` metric extracted from udev, such as `model`,
`path`, `revision`, `serial` and `wwn`.
Also add a few metrics related to filesystem and device mapper, which are also
extracted from udev information.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Use standard include/exclude pattern for device include/exclude in the
diskstats collector.
Signed-off-by: Ben Kochie <superq@gmail.com>
Co-authored-by: rushilenekar20 <rushilenekar20@gmail.com>
Fix up handling of CPU info collector on non-x86_64 systems due to
fixtures containing `/proc/cpuinfo` from x86_64.
* Update e2e 64k page test fixture from an arm64 system.
* Enable ARM testing in CircleCI.
Fixes: https://github.com/prometheus/node_exporter/issues/1959
Signed-off-by: Ben Kochie <superq@gmail.com>
* Correctly name collector file.
* Fix cgroup summary type as gauge.
* Use a boolean metric rather than a label for enabled.
Signed-off-by: Ben Kochie <superq@gmail.com>