Commit graph

997 commits

Author SHA1 Message Date
coderwander 0202220881
refactor: Optimize code by using built-in constants in the standard library (#2989)
Signed-off-by: coderwander <770732124@qq.com>
2024-04-16 09:43:16 +02:00
Ayoub Mrini bf67c859bb
fibre_channel: update procfs to take into account optional attributes (#2933)
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-04-15 11:52:59 +02:00
looklose 7d4103c089 chore: fix typo in comment
Signed-off-by: looklose <shishuaiqun@yeah.net>
2024-04-10 14:24:02 +02:00
Daniel Kimsey 29cdbd63fe zfs: Log mib when sysctl read fails on FreeBSD
When the zfs collector fails on FreeBSD it doesn't log which `mib` triggered the issue. This makes diagnostics hard.

Incompatibilities in the list of supported mibs are not uncommon with major OS updates. With this change, it will be easier for users to report the specific mib that triggers the failure.

Related to #2847

Signed-off-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>
2024-04-10 12:44:05 +02:00
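The idea in this commit, attaching the mib name to a failed sysctl read, can be sketched in Go as below. This is a minimal sketch, not the collector's code: `readSysctl` is a hypothetical stand-in for the FreeBSD sysctl call, backed here by a map, and the mib is attached by wrapping the error with `fmt.Errorf` and `%w` rather than through the collector's logger.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupportedMib stands in for the error a real sysctl read might return.
var errUnsupportedMib = errors.New("no such file or directory")

// readSysctl is a hypothetical stand-in for the FreeBSD sysctl read used by
// the zfs collector; it fails for any mib missing from the supplied table.
func readSysctl(table map[string]uint64, mib string) (uint64, error) {
	v, ok := table[mib]
	if !ok {
		return 0, errUnsupportedMib
	}
	return v, nil
}

// collectMib wraps any read failure with the mib name, so whatever logs the
// error also reports which mib triggered it.
func collectMib(table map[string]uint64, mib string) (uint64, error) {
	v, err := readSysctl(table, mib)
	if err != nil {
		return 0, fmt.Errorf("could not read sysctl mib %q: %w", mib, err)
	}
	return v, nil
}

func main() {
	table := map[string]uint64{"kstat.zfs.misc.arcstats.size": 1024}
	if _, err := collectMib(table, "kstat.zfs.misc.arcstats.p"); err != nil {
		fmt.Println(err)
	}
}
```

Because `%w` keeps the original error in the chain, callers can still match it with `errors.Is` while the message gains the mib name.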
Jonathan Davies b6227af54b
os_release.go: Added support end parsing support. (#2982)
* os_release.go: Added support end parsing support.

Fixes: #2977

Signed-off-by: Jonathan Davies <jpds@protonmail.com>

* os_release_test.go: Added TestParseOSSupportEnd.

Signed-off-by: Jonathan Davies <jpds@protonmail.com>

---------

Signed-off-by: Jonathan Davies <jpds@protonmail.com>
2024-04-03 12:23:03 +02:00
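For context, os-release(5) defines `SUPPORT_END` as a date in `YYYY-MM-DD` form. A minimal Go sketch of parsing it from os-release content follows; the function name `parseSupportEnd` and the quote-stripping behavior are illustrative assumptions, not the code added in `os_release.go`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"time"
)

// parseSupportEnd scans os-release content for a SUPPORT_END=YYYY-MM-DD line.
// It returns the parsed date (UTC) and whether the key was present.
func parseSupportEnd(content string) (time.Time, bool, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(sc.Text()), "=")
		if !ok || key != "SUPPORT_END" {
			continue
		}
		// Values may be quoted, e.g. SUPPORT_END="2024-05-14".
		value = strings.Trim(value, `"'`)
		t, err := time.Parse("2006-01-02", value)
		if err != nil {
			return time.Time{}, true, fmt.Errorf("invalid SUPPORT_END %q: %w", value, err)
		}
		return t, true, nil
	}
	return time.Time{}, false, sc.Err()
}

func main() {
	// Illustrative os-release content, not taken from a real distribution.
	osRelease := "NAME=Example Linux\nVERSION_ID=1\nSUPPORT_END=\"2024-05-14\"\n"
	t, found, err := parseSupportEnd(osRelease)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if found {
		fmt.Println("support ends at unix time", t.Unix())
	}
}
```

Returning `found` separately lets a caller skip the metric entirely on distributions that do not set the field.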
Pranshu Srivastava ebddab47e1
collector/textfile: Avoid inconsistent help-texts (#2962)
Avoid metrics with inconsistent help-texts. The earlier behaviour has
been preserved in the sense that the first encountered instance is still
used to generate metrics, whereas the subsequent inconsistent ones are
ignored along with a few peripheral changes.

```
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="textfile"} 0.0004005
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="textfile"} 1
# HELP node_textfile_mtime_seconds Unixtime mtime of textfiles successfully read.
# TYPE node_textfile_mtime_seconds gauge
node_textfile_mtime_seconds{file="/Users/rexagod/repositories/misc/node_exporter/ne-bar.prom"} 1.710812009e+09
node_textfile_mtime_seconds{file="/Users/rexagod/repositories/misc/node_exporter/ne-foo.prom"} 1.710811982e+09
# HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE node_textfile_scrape_error gauge
node_textfile_scrape_error 1
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
# HELP tau_infrastructure_performing_maintenance_task At what timestamp a given task started or stopped, the last time it was run.
# TYPE tau_infrastructure_performing_maintenance_task gauge
tau_infrastructure_performing_maintenance_task{main_task="nightly",start_or_stop="start",sub_task="main"} 1.64728080198446e+09
```

Fixes: #2317

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
2024-03-24 06:43:03 +01:00
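The "first encountered instance wins" rule described in this commit can be sketched as follows. This is a simplified illustration under stated assumptions: `family` is a hypothetical stand-in for a parsed metric family (the real collector works with client_model types), and families are assumed to arrive in the order the files are read.

```go
package main

import "fmt"

// family is a simplified, hypothetical stand-in for a parsed metric family.
type family struct {
	Name string
	Help string
	File string
}

// dedupeHelp keeps the first family seen for each metric name. Later families
// with the same name and the same help text are kept so they can be merged;
// later families whose help text differs are dropped and returned separately
// so the caller can report them.
func dedupeHelp(fams []family) (kept, dropped []family) {
	firstHelp := map[string]string{}
	for _, f := range fams {
		help, seen := firstHelp[f.Name]
		switch {
		case !seen:
			firstHelp[f.Name] = f.Help
			kept = append(kept, f)
		case help == f.Help:
			kept = append(kept, f)
		default:
			dropped = append(dropped, f)
		}
	}
	return kept, dropped
}

func main() {
	kept, dropped := dedupeHelp([]family{
		{Name: "tau_task", Help: "Task timestamp.", File: "ne-bar.prom"},
		{Name: "tau_task", Help: "Different help.", File: "ne-foo.prom"},
	})
	for _, f := range kept {
		fmt.Println("kept", f.Name, "from", f.File)
	}
	for _, f := range dropped {
		fmt.Println("dropped", f.Name, "from", f.File)
	}
}
```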
Ben Kochie b3bbd1f52c Sanitize ethtool metric name keys
Apply the same metric name sanitization to the keys as to the metric
names. This avoids conflicting help strings in the metric registry.

Fixes: https://github.com/prometheus/node_exporter/issues/2893

Signed-off-by: Ben Kochie <superq@gmail.com>
2024-03-21 12:09:01 +01:00
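Why sanitizing the keys matters can be shown with a small Go sketch. The sanitization rule below (replace anything outside `[a-zA-Z0-9_]` with `_`), the names `sanitizeMetricName` and `helpByKey`, and the help-string format are all assumptions for illustration, not the ethtool collector's actual code. The point is that keying the help table by the same sanitized form as the metric name makes `rx-errors` and `rx_errors` collapse to one entry instead of two entries with conflicting help strings.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidMetricChars matches characters outside [a-zA-Z0-9_]. This exact rule
// is an assumption for illustration.
var invalidMetricChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeMetricName is a hypothetical sanitizer applied to both metric names
// and help-table keys.
func sanitizeMetricName(name string) string {
	return invalidMetricChars.ReplaceAllString(name, "_")
}

// helpByKey builds a help-string table keyed by the sanitized stat name, the
// same transformation used for the metric name itself. The first raw name
// seen for a key determines its help text.
func helpByKey(rawStats []string) map[string]string {
	help := make(map[string]string)
	for _, raw := range rawStats {
		key := sanitizeMetricName(raw)
		if _, exists := help[key]; !exists {
			help[key] = fmt.Sprintf("Network interface %s", raw)
		}
	}
	return help
}

func main() {
	for key, h := range helpByKey([]string{"rx-errors", "rx_errors", "tx.bytes"}) {
		fmt.Printf("%s: %s\n", key, h)
	}
}
```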
Gavin Lam 94ef5cc666
Enable watchdog module by default; Add no data error (#2953)
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
2024-03-14 07:50:55 +01:00
Gavin Lam 95efb86f6b
Add new collector and metrics for watchdog (#2309) (#2880)
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
2024-03-09 10:00:06 +01:00
linuxgcc 5e412a689a
disable selinux,fix end-to-end-test.sh error(#2934) (#2937)
Signed-off-by: heyitao <heyitao@uniontech.com>
Co-authored-by: heyitao <heyitao@uniontech.com>
2024-03-08 15:06:03 +01:00
Ben Kochie 3a02ab1cf0
Revert "filesystem: fix mountTimeout not working issue (#2903)" (#2932)
This reverts commit 9f1f791ac2.

Signed-off-by: Ben Kochie <superq@gmail.com>
2024-02-20 10:31:08 +01:00
Pamela Mei 12192475c8
filesystem: surface device errors (#2923)
filesystem: surface filesystem device error

Fixes: #2918
---------

Signed-off-by: Pamela Mei i540369 <pamela.mei@sap.com>
2024-02-18 12:04:30 +01:00
DongWei 9f1f791ac2
filesystem: fix mountTimeout not working issue (#2903)
Signed-off-by: DongWei <jiangxuege@hotmail.com>
2024-02-14 15:36:16 +01:00
Caleb Webber 6d18ce7bca
Revert "Add ZFS freebsd per dataset stats (#2753)" (#2925)
This reverts commit f34aaa6109.

Signed-off-by: Caleb Webber <caleb@codingthemsoftly.com>
2024-02-14 09:13:18 +01:00
Ben Kochie 29fca60a45
Fix hwmon error capture (#2915)
Fix golangci-lint "ineffectual assignment" by correctly capturing any
errors within the hwmon gathering loop.

Signed-off-by: Ben Kochie <superq@gmail.com>
2024-02-07 15:06:24 +01:00
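An "ineffectual assignment" in a gathering loop usually means an error variable is overwritten on each iteration and never checked. The sketch below shows one way to capture every failure while continuing with the remaining devices; `updateChips` and `errReadFailed` are hypothetical names, and using `errors.Join` (Go 1.20+) is an assumption, not necessarily what the hwmon fix does.

```go
package main

import (
	"errors"
	"fmt"
)

// errReadFailed is a hypothetical error returned by the example reader below.
var errReadFailed = errors.New("read failed")

// updateChips calls update for every chip. The ineffectual pattern looks like:
//
//	for _, c := range chips {
//		err = update(c) // overwritten on the next iteration, never checked
//	}
//
// Here each failure is captured, collection continues with the remaining
// chips, and all failures are returned together via errors.Join.
func updateChips(chips []string, update func(string) error) error {
	var errs []error
	for _, chip := range chips {
		if err := update(chip); err != nil {
			errs = append(errs, fmt.Errorf("chip %s: %w", chip, err))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	read := func(chip string) error {
		if chip == "bogus" {
			return errReadFailed
		}
		return nil
	}
	if err := updateChips([]string{"coretemp", "bogus", "nvme"}, read); err != nil {
		fmt.Println("collection error:", err)
	}
}
```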
TaoGe fe78e7e51a
fix hwmon nil ptr (#2873)
* fix hwmon nil ptr

The symlink may be missing in some cases.

---------

Signed-off-by: TaoGe <6657718+yowenter@users.noreply.github.com>
2024-02-03 10:13:12 +01:00
tyltr 34467b1d7a
chore:remove constant from function (#2884)
Signed-off-by: tyltr <tylitianrui@126.com>
2024-01-29 13:09:38 +01:00
David O'Rourke 94ddad4dec
exec_bsd: Fix labels for vm.stats.sys.v_syscall sysctl (#2895)
Signed-off-by: David O'Rourke <david.orourke@gmail.com>
2024-01-29 13:08:53 +01:00
DBS-ST-VIT e22174ca8e
diskstats: ignore zram devices on linux systems by default (#2898)
Signed-off-by: DBS-ST-VIT <dbs-st-vit@users.noreply.github.com>
Co-authored-by: DBS-ST-VIT <dbs-st-vit@users.noreply.github.com>
2024-01-15 09:32:58 +01:00
João Pedro Lima 16f7122d31
Add mitigation information to the linux vulnerabilities collector (#2806)
While the CPU vulnerabilities collector was added in https://github.com/prometheus/node_exporter/pull/2721, it does not currently include information about the mitigation strategy used for a given vulnerability.

This information can be quite valuable, as different mitigation strategies often come with different performance impacts.

This commit adds a third label to the cpu_vulnerabilities_info metric to include the "mitigation" used for a given vulnerability. If a given vulnerability does not affect a node, or the node is still vulnerable, the mitigation is expected to be empty.

Signed-off-by: João Lima <jlima@cloudflare.com>
2023-12-14 13:15:27 +01:00
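The files under `/sys/devices/system/cpu/vulnerabilities/` contain strings such as `Not affected`, `Vulnerable: ...` or `Mitigation: ...`. A minimal Go sketch of splitting such a string into a state and a mitigation, consistent with the rule above that the mitigation is empty unless one is in effect, follows. The state label values (`mitigation`, `not affected`, `vulnerable`, `unknown`) are assumptions for illustration; the collector itself relies on procfs for this parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVulnerability splits the contents of a sysfs vulnerability file into a
// coarse state and the mitigation text. The label values are assumptions.
func parseVulnerability(raw string) (state, mitigation string) {
	raw = strings.TrimSpace(raw)
	switch {
	case strings.HasPrefix(raw, "Mitigation: "):
		return "mitigation", strings.TrimPrefix(raw, "Mitigation: ")
	case raw == "Not affected":
		return "not affected", ""
	case strings.HasPrefix(raw, "Vulnerable"):
		return "vulnerable", ""
	default:
		return "unknown", ""
	}
}

func main() {
	// Example file contents, as they might appear on some machine.
	samples := map[string]string{
		"meltdown": "Mitigation: PTI\n",
		"mds":      "Not affected\n",
		"srbds":    "Vulnerable: No microcode\n",
	}
	for name, raw := range samples {
		state, mitigation := parseVulnerability(raw)
		fmt.Printf("%s state=%q mitigation=%q\n", name, state, mitigation)
	}
}
```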
frigo 0550ab3f04
Add TCPOFOQueue to default netstat metrics (#2867)
Adds a count of TCP packets received out of order. This can be an
indication of packet loss on the path towards this server. In that
case, the sender will retransmit (and we can already monitor
Tcp_RetransSegs there), but we have no way to monitor the packet loss
on the receiver side. When a packet is received and the receiver
detects that a previous one is missing, it increments the TCPOFOQueue
counter and replies to the sender with a selective ACK; both are
possible indications of packet loss. Packet loss can be confirmed by
taking packet captures and, ignoring Wireshark's own analysis,
carefully looking at the data being retransmitted based on the TCP
sequence numbers.

Just like RetransSegs, TCPOFOQueue should be interesting for any
deployment as a means to detect packet loss, so this change adds it
to the default list.

Signed-off-by: François Rigault <frigo@amadeus.com>
Co-authored-by: François Rigault <frigo@amadeus.com>
2023-12-08 18:24:07 +01:00
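`/proc/net/netstat` lists each protocol group as a line of field names followed by a line of values, so a counter such as `TcpExt` `TCPOFOQueue` is found by matching positions across the two lines. A minimal Go sketch of that pairing follows; `parseNetstat` is a hypothetical name, and the sample content is illustrative rather than captured from a real host.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNetstat parses /proc/net/netstat-style content, where each protocol
// appears as a header line of field names followed by a line of values:
//
//	TcpExt: SyncookiesSent TCPOFOQueue
//	TcpExt: 0 42
func parseNetstat(content string) (map[string]map[string]float64, error) {
	out := make(map[string]map[string]float64)
	lines := strings.Split(strings.TrimSpace(content), "\n")
	if len(lines)%2 != 0 {
		return nil, fmt.Errorf("expected header/value line pairs, got %d lines", len(lines))
	}
	for i := 0; i < len(lines); i += 2 {
		names := strings.Fields(lines[i])
		values := strings.Fields(lines[i+1])
		if len(names) == 0 || len(names) != len(values) || names[0] != values[0] {
			return nil, fmt.Errorf("mismatched lines %d and %d", i+1, i+2)
		}
		proto := strings.TrimSuffix(names[0], ":")
		fields := make(map[string]float64, len(names)-1)
		for j := 1; j < len(names); j++ {
			v, err := strconv.ParseFloat(values[j], 64)
			if err != nil {
				return nil, fmt.Errorf("%s %s: %w", proto, names[j], err)
			}
			fields[names[j]] = v
		}
		out[proto] = fields
	}
	return out, nil
}

func main() {
	sample := "TcpExt: SyncookiesSent TCPOFOQueue\nTcpExt: 0 42\nIpExt: InOctets\nIpExt: 1000\n"
	stats, err := parseNetstat(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("TcpExt TCPOFOQueue:", stats["TcpExt"]["TCPOFOQueue"])
}
```

Since TCPOFOQueue is a cumulative counter, it is the rate of increase between scrapes, not the absolute value, that hints at upstream packet loss.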
Gavin Lam 332232c22c
Add new collector and metrics for XFRM (#2544) (#2866)
Signed-off-by: Gavin Lam <gavin.oss@tutamail.com>
2023-12-03 17:10:59 +01:00
Simon Pasquier 12f1744e79
Fix debug log in cpu collector (#2857)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2023-11-24 16:37:27 +01:00
Tobias Klausmann 78af952e63
NFSd: handle new wdeleg_getattr attribute in /proc/net/rpc/nfsd (#2810)
This attribute was introduced in v6.6-rc1.

The relevant changes in procfs were merged here:

https://github.com/prometheus/procfs/pull/574

and are part of procfs v0.11.2

I have also figured out that the stat should be part of the v4 ops
counters struct, but that will need changes to both procfs and this
code. Since people are already using 6.6-rc1, I think it's better to get
the code out there --- even if they don't care about wdeleg_getattr,
currently they get _no_ nfsd stats with 6.6-rc1.

I will make two follow-up PRs to clean this up in the next releases of
procfs and node-exporter.

Signed-off-by: Tobias Klausmann <klausman@schwarzvogel.de>
2023-11-14 03:54:11 +01:00
dongjiang 86ed8cdc6b
NFSd: fix nfsd v4 index miss (#2824)
* fix nfsd v4 index miss

---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
2023-10-16 18:14:21 +02:00
Ben Kochie 31a9cca551
Update e2e fixtures
Update for fixes in https://github.com/prometheus/procfs/pull/543

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-10-16 13:37:17 +02:00
Conall O'Brien 60c86ab218
Fix inconsistent variable name, to address compilation issue (#2820)
https://github.com/prometheus/node_exporter/issues/2819

Signed-off-by: Conall O'Brien <conall@conall.net>
2023-10-04 21:16:58 +02:00
dongjiang e8c5110ada
fix(zfs) zfs arcstats.p on FreeBSD 14.0+ (#2754)
* dongjiang, fix zfs arcstats.p

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* dongjiang, fix gofmt -s

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* change warn log to debug log by code review

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
2023-09-20 11:49:56 +02:00
Metbog e387997e4c Move RO status before error return
Signed-off-by: Metbog <metbog@gmail.com>
2023-09-20 11:26:39 +02:00
Conall O'Brien f34aaa6109
Add ZFS freebsd per dataset stats (#2753)
* Rename parsePoolObjsetFile to parseLinuxPoolObjsetFile to better reflect
its scope
* Create a new parseFreeBSDPoolObjsetStats function to generate a list
of per-pool metrics to be queried via sysctl


---------

Signed-off-by: Conall O'Brien <conall@conall.net>
2023-09-11 06:33:21 +02:00
Daniel Swarbrick 685b98ec7f
Optionally fetch ARP stats via rtnetlink instead of procfs (#2777)
* Optionally fetch ARP stats via rtnetlink instead of procfs

Implement collection of ARP stats via rtnetlink to work around
shortcomings in the output of /proc/net/arp, which truncates InfiniBand
link-layer addresses.

Fixes: #2776

---------

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
2023-09-09 16:41:09 +02:00
Daniel Swarbrick 381f32b1c5 btrfs: close btrfs.FS handle after use
Despite being quite hard to provoke (< 10% in my testing), the btrfs
collector would occasionally leave stale FDs relating to btrfs
mountpoints, making the filesystems unable to be unmounted.

Fixes: #2772.

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
2023-08-21 16:00:00 +02:00
Josh Bradley f2b274350a
fix(qdisc) flag naming corrected for consistency (#2782)
* fix collector qdisc flag naming for consistency

---------

Signed-off-by: jbradleynh <jbradley@fastly.com>
2023-08-21 07:48:09 +02:00
John Kordich e120d958f5 Change log message from Warn to Debug
Signed-off-by: John Kordich <jkordich@gmail.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich 933b1c1797 Add new node_cpu_frequency_hertz metric
Revert changes to node_cpu_info and add new node_cpu_frequency_hertz
metric for measuring CPU frequency from /proc/cpuinfo

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
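Deriving a per-CPU frequency in hertz from `/proc/cpuinfo`, as this commit describes, amounts to tracking the current `processor` block and converting its `cpu MHz` value. A minimal Go sketch follows; `cpuFrequenciesHertz` is a hypothetical name, and the `cpu MHz` field may not be present on every architecture, in which case this sketch simply returns no entry for that CPU.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuFrequenciesHertz reads the "cpu MHz" field of each processor block in
// /proc/cpuinfo-style content and converts it to hertz.
func cpuFrequenciesHertz(cpuinfo string) map[int]float64 {
	freqs := make(map[int]float64)
	cpu := -1 // index of the processor block currently being read
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		switch key {
		case "processor":
			if n, err := strconv.Atoi(value); err == nil {
				cpu = n
			} else {
				cpu = -1
			}
		case "cpu MHz":
			if mhz, err := strconv.ParseFloat(value, 64); err == nil && cpu >= 0 {
				freqs[cpu] = mhz * 1e6 // MHz to Hz
			}
		}
	}
	return freqs
}

func main() {
	sample := "processor\t: 0\ncpu MHz\t\t: 2400.000\n\nprocessor\t: 1\ncpu MHz\t\t: 1800.500\n"
	for cpu, hz := range cpuFrequenciesHertz(sample) {
		fmt.Printf("cpu=%d hertz=%g\n", cpu, hz)
	}
}
```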
John Kordich e84c278107 Update e2e-output.txt with new expected metric values
Changes the e2e-output.txt file to have the expected CPU MHz values
for the node_cpu_info metric.

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich 223ebbd50c Add CPU MHz as the value for "node_cpu_info" metric
For CPUs which don't have an available (or insertable) cpufreq driver,
the /proc/cpuinfo file can sometimes have accurate CPU core frequency
measurements. This change replaces the constant value of "1" for the
"node_cpu_info" metric with the parsed CPU MHz value from
/proc/cpuinfo for each core.

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
Daniel Swarbrick 37ce0bab8c
Sync build tags in *_test.go (#2767)
Ensure that unwanted tests are correctly excluded when various build
tags are specified, i.e. when the code that they test would be excluded
from compilation.

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
2023-08-15 11:38:13 +02:00
Daniel Swarbrick 3fb5f70b0c Drop redundant GOOS build tags if already in filename
Drop redundant GOOS build tags at start of file if the constraint is
already specified by the filename, e.g. foo_GOOS.go or
foo_GOOS_GOARCH.go, avoiding potential confusion in future.

cf. https://pkg.go.dev/cmd/go#hdr-Build_constraints

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
2023-08-08 14:30:39 +02:00
Benoît Knecht 3b9613cfae
collector/netdev_linux.go: Fallback to 32-bit stats (#2757)
On some platforms, `msg.Attributes.Stats64` is `nil` because the kernel doesn't
expose 64-bit stats. In that case, return `msg.Attributes.Stats` instead, which
are the 32-bit equivalent.

Note that `RXOtherhostDropped` isn't available in that case, so we hardcode it
to zero.

Fixes #2756.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2023-08-01 15:58:53 +02:00
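The fallback can be sketched in Go with simplified types. The structs `linkStats`, `linkStats64` and `linkAttributes` are hypothetical stand-ins for the netlink attribute types, reduced to a few fields; the logic mirrors the description: prefer 64-bit stats, otherwise widen the 32-bit ones and report `RXOtherhostDropped` as zero.

```go
package main

import "fmt"

// linkStats and linkStats64 are simplified, hypothetical stand-ins for the
// 32-bit and 64-bit link statistics carried in a netlink message.
type linkStats struct {
	RXPackets, TXPackets uint32
}

type linkStats64 struct {
	RXPackets, TXPackets, RXOtherhostDropped uint64
}

type linkAttributes struct {
	Stats   *linkStats
	Stats64 *linkStats64
}

// linkCounters prefers the 64-bit stats and falls back to widening the 32-bit
// ones when the kernel did not provide them. RXOtherhostDropped has no 32-bit
// equivalent, so it is reported as zero in the fallback case.
func linkCounters(attrs linkAttributes) (linkStats64, bool) {
	switch {
	case attrs.Stats64 != nil:
		return *attrs.Stats64, true
	case attrs.Stats != nil:
		return linkStats64{
			RXPackets:          uint64(attrs.Stats.RXPackets),
			TXPackets:          uint64(attrs.Stats.TXPackets),
			RXOtherhostDropped: 0,
		}, true
	default:
		return linkStats64{}, false
	}
}

func main() {
	attrs := linkAttributes{Stats: &linkStats{RXPackets: 10, TXPackets: 20}}
	if s, ok := linkCounters(attrs); ok {
		fmt.Printf("rx=%d tx=%d rx_otherhost_dropped=%d\n", s.RXPackets, s.TXPackets, s.RXOtherhostDropped)
	}
}
```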
PrometheusBot fa481315b5
Synchronize common files from prometheus/prometheus (#2736)
* Update common Prometheus files

Signed-off-by: prombot <prometheus-team@googlegroups.com>

* Fixup linting issues

* Disable unused-parameter check.
* Fixup minor linting issues.

Signed-off-by: Ben Kochie <superq@gmail.com>

---------

Signed-off-by: prombot <prometheus-team@googlegroups.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
2023-07-18 10:46:59 +02:00
Ben Kochie 7c564bcbef
Fixup hwmon chip include (#2739)
Use the correct include value to the device filter function.
* Add new bogus hwmon fixture.
* Update end-to-end test to use hwmon chip include flag.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-07-10 12:46:30 +02:00
Conall O'Brien c241ecf8bd
Update all Include and Exclude variables to use the systemdUnit naming prefix (#2740)

Leave an annotation about using regexps instead of device_filter.go, so
@SuperQ doesn't need to remember everything.

Signed-off-by: Conall O'Brien <conall@conall.net>
2023-07-10 12:25:18 +02:00
Conall O'Brien 8b4dc82488
Add include and exclude filter for hwmon collector (#2699)
* Add chip name include and exclude flags to the hwmon collector, following the example in the systemd collector

---------

Signed-off-by: Conall O'Brien <conall@conall.net>
Co-authored-by: Ben Kochie <superq@gmail.com>
2023-07-07 10:30:24 +02:00
Michal c31ebb4359
Add cpu vulnerabilities reporting from sysfs (#2721)
* Add cpu vulnerabilities reporting from sysfs

---------

Signed-off-by: Michal Wasilewski <michal@mwasilewski.net>
2023-07-01 14:21:49 +02:00
Cam Cope 2346fd9b06
add missing linkspeeds (#2711)
Signed-off-by: Cam Cope <ccope@crusoeenergy.com>
2023-06-18 09:01:53 +02:00
Erica Mays bdc430af2b Parallelize stat calls in Linux filesystem collector.
This change adds the ability to process multiple stat calls in parallel.
Processing is rate-limited based on the new flag
`collector.filesystem.stat-workers` (default 4).

Caveat: filesystem stats information is no longer in the same order as
returned by `/proc/1/mounts`.  This should not be an issue.

Caveat: This change currently uses unbuffered channels to prove
correctness without reliance on buffers.  Buffered channels will yield
superior performance.

Signed-off-by: Erica Mays <erica@emays.dev>
2023-06-09 12:31:31 +02:00
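The parallel stat processing described above can be sketched as a fixed pool of goroutines fed through unbuffered channels. In this sketch the `workers` parameter plays the role of `collector.filesystem.stat-workers`; `statMounts`, `mountStats` and the injected `stat` function are hypothetical, standing in for the real statfs calls. As the commit notes, results come back in completion order rather than mount order.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// mountStats is a hypothetical, simplified result of statting one mount point.
type mountStats struct {
	MountPoint string
	SizeBytes  uint64
}

// statMounts stats every mount point using at most `workers` goroutines.
// Jobs and results travel over unbuffered channels, so results arrive in
// completion order, not in the order of the input slice.
func statMounts(mountPoints []string, workers int, stat func(string) mountStats) []mountStats {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan mountStats)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for mp := range jobs {
				results <- stat(mp)
			}
		}()
	}

	// Feed jobs from a separate goroutine so this function can drain results.
	go func() {
		for _, mp := range mountPoints {
			jobs <- mp
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []mountStats
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fakeStat := func(mp string) mountStats {
		return mountStats{MountPoint: mp, SizeBytes: uint64(len(mp)) * 1024}
	}
	stats := statMounts([]string{"/", "/boot", "/home", "/var"}, 4, fakeStat)
	// Sort only to make this demo's output stable.
	sort.Slice(stats, func(i, j int) bool { return stats[i].MountPoint < stats[j].MountPoint })
	for _, s := range stats {
		fmt.Println(s.MountPoint, s.SizeBytes)
	}
}
```

With unbuffered channels, each send blocks until a receiver is ready, which bounds concurrency to the worker count; a buffered `results` channel would let workers run ahead, in line with the commit's remark about performance.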
Dan Williams 8c5847bd94
netlink: read missing attributes from sysfs (#2669)
Read missing dev_id, name_assign_type, and addr_assign_type
from sysfs, since they only take a device-specific lock and
not the whole RTNL lock. This means reading them is much less
impactful on other system processes than many of the other
attributes in sysfs that do take the RTNL lock.

Signed-off-by: Dan Williams <dcbw@redhat.com>
2023-05-25 15:10:39 +02:00
Abbey Woodyear eaacb2e3c7
exposing softirq metrics (#2294)
Signed-off-by: abbeywoodyear <abbey.woodyear@thehutgroup.com>
2023-05-25 15:09:32 +02:00
Remi Jouannet df1b53bee2
softnet: additional metrics from softnet_data (#2592)
* softnet: additional metrics from softnet_data, https://github.com/prometheus/procfs/pull/473
---------

Signed-off-by: remi <remijouannet@gmail.com>
Signed-off-by: Rémi Jouannet <remijouannet@gmail.com>
2023-05-24 17:23:13 +02:00