Commit graph

2067 commits

Author SHA1 Message Date
dependabot[bot] e590476dc7
build(deps): bump golang.org/x/sys from 0.10.0 to 0.12.0 (#2797)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.10.0 to 0.12.0.
- [Commits](https://github.com/golang/sys/compare/v0.10.0...v0.12.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-09 17:09:30 +02:00
Daniel Swarbrick 685b98ec7f
Optionally fetch ARP stats via rtnetlink instead of procfs (#2777)
* Optionally fetch ARP stats via rtnetlink instead of procfs

Implement collection of ARP stats via rtnetlink to work around
shortcomings in the output of /proc/net/arp, which truncates InfiniBand
link-layer addresses.

Fixes: #2776

---------

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
2023-09-09 16:41:09 +02:00
Ben Kochie cda1d820bb
Update to Go 1.21 (#2796)
* Update Go build to 1.21.
* Update machine images to Ubuntu 22.04 current.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-09-09 15:44:48 +02:00
Daniel Swarbrick 381f32b1c5 btrfs: close btrfs.FS handle after use
Despite being quite hard to provoke (< 10% in my testing), the btrfs
collector would occasionally leave stale FDs relating to btrfs
mountpoints, making the filesystems unable to be unmounted.

Fixes: #2772.

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
2023-08-21 16:00:00 +02:00
Josh Bradley f2b274350a
fix(qdisc) flag naming corrected for consistency (#2782)
* fix collector qdisc flag naming for consistency

---------

Signed-off-by: jbradleynh <jbradley@fastly.com>
2023-08-21 07:48:09 +02:00
John Kordich e120d958f5 Change log message from Warn to Debug
Signed-off-by: John Kordich <jkordich@gmail.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich 933b1c1797 Add new node_cpu_frequency_hertz metric
Revert changes to node_cpu_info and add new node_cpu_frequency_hertz
metric for measuring CPU frequency from /proc/cpuinfo

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich e84c278107 Update e2e-output.txt with new expected metric values
Changes the e2e-output.txt file to have the expected CPU MHz values
for the node_cpu_info metric.

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich 223ebbd50c Add CPU MHz as the value for "node_cpu_info" metric
For CPUs which don't have an available (or insertable) cpufreq driver,
the /proc/cpuinfo file can sometimes have accurate CPU core frequency
measurements. This change replaces the constant value of "1" for the
"node_cpu_info" metric with the parsed CPU MHz value from
/proc/cpuinfo for each core.

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
takt 6225435677
Upgrade github.com/ema/qdisc to v1.0.0 to improve qdisc collector (#2779)
performance

Signed-off-by: Oliver Geiselhardt-Herms <ogh@deepl.com>
Co-authored-by: Oliver Geiselhardt-Herms <ogh@deepl.com>
2023-08-18 15:20:22 +02:00
Daniel Swarbrick 37ce0bab8c
Sync build tags in *_test.go (#2767)
Ensure that unwanted tests are correctly excluded when various build
tags are specified, i.e. when the code that they test would be excluded
from compilation.

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
2023-08-15 11:38:13 +02:00
Daniel Swarbrick 3fb5f70b0c Drop redundant GOOS build tags if already in filename
Drop redundant GOOS build tags at start of file if the constraint is
already specified by the filename, e.g. foo_GOOS.go or
foo_GOOS_GOARCH.go, avoiding potential confusion in future.

cf. https://pkg.go.dev/cmd/go#hdr-Build_constraints

Signed-off-by: Daniel Swarbrick <daniel.swarbrick@gmail.com>
2023-08-08 14:30:39 +02:00
dependabot[bot] c6c28d915c
build(deps): bump github.com/jsimonetti/rtnetlink from 1.3.3 to 1.3.4 (#2765)
Bumps [github.com/jsimonetti/rtnetlink](https://github.com/jsimonetti/rtnetlink) from 1.3.3 to 1.3.4.
- [Release notes](https://github.com/jsimonetti/rtnetlink/releases)
- [Commits](https://github.com/jsimonetti/rtnetlink/compare/v1.3.3...v1.3.4)

---
updated-dependencies:
- dependency-name: github.com/jsimonetti/rtnetlink
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-02 18:13:53 +02:00
dependabot[bot] 60f08e0aac
build(deps): bump github.com/prometheus/procfs from 0.11.0 to 0.11.1 (#2763)
Bumps [github.com/prometheus/procfs](https://github.com/prometheus/procfs) from 0.11.0 to 0.11.1.
- [Release notes](https://github.com/prometheus/procfs/releases)
- [Commits](https://github.com/prometheus/procfs/compare/v0.11.0...v0.11.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/procfs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-02 18:13:17 +02:00
dependabot[bot] 35278b94f8
build(deps): bump github.com/beevik/ntp from 1.1.1 to 1.3.0 (#2762)
Signed-off-by: Ben Kochie <superq@gmail.com>
2023-08-02 17:49:21 +02:00
Benoît Knecht 3b9613cfae
collector/netdev_linux.go: Fallback to 32-bit stats (#2757)
On some platforms, `msg.Attributes.Stats64` is `nil` because the kernel doesn't
expose 64-bit stats. In that case, return `msg.Attributes.Stats` instead, which
are the 32-bit equivalent.

Note that `RXOtherhostDropped` isn't available in that case, so we hardcode it
to zero.

Fixes #2756.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2023-08-01 15:58:53 +02:00
L 5d1b96c936 Include drm collector in README
The DRM collector was missing in the README, this change includes it together with a short description.

Signed-off-by: L <3177243+LukeLR@users.noreply.github.com>
2023-07-31 13:14:13 +01:00
PrometheusBot 8fb4f78ce5
Update common Prometheus files (#2752)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
2023-07-18 21:08:19 +02:00
PrometheusBot fa481315b5
Synchronize common files from prometheus/prometheus (#2736)
* Update common Prometheus files

Signed-off-by: prombot <prometheus-team@googlegroups.com>

* Fixup linting issues

* Disbale unused-parameter check.
* Fixup minor linting issues.

Signed-off-by: Ben Kochie <superq@gmail.com>

---------

Signed-off-by: prombot <prometheus-team@googlegroups.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
2023-07-18 10:46:59 +02:00
Ben Kochie 2d8069208c
Release v1.6.1 (#2747)
Rebuild with latest Go compiler bugfix release.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-07-17 13:58:44 +02:00
Ben Kochie 7c564bcbef
Fixup hwmon chip include (#2739)
Use the correct include value to the device filter function.
* Add new bogus hwmon fixture.
* Update end-to-end test to use hwmon chip include flag.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-07-10 12:46:30 +02:00
Conall O'Brien c241ecf8bd
Update all Include and Exclude variables to use the systemdUnit naming (#2740)
prefix.

Leave an annotation about using regexps instead of device_filter.go, so
@SuperQ doesn't need to remember everything.

Signed-off-by: Conall O'Brien <conall@conall.net>
2023-07-10 12:25:18 +02:00
Gabi Davar f4344579d5
Add missing ethtool flag documentation (#2743)
Signed-off-by: Gabi Davar <grizzly.nyo@gmail.com>
2023-07-08 09:36:36 +02:00
Conall O'Brien 8b4dc82488
Add include and exclude filter for hwmon collector (#2699)
* Add include and exclude flags chip name flags to hwmon collector, following example in systemd collector

---------

Signed-off-by: Conall O'Brien <conall@conall.net>
Co-authored-by: Ben Kochie <superq@gmail.com>
2023-07-07 10:30:24 +02:00
Ben Kochie ed57c15e2c
Merge pull request #2644 from v-zhuravlev/mixin_alerts
Mixin: Add and update alerts
2023-07-04 09:12:00 +02:00
dependabot[bot] a24344d4a8 build(deps): bump github.com/prometheus/client_golang
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.15.1 to 1.16.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.15.1...v1.16.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-03 11:57:59 +02:00
dependabot[bot] 74da9f9d85 build(deps): bump github.com/beevik/ntp from 1.0.0 to 1.1.1
Bumps [github.com/beevik/ntp](https://github.com/beevik/ntp) from 1.0.0 to 1.1.1.
- [Release notes](https://github.com/beevik/ntp/releases)
- [Changelog](https://github.com/beevik/ntp/blob/main/RELEASE_NOTES.md)
- [Commits](https://github.com/beevik/ntp/compare/v1.0.0...v1.1.1)

---
updated-dependencies:
- dependency-name: github.com/beevik/ntp
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-03 11:57:04 +02:00
Michal c31ebb4359
Add cpu vulnerabilities reporting from sysfs (#2721)
* Add cpu vulnerabilities reporting from sysfs

---------

Signed-off-by: Michal Wasilewski <michal@mwasilewski.net>
2023-07-01 14:21:49 +02:00
prombot 3e3ab1778b Update common Prometheus files
Signed-off-by: prombot <prometheus-team@googlegroups.com>
2023-07-01 13:14:03 +02:00
Vitaly Zhuravlev e8d7f4e8b3 Revert alerts pending durtions
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly 3e250a95a0 Update NodeSystemSaturation severity
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev b7dfb32bfc Set severity to NodeCPUHighUsage to info
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 6bdc1d9c98 Add thresholds for memory, disk and system alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 77ae769179 Add thresholds for memory alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 2111e70ac7 Add comma after 'mounted on'
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev e48e7909f4 Extend alert description
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev da32f8de17 Decrease NodeSystemdServiceFailed severity to warning
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 580c497261 Add NodeSystemSaturation and NodeMemoryMajorPagesFaults
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev e15e7d6a7b Fix NodeMemoryHighUtilization alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev c3ec6e8af1 Add diskDevice selector
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 962de6c921 Add %(nodeExporterSelector)s to Network and conntrack alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 94fc82e418 Add NodeDiskIOSaturation alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 614030bb80 Set 'at' everywhere as preposition for instance
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 3d8075da7d Decrease NodeNetwork*Errs pending period
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev 74794182a7 Add failed systemd service alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev fd2d62af63 Add CPU and memory alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev 0e0399d41e Decrease NodeFilesystem pending time to 15m
30m is too long and there is a risk of running out of disk space/inodes completely if something is filling up disk very fast (like log file).

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev fc967aa992 Add mountpoint to NodeFilesystem alerts
This helps to identify alerting filesystem.

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Ben Kochie a11de2ede5
Update golangci-lint config (#2722)
* Migrate from Python codespell to golangci-lint misspell.
* Inline errcheck exclude list in the golangci-lint config.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-06-21 10:07:30 +02:00
PrometheusBot d1b634fb80
Update common Prometheus files (#2723)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
2023-06-18 10:20:37 +02:00