Commit graph

50 commits

Author SHA1 Message Date
Vitaly Zhuravlev b7dfb32bfc Set severity to NodeCPUHighUsage to info
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 6bdc1d9c98 Add thresholds for memory, disk and system alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 77ae769179 Add thresholds for memory alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 2111e70ac7 Add comma after 'mounted on'
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev e48e7909f4 Extend alert description
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev da32f8de17 Decrease NodeSystemdServiceFailed severity to warning
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 580c497261 Add NodeSystemSaturation and NodeMemoryMajorPagesFaults
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev e15e7d6a7b Fix NodeMemoryHighUtilization alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev c3ec6e8af1 Add diskDevice selector
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 962de6c921 Add %(nodeExporterSelector)s to Network and conntrack alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 94fc82e418 Add NodeDiskIOSaturation alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 614030bb80 Set 'at' everywhere as preposition for instance
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev 3d8075da7d Decrease NodeNetwork*Errs pending period
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev 74794182a7 Add failed systemd service alert
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev fd2d62af63 Add CPU and memory alerts
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev 0e0399d41e Decrease NodeFilesystem pending time to 15m
30m is too long and there is a risk of running out of disk space/inodes completely if something is filling up disk very fast (like log file).

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev fc967aa992 Add mountpoint to NodeFilesystem alerts
This helps to identify alerting filesystem.

Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Will Bollock 0a17e17718
docs (node/mixin): fix annotation for Skew alert (#2671)
This updates the annotation for the NodeClockSkewDetected mixin alert to
match the new threshold set.

Original discussion was in this PR: https://github.com/prometheus/node_exporter/pull/1480

I spent an embarrassingly large amount of time trying to figure out how
the heck that alert would mean 300s of clock skew. Turns out the
annotation was just left the same after the threshold change.

Signed-off-by: Will Bollock <wbollock@linode.com>
2023-05-11 10:33:10 +02:00
Jan Fajerski 87b8e3790d
docs/node-mixin: add fsMointpointSelector to alerts and dashboards (#2446)
* docs/node-mixin: add fsMountpointSelector

This adds the option to add a `mountpoint` selector to filesystem
related alerts. The default is `mountpoint!=""`.

* docs/node-mixins: add fsMountpointSelector to dashboards

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2022-10-20 13:06:31 +02:00
Paweł Krupa (paulfantom) 8571536327 docs/node-mixin: add missing selectors
Signed-off-by: Paweł Krupa (paulfantom) <pawel@krupa.net.pl>
2022-07-19 16:44:16 +02:00
Daniel Lenar 0b50eb7294 Reverse fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarningThreshold
Currently critical alert for space available alerts on warning and
warning alert for space available alerts on critical.

Signed-off-by: Daniel Lenar <dlenar@vailsys.com>
2022-04-21 11:34:54 -05:00
Vitaly Zhuravlev 8823605f12 Fix NodeFileDescriptorLimit alerts
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-07 16:25:17 +04:00
paulfantom 832909dd25 docs/node-mixin/alerts: make NodeFilesystemAlmostOutOfSpace fire earlier
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 16:35:58 +02:00
Loïc Blot 55ffe57cbc
feat(rules): add NodeFileDescriptorLimit kernel exhaustion alert
Add a new alert when fs.file-nr is close to fs.file-max

Signed-off-by: Loic Blot <loic.blot@unix-experience.fr>
2021-04-30 12:40:09 +02:00
Ben Kochie eefb18db02
Merge pull request #1764 from dhoppe/patch-1
Use description instead of message as field for annotations
2021-01-24 14:56:03 +01:00
Ben Kochie 4b68aeb80a
Merge pull request #1862 from fsschmitt/fix/alerts-label-naming
fix: node_md_disks state label from fail to failed
2021-01-24 14:53:22 +01:00
Björn Rabenstein 9c9c636305
Merge pull request #1861 from paulfantom/network-alerts
docs/node-mixin/alerts: use ratio for network alerts
2020-10-19 12:14:24 +02:00
paulfantom f81747e608 docs/node-mixin/alerts: add max error condition to alert about desynchronized clock
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-08 11:15:16 +02:00
fsschmitt effa4da989 fix: node_md_disks state label as failed
Signed-off-by: fsschmitt <492108+fsschmitt@users.noreply.github.com>
2020-10-07 14:20:56 +01:00
paulfantom d7cbe85d22
docs/node-mixin/alerts: use a rate for network alerts
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-07 13:04:51 +02:00
Nicolas Lamirault ff2ff3410f
Configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert (#1835)
* Add: configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert

Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
2020-09-18 11:28:32 +02:00
Rajat Vig 7dd8adf7ed
Fix NodeRAIDDegraded to not use a string rule expressions
Signed-off-by: Rajat Vig <rvig@etsy.com>
2020-08-28 10:43:39 +01:00
Simon Pasquier 02212dd2c6 Run jsonnetfmt
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:15:30 +02:00
Hao Ke 9b7a0d06a1 Fix syntax error
Signed-off-by: Hao Ke <hao.ke@auryc.com>

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:07:37 +02:00
paulfantom e4ec8e04c5 docs/node-mixin: add alerts about failing RAID array
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-08-24 16:17:20 +02:00
Dennis Hoppe fc64b70386
Use description instead of message as field for annotations
Signed-off-by: Dennis Hoppe <github@debian-solutions.de>
2020-06-24 13:38:57 +02:00
Frederic Branczyk b42819b69d
Merge pull request #1657 from povilasv/NodeTextFileCollectorScrapeError
Add NodeTextFileCollectorScrapeError alert to mixin
2020-04-30 17:54:06 +02:00
Povilas Versockas bd3e6d224c
Add NodeTextFileCollectorScrapeError alert to mixin
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-31 18:12:36 +03:00
beorn7 8b00b22904 Fix sign error in NodeClockSkewDetected
Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-25 13:07:23 +01:00
paulfantom 820f8d595e
docs/node-mixin: alert on desynchronised clock
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-03-23 08:23:58 +01:00
Neraud 1006a2c4bb Add missing coma
Signed-off-by: Neraud <neraud.login@gmail.com>
2020-03-21 13:06:43 +01:00
Povilas Versockas 48bb6f670c Add NodeHighNumberConntrackEntriesUsed
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-20 17:46:05 +01:00
iuri aranda 0107bc7942
Make FS space alerts thresholds configurable (#1624)
* Make FS space alerts thresholds configurable (#1)

This makes it possible to tweak the thresholds for
the NodeFilesystemSpaceFillingUp alerts. Which
might be necessary in systems like Kubernetes,
where the image garbage collector runs at 85%,
so it's not a problem that the disk reaches that usage %.

Signed-off-by: iuri aranda <iuri@skyscrapers.eu>
2020-03-02 16:24:51 +01:00
Leo dfeec07f2f Fix node-mixin prometheus alert rules to use percentage
Signed-off-by: Leo <leonardjonathanoh@live.com>
2019-09-11 08:47:24 +00:00
beorn7 97ef113762 Make the severity of "critical" alerts configurable
This addresses the blissful scenario where single-node failures are
unproblematic. No reason to wake somebody up if a node is about to
screw itself up by filling the disk.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-14 22:24:24 +02:00
beorn7 3a770a0b1d Convert annotations from message to summary/description
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-16 21:40:57 +02:00
beorn7 a92d1d7889 Address review comments, batch 2
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-16 21:18:17 +02:00
beorn7 b3b47f2d07 Make selector naming consistent
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-10 20:09:01 +02:00
beorn7 dec5b5b053 Fix indentation
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-10 20:07:20 +02:00
beorn7 2df034c055 Move node-mixin into docs directory
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-05 19:38:03 +02:00
Renamed from node-mixin/alerts/alerts.libsonnet (Browse further)