Vitaly Zhuravlev
b7dfb32bfc
Set severity to NodeCPUHighUsage to info
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
6bdc1d9c98
Add thresholds for memory, disk and system alerts
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
77ae769179
Add thresholds for memory alerts
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
2111e70ac7
Add comma after 'mounted on'
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
e48e7909f4
Extend alert description
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
da32f8de17
Decrease NodeSystemdServiceFailed severity to warning
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
580c497261
Add NodeSystemSaturation and NodeMemoryMajorPagesFaults
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
e15e7d6a7b
Fix NodeMemoryHighUtilization alert
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
c3ec6e8af1
Add diskDevice selector
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
962de6c921
Add %(nodeExporterSelector)s to Network and conntrack alerts
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
94fc82e418
Add NodeDiskIOSaturation alert
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
614030bb80
Set 'at' everywhere as preposition for instance
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:52 +08:00
Vitaly Zhuravlev
3d8075da7d
Decrease NodeNetwork*Errs pending period
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
74794182a7
Add failed systemd service alert
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
fd2d62af63
Add CPU and memory alerts
...
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
0e0399d41e
Decrease NodeFilesystem pending time to 15m
...
30m is too long and there is a risk of running out of disk space/inodes completely if something is filling up disk very fast (like log file).
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Vitaly Zhuravlev
fc967aa992
Add mountpoint to NodeFilesystem alerts
...
This helps to identify alerting filesystem.
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
2023-06-29 23:26:51 +08:00
Will Bollock
0a17e17718
docs (node/mixin): fix annotation for Skew alert ( #2671 )
...
This updates the annotation for the NodeClockSkewDetected mixin alert to
match the new threshold set.
Original discussion was in this PR: https://github.com/prometheus/node_exporter/pull/1480
I spent an embarrassingly large amount of time trying to figure out how
the heck that alert would mean 300s of clock skew. Turns out the
annotation was just left the same after the threshold change.
Signed-off-by: Will Bollock <wbollock@linode.com>
2023-05-11 10:33:10 +02:00
Jan Fajerski
87b8e3790d
docs/node-mixin: add fsMointpointSelector to alerts and dashboards ( #2446 )
...
* docs/node-mixin: add fsMountpointSelector
This adds the option to add a `mountpoint` selector to filesystem
related alerts. The default is `mountpoint!=""`.
* docs/node-mixins: add fsMountpointSelector to dashboards
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2022-10-20 13:06:31 +02:00
Paweł Krupa (paulfantom)
8571536327
docs/node-mixin: add missing selectors
...
Signed-off-by: Paweł Krupa (paulfantom) <pawel@krupa.net.pl>
2022-07-19 16:44:16 +02:00
Daniel Lenar
0b50eb7294
Reverse fsSpaceAvailableCriticalThreshold and fsSpaceAvailableWarningThreshold
...
Currently critical alert for space available alerts on warning and
warning alert for space available alerts on critical.
Signed-off-by: Daniel Lenar <dlenar@vailsys.com>
2022-04-21 11:34:54 -05:00
Vitaly Zhuravlev
8823605f12
Fix NodeFileDescriptorLimit alerts
...
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
2022-04-07 16:25:17 +04:00
paulfantom
832909dd25
docs/node-mixin/alerts: make NodeFilesystemAlmostOutOfSpace fire earlier
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 16:35:58 +02:00
Loïc Blot
55ffe57cbc
feat(rules): add NodeFileDescriptorLimit kernel exhaustion alert
...
Add a new alert when fs.file-nr is close to fs.file-max
Signed-off-by: Loic Blot <loic.blot@unix-experience.fr>
2021-04-30 12:40:09 +02:00
Ben Kochie
eefb18db02
Merge pull request #1764 from dhoppe/patch-1
...
Use description instead of message as field for annotations
2021-01-24 14:56:03 +01:00
Ben Kochie
4b68aeb80a
Merge pull request #1862 from fsschmitt/fix/alerts-label-naming
...
fix: node_md_disks state label from fail to failed
2021-01-24 14:53:22 +01:00
Björn Rabenstein
9c9c636305
Merge pull request #1861 from paulfantom/network-alerts
...
docs/node-mixin/alerts: use ratio for network alerts
2020-10-19 12:14:24 +02:00
paulfantom
f81747e608
docs/node-mixin/alerts: add max error condition to alert about desynchronized clock
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-08 11:15:16 +02:00
fsschmitt
effa4da989
fix: node_md_disks state label as failed
...
Signed-off-by: fsschmitt <492108+fsschmitt@users.noreply.github.com>
2020-10-07 14:20:56 +01:00
paulfantom
d7cbe85d22
docs/node-mixin/alerts: use a rate for network alerts
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-07 13:04:51 +02:00
Nicolas Lamirault
ff2ff3410f
Configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert ( #1835 )
...
* Add: configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert
Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
2020-09-18 11:28:32 +02:00
Rajat Vig
7dd8adf7ed
Fix NodeRAIDDegraded to not use a string rule expressions
...
Signed-off-by: Rajat Vig <rvig@etsy.com>
2020-08-28 10:43:39 +01:00
Simon Pasquier
02212dd2c6
Run jsonnetfmt
...
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:15:30 +02:00
Hao Ke
9b7a0d06a1
Fix syntax error
...
Signed-off-by: Hao Ke <hao.ke@auryc.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:07:37 +02:00
paulfantom
e4ec8e04c5
docs/node-mixin: add alerts about failing RAID array
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-08-24 16:17:20 +02:00
Dennis Hoppe
fc64b70386
Use description instead of message as field for annotations
...
Signed-off-by: Dennis Hoppe <github@debian-solutions.de>
2020-06-24 13:38:57 +02:00
Frederic Branczyk
b42819b69d
Merge pull request #1657 from povilasv/NodeTextFileCollectorScrapeError
...
Add NodeTextFileCollectorScrapeError alert to mixin
2020-04-30 17:54:06 +02:00
Povilas Versockas
bd3e6d224c
Add NodeTextFileCollectorScrapeError alert to mixin
...
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-31 18:12:36 +03:00
beorn7
8b00b22904
Fix sign error in NodeClockSkewDetected
...
Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-25 13:07:23 +01:00
paulfantom
820f8d595e
docs/node-mixin: alert on desynchronised clock
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-03-23 08:23:58 +01:00
Neraud
1006a2c4bb
Add missing coma
...
Signed-off-by: Neraud <neraud.login@gmail.com>
2020-03-21 13:06:43 +01:00
Povilas Versockas
48bb6f670c
Add NodeHighNumberConntrackEntriesUsed
...
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-20 17:46:05 +01:00
iuri aranda
0107bc7942
Make FS space alerts thresholds configurable ( #1624 )
...
* Make FS space alerts thresholds configurable (#1 )
This makes it possible to tweak the thresholds for
the NodeFilesystemSpaceFillingUp alerts. Which
might be necessary in systems like Kubernetes,
where the image garbage collector runs at 85%,
so it's not a problem that the disk reaches that usage %.
Signed-off-by: iuri aranda <iuri@skyscrapers.eu>
2020-03-02 16:24:51 +01:00
Leo
dfeec07f2f
Fix node-mixin prometheus alert rules to use percentage
...
Signed-off-by: Leo <leonardjonathanoh@live.com>
2019-09-11 08:47:24 +00:00
beorn7
97ef113762
Make the severity of "critical" alerts configurable
...
This addresses the blissful scenario where single-node failures are
unproblematic. No reason to wake somebody up if a node is about to
screw itself up by filling the disk.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-14 22:24:24 +02:00
beorn7
3a770a0b1d
Convert annotations from message to summary/description
...
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-16 21:40:57 +02:00
beorn7
a92d1d7889
Address review comments, batch 2
...
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-16 21:18:17 +02:00
beorn7
b3b47f2d07
Make selector naming consistent
...
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-10 20:09:01 +02:00
beorn7
dec5b5b053
Fix indentation
...
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-10 20:07:20 +02:00
beorn7
2df034c055
Move node-mixin into docs directory
...
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-05 19:38:03 +02:00