Johannes 'fish' Ziemke
6f1286b314
mixin: Drop mode label for num cpu metric
...
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-09-03 12:13:35 +02:00
Johannes 'fish' Ziemke
fa9926c4eb
mixin: Cheaper calculation for instance:node_num_cpu:sum
...
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-09-03 11:34:25 +02:00
paulfantom
832909dd25
docs/node-mixin/alerts: make NodeFilesystemAlmostOutOfSpace fire earlier
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 16:35:58 +02:00
Johannes 'fish' Ziemke
7fc5c6045a
Read config from $
...
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-07-27 16:32:05 +02:00
ArthurSens
3731f93fd7
Refactor USE method mixin dashboards with grafonnet-lib, add multi-cluster support.
...
Aiming for cleaner code and following standards used on younger mixins.
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-07-27 16:32:05 +02:00
Frederic Hemberger
5bee84f30d
docs: Replace `go get` with `go install` for command installation
...
`go get` is deprecated for installation of commands as of go v1.17
Ref: https://go.googlesource.com/go/+/ced0fdbad0655d63d535390b1a7126fd1fef8348
Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>
2021-07-20 12:16:46 +02:00
Loïc Blot
55ffe57cbc
feat(rules): add NodeFileDescriptorLimit kernel exhaustion alert
...
Add a new alert when fs.file-nr is close to fs.file-max
Signed-off-by: Loic Blot <loic.blot@unix-experience.fr>
2021-04-30 12:40:09 +02:00
raviprasad_lr
504f9b785c
fix interval in graphs panels of node dashboard
...
Signed-off-by: raviprasad_lr <raviprasad_lr@yahoo.com>
2021-04-26 11:14:30 +02:00
Johannes 'fish' Ziemke
a5908bf82b
Make interval configurable
...
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-04-07 09:37:04 +02:00
Johannes 'fish' Ziemke
772335caa8
Use 5m rate in mixins
...
The default scrape interval of Prometheus is 60s, so we can't use a 1m
rate.
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2021-04-07 09:37:04 +02:00
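The reasoning in 772335caa8 can be illustrated with a small sketch. It assumes that `rate()` needs at least two samples inside its range window and that the window is left-open. The helper names are hypothetical and this is not Prometheus code.

```python
def samples_in_window(scrape_interval_s, window_s, eval_time_s, first_scrape_s=0):
    """Count scrape timestamps t with eval_time - window < t <= eval_time."""
    window_start = eval_time_s - window_s
    count = 0
    t = first_scrape_s
    while t <= eval_time_s:
        if t > window_start:
            count += 1
        t += scrape_interval_s
    return count


def rate_is_computable(scrape_interval_s, window_s, eval_time_s):
    """Assumption: a rate needs at least two samples in the window."""
    return samples_in_window(scrape_interval_s, window_s, eval_time_s) >= 2


# 60s scrape interval, evaluated at t=1000s.
one_minute = samples_in_window(60, 60, 1000)     # 1 sample: no rate possible
five_minutes = samples_in_window(60, 300, 1000)  # 5 samples: rate possible
```

Under these assumptions a 1m window holds a single sample when scraping every 60s, while a 5m window holds several, which matches the motivation given in the commit message.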
Ben Kochie
eefb18db02
Merge pull request #1764 from dhoppe/patch-1
...
Use description instead of message as field for annotations
2021-01-24 14:56:03 +01:00
Ben Kochie
4b68aeb80a
Merge pull request #1862 from fsschmitt/fix/alerts-label-naming
...
fix: node_md_disks state label from fail to failed
2021-01-24 14:53:22 +01:00
Anthony D'Atri
8b466360a3
Modest doc improvements (#1876)
...
* Modest doc improvements
Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
2020-11-25 16:46:58 +01:00
Julien Pivotto
f645d49242
Mixin: Bump jsonnet requirement to 0.16 to use go-jsonnetcmd
...
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:41:46 +01:00
Matthias Loibl
77e76485c0
Use absolute jsonnet import paths
...
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.
This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2020-10-20 11:34:43 +02:00
Björn Rabenstein
9c9c636305
Merge pull request #1861 from paulfantom/network-alerts
...
docs/node-mixin/alerts: use ratio for network alerts
2020-10-19 12:14:24 +02:00
paulfantom
f81747e608
docs/node-mixin/alerts: add max error condition to alert about desynchronized clock
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-08 11:15:16 +02:00
fsschmitt
effa4da989
fix: node_md_disks state label as failed
...
Signed-off-by: fsschmitt <492108+fsschmitt@users.noreply.github.com>
2020-10-07 14:20:56 +01:00
paulfantom
d7cbe85d22
docs/node-mixin/alerts: use a rate for network alerts
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-10-07 13:04:51 +02:00
Arthur Outhenin-Chalandre
6585e43eec
Fix memory gauge in mixin with multiple pods
...
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
2020-09-23 15:36:43 +02:00
Nicolas Lamirault
ff2ff3410f
Configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert (#1835)
...
* Add: configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert
Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
2020-09-18 11:28:32 +02:00
Rajat Vig
7dd8adf7ed
Fix NodeRAIDDegraded to not use a string rule expression
...
Signed-off-by: Rajat Vig <rvig@etsy.com>
2020-08-28 10:43:39 +01:00
Simon Pasquier
02212dd2c6
Run jsonnetfmt
...
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:15:30 +02:00
Hao Ke
9b7a0d06a1
Fix syntax error
...
Signed-off-by: Hao Ke <hao.ke@auryc.com>
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:07:37 +02:00
Simon Pasquier
6d959e2e8c
*: add mixin tests to CI
...
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 10:03:46 +02:00
paulfantom
e4ec8e04c5
docs/node-mixin: add alerts about failing RAID array
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-08-24 16:17:20 +02:00
Dennis Hoppe
fc64b70386
Use description instead of message as field for annotations
...
Signed-off-by: Dennis Hoppe <github@debian-solutions.de>
2020-06-24 13:38:57 +02:00
Frederic Branczyk
b42819b69d
Merge pull request #1657 from povilasv/NodeTextFileCollectorScrapeError
...
Add NodeTextFileCollectorScrapeError alert to mixin
2020-04-30 17:54:06 +02:00
jangdm
d4d2e1db98
fix typo in TIME.md (#1670)
...
fix typo in TIME.md
Signed-off-by: jangdm <jamin4@naver.com>
2020-04-09 09:00:00 +02:00
WOO CHANG HO
612ea0cd12
Add more compatible rules
...
Signed-off-by: zodiac12k <zodiac12k@gmail.com>
2020-04-08 10:19:44 +02:00
Povilas Versockas
bd3e6d224c
Add NodeTextFileCollectorScrapeError alert to mixin
...
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-31 18:12:36 +03:00
beorn7
8b00b22904
Fix sign error in NodeClockSkewDetected
...
Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-25 13:07:23 +01:00
paulfantom
820f8d595e
docs/node-mixin: alert on desynchronised clock
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-03-23 08:23:58 +01:00
Neraud
1006a2c4bb
Add missing comma
...
Signed-off-by: Neraud <neraud.login@gmail.com>
2020-03-21 13:06:43 +01:00
Povilas Versockas
48bb6f670c
Add NodeHighNumberConntrackEntriesUsed
...
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-20 17:46:05 +01:00
iuri aranda
0107bc7942
Make FS space alerts thresholds configurable (#1624)
...
* Make FS space alerts thresholds configurable (#1)
This makes it possible to tweak the thresholds for
the NodeFilesystemSpaceFillingUp alerts, which
might be necessary in systems like Kubernetes,
where the image garbage collector runs at 85%,
so it is not a problem if disk usage reaches that level.
Signed-off-by: iuri aranda <iuri@skyscrapers.eu>
2020-03-02 16:24:51 +01:00
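A minimal sketch of why configurable thresholds matter, loosely following the reasoning in 0107bc7942. The threshold values, key names, and the simplified "percent free" check are assumptions for illustration. The real alert expressions may include further conditions.

```python
# Hypothetical defaults: fire when less than this percentage of space is free.
DEFAULT_THRESHOLDS = {"warning": 40, "critical": 20}


def firing_severity(free_percent, thresholds=DEFAULT_THRESHOLDS):
    """Return the most severe alert level that would fire, or None."""
    if free_percent < thresholds["critical"]:
        return "critical"
    if free_percent < thresholds["warning"]:
        return "warning"
    return None


# A node sitting at 85% usage, where image garbage collection starts.
free_at_gc = 100 - 85

with_defaults = firing_severity(free_at_gc)  # "critical"
with_override = firing_severity(free_at_gc, {"warning": 10, "critical": 5})  # None
```

With the assumed defaults, a node resting at its garbage-collection level would alert permanently. Lowering the thresholds for such a deployment avoids that.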
paulfantom
40570924b1
docs/node-mixin/dashboards: do not mix tabs and spaces
...
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2019-11-01 15:46:21 +01:00
beorn7
c6914477f5
Fix the normalization for the cluster-wide dashboards
...
For the cluster-wide view, we actually have to count or sum, respectively,
_all_ the selected metrics. This means it is easiest to use the `scalar`
approach after all (but only in the cluster dashboard). This still
propagates all the labels.
I have extended the comment for the `nodeExporterSelector` to note
that the cluster dashboard only makes sense if all the selected node
exporters actually belong to the same cluster.
Since this is jsonnet, users can easily disable the cluster
dashboard. Or even create multiple instances of the dashboards with
different `nodeExporterSelector`s for different clusters.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-10-30 22:52:36 +01:00
Benoît Knecht
5a7b85876d
docs/node-mixin: Improve memory pressure rule
...
The `instance:node_memory_swap_io_pages:rate1m` rule was intended to
measure the amount of memory pressure a system is under, but its name is
a bit misleading (it specifically refers to swap), and the rate of
`node_vmstat_pgmajfault` is a better metric for memory pressure
(see #1524).
This commit renames `instance:node_memory_swap_io_pages:rate1m` to
`instance:node_vmstat_pgmajfault:rate1m`, and defines it as
`rate(node_vmstat_pgmajfault{%(nodeExporterSelector)s}[1m])`. The
dashboards are updated accordingly.
Signed-off-by: Benoît Knecht <benoit.knecht@fsfe.org>
2019-10-28 15:12:42 +01:00
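The expression template quoted in 5a7b85876d uses a named `%(...)s` placeholder. The sketch below renders it with Python's `%` operator as a stand-in for the mixin's own templating. The selector value `job="node"` and the helper name are assumptions for illustration.

```python
RULE_NAME = "instance:node_vmstat_pgmajfault:rate1m"
EXPR_TEMPLATE = "rate(node_vmstat_pgmajfault{%(nodeExporterSelector)s}[1m])"


def render_rule(node_exporter_selector):
    """Fill the selector placeholder and return a recording-rule-like dict."""
    return {
        "record": RULE_NAME,
        "expr": EXPR_TEMPLATE % {"nodeExporterSelector": node_exporter_selector},
    }


# Assumed selector; substitute whatever matches the local scrape config.
rule = render_rule('job="node"')
print(rule["expr"])  # rate(node_vmstat_pgmajfault{job="node"}[1m])
```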
Scott Brenner
813a4bdf8b
Two quick typo fixes
...
Signed-off-by: Scott Brenner <scott@scottbrenner.me>
2019-10-09 20:42:27 -07:00
Björn Rabenstein
855a1f1d18
Merge pull request #1482 from leojonathanoh/fix-node-mixin-prometheus-alert-rules-to-use-percentage
...
Fix node-mixin prometheus alert rules to use percentage
2019-09-26 20:01:18 +02:00
Sergiusz Urbaniak
f4417b209a
node-mixin: fix configuration for unset fsSelector/diskDeviceSelector
...
As per https://github.com/prometheus/node_exporter/pull/1429#discussion_r304210103
we want to fetch all devices and all fs types.
Currently, this is done by setting an empty string, which breaks most queries that rely on it.
This fixes it by setting the appropriate selector instead of an empty string.
Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
2019-09-12 14:02:56 +02:00
Sergiusz Urbaniak
ed78237036
node-mixin: fix query in Disk Space Utilisation dashboard
...
Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
2019-09-12 14:02:56 +02:00
Leo
dfeec07f2f
Fix node-mixin prometheus alert rules to use percentage
...
Signed-off-by: Leo <leonardjonathanoh@live.com>
2019-09-11 08:47:24 +00:00
Björn Rabenstein
ab8cf1f718
Node mixin: Clarify dashboard dependency on rules (#1475)
...
Following @discordianfish's suggestion
[here](https://github.com/prometheus/node_exporter/issues/1454#issuecomment-524225222).
Signed-off-by: beorn7 <beorn@grafana.com>
2019-09-08 10:55:43 +02:00
beorn7
76ff263ca6
Update legendLink
...
This still had the 'k8s' in as it was copied and pasted from the
kubernetes-mixin.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-20 18:49:12 +02:00
Björn Rabenstein
0f38d680b4
Merge pull request #1449 from prometheus/beorn7/mixin3
...
node-mixin: Make the severity of "critical" alerts configurable
2019-08-19 13:55:52 +02:00
beorn7
44e5731de7
Add line for number of cores to load graph
...
Backported from the node dashboard in the kubernetes-mixin.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 16:43:57 +02:00
beorn7
024d5ed55e
Fix title of CPU panel to usage
...
We use the `mode="idle"` metric, but we are inverting it, so this is
usage, and that's intended.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 16:36:10 +02:00
beorn7
a016d9cd6f
node-mixin: Improve disk usage panel
...
- Use a stacked graph instead of a gauge as development over time is
especially useful for disk space usage.
- By only taking one metric per device into account, we avoid
double-counting for devices that are mounted multiple times.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 16:32:54 +02:00
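The double-counting issue described in a016d9cd6f can be sketched as follows. This is an illustration only: the sample data and the helper name are hypothetical, and keeping the maximum per device is one assumed way of "taking one metric per device", not necessarily what the dashboard query does.

```python
def size_per_device(samples):
    """Keep a single size value per device, ignoring extra mountpoints.

    samples: iterable of (device, mountpoint, size_bytes) tuples.
    """
    per_device = {}
    for device, _mountpoint, size_bytes in samples:
        per_device[device] = max(per_device.get(device, 0), size_bytes)
    return per_device


samples = [
    ("/dev/sda1", "/", 100),
    ("/dev/sda1", "/mnt/bind", 100),  # same device mounted a second time
    ("/dev/sdb1", "/data", 200),
]

naive_total = sum(size for _, _, size in samples)           # 400, counts sda1 twice
deduplicated_total = sum(size_per_device(samples).values())  # 300
```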