Automatically add a uid to each dashboard.
This keeps dashboard URLs from changing when a Grafana pod restarts
and re-imports the dashboards via ConfigMaps.
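
A minimal sketch of the idea, assuming the uid is derived from a hash of the dashboard file name. The helper name, the file names, and the choice of MD5 are illustrative assumptions, not taken from the mixin:

```python
import hashlib


def dashboard_uid(filename: str) -> str:
    """Return a stable uid for a dashboard (hypothetical helper)."""
    # Hashing the file name yields the same uid on every import, so the
    # dashboard URL stays the same after a restart and re-import.
    return hashlib.md5(filename.encode("utf-8")).hexdigest()


uid_first_import = dashboard_uid("nodes.json")
uid_second_import = dashboard_uid("nodes.json")
```

Both calls return the same value, which is the property that keeps the URL stable.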
Signed-off-by: Stefan Andres <sandres@anaconda.com>
30m is too long: there is a risk of running out of disk space or inodes completely if something fills the disk very quickly (for example, a log file).
Signed-off-by: Vitaly Zhuravlev <v-zhuravlev@users.noreply.github.com>
This updates the annotation for the NodeClockSkewDetected mixin alert to
match the new threshold set.
Original discussion was in this PR: https://github.com/prometheus/node_exporter/pull/1480
I spent an embarrassingly large amount of time trying to figure out how
the heck that alert would mean 300s of clock skew. Turns out the
annotation was just left the same after the threshold change.
Signed-off-by: Will Bollock <wbollock@linode.com>
The ntp collector has always been a source of confusion and problems.
The data it produces is more of a blackbox probe against an NTP server.
The time sync / offset data produced is not what users expect.
Mark this collector as deprecated, to be removed in v2.0.0.
Signed-off-by: Ben Kochie <superq@gmail.com>
* docs/node-mixin: add fsMountpointSelector
This adds the option to add a `mountpoint` selector to filesystem
related alerts. The default is `mountpoint!=""`.
* docs/node-mixins: add fsMountpointSelector to dashboards
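
As a rough illustration of how such a selector could be combined with an existing one, here is a sketch in Python. The function name, the `fstype` selector, and the way the parts are joined are assumptions made for the example; only the default `mountpoint!=""` comes from this change:

```python
def filesystem_selector(fs_selector: str,
                        mountpoint_selector: str = 'mountpoint!=""') -> str:
    """Join label matchers into one PromQL selector (hypothetical helper)."""
    # Skip empty parts so the result never contains a stray comma.
    parts = [p for p in (fs_selector, mountpoint_selector) if p]
    return "{" + ",".join(parts) + "}"


# With the default, every non-empty mountpoint matches.
default_expr = "node_filesystem_avail_bytes" + filesystem_selector('fstype!=""')

# Overriding the selector restricts alerts to a single mountpoint.
root_only_expr = "node_filesystem_avail_bytes" + filesystem_selector(
    'fstype!=""', 'mountpoint="/"'
)
```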
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
When applying rate() to a counter measured in seconds, the result is 'seconds per second', i.e. a fraction of a second, so it can only range from 0 to 1.
Also update intervalFactor to 1 for better rates.
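
To make the 0 to 1 range concrete, here is a simplified sketch of a per-second rate over two samples of a seconds counter. It ignores the extrapolation and counter-reset handling that Prometheus's rate() performs, and the sample values are made up:

```python
def per_second_rate(first: tuple, last: tuple) -> float:
    """Average per-second increase between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


# A seconds counter that grew by 30 s over a 60 s window: the resource
# was busy for half of the wall-clock time.
busy_fraction = per_second_rate((0.0, 100.0), (60.0, 130.0))
```

A single counter cannot accumulate more than one second per second of wall-clock time, so the result stays between 0 and 1.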
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
https://prometheus.io/docs/prometheus/latest/querying/functions/#rate
rate() calculates the per-second average rate, so Bps units should be used for disks.
In networking, throughput is usually measured in bits/s, so the units are changed accordingly.
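
The unit change amounts to a factor of eight. A small sketch, with a hypothetical helper name and a made-up throughput value:

```python
BITS_PER_BYTE = 8


def bytes_to_bits_per_second(bytes_per_second: float) -> float:
    """Convert a per-second byte rate to bits per second."""
    return bytes_per_second * BITS_PER_BYTE


# A network interface moving 1,250,000 bytes/s carries 10 Mbit/s.
network_bits_per_second = bytes_to_bits_per_second(1_250_000)
```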
Signed-off-by: Vitaly Zhuravlev <zhuravlev.vitaly@gmail.com>
fix the following markdownlint errors (and some more):
[..]mixins/node-exporter/README.md:13: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:21: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:27: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:33: MD031 Fenced code blocks should be surrounded by blank lines
[..]mixins/node-exporter/README.md:41: MD034 Bare URL used
A detailed description of the rules is available at https://github.com/markdownlint/markdownlint/blob/master/docs/RULES.md
Signed-off-by: Sven Kieske <s.kieske@mittwald.de>
Problem: In 0b50eb7294 the usage of the threshold variables was
adjusted. The values were switched as well, resulting in reversed
thresholds after that commit. Warnings now have a smaller threshold
than critical alerts.
Solution: Adjust thresholds to reflect that warnings should be alerted
on before critical alerts.
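
The intended ordering can be sketched as follows. The threshold values and the function name are hypothetical and only illustrate that, for an "available space" alert, the warning threshold must be the larger of the two:

```python
from typing import Optional

# Hypothetical thresholds, in percent of space still available.
WARNING_THRESHOLD = 5.0
CRITICAL_THRESHOLD = 3.0


def severity(available_percent: float) -> Optional[str]:
    """Return the alert severity for a given percentage of free space."""
    # Check the stricter (smaller) threshold first.
    if available_percent < CRITICAL_THRESHOLD:
        return "critical"
    if available_percent < WARNING_THRESHOLD:
        return "warning"
    return None
```

With the values reversed, a filesystem at 4% free would be reported as critical before any warning had fired.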
Issues: https://github.com/prometheus/node_exporter/pull/2352
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Currently the critical alert for available space fires at the warning
threshold, and the warning alert fires at the critical threshold.
Signed-off-by: Daniel Lenar <dlenar@vailsys.com>
'iowait' and 'steal' indicate specific idle/wait states, which shouldn't
be counted into CPU Utilisation. Also see
https://github.com/prometheus-operator/kube-prometheus/pull/796 and
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667.
Per the iostat man page:

  %idle
    Show the percentage of time that the CPU or CPUs were idle and the
    system did not have an outstanding disk I/O request.
  %iowait
    Show the percentage of time that the CPU or CPUs were idle during
    which the system had an outstanding disk I/O request.
  %steal
    Show the percentage of time spent in involuntary wait by the
    virtual CPU or CPUs while the hypervisor was servicing another
    virtual processor.
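
A small sketch of the resulting calculation, assuming per-mode CPU time is already expressed as fractions of one second. The function name and the sample values are made up for illustration:

```python
# Modes that represent idle or waiting time rather than work done.
NON_BUSY_MODES = ("idle", "iowait", "steal")


def cpu_utilisation(mode_fractions: dict) -> float:
    """Return the busy fraction of CPU time, excluding idle/wait states."""
    not_busy = sum(mode_fractions.get(mode, 0.0) for mode in NON_BUSY_MODES)
    return 1.0 - not_busy


sample = {
    "user": 0.25,
    "system": 0.25,
    "idle": 0.25,
    "iowait": 0.125,
    "steal": 0.125,
}
utilisation = cpu_utilisation(sample)
```

Counting only `idle` as not busy would report 0.75 for the same sample, overstating how much work the CPU actually did.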
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>