Currently the critical alert for available space fires at the warning
threshold, and the warning alert fires at the critical threshold.
Signed-off-by: Daniel Lenar <dlenar@vailsys.com>
'iowait' and 'steal' indicate specific idle/wait states, which shouldn't
be counted as CPU utilisation. Also see
https://github.com/prometheus-operator/kube-prometheus/pull/796 and
https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/667.
Per the iostat man page:
    %idle
        Show the percentage of time that the CPU or CPUs were idle and the
        system did not have an outstanding disk I/O request.
    %iowait
        Show the percentage of time that the CPU or CPUs were idle during
        which the system had an outstanding disk I/O request.
    %steal
        Show the percentage of time spent in involuntary wait by the
        virtual CPU or CPUs while the hypervisor was servicing another
        virtual processor.
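As a rough illustration of the reasoning above, the following Python sketch computes a busy fraction from per-mode CPU time deltas while leaving out 'idle', 'iowait' and 'steal'. The function name, the dictionary layout and the sample numbers are assumptions made for this example; it is not the recording rule from the mixin.

```python
# Modes treated as idle/wait states and therefore not counted as busy time.
NON_BUSY_MODES = {"idle", "iowait", "steal"}


def cpu_utilisation(mode_seconds):
    """Return the busy fraction (0..1) from per-mode CPU time deltas.

    mode_seconds maps a CPU mode name to the seconds spent in that mode
    over some window (hypothetical input shape for this sketch).
    """
    total = sum(mode_seconds.values())
    if total == 0:
        return 0.0
    busy = sum(
        seconds
        for mode, seconds in mode_seconds.items()
        if mode not in NON_BUSY_MODES
    )
    return busy / total


# Made-up sample: only 'user' and 'system' count as busy time.
sample = {"user": 30, "system": 10, "idle": 40, "iowait": 15, "steal": 5}
utilisation = cpu_utilisation(sample)
```

With this sample, 40 of 100 seconds are busy. If 'iowait' and 'steal' were counted, the same sample would report 60 busy seconds.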
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
After a recent change in prometheus/prometheus, Makefile.common now
includes a yamllint target, which currently fails. This PR adds the
missing yamllint config and fixes the yamllint errors.
Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>
For memory utilization, add a fallback to Buffers+Cached+MemFree+Slab on
older Linux kernels where the MemAvailable metric is not available.
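The fallback can be sketched in Python as follows. This is only an illustration of the logic described above: the function name and the dictionary keyed by /proc/meminfo-style field names are assumptions for the example, not the expression used in the rules.

```python
# Fields summed as a stand-in when MemAvailable is missing.
FALLBACK_FIELDS = ("Buffers", "Cached", "MemFree", "Slab")


def memory_utilisation(meminfo):
    """Return the used-memory fraction (0..1).

    meminfo is a dict of byte values keyed by field name (hypothetical
    input shape). MemAvailable is preferred; older kernels that do not
    report it fall back to Buffers+Cached+MemFree+Slab.
    """
    available = meminfo.get("MemAvailable")
    if available is None:
        available = sum(meminfo[field] for field in FALLBACK_FIELDS)
    return 1 - available / meminfo["MemTotal"]


# Made-up values: a newer kernel and an older one without MemAvailable.
newer = {"MemTotal": 1000, "MemAvailable": 600}
older = {"MemTotal": 1000, "Buffers": 100, "Cached": 200,
         "MemFree": 300, "Slab": 100}
```

The sum is only an approximation of MemAvailable, so the two code paths can give different results on the same machine.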
Signed-off-by: Ben Kochie <superq@gmail.com>
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.
This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
* Make FS space alerts thresholds configurable (#1)
This makes it possible to tweak the thresholds for
the NodeFilesystemSpaceFillingUp alerts, which
might be necessary in systems like Kubernetes,
where the image garbage collector runs at 85%,
so it is not a problem if disk usage reaches that level.
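A minimal Python sketch of what configurable thresholds could look like. The function, the field names and the default values are assumptions for illustration and are not taken from the mixin's configuration. Thresholds are expressed as the percentage of space still available, so the critical threshold has to be the lower of the two.

```python
def filling_up_rules(warning_threshold=40, critical_threshold=20):
    """Build alert descriptions from available-space thresholds (percent).

    Defaults and dictionary keys are hypothetical. The check guards
    against pairing a severity with the wrong threshold.
    """
    if critical_threshold >= warning_threshold:
        raise ValueError("critical threshold must be below warning threshold")
    return [
        {
            "alert": "NodeFilesystemSpaceFillingUp",
            "severity": "warning",
            "available_below_percent": warning_threshold,
        },
        {
            "alert": "NodeFilesystemSpaceFillingUp",
            "severity": "critical",
            "available_below_percent": critical_threshold,
        },
    ]


# Example override for a setup where 85% usage (15% available) is normal.
rules = filling_up_rules(warning_threshold=15, critical_threshold=10)
```

With this override, the warning would only apply once less than 15% of the space is available, that is, after usage has passed the 85% garbage-collection mark.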
Signed-off-by: iuri aranda <iuri@skyscrapers.eu>
We actually have to count or sum, respectively, _all_ the selected
metrics for the cluster-wide view, which means it's easiest to use the
`scalar` approach after all (but only in the cluster dashboard). This
still propagates all the labels.
I have extended the comment for the `nodeExporterSelector` to note
that the cluster dashboard only makes sense if all the selected node
exporters actually belong to the same cluster.
Since this is jsonnet, users can easily disable the cluster
dashboard, or even create multiple instances of the dashboards with
different `nodeExporterSelector`s for different clusters.
Signed-off-by: beorn7 <beorn@grafana.com>
The `instance:node_memory_swap_io_pages:rate1m` rule was intended to
measure the amount of memory pressure a system is under, but its name is
a bit misleading (it specifically refers to swap), and the rate of
`node_vmstat_pgmajfault` is a better metric for memory pressure
(see #1524).
This commit renames `instance:node_memory_swap_io_pages:rate1m` to
`instance:node_vmstat_pgmajfault:rate1m`, and defines it as
`rate(node_vmstat_pgmajfault{%(nodeExporterSelector)s}[1m])`. The
dashboards are updated accordingly.
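To make the meaning of the new rule concrete, here is a simplified Python sketch of a per-second rate over a monotonically increasing counter such as `node_vmstat_pgmajfault`. It uses only the first and last sample of a window and treats a decrease as a counter reset. Prometheus's `rate()` additionally handles every sample in the window and extrapolates to the window boundaries, so this is an approximation under those assumptions, and the function name is hypothetical.

```python
def per_second_rate(first, last):
    """Approximate a counter's per-second rate between two samples.

    first and last are (timestamp_seconds, counter_value) tuples.
    A drop in the counter is treated as a reset to zero, so the
    increase is taken to be the last value itself.
    """
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    increase = v1 - v0
    if increase < 0:  # counter reset between the two samples
        increase = v1
    return increase / (t1 - t0)


# Made-up samples one minute apart: 600 major page faults in 60 seconds.
faults_per_second = per_second_rate((0, 100), (60, 700))
```

A sustained high value indicates that pages are frequently being read back from disk, which is why the text above considers it a better signal of memory pressure than swap I/O alone.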
Signed-off-by: Benoît Knecht <benoit.knecht@fsfe.org>