node_exporter/docs/node-mixin/config.libsonnet

{
  _config+:: {
    // Selectors are inserted between {} in Prometheus queries.

    // Select the metrics coming from the node exporter. Note that all
    // the selected metrics are shown stacked on top of each other in
    // the 'USE Method / Cluster' dashboard. Consider disabling that
    // dashboard if mixing up all those metrics in the same dashboard
    // doesn't make sense (e.g. because they are coming from different
    // clusters).
    nodeExporterSelector: 'job="node"',

    // Select the fstype for filesystem-related queries. The default,
    // 'fstype!=""', selects all filesystems. If you have unusual
    // filesystems you don't want to include in dashboards and
    // alerting, you can exclude them here, e.g. 'fstype!="tmpfs"'.
    fsSelector: 'fstype!=""',

    // Select the mountpoint for filesystem-related queries. The
    // default, 'mountpoint!=""', selects all mountpoints. For example,
    // if you have a special-purpose tmpfs instance that has a fixed
    // size and will always be 100% full, but you still want alerts and
    // dashboards for other tmpfs instances, you can exclude it by
    // mountpoint prefix like so: 'mountpoint!~"/var/lib/foo.*"'.
    fsMountpointSelector: 'mountpoint!=""',

    // Select the device for disk-related queries. The default,
    // 'device!=""', selects all devices. If you have unusual devices
    // you don't want to include in dashboards and alerting, you can
    // exclude them here, e.g. 'device!="tmpfs"'.
    diskDeviceSelector: 'device!=""',

    // Some of the alerts are meant to fire if a critical failure of a
    // node is imminent (e.g. the disk is about to run full). In a
    // true “cloud native” setup, failures of a single node should be
    // tolerated. Hence, even imminent failure of a single node is no
    // reason to create a paging alert. However, in practice there are
    // still many situations where operators like to get paged in time
    // before a node runs out of disk space. nodeCriticalSeverity can
    // be set to the desired severity for this kind of alert. This
    // can even be templated to depend on labels of the node, e.g. you
    // could make this critical for traditional database masters but
    // just a warning for K8s nodes.
    nodeCriticalSeverity: 'critical',
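    // A hedged sketch of such a template, replacing the value above. It
    // assumes nodes carry a hypothetical 'role' label (not defined by
    // this mixin) and relies on Prometheus expanding Go templates in
    // alert label values:
    //   nodeCriticalSeverity: '{{ if eq $labels.role "database" }}critical{{ else }}warning{{ end }}',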

    // CPU utilization (%) at which to trigger the
    // 'NodeCPUHighUsage' alert.
    cpuHighUsageThreshold: 90,
    // 1m load average (per core) at which to trigger the
    // 'NodeSystemSaturation' alert.
    systemSaturationPerCoreThreshold: 2,

    // Available disk space (%) thresholds at which to trigger the
    // 'NodeFilesystemSpaceFillingUp' alerts. These alerts fire if disk
    // usage is growing such that space is predicted to run out within
    // 4h or 1d and the provided thresholds have already been reached.
    // In some cases you'll want to adjust these, e.g. by default Kubernetes
    // runs the image garbage collection when the disk usage reaches 85%
    // of its available space. In that case, you'll want to reduce the
    // critical threshold below to something like 14 or 15, otherwise
    // the alert could fire under normal node usage.
    fsSpaceFillingUpWarningThreshold: 40,
    fsSpaceFillingUpCriticalThreshold: 20,
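    // For instance, a sketch for Kubernetes nodes that keep the default
    // 85% image garbage collection threshold (i.e. 15% space left);
    // the exact value is an illustration, not a recommendation:
    //   fsSpaceFillingUpCriticalThreshold: 15,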

    // Available disk space (%) thresholds at which to trigger the
    // 'NodeFilesystemAlmostOutOfSpace' alerts.
    fsSpaceAvailableWarningThreshold: 5,
    fsSpaceAvailableCriticalThreshold: 3,

    // Memory utilization (%) level at which to trigger the
    // 'NodeMemoryHighUtilization' alert.
    memoryHighUtilizationThreshold: 90,

    // Threshold for the rate of memory major page faults to trigger
    // the 'NodeMemoryMajorPagesFaults' alert.
    memoryMajorPagesFaultsThreshold: 500,

    // Disk IO queue level above which to trigger the
    // 'NodeDiskIOSaturation' alert.
    diskIOSaturationThreshold: 10,

    // Interval used for rate calculations in queries.
    rateInterval: '5m',
    // Opt-in for multi-cluster support.
    showMultiCluster: false,
    clusterLabel: 'cluster',

    dashboardNamePrefix: 'Node Exporter / ',
    dashboardTags: ['node-exporter-mixin'],
  },
}
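// A minimal usage sketch, not part of the mixin itself: a consuming
// file could override the defaults above roughly like this. The import
// path 'mixin.libsonnet' and all values shown are assumptions made for
// illustration only.
//
//   (import 'mixin.libsonnet') + {
//     _config+:: {
//       nodeExporterSelector: 'job="node-exporter"',
//       fsSelector: 'fstype!~"tmpfs|overlay"',
//       fsSpaceFillingUpCriticalThreshold: 15,
//     },
//   }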