node_exporter/docs/node-mixin/config.libsonnet

{
  _config+:: {
    // Selectors are inserted between {} in Prometheus queries.

    // Select the metrics coming from the node exporter. Note that all
    // the selected metrics are shown stacked on top of each other in
    // the 'USE Method / Cluster' dashboard. Consider disabling that
    // dashboard if mixing up all those metrics in the same dashboard
    // doesn't make sense (e.g. because they are coming from different
    // clusters).
    nodeExporterSelector: 'job="node"',
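    // Example (a sketch, assuming your targets carry a hypothetical
    // `cluster` label): because the selector is inserted verbatim
    // between {}, several matchers can be combined with commas to
    // limit the dashboards and alerts to a single cluster:
    //   nodeExporterSelector: 'job="node", cluster="prod"',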

    // Select the fstype for filesystem-related queries. The default,
    // 'fstype!=""', selects all filesystems; do not set this to an
    // empty string, which breaks the queries that rely on it. If you
    // have unusual filesystems you don't want to include in dashboards
    // and alerting, you can exclude them here, e.g. 'fstype!="tmpfs"'.
    fsSelector: 'fstype!=""',
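    // Example (illustrative only; the filesystem names are assumptions
    // about your environment): a negative regex matcher excludes
    // several filesystem types at once:
    //   fsSelector: 'fstype!~"tmpfs|squashfs"',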

    // Select the device for disk-related queries. The default,
    // 'device!=""', selects all devices; do not set this to an empty
    // string, which breaks the queries that rely on it. If you have
    // unusual devices you don't want to include in dashboards and
    // alerting, you can exclude them here, e.g. 'device!="tmpfs"'.
    diskDeviceSelector: 'device!=""',

    // Some of the alerts are meant to fire if a critical failure of a
    // node is imminent (e.g. the disk is about to fill up). In a
    // true “cloud native” setup, failures of a single node should be
    // tolerated. Hence, even imminent failure of a single node is no
    // reason to create a paging alert. However, in practice there are
    // still many situations where operators like to get paged in time
    // before a node runs out of disk space. nodeCriticalSeverity can
    // be set to the desired severity for this kind of alert. This
    // can even be templated to depend on labels of the node, e.g. you
    // could make this critical for traditional database masters but
    // just a warning for K8s nodes.
    nodeCriticalSeverity: 'critical',
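    // Example (a sketch; `role` is a hypothetical label that your node
    // metrics would have to carry): Prometheus expands Go templates in
    // alert labels, so the severity can depend on a label value:
    //   nodeCriticalSeverity: '{{ if eq $labels.role "database" }}critical{{ else }}warning{{ end }}',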

    // Available disk space (%) thresholds at which to trigger the
    // 'NodeFilesystemSpaceFillingUp' alerts. These alerts fire if disk
    // usage is growing such that the filesystem is predicted to run
    // out of space within 4h or 1d, and if the available space is
    // already below the given threshold. In some cases you'll want to
    // adjust these, e.g. by default Kubernetes runs the image garbage
    // collection when the disk usage reaches 85% of its available
    // space. In that case, you'll want to reduce the critical
    // threshold below to something like 14 or 15, otherwise the alert
    // could fire under normal node usage.
    fsSpaceFillingUpWarningThreshold: 40,
    fsSpaceFillingUpCriticalThreshold: 20,
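    // Example for the Kubernetes case described above (a suggestion,
    // not a default): image garbage collection at 85% usage leaves
    // 15% available, so a critical threshold just below that avoids
    // alerts during normal operation:
    //   fsSpaceFillingUpCriticalThreshold: 14,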

    grafana_prefix: '',
  },
}