Commit graph

1315 commits

Author SHA1 Message Date
Björn Rabenstein 855a1f1d18
Merge pull request #1482 from leojonathanoh/fix-node-mixin-prometheus-alert-rules-to-use-percentage
Fix node-mixin prometheus alert rules to use percentage
2019-09-26 20:01:18 +02:00
Benjamin Drung 27b8c93a5a Use InfiniBandClass from procfs library (#1396)
Parsing the sysfs files for InfiniBand was added to the procfs library
(see https://github.com/prometheus/procfs/pull/164).

Therefore use `InfiniBandClass` from the procfs library instead of
parsing sysfs itself.

If a port counter returns `N/A (no PMA)`, no metric will be returned
(instead of returning 0 for this metric).

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2019-09-23 18:18:35 +02:00
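
The "no metric instead of 0" behaviour described in the commit above can be sketched as follows. This is a minimal illustration only: `parseCounter` is a hypothetical helper, not the procfs library's API, and the only detail taken from the commit message is the `N/A (no PMA)` marker.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter is a hypothetical helper (not part of procfs).
// It reports ok=false when the counter is not available, so the caller
// can skip the metric entirely instead of exporting a misleading 0.
func parseCounter(raw string) (value uint64, ok bool, err error) {
	s := strings.TrimSpace(raw)
	if strings.HasPrefix(s, "N/A") {
		// e.g. "N/A (no PMA)": no value exists for this counter.
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

func main() {
	for _, raw := range []string{"1234\n", "N/A (no PMA)\n"} {
		v, ok, err := parseCounter(raw)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case !ok:
			fmt.Println("skipped: counter not available")
		default:
			fmt.Println("value:", v)
		}
	}
}
```

Returning a separate `ok` flag keeps "absent" distinct from a real zero reading, which is the distinction the commit message draws.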
Ben Kochie f3538e1fc6
Merge pull request #1488 from pgier/update-procfs-v0.0.5
update procfs to v0.0.5
2019-09-16 09:37:38 +02:00
Paul Gier cbfb496629 update procfs to v0.0.5
- Fixes (#1465) failure in netclass collector
- Adds parsing of CPU information

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-15 16:57:37 -05:00
PrometheusBot eb19c5c20b makefile: update Makefile.common with newer version (#1481)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
2019-09-13 12:55:06 +02:00
Björn Rabenstein e7c2dbed4e
Merge pull request #1483 from s-urbaniak/fix-selectors
node-mixin: fix configuration for unset fsSelector/diskDeviceSelector and dashboard query
2019-09-12 21:36:31 +02:00
Sergiusz Urbaniak f4417b209a node-mixin: fix configuration for unset fsSelector/diskDeviceSelector
As per https://github.com/prometheus/node_exporter/pull/1429#discussion_r304210103
we want to fetch all devices and all fs types.

Currently, this is done by setting an empty string, which breaks most queries that rely on it.

This fixes it by setting the appropriate selector instead of an empty string.

Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
2019-09-12 14:02:56 +02:00
Sergiusz Urbaniak ed78237036 node-mixin: fix query in Disk Space Utilisation dashboard
Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
2019-09-12 14:02:56 +02:00
Leo dfeec07f2f Fix node-mixin prometheus alert rules to use percentage
Signed-off-by: Leo <leonardjonathanoh@live.com>
2019-09-11 08:47:24 +00:00
Ben Kochie 7caedccd73
Merge pull request #1445 from davemcphee/coolingDevice
Scrape cooling_device state
2019-09-09 19:24:17 +02:00
Ben Kochie 82b7b1f732
Merge branch 'master' into coolingDevice
2019-09-09 17:44:03 +02:00
dt-rush 93fbb93a46 fix issue where rootfs path strips to the empty string (#1464)
Change-type: patch
Connects-to: #1463
Signed-off-by: dt-rush <nickp@balena.io>
2019-09-09 17:39:24 +02:00
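
The failure mode named in the commit title above (a path that strips to the empty string) can be illustrated with a short sketch. The function name `stripRootfs` and the choice to fall back to `/` are assumptions made for illustration; the commit message itself does not spell out the fix.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRootfs is a hypothetical helper. It removes the rootfs prefix
// from a mountpoint path. When the mountpoint is the rootfs itself, the
// stripped result would be empty, so it falls back to "/" (an assumed
// choice, not confirmed by the commit message).
func stripRootfs(rootfs, path string) string {
	if rootfs == "/" {
		return path // nothing to strip
	}
	stripped := strings.TrimPrefix(path, rootfs)
	if stripped == "" {
		return "/"
	}
	return stripped
}

func main() {
	fmt.Println(stripRootfs("/host", "/host/var/lib"))
	fmt.Println(stripRootfs("/host", "/host"))
	fmt.Println(stripRootfs("/", "/var/lib"))
}
```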
Björn Rabenstein ab8cf1f718 Node mixin: Clarify dashboard dependency on rules (#1475)
Following @discordianfish's suggestion
[here](https://github.com/prometheus/node_exporter/issues/1454#issuecomment-524225222).

Signed-off-by: beorn7 <beorn@grafana.com>
2019-09-08 10:55:43 +02:00
Ben Kochie 0e77317955
Update netlink vendoring (#1471)
* github.com/ema/qdisc
* github.com/mdlayher/genetlink
* github.com/mdlayher/wifi

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-09-05 15:35:13 +02:00
Paul Gier 8c3de12c22 systemd: check version for availability of properties (#1413)
The dbus property 'SystemState' and the timer property 'LastTriggerUSec'
were added in version 212 of systemd.
Check that the version of systemd is at least 212 before attempting
to query these properties.

f755e3b74b
dedabea4b3

Resolves issue #291

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-04 16:27:25 +02:00
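
A version gate of the kind described in the commit above might look like the sketch below. The constant and function names are hypothetical. The sketch also assumes that "added in version 212" means version 212 itself qualifies, so it uses `>=`.

```go
package main

import "fmt"

// minVersionNewProperties is the systemd version in which, per the
// commit message, 'SystemState' and 'LastTriggerUSec' were added.
const minVersionNewProperties = 212

// canQueryNewProperties is a hypothetical helper. It reports whether
// the running systemd is new enough to expose the two properties.
// Assumption: 212 itself is sufficient.
func canQueryNewProperties(systemdVersion int) bool {
	return systemdVersion >= minVersionNewProperties
}

func main() {
	for _, v := range []int{209, 212, 239} {
		fmt.Printf("systemd %d: query new properties = %t\n",
			v, canQueryNewProperties(v))
	}
}
```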
Alex Schmitz 664025d60c
Scrape cooling_device state
Signed-off-by: Alex Schmitz <alex.schmitz@gmail.com>
2019-08-30 08:58:47 -05:00
PrometheusBot d3478a207e makefile: update Makefile.common with newer version (#1466)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
2019-08-30 13:32:17 +02:00
Boris Momčilović 93c12e03a1 Ipvs firewall mark (#1455)
* IPVS: include firewall mark label

Signed-off-by: Boris Momčilović <boris@firstbeatmedia.com>
2019-08-27 14:24:11 +02:00
Paul Gier 0b7ac85acb update procfs to v0.0.4 (#1457)
Signed-off-by: Paul Gier <pgier@redhat.com>
2019-08-27 09:26:19 +02:00
Björn Rabenstein 154d59dee7
Merge pull request #1452 from prometheus/beorn7/mixin
Update legendLink
2019-08-21 09:50:26 +02:00
beorn7 76ff263ca6 Update legendLink
This still had the 'k8s' in as it was copied and pasted from the
kubernetes-mixin.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-20 18:49:12 +02:00
Björn Rabenstein 0f38d680b4
Merge pull request #1449 from prometheus/beorn7/mixin3
node-mixin: Make the severity of "critical" alerts configurable
2019-08-19 13:55:52 +02:00
Björn Rabenstein d208140290
Merge pull request #1450 from prometheus/beorn7/mixin
More improvements for the node dashboard
2019-08-19 11:08:18 +02:00
beorn7 44e5731de7 Add line for number of cores to load graph
Backported from the node dashboard in the kubernetes-mixin.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 16:43:57 +02:00
beorn7 024d5ed55e Fix title of CPU panel to usage
We use the `mode="idle"` metric, but we are inverting it, so this is
usage, and that's intended.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 16:36:10 +02:00
beorn7 a016d9cd6f node-mixin: Improve disk usage panel
- Use a stacked graph instead of a gauge as development over time is
  especially useful for disk space usage.

- By only taking one metric per device into account, we avoid
  double-counting for devices that are mounted multiple times.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 16:32:54 +02:00
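
The double-counting point in the commit above can be shown with a small sketch. The dashboard itself does this in a query; the Go code below only mirrors the idea, and the `mount` type and `totalSizeByDevice` function are hypothetical names.

```go
package main

import "fmt"

// mount is a hypothetical, simplified filesystem entry.
type mount struct {
	Device     string
	Mountpoint string
	SizeBytes  uint64
}

// totalSizeByDevice sums sizes while counting each device only once,
// so a device mounted at several mountpoints is not double-counted.
func totalSizeByDevice(mounts []mount) uint64 {
	seen := make(map[string]bool)
	var total uint64
	for _, m := range mounts {
		if seen[m.Device] {
			continue // same device, different mountpoint
		}
		seen[m.Device] = true
		total += m.SizeBytes
	}
	return total
}

func main() {
	mounts := []mount{
		{Device: "/dev/sda1", Mountpoint: "/", SizeBytes: 100},
		{Device: "/dev/sda1", Mountpoint: "/mnt/bind", SizeBytes: 100},
		{Device: "/dev/sdb1", Mountpoint: "/data", SizeBytes: 50},
	}
	fmt.Println(totalSizeByDevice(mounts)) // 150, not 250
}
```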
Björn Rabenstein 7ef6f2576d
node-mixin: Improve nodes dashboard (#1448)
* node-mixin: Improve nodes dashboard

- Use stacking where it makes sense.
- Normalize idle CPU so that stacking is more meaningful.
- Consistently fill where stacking is used but don't fill where not.
- Fix y axis max value for Idle CPU panel.
- Fix y axis min value for memory usage panel.
- Use `$__interval` for range where applicable (and set min step
  to 1m).
- Make the right Y axis for disk I/O actually work.

These are just incremental improvements. They don't touch the more
involved TODOs.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-15 00:40:51 +02:00
Björn Rabenstein 0d3a2d3209
Merge pull request #1447 from prometheus/beorn7/mixin
node-mixin: Fix various straight-forward issues in the USE dashboards
2019-08-15 00:37:43 +02:00
beorn7 97ef113762 Make the severity of "critical" alerts configurable
This addresses the blissful scenario where single-node failures are
unproblematic. No reason to wake somebody up if a node is about to
screw itself up by filling the disk.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-14 22:24:24 +02:00
beorn7 f350aaf87e node-mixin: Fix various straight-forward issues in the USE dashboards
- Normalize cluster memory utilisation.

- Fix missing `1m` in memory saturation.

- Have both disk-related rows next to each other instead of with the
  network row in between.

- Correctly render transmit network traffic as negative, using
  `seriesOverrides` and `min: null` for the y-axis.

- Make panel and row naming consistent.

- Remove legend where it would just display a single entry with
  exactly the title of the panel.

- Fix metric name in individual node CPU Saturation panel.

- Break up disk space utilisation by device in the panel for an
  individual node.

NB: None of this touches the more subtle issues captured in the
various TODOs.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-08-13 21:54:28 +02:00
Sandro Jäckel 697c2deed5 Update rootfs syntax in Docker example (#1443)
Signed-off-by: Sandro Jäckel <sandro.jaeckel@gmail.com>
2019-08-07 09:19:20 +02:00
Phil Frost 26d4fbdf07 Fix seconds reported by schedstat (#1426)
Upstream bugfix: https://github.com/prometheus/procfs/pull/191

Signed-off-by: Phil Frost <phil@postmates.com>
2019-08-06 19:08:06 +02:00
Richard Kojedzinszky 75462bf4fe Scrape thermal_zone temperatures (#1425)
* Scrape thermal_zone temperatures

Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
2019-08-04 12:56:36 +02:00
Ben Kochie 10146109ec
Update CHANGELOG for #1433
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-08-03 12:33:25 +02:00
Philip Gough 2d95ecaa96 Extends uname collector to export on Darwin OS (#1433)
Adds uname collector support for Darwin and OpenBSD

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
2019-08-03 12:32:43 +02:00
PrometheusBot 2f2392af3f makefile: update Makefile.common with newer version (#1434)
Signed-off-by: prombot <prometheus-team@googlegroups.com>
2019-08-03 12:15:24 +02:00
Johannes 'fish' Ziemke fc73586c97 Remove text_collector_examples/ (#1441)
* Remove text_collector_examples/

These have been moved to https://github.com/prometheus-community/node-exporter-textfile-collector-scripts

This closes #1077

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2019-08-03 12:14:51 +02:00
Solvik 0b710bb0c9 Handle JBOD setup for storcli exporter (#1419)
* handle jbod setup

Signed-off-by: Solvik Blum <solvik.blum@dailymotion.com>
2019-08-02 12:38:46 +02:00
Dipack P Panjabi a7452023db Added mountinfo changes to node_exporter (#1417)
Use the extra information gleaned from the mountinfo file to add
a 'mountaddr' field for NFS metrics. This helps prevent Prometheus from
ignoring mounts that come from the same URL, but are actually from
different IP addresses.

This commit also rebases to current master

Signed-off-by: Dipack P Panjabi <dpanjabi@hudson-trading.com>
2019-07-28 11:32:40 +02:00
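
Why an extra address label helps, as the commit above describes, can be sketched by counting distinct label sets. All names here (`nfsMount`, `countSeries`) and the example hostnames and addresses are hypothetical; only the 'mountaddr' idea comes from the commit message.

```go
package main

import "fmt"

// nfsMount is a hypothetical, simplified stand-in for one NFS mount.
type nfsMount struct {
	Export    string // the URL-like export string
	MountAddr string // the IP address the mount actually uses
}

// labelSet is the identity of a time series in this sketch.
type labelSet struct {
	export    string
	mountaddr string
}

// countSeries returns how many distinct label sets the mounts produce.
// Without the address label, mounts sharing an export string collapse
// into a single series.
func countSeries(mounts []nfsMount, includeAddr bool) int {
	seen := make(map[labelSet]struct{})
	for _, m := range mounts {
		key := labelSet{export: m.Export}
		if includeAddr {
			key.mountaddr = m.MountAddr
		}
		seen[key] = struct{}{}
	}
	return len(seen)
}

func main() {
	mounts := []nfsMount{
		{Export: "nfs.example.com:/data", MountAddr: "192.0.2.10"},
		{Export: "nfs.example.com:/data", MountAddr: "192.0.2.11"},
	}
	fmt.Println("without mountaddr:", countSeries(mounts, false))
	fmt.Println("with mountaddr:", countSeries(mounts, true))
}
```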
Ben Kochie 852b340a46
Add changelog entry for #1439
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-07-28 10:38:41 +02:00
Matthias Rampke b133213c7a Report non-fatal collection errors in the exporter metric. (#1439)
As per prometheus/client_golang#543, pass the Registry for exporter
metrics when setting up the /metrics HTTP handler.

With this, the `promhttp_metric_handler_errors_total` metric will
increment on (possibly non-fatal) collection-time errors, such as
duplicate metrics from text files.

Signed-off-by: Matthias Rampke <mr@soundcloud.com>
2019-07-28 10:37:10 +02:00
Bernd Müller d2be72be4a Changed fields for disk write and read data of S.M.A.R.T (#1235)
Signed-off-by: Bernd Müller <mueller@b1-systems.de>
2019-07-24 17:46:50 +02:00
Björn Rabenstein 443072dfc3
Merge pull request #1438 from paulfantom/fix_selectors
docs/node-mixin: fix incorrect queries
2019-07-24 15:24:42 +02:00
paulfantom c41826274d
docs/node-mixin: move fsSelector and diskDeviceSelector to the end of query
This keeps a query valid even if the selector values are
empty.

Additionally, fix the query responsible for disk space usage.

Signed-off-by: paulfantom <pawel@krupa.net.pl>
2019-07-24 13:05:02 +02:00
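
The effect of selector position described in the commit above can be reproduced with plain string templating. This is a sketch only: the mixin uses its own templating rather than Go, and the function names and the `job="node"` base selector are assumptions. It relies on PromQL accepting a trailing comma inside the braces but not a leading one.

```go
package main

import "fmt"

// selectorAtEnd renders metric{base, extra}. With an empty extra
// selector the result ends in a trailing comma, which PromQL accepts.
func selectorAtEnd(metric, base, extra string) string {
	return fmt.Sprintf("%s{%s, %s}", metric, base, extra)
}

// selectorAtStart renders metric{extra, base}. With an empty extra
// selector the result starts with a comma, which PromQL rejects.
func selectorAtStart(metric, base, extra string) string {
	return fmt.Sprintf("%s{%s, %s}", metric, extra, base)
}

func main() {
	const metric = "node_filesystem_size_bytes"
	const base = `job="node"` // hypothetical base selector

	fmt.Println(selectorAtEnd(metric, base, ""))
	fmt.Println(selectorAtStart(metric, base, ""))
}
```

With an empty extra selector, the first call yields `node_filesystem_size_bytes{job="node", }` and the second yields `node_filesystem_size_bytes{, job="node"}`.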
Björn Rabenstein 106b09b4ed
Merge pull request #1429 from prometheus/beorn7/mixin
First iteration for the node mixin, 2nd attempt.
2019-07-23 23:14:15 +02:00
beorn7 79f0357e38 Added _excluding_lo to name of network rules that exclude lo
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-22 20:21:52 +02:00
beorn7 36dc7451c9 Improvement of comments and panel titles
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-22 14:06:27 +02:00
dt-rush 5d3e2ce2ef properly strip path.rootfs from mountpoint labels (#1421)
Change-type: patch
Connects-to: #1418
Signed-off-by: dt-rush <nickp@balena.io>
2019-07-19 16:51:17 +02:00
beorn7 e01d9f9e78 Break out device in disk IO rules/dashboard
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-18 15:59:35 +02:00
beorn7 b8c4b0cb29 Removed unneeded sum_ and avg_ from rule names
Signed-off-by: beorn7 <beorn@grafana.com>
2019-07-18 14:14:02 +02:00