Commit graph

74 commits

Author SHA1 Message Date
dt-rush 5d3e2ce2ef properly strip path.rootfs from mountpoint labels (#1421)
Change-type: patch
Connects-to: #1418
Signed-off-by: dt-rush <nickp@balena.io>
2019-07-19 16:51:17 +02:00
Steven Kreuzer d8e47a9f9f Expose additional XFS runtime statistics (#1423)
Include directory operation, read/write system call, and vnode runtime
statistics for XFS filesystems.

Signed-off-by: Steven Kreuzer <skreuzer@FreeBSD.org>
2019-07-15 16:28:09 +02:00
Ben Kochie 0de95ef8f3 Add changelog entry for #1414
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-07-12 14:25:17 +02:00
Phil Frost f693a71c06 Scrape CPU latency stats from /proc/schedstat (#1389)
These are useful as a direct indication of CPU contention and task
scheduler latency.

Handy references:
 - https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-stats.txt
 - https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.taskscheduler.html

procfs is updated to pull in the enabling change:
https://github.com/prometheus/procfs/pull/186

Signed-off-by: Phil Frost <phil@postmates.com>
2019-07-10 09:16:24 +02:00
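To illustrate the kind of data the /proc/schedstat change (#1389) reads, here is a minimal Go sketch. It assumes the per-CPU line layout described in the kernel's sched-stats document, where the last three fields of a `cpu<N>` line are time spent running, time spent waiting to run, and the number of timeslices. The helper `parseCPULine`, the struct, and the sample line are hypothetical; this is not the procfs implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSchedstat holds the last three counters of one cpu<N> line.
type cpuSchedstat struct {
	CPU        string
	Running    uint64 // time spent running tasks on this CPU
	Waiting    uint64 // time tasks spent waiting to run
	Timeslices uint64 // number of timeslices run
}

// parseCPULine is a hypothetical helper, not the procfs API. It assumes the
// three counters of interest are the last three fields of the line.
func parseCPULine(line string) (cpuSchedstat, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "cpu") {
		return cpuSchedstat{}, fmt.Errorf("not a cpu line: %q", line)
	}
	var vals [3]uint64
	for i, s := range fields[len(fields)-3:] {
		v, err := strconv.ParseUint(s, 10, 64)
		if err != nil {
			return cpuSchedstat{}, err
		}
		vals[i] = v
	}
	return cpuSchedstat{CPU: fields[0], Running: vals[0], Waiting: vals[1], Timeslices: vals[2]}, nil
}

func main() {
	// Made-up sample line, not real kernel output.
	s, err := parseCPULine("cpu0 0 0 0 0 0 0 100 200 3")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

A waiting counter that grows quickly relative to the running counter would be the contention signal the commit message refers to.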
Advait Bhatwadekar 3f49b31101 Closes issue #261 on node_exporter. (#1403)
* Closes issue #261 on node_exporter.

Delegated mdstat parsing to procfs project. mdadm_linux.go now only exports the metrics.
-> Added disk labels: "fail", "spare", "active" to indicate disk status
-> Changed metric node_md_disks_total ==> node_md_disks_required
-> Removed test cases for mdadm_linux.go, as the functionality they tested for has been moved to procfs project.

Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>
2019-07-01 11:56:06 +02:00
mknapphrt 3108a50fb6 Fix systemd restart counter label from state to name (#1393)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-06-25 09:37:48 +02:00
Ben Kochie c39f6749fc Bugfix release 0.18.1 (#1366)
Cherry-pick two bug fixes into 0.18.1.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-06-04 14:29:33 +02:00
Ben Kochie 4a15edf0b6 Add changelog entry for #1364
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-06-03 11:20:06 +02:00
Ben Kochie fdf9846282 Fixup 0.17.0 changelog (#1354)
* Fix ordering of CHANGE items by PR number.
* Add missing CHANGE for #1003

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-06-02 10:51:07 +01:00
Noam Meltzer 501ccf9fb4 Add --collector.netdev.device-whitelist flag (#1279)
* Add --collector.netdev.device-whitelist flag

Sometimes it is desired to monitor only one netdev. The golang regexp
does not support a negated regex, so the ignored-devices flag is too
cumbersome for this task.
This change introduces a new flag: accept-devices, which is mutually
exclusive with ignored-devices. This flag allows specifying ONLY the
netdev you'd like.

Signed-off-by: Noam Meltzer <noam@cynerio.co>
2019-05-31 17:55:50 +02:00
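The reasoning in #1279 can be sketched in Go: because the standard `regexp` package has no lookahead, "everything except eth0" is awkward to express as an ignore pattern, whereas an accept pattern states it directly. This is a minimal sketch under assumptions; the function `newDeviceFilter` and its parameters are hypothetical and are not the node_exporter code.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// newDeviceFilter returns a predicate that reports whether a device should be
// skipped. At most one of ignore and accept may be non-empty.
// Hypothetical helper for illustration only.
func newDeviceFilter(ignore, accept string) (func(string) bool, error) {
	if ignore != "" && accept != "" {
		return nil, errors.New("ignore and accept patterns are mutually exclusive")
	}
	switch {
	case accept != "":
		re, err := regexp.Compile(accept)
		if err != nil {
			return nil, err
		}
		// Skip every device that does NOT match the accept pattern.
		return func(dev string) bool { return !re.MatchString(dev) }, nil
	case ignore != "":
		re, err := regexp.Compile(ignore)
		if err != nil {
			return nil, err
		}
		// Skip every device that matches the ignore pattern.
		return func(dev string) bool { return re.MatchString(dev) }, nil
	}
	// No pattern given: skip nothing.
	return func(string) bool { return false }, nil
}

func main() {
	skip, err := newDeviceFilter("", "^eth0$")
	if err != nil {
		panic(err)
	}
	for _, dev := range []string{"lo", "eth0", "eth1"} {
		fmt.Printf("%s skipped=%v\n", dev, skip(dev))
	}
}
```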
David O'Rourke 814ef064c0 meminfo: Fix the size mismatch in the swapTotal check mib for BSD. (#1345)
Signed-off-by: David O'Rourke <david.orourke@gmail.com>
2019-05-14 17:42:36 -05:00
Ben Kochie f97f01c46c Update for 0.18.0 release (#1337)
* Update CHANGELOG for release.
* Bump VERSION.
* Update vendoring.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-05-09 13:19:12 -05:00
Daniel Hodges 7882009870 Add perf exporter (#1274)
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2019-05-07 13:21:41 +02:00
Ben Kochie 78b9eb9c2c Use 64-bit Darwin netstat counters (#1319)
Avoid 32-bit counter rollovers.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-04-25 10:07:56 +02:00
Christian Hoffmann 36e3b2a923 textfile: use opened file's mtime as timestamp (#1326)
Previously, the node_textfile_mtime_seconds metric was based on the
Fileinfo.ModTime() of the ioutil.ReadDir() return value. This is based
on lstat() and therefore has unintended consequences for symlinks
(modification time of the symlink instead of the symlink target is
returned). It is also racy as the lstat() is performed before reading
the file.

This commit changes the node_textfile_mtime_seconds metric to be based
on a fresh Stat() call on the open file.  This eliminates the race and
works as expected for symlinks. Fixes #1324.

Signed-off-by: Christian Hoffmann <mail@hoffmann-christian.info>
2019-04-18 17:47:04 +02:00
Daniele Sluijters cc2fd82008 Expose /proc/pressure (#1261)
This enables the collection of pressure stall information as exposed
by the `/proc/pressure` interface added in the 4.20 release of the
Linux kernel.

Closes #1174

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
2019-04-18 12:19:20 +02:00
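For readers unfamiliar with the pressure stall files behind #1261, the following Go sketch parses one line in the `some avg10=... avg60=... avg300=... total=...` form used under `/proc/pressure`. The type `psiLine` and the function `parsePSILine` are hypothetical names chosen for this example, and the sample values are made up; this is not the collector's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine is one parsed line of a pressure file.
type psiLine struct {
	Kind                 string // "some" or "full"
	Avg10, Avg60, Avg300 float64
	Total                uint64
}

// parsePSILine is a hypothetical helper that parses a single pressure line.
func parsePSILine(line string) (psiLine, error) {
	fields := strings.Fields(line)
	if len(fields) != 5 {
		return psiLine{}, fmt.Errorf("unexpected field count in %q", line)
	}
	out := psiLine{Kind: fields[0]}
	for _, kv := range fields[1:] {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return psiLine{}, fmt.Errorf("malformed pair %q", kv)
		}
		var err error
		switch parts[0] {
		case "avg10":
			out.Avg10, err = strconv.ParseFloat(parts[1], 64)
		case "avg60":
			out.Avg60, err = strconv.ParseFloat(parts[1], 64)
		case "avg300":
			out.Avg300, err = strconv.ParseFloat(parts[1], 64)
		case "total":
			out.Total, err = strconv.ParseUint(parts[1], 10, 64)
		default:
			err = fmt.Errorf("unknown key %q", parts[0])
		}
		if err != nil {
			return psiLine{}, err
		}
	}
	return out, nil
}

func main() {
	// Made-up sample line.
	p, err := parsePSILine("some avg10=1.50 avg60=0.25 avg300=0.00 total=12345")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```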
Paul Gier cc847f2f44 collector/cpu: split cpu freq metrics into separate collector (#1253)
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.

Fixes #1241

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-19 17:22:54 +01:00
Ben Kochie f028b81615 Update systemd blacklist (#1255)
Include additional unit types in the default systemd collector
blacklist.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-02-17 17:57:15 +01:00
Paul Gier cb9e23c536 Systemd refactor (#1254)
This reduces the system metric collection time by using a wait group
and goroutines to allow the systemd metric calls to happen concurrently.

Also makes the start time, restarts, tasks_max, and tasks_current metrics disabled by default
because these can be time-consuming to gather.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-11 23:27:21 +01:00
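The wait-group pattern mentioned in #1254 looks roughly like the sketch below: each independent metric call runs in its own goroutine, and the caller blocks until all have finished. The function `collectConcurrently` and the map-of-functions shape are assumptions made for this example; the actual systemd collector is structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// collectConcurrently runs every collector function in its own goroutine and
// waits for all of them before returning the gathered values.
// Hypothetical helper for illustration only.
func collectConcurrently(collectors map[string]func() float64) map[string]float64 {
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards results, which is written from many goroutines
	)
	results := make(map[string]float64, len(collectors))
	for name, fn := range collectors {
		wg.Add(1)
		go func(name string, fn func() float64) {
			defer wg.Done()
			v := fn()
			mu.Lock()
			results[name] = v
			mu.Unlock()
		}(name, fn)
	}
	wg.Wait()
	return results
}

func main() {
	res := collectConcurrently(map[string]func() float64{
		"unit_state": func() float64 { return 1 },
		"timers":     func() float64 { return 42 },
	})
	fmt.Println(res)
}
```

With this shape, total collection time is bounded by the slowest call rather than by the sum of all calls.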
Sachi King 18fc512fc4 Bond: Monitor bond mii_status not link operstate (#1124)
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.

When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.

When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking.  This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.

If a bond is not monitored via miimon or arp, the `mii_status` should
likely always be `up`; however, I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`.  The kernel
bond documentation stresses, however, that a bond should not be configured
without one of `miimon` or `arp_interval`.

This change results in the metric 'node_bonding_active' matching the
up/down state of the bond's point of view rather than operstate.

Signed-off-by: Sachi King <nakato@nakato.io>
2019-02-10 11:00:04 +01:00
Paul Gier e0d6d11859 netclass_linux: remove varying labels from the 'up' metric (#1243)
* netclass_linux: remove varying labels from the 'up' metric

This moves the variable label values such as 'operstate' out of
the 'network_up' metric and into a separate metric called '_info'.
This allows the 'up' metric to remain continuous over state changes.
Fixes #1236

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-07 15:59:32 +01:00
Johannes 'fish' Ziemke 6ea0aa73e4 Rename interface to device in netclass collector (#1224)
* Rename interface to device in netclass collector

This makes it consistent with other networking metrics like node_network_receive_bytes_total

This closes #1223 

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2019-02-06 20:02:48 +01:00
Ralf Horstmann 3867ad5ab0 Add diskstats collector for OpenBSD (#1250)
* Add diskstats collector for OpenBSD

Tested on i386 and amd64, OpenBSD 6.4 and -current.

* Refactor diskstats collectors

This moves common descriptors from Linux, Darwin, OpenBSD
diskstats collectors into diskstats_common.go

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2019-02-06 11:36:22 +01:00
David O'Rourke d442108d7a collector: Implement uname collector for FreeBSD (#1239)
* collector: Implement uname collector for FreeBSD

Signed-off-by: David O'Rourke <david.orourke@gmail.com>
2019-02-05 17:39:24 +01:00
mknapphrt 7fbdd0ae93 Update procfs vendor (#1248)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-02-04 16:54:41 +01:00
Jon Davies e766485286 Add kstat-based Solaris metrics (#1197)
* collector/loadavg_solaris.go: Use libkstat to gather load averages.
* go.mod: Added go-kstat.
* boot_time_solaris.go: Added.
* cpu_solaris.go: Added.
* README.md: Updated entries for Solaris.
* collector/zfs_solaris.go: Added.
* CHANGELOG.md: Added note about kstat-based Solaris metrics.

Signed-off-by: Jonathan Davies <jpds@protonmail.com>
2019-01-12 13:33:56 +01:00
Ben Kochie f9dd8e9b8c Release v0.17.0 (#1168)
* Update CHANGELOG
* Update VERSION

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 15:18:48 +01:00
Ben Kochie 4abc6fba7d Add fallback for missing /proc/1/mounts (#1172)
* Add fallback for missing /proc/1/mounts

On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fall back to `/proc/mounts` if
`/proc/1/mounts` is not found.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add tests.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add CHANGELOG entry.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:55 +01:00
Nemikolh 62f99f95f0 Add receive/transmit bytes total metric (wifi collector). (#1150)
Signed-off-by: Nemikolh <Nemikolh@users.noreply.github.com>
2018-11-19 19:15:54 +01:00
Ben Kochie ab19e0c831 Add changelog entry for #1148 (#1154)
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-15 04:22:02 +01:00
Arno Uhlig 6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Ben Kochie b1eec66640 Add TCPSynRetrans to netstat default filter (#1143)
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Patrick bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Ben Kochie 0da9d248e7 Update for 0.17.0-rc.0 release (#1118)
* Update VERSION.
* Update CHANGELOG.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-19 17:29:19 +02:00
Ralf Horstmann 9f820bd3ee Update cpu collector for OpenBSD 6.4 (#1094)
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.

SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.

For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2018-10-02 10:21:30 +02:00
Ben Kochie 0fdc089187 Change systemd unit filtering (#1083)
* Change systemd unit filtering

Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-24 15:04:55 +02:00
Ben Kochie ebdd524123 Correctly cast Darwin memory info (#1060)
* Correctly cast Darwin memory info

* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-07 22:27:52 +02:00
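The integer wrapping that #1060 guards against is easy to reproduce. The sketch below assumes 32-bit page counters and a 32-bit page size purely for illustration (the actual Darwin types are not stated in the commit message); the function names are hypothetical.

```go
package main

import "fmt"

// bytesWrapped multiplies in uint32 first, so a large product wraps around
// before it is converted to float64.
func bytesWrapped(pages, pageSize uint32) float64 {
	return float64(pages * pageSize)
}

// bytesCast converts each operand to float64 before multiplying, which
// avoids the 32-bit wraparound.
func bytesCast(pages, pageSize uint32) float64 {
	return float64(pages) * float64(pageSize)
}

func main() {
	var pages, pageSize uint32 = 2000000, 4096
	fmt.Printf("wrapped: %.0f\n", bytesWrapped(pages, pageSize))
	fmt.Printf("cast:    %.0f\n", bytesCast(pages, pageSize))
}
```

Any product above 4 GiB wraps in the first form, which is why the cast has to happen before the arithmetic rather than after it.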
James Hartig 60c827231a NRestarts or NRefused aren't available on older systemd versions (#1039)
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available

Signed-off-by: James Hartig <james@getadmiral.com>
2018-08-14 14:28:26 +02:00
Ben Kochie fe5a117831 Handle vanishing PIDs (#1043)
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.

* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Cleanup some error style issues.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-13 17:27:23 +02:00
Ben Kochie 0662673ad6 Disable wifi collector by default (#1037)
* Disable wifi collector by default

Disable the wifi collector by default due to suspected crashing issues and goroutine leaks.
* https://github.com/prometheus/node_exporter/issues/870
* https://github.com/prometheus/node_exporter/issues/1008

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-07 10:27:20 +02:00
Ben Kochie 5d23ad0ca7 Fix supervisord collector (#978)
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric

* Use Prometheus best practices for uptime metric.
  * Use "start time" rather than "uptime".
  * Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-06 16:54:46 +02:00
xginn8 140b8b85c3 Filter out uninstalled systemd units when collecting all units (#1011)
fixes #567

Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
2018-07-22 09:20:03 +02:00
Sven Lange 2ae8c1c7a7 Add systemd uptime metric collection (#952)
* Add systemd uptime metric collection

Signed-off-by: Sven Lange <tdl@hadiko.de>
2018-07-18 16:02:05 +02:00
xginn8 9b97f44a70 Add a counter for refused socket unit connections, available as of systemd 239 (#995)
Signed-off-by: xginn8 <mamcgi@gmail.com>
2018-07-16 16:01:42 +02:00
xginn8 ac5a981761 Adding socket stat collection for systemd socket units (#968)
Signed-off-by: xginn8 <mamcgi@gmail.com>
2018-07-05 16:26:48 +02:00
xginn8 8af84a215d Add support for NRestarts counter introduced in systemd 235 (#992)
* Add support for NRestarts counter introduced in systemd 235

`.service` units increment this counter any time the Restart= condition is
triggered.

Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
2018-07-05 13:31:45 +02:00
Ben Kochie 1882a08041 Release 0.16.0
Changes since 0.16.0-rc.3

* [CHANGE] align Darwin disk stat names with Linux #930

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-05-15 16:16:05 +02:00
Ben Kochie dc1972e9e3 Document upgrade options for v0.16.0
* Add an upgrade guide.
* Add example recording rules.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-05-11 13:45:36 +02:00
Ben Kochie 7073dcdcb5 Fix 0.16.0-rc.3 release date.
2018-04-27 17:50:15 +02:00
Ben Kochie 11b60ac32f Release v0.16.0-rc.3
Changes since v0.16.0-rc.2
* Remove gmond collector #852
* Build with Go 1.9[0]
* Fix /proc/net/dev/ interface name handling #910

[0]: https://github.com/prometheus/node_exporter/issues/870

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-27 16:50:48 +02:00