876 commits

Author SHA1 Message Date
Paul Gier b1298677aa Early init of procfs (#1315)
Minor change to match naming convention in other collectors.

Initialize the proc or sys FS instance once while initializing
each collector instead of re-creating for each metric update.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-04-10 18:16:12 +02:00
Paul Gier cc847f2f44 collector/cpu: split cpu freq metrics into separate collector (#1253)
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.

Fixes #1241

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-19 17:22:54 +01:00
Ben Kochie f028b81615
Update systemd blacklist (#1255)
Include additional unit types in the default systemd collector
blacklist.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-02-17 17:57:15 +01:00
Paul Gier cb9e23c536 Systemd refactor (#1254)
This reduces the system metric collection time by using a wait group
and goroutines to allow the systemd metric calls to happen concurrently.

Also, makes the start time, restarts, tasks_max, and tasks_current metrics disabled by default
because these can be time consuming to gather.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-11 23:27:21 +01:00
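
The wait-group approach mentioned in cb9e23c536 might look something like the sketch below. It is an assumption-laden illustration only: `collectConcurrently` and the `fetch` callback are hypothetical, and `fetch` stands in for whatever systemd call the real collector makes.

```go
package main

import (
	"fmt"
	"sync"
)

// collectConcurrently runs one goroutine per unit and waits for all of them.
// fetch is a hypothetical stand-in for a per-unit systemd call.
func collectConcurrently(units []string, fetch func(unit string) float64) map[string]float64 {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make(map[string]float64, len(units))
	)
	for _, unit := range units {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			v := fetch(u)
			// The map is shared between goroutines, so guard writes.
			mu.Lock()
			results[u] = v
			mu.Unlock()
		}(unit)
	}
	wg.Wait()
	return results
}

func main() {
	units := []string{"sshd.service", "cron.service"}
	got := collectConcurrently(units, func(u string) float64 {
		return float64(len(u)) // placeholder value instead of a real metric
	})
	fmt.Println(len(got))
}
```

The total collection time is then bounded by the slowest call rather than the sum of all calls, which is presumably the saving the commit describes.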
Sachi King 18fc512fc4 Bond: Monitor bond mii_status not link operstate (#1124)
With a bond interface the state of the slave interface from the bond's
point of view is reflected in `mii_status` and is independent of the
link's `operstate`.

When a bond is monitored with `miimon`, `mii_status` will reflect the
state of the physical link as configured via the operator.

When a bond is monitored via `arp_interval` the `mii_status` will
reflect the results of the bond ARP checking.  This means the link can
be down from the bond's point of view, but up from a physical
connection point of view.

If a bond is not monitored via miimon or arp, the `mii_status` should
likely be always `up`, however I have observed a case where this is not
true and the `operstate` is `up` while `mii_status` is `down`.  Kernel
bond documentation stresses that a bond should not be configured without
one of `miimon` or `arp_interval` configured, however.

This change results in the metric 'node_bonding_active' matching the
up/down state of the bond's point of view rather than operstate.

Signed-off-by: Sachi King <nakato@nakato.io>
2019-02-10 11:00:04 +01:00
Paul Gier e0d6d11859 netclass_linux: remove varying labels from the 'up' metric (#1243)
* netclass_linux: remove varying labels from the 'up' metric

This moves the variable label values such as 'operstate' out of
the 'network_up' metric and into a separate metric called '_info'.
This allows the 'up' metric to remain continuous over state changes.
Fixes #1236

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-07 15:59:32 +01:00
Johannes 'fish' Ziemke 6ea0aa73e4 Rename interface to device in netclass collector (#1224)
* Rename interface to device in netclass collector

This makes it consistent with other networking metrics like node_network_receive_bytes_total

This closes #1223 

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2019-02-06 20:02:48 +01:00
Ralf Horstmann 3867ad5ab0 Add diskstats collector for OpenBSD (#1250)
* Add diskstats collector for OpenBSD

Tested on i386 and amd64, OpenBSD 6.4 and -current.

* Refactor diskstats collectors

This moves common descriptors from Linux, Darwin, OpenBSD
diskstats collectors into diskstats_common.go

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2019-02-06 11:36:22 +01:00
David O'Rourke d442108d7a collector: Implement uname collector for FreeBSD (#1239)
* collector: Implement uname collector for FreeBSD

Signed-off-by: David O'Rourke <david.orourke@gmail.com>
2019-02-05 17:39:24 +01:00
Paul Gier 2b81bff518 collector: use path/filepath for handling file paths (#1245)
Similar to #1228.  Update the remaining collectors to use
'path/filepath' instead of 'path' for manipulating file paths.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-05 16:37:27 +01:00
Ralf Horstmann dda51ad06a Fix staticcheck ST1003 warnings (#1249)
This fixes a few staticcheck ST1003 warnings in OpenBSD CPU
collector. No functional change.

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2019-02-05 07:46:50 +01:00
mknapphrt 7fbdd0ae93 Update procfs vendor (#1248)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-02-04 16:54:41 +01:00
Paul Gier 40dce45d8d collector/systemd: add new label "type" for systemd_unit_state (#1229)
Adds a new label called "type" to systemd_unit_state which contains the
Type field from the unit file.  This applies only to the .service and
.mount unit types.  The other unit types do not include the optional
type field.

Fixes #1210

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-01-29 23:54:47 +01:00
Matt Layher 3b5c2f6463 collector: use path/filepath for handling file paths (#1228)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-21 17:44:55 +01:00
Jon Davies e766485286 Add kstat-based Solaris metrics (#1197)
* collector/loadavg_solaris.go: Use libkstat to gather load averages.
* go.mod: Added go-kstat.
* boot_time_solaris.go: Added.
* cpu_solaris.go: Added.
* README.md: Updated entries for Solaris.
* collector/zfs_solaris.go: Added.
* CHANGELOG.md: Added note about kstat-based Solaris metrics.

Signed-off-by: Jonathan Davies <jpds@protonmail.com>
2019-01-12 13:33:56 +01:00
Ben Kochie 070e4b2e17 Update Makefile.common (#1220)
* Update Makefile.common

Update to new staticcheck method[0].

[0]: https://github.com/prometheus/prometheus/pull/5057

Signed-off-by: Ben Kochie <superq@gmail.com>

* Fix staticcheck errors.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-01-04 15:58:53 +00:00
Ben Kochie 73ddf5f1f7 netstat: Add TCP In/Out Segs (#1185)
* netstat: Add TCP In/Out Segs

In order to get a better idea of TCP packet loss, we need to know how
many `node_netstat_Tcp_OutSegs` there are so we can compare this to
`node_netstat_Tcp_RetransSegs`.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Update fixtures

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-12-08 12:16:02 +01:00
Tariq Ibrahim 6bd51269b7 update to host_statistics64 for Darwin meminfo (#1183)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2018-12-06 16:47:20 +01:00
Ben Kochie 4abc6fba7d
Add fallback for missing /proc/1/mounts (#1172)
* Add fallback for missing /proc/1/mounts

On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fall back to `/proc/mounts` if
`/proc/1/mounts` is not found.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add tests.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add CHANGELOG entry.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:55 +01:00
Jerome Froelich 0cb0c4d911 Remove unused variable readOnly from filesystem_linux.go. (#1173)
The pull request #1002 changed the logic used on Linux servers to determine if a filesystem is
read-only. As a result of this change, the variable `readOnly` is now unused and can be removed.

Signed-off-by: Jerome Froelich <jeromefroelich@hotmail.com>
2018-11-30 14:01:39 +01:00
Nemikolh 62f99f95f0 Add receive/transmit bytes total metric (wifi collector). (#1150)
Signed-off-by: Nemikolh <Nemikolh@users.noreply.github.com>
2018-11-19 19:15:54 +01:00
ioriveur 17fee8081f Check BSD's mib which accounts for swap size (#1149)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Check BSD's mib which accounts for swap size; see #1127

Signed-off-by: iori-yja <fivo.11235813@gmail.com>

* fix swap check code

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-17 11:02:54 +01:00
Arno Uhlig 6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Ben Kochie b1eec66640
Add TCPSynRetrans to netstat default filter (#1143)
Tcp SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Matt Layher 073e056121
Merge pull request #1131 from prometheus/mdl-collector-export
collector: export NodeCollector for documentation purposes
2018-10-31 12:38:48 -04:00
Matt Layher c0a55e3f80 collector: add bounds check and test for filesystem collector (#1133)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-30 22:12:42 +01:00
Patrick bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier 988f049040 collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (#1123)
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:49:22 +01:00
Paul Gier 38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
Matt Layher 778124a56c collector: add bounds check and test for tcpstat collector (#1134)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:21:36 +02:00
Matt Layher 3d798aa4a1 collector: fix golint problems in ZFS collector (#1132)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:18:33 +02:00
Matt Layher 2c2ee93519
collector: export NodeCollector for documentation purposes
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-26 15:42:00 -04:00
Ben Kochie a0a164defb
Update cpufreq metrics collector (#1117)
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Paul Gier 7057c64f45 fix a few minor golint warnings (#1110)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 18:44:06 +02:00
Paul Gier e8d8199072 Update diskstats for linux kernel 4.19 (#1109)
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields.  See: https://www.kernel.org/doc/Documentation/iostats.txt

* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 17:24:28 +02:00
Ben Kochie 0880d460d7
Ignore additional virtual filesystems (#1104)
Add more virtual filesystems to the default ignore list
* bpf
* cgroup2
* selinuxfs
* squashfs

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-12 11:24:32 +02:00
Dario Maiocchi 01ec8c5c5c Remove continue with label (#1084)
Instead of continue with label use helper function
Signed-off-by: dmaiocchi <dmaiocchi@suse.com>
2018-10-05 13:20:30 +02:00
Ben Kochie a1ce712e22
Cleanup unused /proc/mounts fixture. (#1097)
* Cleanup unused /proc/mounts fixture.
* Ignore Uint -> Unit in codespell.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-04 18:07:12 +02:00
Mario Trangoni 3659260b66 infiniband: Handle iWARP* RDMA modules N/A (#974)
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available

This is related to #966 and handles this error:

Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-10-04 15:05:59 +02:00
Yecheng Fu 0f9842f20a [continue 912] strip rootfs prefix for run in docker (#1058)
* strip rootfs prefix for run in docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share host's PID
namespace, which allows processes within the container to see all of the
processes on the system.

Closes: #66

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
2018-10-04 14:11:21 +02:00
Ralf Horstmann 9f820bd3ee Update cpu collector for OpenBSD 6.4 (#1094)
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.

SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.

For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2018-10-02 10:21:30 +02:00
Daniele Sluijters d999dacdc6 filesystem: Ignore netns/nsfs mounts (#1047)
When starting Docker containers a whole bunch of netns (network
namespace) mounts are created that the node exporter can't make any
sense of (and can't read either).

This ignores all nsfs filesystems.

Fixes #875

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
2018-09-26 10:45:51 +02:00
Ben Kochie 0fdc089187
Change systemd unit filtering (#1083)
* Change systemd unit filtering

Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-24 15:04:55 +02:00
Luca Bruno 4672ea1671 collector/timex: remove cgo dependency (#1079)
This removes the cgo import from timex collector, as it was only used
to define two constants. Those are part of the Linux kernel<->userspace
interface, thus there is no need to depend on libc to source them:
https://github.com/torvalds/linux/blob/v4.18/include/uapi/linux/timex.h

Signed-off-by: Luca Bruno <luca.bruno@coreos.com>
2018-09-20 11:51:34 +02:00
Björn Rabenstein 1c9ea46cca Update vendoring for client_golang and friends (#1076)
Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 17:09:52 +02:00
Ben Kochie ebdd524123
Correctly cast Darwin memory info (#1060)
* Correctly cast Darwin memory info

* Cast stats to float64 before doing math on them to avoid integer
wrapping.
* Remove invalid `_total` suffix from gauge values.
* Handle counters in `meminfo.go`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-07 22:27:52 +02:00
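
To illustrate why ebdd524123 casts to float64 before doing arithmetic, here is a small sketch. The 32-bit operand types and the function names are assumptions made for the example; the commit message does not state which integer types the Darwin code uses.

```go
package main

import "fmt"

// bytesWrapped multiplies in uint32 first, so large products wrap around
// before the conversion to float64.
func bytesWrapped(pages, pageSize uint32) float64 {
	return float64(pages * pageSize)
}

// bytesCorrect converts each operand to float64 first, so the product
// cannot wrap.
func bytesCorrect(pages, pageSize uint32) float64 {
	return float64(pages) * float64(pageSize)
}

func main() {
	// 2000000 pages of 4096 bytes do not fit in 32 bits.
	fmt.Println(bytesWrapped(2000000, 4096) == bytesCorrect(2000000, 4096))
}
```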
Marco Tulio R Braga 05e55bddad Fix typo on description of read_time_seconds_total (#1057)
Fix typo on unit description of metric `*read_time_seconds_total` from milliseconds to seconds.

Signed-off-by: Marco Tulio R Braga <marco.tulio@mtulio.eng.br>
2018-09-02 09:46:45 +02:00
Dan Fredell c52e0d3353 Fix SmartOS build #1017 (#1018)
Signed-off-by: Dan Fredell <Dan.Fredell@gmail.com>
2018-08-23 10:57:15 +00:00
James Hartig 60c827231a NRestarts or NRefused aren't available on older systemd versions (#1039)
* If NRestarts or NRefused are not available, don't ignore the unit itself
* Don't report systemd metrics (NRestarts/NRefused) that are not available

Signed-off-by: James Hartig <james@getadmiral.com>
2018-08-14 14:28:26 +02:00
Ben Kochie fe5a117831
Handle vanishing PIDs (#1043)
PIDs can vanish (exit) from /proc/ between gathering the list of PIDs
and getting all of their stats.

* Ignore file not found errors.
* Explicitly count the PIDs we find.
* Cleanup some error style issues.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-13 17:27:23 +02:00
Ben Kochie 0662673ad6
Disable wifi collector by default (#1037)
* Disable wifi collector by default

Disable the wifi collector by default due to suspected cashing issues and goroutine leaks.
* https://github.com/prometheus/node_exporter/issues/870
* https://github.com/prometheus/node_exporter/issues/1008

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-07 10:27:20 +02:00
Ben Kochie 5d23ad0ca7
Fix supervisord collector (#978)
* Replace supervisord xmlrpc library
* Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
* Fix uptime metric

* Use Prometheus best practices for uptime metric.
  * Use "start time" rather than "uptime".
  * Don't emit a start time if the process is down.
* Add changelog entry.
* Add example compatibility rules.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-08-06 16:54:46 +02:00
Julius Volz 2c52b8c761
systemd: Remove unneeded/unhandled error returns (#1035)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 16:55:25 +02:00
Christian Hoffmann 6bdc5558ec build: make staticcheck happy by using real regexp patterns #1025 (#1026)
Signed-off-by: Christian Hoffmann <mail@hoffmann-christian.info>
2018-07-30 07:57:18 +02:00
Hannes Körber 14a4f0028e Enable nfs protocol (#998)
* vendor: Update prometheus/procfs

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>

* mountstats: Use new NFS protocol field

In https://github.com/prometheus/procfs/pull/100, the NFSTransportStats
struct was expanded by a field called protocol that specifies the NFS
protocol in use, either "tcp" or "udp". This commit adds the protocol as
a label to all NFS metrics exported via the mountstats collector.

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>

* Update fixtures for UDP mount

Signed-off-by: Hannes Körber <hannes.koerber@haktec.de>
2018-07-24 00:47:12 +02:00
Johannes Wienke 5c780d132c Exclude only subdirectories of /var/lib/docker (#1003)
It is quite common to put /var/lib/docker itself on a separate partition
and that should be monitored as well.

Signed-off-by: Johannes Wienke <languitar@semipol.de>
2018-07-23 15:43:42 +02:00
Ben Kochie 23f95c8e04
Fix ntp collector thread safety (#1014)
Make the ntp collector thread safe by wrapping a mutex lock around the
leapMidnight variable.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-07-22 14:36:33 +02:00
xginn8 140b8b85c3 Filter out uninstalled systemd units when collecting all units (#1011)
fixes #567

Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
2018-07-22 09:20:03 +02:00
Sven Lange 2ae8c1c7a7 Add systemd uptime metric collection (#952)
* Add systemd uptime metric collection

Signed-off-by: Sven Lange <tdl@hadiko.de>
2018-07-18 16:02:05 +02:00
neiledgar 7e4d9bd150 Update wifi stats to support multiple stations (#977) (#980)
Signed-off-by: neiledgar <neil.edgar@btinternet.com>
2018-07-16 16:02:25 +02:00
xginn8 9b97f44a70 Add a counter for refused socket unit connections, available as of systemd 239 (#995)
Signed-off-by: xginn8 <mamcgi@gmail.com>
2018-07-16 16:01:42 +02:00
Brandon Gilmore 76bbd8dd18 Use /proc/mounts instead of statfs(2) for ro state (#1002)
While the statfs(2) approach is reliable for normally mounted filesystems, the
flags returned can be inconsistent when filesystem has been remounted read-only
after encountering an error. The returned flags do accurately represent the
internal state of the filesystem, but they do not reflect whether the VFS layer
will accept writes. Instead, it makes sense to parse the current VFS mount
state from the options field in /proc/mounts since it takes precedence.

Signed-off-by: Brandon Gilmore <bgilmore@valvesoftware.com>
2018-07-16 15:56:27 +02:00
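
Reading the read-only state from the options field of a /proc/mounts line, as 76bbd8dd18 proposes, could be sketched like this. `isReadOnly` is a hypothetical helper and assumes the usual whitespace-separated layout (device, mount point, type, options, ...).

```go
package main

import (
	"fmt"
	"strings"
)

// isReadOnly reports whether the options field (fourth column) of a
// /proc/mounts line contains the exact option "ro".
func isReadOnly(mountLine string) (bool, error) {
	fields := strings.Fields(mountLine)
	if len(fields) < 4 {
		return false, fmt.Errorf("malformed mount line: %q", mountLine)
	}
	for _, opt := range strings.Split(fields[3], ",") {
		if opt == "ro" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ro, err := isReadOnly("/dev/sda1 / ext4 ro,relatime 0 0")
	fmt.Println(ro, err)
}
```

Comparing whole options rather than searching for a substring keeps an option such as `errors=remount-ro` on a writable mount from being mistaken for `ro`.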
Jan Klat c4102f1175 Add sys/class/net parsing from procfs and expose its metrics (#851)
* add sys/class/net parsing from procfs and expose its metrics

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change code to use int pointers per procfs change, move netclass to separate collector, change metric naming

Signed-off-by: Jan Klat <jenik@klatys.cz>

* bump year in licence, remove redundant newline, correct fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix style

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change carrier changes to counter type

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix e2e output

Signed-off-by: Jan Klat <jenik@klatys.cz>

* add fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* update vendor, use fixtures correctly

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change fixtures (device in /sys/class/net should be symlinked)

Signed-off-by: Jan Klat <jenik@klatys.cz>

* correct fixtures for 64k page, updated readme

Signed-off-by: Jan Klat <jenik@klatys.cz>
2018-07-16 15:08:18 +02:00
mknapphrt 09b4305090 Changed the way that stuck mounts are handled. If a mount fails to return, it will stop being queried until it returns. (#997)
Fixed spelling mistakes.

Update transport_generic.go

Changed to a mutex approach instead of channels and added a timeout before declaring a mount stuck.

Removed unnecessary lock channel and clarified some var names.

Fixed style nits.

Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2018-07-14 11:10:28 +02:00
xginn8 ac5a981761 Adding socket stat collection for systemd socket units (#968)
Signed-off-by: xginn8 <mamcgi@gmail.com>
2018-07-05 16:26:48 +02:00
xginn8 8af84a215d Add support for NRestarts counter introduced in systemd 235 (#992)
* Add support for NRestarts counter introduced in systemd 235

`.service` units increment this counter any time the Restart= condition is
triggered.

Signed-off-by: Matthew McGinn <mamcgi@gmail.com>
2018-07-05 13:31:45 +02:00
Ben Kochie 107e5dfecc
Fix mdadm collector issues (#985)
* Send "Personality unknown" to debug, not info, remove unnecessary newline.
* Add support for "linear" personality.
* Always set number of active disks to 0 when a device is inactive.
* Add total disks calculation to unknown personalities.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-07-02 12:38:20 +02:00
Derek Marcotte 2678d68dcc Fix for #945, cpu temperature is signed. (#965)
* Fix for #945, cpu temperature is signed.

Added a type conversion to cpu temperature sysctl.  Will still
collect/report -1 when the value is -1, this is because it should be up
to interpretation whether this is the correct value for the system or
not.

Some drivers will report -1 for cpu temperature.  Other sensors will
report "an input into the fan control algorithm", i.e. not the actual
temperature, but how much fan it wants.  Some people cool their machines
with liquid nitrogen.

Signed-off-by: Derek Marcotte <554b8425@razorfever.net>
2018-06-07 15:01:25 +02:00
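
The type conversion mentioned in 2678d68dcc can be illustrated with a tiny sketch. It assumes the raw sysctl value arrives as an unsigned 32-bit integer, which the commit message does not spell out; `temperatureFromRaw` is a hypothetical name.

```go
package main

import "fmt"

// temperatureFromRaw reinterprets the raw 32-bit value as signed before
// converting it to float64, so a driver reporting -1 yields -1 rather than
// a very large positive number.
func temperatureFromRaw(raw uint32) float64 {
	return float64(int32(raw))
}

func main() {
	fmt.Println(temperatureFromRaw(0xFFFFFFFF))
}
```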
Brad Beam e3cf1d5187 Adding support for evaluating octal characters in mountpoint (#954)
Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2018-06-06 16:49:19 +02:00
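
For e3cf1d5187, the sketch below shows one way to evaluate octal escapes in a mount point, assuming the three-digit `\NNN` form that /proc/mounts uses for characters such as spaces (`\040`). `unescapeOctal` is a hypothetical helper, not necessarily how the collector implements it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescapeOctal replaces each backslash followed by three octal digits with
// the byte it encodes. Anything else is copied through unchanged.
func unescapeOctal(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+3 < len(s) {
			if v, err := strconv.ParseUint(s[i+1:i+4], 8, 8); err == nil {
				b.WriteByte(byte(v))
				i += 3 // skip the three digits just consumed
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Println(unescapeOctal(`/mnt/my\040disk`))
}
```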
Pavlo Kutishchev 456bf5094a Add processes exporter (#950)
* Add processes exporter

Signed-off-by: Pavel Kutishchev <pavel.kutishchev@olx.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-06-05 19:38:32 +02:00
Alexey Kopytov dd98a09bb2 A couple of ARM64-related fixes (#934)
* Do not rely on AArch64 CPUs to support 32-bit ARM for cross-testing.

Signed-off-by: Alexey Kopytov <akopytov@gmail.com>

* aarch64 like ppc64le reports 64k node_sockstat_TCP_mem_bytes due to 64k pages.

Signed-off-by: Alexey Kopytov <akopytov@gmail.com>
2018-05-14 15:55:49 +02:00
Steve Kotsopoulos 84dc362b05 Align Darwin disk stat names with Linux (#930)
Signed-off-by: Steve Kotsopoulos <sk@fywss.com>
2018-05-02 11:32:55 +02:00
Mario Trangoni 24a28fcc9e Remove unused func, var, and const (#928)
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-04-29 14:35:43 +02:00
Mario Trangoni c9f421d0dd Fix some golint issues (#927)
* collector/cpu_*: rename nodeCpuSecondsDesc to nodeCPUSecondsDesc

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* collector/qdisc_linux.go: add NewQdiscStatCollector comment

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* collector/cpu_linux.go: rename core_map to coreMap

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-04-29 14:34:47 +02:00
Ben Kochie 361b5bf85d
Merge pull request #852 from prometheus/remove-gmond
Remove gmond collector
2018-04-27 10:02:16 +02:00
Ben Kochie b10ca77680
Fix /proc/net/dev/ interface name handling
* Allow any character (UTF-8) for Linux interface names.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-18 12:53:59 +02:00
Ben Kochie 1ab4a460c7 Update ppc64le end-to-end fixture.
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-18 09:12:21 +02:00
Johannes 'fish' Ziemke fd66a86a30 Remove gmond collector
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2018-04-17 20:20:24 +02:00
Ben Kochie 0f5be132ac
Merge pull request #904 from prometheus/superq/if_alias
Fix parsing of interface aliases in netdev linux
2018-04-17 13:37:21 +02:00
Ben Kochie a528966dcd Fix parsing of interface aliases in netdev linux
Very old kernels expose interface aliases as `foo0:0`, adjust the line
parsing to handle these names.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-17 13:15:02 +02:00
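
The alias problem fixed in a528966dcd comes from interface names that themselves contain a colon. One possible approach, shown here purely as an assumption and not as the parser the collector actually uses, is to split each /proc/net/dev line at the last colon; `splitNetDevLine` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNetDevLine separates the interface name from the counters by cutting
// at the last colon, so a name such as "foo0:0" stays intact.
func splitNetDevLine(line string) (string, []string, error) {
	idx := strings.LastIndex(line, ":")
	if idx < 0 {
		return "", nil, fmt.Errorf("no interface separator in %q", line)
	}
	return strings.TrimSpace(line[:idx]), strings.Fields(line[idx+1:]), nil
}

func main() {
	name, stats, err := splitNetDevLine("  foo0:0: 10 20")
	fmt.Println(name, stats, err)
}
```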
Ben Kochie f6008b242b
Merge pull request #901 from mischief/bsd_boottime
collector: implement node_boot_time_seconds for OpenBSD/NetBSD/Darwin
2018-04-17 07:48:39 +02:00
Jürgen Hötzel de0632c2e9 Fix memory corruption when number of filesystems > 16 (#900)
Signed-off-by: Juergen Hoetzel <juergen@archlinux.org>
2018-04-16 12:39:15 +02:00
mischief 26a385d7ab collector: implement node_boot_time_seconds for OpenBSD/NetBSD/Darwin
Signed-off-by: mischief <mischief@offblast.org>
2018-04-15 08:26:46 +00:00
Ben Kochie 015b86670a
Update ppc64le e2e output.
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-14 15:28:06 +02:00
Ben Kochie 0507b0c9a2
Fix formatting.
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-04-14 15:02:20 +02:00
Dmitriy Lukyanchikov eddd1b9357 Fix netdev collector for linux (#890)
fix variable name, fix transmitHeader extracting
modify fixtures to run tests with updated netdev_linux collector

Signed-off-by: dmitriy-lukyanchikov <d.lukyanchikov@anchorfree.com>
2018-04-14 13:58:56 +02:00
Derek Marcotte fe86e908da Update ppc64 fixtures to unbreak end-to-end.
efc1fdb added new labels.

Signed-off-by: Derek Marcotte <554b8425@razorfever.net>
2018-04-13 06:33:38 -04:00
Karsten Weiss 7e392e6634 Fix spelling mistakes found by codespell
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:27:17 +02:00
Karsten Weiss efc1fdb6d0 cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total (#871)
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total

This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them separately.

Rename the node_package_throttles_total metric label `node` to `package`.

Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Use the direct /sys path to the cpu files.

Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.

The latter path also does not exist e.g. on RHEL 6.9's kernel.

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Reverse core+package throttle processing order

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Add documentation URLs

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:01:52 +02:00
Brian Brazil 31ce32f1fe
Greatly trim what netstat collector exposes by default (#876)
Netstat is 40% of the metrics on my laptop, many of which
are highly detailed information about IP internals in the kernel.
~300 such metrics on every machine in your fleet is excessive,
so focus on key metrics by default, overridable by the user.

Fixes #515

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-03-30 19:28:08 +01:00
Ben Kochie cf3edadcbb Update fixtures
* Add oom_kill to fixture.
* Update e2e outputs.
* Put regexp in order.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-03-29 22:00:02 +01:00
Brian Brazil 499c342fed Greatly reduce the metrics vmstat returns by default.
Vmstat has over 100 fields, most of which are highly
detailed debug information. Trim this down to only
essential fields by default, configurable by flag.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-03-29 22:00:02 +01:00
Brian Brazil c8c144587e
Enable bonding collector by default. (#872)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-03-29 15:18:12 +01:00
Ben Kochie 779090db7e
Update ppc64le fixture (#867)
Update to match standard e2e output.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-03-27 17:05:20 +02:00
Mario Trangoni 1f11a86d59 Fix nfs golint issues (#863)
* procfs: update vendoring

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* procfs: fix e2e tests after nfs changes

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-03-22 22:25:37 +01:00
Ben Kochie 7b720df1c5
Use lowercase cpu label name in interrupts (#849)
To match other CPU related metric labels, use a lowercase named label.
2018-03-08 15:04:49 +01:00
Johannes 'fish' Ziemke 424ca8e322 Drop exec_ in boot_timestamp_seconds on *bsd (#839)
This closes #827.
2018-03-08 12:59:48 +01:00
colmbuckley 098f975b48 Correct the ClocksPerSec scaling factor on Darwin (#846)
* Update cpu_darwin.go

Change the definition of ClocksPerSec to read from limits.h

* Update cpu_darwin.go
2018-03-07 11:56:57 +01:00
Julius Volz 864a6ee935 Treat custom textfile metric timestamps as errors (#769)
This is clearer behavior and users will notice and fix their textfiles faster
than if we just output a warning.
2018-02-27 19:43:38 +01:00
Rene Treffer c504c7e264 Only report core throttles per core, not per cpu (#836)
* Only report core throttles per core, not per cpu

* Add topology/core_id to the cpu sysfs fixtures

* Add new cpu fixtures to ttar file

* Merge core_id reading and thermal throttle accounting

* Declare core_id
2018-02-27 19:43:15 +01:00
Ben Kochie e0d54a509c
Cleanup NFS metrics (#834)
* Cleanup NFS metrics

* Update `nfs` metric names to match `nfsd`.
* Remove unneeded `tcp` label from TCP connections metric.
* Remove unneeded `v` on `nfsd` metrics.
* Enable all `nfs` v4 client metrics.
* Remove `nfs` metric name overrides.

* Add ppc64le fixture.

* Fix typo.
2018-02-21 07:25:41 +01:00
Ben Kochie 3f41a2fecb
Update ppc64le fixture (#832)
Updates fixture for ppc64le arch to latest output.
2018-02-19 20:43:33 +01:00
Ben Kochie d33a447047
Remove deprecated prometheus.InstrumentHandlerFunc (#831)
Update Prometheus client golang use to use `promhttp.Handler()` instead
of `prometheus.InstrumentHandlerFunc()`.
2018-02-19 15:44:59 +01:00
Richard Elling d7348a5c78 updates for zfsonlinux 0.7.5 (#779)
* updates for zfsonlinux 0.7.5

* add constants for KSTAT_DATA_* types

* added e2e test for negative values represented by uint64 that can result from ZFS bugs
2018-02-16 15:46:31 +01:00
Ben Kochie 6468e7c80b
Enable NFS client metrics by default. (#828)
Enable NFS client metrics by default now that it no longer prints errors
on scrape if there are no metrics to display.

Also fixup the nfsd README to match the nfs entry.
2018-02-16 15:42:47 +01:00
Ralf Horstmann 8d9c7ca659 Use swpginuse instead of swpgonly in meminfo_openbsd (#813)
All tools in OpenBSD base system use swpginuse instead of swpgonly
for reporting swap usage (snmpd, swapctl, top, vmstat), so let
memory collector use that as well for consistency.
2018-02-16 11:34:41 +01:00
Kasinath Kottukkal f6965e1812 Add overlay to defIgnoredFSTypes (#824)
* Add overlay to defIgnoredFSTypes

To avoid statfs() errors if node_exporter is running as non privileged user.

* Updated defIgnoredFSTypes values in sorted order
2018-02-16 09:47:50 +01:00
Ben Kochie 01bd99fb1a
Refactor NFS client collector (#816)
* Update vendor github.com/prometheus/procfs/...

* Refactor NFS collector

Use new procfs library to parse NFS client stats.

* Ignore nfs proc file not existing.

* Refactor with reflection to walk the structs.
2018-02-15 13:40:38 +01:00
Brian Brazil 52c031890e
Add _seconds suffix to node_time. (#823) 2018-02-14 16:59:08 +00:00
Ben Kochie 05eabe60fb
Fix error output in nfsd collector. (#821) 2018-02-14 13:57:35 +01:00
Ben Kochie 3de2542d21
Fix NFSd metric type (#819)
RPC Count should be a counter, not a gauge.
2018-02-13 17:03:22 +01:00
Matt Layher 544488ddd6 Fix remaining metric naming issues (#799) 2018-02-12 18:53:31 +01:00
Ben Kochie 6a041692ed
Add NFS Server metrics collector. (#803)
* Add NFS Server metrics collector.

* Add File Handles metrics.

* Add nfsd IO stats.

* Add metrics for NFSd threads.

* Add metrics for NFSd read ahead cache.

* Add NFSd network traffic counters.

* Add RPC metrics.

* Add V2 requests metrics.

* Add NFSv3 metrics.

* Add NFSv4 metrics.

* Update reply cache comment.

* Update help text.
2018-02-12 17:56:05 +01:00
Brian Brazil 1072f2868d Fix log level regression in #533 2018-02-07 15:16:20 +00:00
Brian Brazil 7e41a2b279 Ignore /var/lib/docker by default. (#814)
The node exporter runs unprivileged, so it cannot statfs any filesystems
under this directory causing log spam.  In addition there tends to be
high churn in the filesystems here (as it's basically application
monitoring) which can cause high cardinality and in one case caused
Prometheus's index symbol table to get very large.
Accordingly this should be ignored to reduce log spam and avoid
performance issues. The filesystems themselves can in principle be
monitored via container oriented exporters, and the underlying
filesystems will still be monitored.
2018-02-06 17:10:59 +01:00
Ralf Horstmann 29ac809e48 Use unified CPU metric description on OpenBSD (#810) 2018-02-01 23:59:19 +01:00
Derek Marcotte fde5d2c6c9 Remove unsafe typecasts from sysctl_bsd getStructTimeval. (#741)
There is a simpler way.
2018-02-01 18:43:40 +01:00
Ben Kochie 14d60958d6
Unify CPU collector conventions (#806)
* Unify CPU collector conventions

Add a common CPU metric description.
* All collectors use the same `nodeCpuSecondsDesc`.
* All collectors drop the `cpu` prefix for `cpu` label values.

* Fix subsystem string in cpu_freebsd.

* Fix Linux CPU freq label names.
2018-02-01 18:42:20 +01:00
Ralf Horstmann e3c76b1f0c Add OpenBSD CPU collector (#805) 2018-02-01 18:33:49 +01:00
Tom Wilkie 6833eec187 Fix tests. 2018-01-31 15:22:17 +00:00
Tom Wilkie 0316bacceb Only use one dbus connection, required some refactoring. 2018-01-31 15:19:18 +00:00
Tom Wilkie a7fd6b8743 Export systemd timer last trigger sec. 2018-01-31 15:07:04 +00:00
Ben Kochie 111e3af437
Remove obsolete megacli collector. (#798)
This collector has been replaced by the textfile collector tool
`storcli.py`.
2018-01-23 11:25:42 +01:00
Julius Volz 6cac74f0e0
Add unit suffix to textfile collector mtime metric (#796) 2018-01-22 14:02:19 +01:00
Brian Brazil a98067a294 Make metrics better follow guidelines (#787)
* Improve stat linux metric names.

cpu is no longer used.

* node_cpu -> node_cpu_seconds_total for Linux

* Improve filesystem metric names with units

* Improve units and names of linux disk stats

Remove sector metrics, the bytes metrics cover those already.

* Infiniband counters should end in _total

* Improve timex metric names, convert to more normal units.

See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means, looks like a moving average of some form.

* Update test fixture

* For meminfo metrics that had "kB" units, add _bytes

* Interrupts counter should have _total
2018-01-17 17:55:55 +01:00
Ben Kochie b4d7ba119a
Add fixture for ppc64le (#785)
* Add support for per-architecture fixtures.
* Add output for ppc64le.
2018-01-11 13:56:19 +01:00
Nick Owens 0629a081db multiply page size after float64 coercion to avoid signed integer overflow (#780) 2018-01-08 15:36:49 +01:00
Franz Pletz d432f9857e Use uint64 in the ZFS collector (#714)
ZFS metrics can also be unsigned 64-bit integers that won't fit in
int64 and cause the whole collector to fail.
2018-01-06 12:36:55 +01:00
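The overflow behind d432f9857e can be sketched in a few lines. This is a Python illustration only (the collector is Go), and `parse_kstat_value` and `as_int64` are hypothetical helpers, not functions from the collector. The sketch shows why a value above 2^63-1 cannot be held in a signed 64-bit integer, and how the same bit pattern reads as a negative number when reinterpreted as signed.

```python
import struct

INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def parse_kstat_value(text):
    """Parse a decimal kstat field as an unsigned 64-bit integer."""
    value = int(text)
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"{text!r} does not fit in uint64")
    return value


def as_int64(value):
    """Reinterpret the 64-bit pattern of an unsigned value as signed."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


raw = "18446744073709551615"  # all 64 bits set
value = parse_kstat_value(raw)
print(value > INT64_MAX)  # True: a signed 64-bit parser would reject this
print(as_int64(value))    # -1: the same bits read as a signed integer
```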
Derek Marcotte 477fe4665a Move FreeBSD/DragonflyBSD out of meminfo add kvm. (#547)
* Move FreeBSD/DragonflyBSD out of meminfo add kvm.

This gives us SwapUsed, and everything under one roof.

* Fix typos per review.

* Update to use newer API.

* Remove premature optimization per PR feedback.
2018-01-04 12:23:26 +01:00
Sevag Hanssian 4329b0a86b Add summary metrics for systemd exporter (#765) 2018-01-04 11:49:36 +01:00
Matthieu Guegan d6ef10bb56 Add openbsd meminfo (#724)
* Implements meminfo collector for OpenBSD

This is a rework of #151.

* Fix CGO import

* Add some useful metrics

* Rename total -> size for normalization
2018-01-04 10:32:08 +01:00
Ben Kochie 7f6c59e198
Ignore more virtual filesystems (#775)
Add additional Linux virtual filesystem types to the default list.
2018-01-03 17:22:02 +01:00
Netmonk 2aa8d0eb0c [FIX] Exclude Linux proc from filesystem type regexp (#774)
* [FIX] Issue 63, error on excluding proc filesystem on linux, improving regexp

* [FIX] Reordering filter order
2018-01-03 11:40:32 +01:00
Julius Volz f536857ac6
Fix e2e tests after textfile custom timestamp removal (#768) 2017-12-24 11:54:33 +01:00
Shubheksha Jalan 1f2458f42c Filter out textfile metrics correctly when using collect[] filters (#763)
* remove injection hook for textfile metrics, convert them to prometheus format

* add support for summaries

* add support for histograms

* add logic for handling inconsistent labels within a metric family for counter, gauge, untyped

* change logic for parsing the metrics textfile

* fix logic for adding missing labels

* Export time and error metrics for textfiles

* Add tests for new textfile collector, fix found bugs

* refactor Update() to split into smaller functions

* remove parseTextFiles(), fix import issue

* add mtime metric directly to channel, fix handling of mtime during testing

* rename variables related to labels

* refactor: add default case, remove if guard for metrics, remove extra loop and slice

* refactor: remove extra loop iterating over metric families

* test: add test case for different metric type, fix found bug

* test: add test for metrics with inconsistent labels

* test: add test for histogram

* test: add test for histogram with extra dimension

* test: add test for summary

* test: add test for summary with extra dimension

* remove unnecessary creation of protobuf

* nit: remove extra blank line
2017-12-23 20:21:58 +01:00
Ben Kochie cd2a17176a
Add full make to CircleCI (#761)
* Add full make to CircleCI

Ensure end-to-end test is run.

* Fix go fmt error.

* Fix end-to-end output.
2017-12-21 16:24:23 +01:00
Wei Li 1e9bb4ec3a textfile: fix duplicate metrics error (#738)
The textfile gatherer should only be added to gatherer list once.

Signed-off-by: Li Wei <liwei@anbutu.com>
2017-12-06 17:05:40 +01:00
Kristian Klausen a96f1738b3 netdev: Change valueType to CounterValue (#749)
All of these metrics only go up, so the type should be counter.
This also adds _total to all the metric names.

Fix: #747
2017-12-06 13:58:35 +01:00
Ben Kochie 2a80537547
Split out guest cpu metrics on Linux. (#744)
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics.  Separate these into their own metric to
avoid duplication of data.
2017-11-23 15:04:47 +01:00
Karsten Weiss a8d7d1101a cpu: Support processor-less (memory-only) NUMA nodes (#734)
* cpu: Support processor-less (memory-only) NUMA nodes

Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).

IMDT RAM expansion supports two modes:

* "Unify Remote Memory domains": present a processor-less (memory-only)
  NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
  with a portion of the memory made available by Optane and IMDT

This commit fixes a crash in the first case (when "cpulist" is empty).

Here's an example of such a system:

$ numastat -m|head -n5

Per-node system memory usage (in MBs):
                          Node 0          Node 1          Node 2           Total
                 --------------- --------------- --------------- ---------------
MemTotal               118239.56       130816.00       464384.00       713439.56

$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:

$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
    Boards:      3
       1 x Proc. + I/O + Memory
       2 x NVM devices (Intel SSDPED1K375GAQ)
    Processors:  2, Cores: 16, Threads: 32
        Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
    Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
       1 x 249088MB   [262036/   678/12270]
       1 x 232192MB   [357707/125369/  146]  82:00.0#1
       1 x 232192MB   [357707/125369/  146]  83:00.0#1

* cpu: rename some variables (pkg => node)

* cpu: Use %v not %q in log.Debugf() format strings
2017-11-10 15:31:26 +01:00
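The empty-`cpulist` case fixed in a8d7d1101a can be sketched like this. It is a Python illustration, not the collector's Go code, and `parse_cpulist` is a hypothetical helper; the inputs mirror the three `cpulist` files shown in the commit message, where node 2 is the processor-less NUMA node.

```python
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a sysfs cpulist string such as "0-7,16-23" into CPU numbers."""
    text = text.strip()
    if not text:
        # Processor-less (memory-only) NUMA node: nothing to expand.
        return []
    cpus = []
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


for node, cpulist in enumerate(["0-7,16-23\n", "8-15,24-31\n", "\n"]):
    print(node, parse_cpulist(cpulist))
```

Returning an empty list for the empty file lets the caller skip the node instead of failing while parsing a missing range.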
Matt Layher f6f9c8d6cc Add and use sysReadFile in hwmon collector (#728) 2017-11-07 07:49:37 +01:00
Tobias Klauser d73f1e60c4 Simplify Utsname string conversion (#716)
* Update golang.org/x/sys/unix

This allows simplified string conversion of Utsname members.

* Simplify Utsname string conversion

Use Utsname from golang.org/x/sys/unix which contains byte array
instead of int8/uint8 array members. This simplifies the string
conversions of these members.
2017-11-02 11:57:14 +01:00
Ben Kochie ea250d73f4
Fix off by one in Linux interrupts collector (#721)
* Fix off by one in Linux interrupts collector

* Fix off by one in CPU column handler.
* Add test.

* Enable interrupts in end-to-end test.
2017-11-02 09:59:46 +01:00
Matt Layher 296b62acb7
netstat: return nothing when /proc/net/snmp6 not found 2017-10-31 15:26:32 -04:00
Derek Marcotte 0eecaa9547 Correct buffer_bytes > INT_MAX on BSD/amd64. (#712)
* Correct buffer_bytes > INT_MAX on BSD/amd64.

The sysctl vfs.bufspace returns either an int or a long, depending on
the value.  Large values of vfs.bufspace will result in error messages
like:

  couldn't get meminfo: cannot allocate memory

This will detect the returned data type, and cast appropriately.

* Added explicit length checks per feedback.

* Flatten Value() to make it easier to read.

* Simplify per feedback.

* Fix style.

* Doc updates.
2017-10-25 20:55:22 +02:00
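The length-based type detection described in 0eecaa9547 can be sketched as follows. This is a Python illustration with a hypothetical helper, `decode_sysctl_integer`; it assumes a 4-byte int and an 8-byte long, as on amd64, and it does not call sysctl itself but only decodes a raw buffer.

```python
import struct


def decode_sysctl_integer(raw: bytes) -> int:
    """Decode a raw sysctl value as an int (4 bytes) or a long (8 bytes)."""
    if len(raw) == 4:
        return struct.unpack("=i", raw)[0]
    if len(raw) == 8:
        return struct.unpack("=q", raw)[0]
    # Explicit length check: refuse anything that is neither width.
    raise ValueError(f"unexpected sysctl value length: {len(raw)}")


small = struct.pack("=i", 123456)          # fits in an int
large = struct.pack("=q", 3_000_000_000)   # larger than INT_MAX (2147483647)
print(decode_sysctl_integer(small))  # 123456
print(decode_sysctl_integer(large))  # 3000000000
```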
Matt Layher f9ad88fc03
xfs: expose correct fields, fix metric names 2017-10-20 18:41:51 -04:00
Siavash Safi f3a7022602 Add collect[] parameter (#699)
* Add `collect[]` parameter

* Add TODO comment about staticcheck ignored

* Restore promhttp.HandlerOpts

* Log a warning and return HTTP error instead of failing

* Check collector existence and status, cleanups

* Fix warnings and error messages

* Don't panic, return error if collector registration failed

* Update README
2017-10-14 14:23:42 +02:00
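The filtering added in f3a7022602 can be sketched as below. This is a Python illustration, not the exporter's Go handler: `select_collectors`, the `AVAILABLE` table, and the choice of HTTP status 400 are assumptions made for the example. It shows repeated `collect[]` query parameters selecting a subset of collectors, with unknown or disabled names rejected rather than crashing the process.

```python
from urllib.parse import parse_qs

# Hypothetical registry: collector name -> enabled.
AVAILABLE = {"cpu": True, "meminfo": True, "ntp": False}


def select_collectors(query, available=AVAILABLE):
    """Return (status, message, names) for a scrape query string."""
    filters = parse_qs(query).get("collect[]", [])
    if not filters:
        # No filter given: use every enabled collector.
        names = sorted(n for n, enabled in available.items() if enabled)
        return 200, "ok", names
    for name in filters:
        if name not in available:
            return 400, f"missing collector: {name}", []
        if not available[name]:
            return 400, f"disabled collector: {name}", []
    return 200, "ok", filters


print(select_collectors("collect[]=cpu&collect[]=meminfo"))
print(select_collectors("collect[]=ntp"))
print(select_collectors(""))
```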
Ben Kochie deadfef4c9 Update vendoring (#685)
* Update vendor github.com/coreos/go-systemd/dbus@v15

* Update vendor github.com/ema/qdisc

* Update vendor github.com/godbus/dbus

* Update vendor github.com/golang/protobuf/proto

* Update vendor github.com/lufia/iostat

* Update vendor github.com/matttproud/golang_protobuf_extensions/pbutil@v1.0.0

* Update vendor github.com/prometheus/client_golang/...

* Update vendor github.com/prometheus/common/...

* Update vendor github.com/prometheus/procfs/...

* Update vendor github.com/sirupsen/logrus@v1.0.3

Adds vendor golang.org/x/crypto

* Update vendor golang.org/x/net/...

* Update vendor golang.org/x/sys/...

* Update end to end output.
2017-10-05 16:20:47 +02:00
Brett Vickers b62c7bc0ad Updated vendored ntp package (#681)
The github.com/beevik/ntp package was recently updated with some
API changes that broke node_exporter. This commit fetches the
latest version of the ntp package and brings node_exporter in
line with the latest API.
2017-10-04 08:33:49 +02:00
Calle Pettersson 859a825bb8 Replace --collectors.enabled with per-collector flags (#640)
* Move NodeCollector into package collector

* Refactor collector enabling

* Update README with new collector enabled flags

* Fix out-of-date inline flag reference syntax

* Use new flags in end-to-end tests

* Add flag to disable all default collectors

* Track if a flag has been set explicitly

* Add --collectors.disable-defaults to README

* Revert disable-defaults flag

* Shorten flags

* Fixup timex collector registration

* Fix end-to-end tests

* Change procfs and sysfs path flags

* Fix review comments
2017-09-28 15:06:26 +02:00
Sami Kerola 3762191e66 Add timex collector (#664)
This collector is based on the adjtimex(2) system call.  The collector returns
three values: whether time is synchronised, the offset to the remote reference,
and the local clock frequency adjustment.

Values are taken from the kernel's timekeeping data structures, so the
collector does not depend on how synchronisation is implemented.  It does not
matter whether time is updated by ntpd, systemd-timesyncd, ptpd, and so on.
Since every time sync implementation ends up telling the kernel the status of
the clock, one can skip the software in between and look at the results of
the syncing.  As a positive side effect this makes the collector very quick
and conceptually specific: it does not monitor the availability of the NTP
server, the network in between, DNS resolution, or other unrelated but
necessary things.

The minimum set of values to keep an eye on is the following three:

    node_timex_sync_status tells whether the local clock is in sync with a
    remote clock.  The value is set to zero when synchronisation to a
    reliable server is lost, or when the time sync software is misconfigured.

    node_timex_offset_seconds tells how far off the local clock is compared
    to the reference.  With multiple time references this value is the
    outcome of the RFC 5905 adjustment algorithm.  Ideally the offset should
    be close to zero; how large a value is acceptable depends on the use
    case.  For example, a typical web server is probably fine if the offset
    is about 0.1 or less, but that would not be good enough for a mobile
    phone base station operator.

    node_timex_freq tells the amount of adjustment applied to the local clock
    tick frequency.  For example, if the offset is one second and growing,
    the local clock needs to be instructed to tick quicker.  The number
    itself is not very important, and occasional small adjustments are fine.
    When the frequency is unusually unstable, one can assume time stamps
    will not be accurate very far into the sub-second range.  Explaining why
    the local clock frequency behaves like a passenger on a roller coaster
    is a different matter; explanations vary from system load to
    environmental issues such as a machine being physically too hot.

The rest of the measurements can help when debugging.  If you run a clock
server you probably want to collect and keep track of everything.

Pull-request: https://github.com/prometheus/node_exporter/pull/664
2017-09-19 07:54:06 -07:00
Leonid Evdokimov c169b4b1c5 Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check (#655)
* Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check

1. Checking the local clock against a remote NTP daemon is a bad idea: a local
ntpd acting as a client should do it better and avoid excessive load on the
remote NTP server, so the collector is refactored to query the local NTP
server.

2. Checking the local clock against a remote one does not check the local
ntpd itself. The local ntpd may be down or out of sync due to network issues
while the clock is still OK.

3. Checking an NTP server by the sanity of its response is tricky and
depends on the ntpd implementation; that's why a common `node_ntp_sanity`
variable is exported.

* `govendor add golang.org/x/net/ipv4`, it is dependency of github.com/beevik/ntp

* Update github.com/beevik/ntp to include boring SNTP fix

* Use variable name from RFC5905

* ntp: move code to make export of raw metrics more explicit

* Move NTP math to `github.com/beevik/ntp`

* Make `golint` happy

* Add some brief docs explaining `ntp` #655 and `timex` #664 modules

* ntp: drop XXX comment that got its decision

* ntp: add `_seconds` suffix to relevant metrics

* Better `node_ntp_leap` comment

* s/node_ntp_reftime/node_ntp_reference_timestamp_seconds/ as requested by @discordianfish

* Extract subsystem name to const as suggested by @SuperQ
2017-09-19 10:36:14 +02:00
Karsten Weiss b0d5c00832 cpu: Metric 'package_throttles_total' is per package. (#657)
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Matthias Rampke e1f129c729 Use int64 throughout the ZFS collector.
This avoids issues with integer overflows on 32-bit architectures. The
Prometheus data format is float64, so regardless of the architecture we
should handle large numbers.

Fixes #629.
2017-08-21 16:40:16 +00:00
Ben Kochie 8839640cd1 Ignore wifi collector permission errors (#646)
Ignore the permission denied error when the wifi collector has no
permission to read metrics.
2017-08-18 10:19:48 +02:00
Calle Pettersson dfe07eaae8 Switch to kingpin flags (#639)
* Switch to kingpin flags

* Fix logrus vendoring

* Fix flags in main tests

* Fix vendoring versions
2017-08-12 15:07:24 +02:00
Ben Kochie 46c31d8a7e Enable IPVS collector by default (#623)
* Silence error output when no IPVS present.
* Enable by default.
* Update end-to-end fixture.
* Update README.
2017-07-26 15:20:28 +02:00
Tobias Schmidt 515b5a933d Fix build tags of loadavg collector
The collector is only implemented for a subset of all operating systems
supported by go. Compilation will fail if attempted for another OS
target.
2017-07-20 15:13:58 -04:00
Tobias Schmidt 016d79535d Fix build tags of meminfo collector
The meminfo collector only supports darwin, dragonfly, freebsd and linux
and must not be included in other architectures.
2017-07-20 14:37:10 -04:00
Andrea De Pasquale 1369763067 Change raid0 status line regexp for mdadm collector (#619) 2017-07-20 17:04:33 +02:00
Tobias Schmidt 921319c7eb Merge pull request #583 from knweiss/golint
Golint fixes
2017-07-10 23:49:36 +02:00
Aleksey Zhukov 7a914e58f2 Add parsing /proc/net/snmp6 file for netstat-linux (#615)
* Add parsing /proc/net/snmp6 file

* add /proc/net/snmp6 fixture

* fix e2e test

* gofmt

* remove unused variable

* safe checks

* add tests

* change help format
2017-07-08 20:16:35 +02:00
Matt Layher 6e82fd1c56 Add XFS block mapping and block map B-tree stats (#575) 2017-07-07 07:27:52 +02:00
ideaship 8d90276283 Add bcache collector (#597)
* Add bcache collector for Linux

This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.

* Removed commented out code

* Use project comment style

* Add _sectors to metric name to indicate unit

* Really use project comment style

* Rename bcache.go to bcache_linux.go

* Keep collector namespace clean

Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric

* Shorten slice initialization

* Change label names to backing_device, cache_device

* Remove five minute metrics (keep only total)

* Include units in additional metric names

* Enable bcache collector by default

* Provide metrics in seconds, not nanoseconds

* remove metrics with label "all"

* Add fixtures, update end-to-end for bcache collector

* Move fixtures/sys into tar.gz

This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.

The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not to turn up in regular
file names).

* Add ttar: plain text archive, replacement for tar

This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).

Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.

The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.

The program also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.
2017-07-07 07:20:18 +02:00
kadota kyohei a077024f51 add diskstats on Darwin (#593)
* Add diskstats collector for Darwin

* Update year in the header

* Update README.md

* Add github.com/lufia/iostat to vendored packages

* Change stats to follow naming guidelines

* Add an entry of github.com/lufia/iostat into vendor.json

* Remove /proc/diskstats from description
2017-07-06 13:51:24 +02:00
Rene Treffer 56bf8d4b2d Add link to kernel documentation for sysfs/cpufreq files 2017-06-27 11:25:06 +02:00
Rene Treffer bcc3cd92b8 Fix cpufreq statistics by converting kHz to Hz 2017-06-27 11:05:55 +02:00
Ben Kochie 182810056f Fix Linux cpu errors (#606)
Make the Linux cpu collector soft-error on missing `cpufreq` and
`thermal_throttle` features.
2017-06-20 07:51:26 +02:00
Rene Treffer 2e9f1913b8 Move stat_linux to cpu_linux and add cpufreq stats (#548) 2017-06-13 11:21:53 +02:00
Emanuele Rocca 047003b6bb Add qdisc collector for Linux (#580)
* Add qdisc collector for Linux

This collector gathers basic queueing discipline metrics via netlink,
similarly to what `tc -s qdisc show` does.

* qdisc collector: nl-specific code moved, names fixed

- netlink-specific parts moved to github.com/ema/qdisc
- avoid using shortened names
- counters renamed into XXX_total

* Get rid of parseMessage error checking leftover

* Add github.com/ema/qdisc to vendored packages

* Update help texts and comments

* Add qdisc collector to README file

* qdisc collector end-to-end testing

* Update qdisc dependency to latest version

Update github.com/ema/qdisc dependency to revision 2c7e72d, which
includes unit testing.

* qdisc collector: rename "iface" label into "device"
2017-05-23 11:55:50 +02:00
Karsten Weiss b2f4fd5776 Remove unused devstatCollector struct member 'bytes_total'.
This also fixes this golint issue:

devstat_freebsd.go:40:2: don't use underscores in Go names; struct field bytes_total should be bytesTotal
2017-05-14 19:51:53 +02:00
Jonas Große Sundrup e6d031788f Correct typo (#582) 2017-05-14 19:46:23 +02:00
Karsten Weiss bca09abf1c golint: Fix NewStatCollector() doc string. 2017-05-14 13:51:47 +02:00
Karsten Weiss b3e7420a27 cpu_darwin.go: s/cpu_ticks/cpuTicks/g 2017-05-14 13:51:42 +02:00
Karsten Weiss b05c7d8dab cpu_darwin.go: Fix doc strings. 2017-05-14 13:51:34 +02:00
Karsten Weiss fff03c6c0c Fix NewTCPStatCollector doc string. 2017-05-14 13:23:57 +02:00
Karsten Weiss 6720cfdbfe golint: Fix comment on exported function NewDevstatCollector. 2017-05-14 13:21:39 +02:00
Karsten Weiss b73af72853 Explicitly check for the rc 3 in call to getloadavg(). Reorder logic. 2017-05-14 13:07:54 +02:00
Karsten Weiss af358ec800 golint fixes: if block ends with a return statement, so drop this else and outdent its block. 2017-05-14 12:55:44 +02:00
Karsten Weiss 732f839810 sysctl_bsd.go: golint fixes. Typo fix. 2017-05-14 12:51:57 +02:00
Robert Clark 58f50b31f2 Multiply port data XMIT/RCV metrics by 4 (#579)
According to Mellanox, it is standard practice that the port_xmit_data and port_rcv_data
files are split into 4 lanes. To get the actual transmit and receive values for each
port, the metric needs to be multiplied by 4.

Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-05-12 07:28:53 +02:00
Ben Kochie 8f3cddf734 Merge pull request #568 from mdlayher/xfs-init
Initial XFS collector
2017-04-25 09:54:28 +02:00
Kai S 59f9b8c5c1 Handle nonexisting bonding_masters file (#569)
* silently ignore nonexisting bonding_masters file

Add an empty fixtures dir without a bonding_masters file to test.

* Moved the check to the Update() method

Dropped the empty test dir.
2017-04-24 23:19:17 +04:00
Matt Layher 1feb091b36
Initial XFS collector 2017-04-22 11:53:07 -04:00
Ben Kochie e9aad0157c Merge pull request #550 from derekmarcotte/dm-boottime
Add exec_boot_time for freebsd, dragonfly
2017-04-22 09:18:05 +02:00
Derek Marcotte 5b557bf973 Fix metric name per review. 2017-04-21 16:25:31 -04:00
Derek Marcotte db8ec9c6b4 Add exec_boot_time for freebsd, dragonfly
Adds new sysctl type, bsdSysctlTypeStructTimeval to enable parsing of
timevals from raw memory.
2017-04-21 10:23:19 -04:00
Daniele Sluijters bb9d4ade0b uname_linux: Build for 32bit MIPS too
Since Go 1.8 32bit MIPS Big/Little Endian are supported assuming the
target runs Linux and the kernel either emulates an FPU or can access
the CPU one.

This allows the node_collector to build for mips and mipsle, opening up
the possibility of running it on things like home routers;
(DD-|Open|ASUS-)Wrt firmware usually has the necessary bits in place.
2017-04-20 13:30:40 +02:00
Brian Brazil f291d2d6dd Get full resolution for node_time (#555) 2017-04-19 18:31:21 +01:00
Karsten Weiss d9703ff7c6 edac: Fix typo in csrow label of node_edac_csrow_uncorrectable_errors_total metric. 2017-04-18 12:45:06 +02:00
Tobias Schmidt 266f0958d2 Merge pull request #561 from derekmarcotte/dm-fix-dfly-build
Fixes broken build on Dragonfly.
2017-04-17 17:31:12 +02:00
Derek Marcotte 83cecfa696 Fixes broken build on Dragonfly.
Undefined err:

84eaa8fecd/collector/devstat_dragonfly.go (L145)
2017-04-17 10:50:49 -04:00
Karsten Weiss 45ca8db352 Support the 'guest_nice' cpu mode of /proc/stat.
'guest_nice' is available since Linux 2.6.33.
2017-04-14 12:50:37 +02:00
Sam Kottler 6eafa51fa8 Add ARP collector for Linux (#540)
* Implement commonalities and linux support for ARP collection

* Add ARP collector to fixtures and run as part of e2e tests

* Bubble up scanner errors

* Use single return values where it makes sense

* Add missing annotation

* Move arp_common into arp_linux

* Add license header to arp_linux.go

* Address initial feedback

* Use strings.Fields instead of strings.Split

* Deal with scanner.Err() rather than throwing away errors

* Check for scan errors in-line before interacting with the entries map

* Don't interact with potentially empty text from scan

* Check for scan errors outside the scan loop

* Add comment about moving procfs parsing

* Add more direct comment

* Update initialism style to match go style guide

* Put function args on the same line

* Add TODO in front of comment about procfs extraction

* Guard against strings.Fields returning an empty slice

* Be more defensive about ARP table format and use upcase more broadly

* Enable the ARP collector by default

* Add ARP collector to the README

* Remove 'entry'
2017-04-11 17:45:19 +02:00
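The defensive parsing steps listed in 6eafa51fa8 can be sketched roughly as follows. This is a Python illustration, not the collector's Go code: `count_arp_entries` and the sample table are hypothetical, and the six-column layout is assumed from the usual `/proc/net/arp` format. Python's argument-less `str.split()` plays the role of Go's `strings.Fields` here, collapsing runs of whitespace instead of producing empty fields.

```python
def count_arp_entries(table: str) -> dict:
    """Count ARP entries per device from /proc/net/arp-style text."""
    counts = {}
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()  # whitespace runs collapse, like strings.Fields
        if not fields:
            continue  # guard against blank lines yielding no fields
        if len(fields) != 6:
            raise ValueError(f"unexpected ARP table format: {line!r}")
        device = fields[5]
        counts[device] = counts.get(device, 0) + 1
    return counts


SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.1.2      0x1         0x2         00:11:22:33:44:56     *        eth0
10.0.0.1         0x1         0x2         00:11:22:33:44:57     *        wlan0

"""

print(count_arp_entries(SAMPLE))  # {'eth0': 2, 'wlan0': 1}
print("a  b".split(" "))          # ['a', '', 'b']: why a plain split is fragile
```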
Tobias Schmidt 8aec44617a Remove Windows support
Use https://github.com/martinlindhe/wmi_exporter instead.
2017-04-10 23:27:23 -03:00
Tobias Schmidt 41a44a4d24 Merge pull request #532 from prometheus/grobie/remove-extra-file-check
mdadm: Remove extra file existence check
2017-03-31 05:35:12 +02:00
Ben Kochie 5f43211f67 Blacklist systemd scope units
Blacklist `scope` units from systemd collector by default.

These units are created with unique IDs programmatically[0].  This leads to
huge cardinality problems.

[0]: https://www.freedesktop.org/software/systemd/man/systemd.scope.html
2017-03-23 14:02:46 +01:00
Tobias Schmidt d290ea94b8 Fix export of stale device error metrics for unmounted filesystems
Instead of maintaining a counter metric for device errors in memory,
this change exports a gauge and uses const metrics to avoid leaking
metrics for unmounted filesystems.
2017-03-22 21:48:18 -03:00
Tobias Schmidt 7b93b52010 Fix lint issues on filesystem BSD implementation 2017-03-22 21:48:12 -03:00
Tobias Schmidt 445ed44082 mdadm: Remove extra file existence check 2017-03-22 10:11:19 -03:00
Johannes 'fish' Ziemke 9676f5f2dc Merge pull request #523 from roclark/support-legacy-infiniband
Add support for legacy InfiniBand drivers
2017-03-21 10:52:07 +01:00
Johannes 'fish' Ziemke 620e9937e6 Merge pull request #524 from mdlayher/wifi-expand
Expand wifi collector for more interface types
2017-03-21 10:32:44 +01:00
Juergen Hoetzel aef2601cf6 Add missing dependency for static FreeBSD build 2017-03-20 16:59:45 +00:00
Matt Layher 2bfe410fb7
Expand wifi collector for more interface types 2017-03-20 12:25:01 -04:00
Robert Clark 3a5917dfdc Add support for legacy InfiniBand drivers
Older versions of the OFED drivers contain 64-bit variants of the port counters and are located in a directory named 'counters_ext'. This patch includes these older metrics that have since been deprecated with OFED 4.0.

Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-03-20 10:37:21 -05:00
Tobias Schmidt 0400e437be Fix and simplify parsing of raid metrics
Fixes the wrong reporting of active+total disk metrics for inactive
raids. Also simplifies the code and removes a couple of redundant
comments.
2017-03-19 08:03:58 -03:00
Matt Layher 42c8a20545
Unexport wifiCollector metrics 2017-03-16 17:11:09 -04:00
Matt Layher 69368b7f9c Add synthetic node_wifi_station_info metric for BSS information 2017-03-16 16:24:23 -04:00
Brian Brazil a02e469b07 Report collector success/failure and duration per scrape. (#516)
This is in line with best practices, and also saves us
63 timeseries on a default Linux setup.
2017-03-16 17:21:00 +00:00
Robert Clark 413e5af502 Skip metric files that don't exist
In case a metric file within the InfiniBand collector doesn't exist, skip the metric in order to allow collection of the remaining valid InfiniBand metrics.

Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-03-09 11:05:36 -06:00
Derek Marcotte 72d8576185 Refactor meminfo_bsd.go to use sysctl_bsd.go (#501)
* Refactor meminfo_bsd.go to use sysctl_bsd.go

* Fixed spelling.
2017-03-07 21:54:28 -04:00
Ben Kochie 5d22d41ed7 Merge pull request #484 from prometheus/grobie/update-vendored-packages
Update vendored packages
2017-03-01 08:05:45 +01:00
Derek Marcotte bdc2131332 Added node_memory_buffer, node_memory_swaptotal to meminfo_bsd (#451) 2017-03-01 01:36:02 -04:00
Tobias Schmidt ce117d7a40 Update vendored packages 2017-02-28 18:20:24 -04:00
Tobias Schmidt 84eaa8fecd Remove more unnecessarily named return values 2017-02-28 17:33:46 -04:00
Derek Marcotte 5c28ab044d Add BSD exec statistics collector (#457)
* First pass of a sysctl_bsd source, exec_bsd + exec metrics

* Incorporate PR feedback, including removing pre-build descriptions, unit conversion callback.

* Remove redundant cached_description field, per PR feedback

* Incorporate PR feedback
2017-02-28 17:23:10 -04:00
Tobias Schmidt 1bd94074dd Delete unused code 2017-02-28 17:20:16 -04:00
Tobias Schmidt 922e74d58f Remove unnecessarily named return variables
Named return variables should only be used to describe the returned type
further, e.g. `err error` doesn't add any new information and is just
stutter.
2017-02-28 16:04:25 -04:00
Tobias Schmidt 084e585c2a Fix scanner usage without error handling 2017-02-28 16:04:25 -04:00
Tobias Schmidt d1dfda86ee Fix wrong end-to-end expectation 2017-02-28 16:02:43 -04:00
Tobias Schmidt abdebef47c Fix gofmt -s and spelling issues 2017-02-28 14:01:28 -04:00
Tobias Schmidt 195b4d596c Merge pull request #480 from prometheus/grobie/gosimple
Simplify go code
2017-02-28 13:59:01 -04:00
Tobias Schmidt 694294baf5 Remove unnecessary conversions 2017-02-28 13:57:49 -04:00
Tobias Schmidt 21e13c7f52 Simplify code 2017-02-28 13:54:27 -04:00
Tobias Schmidt c703435790 Fix all open go lint and vet issues 2017-02-28 13:05:38 -04:00
Ben Kochie 38cd07ebb9 Merge pull request #450 from roclark/add-infiniband
infiniband: Add new collector for InfiniBand statistics
2017-02-16 14:33:19 +01:00
Ben Kochie a097dd36b3 Merge pull request #459 from joehandzik/wip-zpool-io-cherrypick
ZFS Collector: Add zpool IO statistics
2017-02-16 08:16:55 +01:00
Thorhallur Sverrisson 19813d3e02 Changing datastructure for BuddyInfo 2017-02-15 10:15:44 -06:00
Thorhallur Sverrisson 5ab285e098 Adding buddyinfo to end to end test. 2017-02-15 10:15:44 -06:00
Thorhallur Sverrisson 55417d7688 Moving buddyinfo logic to procfs 2017-02-15 10:15:44 -06:00
Thorhallur Sverrisson 492c96f6b6 Moving buddyinfo_test.go to procfs library 2017-02-15 10:15:43 -06:00
Thorhallur Sverrisson 3ba15c1ddb Adding support for /proc/buddyinfo for linux free memory fragmentation.
/proc/buddyinfo returns data on the free block fragments available
for use from the kernel.  This data is useful when diagnosing
possible memory fragmentation.

More info can be found in:
* https://lwn.net/Articles/7868/
* https://andorian.blogspot.com/2014/03/making-sense-of-procbuddyinfo.html
2017-02-15 10:15:43 -06:00
Joe Handzik bb8b3fca88 ZFS Collector: Add zpool IO statistics
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-02-10 13:31:25 -06:00
Robert Clark 36f81282b7 Add unit tests for InfiniBand collector
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-02-07 11:09:08 -06:00
Robert Clark 4866adcb71 Add new collector for InfiniBand statistics
Add new metrics for the InfiniBand network protocol including the number of packets sent and received, the number of times the link has been downed and how many times the link has recovered from an error state.

Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-02-07 11:09:08 -06:00
Joe Handzik 8c23f5ff54 ZFS Collector: Convert dashes to underscores for metrics
This fixes #442, and prevents other ZFS metrics from slipping through in the future.

Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-31 14:11:56 -06:00
Ben Kochie 7cfa5e75b8 Merge pull request #439 from mdlayher/collector-staticcheck
Fix two staticcheck issues in IPVS collector tests
2017-01-31 08:53:10 -05:00
Ben Kochie 71362d45eb Merge pull request #432 from joehandzik/wip-zfs-zfetchstats
Update ZFS Collector with most non-zpool metrics
2017-01-31 08:52:41 -05:00
Joe Handzik e5ee274a32 ZFS Collector: Move from camelcase to underscores for metric prefixes
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-29 15:59:01 -06:00
Matt Layher c8e546926a Fix two staticcheck issues in IPVS collector tests 2017-01-27 15:20:36 -05:00
Joe Handzik e213ccbc57 ZFS Collector: Refactor to use maps/slices and fewer globals
Removed all global types that were unnecessary, and refactored to use constructor-created values and inline values instead of globals.

Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-27 14:02:28 -06:00
Ben Kochie 5a6db5c8d2 Handle multiple NFS device mounts
It's possible to mount an NFS share in multiple locations.
* Duplicates contain the same metric values, so they can be ignored.
* Update fixture.
2017-01-24 13:44:08 +01:00
Joe Handzik 94fb93a9f3 ZFS Collector: Add dmu_tx functionality
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-23 16:41:15 -06:00
Joe Handzik 07c7ae733a ZFS Collector: Add fm functionality
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-23 16:31:22 -06:00
Joe Handzik 05048c067d ZFS Collector: Add xuio_stats functionality
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-23 16:30:37 -06:00
Joe Handzik 3c9e779989 ZFS Collector: Add vdev_cache_stats functionality
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-23 16:29:50 -06:00
Joe Handzik a02ca9502c ZFS Collector: Add zil functionality
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-23 16:29:00 -06:00
Joe Handzik a3125ab4d9 ZFS Collector: Add zfetchstats functionality
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-23 16:28:11 -06:00
Ben Kochie acb495ccab Merge pull request #425 from mdlayher/wifi-update
Update vendored wifi, handle stations with missing info
2017-01-20 08:43:44 -05:00
Matt Layher dfd661a633 Allow graceful failure in hwmon collector 2017-01-17 11:24:28 -05:00
Matt Layher ca3f07feef Update vendored wifi, handle stations with missing info 2017-01-17 00:54:18 -05:00
Ben Kochie 92537020a3 Fix runit collector flag typo. 2017-01-16 23:41:33 +01:00
Julius Volz 276112c7ef Merge pull request #418 from mdlayher/wifi-graceful-fail
Make wifi collector fail gracefully if metrics not available
2017-01-13 20:31:21 -05:00
Matt Layher d3089f2ce8 Make wifi collector fail gracefully if metrics not available 2017-01-13 13:35:20 -05:00
Matt Layher 1e1775e761 Make ZFS collector fail gracefully when not available 2017-01-12 12:54:16 -05:00
Johannes 'fish' Ziemke 2884181cce Merge pull request #415 from mdlayher/mountstats-nfs-additional
Add NFS event metrics to mountstats collector
2017-01-12 14:08:21 +01:00
Matt Layher e3f99e13b9 Add NFS event metrics to mountstats collector 2017-01-11 11:41:13 -05:00
Matt Layher efa25665ec Add initial wifi collector, bump netlink to fix 32-bit builds 2017-01-11 10:08:44 -05:00
Johannes 'fish' Ziemke 55170e8feb Merge pull request #411 from discordianfish/hwmon-move-label-metrics
Use filename as label, move 'label' to own metric
2017-01-10 12:21:18 +01:00
Ben Kochie 38a4a36061 Update end-to-end test. 2017-01-10 10:23:16 +01:00
Ben Kochie b4fa10ca9d Add collector for Linux EDAC
Collect "Error detection and correction" metrics from memory
controllers.
* Supported on Linux only.
* Add basic fixtures.
* Enabled by default.
2017-01-10 10:14:19 +01:00
Johannes 'fish' Ziemke 6aef20f8d8 Use filename as label, move 'label' to own metric
This closes #406
2017-01-09 18:33:31 +01:00
Joe Handzik e7442d6517 end-to-end-test.sh: Add zfs plugin
Enables fixture test and updates e2e-output.txt.

Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-08 11:13:35 -06:00
Corey Stewart 10ba27bf2c Remove FreeBSD support for zfs plugin.
This also involves removing zfs_zpool code for now.

Signed-Off-By: Corey Stewart <stewa169@purdue.edu>
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-01-08 11:13:35 -06:00
Corey Stewart a8c94d48e6 Style changes and cleanup
This patch makes stylistic changes to error strings, unexports method names by lower casing them, removes unused dataSetMetric, and adds copyright/licence information.

Signed-Off-By: Corey Stewart <stewa169@purdue.edu>
2017-01-08 10:23:58 -06:00
Christian Schwarz f29f3873ea Add a collector for ZFS, currently focussed on ARC stats.
It is tested on FreeBSD 10.2-RELEASE and Linux (ZFS on Linux 0.6.5.4).

On FreeBSD, Solaris, etc. ZFS metrics are exposed through sysctls.
ZFS on Linux exposes the same metrics through procfs `/proc/spl/...`.

In addition to sysctl metrics, 'computed metrics' are exposed by
the collector, which are based on several sysctl values.
There is some conditional logic involved in computing these metrics
which cannot be easily mapped to PromQL.

Not all 92 ARC sysctls are exposed right now but this can be changed
with one additional LOC each.
2017-01-08 10:23:58 -06:00
Johannes 'fish' Ziemke 2e47fcb8c5 Only store relevant e2e output
This makes commits lighter/more readable when updating the output.
2017-01-06 12:36:26 +01:00
Johannes 'fish' Ziemke ad2eb4a788 Use Gauge for megacli counters
Without refactoring this to use const metrics, we need to make this a
gauge so we can keep using Set(), which was deprecated for counters.
2017-01-06 12:33:21 +01:00
Johannes 'fish' Ziemke 01a9a37556 Stop using deprecated SetMetricFamilyInjectionHook 2017-01-06 12:21:12 +01:00
Johannes 'fish' Ziemke 3e266e28b9 Merge pull request #397 from dominikh/freebsd-cpu
Collect CPU temperatures on FreeBSD
2017-01-05 17:32:48 +01:00
Johannes 'fish' Ziemke fc1113cd11 Merge pull request #396 from dominikh/bsd-memleak
Don't leak or race in FreeBSD devstat collector
2017-01-05 17:31:57 +01:00
Dominik Honnef d827db8e17 Better error handling when collecting CPU temps
Log why we couldn't collect the temperature, and set metric to NaN if
the CPU should support temperature collection but had an error.
2017-01-05 15:19:56 +01:00
Johannes 'fish' Ziemke 91f4781234 Merge pull request #311 from kpettijohn/solaris-loadavg
Added loadavg collector for Solaris
2017-01-05 11:49:16 +01:00
Dominik Honnef 9847257bc0 Add missing license headers 2017-01-05 06:18:34 +01:00
Dominik Honnef 782eaee100 Collect CPU temperatures on FreeBSD 2017-01-05 06:17:16 +01:00
Dominik Honnef 38c5890428 Reuse devinfo struct
The devstat API expects us to reuse one devinfo for many invocations of
devstat_getstats. In particular, it allocates and resizes memory
referenced by devinfo.
2017-01-05 05:38:26 +01:00
Dominik Honnef ea55d0f5cb Don't race in FreeBSD devstat collector
Querying the number of devices separately from the device list itself is
racy. Devices may be added or removed between the two calls; and removed
devices would lead to a segfault.
2017-01-05 05:38:26 +01:00
Dominik Honnef 5e220c1665 Move cgo portions of FreeBSD devstat collector into own file
Embedding 100 lines of code in a comment doesn't make for good reading,
editing or code quality.
2017-01-05 05:38:26 +01:00
Dominik Honnef 20ca0f1376 Eliminate memory leak in FreeBSD devstat collector
The memory allocated by calloc was never freed. Since the devinfo struct
never leaves the function, anyway, we might as well just allocate it on
the stack.
2017-01-05 05:38:26 +01:00
Dominik Honnef 732dd67729 Fix build of cpu_freebsd.go
Corrects an incorrect merge in 8e50b80
2017-01-05 03:16:51 +01:00
Kevin Pettijohn d2fbeeb3c3 Added loadavg collector for solaris
It seems solaris prefers "sys/loadavg.h" over "stdlib.h" when
fetching the load average.

For Illumos based OSes it was required to include "sys/time.h" to
ensure that "hrtime_t" was defined.

https://www.illumos.org/issues/6002

It also required setting the ldflags "-fno-stack-protector -lssp" to
avoid undefined symbols when linking with gcc.

/opt/local/go/pkg/tool/solaris_amd64/link: running gcc failed: exit status 1
Undefined                       first referenced
 symbol                             in file
 __stack_chk_fail                    /tmp/go-link-138622936/000002.o
 __stack_chk_guard                   /tmp/go-link-138622936/000002.o
2017-01-04 17:45:40 -08:00
Johannes 'fish' Ziemke f9d3f830cb Merge pull request #399 from discordianfish/fish-fs-uniq-metric
Make sure we only return one metric per mounted fs
2017-01-04 16:48:04 +01:00
Johannes 'fish' Ziemke 4c9131b7d8 Make sure we only return one metric per mounted fs 2017-01-04 16:45:25 +01:00
Johannes 'fish' Ziemke 6dd39b15c2 Do not build meminfo on freebsd 2017-01-04 16:02:49 +01:00
Johannes 'fish' Ziemke a97ff2bcda Do not build meminfo on windows 2017-01-04 15:16:13 +01:00
Johannes 'fish' Ziemke d17b1b44a6 Merge pull request #398 from prometheus/fish-netdev-check-scan-errror
Check for errors in netdev scanner
2017-01-03 16:00:08 +01:00
Johannes 'fish' Ziemke 9969f93e7d Merge pull request #387 from discordianfish/fish-fix-meminfo-darwin
Refactor meminfo and add darwin metrics
2017-01-03 14:50:52 +01:00
Johannes 'fish' Ziemke 6576571ac8 Check for errors in netdev scanner 2017-01-03 14:48:52 +01:00
Johannes 'fish' Ziemke 26c6182c84 Move comment and remove superfluous newline 2017-01-03 14:41:05 +01:00
Johannes 'fish' Ziemke b68a9ec7af Merge pull request #359 from CloudAndHeat/feature/hwmon_chip_name_metric
hwmon: Provide annotation metric to link chip sysfs paths to human-readable chip types
2017-01-03 14:38:43 +01:00
Johannes 'fish' Ziemke 4e696d5d31 Merge pull request #391 from discordianfish/fish-add-cpu-darwin
Add cpu collector for darwin
2017-01-03 14:23:50 +01:00
Johannes 'fish' Ziemke 079fd701a0 Merge pull request #389 from prometheus/fish-use-const-metrics
Convert remaining collectors to use ConstMetrics
2017-01-03 14:22:58 +01:00
Johannes 'fish' Ziemke d2ca252457 Merge pull request #393 from discordianfish/fish-add-netdev-darwin
Add netdev collector for darwin
2017-01-03 14:12:36 +01:00
Johannes 'fish' Ziemke 8e50b80d12 Convert remaining collectors to use ConstMetrics 2017-01-03 14:11:10 +01:00
Johannes 'fish' Ziemke 3db2f442ae Limit node-exporter scope, deprecated collectors 2017-01-03 14:03:23 +01:00
Johannes 'fish' Ziemke c21c59dfeb Add meminfo stats for Darwin 2017-01-03 11:22:46 +01:00
Johannes 'fish' Ziemke 2983c4a31d Refactor meminfo collector similar to filesystem
Instead of doing the whole metric exposition in a platform specific collector
implementation, this creates and updates the metrics in meminfo.go and
expects a platform specific implementation of getMemInfo on
*meminfoCollector.
2017-01-03 11:20:36 +01:00
Johannes 'fish' Ziemke 3c47ef8e60 Add netdev collector for darwin
Same as for openbsd, this is just slightly adjusted from freebsd
variant.
2016-12-29 19:17:15 +01:00
Dominik Honnef f0adcd163d Implement CPU collector on FreeBSD without cgo 2016-12-29 04:29:52 +01:00
Dominik Honnef d2a43f7d05 Implement meminfo on BSD without cgo
This removes some error handling, which should be fine. If the calls
fail, we will get the zeroes, which is a safe enough fallback.
Additionally, if the first sysctl (page_size) succeeded it is unlikely
that other ones will fail.
2016-12-29 02:19:21 +01:00
Johannes 'fish' Ziemke 050d6f7f13 Add cpu collector for darwin 2016-12-28 18:38:52 +01:00
Dominik Honnef 0f6191987e Implement file systems on FreeBSD without cgo
The code may also work for other BSDs, but I don't have access to those
for testing.
2016-12-26 23:06:17 +01:00
Dominik Honnef 54c74923ee Implement loadavg on FreeBSD without cgo
The code may also work for other BSDs, but I don't have access to those
for testing.
2016-12-26 23:06:05 +01:00
Ben Kochie 10e525ff02 Merge pull request #375 from prometheus/fish-add-runit-servicedir-flag
Add runit service dir flag
2016-12-26 13:01:51 +01:00
Johannes 'fish' Ziemke d506b2266c Merge pull request #374 from prometheus/fish-add-filesystem-errors
Add node_filesystem_device_errors_total metric
2016-12-26 11:51:14 +01:00
Bjørn Forsman 64e637cbcc Ignore autofs filesystems on linux
node_exporter currently triggers autofs to mount the underlying
filesystem on every scrape. This is undesirable. Better ignore autofs.

The underlying filesystem that autofs mounts will be monitored though,
when the (real) filesystem is mounted.
2016-12-25 15:13:45 +01:00
Johannes 'fish' Ziemke 71ea37987f Merge pull request #365 from EdSchouten/drbd
A collector for DRBD
2016-12-25 11:04:43 +01:00
Ed Schouten b0d15eaac6 Reduce the severity of these messages.
They get printed all the time, as there are some tokens in the /proc
file that we simply don't support. It's better to keep these as
debugging messages, which may come in useful if new tags start to
appear.
2016-12-23 15:57:46 +01:00
Ed Schouten 4adf7fa96c Improve the help strings, as proposed in the code review. 2016-12-23 15:55:49 +01:00
Ed Schouten b7daf27678 Process feedback from the code review.
- Use the right number of printf() arguments. Use %q where it makes sense.
- Use "DRBD" instead of "Drbd", per Go's style guide.
- Add _total suffixes to counter metrics.
- Mention the unit (bytes) in documentation strings once more.
2016-12-22 13:57:19 +01:00
Björn Rabenstein 08c9347e88 Merge pull request #367 from mdlayher/mountstats
Add mountstats collector for detailed NFS statistics
2016-12-20 17:20:41 +01:00
Matt Layher 25a93e38e7 Add mountstats collector for detailed NFS statistics 2016-12-20 11:13:02 -05:00
Johannes 'fish' Ziemke 9039a425d0 Add runit service dir flag 2016-12-19 13:10:38 +01:00
Johannes 'fish' Ziemke deebf0aa49 Add node_filesystem_device_errors_total metric
This metric is the total number of errors that occurred when getting stats
for the given device.
2016-12-19 11:48:32 +01:00
Ed Schouten d1fa279105 Use a descriptive name for the file descriptor. 2016-12-16 11:45:14 +01:00
Ben Kochie 677ed28575 Merge pull request #361 from lucasbergman/mips-build-fix
mips64 build fix
2016-12-16 11:39:53 +01:00
Ed Schouten 6ff620e387 Properly propagate parse errors. 2016-12-16 11:36:36 +01:00
Ed Schouten 6269f7502a Add a collector for DRBD.
This collector exposes most of the useful information that can be found
in /proc/drbd. Sizes are normalised to be in bytes, as /proc/drbd uses
kibibytes.
2016-12-11 11:55:28 +01:00
Ed Schouten a696830c38 Add a collector for NFS client statistics.
This change adds a new collector called "nfs" that parses the contents
of /proc/net/rpc/nfs and turns it into metrics. It can be used to
inspect the number of operations per type, but also to keep an eye on an
excessive number of retransmissions, which may indicate connectivity
issues.

I've picked the name "nfs", as most operating systems use "nfs" for the
client component and "nfsd" as the server component. If we want to add
stats for the NFS server as well, we'd better call such a collector
"nfsd".
2016-12-09 19:58:08 +01:00
Jonas Wielicki 3efaa1a6a8 Update end-to-end tests 2016-12-01 10:00:50 +01:00
Jonas Wielicki c481dd19da Re-introduce human-readable chip types
The chip label generation has been changed in #334 to prefer the
unique device path (e.g. the location on the PCI bus) due to #333.

Here, a new annotation metric ``node_hwmon_chip_names`` is
introduced which allows linking the unique chip sysfs path to a
human-readable chip name which may not be unique among chip sysfs
paths (for example, dual-slot systems have multiple
chipType="coretemp" sensors).

This mitigates the downsides of the solution to #333
(namely that the device path may not be stable across kernels and
reboots) for cases where it does not matter that multiple devices
may have the same human-readable name (e.g. aggregation or where
at most one device with a common chip name is present).

For cases where no human-readable name can be derived, the
annotation metric is not emitted.
2016-12-01 09:59:52 +01:00
Lucas Bergman 4f479e55e0 linux/mips: Unbreak the build
Specifically, uname syscall support on Linux is controlled by a build
tag white list, and both mips64 platforms were missing from the list.
2016-11-30 13:13:49 -06:00
Ben Kochie f8af350ae2 Merge pull request #346 from mcdan/people/mcdan/issues/219
Fix additional mdadm parsing cases
2016-11-17 21:13:38 +01:00
dan mcweeney 13aa37025f Feedback on PR, thanks @tcolgate for the review 2016-11-17 10:23:01 -05:00
Ben Kochie 4fd03c31e4 Merge pull request #323 from stuartnelson3/dfly-devstat
Dragonfly devstat
2016-11-17 13:33:50 +01:00
Ben Kochie 7a9aad01b4 Merge pull request #310 from stuartnelson3/dfly-cpu
export DragonFlyBSD CPU time
2016-11-17 13:33:11 +01:00
stuart nelson e589a2b8af Remove gauges and convert to NewConstMetric format 2016-11-17 13:23:54 +01:00
stuart nelson 2b74cf7498 Export devstat for dragonfly 2016-11-17 13:23:54 +01:00
dan mcweeney 1f6b5aee39 #219 - add fixes for @samzhang111 super token 2016-11-16 14:49:57 -05:00
dan mcweeney 8d756cab50 Fixes end to end test 2016-11-16 14:47:03 -05:00
dan mcweeney 00c9a88a55 Fixes #219 - use the default to catch personalities that are unknown
Assumes all raid configurations start with raid and that anything
else is unknown.
2016-11-16 14:47:03 -05:00
Ed Schouten 9749c2c0b3 mdstat: Fix parsing of RAID0 lines that contain additional attributes.
We seem to have a small number of Linux servers here that have lines in
/proc/mdstat that cannot be parsed by the node exporter, due to them
containing attributes that are not matched by the regular expression
("super 1.2").

Extend the regular expression to skip this data, just like we do for all
of the other status lines.
2016-11-16 17:21:25 +01:00
Rene Treffer abe8e297a6 Prefer device path based names over exported names (#334)
* Prefer device path based names over exported names

For some sensors (like coretemp) it is possible that multiple
instances exist, thus base the name on the device path and not on
the exported name.

* Update end-to-end test for dual socket machines

Explicitly have 2 coretemp instances with a symlink for the device
such that the hwmon collector must pick that name (or fail)
2016-10-28 20:25:44 +01:00
Ben Kochie c6162312f2 Add Linux NUMA "numastat" metrics (#249)
* Add Linux NUMA "numastat" metrics
  Read the `numastat` metrics from /sys/devices/system/node/node* when reading NUMA meminfo metrics.
* Update end-to-end test output.
* Add `numastat` metrics as counters.
* Add tests for error conditions.
* Refactor meminfo numa metrics struct
* Refactor meminfoKey into a simple struct of metric data.
  This makes it easier to pass slices of metrics around.
* Refactor tests.
* Fixup: Add suggested fixes.
* Fixup:  More fixes
* Add another scanner.Err() return
* Add "_total" to counter metrics.
2016-10-12 13:07:49 +02:00
Rene Treffer 081ecc5db0 Add hwmon /sensors support (#278)
* Add hwmon support (mainly known from lm-sensors)

This commit adds initial support for linux hardware sensors, exported
through sysfs.

Details of the interface can be found at
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

* Add end-to-end test with some real life data

* Cleanup comments on hwmon collector

* Drop raw sensor name from hwmon output

* Let the sensor label be "sensor"

* Add hwmon short description to README.
2016-10-06 16:33:24 +01:00
stuart nelson 450fe0f3ba Add test 2016-09-28 09:10:05 +02:00
stuart nelson cf3710191a Compile meminfo for dfly (#315)
* Compile meminfo for dfly

* Update README.md
2016-09-28 08:08:19 +01:00
stuart nelson ef1925db7d Compile netdev on dragonfly (#314)
* Compile netdev on dragonfly

* Only run netdev bsd test on bsd

* Update README.md
2016-09-27 21:44:13 +01:00
stuart nelson ee37a27d91 Export values as uint64_t 2016-09-20 23:27:56 +02:00
stuart nelson e942d7e234 Maintain granularity in cpu data
Export cpu mode times as original uint64_t data,
and update frequency, and do the conversion to
float64 and subsequent division in go.
2016-09-20 09:10:53 +02:00
Ben Kochie afac1f7433 Update mdstat fixture based on linux source.
Update `Contains` matching for `resync=`
2016-09-19 16:11:16 +02:00
stuart nelson 57f88ac4f6 Update comment 2016-09-19 09:48:53 +02:00
stuart nelson 78c84b1a47 Remove old freq finding code
This is the code that was lifted from the freebsd
implementation, but was not correct.
2016-09-19 09:48:34 +02:00
stuart nelson 45ac033d9e Use correct frequency for calculating cpu time
The correct frequency is the systimer frequency,
not the stathz.

From one of the DragonFly developers:

The bump upon each statclock is:
((cur_systimer - prev_systimer) * systimer_freq) >> 32

systimer_freq can be extracted from following
sysctl in userspace:
sysctl kern.cputimer.freq
2016-09-19 09:35:41 +02:00
stuart nelson 8cc06aab04 Remove unneeded ncpu variable 2016-09-18 17:36:39 +02:00
stuart nelson 9f7822ccdc Remember to bzero string
Duplication was caused by malloc returning a
region of memory that already had data in it.
2016-09-18 16:17:49 +02:00
stuart nelson c02dcdeb35 Remove unused comment. 2016-09-18 14:21:54 +02:00
stuart nelson 3e4a154656 Correctly exporting values
Moved to exporting via a string, which is then
split and parsed.

The string is sometimes duplicated, however.
2016-09-18 14:16:26 +02:00
Ben Kochie 64b82596ef Fix mdadm collector for resync=PENDING.
Add fix for mdadm devices in state `resync=PENDING`.
* Update test and fixture.
2016-09-18 08:30:20 +02:00
stuart nelson 4b4385bd44 Remove free
Don't need it since we aren't malloc'ing
2016-09-17 19:14:31 +02:00