Commit graph

1189 commits

Author SHA1 Message Date
Ben Kochie becca1275c
Convert to Go modules (#1178)
* Convert to Go modules

* Update promu config.
* Convert to Go modules.
* Update vendoring.
* Update Makefile.common.
* Update circleci config.
* Use Prometheus release tar for promtool.
* Fixup unpack

* Use temp dir for unpacking tools.
* Use BSD compatible tar command.
* OpenBSD mkdir doesn't support `-v`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:20 +01:00
Ben Kochie 1732478361
circleci: switch to 2.1 config
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-29 12:06:34 +01:00
Andreas Wirooks 9c9e17aba7 Handle 'Unknown' as measurement value. (#1113)
We use perccli, whose output is compatible with storcli, and storcli.py does not handle 'Unknown' as a result value:
```
msg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
```
I know perccli should not return 'Unknown', but this error breaks all other useful measurements because the prom file cannot be parsed. My if condition fixes this.

Signed-off-by: Andreas Wirooks <andreas.wirooks@1und1.de>
2018-11-23 16:29:56 +01:00
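
The idea behind the fix above can be sketched as follows: drop any value that does not parse as a float before it reaches the .prom file. The real script is Python; this Go sketch only illustrates the check, and `formatSample` and the metric name are hypothetical, not part of storcli.py or node_exporter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatSample is a hypothetical helper. It returns a text-format sample
// line and true when raw parses as a float, or "" and false otherwise
// (for example when the tool reports "Unknown").
func formatSample(name, raw string) (string, bool) {
	v, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
	if err != nil {
		return "", false
	}
	return fmt.Sprintf("%s %g", name, v), true
}

func main() {
	for _, raw := range []string{"42", "Unknown"} {
		// Only parsable values are written; "Unknown" is skipped.
		if line, ok := formatSample("example_temperature_celsius", raw); ok {
			fmt.Println(line)
		}
	}
}
```

Skipping the single bad sample keeps the rest of the file parsable, which is the point of the commit.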
ioriveur ea8e1373f7 Change Dfly's CPU counting frequency (#1140)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-21 13:45:22 +01:00
Ben Kochie ffefc8e74d Add a limit to the number of in-flight requests (#1166)
In order to avoid stuck collectors using up all system resources, add a
limit to the number of parallel in-flight scrape requests. Requests over
the limit receive a 503 error.

Default to 40 requests; this seems like a reasonable number based on:
* Two Prometheus servers scraping every 15 seconds.
* Failing scrapes after 5 minutes of stuckness.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-20 18:11:40 +01:00
Johannes 'fish' Ziemke bcec99e0aa Add link to prometheus-dcgm (#1164)
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2018-11-19 19:35:01 +01:00
Nemikolh 62f99f95f0 Add receive/transmit bytes total metric (wifi collector). (#1150)
Signed-off-by: Nemikolh <Nemikolh@users.noreply.github.com>
2018-11-19 19:15:54 +01:00
ioriveur 17fee8081f Check BSD's mib which accounts for swap size (#1149)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Check BSD's mib which accounts for swap size; see #1127

Signed-off-by: iori-yja <fivo.11235813@gmail.com>

* fix swap check code

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-17 11:02:54 +01:00
Paul Gier 3cf5b006fb examples/init.d: fix web.listen-address flag (#1157)
CLI flags use two dashes instead of one since v0.15.0
Also, use default port number
Fixes #1156

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-11-16 00:50:09 +01:00
Ben Kochie ab19e0c831
Add changelog entry for #1148 (#1154)
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-15 04:22:02 +01:00
Arno Uhlig 6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Björn Rabenstein 174b854080
Merge pull request #1148 from prometheus/beorn7/metrics
Add --web.disable-exporter-metrics flag
2018-11-13 15:24:38 +01:00
beorn7 cd2331a185 Add --web.disable-exporter-metrics flag
If this flag is set, the metrics about the exporter itself (go_*,
process_*, promhttp_*) will be excluded from /metrics.

The Kingpin way of handling boolean flags makes the negative flag
wording (_dis_able) the most reasonable one.

This also refactors the flow in node_exporter.go quite a bit to avoid
mixing up the global and a local registry and to avoid re-creating a
registry even if no filtering is requested.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-11-13 14:22:25 +01:00
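
To illustrate which metrics the flag above removes, here is a small Go sketch that filters metric names by the three prefixes named in the commit message. It is a simplification: the commit describes working with separate registries, while this sketch filters names only, and `isExporterMetric` and `filterMetrics` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefixes of the metrics about the exporter itself, as listed in the
// commit message.
var exporterPrefixes = []string{"go_", "process_", "promhttp_"}

// isExporterMetric reports whether name describes the exporter process
// rather than the node.
func isExporterMetric(name string) bool {
	for _, p := range exporterPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// filterMetrics drops exporter metrics only when the flag is set, so no
// extra work is done if no filtering is requested.
func filterMetrics(names []string, disableExporterMetrics bool) []string {
	if !disableExporterMetrics {
		return names
	}
	var kept []string
	for _, n := range names {
		if !isExporterMetric(n) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	names := []string{"go_goroutines", "node_load1", "process_cpu_seconds_total"}
	fmt.Println(filterMetrics(names, true))
}
```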
Ben Kochie b1eec66640
Add TCPSynRetrans to netstat default filter (#1143)
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Christopher Blum 1b98db9fa7 textfile example storcli enhancements (#1145)
* storcli.py: Remove IntEnum

This removes an external dependency.
Moved VD state to VD info labels

* storcli.py: Fix BBU health detection

BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.

* storcli.py: Strip all strings from PD

Strip all strings that we get from PDs.
They often contain whitespaces....

* storcli.py: Add formatting options

Add help text explaining how this document was formatted

* storcli.py: Add DG to pd_info label

Add disk group to pd_info.
That way we can relate to PDs in the same DG.
For example to check if all disks in one RAID
use the same interface...

* storcli.py: Fix promtool issues

Fix linting issues reported by promtool check-metrics

* storcli.py: Exit if storcli reports issues

storcli reports if the command was a success.
We should not continue if there are issues.

* storcli.py: Try to parse metrics to float

This will sanitize the values we hand over to
node_exporter - eliminating any unforeseen values we read out...

* storcli.py: Refactor code to implement handle_sas_controller()

Move code into methods so that we can now also support HBA queries.

* storcli.py: Sort inputs

"...like a good python developer"
  - Daniel Swarbrick

* storcli.py: Replace external dateutil library with internal datetime

Removes external dependency...

* storcli.py: Also collect temperature on megaraid cards

We have already collected them on mpt3sas cards...

* storcli.py: Clean up old code

Removed dead code that is not used any more.

* storcli.py: strip() all information for labels

They often contain whitespaces...

* storcli.py: Try to catch KeyErrors generally

If some key we expect is not there, we will want to
still print whatever we have collected so far...

* storcli.py: Increment version number

We have made some changes here and there.
The general look of the data has not been changed.

* storcli.py: Fix CodeSpell issue

Split string to avoid issues with Codespell due to Celcius in JSON Key

Signed-off-by: Christopher Blum <zeichenanonym@web.de>
2018-11-07 17:12:23 +01:00
Sven Haardiek 29d4629f55 Introduce example to get pending updates from pacman (#1114)
* Introduce example to get pending updates from pacman

Signed-off-by: Sven Haardiek <sven@haardiek.de>
2018-11-05 22:27:57 +01:00
Cougar 764da30556 Add compat rules for node_time, node_memory_ShmemHugePages and node_memory_ShmemPmdMapped (#1138)
Signed-off-by: Cougar <cougar@random.ee>
2018-11-05 16:40:19 +01:00
Benjamin Drung 2d5fcdeef4 Add mellanox_hca_temp text collector example (#1128)
* deleted_libraries: Upgrade to Python 3

Python 2.7 will not be maintained past 2020. Therefore upgrade
text_collector_examples/deleted_libraries.py to Python 3.

* Add mellanox_hca_temp text collector example

mellanox_hca_temp is a script that reads Mellanox HCA temperature using
the Mellanox mget_temp_ext tool.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2018-11-01 12:23:06 +01:00
Matt Layher 073e056121
Merge pull request #1131 from prometheus/mdl-collector-export
collector: export NodeCollector for documentation purposes
2018-10-31 12:38:48 -04:00
Matt Layher c0a55e3f80 collector: add bounds check and test for filesystem collector (#1133)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-30 22:12:42 +01:00
Patrick bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier 988f049040 collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (#1123)
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:49:22 +01:00
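
The naming rule described in the commit above can be sketched in Go as follows. `parseTempFile` is a hypothetical helper, not the collector's actual function, and using index 0 for the bare "temp" file is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tempFile matches the usual "temp<index>_<item>" format.
var tempFile = regexp.MustCompile(`^temp([0-9]+)_(.+)$`)

// parseTempFile splits a sensor file name into index and item. A file
// named just "temp" is treated as an input file (index 0 is assumed here).
func parseTempFile(name string) (int, string, bool) {
	if name == "temp" {
		return 0, "input", true
	}
	m := tempFile.FindStringSubmatch(name)
	if m == nil {
		return 0, "", false
	}
	index, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, "", false
	}
	return index, m[2], true
}

func main() {
	for _, name := range []string{"temp1_input", "temp2_crit_alarm", "temp"} {
		index, item, ok := parseTempFile(name)
		fmt.Println(name, index, item, ok)
	}
}
```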
Paul Gier 38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
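
A minimal Go sketch of the "ignore extra stats" behavior from the commit above: read the fields the parser knows about and tolerate any that follow, failing only when a line is too short. `parseDiskstatsLine` and the constant of 11 known statistics are assumptions of this sketch, not the collector's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// knownStats is the number of statistics this sketch understands; each
// line also starts with major, minor, and device name.
const knownStats = 11

// parseDiskstatsLine returns the device name and the first knownStats
// values. Additional trailing fields are ignored instead of causing an error.
func parseDiskstatsLine(line string) (string, []uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 3+knownStats {
		return "", nil, fmt.Errorf("expected at least %d fields, got %d", 3+knownStats, len(fields))
	}
	stats := make([]uint64, knownStats)
	for i := range stats {
		v, err := strconv.ParseUint(fields[3+i], 10, 64)
		if err != nil {
			return "", nil, fmt.Errorf("field %d: %v", 3+i, err)
		}
		stats[i] = v
	}
	return fields[2], stats, nil
}

func main() {
	// Made-up sample line with four extra trailing fields.
	line := "   8       0 sda 100 0 200 10 50 0 400 20 0 30 30 5 0 40 1"
	dev, stats, err := parseDiskstatsLine(line)
	fmt.Println(dev, len(stats), err)
}
```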
Kazumasa Kohtaka 9bd4416822 Makefile: add target for checking Prometheus rules (#1126)
Signed-off-by: Kazumasa Kohtaka <kkohtaka@gmail.com>
2018-10-30 18:44:17 +01:00
Matt Layher 778124a56c collector: add bounds check and test for tcpstat collector (#1134)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:21:36 +02:00
Matt Layher 3d798aa4a1 collector: fix golint problems in ZFS collector (#1132)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:18:33 +02:00
Matt Layher 2c2ee93519
collector: export NodeCollector for documentation purposes
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-26 15:42:00 -04:00
Ben Kochie 7519967619
Fix promu config (#1119)
Rename promu no-cgo config to default promu name to avoid crossbuild
problems.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-20 08:21:51 +02:00
Ben Kochie 0da9d248e7
Update for 0.17.0-rc.0 release (#1118)
* Update VERSION.
* Update CHANGELOG.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-19 17:29:19 +02:00
Ben Kochie a0a164defb
Update cpufreq metrics collector (#1117)
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Ben Kochie ef7a02dfa8
Update vendor github.com/prometheus/client_golang/...@v0.9.0 (#1111)
* Update vendor github.com/prometheus/client_golang/...@v0.9.0
* Update vendor github.com/prometheus/common/...

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-15 20:40:34 +02:00
Paul Gier 7057c64f45 fix a few minor golint warnings (#1110)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 18:44:06 +02:00
Paul Gier e8d8199072 Update diskstats for linux kernel 4.19 (#1109)
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields.  See: https://www.kernel.org/doc/Documentation/iostats.txt

* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 17:24:28 +02:00
Ben Kochie 0880d460d7
Ignore additional virtual filesystems (#1104)
Add more virtual filesystems to the default ignore list
* bpf
* cgroup2
* selinuxfs
* squashfs

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-12 11:24:32 +02:00
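
An ignore list like the one in the commit above is typically applied as an anchored pattern over the filesystem type. The Go sketch below covers only the four types named in that commit; the rest of the default list is not shown here, and `ignoredFSTypes`, `mount`, and `filterMounts` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignoredFSTypes holds only the four types added by this commit.
var ignoredFSTypes = regexp.MustCompile(`^(bpf|cgroup2|selinuxfs|squashfs)$`)

// mount is a minimal stand-in for one parsed mount table entry.
type mount struct {
	device, mountPoint, fsType string
}

// filterMounts drops every mount whose filesystem type is ignored.
func filterMounts(mounts []mount) []mount {
	var kept []mount
	for _, m := range mounts {
		if !ignoredFSTypes.MatchString(m.fsType) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	mounts := []mount{
		{"/dev/sda1", "/", "ext4"},
		{"cgroup2", "/sys/fs/cgroup/unified", "cgroup2"},
		{"/dev/loop0", "/snap/core/1", "squashfs"},
	}
	for _, m := range filterMounts(mounts) {
		fmt.Println(m.mountPoint, m.fsType)
	}
}
```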
Ben Kochie 9cf508e673
Update vendoring (#1105)
* Update vendor github.com/sirupsen/logrus@v1.1.1
* Update vendor github.com/coreos/go-systemd/dbus@v17
* Update vendor github.com/golang/protobuf/proto@v1.2.0
* Update vendor github.com/konsorten/go-windows-terminal-sequences@v1.0.1
* Update vendor github.com/mdlayher/...
* Update vendor github.com/prometheus/procfs/...
* Update vendor golang.org/x/...

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-11 18:41:41 +02:00
Bryan Boreham f0d2a06b11 Update readme (#1107)
* State that wifi collector is disabled by default

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Add the 'processes' collector to the Readme

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-10-11 18:27:41 +02:00
dbalakirev 5273b00df9 launchctl example based on LaunchDaemons (#1102)
LaunchDaemons are the correct way to create services that are restart proof.
There is now only a single destination place mentioned in the readme for the plist file.

Signed-off-by: Dávid Balakirev <dave00ster@gmail.com>
2018-10-10 12:44:05 +02:00
Björn Rabenstein bddf41d327 Update prometheus/client_golang vendoring (#1099)
This is mostly required to fix a bug with histograms on 32bit platforms.
(Which might or might not be used in node_exporter. Just in case...)

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-10-05 16:05:02 +02:00
Dario Maiocchi 01ec8c5c5c Remove continue with label (#1084)
Instead of continue with label, use a helper function.

Signed-off-by: dmaiocchi <dmaiocchi@suse.com>
2018-10-05 13:20:30 +02:00
Ben Kochie a1ce712e22
Cleanup unused /proc/mounts fixture. (#1097)
* Cleanup unused /proc/mounts fixture.
* Ignore Uint -> Unit in codespell.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-04 18:07:12 +02:00
Mario Trangoni 3659260b66 infiniband: Handle iWARP* RDMA modules N/A (#974)
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available

This is related to #966 and handles this error:

Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-10-04 15:05:59 +02:00
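
The failure quoted in the commit above comes from passing "N/A (no PMA)" to `strconv.ParseUint`. A minimal Go sketch of one way to tolerate it: treat a value starting with "N/A" as "not available" instead of as an error. `parseCounter` is a hypothetical helper and is not claimed to match the collector's actual fix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounter parses the contents of a counter file. The second result
// is false when the counter is reported as not available, so the caller
// can skip the metric without failing the whole collector.
func parseCounter(raw string) (uint64, bool, error) {
	s := strings.TrimSpace(raw)
	if strings.HasPrefix(s, "N/A") {
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

func main() {
	for _, raw := range []string{"12345\n", "N/A (no PMA)\n"} {
		v, ok, err := parseCounter(raw)
		fmt.Println(v, ok, err)
	}
}
```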
Yecheng Fu 0f9842f20a [continue 912] strip rootfs prefix for run in docker (#1058)
* strip rootfs prefix for run in docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share host's PID
namespace, which allows processes within the container to see all of the
processes on the system.

Closes: #66

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
2018-10-04 14:11:21 +02:00
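
Stripping the rootfs prefix, as described in the commit above, means a host mount seen as `/host/var/lib` inside the container is reported as `/var/lib`. A minimal Go sketch, assuming the host root is bind-mounted at a path such as `/host` (a hypothetical example value); `stripRootfs` is a hypothetical helper, not necessarily the function in the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRootfs removes the rootfs prefix from mountPoint. With the default
// rootfs of "/" the path is returned unchanged. Only whole path
// components are stripped, so "/hostile" is not affected by "/host".
func stripRootfs(rootfs, mountPoint string) string {
	rootfs = strings.TrimSuffix(rootfs, "/")
	if mountPoint == rootfs {
		return "/"
	}
	if strings.HasPrefix(mountPoint, rootfs+"/") {
		return mountPoint[len(rootfs):]
	}
	return mountPoint
}

func main() {
	fmt.Println(stripRootfs("/host", "/host/var/lib"))
	fmt.Println(stripRootfs("/host", "/host"))
	fmt.Println(stripRootfs("/", "/var/lib"))
}
```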
gentlejo 2269df255c Add node_exporter script for init.d (#1059)
* Add node_exporter script for init.d

Signed-off-by: gentlejo <josungil@gmail.com>
2018-10-04 13:57:49 +02:00
Andrew Banchich 5da107b02c Add missing words and update markdown syntax (#1095)
Signed-off-by: Andrew Banchich <andrewbanchich@gmail.com>
2018-10-03 09:03:25 +02:00
Ralf Horstmann 9f820bd3ee Update cpu collector for OpenBSD 6.4 (#1094)
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.

SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.

For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348

Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
2018-10-02 10:21:30 +02:00
Ben Kochie 5a461d261c
Add linux/s390x build (#1092)
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-30 16:45:32 +02:00
Ben Kochie 526eac15c5
Add ppc64 build. (#1089)
Add ppc64 build.
2018-09-30 13:45:47 +02:00
Fabian Heymann 2f381f0c44 Update dependency mattn/go-xmlrpc (#1091)
Signed-off-by: Fabian Heymann <fabian.heymann@finanzcheck.de>
2018-09-30 09:27:14 +02:00
Daniele Sluijters d999dacdc6 filesystem: Ignore netns/nsfs mounts (#1047)
When starting Docker containers a whole bunch of netns (network
namespace) mounts are created that the node exporter can't make any
sense of (and can't read either).

This ignores all nsfs filesystems.

Fixes #875

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
2018-09-26 10:45:51 +02:00
Ben Kochie c7dfb82dac
Update build (#1081)
* Update build

* Only use CGO when building non-Linux.
* Update build to Go 1.11
* Use tab indenting consistently.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-09-25 16:02:42 +02:00