Commit graph

1150 commits

Author SHA1 Message Date
Dai Dang Van 085d872aaf Add S.M.A.R.T metrics (#1209)
Add metrics for the following SMART attributes, as listed in [1][2]:
- Seek_Error_Rate - ID: 7
- Reallocated_Event_Count - ID: 196

[1] https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
[2] https://en.wikibooks.org/wiki/Minimizing_Hard_Disk_Drive_Failure_and_Data_Loss/Self-Monitoring,_Analysis,_and_Reporting_Technology

Signed-off-by: Dai, Dang Van <daikk115@gmail.com>
2019-01-03 18:12:28 +01:00
Anton Tolchanov cf8b29d1fb Add a sample btrfs stats collector script (#1200)
Signed-off-by: Anton Tolchanov <commits@knyar.net>
2018-12-21 14:10:03 +01:00
Simon Pasquier 97dab59e18 Fix go.sum after Go1.11.4 bump (#1202)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-19 11:41:27 +00:00
dhewg 7c960fd683 smartmon.sh: add metric for active/low-power mode (#1192)
Add this new metric (where sda is active and sdb is in standby mode):
smartmon_device_active{disk="/dev/sda",type="sat"} 1
smartmon_device_active{disk="/dev/sdb",type="sat"} 0

Also skip further metrics if the drive is in a low-power mode. This
prevents spinning up disks just to get the metrics (which matches e.g.
Debian's default behavior for smartd).

Signed-off-by: Andre Heider <a.heider@gmail.com>
2018-12-13 16:11:23 +01:00
Paul Gier 03bb276deb Makefile.common: fix promu download path for arm32 (#1196)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-12-13 16:07:22 +01:00
Paul Gier 614b815e00 Makefile.common: fix format rule (#1195)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-12-11 17:47:09 +01:00
Ben Kochie 73ddf5f1f7 netstat: Add TCP In/Out Segs (#1185)
* netstat: Add TCP In/Out Segs

In order to get a better idea of TCP packet loss, we need to know how
many `node_netstat_Tcp_OutSegs` there are so we can compare this to
`node_netstat_Tcp_RetransSegs`.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Update fixtures

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-12-08 12:16:02 +01:00
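The comparison motivated in #1185 above can be sketched as a ratio of counter deltas between two scrapes. This is a minimal illustration in Python, not code from the exporter: the function name `retransmit_ratio` and the dict-based samples are assumptions, and only the `OutSegs` and `RetransSegs` counter names come from the commit message.

```python
def retransmit_ratio(prev, curr):
    """Fraction of sent TCP segments that were retransmissions.

    prev and curr are hypothetical samples of the two counters,
    taken at two points in time.
    """
    sent = curr["OutSegs"] - prev["OutSegs"]
    retrans = curr["RetransSegs"] - prev["RetransSegs"]
    if sent <= 0:
        # No traffic (or a counter reset) between the samples.
        return 0.0
    return retrans / sent


before = {"OutSegs": 1000, "RetransSegs": 10}
after = {"OutSegs": 2000, "RetransSegs": 30}
print(retransmit_ratio(before, after))
```

Without `OutSegs`, only the absolute number of retransmits is known, which says little about loss on a busy host.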
Tariq Ibrahim 6bd51269b7 update to host_statistics64 for Darwin meminfo (#1183)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2018-12-06 16:47:20 +01:00
Ben Kochie f9dd8e9b8c Release v0.17.0 (#1168)
* Update CHANGELOG
* Update VERSION

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 15:18:48 +01:00
Ben Kochie 4abc6fba7d Add fallback for missing /proc/1/mounts (#1172)
* Add fallback for missing /proc/1/mounts

On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fall back to `/proc/mounts` if
`/proc/1/mounts` is not found.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add tests.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add CHANGELOG entry.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:55 +01:00
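The fallback described in #1172 above can be sketched as follows. This is a Python illustration under stated assumptions, not the exporter's Go implementation: the function name `read_mounts` and its parameters are hypothetical, and "not found" is taken to mean a `FileNotFoundError`.

```python
def read_mounts(primary="/proc/1/mounts", fallback="/proc/mounts"):
    """Return the mount table text, preferring the init process's view."""
    try:
        with open(primary) as f:
            return f.read()
    except FileNotFoundError:
        # With hidepid, /proc/1 may be invisible to non-root users,
        # so fall back to the calling process's own mount table.
        with open(fallback) as f:
            return f.read()
```

Other errors, such as a permission error on the primary path, are deliberately not swallowed in this sketch.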
Jerome Froelich 0cb0c4d911 Remove unused variable readOnly from filesystem_linux.go. (#1173)
The pull request #1002 changed the logic used on Linux servers to determine if a filesystem is
read-only. As a result of this change, the variable `readOnly` is now unused and can be removed.

Signed-off-by: Jerome Froelich <jeromefroelich@hotmail.com>
2018-11-30 14:01:39 +01:00
Ben Kochie becca1275c Convert to Go modules (#1178)
* Convert to Go modules

* Update promu config.
* Convert to Go modules.
* Update vendoring.
* Update Makefile.common.
* Update circleci config.
* Use Prometheus release tar for promtool.
* Fixup unpack

* Use temp dir for unpacking tools.
* Use BSD compatible tar command.
* OpenBSD mkdir doesn't support `-v`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:20 +01:00
Ben Kochie 1732478361 circleci: switch to 2.1 config
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-29 12:06:34 +01:00
Andreas Wirooks 9c9e17aba7 Handle 'Unknown' as measurement value. (#1113)
We use the output-compatible perccli and storcli.py does not handle 'Unknown' as a result:
```
msg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
```
I know, the perccli should not return 'Unknown' but this error breaks all other useful measurements because the prom file is not parsable. My if condition fixes this.

Signed-off-by: Andreas Wirooks <andreas.wirooks@1und1.de>
2018-11-23 16:29:56 +01:00
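The kind of guard described in #1113 above can be sketched as follows. This is an illustration only, not the actual change to storcli.py: the helper `metric_line` and the metric name in the usage lines are hypothetical. The idea is to emit a line only when the value parses as a float, so one bad reading cannot make the whole .prom file unparsable.

```python
def metric_line(name, raw_value):
    """Return a text-format line, or None if the value is not numeric."""
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        # e.g. 'Unknown' reported by the controller CLI
        return None
    return "{} {}".format(name, value)


for raw in ("42", "Unknown"):
    line = metric_line("example_temperature_celsius", raw)
    if line is not None:
        print(line)
```

Skipping the sample leaves a gap in that one series while every other measurement in the file stays scrapeable.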
ioriveur ea8e1373f7 Change Dfly's CPU counting frequency (#1140)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-21 13:45:22 +01:00
Ben Kochie ffefc8e74d Add a limit to the number of in-flight requests (#1166)
In order to avoid stuck collectors using up all system resources, add a
limit to the number of parallel in-flight scrape requests. This will
return a 503 error.

Default to 40 requests; this seems like a reasonable number based on:
* Two Prometheus servers scraping every 15 seconds.
* Failing scrapes after 5 minutes of stuckness.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-20 18:11:40 +01:00
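The default of 40 in #1166 above appears to follow from the two bullet points: one server scraping every 15 seconds can accumulate 5 min / 15 s = 20 stuck scrapes before they fail, and two servers give 2 × 20 = 40. The sketch below reproduces that arithmetic and shows one common way to cap in-flight requests, a non-blocking semaphore. It is a Python illustration with hypothetical names (`default_limit`, `InFlightLimiter`), not the exporter's Go code.

```python
import threading


def default_limit(servers=2, scrape_interval_s=15, stuck_timeout_s=5 * 60):
    """Scrapes that can pile up before the oldest stuck ones fail."""
    return servers * (stuck_timeout_s // scrape_interval_s)


class InFlightLimiter:
    """Rejects work with a 503 status once `limit` calls are in progress."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, scrape):
        # Do not queue: a full exporter should answer immediately.
        if not self._slots.acquire(blocking=False):
            return 503, "limit of in-flight requests reached"
        try:
            return 200, scrape()
        finally:
            self._slots.release()


limiter = InFlightLimiter(default_limit())
print(default_limit(), limiter.handle(lambda: "metrics"))
```

Rejecting immediately rather than queueing keeps a stuck collector from tying up additional resources with every new scrape.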
Johannes 'fish' Ziemke bcec99e0aa Add link to prometheus-dcgm (#1164)
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2018-11-19 19:35:01 +01:00
Nemikolh 62f99f95f0 Add receive/transmit bytes total metric (wifi collector). (#1150)
Signed-off-by: Nemikolh <Nemikolh@users.noreply.github.com>
2018-11-19 19:15:54 +01:00
ioriveur 17fee8081f Check BSD's mib which accounts for swap size (#1149)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Check BSD's mib which accounts for swap size; see #1127

Signed-off-by: iori-yja <fivo.11235813@gmail.com>

* fix swap check code

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-17 11:02:54 +01:00
Paul Gier 3cf5b006fb examples/init.d: fix web.listen-address flag (#1157)
CLI flags use two dashes instead of one since v0.15.0
Also, use default port number
Fixes #1156

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-11-16 00:50:09 +01:00
Ben Kochie ab19e0c831 Add changelog entry for #1148 (#1154)
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-15 04:22:02 +01:00
Arno Uhlig 6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Björn Rabenstein 174b854080 Merge pull request #1148 from prometheus/beorn7/metrics
Add --web.disable-exporter-metrics flag
2018-11-13 15:24:38 +01:00
beorn7 cd2331a185 Add --web.disable-exporter-metrics flag
If this flag is set, the metrics about the exporter itself (go_*,
process_*, promhttp_*) will be excluded from /metrics.

The Kingpin way of handling boolean flags makes the negative flag
wording (_dis_able) the most reasonable one.

This also refactors the flow in node_exporter.go quite a bit to avoid
mixing up the global and a local registry and to avoid re-creating a
registry even if no filtering is requested.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-11-13 14:22:25 +01:00
Ben Kochie b1eec66640 Add TCPSynRetrans to netstat default filter (#1143)
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Christopher Blum 1b98db9fa7 textfile example storcli enhancements (#1145)
* storcli.py: Remove IntEnum

This removes an external dependency.
Moved VD state to VD info labels

* storcli.py: Fix BBU health detection

BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.

* storcli.py: Strip all strings from PD

Strip all strings that we get from PDs.
They often contain whitespaces....

* storcli.py: Add formatting options

Add help text explaining how this document was formatted

* storcli.py: Add DG to pd_info label

Add disk group to pd_info.
That way we can relate to PDs in the same DG.
For example to check if all disks in one RAID
use the same interface...

* storcli.py: Fix promtool issues

Fix linting issues reported by promtool check-metrics

* storcli.py: Exit if storcli reports issues

storcli reports if the command was a success.
We should not continue if there are issues.

* storcli.py: Try to parse metrics to float

This will sanitize the values we hand over to
node_exporter - eliminating any unforeseen values we read out...

* storcli.py: Refactor code to implement handle_sas_controller()

Move code into methods so that we can now also support HBA queries.

* storcli.py: Sort inputs

"...like a good python developer"
  - Daniel Swarbrick

* storcli.py: Replace external dateutil library with internal datetime

Removes external dependency...

* storcli.py: Also collect temperature on megaraid cards

We have already collected them on mpt3sas cards...

* storcli.py: Clean up old code

Removed dead code that is not used any more.

* storcli.py: strip() all information for labels

They often contain whitespaces...

* storcli.py: Try to catch KeyErrors generally

If some key we expect is not there, we will want to
still print whatever we have collected so far...

* storcli.py: Increment version number

We have made some changes here and there.
The general look of the data has not been changed.

* storcli.py: Fix CodeSpell issue

Split string to avoid issues with Codespell due to Celcius in JSON Key

Signed-off-by: Christopher Blum <zeichenanonym@web.de>
2018-11-07 17:12:23 +01:00
Sven Haardiek 29d4629f55 Introduce example to get pending updates from pacman (#1114)
* Introduce example to get pending updates from pacman

Signed-off-by: Sven Haardiek <sven@haardiek.de>
2018-11-05 22:27:57 +01:00
Cougar 764da30556 Add compat rules for node_time, node_memory_ShmemHugePages and node_memory_ShmemPmdMapped (#1138)
Signed-off-by: Cougar <cougar@random.ee>
2018-11-05 16:40:19 +01:00
Benjamin Drung 2d5fcdeef4 Add mellanox_hca_temp text collector example (#1128)
* deleted_libraries: Upgrade to Python 3

Python 2.7 will not be maintained past 2020. Therefore upgrade
text_collector_examples/deleted_libraries.py to Python 3.

* Add mellanox_hca_temp text collector example

mellanox_hca_temp is a script that reads Mellanox HCA temperature using
the Mellanox mget_temp_ext tool.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2018-11-01 12:23:06 +01:00
Matt Layher 073e056121 Merge pull request #1131 from prometheus/mdl-collector-export
collector: export NodeCollector for documentation purposes
2018-10-31 12:38:48 -04:00
Matt Layher c0a55e3f80 collector: add bounds check and test for filesystem collector (#1133)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-30 22:12:42 +01:00
Patrick bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier 988f049040 collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (#1123)
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:49:22 +01:00
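The filename handling described in #1123 above can be sketched as follows. This is a Python illustration, not the collector's Go code: the regular expression and the function name `parse_sensor_filename` are assumptions, and only the "temp<index>_<item>" format and the "treat a bare name as an _input file" rule come from the commit message.

```python
import re

# <type><index>_<item>, where index and item may both be absent.
_SENSOR_RE = re.compile(r"^([^0-9_]+)([0-9]*)(?:_(.+))?$")


def parse_sensor_filename(filename):
    """Split an hwmon sysfs filename into (type, index, item).

    Returns None if the name does not look like a sensor file.
    """
    match = _SENSOR_RE.match(filename)
    if match is None:
        return None
    sensor_type, index, item = match.groups()
    if item is None:
        # A bare name such as "temp" holds the current reading.
        item = "input"
    return sensor_type, index, item


print(parse_sensor_filename("temp1_input"))
print(parse_sensor_filename("temp"))
```

In this sketch the index of a bare name is left as an empty string; how to label such a sensor is a separate design choice.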
Paul Gier 38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
Kazumasa Kohtaka 9bd4416822 Makefile: add target for checking Prometheus rules (#1126)
Signed-off-by: Kazumasa Kohtaka <kkohtaka@gmail.com>
2018-10-30 18:44:17 +01:00
Matt Layher 778124a56c collector: add bounds check and test for tcpstat collector (#1134)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:21:36 +02:00
Matt Layher 3d798aa4a1 collector: fix golint problems in ZFS collector (#1132)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:18:33 +02:00
Matt Layher 2c2ee93519 collector: export NodeCollector for documentation purposes
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-26 15:42:00 -04:00
Ben Kochie 7519967619 Fix promu config (#1119)
Rename promu no-cgo config to default promu name to avoid crossbuild
problems.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-20 08:21:51 +02:00
Ben Kochie 0da9d248e7 Update for 0.17.0-rc.0 release (#1118)
* Update VERSION.
* Update CHANGELOG.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-19 17:29:19 +02:00
Ben Kochie a0a164defb Update cpufreq metrics collector (#1117)
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Ben Kochie ef7a02dfa8 Update vendor github.com/prometheus/client_golang/...@v0.9.0 (#1111)
* Update vendor github.com/prometheus/client_golang/...@v0.9.0
* Update vendor github.com/prometheus/common/...

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-15 20:40:34 +02:00
Paul Gier 7057c64f45 fix a few minor golint warnings (#1110)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 18:44:06 +02:00
Paul Gier e8d8199072 Update diskstats for linux kernel 4.19 (#1109)
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields.  See: https://www.kernel.org/doc/Documentation/iostats.txt

* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-15 17:24:28 +02:00
Ben Kochie 0880d460d7 Ignore additional virtual filesystems (#1104)
Add more virtual filesystems to the default ignore list
* bpf
* cgroup2
* selinuxfs
* squashfs

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-12 11:24:32 +02:00
Ben Kochie 9cf508e673 Update vendoring (#1105)
* Update vendor github.com/sirupsen/logrus@v1.1.1
* Update vendor github.com/coreos/go-systemd/dbus@v17
* Update vendor github.com/golang/protobuf/proto@v1.2.0
* Update vendor github.com/konsorten/go-windows-terminal-sequences@v1.0.1
* Update vendor github.com/mdlayher/...
* Update vendor github.com/prometheus/procfs/...
* Update vendor golang.org/x/...

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-11 18:41:41 +02:00
Bryan Boreham f0d2a06b11 Update readme (#1107)
* State that wifi collector is disabled by default

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Add the 'processes' collector to the Readme

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-10-11 18:27:41 +02:00
dbalakirev 5273b00df9 launchctl example based on LaunchDaemons (#1102)
LaunchDaemons are the correct way to create services that are restart proof.
There is now only a single destination place mentioned in the readme for the plist file.

Signed-off-by: Dávid Balakirev <dave00ster@gmail.com>
2018-10-10 12:44:05 +02:00
Björn Rabenstein bddf41d327 Update prometheus/client_golang vendoring (#1099)
This is mostly required to fix a bug with histograms on 32bit platforms.
(Which might or might not be used in node_exporter. Just in case...)

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-10-05 16:05:02 +02:00
Dario Maiocchi 01ec8c5c5c Remove continue with label (#1084)
Use a helper function instead of `continue` with a label.

Signed-off-by: dmaiocchi <dmaiocchi@suse.com>
2018-10-05 13:20:30 +02:00