1463 commits

Author SHA1 Message Date
Dai Dang Van 085d872aaf Add S.M.A.R.T metrics (#1209)
Update metrics following SMART attributes in [1][2]
- Seek_Error_Rate - ID: 7
- Reallocated_Event_Count - ID: 196

[1] https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
[2] https://en.wikibooks.org/wiki/Minimizing_Hard_Disk_Drive_Failure_and_Data_Loss/Self-Monitoring,_Analysis,_and_Reporting_Technology

Signed-off-by: Dai, Dang Van <daikk115@gmail.com>
2019-01-03 18:12:28 +01:00
Anton Tolchanov cf8b29d1fb Add a sample btrfs stats collector script (#1200)
Signed-off-by: Anton Tolchanov <commits@knyar.net>
2018-12-21 14:10:03 +01:00
Simon Pasquier 97dab59e18 Fix go.sum after Go1.11.4 bump (#1202)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-19 11:41:27 +00:00
dhewg 7c960fd683 smartmon.sh: add metric for active/low-power mode (#1192)
Add this new metric (where sda is active and sdb is in standby mode):
smartmon_device_active{disk="/dev/sda",type="sat"} 1
smartmon_device_active{disk="/dev/sdb",type="sat"} 0

Also skip further metrics if the drive is in a low-power mode. This
prevents spinning up disks just to get the metrics (which matches e.g.
debian's default behavior for smartd).

Signed-off-by: Andre Heider <a.heider@gmail.com>
2018-12-13 16:11:23 +01:00
Paul Gier 03bb276deb Makefile.common: fix promu download path for arm32 (#1196)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-12-13 16:07:22 +01:00
Paul Gier 614b815e00 Makefile.common: fix format rule (#1195)
Signed-off-by: Paul Gier <pgier@redhat.com>
2018-12-11 17:47:09 +01:00
Ben Kochie 73ddf5f1f7 netstat: Add TCP In/Out Segs (#1185)
* netstat: Add TCP In/Out Segs

In order to get a better idea of TCP packet loss, we need to know how
many `node_netstat_Tcp_OutSegs` there are so we can compare this to
`node_netstat_Tcp_RetransSegs`.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Update fixtures

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-12-08 12:16:02 +01:00
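The comparison described in this commit can be sketched as a ratio of counter increases. This is a minimal illustration, assuming two samples of each counter taken over the same window. The function name and the sample values are hypothetical and not part of node_exporter, and the ratio is only a rough proxy for packet loss.

```python
def retransmit_ratio(out_segs_start, out_segs_end, retrans_start, retrans_end):
    """Return retransmitted segments as a fraction of sent segments.

    The arguments are hypothetical samples of node_netstat_Tcp_OutSegs and
    node_netstat_Tcp_RetransSegs taken at the start and end of one window.
    """
    sent = out_segs_end - out_segs_start
    retransmitted = retrans_end - retrans_start
    if sent <= 0:
        # Nothing was sent in the window (or a counter was reset),
        # so the ratio is undefined.
        return None
    return retransmitted / sent


# Made-up sample values: 10,000 segments sent, 50 of them retransmitted.
ratio = retransmit_ratio(1_000_000, 1_010_000, 2_000, 2_050)
print(ratio)
```

In practice the same comparison would more likely be written as a query over the two metrics rather than computed by hand.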
Tariq Ibrahim 6bd51269b7 update to host_statistics64 for Darwin meminfo (#1183)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2018-12-06 16:47:20 +01:00
Ben Kochie f9dd8e9b8c
Release v0.17.0 (#1168)
* Update CHANGELOG
* Update VERSION

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 15:18:48 +01:00
Ben Kochie 4abc6fba7d
Add fallback for missing /proc/1/mounts (#1172)
* Add fallback for missing /proc/1/mounts

On some systems, `/proc/1/mounts` is hidden from non-root users due to
the `hidepid` procfs feature. Attempt to fall back to `/proc/mounts` if
`/proc/1/mounts` is not found.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add tests.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Add CHANGELOG entry.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:55 +01:00
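The fallback described above can be sketched as follows. This is an illustration of the pattern only, not the collector's actual Go code. The function name is hypothetical, and the demonstration uses temporary files in place of the real procfs paths.

```python
import os
import tempfile


def read_mounts(primary="/proc/1/mounts", fallback="/proc/mounts"):
    """Read mount information from `primary`, or from `fallback` if missing."""
    try:
        with open(primary) as f:
            return f.read()
    except FileNotFoundError:
        # The primary file is not visible to this user; use the fallback.
        with open(fallback) as f:
            return f.read()


# Demonstration with temporary files standing in for the procfs paths.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "mounts")
    with open(present, "w") as f:
        f.write("rootfs / rootfs rw 0 0\n")
    missing = os.path.join(tmp, "1", "mounts")  # never created
    demo = read_mounts(primary=missing, fallback=present)
print(demo)
```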
Jerome Froelich 0cb0c4d911 Remove unused variable readOnly from filesystem_linux.go. (#1173)
The pull request #1002 changed the logic used on Linux servers to determine if a filesystem is
read-only. As a result of this change, the variable `readOnly` is now unused and can be removed.

Signed-off-by: Jerome Froelich <jeromefroelich@hotmail.com>
2018-11-30 14:01:39 +01:00
Ben Kochie becca1275c
Convert to Go modules (#1178)
* Convert to Go modules

* Update promu config.
* Convert to Go modules.
* Update vendoring.
* Update Makefile.common.
* Update circleci config.
* Use Prometheus release tar for promtool.
* Fixup unpack

* Use temp dir for unpacking tools.
* Use BSD compatible tar command.
* OpenBSD mkdir doesn't support `-v`.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-30 14:01:20 +01:00
Ben Kochie 1732478361
circleci: switch to 2.1 config
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-29 12:06:34 +01:00
Andreas Wirooks 9c9e17aba7 Handle 'Unknown' as measurement value. (#1113)
We use the output-compatible perccli and storcli.py does not handle 'Unknown' as a result:
```
sg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
```
I know perccli should not return 'Unknown', but this error breaks all other useful measurements because the prom file is not parsable. My if condition fixes this.

Signed-off-by: Andreas Wirooks <andreas.wirooks@1und1.de>
2018-11-23 16:29:56 +01:00
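One way to keep a single non-numeric reading from breaking the whole .prom file is to skip values that do not parse as floats. The sketch below only illustrates that idea and is not the change made to storcli.py. The function and metric names are hypothetical.

```python
def format_metric(name, raw_value):
    """Return a text-format sample line, or None if raw_value is not a float."""
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        # Values such as 'Unknown' cannot be exposed as a sample; skip them.
        return None
    return f"{name} {value}"


# Hypothetical readings: one numeric, one that the tool reports as 'Unknown'.
readings = {"demo_temperature_celsius": "41", "demo_battery_state": "Unknown"}
lines = [
    line
    for line in (format_metric(n, v) for n, v in readings.items())
    if line is not None
]
print("\n".join(lines))
```

Only the numeric reading is written, so the remaining measurements stay parsable.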
ioriveur ea8e1373f7 Change Dfly's CPU counting frequency (#1140)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-21 13:45:22 +01:00
Ben Kochie ffefc8e74d Add a limit to the number of in-flight requests (#1166)
In order to avoid stuck collectors using up all system resources, add a
limit to the number of parallel in-flight scrape requests. Requests over
the limit will return a 503 error.

Default to 40 requests; this seems like a reasonable number based on:
* Two Prometheus servers scraping every 15 seconds.
* Failing scrapes after 5 minutes of stuckness.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-20 18:11:40 +01:00
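One reading of the arithmetic behind the default is 2 servers × (5 minutes / 15 seconds) = 40 requests that could pile up before stuck scrapes fail. The sketch below works through that number and shows a generic in-flight limiter. It assumes a non-blocking semaphore and is not the exporter's Go implementation. All names are hypothetical.

```python
import threading

SCRAPERS = 2
SCRAPE_INTERVAL_SECONDS = 15
STUCK_TIMEOUT_SECONDS = 5 * 60

# Scrapes that can accumulate before the oldest stuck one fails.
max_requests = SCRAPERS * STUCK_TIMEOUT_SECONDS // SCRAPE_INTERVAL_SECONDS


class InFlightLimiter:
    """Reject work once `limit` requests are already being handled."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, scrape):
        """Run scrape() and return (200, body), or (503, message) when full."""
        if not self._slots.acquire(blocking=False):
            return 503, "too many in-flight requests"
        try:
            return 200, scrape()
        finally:
            self._slots.release()


limiter = InFlightLimiter(1)
# The inner request arrives while the outer one still holds the only slot.
outer = limiter.handle(lambda: limiter.handle(lambda: "inner metrics"))
print(max_requests, outer)
```

Rejecting immediately, rather than queueing, keeps stuck collectors from tying up more resources while the limit is reached.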
Johannes 'fish' Ziemke bcec99e0aa Add link to prometheus-dcgm (#1164)
Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2018-11-19 19:35:01 +01:00
Nemikolh 62f99f95f0 Add receive/transmit bytes total metric (wifi collector). (#1150)
Signed-off-by: Nemikolh <Nemikolh@users.noreply.github.com>
2018-11-19 19:15:54 +01:00
Matthias Loibl 0bcded8d2b
node-mixin: Update dashboards to v0.16
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 17:40:30 +01:00
Matthias Loibl 61bc03adbe
node-mixin: Ignore jsonnetfile.lock.json and vendor folder
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 16:56:05 +01:00
Matthias Loibl 53e4093b64
node-mixin: Update alerts to node_exporter v0.16
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 16:46:51 +01:00
Matthias Loibl 619e23e5df
node-mixin: Update rules to node_exporter v0.16
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 16:46:48 +01:00
Matthias Loibl 961aa67701
Append .rules to node_exporter.rules group name
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 16:46:45 +01:00
Matthias Loibl 1482cc0309
Rename group names to node-exporter to avoid naming collisions
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 16:46:41 +01:00
Matthias Loibl ff0a13d900
Fix multiline strings
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2018-11-19 16:46:27 +01:00
Tom Wilkie bd648827fe
Remove k8s from dashboard title, make gauges use datasource variable.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 16:46:25 +01:00
Tom Wilkie 642f67ffa1
Fix up some of the USE metrics.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 16:46:23 +01:00
Tom Wilkie c34275d6e5
Switch gauges to percentunit.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 16:46:22 +01:00
Tom Wilkie 417316b0e4
Switch to irate[1m] for node dashboard.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 16:46:20 +01:00
Tom Wilkie 9303cf78ff
Lower case binary operators and fix indentation.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 16:46:18 +01:00
Tom Wilkie bafe1707f1
Beginnings of a node-exporter monitoring mixin.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 16:46:10 +01:00
ioriveur 17fee8081f Check BSD's mib which accounts for swap size (#1149)
* Change Dfly's CPU counting frequency, see: https://github.com/prometheus/node_exporter/issues/1129

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Convert Dfly's CPU unit into second

Signed-off-by: iori-yja <fivio.11235813@gmail.com>

* Check BSD's mib which accounts for swap size; see #1127

Signed-off-by: iori-yja <fivo.11235813@gmail.com>

* fix swap check code

Signed-off-by: iori-yja <fivo.11235813@gmail.com>
2018-11-17 11:02:54 +01:00
Paul Gier 3cf5b006fb examples/init.d: fix web.listen-address flag (#1157)
CLI flags use two dashes instead of one since v0.15.0
Also, use default port number
Fixes #1156

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-11-16 00:50:09 +01:00
Ben Kochie ab19e0c831
Add changelog entry for #1148 (#1154)
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-15 04:22:02 +01:00
Arno Uhlig 6edd9d217e [systemd] collect taskCurrent, tasksMax per systemd unit (#1098)
* [systemd] collect taskCurrent, tasksMax per systemd unit

Signed-off-by: Arno Uhlig <arno.uhlig@sap.com>
2018-11-14 10:50:39 +01:00
Björn Rabenstein 174b854080
Merge pull request #1148 from prometheus/beorn7/metrics
Add --web.disable-exporter-metrics flag
2018-11-13 15:24:38 +01:00
beorn7 cd2331a185 Add --web.disable-exporter-metrics flag
If this flag is set, the metrics about the exporter itself (go_*,
process_*, promhttp_*) will be excluded from /metrics.

The Kingpin way of handling boolean flags makes the negative flag
wording (_dis_able) the most reasonable one.

This also refactors the flow in node_exporter.go quite a bit to avoid
mixing up the global and a local registry and to avoid re-creating a
registry even if no filtering is requested.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-11-13 14:22:25 +01:00
Ben Kochie b1eec66640
Add TCPSynRetrans to netstat default filter (#1143)
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-07 17:21:18 +01:00
Christopher Blum 1b98db9fa7 textfile example storcli enhancements (#1145)
* storcli.py: Remove IntEnum

This removes an external dependency.
Moved VD state to VD info labels

* storcli.py: Fix BBU health detection

BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.

* storcli.py: Strip all strings from PD

Strip all strings that we get from PDs.
They often contain whitespace...

* storcli.py: Add formatting options

Add help text explaining how this document was formatted

* storcli.py: Add DG to pd_info label

Add disk group to pd_info.
That way we can relate to PDs in the same DG.
For example to check if all disks in one RAID
use the same interface...

* storcli.py: Fix promtool issues

Fix linting issues reported by promtool check-metrics

* storcli.py: Exit if storcli reports issues

storcli reports if the command was a success.
We should not continue if there are issues.

* storcli.py: Try to parse metrics to float

This will sanitize the values we hand over to
node_exporter - eliminating any unforeseen values we read out...

* storcli.py: Refactor code to implement handle_sas_controller()

Move code into methods so that we can now also support HBA queries.

* storcli.py: Sort inputs

"...like a good python developer"
  - Daniel Swarbrick

* storcli.py: Replace external dateutil library with internal datetime

Removes external dependency...

* storcli.py: Also collect temperature on megaraid cards

We have already collected them on mpt3sas cards...

* storcli.py: Clean up old code

Removed dead code that is not used any more.

* storcli.py: strip() all information for labels

They often contain whitespace...

* storcli.py: Try to catch KeyErrors generally

If some key we expect is not there, we will want to
still print whatever we have collected so far...

* storcli.py: Increment version number

We have made some changes here and there.
The general look of the data has not been changed.

* storcli.py: Fix CodeSpell issue

Split string to avoid issues with Codespell due to Celcius in JSON Key

Signed-off-by: Christopher Blum <zeichenanonym@web.de>
2018-11-07 17:12:23 +01:00
Sven Haardiek 29d4629f55 Introduce example to get pending updates from pacman (#1114)
* Introduce example to get pending updates from pacman

Signed-off-by: Sven Haardiek <sven@haardiek.de>
2018-11-05 22:27:57 +01:00
Cougar 764da30556 Add compat rules for node_time, node_memory_ShmemHugePages and node_memory_ShmemPmdMapped (#1138)
Signed-off-by: Cougar <cougar@random.ee>
2018-11-05 16:40:19 +01:00
Benjamin Drung 2d5fcdeef4 Add mellanox_hca_temp text collector example (#1128)
* deleted_libraries: Upgrade to Python 3

Python 2.7 will not be maintained past 2020. Therefore upgrade
text_collector_examples/deleted_libraries.py to Python 3.

* Add mellanox_hca_temp text collector example

mellanox_hca_temp is a script that reads Mellanox HCA temperature using
the Mellanox mget_temp_ext tool.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2018-11-01 12:23:06 +01:00
Matt Layher 073e056121
Merge pull request #1131 from prometheus/mdl-collector-export
collector: export NodeCollector for documentation purposes
2018-10-31 12:38:48 -04:00
Matt Layher c0a55e3f80 collector: add bounds check and test for filesystem collector (#1133)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-30 22:12:42 +01:00
Patrick bdc0e7e678 Collect additional common Infiniband counters (#1120)
* Collect additional common Infiniband counters

Signed-off-by: Patrick Freeman <will.pat.free@gmail.com>
2018-10-30 21:54:09 +01:00
Paul Gier 988f049040 collector/hwmon_linux: handle temperature sensor file which doesn't have item suffix (#1123)
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:49:22 +01:00
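The file-name handling described in this commit can be illustrated with a small parser. This is a sketch of the idea and not the collector's Go code. The function name is hypothetical, and using index 0 for the suffix-less file is an assumption made for this example.

```python
import re

# Usual hwmon format: <type><index>_<item>, for example "temp1_input".
_SENSOR_FILE = re.compile(r"^([a-z]+)([0-9]+)_(.+)$")


def parse_sensor_filename(name):
    """Split a hwmon file name into (type, index, item), or return None."""
    match = _SENSOR_FILE.match(name)
    if match:
        sensor_type, index, item = match.groups()
        return sensor_type, int(index), item
    if name == "temp":
        # Suffix-less file: treat it as an _input file holding the current
        # temperature. Index 0 is an assumption for this sketch.
        return "temp", 0, "input"
    return None


for filename in ("temp1_input", "temp", "name"):
    print(filename, parse_sensor_filename(filename))
```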
Paul Gier 38163f234f collector/diskstats: don't fail if there are extra stats, just ignore… (#1125)
* collector/diskstats: don't fail if there are extra stats, just ignore them

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-10-30 18:45:00 +01:00
Kazumasa Kohtaka 9bd4416822 Makefile: add target for checking Prometheus rules (#1126)
Signed-off-by: Kazumasa Kohtaka <kkohtaka@gmail.com>
2018-10-30 18:44:17 +01:00
Matt Layher 778124a56c collector: add bounds check and test for tcpstat collector (#1134)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:21:36 +02:00
Matt Layher 3d798aa4a1 collector: fix golint problems in ZFS collector (#1132)
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2018-10-27 09:18:33 +02:00