Commit graph

1654 commits

Author SHA1 Message Date
Daniel Hodges b14168cf6a
Add perf tracepoint collection flag (#1664)
* Add tracepoint collector option for perf collector

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2020-04-17 12:02:08 +02:00
Daniel Hodges 44357ed677
Fix initialization in perf collector when using multiple CPUs (#1665)
* Fix initialization in perf collector when using multiple CPUs

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2020-04-17 11:59:07 +02:00
Michael Vorburger ⛑️ 4135c00d33 minor README doc fix re. collector.perf.cpus
Signed-off-by: Michael Vorburger <mike@vorburger.ch>
2020-04-17 11:02:26 +02:00
jangdm d4d2e1db98
fix typo in TIME.md (#1670)
fix typo in TIME.md

Signed-off-by: jangdm <jamin4@naver.com>
2020-04-09 09:00:00 +02:00
WOO CHANG HO 612ea0cd12 Add more compatible rules
Signed-off-by: zodiac12k <zodiac12k@gmail.com>
2020-04-08 10:19:44 +02:00
J0WI 674ddfa35c Fix typo in README.md
Signed-off-by: J0WI <J0WI@users.noreply.github.com>
2020-04-08 10:18:22 +02:00
Fatih Degirmenci a78c5d3cd8
Update systemd example readme file (#1663)
The readme file does not mention the need to create a folder named
/var/lib/node_exporter/textfile_collector as a step. The lack of this
folder results in errors for the node_exporter service, which are visible
in the systemd status output. These errors are possibly harmless, but it is
still better not to have them.

$ sudo systemctl status node_exporter
--- snipped ---
Apr 04 14:51:35 ubuntu node_exporter[14713]: level=info ts=2020-04-04T14:51:35.584Z caller=node_exporter.go:190 msg="Listening on" address=:9100
Apr 04 15:05:34 ubuntu node_exporter[14876]: level=error ts=2020-04-04T15:05:34.464Z caller=textfile.go:197 collector=textfile msg="failed to read textfile collector directory" path=/var/lib/node_exporter/textfile_collector err="open /var/lib/node_exporter/textfile_collector: no such file or directory"
--- snipped ---

Signed-off-by: Fatih Degirmenci <fdegir@gmail.com>
2020-04-06 15:32:02 +02:00
Povilas Versockas bd3e6d224c
Add NodeTextFileCollectorScrapeError alert to mixin
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-31 18:12:36 +03:00
Peter Bueschel da5972b539
Add gauges for allocated memory for queued UDP and TCP packages (#1503)
* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes.

For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish this change gives us the option to check for overloaded UDP + TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:

I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
I chose the name 'udp_queue' instead of 'udpstat' as UDP has no state.


Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
2020-03-31 10:46:32 +02:00
Ben Kochie 4891b01b6c
Add changelog entry for #1647
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-27 21:36:39 +01:00
Paweł Krupa 1771fc87d9
collector/systemd: use regexp to extract systemd version (#1647)
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-03-27 21:35:56 +01:00
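As a rough illustration of the regexp approach named in the commit above, the sketch below pulls the first numeric version out of a version string. The pattern, the function name `parseSystemdVersion`, and the sample strings are assumptions made for illustration; they are not taken from the collector's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRE is a hypothetical pattern: the first run of digits in the string.
var versionRE = regexp.MustCompile(`[0-9]+`)

// parseSystemdVersion returns the first integer found in s.
// The bool is false when s contains no usable digits.
func parseSystemdVersion(s string) (int, bool) {
	m := versionRE.FindString(s)
	if m == "" {
		return 0, false
	}
	v, err := strconv.Atoi(m)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	// Sample inputs are made up for this sketch.
	for _, s := range []string{"245 (245.4-4ubuntu3)", "v243", "unknown"} {
		v, ok := parseSystemdVersion(s)
		fmt.Println(s, v, ok)
	}
}
```

A regexp tolerates distribution-specific suffixes that a plain integer conversion of the whole string would reject.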
Björn Rabenstein a57f246579
Merge pull request #1649 from prometheus/beorn7/mixin
Fix sign error in `NodeClockSkewDetected`
2020-03-25 14:44:11 +01:00
beorn7 8b00b22904 Fix sign error in NodeClockSkewDetected
Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-25 13:07:23 +01:00
Björn Rabenstein 7f5a0ea5f6
Merge pull request #1480 from paulfantom/time_offset
docs/node-mixin: alert on desynchronised clock
2020-03-23 21:17:41 +01:00
paulfantom 820f8d595e
docs/node-mixin: alert on desynchronised clock
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-03-23 08:23:58 +01:00
Björn Rabenstein 99182a3fe0
Merge pull request #1644 from Neraud/dev/fix_mixin_alerts
[node-mixin] Add missing comma in alerts
2020-03-21 21:39:29 +01:00
Tom Wilkie 6496c24d61
Metrics for IO errors on Mac. (#1636)
* Metrics for IO errors and retries on Mac.

Signed-off-by: Tom Wilkie <tom@grafana.com>
2020-03-21 21:05:38 +01:00
Neraud 1006a2c4bb Add missing comma
Signed-off-by: Neraud <neraud.login@gmail.com>
2020-03-21 13:06:43 +01:00
Povilas Versockas 48bb6f670c Add NodeHighNumberConntrackEntriesUsed
Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
2020-03-20 17:46:05 +01:00
Benjamin Drung 34d50e15d5 Add model_name and stepping to node_cpu_info metric
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different stepping
of a processor might behave differently.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2020-03-20 17:27:11 +01:00
Ben Kochie 47610d0d2b
Update procfs library (#1640)
Bump procfs to latest release.

Fixes: https://github.com/prometheus/node_exporter/issues/1625
Fixes: https://github.com/prometheus/node_exporter/issues/1634

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-19 19:51:20 +01:00
Ben Kochie e49a13d0cf
Catch missing schedstat file (#1641)
Suppress error log noise if schedstat file doesn't exist.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-19 19:50:36 +01:00
iuri aranda 0107bc7942
Make FS space alerts thresholds configurable (#1624)
* Make FS space alerts thresholds configurable (#1)

This makes it possible to tweak the thresholds for
the NodeFilesystemSpaceFillingUp alerts, which
might be necessary in systems like Kubernetes,
where the image garbage collector runs at 85%,
so it's not a problem if the disk reaches that usage %.

Signed-off-by: iuri aranda <iuri@skyscrapers.eu>
2020-03-02 16:24:51 +01:00
Ben Kochie a7c31ff7ed
Enable golint (#1623)
* Enable golint in golangci-lint tests.
* Fix up minor linting issues.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-27 11:59:02 +01:00
Ben Kochie ef7c05816a
Release 1.0.0-rc.0 (#1614)
Update CHANGELOG/VERSION for 1.0.0-rc.0 release.
* Add a note about new https settings to top-level README.
* Mark --web.config flag as experimental.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-20 13:42:47 +01:00
Ben Kochie c4183f9935
Minor cleanup in perf collector (#1616)
* Use `strconv.Itoa()` instead of `fmt.Sprintf()` for simple conversion.
* Eliminate copy-paste in collector setup.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-20 12:05:59 +01:00
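The first bullet of the cleanup commit above can be illustrated with a small sketch. Both helpers produce the same string for an integer; `strconv.Itoa` simply skips format-string parsing. The helper names and the idea of a CPU label are hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// cpuLabelSprintf formats an integer through the general-purpose formatter.
func cpuLabelSprintf(cpu int) string {
	return fmt.Sprintf("%d", cpu)
}

// cpuLabelItoa does the same conversion with the dedicated integer helper.
func cpuLabelItoa(cpu int) string {
	return strconv.Itoa(cpu)
}

func main() {
	for _, cpu := range []int{0, 7, 128} {
		fmt.Println(cpuLabelSprintf(cpu) == cpuLabelItoa(cpu))
	}
}
```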
Daniel Hodges ec62141388
Fix num cpu (#1561)
* add a map of profilers to CPUids

`runtime.NumCPU()` returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.

This change maintains the current behavior as default, but also allows
the user to specify a range of CPUids to use instead.

The CPU id is stored as the value of a map keyed on the profiler
object's address.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodges@uber.com>

Co-authored-by: jdamato-fsly <55214354+jdamato-fsly@users.noreply.github.com>
2020-02-20 11:36:33 +01:00
Paul Gier b40954dce5
new flag to disable all default collectors (#1460)
* new flag to disable all default collectors

Signed-off-by: Paul Gier <pgier@redhat.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
2020-02-20 11:03:33 +01:00
Ben Kochie 3e1b0f1bee
Don't count empty collection as success (#1613)
Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.

* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in infiniband debug message.

Closes: https://github.com/prometheus/node_exporter/issues/1323

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-19 16:11:29 +01:00
Ben Kochie 1a75bc7b50
Fix up Darwin swap metrics
* Add a changelog entry.
* Remove redundant swap free metric.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-19 15:52:47 +01:00
jonas-lindmark 9828533697
Swap usage on darwin from sysctl vm.swapusage (#1608)
Signed-off-by: jonas <jonas.lindmark@denacode.se>
2020-02-19 15:51:29 +01:00
Silke Hofstra 8faa843fc4
Add Btrfs collector (#1512)
* Add procfs/btrfs to vendor folder
* Add Btrfs collector

Resolves #1100

Signed-off-by: Silke Hofstra <silke@slxh.eu>
2020-02-19 15:48:51 +01:00
Benjamin Drung ca1ac435ea
Collect non-numeric data from /sys/class/infiniband (#1563)
Let the node exporter collect the non-numeric data from
/sys/class/infiniband: board ID, firmware version, and HCA type.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
2020-02-19 15:18:44 +01:00
Phil Porada 14eafab016
Adds metrics and tests for UDP receive and send buffer errors (#1534)
* Adds metrics for UDP receive and send buffer errors

Signed-off-by: Phil Porada <philporada@gmail.com>
2020-02-19 14:41:40 +01:00
Julian Kornberger cfcaeee145
Use strconv.Itoa() instead of fmt.Sprintf() (#1566)
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
2020-02-19 14:34:05 +01:00
Tobias Klauser 6ad94ae4bc
Implement loadavg on all BSDs without cgo (#1584)
Reuse the Go-only implementation already in place for FreeBSD (#385) on
Darwin, DragonflyBSD, NetBSD and OpenBSD.

Tested on all affected platforms.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2020-02-18 14:14:35 +01:00
Ben Kochie 1567cefdae
Bump all vendoring (#1612)
Update all vendoring to current releases.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-18 13:27:11 +01:00
Ben Kochie 14df2a1a1a
Update to latest procfs library (#1611)
Bump to v0.0.10 procfs library.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-18 11:33:46 +01:00
Johannes 'fish' Ziemke dcfd610433
systemd: Clarify private flag description (#1587)
This requires root, so it shouldn't be used.

This closes #1246

Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
2020-02-15 11:39:45 +01:00
Julien Pivotto 84c6446094 netdev: clean zero-value assignments
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-13 12:20:27 +01:00
Ben Kochie 92ea3c6a3f Fix infiniband collector log noise (#1599)
Handle non-existent infiniband results silently.

Fixes: https://github.com/prometheus/node_exporter/issues/1511

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-08 17:18:17 +01:00
Ukri Niemimuukko eac3e30f7f rapl_linux collector
This exposes RAPL statistics from /sys/class/powercap.

Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-02-01 12:06:30 +01:00
Paul Cameron 9bb37873a8 Add unix socket support for supervisord collector (#1592)
* Add unix socket support for supervisord collector

For example:
  --collector.supervisord.url=unix:///var/run/supervisor.sock

Fixes prometheus/node_exporter#262

Signed-off-by: Paul Cameron <cameronpm@gmail.com>
2020-01-28 08:50:23 +01:00
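To show how a `unix://` URL like the one in the commit above can be told apart from an HTTP URL, here is a minimal sketch using the standard `net/url` package. The function name `socketPath` and the HTTP sample URL are assumptions; the commit does not show how the collector itself parses the flag.

```go
package main

import (
	"fmt"
	"net/url"
)

// socketPath reports the filesystem path when raw uses the "unix" scheme.
// The bool is false for any other scheme or for an unparsable URL.
func socketPath(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme != "unix" {
		return "", false
	}
	return u.Path, true
}

func main() {
	for _, raw := range []string{
		"unix:///var/run/supervisor.sock",
		"http://localhost:9001/RPC2", // made-up HTTP endpoint for contrast
	} {
		p, ok := socketPath(raw)
		fmt.Println(raw, p, ok)
	}
}
```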
vitt-bagal 04ad4b3510 Added s390x support for docker image (#1539)
Signed-off-by: Vitthal Bagal <vitthalb@us.ibm.com>
2020-01-27 10:55:35 +01:00
Peter Tribble e7a27366a0 Fix Solaris build (typos in function names) (#1522)
Signed-off-by: Peter Tribble <peter.tribble@gmail.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
2020-01-24 18:06:10 +01:00
Thomas Lin 3ddc82c2d8 Fixed inaccurate 'node_network_speed_bytes' when speeds are low (#1580)
Integer division and the order of operations when converting Mbps to Bps
results in a loss of accuracy if the interface speeds are set low.
e.g. 100 Mbps is reported as 12000000 Bps, should be 12500000
     10 Mbps is reported as 1000000 Bps, should be 1250000

Signed-off-by: Thomas Lin <t.lin@mail.utoronto.ca>
2020-01-01 13:10:53 +01:00
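The loss of accuracy described in the commit above is easy to reproduce. The sketch below shows one ordering of operations that yields the wrong figures quoted in the message, next to an ordering that yields the expected ones. The function names are hypothetical, and the exact expressions in the collector may differ.

```go
package main

import "fmt"

// truncatedBytesPerSec divides first, so the integer division discards
// the remainder before the value is scaled up.
func truncatedBytesPerSec(mbps int64) int64 {
	return mbps / 8 * 1000 * 1000
}

// accurateBytesPerSec scales to bits per second first and divides last.
func accurateBytesPerSec(mbps int64) int64 {
	return mbps * 1000 * 1000 / 8
}

func main() {
	for _, mbps := range []int64{100, 10} {
		fmt.Println(mbps, truncatedBytesPerSec(mbps), accurateBytesPerSec(mbps))
	}
}
```

For 100 Mbps the two functions return 12000000 and 12500000; for 10 Mbps they return 1000000 and 1250000, matching the figures in the commit message.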
Ben Kochie f316099f87
Fix up softnet collector for go-kit change (#1581)
Add missing update for new go-kit logging change.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-12-31 19:36:39 +01:00
Ben Ye 2477c5c67d switch to go-kit/log (#1575)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-12-31 17:19:37 +01:00
Peter Nicholson a80b7d0bc5 Add softnet collector (#1576)
Signed-off-by: Peter Nicholson <petergoods@hotmail.com>
2019-12-30 01:36:10 +01:00
Julian Kornberger cafb12dc59 Add cause to error message
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
2019-12-19 15:26:55 +01:00