Commit graph

112 commits

Author SHA1 Message Date
Ben Kochie 08ce3c6dd4
Merge pull request #1733 from prometheus/superq/OutRsts
Include TCP OutRsts in netstat metrics
2020-06-18 17:12:45 +02:00
Ben Kochie a34630b8a2
Update for 1.0.1 release
Update changelog and version for 1.0.1 release.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-15 14:34:07 +02:00
Ben Kochie c8c1618074
Merge pull request #1747 from prometheus/superq/fix_powersupplyclass
Handle no data from powersupplyclass
2020-06-14 15:45:12 +02:00
Ben Kochie 5fed4f01e9
Handle no data from powersupplyclass
Handle the case when /sys/class/power_supply doesn't exist. Fixes
logging error spam.

Requires https://github.com/prometheus/procfs/pull/308

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-13 11:09:16 +02:00
Ben Kochie 7e49b68d3a
Improve filter flag names.
Update netdev and systemd collectors to deprecate poorly chosen flag names.

Old flag names to be removed in 2.0.0.

https://github.com/prometheus/node_exporter/issues/1742

Add log messages for parsed flag values to help discover quoting issues in
supervisors.

https://github.com/prometheus/node_exporter/issues/1737

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-12 12:46:31 +02:00
Ben Kochie 204164e4e4
Include TCP OutRsts in netstat metrics
TCP "OutRsts" is the number of TCP Resets sent by the node. This can be
useful for monitoring connection failures and flooding.
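To make the counter concrete, here is a minimal sketch of reading `OutRsts` from `/proc/net/snmp`-formatted text. It assumes the kernel's usual layout: a `Tcp:` header line naming the counters, followed by a `Tcp:` value line in the same column order. `parseTCPCounter` is a hypothetical helper for illustration, not the netstat collector's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseTCPCounter scans /proc/net/snmp-style content, which pairs a header
// line ("Tcp: RtoAlgorithm ... OutRsts ...") with a value line ("Tcp: 1 ...").
// It returns the value of the named Tcp counter, e.g. "OutRsts".
func parseTCPCounter(snmp string, name string) (float64, error) {
	scanner := bufio.NewScanner(strings.NewReader(snmp))
	var header []string
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || fields[0] != "Tcp:" {
			continue
		}
		if header == nil {
			// The first "Tcp:" line holds the counter names.
			header = fields
			continue
		}
		// The second "Tcp:" line holds the values in the same order.
		if len(fields) != len(header) {
			return 0, fmt.Errorf("mismatched Tcp header/value lengths")
		}
		for i, key := range header {
			if key == name {
				return strconv.ParseFloat(fields[i], 64)
			}
		}
		return 0, fmt.Errorf("counter %q not found", name)
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no Tcp value line found")
}
```

Since `OutRsts` only increases (until a reboot), it is most useful as a rate over time rather than as a raw value.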

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-04 08:51:39 +02:00
Ben Kochie 11a0aaaa0a
Release 1.0.0
* The netdev collector CLI argument `--collector.netdev.ignored-devices` was renamed to `--collector.netdev.device-blacklist` in order to conform with the systemd collector. #1279
* The label named `state` on `node_systemd_service_restart_total` metrics was changed to `name` to better describe the metric. #1393
* Refactoring of the mdadm collector changes several metrics
    - `node_md_disks_active` is removed
    - `node_md_disks` now has a `state` label for "fail", "spare", "active" disks.
    - `node_md_is_active` is replaced by `node_md_state` with a state set of "active", "inactive", "recovering", "resync".
* Additional label `mountaddr` added to NFS device metrics to distinguish mounts from the same URL, but different IP addresses. #1417
* Metrics node_cpu_scaling_frequency_min_hrts and node_cpu_scaling_frequency_max_hrts of the cpufreq collector were renamed to node_cpu_scaling_frequency_min_hertz and node_cpu_scaling_frequency_max_hertz. #1510
* Collectors that are enabled, but are unable to find data to collect, now return 0 for `node_scrape_collector_success`.

* [CHANGE] Add `--collector.netdev.device-whitelist`. #1279
* [CHANGE] Ignore iso9660 filesystem on Linux #1355
* [CHANGE] Refactor mdadm collector #1403
* [CHANGE] Add `mountaddr` label to NFS metrics. #1417
* [CHANGE] Don't count empty collectors as success. #1613
* [FEATURE] New flag to disable default collectors #1276
* [FEATURE] Add experimental TLS support #1277, #1687, #1695
* [FEATURE] Add collector for Power Supply Class #1280
* [FEATURE] Add new schedstat collector #1389
* [FEATURE] Add FreeBSD zfs support #1394
* [FEATURE] Add uname support for Darwin and OpenBSD #1433
* [FEATURE] Add new metric node_cpu_info #1489
* [FEATURE] Add new thermal_zone collector #1425
* [FEATURE] Add new cooling_device metrics to thermal zone collector #1445
* [FEATURE] Add swap usage on darwin #1508
* [FEATURE] Add Btrfs collector #1512
* [FEATURE] Add RAPL collector #1523
* [FEATURE] Add new softnet collector #1576
* [FEATURE] Add new udp_queues collector #1503
* [FEATURE] Add basic authentication #1673
* [ENHANCEMENT] Log pid when there is a problem reading the process stats #1341
* [ENHANCEMENT] Collect InfiniBand port state and physical state #1357
* [ENHANCEMENT] Include additional XFS runtime statistics. #1423
* [ENHANCEMENT] Report non-fatal collection errors in the exporter metric. #1439
* [ENHANCEMENT] Expose IPVS firewall mark as a label #1455
* [ENHANCEMENT] Add check for systemd version before attempting to query certain metrics. #1413
* [ENHANCEMENT] Add a flag to adjust mount timeout #1486
* [ENHANCEMENT] Add new counters for flush requests in Linux 5.5 #1548
* [ENHANCEMENT] Add metrics and tests for UDP receive and send buffer errors #1534
* [ENHANCEMENT] The sockstat collector now exposes IPv6 statistics in addition to the existing IPv4 support. #1552
* [ENHANCEMENT] Add infiniband info metric #1563
* [ENHANCEMENT] Add unix socket support for supervisord collector #1592
* [ENHANCEMENT] Implement loadavg on all BSDs without cgo #1584
* [ENHANCEMENT] Add model_name and stepping to node_cpu_info metric #1617
* [ENHANCEMENT] Add `--collector.perf.cpus` to allow setting the CPU list for perf stats. #1561
* [ENHANCEMENT] Add metrics for IO errors and retries on Darwin. #1636
* [ENHANCEMENT] Add perf tracepoint collection flag #1664
* [ENHANCEMENT] ZFS: read contents of objset file #1632
* [ENHANCEMENT] Linux CPU: Cache CPU metrics to make them monotonically increasing #1711
* [BUGFIX] Read /proc/net files with a single read syscall #1380
* [BUGFIX] Renamed label `state` to `name` on `node_systemd_service_restart_total`. #1393
* [BUGFIX] Fix netdev nil reference on Darwin #1414
* [BUGFIX] Strip path.rootfs from mountpoint labels #1421
* [BUGFIX] Fix seconds reported by schedstat #1426
* [BUGFIX] Fix empty string in path.rootfs #1464
* [BUGFIX] Fix typo in cpufreq metric names #1510
* [BUGFIX] Read /proc/stat in one syscall #1538
* [BUGFIX] Fix OpenBSD cache memory information #1542
* [BUGFIX] Refactor textfile collector to avoid looping defer #1549
* [BUGFIX] Fix network speed math #1580
* [BUGFIX] collector/systemd: use regexp to extract systemd version #1647
* [BUGFIX] Fix initialization in perf collector when using multiple CPUs #1665
* [BUGFIX] Fix accidentally empty lines in meminfo_linux #1671

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-05-25 14:03:04 +02:00
Ben Kochie 3565316d7e
Linux CPU: Cache CPU metrics
Cache CPU metrics to prevent counters (e.g. iowait) from jumping backwards.

Fixes: https://github.com/prometheus/node_exporter/issues/1686
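A minimal sketch of the caching idea, assuming a per-CPU map of the last exported values and only three of the `/proc/stat` fields: a fresh reading lower than the cached one is ignored, so the exported counter never decreases. `cpuCache`, `cpuTimes` and `keepMax` are hypothetical names, and the real collector's handling of edge cases (such as CPU hotplug) may differ.

```go
package main

// cpuTimes holds a subset of per-CPU counters (in seconds) as read from
// /proc/stat. Only a few fields are shown for brevity.
type cpuTimes struct {
	User   float64
	System float64
	Iowait float64
}

// cpuCache remembers the last value exported for each CPU so that a counter
// the kernel reports as going backwards (iowait is known to do this) is held
// at its previous value instead of decreasing.
type cpuCache struct {
	last map[int]cpuTimes
}

func newCPUCache() *cpuCache {
	return &cpuCache{last: make(map[int]cpuTimes)}
}

// keepMax returns cur unless it is smaller than prev.
func keepMax(prev, cur float64) float64 {
	if cur < prev {
		return prev
	}
	return cur
}

// update merges a fresh reading into the cache and returns the values to export.
func (c *cpuCache) update(cpu int, fresh cpuTimes) cpuTimes {
	prev, ok := c.last[cpu]
	if !ok {
		c.last[cpu] = fresh
		return fresh
	}
	merged := cpuTimes{
		User:   keepMax(prev.User, fresh.User),
		System: keepMax(prev.System, fresh.System),
		Iowait: keepMax(prev.Iowait, fresh.Iowait),
	}
	c.last[cpu] = merged
	return merged
}
```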

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-05-24 16:31:26 +02:00
Ben Kochie 3cedd344fd Release 1.0.0-rc.1
* Update CHANGELOG with fixes and improvements from rc.0

Signed-off-by: Ben Kochie <superq@gmail.com>
Signed-off-by: Richard Hartmann <richih@richih.org>
2020-05-14 16:41:37 +02:00
Julien Pivotto 202ecf9c9d
Add basic authentication (#1683)
* Add basic authentication

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-01 14:26:51 +02:00
Peter Bueschel da5972b539
Add gauges for allocated memory for queued UDP and TCP packets (#1503)
* Two new states, rx_queued_bytes and tx_queued_bytes, are added to the tcpstat collector.

For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish this change gives us the option to check for overloaded UDP and TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:

I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
I chose the name 'udp_queues' instead of 'udpstat' because UDP has no state.
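The UDP side can be sketched as follows, assuming the standard `/proc/net/udp` layout in which each socket line carries `tx_queue:rx_queue` as one hexadecimal field in the fifth column (`/proc/net/udp6` uses the same layout). `udpQueues` is a hypothetical helper that sums both queues across all sockets; it is not the udp_queues collector's actual implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// udpQueues sums the tx_queue and rx_queue columns of /proc/net/udp-style
// text. Each data line carries them as one hex field "TTTTTTTT:RRRRRRRR".
func udpQueues(procNetUDP string) (uint64, uint64, error) {
	var tx, rx uint64
	scanner := bufio.NewScanner(strings.NewReader(procNetUDP))
	first := true
	for scanner.Scan() {
		if first { // skip the column header line
			first = false
			continue
		}
		fields := strings.Fields(scanner.Text())
		if len(fields) < 5 {
			continue
		}
		queues := strings.SplitN(fields[4], ":", 2)
		if len(queues) != 2 {
			return 0, 0, fmt.Errorf("unexpected queue field %q", fields[4])
		}
		t, err := strconv.ParseUint(queues[0], 16, 64)
		if err != nil {
			return 0, 0, err
		}
		r, err := strconv.ParseUint(queues[1], 16, 64)
		if err != nil {
			return 0, 0, err
		}
		tx += t
		rx += r
	}
	return tx, rx, scanner.Err()
}
```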


Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
2020-03-31 10:46:32 +02:00
Ben Kochie 4891b01b6c
Add changelog entry for #1647
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-27 21:36:39 +01:00
Tom Wilkie 6496c24d61
Metrics for IO errors on Mac. (#1636)
* Metrics for IO errors and retries on Mac.

Signed-off-by: Tom Wilkie <tom@grafana.com>
2020-03-21 21:05:38 +01:00
Benjamin Drung 34d50e15d5 Add model_name and stepping to node_cpu_info metric
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human-readable model name. The
stepping of the processor is also useful, since different steppings
of a processor may behave differently.
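For reference, a minimal sketch of pulling `model`, `model name` and `stepping` out of x86-style `/proc/cpuinfo` text, assuming the usual `key<tabs>: value` lines with a blank line between processor blocks. `parseCPUInfo` and `cpuInfo` are hypothetical; node_exporter itself reads this file through the procfs library.

```go
package main

import (
	"bufio"
	"strings"
)

// cpuInfo holds the fields of one /proc/cpuinfo processor block used here.
type cpuInfo struct {
	Processor string
	Model     string
	ModelName string
	Stepping  string
}

// parseCPUInfo splits /proc/cpuinfo text into per-processor entries.
// Each line is "key<tabs>: value"; a "processor" line starts a new entry.
func parseCPUInfo(text string) []cpuInfo {
	var out []cpuInfo
	var cur *cpuInfo
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		key, value, found := strings.Cut(scanner.Text(), ":")
		if !found {
			continue // blank separator or malformed line
		}
		key = strings.TrimSpace(key)
		value = strings.TrimSpace(value)
		if key == "processor" {
			out = append(out, cpuInfo{Processor: value})
			cur = &out[len(out)-1]
			continue
		}
		if cur == nil {
			continue
		}
		switch key {
		case "model":
			cur.Model = value
		case "model name":
			cur.ModelName = value
		case "stepping":
			cur.Stepping = value
		}
	}
	return out
}
```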

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2020-03-20 17:27:11 +01:00
Ben Kochie ef7c05816a
Release 1.0.0-rc.0 (#1614)
Update CHANGELOG/VERSION for 1.0.0-rc.0 release.
* Add a note about new https settings to top-level README.
* Mark --web.config flag as experimental.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-20 13:42:47 +01:00
Daniel Hodges ec62141388
Fix num cpu (#1561)
* add a map of profilers to CPUids

`runtime.NumCPU()` returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.

This change maintains the current behavior as default, but also allows
the user to specify a range of CPU IDs to use instead.

The CPU id is stored as the value of a map keyed on the profiler
object's address.
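A minimal sketch of the selection logic described above, assuming the user supplies a single inclusive `start-end` range (the syntax the real `--collector.perf.cpus` flag accepts may differ). With no range it falls back to `0..runtime.NumCPU()-1`, and each profiler's CPU ID is stored in a map keyed on the profiler's address. `perfCPUs`, `profiler` and `assignProfilers` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// perfCPUs returns the CPU IDs to open profilers on. An empty spec keeps the
// old behavior (0..NumCPU-1); otherwise it parses a range such as "2-5".
func perfCPUs(spec string) ([]int, error) {
	if spec == "" {
		ids := make([]int, runtime.NumCPU())
		for i := range ids {
			ids[i] = i
		}
		return ids, nil
	}
	lo, hi, found := strings.Cut(spec, "-")
	if !found {
		return nil, fmt.Errorf("expected start-end range, got %q", spec)
	}
	start, err := strconv.Atoi(lo)
	if err != nil {
		return nil, err
	}
	end, err := strconv.Atoi(hi)
	if err != nil {
		return nil, err
	}
	if start < 0 || end < start {
		return nil, fmt.Errorf("invalid CPU range %q", spec)
	}
	ids := make([]int, 0, end-start+1)
	for id := start; id <= end; id++ {
		ids = append(ids, id)
	}
	return ids, nil
}

// profiler stands in for a per-CPU perf profiler object. It has a field so
// that each allocation gets a distinct address (zero-size values may not).
type profiler struct{ fd int }

// assignProfilers creates one profiler per CPU ID and records, keyed on the
// profiler's address, which CPU it belongs to.
func assignProfilers(ids []int) map[*profiler]int {
	m := make(map[*profiler]int, len(ids))
	for _, id := range ids {
		m[&profiler{fd: -1}] = id // fd -1: not opened in this sketch
	}
	return m
}
```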

Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodges@uber.com>

Co-authored-by: jdamato-fsly <55214354+jdamato-fsly@users.noreply.github.com>
2020-02-20 11:36:33 +01:00
Paul Gier b40954dce5
new flag to disable all default collectors (#1460)
* new flag to disable all default collectors

Signed-off-by: Paul Gier <pgier@redhat.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
2020-02-20 11:03:33 +01:00
Ben Kochie 3e1b0f1bee
Don't count empty collection as success (#1613)
Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.

* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in InfiniBand debug message.
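The new success semantics can be sketched like this, assuming collectors signal "no data" with a sentinel error. `errNoData`, `collectorFunc` and `runCollector` are hypothetical names; only the `node_scrape_collector_success` metric comes from the message above.

```go
package main

import (
	"errors"
	"log"
)

// errNoData is a sentinel a collector returns when the data source it
// depends on (e.g. a missing /sys directory) is absent.
var errNoData = errors.New("collector returned no data")

// collectorFunc is a stand-in for one collector's update step.
type collectorFunc func() error

// runCollector returns the value to export as node_scrape_collector_success:
// 1 only when the collector ran and produced data. A no-data result is
// logged only in debug mode to avoid log spam; other failures are errors.
func runCollector(name string, c collectorFunc, debug bool) float64 {
	err := c()
	switch {
	case err == nil:
		return 1
	case errors.Is(err, errNoData):
		if debug {
			log.Printf("debug: collector %s returned no data", name)
		}
		return 0
	default:
		log.Printf("error: collector %s failed: %v", name, err)
		return 0
	}
}
```

Using `errors.Is` means a collector can also wrap the sentinel with context (via `fmt.Errorf` and `%w`) and still be classified as "no data".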

Closes: https://github.com/prometheus/node_exporter/issues/1323

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-19 16:11:29 +01:00
Ben Kochie 1a75bc7b50
Fix up Darwin swap metrics
* Add a changelog entry.
* Remove redundant swap free metric.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-19 15:52:47 +01:00
Silke Hofstra 8faa843fc4
Add Btrfs collector (#1512)
* Add procfs/btrfs to vendor folder
* Add Btrfs collector

Resolves #1100

Signed-off-by: Silke Hofstra <silke@slxh.eu>
2020-02-19 15:48:51 +01:00
Ukri Niemimuukko eac3e30f7f rapl_linux collector
This exposes RAPL statistics from /sys/class/powercap.
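As an illustration, powercap zones expose cumulative energy in microjoules in an `energy_uj` file (e.g. `/sys/class/powercap/intel-rapl:0/energy_uj`). A minimal sketch of converting that file's contents to joules, with `raplJoules` as a hypothetical helper:

```go
package main

import (
	"strconv"
	"strings"
)

// raplJoules converts the contents of a powercap energy_uj file (a counter
// in microjoules) to joules.
func raplJoules(energyUJ string) (float64, error) {
	uj, err := strconv.ParseUint(strings.TrimSpace(energyUJ), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(uj) / 1e6, nil
}
```

The counter wraps around at the value in the zone's `max_energy_range_uj` file, so it is best treated as a counter and turned into watts with a rate over time.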

Co-Authored-By: Ben Kochie <superq@gmail.com>
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-02-01 12:06:30 +01:00
Paul Cameron 9bb37873a8 Add unix socket support for supervisord collector (#1592)
* Add unix socket support for supervisord collector

For example:
  --collector.supervisord.url=unix:///var/run/supervisor.sock
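One way to support such a URL with Go's standard library is a custom dialer that sends every HTTP connection to the socket path. This is a sketch under assumptions: `supervisordClient` is hypothetical, `/RPC2` is supervisord's usual XML-RPC path, and the host in the returned URL is an arbitrary placeholder because the dialer ignores it.

```go
package main

import (
	"context"
	"net"
	"net/http"
	"net/url"
)

// supervisordClient returns an HTTP client and the URL to send XML-RPC
// requests to. For a unix:// URL, every connection is dialed to the socket.
func supervisordClient(raw string) (*http.Client, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, "", err
	}
	if u.Scheme != "unix" {
		return http.DefaultClient, raw, nil
	}
	socket := u.Path // e.g. /var/run/supervisor.sock
	transport := &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socket)
		},
	}
	// "localhost" is a placeholder; the dialer above never resolves it.
	return &http.Client{Transport: transport}, "http://localhost/RPC2", nil
}
```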

Fixes prometheus/node_exporter#262

Signed-off-by: Paul Cameron <cameronpm@gmail.com>
2020-01-28 08:50:23 +01:00
Thomas Lin 3ddc82c2d8 Fixed inaccurate 'node_network_speed_bytes' when speeds are low (#1580)
Integer division and the order of operations when converting Mbps to Bps
results in a loss of accuracy if the interface speeds are set low.
e.g. 100 Mbps is reported as 12000000 Bps, should be 12500000
     10 Mbps is reported as 1000000 Bps, should be 1250000
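The loss comes from dividing before multiplying. The two hypothetical functions below contrast the orders of operations and reproduce the numbers above; they illustrate the bug rather than quote the collector's code.

```go
package main

// speedBytesBuggy divides by 8 first, so integer division truncates
// (100/8 == 12, 10/8 == 1) before scaling to bytes per second.
func speedBytesBuggy(mbps int64) int64 {
	return mbps / 8 * 1000000
}

// speedBytesFixed multiplies first; 1000000 is divisible by 8, so the
// result is exact for any whole number of megabits per second.
func speedBytesFixed(mbps int64) int64 {
	return mbps * 1000000 / 8
}
```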

Signed-off-by: Thomas Lin <t.lin@mail.utoronto.ca>
2020-01-01 13:10:53 +01:00
Peter Nicholson a80b7d0bc5 Add softnet collector (#1576)
Signed-off-by: Peter Nicholson <petergoods@hotmail.com>
2019-12-30 01:36:10 +01:00
Ben Kochie 0d9d7e961a
Update CHANGELOG
Add/update entries for recent merged PRs.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-11-25 21:50:00 +01:00
Matt Layher da6b66371f collector: reimplement sockstat collector with procfs (#1552)
* collector: reimplement sockstat collector with procfs
* collector: handle sockstat IPv4 disabled, debug logging

Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-11-25 13:41:38 -06:00
John Belmonte 15e36e2230 fix typo in cpufreq metric names (#1510)
Signed-off-by: John Belmonte <john@neggie.net>
2019-10-11 02:12:20 +09:00
Paul Gier 9f5225456d fix order of items in CHANGELOG
Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-25 14:39:43 -05:00
Paul Gier 4d72cb8059 add node_cpu_info metric
Contains information gathered from /proc/cpuinfo

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-25 14:38:57 -05:00
Ben Kochie 82b7b1f732
Merge branch 'master' into coolingDevice
2019-09-09 17:44:03 +02:00
dt-rush 93fbb93a46 fix issue where rootfs path strips to the empty string (#1464)
Change-type: patch
Connects-to: #1463
Signed-off-by: dt-rush <nickp@balena.io>
2019-09-09 17:39:24 +02:00
Paul Gier 8c3de12c22 systemd: check version for availability of properties (#1413)
The dbus property 'SystemState' and the timer property 'LastTriggerUSec'
were added in version 212 of systemd.
Check that the systemd version is at least 212 before attempting
to query these properties.
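A minimal sketch of such a gate, assuming the version string reported by systemd starts with (or contains) the numeric release, as in `245.4-4ubuntu3` or `239 (239-41.el8)`. The constant and function names are hypothetical, and the regexp node_exporter actually uses may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// minSystemdVersionSystemState is the first release exposing SystemState
// and LastTriggerUSec, per the commit message above.
const minSystemdVersionSystemState = 212

// firstNumber matches the first run of digits in a version string.
var firstNumber = regexp.MustCompile(`\d+`)

func parseSystemdVersion(v string) (int, error) {
	m := firstNumber.FindString(v)
	if m == "" {
		return 0, fmt.Errorf("cannot parse systemd version %q", v)
	}
	return strconv.Atoi(m)
}

// supportsSystemState reports whether the properties may be queried.
func supportsSystemState(v string) bool {
	n, err := parseSystemdVersion(v)
	return err == nil && n >= minSystemdVersionSystemState
}
```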

f755e3b74b
dedabea4b3

Resolves issue #291

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-04 16:27:25 +02:00
Alex Schmitz 664025d60c
Scrape cooling_device state
Signed-off-by: Alex Schmitz <alex.schmitz@gmail.com>
2019-08-30 08:58:47 -05:00
Boris Momčilović 93c12e03a1 IPVS firewall mark (#1455)
* IPVS: include firewall mark label

Signed-off-by: Boris Momčilović <boris@firstbeatmedia.com>
2019-08-27 14:24:11 +02:00
Richard Kojedzinszky 75462bf4fe Scrape thermal_zone temperatures (#1425)
* Scrape thermal_zone temperatures

Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
2019-08-04 12:56:36 +02:00
Ben Kochie 10146109ec
Update CHANGELOG for #1433
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-08-03 12:33:25 +02:00
Dipack P Panjabi a7452023db Added mountinfo changes to node_exporter (#1417)
Use the extra information gleaned from the mountinfo file to add
a 'mountaddr' field for NFS metrics. This helps prevent Prometheus from
ignoring mounts that come from the same URL, but are actually from
different IP addresses.
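For illustration, the server IP appears in the super-block options of an NFS entry in `/proc/self/mountinfo` (the field after `- <fstype> <source>`) as `addr=...`. A minimal sketch with a hypothetical `mountAddr` helper; it matches the whole option prefix so that the separate `clientaddr=` option is not mistaken for it.

```go
package main

import "strings"

// mountAddr extracts the addr= value from the super options of a
// /proc/self/mountinfo line. It returns "" when no addr= option is present.
func mountAddr(mountinfoLine string) string {
	_, post, found := strings.Cut(mountinfoLine, " - ")
	if !found {
		return ""
	}
	fields := strings.Fields(post) // fstype, source, super options
	if len(fields) < 3 {
		return ""
	}
	for _, opt := range strings.Split(fields[2], ",") {
		if strings.HasPrefix(opt, "addr=") {
			return strings.TrimPrefix(opt, "addr=")
		}
	}
	return ""
}
```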

This commit is also rebased onto current master.

Signed-off-by: Dipack P Panjabi <dpanjabi@hudson-trading.com>
2019-07-28 11:32:40 +02:00
Ben Kochie 852b340a46
Add changelog entry for #1439
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-07-28 10:38:41 +02:00
dt-rush 5d3e2ce2ef properly strip path.rootfs from mountpoint labels (#1421)
Change-type: patch
Connects-to: #1418
Signed-off-by: dt-rush <nickp@balena.io>
2019-07-19 16:51:17 +02:00
Steven Kreuzer d8e47a9f9f Expose additional XFS runtime statistics (#1423)
Include directory operation, read/write system call, and vnode runtime
statistics for XFS filesystems.

Signed-off-by: Steven Kreuzer <skreuzer@FreeBSD.org>
2019-07-15 16:28:09 +02:00
Ben Kochie 0de95ef8f3
Add changelog entry for #1414
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-07-12 14:25:17 +02:00
Phil Frost f693a71c06 Scrape CPU latency stats from /proc/schedstat (#1389)
These are useful as a direct indication of CPU contention and task
scheduler latency.

Handy references:
 - https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-stats.txt
 - https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.taskscheduler.html

procfs is updated to pull in the enabling change:
https://github.com/prometheus/procfs/pull/186
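Each `cpuN` line of `/proc/schedstat` is followed by nine counters; the last three are the time tasks spent running, the time they spent waiting on the run queue, and the number of timeslices run. A minimal sketch of reading them, assuming the two times are in nanoseconds (as on current kernels) and converting them to seconds. `parseSchedstatCPU` and `schedCPU` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// schedCPU holds the three per-CPU values used for latency metrics.
type schedCPU struct {
	RunningSeconds float64
	WaitingSeconds float64
	Timeslices     uint64
}

func parseSchedstatCPU(line string) (string, schedCPU, error) {
	fields := strings.Fields(line)
	// Expect "cpuN" followed by at least nine counters.
	if len(fields) < 10 || !strings.HasPrefix(fields[0], "cpu") {
		return "", schedCPU{}, fmt.Errorf("not a schedstat cpu line: %q", line)
	}
	// Counters 7, 8 and 9 are fields[7], fields[8] and fields[9].
	var vals [3]uint64
	for i := range vals {
		v, err := strconv.ParseUint(fields[7+i], 10, 64)
		if err != nil {
			return "", schedCPU{}, err
		}
		vals[i] = v
	}
	return fields[0], schedCPU{
		RunningSeconds: float64(vals[0]) / 1e9, // ns to seconds
		WaitingSeconds: float64(vals[1]) / 1e9, // ns to seconds
		Timeslices:     vals[2],
	}, nil
}
```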

Signed-off-by: Phil Frost <phil@postmates.com>
2019-07-10 09:16:24 +02:00
Advait Bhatwadekar 3f49b31101 Closes issue #261 on node_exporter. (#1403)
* Closes issue #261 on node_exporter.

Delegated mdstat parsing to procfs project. mdadm_linux.go now only exports the metrics.
-> Added disk labels: "fail", "spare", "active" to indicate disk status
-> Changed metric node_md_disks_total ==> node_md_disks_required
-> Removed test cases for mdadm_linux.go, as the functionality they tested for has been moved to procfs project.

Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>
2019-07-01 11:56:06 +02:00
mknapphrt 3108a50fb6 Fix systemd restart counter label from state to name (#1393)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-06-25 09:37:48 +02:00
Ben Kochie c39f6749fc
Bugfix release 0.18.1 (#1366)
Cherry-pick two bug fixes into 0.18.1.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-06-04 14:29:33 +02:00
Ben Kochie 4a15edf0b6
Add changelog entry for #1364
Signed-off-by: Ben Kochie <superq@gmail.com>
2019-06-03 11:20:06 +02:00
Ben Kochie fdf9846282 Fixup 0.17.0 changelog (#1354)
* Fix ordering of CHANGE items by PR number.
* Add missing CHANGE for #1003

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-06-02 10:51:07 +01:00
Noam Meltzer 501ccf9fb4 Add --collector.netdev.device-whitelist flag (#1279)
* Add --collector.netdev.device-whitelist flag

Sometimes it is desired to monitor only one netdev. Go's regexp package
does not support negated patterns (negative lookahead), so the
ignored-devices flag is too cumbersome for this task.
This change introduces a new flag, accept-devices, which is mutually
exclusive with ignored-devices. This flag allows specifying ONLY the
netdevs you'd like to monitor.
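A minimal sketch of how mutually exclusive accept and ignore patterns can be combined. `deviceFilter` and its methods are hypothetical and not the netdev collector's actual code; the point is that "only eth0" is a simple accept pattern like `^eth0$`, whereas Go's RE2 syntax cannot express it as an ignore pattern without negative lookahead.

```go
package main

import (
	"errors"
	"regexp"
)

// deviceFilter keeps or drops network devices by name.
type deviceFilter struct {
	ignore *regexp.Regexp
	accept *regexp.Regexp
}

func newDeviceFilter(ignorePattern, acceptPattern string) (*deviceFilter, error) {
	if ignorePattern != "" && acceptPattern != "" {
		return nil, errors.New("ignore and accept patterns are mutually exclusive")
	}
	f := &deviceFilter{}
	var err error
	if ignorePattern != "" {
		if f.ignore, err = regexp.Compile(ignorePattern); err != nil {
			return nil, err
		}
	}
	if acceptPattern != "" {
		if f.accept, err = regexp.Compile(acceptPattern); err != nil {
			return nil, err
		}
	}
	return f, nil
}

// keep reports whether metrics for device should be exported.
func (f *deviceFilter) keep(device string) bool {
	if f.ignore != nil && f.ignore.MatchString(device) {
		return false
	}
	if f.accept != nil && !f.accept.MatchString(device) {
		return false
	}
	return true
}
```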

Signed-off-by: Noam Meltzer <noam@cynerio.co>
2019-05-31 17:55:50 +02:00
David O'Rourke 814ef064c0 meminfo: Fix the size mismatch in the swapTotal check mib for BSD. (#1345)
Signed-off-by: David O'Rourke <david.orourke@gmail.com>
2019-05-14 17:42:36 -05:00
Ben Kochie f97f01c46c
Update for 0.18.0 release (#1337)
* Update CHANGELOG for release.
* Bump VERSION.
* Update vendoring.

Signed-off-by: Ben Kochie <superq@gmail.com>
2019-05-09 13:19:12 -05:00