68 commits

Author SHA1 Message Date
Matthias Rampke b133213c7a Report non-fatal collection errors in the exporter metric. (#1439)
As per prometheus/client_golang#543, pass the Registry for exporter
metrics when setting up the /metrics HTTP handler.

With this, the `promhttp_metric_handler_errors_total` metric will
increment on (possibly non-fatal) collection-time errors, such as
duplicate metrics from text files.

Signed-off-by: Matthias Rampke <mr@soundcloud.com>
2019-07-28 10:37:10 +02:00
Ben Kochie ffefc8e74d Add a limit to the number of in-flight requests (#1166)
In order to avoid stuck collectors using up all system resources, add a
limit to the number of parallel in-flight scrape requests. Requests over
the limit receive a 503 error.

Default to 40 requests, which seems like a reasonable number based on:
* Two Prometheus servers scraping every 15 seconds.
* Failing scrapes after 5 minutes of stuckness.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-20 18:11:40 +01:00
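
The in-flight limit described in ffefc8e74d can be sketched with the standard library alone. This is not the exporter's code: the `limiter` type and `limitInFlight` wrapper are hypothetical names, and a buffered channel stands in for whatever mechanism the exporter's HTTP handler uses.

```go
package main

import (
	"fmt"
	"net/http"
)

// limiter is a counting semaphore built on a buffered channel.
// (Hypothetical helper, not part of node_exporter.)
type limiter struct {
	slots chan struct{}
}

func newLimiter(max int) *limiter {
	return &limiter{slots: make(chan struct{}, max)}
}

// tryAcquire claims a slot without blocking and reports whether it succeeded.
func (l *limiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously claimed by tryAcquire.
func (l *limiter) release() { <-l.slots }

// limitInFlight answers 503 once all slots are taken, so a stuck
// collector cannot pile up an unbounded number of scrapes.
func limitInFlight(l *limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			http.Error(w, "too many in-flight requests", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := newLimiter(40) // 40 mirrors the default named in the commit message
	h := limitInFlight(l, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "metrics would be written here")
	}))
	_ = h // in a real program: http.Handle("/metrics", h)
}
```

Rejecting immediately rather than queueing keeps the failure visible to the scraper, which matches the 503 behaviour the commit describes.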
beorn7 cd2331a185 Add --web.disable-exporter-metrics flag
If this flag is set, the metrics about the exporter itself (go_*,
process_*, promhttp_*) will be excluded from /metrics.

The Kingpin way of handling boolean flags makes the negative flag
wording (_dis_able) the most reasonable one.

This also refactors the flow in node_exporter.go quite a bit to avoid
mixing up the global and a local registry and to avoid re-creating a
registry even if no filtering is requested.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-11-13 14:22:25 +01:00
Brian Brazil be9d82b66e Sort collector names in startup logs (#857)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-03-29 13:42:44 +01:00
Ben Kochie d33a447047 Remove deprecated prometheus.InstrumentHandlerFunc (#831)
Update the Prometheus client_golang usage to call `promhttp.Handler()`
instead of `prometheus.InstrumentHandlerFunc()`.
2018-02-19 15:44:59 +01:00
Siavash Safi f3a7022602 Add collect[] parameter (#699)
* Add `collect[]` parameter

* Add TODO comment about staticcheck ignored

* Restore promhttp.HandlerOpts

* Log a warning and return HTTP error instead of failing

* Check collector existence and status, cleanups

* Fix warnings and error messages

* Don't panic, return error if collector registration failed

* Update README
2017-10-14 14:23:42 +02:00
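
A minimal sketch of how a `collect[]` filter like the one in f3a7022602 might be resolved, assuming a hypothetical `known` map from collector name to enabled state. The function names and error strings are illustrative, not taken from the exporter; the point is that an unknown or disabled collector produces an HTTP error rather than a crash.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
)

// filterCollectors resolves the values of the collect[] query parameter.
// known maps every collector name to whether it is enabled (assumed shape).
// With no filters, all enabled collectors are returned in sorted order.
func filterCollectors(filters []string, known map[string]bool) ([]string, error) {
	if len(filters) == 0 {
		var all []string
		for name, enabled := range known {
			if enabled {
				all = append(all, name)
			}
		}
		sort.Strings(all)
		return all, nil
	}
	for _, name := range filters {
		enabled, exists := known[name]
		if !exists {
			return nil, fmt.Errorf("missing collector: %s", name)
		}
		if !enabled {
			return nil, fmt.Errorf("disabled collector: %s", name)
		}
	}
	return filters, nil
}

// metricsHandler answers 400 for a bad filter instead of failing the process.
func metricsHandler(known map[string]bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		names, err := filterCollectors(r.URL.Query()["collect[]"], known)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "would collect: %v\n", names)
	}
}

func main() {
	known := map[string]bool{"cpu": true, "meminfo": true, "zfs": false}
	_ = metricsHandler(known) // in a real program: http.Handle("/metrics", ...)
	// Example request: /metrics?collect[]=cpu&collect[]=meminfo
}
```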
Calle Pettersson 859a825bb8 Replace --collectors.enabled with per-collector flags (#640)
* Move NodeCollector into package collector

* Refactor collector enabling

* Update README with new collector enabled flags

* Fix out-of-date inline flag reference syntax

* Use new flags in end-to-end tests

* Add flag to disable all default collectors

* Track if a flag has been set explicitly

* Add --collectors.disable-defaults to README

* Revert disable-defaults flag

* Shorten flags

* Fixup timex collector registration

* Fix end-to-end tests

* Change procfs and sysfs path flags

* Fix review comments
2017-09-28 15:06:26 +02:00
Sami Kerola 3762191e66 Add timex collector (#664)
This collector is based on the adjtimex(2) system call.  The collector returns
three values: the status of time synchronisation, the offset to the remote
reference, and the local clock frequency adjustment.

Values are taken from the kernel time keeping data structures, so the
collector does not depend on how the synchronisation is implemented.  It does
not matter whether time is updated using ntpd, systemd.timesyncd, ptpd, and
so on.  Since every time sync implementation ends up telling the kernel the
time status, one can skip the software in between and look at the results of
the syncing.  As a positive side effect this makes the collector very quick
and conceptually specific: it does not monitor availability of the NTP
server, the network in between, DNS resolution, or other unrelated but
necessary things.

The minimum set of values to keep an eye on is the following three:

    The node_timex_sync_status tells if the local clock is in sync with a
    remote clock.  Value is set to zero when synchronisation to a reliable
    server is lost, or the time sync software is misconfigured.

    The node_timex_offset_seconds tells how much the local clock is off when
    compared to the reference.  In case of multiple time references this
    value is the outcome of the RFC 5905 adjustment algorithm.  Ideally the
    offset should be close to zero, and how large a value is acceptable
    depends on the use case.  For example a typical web server is probably
    fine if the offset is about 0.1 seconds or less, but that would not be
    good enough for a mobile phone base station operator.

    The node_timex_freq tells the amount of adjustment to the local clock
    tick frequency.  For example if the offset is one second and growing, the
    local clock will need an instruction to tick quicker.  The number itself
    is not very important, and occasional small adjustments are fine.  When
    the frequency is unusually unstable one can assume time stamps will not
    be accurate very far into the sub-second range.  Obviously explaining why
    the local clock frequency behaves like a passenger on a roller coaster is
    a different matter.  Explanations can vary from system load to
    environmental issues such as a machine being physically too hot.

The rest of the measurements can help when debugging.  If you run a clock
server you probably want to collect and keep track of everything.

Pull-request: https://github.com/prometheus/node_exporter/pull/664
2017-09-19 07:54:06 -07:00
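
The three values highlighted in 3762191e66 can be derived from the raw adjtimex(2) fields roughly as follows. This is a sketch under stated assumptions rather than the collector's code: the function names are hypothetical, and the constants follow the adjtimex(2) manual page (offset in microseconds unless STA_NANO is set, frequency in ppm with a 16-bit fractional part, TIME_ERROR as the unsynchronised state).

```go
package main

import "fmt"

const (
	staNano   = 0x2000 // STA_NANO: the kernel reports the offset in nanoseconds
	timeError = 5      // TIME_ERROR: adjtimex(2) state when the clock is not synchronised
)

// syncStatus maps the adjtimex(2) return state to 1 (in sync) or 0 (not in sync).
func syncStatus(state int) float64 {
	if state == timeError {
		return 0
	}
	return 1
}

// offsetSeconds converts the raw kernel offset to seconds, honouring STA_NANO.
func offsetSeconds(offset int64, status int32) float64 {
	divisor := 1e6 // microseconds per second
	if status&staNano != 0 {
		divisor = 1e9 // nanoseconds per second
	}
	return float64(offset) / divisor
}

// freqPPM converts the raw frequency value (units of 2^-16 ppm) to ppm.
func freqPPM(freq int64) float64 {
	return float64(freq) / 65536
}

func main() {
	// Sample raw values; on Linux they would come from syscall.Adjtimex.
	fmt.Println("sync status:", syncStatus(0))
	fmt.Println("offset seconds:", offsetSeconds(500000, 0))
	fmt.Println("frequency ppm:", freqPPM(65536))
}
```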
Calle Pettersson dfe07eaae8 Switch to kingpin flags (#639)
* Switch to kingpin flags

* Fix logrus vendoring

* Fix flags in main tests

* Fix vendoring versions
2017-08-12 15:07:24 +02:00
Ben Kochie 46c31d8a7e Enable IPVS collector by default (#623)
* Silence error output when no IPVS present.
* Enable by default.
* Update end-to-end fixture.
* Update README.
2017-07-26 15:20:28 +02:00
ideaship 8d90276283 Add bcache collector (#597)
* Add bcache collector for Linux

This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.

* Removed commented out code

* Use project comment style

* Add _sectors to metric name to indicate unit

* Really use project comment style

* Rename bcache.go to bcache_linux.go

* Keep collector namespace clean

Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric

* Shorten slice initialization

* Change label names to backing_device, cache_device

* Remove five minute metrics (keep only total)

* Include units in additional metric names

* Enable bcache collector by default

* Provide metrics in seconds, not nanoseconds

* remove metrics with label "all"

* Add fixtures, update end-to-end for bcache collector

* Move fixtures/sys into tar.gz

This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.

The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not to turn up in regular
file names).

* Add ttar: plain text archive, replacement for tar

This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).

Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.

The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.

The program also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.
2017-07-07 07:20:18 +02:00
Matt Layher 1feb091b36 Initial XFS collector 2017-04-22 11:53:07 -04:00
Sam Kottler 6eafa51fa8 Add ARP collector for Linux (#540)
* Implement commonalities and linux support for ARP collection

* Add ARP collector to fixtures and run as part of e2e tests

* Bubble up scanner errors

* Use single return values where it makes sense

* Add missing annotation

* Move arp_common into arp_linux

* Add license header to arp_linux.go

* Address initial feedback

* Use strings.Fields instead of strings.Split

* Deal with scanner.Err() rather than throwing away errors

* Check for scan errors in-line before interacting with the entries map

* Don't interact with potentially empty text from scan

* Check for scan errors outside the scan loop

* Add comment about moving procfs parsing

* Add more direct comment

* Update initialism style to match go style guide

* Put function args on the same line

* Add TODO in front of comment about procfs extraction

* Guard against strings.Fields returning an empty slice

* Be more defensive about ARP table format and use upcase more broadly

* Enable the ARP collector by default

* Add ARP collector to the README

* Remove 'entry'
2017-04-11 17:45:19 +02:00
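
Several review points in 6eafa51fa8 (use `strings.Fields`, guard against an empty slice, check `scanner.Err()` outside the loop) can be illustrated with a small parser. This is a sketch, not the collector's code: `countARPEntries` is a hypothetical name, and the six-column layout is assumed from the usual `/proc/net/arp` format.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countARPEntries counts ARP table entries per device.
// It assumes a header row followed by six whitespace-separated columns,
// with the device name in the last one.
func countARPEntries(table string) (map[string]int, error) {
	scanner := bufio.NewScanner(strings.NewReader(table))
	entries := make(map[string]int)
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		if lineNo == 1 {
			continue // skip the header row
		}
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // blank line: strings.Fields returned an empty slice
		}
		if len(fields) != 6 {
			return nil, fmt.Errorf("line %d: expected 6 fields, got %d", lineNo, len(fields))
		}
		entries[fields[5]]++
	}
	// Scan errors are checked once, after the loop, instead of being dropped.
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	sample := "IP address HW type Flags HW address Mask Device\n" +
		"192.168.1.1 0x1 0x2 00:11:22:33:44:55 * eth0\n" +
		"10.0.0.1 0x1 0x2 aa:bb:cc:dd:ee:ff * wlan0\n"
	entries, err := countARPEntries(sample)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(entries)
}
```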
Brian Brazil a02e469b07 Report collector success/failure and duration per scrape. (#516)
This is in line with best practices, and also saves us
63 timeseries on a default Linux setup.
2017-03-16 17:21:00 +00:00
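
The per-scrape reporting in a02e469b07 boils down to two values per collector: whether it succeeded and how long it took. A minimal sketch follows, with hypothetical names (`scrapeResult`, `runCollector`) and plain printing in place of real metric export.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a made-up failure used only for the demonstration.
var errUnavailable = errors.New("data source unavailable")

// scrapeResult holds the two per-collector values recorded for one scrape.
type scrapeResult struct {
	collector string
	success   float64 // 1 on success, 0 on failure
	seconds   float64 // wall-clock duration of the update call
}

// runCollector times one collector update and records whether it failed.
func runCollector(name string, update func() error) scrapeResult {
	begin := time.Now()
	err := update()
	res := scrapeResult{collector: name, success: 1, seconds: time.Since(begin).Seconds()}
	if err != nil {
		res.success = 0
	}
	return res
}

func main() {
	results := []scrapeResult{
		runCollector("ok", func() error { return nil }),
		runCollector("broken", func() error { return errUnavailable }),
	}
	for _, r := range results {
		fmt.Printf("collector=%q success=%v duration_seconds=%g\n", r.collector, r.success, r.seconds)
	}
}
```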
Tobias Schmidt dace41e3d4 Continue scrape with duplicated metrics
Problems of a single collector, like duplicated metrics read via the
textfile collector, should not fail the collection and export of other
metrics.
2017-03-14 00:38:02 -03:00
Derek Marcotte 5c28ab044d Add BSD exec statistics collector (#457)
* First pass of a sysctl_bsd source, exec_bsd + exec metrics

* Incorporate PR feedback, including removing pre-build descriptions, unit conversion callback.

* Remove redundant cached_description field, per PR feedback

* Incorporate PR feedback
2017-02-28 17:23:10 -04:00
Tobias Schmidt c703435790 Fix all open go lint and vet issues 2017-02-28 13:05:38 -04:00
Robert Clark b0c9133cba Enable InfiniBand by default
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
2017-02-07 11:09:08 -06:00
Matt Layher ba635842fc Add wifi collector to default collectors (#447) 2017-02-04 07:44:01 +00:00
Ben Kochie b4fa10ca9d Add collector for Linux EDAC
Collect "Error detection and correction" metrics from memory
controllers.
* Supported on Linux only.
* Add basic fixtures.
* Enabled by default.
2017-01-10 10:14:19 +01:00
Christian Schwarz c95bfa705e Enable ZFS exporter by default and update README. 2017-01-08 10:23:58 -06:00
Johannes 'fish' Ziemke 3983cd58ff Use promhttp and setup logger 2017-01-05 19:30:48 +01:00
Rene Treffer 081ecc5db0 Add hwmon /sensors support (#278)
* Add hwmon support (mainly known from lm-sensors)

This commit adds initial support for linux hardware sensors, exported
through sysfs.

Details of the interface can be found at
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

* Add end-to-end test with some real life data

* Cleanup comments on hwmon collector

* Drop raw sensor name from hwmon output

* Let the sensor label be "sensor"

* Add hwmon short description to README.
2016-10-06 16:33:24 +01:00
Steve Durrheimer 60cbc9efc0 Make version information consistent between prometheus components
This also fixes #231 by adding the '-version' flag
2016-05-04 08:43:33 +02:00
Christian Schwarz 9a189b903e Add FreeBSD 'cpu' exporter to default collectors.
As of `1fc84e2fb69ee3d1f063399b00a6284fc8e27cb8` it does not require root anymore.
2016-02-18 12:15:08 +01:00
Tobias Schmidt 3a96e6881b Remove unused flag -debug.memprofile-file
The option to write out a memory profile to file was removed in a730cff.
Declaring flags as local variables not only results in cleaner, more
testable code, but also ensures that the program won't compile anymore
when unused flags are left in place.
2016-02-04 20:24:16 -05:00
Richard Hartmann aee580d8d8 Introduce entropy collector for Linux 2016-01-13 18:29:52 +01:00
Florian Koch 5d5346af8a Add vmstat collector, enabled per default 2016-01-11 07:58:30 +01:00
Caskey L. Dickson ab9ee574fb Build cleanly under windows.
Removes unused signal handlers left over from signal based collection
and blocks the non windows-relevant collectors loadavg and interrupts.

Signal based collection removed in 1c17481a42.
2016-01-07 17:59:16 -08:00
Daniel Bechler fc3931c924 Add build_info metric similar to the one of Prometheus itself 2016-01-06 23:54:33 +01:00
Brian Brazil a82b4c30cb Add linux conntrack collector. 2015-12-20 00:57:52 +00:00
Pavel Borzenkov 46527808aa Filter list of collectors enabled by default
The collectors enabled by default are chosen for Linux, which supports all
of the implemented collectors. But for other OSes (OS X, for example)
this list is not suitable, because they lack most of those collectors.

Because of that, it is not possible to run node_exporter with default
options on such OSes. Fix this by filtering the list of default
collectors based on their availability on the current platform.

Closes #149

Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com>
2015-11-13 10:42:10 +03:00
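
The filtering described in 46527808aa can be sketched as below, assuming the defaults are a comma-separated string and that the set of collectors compiled in for the current platform is available as a map. Both the function name and the map are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// filterAvailable keeps only the default collectors that exist on this
// platform, preserving their original order.
func filterAvailable(defaults string, available map[string]struct{}) string {
	var kept []string
	for _, name := range strings.Split(defaults, ",") {
		if _, ok := available[name]; ok {
			kept = append(kept, name)
		}
	}
	return strings.Join(kept, ",")
}

func main() {
	// Assumed example: a platform that only ships two of the collectors.
	available := map[string]struct{}{"cpu": {}, "loadavg": {}}
	fmt.Println(filterAvailable("cpu,diskstats,loadavg,mdadm", available))
}
```

With this approach the default flag value stays a single Linux-oriented list, and each platform silently drops what it cannot provide.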
Travis Truman 78cc741277 Closes #100 by removing support for HTTP basic auth 2015-11-05 09:20:01 -05:00
Nick Owens eb79937340 switch to github.com/prometheus/common/log for logging 2015-10-30 13:20:06 -07:00
Tobias Schmidt 7e2b65f942 Clean up lint errors 2015-10-16 18:53:44 -04:00
Matthias Rampke 2d0d72b97d Add license headers to all code files. 2015-09-26 17:44:39 +02:00
Jonas Große Sundrup 9f2aa24e12 Add collector for metrics of linux software raids 2015-09-11 18:36:39 +02:00
Julius Volz 7b39ccc144 Add Linux uname collector.
This creates a single metric like:

node_uname_info{domainname="(none)",machine="x86_64",nodename="desktop",release="3.16.0-48-generic",sysname="Linux",version="#64~14.04.1-Ubuntu SMP Thu Aug 20 23:03:57 UTC 2015"} 1
2015-09-11 14:32:18 +02:00
Ken Herner 7569c6ce23 Initial implementation of file-nr
Fixed file-nr update function

Fixed file-nr test case

Fixed file-nr test case again

Fixed file-nr separator to tab

Updated file-nr to filenr.

Updated file-nr to filenr.

Fixed file-nr test cases, added comments

Remove reporting the second value from file-nr as it will always be zero in Linux 2.6 and greater

Renaming file-nr to filefd

Updated build constraint

Updates and code cleanup for filefd.

Updated enabledCollectors with the correct name for filefd

Fixed filefd test wording
2015-09-10 10:27:58 -04:00
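
Parsing file-nr as described in 7569c6ce23 (tab separator, second value ignored) might look like the following. The function name is hypothetical and the three-field layout is assumed from `/proc/sys/fs/file-nr`: allocated handles, an unused value, and the maximum.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFileNR extracts the allocated and maximum file handle counts.
// The middle field is skipped, as the commit message says it is always
// zero on Linux 2.6 and later.
func parseFileNR(content string) (allocated, maximum uint64, err error) {
	fields := strings.Split(strings.TrimSpace(content), "\t")
	if len(fields) != 3 {
		return 0, 0, fmt.Errorf("expected 3 tab-separated fields, got %d", len(fields))
	}
	if allocated, err = strconv.ParseUint(fields[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if maximum, err = strconv.ParseUint(fields[2], 10, 64); err != nil {
		return 0, 0, err
	}
	return allocated, maximum, nil
}

func main() {
	allocated, maximum, err := parseFileNR("1024\t0\t1631329\n")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("allocated:", allocated, "maximum:", maximum)
}
```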
Ken Herner 356e1bb866 Added sockstat test file
initial work on sockstat work

Fixed package name

Finished implementation of the sockstat plugin

missed a return value

Added sockstat to default plugins to start

Fixed scanner read on sockstat

fixed sockstat linux test for TCP alloc

update sockstat test case

Updated sockstat to return TCP and UDP memory in bytes instead of page count
2015-09-09 10:48:17 -04:00
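
The last change in 356e1bb866, reporting TCP and UDP memory in bytes instead of pages, amounts to multiplying the `mem` value by the page size. A sketch under assumptions: the line layout is taken from typical `/proc/net/sockstat` output, and the function name and the `mem_bytes` key are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSockstatLine splits a line such as
// "TCP: inuse 3 orphan 0 tw 4 alloc 17 mem 2" into protocol and values.
// A "mem" value (a page count) also yields "mem_bytes" = mem * pageSize.
func parseSockstatLine(line string, pageSize int) (string, map[string]int, error) {
	fields := strings.Fields(line)
	// Expect a protocol label followed by one or more key/value pairs.
	if len(fields) < 3 || len(fields)%2 != 1 {
		return "", nil, fmt.Errorf("malformed sockstat line: %q", line)
	}
	proto := strings.TrimSuffix(fields[0], ":")
	values := make(map[string]int)
	for i := 1; i < len(fields); i += 2 {
		v, err := strconv.Atoi(fields[i+1])
		if err != nil {
			return "", nil, err
		}
		values[fields[i]] = v
		if fields[i] == "mem" {
			values["mem_bytes"] = v * pageSize
		}
	}
	return proto, values, nil
}

func main() {
	proto, values, err := parseSockstatLine("TCP: inuse 3 orphan 0 tw 4 alloc 17 mem 2", os.Getpagesize())
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(proto, values)
}
```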
Julius Volz e606744068 Make logging of collector executions less verbose.
This fixes https://github.com/prometheus/node_exporter/issues/86
2015-06-22 13:32:31 +02:00
Julius Volz e65bc868fc Switch logging from glog to github.com/prometheus/log. 2015-05-28 21:34:02 +02:00
Johannes 'fish' Ziemke 665b05eedc Use flags instead of config and remove attributes 2015-05-21 11:36:56 +02:00
Matthias Rampke 57a6701dc9 Sort collector names.
This makes the `-collectors.print` output easier to read.
2015-05-12 15:04:08 +00:00
Tobias Schmidt 626900fe21 Log version at startup 2015-04-16 00:02:08 -04:00
Julius Volz a730cff002 Remove memprofile file, add pprof HTTP endpoint instead. 2015-03-05 17:08:59 +01:00
Julius Volz 0283fc9ab8 Make flag names consistent across projects. 2015-02-09 00:23:56 +01:00
Brian Brazil 352cde6d20 Add text file exporter
This allows static metrics (e.g. an attributes collector replacement),
and cronjobs to expose stats by echoing into a file.

For example:

echo "my_metric 123" > mycronjob.prom.$$
mv mycronjob.prom.$$ mycronjob.prom
2015-01-25 16:25:25 +00:00
Brian Brazil b50da59ee6 Add simple home page to node exporter. 2015-01-09 14:36:49 +00:00
Johannes 'fish' Ziemke da28c460c8 Make config optional 2014-12-18 12:25:02 +01:00