Commit graph

1069 commits

Author SHA1 Message Date
Sami Kerola 3762191e66 Add timex collector (#664)
This collector is based on the adjtimex(2) system call.  The collector returns
three values: whether time is synchronised, the offset to the remote
reference, and the local clock frequency adjustment.

Values are taken from the kernel timekeeping data structures to avoid
depending on how the synchronisation is implemented.  By that I mean one
should not care whether time is updated using ntpd, systemd-timesyncd, ptpd,
and so on.  Since every time sync implementation ends up telling the kernel
the status of the clock, one can simply skip the software in between and look
at the results of the syncing.  As a positive side effect this makes the
collector very quick and conceptually specific: it does not monitor
availability of the NTP server, the network in between, DNS resolution, or
other unrelated but necessary things.

The minimum set of values to keep an eye on are the following three:

    node_timex_sync_status tells whether the local clock is in sync with a
    remote clock.  The value is set to zero when synchronisation to a
    reliable server is lost, or the time sync software is misconfigured.

    node_timex_offset_seconds tells how far off the local clock is compared
    to the reference.  In case of multiple time references this value is the
    outcome of the RFC 5905 adjustment algorithm.  Ideally the offset should
    be close to zero; how large a value is acceptable depends on the use
    case.  For example, a typical web server is probably fine if the offset
    is about 0.1 seconds or less, but that would not be good enough for a
    mobile phone base station operator.

    node_timex_freq tells the amount of adjustment to the local clock tick
    frequency.  For example, if the offset is one second and growing, the
    local clock will need an instruction to tick quicker.  The numeric value
    itself is not very important, and occasional small adjustments are fine.
    When the frequency is unusually unstable one can assume time stamps will
    not be accurate very far into the sub-second range.  Explaining why the
    local clock frequency behaves like a passenger on a roller coaster is
    obviously a different matter.  Explanations can vary from system load to
    environmental issues such as a machine being physically too hot.

The rest of the measurements can help when debugging.  If you run a clock
server you probably want to collect and keep track of everything.

Pull-request: https://github.com/prometheus/node_exporter/pull/664
2017-09-19 07:54:06 -07:00
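The three values described in the commit above can be derived from the fields that adjtimex(2) fills in. The following is a minimal sketch, not the collector's actual code: the `timexSample` type and the helper names are hypothetical, and the conversion to parts per million is an illustrative choice. The constants `STA_NANO` (0x2000) and `TIME_ERROR` (5) and the 2^-16 ppm scaling of the frequency field come from the adjtimex(2) manual page. On Linux the sample could be filled from `syscall.Adjtimex`; the sketch keeps the conversions pure so it runs anywhere.

```go
package main

import "fmt"

// Constants documented in adjtimex(2) on Linux.
const (
	staNano   = 0x2000 // STA_NANO: offset is reported in nanoseconds
	timeError = 5      // TIME_ERROR: clock is not synchronised
)

// timexSample is a hypothetical holder for the fields of interest. On Linux
// they could be copied from syscall.Timex after calling syscall.Adjtimex.
type timexSample struct {
	State  int   // return value of adjtimex(2)
	Status int32 // status bits (STA_*)
	Offset int64 // microseconds, or nanoseconds when STA_NANO is set
	Freq   int64 // frequency adjustment in units of 2^-16 ppm
}

// syncStatus returns 1 when the kernel considers the clock synchronised.
func syncStatus(s timexSample) float64 {
	if s.State == timeError {
		return 0
	}
	return 1
}

// offsetSeconds converts the kernel offset to seconds.
func offsetSeconds(s timexSample) float64 {
	divisor := 1e6
	if s.Status&staNano != 0 {
		divisor = 1e9
	}
	return float64(s.Offset) / divisor
}

// freqPPM converts the scaled frequency value to parts per million.
func freqPPM(s timexSample) float64 {
	return float64(s.Freq) / 65536
}

func main() {
	s := timexSample{State: 0, Status: staNano, Offset: 250000, Freq: 131072}
	fmt.Println(syncStatus(s), offsetSeconds(s), freqPPM(s))
}
```

Keeping the unit handling in small functions makes the STA_NANO case easy to test without touching the kernel.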
Leonid Evdokimov c169b4b1c5 Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check (#655)
* Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check

1. Checking the local clock against a remote NTP daemon is a bad idea: a
local ntpd acting as a client does it better and avoids excessive load on
the remote NTP server, so the collector is refactored to query the local
NTP server.

2. Checking the local clock against a remote one does not check the local
ntpd itself. The local ntpd may be down or out of sync due to network
issues while the clock is still OK.

3. Checking an NTP server by the sanity of its response is tricky and
depends on the ntpd implementation, which is why a common `node_ntp_sanity`
variable is exported.

* `govendor add golang.org/x/net/ipv4`, it is a dependency of github.com/beevik/ntp

* Update github.com/beevik/ntp to include boring SNTP fix

* Use variable name from RFC5905

* ntp: move code to make export of raw metrics more explicit

* Move NTP math to `github.com/beevik/ntp`

* Make `golint` happy

* Add some brief docs explaining `ntp` #655 and `timex` #664 modules

* ntp: drop XXX comment that got its decision

* ntp: add `_seconds` suffix to relevant metrics

* Better `node_ntp_leap` comment

* s/node_ntp_reftime/node_ntp_reference_timestamp_seconds/ as requested by @discordianfish

* Extract subsystem name to const as suggested by @SuperQ
2017-09-19 10:36:14 +02:00
Karsten Weiss b0d5c00832 cpu: Metric 'package_throttles_total' is per package. (#657)
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per CPU. This also reduces
the total number of cpu time series a lot (especially for multi-core CPUs).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Alexey Palazhchenko abb58a31e2 Test with Go 1.9.x (#667) 2017-08-31 18:00:55 +02:00
Matt Bostock 89a2f21f45 Always try to return smartmon_device_info metric (#663)
* Always try to return smartmon_device_info metric

Sometimes the 'model family' field is not returned by `smartctl` because
a disk is not in the disk database for the version of smartmontools
installed on the system.

In those cases, the device model and serial number are still returned (at
least as far as I have observed).

Re-work the logic to prefer the 'vendor' field first, and if not
present, always output a `smartmon_device_info` metric even if some
labels have empty values.

On the box I'm testing this on, where previously no metric was returned,
it now returns:

    # HELP smartmon_device_info SMART metric device_info
    # TYPE smartmon_device_info gauge
    smartmon_device_info{disk="/dev/sda",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdb",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdc",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdd",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sde",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdf",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1

* Add trailing newline

Because POSIX:
https://stackoverflow.com/a/729795
2017-08-31 18:00:42 +02:00
Tobias Schmidt f9a2388c60 Merge pull request #662 from prometheus/bjk/buildkite
Add buildkite status badge.
2017-08-24 12:59:18 +02:00
Ben Kochie 9947f602f3 Add buildkite status badge. 2017-08-24 12:29:34 +02:00
Matthias Rampke d3e3a9c181 Only cross-test 32bit on Linux (#658)
This doesn't work on at least FreeBSD and Darwin. It does work on Linux,
so only try it there.
2017-08-24 09:13:17 +02:00
Christian Will 2ed98fd5a5 define binary name in promu configuration file (#650) 2017-08-22 17:24:07 +02:00
Tobias Schmidt 505275b48c Merge pull request #652 from prometheus/mr/test-32
Automatically cross-test 32bit based on GOARCH
2017-08-22 00:10:04 +02:00
Tobias Schmidt ba6897583b Merge pull request #653 from prometheus/mr/fix-629
Use int64 throughout the ZFS collector.
2017-08-21 22:28:37 +02:00
Matthias Rampke 7420046383 Automatically cross-test 32bit based on GOARCH
Try to determine the corresponding 32bit architecture from the current
GOARCH and run the tests under that architecture. This only works on a
GOOS/GOARCH that can execute binaries for the smaller architecture, such
as running linux/386 binaries under linux/amd64.

I tested that this works under linux/amd64 and darwin/amd64; the rest of
the architectures are guesswork.

While we still only run regular tests on Intel/Linux architectures, this
covers general integer overflow issues like #629.
2017-08-21 17:27:25 +00:00
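The idea in the commit above, deriving the 32-bit test architecture from the current GOARCH, can be sketched as a simple lookup. This is only an illustration in Go: the table below and the `crossTestArch` name are assumptions, and the commit itself does not show how the mapping is implemented.

```go
package main

import (
	"fmt"
	"runtime"
)

// smallerArch maps a 64-bit GOARCH to a 32-bit counterpart.
// The entries are an illustrative assumption, not the project's list.
var smallerArch = map[string]string{
	"amd64":    "386",
	"arm64":    "arm",
	"mips64":   "mips",
	"mips64le": "mipsle",
}

// crossTestArch reports the 32-bit architecture to test under, if any.
func crossTestArch(goarch string) (string, bool) {
	arch, ok := smallerArch[goarch]
	return arch, ok
}

func main() {
	if arch, ok := crossTestArch(runtime.GOARCH); ok {
		fmt.Printf("would run: GOARCH=%s go test ./...\n", arch)
	} else {
		fmt.Println("no 32-bit counterpart known for", runtime.GOARCH)
	}
}
```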
Matthias Rampke 5aa6819eb1 gofmt node_exporter_test 2017-08-21 16:45:42 +00:00
Matthias Rampke e1f129c729 Use int64 throughout the ZFS collector.
This avoids issues with integer overflows on 32-bit architectures. The
Prometheus data format is float64, so regardless of the architecture we
should handle large numbers.

Fixes #629.
2017-08-21 16:40:16 +00:00
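The overflow described in the commit above can be reproduced on any machine by asking `strconv.ParseInt` for a 32-bit result. The sketch below is not the ZFS collector's code and `parseStat` is a hypothetical helper. It shows the general approach the commit describes: parse into int64 regardless of the platform word size, then convert to the float64 that Prometheus uses.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseStat parses a counter as int64 on every architecture and converts
// it to float64, the Prometheus sample type.
func parseStat(s string) (float64, error) {
	v, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(v), nil
}

func main() {
	const raw = "8589934592" // 2^33, too large for a 32-bit int

	// A bit size of 32 mimics what a plain int does on 32-bit platforms.
	_, err := strconv.ParseInt(raw, 10, 32)
	fmt.Println("32-bit parse error:", err)

	v, err := parseStat(raw)
	fmt.Println(v, err)
}
```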
Matthias Rampke 8661bbbb42 Merge pull request #651 from TheTincho/fix_integration_test_timing
Fix path and timing issues with integration tests.
2017-08-19 15:12:42 +02:00
Martín Ferrari 2cd49eb020 Fix path and timing issues with integration tests. 2017-08-19 11:37:57 +02:00
Ben Kochie 8839640cd1 Ignore wifi collector permission errors (#646)
Ignore the permission denied error when the wifi collector has no
permission to read metrics.
2017-08-18 10:19:48 +02:00
Ben Kochie b7cc6fbea7 Add additional field to github issue template. (#645)
* Add additional field to github issue template.

Request the command line flags passed to the exporter.

* Update version flag for kingpin.
2017-08-17 12:44:26 +02:00
Hemant Kumar de08e38c5e Add dockerfile for ppc64le (#638)
* Add dockerfile for ppc64le and related changes

* Pass the full file as DOCKERFILE

* Add the dockerfile name to build msg
2017-08-17 11:53:04 +02:00
Joe Handzik 4b011bfe44 Clarify Infiniband collector support (#643)
Tested a DL360 Gen9 box with an Omni-Path adapter in it. The existing InfiniBand collector can provide support for the same metrics on Omni-Path cards as well.

Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
2017-08-16 07:32:54 +02:00
Calle Pettersson dfe07eaae8 Switch to kingpin flags (#639)
* Switch to kingpin flags

* Fix logrus vendoring

* Fix flags in main tests

* Fix vendoring versions
2017-08-12 15:07:24 +02:00
Vojtech Galda 1467d845fb Status information in /proc/drbd (#630)
is deprecated in version 8.4 (but won’t be removed)
2017-08-02 08:04:13 +02:00
Matthias Rampke 6506513be5 Merge pull request #626 from teohhanhui/patch-1
Fix Docker mountpoint prefix docs
2017-07-28 09:32:19 +02:00
Teoh Han Hui 0b1f64bb15 Fix Docker mountpoint prefix docs 2017-07-28 15:06:28 +08:00
Ben Kochie 46c31d8a7e Enable IPVS collector by default (#623)
* Silence error output when no IPVS present.
* Enable by default.
* Update end-to-end fixture.
* Update README.
2017-07-26 15:20:28 +02:00
Tobias Schmidt efe5f62717 Merge pull request #620 from prometheus/grobie/fix-meminfo-collector
Restrict build tags of collectors to supported operating systems
2017-07-20 15:25:47 -04:00
Tobias Schmidt 515b5a933d Fix build tags of loadavg collector
The collector is only implemented for a subset of all operating systems
supported by Go. Compilation will fail if attempted for another OS
target.
2017-07-20 15:13:58 -04:00
Tobias Schmidt 016d79535d Fix build tags of meminfo collector
The meminfo collector only supports darwin, dragonfly, freebsd and linux
and must not be included in other architectures.
2017-07-20 14:37:10 -04:00
Tobias Schmidt efc1ea14ba Ignore extracted sysfs fixture files from git 2017-07-20 14:36:48 -04:00
Andrea De Pasquale 1369763067 Change raid0 status line regexp for mdadm collector (#619) 2017-07-20 17:04:33 +02:00
Ben Kochie 971de21945 Minor tweak to GitHub issue template. 2017-07-20 10:57:07 +02:00
Tobias Schmidt 921319c7eb Merge pull request #583 from knweiss/golint
Golint fixes
2017-07-10 23:49:36 +02:00
Aleksey Zhukov 7a914e58f2 Add parsing /proc/net/snmp6 file for netstat-linux (#615)
* Add parsing /proc/net/snmp6 file

* add /proc/net/snmp6 fixture

* fix e2e test

* gofmt

* remove unused variable

* safe checks

* add tests

* change help format
2017-07-08 20:16:35 +02:00
Jerome Froelich cb14fff6c6 [test] Call cmd.Start and cmd.Wait separately to avoid triggering race detector (#616)
* [test] Call cmd.Start and cmd.Wait separately to avoid triggering race detector

* [test] Enable race detector for tests
2017-07-08 20:15:40 +02:00
Matt Layher 6e82fd1c56 Add XFS block mapping and block map B-tree stats (#575) 2017-07-07 07:27:52 +02:00
fahlke a89d72b5eb Resolves prometheus/node_exporter#585 (#586)
* Resolves prometheus/node_exporter#585

* - removed 'docker rm' as it is not allowed on CircleCI
See discussion: https://discuss.circleci.com/t/docker-error-removing-intermediate-container/70
2017-07-07 07:26:11 +02:00
ideaship 8d90276283 Add bcache collector (#597)
* Add bcache collector for Linux

This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.

* Removed commented out code

* Use project comment style

* Add _sectors to metric name to indicate unit

* Really use project comment style

* Rename bcache.go to bcache_linux.go

* Keep collector namespace clean

Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric

* Shorten slice initialization

* Change label names to backing_device, cache_device

* Remove five minute metrics (keep only total)

* Include units in additional metric names

* Enable bcache collector by default

* Provide metrics in seconds, not nanoseconds

* remove metrics with label "all"

* Add fixtures, update end-to-end for bcache collector

* Move fixtures/sys into tar.gz

This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.

The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not to turn up in regular
file names).

* Add ttar: plain text archive, replacement for tar

This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).

Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.

The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.

The program also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.
2017-07-07 07:20:18 +02:00
Alexey Palazhchenko bba075710d Set Go import path on Travis CI (#612) 2017-07-06 14:12:22 +02:00
kadota kyohei a077024f51 add diskstats on Darwin (#593)
* Add diskstats collector for Darwin

* Update year in the header

* Update README.md

* Add github.com/lufia/iostat to vendored packages

* Change stats to follow naming guidelines

* Add an entry of github.com/lufia/iostat into vendor.json

* Remove /proc/diskstats from description
2017-07-06 13:51:24 +02:00
Tobias Schmidt ab3414e6fd Merge pull request #614 from percona/prometheus-master
Log response body if test fails.
2017-07-05 11:45:51 +02:00
Alexey Palazhchenko 4d294889da Log response body if test fails. 2017-07-05 10:51:27 +03:00
Alexey Palazhchenko 190d1347ba Use latest released Go 1.8 (#611) 2017-06-30 18:33:33 +02:00
Tobias Schmidt e27a1a1f0b Merge pull request #609 from rtreffer/fix-cpufreq-data
Fix cpufreq statistics by converting kHz to Hz
2017-06-27 23:51:14 +02:00
Rene Treffer 56bf8d4b2d Add link to kernel documentation for sysfs/cpufreq files 2017-06-27 11:25:06 +02:00
Rene Treffer bcc3cd92b8 Fix cpufreq statistics by converting kHz to Hz 2017-06-27 11:05:55 +02:00
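The unit fix named in the commit above amounts to scaling the value read from sysfs by 1000 before exporting it. A minimal sketch follows, assuming the raw value arrives as a decimal string in kHz; `khzToHz` is a hypothetical helper, not the collector's function, and the file name in the comment is only an example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// khzToHz parses a cpufreq value given in kHz and returns it in Hz.
func khzToHz(raw string) (float64, error) {
	khz, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(khz) * 1000.0, nil
}

func main() {
	// Example content as it might be read from a scaling_cur_freq file.
	hz, err := khzToHz("2400000\n")
	fmt.Println(hz, err)
}
```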
Ben Kochie 182810056f Fix Linux cpu errors (#606)
Make the Linux cpu collector soft-error on missing `cpufreq` and
`thermal_throttle` features.
2017-06-20 07:51:26 +02:00
Rene Treffer be6291adde Add changelog entry for node_cpu metrics move (#603) 2017-06-19 10:28:53 +02:00
Ben Kochie f3a4afc059 Add go path help to build instructions (#601)
Add `go get` and source directory requirements to the build
instructions.
2017-06-15 09:32:45 +02:00
Rene Treffer 2e9f1913b8 Move stat_linux to cpu_linux and add cpufreq stats (#548) 2017-06-13 11:21:53 +02:00
Johannes 'fish' Ziemke 798950d25b Run node-exporter in Docker as nobody (#599) 2017-06-08 20:02:20 +02:00