* Move NodeCollector into package collector
* Refactor collector enabling
* Update README with new collector enabled flags
* Fix out-of-date inline flag reference syntax
* Use new flags in end-to-end tests
* Add flag to disable all default collectors
* Track if a flag has been set explicitly
* Add --collectors.disable-defaults to README
* Revert disable-defaults flag
* Shorten flags
* Fixup timex collector registration
* Fix end-to-end tests
* Change procfs and sysfs path flags
* Fix review comments
This collector is based on the adjtimex(2) system call. The collector
returns three values: a status telling whether time is synchronised, the
offset to the remote reference, and the local clock frequency adjustment.
Values are taken from the kernel's timekeeping data structures so that the
collector does not depend on how the synchronisation is implemented. In
other words, one should not care whether time is updated using ntpd,
systemd-timesyncd, ptpd, and so on. Since every time sync implementation
ends up telling the kernel the status of the clock, one can skip the
software in between and look directly at the results of the syncing. As a
positive side effect this makes the collector very quick and conceptually
specific: it does not monitor the availability of the NTP server, the
network in between, DNS resolution, or other unrelated but necessary
things.
The minimum set of values to keep an eye on are the following three:
The node_timex_sync_status tells whether the local clock is in sync with a
remote clock. The value is set to zero when synchronisation to a reliable
server is lost, or the time sync software is misconfigured.
The node_timex_offset_seconds tells how far off the local clock is when
compared to the reference. In case of multiple time references this value
is the outcome of the RFC 5905 adjustment algorithm. Ideally the offset
should be close to zero, and how large a value is acceptable depends on
the use case. For example a typical web server is probably fine if the
offset is about 0.1 seconds or less, but that would not be good enough for
a mobile phone base station operator.
The node_timex_freq tells the amount of adjustment applied to the local
clock tick frequency. For example if the offset is one second and growing,
the local clock will need an instruction to tick quicker. The numeric
value itself is not very important, and occasional small adjustments are
fine. When the frequency is unusually unstable one can assume that time
stamps will not be accurate very far into the sub-second range. Obviously,
explaining why the local clock frequency behaves like a passenger on a
roller coaster is a different matter. Explanations can vary from system
load to environmental issues such as a machine being physically too hot.
The rest of the measurements can help when debugging. If you run a clock
server you probably want to collect and keep track of everything.
Pull-request: https://github.com/prometheus/node_exporter/pull/664
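The three timex values described above can be derived from the raw adjtimex(2) result roughly as follows. This is a minimal sketch, not the collector's actual code: the `rawTimex` struct and the helper names are hypothetical stand-ins, and the constants are assumed to match the kernel's TIME_ERROR and STA_NANO definitions. A real collector would fill the struct from the system call instead of literals.

```go
package main

import "fmt"

const (
	timeError   = 5       // TIME_ERROR: adjtimex(2) return value for "clock not synchronised"
	staNano     = 0x2000  // STA_NANO: status bit, offset is in nanoseconds instead of microseconds
	ppmFraction = 65536.0 // timex.freq is in ppm with a 16-bit fractional part
)

// rawTimex is a hypothetical stand-in for the adjtimex(2) fields used here.
type rawTimex struct {
	State  int   // return value of adjtimex(2)
	Status int32 // timex.status bit field
	Offset int64 // timex.offset, microseconds or nanoseconds
	Freq   int64 // timex.freq, scaled ppm
}

// syncStatus returns 1 when the kernel considers the clock synchronised, else 0.
func syncStatus(t rawTimex) float64 {
	if t.State == timeError {
		return 0
	}
	return 1
}

// offsetSeconds converts the raw offset to seconds, honouring STA_NANO.
func offsetSeconds(t rawTimex) float64 {
	divisor := 1e6 // microseconds by default
	if t.Status&staNano != 0 {
		divisor = 1e9 // nanoseconds when STA_NANO is set
	}
	return float64(t.Offset) / divisor
}

// freqPPM converts the scaled frequency adjustment to parts per million.
func freqPPM(t rawTimex) float64 {
	return float64(t.Freq) / ppmFraction
}

func main() {
	// Illustrative values only, not taken from a real machine.
	t := rawTimex{State: 0, Status: staNano, Offset: 1500000, Freq: 131072}
	fmt.Printf("sync=%v offset=%vs freq=%vppm\n", syncStatus(t), offsetSeconds(t), freqPPM(t))
}
```

Keeping the conversion separate from the system call makes the unit handling easy to test without touching the kernel.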
* Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check
1. Checking the local clock against a remote NTP daemon is a bad idea: a
local ntpd acting as a client does it better and avoids excessive load on
the remote NTP server, so the collector is refactored to query the local
NTP server.
2. Checking the local clock against a remote one does not check the local
ntpd itself. The local ntpd may be down or out of sync due to network
issues while the clock is still OK.
3. Checking an NTP server by the sanity of its response is tricky and
depends on the ntpd implementation, which is why a common
`node_ntp_sanity` variable is exported.
* `govendor add golang.org/x/net/ipv4`, it is a dependency of github.com/beevik/ntp
* Update github.com/beevik/ntp to include boring SNTP fix
* Use variable name from RFC5905
* ntp: move code to make export of raw metrics more explicit
* Move NTP math to `github.com/beevik/ntp`
* Make `golint` happy
* Add some brief docs explaining `ntp` #655 and `timex` #664 modules
* ntp: drop XXX comment that got its decision
* ntp: add `_seconds` suffix to relevant metrics
* Better `node_ntp_leap` comment
* s/node_ntp_reftime/node_ntp_reference_timestamp_seconds/ as requested by @discordianfish
* Extract subsystem name to const as suggested by @SuperQ
Tested a DL360 Gen9 box with an Omni-Path adapter in it. The existing InfiniBand collector can provide support for the same metrics on Omni-Path cards as well.
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
* Add bcache collector for Linux
This collector gathers metrics related to the Linux block cache
(bcache) from sysfs.
* Removed commented out code
* Use project comment style
* Add _sectors to metric name to indicate unit
* Really use project comment style
* Rename bcache.go to bcache_linux.go
* Keep collector namespace clean
Rename:
- metric -> bcacheMetric
- periodStatsToMetrics -> bcachePeriodStatsToMetric
* Shorten slice initialization
* Change label names to backing_device, cache_device
* Remove five minute metrics (keep only total)
* Include units in additional metric names
* Enable bcache collector by default
* Provide metrics in seconds, not nanoseconds
* remove metrics with label "all"
* Add fixtures, update end-to-end for bcache collector
* Move fixtures/sys into tar.gz
This changeset moves the collector/fixtures/sys directory into
collector/fixtures/sys.tar.gz and tweaks the Makefile to unpack the
tarball before tests are run.
The reason for this change is that Windows does not allow colons in a
path (colons are present in some of the bcache fixture files), nor can
it (out of the box) deal with pathnames longer than 260 characters
(which we would be increasingly likely to hit if we tried to replace
colons with longer codes that are guaranteed not to turn up in regular
file names).
* Add ttar: plain text archive, replacement for tar
This changeset adds ttar, a plain text replacement for tar, and uses it
for the sysfs fixture archive. The syntax is loosely based on tar(1).
Using a plain text archive makes it possible to review changes without
downloading and extracting the archive. Also, when working on the repo,
git diff and git log become useful again, allowing a committer to verify
and track changes over time.
The code is written in bash, because bash is available out of the box on
all major flavors of Linux and on macOS. The feature set used is
restricted to bash version 3.2 because that is what Apple is still
shipping.
The program also works on Windows if bash is installed. Obviously, it
does not solve the Windows limitations (path length limited to 260
characters, no symbolic links) that prompted the move to an archive
format in the first place.
* Add diskstats collector for Darwin
* Update year in the header
* Update README.md
* Add github.com/lufia/iostat to vendored packages
* Change stats to follow naming guidelines
* Add an entry for github.com/lufia/iostat to vendor.json
* Remove /proc/diskstats from description
* Add qdisc collector for Linux
This collector gathers basic queueing discipline metrics via netlink,
similarly to what `tc -s qdisc show` does.
* qdisc collector: nl-specific code moved, names fixed
- netlink-specific parts moved to github.com/ema/qdisc
- avoid using shortened names
- counters renamed into XXX_total
* Get rid of parseMessage error checking leftover
* Add github.com/ema/qdisc to vendored packages
* Update help texts and comments
* Add qdisc collector to README file
* qdisc collector end-to-end testing
* Update qdisc dependency to latest version
Update github.com/ema/qdisc dependency to revision 2c7e72d, which
includes unit testing.
* qdisc collector: rename "iface" label into "device"
* Implement commonalities and linux support for ARP collection
* Add ARP collector to fixtures and run as part of e2e tests
* Bubble up scanner errors
* Use single return values where it makes sense
* Add missing annotation
* Move arp_common into arp_linux
* Add license header to arp_linux.go
* Address initial feedback
* Use strings.Fields instead of strings.Split
* Deal with scanner.Err() rather than throwing away errors
* Check for scan errors in-line before interacting with the entries map
* Don't interact with potentially empty text from scan
* Check for scan errors outside the scan loop
* Add comment about moving procfs parsing
* Add more direct comment
* Update initialism style to match go style guide
* Put function args on the same line
* Add TODO in front of comment about procfs extraction
* Guard against strings.Fields returning an empty slice
* Be more defensive about ARP table format and use upcase more broadly
* Enable the ARP collector by default
* Add ARP collector to the README
* Remove 'entry'
This patch makes stylistic changes to error strings, unexports method names by lower casing them, removes unused dataSetMetric, and adds copyright/licence information.
Signed-Off-By: Corey Stewart <stewa169@purdue.edu>
This collector exposes most of the useful information that can be found
in /proc/drbd. Sizes are normalised to be in bytes, as /proc/drbd uses
kibibytes.
This change adds a new collector called "nfs" that parses the contents
of /proc/net/rpc/nfs and turns it into metrics. It can be used to
inspect the number of operations per type, but also to keep an eye on an
excessive number of retransmissions, which may indicate connectivity
issues.
I've picked the name "nfs", as most operating systems use "nfs" for the
client component and "nfsd" as the server component. If we want to add
stats for the NFS server as well, we'd better call such a collector
"nfsd".
* Add hwmon support (mainly known from lm-sensors)
This commit adds initial support for Linux hardware sensors, exported
through sysfs.
Details of the interface can be found at
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
* Add end-to-end test with some real life data
* Cleanup comments on hwmon collector
* Drop raw sensor name from hwmon output
* Let the sensor label be "sensor"
* Add hwmon short description to README.