The Linux CPU idle stat can also jump backwards slightly in some cases.
Allow backward jumps of up to 3 seconds before we attempt to reset the CPU
counter cache.
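A minimal Go sketch of the jump-back tolerance, assuming a per-CPU, per-mode cache of the last exported value. Only the 3-second window comes from the message above; the type `cpuCounterCache` and its `update` method are hypothetical names. A backward step within the window keeps the previous value so the counter never decreases, and a larger drop (e.g. after a CPU hotplug) resets the cached value.

```go
package main

import "fmt"

// jumpBackTolerance is the largest backward step, in seconds, that is
// tolerated before the cached value is reset (taken from the commit text).
const jumpBackTolerance = 3.0

// cpuCounterCache is a hypothetical cache of the last exported value per
// CPU and mode, used to keep the exported counter monotonic.
type cpuCounterCache struct {
	last map[string]float64
}

func newCPUCounterCache() *cpuCounterCache {
	return &cpuCounterCache{last: make(map[string]float64)}
}

// update returns the value to export for key given a fresh reading.
func (c *cpuCounterCache) update(key string, value float64) float64 {
	prev, ok := c.last[key]
	switch {
	case !ok || value >= prev:
		// First sample or normal forward progress.
		c.last[key] = value
		return value
	case prev-value <= jumpBackTolerance:
		// Small backward jitter: keep reporting the previous value.
		return prev
	default:
		// Large drop: assume the counter really restarted and reset the cache.
		c.last[key] = value
		return value
	}
}

func main() {
	c := newCPUCounterCache()
	fmt.Println(c.update("cpu0/idle", 100.0)) // 100
	fmt.Println(c.update("cpu0/idle", 98.5))  // 100 (1.5s back, tolerated)
	fmt.Println(c.update("cpu0/idle", 101.0)) // 101
	fmt.Println(c.update("cpu0/idle", 10.0))  // 10 (91s back, cache reset)
}
```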
Fixes: https://github.com/prometheus/node_exporter/issues/1903
Signed-off-by: Ben Kochie <superq@gmail.com>
Fix the error logging of the promhttp handler by connecting it to the
promlog setup.
* Switch to go-kit/log.
* Clean up CHANGELOG.
Fixes: https://github.com/prometheus/node_exporter/issues/1886
Signed-off-by: Ben Kochie <superq@gmail.com>
We've gathered enough evidence that the CPU counter bug workaround is
working as intended. Downgrade the message from Warning to Debug.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.
Signed-off-by: domgoer <domdoumc@gmail.com>
The `node_cpu_info` metric contains some information, like the `model`
(which is an integer), but not the human-readable model name. The
stepping of the processor is also of interest, since different steppings
of a processor might behave differently.
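A minimal Go sketch of where the extra values come from, assuming x86-style `/proc/cpuinfo` text with one block per `processor`. The `cpuInfo` type and `parseCPUInfo` function are hypothetical; the collector would attach `ModelName` and `Stepping` as additional `node_cpu_info` labels.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cpuInfo holds the subset of /proc/cpuinfo fields discussed above.
type cpuInfo struct {
	Processor string
	Model     string // numeric model, e.g. "79"
	ModelName string // human-readable name
	Stepping  string
}

// parseCPUInfo splits /proc/cpuinfo text into one entry per "processor"
// block and keeps only the fields of cpuInfo.
func parseCPUInfo(text string) []cpuInfo {
	var infos []cpuInfo
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // blank line between processor blocks
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		if key == "processor" {
			infos = append(infos, cpuInfo{Processor: value})
			continue
		}
		if len(infos) == 0 {
			continue // ignore fields before the first processor line
		}
		cur := &infos[len(infos)-1]
		switch key {
		case "model":
			cur.Model = value
		case "model name":
			cur.ModelName = value
		case "stepping":
			cur.Stepping = value
		}
	}
	return infos
}

func main() {
	sample := "processor       : 0\n" +
		"model           : 79\n" +
		"model name      : Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz\n" +
		"stepping        : 1\n"
	for _, ci := range parseCPUInfo(sample) {
		fmt.Printf("processor=%s model=%s model_name=%q stepping=%s\n",
			ci.Processor, ci.Model, ci.ModelName, ci.Stepping)
	}
}
```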
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Minor change to match naming convention in other collectors.
Initialize the proc or sys FS instance once while initializing
each collector instead of re-creating for each metric update.
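A minimal Go sketch of the pattern, assuming a hypothetical `procFS` handle that stands in for something like `procfs.NewFS` from github.com/prometheus/procfs. The handle is opened once in the constructor and stored on the collector, so each scrape reuses it instead of creating a new one.

```go
package main

import "fmt"

// procFS is a hypothetical stand-in for a procfs/sysfs handle.
type procFS struct{ root string }

// opens counts created handles, purely to illustrate the change.
var opens int

func newProcFS(root string) (procFS, error) {
	opens++
	return procFS{root: root}, nil
}

// cpuCollector keeps the handle created at construction time.
type cpuCollector struct {
	fs procFS
}

func newCPUCollector(root string) (*cpuCollector, error) {
	fs, err := newProcFS(root)
	if err != nil {
		return nil, fmt.Errorf("failed to open procfs: %w", err)
	}
	return &cpuCollector{fs: fs}, nil
}

// Update reuses c.fs on every scrape; real code would read files below it.
func (c *cpuCollector) Update() string {
	return c.fs.root + "/stat"
}

func main() {
	c, err := newCPUCollector("/proc")
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		c.Update()
	}
	fmt.Println("handles created:", opens) // handles created: 1
}
```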
Signed-off-by: Paul Gier <pgier@redhat.com>
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.
Fixes #1241
Signed-off-by: Paul Gier <pgier@redhat.com>
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.
Signed-off-by: Ben Kochie <superq@gmail.com>
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total
This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them separately.
Rename the node_package_throttles_total metric label `node` to `package`.
Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Use the direct /sys path to the cpu files.
Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.
In addition, the latter path does not exist on some kernels, e.g. RHEL 6.9's.
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Reverse core+package throttle processing order
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* cpu: Add documentation URLs
Signed-off-by: Karsten Weiss <knweiss@gmail.com>
* Only report core throttles per core, not per cpu
* Add topology/core_id to the cpu sysfs fixtures
* Add new cpu fixtures to ttar file
* Merge core_id reading and thermal throttle accounting
* Declare core_id
* Unify CPU collector conventions
Add a common CPU metric description.
* All collectors use the same `nodeCpuSecondsDesc`.
* All collectors drop the `cpu` prefix for `cpu` label values.
* Fix subsystem string in cpu_freebsd.
* Fix Linux CPU freq label names.
* Improve stat linux metric names.
cpu is no longer used.
* node_cpu -> node_cpu_seconds_total for Linux
* Improve filesystem metric names with units
* Improve units and names of linux disk stats
Remove sector metrics, the bytes metrics cover those already.
* Infiniband counters should end in _total
* Improve timex metric names, convert to more conventional units.
See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means; it looks like some form of moving average.
* Update test fixture
* For meminfo metrics that had "kB" units, add _bytes
* Interrupts counter should have _total
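A minimal Go sketch of the meminfo unit change, assuming `/proc/meminfo` lines of the form `Name: value [kB]`. Values reported in kB are multiplied by 1024 and the metric gets a `_bytes` suffix, while unitless fields such as `HugePages_Total` keep their plain name. The function `parseMeminfo` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo converts /proc/meminfo text into metric name -> value.
func parseMeminfo(text string) (map[string]float64, error) {
	// Parentheses are not valid in metric names: "Active(anon)" -> "Active_anon".
	clean := strings.NewReplacer("(", "_", ")", "")
	metrics := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		value, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("invalid value in %q: %w", sc.Text(), err)
		}
		name := "node_memory_" + clean.Replace(strings.TrimSuffix(fields[0], ":"))
		if len(fields) == 3 && fields[2] == "kB" {
			// Convert kB to bytes and make the unit explicit in the name.
			value *= 1024
			name += "_bytes"
		}
		metrics[name] = value
	}
	return metrics, sc.Err()
}

func main() {
	sample := "MemTotal:       16314348 kB\nActive(anon):     524288 kB\nHugePages_Total:       0\n"
	m, err := parseMeminfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0f\n", m["node_memory_MemTotal_bytes"]) // 16705892352
}
```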
Linux "guest" metrics for VMs are already accounted for in the node_cpu
`user` and `nice` metrics. Separate these into their own metric to
avoid duplication of data.
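A minimal Go sketch of splitting a `/proc/stat` per-CPU line, assuming the usual field order `user nice system idle iowait irq softirq steal guest guest_nice` and a USER_HZ of 100 (common on Linux, but assumed here rather than queried). The `cpu` prefix is dropped from the label value, and `guest`/`guest_nice` go into a separate metric because the kernel already includes them in `user`/`nice`. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// userHZ is the assumed kernel tick rate for /proc/stat values.
const userHZ = 100.0

var cpuModes = []string{"user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"}

// cpuSeconds holds one "cpuN" line of /proc/stat converted to seconds.
type cpuSeconds struct {
	CPU   string             // label value without the "cpu" prefix, e.g. "0"
	Modes map[string]float64 // node_cpu_seconds_total{mode=...}
	Guest map[string]float64 // node_cpu_guest_seconds_total{mode="user"|"nice"}
}

// parseCPULine parses a per-CPU line; the aggregate "cpu" line would give
// an empty CPU label and should be skipped by the caller.
func parseCPULine(line string) (cpuSeconds, error) {
	fields := strings.Fields(line)
	if len(fields) < 1+len(cpuModes)+2 || !strings.HasPrefix(fields[0], "cpu") {
		return cpuSeconds{}, fmt.Errorf("unexpected cpu line %q", line)
	}
	values := make([]float64, 0, len(fields)-1)
	for _, f := range fields[1:] {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return cpuSeconds{}, err
		}
		values = append(values, v/userHZ)
	}
	s := cpuSeconds{
		CPU:   strings.TrimPrefix(fields[0], "cpu"),
		Modes: make(map[string]float64, len(cpuModes)),
		// guest and guest_nice are already part of user and nice, so they
		// are exported separately instead of as extra modes.
		Guest: map[string]float64{
			"user": values[len(cpuModes)],
			"nice": values[len(cpuModes)+1],
		},
	}
	for i, mode := range cpuModes {
		s.Modes[mode] = values[i]
	}
	return s, nil
}

func main() {
	s, err := parseCPULine("cpu0 1000 20 300 40000 50 0 6 0 200 10")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.CPU, s.Modes["user"], s.Guest["user"]) // 0 10 2
}
```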
* cpu: Support processor-less (memory-only) NUMA nodes
Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).
IMDT RAM expansion supports two modes:
* "Unify Remote Memory domains": present a processor-less (memory-only)
NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
with a portion of the memory made available by Optane and IMDT
This commit fixes a crash in the first case (when "cpulist" is empty).
Here's an example of such a system:
$ numastat -m|head -n5
Per-node system memory usage (in MBs):
                          Node 0          Node 1          Node 2           Total
                 --------------- --------------- --------------- ---------------
MemTotal               118239.56       130816.00       464384.00       713439.56
$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:
$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
Boards: 3
1 x Proc. + I/O + Memory
2 x NVM devices (Intel SSDPED1K375GAQ)
Processors: 2, Cores: 16, Threads: 32
Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
1 x 249088MB [262036/ 678/12270]
1 x 232192MB [357707/125369/ 146] 82:00.0#1
1 x 232192MB [357707/125369/ 146] 83:00.0#1
* cpu: rename some variables (pkg => node)
* cpu: Use %v not %q in log.Debugf() format strings
* Move NodeCollector into package collector
* Refactor collector enabling
* Update README with new collector enabled flags
* Fix out-of-date inline flag reference syntax
* Use new flags in end-to-end tests
* Add flag to disable all default collectors
* Track if a flag has been set explicitly
* Add --collectors.disable-defaults to README
* Revert disable-defaults flag
* Shorten flags
* Fixup timex collector registration
* Fix end-to-end tests
* Change procfs and sysfs path flags
* Fix review comments
* cpu: Metric 'package_throttles_total' is per package.
'package_throttles_total' is per package, not per cpu. This also significantly
reduces the total number of cpu time series (especially on multi-core CPUs).
* cpu: Better handling of a cpulist edge-case.
* cpu: Extract the package number from the directory name.
Do not rely on the range index.
* cpu: Add package_throttle_count for node0 cpu1
This file must be ignored by the cpu collector.