51 commits

Author SHA1 Message Date
Ben Kochie 090957658e
Update logging (#3097)
Switch from promlog/go-kit to promslog/slog for logging.
* Update Go build to 1.23.

Signed-off-by: Ben Kochie <superq@gmail.com>
2024-09-11 10:51:28 +02:00
Ben Kochie acb36765b4
Update build (#3000)
* Update Go to 1.22.
* Update Go modules.
* Use new version collector.
* Use standard library slices package.

Signed-off-by: Ben Kochie <superq@gmail.com>
2024-04-20 12:32:49 +02:00
Simon Pasquier 12f1744e79
Fix debug log in cpu collector (#2857)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2023-11-24 16:37:27 +01:00
John Kordich e120d958f5 Change log message from Warn to Debug
Signed-off-by: John Kordich <jkordich@gmail.com>

Co-authored-by: Ben Kochie <superq@gmail.com>
Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich 933b1c1797 Add new node_cpu_frequency_hertz metric
Revert changes to node_cpu_info and add new node_cpu_frequency_hertz
metric for measuring CPU frequency from /proc/cpuinfo

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
John Kordich 223ebbd50c Add CPU MHz as the value for "node_cpu_info" metric
For CPUs which don't have an available (or insertable) cpufreq driver,
the /proc/cpuinfo file can sometimes have accurate CPU core frequency
measurements. This change replaces the constant value of "1" for the
"node_cpu_info" metric with the parsed CPU MHz value from
/proc/cpuinfo for each core.

Signed-off-by: John Kordich <jkordich@gmail.com>
2023-08-20 13:38:47 +02:00
Ben Kochie c23b76bfbb
Update exporter-toolkit
* Bump exporter-toolkit to the latest release.
* Use new toolkit landing page function.
* Update kingpin flags.

Signed-off-by: Ben Kochie <superq@gmail.com>
2023-03-07 15:18:38 +01:00
Haoyu Sun 37d49746bc Remove metrics of offline CPUs in CPU collector
Signed-off-by: Haoyu Sun <hasun@redhat.com>
2023-03-07 14:01:02 +01:00
Jia Xin 39b4556b5b fix cpustat when some cpus are offline
Signed-off-by: Jia Xin <alexjx@gmail.com>
2023-01-20 01:24:06 +00:00
david c2085cf8ca flip branches for early return
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david 75c05f3d97 remove error from signature; update doc for function
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david 840d32622f check for nil isolatedCpus before calling updateIsolated
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david 5340d1ec37 add debug log for not existent file
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david c05af934af warn if isolcpus cannot be read and default to an empty slice
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david 9ea9a5f029 only publish metrics for isolated cpus
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david 5d68d5b9ad move logic to procfs; create a new metric for isolation
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
david 512e086dec Implement #2250: Add "isolated" label on cpu collector on linux
Signed-off-by: david <davidventura27@gmail.com>
2022-07-26 11:21:08 +02:00
Park Beomsu c861ba93aa
Remove redundant nil check (#2206)
Signed-off-by: computerphilosopher <bspark@jam2in.com>
2021-11-15 11:23:49 +01:00
Julien Pivotto 68a6c78c0d
Update go to 1.17 (#2159)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-03 13:35:24 +02:00
Sergei Semenchuk 5de46c6bac
collect flag_info and bug_info only for one core (#2156)
Signed-off-by: binjip978 <binjip978@gmail.com>
2021-09-28 07:44:03 +02:00
Ben Kochie 84b36c4fd8
Add flag to disable guest CPU metrics
In high scale virtualized / cloud environments there are typically
no guest VMs. Add a boolean flag to allow disabling the Linux guest
CPU metrics.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-08-17 13:04:46 +02:00
Ben Kochie 73c9a10d37
Handle small backwards jumps in CPU idle
The Linux CPU idle stat can also jump backwards slightly in some cases.
Allow the jump back up to 3 seconds before we attempt to reset the CPU
counter cache.

Fixes: https://github.com/prometheus/node_exporter/issues/1903

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-07-07 12:24:46 +02:00
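
The tolerance described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: the helper name updateIdle and its return values are hypothetical, and only the 3-second threshold comes from the commit message.

```go
package main

import "fmt"

// jumpBackSeconds is the tolerance named in the commit message.
const jumpBackSeconds = 3.0

// updateIdle is a hypothetical helper. It returns the idle value to keep
// in the cache and whether the cached counters should be reset.
func updateIdle(cached, observed float64) (next float64, reset bool) {
	// A backwards jump of 3 seconds or more: treat the cache as stale.
	if cached-observed >= jumpBackSeconds {
		return observed, true
	}
	// A smaller backwards jump: keep the cached value so the counter
	// never appears to decrease.
	if observed < cached {
		return cached, false
	}
	// Normal case: the counter moved forward (or stayed put).
	return observed, false
}

func main() {
	fmt.Println(updateIdle(100.0, 99.5))  // small jump back
	fmt.Println(updateIdle(100.0, 90.0))  // large jump back
	fmt.Println(updateIdle(100.0, 101.0)) // forward progress
}
```

Only a jump at or beyond the threshold discards the cached state; smaller regressions are absorbed by holding the previous value.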
Ben Kochie 3bc9a93c20
Add ErrorLog plumbing to promhttp
Fix the error logging of the promhttp handler by connecting it to the
promlog setup.
* Switch to go-kit/log.
* Cleanup CHANGELOG.

Fixes: https://github.com/prometheus/node_exporter/issues/1886

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-06-03 10:47:41 +02:00
Ben Kochie 306a365377 Downgrade CPU counter warnings
We've gathered enough evidence that the CPU counter bug workaround is
working as intended. Downgrade the message from Warning to Debug.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-10-01 12:41:15 +02:00
Julius Volz d05aac43e4 Fix capitalization of CPU acronym throughout
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2020-09-03 23:34:33 +02:00
domchan 503e4fc848
Expose cpu bugs and flags as info metrics. (#1788)
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.

Signed-off-by: domgoer <domdoumc@gmail.com>
2020-07-17 18:32:23 +02:00
Ben Kochie 3565316d7e
Linux CPU: Cache CPU metrics
Cache CPU metrics to avoid counters (e.g. iowait) jumping backwards.

Fixes: https://github.com/prometheus/node_exporter/issues/1686

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-05-24 16:31:26 +02:00
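
One way to picture the caching idea in the commit above is a per-CPU cache that only ever moves forward. The sketch below is an assumption about the general technique, not the collector's implementation; the type counterCache and its observe method are hypothetical names.

```go
package main

import "fmt"

// counterCache is a hypothetical type that remembers the highest value
// seen per CPU so that an exported counter never decreases.
type counterCache map[string]float64

// observe records a new reading for cpu and returns the value to export:
// the new reading if it is higher than the cached one, else the cached one.
func (c counterCache) observe(cpu string, v float64) float64 {
	if v > c[cpu] {
		c[cpu] = v
	}
	return c[cpu]
}

func main() {
	cache := counterCache{}
	// The third reading dips below the second and is therefore ignored.
	for _, v := range []float64{10.0, 10.4, 10.3, 10.9} {
		fmt.Println(cache.observe("0", v))
	}
}
```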
Benjamin Drung 34d50e15d5 Add model_name and stepping to node_cpu_info metric
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different stepping
of a processor might behave differently.

Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
2020-03-20 17:27:11 +01:00
Julian Kornberger cfcaeee145
Use strconv.Itoa() instead of fmt.Sprintf() (#1566)
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
2020-02-19 14:34:05 +01:00
Ben Ye 2477c5c67d switch to go-kit/log (#1575)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-12-31 17:19:37 +01:00
Julian Kornberger 043fecbfd8 Wrap errors in the Go 1.13 way
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
2019-12-19 15:26:55 +01:00
Paul Gier 4d72cb8059 add node_cpu_info metric
Contains information gathered from /proc/cpuinfo

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-09-25 14:38:57 -05:00
Paul Gier 2bc133cd48 update procfs to v0.0.2 (#1376)
Signed-off-by: Paul Gier <pgier@redhat.com>
2019-06-12 20:47:16 +02:00
Paul Gier b1298677aa Early init of procfs (#1315)
Minor change to match naming convention in other collectors.

Initialize the proc or sys FS instance once while initializing
each collector instead of re-creating for each metric update.

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-04-10 18:16:12 +02:00
Paul Gier cc847f2f44 collector/cpu: split cpu freq metrics into separate collector (#1253)
The cpu frequency information is not always needed and/or available.
This change allows the cpu frequency metrics to be enabled/disabled
separately from the other cpu metrics, and also prevents a frequency
metric failure (such as a parse error) from failing the main cpu
collector.

Fixes #1241

Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-19 17:22:54 +01:00
mknapphrt 7fbdd0ae93 Update procfs vendor (#1248)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-02-04 16:54:41 +01:00
Ben Kochie a0a164defb
Update cpufreq metrics collector (#1117)
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-10-18 17:28:19 +02:00
Mario Trangoni 24a28fcc9e Remove unused func, var, and const (#928)
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-04-29 14:35:43 +02:00
Mario Trangoni c9f421d0dd Fix some golint issues (#927)
* collector/cpu_*: rename nodeCpuSecondsDesc to nodeCPUSecondsDesc

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* collector/qdisc_linux.go: add NewQdiscStatCollector comment

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>

* collector/cpu_linux.go: rename core_map to coreMap

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2018-04-29 14:34:47 +02:00
Karsten Weiss efc1fdb6d0 cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total (#871)
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total

This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them separately.

Rename the node_package_throttles_total metric label `node` to `package`.

Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Use the direct /sys path to the cpu files.

Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.

The latter path also does not exist e.g. on RHEL 6.9's kernel.

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Reverse core+package throttle processing order

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Add documentation URLs

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:01:52 +02:00
Rene Treffer c504c7e264 Only report core throttles per core, not per cpu (#836)
* Only report core throttles per core, not per cpu

* Add topology/core_id to the cpu sysfs fixtures

* Add new cpu fixtures to ttar file

* Merge core_id reading and thermal throttle accounting

* Declare core_id
2018-02-27 19:43:15 +01:00
Ben Kochie 14d60958d6
Unify CPU collector conventions (#806)
* Unify CPU collector conventions

Add a common CPU metric description.
* All collectors use the same `nodeCpuSecondsDesc`.
* All collectors drop the `cpu` prefix for `cpu` label values.

* Fix subsystem string in cpu_freebsd.

* Fix Linux CPU freq label names.
2018-02-01 18:42:20 +01:00
Brian Brazil a98067a294 Make metrics better follow guidelines (#787)
* Improve stat linux metric names.

cpu is no longer used.

* node_cpu -> node_cpu_seconds_total for Linux

* Improve filesystem metric names with units

* Improve units and names of linux disk stats

Remove sector metrics, the bytes metrics cover those already.

* Infiniband counters should end in _total

* Improve timex metric names, convert to more normal units.

See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means, looks like a moving average of some form.

* Update test fixture

* For meminfo metrics that had "kB" units, add _bytes

* Interrupts counter should have _total
2018-01-17 17:55:55 +01:00
Ben Kochie 2a80537547
Split out guest cpu metrics on Linux. (#744)
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics.  Separate these into their own metric to
avoid duplication of data.
2017-11-23 15:04:47 +01:00
Karsten Weiss a8d7d1101a cpu: Support processor-less (memory-only) NUMA nodes (#734)
* cpu: Support processor-less (memory-only) NUMA nodes

Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).

IMDT RAM expansion supports two modes:

* "Unify Remote Memory domains": present a processor-less (memory-only)
  NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
  with a portion of the memory made available by Optane and IMDT

This commit fixes a crash in the first case (when "cpulist" is empty).

Here's an example of such a system:

$ numastat -m|head -n5

Per-node system memory usage (in MBs):
                          Node 0          Node 1          Node 2           Total
                 --------------- --------------- --------------- ---------------
MemTotal               118239.56       130816.00       464384.00       713439.56

$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:

$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
    Boards:      3
       1 x Proc. + I/O + Memory
       2 x NVM devices (Intel SSDPED1K375GAQ)
    Processors:  2, Cores: 16, Threads: 32
        Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
    Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
       1 x 249088MB   [262036/   678/12270]
       1 x 232192MB   [357707/125369/  146]  82:00.0#1
       1 x 232192MB   [357707/125369/  146]  83:00.0#1

* cpu: rename some variables (pkg => node)

* cpu: Use %v not %q in log.Debugf() format strings
2017-11-10 15:31:26 +01:00
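
The empty "cpulist" case from the commit above can be illustrated with a small parser. This is a sketch under assumptions, not the code that was merged: parseCPUList is a hypothetical function, and the input format is inferred from the sample output in the commit message ("0-7,16-23", or an empty line for a memory-only node).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList is a hypothetical helper that expands a sysfs-style CPU
// list such as "0-7,16-23" into individual CPU numbers. An empty list,
// as seen on a processor-less NUMA node, yields no CPUs and no error.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
			}
		}
		for i := first; i <= last; i++ {
			cpus = append(cpus, i)
		}
	}
	return cpus, nil
}

func main() {
	// The three nodes from the example system in the commit message.
	for _, list := range []string{"0-7,16-23", "8-15,24-31", ""} {
		cpus, err := parseCPUList(list)
		fmt.Println(len(cpus), err)
	}
}
```

Handling the empty string before splitting is the point of the sketch: a memory-only node is a valid input rather than a parse failure.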
Calle Pettersson 859a825bb8 Replace --collectors.enabled with per-collector flags (#640)
* Move NodeCollector into package collector

* Refactor collector enabling

* Update README with new collector enabled flags

* Fix out-of-date inline flag reference syntax

* Use new flags in end-to-end tests

* Add flag to disable all default collectors

* Track if a flag has been set explicitly

* Add --collectors.disable-defaults to README

* Revert disable-defaults flag

* Shorten flags

* Fixup timex collector registration

* Fix end-to-end tests

* Change procfs and sysfs path flags

* Fix review comments
2017-09-28 15:06:26 +02:00
Karsten Weiss b0d5c00832 cpu: Metric 'package_throttles_total' is per package. (#657)
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp for multi core cpus).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Rene Treffer 56bf8d4b2d Add link to kernel documentation for sysfs/cpufreq files
2017-06-27 11:25:06 +02:00
Rene Treffer bcc3cd92b8 Fix cpufreq statistics by converting kHz to Hz
2017-06-27 11:05:55 +02:00
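
The unit conversion named in the commit above amounts to a multiplication by 1000. A minimal sketch, assuming the raw value is an integer in kHz as the commit subject implies; the helper name kHzToHz is hypothetical.

```go
package main

import "fmt"

// kHzToHz is a hypothetical helper that converts a frequency read in
// kilohertz into hertz, the base unit used for the exported value.
func kHzToHz(kHz uint64) float64 {
	return float64(kHz) * 1000.0
}

func main() {
	// 3,200,000 kHz corresponds to a 3.2 GHz clock.
	fmt.Println(kHzToHz(3200000))
}
```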
Ben Kochie 182810056f Fix Linux cpu errors (#606)
Make the Linux cpu collector soft-error on missing `cpufreq` and
`thermal_throttle` features.
2017-06-20 07:51:26 +02:00