12 commits

Author SHA1 Message Date
Karsten Weiss efc1fdb6d0 cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total (#871)
* cpu: Add a 2nd label 'package' to metric node_cpu_core_throttles_total

This commit fixes the node_cpu_core_throttles_total metrics on
multi-socket systems as the core_ids are the same for each package.
I.e. we need to count them separately.

Rename the node_package_throttles_total metric label `node` to `package`.

Reorganize the sys.ttar archive and use the same symlinks as the Linux
kernel. Also, the new fixtures now use a dual-socket dual-core cpu w/o
HT/SMT (node0: cpu0+1, node1: cpu2+3) as well as a processor-less
(memory-only) NUMA node 'node2' (this is a very rare case).

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
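
To illustrate why the extra `package` label matters, here is a minimal Go sketch (not the collector's actual code). The `cpuSample`, `coreKey`, and `coreThrottles` names are hypothetical; the struct fields stand in for values that would be read from `topology/physical_package_id`, `topology/core_id`, and `thermal_throttle/core_throttle_count` under each cpu directory. Keying by core_id alone would merge core 0 of package 0 with core 0 of package 1.

```go
package main

import "fmt"

// cpuSample holds the values assumed to be read for one logical cpu.
type cpuSample struct {
	Package   int    // topology/physical_package_id
	Core      int    // topology/core_id
	Throttles uint64 // thermal_throttle/core_throttle_count
}

// coreKey identifies a physical core. core_id repeats across packages,
// so the package number must be part of the key.
type coreKey struct {
	Package int
	Core    int
}

// coreThrottles returns one throttle count per (package, core) pair.
// Logical cpus that share a core report the same counter, so the first
// sample seen for a key is kept and later ones are skipped.
func coreThrottles(cpus []cpuSample) map[coreKey]uint64 {
	out := make(map[coreKey]uint64)
	for _, c := range cpus {
		key := coreKey{Package: c.Package, Core: c.Core}
		if _, seen := out[key]; seen {
			continue
		}
		out[key] = c.Throttles
	}
	return out
}

func main() {
	// Dual-socket, dual-core layout similar to the fixtures described above.
	cpus := []cpuSample{
		{Package: 0, Core: 0, Throttles: 5},
		{Package: 0, Core: 1, Throttles: 0},
		{Package: 1, Core: 0, Throttles: 9},
		{Package: 1, Core: 1, Throttles: 2},
	}
	for key, n := range coreThrottles(cpus) {
		fmt.Printf("package=%d core=%d throttles=%d\n", key.Package, key.Core, n)
	}
}
```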

* cpu: Use the direct /sys path to the cpu files.

Use the direct path /sys/devices/system/cpu/cpu[0-9]* (without symlinks)
instead of /sys/bus/cpu/devices/cpu[0-9]*.

The latter path also does not exist e.g. on RHEL 6.9's kernel.

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Reverse core+package throttle processing order

Signed-off-by: Karsten Weiss <knweiss@gmail.com>

* cpu: Add documentation URLs

Signed-off-by: Karsten Weiss <knweiss@gmail.com>
2018-04-09 18:01:52 +02:00
Rene Treffer c504c7e264 Only report core throttles per core, not per cpu (#836)
* Only report core throttles per core, not per cpu

* Add topology/core_id to the cpu sysfs fixtures

* Add new cpu fixtures to ttar file

* Merge core_id reading and thermal throttle accounting

* Declare core_id
2018-02-27 19:43:15 +01:00
Ben Kochie 14d60958d6 Unify CPU collector conventions (#806)
* Unify CPU collector conventions

Add a common CPU metric description.
* All collectors use the same `nodeCpuSecondsDesc`.
* All collectors drop the `cpu` prefix for `cpu` label values.

* Fix subsystem string in cpu_freebsd.

* Fix Linux CPU freq label names.
2018-02-01 18:42:20 +01:00
Brian Brazil a98067a294 Make metrics better follow guidelines (#787)
* Improve stat linux metric names.

cpu is no longer used.

* node_cpu -> node_cpu_seconds_total for Linux

* Improve filesystem metric names with units

* Improve units and names of linux disk stats

Remove sector metrics, the bytes metrics cover those already.

* Infiniband counters should end in _total

* Improve timex metric names, convert to more normal units.

See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means, looks like a moving average of some form.

* Update test fixture

* For meminfo metrics that had "kB" units, add _bytes

* Interrupts counter should have _total
2018-01-17 17:55:55 +01:00
Ben Kochie 2a80537547 Split out guest cpu metrics on Linux. (#744)
Linux "guest" metrics for VMs are already accounted for in node_cpu
`user` and `nice` metrics.  Separate these into their own metric to
avoid duplication of data.
2017-11-23 15:04:47 +01:00
Karsten Weiss a8d7d1101a cpu: Support processor-less (memory-only) NUMA nodes (#734)
* cpu: Support processor-less (memory-only) NUMA nodes

Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
Intel Optane drives for RAM expansion using Intel Memory Drive
Technology (IMDT).

IMDT RAM expansion supports two modes:

* "Unify Remote Memory domains": present a processor-less (memory-only)
  NUMA domain, which is the default
* "Expand local memory domains": to expand each processor’s memory domain
  with a portion of the memory made available by Optane and IMDT

This commit fixes a crash in the first case (when "cpulist" is empty).

Here's an example of such a system:

$ numastat -m|head -n5

Per-node system memory usage (in MBs):
                          Node 0          Node 1          Node 2           Total
                 --------------- --------------- --------------- ---------------
MemTotal               118239.56       130816.00       464384.00       713439.56

$ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
0: 0-7,16-23
1: 8-15,24-31
2:

$ /opt/vsmp/bin/vsmpversion -vvv
Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
System configuration:
    Boards:      3
       1 x Proc. + I/O + Memory
       2 x NVM devices (Intel SSDPED1K375GAQ)
    Processors:  2, Cores: 16, Threads: 32
        Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
    Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
       1 x 249088MB   [262036/   678/12270]
       1 x 232192MB   [357707/125369/  146]  82:00.0#1
       1 x 232192MB   [357707/125369/  146]  83:00.0#1
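
The empty `cpulist` of node 2 above is the case that needs care. A minimal Go sketch of a tolerant parser follows; `parseCPUList` is a hypothetical helper written for this example, not the function used in the collector.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a sysfs cpulist string such as "0-7,16-23" into
// individual cpu numbers. An empty string, as seen on a memory-only NUMA
// node, yields an empty result instead of an error.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid cpulist entry %q: %w", part, err)
			}
		}
		if last < first {
			return nil, fmt.Errorf("invalid cpulist range %q", part)
		}
		for n := first; n <= last; n++ {
			cpus = append(cpus, n)
		}
	}
	return cpus, nil
}

func main() {
	// The three cpulist values from the example system above.
	for i, list := range []string{"0-7,16-23", "8-15,24-31", ""} {
		cpus, err := parseCPUList(list)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("node%d: %d cpus\n", i, len(cpus))
	}
}
```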

* cpu: rename some variables (pkg => node)

* cpu: Use %v not %q in log.Debugf() format strings
2017-11-10 15:31:26 +01:00
Calle Pettersson 859a825bb8 Replace --collectors.enabled with per-collector flags (#640)
* Move NodeCollector into package collector

* Refactor collector enabling

* Update README with new collector enabled flags

* Fix out-of-date inline flag reference syntax

* Use new flags in end-to-end tests

* Add flag to disable all default collectors

* Track if a flag has been set explicitly

* Add --collectors.disable-defaults to README

* Revert disable-defaults flag

* Shorten flags

* Fixup timex collector registration

* Fix end-to-end tests

* Change procfs and sysfs path flags

* Fix review comments
2017-09-28 15:06:26 +02:00
Karsten Weiss b0d5c00832 cpu: Metric 'package_throttles_total' is per package. (#657)
* cpu: Metric 'package_throttles_total' is per package.

'package_throttles_total' is per package, not per cpu. This also reduces
the total number of cpu time series a lot (esp. for multi-core cpus).

* cpu: Better handling of a cpulist edge-case.

* cpu: Extract the package number from the directory name.

Do not rely on the range index.
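
A small Go sketch of the idea, assuming directory names of the form `node<N>` as in `/sys/bus/node/devices/`; the `nodeNumber` helper is hypothetical. When a node is missing from the listing, the range index and the real number diverge.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// nodeNumber extracts N from a path whose last element is "node<N>".
func nodeNumber(dir string) (int, error) {
	base := filepath.Base(dir)
	if !strings.HasPrefix(base, "node") {
		return 0, fmt.Errorf("unexpected directory name %q", base)
	}
	return strconv.Atoi(strings.TrimPrefix(base, "node"))
}

func main() {
	// Illustrative listing with a gap: index 1 holds node2.
	dirs := []string{
		"/sys/bus/node/devices/node0",
		"/sys/bus/node/devices/node2",
	}
	for i, dir := range dirs {
		n, err := nodeNumber(dir)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("index=%d number=%d\n", i, n)
	}
}
```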

* cpu: Add package_throttle_count for node0 cpu1

This file must be ignored by the cpu collector.
2017-09-07 23:24:18 +02:00
Rene Treffer 56bf8d4b2d Add link to kernel documentation for sysfs/cpufreq files 2017-06-27 11:25:06 +02:00
Rene Treffer bcc3cd92b8 Fix cpufreq statistics by converting kHz to Hz 2017-06-27 11:05:55 +02:00
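
The conversion itself is a multiplication by 1000, since the kernel reports cpufreq values in kHz. A minimal sketch in Go, where `parseFreqHz` is a hypothetical helper and the input string stands in for the contents of a cpufreq file:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFreqHz converts the raw contents of a cpufreq file (kHz) to Hz.
func parseFreqHz(raw string) (float64, error) {
	khz, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(khz) * 1000, nil
}

func main() {
	hz, err := parseFreqHz("3200000\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.0f Hz\n", hz)
}
```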
Ben Kochie 182810056f Fix Linux cpu errors (#606)
Make the Linux cpu collector soft-error on missing `cpufreq` and
`thermal_throttle` features.
2017-06-20 07:51:26 +02:00
Rene Treffer 2e9f1913b8 Move stat_linux to cpu_linux and add cpufreq stats (#548) 2017-06-13 11:21:53 +02:00