Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.
* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in infiband debug message.
Closes: https://github.com/prometheus/node_exporter/issues/1323
Signed-off-by: Ben Kochie <superq@gmail.com>
Let the node exporter collect the non-numeric data from
/sys/class/infiniband: board ID, firmware version, and HCA type.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
Collect the InfiniBand port state, the physical state, and the maximum
signal transfer rate.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Parsing the sysfs files for InfiniBand was added to the procfs library
(see https://github.com/prometheus/procfs/pull/164).
Therefore use `InfiniBandClass` from the procfs library instead of
parsing sysfs itself.
If the port counter return `N/A (no PMA)` no metric will be returned
(instead of returning 0 for this metric.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available
This is related to #966, and handle this error,
Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
* Improve stat linux metric names.
cpu is no longer used.
* node_cpu -> node_cpu_seconds_total for Linux
* Improve filesystem metric names with units
* Improve units and names of linux disk stats
Remove sector metrics, the bytes metrics cover those already.
* Infiniband counters should end in _total
* Improve timex metric names, convert to more normal units.
See
3c073991eb/kernel/time/ntp.c (L909)
for what stabil means, looks like a moving average of some form.
* Update test fixture
* For meminfo metrics that had "kB" units, add _bytes
* Interrupts counter should have _total
* Move NodeCollector into package collector
* Refactor collector enabling
* Update README with new collector enabled flags
* Fix out-of-date inline flag reference syntax
* Use new flags in end-to-end tests
* Add flag to disable all default collectors
* Track if a flag has been set explicitly
* Add --collectors.disable-defaults to README
* Revert disable-defaults flag
* Shorten flags
* Fixup timex collector registration
* Fix end-to-end tests
* Change procfs and sysfs path flags
* Fix review comments
According to Mellanox, it is standard practice that the port_xmit_data and port_rcv_data
files are split into 4 lanes. To get the actual transmit and receive values for each
port, the metric needs to be multiplied by 4.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Older versions of the OFED drivers contain 64-bit variants of the port counters and are located in a directory named 'counters_ext'. This patch includes these older metrics that have since been deprecated with OFED 4.0.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
In case a metric file within the InfiniBand collector doesn't exist, skip the metric in order to allow collection of the remaining valid InfiniBand metrics.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Named return variables should only be used to describe the returned type
further, e.g. `err error` doesn't add any new information and is just
stutter.
Add new metrics for the InfiniBand network protocol including the amount of packets sent and received, the number of times the link has been downed and how many times the link has recovered from an error state.
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>