We've gathered enough evidence that the CPU counter bug workaround is
working as intended. Downgrade the message from Warning to Debug.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Expose metric for state=check for node_md_state
* Added new e2e output fixture including md201 which is in checking state and a the new state=check labeled metric for all other md
Signed-off-by: Christian Rohmann <github@frittentheke.de>
* Check interface name before loading interface data
* Reduce indentation
* Optimize nested netdev map
This change avoids conversion to strings and back.
Signed-off-by: Julian Kornberger <jk+github@digineo.de>
* Expose cpu bugs and flags as info metrics with a regexp filter.
* Automatically enable CPU info metrics when using flags or bugs feature.
Signed-off-by: domgoer <domdoumc@gmail.com>
Handle the case when /sys/class/power_supply doesn't exist. Fixes
logging error spam.
Requires https://github.com/prometheus/procfs/pull/308
Signed-off-by: Ben Kochie <superq@gmail.com>
TCP "OutRsts" is the number of TCP Resets sent by the node. This can be
useful for monitoring connection failures and flooding.
Signed-off-by: Ben Kochie <superq@gmail.com>
We must know the length of the various filesystem C strings before
turning them from a byte array into a Go string, otherwise our Go
strings could contain null bytes, corrupting the label values.
Signed-off-by: David O'Rourke <david.orourke@gmail.com>
* Two new states will be added to the tcpstat collector called rx_queued_bytes and tx_queued_bytes.
For UDP datagrams an additional collector 'udp_queues' can be used to expose the total lengths of the tx_queue and rx_queue.
@SuperQ and @discordianfish this changes gives us the option to check for overloaded UDP + TCP processing.
The names of the new TCP states and the UDP metric can be discussed.
The current reasons are just:
I don't want to add another collector for the same exposed file, so I just added the new states to the tcpstat collector.
I chose the name 'udp_queue' instead of 'udpstat' as UDP has no state.
Signed-off-by: Peter Bueschel <peter.bueschel@logmein.com>
The `node_cpu_info` metric contains some information like the `model`
(which is an integer), but not the human readable model name. Also the
stepping of the processor might be interesting, since different stepping
of a processor might behave differently.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
* Use `strconv.Itoa()` instead of `fmt.Sprintf()` for simple conversion.
* Eliminate copy-paste in collector setup.
Signed-off-by: Ben Kochie <superq@gmail.com>
* add a map of profilers to CPUids
`runtime.NumCPU()` returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.
This change maintains the current behavior as default, but also allows
the user to specify a range of CPUids to use instead.
The CPU id is stored as the value of a map keyed on the profiler
object's address.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodges@uber.com>
Co-authored-by: jdamato-fsly <55214354+jdamato-fsly@users.noreply.github.com>
Many collectors depend on underlying features to be enabled. This causes
confusion about what "success" means. This changes the behavior of the
`node_scrape_collector_success` metric.
* When a collector is unable to find data don't return success.
* Catch the no data error and send to Debug log level to avoid log spam.
* Update collectors to support this new functionality.
* Fix copy-pasta mistake in infiband debug message.
Closes: https://github.com/prometheus/node_exporter/issues/1323
Signed-off-by: Ben Kochie <superq@gmail.com>
Let the node exporter collect the non-numeric data from
/sys/class/infiniband: board ID, firmware version, and HCA type.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
Co-authored-by: Ben Kochie <superq@gmail.com>