We use perccli, whose output is compatible with storcli, and storcli.py does not handle 'Unknown' as a value:
```
msg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
```
I know, perccli should not return 'Unknown', but this error breaks all other useful measurements because the prom file cannot be parsed. My if condition fixes this.
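A minimal sketch of the kind of guard described above. The helper `format_metric` and the metric name `perc_temperature` are hypothetical and not taken from storcli.py; the point is only that a non-numeric value is skipped instead of being written to the prom file.

```python
def format_metric(name, value, labels=""):
    """Return one text-format sample line, or None if value is not numeric.

    Hypothetical helper for illustration; not the actual storcli.py code.
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        # e.g. 'Unknown' from perccli: drop this sample rather than
        # emit a line that the textfile collector cannot parse.
        return None
    return "{}{} {}".format(name, labels, number)


# Hypothetical input: one good reading and one 'Unknown'.
samples = [("perc_temperature", "41"), ("perc_temperature", "Unknown")]
lines = [line for line in (format_metric(n, v) for n, v in samples) if line]
```

Only the numeric sample ends up in `lines`, so one bad value no longer invalidates the whole file.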
Signed-off-by: Andreas Wirooks <andreas.wirooks@1und1.de>
In order to avoid stuck collectors using up all system resources, add a
limit to the number of parallel in-flight scrape requests. Requests over
the limit are rejected with a 503 error.
Default to 40 requests, which seems like a reasonable number based on:
* Two Prometheus servers scraping every 15 seconds.
* Failing scrapes after 5 minutes of stuckness.
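One reading of the two bullets is 2 servers x (300 s / 15 s) = 40 requests that can pile up before the oldest ones fail. The sketch below shows that arithmetic and a generic in-flight limiter; the class `InFlightLimiter` and the response text are assumptions for illustration, not the exporter's actual Go implementation.

```python
import threading

SERVERS = 2                 # two Prometheus servers
SCRAPE_INTERVAL_S = 15      # each scraping every 15 seconds
STUCK_TIMEOUT_S = 5 * 60    # stuck scrapes fail after 5 minutes

# Worst case number of requests in flight before the oldest ones time out.
MAX_IN_FLIGHT = SERVERS * STUCK_TIMEOUT_S // SCRAPE_INTERVAL_S


class InFlightLimiter:
    """Reject work with a 503 once `limit` requests are already running."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, scrape):
        # Non-blocking acquire: never queue behind stuck collectors.
        if not self._slots.acquire(blocking=False):
            return 503, "limit of concurrent requests reached\n"
        try:
            return 200, scrape()
        finally:
            self._slots.release()
```

Rejecting immediately, rather than queueing, keeps stuck collectors from accumulating more goroutines or threads behind them.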
Signed-off-by: Ben Kochie <superq@gmail.com>
If this flag is set, the metrics about the exporter itself (go_*,
process_*, promhttp_*) will be excluded from /metrics.
The Kingpin way of handling boolean flags makes the negative flag
wording (_dis_able) the most reasonable one.
This also refactors the flow in node_exporter.go quite a bit to avoid
mixing up the global and a local registry and to avoid re-creating a
registry even if no filtering is requested.
Signed-off-by: beorn7 <beorn@soundcloud.com>
TCP SYN packet retransmits are a very useful signal as they affect
network performance disproportionately to regular TCP retransmits.
Signed-off-by: Ben Kochie <superq@gmail.com>
* storcli.py: Remove IntEnum
This removes an external dependency.
Moved VD state to VD info labels
* storcli.py: Fix BBU health detection
BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.
* storcli.py: Strip all strings from PD
Strip all strings that we get from PDs.
They often contain whitespace...
* storcli.py: Add formatting options
Add help text explaining how this document was formatted
* storcli.py: Add DG to pd_info label
Add disk group to pd_info.
That way we can relate to PDs in the same DG.
For example to check if all disks in one RAID
use the same interface...
* storcli.py: Fix promtool issues
Fix linting issues reported by promtool check-metrics
* storcli.py: Exit if storcli reports issues
storcli reports if the command was a success.
We should not continue if there are issues.
* storcli.py: Try to parse metrics to float
This will sanitize the values we hand over to
node_exporter - eliminating any unforeseen values we read out...
* storcli.py: Refactor code to implement handle_sas_controller()
Move code into methods so that we can now also support HBA queries.
* storcli.py: Sort inputs
"...like a good python developer"
- Daniel Swarbrick
* storcli.py: Replace external dateutil library with internal datetime
Removes external dependency...
* storcli.py: Also collect temperature on megaraid cards
We have already collected them on mpt3sas cards...
* storcli.py: Clean up old code
Removed dead code that is not used any more.
* storcli.py: strip() all information for labels
They often contain whitespace...
* storcli.py: Try to catch KeyErrors generally
If some key we expect is not there, we will want to
still print whatever we have collected so far...
* storcli.py: Increment version number
We have made some changes here and there.
The general look of the data has not been changed.
* storcli.py: Fix CodeSpell issue
Split string to avoid issues with Codespell due to Celcius in JSON Key
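A rough sketch of two of the changes listed above (BBU health detection and stripping label strings). The function names and the sample label keys are hypothetical; only the status values 0 and 32 come from the notes above.

```python
# Per the note above: 0 means a healthy cache vault, 32 a healthy BBU.
HEALTHY_BBU_STATUS = {0, 32}


def bbu_is_healthy(status):
    """Hypothetical helper: True if the reported BBU status is healthy."""
    return int(status) in HEALTHY_BBU_STATUS


def clean_labels(raw):
    """Hypothetical helper: strip surrounding whitespace from label values."""
    return {key: str(value).strip() for key, value in raw.items()}


# Made-up physical drive info with the padding storcli tends to return.
pd_labels = clean_labels({"model": " ST4000NM0035  ", "interface": "SAS "})
```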
Signed-off-by: Christopher Blum <zeichenanonym@web.de>
* deleted_libraries: Upgrade to Python 3
Python 2.7 will not be maintained past 2020. Therefore upgrade
text_collector_examples/deleted_libraries.py to Python 3.
* Add mellanox_hca_temp text collector example
mellanox_hca_temp is a script that reads Mellanox HCA temperature using
the Mellanox mget_temp_ext tool.
Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.
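A minimal sketch of the naming rule, not the collector's actual Go code. The function name and the choice of index 0 for a bare "temp" file are assumptions made for illustration.

```python
import re

# Usual hwmon naming: <type><index>_<item>, e.g. temp1_input.
SENSOR_FILE = re.compile(r"^([a-z]+)(\d+)_(.+)$")


def parse_sensor_file(filename):
    """Return (type, index, item), or None if the name is not a sensor file."""
    match = SENSOR_FILE.match(filename)
    if match:
        kind, index, item = match.groups()
        return kind, int(index), item
    if filename == "temp":
        # Special case: a bare "temp" file is treated like an _input file.
        # Index 0 is an assumption of this sketch.
        return "temp", 0, "input"
    return None
```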
Fixes #1122
Signed-off-by: Paul Gier <pgier@redhat.com>
* Update Linux cpufreq collector to use new procfs library functions.
* Split thermal throttle collection to a separate function.
* Add new required fixtures and repack ttar file.
Signed-off-by: Ben Kochie <superq@gmail.com>
The format of /proc/diskstats is changing in linux-4.19 to include some
additional fields. See: https://www.kernel.org/doc/Documentation/iostats.txt
* collector/diskstats: use constants for some hard coded strings
* collector/diskstats: update diskstats for linux-4.19
* collector/diskstats: remove kernel doc url from individual metrics
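A sketch of a parser that tolerates both layouts, assuming the additional fields are the four discard counters described in the linked kernel document. The field names below are labels chosen for this example, not the collector's metric names.

```python
# Stat fields after "major minor device", in kernel order.
FIELDS = [
    "reads_completed", "reads_merged", "sectors_read", "read_time_ms",
    "writes_completed", "writes_merged", "sectors_written", "write_time_ms",
    "io_in_progress", "io_time_ms", "io_time_weighted_ms",
    # Assumed to be the additional fields in the newer format:
    "discards_completed", "discards_merged", "sectors_discarded",
    "discard_time_ms",
]


def parse_diskstats_line(line):
    """Return (device, stats) for one /proc/diskstats line."""
    parts = line.split()
    device = parts[2]
    values = [int(v) for v in parts[3:]]
    # zip() stops at the shorter input, so old kernels simply yield
    # fewer entries and unknown trailing fields are ignored.
    return device, dict(zip(FIELDS, values))
```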
Signed-off-by: Paul Gier <pgier@redhat.com>
* State that wifi collector is disabled by default
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add the 'processes' collector to the Readme
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
LaunchDaemons are the correct way to create services that survive restarts.
The readme now mentions only a single destination for the plist file.
Signed-off-by: Dávid Balakirev <dave00ster@gmail.com>
This is mostly required to fix a bug with histograms on 32-bit platforms.
(Which might or might not be used in node_exporter. Just in case...)
Signed-off-by: beorn7 <beorn@soundcloud.com>
* infiniband: Add not connected i40iw0/ports/1 fixtures
* infiniband: Handle issue when iWARP* RDMA modules are not available
This is related to #966 and handles this error:
Jun 07 13:33:24 hostname node_exporter[81888]: time="2018-06-07T13:33:24+02:00" level=error msg="ERROR: infiniband
collector failed after 0.000929s: strconv.ParseUint: parsing \"N/A (no PMA)\": invalid syntax" source="collector.go:132"
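A sketch of one way to tolerate such values. Returning None for "N/A" counters is an assumption of this example; how the collector itself handles them is not spelled out here.

```python
def read_counter(raw):
    """Parse a sysfs counter; return None when it is not available.

    Hypothetical helper for illustration only.
    """
    text = raw.strip()
    if text.startswith("N/A"):
        # e.g. "N/A (no PMA)" when the iWARP RDMA modules are missing.
        return None
    return int(text)
```

A caller can then skip the metric for that port instead of failing the whole collector.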
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
* Strip rootfs prefix when running in Docker
* Use `/` as default value of path.rootfs, and parse mounts from `/proc/1/mounts`.
* No need to mount `/proc` and `/sys` because we share the host's PID
namespace, which allows processes within the container to see all of the
processes on the system.
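The prefix stripping can be sketched as follows. The function name is hypothetical and `/host` is only an example mount location for the host's root filesystem inside the container.

```python
def strip_rootfs_prefix(mount_point, rootfs="/"):
    """Map a path seen inside the container back to the host's view."""
    if rootfs == "/":
        # Default: nothing to strip.
        return mount_point
    prefix = rootfs.rstrip("/")
    if mount_point == prefix:
        return "/"
    # Only strip on a path component boundary, so /hostile is untouched
    # when the rootfs is /host.
    if mount_point.startswith(prefix + "/"):
        return mount_point[len(prefix):]
    return mount_point
```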
Closes: #66
Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Signed-off-by: Yecheng Fu <cofyc.jackson@gmail.com>
Starting with (not yet released) OpenBSD 6.4, sysctl KERN_CPTIME2 will
return ENODEV for offline CPUs.
SMT siblings are reported as offline when hw.smt is disabled, which is
the default since one of the later Spectre variants. So this might
affect a few systems.
For more details see:
https://cvsweb.openbsd.org/src/sys/kern/kern_sysctl.c#rev1.348
Signed-off-by: Ralf Horstmann <ralf+github@ackstorm.de>
When starting Docker containers a whole bunch of netns (network
namespace) mounts are created that the node exporter can't make any
sense of (and can't read either).
This ignores all nsfs filesystems.
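A minimal sketch of filtering by filesystem type, assuming the usual `/proc/mounts` column order (device, mount point, type). The function name and the sample mounts are made up; the exporter itself is written in Go.

```python
IGNORED_FS_TYPES = {"nsfs"}


def filter_mounts(mounts_text):
    """Return (device, mount_point, fs_type) tuples, minus ignored types."""
    kept = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if fs_type in IGNORED_FS_TYPES:
            continue  # netns mounts created for containers
        kept.append((device, mount_point, fs_type))
    return kept
```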
Fixes #875
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
* Update build
* Only use CGO when building non-Linux.
* Update build to Go 1.11
* Use tab indenting consistently.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Change systemd unit filtering
Get all units from systemd and filter in Go.
* Improves compatibility with older versions of systemd.
* Improve debugging by printing when units pass the filter.
* Remove extraneous newlines from log messages.
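The idea of listing everything and filtering client-side can be sketched like this. The include and exclude patterns and the function name are assumptions for illustration, and the real collector does this in Go.

```python
import logging
import re


def filter_units(names, include=r".+", exclude=r".+\.scope"):
    """Keep unit names that match `include` and do not match `exclude`."""
    include_re = re.compile(include)
    exclude_re = re.compile(exclude)
    kept = []
    for name in names:
        if include_re.fullmatch(name) and not exclude_re.fullmatch(name):
            # Log units that pass the filter to ease debugging.
            logging.debug("Adding unit: %s", name)
            kept.append(name)
    return kept
```

Filtering in the client avoids relying on server-side pattern support, which older systemd versions may lack.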
Signed-off-by: Ben Kochie <superq@gmail.com>
This removes the cgo import from timex collector, as it was only used
to define two constants. Those are part of the Linux kernel<->userspace
interface, thus there is no need to depend on libc to source them:
https://github.com/torvalds/linux/blob/v4.18/include/uapi/linux/timex.h
Signed-off-by: Luca Bruno <luca.bruno@coreos.com>