- Use a stacked graph instead of a gauge as development over time is
especially useful for disk space usage.
- By only taking one metric per device into account, we avoid
double-counting for devices that are mounted multiple times.
Signed-off-by: beorn7 <beorn@grafana.com>
* node-mixin: Improve nodes dashboard
- Use stacking where it makes sense.
- Normalize idle CPU so that stacking is more meaningful.
- Consistently fill where stacking is used but don't fill where not.
- Fix y axis max value for Idle CPU panel.
- Fix y axis min value for memory usage panel.
- Use `$__interval` for range where applicable (and set min step
to 1m).
- Make the right Y axis for disk I/O actually work.
This is just an incremental improvements. It doesn't touch the more
involved TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
This addresses the blissful scenario where single-node failures are
unproblematic. No reason to wake somebody up if a node is about to
screw itself up by filling the disk.
Signed-off-by: beorn7 <beorn@grafana.com>
- Normalize cluster memory utilisation.
- Fix missing `1m` in memory saturation.
- Have both disk-related row next to each other instead with the
network row in between.
- Correctly render transmit network traffic as negative, using
`seriesOverrides` and `min: null` for the y-axis.
- Make panel and row naming consistent.
- Remove legend where it would just display a single entry with
exactly the title of the panel.
- Fix metric name in individual node CPU Saturation panel.
- Break up disk space utilisation by device in the panel for an
individual node.
NB: All of that doesn't touch any more subtle issues captured in the
various TODOs.
Signed-off-by: beorn7 <beorn@grafana.com>
Use the extra information gleaned from the mountinfo file to add
a 'mountaddr' field for NFS metrics. This helps prevent prometheus from
ignoring mounts that come from the same URL, but are actually from
different IP addresses.
This commit also rebases to current master
Signed-off-by: Dipack P Panjabi <dpanjabi@hudson-trading.com>
As per prometheus/client_golang#543, pass the Registry for exporter
metrics when setting up the /metrics HTTP handler.
With this, the `promhttp_metric_handler_errors_total` metric will
increment on (possibly non-fatal) collection-time errors, such as
duplicate metrics from text files.
Signed-off-by: Matthias Rampke <mr@soundcloud.com>
This will cause a query to be valid even if values of selector are
empty.
Additionally fixing query responsible for disk space usage.
Signed-off-by: paulfantom <pawel@krupa.net.pl>
Include directory operation, read/write system call, and vnode runtime
statistics for XFS filesystems.
Signed-off-by: Steven Kreuzer <skreuzer@FreeBSD.org>
The only deviation that happened so far is to use format="percentunit"
in a Grafana gauge. This change wasn't even properly used in this repo
so far, so I opted to stick with "upstream" for now. If changes are
really needed, we can try to change upstream first.
Another change was done in parallal here and upstream, but it was
"more correct" in upstream. (Change datasource to $datasource
variable, only partially applied here.) Which is another point for
using the upstream and not copy it here.
Signed-off-by: beorn7 <beorn@grafana.com>