Commit graph

28 commits

Author SHA1 Message Date
beorn7 553f904f2d mixin: Add a capability to exclude non-prod AM instances
Signed-off-by: beorn7 <beorn@grafana.com>
2020-12-03 20:59:53 +01:00
beorn7 638e99c814 prometheus-mixin: Make PrometheusRemoteWriteBehind more generic
Currently, it relies on `job, instance` being the labels completely
identifying a Prometheus instance. However, what's intended is to
simply not match on `remote_name, url`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-11-17 13:29:49 +01:00
beorn7 371ca9ff46 prometheus-mixin: add HA-group aware alerts
There is certainly a potential to add more of these. This is mostly
meant to introduce the concept and cover a few critical parts.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-11-11 19:45:34 +01:00
Simon Pasquier f381d8a9bd documentation/prometheus-mixin: improve PrometheusNotIngestingSamples
The alert shouldn't fire when there's no target and no rule configured.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-10-15 11:13:17 +02:00
Julien Pivotto f482c7bdd7
Add per scrape-config targets limit (#7554)
* Add per scrape-config targets limit

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 14:20:24 +02:00
Callum Styan 5400e71b91 Update mixin dashboards and alerts for new remote write label names.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-04-08 12:56:00 -07:00
Marco Pracucci 1e1785690a
Fix queue in alerts annotation
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-02-12 12:48:13 +01:00
beorn7 9c8f9bfa63 Fix the description template for PrometheusRemoteWriteDesiredShards
Signed-off-by: beorn7 <beorn@grafana.com>
2019-10-30 13:27:37 +01:00
beorn7 61617eb2d9 Fix PrometheusRemoteWriteDesiredShards
This rule has the same labels on both sides. We don't want
`group_right` and `on`, we want nothing.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-10-29 00:23:39 +01:00
Simon Pasquier e36ab7e192
prometheus-mixin: improve description of sample alerts (#6050)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-24 17:44:27 +02:00
Björn Rabenstein 3b3eaf3496
Merge pull request #5787 from cstyan/reshard-max-logging
Add metrics for max/min/desired shards to queue manager.
2019-09-09 22:32:54 +02:00
Callum Styan a98599bea8 Update remote write max shards alert; properly template/query for max
shards in description.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-09-09 12:01:11 -07:00
Callum Styan 3b75614892 Add a warning alert, since the remote write behind alert will probably
already be going off, about desired shards being higher than max shards.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-08-08 06:45:46 -07:00
Simon Pasquier dd174963a2 prometheus-mixin: remove PrometheusTSDBWALCorruptions
The counter is only increased when tsdb.Open() is called which
Prometheus does only once in its lifetime (when it initializes). If the
corruption can't be recovered, tsdb.Open() returns an error and
Prometheus exits. Hence the metric is either 0 (no corruption) or 1
(corruption detected and repaired). If the latter, the alert isn't
actionable and the only way to resolve it is to restart Prometheus which
would reset the counter.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-06 14:36:56 +02:00
beorn7 4825585834 Tweak tenses
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 17:37:49 +02:00
beorn7 9a2177949d Protect gauge-based alerts against failed scrapes
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 16:46:19 +02:00
beorn7 7a25a2586d Sync with alerts from kube-prometheus
While doing so, re-introduce the summary/description
annotations. Also, add a few more rules and tweak a few of the
existing ones.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 23:50:26 +02:00
beorn7 1336a28848 Use a config variable for the Prometheus name
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 14:34:11 +02:00
beorn7 e34af6d4d3 Address various comments from the review
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 23:22:16 +02:00
beorn7 23c03207e9 Fixed indentation
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 20:31:05 +02:00
Tom Wilkie 38a9bbbec2 Loosen off PrometheusRemoteWriteBehind alert.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-04 12:47:24 +00:00
Tom Wilkie b615069289 Update metric names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-01 07:39:48 -08:00
Tom Wilkie e248ffb220 Add alert for WAL remote write falling behind.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 15:22:58 +00:00
Tom Wilkie 638204c775 Typo
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 12:23:42 +00:00
Tom Wilkie 8f42192e52 Add Prometheus alerts from kube-prometheus, remove the alertmanager alerts.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 11:22:55 +00:00
Tom Wilkie 50861d586a Alert if more than 1% of alerts fail for a given integration.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-16 17:17:47 +00:00
Tom Wilkie 266ba185fe Remove PromScrapeFailed alert.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-16 17:17:47 +00:00
Tom Wilkie ee1427faad Prometheus monitoring mixin for Prometheus itself.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-16 17:17:47 +00:00