Commit graph

20 commits

Author SHA1 Message Date
beorn7 61617eb2d9 Fix PrometheusRemoteWriteDesiredShards
This rule has the same labels on both sides. We don't want
`group_right` and `on`, we want nothing.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-10-29 00:23:39 +01:00
Simon Pasquier e36ab7e192
prometheus-mixin: improve description of sample alerts (#6050)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-24 17:44:27 +02:00
Björn Rabenstein 3b3eaf3496
Merge pull request #5787 from cstyan/reshard-max-logging
Add metrics for max/min/desired shards to queue manager.
2019-09-09 22:32:54 +02:00
Callum Styan a98599bea8 Update remote write max shards alert; properly template/query for max shards in description.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-09-09 12:01:11 -07:00
Callum Styan 3b75614892 Add a warning alert, since the remote write behind alert will probably already be going off, about desired shards being higher than max shards.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-08-08 06:45:46 -07:00
Simon Pasquier dd174963a2 prometheus-mixin: remove PrometheusTSDBWALCorruptions
The counter is only increased when tsdb.Open() is called which
Prometheus does only once in its lifetime (when it initializes). If the
corruption can't be recovered, tsdb.Open() returns an error and
Prometheus exits. Hence the metric is either 0 (no corruption) or 1
(corruption detected and repaired). If the latter, the alert isn't
actionable and the only way to resolve it is to restart Prometheus which
would reset the counter.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-06 14:36:56 +02:00
beorn7 4825585834 Tweak tenses
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 17:37:49 +02:00
beorn7 9a2177949d Protect gauge-based alerts against failed scrapes
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 16:46:19 +02:00
beorn7 7a25a2586d Sync with alerts from kube-prometheus
While doing so, re-introduce the summary/description
annotations. Also, add a few more rules and tweak a few of the
existing ones.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 23:50:26 +02:00
beorn7 1336a28848 Use a config variable for the Prometheus name
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 14:34:11 +02:00
beorn7 e34af6d4d3 Address various comments from the review
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 23:22:16 +02:00
beorn7 23c03207e9 Fixed indentation
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 20:31:05 +02:00
Tom Wilkie 38a9bbbec2 Loosen off PrometheusRemoteWriteBehind alert.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-04 12:47:24 +00:00
Tom Wilkie b615069289 Update metric names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-01 07:39:48 -08:00
Tom Wilkie e248ffb220 Add alert for WAL remote write falling behind.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 15:22:58 +00:00
Tom Wilkie 638204c775 Typo
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 12:23:42 +00:00
Tom Wilkie 8f42192e52 Add Prometheus alerts from kube-prometheus, remove the alertmanager alerts.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-19 11:22:55 +00:00
Tom Wilkie 50861d586a Alert if more than 1% of alerts fail for a given integration.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-16 17:17:47 +00:00
Tom Wilkie 266ba185fe Remove PromScrapeFailed alert.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-16 17:17:47 +00:00
Tom Wilkie ee1427faad Prometheus monitoring mixin for Prometheus itself.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-16 17:17:47 +00:00