prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-17 19:14:04 -08:00

Author	SHA1	Message	Date
Iain Lane	e5cd5a33d0	PrometheusHighQueryLoad alert: use configured selector Currently we're hardcoding `job="prometheus-k8s"` as selector. This doesn't work if your prometheus is elsewhere. Fortunately we have `prometheusSelector` in `$._config` which all the other alerts use. Use that here too. Signed-off-by: Iain Lane <iain@orangesquash.org.uk>	2022-07-15 10:04:32 +01:00
Haoyu Sun	26a7f80aa1	add alert PrometheusHighQueryLoad. Signed-off-by: Haoyu Sun <hasun@redhat.com>	2022-07-13 14:08:24 +02:00
fpetkovski	501a8a7865	Address code review comments Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>	2022-03-30 09:35:08 +02:00
fpetkovski	877320784b	Add alert in mixin for exceeded sample limit This commit adds an alert in the prometheus mixin which triggers when Prometheus has failed scrapes that have exceeded the configured sample_limit for that job. Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>	2022-03-30 09:31:35 +02:00
Haoyu Sun	3c903af474	Add Alert PrometheusScrapeBodySizeLimitHit Signed-off-by: Haoyu Sun <hasun@redhat.com>	2022-03-22 15:13:00 +01:00
Björn Rabenstein	2234798f60	Merge pull request #9700 from nikosmeds/nikosmeds/hagroupcrashlooping-mixin-60m Increase time range for PrometheusHAGroupCrashlooping alert	2021-11-19 12:53:55 +01:00
Niko Smeds	53ca693f9e	Be specific Signed-off-by: Niko Smeds <nikosmeds@gmail.com>	2021-11-18 11:28:38 -08:00
Niko Smeds	0bc2cbdd7d	Leave time range for clean restarts as-is Signed-off-by: Niko Smeds <nikosmeds@gmail.com>	2021-11-17 15:14:26 -08:00
Fatih Sarhan	bc89e9e494	mixin: Reorder template variables on Remote Write dashboard Signed-off-by: f9n <f9n@protonmail.com>	2021-11-12 14:38:05 +03:00
Niko Smeds	fdcd423dfe	Increase time range for PrometheusHAGroupCrashlooping alert Signed-off-by: Niko Smeds <nikosmeds@gmail.com>	2021-11-08 15:06:42 -08:00
SuperQ	3cd2c033e2	Use Go 1.16+ install for mixin tests Use new `go install` syntax to fetch tools. Signed-off-by: SuperQ <superq@gmail.com>	2021-10-23 22:52:16 +02:00
Julien Pivotto	d5676fb9e0	Merge pull request #9254 from prometheus/superq/go1.17 Build with Go 1.17 / npm 7 / node 16	2021-08-28 18:36:42 +02:00
Frederic Hemberger	16b8911b1a	docs: Replace `go get` with `go install` for command installation (#9098) `go get` is deprecated for installation of commands as of go v1.17 Ref: https://go.googlesource.com/go/+/ced0fdbad0655d63d535390b1a7126fd1fef8348 Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>	2021-08-27 11:08:21 +02:00
SuperQ	e167a45c65	Add new Go build tags. Add new go:build comments based on 1.17 formatting[0]. [0]: https://golang.org/doc/go1.17#gofmt Signed-off-by: SuperQ <superq@gmail.com>	2021-08-27 10:24:14 +02:00
Philip Gough	751ca03fad	mixin: Filter instance by job for Prometheus overview dashboard Signed-off-by: Philip Gough <philip.p.gough@gmail.com>	2021-07-28 14:34:26 +01:00
Julien Duchesne	8855c2e626	Add `prometheus_tsdb_clean_start` metric (#8824) Add cleanup of the lockfile when the db is cleanly closed The metric describes the status of the lockfile on startup 0: Already existed 1: Did not exist -1: Disabled Therefore, if the min value over time of this metric is 0, that means that executions have exited uncleanly We can then use that metric to have a much lower threshold on the crashlooping alert: If the metric exists and it has been zero, two restarts is enough to trigger the alarm If it does not exist (old prom version for example), the current five restarts threshold remains Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Change metric name + set unset value to -1 Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Only check the last value of the clean start alert Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Fix test + nit Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>	2021-06-16 15:03:02 +05:30
hanjm	1df05bfd49	Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827) Signed-off-by: hanjm <hanjinming@outlook.com>	2021-05-29 07:05:42 +08:00
Levi Harrison	2826fbeeb7	SD: Add target creation failure counter and change failure handling (#8786) * Added metric and changed failure/drop strategy Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-28 23:50:59 +02:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide some mechanism to defend against unbound number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceed the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauge, one for each label limit to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very very long. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
ravilr	adc8807851	Update remote-write alert rules mixin (#8423) Signed-off-by: ravilr <raviprasad_lr@yahoo.com>	2021-01-31 20:07:49 +00:00
Frederic Branczyk	62bc755733	mixin: Scope grafana config In its current form this configuration clashes in one of the most widely used configurations (kube-prometheus). This patch scopes the configuration to prevent this. Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>	2020-12-30 17:50:34 +01:00
Nicolas Lamirault	aa1ca13025	Add: Custom tags and prefix in Prometheus Mixin (#8287) * Add: custom tags and prefix Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com> * Fix: fmt Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>	2020-12-16 18:49:06 +01:00
Björn Rabenstein	511511324a	Merge pull request #8235 from Allex1/master Update remote-write grafana mixin	2020-12-08 14:50:47 +01:00
beorn7	553f904f2d	mixin: Add a capability to exclude non-prod AM instances Signed-off-by: beorn7 <beorn@grafana.com>	2020-12-03 20:59:53 +01:00
birca	3ec4161575	Update remote-write grafana mixin Signed-off-by: birca <birca@adobe.com>	2020-12-02 09:50:15 +02:00
beorn7	638e99c814	prometheus-mixin: Make PrometheusRemoteWriteBehind more generic Currently, it relies on `job, instance` being the labels completely identifying a Prometheus instance. However, what's intended is to simply not match on `remote_name, url`. Signed-off-by: beorn7 <beorn@grafana.com>	2020-11-17 13:29:49 +01:00
beorn7	371ca9ff46	prometheus-mixin: add HA-group aware alerts There is certainly a potential to add more of these. This is mostly meant to introduce the concept and cover a few critical parts. Signed-off-by: beorn7 <beorn@grafana.com>	2020-11-11 19:45:34 +01:00
Matthias Loibl	13ba013a24	Use absolute jsonnet import paths This should be the way forward when importing libraries in jsonnet. It's closer to how Go imports look and makes it more obvious where packages live. This is not breaking anything, as the old imports were already symlinks to the now directly used directories. Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>	2020-10-20 11:42:30 +02:00
Björn Rabenstein	d49f267f76	Merge pull request #8054 from simonpasquier/improve-not-ingesting-samples-alert documentation/prometheus-mixin: improve PrometheusNotIngestingSamples	2020-10-15 12:29:39 +02:00
Simon Pasquier	f381d8a9bd	documentation/prometheus-mixin: improve PrometheusNotIngestingSamples The alert shouldn't fire when there's no target and no rule configured. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-10-15 11:13:17 +02:00
Julien Pivotto	4596abee4d	Mixin: Ignore unset remote write timestamp (#8046) * Mixin: Ignore unset remote write timestamp This pull request ignores the zero value of highest_sent_timestamp_seconds in Highest Timestamp In vs. Highest Timestamp Sent which just show that remote write has not been successful yet. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-15 09:15:59 +02:00
Simon Pasquier	e693af6c01	.circleci/config.yml: check mixins (#6895) * .circleci/config.yml: check mixins Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Run jsonnetfmt Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Install tools in the image instead of using coreos/jsonnet-ci The latter is deprecated Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Update jsonnetfile.json Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-08-25 15:59:41 +02:00
Julien Pivotto	f482c7bdd7	Add per scrape-config targets limit (#7554) * Add per scrape-config targets limit Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-30 14:20:24 +02:00
Tom Wilkie	27b1009acd	Rename the dashboard in the mixin to 'Prometheus Overview'. (#7489) Due to https://github.com/grafana/grafana/issues/15642, this prevents users putting this dashboard in a Grafana folder called 'Prometheus'. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2020-06-30 15:45:44 +01:00
Manuel Fontan	6e7554639b	Update Readme since jsonnetfmt is available in the jsonnet go implementation since v0.16.0 Signed-off-by: Manuel Fontan <mfontangarcia@slack-corp.com>	2020-06-16 10:41:58 +01:00
Callum Styan	5400e71b91	Update mixin dashboards and alerts for new remote write label names. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-04-08 12:56:00 -07:00
Marco Pracucci	1e1785690a	Fix queue in alerts annotation Signed-off-by: Marco Pracucci <marco@pracucci.com>	2020-02-12 12:48:13 +01:00
paulfantom	7321f1d227	documentation/prometheus-mixin: add dependency on grafonnet Signed-off-by: paulfantom <pawel@krupa.net.pl>	2020-01-11 23:18:04 +01:00
Callum Styan	f4fb6dc208	Simplify remote write dashboard in mixin. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-11-18 19:58:07 -08:00
beorn7	9c8f9bfa63	Fix the description template for PrometheusRemoteWriteDesiredShards Signed-off-by: beorn7 <beorn@grafana.com>	2019-10-30 13:27:37 +01:00
beorn7	61617eb2d9	Fix PrometheusRemoteWriteDesiredShards This rule has the same labels on both sides. We don't want `group_right` and `on`, we want nothing. Signed-off-by: beorn7 <beorn@grafana.com>	2019-10-29 00:23:39 +01:00
Callum Styan	da6d46625f	Repeat shards panels on the queue label. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-21 11:03:50 -07:00
Callum Styan	818974ff8f	Rewrite remote write dashboard using base grafonnet. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-17 15:40:58 -07:00
Callum Styan	81fa63006c	Add additional shards/segment graphs to remote write dashboard. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-09 09:59:02 -07:00
Simon Pasquier	e36ab7e192	prometheus-mixin: improve description of sample alerts (#6050) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-09-24 17:44:27 +02:00
Björn Rabenstein	3b3eaf3496	Merge pull request #5787 from cstyan/reshard-max-logging Add metrics for max/min/desired shards to queue manager.	2019-09-09 22:32:54 +02:00
Callum Styan	a98599bea8	Update remote write max shards alert; properly template/query for max shards in description. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-09-09 12:01:11 -07:00
Callum Styan	3b75614892	Add a warning alert, since the remote write behind alert will probably already be going off, about desired shards being higher than max shards. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-08-08 06:45:46 -07:00
Simon Pasquier	dd174963a2	prometheus-mixin: remove PrometheusTSDBWALCorruptions The counter is only increased when tsdb.Open() is called which Prometheus does only once in its lifetime (when it initializes). If the corruption can't be recovered, tsdb.Open() returns an error and Prometheus exits. Hence the metric is either 0 (no corruption) or 1 (corruption detected and repaired). If the latter, the alert isn't actionable and the only way to resolve it is to restart Prometheus which would reset the counter. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-08-06 14:36:56 +02:00
Matthias Loibl	20d12ff1c7	Fix prometheus-mixin dashboards to use grafanaDashboards Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>	2019-07-11 15:40:26 +02:00


76 commits