Commit graph

473 commits

Author SHA1 Message Date
Julien Pivotto 3a56817a30
Rules: set otel status to ERROR when a rule fails (#10745)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-25 10:06:17 +02:00
Julien Pivotto 0d94cdf107
rules: remove classic UI code (#10730)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-23 16:21:50 +02:00
Łukasz Mierzwa d3c9c4f574
Stop rule manager before TSDB is stopped (#10680)
During shutdown TSDB is stopped before rule manager is stopped. Since  TSDB shutdown can take a long time (minutes or 10s of minutes) it keeps rule manager running while parts of Prometheus are already stopped (most notebly scrape manager). This can cause false positive alerts to fire, mostly those that rely on absent() calls since new sample appends will stop while alert queries are still evaluated.
Stop rules before stopping TSDB and scrape manager to avoid this problem.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2022-05-20 23:26:06 +02:00
Matthieu MOREL e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
Wilbert Guo 83a2e52bc2
Add SyncForState Implementation for Ruler HA (#10070)
* continuously syncing activeAt for alerts

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* add import

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Refactor SyncForState and add unit tests

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Format code

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Add hook for syncForState

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go lint

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Refactor syncForState override implementation

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Add syncForState override func as argument to Update()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go formatting

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix circleci test errors

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Remove overrideFunc as argument to run()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* remove the syncForState

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use the override function to decide if need to replace the activeAt or not

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix test case

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix format

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Trigger build

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* return the result of map of alerts instead of single one

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* upper case the QueryforStateSeries

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use a more generic rule group post process function type

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix indentation

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix gofmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix lint

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing naming

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* add the lastEvalTimestamp as parameter

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* change funcType to func

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

Co-authored-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>
2022-03-29 02:16:46 +02:00
Alan Protasio 606ef33d91 Track and report Samples Queried per query
We always track total samples queried and add those to the standard set
of stats queries can report.

We also allow optionally tracking per-step samples queried. This must be
enabled both at the engine and query level to be tracked and rendered.
The engine flag is exposed via a Prometheus feature flag, while the
query flag is set when stats=all.

Co-authored-by: Alan Protasio <approtas@amazon.com>
Co-authored-by: Andrew Bloomgarden <blmgrdn@amazon.com>
Co-authored-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
2022-03-21 23:49:17 +01:00
Alvin Lin cd739214dd
Log rule name when evaluating rule groups' Eval function logs anything (#10454)
* Add benchingmark test for rule group eval

Signed-off-by: Alvin Lin <alvinlin@amazon.com>
2022-03-21 19:52:20 +01:00
Matej Gera 2c61d29b2a
Tracing: Migrate to OpenTelemetry library (#9724)
Signed-off-by: Matej Gera <matejgera@gmail.com>
2022-01-25 11:08:04 +01:00
Björn Rabenstein 4c56a193c5
Merge pull request #9478 from prometheus/beorn7/pkg-deprecation
Move packages out of deprecated pkg directory
2021-11-09 11:09:16 +01:00
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Bryan Boreham 26d8ae0e41 Rules: simplify map key for stale series detection
The rules manager keeps a note of which series were generated by the
last run, so it can write a stale marker to those that disappeared.
Since the keys are not for human eyes, we can use a simpler format
and save the effort of quoting label values.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-11-08 22:18:48 +01:00
Yijie Qin 6fce45838a
Add access function for restoration state of alerting rule (#9665) 2021-11-05 18:26:29 -04:00
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Levi Harrison d81bbe154d
Rule alerts/series limit updates (#9541)
* Add docs and do not limit inactive alerts.

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-10-21 23:14:17 +02:00
Levi Harrison dc2f1993d8
Limit number of alerts or series produced by a rule (#9260)
* Add limit to rules

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-09-15 09:48:26 +02:00
George Robinson 049b4f4f13
Support customization of template options in TemplateExpander (#9290)
Signed-off-by: George Robinson <george.robinson@grafana.com>
2021-09-13 17:19:08 +05:30
Levi Harrison 8c29046ab2
Remove unneeded state modifications
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-20 16:42:31 -04:00
Michal Wasilewski 3f686cad8b
fixes yamllint errors
Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>
2021-06-12 12:47:47 +02:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Levi Harrison 26274527df
Updated/added tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:41 -04:00
Levi Harrison 17ea8d006a
Added external URL access
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:26 -04:00
Owen Diehl 23999df27c expose rule metrics fields
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2021-04-30 13:36:44 -04:00
Goutham Veeramachaneni 2efdf660b1
Increase evaluation failures on Commit() (#8770)
I think we should increment the metric here, we're setting the rule
health anyways. This means even if the "evaluation" suceeded, none of
the samples made it to storage.

This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-04-29 14:28:48 +02:00
Björn Rabenstein 9549a15c6f
Merge pull request #7675 from JessicaGreben/jg/11-retroactive-rule-eval
Add rule importer to backfill
2021-03-29 19:09:21 +02:00
Goutham Veeramachaneni 4b5ab80ca6
[rule] Update rule health for append/commit fails (#8619)
* [rule] Update rule health for append/commit fails

Similar to https://github.com/prometheus/prometheus/pull/8410 will
provide more context.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add test for updating health on append fails

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-03-18 15:44:33 +01:00
jessicagreben 78e84aed89 resolve merge conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-02-24 09:47:29 -08:00
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
jessicagreben ac06d0a657 merge master/resolve conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-26 08:43:07 -08:00
jessicagreben 75654715d3 fix panics
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-01 07:54:04 -08:00
jessicagreben 6980bcf671 unexport backfiller
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-10-31 06:40:56 -07:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
jessicagreben 36ac0b68f1 merge master, fix conflicts 2020-10-17 08:20:21 -07:00
Łukasz Mierzwa 19c190b406
Add a rule_group_samples metric (#7977)
This new metric allows tracking how many samples did each rule group generate.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2020-09-25 16:48:38 +01:00
李国忠 4a52faf2ae
Unnecessary go routine spawn. (#7951)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-09-21 11:29:03 +01:00
jessicagreben dfa510086b add alignment, mv rule importer to promtool dir, add queryRange
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-09-13 08:07:59 -07:00
Anonymous 8219b442c8
rules: Remove redundant RLock to avoid double RLock (#7183)
Signed-off-by: BurtonQin <bobbqqin@gmail.com>
2020-09-07 18:58:21 +01:00
johncming 2f2a51a43a
web/api/v1: make names consistent. (#7841)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-25 11:38:06 +01:00
Goutham Veeramachaneni cb830b0a9c
Label rule_group_iterations metric with group name (#7823)
* Label rule_group_iterations metric with group name

evalTotal and evalFailures having the label but iterations not having it
is an odd mismatch.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Remove the metrics when a group is deleted.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Initialise the metrics

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2020-08-19 15:29:13 +02:00
johncming 362080ba28
rules: add evaluationTimestamp when copy state. (#7775)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-14 09:42:13 +01:00
Callum Styan c7a17f6491
Add additional tag/log to rules Manager Eval trace span. (#7708)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-08-06 08:42:20 -07:00
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Julien Pivotto e76c436e9c
Goleak in discoveries, scrape, rules (#7662)
* Add go leak tests for discoveries with goroutines

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Add go leak tests in rules

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Add go leak tests in scrape tests

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-27 09:38:08 +01:00
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
Owen Diehl 9ccedc0407
Arbitrary rule & group loading (#7569)
* allows loading rule groups via an interface

Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2020-07-22 15:19:34 +01:00
Julien Pivotto b83cbacbdd
Rule manager: remove blocking channel in mail (#7631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:13:24 +02:00
johncming a69a8b931f
rules: fix bug for unknown alert state. (#7599)
Signed-off-by: johncming <johncming@yahoo.com>
2020-07-17 08:39:15 +01:00
Guangwen Feng ad9449d1f1
Add unit test case for func HasAlertingRules in manager.go (#7502)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-07-06 10:35:16 +01:00
Guangwen Feng 487f1e07ff
Add unit test case for func State in alerting.go (#7476)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-06-29 13:16:52 +01:00