prometheus/rules
gotjosh e7219e3d36
Rule Manager: Add rule_group_last_restore_duration_seconds to measure restore time per rule group
When a rule group changes or Prometheus is restarted, we need to restore the active alerts that were firing for the corresponding rule. To do that, Prometheus queries the `ALERTS_FOR_STATE` series for the previous state and restores it. If a given rule has high cardinality (think hundreds of thousands of series), this process can take a while. This is the first of a series of PRs to improve this, and I'd like to start by exposing the time it takes to restore a rule group as a gauge.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-23 09:57:08 +01:00
fixtures Block until all rules, both sync & async, have completed evaluating 2024-01-29 10:08:41 +01:00
alerting.go Support expansion of native histogram values in alert templates 2024-03-26 22:30:01 +01:00
alerting_test.go Support expansion of native histogram values in alert templates 2024-03-26 22:30:01 +01:00
group.go Rule Manager: Add rule_group_last_restore_duration_seconds to measure restore time per rule group 2024-04-23 09:57:08 +01:00
manager.go Allow using alternative PromQL engines for rule evaluation 2024-03-06 14:54:33 +11:00
manager_test.go golangci-lint: enable whitespace linter (#13905) 2024-04-11 09:27:54 +01:00
origin.go Decouple ruler dependency controller from concurrency controller 2024-02-02 10:06:37 +01:00
origin_test.go Decouple ruler dependency controller from concurrency controller 2024-02-02 10:06:37 +01:00
recording.go Decouple ruler dependency controller from concurrency controller 2024-02-02 10:06:37 +01:00
recording_test.go Tests: use replacement DeepEquals using go-cmp 2024-02-08 19:30:20 +00:00
rule.go Decouple ruler dependency controller from concurrency controller 2024-02-02 10:06:37 +01:00