Commit graph

265 commits

Author SHA1 Message Date
Julien Duchesne 6607a2ba7c
Rules: Store dependencies instead of boolean
To improve https://github.com/prometheus/prometheus/pull/15681 further, we'll need to store the dependencies and dependents of each

Right now, if a rule has both (at least 1) dependents and dependencies, it is not possible to determine the order to run the rules and they must all run sequentially

This PR only changes the dependents and dependencies attributes of rules, it does not implement a new topological sort algorithm

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2024-12-17 23:18:33 -05:00
Julien Duchesne e2f037e554
rules: Add new RuleEvaluationTimeSum field to groups (#15672)
Some checks are pending
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run
* feat(ruler): Add new `RuleEvaluationTimeSum` field to groups
Coupled with a metric: `rule_group_last_rule_duration_sum_seconds`

This will give us more observability into how fast a group runs with or without concurrency

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Update rules/group.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Remove `in seconds`. A duration is a duration

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

---------

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2024-12-13 21:42:46 +00:00
Julien Duchesne 7802ca263d
RuleDependencyController: Fix for indeterminate conditions (#15560)
The dependency map being empty meant that all the rules were being set as having no dependencies or dependents. Which is the opposite of what we want
Added two new tests that verify the behavior, they failed before the fix, running all the rules concurrently

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2024-12-13 16:48:29 +00:00
Bryan Boreham 70e2d23027
Merge pull request #11474 from clwluvw/group-label
Some checks are pending
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run
[FEATURE] rules: add labels at group level
2024-10-21 14:47:12 +01:00
TJ Hoplock 6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Seena Fallah f253d36361 rule: allow merging labels from group level
Support merging labels from groups to rule labels

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-07-26 20:18:05 +02:00
gotjosh 465891cc56
Rules: Refactor concurrency controller interface (#14491)
* Rules: Refactor concurrency controller interface

Even though the main purpose of this refactor is to modify the interface of the concurrency controller to accept a Context. I did two drive-by modifications that I think are sensible:

1. I have moved the check for dependencies on rules to the controller itself - this aligns with how the controller should behave as it is a deciding factor on wether we should run concurrently or not.
2. I cleaned up some unused methods from the days of the old interface before #13527 changed it.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-07-22 14:11:18 +01:00
Saswata Mukherjee 398f42de5f
Add label-matcher support to Rules API (#10194)
* Add label-matcher support to Rules API

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Implement suggestions

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Match any matcherSet instead of all

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Don't treat labels.Labels as slice

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Remove non-templated check and fix tests

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Update docs

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comment

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Add comment for matching logic, fix tests after rebase

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <qinyijie@amazon.com>
2024-07-10 13:18:29 +01:00
Raphael Silva e0c9b2ee19 Fix linting errors
Signed-off-by: Raphael Silva <rapphil@gmail.com>
2024-06-28 23:44:08 +00:00
Raphael Silva cd5a7b5020 Make rules Manager Update method no-op after Close
This has to be done because Close and Update methods are accessed concurrently.

Signed-off-by: Raphael Silva <rapphil@gmail.com>
2024-06-28 23:39:46 +00:00
gotjosh 37b408c6cd
Feature: Allow configuration of a rule evaluation delay (#14061)
* [PATCH] Allow having evaluation delay for rule groups

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Move the option to ManagerOptions

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Include evaluation_delay in the group config

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix comments

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add a server configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Appease the linter #1

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add the new server flag documentation

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve documentation of the new flag and configuration

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Use named parameters for clarity on the `Rule` interface

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `initial` to the flag help

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Change the CHANGELOG area from `ruler` to `rules`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename evaluation_delay to `rule_query_offset`/`query_offset` and make it a global configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

E Your branch is up to date with 'origin/gotjosh/evaluation-delay'.

* more docs

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve wording on CHANGELOG

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `RuleQueryOffset` to the default config in tests in case it changes

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Update docs/configuration/recording_rules.md

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename `RuleQueryOffset` to `QueryOffset` when in the group context.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve docstring and documentation on the `rule_query_offset`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
2024-05-30 11:49:50 +01:00
Charles Korn 4e77e8e5ef
Allow using alternative PromQL engines for rule evaluation
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-03-06 14:54:33 +11:00
machine424 f477e0539a
Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21
Prevent adding back golang.org/x/exp/slices.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 14:54:53 +01:00
Marco Pracucci 5ee3fbe825
Decouple ruler dependency controller from concurrency controller
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-02-02 10:06:37 +01:00
Marco Pracucci 046cd7599f
Introduced sequentialRuleEvalController
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:19:18 +01:00
Marco Pracucci 23f89c18b2
Improved RuleConcurrencyController interface doc
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:18:29 +01:00
Danny Kopping 7aa3b10c3f
Block until all rules, both sync & async, have completed evaluating
Updated & added tests
Review feedback nits
Return empty map if not indeterminate
Use highWatermark to track inflight requests counter
Appease the linter
Clarify feature flag

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping f922534c4d
Refactoring for performance, and to allow controller to be overridden
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping 94cdfa30cd
Refactoring
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping e7758d187e
Refactor concurrency control
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:39 +01:00
Danny Kopping 940f83a540
Implementation
NOTE:
Rebased from main after refactor in #13014

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:07:15 +01:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Danny Kopping 498b836654
Refactoring manager.go into separate concerns
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-10-21 11:11:11 +02:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Julien Pivotto 782e6f64fb
Merge pull request #11295 from dimitarvdimitrov/dimitar/simplify-evalTimestamp
Simplify rule group's EvalTimestamp formula
2023-07-18 13:21:20 +02:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Ben Ye fd3630b9a3 add ctx to QueryEngine interface
Signed-off-by: Ben Ye <benye@amazon.com>
2023-04-17 21:32:38 -07:00
beorn7 c0879d64cf promql: Separate Point into FPoint and HPoint
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the one that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
Soon-Ping 6cecb87941
Generalized rule group iteration evaluation hook (#11885)
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
2023-04-04 20:21:13 +02:00
Trevor Whitney c3e0a83725
rules: no longer force CounterResetHint to Gauge
Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>
2023-03-14 14:22:07 -06:00
Julien Pivotto ce55e5074d Add 'keep_firing_for' field to alerting rules
This commit adds a new 'keep_firing_for' field to Prometheus alerting
rules. The 'resolve_delay' field specifies the minimum amount of time
that an alert should remain firing, even if the expression does not
return any results.

This feature was discussed at a previous dev summit, and it was
determined that a feature like this would be useful in order to allow
the expression time to stabilize and prevent confusing resolved messages
from being propagated through Alertmanager.

This approach is simpler than having two PromQL queries, as was
sometimes discussed, and it should be easy to implement.

This commit does not include tests for the 'resolve_delay' field.  This
is intentional, as the purpose of this commit is to gather comments on
the proposed design of the 'resolve_delay' field before implementing
tests. Once the design of the 'resolve_delay' field has been finalized,
a follow-up commit will be submitted with tests."

See https://github.com/prometheus/prometheus/issues/11570

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-13 12:11:39 +01:00
Ganesh Vernekar d82ea2eb1c
Merge pull request #11838 from codesome/histo-rec
rules: Support native histograms
2023-01-12 12:35:15 +05:30
Ganesh Vernekar 53a5071a72
rules: Support native histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-10 19:07:24 +05:30
Ganesh Vernekar f1a332c496
rules: Consider ErrTooOldSample in expected errors
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-05 14:49:30 +05:30
Bryan Boreham 3c7de69059 storage: allow re-use of iterators
Patterned after `Chunk.Iterator()`: pass the old iterator in so it
can be re-used to avoid allocating a new object.

(This commit does not do any re-use; it is just changing all the method
signatures so re-use is possible in later commits.)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-15 18:32:45 +00:00
Julius Volz 1a2c645dfa Correctly handle error unwrapping in rules and remote write receiver
errors.Unwrap() actually dangerously returns nil if the error does not have an
Unwrap() method, which is the case in at least one of these places where I
noticed that no error was being logged at all when it should have.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2022-12-15 12:50:55 +01:00
Dimitar Dimitrov 03ab8dcca0
Add comments on EvalTimestamp
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2022-10-12 14:16:22 +02:00
Ganesh Vernekar 648be89822
Merge remote-tracking branch 'upstream/main' into fix-conflict
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 14:20:02 +05:30
Ganesh Vernekar 46b26c4f09
Fix notifier relabel changing the labels of active alerts (#11427)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-07 20:28:17 +05:30
Dimitar Dimitrov 3fb881af26
Simplify rule group's EvalTimestamp formula
I found it hard to understand how EvalTimestamp works, so I wanted to simplify the math there. This PR should be a noop.

Current formula is:

```
offset        = g.hash % g.interval
adjNow        = startTime - offset
base          = adjNow - (adjNow % g.interval)
EvalTimestamp = base + offset
```

I simplify `EvalTimestamp`

```
EvalTimestamp = base + offset
                # expand base
              = adjNow - (adjNow % g.interval) + offset
                # expand adjNow
              = startTime - offset - ((startTime - offset) % g.interval) + offset
                # cancel out offset
              = startTime - ((startTime - offset) % g.interval)
                # expand A+B (mod M) = (A (mod M) + B (mod M)) (mod M)
              = startTime - (startTime % g.interval - offset % g.interval) % g.interval
                # expand offset
              = startTime - (startTime % g.interval - ((g.hash % g.interval) % g.interval)) % g.interval
                # remove redundant mod g.interval
              = startTime - (startTime % g.interval - g.hash % g.interval) % g.interval
                # simplify (A (mod M) + B (mod M)) (mod M) = A+B (mod M)
              = startTime - (startTime - g.hash) % g.interval

offset            = (startTime - g.hash) % g.interval
EvalTimestamp     = startTime - offset
```

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2022-09-13 10:52:32 +02:00
beorn7 28f028e938 Merge branch 'main' into sparsehistogram 2022-07-12 19:07:13 +02:00
Matthieu MOREL ddfa9a7cc5
refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt' (#10855)
* refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt'

Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>
2022-06-17 09:54:25 +02:00
beorn7 40ad5e284a Merge branch 'main' into beorn7/sparsehistogram 2022-06-09 20:50:30 +02:00
Julien Pivotto 3a56817a30
Rules: set otel status to ERROR when a rule fails (#10745)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-25 10:06:17 +02:00
Julien Pivotto 0d94cdf107
rules: remove classic UI code (#10730)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-23 16:21:50 +02:00
Łukasz Mierzwa d3c9c4f574
Stop rule manager before TSDB is stopped (#10680)
During shutdown TSDB is stopped before rule manager is stopped. Since  TSDB shutdown can take a long time (minutes or 10s of minutes) it keeps rule manager running while parts of Prometheus are already stopped (most notebly scrape manager). This can cause false positive alerts to fire, mostly those that rely on absent() calls since new sample appends will stop while alert queries are still evaluated.
Stop rules before stopping TSDB and scrape manager to avoid this problem.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2022-05-20 23:26:06 +02:00
beorn7 7ee1836ef5 Merge branch 'main' into sparsehistogram 2022-04-05 18:31:19 +02:00
Wilbert Guo 83a2e52bc2
Add SyncForState Implementation for Ruler HA (#10070)
* continuously syncing activeAt for alerts

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* add import

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Refactor SyncForState and add unit tests

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Format code

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Add hook for syncForState

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go lint

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Refactor syncForState override implementation

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Add syncForState override func as argument to Update()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go formatting

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix circleci test errors

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Remove overrideFunc as argument to run()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* remove the syncForState

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use the override function to decide if need to replace the activeAt or not

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix test case

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix format

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Trigger build

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* return the result of map of alerts instead of single one

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* upper case the QueryforStateSeries

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use a more generic rule group post process function type

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix indentation

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix gofmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix lint

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing naming

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* add the lastEvalTimestamp as parameter

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* change funcType to func

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

Co-authored-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>
2022-03-29 02:16:46 +02:00