Commit graph

551 commits

Author SHA1 Message Date
Marco Pracucci 5ee3fbe825
Decouple ruler dependency controller from concurrency controller
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-02-02 10:06:37 +01:00
Marco Pracucci cbbbd6e70a
Remove superfluous nil check in Group.metrics
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:21:57 +01:00
Marco Pracucci 046cd7599f
Introduced sequentialRuleEvalController
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:19:18 +01:00
Marco Pracucci 23f89c18b2
Improved RuleConcurrencyController interface doc
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:18:29 +01:00
Marco Pracucci 2764c46531
Added more test cases to TestDependenciesEdgeCases
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:18:03 +01:00
Marco Pracucci 52bc568d04
Add more test cases to TestDependenciesEdgeCases
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:17:13 +01:00
Marco Pracucci 21a03dc018
Simplify the design to update concurrency controller once the rule evaluation has done
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:16:31 +01:00
Danny Kopping 7aa3b10c3f
Block until all rules, both sync & async, have completed evaluating
Updated & added tests
Review feedback nits
Return empty map if not indeterminate
Use highWatermark to track inflight requests counter
Appease the linter
Clarify feature flag

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping f922534c4d
Refactoring for performance, and to allow controller to be overridden
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping 94cdfa30cd
Refactoring
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping 0dc7036db3
Optimising dependencies/dependents funcs to not produce new slices each request
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping e7758d187e
Refactor concurrency control
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:39 +01:00
Danny Kopping 940f83a540
Implementation
NOTE:
Rebased from main after refactor in #13014

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:07:15 +01:00
Filip Petkovski 583f3e587c
Optimize histogram iterators (#13340)
Optimize histogram iterators

Histogram iterators allocate new objects in the AtHistogram and
AtFloatHistogram methods, which makes calculating rates over long
ranges expensive.

In #13215 we allowed an existing object to be reused
when converting an integer histogram to a float histogram. This commit follows
the same idea and allows injecting an existing object in the AtHistogram and
AtFloatHistogram methods. When the injected value is nil, iterators allocate
new histograms, otherwise they populate and return the injected object.

The commit also adds a CopyTo method to Histogram and FloatHistogram which
is used in the BufferedIterator to overwrite items in the ring instead of making
new copies.

Note that a specialized HPoint pool is needed for all of this to work 
(`matrixSelectorHPool`).

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-01-23 17:02:14 +01:00
Filip Petkovski 10a82f87fd
Enable reusing memory when converting between histogram types
The 'ToFloat' method on integer histograms currently allocates new memory
each time it is called.

This commit adds an optional *FloatHistogram parameter that can be used
to reuse span and bucket slices. It is up to the caller to make sure the
input float histogram is not used anymore after the call.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-08 10:22:59 +01:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Björn Rabenstein a43669e611
Merge pull request #12928 from alexandear/ci-enable-godot
ci(lint): enable godot; append dot at the end of comments
2023-11-01 17:15:41 +01:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Charles Korn 9a8dbf06bc
Address PR feedback
Co-authored-by: Julien Pivotto <roidelapluie@o11y.eu>
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>
2023-10-31 09:56:05 +11:00
Charles Korn 667a1efb04
Add trace ID to log lines emitted during rule evaluation
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-10-26 16:14:54 +11:00
Charles Korn fc132a4557
Use common logger instance to reduce duplication in Group.Eval()
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-10-26 16:14:12 +11:00
Danny Kopping 498b836654
Refactoring manager.go into separate concerns
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-10-21 11:11:11 +02:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Michael Hoffmann 4d8e380269
promql: allow tests to be imported (#12050)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2023-08-18 20:48:59 +02:00
Julien Pivotto 782e6f64fb
Merge pull request #11295 from dimitarvdimitrov/dimitar/simplify-evalTimestamp
Simplify rule group's EvalTimestamp formula
2023-07-18 13:21:20 +02:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Ben Ye fd3630b9a3 add ctx to QueryEngine interface
Signed-off-by: Ben Ye <benye@amazon.com>
2023-04-17 21:32:38 -07:00
beorn7 c0879d64cf promql: Separate Point into FPoint and HPoint
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the one that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
Soon-Ping 6cecb87941
Generalized rule group iteration evaluation hook (#11885)
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
2023-04-04 20:21:13 +02:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Björn Rabenstein 847093479b
Merge pull request #11978 from trevorwhitney/set-counter-hint
Set `CounterResetHint` and use in recording rules
2023-03-14 21:52:41 +01:00
Trevor Whitney c3e0a83725
rules: no longer force CounterResetHint to Gauge
Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>
2023-03-14 14:22:07 -06:00
Charles Korn 3db98d7dde
Avoid unnecessary allocations in recording rule evaluation (#11812)
Re-use the Builder each time round the loop.
2023-03-08 12:57:19 +00:00
Bryan Boreham 3f7ba22bde rules: two places need to call EmptyLabels
Can't assume nil is a valid value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-02-22 15:14:07 +00:00
Julien Pivotto 259bb5c692
Merge pull request #11826 from dannykopping/dannykopping/rule-eval
Pass rule details in evaluation context
2023-02-14 21:38:19 +01:00
Justin Lei af1d9e01c7
Refactor tsdbutil for tests/native histograms (#11948)
* Add float histograms to ChunkFromSamplesGeneric

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* Add Generate*Samples functions to tsdbutil

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* PR responses

Signed-off-by: Justin Lei <justin.lei@grafana.com>

---------

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-02-10 17:09:33 +05:30
Danny Kopping 98c70e1817
Correcting NewAlertingRule args
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-26 13:21:50 +02:00
Danny Kopping df078e0a84
Merge branch 'main' into dannykopping/rule-eval
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-26 13:10:18 +02:00
Julien Pivotto e811d14963 Add comments
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-23 13:59:43 +01:00
Danny Kopping c4ca791f18
Appeasing the linter
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-20 10:53:42 +02:00
Danny Kopping 6486d28c7a
Panic if rule type was not expected
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-20 10:27:50 +02:00
Julien Pivotto c0724f4e62 New test
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 11:56:04 +01:00
Julien Pivotto 2c408289f8 Add stabilizing to UI
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 11:33:54 +01:00
Julien Pivotto 5ad74e6e71 Add tests
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 10:36:01 +01:00
Julien Pivotto ce55e5074d Add 'keep_firing_for' field to alerting rules
This commit adds a new 'keep_firing_for' field to Prometheus alerting
rules. The 'resolve_delay' field specifies the minimum amount of time
that an alert should remain firing, even if the expression does not
return any results.

This feature was discussed at a previous dev summit, and it was
determined that a feature like this would be useful in order to allow
the expression time to stabilize and prevent confusing resolved messages
from being propagated through Alertmanager.

This approach is simpler than having two PromQL queries, as was
sometimes discussed, and it should be easy to implement.

This commit does not include tests for the 'resolve_delay' field.  This
is intentional, as the purpose of this commit is to gather comments on
the proposed design of the 'resolve_delay' field before implementing
tests. Once the design of the 'resolve_delay' field has been finalized,
a follow-up commit will be submitted with tests."

See https://github.com/prometheus/prometheus/issues/11570

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-13 12:11:39 +01:00
Ganesh Vernekar d82ea2eb1c
Merge pull request #11838 from codesome/histo-rec
rules: Support native histograms
2023-01-12 12:35:15 +05:30
Ganesh Vernekar 98a0523e4a
rules: Test native histograms in recording rules
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-11 18:27:57 +05:30