Commit graph

621 commits

Author SHA1 Message Date
beorn7 c0879d64cf promql: Separate Point into FPoint and HPoint
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the one that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
Soon-Ping 6cecb87941
Generalized rule group iteration evaluation hook (#11885)
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
2023-04-04 20:21:13 +02:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Björn Rabenstein 847093479b
Merge pull request #11978 from trevorwhitney/set-counter-hint
Set `CounterResetHint` and use in recording rules
2023-03-14 21:52:41 +01:00
Trevor Whitney c3e0a83725
rules: no longer force CounterResetHint to Gauge
Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>
2023-03-14 14:22:07 -06:00
Charles Korn 3db98d7dde
Avoid unnecessary allocations in recording rule evaluation (#11812)
Re-use the Builder each time round the loop.
2023-03-08 12:57:19 +00:00
Bryan Boreham 3f7ba22bde rules: two places need to call EmptyLabels
Can't assume nil is a valid value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-02-22 15:14:07 +00:00
Julien Pivotto 259bb5c692
Merge pull request #11826 from dannykopping/dannykopping/rule-eval
Pass rule details in evaluation context
2023-02-14 21:38:19 +01:00
Justin Lei af1d9e01c7
Refactor tsdbutil for tests/native histograms (#11948)
* Add float histograms to ChunkFromSamplesGeneric

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* Add Generate*Samples functions to tsdbutil

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* PR responses

Signed-off-by: Justin Lei <justin.lei@grafana.com>

---------

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-02-10 17:09:33 +05:30
Danny Kopping 98c70e1817
Correcting NewAlertingRule args
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-26 13:21:50 +02:00
Danny Kopping df078e0a84
Merge branch 'main' into dannykopping/rule-eval
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-26 13:10:18 +02:00
Julien Pivotto e811d14963 Add comments
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-23 13:59:43 +01:00
Danny Kopping c4ca791f18
Appeasing the linter
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-20 10:53:42 +02:00
Danny Kopping 6486d28c7a
Panic if rule type was not expected
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-20 10:27:50 +02:00
Julien Pivotto c0724f4e62 New test
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 11:56:04 +01:00
Julien Pivotto 2c408289f8 Add stabilizing to UI
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 11:33:54 +01:00
Julien Pivotto 5ad74e6e71 Add tests
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-19 10:36:01 +01:00
Julien Pivotto ce55e5074d Add 'keep_firing_for' field to alerting rules
This commit adds a new 'keep_firing_for' field to Prometheus alerting
rules. The 'resolve_delay' field specifies the minimum amount of time
that an alert should remain firing, even if the expression does not
return any results.

This feature was discussed at a previous dev summit, and it was
determined that a feature like this would be useful in order to allow
the expression time to stabilize and prevent confusing resolved messages
from being propagated through Alertmanager.

This approach is simpler than having two PromQL queries, as was
sometimes discussed, and it should be easy to implement.

This commit does not include tests for the 'resolve_delay' field.  This
is intentional, as the purpose of this commit is to gather comments on
the proposed design of the 'resolve_delay' field before implementing
tests. Once the design of the 'resolve_delay' field has been finalized,
a follow-up commit will be submitted with tests."

See https://github.com/prometheus/prometheus/issues/11570

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-13 12:11:39 +01:00
Ganesh Vernekar d82ea2eb1c
Merge pull request #11838 from codesome/histo-rec
rules: Support native histograms
2023-01-12 12:35:15 +05:30
Ganesh Vernekar 98a0523e4a
rules: Test native histograms in recording rules
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-11 18:27:57 +05:30
Ganesh Vernekar 53a5071a72
rules: Support native histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-10 19:07:24 +05:30
Danny Kopping 4d8478d9ac
Add license header to appease CI
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-09 11:05:56 +02:00
Danny Kopping 72527b5f12
Refactoring for simplicity
Include labels

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-09 11:01:46 +02:00
Danny Kopping d8f3e7d16c
gofumpt
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-09 11:01:25 +02:00
Danny Kopping 79300340af
Adding recording/alerting rule origin context
This will allow correlation of executed rule queries with their associated rule names and type

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-01-09 11:01:24 +02:00
Ganesh Vernekar f1a332c496
rules: Consider ErrTooOldSample in expected errors
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-05 14:49:30 +05:30
Bryan Boreham cdbe7f462b Update package rules for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 3c7de69059 storage: allow re-use of iterators
Patterned after `Chunk.Iterator()`: pass the old iterator in so it
can be re-used to avoid allocating a new object.

(This commit does not do any re-use; it is just changing all the method
signatures so re-use is possible in later commits.)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-15 18:32:45 +00:00
Julius Volz 1a2c645dfa Correctly handle error unwrapping in rules and remote write receiver
errors.Unwrap() actually dangerously returns nil if the error does not have an
Unwrap() method, which is the case in at least one of these places where I
noticed that no error was being logged at all when it should have.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2022-12-15 12:50:55 +01:00
Dimitar Dimitrov 03ab8dcca0
Add comments on EvalTimestamp
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2022-10-12 14:16:22 +02:00
Ganesh Vernekar 648be89822
Merge remote-tracking branch 'upstream/main' into fix-conflict
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 14:20:02 +05:30
Ganesh Vernekar 46b26c4f09
Fix notifier relabel changing the labels of active alerts (#11427)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-07 20:28:17 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Dimitar Dimitrov 3fb881af26
Simplify rule group's EvalTimestamp formula
I found it hard to understand how EvalTimestamp works, so I wanted to simplify the math there. This PR should be a noop.

Current formula is:

```
offset        = g.hash % g.interval
adjNow        = startTime - offset
base          = adjNow - (adjNow % g.interval)
EvalTimestamp = base + offset
```

I simplify `EvalTimestamp`

```
EvalTimestamp = base + offset
                # expand base
              = adjNow - (adjNow % g.interval) + offset
                # expand adjNow
              = startTime - offset - ((startTime - offset) % g.interval) + offset
                # cancel out offset
              = startTime - ((startTime - offset) % g.interval)
                # expand A+B (mod M) = (A (mod M) + B (mod M)) (mod M)
              = startTime - (startTime % g.interval - offset % g.interval) % g.interval
                # expand offset
              = startTime - (startTime % g.interval - ((g.hash % g.interval) % g.interval)) % g.interval
                # remove redundant mod g.interval
              = startTime - (startTime % g.interval - g.hash % g.interval) % g.interval
                # simplify (A (mod M) + B (mod M)) (mod M) = A+B (mod M)
              = startTime - (startTime - g.hash) % g.interval

offset            = (startTime - g.hash) % g.interval
EvalTimestamp     = startTime - offset
```

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2022-09-13 10:52:32 +02:00
Bryan Boreham 8297f5cb6b rules: in tests use labels.FromStrings
And a number of `EmptyLabels()` instead of `nil`.
Replacing code which assumes the internal structure of `Labels`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
Cosrider bef6556ca5
delete redundant alias (#11180)
Signed-off-by: Cosrider <cosrider7@gmail.com>

Signed-off-by: Cosrider <cosrider7@gmail.com>
2022-08-31 15:50:38 +02:00
Bryan Boreham 8b863c42dd
Optimise relabeling by re-using memory (#11147)
* model/relabel: Add benchmark

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use Builder across relabels

Saves memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* labels.Builder: allow re-use of result slice

This reduces memory allocations where the caller has a suitable slice available.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use source values slice

To reduce memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Unwind one change causing test failures

Restore original behaviour in PopulateLabels, where we must not overwrite the input set.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* relabel: simplify values optimisation

Use a stack-based array for up to 16 source labels, which will be the
vast majority of cases.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* lint

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-08-19 15:27:52 +05:30
beorn7 c9fd3c235d Merge branch 'main' into sparsehistogram 2022-08-10 17:54:37 +02:00
Jimmie Han a5fea2cdd0
Use atomic field avoid (*AlertingRule).mtx wait when template expanding (#10858)
* Use atomic field avoid (*AlertingRule).mtx wait when template expanding (#10703)

Signed-off-by: hanjm <hanjinming@outlook.com>
2022-07-19 12:58:37 +02:00
beorn7 28f028e938 Merge branch 'main' into sparsehistogram 2022-07-12 19:07:13 +02:00
Matthieu MOREL ddfa9a7cc5
refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt' (#10855)
* refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt'

Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>
2022-06-17 09:54:25 +02:00
beorn7 40ad5e284a Merge branch 'main' into beorn7/sparsehistogram 2022-06-09 20:50:30 +02:00
Julien Pivotto 3a56817a30
Rules: set otel status to ERROR when a rule fails (#10745)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-25 10:06:17 +02:00
Julien Pivotto 0d94cdf107
rules: remove classic UI code (#10730)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-05-23 16:21:50 +02:00
Łukasz Mierzwa d3c9c4f574
Stop rule manager before TSDB is stopped (#10680)
During shutdown TSDB is stopped before rule manager is stopped. Since  TSDB shutdown can take a long time (minutes or 10s of minutes) it keeps rule manager running while parts of Prometheus are already stopped (most notebly scrape manager). This can cause false positive alerts to fire, mostly those that rely on absent() calls since new sample appends will stop while alert queries are still evaluated.
Stop rules before stopping TSDB and scrape manager to avoid this problem.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2022-05-20 23:26:06 +02:00
beorn7 3bc711e333 Merge branch 'main' into sparsehistogram 2022-05-04 13:37:13 +02:00
Matthieu MOREL e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
beorn7 7ee1836ef5 Merge branch 'main' into sparsehistogram 2022-04-05 18:31:19 +02:00
Wilbert Guo 83a2e52bc2
Add SyncForState Implementation for Ruler HA (#10070)
* continuously syncing activeAt for alerts

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* add import

Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Refactor SyncForState and add unit tests

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Format code

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* Add hook for syncForState

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go lint

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Refactor syncForState override implementation

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Add syncForState override func as argument to Update()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix go formatting

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Fix circleci test errors

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

Remove overrideFunc as argument to run()

Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>

* remove the syncForState

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use the override function to decide if need to replace the activeAt or not

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix test case

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix format

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Trigger build

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* return the result of map of alerts instead of single one

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* upper case the QueryforStateSeries

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* use a more generic rule group post process function type

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix indentation

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix gofmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix lint

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fixing naming

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* add the lastEvalTimestamp as parameter

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fmt

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* change funcType to func

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

Co-authored-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>
2022-03-29 02:16:46 +02:00
beorn7 4210aac74a Merge branch 'main' into sparsehistogram 2022-03-22 14:47:42 +01:00
Alan Protasio 606ef33d91 Track and report Samples Queried per query
We always track total samples queried and add those to the standard set
of stats queries can report.

We also allow optionally tracking per-step samples queried. This must be
enabled both at the engine and query level to be tracked and rendered.
The engine flag is exposed via a Prometheus feature flag, while the
query flag is set when stats=all.

Co-authored-by: Alan Protasio <approtas@amazon.com>
Co-authored-by: Andrew Bloomgarden <blmgrdn@amazon.com>
Co-authored-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
2022-03-21 23:49:17 +01:00
Alvin Lin cd739214dd
Log rule name when evaluating rule groups' Eval function logs anything (#10454)
* Add benchingmark test for rule group eval

Signed-off-by: Alvin Lin <alvinlin@amazon.com>
2022-03-21 19:52:20 +01:00
Matej Gera 2c61d29b2a
Tracing: Migrate to OpenTelemetry library (#9724)
Signed-off-by: Matej Gera <matejgera@gmail.com>
2022-01-25 11:08:04 +01:00
Björn Rabenstein 7e42acd3b1
tsdb: Rework iterators (#9877)
- Pick At... method via return value of Next/Seek.
- Do not clobber returned buckets.
- Add partial FloatHistogram suppert.

Note that the promql package is now _only_ dealing with
FloatHistograms, following the idea that PromQL only knows float
values.

As a byproduct, I have removed the histogramSeries metric. In my
understanding, series can have both float and histogram samples, so
that metric doesn't make sense anymore.

As another byproduct, I have converged the sampleBuf and the
histogramSampleBuf in memSeries into one. The sample type stored in
the sampleBuf has been extended to also contain histograms even before
this commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-29 13:24:23 +05:30
beorn7 5d4db805ac Merge branch 'main' into sparsehistogram 2021-11-17 19:57:31 +01:00
Björn Rabenstein 4c56a193c5
Merge pull request #9478 from prometheus/beorn7/pkg-deprecation
Move packages out of deprecated pkg directory
2021-11-09 11:09:16 +01:00
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Bryan Boreham 26d8ae0e41 Rules: simplify map key for stale series detection
The rules manager keeps a note of which series were generated by the
last run, so it can write a stale marker to those that disappeared.
Since the keys are not for human eyes, we can use a simpler format
and save the effort of quoting label values.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-11-08 22:18:48 +01:00
Yijie Qin 6fce45838a
Add access function for restoration state of alerting rule (#9665) 2021-11-05 18:26:29 -04:00
Ganesh Vernekar c8b267efd6
Get histograms from TSDB to the rate() function implementation
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-03 19:04:18 +05:30
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Levi Harrison d81bbe154d
Rule alerts/series limit updates (#9541)
* Add docs and do not limit inactive alerts.

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-10-21 23:14:17 +02:00
Levi Harrison dc2f1993d8
Limit number of alerts or series produced by a rule (#9260)
* Add limit to rules

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-09-15 09:48:26 +02:00
George Robinson 049b4f4f13
Support customization of template options in TemplateExpander (#9290)
Signed-off-by: George Robinson <george.robinson@grafana.com>
2021-09-13 17:19:08 +05:30
Levi Harrison 8c29046ab2
Remove unneeded state modifications
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-20 16:42:31 -04:00
Michal Wasilewski 3f686cad8b
fixes yamllint errors
Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>
2021-06-12 12:47:47 +02:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Levi Harrison 26274527df
Updated/added tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:41 -04:00
Levi Harrison 17ea8d006a
Added external URL access
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:26 -04:00
Owen Diehl 23999df27c expose rule metrics fields
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2021-04-30 13:36:44 -04:00
Goutham Veeramachaneni 2efdf660b1
Increase evaluation failures on Commit() (#8770)
I think we should increment the metric here, we're setting the rule
health anyways. This means even if the "evaluation" suceeded, none of
the samples made it to storage.

This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-04-29 14:28:48 +02:00
Björn Rabenstein 9549a15c6f
Merge pull request #7675 from JessicaGreben/jg/11-retroactive-rule-eval
Add rule importer to backfill
2021-03-29 19:09:21 +02:00
Goutham Veeramachaneni 4b5ab80ca6
[rule] Update rule health for append/commit fails (#8619)
* [rule] Update rule health for append/commit fails

Similar to https://github.com/prometheus/prometheus/pull/8410 will
provide more context.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add test for updating health on append fails

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-03-18 15:44:33 +01:00
jessicagreben 78e84aed89 resolve merge conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-02-24 09:47:29 -08:00
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
jessicagreben ac06d0a657 merge master/resolve conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-26 08:43:07 -08:00
jessicagreben 75654715d3 fix panics
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-01 07:54:04 -08:00
jessicagreben 6980bcf671 unexport backfiller
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-10-31 06:40:56 -07:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
jessicagreben 36ac0b68f1 merge master, fix conflicts 2020-10-17 08:20:21 -07:00
Łukasz Mierzwa 19c190b406
Add a rule_group_samples metric (#7977)
This new metric allows tracking how many samples did each rule group generate.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2020-09-25 16:48:38 +01:00
李国忠 4a52faf2ae
Unnecessary go routine spawn. (#7951)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-09-21 11:29:03 +01:00
jessicagreben dfa510086b add alignment, mv rule importer to promtool dir, add queryRange
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-09-13 08:07:59 -07:00
Anonymous 8219b442c8
rules: Remove redundant RLock to avoid double RLock (#7183)
Signed-off-by: BurtonQin <bobbqqin@gmail.com>
2020-09-07 18:58:21 +01:00
johncming 2f2a51a43a
web/api/v1: make names consistent. (#7841)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-25 11:38:06 +01:00
Goutham Veeramachaneni cb830b0a9c
Label rule_group_iterations metric with group name (#7823)
* Label rule_group_iterations metric with group name

evalTotal and evalFailures having the label but iterations not having it
is an odd mismatch.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Remove the metrics when a group is deleted.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Initialise the metrics

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2020-08-19 15:29:13 +02:00
johncming 362080ba28
rules: add evaluationTimestamp when copy state. (#7775)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-14 09:42:13 +01:00
Callum Styan c7a17f6491
Add additional tag/log to rules Manager Eval trace span. (#7708)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-08-06 08:42:20 -07:00
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Julien Pivotto e76c436e9c
Goleak in discoveries, scrape, rules (#7662)
* Add go leak tests for discoveries with goroutines

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Add go leak tests in rules

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Add go leak tests in scrape tests

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-27 09:38:08 +01:00
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
Owen Diehl 9ccedc0407
Arbitrary rule & group loading (#7569)
* allows loading rule groups via an interface

Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2020-07-22 15:19:34 +01:00
Julien Pivotto b83cbacbdd
Rule manager: remove blocking channel in mail (#7631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:13:24 +02:00
johncming a69a8b931f
rules: fix bug for unknown alert state. (#7599)
Signed-off-by: johncming <johncming@yahoo.com>
2020-07-17 08:39:15 +01:00
Guangwen Feng ad9449d1f1
Add unit test case for func HasAlertingRules in manager.go (#7502)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-07-06 10:35:16 +01:00
Guangwen Feng 487f1e07ff
Add unit test case for func State in alerting.go (#7476)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-06-29 13:16:52 +01:00
Frederic Branczyk d17d88935c
rules: Use narrower interface for rule manager loading of for state (#7472)
To load ALERT_FOR_STATE only `storage.Queryable` interface is required,
so this patch uses this narrower interface for to perform this.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2020-06-26 19:06:36 +01:00
Kemal Akkoyun 66dfb951c4
*: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251)
* Add errors and Warnings to SeriesSet

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Change Querier interface and refactor accordingly

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor promql/engine to propagate warnings at eval stage

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Make sure all the series from all Selects are pre-advanced

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Separate merge series sets

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor merge querier failure handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactored and simplified fanout with improvements from incoming chunk iterator PRs.

* Secondary logic is hidden, instead of weird failed series set logic we had.
* Fanout is well commented
* Fanout closing record all errors
* MergeQuerier improved API (clearer)
* deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false).

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix formatting

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix CI issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Added final tests for error handling.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

* Moved hints in populate to be allocated only when needed.
* Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic.
* Select after first Next is done will panic.

NOTE: in lazySeriesSet in theory we could just panic, I think however we can
totally just return error, it will panic in expand anyway.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Utilize errWithWarnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix recently introduced expansion issue

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add tests for secondary querier error handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Implement lazy merge

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add name to test cases

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Reorganize

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove redundant warnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix rebase mistake

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-09 17:57:31 +01:00
Brian Brazil 5368066b58
Give a bit more slack for alertmanager send failures. (#7228)
Fixes #5277

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-05-09 05:37:46 +01:00
Julien Pivotto fc3fb3265a
Merge pull request #7145 from prometheus/release-2.17
Backport release 2.17 into master
2020-04-20 14:08:12 +02:00
Chris Marchbanks a7b449320d
Fix updating rule manager never finishing (#7138)
Rather than sending a value to the done channel on a group to indicate
whether or not to add stale markers to a closing rule group use an
explicit boolean. This allows more functions than just run() to read
from the done channel and fixes an issue where Eval() could consume the
channel during an update, causing run() to never return.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-04-18 14:32:18 +02:00
ZouYu 2b7437d60e
Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109)
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
2020-04-15 11:17:41 +01:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Ben Ye 00730bfee7
add rule_group label to rule evaluation metrics (#7094)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-04-08 23:21:37 +02:00
Muhammad Falak R Wani 2d1a80aa82
rules: manager: clarify doc string for NewGroupMetrics (#7084)
* rules: manager: clarify doc string for NewGroupMetrics

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
2020-04-07 14:06:01 +02:00
Brian Brazil 7646cbca32
Use .UTC everywhere we use time.Unix (#7066)
time.Unix attaches the local timezone, which can then
leak out (e.g. in the alert json). While this is harmless,
we should be consistent.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-29 17:35:39 +01:00
Bartlomiej Plotka c4eefd1b3a storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-20 21:14:43 +01:00
Björn Rabenstein 1da83305be
Merge pull request #7009 from prometheus/release-2.17
Merge release-2.17 into master
2020-03-19 13:46:28 +01:00
Julien Pivotto 8907ba6235 Make TSDB use storage errors
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-17 22:24:25 +01:00
Björn Rabenstein d80b0810c1
Move crucial actions to defer (#6918)
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.

Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.

The current code should not have any bugs, but moving to defer helps
to avoid future bugs.

This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-13 20:54:47 +01:00
Bartlomiej Plotka fe802f29c9 storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-13 13:06:25 +00:00
Julien Pivotto 2051ae2e6a Revert "Fix race condition in Rule manager.Update() function"
This reverts commit 8b11d2cfb6.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-02 08:01:21 +01:00
fuling 8b11d2cfb6 Fix race condition in Rule manager.Update() function
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-01 12:00:51 +01:00
Tobias Guggenmos 4835bbf376
Merge branch 'master' into split_parser 2020-02-19 15:18:13 +01:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Tobias Guggenmos 6c00f2ffcb Comment fixes
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Tobias Guggenmos 20b1f596f6 Fix build errors in rest of prometheus
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Julien Pivotto 135cc30063
rules: Make deleted rule series as stale after a reload (#6745)
* rules: Make deleted rule series as stale after a reload

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-12 16:22:18 +01:00
Julien Pivotto 0e912faf4f
rules: Cleanup unused function alert.Duration (#6734)
The function HoldDuration and Duration did the exact same thing.

Let's only keep HoldDuration() as Duration() is more confusing.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-01 07:31:37 +00:00
Julien Pivotto f82d55e79f
Refactor and simplify rule_group_interval_seconds (#6711)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-29 11:26:08 +00:00
Julien Pivotto 9adad8ad30 Remove MaxConcurrent from the PromQL engine opts (#6712)
Since we use ActiveQueryTracker to check for concurrency in
d992c36b3a it does not make sense to keep
the MaxConcurrent value as an option of the PromQL engine.

This pull request removes it from the PromQL engine options, sets the
max concurrent metric to -1 if there is no active query tracker, and use
the value of the active query tracker otherwise.

It removes dead code and also will inform people who import the promql
package that we made that change, as it breaks the EngineOpts struct.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-28 20:38:49 +00:00
Julien Pivotto 56ebd5afde Delete prometheus_rule_group metrics when groups are removed (#6693)
* Delete prometheus_rule_group metrics when groups are removed

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 12:41:32 +00:00
Julien Pivotto cf42888e4d Fix order of testutil.Equals (#6695)
Equals takes the expected value as first parameter, and the actual value
as second parameter.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 12:21:59 +00:00
Julien Pivotto 5f27ac3583 Refactor query log fields (#6694)
* Refactor query log fields

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 09:53:10 +00:00
Harkishen Singh 84e6459c4d Adds support for line-column numbers for invalid rules, promtool (#6533)
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
2020-01-15 18:07:54 +00:00
Julien Pivotto 0011bba19b Query Log: add origin of the rules (#6592)
* Query Log: add origin of the rules

We don't set rule name and rule kind because the added value would be
quite low, given we have now the file, the group and the query.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-10 08:28:17 +00:00
Julien Pivotto e079c9ed45 manager: add full stops on comments
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-12-19 11:46:22 +01:00
alburthoffman 156dcb8cca avoid stopping rule groups if new rule groups are as same as old rule… (#6450)
* avoid stopping rule groups if new rule groups are as same as old rule groups

Signed-off-by: alburthoffman <alburthoffman@gmail.com>
2019-12-19 10:41:11 +00:00
Julien Pivotto 2d7c8069d0 Check that rules don't contain metrics with the same labelset (#6469)
Closes #5529

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-12-18 12:29:35 +00:00
Simon Pasquier 06066a3619
*: improve error messages when parsing bad rules (#5965)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-28 17:36:48 +02:00
Brian Brazil e62f30d497
Correctly handle empty labels from alert templates. (#5845)
Fixes https://github.com/prometheus/common/issues/36

Move logic handling this into the labels package,
so all the cases are handled in one place and we're
less likely to have this come up again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-08-13 11:19:17 +01:00
Chris Marchbanks 0685eb5395
Refactor testutil.NewStorage into a new package
This avoids a circular dependency between the testutil and storage
packages.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-08-08 19:43:04 -06:00
Brian Brazil 2184b79763
Mark deleted rule's series as stale on next evaluation. (#5759)
Fixes #5755

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-08-07 16:11:05 +01:00
AllenZMC 5c2c9a03e9 fix word 'consequentally' to 'consequently' (#5827)
Signed-off-by: czm <zhongming.chang@daocloud.io>
2019-08-03 14:56:58 +01:00
beorn7 dd81912554 Add objectives to Summaries
With the next release of client_golang, Summaries will not have
objectives by default. To not lose the objectives we have right now,
explicitly state the current default objectives.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-12 02:03:13 +02:00
pbhudiaBAE 43953b105b Sorting alerts by group name in /alerts (#5448)
* Working group name

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Working categorised by group name

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Changed group sorting in web

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Fixed group sorting and comments

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Fixed group sorting and comments with gofmt

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Added file and group name

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* reverted back to full path to yml file

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>
2019-05-14 23:14:27 +02:00
Yao Zengzeng 5544cb252a fix some mistakes in comments (#5533)
Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>
2019-05-05 10:48:42 +01:00
Simon Pasquier 45506841e6
*: enable all default linters (#5504)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 15:11:28 +02:00
Bjoern Rabenstein 76102d570c Add test for external labels in label template
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-21 00:08:39 +02:00
Bjoern Rabenstein 38d518c0fe Rework #5009 after comments
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Sylvain Rabot 335a34486e Add external labels to template expansion
This affects the expansion of templates in alert labels and
annotations and console templates.

Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2019-04-17 01:40:10 +02:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
James Ravn e15d8c5802 reload: copy state on both name and labels (#5368)
* reload: copy state on both name and labels

Fix https://github.com/prometheus/prometheus/issues/5193

Using just name causes the linked issue - if new rules are inserted with
the same name (but different labels), the reordering will cause stale
markers to be inserted in the next eval for all shifted rules, despite
them not being stale.

Ideally we want to avoid stale markers for time series that still exist
in the new rules, with name and labels being the unique identifer.

This change adds labels to the internal map when copying the old rule
data to the new rule data. This prevents the problem of staling rules
that simply shifted order.

If labels change, it is a new time series and the old series will stale
regardless. So it should be safe to always match on name and labels when
copying state.

Signed-off-by: James Ravn <james@r-vn.org>
2019-03-15 15:23:36 +00:00
David Symonds 46361a7c85 rules: Fix sorting of result from (*Manager).RuleGroups (#5260)
The previous code was defective in that it never sorted groups within a
file due to doing a multi-key sort incorrectly.

Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-02-23 09:51:44 +01:00
beorn7 2db1eeb4ec Fix prometheus_rule_group_last_evaluation_timestamp_seconds
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
Ganesh Vernekar 787eb1e904 Set rule_group_last_duration_seconds to seconds (#5153)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Matt Layher 302148fd69 *: apply gofmt -s
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Vishnunarayan K I fd3ef6ba34 Add metric rule_group_rules_loaded to get the number of rules loaded (#5090)
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-13 14:28:07 +00:00