Commit graph

793 commits

Author SHA1 Message Date
Jeanette Tan 0fccba0db9 Merge remote-tracking branch 'upstream/main' 2023-04-26 21:25:21 +08:00
Matthieu MOREL 7e9acc2e46
golangci-lint: remove skip-cache and restore singleCaseSwitch rule
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-20 18:43:51 +02:00
Julien Pivotto f7c6130ff2
Merge pull request #12251 from prymitive/query_samples_total
Add query_samples_total metric
2023-04-20 15:48:24 +02:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Đurica Yuri Nikolić d162bb51b6
Merge pull request #485 from grafana/yuri/bring-prometheus-upstream
Synch with Prometheus up to 2023-04-18 (b028112)
2023-04-18 15:07:26 +02:00
Ben Ye fd3630b9a3 add ctx to QueryEngine interface
Signed-off-by: Ben Ye <benye@amazon.com>
2023-04-17 21:32:38 -07:00
Jeanette Tan 8bf0d6f59c Merge branch 'main' into sync-upstream-14-apr-2023 2023-04-18 10:10:09 +08:00
Marco Pracucci c461e22341
Improve fast regexp matcher cache (#482)
* Limit FastRegexMatcher by size (bytes) and add a TTL to each entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add TestNewFastRegexMatcher_CacheSizeLimit

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Tolerate ristretto goroutines when checking goroutine leaks

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Tolerate ristretto goroutines when checking goroutine leaks

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2023-04-17 15:20:58 +02:00
Jeanette Tan 894f657c48 Fix bugs from merge 2023-04-14 18:23:02 +08:00
Jeanette Tan 1570114ae1 Merge remote-tracking branch 'upstream/main' 2023-04-14 17:34:40 +08:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
beorn7 551de0346f promql: Do not return nil slices to the pool
Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
beorn7 817a2396cb Name float values as "floats", not as "values"
In the past, every sample value was a float, so it was fine to call a
variable holding such a float "value" or "sample". With native
histograms, a sample might have a histogram value. And a histogram
value is still a value. Calling a float value just "value" or "sample"
or "V" is therefore misleading. Over the last few commits, I already
renamed many variables, but this cleans up a few more places where the
changes are more invasive.

Note that we do not to attempt naming in the JSON APIs or in the
protobufs. That would be quite a disruption. However, internally, we
can call variables as we want, and we should go with the option of
avoiding misunderstandings.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
beorn7 c0879d64cf promql: Separate Point into FPoint and HPoint
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the one that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
Oleg Zaytsev 4086a5f042 Merge branch 'main' into prometheus-2023-04-03-3923e83 2023-04-13 09:15:24 +02:00
Łukasz Mierzwa b6573353c1 Add query_samples_total metric
query_samples_total is a counter that tracks the total number of samples loaded by all queries.

The goal with this metric is to be able to see the amount of 'work' done by Prometheus to service queries.
At the moment we have metrics with the number of queries, plus more detailed metrics showing how much time each step of a query takes.
While those metrics do help they don't show us the whole picture.
Queries that do load more samples are (in general) more expensive than queries that do load fewer samples.
This means that looking only at the number of queries doesn't tell us how much 'work' Prometheus received.
Adding a counter that tracks the total number of samples loaded allows us to see if there was a spike in the cost of queries, not just the number of them.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-04-12 14:05:06 +01:00
Ganesh Vernekar 5588cab8b2
Merge pull request #12173 from bboreham/builder-no-empty-labels
labels: simplify call to get Labels from Builder
2023-04-04 12:02:55 +05:30
Bryan Boreham 1bb6b8b309
Merge pull request #12190 from bboreham/faster-topk
promql: use faster heap method for topk/bottomk
2023-03-30 14:05:53 +01:00
Oleg Zaytsev 6e2905a4d4
Use zeropool.Pool to workaround SA6002 (#12189)
* Use zeropool.Pool to workaround SA6002

I built a tiny library called https://github.com/colega/zeropool to
workaround the SA6002 staticheck issue.

While searching for the references of that SA6002 staticheck issues on
Github first results was Prometheus itself, with quite a lot of ignores
of it.

This changes the usages of `sync.Pool` to `zeropool.Pool[T]` where a
pointer is not available.

Also added a benchmark for HeadAppender Append/Commit when series
already exist, which is one of the most usual cases IMO, as I didn't find
any.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Improve BenchmarkHeadAppender with more cases

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* A little copying is better than a little dependency

https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix imports order

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add license header

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Copyright should be on one of the first 3 lines

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Use require.Equal for testing

I don't depend on testify in my lib, but here we have it available.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Avoid flaky test

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Also use zeropool for pointsPool in engine.go

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-03-29 20:34:34 +01:00
Bryan Boreham f2fd85df82 promql: use faster heap method for topk/bottomk
Call `Fix()` instead of `Pop()` followed by `Push()`.

This is slightly faster.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-28 11:07:31 +00:00
Bryan Boreham cf54a14f9c promql: add a benchmark for topk with k > 1
I picked k = 5.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-28 11:07:29 +00:00
Bryan Boreham 0679437825 promql/tests: couple more labels.FromStrings
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-27 17:41:30 +00:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Ganesh Vernekar 41649ceb1b
Merge remote-tracking branch 'upstream/main' into codesome/sync-prom
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-22 08:35:08 +05:30
Björn Rabenstein 847093479b
Merge pull request #11978 from trevorwhitney/set-counter-hint
Set `CounterResetHint` and use in recording rules
2023-03-14 21:52:41 +01:00
Trevor Whitney dd94ebb87b
promql: set CounterResetHint after rate and sum
Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>
2023-03-14 14:21:59 -06:00
Ganesh Vernekar 7e74f73733
Merge remote-tracking branch 'upstream/main' into sync-prom
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-13 12:38:59 +05:30
Bryan Boreham d21229b27a
Merge pull request #12101 from bboreham/disable-slow-promql-tests
promql: disable some slow cases in TestConcurrentRangeQueries
2023-03-09 11:08:12 +00:00
Marco Pracucci fa57574183
Merge pull request #449 from grafana/yuri/merge-upstream
Merging remotes/prometheus/main into origin/main
2023-03-09 11:17:40 +01:00
Marco Pracucci 242e82b8e6
Optimize regex star operation (#448)
* Optimize .* regex matcher

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Consistent benchmark runs for BenchmarkFastRegexMatcher

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed TestParseExpressions

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2023-03-09 09:38:41 +01:00
Yuri Nikolic 0eb15fce6d Merge remotes/prometheus/main into origin/main 2023-03-08 18:25:45 +01:00
Yuri Nikolic 13c2945af0 Fixing conflicts with commit 86d3d5fec6 2023-03-08 16:48:58 +01:00
Julien Pivotto 1fd59791e1 Update tests
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-08 16:32:39 +01:00
Bryan Boreham be4a9c25f0 promql: disable some slow cases in TestConcurrentRangeQueries
TestConcurrentRangeQueries runs many queries, up to 4 at the same time,
to try to expose any race conditions.
This change stops four of them from running with a thousand or more steps:

  `holt_winters(a_X[1d], 0.3, 0.3)`
  `changes(a_X[1d])`
  `rate(a_X[1d])`
  `absent_over_time(a_X[1d])`

Particularly when the test runs with `-race` in CI, this reduces the
time and resources required.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-08 14:28:30 +00:00
tyltr 24a9678dcc
typo 'efficcient' (#12090)
Signed-off-by: tylitianrui <tylitianrui@126.com>
2023-03-08 09:59:08 +00:00
Justin Lei af1d9e01c7
Refactor tsdbutil for tests/native histograms (#11948)
* Add float histograms to ChunkFromSamplesGeneric

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* Add Generate*Samples functions to tsdbutil

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* PR responses

Signed-off-by: Justin Lei <justin.lei@grafana.com>

---------

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-02-10 17:09:33 +05:30
Marco Pracucci 2461dee551
Merge remote-tracking branch 'remotes/prometheus/main' into update-upstream 2023-01-26 18:41:17 +01:00
Björn Rabenstein 60d763282e
Merge pull request #11864 from prometheus/beorn7/histogram2
histograms: Return actually useful counter reset hints
2023-01-26 11:22:40 +01:00
beorn7 1cfc8f65a3 histograms: Return actually useful counter reset hints
This is a bit more conservative than we could be. As long as a chunk
isn't the first in a block, we can be pretty sure that the previous
chunk won't disappear. However, the incremental gain of returning
NotCounterReset in these cases is probably very small and might not be
worth the code complications.

Wwith this, we now also pay attention to an explicitly set counter
reset during ingestion. While the case doesn't show up in practice
yet, there could be scenarios where the metric source knows there was
a counter reset even if it might not be visible from the values in the
histogram. It is also useful for testing.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-01-25 16:57:21 +01:00
György Krajcsovits e880b19c55 Merge remote-tracking branch 'upstream/main' into krajo/merge-upstream-main
At 9fb8fe0d4e
2023-01-25 08:53:49 +01:00
Bryan Boreham 9ae3572d24 TestConcurrentRangeQueries: log query with error
We've seen some timeouts in CI, and wanted to know what queries are
involved.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-01-19 16:01:28 +00:00
fayzal-g e320585436 Remove log 2023-01-16 17:47:03 +00:00
fayzal-g 026fcb2f86 Log failed query 2023-01-16 17:25:24 +00:00
fayzal-g 7aef6e28fe Merge remote-tracking branch 'upstream/main' into merge-jan-16-upstream 2023-01-16 15:24:00 +00:00
Ganesh Vernekar 57bcbf1888
Merge pull request #11783 from codesome/gauge-histogram
tsdb: Add gauge histogram support
2023-01-10 19:06:08 +05:30
Ganesh Vernekar 3c2ea91a83
tsdb: Test gauge float histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-10 18:35:37 +05:30
Ganesh Vernekar fd89d7892c
Merge pull request #11809 from bboreham/dont-sort-postings-values
tsdb: sort values for Postings only when required
2023-01-10 15:02:21 +05:30
György Krajcsovits 103c4fd289 Merge remote-tracking branch 'upstream/main' into main
# Conflicts:
#	.github/workflows/ci.yml
#	tsdb/block.go
#	tsdb/compact.go
#	tsdb/compact_test.go
#	tsdb/head_read.go
#	tsdb/index/index.go
#	tsdb/ooo_head_read.go
#	tsdb/querier_test.go
2023-01-08 14:55:44 +01:00