Commit graph

14103 commits

Author SHA1 Message Date
George Krajcsovits 9a8b6c52ca
Merge pull request #14313 from prometheus/merge-2.53-to-main
Merge 2.53 to main
2024-06-19 10:23:02 +02:00
György Krajcsovits fcabffb999 Merge branch 'release-2.53' into merge-2.53-to-main 2024-06-19 10:06:57 +02:00
machine424 70beda092a fix(notifier): take alertmanagerSet.mtx before checking alertmanagerSet.ams in sendAll
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-06-19 09:43:52 +02:00
machine424 690de487e2 chore(notifier): Split 'Run()' into two goroutines: one to receive target updates and trigger reloads and the other one to send notifications.
This prevents the latter operation from blocking or starving the former. Previously, the `tsets` channel was consumed by the same goroutine that consumes and feeds the buffered `n.more` channel, so the `tsets` channel was less likely to be ready, as it's unbuffered and only fed every `SDManager.updatert` seconds.

See https://github.com/prometheus/prometheus/issues/13676 and https://github.com/prometheus/prometheus/issues/8768

The synchronization with the sendLoop goroutine is managed through the n.mtx mutex.

This uses an approach similar to the scrape manager's efbd6e41c5/scrape/manager.go (L115-L117)

The old TestHangingNotifier was replaced by the new one to more closely reflect reality.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-06-19 09:43:52 +02:00
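The split described in 690de487e2 can be sketched roughly as follows. This is a minimal illustration, not the notifier's actual code: the `notifier` type, `targetLoop`, `sendLoop`, and `run` are hypothetical names, and "sending" is reduced to incrementing a counter. One goroutine consumes the unbuffered update channel, another drains the buffered queue signal, and shared state is guarded by a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical stand-in for the real notifier manager.
type notifier struct {
	mtx     sync.RWMutex
	targets []string      // guarded by mtx
	more    chan struct{} // buffered: signals that notifications are queued
	sent    int           // guarded by mtx
}

// targetLoop consumes target updates in its own goroutine, so a busy
// sendLoop cannot keep the updates channel from being read.
func (n *notifier) targetLoop(tsets <-chan []string) {
	for ts := range tsets {
		n.mtx.Lock()
		n.targets = ts
		n.mtx.Unlock()
	}
}

// sendLoop drains the queue signal in a separate goroutine.
func (n *notifier) sendLoop() {
	for range n.more {
		n.mtx.Lock()
		n.sent++ // stands in for sending a batch to n.targets
		n.mtx.Unlock()
	}
}

// run starts both loops, feeds them, and waits for both to finish.
func run(updates [][]string, alerts int) ([]string, int) {
	n := &notifier{more: make(chan struct{}, alerts)}
	tsets := make(chan []string) // unbuffered, like the channel in the commit message

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); n.targetLoop(tsets) }()
	go func() { defer wg.Done(); n.sendLoop() }()

	for i := 0; i < alerts; i++ {
		n.more <- struct{}{}
	}
	for _, u := range updates {
		tsets <- u
	}
	close(tsets)
	close(n.more)
	wg.Wait()
	return n.targets, n.sent
}

func main() {
	targets, sent := run([][]string{{"am-1"}, {"am-1", "am-2"}}, 3)
	fmt.Println(targets, sent)
}
```

Because each channel has its own consumer, a full notification queue no longer delays the application of target updates in this sketch.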
machine424 94d28cd6cf chore(notifier): add a reproducer for https://github.com/prometheus/prometheus/issues/13676
to show "targets groups update" starvation when the notifications queue is full and an Alertmanager
is down.

The existing `TestHangingNotifier` that was added in https://github.com/prometheus/prometheus/pull/10948 doesn't really reflect reality: the SD changes are fed manually and continuously into `syncCh`, whereas in reality updates are only resent every `updatert`.

The test added here sets up an SD manager and links it to the notifier. The SD changes will be triggered by that manager as it's done in reality.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

Co-authored-by: Ethan Hunter <ehunter@hudson-trading.com>
2024-06-19 09:43:52 +02:00
anarcat 545d31f184
docs: clarify backup requirements for storage (#14297)
* clarify backup requirements for storage

After reading this (again) recently, I was under the impression that our backup strategy ("just throw Bacula at it") was just not good enough and that our backups were inconsistent. I filed [an issue internally][41627] about this because of that concern.

But reading a conversation with @SuperQ on IRC, I came under the impression that only the WAL files would be lost. This is an attempt at documenting this more clearly.

[41627]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41627
---------

Signed-off-by: anarcat <anarcat@users.noreply.github.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
2024-06-19 07:46:13 +02:00
Arve Knudsen be975bf8d7 golangci-lint: Enable loggercheck linter
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-06-18 20:41:26 +02:00
Björn Rabenstein b6ef745016
Merge pull request #14305 from charleskorn/charleskorn/convert-range-query-tests
promql: Convert more test cases to test scripting language
2024-06-18 17:27:55 +02:00
Björn Rabenstein d968408f51
Merge branch 'main' into charleskorn/convert-range-query-tests 2024-06-18 17:11:57 +02:00
George Krajcsovits d3318c21a3
Merge pull request #14287 from krajorama/nhcb-suggest-fix
native histograms: only reduce resolution for exponential histograms
2024-06-18 16:50:30 +02:00
György Krajcsovits 79020b1e85 Comment float histogram as well
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-18 15:22:03 +02:00
György Krajcsovits c309f50ee7 Add comment to state intent of check
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-18 15:21:17 +02:00
George Krajcsovits 4c35b9250a
Merge pull request #14303 from prometheus/prepare-2.53.0-release
Prepare 2.53.0 release
2024-06-18 15:08:14 +02:00
Rens Groothuijsen 1c3f322f78
docs: mention implicitly watched directories in documentation (#14019)
* docs: mention implicitly watched directories in documentation

Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>

* Add mention of atomic file renaming

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>

---------

Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
2024-06-18 13:51:47 +02:00
George Krajcsovits 29d3e48267
Update CHANGELOG.md
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-06-18 13:45:53 +02:00
Oleg Zaytsev fd1a89b7c8
Pass affected labels to MemPostings.Delete() (#14307)
* Pass affected labels to MemPostings.Delete

As suggested by @bboreham, we can track the labels of the deleted series
and avoid iterating through all the label/value combinations.

This looks much faster on the MemPostings.Delete call. We don't have a
benchmark on stripeSeries.gc() where we'll pay the price of iterating
the labels of each one of the deleted series.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-18 10:28:56 +00:00
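The idea in fd1a89b7c8 can be illustrated with a simplified sketch. The `memPostings` type, the `label` struct, and `deleteSeries` below are hypothetical and omit the locking of the real implementation. The point is only that the caller passes the label pairs of the deleted series, so the deletion visits those entries instead of every label/value combination.

```go
package main

import "fmt"

// label is a hypothetical name/value pair used as an index key.
type label struct{ name, value string }

// memPostings is a simplified, lock-free stand-in for an in-memory postings index.
type memPostings struct {
	m map[label][]uint64 // label pair -> sorted series IDs
}

// deleteSeries removes the deleted IDs, visiting only the label pairs listed
// in affected rather than scanning the whole index.
func (p *memPostings) deleteSeries(deleted map[uint64]struct{}, affected map[label]struct{}) {
	for l := range affected {
		old := p.m[l]
		kept := make([]uint64, 0, len(old))
		for _, id := range old {
			if _, gone := deleted[id]; !gone {
				kept = append(kept, id)
			}
		}
		if len(kept) == 0 {
			delete(p.m, l) // drop entries that no longer reference any series
			continue
		}
		p.m[l] = kept
	}
}

func main() {
	p := &memPostings{m: map[label][]uint64{
		{"job", "a"}: {1, 2},
		{"job", "b"}: {3},
		{"env", "x"}: {2},
	}}
	deleted := map[uint64]struct{}{1: {}, 3: {}}
	// Only the labels carried by series 1 and 3 need to be visited.
	affected := map[label]struct{}{{"job", "a"}: {}, {"job", "b"}: {}}
	p.deleteSeries(deleted, affected)
	fmt.Println(len(p.m), p.m[label{"job", "a"}])
}
```

The trade-off mentioned in the commit message still applies: the caller has to collect the labels of each deleted series, which this sketch takes as a given input.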
Oleg Zaytsev 4f78cc809c
Refactor toNormalisedLower: shorter and slightly faster. (#14299)
Refactor toNormalisedLower: shorter and slightly faster

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-18 09:57:37 +00:00
Julien 6572b1fe63
Merge pull request #14306 from pracucci/export-tolabelmatchers
Export remote.ToLabelMatchers()
2024-06-18 11:23:08 +02:00
Marco Pracucci 0fbf4a2529
Export remote.ToLabelMatchers()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-06-17 10:40:45 +02:00
Charles Korn aeec30f082
Convert TestTimestampFunction_StepsMoreOftenThanSamples
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-06-17 16:56:56 +10:00
Charles Korn 987fa5c6a2
Convert range query test cases to test scripting language
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-06-17 16:43:01 +10:00
George Krajcsovits 5efc8dd27b
Merge pull request #14302 from yeya24/fix-check-ctx-cancel-count
fix check context cancellation not incrementing count
2024-06-17 08:36:56 +02:00
Ben Ye 0e6fca8e76 add unit test
Signed-off-by: Ben Ye <benye@amazon.com>
2024-06-16 12:09:42 -07:00
György Krajcsovits e121d07388 Prepare release 2.53.0
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-16 10:24:09 +02:00
Ben Ye e7db2e30a4 fix check context cancellation not incrementing count
Signed-off-by: Ben Ye <benye@amazon.com>
2024-06-15 11:43:26 -07:00
Oleg Zaytsev 4c1e71fa0b
Reduce the flakiness of TestAsyncRuleEvaluation (#14300)
* Reduce the flakiness of TestAsyncRuleEvaluation

This test sleeps for 15 milliseconds per rule group, and then compares
the entire execution time to be smaller than a multiple of that delay.

The ruleCount is 6, so it assumes that the test will come to the
assertions in less than 90ms.

Meanwhile, GitHub's Windows runner:
- ...Huh, oh? What? How much time? milliwhat? Sorry I don't speak that.

TL;DR, this increases the delay to 250 milliseconds. This won't prevent
the test from being flaky, but will reduce the flakiness by several
orders of magnitude and hopefully won't be an issue anymore.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Make tests parallel

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-14 15:02:46 +02:00
Arve Knudsen b7320ef636 Merge remote-tracking branch 'prometheus/main' into arve/close-engine 2024-06-14 10:51:35 +02:00
Oleg Zaytsev 03cf6141d4
Fix Matcher.String() with empty label name
When the label name is empty, which can happen now with quoted label
names, it should be quoted when printed as a string again.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-13 18:46:35 +02:00
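A rough sketch of the behaviour described in 03cf6141d4, not the actual `Matcher.String()` code: the helpers `needsQuoting` and `matcherString` are hypothetical, and the rule for which names can be printed bare is an assumption. The relevant case is the empty name, which must be quoted so the printed form can be parsed again.

```go
package main

import (
	"fmt"
	"strconv"
)

// needsQuoting reports whether name must be quoted when printed.
// Assumption: a name can be printed bare only if it is non-empty, uses ASCII
// letters, digits, and underscores, and does not start with a digit.
func needsQuoting(name string) bool {
	if name == "" {
		return true
	}
	for i, r := range name {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == '_'
		isDigit := r >= '0' && r <= '9'
		if !isLetter && !(isDigit && i > 0) {
			return true
		}
	}
	return false
}

// matcherString renders a matcher, quoting the label name when required.
func matcherString(name, op, value string) string {
	if needsQuoting(name) {
		name = strconv.Quote(name)
	}
	return name + op + strconv.Quote(value)
}

func main() {
	fmt.Println(matcherString("job", "=", "api"))
	fmt.Println(matcherString("", "=", "api"))
}
```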
Ben Ye 5a218708f1
tsdb: Extend compactor interface to allow compactions to create multiple output blocks (#14143)
* add hook to allow head compaction to create multiple output blocks

Signed-off-by: Ben Ye <benye@amazon.com>

* change Compact interface; remove BlockPopulator changes

Signed-off-by: Ben Ye <benye@amazon.com>

* rebase main

Signed-off-by: Ben Ye <benye@amazon.com>

* fix lint

Signed-off-by: Ben Ye <benye@amazon.com>

* fix unit test

Signed-off-by: Ben Ye <benye@amazon.com>

* address feedback; add unit test

Signed-off-by: Ben Ye <benye@amazon.com>

* Apply suggestions from code review

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Update tsdb/compact_test.go

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2024-06-12 17:31:25 -04:00
Sebastian Rabenhorst 05380aa0ac
agent db: make rejecting ooo samples configurable (#14094)
feat: Make OOO ingestion time window configurable for Prometheus Agent.

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
2024-06-12 11:07:42 -03:00
Oleg Zaytsev 64a9abb8be
Change LabelValuesFor() to accept index.Postings (#14280)
The only call we have to LabelValuesFor() has an index.Postings, and we
expand it to pass to this method, which will iterate over the values.

That's a waste of resources: we can iterate on the index.Postings
directly.

If there's any downstream implementation that has a slice of series,
they can always do an index.ListPostings from them: doing that is
cheaper than expanding an abstract index.Postings.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-11 15:36:46 +02:00
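The argument in 64a9abb8be can be sketched with a reduced iterator interface. The `postings` interface here is a simplified, hypothetical version with only `Next` and `At`, and `labelValuesFor` takes a plain map in place of a real index. It shows the two directions the message compares: consuming an iterator directly, and cheaply wrapping a slice as an iterator.

```go
package main

import "fmt"

// postings is a simplified, hypothetical iterator over series IDs.
type postings interface {
	Next() bool
	At() uint64
}

// listPostings wraps a slice as an iterator, which needs no copying.
type listPostings struct {
	ids []uint64
	cur int
}

func newListPostings(ids ...uint64) *listPostings {
	return &listPostings{ids: ids, cur: -1}
}

func (p *listPostings) Next() bool { p.cur++; return p.cur < len(p.ids) }
func (p *listPostings) At() uint64 { return p.ids[p.cur] }

// labelValuesFor collects the distinct values of one label across the series
// in p, without first expanding p into a slice. valueOf maps a series ID to
// its value for that label.
func labelValuesFor(p postings, valueOf map[uint64]string) []string {
	seen := map[string]struct{}{}
	var out []string
	for p.Next() {
		v, ok := valueOf[p.At()]
		if !ok {
			continue // series does not carry the label
		}
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	valueOf := map[uint64]string{1: "a", 2: "b", 3: "a"}
	fmt.Println(labelValuesFor(newListPostings(1, 2, 3, 4), valueOf))
}
```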
George Krajcsovits 604287400c
Merge pull request #14284 from prometheus/prepare-2.53.rc.1
Prepare 2.53.rc.1
2024-06-11 14:01:40 +02:00
György Krajcsovits dd8676218b Prepare 2.53.0-rc.1 release
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-11 12:56:32 +02:00
György Krajcsovits 4cfec57606 Revert "Update changelog due to pr 14273"
This reverts commit dd44001465.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-11 12:55:02 +02:00
Ben Kochie 4c4c2be6dd
Merge pull request #14288 from prometheus/superq/pick_14285
Tune default GOGC
2024-06-11 11:58:10 +02:00
SuperQ 38bf349ff7
Update changelog for GOGC tuning
Include #14285 in changelog.

Signed-off-by: SuperQ <superq@gmail.com>
2024-06-11 11:37:32 +02:00
SuperQ 6ccee2c4a5
Tune default GOGC
Adjust the default GOGC value to 75. This is less of a memory savings,
but has less impact on CPU use.

Signed-off-by: SuperQ <superq@gmail.com>
2024-06-11 11:17:33 +02:00
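For readers unfamiliar with the setting, this is roughly how a Go program can apply a GC percentage of 75 by default. It is a sketch under assumptions, not how Prometheus wires its configuration: `chooseGCPercent` is a hypothetical helper, and the rule that an explicit `GOGC` environment value wins over the default is assumed. `debug.SetGCPercent` is the standard library call.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
)

const defaultGCPercent = 75 // value named in the commit message

// chooseGCPercent returns the GC percentage to apply. Assumption: an
// explicit, valid GOGC value from the environment wins over the default.
func chooseGCPercent(envValue string, envSet bool) int {
	if envSet {
		if envValue == "off" {
			return -1 // a negative percentage disables the collector
		}
		if v, err := strconv.Atoi(envValue); err == nil {
			return v
		}
	}
	return defaultGCPercent
}

func main() {
	v, set := os.LookupEnv("GOGC")
	percent := chooseGCPercent(v, set)
	previous := debug.SetGCPercent(percent)
	fmt.Printf("GC percent: %d (was %d)\n", percent, previous)
}
```

A lower percentage triggers collection after less heap growth, which trades CPU time for a smaller heap; 75 sits between more aggressive values and Go's default of 100.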
György Krajcsovits 0793a26d96 native histograms: only reduce resolution for exponential histograms
Currently we can only reduce the resolution of exponential native
histograms, so checking the schema for that is slightly more precise
than checking against max schema.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-11 11:17:14 +02:00
Ben Kochie 7f0caf7229
Merge pull request #14285 from prometheus/superq/tune_gogc
Tune default GOGC
2024-06-11 11:14:50 +02:00
George Krajcsovits 23cca90cac
Merge pull request #14277 from zenador/nhcb-review
[nhcb branch] address review comments and merge with main
2024-06-11 10:55:02 +02:00
Julien b31aaa6f8a
Merge pull request #14282 from parnavh/fix-broken-link
fix: broken link on github mobile
2024-06-11 10:12:42 +02:00
SuperQ ea2b39a31e
Tune default GOGC
Adjust the default GOGC value to 75. This is less of a memory savings,
but has less impact on CPU use.

Signed-off-by: SuperQ <superq@gmail.com>
2024-06-11 03:44:06 +02:00
Ranveer Avhad 39902ba694
[BUGFIX] FastRegexpMatcher: do Unicode normalization as part of case-insensitive comparison (#14170)
* Converted string to standardized form
* Added golang.org/x/text in Go dependencies
* Added test cases for FastRegexMatcher
* Added benchmark for toNormalizedLower

Signed-off-by: RA <ranveeravhad777@gmail.com>
2024-06-10 18:31:41 -04:00
Bryan Boreham 64c5cc5134
Merge pull request #14209 from bboreham/api-error-url
[ENHANCEMENT] HTTP API: Add url to errors logged while sending response
2024-06-11 01:09:11 +03:00
Bryan Boreham c5d923aa7c
Merge pull request #14279 from colega/fix-label-names-for-not-found
headIndexReader.LabelNamesFor: skip not found series
2024-06-11 01:06:19 +03:00
Sergey 5a5a6f08ef
chore: use HumanizeDuration from prometheus/common (#14202)
* chore: use HumanizeDuration from prometheus/common

Signed-off-by: Sergey <freak12techno@gmail.com>

* chore: fixed linting

Signed-off-by: Sergey <freak12techno@gmail.com>

* chore: review fixes

---------

Signed-off-by: Sergey <freak12techno@gmail.com>
2024-06-10 20:40:11 +02:00
Rens Groothuijsen 19fd5212c3
docs: clarify default Docker command line parameters (#14194)
* docs: clarify default Docker command line parameters

Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>

* docs: move Docker command line parameters section and refer to Dockerfile

Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>

* Add link to Dockerfile in documentation

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>

---------

Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2024-06-10 20:16:02 +02:00
Oleg Zaytsev 10a3c7220b
MemPostings.PostingsForLabelMatching(): don't hold the mutex while matching (#14286)
* MemPostings.PostingsForLabelMatching: let mutex go

This changes the `MemPostings.PostingsForLabelMatching` implementation
to stop holding the read mutex while matching the label values.

We've seen that this method can be slow when the matcher is expensive,
that's why we even added a context expiration check.

However, there are critical processes that might be waiting on this mutex:
writes (adding new series) and compaction (deleting the
garbage-collected ones), so we should avoid holding it for a long period
of time.

Given that we've copied the values to a slice anyway, there's no need to
hold the lock while matching.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-10 14:24:17 +02:00
Oleg Zaytsev 2dc177d8af
MemPostings.Delete(): reduce locking/unlocking (#13286)
* MemPostings: reduce locking/unlocking

MemPostings.Delete is called from Head.gc(), i.e. it gets the IDs of the
series that have churned.

I'd assume that many label values aren't affected by that churn at all,
so it doesn't make sense to touch the lock while checking them.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-10 14:23:22 +02:00
Björn Rabenstein 08621bebe9
Merge pull request #14269 from prometheus/beorn7/histogram-test
promql: Add tests for histogram counter reset only in bucket
2024-06-08 16:59:47 +02:00