Commit graph

13242 commits

Author SHA1 Message Date
gotjosh 37b408c6cd
Feature: Allow configuration of a rule evaluation delay (#14061)
* [PATCH] Allow having evaluation delay for rule groups

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Move the option to ManagerOptions

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Include evaluation_delay in the group config

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix comments

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add a server configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Appease the linter #1

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add the new server flag documentation

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve documentation of the new flag and configuration

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Use named parameters for clarity on the `Rule` interface

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `initial` to the flag help

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Change the CHANGELOG area from `ruler` to `rules`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename evaluation_delay to `rule_query_offset`/`query_offset` and make it a global configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* more docs

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve wording on CHANGELOG

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `RuleQueryOffset` to the default config in tests in case it changes

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Update docs/configuration/recording_rules.md

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename `RuleQueryOffset` to `QueryOffset` when in the group context.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve docstring and documentation on the `rule_query_offset`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
2024-05-30 11:49:50 +01:00
Bryan Boreham 3ee52abb53 [ENHANCEMENT] TSDB: Save map lookup on validation
Goes faster.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-30 09:17:11 +01:00
Bryan Boreham 7d98487447 [ENHANCEMENT] TSDB: let Resize re-use buffer
This saves having to zero the buffer every time.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-30 09:17:11 +01:00
Bryan Boreham c0bb156eca [ENHANCEMENT] TSDB: Eliminate pointer when storing exemplars
Saves memory and effort.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-30 09:17:11 +01:00
Bryan Boreham 3eb5581877 [ENHANCEMENT] TSDB: Reduce map lookups on exemplar index
In many cases we already have a pointer to the entry.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-30 09:17:11 +01:00
Bryan Boreham f0c50b5a66 [Test] TSDB: BenchmarkResizeExemplar multiple per series
One exemplar per series is not a typical workload. Make it the same as
`BenchmarkAddExemplar`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-30 09:17:11 +01:00
Bryan Boreham 929fbf860e [Test] TSDB: let BenchmarkAddExemplar reuse slots
Test with different amounts of capacity and exemplars, so that sometimes
new exemplars are evicting older exemplars.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-30 09:16:30 +01:00
Ben Ye 6683895620
optimize regex matching for empty label values in posting match (#14075)
Also update tests.

Signed-off-by: Ben Ye <benye@amazon.com>
2024-05-29 16:03:33 +01:00
Arthur Silva Sens 3c1aadd942
Prepare v2.52.1 release
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-05-29 10:41:52 -03:00
Arthur Silva Sens 296dd12ff4
Merge pull request #14141 from dandrucz/LinodeListOptsFix
Linode: bugfix, resolves partial fetch problem in 2.52 when service discovery would return more than 500 elements
2024-05-29 10:36:38 -03:00
Ben Kochie a6316a5dcb
Merge pull request #14148 from prometheus/superq/more_go_metrics
Enable additional Go metrics
2024-05-28 11:35:41 +02:00
SuperQ 25b0991c3d
Enable additional Go metrics
Enable some additional Go runtime metrics in order to observe additional
performance data.

Enables a number of new metrics:
```
HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
HELP go_gc_cycles_total_gc_cycles_total Count of all completed GC cycles.
HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function.
HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function.
HELP go_gc_heap_allocs_by_size_bytes Distribution of heap allocations by approximate size. Bucket counts increase monotonically. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
HELP go_gc_heap_allocs_bytes_total Cumulative sum of memory allocated to the heap by the application.
HELP go_gc_heap_allocs_objects_total Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
HELP go_gc_heap_frees_by_size_bytes Distribution of freed heap allocations by approximate size. Bucket counts increase monotonically. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
HELP go_gc_heap_frees_bytes_total Cumulative sum of heap memory freed by the garbage collector.
HELP go_gc_heap_frees_objects_total Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
HELP go_gc_heap_goal_bytes Heap size target for the end of the GC cycle.
HELP go_gc_heap_live_bytes Heap memory occupied by live objects that were marked by the previous GC.
HELP go_gc_heap_objects_objects Number of objects, live or unswept, occupying heap memory.
HELP go_gc_heap_tiny_allocs_objects_total Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size.
HELP go_gc_limiter_last_enabled_gc_cycle GC cycle the last time the GC CPU limiter was enabled. This metric is useful for diagnosing the root cause of an out-of-memory error, because the limiter trades memory for CPU time when the GC's CPU time gets too high. This is most likely to occur with use of SetMemoryLimit. The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled.
HELP go_gc_pauses_seconds Deprecated. Prefer the identical /sched/pauses/total/gc:seconds.
HELP go_gc_scan_globals_bytes The total amount of global variable space that is scannable.
HELP go_gc_scan_heap_bytes The total amount of heap space that is scannable.
HELP go_gc_scan_stack_bytes The number of bytes of stack that were scanned last GC cycle.
HELP go_gc_scan_total_bytes The total amount space that is scannable. Sum of all metrics in /gc/scan.
HELP go_gc_stack_starting_size_bytes The stack size of new goroutines.
HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
HELP go_sched_goroutines_goroutines Count of live goroutines.
HELP go_sched_latencies_seconds Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. Bucket counts increase monotonically.
HELP go_sched_pauses_stopping_gc_seconds Distribution of individual GC-related stop-the-world stopping latencies. This is the time it takes from deciding to stop the world until all Ps are stopped. This is a subset of the total GC-related stop-the-world time (/sched/pauses/total/gc:seconds). During this time, some threads may be executing. Bucket counts increase monotonically.
HELP go_sched_pauses_stopping_other_seconds Distribution of individual non-GC-related stop-the-world stopping latencies. This is the time it takes from deciding to stop the world until all Ps are stopped. This is a subset of the total non-GC-related stop-the-world time (/sched/pauses/total/other:seconds). During this time, some threads may be executing. Bucket counts increase monotonically.
HELP go_sched_pauses_total_gc_seconds Distribution of individual GC-related stop-the-world pause latencies. This is the time from deciding to stop the world until the world is started again. Some of this time is spent getting all threads to stop (this is measured directly in /sched/pauses/stopping/gc:seconds), during which some threads may still be running. Bucket counts increase monotonically.
HELP go_sched_pauses_total_other_seconds Distribution of individual non-GC-related stop-the-world pause latencies. This is the time from deciding to stop the world until the world is started again. Some of this time is spent getting all threads to stop (measured directly in /sched/pauses/stopping/other:seconds). Bucket counts increase monotonically.
```

Signed-off-by: SuperQ <superq@gmail.com>
2024-05-28 10:53:04 +02:00
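The metric names listed in the commit above are derived from Go's `runtime/metrics` package. As a minimal sketch, assuming `go_sched_goroutines_goroutines` corresponds to the runtime sample `/sched/goroutines:goroutines`, the underlying value can be read directly with the standard library. The helper name `readGoroutineCount` is hypothetical and is not part of Prometheus.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readGoroutineCount is a hypothetical helper that reads the runtime sample
// assumed to back go_sched_goroutines_goroutines.
func readGoroutineCount() (uint64, bool) {
	samples := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(samples)

	// An unsupported metric name yields KindBad rather than a panic.
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	n, ok := readGoroutineCount()
	fmt.Println("live goroutines:", n, "supported:", ok)
}
```

Checking `Kind()` before reading the value keeps the sketch safe on Go versions that do not expose a given sample.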
Matthieu MOREL 013998fa7f
Bump golangci-lint action (#14154)
* Bump golangci-lint action to 6.0.1
* Synchronize script/golangci-lint.yml and workflows/ci.yml

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-05-28 10:06:08 +02:00
Ben Kochie 1971a584ff
Merge pull request #14153 from aknuds1/arve/upgrade-linter
Upgrade to golangci-lint v1.59.0
2024-05-28 07:57:56 +02:00
Arve Knudsen b2396c0c8f Upgrade to golangci-lint v1.59.0
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-05-27 22:38:48 +02:00
Arve Knudsen 707e9d917e
Merge pull request #14000 from aknuds1/arve/query-logger-munmap
promql.ActiveQueryTracker: Unmap mmapped file when done
2024-05-27 21:45:07 +02:00
David Andruczyk 851f68d1cc BUGFIX: Need separate listOptions structs since linodego writes into them for pagination
Signed-off-by: David Andruczyk <dandrucz@akamai.com>
2024-05-27 17:19:58 +00:00
Simon Pasquier e6f1f7e32d
docs/configuration: clarify OpenStack metadata labels (#14149)
On several occasions, users assumed that the
`__meta_openstack_tag_<key>` labels were about tags [1] instead of
metadata [2]. While we can't really change the Prometheus label name, we
can at least clarify in the documentation what's the information carried
in the label.

[1] https://specs.openstack.org/openstack/api-wg/guidelines/tags.html
[2] https://docs.openstack.org/api-ref/compute/#server-metadata-servers-metadata

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-05-27 18:25:02 +02:00
Arve Knudsen f3b8750339 Join errors
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-05-27 17:14:17 +02:00
Arve Knudsen 7b56353090 Merge remote-tracking branch 'prometheus/main' into arve/query-logger-munmap 2024-05-27 17:08:33 +02:00
Alan Protasio 8894d65cd6
Fix head stats and hooks when replaying a corrupted snapshot (#14079)
* Fixing head stats and hooks when replaying a corrupted snapshot

Signed-off-by: alanprot <alanprot@gmail.com>

* Fixing create/removed series metrics

Signed-off-by: alanprot <alanprot@gmail.com>

* Refactoring to have common code between gc and flush method

Signed-off-by: alanprot <alanprot@gmail.com>

* Update tsdb/head.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

* refactor

Signed-off-by: alanprot <alanprot@gmail.com>

* Update tsdb/head_test.go

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Update tsdb/head_test.go

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2024-05-24 22:43:21 -04:00
Arthur Silva Sens c0221d9739
Merge pull request #14130 from mohamedawnallah/fetchGoVersionGitpodDockerFile
.gitpod.Dockerfile: Auto-fetch Go and goyacc versions
2024-05-24 10:35:51 -03:00
Julien a895265c50
Merge pull request #12339 from mickael-carl/mcarl/lint
Document running golangci-lint and make it work on arm64
2024-05-24 14:59:15 +02:00
Julien 0512ebf9da
Merge branch 'main' into mcarl/lint
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
2024-05-24 14:56:51 +02:00
Mohamed Awnallah 5be753f177
.gitpod.Dockerfile: Auto-fetch Go and goyacc vers
In this commit we auto-fetch Go version from go.mod
and goyacc version from Makefile in the Prometheus repo.

Signed-off-by: Mohamed Awnallah <mohamedmohey2352@gmail.com>
2024-05-24 09:17:18 +03:00
Björn Rabenstein 1081e336a0
Merge pull request #14129 from prometheus/beorn7/doc
doc: Clarify the limits of dumping/backfilling via OpenMetrics
2024-05-23 13:37:42 +02:00
Jayapriya Pai 2d2b440304
fix: correct the typo in azuread sdk auth (#14106)
Signed-off-by: Jayapriya Pai <janantha@redhat.com>
2024-05-21 19:08:35 +02:00
Ayoub Mrini fabcd7e7c6
fix(api): Send warnings only if the limit is really exceeded (#14116)
for the series, label names and label values APIs

Add warnings count check to TestEndpoints

The limit param was added in https://github.com/prometheus/prometheus/pull/13396

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-05-21 19:07:29 +02:00
Björn Rabenstein 5c85a55e3f
Merge pull request #14120 from kushalShukla-web/remote
added some lines prometheus.md and main.go
2024-05-21 17:49:33 +02:00
beorn7 3127a4029e doc: Clarify the limits of dumping/backfilling via OpenMetrics
This is about native histograms (not yet supported) and staleness
markers (for which OpenMetrics support isn't even planned).

Signed-off-by: beorn7 <beorn@grafana.com>
2024-05-21 14:50:06 +02:00
Björn Rabenstein 3119b8a055
Merge pull request #13218 from machine424/ro-promtool
Make DBReadOnly more RO
2024-05-21 13:27:40 +02:00
Oleg Zaytsev fe9cb5a803
Check context every 128 labels instead of 100 (#14118)
Follow up on https://github.com/prometheus/prometheus/pull/14096

As promised, I bring a benchmark, which shows a very small improvement
if context is checked every 128 iterations of label instead of every
100.

It's much easier for a computer to check modulo 128 than modulo 100.
This is a very small 0-2% improvement but I'd say this is one of the
hottest paths of the app so this is still relevant.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-21 11:30:43 +02:00
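One likely reason modulo 128 is cheaper than modulo 100 is that 128 is a power of two, so the remainder test reduces to a single bit mask. The sketch below only illustrates that equivalence; the function names are hypothetical, and in practice the Go compiler may apply this rewrite itself for a constant divisor.

```go
package main

import "fmt"

// isMultipleOf128Mask tests divisibility by 128 with a bit mask.
// This works because 128 is a power of two (127 is binary 1111111).
func isMultipleOf128Mask(i uint) bool {
	return i&127 == 0
}

// isMultipleOf tests divisibility with a general remainder operation.
func isMultipleOf(i, n uint) bool {
	return i%n == 0
}

func main() {
	checksEvery128, checksEvery100 := 0, 0
	for i := uint(0); i < 12800; i++ {
		// The mask and the remainder must always agree for n = 128.
		if isMultipleOf128Mask(i) != isMultipleOf(i, 128) {
			panic("mask and remainder disagree")
		}
		if isMultipleOf128Mask(i) {
			checksEvery128++
		}
		if isMultipleOf(i, 100) {
			checksEvery100++
		}
	}
	fmt.Println("checks at interval 128:", checksEvery128)
	fmt.Println("checks at interval 100:", checksEvery100)
}
```

A larger interval also means slightly fewer context checks over the same number of iterations.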
Björn Rabenstein 114dc5c393
Merge pull request #13638 from NeerajGartia21/promql-test
Converts existing native histogram unit tests to the PromQL testing framework
2024-05-19 19:35:36 +02:00
George Krajcsovits 52f68a96a4
web/api: export defaultStatsRenderer (#14121)
defaultStatsRenderer->DefaultStatsRenderer
Add short docstrings.

I'd like to use the stats renderer to peek at the statistics in
https://github.com/grafana/mimir/pull/7966

However I cannot call the original function without this export afterwards.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-05-18 16:12:33 +02:00
kushagra Shukla 0fea1065fe added line When set, query.max-concurrency may need to be adjusted accordingly.
Signed-off-by: kushagra Shukla <kushalshukla110@gmail.com>
2024-05-18 07:26:59 -04:00
Charles Korn 76b1237215
Document sorting behaviour
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-05-17 13:54:08 +10:00
Julien edf5ebd844
Merge pull request #13970 from jiekun/doc/ovh-dedicated-server-label
docs: [ovh sd] Added missing label for OVH dedicated server in SD
2024-05-16 12:19:06 +02:00
Julien d1eff95faf
Merge pull request #14100 from bboreham/windows-flake
[TEST] Rules: Sleep 15ms to fit Windows behaviour better
2024-05-16 12:04:42 +02:00
Arve Knudsen 5ca56eeb6b
tsdb/index: Refactor Reader tests (#14071)
tsdb/index: Refactor Reader tests

Co-authored-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-05-16 11:51:46 +02:00
Bryan Boreham 1e0b0e250a
Merge pull request #14090 from colega/improve-zeroOrOneCharacterStringMatcher-Matches
Improve `zeroOrOneCharacterStringMatcher` by using `utf8.DecodeRuneInString`
2024-05-16 09:28:53 +01:00
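As a rough illustration of the idea named in the merge above, a "zero or one character" check can decode the first rune and compare its byte size with the string length, instead of converting the string to a rune slice. This is a simplified sketch: the function name is hypothetical, and the actual matcher in Prometheus may handle additional cases (such as newlines) that are not shown here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// matchesZeroOrOneRune reports whether s is empty or consists of exactly one
// rune. It is a hypothetical, simplified stand-in for the real matcher.
func matchesZeroOrOneRune(s string) bool {
	if s == "" {
		return true
	}
	// DecodeRuneInString returns the first rune and its width in bytes.
	// If that width covers the whole string, there is exactly one rune.
	_, size := utf8.DecodeRuneInString(s)
	return size == len(s)
}

func main() {
	for _, s := range []string{"", "a", "é", "ab", "日本"} {
		fmt.Printf("%q -> %v\n", s, matchesZeroOrOneRune(s))
	}
}
```

Only the first rune is decoded, so the cost does not grow with the length of the input.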
Arve Knudsen 0f01d4b336 Fix flaky test
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-05-15 21:58:56 +02:00
Matthias Loibl 0b1a0c04d8
Merge pull request #14062 from rexagod/multicluster-opt-out
bugfix: allow opting-out of multi-cluster setups
2024-05-15 20:17:02 +01:00
Arve Knudsen bf8d88f326 Merge remote-tracking branch 'origin/main' into arve/query-logger-munmap 2024-05-15 21:02:03 +02:00
Björn Rabenstein 806073ad63
Merge pull request #14091 from alexandear/enable-perfsprint-linter
Enable perfsprint linter and fix up code issues
2024-05-15 17:43:43 +02:00
Oleksandr Redko f10c3454e9 Enable perfsprint linter and fix up code
Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>
2024-05-15 17:51:05 +03:00
Björn Rabenstein 179163a4c6
Merge pull request #14103 from krajorama/handle-context-cancel-in-postingsformatcher
tsdb/index/postings: fix missing lock unlock
2024-05-15 14:37:12 +02:00
György Krajcsovits b215a41be4 tsdb/index/postings: fix missing lock unlock
Followup to #14096

Unfortunately the previous PR introduced this bug by not releasing the
lock before returning.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-05-15 14:02:39 +02:00
George Krajcsovits fdaafdb041
tsdb: check for context cancel before regex matching postings (#14096)
* tsdb: check for context cancel before regex matching postings

Regex matching can be heavy if the regex takes a lot of cycles to
evaluate and we can get stuck evaluating postings for a long time
without this fix. The constant checkContextEveryNIterations=100
may be changed later.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-05-15 06:26:19 +02:00
Bryan Boreham 10eb23bd6b [TEST] Rules: Sleep 15ms to fit Windows behaviour better
On Windows, Go will sleep 15ms if you ask for less.  TestAsyncRuleEvaluation
compares actual delay to the nominal time, so using 15ms should work
better on Windows, and be hardly noticeable elsewhere.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-14 17:45:42 +01:00
Björn Rabenstein e6be4240be
Merge pull request #14068 from colega/quote-label-name-in-matchers-when-needed
Bugfix: quote label name in matchers when needed
2024-05-14 17:18:58 +02:00