prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-09 23:24:05 -08:00

Author	SHA1	Message	Date
Danny Kopping	f922534c4d	Refactoring for performance, and to allow controller to be overridden Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:41 +01:00
Danny Kopping	94cdfa30cd	Refactoring Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:41 +01:00
Danny Kopping	e7758d187e	Refactor concurrency control Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:39 +01:00
Danny Kopping	940f83a540	Implementation NOTE: Rebased from main after refactor in #13014 Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:07:15 +01:00
Oleksandr Redko	fa90ca46e5	ci(lint): enable godot; append dot at the end of comments Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>	2023-10-31 19:53:38 +02:00
Danny Kopping	498b836654	Refactoring manager.go into separate concerns Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2023-10-21 11:11:11 +02:00
Goutham Veeramachaneni	86729d4d7b	Update exp package (#12650)	2023-09-21 22:53:51 +02:00
Arve Knudsen	6daee89e5f	Add context argument to Querier.Select (#12660) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-12 12:37:38 +02:00
Julien Pivotto	782e6f64fb	Merge pull request #11295 from dimitarvdimitrov/dimitar/simplify-evalTimestamp Simplify rule group's EvalTimestamp formula	2023-07-18 13:21:20 +02:00
Bryan Boreham	5255bf06ad	Replace sort.Slice with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-07-02 22:17:08 +00:00
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golint-ci We haven't updated golint-ci in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset about the "empty block" warning to include `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, though, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:10:10 +02:00
Ben Ye	fd3630b9a3	add ctx to QueryEngine interface Signed-off-by: Ben Ye <benye@amazon.com>	2023-04-17 21:32:38 -07:00
beorn7	c0879d64cf	promql: Separate `Point` into `FPoint` and `HPoint`

In other words: Instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms. This seemingly small change has a _lot_ of repercussions throughout the codebase.

The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs needed to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms.

Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16          391µs ± 2%   542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16         452µs ± 2%   617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16    105ms ± 1%   117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16   783ms ± 3%   876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op  new time/op  delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16          391µs ± 2%   547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16         452µs ± 2%   616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16       1.12ms ± 1%  1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16      7.83ms ± 1%  7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16         2.98ms ± 0%  3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16        3.66ms ± 1%  4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16       10.5ms ± 0%  10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16      77.6ms ± 1%  78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16     30.4ms ± 2%  33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16    37.1ms ± 2%  40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16    105ms ± 1%   107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16   783ms ± 3%   775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the one that matter most (longest absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:16 +02:00
Soon-Ping	6cecb87941	Generalized rule group iteration evaluation hook (#11885) Signed-off-by: Soon-Ping Phang <soonping@amazon.com>	2023-04-04 20:21:13 +02:00
Trevor Whitney	c3e0a83725	rules: no longer force CounterResetHint to Gauge Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>	2023-03-14 14:22:07 -06:00
Julien Pivotto	ce55e5074d	Add 'keep_firing_for' field to alerting rules This commit adds a new 'keep_firing_for' field to Prometheus alerting rules. The 'resolve_delay' field specifies the minimum amount of time that an alert should remain firing, even if the expression does not return any results. This feature was discussed at a previous dev summit, and it was determined that a feature like this would be useful in order to allow the expression time to stabilize and prevent confusing resolved messages from being propagated through Alertmanager. This approach is simpler than having two PromQL queries, as was sometimes discussed, and it should be easy to implement. This commit does not include tests for the 'resolve_delay' field. This is intentional, as the purpose of this commit is to gather comments on the proposed design of the 'resolve_delay' field before implementing tests. Once the design of the 'resolve_delay' field has been finalized, a follow-up commit will be submitted with tests." See https://github.com/prometheus/prometheus/issues/11570 Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2023-01-13 12:11:39 +01:00
Ganesh Vernekar	d82ea2eb1c	Merge pull request #11838 from codesome/histo-rec rules: Support native histograms	2023-01-12 12:35:15 +05:30
Ganesh Vernekar	53a5071a72	rules: Support native histograms Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-01-10 19:07:24 +05:30
Ganesh Vernekar	f1a332c496	rules: Consider ErrTooOldSample in expected errors Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-01-05 14:49:30 +05:30
Bryan Boreham	3c7de69059	storage: allow re-use of iterators Patterned after `Chunk.Iterator()`: pass the old iterator in so it can be re-used to avoid allocating a new object. (This commit does not do any re-use; it is just changing all the method signatures so re-use is possible in later commits.) Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-15 18:32:45 +00:00
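The "pass the old iterator in" pattern from the commit above can be sketched as follows. All types here (`sliceIterator`, `series`) are hypothetical and much simpler than the real storage interfaces; the sketch only shows how a signature that accepts a previous iterator makes re-use possible.

```go
package main

import "fmt"

// sliceIterator is a hypothetical iterator over float values.
type sliceIterator struct {
	vals []float64
	idx  int
}

// Next advances the iterator and reports whether a value is available.
func (it *sliceIterator) Next() bool {
	it.idx++
	return it.idx < len(it.vals)
}

// At returns the value at the current position.
func (it *sliceIterator) At() float64 { return it.vals[it.idx] }

// series is a hypothetical container of samples.
type series struct{ vals []float64 }

// Iterator resets and returns reuse when it is non-nil, and allocates a
// new iterator only when the caller has none to offer.
func (s series) Iterator(reuse *sliceIterator) *sliceIterator {
	if reuse == nil {
		reuse = &sliceIterator{}
	}
	reuse.vals = s.vals
	reuse.idx = -1
	return reuse
}

// sum walks all series while allocating at most one iterator.
func sum(all []series) float64 {
	var it *sliceIterator
	total := 0.0
	for _, s := range all {
		it = s.Iterator(it)
		for it.Next() {
			total += it.At()
		}
	}
	return total
}

func main() {
	fmt.Println(sum([]series{{vals: []float64{1, 2}}, {vals: []float64{3}}}))
}
```

Callers that pass `nil` get the old allocate-every-time behaviour, so a signature change of this kind can land before any call site actually re-uses an iterator.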
Julius Volz	1a2c645dfa	Correctly handle error unwrapping in rules and remote write receiver errors.Unwrap() actually dangerously returns nil if the error does not have an Unwrap() method, which is the case in at least one of these places where I noticed that no error was being logged at all when it should have. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2022-12-15 12:50:55 +01:00
Dimitar Dimitrov	03ab8dcca0	Add comments on EvalTimestamp Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>	2022-10-12 14:16:22 +02:00
Ganesh Vernekar	648be89822	Merge remote-tracking branch 'upstream/main' into fix-conflict Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-12 14:20:02 +05:30
Ganesh Vernekar	46b26c4f09	Fix notifier relabel changing the labels of active alerts (#11427) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-07 20:28:17 +05:30
Dimitar Dimitrov	3fb881af26	Simplify rule group's EvalTimestamp formula

I found it hard to understand how EvalTimestamp works, so I wanted to simplify the math there. This PR should be a noop. Current formula is:

```
offset = g.hash % g.interval
adjNow = startTime - offset
base = adjNow - (adjNow % g.interval)
EvalTimestamp = base + offset
```

I simplify `EvalTimestamp`

```
EvalTimestamp = base + offset
# expand base
= adjNow - (adjNow % g.interval) + offset
# expand adjNow
= startTime - offset - ((startTime - offset) % g.interval) + offset
# cancel out offset
= startTime - ((startTime - offset) % g.interval)
# expand A+B (mod M) = (A (mod M) + B (mod M)) (mod M)
= startTime - (startTime % g.interval - offset % g.interval) % g.interval
# expand offset
= startTime - (startTime % g.interval - ((g.hash % g.interval) % g.interval)) % g.interval
# remove redundant mod g.interval
= startTime - (startTime % g.interval - g.hash % g.interval) % g.interval
# simplify (A (mod M) + B (mod M)) (mod M) = A+B (mod M)
= startTime - (startTime - g.hash) % g.interval

offset = (startTime - g.hash) % g.interval
EvalTimestamp = startTime - offset
```

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>	2022-09-13 10:52:32 +02:00
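The equivalence claimed in the commit above can be checked numerically. The sketch below is an illustration under stated assumptions, not the Prometheus implementation: all values are plain `int64`, the hash is non-negative, `startTime` is at least as large as the offset, and the hypothetical `floorMod` helper is used in the simplified form because Go's `%` can return a negative remainder when `startTime - hash` is negative.

```go
package main

import "fmt"

// floorMod returns a mod m in the range [0, m) for m > 0.
func floorMod(a, m int64) int64 {
	r := a % m
	if r < 0 {
		r += m
	}
	return r
}

// evalTimestampOriginal follows the four-step formula from the commit
// message. It assumes hash >= 0 and startTime >= hash % interval.
func evalTimestampOriginal(startTime, hash, interval int64) int64 {
	offset := hash % interval
	adjNow := startTime - offset
	base := adjNow - (adjNow % interval)
	return base + offset
}

// evalTimestampSimplified follows the simplified formula, using a
// non-negative modulo so that hash > startTime is handled as well.
func evalTimestampSimplified(startTime, hash, interval int64) int64 {
	offset := floorMod(startTime-hash, interval)
	return startTime - offset
}

func main() {
	const interval = 60
	for _, hash := range []int64{0, 7, 59, 60, 12345} {
		for startTime := int64(1000); startTime < 1200; startTime += 13 {
			a := evalTimestampOriginal(startTime, hash, interval)
			b := evalTimestampSimplified(startTime, hash, interval)
			if a != b {
				fmt.Println("mismatch:", startTime, hash, a, b)
				return
			}
		}
	}
	fmt.Println("both formulas agree on all tested inputs")
}
```

Both forms return the latest timestamp not after `startTime` that is congruent to the hash modulo the interval, which is why the offset can be computed in a single step.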
beorn7	28f028e938	Merge branch 'main' into sparsehistogram	2022-07-12 19:07:13 +02:00
Matthieu MOREL	ddfa9a7cc5	refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt' (#10855) * refactor (rules): move from github.com/pkg/errors to 'errors' and 'fmt' Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>	2022-06-17 09:54:25 +02:00
beorn7	40ad5e284a	Merge branch 'main' into beorn7/sparsehistogram	2022-06-09 20:50:30 +02:00
Julien Pivotto	3a56817a30	Rules: set otel status to ERROR when a rule fails (#10745) Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2022-05-25 10:06:17 +02:00
Julien Pivotto	0d94cdf107	rules: remove classic UI code (#10730) Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2022-05-23 16:21:50 +02:00
Łukasz Mierzwa	d3c9c4f574	Stop rule manager before TSDB is stopped (#10680) During shutdown TSDB is stopped before rule manager is stopped. Since TSDB shutdown can take a long time (minutes or 10s of minutes) it keeps rule manager running while parts of Prometheus are already stopped (most notably scrape manager). This can cause false positive alerts to fire, mostly those that rely on absent() calls since new sample appends will stop while alert queries are still evaluated. Stop rules before stopping TSDB and scrape manager to avoid this problem. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-05-20 23:26:06 +02:00
beorn7	7ee1836ef5	Merge branch 'main' into sparsehistogram	2022-04-05 18:31:19 +02:00
Wilbert Guo	83a2e52bc2	Add SyncForState Implementation for Ruler HA (#10070)
* continuously syncing activeAt for alerts Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
* add import Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
* Refactor SyncForState and add unit tests Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
* Format code Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
* Add hook for syncForState Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
  Fix go lint Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
  Refactor syncForState override implementation Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
  Add syncForState override func as argument to Update() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
  Fix go formatting Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
  Fix circleci test errors Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
  Remove overrideFunc as argument to run() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com>
* remove the syncForState Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* use the override function to decide if need to replace the activeAt or not Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fix test case Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fix format Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* Trigger build Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fixing comments Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* return the result of map of alerts instead of single one Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* upper case the QueryforStateSeries Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* use a more generic rule group post process function type Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fix indentation Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fix gofmt Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fix lint Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fixing naming Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fix comments Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* add the lastEvalTimestamp as parameter Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* fmt Signed-off-by: Yijie Qin <qinyijie@amazon.com>
* change funcType to func Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>	2022-03-29 02:16:46 +02:00
beorn7	4210aac74a	Merge branch 'main' into sparsehistogram	2022-03-22 14:47:42 +01:00
Alan Protasio	606ef33d91	Track and report Samples Queried per query We always track total samples queried and add those to the standard set of stats queries can report. We also allow optionally tracking per-step samples queried. This must be enabled both at the engine and query level to be tracked and rendered. The engine flag is exposed via a Prometheus feature flag, while the query flag is set when stats=all. Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Andrew Bloomgarden <blmgrdn@amazon.com> Co-authored-by: Harkishen Singh <harkishensingh@hotmail.com> Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>	2022-03-21 23:49:17 +01:00
Alvin Lin	cd739214dd	Log rule name when evaluating rule groups' Eval function logs anything (#10454) * Add benchmark test for rule group eval Signed-off-by: Alvin Lin <alvinlin@amazon.com>	2022-03-21 19:52:20 +01:00
Matej Gera	2c61d29b2a	Tracing: Migrate to OpenTelemetry library (#9724) Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-01-25 11:08:04 +01:00
Björn Rabenstein	7e42acd3b1	tsdb: Rework iterators (#9877) - Pick At... method via return value of Next/Seek. - Do not clobber returned buckets. - Add partial FloatHistogram support. Note that the promql package is now _only_ dealing with FloatHistograms, following the idea that PromQL only knows float values. As a byproduct, I have removed the histogramSeries metric. In my understanding, series can have both float and histogram samples, so that metric doesn't make sense anymore. As another byproduct, I have converged the sampleBuf and the histogramSampleBuf in memSeries into one. The sample type stored in the sampleBuf has been extended to also contain histograms even before this commit. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 13:24:23 +05:30
beorn7	5d4db805ac	Merge branch 'main' into sparsehistogram	2021-11-17 19:57:31 +01:00
Björn Rabenstein	4c56a193c5	Merge pull request #9478 from prometheus/beorn7/pkg-deprecation Move packages out of deprecated pkg directory	2021-11-09 11:09:16 +01:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
Bryan Boreham	26d8ae0e41	Rules: simplify map key for stale series detection The rules manager keeps a note of which series were generated by the last run, so it can write a stale marker to those that disappeared. Since the keys are not for human eyes, we can use a simpler format and save the effort of quoting label values. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-11-08 22:18:48 +01:00
Ganesh Vernekar	c8b267efd6	Get histograms from TSDB to the rate() function implementation Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-11-03 19:04:18 +05:30
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Levi Harrison	dc2f1993d8	Limit number of alerts or series produced by a rule (#9260) * Add limit to rules Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-09-15 09:48:26 +02:00
Levi Harrison	8c29046ab2	Remove unneeded state modifications Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-08-20 16:42:31 -04:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
Levi Harrison	17ea8d006a	Added external URL access Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-30 23:35:26 -04:00
Owen Diehl	23999df27c	expose rule metrics fields Signed-off-by: Owen Diehl <ow.diehl@gmail.com>	2021-04-30 13:36:44 -04:00
Goutham Veeramachaneni	2efdf660b1	Increase evaluation failures on Commit() (#8770) I think we should increment the metric here, we're setting the rule health anyways. This means even if the "evaluation" succeeded, none of the samples made it to storage. This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/ Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2021-04-29 14:28:48 +02:00


248 commits