prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 17:44:06 -08:00

Author	SHA1	Message	Date
TJ Hoplock	6ebfbd2d54	chore!: adopt log/slog, remove go-kit/log For: #14355 This commit updates Prometheus to adopt stdlib's log/slog package in favor of go-kit/log. As part of converting to use slog, several other related changes are required to get prometheus working, including: - removed unused logging util func `RateLimit()` - forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger - move some of the json file logging functionality to use prom/common package functionality - refactored some of the new json file logging for scraping - changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers - updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition - added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>	2024-10-07 15:58:50 -04:00
Arve Knudsen	c9a460d570	Merge remote-tracking branch 'prometheus/main' into arve/close-engine Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-08-26 12:17:10 +02:00
Max Amin	84b819a69f	feat: add Google cloud roundtripper for remote write (#14346 ) * feat: Google Auth for remote write Signed-off-by: Max Amin <maxamin@google.com> --------- Signed-off-by: Max Amin <maxamin@google.com>	2024-07-30 16:25:19 +01:00
Arve Knudsen	fbc9eddfaf	Refactor engine creation in tests Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-07-14 13:58:51 +02:00
Arve Knudsen	e8ae8cf012	Merge remote-tracking branch 'prometheus/main' into arve/close-engine Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-07-01 10:47:21 +02:00
Raphael Silva	e0c9b2ee19	Fix linting errors Signed-off-by: Raphael Silva <rapphil@gmail.com>	2024-06-28 23:44:08 +00:00
Raphael Silva	cd5a7b5020	Make rules Manager Update method no-op after Close This has to be done because Close and Update methods are accessed concurrently. Signed-off-by: Raphael Silva <rapphil@gmail.com>	2024-06-28 23:39:46 +00:00
Jeanette Tan	dda5f48c9e	Merge branch 'main' into nhcb-review-2	2024-06-20 22:50:00 +08:00
Oleg Zaytsev	4c1e71fa0b	Reduce the flakiness of TestAsyncRuleEvaluation (#14300 ) * Reduce the flakiness of TestAsyncRuleEvaluation This tests sleeps for 15 millisecond per rule group, and then comprares the entire execution time to be smaller than a multiple of that delay. The ruleCount is 6, so it assumes that the test will come to the assertions in less than 90ms. Meanwhile, the Github's Windows runner: - ...Huh, oh? What? How much time? milliwhat? Sorry I don't speak that. TL;DR, this increases the delay to 250 millisecond. This won't prevent the test from being flaky, but will reduce the flakiness by several orders of magnitude and hopefully won't be an issue anymore. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Make tests parallel Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> --------- Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2024-06-14 15:02:46 +02:00
Jeanette Tan	14f8dded39	Merge branch 'main' into nhcb Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-06-07 19:17:14 +08:00
Jeanette Tan	9adc1699c3	fix according to code review Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-06-07 18:50:59 +08:00
Arve Knudsen	e57aac8084	Merge remote-tracking branch 'prometheus/main' into arve/close-engine Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-06-05 11:37:44 +02:00
gotjosh	37b408c6cd	Feature: Allow configuration of a rule evaluation delay (#14061 ) * [PATCH] Allow having evaluation delay for rule groups Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * [PATCH] Fix lint Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * [PATCH] Move the option to ManagerOptions Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * [PATCH] Include evaluation_delay in the group config Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix comments Signed-off-by: gotjosh <josue.abreu@gmail.com> * Add a server configuration option. Signed-off-by: gotjosh <josue.abreu@gmail.com> * Appease the linter #1 Signed-off-by: gotjosh <josue.abreu@gmail.com> * Add the new server flag documentation Signed-off-by: gotjosh <josue.abreu@gmail.com> * Improve documentation of the new flag and configuration Signed-off-by: gotjosh <josue.abreu@gmail.com> * Use named parameters for clarity on the `Rule` interface Signed-off-by: gotjosh <josue.abreu@gmail.com> * Add `initial` to the flag help Signed-off-by: gotjosh <josue.abreu@gmail.com> * Change the CHANGELOG area from `ruler` to `rules` Signed-off-by: gotjosh <josue.abreu@gmail.com> * Rename evaluation_delay to `rule_query_offset`/`query_offset` and make it a global configuration option. Signed-off-by: gotjosh <josue.abreu@gmail.com> E Your branch is up to date with 'origin/gotjosh/evaluation-delay'. * more docs Signed-off-by: gotjosh <josue.abreu@gmail.com> * Improve wording on CHANGELOG Signed-off-by: gotjosh <josue.abreu@gmail.com> * Add `RuleQueryOffset` to the default config in tests in case it changes Signed-off-by: gotjosh <josue.abreu@gmail.com> * Update docs/configuration/recording_rules.md Co-authored-by: Julius Volz <julius.volz@gmail.com> Signed-off-by: gotjosh <josue.abreu@gmail.com> * Rename `RuleQueryOffset` to `QueryOffset` when in the group context. Signed-off-by: gotjosh <josue.abreu@gmail.com> * Improve docstring and documentation on the `rule_query_offset` Signed-off-by: gotjosh <josue.abreu@gmail.com> --------- Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: gotjosh <josue.abreu@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Julius Volz <julius.volz@gmail.com>	2024-05-30 11:49:50 +01:00
Arve Knudsen	0cc99e677a	promql.Engine: Add Close method Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-05-28 12:01:47 +02:00
Julien	d1eff95faf	Merge pull request #14100 from bboreham/windows-flake [TEST] Rules: Sleep 15ms to fit Windows behaviour better	2024-05-16 12:04:42 +02:00
Oleksandr Redko	f10c3454e9	Enable perfsprint linter and fix up code Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>	2024-05-15 17:51:05 +03:00
Bryan Boreham	10eb23bd6b	[TEST] Rules: Sleep 15ms to fit Windows behaviour better On Windows, Go will sleep 15ms if you ask for less. TestAsyncRuleEvaluation compares actual delay to the nominal time, so using 15ms should work better on Windows, and be hardly noticeable elsewhere. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-05-14 17:45:42 +01:00
Jeanette Tan	f028496133	Merge branch 'main' into nhcb Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-05-14 16:20:15 +08:00
Bryan Boreham	3fd24d1cd7	Merge pull request #13999 from bboreham/extract-promqltest [Test] Extract most PromQL test code into separate packages	2024-05-09 13:23:11 +01:00
Bryan Boreham	8fd96241ab	test: add promqltest package references To packages outside of promql. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-05-08 16:08:04 +01:00
Jeanette Tan	796b1bbfde	Merge branch 'main' into nhcb Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-05-08 19:11:39 +08:00
gotjosh	c10186eeea	BUGFIX: Mark the rule's restoration process as completed always (#14048 ) * BUGFIX: Mark the rule's restoration process as completed always In https://github.com/prometheus/prometheus/pull/13980 I introduced a change to reduce the number of queries executed when we restore alert statuses. With this, the querying semantics changed as we now need to go through all series before we enter the alert restoration loop and I missed the fact that exiting early when there are no rules to restore would lead to an incomplete restoration. An alert being restored is used as a proxy for "we're now ready to write `ALERTS/ALERTS_FOR_SERIES` metrics" so as a result we weren't writing the series if we didn't restore anything the first time around. --------- Signed-off-by: gotjosh <josue.abreu@gmail.com>	2024-05-03 14:23:46 +01:00
gotjosh	05ca082b07	Rename `alerts` to `expectedAlerts` in the test case input Signed-off-by: gotjosh <josue.abreu@gmail.com>	2024-04-30 12:43:09 +01:00
gotjosh	f63dbc3db2	Remove duplicated sorted and assignment of expected alerts. Signed-off-by: gotjosh <josue.abreu@gmail.com>	2024-04-30 12:39:07 +01:00
gotjosh	276201598c	Fix tests and a bug with the series lookup logic. Signed-off-by: gotjosh <josue.abreu@gmail.com>	2024-04-24 18:46:05 +01:00
György Krajcsovits	bcafa5f1f9	Merge remote-tracking branch 'upstream/main' into update-nhcb	2024-04-24 11:06:59 +02:00
Matthieu MOREL	6f595c6762	golangci-lint: enable whitespace linter (#13905 ) Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-04-11 09:27:54 +01:00
György Krajcsovits	5d0a0a7542	Add custom buckets to native histogram model (#13592 ) * add custom buckets to native histogram model * simple copy for custom bounds * return errors for unsupported add/sub operations * add test cases for string and update appendhistogram in scrape to account for new schema * check fields which are supposed to be unused but may affect results in equals * allow appending custom buckets histograms regardless of max schema Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-02-28 14:06:43 +01:00
Bryan Boreham	39af788dbd	Tests: use replacement DeepEquals using go-cmp Use DeepEqual replacement using go-cmp, which is more flexible. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-02-08 19:30:20 +00:00
Marco Pracucci	5ee3fbe825	Decouple ruler dependency controller from concurrency controller Signed-off-by: Marco Pracucci <marco@pracucci.com>	2024-02-02 10:06:37 +01:00
Marco Pracucci	cbbbd6e70a	Remove superfluous nil check in Group.metrics Signed-off-by: Marco Pracucci <marco@pracucci.com>	2024-01-29 10:21:57 +01:00
Marco Pracucci	046cd7599f	Introduced sequentialRuleEvalController Signed-off-by: Marco Pracucci <marco@pracucci.com>	2024-01-29 10:19:18 +01:00
Marco Pracucci	2764c46531	Added more test cases to TestDependenciesEdgeCases Signed-off-by: Marco Pracucci <marco@pracucci.com>	2024-01-29 10:18:03 +01:00
Marco Pracucci	52bc568d04	Add more test cases to TestDependenciesEdgeCases Signed-off-by: Marco Pracucci <marco@pracucci.com>	2024-01-29 10:17:13 +01:00
Danny Kopping	7aa3b10c3f	Block until all rules, both sync & async, have completed evaluating Updated & added tests Review feedback nits Return empty map if not indeterminate Use highWatermark to track inflight requests counter Appease the linter Clarify feature flag Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:41 +01:00
Danny Kopping	f922534c4d	Refactoring for performance, and to allow controller to be overridden Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:41 +01:00
Danny Kopping	94cdfa30cd	Refactoring Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:41 +01:00
Danny Kopping	0dc7036db3	Optimising dependencies/dependents funcs to not produce new slices each request Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:41 +01:00
Danny Kopping	e7758d187e	Refactor concurrency control Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:08:39 +01:00
Danny Kopping	940f83a540	Implementation NOTE: Rebased from main after refactor in #13014 Signed-off-by: Danny Kopping <danny.kopping@grafana.com>	2024-01-29 10:07:15 +01:00
Filip Petkovski	583f3e587c	Optimize histogram iterators (#13340 ) Optimize histogram iterators Histogram iterators allocate new objects in the AtHistogram and AtFloatHistogram methods, which makes calculating rates over long ranges expensive. In #13215 we allowed an existing object to be reused when converting an integer histogram to a float histogram. This commit follows the same idea and allows injecting an existing object in the AtHistogram and AtFloatHistogram methods. When the injected value is nil, iterators allocate new histograms, otherwise they populate and return the injected object. The commit also adds a CopyTo method to Histogram and FloatHistogram which is used in the BufferedIterator to overwrite items in the ring instead of making new copies. Note that a specialized HPoint pool is needed for all of this to work (`matrixSelectorHPool`). --------- Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>	2024-01-23 17:02:14 +01:00
Filip Petkovski	10a82f87fd	Enable reusing memory when converting between histogram types The 'ToFloat' method on integer histograms currently allocates new memory each time it is called. This commit adds an optional *FloatHistogram parameter that can be used to reuse span and bucket slices. It is up to the caller to make sure the input float histogram is not used anymore after the call. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>	2023-12-08 10:22:59 +01:00
Matthieu MOREL	9c4782f1cc	golangci-lint: enable testifylint linter (#13254 ) Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-12-07 11:35:01 +00:00
Arve Knudsen	6daee89e5f	Add context argument to Querier.Select (#12660 ) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-12 12:37:38 +02:00
Michael Hoffmann	4d8e380269	promql: allow tests to be imported (#12050 ) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>	2023-08-18 20:48:59 +02:00
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforemention wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:22:31 +02:00
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golint-ci We haven't updated golint-ci in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset about the "empty block" warning to include `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, though, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:10:10 +02:00
beorn7	c0879d64cf	promql: Separate `Point` into `FPoint` and `HPoint` In other words: Instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms. This seemingly small change has a _lot_ of repercussions throughout the codebase. The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added. The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs needed to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms. Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark result at all. First runtime v2.39 compared to directly prior to this commit: ``` name old time/op new time/op delta RangeQuery/expr=changes(a_one[1d]),steps=1-16 391µs ± 2% 542µs ± 1% +38.58% (p=0.000 n=9+8) RangeQuery/expr=changes(a_one[1d]),steps=10-16 452µs ± 2% 617µs ± 2% +36.48% (p=0.000 n=10+10) RangeQuery/expr=changes(a_one[1d]),steps=100-16 1.12ms ± 1% 1.36ms ± 2% +21.58% (p=0.000 n=8+10) RangeQuery/expr=changes(a_one[1d]),steps=1000-16 7.83ms ± 1% 8.94ms ± 1% +14.21% (p=0.000 n=10+10) RangeQuery/expr=changes(a_ten[1d]),steps=1-16 2.98ms ± 0% 3.30ms ± 1% +10.67% (p=0.000 n=9+10) RangeQuery/expr=changes(a_ten[1d]),steps=10-16 3.66ms ± 1% 4.10ms ± 1% +11.82% (p=0.000 n=10+10) RangeQuery/expr=changes(a_ten[1d]),steps=100-16 10.5ms ± 0% 11.8ms ± 1% +12.50% (p=0.000 n=8+10) RangeQuery/expr=changes(a_ten[1d]),steps=1000-16 77.6ms ± 1% 87.4ms ± 1% +12.63% (p=0.000 n=9+9) RangeQuery/expr=changes(a_hundred[1d]),steps=1-16 30.4ms ± 2% 32.8ms ± 1% +8.01% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=10-16 37.1ms ± 2% 40.6ms ± 2% +9.64% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=100-16 105ms ± 1% 117ms ± 1% +11.69% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16 783ms ± 3% 876ms ± 1% +11.83% (p=0.000 n=9+10) ``` And then runtime v2.39 compared to after this commit: ``` name old time/op new time/op delta RangeQuery/expr=changes(a_one[1d]),steps=1-16 391µs ± 2% 547µs ± 1% +39.84% (p=0.000 n=9+8) RangeQuery/expr=changes(a_one[1d]),steps=10-16 452µs ± 2% 616µs ± 2% +36.15% (p=0.000 n=10+10) RangeQuery/expr=changes(a_one[1d]),steps=100-16 1.12ms ± 1% 1.26ms ± 1% +12.20% (p=0.000 n=8+10) RangeQuery/expr=changes(a_one[1d]),steps=1000-16 7.83ms ± 1% 7.95ms ± 1% +1.59% (p=0.000 n=10+8) RangeQuery/expr=changes(a_ten[1d]),steps=1-16 2.98ms ± 0% 3.38ms ± 2% +13.49% (p=0.000 n=9+10) RangeQuery/expr=changes(a_ten[1d]),steps=10-16 3.66ms ± 1% 4.02ms ± 1% +9.80% (p=0.000 n=10+9) RangeQuery/expr=changes(a_ten[1d]),steps=100-16 10.5ms ± 0% 10.8ms ± 1% +3.08% (p=0.000 n=8+10) RangeQuery/expr=changes(a_ten[1d]),steps=1000-16 77.6ms ± 1% 78.1ms ± 1% +0.58% (p=0.035 n=9+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1-16 30.4ms ± 2% 33.5ms ± 4% +10.18% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=10-16 37.1ms ± 2% 40.0ms ± 1% +7.98% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=100-16 105ms ± 1% 107ms ± 1% +1.92% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16 783ms ± 3% 775ms ± 1% -1.02% (p=0.019 n=9+9) ``` In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the one that matter most (longest absolute runtime). In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:16 +02:00
Soon-Ping	6cecb87941	Generalized rule group iteration evaluation hook (#11885 ) Signed-off-by: Soon-Ping Phang <soonping@amazon.com>	2023-04-04 20:21:13 +02:00
Trevor Whitney	c3e0a83725	rules: no longer force CounterResetHint to Gauge Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>	2023-03-14 14:22:07 -06:00

1 2 3

148 commits