prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-15 01:54:06 -08:00

Author	SHA1	Message	Date
Ganesh Vernekar	3d54bcc018	Merge pull request #14362 from charleskorn/charleskorn/sum-infinity	2024-07-03 01:05:03 -04:00
Charles Korn	fd6bdf5230	Fix issue where summation of +/- infinity returns NaN instead of infinity Signed-off-by: Charles Korn <charles.korn@grafana.com>	2024-06-28 11:26:54 +10:00
Jeanette Tan	f028496133	Merge branch 'main' into nhcb Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-05-14 16:20:15 +08:00
Bryan Boreham	2b0c87b1b6	test: turn TestKahanSum into scripted test This saves having a function solely to call kahanSumInc. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-05-08 13:42:55 +01:00
György Krajcsovits	bcafa5f1f9	Merge remote-tracking branch 'upstream/main' into update-nhcb	2024-04-24 11:06:59 +02:00
Matthieu MOREL	6f595c6762	golangci-lint: enable whitespace linter (#13905 ) Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-04-11 09:27:54 +01:00
György Krajcsovits	2a4aa085d2	Merge branch 'main' into nhcb	2024-03-27 18:42:10 +01:00
Björn Rabenstein	b9a2a4e329	Merge pull request #13852 from prometheus/fix-hist-std-dev-var-negative Fix hist std dev var negative	2024-03-27 17:58:03 +01:00
Jeanette Tan	4f2df329bd	improve handling of empty buckets with infinite bounds in histogram std dev/var Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-03-27 17:06:12 +01:00
Jeanette Tan	22d0f4f114	improve handling of negative bounds in histogram std dev/var Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-03-27 17:06:12 +01:00
György Krajcsovits	a3d1a46eda	Merge branch 'main' into nhcb	2024-03-22 14:51:48 +01:00
tdakkota	f6834c347a	promql: validate `label_join` destination label Signed-off-by: tdakkota <tanc13@yandex.ru>	2024-03-20 22:02:10 +03:00
beorn7	7f912db15a	promql: Fix limiting of extrapolation to negative values This is a bit tough to explain, but I'll try: `rate` & friends have a sophisticated extrapolation algorithm. Usually, we extrapolate the result to the total interval specified in the range selector. However, if the first sample within the range is too far away from the beginning of the interval, or if the last sample within the range is too far away from the end of the interval, we assume the series has just started half a sampling interval before the first sample or after the last sample, respectively, and shorten the extrapolation interval correspondingly. We calculate the sampling interval by looking at the average time between samples within the range, and we define "too far away" as "more than 110% of that sampling interval". However, if this algorithm leads to an extrapolated starting value that is negative, we limit the start of the extrapolation interval to the point where the extrapolated starting value is zero. At least that was the intention. What we actually implemented is the following: If extrapolating all the way to the beginning of the total interval would lead to an extrapolated negative value, we would only extrapolate to the zero point as above, even if the algorithm above would have selected a starting point that is just half a sampling interval before the first sample and that starting point would not have an extrapolated negative value. In other word: What was meant as a _limitation_ of the extrapolation interval yielded a _longer_ extrapolation interval in this case. There is an exception to the case just described: If the increase of the extrapolation interval is more than 110% of the sampling interval, we suddenly drop back to only extrapolate to half a sampling interval. This behavior can be nicely seen in the testcounter_zero_cutoff test, where the rate goes up all the way to 0.7 and then jumps back to 0.6. This commit changes the behavior to what was (presumably) intended from the beginning: The extension of the extrapolation interval is only limited if actually needed to prevent extrapolation to negative values, but the "limitation" never leads to _more_ extrapolation anymore. The difference is subtle, and probably it never bothered anyone. However, if you calculate a rate of a classic histograms, the old behavior might create non-monotonic histograms as a result (because of the jumps you can see nicely in the old version of the testcounter_zero_cutoff test). With this fix, that doesn't happen anymore. Signed-off-by: beorn7 <beorn@grafana.com>	2024-03-07 01:20:33 +01:00
Björn Rabenstein	9187bcbdd5	Merge pull request #13536 from bboreham/faster-label-replace promql: faster range-query of label_replace and label_join	2024-02-29 17:03:00 +01:00
machine424	f477e0539a	Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21 Prevent adding back golang.org/x/exp/slices. Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2024-02-28 14:54:53 +01:00
György Krajcsovits	5d0a0a7542	Add custom buckets to native histogram model (#13592 ) * add custom buckets to native histogram model * simple copy for custom bounds * return errors for unsupported add/sub operations * add test cases for string and update appendhistogram in scrape to account for new schema * check fields which are supposed to be unused but may affect results in equals * allow appending custom buckets histograms regardless of max schema Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-02-28 14:06:43 +01:00
Bryan Boreham	fdd5b85e06	promql: faster range-query of label_replace and label_join These functions act on the labels only, so don't need to go step by step over the samples in a range query. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-02-04 11:12:55 +01:00
Faustas Butkus	6feffeb92e	promql: add histogram_avg function (#13467 ) Add histogram_avg function --------- Signed-off-by: Faustas Butkus <faustas.butkus@gmail.com> Signed-off-by: Björn Rabenstein <github@rabenste.in> Co-authored-by: Björn Rabenstein <github@rabenste.in>	2024-02-01 18:28:42 +01:00
Filip Petkovski	a577a0a542	Fix last_over_time for native histograms The last_over_time retains a histogram sample without making a copy. This sample is now coming from the buffered iterator used for windowing functions, and can be reused for reading subsequent samples as the iterator progresses. I would propose copying the sample in the last_over_time function, similar to how it is done for rate, sum_over_time and others. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>	2024-01-26 15:02:40 +01:00
Bryan Boreham	74b73d1e2c	Labels: Add DropMetricName function, used in PromQL (#13446 ) This function is called very frequently when executing PromQL functions, and we can do it much more efficiently inside Labels. In the common case that `__name__` comes first in the labels, we simply re-point to start at the next label, which is nearly free. `DropMetricName` is now so cheap I removed the cache - benchmarks show everything still goes faster. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-01-25 11:48:49 +01:00
Björn Rabenstein	bfbb13cf36	Merge pull request #13267 from linasm/simplify-native-histogram-math promql: simplify Native Histogram arithmetics	2024-01-18 13:50:59 +01:00
zenador	a3ddfbd1ee	Add warnings for histogramRate applied with isCounter not matching counter/gauge histogram (#13392 ) Add warnings for histogramRate applied with isCounter not matching counter/gauge histogram --------- Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2024-01-17 17:06:35 +01:00
Ivan Babrou	a6b35ff304	promql: use natural sort in sort_by_label and sort_by_label_desc (#13411 ) These functions are intended for humans, as robots can already sort the results however they please. Humans like things sorted "naturally": * https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/ A similar thing has been done to Grafana, which is also used by humans: * https://github.com/grafana/grafana/pull/78024 * https://github.com/grafana/grafana/pull/78494 Signed-off-by: Ivan Babrou <github@ivan.computer>	2024-01-16 21:34:09 -03:00
Linas Medziunas	7319ad6a0b	promql: simplify Native Histogram arithmetics Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>	2023-12-08 10:59:00 +02:00
Jeanette Tan	9bf4cc993e	Add mad_over_time function Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-12-01 01:22:58 +08:00
beorn7	0eb0ca42c5	Update “conventional histogram” → “classic histogram” Signed-off-by: beorn7 <beorn@grafana.com>	2023-11-29 15:22:58 +01:00
Alexander Trost	5051a993ab	promql: add sort_by_label and sort_by_label_desc functions This adds functions to sort a vector by its label value. Based on https://github.com/prometheus/prometheus/pull/1533 Signed-off-by: Alexander Trost <galexrt@googlemail.com>	2023-11-28 14:40:07 +01:00
zenador	ccfe14d7e7	PromQL: ignore small errors for bucketQuantile (#13153 ) promql: Improve histogram_quantile calculation for classic buckets Tiny differences between classic buckets are most likely caused by floating point precision issues. With this commit, relative changes below a certain threshold are ignored. This makes the result of histogram_quantile more meaningful, and also avoids triggering the _input to histogram_quantile needed to be fixed for monotonicity_ annotations in unactionable cases. This commit also adds explanation of the new adjustment and of the monotonicity annotation to the documentation of `histogram_quantile`. --------- Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-11-25 00:05:38 +01:00
Björn Rabenstein	a43669e611	Merge pull request #12928 from alexandear/ci-enable-godot ci(lint): enable godot; append dot at the end of comments	2023-11-01 17:15:41 +01:00
Julien Pivotto	f568221610	Merge pull request #13057 from prometheus/release-2.48 Merge release-2.48 back into main	2023-10-31 15:24:39 -04:00
Oleksandr Redko	fa90ca46e5	ci(lint): enable godot; append dot at the end of comments Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>	2023-10-31 19:53:38 +02:00
Oleksandr Redko	8e5f0387a2	ci(lint): enable nolintlint and remove redundant comments (#12926 ) Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>	2023-10-31 12:35:13 +01:00
zenador	80e977aae6	Remove `NewPossibleNonCounterInfo` and minimise creating empty annotations (#13012 ) * Remove NewPossibleNonCounterInfo until it can be made more efficient, and avoid creating empty annotations as much as possible Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-10-24 17:36:07 +01:00
Jeanette Tan	9a8bd8eac6	Fix possible non-counter warning for empty names and native histograms Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-10-16 15:52:10 +08:00
Julius Volz	191c24a0ed	Fix: Exempt "_bucket" suffix from PossibleNonCounterInfo warning (#12982 ) Related to PR #12152 Signed-off-by: Julius Volz <julius.volz@gmail.com> Signed-off-by: Levi Harrison <git@leviharrison.dev>	2023-10-15 13:47:42 -04:00
Jeanette Tan	0cbf0c1c68	Revise according to code review Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-10-06 19:09:32 +08:00
Jeanette Tan	feaa93da77	Add warning when monotonicity is forced in the input to histogram_quantile Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-10-04 18:53:55 +08:00
zenador	69edd8709b	Add warnings (and annotations) to PromQL query results (#12152 ) Return annotations (warnings and infos) from PromQL queries This generalizes the warnings we have already used before (but only for problems with remote read) as "annotations". Annotations can be warnings or infos (the latter could be false positives). We do not treat them different in the API for now and return them all as "warnings". It would be easy to distinguish them and return infos separately, should that appear useful in the future. The new annotations are then used to create a lot of warnings or infos during PromQL evaluations. Partially these are things we have wanted for a long time (e.g. inform the user that they have applied `rate` to a metric that doesn't look like a counter), but the new native histograms have created even more needs for those annotations (e.g. if a query tries to aggregate float numbers with histograms). The annotations added here are not yet complete. A prominent example would be a warning about a range too short for a rate calculation. But such a warnings is more tricky to create with good fidelity and we will tackle it later. Another TODO is to take annotations into account when evaluating recording rules. --------- Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-09-14 18:57:31 +02:00
zenador	54aaa2bd7e	Add `histogram_stdvar` and `histogram_stddev` functions (#12614 ) * Add new function: histogram_stdvar and histogram_stddev Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-08-24 21:02:14 +02:00
zenador	191bf9055b	Handle more arithmetic operators for native histograms (#12262 ) Handle more arithmetic operators and aggregators for native histograms This includes operators for multiplication (formerly known as scaling), division, and subtraction. Plus aggregations for average and the avg_over_time function. Stdvar and stddev will (for now) ignore histograms properly (rather than counting them but adding a 0 for them). Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-05-16 21:15:20 +02:00
Matthieu MOREL	bae9a21200	Merge branch 'main' into linter/nilerr Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-19 19:56:39 +02:00
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforemention wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:22:31 +02:00
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golint-ci We haven't updated golint-ci in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset about the "empty block" warning to include `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, though, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:10:10 +02:00
Matthieu MOREL	fb3eb21230	enable gocritic, unconvert and unused linters Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-13 19:20:22 +00:00
beorn7	817a2396cb	Name float values as "floats", not as "values" In the past, every sample value was a float, so it was fine to call a variable holding such a float "value" or "sample". With native histograms, a sample might have a histogram value. And a histogram value is still a value. Calling a float value just "value" or "sample" or "V" is therefore misleading. Over the last few commits, I already renamed many variables, but this cleans up a few more places where the changes are more invasive. Note that we do not to attempt naming in the JSON APIs or in the protobufs. That would be quite a disruption. However, internally, we can call variables as we want, and we should go with the option of avoiding misunderstandings. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:24 +02:00
beorn7	c0879d64cf	promql: Separate `Point` into `FPoint` and `HPoint` In other words: Instead of having a “polymorphous” `Point` that can either contain a float value or a histogram value, use an `FPoint` for floats and an `HPoint` for histograms. This seemingly small change has a _lot_ of repercussions throughout the codebase. The idea here is to avoid the increase in size of `Point` arrays that happened after native histograms had been added. The higher-level data structures (`Sample`, `Series`, etc.) are still “polymorphous”. The same idea could be applied to them, but at each step the trade-offs needed to be evaluated. The idea with this change is to do the minimum necessary to get back to pre-histogram performance for functions that do not touch histograms. Here are comparisons for the `changes` function. The test data doesn't include histograms yet. Ideally, there would be no change in the benchmark result at all. First runtime v2.39 compared to directly prior to this commit: ``` name old time/op new time/op delta RangeQuery/expr=changes(a_one[1d]),steps=1-16 391µs ± 2% 542µs ± 1% +38.58% (p=0.000 n=9+8) RangeQuery/expr=changes(a_one[1d]),steps=10-16 452µs ± 2% 617µs ± 2% +36.48% (p=0.000 n=10+10) RangeQuery/expr=changes(a_one[1d]),steps=100-16 1.12ms ± 1% 1.36ms ± 2% +21.58% (p=0.000 n=8+10) RangeQuery/expr=changes(a_one[1d]),steps=1000-16 7.83ms ± 1% 8.94ms ± 1% +14.21% (p=0.000 n=10+10) RangeQuery/expr=changes(a_ten[1d]),steps=1-16 2.98ms ± 0% 3.30ms ± 1% +10.67% (p=0.000 n=9+10) RangeQuery/expr=changes(a_ten[1d]),steps=10-16 3.66ms ± 1% 4.10ms ± 1% +11.82% (p=0.000 n=10+10) RangeQuery/expr=changes(a_ten[1d]),steps=100-16 10.5ms ± 0% 11.8ms ± 1% +12.50% (p=0.000 n=8+10) RangeQuery/expr=changes(a_ten[1d]),steps=1000-16 77.6ms ± 1% 87.4ms ± 1% +12.63% (p=0.000 n=9+9) RangeQuery/expr=changes(a_hundred[1d]),steps=1-16 30.4ms ± 2% 32.8ms ± 1% +8.01% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=10-16 37.1ms ± 2% 40.6ms ± 2% +9.64% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=100-16 105ms ± 1% 117ms ± 1% +11.69% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16 783ms ± 3% 876ms ± 1% +11.83% (p=0.000 n=9+10) ``` And then runtime v2.39 compared to after this commit: ``` name old time/op new time/op delta RangeQuery/expr=changes(a_one[1d]),steps=1-16 391µs ± 2% 547µs ± 1% +39.84% (p=0.000 n=9+8) RangeQuery/expr=changes(a_one[1d]),steps=10-16 452µs ± 2% 616µs ± 2% +36.15% (p=0.000 n=10+10) RangeQuery/expr=changes(a_one[1d]),steps=100-16 1.12ms ± 1% 1.26ms ± 1% +12.20% (p=0.000 n=8+10) RangeQuery/expr=changes(a_one[1d]),steps=1000-16 7.83ms ± 1% 7.95ms ± 1% +1.59% (p=0.000 n=10+8) RangeQuery/expr=changes(a_ten[1d]),steps=1-16 2.98ms ± 0% 3.38ms ± 2% +13.49% (p=0.000 n=9+10) RangeQuery/expr=changes(a_ten[1d]),steps=10-16 3.66ms ± 1% 4.02ms ± 1% +9.80% (p=0.000 n=10+9) RangeQuery/expr=changes(a_ten[1d]),steps=100-16 10.5ms ± 0% 10.8ms ± 1% +3.08% (p=0.000 n=8+10) RangeQuery/expr=changes(a_ten[1d]),steps=1000-16 77.6ms ± 1% 78.1ms ± 1% +0.58% (p=0.035 n=9+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1-16 30.4ms ± 2% 33.5ms ± 4% +10.18% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=10-16 37.1ms ± 2% 40.0ms ± 1% +7.98% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=100-16 105ms ± 1% 107ms ± 1% +1.92% (p=0.000 n=10+10) RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16 783ms ± 3% 775ms ± 1% -1.02% (p=0.019 n=9+9) ``` In summary, the runtime doesn't really improve with this change for queries with just a few steps. For queries with many steps, this commit essentially reinstates the old performance. This is good because the many-step queries are the one that matter most (longest absolute runtime). In terms of allocations, though, this commit doesn't make a dent at all (numbers not shown). The reason is that most of the allocations happen in the sampleRingIterator (in the storage package), which has to be addressed in a separate commit. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:16 +02:00
Bryan Boreham	b987afa7ef	labels: simplify call to get Labels from Builder It took a `Labels` where the memory could be re-used, but in practice this hardly ever benefitted. Especially after converting `relabel.Process` to `relabel.ProcessBuilder`. Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil` so the slice was reallocated multiple times by `append`. Lastly `Builder.Labels()` now estimates that the final size will depend on labels added and deleted. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-22 17:05:20 +00:00
Trevor Whitney	dd94ebb87b	promql: set CounterResetHint after rate and sum Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>	2023-03-14 14:21:59 -06:00
Bryan Boreham	56fefcd812	Update package promql for new labels.Labels type We use `labels.Builder` to parse metrics, to avoid depending on the internal implementation. This is not efficient, but the feature is only used in tests. It wasn't efficient previously either - calling `Sort()` after adding each label. `createLabelsForAbsentFunction` also uses a Builder now, and gets an extra `map` to replace the previous `Has()` usage. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Fix up promql to compile with changes to Labels	2022-12-19 15:22:09 +00:00
Bryan Boreham	6bdecf377c	Switch from 'sanity' to more inclusive lanuage (#9376 ) * Switch from 'sanity' to more inclusive lanuage "Removing ableist language in code is important; it helps to create and maintain an environment that welcomes all developers of all backgrounds, while emphasizing that we as developers select the most articulate, precise, descriptive language we can rather than relying on metaphors. The phrase sanity check is ableist, and unnecessarily references mental health in our code bases. It denotes that people with mental illnesses are inferior, wrong, or incorrect, and the phrase sanity continues to be used by employers and other individuals to discriminate against these people." From https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-11-28 17:09:18 +00:00

1 2 3 4

183 commits