Commit graph

39 commits

Author SHA1 Message Date
beorn7 6fcd225aee promql(native histograms): Introduce exponential interpolation
Some checks failed
CI / Go tests (push) Has been cancelled
CI / More Go tests (push) Has been cancelled
CI / Go tests with previous Go version (push) Has been cancelled
CI / UI tests (push) Has been cancelled
CI / Go tests on Windows (push) Has been cancelled
CI / Mixins tests (push) Has been cancelled
CI / Build Prometheus for common architectures (0) (push) Has been cancelled
CI / Build Prometheus for common architectures (1) (push) Has been cancelled
CI / Build Prometheus for common architectures (2) (push) Has been cancelled
CI / Build Prometheus for all architectures (0) (push) Has been cancelled
CI / Build Prometheus for all architectures (1) (push) Has been cancelled
CI / Build Prometheus for all architectures (10) (push) Has been cancelled
CI / Build Prometheus for all architectures (11) (push) Has been cancelled
CI / Build Prometheus for all architectures (2) (push) Has been cancelled
CI / Build Prometheus for all architectures (3) (push) Has been cancelled
CI / Build Prometheus for all architectures (4) (push) Has been cancelled
CI / Build Prometheus for all architectures (5) (push) Has been cancelled
CI / Build Prometheus for all architectures (6) (push) Has been cancelled
CI / Build Prometheus for all architectures (7) (push) Has been cancelled
CI / Build Prometheus for all architectures (8) (push) Has been cancelled
CI / Build Prometheus for all architectures (9) (push) Has been cancelled
CI / Check generated parser (push) Has been cancelled
CI / golangci-lint (push) Has been cancelled
CI / fuzzing (push) Has been cancelled
CI / codeql (push) Has been cancelled
CI / Report status of build Prometheus for all architectures (push) Has been cancelled
CI / Publish main branch artifacts (push) Has been cancelled
CI / Publish release artefacts (push) Has been cancelled
CI / Publish UI on npm Registry (push) Has been cancelled
The linear interpolation (assuming that observations are uniformly
distributed within a bucket) is a solid and simple assumption in lack
of any other information. However, the exponential bucketing used by
standard schemas of native histograms has been chosen to cover the
whole range of observations in a way that bucket populations are
spread out over buckets in a reasonably way for typical distributions
encountered in real-world scenarios.

This is the origin of the idea implemented here: If we divide a given
bucket into two (or more) smaller exponential buckets, we "most
naturally" expect that the samples in the original buckets will split
among those smaller buckets in a more or less uniform fashion. With
this assumption, we end up with an "exponential interpolation", which
therefore appears to be a better match for histograms with exponential
bucketing.

This commit leaves the linear interpolation in place for NHCB, but
changes the interpolation for exponential native histograms to
exponential. This affects `histogram_quantile` and
`histogram_fraction` (because the latter is more or less the inverse
of the former).

The zero bucket has to be treated specially because the assumption
above would lead to an "interpolation to zero" (the bucket density
approaches infinity around zero, and with the postulated uniform usage
of buckets, we would end up with an estimate of zero for all quantiles
ending up in the zero bucket). We simply fall back to linear
interpolation within the zero bucket.

At the same time, this commit makes the call to stick with the
assumption that the zero bucket only contains positive observations
for native histograms without negative buckets (and vice versa). (This
is an assumption relevant for interpolation. It is a mostly academic
point, as the zero bucket is supposed to be very small anyway.
However, in cases where it _is_ relevantly broad, the assumption helps
a lot in practice.)

This commit also updates and completes the documentation to match both
details about interpolation.

As a more high level note: The approach here attempts to strike a
balance between a more simplistic approach without any assumption, and
a more involved approach with more sophisticated assumptions. I will
shortly describe both for reference:

The "zero assumption" approach would be to not interpolate at all, but
_always_ return the harmonic mean of the bucket boundaries of the
bucket the quantile ends up in. This has the advantage of minimizing
the maximum possible relative error of the quantile estimation.
(Depending on the exact definition of the relative error of an
estimation, there is also an argument to return the arithmetic mean of
the bucket boundaries.) While limiting the maximum possible relative
error is a good property, this approach would throw away the
information if a quantile is closer to the upper or lower end of the
population within a bucket. This can be valuable trending information
in a dashboard. With any kind of interpolation, the maximum possible
error of a quantile estimation increases to the full width of a bucket
(i.e. it more than doubles for the harmonic mean approach, and
precisely doubles for the arithmetic mean approach). However, in
return the _expectation value_ of the error decreases. The increase of
the theoretical maximum only has practical relevance for pathologic
distributions. For example, if there are thousand observations within
a bucket, they could _all_ be at the upper bound of the bucket. If the
quantile calculation picks the 1st observation in the bucket as the
relevant one, an interpolation will yield a value close to the lower
bucket boundary, while the true quantile value is close to the upper
boundary.

The "fancy interpolation" approach would be one that analyses the
_actual_ distribution of samples in the histogram. A lot of statistics
could be applied based on the information we have available in the
histogram. This would include the population of neighboring (or even
all) buckets in the histogram. In general, the resolution of a native
histogram should be quite high, and therefore, those "fancy"
approaches would increase the computational cost quite a bit with very
little practical benefits (i.e. just tiny corrections of the estimated
quantile value). The results are also much harder to reason with.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-09-19 14:19:10 +02:00
Jan Fajerski 91608c002f Merge branch 'main' into release-3.0-beta.0
Conflicts:
	scrape/scrape_test.go
          Pick both changes.
2024-09-10 20:51:20 +02:00
Charles Korn e8c7482137
Return negative counts when multiplied or divided by a negative value
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-09 14:37:59 +10:00
Charles Korn 113de6301c
Add failing test cases for unary negation and multiplication and division with negative scalars
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-04 16:20:28 +10:00
Charles Korn 9b451abec7
Make positive and negative bucket counts different in existing test cases
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-04 16:08:05 +10:00
Jan Fajerski 956245b25b promqltest: adjust eval times and range selector
In order to fix new tests for changes added in
https://github.com/prometheus/prometheus/pull/13904.

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-09-02 11:27:39 +02:00
Jan Fajerski 00315ce15e Merge branch 'main' into 3.0-main-sync-24-08-30
using -Xours

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-09-02 11:27:18 +02:00
Neeraj Gartia 8c7bf39d96
Moves TestNativeHistogram_MulDivOperator to promql testing framework (#14688)
Some checks are pending
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run
PromQL: add test for mul and div operator

Also, remove the converted test from the engine_test.go file.

This also includes an extension of the test framework to allow NaN/Inf in histogram buckets.

---------

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-08-29 16:42:35 +02:00
Jan Fajerski 7c8c748399 promql tests: adjust range query intervals
Some test queries need their interval adjusted to account for
https://github.com/prometheus/prometheus/pull/13904. Otherwise the
queries don't return enough samples.
promql/engine_test.go:TestHistogramCopyFromIteratorRegression needed the
same, but also the result needed a fix since `increase` interpolates
over the full range.

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-08-21 12:33:52 +02:00
Jan Fajerski 5138922b0d Merge branch 'main' into 3.0-main-sync-24-08-21 2024-08-21 09:09:36 +02:00
Ziqi Zhao 8f828d45c1 convert TestNativeHistogram_Sum_Count_Add_AvgOperator into testing framework
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2024-08-21 09:24:50 +08:00
Charles Korn 52818a97e2
Merge branch 'main' into sum-and-avg-over-mixed-custom-exponential-histograms
# Conflicts:
#	promql/promqltest/testdata/native_histograms.test
2024-08-14 07:52:08 +10:00
György Krajcsovits 386fc8b9f6 Update from review comments.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-08-13 15:26:07 +02:00
György Krajcsovits 6aee5b4b38 fix typo
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-08-12 12:04:45 +02:00
György Krajcsovits 06a8886b94 Native histograms: define behavior when rate is null.
Histogram quantile returns NaN in this case, which might be
surprising, so add a unit test that clarifies that this is
intentional.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-08-12 10:40:21 +02:00
Charles Korn f992f81bd0
Merge branch 'main' into sum-and-avg-over-mixed-custom-exponential-histograms
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>
2024-08-09 13:58:54 +10:00
Charles Korn 5cfdde327c
Address PR feedback: add extra test case
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-08-09 13:57:37 +10:00
Charles Korn 82bb35fabb
Address PR feedback: fix typo and rename variable
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-08-09 13:51:31 +10:00
Charles Korn f07b3ae67b
Fix issue where avg over mixed exponential and custom buckets, or incompatible custom buckets, produces incorrect results or panics
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-08-07 15:32:35 +10:00
Charles Korn 5ee94f49a2
Fix issue where sum over mixed exponential and custom buckets, or incompatible custom buckets, produces incorrect results
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-08-07 15:30:01 +10:00
Charles Korn 424cefcf5e
Fix "cannot reduce resolution to custom buckets schema" panic in rate over native histograms with mix of custom and exponential buckets
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-08-07 14:45:02 +10:00
Charles Korn aadec25faf
promql: Fix issue where some native histogram-related annotations are not emitted by rate (#14575)
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-08-06 09:10:40 +01:00
Jan Fajerski adf5d6bce1 Merge branch 'main' into 3.0-main-sync-24-07-18
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

Conflicts:
	VERSION
          pick 3.0.0
	promql/promqltest/testdata/histograms.test
          pick changes from c39776c5b5,
          but adjust 5m range selectors to 10m to account for
          https://github.com/prometheus/prometheus/pull/13904.
Fixes:
        promql/promqltest/testdata/functions.test
	promql/promqltest/testdata/staleness.test
          Tests added in https://github.com/prometheus/prometheus/pull/9138
          need to be adjusted to account for
          https://github.com/prometheus/prometheus/pull/13904.
2024-07-18 15:56:40 +02:00
Filip Petkovski acb6c1ae4b
Fix decoding buckets for native histograms in binops
The optimizer which detects cases where histogram buckets can be skipped
does not take into account binary expressions. This can lead to buckets
not being decoded if a metric is used with both histogram_fraction/quantile and
histogram_sum/count in the same expression.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2024-07-10 11:55:29 +02:00
Jan Fajerski c0d67fd845 Merge branch 'main' into 3.0-main-sync-24-07-09 2024-07-09 16:51:07 +02:00
Jan Fajerski b4152309a4 Merge branch 'main' into 3.0-main-sync
Conflicts:
  promql/engine_test.go
    Resolved by picking main changes but adjusting total_samples for
    query "max_over_time(metricWith1HistogramEvery10Seconds[60s])[20s:5s]"
    to 312. Via https://github.com/prometheus/prometheus/pull/13662 this
    histogram now stores 13 values per timestamp, but via
    https://github.com/prometheus/prometheus/pull/13904 the range query
    is now left-open.
  promql/promqltest/testdata/functions.test
    Resolved by picking changes in main. See also
    https://github.com/prometheus/prometheus/pull/13662, but adjust some
    range selectors (`s/1m/2m/`) to account for
    https://github.com/prometheus/prometheus/pull/13904.
  promql/promqltest/testdata/histograms.test
    Resolved by picking changes in main. See also
    https://github.com/prometheus/prometheus/pull/13662, but adjust some
    range selectors (`s/5m/10m/`) to account for
    https://github.com/prometheus/prometheus/pull/13904.

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-07-03 09:45:25 +02:00
Ganesh Vernekar 3d54bcc018
Merge pull request #14362 from charleskorn/charleskorn/sum-infinity 2024-07-03 01:05:03 -04:00
Charles Korn fd6bdf5230
Fix issue where summation of +/- infinity returns NaN instead of infinity
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-06-28 11:26:54 +10:00
Jeanette Tan dda5f48c9e Merge branch 'main' into nhcb-review-2 2024-06-20 22:50:00 +08:00
Zhang Zhanpeng debbdb8608 make matrix selection and lookback left-open and right-closed
Signed-off-by: Zhang Zhanpeng <zhangzhanpeng.zzp@alibaba-inc.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Co-authored-by: beorn7 <beorn@grafana.com>
2024-06-20 22:05:40 +08:00
Jeanette Tan 14f8dded39 Merge branch 'main' into nhcb
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-06-07 19:17:14 +08:00
beorn7 c7fdfe8004 promql: Add tests for histogram counter reset only in bucket
This also exercises the "fast path" (only decoding count and sum),
i.e. where the counter reset isn't visible at all in the decoded data.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-06-06 17:47:38 +02:00
Jeanette Tan f028496133 Merge branch 'main' into nhcb
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-05-14 16:20:15 +08:00
Neeraj Gartia 661856cb65 removes the added tests from engine_test.go
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-05-13 22:58:25 +05:30
Neeraj Gartia 6119124d0e some nits
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-05-13 22:55:38 +05:30
Neeraj Gartia adf5a36c1e adds test for sum, count, stddev, stdvar, quantile and fraction func to promql testing framework
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-05-13 22:55:38 +05:30
Neeraj Gartia 8b838a05d9 adds test for native histogram rate func in promql testing framework
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-05-13 22:55:38 +05:30
Neeraj Gartia 548bd9d6fb adds TestNativeHistogramRate func to promql test framework
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-05-13 22:55:38 +05:30
Bryan Boreham 11b27d5d22 test: move test files into new promqltest package
So that promql package does not bring in test-only dependencies.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-08 13:42:55 +01:00
Renamed from promql/testdata/native_histograms.test (Browse further)