mirror of
https://github.com/prometheus/prometheus.git
synced 2025-01-12 14:27:27 -08:00
Document the native histogram feature flag and PromQL (#11446)
Signed-off-by: beorn7 <beorn@grafana.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
This commit is contained in:
parent
50529b4804
commit
41035469d3
|
@ -103,3 +103,26 @@ When enabled, the default ports for HTTP (`:80`) or HTTPS (`:443`) will _not_ be
|
|||
the address used to scrape a target (the value of the `__address_` label), contrary to the default behavior.
|
||||
In addition, if a default HTTP or HTTPS port has already been added either in a static configuration or
|
||||
by a service discovery mechanism and the respective scheme is specified (`http` or `https`), that port will be removed.
|
||||
|
||||
## Native Histograms
|
||||
|
||||
`--enable-feature=native-histograms`
|
||||
|
||||
When enabled, Prometheus will ingest native histograms (formerly also known as
|
||||
sparse histograms or high-res histograms). Native histograms are still highly
|
||||
experimental. Expect breaking changes to happen (including those rendering the
|
||||
TSDB unreadable).
|
||||
|
||||
Native histograms are currently only supported in the traditional Prometheus
|
||||
protobuf exposition format. This feature flag therefore also enables a new (and
|
||||
also experimental) protobuf parser, through which _all_ metrics are ingested
|
||||
(i.e. not only native histograms). Prometheus will try to negotiate the
|
||||
protobuf format first. The instrumented target needs to support the protobuf
|
||||
format, too, _and_ it needs to expose native histograms. The protobuf format
|
||||
allows to expose conventional and native histograms side by side. With this
|
||||
feature flag disabled, Prometheus will continue to parse the conventional
|
||||
histogram (albeit via the text format). With this flag enabled, Prometheus will
|
||||
still ingest those conventional histograms that do not come with a
|
||||
corresponding native histogram. However, if a native histogram is present,
|
||||
Prometheus will ignore the corresponding conventional histogram, with the
|
||||
notable exception of exemplars, which are always ingested.
|
||||
|
|
|
@ -32,6 +32,16 @@ expression), only some of these types are legal as the result from a
|
|||
user-specified expression. For example, an expression that returns an instant
|
||||
vector is the only type that can be directly graphed.
|
||||
|
||||
_Notes about the experimental native histograms:_
|
||||
|
||||
* Ingesting native histograms has to be enabled via a [feature
|
||||
flag](../feature_flags/#native-histograms).
|
||||
* Once native histograms have been ingested into the TSDB (and even after
|
||||
disabling the feature flag again), both instant vectors and range vectors may
|
||||
now contain samples that aren't simple floating point numbers (float samples)
|
||||
but complete histograms (histogram samples). A vector may contain a mix of
|
||||
float samples and histogram samples.
|
||||
|
||||
## Literals
|
||||
|
||||
### String literals
|
||||
|
|
|
@ -11,6 +11,22 @@ instant-vector)`. This means that there is one argument `v` which is an instant
|
|||
vector, which if not provided it will default to the value of the expression
|
||||
`vector(time())`.
|
||||
|
||||
_Notes about the experimental native histograms:_
|
||||
|
||||
* Ingesting native histograms has to be enabled via a [feature
|
||||
flag](../feature_flags/#native-histograms). As long as no native histograms
|
||||
have been ingested into the TSDB, all functions will behave as usual.
|
||||
* Functions that do not explicitly mention native histograms in their
|
||||
documentation (see below) effectively treat a native histogram as a float
|
||||
sample of value 0. (This is confusing and will change before native
|
||||
histograms become a stable feature.)
|
||||
* Functions that do already act on native histograms might still change their
|
||||
behavior in the future.
|
||||
* If a function requires the same bucket layout between multiple native
|
||||
histograms it acts on, it will automatically convert them
|
||||
appropriately. (With the currently supported bucket schemas, that's always
|
||||
possible.)
|
||||
|
||||
## `abs()`
|
||||
|
||||
`abs(v instant-vector)` returns the input vector with all sample values converted to
|
||||
|
@ -19,8 +35,8 @@ their absolute value.
|
|||
## `absent()`
|
||||
|
||||
`absent(v instant-vector)` returns an empty vector if the vector passed to it
|
||||
has any elements and a 1-element vector with the value 1 if the vector passed to
|
||||
it has no elements.
|
||||
has any elements (floats or native histograms) and a 1-element vector with the
|
||||
value 1 if the vector passed to it has no elements.
|
||||
|
||||
This is useful for alerting on when no time series exist for a given metric name
|
||||
and label combination.
|
||||
|
@ -42,8 +58,8 @@ of the 1-element output vector from the input vector.
|
|||
## `absent_over_time()`
|
||||
|
||||
`absent_over_time(v range-vector)` returns an empty vector if the range vector
|
||||
passed to it has any elements and a 1-element vector with the value 1 if the
|
||||
range vector passed to it has no elements.
|
||||
passed to it has any elements (floats or native histograms) and a 1-element
|
||||
vector with the value 1 if the range vector passed to it has no elements.
|
||||
|
||||
This is useful for alerting on when no time series exist for a given metric name
|
||||
and label combination for a certain amount of time.
|
||||
|
@ -130,7 +146,14 @@ between now and 2 hours ago:
|
|||
delta(cpu_temp_celsius{host="zeus"}[2h])
|
||||
```
|
||||
|
||||
`delta` should only be used with gauges.
|
||||
`delta` acts on native histograms by calculating a new histogram where each
|
||||
compononent (sum and count of observations, buckets) is the difference between
|
||||
the respective component in the first and last native histogram in
|
||||
`v`. However, each element in `v` that contains a mix of float and native
|
||||
histogram samples within the range, will be missing from the result vector.
|
||||
|
||||
`delta` should only be used with gauges and native histograms where the
|
||||
components behave like gauges (so-called gauge histograms).
|
||||
|
||||
## `deriv()`
|
||||
|
||||
|
@ -156,15 +179,19 @@ to the nearest integer.
|
|||
|
||||
## `histogram_count()` and `histogram_sum()`
|
||||
|
||||
_Both functions only act on native histograms, which are an experimental
|
||||
feature. The behavior of these functions may change in future versions of
|
||||
Prometheus, including their removal from PromQL._
|
||||
|
||||
`histogram_count(v instant-vector)` returns the count of observations stored in
|
||||
a native Histogram. Samples that are not native Histograms are ignored and do
|
||||
a native histogram. Samples that are not native histograms are ignored and do
|
||||
not show up in the returned vector.
|
||||
|
||||
Similarly, `histogram_sum(v instant-vector)` returns the sum of observations
|
||||
stored in a native Histogram.
|
||||
stored in a native histogram.
|
||||
|
||||
Use `histogram_count` in the following way to calculate a rate of observations
|
||||
(in this case corresponding to “requests per second”) from a native Histogram:
|
||||
(in this case corresponding to “requests per second”) from a native histogram:
|
||||
|
||||
histogram_count(rate(http_request_duration_seconds[10m]))
|
||||
|
||||
|
@ -177,57 +204,121 @@ observed values (in this case corresponding to “average request duration”):
|
|||
|
||||
## `histogram_fraction()`
|
||||
|
||||
TODO(beorn7): Add documentation.
|
||||
_This function only acts on native histograms, which are an experimental
|
||||
feature. The behavior of this function may change in future versions of
|
||||
Prometheus, including its removal from PromQL._
|
||||
|
||||
For a native histogram, `histogram_fraction(lower scalar, upper scalar, v
|
||||
instant-vector)` returns the estimated fraction of observations between the
|
||||
provided lower and upper values. Samples that are not native histograms are
|
||||
ignored and do not show up in the returned vector.
|
||||
|
||||
For example, the following expression calculates the fraction of HTTP requests
|
||||
over the last hour that took 200ms or less:
|
||||
|
||||
histogram_fraction(0, 0.2, rate(http_request_duration_seconds[1h]))
|
||||
|
||||
The error of the estimation depends on the resolution of the underlying native
|
||||
histogram and how closely the provided boundaries are aligned with the bucket
|
||||
boundaries in the histogram.
|
||||
|
||||
`+Inf` and `-Inf` are valid boundary values. For example, if the histogram in
|
||||
the expression above included negative observations (which shouldn't be the
|
||||
case for request durations), the appropriate lower boundary to include all
|
||||
observations less than or equal 0.2 would be `-Inf` rather than `0`.
|
||||
|
||||
Whether the provided boundaries are inclusive or exclusive is only relevant if
|
||||
the provided boundaries are precisely aligned with bucket boundaries in the
|
||||
underlying native histogram. In this case, the behavior depends on the schema
|
||||
definition of the histogram. The currently supported schemas all feature
|
||||
inclusive upper boundaries and exclusive lower boundaries for positive values
|
||||
(and vice versa for negative values). Without a precise alignment of
|
||||
boundaries, the function uses linear interpolation to estimate the
|
||||
fraction. With the resulting uncertainty, it becomes irrelevant if the
|
||||
boundaries are inclusive or exclusive.
|
||||
|
||||
## `histogram_quantile()`
|
||||
|
||||
TODO(beorn7): This needs a lot of updates for Histograms as sample value types.
|
||||
`histogram_quantile(φ scalar, b instant-vector)` calculates the φ-quantile (0 ≤
|
||||
φ ≤ 1) from a [conventional
|
||||
histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) or from
|
||||
a native histogram. (See [histograms and
|
||||
summaries](https://prometheus.io/docs/practices/histograms) for a detailed
|
||||
explanation of φ-quantiles and the usage of the (conventional) histogram metric
|
||||
type in general.)
|
||||
|
||||
`histogram_quantile(φ scalar, b instant-vector)` calculates the φ-quantile (0 ≤ φ
|
||||
≤ 1) from the buckets `b` of a
|
||||
[histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). (See
|
||||
[histograms and summaries](https://prometheus.io/docs/practices/histograms) for
|
||||
a detailed explanation of φ-quantiles and the usage of the histogram metric type
|
||||
in general.) The samples in `b` are the counts of observations in each bucket.
|
||||
Each sample must have a label `le` where the label value denotes the inclusive
|
||||
upper bound of the bucket. (Samples without such a label are silently ignored.)
|
||||
The [histogram metric type](https://prometheus.io/docs/concepts/metric_types/#histogram)
|
||||
automatically provides time series with the `_bucket` suffix and the appropriate
|
||||
labels.
|
||||
_Note that native histograms are an experimental feature. The behavior of this
|
||||
function when dealing with native histograms may change in future versions of
|
||||
Prometheus._
|
||||
|
||||
The conventional float samples in `b` are considered the counts of observations
|
||||
in each bucket of one or more conventional histograms. Each float sample must
|
||||
have a label `le` where the label value denotes the inclusive upper bound of
|
||||
the bucket. (Float samples without such a label are silently ignored.) The
|
||||
other labels and the metric name are used to identify the buckets belonging to
|
||||
each conventional histogram. The [histogram metric
|
||||
type](https://prometheus.io/docs/concepts/metric_types/#histogram)
|
||||
automatically provides time series with the `_bucket` suffix and the
|
||||
appropriate labels.
|
||||
|
||||
The native histogram samples in `b` are treated each individually as a separate
|
||||
histogram to calculate the quantile from.
|
||||
|
||||
As long as no naming collisions arise, `b` may contain a mix of conventional
|
||||
and native histograms.
|
||||
|
||||
Use the `rate()` function to specify the time window for the quantile
|
||||
calculation.
|
||||
|
||||
Example: A histogram metric is called `http_request_duration_seconds`. To
|
||||
calculate the 90th percentile of request durations over the last 10m, use the
|
||||
following expression:
|
||||
Example: A histogram metric is called `http_request_duration_seconds` (and
|
||||
therefore the metric name for the buckets of a conventional histogram is
|
||||
`http_request_duration_seconds_bucket`). To calculate the 90th percentile of request
|
||||
durations over the last 10m, use the following expression in case
|
||||
`http_request_duration_seconds` is a conventional histogram:
|
||||
|
||||
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
|
||||
|
||||
For a native histogram, use the following expression instead:
|
||||
|
||||
histogram_quantile(0.9, rate(http_request_duration_seconds[10m]))
|
||||
|
||||
The quantile is calculated for each label combination in
|
||||
`http_request_duration_seconds`. To aggregate, use the `sum()` aggregator
|
||||
around the `rate()` function. Since the `le` label is required by
|
||||
`histogram_quantile()`, it has to be included in the `by` clause. The following
|
||||
expression aggregates the 90th percentile by `job`:
|
||||
`histogram_quantile()` to deal with conventional histograms, it has to be
|
||||
included in the `by` clause. The following expression aggregates the 90th
|
||||
percentile by `job` for conventional histograms:
|
||||
|
||||
histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m])))
|
||||
|
||||
To aggregate everything, specify only the `le` label:
|
||||
When aggregating native histograms, the expression simplifies to:
|
||||
|
||||
histogram_quantile(0.9, sum by (job) (rate(http_request_duration_seconds[10m])))
|
||||
|
||||
To aggregate all conventional histograms, specify only the `le` label:
|
||||
|
||||
histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m])))
|
||||
|
||||
The `histogram_quantile()` function interpolates quantile values by
|
||||
assuming a linear distribution within a bucket. The highest bucket
|
||||
must have an upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If
|
||||
a quantile is located in the highest bucket, the upper bound of the
|
||||
second highest bucket is returned. A lower limit of the lowest bucket
|
||||
is assumed to be 0 if the upper bound of that bucket is greater than
|
||||
0. In that case, the usual linear interpolation is applied within that
|
||||
bucket. Otherwise, the upper bound of the lowest bucket is returned
|
||||
for quantiles located in the lowest bucket.
|
||||
With native histograms, aggregating everything works as usual without any `by` clause:
|
||||
|
||||
histogram_quantile(0.9, sum(rate(http_request_duration_seconds[10m])))
|
||||
|
||||
The `histogram_quantile()` function interpolates quantile values by
|
||||
assuming a linear distribution within a bucket.
|
||||
|
||||
If `b` has 0 observations, `NaN` is returned. For φ < 0, `-Inf` is
|
||||
returned. For φ > 1, `+Inf` is returned. For φ = `NaN`, `NaN` is returned.
|
||||
|
||||
The following is only relevant for conventional histograms: If `b` contains
|
||||
fewer than two buckets, `NaN` is returned. The highest bucket must have an
|
||||
upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If a quantile is located
|
||||
in the highest bucket, the upper bound of the second highest bucket is
|
||||
returned. A lower limit of the lowest bucket is assumed to be 0 if the upper
|
||||
bound of that bucket is greater than
|
||||
0. In that case, the usual linear interpolation is applied within that
|
||||
bucket. Otherwise, the upper bound of the lowest bucket is returned for
|
||||
quantiles located in the lowest bucket.
|
||||
|
||||
If `b` has 0 observations, `NaN` is returned. If `b` contains fewer than two buckets,
|
||||
`NaN` is returned. For φ < 0, `-Inf` is returned. For φ > 1, `+Inf` is returned. For φ = `NaN`, `NaN` is returned.
|
||||
|
||||
## `holt_winters()`
|
||||
|
||||
|
@ -269,11 +360,17 @@ over the last 5 minutes, per time series in the range vector:
|
|||
increase(http_requests_total{job="api-server"}[5m])
|
||||
```
|
||||
|
||||
`increase` should only be used with counters. It is syntactic sugar
|
||||
for `rate(v)` multiplied by the number of seconds under the specified
|
||||
time range window, and should be used primarily for human readability.
|
||||
Use `rate` in recording rules so that increases are tracked consistently
|
||||
on a per-second basis.
|
||||
`increase` acts on native histograms by calculating a new histogram where each
|
||||
compononent (sum and count of observations, buckets) is the increase between
|
||||
the respective component in the first and last native histogram in
|
||||
`v`. However, each element in `v` that contains a mix of float and native
|
||||
histogram samples within the range, will be missing from the result vector.
|
||||
|
||||
`increase` should only be used with counters and native histograms where the
|
||||
components behave like counters. It is syntactic sugar for `rate(v)` multiplied
|
||||
by the number of seconds under the specified time range window, and should be
|
||||
used primarily for human readability. Use `rate` in recording rules so that
|
||||
increases are tracked consistently on a per-second basis.
|
||||
|
||||
## `irate()`
|
||||
|
||||
|
@ -385,8 +482,15 @@ over the last 5 minutes, per time series in the range vector:
|
|||
rate(http_requests_total{job="api-server"}[5m])
|
||||
```
|
||||
|
||||
`rate` should only be used with counters. It is best suited for alerting,
|
||||
and for graphing of slow-moving counters.
|
||||
`rate` acts on native histograms by calculating a new histogram where each
|
||||
compononent (sum and count of observations, buckets) is the rate of increase
|
||||
between the respective component in the first and last native histogram in
|
||||
`v`. However, each element in `v` that contains a mix of float and native
|
||||
histogram samples within the range, will be missing from the result vector.
|
||||
|
||||
`rate` should only be used with counters and native histograms where the
|
||||
components behave like counters. It is best suited for alerting, and for
|
||||
graphing of slow-moving counters.
|
||||
|
||||
Note that when combining `rate()` with an aggregation operator (e.g. `sum()`)
|
||||
or a function aggregating over time (any function ending in `_over_time`),
|
||||
|
|
|
@ -306,3 +306,31 @@ highest to lowest.
|
|||
Operators on the same precedence level are left-associative. For example,
|
||||
`2 * 3 % 2` is equivalent to `(2 * 3) % 2`. However `^` is right associative,
|
||||
so `2 ^ 3 ^ 2` is equivalent to `2 ^ (3 ^ 2)`.
|
||||
|
||||
## Operators for native histograms
|
||||
|
||||
Native histograms are an experimental feature. Ingesting native histograms has
|
||||
to be enabled via a [feature flag](../feature_flags/#native-histograms). Once
|
||||
native histograms have been ingested, they can be queried (even after the
|
||||
feature flag has been disabled again). However, the operator support for native
|
||||
histograms is still very limited.
|
||||
|
||||
Logical/set binary operators work as expected even if histogram samples are
|
||||
involved. They only check for the existence of a vector element and don't
|
||||
change their behavior depending on the sample type of an element (float or
|
||||
histogram).
|
||||
|
||||
The binary `+` operator between two native histograms and the `sum` aggregation
|
||||
operator to aggregate native histograms are fully supported. Even if the
|
||||
histograms involved have different bucket layouts, the buckets are
|
||||
automatically converted appropriately so that the operation can be
|
||||
performed. (With the currently supported bucket schemas, that's always
|
||||
possible.) If either operator has to sum up a mix of histogram samples and
|
||||
float samples, the corresponding vector element is removed from the output
|
||||
vector entirely.
|
||||
|
||||
All other operators do not behave in a meaningful way. They either treat the
|
||||
histogram sample as if it were a float sample of value 0, or (in case of
|
||||
arithmetic operations between a scalar and a vector) they leave the histogram
|
||||
sample unchanged. This behavior will change to a meaningful one before native
|
||||
histograms are a stable feature.
|
||||
|
|
Loading…
Reference in a new issue