Mirror of https://github.com/prometheus/prometheus.git (synced 2025-01-12 06:17:27 -08:00)

Document the native histogram feature flag and PromQL (#11446)

Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>

Parent: 50529b4804
Commit: 41035469d3

@@ -103,3 +103,26 @@ When enabled, the default ports for HTTP (`:80`) or HTTPS (`:443`) will _not_ be
the address used to scrape a target (the value of the `__address__` label), contrary to the default behavior.
In addition, if a default HTTP or HTTPS port has already been added either in a static configuration or
by a service discovery mechanism and the respective scheme is specified (`http` or `https`), that port will be removed.

## Native Histograms

`--enable-feature=native-histograms`

When enabled, Prometheus will ingest native histograms (formerly also known as
sparse histograms or high-res histograms). Native histograms are still highly
experimental. Expect breaking changes to happen (including those rendering the
TSDB unreadable).

Native histograms are currently only supported in the traditional Prometheus
protobuf exposition format. This feature flag therefore also enables a new (and
also experimental) protobuf parser, through which _all_ metrics are ingested
(i.e. not only native histograms). Prometheus will try to negotiate the
protobuf format first. The instrumented target needs to support the protobuf
format, too, _and_ it needs to expose native histograms. The protobuf format
allows conventional and native histograms to be exposed side by side. With this
feature flag disabled, Prometheus will continue to parse the conventional
histogram (albeit via the text format). With this flag enabled, Prometheus will
still ingest those conventional histograms that do not come with a
corresponding native histogram. However, if a native histogram is present,
Prometheus will ignore the corresponding conventional histogram, with the
notable exception of exemplars, which are always ingested.
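
The ingestion rules above can be summarized in a small Python model of which parts of one exposed histogram end up being ingested. This is a sketch for illustration, not Prometheus code: the function `ingested_parts` and its boolean parameters are hypothetical, it assumes a fallback to the text format when the target does not support protobuf, and exemplar handling is only modeled for the flag-enabled protobuf path that the text describes.

```python
def ingested_parts(flag_enabled: bool, target_supports_protobuf: bool,
                   has_native: bool, has_conventional: bool,
                   has_exemplars: bool) -> set:
    """Hypothetical model of which parts of one exposed histogram are ingested."""
    if not (flag_enabled and target_supports_protobuf):
        # Without the flag (or without a protobuf-capable target), only the
        # conventional histogram is parsed, via the text format. Exemplar
        # handling on this path is outside the scope of this sketch.
        return {"conventional"} if has_conventional else set()
    parts = set()
    if has_native:
        # A native histogram takes precedence; its conventional counterpart
        # is ignored.
        parts.add("native")
    elif has_conventional:
        parts.add("conventional")
    if has_exemplars:
        # Exemplars are always ingested on this path.
        parts.add("exemplars")
    return parts


# A protobuf-capable target exposing both flavors side by side, with exemplars:
print(sorted(ingested_parts(True, True, True, True, True)))  # ['exemplars', 'native']
```

The model makes the asymmetry explicit: a conventional histogram is only dropped when a native histogram for the same metric is actually present.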

@@ -32,6 +32,16 @@ expression), only some of these types are legal as the result from a
user-specified expression. For example, an expression that returns an instant
vector is the only type that can be directly graphed.

_Notes about the experimental native histograms:_

* Ingesting native histograms has to be enabled via a [feature
  flag](../feature_flags/#native-histograms).
* Once native histograms have been ingested into the TSDB (and even after
  disabling the feature flag again), both instant vectors and range vectors may
  now contain samples that aren't simple floating point numbers (float samples)
  but complete histograms (histogram samples). A vector may contain a mix of
  float samples and histogram samples.
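
As a rough illustration of such a mixed vector, the following Python sketch models an instant vector as a mapping from label sets to samples, where each sample is either a plain float or a histogram. The `Histogram` class is a hypothetical toy stand-in (a count, a sum, and bucket counts keyed by index), not the actual TSDB representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class Histogram:
    """Toy stand-in for a histogram sample (not the real TSDB type)."""
    count: float
    sum: float
    buckets: Dict[int, float] = field(default_factory=dict)


Labels = Tuple[Tuple[str, str], ...]
Sample = Union[float, Histogram]


def count_sample_types(vector: Dict[Labels, Sample]) -> Dict[str, int]:
    """Count float samples and histogram samples in one instant vector."""
    counts = {"float": 0, "histogram": 0}
    for sample in vector.values():
        counts["histogram" if isinstance(sample, Histogram) else "float"] += 1
    return counts


# One vector may hold both kinds of samples at the same time.
mixed: Dict[Labels, Sample] = {
    (("job", "api"),): 42.0,
    (("job", "db"),): Histogram(count=10, sum=3.5, buckets={0: 4, 1: 6}),
}
print(count_sample_types(mixed))  # {'float': 1, 'histogram': 1}
```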

## Literals

### String literals

@@ -11,6 +11,22 @@ instant-vector)`. This means that there is one argument `v` which is an instant
vector, which, if not provided, will default to the value of the expression
`vector(time())`.

_Notes about the experimental native histograms:_

* Ingesting native histograms has to be enabled via a [feature
  flag](../feature_flags/#native-histograms). As long as no native histograms
  have been ingested into the TSDB, all functions will behave as usual.
* Functions that do not explicitly mention native histograms in their
  documentation (see below) effectively treat a native histogram as a float
  sample of value 0. (This is confusing and will change before native
  histograms become a stable feature.)
* Functions that already act on native histograms might still change their
  behavior in the future.
* If a function requires the same bucket layout between multiple native
  histograms it acts on, it will automatically convert them
  appropriately. (With the currently supported bucket schemas, that's always
  possible.)
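
Why a common bucket layout can always be reached is easiest to see in a simplified model of the exponential bucketing used by native histograms. The following Python sketch assumes that, under schema `s`, positive bucket `i` has the upper bound `base ** i` with `base = 2 ** (2 ** -s)`, so lowering the schema by one squares the base and merges pairs of adjacent buckets. The function names are hypothetical, and the zero bucket, negative buckets, and the sparse span encoding of real native histograms are ignored.

```python
from collections import defaultdict
from typing import Dict


def upper_bound(schema: int, index: int) -> float:
    """Upper bound of positive bucket `index`, assuming base = 2 ** (2 ** -schema)."""
    base = 2 ** (2 ** -schema)
    return base ** index


def reduce_schema(buckets: Dict[int, float], schema: int, target: int) -> Dict[int, float]:
    """Merge positive bucket counts from `schema` down to the coarser `target` schema."""
    if target > schema:
        raise ValueError("buckets can only be merged into a coarser (lower) schema")
    result = dict(buckets)
    for _ in range(schema - target):
        merged: Dict[int, float] = defaultdict(float)
        for index, count in result.items():
            # Each step squares the base, so coarse bucket j covers the
            # fine buckets 2j - 1 and 2j, i.e. j = ceil(index / 2).
            merged[(index + 1) // 2] += count
        result = dict(merged)
    return result


# Schema 1 buckets ending at 2 ** 0.5, 2, and 2 ** 1.5 collapse into
# schema 0 buckets ending at 2 and 4.
print(reduce_schema({1: 3, 2: 4, 3: 5}, schema=1, target=0))  # {1: 7.0, 2: 5.0}
```

In this model every boundary of a coarser schema is also a boundary of each finer schema, so two histograms with different schemas can always be converted to the lower of the two without splitting any bucket.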

## `abs()`

`abs(v instant-vector)` returns the input vector with all sample values converted to

@@ -19,8 +35,8 @@ their absolute value.
## `absent()`

`absent(v instant-vector)` returns an empty vector if the vector passed to it
has any elements (floats or native histograms) and a 1-element vector with the
value 1 if the vector passed to it has no elements.

This is useful for alerting when no time series exist for a given metric name
and label combination.

@@ -42,8 +58,8 @@ of the 1-element output vector from the input vector.
## `absent_over_time()`

`absent_over_time(v range-vector)` returns an empty vector if the range vector
passed to it has any elements (floats or native histograms) and a 1-element
vector with the value 1 if the range vector passed to it has no elements.

This is useful for alerting when no time series exist for a given metric name
and label combination for a certain amount of time.

@@ -130,7 +146,14 @@ between now and 2 hours ago:
delta(cpu_temp_celsius{host="zeus"}[2h])
```

`delta` acts on native histograms by calculating a new histogram where each
component (sum and count of observations, buckets) is the difference between
the respective component in the first and last native histogram in
`v`. However, each element in `v` that contains a mix of float and native
histogram samples within the range will be missing from the result vector.

`delta` should only be used with gauges and native histograms where the
components behave like gauges (so-called gauge histograms).
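
The componentwise behavior described above, including dropping elements that mix float and histogram samples, can be modeled in a few lines of Python. This is a deliberately simplified sketch: `simple_delta` and the toy `Histogram` class are hypothetical, all histograms are assumed to share one bucket layout (buckets keyed by index), and the extrapolation to the boundaries of the range that `delta` performs is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Histogram:
    """Toy native histogram: count, sum, and bucket counts keyed by index."""
    count: float
    sum: float
    buckets: Dict[int, float] = field(default_factory=dict)


Sample = Union[float, Histogram]


def simple_delta(samples: List[Sample]) -> Optional[Sample]:
    """Last sample minus first sample; None means "missing from the result".

    No extrapolation to the range boundaries is performed.
    """
    if len(samples) < 2:
        return None  # a difference needs at least two samples
    is_histogram = [isinstance(s, Histogram) for s in samples]
    if any(is_histogram) and not all(is_histogram):
        return None  # mix of float and histogram samples within the range
    first, last = samples[0], samples[-1]
    if isinstance(first, Histogram):
        indexes = sorted(set(first.buckets) | set(last.buckets))
        return Histogram(
            count=last.count - first.count,
            sum=last.sum - first.sum,
            buckets={i: last.buckets.get(i, 0.0) - first.buckets.get(i, 0.0)
                     for i in indexes},
        )
    return last - first


h1 = Histogram(count=10.0, sum=5.0, buckets={0: 4.0, 1: 6.0})
h2 = Histogram(count=15.0, sum=8.0, buckets={0: 5.0, 1: 9.0, 2: 1.0})
print(simple_delta([h1, h2]))  # Histogram(count=5.0, sum=3.0, buckets={0: 1.0, 1: 3.0, 2: 1.0})
print(simple_delta([1.0, h2]))  # None
```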

## `deriv()`

@@ -156,15 +179,19 @@ to the nearest integer.

## `histogram_count()` and `histogram_sum()`

_Both functions only act on native histograms, which are an experimental
feature. The behavior of these functions may change in future versions of
Prometheus, including their removal from PromQL._

`histogram_count(v instant-vector)` returns the count of observations stored in
a native histogram. Samples that are not native histograms are ignored and do
not show up in the returned vector.

Similarly, `histogram_sum(v instant-vector)` returns the sum of observations
stored in a native histogram.

Use `histogram_count` in the following way to calculate a rate of observations
(in this case corresponding to “requests per second”) from a native histogram:

    histogram_count(rate(http_request_duration_seconds[10m]))
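
A minimal Python model of the two functions shows how float samples simply disappear from the result. The names mirror the PromQL functions for readability only; the toy `Histogram` class is hypothetical, and label handling is simplified (labels are passed through unchanged).

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Histogram:
    """Toy stand-in for a native histogram sample."""
    count: float
    sum: float


Labels = Tuple[Tuple[str, str], ...]
Sample = Union[float, Histogram]


def histogram_count(vector: Dict[Labels, Sample]) -> Dict[Labels, float]:
    """Observation count per native histogram; float samples are dropped."""
    return {lbl: s.count for lbl, s in vector.items() if isinstance(s, Histogram)}


def histogram_sum(vector: Dict[Labels, Sample]) -> Dict[Labels, float]:
    """Sum of observations per native histogram; float samples are dropped."""
    return {lbl: s.sum for lbl, s in vector.items() if isinstance(s, Histogram)}


v: Dict[Labels, Sample] = {
    (("handler", "/api"),): Histogram(count=12.0, sum=30.0),
    (("handler", "/up"),): 1.0,  # a float sample: ignored by both functions
}
print(histogram_count(v))  # {(('handler', '/api'),): 12.0}
print(histogram_sum(v))    # {(('handler', '/api'),): 30.0}
```

Dividing the sum by the count for the same label set yields the average observed value, here 30.0 / 12.0 = 2.5.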

@@ -177,57 +204,121 @@ observed values (in this case corresponding to “average request duration”):

## `histogram_fraction()`

_This function only acts on native histograms, which are an experimental
feature. The behavior of this function may change in future versions of
Prometheus, including its removal from PromQL._

For a native histogram, `histogram_fraction(lower scalar, upper scalar, v instant-vector)`
returns the estimated fraction of observations between the
provided lower and upper values. Samples that are not native histograms are
ignored and do not show up in the returned vector.

For example, the following expression calculates the fraction of HTTP requests
over the last hour that took 200ms or less:

    histogram_fraction(0, 0.2, rate(http_request_duration_seconds[1h]))

The error of the estimation depends on the resolution of the underlying native
histogram and how closely the provided boundaries are aligned with the bucket
boundaries in the histogram.

`+Inf` and `-Inf` are valid boundary values. For example, if the histogram in
the expression above included negative observations (which shouldn't be the
case for request durations), the appropriate lower boundary to include all
observations less than or equal to 0.2 would be `-Inf` rather than `0`.

Whether the provided boundaries are inclusive or exclusive is only relevant if
the provided boundaries are precisely aligned with bucket boundaries in the
underlying native histogram. In this case, the behavior depends on the schema
definition of the histogram. The currently supported schemas all feature
inclusive upper boundaries and exclusive lower boundaries for positive values
(and vice versa for negative values). Without a precise alignment of
boundaries, the function uses linear interpolation to estimate the
fraction. With the resulting uncertainty, it becomes irrelevant whether the
boundaries are inclusive or exclusive.
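
The linear interpolation mentioned above can be sketched in Python as follows. This is an illustrative model, not the Prometheus implementation: `estimate_fraction` is a hypothetical name, buckets are given as explicit `(lower, upper, count)` triples with finite bounds, and native-histogram details such as the zero bucket and schema-based bucket boundaries are ignored.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float, float]  # (lower bound, upper bound, count)


def estimate_fraction(lower: float, upper: float, buckets: List[Bucket]) -> float:
    """Estimated fraction of observations in [lower, upper], interpolating linearly."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return math.nan
    inside = 0.0
    for b_lower, b_upper, count in buckets:
        lo = max(lower, b_lower)
        hi = min(upper, b_upper)
        if hi <= lo:
            continue  # this bucket lies entirely outside the requested range
        # Assume the bucket's observations are spread evenly across its width.
        inside += count * (hi - lo) / (b_upper - b_lower)
    return inside / total


example_buckets = [(0.0, 0.1, 50.0), (0.1, 0.2, 30.0), (0.2, 0.4, 20.0)]
print(estimate_fraction(0, 0.2, example_buckets))  # 0.8
```

With boundaries aligned to bucket boundaries (0 and 0.2), no interpolation happens. Moving the upper boundary to 0.3, inside the third bucket, adds half of that bucket's count under the linear assumption, giving about 0.9. Infinite boundaries work naturally, because `max` and `min` then fall back to the bucket's own bounds.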

## `histogram_quantile()`

`histogram_quantile(φ scalar, b instant-vector)` calculates the φ-quantile (0 ≤ φ ≤ 1)
from a [conventional
histogram](https://prometheus.io/docs/concepts/metric_types/#histogram) or from
a native histogram. (See [histograms and
summaries](https://prometheus.io/docs/practices/histograms) for a detailed
explanation of φ-quantiles and the usage of the (conventional) histogram metric
type in general.)

_Note that native histograms are an experimental feature. The behavior of this
function when dealing with native histograms may change in future versions of
Prometheus._

The conventional float samples in `b` are considered the counts of observations
in each bucket of one or more conventional histograms. Each float sample must
have a label `le` where the label value denotes the inclusive upper bound of
the bucket. (Float samples without such a label are silently ignored.) The
other labels and the metric name are used to identify the buckets belonging to
each conventional histogram. The [histogram metric
type](https://prometheus.io/docs/concepts/metric_types/#histogram)
automatically provides time series with the `_bucket` suffix and the
appropriate labels.

Each native histogram sample in `b` is treated individually as a separate
histogram from which the quantile is calculated.

As long as no naming collisions arise, `b` may contain a mix of conventional
and native histograms.

Use the `rate()` function to specify the time window for the quantile
calculation.

Example: A histogram metric is called `http_request_duration_seconds` (and
therefore the metric name for the buckets of a conventional histogram is
`http_request_duration_seconds_bucket`). To calculate the 90th percentile of request
durations over the last 10m, use the following expression in case
`http_request_duration_seconds` is a conventional histogram:

    histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))

For a native histogram, use the following expression instead:

    histogram_quantile(0.9, rate(http_request_duration_seconds[10m]))

The quantile is calculated for each label combination in
`http_request_duration_seconds`. To aggregate, use the `sum()` aggregator
around the `rate()` function. Since the `le` label is required by
`histogram_quantile()` to deal with conventional histograms, it has to be
included in the `by` clause. The following expression aggregates the 90th
percentile by `job` for conventional histograms:

    histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m])))

When aggregating native histograms, the expression simplifies to:

    histogram_quantile(0.9, sum by (job) (rate(http_request_duration_seconds[10m])))

To aggregate all conventional histograms, specify only the `le` label:

    histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m])))

With native histograms, aggregating everything works as usual without any `by` clause:

    histogram_quantile(0.9, sum(rate(http_request_duration_seconds[10m])))

The `histogram_quantile()` function interpolates quantile values by
assuming a linear distribution within a bucket.

If `b` has 0 observations, `NaN` is returned. For φ < 0, `-Inf` is
returned. For φ > 1, `+Inf` is returned. For φ = `NaN`, `NaN` is returned.

The following is only relevant for conventional histograms: If `b` contains
fewer than two buckets, `NaN` is returned. The highest bucket must have an
upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If a quantile is located
in the highest bucket, the upper bound of the second highest bucket is
returned. A lower limit of the lowest bucket is assumed to be 0 if the upper
bound of that bucket is greater than 0. In that case, the usual linear
interpolation is applied within that bucket. Otherwise, the upper bound of the
lowest bucket is returned for quantiles located in the lowest bucket.
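
For conventional histograms, the rules above can be sketched in Python. This is a simplified model for illustration, not Prometheus's code: `bucket_quantile` is a hypothetical name, the buckets of one histogram are passed as cumulative `(le, count)` pairs, and details such as merging duplicate `le` values or correcting non-monotonic bucket counts are omitted.

```python
import math
from typing import List, Tuple


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """φ-quantile from one conventional histogram's cumulative (le, count) buckets."""
    if math.isnan(q):
        return math.nan
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # need at least two buckets, the highest being +Inf
    observations = buckets[-1][1]
    if observations == 0:
        return math.nan
    rank = q * observations
    # Index of the first bucket (ignoring +Inf) whose cumulative count reaches rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # quantile in the +Inf bucket
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]  # lowest bucket ends at or below 0: no interpolation
    bucket_start = 0.0  # assumed lower limit of the lowest bucket
    bucket_end, count = buckets[b]
    if b > 0:
        bucket_start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan  # rank 0 in an empty lowest bucket: 0/0 is undefined
    # Linear interpolation within the bucket.
    return bucket_start + (bucket_end - bucket_start) * (rank / count)


example = [(0.1, 50.0), (0.2, 80.0), (0.4, 95.0), (math.inf, 100.0)]
print(bucket_quantile(0.25, example))  # 0.05
print(bucket_quantile(0.99, example))  # 0.4
```

For φ = 0.9, the rank is 90, which falls into the bucket from 0.2 to 0.4 holding observations 81 to 95. Interpolating 10 of its 15 observations gives 0.2 + 0.2 · 10/15 ≈ 0.333.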

## `holt_winters()`

@@ -269,11 +360,17 @@ over the last 5 minutes, per time series in the range vector:
increase(http_requests_total{job="api-server"}[5m])
```

`increase` acts on native histograms by calculating a new histogram where each
component (sum and count of observations, buckets) is the increase between
the respective component in the first and last native histogram in
`v`. However, each element in `v` that contains a mix of float and native
histogram samples within the range will be missing from the result vector.

`increase` should only be used with counters and native histograms where the
components behave like counters. It is syntactic sugar for `rate(v)` multiplied
by the number of seconds under the specified time range window, and should be
used primarily for human readability. Use `rate` in recording rules so that
increases are tracked consistently on a per-second basis.
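
The relationship between `increase` and `rate` can be illustrated with a simplified Python sketch for a float counter. The function names are hypothetical, any drop in value is treated as a counter reset (the value after the drop counts entirely as increase), and the extrapolation to the window boundaries that the real functions perform is omitted. The sketch therefore shows the idea rather than the exact results Prometheus returns, and it does not model native histograms.

```python
from typing import List


def counter_increase(values: List[float]) -> float:
    """Increase across consecutive counter samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # After a reset the counter restarts from zero, so the whole new
        # value counts as increase; otherwise take the plain difference.
        total += cur - prev if cur >= prev else cur
    return total


def counter_rate(values: List[float], window_seconds: float) -> float:
    """Per-second rate: the increase divided by the window length in seconds."""
    return counter_increase(values) / window_seconds


samples = [10.0, 20.0, 5.0, 15.0]  # the counter resets between 20 and 5
print(counter_increase(samples))  # 25.0
# Multiplying the rate by the window length recovers the increase
# (up to floating-point rounding).
print(counter_rate(samples, 300.0) * 300.0)
```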

## `irate()`

@@ -385,8 +482,15 @@ over the last 5 minutes, per time series in the range vector:
rate(http_requests_total{job="api-server"}[5m])
```

`rate` acts on native histograms by calculating a new histogram where each
component (sum and count of observations, buckets) is the rate of increase
between the respective component in the first and last native histogram in
`v`. However, each element in `v` that contains a mix of float and native
histogram samples within the range will be missing from the result vector.

`rate` should only be used with counters and native histograms where the
components behave like counters. It is best suited for alerting, and for
graphing of slow-moving counters.

Note that when combining `rate()` with an aggregation operator (e.g. `sum()`)
or a function aggregating over time (any function ending in `_over_time`),

@@ -306,3 +306,31 @@ highest to lowest.
Operators on the same precedence level are left-associative. For example,
`2 * 3 % 2` is equivalent to `(2 * 3) % 2`. However, `^` is right-associative,
so `2 ^ 3 ^ 2` is equivalent to `2 ^ (3 ^ 2)`.

## Operators for native histograms

Native histograms are an experimental feature. Ingesting native histograms has
to be enabled via a [feature flag](../feature_flags/#native-histograms). Once
native histograms have been ingested, they can be queried (even after the
feature flag has been disabled again). However, the operator support for native
histograms is still very limited.

Logical/set binary operators work as expected even if histogram samples are
involved. They only check for the existence of a vector element and don't
change their behavior depending on the sample type of an element (float or
histogram).
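
Because these operators only look at which label sets exist, they can be modeled without ever inspecting a sample. In the following Python sketch, the function names and the opaque `Histogram` placeholder are hypothetical, and label sets are assumed to be already reduced to the labels used for matching (default one-to-one matching, metric name excluded).

```python
from typing import Dict, Tuple, Union


class Histogram:
    """Opaque placeholder for a histogram sample; its contents never matter here."""


Labels = Tuple[Tuple[str, str], ...]
Sample = Union[float, Histogram]
Vector = Dict[Labels, Sample]


def vec_and(lhs: Vector, rhs: Vector) -> Vector:
    """Elements of lhs whose label set also exists in rhs."""
    return {lbl: s for lbl, s in lhs.items() if lbl in rhs}


def vec_or(lhs: Vector, rhs: Vector) -> Vector:
    """All of lhs, plus elements of rhs whose label set is not in lhs."""
    result = dict(lhs)
    for lbl, s in rhs.items():
        result.setdefault(lbl, s)
    return result


def vec_unless(lhs: Vector, rhs: Vector) -> Vector:
    """Elements of lhs whose label set does not exist in rhs."""
    return {lbl: s for lbl, s in lhs.items() if lbl not in rhs}


h = Histogram()
lhs: Vector = {(("job", "a"),): h, (("job", "b"),): 1.0}
rhs: Vector = {(("job", "a"),): 0.0}
print(vec_and(lhs, rhs)[(("job", "a"),)] is h)  # True
print(list(vec_unless(lhs, rhs).values()))      # [1.0]
```

The histogram sample on the left passes through `and` untouched, even though the matching element on the right is a float.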

The binary `+` operator between two native histograms and the `sum` aggregation
operator to aggregate native histograms are fully supported. Even if the
histograms involved have different bucket layouts, the buckets are
automatically converted appropriately so that the operation can be
performed. (With the currently supported bucket schemas, that's always
possible.) If either operator has to sum up a mix of histogram samples and
float samples, the corresponding vector element is removed from the output
vector entirely.
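
The summing behavior for one aggregation group can be sketched in Python as follows. The function `sum_group` and the toy `Histogram` class are hypothetical, and the sketch assumes all histograms already share one bucket layout (buckets keyed by index), whereas the real operators convert differing layouts first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Histogram:
    """Toy native histogram; buckets are keyed by index in one shared layout."""
    count: float
    sum: float
    buckets: Dict[int, float] = field(default_factory=dict)


Sample = Union[float, Histogram]


def sum_group(samples: List[Sample]) -> Optional[Sample]:
    """Sum one aggregation group; None means the element is dropped."""
    histograms = [s for s in samples if isinstance(s, Histogram)]
    if histograms and len(histograms) != len(samples):
        return None  # a mix of float and histogram samples
    if not histograms:
        return sum(samples, 0.0)
    total = Histogram(count=0.0, sum=0.0)
    for h in histograms:
        total.count += h.count
        total.sum += h.sum
        for index, value in h.buckets.items():
            total.buckets[index] = total.buckets.get(index, 0.0) + value
    return total


a = Histogram(count=3.0, sum=1.5, buckets={0: 1.0, 1: 2.0})
b = Histogram(count=2.0, sum=4.0, buckets={1: 1.0, 2: 1.0})
print(sum_group([a, b]).buckets)  # {0: 1.0, 1: 3.0, 2: 1.0}
print(sum_group([a, 1.0]))        # None
```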

All other operators do not behave in a meaningful way. They either treat the
histogram sample as if it were a float sample of value 0, or (in case of
arithmetic operations between a scalar and a vector) they leave the histogram
sample unchanged. This behavior will change to a meaningful one before native
histograms are a stable feature.