mirror of https://github.com/prometheus/prometheus.git synced 2024-09-19 23:37:31 -07:00

JuanJo Ciarlante c94c5b64c3

feat: add limitk() and limit_ratio() operators (#12503 )

* rebase 2024-07-01, picks previous renaming to `limitk()` and `limit_ratio()`

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* gofumpt -d -extra

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* more lint fixes

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* more lint fixes+

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* put limitk() and limit_ratio() behind --enable-feature=promql-experimental-functions

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* EnableExperimentalFunctions for TestConcurrentRangeQueries() also

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* use testutil.RequireEqual to fix tests, WIP equivalent thingie for require.Contains

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* lint fix

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* moar linting

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* rebase 2024-06-19

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* re-add limit(2, metric) testing for N=2 common series subset

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* move `ratio = param` to default switch case, for better readability

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* gofumpt -d -extra util/testutil/cmp.go

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* early break when reaching k elems in limitk(), should have always been so (!)

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* small typo fix

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* no-change small break-loop rearrange for readability

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* remove IsNan(ratio) condition in switch-case, already handled as input validation

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* no-change adding some comments

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* no-change simplify fullMatrix() helper functions used for tests

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* add `limitk(-1, metric)` testcase, which is handled as any k < 1 case

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* engine_test.go: no-change create `requireCommonSeries() helper func (moving code into it) for readability

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* rebase 2024-06-21

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* engine_test.go: HAPPY NOW about its code -> reorg, create and use simpleRangeQuery() function, less lines and more readable ftW \o/

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* move limitk(), limit_ratio() testing to promql/promqltest/testdata/limit.test

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* remove stale leftover after moving tests from engine_test.go to testdata/

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* fix flaky `limit_ratio(0.5, ...)` test case

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* Update promql/engine.go

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* Update promql/engine.go

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* Update promql/engine.go

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* fix AddRatioSample() implementation to use a single conditional (instead of switch/case + fallback return)

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* docs/querying/operators.md: document r < 0

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* add negative limit_ratio() example to docs/querying/examples.md

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* move more extensive docu examples to docs/querying/operators.md

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* typo

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* small docu fix for poor-mans-normality-check, add it to limit.test ;)

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* limit.test: expand "Poor man's normality check" to whole eval range

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* restore mistakenly removed existing small comment

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* expand poors-man-normality-check case(s)

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* Revert "expand poors-man-normality-check case(s)"

This reverts commit f69e1603b2ebe69c0a100197cfbcf6f81644b564, indeed too
flaky 0:)

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* remove humor from docs/querying/operators.md

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* fix signoff

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* add web/ui missing changes

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* expand limit_ratio test cases, cross-fingering they'll not be flaky

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* remove flaky test

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* add missing warnings.Merge(ws) in instant-query return shortcut

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* add missing LimitK||LimitRatio case to codemirror-promql/src/parser/parser.ts

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* fix ui-lint

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

* actually fix returned warnings :]

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>

---------

Signed-off-by: JuanJo Ciarlante <juanjosec@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>

2024-07-03 22:18:57 +02:00

3.9 KiB

Raw Blame History

title	nav_title	sort_rank
Querying examples	Examples	4

Query examples

Simple time series selection

Return all time series with the metric http_requests_total:

http_requests_total

Return all time series with the metric http_requests_total and the given job and handler labels:

http_requests_total{job="apiserver", handler="/api/comments"}

Return a whole range of time (in this case 5 minutes up to the query time) for the same vector, making it a range vector:

http_requests_total{job="apiserver", handler="/api/comments"}[5m]

Note that an expression resulting in a range vector cannot be graphed directly, but viewed in the tabular ("Console") view of the expression browser.

Using regular expressions, you could select time series only for jobs whose name match a certain pattern, in this case, all jobs that end with server:

http_requests_total{job=~".*server"}

All regular expressions in Prometheus use RE2 syntax.

To select all HTTP status codes except 4xx ones, you could run:

http_requests_total{status!~"4.."}

Subquery

Return the 5-minute rate of the http_requests_total metric for the past 30 minutes, with a resolution of 1 minute.

rate(http_requests_total[5m])[30m:1m]

This is an example of a nested subquery. The subquery for the deriv function uses the default resolution. Note that using subqueries unnecessarily is unwise.

max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])

Using functions, operators, etc.

Return the per-second rate for all time series with the http_requests_total metric name, as measured over the last 5 minutes:

rate(http_requests_total[5m])

Assuming that the http_requests_total time series all have the labels job (fanout by job name) and instance (fanout by instance of the job), we might want to sum over the rate of all instances, so we get fewer output time series, but still preserve the job dimension:

sum by (job) (
  rate(http_requests_total[5m])
)

If we have two different metrics with the same dimensional labels, we can apply binary operators to them and elements on both sides with the same label set will get matched and propagated to the output. For example, this expression returns the unused memory in MiB for every instance (on a fictional cluster scheduler exposing these metrics about the instances it runs):

(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024

The same expression, but summed by application, could be written like this:

sum by (app, proc) (
  instance_memory_limit_bytes - instance_memory_usage_bytes
) / 1024 / 1024

If the same fictional cluster scheduler exposed CPU usage metrics like the following for every instance:

instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
...

...we could get the top 3 CPU users grouped by application (app) and process type (proc) like this:

topk(3, sum by (app, proc) (rate(instance_cpu_time_ns[5m])))

Assuming this metric contains one time series per running instance, you could count the number of running instances per application like this:

count by (app) (instance_cpu_time_ns)

If we are exploring some metrics for their labels, to e.g. be able to aggregate over some of them, we could use the following:

limitk(10, app_foo_metric_bar)

Alternatively, if we wanted the returned timeseries to be more evenly sampled, we could use the following to get approximately 10% of them:

limit_ratio(0.1, app_foo_metric_bar)

3.9 KiB Raw Blame History

Query examples

Simple time series selection

Subquery

Using functions, operators, etc.

3.9 KiB

Raw Blame History