596 commits

Author SHA1 Message Date
Akshay Siwal 4c898fc4a1
Update getting_started.md
Updating signal for graceful shutdown.

Signed-off-by: Akshay Siwal <akshay.singh.siwal@gmail.com>
2023-05-15 13:18:30 +05:30
Björn Rabenstein b727e69b76
Merge pull request #12350 from prometheus/beorn7/histogram
textparse/scrape: Add option to scrape both classic and native histograms
2023-05-13 02:16:11 +02:00
beorn7 9e500345f3 textparse/scrape: Add option to scrape both classic and native histograms
So far, if a target exposes a histogram with both classic and native
buckets, a native-histogram enabled Prometheus would ignore the
classic buckets. With the new scrape config option
`scrape_classic_histograms` set, both buckets will be ingested,
creating all the series of a classic histogram in parallel to the
native histogram series. For example, a histogram `foo` would create a
native histogram series `foo` and classic series called `foo_sum`,
`foo_count`, and `foo_bucket`.
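
As a rough illustration of the naming described above, the following Go
sketch derives the classic series names from a histogram's base name. The
helper `classicSeriesNames` is hypothetical and is not part of the
Prometheus code base.

```go
package main

import "fmt"

// classicSeriesNames is a hypothetical helper. It returns the classic
// series names for a histogram with the given base name, using the
// suffixes listed in the commit message.
func classicSeriesNames(name string) []string {
	return []string{name + "_sum", name + "_count", name + "_bucket"}
}

func main() {
	// The native histogram series keeps the base name, here "foo".
	fmt.Println(classicSeriesNames("foo")) // [foo_sum foo_count foo_bucket]
}
```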

This feature can be used in a migration strategy from classic to
native histograms, where it is desired to have a transition period
during which both native and classic histograms are present.

Note that two bugs in classic histogram parsing were found and fixed
as a byproduct of testing the new feature:

1. Series created from classic _gauge_ histograms didn't get the
   _sum/_count/_bucket suffix set.
2. Values of classic _float_ histograms weren't parsed properly.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-05-13 01:32:25 +02:00
Bryan Boreham 734baa37e0
Merge pull request #12344 from bboreham/rw-stable
docs: state that remote write sending is stable
2023-05-10 18:56:16 +01:00
Bryan Boreham b1b8fd77c4 docs: state that remote write is stable
Since https://github.com/prometheus/docs/pull/2313 has been merged
declaring remote write to be stable at version 1.0.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-05-09 10:29:52 +00:00
Jeanette Tan 40240c9c1c Update according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-05-05 02:33:00 +08:00
Jeanette Tan 2ad39baa72 Treat bucket limit like sample limit and make it fail the whole scrape and return an error
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:25:07 +08:00
Jeanette Tan d3ad158a66 Update docs and comments
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:14:19 +08:00
gotjosh 74e6668e87
update docs
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-04-20 09:34:15 +01:00
gotjosh cf230bcd18
more wordsmithing
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-04-19 09:51:41 +01:00
gotjosh 28909a4636
more wordsmithing
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-04-18 16:51:35 +01:00
gotjosh e2a2790b2c
add more docs
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-04-18 16:50:16 +01:00
gotjosh f3394bf7a1
Rules API: Allow filtering by rule name
Introduces support for a new query parameter in the `/rules` API endpoint that allows filtering by rule names.

If all the rules of a group are filtered, we skip the group entirely.
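
The filtering behaviour can be sketched roughly as follows. The `Rule` and
`Group` types and the `filterGroups` function are simplified stand-ins made
up for this example, not the real API types, and treating an empty name
list as "no filter" is an assumption.

```go
package main

import "fmt"

// Rule and Group are simplified, hypothetical stand-ins for the API types.
type Rule struct{ Name string }

type Group struct {
	Name  string
	Rules []Rule
}

// filterGroups keeps only the rules whose name appears in names.
// Assumption: an empty names list means that no filter is applied.
func filterGroups(groups []Group, names []string) []Group {
	if len(names) == 0 {
		return groups
	}
	wanted := make(map[string]struct{}, len(names))
	for _, n := range names {
		wanted[n] = struct{}{}
	}
	var out []Group
	for _, g := range groups {
		var kept []Rule
		for _, r := range g.Rules {
			if _, ok := wanted[r.Name]; ok {
				kept = append(kept, r)
			}
		}
		if len(kept) == 0 {
			continue // all rules of this group were filtered: skip the group
		}
		out = append(out, Group{Name: g.Name, Rules: kept})
	}
	return out
}

func main() {
	groups := []Group{
		{Name: "a", Rules: []Rule{{Name: "up"}, {Name: "down"}}},
		{Name: "b", Rules: []Rule{{Name: "down"}}},
	}
	for _, g := range filterGroups(groups, []string{"up"}) {
		fmt.Println(g.Name, len(g.Rules))
	}
}
```

With the filter set to `up`, group `b` has no matching rule and is left out
of the result.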

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2023-04-18 10:12:08 +01:00
beorn7 c0879d64cf promql: Separate Point into FPoint and HPoint
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the ones that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
Julien Pivotto 391473141d
Check health & ready: move to flags (#12223)
This makes it more consistent with other commands like import rules. We
don't have strict rules and uniformity across promtool unfortunately,
but I think it's better to only have the http config on relevant check
commands to avoid thinking Prometheus can e.g. check the config over the
wire.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-04-05 09:45:39 +02:00
Nidhey Nitin Indurkar 3f7beeecc6
feat: health and readiness check of prometheus server in CLI (promtool) (#12096)
* feat: health and readiness check of prometheus server in CLI (promtool)

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>
2023-04-03 22:32:39 +02:00
Julien Pivotto ae220724d4 Docs: use boolean instead of bool
boolean makes the type consistent and clickable on
https://prometheus.io/docs/prometheus/latest/configuration/configuration/

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-21 17:28:31 +01:00
Julien Pivotto fd8992cdbd
Merge pull request #12137 from g3offrey/fix/update-prometheus-ansible-link
docs: update ansible installation link
2023-03-20 11:14:56 +01:00
beorn7 71c57a1292 docs: Clarify that range selectors use a closed interval
Signed-off-by: beorn7 <beorn@grafana.com>
2023-03-16 13:55:57 +01:00
g3offrey d01c51fad0 docs: update ansible installation link
Signed-off-by: g3offrey <11151445+g3offrey@users.noreply.github.com>
2023-03-15 15:58:44 +01:00
Julien Pivotto bec68558ba
Merge pull request #12125 from roidelapluie/promtooldoc
Command Line Documentation
2023-03-14 10:59:33 +01:00
Julien Pivotto 1922db0586 Document command line tools
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-13 14:20:55 +01:00
Bartlomiej Plotka 742979a3e5
Merge pull request #10704 from hdost/feat/167-prometheus-docs
docs: Add signal information to getting started
2023-03-12 13:02:54 +01:00
Harold Dost 3125e169ae docs: Add signal information to getting started
Closes prometheus/docs#167

Signed-off-by: Harold Dost <h.dost@criteo.com>
2023-03-12 00:31:04 +01:00
Julien Pivotto 0c56e5d014 Update our own dependencies, support proxy from env
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-08 12:00:17 +01:00
Julien Pivotto 599b70a05d Add include scrape configs
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-06 23:35:39 +01:00
Charles Korn e023d896f2
Correct statement in docs about query results returning either floats or histograms but not both. (#11880)
* Correct statement in docs about query results returning either floats or histograms but not both.

* Move documentation for range and instant vectors under their corresponding headings.

Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-01-31 13:34:17 +05:30
Julien Pivotto aeecf6854f
Merge pull request #11827 from roidelapluie/stabilize
Add 'keep_firing_for' field to alerting rules
2023-01-25 09:52:45 +01:00
Peter Nicholson bba95df0e9 Update documentation
Signed-off-by: Peter Nicholson <petergoods@hotmail.com>
2023-01-19 18:58:17 +01:00
Frederic Branczyk 9f91215bf6
Merge pull request #11844 from bawhetst/add-pod-container-id
discovery/kubernetes: add container ID as a meta label for pod targets
2023-01-17 19:19:22 +01:00
Ben Whetstone 52d5a7c60f Document the __meta_kubernetes_pod_container_id meta label
Signed-off-by: Ben Whetstone <ben.whetstone@sysdig.com>
2023-01-17 11:15:52 -05:00
Julien Pivotto a35e54cc56
Merge pull request #11786 from LeviHarrison/remove-nomad-datacenter-docs
Remove Nomad `datacenter` field in configuration docs
2023-01-16 14:42:40 +01:00
Julien Pivotto ce55e5074d Add 'keep_firing_for' field to alerting rules
This commit adds a new 'keep_firing_for' field to Prometheus alerting
rules. The 'keep_firing_for' field specifies the minimum amount of time
that an alert should remain firing, even if the expression does not
return any results.

This feature was discussed at a previous dev summit, and it was
determined that a feature like this would be useful in order to allow
the expression time to stabilize and prevent confusing resolved messages
from being propagated through Alertmanager.

This approach is simpler than having two PromQL queries, as was
sometimes discussed, and it should be easy to implement.

This commit does not include tests for the 'keep_firing_for' field. This
is intentional, as the purpose of this commit is to gather comments on
the proposed design of the 'keep_firing_for' field before implementing
tests. Once the design of the 'keep_firing_for' field has been finalized,
a follow-up commit will be submitted with tests.

See https://github.com/prometheus/prometheus/issues/11570

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-01-13 12:11:39 +01:00
Ganesh Vernekar b4e15899d1
docs: Update recording rule docs about native histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-12 20:43:02 +05:30
Ganesh Vernekar 2e538be5d7
docs: Update federation docs for native histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-12 20:43:02 +05:30
Sam Jewell f88a0a7d83
Update example rules file to be valid with the default scrape config (#11692)
* Update docs example rules for default config

The prometheus download includes a default config to scrape itself.
This self-scraping prometheus doesn't include any metric named as
`http_inprogress_requests`, but does include one named
`prometheus_http_requests_total`.
Updating this example rule in the docs to one which can be used
out-of-the-box with the default download would be a nice improvement.

Signed-off-by: Sam Jewell <sam.jewell@grafana.com>

* Update syntax as per @LeviHarrison's review

Co-authored-by: Levi Harrison <levisamuelharrison@gmail.com>
Signed-off-by: Sam Jewell <2903904+samjewell@users.noreply.github.com>

Signed-off-by: Sam Jewell <sam.jewell@grafana.com>
Signed-off-by: Sam Jewell <2903904+samjewell@users.noreply.github.com>
Co-authored-by: Levi Harrison <levisamuelharrison@gmail.com>
2023-01-09 19:36:07 -05:00
Robbe Haesendonck e802ddf435 docs: 📝 Changed occurrences of proxy_connect_headers to proxy_connect_header
Since the struct defines proxy_connect_header instead of proxy_connect_headers, all relevant occurrences of it were replaced with the correct configuration name as defined in the HTTPClientConfig struct.

Signed-off-by: Robbe Haesendonck <googleit@inuits.eu>
2023-01-09 14:11:00 +01:00
Levi Harrison 89539c35c9 Remove nomad datacenter field in configuration docs
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2022-12-29 14:18:42 -05:00
Łukasz Mierzwa e1b7082008
Show individual scrape pools on /targets page (#11142)
* Add API endpoints for getting scrape pool names

This adds an api/v1/scrape_pools endpoint that returns the list of *names* of all configured scrape pools.
Having it makes it possible to find out which scrape pools are defined without having to list and parse all targets.

The second change adds support for a scrapePool query parameter in the api/v1/targets endpoint, which
filters the returned targets down to those belonging to the given scrape pool name.

Both changes make it possible to query data for a specific scrape pool, rather than getting all the targets for all possible scrape pools.
The problem with the api/v1/targets endpoint is that it returns a huge amount of data if you configure a lot of scrape pools.
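
A client could build the filtered request roughly like this. The helper
`targetsURL`, the base address `http://localhost:9090`, and the pool name
`node` are assumptions for the example; only the endpoint path and the
scrapePool parameter come from the description above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// targetsURL is a hypothetical helper that builds an api/v1/targets URL.
// If pool is non-empty, it adds the scrapePool query parameter so that
// only targets of that scrape pool are requested.
func targetsURL(base, pool string) string {
	u := strings.TrimRight(base, "/") + "/api/v1/targets"
	if pool == "" {
		return u
	}
	q := url.Values{}
	q.Set("scrapePool", pool)
	return u + "?" + q.Encode()
}

func main() {
	// Assumed local server address and pool name.
	fmt.Println(targetsURL("http://localhost:9090", "node"))
	// http://localhost:9090/api/v1/targets?scrapePool=node
}
```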

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a scrape pool selector on /targets page

The current targets page lists all possible targets. This works great if you only have a few scrape pools configured,
but for systems with a lot of scrape pools and targets this slows things down a lot.
Not only does the /targets page load very slowly in such a case (waiting for a huge API response), but it also takes
a long time to render, due to the huge number of elements.
This change adds a dropdown selector so it's possible to select only the interesting scrape pool to view.
There's also a scrapePool query param that will open the selected pool automatically.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2022-12-23 11:55:08 +01:00
Pablo Ley fb30ffda75
Fixed typo in the Remote Read API docs
Signed-off-by: Pablo Ley <pablo_ley@hotmail.com>

Signed-off-by: Pablo Ley <pablo_ley@hotmail.com>
2022-12-21 12:44:25 +01:00
Julien Pivotto b2226258bc
Merge pull request #11706 from dannystaple/patch-1
Docs [unit-testing]: Add an explanation to the expanding notation
2022-12-21 09:21:12 +01:00
David Fridman 52adf55631
Add VM size label to azure service discovery (#11575) (#11650)
* Add VM size label to azure service discovery (#11575)

Signed-off-by: davidifr <davidfr.mail@gmail.com>

* Add VM size label to azure service discovery (#11575)

Signed-off-by: davidifr <davidfr.mail@gmail.com>

* Add VM size label to azure service discovery (#11575)

Signed-off-by: davidifr <davidfr.mail@gmail.com>

Signed-off-by: davidifr <davidfr.mail@gmail.com>
2022-12-16 13:14:35 -05:00
Danny Staple f3f800ea6f
Terminology amendment
Signed-off-by: Danny Staple <danny@orionrobots.co.uk>
2022-12-15 16:22:40 +00:00
Oleg Zaytsev 6197ed63d8
Remove comments from the remote read docs
I think these are not intended to be here.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2022-12-15 16:57:44 +01:00
Danny Staple 7269a6e21a
Fix the output example
(based on empirical unit testing)

Signed-off-by: Danny Staple <danny@orionrobots.co.uk>
2022-12-14 12:21:34 +00:00
Danny Staple 87b9f1d24a
Fix typo I introduced in unit testing rules.
Signed-off-by: Danny Staple <danny@orionrobots.co.uk>
2022-12-14 12:20:28 +00:00
Julien Pivotto c396c3e32f Update go dependencies before 2.41
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-12-14 11:30:36 +01:00
Danny Staple b614fdd8a7
Update unit_testing_rules.md
Update the shorthand, and note the different behaviour between missing samples and numbers.

Signed-off-by: Danny Staple <danny@orionrobots.co.uk>
2022-12-13 15:52:40 +00:00
Danny Staple 300d6e4390
Add an explanation to the expanding notation
After some team discussion, we found this to be a useful way to explain the samples.

Signed-off-by: Danny Staple <danny@orionrobots.co.uk>
2022-12-13 11:11:13 +00:00
John Carlo Roberto 924ba90c3f
Add link to best practices in "Defining Recording Rules" page (#11696)
* docs: Add link to best practices in "Defining Recording Rules" page

Signed-off-by: John Carlo Roberto <10111643+Irizwaririz@users.noreply.github.com>

* docs: Improve wording

Signed-off-by: John Carlo Roberto <10111643+Irizwaririz@users.noreply.github.com>

Signed-off-by: John Carlo Roberto <10111643+Irizwaririz@users.noreply.github.com>
2022-12-12 16:08:45 +01:00