12499 commits

Author SHA1 Message Date
Daniel Kimsey aa3e58358b consul: Add support for catalog list services filter
This adds support for Consul's Catalog [List Services][^1] API's `filter`
parameter added in 1.14.x. This parameter grants the operator more
flexibility to do server-side filtering of the Catalog, before
Prometheus subscribes for updates. Operators can use this to improve
both the performance of Prometheus's Consul SD and reduce the impact of
enumerating large catalogs.

[^1]: https://developer.hashicorp.com/consul/api-docs/v1.14.x/catalog
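
As a rough illustration of what server-side filtering means here, the sketch below builds a List Services request URL carrying a `filter` query parameter. The helper name `catalogServicesURL`, the agent address, and the filter expression are all assumptions made for the example; this is not the code added by the commit, and the selectors actually accepted by the endpoint should be checked against the Consul documentation.

```go
package main

import (
	"fmt"
	"net/url"
)

// catalogServicesURL returns the URL of a Catalog List Services request.
// When filter is non-empty it is sent as the `filter` query parameter so
// that Consul can narrow the result before it is returned.
// (Hypothetical helper, for illustration only.)
func catalogServicesURL(base, filter string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/catalog/services"
	q := u.Query()
	if filter != "" {
		q.Set("filter", filter)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The filter expression is a placeholder, not a verified selector.
	s, err := catalogServicesURL("http://localhost:8500", `ServiceName == "web"`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```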

Signed-off-by: Daniel Kimsey <dekimsey@protonmail.com>
2024-03-17 20:32:54 -05:00
Bryan Boreham a0e93e403e
Merge pull request #13764 from bboreham/remove-deprecated-wal
[Cleanup] TSDB: Remove old deprecated WAL implementation

Deprecated since 2018.
2024-03-17 09:34:57 +00:00
Jan-Otto Kröpke 302e151de8
{discovery,remote_write}/azure: Support default SDK authentication (#13099)
* discovery/azure: Offer default SDK authentication

Signed-off-by: Jan-Otto Kröpke <mail@jkroepke.de>
2024-03-16 11:06:57 +00:00
Bryan Boreham 5ed21c0d76
Merge pull request #12933 from prymitive/duplicated_samples
When Prometheus scrapes a target and it sees the same time series repeated multiple times it currently silently ignores that. This change adds a test for that and fixes the scrape loop so that:

* Only the first sample for each unique time series is appended
* Duplicated samples increment the prometheus_target_scrapes_sample_duplicate_timestamp_total metric
This allows one to identify such scrape jobs and targets.

Also fix some tests and benchmark.
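
The behavior described above can be sketched as follows. This is a simplified model, not the scrape loop itself: the `sample` type, the string series key, and the `dedupe` function are assumptions for the example, and the returned count stands in for the increment of the duplicate metric.

```go
package main

import "fmt"

// sample is a simplified scraped sample; series stands in for the full
// label set that identifies a time series.
type sample struct {
	series string
	value  float64
}

// dedupe keeps only the first sample seen for each series within one
// scrape and counts the remaining ones as duplicates.
func dedupe(scraped []sample) (kept []sample, duplicates int) {
	seen := make(map[string]struct{}, len(scraped))
	for _, s := range scraped {
		if _, ok := seen[s.series]; ok {
			duplicates++ // a real implementation would increment a counter metric here
			continue
		}
		seen[s.series] = struct{}{}
		kept = append(kept, s)
	}
	return kept, duplicates
}

func main() {
	kept, dups := dedupe([]sample{
		{series: `up{job="a"}`, value: 1},
		{series: `up{job="a"}`, value: 0},
		{series: `up{job="b"}`, value: 1},
	})
	fmt.Println(len(kept), "kept,", dups, "duplicate(s)")
}
```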
2024-03-16 09:18:46 +00:00
Darshan Chaudhary b7047f7fcb
Fix retention boundary so 2h retention deletes blocks right at the 2h boundary (#9633)
Signed-off-by: darshanime <deathbullet@gmail.com>
2024-03-15 19:35:16 +01:00
Bryan Boreham c8c1ab36dc
MAINTAINERS: Add Bryan Boreham, Ayoub Mrini (#13771)
Also simplify structure. Ordering of 'general' maintainers is alphabetical by 2nd name.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-15 16:47:14 +00:00
Julien d1abc3f255
Merge pull request #13777 from roidelapluie/remoteread2
Chunked remote read: close the querier earlier
2024-03-15 14:42:30 +01:00
Julien Pivotto 53091126c2 Chunked remote read: close the querier earlier
I have seen Prometheus instances misbehaving because of broken chunked remote
read requests.

In order to avoid OOMs when this happens, I propose to close the
queriers used by the streamed remote read requests earlier.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2024-03-15 14:03:16 +01:00
Bryan Boreham d45b5deb75 TSDB: move function only used in tests
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-15 08:54:47 +00:00
Bryan Boreham 3274cac0d3 TSDB: remove unused function
Was only used in old WAL implementation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-15 08:51:57 +00:00
Goutham Veeramachaneni 15809223c7
Merge pull request #13759 from jcajka/main
otlptranslator: fix up import paths
2024-03-15 09:40:26 +01:00
Björn Rabenstein 0f70fc7687
Merge pull request #13772 from machine424/signals
adjust signal termination warning log
2024-03-14 19:34:37 +01:00
machine424 3eed6c760a
adjust signal termination warning log
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-03-14 18:45:45 +01:00
Ben Kochie 537ce87d6b
Merge pull request #13770 from prometheus/superq/sync_description
Add container_description.yml to repo sync
2024-03-14 12:11:42 +01:00
Arve Knudsen 1de49d5b69
Remove unused function tsdb/chunks.PopulatedChunk (#13763)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-14 11:15:17 +01:00
SuperQ a0fbc75f34
Add container_description.yml to repo sync
Add the container_description.yml workflow to the repo file sync script.
* Skip sync if there is no Dockerfile.
* Fixup minor typo in container_description.yml.

Signed-off-by: SuperQ <superq@gmail.com>
2024-03-14 09:20:40 +01:00
Ben Kochie f252d8a9d1
Merge pull request #13761 from prometheus/superq/fixup_workflow
Fix container_description workflow
2024-03-14 08:42:15 +01:00
Bryan Boreham 87edf1f960 [Cleanup] TSDB: Remove old deprecated WAL implementation
Deprecated since 2018.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-13 15:57:23 +00:00
SuperQ 46401b988e
Normalize and fixup step names.
Signed-off-by: SuperQ <superq@gmail.com>
2024-03-13 15:56:37 +01:00
Jakub Čajka 505fd638be
otlptranslator: fix up import paths
Signed-off-by: Jakub Čajka <jcajka@redhat.com>
2024-03-13 15:56:14 +01:00
Bartlomiej Plotka 312e3fd728
Merge pull request #13713 from charleskorn/query-engine-interface
rules: allow using alternative PromQL engines for rule evaluation by callers using Prometheus as a lib.
2024-03-13 14:45:42 +01:00
SuperQ 3bff79451d
Fix container_description workflow
Fix yaml indentation. 🤦

Signed-off-by: SuperQ <superq@gmail.com>
2024-03-13 14:28:05 +01:00
Björn Rabenstein f1e9ec29f8
Merge pull request #13752 from prometheus/superq/publish_docker_readme
Add GitHub action to publish container README
2024-03-13 12:57:22 +01:00
dependabot[bot] cd3e0078f0
build(deps): bump github.com/prometheus/common (#13728)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.49.0 to 0.50.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.49.0...v0.50.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-12 20:07:03 +01:00
Björn Rabenstein af3618fd35
Merge pull request #13667 from prometheus/beorn7/promql
Improve TestQueryStatistics and fix bugs exposed by it
2024-03-12 16:17:11 +01:00
SuperQ 2061eb0a6a
Add GitHub action to publish container README
Add a GitHub action to publish the README.md to Docker Hub and Quay.io.

Fixes: https://github.com/prometheus/prometheus/issues/5348

Signed-off-by: SuperQ <superq@gmail.com>
2024-03-12 14:18:52 +01:00
Bryan Boreham 0bb5588386
labels: optimize String method (#13673)
Use a stack buffer to reduce memory allocations.

`Write(AppendQuote(AvailableBuffer` does not allocate or copy when
the buffer has sufficient space.

Also add a benchmark, with some refactoring.
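
A minimal sketch of the pattern named above, assuming Go 1.21 or later (where `bytes.Buffer.AvailableBuffer` exists). The `label` type and `labelsString` function are stand-ins for the example, not the real `labels` package, and whether the backing array actually stays on the stack is up to the compiler's escape analysis.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// label is a simplified name/value pair (hypothetical type).
type label struct{ Name, Value string }

// labelsString renders labels as {name="value", ...}. The buffer starts
// out backed by a fixed-size array, so short label sets need no growth.
func labelsString(ls []label) string {
	var backing [1024]byte
	b := bytes.NewBuffer(backing[:0])
	b.WriteByte('{')
	for i, l := range ls {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString(l.Name)
		b.WriteByte('=')
		// AppendQuote writes into the buffer's spare capacity; the
		// following Write then does not need to copy or allocate as
		// long as that capacity was sufficient.
		b.Write(strconv.AppendQuote(b.AvailableBuffer(), l.Value))
	}
	b.WriteByte('}')
	return b.String()
}

func main() {
	fmt.Println(labelsString([]label{{"job", "node"}, {"instance", "localhost:9100"}}))
}
```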

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-12 11:34:03 +00:00
Bryan Boreham d08f054950
[ENHANCEMENT] TSDB: Check CRC without allocating (#13742)
Use the existing utility function which does this.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-12 12:24:27 +01:00
Charles Korn 26262a1eb7
Remove unnecessary SetQueryLogger method on QueryEngine interface
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-03-12 22:01:01 +11:00
Julien ca57bd6352
Merge pull request #13743 from carrychair/main
fix function and struct names in comments
2024-03-12 11:50:25 +01:00
Julien 106d98b986
Merge pull request #13751 from testwill/http_statuscode
chore: use constant instead of numeric literal
2024-03-12 11:41:34 +01:00
guoguangwu 1cccdbaedb chore: use constant instead of numeric literal
Signed-off-by: guoguangwu <guoguangwug@gmail.com>
2024-03-12 17:19:50 +08:00
Julien b4d4dcd9f6
Merge pull request #13739 from bboreham/no-race-prev-go
CI: don't run race-detector on tests with previous Go version
2024-03-11 13:55:19 +01:00
carrychair 856f6e49c8 fix function and struct name
Signed-off-by: carrychair <linghuchong404@gmail.com>
2024-03-09 17:53:17 +08:00
michaelact eea6ab1cdd
[BUGFIX] Azure SD: Fix 'error: parameter virtualMachineScaleSetName cannot be empty' (#13702)
Erroneous code was introduced during a merge-back-to-main at #13399.

Signed-off-by: michaelact <86778470+michaelact@users.noreply.github.com>
2024-03-08 15:19:39 +00:00
Bryan Boreham e8bf2ce4e1
Merge pull request #13735 from bboreham/fix-notifier-relabel
[BUGFIX] Alerts: don't reuse payload after relabeling.
2024-03-08 12:29:36 +00:00
Bryan Boreham 54f50e1498
Merge pull request #13737 from bboreham/fix-scrape-tolerance
[BUGFIX] Scraping: Tolerance should be max 1% of interval
2024-03-08 12:28:04 +00:00
Bryan Boreham e54082a621 CI: don't run race-detector on tests with previous Go version
The purpose of running with a previous Go version is to spot usage of
new language features; we don't need to intensively look for bugs.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 10:38:10 +00:00
Bryan Boreham 6c41ec984f [BUGFIX] Scraping: Tolerance should be max 1% of interval
Previous code set it at minimum 1%, which was not intended.
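
The intended rule can be sketched as a clamp: the configured tolerance is used as-is unless it exceeds 1% of the scrape interval, in which case it is capped. The function name `effectiveTolerance` and the values in `main` are assumptions for the example, not the actual scrape code.

```go
package main

import (
	"fmt"
	"time"
)

// effectiveTolerance caps the configured timestamp tolerance at 1% of
// the scrape interval. Treating 1% as a minimum instead (the bug
// described above) would raise the tolerance for long intervals.
func effectiveTolerance(configured, interval time.Duration) time.Duration {
	maxTolerance := interval / 100
	if configured > maxTolerance {
		return maxTolerance
	}
	return configured
}

func main() {
	fmt.Println(effectiveTolerance(2*time.Millisecond, 100*time.Millisecond)) // capped
	fmt.Println(effectiveTolerance(2*time.Millisecond, 15*time.Second))       // unchanged
}
```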

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 10:18:18 +00:00
Bryan Boreham 8c4e4b72a8 Notifier: pass parameters to goroutine explicitly
Avoids possible false sharing between loops.

Plausibly there is no problem in the current code, but it's easy enough to write it more safely.
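
For illustration, a generic version of the pattern: the loop variable is handed to the goroutine as an argument rather than captured by the closure. Before Go 1.22, a captured loop variable was shared across iterations, which is the kind of sharing this style rules out. `fanOut` and the target names are hypothetical and unrelated to the notifier's real types.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut starts one goroutine per target and records how often each
// target was handled. The target is passed explicitly, so every
// goroutine works on its own copy.
func fanOut(targets []string) map[string]int {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	handled := make(map[string]int, len(targets))
	for _, t := range targets {
		wg.Add(1)
		go func(target string) {
			defer wg.Done()
			mu.Lock()
			handled[target]++
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return handled
}

func main() {
	fmt.Println(fanOut([]string{"am-1", "am-2", "am-3"}))
}
```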

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 09:20:36 +00:00
Bryan Boreham 57c799132b Notifier: don't reuse payload after relabeling
Also clarify why these variables are being cleared.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 09:16:43 +00:00
Björn Rabenstein 9acae57937
Merge pull request #13681 from krajorama/native-latency-histograms
Add native histograms to latency/duration metrics
2024-03-07 20:46:43 +01:00
Bryan Boreham 3d16d39881
Merge pull request #13716 from prometheus/update-go-deps
Update Go dependencies for v2.51
2024-03-07 12:05:58 +00:00
Björn Rabenstein b0c0961f9d
Merge pull request #13725 from prometheus/beorn7/promql2
promql: Fix limiting of extrapolation to negative values
2024-03-07 12:36:17 +01:00
Bryan Boreham 28f8a346bc Wind back gophercloud to 1.8.0
It triggers a failure in TestOpenstackSDInstanceRefresh.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-07 10:13:59 +00:00
Bryan Boreham 34a655716e Back off google.golang.org/protobuf to v1.32.0
Otherwise we get a compilation error.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-07 10:13:59 +00:00
Bryan Boreham 3c2c9ac067 Update Go dependencies for v2.51
Simply make update-all-go-deps

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-07 10:13:59 +00:00
Bryan Boreham e705d83e37
Merge pull request #13721 from bboreham/no-codeql-go
CI: stop running codeql for Go
2024-03-07 10:09:53 +00:00
Bryan Boreham b965aac82d
Merge pull request #13722 from bboreham/split-go-tests
CI: split Go tests into two parts, to run in parallel
2024-03-07 10:09:13 +00:00
beorn7 7f912db15a promql: Fix limiting of extrapolation to negative values
This is a bit tough to explain, but I'll try:

`rate` & friends have a sophisticated extrapolation algorithm.
Usually, we extrapolate the result to the total interval specified in
the range selector. However, if the first sample within the range is
too far away from the beginning of the interval, or if the last sample
within the range is too far away from the end of the interval, we
assume the series has just started half a sampling interval before the
first sample or after the last sample, respectively, and shorten the
extrapolation interval correspondingly. We calculate the sampling
interval by looking at the average time between samples within the
range, and we define "too far away" as "more than 110% of that
sampling interval".

However, if this algorithm leads to an extrapolated starting value
that is negative, we limit the start of the extrapolation interval to
the point where the extrapolated starting value is zero.

At least that was the intention.

What we actually implemented is the following: If extrapolating all
the way to the beginning of the total interval would lead to an
extrapolated negative value, we would only extrapolate to the zero
point as above, even if the algorithm above would have selected a
starting point that is just half a sampling interval before the first
sample and that starting point would not have an extrapolated negative
value. In other words: What was meant as a _limitation_ of the
extrapolation interval yielded a _longer_ extrapolation interval in
this case.

There is an exception to the case just described: If the increase of
the extrapolation interval is more than 110% of the sampling interval,
we suddenly drop back to only extrapolate to half a sampling interval.

This behavior can be nicely seen in the testcounter_zero_cutoff test,
where the rate goes up all the way to 0.7 and then jumps back to 0.6.

This commit changes the behavior to what was (presumably) intended
from the beginning: The extension of the extrapolation interval is
only limited if actually needed to prevent extrapolation to negative
values, but the "limitation" never leads to _more_ extrapolation
anymore.

The difference is subtle, and probably it never bothered anyone.
However, if you calculate the rate of a classic histogram, the old
behavior might create non-monotonic histograms as a result (because of
the jumps you can see nicely in the old version of the
testcounter_zero_cutoff test). With this fix, that doesn't happen
anymore.
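
The intended behavior, as described in this message, can be sketched roughly like this. It is a simplified model with times in seconds and made-up names (`extrapolatedIncrease` and its parameters), written from the description above rather than taken from the PromQL engine, and it assumes at least two samples in the range.

```go
package main

import "fmt"

// extrapolatedIncrease scales the raw increase between the first and
// last sample to the extrapolation interval described above.
// firstV is the value of the first sample; numSamples must be >= 2.
func extrapolatedIncrease(rangeStart, rangeEnd, firstT, lastT, firstV, increase float64, numSamples int) float64 {
	sampledInterval := lastT - firstT
	avgSpacing := sampledInterval / float64(numSamples-1)
	threshold := avgSpacing * 1.1 // "too far away" = more than 110% of the spacing

	// Start side: extrapolate to the range boundary, or only half a
	// sampling interval if the first sample is too far from it.
	durationToStart := firstT - rangeStart
	if durationToStart > threshold {
		durationToStart = avgSpacing / 2
	}
	// Zero-point limit: only ever shortens the start-side extrapolation.
	if increase > 0 && firstV >= 0 {
		durationToZero := sampledInterval * (firstV / increase)
		if durationToZero < durationToStart {
			durationToStart = durationToZero
		}
	}

	// End side: same rule, without a zero-point limit.
	durationToEnd := rangeEnd - lastT
	if durationToEnd > threshold {
		durationToEnd = avgSpacing / 2
	}

	return increase * (sampledInterval + durationToStart + durationToEnd) / sampledInterval
}

func main() {
	// Samples at t=10,20,30 with values 1,2,3; range starts at t=-10,
	// so the first sample is "too far away" from the range start.
	fmt.Println(extrapolatedIncrease(-10, 30, 10, 30, 1, 2, 3))
}
```

In the `main` example, the zero point lies 10s before the first sample, which is further away than the half-interval start (5s). The limit therefore does not apply, whereas the old behavior described above would have extended the extrapolation to the zero point.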

Signed-off-by: beorn7 <beorn@grafana.com>
2024-03-07 01:20:33 +01:00