Commit graph

12702 commits

Author SHA1 Message Date
gotjosh 276201598c
Fix tests and a bug with the series lookup logic.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
gotjosh e6dcbd2e26
bug: nil check against the series set not errors
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
gotjosh 4daaa59c08
Rule Manager: Only query once per alert rule when restoring alert state
Prometheus restores alert state between restarts and updates. For each rule, it looks at the alerts that are meant to be active and then queries the `ALERTS_FOR_STATE` series for _each_ alert within the rules.

If the alert rule has 120 instances (or series) it'll execute the same query 120 times, each with slightly different labels.

This PR changes the approach so that we only query once per alert rule and then match the corresponding alert that we're about to restore against the series-set. While the approach might use a bit more memory at start-up (if even?) the restore process is only run once per restart so I'd consider this a big win.

This builds on top of #13974

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
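
The approach described in 4daaa59c08 can be pictured roughly as follows. This is a minimal Go sketch under stated assumptions, not the Prometheus implementation: the `series` type, `labelKey`, and `restoreForRule` are hypothetical names, and label sets are plain maps rather than Prometheus' own label type. It assumes one query per rule has already returned every `ALERTS_FOR_STATE` series for that rule, and then matches each alert against that result in memory.

```go
package main

import (
	"sort"
	"strings"
)

// series is a hypothetical stand-in for one ALERTS_FOR_STATE series:
// a label set plus the last stored sample value.
type series struct {
	labels map[string]string
	value  float64
}

// labelKey builds a deterministic key from a label set, so that two equal
// label sets produce the same string regardless of map iteration order.
func labelKey(ls map[string]string) string {
	names := make([]string, 0, len(ls))
	for n := range ls {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n)
		b.WriteByte(0xff) // separator that cannot appear in valid UTF-8 label text
		b.WriteString(ls[n])
		b.WriteByte(0xff)
	}
	return b.String()
}

// restoreForRule indexes the result of the single per-rule query once, then
// looks up each alert's label set in that index instead of querying again.
// It returns the stored value for every alert that had a matching series.
func restoreForRule(result []series, alerts []map[string]string) map[string]float64 {
	index := make(map[string]float64, len(result))
	for _, s := range result {
		index[labelKey(s.labels)] = s.value
	}
	restored := make(map[string]float64)
	for _, a := range alerts {
		k := labelKey(a)
		if v, ok := index[k]; ok {
			restored[k] = v
		}
	}
	return restored
}
```

With 120 alert instances this does one query and 120 map lookups, at the cost of holding the whole series set in memory for the duration of the restore.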
gotjosh 4ac78063ee
Merge pull request #13974 from prometheus/measure-restore-time-rules
Rule Manager: Add `rule_group_last_restore_duration_seconds` to measure restore time per rule group
2024-04-24 16:04:30 +01:00
Alan Protasio d15869af32
Avoid creating new slices for labels values on postings for matchers (#13958)
* Avoid creating new slices for labels values on postings for matchers

Signed-off-by: alanprot <alanprot@gmail.com>

* refactor

Signed-off-by: alanprot <alanprot@gmail.com>

---------

Signed-off-by: alanprot <alanprot@gmail.com>
2024-04-24 16:41:33 +02:00
gotjosh 5beb2fe005
Improve the metric description
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 15:24:35 +01:00
gotjosh d672eda979
Add a changelog entry
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 14:31:18 +01:00
gotjosh 381a77ac1e
Change variable name to restoreStartTime from now and introduce a log line to record total time
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 14:21:11 +01:00
Will Hegedus bd1878700b
promtool: Fix panic on extended tsdb analyze (#13976)
Currently, running promtool tsdb analyze with the --extended flag
will cause an 'index out of range' error if running it
against a block that does not have any native histogram chunks.

This change ensures that promtool won't try to display data that doesn't exist.

Signed-off-by: Will Hegedus <whegedus@linode.com>
2024-04-24 11:35:34 +10:00
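
The class of bug fixed in bd1878700b (indexing into data that may be empty) is usually avoided with a length check before the first index. The following Go sketch illustrates that general pattern only; `bucketRange` and its parameters are hypothetical and are not taken from promtool's source.

```go
package main

// bucketRange returns the smallest and largest values in counts.
// ok is false when counts is empty, so callers can skip the display
// step instead of indexing into a slice that has no elements.
func bucketRange(counts []int) (lo, hi int, ok bool) {
	if len(counts) == 0 {
		return 0, 0, false
	}
	lo, hi = counts[0], counts[0]
	for _, c := range counts[1:] {
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return lo, hi, true
}
```

A caller would print the range only when `ok` is true, which corresponds to the commit's stated goal of not displaying data that doesn't exist.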
gotjosh e7219e3d36
Rule Manager: Add rule_group_last_restore_duration_seconds to measure restore time per rule group
When a rule group changes or Prometheus is restarted, we need to ensure we restore the active alerts that were firing for a corresponding rule. For that, Prometheus uses the `ALERTS_FOR_STATE` series to query the previous state and restore it. If a given rule has high cardinality (think 100s of 1000s of series) this process can take a bit of time. This is the first of a series of PRs to improve this problem, and I'd like to start with exposing the time it takes to restore a rule group as a gauge.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-23 09:57:08 +01:00
Arthur Silva Sens 76b0318ed5
Merge pull request #13962 from prometheus/dependabot/go_modules/github.com/aws/aws-sdk-go-1.51.25
build(deps): bump github.com/aws/aws-sdk-go from 1.51.24 to 1.51.25
2024-04-22 09:26:07 -03:00
Arthur Silva Sens a903ef83ee
Merge pull request #13961 from prometheus/dependabot/go_modules/github.com/hetznercloud/hcloud-go/v2-2.7.2
build(deps): bump github.com/hetznercloud/hcloud-go/v2 from 2.7.1 to 2.7.2
2024-04-22 09:25:47 -03:00
Ben Kochie 8cd7e04fd2
Merge pull request #13874 from prometheus/dependabot/github_actions/bufbuild/buf-lint-action-1.1.1
build(deps): bump bufbuild/buf-lint-action from 1.1.0 to 1.1.1
2024-04-20 13:14:15 +02:00
Ben Kochie 97eab6842c
Merge pull request #13873 from prometheus/dependabot/github_actions/bufbuild/buf-breaking-action-1.1.4
build(deps): bump bufbuild/buf-breaking-action from 1.1.2 to 1.1.4
2024-04-20 13:13:55 +02:00
dependabot[bot] f65e94bdbc
build(deps): bump github.com/aws/aws-sdk-go from 1.51.24 to 1.51.25
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.51.24 to 1.51.25.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.51.24...v1.51.25)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-20 10:23:03 +00:00
dependabot[bot] cd078b07d9
build(deps): bump github.com/hetznercloud/hcloud-go/v2
Bumps [github.com/hetznercloud/hcloud-go/v2](https://github.com/hetznercloud/hcloud-go) from 2.7.1 to 2.7.2.
- [Release notes](https://github.com/hetznercloud/hcloud-go/releases)
- [Changelog](https://github.com/hetznercloud/hcloud-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hetznercloud/hcloud-go/compare/v2.7.1...v2.7.2)

---
updated-dependencies:
- dependency-name: github.com/hetznercloud/hcloud-go/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-20 10:23:00 +00:00
Arthur Silva Sens 0f222edf7f
Merge pull request #13954 from prometheus/update-go-modules
Update go modules before 2.52 release
2024-04-20 07:21:42 -03:00
Simon Pasquier f36915b6b1
Merge pull request #13935 from simonpasquier/more-endpointslices-metadata
discovery(k8s): add metadata labels to endpointslices
2024-04-19 10:31:16 +02:00
Arthur Silva Sens b5b5e1e5ae
Merge pull request #13919 from GiedriusS/dont_forget_to_unregister
tsdb/wlog: unregister metrics on WL close
2024-04-18 16:44:03 -03:00
Arthur Silva Sens bcb3e2c515
Downgrade github.com/ovh/go-ovh back to v1.4.3
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-04-18 16:35:58 -03:00
Arthur Silva Sens 8543f4827b
Downgrade k8s apis back to v0.29.3
Since it requires go 1.22

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-04-18 16:20:37 -03:00
Arthur Silva Sens c152b026b4
Update Go dependencies before 2.52
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-04-18 15:56:11 -03:00
Arthur Silva Sens 9faf105ab1
Merge pull request #13952 from prometheus/dependabot/go_modules/documentation/examples/remote_storage/github.com/prometheus/common-0.53.0
build(deps): bump github.com/prometheus/common from 0.50.0 to 0.53.0 in /documentation/examples/remote_storage
2024-04-18 14:58:43 -03:00
dependabot[bot] 4aca4e2cbd
build(deps): bump github.com/prometheus/common
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.50.0 to 0.53.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.50.0...v0.53.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-18 14:27:33 +00:00
Arthur Silva Sens 1bab59564e
Merge pull request #13812 from prometheus/dependabot/github_actions/actions/setup-node-4.0.2
build(deps): bump actions/setup-node from 4.0.1 to 4.0.2
2024-04-18 11:26:15 -03:00
Arthur Silva Sens 4557da8e07
Merge pull request #13811 from prometheus/dependabot/github_actions/actions/cache-4.0.2
build(deps): bump actions/cache from 4.0.1 to 4.0.2
2024-04-18 11:25:41 -03:00
Arthur Silva Sens ee42e639c1
Merge pull request #13810 from prometheus/dependabot/github_actions/actions/checkout-4.1.2
build(deps): bump actions/checkout from 4.1.1 to 4.1.2
2024-04-18 11:25:21 -03:00
Arthur Silva Sens 1030205cf1
Merge pull request #13867 from prometheus/dependabot/go_modules/documentation/examples/remote_storage/github.com/prometheus/prometheus-0.51.1
build(deps): bump github.com/prometheus/prometheus from 0.50.1 to 0.51.1 in /documentation/examples/remote_storage
2024-04-18 09:40:51 -03:00
Arthur Silva Sens dff4c0678a
Merge pull request #13817 from prometheus/dependabot/go_modules/github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v5-5.6.0
build(deps): bump github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v5 from 5.5.0 to 5.6.0
2024-04-18 09:40:20 -03:00
Giedrius Statkevičius bdf490726a
tsdb/wlog: add test for metrics unregistering
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-18 11:11:37 +03:00
Simon Pasquier 7704cde4ea
discovery(k8s): add metadata labels to endpointslices
This commit adds 2 new metadata labels for the endpointslice role:
* `__meta_kubernetes_endpointslice_endpoint_node_name`
* `__meta_kubernetes_endpointslice_endpoint_zone`

The latter is only present when the `discovery.k8s.io/v1` API group is
available.

I also updated the configuration doc and added an entry for the
`__meta_kubernetes_endpointslice_endpoint_hostname` label which was
missing.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-04-17 11:20:19 +02:00
Björn Rabenstein b938bbc111
Merge pull request #13891 from prometheus/beorn7/maintainers
List Prometheus v3 coordinators in MAINTAINERS.md
2024-04-16 14:15:07 +02:00
Owen Williams 4a6f8704ef
parser: remake generated_parser output (#13923)
In a previous PR, the generated parser was created using an old version of goyacc.

Also adds -l to disable line directives, which fixes debug processing and reduces diffs at the expense of making it more difficult to reason about the generated output.

Signed-off-by: Owen Williams <owen.williams@grafana.com>
2024-04-13 12:59:54 +02:00
Björn Rabenstein 4ec5c25393
Merge pull request #13731 from suntala/suntala/native-histogram-template
histograms: support expansion of native histogram values in templating
2024-04-11 13:24:26 +02:00
Neeraj Gartia 612de026da
Adds Inf and NaN as Numbers to Histogram in Promql Testing Framework (#13916)
includes Inf and NaN as numbers to histogram

---------

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-04-11 12:53:28 +02:00
Bryan Boreham 2a43026558
Merge pull request #13913 from prometheus/merge-2.51.2-into-main
Merge 2.51.2 into main
2024-04-11 09:38:24 +01:00
Giedrius Statkevičius 3b8fe00767
tsdb/wlog: unregister metrics on WL close
Thanos can create and destroy TSDBs dynamically, and once a TSDB
disappears its files are deleted. Calculating the size of the
WAL then fails with errors like:

```
msg: "Failed to calculate size of "wal" dir", "err": "lstat
/tsdbdir/wal: no such file or directory", "caller": "wlog.go:271"
```

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-04-11 11:30:05 +03:00
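
The problem in 3b8fe00767 arises when a metric that inspects the WAL directory stays registered after the log is closed and its files are removed. The Go sketch below shows the shape of the fix with a toy registry; `registry`, `wl`, `openWL`, and the metric name `wal_size_bytes` are all hypothetical and do not reflect the real `tsdb/wlog` or client library APIs.

```go
package main

import "errors"

// registry is a toy metric registry: each collector is a function that
// computes a value on demand and may fail.
type registry struct {
	collectors map[string]func() (float64, error)
}

func newRegistry() *registry {
	return &registry{collectors: map[string]func() (float64, error){}}
}

func (r *registry) register(name string, f func() (float64, error)) { r.collectors[name] = f }
func (r *registry) unregister(name string)                          { delete(r.collectors, name) }

// gather calls every registered collector and returns the errors it saw.
func (r *registry) gather() []error {
	var errs []error
	for _, collect := range r.collectors {
		if _, err := collect(); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

// wl is a hypothetical write log that registers a size metric when opened.
type wl struct {
	reg    *registry
	closed bool
}

func openWL(reg *registry) *wl {
	w := &wl{reg: reg}
	reg.register("wal_size_bytes", w.size)
	return w
}

// size fails once the log is closed, mimicking lstat on a deleted directory.
func (w *wl) size() (float64, error) {
	if w.closed {
		return 0, errors.New("lstat wal: no such file or directory")
	}
	return 42, nil
}

// close unregisters the metric so later gathers never touch the removed dir.
func (w *wl) close() {
	w.closed = true
	w.reg.unregister("wal_size_bytes")
}
```

Without the `unregister` call in `close`, every later `gather` would report the same "no such file or directory" error shown in the commit message.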
Matthieu MOREL 6f595c6762
golangci-lint: enable whitespace linter (#13905)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-11 09:27:54 +01:00
Bryan Boreham e1dd8e72df
Merge branch 'main' into merge-2.51.2-into-main
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-04-10 15:05:52 +01:00
Bryan Boreham 594b317ecc
Merge pull request #13898 from hanghuge/main
[DOCS] Fix unavailable link to Kubernetes docs
2024-04-10 11:52:12 +01:00
Bryan Boreham adf993946b
Merge pull request #13906 from mmorel-35/usestdlibvars
golangci-lint: enable usestdlibvars linter
2024-04-10 11:14:54 +01:00
Bryan Boreham b4c0ab52c3
Cut release 2.51.2
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-04-10 11:08:16 +01:00
beorn7 f07f4f293f
List Prometheus v3 coordinators in MAINTAINERS.md
The Prometheus v3 coordinators are full members of the Prometheus GH
org, a status that our governance specifies for maintainers. As
discussed, it appears best to formalize maintainership by listing the
coordinators in the MAINTAINERS.md file of prometheus/prometheus.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-04-09 17:44:25 +02:00
Matthieu MOREL d496687c8e
golangci-lint: enable usestdlibvars linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-08 19:26:23 +00:00
Jonathan Halterman 633224886a
Write out of order hint when initially creating meta file (#13894)
Signed-off-by: Jonathan Halterman <jonathan@grafana.com>
Signed-off-by: Jonathan Halterman <jhalterman@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
2024-04-08 17:34:14 +02:00
hanghuge c14a158d03
Signed-off-by: hanghuge <cmoman@outlook.com>
Fix unavailable link

Signed-off-by: hanghuge <cmoman@outlook.com>
2024-04-08 18:44:22 +08:00
Bryan Boreham 776eea6e96
Merge pull request #13895 from schustersv/scrape_interval_relabel_doc_fix
doc: scrape_config/interval relabelling is not experimental any more
2024-04-07 18:33:33 +01:00
Łukasz Mierzwa 277f04f0c4
Stop compactions if there's a block to write (#13754)
* Stop compactions if there's a block to write

db.Compact() checks if there's a block to write with HEAD chunks before calling db.compactBlocks().
This is to ensure that if we need to write a block then it happens ASAP, otherwise memory usage might keep growing.

But what can also happen is that we don't need to write any block, we start db.compactBlocks(),
compaction takes hours, and in the meantime HEAD needs to write out chunks to a block.

This can be especially problematic if, for example, you run a Thanos sidecar that's uploading blocks,
which requires that compactions are disabled. Then you disable the Thanos sidecar and re-enable compactions.
When db.compactBlocks() is finally called it might have a huge number of blocks to compact, which might
take a very long time, during which HEAD cannot write out chunks to a new block.
In such a case memory usage will keep growing until either:
- compactions are finally finished and HEAD can write a block
- we run out of memory and Prometheus gets OOM-killed

This change adds a check for pending HEAD block writes inside db.compactBlocks(), so that
we bail out early if there are still compactions to run, but we also need to write a new
block.

Also add a test for compactBlocks.

---------

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
2024-04-07 18:28:28 +01:00
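
The early bail-out described in 277f04f0c4 can be sketched as a check at the top of each iteration of the compaction loop. This is an illustrative Go sketch, not the code in `tsdb/db.go`: the function signature, `headNeedsWrite`, and the representation of a plan as a slice of block names are assumptions made for the example.

```go
package main

// compactBlocks runs the given compaction plans in order. Before each plan
// it asks headNeedsWrite whether the HEAD has chunks waiting to be written
// to a new block. If so, it stops early and leaves the remaining plans for
// a later call, so the HEAD write is not delayed by long compactions.
// It returns the number of plans that were compacted.
func compactBlocks(
	plans [][]string,
	headNeedsWrite func() bool,
	compact func(plan []string) error,
) (int, error) {
	done := 0
	for _, plan := range plans {
		if headNeedsWrite() {
			return done, nil // bail out; remaining plans run next time
		}
		if err := compact(plan); err != nil {
			return done, err
		}
		done++
	}
	return done, nil
}
```

Because the check runs between compactions rather than during one, a single long compaction still delays the HEAD write; the sketch only bounds the delay to one plan instead of the whole backlog.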
Bryan Boreham fc567a1df8
Merge pull request #13889 from komisan19/refactor/utilize_standard_functions_max/min
refactor: utilize standard functions max/min in promtool and tsdb
2024-04-06 10:23:18 +01:00
Arthur Silva Sens b4a973753c
Merge pull request #13897 from dashpole/unregister_scrape_metrics
2024-04-05 14:44:32 -03:00