Commit graph

9665 commits

Author SHA1 Message Date
Bryan Boreham 1ed94142fc
remote-write: slow down retries to avoid DDOS (#9634)
* remote-write: slow down retries to avoid DDOS

Increase the default max retry time from 100ms to 5 seconds.

Remote write calls are retried after a recoverable error such as the
back-end returning 500. Prometheus waits the minimum time and retries,
then doubles the wait on each subsequent retry until the maximum is
reached.

If some data is still getting through, remote-write will also increase
shards, and the default maximum is 200. 200 shards sending every 100ms
is 2,000 calls per second, to a back-end that is already in trouble.

5 seconds was chosen to match the default BatchSendDeadline: if we can
afford to wait that long for no response, then we can wait the same time
to retry. We will reach 5 seconds after 9 successive failures.
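
The doubling-with-a-cap behaviour described above can be sketched as follows. This is a minimal illustration, not the remote-write implementation: `backoffSchedule` is a hypothetical helper, and the 30ms minimum used in `main` is an assumed value that this commit message does not state.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the wait before each retry: start at min,
// double after every failure, and never exceed max.
// Hypothetical helper for illustration only.
func backoffSchedule(min, max time.Duration, retries int) []time.Duration {
	waits := make([]time.Duration, 0, retries)
	wait := min
	for i := 0; i < retries; i++ {
		waits = append(waits, wait)
		wait *= 2
		if wait > max {
			wait = max
		}
	}
	return waits
}

func main() {
	// Assumption: a 30ms minimum; the 5s maximum is the new default from this commit.
	for i, w := range backoffSchedule(30*time.Millisecond, 5*time.Second, 10) {
		fmt.Printf("retry %d: wait %v\n", i+1, w)
	}
}
```

Capping the wait rather than letting it grow without bound keeps the retry latency predictable once the back-end recovers.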

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Update config doc for max_backoff change

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-11-09 14:08:24 -08:00
Mateusz Gozdek f4650c27e7 tsdb/wal: fix flaky TestReaderFuzz* tests
It seems you can sometimes get an error like:

                Error:          Not equal:
                                expected: []byte(nil)
                                actual  : []byte{}

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,2 +1,3 @@
                                -([]uint8) <nil>
                                +([]uint8) {
                                +}

This commit does what bytes.Equal does to silence those differences. I'm
not sure if this is a correct solution or just covering up the actual bug.
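
As a small sketch of the difference being silenced: `bytes.Equal` treats a nil slice and an empty slice as equal, while `reflect.DeepEqual` does not. The `sameBytes` wrapper is a hypothetical name used only for this example; it is not the helper added by the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"reflect"
)

// sameBytes reports whether a and b hold the same bytes. Like bytes.Equal,
// it treats a nil slice and an empty slice as equal.
// Hypothetical helper for illustration only.
func sameBytes(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	var nilSlice []byte
	empty := []byte{}

	fmt.Println(reflect.DeepEqual(nilSlice, empty)) // false
	fmt.Println(sameBytes(nilSlice, empty))         // true
}
```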

Closes #9574

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-09 14:32:20 +01:00
Björn Rabenstein 4c56a193c5
Merge pull request #9478 from prometheus/beorn7/pkg-deprecation
Move packages out of deprecated pkg directory
2021-11-09 11:09:16 +01:00
曹明 a0d31c28fc tsdb: Add windows arm64 support.
Signed-off-by: 曹明 <caoming1@kingsoft.com>
2021-11-09 11:07:27 +01:00
Mateusz Gozdek b319b14431
tsdb/chunks: preallocate at least some space on non-Windows systems (#9581)
To avoid a potential corrupted chunk read; I am not sure why it is
happening.

Closes #9561.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-09 13:47:00 +05:30
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modtimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Niko Smeds fdcd423dfe Increase time range for PrometheusHAGroupCrashlooping alert
Signed-off-by: Niko Smeds <nikosmeds@gmail.com>
2021-11-08 15:06:42 -08:00
Julien Pivotto c564984daa
Reduce Prometheus pull requests builds (#9666)
* Reduce Prometheus builds

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-09 00:02:23 +01:00
Torbjörn Lönnemark 5e06527190 Fix incorrect PR reference in 2.25.0 changelog
Signed-off-by: Torbjörn Lönnemark <tobbez@ryara.net>
2021-11-08 23:25:45 +01:00
Julien Pivotto 9d1c1cd551 Add Robert as tsdb/agent maintainer
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-08 22:32:39 +01:00
beorn7 a1e595edac Fix two trivial lint warnings
Not sure why those show up for me locally but not if run by the CI.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-08 22:32:13 +01:00
Bryan Boreham 26d8ae0e41 Rules: simplify map key for stale series detection
The rules manager keeps a note of which series were generated by the
last run, so it can write a stale marker to those that disappeared.
Since the keys are not for human eyes, we can use a simpler format
and save the effort of quoting label values.
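
A rough sketch of the idea, under stated assumptions: `seriesKey` and `disappeared` are hypothetical names, and the separator-byte key format is only one possible "simpler format", not necessarily the one the rules manager uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a compact map key from label name/value pairs by joining
// them with a separator byte, without quoting the values.
// Hypothetical format for illustration only.
func seriesKey(pairs [][2]string) string {
	var b strings.Builder
	for _, p := range pairs {
		b.WriteString(p[0])
		b.WriteByte(0xff)
		b.WriteString(p[1])
		b.WriteByte(0xff)
	}
	return b.String()
}

// disappeared returns the keys present in the previous run but missing from
// the current one; those series would receive a stale marker.
func disappeared(prev, cur map[string]struct{}) []string {
	var gone []string
	for k := range prev {
		if _, ok := cur[k]; !ok {
			gone = append(gone, k)
		}
	}
	sort.Strings(gone)
	return gone
}

func main() {
	up := seriesKey([][2]string{{"__name__", "up"}, {"job", "a"}})
	down := seriesKey([][2]string{{"__name__", "up"}, {"job", "b"}})

	prev := map[string]struct{}{up: {}, down: {}}
	cur := map[string]struct{}{up: {}}

	fmt.Println(len(disappeared(prev, cur))) // 1
}
```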

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-11-08 22:18:48 +01:00
Matthew 628211c25a
Feat UI metrics search (#9629)
* feat: add search to metrics explorer

Signed-off-by: mtfoley <mtfoley.mae@gmail.com>

* fix: ui-lint and ui-build errors

Signed-off-by: mtfoley <mtfoley.mae@gmail.com>

* feat: use @nexucis/fuzzy

Signed-off-by: mtfoley <mtfoley.mae@gmail.com>

* chore: code style and delete commented test

Signed-off-by: mtfoley <mtfoley.mae@gmail.com>

* rename Props to MetricsExplorerProps

Signed-off-by: mtfoley <mtfoley.mae@gmail.com>
2021-11-08 19:11:39 +01:00
Augustin Husson 4caae4e4a6
add a negative boost for some trigonometric functions that can overlap other regular PromQL functions (#9688)
* add a negative boost for some trigonometric functions that can overlap other regular PromQL functions

Signed-off-by: Augustin Husson <husson.augustin@gmail.com>

* add comments to explain the purpose of the attribute boost

Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2021-11-08 14:32:38 +01:00
Conor Evans c28b9a0574
Add datacenter to Consul service discovery logs (#9668)
* add datacenter to consul service discovery logs

Signed-off-by: Conor Evans <coevans@tcd.ie>
2021-11-08 09:34:21 +01:00
Dieter Plaetinck cda025b5b5
TSDB: demystify SeriesRefs and ChunkRefs (#9536)
* TSDB: demystify seriesRefs and ChunkRefs

The TSDB package contains many types of series and chunk references,
all shrouded in uint types.  Often the same uint value may
actually mean one of different types, in non-obvious ways.

This PR aims to clarify the code and help readers navigate to relevant
docs, usage, etc. much more quickly.

Concretely:

* Use appropriately named types and document their semantics and
  relations.
* Make multiplexing and demuxing of types explicit
  (on the boundaries between concrete implementations and generic
  interfaces).
* Casting between different types should be free.  None of the changes
  should have any impact on how the code runs.
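
The approach in the list above might look roughly like this. It is a sketch only: the type names mirror the commit title, but the bit layout in `packChunkRef` is an assumption made up for illustration and is not taken from the TSDB code.

```go
package main

import "fmt"

// SeriesRef and ChunkRef are distinct named types over the same underlying
// integer, so the compiler rejects accidental mix-ups while conversions
// between them cost nothing at run time.
type (
	SeriesRef uint64
	ChunkRef  uint64
)

// packChunkRef makes the multiplexing explicit: the series reference goes in
// the upper bits and a chunk index in the lower 16 bits.
// Hypothetical layout for illustration only.
func packChunkRef(s SeriesRef, idx uint16) ChunkRef {
	return ChunkRef(uint64(s)<<16 | uint64(idx))
}

// unpackChunkRef reverses packChunkRef.
func unpackChunkRef(c ChunkRef) (SeriesRef, uint16) {
	return SeriesRef(uint64(c) >> 16), uint16(c)
}

func main() {
	ref := packChunkRef(42, 7)
	s, idx := unpackChunkRef(ref)
	fmt.Println(s, idx) // 42 7
}
```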

TODO: Implement BlockSeriesRef where appropriate (for a future PR)

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* agent: demystify seriesRefs and ChunkRefs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-11-06 15:40:04 +05:30
johncming b882d2b7c7
tsdb/wal: Avoid writing closed channel. (#9566)
Signed-off-by: johncming <johncming@yahoo.com>
2021-11-06 15:11:06 +05:30
chenlujjj d18e42c650
refine comments of Checkpoint function (#9655)
Signed-off-by: chenlujjj <953546398@qq.com>
2021-11-06 15:09:16 +05:30
Bartlomiej Plotka 789274bf9c cmd: Fixed storage flag regression introduced in #9660
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-11-06 00:16:43 +01:00
Yijie Qin 6fce45838a
Add access function for restoration state of alerting rule (#9665)
2021-11-05 18:26:29 -04:00
Julien Pivotto b9c814fce6
Merge pull request #9681 from prometheus/release-2.31
merge back release 2.31
2021-11-05 21:21:04 +01:00
Julien Pivotto 411021ada9
Release 2.31.1 (#9669)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-05 21:13:21 +01:00
Marco Pracucci 309b094b92
Optimized MemPostings.EnsureOrder() (#9673)
* Optimizes MemPostings.EnsureOrder()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Ignore linter warning

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-11-05 10:01:23 +00:00
Julien Pivotto 9621c2c0cc
Fix race with targets update during ApplyConfig (#9656)
I ended up extending the lock so refTargets remains valid for the
duration of the update.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-05 01:13:04 +01:00
Sunil Thaha 4bdaea7663
fix: storage.tsdb.path randomly initialised to data-agent/ (#9660)
Using the same variable for storage.tsdb.path and storage.agent.path
as below in main.go causes cfg.localStoragePath to be data/ or
data-agent/ at random.

  a.Flag("storage.tsdb.path", "Base path for metrics storage.").
      PreAction(serverOnlySetting()).
      Default("data/").StringVar(&cfg.localStoragePath)

  a.Flag("storage.agent.path", "Base path for metrics storage.").
      PreAction(agentOnlySetting()).
      Default("data-agent/").StringVar(&cfg.localStoragePath)

This patch fixes it by using a different variable for storage.agent.path.
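
A minimal sketch of the separate-variable fix, using the standard library `flag` package in place of the flag library used in main.go. The `config` fields and the `storagePath` helper are hypothetical names chosen for this example.

```go
package main

import (
	"flag"
	"fmt"
)

// config keeps one field per flag, so neither default can overwrite the other.
// Field names are hypothetical.
type config struct {
	serverStoragePath string
	agentStoragePath  string
}

// newFlags binds each storage flag to its own variable.
func newFlags(cfg *config) *flag.FlagSet {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.StringVar(&cfg.serverStoragePath, "storage.tsdb.path", "data/", "Base path for metrics storage.")
	fs.StringVar(&cfg.agentStoragePath, "storage.agent.path", "data-agent/", "Base path for agent storage.")
	return fs
}

// storagePath picks the path that matches the running mode.
func storagePath(cfg *config, agentMode bool) string {
	if agentMode {
		return cfg.agentStoragePath
	}
	return cfg.serverStoragePath
}

func main() {
	cfg := &config{}
	fs := newFlags(cfg)
	if err := fs.Parse([]string{"-storage.tsdb.path=/prometheus"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(storagePath(cfg, false)) // /prometheus
	fmt.Println(storagePath(cfg, true))  // data-agent/
}
```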

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
2021-11-04 10:08:01 +00:00
Bartlomiej Plotka e68ccc7708
Fix misleading agent-only/server-only check messages. (#9650)
* Fix misleading agent-only/server-only check messages.

Issue:

```
[root@host01 ~]# docker run -it --net=host --rm -v /root/editor/prom-agent-batcopter.yaml:/etc/prometheus/prometheus.yaml -v /root/prom-batcopter-data:/prometheus -u root --name prom-agent-batcopter quay.io/prometheus/prometheus:main --enable-feature=agent --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/prometheus --web.listen-address=:9091
ts=2021-11-02T16:00:59.789Z caller=main.go:205 level=info msg="Experimental agent mode enabled."
The following flag(s) can not be used in agent mode: ["--enable-feature"]
```

The problem was that PreAction gives us all parsed flags; the context does not tell us on which flag clause it was defined.

Also added info for flag help about being server or agent only.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* gofumpt.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-11-04 09:08:53 +00:00
Augustin Husson 17fc57948a
codemirror-promql moved to prometheus org (#9651)
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2021-11-03 12:46:58 +01:00
Marco Pracucci 9f5ff5b269
Allow to disable trimming when querying TSDB (#9647)
* Allow to disable trimming when querying TSDB

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Addressed review comments

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added unit test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Renamed TrimDisabled to DisableTrimming

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-11-03 15:38:34 +05:30
sniper f82e56fbba
fix request bytes size and remove useless continue (#9635)
Signed-off-by: kalmanzhao <kalmanzhao@tencent.com>

Co-authored-by: kalmanzhao <kalmanzhao@tencent.com>
2021-11-03 14:40:31 +05:30
Marco Pracucci edd05d7010
Add Head.AppendableMinValidTime() (#9643)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-11-03 13:09:54 +05:30
Julien Pivotto b40e254f25
Agent: Add a boolean to the index to indicate agent mode. (#9649)
I would like to avoid extra API calls to determine if we are running in
Agent Mode, so I think we could use this approach.

This is a bootstrap of #9612

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-03 03:47:14 +00:00
Mateusz Gozdek ea924746b3
discovery/kubernetes: improve test logic for waiting for discoverers (#9584)
When running tests in parallel, 10 milliseconds may not be enough for
all discoverers to register, which makes the test flaky.

This commit changes the waiting logic to wait for the number of
discoverers to stop increasing during a given time frame, which should be
large enough for a single discoverer to register in a test environment.
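
The waiting logic can be sketched along these lines. `waitUntilStable` is a hypothetical helper, not the function used in the Kubernetes discovery tests; it polls a counter and returns once the value has stopped increasing between two consecutive polls, or when a timeout elapses.

```go
package main

import (
	"fmt"
	"time"
)

// waitUntilStable polls count every interval and returns the value once it
// has not increased between two consecutive polls, or the last value seen
// when timeout elapses. Hypothetical helper for illustration only.
func waitUntilStable(count func() int, interval, timeout time.Duration) int {
	deadline := time.Now().Add(timeout)
	last := count()
	for time.Now().Before(deadline) {
		time.Sleep(interval)
		cur := count()
		if cur <= last {
			return cur
		}
		last = cur
	}
	return last
}

func main() {
	// Simulated registrations: the counter grows to 3 and then stays there.
	calls := 0
	count := func() int {
		calls++
		if calls > 3 {
			return 3
		}
		return calls
	}
	fmt.Println(waitUntilStable(count, time.Millisecond, time.Second)) // 3
}
```

The interval should be long enough for a single registration to complete; otherwise the loop can return before all discoverers have registered.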

The following run passes with this commit:

go test -failfast -race -count 100 -v ./discovery/kubernetes/

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 22:17:32 +01:00
Mateusz Gozdek c3beca72e2 cmd/prometheus: wait for Prometheus to shutdown in tests
So that the temporary data directory can be removed successfully: on
Windows, a directory cannot be in use while it is being removed.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 20:14:19 +01:00
Mateusz Gozdek 01c5582216 .golangci.yml: enable gofumpt and goimports linters
For imports and more opinionated code formatting.

Closes #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Mateusz Gozdek ce65883588 .golangci.yml: don't lint autogenerated files
So when we enable linters for formatting, they do not complain about
those files.

Refs #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Mateusz Gozdek b7bdf6fab2 Fix imports formatting
According to
2829908806 (r58457095).

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Julien Pivotto b1e8e8a0ca
Merge pull request #9642 from prometheus/release-2.31
Merge back release 2.31
2021-11-02 14:19:28 +01:00
Julien Pivotto 807f46a1ed
Gate agent behind a feature flag, validate mode flags (#9620)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-02 13:03:35 +00:00
Julien Pivotto 6e1d6edb33
Exclude agent from windows tests (#9645)
We are aware of the issue, but while we are working on it,
having main tests broken is an annoyance.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-02 13:58:51 +01:00
Julien Pivotto d4c83da6d2
Release 2.31 (#9639)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-02 11:07:15 +01:00
Björn Rabenstein b862218389
Merge pull request #9588 from darshanime/kahan
Use kahan summation for better numerical stability
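
For illustration, here is a minimal sketch of compensated summation. It uses the Neumaier variant of Kahan summation, which is an assumption on my part; the function names are hypothetical and this is not the code from the pull request.

```go
package main

import (
	"fmt"
	"math"
)

// naiveSum adds the values left to right with no error compensation.
func naiveSum(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum
}

// compensatedSum adds the values while tracking, in c, the low-order bits
// that each addition loses (Neumaier variant of Kahan summation).
func compensatedSum(xs []float64) float64 {
	var sum, c float64
	for _, x := range xs {
		t := sum + x
		if math.Abs(sum) >= math.Abs(x) {
			c += (sum - t) + x // low-order bits of x were lost
		} else {
			c += (x - t) + sum // low-order bits of sum were lost
		}
		sum = t
	}
	return sum + c
}

func main() {
	xs := []float64{1e100, 1, -1e100}
	fmt.Println(naiveSum(xs))       // 0
	fmt.Println(compensatedSum(xs)) // 1
}
```

The input is chosen so that the small term is swallowed by the large ones in the naive sum but recovered by the compensation term.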
2021-11-01 14:58:22 +01:00
Darshan Chaudhary a7e554b158
add check service-discovery command (#8970)
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-11-01 14:42:12 +01:00
Hu Shuai 4b799c361a
Fix typo in cmd/prometheus/main.go (#9632)
Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-11-01 16:08:23 +05:30
chenlujjj 660329d5b3
add tombstoneFormatVersionSize & tombstonesCRCSize constants (#9625)
Signed-off-by: chenlujjj <953546398@qq.com>
2021-11-01 16:05:19 +05:30
Praveen Ghuge 64d9b41998
Use testing.T.TempDir() instead of ioutil.TempDir() in tsdb/wal unit tests (#9602)
Signed-off-by: Praveen Ghuge <praveen.ghuge@outlook.com>
2021-11-01 12:28:18 +05:30
darshanime 42d786f1ac use kahan summation for aggregation functions
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-10-30 19:41:36 +05:30
darshanime 694b872dee address stylistic nits
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-10-30 19:08:23 +05:30
darshanime a905354da3 use kahan for avg_over_time
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-10-30 19:04:18 +05:30
darshanime 0a9deb9597 use kahan summation for numerical stability
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-10-30 19:04:18 +05:30