* remote-write: slow down retries to avoid DDOS
Increase the default max retry time from 100ms to 5 seconds.
Remote write calls are retried after a recoverable error such as the
back-end returning 500. Prometheus waits the minimum time and retries,
then doubles the wait on each subsequent retry until the maximum is
reached.
If some data is still getting through, remote-write will also increase
shards, and the default maximum is 200. 200 shards sending every 100ms
is 20 calls per second, to a back-end that is already in trouble.
5 seconds was chosen to match the default BatchSendDeadline: if we can
afford to wait that long for no response, then we can wait the same time
to retry. We will reach 5 seconds after 9 successive failures.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update config doc for max_backoff change
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
It seems sometimes you can get error like:
Error: Not equal:
expected: []byte(nil)
actual : []byte{}
Diff:
--- Expected
+++ Actual
@@ -1,2 +1,3 @@
-([]uint8) <nil>
+([]uint8) {
+}
This commit does what bytes.Equal does to silence those differences. I'm
not sure if this is a correct solution or just covering up the actual bug.
Closes#9574
Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modetimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
The rules manager keeps a note of which series were generated by the
last run, so it can write a stale marker to those that disappeared.
Since the keys are not for human eyes, we can use a simpler format
and save the effort of quoting label values.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* add a negative boost for some trigonometric functions that can overlapp other regular promQL functions
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* add comments to explain the purpose of the attribute boost
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* TSDB: demistify seriesRefs and ChunkRefs
The TSDB package contains many types of series and chunk references,
all shrouded in uint types. Often the same uint value may
actually mean one of different types, in non-obvious ways.
This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.
Concretely:
* Use appropriately named types and document their semantics and
relations.
* Make multiplexing and demuxing of types explicit
(on the boundaries between concrete implementations and generic
interfaces).
* Casting between different types should be free. None of the changes
should have any impact on how the code runs.
TODO: Implement BlockSeriesRef where appropriate (for a future PR)
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* agent: demistify seriesRefs and ChunkRefs
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
Using the same variable for storage.tsdb.path and storage.agent.path
as below in main.go causes cfg.localStoragePath to be data/ or
data-agent/ at random.
a.Flag("storage.tsdb.path", "Base path for metrics storage.").
PreAction(serverOnlySetting()).
Default("data/").StringVar(&cfg.localStoragePath)
a.Flag("storage.agent.path", "Base path for metrics storage.").
PreAction(agentOnlySetting()).
Default("data-agent/").StringVar(&cfg.localStoragePath)
This patch fixes it by using a different variable for storage.agent.path
Signed-off-by: Sunil Thaha sthaha@redhat.com
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
* Fix misleading agent-only/server-only check messages.
Issue:
```
[root@host01 ~]# docker run -it --net=host --rm -v /root/editor/prom-agent-batcopter.yaml:/etc/prometheus/prometheus.yaml -v /root/prom-batcopter-data:/prometheus -u root --name prom-agent-batcopter quay.io/prometheus/prometheus:main --enable-feature=agent --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/prometheus --web.listen-address=:9091
ts=2021-11-02T16:00:59.789Z caller=main.go:205 level=info msg="Experimental agent mode enabled."
The following flag(s) can not be used in agent mode: ["--enable-feature"]
```
Problem was that PreAction gives us all parsed flag. Context does not give us any info on what flag clause it was defined.
Also added info for flag help about being server or agent only.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* gofumpt.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Allow to disable trimming when querying TSDB
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Addressed review comments
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added unit test
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Renamed TrimDisabled to DisableTrimming
Signed-off-by: Marco Pracucci <marco@pracucci.com>
I would like to avoid extra API call's to determine if we are running in
Agent Mode, so I think we could use this approach.
This is a bootstrap of #9612
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
When running tests in parallel, 10 milliseconds may not be enough for
all discoverers to register, which will make test flaky.
This commit changes the waiting logic to wait for number of discoverers
to stop increasing during given time frame, which should be large enough
for single discoverer to register in test environment.
A following run passes with this commit:
go test -failfast -race -count 100 -v ./discovery/kubernetes/
Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
So temporary data directory can be successfully removed, as on Windows,
directory cannot be in used while removal.
Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
We are aware of the issue, but while we are working on it,
having main tests broken is an annoyance.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>