Thanos can create and destroy TSDBs dynamically, and once a TSDB
disappears its files are deleted. Calculating the size of the
WAL then fails with errors like:
```
msg: "Failed to calculate size of "wal" dir", "err": "lstat
/tsdbdir/wal: no such file or directory", "caller": "wlog.go:271"
```
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Jonathan Halterman <jonathan@grafana.com>
Signed-off-by: Jonathan Halterman <jhalterman@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
* Stop compactions if there's a block to write
db.Compact() checks if there's a block to write with HEAD chunks before calling db.compactBlocks().
This is to ensure that if we need to write a block then it happens ASAP, otherwise memory usage might keep growing.
But what can also happen is that we don't need to write any block, we start db.compactBlocks(),
compaction takes hours, and in the meantime HEAD needs to write out chunks to a block.
This can be especially problematic if, for example, you run a Thanos sidecar that's uploading blocks,
which requires that compactions are disabled. Then you disable the Thanos sidecar and re-enable compactions.
When db.compactBlocks() is finally called it might have a huge number of blocks to compact, which might
take a very long time, during which HEAD cannot write out chunks to a new block.
In such a case memory usage will keep growing until either:
- compactions are finally finished and HEAD can write a block
- we run out of memory and Prometheus gets OOM-killed
This change adds a check for pending HEAD block writes inside db.compactBlocks(), so that
we bail out early when there are still compactions to run but a new block also needs to be
written.
Also add a test for compactBlocks.
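The early bail-out can be sketched as follows. The `compactor` type and its fields are hypothetical stand-ins invented for this example; they are not the names used in the tsdb package.

```go
package main

import "fmt"

// compactor is a hypothetical stand-in for the DB.
type compactor struct {
	pendingPlans    int         // block compactions still planned
	compacted       int         // block compactions already done
	headCompactable func() bool // true when the head must be written to a block
}

// compactBlocks runs the planned compactions one at a time, but returns
// early (interrupted=true) as soon as the head has a block to write.
func (c *compactor) compactBlocks() (interrupted bool) {
	for c.pendingPlans > 0 {
		if c.headCompactable() {
			return true
		}
		c.pendingPlans--
		c.compacted++
	}
	return false
}

func main() {
	c := &compactor{pendingPlans: 5}
	// Pretend the head fills up after two block compactions.
	c.headCompactable = func() bool { return c.compacted >= 2 }

	interrupted := c.compactBlocks()
	fmt.Println(interrupted, c.compacted, c.pendingPlans)
}
```

The caller is then expected to write the head block first and resume the remaining compactions on a later run.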
---------
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
* TSDB: Don't compact the head block when empty
Don't compact the Head block if there have not yet been any samples
appended.
Previously, the logic for determining if the head should be compacted
relied on the default values for min and max time and integer overflow
when they were checked in `Head.compactable()`. The check in
`Head.compactable()` effectively did `math.MinInt64 - math.MaxInt64`
which overflowed and wrapped to `1`. Since `1` is less than `1.5`
times the chunk range, compaction did not happen. This was the correct
behavior but relying on overflow wrapping is surprising.
This change adds a method for checking if the min and max time for the
head are unset and uses it to short-circuit compaction in that case.
It also replaces several explicit checks for the default value to
determine if the head has not yet had any samples added.
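The overflow and the explicit check can be illustrated with a small sketch. `headRange`, `isUnset`, and the two `compactable*` functions are hypothetical names for this example, and the assignment of `math.MaxInt64` to the min time and `math.MinInt64` to the max time is an assumption inferred from the subtraction described above.

```go
package main

import (
	"fmt"
	"math"
)

// headRange is a hypothetical stand-in for the head's time range.
type headRange struct {
	minTime, maxTime int64
}

// newHeadRange returns a range with the assumed "no samples yet" defaults.
func newHeadRange() headRange {
	return headRange{minTime: math.MaxInt64, maxTime: math.MinInt64}
}

// isUnset reports whether no sample has been appended yet.
func (h headRange) isUnset() bool {
	return h.minTime == math.MaxInt64 && h.maxTime == math.MinInt64
}

// compactableByOverflow mimics the old behaviour: the subtraction wraps
// around to 1, which happens to be below 1.5x the chunk range.
func (h headRange) compactableByOverflow(chunkRange int64) bool {
	return h.maxTime-h.minTime > chunkRange/2*3
}

// compactable short-circuits explicitly instead of relying on wrapping.
func (h headRange) compactable(chunkRange int64) bool {
	if h.isUnset() {
		return false
	}
	return h.maxTime-h.minTime > chunkRange/2*3
}

func main() {
	const chunkRange = int64(2 * 60 * 60 * 1000) // 2h in milliseconds
	h := newHeadRange()

	fmt.Println(h.maxTime - h.minTime) // wraps around to 1
	fmt.Println(h.compactableByOverflow(chunkRange), h.compactable(chunkRange))
}
```

Both variants return the same answer for an empty head; the second one simply states the intent.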
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* tsdb: zero out Labels and memSeries pointers in pool
So that the garbage-collector doesn't see this memory as still in use.
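A minimal sketch of the idea, using a hypothetical `series` type and pool rather than the real `memSeries` pool: truncating a slice with `buf[:0]` keeps the old pointers alive in the backing array, so they are set to nil before the slice goes back into the pool.

```go
package main

import (
	"fmt"
	"sync"
)

// series is a hypothetical stand-in for memSeries.
type series struct{ ref uint64 }

// seriesPool holds reusable []*series buffers.
var seriesPool = sync.Pool{
	New: func() interface{} { return make([]*series, 0, 16) },
}

// resetForPool nils out every element so the backing array no longer
// references the pointed-to objects, then truncates the slice to length 0.
func resetForPool(buf []*series) []*series {
	for i := range buf {
		buf[i] = nil
	}
	return buf[:0]
}

// putSeries returns a buffer to the pool after clearing its pointers.
func putSeries(buf []*series) {
	seriesPool.Put(resetForPool(buf))
}

func main() {
	buf := seriesPool.Get().([]*series)
	buf = append(buf, &series{ref: 1}, &series{ref: 2})

	backing := buf[:cap(buf)] // view of the full backing array
	putSeries(buf)

	// The first two slots were in use; both are nil after the reset.
	fmt.Println(backing[0] == nil, backing[1] == nil)
}
```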
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
---------
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Dogfood native histograms.
Allow dependent projects to migrate to native histograms.
I took the defaults from client_golang.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Clarify in the first comment that it is `watch()` that waits, and reduce
verbiage.
The second comment was slightly contradictory to the first and otherwise
didn't seem to add much, since `currentSegment` was incremented just a
few lines later.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* fix bug that would cause us to only read from the WAL on the 15s
fallback timer if remote write had fallen behind and was no longer
reading from the WAL segment that is currently being written to
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* remove unintended logging, fix lint, plus allow test to take slightly
longer because cloud CI
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* address review feedback
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix watcher sleeps in test, flu brain is smooth
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* increase timeout, unfortunately cloud CI can require a longer timeout
Signed-off-by: Callum Styan <callumstyan@gmail.com>
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>