* move the wal repair logic in db.Open
This is to allow opening a wal in a read oly mode without triggering a
repair.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
With the next release of client_golang, Summaries will not have
objectives by default.
As it turns out, for prometheus_tsdb_head_gc_duration_seconds and
prometheus_tsdb_wal_truncate_duration_seconds, the objective-less
default makes more sense then the current default.
To make sure we do the right thing before and after the upcoming
release of client_golang, I have set the objectives explicitly
wherever that was not the case so far:
- prometheus_tsdb_head_gc_duration_seconds and
prometheus_tsdb_wal_truncate_duration_seconds now have no objectives
explicitly.
- prometheus_tsdb_wal_fsync_duration_seconds now explicitly uses the
previous default objectives.
Signed-off-by: beorn7 <beorn@grafana.com>
This reduces disk space usage to not be a minimum of 3 128MB files
in small setups. This will possibly also help debug wal data issues,
by making things a bit more deterministic.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Always create a new clean segment when starting the WAL.
* Ensure we flush the last page after repairing and before recreating the
new segment in Repair.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Currently a time series with empty labels is not treated the same
as one with missing labels. Currently this can only come from
ALERTS&ALERT_FOR_STATE so it's unlikely anyone has actually hit it.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
If all the samples are deleted for a series,
we should still keep the series in the WAL as
anything else reading the WAL will still care
about it in order to understand the samples.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
This change also uses the latest staticcheck version which comes with
new verifications, hence some clean up in the code.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
With 1M series:
Before:
BenchmarkHeadPostingForMatchers-8 1 3501996117 ns/op 61311520 B/op 78 allocs/op
After:
BenchmarkHeadPostingForMatchers-8 1 1403072952 ns/op 69261568 B/op 72 allocs/op
This works out as 3X faster, as the above time includes other things.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
closes https://github.com/prometheus/tsdb/issues/471
after implementing the new WAL this metric was missing so adding it again.
Also added it in a test to make sure it works as expected.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
* refactor NewSegmentsRangeReader to take multi WAL ranges
In case of an error when checkpointing the WAL the error doesn't show
the exact WAL index that is corrupter. this is because it uses
MultiReader to read multiply WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range and it reads all of the ranges by iterating each one.
this changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
* repair wal when the record cannot be decoded
Currently repair is run only when the error happens in the reader.
A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
unexported NewMemTombstones as this returns unexported memTombstones
type which will not be shows in godoc.
Added missing comments for exported methods.
Removed unused RecordLogger,RecordReader interfaces.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
Calculating the modulus in each worker was a hotspot,
and meant that you had more work to do the more cores you had.
This cuts CPU usage (on my 8 core, 4 real core machine) by
33%, and walltime by 3%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
This is read far more than it changes.
This cuts ~14% off walltme and ~27% off CPU for WAL reading.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
With the various goroutines running, the locking
in getByID is notable. This cuts cpu usage by ~25%
and walltime by ~20%.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Some benchmarks for HEAD and allocate the correct slice size in LabelValues , we already know what it'll be
This is ~15% time improvement, and ~25% allocation improvement:
```
benchmark old ns/op new ns/op delta
BenchmarkHeadPostingForMatchers-4 74452 63514 -14.69%
benchmark old allocs new allocs delta
BenchmarkHeadPostingForMatchers-4 20 13 -35.00%
benchmark old bytes new bytes delta
BenchmarkHeadPostingForMatchers-4 5425 3137 -42.18%
```
Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
This reverts commit 98fe30438c.
After some discussion, it was concluded that we want the full
`prometheus_tsdb_...` prefix hardcoded in the library.
Signed-off-by: beorn7 <beorn@soundcloud.com>
This fixes various issues when initializing the head time range
under different starting conditions.
Signed-off-by: Fabian Reinartz <freinartz@google.com>