* tsdb: Bug fix for further continued after crash deletions; added more tests.
Additionally: Added log line for block removal.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comment.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
## Changes:
* Rename dir when deleting
* Ignoring blocks with broken meta.json on start (we do that on reload)
* Compactor writes <ulid>.tmp-for-creation blocks instead of just .tmp
* Delete tmp-for-creation and tmp-for-deletion blocks during DB open.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating.
Chained to https://github.com/prometheus/prometheus/pull/7059
* NewMerge(Chunk)Querier now takies multiple primaries allowing tsdb DB code to use it.
* Added single SeriesEntry / ChunkEntry for all series implementations.
* Unified all vertical, and non vertical for compact and querying to single
merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before)
* Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples.
* Refactored endpoint tests and querier tests to include subtests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comments from Brian and Beorn.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed snapshot test and added chunk iterator support for DBReadOnly.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed race when iterating over Ats first.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed tests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed populate block tests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed endpoints test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added test & fixed case of head open chunk.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed DBReadOnly tests and bug producing 1 sample chunks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added cases for partial block overlap for multiple full chunks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added extra tests for chunk meta after compaction.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed small vertical merge bug and added more tests for that.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* tsdb/chunks: Replace sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* tsdb/heaad: Replace sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* vendor: Make go.uber.org/atomic a direct dependency
There is no modifications to go.sum and vendor/ because
it was already vendored.
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* tsdb: Remove comments referring to the sync/atomic alignment bug
Related: https://golang.org/pkg/sync/atomic/#pkg-note-BUG
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* no panic the head memseries has chunks in it
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* fix a panic when querying after a wal corruption.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* review nits
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* Add test for reading the data after a wal corruption.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Update tsdb/db_test.go
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Update tsdb/db_test.go
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
* spellings
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Add errors and Warnings to SeriesSet
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Change Querier interface and refactor accordingly
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Refactor promql/engine to propagate warnings at eval stage
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Address review issues
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Make sure all the series from all Selects are pre-advanced
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Address review issues
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Separate merge series sets
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Clean
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Refactor merge querier failure handling
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Refactored and simplified fanout with improvements from incoming chunk iterator PRs.
* Secondary logic is hidden, instead of weird failed series set logic we had.
* Fanout is well commented
* Fanout closing record all errors
* MergeQuerier improved API (clearer)
* deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false).
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fix formatting
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Fix CI issues
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Added final tests for error handling.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Brian's comments.
* Moved hints in populate to be allocated only when needed.
* Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic.
* Select after first Next is done will panic.
NOTE: in lazySeriesSet in theory we could just panic, I think however we can
totally just return error, it will panic in expand anyway.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Utilize errWithWarnings
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Fix recently introduced expansion issue
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Add tests for secondary querier error handling
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Implement lazy merge
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Add name to test cases
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Reorganize
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Address review comments
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Address review comments
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Remove redundant warnings
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Fix rebase mistake
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory
Prom startup now happens in these stages
- Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.
If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.
[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.
**Prombench results**
_WAL Replay_
1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m
_Memory During WAL Replay_
High Churn:
10-15% less RAM - 32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM - 23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb
Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.
I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.
Also I don't know what TestSelectSorted was testing, but now it's testing sorting.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
This fixes#6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* tsdb: don't allow ingesting empty labelsets
When we ingest an empty labelset in the head, further blocks can not be
compacted, with the error:
```
level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb
msg="compaction failed" err="persist head block: write compaction:
add series: out-of-order series added with label set \"{}\" / prev:
\"{}\""
```
We should therefore reject those invalid empty labelsets upfront.
This can be reproduced with the following:
```
cat << END > prometheus.yml
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 1s
basic_auth:
username: test
password: test
metric_relabel_configs:
- regex: ".*"
action: labeldrop
static_configs:
- targets:
- 127.0.1.1:9090
END
./prometheus --storage.tsdb.min-block-duration=1m
```
And wait a few minutes.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Add back Windows CI, we lost it when tsdb was merged into the prometheus
repo. There's many tests failing outside tsdb, so only test tsdb for
now.
Fixes#6513
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than keeping the offset of each postings list, instead
keep the nth offset of the offset of the posting list. As postings
list offsets have always been sorted, we can then get to the closest
entry before the one we want an iterate forwards.
I haven't done much tuning on the 32 number, it was chosen to try
not to read through more than a 4k page of data.
Switch to a bulk interface for fetching postings. Use it to avoid having
to re-read parts of the posting offset table when querying lots of it.
For a index with what BenchmarkHeadPostingForMatchers uses RAM
for r.postings drops from 3.79MB to 80.19kB or about 48x.
Bytes allocated go down by 30%, and suprisingly CPU usage drops by
4-6% for typical queries too.
benchmark old ns/op new ns/op delta
BenchmarkPostingsForMatchers/Block/n="1"-4 35231 36673 +4.09%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 563380 540627 -4.04%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 536782 534186 -0.48%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 533990 541550 +1.42%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 113374598 117969608 +4.05%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 146329884 139651442 -4.56%
BenchmarkPostingsForMatchers/Block/i=~""-4 50346510 44961127 -10.70%
BenchmarkPostingsForMatchers/Block/i!=""-4 41261550 35356165 -14.31%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 112544418 116904010 +3.87%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 112487086 116864918 +3.89%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 41094758 35457904 -13.72%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 41906372 36151473 -13.73%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 147262414 140424800 -4.64%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 28615629 27872072 -2.60%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 147117177 140462403 -4.52%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 175096826 167902298 -4.11%
benchmark old allocs new allocs delta
BenchmarkPostingsForMatchers/Block/n="1"-4 4 6 +50.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 7 11 +57.14%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 7 11 +57.14%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 15 17 +13.33%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 100010 100012 +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 200069 200040 -0.01%
BenchmarkPostingsForMatchers/Block/i=~""-4 200072 200045 -0.01%
BenchmarkPostingsForMatchers/Block/i!=""-4 200070 200041 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 100013 100017 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 100017 100023 +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 200073 200046 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 200075 200050 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 200074 200049 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 111165 111150 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 200078 200055 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 311282 311238 -0.01%
benchmark old bytes new bytes delta
BenchmarkPostingsForMatchers/Block/n="1"-4 264 296 +12.12%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 360 424 +17.78%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 360 424 +17.78%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 520 552 +6.15%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 1600461 1600482 +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 24900801 17259077 -30.69%
BenchmarkPostingsForMatchers/Block/i=~""-4 24900836 17259151 -30.69%
BenchmarkPostingsForMatchers/Block/i!=""-4 24900760 17259048 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 1600557 1600621 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 1600717 1600813 +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 24900856 17259176 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 24900952 17259304 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 24900993 17259333 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 3788311 3142630 -17.04%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 24901137 17259509 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 28693086 20405680 -28.88%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Fix tsdb panic when querying corrupted chunks.
check that the chunk segment has enough data to read all chunk pieces.
* refactor, simplify and add tests.
* simpfiy WriteChunks implementation
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
This PR gives the readonly DB the ability to create blocks from the WAL.
In order to implement this, we modify DBReadOnly.Blocks() to return an
empty slice and no error if no blocks are found.
xref: https://github.com/prometheus/tsdb/issues/346#issuecomment-520786524
Signed-off-by: Lucas Servén Marín <lserven@gmail.com>