* Moves the 64bit atomically accessed field to the top of the struct.
Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>
* Moves the 64bit atomically accessed field to the top of the struct.
Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>
* Fixing up go fmt formatting issues.
Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>
Co-authored-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory
Prom startup now happens in these stages
- Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.
If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.
[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.
**Prombench results**
_WAL Replay_
1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m
_Memory During WAL Replay_
High Churn:
10-15% less RAM - 32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM - 23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb
Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation.
## Rationales:
In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples.
This means that we unnecessary decode/encode, wasting CPU, time and memory.
This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets
I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers.
All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos).
## Changes
* Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that
Storage will implement both chunked and samples.
* Added Seek to chunks.Iterator interface for iterating over chunks.
* NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added.
* Improved tests.
* Added some TODO for further simplifications in next PRs.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Brian's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Moved s/Labeled/SeriesLabels as per Krasi suggestion.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Krasi's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Second iteration of Krasi comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Another round of comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fix tsdb panic when querying corrupted chunks.
check that the chunk segment has enough data to read all chunk pieces.
* refactor, simplify and add tests.
* simpfiy WriteChunks implementation
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>