Commit graph

22 commits

Author SHA1 Message Date
Marco Pracucci 4b49ffbad5
Stop the bleed on chunk mapper panic (#8723)
* Added test to reproduce panic on TSDB head chunks truncated while querying

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added test for Querier too

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Stop the bleed on mmap-ed head chunks panic

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Lower memory pressure in tests to ensure it doesn't OOM

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Experiment to not trigger runtime.GC() continuously

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Try to fix test in CI

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Do not call runtime.GC() at all

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* I have no idea why it's failing in CI, skipping tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-06 14:18:59 -06:00
Bartlomiej Plotka 8bf7bc68f1
Fixed TestChunkDiskMapper_WriteChunk_Chunk_IterateChunks for go1.16 (#8538)
Fixes https://github.com/prometheus/prometheus/issues/8403

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-02-25 14:38:12 +05:30
Guangwen Feng e2cd6c5f57 Fix golint issue caused by typo
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2021-01-06 15:54:35 +08:00
Marco Pracucci db19e05d93
Add option to customise head chunks write buffer size (#8201)
* Add option to customise head chunks write buffer size

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-11-19 18:30:47 +05:30
Julien Pivotto 8bc369bf9b
Calculate head chunk size based on actual disk usage (#8139)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-11-03 15:34:59 +05:30
Bartlomiej Plotka 3d8826a3d4
MultiError: Refactored MultiError for more concise and safe usage. (#8066)
* MultiError: Refactored MultiError for more concise and safe usage.

* Less lines
* Goland IDE was marking every usage of old MultiError "potential nil" error
* It was easy to forgot using Err() when error was returned, now it's safely assured on compile time.

NOTE: Potentially I would rename package to merrors. (: In different PR.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed review comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix after rebase.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-10-28 15:24:58 +00:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Ganesh Vernekar 2624d827fa
Read repair empty last file in chunks_head (#8061)
* Read repair empty file in chunks_head

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Refactor and introduce repairLastChunkFile

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Attempt windows test fix

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-10-21 18:27:13 +05:30
Ganesh Vernekar c806262206
Fix 'chunks.HeadReadWriter: maxt of the files are not set' error (#7856)
* Fix chunks.HeadReadWriter: maxt of the files are not set

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-08-26 19:59:18 +02:00
Yukun Sun cfd4e05c9e
fix: return a corruption error when iterator function find a chunk that is out of sequence (#7855)
Signed-off-by: sunyukun <sunyukun@didiglobal.com>

Co-authored-by: sunyukun <sunyukun@didiglobal.com>
2020-08-26 20:36:27 +05:30
Max Neverov bb5c6b38e2
Fix Possible Race Condition in TSDB (#7815)
* Replace tsdb chunk mapper size with atomic; protect mmappedChunkFiles with read path mutex on DeleteCorrupted

Signed-off-by: Max Neverov <neverov.max@gmail.com>

* PR fixes

Signed-off-by: Max Neverov <neverov.max@gmail.com>
2020-08-26 14:22:48 +05:30
Javier Palomo Almena 348ff4285f
tsdb: Replace sync/atomic with uber-go/atomic in tsdb (#7659)
* tsdb/chunks: Replace sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb/heaad: Replace sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* vendor: Make go.uber.org/atomic a direct dependency

There is no modifications to go.sum and vendor/ because
it was already vendored.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb: Remove comments referring to the sync/atomic alignment bug

Related: https://golang.org/pkg/sync/atomic/#pkg-note-BUG

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
2020-07-28 10:12:42 +05:30
johncming 9801f52b0a
tsdb/chunks: fix bug of data race(#7643). (#7646)
Signed-off-by: johncming <johncming@yahoo.com>
2020-07-23 18:05:19 +05:30
Ganesh Vernekar b7c46a8c79
Merge remote-tracking branch 'upstream/master' into merge-release-2.19
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-06-19 12:40:29 +05:30
Ganesh Vernekar 48fae12b89
Fix unsequential m-map files (#7414)
* Fix unsequential m-map files

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-06-18 19:24:58 +05:30
Simon Pasquier 2f12049371
tsdb: improve logs when encountering corruption (#7308)
* tsdb: improve logs when encountering corruption

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Wrap corrupted block errors

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Add file path to head chunks

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-06-17 16:40:00 +02:00
Ganesh Vernekar 1627d234da
Moves the atomically accessed member to the top of the struct (#7365)
* Moves the 64bit atomically accessed field to the top of the struct.

Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>

* Moves the 64bit atomically accessed field to the top of the struct.

Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>

* Fixing up go fmt formatting issues.

Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>

Co-authored-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>
2020-06-09 10:55:43 +05:30
Ganesh Vernekar a1355eb7c7
Remove time based m-map file creation (#7314)
* Remove time based m-map file creation

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-29 20:08:41 +05:30
Ganesh Vernekar 83619aa9ac
Preallocate m-map file only for Windows (#7306)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-28 20:24:19 +05:30
Krasimir Georgiev 09df8d94e0
More explicit chunks and head error handling. (#7277) 2020-05-22 12:03:23 +03:00
Ganesh Vernekar d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory

Prom startup now happens in these stages
 - Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.

If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.

[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md)  - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.

**Prombench results**

_WAL Replay_

1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ganesh Vernekar e50fdbc70c
Live m-mapping of chunks on disk (#6830)
* Live m-mapping of chunks on disk

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments Part 2

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments Part 3

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments Part 4

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Attempt to fix windows bug

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-03-19 22:03:44 +05:30