We were still fsyncing while holding the write lock when we cut a new
segment. Since we can do nothing but log the error if an fsync fails, we
might just as well complete segments asynchronously.
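A minimal sketch of the idea, with illustrative types and names rather than the actual tsdb API: swap in the new segment under the lock, then sync and close the finished one in a goroutine.

```go
package wal

import (
	"log"
	"os"
	"sync"
)

// WAL is a stand-in for the real write-ahead log type.
type WAL struct {
	mtx sync.Mutex
	cur *os.File // active segment
}

// cut installs a freshly created segment and completes the old one
// asynchronously: a failed fsync can only be logged anyway, so there
// is no reason to hold the write lock for it.
func (w *WAL) cut(next *os.File) {
	w.mtx.Lock()
	old := w.cur
	w.cur = next
	w.mtx.Unlock()

	go func() {
		if err := old.Sync(); err != nil {
			log.Printf("sync completed segment: %v", err)
		}
		if err := old.Close(); err != nil {
			log.Printf("close completed segment: %v", err)
		}
	}()
}
```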
There's no realistic use case where one would fsync after every WAL
entry. Thus, make the default a flush interval of 0, meaning we never
fsync, which is the much more likely use case.
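Sketched behavior of the new semantics; the function name and signature here are assumptions:

```go
package wal

import (
	"log"
	"os"
	"time"
)

// flushLoop periodically fsyncs the active segment file. With the new
// default interval of 0 it returns immediately: we never fsync on a
// timer and leave flushing to the OS.
func flushLoop(f *os.File, interval time.Duration, stop <-chan struct{}) {
	if interval <= 0 {
		return // interval 0: never fsync
	}
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case <-tick.C:
			if err := f.Sync(); err != nil {
				log.Printf("sync segment: %v", err)
			}
		case <-stop:
			return
		}
	}
}
```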
This adds various new locks to replace the single big lock on the head.
All parts now must be copy-on-write (COW), as clients may still hold
them after initial retrieval.
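The copy-on-write discipline, roughly, as a generic sketch rather than the actual head structures: readers may keep a retrieved slice indefinitely because writers replace it instead of mutating it in place.

```go
package head

import "sync"

// cowList sketches the COW rule the head's parts now follow: readers
// grab the current slice under a short lock and may keep using it;
// writers copy, modify the copy, and swap it in.
type cowList struct {
	mtx   sync.RWMutex
	items []string
}

func (l *cowList) get() []string {
	l.mtx.RLock()
	defer l.mtx.RUnlock()
	return l.items // safe to hold after return: never mutated in place
}

func (l *cowList) add(s string) {
	l.mtx.Lock()
	defer l.mtx.Unlock()
	next := make([]string, len(l.items), len(l.items)+1)
	copy(next, l.items)
	l.items = append(next, s)
}
```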
Series, keyed by ID and by hash, are now held in a striped lock to
reduce contention and the total holding time during GC. This should
reduce starvation of readers.
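A sketch of the striping; the shard count and sharding-by-ID are assumptions: GC can lock and sweep one stripe at a time instead of stalling every reader at once.

```go
package head

import "sync"

const stripes = 16 // assumed shard count

type memSeries struct{ ref uint64 }

// stripeSeries shards the ID-to-series map so that GC holds only one
// stripe's lock at a time, shortening reader stalls.
type stripeSeries struct {
	locks  [stripes]sync.RWMutex
	series [stripes]map[uint64]*memSeries
}

func newStripeSeries() *stripeSeries {
	s := &stripeSeries{}
	for i := range s.series {
		s.series[i] = map[uint64]*memSeries{}
	}
	return s
}

func (s *stripeSeries) get(ref uint64) *memSeries {
	i := ref % stripes
	s.locks[i].RLock()
	defer s.locks[i].RUnlock()
	return s.series[i][ref]
}
```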
This changes the structure to a single WAL backed by a single head
block.
Parts of the head block can be compacted. This relieves us of any head
management and greatly simplifies consistency and isolation concerns by
just having a single head.
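A rough sketch of what the single-head flow enables; the type and method names are assumed: once the head spans a full block range, the completed prefix is compacted out and the head truncated, with no second or third head block to juggle.

```go
package head

// Head is a stand-in for the single in-memory head block.
type Head struct {
	minTime, maxTime int64 // bounds of data currently in the head
}

// compactableRange reports the completed prefix [mint, maxt) that can
// be written out as a persisted block before truncating the head.
func (h *Head) compactableRange(blockRange int64) (mint, maxt int64, ok bool) {
	if h.maxTime-h.minTime < blockRange {
		return 0, 0, false
	}
	// Align the cut to a block range boundary.
	maxt = h.minTime + (h.maxTime-h.minTime)/blockRange*blockRange
	return h.minTime, maxt, true
}
```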
Change index persistence for series so they are no longer accumulated
in memory and written as one large batch. `Labels` and `ChunkMeta`
objects are reused.
This cuts down memory spikes during compaction of multiple blocks
significantly.
As part of this, the Index{Reader,Writer} now have an explicit notion of
symbols, and series must be inserted in order.
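Roughly what the writer contract looks like now; the method shapes are assumptions, and the real insertion order is by label set, sketched below as an increasing ref:

```go
package index

import "errors"

// IndexWriter sketches the streaming contract: symbols are provided
// once up front, and series must arrive in order so each entry can be
// written straight to disk instead of accumulated in memory.
type IndexWriter interface {
	AddSymbols(sym map[string]struct{}) error
	AddSeries(ref uint64, labels map[string]string) error
}

type writer struct {
	lastRef uint64
}

func (w *writer) AddSymbols(sym map[string]struct{}) error {
	// The real writer would encode the symbol table here.
	return nil
}

func (w *writer) AddSeries(ref uint64, labels map[string]string) error {
	if w.lastRef != 0 && ref <= w.lastRef {
		return errors.New("out-of-order series insertion")
	}
	w.lastRef = ref
	// Encode and write this one entry immediately; buffers are reused,
	// so nothing accumulates between calls.
	return nil
}
```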
This is because, when we restore from backups, we cut a new block
starting where the snapshotted block ends, and highTimestamp is where
we should start from.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Let older head blocks be compacted once the newest one has samples at
50% of its total range. This allows the memory of the compacted blocks
to be released and garbage collected before a new head block gets
created. Thereby the number of head blocks is 1 or 2 instead of 2 or 3,
and memory spikes are reduced.
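A sketch of the cutoff; the field names are assumed:

```go
package head

// headBlock is a stand-in for one in-memory head block.
type headBlock struct {
	maxTime              int64 // highest sample timestamp seen
	rangeStart, rangeEnd int64 // the block's nominal time range
}

// compactable sketches the new rule: all but the newest head block may
// be compacted once the newest one has samples at 50% of its range.
func compactable(heads []*headBlock) []*headBlock {
	if len(heads) == 0 {
		return nil
	}
	newest := heads[len(heads)-1]
	half := newest.rangeStart + (newest.rangeEnd-newest.rangeStart)/2
	if newest.maxTime < half {
		return nil
	}
	return heads[:len(heads)-1]
}
```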
Race:
Suppose we have 100 existing series inside a HeadBlock.
Now we open two appenders in two goroutines, A1 and A2, and append 30
and 60 new series respectively, with some series in common.
Both try to commit at the same time and the following happens in this
order:

1. A2 executes createSeries().
2. A1 executes createSeries() (its common series reference the IDs
   assigned by A2).
3. A1 persists its new labels and samples.
4. A2 persists its new labels and samples.

Now when reading the WAL back, we hit A1's samples first; they reference
IDs whose series records from A2 appear only later in the log, and
replay fails.
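One way to rule out that interleaving, sketched here with assumed names rather than the PR's actual code: assign IDs and append to the WAL under a single mutex, so series records always reach the log in the order their IDs were created.

```go
package head

import "sync"

type record struct {
	id     uint64
	labels string
}

// Head sketches the fix: ID assignment and the WAL append happen in
// one critical section, so a commit can never log samples referencing
// series IDs that another commit created but has not yet persisted.
type Head struct {
	mtx    sync.Mutex
	nextID uint64
	wal    []record // stand-in for the on-disk WAL
}

func (h *Head) commitNewSeries(labels []string) []uint64 {
	h.mtx.Lock()
	defer h.mtx.Unlock()

	ids := make([]uint64, 0, len(labels))
	for _, l := range labels {
		h.nextID++
		h.wal = append(h.wal, record{id: h.nextID, labels: l})
		ids = append(ids, h.nextID)
	}
	return ids
}
```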
Ref: prometheus/prometheus#2795
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
* Expose Stone as it is used in an exported method.
* Move from tombstoneReader to []Stone for the same reason as above.
* Make WAL reading a little cleaner.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
* Make sure no reads happen on the block while a delete is in progress
  (see the sketch after this list).
* Fix bugs in compaction.
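Conceptually, for the first point, something like this sketch (names assumed): queries take the read side of an RWMutex on the block and Delete takes the write side, so a delete cannot overlap any read.

```go
package tsdb

import "sync"

// Block sketches the guard: readers and deletes share an RWMutex so a
// delete in progress excludes all reads on the block.
type Block struct {
	mtx sync.RWMutex
	// ... chunks, index, tombstones
}

// Querier acquires read access; the returned func releases it when the
// query is done.
func (b *Block) Querier() (done func()) {
	b.mtx.RLock()
	return b.mtx.RUnlock
}

// Delete waits for in-flight readers and excludes new ones while
// tombstones are written.
func (b *Block) Delete(mint, maxt int64) error {
	b.mtx.Lock()
	defer b.mtx.Unlock()
	// ... write tombstones for series intersecting [mint, maxt]
	return nil
}
```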
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Previously we recalculated the sorted ref list on every Tombstones()
call. This avoids that.
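A sketch of the change, with assumed type and field names: keep the ref list sorted incrementally as stones are added, so Tombstones() just returns it.

```go
package tsdb

import "sort"

// memTombstones sketches the change: the ref list is maintained in
// sorted order on insert, so Tombstones() no longer re-sorts anything.
type memTombstones struct {
	stones map[uint64][]int64 // ref -> interval bounds (simplified)
	refs   []uint64           // kept sorted incrementally
}

func newMemTombstones() *memTombstones {
	return &memTombstones{stones: map[uint64][]int64{}}
}

func (t *memTombstones) add(ref uint64, ivs ...int64) {
	if _, ok := t.stones[ref]; !ok {
		// Insert ref at its sorted position once, instead of
		// re-sorting the whole list on every read.
		i := sort.Search(len(t.refs), func(i int) bool { return t.refs[i] >= ref })
		t.refs = append(t.refs, 0)
		copy(t.refs[i+1:], t.refs[i:])
		t.refs[i] = ref
	}
	t.stones[ref] = append(t.stones[ref], ivs...)
}

// Tombstones hands back the already-sorted refs.
func (t *memTombstones) Tombstones() []uint64 { return t.refs }
```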
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>