This ensures we close all previously opened queriers if on of the block
querier fails to open.
Also swap in new blocks before closing old ones to avoid the situation
in general. Make read locking of blocks more conservative to avoid
unnecessary retries by clients, e.g. when blocks are getting closed
before we can successfully instantiate querier against them.
This commit introduces error returns in various places and is explicit
about closing persisted blocks.
{Index,Chunk,Tombstone}Readers are more consistent about their Close()
method. Whenever a reader is retrieved, the corresponding close method
must eventually be called. We use this to track pending readers against
persisted blocks.
Querier's against the DB no longer hold a read lock for their entire
lifecycle. This avoids long running queriers to starve new ones when we
have to acquire a write lock when reloading blocks.
This adds various new locks to replace the single big lock on
the head. All parts now must be COW as they may be held by clients
after initial retrieval.
Series by ID and hashes are now held in a stripe lock to reduce
contention and total holding time during GC. This should reduce
starvation of readers.
This changes the structure to a single WAL backed by a single head
block.
Parts of the head block can be compacted. This relieves us from any head
amangement and greatly simplifies any consistency and isolation concerns
by just having a single head.
This fixes the case where between block creations no compaction
plans are ran. We were not compacting anything in these
cases since the on creation the most recent head block always had
a high timestamp of 0.
This is because we cut a new block from where the snapshotted block ends
if we restore from backups and highTimestamp would be where we should be
starting from.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Let older head blocks be compacted once the newest once has samples at
50% of its total range. This allows the memory of the compacted blocks
to be released and garbage collected before a new head block gets
created. Thereby the number of head blocks is 1 or 2 instead of 2 or 3
and memory spikes are reduced.
* Make sure no reads happen on the block when delete is in progress.
* Fix bugs in compaction.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
This parallelizes commits to prevent deadlocks across inconsistently
locked heads. As commits are currently not fully atomic across
heads, this does decrease our guarantees.
This adds a workaround to avoid deadlocks for inconsistent write lock
order across headBlocks.
Things keep working if transactions only append data for the same
timestamp, which is generally the case for Prometheus.
Full behavior should be restored in a subsequent change.
GC is triggered rarely, which may cause unnecessarily high memory
spikes when running several compaction cycles in a row. Explicitly run
GC so we don't have idle bytes marked as used from the previous cycle.
This fixes different race condition encoutnered when running Prometheus.
It reduces the overall performance in the synthetic benchmark a fair bit
but has no indiciations of impacting a real-world setup notably.
This adds the Queryable interface to the Block interface. Head and
persisted blocks now implement their own Querier() method and thus
isolate customization (e.g. remapPostings) more cleanly.
This adds more lower-leve interfaces which are used to compose
to different Block interfaces.
The DB only uses interfaces instead of explicit persistedBlock and
headBlock. The headBlock generation property is dropped as the use-case
can be implemented using block sequence numbers.
With hundreds of concurrent appenders the locking order between the
headBlocks on instantiating appenders and write locking the headmtx
is hard to impossible to get consistent.
Just never instantiate appenders while holding the headmtx lock in any
way.
This fixes a bug where the last WAL file was closed after consuming it
instead of being left open for further writes.
Reloading of blocks on startup considers loading head blocks now.
Introduce a seperate mutex for the head blocks to avoid a race where
a post-compaction reload may run between switching the DB's base mutex
to create a new head block in an appender.
This addresses an issue where the compaction triggered on cutting
a new block doesn't find anything as the writers are still active on the
block that should be ready for compaction.