We were determining a chunk's end time once it was one quarter full to
compute it so all chunks have uniform number of samples.
This accidentally skipped the case where series started near the end of
a chunk range/block and never reached that threshold. As a result they
got persisted but were continued across the range.
This resulted in corrupted persisted data.
This ensures we close all previously opened queriers if on of the block
querier fails to open.
Also swap in new blocks before closing old ones to avoid the situation
in general. Make read locking of blocks more conservative to avoid
unnecessary retries by clients, e.g. when blocks are getting closed
before we can successfully instantiate querier against them.
The postings list index may point to series that no longer
exist during garbage collection. This clarifies that this is valid
behavior.
It would be possible, though more complex, to always keep them in sync.
However, series existance means nothing in itself as the queried time
range defines whether there's actual data. Thus our definition is sane
overall as long as drift is kept small.
This commit introduces error returns in various places and is explicit
about closing persisted blocks.
{Index,Chunk,Tombstone}Readers are more consistent about their Close()
method. Whenever a reader is retrieved, the corresponding close method
must eventually be called. We use this to track pending readers against
persisted blocks.
Querier's against the DB no longer hold a read lock for their entire
lifecycle. This avoids long running queriers to starve new ones when we
have to acquire a write lock when reloading blocks.
This changes the structure to a single WAL backed by a single head
block.
Parts of the head block can be compacted. This relieves us from any head
amangement and greatly simplifies any consistency and isolation concerns
by just having a single head.
Change index persistence for series to not be accumulated in memory
before being written as one large batch. `Labels` and `ChunkMeta`
objects are reused.
This cuts down memory spikes during compaction of multiple blocks
significantly.
As part of the the Index{Reader,Writer} now have an explicit notion of
symbols and series must be inserted in order.
Implement labels.PrefixMatcher and use interface conversion in querier
to optimize label tuples search.
[unit-tests]: Fix bug and populate label index for mock index.
Signed-off-by: Dmitry Ilyevsky <ilyevsky@gmail.com>
Let older head blocks be compacted once the newest once has samples at
50% of its total range. This allows the memory of the compacted blocks
to be released and garbage collected before a new head block gets
created. Thereby the number of head blocks is 1 or 2 instead of 2 or 3
and memory spikes are reduced.
* Expose Stone as it is used in an exported method.
* Move from tombstoneReader to []Stone for the same reason as above.
* Make WAL reading a little cleaner.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
When we have only one chunk that is out of range, then we are returning
it unpopulated (w/o calling `Chunk(ref)`). This would cause a panic
downstream.
Fixes: prometheus/prometheus#2629
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Mutating the chunks can change their length. Hence referencing using
previous indices can cause panics.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Fixes#43
Added mint, maxt to chunkSeriesIterator. Adding a field there is
inevitable as something similar is required for ignoring deleted
time-ranges.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
This parallelizes commits to prevent deadlocks across inconsistently
locked heads. As commits are currently not fully atomic across
heads, this does decrease our guarantees.
This adds the Queryable interface to the Block interface. Head and
persisted blocks now implement their own Querier() method and thus
isolate customization (e.g. remapPostings) more cleanly.
This has been a common source of hard to debug issues. Its a premature
and unbenchmarked optimization and semantically, we want ChunkMetas to
be references in all changed cases.
This simplifies some of the iterators by loading chunks from the
ChunkReader earlier, filtering of chunks vs filtering or series is
split into separate iterators for easier testing
Introduce a seperate mutex for the head blocks to avoid a race where
a post-compaction reload may run between switching the DB's base mutex
to create a new head block in an appender.
This adds write path support for segmented chunk data files.
Files of 512MB are pre-allocated and written to. If the file size
is exceeded, the next file is started. On completion, files
are truncated to their final size.