This fixes different race condition encoutnered when running Prometheus.
It reduces the overall performance in the synthetic benchmark a fair bit
but has no indiciations of impacting a real-world setup notably.
This adds the Queryable interface to the Block interface. Head and
persisted blocks now implement their own Querier() method and thus
isolate customization (e.g. remapPostings) more cleanly.
This adds more lower-leve interfaces which are used to compose
to different Block interfaces.
The DB only uses interfaces instead of explicit persistedBlock and
headBlock. The headBlock generation property is dropped as the use-case
can be implemented using block sequence numbers.
With hundreds of concurrent appenders the locking order between the
headBlocks on instantiating appenders and write locking the headmtx
is hard to impossible to get consistent.
Just never instantiate appenders while holding the headmtx lock in any
way.
This fixes a bug where the last WAL file was closed after consuming it
instead of being left open for further writes.
Reloading of blocks on startup considers loading head blocks now.
Introduce a seperate mutex for the head blocks to avoid a race where
a post-compaction reload may run between switching the DB's base mutex
to create a new head block in an appender.
This addresses an issue where the compaction triggered on cutting
a new block doesn't find anything as the writers are still active on the
block that should be ready for compaction.
This adds write path support for segmented chunk data files.
Files of 512MB are pre-allocated and written to. If the file size
is exceeded, the next file is started. On completion, files
are truncated to their final size.
File locks have a multitude of problems that make them hard to use
correctly. As they are just advisory, they are only meaningful to
prevent accidents like running the same process twice.
A simple PID file lock works reliably in those cases and is simpler.
This is an initial (and hacky) first pass on allowing
appending to multiple blocks simultaniously to avoid
dropping samples right after cutting a new head block.
It's also required for cases like the PGW, where a scrape may
contain varying timestamps.