This PR is a reference implementation of the proposal described in #10420.
In addition to what described in #10420, in this PR I've introduced labels.StableHash(). The idea is to offer an hashing function which doesn't change over time, and that's used by query sharding in order to get a stable behaviour over time. The implementation of labels.StableHash() is the hashing function used by Prometheus before stringlabels, and what's used by Grafana Mimir for query sharding (because built before stringlabels was a thing).
Follow up work
As mentioned in #10420, if this PR is accepted I'm also open to upload another foundamental piece used by Grafana Mimir query sharding to accelerate the query execution: an optional, configurable and fast in-memory cache for the series hashes.
Signed-off-by: Marco Pracucci <marco@pracucci.com>
The ChunkReader interface's Chunk() has been changed to ChunkOrIterable().
This is a precursor to OOO native histogram support - with OOO native histograms, the chunks.Meta passed to Chunk() can result in multiple chunks being returned rather than just a single chunk (e.g. if oooMergedChunk has a counter reset in the middle).
To support this, ChunkOrIterable() requires either a single chunk or an iterable to be returned. If an iterable is returned, the caller has the responsibility of converting the samples from the iterable into possibly multiple chunks. The OOOHeadChunkReader now returns an iterable rather than a chunk to prepare for the native histograms case. Also as a beneficial side effect, oooMergedChunk and boundedChunk has been simplified as they only need to implement the Iterable interface now, not the full Chunk interface.
---------
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
In the head and in v1 postings on disk, it makes no difference whether
postings are sorted. Only for v2 does the code step through in order.
So, move the sorting to where it is required, and thus skip it entirely
in the head.
Label values in on-disk blocks are already sorted, but `slices.Sort` is
very fast on already-sorted data so we don't bother checking.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of passing in a `ScratchBuilder` and `Labels`, just pass the
builder and the caller can extract labels from it. In many cases the
caller didn't use the Labels value anyway.
Now in `Labels.ScratchBuilder` we need a slightly different API: one
to assign what will be the result, instead of overwriting some other
`Labels`. This is safer and easier to reason about.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This necessitates a change to the `tsdb.IndexReader` interface:
`index.Reader` is used from multiple goroutines concurrently, so we
can't have state in it.
We do retain a `ScratchBuilder` in `blockBaseSeriesSet` which is
iterator-like.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This call was added by PR #11075 merged before #11318 which changed all
similar calls to `sort.Sort` into a faster one.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use new experimental package `golang.org/x/exp/slices`.
slices.Sort works on values that are directly comparable, like ints,
so avoids the overhad of an interface call to `.Less()`.
Left tests unchanged, because they don't need the speed and it may be
a cross-check that slices.Sort gives the same answer.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Introduce out-of-order TSDB support
This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing
This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.
Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886
Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* gofumpt files
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Add license header to missing files
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix OOO tests due to existing chunk disk mapper implementation
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix truncate int overflow
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Add Sync method to the WAL and update tests
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* remove useless sync
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Update minOOOTime after truncating Head
* Update minOOOTime after truncating Head
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix lint
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Add a unit test
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Load OutOfOrderTimeWindow only once per appender
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix OOO Head LabelValues and PostingsForMatchers
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix replay of OOO mmap chunks
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Remove unnecessary err check
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Prevent panic with ApplyConfig
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Run OOO compaction after restart if there is OOO data from WBL
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Apply Bartek's suggestions
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Refactor OOO compaction
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Address comments and TODOs
- Added a comment explaining why we need the allow overlapping
compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Run go format
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix remaining review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix tests
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix TestWBLAndMmapReplay test failure on windows
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Address most of the feedback
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Refactor the block meta for out of order
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix windows error
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir
Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modetimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
* TSDB: demistify seriesRefs and ChunkRefs
The TSDB package contains many types of series and chunk references,
all shrouded in uint types. Often the same uint value may
actually mean one of different types, in non-obvious ways.
This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.
Concretely:
* Use appropriately named types and document their semantics and
relations.
* Make multiplexing and demuxing of types explicit
(on the boundaries between concrete implementations and generic
interfaces).
* Casting between different types should be free. None of the changes
should have any impact on how the code runs.
TODO: Implement BlockSeriesRef where appropriate (for a future PR)
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* agent: demistify seriesRefs and ChunkRefs
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Push the matchers for LabelNames all the way into the index.
NB This doesn't actually implement it in the index, just plumbs it through for now...
Signed-off-by: Tom Wilkie <tom@grafana.com>
* Hack it up. Does not work.
Signed-off-by: Tom Wilkie <tom@grafana.com>
* Revert changes I don't understand
Can't see why do we need to hold a mutex on symbols, and the purpose of
the LabelNamesFor method.
Maybe I'll need to re-add this later.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Implement LabelNamesFor
This method provides the label names that appear in the postings
provided. We do that deeper than the label values because we know
beforehand that most of the label names we'll be the same across
different postings, and we don't want to go down an up looking up the
same symbols for all different series.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Mutex on symbols should be unlocked
However, I still don't understand why do we need a mutex here.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix head.LabelNamesFor
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Implement mockIndex LabelNames with matchers
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Nitpick on slice initialisation
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Add tests for LabelNamesWithMatchers
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Fix the mutex mess on head.LabelValues/LabelNames
I still don't see why we need to grab that unrelated mutex, but at least
now we're grabbing it consistently
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Check error after iterating postings
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use the error from posting when there was en error in postings
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Update storage/interface.go comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go wrapped error msg
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go wrapped error msg
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Update tsdb/index/index.go warpped error msg
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* Remove unneeded comment
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Add testcases for LabelNames w/matchers in api.go
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use t.Cleanup() instead of defer in tests
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Tom Wilkie <tom@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* CleanupTombstones refactored, now reloading blocks after every compaction.
The goal is to remove deletable blocks after every compaction and, thus, decrease disk space used when cleaning tombstones.
Signed-off-by: arthursens <arthursens2005@gmail.com>
* Protect DB against parallel reloads
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
* Fix typos
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* MultiError: Refactored MultiError for more concise and safe usage.
* Less lines
* Goland IDE was marking every usage of old MultiError "potential nil" error
* It was easy to forgot using Err() when error was returned, now it's safely assured on compile time.
NOTE: Potentially I would rename package to merrors. (: In different PR.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed review comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fix after rebase.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating.
Chained to https://github.com/prometheus/prometheus/pull/7059
* NewMerge(Chunk)Querier now takies multiple primaries allowing tsdb DB code to use it.
* Added single SeriesEntry / ChunkEntry for all series implementations.
* Unified all vertical, and non vertical for compact and querying to single
merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before)
* Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples.
* Refactored endpoint tests and querier tests to include subtests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed comments from Brian and Beorn.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed snapshot test and added chunk iterator support for DBReadOnly.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed race when iterating over Ats first.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed tests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed populate block tests.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed endpoints test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed test.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added test & fixed case of head open chunk.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed DBReadOnly tests and bug producing 1 sample chunks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added cases for partial block overlap for multiple full chunks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added extra tests for chunk meta after compaction.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fixed small vertical merge bug and added more tests for that.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
This fixes#6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Series() will fetch all the metadata for a series,
even if it's going to be filtered later due to time ranges.
For 1M series we save ~1.1s if you only needed some of the data, but take an
extra ~.2s if you did want everything.
benchmark old ns/op new ns/op delta
BenchmarkHeadSeries/1of1000000-4 1443715987 131553480 -90.89%
BenchmarkHeadSeries/10of1000000-4 1433394040 130730596 -90.88%
BenchmarkHeadSeries/100of1000000-4 1437444672 131360813 -90.86%
BenchmarkHeadSeries/1000of1000000-4 1438958659 132573137 -90.79%
BenchmarkHeadSeries/10000of1000000-4 1438061766 145742377 -89.87%
BenchmarkHeadSeries/100000of1000000-4 1455060948 281659416 -80.64%
BenchmarkHeadSeries/1000000of1000000-4 1633524504 1803550153 +10.41%
benchmark old allocs new allocs delta
BenchmarkHeadSeries/1of1000000-4 4000055 28 -100.00%
BenchmarkHeadSeries/10of1000000-4 4000073 87 -100.00%
BenchmarkHeadSeries/100of1000000-4 4000253 630 -99.98%
BenchmarkHeadSeries/1000of1000000-4 4002053 6036 -99.85%
BenchmarkHeadSeries/10000of1000000-4 4020053 60054 -98.51%
BenchmarkHeadSeries/100000of1000000-4 4200053 600074 -85.71%
BenchmarkHeadSeries/1000000of1000000-4 6000053 6000094 +0.00%
benchmark old bytes new bytes delta
BenchmarkHeadSeries/1of1000000-4 229192184 2488 -100.00%
BenchmarkHeadSeries/10of1000000-4 229193336 5568 -100.00%
BenchmarkHeadSeries/100of1000000-4 229204856 35536 -99.98%
BenchmarkHeadSeries/1000of1000000-4 229320056 345104 -99.85%
BenchmarkHeadSeries/10000of1000000-4 230472056 3894673 -98.31%
BenchmarkHeadSeries/100000of1000000-4 241992056 40511632 -83.26%
BenchmarkHeadSeries/1000000of1000000-4 357192056 402380440 +12.65%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Rather than buffer up symbols in RAM, do it one by one
during compaction. Then use the reader's symbol handling
for symbol lookups during the rest of the index write.
There is some slowdown in compaction, due to having to look through a file
rather than a hash lookup. This is noise to the overall cost of compacting
series with thousands of samples though.
benchmark old ns/op new ns/op delta
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 539917175 675341565 +25.08%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 2441815993 2477453524 +1.46%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 3978543559 3922909687 -1.40%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 8430219716 8586610007 +1.86%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 1786424591 1909552782 +6.89%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 5328998202 6020839950 +12.98%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 10085059958 11085278690 +9.92%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 25497010155 27018079806 +5.97%
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4 2427391406 2817217987 +16.06%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4 2592965497 2538805050 -2.09%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4 2437388343 2668012858 +9.46%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4 2317095324 2787423966 +20.30%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4 2600239857 2096973860 -19.35%
benchmark old allocs new allocs delta
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 500851 470794 -6.00%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 821527 791451 -3.66%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 1141562 1111508 -2.63%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 2141576 2111504 -1.40%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 871466 841424 -3.45%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 1941428 1911415 -1.55%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 3071573 3041510 -0.98%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 6771648 6741509 -0.45%
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4 731493 824888 +12.77%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4 793918 887311 +11.76%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4 811842 905204 +11.50%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4 832244 925081 +11.16%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4 921553 1019162 +10.59%
benchmark old bytes new bytes delta
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 40532648 35698276 -11.93%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 60340216 53409568 -11.49%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 81087336 72065552 -11.13%
BenchmarkCompaction/type=normal,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 142485576 120878544 -15.16%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=101-4 208661368 203831136 -2.31%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=1001-4 347345904 340484696 -1.98%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=2001-4 585185856 576244648 -1.53%
BenchmarkCompaction/type=vertical,blocks=4,series=10000,samplesPerSeriesPerBlock=5001-4 1357641792 1358966528 +0.10%
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4 126486664 119666744 -5.39%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4 122323192 115117224 -5.89%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4 126404504 119469864 -5.49%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4 119047832 112230408 -5.73%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4 136576016 116634800 -14.60%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than keeping the entire symbol table in memory, keep every nth
offset and walk from there to the entry we need. This ends up slightly
slower, ~360ms per 1M series returned from PostingsForMatchers which is
not much considering the rest of the CPU such a query would go on to
use.
Make LabelValues use the postings tables, rather than having
to do symbol lookups. Use yoloString, as PostingsForMatchers
doesn't need the strings to stick around and adjust the API
call to keep the Querier open until it's all marshalled.
Remove allocatedSymbols memory optimisation, we no longer keep all the
symbol strings in heap memory. Remove LabelValuesFor and LabelIndices,
they're dead code. Ensure we've still tests for label indices,
and add missing test that we can work with old V1 Format index files.
PostingForMatchers performance is slightly better, with a big drop in
allocation counts due to using yoloString for LabelValues:
benchmark old ns/op new ns/op delta
BenchmarkPostingsForMatchers/Block/n="1"-4 36698 36681 -0.05%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 522786 560887 +7.29%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 511652 537680 +5.09%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 522102 564239 +8.07%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 113689911 111795919 -1.67%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 135825572 132871085 -2.18%
BenchmarkPostingsForMatchers/Block/i=~""-4 40782628 38038181 -6.73%
BenchmarkPostingsForMatchers/Block/i!=""-4 31267869 29194327 -6.63%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 112733329 111568823 -1.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 112868153 111232029 -1.45%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 31338257 29349446 -6.35%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 32054482 29972436 -6.50%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 136504654 133968442 -1.86%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 27960350 27264997 -2.49%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 136765564 133860724 -2.12%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 163714583 159453668 -2.60%
benchmark old allocs new allocs delta
BenchmarkPostingsForMatchers/Block/n="1"-4 6 6 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 11 11 +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 11 11 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 17 15 -11.76%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 100012 12 -99.99%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 200040 100040 -49.99%
BenchmarkPostingsForMatchers/Block/i=~""-4 200045 100045 -49.99%
BenchmarkPostingsForMatchers/Block/i!=""-4 200041 100041 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 100017 17 -99.98%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 100023 23 -99.98%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 200046 100046 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 200050 100050 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 200049 100049 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 111150 11150 -89.97%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 200055 100055 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 311238 111238 -64.26%
benchmark old bytes new bytes delta
BenchmarkPostingsForMatchers/Block/n="1"-4 296 296 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 424 424 +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 424 424 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 552 1544 +179.71%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 1600482 1606125 +0.35%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 17259065 17264709 +0.03%
BenchmarkPostingsForMatchers/Block/i=~""-4 17259150 17264780 +0.03%
BenchmarkPostingsForMatchers/Block/i!=""-4 17259048 17264680 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 1600610 1606242 +0.35%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 1600813 1606434 +0.35%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 17259176 17264808 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 17259304 17264936 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 17259333 17264965 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 3142628 3148262 +0.18%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 17259509 17265141 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 20405680 20416944 +0.06%
However overall Select performance is down and involves more allocs, due to
having to do more than a simple map lookup to resolve a symbol and that all the strings
returned are allocated:
benchmark old ns/op new ns/op delta
BenchmarkQuerierSelect/Block/1of1000000-4 506092636 862678244 +70.46%
BenchmarkQuerierSelect/Block/10of1000000-4 505638968 860917636 +70.26%
BenchmarkQuerierSelect/Block/100of1000000-4 505229450 882150048 +74.60%
BenchmarkQuerierSelect/Block/1000of1000000-4 515905414 862241115 +67.13%
BenchmarkQuerierSelect/Block/10000of1000000-4 516785354 874841110 +69.29%
BenchmarkQuerierSelect/Block/100000of1000000-4 540742808 907030187 +67.74%
BenchmarkQuerierSelect/Block/1000000of1000000-4 815224288 1181236903 +44.90%
benchmark old allocs new allocs delta
BenchmarkQuerierSelect/Block/1of1000000-4 4000020 6000020 +50.00%
BenchmarkQuerierSelect/Block/10of1000000-4 4000038 6000038 +50.00%
BenchmarkQuerierSelect/Block/100of1000000-4 4000218 6000218 +50.00%
BenchmarkQuerierSelect/Block/1000of1000000-4 4002018 6002018 +49.97%
BenchmarkQuerierSelect/Block/10000of1000000-4 4020018 6020018 +49.75%
BenchmarkQuerierSelect/Block/100000of1000000-4 4200018 6200018 +47.62%
BenchmarkQuerierSelect/Block/1000000of1000000-4 6000018 8000019 +33.33%
benchmark old bytes new bytes delta
BenchmarkQuerierSelect/Block/1of1000000-4 176001468 227201476 +29.09%
BenchmarkQuerierSelect/Block/10of1000000-4 176002620 227202628 +29.09%
BenchmarkQuerierSelect/Block/100of1000000-4 176014140 227214148 +29.09%
BenchmarkQuerierSelect/Block/1000of1000000-4 176129340 227329348 +29.07%
BenchmarkQuerierSelect/Block/10000of1000000-4 177281340 228481348 +28.88%
BenchmarkQuerierSelect/Block/100000of1000000-4 188801340 240001348 +27.12%
BenchmarkQuerierSelect/Block/1000000of1000000-4 304001340 355201616 +16.84%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than building up a 2nd copy of all the posting
tables, construct it from the data we've already written
to disk. This takes more time, but saves memory.
Current benchmark numbers have this as slightly faster, but that's
likely due to the synthetic data not having many label names.
Memory usage is roughly halved for the relevant bits.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than keeping the offset of each postings list, instead
keep the nth offset of the offset of the posting list. As postings
list offsets have always been sorted, we can then get to the closest
entry before the one we want an iterate forwards.
I haven't done much tuning on the 32 number, it was chosen to try
not to read through more than a 4k page of data.
Switch to a bulk interface for fetching postings. Use it to avoid having
to re-read parts of the posting offset table when querying lots of it.
For a index with what BenchmarkHeadPostingForMatchers uses RAM
for r.postings drops from 3.79MB to 80.19kB or about 48x.
Bytes allocated go down by 30%, and suprisingly CPU usage drops by
4-6% for typical queries too.
benchmark old ns/op new ns/op delta
BenchmarkPostingsForMatchers/Block/n="1"-4 35231 36673 +4.09%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 563380 540627 -4.04%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 536782 534186 -0.48%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 533990 541550 +1.42%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 113374598 117969608 +4.05%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 146329884 139651442 -4.56%
BenchmarkPostingsForMatchers/Block/i=~""-4 50346510 44961127 -10.70%
BenchmarkPostingsForMatchers/Block/i!=""-4 41261550 35356165 -14.31%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 112544418 116904010 +3.87%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 112487086 116864918 +3.89%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 41094758 35457904 -13.72%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 41906372 36151473 -13.73%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 147262414 140424800 -4.64%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 28615629 27872072 -2.60%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 147117177 140462403 -4.52%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 175096826 167902298 -4.11%
benchmark old allocs new allocs delta
BenchmarkPostingsForMatchers/Block/n="1"-4 4 6 +50.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 7 11 +57.14%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 7 11 +57.14%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 15 17 +13.33%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 100010 100012 +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 200069 200040 -0.01%
BenchmarkPostingsForMatchers/Block/i=~""-4 200072 200045 -0.01%
BenchmarkPostingsForMatchers/Block/i!=""-4 200070 200041 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 100013 100017 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 100017 100023 +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 200073 200046 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 200075 200050 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 200074 200049 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 111165 111150 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 200078 200055 -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 311282 311238 -0.01%
benchmark old bytes new bytes delta
BenchmarkPostingsForMatchers/Block/n="1"-4 264 296 +12.12%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 360 424 +17.78%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 360 424 +17.78%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 520 552 +6.15%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 1600461 1600482 +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 24900801 17259077 -30.69%
BenchmarkPostingsForMatchers/Block/i=~""-4 24900836 17259151 -30.69%
BenchmarkPostingsForMatchers/Block/i!=""-4 24900760 17259048 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 1600557 1600621 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 1600717 1600813 +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 24900856 17259176 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 24900952 17259304 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 24900993 17259333 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 3788311 3142630 -17.04%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 24901137 17259509 -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 28693086 20405680 -28.88%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>