* tsdb: use dennwc/varint to speed up decoding
This is a tiny library, MIT-licensed, which unrolls the loop to go
about twice as fast.
Needed to copy the sign-inverting logic inline, previously provided by
the `binary` package.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* More comments to explain varint decoding
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Flushing buffers and doing a pwrite per posting is expensive
time-wise, so go back to the old way for those. This doubles
our memory usage, but that's still small, as it's only
~8 bytes per time series in the index. This is 30-40% faster.
benchmark                                                         old ns/op     new ns/op     delta
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4     1101429174    724362123     -34.23%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4     1074466374    720977022     -32.90%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4     1166510282    677702636     -41.90%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4     1075013071    696855960     -35.18%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4     1231673790    829328610     -32.67%

benchmark                                                         old allocs    new allocs    delta
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4     832571        731435        -12.15%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4     894875        793823        -11.29%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4     912931        811804        -11.08%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4     933511        832366        -10.83%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4     1022791       921554        -9.90%

benchmark                                                         old bytes     new bytes     delta
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4     129063496     126472364     -2.01%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4     124154888     122300764     -1.49%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4     128790648     126394856     -1.86%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4     120570696     118946548     -1.35%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4     138754288     136317432     -1.76%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than building up a 2nd copy of all the posting
tables, construct it from the data we've already written
to disk. This takes more time, but saves memory.
Current benchmark numbers have this as slightly faster, but that's
likely due to the synthetic data not having many label names.
Memory usage is roughly halved for the relevant bits.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Rather than keeping the offset of each postings list, instead
keep only the offset of every nth entry in the postings offset table.
As postings list offsets have always been sorted, we can then get to
the closest entry before the one we want and iterate forwards.
I haven't done much tuning on the value of 32 for n; it was chosen to
try not to read through more than a 4k page of data.
Switch to a bulk interface for fetching postings. Use it to avoid having
to re-read parts of the posting offset table when querying lots of it.
For an index like the one BenchmarkHeadPostingForMatchers uses, RAM
for r.postings drops from 3.79MB to 80.19kB, or about 48x.
Bytes allocated go down by 30%, and surprisingly CPU usage drops by
4-6% for typical queries too.
benchmark                                                             old ns/op     new ns/op     delta
BenchmarkPostingsForMatchers/Block/n="1"-4                            35231         36673         +4.09%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                    563380        540627        -4.04%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                    536782        534186        -0.48%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                   533990        541550        +1.42%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                          113374598     117969608     +4.05%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                          146329884     139651442     -4.56%
BenchmarkPostingsForMatchers/Block/i=~""-4                            50346510      44961127      -10.70%
BenchmarkPostingsForMatchers/Block/i!=""-4                            41261550      35356165      -14.31%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4            112544418     116904010     +3.87%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4     112487086     116864918     +3.89%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                      41094758      35457904      -13.72%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4              41906372      36151473      -13.73%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4            147262414     140424800     -4.64%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4           28615629      27872072      -2.60%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4     147117177     140462403     -4.52%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4   175096826     167902298     -4.11%

benchmark                                                             old allocs    new allocs    delta
BenchmarkPostingsForMatchers/Block/n="1"-4                            4             6             +50.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                    7             11            +57.14%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                    7             11            +57.14%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                   15            17            +13.33%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                          100010        100012        +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                          200069        200040        -0.01%
BenchmarkPostingsForMatchers/Block/i=~""-4                            200072        200045        -0.01%
BenchmarkPostingsForMatchers/Block/i!=""-4                            200070        200041        -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4            100013        100017        +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4     100017        100023        +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                      200073        200046        -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4              200075        200050        -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4            200074        200049        -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4           111165        111150        -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4     200078        200055        -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4   311282        311238        -0.01%

benchmark                                                             old bytes     new bytes     delta
BenchmarkPostingsForMatchers/Block/n="1"-4                            264           296           +12.12%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                    360           424           +17.78%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                    360           424           +17.78%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                   520           552           +6.15%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                          1600461       1600482       +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                          24900801      17259077      -30.69%
BenchmarkPostingsForMatchers/Block/i=~""-4                            24900836      17259151      -30.69%
BenchmarkPostingsForMatchers/Block/i!=""-4                            24900760      17259048      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4            1600557       1600621       +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4     1600717       1600813       +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                      24900856      17259176      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4              24900952      17259304      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4            24900993      17259333      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4           3788311       3142630       -17.04%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4     24901137      17259509      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4   28693086      20405680      -28.88%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
We can instead write it as we go, and then go back and write in the
length at the end.
Also fix the compaction benchmark, which indicates no changes.
For the benchmark, this brings maximum memory usage of the buffers
from ~200kB down to 128B.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>