Commit graph

897 commits

Author SHA1 Message Date
Brian Brazil cf76daed2f Avoid WriteAt for Postings.
Flushing buffers and doing a pwrite per posting is expensive
time wise, so go back to the old way for those. This doubles
our memory usage, but that's still small as it's only
~8 bytes per time series in the index. This is 30-40% faster.

benchmark                                                         old ns/op      new ns/op     delta
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4     1101429174     724362123     -34.23%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4     1074466374     720977022     -32.90%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4     1166510282     677702636     -41.90%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4     1075013071     696855960     -35.18%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4     1231673790     829328610     -32.67%

benchmark                                                         old allocs     new allocs     delta
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4     832571         731435         -12.15%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4     894875         793823         -11.29%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4     912931         811804         -11.08%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4     933511         832366         -10.83%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4     1022791        921554         -9.90%

benchmark                                                         old bytes     new bytes     delta
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4     129063496     126472364     -2.01%
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4     124154888     122300764     -1.49%
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4     128790648     126394856     -1.86%
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4     120570696     118946548     -1.35%
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4     138754288     136317432     -1.76%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-12-16 15:30:49 +00:00
Brian Brazil 971dafdfbe Coalesce series reads where we can.
When compacting rather than doing a read of all
series in the index per label name, do many at once
but only when it won't use (much) more ram than writing the
special all index does.

original in-memory postings:
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4                  1        1202383447 ns/op        158936496 B/op   1031511 allocs/op
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4                  1        1141792706 ns/op        154453408 B/op   1093453 allocs/op
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4                  1        1169288829 ns/op        161072336 B/op   1110021 allocs/op
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4                  1        1115700103 ns/op        149480472 B/op   1129180 allocs/op
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4                  1        1283813141 ns/op        162937800 B/op   1202771 allocs/op

before:
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4                  1        1145195941 ns/op        131749984 B/op    834400 allocs/op
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4                  1        1233526345 ns/op        127889416 B/op    897033 allocs/op
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4                  1        1821942296 ns/op        131665648 B/op    914836 allocs/op
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4                  1        8035568665 ns/op        123811832 B/op    934312 allocs/op
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4                  1       71325926267 ns/op        140722648 B/op   1016824 allocs/op

after:
BenchmarkCompactionFromHead/labelnames=1,labelvalues=100000-4                  1        1101429174 ns/op        129063496 B/op    832571 allocs/op
BenchmarkCompactionFromHead/labelnames=10,labelvalues=10000-4                  1        1074466374 ns/op        124154888 B/op    894875 allocs/op
BenchmarkCompactionFromHead/labelnames=100,labelvalues=1000-4                  1        1166510282 ns/op        128790648 B/op    912931 allocs/op
BenchmarkCompactionFromHead/labelnames=1000,labelvalues=100-4                  1        1075013071 ns/op        120570696 B/op    933511 allocs/op
BenchmarkCompactionFromHead/labelnames=10000,labelvalues=10-4                  1        1231673790 ns/op        138754288 B/op   1022791 allocs/op

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-12-12 08:38:14 +00:00
Brian Brazil 373a1fdfbf Reread index series rather than storing in memory.
Rather than building up a 2nd copy of all the posting
tables, construct it from the data we've already written
to disk. This takes more time, but saves memory.

Current benchmark numbers have this as slightly faster, but that's
likely due to the synthetic data not having many label names.
Memory usage is roughly halved for the relevant bits.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-12-11 22:23:39 +00:00
Brian Brazil 48d25e6fe7 Reduce memory used by postings offset table.
Rather than keeping the offset of each postings list, instead
keep the nth offset of the offset of the posting list. As postings
list offsets have always been sorted, we can then get to the closest
entry before the one we want an iterate forwards.

I haven't done much tuning on the 32 number, it was chosen to try
not to read through more than a 4k page of data.

Switch to a bulk interface for fetching postings. Use it to avoid having
to re-read parts of the posting offset table when querying lots of it.

For a index with what BenchmarkHeadPostingForMatchers uses RAM
for r.postings drops from 3.79MB to 80.19kB or about 48x.
Bytes allocated go down by 30%, and suprisingly CPU usage drops by
4-6% for typical queries too.

benchmark                                                               old ns/op      new ns/op      delta
BenchmarkPostingsForMatchers/Block/n="1"-4                              35231          36673          +4.09%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                      563380         540627         -4.04%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                      536782         534186         -0.48%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                     533990         541550         +1.42%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            113374598      117969608      +4.05%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            146329884      139651442      -4.56%
BenchmarkPostingsForMatchers/Block/i=~""-4                              50346510       44961127       -10.70%
BenchmarkPostingsForMatchers/Block/i!=""-4                              41261550       35356165       -14.31%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              112544418      116904010      +3.87%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       112487086      116864918      +3.89%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                        41094758       35457904       -13.72%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4                41906372       36151473       -13.73%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              147262414      140424800      -4.64%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             28615629       27872072       -2.60%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       147117177      140462403      -4.52%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     175096826      167902298      -4.11%

benchmark                                                               old allocs     new allocs     delta
BenchmarkPostingsForMatchers/Block/n="1"-4                              4              6              +50.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                      7              11             +57.14%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                      7              11             +57.14%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                     15             17             +13.33%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            100010         100012         +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            200069         200040         -0.01%
BenchmarkPostingsForMatchers/Block/i=~""-4                              200072         200045         -0.01%
BenchmarkPostingsForMatchers/Block/i!=""-4                              200070         200041         -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              100013         100017         +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       100017         100023         +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                        200073         200046         -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4                200075         200050         -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              200074         200049         -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             111165         111150         -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       200078         200055         -0.01%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     311282         311238         -0.01%

benchmark                                                               old bytes     new bytes     delta
BenchmarkPostingsForMatchers/Block/n="1"-4                              264           296           +12.12%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4                      360           424           +17.78%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4                      360           424           +17.78%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4                     520           552           +6.15%
BenchmarkPostingsForMatchers/Block/i=~".*"-4                            1600461       1600482       +0.00%
BenchmarkPostingsForMatchers/Block/i=~".+"-4                            24900801      17259077      -30.69%
BenchmarkPostingsForMatchers/Block/i=~""-4                              24900836      17259151      -30.69%
BenchmarkPostingsForMatchers/Block/i!=""-4                              24900760      17259048      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4              1600557       1600621       +0.00%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4       1600717       1600813       +0.01%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4                        24900856      17259176      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4                24900952      17259304      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4              24900993      17259333      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4             3788311       3142630       -17.04%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4       24901137      17259509      -30.69%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4     28693086      20405680      -28.88%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-12-11 19:59:31 +00:00
Brian Brazil aff9f7a9e8 Extend PostingsForMatchers benchmark to cover Blocks too.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-12-11 19:59:31 +00:00
Brian Brazil 4df814f509
Don't buffer up tables in memory on compaction. (#6422)
We can instead write it as we go, and then go back and write in the
length at the end.

Also fix the compaction benchmark, which indicates no changes.

For the benchmark, this brings maximum memory usage of the buffers 
from ~200kB down to 128B.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-12-11 12:49:13 +00:00
johncming 0ae048ff78 tsdb: add error details in log. (#6415)
Signed-off-by: johncming <johncming@yahoo.com>
2019-12-09 10:37:01 +00:00
Ben Ye c0250cf120 Fix some typos(#6423)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2019-12-08 19:16:46 +00:00
Krasimir Georgiev 549164f252
return err instead of panic for corrupted chunk (#6040)
* Fix tsdb panic when querying corrupted chunks.
check that the chunk segment has enough data to read all chunk pieces.
* refactor, simplify and add tests.
* simpfiy WriteChunks implementation

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
2019-12-04 09:37:49 +02:00
Callum Styan 5830e03691
Merge pull request #6337 from cstyan/rw-log-replay
Log the start and end of the WAL replay within the WAL watcher.
2019-12-03 13:24:26 -08:00
Callum Styan 6a24eee340 Simplify duration check for watcher WAL replay.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-11-26 16:53:11 -08:00
Dipack P Panjabi e2dd5b61ef Added CreateBlock and CreateHead functions to new file (#6331)
* Added CreateBlock and CreateHead functions to new file to make it reusable across packages.

Signed-off-by: Dipack P Panjabi <dipack.panjabi@gmail.com>
2019-11-21 19:10:25 +07:00
Callum Styan 2d3ce3916c Log the start and end of the WAL replay within the WAL watcher.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-11-19 14:21:13 -08:00
Tim Bart 2e77f3a52b Fix typo in posting stats. (#6343)
Signed-off-by: Tim Bart <tbart@cloudflare.com>
2019-11-19 21:03:24 +00:00
Tom Wilkie de0a772b8e Port tsdb to use pkg/labels. (#6326)
* Port tsdb to use pkg/labels.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Get tests passing.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Remove useless cast.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Appease linters.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-11-18 11:53:33 -08:00
naivewong 23c0299d85 [tsdb] Improve mergeSeriesSet (#5920)
* refactor and simplify mergeSeriesSet

Signed-off-by: naivewong <867245430@qq.com>
2019-11-15 21:45:29 +07:00
Dipack P Panjabi ce7bab04dd Compute WAL size and use it during retention size checks (#5886)
* Compute WAL size and account for it when applying the retention settings.

Signed-off-by: Dipack P Panjabi <dpanjabi@hudson-trading.com>
2019-11-12 09:40:16 +07:00
Chris Marchbanks c5b3f0221f Decode WAL in Separate Goroutine (#6230)
* Make WAL replay benchmark more representative

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>

* Move decoding records from the WAL into goroutine

Decoding the WAL records accounts for a significant amount of time on
startup, and can be done in parallel with creating series/samples to
speed up startup. However, records still must be handled in order, so
only a single goroutine can do the decoding.

benchmark
old ns/op     new ns/op     delta
BenchmarkLoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200-8
481607033     391971490     -18.61%
BenchmarkLoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50-8
836394378     629067006     -24.79%
BenchmarkLoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480-8
348238658     234218667     -32.74%

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-11-07 17:26:45 +01:00
Sharad Gaur e94503ff5c Head Cardinality Status Page (#6125)
* Adding TSDB Head Stats like cardinality to Status Page

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Moving mutx to Head

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Renaming variabls

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Renaming variabls and html

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Removing unwanted whitespaces

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Adding Tests, Banchmarks and Max Heap for Postings Stats

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Adding more tests for postingstats and web handler

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Adding more tests for postingstats and web handler

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Remove generated asset file that is no longer used

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>

* Changing comment and variable name for more readability

Signed-off-by: Sharad Gaur <sgaur@splunk.com>

* Using time.Duration in postings status function and removing refresh button from web page

Signed-off-by: Sharad Gaur <sgaur@splunk.com>
2019-11-04 19:06:13 -07:00
Goutham Veeramachaneni 53a99530d6
Edit TSDB README badges
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-10-24 15:35:47 +05:30
yuxiaobo 7850f1b35c new world spelling mistake
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-10-17 19:09:54 +08:00
Björn Rabenstein 3c70284802
Merge pull request #6137 from gouthamve/propose-ganesh-tsdb
Step down and propose Ganesh as TSDB maintainer
2019-10-15 12:00:58 +02:00
Goutham Veeramachaneni b035c55a96
Step down and propose Ganesh as TSDB maintainer
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-10-15 12:42:20 +05:30
Krasi Georgiev b05b5f9a30
remove debug fmt.Println in tombstones. (#6135)
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
2019-10-14 14:45:26 +03:00
yuxiaobo 47e51c8b2b Correct spelling mistakes
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-10-10 18:46:27 +08:00
Krasi Georgiev 81d284f806
Merge the 2.13 release branch to master (#6117) 2019-10-09 17:41:46 +02:00
Björn Rabenstein fc945c4e8a
Merge pull request #6021 from prometheus/reload-checkpoint-async
Garbage collect asynchronously in the WAL Watcher
2019-10-08 11:01:13 +02:00
Ganesh Vernekar 4dfc7a2e77
Remove unnecessary lock in loadWAL (#6107)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-10-08 11:17:40 +05:30
Chris Marchbanks 8df4bca470
Garbage collect asynchronously in the WAL Watcher
The WAL Watcher replays a checkpoint after it is created in order to
garbage collect series that no longer exist in the WAL. Currently the
garbage collection process is done serially with reading from the tip of
the WAL which can cause large delays in writing samples to remote
storage just after compaction occurs.

This also fixes a memory leak where dropped series are not cleaned up as
part of the SeriesReset process.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-10-07 14:36:10 -06:00
Ganesh Vernekar 53ea6d6390
Allocate the shards only once while reading WAL (#6093)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-10-03 15:21:39 +05:30
Ganesh Vernekar 493ef2f771
Benchmark for loading WAL (#6081)
* Benchmark for loading WAL

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add more cases

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-10-03 11:49:55 +05:30
陈谭军 103f26d188 fix the wrong word (#6069)
Signed-off-by: chentanjun <2799194073@qq.com>
2019-09-30 09:54:55 -06:00
AllenZMC 1e62435960 fix wrong spells in live_reader.go (#5899)
* fix wrong spells in live_reader.go
* fix wrong spells in lex.go

Signed-off-by: czm <zhongming.chang@daocloud.io>
2019-09-21 16:36:33 +03:00
johncming 612f9cb361 tsdb/wal: pull out wal metrics separately as tsdb.DB (#5957)
Signed-off-by: johncming <johncming@yahoo.com>
2019-09-19 14:24:34 +03:00
zhulongcheng e081406b5b tsdb: update chunks format (#6033)
Signed-off-by: zhulongcheng <zhulongcheng.dev@gmail.com>
2019-09-19 13:56:32 +03:00
Callum Styan 3344bb5c33 Move WAL watcher code to tsdb/wal package. (#5999)
* Move WAL watcher code to tsdb/wal package.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix tests after moving WAL watcher code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Lint fixes.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-09-19 14:45:41 +05:30
Yao Zengzeng d1f21552b9 some refactor to make PostingsForMatchers more readable (#5897)
Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>
2019-09-13 16:10:35 +01:00
Lucas Servén Marín 8ab628b354 tsdb: allow readonly DB to create flush WAL (#6006)
This PR gives the readonly DB the ability to create blocks from the WAL.
In order to implement this, we modify DBReadOnly.Blocks() to return an
empty slice and no error if no blocks are found.
xref: https://github.com/prometheus/tsdb/issues/346#issuecomment-520786524

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
2019-09-13 11:25:21 +01:00
zhulongcheng 937cc1a52a tsdb: add block meta version constant (#5994)
Signed-off-by: zhulongcheng <zhulongcheng.dev@gmail.com>
2019-09-09 12:28:01 +03:00
Erfan Besharat 9336c01dfd Add methods to fetch page's buf data in tsdb WAL (#5967)
* move the WAL page buf reset in its own func

Signed-off-by: Erfan Besharat <erbesharat@gmail.com>
2019-08-30 11:38:36 +03:00
陈谭军 50d453b3c3 fix-up tsdb-typo (#5954)
Signed-off-by: chentanjun <2799194073@qq.com>
2019-08-28 14:43:02 +01:00
li mengyang 1c6d2194c4 fix spelling mistakes in docs (#5952)
Signed-off-by: hwdef <hwdef97@gmail.com>
2019-08-27 11:33:40 -06:00
johncming 7d43feb03f tsdb/wal: some small refactoring for easier reading (#5930)
Signed-off-by: johncming <johncming@yahoo.com>
2019-08-22 16:12:59 +03:00
Chris Marchbanks 5e36aa1491 Add test for MemPostings.Delete (#5910)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-08-22 15:19:12 +03:00
Bartek Plotka f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Ganesh Vernekar 5ecef3542d
Cleanup after merging tsdb into prometheus
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 14:04:14 +05:30
Ganesh Vernekar 7cf09b0395
Moving tsdb into its own subdirectory
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 13:58:49 +05:30