Commit graph

117 commits

Author SHA1 Message Date
Dieter Plaetinck d5bfbe3114
improve bstream comments and doc (#9560)
* improve bstream comments and doc

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-10-25 18:44:15 +05:30
beorn7 4998b9750f chunkenc: Bugfix and naming tweaks
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 15:38:32 +02:00
beorn7 78ef9c6359 chunkenc: make xor reading more DRY
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 15:28:33 +02:00
beorn7 4a1b84f8b2 chunkenc: make xor writing more DRY
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 15:28:33 +02:00
Björn Rabenstein 3704c6c20a
Merge pull request #9533 from prometheus/beorn7/sparsehistogram
tsdb: Complete chunk format documentation
2021-10-19 13:51:46 +02:00
beorn7 1a4e54cfbb tsdb: Complete chunk format documentation
This also tweaks and fixes a few things done previously.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 13:51:30 +02:00
beorn7 0876d57aea chunkenc: Add test for chunk layout encoding
And fix a bug exposed by it...

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-18 19:37:24 +02:00
beorn7 ad9b4c2b68 Fix typos
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-18 15:44:13 +02:00
beorn7 ed33aea392 Avoid redundant varint decoding in chunk appender construction
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-15 20:33:14 +02:00
beorn7 d31bb75dc4 Use VarbitUint rather than VarbitInt to encode len(spans)
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-15 15:27:32 +02:00
beorn7 3179215a59 Encode zero threshold first
This guaranees that the zero threshold is byte-aligned. Not sure if
that helps in any way, but at least it won't harm.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-14 14:55:21 +02:00
beorn7 c5522677bf Improve encoding of zero threshold
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-14 14:47:26 +02:00
beorn7 7093b089f2 Use more varbit in histogram chunks
This adds bit buckets for larger numbers to varbit encoding and also
an unsigned version of varbit encoding.

Then, varbit encoding is used for all the histogram chunk data instead
of varint.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-13 20:03:35 +02:00
Björn Rabenstein 7309c20e7e
Merge pull request #9500 from codesome/resettests
Add unit test for counter reset header
2021-10-13 18:19:21 +02:00
Ganesh Vernekar dcaf568279
Metadata -> Layout renaming
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-13 20:27:48 +05:30
Ganesh Vernekar 4e206c7c77
Fix reviews
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-13 20:23:31 +05:30
Ganesh Vernekar 85e6686f84
Add unit test for counter reset header
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-13 15:57:42 +05:30
Björn Rabenstein 311673d62e
Save on slice allocations in histogramIterator (#9494)
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-13 14:43:49 +05:30
Björn Rabenstein c450c01eb9
Remove obsolete TODOs about metadata (#9490)
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-12 17:00:30 +05:30
beorn7 7a8bb8222c Style cleanup of all the changes in sparsehistogram so far
A lot of this code was hacked together, literally during a
hackathon. This commit intends not to change the code substantially,
but just make the code obey the usual style practices.

A (possibly incomplete) list of areas:

* Generally address linter warnings.

* The `pgk` directory is deprecated as per dev-summit. No new packages should
  be added to it. I moved the new `pkg/histogram` package to `model`
  anticipating what's proposed in #9478.

* Make the naming of the Sparse Histogram more consistent. Including
  abbreviations, there were just too many names for it: SparseHistogram,
  Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in
  general. Only add "Sparse" if it is needed to avoid confusion with
  conventional Histograms (which is rare because the TSDB really has no notion
  of conventional Histograms). Use abbreviations only in local scope, and then
  really abbreviate (not just removing three out of seven letters like in
  "Histo"). This is in the spirit of
  https://github.com/golang/go/wiki/CodeReviewComments#variable-names

* Several other minor name changes.

* A lot of formatting of doc comments. For one, following
  https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
  , but also layout question, anticipating how things will look like
  when rendered by `godoc` (even where `godoc` doesn't render them
  right now because they are for unexported types or not a doc comment
  at all but just a normal code comment - consistency is queen!).

* Re-enabled `TestQueryLog` and `TestEndopints` (they pass now,
  leaving them disabled was presumably an oversight).

* Bucket iterator for histogram.Histogram is now created with a
  method.

* HistogramChunk.iterator now allows iterator recycling. (I think
  @dieterbe only commented it out because he was confused by the
  question in the comment.)

* HistogramAppender.Append panics now because we decided to treat
  staleness marker differently.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-11 13:02:03 +02:00
Ganesh Vernekar 5d4dc7e413
Convert the header into an enum
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-07 19:53:24 +05:30
Ganesh Vernekar 175ef4ebcf
Add a NotCounterReset flag
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-06 16:06:34 +05:30
Ganesh Vernekar a280b6c2da
Fix review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-06 15:28:10 +05:30
Ganesh Vernekar eb9931e961
Add info about counter resets in chunk meta
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-04 18:44:12 +05:30
Ganesh Vernekar 1dd22ed655
Support stale samples for sparse histograms (#9352)
* Support stale samples for sparse histograms

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Don't cut a new chunk for every stale sample

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Update comments for HistoAppender.Appendable

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-01 13:41:51 +05:30
Ganesh Vernekar c373200b75
Cut a new chunk on counter resets for any bucket
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-24 20:02:08 +05:30
Ganesh Vernekar 19e98e5469
Support storing the zero threshold in the histogram chunk (#9165)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 18:08:41 +05:30
Ganesh Vernekar 7026e6b4e4
Fix tests in histo_test.go (#9163)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 14:26:56 +05:30
Ganesh Vernekar 8b70e87ab9
Merge remote-tracking branch 'upstream/main' into sparse-refactor
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-05 12:16:08 +05:30
jinglina 1a430e5f89
remove redundant parentheses (#9134)
Signed-off-by: jinglina <jinglinax@163.com>
2021-07-29 18:26:57 +05:30
Bryan Boreham 6788760efa
Reduce memory allocation in benchmarkIterator() (#5983)
Previously it was allocating millions of chunks, all containing the
same 250 samples.  Above some ratio of CPU performance to available
memory, the benchmark cannot run.

Make 250 a const and just allocate one chunk which we iterate
repeatedly till we reach the benchmark count.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-07-26 19:36:54 +05:30
Ganesh Vernekar 4fefd7520e
Skip the failing TestHistoChunkSameBuckets (#9089)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-07-15 14:35:35 +05:30
beorn7 cb75747bce Fix re-encoding
Signed-off-by: beorn7 <beorn@grafana.com>
2021-07-06 00:20:35 +02:00
beorn7 01957eee2b Fix interjections even more
Signed-off-by: beorn7 <beorn@grafana.com>
2021-07-05 23:59:33 +02:00
beorn7 dc1c744169 Fix interjections at the end
Signed-off-by: beorn7 <beorn@grafana.com>
2021-07-05 23:01:39 +02:00
Oleg Zaytsev 40126a8494
Use binary literals for xor chunk encoding
An opinionated cosmetic change, but since go 1.13 we have this fancy
0b.... literals so we don't need to write hex and comment the binary
value.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2021-07-05 16:39:24 +02:00
beorn7 deb02d59fb Fix lint issues
Signed-off-by: beorn7 <beorn@grafana.com>
2021-07-05 15:27:46 +02:00
Dieter Plaetinck dc6b068c67 bugfix: only bump numRead when all fields are successfully read
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-07-05 15:57:47 +03:00
Dieter Plaetinck 98f86d671a cleanup comments
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-07-05 15:56:38 +03:00
Dieter Plaetinck 99ae04bb6f add SHS chunk recoding and head cutting to head block (no tests yet)
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-07-05 15:56:38 +03:00
Ganesh Vernekar 4c01ff5194
Bunch of fixes for sparse histograms (#9043)
* Do not panic on histoAppender.Append

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* M-map all chunks on shutdown

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Support negative schema for querying

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-07-03 23:04:34 +05:30
Dieter Plaetinck 6c13375ac8
sparsehistogram recoding upon detection that new buckets have appeared (#9030)
* bucketIterator which returns all valid bucket indices for a []span

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* support for comparing []spans and generating interjections

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* add license header

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* assert order fix

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* handle pathological 0-length span case more gracefully

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* stale todo

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* decode-recode histograms when new buckets appear

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* factor out recoding and also add it to the fallback case

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* make linter happy

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-07-02 11:50:30 +05:30
beorn7 518b77c59d Fix a few trivial style nits
Signed-off-by: beorn7 <beorn@grafana.com>
2021-07-01 17:11:54 +02:00
Ganesh Vernekar f4d3af73f0
Query histograms from TSDB and unit test for append+query (#9022)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-30 20:18:13 +05:30
Dieter Plaetinck 4d27816ea5
Sparsehistogram: improve dod encoding, testing, encode chunk metadata (#9015)
* factor out different varbit schemes and include Beorn's "optimum" for buckets

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* use more compact dod encoding scheme for SHS chunk columns

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* remove FB VB and xor dod encoding because we won't use it

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* HistoChunk metadata encoding

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* add SparseHistogram.Copy()

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* histogram test: test appending a few histograms

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* add license headers

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-06-30 16:15:43 +05:30
Ganesh Vernekar 04ad56d9b8
Append sparse histograms into the Head block (#9013)
* Append sparse histograms into the Head block

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add AtHistogram() to Iterator interface. Make HistoChunk conform to Chunk interface.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-29 20:08:46 +05:30
Dieter Plaetinck 58917d1b76
sparsehistogram: integer types and timestamp separation (#9014)
* integer types and timestamp separation

1) unify types to int64. as suggested by beorn. we want to support
   counters going down (resets) even if we plan to create new chunks for
   now, in that case
2) histogram type doesn't know its own timestamp. include it separately
   in appending and iteration

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* correction: count and zeroCount to remain unsigned

to make api more resilient and that's what we use in protobuf anyway

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* temp hack. Ganesh will fix

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-06-29 19:27:59 +05:30
Dieter Plaetinck fd11a339a7
Sparsehistogram chunk implementation (#9009)
* histogram chunk

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* xorAppender.AppendHistogram non-method

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* basic histogram chunk test

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-06-29 14:07:41 +05:30
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
johncming 28ca0965f0
tsdb/chunkenc: fix typo of return error. (#7670)
* tsdb/chunkenc: fix typo of return error.

Signed-off-by: johncming <johncming@yahoo.com>

* tsdb: fix typo of function in markdonw.

Signed-off-by: johncming <johncming@yahoo.com>
2020-10-28 12:03:11 +00:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Marco Pracucci 3b529ddbce
Cleanup bstream_test.go based on post-merge feedback received on #7390 (#7413)
* Fixed bstream test license

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Simplified bstreamReader.loadNextBuffer()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed date in license

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-06-18 14:49:39 +05:30
Marco Pracucci f42ed03dc5
Optimized bstream reader used by XORChunk iterator (#7390)
* Optimized bstream reader

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added license to new file

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed type cast

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Changed comments

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved comments and rolledback no-op changes

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed race condition

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-06-15 16:44:40 +01:00
Bartlomiej Plotka d5c33877f9
storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation. (#7005)
* storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation.

## Rationales:

In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples.
This means that we unnecessary decode/encode, wasting CPU, time and memory.
This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets

I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers.

All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos).

## Changes

* Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that
Storage will implement both chunked and samples.
* Added Seek to chunks.Iterator interface for iterating over chunks.
* NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added.
* Improved tests.
* Added some TODO for further simplifications in next PRs.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Moved s/Labeled/SeriesLabels as per Krasi suggestion.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Krasi's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Second iteration of Krasi comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Another round of comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-24 20:15:47 +00:00
Peter Štibraný 1d396b96dc
Specify that returned samples must be ordered by timestamp. (#6877)
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2020-02-26 13:11:55 +00:00
Bartlomiej Plotka 59c9d6ef45 Addressed Brian's comments, moved metrics to main.go
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 5d84e5d895 Make chunkenc.Iterator.At behaviour unspecified without Next done.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka cfba92a133 Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 849faa407b Minor fixes.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 2cf637fbf5 Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Marco Pracucci 699f3e8f4d
Added comments to the Chunk interface
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-02-05 13:07:41 +01:00
Marco Pracucci 0703dae7cc
Compact TSDB head chunks after being cut, to reduce inuse memory
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-02-05 13:00:39 +01:00
yuxiaobo 47e51c8b2b Correct spelling mistakes
Signed-off-by: yuxiaobo <yuxiaobogo@163.com>
2019-10-10 18:46:27 +08:00
Bartek Plotka f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Ganesh Vernekar 5ecef3542d
Cleanup after merging tsdb into prometheus
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 14:04:14 +05:30
Ganesh Vernekar 7cf09b0395
Moving tsdb into its own subdirectory
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 13:58:49 +05:30