Merge pull request #9533 from prometheus/beorn7/sparsehistogram

tsdb: Complete chunk format documentation
Björn Rabenstein 2021-10-19 13:51:46 +02:00 committed by GitHub
commit 3704c6c20a
2 changed files with 60 additions and 13 deletions

@@ -27,8 +27,6 @@ const ()
 // HistogramChunk holds encoded sample data for a sparse, high-resolution
 // histogram.
 //
-// TODO(beorn7): Document the layout of chunk metadata.
-//
 // Each sample has multiple "fields", stored in the following way (raw = store
 // number directly, delta = store delta to the previous number, dod = store
 // delta of the delta to the previous number, xor = what we do for regular

@@ -42,9 +42,9 @@ Notes:
 ## XOR chunk data
 ```
-┌──────────────────────┬───────────────┬───────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬─────┐
-│ num_samples <uint16> │ ts_0 <varint> │ v_0 <float64> │ ts_1_delta <uvarint> │ v_1_xor <varbit_xor> │ ts_n_dod <varbit_ts> │ v_n_xor <varbit_xor> │ ... │
-└──────────────────────┴───────────────┴───────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴─────┘
+┌──────────────────────┬───────────────┬───────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬─────┬──────────────────────┬──────────────────────┬──────────────────┐
+│ num_samples <uint16> │ ts_0 <varint> │ v_0 <float64> │ ts_1_delta <uvarint> │ v_1_xor <varbit_xor> │ ts_2_dod <varbit_ts> │ v_2_xor <varbit_xor> │ ... │ ts_n_dod <varbit_ts> │ v_n_xor <varbit_xor> │ padding <x bits> │
+└──────────────────────┴───────────────┴───────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴─────┴──────────────────────┴──────────────────────┴──────────────────┘
 ```
 ### Notes:
@@ -55,41 +55,90 @@ Notes:
 * `<varint>` and `<uvarint>` have 1 to 10 bytes each.
 * `ts_1_delta` is `ts_1` - `ts_0`.
 * `ts_n_dod` is the “delta of deltas” of timestamps, i.e. (`ts_n` - `ts_n-1`) - (`ts_n-1` - `ts_n-2`).
-* `<v_n_xor>` is the result of `v_n` XOR `v_n-1`.
+* `v_n_xor` is the result of `v_n` XOR `v_n-1`.
 * `<varbit_xor>` is a specific variable bitwidth encoding of the result of XORing the current and the previous value. It has between 1 bit and 77 bits.
   See [code for details](https://github.com/prometheus/prometheus/blob/7309c20e7e5774e7838f183ec97c65baa4362edc/tsdb/chunkenc/xor.go#L220-L253).
 * `<varbit_ts>` is a specific variable bitwidth encoding for the “delta of deltas” of timestamps (signed integers that are ideally small).
   It has between 1 and 68 bits.
   see [code for details](https://github.com/prometheus/prometheus/blob/7309c20e7e5774e7838f183ec97c65baa4362edc/tsdb/chunkenc/xor.go#L179-L205).
+* `padding` of 0 to 7 bits so that the whole chunk data is byte-aligned.
+* The chunk can have as few as one sample, i.e. `ts_1`, `v_1`, etc. are optional.
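
The XOR chunk notes above describe which number is stored for each sample before any variable-width bit packing happens. The following Go sketch only derives those logical numbers (raw, delta, delta of deltas, XOR) and does not attempt the `<varbit_ts>` or `<varbit_xor>` bit encodings. The names `sample`, `field`, and `logicalFields` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// sample is a hypothetical (timestamp, value) pair used only in this sketch.
type sample struct {
	ts int64
	v  float64
}

// field holds the logical numbers stored for one sample, before the
// variable-width bit encoding is applied.
type field struct {
	ts  int64  // ts_0, ts_1_delta, or ts_n_dod
	val uint64 // float64 bits of v_0, or v_n XOR v_n-1
}

// logicalFields derives the per-sample numbers described in the notes:
// sample 0 is stored raw, sample 1 stores a timestamp delta, and every
// later sample stores the delta of deltas. Values are XORed with the
// previous value from sample 1 onwards.
func logicalFields(samples []sample) []field {
	out := make([]field, 0, len(samples))
	var prevDelta int64
	for i, s := range samples {
		bits := math.Float64bits(s.v)
		if i == 0 {
			out = append(out, field{ts: s.ts, val: bits})
			continue
		}
		prev := samples[i-1]
		delta := s.ts - prev.ts
		tsField := delta
		if i > 1 {
			tsField = delta - prevDelta // (ts_n - ts_n-1) - (ts_n-1 - ts_n-2)
		}
		out = append(out, field{ts: tsField, val: bits ^ math.Float64bits(prev.v)})
		prevDelta = delta
	}
	return out
}

func main() {
	// Hypothetical samples scraped roughly every 15 time units.
	samples := []sample{{1000, 1.5}, {1015, 1.5}, {1030, 2.5}, {1046, 2.5}}
	for i, f := range logicalFields(samples) {
		fmt.Printf("sample %d: ts field = %d, value field = %#x\n", i, f.ts, f.val)
	}
}
```

With regular scrape intervals and slowly changing values, most timestamp fields come out as 0 or a small integer and most value fields as 0, which is presumably why the variable-width encodings can get as small as a single bit.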
 ## Histogram chunk data
 ```
-┌──────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬────────────────┐
-│ num_samples <uint16> │ zero_threshold <1 or 9 bytes> │ schema <varbit_int> │ pos_spans <data> │ neg_spans <data> │ samples <data> │
-└──────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴────────────────┘
+┌──────────────────────┬──────────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬────────────────┬──────────────────┐
+│ num_samples <uint16> │ histogram_flags <1 byte> │ zero_threshold <1 or 9 bytes> │ schema <varbit_int> │ pos_spans <data> │ neg_spans <data> │ samples <data> │ padding <x bits> │
+└──────────────────────┴──────────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴────────────────┴──────────────────┘
 ```
 ### Positive and negative spans data:
 ```
-┌───────────────────┬────────────────────────┬───────────────────────┬─────┬───────────────────────────────────────────────────┐
-│ num <varbit_uint> │ length_1 <varbit_uint> │ offset_1 <varbit_int> │ ... │ length_num <varbit_uint> │ offset_num <varbit_int> │
-└───────────────────┴────────────────────────┴───────────────────────┴─────┴───────────────────────────────────────────────────┘
+┌─────────────────────────┬────────────────────────┬───────────────────────┬────────────────────────┬───────────────────────┬─────┬────────────────────────┬───────────────────────┐
+│ num_spans <varbit_uint> │ length_0 <varbit_uint> │ offset_0 <varbit_int> │ length_1 <varbit_uint> │ offset_1 <varbit_int> │ ... │ length_n <varbit_uint> │ offset_n <varbit_int> │
+└─────────────────────────┴────────────────────────┴───────────────────────┴────────────────────────┴───────────────────────┴─────┴────────────────────────┴───────────────────────┘
 ```
 ### Samples data:
 ```
-TODO
+┌──────────────────────────┐
+│ sample_0 <data>          │
+├──────────────────────────┤
+│ sample_1 <data>          │
+├──────────────────────────┤
+│ sample_2 <data>          │
+├──────────────────────────┤
+│ ...                      │
+├──────────────────────────┤
+│ Sample_n <data>          │
+└──────────────────────────┘
+```
+#### Sample 0 data:
+```
+┌─────────────────┬─────────────────────┬──────────────────────────┬───────────────┬───────────────────────────┬─────┬───────────────────────────┬───────────────────────────┬─────┬───────────────────────────┐
+│ ts <varbit_int> │ count <varbit_uint> │ zero_count <varbit_uint> │ sum <float64> │ pos_bucket_0 <varbit_int> │ ... │ pos_bucket_n <varbit_int> │ neg_bucket_0 <varbit_int> │ ... │ neg_bucket_n <varbit_int> │
+└─────────────────┴─────────────────────┴──────────────────────────┴───────────────┴───────────────────────────┴─────┴───────────────────────────┴───────────────────────────┴─────┴───────────────────────────┘
+```
+#### Sample 1 data:
+```
+┌────────────────────────┬───────────────────────────┬────────────────────────────────┬──────────────────────┬─────────────────────────────────┬─────┬─────────────────────────────────┬─────────────────────────────────┬─────┬─────────────────────────────────┐
+│ ts_delta <varbit_uint> │ count_delta <varbit_uint> │ zero_count_delta <varbit_uint> │ sum_xor <varbit_xor> │ pos_bucket_0_delta <varbit_int> │ ... │ pos_bucket_n_delta <varbit_int> │ neg_bucket_0_delta <varbit_int> │ ... │ neg_bucket_n_delta <varbit_int> │
+└────────────────────────┴───────────────────────────┴────────────────────────────────┴──────────────────────┴─────────────────────────────────┴─────┴─────────────────────────────────┴─────────────────────────────────┴─────┴─────────────────────────────────┘
+```
+#### Sample 2 data and following:
+```
+┌─────────────────────┬────────────────────────┬─────────────────────────────┬──────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┬───────────────────────────────┬─────┬───────────────────────────────┐
+│ ts_dod <varbit_int> │ count_dod <varbit_int> │ zero_count_dod <varbit_int> │ sum_xor <varbit_xor> │ pos_bucket_0_dod <varbit_int> │ ... │ pos_bucket_n_dod <varbit_int> │ neg_bucket_0_dod <varbit_int> │ ... │ neg_bucket_n_dod <varbit_int> │
+└─────────────────────┴───────────────────────┴─────────────────────────────┴──────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┴───────────────────────────────┴─────┴───────────────────────────────┘
 ```
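
The three sample layouts above store each bucket field raw in sample 0, as a delta in sample 1, and as a delta of deltas from sample 2 onwards. The Go sketch below derives those numbers for a small set of made-up bucket values. The function name `bucketFields` is hypothetical, the sketch assumes every sample has the same number of buckets, and it leaves out the `<varbit_int>` bit encoding entirely.

```go
package main

import "fmt"

// bucketFields returns, for every sample i and bucket j, the number that the
// layouts above store: the raw value for sample 0, the delta to the previous
// sample for sample 1, and the delta of deltas from sample 2 onwards.
// Assumption of this sketch: every sample has the same number of buckets.
func bucketFields(buckets [][]int64) [][]int64 {
	out := make([][]int64, len(buckets))
	for i, cur := range buckets {
		out[i] = make([]int64, len(cur))
		for j, b := range cur {
			switch {
			case i == 0:
				out[i][j] = b
			case i == 1:
				out[i][j] = b - buckets[0][j]
			default:
				delta := b - buckets[i-1][j]
				prevDelta := buckets[i-1][j] - buckets[i-2][j]
				out[i][j] = delta - prevDelta
			}
		}
	}
	return out
}

func main() {
	// Hypothetical bucket values for three consecutive samples.
	buckets := [][]int64{
		{5, 2, -1},
		{7, 2, -1},
		{10, 1, -1},
	}
	for i, row := range bucketFields(buckets) {
		fmt.Printf("sample %d: %v\n", i, row)
	}
}
```

The same raw, delta, delta-of-deltas progression applies to `ts`, `count`, and `zero_count` in the layouts, while `sum` switches from a raw float64 to an XOR with the previous value.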
 ### Notes:
+* `histogram_flags` is a byte of which currently only the first two bits are used:
+  * `10`: Counter reset between the previous chunk and this one.
+  * `01`: No counter reset between the previous chunk and this one.
+  * `00`: Counter reset status unknown.
+  * `11`: Chunk is part of a gauge histogram, no counter resets are happening.
 * `zero_threshold` has a specific encoding:
   * If 0, it is a single zero byte.
   * If a power of two between 2^-243 and 2^10, it is a single byte between 1 and 254.
   * Otherwise, it is a byte with all bits set (255), followed by a float64, resulting in 9 bytes length.
 * `schema` is a specific value defined by the exposition format. Currently valid values are -4 <= n <= 8.
 * `<varbit_int>` is a variable bitwidth encoding for signed integers, optimized for “delta of deltas” of bucket deltas. It has between 1 bit and 9 bytes.
+  See [code for details](https://github.com/prometheus/prometheus/blob/8c1507ebaa4ca552958ffb60c2d1b21afb7150e4/tsdb/chunkenc/varbit.go#L31-L60).
 * `<varbit_uint>` is a variable bitwidth encoding for unsigned integers with the same bit-bucketing as `<varbit_int>`.
+  See [code for details](https://github.com/prometheus/prometheus/blob/8c1507ebaa4ca552958ffb60c2d1b21afb7150e4/tsdb/chunkenc/varbit.go#L136-L165).
+* `<varbit_xor>` is a specific variable bitwidth encoding of the result of XORing the current and the previous value. It has between 1 bit and 77 bits.
+  See [code for details](https://github.com/prometheus/prometheus/blob/8c1507ebaa4ca552958ffb60c2d1b21afb7150e4/tsdb/chunkenc/histogram.go#L538-L574).
+* `padding` of 0 to 7 bits so that the whole chunk data is byte-aligned.
+* Note that buckets are inherently deltas between the current bucket and the previous bucket. Only `bucket_0` is an absolute count.
+* The chunk can have as few as one sample, i.e. sample 1 and following are optional.
+* Similarly, there could be down to zero spans and down to zero buckets.
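
The `zero_threshold` and `histogram_flags` rules in the notes above can be sketched in Go as follows. This is an illustration, not the code the notes link to. The sketch assumes that the powers of two map linearly onto the bytes 1 to 254 (2^-243 to 1, 2^10 to 254), that the trailing float64 is written big-endian, and that "the first two bits" means the two most significant bits of the byte. The names `encodeZeroThreshold` and `counterResetHint` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeZeroThreshold returns the 1- or 9-byte form described in the notes.
// Assumptions of this sketch: powers of two map linearly onto 1..254
// (2^-243 -> 1, 2^10 -> 254) and the float64 is written big-endian.
func encodeZeroThreshold(threshold float64) []byte {
	if threshold == 0 {
		return []byte{0}
	}
	// Frexp returns threshold = frac * 2^exp with frac in [0.5, 1), so an
	// exact power of two 2^k has frac == 0.5 and exp == k+1.
	frac, exp := math.Frexp(threshold)
	if frac != 0.5 || exp < -242 || exp > 11 {
		buf := make([]byte, 9)
		buf[0] = 255 // marker byte: a full float64 follows
		binary.BigEndian.PutUint64(buf[1:], math.Float64bits(threshold))
		return buf
	}
	return []byte{byte(exp + 243)}
}

// counterResetHint interprets the two used bits of histogram_flags.
// Assumption of this sketch: they are the two most significant bits.
func counterResetHint(flags byte) string {
	switch flags >> 6 {
	case 0b10:
		return "counter reset"
	case 0b01:
		return "no counter reset"
	case 0b11:
		return "gauge histogram"
	default:
		return "unknown"
	}
}

func main() {
	for _, t := range []float64{0, 0x1p-243, 1, 1024, 0.001} {
		fmt.Printf("zero_threshold %g -> % x\n", t, encodeZeroThreshold(t))
	}
	for _, f := range []byte{0x00, 0x40, 0x80, 0xC0} {
		fmt.Printf("histogram_flags %08b -> %s\n", f, counterResetHint(f))
	}
}
```

Under these assumptions, a threshold that is a power of two within the stated range costs a single byte, and any other non-zero threshold falls back to the 9-byte form.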