diff --git a/tsdb/docs/format/chunks.md b/tsdb/docs/format/chunks.md index ae2548594..7eb0820e4 100644 --- a/tsdb/docs/format/chunks.md +++ b/tsdb/docs/format/chunks.md @@ -29,14 +29,15 @@ in-file offset (lower 4 bytes) and segment sequence number (upper 4 bytes). # Chunk ``` -┌───────────────┬───────────────────┬──────────────┬────────────────┐ -│ len │ encoding <1 byte> │ data │ CRC32 <4 byte> │ -└───────────────┴───────────────────┴──────────────┴────────────────┘ +┌───────────────┬───────────────────┬─────────────┬────────────────┐ +│ len │ encoding <1 byte> │ data │ CRC32 <4 byte> │ +└───────────────┴───────────────────┴─────────────┴────────────────┘ ``` Notes: * `` has 1 to 10 bytes. -* `encoding`: Currently either `XOR`, `histogram`, or `float histogram`. +* `encoding`: Currently either `XOR`, `histogram`, or `floathistogram`, see + [code for numerical values](https://github.com/prometheus/prometheus/blob/02d0de9987ad99dee5de21853715954fadb3239f/tsdb/chunkenc/chunk.go#L28-L47). * `data`: See below for each encoding. ## XOR chunk data @@ -67,9 +68,9 @@ Notes: ## Histogram chunk data ``` -┌──────────────────────┬──────────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬────────────────┬──────────────────┐ -│ num_samples │ histogram_flags <1 byte> │ zero_threshold <1 or 9 bytes> │ schema │ pos_spans │ neg_spans │ samples │ padding │ -└──────────────────────┴──────────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴────────────────┴──────────────────┘ +┌──────────────────────┬──────────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬──────────────────────┬────────────────┬──────────────────┐ +│ num_samples │ histogram_flags <1 byte> │ zero_threshold <1 or 9 bytes> │ schema │ pos_spans │ neg_spans │ custom_values │ samples │ padding │ +└──────────────────────┴──────────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴──────────────────────┴────────────────┴──────────────────┘ ``` ### Positive and negative spans data: @@ -80,6 +81,16 @@ Notes: └─────────────────────────┴────────────────────────┴───────────────────────┴────────────────────────┴───────────────────────┴─────┴────────────────────────┴───────────────────────┘ ``` +### Custom values data: + +The `custom_values` data is currently only used for schema -53 (custom bucket boundaries). For other schemas, it is empty (length of zero). + +``` +┌──────────────────────────┬──────────────────┬──────────────────┬─────┬──────────────────┐ +│ num_values │ value_0 │ value_1 │ ... │ value_n │ +└──────────────────────────┴─────────────────────────────────────┴─────┴──────────────────┘ +``` + ### Samples data: ``` @@ -131,7 +142,9 @@ Notes: * If 0, it is a single zero byte. * If a power of two between 2^-243 and 2^10, it is a single byte between 1 and 254. * Otherwise, it is a byte with all bits set (255), followed by a float64, resulting in 9 bytes length. -* `schema` is a specific value defined by the exposition format. Currently valid values are -4 <= n <= 8. +* `schema` is a specific value defined by the exposition format. Currently + valid values are either -4 <= n <= 8 (standard exponential schemas) or -53 + (custom bucket boundaries). * `` is a variable bitwidth encoding for signed integers, optimized for “delta of deltas” of bucket deltas. It has between 1 bit and 9 bytes. See [code for details](https://github.com/prometheus/prometheus/blob/8c1507ebaa4ca552958ffb60c2d1b21afb7150e4/tsdb/chunkenc/varbit.go#L31-L60). * `` is a variable bitwidth encoding for unsigned integers with the same bit-bucketing as ``. @@ -143,6 +156,28 @@ Notes: * The chunk can have as few as one sample, i.e. sample 1 and following are optional. * Similarly, there could be down to zero spans and down to zero buckets. +The `` encoding within the custom values data depends on the schema. +For schema -53 (custom bucket boundaries, currently the only use case for +custom values), the values to encode are bucket boundaries in the form of +floats. The encoding of a given float value _x_ works as follows: + +1. Create an intermediate value _y_ = _x_ * 1000. +2. If 0 ≤ _y_ ≤ 33554430 _and_ if the decimal value of _y_ is integer, store + _y_ + 1 as ``. +3. Otherwise, store a 0 bit, followed by the 64 bit of the original _x_ + encoded as plain ``. + +Note that values stored as per (2) will always start with a 1 bit, which allow +decoders to recognize this case in contrast to values stores as per (3), which +always start with a 0 bit. + +The rational behind this encoding is that most custom bucket boundaries are set +by humans as decimal numbers with not very many decimal places. In most cases, +the encoding will therefore result in a short varbit representation. The upper +bound of 33554430 is picked so that the varbit encoded value will take at most +4 bytes. + + ## Float histogram chunk data Float histograms have the same layout as histograms apart from the encoding of samples.