Merge pull request #15135 from prometheus/beorn7/docs

docs: Update chunk layot for NHCB
2025-03-05 20:59:13 -08:00 · 2024-10-10 12:00:51 +02:00 · 2024-10-10 12:00:51 +02:00 · 720a2599a6
parent 02d0de9987 12c39d5421
commit 720a2599a6
1 changed files with 43 additions and 8 deletions
--- a/tsdb/docs/format/chunks.md
+++ b/tsdb/docs/format/chunks.md
@ -29,14 +29,15 @@ in-file offset (lower 4 bytes) and segment sequence number (upper 4 bytes).
 # Chunk
 ```
-┌───────────────┬───────────────────┬──────────────┬────────────────┐
+┌───────────────┬───────────────────┬─────────────┬────────────────┐
-│ len <uvarint> │ encoding <1 byte> │ data <bytes> │ CRC32 <4 byte> │
+│ len <uvarint> │ encoding <1 byte> │ data <data> │ CRC32 <4 byte> │
-└───────────────┴───────────────────┴──────────────┴────────────────┘
+└───────────────┴───────────────────┴─────────────┴────────────────┘
 ```
 Notes:
 * `<uvarint>` has 1 to 10 bytes.
-* `encoding`: Currently either `XOR`, `histogram`, or `float histogram`.
+* `encoding`: Currently either `XOR`, `histogram`, or `floathistogram`, see 
  [code for numerical values](https://github.com/prometheus/prometheus/blob/02d0de9987ad99dee5de21853715954fadb3239f/tsdb/chunkenc/chunk.go#L28-L47).
 * `data`: See below for each encoding.
 ## XOR chunk data
@ -67,9 +68,9 @@ Notes:
 ## Histogram chunk data
 ```
-┌──────────────────────┬──────────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬────────────────┬──────────────────┐
+┌──────────────────────┬──────────────────────────┬───────────────────────────────┬─────────────────────┬──────────────────┬──────────────────┬──────────────────────┬────────────────┬──────────────────┐
-│ num_samples <uint16> │ histogram_flags <1 byte> │ zero_threshold <1 or 9 bytes> │ schema <varbit_int> │ pos_spans <data> │ neg_spans <data> │ samples <data> │ padding <x bits> │
+│ num_samples <uint16> │ histogram_flags <1 byte> │ zero_threshold <1 or 9 bytes> │ schema <varbit_int> │ pos_spans <data> │ neg_spans <data> │ custom_values <data> │ samples <data> │ padding <x bits> │
-└──────────────────────┴──────────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴────────────────┴──────────────────┘
+└──────────────────────┴──────────────────────────┴───────────────────────────────┴─────────────────────┴──────────────────┴──────────────────┴──────────────────────┴────────────────┴──────────────────┘
 ```
 ### Positive and negative spans data:
@ -80,6 +81,16 @@ Notes:
 └─────────────────────────┴────────────────────────┴───────────────────────┴────────────────────────┴───────────────────────┴─────┴────────────────────────┴───────────────────────┘
 ```
 ### Custom values data:
 The `custom_values` data is currently only used for schema -53 (custom bucket boundaries). For other schemas, it is empty (length of zero).
 ```
 ┌──────────────────────────┬──────────────────┬──────────────────┬─────┬──────────────────┐
 │ num_values <varbit_uint> │ value_0 <custom> │ value_1 <custom> │ ... │ value_n <custom> │
 └──────────────────────────┴─────────────────────────────────────┴─────┴──────────────────┘
 ```
 ### Samples data:
 ```
@ -131,7 +142,9 @@ Notes:
  * If 0, it is a single zero byte.
  * If a power of two between 2^-243 and 2^10, it is a single byte between 1 and 254.
  * Otherwise, it is a byte with all bits set (255), followed by a float64, resulting in 9 bytes length.
-* `schema` is a specific value defined by the exposition format. Currently valid values are -4 <= n <= 8.
+* `schema` is a specific value defined by the exposition format. Currently
  valid values are either -4 <= n <= 8 (standard exponential schemas) or -53
  (custom bucket boundaries).
 * `<varbit_int>` is a variable bitwidth encoding for signed integers, optimized for “delta of deltas” of bucket deltas. It has between 1 bit and 9 bytes.
  See [code for details](https://github.com/prometheus/prometheus/blob/8c1507ebaa4ca552958ffb60c2d1b21afb7150e4/tsdb/chunkenc/varbit.go#L31-L60).
 * `<varbit_uint>` is a variable bitwidth encoding for unsigned integers with the same bit-bucketing as `<varbit_int>`.
@ -143,6 +156,28 @@ Notes:
 * The chunk can have as few as one sample, i.e. sample 1 and following are optional.
 * Similarly, there could be down to zero spans and down to zero buckets.
 The `<custom>` encoding within the custom values data depends on the schema.
 For schema -53 (custom bucket boundaries, currently the only use case for
 custom values), the values to encode are bucket boundaries in the form of
 floats. The encoding of a given float value _x_ works as follows:
 1. Create an intermediate value _y_ = _x_ * 1000.
 2. If 0 ≤ _y_ ≤ 33554430 _and_ if the decimal value of _y_ is integer, store
   _y_ + 1 as `<varbit_uint>`.
 3. Otherwise, store a 0 bit, followed by the 64 bit of the original _x_
   encoded as plain `<float64>`.
 Note that values stored as per (2) will always start with a 1 bit, which allow
 decoders to recognize this case in contrast to values stores as per (3), which
 always start with a 0 bit.
 The rational behind this encoding is that most custom bucket boundaries are set
 by humans as decimal numbers with not very many decimal places. In most cases,
 the encoding will therefore result in a short varbit representation. The upper
 bound of 33554430 is picked so that the varbit encoded value will take at most
 4 bytes.
 ## Float histogram chunk data
 Float histograms have the same layout as histograms apart from the encoding of samples.