prometheus/model
Lukasz Mierzwa 565c6fa704 Reduce stringlabels memory usage for common labels
stringlabels stores all time series labels as a single string using this format:

<length><name><length><value>[<length><name><length><value> ...]

So a label set for my_metric{job="foo", instance="bar", env="prod", blank=""} would be encoded as:

[8]__name__[9]my_metric[3]job[3]foo[8]instance[3]bar[3]env[4]prod[5]blank[0]

This is a huge improvement over the 'classic' labels implementation, which stores all label names & values as separate strings.
There is some room for improvement though, since some strings are present more often than others.
For example, __name__ will be present in the label set of every time series we store in HEAD, eating 1+8=9 bytes each time.
Since __name__ is a well-known string, we can try to use a single-byte index to store it in our encoded string, rather than repeat it in full each time.
To store strings that are shortened to a single byte, we need to signal that to the reader of the encoded string. For that
we use the fact that zero-length strings are rare and generally not stored on time series. An encoded string with zero length
now signals that it represents a mapped value: to learn the true value of this string we need to read the next byte,
which gives us an index into a static mapping. That mapping must include the empty string, so that we can still encode empty strings using this scheme.

Example of our mapping (minimal version):

0: ""
1: "__name__"
2: "instance"
3: "job"

With that mapping our example label set would be encoded as:

[0]1[9]my_metric[0]3[3]foo[0]2[3]bar[3]env[4]prod[5]blank[0]0

The tricky bit is how to populate this mapping with useful strings that will result in measurable memory savings.
This is further complicated by the fact that the mapping must remain static and cannot be modified while Prometheus is running.
We can fill all 255 slots we have in our mapping byte with well-known generic strings; that will
provide some measurable savings for all Prometheus users, and is essentially a slightly more compact stringlabels variant.
We could also allow users to pass in a list of well-known strings via flags, which would allow Prometheus operators
to reduce memory usage for any labels they know to be popular.
A third option is to discover the most popular strings from the TSDB or WAL on startup, but that's more complicated, and
the list that is the best set of mapped strings on startup might, after some time, no longer be
the best set.

Benchmark results:

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                                                 │   main.txt   │                new1.txt                │
                                                 │    sec/op    │    sec/op      vs base                 │
String-20                                           863.8n ± 4%    873.0n ±  4%         ~ (p=0.353 n=10)
Labels_Get/with_5_labels/first_label/get-20         4.763n ± 1%    5.035n ±  0%    +5.72% (p=0.000 n=10)
Labels_Get/with_5_labels/first_label/has-20         3.439n ± 0%    3.967n ±  0%   +15.37% (p=0.000 n=10)
Labels_Get/with_5_labels/middle_label/get-20        7.077n ± 1%    9.588n ±  1%   +35.47% (p=0.000 n=10)
Labels_Get/with_5_labels/middle_label/has-20        5.166n ± 0%    6.990n ±  1%   +35.30% (p=0.000 n=10)
Labels_Get/with_5_labels/last_label/get-20          9.181n ± 1%   12.970n ±  1%   +41.26% (p=0.000 n=10)
Labels_Get/with_5_labels/last_label/has-20          8.101n ± 1%   11.640n ±  1%   +43.69% (p=0.000 n=10)
Labels_Get/with_5_labels/not-found_label/get-20     3.974n ± 0%    4.768n ±  0%   +19.98% (p=0.000 n=10)
Labels_Get/with_5_labels/not-found_label/has-20     3.974n ± 0%    5.033n ±  0%   +26.65% (p=0.000 n=10)
Labels_Get/with_10_labels/first_label/get-20        4.761n ± 0%    5.042n ±  0%    +5.90% (p=0.000 n=10)
Labels_Get/with_10_labels/first_label/has-20        3.442n ± 0%    3.972n ±  0%   +15.40% (p=0.000 n=10)
Labels_Get/with_10_labels/middle_label/get-20       10.62n ± 1%    14.85n ±  1%   +39.83% (p=0.000 n=10)
Labels_Get/with_10_labels/middle_label/has-20       9.360n ± 1%   13.375n ±  0%   +42.90% (p=0.000 n=10)
Labels_Get/with_10_labels/last_label/get-20         18.19n ± 1%    22.00n ±  0%   +20.97% (p=0.000 n=10)
Labels_Get/with_10_labels/last_label/has-20         16.51n ± 0%    20.50n ±  1%   +24.14% (p=0.000 n=10)
Labels_Get/with_10_labels/not-found_label/get-20    3.985n ± 0%    4.768n ±  0%   +19.62% (p=0.000 n=10)
Labels_Get/with_10_labels/not-found_label/has-20    3.973n ± 0%    5.045n ±  0%   +26.97% (p=0.000 n=10)
Labels_Get/with_30_labels/first_label/get-20        4.773n ± 0%    5.050n ±  1%    +5.80% (p=0.000 n=10)
Labels_Get/with_30_labels/first_label/has-20        3.443n ± 1%    3.976n ±  2%   +15.50% (p=0.000 n=10)
Labels_Get/with_30_labels/middle_label/get-20       31.93n ± 0%    43.50n ±  1%   +36.21% (p=0.000 n=10)
Labels_Get/with_30_labels/middle_label/has-20       30.53n ± 0%    41.75n ±  1%   +36.75% (p=0.000 n=10)
Labels_Get/with_30_labels/last_label/get-20        106.55n ± 0%    71.17n ±  0%   -33.21% (p=0.000 n=10)
Labels_Get/with_30_labels/last_label/has-20        104.70n ± 0%    69.21n ±  1%   -33.90% (p=0.000 n=10)
Labels_Get/with_30_labels/not-found_label/get-20    3.976n ± 1%    4.772n ±  0%   +20.03% (p=0.000 n=10)
Labels_Get/with_30_labels/not-found_label/has-20    3.974n ± 0%    5.032n ±  0%   +26.64% (p=0.000 n=10)
Labels_Equals/equal-20                              2.382n ± 0%    2.446n ±  0%    +2.67% (p=0.000 n=10)
Labels_Equals/not_equal-20                         0.2741n ± 2%   0.2662n ±  2%    -2.88% (p=0.001 n=10)
Labels_Equals/different_sizes-20                   0.2762n ± 3%   0.2652n ±  0%    -3.95% (p=0.000 n=10)
Labels_Equals/lots-20                               2.381n ± 0%    2.386n ±  1%    +0.23% (p=0.011 n=10)
Labels_Equals/real_long_equal-20                    6.087n ± 1%    5.558n ±  1%    -8.70% (p=0.000 n=10)
Labels_Equals/real_long_different_end-20            5.030n ± 0%    4.699n ±  0%    -6.57% (p=0.000 n=10)
Labels_Compare/equal-20                             4.814n ± 1%    4.777n ±  0%    -0.77% (p=0.000 n=10)
Labels_Compare/not_equal-20                         17.55n ± 8%    20.92n ±  1%   +19.24% (p=0.000 n=10)
Labels_Compare/different_sizes-20                   3.711n ± 1%    3.707n ±  0%         ~ (p=0.224 n=10)
Labels_Compare/lots-20                              27.09n ± 3%    28.73n ±  2%    +6.05% (p=0.000 n=10)
Labels_Compare/real_long_equal-20                   27.91n ± 3%    15.67n ±  1%   -43.86% (p=0.000 n=10)
Labels_Compare/real_long_different_end-20           33.92n ± 1%    35.35n ±  1%    +4.22% (p=0.000 n=10)
Labels_Hash/typical_labels_under_1KB-20             59.63n ± 0%    59.67n ±  0%         ~ (p=0.897 n=10)
Labels_Hash/bigger_labels_over_1KB-20               73.42n ± 1%    73.81n ±  1%         ~ (p=0.342 n=10)
Labels_Hash/extremely_large_label_value_10MB-20     720.3µ ± 2%    715.2µ ±  3%         ~ (p=0.971 n=10)
Builder-20                                          371.6n ± 4%   1191.0n ±  3%  +220.46% (p=0.000 n=10)
Labels_Copy-20                                      85.52n ± 4%    53.90n ± 48%   -36.97% (p=0.000 n=10)
geomean                                             13.26n         14.68n         +10.71%

                                                 │   main.txt   │               new1.txt               │
                                                 │     B/op     │    B/op     vs base                  │
String-20                                          240.0 ± 0%     240.0 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/get-20        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/has-20        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/get-20       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/has-20       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/get-20         0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/has-20         0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/get-20    0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/has-20    0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/get-20       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/has-20       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/get-20      0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/has-20      0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/get-20        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/has-20        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/get-20   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/has-20   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/get-20       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/has-20       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/get-20      0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/has-20      0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/get-20        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/has-20        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/get-20   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/has-20   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/equal-20                             0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/not_equal-20                         0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/different_sizes-20                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/lots-20                              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_equal-20                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_different_end-20           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/equal-20                            0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/not_equal-20                        0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/different_sizes-20                  0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/lots-20                             0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_equal-20                  0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_different_end-20          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/typical_labels_under_1KB-20            0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/bigger_labels_over_1KB-20              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/extremely_large_label_value_10MB-20    0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Builder-20                                         224.0 ± 0%     192.0 ± 0%  -14.29% (p=0.000 n=10)
Labels_Copy-20                                     224.0 ± 0%     192.0 ± 0%  -14.29% (p=0.000 n=10)
geomean                                                       ²                -0.73%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                 │   main.txt   │              new1.txt               │
                                                 │  allocs/op   │ allocs/op   vs base                 │
String-20                                          1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/get-20        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/has-20        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/get-20       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/has-20       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/get-20         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/has-20         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/get-20    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/has-20    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/get-20       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/has-20       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/get-20      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/has-20      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/get-20        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/has-20        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/get-20   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/has-20   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/get-20       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/has-20       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/get-20      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/has-20      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/get-20        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/has-20        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/get-20   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/has-20   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Equals/equal-20                             0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Equals/not_equal-20                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Equals/different_sizes-20                   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Equals/lots-20                              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_equal-20                   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_different_end-20           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Compare/equal-20                            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Compare/not_equal-20                        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Compare/different_sizes-20                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Compare/lots-20                             0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_equal-20                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_different_end-20          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Hash/typical_labels_under_1KB-20            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Hash/bigger_labels_over_1KB-20              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Hash/extremely_large_label_value_10MB-20    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Builder-20                                         1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Labels_Copy-20                                     1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                       ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-02-20 12:06:33 +00:00
..
Name       Last commit                                                                     Date
exemplar   update links to openmetrics to reference the v1.0.0 release                     2024-12-13 21:32:27 +00:00
histogram  fix TestCuttingNewHeadChunks/really_large_histograms on 32-bit                  2024-12-16 10:45:01 -05:00
labels     Reduce stringlabels memory usage for common labels                              2025-02-20 12:06:33 +00:00
metadata   Fix: metadata API using wrong field names (#13633)                              2024-02-26 09:53:39 +00:00
relabel    Addressed comments.                                                             2025-01-27 09:54:13 +00:00
rulefmt    rulefmt: support YAML aliases for Alert/Record/Expr (#14957)                    2025-02-13 20:48:33 +11:00
textparse  textparse: Optimized protobuf parser with custom streaming unmarshal. (#15731)  2025-02-13 10:38:35 +00:00
timestamp  Move packages out of deprecated pkg directory                                   2021-11-09 08:03:10 +01:00
value      Move packages out of deprecated pkg directory                                   2021-11-09 08:03:10 +01:00