* Use a prefix trie for long alternate lists
* Add test for non terminal node
* Fix panic in FuzzFastRegexMatcher_WithFuzzyRegularExpressions when the fuzzy regex is invalid
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Address PR feedback
* Update model/labels/regexp_test.go
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Replace trie with slice or map depending on input size
* Fix tests
* Pull in tests from @pracucci's branch
* Add setMatches back in
* Use stringMatcher when it's faster
* Fix linter
* Estimate alternates ahead of time
* Simplify construction with `IndexByte`
* Add test and early return for empty regexp.
* Fix race conditions in tests
---------
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Go spends some time initializing all the elements of these arrays to
zero, so reduce the size from 1024 to 128. This is still much bigger
than we ever expect for a set of labels.
(If someone does have more than 128 labels it will still work, but via
heap allocation.)
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Although we had a different slice, the underlying memory was the same so
any changes meant we could skip some values.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This lets relabelling work on a `Builder` rather than converting to and
from `Labels` on every rule.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The difference is modest, but we've used `slices.Sort` in lots of other
places so why not here.
name old time/op new time/op delta
Builder 1.04µs ± 3% 0.95µs ± 3% -8.27% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Builder 312B ± 0% 288B ± 0% -7.69% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Builder 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.008 n=5+5)
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This makes the buffer the correct size for the common case that labels
have only been added. It will be too large for the case that labels are
changed, but the current buffer resize logic in `appendLabelTo` doubles
the buffer, so a small over-estimate is better.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Optimized very long case insensitive alternations
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Run common regexps in BenchmarkFastRegexMatcher
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Modify BenchmarkNewFastRegexMatcher to benchmark the NewFastRegexMatcher() function
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Reduced allocations by optimizeEqualStringMatchers()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed typo in comments
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Fixed typo in test case name
Signed-off-by: Marco Pracucci <marco@pracucci.com>
---------
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Do not optimize regexps with being/end text anchors inside
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Explicit case for begin/end text in stringMatcherFromRegexpInternal()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added more test cases
Signed-off-by: Marco Pracucci <marco@pracucci.com>
---------
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Optimized FastRegexMatcher when the regex contains a case insensitive alternation made with concats too
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Do not use a pointer to hold whether the matches are case sensitive
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Improved unit tests based on review feedback
Signed-off-by: Marco Pracucci <marco@pracucci.com>
---------
Signed-off-by: Marco Pracucci <marco@pracucci.com>
These benchmarks were testing things related to what Prometheus does, but not testing actual Prometheus code.
Moved the label-copying benchmark into the labels package.
This commit adds an alternate implementation for `labels.Labels`, behind
a build tag `stringlabels`.
Instead of storing label names and values as individual strings, they
are all concatenated into one string in this format:
[len][name0][len][value0][len][name1][len][value1]...
The lengths are varint encoded so usually a single byte.
The previous `[]string` had 24 bytes of overhead for the slice and 16
for each label name and value; this one has 16 bytes overhead plus 1
for each name and value.
In `ScratchBuilder.Overwrite` and `Labels.Hash` we use an unsafe
conversion from string to byte slice. `Overwrite` is explicitly unsafe,
but for `Hash` this is a pure performance hack.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Sometimes label matchers know that they match values with a specific
prefix. This information can be valuable in some downstream storage
implementations.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Instead of passing in a `ScratchBuilder` and `Labels`, just pass the
builder and the caller can extract labels from it. In many cases the
caller didn't use the Labels value anyway.
Now in `Labels.ScratchBuilder` we need a slightly different API: one
to assign what will be the result, instead of overwriting some other
`Labels`. This is safer and easier to reason about.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Without changing the definition of `labels.Labels`, add methods which
enable code using it to work without knowledge of the internals.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
For performance reasons we may use a different implementation of Hash()
in future, so note this so callers can be warned.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* matcher.go: restore comment from upstream
There's no reason to remove this comment.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* head_append.go: Cortex to Mimir
This repo is used in Mimir.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Replacing code which assumes the internal structure of `Labels`.
Add a convenience function `EmptyLabels()` which is more efficient than
calling `New()`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: Add benchmark
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: re-use Builder across relabels
Saves memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* labels.Builder: allow re-use of result slice
This reduces memory allocations where the caller has a suitable slice available.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: re-use source values slice
To reduce memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Unwind one change causing test failures
Restore original behaviour in PopulateLabels, where we must not overwrite the input set.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* relabel: simplify values optimisation
Use a stack-based array for up to 16 source labels, which will be the
vast majority of cases.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* lint
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>