tl;dr: This is not a fundamental solution to the indexing problem
(like tindex is) but it at least avoids utilizing the intersection
problem to the greatest possible amount.
In more detail:
Imagine the following query:
nicely:aggregating:rule{job="foo",env="prod"}
While it uses a nicely aggregating recording rule (which might have a
very low cardinality), Prometheus still intersects the low number of
fingerprints for `{__name__="nicely:aggregating:rule"}` with the many
thousands of fingerprints matching `{job="foo"}` and with the millions
of fingerprints matching `{env="prod"}`. This totally innocuous query
is dead slow if the Prometheus server has a lot of time series with
the `{env="prod"}` label. Ironically, if you make the query more
complicated, it becomes blazingly fast:
nicely:aggregating:rule{job=~"foo",env=~"prod"}
Why so? Because Prometheus only intersects with non-Equal matchers if
there are no Equal matchers. That's good in this case because it
retrieves the few fingerprints for
`{__name__="nicely:aggregating:rule"}` and then starts right ahead to
retrieve the metric for those FPs and checking individually if they
match the other matchers.
This change is generalizing the idea of when to stop intersecting FPs
and go into "retrieve metrics and check them individually against
remaining matchers" mode:
- First, sort all matchers by "expected cardinality". Matchers
matching the empty string are always worst (and never used for
intersections). Equal matchers are in general consider best, but by
using some crude heuristics, we declare some better than others
(instance labels or anything that looks like a recording rule).
- Then go through the matchers until we hit a threshold of remaining
FPs in the intersection. This threshold is higher if we are already
in the non-Equal matcher area as intersection is even more expensive
here.
- Once the threshold has been reached (or we have run out of matchers
that do not match the empty string), start with "retrieve metrics
and check them individually against remaining matchers".
A beefy server at SoundCloud was spending 67% of its CPU time in index
lookups (fingerprintsForLabelPairs), serving mostly a dashboard that
is exclusively built with recording rules. With this change, it spends
only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26
cores to 18 cores. The median latency for query_range dropped from 14s
to 50ms(!). As expected, higher percentile latency didn't improve that
much because the new approach is _occasionally_ running into the worst
case while the old one was _systematically_ doing so. The 99th
percentile latency is now about as high as the median before (14s)
while it was almost twice as high before (26s).
This commit creates a (so far unused) package. It contains the a custom
lexer/parser for the query language.
ast.go: New AST that interacts well with the parser.
lex.go: Custom lexer (new).
lex_test.go: Lexer tests (new).
parse.go: Custom parser (new).
parse_test.go: Parser tests (new).
functions.go: Changed function type, dummies for parser testing (barely changed/dummies).
printer.go: Adapted from rules/ and adjusted to new AST (mostly unchanged, few additions).
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:
rule checker -> query layer -> tiered storage layer -> leveldb
This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:
- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
and LevelDB storage.
I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.
The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.
The rule_checker is now a static binary :)
Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
There are four label-matching ops for selecting timeseries now:
- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~
Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.
Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96