* promql: Allow injecting fake tokens into the generated parser
Yacc grammars do not support having multiple start symbols.
To work around that restriction, it is possible to inject fake tokens into the lexer stream,
as described in the Bison manual: https://www.gnu.org/software/bison/manual/html_node/Multiple-start_002dsymbols.html
This is part of the parser rewrite effort described in #6256.
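A minimal sketch of the mechanism, assuming a goyacc-generated parser; the
names here (ItemType, yySymType, NextItem, START_EXPRESSION) are
illustrative, not the exact ones used:

    // The parser stores one fake token and hands it to the generated
    // parser before any real input; the fake token selects the start rule.
    type parser struct {
        lex       *lexer
        inject    ItemType // e.g. a hypothetical START_EXPRESSION token
        injecting bool
    }

    // Lex implements the lexer interface expected by the goyacc output.
    func (p *parser) Lex(lval *yySymType) int {
        if p.injecting {
            p.injecting = false
            return int(p.inject) // fake token, never present in the input
        }
        p.lex.NextItem(&lval.item)
        return int(lval.item.Typ)
    }

The grammar can then dispatch on the fake token, e.g.
start: START_EXPRESSION expr | START_METRIC metric ;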
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
For yacc-generated parsers, the convention is to capitalize the names of item types provided by the lexer, which makes it easy to distinguish lexer tokens (capitalized) from nonterminal symbols (not capitalized) in language grammars.
This convention is also followed by the (non generated) go compiler (see https://golang.org/pkg/go/token/#Token).
Part of the parser rewrite described in #6256.
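A hypothetical illustration of the resulting naming:

    ADD, EQL, IDENTIFIER    // lexer tokens (capitalized)
    expr, vector_selector   // nonterminal symbols (not capitalized)

so a grammar rule such as `expr: expr ADD expr` makes immediately clear
which symbols come from the lexer.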
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
Before this commit, the PromQL parser ran in two goroutines:
* The lexer goroutine, which split the input into tokens and sent them over a channel to
* the parser goroutine, which produced the abstract syntax tree.
The problem with this approach was that the parser spent more time on goroutine creation
and synchronisation than on actual parsing.
This commit removes that concurrency and replaces the channel with a slice-based buffer.
Benchmarks show that this makes the parser up to 7 times faster than before.
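A minimal sketch of the new shape, with hypothetical names; the real code
differs in detail:

    // The lexer appends tokens to a slice instead of sending them over
    // a channel from a separate goroutine.
    func (l *lexer) emit(t ItemType) {
        l.items = append(l.items, item{typ: t, pos: l.start, val: l.input[l.start:l.pos]})
        l.start = l.pos
    }

    // The parser pulls tokens from the buffer; when it runs ahead of the
    // lexer, the lexer scans further input synchronously (no goroutines,
    // no channel synchronisation).
    func (p *parser) next() item {
        if p.pos >= len(p.lexer.items) {
            p.lexer.scanItem() // appends at least one item (EOF included)
        }
        it := p.lexer.items[p.pos]
        p.pos++
        return it
    }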
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
* Expose lexer item types
We have generally agreed to expose AST types / values that are necessary
to make sense of the AST outside of the promql package. Currently the
`UnaryExpr`, `BinaryExpr`, and `AggregateExpr` AST nodes store the lexer
item type to indicate the operator type, but since the individual item
types aren't exposed, an external user of the package cannot determine
the operator type. So this PR exposes them.
Although not all item types are required to make sense of the AST (some
are really only used in the lexer), I decided to expose them all here to
be somewhat more consistent. Another option would be to not use lexer
item types at all in AST nodes.
The concrete motivation is my work on the PromQL->Flux transpiler, but
this ought to be useful for other cases as well.
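As a sketch of what this enables for an external consumer (ParseExpr and
BinaryExpr are the package's existing entry points; the exported constant
name ItemADD is an assumption here):

    import (
        "fmt"

        "github.com/prometheus/prometheus/promql"
    )

    // operatorOf reports the operator item type of a binary expression,
    // something callers outside the package could not inspect before.
    func operatorOf(input string) (promql.ItemType, error) {
        expr, err := promql.ParseExpr(input)
        if err != nil {
            return 0, err
        }
        be, ok := expr.(*promql.BinaryExpr)
        if !ok {
            return 0, fmt.Errorf("not a binary expression")
        }
        return be.Op, nil // comparable against exported constants, e.g. promql.ItemADD
    }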
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix item type names in tests
Signed-off-by: Julius Volz <julius.volz@gmail.com>
When there was an error in the parser, the
lexer goroutine was left running.
Also make the runtime panic test actually test things.
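A sketch of the shape of such a fix, with hypothetical method names:

    // Ensure the lexer goroutine terminates on every exit path,
    // including parse errors.
    func (p *parser) parse() (expr Expr, err error) {
        defer p.lex.close() // stops the goroutine feeding the token channel
        return p.parseExpr()
    }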
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
The current separation between lexer and parser is a bit fuzzy when it
comes to operators, aggregators and other keywords. The lexer already
tries to determine the type of a token, even though that type might
change depending on the context.
This led to the problematic behavior that no tokens known to the lexer
could be used as label names, including operators (and, by, ...),
aggregators (count, quantile, ...) or other keywords (for, offset, ...).
This change additionally checks whether an identifier is one of these
types so that it can still be used as a label name. We might want to
consider whether the specific item identification should be moved from the
lexer to the parser.
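For example, a selector like the following (metric and label names made up)
previously failed to parse because "and" and "by" are known to the lexer:

    some_metric{and="1", by="2"}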
The `unless` set operator can be used to return all vector elements from
the LHS which do not match the elements on the RHS. A use case is to
return all metrics for nodes which do not have a specific role:
node_load1 unless on(instance) chef_role{role="app"}
This has the advantage that the user doesn't need
to list all the labels they want to keep (as with "by"),
but also doesn't have to worry about inconsistent labels
when there's only one time series (as with "keeping_common").
Almost all aggregations should use this rather than the existing
two options, as it's much less error-prone and easier to maintain:
there's no need to always add in "job" plus whatever other common
job-level labels you have, like "region".
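For example (metric and label names made up), instead of enumerating every
label that should be kept:

    sum by (job, region, mode) (node_cpu)

one only lists the labels to drop:

    sum without (instance) (node_cpu)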
This change is breaking; use the 'bool' modifier for such comparisons.
After this change all comparisons without 'bool' will filter, and all
comparisons with 'bool' will return 0/1. This makes the language more
consistent and orthogonal, and ultimately easier to learn and use.
If we ever figure out sane semantics for filtering scalar/scalar
comparisons we can add them in, which will most likely come out of how
the new vector() function is used.
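For example (metric name made up), after this change:

    node_load1 > 1        # filters: only series with a value above 1 remain
    node_load1 > bool 1   # returns 0 or 1 for every series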
This adapts some functionality from the Go standard library for string
literal lexing and unquoting/unescaping.
The following string types are now supported:
Double- or single-quoted strings:
These support all escape sequences that Go supports in double-quoted
string literals. The difference is that Prometheus also has
single-quoted strings (instead of single-quoted runes in Go). Raw
newlines are not allowed.
Backtick-quoted raw strings:
Strings quoted in backticks are treated as raw strings just like in Go
and may contain raw newlines and other special characters directly.
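For example, all of the following are now valid string literals:

    "double-quoted, with escape sequences such as \"...\" and \n"
    'single-quoted, with the same escape sequences'
    `backtick-quoted raw string,
    spanning a raw newline, with no escape processing`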
Fixes https://github.com/prometheus/prometheus/issues/1122
Fixes https://github.com/prometheus/prometheus/issues/1121
When doing comparison operations on vectors, filtering
sometimes gets in the way, and you have to go to a fair bit of
effort to work around it in order to always return a result.
The 'bool' modifier, instead of filtering, returns 0/1 depending
on the result of the comparison.
This is also a prerequisite to removing plain scalar/scalar comparisons,
as it maintains the current behaviour under a new syntax.
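For example, to always get a 0/1 answer instead of a filtered (and possibly
empty) result:

    up == bool 0   # 1 for targets that are down, 0 for those that are up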
This is with `golint -min_confidence=0.5`.
I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
`keep_common` is more in line with the function name
`drop_common_labels()` terminology-wise, and also more in line with
`group_left`/`group_right` (no `...ing` verb suffix).
We could also go all the way and call it `keep_common_labels`. That
would have the benefit of being even more consistent with the function
`drop_common_labels()` and would be more explanatory, but it also seems
quite long.
This commit adds parsing of time series descriptions to the existing
query language parser. Time series descriptions are defined by a
metric followed by a sequence of values.
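A hypothetical description in that notation:

    http_requests_total{job="api", instance="a"} 1 2 3 4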