This is related to #454. Queries now time out after a duration set by
the -query.timeout flag. The TotalEvalTimer is now started/stopped
inside any of the ast.Eval* functions.
The second isCounter argument to delta is ugly; make it optional as the first
step of deprecating it. This will make delta only ever apply to gauges.
Add a deriv function to calculate the least squares
slope of a gauge. This is more useful for prediction than delta,
as it isn't as heavily influenced by outliers at the boundaries.
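
As a rough illustration of the idea (not the code in this change), a least
squares slope over a window of samples can be sketched as follows. The
sample type and the function name leastSquaresSlope are hypothetical and
exist only for this example; timestamps are assumed to be in seconds.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair used only in this sketch.
type sample struct {
	t float64 // timestamp in seconds (assumption)
	v float64 // gauge value
}

// leastSquaresSlope returns the slope of the ordinary least squares
// regression line through the samples, in value units per second.
// It returns 0 when the slope is undefined (fewer than two samples,
// or all timestamps identical).
func leastSquaresSlope(samples []sample) float64 {
	n := float64(len(samples))
	if n < 2 {
		return 0
	}
	var sumT, sumV, sumTV, sumTT float64
	for _, s := range samples {
		sumT += s.t
		sumV += s.v
		sumTV += s.t * s.v
		sumTT += s.t * s.t
	}
	denom := n*sumTT - sumT*sumT
	if denom == 0 {
		return 0
	}
	return (n*sumTV - sumT*sumV) / denom
}

func main() {
	// A gauge rising by 2 per second, with an outlier at the right boundary.
	samples := []sample{{0, 0}, {1, 2}, {2, 4}, {3, 6}, {4, 20}}
	first, last := samples[0], samples[len(samples)-1]
	fmt.Println("least squares slope:", leastSquaresSlope(samples))
	fmt.Println("boundary-only slope:", (last.v-first.v)/(last.t-first.t))
}
```

The boundary-only slope uses just the first and last sample, so an outlier
at either end affects it more strongly than it affects the regression slope.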
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
It turned out in the end that only drop_common_metrics() produced any
erroneous output in the old system. The second expression in the test
("sum(testmetric) keeping_extra") already worked in the old code, but
why not keep it in...
The way to test ranged evaluations is a bit clumsy so far, so I want to
build a nicer test framework in the end, where all the test cases can be
specified as text files which specify desired inputs, outputs, query
step widths, etc.
Change-Id: I821859789e69b8232bededf670a1b76e9e8c8ca4
Essentially:
- Remove unused code.
- Make it 'go vet' clean. The only remaining warnings are in generated code.
- Make it 'golint' clean. The only remaining warnings are in generated code.
- Smoothed out some minor things.
Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
(to create iterators).
- Turned numMemChunkDescs into a metric.
Change-Id: I29be963c795a075ec00c095f76bf26405535609d
A common problem in Prometheus alerting is to detect when no timeseries
exist for a given metric name and label combination. Unfortunately,
Prometheus alert expressions need to be of vector type, and
"count(nonexistent_metric)" results in an empty vector, yielding no
output vector elements to base an alert on. The newly introduced
absent() function solves this issue:
ALERT FooAbsent IF absent(foo{job="myjob"}) [...]
absent() has the following behavior:
- if the vector passed to it has any elements, it returns an empty
vector.
- if the vector passed to it has no elements, it returns a 1-element
vector with the value 1.
In the second case, absent() tries to be smart about deriving labels of
the 1-element output vector from the input vector:
absent(nonexistent{job="myjob"}) => {job="myjob"}
absent(nonexistent{job="myjob",instance=~".*"}) => {job="myjob"}
absent(sum(nonexistent{job="myjob"})) => {}
That is, if the passed vector is a literal vector selector, it takes all
"=" label matchers as the basis for the output labels, but ignores all
non-equals or regex matchers. Also, if the passed vector results from a
non-selector expression, no labels can be derived.
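
The label derivation rule described above can be sketched roughly as
follows. This is only an illustration under assumptions: the types
matchType and labelMatcher and the function absentLabels are hypothetical
names, and the matcher list is assumed to contain only the matchers written
inside the braces (not the metric name).

```go
package main

import "fmt"

// matchType and labelMatcher are hypothetical stand-ins for the real
// matcher types; they exist only for this sketch.
type matchType int

const (
	equal matchType = iota
	notEqual
	regexMatch
	regexNoMatch
)

type labelMatcher struct {
	typ   matchType
	name  string
	value string
}

// absentLabels derives the labels of the 1-element output vector.
// isSelector is false for non-selector expressions such as sum(...),
// in which case no labels can be derived.
func absentLabels(isSelector bool, matchers []labelMatcher) map[string]string {
	labels := map[string]string{}
	if !isSelector {
		return labels
	}
	for _, m := range matchers {
		// Only "=" matchers carry a single well-defined label value.
		if m.typ == equal {
			labels[m.name] = m.value
		}
	}
	return labels
}

func main() {
	matchers := []labelMatcher{
		{equal, "job", "myjob"},
		{regexMatch, "instance", ".*"},
	}
	fmt.Println(absentLabels(true, matchers))  // selector argument
	fmt.Println(absentLabels(false, matchers)) // e.g. wrapped in sum(...)
}
```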
Change-Id: I948505a1488d50265ab5692a3286bd7c8c70cd78
After many transformations, it doesn't make sense to keep the metric
names, since the result of the transformation is no longer that metric.
This drops the metric name after such transformations and makes the web
UI deal well with missing metric names.
On the current branch, this depends on the following:
- prometheus/client_golang needs to be at
e237cf15c6
in branch "julius/int-fingerprints" (to be merged with new storage)
- prometheus/promdash needs to be at
dd7691c9c2
Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d
This removes the dependency on C leveldb and snappy.
It also takes care of fewer dependencies, as they would
not work on any non-Debian, non-Brew system anyway.
Change-Id: Ia70dce1ba8a816a003587927e0b3a3f8ad2fd28c
In addition to the existing by-clause syntax:
sum(<expression>) by (<labels>) [keeping_extra]
...this allows the following new syntax:
sum by (<labels>) [keeping_extra] (<expression>)
Both orderings may be used in a single expression. It is up to the users
to establish guidelines around their usage.
Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992
This allows the following expression syntaxes for selecting timeseries:
foo (already valid before)
foo{} (already valid before)
{job="prometheus"} (new, select all timeseries for job "prometheus")
Omitting both the metric name *and* any label matchers ("" or "{}") will
still yield a syntax error.
To get all timeseries, you could do:
{__name__=~".*"}
or, without relying on knowledge about __name__:
{job=~".*"}
Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda
- Staleness delta is now a proper function parameter and not replicated
from package ast.
- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.
- For the same reason, replaced 'chunkDescs' by '[]*chunkDesc'.
- Verified that math.Modf is not a speed enhancement over conversion
(actually 5x slower).
- Renamed firstTimeField, lastTimeField into chunkFirstTime and
chunkLastTime.
- Verified unpin() is sufficiently goroutine-safe.
- Decided not to update archivedFingerprintToTimeRange upon series
truncation and added a rationale why.
Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
To achieve O(n * log k) runtime, this uses a heap to track the current
bottom-k or top-k elements while iterating over the full set of
available elements.
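
A minimal sketch of the heap technique for the top-k case, using the
standard container/heap package. The names minHeap and topK are
hypothetical and the sketch works on plain float64 values rather than on
the real vector element type. Each input element costs at most O(log k) of
heap work.

```go
package main

import (
	"container/heap"
	"fmt"
)

// minHeap keeps the smallest of the currently tracked values at index 0.
type minHeap []float64

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest values in descending order.
func topK(values []float64, k int) []float64 {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, v := range values {
		if h.Len() < k {
			heap.Push(h, v)
		} else if v > (*h)[0] {
			// Replace the smallest tracked value and restore heap order.
			(*h)[0] = v
			heap.Fix(h, 0)
		}
	}
	// Popping yields ascending order, so fill the result back to front.
	out := make([]float64, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(float64)
	}
	return out
}

func main() {
	fmt.Println(topK([]float64{5, 1, 9, 3, 7}, 3))
}
```

The bottom-k case is symmetric: use a max-heap and replace the root
whenever a smaller value arrives.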
It would be possible to reuse more code between topk and bottomk, but I
opted for some more duplication for the sake of clarity.
This fixes https://github.com/prometheus/prometheus/issues/399
Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336
- Always spell out the time unit (e.g. milliseconds instead of ms).
- Remove "_total" from the names of metrics that are not counters.
- Make use of the "Namespace" and "Subsystem" fields in the options.
- Removed the "capacity" facet from all metrics about channels/queues.
These are all fixed via command line flags and will never change
during the runtime of a process. Also, they should not be part of
the same metric family. I have added separate metrics for the
capacity of queues as convenience. (They will never change and are
only set once.)
- I left "metric_disk_latency_microseconds" unchanged, although that
metric measures the latency of the storage device, even if it is not
a spinning disk. "SSD" is read by many as "solid state disk", so
it's not too far off. (It should be "solid state drive", of course,
but "metric_drive_latency_microseconds" is probably confusing.)
- Brian suggested not mixing "failure" and "success" outcomes in the
same metric family (distinguished by labels). For now, I left it as
it is. We are touching some bigger issue here, especially as other
parts in the Prometheus ecosystem are following the same
principle. We still need to come to terms here and then change
things consistently everywhere.
Change-Id: If799458b450d18f78500f05990301c12525197d3
Add a function to bypass the new auto-escaping.
Add a function to work around Go's templates only allowing passing in one argument.
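
For illustration, both kinds of helper can be sketched with the standard
html/template package. The function names safeHTML and args, and the
"argN" map keys, are assumptions made for this example and are not
necessarily the names used in this change.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// funcs holds two hypothetical template helpers.
var funcs = template.FuncMap{
	// safeHTML marks a string as trusted HTML so it is not auto-escaped.
	"safeHTML": func(s string) template.HTML { return template.HTML(s) },
	// args bundles several values into one map so that they can be
	// passed to a template as a single argument.
	"args": func(a ...interface{}) map[string]interface{} {
		m := make(map[string]interface{}, len(a))
		for i, v := range a {
			m[fmt.Sprintf("arg%d", i)] = v
		}
		return m
	},
}

// render parses and executes a template text with the helpers installed.
func render(text string) (string, error) {
	t, err := template.New("t").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	for _, text := range []string{
		`{{ "<b>x</b>" }}`,
		`{{ "<b>x</b>" | safeHTML }}`,
		`{{ with args 1 2 }}{{ .arg0 }}-{{ .arg1 }}{{ end }}`,
	} {
		out, err := render(text)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Bypassing escaping is only safe for strings that are known to be trusted.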
Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99
Move rulemanager to its own package to break circular dependency.
Make NewTestTieredStorage available to tests, remove duplication.
Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:
rule checker -> query layer -> tiered storage layer -> leveldb
This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:
- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
and LevelDB storage.
I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.
The query layers and most other parts of Prometheus now have no notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.
The rule_checker is now a static binary :)
Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
This allows putting a scalar as the first argument of a binary operator
in which the second argument is a vector:
<scalar> <binop> <vector>
For example,
1 / http_requests_total
...will output a vector in which every sample value is 1 divided by the
respective input vector element.
This even works for filter binary operators now:
1 == http_requests_total
Returns a vector with all values set to 1 for every element in
http_requests_total whose initial value was 1.
Note: For filter binary operators, the resulting values are always taken
from the left-hand-side of the operation, no matter whether the scalar
or the vector argument is the left-hand-side. That is,
1 != http_requests_total
...will set all result vector sample values to 1, although these are
exactly the sample elements that were != 1 in the input vector.
If you want to just filter elements without changing their sample
values, you still need to do:
http_requests_total != 1
The new filter form is a bit exotic, and so probably won't be used
often. But it was easier to implement it than to disallow it completely or
change its behavior.
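
The filter semantics described above can be sketched as follows. This is an
illustration only: the sample type and the functions filterScalarVector and
filterVectorScalar are hypothetical names, not the evaluator code in this
change.

```go
package main

import "fmt"

// sample is a hypothetical vector element: a label string plus a value.
type sample struct {
	labels string
	value  float64
}

// filterScalarVector evaluates "<scalar> <filterop> <vector>". Matching
// elements are kept, but their values come from the left-hand side,
// which here is the scalar.
func filterScalarVector(scalar float64, keep func(lhs, rhs float64) bool, vec []sample) []sample {
	var out []sample
	for _, s := range vec {
		if keep(scalar, s.value) {
			out = append(out, sample{labels: s.labels, value: scalar})
		}
	}
	return out
}

// filterVectorScalar evaluates "<vector> <filterop> <scalar>". Matching
// elements keep their original values, since the vector is the
// left-hand side.
func filterVectorScalar(vec []sample, keep func(lhs, rhs float64) bool, scalar float64) []sample {
	var out []sample
	for _, s := range vec {
		if keep(s.value, scalar) {
			out = append(out, s)
		}
	}
	return out
}

func notEqual(lhs, rhs float64) bool { return lhs != rhs }

func main() {
	vec := []sample{{"a", 1}, {"b", 2}, {"c", 3}}
	fmt.Println(filterScalarVector(1, notEqual, vec)) // like: 1 != http_requests_total
	fmt.Println(filterVectorScalar(vec, notEqual, 1)) // like: http_requests_total != 1
}
```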
Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5
There are four label-matching ops for selecting timeseries now:
- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~
Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.
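
A rough sketch of how a list of matchers with these four ops could be
applied to a label set. The names matchType, labelMatcher, newLabelMatcher
and matchesAll are hypothetical and do not mirror the real
metric.LabelMatcher API. Two further assumptions are made here: regexes are
anchored to the whole label value, and a missing label is treated as an
empty string.

```go
package main

import (
	"fmt"
	"regexp"
)

type matchType int

const (
	equal matchType = iota
	notEqual
	regexMatch
	regexNoMatch
)

// labelMatcher is a hypothetical matcher for a single label.
type labelMatcher struct {
	typ   matchType
	name  string
	value string
	re    *regexp.Regexp // set only for the two regex ops
}

// newLabelMatcher compiles the regex up front for the regex ops.
func newLabelMatcher(t matchType, name, value string) (*labelMatcher, error) {
	m := &labelMatcher{typ: t, name: name, value: value}
	if t == regexMatch || t == regexNoMatch {
		// Anchoring is an assumption of this sketch.
		re, err := regexp.Compile("^(?:" + value + ")$")
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// matches reports whether a single label value satisfies the matcher.
func (m *labelMatcher) matches(v string) bool {
	switch m.typ {
	case equal:
		return v == m.value
	case notEqual:
		return v != m.value
	case regexMatch:
		return m.re.MatchString(v)
	case regexNoMatch:
		return !m.re.MatchString(v)
	}
	return false
}

// matchesAll reports whether the label set satisfies every matcher.
func matchesAll(labels map[string]string, matchers []*labelMatcher) bool {
	for _, m := range matchers {
		if !m.matches(labels[m.name]) {
			return false
		}
	}
	return true
}

func main() {
	job, err := newLabelMatcher(equal, "job", "prometheus")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	notDB, err := newLabelMatcher(regexNoMatch, "instance", "db-.*")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	matchers := []*labelMatcher{job, notDB}
	fmt.Println(matchesAll(map[string]string{"job": "prometheus", "instance": "web-1"}, matchers))
	fmt.Println(matchesAll(map[string]string{"job": "prometheus", "instance": "db-1"}, matchers))
}
```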
Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96