This change enables the PrometheusRemoteWriteBehind alert’s expression to be evaluated
even when the remote endpoint has never been reached. As a result, PrometheusRemoteWriteBehind
now fires in that case, which makes it easy to detect configuration mistakes (such as
incorrect endpoint URLs) or unrecoverable connectivity issues.
See https://github.com/prometheus/prometheus/issues/14350 for details.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
The RunBuiltinTests function accepts a concrete type, which makes
it hard to exclude certain tests from the suite. It would be great
if we could skip tests which might not be critical in order to unblock
updates.
By accepting an interface instead, we can inject a custom implementation
which skips selected test cases.
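A minimal sketch of the idea, not the actual change: the interface name
TestRunner, the skippingRunner type, and the runSuite stand-in for
RunBuiltinTests are all hypothetical and only illustrate how an injected
implementation can drop selected cases.

```go
package main

import "fmt"

// TestRunner is a hypothetical interface standing in for whatever the
// suite accepts instead of a concrete type.
type TestRunner interface {
	Run(name string, fn func()) bool
}

// skippingRunner runs every case except the ones listed in skip.
type skippingRunner struct {
	skip map[string]bool
	ran  []string // names of the cases that were actually executed
}

func (r *skippingRunner) Run(name string, fn func()) bool {
	if r.skip[name] {
		return true // treat skipped cases as passing
	}
	fn()
	r.ran = append(r.ran, name)
	return true
}

// runSuite is a hypothetical stand-in for RunBuiltinTests. It only
// depends on the interface, so any implementation can be injected.
func runSuite(r TestRunner) {
	for _, name := range []string{"sum", "avg", "histogram"} {
		r.Run(name, func() {})
	}
}

func main() {
	r := &skippingRunner{skip: map[string]bool{"histogram": true}}
	runSuite(r)
	fmt.Println(r.ran) // prints "[sum avg]"
}
```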
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Same idea as for the avg aggregator before: Most of the time, there is
no overflow, so we don't have to revert to the more expensive and less
precise incremental calculation of the mean value.
Signed-off-by: beorn7 <beorn@grafana.com>
The calculation of the mean value in avg_over_time is performed in an
incremental fashion. This introduces additional numerical errors that
even Kahan summation cannot compensate for, but at least we can use the
Kahan-corrected mean value wherever the intermediate mean value enters
the calculation.
Signed-off-by: beorn7 <beorn@grafana.com>
The basic idea here is that the previous code was always doing
incremental calculation of the mean value, which is more costly and
can be less precise. It protects against overflows, but in most cases,
an overflow doesn't happen anyway.
The other idea applied here is to expand on #14074, where Kahan
summation was applied to sum().
With this commit, the average is calculated in a conventional way
(adding everything up and dividing at the end) as long as the sum isn't
overflowing float64. This is combined with Kahan summation so that the
avg aggregation, in most cases, is really equivalent to the sum
aggregation with a following division (which is the user's expectation
as avg is supposed to be syntactic sugar for sum with a following
division).
If the sum hits ±Inf, the calculation reverts to incremental
calculation of the mean value. Kahan summation is also applied here,
although it cannot fully compensate for the numerical errors
introduced by the incremental mean calculation. (The tests added in
this commit would fail if incremental mean calculation were always
used.)
Signed-off-by: beorn7 <beorn@grafana.com>
The optimizer which detects cases where histogram buckets can be skipped
does not take into account binary expressions. This can lead to buckets
not being decoded if a metric is used with both histogram_fraction/quantile and
histogram_sum/count in the same expression.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
* Revert "fix bug that would cause us to endlessly fall behind (#13583)"
This reverts commit 0c71230784.
(leaving the new test in place)
* TSDB: enhance TestRun_AvoidNotifyWhenBehind
With code suggested by @cstyan in #14439.
* WAL watcher: add back log line showing current segment
---------
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* split warnings and info annotations in API response
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
* update according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
* minimal UI change: show infos in different colour
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
* Update web/ui/react-app/src/pages/graph/Panel.tsx
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
---------
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Clear caches by restarting scraping loops: each loop assumes it has
exclusive ownership of its cache, so we can't come in from another
goroutine and change it.
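A simplified sketch of this ownership pattern, assuming nothing about the
real scrape loop: the loop type, its fields, and the restart helper are
hypothetical. The cache is only ever touched by the loop's own goroutine,
so the way to clear it is to stop the loop and start a fresh one.

```go
package main

import "fmt"

// loop is a hypothetical worker that exclusively owns its cache.
type loop struct {
	cache map[string]int // written only by the goroutine running run()
	in    chan string
	done  chan struct{}
}

func newLoop() *loop {
	l := &loop{
		cache: map[string]int{},
		in:    make(chan string),
		done:  make(chan struct{}),
	}
	go l.run()
	return l
}

func (l *loop) run() {
	defer close(l.done)
	for k := range l.in {
		l.cache[k]++
	}
}

// stop ends the loop, waits for its goroutine to exit, and returns the
// final cache size. Reading the cache is safe only after done is closed.
func (l *loop) stop() int {
	close(l.in)
	<-l.done
	return len(l.cache)
}

// restart clears the cache by replacing the loop instead of mutating the
// old loop's cache from another goroutine.
func restart(old *loop) *loop {
	old.stop()
	return newLoop()
}

func main() {
	l := newLoop()
	l.in <- "a"
	l.in <- "b"
	l = restart(l)
	fmt.Println(l.stop()) // prints "0": the new loop starts with an empty cache
}
```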
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>