Commit graph

249 commits

Author SHA1 Message Date
Goutham Veeramachaneni 2efdf660b1
Increase evaluation failures on Commit() (#8770)
I think we should increment the metric here, we're setting the rule
health anyways. This means even if the "evaluation" suceeded, none of
the samples made it to storage.

This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-04-29 14:28:48 +02:00
Björn Rabenstein 9549a15c6f
Merge pull request #7675 from JessicaGreben/jg/11-retroactive-rule-eval
Add rule importer to backfill
2021-03-29 19:09:21 +02:00
Goutham Veeramachaneni 4b5ab80ca6
[rule] Update rule health for append/commit fails (#8619)
* [rule] Update rule health for append/commit fails

Similar to https://github.com/prometheus/prometheus/pull/8410 will
provide more context.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add test for updating health on append fails

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-03-18 15:44:33 +01:00
jessicagreben 78e84aed89 resolve merge conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-02-24 09:47:29 -08:00
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
jessicagreben 75654715d3 fix panics
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-01 07:54:04 -08:00
jessicagreben 6980bcf671 unexport backfiller
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-10-31 06:40:56 -07:00
jessicagreben 36ac0b68f1 merge master, fix conflicts 2020-10-17 08:20:21 -07:00
Łukasz Mierzwa 19c190b406
Add a rule_group_samples metric (#7977)
This new metric allows tracking how many samples did each rule group generate.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2020-09-25 16:48:38 +01:00
李国忠 4a52faf2ae
Unnecessary go routine spawn. (#7951)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-09-21 11:29:03 +01:00
jessicagreben dfa510086b add alignment, mv rule importer to promtool dir, add queryRange
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-09-13 08:07:59 -07:00
Anonymous 8219b442c8
rules: Remove redundant RLock to avoid double RLock (#7183)
Signed-off-by: BurtonQin <bobbqqin@gmail.com>
2020-09-07 18:58:21 +01:00
johncming 2f2a51a43a
web/api/v1: make names consistent. (#7841)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-25 11:38:06 +01:00
Goutham Veeramachaneni cb830b0a9c
Label rule_group_iterations metric with group name (#7823)
* Label rule_group_iterations metric with group name

evalTotal and evalFailures having the label but iterations not having it
is an odd mismatch.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Remove the metrics when a group is deleted.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Initialise the metrics

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2020-08-19 15:29:13 +02:00
johncming 362080ba28
rules: add evaluationTimestamp when copy state. (#7775)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-14 09:42:13 +01:00
Callum Styan c7a17f6491
Add additional tag/log to rules Manager Eval trace span. (#7708)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-08-06 08:42:20 -07:00
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
Owen Diehl 9ccedc0407
Arbitrary rule & group loading (#7569)
* allows loading rule groups via an interface

Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2020-07-22 15:19:34 +01:00
Julien Pivotto b83cbacbdd
Rule manager: remove blocking channel in mail (#7631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:13:24 +02:00
Frederic Branczyk d17d88935c
rules: Use narrower interface for rule manager loading of for state (#7472)
To load ALERT_FOR_STATE only `storage.Queryable` interface is required,
so this patch uses this narrower interface for to perform this.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2020-06-26 19:06:36 +01:00
Kemal Akkoyun 66dfb951c4
*: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251)
* Add errors and Warnings to SeriesSet

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Change Querier interface and refactor accordingly

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor promql/engine to propagate warnings at eval stage

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Make sure all the series from all Selects are pre-advanced

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Separate merge series sets

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor merge querier failure handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactored and simplified fanout with improvements from incoming chunk iterator PRs.

* Secondary logic is hidden, instead of weird failed series set logic we had.
* Fanout is well commented
* Fanout closing record all errors
* MergeQuerier improved API (clearer)
* deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false).

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix formatting

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix CI issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Added final tests for error handling.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

* Moved hints in populate to be allocated only when needed.
* Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic.
* Select after first Next is done will panic.

NOTE: in lazySeriesSet in theory we could just panic, I think however we can
totally just return error, it will panic in expand anyway.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Utilize errWithWarnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix recently introduced expansion issue

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add tests for secondary querier error handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Implement lazy merge

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add name to test cases

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Reorganize

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove redundant warnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix rebase mistake

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-09 17:57:31 +01:00
Julien Pivotto fc3fb3265a
Merge pull request #7145 from prometheus/release-2.17
Backport release 2.17 into master
2020-04-20 14:08:12 +02:00
Chris Marchbanks a7b449320d
Fix updating rule manager never finishing (#7138)
Rather than sending a value to the done channel on a group to indicate
whether or not to add stale markers to a closing rule group use an
explicit boolean. This allows more functions than just run() to read
from the done channel and fixes an issue where Eval() could consume the
channel during an update, causing run() to never return.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-04-18 14:32:18 +02:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Ben Ye 00730bfee7
add rule_group label to rule evaluation metrics (#7094)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-04-08 23:21:37 +02:00
Muhammad Falak R Wani 2d1a80aa82
rules: manager: clarify doc string for NewGroupMetrics (#7084)
* rules: manager: clarify doc string for NewGroupMetrics

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
2020-04-07 14:06:01 +02:00
Brian Brazil 7646cbca32
Use .UTC everywhere we use time.Unix (#7066)
time.Unix attaches the local timezone, which can then
leak out (e.g. in the alert json). While this is harmless,
we should be consistent.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-29 17:35:39 +01:00
Bartlomiej Plotka c4eefd1b3a storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-20 21:14:43 +01:00
Björn Rabenstein 1da83305be
Merge pull request #7009 from prometheus/release-2.17
Merge release-2.17 into master
2020-03-19 13:46:28 +01:00
Julien Pivotto 8907ba6235 Make TSDB use storage errors
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-17 22:24:25 +01:00
Björn Rabenstein d80b0810c1
Move crucial actions to defer (#6918)
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.

Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.

The current code should not have any bugs, but moving to defer helps
to avoid future bugs.

This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-13 20:54:47 +01:00
Bartlomiej Plotka fe802f29c9 storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-13 13:06:25 +00:00
Julien Pivotto 2051ae2e6a Revert "Fix race condition in Rule manager.Update() function"
This reverts commit 8b11d2cfb6.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-02 08:01:21 +01:00
fuling 8b11d2cfb6 Fix race condition in Rule manager.Update() function
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-01 12:00:51 +01:00
Tobias Guggenmos 4835bbf376
Merge branch 'master' into split_parser 2020-02-19 15:18:13 +01:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Tobias Guggenmos 20b1f596f6 Fix build errors in rest of prometheus
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Julien Pivotto 135cc30063
rules: Make deleted rule series as stale after a reload (#6745)
* rules: Make deleted rule series as stale after a reload

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-12 16:22:18 +01:00
Julien Pivotto f82d55e79f
Refactor and simplify rule_group_interval_seconds (#6711)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-29 11:26:08 +00:00
Julien Pivotto 56ebd5afde Delete prometheus_rule_group metrics when groups are removed (#6693)
* Delete prometheus_rule_group metrics when groups are removed

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 12:41:32 +00:00
Julien Pivotto 5f27ac3583 Refactor query log fields (#6694)
* Refactor query log fields

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 09:53:10 +00:00
Harkishen Singh 84e6459c4d Adds support for line-column numbers for invalid rules, promtool (#6533)
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
2020-01-15 18:07:54 +00:00
Julien Pivotto 0011bba19b Query Log: add origin of the rules (#6592)
* Query Log: add origin of the rules

We don't set rule name and rule kind because the added value would be
quite low, given we have now the file, the group and the query.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-10 08:28:17 +00:00
Julien Pivotto e079c9ed45 manager: add full stops on comments
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-12-19 11:46:22 +01:00
alburthoffman 156dcb8cca avoid stopping rule groups if new rule groups are as same as old rule… (#6450)
* avoid stopping rule groups if new rule groups are as same as old rule groups

Signed-off-by: alburthoffman <alburthoffman@gmail.com>
2019-12-19 10:41:11 +00:00
Simon Pasquier 06066a3619
*: improve error messages when parsing bad rules (#5965)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-28 17:36:48 +02:00
Brian Brazil 2184b79763
Mark deleted rule's series as stale on next evaluation. (#5759)
Fixes #5755

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-08-07 16:11:05 +01:00
beorn7 dd81912554 Add objectives to Summaries
With the next release of client_golang, Summaries will not have
objectives by default. To not lose the objectives we have right now,
explicitly state the current default objectives.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-12 02:03:13 +02:00
pbhudiaBAE 43953b105b Sorting alerts by group name in /alerts (#5448)
* Working group name

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Working categorised by group name

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Changed group sorting in web

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Fixed group sorting and comments

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Fixed group sorting and comments with gofmt

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Added file and group name

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* reverted back to full path to yml file

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>
2019-05-14 23:14:27 +02:00
Yao Zengzeng 5544cb252a fix some mistakes in comments (#5533)
Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>
2019-05-05 10:48:42 +01:00
Simon Pasquier 45506841e6
*: enable all default linters (#5504)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 15:11:28 +02:00
Bjoern Rabenstein 38d518c0fe Rework #5009 after comments
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
James Ravn e15d8c5802 reload: copy state on both name and labels (#5368)
* reload: copy state on both name and labels

Fix https://github.com/prometheus/prometheus/issues/5193

Using just name causes the linked issue - if new rules are inserted with
the same name (but different labels), the reordering will cause stale
markers to be inserted in the next eval for all shifted rules, despite
them not being stale.

Ideally we want to avoid stale markers for time series that still exist
in the new rules, with name and labels being the unique identifer.

This change adds labels to the internal map when copying the old rule
data to the new rule data. This prevents the problem of staling rules
that simply shifted order.

If labels change, it is a new time series and the old series will stale
regardless. So it should be safe to always match on name and labels when
copying state.

Signed-off-by: James Ravn <james@r-vn.org>
2019-03-15 15:23:36 +00:00
David Symonds 46361a7c85 rules: Fix sorting of result from (*Manager).RuleGroups (#5260)
The previous code was defective in that it never sorted groups within a
file due to doing a multi-key sort incorrectly.

Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-02-23 09:51:44 +01:00
beorn7 2db1eeb4ec Fix prometheus_rule_group_last_evaluation_timestamp_seconds
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
Ganesh Vernekar 787eb1e904 Set rule_group_last_duration_seconds to seconds (#5153)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Vishnunarayan K I fd3ef6ba34 Add metric rule_group_rules_loaded to get the number of rules loaded (#5090)
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-13 14:28:07 +00:00
Tom Wilkie 121603c417
Expose rules.NewGroupMetrics and rules.Metrics. (#5059)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-01-03 12:07:06 +00:00
Bartek Płotka de213d4a5e rule manager: Moved metric registration to custom registerer which is already available. (#4961)
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-12-28 10:20:29 +00:00
mknapphrt f0e9196dca Return warnings on a remote read fail (#4832)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2018-11-30 14:27:12 +00:00
Krasi Georgiev 0754e5334b
querier for RestoreForState not closed. (#4922)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:25:17 +02:00
Wei Guo e329cbf673 Add metric prometheus_rule_group_last_evaluation for recording and alerting (#4852)
* add metric prometheus_rule_group_last_evaluation for recording and alerting

Signed-off-by: Wei Guo <me@imkira.com>

* fix issues from comments

Signed-off-by: Wei Guo <me@imkira.com>
2018-11-27 14:38:13 +08:00
Will Hegedus 193ebe7e34 Updates to /targets and /rules (scrape duration, last evaluation time) (#4722)
* Add evaluationTimestamp (Last Evaluation) column to display on /rules
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Add lastScrapeDuration ("Scrape Duration") to display on /targets
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Updates based on Julius' feedback

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Update to set timestamp to when eval started (after eval completes)

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Update /rules to display time since last evaluation

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Re-order Last Eval/Eval Time to be consistent with targets page

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
2018-10-12 18:26:59 +02:00
Ganesh Vernekar 5790d23fd8 Unit testing for rules (#4350)
* Unit testing for rules
* Specifying order of group evaluation in unit tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 17:06:26 +01:00
Chris Marchbanks 63ed9d1b70 Send EndsAt along with alerts (#4550)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-28 16:05:00 +01:00
Chris Marchbanks 87f1dad16d throttle resends of alerts to 1 minute by default (#4538)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-27 17:41:42 +01:00
Julius Volz 8fbe1b5133
Handle a bunch of unchecked errors (#4461)
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-17 17:24:35 +02:00
Julien Pivotto 0b4d22b245 rules/manager: remove a no-longer-relevant comment (#4503)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-08-15 09:33:39 +01:00
Benji Visser 8bb6e0dd6e Show rule evaluation errors on rules page (#4457)
* adding information about the health and errors for Rules

adding Health() and LastError() to the Rule interface. This will allow
us to easily surface information about rules.

Signed-off-by: noqcks <benny@noqcks.io>

* updating rules.html with fields for Rule errors and health state

Signed-off-by: noqcks <benny@noqcks.io>

* fix code comment grammar & access Rule health/error info using a mutex

Signed-off-by: noqcks <benny@noqcks.io>

* s/Errors/Error/ in rules.html to remain consistent with targets.html

Signed-off-by: noqcks <benny@noqcks.io>

* adding periods to code comments in reporting/alerting

Signed-off-by: noqcks <benny@noqcks.io>

* putting health/error below mutex in struct field

Signed-off-by: noqcks <benny@noqcks.io>
2018-08-07 00:33:45 +02:00
Julius Volz 90521a65f8
Remove error return value from NotifyFunc() (#4459)
It's always nil and we also forgot to check it.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar f1db699dff Persist alert 'for' state across restarts (#4061)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Max Leonard Inden 71fafad099
api/v1: Coninue work exposing rules and alerts
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-30 15:31:51 +02:00
Bryan Boreham afdb66dfac Expose Group.CopyState() (#4304)
This makes the `rules` package more useful to projects that use
Prometheus as a library.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-07-18 15:14:38 +02:00
Julius Volz 9e3171f6e3 rules: Minor naming/comment cleanups (#4328)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 04:54:33 +01:00
Alin Sinpalean 9dc763cc03 Run rule evaluation with timestamps precisely evaluation_interval apart (#4201)
* Run rule evaluation with timestamps precisely evaluation_interval apart from one another.

Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>
2018-06-01 15:23:07 +01:00
Mario Trangoni 464e747f1e fix some comments typos (#4059) 2018-04-08 10:51:54 +01:00
Bryan Boreham 93494d8b7e Add an OpenTracing span for each rule (#4027)
* Add an OpenTracing span for each rule

So that tags and child spans can be traced back to the rule that they
refer to.
2018-03-30 21:29:19 +01:00
ferhat elmas ffa673f7d8 General simplifications (#3887)
Another try as in #1516
2018-02-26 07:58:10 +00:00
Fabian Reinartz 7ccd4b39b8 *: implement query params
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
2018-02-13 12:17:22 +01:00
Brian Brazil 30b4439bbd Remove rule_type label from rule metrics.
This is not really needed now that we have rule groups
to distinguish rules.
2017-12-04 11:44:38 +00:00
Brian Brazil b97f4cf48c Add metrics for rule group interval and last duration. 2017-12-04 11:44:38 +00:00
Brian Brazil 0a42a9fc8f Copy over rule group duration on reload.
This is currently getting lost, this will soon be in a metric and we
don't want it dropping to 0 on every reload.
2017-12-04 11:44:38 +00:00
Brian Brazil aa370fa568 Clarify metric names around rule groups.
Make it clear they're about overall rule groups.
2017-12-04 11:44:38 +00:00
Fabian Reinartz 62461379b7 rules: decouple notifier packages
The dependency on the notifier packages caused a transitive dependency
on discovery and with that all client libraries our service discovery
uses.
2017-11-27 16:38:14 +01:00
Fabian Reinartz 4d964a0a0d rules: make glob expansion a concern of main 2017-11-24 08:22:57 +01:00
Fabian Reinartz bd9f7460eb rules: remove config package dependency 2017-11-24 07:57:54 +01:00
Fabian Reinartz 2d0e3746ac rules: remove dependency on promql.Engine 2017-11-24 07:57:54 +01:00
Goutham Veeramachaneni a880c86375
Fix unexported method on exported interface.
Also move to model.Duration

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-23 19:13:57 +05:30
conorbroderick 55aaece116 Add rule evaluation time 2017-11-22 15:22:02 +00:00
Goutham Veeramachaneni e1117715fe
rules: remove skipped iterations cuz no throttling
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-14 17:33:00 +05:30
Krasi Georgiev e86d82ad2d Fix regression of alert rules state loss on config reload. (#3382)
* incorrect map name for the group prevented copying state from existing alert rules on config reload

* applyConfig test

* few nits

* nits 2
2017-11-01 12:58:00 +01:00
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Brian Brazil ee88f0d222 Ensure all values are used or _ 2017-10-09 19:44:03 +01:00
Fabian Reinartz 2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
beorn7 c2e9a151ab Make all rule links link to the "Console" tab rather than "Graph"
Clicking on a rule, either the name or the expression, opens the rule
result (or the corresponding expression, repsectively) in the
expression browser. This should by default happen in the console tab,
as, more often than not, displaying it in the graph tab runs into a
timeout.
2017-09-21 18:28:00 +02:00
Fabian Reinartz d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Goutham Veeramachaneni 37e7b69f56
Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni c472316fb3
Check done before every rule evaluation.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:57:22 +05:30
Goutham Veeramachaneni 6b70a4d850
Incorporate PR feedback
* Move fingerprint to Hash()
* Move away from tsdb.MultiError
* 0777 -> 0666 for files
* checkOverflow of extra fields

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:44:33 +05:30