James Ravn
e15d8c5802
reload: copy state on both name and labels ( #5368 )
...
* reload: copy state on both name and labels
Fix https://github.com/prometheus/prometheus/issues/5193
Using just name causes the linked issue - if new rules are inserted with
the same name (but different labels), the reordering will cause stale
markers to be inserted in the next eval for all shifted rules, despite
them not being stale.
Ideally we want to avoid stale markers for time series that still exist
in the new rules, with name and labels being the unique identifier.
This change adds labels to the internal map when copying the old rule
data to the new rule data. This prevents the problem of staling rules
that simply shifted order.
If labels change, it is a new time series and the old series will stale
regardless. So it should be safe to always match on name and labels when
copying state.
Signed-off-by: James Ravn <james@r-vn.org>
2019-03-15 15:23:36 +00:00
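A minimal sketch of the idea behind the "reload: copy state on both name and labels" change above. The `rule` type and the `key` and `copyState` helpers are hypothetical names for illustration and are not taken from the Prometheus source. The sketch only shows why keying on name plus labels keeps state attached to the right rule when a same-named rule is inserted ahead of it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rule is a hypothetical stand-in for a recording or alerting rule.
type rule struct {
	name   string
	labels map[string]string
	state  string // placeholder for whatever is carried across a reload
}

// key builds a map key from the rule name plus its labels in sorted order.
func key(r rule) string {
	names := make([]string, 0, len(r.labels))
	for k := range r.labels {
		names = append(names, k)
	}
	sort.Strings(names)
	var b strings.Builder
	b.WriteString(r.name)
	for _, k := range names {
		b.WriteString("\x00" + k + "\x00" + r.labels[k])
	}
	return b.String()
}

// copyState carries state from old rules to new rules that share the same
// name and labels. Rules with identical keys are matched in order.
func copyState(oldRules, newRules []rule) {
	byKey := map[string][]int{}
	for i, r := range oldRules {
		k := key(r)
		byKey[k] = append(byKey[k], i)
	}
	for i := range newRules {
		k := key(newRules[i])
		idx := byKey[k]
		if len(idx) == 0 {
			continue // no old rule with this name and labels: a new series
		}
		newRules[i].state = oldRules[idx[0]].state
		byKey[k] = idx[1:]
	}
}

func main() {
	oldRules := []rule{
		{name: "job:up:sum", labels: map[string]string{"env": "prod"}, state: "prod-series"},
	}
	// A rule with the same name but different labels is inserted first.
	newRules := []rule{
		{name: "job:up:sum", labels: map[string]string{"env": "dev"}},
		{name: "job:up:sum", labels: map[string]string{"env": "prod"}},
	}
	copyState(oldRules, newRules)
	fmt.Printf("%q %q\n", newRules[0].state, newRules[1].state)
}
```

With a name-only key, the first new rule would have inherited the state of the shifted "prod" rule. With name and labels, the inserted "dev" rule starts empty and the "prod" rule keeps its state.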
David Symonds
46361a7c85
rules: Fix sorting of result from (*Manager).RuleGroups ( #5260 )
...
The previous code was defective in that it never sorted groups within a
file due to doing a multi-key sort incorrectly.
Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-02-23 09:51:44 +01:00
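The multi-key sort mentioned in the "Fix sorting of result from (*Manager).RuleGroups" entry above can be illustrated with a small sketch. The `group` type and `sortGroups` function are hypothetical and do not reproduce the actual fix. The point is only that the secondary key must be compared when, and only when, the primary keys are equal.

```go
package main

import (
	"fmt"
	"sort"
)

// group is a hypothetical stand-in for a rule group.
type group struct{ file, name string }

// sortGroups orders groups by file first and by name within a file.
func sortGroups(gs []group) {
	sort.Slice(gs, func(i, j int) bool {
		if gs[i].file != gs[j].file {
			return gs[i].file < gs[j].file
		}
		// Same file: fall through to the secondary key.
		return gs[i].name < gs[j].name
	})
}

func main() {
	gs := []group{{"b.yml", "x"}, {"a.yml", "z"}, {"a.yml", "y"}}
	sortGroups(gs)
	fmt.Println(gs) // [{a.yml y} {a.yml z} {b.yml x}]
}
```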
beorn7
2db1eeb4ec
Fix prometheus_rule_group_last_evaluation_timestamp_seconds
...
It should be a unix timestamp, not the seconds in the minute.
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
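To illustrate the distinction in the prometheus_rule_group_last_evaluation_timestamp_seconds fix above, here is a short sketch using only the standard `time` package. The `unixSeconds` helper and the `example` value are assumptions made for illustration; the sketch does not show the actual code that was changed.

```go
package main

import (
	"fmt"
	"time"
)

// example is an arbitrary instant used for the demonstration.
var example = time.Date(2019, time.February, 6, 10, 2, 49, 0, time.UTC)

// unixSeconds returns t as fractional seconds since the Unix epoch,
// which is the usual shape of a *_timestamp_seconds gauge value.
func unixSeconds(t time.Time) float64 {
	return float64(t.UnixNano()) / 1e9
}

func main() {
	// Second() is the seconds component within the minute (0-59).
	fmt.Println(example.Second())
	// unixSeconds is the number of seconds since 1970-01-01T00:00:00Z.
	fmt.Println(unixSeconds(example))
}
```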
Ganesh Vernekar
787eb1e904
Set rule_group_last_duration_seconds to seconds ( #5153 )
...
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Matt Layher
302148fd69
*: apply gofmt -s
...
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Vishnunarayan K I
fd3ef6ba34
Add metric rule_group_rules_loaded to get the number of rules loaded ( #5090 )
...
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-13 14:28:07 +00:00
Simon Pasquier
f678e27eb6
*: use latest release of staticcheck ( #5057 )
...
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
Tom Wilkie
121603c417
Expose rules.NewGroupMetrics and rules.Metrics. ( #5059 )
...
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-01-03 12:07:06 +00:00
Tom Wilkie
6e08029b56
Move err to be the last return value from storage.Select. ( #5054 )
...
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-01-02 11:10:13 +00:00
Bartek Płotka
de213d4a5e
rule manager: Moved metric registration to custom registerer which is already available. ( #4961 )
...
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-12-28 10:20:29 +00:00
AixesHunter
fb8479a677
Variable 'labels' collides with imported package name ( #5012 )
...
Signed-off-by: aixeshunter <aixeshunter@gmail.com>
2018-12-19 09:44:03 +00:00
mknapphrt
f0e9196dca
Return warnings on a remote read fail ( #4832 )
...
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2018-11-30 14:27:12 +00:00
Krasi Georgiev
0754e5334b
querier for RestoreForState not closed. ( #4922 )
...
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:25:17 +02:00
Ben Kochie
c6399296dc
Fix spelling/typos ( #4921 )
...
* Fix spelling/typos
Fix spelling/typos reported by codespell/misspell.
* UK -> US spelling changes.
Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-27 17:44:29 +01:00
Wei Guo
e329cbf673
Add metric prometheus_rule_group_last_evaluation for recording and alerting ( #4852 )
...
* add metric prometheus_rule_group_last_evaluation for recording and alerting
Signed-off-by: Wei Guo <me@imkira.com>
* fix issues from comments
Signed-off-by: Wei Guo <me@imkira.com>
2018-11-27 14:38:13 +08:00
Will Hegedus
193ebe7e34
Updates to /targets and /rules (scrape duration, last evaluation time) ( #4722 )
...
* Add evaluationTimestamp (Last Evaluation) column to display on /rules
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Add lastScrapeDuration ("Scrape Duration") to display on /targets
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Updates based on Julius' feedback
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Update to set timestamp to when eval started (after eval completes)
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Update /rules to display time since last evaluation
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Re-order Last Eval/Eval Time to be consistent with targets page
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
2018-10-12 18:26:59 +02:00
Callum Styan
9bca041285
WIP: keep track of samples per query, set a max # of samples ( #4513 )
...
* keep track of samples per query, set a max # of samples that can be in
memory at once
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2018-10-02 12:59:19 +01:00
Ganesh Vernekar
5790d23fd8
Unit testing for rules ( #4350 )
...
* Unit testing for rules
* Specifying order of group evaluation in unit tests
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 17:06:26 +01:00
Ganesh Vernekar
05726c5ea2
Test template expansion while loading groups ( #4537 )
...
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-13 13:55:58 +01:00
Chris Marchbanks
63ed9d1b70
Send EndsAt along with alerts ( #4550 )
...
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-28 16:05:00 +01:00
Chris Marchbanks
87f1dad16d
throttle resends of alerts to 1 minute by default ( #4538 )
...
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-27 17:41:42 +01:00
Goutham Veeramachaneni
f3b7c22827
rules: add comment about lock taking ( #4525 )
...
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2018-08-21 21:30:08 +02:00
Ganesh Vernekar
c663477688
Fixed TestUpdate in rules/manager_test.go ( #4516 )
...
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-20 18:21:05 +05:30
Julius Volz
8fbe1b5133
Handle a bunch of unchecked errors ( #4461 )
...
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-17 17:24:35 +02:00
Ganesh Vernekar
a0a9e7df91
Fix TestForStateRestore ( #4476 ) ( #4512 )
...
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-16 22:56:15 +05:30
Julien Pivotto
0b4d22b245
rules/manager: remove a no-longer-relevant comment ( #4503 )
...
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-08-15 09:33:39 +01:00
Chris Marchbanks
11155c7028
Existing alert labels will update based on templates ( #4500 )
...
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-15 08:52:08 +01:00
Fabian Reinartz
b7e2f407de
rules: Fix double-locking of mutex
...
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-08-07 07:33:39 -04:00
Benji Visser
8bb6e0dd6e
Show rule evaluation errors on rules page ( #4457 )
...
* adding information about the health and errors for Rules
adding Health() and LastError() to the Rule interface. This will allow
us to easily surface information about rules.
Signed-off-by: noqcks <benny@noqcks.io>
* updating rules.html with fields for Rule errors and health state
Signed-off-by: noqcks <benny@noqcks.io>
* fix code comment grammar & access Rule health/error info using a mutex
Signed-off-by: noqcks <benny@noqcks.io>
* s/Errors/Error/ in rules.html to remain consistent with targets.html
Signed-off-by: noqcks <benny@noqcks.io>
* adding periods to code comments in reporting/alerting
Signed-off-by: noqcks <benny@noqcks.io>
* putting health/error below mutex in struct field
Signed-off-by: noqcks <benny@noqcks.io>
2018-08-07 00:33:45 +02:00
Julius Volz
2b8fc062a8
rules: HTML-escape rule YAML marshal errors ( #4464 )
...
This was pointed out by `gosec`.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 14:01:51 +02:00
Julius Volz
90521a65f8
Remove error return value from NotifyFunc() ( #4459 )
...
It's always nil and we also forgot to check it.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar
f1db699dff
Persist alert 'for' state across restarts ( #4061 )
...
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Max Leonard Inden
71fafad099
api/v1: Continue work exposing rules and alerts
...
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-30 15:31:51 +02:00
mg03
31f8ca0dfb
api v1 alerts/rules json endpoint
...
Signed-off-by: mg03 <mgeng03@gmail.com>
2018-07-30 15:29:44 +02:00
Bryan Boreham
afdb66dfac
Expose Group.CopyState() ( #4304 )
...
This makes the `rules` package more useful to projects that use
Prometheus as a library.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-07-18 15:14:38 +02:00
Julius Volz
9e3171f6e3
rules: Minor naming/comment cleanups ( #4328 )
...
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 04:54:33 +01:00
Bryan Boreham
2bd510a63e
Make TestUpdate() do some work ( #4306 )
...
Previously it would set no preconditions and check no postconditions,
as the `groups` member was empty.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-06-22 15:21:04 +01:00
Alin Sinpalean
9dc763cc03
Run rule evaluation with timestamps precisely evaluation_interval apart ( #4201 )
...
* Run rule evaluation with timestamps precisely evaluation_interval apart from one another.
Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>
2018-06-01 15:23:07 +01:00
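One way to keep evaluation timestamps exactly evaluation_interval apart, as described in the entry above, is to derive each timestamp from the previous one instead of from the wall clock. The sketch below is written under that assumption. The `nextEvalTime` function and the `start` and `interval` values are hypothetical and are not taken from the Prometheus source.

```go
package main

import (
	"fmt"
	"time"
)

var (
	start    = time.Date(2018, time.June, 1, 0, 0, 0, 0, time.UTC)
	interval = 15 * time.Second
)

// nextEvalTime advances prev by a whole number of intervals, so the result
// stays on the same grid as prev even when the tick arrives late.
// It assumes interval > 0.
func nextEvalTime(prev, now time.Time, interval time.Duration) time.Time {
	steps := now.Sub(prev) / interval // integer division: whole intervals elapsed
	if steps < 1 {
		steps = 1
	}
	return prev.Add(steps * interval)
}

func main() {
	// The tick fires 40ms late, but the evaluation timestamp stays on the grid.
	now := start.Add(interval + 40*time.Millisecond)
	fmt.Println(nextEvalTime(start, now, interval).Sub(start)) // 15s
}
```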
Mario Trangoni
464e747f1e
fix some comments typos ( #4059 )
2018-04-08 10:51:54 +01:00
Bryan Boreham
93494d8b7e
Add an OpenTracing span for each rule ( #4027 )
...
* Add an OpenTracing span for each rule
So that tags and child spans can be traced back to the rule that they
refer to.
2018-03-30 21:29:19 +01:00
ferhat elmas
ec8e4d8a7c
all: remove unnecessary type conversions ( #3992 )
...
except promql, to avoid creating a conflict with #3966.
2018-03-21 09:25:22 +00:00
Warren Fernandes
58e2a31db8
Cleans up test by removing unused function ( #3969 )
2018-03-15 08:59:19 +00:00
ferhat elmas
ffa673f7d8
General simplifications ( #3887 )
...
Another try as in #1516
2018-02-26 07:58:10 +00:00
Fabian Reinartz
7ccd4b39b8
*: implement query params
...
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
2018-02-13 12:17:22 +01:00
Simon Pasquier
81c0ab69e0
Don't reset FiredAt for inactive alerts
...
Otherwise Alertmanager receives resolved alerts whose StartsAt is zero, which
fails validation.
2018-01-22 17:17:33 +01:00
Brian Brazil
30b4439bbd
Remove rule_type label from rule metrics.
...
This is not really needed now that we have rule groups
to distinguish rules.
2017-12-04 11:44:38 +00:00
Brian Brazil
b97f4cf48c
Add metrics for rule group interval and last duration.
2017-12-04 11:44:38 +00:00
Brian Brazil
0a42a9fc8f
Copy over rule group duration on reload.
...
This is currently getting lost. It will soon be exposed in a metric, and we
don't want it dropping to 0 on every reload.
2017-12-04 11:44:38 +00:00
Brian Brazil
aa370fa568
Clarify metric names around rule groups.
...
Make it clear they're about overall rule groups.
2017-12-04 11:44:38 +00:00
Fabian Reinartz
62461379b7
rules: decouple notifier packages
...
The dependency on the notifier packages caused a transitive dependency
on discovery and with that all client libraries our service discovery
uses.
2017-11-27 16:38:14 +01:00
Fabian Reinartz
4d964a0a0d
rules: make glob expansion a concern of main
2017-11-24 08:22:57 +01:00
Fabian Reinartz
bd9f7460eb
rules: remove config package dependency
2017-11-24 07:57:54 +01:00
Fabian Reinartz
2d0e3746ac
rules: remove dependency on promql.Engine
2017-11-24 07:57:54 +01:00
Fabian Reinartz
2ec5965b75
Merge pull request #3508 from prometheus/uptsdb
...
update TSDB
2017-11-23 19:11:54 +01:00
Fabian Reinartz
83cd270ea4
*: adapt to storage interface changes
2017-11-23 19:05:04 +01:00
Goutham Veeramachaneni
a880c86375
Fix unexported method on exported interface.
...
Also move to model.Duration
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-23 19:13:57 +05:30
conorbroderick
55aaece116
Add rule evaluation time
2017-11-22 15:22:02 +00:00
Goutham Veeramachaneni
e1117715fe
rules: remove skipped iterations because there is no throttling
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-14 17:33:00 +05:30
Jorge Hernández
6cd0f63eb1
Use testutil in rules subpackage ( #3278 )
...
* Use testutil in rules subpackage
* Fix manager test
* Fix rebase
* Change to testutil for applyConfig tests
2017-11-11 11:29:47 +01:00
Krasi Georgiev
e86d82ad2d
Fix regression of alert rules state loss on config reload. ( #3382 )
...
* incorrect map name for the group prevented copying state from existing alert rules on config reload
* applyConfig test
* few nits
* nits 2
2017-11-01 12:58:00 +01:00
Julius Volz
099df0c5f0
Migrate "golang.org/x/net/context" -> "context" ( #3333 )
...
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Brian Brazil
cc5499fcad
Only close after checking for err.
2017-10-09 19:44:03 +01:00
Brian Brazil
ee88f0d222
Ensure all values are used or _
2017-10-09 19:44:03 +01:00
Fabian Reinartz
2d0b8e8b94
Merge branch 'master' into dev-2.0
2017-10-05 13:09:18 +02:00
Julius Volz
f7e8348a88
Re-add contexts to storage.Storage.Querier() ( #3230 )
...
* Re-add contexts to storage.Storage.Querier()
These are needed when replacing the storage by a multi-tenant
implementation where the tenant is stored in the context.
The 1.x query interfaces already had contexts, but they got lost in 2.x.
* Convert promql.Engine to use native contexts
2017-10-04 21:04:15 +02:00
beorn7
c2e9a151ab
Make all rule links link to the "Console" tab rather than "Graph"
...
Clicking on a rule, either the name or the expression, opens the rule
result (or the corresponding expression, respectively) in the
expression browser. This should by default happen in the console tab,
as, more often than not, displaying it in the graph tab runs into a
timeout.
2017-09-21 18:28:00 +02:00
Fabian Reinartz
d21f149745
*: migrate to go-kit/log
2017-09-08 22:01:51 +05:30
Goutham Veeramachaneni
e1fc9dc78d
Move /rules to new format ( #2901 )
...
Fixes #2891
Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-07-08 11:38:02 +02:00
Goutham Veeramachaneni
37e7b69f56
Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni
c472316fb3
Check done before every rule evaluation.
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:57:22 +05:30
Goutham Veeramachaneni
6b70a4d850
Incorporate PR feedback
...
* Move fingerprint to Hash()
* Move away from tsdb.MultiError
* 0777 -> 0666 for files
* checkOverflow of extra fields
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:44:33 +05:30
Goutham Veeramachaneni
507790a357
Rework logging to use explicitly passed logger
...
Mostly cleaned up the global logger use. Still some uses in discovery
package.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni
dc69645e92
Move back to go-yaml
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 10:46:21 +05:30
Goutham Veeramachaneni
5ff283a7b7
Reflect the grouping in the UI
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 16:09:14 +05:30
Goutham Veeramachaneni
8cca666cf2
Add file name to group.
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 15:18:39 +05:30
Goutham Veeramachaneni
e893c89333
Validate labels and annotations
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 15:07:54 +05:30
Goutham Veeramachaneni
a48a018368
Make sure groups are unique in a single file
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 12:19:21 +05:30
Goutham Veeramachaneni
cea1e99f78
Add update-rules command to promtool
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 11:38:54 +05:30
Goutham Veeramachaneni
e8f55669ea
Move rules to new format
...
Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-06-12 18:14:39 +05:30
Brian Brazil
dcea3e4773
Don't append a 0 when alert is no longer pending/firing
...
With staleness we no longer need this behaviour.
2017-05-24 13:52:45 +01:00
Brian Brazil
cc867dae60
Copy previous series and alert state more intelligently.
...
Usually rules don't move around, and if they do it's likely
that rules/alerts with the same name stay in the same order.
If rules/alerts with the same name are added/removed this
could cause a blip for one cycle, but this is unavoidable
without requiring rule and alert names to be unique - which we don't
want to do.
2017-05-24 13:52:45 +01:00
Brian Brazil
9bc68db7e6
Track staleness per rule rather than per group.
2017-05-24 13:52:45 +01:00
Brian Brazil
0451d6d31b
Add unittest for rule staleness, and rules generally.
2017-05-24 13:52:45 +01:00
Brian Brazil
0400f3cfd2
Very basic staleness handling for rules.
2017-05-24 13:52:45 +01:00
Fabian Reinartz
06c2b76cd4
Merge branch 'master' into uptsdb
2017-05-16 16:48:37 +02:00
Alexey Palazhchenko
b0e1ea7c6c
Simplify code, fix typos. ( #2719 )
2017-05-15 09:56:09 +01:00
Julius Volz
ac203ef0ee
Add externalURL template function ( #2716 )
...
This allows users to e.g. add links back to the generating Prometheus
right in their alert templates.
2017-05-13 15:47:04 +02:00
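As a rough illustration of the externalURL template function described above, the following sketch registers a function of that name with Go's standard `text/template` package. The `render` helper, the template text, and the example URL are assumptions made for this sketch. It does not reproduce how Prometheus wires the function into its own template expansion.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render expands text with an externalURL function that returns the given URL.
func render(text, externalURL string) (string, error) {
	funcs := template.FuncMap{
		"externalURL": func() string { return externalURL },
	}
	tmpl, err := template.New("annotation").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, nil); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(`See {{ externalURL }}/alerts`, "http://prometheus.example:9090")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // See http://prometheus.example:9090/alerts
}
```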
Julius Volz
fe11c5933a
Fix mutation of active alert elements by notifier ( #2656 )
...
This caused the external label application in the notifier to bleed back
into the rule manager's active alerting elements.
2017-04-26 10:29:42 -05:00
Fabian Reinartz
8ffc851147
Merge branch 'master' into dev-2.0
2017-04-04 15:17:56 +02:00
Tobias Schmidt
eaf33759fb
Register forgotten prometheus_evaluator_iterations_total metric
2017-04-02 20:32:56 -03:00
Tobias Schmidt
aaaba57184
Export number of missed rule evaluations
...
In case the execution of all rules takes longer than the configured rule
evaluation interval, one or more iterations will be skipped. This needs
to be visible to the operator.
2017-04-02 20:03:28 -03:00
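The skipped-iteration count described in the entry above can be sketched as simple arithmetic on the time since the last evaluation. The `missedIterations` function and the `interval` value are hypothetical. How the count is exported as a metric is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// interval is an example evaluation interval.
var interval = time.Minute

// missedIterations reports how many scheduled evaluations were skipped,
// given the time elapsed since the previous evaluation started.
// It assumes interval > 0.
func missedIterations(elapsed, interval time.Duration) int64 {
	missed := int64(elapsed/interval) - 1
	if missed < 0 {
		return 0
	}
	return missed
}

func main() {
	// An evaluation pass that took three and a half intervals skipped two slots.
	fmt.Println(missedIterations(3*interval+interval/2, interval)) // 2
}
```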
Fabian Reinartz
5772f1a7ba
retrieval/storage: adapt to new interface
...
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Fabian Reinartz
ad9bc62e4c
storage: extend appender and adapt it
2017-01-13 14:48:01 +01:00
Fabian Reinartz
e94b0899ee
rules: fix tests, remove model types
2016-12-29 17:31:14 +01:00
Fabian Reinartz
f8fc1f5bb2
*: migrate ingestion to new batch Appender
2016-12-29 11:03:56 +01:00
Fabian Reinartz
fecf9532b9
*: fix misc compile errors
2016-12-25 11:42:57 +01:00
Fabian Reinartz
622ece6273
*: fix recording tests, migrate matcher types
2016-12-25 11:12:57 +01:00
Fabian Reinartz
5817cb5bde
*: migrate from model.* to promql.* types
2016-12-25 00:37:46 +01:00
Fabian Reinartz
e68a3cf21f
rules: update annotations on each iteration
2016-11-22 15:43:07 +01:00
Jonathan Lange
d78dd3593d
Set evaluation interval on Group construction
...
Prevents having an object in an invalid state, and allows users of the public
API to construct valid Groups.
2016-11-18 16:32:30 +00:00