conorbroderick
55aaece116
Add rule evaluation time
2017-11-22 15:22:02 +00:00
Goutham Veeramachaneni
e1117715fe
rules: remove skipped iterations cuz no throttling
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-14 17:33:00 +05:30
Jorge Hernández
6cd0f63eb1
Use testutil in rules subpackage ( #3278 )
...
* Use testutil in rules subpackage
* Fix manager test
* Fix rebase
* Change to testutil for applyConfig tests
2017-11-11 11:29:47 +01:00
Krasi Georgiev
e86d82ad2d
Fix regression of alert rules state loss on config reload. ( #3382 )
...
* incorrect map name for the group prevented copying state from existing alert rules on config reload
* applyConfig test
* few nits
* nits 2
2017-11-01 12:58:00 +01:00
Julius Volz
099df0c5f0
Migrate "golang.org/x/net/context" -> "context" ( #3333 )
...
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Brian Brazil
cc5499fcad
Only close after checking for err.
2017-10-09 19:44:03 +01:00
Brian Brazil
ee88f0d222
Ensure all values are used or _
2017-10-09 19:44:03 +01:00
Fabian Reinartz
2d0b8e8b94
Merge branch 'master' into dev-2.0
2017-10-05 13:09:18 +02:00
Julius Volz
f7e8348a88
Re-add contexts to storage.Storage.Querier() ( #3230 )
...
* Re-add contexts to storage.Storage.Querier()
These are needed when replacing the storage by a multi-tenant
implementation where the tenant is stored in the context.
The 1.x query interfaces already had contexts, but they got lost in 2.x.
* Convert promql.Engine to use native contexts
2017-10-04 21:04:15 +02:00
beorn7
c2e9a151ab
Make all rule links link to the "Console" tab rather than "Graph"
...
Clicking on a rule, either the name or the expression, opens the rule
result (or the corresponding expression, respectively) in the
expression browser. This should by default happen in the console tab,
as, more often than not, displaying it in the graph tab runs into a
timeout.
2017-09-21 18:28:00 +02:00
Fabian Reinartz
d21f149745
*: migrate to go-kit/log
2017-09-08 22:01:51 +05:30
Goutham Veeramachaneni
e1fc9dc78d
Move /rules to new format ( #2901 )
...
Fixes #2891
Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-07-08 11:38:02 +02:00
Goutham Veeramachaneni
37e7b69f56
Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni
c472316fb3
Check done before every rule evaluation.
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:57:22 +05:30
Goutham Veeramachaneni
6b70a4d850
Incorporate PR feedback
...
* Move fingerprint to Hash()
* Move away from tsdb.MultiError
* 0777 -> 0666 for files
* checkOverflow of extra fields
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:44:33 +05:30
Goutham Veeramachaneni
507790a357
Rework logging to use explicitly passed logger
...
Mostly cleaned up the global logger use. Still some uses in discovery
package.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni
dc69645e92
Move back to go-yaml
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 10:46:21 +05:30
Goutham Veeramachaneni
5ff283a7b7
Reflect the grouping in the UI
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 16:09:14 +05:30
Goutham Veeramachaneni
8cca666cf2
Add file name to group.
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 15:18:39 +05:30
Goutham Veeramachaneni
e893c89333
Validate labels and annotations
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 15:07:54 +05:30
Goutham Veeramachaneni
a48a018368
Make sure groups are unique in a single file
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 12:19:21 +05:30
Goutham Veeramachaneni
cea1e99f78
Add update-rules command to promtool
...
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 11:38:54 +05:30
Goutham Veeramachaneni
e8f55669ea
Move rules to new format
...
Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-06-12 18:14:39 +05:30
Brian Brazil
dcea3e4773
Don't append a 0 when alert is no longer pending/firing
...
With staleness we no longer need this behaviour.
2017-05-24 13:52:45 +01:00
Brian Brazil
cc867dae60
Copy previous series and alert state more intelligently.
...
Usually rules don't move around, and if they do it's likely
that rules/alerts with the same name stay in the same order.
If rules/alerts with the same name are added/removed this
could cause a blip for one cycle, but this is unavoidable
without requiring rule and alert names to be unique - which we don't
want to do.
2017-05-24 13:52:45 +01:00
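The matching strategy described in the commit above (rules with the same name are assumed to stay in the same relative order) can be sketched as follows. The `rule` type and `copyState` function are hypothetical simplifications, not the actual Prometheus code.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for a recording or alerting rule.
type rule struct {
	name  string
	state string // whatever must survive a reload
}

// copyState carries state from oldRules to newRules. Rules are matched
// by name; when several rules share a name, the first old one pairs with
// the first new one, the second with the second, and so on.
func copyState(oldRules, newRules []*rule) {
	byName := make(map[string][]*rule)
	for _, r := range oldRules {
		byName[r.name] = append(byName[r.name], r)
	}
	for _, r := range newRules {
		prev := byName[r.name]
		if len(prev) == 0 {
			continue // new rule, or more copies than before: starts fresh
		}
		r.state = prev[0].state
		byName[r.name] = prev[1:]
	}
}

func main() {
	oldRules := []*rule{{"a", "s1"}, {"b", "s2"}, {"a", "s3"}}
	newRules := []*rule{{name: "a"}, {name: "a"}, {name: "c"}, {name: "b"}}
	copyState(oldRules, newRules)
	for _, r := range newRules {
		fmt.Printf("%s=%q\n", r.name, r.state)
	}
}
```

If a rule named "a" were inserted before the existing ones, the pairing would shift by one for that name, which corresponds to the one-cycle blip the commit message accepts.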
Brian Brazil
9bc68db7e6
Track staleness per rule rather than per group.
2017-05-24 13:52:45 +01:00
Brian Brazil
0451d6d31b
Add unittest for rule staleness, and rules generally.
2017-05-24 13:52:45 +01:00
Brian Brazil
0400f3cfd2
Very basic staleness handling for rules.
2017-05-24 13:52:45 +01:00
Fabian Reinartz
06c2b76cd4
Merge branch 'master' into uptsdb
2017-05-16 16:48:37 +02:00
Alexey Palazhchenko
b0e1ea7c6c
Simplify code, fix typos. ( #2719 )
2017-05-15 09:56:09 +01:00
Julius Volz
ac203ef0ee
Add externalURL template function ( #2716 )
...
This allows users to e.g. add links back to the generating Prometheus
right in their alert templates.
2017-05-13 15:47:04 +02:00
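As an illustration of how a template function like the one in the commit above can be exposed, here is a sketch using Go's standard `text/template`. The `render` helper and the example URL are assumptions made for illustration; Prometheus's own template wiring is not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render expands text with an externalURL function that returns baseURL.
// This is a hypothetical helper, not the Prometheus implementation.
func render(baseURL, text string) (string, error) {
	funcs := template.FuncMap{
		"externalURL": func() string { return baseURL },
	}
	tmpl, err := template.New("alert").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render("http://prometheus.example.org:9090", `See {{ externalURL }}/alerts`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```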
Julius Volz
fe11c5933a
Fix mutation of active alert elements by notifier ( #2656 )
...
This caused the external label application in the notifier to bleed back
into the rule manager's active alerting elements.
2017-04-26 10:29:42 -05:00
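The class of bug described in the commit above arises when a consumer modifies a shared map in place. A common remedy is to copy before mutating, sketched here with hypothetical `alert` and `withExternalLabels` names; the actual fix in the commit may differ.

```go
package main

import "fmt"

// alert is a hypothetical stand-in for an active alert element.
type alert struct {
	labels map[string]string
}

// withExternalLabels returns a copy of a with ext merged in. Labels that
// already exist on the alert win. The input alert is never modified.
func withExternalLabels(a alert, ext map[string]string) alert {
	out := alert{labels: make(map[string]string, len(a.labels)+len(ext))}
	for k, v := range a.labels {
		out.labels[k] = v
	}
	for k, v := range ext {
		if _, exists := out.labels[k]; !exists {
			out.labels[k] = v
		}
	}
	return out
}

func main() {
	active := alert{labels: map[string]string{"alertname": "HighLoad"}}
	sent := withExternalLabels(active, map[string]string{"region": "eu"})
	fmt.Println(len(active.labels), len(sent.labels))
}
```

Because the merge happens on a fresh map, the external labels stay on the outgoing copy and do not bleed back into the original.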
Fabian Reinartz
8ffc851147
Merge branch 'master' into dev-2.0
2017-04-04 15:17:56 +02:00
Tobias Schmidt
eaf33759fb
Register forgotten prometheus_evaluator_iterations_total metric
2017-04-02 20:32:56 -03:00
Tobias Schmidt
aaaba57184
Export number of missed rule evaluations
...
In case the execution of all rules takes longer than the configured rule
evaluation interval, one or more iterations will be skipped. This needs
to be visible to the operator.
2017-04-02 20:03:28 -03:00
Fabian Reinartz
5772f1a7ba
retrieval/storage: adapt to new interface
...
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Fabian Reinartz
ad9bc62e4c
storage: extend appender and adapt it
2017-01-13 14:48:01 +01:00
Fabian Reinartz
e94b0899ee
rules: fix tests, remove model types
2016-12-29 17:31:14 +01:00
Fabian Reinartz
f8fc1f5bb2
*: migrate ingestion to new batch Appender
2016-12-29 11:03:56 +01:00
Fabian Reinartz
fecf9532b9
*: fix misc compile errors
2016-12-25 11:42:57 +01:00
Fabian Reinartz
622ece6273
*: fix recording tests, migrate matcher types
2016-12-25 11:12:57 +01:00
Fabian Reinartz
5817cb5bde
*: migrate from model.* to promql.* types
2016-12-25 00:37:46 +01:00
Fabian Reinartz
e68a3cf21f
rules: update annotations on each iteration
2016-11-22 15:43:07 +01:00
Jonathan Lange
d78dd3593d
Set evaluation interval on Group construction
...
Prevents having an object in an invalid state, and allows users of the public API
to construct valid Groups.
2016-11-18 16:32:30 +00:00
Jonathan Lange
31fc357cd8
Make NewGroup and Group.Eval public
...
Allows callers to evaluate lists of rules without first writing
them to disk.
2016-11-18 16:25:58 +00:00
Jonathan Lange
2a2da40223
Make rule evaluation publicly available
...
Means that a third-party can parse rules and run them with their own
execution model.
2016-11-18 16:12:50 +00:00
Matt Bostock
926a5ab3dd
rules/manager.go: Fix race between reload and stop
...
On one relatively large Prometheus instance (1.7M series), I noticed
that upgrades were frequently resulting in Prometheus undergoing crash
recovery on start-up.
On closer examination, I found that Prometheus was panicking on
shutdown.
It seems that our configuration management (or misconfiguration thereof)
is reloading Prometheus then immediately restarting it, which I suspect
is causing this race:
Sep 21 15:12:42 host systemd[1]: Reloading prometheus monitoring system.
Sep 21 15:12:42 host prometheus[18734]: time="2016-09-21T15:12:42Z" level=info msg="Loading configuration file /etc/prometheus/config.yaml" source="main.go:221"
Sep 21 15:12:42 host systemd[1]: Reloaded prometheus monitoring system.
Sep 21 15:12:44 host systemd[1]: Stopping prometheus monitoring system...
Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:203"
Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="See you next time!" source="main.go:210"
Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="Stopping target manager..." source="targetmanager.go:90"
Sep 21 15:12:52 host prometheus[18734]: time="2016-09-21T15:12:52Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:548"
Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=1 source="scrape.go:467"
Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84"
Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84"
Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping rule manager..." source="manager.go:366"
Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Rule manager stopped." source="manager.go:372"
Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping notification handler..." source="notifier.go:325"
Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping local storage..." source="storage.go:381"
Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping maintenance loop..." source="storage.go:383"
Sep 21 15:13:01 host prometheus[18734]: panic: close of closed channel
Sep 21 15:13:01 host prometheus[18734]: goroutine 7686074 [running]:
Sep 21 15:13:01 host prometheus[18734]: panic(0xba57a0, 0xc60c42b500)
Sep 21 15:13:01 host prometheus[18734]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Sep 21 15:13:01 host prometheus[18734]: github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1(0xc6645a9901, 0xc420271ef0, 0xc420338ed0, 0xc60c42b4f0, 0xc6645a9900)
Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:412 +0x3c
Sep 21 15:13:01 host prometheus[18734]: created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig
Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:423 +0x56b
Sep 21 15:13:03 host systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
2016-09-21 22:03:02 +01:00
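The panic in the log above, "close of closed channel", occurs whenever two code paths close the same channel. A general way to make shutdown idempotent is `sync.Once`, sketched below with a hypothetical `stopper` type. This illustrates the failure class only; it is not necessarily how the commit resolves the race between reload and stop.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical component whose shutdown may be requested
// from more than one goroutine, for example a reload and a stop path.
type stopper struct {
	done chan struct{}
	once sync.Once
}

func newStopper() *stopper {
	return &stopper{done: make(chan struct{})}
}

// stop closes done at most once, so repeated or concurrent calls
// cannot panic with "close of closed channel".
func (s *stopper) stop() {
	s.once.Do(func() { close(s.done) })
}

func main() {
	s := newStopper()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.stop()
		}()
	}
	wg.Wait()
	<-s.done // returns immediately: the channel is closed
	fmt.Println("stopped cleanly")
}
```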
Julius Volz
c187308366
storage: Contextify storage interfaces.
...
This is based on https://github.com/prometheus/prometheus/pull/1997 .
This adds contexts to the relevant Storage methods and already passes
PromQL's new per-query context into the storage's query methods.
The immediate motivation is supporting multi-tenancy in Frankenstein, but
this could also be used by Prometheus's normal local storage to support
cancellations and timeouts at some point.
2016-09-19 16:29:07 +02:00
Julius Volz
ed5a0f0abe
promql: Allow per-query contexts.
...
For Weaveworks' Frankenstein, we need to support multitenancy. In
Frankenstein, we initially solved this without modifying the promql
package at all: we constructed a new promql.Engine for every
query and injected a storage implementation into that engine which would
be primed to only collect data for a given user.
This is problematic to upstream, however. Prometheus assumes that there
is only one engine: the query concurrency gate is part of the engine,
and the engine contains one central cancellable context to shut down all
queries. Also, creating a new engine for every query seems like overkill.
Thus, we want to be able to pass per-query contexts into a single engine.
This change gets rid of the promql.Engine's built-in base context and
allows passing in a per-query context instead. Central cancellation of
all queries is still possible by deriving all passed-in contexts from
one central one, but this is now the responsibility of the caller. The
central query context is now created in main() and passed into the
relevant components (web handler / API, rule manager).
In a next step, the per-query context would have to be passed to the
storage implementation, so that the storage can implement multi-tenancy
or other features based on the contextual information.
2016-09-19 15:38:17 +02:00
beorn7
75bae065fd
Revert "Modify tests to adjust to reverting the /graph changes"
...
This reverts commit f1ea5bf232.
Part two necessary for reverting the /graph revert.
2016-09-03 21:08:33 +02:00