Commit graph

579 commits

Author SHA1 Message Date
Sylvain Rabot 335a34486e Add external labels to template expansion
This affects the expansion of templates in alert labels and
annotations and console templates.

Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2019-04-17 01:40:10 +02:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
James Ravn e15d8c5802 reload: copy state on both name and labels (#5368)
* reload: copy state on both name and labels

Fix https://github.com/prometheus/prometheus/issues/5193

Using just name causes the linked issue - if new rules are inserted with
the same name (but different labels), the reordering will cause stale
markers to be inserted in the next eval for all shifted rules, despite
them not being stale.

Ideally we want to avoid stale markers for time series that still exist
in the new rules, with name and labels being the unique identifer.

This change adds labels to the internal map when copying the old rule
data to the new rule data. This prevents the problem of staling rules
that simply shifted order.

If labels change, it is a new time series and the old series will stale
regardless. So it should be safe to always match on name and labels when
copying state.

Signed-off-by: James Ravn <james@r-vn.org>
2019-03-15 15:23:36 +00:00
David Symonds 46361a7c85 rules: Fix sorting of result from (*Manager).RuleGroups (#5260)
The previous code was defective in that it never sorted groups within a
file due to doing a multi-key sort incorrectly.

Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-02-23 09:51:44 +01:00
beorn7 2db1eeb4ec Fix prometheus_rule_group_last_evaluation_timestamp_seconds
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
Ganesh Vernekar 787eb1e904 Set rule_group_last_duration_seconds to seconds (#5153)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Matt Layher 302148fd69 *: apply gofmt -s
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Vishnunarayan K I fd3ef6ba34 Add metric rule_group_rules_loaded to get the number of rules loaded (#5090)
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-13 14:28:07 +00:00
Simon Pasquier f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
Tom Wilkie 121603c417
Expose rules.NewGroupMetrics and rules.Metrics. (#5059)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-01-03 12:07:06 +00:00
Tom Wilkie 6e08029b56
Move err to be the last return value from storage.Select. (#5054)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-01-02 11:10:13 +00:00
Bartek Płotka de213d4a5e rule manager: Moved metric registration to custom registerer which is already available. (#4961)
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-12-28 10:20:29 +00:00
AixesHunter fb8479a677 Variable 'labels' collides with imported package name (#5012)
Signed-off-by: aixeshunter <aixeshunter@gmail.com>
2018-12-19 09:44:03 +00:00
mknapphrt f0e9196dca Return warnings on a remote read fail (#4832)
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2018-11-30 14:27:12 +00:00
Krasi Georgiev 0754e5334b
querier for RestoreForState not closed. (#4922)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:25:17 +02:00
Ben Kochie c6399296dc
Fix spelling/typos (#4921)
* Fix spelling/typos

Fix spelling/typos reported by codespell/misspell.
* UK -> US spelling changes.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-27 17:44:29 +01:00
Wei Guo e329cbf673 Add metric prometheus_rule_group_last_evaluation for recording and alerting (#4852)
* add metric prometheus_rule_group_last_evaluation for recording and alerting

Signed-off-by: Wei Guo <me@imkira.com>

* fix issues from comments

Signed-off-by: Wei Guo <me@imkira.com>
2018-11-27 14:38:13 +08:00
Will Hegedus 193ebe7e34 Updates to /targets and /rules (scrape duration, last evaluation time) (#4722)
* Add evaluationTimestamp (Last Evaluation) column to display on /rules
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Add lastScrapeDuration ("Scrape Duration") to display on /targets
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Updates based on Julius' feedback

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Update to set timestamp to when eval started (after eval completes)

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Update /rules to display time since last evaluation

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>

* Re-order Last Eval/Eval Time to be consistent with targets page

Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
2018-10-12 18:26:59 +02:00
Callum Styan 9bca041285 WIP: keep track of samples per query, set a max # of samples (#4513)
* keep track of samples per query, set a max # of samples that can be in
memory at once

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2018-10-02 12:59:19 +01:00
Ganesh Vernekar 5790d23fd8 Unit testing for rules (#4350)
* Unit testing for rules
* Specifying order of group evaluation in unit tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 17:06:26 +01:00
Ganesh Vernekar 05726c5ea2 Test template expansion while loading groups (#4537)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-13 13:55:58 +01:00
Chris Marchbanks 63ed9d1b70 Send EndsAt along with alerts (#4550)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-28 16:05:00 +01:00
Chris Marchbanks 87f1dad16d throttle resends of alerts to 1 minute by default (#4538)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-27 17:41:42 +01:00
Goutham Veeramachaneni f3b7c22827 rules: add comment about lock taking (#4525)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2018-08-21 21:30:08 +02:00
Ganesh Vernekar c663477688 Fixed TestUpdate in rules/manager_test.go (#4516)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-20 18:21:05 +05:30
Julius Volz 8fbe1b5133
Handle a bunch of unchecked errors (#4461)
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-17 17:24:35 +02:00
Ganesh Vernekar a0a9e7df91 Fix TestForStateRestore (#4476) (#4512)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-16 22:56:15 +05:30
Julien Pivotto 0b4d22b245 rules/manager: remove a no-longer-relevant comment (#4503)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-08-15 09:33:39 +01:00
Chris Marchbanks 11155c7028 Existing alert labels will update based on templates (#4500)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-15 08:52:08 +01:00
Fabian Reinartz b7e2f407de rules: Fix double-locking of mutex
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-08-07 07:33:39 -04:00
Benji Visser 8bb6e0dd6e Show rule evaluation errors on rules page (#4457)
* adding information about the health and errors for Rules

adding Health() and LastError() to the Rule interface. This will allow
us to easily surface information about rules.

Signed-off-by: noqcks <benny@noqcks.io>

* updating rules.html with fields for Rule errors and health state

Signed-off-by: noqcks <benny@noqcks.io>

* fix code comment grammar & access Rule health/error info using a mutex

Signed-off-by: noqcks <benny@noqcks.io>

* s/Errors/Error/ in rules.html to remain consistent with targets.html

Signed-off-by: noqcks <benny@noqcks.io>

* adding periods to code comments in reporting/alerting

Signed-off-by: noqcks <benny@noqcks.io>

* putting health/error below mutex in struct field

Signed-off-by: noqcks <benny@noqcks.io>
2018-08-07 00:33:45 +02:00
Julius Volz 2b8fc062a8
rules: HTML-escape rule YAML marshal errors (#4464)
This was pointed out by `gosec`.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-05 14:01:51 +02:00
Julius Volz 90521a65f8
Remove error return value from NotifyFunc() (#4459)
It's always nil and we also forgot to check it.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar f1db699dff Persist alert 'for' state across restarts (#4061)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Max Leonard Inden 71fafad099
api/v1: Coninue work exposing rules and alerts
Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2018-07-30 15:31:51 +02:00
mg03 31f8ca0dfb
api v1 alerts/rules json endpoint
Signed-off-by: mg03 <mgeng03@gmail.com>
2018-07-30 15:29:44 +02:00
Bryan Boreham afdb66dfac Expose Group.CopyState() (#4304)
This makes the `rules` package more useful to projects that use
Prometheus as a library.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-07-18 15:14:38 +02:00
Julius Volz 9e3171f6e3 rules: Minor naming/comment cleanups (#4328)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 04:54:33 +01:00
Bryan Boreham 2bd510a63e Make TestUpdate() do some work (#4306)
Previously it would set no preconditions and check no postconditions,
as the `groups` member was empty.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2018-06-22 15:21:04 +01:00
Alin Sinpalean 9dc763cc03 Run rule evaluation with timestamps precisely evaluation_interval apart (#4201)
* Run rule evaluation with timestamps precisely evaluation_interval apart from one another.

Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>
2018-06-01 15:23:07 +01:00
Mario Trangoni 464e747f1e fix some comments typos (#4059) 2018-04-08 10:51:54 +01:00
Bryan Boreham 93494d8b7e Add an OpenTracing span for each rule (#4027)
* Add an OpenTracing span for each rule

So that tags and child spans can be traced back to the rule that they
refer to.
2018-03-30 21:29:19 +01:00
ferhat elmas ec8e4d8a7c all: remove unnecessary type conversions (#3992)
excep promql due to not to create conflict with #3966.
2018-03-21 09:25:22 +00:00
Warren Fernandes 58e2a31db8 Cleans up test by removing unused function (#3969) 2018-03-15 08:59:19 +00:00
ferhat elmas ffa673f7d8 General simplifications (#3887)
Another try as in #1516
2018-02-26 07:58:10 +00:00
Fabian Reinartz 7ccd4b39b8 *: implement query params
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
2018-02-13 12:17:22 +01:00
Simon Pasquier 81c0ab69e0 Don't reset FiredAt for inactive alerts
Otherwise AlertManager receives resolved alerts where StartsAt is zero which
fails the validation.
2018-01-22 17:17:33 +01:00
Brian Brazil 30b4439bbd Remove rule_type label from rule metrics.
This is not really needed now that we have rule groups
to distinguish rules.
2017-12-04 11:44:38 +00:00
Brian Brazil b97f4cf48c Add metrics for rule group interval and last duration. 2017-12-04 11:44:38 +00:00
Brian Brazil 0a42a9fc8f Copy over rule group duration on reload.
This is currently getting lost, this will soon be in a metric and we
don't want it dropping to 0 on every reload.
2017-12-04 11:44:38 +00:00
Brian Brazil aa370fa568 Clarify metric names around rule groups.
Make it clear they're about overall rule groups.
2017-12-04 11:44:38 +00:00
Fabian Reinartz 62461379b7 rules: decouple notifier packages
The dependency on the notifier packages caused a transitive dependency
on discovery and with that all client libraries our service discovery
uses.
2017-11-27 16:38:14 +01:00
Fabian Reinartz 4d964a0a0d rules: make glob expansion a concern of main 2017-11-24 08:22:57 +01:00
Fabian Reinartz bd9f7460eb rules: remove config package dependency 2017-11-24 07:57:54 +01:00
Fabian Reinartz 2d0e3746ac rules: remove dependency on promql.Engine 2017-11-24 07:57:54 +01:00
Fabian Reinartz 2ec5965b75
Merge pull request #3508 from prometheus/uptsdb
update TSDB
2017-11-23 19:11:54 +01:00
Fabian Reinartz 83cd270ea4 *: adapt to storage interface changes 2017-11-23 19:05:04 +01:00
Goutham Veeramachaneni a880c86375
Fix unexported method on exported interface.
Also move to model.Duration

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-23 19:13:57 +05:30
conorbroderick 55aaece116 Add rule evaluation time 2017-11-22 15:22:02 +00:00
Goutham Veeramachaneni e1117715fe
rules: remove skipped iterations cuz no throttling
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-14 17:33:00 +05:30
Jorge Hernández 6cd0f63eb1 Use testutil in rules subpackage (#3278)
* Use testutil in rules subpackage

* Fix manager test

* Use testutil in rules subpackage

* Fix manager test

* Fix rebase

* Change to testutil for applyConfig tests
2017-11-11 11:29:47 +01:00
Krasi Georgiev e86d82ad2d Fix regression of alert rules state loss on config reload. (#3382)
* incorrect map name for the group prevented copying state from existing alert rules on config reload

* applyConfig test

* few nits

* nits 2
2017-11-01 12:58:00 +01:00
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Brian Brazil cc5499fcad Only close after checking for err. 2017-10-09 19:44:03 +01:00
Brian Brazil ee88f0d222 Ensure all values are used or _ 2017-10-09 19:44:03 +01:00
Fabian Reinartz 2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
Julius Volz f7e8348a88 Re-add contexts to storage.Storage.Querier() (#3230)
* Re-add contexts to storage.Storage.Querier()

These are needed when replacing the storage by a multi-tenant
implementation where the tenant is stored in the context.

The 1.x query interfaces already had contexts, but they got lost in 2.x.

* Convert promql.Engine to use native contexts
2017-10-04 21:04:15 +02:00
beorn7 c2e9a151ab Make all rule links link to the "Console" tab rather than "Graph"
Clicking on a rule, either the name or the expression, opens the rule
result (or the corresponding expression, repsectively) in the
expression browser. This should by default happen in the console tab,
as, more often than not, displaying it in the graph tab runs into a
timeout.
2017-09-21 18:28:00 +02:00
Fabian Reinartz d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Goutham Veeramachaneni e1fc9dc78d Move /rules to new format (#2901)
Fixes #2891

Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-07-08 11:38:02 +02:00
Goutham Veeramachaneni 37e7b69f56
Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni c472316fb3
Check done before every rule evaluation.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:57:22 +05:30
Goutham Veeramachaneni 6b70a4d850
Incorporate PR feedback
* Move fingerprint to Hash()
* Move away from tsdb.MultiError
* 0777 -> 0666 for files
* checkOverflow of extra fields

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 16:44:33 +05:30
Goutham Veeramachaneni 507790a357
Rework logging to use explicitly passed logger
Mostly cleaned up the global logger use. Still some uses in discovery
package.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni dc69645e92
Move back to go-yaml
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 10:46:21 +05:30
Goutham Veeramachaneni 5ff283a7b7
Reflect the grouping in the UI
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 16:09:14 +05:30
Goutham Veeramachaneni 8cca666cf2
Add file name to group.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 15:18:39 +05:30
Goutham Veeramachaneni e893c89333
Validate labels and annotations
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 15:07:54 +05:30
Goutham Veeramachaneni a48a018368
Make sure groups are unique in a single file
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 12:19:21 +05:30
Goutham Veeramachaneni cea1e99f78
Add update-rules command to promtool
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-14 11:38:54 +05:30
Goutham Veeramachaneni e8f55669ea Move rules to new format
Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-06-12 18:14:39 +05:30
Brian Brazil dcea3e4773 Don't append a 0 when alert is no longer pending/firing
With staleness we no longer need this behaviour.
2017-05-24 13:52:45 +01:00
Brian Brazil cc867dae60 Copy previous series and alert state more intelligently.
Usually rules don't more around, and if they do it's likely
that rules/alerts with the same name stay in the same order.

If rules/alerts with the same name are added/removed this
could cause a blip for one cycle, but this is unavoidable
without requiring rule and alert names to be unique - which we don't
want to do.
2017-05-24 13:52:45 +01:00
Brian Brazil 9bc68db7e6 Track staleness per rule rather than per group. 2017-05-24 13:52:45 +01:00
Brian Brazil 0451d6d31b Add unittest for rule staleness, and rules generally. 2017-05-24 13:52:45 +01:00
Brian Brazil 0400f3cfd2 Very basic staleness handling for rules. 2017-05-24 13:52:45 +01:00
Fabian Reinartz 06c2b76cd4 Merge branch 'master' into uptsdb 2017-05-16 16:48:37 +02:00
Alexey Palazhchenko b0e1ea7c6c Simplify code, fix typos. (#2719) 2017-05-15 09:56:09 +01:00
Julius Volz ac203ef0ee Add externalURL template function (#2716)
This allows users to e.g. add links back to the generating Prometheus
right in their alert templates.
2017-05-13 15:47:04 +02:00
Julius Volz fe11c5933a Fix mutation of active alert elements by notifier (#2656)
This caused the external label application in the notifier to bleed back
into the rule manager's active alerting elements.
2017-04-26 10:29:42 -05:00
Fabian Reinartz 8ffc851147 Merge branch 'master' into dev-2.0 2017-04-04 15:17:56 +02:00
Tobias Schmidt eaf33759fb Register forgotten prometheus_evaluator_iterations_total metric 2017-04-02 20:32:56 -03:00
Tobias Schmidt aaaba57184 Export number of missed rule evaluations
In case the execution of all rules takes longer than the configured rule
evaluation interval, one or more iterations will be skipped. This needs
to be visible to the opterator.
2017-04-02 20:03:28 -03:00
Fabian Reinartz 5772f1a7ba retrieval/storage: adapt to new interface
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
Fabian Reinartz e94b0899ee rules: fix tests, remove model types 2016-12-29 17:31:14 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Fabian Reinartz fecf9532b9 *: fix misc compile errors 2016-12-25 11:42:57 +01:00
Fabian Reinartz 622ece6273 *: fix recording tests, migrate matcher types 2016-12-25 11:12:57 +01:00
Fabian Reinartz 5817cb5bde *: migrate from model.* to promql.* types 2016-12-25 00:37:46 +01:00
Fabian Reinartz e68a3cf21f rules: update annotations on each iteration 2016-11-22 15:43:07 +01:00
Jonathan Lange d78dd3593d Set evaluation interval on Group construction
Prevents having object in invalid state, and allows users of public API
to construct valid Groups.
2016-11-18 16:32:30 +00:00
Jonathan Lange 31fc357cd8 Make NewGroup and Group.Eval public
Allows callers to execute evaluate lists of rules without first writing
them to disk.
2016-11-18 16:25:58 +00:00
Jonathan Lange 2a2da40223 Make rule evaluation publicly available
Means that a third-party can parse rules and run them with their own
execution model.
2016-11-18 16:12:50 +00:00
Matt Bostock 926a5ab3dd rules/manager.go: Fix race between reload and stop
On one relatively large Prometheus instance (1.7M series), I noticed
that upgrades were frequently resulting in Prometheus undergoing crash
recovery on start-up.

On closer examination, I found that Prometheus was panicking on
shutdown.

It seems that our configuration management (or misconfiguration thereof)
is reloading Prometheus then immediately restarting it, which I suspect
is causing this race:

    Sep 21 15:12:42 host systemd[1]: Reloading prometheus monitoring system.
    Sep 21 15:12:42 host prometheus[18734]: time="2016-09-21T15:12:42Z" level=info msg="Loading configuration file /etc/prometheus/config.yaml" source="main.go:221"
    Sep 21 15:12:42 host systemd[1]: Reloaded prometheus monitoring system.
    Sep 21 15:12:44 host systemd[1]: Stopping prometheus monitoring system...
    Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:203"
    Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="See you next time!" source="main.go:210"
    Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="Stopping target manager..." source="targetmanager.go:90"
    Sep 21 15:12:52 host prometheus[18734]: time="2016-09-21T15:12:52Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:548"
    Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=1 source="scrape.go:467"
    Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84"
    Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84"
    Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping rule manager..." source="manager.go:366"
    Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Rule manager stopped." source="manager.go:372"
    Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping notification handler..." source="notifier.go:325"
    Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping local storage..." source="storage.go:381"
    Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping maintenance loop..." source="storage.go:383"
    Sep 21 15:13:01 host prometheus[18734]: panic: close of closed channel
    Sep 21 15:13:01 host prometheus[18734]: goroutine 7686074 [running]:
    Sep 21 15:13:01 host prometheus[18734]: panic(0xba57a0, 0xc60c42b500)
    Sep 21 15:13:01 host prometheus[18734]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
    Sep 21 15:13:01 host prometheus[18734]: github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1(0xc6645a9901, 0xc420271ef0, 0xc420338ed0, 0xc60c42b4f0, 0xc6645a9900)
    Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:412 +0x3c
    Sep 21 15:13:01 host prometheus[18734]: created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig
    Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:423 +0x56b
    Sep 21 15:13:03 host systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
2016-09-21 22:03:02 +01:00
Julius Volz c187308366 storage: Contextify storage interfaces.
This is based on https://github.com/prometheus/prometheus/pull/1997.

This adds contexts to the relevant Storage methods and already passes
PromQL's new per-query context into the storage's query methods.
The immediate motivation supporting multi-tenancy in Frankenstein, but
this could also be used by Prometheus's normal local storage to support
cancellations and timeouts at some point.
2016-09-19 16:29:07 +02:00
Julius Volz ed5a0f0abe promql: Allow per-query contexts.
For Weaveworks' Frankenstein, we need to support multitenancy. In
Frankenstein, we initially solved this without modifying the promql
package at all: we constructed a new promql.Engine for every
query and injected a storage implementation into that engine which would
be primed to only collect data for a given user.

This is problematic to upstream, however. Prometheus assumes that there
is only one engine: the query concurrency gate is part of the engine,
and the engine contains one central cancellable context to shut down all
queries. Also, creating a new engine for every query seems like overkill.

Thus, we want to be able to pass per-query contexts into a single engine.

This change gets rid of the promql.Engine's built-in base context and
allows passing in a per-query context instead. Central cancellation of
all queries is still possible by deriving all passed-in contexts from
one central one, but this is now the responsibility of the caller. The
central query context is now created in main() and passed into the
relevant components (web handler / API, rule manager).

In a next step, the per-query context would have to be passed to the
storage implementation, so that the storage can implement multi-tenancy
or other features based on the contextual information.
2016-09-19 15:38:17 +02:00
beorn7 75bae065fd Revert "Modify tests to adjust to reverting the /graph changes"
This reverts commit f1ea5bf232.

Part two necessary for reverting the /graph revert.
2016-09-03 21:08:33 +02:00
beorn7 f1ea5bf232 Modify tests to adjust to reverting the /graph changes
These tests have been added after the /graph changes and therefore
already test the new syntax.

This commit has to be reverted together with the previous one to get
back to the old new state. *sigh*
2016-09-02 14:12:31 +02:00
Julius Volz fe7b8b7fd1 Add missing license header to alerting_test.go 2016-08-13 00:11:52 +02:00
Julius Volz da7206ec29 Fix rule HTML escaping issues
This was mentioned as part of https://github.com/prometheus/alertmanager/issues/452
2016-08-12 02:59:41 +02:00
Brian Brazil 6fc88d4b4d Remove __name__ from alerts sent to AM.
Fixes #1861
2016-08-01 23:32:41 +01:00
Dmitry Vorobev 273e457da4 web: return status code and error message for config resource 2016-07-15 10:15:24 +02:00
Brian Brazil 0509b0f2db Expand alert templates at eval time.
Fixes #1678 #1677
2016-07-12 17:13:55 +01:00
beorn7 064b57858e Consistently use the Seconds() method for conversion of durations
This also fixes one remaining case of recording integral numbers
of seconds only for a metric, i.e. this will probably fix #1796.
2016-07-07 15:24:35 +02:00
Fabian Reinartz f7ed2ff706 Merge pull request #1644 from prometheus/beorn7/logging
Add missing logging of out-of-order samples
2016-05-20 05:52:00 -07:00
beorn7 b95c096a45 Fix style issues in rules/... 2016-05-19 16:59:53 +02:00
beorn7 45e5775f9b Add missing logging of out-of-order samples
So far, out-of-order samples during rule evaluation were not logged,
and neither scrape health samples. The latter are unlikely to cause
any errors. That's why I'm logging them always now. (It's alway highly
irregular should it happen.) For rules, I have used the same plumbing
as for samples, just with a different wording in the message to mark
them as a result of rule evaluation.
2016-05-19 16:22:53 +02:00
beorn7 4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
Fabian Reinartz d89c254849 Make copying alerting state safer.
This considers static labels in the equality of alerts to
avoid falsely copying state from a different alert definition with
the same name across reloads.

To be safe, it also copies the state map rather than just its pointer
so that remaining collisions disappear after one evaluation interval.
2016-03-02 12:21:54 +01:00
Fabian Reinartz bfa8aaa017 Rename notification to notifier 2016-03-01 12:39:08 +01:00
beorn7 663a1550d0 Fix the instrumentation fixes 2016-02-17 15:50:55 +01:00
Tobias Schmidt f1f8317fa5 Fix detection of flapping alerts
Alerts in the resolve retention period must be transitioned to the
active state again when their condition is met.
2016-02-04 23:55:12 -05:00
Björn Rabenstein 9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7 a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Fabian Reinartz a6935024e1 Remove old WITH clause in alert printing 2016-01-26 15:45:27 +01:00
Fabian Reinartz b0adfea8d5 Fix swapped constants, improve instrumentation 2016-01-21 12:15:29 +01:00
Fabian Reinartz a8c38c3ac5 Don't log rule evaluation failure on shutdown 2016-01-18 17:34:25 +01:00
Fabian Reinartz 6eee86dce8 Terminate rule groups during initial sleep
When an evaluation group runs initially, it waits a deterministic
amount of time. During that time it also has to accept
a termination singnal so shutdown doesn't hang during the first
evaluation iteration after a configuration reload.

Fixes #1307
2016-01-12 10:54:09 +01:00
Fabian Reinartz 26eb3ac2f8 Don't skip recording rule errors 2016-01-12 10:26:06 +01:00
Fabian Reinartz 37d80c4b25 Fix premature rule evaluation
This commit prevents rule evaluation from starting until after
the storage is ready.
2016-01-08 17:51:22 +01:00
Fabian Reinartz 0cf3c6a9ef Add comments, rename a method 2015-12-23 12:29:28 +01:00
Fabian Reinartz bf6abac8f4 Send resolved notifications 2015-12-17 15:42:26 +01:00
Fabian Reinartz f69e668fc4 Improve rules/ instrumentation
This commit adds a counter for the total number of rule evaluations
and standardizes the units to seconds.
2015-12-17 15:42:26 +01:00
Fabian Reinartz 62075aa037 Reduce noisy no-alertmanager warning 2015-12-17 15:42:26 +01:00
Fabian Reinartz 52e5224f5a Refactor rules/ package 2015-12-17 15:42:25 +01:00
Fabian Reinartz e4fabe135a Set StartsAt to time of first firing state 2015-12-17 11:36:58 +01:00
Fabian Reinartz 7c90db22ed Use annotation based alerts in rules/
This commit breaks the previously used alert format.
2015-12-14 10:16:07 +01:00
Fabian Reinartz e114ce0ff7 Refactor notification handler 2015-12-11 15:17:32 +01:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Fabian Reinartz 171f50706a Fix unkeyed field errors. 2015-09-18 17:00:08 +02:00
Brian Brazil 4d196fea6b Merge pull request #1032 from prometheus/scalar-metric
rules: Allow for setting labels on LHS on scalars
2015-08-26 16:56:16 +01:00
Brian Brazil 3bcdb2bbba rules: Allow for setting labels on LHS on scalars 2015-08-26 16:54:28 +01:00
Julius Volz 995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Fabian Reinartz d6b8da8d43 Switch promql types to common/model 2015-08-25 13:49:14 +02:00
Brian Brazil fdf0d0642e Cast value to float, as that's what the console templates expect. 2015-08-24 16:59:08 +01:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Brian Brazil e6a67476c2 rules: Allow recorded rules expressions to be scalars.
This is useful if you want to build up a constant metric,
such as a set of alert thresholds that vary by label value.
2015-08-19 21:09:00 +01:00
Fabian Reinartz 7a67472fc1 Resolve relative paths on configuration loading
This moves the concern of resolving the files relative to the config
file into the configuration loading itself.
It also fixes #921 which did not load the cert and token files relatively.
2015-08-05 18:08:04 +02:00
Fabian Reinartz feb8a03503 rules: load rule files relative to a base dir 2015-07-03 15:10:37 +02:00
Julius Volz fcff35b43e Consolidate external reachability flags into one.
Besides fixing https://github.com/prometheus/prometheus/issues/805 by
making the entire externally reachable server URL configurable, this
adds tests for the "globalURL" template function and makes it easier to
test other such functions in the future.

This breaks the `web.Hostname` flag (and introduces `web.external-url`).
This flag is likely only used by few users, so I hope that's
justifiable.

Fixes https://github.com/prometheus/prometheus/issues/805
2015-07-03 13:39:10 +02:00
Fabian Reinartz f06cf664e1 rules: cleanup alerting test 2015-06-30 18:22:24 +02:00
Fabian Reinartz 9bd4f6d017 rules: preserve alert state across reloads. 2015-06-30 11:32:07 +02:00
Fabian Reinartz 4625485b84 rules: move rules*.go contents to manager*.go 2015-06-30 11:32:07 +02:00
Fabian Reinartz 749ae450c5 promql: add runbook to alert statement.
This commit adds the RUNBOOK keyword to alert statements. The field
is optional and expected to be a link.
2015-06-25 13:00:52 +02:00
Julius Volz d868264bb8 Improve UI of /alerts page.
Changes to the UI:
- "Active Since" timestamps are now human-readable.
- Alerting rules are now pretty-printed better.
- Labels are no longer just strings, but alert bubbles (like we do on
  the status page for base labels).
- Alert states and target health states are now capitalized in the
  presentation layer rather than at the source.
2015-06-23 18:48:45 +02:00
Fabian Reinartz fe301d7946 promql: remove global flags 2015-06-15 19:01:06 +02:00
Fabian Reinartz 5e13880201 General cleanup of rules. 2015-06-06 21:40:52 +02:00
Fabian Reinartz 75c920c95e Remove DotGraph method from Rule interface 2015-06-06 21:35:59 +02:00
Fabian Reinartz 83d07516e8 Remove EvalRaw methods from Rule interface 2015-06-06 21:34:09 +02:00
Fabian Reinartz 280d11dca8 main: exit on invalid rule files on startup. 2015-06-02 18:44:41 +02:00
Fabian Reinartz 0de6edbdfc Move pkg/ to util/ 2015-06-01 21:12:32 +02:00
Fabian Reinartz 02717e6fde Remove generic set type 2015-06-01 21:12:32 +02:00
Fabian Reinartz dbc0d30e3e Move string functionality to pkg/strutil 2015-06-01 21:12:32 +02:00
Fabian Reinartz f45a5cab60 Move templates package to pkg/template 2015-06-01 21:12:31 +02:00
Fabian Reinartz c44ac7bc26 Load rule files from entire directories 2015-06-01 21:12:31 +02:00
Julius Volz d7c015c149 Convert pathPrefix to not have trailing slash. 2015-06-01 12:43:17 +02:00
Julius Volz ff53d10849 Fix double slash in GeneratorURL sent to alertmanager.
Fixes https://github.com/prometheus/prometheus/issues/722
2015-05-23 19:16:57 +02:00
Julius Volz 267fd34156 Switch Prometheus to use github.com/prometheus/log.
This change is conceptually very simple, although the diff is large. It
switches logging from "github.com/golang/glog" to
"github.com/prometheus/log", while not actually changing any log
messages. V(1)-style logging has been changed to be log.Debug*().
2015-05-20 18:19:32 +02:00
Fabian Reinartz e2ed921505 Merge branch 'master' into fabxc/servdisc 2015-05-20 14:13:08 +02:00
Mitsuhiro Tanda 3e914a8cb1 fix graph links with path prefix 2015-05-19 02:45:05 +09:00
Fabian Reinartz bb540fd9fd Implement config reloading on SIGHUP.
With this commit, sending SIGHUP to the Prometheus process will reload
and apply the configuration file. The different components attempt
to handle failing changes gracefully.
2015-05-13 16:49:46 +02:00
Fabian Reinartz fe935179cd Stop routing rule statements through the engine. 2015-04-29 18:01:43 +02:00
Fabian Reinartz 8d7c479fed Merge pull request #658 from prometheus/fabxc/pql/rules-manager
Rename RuleManager to Manager, remove interface.
2015-04-29 16:54:21 +02:00
Fabian Reinartz 479891c9be Rename RuleManager to Manager, remove interface.
This commits renames the RuleManager to Manager as the package
name is 'rules' now. The unused layer of abstraction of the
RuleManager interface is removed.
2015-04-29 16:42:10 +02:00
Fabian Reinartz 25cdff3527 Remove name arg from Parse* functions, enhance parsing errors. 2015-04-29 16:38:41 +02:00
Fabian Reinartz 3ca11bcaf5 Switch Prometheus to promql package.
This commit removes all functionality from rules/ that is now handled in
promql/.
All parts of Prometheus are changed to use the promql/ package.
2015-04-28 16:19:23 +02:00
Ceesjan Luiten 0e18784c64 Make all paths absolute to support proxies 2015-04-02 20:36:47 +02:00
Brian Brazil 941f585164 Avoid +InfYs and similar, just display +Inf. 2015-03-28 18:51:41 +00:00
beorn7 a075900f9a Merge branch 'beorn7/persistence' into beorn7/ingestion-tweaks 2015-03-18 19:09:31 +01:00
Fabian Reinartz 624f27f4b6 Add ln, log2, log10 and exp functions to the query language. 2015-03-16 18:26:19 +01:00
Julius Volz b2651027fc Fix special value handling in division and modulo.
This fixes https://github.com/prometheus/prometheus/issues/597
2015-03-16 14:23:40 +01:00
beorn7 be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
beorn7 13fcf1ddbc Implement double-delta encoded chunks. 2015-03-05 20:33:26 +01:00
beorn7 9e85ab0eef Apply the new signature/fingerprinting functions from client_golang.
This requires the new version of client_golang (vendoring will follow
in the next commit), which changes the fingerprinting for
clientmodel.Metric.
2015-03-03 18:34:01 +01:00
Fabian Reinartz 182de6b99f Fix unary +/- expressions.
Unary expressions cause parsing errors if they are done in the lexer
by tokenizing them into the number.
This fix moves unary expressions to the parser.
2015-03-03 13:30:08 +01:00
Fabian Reinartz 6f754073d5 Add OR operation and vector matching options.
This commits implements the OR operation between two vectors.
Vector matching using the ON clause is added to limit the set of
labels that define a match between two elements. Group modifiers
(GROUP_LEFT/GROUP_RIGHT) to request many-to-one matching are added.
2015-03-03 11:35:10 +01:00
Julius Volz 0ac931aed1 Also support parsing float formats like "2.". 2015-03-02 12:58:05 +01:00
Julius Volz c2ab54e9a6 Support scientific notation and special float values.
This adds support for scientific notation in the expression language, as
well as for all possible literal forms of +Inf/-Inf/NaN.

TODO: Keep enough state in the parser/lexer to distinguish contexts in
which "Inf", "NaN", etc. should be parsed as a number vs. parsed as a
label name. Currently, foo{nan="bar"} would be a syntax error. However,
that is an existing bug for all our reserved words. E.g. foo{sum="bar"}
is a syntax error as well. This should be fixed separately.
2015-03-01 19:31:16 +01:00
beorn7 1a61bcae07 Fix plural of 'histogram'.
Actually, 'histogram' is Ancient Greek and 3rd declension... ;-)
2015-02-23 15:29:26 +01:00
beorn7 17443d288b Avoid copying of the COWMetric if we already have the metric available. 2015-02-22 01:04:52 +01:00
beorn7 9e7c3e3bcd Add the histogram_quantile function.
Since we are now getting really deep into floating point calculation,
the tests had to take into account the precision loss. Since the rule
tests are based on direct line matching in the output, implementing
the "almost equal" semantics was pretty cumbersome, but here we are.
2015-02-22 01:04:51 +01:00
Julius Volz 42601acfde Replace labelsToKey() with metric Fingerprint (fixes grouping bug). 2015-02-21 17:45:47 +01:00
Julius Volz 7fefccd929 Write() directly into hash and use model.SeparatorByte. 2015-02-21 17:19:13 +01:00
Julius Volz 645cf57bed Fix aggregation grouping key calculation. 2015-02-21 14:05:50 +01:00
Julius Volz 15b2b5aa66 Add tests for invalid uses of "offset". 2015-02-18 02:56:40 +01:00
Julius Volz 67e20acc6c Lower-case some package-internal names. 2015-02-18 02:45:54 +01:00
Julius Volz 72d7b325a1 Implement offset operator.
This allows changing the time offset for individual instant and range
vectors in a query.

For example, this returns the value of `foo` 5 minutes in the past
relative to the current query evaluation time:

    foo offset 5m

Note that the `offset` modifier always needs to follow the selector
immediately. I.e. the following would be correct:

    sum(foo offset 5m) // GOOD.

While the following would be *incorrect*:

    sum(foo) offset 5m // INVALID.

The same works for range vectors. This returns the 5-minutes-rate that
`foo` had a week ago:

    rate(foo[5m] offset 1w)

This change touches the following components:

* Lexer/parser: additions to correctly parse the new `offset`/`OFFSET`
  keyword.
* AST: vector and matrix nodes now have an additional `offset` field.
  This is used during their evaluation to adjust query and result times
  appropriately.
* Query analyzer: now works on separate sets of ranges and instants per
  offset. Isolating different offsets from each other completely in this
  way keeps the preloading code relatively simple.

No storage engine changes were needed by this change.

The rules tests have been changed to not probe the internal
implementation details of the query analyzer anymore (how many instants
and ranges have been preloaded). This would also become too cumbersome
to test with the new model, and measuring the result of the query should
be sufficient.

This fixes https://github.com/prometheus/prometheus/issues/529
This fixed https://github.com/prometheus/promdash/issues/201
2015-02-18 02:41:27 +01:00
Brian Brazil 60271d58bf Change the 2nd argument of round to toNearest.
This is more useful if you want get a multiple of 2 or 5, while
still working for .001.
2015-02-05 16:13:40 +00:00
Julius Volz 82613527f3 Remove unnecessary float64() conversion in round(). 2015-02-05 15:14:05 +01:00
Marko Mikulicic 8fdacbdf17 Add floor, ceil and round functions. Closes #402 2015-02-04 17:20:56 +01:00
Fabian Reinartz fa1e90003b Query timeout added.
This is related to #454. Queries now timeout after a duration set by
the -query.timeout flag. The TotalEvalTimer is now started/stopped
inside any of the ast.Eval* functions.
2015-02-03 08:04:27 +01:00
Bjoern Rabenstein 26e22e6ad6 Fix rule manager shutdown. 2015-01-29 15:05:10 +01:00
Julius Volz d4374a9265 More efficient JSON query result format.
This depends on https://github.com/prometheus/client_golang/pull/51.

For vectors, the result format looks like this:

```json
{
   "version": 1,
   "type" : "vector",
   "value" : [
      {
         "timestamp" : 1421765411.045,
         "value" : "65.475000",
         "metric" : {
            "quantile" : "0.5",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "/static/",
            "method" : "get",
            "code" : "304"
         }
      },
      {
         "timestamp" : 1421765411.045,
         "value" : "5826.339000",
         "metric" : {
            "quantile" : "0.9",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "prometheus",
            "method" : "get",
            "code" : "200"
         }
      },
      /* ... */
   ]
}
```

For matrices, it looks like this:

```json
{
   "version": 1,
   "type" : "matrix",
   "value" : [
      {
         "metric" : {
            "quantile" : "0.99",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "/static/",
            "method" : "get",
            "code" : "200"
         },
         "values" : [
            [
               1421765547.659,
               "29162.953000"
            ],
            [
               1421765548.659,
               "29162.953000"
            ],
            [
               1421765549.659,
               "29162.953000"
            ],
            /* ... */
         ]
      }
   ]
}
```
2015-01-26 13:06:22 +01:00
Brian Brazil a31730e88b Make 2nd arg to delta optional. Add a deriv() function.
The 2nd isCounter argument to delta is ugly, make it optional as the first step
of deprecating it. This will makes delta only ever applied to gauges.

Add a deriv function to calculate the least squares
slope of a gauge. This is more useful for prediction than delta,
as it isn't as heavily influenced by outliers at the boundaries.
2015-01-23 14:50:27 +00:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein b09453af1d Adjust to new client_golang API. 2015-01-21 15:42:25 +01:00
Julius Volz bb1e49383e Log rule evalation errors. 2015-01-08 17:50:55 +01:00
Julius Volz d6b9e97655 Remove extraction.Result type, simplify code. 2015-01-08 16:34:01 +01:00
Julius Volz 9a4ca68a61 Add metrics for rule evaluation failures.
Fixes https://github.com/prometheus/prometheus/issues/417
2015-01-08 16:33:35 +01:00
Brian Brazil ffa2e73803 Fix regression from 5e8d57bec1
0 is a false value, so shortcutting no longer works.
Update other places in the code that assumed graph was the default.
2014-12-27 00:28:36 +00:00
Julius Volz cc27fb8aab Rename remaining all-caps constants in AST layer.
Change-Id: Ibe97e30981969056ffcdb89e63c1468ea1ffa140
2014-12-25 01:30:47 +01:00
Julius Volz 895523ad14 Include necessary Makefile.INCLUDE from rules/Makefile.
Change-Id: I077d018dbe4093cd40ddf38d66a996df222bf5e4
2014-12-25 01:13:59 +01:00
Julius Volz 2ade9d40cf Clarify why we need int constants for expression types.
Change-Id: I053fc5d32c118dbdb204dc8193337f981aff796e
2014-12-25 00:45:30 +01:00
Julius Volz 00a2a93a05 Add regression tests for metrics mutations in AST.
It turned out in the end, that only drop_common_metrics() produced any
erroneous output in the old system. The second expression in the test
("sum(testmetric) keeping_extra") already worked in the old code, but
why not keep it in...

The way to test ranged evaluations is a bit clumsy so far, so I want to
build a nicer test framework in the end, where all the test cases can be
specified as text files which specify desired inputs, outputs, query
step widths, etc.

Change-Id: I821859789e69b8232bededf670a1b76e9e8c8ca4
2014-12-12 20:34:55 +01:00
Julius Volz c9618d11e8 Introduce copy-on-write for metrics in AST.
This depends on changes in:

https://github.com/prometheus/client_golang/tree/cow-metrics.

Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142
2014-12-12 20:34:55 +01:00
Bjoern Rabenstein b1e4956142 Apply a giant code cleanup.
Essentially:

- Remove unused code.

- Make it 'go vet' clean. The only remaining warnings are in generated code.

- Make it 'golint' clean. The only remaining warnings are in gerenated code.

- Smoothed out same minor things.

Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6
2014-12-10 16:16:49 +01:00
Bjoern Rabenstein fee88a7a77 Remove the remaining races, new and old.
Also, resolve a few other TODOs.

Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691
2014-12-03 18:07:23 +01:00
Bjoern Rabenstein 7d11019aa2 Squash a few trivial TODOs.
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
  (to create iterators).
- Turned numMemChunkDescs into a metric.

Change-Id: I29be963c795a075ec00c095f76bf26405535609d
2014-11-27 18:26:06 +01:00
Julius Volz 6eecee55b7 Fix acronym caps in GeneratorURL.
Change-Id: Ib18c1f617dcde1039e848059545a6d8831d9bf66
2014-11-25 17:13:04 +01:00
Bjoern Rabenstein 0ae1d8889a Fix tests after merge.
Change-Id: Ia90da9a3e48ed780ec38c4a6a1fd9ea34e7f6a58
2014-11-25 17:13:04 +01:00
Julius Volz b7bf11230a Add absent() function.
A common problem in Prometheus alerting is to detect when no timeseries
exist for a given metric name and label combination. Unfortunately,
Prometheus alert expressions need to be of vector type, and
"count(nonexistent_metric)" results in an empty vector, yielding no
output vector elements to base an alert on. The newly introduced
absent() function solves this issue:

  ALERT FooAbsent IF absent(foo{job="myjob"}) [...]

absent() has the following behavior:

- if the vector passed to it has any elements, it returns an empty
  vector.

- if the vector passed to it has no elements, it returns a 1-element
  vector with the value 1.

In the second case, absent() tries to be smart about deriving labels of
the 1-element output vector from the input vector:

  absent(nonexistent{job="myjob"})                   => {job="myjob"}
  absent(nonexistent{job="myjob",instance=~".*"})    => {job="myjob"}
  absent(sum(nonexistent{job="myjob"}))              => {}

That is, if the passed vector is a literal vector selector, it takes all
"=" label matchers as the basis for the output labels, but ignores all
non-equals or regex matchers. Also, if the passed vector results from a
non-selector expression, no labels can be derived.

Change-Id: I948505a1488d50265ab5692a3286bd7c8c70cd78
2014-11-25 17:13:04 +01:00
Julius Volz 3d47f94149 Drop metric names after transformations.
After many transformations, it doesn't make sense to keep the metric
names, since the result of the transformation is no longer that metric.
This drops the metric name after such transformations and makes the web
UI deal well with missing metric names.

This depends on the current branch on the following things:

- prometheus/client_golang needs to be at
  e237cf15c6
  in branch "julius/int-fingerprints" (to be merged with new storage)

- prometheus/promdash needs to be at
  dd7691c9c2

Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d
2014-11-25 17:13:04 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein 006b5517e2 Simplify makefiles.
This removes the dependancy on C leveldb and snappy.
It also takes care of fewer dependencies as they would
anyway not work on any non-Debian, non-Brew system.

Change-Id: Ia70dce1ba8a816a003587927e0b3a3f8ad2fd28c
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Julius Volz 0712d738d1 Allow alternative "by"-clause position in grammar.
In addition to the existing by-clause syntax:

  sum(<expression>) by (<labels>) [keeping_extra]

...this allows the following new syntax:

  sum by (<labels>) [keeping_extra] (<expression>)

Both orderings may be used in a single expression. It is up to the users
to establish guidelines around their usage.

Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992
2014-11-25 17:09:04 +01:00
Julius Volz 0e48c18bbf Allow omitting the metric name in queries.
This allows the following expression syntaxes for selecting timeseries:

  foo                    (already valid before)
  foo{}                  (already valid before)
  {job="prometheus"}     (new, select all timeseries for job "prometheus")

Omitting both the metric name *and* any label matchers ("" or "{}") will
still yield a syntax error.

To get all timeseries, you could do:

    {__name__=~".*"}
       or, without relying on knowledge about __metric__:
    {job=~".*"}

Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 096fa0f8b2 Squash a number of TODOs.
- Staleness delta is no a proper function parameter and not replicated
  from package ast.

- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.

- For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'.

- Verified that math.Modf is not a speed enhancement over conversion
  (actually 5x slower).

- Renamed firstTimeField, lastTimeField into chunkFirstTime and
  chunkLastTime.

- Verified unpin() is sufficiently goroutine-safe.

- Decided not to update archivedFingerprintToTimeRange upon series
  truncation and added a rationale why.

Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 38fc24d0ed Fix targetpool_test.go and other tests.
Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624
2014-11-25 17:08:26 +01:00
Julius Volz 7f5d3c2c29 Fix and improve the fp locker.
Benchmark:
$ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4

OLD
BenchmarkFingerprintLockerParallel        500000              3618 ns/op
BenchmarkFingerprintLockerParallel-2      100000             12257 ns/op
BenchmarkFingerprintLockerParallel-4      500000             10164 ns/op
BenchmarkFingerprintLockerSerial        10000000               283 ns/op
BenchmarkFingerprintLockerSerial-2      10000000               284 ns/op
BenchmarkFingerprintLockerSerial-4      10000000               288 ns/op

NEW
BenchmarkFingerprintLockerParallel       1000000              1018 ns/op
BenchmarkFingerprintLockerParallel-2     1000000              1164 ns/op
BenchmarkFingerprintLockerParallel-4     2000000               910 ns/op
BenchmarkFingerprintLockerSerial        50000000                56.0 ns/op
BenchmarkFingerprintLockerSerial-2      50000000                47.9 ns/op
BenchmarkFingerprintLockerSerial-4      50000000                54.5 ns/op

Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581
2014-11-25 17:07:45 +01:00
Julius Volz 358f97791d Minor cleanups.
Change-Id: Ia8685d8439a421fe2143d9ec7120d5bb5ab88d78
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein f5f9f3514a Major code cleanup.
- Make it go-vet and golint clean.
- Add comments, TODOs, etc.

Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2
2014-11-25 17:02:53 +01:00
Julius Volz e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00
Julius Volz 85497e3f38 Add function to drop common labels in a vector.
This fixes https://github.com/prometheus/prometheus/issues/384.

Change-Id: I2973c4baeb8a4618ec3875fb11c6fcf5d111784b
2014-11-25 17:02:00 +01:00
Julius Volz 3fdb74e571 Add more topk() / bottomk() tests.
Test what happens if k > number of input elements.

Change-Id: Ie724b850939e297ebf085f0a5a3522e9cfcc6534
2014-11-25 17:02:00 +01:00
Julius Volz c582ae73c2 Implement topk() and bottomk() functions.
To achieve O(log n * k) runtime, this uses a heap to track the current
bottom-k or top-k elements while iterating over the full set of
available elements.

It would be possible to reuse more code between topk and bottomk, but I
decided for some more duplication for the sake of clarity.

This fixes https://github.com/prometheus/prometheus/issues/399

Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 1909686789 Make metrics exported by the Prometheus server itself more consistent.
- Always spell out the time unit (e.g. milliseconds instead of ms).

- Remove "_total" from the names of metrics that are not counters.

- Make use of the "Namespace" and "Subsystem" fields in the options.

- Removed the "capacity" facet from all metrics about channels/queues.
  These are all fixed via command line flags and will never change
  during the runtime of a process. Also, they should not be part of
  the same metric family. I have added separate metrics for the
  capacity of queues as convenience. (They will never change and are
  only set once.)

- I left "metric_disk_latency_microseconds" unchanged, although that
  metric measures the latency of the storage device, even if it is not
  a spinning disk. "SSD" is read by many as "solid state disk", so
  it's not too far off. (It should be "solid state drive", of course,
  but "metric_drive_latency_microseconds" is probably confusing.)

- Brian suggested to not mix "failure" and "success" outcome in the
  same metric family (distinguished by labels). For now, I left it as
  it is. We are touching some bigger issue here, especially as other
  parts in the Prometheus ecosystem are following the same
  principle. We still need to come to terms here and then change
  things consistently everywhere.

Change-Id: If799458b450d18f78500f05990301c12525197d3
2014-11-25 17:02:00 +01:00
Julius Volz 00b9489f1c Fix time() behavior.
time() should return the timestamp for which the query is executed, not
the actual current time.

Change-Id: I430a45cabad7785cd58f95b1028a71dff4c87710
2014-11-25 17:02:00 +01:00
Julius Volz c5984f1818 Add abs() and over-time aggregation functions.
This implements aggregation functions over time as request in
https://github.com/prometheus/prometheus/issues/383.

Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e
2014-11-25 17:02:00 +01:00
Brian Brazil f525ca5d9e Let consoles get graph links from experssions.
Rename ConsoleLinkFromExpression, as we now have consoles.

Change-Id: I7ed2c9c83863adb390b51121dd9736845f7bcdfc
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein 8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Brian Brazil 960ede66dc Use html/template for console templates and add template libary support.
Add a function to bypass the new auto-escaping.
Add a function to workaround go's templates only allowing passing in one argument.

Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99
2014-11-25 17:01:59 +01:00
Brian Brazil e041c0cd46 Add console and alert templates with access to all data.
Move rulemanager to it's own package to break cicrular dependency.
Make NewTestTieredStorage available to tests, remove duplication.

Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
2014-05-30 16:24:56 +01:00
Bjoern Rabenstein ca6a4fccef Weed out our homegrown test.Tester.
The Go stdlib has testing.TB now, which fulfills the exact same
purpose.

Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c
2014-05-21 19:27:24 +02:00
Julius Volz 6297a405f2 Do not indent API JSON responses.
In one example response, this reduced the uncompressed size by 25% and
the gzipped size by 11%.

Change-Id: Ie80d44253124b9f8601b8ef9fc978e92dacff523
2014-04-22 15:16:37 +02:00
Julius Volz 01f652cb4c Separate storage implementation from interfaces.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:

rule checker -> query layer -> tiered storage layer -> leveldb

This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:

- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
                         and LevelDB storage.

I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.

The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.

The rule_checker is now a static binary :)

Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
2014-04-16 13:30:19 +02:00