Commit graph

227 commits

Author SHA1 Message Date
Brian Brazil e041c0cd46 Add console and alert templates with access to all data.
Move rulemanager to it's own package to break cicrular dependency.
Make NewTestTieredStorage available to tests, remove duplication.

Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
2014-05-30 16:24:56 +01:00
Julius Volz 01f652cb4c Separate storage implementation from interfaces.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:

rule checker -> query layer -> tiered storage layer -> leveldb

This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:

- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
                         and LevelDB storage.

I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.

The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.

The rule_checker is now a static binary :)

Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
2014-04-16 13:30:19 +02:00
Julius Volz 20bfaf80ab Merge "Display filename when encountering bad rule file." 2013-12-13 15:01:02 +01:00
Julius Volz 3bf3a555b2 Merge "add evalDuration histogram and ruleCount counter for rules" 2013-12-11 22:52:19 +01:00
Stuart Nelson b75adfebad add evalDuration histogram and ruleCount counter for rules
Change-Id: I3508fe72526348d96b8158828388c3ac8d7c3fa9
2013-12-11 15:42:53 -05:00
Julius Volz 77a79d1fc0 Display filename when encountering bad rule file.
Change-Id: I4729371be92c5659a6938145c5fde66771d7be22
2013-12-11 15:44:11 +01:00
Julius Volz fb44580110 Cleanup/fix program termination sequence.
Change-Id: I2bc58a2583fb079c9ef383cfc7a5e0fbe613f1cd
2013-12-11 15:40:32 +01:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Julius Volz 1eb1ceac8c Add alert-expression console links to notifications.
The ConsoleLinkForExpression() function now escapes console URLs in such a way
that works both in emails and in HTML.

Change-Id: I917bae0b526cbbac28ccd2a4ec3c5ac03ee4c647
2013-08-20 15:45:41 +02:00
Julius Volz aa5d251f8d Use github.com/golang/glog for all logging. 2013-08-12 17:54:36 +02:00
Julius Volz 3b970c5133 Add variable interpolation to notification messages.
This includes required refactorings to enable replacing the http client (for
testing) and moving the NotificationReq type definitions to the "notifications"
package, so that this package doesn't need to depend on "rules" anymore and
that it can instead use a representation of the required data which only
includes the necessary fields.
2013-08-12 12:29:08 +02:00
Julius Volz 35ee2cd3cb Add alertmanager notification support to Prometheus.
Alert definitions now also have mandatory SUMMARY and DESCRIPTION fields
that get sent along a firing alert to the alert manager.
2013-07-30 17:23:41 +02:00
Matt T. Proud 30b1cf80b5 WIP - Snapshot of Moving to Client Model. 2013-06-25 15:52:42 +02:00
Julius Volz 0226d1ac7a Implement alerts dashboard and expression console links. 2013-06-13 22:35:40 +02:00
Julius Volz ba29d07901 Show loaded rules in Status dashboard. 2013-06-11 11:39:31 +02:00
Julius Volz adb87816f4 Put RuleManager concurrency in hands of caller, fix races. 2013-06-05 13:56:56 +02:00
Matt T. Proud c10780c966 Introduce telemetry for rule evaluator durations.
This commit adds telemetry for the Prometheus expression rule
evaluator, which will enable meta-Prometheus monitoring of customers
to ensure that no instance is falling behind in answering routine
queries.

A few other sundry simplifications are introduced, too.
2013-05-23 21:29:27 +02:00
Julius Volz 56324d8ce2 Make AST query storage non-global. 2013-05-07 13:15:10 +02:00
Julius Volz 9cea5d9df8 Convert the Prometheus configuration to protocol buffers. 2013-04-30 22:26:00 +02:00
Julius Volz d8110fcd9c Send sample arrays instead of single samples over channels. 2013-04-29 17:24:17 +02:00
Julius Volz 2202cd71c9 Track alerts over time and write out alert timeseries. 2013-04-26 14:35:21 +02:00
Julius Volz c0601abf46 Implement initial no-op alert parsing and rule parsing tests. 2013-04-23 13:48:24 +02:00
Julius Volz 1eb586db7d Fix rule evaluation closure. 2013-04-17 15:11:21 +02:00
Julius Volz c4d0969c00 Propagate more errors during rule evaluation. 2013-04-09 13:47:20 +02:00
Matt T. Proud ea54751431 Update import paths to new location.
This repository moved from matttproud/prometheus to
prometheus/prometheus, and all import paths need to be updated.
2013-01-27 18:49:45 +01:00
Julius Volz 483bd81a44 Allow grammar to parse both rules and single expressions. 2013-01-11 01:17:37 +01:00
Julius Volz 56384bf42a Add initial config and rule language implementation. 2013-01-07 23:43:36 +01:00