Commit graph

95 commits

Author SHA1 Message Date
Julius Volz e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00
Brian Brazil 5edf689133 Stagger scrapes to spread out load.
Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Brian Brazil e041c0cd46 Add console and alert templates with access to all data.
Move rulemanager to its own package to break circular dependency.
Make NewTestTieredStorage available to tests, remove duplication.

Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
2014-05-30 16:24:56 +01:00
Julius Volz 01f652cb4c Separate storage implementation from interfaces.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:

rule checker -> query layer -> tiered storage layer -> leveldb

This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:

- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
                         and LevelDB storage.

I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.

The query layers and most other parts of Prometheus now have no notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.

The rule_checker is now a static binary :)

Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
2014-04-16 13:30:19 +02:00
Matt T. Proud 2064f32662 Clean up quitting behavior and add quit trigger.
The closing of Prometheus now uses a sync.Once wrapper to prevent
any accidental multiple invocations of it, which could trigger
corruption or a race condition.  The shutdown process is made more
verbose through logging.

A web handler, disabled by default, has been provided to trigger a
remote shutdown for debugging purposes.
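
As a minimal sketch of the sync.Once approach described above (the `server`
type, its fields, and the log text are hypothetical and not taken from the
Prometheus source), concurrent calls to Close run the shutdown body once:

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the process-wide state.
type server struct {
	closeOnce sync.Once
	closed    int // how many times the shutdown body actually ran
}

// Close may be reached from a signal handler, a web handler, or a
// deferred call; sync.Once guarantees the body runs at most once.
func (s *server) Close() {
	s.closeOnce.Do(func() {
		fmt.Println("shutting down...")
		s.closed++
	})
}

func main() {
	s := &server{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Close()
		}()
	}
	wg.Wait()
	fmt.Println("shutdown body ran", s.closed, "time(s)")
}
```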

Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e
2014-04-15 21:40:04 +02:00
Matt T. Proud 81367893fd Use idiomatic one-to-many one-time signal pattern.
The idiomatic pattern for signalling a one-time message to multiple
consumers from a single producer is as follows:

```
  package main

  import "sync"

  func main() {
    c := make(chan struct{})  // Closed once to signal every consumer.
    w := new(sync.WaitGroup)  // Boilerplate to ensure synchronization.

    for i := 0; i < 1000; i++ {
      w.Add(1)
      go func() {
        defer w.Done()

        for {
          select {
          case <-c:
            // A receive on a closed channel returns immediately.
            return
          default:
            // Do something here.
          }
        }
      }()
    }

    close(c)  // Signal the one-to-many single-use message.
    w.Wait()
  }
```

Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
2014-04-15 10:15:25 +02:00
Matt T. Proud 1d01435d4d Make curation semaphore behavior idiomatic.
Idiomatic semaphore usage in Go, unless it is wrapping a concrete type,
should use anonymous empty structs (``struct{}``).  This has several
features that are worthwhile:

  1. It conveys that the object in the channel is likely used for
     resource limiting / semaphore use.  This is by idiom.

  2. Due to magic under the hood, empty structs have a width of zero,
     meaning they consume little space.  It is presumed that slices,
     channels, and other values of them can be represented specially
     with alternative optimizations.  Dmitry Vyukov has done
     investigations into improvements that can be made to the channel
     design in Go and concludes that there are already nice
     short-circuiting behaviors at work with this type.
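
A minimal sketch of the idiom, assuming a buffered channel of empty structs
as the semaphore (the `semaphore` type and `runLimited` helper are
hypothetical names, not the curation code itself). The deferred release is
what keeps a slot from being acquired and never handed back:

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a counting semaphore; the channel capacity is the
// number of slots, and the struct{} elements carry no data.
type semaphore chan struct{}

func (s semaphore) acquire() { s <- struct{}{} }
func (s semaphore) release() { <-s }

// runLimited starts n jobs, lets at most limit of them run at once,
// and returns the highest concurrency it observed.
func runLimited(n, limit int) int {
	sem := make(semaphore, limit)
	var (
		mu            sync.Mutex
		running, peak int
		wg            sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.acquire()
			defer sem.release() // always return the slot

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			// The limited work would happen here.

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", runLimited(100, 3))
}
```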

This is the first change of several that apply this type of change to
suitable places.

In this one change, we fix a bug in the previous revision, whereby a
semaphore can be acquired for curation and never released back for
subsequent work: http://goo.gl/70Y2qK.  Compare that versus the
compaction definition above.

On top of that, using the semaphore in this manner better supports
system shutdown idioms through the closing of channels.

Change-Id: Idb4fca310f26b73c9ec690bbdd4136180d14c32d
2014-04-14 22:51:58 +02:00
Julius Volz 817d9b0e97 "go fmt" fixup.
Change-Id: I262bb462281bc2610819c822fc7a0768c6ce3d8d
2014-02-27 19:48:55 +01:00
Julius Volz 2279fcbac4 Compact everything to the same sample group size.
Change-Id: Ibb4f3a5d76173d64de916ef1eb41ab5d7900c97b
2014-02-19 16:22:20 +01:00
Julius Volz 61d26e8445 Add optional sample replication to OpenTSDB.
Prometheus needs long-term storage. Since we don't have enough resources
to build our own timeseries storage from scratch on top of Riak,
Cassandra or a similar distributed datastore at the moment, we're
planning on using OpenTSDB as long-term storage for Prometheus. Its
data model is roughly compatible with that of Prometheus, with some
caveats.

As a first step, this adds write-only replication from Prometheus to
OpenTSDB, with the following things worth noting:

1)
I tried to keep the integration lightweight, meaning that anything
related to OpenTSDB is isolated to its own package and only main knows
about it (essentially it tees all samples to both the existing storage
and TSDB). It's not touching the existing TieredStorage at all to avoid
more complexity in that area. This might change in the future,
especially if we decide to implement a read path for OpenTSDB through
Prometheus as well.

2)
Backpressure while sending to OpenTSDB is handled by simply dropping
samples on the floor when the in-memory queue of samples destined for
OpenTSDB runs full.  Prometheus also only attempts to send samples once,
rather than implementing a complex retry algorithm. Thus, replication to
OpenTSDB is best-effort for now.  If needed, this may be extended in the
future.

3)
Samples are sent in batches of limited size to OpenTSDB. The optimal
batch size, timeout parameters, etc. may need to be adjusted in the
future.

4)
OpenTSDB has different rules for legal characters in tag (label) values.
While Prometheus allows any characters in label values, OpenTSDB limits
them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal
characters in Prometheus label values are simply replaced by an
underscore. Especially when integrating OpenTSDB with the read path in
Prometheus, we'll need to reconsider this: either we'll need to
introduce the same limitations for Prometheus labels or escape/encode
illegal characters in OpenTSDB in such a way that they are fully
decodable again when reading through Prometheus, so that corresponding
timeseries in both systems match in their labelsets.
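
Points 2 and 4 can be sketched roughly as follows. This is an illustration
under assumptions, not the code from this change: `sample`, `enqueue`, and
`sanitizeTagValue` are hypothetical names, and the legal character set is the
one listed in point 4.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeTagValue replaces every character outside a-z, A-Z, 0-9,
// '-', '_', '.' and '/' with an underscore.
func sanitizeTagValue(v string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			return r
		case r == '-', r == '_', r == '.', r == '/':
			return r
		}
		return '_'
	}, v)
}

// sample is a hypothetical, simplified sample representation.
type sample struct {
	Metric string
	Value  float64
}

// enqueue attempts a non-blocking send. It reports false when the
// queue is full, in which case the sample is simply dropped.
func enqueue(queue chan<- sample, s sample) bool {
	select {
	case queue <- s:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(sanitizeTagValue("http://host:9090/metrics"))

	queue := make(chan sample, 2)
	for i := 0; i < 3; i++ {
		ok := enqueue(queue, sample{Metric: "m", Value: float64(i)})
		fmt.Println("queued:", ok)
	}
}
```

The replacement is lossy: distinct label values such as "a b" and "a:b" both
map to "a_b", which is why the read path would need a reversible encoding.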

Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5
2014-01-02 18:21:38 +01:00
Julius Volz fb44580110 Cleanup/fix program termination sequence.
Change-Id: I2bc58a2583fb079c9ef383cfc7a5e0fbe613f1cd
2013-12-11 15:40:32 +01:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192 bits, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.
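
To illustrate the size argument, here is a small sketch with a hypothetical
`Timestamp` type defined as whole seconds since the Unix epoch. The real
clientmodel.Timestamp may differ in resolution and methods; only the
int64-versus-time.Time comparison is the point here.

```go
package main

import (
	"fmt"
	"time"
	"unsafe"
)

// Timestamp is a hypothetical compact timestamp: whole seconds since
// the Unix epoch, stored in a single 64-bit integer.
type Timestamp int64

// TimestampFromTime discards sub-second precision and location.
func TimestampFromTime(t time.Time) Timestamp { return Timestamp(t.Unix()) }

// Time converts back to a time.Time for display or arithmetic.
func (t Timestamp) Time() time.Time { return time.Unix(int64(t), 0) }

// Before reports whether t is earlier than o; plain integer ordering
// matches the ordering of keys persisted as integers.
func (t Timestamp) Before(o Timestamp) bool { return t < o }

// sizes returns the in-memory size in bytes of both representations.
func sizes() (timeBytes, timestampBytes uintptr) {
	return unsafe.Sizeof(time.Time{}), unsafe.Sizeof(Timestamp(0))
}

func main() {
	tb, sb := sizes()
	fmt.Println("time.Time bytes:", tb)
	fmt.Println("Timestamp bytes:", sb)

	ts := TimestampFromTime(time.Now())
	fmt.Println(ts.Before(ts + 1))
}
```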

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Matt T. Proud df2f9e47b8 Fix arguments for format string.
Change-Id: Ie0d55a70969c992b1afc6fa96b0e3f2171f0de5a
2013-09-04 18:38:29 +02:00
Matt T. Proud 33d6f73d18 Remove unused import.
Change-Id: Ic2415bd1abb07c43a9df3adc4b13c75fd465c767
2013-09-04 18:13:06 +02:00
Matt T. Proud 4a87c002e8 Update low-level i'faces to reflect wireformats.
This commit fixes a critique of the old storage API design, whereby
the input parameters were always as raw bytes and never Protocol
Buffer messages that encapsulated the data, meaning every place a
read or mutation was conducted needed to manually perform said
translations on its own.  This is taxing.

Change-Id: I4786938d0d207cefb7782bd2bd96a517eead186f
2013-09-04 17:13:58 +02:00
Matt T. Proud 12d5e6ca5a Curation should not starve user-interactive ops.
The background curation should be staggered to ensure that disk
I/O yields to user-interactive operations in a timely manner. The
lack of routine prioritization necessitates this.

Change-Id: I9b498a74ccd933ffb856e06fedc167430e521d86
2013-08-26 19:40:55 +02:00
Julius Volz 93a8d03221 Merge "Add alert-expression console links to notifications." 2013-08-24 19:40:50 +02:00
Matt T. Proud 0d73b8f87e Adjust curation thresholds and intervals.
Curation is an expensive process.  We can perform it less frequently;
and in the case of compaction processors, we can afford to let them
group more together.

Change-Id: I421e27958ed7a98dfacaababefad19462772b6a3
2013-08-23 16:07:37 +02:00
Julius Volz 1eb1ceac8c Add alert-expression console links to notifications.
The ConsoleLinkForExpression() function now escapes console URLs in such a way
that works both in emails and in HTML.

Change-Id: I917bae0b526cbbac28ccd2a4ec3c5ac03ee4c647
2013-08-20 15:45:41 +02:00
Julius Volz 305662bfdd Handle SIGTERM in addition to SIGINT.
Change-Id: I9c91ec7e2b9026b9eb152b98a6986f2bf773eb03
2013-08-14 12:38:21 +02:00
Matt T. Proud 972e856d9b Kill the curation state channel.
The use of the channels for curation state was always unidiomatic.

Change-Id: I1cb1d7175ebfb4faf28dff84201066278d6a0d92
2013-08-13 17:20:22 +02:00
Julius Volz d69b85e6c9 Add global label support via Ingesters. 2013-08-13 16:54:15 +02:00
Matt T. Proud 417e6ffa0c Rename the queue arguments for another trial.
Again, playing around.

Change-Id: I5413f7723a38ae18792802c5ed91753101adf491

Moving order of arguments by priority.

Testing out patch sets.

Change-Id: I5413f7723a38ae18792802c5ed91753101adf491
2013-08-13 14:05:12 +02:00
Matt T. Proud 014c5ef176 Move version to the bottom.
Again, dummy commit.  Just to demonstrate naive drafting with Gerrit.

Change-Id: I43afaf346c61738a17be60eaa22c966545e1eccf
2013-08-13 13:52:24 +02:00
Matt T. Proud 13bfc6c120 Consolidate queue arguments in main.
This is a no-op commit to demonstrate Gerrit workflow.

Change-Id: I3f09ccd91e4645753517fa31ca97384e7548317f
2013-08-13 13:47:37 +02:00
Julius Volz 0003027dce Add needed trailing spaces in logs. 2013-08-12 18:22:48 +02:00
Julius Volz aa5d251f8d Use github.com/golang/glog for all logging. 2013-08-12 17:54:36 +02:00
Julius Volz 3b970c5133 Add variable interpolation to notification messages.
This includes required refactorings to enable replacing the http client (for
testing) and moving the NotificationReq type definitions to the "notifications"
package, so that this package doesn't need to depend on "rules" anymore and
that it can instead use a representation of the required data which only
includes the necessary fields.
2013-08-12 12:29:08 +02:00
Julius Volz ecf0ee8f39 Transfer alerting rule and Prometheus URL to alertmanager. 2013-08-09 18:32:13 +02:00
Matt T. Proud 9baaadfc53 Include forgotten interval. 2013-08-05 18:34:19 +02:00
Matt T. Proud 07ac921aec Code Review: First pass. 2013-08-05 17:31:49 +02:00
Matt T. Proud d8792cfd86 Extract HighWatermarking.
Clean up the rest.
2013-08-05 11:03:03 +02:00
Julius Volz 35ee2cd3cb Add alertmanager notification support to Prometheus.
Alert definitions now also have mandatory SUMMARY and DESCRIPTION fields
that get sent along with a firing alert to the alert manager.
2013-07-30 17:23:41 +02:00
Matt T. Proud 30b1cf80b5 WIP - Snapshot of Moving to Client Model. 2013-06-25 15:52:42 +02:00
Julius Volz 0226d1ac7a Implement alerts dashboard and expression console links. 2013-06-13 22:35:40 +02:00
Julius Volz ba29d07901 Show loaded rules in Status dashboard. 2013-06-11 11:39:31 +02:00
Matt T. Proud 2c3df44af6 Ensure database access waits until it is started.
This commit introduces a channel message to ensure serving
state has been reached with the storage stack before anything attempts
to use it.
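
One way to express such a readiness signal is sketched below. It assumes the
signal is delivered by closing a channel, which may not be how this commit
does it, and the `storage` type and its methods are hypothetical.

```go
package main

import "fmt"

// storage is a hypothetical stand-in for the storage stack.
type storage struct {
	ready chan struct{} // closed once the serving state is reached
	data  map[string]float64
}

func newStorage() *storage {
	return &storage{ready: make(chan struct{})}
}

// serve performs (simulated) startup work, then signals readiness.
func (s *storage) serve() {
	s.data = map[string]float64{"up": 1}
	close(s.ready)
}

// get blocks until the storage is serving before touching its data.
func (s *storage) get(name string) float64 {
	<-s.ready
	return s.data[name]
}

func main() {
	s := newStorage()
	go s.serve()
	fmt.Println(s.get("up"))
}
```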
2013-06-06 10:42:21 +02:00
Julius Volz adb87816f4 Put RuleManager concurrency in hands of caller, fix races. 2013-06-05 13:56:56 +02:00
Matt T. Proud 0d2d6e9a27 Include uptime in the status console.
In order to help corroborate whether a Prometheus instance has
flapped until meta-monitoring is in-place, we ought to provide the
instance's start time in the console to aid in diagnostics.
2013-05-24 10:44:34 +02:00
Matt T. Proud b063338ae6 Expand the in-memory arena size.
We need to exercise this code path sooner rather than later.  This will
be a cheap way of doing so.
2013-05-14 17:59:52 +02:00
Bernerd Schaefer 63d9988b9c Drop unused writeMemoryInterval 2013-05-14 17:03:03 +02:00
Matt T. Proud b224251981 Simplify compaction and expose database sizes.
This commit simplifies the way that compactions across a database's
keyspace occur due to reading the LevelDB internals. Secondarily it
introduces the database size estimation mechanisms.

Include database health and help interfaces.

Add database statistics; remove status goroutines.

This commit kills the use of goroutines to expose status throughout
the web components of Prometheus. It also dumps raw LevelDB status
on a separate /databases endpoint.
2013-05-14 12:29:53 +02:00
juliusv 92ad65ff13 Merge pull request #232 from prometheus/optimize/granular-storage-locking
Synchronous memory appends and more fine-grained storage locks.
2013-05-13 10:11:57 -07:00
Matt T. Proud d538b0382f Include long-tail data deletion mechanism.
This commit introduces the long-tail deletion mechanism, which will
automatically cull old sample values.  It is an acceptable
hold-over until we get a resampling pipeline implemented.

Kill legacy OS X documentation, too.
2013-05-13 10:54:36 +02:00
Julius Volz ce1ee444f1 Synchronous memory appends and more fine-grained storage locks.
This does two things:

1) Make TieredStorage.AppendSamples() write directly to memory instead of
   buffering to a channel first. This is needed in cases where a rule might
   immediately need the data generated by a previous rule.

2) Replace the single storage mutex by two new ones:
   - memoryMutex - needs to be locked at any time that two concurrent
                   goroutines could be accessing (via read or write) the
                   TieredStorage memoryArena.
   - memoryDeleteMutex - used to prevent any deletion of samples from
                         memoryArena as long as renderView is running and
                         assembling data from it.
   The LevelDB disk storage does not need to be protected by a mutex when
   rendering a view since renderView works off a LevelDB snapshot.

The rationale against adding memoryMutex directly to the memory storage: taking
a mutex does come with a small inherent time cost, and taking it is only
required in few places. In fact, no locking is required for the memory storage
instance which is part of a view (and not the TieredStorage).
2013-05-10 17:15:52 +02:00
Matt Proud 7f0d816574 Schedule the background compactors to run.
This commit introduces three background compactors, which compact
sparse samples together.

1. Samples older than five minutes are grouped together into chunks of 50
   every 30 minutes.

2. Samples older than 60 minutes are grouped together into chunks of 250
   every 50 minutes.

3. Samples older than one day are grouped together into chunks of 5000
   every 70 minutes.
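
The schedule above can be written down as a small table. This is only an
illustrative sketch: the `compactor` type, its field names, and the
`groupSizeFor` helper are hypothetical and do not mirror the actual curator
code.

```go
package main

import (
	"fmt"
	"time"
)

// compactor describes one background compaction pass.
type compactor struct {
	minAge    time.Duration // only samples older than this are touched
	groupSize int           // number of samples per compacted chunk
	interval  time.Duration // how often the pass runs
}

// compactors lists the three passes, ordered by increasing minAge.
var compactors = []compactor{
	{minAge: 5 * time.Minute, groupSize: 50, interval: 30 * time.Minute},
	{minAge: 60 * time.Minute, groupSize: 250, interval: 50 * time.Minute},
	{minAge: 24 * time.Hour, groupSize: 5000, interval: 70 * time.Minute},
}

// groupSizeFor returns the chunk size of the last pass whose age
// threshold a sample of the given age has crossed, or 0 if none.
func groupSizeFor(age time.Duration) int {
	size := 0
	for _, c := range compactors {
		if age > c.minAge {
			size = c.groupSize
		}
	}
	return size
}

func main() {
	for _, age := range []time.Duration{time.Minute, 10 * time.Minute, 2 * time.Hour, 48 * time.Hour} {
		fmt.Println(age, "->", groupSizeFor(age))
	}
}
```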
2013-05-07 17:14:04 +02:00
Julius Volz af7920126c Fix build errors and add default build step to "make". 2013-05-07 15:54:41 +02:00
Julius Volz 56324d8ce2 Make AST query storage non-global. 2013-05-07 13:15:10 +02:00
Matt T. Proud 3b9b1c6ab4 Define dependencies for web stack concretely.
This commit destroys the use of AppState, which makes passing
concrete state along to various serving components onerous.
2013-05-06 11:13:12 +02:00