Commit graph

1203 commits

Author SHA1 Message Date
Tom Wilkie b568ace7ce Move protos to ./prompb 2017-07-12 22:06:35 +01:00
Tom Wilkie 96e25adc8d Introduce 'primary' storage in fanout, and have Add return the ref from the primary.
Also, ensure all append batches are rolled back when a commit or rollback fails.
2017-07-12 15:51:05 +01:00
Tom Wilkie db8128ceeb Add label set as first parameter to AddFast, ingored by TSDB adapter. 2017-07-12 15:20:12 +01:00
Tom Wilkie 2dda5775e3 Initial port of remote storage to v2. 2017-07-12 12:27:57 +01:00
Fabian Reinartz 16464c3a33 Merge pull request #2910 from prometheus/adminapi
Admin API
2017-07-11 17:15:49 +02:00
Fabian Reinartz ccf9e62972 *: add admin grpc API 2017-07-10 09:14:14 +02:00
Goutham Veeramachaneni 243419c007 Return tsdb.ErrOutOfBounds as storage.ErrOutOfBounds
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-07-06 14:18:31 +02:00
Goutham Veeramachaneni 3069bd3996 Handle scrapes with OutOfBounds metrics better
fixes #2894

Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-07-04 11:24:13 +02:00
Goutham Veeramachaneni d407bd150c Consolidate the duration params in CLI
* All CLI params moved to model.Duration

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 20:20:57 +05:30
Goutham Veeramachaneni baf5b0f0fc Fix error where we look into the future. (#2829)
* Fix error where we look into the future.

So currently we are adding values that are in the future for an older
timestamp. For example, if we have [(1, 1), (150, 2)] we will end up
showing [(1, 1), (2,2)].

Further it is not advisable to call .At() after Next() returns false.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Retuen early if done

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Handle Seek() where we reach the end of iterator

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Simplify code

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-13 07:22:27 +02:00
Brian Brazil c02c25d5ba Allow peeking back further in buffer. 2017-05-24 14:27:17 +01:00
Fabian Reinartz d289dc55c3 storage: update TSDB 2017-05-22 11:53:08 +02:00
Fabian Reinartz 9b175d48cb Add flag to disable TSDB lock file 2017-05-09 12:56:51 +02:00
Fabian Reinartz 0f3110487d Merge remote-tracking branch 'origin/dev-2.0' into dev-2.0 2017-04-27 10:25:04 +02:00
Fabian Reinartz 37deb21c45 vendor: remove unused dependency and last ref to fabxc/tsdb 2017-04-27 10:23:34 +02:00
Brian Brazil 5c9a6ce747 Add license to files.
This should fix CI for dev-2.0.
2017-04-19 13:46:22 +01:00
Fabian Reinartz 8ffc851147 Merge branch 'master' into dev-2.0 2017-04-04 15:17:56 +02:00
Fabian Reinartz cfb2a7f1d5 vendor: sync organisation migration of tsdb 2017-04-04 11:33:51 +02:00
Fabian Reinartz bbcf20ba01 web: deduplicate series in federation 2017-04-04 11:20:23 +02:00
Fabian Reinartz 4e41987bcb storage: add deduplication function
This adds a function to deduplicate two series sets given that duplicate
series have equivalent data points.
2017-04-04 11:07:21 +02:00
Björn Rabenstein 50e4f49b7e Merge pull request #2561 from prometheus/beorn7/storage2
storage: Evict unused chunk.Descs in crash recovery
2017-04-04 00:05:03 +02:00
beorn7 08fc6cbd39 storage: Evict unused chunk.Descs in crash recovery
This is in line with the v1.5 change in paradigm to not keep
chunk.Descs without chunks around after a series maintenance.

It's mainly motivated by avoiding excessive amounts of RAM usage
during crash recovery.

The code avoids to create memory time series with zero chunk.Descs as
that is prone to trigger weird effects. (Series maintenance would
archive series with zero chunk.Descs, but we cannot do that here
because the archive indices still have to be checked.)
2017-04-04 00:04:22 +02:00
Björn Rabenstein 1c6240fc40 Merge pull request #2559 from prometheus/beorn7/storage
storage: Replace fpIter by sortedFPs
2017-04-03 16:56:21 +02:00
beorn7 d284ffab03 storage: Replace fpIter by sortedFPs
The fpIter was kind of cumbersome to use and required a lock for each
iteration (which wasn't even needed for the iteration at startup after
loading the checkpoint).

The new implementation here has an obvious penalty in memory, but it's
only 8 byte per series, so 80MiB for a beefy server with 10M memory
time series (which would probably need ~100GiB RAM, so the memory
penalty is only 0.1% of the total memory need).

The big advantage is that now series maintenance happens in order,
which leads to the time between two maintenances of the same series
being less random. Ideally, after each maintenance, the next
maintenance would tackle the series with the largest number of
non-persisted chunks. That would be quite an effort to find out or
track, but with the approach here, the next maintenance will tackle
the series whose previous maintenance is longest ago, which is a good
approximation.

While this commit won't change the _average_ number of chunks
persisted per maintenance, it will reduce the mean time a given chunk
has to wait for its persistence and thus reduce the steady-state
number of chunks waiting for persistence.

Also, the map iteration in Go is non-deterministic but not truly
random. In practice, the iteration appears to be somewhat "bucketed".
You can often observe a bunch of series with similar duration since
their last maintenance, i.e. you see batches of series with similar
number of chunks persisted per maintenance. If that batch is
relatively young, a whole lot of series are maintained with very few
chunks to persist. (See screenshot in PR for a better explanation.)
2017-04-03 15:34:46 +02:00
Tobias Schmidt eac36d123e Fix unstable fanin test (#2558) 2017-04-03 13:02:15 +02:00
Julius Volz 5a896033e3 Add remote read external label handling (#2555)
* Add remote read external label handling

This implements rule 1 and 2 from
https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit

* Use more descriptive example labels in read test

* Add comment for querier.addExternalLabels()

* Make argument naming in removeLabels() more generic
2017-04-02 17:48:15 +02:00
Björn Rabenstein e63d079b59 Merge pull request #2527 from prometheus/beorn7/storage
storage: Evict chunks and calculate persistence pressure...
2017-03-27 14:49:42 +02:00
Julius Volz b5b0e00923 Merge pull request #2499 from prometheus/remote-read
Remote Read
2017-03-27 14:43:44 +02:00
beorn7 434ab2a6a3 storage: Evict chunks and calculate persistence pressure based on target heap size
This is a fairly easy attempt to dynamically evict chunks based on the
heap size. A target heap size has to be set as a command line flage,
so that users can essentially say "utilize 4GiB of RAM, and please
don't OOM".

The -storage.local.max-chunks-to-persist and
-storage.local.memory-chunks flags are deprecated by this
change. Backwards compatibility is provided by ignoring
-storage.local.max-chunks-to-persist and use
-storage.local.memory-chunks to set the new
-storage.local.target-heap-size to a reasonable (and conservative)
value (both with a warning).

This also makes the metrics intstrumentation more consistent (in
naming and implementation) and cleans up a few quirks in the tests.

Answers to anticipated comments:

There is a chance that Go 1.9 will allow programs better control over
the Go memory management. I don't expect those changes to be in
contradiction with the approach here, but I do expect them to
complement them and allow them to be more precise and controlled. In
any case, once those Go changes are available, this code has to be
revisted.

One might be tempted to let the user specify an estimated value for
the RSS usage, and then internall set a target heap size of a certain
fraction of that. (In my experience, 2/3 is a fairly safe bet.)
However, investigations have shown that RSS size and its relation to
the heap size is really really complicated. It depends on so many
factors that I wouldn't even start listing them in a commit
description. It depends on many circumstances and not at least on the
risk trade-off of each individual user between RAM utilization and
probability of OOMing during a RAM usage peak. To not add even more to
the confusion, we need to stick to the well-defined number we also use
in the targeting here, the sum of the sizes of heap objects.
2017-03-27 14:33:50 +02:00
beorn7 96a303b348 storage: Use staleness delta as head chunk timeout
Currently, if a series stops to exist, its head chunk will be kept
open for an hour. That prevents it from being persisted. Which
prevents it from being evicted. Which prevents the series from being
archived.

Most of the time, once no sample has been added to a series within the
staleness limit, we can be pretty confident that this series will not
receive samples anymore. The whole chain as described above can be
started after 5m instead of 1h. In the relaxed case, this doesn't
change a lot as the head chunk timeout is only checked during series
maintenance, and usually, a series is only maintained every six
hours. However, there is the typical scenario where a large service is
deployed, the deoply turns out to be bad, and then it is deployed
again within minutes, and quite quickly the number of time series has
tripled. That's the point where the Prometheus server is stressed and
switches (rightfully) into rushed mode. In that mode, time series are
processed as quickly as possible, but all of that is in vein if all of
those recently ended time series cannot be persisted yet for another
hour. In that scenario, this change will help most, and it's exactly
the scenario where help is most desperately needed.
2017-03-26 23:44:50 +02:00
Julius Volz 3f23aa2cc7 Add headers to indicate remote read/write version
Also add Content-Type header.
2017-03-24 17:39:51 +01:00
Julius Volz 8fda83ea12 Make rules only read local data 2017-03-21 00:50:04 +01:00
Julius Volz 94acd3f1d8 Add fanin tests and fix uncovered bugs 2017-03-21 00:08:17 +01:00
Julius Volz 9b33cfc457 Fix/unify context-based remote storage timeouts 2017-03-20 14:17:06 +01:00
Julius Volz 815762a4ad Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig 2017-03-20 14:17:04 +01:00
Fabian Reinartz 397f001ac5 Merge branch 'master' into dev-2.0 2017-03-20 14:12:11 +01:00
Julius Volz eb14678a25 Make remote read/write use config.HTTPClientConfig 2017-03-20 13:37:50 +01:00
Julius Volz 406b65d0dc Rename remote.Storage to remote.Writer 2017-03-20 13:15:28 +01:00
Julius Volz 02395a224d [WIP] Remote Read 2017-03-20 13:13:44 +01:00
Julius Volz 40e41a4776 Merge pull request #2494 from tomwilkie/remote-write-sharding
Dynamically reshard the QueueManager based on observed load.
2017-03-20 12:45:17 +01:00
Fabian Reinartz b586781283 *: update tsdb vendoring and add retention flag 2017-03-17 16:06:04 +01:00
beorn7 48d221c11e storage: Fix typo in comment 2017-03-16 11:49:41 +01:00
Fabian Reinartz 0ecd205794 promql: Use buffer pool for matrix allocations 2017-03-14 10:57:34 +01:00
Tom Wilkie 75bb0f3253 Review feedback 2017-03-13 21:24:49 +00:00
Tom Wilkie 77cce900b8 Fix tests 2017-03-13 15:21:59 +00:00
Tom Wilkie b48799a01e Add license stanza 2017-03-13 14:50:15 +00:00
Tom Wilkie 9d22f030cf Dynamically reshard the QueueManager based on observed load. 2017-03-13 14:41:16 +00:00
Fabian Reinartz 8a8eb12985 storage/tsdb: don't use partitioned DB. 2017-03-07 11:51:30 +01:00
Fabian Reinartz 9eb1d6c927 remote: take code from master 2017-03-07 11:43:32 +01:00
Fabian Reinartz 9304179ef7 Merge branch 'master' into dev-2.0 2017-03-02 08:16:58 +01:00
Fabian Reinartz 4397b4d508 *: pass Prometheus registry into storage 2017-02-28 09:33:14 +01:00
Tom Wilkie 1ab893c6ec Limit 'discarding sample' logs to 1 every 10s (#2446)
* Limit 'discarding sample' logs to 1 every 10s

* Include the vendored library

* Review feedback
2017-02-23 19:20:39 +01:00
Julius Volz 2f39dbc8b3 Rename StorageQueueManager -> QueueManager 2017-02-21 21:45:43 +01:00
Julius Volz e9476b35d5 Re-add multiple remote writers
Each remote write endpoint gets its own set of relabeling rules.

This is based on the (yet-to-be-merged)
https://github.com/prometheus/prometheus/pull/2419, which removes legacy
remote write implementations.
2017-02-20 13:23:12 +01:00
Björn Rabenstein 089dc1076b Merge pull request #2435 from jmeulemans/open-chunks-gauge
Adding gauge for number of open head chunks.
2017-02-17 16:02:06 +01:00
Jeremy Meulemans 025c828976 Changed to open_head_chunks to address review.
Now incrementing numHeadChunks directly.
2017-02-17 07:10:13 -06:00
Jeremy Meulemans 074050b8c0 Updating for failed codeclimate check. 2017-02-16 18:04:28 -06:00
Jeremy Meulemans f70b52d0b6 Adding gauge for number of open head chunks.
Fixes #1710
2017-02-16 17:56:45 -06:00
Julius Volz beb3c4b389 Remove legacy remote storage implementations
This removes legacy support for specific remote storage systems in favor
of only offering the generic remote write protocol. An example bridge
application that translates from the generic protocol to each of those
legacy backends is still provided at:

documentation/examples/remote_storage/remote_storage_bridge

See also https://github.com/prometheus/prometheus/issues/10

The next step in the plan is to re-add support for multiple remote
storages.
2017-02-14 17:52:05 +01:00
beorn7 d771185a43 storage: Fix chunkIndexToStartSeek calculation
With a high enough shrink ratio and enough chunks to persist, the
cutoff point could be _outside_ of the file, which wreaks havoc in the
storage.
2017-02-10 11:42:59 +01:00
beorn7 73bd5e4dff Merge branch 'beorn7/storage' into beorn7/storage3 2017-02-09 14:44:10 +01:00
beorn7 46a0837816 storage: Fix offset returned by dropAndPersistChunks
This is another corner-case that was previously never exercised
because the rewriting of a series file was never prevented by the
shrink ratio.

Scenario: There is an existing series on disk, which is archived. If a
new sample comes in for that file, a new chunk in memory is created,
and the chunkDescsOffset is set to -1. If series maintenance happens
before the series has at least one chunk to persist _and_ an
insufficient chunks on disk is old enough for purging (so that the
shrink ratio kicks in), dropAndPersistChunks would return 0, but it
should return the chunk length of the series file.
2017-02-09 14:35:07 +01:00
beorn7 9d12204da5 Merge branch 'release-1.5' 2017-02-09 13:11:53 +01:00
beorn7 bed4934224 storage: One more persist error code path discovered
Also, in that code path, set chunkDescsOffset to 0 rather than -1 in
case of "dropped more chunks from persistence than from memory" so
that no other weird things happen before the series is quarantined for
good.
2017-02-09 11:51:40 +01:00
beorn7 242d8edcb5 Merge branch 'release-1.5' 2017-02-08 17:28:09 +01:00
beorn7 8c8baaa558 storage: writeMemorySeries needs to return true for quarantined series
This is another fallout of my bug hunt.
2017-02-08 16:28:56 +01:00
Mitsuhiro Tanda be8b1eb656 storage: optimize dropping chunks by using minShrinkRatio (#2397)
storage: prevent unnecessary chunk header reading if minShrinkRatio > 0
2017-02-07 17:33:54 +01:00
beorn7 2363a90adc storage: Do not throw away fully persisted memory series in checkpointing 2017-02-06 17:39:59 +01:00
Fabian Reinartz ea3ba338dd main: add flags for new storage 2017-02-05 18:22:06 +01:00
beorn7 244a65fb29 storage: Increase persist watermark before calling append
The append call may reuse cds, and thus change its len.
(In practice, this wouldn't happen as cds should have len==cap.
Still, the previous order of lines was problematic.)
2017-02-05 02:25:09 +01:00
beorn7 75282b27ba storage: Added checks for invariants 2017-02-04 23:40:22 +01:00
beorn7 31e9db7f0c storage: Simplify evictChunkDesc method 2017-02-04 22:29:37 +01:00
Fabian Reinartz 5772f1a7ba retrieval/storage: adapt to new interface
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
beorn7 65dc8f44d3 storage: Test for errors returned by MaybePopulateLastTime 2017-02-01 23:43:58 +01:00
beorn7 752fac60ae storage: Remove race condition from TestLoop 2017-02-01 23:43:58 +01:00
beorn7 4ccfc93dcf storage: Set shrink ratio in the constructor. 2017-02-01 15:37:16 +01:00
beorn7 b2f086c6c4 storage: Expose bug of not setting the shrink ratio in the contstructor 2017-02-01 15:37:10 +01:00
Brian Brazil c1b547a90e Only checkpoint chunkdescs and series that need persisting. (#2340)
This decreases checkpoint size by not checkpointing things
that don't actually need checkpointing.

This is fully compatible with the v2 checkpoint format,
as it makes series appear as though the only chunksdescs
in memory are those that need persisting.
2017-01-17 00:59:38 +00:00
Fabian Reinartz c691895a0f retrieval: cache series references, use pkg/textparse
With this change the scraping caches series references and only
allocates label sets if it has to retrieve a new reference.
pkg/textparse is used to do the conditional parsing and reduce
allocations from 900B/sample to 0 in the general case.
2017-01-16 12:03:57 +01:00
Brian Brazil f64c231dad Allow checkpoints and maintenance to happen concurrently. (#2321)
This is essential on larger Prometheus servers, as otherwise
checkpoints prevent sufficient persisting of chunks to disk.
2017-01-13 17:24:19 +00:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
Brian Brazil 1dcb7637f5 Add various persistence related metrics (#2333)
Add metrics around checkpointing and persistence

* Add a metric to say if checkpointing is happening,
and another to track total checkpoint time and count.

This breaks the existing prometheus_local_storage_checkpoint_duration_seconds
by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds
as the former name is more appropriate for a summary.

* Add metric for last checkpoint size.

* Add metric for series/chunks processed by checkpoints.

For long checkpoints it'd be useful to see how they're progressing.

* Add metric for dirty series

* Add metric for number of chunks persisted per series.

You can get the number of chunks from chunk_ops,
but not the matching number of series. This helps determine
the size of the writes being made.

* Add metric for chunks queued for persistence

Chunks created includes both chunks that'll need persistence
and chunks read in for queries. This only includes chunks created
for persistence.

* Code review comments on new persistence metrics.
2017-01-11 15:11:19 +00:00
Fabian Reinartz 304cae9928 tsdb: Use PartitionedDB constructor 2017-01-06 12:34:54 +01:00
Brian Brazil f9e581907a Make index queue bigger. (#2322)
When a large Prometheus starts up fresh it can take many minutes
to warmup and clear out the index queue. A larger queue means less
blocking, bigger batches and cuts down startup time by ~50%.
2017-01-05 17:57:42 +00:00
Fabian Reinartz bc20d93f0a storage: rename iterator value getters to At() 2017-01-02 13:33:37 +01:00
Fabian Reinartz 7322c46b8e storage: add mock iterator for test 2016-12-30 10:45:56 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Fabian Reinartz 71fe0c58a8 promql: misc fixes 2016-12-28 11:32:15 +01:00
Mitsuhiro Tanda 7e369b9318 expose max memory chunks metrics (#2303)
* expose max memory chunks metrics
2016-12-27 18:34:07 +00:00
Fabian Reinartz fecf9532b9 *: fix misc compile errors 2016-12-25 11:42:57 +01:00
Fabian Reinartz 622ece6273 *: fix recording tests, migrate matcher types 2016-12-25 11:12:57 +01:00
Fabian Reinartz 0492ddbd4d *: fully decouple tsdb, add new storage interfaces 2016-12-25 01:43:22 +01:00
Fabian Reinartz d17b5be48a storage/metric: remove package 2016-12-25 00:42:52 +01:00
Fabian Reinartz 8b84ee5ee6 storage: remove old storage
This removes all old storage files and only keeps interfaces
to still allow the code to compile.
2016-12-22 23:33:32 +01:00
Fabian Reinartz 11a731ba82 remote: remove hard-coded remote storages
This commit removes the flag-configured remote storage integrations
in favor of the generic remote write path.
2016-12-22 23:17:35 +01:00
Brian Brazil 93b70ee4ea Evict chunk descs of all unloaded chunks during maintenance. (#2297)
Keeping these around has two problems:
1) Each desc takes 64 bytes, 10 of them is 640B. This is a lot of
overhead on a 1024 byte chunk.
2) It can take well over a week to reach a point where this and thus
Prometheus memory usage as a whole enters steady state. This makes RAM
estimation very hard for users, and makes it difficult to investigate
things like memory fragmentation.

Instead we'll wipe them during each memory series maintenance cycle, and
if a query pulls them in they'll hang around as cache until the next
cycle.
2016-12-22 13:49:03 +00:00
Brian Brazil 1b8a474612 Don't clone the metric if there's no remote writes.
The metric clone can't be further optimised, and is a
non-trivial memory allocation cost so fast path it
if there's no remote writes configured.
2016-12-21 11:34:48 +00:00
Tristan Colgate 30be8e0b8a ignore dotfiles in data directory 2016-12-15 11:48:23 +00:00
Björn Rabenstein 45570e5972 Merge pull request #2277 from prometheus/beorn7/storage2
storage: Sanity-check number of loaded chunk descs
2016-12-14 02:59:10 +01:00
beorn7 253be23c00 storage: Sanity-check number of loaded chunk descs
Two cases:

- An unarchived metric must have at least one chunk desc loaded upon
  unarchival. Otherwise, the file is gone or has size 0, which is an
  inconsistency (because the series is still indexed in the archive
  index). Hence, quarantining is triggered.

- If loading the chunk descs of a series with a known chunkDescsOffset
  (i.e. != -1), the number of chunks loaded must be equal to
  chunkDescsOffset. If not, there is a data corruption. An error is
  returned, which leads to qurantining.

In any case, there is a guard added to not access the 1st element of
an empty chunkDescs slice. (That's what triggered the crashes in issue
2249.)  A time series with unknown chunkDescsOffset and no chunks in
memory and no chunks on disk either could trigger that case. I would
assume such a "null series" doesn't exist, but it's not entirely
unthinkable and unreasonable to happen (perhaps in future uses of the
storage). (Create a series, and then something tries to preload chunks
before the first sample is added.)
2016-12-13 23:19:39 +01:00
Björn Rabenstein 5f0c0e43cf Merge pull request #2276 from prometheus/beorn7/storage
storage: Catch data corruption that leads to division by zero
2016-12-13 23:13:39 +01:00
beorn7 837c029b16 storage: Fix linter issue
Go style tries to avoid indented `else` blocks.
2016-12-13 19:05:30 +01:00
beorn7 4719482f5f storage: Make tests go-vet and golint clean 2016-12-13 17:07:27 +01:00
beorn7 485ac8dff7 storage: Verify validity of byte length when unmarshalling (double)delta chunks
This makes sure a division-by-zero crash cannot happen in the Len()
method.

Fixes #2773
2016-12-13 17:07:27 +01:00
tattsun e714079cf2 storage: fix error message (#2270)
* storage: add error message
2016-12-09 22:36:27 +00:00
Christopher M. Luciano 148b006e25 Clarify error message when Prometheus data dir finds unexpected files 2016-12-05 10:51:57 -05:00
Julius Volz 127332c56f Merge pull request #2168 from tomwilkie/chunk-len
Add call to estimate number of samples in a chunk to the API
2016-11-17 23:13:50 -08:00
Tom Wilkie 585878cdb2 Add call to estimate number of samples in a chunk to the API 2016-11-17 19:09:59 +00:00
Björn Rabenstein 036715370f Merge pull request #2184 from huydx/master
Fix possible memory leak by defer inside loop
2016-11-14 15:26:39 +01:00
huydx c999902761 Fix possible memory leak by defer inside loop 2016-11-14 14:08:08 +09:00
Fabian Reinartz 856de30c09 Check error before defer closing
If an error is returned the file might be nil and a Close call
would cause a panic.
2016-11-13 18:16:02 +01:00
Fabian Reinartz 6703404cb4 Merge remote-tracking branch 'origin/release-1.2' 2016-11-01 16:35:22 +01:00
beorn7 c5bd178b93 Protect exported Querier interface method against negative time ranges 2016-11-01 15:05:01 +01:00
beorn7 5b16d6bd6e Merge branch 'release-1.2' 2016-10-31 00:06:23 +01:00
beorn7 876e5da4f8 Add guard against non-monotonic samples in series
This can only happen due to data corruption.
2016-10-25 14:59:33 +02:00
Dominik Schulz 182e17958a Trivial spelling corrections and a small comment. 2016-10-18 20:14:38 +02:00
Fabian Reinartz 8fa18d564a storage: enhance Querier interface usage
This extracts Querier as an instantiateable and closeable object
rather than just defining extending methods of the storage interface.
This improves composability and allows abstracting query transactions,
which can be useful for transaction-level caches, consistent data views,
and encapsulating teardown.
2016-10-16 10:39:29 +02:00
beorn7 719508752b Re-add counting of evict chunk ops and decrementing NumMemChunks
Also, modify test to expose the regression.
2016-10-10 16:30:10 +02:00
Julius Volz cb02f017ee Clean up some doc comments 2016-10-06 21:53:40 +02:00
Julius Volz c212ef0326 Add Chunk.Utilization() methods
When using the chunking code in other projects (both Weave Prism and
ChronixDB ingester), you sometimes want to know how well you are
utilizing your chunks when closing/storing them.
2016-10-06 16:31:59 +02:00
Julius Volz c7932aa009 Remove gRPC leftovers in protobuf definitions 2016-10-05 17:31:04 +02:00
Björn Rabenstein 1e2f03f668 Merge pull request #2005 from redbaron/microoptimise-matching
Microoptimise matching
2016-10-05 17:26:56 +02:00
Maxim Ivanov e6db9f8159 New fpsForLabelMatchers and seriesForLabelMatchers methods
These more specific methods have replaced `metricForLabelMatchers`
in cases where its  `map[fingerprint]metric` result type was
not necessary or was used as an intermediate step

Avoids duplicated calls to `seriesForRange` from
`QueryRange` and `QueryInstant` methods.
2016-10-05 15:15:54 +01:00
Brian Brazil 6e8f87a37f Merge pull request #2047 from prometheus/write-relabel
Add support for remote write relabelling.
2016-10-05 07:47:49 +01:00
Brian Brazil 77605649a9 Add support for remote write relabelling.
Switch back to a single remote writer, as we were only ever meant to
have one and the relabel semantics are clearer that way.
2016-10-05 07:43:19 +01:00
Julius Volz c9d4526428 Unpublish accidentally published series methods
There were some more accidentally published methods of the memorySeries
type which I didn't notice when reviewing https://github.com/prometheus/prometheus/pull/2011
2016-10-03 00:04:56 +02:00
Maxim Ivanov 4978a65495 Extract initial FP candidate build logic into candidateFPsForLabelMatchers method
No functional changes otherwise
2016-10-02 17:35:02 +01:00
Maxim Ivanov c048a0cde8 Add metrics to result after checking all matchers
Should be marginally faster and somewhat more GC friendly
2016-10-02 17:35:02 +01:00
Maxim Ivanov bedc0eda1f Added BenchmarkQueryRange 2016-10-02 17:35:02 +01:00
Julius Volz c25f0de5ae Remove local.ZeroSample{,Pair}, use model definitions 2016-09-28 23:42:45 +02:00
Julius Volz 044ebce779 Review fixups. 2016-09-28 23:42:44 +02:00
Julius Volz d30a3c7c0f Fix accidental publishing of memorySeries.firstTime() 2016-09-26 13:25:27 +02:00
Julius Volz ab80ced756 storage: separate chunk package, publish more names
This is a followup to https://github.com/prometheus/prometheus/pull/2011.

This publishes more of the methods and other names of the chunk code and
moves the chunk code to its own package. There's some unavoidable
ugliness: the chunk and chunkDesc metrics are used by both packages, so
I had to move them to the chunk package. That isn't great, but I don't
see how to do it better without a larger redesign of everything. Same
for the evict requests and some other types.
2016-09-26 13:25:11 +02:00
Julius Volz 42c05dd3a2 Merge pull request #2011 from mattkanwisher/chuck-public
Make Chunk and ChunkIterator public for reuse
2016-09-23 14:45:35 +02:00
beorn7 ca98382943 Avoid defer in seriesMap.get
This is related to https://github.com/golang/go/issues/14939 .
It's probably the only occurrence where it matters.
2016-09-22 17:50:58 +02:00
Matthew Campbell 67d76e3a5d timeseries: store varbit encoded data into cassandra 2016-09-21 17:56:55 +02:00
Tom Wilkie 4520e12440 Add HTTP Basic Auth & TLS support to the generic write path. (#1957)
* Add config, HTTP Basic Auth and TLS support to the generic write path.

- Move generic write path configuration to the config file
- Factor out config.TLSConfig -> tlf.Config translation
- Support TLSConfig for generic remote storage
- Rename Run to Start, and make it non-blocking.
- Dedupe code in httputil for TLS config.
- Make remote queue metrics global.
2016-09-19 22:47:51 +02:00
Julius Volz c187308366 storage: Contextify storage interfaces.
This is based on https://github.com/prometheus/prometheus/pull/1997.

This adds contexts to the relevant Storage methods and already passes
PromQL's new per-query context into the storage's query methods.
The immediate motivation supporting multi-tenancy in Frankenstein, but
this could also be used by Prometheus's normal local storage to support
cancellations and timeouts at some point.
2016-09-19 16:29:07 +02:00
Maxim Ivanov bdc53098fc Avoid having contended mutexes on same cacheline
CPUs have to serialise write access to a single cache line
effectively reducing level of possible parallelism. Placing
mutexes on different cache lines avoids this problem.

Most gains will be seen on NUMA servers where CPU interconnect
traffic is especially expensive

Before:
go test . -run none -bench BenchmarkFingerprintLocker
BenchmarkFingerprintLockerParallel-4   	 2000000	       932 ns/op
BenchmarkFingerprintLockerSerial-4     	30000000	        49.6 ns/op

After:
go test . -run none -bench BenchmarkFingerprintLocker
BenchmarkFingerprintLockerParallel-4   	 3000000	       569 ns/op
BenchmarkFingerprintLockerSerial-4     	30000000	        51.0 ns/op
2016-09-18 23:32:55 +01:00
Julius Volz 5f5a78e807 Merge pull request #1974 from prometheus/disable-local-storage
Allow disabling local storage.
2016-09-17 18:40:01 +02:00
Tom Wilkie d83879210c Switch back to protos over HTTP, instead of GRPC.
My aim is to support the new grpc generic write path in Frankenstein.  On the surface this seems easy - however I've hit a number of problems that make me think it might be better to not use grpc just yet.

The explanation of the problems requires a little background.  At weave, traffic to frankenstein need to go through a couple of services first, for SSL and to be authenticated.  So traffic goes:

    internet -> frontend -> authfe -> frankenstein

- The frontend is Nginx, and adds/removes SSL.  Its done this way for legacy reasons, so the certs can be managed in one place, although eventually we imagine we'll merge it with authfe.  All traffic from frontend is sent to authfe.
- Authfe checks the auth tokens / cookie etc and then picks the service to forward the RPC to.
- Frankenstein accepts the reads and does the right thing with them.

First problem I hit was Nginx won't proxy http2 requests - it can accept them, but all calls downstream are http1 (see https://trac.nginx.org/nginx/ticket/923).  This wasn't such a big deal, so it now looks like:

    internet --(grpc/http2)--> frontend --(grpc/http1)--> authfe --(grpc/http1)--> frankenstein

Next problem was golang grpc server won't accept http1 requests (see https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms).  It is possible to link a grpc server in with a normal go http mux, as long as the mux server is serving over SSL, as the golang http client & server won't do http2 over anything other than an SSL connection.  This would require making all our service to service comms SSL.  So I had a go a writing a grpc http1 server, and got pretty far.  But is was a bit of a mess.

So finally I thought I'd make a separate grpc frontend for this, running in parallel with the frontend/authfe combo on a different port - and first up I'd need a grpc reverse proxy.  Ideally we'd have some nice, generic reverse proxy that only knew about a map from service names -> downstream service, and didn't need to decode & re-encode every request as it went through.  It seems like this can't be done with golang's grpc library - see https://github.com/mwitkow/grpc-proxy/issues/1.

And then I was surprised to find you can't do grpc from browsers! See http://www.grpc.io/faq/ - not important to us, but I'm starting to question why we decided to use grpc in the first place?

It would seem we could have most of the benefits of grpc with protos over HTTP, and this wouldn't preclude moving to grpc when its a bit more mature?  In fact, the grcp FAQ even admits as much:

> Why is gRPC better than any binary blob over HTTP/2?
> This is largely what gRPC is on the wire.
2016-09-15 23:21:54 +01:00
Tobias Schmidt 29ced0090f Fix common english misspellings 2016-09-14 23:23:28 -04:00
Tobias Schmidt e2c12dcdb5 Add missing error check in persistence test 2016-09-14 23:16:47 -04:00
Tobias Schmidt 8f3b62bfe4 Simplify struct initialization 2016-09-14 23:13:27 -04:00
Julius Volz b24e5d63bc Add noop local storage engine.
This adds a flag -storage.local.engine which allows turning off local
storage in Prometheus. Instead of adding if-conditions and nil checks to
all parts of Prometheus that deal with Prometheus's local storage
(including the web interface), disabling local storage simply means
replacing the normal local storage with a noop version that throws
samples away and returns empty query results. We also don't add the noop
storage to the fanout appender to decrease internal overhead.

Instead of returning empty results, an alternate behavior could be to
return errors on any query that point out that the local storage is
disabled. Not sure which one is more preferable, so I went with the
empty result option for now.
2016-09-14 13:18:05 +02:00
Fabian Reinartz 22296dcb85 storage: clarify sample/matcher relation in docs 2016-09-12 11:19:36 +02:00
Fabian Reinartz cc6f988a5e storage: fix Querier interface documentation 2016-09-12 10:48:54 +02:00
Fabian Reinartz 7bd7e63f97 storage: fix struct alignment issue in test
The uint64 `numCalls` ends up being not word-aligned on certain architectures,
which makes atomic reads/writes panic.
2016-09-11 00:32:57 +02:00
nghialv 7655038184 fix typo 2016-09-08 19:01:32 +09:00
Tom Wilkie d41d91388f Update for new generic remote storage. 2016-08-30 17:43:29 +02:00
Tom Wilkie a6931b71e8 Rationalise retrieval metrics so we have the state (success/failed) on both samples and batches, in a consistent fashion.
Also, report total queue capacity of all queues, i.e. capacity * shards.
2016-08-30 17:42:42 +02:00
Tom Wilkie ece12bff93 Shard/parrallelise samples by fingerprint in StorageQueueManager
By splitting the single queue into multiple queues and flushing each individual queue in serially (and all queues in parallel), we can guarantee to preserve the order of timestampsin samples sent to downstream systems.
2016-08-30 17:42:36 +02:00
Julius Volz fe29e87824 Merge pull request #1930 from prometheus/generic-write-grpc
Generic write via gRPC
2016-08-30 17:28:48 +02:00
Julius Volz aa3f2b7216 Generic write cleanups and changes.
- fold metric name into labels
- return initialization errors back to main
- add snappy compression
- better context handling
- pre-allocation of labels
- remove generic naming
- other cleanups
2016-08-30 17:24:48 +02:00
Brian Brazil 36d2c4bd0b Add generic write path using grpc.
This uses a new proto format, with scope for multiple samples per
timeseries in future. This will allow users to pump samples out to
whatever they like without having to change the core Prometheus code.

There's also an example receiver to save users figuring out the
boilerplate themselves.
2016-08-30 17:19:18 +02:00
Dan Milstein ec064c96f6 Add field names to table-driven test fixtures 2016-08-30 07:57:39 -04:00
Dan Milstein ac8788aca6 Convert to table-driven test and inline helper func 2016-08-30 07:57:39 -04:00
Dan Milstein f50f656a66 Fix double-delta unmarshaling to respect actual min header size
Turns out its valid to have an overall chunk which is smaller than the
full doubleDeltaHeaderBytes size -- if it has a single sample, it
doesn't fill the whole header.  Updated unmarshalling check to respect
this.
2016-08-30 07:57:39 -04:00
Dan Milstein b815956341 Catch errors when unmarshalling delta/doubleDelta encoded chunks
This is (hopefully) a fix for #1653

Specifically, this makes it so that if the length for the stored
delta/doubleDelta is somehow corrupted to be too small, the attempt to
unmarshal will return an error.

The current (broken) behavior is to return a malformed chunk, which can
then lead to a panic when there is an attempt to read header values.

The referenced issue proposed creating chunks with a minimum length -- I
instead opted to just error on the attempt to unmarshal, since I'm not
clear on how it could be safe to proceed when the length is
incorrect/unknown.

The issue also talked about possibly "quarantining series", but I don't
know the surrounding code well enough to understand how to make that
happen.
2016-08-30 07:57:39 -04:00
Matt Bostock e618af5d0b Storage: Add crash recovery metric 'started_dirty'
...to indicate when crash recovery was invoked during Prometheus
startup.

Fixes #1918.
2016-08-27 21:41:06 +02:00
Dan Milstein 764ceaa939 Add timeout to test, cap waiting at 1 second 2016-08-24 11:30:38 -04:00
Dan Milstein 007907b410 Fix one of the tests for a remote storage QueueManager
Specifically, the TestSpawnNotMoreThanMaxConcurrentSendsGoroutines was failing on a fresh checkout of master.

The test had a race condition -- it would only pass if one of the
spawned goroutines happened to very quickly pull a set of samples off an
internal queue.

This patch rewrites the test so that it deterministically waits until
all samples have been pulled off that queue.  In case of errors, it also
now reports on the difference between what it expected and what it found.

I verified that, if the code under test is deliberately broken, the test
successfully reports on that.
2016-08-23 16:26:33 -04:00
Julius Volz 3bfec97d46 Make the storage interface higher-level.
See discussion in
https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g

The main idea is that the user of a storage shouldn't have to deal with
fingerprints anymore, and should not need to do an individual preload
call for each metric. The storage interface needs to be made more
high-level to not expose these details.

This also makes it easier to reuse the same storage interface for remote
storages later, as fewer roundtrips are required and the fingerprint
concept doesn't work well across the network.

NOTE: this deliberately gets rid of a small optimization in the old
query Analyzer, where we dedupe instants and ranges for the same series.
This should have a minor impact, as most queries do not have multiple
selectors loading the same series (and at the same offset).
2016-07-25 13:59:22 +02:00
beorn7 fc6737b7fb storage: improve index lookups
tl;dr: This is not a fundamental solution to the indexing problem
(like tindex is) but it at least avoids utilizing the intersection
problem to the greatest possible amount.

In more detail:

Imagine the following query:

    nicely:aggregating:rule{job="foo",env="prod"}

While it uses a nicely aggregating recording rule (which might have a
very low cardinality), Prometheus still intersects the low number of
fingerprints for `{__name__="nicely:aggregating:rule"}` with the many
thousands of fingerprints matching `{job="foo"}` and with the millions
of fingerprints matching `{env="prod"}`. This totally innocuous query
is dead slow if the Prometheus server has a lot of time series with
the `{env="prod"}` label. Ironically, if you make the query more
complicated, it becomes blazingly fast:

    nicely:aggregating:rule{job=~"foo",env=~"prod"}

Why so? Because Prometheus only intersects with non-Equal matchers if
there are no Equal matchers. That's good in this case because it
retrieves the few fingerprints for
`{__name__="nicely:aggregating:rule"}` and then starts right ahead to
retrieve the metric for those FPs and checking individually if they
match the other matchers.

This change is generalizing the idea of when to stop intersecting FPs
and go into "retrieve metrics and check them individually against
remaining matchers" mode:

- First, sort all matchers by "expected cardinality". Matchers
  matching the empty string are always worst (and never used for
  intersections). Equal matchers are in general consider best, but by
  using some crude heuristics, we declare some better than others
  (instance labels or anything that looks like a recording rule).

- Then go through the matchers until we hit a threshold of remaining
  FPs in the intersection. This threshold is higher if we are already
  in the non-Equal matcher area as intersection is even more expensive
  here.

- Once the threshold has been reached (or we have run out of matchers
  that do not match the empty string), start with "retrieve metrics
  and check them individually against remaining matchers".

A beefy server at SoundCloud was spending 67% of its CPU time in index
lookups (fingerprintsForLabelPairs), serving mostly a dashboard that
is exclusively built with recording rules. With this change, it spends
only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26
cores to 18 cores. The median latency for query_range dropped from 14s
to 50ms(!). As expected, higher percentile latency didn't improve that
much because the new approach is _occasionally_ running into the worst
case while the old one was _systematically_ doing so. The 99th
percentile latency is now about as high as the median before (14s)
while it was almost twice as high before (26s).
2016-07-20 17:35:53 +02:00
Dmitry Vorobev 273e457da4 web: return status code and error message for config resource 2016-07-15 10:15:24 +02:00
Björn Rabenstein 0622304244 Merge pull request #1798 from prometheus/beorn7/storage2
Crash recovery: Fix an edge case.
2016-07-13 16:53:18 +02:00
beorn7 2a75b15328 Crash recovery: Fix an edge case.
If the chunks of a series in the checkpoint are all older then the
latest chunk on disk, the head chunk is persisted and therefore has to
be declared closed.

It would be great to have a test for this, but that would require more
plumbing, subject of #447.
2016-07-07 16:17:38 +02:00
beorn7 064b57858e Consistently use the Seconds() method for conversion of durations
This also fixes one remaining case of recording integral numbers
of seconds only for a metric, i.e. this will probably fix #1796.
2016-07-07 15:24:35 +02:00
Julius Volz 91401794fa storage: Make MemorySeriesStorage a public type
See https://twitter.com/fabxc/status/748032597876482048
2016-06-29 08:14:23 +02:00
Fabian Reinartz 425736a377 *: remove last remainers of non-second metrics 2016-06-23 17:50:39 +02:00
Julius Volz b7b6717438 Separate query interface out of local.Storage.
PromQL only requires a much narrower interface than local.Storage in
order to run queries. Narrower interfaces are easier to replace and
test, too.

We could also change the web interface to use local.Querier, except that
we'll probably use appending functions from there in the future.
2016-06-23 15:14:38 +02:00
Jan van Valburg 68f3df49d0 stoarge: fix 'access denied' error on Windows
On Windows, it is not possible to rename or delete a file that is
currerntly open. This change closes the file in dropAndPersistChunks
before it tries to delete it, or rename the temporary file to it.
2016-06-21 11:21:20 +02:00
beorn7 b274c7aaa7 Update doc comments 2016-06-03 12:34:01 +02:00
beorn7 99881ded63 Make the number of fingerprint mutexes configurable
With a lot of series accessed in a short timeframe (by a query, a
large scrape, checkpointing, ...), there is actually quite a
significant amount of lock contention if something similar is running
at the same time.

In those cases, the number of locks needs to be increased.

On the same front, as our fingerprints don't have a lot of entropy, I
introduced some additional shuffling. With the current state, anly
changes in the least singificant bits of a FP would matter.
2016-06-02 19:18:00 +02:00
Tobias Schmidt 4c439b4b45 Merge pull request #1646 from prometheus/beorn7/valuecomparison
Correctly identify no-op appends if the value is NaN
2016-05-20 10:38:05 -04:00
beorn7 a308c76292 Improve TestAppendOutOfOrder
It did not test the returned error so far.
Also, add tests for the NaN case broken before
https://github.com/prometheus/common/pull/40
2016-05-20 13:46:33 +02:00
beorn7 b2ef4dc52d Correctly identify no-op appends if the value is NaN
This requires an updating of the vendored commen.model package, which
I will do once https://github.com/prometheus/common/pull/40 is merged.
2016-05-19 18:32:47 +02:00
Dmitry Vorobev bd2a770015 storage/remote: Spawn not more than "maxConcurrentSends" goroutines. 2016-05-19 16:15:04 +02:00
Dmitry Savintsev 7fdb62c253 fix several minor golint style issues 2016-05-11 14:26:18 +02:00
Steve Durrheimer 399d5c6375
Make version informations consistent between prometheus components 2016-05-05 22:33:18 +02:00
beorn7 07a294ac15 Doc comment fixes 2016-04-26 01:05:56 +02:00
beorn7 20cba1ed8f Initialize metric vectors in memorySeriesStorage 2016-04-25 17:08:07 +02:00
beorn7 d566808d40 Bring back logging of discarded samples
But only on DEBUG level.

Also, count and report the two cases of out-of-order timestamps on the
one hand and same timestamp but different value on the other hand
separately.
2016-04-25 16:43:52 +02:00
beorn7 db16acd7fb Never drop a still open head chunk. 2016-04-15 19:18:40 +02:00
beorn7 a90d645378 Checkpoint fingerprint mappings only upon shutdown
Before, we checkpointed after every newly detected fingerprint
collision, which is not a problem as long as collisions are
rare. However, with a sufficient number of metrics or particular
nature of the data set, there might be a lot of collisions, all to be
detected upon the first set of scrapes, and then the checkpointing
after each detection will take a quite long time (it's O(n²),
essentially).

Since we are rebuilding the fingerprint mapping during crash recovery,
the previous, very conservative approach didn't even buy us
anything. We only ever read from the checkpoint file after a clean
shutdown, so the only time we need to write the checkpoint file is
during a clean shutdown.
2016-04-15 01:03:28 +02:00
Jonathan Boulle 38098f8c95 Add missing license headers
Prometheus is Apache 2 licensed, and most source files have the
appropriate copyright license header, but some were missing it without
apparent reason. Correct that by adding it.
2016-04-13 16:08:22 +02:00
Fabian Reinartz a18639dc2d Merge pull request #1454 from prometheus/beorn7/fix-test
Give TestEvictAndLoadChunkDescs more time to actually evict
2016-04-08 14:58:01 +02:00
Tobias Schmidt e82ef154ee Remove unused code leftovers 2016-04-02 20:20:55 -04:00
beorn7 d09ca03e10 Work around compiler bug
Benchmarks don't show any significant changes.
2016-03-29 17:05:28 +02:00
beorn7 507f550cd4 Merge branch 'master' into beorn7/storage7 2016-03-24 14:21:28 +01:00
beorn7 865d16f870 Rename Gorilla into varbit 2016-03-23 16:30:41 +01:00
Julius Volz d3b53bd7f0 Fix comment about Graphite mapping of dimensions. 2016-03-23 00:38:32 +01:00
beorn7 4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
beorn7 c72979e3ed Remove a redundancy from Gorilla-style chunks
So far, the last sample in a chunk was saved twice. That's required
for adding more samples as we need to know the last sample added to
add more samples without iterating through the whole chunk. However,
once the last sample was added to the chunk before it's full, there is
no need to save it twice. Thus, the very last sample added to a chunk
can _only_ be saved in the header fields for the last sample. The
chunk has to be identifiable as closed, then. This information has
been added to the flags byte.
2016-03-20 23:09:48 +01:00
beorn7 b6dbb826ae Improve fuzz testing and fix a bug exposed
This improves fuzz testing in two ways:

(1) More realistic time stamps. So far, the most common case in
practice was very rare in the test: Completely regular increases of
the timestamp.

(2) Verify samples by scanning through the whole relevant section of
the series.

For Gorilla-like chunks, this showed two things:

(1) With more regularly increasing time stamps, BenchmarkFuzz is
essentially as fast as with the traditional chunks:

```
BenchmarkFuzzChunkType0-8              2         972514684 ns/op        83426196 B/op    2500044 allocs/op
BenchmarkFuzzChunkType1-8              2         971478001 ns/op        82874660 B/op    2512364 allocs/op
BenchmarkFuzzChunkType2-8              2         999339453 ns/op        76670636 B/op    2366116 allocs/op
```

(2) There was a bug related to when and how the chunk footer is
overwritten to make use for the last sample. This wasn't exposed by
random access as the last sample of a chunk is retrieved from the
values in the header in that case.
2016-03-20 17:21:28 +01:00
beorn7 9d8fbbe822 Review improvements 2016-03-17 17:31:56 +01:00
beorn7 8cdced3850 Implement Gorilla-inspired chunk encoding
This is not a verbatim implementation of the Gorilla encoding.  First
of all, it could not, even if we wanted, because Prometheus has a
different chunking model (constant size, not constant time).  Second,
this adds a number of changes that improve the encoding in general or
at least for the specific use case of Prometheus (and are partially
only possible in the context of Prometheus). See comments in the code
for details.
2016-03-17 14:47:08 +01:00
beorn7 8e64e8dfca Fix return statement. 2016-03-17 14:43:00 +01:00
Björn Rabenstein 98c8560851 Merge pull request #1477 from prometheus/beorn7/storage7
Solve the series churn problem...
2016-03-17 14:39:28 +01:00
beorn7 e7ac9c6863 Improvments based on review
- Moved returns into the default section of switch statement that can
  only happen then.

- Fix typo.
2016-03-17 14:37:24 +01:00
beorn7 199f309a39 Resurrect and rename invalid preload requests count metric.
It is now also used in label matching, so the name of the metric
changed from `prometheus_local_storage_invalid_preload_requests_total`
to `non_existent_series_matches_total'.
2016-03-13 11:54:24 +01:00
beorn7 e8c1f30ab2 Merge the parallel logic of getSeriesForRange and metricForFingerprint 2016-03-09 21:56:15 +01:00
beorn7 9445c7053d Add tests for range-limited label matching
While doing so, improve getSeriesForRange.
2016-03-09 21:01:03 +01:00
beorn7 47e3c90f9b Clean up error propagation
Only return an error where callers are doing something with it except
simply logging and ignoring.

All the errors touched in this commit flag the storage as dirty
anyway, and that fact is logged anyway. So most of what is being
removed here is just log spam.

As discussed earlier, the class of errors that flags the storage as
dirty signals fundamental corruption, no even bubbling up a one-time
warning to the user (e.g. about incomplete results) isn't helping much
because _anything_ happening in the storage has to be doubted from
that point on (and in fact retroactively into the past, too). Flagging
the storage dirty, and alerting on it (plus marking the state in the
web UI) is the only way I can see right now.

As a byproduct, I cleaned up the setDirty method a bit and improved
the logged errors.
2016-03-09 18:56:30 +01:00
beorn7 99854a84d7 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 17:23:25 +01:00
beorn7 5e4fa96719 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-09 17:21:32 +01:00
beorn7 b343e65907 Merge branch 'beorn7/storage4' into beorn7/storage5
erge is necessary,
2016-03-09 17:14:42 +01:00
beorn7 d0a4477446 Merge branch 'beorn7/storage3' into beorn7/storage4
Conflicts:
	storage/local/preload.go
	storage/local/storage.go
	storage/local/storage_test.go
2016-03-09 17:13:16 +01:00
beorn7 55eddab25f Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-09 16:48:46 +01:00
beorn7 161eada3ad Make chunkIterator even leaner. 2016-03-09 16:20:39 +01:00
beorn7 beb36df4bb De-flag preloadChunksForRange
Now there is preloadChunksForRange and preloadChunksForInstant in
both, the series and the storage.
2016-03-09 14:50:09 +01:00
beorn7 836f1db04c Improve MetricsForLabelMatchers
WIP: This needs more tests.

It now gets a from and through value, which it may opportunistically
use to optimize the retrieval. With possible future range indices,
this could be used in a very efficient way. This change merely applies
some easy checks, which should nevertheless solve the use case of
heavy rule evaluations on servers with a lot of series churn.

Idea is the following:

- Only archive series that are at least as old as the headChunkTimeout
  (which was already extremely unlikely to happen).

- Then maintain a high watermark for the last archival, i.e. no
  archived series has a sample more recent than that watermark.

- Any query that doesn't reach to a time before that watermark doesn't
  have to touch the archive index at all. (A production server at
  Soundcloud with the aforementioned series churn and heavy rule
  evaluations spends 50% of its CPU time in archive index
  lookups. Since rule evaluations usually only touch very recent
  values, most of those lookup should disappear with this change.)

- Federation with a very broad label matcher will profit from this,
  too.

As a byproduct, the un-needed MetricForFingerprint method was removed
from the Storage interface.
2016-03-09 00:25:59 +01:00
beorn7 167b83695c Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-08 00:20:44 +01:00
beorn7 01795382c9 Merge branch 'beorn7/storage4' into beorn7/storage5 2016-03-08 00:20:13 +01:00
beorn7 f7fc542db6 Merge branch 'master' into beorn7/storage4
Conflicts:
	storage/local/persistence.go
2016-03-08 00:14:00 +01:00
beorn7 3d86130d8c Merge branch 'master' into beorn7/storage3 2016-03-07 23:39:12 +01:00
beorn7 1f30c8de8d Merge branch 'master' into beorn7/storage2 2016-03-07 23:38:42 +01:00
beorn7 c13b1ecfe9 Make chunk iterators more DRY
This finally extracts all the common code of the two chunk iterators
into one. Any future chunk encodings with fast access by index can use
the same iterator by simply providing an indexAccessor. Other future
chunk encodings without fast index access (like Gorilla-style) can
still implement the chunkIterator interface as usual.
2016-03-07 20:23:14 +01:00
beorn7 32f280a3cd Slim down the chunkIterator interface
For one, remove unneeded methods.

Then, instead of using a channel for all values, use a
bufio.Scanner-like interface. This removes the need for creating a
goroutine and avoids the (unnecessary) locking performed by channel
sending and receiving.

This will make it much easier to write new chunk implementations (like
Gorilla-style encoding).
2016-03-07 19:50:13 +01:00
beorn7 b6fdb355d7 Move dump-heads into its own tool 2016-03-07 16:30:19 +01:00
beorn7 f193f2b8ef Add a command to promtool that dumps metadata of heads.db
I needed this today for debugging. It can certainly be improved, but
it's already quite helpful.

I refactored the reading of heads.db files out of persistence, which
is an improvement, too.

I made minor changes to the cli package to allow outputting via the
io.Writer interface.
2016-03-07 16:21:57 +01:00
beorn7 75a6b460ef Give TestEvictAndLoadChunkDescs more time to actually evict
Obviously, it's really bad to depend on timing here. The proper fix
would be to have something like WaitForIndexing for other things to
wait for, too.

For now, let's see if the wait time increase fixes the issue.
2016-03-03 13:29:39 +01:00
beorn7 fc7de5374a Quarantine series upon problem writing to the series file
This fixes https://github.com/prometheus/prometheus/issues/1059 , but
not in the obvious way (simply not updating the persist watermark,
because that's actually not that simple - we don't really know what
has gone wrong exactly). As any errors relevant here are most likely
caused by severe and unrecoverable problems with the series file,
Using the now quarantine feature is the right step. We don't really
have to be worried about any inconsistent state of the series because
it will be removed for good ASAP. Another plus is that we don't have
to declare the whole storage dirty anymore.
2016-03-03 13:15:02 +01:00
beorn7 0ea5801e47 Handle errors caused by data corruption more gracefully
This requires all the panic calls upon unexpected data to be converted
into errors returned. This pollute the function signatures quite
lot. Well, this is Go...

The ideas behind this are the following:

- panic only if it's a programming error. Data corruptions happen, and
  they are not programming errors.

- If we detect a data corruption, we "quarantine" the series,
  essentially removing it from the database and putting its data into
  a separate directory for forensics.

- Failure during writing to a series file is not considered corruption
  automatically. It will call setDirty, though, so that a
  crashrecovery upon the next restart will commence and check for
  that.

- Series quarantining and setDirty calls are logged and counted in
  metrics, but are hidden from the user of the interfaces in
  interface.go, whith the notable exception of Append(). The reasoning
  is that we treat corruption by removing the corrupted series, i.e. a
  query for it will return no results on its next call anyway, so
  return no results right now. In the case of Append(), we want to
  tell the user that no data has been appended, though.

Minor side effects:

- Now consistently using filepath.* instead of path.*.

- Introduced structured logging where I touched it. This makes things
  less consistent, but a complete change to structured logging would
  be out of scope for this PR.
2016-03-02 23:02:34 +01:00
beorn7 b6840997a7 Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-02 16:11:25 +01:00
beorn7 ce58fd357b Merge branch 'beorn7/storage' into beorn7/storage2
Conflicts:
	storage/local/chunk.go
	storage/local/interface.go
2016-03-02 16:09:32 +01:00
beorn7 2581648f70 Separate iterators by offset
Add test that exposes the problem.
2016-03-02 16:01:03 +01:00
beorn7 c740789ce3 Improve predict_linear
Fixes https://github.com/prometheus/prometheus/issues/1401

This remove the last (and in fact bogus) use of BoundaryValues.

Thus, a whole lot of unused (and arguably sub-optimal / ugly) code can
be removed here, too.
2016-02-25 12:10:55 +01:00
beorn7 4b503ed9a5 Merge branch 'master' into beorn7/storage2 2016-02-24 14:03:49 +01:00
beorn7 059295332f Merge remote-tracking branch 'origin/master' into beorn7/storage 2016-02-24 14:02:27 +01:00
beorn7 53005c3085 Merge branch 'beorn7/storage' into beorn7/storage2 2016-02-24 14:00:56 +01:00
beorn7 28e9bbc15f Populate chunkDesc.chunkLastTime during checkpoint loading, too 2016-02-24 13:58:34 +01:00
Björn Rabenstein a8c79f0a0c Merge pull request #1422 from prometheus/release-0.17
Merge more commits from 0.17.
2016-02-23 23:07:44 +01:00
beorn7 8fa1560e48 Fix a very special case of handling the checkpoint timer 2016-02-23 16:48:35 +01:00
beorn7 41e44f6ab9 Merge branch 'master' into beorn7/storage2 2016-02-22 16:54:33 +01:00
Björn Rabenstein d9eb624322 Merge pull request #1415 from prometheus/release-0.17
Forward-merge release-0.17 into master
2016-02-22 16:39:48 +01:00
beorn7 4d1f7b49b6 Fix a race condition in calculatePersistenceUrgencyScore 2016-02-22 15:48:39 +01:00
beorn7 454ecf3f52 Rework the way ranges and instants are handled
In a way, our instants were also ranges, just with the staleness delta
as range length. They are no treated equally, just that in one case,
the range length is set as range, in the other the staleness
delta. However, there are "real" instants where start and and time of
a query is the same. In those cases, we only want to return a single
value (the one closest before or at the equal start and end time). If
that value is the last sample in the series, odds are we have it
already in the series object. In that case, there is no need to pin or
load any chunks. A special singleSampleSeriesIterator is created for
that. This should greatly speed up instant queries as they happen
frequently for rule evaluations.
2016-02-22 01:47:18 +01:00
beorn7 b876f8e6a5 Move lastSamplePair method up to memorySeries
This implies a slight change of behavior as only samples added to the
respective instance of a memorySeries are returned. However, this is
most likely anyway what we want.

Following cases:

- Server has been restarted: Given the time it takes to cleanly
  shutdown and start up a server, the series are now stale anyway. An
  improved staleness handling (still to be implemented) will be based
  on tracking if a given target is continuing to expose samples for a
  given time series. In that case, we need a full scrape cycle to
  decide about staleness. So again, it makes sense to consider
  everything stale directly after a server restart.

- Series unarchived due to a read request: The series is definitely
  stale so we don't want to return anything anyway.

- Freshly created time series or series unarchived because of a sample
  append: That happens because appending a sample is imminent. Before
  the fingerprint lock is released, the series will have received a
  sample, and lastSamplePair will always returned the expected value.
2016-02-19 18:16:41 +01:00
beorn7 1e13f89039 Return SamplePair istead of *SamplePair consistently
Formalize ZeroSamplePair as return value for non-existing samples.

Change LastSamplePairForFingerprint to return a SamplePair (and not a
pointer to it), which saves allocations in a potentially extremely
frequent call.
2016-02-19 17:00:40 +01:00
beorn7 d290340367 Fix and improve chunkDesc locking 2016-02-19 16:24:38 +01:00
beorn7 0e202dacb4 Streamline series iterator creation
This will fix issue #1035 and will also help to make issue #1264 less
bad.

The fundamental problem in the current code:

In the preload phase, we quite accurately determine which chunks will
be used for the query being executed. However, in the subsequent step
of creating series iterators, the created iterators are referencing
_all_ in-memory chunks in their series, even the un-pinned ones. In
iterator creation, we copy a pointer to each in-memory chunk of a
series into the iterator. While this creates a certain amount of
allocation churn, the worst thing about it is that copying the chunk
pointer out of the chunkDesc requires a mutex acquisition. (Remember
that the iterator will also reference un-pinned chunks, so we need to
acquire the mutex to protect against concurrent eviction.) The worst
case happens if a series doesn't even contain any relevant samples for
the query time range. We notice that during preloading but then we
will still create a series iterator for it. But even for series that
do contain relevant samples, the overhead is quite bad for instant
queries that retrieve a single sample from each series, but still go
through all the effort of series iterator creation. All of that is
particularly bad if a series has many in-memory chunks.

This commit addresses the problem from two sides:

First, it merges preloading and iterator creation into one step,
i.e. the preload call returns an iterator for exactly the preloaded
chunks.

Second, the required mutex acquisition in chunkDesc has been greatly
reduced. That was enabled by a side effect of the first step, which is
that the iterator is only referencing pinned chunks, so there is no
risk of concurrent eviction anymore, and chunks can be accessed
without mutex acquisition.

To simplify the code changes for the above, the long-planned change of
ValueAtTime to ValueAtOrBefore time was performed at the same
time. (It should have been done first, but it kind of accidentally
happened while I was in the middle of writing the series iterator
changes. Sorry for that.) So far, we actively filtered the up to two
values that were returned by ValueAtTime, i.e. we invested work to
retrieve up to two values, and then we invested more work to throw one
of them away.

The SeriesIterator.BoundaryValues method can be removed once #1401 is
fixed. But I really didn't want to load even more changes into this
PR.

Benchmarks:

The BenchmarkFuzz.* benchmarks run 83% faster (i.e. about six times
faster) and allocate 95% fewer bytes. The reason for that is that the
benchmark reads one sample after another from the time series and
creates a new series iterator for each sample read.

To find out how much these improvements matter in practice, I have
mirrored a beefy Prometheus server at SoundCloud that suffers from
both issues #1035 and #1264. To reach steady state that would be
comparable, the server needs to run for 15d. So far, it has run for
1d. The test server currently has only half as many memory time series
and 60% of the memory chunks the main server has. The 90th percentile
rule evaluation cycle time is ~11s on the main server and only ~3s on
the test server. However, these numbers might get much closer over
time.

In addition to performance improvements, this commit removes about 150
LOC.
2016-02-19 16:24:38 +01:00
beorn7 ef3ab96111 Populate first and last time in the chunk descriptor earlier
The First time is kind of trivial as we always know it when we create
a new chunkDesc.

The last time is only know when the chunk is closed, so we have to set
it at that time.

The change saves a lot of digging down into the chunk
itself. Especially the last time is relative expensive as it involves
the creation of an iterator. The first time access now doesn't require
locking, which is also a nice gain.
2016-02-15 14:06:09 +01:00
beorn7 9a3edea477 Remove race condition from TestRetentionCutoff 2016-02-12 12:13:19 +01:00
Julius Volz 9b6d69610a Fix various typos in comments.
Helpfully reported by
https://goreportcard.com/report/github.com/prometheus/prometheus :)
2016-02-10 03:47:00 +01:00
Fabian Reinartz 1f877f3d2a Fix deadlock, structure target logging 2016-02-03 10:39:34 +01:00
Fabian Reinartz 59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7 87ef24cd25 Add instrumentation and refactor things around "rushed mode" 2016-01-26 17:44:21 +01:00
beorn7 a2cd479058 Fix calculation of chunks to persist after restart
Since we are not overestimating the number of chunks to persist
anymore, this commit also adjusts the default value for
-storage.local.memory-chunks. Update of documentation will follow.
2016-01-25 19:33:51 +01:00
beorn7 972d94433a Introduce a hysteresis for "rushed mode"
"Rushed mode" is formerly known as "degraded mode", which is changed
with this commit, too. The name "degraded" was very misleading.

Also, switch into rushed mode if we have too many chunks in memory and
an at least reasonable amount of chunks to persist so that speeding up
persisting chunks can help.
2016-01-25 19:24:37 +01:00
beorn7 14796bdb60 Improve chunkMaxBatchSize doc comment 2016-01-25 18:57:51 +01:00
beorn7 582af1618c Streamline chunk writing
This helps to avoid allocations in the same way we were already doing
it during reading.
2016-01-25 16:36:36 +01:00
beorn7 99b9611351 Remove a race condition from TestRetentionCutoff 2016-01-25 16:36:14 +01:00
beorn7 3f4d22e4c7 Update doc comment
This should have gone into a previous commit, but I forgot to save
this particular file.
2016-01-12 12:38:18 +01:00
beorn7 add2ebdd56 Tolerate the lost+found directory in the data directory 2016-01-11 18:05:36 +01:00
Björn Rabenstein 6293f3a374 Merge pull request #1304 from prometheus/beorn7/storage
Improve handling of series file truncation
2016-01-11 17:27:08 +01:00
beorn7 cb117d8346 Add a series ops metric "purge_on_request"
It counts series deletions triggered via the API.
2016-01-11 17:22:16 +01:00
beorn7 4221c7de5c Improve handling of series file truncation
If only very few chunks are to be truncated from a very large series
file, the rewrite of the file is a lorge overhead. With this change, a
certain ratio of the file has to be dropped to make it happen. While
only causing disk overhead at about the same ratio (by default 10%),
it will cut down I/O by a lot in above scenario.
2016-01-11 16:42:10 +01:00
Corentin Chary 7b6c3e556c Use '.' instead of '=' to separate labels from their values in Graphite
Using .label=value. was weird to use in Graphite and didn't bring much value.
2016-01-11 13:57:14 +01:00
Julius Volz 75fdcf5698 Merge pull request #1197 from iksaif/master
Add support for remote storage on Graphite
2015-11-10 09:46:17 +01:00
Corentin Chary a2e4439086 Add support for remote storage on Graphite
Allows to use graphite over tcp or udp. Metrics labels
and values are used to construct a valid Graphite path
in a way that will allow us to eventually read them back
and reconstruct the metrics.

For example, this metric:

model.Metric{
	model.MetricNameLabel: "test:metric",
	"testlabel":           "test:value",
	"testlabel2":           "test:value",
)

Will become:

test:metric.testlabel=test:value.testlabel2=test:value

escape.go takes care of escaping values to match Graphite
character set, it basically uses percent-encoding as a fallback
wich will work pretty will in the graphite/grafana world.

The remote storage module also has an optional 'prefix' parameter
to prefix all metrics with a path (for example, 'prometheus.').

Graphite URLs are simply in the form tcp://host:port or
udp://host:port.
2015-11-10 07:58:57 +01:00
Fabian Reinartz 33aab4169c Anchor regexes in vector matching
This commit makes the regex behavior of vector matching consistent with
configuration and label_replace() by anchoring it.

Fixes #1200
2015-11-05 11:23:43 +01:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Julius Volz dac26cef71 Rename global "labels" config option to "external_labels". 2015-09-29 20:54:20 +02:00
Julius Volz eeb1da36ac Fix InfluxDB write support to work with InfluxDB 0.9.x.
Because the InfluxDB client library currently pulls in multiple MBs of
unnecessary dependencies, I have modified and cut up the vendored
version to only pull in the few pieces that are actually needed.

On InfluxDB's side, this dependency issue is tracked in:

https://github.com/influxdb/influxdb/issues/3447

Hopefully, it will be resolved soon.

If a password is needed for InfluxDB, it may be supplied via the
INFLUXDB_PW environment variable.
2015-09-16 17:40:03 +02:00
Julius Volz 5f77fce578 Improve remote storage queue manager metrics. 2015-09-16 17:20:23 +02:00
beorn7 22d3a4311a Increase waiting time in TestEvictAndLoadChunkDescs
The test had become flaky with Go1.5.

Theory here is that with Go1.5.x, sleeping for 10ms might not be
enough to wake up another goroutine, possibly because it is used for
GC. 50ms should always be enough due to GC pause guarantees with the
new GC.
2015-09-14 21:09:46 +02:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
beorn7 daeccdd0e9 Fix DropMetricsForFingerprints
It now deletes the series file also for archived series.

Also, fix a naming error in a doc comment.
2015-09-11 15:47:23 +02:00
Julius Volz ffc5142c54 Merge pull request #1058 from prometheus/check-errors
Fix error checking and logging around checkpointing.
2015-09-07 19:57:16 +02:00
Julius Volz 6774a73878 Fix error checking and logging around checkpointing. 2015-09-07 19:34:59 +02:00
Julius Volz 011faf9057 Fix typo in comment. 2015-09-07 19:15:28 +02:00
Fabian Reinartz 8fa719f778 Attach global labels to remote storage samples 2015-09-03 16:38:04 +02:00
Dieter Plaetinck e1dacc56e6 fix comment.
the sample doesn't get appended to the list of sampleappenders.
2015-08-30 16:26:46 +02:00
Julius Volz 744d5d5a7a Merge pull request #1029 from prometheus/vet-fixes
Fix "go vet" errors.
2015-08-26 12:50:18 +02:00
Julius Volz 995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Julius Volz 963ad82dcb Fix "go vet" errors.
I ignored all errors of the type "composite literal uses unkeyed
fields". Most of them are wrong because of
https://github.com/golang/go/issues/9171.
2015-08-26 02:05:04 +02:00
Fabian Reinartz d6b8da8d43 Switch promql types to common/model 2015-08-25 13:49:14 +02:00
Fabian Reinartz e061595352 Move COWMetric into storage/metric package 2015-08-25 11:59:07 +02:00
Brian Brazil fdf0d0642e Cast value to float, as that's what the console templates expect. 2015-08-24 16:59:08 +01:00
Fabian Reinartz 1535ef1457 Replace metric.SamplePair with model.SamplePair 2015-08-22 14:52:35 +02:00
Fabian Reinartz c9d396f476 Replace metric.LabelPair with model.LabelPair 2015-08-22 13:32:13 +02:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Julius Volz f65ef1ed10 Fix wording in shutdown warning. 2015-08-17 14:26:53 +02:00
Brian Brazil 0ec71442cd Storage: Tell users how to avoid crash recovery.
If users see the crash recovery error, the chances are
they aren't shutting down Prometheus correctly. Telling
them how to do so will help them debug and fix the problem.
2015-08-16 10:42:31 +01:00
Laurie Malau 20ad403587 Don't warn/increment metric upon equal timestamps during append.
Perhaps it would be even better to still warn in case the sample value has
changed but the timestamps are equal, but we don't have efficient access
to the last value.
2015-08-09 23:49:49 +02:00
Will Rouesnel 7810448dbe Add proxy_url parameter to allow specifying per-job HTTP proxy servers
Allow scrape_configs to have an optional proxy_url option which specifies
a proxy to be used for all connections to hosts in that config.

Internally this modifies the various client functions to take a *url.URL pointer
which currently must point to an HTTP proxy (but has been left open-ended to
allow the url format to be extended to support others, such as maybe SOCKS if
needed).
2015-08-08 04:29:27 +10:00
Julius Volz 517badc21d Only do regex lookups when there was no equality match.
For the label matching index-based preselection phase, don't do an OR
between equality and non-equality matchers. Execute only one of the two
(with equality matchers preferred when present).

Fixes https://github.com/prometheus/prometheus/issues/924
2015-07-23 23:13:30 +02:00
beorn7 699946bf32 Fix chunk desc loading.
If all samples in consecutive chunks have the same timestamp, the way
we used to load chunks will fail. With this change, the persist
watermark is used to load the right amount of chunkDescs from disk.

This bug is a possible reason for the rare storage corruption we have
observed.
2015-07-16 13:09:20 +02:00
beorn7 4203849c92 Test chunkDesc eviction and loading 2015-07-16 13:09:13 +02:00
beorn7 37e12df9ff Improve TestAppendOutOfOrder 2015-07-16 12:48:33 +02:00
beorn7 502aa9ded5 Use Has instead of Get for existence test. 2015-07-16 12:26:50 +02:00
beorn7 ff08f0b6fe storage: ensure timestamp monotonicity within series.
Fixes https://github.com/prometheus/prometheus/issues/481

While doing so, clean up and fix a few other things:

- Fix `go vet` warnings (@fabxc to blame ;).

- Fix a racey problem with unarchiving: Whenever we unarchive a
  series, we essentially want to do something with it. However, until
  we have done something with it, it appears like a series that is
  ready to be archived or even purged. So e.g. it would be ignored
  during checkpointing. With this fix, we always load the chunkDescs
  upon unarchiving. This is wasteful if we only want to add a new
  sample to an archived time series, but the (presumably more common)
  case where we access an archived time series in a query doesn't
  become more expensive.

- The change above streamlined the getOrCreateSeries ond
  newMemorySeries flow. Also, the modTime is now always set correctly.

- Fix the leveldb-backed implementation of KeyValueStore.Delete. It
  had the wrong behavior of still returning true, nil if a
  non-existing key has been passed in.
2015-07-15 18:56:53 +02:00
Julius Volz acbc2b8cb6 storage: Fix float->uint conversions on some compilers.
See https://github.com/prometheus/prometheus/issues/887, which will at
least be partially fixed by this.

From the spec https://golang.org/ref/spec#Conversions:

"In all non-constant conversions involving floating-point or complex
values, if the result type cannot represent the value the conversion
succeeds but the result value is implementation-dependent."

This ended up setting the converted values to 0 on Debian's Go 1.4.2
compiler, at least on 32-bit Debians.
2015-07-13 11:19:11 +02:00
Fabian Reinartz 3e811ad7a4 Merge pull request #827 from prometheus/fabxc/rmt-cleanup
main: cleanup initialization of remote storage.
2015-06-24 09:53:47 +02:00
Fabian Reinartz 23e77450ff main: cleanup initialization of remote storage. 2015-06-23 18:24:48 +02:00
beorn7 8c196c1028 Minor doc fixes. 2015-06-23 17:07:18 +02:00
Fabian Reinartz 6bfb4549a6 storage: add LastSamplePairForFingerprint method 2015-06-23 13:45:15 +02:00
Fabian Reinartz dc7d27ab9a retrieval: add honor label handling and parametrized querying.
This commit adds the honor_labels and params arguments to the scrape
config. This allows to specify query parameters used by the scrapers
and handling scraped labels with precedence.
2015-06-23 13:45:14 +02:00
beorn7 9016917d1c Increment dirty counter only if setDirty(true) is called.
Currently, we increment the counter even if setDirty(false) is called,
which sets the storage clean.
2015-06-22 18:12:55 +02:00
Fabian Reinartz 1eff186555 Merge pull request #810 from prometheus/fabxc/lmatch
Match empty labels.
2015-06-22 15:45:50 +02:00
Fabian Reinartz 5b91ea9b36 storage: improve label matching and allow unset matching.
Matching of empty labels now also matches metrics where the label
was not explicitly set to the empty string.
2015-06-22 15:33:44 +02:00
Fabian Reinartz 46df1fd5ea storage/local: add benchmark for label matching. 2015-06-22 15:33:44 +02:00
Fabian Reinartz b105e26f4d storage: remove global flags 2015-06-15 19:01:06 +02:00
Fabian Reinartz 5c6c0e2faa Add storage method to delete time series 2015-06-01 21:23:32 +02:00
Fabian Reinartz 0de6edbdfc Move pkg/ to util/ 2015-06-01 21:12:32 +02:00
Fabian Reinartz dfaf31a1da Move web/httputils to pkg/httputil and add DeadlineClient to it 2015-06-01 21:12:31 +02:00
Fabian Reinartz 2317b001d0 Move flock package to pkg/flock 2015-06-01 21:12:31 +02:00
Fabian Reinartz 3c8fbf1e15 Move test package to pkg/testutil 2015-06-01 21:12:31 +02:00
Fabian Reinartz aff01e29c3 Limit retrievable samples to retention window.
The storage does not delete data immediately after the retention period.
We don't want to retrieve this data as it causes artifacts.
2015-05-27 13:13:59 +02:00
Fabian Reinartz a92134a947 Merge pull request #724 from prometheus/fabxc/storage-startup
Read from indexing queue during crash recovery.
2015-05-23 16:50:47 +02:00
Fabian Reinartz 6e319532cf Read from indexing queue during crash recovery.
Change #704 introduced a regression that started reading the queue only
after potential crash recovery. When more than the queue capacity was
indexed, Prometheus deadlocked.
2015-05-23 15:32:35 +02:00
beorn7 dbcb3d9333 Use an RW lock to checkpoint fingerprint mappings.
This has to be backported to 0.13.x.
2015-05-23 14:05:05 +02:00
beorn7 3b9ab546e6 Add metrics to count inconsistencies and fp collisions. 2015-05-21 18:46:20 +02:00
Björn Rabenstein c44e7cd105 Merge pull request #706 from prometheus/beorn7/persistence2
Improve iterator performance.
2015-05-21 13:48:52 +02:00
Fabian Reinartz 112a778922 Align int64s for atomic operations 2015-05-21 01:38:50 +02:00
beorn7 3b9c421a69 Weed out all the [Gg]et* method names.
The only exception is getNumChunksToPersist to avoid naming the struct
member numChunksToPersist in a weird way.
2015-05-20 19:13:06 +02:00
Julius Volz 267fd34156 Switch Prometheus to use github.com/prometheus/log.
This change is conceptually very simple, although the diff is large. It
switches logging from "github.com/golang/glog" to
"github.com/prometheus/log", while not actually changing any log
messages. V(1)-style logging has been changed to be log.Debug*().
2015-05-20 18:19:32 +02:00
beorn7 81b190bf45 Remove locking from series iterator. Cache chunk iterators. 2015-05-20 16:19:34 +02:00
beorn7 cd5574bf8a Make chunk and series iterators more efficient. 2015-05-20 16:19:34 +02:00
beorn7 f79c694be5 Add benchmarks for series iterator methods. 2015-05-20 16:19:34 +02:00
Fabian Reinartz f59a449a24 Fix storage test 2015-05-20 16:12:07 +02:00
Fabian Reinartz d8440d75f1 Do not start storage processing before Start() is called. 2015-05-19 13:51:45 +02:00
beorn7 d1a93655a1 Fix typo. 2015-05-11 17:15:30 +02:00
beorn7 7c6466d476 Reserve only ~1M FPs for the mapping.
That reduces the chance of having a fingerprint in the reserved area.
2015-05-08 18:10:56 +02:00
beorn7 ac75dc2812 Avoid archive lookup for known mapped FPs. 2015-05-08 16:39:26 +02:00
beorn7 ed810b45bf Improvements after review. 2015-05-08 13:35:39 +02:00
beorn7 c36e0e05f1 Add crash recovery of fingerprint mappings. 2015-05-07 18:58:14 +02:00
beorn7 2235cec175 Handle fingerprint collisions. 2015-05-07 18:17:59 +02:00
Fabian Reinartz eeca323d24 Merge branch 'master' into promql 2015-05-06 13:04:54 +02:00
beorn7 9820e5fe99 Use FastFingerprint where appropriate. 2015-05-06 12:00:58 +02:00
Scott Worley e5f92d35fe Fix storage/local tests for 32-bit systems 2015-04-30 14:19:48 -07:00
Fabian Reinartz 32b7595c47 Create promql package with lexer/parser.
This commit creates a (so far unused) package. It contains the a custom
lexer/parser for the query language.

ast.go: New AST that interacts well with the parser.
lex.go: Custom lexer (new).
lex_test.go: Lexer tests (new).
parse.go: Custom parser (new).
parse_test.go: Parser tests (new).
functions.go: Changed function type, dummies for parser testing (barely changed/dummies).
printer.go: Adapted from rules/ and adjusted to new AST (mostly unchanged, few additions).
2015-04-23 16:04:50 +02:00
beorn7 a052d32609 Comment improvement. 2015-04-14 10:49:43 +02:00
beorn7 66fc61f9b7 Make bufPool a member of the persistence struct. 2015-04-14 10:43:09 +02:00
beorn7 b02d900e61 Improve chunk and chunkDesc loading.
Also, clean up some things in the code (especially introduction of the
chunkLenWithHeader constant to avoid the same expression all over the place).

Benchmark results:

BEFORE
BenchmarkLoadChunksSequentially     5000            283580 ns/op          152143 B/op        312 allocs/op
BenchmarkLoadChunksRandomly        20000             82936 ns/op           39310 B/op         99 allocs/op
BenchmarkLoadChunkDescs            10000            110833 ns/op           15092 B/op        345 allocs/op

AFTER
BenchmarkLoadChunksSequentially    10000            146785 ns/op          152285 B/op        315 allocs/op
BenchmarkLoadChunksRandomly        20000             67598 ns/op           39438 B/op        103 allocs/op
BenchmarkLoadChunkDescs            20000             99631 ns/op           12636 B/op        192 allocs/op

Note that everything is obviously loaded from the page cache (as the
benchmark runs thousands of times with very small series files). In a
real-world scenario, I expect a larger impact, as the disk operations
will more often actually hit the disk. To load ~50 sequential chunks,
this reduces the iops from 100 seeks and 100 reads to 1 seek and 1
read.
2015-04-13 21:06:04 +02:00
beorn7 c563398c68 Remove obsolete debug message. 2015-04-13 16:59:52 +02:00
beorn7 c5fa0b90c3 Fix the case where a series in memory has 0 chunks, but chunks on disk.
This is actually completely normal for a freshly unarchived series.

Test added to expose.
2015-04-09 15:57:11 +02:00
Björn Rabenstein d8e515e9cb Merge pull request #617 from prometheus/influxdb-write-support
Add experimental InfluxDB write support.
2015-04-07 13:23:06 +02:00
Julius Volz 593e565688 Allow writing to InfluxDB/OpenTSDB at the same time. 2015-04-02 20:24:38 +02:00
beorn7 3035b8bfdd Adaptively reduce the wait time for memory series maintenance.
This will make in-memory series maintenance the faster the more chunks
are waiting for persistence.
2015-04-01 17:52:03 +02:00
Julius Volz 61fb688dd9 Add experimental InfluxDB write support. 2015-04-01 02:03:16 +02:00
beorn7 fbc44d8f95 Add benchmark for loading chunks and chunk descs. 2015-03-19 19:28:21 +01:00
beorn7 6a21f73898 Fixes after review. 2015-03-19 17:54:59 +01:00
beorn7 51d35f4481 Instrument series maintenance durations. 2015-03-19 17:06:16 +01:00
beorn7 12ae6e9203 Increase resilience of the storage against data corruption - step 4.
Step 4: Add a configurable sync'ing of series files after modification.
2015-03-19 15:58:02 +01:00
beorn7 11bd9ce1bd Increase resilience of the storage against data corruption - step 3.
Step 3: Remember the mtime of series files and make use of it to
detect series files that are not the one the checkpoint thinks they
are.
2015-03-19 15:44:11 +01:00
beorn7 e25cca823c Increase resilience of the storage against data corruption - step 2.
Step 2: Add a flag -storage.local.pedantic-checks to check every
series file.

Also, remove countPersistedHeadChunks channel, which is unused.
2015-03-19 12:06:15 +01:00
beorn7 3d8d8928be Increase resilience of the storage against data corruption - step 1.
Step 1: Admit the problem by turning the various "panic"s into logged
errors, followed by marking the persistence as dirty.
2015-03-19 11:49:18 +01:00
beorn7 da7c0461c6 Rename persist queue len/cap to num/max chunks to persist.
Remove deprecated flag storage.incoming-samples-queue-capacity.
2015-03-18 19:36:41 +01:00
beorn7 a075900f9a Merge branch 'beorn7/persistence' into beorn7/ingestion-tweaks 2015-03-18 19:09:31 +01:00
beorn7 1d8fc7d56f Change minor things after code review. 2015-03-18 19:09:07 +01:00
beorn7 be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
beorn7 0056eaeb4f Redesign series maintenance and chunk persistence. 2015-03-14 22:05:23 +01:00
beorn7 5bea942d8e Improve various things around chunk encoding.
A number of mostly minor things:

- Rename chunk type -> chunk encoding.

- After all, do not carry around the chunk encoding to all parts of
  the system, but just have one place where the encoding for new
  chunks is set based on the flag. The new approach has caveats as
  well, but the polution of so many method signatures is worse.

- Use the default chunk encoding for new chunks of existing
  series. (Previously, only new _series_ would get chunks with the
  default encoding.)

- Use an enum for chunk encoding. (But keep the version number for the
  flag, for reasons discussed previously.)

- Add encoding() to the chunk interface (so that a chunk knows its own
  encoding - no need to have that in a different top-level function).

- Got rid of newFollowUpChunk (which would keep the existing encoding
  for all chunks of a time series). Now only use newChunk(), which
  will create a chunk encoding according to the flag.

- Simplified transcodeAndAdd.

- Reordered methods of deltaEncodedChunk and doubleDeltaEncoded chunk
  to match the order in the chunk interface.

- Only transcode if the chunk is not yet half full. If more than half
  full, add a new chunk instead.
2015-03-14 19:03:20 +01:00
beorn7 9ecf93526d Sync the checkpoints.
Because that's what should be done with checkpoints.
2015-03-11 19:10:51 +01:00
beorn7 853f971540 Actually use double-delta encoding for transcoding. :-o 2015-03-11 16:52:58 +01:00
beorn7 23ba8a5516 Make floats exact again.
This should do the right thing for the old delta chunks, too.
2015-03-06 17:03:56 +01:00
beorn7 a8d4f8af9a Improve minor things after review.
The problem of float precision will be addressed in the next commit.
2015-03-06 12:53:00 +01:00
beorn7 13fcf1ddbc Implement double-delta encoded chunks. 2015-03-05 20:33:26 +01:00
beorn7 5ed8f6c205 Update persistQueueLength after chunks were persisted. 2015-03-04 18:46:16 +01:00
beorn7 0167083da6 Improvements after review. 2015-03-03 18:59:39 +01:00
beorn7 ebac14eff3 Add version guard to persistence. 2015-03-03 18:34:01 +01:00
Julius Volz 795704f0df Merge pull request #565 from fabxc/fabxc/labelmatcher_test
Tests for retrieving fingerprints for label matchers added.
2015-02-27 14:52:37 +01:00
Fabian Reinartz 4bff5d29bf Add tests for retrieving fingerprints for label matchers.
This checks for the basic behaviour of GetFingerprintsForLabelMatchers, that is, whether the different matcher types filter the correct fingerprints and intersections are correct.
2015-02-27 14:41:43 +01:00
beorn7 92991026bb Fix chunkDescsTotal count in case of errors.
Only increment the counter if we actually add the memory series to the
fingerprintToSeries map.
2015-02-27 02:21:12 +01:00
beorn7 1db7589081 Reduce the capacity of countPersistedHeadChunks.
The capacity is basically how many persisted head chunks we will count
at most while doing other things, in particular checkpointing. To
limit the amount of already counted head chunks, keep this number low,
otherwise we will easily checkpoint too often if checkpoints take long
anyway.
2015-02-27 00:53:52 +01:00
beorn7 9406afad72 Do not double-count non-persisted head chunks on loading. 2015-02-27 00:06:16 +01:00
beorn7 dbc22b972c Check last time in head chunk for head chunk timeout, not first. 2015-02-26 23:40:42 +01:00
beorn7 edd716e63c Fix the embarrassing bug introduced in commit 0851945.
In that commit, the 'maintainSeries' call was accidentally removed.

This commit refactors things a bit so that there is now a clean
'maintainMemorySeries' and a 'maintainArchivedSeries' call.

Straighten the nomenclature a bit (consistently use 'drop' for
chunks and 'purge' for series/metrics).

Remove the annoying 'Completed maintenance sweep through archived
fingerprints' message if there were no archived fingerprints to do
maintenance on.
2015-02-26 18:30:33 +01:00
beorn7 af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end wit "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressable will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
beorn7 e22f26bc58 Move to a queue model for appending samples after all.
Starting a goroutine takes 1-2µs on my laptop. From the "numbers every
Go programmer should know", I had 300ns for a channel send in my
mind. Turns out, on my laptop, it takes only 60ns. That's fast enough
to warrant the machinery of yet another channel with a fixed set of
worker goroutines feeding from it. The number chosen (8 for now) is
low enough to not really afflict a measurable overhead (a big
Prometheus server has >1000 goroutines running), but high enough to
not make sample ingestion a bottleneck.
2015-02-13 14:26:54 +01:00
beorn7 fe518fdb28 Simplify AppendSamples by allowing it to be goroutine-unsafe. 2015-02-13 12:13:22 +01:00
beorn7 8a1c195b54 Move emptiness check to the receivers. 2015-02-12 19:47:24 +01:00
beorn7 5d3cd65a5d Improve performance of ingestion.
- Parallelize AppendSamples as much as possible without breaking the
  contract about temporal order.

- Allocate more fingerprint locker slots.

- Do not run early checkpoints if we are behind on chunk persistence.

- Increase fpMinWaitDuration to give the disk more time for more
  important things.

Also, switch math.MaxInt64 and math.MinInt64 to the new constants.
2015-02-12 18:12:37 +01:00
beorn7 d2ab49c396 Make the persist queue length configurable.
Also, set a much higher default value.

Chunk persist requests can be quite spiky. If you collect a large
number of time series that are very similar, they will tend to finish
up a chunk at about the same time. There is no reason we need to back
up scraping just because of that. The rationale of the new default
value is "1/8 of the chunks in memory".
2015-02-06 14:54:53 +01:00
Julius Volz 9412b296d5 Remove labels on persist error counter.
This fixes https://github.com/prometheus/prometheus/issues/496
2015-02-01 14:03:34 +01:00
Bjoern Rabenstein 3948e2a7f8 Move lost files to an "orphaned" directory.
Previously, those were simply deleted. The orphaned files can now be
used for forensics if needed.
2015-01-29 14:52:12 +01:00
Bjoern Rabenstein c24bfdf701 Move crash related code into separate file.
persistence.go is way too long anyway, and a lot of code is just crash
recovery, which is not important to understand the normal operation.

Also, remove unused `exists` function.
2015-01-29 13:13:16 +01:00
Bjoern Rabenstein ab386d1f5d Declare storage.local.index-cache-size.* default values as tweaked. 2015-01-29 13:04:54 +01:00
Bjoern Rabenstein 73f6dc4d44 Make KeyValueStore.Delete report if the key to delete was found.
Previously, it would return an error instead. Now we can distinguish
the cases 'error while deleting known key' vs. 'key not in index'
without testing for leveldb-internal kinds of errors.
2015-01-29 12:57:50 +01:00
Bjoern Rabenstein 2c8d324ca4 Remove check that did not check anything. 2015-01-26 13:48:24 +01:00
Julius Volz d4374a9265 More efficient JSON query result format.
This depends on https://github.com/prometheus/client_golang/pull/51.

For vectors, the result format looks like this:

```json
{
   "version": 1,
   "type" : "vector",
   "value" : [
      {
         "timestamp" : 1421765411.045,
         "value" : "65.475000",
         "metric" : {
            "quantile" : "0.5",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "/static/",
            "method" : "get",
            "code" : "304"
         }
      },
      {
         "timestamp" : 1421765411.045,
         "value" : "5826.339000",
         "metric" : {
            "quantile" : "0.9",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "prometheus",
            "method" : "get",
            "code" : "200"
         }
      },
      /* ... */
   ]
}
```

For matrices, it looks like this:

```json
{
   "version": 1,
   "type" : "matrix",
   "value" : [
      {
         "metric" : {
            "quantile" : "0.99",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "/static/",
            "method" : "get",
            "code" : "200"
         },
         "values" : [
            [
               1421765547.659,
               "29162.953000"
            ],
            [
               1421765548.659,
               "29162.953000"
            ],
            [
               1421765549.659,
               "29162.953000"
            ],
            /* ... */
         ]
      }
   ]
}
```
2015-01-26 13:06:22 +01:00
Bjoern Rabenstein 2c8fdcbc23 Remove a deadlock during shutdown.
If queries are still running when the shutdown is initiated, they will
finish _during_ the shutdown. In that case, they might request chunk
eviction upon unpinning their pinned chunks. That might completely
fill the evict request queue _after_ draining it during storage
shutdown. If that ever happens (which is the case if there are _many_
queries still running during shutdown), the affected queries will be
stuck while keeping a fingerprint locked. The checkpointing can then
not process that fingerprint (or one that shares the same lock). And
then we are deadlocked.
2015-01-22 14:42:15 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein f298af5756 Use named returns in flock.New. 2015-01-19 14:31:16 +01:00
Bjoern Rabenstein baca6faa1c Add double-start protection.
This mimics the locking leveldb is performing anyway. Advantages of
doing it separately:

- Should we ever replace the leveldb implementation by one without
  double-start protection, we are still good.

- In contrast to leveldb, the new code creates a meaningful error
  message.
2015-01-14 17:13:42 +01:00
Bjoern Rabenstein ae70eac97d Adjust the partitioning by outcome. 2015-01-13 18:34:56 +01:00
Julius Volz a6bc42bc61 Minor formatting/spelling fixups. 2015-01-09 11:04:20 +01:00
Bjoern Rabenstein 0851945054 Add a heuristics to checkpoint early if there are many "dirty" series.. 2015-01-08 20:15:58 +01:00
Bjoern Rabenstein 622e8350cd Fix a bug handling freshly unarchived series.
Usually, if you unarchive a series, it is to add something to it,
which will create a new head chunk. However, if a series in
unarchived, and before anything is added to it, it is handled by the
maintenance loop, it will be archived again. In that case, we have to
load the chunkDescs to know the lastTime of the series to be
archived. Usually, this case will happen only rarely (as a race, has
never happened so far, possibly because the locking around unarchiving
and the subsequent sample append is smart enough). However, during
crash recovery, we sometimes treat series as "freshly unarchived"
without directly appending a sample. We might add more cases of that
type later, so better deal with archiving properly and load chunkDescs
if required.
2015-01-08 16:25:50 +01:00
Bjoern Rabenstein eb932d1524 Remove a deadlock during shutdown. 2015-01-07 19:02:38 +01:00
Brian Brazil e56786b221 Have scrape time as a pseudovariable, not a prometheus variable.
This ensures it has the right timestamp, and is easier to work with.

Switch sd variable away from 'outcome', using total/failed instead.
2014-12-27 00:39:33 +00:00
Bjoern Rabenstein ff24070a03 Fix embarrassing bug in crash recovery.
(And yes, we always knew we need tests for that. I have added a TODO now.)

Change-Id: I9cf52bbf98e263e0b79404bda4c442beba9696a8
2014-12-17 17:18:04 +01:00
Julius Volz c9618d11e8 Introduce copy-on-write for metrics in AST.
This depends on changes in:

https://github.com/prometheus/client_golang/tree/cow-metrics.

Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142
2014-12-12 20:34:55 +01:00
Bjoern Rabenstein afd864e7f4 Adjust to the new version of goleveldb.
(And yes, we do want vendoring for that... This is just the quick fix.)

Change-Id: I9d347a64d96de6b3390a0e35c8d466f14bb83e4e
2014-12-10 18:04:29 +01:00
Bjoern Rabenstein fee88a7a77 Remove the remaining races, new and old.
Also, resolve a few other TODOs.

Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691
2014-12-03 18:07:23 +01:00
Bjoern Rabenstein 66c80b5ebd Fix typo.
Change-Id: I72608c7841c00145458807d3c3ee29db7b5ac2bc
2014-11-28 12:50:19 +01:00
Bjoern Rabenstein 674624f1c8 Completed more TODOs.
- Documented checkpoint file format.
- High-level description of series sanitation.
- Replace fp.LoadFromString panic with an error.
  (Change in client_golang already submitted.)
- Introduced checks for series file size where appropriate.
- Removed two Law of Demeter violations.

Change-Id: I555d97a2c8f4769820c2fc8bf5d6f4e160222abc
2014-11-27 20:46:45 +01:00
Bjoern Rabenstein 7d11019aa2 Squash a few trivial TODOs.
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
  (to create iterators).
- Turned numMemChunkDescs into a metric.

Change-Id: I29be963c795a075ec00c095f76bf26405535609d
2014-11-27 18:26:06 +01:00
Bjoern Rabenstein 49683c0c20 Avoid test flags in normal binary.
Change-Id: If1fba813a73bf93ea5918dcda326e3ffa81a797d
2014-11-27 18:04:48 +01:00
Bjoern Rabenstein 9bc05052ad Add line that has mysteriously disappeared after rebase.
Change-Id: I3612eb0b626e66e607b363e9801f187d2ba637a3
2014-11-25 17:15:56 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein 9ea808cd8b Remove debug log line.
Change-Id: Icdd2351b89f2d37ac2b615f9cf872e054c694ad1
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein bb42cc2e2d Evict based on memory pressure. Evict recently used chunks last.
Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein e23ee0f7cc Fix race in test.
Change-Id: I53e1a4c5a6b5f846acd76043166b6cb7bf7d5dc7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein d73e851b14 Tweak timing in the maintenance loop.
Change-Id: I9801c4f9a22c3b3dc1ce1af81fdd9e992a4f4dd7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 2672aa8ece Instrument series maintenance.
Change-Id: Ie4269d07ad4d23d44230c95a523088b472718e54
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 3f61d304ce Reorganize maintenance loop.
Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein c087ee35f7 Remove archiveMtx.
Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 7af42eda65 Optimize purging.
Now only purge if there is something to purge.
Also, set savedFirstTime and archived time range appropriately.
(Which is needed for the optimization.)

Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 33b959b898 Persist savedFirstTime in checkpoint.
Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 7a9efc9c59 Fix typo in test.
Change-Id: I3c2fd76bc5f50446c58f8ef693d9c6595197feaa
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 4efc60174b Tweak and verify a few parameters.
Remove TODOs accordingly.

Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 5f8e9617ef Add more tests.
Add an end-to-end fuzz and race test.

Fix a race exposed by the above.

Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein d215e013b7 Fix the weird chunkDesc shuffling bug.
The root cause was that after chunkDesc eviction, the offset between
memory representation of chunk layout (via chunkDescs in memory) was
shiftet against chunks as layed out on disk. Keeping the offset up to
date is by no means trivial, so this commit is pretty involved.

Also, found a race that for some reason didn't bite us so far:
Persisting chunks was completel unlocked, so if chunks were purged on
disk at the same time, disaster would strike. However, locking the
persisting of chunk revealed interesting dead locks. Basically, never
queue under the fp lock.

Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein a617269b12 Avoid unnecessary cloning of the head chunk.
Change-Id: I5da774515d5493166a197b5814d0a720628cfaff
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped..
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 74c9b34a5e Improve storage instrumentation even more.
Add gauge for chunks and chunkdescs in memory (backed by a global
variable to be used later not only for instrumentation but also for
memory management).

Refactored instrumentation code once more (instrumentation.go is back :).

Change-Id: Ife39947e22a48cac4982db7369c231947f446e17
2014-11-25 17:09:04 +01:00
Julius Volz c3fcea45e3 Support finer time resolutions than 1 second.
Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 443dd33805 Improve instrumentation in storage.
Also, fix some other minor bugs.

Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 1936a40e75 Minor loging improvement.
Change-Id: I7875d1a58ef9c5ff149f18e36f65959a4712fea2
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 192bf52c41 Evict chunkDescs, too.
Change-Id: I8b70f22fbf1dfcbc49f9ec391985144649e6ce9c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 95f392fb2c Prevent an indexing death spiral.
Change-Id: I86b20cd0830d02f87b2f020767257e2d3fb2033c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 40354eaa29 Reduce directory depth by one.
Change-Id: I7f89df61135ff19169ed97633a662685d414c448
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 096fa0f8b2 Squash a number of TODOs.
- Staleness delta is no a proper function parameter and not replicated
  from package ast.

- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.

- For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'.

- Verified that math.Modf is not a speed enhancement over conversion
  (actually 5x slower).

- Renamed firstTimeField, lastTimeField into chunkFirstTime and
  chunkLastTime.

- Verified unpin() is sufficiently goroutine-safe.

- Decided not to update archivedFingerprintToTimeRange upon series
  truncation and added a rationale why.

Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 427c8d53a5 Fix handling of empty chunkDescs while preloading chunks.
Change-Id: I73ce89fe0ef90c6eda78218e5be2cbfa0207c364
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein ecee5d8281 Fix head chunk persisting and a chunkDesc race condition.
- Head chunk persisting only happens in evictOlderThan, so do it
  there.  (With the previous code, it would never happen.)

- Raw accesses to chunkDesc.chunk are now done via isEvicted (with
  locking).

Change-Id: I48b07b56dfea4899b50df159b4ea566954396fcd
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 6b37e47f9e Remove unused metrics.
Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 2b4ff620aa Return a nop iterator for series that have been purged completely.
Change-Id: I6e92cac4472486feefdecba8593c17867e8c710d
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 6e3a366f91 Only archive a time series when none of its chunks is pinned.
Change-Id: I7e4b67c34b417b8980173bc5dc3b213bd7d698e5
2014-11-25 17:09:03 +01:00
Julius Volz bfa64248b7 Deal with missing series in preloading.
Change-Id: Ibf3a57b329f40a3d5e0b98464a2f45d2f1bd07bf
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein ca42a22e20 Add safety panic to seriesMap.put.
Change-Id: I4d4d2e45cc0f908a33eb1ae6e3ee6796adfcbd1e
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 83b4fa868d Fix GetBoundaryValues.
Change-Id: I8f8bbdb88e9b24e4c37ff869126ed9343f261ce2
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 4447708c9f Fix a race in target.go.
Also, fix problems in shutdown.
Starting serving and shutdown still has to be cleaned up properly.
It's a mess.

Change-Id: I51061db12064e434066446e6fceac32741c4f84c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein fd6600850a Fix race in chunkDesc.
Change-Id: Id7bae115d75886e10d44184a690a76777b1531fe
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 1c53c09558 Treat empty chunkDescs properly in preloadChunksForRange.
Change-Id: Ida1bd3fe1f9fb0ea2d5dbb9704be926f0824f873
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 934d09f738 Fix race during shutdown.
Change-Id: I2f8bf48d92a14f1e5ecde27c1b138734d7653394
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 38fc24d0ed Fix targetpool_test.go and other tests.
Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624
2014-11-25 17:08:26 +01:00
Julius Volz 7f5d3c2c29 Fix and improve the fp locker.
Benchmark:
$ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4

OLD
BenchmarkFingerprintLockerParallel        500000              3618 ns/op
BenchmarkFingerprintLockerParallel-2      100000             12257 ns/op
BenchmarkFingerprintLockerParallel-4      500000             10164 ns/op
BenchmarkFingerprintLockerSerial        10000000               283 ns/op
BenchmarkFingerprintLockerSerial-2      10000000               284 ns/op
BenchmarkFingerprintLockerSerial-4      10000000               288 ns/op

NEW
BenchmarkFingerprintLockerParallel       1000000              1018 ns/op
BenchmarkFingerprintLockerParallel-2     1000000              1164 ns/op
BenchmarkFingerprintLockerParallel-4     2000000               910 ns/op
BenchmarkFingerprintLockerSerial        50000000                56.0 ns/op
BenchmarkFingerprintLockerSerial-2      50000000                47.9 ns/op
BenchmarkFingerprintLockerSerial-4      50000000                54.5 ns/op

Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7ad55ef83c Actually close the iterator channels.
Change-Id: I6f6a2aef5ff55c6b2d21ad91d02ae6b0ecba4ae8
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 8fba3302bc Bold changes to concurrency.
(WIP. Probably doesn't work yet.)

Change-Id: Id1537dfcca53831a1d428078a5863ece7bdf4875
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein fcdf5a8ee7 Fix bugs in chunk evict code.
Also, simplify code by re-looking up metric in metric map.

Change-Id: Ib2092f9184374e5a543e87d3a9f4a74fda64b193
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7e6a03fbf9 Fix a few concurrency issues before starting to use the new fp locker.
Change-Id: I8615e8816e79ef0882e123163ee590c739b79d12
2014-11-25 17:07:45 +01:00
Julius Volz db92620163 Instrument eviction and purge durations.
Change-Id: Ia5b2319363ad2644674c9b7a94162a89bcc296fb
2014-11-25 17:07:45 +01:00