Commit graph

761 commits

Author SHA1 Message Date
Björn Rabenstein 50e4f49b7e Merge pull request #2561 from prometheus/beorn7/storage2
storage: Evict unused chunk.Descs in crash recovery
2017-04-04 00:05:03 +02:00
beorn7 08fc6cbd39 storage: Evict unused chunk.Descs in crash recovery
This is in line with the v1.5 change in paradigm to not keep
chunk.Descs without chunks around after a series maintenance.

It's mainly motivated by avoiding excessive amounts of RAM usage
during crash recovery.

The code avoids creating memory time series with zero chunk.Descs, as
that is prone to trigger weird effects. (Series maintenance would
archive series with zero chunk.Descs, but we cannot do that here
because the archive indices still have to be checked.)
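A minimal sketch of the eviction rule described in this message, assuming a simplified descriptor type (chunkDesc and evictUnusedDescs are hypothetical names, not the storage package's API): leading descriptors whose chunks are not loaded are dropped, but at least one descriptor is always kept so that no memory series ends up with zero chunk.Descs.

```go
package main

import "fmt"

// chunkDesc is a hypothetical stand-in for chunk.Desc: a descriptor that
// may or may not have its chunk data loaded in memory.
type chunkDesc struct {
	firstTime int64
	loaded    bool // true if the chunk data is in memory
}

// evictUnusedDescs drops leading descriptors without loaded chunks, but
// always keeps at least one descriptor. It returns the kept descriptors and
// the number evicted (which a caller would add to its on-disk offset).
func evictUnusedDescs(descs []chunkDesc) (kept []chunkDesc, evicted int) {
	for evicted < len(descs)-1 && !descs[evicted].loaded {
		evicted++
	}
	return descs[evicted:], evicted
}

func main() {
	kept, n := evictUnusedDescs([]chunkDesc{{0, false}, {100, false}, {200, true}})
	fmt.Println(len(kept), n) // 1 2

	// Even if no chunk is loaded, one descriptor survives.
	kept, n = evictUnusedDescs([]chunkDesc{{0, false}, {100, false}})
	fmt.Println(len(kept), n) // 1 1
}
```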
2017-04-04 00:04:22 +02:00
Björn Rabenstein 1c6240fc40 Merge pull request #2559 from prometheus/beorn7/storage
storage: Replace fpIter by sortedFPs
2017-04-03 16:56:21 +02:00
beorn7 d284ffab03 storage: Replace fpIter by sortedFPs
The fpIter was kind of cumbersome to use and required a lock for each
iteration (which wasn't even needed for the iteration at startup after
loading the checkpoint).

The new implementation here has an obvious penalty in memory, but it's
only 8 bytes per series, so ~80MB for a beefy server with 10M memory
time series (which would probably need ~100GiB RAM, so the memory
penalty is only 0.1% of the total memory need).

The big advantage is that now series maintenance happens in order,
which leads to the time between two maintenances of the same series
being less random. Ideally, after each maintenance, the next
maintenance would tackle the series with the largest number of
non-persisted chunks. That would be quite an effort to find out or
track, but with the approach here, the next maintenance will tackle
the series whose previous maintenance is longest ago, which is a good
approximation.
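The Go sketch below illustrates the idea behind sortedFPs under simplifying assumptions: a plain map stands in for the series map, and fingerprint stands in for a 64-bit fingerprint type. Snapshotting and sorting the keys yields a repeatable maintenance order, unlike Go's unspecified map iteration order. It also spells out the memory arithmetic: 8 bytes × 10M series is 80,000,000 bytes, about 76 MiB, in line with the estimate above.

```go
package main

import (
	"fmt"
	"sort"
)

// fingerprint stands in for a uint64 series fingerprint, so each entry in
// the sorted slice costs 8 bytes.
type fingerprint uint64

// sortedFPs (simplified, hypothetical signature) snapshots the keys of the
// series map into a slice and sorts them, giving a stable maintenance order.
func sortedFPs(series map[fingerprint]string) []fingerprint {
	fps := make([]fingerprint, 0, len(series))
	for fp := range series {
		fps = append(fps, fp)
	}
	sort.Slice(fps, func(i, j int) bool { return fps[i] < fps[j] })
	return fps
}

func main() {
	series := map[fingerprint]string{42: "c", 7: "a", 19: "b"}
	for _, fp := range sortedFPs(series) {
		fmt.Println(fp, series[fp]) // always 7 a, 19 b, 42 c
	}

	// Memory cost of the snapshot: 8 bytes per series.
	const numSeries = 10_000_000
	fmt.Printf("%.1f MiB\n", float64(numSeries*8)/(1<<20)) // 76.3 MiB
}
```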

While this commit won't change the _average_ number of chunks
persisted per maintenance, it will reduce the mean time a given chunk
has to wait for its persistence and thus reduce the steady-state
number of chunks waiting for persistence.

Also, the map iteration in Go is non-deterministic but not truly
random. In practice, the iteration appears to be somewhat "bucketed".
You can often observe a bunch of series with similar duration since
their last maintenance, i.e. you see batches of series with similar
number of chunks persisted per maintenance. If that batch is
relatively young, a whole lot of series are maintained with very few
chunks to persist. (See screenshot in PR for a better explanation.)
2017-04-03 15:34:46 +02:00
Tobias Schmidt eac36d123e Fix unstable fanin test (#2558) 2017-04-03 13:02:15 +02:00
Julius Volz 5a896033e3 Add remote read external label handling (#2555)
* Add remote read external label handling

This implements rule 1 and 2 from
https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit

* Use more descriptive example labels in read test

* Add comment for querier.addExternalLabels()

* Make argument naming in removeLabels() more generic
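The linked document is not reproduced here, so the following is only a sketch of the general technique suggested by the function names above. It assumes that external labels are added as equality matchers to the remote query unless the query already selects on that label name, and that the added labels are stripped from the returned series. matcher and the map-based label sets are simplified, hypothetical types; the real addExternalLabels() and removeLabels() may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// matcher is a simplified equality matcher (name="value").
type matcher struct{ name, value string }

// addExternalLabels returns a copy of ms with an equality matcher appended
// for each external label the query does not already select on, plus the
// set of label names that were added.
func addExternalLabels(ms []matcher, external map[string]string) ([]matcher, map[string]bool) {
	out := append([]matcher(nil), ms...) // do not mutate the caller's slice
	present := map[string]bool{}
	for _, m := range ms {
		present[m.name] = true
	}
	names := make([]string, 0, len(external))
	for n := range external {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic matcher order
	added := map[string]bool{}
	for _, n := range names {
		if !present[n] {
			out = append(out, matcher{n, external[n]})
			added[n] = true
		}
	}
	return out, added
}

// removeLabels returns a copy of lset without the given label names.
func removeLabels(lset map[string]string, names map[string]bool) map[string]string {
	out := make(map[string]string, len(lset))
	for k, v := range lset {
		if !names[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	query := []matcher{{"__name__", "up"}, {"region", "eu"}}
	external := map[string]string{"region": "us", "replica": "a"}
	ms, added := addExternalLabels(query, external)
	fmt.Println(ms) // [{__name__ up} {region eu} {replica a}]

	series := map[string]string{"__name__": "up", "replica": "a", "job": "node"}
	fmt.Println(removeLabels(series, added)) // map[__name__:up job:node]
}
```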
2017-04-02 17:48:15 +02:00
Björn Rabenstein e63d079b59 Merge pull request #2527 from prometheus/beorn7/storage
storage: Evict chunks and calculate persistence pressure...
2017-03-27 14:49:42 +02:00
Julius Volz b5b0e00923 Merge pull request #2499 from prometheus/remote-read
Remote Read
2017-03-27 14:43:44 +02:00
beorn7 434ab2a6a3 storage: Evict chunks and calculate persistence pressure based on target heap size
This is a fairly easy attempt to dynamically evict chunks based on the
heap size. A target heap size has to be set as a command-line flag,
so that users can essentially say "utilize 4GiB of RAM, and please
don't OOM".
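The log does not show the eviction algorithm itself. The sketch below only illustrates comparing the Go heap, as reported by runtime.ReadMemStats (HeapAlloc is the bytes of allocated heap objects, i.e. the "sum of the sizes of heap objects" mentioned further down), against a target. evictionPressure and its linear formula are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// evictionPressure returns how far the heap exceeds the target, as a
// fraction of the target; 0 means no eviction is needed.
func evictionPressure(heapBytes, targetHeapBytes uint64) float64 {
	if targetHeapBytes == 0 || heapBytes <= targetHeapBytes {
		return 0
	}
	return float64(heapBytes-targetHeapBytes) / float64(targetHeapBytes)
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	const target = 4 << 30 // "utilize 4GiB of RAM"
	fmt.Printf("heap=%d pressure=%.3f\n", ms.HeapAlloc, evictionPressure(ms.HeapAlloc, target))

	fmt.Println(evictionPressure(5<<30, 4<<30)) // 0.25
}
```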

The -storage.local.max-chunks-to-persist and
-storage.local.memory-chunks flags are deprecated by this
change. Backwards compatibility is provided by ignoring
-storage.local.max-chunks-to-persist and using
-storage.local.memory-chunks to set the new
-storage.local.target-heap-size to a reasonable (and conservative)
value (both with a warning).

This also makes the metrics instrumentation more consistent (in
naming and implementation) and cleans up a few quirks in the tests.

Answers to anticipated comments:

There is a chance that Go 1.9 will allow programs better control over
the Go memory management. I don't expect those changes to be in
contradiction with the approach here, but I do expect them to
complement it and allow it to be more precise and controlled. In
any case, once those Go changes are available, this code has to be
revisited.

One might be tempted to let the user specify an estimated value for
the RSS usage, and then internally set a target heap size of a certain
fraction of that. (In my experience, 2/3 is a fairly safe bet.)
However, investigations have shown that RSS size and its relation to
the heap size is really really complicated. It depends on so many
factors that I wouldn't even start listing them in a commit
description. It depends on many circumstances, not least on the
risk trade-off of each individual user between RAM utilization and
probability of OOMing during a RAM usage peak. To not add even more to
the confusion, we need to stick to the well-defined number we also use
in the targeting here, the sum of the sizes of heap objects.
2017-03-27 14:33:50 +02:00
beorn7 96a303b348 storage: Use staleness delta as head chunk timeout
Currently, if a series stops existing, its head chunk will be kept
open for an hour. That prevents it from being persisted. Which
prevents it from being evicted. Which prevents the series from being
archived.

Most of the time, once no sample has been added to a series within the
staleness limit, we can be pretty confident that this series will not
receive samples anymore. The whole chain as described above can be
started after 5m instead of 1h. In the relaxed case, this doesn't
change a lot as the head chunk timeout is only checked during series
maintenance, and usually, a series is only maintained every six
hours. However, there is the typical scenario where a large service is
deployed, the deploy turns out to be bad, and then it is deployed
again within minutes, and quite quickly the number of time series has
tripled. That's the point where the Prometheus server is stressed and
switches (rightfully) into rushed mode. In that mode, time series are
processed as quickly as possible, but all of that is in vain if all of
those recently ended time series cannot be persisted yet for another
hour. In that scenario, this change will help most, and it's exactly
the scenario where help is most desperately needed.
2017-03-26 23:44:50 +02:00
Julius Volz 3f23aa2cc7 Add headers to indicate remote read/write version
Also add Content-Type header.
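A sketch of attaching version and Content-Type headers to a remote request with net/http. The header names and values (X-Prometheus-Remote-Write-Version, X-Prometheus-Remote-Read-Version, 0.1.0, application/x-protobuf, snappy encoding) are assumptions modelled on later Prometheus releases, not taken from this commit; newRemoteRequest is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newRemoteRequest builds a POST request for remote read or write and sets
// a protocol-version header plus Content-Type and Content-Encoding.
func newRemoteRequest(url string, body []byte, read bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Encoding", "snappy")
	req.Header.Set("Content-Type", "application/x-protobuf")
	if read {
		req.Header.Set("X-Prometheus-Remote-Read-Version", "0.1.0")
	} else {
		req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
	}
	return req, nil
}

func main() {
	req, err := newRemoteRequest("http://localhost:9201/write", nil, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("X-Prometheus-Remote-Write-Version")) // 0.1.0
}
```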
2017-03-24 17:39:51 +01:00
Julius Volz 8fda83ea12 Make rules only read local data 2017-03-21 00:50:04 +01:00
Julius Volz 94acd3f1d8 Add fanin tests and fix uncovered bugs 2017-03-21 00:08:17 +01:00
Julius Volz 9b33cfc457 Fix/unify context-based remote storage timeouts 2017-03-20 14:17:06 +01:00
Julius Volz 815762a4ad Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig 2017-03-20 14:17:04 +01:00
Julius Volz eb14678a25 Make remote read/write use config.HTTPClientConfig 2017-03-20 13:37:50 +01:00
Julius Volz 406b65d0dc Rename remote.Storage to remote.Writer 2017-03-20 13:15:28 +01:00
Julius Volz 02395a224d [WIP] Remote Read 2017-03-20 13:13:44 +01:00
Julius Volz 40e41a4776 Merge pull request #2494 from tomwilkie/remote-write-sharding
Dynamically reshard the QueueManager based on observed load.
2017-03-20 12:45:17 +01:00
beorn7 48d221c11e storage: Fix typo in comment 2017-03-16 11:49:41 +01:00
Tom Wilkie 75bb0f3253 Review feedback 2017-03-13 21:24:49 +00:00
Tom Wilkie 77cce900b8 Fix tests 2017-03-13 15:21:59 +00:00
Tom Wilkie b48799a01e Add license stanza 2017-03-13 14:50:15 +00:00
Tom Wilkie 9d22f030cf Dynamically reshard the QueueManager based on observed load. 2017-03-13 14:41:16 +00:00
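One way to express load-based resharding, as a hypothetical sketch: if a shard spends secondsPerSample seconds sending each sample, it handles 1/secondsPerSample samples per second, so keeping up with samplesInPerSec needs about samplesInPerSec × secondsPerSample shards, clamped to configured bounds. desiredShards and its parameters are illustrative; the actual QueueManager logic is not shown in this log.

```go
package main

import (
	"fmt"
	"math"
)

// desiredShards estimates the shard count needed to keep up with the
// observed input rate, clamped to [minShards, maxShards].
func desiredShards(samplesInPerSec, secondsPerSample float64, minShards, maxShards int) int {
	n := int(math.Ceil(samplesInPerSec * secondsPerSample))
	if n < minShards {
		return minShards
	}
	if n > maxShards {
		return maxShards
	}
	return n
}

func main() {
	// 50k samples/s in, 1ms of send time per sample per shard.
	fmt.Println(desiredShards(50000, 0.001, 1, 1000))
}
```

For example, 50,000 samples/s at 1 ms per sample works out to about 50 shards.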
Tom Wilkie 1ab893c6ec Limit 'discarding sample' logs to 1 every 10s (#2446)
* Limit 'discarding sample' logs to 1 every 10s

* Include the vendored library

* Review feedback
2017-02-23 19:20:39 +01:00
Julius Volz 2f39dbc8b3 Rename StorageQueueManager -> QueueManager 2017-02-21 21:45:43 +01:00
Julius Volz e9476b35d5 Re-add multiple remote writers
Each remote write endpoint gets its own set of relabeling rules.

This is based on the (yet-to-be-merged)
https://github.com/prometheus/prometheus/pull/2419, which removes legacy
remote write implementations.
2017-02-20 13:23:12 +01:00
Björn Rabenstein 089dc1076b Merge pull request #2435 from jmeulemans/open-chunks-gauge
Adding gauge for number of open head chunks.
2017-02-17 16:02:06 +01:00
Jeremy Meulemans 025c828976 Changed to open_head_chunks to address review.
Now incrementing numHeadChunks directly.
2017-02-17 07:10:13 -06:00
Jeremy Meulemans 074050b8c0 Updating for failed codeclimate check. 2017-02-16 18:04:28 -06:00
Jeremy Meulemans f70b52d0b6 Adding gauge for number of open head chunks.
Fixes #1710
2017-02-16 17:56:45 -06:00
Julius Volz beb3c4b389 Remove legacy remote storage implementations
This removes legacy support for specific remote storage systems in favor
of only offering the generic remote write protocol. An example bridge
application that translates from the generic protocol to each of those
legacy backends is still provided at:

documentation/examples/remote_storage/remote_storage_bridge

See also https://github.com/prometheus/prometheus/issues/10

The next step in the plan is to re-add support for multiple remote
storages.
2017-02-14 17:52:05 +01:00
beorn7 d771185a43 storage: Fix chunkIndexToStartSeek calculation
With a high enough shrink ratio and enough chunks to persist, the
cutoff point could be _outside_ of the file, which wreaks havoc in the
storage.
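A sketch of the invariant this fix restores, assuming fixed-size chunks on disk: the cutoff index must lie within [0, chunksInFile], so the seek offset never points past the end of the file. startSeekIndex and the 1024+17 byte sizes are illustrative assumptions, not the real chunkIndexToStartSeek.

```go
package main

import "fmt"

// Illustrative sizes only: a 1024-byte chunk plus a 17-byte header.
const chunkLenWithHeader = 1024 + 17

// startSeekIndex returns the index of the first chunk to keep after
// dropping numDrop chunks, and the byte offset to seek to. The index is
// clamped so the offset can never exceed the file size.
func startSeekIndex(numDrop, chunksInFile int) (int, int64) {
	if numDrop < 0 {
		numDrop = 0
	}
	if numDrop > chunksInFile {
		numDrop = chunksInFile
	}
	return numDrop, int64(numDrop) * chunkLenWithHeader
}

func main() {
	idx, off := startSeekIndex(12, 10) // asked to drop more chunks than exist
	fmt.Println(idx, off)              // 10 10410
}
```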
2017-02-10 11:42:59 +01:00
beorn7 73bd5e4dff Merge branch 'beorn7/storage' into beorn7/storage3 2017-02-09 14:44:10 +01:00
beorn7 46a0837816 storage: Fix offset returned by dropAndPersistChunks
This is another corner-case that was previously never exercised
because the rewriting of a series file was never prevented by the
shrink ratio.

Scenario: There is an existing series on disk, which is archived. If a
new sample comes in for that file, a new chunk in memory is created,
and the chunkDescsOffset is set to -1. If series maintenance happens
before the series has at least one chunk to persist _and_ an
insufficient number of chunks on disk is old enough for purging (so that the
shrink ratio kicks in), dropAndPersistChunks would return 0, but it
should return the chunk length of the series file.
2017-02-09 14:35:07 +01:00
beorn7 9d12204da5 Merge branch 'release-1.5' 2017-02-09 13:11:53 +01:00
beorn7 bed4934224 storage: One more persist error code path discovered
Also, in that code path, set chunkDescsOffset to 0 rather than -1 in
case of "dropped more chunks from persistence than from memory" so
that no other weird things happen before the series is quarantined for
good.
2017-02-09 11:51:40 +01:00
beorn7 242d8edcb5 Merge branch 'release-1.5' 2017-02-08 17:28:09 +01:00
beorn7 8c8baaa558 storage: writeMemorySeries needs to return true for quarantined series
This is another fallout of my bug hunt.
2017-02-08 16:28:56 +01:00
Mitsuhiro Tanda be8b1eb656 storage: optimize dropping chunks by using minShrinkRatio (#2397)
storage: prevent unnecessary chunk header reading if minShrinkRatio > 0
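The ratio check itself can be sketched as below. How the number of droppable chunks is determined cheaply is not described in this log, so the sketch simply takes it as a parameter; shouldShrink is a hypothetical name. When it returns false, reading the remaining chunk headers can be skipped.

```go
package main

import "fmt"

// shouldShrink reports whether dropping numDroppable of totalChunks chunks
// frees enough of the series file to justify rewriting it.
func shouldShrink(numDroppable, totalChunks int, minShrinkRatio float64) bool {
	if totalChunks <= 0 || numDroppable <= 0 {
		return false
	}
	return float64(numDroppable)/float64(totalChunks) >= minShrinkRatio
}

func main() {
	fmt.Println(shouldShrink(1, 10, 0.25)) // false: only 10% would be dropped
	fmt.Println(shouldShrink(3, 10, 0.25)) // true: 30% would be dropped
}
```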
2017-02-07 17:33:54 +01:00
beorn7 2363a90adc storage: Do not throw away fully persisted memory series in checkpointing 2017-02-06 17:39:59 +01:00
beorn7 244a65fb29 storage: Increase persist watermark before calling append
The append call may reuse cds, and thus change its len.
(In practice, this wouldn't happen as cds should have len==cap.
Still, the previous order of lines was problematic.)
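A small illustration of why the order matters, using hypothetical helpers: if cds is reassigned to the result of append before len(cds) is read, the watermark also counts the newly appended elements.

```go
package main

import "fmt"

// advanceBad reassigns cds before reading its length, so the watermark
// includes the appended elements.
func advanceBad(cds []int, watermark int, more ...int) ([]int, int) {
	cds = append(cds, more...)
	watermark += len(cds)
	return cds, watermark
}

// advanceGood reads len(cds) first, then appends.
func advanceGood(cds []int, watermark int, more ...int) ([]int, int) {
	watermark += len(cds)
	cds = append(cds, more...)
	return cds, watermark
}

func main() {
	_, bad := advanceBad([]int{1, 2}, 0, 3)
	_, good := advanceGood([]int{1, 2}, 0, 3)
	fmt.Println(bad, good) // 3 2
}
```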
2017-02-05 02:25:09 +01:00
beorn7 75282b27ba storage: Added checks for invariants 2017-02-04 23:40:22 +01:00
beorn7 31e9db7f0c storage: Simplify evictChunkDesc method 2017-02-04 22:29:37 +01:00
beorn7 65dc8f44d3 storage: Test for errors returned by MaybePopulateLastTime 2017-02-01 23:43:58 +01:00
beorn7 752fac60ae storage: Remove race condition from TestLoop 2017-02-01 23:43:58 +01:00
beorn7 4ccfc93dcf storage: Set shrink ratio in the constructor. 2017-02-01 15:37:16 +01:00
beorn7 b2f086c6c4 storage: Expose bug of not setting the shrink ratio in the constructor 2017-02-01 15:37:10 +01:00
Brian Brazil c1b547a90e Only checkpoint chunkdescs and series that need persisting. (#2340)
This decreases checkpoint size by not checkpointing things
that don't actually need checkpointing.

This is fully compatible with the v2 checkpoint format,
as it makes series appear as though the only chunkdescs
in memory are those that need persisting.
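A sketch of the checkpoint filter described above, with a hypothetical, stripped-down memSeries: only descriptors from the persist watermark onward are written, the offset is shifted so the series looks as if only those descriptors were in memory, and series with nothing to persist are skipped. Note that a later commit in this log (2363a90adc) revisits how fully persisted series are handled during checkpointing, so the skip rule here should not be taken as final.

```go
package main

import "fmt"

// memSeries is a simplified series: chunkDescs[:persistWatermark] are
// already on disk.
type memSeries struct {
	chunkDescs       []string
	persistWatermark int
	chunkDescsOffset int // index of chunkDescs[0] in the series file, -1 if unknown
}

// checkpointView returns the descriptors still needing persistence and the
// adjusted offset; ok is false if nothing needs persisting.
func checkpointView(s memSeries) (descs []string, offset int, ok bool) {
	if s.persistWatermark >= len(s.chunkDescs) {
		return nil, 0, false
	}
	offset = s.chunkDescsOffset
	if offset >= 0 {
		offset += s.persistWatermark
	}
	return s.chunkDescs[s.persistWatermark:], offset, true
}

func main() {
	s := memSeries{chunkDescs: []string{"c0", "c1", "c2"}, persistWatermark: 2, chunkDescsOffset: 5}
	descs, off, ok := checkpointView(s)
	fmt.Println(descs, off, ok) // [c2] 7 true
}
```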
2017-01-17 00:59:38 +00:00
Brian Brazil f64c231dad Allow checkpoints and maintenance to happen concurrently. (#2321)
This is essential on larger Prometheus servers, as otherwise
checkpoints prevent sufficient persisting of chunks to disk.
2017-01-13 17:24:19 +00:00