45 commits

Author SHA1 Message Date
beorn7 0e202dacb4 Streamline series iterator creation
This will fix issue #1035 and will also help to make issue #1264 less
bad.

The fundamental problem in the current code:

In the preload phase, we quite accurately determine which chunks will
be used for the query being executed. However, in the subsequent step
of creating series iterators, the created iterators are referencing
_all_ in-memory chunks in their series, even the un-pinned ones. In
iterator creation, we copy a pointer to each in-memory chunk of a
series into the iterator. While this creates a certain amount of
allocation churn, the worst thing about it is that copying the chunk
pointer out of the chunkDesc requires a mutex acquisition. (Remember
that the iterator will also reference un-pinned chunks, so we need to
acquire the mutex to protect against concurrent eviction.) The worst
case happens if a series doesn't even contain any relevant samples for
the query time range. We notice that during preloading but then we
will still create a series iterator for it. But even for series that
do contain relevant samples, the overhead is quite bad for instant
queries that retrieve a single sample from each series, but still go
through all the effort of series iterator creation. All of that is
particularly bad if a series has many in-memory chunks.

This commit addresses the problem from two sides:

First, it merges preloading and iterator creation into one step,
i.e. the preload call returns an iterator for exactly the preloaded
chunks.

Second, the required mutex acquisition in chunkDesc has been greatly
reduced. That was enabled by a side effect of the first step, which is
that the iterator is only referencing pinned chunks, so there is no
risk of concurrent eviction anymore, and chunks can be accessed
without mutex acquisition.
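
A minimal sketch of the two-sided idea above, assuming a much simplified chunk descriptor. The names `chunkDesc`, `seriesIterator`, `preload`, and `close` are illustrative here and are not taken from the actual change. Preloading pins only the chunks that overlap the query range and hands back an iterator over exactly those chunks, so the iterator never touches an un-pinned chunk:

```go
package main

import (
	"fmt"
	"sync"
)

// chunkDesc is a hypothetical, simplified chunk descriptor: an immutable
// time range plus a pin count that is guarded by a mutex.
type chunkDesc struct {
	mtx       sync.Mutex
	firstTime int64
	lastTime  int64
	pinned    int
}

func (cd *chunkDesc) pin() {
	cd.mtx.Lock()
	cd.pinned++
	cd.mtx.Unlock()
}

func (cd *chunkDesc) unpin() {
	cd.mtx.Lock()
	cd.pinned--
	cd.mtx.Unlock()
}

// seriesIterator references only chunks pinned by preload. Pinned chunks
// cannot be evicted, so reading them later needs no further locking.
type seriesIterator struct {
	chunks []*chunkDesc
}

// preload pins every chunk overlapping [from, through] and returns an
// iterator over exactly those chunks. A series without relevant chunks
// yields an empty iterator and pins nothing.
func preload(series []*chunkDesc, from, through int64) *seriesIterator {
	it := &seriesIterator{}
	for _, cd := range series {
		if cd.lastTime < from || cd.firstTime > through {
			continue // Not relevant for the query range.
		}
		cd.pin()
		it.chunks = append(it.chunks, cd)
	}
	return it
}

// close unpins all chunks held by the iterator.
func (it *seriesIterator) close() {
	for _, cd := range it.chunks {
		cd.unpin()
	}
	it.chunks = nil
}

func main() {
	series := []*chunkDesc{
		{firstTime: 0, lastTime: 9},
		{firstTime: 10, lastTime: 19},
		{firstTime: 20, lastTime: 29},
	}
	it := preload(series, 12, 15)
	fmt.Println("chunks in iterator:", len(it.chunks))
	it.close()
}
```
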

To simplify the code changes for the above, the long-planned change of
ValueAtTime to ValueAtOrBeforeTime was performed at the same
time. (It should have been done first, but it kind of accidentally
happened while I was in the middle of writing the series iterator
changes. Sorry for that.) So far, we actively filtered the up to two
values that were returned by ValueAtTime, i.e. we invested work to
retrieve up to two values, and then we invested more work to throw one
of them away.
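
To illustrate the at-or-before lookup described above, here is a sketch over a plain sorted slice. The `samplePair` type and the `valueAtOrBefore` function are assumptions made for this example and not the storage's real API. It returns at most one sample directly, so nothing has to be filtered away afterwards:

```go
package main

import (
	"fmt"
	"sort"
)

// samplePair is a hypothetical stand-in for a timestamped value.
type samplePair struct {
	Timestamp int64
	Value     float64
}

// valueAtOrBefore returns the latest sample whose timestamp is not after t.
// The samples must be sorted by timestamp in ascending order. The boolean
// is false if every sample is later than t.
func valueAtOrBefore(samples []samplePair, t int64) (samplePair, bool) {
	// i is the index of the first sample strictly after t.
	i := sort.Search(len(samples), func(i int) bool {
		return samples[i].Timestamp > t
	})
	if i == 0 {
		return samplePair{}, false
	}
	return samples[i-1], true
}

func main() {
	samples := []samplePair{{10, 1}, {20, 2}, {30, 3}}
	s, ok := valueAtOrBefore(samples, 25)
	fmt.Println(s, ok)
}
```
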

The SeriesIterator.BoundaryValues method can be removed once #1401 is
fixed. But I really didn't want to load even more changes into this
PR.

Benchmarks:

The BenchmarkFuzz.* benchmarks take 83% less time (i.e. run about six
times faster) and allocate 95% fewer bytes. The reason for that is that the
benchmark reads one sample after another from the time series and
creates a new series iterator for each sample read.

To find out how much these improvements matter in practice, I have
mirrored a beefy Prometheus server at SoundCloud that suffers from
both issues #1035 and #1264. To reach steady state that would be
comparable, the server needs to run for 15d. So far, it has run for
1d. The test server currently has only half as many memory time series
and 60% of the memory chunks the main server has. The 90th percentile
rule evaluation cycle time is ~11s on the main server and only ~3s on
the test server. However, these numbers might get much closer over
time.

In addition to performance improvements, this commit removes about 150
LOC.
2016-02-19 16:24:38 +01:00
Fabian Reinartz 59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea of communicating through the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append() calls. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.
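
The all-or-nothing behavior can be sketched roughly as follows. Only the `Append()` and `Throttled()` method names come from the message above. The `appender` interface shape, the `sample` and `memStorage` types, and the `ingestBatch` helper are assumptions for illustration:

```go
package main

import "fmt"

// sample is a hypothetical minimal sample.
type sample struct {
	Timestamp int64
	Value     float64
}

// appender is an assumed interface shape: Throttled reports whether the
// storage currently wants ingestion to be suspended.
type appender interface {
	Append(s sample)
	Throttled() bool
}

// memStorage is a toy appender whose throttled state is set by hand.
type memStorage struct {
	throttled bool
	samples   []sample
}

func (m *memStorage) Append(s sample) { m.samples = append(m.samples, s) }
func (m *memStorage) Throttled() bool { return m.throttled }

// ingestBatch checks the throttled state once, before the whole batch.
// The batch is either appended completely or skipped completely.
func ingestBatch(a appender, batch []sample) bool {
	if a.Throttled() {
		return false
	}
	for _, s := range batch {
		a.Append(s)
	}
	return true
}

func main() {
	st := &memStorage{}
	batch := []sample{{1, 0.5}, {2, 0.7}}
	fmt.Println("ingested:", ingestBatch(st, batch))
	st.throttled = true
	fmt.Println("ingested:", ingestBatch(st, batch))
}
```
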

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7 3f4d22e4c7 Update doc comment
This should have gone into a previous commit, but I forgot to save
this particular file.
2016-01-12 12:38:18 +01:00
Fabian Reinartz e061595352 Move COWMetric into storage/metric package 2015-08-25 11:59:07 +02:00
Fabian Reinartz 1535ef1457 Replace metric.SamplePair with model.SamplePair 2015-08-22 14:52:35 +02:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
beorn7 8c196c1028 Minor doc fixes. 2015-06-23 17:07:18 +02:00
Fabian Reinartz 6bfb4549a6 storage: add LastSamplePairForFingerprint method 2015-06-23 13:45:15 +02:00
Fabian Reinartz 5b91ea9b36 storage: improve label matching and allow unset matching.
Matching of empty labels now also matches metrics where the label
was not explicitly set to the empty string.
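
As a rough illustration of that matching rule (the `metric` type and the `matchesEqual` function are hypothetical and not the storage's real matcher), an equality matcher over a Go map treats an unset label like an empty one, because a lookup of a missing key yields the empty string:

```go
package main

import "fmt"

// metric is a hypothetical label set.
type metric map[string]string

// matchesEqual reports whether label name has the given value. Looking up
// a missing key in a Go map returns "", so an unset label compares equal
// to the empty string.
func matchesEqual(m metric, name, value string) bool {
	return m[name] == value
}

func main() {
	m := metric{"job": "api"}
	fmt.Println(matchesEqual(m, "env", "")) // "env" is unset.
	fmt.Println(matchesEqual(m, "job", "")) // "job" is set to "api".
}
```
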
2015-06-22 15:33:44 +02:00
Fabian Reinartz 5c6c0e2faa Add storage method to delete time series 2015-06-01 21:23:32 +02:00
Fabian Reinartz aff01e29c3 Limit retrievable samples to retention window.
The storage does not delete data immediately after the retention period.
We don't want to retrieve this data as it causes artifacts.
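
One way to picture the limit is to clamp the query range to the retention window before reading. This is a sketch with assumed names (`clampToRetention`) and millisecond timestamps, not the code from this commit:

```go
package main

import "fmt"

// clampToRetention restricts the query range [from, through] to samples
// that are not older than now-retention. All values are timestamps or
// durations in milliseconds. The boolean is false if the whole range
// lies before the retention window.
func clampToRetention(from, through, now, retention int64) (int64, int64, bool) {
	cutoff := now - retention
	if through < cutoff {
		return 0, 0, false
	}
	if from < cutoff {
		from = cutoff
	}
	return from, through, true
}

func main() {
	// Retention of 1000ms at now=5000ms gives a cutoff of 4000ms.
	from, through, ok := clampToRetention(3000, 4500, 5000, 1000)
	fmt.Println(from, through, ok)
}
```
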
2015-05-27 13:13:59 +02:00
beorn7 3b9c421a69 Weed out all the [Gg]et* method names.
The only exception is getNumChunksToPersist to avoid naming the struct
member numChunksToPersist in a weird way.
2015-05-20 19:13:06 +02:00
beorn7 81b190bf45 Remove locking from series iterator. Cache chunk iterators. 2015-05-20 16:19:34 +02:00
Fabian Reinartz d8440d75f1 Do not start storage processing before Start() is called. 2015-05-19 13:51:45 +02:00
beorn7 be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.
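
The forking described above can be sketched like this. The `appender` interface, the `tee` struct, and the `recorder` type are assumptions for illustration and do not reproduce the actual storage.Tee definition:

```go
package main

import "fmt"

// sample is a hypothetical minimal sample.
type sample struct {
	Timestamp int64
	Value     float64
}

// appender is an assumed minimal interface for anything samples can be
// appended to.
type appender interface {
	Append(s sample)
}

// tee forwards every appended sample to two underlying appenders.
type tee struct {
	a, b appender
}

func (t tee) Append(s sample) {
	t.a.Append(s)
	t.b.Append(s)
}

// recorder is a toy appender that remembers what it received.
type recorder struct {
	got []sample
}

func (r *recorder) Append(s sample) { r.got = append(r.got, s) }

func main() {
	local, remote := &recorder{}, &recorder{}
	var app appender = tee{a: local, b: remote}
	app.Append(sample{Timestamp: 1, Value: 42})
	fmt.Println(len(local.got), len(remote.got))
}
```
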

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
beorn7 fe518fdb28 Simplify AppendSamples by allowing it to be goroutine-unsafe. 2015-02-13 12:13:22 +01:00
beorn7 5d3cd65a5d Improve performance of ingestion.
- Parallelize AppendSamples as much as possible without breaking the
  contract about temporal order.

- Allocate more fingerprint locker slots.

- Do not run early checkpoints if we are behind on chunk persistence.

- Increase fpMinWaitDuration to give the disk more time for more
  important things.

Also, switch math.MaxInt64 and math.MinInt64 to the new constants.
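
The message does not say how the parallelization in the first bullet is done. One possible approach, sketched here purely as an assumption, is to shard samples by fingerprint so that each series is handled by exactly one goroutine and therefore keeps its temporal order. All names (`sample`, `appendParallel`) are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// sample is a hypothetical sample tagged with its series fingerprint.
type sample struct {
	fp uint64
	ts int64
}

// appendParallel distributes samples over shards by fingerprint and
// processes the shards concurrently. All samples of one series land in
// the same shard in their original order, so per-series temporal order
// is preserved. It returns the timestamps appended per fingerprint.
func appendParallel(samples []sample, shards int) map[uint64][]int64 {
	buckets := make([][]sample, shards)
	for _, s := range samples {
		i := s.fp % uint64(shards)
		buckets[i] = append(buckets[i], s)
	}

	// Each goroutine writes only to its own slot in results.
	results := make([]map[uint64][]int64, shards)
	var wg sync.WaitGroup
	for i := range buckets {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m := map[uint64][]int64{}
			for _, s := range buckets[i] {
				m[s.fp] = append(m[s.fp], s.ts)
			}
			results[i] = m
		}(i)
	}
	wg.Wait()

	merged := map[uint64][]int64{}
	for _, m := range results {
		for fp, ts := range m {
			merged[fp] = ts
		}
	}
	return merged
}

func main() {
	in := []sample{{1, 1}, {2, 1}, {1, 2}, {2, 2}, {1, 3}}
	fmt.Println(appendParallel(in, 2))
}
```
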
2015-02-12 18:12:37 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Julius Volz a6bc42bc61 Minor formatting/spelling fixups. 2015-01-09 11:04:20 +01:00
Bjoern Rabenstein 622e8350cd Fix a bug handling freshly unarchived series.
Usually, if you unarchive a series, it is to add something to it,
which will create a new head chunk. However, if a series is
unarchived and, before anything is added to it, is handled by the
maintenance loop, it will be archived again. In that case, we have to
load the chunkDescs to know the lastTime of the series to be
archived. Usually, this case will happen only rarely (as a race, has
never happened so far, possibly because the locking around unarchiving
and the subsequent sample append is smart enough). However, during
crash recovery, we sometimes treat series as "freshly unarchived"
without directly appending a sample. We might add more cases of that
type later, so better deal with archiving properly and load chunkDescs
if required.
2015-01-08 16:25:50 +01:00
Julius Volz c9618d11e8 Introduce copy-on-write for metrics in AST.
This depends on changes in:

https://github.com/prometheus/client_golang/tree/cow-metrics.

Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142
2014-12-12 20:34:55 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped.
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.
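
To see why switching the series count from a varint to a uint64 breaks old checkpoints, compare the two encodings. This sketch uses the standard `encoding/binary` package. The helper names, the big-endian byte order, and the use of the unsigned varint flavor are assumptions, as the message does not specify them:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeFixed writes n as a fixed-width 8-byte value.
// Big-endian byte order is an assumption for this sketch.
func encodeFixed(n uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, n)
	return buf
}

// encodeVarint writes n as an unsigned varint, whose length depends on
// the magnitude of n.
func encodeVarint(n uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	l := binary.PutUvarint(buf, n)
	return buf[:l]
}

// decodeFixed reads a fixed-width 8-byte value.
func decodeFixed(buf []byte) (uint64, error) {
	if len(buf) < 8 {
		return 0, errors.New("short buffer")
	}
	return binary.BigEndian.Uint64(buf), nil
}

func main() {
	fmt.Println(len(encodeFixed(300)), len(encodeVarint(300)))
}
```

A reader expecting eight fixed bytes cannot interpret a shorter varint prefix correctly, which is consistent with the note that old checkpoints need a wipe.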

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 74c9b34a5e Improve storage instrumentation even more.
Add gauge for chunks and chunkdescs in memory (backed by a global
variable to be used later not only for instrumentation but also for
memory management).

Refactored instrumentation code once more (instrumentation.go is back :).

Change-Id: Ife39947e22a48cac4982db7369c231947f446e17
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 443dd33805 Improve instrumentation in storage.
Also, fix some other minor bugs.

Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 096fa0f8b2 Squash a number of TODOs.
- Staleness delta is now a proper function parameter and not replicated
  from package ast.

- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.

- For the same reason, replaced 'chunkDescs' by '[]*chunkDesc'.

- Verified that math.Modf is not a speed enhancement over conversion
  (actually 5x slower).

- Renamed firstTimeField, lastTimeField into chunkFirstTime and
  chunkLastTime.

- Verified unpin() is sufficiently goroutine-safe.

- Decided not to update archivedFingerprintToTimeRange upon series
  truncation and added a rationale why.

Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 38fc24d0ed Fix targetpool_test.go and other tests.
Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624
2014-11-25 17:08:26 +01:00
Bjoern Rabenstein 8fba3302bc Bold changes to concurrency.
(WIP. Probably doesn't work yet.)

Change-Id: Id1537dfcca53831a1d428078a5863ece7bdf4875
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein fcdf5a8ee7 Fix bugs in chunk evict code.
Also, simplify code by re-looking up metric in metric map.

Change-Id: Ib2092f9184374e5a543e87d3a9f4a74fda64b193
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7e6a03fbf9 Fix a few concurrency issues before starting to use the new fp locker.
Change-Id: I8615e8816e79ef0882e123163ee590c739b79d12
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein e9ff29c547 Comment/code cleanup.
Change-Id: I38736e3d0fec79759a2bafa35aecf914480ff810
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 0031a448e2 Add WaitForIndexing.
Change-Id: I5a5c975c4246632f937413322c855bbe63d00802
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein c7aad110fb Add an indexing queue and batch the ops.
Some other improvements on the way, in particular codec -> codable
renaming and addition of LookupSet methods.
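
A minimal sketch of queueing and batching index operations, with assumed names (`indexOp`, `batchOps`) and a size-only flush policy. The real queue may well use different flush triggers. Operations are read from a channel and flushed in batches of at most a fixed size, with a final flush for the remainder:

```go
package main

import "fmt"

// indexOp is a hypothetical index operation.
type indexOp struct {
	key   string
	value int
}

// batchOps drains the queue and calls flush with batches of at most
// maxBatch operations. A partially filled batch is flushed once the
// queue is closed.
func batchOps(queue <-chan indexOp, maxBatch int, flush func([]indexOp)) {
	batch := make([]indexOp, 0, maxBatch)
	for op := range queue {
		batch = append(batch, op)
		if len(batch) == maxBatch {
			flush(batch)
			// Start a fresh slice so flush may keep the old one.
			batch = make([]indexOp, 0, maxBatch)
		}
	}
	if len(batch) > 0 {
		flush(batch)
	}
}

func main() {
	queue := make(chan indexOp, 5)
	for i := 0; i < 5; i++ {
		queue <- indexOp{key: fmt.Sprintf("fp-%d", i), value: i}
	}
	close(queue)
	batchOps(queue, 2, func(b []indexOp) {
		fmt.Println("flushing", len(b), "ops")
	})
}
```
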

Change-Id: I978f8f3f84ca8e4d39a9d9f152ae0ad274bbf4e2
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 71206dbc06 More code cleanups.
Add license text everywhere.
And others....

Change-Id: I11ccde267a2ef7eb366c4788ba7aeae14ba7545c
2014-11-25 17:07:44 +01:00
Julius Volz 630b5a087a Also consider on-disk fingerprints during purge.
This reintroduces LevelDB iterators so that we can iterate through all
the on-disk fingerprints.

Change-Id: I007ee4638d038d2a4461bbda27f30fcaad411474
2014-11-25 17:07:35 +01:00
Bjoern Rabenstein f5f9f3514a Major code cleanup.
- Make it go-vet and golint clean.
- Add comments, TODOs, etc.

Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2
2014-11-25 17:02:53 +01:00
Bjoern Rabenstein bbf49200ab Implement methods in persistence.go.
Change-Id: I804cdd0b30420e171825fd86fe1281eca0d5e638
2014-11-25 17:02:23 +01:00
Bjoern Rabenstein 5a128a04a9 Major reorganization of the storage.
Most important, the heads file will now persist all the chunk descs,
too. Implicitly, it will serve as the persisted form of the
fp-to-series map.

Change-Id: Ic867e78f2714d54c3b5733939cc5aef43f7bd08d
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein 89f10e8eb2 Move to using the standard library interfaces for encoding/decoding.
BinaryMarshaler instead of encodable.
BinaryUnmarshaler instead of decodable.

Left 'codable' in place for lack of a better word.
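
For illustration, a type satisfying the standard library interfaces named above could look like this. The `fingerprint` type and its 8-byte big-endian layout are assumptions for the sketch and not the storage's actual encoding:

```go
package main

import (
	"encoding"
	"encoding/binary"
	"errors"
	"fmt"
)

// fingerprint is a hypothetical type used only for this example.
type fingerprint uint64

// MarshalBinary implements encoding.BinaryMarshaler.
func (fp fingerprint) MarshalBinary() ([]byte, error) {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(fp))
	return buf, nil
}

// UnmarshalBinary implements encoding.BinaryUnmarshaler.
func (fp *fingerprint) UnmarshalBinary(data []byte) error {
	if len(data) != 8 {
		return errors.New("fingerprint: expected exactly 8 bytes")
	}
	*fp = fingerprint(binary.BigEndian.Uint64(data))
	return nil
}

// Compile-time checks that both standard interfaces are satisfied.
var (
	_ encoding.BinaryMarshaler   = fingerprint(0)
	_ encoding.BinaryUnmarshaler = (*fingerprint)(nil)
)

func main() {
	data, err := fingerprint(12345).MarshalBinary()
	if err != nil {
		panic(err)
	}
	var fp fingerprint
	if err := fp.UnmarshalBinary(data); err != nil {
		panic(err)
	}
	fmt.Println(fp)
}
```
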

Change-Id: I8a104be7d6db916e8dbc47ff95e6ff73b845ac22
2014-11-25 17:02:01 +01:00
Julius Volz 7e85711df0 Beginnings of a tiered index implementation.
This reintroduces a LevelDB-based metrics index.

Change-Id: I4111540301c52255a07b2f570761707a32f72c05
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein ecdf5ab14f Index-persistence switched from gob to a hand-coded solution.
Change-Id: Ib4ec42535bd08df16d34d4774bb638e35c5a1841
2014-11-25 17:02:00 +01:00
Julius Volz e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00