Commit graph

289 commits

Author SHA1 Message Date
Matt Bostock e618af5d0b Storage: Add crash recovery metric 'started_dirty'
...to indicate when crash recovery was invoked during Prometheus
startup.

Fixes #1918.
2016-08-27 21:41:06 +02:00
Julius Volz 3bfec97d46 Make the storage interface higher-level.
See discussion in
https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g

The main idea is that the user of a storage shouldn't have to deal with
fingerprints anymore, and should not need to do an individual preload
call for each metric. The storage interface needs to be made more
high-level to not expose these details.

This also makes it easier to reuse the same storage interface for remote
storages later, as fewer roundtrips are required and the fingerprint
concept doesn't work well across the network.

NOTE: this deliberately gets rid of a small optimization in the old
query Analyzer, where we dedupe instants and ranges for the same series.
This should have a minor impact, as most queries do not have multiple
selectors loading the same series (and at the same offset).
2016-07-25 13:59:22 +02:00
beorn7 fc6737b7fb storage: improve index lookups
tl;dr: This is not a fundamental solution to the indexing problem
(like tindex is) but it at least avoids utilizing the intersection
problem to the greatest possible amount.

In more detail:

Imagine the following query:

    nicely:aggregating:rule{job="foo",env="prod"}

While it uses a nicely aggregating recording rule (which might have a
very low cardinality), Prometheus still intersects the low number of
fingerprints for `{__name__="nicely:aggregating:rule"}` with the many
thousands of fingerprints matching `{job="foo"}` and with the millions
of fingerprints matching `{env="prod"}`. This totally innocuous query
is dead slow if the Prometheus server has a lot of time series with
the `{env="prod"}` label. Ironically, if you make the query more
complicated, it becomes blazingly fast:

    nicely:aggregating:rule{job=~"foo",env=~"prod"}

Why so? Because Prometheus only intersects with non-Equal matchers if
there are no Equal matchers. That's good in this case because it
retrieves the few fingerprints for
`{__name__="nicely:aggregating:rule"}` and then starts right ahead to
retrieve the metric for those FPs and checking individually if they
match the other matchers.

This change is generalizing the idea of when to stop intersecting FPs
and go into "retrieve metrics and check them individually against
remaining matchers" mode:

- First, sort all matchers by "expected cardinality". Matchers
  matching the empty string are always worst (and never used for
  intersections). Equal matchers are in general consider best, but by
  using some crude heuristics, we declare some better than others
  (instance labels or anything that looks like a recording rule).

- Then go through the matchers until we hit a threshold of remaining
  FPs in the intersection. This threshold is higher if we are already
  in the non-Equal matcher area as intersection is even more expensive
  here.

- Once the threshold has been reached (or we have run out of matchers
  that do not match the empty string), start with "retrieve metrics
  and check them individually against remaining matchers".

A beefy server at SoundCloud was spending 67% of its CPU time in index
lookups (fingerprintsForLabelPairs), serving mostly a dashboard that
is exclusively built with recording rules. With this change, it spends
only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26
cores to 18 cores. The median latency for query_range dropped from 14s
to 50ms(!). As expected, higher percentile latency didn't improve that
much because the new approach is _occasionally_ running into the worst
case while the old one was _systematically_ doing so. The 99th
percentile latency is now about as high as the median before (14s)
while it was almost twice as high before (26s).
2016-07-20 17:35:53 +02:00
Björn Rabenstein 0622304244 Merge pull request #1798 from prometheus/beorn7/storage2
Crash recovery: Fix an edge case.
2016-07-13 16:53:18 +02:00
beorn7 2a75b15328 Crash recovery: Fix an edge case.
If the chunks of a series in the checkpoint are all older then the
latest chunk on disk, the head chunk is persisted and therefore has to
be declared closed.

It would be great to have a test for this, but that would require more
plumbing, subject of #447.
2016-07-07 16:17:38 +02:00
beorn7 064b57858e Consistently use the Seconds() method for conversion of durations
This also fixes one remaining case of recording integral numbers
of seconds only for a metric, i.e. this will probably fix #1796.
2016-07-07 15:24:35 +02:00
Julius Volz 91401794fa storage: Make MemorySeriesStorage a public type
See https://twitter.com/fabxc/status/748032597876482048
2016-06-29 08:14:23 +02:00
Fabian Reinartz 425736a377 *: remove last remainers of non-second metrics 2016-06-23 17:50:39 +02:00
Julius Volz b7b6717438 Separate query interface out of local.Storage.
PromQL only requires a much narrower interface than local.Storage in
order to run queries. Narrower interfaces are easier to replace and
test, too.

We could also change the web interface to use local.Querier, except that
we'll probably use appending functions from there in the future.
2016-06-23 15:14:38 +02:00
Jan van Valburg 68f3df49d0 stoarge: fix 'access denied' error on Windows
On Windows, it is not possible to rename or delete a file that is
currerntly open. This change closes the file in dropAndPersistChunks
before it tries to delete it, or rename the temporary file to it.
2016-06-21 11:21:20 +02:00
beorn7 b274c7aaa7 Update doc comments 2016-06-03 12:34:01 +02:00
beorn7 99881ded63 Make the number of fingerprint mutexes configurable
With a lot of series accessed in a short timeframe (by a query, a
large scrape, checkpointing, ...), there is actually quite a
significant amount of lock contention if something similar is running
at the same time.

In those cases, the number of locks needs to be increased.

On the same front, as our fingerprints don't have a lot of entropy, I
introduced some additional shuffling. With the current state, anly
changes in the least singificant bits of a FP would matter.
2016-06-02 19:18:00 +02:00
beorn7 a308c76292 Improve TestAppendOutOfOrder
It did not test the returned error so far.
Also, add tests for the NaN case broken before
https://github.com/prometheus/common/pull/40
2016-05-20 13:46:33 +02:00
beorn7 b2ef4dc52d Correctly identify no-op appends if the value is NaN
This requires an updating of the vendored commen.model package, which
I will do once https://github.com/prometheus/common/pull/40 is merged.
2016-05-19 18:32:47 +02:00
Steve Durrheimer 399d5c6375
Make version informations consistent between prometheus components 2016-05-05 22:33:18 +02:00
beorn7 07a294ac15 Doc comment fixes 2016-04-26 01:05:56 +02:00
beorn7 20cba1ed8f Initialize metric vectors in memorySeriesStorage 2016-04-25 17:08:07 +02:00
beorn7 d566808d40 Bring back logging of discarded samples
But only on DEBUG level.

Also, count and report the two cases of out-of-order timestamps on the
one hand and same timestamp but different value on the other hand
separately.
2016-04-25 16:43:52 +02:00
beorn7 db16acd7fb Never drop a still open head chunk. 2016-04-15 19:18:40 +02:00
beorn7 a90d645378 Checkpoint fingerprint mappings only upon shutdown
Before, we checkpointed after every newly detected fingerprint
collision, which is not a problem as long as collisions are
rare. However, with a sufficient number of metrics or particular
nature of the data set, there might be a lot of collisions, all to be
detected upon the first set of scrapes, and then the checkpointing
after each detection will take a quite long time (it's O(n²),
essentially).

Since we are rebuilding the fingerprint mapping during crash recovery,
the previous, very conservative approach didn't even buy us
anything. We only ever read from the checkpoint file after a clean
shutdown, so the only time we need to write the checkpoint file is
during a clean shutdown.
2016-04-15 01:03:28 +02:00
Jonathan Boulle 38098f8c95 Add missing license headers
Prometheus is Apache 2 licensed, and most source files have the
appropriate copyright license header, but some were missing it without
apparent reason. Correct that by adding it.
2016-04-13 16:08:22 +02:00
Fabian Reinartz a18639dc2d Merge pull request #1454 from prometheus/beorn7/fix-test
Give TestEvictAndLoadChunkDescs more time to actually evict
2016-04-08 14:58:01 +02:00
beorn7 d09ca03e10 Work around compiler bug
Benchmarks don't show any significant changes.
2016-03-29 17:05:28 +02:00
beorn7 865d16f870 Rename Gorilla into varbit 2016-03-23 16:30:41 +01:00
beorn7 4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
beorn7 c72979e3ed Remove a redundancy from Gorilla-style chunks
So far, the last sample in a chunk was saved twice. That's required
for adding more samples as we need to know the last sample added to
add more samples without iterating through the whole chunk. However,
once the last sample was added to the chunk before it's full, there is
no need to save it twice. Thus, the very last sample added to a chunk
can _only_ be saved in the header fields for the last sample. The
chunk has to be identifiable as closed, then. This information has
been added to the flags byte.
2016-03-20 23:09:48 +01:00
beorn7 b6dbb826ae Improve fuzz testing and fix a bug exposed
This improves fuzz testing in two ways:

(1) More realistic time stamps. So far, the most common case in
practice was very rare in the test: Completely regular increases of
the timestamp.

(2) Verify samples by scanning through the whole relevant section of
the series.

For Gorilla-like chunks, this showed two things:

(1) With more regularly increasing time stamps, BenchmarkFuzz is
essentially as fast as with the traditional chunks:

```
BenchmarkFuzzChunkType0-8              2         972514684 ns/op        83426196 B/op    2500044 allocs/op
BenchmarkFuzzChunkType1-8              2         971478001 ns/op        82874660 B/op    2512364 allocs/op
BenchmarkFuzzChunkType2-8              2         999339453 ns/op        76670636 B/op    2366116 allocs/op
```

(2) There was a bug related to when and how the chunk footer is
overwritten to make use for the last sample. This wasn't exposed by
random access as the last sample of a chunk is retrieved from the
values in the header in that case.
2016-03-20 17:21:28 +01:00
beorn7 9d8fbbe822 Review improvements 2016-03-17 17:31:56 +01:00
beorn7 8cdced3850 Implement Gorilla-inspired chunk encoding
This is not a verbatim implementation of the Gorilla encoding.  First
of all, it could not, even if we wanted, because Prometheus has a
different chunking model (constant size, not constant time).  Second,
this adds a number of changes that improve the encoding in general or
at least for the specific use case of Prometheus (and are partially
only possible in the context of Prometheus). See comments in the code
for details.
2016-03-17 14:47:08 +01:00
beorn7 8e64e8dfca Fix return statement. 2016-03-17 14:43:00 +01:00
Björn Rabenstein 98c8560851 Merge pull request #1477 from prometheus/beorn7/storage7
Solve the series churn problem...
2016-03-17 14:39:28 +01:00
beorn7 e7ac9c6863 Improvments based on review
- Moved returns into the default section of switch statement that can
  only happen then.

- Fix typo.
2016-03-17 14:37:24 +01:00
beorn7 199f309a39 Resurrect and rename invalid preload requests count metric.
It is now also used in label matching, so the name of the metric
changed from `prometheus_local_storage_invalid_preload_requests_total`
to `non_existent_series_matches_total'.
2016-03-13 11:54:24 +01:00
beorn7 e8c1f30ab2 Merge the parallel logic of getSeriesForRange and metricForFingerprint 2016-03-09 21:56:15 +01:00
beorn7 9445c7053d Add tests for range-limited label matching
While doing so, improve getSeriesForRange.
2016-03-09 21:01:03 +01:00
beorn7 47e3c90f9b Clean up error propagation
Only return an error where callers are doing something with it except
simply logging and ignoring.

All the errors touched in this commit flag the storage as dirty
anyway, and that fact is logged anyway. So most of what is being
removed here is just log spam.

As discussed earlier, the class of errors that flags the storage as
dirty signals fundamental corruption, no even bubbling up a one-time
warning to the user (e.g. about incomplete results) isn't helping much
because _anything_ happening in the storage has to be doubted from
that point on (and in fact retroactively into the past, too). Flagging
the storage dirty, and alerting on it (plus marking the state in the
web UI) is the only way I can see right now.

As a byproduct, I cleaned up the setDirty method a bit and improved
the logged errors.
2016-03-09 18:56:30 +01:00
beorn7 99854a84d7 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 17:23:25 +01:00
beorn7 5e4fa96719 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-09 17:21:32 +01:00
beorn7 b343e65907 Merge branch 'beorn7/storage4' into beorn7/storage5
erge is necessary,
2016-03-09 17:14:42 +01:00
beorn7 d0a4477446 Merge branch 'beorn7/storage3' into beorn7/storage4
Conflicts:
	storage/local/preload.go
	storage/local/storage.go
	storage/local/storage_test.go
2016-03-09 17:13:16 +01:00
beorn7 55eddab25f Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-09 16:48:46 +01:00
beorn7 161eada3ad Make chunkIterator even leaner. 2016-03-09 16:20:39 +01:00
beorn7 beb36df4bb De-flag preloadChunksForRange
Now there is preloadChunksForRange and preloadChunksForInstant in
both, the series and the storage.
2016-03-09 14:50:09 +01:00
beorn7 836f1db04c Improve MetricsForLabelMatchers
WIP: This needs more tests.

It now gets a from and through value, which it may opportunistically
use to optimize the retrieval. With possible future range indices,
this could be used in a very efficient way. This change merely applies
some easy checks, which should nevertheless solve the use case of
heavy rule evaluations on servers with a lot of series churn.

Idea is the following:

- Only archive series that are at least as old as the headChunkTimeout
  (which was already extremely unlikely to happen).

- Then maintain a high watermark for the last archival, i.e. no
  archived series has a sample more recent than that watermark.

- Any query that doesn't reach to a time before that watermark doesn't
  have to touch the archive index at all. (A production server at
  Soundcloud with the aforementioned series churn and heavy rule
  evaluations spends 50% of its CPU time in archive index
  lookups. Since rule evaluations usually only touch very recent
  values, most of those lookup should disappear with this change.)

- Federation with a very broad label matcher will profit from this,
  too.

As a byproduct, the un-needed MetricForFingerprint method was removed
from the Storage interface.
2016-03-09 00:25:59 +01:00
beorn7 167b83695c Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-08 00:20:44 +01:00
beorn7 01795382c9 Merge branch 'beorn7/storage4' into beorn7/storage5 2016-03-08 00:20:13 +01:00
beorn7 f7fc542db6 Merge branch 'master' into beorn7/storage4
Conflicts:
	storage/local/persistence.go
2016-03-08 00:14:00 +01:00
beorn7 3d86130d8c Merge branch 'master' into beorn7/storage3 2016-03-07 23:39:12 +01:00
beorn7 1f30c8de8d Merge branch 'master' into beorn7/storage2 2016-03-07 23:38:42 +01:00
beorn7 c13b1ecfe9 Make chunk iterators more DRY
This finally extracts all the common code of the two chunk iterators
into one. Any future chunk encodings with fast access by index can use
the same iterator by simply providing an indexAccessor. Other future
chunk encodings without fast index access (like Gorilla-style) can
still implement the chunkIterator interface as usual.
2016-03-07 20:23:14 +01:00