303 commits

Author SHA1 Message Date
Matthew Campbell 67d76e3a5d timeseries: store varbit encoded data into cassandra 2016-09-21 17:56:55 +02:00
Julius Volz c187308366 storage: Contextify storage interfaces.
This is based on https://github.com/prometheus/prometheus/pull/1997.

This adds contexts to the relevant Storage methods and already passes
PromQL's new per-query context into the storage's query methods.
The immediate motivation is supporting multi-tenancy in Frankenstein, but
this could also be used by Prometheus's normal local storage to support
cancellations and timeouts at some point.
2016-09-19 16:29:07 +02:00
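To illustrate what passing a per-query context into storage query methods (commit c187308366) makes possible, here is a minimal sketch. The `Querier` interface, `sliceQuerier`, and the helper functions are hypothetical and much simpler than the real storage interfaces; only the `context` package calls are standard library.

```go
package main

import (
	"context"
	"fmt"
)

// Querier is a hypothetical, simplified query interface. It is not the
// interface introduced by the commit.
type Querier interface {
	Query(ctx context.Context, metric string) ([]float64, error)
}

// sliceQuerier serves samples from an in-memory map (illustration only).
type sliceQuerier map[string][]float64

// Query copies samples out one by one and gives up as soon as the context
// has been cancelled or its deadline has passed.
func (q sliceQuerier) Query(ctx context.Context, metric string) ([]float64, error) {
	var out []float64
	for _, v := range q[metric] {
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

// queryLive runs a query with a context that is never cancelled.
func queryLive(q Querier, metric string) ([]float64, error) {
	return q.Query(context.Background(), metric)
}

// queryCancelled runs a query with a context that is already cancelled.
func queryCancelled(q Querier, metric string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := q.Query(ctx, metric)
	return err
}

func main() {
	q := sliceQuerier{"up": {1, 1, 0}}
	vals, err := queryLive(q, "up")
	fmt.Println(vals, err)
	fmt.Println(queryCancelled(q, "up"))
}
```

The same context could also carry request-scoped values such as a tenant ID, which is presumably what the multi-tenancy motivation refers to; that part is not sketched here.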
Maxim Ivanov bdc53098fc Avoid having contended mutexes on same cacheline
CPUs have to serialise write access to a single cache line
effectively reducing level of possible parallelism. Placing
mutexes on different cache lines avoids this problem.

Most gains will be seen on NUMA servers where CPU interconnect
traffic is especially expensive

Before:
go test . -run none -bench BenchmarkFingerprintLocker
BenchmarkFingerprintLockerParallel-4   	 2000000	       932 ns/op
BenchmarkFingerprintLockerSerial-4     	30000000	        49.6 ns/op

After:
go test . -run none -bench BenchmarkFingerprintLocker
BenchmarkFingerprintLockerParallel-4   	 3000000	       569 ns/op
BenchmarkFingerprintLockerSerial-4     	30000000	        51.0 ns/op
2016-09-18 23:32:55 +01:00
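The cache-line change in bdc53098fc can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the 64-byte `cacheLineSize` is a common value on x86-64 but not universal, and `paddedMutex` and `paddedMutexSize` are names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// cacheLineSize is an assumption; 64 bytes is typical but not guaranteed.
const cacheLineSize = 64

// paddedMutex pads a sync.Mutex to a full cache line so that neighbouring
// mutexes in a slice never share a line.
type paddedMutex struct {
	sync.Mutex
	_ [cacheLineSize - unsafe.Sizeof(sync.Mutex{})]byte
}

// paddedMutexSize reports the size of one padded mutex in bytes.
func paddedMutexSize() uintptr {
	return unsafe.Sizeof(paddedMutex{})
}

func main() {
	locks := make([]paddedMutex, 4)
	var wg sync.WaitGroup
	for i := range locks {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine hammers its own mutex; with padding, the
			// writes land on different cache lines.
			for n := 0; n < 1000; n++ {
				locks[i].Lock()
				locks[i].Unlock()
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("bytes per mutex:", paddedMutexSize())
}
```

The trade-off is memory: every mutex now costs a full cache line, which is why this only pays off for mutexes that are actually contended.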
Julius Volz 5f5a78e807 Merge pull request #1974 from prometheus/disable-local-storage
Allow disabling local storage.
2016-09-17 18:40:01 +02:00
Tobias Schmidt 29ced0090f Fix common English misspellings 2016-09-14 23:23:28 -04:00
Tobias Schmidt e2c12dcdb5 Add missing error check in persistence test 2016-09-14 23:16:47 -04:00
Julius Volz b24e5d63bc Add noop local storage engine.
This adds a flag -storage.local.engine which allows turning off local
storage in Prometheus. Instead of adding if-conditions and nil checks to
all parts of Prometheus that deal with Prometheus's local storage
(including the web interface), disabling local storage simply means
replacing the normal local storage with a noop version that throws
samples away and returns empty query results. We also don't add the noop
storage to the fanout appender to decrease internal overhead.

Instead of returning empty results, an alternate behavior could be to
return errors on any query that point out that the local storage is
disabled. Not sure which one is preferable, so I went with the
empty result option for now.
2016-09-14 13:18:05 +02:00
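The noop approach from b24e5d63bc can be sketched with a small interface. `Sample`, `Storage`, and `noopStorage` below are hypothetical, simplified types made up for this example; the real storage interface has many more methods.

```go
package main

import "fmt"

// Sample and Storage are hypothetical, simplified types for illustration.
type Sample struct {
	Metric    string
	Timestamp int64
	Value     float64
}

type Storage interface {
	Append(s Sample) error
	Query(metric string) ([]Sample, error)
}

// noopStorage satisfies Storage but throws samples away and returns empty
// query results, so callers need no nil checks or special cases.
type noopStorage struct{}

func (noopStorage) Append(Sample) error { return nil }

func (noopStorage) Query(string) ([]Sample, error) { return nil, nil }

func main() {
	var s Storage = noopStorage{}
	_ = s.Append(Sample{Metric: "up", Timestamp: 1, Value: 1})
	res, err := s.Query("up")
	fmt.Println(len(res), err)
}
```

The alternative mentioned in the commit message would amount to returning a non-nil error from `Query` instead of an empty result.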
Fabian Reinartz 22296dcb85 storage: clarify sample/matcher relation in docs 2016-09-12 11:19:36 +02:00
Fabian Reinartz cc6f988a5e storage: fix Querier interface documentation 2016-09-12 10:48:54 +02:00
nghialv 7655038184 fix typo 2016-09-08 19:01:32 +09:00
Dan Milstein ec064c96f6 Add field names to table-driven test fixtures 2016-08-30 07:57:39 -04:00
Dan Milstein ac8788aca6 Convert to table-driven test and inline helper func 2016-08-30 07:57:39 -04:00
Dan Milstein f50f656a66 Fix double-delta unmarshaling to respect actual min header size
Turns out it's valid to have an overall chunk which is smaller than the
full doubleDeltaHeaderBytes size -- if it has a single sample, it
doesn't fill the whole header.  Updated unmarshalling check to respect
this.
2016-08-30 07:57:39 -04:00
Dan Milstein b815956341 Catch errors when unmarshalling delta/doubleDelta encoded chunks
This is (hopefully) a fix for #1653

Specifically, this makes it so that if the length for the stored
delta/doubleDelta is somehow corrupted to be too small, the attempt to
unmarshal will return an error.

The current (broken) behavior is to return a malformed chunk, which can
then lead to a panic when there is an attempt to read header values.

The referenced issue proposed creating chunks with a minimum length -- I
instead opted to just error on the attempt to unmarshal, since I'm not
clear on how it could be safe to proceed when the length is
incorrect/unknown.

The issue also talked about possibly "quarantining series", but I don't
know the surrounding code well enough to understand how to make that
happen.
2016-08-30 07:57:39 -04:00
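The idea behind b815956341 and f50f656a66, erroring out on an implausible stored length instead of handing back a malformed chunk, can be sketched like this. The chunk layout (a 2-byte little-endian length prefix) and the constants are assumptions made up for the example; they do not match the real delta/doubleDelta format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	lenFieldBytes  = 2 // hypothetical: size of the length prefix
	minHeaderBytes = 4 // hypothetical: smallest header a valid chunk can have
)

var errBadLength = errors.New("chunk length out of range")

// unmarshalChunk validates the stored length before slicing. A length that
// is too small or larger than the buffer yields an error rather than a
// chunk that panics later when header fields are read.
func unmarshalChunk(buf []byte) ([]byte, error) {
	if len(buf) < lenFieldBytes {
		return nil, errBadLength
	}
	l := int(binary.LittleEndian.Uint16(buf))
	if l < minHeaderBytes || l > len(buf) {
		return nil, errBadLength
	}
	return buf[:l], nil
}

func main() {
	good := []byte{4, 0, 0xAA, 0xBB, 0, 0}
	corrupt := []byte{1, 0, 0xAA, 0xBB}
	c, err := unmarshalChunk(good)
	fmt.Println(len(c), err)
	_, err = unmarshalChunk(corrupt)
	fmt.Println(err)
}
```

Checking against a minimum header size rather than the full header size reflects the follow-up fix: a chunk holding a single sample may legitimately be shorter than the full header.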
Matt Bostock e618af5d0b Storage: Add crash recovery metric 'started_dirty'
...to indicate when crash recovery was invoked during Prometheus
startup.

Fixes #1918.
2016-08-27 21:41:06 +02:00
Julius Volz 3bfec97d46 Make the storage interface higher-level.
See discussion in
https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g

The main idea is that the user of a storage shouldn't have to deal with
fingerprints anymore, and should not need to do an individual preload
call for each metric. The storage interface needs to be made more
high-level to not expose these details.

This also makes it easier to reuse the same storage interface for remote
storages later, as fewer roundtrips are required and the fingerprint
concept doesn't work well across the network.

NOTE: this deliberately gets rid of a small optimization in the old
query Analyzer, where we dedupe instants and ranges for the same series.
This should have a minor impact, as most queries do not have multiple
selectors loading the same series (and at the same offset).
2016-07-25 13:59:22 +02:00
beorn7 fc6737b7fb storage: improve index lookups
tl;dr: This is not a fundamental solution to the indexing problem
(like tindex is), but it at least avoids running into the intersection
problem as far as possible.

In more detail:

Imagine the following query:

    nicely:aggregating:rule{job="foo",env="prod"}

While it uses a nicely aggregating recording rule (which might have a
very low cardinality), Prometheus still intersects the low number of
fingerprints for `{__name__="nicely:aggregating:rule"}` with the many
thousands of fingerprints matching `{job="foo"}` and with the millions
of fingerprints matching `{env="prod"}`. This totally innocuous query
is dead slow if the Prometheus server has a lot of time series with
the `{env="prod"}` label. Ironically, if you make the query more
complicated, it becomes blazingly fast:

    nicely:aggregating:rule{job=~"foo",env=~"prod"}

Why so? Because Prometheus only intersects with non-Equal matchers if
there are no Equal matchers. That's good in this case because it
retrieves the few fingerprints for
`{__name__="nicely:aggregating:rule"}` and then goes straight to
retrieving the metrics for those FPs and checking individually whether
they match the other matchers.

This change is generalizing the idea of when to stop intersecting FPs
and go into "retrieve metrics and check them individually against
remaining matchers" mode:

- First, sort all matchers by "expected cardinality". Matchers
  matching the empty string are always worst (and never used for
  intersections). Equal matchers are in general considered best, but by
  using some crude heuristics, we declare some better than others
  (instance labels or anything that looks like a recording rule).

- Then go through the matchers until we hit a threshold of remaining
  FPs in the intersection. This threshold is higher if we are already
  in the non-Equal matcher area as intersection is even more expensive
  here.

- Once the threshold has been reached (or we have run out of matchers
  that do not match the empty string), start with "retrieve metrics
  and check them individually against remaining matchers".

A beefy server at SoundCloud was spending 67% of its CPU time in index
lookups (fingerprintsForLabelPairs), serving mostly a dashboard that
is exclusively built with recording rules. With this change, it spends
only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26
cores to 18 cores. The median latency for query_range dropped from 14s
to 50ms(!). As expected, higher percentile latency didn't improve that
much because the new approach is _occasionally_ running into the worst
case while the old one was _systematically_ doing so. The 99th
percentile latency is now about as high as the median before (14s)
while it was almost twice as high before (26s).
2016-07-20 17:35:53 +02:00
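The three steps described in fc6737b7fb can be sketched roughly as below. Everything here is a simplification under stated assumptions: `matcher`, `index`, `rank`, and `plan` are hypothetical names, the ranking rules are illustrative guesses rather than the commit's heuristics, and a single threshold is used where the commit uses a higher one for non-Equal matchers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matcher is a simplified label matcher (hypothetical type).
type matcher struct {
	name, value string
	equal       bool // true for '=', false for regex-style matchers
}

// rank estimates cardinality; lower means "expected to match fewer series".
func rank(m matcher) int {
	switch {
	case m.value == "":
		return 3 // matches the empty string: never used for intersection
	case !m.equal:
		return 2
	case m.name == "instance" || (m.name == "__name__" && strings.Contains(m.value, ":")):
		return 0 // instance labels, or names that look like recording rules
	default:
		return 1
	}
}

// index maps a matcher to the fingerprints it selects. A real index is keyed
// by label pairs; keying by matcher keeps the sketch short.
type index map[matcher]map[uint64]bool

// plan intersects fingerprint sets in rank order until at most threshold
// fingerprints remain. It returns the candidates plus the matchers that
// still have to be checked against each candidate's metric individually.
func plan(idx index, ms []matcher, threshold int) (map[uint64]bool, []matcher) {
	sorted := append([]matcher(nil), ms...)
	sort.SliceStable(sorted, func(i, j int) bool { return rank(sorted[i]) < rank(sorted[j]) })
	var fps map[uint64]bool
	for i, m := range sorted {
		if rank(m) == 3 {
			return fps, sorted[i:]
		}
		if fps == nil {
			fps = map[uint64]bool{}
			for fp := range idx[m] {
				fps[fp] = true
			}
		} else {
			for fp := range fps {
				if !idx[m][fp] {
					delete(fps, fp)
				}
			}
		}
		if len(fps) <= threshold {
			return fps, sorted[i+1:]
		}
	}
	return fps, nil
}

// fpRange builds the fingerprint set {1, ..., n}.
func fpRange(n uint64) map[uint64]bool {
	s := make(map[uint64]bool, n)
	for fp := uint64(1); fp <= n; fp++ {
		s[fp] = true
	}
	return s
}

func main() {
	job := matcher{"job", "foo", true}
	env := matcher{"env", "prod", true}
	rule := matcher{"__name__", "nicely:aggregating:rule", true}
	idx := index{job: fpRange(100), env: fpRange(1000), rule: fpRange(2)}
	fps, rest := plan(idx, []matcher{job, env, rule}, 10)
	fmt.Println(len(fps), len(rest))
}
```

With the example data, the recording-rule matcher is tried first and already brings the candidate set under the threshold, so the large `job` and `env` sets are never intersected.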
Björn Rabenstein 0622304244 Merge pull request #1798 from prometheus/beorn7/storage2
Crash recovery: Fix an edge case.
2016-07-13 16:53:18 +02:00
beorn7 2a75b15328 Crash recovery: Fix an edge case.
If the chunks of a series in the checkpoint are all older than the
latest chunk on disk, the head chunk is persisted and therefore has to
be declared closed.

It would be great to have a test for this, but that would require more
plumbing, subject of #447.
2016-07-07 16:17:38 +02:00
beorn7 064b57858e Consistently use the Seconds() method for conversion of durations
This also fixes one remaining case of recording integral numbers
of seconds only for a metric, i.e. this will probably fix #1796.
2016-07-07 15:24:35 +02:00
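The difference that 064b57858e addresses can be shown with the standard `time` package: integer division by `time.Second` truncates to whole seconds, while `Duration.Seconds()` keeps the fraction. The helper names and `sampleDuration` are chosen for this example.

```go
package main

import (
	"fmt"
	"time"
)

// sampleDuration is an arbitrary example value.
const sampleDuration = 1500 * time.Millisecond

// wholeSeconds truncates to an integral number of seconds, the pattern the
// commit moves away from.
func wholeSeconds(d time.Duration) float64 {
	return float64(d / time.Second)
}

// fractionalSeconds keeps sub-second precision via Duration.Seconds().
func fractionalSeconds(d time.Duration) float64 {
	return d.Seconds()
}

func main() {
	fmt.Println(wholeSeconds(sampleDuration), fractionalSeconds(sampleDuration))
}
```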
Julius Volz 91401794fa storage: Make MemorySeriesStorage a public type
See https://twitter.com/fabxc/status/748032597876482048
2016-06-29 08:14:23 +02:00
Fabian Reinartz 425736a377 *: remove last remainders of non-second metrics 2016-06-23 17:50:39 +02:00
Julius Volz b7b6717438 Separate query interface out of local.Storage.
PromQL only requires a much narrower interface than local.Storage in
order to run queries. Narrower interfaces are easier to replace and
test, too.

We could also change the web interface to use local.Querier, except that
we'll probably use appending functions from there in the future.
2016-06-23 15:14:38 +02:00
Jan van Valburg 68f3df49d0 storage: fix 'access denied' error on Windows
On Windows, it is not possible to rename or delete a file that is
currently open. This change closes the file in dropAndPersistChunks
before it tries to delete it, or rename the temporary file to it.
2016-06-21 11:21:20 +02:00
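The ordering that 68f3df49d0 describes, closing the open file before renaming a temporary file over it, can be sketched as follows. `replaceOpenFile` and `demo` are hypothetical helpers for this example and are not taken from dropAndPersistChunks.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// replaceOpenFile writes data to a temporary file and renames it over the
// file behind f. The handle is closed first, because Windows refuses to
// rename onto (or delete) a file that is still open.
func replaceOpenFile(f *os.File, data []byte) error {
	path := f.Name()
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o640); err != nil {
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

// demo exercises replaceOpenFile in a temporary directory and returns the
// resulting file contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "series")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "series.db")
	if err := os.WriteFile(path, []byte("old"), 0o640); err != nil {
		return "", err
	}
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	if err := replaceOpenFile(f, []byte("new")); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	fmt.Println(demo())
}
```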
beorn7 b274c7aaa7 Update doc comments 2016-06-03 12:34:01 +02:00
beorn7 99881ded63 Make the number of fingerprint mutexes configurable
With a lot of series accessed in a short timeframe (by a query, a
large scrape, checkpointing, ...), there is actually quite a
significant amount of lock contention if something similar is running
at the same time.

In those cases, the number of locks needs to be increased.

On the same front, as our fingerprints don't have a lot of entropy, I
introduced some additional shuffling. With the current state, only
changes in the least significant bits of a FP would matter.
2016-06-02 19:18:00 +02:00
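A minimal sketch of the two ideas in 99881ded63: a configurable number of mutexes, and shuffling the fingerprint bits before picking one. The `mix` function is a simple bit fold chosen for this example, not the shuffling used in the commit, and `fingerprintLocker` here is a simplified stand-in.

```go
package main

import (
	"fmt"
	"sync"
)

// mix folds the high bits of a fingerprint into the low bits so that
// fingerprints differing only in high bits still pick different mutexes.
// This is an illustrative fold, not the commit's algorithm.
func mix(fp uint64) uint64 {
	fp ^= fp >> 32
	fp ^= fp >> 16
	return fp
}

// fingerprintLocker shards locking over a configurable number of mutexes.
type fingerprintLocker struct {
	mtxs []sync.Mutex
}

// newFingerprintLocker creates a locker with n mutexes; n must be positive.
func newFingerprintLocker(n int) *fingerprintLocker {
	return &fingerprintLocker{mtxs: make([]sync.Mutex, n)}
}

// slot returns the index of the mutex responsible for fp.
func (l *fingerprintLocker) slot(fp uint64) uint64 {
	return mix(fp) % uint64(len(l.mtxs))
}

func (l *fingerprintLocker) Lock(fp uint64)   { l.mtxs[l.slot(fp)].Lock() }
func (l *fingerprintLocker) Unlock(fp uint64) { l.mtxs[l.slot(fp)].Unlock() }

func main() {
	l := newFingerprintLocker(1024)
	a, b := uint64(1)<<40, uint64(2)<<40
	// Without mixing, both fingerprints map to the same mutex.
	fmt.Println("naive collision:", a%1024 == b%1024)
	fmt.Println("mixed slots differ:", l.slot(a) != l.slot(b))
	l.Lock(a)
	l.Unlock(a)
}
```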
beorn7 a308c76292 Improve TestAppendOutOfOrder
It did not test the returned error so far.
Also, add tests for the NaN case broken before
https://github.com/prometheus/common/pull/40
2016-05-20 13:46:33 +02:00
beorn7 b2ef4dc52d Correctly identify no-op appends if the value is NaN
This requires updating the vendored common.model package, which
I will do once https://github.com/prometheus/common/pull/40 is merged.
2016-05-19 18:32:47 +02:00
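The NaN problem behind b2ef4dc52d comes from IEEE 754 semantics: NaN never compares equal to itself, so a plain `==` cannot recognise a repeated NaN sample as a no-op. A minimal sketch, with `sample`, `valueEqual`, and `isNoopAppend` as hypothetical names:

```go
package main

import (
	"fmt"
	"math"
)

// sample is a simplified timestamp/value pair (hypothetical type).
type sample struct {
	timestamp int64
	value     float64
}

// nan is a convenience value for the examples below.
var nan = math.NaN()

// valueEqual treats two NaNs as equal, unlike the == operator.
func valueEqual(a, b float64) bool {
	return a == b || (math.IsNaN(a) && math.IsNaN(b))
}

// isNoopAppend reports whether next repeats last exactly, in which case the
// append can be skipped instead of being rejected as a conflicting value.
func isNoopAppend(last, next sample) bool {
	return last.timestamp == next.timestamp && valueEqual(last.value, next.value)
}

func main() {
	fmt.Println(nan == nan)
	fmt.Println(isNoopAppend(sample{1, nan}, sample{1, nan}))
}
```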
Steve Durrheimer 399d5c6375 Make version information consistent between prometheus components 2016-05-05 22:33:18 +02:00
beorn7 07a294ac15 Doc comment fixes 2016-04-26 01:05:56 +02:00
beorn7 20cba1ed8f Initialize metric vectors in memorySeriesStorage 2016-04-25 17:08:07 +02:00
beorn7 d566808d40 Bring back logging of discarded samples
But only on DEBUG level.

Also, count and report the two cases of out-of-order timestamps on the
one hand and same timestamp but different value on the other hand
separately.
2016-04-25 16:43:52 +02:00
beorn7 db16acd7fb Never drop a still open head chunk. 2016-04-15 19:18:40 +02:00
beorn7 a90d645378 Checkpoint fingerprint mappings only upon shutdown
Before, we checkpointed after every newly detected fingerprint
collision, which is not a problem as long as collisions are
rare. However, with a sufficient number of metrics or particular
nature of the data set, there might be a lot of collisions, all to be
detected upon the first set of scrapes, and then the checkpointing
after each detection will take quite a long time (it's O(n²),
essentially).

Since we are rebuilding the fingerprint mapping during crash recovery,
the previous, very conservative approach didn't even buy us
anything. We only ever read from the checkpoint file after a clean
shutdown, so the only time we need to write the checkpoint file is
during a clean shutdown.
2016-04-15 01:03:28 +02:00
Jonathan Boulle 38098f8c95 Add missing license headers
Prometheus is Apache 2 licensed, and most source files have the
appropriate copyright license header, but some were missing it without
apparent reason. Correct that by adding it.
2016-04-13 16:08:22 +02:00
Fabian Reinartz a18639dc2d Merge pull request #1454 from prometheus/beorn7/fix-test
Give TestEvictAndLoadChunkDescs more time to actually evict
2016-04-08 14:58:01 +02:00
beorn7 d09ca03e10 Work around compiler bug
Benchmarks don't show any significant changes.
2016-03-29 17:05:28 +02:00
beorn7 865d16f870 Rename Gorilla into varbit 2016-03-23 16:30:41 +01:00
beorn7 4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
beorn7 c72979e3ed Remove a redundancy from Gorilla-style chunks
So far, the last sample in a chunk was saved twice. That's required
for adding more samples as we need to know the last sample added to
add more samples without iterating through the whole chunk. However,
once the last sample was added to the chunk before it's full, there is
no need to save it twice. Thus, the very last sample added to a chunk
can _only_ be saved in the header fields for the last sample. The
chunk has to be identifiable as closed, then. This information has
been added to the flags byte.
2016-03-20 23:09:48 +01:00
beorn7 b6dbb826ae Improve fuzz testing and fix a bug exposed
This improves fuzz testing in two ways:

(1) More realistic time stamps. So far, the most common case in
practice was very rare in the test: Completely regular increases of
the timestamp.

(2) Verify samples by scanning through the whole relevant section of
the series.

For Gorilla-like chunks, this showed two things:

(1) With more regularly increasing time stamps, BenchmarkFuzz is
essentially as fast as with the traditional chunks:

```
BenchmarkFuzzChunkType0-8              2         972514684 ns/op        83426196 B/op    2500044 allocs/op
BenchmarkFuzzChunkType1-8              2         971478001 ns/op        82874660 B/op    2512364 allocs/op
BenchmarkFuzzChunkType2-8              2         999339453 ns/op        76670636 B/op    2366116 allocs/op
```

(2) There was a bug related to when and how the chunk footer is
overwritten to make use for the last sample. This wasn't exposed by
random access as the last sample of a chunk is retrieved from the
values in the header in that case.
2016-03-20 17:21:28 +01:00
beorn7 9d8fbbe822 Review improvements 2016-03-17 17:31:56 +01:00
beorn7 8cdced3850 Implement Gorilla-inspired chunk encoding
This is not a verbatim implementation of the Gorilla encoding.  First
of all, it could not, even if we wanted, because Prometheus has a
different chunking model (constant size, not constant time).  Second,
this adds a number of changes that improve the encoding in general or
at least for the specific use case of Prometheus (and are partially
only possible in the context of Prometheus). See comments in the code
for details.
2016-03-17 14:47:08 +01:00
beorn7 8e64e8dfca Fix return statement. 2016-03-17 14:43:00 +01:00
Björn Rabenstein 98c8560851 Merge pull request #1477 from prometheus/beorn7/storage7
Solve the series churn problem...
2016-03-17 14:39:28 +01:00
beorn7 e7ac9c6863 Improvements based on review
- Moved returns into the default section of switch statement that can
  only happen then.

- Fix typo.
2016-03-17 14:37:24 +01:00
beorn7 199f309a39 Resurrect and rename invalid preload requests count metric.
It is now also used in label matching, so the name of the metric
changed from `prometheus_local_storage_invalid_preload_requests_total`
to `non_existent_series_matches_total`.
2016-03-13 11:54:24 +01:00
beorn7 e8c1f30ab2 Merge the parallel logic of getSeriesForRange and metricForFingerprint 2016-03-09 21:56:15 +01:00
beorn7 9445c7053d Add tests for range-limited label matching
While doing so, improve getSeriesForRange.
2016-03-09 21:01:03 +01:00
beorn7 47e3c90f9b Clean up error propagation
Only return an error where callers are doing something with it except
simply logging and ignoring.

All the errors touched in this commit flag the storage as dirty
anyway, and that fact is logged anyway. So most of what is being
removed here is just log spam.

As discussed earlier, the class of errors that flags the storage as
dirty signals fundamental corruption, so even bubbling up a one-time
warning to the user (e.g. about incomplete results) isn't helping much
because _anything_ happening in the storage has to be doubted from
that point on (and in fact retroactively into the past, too). Flagging
the storage dirty, and alerting on it (plus marking the state in the
web UI) is the only way I can see right now.

As a byproduct, I cleaned up the setDirty method a bit and improved
the logged errors.
2016-03-09 18:56:30 +01:00