Commit graph

11131 commits

Author SHA1 Message Date
beorn7 865d16f870 Rename Gorilla into varbit 2016-03-23 16:30:41 +01:00
Brian Brazil a9168e60c4 Merge pull request #1497 from prometheus/fabxc/alerts
Remove old alerting syntax
2016-03-23 09:30:18 +00:00
Fabian Reinartz ab3d7a0ec0 Remove old alerting syntax 2016-03-23 10:19:00 +01:00
Julius Volz d3b53bd7f0 Fix comment about Graphite mapping of dimensions. 2016-03-23 00:38:32 +01:00
beorn7 4b574e8a61 Switch chunk encoding to type 2 where it was hardcoded type 1 before
The chunk encoding was hardcoded there because it mostly doesn't
matter what encoding is chosen in that test. Since type 1 is
battle-hardened enough, I'm switching to type 2 here so that we can
catch unexpected problems as a byproduct. My expectation is that the
chunk encoding doesn't matter anyway, as said, but then "unexpected
problems" contains the word "unexpected".
2016-03-20 23:32:20 +01:00
beorn7 c72979e3ed Remove a redundancy from Gorilla-style chunks
So far, the last sample in a chunk was saved twice. That's required
for adding more samples as we need to know the last sample added to
add more samples without iterating through the whole chunk. However,
once the last sample was added to the chunk before it's full, there is
no need to save it twice. Thus, the very last sample added to a chunk
can _only_ be saved in the header fields for the last sample. The
chunk has to be identifiable as closed, then. This information has
been added to the flags byte.
2016-03-20 23:09:48 +01:00
beorn7 b6dbb826ae Improve fuzz testing and fix a bug exposed
This improves fuzz testing in two ways:

(1) More realistic time stamps. So far, the most common case in
practice was very rare in the test: Completely regular increases of
the timestamp.

(2) Verify samples by scanning through the whole relevant section of
the series.

For Gorilla-like chunks, this showed two things:

(1) With more regularly increasing time stamps, BenchmarkFuzz is
essentially as fast as with the traditional chunks:

```
BenchmarkFuzzChunkType0-8              2         972514684 ns/op        83426196 B/op    2500044 allocs/op
BenchmarkFuzzChunkType1-8              2         971478001 ns/op        82874660 B/op    2512364 allocs/op
BenchmarkFuzzChunkType2-8              2         999339453 ns/op        76670636 B/op    2366116 allocs/op
```

(2) There was a bug related to when and how the chunk footer is
overwritten to make use for the last sample. This wasn't exposed by
random access as the last sample of a chunk is retrieved from the
values in the header in that case.
2016-03-20 17:21:28 +01:00
Brian Brazil 8788701ce7 Add test for incorrect behaviour 2016-03-18 12:07:40 +00:00
Brian Brazil 39d556f0d5 Move all the operator tests into one file 2016-03-18 12:02:44 +00:00
beorn7 9d8fbbe822 Review improvements 2016-03-17 17:31:56 +01:00
beorn7 8cdced3850 Implement Gorilla-inspired chunk encoding
This is not a verbatim implementation of the Gorilla encoding.  First
of all, it could not, even if we wanted, because Prometheus has a
different chunking model (constant size, not constant time).  Second,
this adds a number of changes that improve the encoding in general or
at least for the specific use case of Prometheus (and are partially
only possible in the context of Prometheus). See comments in the code
for details.
2016-03-17 14:47:08 +01:00
Björn Rabenstein e83f05fe93 Merge pull request #1492 from prometheus/beorn7/storage4
Merging what has already been reviewed in other PRs.
2016-03-17 14:43:59 +01:00
Björn Rabenstein 6f00df2ee8 Merge pull request #1453 from prometheus/beorn7/storage5
Quarantine series upon problem writing to the series file
2016-03-17 14:43:14 +01:00
beorn7 8e64e8dfca Fix return statement. 2016-03-17 14:43:00 +01:00
Björn Rabenstein 90eb0555df Merge pull request #1466 from prometheus/beorn7/storage6
Rework chunk iterators
2016-03-17 14:40:24 +01:00
Björn Rabenstein 98c8560851 Merge pull request #1477 from prometheus/beorn7/storage7
Solve the series churn problem...
2016-03-17 14:39:28 +01:00
beorn7 e7ac9c6863 Improvments based on review
- Moved returns into the default section of switch statement that can
  only happen then.

- Fix typo.
2016-03-17 14:37:24 +01:00
beorn7 168333d662 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-16 17:02:35 +01:00
beorn7 79628ae883 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-16 17:02:18 +01:00
beorn7 195853e8ba Merge branch 'beorn7/storage4' into beorn7/storage5 2016-03-16 17:02:04 +01:00
Fabian Reinartz 19c5eb6194 Merge pull request #1486 from prometheus/instrument-scrape-pool-sync
Instrument scrape pool `sync()`
2016-03-15 18:46:19 +01:00
stuart nelson dbe5d18b6e Instrument scrape pool sync()
Instruments:
- duration
- count
2016-03-14 18:30:16 +01:00
beorn7 199f309a39 Resurrect and rename invalid preload requests count metric.
It is now also used in label matching, so the name of the metric
changed from `prometheus_local_storage_invalid_preload_requests_total`
to `non_existent_series_matches_total'.
2016-03-13 11:54:24 +01:00
beorn7 2e4e2459a9 Merge branch 'master' into beorn7/storage7 2016-03-12 15:07:31 +01:00
stuart nelson 813f61e551 Merge pull request #1484 from prometheus/instrument-retrieval
Instrument retrieval/scrape.go
2016-03-11 12:26:00 +01:00
stuart nelson a1ee77601a Instrument the duration of the reload function 2016-03-11 12:12:42 +01:00
beorn7 e8c1f30ab2 Merge the parallel logic of getSeriesForRange and metricForFingerprint 2016-03-09 21:56:15 +01:00
beorn7 9445c7053d Add tests for range-limited label matching
While doing so, improve getSeriesForRange.
2016-03-09 21:01:03 +01:00
beorn7 47e3c90f9b Clean up error propagation
Only return an error where callers are doing something with it except
simply logging and ignoring.

All the errors touched in this commit flag the storage as dirty
anyway, and that fact is logged anyway. So most of what is being
removed here is just log spam.

As discussed earlier, the class of errors that flags the storage as
dirty signals fundamental corruption, no even bubbling up a one-time
warning to the user (e.g. about incomplete results) isn't helping much
because _anything_ happening in the storage has to be doubted from
that point on (and in fact retroactively into the past, too). Flagging
the storage dirty, and alerting on it (plus marking the state in the
web UI) is the only way I can see right now.

As a byproduct, I cleaned up the setDirty method a bit and improved
the logged errors.
2016-03-09 18:56:30 +01:00
beorn7 99854a84d7 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 17:23:25 +01:00
beorn7 5e4fa96719 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-09 17:21:32 +01:00
Björn Rabenstein 583b1f3753 Merge pull request #1483 from prometheus/beorn7/storage
Accumulated merge, already reviewed.
2016-03-09 17:18:05 +01:00
Björn Rabenstein c088b2669b Merge pull request #1424 from prometheus/beorn7/storage2
Move and improve lastSamplePair.
2016-03-09 17:16:44 +01:00
Björn Rabenstein b2ce53af00 Merge pull request #1426 from prometheus/beorn7/storage3
Improve predict_linear
2016-03-09 17:16:09 +01:00
Björn Rabenstein cd068c1e65 Merge pull request #1448 from prometheus/beorn7/storage4
Handle errors caused by data corruption more gracefully
2016-03-09 17:15:45 +01:00
beorn7 b343e65907 Merge branch 'beorn7/storage4' into beorn7/storage5
erge is necessary,
2016-03-09 17:14:42 +01:00
beorn7 d0a4477446 Merge branch 'beorn7/storage3' into beorn7/storage4
Conflicts:
	storage/local/preload.go
	storage/local/storage.go
	storage/local/storage_test.go
2016-03-09 17:13:16 +01:00
beorn7 55eddab25f Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-09 16:48:46 +01:00
Fabian Reinartz 7bcf0f2893 Merge pull request #1482 from prometheus/fabxc/testswap
Fix flaky test comparison
2016-03-09 16:36:39 +01:00
beorn7 161eada3ad Make chunkIterator even leaner. 2016-03-09 16:20:39 +01:00
Fabian Reinartz 895f2f092f Fix flaky scrape test
t
2016-03-09 16:00:33 +01:00
beorn7 dad302144d Make a naked return less naked 2016-03-09 15:06:00 +01:00
beorn7 beb36df4bb De-flag preloadChunksForRange
Now there is preloadChunksForRange and preloadChunksForInstant in
both, the series and the storage.
2016-03-09 14:50:09 +01:00
beorn7 bbd34d7ccf Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 00:50:33 +01:00
beorn7 7cdfae1466 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-09 00:50:17 +01:00
beorn7 d6b00b4f6c Merge branch 'beorn7/storage4' into beorn7/storage5 2016-03-09 00:50:05 +01:00
beorn7 eb9caf13be Merge branch 'beorn7/storage3' into beorn7/storage4 2016-03-09 00:49:52 +01:00
beorn7 d284864c87 Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-09 00:49:41 +01:00
beorn7 dcb7c0d3ee Merge branch 'master' into beorn7/storage2 2016-03-09 00:48:51 +01:00
beorn7 836f1db04c Improve MetricsForLabelMatchers
WIP: This needs more tests.

It now gets a from and through value, which it may opportunistically
use to optimize the retrieval. With possible future range indices,
this could be used in a very efficient way. This change merely applies
some easy checks, which should nevertheless solve the use case of
heavy rule evaluations on servers with a lot of series churn.

Idea is the following:

- Only archive series that are at least as old as the headChunkTimeout
  (which was already extremely unlikely to happen).

- Then maintain a high watermark for the last archival, i.e. no
  archived series has a sample more recent than that watermark.

- Any query that doesn't reach to a time before that watermark doesn't
  have to touch the archive index at all. (A production server at
  Soundcloud with the aforementioned series churn and heavy rule
  evaluations spends 50% of its CPU time in archive index
  lookups. Since rule evaluations usually only touch very recent
  values, most of those lookup should disappear with this change.)

- Federation with a very broad label matcher will profit from this,
  too.

As a byproduct, the un-needed MetricForFingerprint method was removed
from the Storage interface.
2016-03-09 00:25:59 +01:00