Commit graph

3922 commits

Author SHA1 Message Date
beorn7 9d8fbbe822 Review improvements 2016-03-17 17:31:56 +01:00
beorn7 8cdced3850 Implement Gorilla-inspired chunk encoding
This is not a verbatim implementation of the Gorilla encoding. First
of all, it could not be, even if we wanted it to, because Prometheus has a
different chunking model (constant size, not constant time). Second,
this adds a number of changes that improve the encoding in general or
at least for the specific use case of Prometheus (some of which are
only possible in the context of Prometheus). See comments in the code
for details.
2016-03-17 14:47:08 +01:00
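For readers unfamiliar with Gorilla: the original paper compresses timestamps as delta-of-deltas and sample values as the XOR of consecutive IEEE 754 bit patterns. The Go sketch below illustrates only those two ideas from the paper, assuming int64 timestamps. It is not the encoding from this commit, whose Prometheus-specific changes are documented in the code. The helper names `xorDelta` and `deltaOfDelta` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorDelta XORs the IEEE 754 bit patterns of two consecutive sample values
// and returns the result with its leading and trailing zero-bit counts.
// A Gorilla-style encoder stores only the bits between those zero runs;
// an XOR of zero means the value did not change.
func xorDelta(prev, cur float64) (xor uint64, leading, trailing int) {
	xor = math.Float64bits(prev) ^ math.Float64bits(cur)
	if xor == 0 {
		return 0, 64, 64
	}
	return xor, bits.LeadingZeros64(xor), bits.TrailingZeros64(xor)
}

// deltaOfDelta returns the second-order difference of three consecutive
// timestamps. For regularly scraped series this is usually zero, which the
// Gorilla paper encodes as a single bit.
func deltaOfDelta(t0, t1, t2 int64) int64 {
	return (t2 - t1) - (t1 - t0)
}

func main() {
	x, leading, trailing := xorDelta(1.0, 2.0)
	fmt.Printf("xor=%#x leading=%d trailing=%d\n", x, leading, trailing)
	fmt.Println("delta of delta:", deltaOfDelta(1000, 16000, 31000))
}
```

For 1.0 followed by 2.0, the XOR has 1 leading and 52 trailing zero bits, so only the 11 bits in between would need to be stored.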
Björn Rabenstein e83f05fe93 Merge pull request #1492 from prometheus/beorn7/storage4
Merging what has already been reviewed in other PRs.
2016-03-17 14:43:59 +01:00
Björn Rabenstein 6f00df2ee8 Merge pull request #1453 from prometheus/beorn7/storage5
Quarantine series upon problem writing to the series file
2016-03-17 14:43:14 +01:00
beorn7 8e64e8dfca Fix return statement. 2016-03-17 14:43:00 +01:00
Björn Rabenstein 90eb0555df Merge pull request #1466 from prometheus/beorn7/storage6
Rework chunk iterators
2016-03-17 14:40:24 +01:00
Björn Rabenstein 98c8560851 Merge pull request #1477 from prometheus/beorn7/storage7
Solve the series churn problem...
2016-03-17 14:39:28 +01:00
beorn7 e7ac9c6863 Improvements based on review
- Moved returns that can only happen in the default case into the
  default section of the switch statement.

- Fix typo.
2016-03-17 14:37:24 +01:00
beorn7 168333d662 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-16 17:02:35 +01:00
beorn7 79628ae883 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-16 17:02:18 +01:00
beorn7 195853e8ba Merge branch 'beorn7/storage4' into beorn7/storage5 2016-03-16 17:02:04 +01:00
Fabian Reinartz 19c5eb6194 Merge pull request #1486 from prometheus/instrument-scrape-pool-sync
Instrument scrape pool `sync()`
2016-03-15 18:46:19 +01:00
stuart nelson dbe5d18b6e Instrument scrape pool sync()
Instruments:
- duration
- count
2016-03-14 18:30:16 +01:00
beorn7 199f309a39 Resurrect and rename invalid preload requests count metric.
It is now also used in label matching, so the name of the metric
changed from `prometheus_local_storage_invalid_preload_requests_total`
to `non_existent_series_matches_total`.
2016-03-13 11:54:24 +01:00
beorn7 2e4e2459a9 Merge branch 'master' into beorn7/storage7 2016-03-12 15:07:31 +01:00
stuart nelson 813f61e551 Merge pull request #1484 from prometheus/instrument-retrieval
Instrument retrieval/scrape.go
2016-03-11 12:26:00 +01:00
stuart nelson a1ee77601a Instrument the duration of the reload function 2016-03-11 12:12:42 +01:00
beorn7 e8c1f30ab2 Merge the parallel logic of getSeriesForRange and metricForFingerprint 2016-03-09 21:56:15 +01:00
beorn7 9445c7053d Add tests for range-limited label matching
While doing so, improve getSeriesForRange.
2016-03-09 21:01:03 +01:00
beorn7 47e3c90f9b Clean up error propagation
Only return an error where callers are doing something with it other
than simply logging and ignoring it.

All the errors touched in this commit flag the storage as dirty
anyway, and that fact is logged. So most of what is being
removed here is just log spam.

As discussed earlier, the class of errors that flags the storage as
dirty signals fundamental corruption, so even bubbling up a one-time
warning to the user (e.g. about incomplete results) doesn't help much,
because _anything_ happening in the storage has to be doubted from
that point on (and in fact retroactively into the past, too). Flagging
the storage as dirty and alerting on it (plus marking the state in the
web UI) is the only way I can see right now.

As a byproduct, I cleaned up the setDirty method a bit and improved
the logged errors.
2016-03-09 18:56:30 +01:00
beorn7 99854a84d7 Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 17:23:25 +01:00
beorn7 5e4fa96719 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-09 17:21:32 +01:00
Björn Rabenstein 583b1f3753 Merge pull request #1483 from prometheus/beorn7/storage
Accumulated merge, already reviewed.
2016-03-09 17:18:05 +01:00
Björn Rabenstein c088b2669b Merge pull request #1424 from prometheus/beorn7/storage2
Move and improve lastSamplePair.
2016-03-09 17:16:44 +01:00
Björn Rabenstein b2ce53af00 Merge pull request #1426 from prometheus/beorn7/storage3
Improve predict_linear
2016-03-09 17:16:09 +01:00
Björn Rabenstein cd068c1e65 Merge pull request #1448 from prometheus/beorn7/storage4
Handle errors caused by data corruption more gracefully
2016-03-09 17:15:45 +01:00
beorn7 b343e65907 Merge branch 'beorn7/storage4' into beorn7/storage5
Merge is necessary,
2016-03-09 17:14:42 +01:00
beorn7 d0a4477446 Merge branch 'beorn7/storage3' into beorn7/storage4
Conflicts:
	storage/local/preload.go
	storage/local/storage.go
	storage/local/storage_test.go
2016-03-09 17:13:16 +01:00
beorn7 55eddab25f Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-09 16:48:46 +01:00
Fabian Reinartz 7bcf0f2893 Merge pull request #1482 from prometheus/fabxc/testswap
Fix flaky test comparison
2016-03-09 16:36:39 +01:00
beorn7 161eada3ad Make chunkIterator even leaner. 2016-03-09 16:20:39 +01:00
Fabian Reinartz 895f2f092f Fix flaky scrape test
2016-03-09 16:00:33 +01:00
beorn7 dad302144d Make a naked return less naked 2016-03-09 15:06:00 +01:00
beorn7 beb36df4bb De-flag preloadChunksForRange
Now there are preloadChunksForRange and preloadChunksForInstant in
both the series and the storage.
2016-03-09 14:50:09 +01:00
beorn7 bbd34d7ccf Merge branch 'beorn7/storage6' into beorn7/storage7 2016-03-09 00:50:33 +01:00
beorn7 7cdfae1466 Merge branch 'beorn7/storage5' into beorn7/storage6 2016-03-09 00:50:17 +01:00
beorn7 d6b00b4f6c Merge branch 'beorn7/storage4' into beorn7/storage5 2016-03-09 00:50:05 +01:00
beorn7 eb9caf13be Merge branch 'beorn7/storage3' into beorn7/storage4 2016-03-09 00:49:52 +01:00
beorn7 d284864c87 Merge branch 'beorn7/storage2' into beorn7/storage3 2016-03-09 00:49:41 +01:00
beorn7 dcb7c0d3ee Merge branch 'master' into beorn7/storage2 2016-03-09 00:48:51 +01:00
beorn7 836f1db04c Improve MetricsForLabelMatchers
WIP: This needs more tests.

It now gets a from and through value, which it may opportunistically
use to optimize the retrieval. With possible future range indices,
this could be used in a very efficient way. This change merely applies
some easy checks, which should nevertheless solve the use case of
heavy rule evaluations on servers with a lot of series churn.

The idea is the following:

- Only archive series that are at least as old as the headChunkTimeout
  (archiving younger series was already extremely unlikely).

- Then maintain a high watermark for the last archival, i.e. no
  archived series has a sample more recent than that watermark.

- Any query that doesn't reach back to a time before that watermark doesn't
  have to touch the archive index at all. (A production server at
  SoundCloud with the aforementioned series churn and heavy rule
  evaluations spends 50% of its CPU time in archive index
  lookups. Since rule evaluations usually only touch very recent
  values, most of those lookups should disappear with this change.)

- Federation with a very broad label matcher will profit from this,
  too.

As a byproduct, the unneeded MetricForFingerprint method was removed
from the Storage interface.
2016-03-09 00:25:59 +01:00
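The watermark idea in the list above can be sketched as follows, assuming int64 millisecond timestamps. The `archiveIndex` type and its methods are hypothetical and model only the watermark, not the real archive index.

```go
package main

import "fmt"

// archiveIndex is a hypothetical model holding only the high watermark.
type archiveIndex struct {
	// highWatermark is the latest sample timestamp of any archived series:
	// no archived series has a sample more recent than this.
	highWatermark int64
}

// archive records that a series whose most recent sample is at lastSample
// has been archived, raising the watermark if necessary.
func (a *archiveIndex) archive(lastSample int64) {
	if lastSample > a.highWatermark {
		a.highWatermark = lastSample
	}
}

// needsLookup reports whether a query starting at from must consult the
// archive index. If from is after the watermark, no archived series can
// have samples in the queried range, so the lookup can be skipped.
func (a *archiveIndex) needsLookup(from int64) bool {
	return from <= a.highWatermark
}

func main() {
	var idx archiveIndex
	idx.archive(1_000)
	idx.archive(5_000)
	fmt.Println(idx.needsLookup(4_000), idx.needsLookup(6_000))
}
```

Because only series whose last sample is at least headChunkTimeout old get archived, the watermark stays behind the present, so rule evaluations that only look at very recent values skip the lookup.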
Björn Rabenstein eebe077f98 Merge pull request #1476 from prometheus/beorn7/makefile
Use UTC for build timestamp
2016-03-08 18:18:54 +01:00
beorn7 6ba379e256 Use UTC for build timestamp 2016-03-08 17:47:17 +01:00
beorn7 d77d625ad3 Merge branch 'master' into beorn7/storage6 2016-03-08 17:39:14 +01:00
Brian Brazil 84c421da8e Merge pull request #1475 from prometheus/fabxc/targetsort
Sort exported targets
2016-03-08 16:24:55 +00:00
Fabian Reinartz f2e359962c Sort exported targets 2016-03-08 17:12:27 +01:00
Fabian Reinartz eb915ec40f Merge pull request #1474 from prometheus/fabxc/spinfix
Handle closed target provider channel
2016-03-08 17:02:05 +01:00
Fabian Reinartz 56fc9bdff3 Handle closed target provider channel
This fixes the case where a target provider closes the update
channel and exits before the context is canceled.
This should only be true for the static provider but it's safer
to generally handle this case.
2016-03-08 15:49:03 +01:00
Tobias Schmidt 2f151d02eb Merge pull request #1456 from prometheus/validate-alertmanager-url
Validate alertmanager URL
2016-03-07 20:09:46 -05:00
Tobias Schmidt 7763bbd993 Validate alertmanager URL 2016-03-07 20:07:17 -05:00
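One plausible shape for such a check, using `net/url` from the Go standard library. The function name and the exact rules (http or https scheme, non-empty host) are assumptions, not necessarily what the commit enforces.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateAlertmanagerURL is a hypothetical check that accepts only
// absolute http or https URLs that name a host.
func validateAlertmanagerURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid Alertmanager URL %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("invalid Alertmanager URL %q: scheme must be http or https", raw)
	}
	if u.Host == "" {
		return fmt.Errorf("invalid Alertmanager URL %q: missing host", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{"http://localhost:9093", "localhost:9093", "http://"} {
		fmt.Println(raw, "->", validateAlertmanagerURL(raw))
	}
}
```

Note that `url.Parse` accepts `localhost:9093` without error (it reads `localhost` as the scheme), which is why the explicit scheme check matters.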