Commit graph

4582 commits

Author SHA1 Message Date
beorn7 1ae50b1d1b vendoring: Update client_golang/prometheus
This is mostly required to enable summaries without quantiles
2017-04-11 12:58:24 +02:00
beorn7 92d4cf7663 vendoring: Remove unused packages 2017-04-11 12:58:24 +02:00
Brian Brazil 0e0fc5a7f4 Correct example name to adapter. (#2590) 2017-04-10 17:24:53 +01:00
Fabian Reinartz 757cba7c31 cmd/prometheus: Undo GOGC adjustment 2017-04-10 16:22:01 +02:00
Fabian Reinartz ece483c0c1 version: cut 2.0.0-alpha.0 2017-04-10 13:03:47 +02:00
Fabian Reinartz f2d610c1e5 vendor: update tsdb for fast equal matching 2017-04-10 13:00:27 +02:00
Björn Rabenstein acd72ae1a7 Merge pull request #2591 from prometheus/beorn7/storage
storage: Several optimizations of checkpointing
2017-04-07 20:02:14 +02:00
Goutham Veeramachaneni cffb1acf7f Test Longer Tests in Travis (#2570)
* Test Longer Tests in Travis

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Make test Target Run All Tests

* Add test-short to run short tests

test is running all the tests now as we are running make tests in
CircleCI and I think the base image is shared across Prometheus Org.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

* Remove Empty Line

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-04-07 13:46:06 +02:00
beorn7 f20b84e816 flags: Improve doc strings for checkpoint flags 2017-04-07 13:10:12 +02:00
beorn7 f338d791d2 storage: Several optimizations of checkpointing
- checkpointSeriesMapAndHeads accepts a context now to allow
  cancelling.

- If a shutdown is initiated, cancel the ongoing checkpoint. (We will
  create a final checkpoint anyway.)

- Always wait for at least as long as the last checkpoint took before
  starting the next checkpoint (to cap the time spent checkpointing
  at 50%).

- If an error has occurred during checkpointing, don't bother to sync
  the write.

- Make sure the temporary checkpoint file is deleted, even if an error
  has occurred.

- Clean up the checkpoint loop a bit. (The concurrent Timer.Reset(0)
  call might have caused a race.)
2017-04-07 13:10:12 +02:00
Björn Rabenstein 934d86b936 Merge pull request #2593 from prometheus/beorn7/storage2
storage: Recover from corrupted indices for archived series
2017-04-07 12:55:35 +02:00
Goutham Veeramachaneni 0f48d07f95 Fix Map Race by Moving Locking closer to the Write (#2476) 2017-04-07 08:55:01 +02:00
Julius Volz 182d7de9cd Merge pull request #2597 from richardkiene/CMON-53
Add triton zone brand metadata
2017-04-07 01:02:02 +02:00
Björn Rabenstein 38bcba11fe Merge pull request #2594 from prometheus/beorn7/storage3
storage: Guard against a corner case of data corruption
2017-04-07 00:52:28 +02:00
Björn Rabenstein f0076aca01 Merge pull request #2595 from prometheus/beorn7/storage4
storage: Guard against appending to evicted chunk
2017-04-07 00:51:53 +02:00
Tom Wilkie e5d7bbfc3c Remote writes: retry on recoverable errors. (#2552)
* Remote writes: retry on recoverable errors.

* Add comments

* Review feedback

* Comments

* Review feedback

* Final spelling misteak (I hope).  Plus, record failed samples correctly.
2017-04-07 00:15:41 +02:00
Richard Kiene ec692f6161 Add triton zone brand metadata 2017-04-06 21:35:42 +00:00
beorn7 7199a9d9d4 storage: Guard against appending to evicted chunk
Fixes #2480. For a certain definition of "fixes".

This is something that should never happen. Sadly, it does happen,
albeit extremely rarely. This could be some weird corner case we
haven't covered yet. Or it happens as a consequence of data
corruption or a crash recovery gone bad.

This is not a "real" fix as we don't know the root cause of the
incident reported in #2480. However, this makes sure the server does
not crash, but deals gracefully with the problem: The series in
question is quarantined, which even makes it available for forensics.
2017-04-06 20:02:52 +02:00
beorn7 3d12906286 storage: Guard against a corner case of data corruption
Fixes #2475.
2017-04-06 19:50:32 +02:00
beorn7 4fcc73a04c storage: Recover from corrupted indices for archived series
An unopenable archived_fingerprint_to_timerange is simply deleted and
will be rebuilt during crash recovery (which can then take quite some time).

An unopenable archived_fingerprint_to_metric is not deleted but
instructions to the user are logged. A deletion has to be done by the
user explicitly as it means losing all archived series (and a repair
with a 3rd party tool might still be possible).
2017-04-06 19:26:39 +02:00
Julius Volz 9775ad4754 Merge pull request #2588 from prometheus/read-multi
Separate out remote read responses.
2017-04-06 17:10:31 +02:00
Conor Broderick c72692fd75 Fixed issue of partially hidden y-axis values on graph (#2589) 2017-04-06 16:04:44 +01:00
Brian Brazil c813c824d4 Separate out remote read responses.
Fixes #2574
2017-04-06 15:49:47 +01:00
Björn Rabenstein 516a96d9a3 Merge pull request #2587 from prometheus/beorn7/storage2
storage: Mark storage as dirty if indexing fails
2017-04-06 16:42:06 +02:00
Julius Volz beeb0b55c0 Merge pull request #2572 from weaveworks/2571-propagate-api-error
Add promql.ErrStorage, which the API propagates as a 500.
2017-04-06 16:36:20 +02:00
Björn Rabenstein fdd2bc22ae Merge pull request #2583 from prometheus/beorn7/storage
storage: Increment s.persistErrors on all persist errors
2017-04-06 15:56:49 +02:00
beorn7 ed5f68f382 storage: Increment s.persistErrors on all persist errors
Fixes #2091
2017-04-06 15:55:15 +02:00
Tom Wilkie f0e8a5f37c Add promql.ErrStorage, which is interpreted by the API as a 500. 2017-04-06 14:41:23 +01:00
beorn7 f3365c4f26 storage: Mark storage as dirty if indexing fails 2017-04-06 15:29:33 +02:00
Julius Volz 5f764d9940 Merge pull request #2582 from mdlayher/scrape-header-rename
retrieval: make scrape timeout header consistent with others
2017-04-05 23:13:32 +02:00
Matt Layher 5e4f5fb5ad retrieval: make scrape timeout header consistent with others 2017-04-05 14:56:22 -04:00
Brian Brazil 26bedc9e00 Revert use of buildVersion in console templates. (#2579)
This function isn't available in console templates,
so go back to pre-#2468 state to get things working again.
2017-04-05 15:19:17 +01:00
Alexey Palazhchenko 17f15d024a Small fixes. (#2578)
Fix typos. Simplify with gofmt -s
2017-04-05 14:24:22 +01:00
Fabian Reinartz 8c768f2ca3 web: Fix federation for instance label 2017-04-05 14:53:34 +02:00
Björn Rabenstein 425f591fc9 Merge pull request #2576 from prometheus/beorn7/storage
storage: Check for negative values from varint decoding
2017-04-04 23:23:51 +02:00
Julius Volz a874556a66 Merge pull request #2577 from prometheus/beorn7/storage2
storage: Fix `go vet` error
2017-04-04 19:44:42 +02:00
Matt Layher fe4b6693f7 retrieval: add Scrape-Timeout-Seconds header to each scrape request (#2565)
Fixes #2508.
2017-04-04 18:26:28 +01:00
beorn7 ae286385fd storage: Check for negative values from varint decoding
Sadly, we have a number of places where we use varint encoding for
numbers that cannot be negative. We could have saved a bit by using
uvarint encoding. On the bright side, we now have a 50% chance to
detect data corruption. :-/

Fixes #1800 and #2492.
2017-04-04 19:14:52 +02:00
beorn7 9b6a1dad05 storage: Fix go vet error 2017-04-04 19:14:09 +02:00
Fabian Reinartz 47b0b9f7b0 vendor: add tsdb support for windows 2017-04-04 16:55:30 +02:00
Julius Volz 5f3327f620 Merge pull request #2568 from AlekSi/patch-1
Use latest released Go 1.8.x
2017-04-04 15:54:30 +02:00
Fabian Reinartz cb84d6057e vendor: add non-amd64 implementation for xxhash 2017-04-04 15:19:22 +02:00
Fabian Reinartz 8ffc851147 Merge branch 'master' into dev-2.0 2017-04-04 15:17:56 +02:00
Alexey Palazhchenko 535a18e978 Use latest released Go 1.8.x 2017-04-04 13:52:18 +03:00
Fabian Reinartz cfb2a7f1d5 vendor: sync organisation migration of tsdb 2017-04-04 11:33:51 +02:00
Fabian Reinartz bbcf20ba01 web: deduplicate series in federation 2017-04-04 11:20:23 +02:00
Fabian Reinartz f56644e3ae api/v1: deduplicate selected series 2017-04-04 11:09:11 +02:00
Fabian Reinartz 4e41987bcb storage: add deduplication function
This adds a function to deduplicate two series sets given that duplicate
series have equivalent data points.
2017-04-04 11:07:21 +02:00
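
One way to deduplicate two series sets is a merge over sorted inputs, sketched below in Go. This is an illustration under stated assumptions, not the function added in the commit above: the `series` type and `dedupe` function are hypothetical, both inputs are assumed to be sorted by a canonical label string and free of internal duplicates, and duplicates are assumed to carry equivalent data points, so keeping either copy is safe.

```go
package main

import "fmt"

// series (hypothetical) identifies a time series by a canonical label
// string; sample data is omitted to keep the sketch short.
type series struct {
	labels string
}

// dedupe merges two series sets, each sorted by labels, into one sorted
// set. When a series appears in both inputs, the copy from a is kept.
func dedupe(a, b []series) []series {
	out := make([]series, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].labels < b[j].labels:
			out = append(out, a[i])
			i++
		case a[i].labels > b[j].labels:
			out = append(out, b[j])
			j++
		default: // same series in both sets: emit once, advance both
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	a := []series{{"a"}, {"b"}, {"c"}}
	b := []series{{"b"}, {"d"}}
	for _, s := range dedupe(a, b) {
		fmt.Println(s.labels)
	}
}
```

The merge runs in a single pass over both inputs and needs no lookup table, at the cost of requiring sorted input.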
Björn Rabenstein 50e4f49b7e Merge pull request #2561 from prometheus/beorn7/storage2
storage: Evict unused chunk.Descs in crash recovery
2017-04-04 00:05:03 +02:00
beorn7 08fc6cbd39 storage: Evict unused chunk.Descs in crash recovery
This is in line with the v1.5 change in paradigm to not keep
chunk.Descs without chunks around after a series maintenance.

It's mainly motivated by avoiding excessive amounts of RAM usage
during crash recovery.

The code avoids creating memory time series with zero chunk.Descs as
that is prone to trigger weird effects. (Series maintenance would
archive series with zero chunk.Descs, but we cannot do that here
because the archive indices still have to be checked.)
2017-04-04 00:04:22 +02:00