prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-17 02:54:05 -08:00

Author	SHA1	Message	Date
Goutham Veeramachaneni	cffb1acf7f	Test Longer Tests in Travis (#2570) * Test Longer Tests in Travis Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Make test Target Run All Tests * Add test-short to run short tests test is running all the tests now as we are running make tests in CircleCI and I think the base image is shared across Prometheus Org. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Remove Empty Line Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-04-07 13:46:06 +02:00
beorn7	f20b84e816	flags: Improve doc strings for checkpoint flags	2017-04-07 13:10:12 +02:00
beorn7	f338d791d2	storage: Several optimizations of checkpointing - checkpointSeriesMapAndHeads accepts a context now to allow cancelling. - If a shutdown is initiated, cancel the ongoing checkpoint. (We will create a final checkpoint anyway.) - Always wait for at least as long as the last checkpoint took before starting the next checkpoint (to cap the time spent checkpointing at 50%). - If an error has occurred during checkpointing, don't bother to sync the write. - Make sure the temporary checkpoint file is deleted, even if an error has occurred. - Clean up the checkpoint loop a bit. (The concurrent Timer.Reset(0) call might have caused a race.)	2017-04-07 13:10:12 +02:00
Björn Rabenstein	934d86b936	Merge pull request #2593 from prometheus/beorn7/storage2 storage: Recover from corrupted indices for archived series	2017-04-07 12:55:35 +02:00
Goutham Veeramachaneni	0f48d07f95	Fix Map Race by Moving Locking closer to the Write (#2476)	2017-04-07 08:55:01 +02:00
Julius Volz	182d7de9cd	Merge pull request #2597 from richardkiene/CMON-53 Add triton zone brand metadata	2017-04-07 01:02:02 +02:00
Björn Rabenstein	38bcba11fe	Merge pull request #2594 from prometheus/beorn7/storage3 storage: Guard against a corner case of data corruption	2017-04-07 00:52:28 +02:00
Björn Rabenstein	f0076aca01	Merge pull request #2595 from prometheus/beorn7/storage4 storage: Guard against appending to evicted chunk	2017-04-07 00:51:53 +02:00
Tom Wilkie	e5d7bbfc3c	Remote writes: retry on recoverable errors. (#2552) * Remote writes: retry on recoverable errors. * Add comments * Review feedback * Comments * Review feedback * Final spelling misteak (I hope). Plus, record failed samples correctly.	2017-04-07 00:15:41 +02:00
Richard Kiene	ec692f6161	Add triton zone brand metadata	2017-04-06 21:35:42 +00:00
beorn7	7199a9d9d4	storage: Guard against appending to evicted chunk Fixes #2480. For a certain definition of "fixes". This is something that should never happen. Sadly, it does happen, albeit extremely rarely. This could be some weird corner case we haven't covered yet. Or it happens as a consequence of data corruption or a crash recovery gone bad. This is not a "real" fix as we don't know the root cause of the incident reported in #2480. However, this makes sure the server does not crash, but deals gracefully with the problem: The series in question is quarantined, which even makes it available for forensics.	2017-04-06 20:02:52 +02:00
beorn7	3d12906286	storage: Guard against a corner case of data corruption Fixes #2475.	2017-04-06 19:50:32 +02:00
beorn7	4fcc73a04c	storage: Recover from corrupted indices for archived series An unopenable archived_fingerprint_to_timerange is simply deleted and will be rebuilt during crash recovery (which can then take quite some time). An unopenable archived_fingerprint_to_metric is not deleted but instructions to the user are logged. A deletion has to be done by the user explicitly as it means losing all archived series (and a repair with a 3rd party tool might still be possible).	2017-04-06 19:26:39 +02:00
Julius Volz	9775ad4754	Merge pull request #2588 from prometheus/read-multi Separate out remote read responses.	2017-04-06 17:10:31 +02:00
Conor Broderick	c72692fd75	Fixed issue of partially hidden y-axis values on graph (#2589)	2017-04-06 16:04:44 +01:00
Brian Brazil	c813c824d4	Separate out remote read responses. Fixes #2574	2017-04-06 15:49:47 +01:00
Björn Rabenstein	516a96d9a3	Merge pull request #2587 from prometheus/beorn7/storage2 storage: Mark storage as dirty if indexing fails	2017-04-06 16:42:06 +02:00
Julius Volz	beeb0b55c0	Merge pull request #2572 from weaveworks/2571-propagate-api-error Add promql.ErrStorage, which the API propagates as a 500.	2017-04-06 16:36:20 +02:00
Björn Rabenstein	fdd2bc22ae	Merge pull request #2583 from prometheus/beorn7/storage storage: Increment s.persistErrors on all persist errors	2017-04-06 15:56:49 +02:00
beorn7	ed5f68f382	storage: Increment s.persistErrors on all persist errors Fixes #2091	2017-04-06 15:55:15 +02:00
Tom Wilkie	f0e8a5f37c	Add promql.ErrStorage, which is interpreted by the API as a 500.	2017-04-06 14:41:23 +01:00
beorn7	f3365c4f26	storage: Mark storage as dirty if indexing fails	2017-04-06 15:29:33 +02:00
Julius Volz	5f764d9940	Merge pull request #2582 from mdlayher/scrape-header-rename retrieval: make scrape timeout header consistent with others	2017-04-05 23:13:32 +02:00
Matt Layher	5e4f5fb5ad	retrieval: make scrape timeout header consistent with others	2017-04-05 14:56:22 -04:00
Brian Brazil	26bedc9e00	Revert use of buildVersion in console templates. (#2579) This function isn't available in console templates, so go back to pre-#2468 state to get things working again.	2017-04-05 15:19:17 +01:00
Alexey Palazhchenko	17f15d024a	Small fixes. (#2578) Fix typos. Simplify with gofmt -s	2017-04-05 14:24:22 +01:00
Björn Rabenstein	425f591fc9	Merge pull request #2576 from prometheus/beorn7/storage storage: Check for negative values from varint decoding	2017-04-04 23:23:51 +02:00
Julius Volz	a874556a66	Merge pull request #2577 from prometheus/beorn7/storage2 storage: Fix `go vet` error	2017-04-04 19:44:42 +02:00
Matt Layher	fe4b6693f7	retrieval: add Scrape-Timeout-Seconds header to each scrape request (#2565) Fixes #2508.	2017-04-04 18:26:28 +01:00
beorn7	ae286385fd	storage: Check for negative values from varint decoding Sadly, we have a number of places where we use varint encoding for numbers that cannot be negative. We could have saved a bit by using uvarint encoding. On the bright side, we now have a 50% chance to detect data corruption. :-/ Fixes #1800 and #2492.	2017-04-04 19:14:52 +02:00
beorn7	9b6a1dad05	storage: Fix `go vet` error	2017-04-04 19:14:09 +02:00
Julius Volz	5f3327f620	Merge pull request #2568 from AlekSi/patch-1 Use latest released Go 1.8.x	2017-04-04 15:54:30 +02:00
Alexey Palazhchenko	535a18e978	Use latest released Go 1.8.x	2017-04-04 13:52:18 +03:00
Björn Rabenstein	50e4f49b7e	Merge pull request #2561 from prometheus/beorn7/storage2 storage: Evict unused chunk.Descs in crash recovery	2017-04-04 00:05:03 +02:00
beorn7	08fc6cbd39	storage: Evict unused chunk.Descs in crash recovery This is in line with the v1.5 change in paradigm to not keep chunk.Descs without chunks around after a series maintenance. It's mainly motivated by avoiding excessive amounts of RAM usage during crash recovery. The code avoids creating memory time series with zero chunk.Descs as that is prone to trigger weird effects. (Series maintenance would archive series with zero chunk.Descs, but we cannot do that here because the archive indices still have to be checked.)	2017-04-04 00:04:22 +02:00
Julius Volz	eda4286484	Merge pull request #2557 from prometheus/influxdb-read Add InfluxDB read-back support to remote storage bridge	2017-04-03 18:29:22 +02:00
Björn Rabenstein	1c6240fc40	Merge pull request #2559 from prometheus/beorn7/storage storage: Replace fpIter by sortedFPs	2017-04-03 16:56:21 +02:00
beorn7	d284ffab03	storage: Replace fpIter by sortedFPs The fpIter was kind of cumbersome to use and required a lock for each iteration (which wasn't even needed for the iteration at startup after loading the checkpoint). The new implementation here has an obvious penalty in memory, but it's only 8 bytes per series, so 80MiB for a beefy server with 10M memory time series (which would probably need ~100GiB RAM, so the memory penalty is only 0.1% of the total memory need). The big advantage is that now series maintenance happens in order, which leads to the time between two maintenances of the same series being less random. Ideally, after each maintenance, the next maintenance would tackle the series with the largest number of non-persisted chunks. That would be quite an effort to find out or track, but with the approach here, the next maintenance will tackle the series whose previous maintenance is longest ago, which is a good approximation. While this commit won't change the _average_ number of chunks persisted per maintenance, it will reduce the mean time a given chunk has to wait for its persistence and thus reduce the steady-state number of chunks waiting for persistence. Also, the map iteration in Go is non-deterministic but not truly random. In practice, the iteration appears to be somewhat "bucketed". You can often observe a bunch of series with similar duration since their last maintenance, i.e. you see batches of series with similar number of chunks persisted per maintenance. If that batch is relatively young, a whole lot of series are maintained with very few chunks to persist. (See screenshot in PR for a better explanation.)	2017-04-03 15:34:46 +02:00
Tobias Schmidt	eac36d123e	Fix unstable fanin test (#2558)	2017-04-03 13:02:15 +02:00
Conor Broderick	dafae52efa	Display total number of returned elements on console (#2532)	2017-04-03 11:52:25 +01:00
Julius Volz	111841a230	Vendor new InfluxDB client library	2017-04-03 12:38:05 +02:00
Fabian Reinartz	e18be8d1a5	Merge pull request #2556 from prometheus/grobie/count-missed-group-executions Export number of missed rule evaluations	2017-04-03 10:09:12 +02:00
Julius Volz	3581057ea4	Update remote storage bridge README.md	2017-04-03 01:42:49 +02:00
Julius Volz	b391cbb808	Add InfluxDB read-back support to remote storage bridge	2017-04-03 01:42:43 +02:00
Tobias Schmidt	eaf33759fb	Register forgotten prometheus_evaluator_iterations_total metric	2017-04-02 20:32:56 -03:00
Tobias Schmidt	aaaba57184	Export number of missed rule evaluations In case the execution of all rules takes longer than the configured rule evaluation interval, one or more iterations will be skipped. This needs to be visible to the operator.	2017-04-02 20:03:28 -03:00
Julius Volz	5a896033e3	Add remote read external label handling (#2555) * Add remote read external label handling This implements rules 1 and 2 from https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit * Use more descriptive example labels in read test * Add comment for querier.addExternalLabels() * Make argument naming in removeLabels() more generic	2017-04-02 17:48:15 +02:00
Julius Volz	9cc7b393c5	Merge pull request #2548 from prometheus/sort-targets Sort targets by instance within a job	2017-04-01 00:07:31 +02:00
Julius Volz	589061919a	Merge pull request #2465 from Gouthamve/alert-metrics-2429 Better Metrics For Alerts	2017-03-31 21:45:05 +02:00
Goutham Veeramachaneni	f27ce34a13	Use Registerer to Register All Metrics * Made Metric a Gauge so that it can be registered.	2017-04-01 00:14:30 +05:30

3894 commits