prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-27 14:39:40 -08:00

Author	SHA1	Message	Date
Julius Volz	9412b296d5	Remove labels on persist error counter. This fixes https://github.com/prometheus/prometheus/issues/496	2015-02-01 14:03:34 +01:00
Bjoern Rabenstein	3948e2a7f8	Move lost files to an "orphaned" directory. Previously, those were simply deleted. The orphaned files can now be used for forensics if needed.	2015-01-29 14:52:12 +01:00
Bjoern Rabenstein	c24bfdf701	Move crash related code into separate file. persistence.go is way too long anyway, and a lot of code is just crash recovery, which is not important to understand the normal operation. Also, remove unused `exists` function.	2015-01-29 13:13:16 +01:00
Bjoern Rabenstein	ab386d1f5d	Declare storage.local.index-cache-size.* default values as tweaked.	2015-01-29 13:04:54 +01:00
Bjoern Rabenstein	73f6dc4d44	Make KeyValueStore.Delete report if the key to delete was found. Previously, it would return an error instead. Now we can distinguish the cases 'error while deleting known key' vs. 'key not in index' without testing for leveldb-internal kinds of errors.	2015-01-29 12:57:50 +01:00
Bjoern Rabenstein	2c8d324ca4	Remove check that did not check anything.	2015-01-26 13:48:24 +01:00
Bjoern Rabenstein	2c8fdcbc23	Remove a deadlock during shutdown. If queries are still running when the shutdown is initiated, they will finish _during_ the shutdown. In that case, they might request chunk eviction upon unpinning their pinned chunks. That might completely fill the evict request queue _after_ draining it during storage shutdown. If that ever happens (which is the case if there are _many_ queries still running during shutdown), the affected queries will be stuck while keeping a fingerprint locked. The checkpointing can then not process that fingerprint (or one that shares the same lock). And then we are deadlocked.	2015-01-22 14:42:15 +01:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Bjoern Rabenstein	f298af5756	Use named returns in flock.New.	2015-01-19 14:31:16 +01:00
Bjoern Rabenstein	baca6faa1c	Add double-start protection. This mimics the locking leveldb is performing anyway. Advantages of doing it separately: - Should we ever replace the leveldb implementation by one without double-start protection, we are still good. - In contrast to leveldb, the new code creates a meaningful error message.	2015-01-14 17:13:42 +01:00
Julius Volz	a6bc42bc61	Minor formatting/spelling fixups.	2015-01-09 11:04:20 +01:00
Bjoern Rabenstein	0851945054	Add a heuristic to checkpoint early if there are many "dirty" series.	2015-01-08 20:15:58 +01:00
Bjoern Rabenstein	622e8350cd	Fix a bug handling freshly unarchived series. Usually, if you unarchive a series, it is to add something to it, which will create a new head chunk. However, if a series is unarchived, and before anything is added to it, it is handled by the maintenance loop, it will be archived again. In that case, we have to load the chunkDescs to know the lastTime of the series to be archived. Usually, this case will happen only rarely (as a race, has never happened so far, possibly because the locking around unarchiving and the subsequent sample append is smart enough). However, during crash recovery, we sometimes treat series as "freshly unarchived" without directly appending a sample. We might add more cases of that type later, so better deal with archiving properly and load chunkDescs if required.	2015-01-08 16:25:50 +01:00
Bjoern Rabenstein	eb932d1524	Remove a deadlock during shutdown.	2015-01-07 19:02:38 +01:00
Brian Brazil	e56786b221	Have scrape time as a pseudovariable, not a prometheus variable. This ensures it has the right timestamp, and is easier to work with. Switch sd variable away from 'outcome', using total/failed instead.	2014-12-27 00:39:33 +00:00
Bjoern Rabenstein	ff24070a03	Fix embarrassing bug in crash recovery. (And yes, we always knew we need tests for that. I have added a TODO now.) Change-Id: I9cf52bbf98e263e0b79404bda4c442beba9696a8	2014-12-17 17:18:04 +01:00
Julius Volz	c9618d11e8	Introduce copy-on-write for metrics in AST. This depends on changes in: https://github.com/prometheus/client_golang/tree/cow-metrics. Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142	2014-12-12 20:34:55 +01:00
Bjoern Rabenstein	afd864e7f4	Adjust to the new version of goleveldb. (And yes, we do want vendoring for that... This is just the quick fix.) Change-Id: I9d347a64d96de6b3390a0e35c8d466f14bb83e4e	2014-12-10 18:04:29 +01:00
Bjoern Rabenstein	fee88a7a77	Remove the remaining races, new and old. Also, resolve a few other TODOs. Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691	2014-12-03 18:07:23 +01:00
Bjoern Rabenstein	66c80b5ebd	Fix typo. Change-Id: I72608c7841c00145458807d3c3ee29db7b5ac2bc	2014-11-28 12:50:19 +01:00
Bjoern Rabenstein	674624f1c8	Completed more TODOs. - Documented checkpoint file format. - High-level description of series sanitation. - Replace fp.LoadFromString panic with an error. (Change in client_golang already submitted.) - Introduced checks for series file size where appropriate. - Removed two Law of Demeter violations. Change-Id: I555d97a2c8f4769820c2fc8bf5d6f4e160222abc	2014-11-27 20:46:45 +01:00
Bjoern Rabenstein	7d11019aa2	Squash a few trivial TODOs. - Delete unneeded file view_adapter.go. - Assessed that we still need the fingerprints in nodes (to create iterators). - Turned numMemChunkDescs into a metric. Change-Id: I29be963c795a075ec00c095f76bf26405535609d	2014-11-27 18:26:06 +01:00
Bjoern Rabenstein	49683c0c20	Avoid test flags in normal binary. Change-Id: If1fba813a73bf93ea5918dcda326e3ffa81a797d	2014-11-27 18:04:48 +01:00
Bjoern Rabenstein	9bc05052ad	Add line that has mysteriously disappeared after rebase. Change-Id: I3612eb0b626e66e607b363e9801f187d2ba637a3	2014-11-25 17:15:56 +01:00
Bjoern Rabenstein	14bda4180c	Changes after pair code review. Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455	2014-11-25 17:12:59 +01:00
Bjoern Rabenstein	9ea808cd8b	Remove debug log line. Change-Id: Icdd2351b89f2d37ac2b615f9cf872e054c694ad1	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	bb42cc2e2d	Evict based on memory pressure. Evict recently used chunks last. Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	e23ee0f7cc	Fix race in test. Change-Id: I53e1a4c5a6b5f846acd76043166b6cb7bf7d5dc7	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	d73e851b14	Tweak timing in the maintenance loop. Change-Id: I9801c4f9a22c3b3dc1ce1af81fdd9e992a4f4dd7	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	2672aa8ece	Instrument series maintenance. Change-Id: Ie4269d07ad4d23d44230c95a523088b472718e54	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	74c143c4c9	Improve scraper shutdown time. - Stop target pools in parallel. - Stop individual scrapers in goroutines, too. - Timing tweaks. Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	3f61d304ce	Reorganize maintenance loop. Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	c087ee35f7	Remove archiveMtx. Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	7af42eda65	Optimize purging. Now only purge if there is something to purge. Also, set savedFirstTime and archived time range appropriately. (Which is needed for the optimization.) Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	33b959b898	Persist savedFirstTime in checkpoint. Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	904acd43da	Add crash recovery. Fix the behavior if preload for non-existent series is requested. Instead of returning an error (which triggers a panic further up), simply count those incidents. They can happen regularly, we just want to know if they happen too frequently because that would mean the indexing is behind or broken. Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b	2014-11-25 17:09:43 +01:00
Bjoern Rabenstein	7a9efc9c59	Fix typo in test. Change-Id: I3c2fd76bc5f50446c58f8ef693d9c6595197feaa	2014-11-25 17:09:43 +01:00
Bjoern Rabenstein	4efc60174b	Tweak and verify a few parameters. Remove TODOs accordingly. Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8	2014-11-25 17:09:43 +01:00
Bjoern Rabenstein	5f8e9617ef	Add more tests. Add an end-to-end fuzz and race test. Fix a race exposed by the above. Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb	2014-11-25 17:09:17 +01:00
Bjoern Rabenstein	d215e013b7	Fix the weird chunkDesc shuffling bug. The root cause was that after chunkDesc eviction, the offset between memory representation of chunk layout (via chunkDescs in memory) was shifted against chunks as laid out on disk. Keeping the offset up to date is by no means trivial, so this commit is pretty involved. Also, found a race that for some reason didn't bite us so far: Persisting chunks was completely unlocked, so if chunks were purged on disk at the same time, disaster would strike. However, locking the persisting of chunks revealed interesting deadlocks. Basically, never queue under the fp lock. Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8	2014-11-25 17:09:17 +01:00
Bjoern Rabenstein	a617269b12	Avoid unnecessary cloning of the head chunk. Change-Id: I5da774515d5493166a197b5814d0a720628cfaff	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	f1de5b0c4e	Run checkpointing of in-memory metrics and head chunks periodically. Checkpointing interval is now a command line flag. Along the way, several things were refactored. - Restructure the way the storage is started and stopped. - Number of series in checkpoint is now a uint64, not a varint. (Breaks old checkpoints, needs wipe!) - More consistent naming and order of methods. Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	74c9b34a5e	Improve storage instrumentation even more. Add gauge for chunks and chunkdescs in memory (backed by a global variable to be used later not only for instrumentation but also for memory management). Refactored instrumentation code once more (instrumentation.go is back :). Change-Id: Ife39947e22a48cac4982db7369c231947f446e17	2014-11-25 17:09:04 +01:00
Julius Volz	c3fcea45e3	Support finer time resolutions than 1 second. Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	443dd33805	Improve instrumentation in storage. Also, fix some other minor bugs. Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	1936a40e75	Minor logging improvement. Change-Id: I7875d1a58ef9c5ff149f18e36f65959a4712fea2	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	192bf52c41	Evict chunkDescs, too. Change-Id: I8b70f22fbf1dfcbc49f9ec391985144649e6ce9c	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	95f392fb2c	Prevent an indexing death spiral. Change-Id: I86b20cd0830d02f87b2f020767257e2d3fb2033c	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	40354eaa29	Reduce directory depth by one. Change-Id: I7f89df61135ff19169ed97633a662685d414c448	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	096fa0f8b2	Squash a number of TODOs. - Staleness delta is now a proper function parameter and not replicated from package ast. - Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion. - For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'. - Verified that math.Modf is not a speed enhancement over conversion (actually 5x slower). - Renamed firstTimeField, lastTimeField into chunkFirstTime and chunkLastTime. - Verified unpin() is sufficiently goroutine-safe. - Decided not to update archivedFingerprintToTimeRange upon series truncation and added a rationale why. Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534	2014-11-25 17:09:04 +01:00


101 commits