prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-26 22:19:40 -08:00

Author	SHA1	Message	Date
Bjoern Rabenstein	2c8fdcbc23	Remove a deadlock during shutdown. If queries are still running when the shutdown is initiated, they will finish _during_ the shutdown. In that case, they might request chunk eviction upon unpinning their pinned chunks. That might completely fill the evict request queue _after_ draining it during storage shutdown. If that ever happens (which is the case if there are _many_ queries still running during shutdown), the affected queries will be stuck while keeping a fingerprint locked. The checkpointing can then not process that fingerprint (or one that shares the same lock). And then we are deadlocked.	2015-01-22 14:42:15 +01:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Bjoern Rabenstein	f298af5756	Use named returns in flock.New.	2015-01-19 14:31:16 +01:00
Bjoern Rabenstein	baca6faa1c	Add double-start protection. This mimics the locking leveldb is performing anyway. Advantages of doing it separately: - Should we ever replace the leveldb implementation by one without double-start protection, we are still good. - In contrast to leveldb, the new code creates a meaningful error message.	2015-01-14 17:13:42 +01:00
Bjoern Rabenstein	ae70eac97d	Adjust the partitioning by outcome.	2015-01-13 18:34:56 +01:00
Julius Volz	a6bc42bc61	Minor formatting/spelling fixups.	2015-01-09 11:04:20 +01:00
Bjoern Rabenstein	0851945054	Add a heuristic to checkpoint early if there are many "dirty" series.	2015-01-08 20:15:58 +01:00
Bjoern Rabenstein	622e8350cd	Fix a bug handling freshly unarchived series. Usually, if you unarchive a series, it is to add something to it, which will create a new head chunk. However, if a series is unarchived, and before anything is added to it, it is handled by the maintenance loop, it will be archived again. In that case, we have to load the chunkDescs to know the lastTime of the series to be archived. Usually, this case will happen only rarely (as a race, has never happened so far, possibly because the locking around unarchiving and the subsequent sample append is smart enough). However, during crash recovery, we sometimes treat series as "freshly unarchived" without directly appending a sample. We might add more cases of that type later, so better deal with archiving properly and load chunkDescs if required.	2015-01-08 16:25:50 +01:00
Bjoern Rabenstein	eb932d1524	Remove a deadlock during shutdown.	2015-01-07 19:02:38 +01:00
Brian Brazil	e56786b221	Have scrape time as a pseudovariable, not a prometheus variable. This ensures it has the right timestamp, and is easier to work with. Switch sd variable away from 'outcome', using total/failed instead.	2014-12-27 00:39:33 +00:00
Bjoern Rabenstein	ff24070a03	Fix embarrassing bug in crash recovery. (And yes, we always knew we need tests for that. I have added a TODO now.) Change-Id: I9cf52bbf98e263e0b79404bda4c442beba9696a8	2014-12-17 17:18:04 +01:00
Julius Volz	c9618d11e8	Introduce copy-on-write for metrics in AST. This depends on changes in: https://github.com/prometheus/client_golang/tree/cow-metrics. Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142	2014-12-12 20:34:55 +01:00
Bjoern Rabenstein	afd864e7f4	Adjust to the new version of goleveldb. (And yes, we do want vendoring for that... This is just the quick fix.) Change-Id: I9d347a64d96de6b3390a0e35c8d466f14bb83e4e	2014-12-10 18:04:29 +01:00
Bjoern Rabenstein	fee88a7a77	Remove the remaining races, new and old. Also, resolve a few other TODOs. Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691	2014-12-03 18:07:23 +01:00
Bjoern Rabenstein	66c80b5ebd	Fix typo. Change-Id: I72608c7841c00145458807d3c3ee29db7b5ac2bc	2014-11-28 12:50:19 +01:00
Bjoern Rabenstein	674624f1c8	Completed more TODOs. - Documented checkpoint file format. - High-level description of series sanitation. - Replace fp.LoadFromString panic with an error. (Change in client_golang already submitted.) - Introduced checks for series file size where appropriate. - Removed two Law of Demeter violations. Change-Id: I555d97a2c8f4769820c2fc8bf5d6f4e160222abc	2014-11-27 20:46:45 +01:00
Bjoern Rabenstein	7d11019aa2	Squash a few trivial TODOs. - Delete unneeded file view_adapter.go. - Assessed that we still need the fingerprints in nodes (to create iterators). - Turned numMemChunkDescs into a metric. Change-Id: I29be963c795a075ec00c095f76bf26405535609d	2014-11-27 18:26:06 +01:00
Bjoern Rabenstein	49683c0c20	Avoid test flags in normal binary. Change-Id: If1fba813a73bf93ea5918dcda326e3ffa81a797d	2014-11-27 18:04:48 +01:00
Bjoern Rabenstein	9bc05052ad	Add line that has mysteriously disappeared after rebase. Change-Id: I3612eb0b626e66e607b363e9801f187d2ba637a3	2014-11-25 17:15:56 +01:00
Bjoern Rabenstein	14bda4180c	Changes after pair code review. Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455	2014-11-25 17:12:59 +01:00
Bjoern Rabenstein	9ea808cd8b	Remove debug log line. Change-Id: Icdd2351b89f2d37ac2b615f9cf872e054c694ad1	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	bb42cc2e2d	Evict based on memory pressure. Evict recently used chunks last. Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	e23ee0f7cc	Fix race in test. Change-Id: I53e1a4c5a6b5f846acd76043166b6cb7bf7d5dc7	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	d73e851b14	Tweak timing in the maintenance loop. Change-Id: I9801c4f9a22c3b3dc1ce1af81fdd9e992a4f4dd7	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	2672aa8ece	Instrument series maintenance. Change-Id: Ie4269d07ad4d23d44230c95a523088b472718e54	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	74c143c4c9	Improve scraper shutdown time. - Stop target pools in parallel. - Stop individual scrapers in goroutines, too. - Timing tweaks. Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	3f61d304ce	Reorganize maintenance loop. Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	c087ee35f7	Remove archiveMtx. Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	7af42eda65	Optimize purging. Now only purge if there is something to purge. Also, set savedFirstTime and archived time range appropriately. (Which is needed for the optimization.) Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	33b959b898	Persist savedFirstTime in checkpoint. Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227	2014-11-25 17:10:30 +01:00
Bjoern Rabenstein	904acd43da	Add crash recovery. Fix the behavior if preload for non-existent series is requested. Instead of returning an error (which triggers a panic further up), simply count those incidents. They can happen regularly, we just want to know if they happen too frequently because that would mean the indexing is behind or broken. Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b	2014-11-25 17:09:43 +01:00
Bjoern Rabenstein	7a9efc9c59	Fix typo in test. Change-Id: I3c2fd76bc5f50446c58f8ef693d9c6595197feaa	2014-11-25 17:09:43 +01:00
Bjoern Rabenstein	4efc60174b	Tweak and verify a few parameters. Remove TODOs accordingly. Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8	2014-11-25 17:09:43 +01:00
Bjoern Rabenstein	5f8e9617ef	Add more tests. Add an end-to-end fuzz and race test. Fix a race exposed by the above. Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb	2014-11-25 17:09:17 +01:00
Bjoern Rabenstein	d215e013b7	Fix the weird chunkDesc shuffling bug. The root cause was that after chunkDesc eviction, the offset between memory representation of chunk layout (via chunkDescs in memory) was shifted against chunks as laid out on disk. Keeping the offset up to date is by no means trivial, so this commit is pretty involved. Also, found a race that for some reason didn't bite us so far: Persisting chunks was completely unlocked, so if chunks were purged on disk at the same time, disaster would strike. However, locking the persisting of chunks revealed interesting deadlocks. Basically, never queue under the fp lock. Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8	2014-11-25 17:09:17 +01:00
Bjoern Rabenstein	a617269b12	Avoid unnecessary cloning of the head chunk. Change-Id: I5da774515d5493166a197b5814d0a720628cfaff	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	f1de5b0c4e	Run checkpointing of in-memory metrics and head chunks periodically. Checkpointing interval is now a command line flag. Along the way, several things were refactored. - Restructure the way the storage is started and stopped. - Number of series in checkpoint is now a uint64, not a varint. (Breaks old checkpoints, needs wipe!) - More consistent naming and order of methods. Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	74c9b34a5e	Improve storage instrumentation even more. Add gauge for chunks and chunkdescs in memory (backed by a global variable to be used later not only for instrumentation but also for memory management). Refactored instrumentation code once more (instrumentation.go is back :). Change-Id: Ife39947e22a48cac4982db7369c231947f446e17	2014-11-25 17:09:04 +01:00
Julius Volz	c3fcea45e3	Support finer time resolutions than 1 second. Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	443dd33805	Improve instrumentation in storage. Also, fix some other minor bugs. Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	1936a40e75	Minor logging improvement. Change-Id: I7875d1a58ef9c5ff149f18e36f65959a4712fea2	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	192bf52c41	Evict chunkDescs, too. Change-Id: I8b70f22fbf1dfcbc49f9ec391985144649e6ce9c	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	95f392fb2c	Prevent an indexing death spiral. Change-Id: I86b20cd0830d02f87b2f020767257e2d3fb2033c	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	40354eaa29	Reduce directory depth by one. Change-Id: I7f89df61135ff19169ed97633a662685d414c448	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	096fa0f8b2	Squash a number of TODOs. - Staleness delta is now a proper function parameter and not replicated from package ast. - Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion. - For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'. - Verified that math.Modf is not a speed enhancement over conversion (actually 5x slower). - Renamed firstTimeField, lastTimeField into chunkFirstTime and chunkLastTime. - Verified unpin() is sufficiently goroutine-safe. - Decided not to update archivedFingerprintToTimeRange upon series truncation and added a rationale why. Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	427c8d53a5	Fix handling of empty chunkDescs while preloading chunks. Change-Id: I73ce89fe0ef90c6eda78218e5be2cbfa0207c364	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	ecee5d8281	Fix head chunk persisting and a chunkDesc race condition. - Head chunk persisting only happens in evictOlderThan, so do it there. (With the previous code, it would never happen.) - Raw accesses to chunkDesc.chunk are now done via isEvicted (with locking). Change-Id: I48b07b56dfea4899b50df159b4ea566954396fcd	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	6b37e47f9e	Remove unused metrics. Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb	2014-11-25 17:09:03 +01:00
Bjoern Rabenstein	2b4ff620aa	Return a nop iterator for series that have been purged completely. Change-Id: I6e92cac4472486feefdecba8593c17867e8c710d	2014-11-25 17:09:03 +01:00
Bjoern Rabenstein	6e3a366f91	Only archive a time series when none of its chunks is pinned. Change-Id: I7e4b67c34b417b8980173bc5dc3b213bd7d698e5	2014-11-25 17:09:03 +01:00
Julius Volz	bfa64248b7	Deal with missing series in preloading. Change-Id: Ibf3a57b329f40a3d5e0b98464a2f45d2f1bd07bf	2014-11-25 17:09:03 +01:00
Bjoern Rabenstein	ca42a22e20	Add safety panic to seriesMap.put. Change-Id: I4d4d2e45cc0f908a33eb1ae6e3ee6796adfcbd1e	2014-11-25 17:09:03 +01:00
Bjoern Rabenstein	83b4fa868d	Fix GetBoundaryValues. Change-Id: I8f8bbdb88e9b24e4c37ff869126ed9343f261ce2	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	b3ed9aa7a2	Clean up start-up and shut-down. Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	4447708c9f	Fix a race in target.go. Also, fix problems in shutdown. Starting serving and shutdown still has to be cleaned up properly. It's a mess. Change-Id: I51061db12064e434066446e6fceac32741c4f84c	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	fd6600850a	Fix race in chunkDesc. Change-Id: Id7bae115d75886e10d44184a690a76777b1531fe	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	1c53c09558	Treat empty chunkDescs properly in preloadChunksForRange. Change-Id: Ida1bd3fe1f9fb0ea2d5dbb9704be926f0824f873	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	934d09f738	Fix race during shutdown. Change-Id: I2f8bf48d92a14f1e5ecde27c1b138734d7653394	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	38fc24d0ed	Fix targetpool_test.go and other tests. Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624	2014-11-25 17:08:26 +01:00
Julius Volz	7f5d3c2c29	Fix and improve the fp locker. Benchmark: $ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4 OLD BenchmarkFingerprintLockerParallel 500000 3618 ns/op BenchmarkFingerprintLockerParallel-2 100000 12257 ns/op BenchmarkFingerprintLockerParallel-4 500000 10164 ns/op BenchmarkFingerprintLockerSerial 10000000 283 ns/op BenchmarkFingerprintLockerSerial-2 10000000 284 ns/op BenchmarkFingerprintLockerSerial-4 10000000 288 ns/op NEW BenchmarkFingerprintLockerParallel 1000000 1018 ns/op BenchmarkFingerprintLockerParallel-2 1000000 1164 ns/op BenchmarkFingerprintLockerParallel-4 2000000 910 ns/op BenchmarkFingerprintLockerSerial 50000000 56.0 ns/op BenchmarkFingerprintLockerSerial-2 50000000 47.9 ns/op BenchmarkFingerprintLockerSerial-4 50000000 54.5 ns/op Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581	2014-11-25 17:07:45 +01:00
Bjoern Rabenstein	7ad55ef83c	Actually close the iterator channels. Change-Id: I6f6a2aef5ff55c6b2d21ad91d02ae6b0ecba4ae8	2014-11-25 17:07:45 +01:00
Bjoern Rabenstein	8fba3302bc	Bold changes to concurrency. (WIP. Probably doesn't work yet.) Change-Id: Id1537dfcca53831a1d428078a5863ece7bdf4875	2014-11-25 17:07:45 +01:00
Bjoern Rabenstein	fcdf5a8ee7	Fix bugs in chunk evict code. Also, simplify code by re-looking up metric in metric map. Change-Id: Ib2092f9184374e5a543e87d3a9f4a74fda64b193	2014-11-25 17:07:45 +01:00
Bjoern Rabenstein	7e6a03fbf9	Fix a few concurrency issues before starting to use the new fp locker. Change-Id: I8615e8816e79ef0882e123163ee590c739b79d12	2014-11-25 17:07:45 +01:00
Julius Volz	db92620163	Instrument eviction and purge durations. Change-Id: Ia5b2319363ad2644674c9b7a94162a89bcc296fb	2014-11-25 17:07:45 +01:00
Julius Volz	e0ee7ec7ab	Add fingerprintLocker for locking individual fingerprints. Change-Id: Id41ba555715229edf7d6543f56736b82f6eff1ef	2014-11-25 17:07:45 +01:00
Julius Volz	df1b2a2422	Fix indexing latency instrumentation. Change-Id: I532c170121cd2996d1a378adbb1fd551cd5a4e38	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	01dd618a20	Fix a locking bug. Change-Id: I183780785991d0b4165ce9186f53eb8201fb3ed5	2014-11-25 17:07:44 +01:00
Julius Volz	a746fbb8bc	Instrument indexing: queue length, batch sizes and latencies. Change-Id: I60bcbd24b160e47d418a485d8cffa39344a257c6	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	aea32b0b4b	Avoid redundant fingerprint calculation. Change-Id: Ief8a165dcfa5030226953346ec9dfe4a7787df1f	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	e9ff29c547	Comment/code cleanup. Change-Id: I38736e3d0fec79759a2bafa35aecf914480ff810	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	0031a448e2	Add WaitForIndexing. Change-Id: I5a5c975c4246632f937413322c855bbe63d00802	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	c7aad110fb	Add an indexing queue and batch the ops. Some other improvements on the way, in particular codec -> codable renaming and addition of LookupSet methods. Change-Id: I978f8f3f84ca8e4d39a9d9f152ae0ad274bbf4e2	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	71206dbc06	More code cleanups. Add license text everywhere. And others.... Change-Id: I11ccde267a2ef7eb366c4788ba7aeae14ba7545c	2014-11-25 17:07:44 +01:00
Julius Volz	f0d5d4bda3	Fix bug around index purging. Change-Id: I8cea00e03f72bbeead2cbd2d26b34d986059ced0	2014-11-25 17:07:44 +01:00
Julius Volz	630b5a087a	Also consider on-disk fingerprints during purge. This reintroduces LevelDB iterators so that we can iterate through all the on-disk fingerprints. Change-Id: I007ee4638d038d2a4461bbda27f30fcaad411474	2014-11-25 17:07:35 +01:00
Bjoern Rabenstein	f5f9f3514a	Major code cleanup. - Make it go-vet and golint clean. - Add comments, TODOs, etc. Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2	2014-11-25 17:02:53 +01:00
Bjoern Rabenstein	3592dc2359	Implement series eviction. Change-Id: I7a503e0ba78aae3761d032851b06f2807122b085	2014-11-25 17:02:52 +01:00
Bjoern Rabenstein	bbf49200ab	Implement methods in persistence.go. Change-Id: I804cdd0b30420e171825fd86fe1281eca0d5e638	2014-11-25 17:02:23 +01:00
Bjoern Rabenstein	5a128a04a9	Major reorganization of the storage. Most important, the heads file will now persist all the chunk descs, too. Implicitly, it will serve as the persisted form of the fp-to-series map. Change-Id: Ic867e78f2714d54c3b5733939cc5aef43f7bd08d	2014-11-25 17:02:01 +01:00
Bjoern Rabenstein	e7cb9ddb9f	Use a sync.pool for the staging buffer in codec.go. Change-Id: I1aae6847f77b5a7c75582b07c199b1943cf90552	2014-11-25 17:02:01 +01:00
Bjoern Rabenstein	4770cf76a4	Make index package more self-contained. Moved interna from diskPersistence into the indexer. TotalIndexer now called diskIndexer. Change-Id: I6c8c62cb171f12bbd8a5474773af7786d71ba388	2014-11-25 17:02:01 +01:00
Bjoern Rabenstein	89f10e8eb2	Move to using the standard library interfaces for encoding/decoding. BinaryMarshaler instead of encodable. BinaryUnmarshaler instead of decodable. Left 'codable' in place for lack of a better word. Change-Id: I8a104be7d6db916e8dbc47ff95e6ff73b845ac22	2014-11-25 17:02:01 +01:00
Bjoern Rabenstein	af77d5ef0b	Added a few missing implementations in index.go. Also, added closing of persistence and mem storage. Change-Id: Iacf0d22c3520dd2584d9546984c1f8a5ed6cd54e	2014-11-25 17:02:01 +01:00
Julius Volz	cca7ebe906	Some more cleanups / obsolete code removals. Change-Id: I584144ceeeedafdb114266d8a6d2513e67b1d010	2014-11-25 17:02:00 +01:00
Julius Volz	7e85711df0	Beginnings of a tiered index implementation. This reintroduces a LevelDB-based metrics index. Change-Id: I4111540301c52255a07b2f570761707a32f72c05	2014-11-25 17:02:00 +01:00
Julius Volz	8dfaa5ecd2	Remove use of freelists for chunk bufs. Change-Id: Ib887fdb61e1d96da0cd32545817b925ba88831c1	2014-11-25 17:02:00 +01:00
Julius Volz	7b35e0f0b8	Use constants from math package instead of literals. Change-Id: I55427ba32c2cbb32ee42ec1e3153160965ab8b3c	2014-11-25 17:02:00 +01:00
Julius Volz	15929eece2	Unpin any already loaded chunks upon preloading error. Change-Id: Ib451136e3ef21bce8b814c21b66eaab727ab341b	2014-11-25 17:02:00 +01:00
Julius Volz	fd01d07589	Check that chunk buffer length fits in 16 bit. Change-Id: Id086a54aa8a1990c1979e747c1c02e53bed6d447	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	1ca7f24137	Remove float diff tolerance altogether. Change-Id: I9ea9683a4665d5800fca75560bb4b8a8b4406d55	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	d742edfe0d	Fix precision loss. Large delta values often imply a difference between a large base value and the large delta value, potentially resulting in small numbers with a huge precision error. Since large delta values need 8 bytes anyway, we are not even saving memory. As a solution, always save the absolute value rather than a delta once 8 bytes would be needed for the delta. Timestamps are then saved as 8 byte integers, while values are always saved as float64 in that case. Change-Id: I01100d600515e16df58ce508b50982ffd762cc49	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	dc2e463a97	Improvements after review. Change-Id: I484359282d4c7113518bbbb131f4f18383c08fdb	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	52c9dc43a3	Improve testing. In particular, create a fuzz test for time series. Change-Id: I523a17912405a0b6b46bd395c781d201dfe55036	2014-11-25 17:02:00 +01:00
Julius Volz	3b25867d61	Add chunk persistence tests, fix storage tests. Change-Id: Id0b8f5382e99efa839cc0f826e92bbda985fe9a9	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	ecdf5ab14f	Index-persistence switched from gob to a hand-coded solution. Change-Id: Ib4ec42535bd08df16d34d4774bb638e35c5a1841	2014-11-25 17:02:00 +01:00
Julius Volz	e7ed39c9a6	Initial experimental snapshot of next-gen storage. Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d	2014-11-25 17:02:00 +01:00
Julius Volz	c6e9f085a3	Update used Go version to 1.3. Go downloads moved to a different URL and require following redirects (curl's '-L' option) now. Go 1.3 deliberately randomizes ranges over maps, which uncovered some bugs in our tests. These are fixed too. Change-Id: Id2d9e185d8d2379a9b7b8ad5ba680024565d15f4	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	1909686789	Make metrics exported by the Prometheus server itself more consistent. - Always spell out the time unit (e.g. milliseconds instead of ms). - Remove "_total" from the names of metrics that are not counters. - Make use of the "Namespace" and "Subsystem" fields in the options. - Removed the "capacity" facet from all metrics about channels/queues. These are all fixed via command line flags and will never change during the runtime of a process. Also, they should not be part of the same metric family. I have added separate metrics for the capacity of queues as convenience. (They will never change and are only set once.) - I left "metric_disk_latency_microseconds" unchanged, although that metric measures the latency of the storage device, even if it is not a spinning disk. "SSD" is read by many as "solid state disk", so it's not too far off. (It should be "solid state drive", of course, but "metric_drive_latency_microseconds" is probably confusing.) - Brian suggested to not mix "failure" and "success" outcome in the same metric family (distinguished by labels). For now, I left it as it is. We are touching some bigger issue here, especially as other parts in the Prometheus ecosystem are following the same principle. We still need to come to terms here and then change things consistently everywhere. Change-Id: If799458b450d18f78500f05990301c12525197d3	2014-11-25 17:02:00 +01:00
Julius Volz	80b3d3bf34	Speed up disk flushes by removing unnecessary sort. The first sort in groupByFingerprint already ensures that all resulting sample lists contain only one fingerprint. We also already assume that all samples passed into AppendSamples (and thus groupByFingerprint) are chronologically sorted within each fingerprint. The extra chronological sort is thus superfluous. Furthermore, this second sort didn't only sort chronologically, but also compared all metric fingerprints again (although we already know that we're only sorting within samples for the same fingerprint). This caused a huge memory and runtime overhead. In a heavily loaded real Prometheus, this brought down disk flush times from ~9 minutes to ~1 minute. OLD: BenchmarkLevelDBAppendRepeatingValues 5 331391808 ns/op 44542953 B/op 597788 allocs/op BenchmarkLevelDBAppendsRepeatingValues 5 329893512 ns/op 46968288 B/op 3104373 allocs/op NEW: BenchmarkLevelDBAppendRepeatingValues 5 299298635 ns/op 43329497 B/op 567616 allocs/op BenchmarkLevelDBAppendsRepeatingValues 20 92204601 ns/op 1779454 B/op 70975 allocs/op Change-Id: Ie2d8db3569b0102a18010f9e106e391fda7f7883	2014-11-25 17:01:59 +01:00


468 commits