prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
beorn7	7199a9d9d4	storage: Guard against appending to evicted chunk Fixes #2480. For a certain definition of "fixes". This is something that should never happen. Sadly, it does happen, albeit extremely rarely. This could be some weird corner case we haven't covered yet. Or it happens as a consequence of data corruption or a crash recovery gone bad. This is not a "real" fix as we don't know the root cause of the incident reported in #2480. However, this makes sure the server does not crash, but deals gracefully with the problem: The series in question is quarantined, which even makes it available for forensics.	2017-04-06 20:02:52 +02:00
Björn Rabenstein	516a96d9a3	Merge pull request #2587 from prometheus/beorn7/storage2 storage: Mark storage as dirty if indexing fails	2017-04-06 16:42:06 +02:00
beorn7	ed5f68f382	storage: Increment s.persistErrors on all persist errors Fixes #2091	2017-04-06 15:55:15 +02:00
beorn7	f3365c4f26	storage: Mark storage as dirty if indexing fails	2017-04-06 15:29:33 +02:00
Alexey Palazhchenko	17f15d024a	Small fixes. (#2578) Fix typos. Simplify with gofmt -s	2017-04-05 14:24:22 +01:00
beorn7	ae286385fd	storage: Check for negative values from varint decoding Sadly, we have a number of places where we use varint encoding for numbers that cannot be negative. We could have saved a bit by using uvarint encoding. On the bright side, we now have a 50% chance to detect data corruption. :-/ Fixes #1800 and #2492.	2017-04-04 19:14:52 +02:00
Björn Rabenstein	50e4f49b7e	Merge pull request #2561 from prometheus/beorn7/storage2 storage: Evict unused chunk.Descs in crash recovery	2017-04-04 00:05:03 +02:00
beorn7	08fc6cbd39	storage: Evict unused chunk.Descs in crash recovery This is in line with the v1.5 change in paradigm to not keep chunk.Descs without chunks around after a series maintenance. It's mainly motivated by avoiding excessive amounts of RAM usage during crash recovery. The code avoids creating memory time series with zero chunk.Descs as that is prone to trigger weird effects. (Series maintenance would archive series with zero chunk.Descs, but we cannot do that here because the archive indices still have to be checked.)	2017-04-04 00:04:22 +02:00
beorn7	d284ffab03	storage: Replace fpIter by sortedFPs The fpIter was kind of cumbersome to use and required a lock for each iteration (which wasn't even needed for the iteration at startup after loading the checkpoint). The new implementation here has an obvious penalty in memory, but it's only 8 byte per series, so 80MiB for a beefy server with 10M memory time series (which would probably need ~100GiB RAM, so the memory penalty is only 0.1% of the total memory need). The big advantage is that now series maintenance happens in order, which leads to the time between two maintenances of the same series being less random. Ideally, after each maintenance, the next maintenance would tackle the series with the largest number of non-persisted chunks. That would be quite an effort to find out or track, but with the approach here, the next maintenance will tackle the series whose previous maintenance is longest ago, which is a good approximation. While this commit won't change the _average_ number of chunks persisted per maintenance, it will reduce the mean time a given chunk has to wait for its persistence and thus reduce the steady-state number of chunks waiting for persistence. Also, the map iteration in Go is non-deterministic but not truly random. In practice, the iteration appears to be somewhat "bucketed". You can often observe a bunch of series with similar duration since their last maintenance, i.e. you see batches of series with similar number of chunks persisted per maintenance. If that batch is relatively young, a whole lot of series are maintained with very few chunks to persist. (See screenshot in PR for a better explanation.)	2017-04-03 15:34:46 +02:00
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flag, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and using -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics instrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement them and allow them to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisited. One might be tempted to let the user specify an estimated value for the RSS usage, and then internally set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	2017-03-27 14:33:50 +02:00
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series ceases to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deploy turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vain if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	2017-03-26 23:44:50 +02:00
beorn7	48d221c11e	storage: Fix typo in comment	2017-03-16 11:49:41 +01:00
Jeremy Meulemans	025c828976	Changed to open_head_chunks to address review. Now incrementing numHeadChunks directly.	2017-02-17 07:10:13 -06:00
Jeremy Meulemans	074050b8c0	Updating for failed codeclimate check.	2017-02-16 18:04:28 -06:00
Jeremy Meulemans	f70b52d0b6	Adding gauge for number of open head chunks. Fixes #1710	2017-02-16 17:56:45 -06:00
beorn7	d771185a43	storage: Fix chunkIndexToStartSeek calculation With a high enough shrink ratio and enough chunks to persist, the cutoff point could be _outside_ of the file, which wreaks havoc in the storage.	2017-02-10 11:42:59 +01:00
beorn7	73bd5e4dff	Merge branch 'beorn7/storage' into beorn7/storage3	2017-02-09 14:44:10 +01:00
beorn7	46a0837816	storage: Fix offset returned by dropAndPersistChunks This is another corner-case that was previously never exercised because the rewriting of a series file was never prevented by the shrink ratio. Scenario: There is an existing series on disk, which is archived. If a new sample comes in for that file, a new chunk in memory is created, and the chunkDescsOffset is set to -1. If series maintenance happens before the series has at least one chunk to persist _and_ an insufficient number of chunks on disk is old enough for purging (so that the shrink ratio kicks in), dropAndPersistChunks would return 0, but it should return the chunk length of the series file.	2017-02-09 14:35:07 +01:00
beorn7	9d12204da5	Merge branch 'release-1.5'	2017-02-09 13:11:53 +01:00
beorn7	bed4934224	storage: One more persist error code path discovered Also, in that code path, set chunkDescsOffset to 0 rather than -1 in case of "dropped more chunks from persistence than from memory" so that no other weird things happen before the series is quarantined for good.	2017-02-09 11:51:40 +01:00
beorn7	242d8edcb5	Merge branch 'release-1.5'	2017-02-08 17:28:09 +01:00
beorn7	8c8baaa558	storage: writeMemorySeries needs to return true for quarantined series This is another fallout of my bug hunt.	2017-02-08 16:28:56 +01:00
Mitsuhiro Tanda	be8b1eb656	storage: optimize dropping chunks by using minShrinkRatio (#2397) storage: prevent unnecessary chunk header reading if minShrinkRatio > 0	2017-02-07 17:33:54 +01:00
beorn7	2363a90adc	storage: Do not throw away fully persisted memory series in checkpointing	2017-02-06 17:39:59 +01:00
beorn7	244a65fb29	storage: Increase persist watermark before calling append The append call may reuse cds, and thus change its len. (In practice, this wouldn't happen as cds should have len==cap. Still, the previous order of lines was problematic.)	2017-02-05 02:25:09 +01:00
beorn7	75282b27ba	storage: Added checks for invariants	2017-02-04 23:40:22 +01:00
beorn7	31e9db7f0c	storage: Simplify evictChunkDesc method	2017-02-04 22:29:37 +01:00
beorn7	65dc8f44d3	storage: Test for errors returned by MaybePopulateLastTime	2017-02-01 23:43:58 +01:00
beorn7	752fac60ae	storage: Remove race condition from TestLoop	2017-02-01 23:43:58 +01:00
beorn7	4ccfc93dcf	storage: Set shrink ratio in the constructor.	2017-02-01 15:37:16 +01:00
beorn7	b2f086c6c4	storage: Expose bug of not setting the shrink ratio in the constructor	2017-02-01 15:37:10 +01:00
Brian Brazil	c1b547a90e	Only checkpoint chunkdescs and series that need persisting. (#2340) This decreases checkpoint size by not checkpointing things that don't actually need checkpointing. This is fully compatible with the v2 checkpoint format, as it makes series appear as though the only chunkdescs in memory are those that need persisting.	2017-01-17 00:59:38 +00:00
Brian Brazil	f64c231dad	Allow checkpoints and maintenance to happen concurrently. (#2321) This is essential on larger Prometheus servers, as otherwise checkpoints prevent sufficient persisting of chunks to disk.	2017-01-13 17:24:19 +00:00
Brian Brazil	1dcb7637f5	Add various persistence related metrics (#2333 ) Add metrics around checkpointing and persistence * Add a metric to say if checkpointing is happening, and another to track total checkpoint time and count. This breaks the existing prometheus_local_storage_checkpoint_duration_seconds by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds as the former name is more appropriate for a summary. * Add metric for last checkpoint size. * Add metric for series/chunks processed by checkpoints. For long checkpoints it'd be useful to see how they're progressing. * Add metric for dirty series * Add metric for number of chunks persisted per series. You can get the number of chunks from chunk_ops, but not the matching number of series. This helps determine the size of the writes being made. * Add metric for chunks queued for persistence Chunks created includes both chunks that'll need persistence and chunks read in for queries. This only includes chunks created for persistence. * Code review comments on new persistence metrics.	2017-01-11 15:11:19 +00:00
Brian Brazil	f9e581907a	Make index queue bigger. (#2322) When a large Prometheus starts up fresh it can take many minutes to warm up and clear out the index queue. A larger queue means less blocking, bigger batches and cuts down startup time by ~50%.	2017-01-05 17:57:42 +00:00
Mitsuhiro Tanda	7e369b9318	expose max memory chunks metrics (#2303 ) * expose max memory chunks metrics	2016-12-27 18:34:07 +00:00
Brian Brazil	93b70ee4ea	Evict chunk descs of all unloaded chunks during maintenance. (#2297 ) Keeping these around has two problems: 1) Each desc takes 64 bytes, 10 of them is 640B. This is a lot of overhead on a 1024 byte chunk. 2) It can take well over a week to reach a point where this and thus Prometheus memory usage as a whole enters steady state. This makes RAM estimation very hard for users, and makes it difficult to investigate things like memory fragmentation. Instead we'll wipe them during each memory series maintenance cycle, and if a query pulls them in they'll hang around as cache until the next cycle.	2016-12-22 13:49:03 +00:00
Tristan Colgate	30be8e0b8a	ignore dotfiles in data directory	2016-12-15 11:48:23 +00:00
Björn Rabenstein	45570e5972	Merge pull request #2277 from prometheus/beorn7/storage2 storage: Sanity-check number of loaded chunk descs	2016-12-14 02:59:10 +01:00
beorn7	253be23c00	storage: Sanity-check number of loaded chunk descs Two cases: - An unarchived metric must have at least one chunk desc loaded upon unarchival. Otherwise, the file is gone or has size 0, which is an inconsistency (because the series is still indexed in the archive index). Hence, quarantining is triggered. - If loading the chunk descs of a series with a known chunkDescsOffset (i.e. != -1), the number of chunks loaded must be equal to chunkDescsOffset. If not, there is a data corruption. An error is returned, which leads to quarantining. In any case, there is a guard added to not access the 1st element of an empty chunkDescs slice. (That's what triggered the crashes in issue 2249.) A time series with unknown chunkDescsOffset and no chunks in memory and no chunks on disk either could trigger that case. I would assume such a "null series" doesn't exist, but it's not entirely unthinkable and unreasonable to happen (perhaps in future uses of the storage). (Create a series, and then something tries to preload chunks before the first sample is added.)	2016-12-13 23:19:39 +01:00
Björn Rabenstein	5f0c0e43cf	Merge pull request #2276 from prometheus/beorn7/storage storage: Catch data corruption that leads to division by zero	2016-12-13 23:13:39 +01:00
beorn7	837c029b16	storage: Fix linter issue Go style tries to avoid indented `else` blocks.	2016-12-13 19:05:30 +01:00
beorn7	4719482f5f	storage: Make tests go-vet and golint clean	2016-12-13 17:07:27 +01:00
beorn7	485ac8dff7	storage: Verify validity of byte length when unmarshalling (double)delta chunks This makes sure a division-by-zero crash cannot happen in the Len() method. Fixes #2773	2016-12-13 17:07:27 +01:00
tattsun	e714079cf2	storage: fix error message (#2270) * storage: add error message	2016-12-09 22:36:27 +00:00
Christopher M. Luciano	148b006e25	Clarify error message when Prometheus data dir finds unexpected files	2016-12-05 10:51:57 -05:00
Julius Volz	127332c56f	Merge pull request #2168 from tomwilkie/chunk-len Add call to estimate number of samples in a chunk to the API	2016-11-17 23:13:50 -08:00
Tom Wilkie	585878cdb2	Add call to estimate number of samples in a chunk to the API	2016-11-17 19:09:59 +00:00
Björn Rabenstein	036715370f	Merge pull request #2184 from huydx/master Fix possible memory leak by defer inside loop	2016-11-14 15:26:39 +01:00
huydx	c999902761	Fix possible memory leak by defer inside loop	2016-11-14 14:08:08 +09:00


375 commits