prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Björn Rabenstein	50e4f49b7e	Merge pull request #2561 from prometheus/beorn7/storage2 storage: Evict unused chunk.Descs in crash recovery	2017-04-04 00:05:03 +02:00
beorn7	08fc6cbd39	storage: Evict unused chunk.Descs in crash recovery This is in line with the v1.5 change in paradigm to not keep chunk.Descs without chunks around after a series maintenance. It's mainly motivated by avoiding excessive amounts of RAM usage during crash recovery. The code avoids to create memory time series with zero chunk.Descs as that is prone to trigger weird effects. (Series maintenance would archive series with zero chunk.Descs, but we cannot do that here because the archive indices still have to be checked.)	2017-04-04 00:04:22 +02:00
beorn7	d284ffab03	storage: Replace fpIter by sortedFPs The fpIter was kind of cumbersome to use and required a lock for each iteration (which wasn't even needed for the iteration at startup after loading the checkpoint). The new implementation here has an obvious penalty in memory, but it's only 8 byte per series, so 80MiB for a beefy server with 10M memory time series (which would probably need ~100GiB RAM, so the memory penalty is only 0.1% of the total memory need). The big advantage is that now series maintenance happens in order, which leads to the time between two maintenances of the same series being less random. Ideally, after each maintenance, the next maintenance would tackle the series with the largest number of non-persisted chunks. That would be quite an effort to find out or track, but with the approach here, the next maintenance will tackle the series whose previous maintenance is longest ago, which is a good approximation. While this commit won't change the _average_ number of chunks persisted per maintenance, it will reduce the mean time a given chunk has to wait for its persistence and thus reduce the steady-state number of chunks waiting for persistence. Also, the map iteration in Go is non-deterministic but not truly random. In practice, the iteration appears to be somewhat "bucketed". You can often observe a bunch of series with similar duration since their last maintenance, i.e. you see batches of series with similar number of chunks persisted per maintenance. If that batch is relatively young, a whole lot of series are maintained with very few chunks to persist. (See screenshot in PR for a better explanation.)	2017-04-03 15:34:46 +02:00
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flage, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and use -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics intstrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement them and allow them to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisted. One might be tempted to let the user specify an estimated value for the RSS usage, and then internall set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not at least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	2017-03-27 14:33:50 +02:00
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series stops to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deoply turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vein if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	2017-03-26 23:44:50 +02:00
Jeremy Meulemans	025c828976	Changed to open_head_chunks to address review. Now incrementing numHeadChunks directly.	2017-02-17 07:10:13 -06:00
Jeremy Meulemans	074050b8c0	Updating for failed codeclimate check.	2017-02-16 18:04:28 -06:00
Jeremy Meulemans	f70b52d0b6	Adding gauge for number of open head chunks. Fixes #1710	2017-02-16 17:56:45 -06:00
beorn7	bed4934224	storage: One more persist error code path discovered Also, in that code path, set chunkDescsOffset to 0 rather than -1 in case of "dropped more chunks from persistence than from memory" so that no other weird things happen before the series is quarantined for good.	2017-02-09 11:51:40 +01:00
beorn7	8c8baaa558	storage: writeMemorySeries needs to return true for quarantined series This is another fallout of my bug hunt.	2017-02-08 16:28:56 +01:00
beorn7	65dc8f44d3	storage: Test for errors returned by MaybePopulateLastTime	2017-02-01 23:43:58 +01:00
Brian Brazil	f64c231dad	Allow checkpoints and maintenance to happen concurrently. (#2321 ) This is essential on larger Prometheus servers, as otherwise checkpoints prevent sufficient persisting of chunks to disk.	2017-01-13 17:24:19 +00:00
Brian Brazil	1dcb7637f5	Add various persistence related metrics (#2333 ) Add metrics around checkpointing and persistence * Add a metric to say if checkpointing is happening, and another to track total checkpoint time and count. This breaks the existing prometheus_local_storage_checkpoint_duration_seconds by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds as the former name is more appropriate for a summary. * Add metric for last checkpoint size. * Add metric for series/chunks processed by checkpoints. For long checkpoints it'd be useful to see how they're progressing. * Add metric for dirty series * Add metric for number of chunks persisted per series. You can get the number of chunks from chunk_ops, but not the matching number of series. This helps determine the size of the writes being made. * Add metric for chunks queued for persistence Chunks created includes both chunks that'll need persistence and chunks read in for queries. This only includes chunks created for persistence. * Code review comments on new persistence metrics.	2017-01-11 15:11:19 +00:00
Mitsuhiro Tanda	7e369b9318	expose max memory chunks metrics (#2303 ) * expose max memory chunks metrics	2016-12-27 18:34:07 +00:00
beorn7	253be23c00	storage: Sanity-check number of loaded chunk descs Two cases: - An unarchived metric must have at least one chunk desc loaded upon unarchival. Otherwise, the file is gone or has size 0, which is an inconsistency (because the series is still indexed in the archive index). Hence, quarantining is triggered. - If loading the chunk descs of a series with a known chunkDescsOffset (i.e. != -1), the number of chunks loaded must be equal to chunkDescsOffset. If not, there is a data corruption. An error is returned, which leads to qurantining. In any case, there is a guard added to not access the 1st element of an empty chunkDescs slice. (That's what triggered the crashes in issue 2249.) A time series with unknown chunkDescsOffset and no chunks in memory and no chunks on disk either could trigger that case. I would assume such a "null series" doesn't exist, but it's not entirely unthinkable and unreasonable to happen (perhaps in future uses of the storage). (Create a series, and then something tries to preload chunks before the first sample is added.)	2016-12-13 23:19:39 +01:00
Fabian Reinartz	6703404cb4	Merge remote-tracking branch 'origin/release-1.2'	2016-11-01 16:35:22 +01:00
beorn7	c5bd178b93	Protect exported Querier interface method against negative time ranges	2016-11-01 15:05:01 +01:00
Fabian Reinartz	8fa18d564a	storage: enhance Querier interface usage This extracts Querier as an instantiateable and closeable object rather than just defining extending methods of the storage interface. This improves composability and allows abstracting query transactions, which can be useful for transaction-level caches, consistent data views, and encapsulating teardown.	2016-10-16 10:39:29 +02:00
Björn Rabenstein	1e2f03f668	Merge pull request #2005 from redbaron/microoptimise-matching Microoptimise matching	2016-10-05 17:26:56 +02:00
Maxim Ivanov	e6db9f8159	New fpsForLabelMatchers and seriesForLabelMatchers methods These more specific methods have replaced `metricForLabelMatchers` in cases where its `map[fingerprint]metric` result type was not necessary or was used as an intermediate step Avoids duplicated calls to `seriesForRange` from `QueryRange` and `QueryInstant` methods.	2016-10-05 15:15:54 +01:00
Julius Volz	c9d4526428	Unpublish accidentally published series methods There were some more accidentally published methods of the memorySeries type which I didn't notice when reviewing https://github.com/prometheus/prometheus/pull/2011	2016-10-03 00:04:56 +02:00
Maxim Ivanov	4978a65495	Extract initial FP candidate build logic into candidateFPsForLabelMatchers method No functional changes otherwise	2016-10-02 17:35:02 +01:00
Maxim Ivanov	c048a0cde8	Add metrics to result after checking all matchers Should be marginally faster and somewhat more GC friendly	2016-10-02 17:35:02 +01:00
Julius Volz	c25f0de5ae	Remove local.ZeroSample{,Pair}, use model definitions	2016-09-28 23:42:45 +02:00
Julius Volz	044ebce779	Review fixups.	2016-09-28 23:42:44 +02:00
Julius Volz	d30a3c7c0f	Fix accidental publishing of memorySeries.firstTime()	2016-09-26 13:25:27 +02:00
Julius Volz	ab80ced756	storage: separate chunk package, publish more names This is a followup to https://github.com/prometheus/prometheus/pull/2011. This publishes more of the methods and other names of the chunk code and moves the chunk code to its own package. There's some unavoidable ugliness: the chunk and chunkDesc metrics are used by both packages, so I had to move them to the chunk package. That isn't great, but I don't see how to do it better without a larger redesign of everything. Same for the evict requests and some other types.	2016-09-26 13:25:11 +02:00
Matthew Campbell	67d76e3a5d	timeseries: store varbit encoded data into cassandra	2016-09-21 17:56:55 +02:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Julius Volz	3bfec97d46	Make the storage interface higher-level. See discussion in https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g The main idea is that the user of a storage shouldn't have to deal with fingerprints anymore, and should not need to do an individual preload call for each metric. The storage interface needs to be made more high-level to not expose these details. This also makes it easier to reuse the same storage interface for remote storages later, as fewer roundtrips are required and the fingerprint concept doesn't work well across the network. NOTE: this deliberately gets rid of a small optimization in the old query Analyzer, where we dedupe instants and ranges for the same series. This should have a minor impact, as most queries do not have multiple selectors loading the same series (and at the same offset).	2016-07-25 13:59:22 +02:00
beorn7	fc6737b7fb	storage: improve index lookups tl;dr: This is not a fundamental solution to the indexing problem (like tindex is) but it at least avoids utilizing the intersection problem to the greatest possible amount. In more detail: Imagine the following query: nicely:aggregating:rule{job="foo",env="prod"} While it uses a nicely aggregating recording rule (which might have a very low cardinality), Prometheus still intersects the low number of fingerprints for `{__name__="nicely:aggregating:rule"}` with the many thousands of fingerprints matching `{job="foo"}` and with the millions of fingerprints matching `{env="prod"}`. This totally innocuous query is dead slow if the Prometheus server has a lot of time series with the `{env="prod"}` label. Ironically, if you make the query more complicated, it becomes blazingly fast: nicely:aggregating:rule{job=~"foo",env=~"prod"} Why so? Because Prometheus only intersects with non-Equal matchers if there are no Equal matchers. That's good in this case because it retrieves the few fingerprints for `{__name__="nicely:aggregating:rule"}` and then starts right ahead to retrieve the metric for those FPs and checking individually if they match the other matchers. This change is generalizing the idea of when to stop intersecting FPs and go into "retrieve metrics and check them individually against remaining matchers" mode: - First, sort all matchers by "expected cardinality". Matchers matching the empty string are always worst (and never used for intersections). Equal matchers are in general consider best, but by using some crude heuristics, we declare some better than others (instance labels or anything that looks like a recording rule). - Then go through the matchers until we hit a threshold of remaining FPs in the intersection. This threshold is higher if we are already in the non-Equal matcher area as intersection is even more expensive here. - Once the threshold has been reached (or we have run out of matchers that do not match the empty string), start with "retrieve metrics and check them individually against remaining matchers". A beefy server at SoundCloud was spending 67% of its CPU time in index lookups (fingerprintsForLabelPairs), serving mostly a dashboard that is exclusively built with recording rules. With this change, it spends only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26 cores to 18 cores. The median latency for query_range dropped from 14s to 50ms(!). As expected, higher percentile latency didn't improve that much because the new approach is _occasionally_ running into the worst case while the old one was _systematically_ doing so. The 99th percentile latency is now about as high as the median before (14s) while it was almost twice as high before (26s).	2016-07-20 17:35:53 +02:00
beorn7	064b57858e	Consistently use the `Seconds()` method for conversion of durations This also fixes one remaining case of recording integral numbers of seconds only for a metric, i.e. this will probably fix #1796.	2016-07-07 15:24:35 +02:00
Julius Volz	91401794fa	storage: Make MemorySeriesStorage a public type See https://twitter.com/fabxc/status/748032597876482048	2016-06-29 08:14:23 +02:00
Fabian Reinartz	425736a377	*: remove last remainers of non-second metrics	2016-06-23 17:50:39 +02:00
Julius Volz	b7b6717438	Separate query interface out of local.Storage. PromQL only requires a much narrower interface than local.Storage in order to run queries. Narrower interfaces are easier to replace and test, too. We could also change the web interface to use local.Querier, except that we'll probably use appending functions from there in the future.	2016-06-23 15:14:38 +02:00
beorn7	99881ded63	Make the number of fingerprint mutexes configurable With a lot of series accessed in a short timeframe (by a query, a large scrape, checkpointing, ...), there is actually quite a significant amount of lock contention if something similar is running at the same time. In those cases, the number of locks needs to be increased. On the same front, as our fingerprints don't have a lot of entropy, I introduced some additional shuffling. With the current state, anly changes in the least singificant bits of a FP would matter.	2016-06-02 19:18:00 +02:00
beorn7	b2ef4dc52d	Correctly identify no-op appends if the value is NaN This requires an updating of the vendored commen.model package, which I will do once https://github.com/prometheus/common/pull/40 is merged.	2016-05-19 18:32:47 +02:00
beorn7	07a294ac15	Doc comment fixes	2016-04-26 01:05:56 +02:00
beorn7	20cba1ed8f	Initialize metric vectors in memorySeriesStorage	2016-04-25 17:08:07 +02:00
beorn7	d566808d40	Bring back logging of discarded samples But only on DEBUG level. Also, count and report the two cases of out-of-order timestamps on the one hand and same timestamp but different value on the other hand separately.	2016-04-25 16:43:52 +02:00
beorn7	a90d645378	Checkpoint fingerprint mappings only upon shutdown Before, we checkpointed after every newly detected fingerprint collision, which is not a problem as long as collisions are rare. However, with a sufficient number of metrics or particular nature of the data set, there might be a lot of collisions, all to be detected upon the first set of scrapes, and then the checkpointing after each detection will take a quite long time (it's O(n²), essentially). Since we are rebuilding the fingerprint mapping during crash recovery, the previous, very conservative approach didn't even buy us anything. We only ever read from the checkpoint file after a clean shutdown, so the only time we need to write the checkpoint file is during a clean shutdown.	2016-04-15 01:03:28 +02:00
beorn7	199f309a39	Resurrect and rename invalid preload requests count metric. It is now also used in label matching, so the name of the metric changed from `prometheus_local_storage_invalid_preload_requests_total` to `non_existent_series_matches_total'.	2016-03-13 11:54:24 +01:00
beorn7	e8c1f30ab2	Merge the parallel logic of getSeriesForRange and metricForFingerprint	2016-03-09 21:56:15 +01:00
beorn7	9445c7053d	Add tests for range-limited label matching While doing so, improve getSeriesForRange.	2016-03-09 21:01:03 +01:00
beorn7	47e3c90f9b	Clean up error propagation Only return an error where callers are doing something with it except simply logging and ignoring. All the errors touched in this commit flag the storage as dirty anyway, and that fact is logged anyway. So most of what is being removed here is just log spam. As discussed earlier, the class of errors that flags the storage as dirty signals fundamental corruption, no even bubbling up a one-time warning to the user (e.g. about incomplete results) isn't helping much because _anything_ happening in the storage has to be doubted from that point on (and in fact retroactively into the past, too). Flagging the storage dirty, and alerting on it (plus marking the state in the web UI) is the only way I can see right now. As a byproduct, I cleaned up the setDirty method a bit and improved the logged errors.	2016-03-09 18:56:30 +01:00
beorn7	99854a84d7	Merge branch 'beorn7/storage6' into beorn7/storage7	2016-03-09 17:23:25 +01:00
beorn7	b343e65907	Merge branch 'beorn7/storage4' into beorn7/storage5 erge is necessary,	2016-03-09 17:14:42 +01:00
beorn7	d0a4477446	Merge branch 'beorn7/storage3' into beorn7/storage4 Conflicts: storage/local/preload.go storage/local/storage.go storage/local/storage_test.go	2016-03-09 17:13:16 +01:00
beorn7	55eddab25f	Merge branch 'beorn7/storage2' into beorn7/storage3	2016-03-09 16:48:46 +01:00
beorn7	beb36df4bb	De-flag preloadChunksForRange Now there is preloadChunksForRange and preloadChunksForInstant in both, the series and the storage.	2016-03-09 14:50:09 +01:00

1 2 3 4

190 commits