prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-09-22 08:47:33 -07:00

Author	SHA1	Message	Date
beorn7	253be23c00	storage: Sanity-check number of loaded chunk descs Two cases: - An unarchived metric must have at least one chunk desc loaded upon unarchival. Otherwise, the file is gone or has size 0, which is an inconsistency (because the series is still indexed in the archive index). Hence, quarantining is triggered. - If loading the chunk descs of a series with a known chunkDescsOffset (i.e. != -1), the number of chunks loaded must be equal to chunkDescsOffset. If not, there is a data corruption. An error is returned, which leads to qurantining. In any case, there is a guard added to not access the 1st element of an empty chunkDescs slice. (That's what triggered the crashes in issue 2249.) A time series with unknown chunkDescsOffset and no chunks in memory and no chunks on disk either could trigger that case. I would assume such a "null series" doesn't exist, but it's not entirely unthinkable and unreasonable to happen (perhaps in future uses of the storage). (Create a series, and then something tries to preload chunks before the first sample is added.)	2016-12-13 23:19:39 +01:00
Fabian Reinartz	6703404cb4	Merge remote-tracking branch 'origin/release-1.2'	2016-11-01 16:35:22 +01:00
beorn7	c5bd178b93	Protect exported Querier interface method against negative time ranges	2016-11-01 15:05:01 +01:00
Fabian Reinartz	8fa18d564a	storage: enhance Querier interface usage This extracts Querier as an instantiateable and closeable object rather than just defining extending methods of the storage interface. This improves composability and allows abstracting query transactions, which can be useful for transaction-level caches, consistent data views, and encapsulating teardown.	2016-10-16 10:39:29 +02:00
Björn Rabenstein	1e2f03f668	Merge pull request #2005 from redbaron/microoptimise-matching Microoptimise matching	2016-10-05 17:26:56 +02:00
Maxim Ivanov	e6db9f8159	New fpsForLabelMatchers and seriesForLabelMatchers methods These more specific methods have replaced `metricForLabelMatchers` in cases where its `map[fingerprint]metric` result type was not necessary or was used as an intermediate step Avoids duplicated calls to `seriesForRange` from `QueryRange` and `QueryInstant` methods.	2016-10-05 15:15:54 +01:00
Julius Volz	c9d4526428	Unpublish accidentally published series methods There were some more accidentally published methods of the memorySeries type which I didn't notice when reviewing https://github.com/prometheus/prometheus/pull/2011	2016-10-03 00:04:56 +02:00
Maxim Ivanov	4978a65495	Extract initial FP candidate build logic into candidateFPsForLabelMatchers method No functional changes otherwise	2016-10-02 17:35:02 +01:00
Maxim Ivanov	c048a0cde8	Add metrics to result after checking all matchers Should be marginally faster and somewhat more GC friendly	2016-10-02 17:35:02 +01:00
Julius Volz	c25f0de5ae	Remove local.ZeroSample{,Pair}, use model definitions	2016-09-28 23:42:45 +02:00
Julius Volz	044ebce779	Review fixups.	2016-09-28 23:42:44 +02:00
Julius Volz	d30a3c7c0f	Fix accidental publishing of memorySeries.firstTime()	2016-09-26 13:25:27 +02:00
Julius Volz	ab80ced756	storage: separate chunk package, publish more names This is a followup to https://github.com/prometheus/prometheus/pull/2011. This publishes more of the methods and other names of the chunk code and moves the chunk code to its own package. There's some unavoidable ugliness: the chunk and chunkDesc metrics are used by both packages, so I had to move them to the chunk package. That isn't great, but I don't see how to do it better without a larger redesign of everything. Same for the evict requests and some other types.	2016-09-26 13:25:11 +02:00
Matthew Campbell	67d76e3a5d	timeseries: store varbit encoded data into cassandra	2016-09-21 17:56:55 +02:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Julius Volz	3bfec97d46	Make the storage interface higher-level. See discussion in https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g The main idea is that the user of a storage shouldn't have to deal with fingerprints anymore, and should not need to do an individual preload call for each metric. The storage interface needs to be made more high-level to not expose these details. This also makes it easier to reuse the same storage interface for remote storages later, as fewer roundtrips are required and the fingerprint concept doesn't work well across the network. NOTE: this deliberately gets rid of a small optimization in the old query Analyzer, where we dedupe instants and ranges for the same series. This should have a minor impact, as most queries do not have multiple selectors loading the same series (and at the same offset).	2016-07-25 13:59:22 +02:00
beorn7	fc6737b7fb	storage: improve index lookups tl;dr: This is not a fundamental solution to the indexing problem (like tindex is) but it at least avoids utilizing the intersection problem to the greatest possible amount. In more detail: Imagine the following query: nicely:aggregating:rule{job="foo",env="prod"} While it uses a nicely aggregating recording rule (which might have a very low cardinality), Prometheus still intersects the low number of fingerprints for `{__name__="nicely:aggregating:rule"}` with the many thousands of fingerprints matching `{job="foo"}` and with the millions of fingerprints matching `{env="prod"}`. This totally innocuous query is dead slow if the Prometheus server has a lot of time series with the `{env="prod"}` label. Ironically, if you make the query more complicated, it becomes blazingly fast: nicely:aggregating:rule{job=~"foo",env=~"prod"} Why so? Because Prometheus only intersects with non-Equal matchers if there are no Equal matchers. That's good in this case because it retrieves the few fingerprints for `{__name__="nicely:aggregating:rule"}` and then starts right ahead to retrieve the metric for those FPs and checking individually if they match the other matchers. This change is generalizing the idea of when to stop intersecting FPs and go into "retrieve metrics and check them individually against remaining matchers" mode: - First, sort all matchers by "expected cardinality". Matchers matching the empty string are always worst (and never used for intersections). Equal matchers are in general consider best, but by using some crude heuristics, we declare some better than others (instance labels or anything that looks like a recording rule). - Then go through the matchers until we hit a threshold of remaining FPs in the intersection. This threshold is higher if we are already in the non-Equal matcher area as intersection is even more expensive here. - Once the threshold has been reached (or we have run out of matchers that do not match the empty string), start with "retrieve metrics and check them individually against remaining matchers". A beefy server at SoundCloud was spending 67% of its CPU time in index lookups (fingerprintsForLabelPairs), serving mostly a dashboard that is exclusively built with recording rules. With this change, it spends only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26 cores to 18 cores. The median latency for query_range dropped from 14s to 50ms(!). As expected, higher percentile latency didn't improve that much because the new approach is _occasionally_ running into the worst case while the old one was _systematically_ doing so. The 99th percentile latency is now about as high as the median before (14s) while it was almost twice as high before (26s).	2016-07-20 17:35:53 +02:00
beorn7	064b57858e	Consistently use the `Seconds()` method for conversion of durations This also fixes one remaining case of recording integral numbers of seconds only for a metric, i.e. this will probably fix #1796.	2016-07-07 15:24:35 +02:00
Julius Volz	91401794fa	storage: Make MemorySeriesStorage a public type See https://twitter.com/fabxc/status/748032597876482048	2016-06-29 08:14:23 +02:00
Fabian Reinartz	425736a377	*: remove last remainers of non-second metrics	2016-06-23 17:50:39 +02:00
Julius Volz	b7b6717438	Separate query interface out of local.Storage. PromQL only requires a much narrower interface than local.Storage in order to run queries. Narrower interfaces are easier to replace and test, too. We could also change the web interface to use local.Querier, except that we'll probably use appending functions from there in the future.	2016-06-23 15:14:38 +02:00
beorn7	99881ded63	Make the number of fingerprint mutexes configurable With a lot of series accessed in a short timeframe (by a query, a large scrape, checkpointing, ...), there is actually quite a significant amount of lock contention if something similar is running at the same time. In those cases, the number of locks needs to be increased. On the same front, as our fingerprints don't have a lot of entropy, I introduced some additional shuffling. With the current state, anly changes in the least singificant bits of a FP would matter.	2016-06-02 19:18:00 +02:00
beorn7	b2ef4dc52d	Correctly identify no-op appends if the value is NaN This requires an updating of the vendored commen.model package, which I will do once https://github.com/prometheus/common/pull/40 is merged.	2016-05-19 18:32:47 +02:00
beorn7	07a294ac15	Doc comment fixes	2016-04-26 01:05:56 +02:00
beorn7	20cba1ed8f	Initialize metric vectors in memorySeriesStorage	2016-04-25 17:08:07 +02:00
beorn7	d566808d40	Bring back logging of discarded samples But only on DEBUG level. Also, count and report the two cases of out-of-order timestamps on the one hand and same timestamp but different value on the other hand separately.	2016-04-25 16:43:52 +02:00
beorn7	a90d645378	Checkpoint fingerprint mappings only upon shutdown Before, we checkpointed after every newly detected fingerprint collision, which is not a problem as long as collisions are rare. However, with a sufficient number of metrics or particular nature of the data set, there might be a lot of collisions, all to be detected upon the first set of scrapes, and then the checkpointing after each detection will take a quite long time (it's O(n²), essentially). Since we are rebuilding the fingerprint mapping during crash recovery, the previous, very conservative approach didn't even buy us anything. We only ever read from the checkpoint file after a clean shutdown, so the only time we need to write the checkpoint file is during a clean shutdown.	2016-04-15 01:03:28 +02:00
beorn7	199f309a39	Resurrect and rename invalid preload requests count metric. It is now also used in label matching, so the name of the metric changed from `prometheus_local_storage_invalid_preload_requests_total` to `non_existent_series_matches_total'.	2016-03-13 11:54:24 +01:00
beorn7	e8c1f30ab2	Merge the parallel logic of getSeriesForRange and metricForFingerprint	2016-03-09 21:56:15 +01:00
beorn7	9445c7053d	Add tests for range-limited label matching While doing so, improve getSeriesForRange.	2016-03-09 21:01:03 +01:00
beorn7	47e3c90f9b	Clean up error propagation Only return an error where callers are doing something with it except simply logging and ignoring. All the errors touched in this commit flag the storage as dirty anyway, and that fact is logged anyway. So most of what is being removed here is just log spam. As discussed earlier, the class of errors that flags the storage as dirty signals fundamental corruption, no even bubbling up a one-time warning to the user (e.g. about incomplete results) isn't helping much because _anything_ happening in the storage has to be doubted from that point on (and in fact retroactively into the past, too). Flagging the storage dirty, and alerting on it (plus marking the state in the web UI) is the only way I can see right now. As a byproduct, I cleaned up the setDirty method a bit and improved the logged errors.	2016-03-09 18:56:30 +01:00
beorn7	99854a84d7	Merge branch 'beorn7/storage6' into beorn7/storage7	2016-03-09 17:23:25 +01:00
beorn7	b343e65907	Merge branch 'beorn7/storage4' into beorn7/storage5 erge is necessary,	2016-03-09 17:14:42 +01:00
beorn7	d0a4477446	Merge branch 'beorn7/storage3' into beorn7/storage4 Conflicts: storage/local/preload.go storage/local/storage.go storage/local/storage_test.go	2016-03-09 17:13:16 +01:00
beorn7	55eddab25f	Merge branch 'beorn7/storage2' into beorn7/storage3	2016-03-09 16:48:46 +01:00
beorn7	beb36df4bb	De-flag preloadChunksForRange Now there is preloadChunksForRange and preloadChunksForInstant in both, the series and the storage.	2016-03-09 14:50:09 +01:00
beorn7	836f1db04c	Improve MetricsForLabelMatchers WIP: This needs more tests. It now gets a from and through value, which it may opportunistically use to optimize the retrieval. With possible future range indices, this could be used in a very efficient way. This change merely applies some easy checks, which should nevertheless solve the use case of heavy rule evaluations on servers with a lot of series churn. Idea is the following: - Only archive series that are at least as old as the headChunkTimeout (which was already extremely unlikely to happen). - Then maintain a high watermark for the last archival, i.e. no archived series has a sample more recent than that watermark. - Any query that doesn't reach to a time before that watermark doesn't have to touch the archive index at all. (A production server at Soundcloud with the aforementioned series churn and heavy rule evaluations spends 50% of its CPU time in archive index lookups. Since rule evaluations usually only touch very recent values, most of those lookup should disappear with this change.) - Federation with a very broad label matcher will profit from this, too. As a byproduct, the un-needed MetricForFingerprint method was removed from the Storage interface.	2016-03-09 00:25:59 +01:00
beorn7	fc7de5374a	Quarantine series upon problem writing to the series file This fixes https://github.com/prometheus/prometheus/issues/1059 , but not in the obvious way (simply not updating the persist watermark, because that's actually not that simple - we don't really know what has gone wrong exactly). As any errors relevant here are most likely caused by severe and unrecoverable problems with the series file, Using the now quarantine feature is the right step. We don't really have to be worried about any inconsistent state of the series because it will be removed for good ASAP. Another plus is that we don't have to declare the whole storage dirty anymore.	2016-03-03 13:15:02 +01:00
beorn7	0ea5801e47	Handle errors caused by data corruption more gracefully This requires all the panic calls upon unexpected data to be converted into errors returned. This pollute the function signatures quite lot. Well, this is Go... The ideas behind this are the following: - panic only if it's a programming error. Data corruptions happen, and they are not programming errors. - If we detect a data corruption, we "quarantine" the series, essentially removing it from the database and putting its data into a separate directory for forensics. - Failure during writing to a series file is not considered corruption automatically. It will call setDirty, though, so that a crashrecovery upon the next restart will commence and check for that. - Series quarantining and setDirty calls are logged and counted in metrics, but are hidden from the user of the interfaces in interface.go, whith the notable exception of Append(). The reasoning is that we treat corruption by removing the corrupted series, i.e. a query for it will return no results on its next call anyway, so return no results right now. In the case of Append(), we want to tell the user that no data has been appended, though. Minor side effects: - Now consistently using filepath.* instead of path.*. - Introduced structured logging where I touched it. This makes things less consistent, but a complete change to structured logging would be out of scope for this PR.	2016-03-02 23:02:34 +01:00
beorn7	c740789ce3	Improve predict_linear Fixes https://github.com/prometheus/prometheus/issues/1401 This remove the last (and in fact bogus) use of BoundaryValues. Thus, a whole lot of unused (and arguably sub-optimal / ugly) code can be removed here, too.	2016-02-25 12:10:55 +01:00
beorn7	4b503ed9a5	Merge branch 'master' into beorn7/storage2	2016-02-24 14:03:49 +01:00
Björn Rabenstein	a8c79f0a0c	Merge pull request #1422 from prometheus/release-0.17 Merge more commits from 0.17.	2016-02-23 23:07:44 +01:00
beorn7	8fa1560e48	Fix a very special case of handling the checkpoint timer	2016-02-23 16:48:35 +01:00
beorn7	41e44f6ab9	Merge branch 'master' into beorn7/storage2	2016-02-22 16:54:33 +01:00
Björn Rabenstein	d9eb624322	Merge pull request #1415 from prometheus/release-0.17 Forward-merge release-0.17 into master	2016-02-22 16:39:48 +01:00
beorn7	4d1f7b49b6	Fix a race condition in calculatePersistenceUrgencyScore	2016-02-22 15:48:39 +01:00
beorn7	454ecf3f52	Rework the way ranges and instants are handled In a way, our instants were also ranges, just with the staleness delta as range length. They are no treated equally, just that in one case, the range length is set as range, in the other the staleness delta. However, there are "real" instants where start and and time of a query is the same. In those cases, we only want to return a single value (the one closest before or at the equal start and end time). If that value is the last sample in the series, odds are we have it already in the series object. In that case, there is no need to pin or load any chunks. A special singleSampleSeriesIterator is created for that. This should greatly speed up instant queries as they happen frequently for rule evaluations.	2016-02-22 01:47:18 +01:00
beorn7	b876f8e6a5	Move lastSamplePair method up to memorySeries This implies a slight change of behavior as only samples added to the respective instance of a memorySeries are returned. However, this is most likely anyway what we want. Following cases: - Server has been restarted: Given the time it takes to cleanly shutdown and start up a server, the series are now stale anyway. An improved staleness handling (still to be implemented) will be based on tracking if a given target is continuing to expose samples for a given time series. In that case, we need a full scrape cycle to decide about staleness. So again, it makes sense to consider everything stale directly after a server restart. - Series unarchived due to a read request: The series is definitely stale so we don't want to return anything anyway. - Freshly created time series or series unarchived because of a sample append: That happens because appending a sample is imminent. Before the fingerprint lock is released, the series will have received a sample, and lastSamplePair will always returned the expected value.	2016-02-19 18:16:41 +01:00
beorn7	1e13f89039	Return SamplePair istead of *SamplePair consistently Formalize ZeroSamplePair as return value for non-existing samples. Change LastSamplePairForFingerprint to return a SamplePair (and not a pointer to it), which saves allocations in a potentially extremely frequent call.	2016-02-19 17:00:40 +01:00
beorn7	0e202dacb4	Streamline series iterator creation This will fix issue #1035 and will also help to make issue #1264 less bad. The fundamental problem in the current code: In the preload phase, we quite accurately determine which chunks will be used for the query being executed. However, in the subsequent step of creating series iterators, the created iterators are referencing _all_ in-memory chunks in their series, even the un-pinned ones. In iterator creation, we copy a pointer to each in-memory chunk of a series into the iterator. While this creates a certain amount of allocation churn, the worst thing about it is that copying the chunk pointer out of the chunkDesc requires a mutex acquisition. (Remember that the iterator will also reference un-pinned chunks, so we need to acquire the mutex to protect against concurrent eviction.) The worst case happens if a series doesn't even contain any relevant samples for the query time range. We notice that during preloading but then we will still create a series iterator for it. But even for series that do contain relevant samples, the overhead is quite bad for instant queries that retrieve a single sample from each series, but still go through all the effort of series iterator creation. All of that is particularly bad if a series has many in-memory chunks. This commit addresses the problem from two sides: First, it merges preloading and iterator creation into one step, i.e. the preload call returns an iterator for exactly the preloaded chunks. Second, the required mutex acquisition in chunkDesc has been greatly reduced. That was enabled by a side effect of the first step, which is that the iterator is only referencing pinned chunks, so there is no risk of concurrent eviction anymore, and chunks can be accessed without mutex acquisition. To simplify the code changes for the above, the long-planned change of ValueAtTime to ValueAtOrBefore time was performed at the same time. (It should have been done first, but it kind of accidentally happened while I was in the middle of writing the series iterator changes. Sorry for that.) So far, we actively filtered the up to two values that were returned by ValueAtTime, i.e. we invested work to retrieve up to two values, and then we invested more work to throw one of them away. The SeriesIterator.BoundaryValues method can be removed once #1401 is fixed. But I really didn't want to load even more changes into this PR. Benchmarks: The BenchmarkFuzz.* benchmarks run 83% faster (i.e. about six times faster) and allocate 95% fewer bytes. The reason for that is that the benchmark reads one sample after another from the time series and creates a new series iterator for each sample read. To find out how much these improvements matter in practice, I have mirrored a beefy Prometheus server at SoundCloud that suffers from both issues #1035 and #1264. To reach steady state that would be comparable, the server needs to run for 15d. So far, it has run for 1d. The test server currently has only half as many memory time series and 60% of the memory chunks the main server has. The 90th percentile rule evaluation cycle time is ~11s on the main server and only ~3s on the test server. However, these numbers might get much closer over time. In addition to performance improvements, this commit removes about 150 LOC.	2016-02-19 16:24:38 +01:00

1 2 3 4

176 commits