prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
huydx	c999902761	Fix possible memory leak by defer inside loop	2016-11-14 14:08:08 +09:00
Fabian Reinartz	6703404cb4	Merge remote-tracking branch 'origin/release-1.2'	2016-11-01 16:35:22 +01:00
beorn7	c5bd178b93	Protect exported Querier interface method against negative time ranges	2016-11-01 15:05:01 +01:00
beorn7	5b16d6bd6e	Merge branch 'release-1.2'	2016-10-31 00:06:23 +01:00
beorn7	876e5da4f8	Add guard against non-monotonic samples in series This can only happen due to data corruption.	2016-10-25 14:59:33 +02:00
Dominik Schulz	182e17958a	Trivial spelling corrections and a small comment.	2016-10-18 20:14:38 +02:00
Fabian Reinartz	8fa18d564a	storage: enhance Querier interface usage This extracts Querier as an instantiateable and closeable object rather than just defining extending methods of the storage interface. This improves composability and allows abstracting query transactions, which can be useful for transaction-level caches, consistent data views, and encapsulating teardown.	2016-10-16 10:39:29 +02:00
beorn7	719508752b	Re-add counting of evict chunk ops and decrementing NumMemChunks Also, modify test to expose the regression.	2016-10-10 16:30:10 +02:00
Julius Volz	cb02f017ee	Clean up some doc comments	2016-10-06 21:53:40 +02:00
Julius Volz	c212ef0326	Add Chunk.Utilization() methods When using the chunking code in other projects (both Weave Prism and ChronixDB ingester), you sometimes want to know how well you are utilizing your chunks when closing/storing them.	2016-10-06 16:31:59 +02:00
Björn Rabenstein	1e2f03f668	Merge pull request #2005 from redbaron/microoptimise-matching Microoptimise matching	2016-10-05 17:26:56 +02:00
Maxim Ivanov	e6db9f8159	New fpsForLabelMatchers and seriesForLabelMatchers methods These more specific methods have replaced `metricForLabelMatchers` in cases where its `map[fingerprint]metric` result type was not necessary or was used as an intermediate step Avoids duplicated calls to `seriesForRange` from `QueryRange` and `QueryInstant` methods.	2016-10-05 15:15:54 +01:00
Julius Volz	c9d4526428	Unpublish accidentally published series methods There were some more accidentally published methods of the memorySeries type which I didn't notice when reviewing https://github.com/prometheus/prometheus/pull/2011	2016-10-03 00:04:56 +02:00
Maxim Ivanov	4978a65495	Extract initial FP candidate build logic into candidateFPsForLabelMatchers method No functional changes otherwise	2016-10-02 17:35:02 +01:00
Maxim Ivanov	c048a0cde8	Add metrics to result after checking all matchers Should be marginally faster and somewhat more GC friendly	2016-10-02 17:35:02 +01:00
Maxim Ivanov	bedc0eda1f	Added BenchmarkQueryRange	2016-10-02 17:35:02 +01:00
Julius Volz	c25f0de5ae	Remove local.ZeroSample{,Pair}, use model definitions	2016-09-28 23:42:45 +02:00
Julius Volz	044ebce779	Review fixups.	2016-09-28 23:42:44 +02:00
Julius Volz	d30a3c7c0f	Fix accidental publishing of memorySeries.firstTime()	2016-09-26 13:25:27 +02:00
Julius Volz	ab80ced756	storage: separate chunk package, publish more names This is a followup to https://github.com/prometheus/prometheus/pull/2011. This publishes more of the methods and other names of the chunk code and moves the chunk code to its own package. There's some unavoidable ugliness: the chunk and chunkDesc metrics are used by both packages, so I had to move them to the chunk package. That isn't great, but I don't see how to do it better without a larger redesign of everything. Same for the evict requests and some other types.	2016-09-26 13:25:11 +02:00
Julius Volz	42c05dd3a2	Merge pull request #2011 from mattkanwisher/chuck-public Make Chunk and ChunkIterator public for reuse	2016-09-23 14:45:35 +02:00
beorn7	ca98382943	Avoid `defer` in seriesMap.get This is related to https://github.com/golang/go/issues/14939 . It's probably the only occurrence where it matters.	2016-09-22 17:50:58 +02:00
Matthew Campbell	67d76e3a5d	timeseries: store varbit encoded data into cassandra	2016-09-21 17:56:55 +02:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation is supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Maxim Ivanov	bdc53098fc	Avoid having contended mutexes on same cacheline CPUs have to serialise write access to a single cache line effectively reducing level of possible parallelism. Placing mutexes on different cache lines avoids this problem. Most gains will be seen on NUMA servers where CPU interconnect traffic is especially expensive Before: go test . -run none -bench BenchmarkFingerprintLocker BenchmarkFingerprintLockerParallel-4 2000000 932 ns/op BenchmarkFingerprintLockerSerial-4 30000000 49.6 ns/op After: go test . -run none -bench BenchmarkFingerprintLocker BenchmarkFingerprintLockerParallel-4 3000000 569 ns/op BenchmarkFingerprintLockerSerial-4 30000000 51.0 ns/op	2016-09-18 23:32:55 +01:00
Julius Volz	5f5a78e807	Merge pull request #1974 from prometheus/disable-local-storage Allow disabling local storage.	2016-09-17 18:40:01 +02:00
Tobias Schmidt	29ced0090f	Fix common English misspellings	2016-09-14 23:23:28 -04:00
Tobias Schmidt	e2c12dcdb5	Add missing error check in persistence test	2016-09-14 23:16:47 -04:00
Julius Volz	b24e5d63bc	Add noop local storage engine. This adds a flag -storage.local.engine which allows turning off local storage in Prometheus. Instead of adding if-conditions and nil checks to all parts of Prometheus that deal with Prometheus's local storage (including the web interface), disabling local storage simply means replacing the normal local storage with a noop version that throws samples away and returns empty query results. We also don't add the noop storage to the fanout appender to decrease internal overhead. Instead of returning empty results, an alternate behavior could be to return errors on any query that point out that the local storage is disabled. Not sure which one is more preferable, so I went with the empty result option for now.	2016-09-14 13:18:05 +02:00
Fabian Reinartz	22296dcb85	storage: clarify sample/matcher relation in docs	2016-09-12 11:19:36 +02:00
Fabian Reinartz	cc6f988a5e	storage: fix Querier interface documentation	2016-09-12 10:48:54 +02:00
nghialv	7655038184	fix typo	2016-09-08 19:01:32 +09:00
Dan Milstein	ec064c96f6	Add field names to table-driven test fixtures	2016-08-30 07:57:39 -04:00
Dan Milstein	ac8788aca6	Convert to table-driven test and inline helper func	2016-08-30 07:57:39 -04:00
Dan Milstein	f50f656a66	Fix double-delta unmarshaling to respect actual min header size Turns out it's valid to have an overall chunk which is smaller than the full doubleDeltaHeaderBytes size -- if it has a single sample, it doesn't fill the whole header. Updated unmarshalling check to respect this.	2016-08-30 07:57:39 -04:00
Dan Milstein	b815956341	Catch errors when unmarshalling delta/doubleDelta encoded chunks This is (hopefully) a fix for #1653 Specifically, this makes it so that if the length for the stored delta/doubleDelta is somehow corrupted to be too small, the attempt to unmarshal will return an error. The current (broken) behavior is to return a malformed chunk, which can then lead to a panic when there is an attempt to read header values. The referenced issue proposed creating chunks with a minimum length -- I instead opted to just error on the attempt to unmarshal, since I'm not clear on how it could be safe to proceed when the length is incorrect/unknown. The issue also talked about possibly "quarantining series", but I don't know the surrounding code well enough to understand how to make that happen.	2016-08-30 07:57:39 -04:00
Matt Bostock	e618af5d0b	Storage: Add crash recovery metric 'started_dirty' ...to indicate when crash recovery was invoked during Prometheus startup. Fixes #1918.	2016-08-27 21:41:06 +02:00
Julius Volz	3bfec97d46	Make the storage interface higher-level. See discussion in https://groups.google.com/forum/#!topic/prometheus-developers/bkuGbVlvQ9g The main idea is that the user of a storage shouldn't have to deal with fingerprints anymore, and should not need to do an individual preload call for each metric. The storage interface needs to be made more high-level to not expose these details. This also makes it easier to reuse the same storage interface for remote storages later, as fewer roundtrips are required and the fingerprint concept doesn't work well across the network. NOTE: this deliberately gets rid of a small optimization in the old query Analyzer, where we dedupe instants and ranges for the same series. This should have a minor impact, as most queries do not have multiple selectors loading the same series (and at the same offset).	2016-07-25 13:59:22 +02:00
beorn7	fc6737b7fb	storage: improve index lookups tl;dr: This is not a fundamental solution to the indexing problem (like tindex is) but it at least avoids utilizing the intersection problem to the greatest possible amount. In more detail: Imagine the following query: nicely:aggregating:rule{job="foo",env="prod"} While it uses a nicely aggregating recording rule (which might have a very low cardinality), Prometheus still intersects the low number of fingerprints for `{__name__="nicely:aggregating:rule"}` with the many thousands of fingerprints matching `{job="foo"}` and with the millions of fingerprints matching `{env="prod"}`. This totally innocuous query is dead slow if the Prometheus server has a lot of time series with the `{env="prod"}` label. Ironically, if you make the query more complicated, it becomes blazingly fast: nicely:aggregating:rule{job=~"foo",env=~"prod"} Why so? Because Prometheus only intersects with non-Equal matchers if there are no Equal matchers. That's good in this case because it retrieves the few fingerprints for `{__name__="nicely:aggregating:rule"}` and then starts right ahead to retrieve the metric for those FPs and check individually if they match the other matchers. This change is generalizing the idea of when to stop intersecting FPs and go into "retrieve metrics and check them individually against remaining matchers" mode: - First, sort all matchers by "expected cardinality". Matchers matching the empty string are always worst (and never used for intersections). Equal matchers are in general considered best, but by using some crude heuristics, we declare some better than others (instance labels or anything that looks like a recording rule). - Then go through the matchers until we hit a threshold of remaining FPs in the intersection. This threshold is higher if we are already in the non-Equal matcher area as intersection is even more expensive here.
- Once the threshold has been reached (or we have run out of matchers that do not match the empty string), start with "retrieve metrics and check them individually against remaining matchers". A beefy server at SoundCloud was spending 67% of its CPU time in index lookups (fingerprintsForLabelPairs), serving mostly a dashboard that is exclusively built with recording rules. With this change, it spends only 35% in fingerprintsForLabelPairs. The CPU usage dropped from 26 cores to 18 cores. The median latency for query_range dropped from 14s to 50ms(!). As expected, higher percentile latency didn't improve that much because the new approach is _occasionally_ running into the worst case while the old one was _systematically_ doing so. The 99th percentile latency is now about as high as the median before (14s) while it was almost twice as high before (26s).	2016-07-20 17:35:53 +02:00
Björn Rabenstein	0622304244	Merge pull request #1798 from prometheus/beorn7/storage2 Crash recovery: Fix an edge case.	2016-07-13 16:53:18 +02:00
beorn7	2a75b15328	Crash recovery: Fix an edge case. If the chunks of a series in the checkpoint are all older than the latest chunk on disk, the head chunk is persisted and therefore has to be declared closed. It would be great to have a test for this, but that would require more plumbing, subject of #447.	2016-07-07 16:17:38 +02:00
beorn7	064b57858e	Consistently use the `Seconds()` method for conversion of durations This also fixes one remaining case of recording integral numbers of seconds only for a metric, i.e. this will probably fix #1796.	2016-07-07 15:24:35 +02:00
Julius Volz	91401794fa	storage: Make MemorySeriesStorage a public type See https://twitter.com/fabxc/status/748032597876482048	2016-06-29 08:14:23 +02:00
Fabian Reinartz	425736a377	*: remove last remainders of non-second metrics	2016-06-23 17:50:39 +02:00
Julius Volz	b7b6717438	Separate query interface out of local.Storage. PromQL only requires a much narrower interface than local.Storage in order to run queries. Narrower interfaces are easier to replace and test, too. We could also change the web interface to use local.Querier, except that we'll probably use appending functions from there in the future.	2016-06-23 15:14:38 +02:00
Jan van Valburg	68f3df49d0	storage: fix 'access denied' error on Windows On Windows, it is not possible to rename or delete a file that is currently open. This change closes the file in dropAndPersistChunks before it tries to delete it, or rename the temporary file to it.	2016-06-21 11:21:20 +02:00
beorn7	b274c7aaa7	Update doc comments	2016-06-03 12:34:01 +02:00
beorn7	99881ded63	Make the number of fingerprint mutexes configurable With a lot of series accessed in a short timeframe (by a query, a large scrape, checkpointing, ...), there is actually quite a significant amount of lock contention if something similar is running at the same time. In those cases, the number of locks needs to be increased. On the same front, as our fingerprints don't have a lot of entropy, I introduced some additional shuffling. With the current state, only changes in the least significant bits of a FP would matter.	2016-06-02 19:18:00 +02:00
beorn7	a308c76292	Improve TestAppendOutOfOrder It did not test the returned error so far. Also, add tests for the NaN case broken before https://github.com/prometheus/common/pull/40	2016-05-20 13:46:33 +02:00
beorn7	b2ef4dc52d	Correctly identify no-op appends if the value is NaN This requires updating the vendored common.model package, which I will do once https://github.com/prometheus/common/pull/40 is merged.	2016-05-19 18:32:47 +02:00

325 commits