prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-09-21 00:07:36 -07:00

Author	SHA1	Message	Date
Fabian Reinartz	c9d396f476	Replace metric.LabelPair with model.LabelPair	2015-08-22 13:32:13 +02:00
Fabian Reinartz	438e232c9b	Fix grouping of import blocks	2015-08-22 09:42:45 +02:00
Fabian Reinartz	306e8468a0	Switch from client_golang/model to common/model	2015-08-21 13:33:38 +02:00
Laurie Malau	20ad403587	Don't warn/increment metric upon equal timestamps during append. Perhaps it would be even better to still warn in case the sample value has changed but the timestamps are equal, but we don't have efficient access to the last value.	2015-08-09 23:49:49 +02:00
Julius Volz	517badc21d	Only do regex lookups when there was no equality match. For the label matching index-based preselection phase, don't do an OR between equality and non-equality matchers. Execute only one of the two (with equality matchers preferred when present). Fixes https://github.com/prometheus/prometheus/issues/924	2015-07-23 23:13:30 +02:00
beorn7	699946bf32	Fix chunk desc loading. If all samples in consecutive chunks have the same timestamp, the way we used to load chunks will fail. With this change, the persist watermark is used to load the right amount of chunkDescs from disk. This bug is a possible reason for the rare storage corruption we have observed.	2015-07-16 13:09:20 +02:00
beorn7	4203849c92	Test chunkDesc eviction and loading	2015-07-16 13:09:13 +02:00
beorn7	ff08f0b6fe	storage: ensure timestamp monotonicity within series. Fixes https://github.com/prometheus/prometheus/issues/481 While doing so, clean up and fix a few other things: - Fix `go vet` warnings (@fabxc to blame ;). - Fix a racy problem with unarchiving: Whenever we unarchive a series, we essentially want to do something with it. However, until we have done something with it, it appears like a series that is ready to be archived or even purged. So e.g. it would be ignored during checkpointing. With this fix, we always load the chunkDescs upon unarchiving. This is wasteful if we only want to add a new sample to an archived time series, but the (presumably more common) case where we access an archived time series in a query doesn't become more expensive. - The change above streamlined the getOrCreateSeries and newMemorySeries flow. Also, the modTime is now always set correctly. - Fix the leveldb-backed implementation of KeyValueStore.Delete. It had the wrong behavior of still returning true, nil if a non-existing key has been passed in.	2015-07-15 18:56:53 +02:00
Fabian Reinartz	6bfb4549a6	storage: add LastSamplePairForFingerprint method	2015-06-23 13:45:15 +02:00
Fabian Reinartz	dc7d27ab9a	retrieval: add honor label handling and parametrized querying. This commit adds the honor_labels and params arguments to the scrape config. This allows to specify query parameters used by the scrapers and handling scraped labels with precedence.	2015-06-23 13:45:14 +02:00
Fabian Reinartz	1eff186555	Merge pull request #810 from prometheus/fabxc/lmatch Match empty labels.	2015-06-22 15:45:50 +02:00
Fabian Reinartz	5b91ea9b36	storage: improve label matching and allow unset matching. Matching of empty labels now also matches metrics where the label was not explicitly set to the empty string.	2015-06-22 15:33:44 +02:00
Fabian Reinartz	b105e26f4d	storage: remove global flags	2015-06-15 19:01:06 +02:00
Fabian Reinartz	5c6c0e2faa	Add storage method to delete time series	2015-06-01 21:23:32 +02:00
Fabian Reinartz	aff01e29c3	Limit retrievable samples to retention window. The storage does not delete data immediately after the retention period. We don't want to retrieve this data as it causes artifacts.	2015-05-27 13:13:59 +02:00
Fabian Reinartz	6e319532cf	Read from indexing queue during crash recovery. Change #704 introduced a regression that started reading the queue only after potential crash recovery. When more than the queue capacity was indexed, Prometheus deadlocked.	2015-05-23 15:32:35 +02:00
beorn7	3b9ab546e6	Add metrics to count inconsistencies and fp collisions.	2015-05-21 18:46:20 +02:00
Björn Rabenstein	c44e7cd105	Merge pull request #706 from prometheus/beorn7/persistence2 Improve iterator performance.	2015-05-21 13:48:52 +02:00
Fabian Reinartz	112a778922	Align int64s for atomic operations	2015-05-21 01:38:50 +02:00
beorn7	3b9c421a69	Weed out all the [Gg]et* method names. The only exception is getNumChunksToPersist to avoid naming the struct member numChunksToPersist in a weird way.	2015-05-20 19:13:06 +02:00
Julius Volz	267fd34156	Switch Prometheus to use github.com/prometheus/log. This change is conceptually very simple, although the diff is large. It switches logging from "github.com/golang/glog" to "github.com/prometheus/log", while not actually changing any log messages. V(1)-style logging has been changed to be log.Debug*().	2015-05-20 18:19:32 +02:00
beorn7	81b190bf45	Remove locking from series iterator. Cache chunk iterators.	2015-05-20 16:19:34 +02:00
Fabian Reinartz	d8440d75f1	Do not start storage processing before Start() is called.	2015-05-19 13:51:45 +02:00
beorn7	2235cec175	Handle fingerprint collisions.	2015-05-07 18:17:59 +02:00
beorn7	9820e5fe99	Use FastFingerprint where appropriate.	2015-05-06 12:00:58 +02:00
beorn7	c5fa0b90c3	Fix the case where a series in memory has 0 chunks, but chunks on disk. This is actually completely normal for a freshly unarchived series. Test added to expose.	2015-04-09 15:57:11 +02:00
beorn7	3035b8bfdd	Adaptively reduce the wait time for memory series maintenance. This will make in-memory series maintenance the faster the more chunks are waiting for persistence.	2015-04-01 17:52:03 +02:00
beorn7	6a21f73898	Fixes after review.	2015-03-19 17:54:59 +01:00
beorn7	51d35f4481	Instrument series maintenance durations.	2015-03-19 17:06:16 +01:00
beorn7	12ae6e9203	Increase resilience of the storage against data corruption - step 4. Step 4: Add a configurable sync'ing of series files after modification.	2015-03-19 15:58:02 +01:00
beorn7	11bd9ce1bd	Increase resilience of the storage against data corruption - step 3. Step 3: Remember the mtime of series files and make use of it to detect series files that are not the one the checkpoint thinks they are.	2015-03-19 15:44:11 +01:00
beorn7	e25cca823c	Increase resilience of the storage against data corruption - step 2. Step 2: Add a flag -storage.local.pedantic-checks to check every series file. Also, remove countPersistedHeadChunks channel, which is unused.	2015-03-19 12:06:15 +01:00
beorn7	3d8d8928be	Increase resilience of the storage against data corruption - step 1. Step 1: Admit the problem by turning the various "panic"s into logged errors, followed by marking the persistence as dirty.	2015-03-19 11:49:18 +01:00
beorn7	da7c0461c6	Rename persist queue len/cap to num/max chunks to persist. Remove deprecated flag storage.incoming-samples-queue-capacity.	2015-03-18 19:36:41 +01:00
beorn7	a075900f9a	Merge branch 'beorn7/persistence' into beorn7/ingestion-tweaks	2015-03-18 19:09:31 +01:00
beorn7	1d8fc7d56f	Change minor things after code review.	2015-03-18 19:09:07 +01:00
beorn7	be11cb2b07	Remove the sample ingestion channel. The one central sample ingestion channel has caused a variety of trouble. This commit removes it. Targets and rule evaluation call an Append method directly now. To incorporate multiple storage backends (like OpenTSDB), storage.Tee forks the Append into two different appenders. Note that the tsdb queue manager had its own queue anyway. It was a queue after a queue... Much queue, so overhead... Targets have their own little buffer (implemented as a channel) to avoid stalling during an http scrape. But a new scrape will only be started once the old one is fully ingested. The contraption of three pipelined ingesters was removed. A Target is an ingester itself now. Despite more logic in Target, things should be less confusing now. Also, remove lint and vet warnings in ast.go.	2015-03-15 14:08:22 +01:00
beorn7	0056eaeb4f	Redesign series maintenance and chunk persistence.	2015-03-14 22:05:23 +01:00
beorn7	5bea942d8e	Improve various things around chunk encoding. A number of mostly minor things: - Rename chunk type -> chunk encoding. - After all, do not carry around the chunk encoding to all parts of the system, but just have one place where the encoding for new chunks is set based on the flag. The new approach has caveats as well, but the pollution of so many method signatures is worse. - Use the default chunk encoding for new chunks of existing series. (Previously, only new _series_ would get chunks with the default encoding.) - Use an enum for chunk encoding. (But keep the version number for the flag, for reasons discussed previously.) - Add encoding() to the chunk interface (so that a chunk knows its own encoding - no need to have that in a different top-level function). - Got rid of newFollowUpChunk (which would keep the existing encoding for all chunks of a time series). Now only use newChunk(), which will create a chunk encoding according to the flag. - Simplified transcodeAndAdd. - Reordered methods of deltaEncodedChunk and doubleDeltaEncoded chunk to match the order in the chunk interface. - Only transcode if the chunk is not yet half full. If more than half full, add a new chunk instead.	2015-03-14 19:03:20 +01:00
beorn7	13fcf1ddbc	Implement double-delta encoded chunks.	2015-03-05 20:33:26 +01:00
beorn7	5ed8f6c205	Update persistQueueLength after chunks were persisted.	2015-03-04 18:46:16 +01:00
beorn7	1db7589081	Reduce the capacity of countPersistedHeadChunks. The capacity is basically how many persisted head chunks we will count at most while doing other things, in particular checkpointing. To limit the amount of already counted head chunks, keep this number low, otherwise we will easily checkpoint too often if checkpoints take long anyway.	2015-02-27 00:53:52 +01:00
beorn7	9406afad72	Do not double-count non-persisted head chunks on loading.	2015-02-27 00:06:16 +01:00
beorn7	dbc22b972c	Check last time in head chunk for head chunk timeout, not first.	2015-02-26 23:40:42 +01:00
beorn7	edd716e63c	Fix the embarrassing bug introduced in commit `0851945`. In that commit, the 'maintainSeries' call was accidentally removed. This commit refactors things a bit so that there is now a clean 'maintainMemorySeries' and a 'maintainArchivedSeries' call. Straighten the nomenclature a bit (consistently use 'drop' for chunks and 'purge' for series/metrics). Remove the annoying 'Completed maintenance sweep through archived fingerprints' message if there were no archived fingerprints to do maintenance on.	2015-02-26 18:30:33 +01:00
beorn7	af91fb8e31	Improve persisting chunks to disk. This is done by bucketing chunks by fingerprint. If the persisting to disk falls behind, more and more chunks are in the queue. As soon as there are "double hits", we will now persist both chunks in one go, doubling the disk throughput (assuming it is limited by disk seeks). Should even more pile up so that we end with "triple hits", we will persist those first, and so on. Even if we have millions of time series, this will still help, assuming not all of them are growing with the same speed. Series that get many samples and/or are not very compressible will accumulate chunks faster, and they will soon get double- or triple-writes. To improve the chance of double writes, -storage.local.persistence-queue-capacity could be set to a higher value. However, that will slow down shutdown a lot (as the queue has to be worked through). So we leave it to the user to set it to a really high value. A more fundamental solution would be to checkpoint not only head chunks, but also chunks still in the persist queue. That would be quite complicated for a rather limited use-case (running many time series with high ingestion rate on slow spinning disks).	2015-02-17 16:02:09 +01:00
beorn7	e22f26bc58	Move to a queue model for appending samples after all. Starting a goroutine takes 1-2µs on my laptop. From the "numbers every Go programmer should know", I had 300ns for a channel send in my mind. Turns out, on my laptop, it takes only 60ns. That's fast enough to warrant the machinery of yet another channel with a fixed set of worker goroutines feeding from it. The number chosen (8 for now) is low enough to not really afflict a measurable overhead (a big Prometheus server has >1000 goroutines running), but high enough to not make sample ingestion a bottleneck.	2015-02-13 14:26:54 +01:00
beorn7	fe518fdb28	Simplify AppendSamples by allowing it to be goroutine-unsafe.	2015-02-13 12:13:22 +01:00
beorn7	5d3cd65a5d	Improve performance of ingestion. - Parallelize AppendSamples as much as possible without breaking the contract about temporal order. - Allocate more fingerprint locker slots. - Do not run early checkpoints if we are behind on chunk persistence. - Increase fpMinWaitDuration to give the disk more time for more important things. Also, switch math.MaxInt64 and math.MinInt64 to the new constants.	2015-02-12 18:12:37 +01:00
beorn7	d2ab49c396	Make the persist queue length configurable. Also, set a much higher default value. Chunk persist requests can be quite spiky. If you collect a large number of time series that are very similar, they will tend to finish up a chunk at about the same time. There is no reason we need to back up scraping just because of that. The rationale of the new default value is "1/8 of the chunks in memory".	2015-02-06 14:54:53 +01:00

108 commits