prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
beorn7	8cdced3850	Implement Gorilla-inspired chunk encoding This is not a verbatim implementation of the Gorilla encoding. First of all, it could not, even if we wanted, because Prometheus has a different chunking model (constant size, not constant time). Second, this adds a number of changes that improve the encoding in general or at least for the specific use case of Prometheus (and are partially only possible in the context of Prometheus). See comments in the code for details.	2016-03-17 14:47:08 +01:00
beorn7	8e64e8dfca	Fix return statement.	2016-03-17 14:43:00 +01:00
beorn7	e7ac9c6863	Improvments based on review - Moved returns into the default section of switch statement that can only happen then. - Fix typo.	2016-03-17 14:37:24 +01:00
beorn7	c13b1ecfe9	Make chunk iterators more DRY This finally extracts all the common code of the two chunk iterators into one. Any future chunk encodings with fast access by index can use the same iterator by simply providing an indexAccessor. Other future chunk encodings without fast index access (like Gorilla-style) can still implement the chunkIterator interface as usual.	2016-03-07 20:23:14 +01:00
beorn7	32f280a3cd	Slim down the chunkIterator interface For one, remove unneeded methods. Then, instead of using a channel for all values, use a bufio.Scanner-like interface. This removes the need for creating a goroutine and avoids the (unnecessary) locking performed by channel sending and receiving. This will make it much easier to write new chunk implementations (like Gorilla-style encoding).	2016-03-07 19:50:13 +01:00
beorn7	0ea5801e47	Handle errors caused by data corruption more gracefully This requires all the panic calls upon unexpected data to be converted into errors returned. This pollute the function signatures quite lot. Well, this is Go... The ideas behind this are the following: - panic only if it's a programming error. Data corruptions happen, and they are not programming errors. - If we detect a data corruption, we "quarantine" the series, essentially removing it from the database and putting its data into a separate directory for forensics. - Failure during writing to a series file is not considered corruption automatically. It will call setDirty, though, so that a crashrecovery upon the next restart will commence and check for that. - Series quarantining and setDirty calls are logged and counted in metrics, but are hidden from the user of the interfaces in interface.go, whith the notable exception of Append(). The reasoning is that we treat corruption by removing the corrupted series, i.e. a query for it will return no results on its next call anyway, so return no results right now. In the case of Append(), we want to tell the user that no data has been appended, though. Minor side effects: - Now consistently using filepath.* instead of path.*. - Introduced structured logging where I touched it. This makes things less consistent, but a complete change to structured logging would be out of scope for this PR.	2016-03-02 23:02:34 +01:00
beorn7	1e13f89039	Return SamplePair istead of *SamplePair consistently Formalize ZeroSamplePair as return value for non-existing samples. Change LastSamplePairForFingerprint to return a SamplePair (and not a pointer to it), which saves allocations in a potentially extremely frequent call.	2016-02-19 17:00:40 +01:00
beorn7	0e202dacb4	Streamline series iterator creation This will fix issue #1035 and will also help to make issue #1264 less bad. The fundamental problem in the current code: In the preload phase, we quite accurately determine which chunks will be used for the query being executed. However, in the subsequent step of creating series iterators, the created iterators are referencing _all_ in-memory chunks in their series, even the un-pinned ones. In iterator creation, we copy a pointer to each in-memory chunk of a series into the iterator. While this creates a certain amount of allocation churn, the worst thing about it is that copying the chunk pointer out of the chunkDesc requires a mutex acquisition. (Remember that the iterator will also reference un-pinned chunks, so we need to acquire the mutex to protect against concurrent eviction.) The worst case happens if a series doesn't even contain any relevant samples for the query time range. We notice that during preloading but then we will still create a series iterator for it. But even for series that do contain relevant samples, the overhead is quite bad for instant queries that retrieve a single sample from each series, but still go through all the effort of series iterator creation. All of that is particularly bad if a series has many in-memory chunks. This commit addresses the problem from two sides: First, it merges preloading and iterator creation into one step, i.e. the preload call returns an iterator for exactly the preloaded chunks. Second, the required mutex acquisition in chunkDesc has been greatly reduced. That was enabled by a side effect of the first step, which is that the iterator is only referencing pinned chunks, so there is no risk of concurrent eviction anymore, and chunks can be accessed without mutex acquisition. To simplify the code changes for the above, the long-planned change of ValueAtTime to ValueAtOrBefore time was performed at the same time. (It should have been done first, but it kind of accidentally happened while I was in the middle of writing the series iterator changes. Sorry for that.) So far, we actively filtered the up to two values that were returned by ValueAtTime, i.e. we invested work to retrieve up to two values, and then we invested more work to throw one of them away. The SeriesIterator.BoundaryValues method can be removed once #1401 is fixed. But I really didn't want to load even more changes into this PR. Benchmarks: The BenchmarkFuzz.* benchmarks run 83% faster (i.e. about six times faster) and allocate 95% fewer bytes. The reason for that is that the benchmark reads one sample after another from the time series and creates a new series iterator for each sample read. To find out how much these improvements matter in practice, I have mirrored a beefy Prometheus server at SoundCloud that suffers from both issues #1035 and #1264. To reach steady state that would be comparable, the server needs to run for 15d. So far, it has run for 1d. The test server currently has only half as many memory time series and 60% of the memory chunks the main server has. The 90th percentile rule evaluation cycle time is ~11s on the main server and only ~3s on the test server. However, these numbers might get much closer over time. In addition to performance improvements, this commit removes about 150 LOC.	2016-02-19 16:24:38 +01:00
beorn7	582af1618c	Streamline chunk writing This helps to avoid allocations in the same way we were already doing it during reading.	2016-01-25 16:36:36 +01:00
Fabian Reinartz	1535ef1457	Replace metric.SamplePair with model.SamplePair	2015-08-22 14:52:35 +02:00
Fabian Reinartz	306e8468a0	Switch from client_golang/model to common/model	2015-08-21 13:33:38 +02:00
beorn7	ff08f0b6fe	storage: ensure timestamp monotonicity within series. Fixes https://github.com/prometheus/prometheus/issues/481 While doing so, clean up and fix a few other things: - Fix `go vet` warnings (@fabxc to blame ;). - Fix a racey problem with unarchiving: Whenever we unarchive a series, we essentially want to do something with it. However, until we have done something with it, it appears like a series that is ready to be archived or even purged. So e.g. it would be ignored during checkpointing. With this fix, we always load the chunkDescs upon unarchiving. This is wasteful if we only want to add a new sample to an archived time series, but the (presumably more common) case where we access an archived time series in a query doesn't become more expensive. - The change above streamlined the getOrCreateSeries ond newMemorySeries flow. Also, the modTime is now always set correctly. - Fix the leveldb-backed implementation of KeyValueStore.Delete. It had the wrong behavior of still returning true, nil if a non-existing key has been passed in.	2015-07-15 18:56:53 +02:00
Julius Volz	acbc2b8cb6	storage: Fix float->uint conversions on some compilers. See https://github.com/prometheus/prometheus/issues/887, which will at least be partially fixed by this. From the spec https://golang.org/ref/spec#Conversions: "In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent." This ended up setting the converted values to 0 on Debian's Go 1.4.2 compiler, at least on 32-bit Debians.	2015-07-13 11:19:11 +02:00
beorn7	3b9c421a69	Weed out all the [Gg]et* method names. The only exception is getNumChunksToPersist to avoid naming the struct member numChunksToPersist in a weird way.	2015-05-20 19:13:06 +02:00
beorn7	cd5574bf8a	Make chunk and series iterators more efficient.	2015-05-20 16:19:34 +02:00
beorn7	b02d900e61	Improve chunk and chunkDesc loading. Also, clean up some things in the code (especially introduction of the chunkLenWithHeader constant to avoid the same expression all over the place). Benchmark results: BEFORE BenchmarkLoadChunksSequentially 5000 283580 ns/op 152143 B/op 312 allocs/op BenchmarkLoadChunksRandomly 20000 82936 ns/op 39310 B/op 99 allocs/op BenchmarkLoadChunkDescs 10000 110833 ns/op 15092 B/op 345 allocs/op AFTER BenchmarkLoadChunksSequentially 10000 146785 ns/op 152285 B/op 315 allocs/op BenchmarkLoadChunksRandomly 20000 67598 ns/op 39438 B/op 103 allocs/op BenchmarkLoadChunkDescs 20000 99631 ns/op 12636 B/op 192 allocs/op Note that everything is obviously loaded from the page cache (as the benchmark runs thousands of times with very small series files). In a real-world scenario, I expect a larger impact, as the disk operations will more often actually hit the disk. To load ~50 sequential chunks, this reduces the iops from 100 seeks and 100 reads to 1 seek and 1 read.	2015-04-13 21:06:04 +02:00
beorn7	5bea942d8e	Improve various things around chunk encoding. A number of mostly minor things: - Rename chunk type -> chunk encoding. - After all, do not carry around the chunk encoding to all parts of the system, but just have one place where the encoding for new chunks is set based on the flag. The new approach has caveats as well, but the polution of so many method signatures is worse. - Use the default chunk encoding for new chunks of existing series. (Previously, only new _series_ would get chunks with the default encoding.) - Use an enum for chunk encoding. (But keep the version number for the flag, for reasons discussed previously.) - Add encoding() to the chunk interface (so that a chunk knows its own encoding - no need to have that in a different top-level function). - Got rid of newFollowUpChunk (which would keep the existing encoding for all chunks of a time series). Now only use newChunk(), which will create a chunk encoding according to the flag. - Simplified transcodeAndAdd. - Reordered methods of deltaEncodedChunk and doubleDeltaEncoded chunk to match the order in the chunk interface. - Only transcode if the chunk is not yet half full. If more than half full, add a new chunk instead.	2015-03-14 19:03:20 +01:00
beorn7	23ba8a5516	Make floats exact again. This should do the right thing for the old delta chunks, too.	2015-03-06 17:03:56 +01:00
beorn7	a8d4f8af9a	Improve minor things after review. The problem of float precision will be addressed in the next commit.	2015-03-06 12:53:00 +01:00
beorn7	13fcf1ddbc	Implement double-delta encoded chunks.	2015-03-05 20:33:26 +01:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Bjoern Rabenstein	14bda4180c	Changes after pair code review. Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455	2014-11-25 17:12:59 +01:00
Bjoern Rabenstein	096fa0f8b2	Squash a number of TODOs. - Staleness delta is no a proper function parameter and not replicated from package ast. - Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion. - For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'. - Verified that math.Modf is not a speed enhancement over conversion (actually 5x slower). - Renamed firstTimeField, lastTimeField into chunkFirstTime and chunkLastTime. - Verified unpin() is sufficiently goroutine-safe. - Decided not to update archivedFingerprintToTimeRange upon series truncation and added a rationale why. Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	71206dbc06	More code cleanups. Add license text everywhere. And others.... Change-Id: I11ccde267a2ef7eb366c4788ba7aeae14ba7545c	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	f5f9f3514a	Major code cleanup. - Make it go-vet and golint clean. - Add comments, TODOs, etc. Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2	2014-11-25 17:02:53 +01:00
Julius Volz	8dfaa5ecd2	Remove use of freelists for chunk bufs. Change-Id: Ib887fdb61e1d96da0cd32545817b925ba88831c1	2014-11-25 17:02:00 +01:00
Julius Volz	7b35e0f0b8	Use constants from math package instead of literals. Change-Id: I55427ba32c2cbb32ee42ec1e3153160965ab8b3c	2014-11-25 17:02:00 +01:00
Julius Volz	fd01d07589	Check that chunk buffer length fits in 16 bit. Change-Id: Id086a54aa8a1990c1979e747c1c02e53bed6d447	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	d742edfe0d	Fix precision loss. Large delta values often imply a difference between a large base value and the large delta value, potentially resulting in small numbers with a huge precision error. Since large delta values need 8 bytes anyway, we are not even saving memory. As a solution, always save the absoluto value rather than a delta once 8 bytes would be needed for the delta. Timestamps are then saved as 8 byte integers, while values are always saved as float64 in that case. Change-Id: I01100d600515e16df58ce508b50982ffd762cc49	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	ecdf5ab14f	Index-persistence switched from gob to a hand-coded solution. Change-Id: Ib4ec42535bd08df16d34d4774bb638e35c5a1841	2014-11-25 17:02:00 +01:00
Julius Volz	e7ed39c9a6	Initial experimental snapshot of next-gen storage. Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d	2014-11-25 17:02:00 +01:00

31 commits