prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-28 23:19:41 -08:00

Author	SHA1	Message	Date
Julius Volz	c7c0b33d0b	Add regex-matching support for labels. There are four label-matching ops for selecting timeseries now: - Equal: = - NotEqual: != - RegexMatch: =~ - RegexNoMatch: !~ Instead of looking up labels by a simple clientmodel.LabelSet (basically an equals op for every key/value pair in the set), timeseries fingerprint selection is now done via a list of metric.LabelMatchers. Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96	2014-04-01 14:24:53 +02:00
Julius Volz	7a577b86b7	Fix interval op special case. In the case that a getValuesAtIntervalOp's ExtractSamples() is called with a current time after the last chunk time, we return without extracting any further values beyond the last one in the chunk (correct), but also without advancing the op's time (incorrect). This leads to an infinite loop in renderView(), since the op is called repeatedly without ever being advanced and consumed. This adds handling for this special case. When detecting this case, we immediately set the op to be consumed, since we would always get a value after the current time passed in if there was one. Change-Id: Id99149e07b5188d655331382b8b6a461b677005c	2014-03-26 13:29:03 +01:00
Julius Volz	9d5c367745	Fix incorrect interval op advancement. This fixes a bug where an interval op might advance too far past the end of the currently extracted chunk, effectively skipping over relevant (to-be-extracted) values in the subsequent chunk. The result: missing samples at chunk boundaries in the resulting view. Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8	2014-03-18 16:22:50 +01:00
Bjoern Rabenstein	c3b282bd14	Add regression tests for 'loop until op is consumed' bug. - Most of this is the actual regression test in tiered_test.go. - Working on that regression tests uncovered problems in tiered_test.go that are fixed in this commit. - The 'op.consumed = false' line added to freelist.go was actually not fixing a bug. Instead, there was no bug at all. So this commit removes that line again, but adds a regression test to make sure that the assumed bug is indeed not there (cf. freelist_test.go). - Removed more code duplication in operation.go (following the same approach as before, i.e. embedding op type A into op type B if everything in A is the same as in B with the exception of String() and ExtractSample()). (This change make struct literals for ops more clunky, but that only affects tests. No code change whatsoever was necessary in the actual code after this refactoring.) - Fix another op leak in tiered.go. Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517	2014-03-12 18:40:24 +01:00
Julius Volz	86fc13a52e	Convert metric.Values to slice of values. The initial impetus for this was that it made unmarshalling sample values much faster. Other relevant benchmark changes in ns/op: Benchmark old new speedup ================================================================== BenchmarkMarshal 179170 127996 1.4x BenchmarkUnmarshal 404984 132186 3.1x BenchmarkMemoryGetValueAtTime 57801 50050 1.2x BenchmarkMemoryGetBoundaryValues 64496 53194 1.2x BenchmarkMemoryGetRangeValues 66585 54065 1.2x BenchmarkStreamAdd 45.0 75.3 0.6x BenchmarkAppendSample1 1157 1587 0.7x BenchmarkAppendSample10 4090 4284 0.95x BenchmarkAppendSample100 45660 44066 1.0x BenchmarkAppendSample1000 579084 582380 1.0x BenchmarkMemoryAppendRepeatingValues 22796594 22005502 1.0x Overall, this gives us good speedups in the areas where they matter most: decoding values from disk and accessing the memory storage (which is also used for views). Some of the smaller append examples take minimally longer, but the cost seems to get amortized over larger appends, so I'm not worried about these. Also, we're currently not bottlenecked on the write path and have plenty of other optimizations available in that area if it becomes necessary. Memory allocations during appends don't change measurably at all. Change-Id: I7dc7394edea09506976765551f35b138518db9e8	2014-03-11 18:23:37 +01:00
Bjoern Rabenstein	9ea9189dd1	Remove the multi-op-per-fingerprint capability. Currently, rendering a view is capable of handling multiple ops for the same fingerprint efficiently. However, this capability requires a lot of complexity in the code, which we are not using at all because the way we assemble a viewRequest will never have more than one operation per fingerprint. This commit weeds out the said capability, along with all the code needed for it. It is still possible to have more than one operation for the same fingerprint, it will just be handled in a less efficient way (as proven by the unit tests). As a result, scanjob.go could be removed entirely. This commit also contains a few related refactorings and removals of dead code in operation.go, view,go, and freelist.go. Also, the docstrings received some love. Change-Id: I032b976e0880151c3f3fdb3234fb65e484f0e2e5	2014-03-04 16:29:56 +01:00
Julius Volz	94666e20b7	Minor test error reporting cleanup. Change-Id: Ie11c16b4e60de7c179c6d2a86e063f4432e2000f	2014-02-03 12:27:01 +01:00
Julius Volz	fd2158e746	Store copy of metric during fingerprint caching Problem description: ==================== If a rule evaluation referencing a metric/timeseries M happens at a time when M doesn't have a memory timeseries yet, looking up the fingerprint for M (via TieredStorage.GetMetricForFingerprint()) will create a new Metric object for M which gets both: a) attached to a new empty memory timeseries (so we don't have to ask disk for the Metric's fingerprint next time), and b) returned to the rule evaluation layer. However, the rule evaluation layer replaces the name label (and possibly other labels) of the metric with the name of the recorded rule. Since both the rule evaluator and the memory storage share a reference to the same Metric object, the original memory timeseries will now also be incorrectly renamed. Fix: ==== Instead of storing a reference to a shared metric object, take a copy of the object when creating an empty memory timeseries for caching purposes. Change-Id: I9f2172696c16c10b377e6708553a46ef29390f1e	2014-02-02 17:11:08 +01:00
Julius Volz	740d448983	Use custom timestamp type for sample timestamps and related code. So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems: - there could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was: https://github.com/prometheus/prometheus/issues/367 It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports. - sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying/working with the data will be faster if it's smaller. MEMORY USAGE RESULTS Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). This I can't fully explain, but perhaps garbage collection issues were involved. WHEN TO USE THE NEW TIMESTAMP TYPE The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps. For example: - the timestamp of a sample itself - all kinds of watermarks - anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape()). When to still use time.Time: - for measuring durations/times not related to sample timestamps, like duration telemetry exporting, timers that indicate how frequently to execute some action, etc. NOTE ON OPERATOR OPTIMIZATION TESTS We don't use operator optimization code anymore, but it still lives in the code as dead code. It still has tests, but I couldn't get all of them to pass with the new timestamp format. I commented out the failing cases for now, but we should probably remove the dead code soon. I just didn't want to do that in the same change as this. Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f	2013-12-03 09:11:28 +01:00
Matt T. Proud	30b1cf80b5	WIP - Snapshot of Moving to Client Model.	2013-06-25 15:52:42 +02:00
Julius Volz	51689d965d	Add debug timers to instant and range queries. This adds timers around several query-relevant code blocks. For now, the query timer stats are only logged for queries initiated through the UI. In other cases (rule evaluations), the stats are simply thrown away. My hope is that this helps us understand where queries spend time, especially in cases where they sometimes hang for unusual amounts of time.	2013-06-05 18:32:54 +02:00
Julius Volz	5b105c77fc	Repointerize fingerprints.	2013-05-21 14:28:14 +02:00
Julius Volz	ce1ee444f1	Synchronous memory appends and more fine-grained storage locks. This does two things: 1) Make TieredStorage.AppendSamples() write directly to memory instead of buffering to a channel first. This is needed in cases where a rule might immediately need the data generated by a previous rule. 2) Replace the single storage mutex by two new ones: - memoryMutex - needs to be locked at any time that two concurrent goroutines could be accessing (via read or write) the TieredStorage memoryArena. - memoryDeleteMutex - used to prevent any deletion of samples from memoryArena as long as renderView is running and assembling data from it. The LevelDB disk storage does not need to be protected by a mutex when rendering a view since renderView works off a LevelDB snapshot. The rationale against adding memoryMutex directly to the memory storage: taking a mutex does come with a small inherent time cost, and taking it is only required in few places. In fact, no locking is required for the memory storage instance which is part of a view (and not the TieredStorage).	2013-05-10 17:15:52 +02:00
Matt T. Proud	161c8fbf9b	Include deletion processor for long-tail values. This commit extracts the model.Values truncation behavior into the actual tiered storage, which uses it and behaves in a peculiar way—notably the retention of previous elements if the chunk were to ever go empty. This is done to enable interpolation between sparse sample values in the evaluation cycle. Nothing necessarily new here—just an extraction. Now, the model.Values TruncateBefore functionality would do what a user would expect without any surprises, which is required for the DeletionProcessor, which may decide to split a large chunk in two if it determines that the chunk contains the cut-off time.	2013-05-10 12:19:12 +02:00
Matt T. Proud	f897164bcf	Expose TieredStorage.DiskStorage.	2013-05-07 10:26:28 +02:00
Matt T. Proud	ce45787dbf	Storage interface to TieredStorage. This commit drops the Storage interface and just replaces it with a publicized TieredStorage type. Storage had been anticipated to be used as a wrapper for testability but just was not used due to practicality. Merely overengineered. My bad. Anyway, we will eventually instantiate the TieredStorage dependencies in main.go and pass them in for more intelligent lifecycle management. These changes will pave the way for managing the curators without Law of Demeter violations.	2013-05-03 15:54:14 +02:00
Julius Volz	d8110fcd9c	Send sample arrays instead of single samples over channels.	2013-04-29 17:24:17 +02:00
Julius Volz	2202cd71c9	Track alerts over time and write out alert timeseries.	2013-04-26 14:35:21 +02:00
Johannes 'fish' Ziemke	1ad41d4c00	Call closer.Close() earlier.	2013-04-25 13:29:28 +02:00
Julius Volz	99dcbe0f94	Integrate memory and disk layers in view rendering.	2013-04-19 16:01:27 +02:00
Julius Volz	95b081f9bc	Stop serving tiered storage after draining it.	2013-04-15 13:30:03 +02:00
Julius Volz	c59f3fc538	Fix formatting in tiered_test.go.	2013-03-28 12:16:31 +01:00
juliusv	39826d7335	Merge pull request #107 from prometheus/julius-fix-get-fingerprints Fix bug in GetFingerprintsForLabelSet().	2013-03-27 10:54:17 -07:00
Julius Volz	2668700e54	Fix bug in GetFingerprintsForLabelSet().	2013-03-27 18:50:30 +01:00
Matt T. Proud	c53a72a894	Test data for the curator.	2013-03-27 18:13:43 +01:00
Julius Volz	55ca65aa6e	More userfriendly output when we fail to create the tiered storage.	2013-03-27 11:25:05 +01:00
Matt T. Proud	c4e971d7d9	Merge pull request #101 from prometheus/refactor/test/directory-extraction Create temporary directory handler.	2013-03-26 10:46:28 -07:00
Matt T. Proud	b86b0ea41a	Create temporary directory handler.	2013-03-26 18:09:25 +01:00
Julius Volz	2b8f0b2cc7	Constantize metric name label name.	2013-03-26 16:20:23 +01:00
Julius Volz	e096896932	PR comment fixups.	2013-03-26 15:28:00 +01:00
Julius Volz	dd67ab115b	Change GetAllMetricNames() to GetAllValuesForLabel().	2013-03-26 14:47:07 +01:00
Julius Volz	42bdf921d1	Fetch integrated memory/disk data for simple Get* functions.	2013-03-26 14:47:07 +01:00
Julius Volz	4d79dc3602	Replace renderView() by cleaner and more correct reimplementation.	2013-03-21 18:11:02 +01:00
Julius Volz	3621148e7f	Comment out panicking test until proper support is implemented.	2013-03-21 18:08:47 +01:00
Julius Volz	2f06b8bea6	Fix tiered storage test to trigger iterator rewinding case.	2013-03-21 18:08:47 +01:00
Matt T. Proud	62b5d7ce20	Oops.	2013-03-21 18:08:46 +01:00
Matt T. Proud	34a921e16d	Checkpoint.	2013-03-21 18:08:46 +01:00

37 commits