prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Julius Volz	80b3d3bf34	Speed up disk flushes by removing unnecessary sort. The first sort in groupByFingerprint already ensures that all resulting sample lists contain only one fingerprint. We also already assume that all samples passed into AppendSamples (and thus groupByFingerprint) are chronologically sorted within each fingerprint. The extra chronological sort is thus superfluous. Furthermore, this second sort didn't only sort chronologically, but also compared all metric fingerprints again (although we already know that we're only sorting within samples for the same fingerprint). This caused a huge memory and runtime overhead. In a heavily loaded real Prometheus, this brought down disk flush times from ~9 minutes to ~1 minute. OLD: BenchmarkLevelDBAppendRepeatingValues 5 331391808 ns/op 44542953 B/op 597788 allocs/op BenchmarkLevelDBAppendsRepeatingValues 5 329893512 ns/op 46968288 B/op 3104373 allocs/op NEW: BenchmarkLevelDBAppendRepeatingValues 5 299298635 ns/op 43329497 B/op 567616 allocs/op BenchmarkLevelDBAppendsRepeatingValues 20 92204601 ns/op `1779454` B/op 70975 allocs/op Change-Id: Ie2d8db3569b0102a18010f9e106e391fda7f7883	2014-11-25 17:01:59 +01:00
Julius Volz	21cafe6cd7	Only evict memory series after they are on disk. This fixes the problem where samples become temporarily unavailable for queries while they are being flushed to disk. Although the entire flushing code could use some major refactoring, I'm explicitly trying to do the minimal change to fix the problem since there's a whole new storage implementation in the pipeline. Change-Id: I0f5393a30b88654c73567456aeaea62f8b3756d9	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	8956faeccb	Migrate to new client_golang. This change will only be submitted when the new client_golang has been moved to the new version. Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	814e479723	Treat non-200 HTTP response as error. Change-Id: I2a9f3b47012b3c4839be53aa44c66d16dd41a24a	2014-11-25 17:01:59 +01:00
Brian Brazil	e27447da5c	Remove the broken "User Dashboard" link. Due to the lack of a </a>, this makes the entire header render badly. Accordingly it's safe to assume no one is using it, so remove it. With the new console template support, we'll need to do something a bit more nuanced later. Change-Id: I3424bed6aea18cbd4c63ad48f98808098dadc3ad	2014-11-25 17:01:59 +01:00
Brian Brazil	2f76f434a5	Add humanizeDuration function. This attempts to reasonably handle things from weekly cronjobs, to rpcs taking ns to things that are usually ms but jump to over a second. For consistency, stop putting spaces before prefixes. Change-Id: I6407879187b25680b323cd70254e205315b5fc3c	2014-11-25 17:01:59 +01:00
Brian Brazil	960ede66dc	Use html/template for console templates and add template library support. Add a function to bypass the new auto-escaping. Add a function to work around Go's templates only allowing passing in one argument. Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99	2014-11-25 17:01:59 +01:00
Brian Brazil	0f5874ff97	Make Prometheus in header link to status page. This is consistent with alertmanager, and more intuitive for users. The graphs page just has graphs, so remove mention of consoles. Change-Id: I87780a4ade33697a6095423e1a7de47d341d2838	2014-11-25 17:01:59 +01:00
Brian Brazil	cd3592aebc	Add title and match functions. Change-Id: Ifd376c2935e22d378e7afa06122642847a237d78	2014-11-25 17:01:59 +01:00
Brian Brazil	1828b1f55c	Only log every query when debugging. Change-Id: I4f988d81cda6f6deb0ed7f497de4aa75409b158f	2014-11-25 17:01:59 +01:00
Brian Brazil	9b74324d9e	Add functions for regex replacement, sorting and humanizing. Change-Id: I471c7a8087cd5432b51afce811b591b11583a0c3	2014-11-25 17:01:59 +01:00
Johannes 'fish' Ziemke	d085de5a69	Add vim-common for xxd required by embed-static.sh Change-Id: Ie1c108dd49d0bbbbcdcd90719a192718ec46d2e4	2014-06-04 17:31:42 +02:00
Brian Brazil	e041c0cd46	Add console and alert templates with access to all data. Move rulemanager to its own package to break circular dependency. Make NewTestTieredStorage available to tests, remove duplication. Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03	2014-05-30 16:24:56 +01:00
Julius Volz	16ca35c07e	Prometheus version 0.5.0. Change-Id: Ibf4d07c2878a9c22d6ade639b776dcfe3b533b34	2014-05-28 16:05:14 +02:00
Bjoern Rabenstein	ca6a4fccef	Weed out our homegrown test.Tester. The Go stdlib has testing.TB now, which fulfills the exact same purpose. Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c	2014-05-21 19:27:24 +02:00
Brian Brazil	23255f1499	Fix negative Next Retrieval on status page. Change-Id: Ifa754034660a251fee71f166dbf057697ec4e872	2014-05-12 15:24:34 +01:00
Bjoern Rabenstein	da28f8dd13	Link to relevant style guidelines. Change-Id: Iac9777a62a11cb8e5c13efe5229a85e82c5039cb	2014-05-06 12:23:03 +02:00
Julius Volz	4df5c7ab18	Optimize label matcher memory and runtime behavior. This optimizes the runtime and memory allocation behavior for label matchers other than type "Equal". Instead of creating a new set for every union of fingerprints, this simply adds new fingerprints to the existing set to achieve the same effect. The current behavior made a production Prometheus unresponsive when running a NotEqual match against the "instance" label (a label with high value cardinality). BEFORE: BenchmarkGetFingerprintsForNotEqualMatcher 10 170430297 ns/op 39229944 B/op 40709 allocs/op AFTER: BenchmarkGetFingerprintsForNotEqualMatcher 5000 706260 ns/op 217717 B/op 1116 allocs/op Change-Id: Ifd78e81e7dfbf5d7249e50ad1903a5d9c42c347a	2014-05-05 11:29:17 -04:00
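The difference described in commit 4df5c7ab18 (building a new set for every union versus adding to an existing set) can be sketched as follows. This is only an illustration: `Fingerprint`, `Set`, `unionCopy`, and `addAll` are hypothetical names, not the types or functions used in the Prometheus code base.

```go
package main

import "fmt"

// Fingerprint and Set are hypothetical stand-ins for the real types.
type Fingerprint uint64
type Set map[Fingerprint]struct{}

// unionCopy allocates a fresh set for every union, which is the
// allocation-heavy approach the commit message describes replacing.
func unionCopy(a, b Set) Set {
	out := make(Set, len(a)+len(b))
	for fp := range a {
		out[fp] = struct{}{}
	}
	for fp := range b {
		out[fp] = struct{}{}
	}
	return out
}

// addAll adds the members of src to dst in place, so accumulating many
// partial results reuses a single set.
func addAll(dst, src Set) {
	for fp := range src {
		dst[fp] = struct{}{}
	}
}

func main() {
	batches := []Set{{1: {}, 2: {}}, {2: {}, 3: {}}, {4: {}}}
	acc := Set{}
	for _, b := range batches {
		addAll(acc, b)
	}
	fmt.Println(len(acc)) // prints 4
}
```

With `unionCopy` in the loop, every iteration would copy everything accumulated so far; with `addAll`, each fingerprint is inserted once per batch it appears in.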
Bjoern Rabenstein	64811caaec	Make Prometheus announce its new super-power: text format! Change-Id: Ia2ddfb28999c145e4d46c395381a9bf89d43148c	2014-04-22 18:44:52 +02:00
Tobias Schmidt	b3a78d2202	Merge "Do not indent API JSON responses."	2014-04-22 15:57:41 +02:00
Bjoern Rabenstein	de9a88b964	Ensure temporal order in streams. BenchmarkAppendSample.* before this change: BenchmarkAppendSample1 1000000 1142 ns/op --- BENCH: BenchmarkAppendSample1 memory_test.go:81: 1 cycles with 9992.000000 bytes per cycle, totalling 9992 memory_test.go:81: 100 cycles with 250.399994 bytes per cycle, totalling 25040 memory_test.go:81: 10000 cycles with 239.428802 bytes per cycle, totalling 2394288 memory_test.go:81: 1000000 cycles with 255.504684 bytes per cycle, totalling 255504688 BenchmarkAppendSample10 500000 3823 ns/op --- BENCH: BenchmarkAppendSample10 memory_test.go:81: 1 cycles with 15536.000000 bytes per cycle, totalling 15536 memory_test.go:81: 100 cycles with 662.239990 bytes per cycle, totalling 66224 memory_test.go:81: 10000 cycles with 601.937622 bytes per cycle, totalling 6019376 memory_test.go:81: 500000 cycles with 598.582764 bytes per cycle, totalling 299291408 BenchmarkAppendSample100 50000 41111 ns/op --- BENCH: BenchmarkAppendSample100 memory_test.go:81: 1 cycles with 79824.000000 bytes per cycle, totalling 79824 memory_test.go:81: 100 cycles with 4924.479980 bytes per cycle, totalling 492448 memory_test.go:81: 10000 cycles with 4278.019043 bytes per cycle, totalling 42780192 memory_test.go:81: 50000 cycles with 4275.242676 bytes per cycle, totalling 213762144 BenchmarkAppendSample1000 5000 533933 ns/op --- BENCH: BenchmarkAppendSample1000 memory_test.go:81: 1 cycles with 840224.000000 bytes per cycle, totalling 840224 memory_test.go:81: 100 cycles with 62789.281250 bytes per cycle, totalling 6278928 memory_test.go:81: 5000 cycles with 55208.601562 bytes per cycle, totalling 276043008 ok github.com/prometheus/prometheus/storage/metric/tiered 27.828s BenchmarkAppendSample.* after this change: BenchmarkAppendSample1 1000000 1109 ns/op --- BENCH: BenchmarkAppendSample1 memory_test.go:131: 1 cycles with 9992.000000 bytes per cycle, totalling 9992 memory_test.go:131: 100 cycles with 250.399994 bytes per cycle, totalling 25040 
memory_test.go:131: 10000 cycles with 239.220795 bytes per cycle, totalling 2392208 memory_test.go:131: 1000000 cycles with 255.492630 bytes per cycle, totalling 255492624 BenchmarkAppendSample10 500000 3663 ns/op --- BENCH: BenchmarkAppendSample10 memory_test.go:131: 1 cycles with 15536.000000 bytes per cycle, totalling 15536 memory_test.go:131: 100 cycles with 662.239990 bytes per cycle, totalling 66224 memory_test.go:131: 10000 cycles with 601.889587 bytes per cycle, totalling 6018896 memory_test.go:131: 500000 cycles with 598.550903 bytes per cycle, totalling 299275472 BenchmarkAppendSample100 50000 40694 ns/op --- BENCH: BenchmarkAppendSample100 memory_test.go:131: 1 cycles with 78976.000000 bytes per cycle, totalling 78976 memory_test.go:131: 100 cycles with 4928.319824 bytes per cycle, totalling 492832 memory_test.go:131: 10000 cycles with 4277.961426 bytes per cycle, totalling 42779616 memory_test.go:131: 50000 cycles with 4275.054199 bytes per cycle, totalling 213752720 BenchmarkAppendSample1000 5000 530744 ns/op --- BENCH: BenchmarkAppendSample1000 memory_test.go:131: 1 cycles with 842192.000000 bytes per cycle, totalling 842192 memory_test.go:131: 100 cycles with 62765.441406 bytes per cycle, totalling 6276544 memory_test.go:131: 5000 cycles with 55209.812500 bytes per cycle, totalling 276049056 ok github.com/prometheus/prometheus/storage/metric/tiered 27.468s Change-Id: Idaa339cd83539b5e4391614541a2c3a04002d66d	2014-04-22 15:22:54 +02:00
Julius Volz	6297a405f2	Do not indent API JSON responses. In one example response, this reduced the uncompressed size by 25% and the gzipped size by 11%. Change-Id: Ie80d44253124b9f8601b8ef9fc978e92dacff523	2014-04-22 15:16:37 +02:00
Julius Volz	b3901827ee	Fix persistence references in tools subdir. These had escaped me because the tools aren't rebuilt if there are changes outside of the respective tool itself. Change-Id: I3e69631babdd95b18e698eb79098dfa59f60f597	2014-04-17 15:28:22 +02:00
Julius Volz	d2421a6916	Prometheus version 0.4.0. Change-Id: I752044a69f86aacc5a7e6da62a6f4a187cf67d27	2014-04-17 15:10:13 +02:00
Julius Volz	1b29975865	Fix RWLock memory storage deadlock. This fixes https://github.com/prometheus/prometheus/issues/390 The cause for the deadlock was a lock semantic in Go that wasn't obvious to me when introducing this bug: http://golang.org/pkg/sync/#RWMutex.Lock Key phrase: "To ensure that the lock eventually becomes available, a blocked Lock call excludes new readers from acquiring the lock." In the memory series storage, we have one function (GetFingerprintsForLabelMatchers) acquiring an RLock(), which calls another function also acquiring the same RLock() (GetLabelValuesForLabelName). That normally doesn't deadlock, unless a Lock() call from another goroutine happens right in between the two RLock() calls, blocking both the Lock() and the second RLock() call from ever completing. GoRoutine 1 GoRoutine 2 ====================================== RLock() ... Lock() [DEADLOCK] RLock() [DEADLOCK] Unlock() RUnlock() RUnlock() Testing deadlocks is tricky, but the regression test I added does reliably detect the deadlock in the original code on my machine within a normal concurrent reader/writer run duration of 250ms. Change-Id: Ib34c2bb8df1a80af44550cc2bf5007055cdef413	2014-04-17 13:43:13 +02:00
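One common way to avoid the nested RLock() problem described in commit 1b29975865 is to let exported methods take the lock and have them call unexported helpers that assume the lock is already held. The sketch below shows that pattern; it is not taken from the commit, and the `store` type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a memory series storage.
type store struct {
	mu     sync.RWMutex
	values map[string][]string // label name -> label values
}

// labelValuesLocked assumes the caller already holds s.mu.
func (s *store) labelValuesLocked(name string) []string {
	return append([]string(nil), s.values[name]...)
}

// LabelValues takes the read lock exactly once.
func (s *store) LabelValues(name string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.labelValuesLocked(name)
}

// CountValues needs the same data. It calls the locked helper instead of
// LabelValues, so it never acquires RLock while already holding it.
func (s *store) CountValues(name string) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.labelValuesLocked(name))
}

// Add takes the write lock.
func (s *store) Add(name, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[name] = append(s.values[name], value)
}

func main() {
	s := &store{values: map[string][]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); s.Add("instance", fmt.Sprint(i)) }(i)
		go func() { defer wg.Done(); _ = s.CountValues("instance") }()
	}
	wg.Wait()
	fmt.Println(s.CountValues("instance")) // prints 100
}
```

Because no goroutine ever holds the read lock while requesting it again, a writer queued between two reads cannot wedge the reader.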
Julius Volz	01f652cb4c	Separate storage implementation from interfaces. This was initially motivated by wanting to distribute the rule checker tool under `tools/rule_checker`. However, this was not possible without also distributing the LevelDB dynamic libraries because the tool transitively depended on Levigo: rule checker -> query layer -> tiered storage layer -> leveldb This change separates external storage interfaces from the implementation (tiered storage, leveldb storage, memory storage) by putting them into separate packages: - storage/metric: public, implementation-agnostic interfaces - storage/metric/tiered: tiered storage implementation, including memory and LevelDB storage. I initially also considered splitting up the implementation into separate packages for tiered storage, memory storage, and LevelDB storage, but these are currently so intertwined that it would be another major project in itself. The query layers and most other parts of Prometheus now have no notion of the storage implementation anymore and just use whatever implementation they get passed in via interfaces. The rule_checker is now a static binary :) Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0	2014-04-16 13:30:19 +02:00
Matt T. Proud	3e969a8ca2	Parameterize the buffer for marshal/unmarshal. We are not reusing buffers yet. This could introduce problems, so the behavior is disabled for now. Cursory benchmark data: - Marshal for 10,000 samples: -30% overhead. - Unmarshal for 10,000 samples: -15% overhead. Change-Id: Ib006bdc656af45dca2b92de08a8f905d8d728cac	2014-04-16 12:16:59 +02:00
Matt T. Proud	2064f32662	Clean up quitting behavior and add quit trigger. The closing of Prometheus now uses a sync.Once wrapper to prevent any accidental multiple invocations of it, which could trigger corruption or a race condition. The shutdown process is made more verbose through logging. A not-enabled by default web handler has been provided to trigger a remote shutdown if requested for debugging purposes. Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e	2014-04-15 21:40:04 +02:00
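The sync.Once guard mentioned in commit 2064f32662 can be illustrated with a minimal sketch. The `server` type and its fields are hypothetical; only `sync.Once` itself comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for a process with a shutdown routine.
type server struct {
	closeOnce sync.Once
	closed    int // counts how often the shutdown body actually ran
}

// Close may be called from many goroutines; the body runs at most once.
func (s *server) Close() {
	s.closeOnce.Do(func() {
		s.closed++ // real code would flush and close storage here
	})
}

func main() {
	s := &server{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); s.Close() }()
	}
	wg.Wait()
	fmt.Println(s.closed) // prints 1
}
```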
Matt T. Proud	58ef638e72	Merge "Use idiomatic one-to-many one-time signal pattern."	2014-04-15 21:26:31 +02:00
Matt T. Proud	6ec72393c4	Correct size of unmarshalling destination buffer. The format header size is not deducted from the size of the byte stream when calculating the output buffer size for samples. I have yet to notice problems directly as a result of this, but it is good to fix. Change-Id: Icb07a0718366c04ddac975d738a6305687773af0	2014-04-15 11:55:44 +02:00
Matt T. Proud	81367893fd	Use idiomatic one-to-many one-time signal pattern. The idiomatic pattern for signalling a one-time message to multiple consumers from a single producer is as follows: ``` c := make(chan struct{}) w := new(sync.WaitGroup) // Boilerplate to ensure synchronization. for i := 0; i < 1000; i++ { w.Add(1) go func() { defer w.Done() for { select { case _, ok := <- c: if !ok { return } default: // Do something here. } } }() } close(c) // Signal the one-to-many single-use message. w.Wait() ``` Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b	2014-04-15 10:15:25 +02:00
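A self-contained variant of the pattern quoted in commit 81367893fd, reformatted so it compiles on its own. The function name `runWorkers`, the counter, and the `runtime.Gosched()` placeholder for real work are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runWorkers starts n goroutines, signals all of them to stop by closing
// one channel, and returns how many observed the signal.
func runWorkers(n int) int64 {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	var stopped int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-stop:
					// A receive on a closed channel returns
					// immediately for every receiver.
					atomic.AddInt64(&stopped, 1)
					return
				default:
					runtime.Gosched() // placeholder for real work
				}
			}
		}()
	}
	close(stop) // the one-to-many, single-use signal
	wg.Wait()
	return atomic.LoadInt64(&stopped)
}

func main() {
	fmt.Println(runWorkers(1000))
}
```

Closing the channel, rather than sending on it, is what makes the signal reach every consumer: a send would wake only one receiver.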
Matt T. Proud	1d01435d4d	Make curation semaphore behavior idiomatic. Idiomatic semaphore usage in Go, unless it is wrapping a concrete type, should use anonymous empty structs (``struct{}``). This has several features that are worthwhile: 1. It conveys that the object in the channel is likely used for resource limiting / semaphore use. This is by idiom. 2. Due to magic under the hood, empty structs have a width of zero, meaning they consume little space. It is presumed that slices, channels, and other values of them can be represented specially with alternative optimizations. Dmitry Vyukov has done investigations into improvements that can be made to the channel design and Go and concludes that there are already nice short circuiting behaviors at work with this type. This is the first change of several that apply this type of change to suitable places. In this one change, we fix a bug in the previous revision, whereby a semaphore can be acquired for curation and never released back for subsequent work: http://goo.gl/70Y2qK. Compare that versus the compaction definition above. On top of that, the use of the semaphore in the mode better supports system shutdown idioms through the closing of channels. Change-Id: Idb4fca310f26b73c9ec690bbdd4136180d14c32d	2014-04-14 22:51:58 +02:00
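The semaphore idiom described in commit 1d01435d4d can be sketched with a buffered channel of empty structs. The `semaphore` type, its `acquire`/`release` methods, and `maxConcurrency` are hypothetical; the commit does not show the actual curation code, and the real implementation may acquire by receiving rather than sending.

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a hypothetical counting semaphore; its capacity is the limit.
type semaphore chan struct{}

func (s semaphore) acquire() { s <- struct{}{} }
func (s semaphore) release() { <-s }

// maxConcurrency runs tasks goroutines under the given limit and returns
// the highest number that were inside the guarded section at once.
func maxConcurrency(limit, tasks int) int {
	sem := make(semaphore, limit)
	var (
		mu      sync.Mutex
		running int
		peak    int
		wg      sync.WaitGroup
	)
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.acquire()
			// Deferring the release guards against the kind of bug the
			// commit mentions: a slot acquired and never given back.
			defer sem.release()

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(maxConcurrency(2, 50) <= 2) // prints true
}
```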
Matt T. Proud	e9eda76192	Fix Mac OS X build since we upgraded to go1.2. Since go1.2, the release engineers have keyed their release artifacts to the major release family of Mac OS X. Change-Id: Ia4bf0c86af9884748e21be14ab6e09f01a830e19	2014-04-14 21:16:30 +02:00
Björn Rabenstein	95bc920f5f	Merge "Allow reversing vector and scalar arguments in binops."	2014-04-08 17:17:42 +02:00
Julius Volz	d411a7d810	Allow reversing vector and scalar arguments in binops. This allows putting a scalar as the first argument of a binary operator in which the second argument is a vector: <scalar> <binop> <vector> For example, 1 / http_requests_total ...will output a vector in which every sample value is 1 divided by the respective input vector element. This even works for filter binary operators now: 1 == http_requests_total Returns a vector with all values set to 1 for every element in http_requests_total whose initial value was 1. Note: For filter binary operators, the resulting values are always taken from the left-hand-side of the operation, no matter whether the scalar or the vector argument is the left-hand-side. That is, 1 != http_requests_total ...will set all result vector sample values to 1, although these are exactly the sample elements that were != 1 in the input vector. If you want to just filter elements without changing their sample values, you still need to do: http_requests_total != 1 The new filter form is a bit exotic, and so probably won't be used often. But it was easier to implement it than disallow it completely or change its behavior. Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5	2014-04-08 17:16:18 +02:00
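The semantics described in commit d411a7d810 can be modelled in a few lines. This is a toy model of the behavior as stated in the commit message, not the Prometheus evaluator; `Sample` and the three functions are hypothetical names.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified vector element.
type Sample struct {
	Labels string
	Value  float64
}

// scalarDivVector models `<scalar> / <vector>`: every output value is the
// scalar divided by the corresponding input value.
func scalarDivVector(s float64, v []Sample) []Sample {
	out := make([]Sample, 0, len(v))
	for _, e := range v {
		out = append(out, Sample{Labels: e.Labels, Value: s / e.Value})
	}
	return out
}

// scalarNeqVector models `<scalar> != <vector>`: elements are kept when the
// comparison holds, and the value is taken from the left-hand side, which
// here is the scalar.
func scalarNeqVector(s float64, v []Sample) []Sample {
	out := make([]Sample, 0, len(v))
	for _, e := range v {
		if s != e.Value {
			out = append(out, Sample{Labels: e.Labels, Value: s})
		}
	}
	return out
}

// vectorNeqScalar models `<vector> != <scalar>`: the left-hand side is the
// vector, so the surviving elements keep their original values.
func vectorNeqScalar(v []Sample, s float64) []Sample {
	out := make([]Sample, 0, len(v))
	for _, e := range v {
		if e.Value != s {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	v := []Sample{{"a", 1}, {"b", 2}, {"c", 4}}
	fmt.Println(scalarDivVector(1, v))
	fmt.Println(scalarNeqVector(1, v))
	fmt.Println(vectorNeqScalar(v, 1))
}
```

For the input above, both filter forms keep elements `b` and `c`, but only `vectorNeqScalar` preserves their values 2 and 4; `scalarNeqVector` sets both to 1.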
Julius Volz	26bbca4270	Prometheus version 0.3.0. Change-Id: Iebfbbf033140ff82ac051d22c885916763768f8c	2014-04-05 01:49:46 +02:00
Julius Volz	c7c0b33d0b	Add regex-matching support for labels. There are four label-matching ops for selecting timeseries now: - Equal: = - NotEqual: != - RegexMatch: =~ - RegexNoMatch: !~ Instead of looking up labels by a simple clientmodel.LabelSet (basically an equals op for every key/value pair in the set), timeseries fingerprint selection is now done via a list of metric.LabelMatchers. Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96	2014-04-01 14:24:53 +02:00
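The four matching operations listed in commit c7c0b33d0b can be sketched as below. The type and constructor here are simplified, hypothetical versions and not the actual `metric.LabelMatcher` API; in particular, anchoring the regular expression to the whole label value is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// MatchType enumerates the four operations named in the commit message.
type MatchType int

const (
	Equal        MatchType = iota // =
	NotEqual                      // !=
	RegexMatch                    // =~
	RegexNoMatch                  // !~
)

// LabelMatcher is a hypothetical, simplified matcher for one label.
type LabelMatcher struct {
	Type  MatchType
	Name  string
	Value string
	re    *regexp.Regexp
}

// NewLabelMatcher compiles the pattern for the regex types.
// Assumption: the pattern must match the entire label value.
func NewLabelMatcher(t MatchType, name, value string) (*LabelMatcher, error) {
	m := &LabelMatcher{Type: t, Name: name, Value: value}
	if t == RegexMatch || t == RegexNoMatch {
		re, err := regexp.Compile("^(?:" + value + ")$")
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// Matches reports whether the given label value satisfies the matcher.
func (m *LabelMatcher) Matches(v string) bool {
	switch m.Type {
	case Equal:
		return v == m.Value
	case NotEqual:
		return v != m.Value
	case RegexMatch:
		return m.re.MatchString(v)
	case RegexNoMatch:
		return !m.re.MatchString(v)
	}
	return false
}

func main() {
	m, err := NewLabelMatcher(RegexMatch, "job", "api.*")
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Matches("api-server"), m.Matches("my-api"))
}
```

A selector then becomes a list of such matchers, and a time series is selected only if every matcher accepts the value of its label.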
Julius Volz	ae30453214	Add label names -> label values index. Change-Id: Ie39b4044558afc4d1aa937de7dcf8df61f821fb4	2014-03-28 15:16:37 +01:00
Julius Volz	71d2ff406d	Prometheus version 0.2.1. Change-Id: I288f88390d3eee45bf684647337e7bfff11def6a	2014-03-26 13:29:10 +01:00
Julius Volz	7a577b86b7	Fix interval op special case. In the case that a getValuesAtIntervalOp's ExtractSamples() is called with a current time after the last chunk time, we return without extracting any further values beyond the last one in the chunk (correct), but also without advancing the op's time (incorrect). This leads to an infinite loop in renderView(), since the op is called repeatedly without ever being advanced and consumed. This adds handling for this special case. When detecting this case, we immediately set the op to be consumed, since we would always get a value after the current time passed in if there was one. Change-Id: Id99149e07b5188d655331382b8b6a461b677005c	2014-03-26 13:29:03 +01:00
Bjoern Rabenstein	257b720e87	Fix typo. Change-Id: I6e7edcb48ace7fe4d6de4ff16519da5bb326b6ce	2014-03-25 12:22:18 +01:00
Bjoern Rabenstein	caf47b2fbc	New encoding for OpenTSDB tag values (and metric names). Change-Id: I0f4393f638c6e2bb2b2ce14e58e38b49ce456da8	2014-03-21 17:18:44 +01:00
Bjoern Rabenstein	0a65b691cc	Disallow ":" in identifiers, but still allow it in metric names. Change-Id: Iace925ab1b71a360bd63357e87f68e727f7afbcb	2014-03-21 13:44:37 +01:00
Julius Volz	0e7596b653	Prometheus version 0.2.0. Change-Id: I4ecc8b909fc90378d855ea3620e1f6f75cc53b6d	2014-03-18 17:21:53 +01:00
Julius Volz	9d5c367745	Fix incorrect interval op advancement. This fixes a bug where an interval op might advance too far past the end of the currently extracted chunk, effectively skipping over relevant (to-be-extracted) values in the subsequent chunk. The result: missing samples at chunk boundaries in the resulting view. Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8	2014-03-18 16:22:50 +01:00
Julius Volz	cc04238a85	Switch to new "__name__" metric name label. This also fixes the compaction test, which before worked only because the input sample sorting was accidentally equal to the resulting on-disk sample sorting. Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006	2014-03-14 16:52:37 +01:00
Bjoern Rabenstein	c3b282bd14	Add regression tests for 'loop until op is consumed' bug. - Most of this is the actual regression test in tiered_test.go. - Working on that regression test uncovered problems in tiered_test.go that are fixed in this commit. - The 'op.consumed = false' line added to freelist.go was actually not fixing a bug. Instead, there was no bug at all. So this commit removes that line again, but adds a regression test to make sure that the assumed bug is indeed not there (cf. freelist_test.go). - Removed more code duplication in operation.go (following the same approach as before, i.e. embedding op type A into op type B if everything in A is the same as in B with the exception of String() and ExtractSample()). (This change makes struct literals for ops more clunky, but that only affects tests. No code change whatsoever was necessary in the actual code after this refactoring.) - Fix another op leak in tiered.go. Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517	2014-03-12 18:40:24 +01:00
Björn Rabenstein	b470fb0672	Merge "Convert metric.Values to slice of values."	2014-03-11 18:44:04 +01:00
Björn Rabenstein	b7ba349ca8	Merge "Introduce semantic versioning."	2014-03-11 18:34:21 +01:00
Julius Volz	86fc13a52e	Convert metric.Values to slice of values. The initial impetus for this was that it made unmarshalling sample values much faster. Other relevant benchmark changes in ns/op: Benchmark old new speedup ================================================================== BenchmarkMarshal 179170 127996 1.4x BenchmarkUnmarshal 404984 132186 3.1x BenchmarkMemoryGetValueAtTime 57801 50050 1.2x BenchmarkMemoryGetBoundaryValues 64496 53194 1.2x BenchmarkMemoryGetRangeValues 66585 54065 1.2x BenchmarkStreamAdd 45.0 75.3 0.6x BenchmarkAppendSample1 1157 1587 0.7x BenchmarkAppendSample10 4090 4284 0.95x BenchmarkAppendSample100 45660 44066 1.0x BenchmarkAppendSample1000 579084 582380 1.0x BenchmarkMemoryAppendRepeatingValues 22796594 22005502 1.0x Overall, this gives us good speedups in the areas where they matter most: decoding values from disk and accessing the memory storage (which is also used for views). Some of the smaller append examples take minimally longer, but the cost seems to get amortized over larger appends, so I'm not worried about these. Also, we're currently not bottlenecked on the write path and have plenty of other optimizations available in that area if it becomes necessary. Memory allocations during appends don't change measurably at all. Change-Id: I7dc7394edea09506976765551f35b138518db9e8	2014-03-11 18:23:37 +01:00

961 commits