The first sort in groupByFingerprint already ensures that all resulting sample
lists contain only one fingerprint. We also already assume that all
samples passed into AppendSamples (and thus groupByFingerprint) are
chronologically sorted within each fingerprint.
The extra chronological sort is thus superfluous. Furthermore, this
second sort didn't only sort chronologically, but also compared all
metric fingerprints again (although we already know that we're only
sorting within samples for the same fingerprint). This caused a huge
memory and runtime overhead.
In a heavily loaded real Prometheus, this brought down disk flush times
from ~9 minutes to ~1 minute.
OLD:
BenchmarkLevelDBAppendRepeatingValues 5 331391808 ns/op 44542953 B/op 597788 allocs/op
BenchmarkLevelDBAppendsRepeatingValues 5 329893512 ns/op 46968288 B/op 3104373 allocs/op
NEW:
BenchmarkLevelDBAppendRepeatingValues 5 299298635 ns/op 43329497 B/op 567616 allocs/op
BenchmarkLevelDBAppendsRepeatingValues 20 92204601 ns/op 1779454 B/op 70975 allocs/op
Change-Id: Ie2d8db3569b0102a18010f9e106e391fda7f7883
This fixes the problem where samples become temporarily unavailable for
queries while they are being flushed to disk. Although the entire
flushing code could use some major refactoring, I'm explicitly trying to
do the minimal change to fix the problem since there's a whole new
storage implementation in the pipeline.
Change-Id: I0f5393a30b88654c73567456aeaea62f8b3756d9
Due to the lack of a </a>, this makes the entire header render badly.
Accordingly it's safe to assume noone is using it, so remove it.
With the new console template support, we'll need to something a bit
more nuanced later.
Change-Id: I3424bed6aea18cbd4c63ad48f98808098dadc3ad
This attempts to reasonably handle things from weekly cronjobs,
to rpcs taking ns to things that are usually ms but jump to over a second.
For consistency, stop putting spaces before prefixes.
Change-Id: I6407879187b25680b323cd70254e205315b5fc3c
Add a function to bypass the new auto-escaping.
Add a function to workaround go's templates only allowing passing in one argument.
Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99
This is consistent with alertmanager, and more intiutive for users.
The graphs page just has graphs, so remove mention of consoles.
Change-Id: I87780a4ade33697a6095423e1a7de47d341d2838
Move rulemanager to it's own package to break cicrular dependency.
Make NewTestTieredStorage available to tests, remove duplication.
Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
This optimizes the runtime and memory allocation behavior for label matchers
other than type "Equal". Instead of creating a new set for every union of
fingerprints, this simply adds new fingerprints to the existing set to achieve
the same effect.
The current behavior made a production Prometheus unresponsive when running a
NotEqual match against the "instance" label (a label with high value
cardinality).
BEFORE:
BenchmarkGetFingerprintsForNotEqualMatcher 10 170430297 ns/op 39229944 B/op 40709 allocs/op
AFTER:
BenchmarkGetFingerprintsForNotEqualMatcher 5000 706260 ns/op 217717 B/op 1116 allocs/op
Change-Id: Ifd78e81e7dfbf5d7249e50ad1903a5d9c42c347a
These had escaped me because the tools aren't rebuilt if there are
changes outside of the respective tool itself.
Change-Id: I3e69631babdd95b18e698eb79098dfa59f60f597
This fixes https://github.com/prometheus/prometheus/issues/390
The cause for the deadlock was a lock semantic in Go that wasn't
obvious to me when introducing this bug:
http://golang.org/pkg/sync/#RWMutex.Lock
Key phrase: "To ensure that the lock eventually becomes available, a
blocked Lock call excludes new readers from acquiring the lock."
In the memory series storage, we have one function
(GetFingerprintsForLabelMatchers) acquiring an RLock(), which calls
another function also acquiring the same RLock()
(GetLabelValuesForLabelName). That normally doesn't deadlock, unless a
Lock() call from another goroutine happens right in between the two
RLock() calls, blocking both the Lock() and the second RLock() call from
ever completing.
GoRoutine 1 GoRoutine 2
======================================
RLock()
... Lock() [DEADLOCK]
RLock() [DEADLOCK] Unlock()
RUnlock()
RUnlock()
Testing deadlocks is tricky, but the regression test I added does
reliably detect the deadlock in the original code on my machine within a
normal concurrent reader/writer run duration of 250ms.
Change-Id: Ib34c2bb8df1a80af44550cc2bf5007055cdef413
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:
rule checker -> query layer -> tiered storage layer -> leveldb
This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:
- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
and LevelDB storage.
I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.
The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.
The rule_checker is now a static binary :)
Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
We are not reusing buffers yet. This could introduce problems,
so the behavior is disabled for now.
Cursory benchmark data:
- Marshal for 10,000 samples: -30% overhead.
- Unmarshal for 10,000 samples: -15% overhead.
Change-Id: Ib006bdc656af45dca2b92de08a8f905d8d728cac
The closing of Prometheus now using a sync.Once wrapper to prevent
any accidental multiple invocations of it, which could trigger
corruption or a race condition. The shutdown process is made more
verbose through logging.
A not-enabled by default web handler has been provided to trigger a
remote shutdown if requested for debugging purposes.
Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e
The format header size is not deducted from the size of the byte
stream when calculating the output buffer size for samples. I have
yet to notice problems directly as a result of this, but it is good
to fix.
Change-Id: Icb07a0718366c04ddac975d738a6305687773af0
The idiomatic pattern for signalling a one-time message to multiple
consumers from a single producer is as follows:
```
c := make(chan struct{})
w := new(sync.WaitGroup) // Boilerplate to ensure synchronization.
for i := 0; i < 1000; i++ {
w.Add(1)
go func() {
defer w.Done()
for {
select {
case _, ok := <- c:
if !ok {
return
}
default:
// Do something here.
}
}
}()
}
close(c) // Signal the one-to-many single-use message.
w.Wait()
```
Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
Idiomatic semaphore usage in Go, unless it is wrapping a concrete type,
should use anonymous empty structs (``struct{}``). This has several
features that are worthwhile:
1. It conveys that the object in the channel is likely used for
resource limiting / semaphore use. This is by idiom.
2. Due to magic under the hood, empty structs have a width of zero,
meaning they consume little space. It is presumed that slices,
channels, and other values of them can be represented specially
with alternative optimizations. Dmitry Vyukov has done
investigations into improvements that can be made to the channel
design and Go and concludes that there are already nice short
circuiting behaviors at work with this type.
This is the first change of several that apply this type of change to
suitable places.
In this one change, we fix a bug in the previous revision, whereby a
semaphore can be acquired for curation and never released back for
subsequent work: http://goo.gl/70Y2qK. Compare that versus the
compaction definition above.
On top of that, the use of the semaphore in the mode better supports
system shutdown idioms through the closing of channels.
Change-Id: Idb4fca310f26b73c9ec690bbdd4136180d14c32d
Since go1.2, the release engineers have keyed their release
artifacts to the major release family of Mac OS X.
Change-Id: Ia4bf0c86af9884748e21be14ab6e09f01a830e19
This allows putting a scalar as the first argument of a binary operator
in which the second argument is a vector:
<scalar> <binop> <vector>
For example,
1 / http_requests_total
...will output a vector in which every sample value is 1 divided by the
respective input vector element.
This even works for filter binary operators now:
1 == http_requests_total
Returns a vector with all values set to 1 for every element in
http_requests_total whose initial value was 1.
Note: For filter binary operators, the resulting values are always taken
from the left-hand-side of the operation, no matter whether the scalar
or the vector argument is the left-hand-side. That is,
1 != http_requests_total
...will set all result vector sample values to 1, although these are
exactly the sample elements that were != 1 in the input vector.
If you want to just filter elements without changing their sample
values, you still need to do:
http_requests_total != 1
The new filter form is a bit exotic, and so probably won't be used
often. But it was easier to implement it than disallow it completely or
change its behavior.
Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5
There are four label-matching ops for selecting timeseries now:
- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~
Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.
Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96
In the case that a getValuesAtIntervalOp's ExtractSamples() is called
with a current time after the last chunk time, we return without
extracting any further values beyond the last one in the chunk
(correct), but also without advancing the op's time (incorrect). This
leads to an infinite loop in renderView(), since the op is called
repeatedly without ever being advanced and consumed.
This adds handling for this special case. When detecting this case, we
immediately set the op to be consumed, since we would always get a value
after the current time passed in if there was one.
Change-Id: Id99149e07b5188d655331382b8b6a461b677005c
This fixes a bug where an interval op might advance too far past the end
of the currently extracted chunk, effectively skipping over relevant
(to-be-extracted) values in the subsequent chunk. The result: missing
samples at chunk boundaries in the resulting view.
Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8
This also fixes the compaction test, which before worked only because
the input sample sorting was accidentally equal to the resulting on-disk
sample sorting.
Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
- Most of this is the actual regression test in tiered_test.go.
- Working on that regression tests uncovered problems in
tiered_test.go that are fixed in this commit.
- The 'op.consumed = false' line added to freelist.go was actually not
fixing a bug. Instead, there was no bug at all. So this commit
removes that line again, but adds a regression test to make sure
that the assumed bug is indeed not there (cf. freelist_test.go).
- Removed more code duplication in operation.go (following the same
approach as before, i.e. embedding op type A into op type B if
everything in A is the same as in B with the exception of String()
and ExtractSample()). (This change make struct literals for ops more
clunky, but that only affects tests. No code change whatsoever was
necessary in the actual code after this refactoring.)
- Fix another op leak in tiered.go.
Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517
The initial impetus for this was that it made unmarshalling sample
values much faster.
Other relevant benchmark changes in ns/op:
Benchmark old new speedup
==================================================================
BenchmarkMarshal 179170 127996 1.4x
BenchmarkUnmarshal 404984 132186 3.1x
BenchmarkMemoryGetValueAtTime 57801 50050 1.2x
BenchmarkMemoryGetBoundaryValues 64496 53194 1.2x
BenchmarkMemoryGetRangeValues 66585 54065 1.2x
BenchmarkStreamAdd 45.0 75.3 0.6x
BenchmarkAppendSample1 1157 1587 0.7x
BenchmarkAppendSample10 4090 4284 0.95x
BenchmarkAppendSample100 45660 44066 1.0x
BenchmarkAppendSample1000 579084 582380 1.0x
BenchmarkMemoryAppendRepeatingValues 22796594 22005502 1.0x
Overall, this gives us good speedups in the areas where they matter
most: decoding values from disk and accessing the memory storage (which
is also used for views).
Some of the smaller append examples take minimally longer, but the cost
seems to get amortized over larger appends, so I'm not worried about
these. Also, we're currently not bottlenecked on the write path and have
plenty of other optimizations available in that area if it becomes
necessary.
Memory allocations during appends don't change measurably at all.
Change-Id: I7dc7394edea09506976765551f35b138518db9e8