This fixes https://github.com/prometheus/prometheus/issues/390
The cause for the deadlock was a lock semantic in Go that wasn't
obvious to me when introducing this bug:
http://golang.org/pkg/sync/#RWMutex.Lock
Key phrase: "To ensure that the lock eventually becomes available, a
blocked Lock call excludes new readers from acquiring the lock."
In the memory series storage, we have one function
(GetFingerprintsForLabelMatchers) acquiring an RLock(), which calls
another function also acquiring the same RLock()
(GetLabelValuesForLabelName). That normally doesn't deadlock, unless a
Lock() call from another goroutine happens right in between the two
RLock() calls, blocking both the Lock() and the second RLock() call from
ever completing.
GoRoutine 1 GoRoutine 2
======================================
RLock()
... Lock() [DEADLOCK]
RLock() [DEADLOCK] Unlock()
RUnlock()
RUnlock()
Testing deadlocks is tricky, but the regression test I added does
reliably detect the deadlock in the original code on my machine within a
normal concurrent reader/writer run duration of 250ms.
Change-Id: Ib34c2bb8df1a80af44550cc2bf5007055cdef413
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:
rule checker -> query layer -> tiered storage layer -> leveldb
This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:
- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
and LevelDB storage.
I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.
The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.
The rule_checker is now a static binary :)
Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
We are not reusing buffers yet. This could introduce problems,
so the behavior is disabled for now.
Cursory benchmark data:
- Marshal for 10,000 samples: -30% overhead.
- Unmarshal for 10,000 samples: -15% overhead.
Change-Id: Ib006bdc656af45dca2b92de08a8f905d8d728cac
The closing of Prometheus now using a sync.Once wrapper to prevent
any accidental multiple invocations of it, which could trigger
corruption or a race condition. The shutdown process is made more
verbose through logging.
A not-enabled by default web handler has been provided to trigger a
remote shutdown if requested for debugging purposes.
Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e
The format header size is not deducted from the size of the byte
stream when calculating the output buffer size for samples. I have
yet to notice problems directly as a result of this, but it is good
to fix.
Change-Id: Icb07a0718366c04ddac975d738a6305687773af0
The idiomatic pattern for signalling a one-time message to multiple
consumers from a single producer is as follows:
```
c := make(chan struct{})
w := new(sync.WaitGroup) // Boilerplate to ensure synchronization.
for i := 0; i < 1000; i++ {
w.Add(1)
go func() {
defer w.Done()
for {
select {
case _, ok := <- c:
if !ok {
return
}
default:
// Do something here.
}
}
}()
}
close(c) // Signal the one-to-many single-use message.
w.Wait()
```
Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
Idiomatic semaphore usage in Go, unless it is wrapping a concrete type,
should use anonymous empty structs (``struct{}``). This has several
features that are worthwhile:
1. It conveys that the object in the channel is likely used for
resource limiting / semaphore use. This is by idiom.
2. Due to magic under the hood, empty structs have a width of zero,
meaning they consume little space. It is presumed that slices,
channels, and other values of them can be represented specially
with alternative optimizations. Dmitry Vyukov has done
investigations into improvements that can be made to the channel
design and Go and concludes that there are already nice short
circuiting behaviors at work with this type.
This is the first change of several that apply this type of change to
suitable places.
In this one change, we fix a bug in the previous revision, whereby a
semaphore can be acquired for curation and never released back for
subsequent work: http://goo.gl/70Y2qK. Compare that versus the
compaction definition above.
On top of that, the use of the semaphore in the mode better supports
system shutdown idioms through the closing of channels.
Change-Id: Idb4fca310f26b73c9ec690bbdd4136180d14c32d
Since go1.2, the release engineers have keyed their release
artifacts to the major release family of Mac OS X.
Change-Id: Ia4bf0c86af9884748e21be14ab6e09f01a830e19
This allows putting a scalar as the first argument of a binary operator
in which the second argument is a vector:
<scalar> <binop> <vector>
For example,
1 / http_requests_total
...will output a vector in which every sample value is 1 divided by the
respective input vector element.
This even works for filter binary operators now:
1 == http_requests_total
Returns a vector with all values set to 1 for every element in
http_requests_total whose initial value was 1.
Note: For filter binary operators, the resulting values are always taken
from the left-hand-side of the operation, no matter whether the scalar
or the vector argument is the left-hand-side. That is,
1 != http_requests_total
...will set all result vector sample values to 1, although these are
exactly the sample elements that were != 1 in the input vector.
If you want to just filter elements without changing their sample
values, you still need to do:
http_requests_total != 1
The new filter form is a bit exotic, and so probably won't be used
often. But it was easier to implement it than disallow it completely or
change its behavior.
Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5
There are four label-matching ops for selecting timeseries now:
- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~
Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.
Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96
In the case that a getValuesAtIntervalOp's ExtractSamples() is called
with a current time after the last chunk time, we return without
extracting any further values beyond the last one in the chunk
(correct), but also without advancing the op's time (incorrect). This
leads to an infinite loop in renderView(), since the op is called
repeatedly without ever being advanced and consumed.
This adds handling for this special case. When detecting this case, we
immediately set the op to be consumed, since we would always get a value
after the current time passed in if there was one.
Change-Id: Id99149e07b5188d655331382b8b6a461b677005c
This fixes a bug where an interval op might advance too far past the end
of the currently extracted chunk, effectively skipping over relevant
(to-be-extracted) values in the subsequent chunk. The result: missing
samples at chunk boundaries in the resulting view.
Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8
This also fixes the compaction test, which before worked only because
the input sample sorting was accidentally equal to the resulting on-disk
sample sorting.
Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
- Most of this is the actual regression test in tiered_test.go.
- Working on that regression tests uncovered problems in
tiered_test.go that are fixed in this commit.
- The 'op.consumed = false' line added to freelist.go was actually not
fixing a bug. Instead, there was no bug at all. So this commit
removes that line again, but adds a regression test to make sure
that the assumed bug is indeed not there (cf. freelist_test.go).
- Removed more code duplication in operation.go (following the same
approach as before, i.e. embedding op type A into op type B if
everything in A is the same as in B with the exception of String()
and ExtractSample()). (This change make struct literals for ops more
clunky, but that only affects tests. No code change whatsoever was
necessary in the actual code after this refactoring.)
- Fix another op leak in tiered.go.
Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517
The initial impetus for this was that it made unmarshalling sample
values much faster.
Other relevant benchmark changes in ns/op:
Benchmark old new speedup
==================================================================
BenchmarkMarshal 179170 127996 1.4x
BenchmarkUnmarshal 404984 132186 3.1x
BenchmarkMemoryGetValueAtTime 57801 50050 1.2x
BenchmarkMemoryGetBoundaryValues 64496 53194 1.2x
BenchmarkMemoryGetRangeValues 66585 54065 1.2x
BenchmarkStreamAdd 45.0 75.3 0.6x
BenchmarkAppendSample1 1157 1587 0.7x
BenchmarkAppendSample10 4090 4284 0.95x
BenchmarkAppendSample100 45660 44066 1.0x
BenchmarkAppendSample1000 579084 582380 1.0x
BenchmarkMemoryAppendRepeatingValues 22796594 22005502 1.0x
Overall, this gives us good speedups in the areas where they matter
most: decoding values from disk and accessing the memory storage (which
is also used for views).
Some of the smaller append examples take minimally longer, but the cost
seems to get amortized over larger appends, so I'm not worried about
these. Also, we're currently not bottlenecked on the write path and have
plenty of other optimizations available in that area if it becomes
necessary.
Memory allocations during appends don't change measurably at all.
Change-Id: I7dc7394edea09506976765551f35b138518db9e8
This introduces semantic versioning (http://semver.org/) in Prometheus:
- A new VERSION file contains the semantic version string.
- The "tarball" target now includes versioning and build information in
the tarball name, like: "prometheus-0.1.0.linux-amd64.tar.gz".
- A new "release" target allows scp-ing the versioned tarball to a
remote machine (file server).
- A new "tag" target allows git-tagging the current revision with the
version specified in VERSION.
Change-Id: I1f19f38b9b317bfa9eb513754750df5a9c602d94
This doesn't add complex discriminator logic yet, but adds a single
version byte to the beginning of each samples chunk. If we ever need to
change the disk format again, this will make it easy to do so without
having to wipe the entire database.
Change-Id: I60c39274256f790bc2da83167a1effaa174588fe
This fixes https://github.com/prometheus/prometheus/issues/381.
For any stale series we dropped from memory, this bug caused us to also drop
any other series from the labelpair->fingerprints memory index if they had any
label/value-pairs in common with the intentionally dropped series.
To fix this issue more easily, I converted the labelpair->fingerprints index
map values to a utility.Set of clientmodel.Fingerprints. This makes handling
this index much easier in general.
Change-Id: If5e81e202e8c542261bbd9797aa1257376c5c074
Currently, rendering a view is capable of handling multiple ops for
the same fingerprint efficiently. However, this capability requires a
lot of complexity in the code, which we are not using at all because
the way we assemble a viewRequest will never have more than one
operation per fingerprint.
This commit weeds out the said capability, along with all the code
needed for it. It is still possible to have more than one operation
for the same fingerprint, it will just be handled in a less efficient
way (as proven by the unit tests).
As a result, scanjob.go could be removed entirely.
This commit also contains a few related refactorings and removals of
dead code in operation.go, view,go, and freelist.go. Also, the
docstrings received some love.
Change-Id: I032b976e0880151c3f3fdb3234fb65e484f0e2e5
We have seven different types all called like LevelDB.*Options. One
of them is the plain LevelDBOptions. All others are just wrapping that
type without adding anything except clunkier handling.
If there ever was a plan to add more specific options to the various
LevelDB.*Options types, history has proven that nothing like that is
going to happen anytime soon.
To keep the code a bit shorter and more focused on the real (quite
significant) complexities we have to deal with here, this commit
reduces all uses of LevelDBOptions to the actual LevelDBOptions type.
1576 fewer characters to read...
Change-Id: I3d7a2b7ffed78b337aa37f812c53c058329ecaa6
- Mostly docstring fixed/additions.
(Please review these carefully, since most of them were missing, I
had to guess them from an outsider's perspective. (Which on the
other hand proves how desperately required many of these docstrings
are.))
- Removed all uses of new(...) to meet our own style guide (draft).
- Fixed all other 'go vet' and 'golint' issues (except those that are
not fixable (i.e. caused by bugs in or by design of 'go vet' and
'golint')).
- Some trivial refactorings, like reorder functions, minor renames, ...
- Some slightly less trivial refactoring, mostly to reduce code
duplication by embedding types instead of writing many explicit
forwarders.
- Cleaned up the interface structure a bit. (Most significant probably
the removal of the View-like methods from MetricPersistenc. Now they
are only in View and not duplicated anymore.)
- Removed dead code. (Probably not all of it, but it's a first
step...)
- Fixed a leftover in storage/metric/end_to_end_test.go (that made
some parts of the code never execute (incidentally, those parts
were broken (and I fixed them, too))).
Change-Id: Ibcac069940d118a88f783314f5b4595dce6641d5