If only very few chunks are to be truncated from a very large series
file, the rewrite of the file is a lorge overhead. With this change, a
certain ratio of the file has to be dropped to make it happen. While
only causing disk overhead at about the same ratio (by default 10%),
it will cut down I/O by a lot in above scenario.
The test had become flaky with Go1.5.
Theory here is that with Go1.5.x, sleeping for 10ms might not be
enough to wake up another goroutine, possibly because it is used for
GC. 50ms should always be enough due to GC pause guarantees with the
new GC.
This is with `golint -min_confidence=0.5`.
I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
If users see the crash recovery error, the chances are
they aren't shutting down Prometheus correctly. Telling
them how to do so will help them debug and fix the problem.
Perhaps it would be even better to still warn in case the sample value has
changed but the timestamps are equal, but we don't have efficient access
to the last value.
For the label matching index-based preselection phase, don't do an OR
between equality and non-equality matchers. Execute only one of the two
(with equality matchers preferred when present).
Fixes https://github.com/prometheus/prometheus/issues/924
If all samples in consecutive chunks have the same timestamp, the way
we used to load chunks will fail. With this change, the persist
watermark is used to load the right amount of chunkDescs from disk.
This bug is a possible reason for the rare storage corruption we have
observed.
Fixes https://github.com/prometheus/prometheus/issues/481
While doing so, clean up and fix a few other things:
- Fix `go vet` warnings (@fabxc to blame ;).
- Fix a racey problem with unarchiving: Whenever we unarchive a
series, we essentially want to do something with it. However, until
we have done something with it, it appears like a series that is
ready to be archived or even purged. So e.g. it would be ignored
during checkpointing. With this fix, we always load the chunkDescs
upon unarchiving. This is wasteful if we only want to add a new
sample to an archived time series, but the (presumably more common)
case where we access an archived time series in a query doesn't
become more expensive.
- The change above streamlined the getOrCreateSeries ond
newMemorySeries flow. Also, the modTime is now always set correctly.
- Fix the leveldb-backed implementation of KeyValueStore.Delete. It
had the wrong behavior of still returning true, nil if a
non-existing key has been passed in.
See https://github.com/prometheus/prometheus/issues/887, which will at
least be partially fixed by this.
From the spec https://golang.org/ref/spec#Conversions:
"In all non-constant conversions involving floating-point or complex
values, if the result type cannot represent the value the conversion
succeeds but the result value is implementation-dependent."
This ended up setting the converted values to 0 on Debian's Go 1.4.2
compiler, at least on 32-bit Debians.
This commit adds the honor_labels and params arguments to the scrape
config. This allows to specify query parameters used by the scrapers
and handling scraped labels with precedence.
Change #704 introduced a regression that started reading the queue only
after potential crash recovery. When more than the queue capacity was
indexed, Prometheus deadlocked.
This change is conceptually very simple, although the diff is large. It
switches logging from "github.com/golang/glog" to
"github.com/prometheus/log", while not actually changing any log
messages. V(1)-style logging has been changed to be log.Debug*().
Also, clean up some things in the code (especially introduction of the
chunkLenWithHeader constant to avoid the same expression all over the place).
Benchmark results:
BEFORE
BenchmarkLoadChunksSequentially 5000 283580 ns/op 152143 B/op 312 allocs/op
BenchmarkLoadChunksRandomly 20000 82936 ns/op 39310 B/op 99 allocs/op
BenchmarkLoadChunkDescs 10000 110833 ns/op 15092 B/op 345 allocs/op
AFTER
BenchmarkLoadChunksSequentially 10000 146785 ns/op 152285 B/op 315 allocs/op
BenchmarkLoadChunksRandomly 20000 67598 ns/op 39438 B/op 103 allocs/op
BenchmarkLoadChunkDescs 20000 99631 ns/op 12636 B/op 192 allocs/op
Note that everything is obviously loaded from the page cache (as the
benchmark runs thousands of times with very small series files). In a
real-world scenario, I expect a larger impact, as the disk operations
will more often actually hit the disk. To load ~50 sequential chunks,
this reduces the iops from 100 seeks and 100 reads to 1 seek and 1
read.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.
Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...
Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.
The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.
Also, remove lint and vet warnings in ast.go.
A number of mostly minor things:
- Rename chunk type -> chunk encoding.
- After all, do not carry around the chunk encoding to all parts of
the system, but just have one place where the encoding for new
chunks is set based on the flag. The new approach has caveats as
well, but the polution of so many method signatures is worse.
- Use the default chunk encoding for new chunks of existing
series. (Previously, only new _series_ would get chunks with the
default encoding.)
- Use an enum for chunk encoding. (But keep the version number for the
flag, for reasons discussed previously.)
- Add encoding() to the chunk interface (so that a chunk knows its own
encoding - no need to have that in a different top-level function).
- Got rid of newFollowUpChunk (which would keep the existing encoding
for all chunks of a time series). Now only use newChunk(), which
will create a chunk encoding according to the flag.
- Simplified transcodeAndAdd.
- Reordered methods of deltaEncodedChunk and doubleDeltaEncoded chunk
to match the order in the chunk interface.
- Only transcode if the chunk is not yet half full. If more than half
full, add a new chunk instead.
This checks for the basic behaviour of GetFingerprintsForLabelMatchers, that is, whether the different matcher types filter the correct fingerprints and intersections are correct.
The capacity is basically how many persisted head chunks we will count
at most while doing other things, in particular checkpointing. To
limit the amount of already counted head chunks, keep this number low,
otherwise we will easily checkpoint too often if checkpoints take long
anyway.
In that commit, the 'maintainSeries' call was accidentally removed.
This commit refactors things a bit so that there is now a clean
'maintainMemorySeries' and a 'maintainArchivedSeries' call.
Straighten the nomenclature a bit (consistently use 'drop' for
chunks and 'purge' for series/metrics).
Remove the annoying 'Completed maintenance sweep through archived
fingerprints' message if there were no archived fingerprints to do
maintenance on.