Commit graph

895 commits

Author SHA1 Message Date
Krasi Georgiev 8d991bdc1e
Delete temp checkpoint folder on error. (#415) 2019-01-07 11:43:33 +03:00
Brian Brazil 296f943ec4
More efficient Merge implementation. (#486)
Avoid a tree of merge objects, which can result in
what I suspect is n^2 calls to Seek when using Without.

With 100k metrics, and a regex of ^$ in BenchmarkHeadPostingForMatchers:

Before:
BenchmarkHeadPostingForMatchers-8              1        51633185216 ns/op      29745528 B/op      200357 allocs/op

After:
BenchmarkHeadPostingForMatchers-8             10         108924996 ns/op 25715025 B/op     101748 allocs/op

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-03 16:59:52 +00:00
Brian Brazil b2d7bbd6b1
Move series fetches out of inner loop of SortedPostings. (#485)
With 1M series:

Before:
BenchmarkHeadPostingForMatchers-8              1        3501996117 ns/op 61311520 B/op         78 allocs/op

After:
BenchmarkHeadPostingForMatchers-8              1        1403072952 ns/op 69261568 B/op         72 allocs/op

This works out as 3X faster, as the above time includes other things.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-03 10:35:10 +00:00
Krasi Georgiev 48c439d26d
fix statick check errors (#475)
fix the tests for `check_license` and `staticcheck`

the static check also found some actual bugs.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-02 19:48:42 +03:00
Krasi Georgiev eb6586f513
rename createPopulatedBlock to createBlock and use it instead ot createEmptyBlock (#488)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-29 14:20:51 +03:00
Krasi Georgiev a90a7195d4
createPopulatedBlock returns the block dir instead of the block (#487)
It is easy to forget to close the block returned by createPopulatedBlock
which causes failures for windows so instead it returns the block dir
and which can be used by OpenBlock explicitly.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-29 10:23:56 +03:00
Krasi Georgiev 090b6852e1
remove unused PrefixMatcher (#474)
* remove unused `PrefixMatcher`

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-28 21:13:02 +03:00
Krasi Georgiev 2e0571caba
remove unused WALFlushInterval option and NopWAL struct (#468)
The WALFlushInterval is not used anywhere in the code base.
The WAL is not an interface anymore to save some lookup time so can't use NopWAL in the tests. Instead can just pass nil as the code checks for that and it is essentially a noop.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-28 20:42:46 +03:00
Brian Brazil 915d7cf937
Add a tsdbutil command to analyse churn etc. (#478)
This reports the cardinality of each label,
the total number of label pairs,
and how much series worth of time is "uncovered"
by series data. Which is basically how much churn there is.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-12-28 17:06:12 +00:00
Krasi Georgiev 6d489a1004
fix TestWALSegmentSizeOption for windows. (#482)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-19 12:40:47 +03:00
glutamatt 22e3aeb107 Add WALSegmentSize as an option of tsdb creation (#450)
Expose `WALSegmentSize` option to allow overriding the `DefaultOptions.WALSegmentSize`.
2018-12-18 21:56:51 +03:00
Krasi Georgiev 520ab7dc53
re-add the missing prometheus_tsdb_wal_corruptions_total (#473)
closes https://github.com/prometheus/tsdb/issues/471

after implementing the new WAL this metric was missing so adding it again.
Also added it in a test to make sure it works as expected.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-18 13:24:56 +03:00
Krasi Georgiev 79869d9a4d
fix race for minValidTime (#479)
it happens when truncating the WAL and another goroutine creates a new
Appender()

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-14 14:42:07 +03:00
Krasi Georgiev 79aa611d56
release 0.3.1 (#477)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 17:02:32 +03:00
Krasi Georgiev fd24a1c1a8
re add the missing check_license test (#476)
removed the "check_license" by mistake so adding it again.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 16:58:06 +03:00
Krasi Georgiev 2962202ed3
fix windows tests (#469)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 16:29:29 +03:00
Krasi Georgiev a2779cc901
fix flaky tests: TestDisableAutoCompactions,TestBlockRanges (#472)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-12 14:49:03 +03:00
JoeWrightss cbfda5a801 Fixs typo: "compltely" to "completely" (#470)
Fix a small typo.
2018-12-11 23:09:17 +03:00
JoeWrightss c8c03ff85b fix some typos (#466)
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-07 12:24:02 +03:00
Krasi Georgiev b4a2103a12
Release v0.3.0 (#464) 2018-12-04 16:50:11 +03:00
Krasi Georgiev bac9cbed2e
no overlapping on compaction when an existing block is not within default boundaries. (#461)
closes https://github.com/prometheus/prometheus/issues/4643

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-04 13:30:49 +03:00
JoeWrightss 83a4ef1382 Typo fixed: "inlcuding" -> "including" (#462)
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-04 01:14:37 +03:00
Krasi Georgiev 01e8296ee1
remove opaque metrics (#457)
* more descriptive help text for the head metrics unit

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 20:18:12 +02:00
Krasi Georgiev 48efdf8b81
refactor NewSegmentsRangeReader to take multi WAL ranges (#449)
* refactor NewSegmentsRangeReader to take multi WAL ranges

In case of an error when checkpointing the WAL the error doesn't show
the exact WAL index that is corrupter. this is because it uses
MultiReader to read multiply WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range and it reads all of the ranges by iterating each one.

this changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 16:46:16 +02:00
Krasi Georgiev 0493efb7c5
repair wal when the record cannot be decoded (#453)
* repair wal when the record cannot be decoded

Currently repair is run only when the error happens in the reader.

A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 13:37:04 +02:00
Krasi Georgiev 24520727a4
return an error when the last wal segment record is torn. (#451)
* return an error when the  last wal segment record is torn.

this ensures that a repair will be run when the last record in a segment
is torn.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:15:11 +02:00
Simon Pasquier fb32ef6000
Use Go modules (#454)
* *: support Go modules

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update go.mod and Makefile.common

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-28 11:39:56 +01:00
Brian Brazil d50b9a5619
Reload after reading the WAL. (#460)
This causes the head to be GCed at startup,
removing any series that were read from the WAL
but have since been written to a block. In
systems with low ingestion rates, this potentially
could be many many hours of data.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-28 09:23:50 +00:00
Brian Brazil 407e12d051 Make MemPostings nested.
This saves memory, about a quarter of the size of the postings map
itself with high-cardinality labels (not including the post ids).

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-22 09:36:39 +00:00
Brian Brazil fc99b8bb3a Make index reader postings nested.
This reduces memory by only having to store the string's 16
bytes+map overheard once per label name, rather than duplicating it in every
entry for the label value.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-22 09:36:39 +00:00
Brian Brazil c93e261466 Reduce memory taken up by posting/symbol tables.
Reuse the string already allocated for symbols
in the posting tables.

Use a slice for symbols in v2 format.

Move symbol size logic into the index code.
Avoid duplication of lookupSymbol logic.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-22 09:36:39 +00:00
Tom Wilkie 88ebd749dd Make newBReader return a struct, not a pointer. (#459)
This shows up as a hot spot in profiles of queries involving lots of seeks, as each seek creates a new iterator.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-22 13:21:57 +05:30
Krasi Georgiev b75d702ceb
fix flaky compaction test (#458)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-21 00:33:14 +02:00
Krasi Georgiev 7f00217d77
Allow manual compaction for tests when compaction is disabled globally. (#412)
for tests we need to control when a compaction happens so with this
change automated compaction can be disabled, but allow to run it
manually it tests.

fixes failing tests in : https://github.com/prometheus/tsdb/pull/374

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-20 12:34:26 +02:00
Ganesh Vernekar 7f30395115 LabelNames() for Querier (#455)
* LabelNames() for Querier

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* nits

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-16 19:02:24 +01:00
Brian Brazil 41b54585d9
Use already open blocks while compacting. (#441)
This roughly halves the RAM requirements of compaction.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-15 12:20:54 +00:00
Krasi Georgiev 3385571ddf
buffer-panic when reading a record after recPageTerm (#429)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:43:33 +02:00
Krasi Georgiev 5a9ddeecef
fix lint errors (#439)
unexported NewMemTombstones as this returns unexported memTombstones
type which will not be shows in godoc.
Added missing comments for exported methods.
Removed unused RecordLogger,RecordReader interfaces.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:40:01 +02:00
Brian Brazil 910f3021b0
Use sampleBuf instead of maintaining lastValue. (#444)
This cuts the size of memSize by 8B.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-14 14:02:32 +00:00
Brian Brazil 10632217ce
Merge pull request #440 from prometheus/wal-reading
Improve WAL reading
2018-11-14 13:59:41 +00:00
nilsocket 80981a6aac FromMap(), sorts and returns instead of calling New() (#433)
Signed-off-by: nilsocket <nilsocket@gmail.com>
2018-11-14 13:43:03 +01:00
Alin Sinpalean 171fc4ab5d Limit the returned db.Querier to the requested time range (#351)
Limit the returned `db.Querier` to the requested time range. Preallocate the `baseChunkSeries.lset` and `baseChunkSeries.chks` slices to the previous series' slice sizes to avoid unnecessary grow slice reallocations.
2018-11-09 15:54:56 +02:00
Krasi Georgiev e4843938ba
add missing zero to tombstone magic number (#448)
add missing zero to tombstone magic number constant.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-09 13:37:02 +02:00
Krasi Georgiev a9470dd8d5
few more comments to explain the WAL workflow (#430)
More comments for the WAL package.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-08 10:27:16 +02:00
Ganesh Vernekar 3a08a71d86 LabelNames() method to get all unique label names (#369)
* LabelNames() method to get all unique label names

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-07 17:52:41 +02:00
Ganesh Vernekar a95323c021 Add license headers to missing files (#447)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-06 20:19:42 +02:00
Brian Brazil c7e7fd355e Only send WAL read workers the samples they need.
Calculating the modulus in each worker was a hotspot,
and meant that you had more work to do the more cores you had.
This cuts CPU usage (on my 8 core, 4 real core machine) by
33%, and walltime by 3%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 22:52:26 +00:00
Brian Brazil a64b0d51c4 Precalculate memSeries.head
This is read far more than it changes.
This cuts ~14% off walltme and ~27% off CPU for WAL reading.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil d8c8e4e6e4 Keep local cache of ids.
With the various goroutines running, the locking
in getByID is notable. This cuts cpu usage by ~25%
and walltime by ~20%.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil f0e79ec264 Actually reuse samples in loadWAL across records.
This cuts walltime by 2.5X and CPU by 2X

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00