872 commits

Author SHA1 Message Date
Krasi Georgiev 48efdf8b81
refactor NewSegmentsRangeReader to take multi WAL ranges (#449)
* refactor NewSegmentsRangeReader to take multi WAL ranges

In case of an error when checkpointing the WAL, the error doesn't show
the exact WAL index that is corrupted. This is because it uses
MultiReader to read multiple WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range; it reads all of the ranges by iterating over each one.

This changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 16:46:16 +02:00
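
Editor's sketch for the commit above: a minimal illustration of reading several segment ranges one segment at a time, so that an error can name the segment and offset instead of a byte count across a concatenated stream. The names `segmentRange`, `corruptionErr`, `errChecksum`, `readRanges` and the `readSegment` callback are assumptions made for this example, not the tsdb API.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// segmentRange is a hypothetical description of one WAL range:
// a directory plus the first and last segment index to read.
type segmentRange struct {
	Dir         string
	First, Last int
}

// corruptionErr is a hypothetical error type that records which segment
// failed and at which offset inside that segment.
type corruptionErr struct {
	Segment string
	Offset  int64
	Err     error
}

func (e *corruptionErr) Error() string {
	return fmt.Sprintf("corruption in segment %s at %d: %v", e.Segment, e.Offset, e.Err)
}

// errChecksum is a placeholder failure used only for illustration.
var errChecksum = errors.New("unexpected checksum")

// readRanges visits every segment of every range in order. readSegment is a
// hypothetical callback that reads one segment file and, on failure, returns
// the offset inside that file where reading stopped.
func readRanges(ranges []segmentRange, readSegment func(path string) (int64, error)) error {
	for _, r := range ranges {
		for i := r.First; i <= r.Last; i++ {
			path := filepath.Join(r.Dir, fmt.Sprintf("%08d", i))
			if off, err := readSegment(path); err != nil {
				return &corruptionErr{Segment: path, Offset: off, Err: err}
			}
		}
	}
	return nil
}
```

Because each segment is opened on its own, the offset in the error is relative to that segment file, which matches the shape of the new log line quoted in the commit message.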
Krasi Georgiev 0493efb7c5
repair wal when the record cannot be decoded (#453)
* repair wal when the record cannot be decoded

Currently repair is run only when the error happens in the reader.

A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 13:37:04 +02:00
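
Editor's sketch for the commit above: one way a decoding failure can be wrapped in a corruption error so that the caller's repair check also fires for errors raised after the record was read. All names here (`corruptionErr`, `decodeRecord`, `loadRecord`, `needsRepair`) and the 8-byte record layout are assumptions for illustration only.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// corruptionErr is a hypothetical marker type: any error wrapped in it is
// treated as a signal that the WAL should be repaired.
type corruptionErr struct{ Err error }

func (e *corruptionErr) Error() string { return "corruption: " + e.Err.Error() }
func (e *corruptionErr) Unwrap() error { return e.Err }

var errShortRecord = errors.New("record too short")

// decodeRecord is a toy decoder: it expects at least 8 bytes and returns
// them as a big-endian uint64.
func decodeRecord(rec []byte) (uint64, error) {
	if len(rec) < 8 {
		return 0, errShortRecord
	}
	return binary.BigEndian.Uint64(rec), nil
}

// loadRecord decodes a record that was already read successfully and wraps
// any decoding failure as a corruption error.
func loadRecord(rec []byte) (uint64, error) {
	v, err := decodeRecord(rec)
	if err != nil {
		return 0, &corruptionErr{Err: err}
	}
	return v, nil
}

// needsRepair reports whether err is, or wraps, a corruption error.
func needsRepair(err error) bool {
	var cerr *corruptionErr
	return errors.As(err, &cerr)
}
```

With this shape the caller only has to test `needsRepair(err)` once, regardless of whether the failure came from the reader or from the decoder.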
Krasi Georgiev 24520727a4
return an error when the last wal segment record is torn. (#451)
* return an error when the last wal segment record is torn.

This ensures that a repair will be run when the last record in a segment
is torn.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:15:11 +02:00
Simon Pasquier fb32ef6000
Use Go modules (#454)
* *: support Go modules

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update go.mod and Makefile.common

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-28 11:39:56 +01:00
Brian Brazil d50b9a5619
Reload after reading the WAL. (#460)
This causes the head to be GCed at startup,
removing any series that were read from the WAL
but have since been written to a block. In
systems with low ingestion rates, this potentially
could be many hours of data.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-28 09:23:50 +00:00
Brian Brazil 407e12d051 Make MemPostings nested.
This saves memory, about a quarter of the size of the postings map
itself with high-cardinality labels (not including the post ids).

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-22 09:36:39 +00:00
Brian Brazil fc99b8bb3a Make index reader postings nested.
This reduces memory by only having to store the string's 16
bytes + map overhead once per label name, rather than duplicating it in every
entry for the label value.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-22 09:36:39 +00:00
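
Editor's sketch for the two "nested" commits above: a map keyed first by label name and then by label value, so each label name string is stored once as an outer key rather than once per name/value pair. The type `nestedPostings` and its methods are assumptions for illustration and are not the actual MemPostings implementation.

```go
package main

// nestedPostings maps label name -> label value -> series IDs.
// A flat alternative would key on a {name, value} struct and repeat the
// name's string header in every entry.
type nestedPostings map[string]map[string][]uint64

// add records that series id carries the label name=value.
func (p nestedPostings) add(id uint64, name, value string) {
	vals, ok := p[name]
	if !ok {
		vals = map[string][]uint64{}
		p[name] = vals
	}
	vals[value] = append(vals[value], id)
}

// get returns the series IDs for name=value, or nil if there are none.
// Indexing a missing (nil) inner map is safe in Go and yields the zero value.
func (p nestedPostings) get(name, value string) []uint64 {
	return p[name][value]
}
```

For example, after `p.add(1, "job", "api")` and `p.add(2, "job", "db")`, the key `"job"` exists once in the outer map and the inner map holds the two values.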
Brian Brazil c93e261466 Reduce memory taken up by posting/symbol tables.
Reuse the string already allocated for symbols
in the posting tables.

Use a slice for symbols in v2 format.

Move symbol size logic into the index code.
Avoid duplication of lookupSymbol logic.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-22 09:36:39 +00:00
Tom Wilkie 88ebd749dd Make newBReader return a struct, not a pointer. (#459)
This shows up as a hot spot in profiles of queries involving lots of seeks, as each seek creates a new iterator.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-11-22 13:21:57 +05:30
Krasi Georgiev b75d702ceb
fix flaky compaction test (#458)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-21 00:33:14 +02:00
Krasi Georgiev 7f00217d77
Allow manual compaction for tests when compaction is disabled globally. (#412)
For tests we need to control when a compaction happens, so with this
change automated compaction can be disabled while still allowing it to
be run manually in tests.

Fixes failing tests in: https://github.com/prometheus/tsdb/pull/374

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-20 12:34:26 +02:00
Ganesh Vernekar 7f30395115 LabelNames() for Querier (#455)
* LabelNames() for Querier

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* nits

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-16 19:02:24 +01:00
Brian Brazil 41b54585d9
Use already open blocks while compacting. (#441)
This roughly halves the RAM requirements of compaction.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-15 12:20:54 +00:00
Krasi Georgiev 3385571ddf
buffer-panic when reading a record after recPageTerm (#429)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:43:33 +02:00
Krasi Georgiev 5a9ddeecef
fix lint errors (#439)
Unexported NewMemTombstones, as it returns the unexported memTombstones
type, which will not be shown in godoc.
Added missing comments for exported methods.
Removed unused RecordLogger,RecordReader interfaces.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:40:01 +02:00
Brian Brazil 910f3021b0
Use sampleBuf instead of maintaining lastValue. (#444)
This cuts the size of memSize by 8B.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-14 14:02:32 +00:00
Brian Brazil 10632217ce
Merge pull request #440 from prometheus/wal-reading
Improve WAL reading
2018-11-14 13:59:41 +00:00
nilsocket 80981a6aac FromMap(), sorts and returns instead of calling New() (#433)
Signed-off-by: nilsocket <nilsocket@gmail.com>
2018-11-14 13:43:03 +01:00
Alin Sinpalean 171fc4ab5d Limit the returned db.Querier to the requested time range (#351)
Limit the returned `db.Querier` to the requested time range. Preallocate the `baseChunkSeries.lset` and `baseChunkSeries.chks` slices to the previous series' slice sizes to avoid unnecessary reallocations as the slices grow.
2018-11-09 15:54:56 +02:00
Krasi Georgiev e4843938ba
add missing zero to tombstone magic number (#448)
add missing zero to tombstone magic number constant.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-09 13:37:02 +02:00
Krasi Georgiev a9470dd8d5
few more comments to explain the WAL workflow (#430)
More comments for the WAL package.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-08 10:27:16 +02:00
Ganesh Vernekar 3a08a71d86 LabelNames() method to get all unique label names (#369)
* LabelNames() method to get all unique label names

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-07 17:52:41 +02:00
Ganesh Vernekar a95323c021 Add license headers to missing files (#447)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-06 20:19:42 +02:00
Brian Brazil c7e7fd355e Only send WAL read workers the samples they need.
Calculating the modulus in each worker was a hotspot,
and meant that you had more work to do the more cores you had.
This cuts CPU usage (on my 8 core, 4 real core machine) by
33%, and walltime by 3%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 22:52:26 +00:00
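
Editor's sketch for the commit above: the dispatcher computes the modulus once per sample and hands each worker only its own shard, instead of sending every sample to every worker and letting each one filter. The `sample` type, `shard`, and `process` are assumptions for illustration; the per-worker "work" is just a count.

```go
package main

import "sync"

// sample is a hypothetical WAL sample keyed by series reference.
type sample struct {
	Ref uint64
	V   float64
}

// shard splits samples into n buckets by Ref % n. The modulus is computed
// exactly once per sample, in the dispatcher.
func shard(samples []sample, n int) [][]sample {
	shards := make([][]sample, n)
	for _, s := range samples {
		i := s.Ref % uint64(n)
		shards[i] = append(shards[i], s)
	}
	return shards
}

// process starts one goroutine per shard and returns how many samples each
// worker handled. Each goroutine writes only to its own slot in counts.
func process(samples []sample, n int) []int {
	counts := make([]int, n)
	var wg sync.WaitGroup
	for i, sh := range shard(samples, n) {
		wg.Add(1)
		go func(i int, sh []sample) {
			defer wg.Done()
			counts[i] = len(sh) // stand-in for the real per-sample work
		}(i, sh)
	}
	wg.Wait()
	return counts
}
```

In the filtering design the total number of modulus operations is samples times workers, which is why adding cores added work; with sharding up front it is one per sample.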
Brian Brazil a64b0d51c4 Precalculate memSeries.head
This is read far more than it changes.
This cuts ~14% off walltime and ~27% off CPU for WAL reading.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil d8c8e4e6e4 Keep local cache of ids.
With the various goroutines running, the locking
in getByID is notable. This cuts cpu usage by ~25%
and walltime by ~20%.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil f0e79ec264 Actually reuse samples in loadWAL across records.
This cuts walltime by 2.5X and CPU by 2X

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
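
Editor's sketch for the commit above: the commit message does not show the change itself, so this only illustrates the general Go idiom of reusing one slice's backing array across records by truncating it to length zero before each decode. The `sample` type and `decodeSamples` are assumptions for illustration.

```go
package main

// sample is a hypothetical decoded WAL sample.
type sample struct {
	Ref uint64
	V   float64
}

// decodeSamples fills buf with one sample per reference in rec and returns
// the filled slice. buf is truncated, not reallocated, so its backing array
// is reused as long as its capacity is sufficient.
func decodeSamples(rec []uint64, buf []sample) []sample {
	buf = buf[:0]
	for _, ref := range rec {
		buf = append(buf, sample{Ref: ref})
	}
	return buf
}
```

The caller has to keep the returned slice and pass it back in for the next record (`buf = decodeSamples(rec, buf)`); dropping the return value and passing a fresh nil slice each time silently defeats the reuse.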
Ye Ben 23a5f09085 fix a typo dont -> dont't (#438)
Signed-off-by: yeya24 <ben.ye@daocloud.io>
2018-10-31 13:49:57 +02:00
Krasi Georgiev ae91febcbb
Update the files format README.md (#437)
2018-10-29 19:51:47 +02:00
Krasi Georgiev d804a27062
refactor util funcs to allow re-usage. (#419)
* refactor util funcs to allow reusage.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-25 21:06:19 +01:00
Ben a8351dc9d0 Using filepath.Join() instead of strings with slashes (#428)
fixes: https://github.com/prometheus/tsdb/issues/426
Using `filepath.Join()` instead of strings containing forward slash path delimiters (needed for non-*nix OSes), as suggested by @krasi-georgiev
2018-10-25 10:32:57 +01:00
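
Editor's sketch for the commit above: building a path with `filepath.Join` from the standard library rather than concatenating strings with forward slashes. The helper name `segmentPath` and the `wal` subdirectory are assumptions chosen for the example.

```go
package main

import "path/filepath"

// segmentPath joins the parts with the separator of the host OS and cleans
// the result, so the same code works on Windows and on *nix systems.
func segmentPath(dir, name string) string {
	return filepath.Join(dir, "wal", name)
}
```

On a *nix system `segmentPath("data", "00000001")` yields `data/wal/00000001`; on Windows the same call uses backslashes as separators.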
Chris Marchbanks f4afc7dff2 Add benchmark for querying a persisted block (#425)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-10-24 00:35:52 +03:00
Krasi Georgiev 1dd9a6bd29
comments about the 120samples const and link to Gorilla papers. (#423)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-23 13:43:06 +03:00
Krasi Georgiev 66b6b87cd4 fix windows tests (#421)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-22 15:19:52 -04:00
Thomas Jackson b4132df5f7 Reduce allocations for queries on HEAD (#417)
Add some benchmarks for HEAD and allocate the correct slice size in LabelValues, since we already know what it will be.

This is a ~15% time improvement and, per the benchmark below, a 35% reduction in allocations:


```
benchmark                             old ns/op     new ns/op     delta
BenchmarkHeadPostingForMatchers-4     74452         63514         -14.69%

benchmark                             old allocs     new allocs     delta
BenchmarkHeadPostingForMatchers-4     20             13             -35.00%

benchmark                             old bytes     new bytes     delta
BenchmarkHeadPostingForMatchers-4     5425          3137          -42.18%
```

Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
2018-10-22 13:52:01 +03:00
Simon Pasquier 18af5763d8
Merge pull request #409 from simonpasquier/remove-prometheus-dep
fileutil: remove dependency on prometheus/prometheus
2018-10-16 10:15:06 +02:00
Simon Pasquier 4dd740e4cc fileutil: remove dependency on prometheus/prometheus
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-10-15 17:21:20 +02:00
Krasi Georgiev cf2af2b371
add maintainers file (#404)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-15 14:32:00 +03:00
Bartek Płotka 047b1b1357 compact: Verify for chunks outside of compacted time range. Added unit test for populateBlocs. (#349)
* compact: Verify for chunks outside of compacted time range.
Unit test for populateBlocs.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Co-authored-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-12 12:45:19 +03:00
Krasi Georgiev d7492b9350
more descriptive var names and some more logging. (#405)
* more descriptive checkpoint var names and some more logging.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-11 18:23:52 +03:00
Camille Janicki 0ce41118ed Add msg parameter to Equals function in testutil (#398)
* Add msg parameter to Equals function in testutil

Co-authored-by: Chris Marchbanks <csmarchbanks@gmail.com>
Signed-off-by: Camille Janicki <camille.janicki@gmail.com>
2018-10-03 11:08:31 +03:00
Kautilya Tripathi 3fd6d2f920 more meaningful names for serializedStringTuples and stringTuples (#377)
more meaningful names for serializedStringTuples and stringTuples structs

Signed-off-by: knrt10 <tripathi.kautilya@gmail.com>
Co-authored-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-02 21:03:12 +03:00
Simon Pasquier 043e3bb5f9
Merge pull request #396 from codesome/new-metrics
Add new metrics.
2018-10-01 16:00:01 +02:00
Ganesh Vernekar 2638b587f6
Merge remote-tracking branch 'upstream/master' into new-metrics
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-30 20:20:00 +05:30
Ganesh Vernekar 61d0868966 Fix TestCorrectNumTombstones (#399)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-28 13:26:29 +03:00
Ganesh Vernekar 61b000ee0e
Fix review comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-28 15:00:51 +05:30
Catalin Patulea b4c7c80227 Reword chunk references, LSB usually means 'bits'. (#364)
https://en.wikipedia.org/wiki/Bit_numbering#Least_significant_bit

Signed-off-by: Catalin Patulea <catalinp@google.com>
2018-09-27 21:38:02 +03:00
Ganesh Vernekar 6e712963e2 Fix updating of NumTombstones in block.Delete(..) (#385)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-27 14:43:22 +03:00
Ganesh Vernekar 632dfb349e
Add new metrics.
1. 'prometheus_tsdb_wal_truncate_fail' for failed WAL truncation.
2. 'prometheus_tsdb_checkpoint_delete_fail' for failed old checkpoint delete.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 18:50:57 +05:30
Callum Styan a971f52ac8 clean up after running repair tests (#372)
implement a `CopyDirs` helper to walk all dirs and subdirs and copy all files within to a destination folder.

Refactor `TestRepairBadIndexVersion` to use the `CopyDirs` so it works on a copy db and remove the copy after the test.
This allows running the test more than once and doesn't confuse git.

Signed-off-by: Callum Stytan <callumstyan@gmail.com>
2018-09-21 20:35:33 +03:00