Commit graph

918 commits

Author SHA1 Message Date
Krasi Georgiev 6c34eb8b63 nits
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-30 12:17:15 +02:00
Krasi Georgiev 315de4c782 fix windows tests
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-30 11:40:40 +02:00
Krasi Georgiev 1603222bbc small refactor of openTestDB to handle errors properly.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-30 11:40:12 +02:00
Krasi Georgiev 7245c6dc33 Merge remote-tracking branch 'upstream/master' into delete-compact-block-on-reload-error
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-30 11:39:19 +02:00
Krasi Georgiev ee99718ff6
rename chunk reader vars to make it easier to follow. (#508)
* rename chunk reader vars to make it easyer to understand.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-29 20:46:12 +03:00
radek_lesniewski eb5034d5b0 Additional logging in compact.go - logged time needed for writing blocks (#505)
* Additional logging in compact.go - logged time needed for writing blocks to disk

Signed-off-by: Radoslaw Lesniewski <Radoslaw.Lesniewski@sabre.com>

* Additional logging in compact.go - code formatted

Signed-off-by: Radoslaw Lesniewski <Radoslaw.Lesniewski@sabre.com>
2019-01-29 16:53:53 +05:30
Krasi Georgiev 3ec08eac50 use camelcase for rangeToTriggerCompaction
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-29 10:32:59 +02:00
Krasi Georgiev 53c18e7a41 use a global indexFilename constant
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-29 10:32:32 +02:00
Krasi Georgiev b521559c3b do a proper cleanup for a failed reload after a compaction
a failed reload immediately after a compaction should delete the
resulting block to avoid creating blocks with the same time range.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-29 10:31:54 +02:00
Ganesh Vernekar a29ab50a20
Merge pull request #510 from yeya24/patch/fixtypo
fix two typos in db_test
2019-01-29 11:51:03 +05:30
yeya24 6181d1f18f fix two typos in db_test
Signed-off-by: yeya24 <ben.ye@daocloud.io>
2019-01-29 10:25:12 +08:00
Krasi Georgiev dac2b97dfd
make createBlock more generic so it can be used in other tests. (#489)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-28 14:24:49 +03:00
Alec 051a7ae1a7 Missing the length of the encoding byte when calling b.Range
Signed-off-by: naivewong <867245430@qq.com>
2019-01-28 13:33:44 +03:00
Krasi Georgiev a44d15798e
remove a changelog double entry (#507)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-23 17:10:19 +03:00
Brian Brazil 5db162568b Remove _total from prometheus_tsdb_storage_blocks_bytes (#506)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-23 16:46:58 +03:00
Krasi Georgiev cc277e398b
fix the refs logic for the addFast path for createBlock (#504)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-21 14:56:26 +03:00
Krasi Georgiev 10ba228e6b
release 0.4.0 (#501)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-18 11:42:59 +03:00
Ganesh Vernekar 1a9d08adc5 Don't write empty blocks (#374)
* Dont write empty blocks when a compaction results in a block with no samples.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-18 11:35:16 +03:00
Callum Styan 3929359302
add live reader for WAL (#481)
* add live reader for WAL

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-01-16 10:09:08 -08:00
mknapphrt ebf5d74325 Added storage size based retention method and new metrics (#343)
Added methods needed to retain data based on a byte limitation rather than time. Limitation is only applied if the flag is set (defaults to 0). Both blocks that are older than the retention period and the blocks that make the size of the storage too large are removed.

2 new metrics for keeping track of the size of the local storage folder and the amount of times data has been deleted because the size restriction was exceeded.
Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>
2019-01-16 13:03:52 +03:00
naivewong bff5aa4d21 Missing the len of crc32 when calculating maxLen in WriteChunks (#494)
Signed-off-by: naivewong <867245430@qq.com>
2019-01-14 20:58:03 +05:30
Ye Ben a360aa3e86 change variable name metrics to labels (#496)
Signed-off-by: yeya24 <ben.ye@daocloud.io>
2019-01-14 11:44:32 +03:00
Bartek Płotka c065fa6957 Exposed helper methods for reading index bytes. (#492)
Changes:
* Make `NewReader` method useful. It was impossible to use it, because closer was always nil.
* ReadSymbols, TOC and ReadOffsetTable are not public functions (used by Thanos).
* decbufXXX are now functions.
* More verbose errors.
* Removed unused crc32 field.
* Some var name changes to make it more verbose:
  * symbols -> allocatedSymbols
  * symbolsSlice -> symbolsV1
  * symbols -> symbolsV2
  *
* Pre-calculate symbolsTableSize.
* Initialized symbols for Symbols() method with valid length.
* Added test for Symbol method.
* Made Decoder LookupSymbol method public. Kept Decode public as it is useful as helper from index package.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-01-11 17:31:26 +00:00
Krasi Georgiev 8d991bdc1e
Delete temp checkpoint folder on error. (#415) 2019-01-07 11:43:33 +03:00
Brian Brazil 296f943ec4
More efficient Merge implementation. (#486)
Avoid a tree of merge objects, which can result in
what I suspect is n^2 calls to Seek when using Without.

With 100k metrics, and a regex of ^$ in BenchmarkHeadPostingForMatchers:

Before:
BenchmarkHeadPostingForMatchers-8              1        51633185216 ns/op      29745528 B/op      200357 allocs/op

After:
BenchmarkHeadPostingForMatchers-8             10         108924996 ns/op 25715025 B/op     101748 allocs/op

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-03 16:59:52 +00:00
Brian Brazil b2d7bbd6b1
Move series fetches out of inner loop of SortedPostings. (#485)
With 1M series:

Before:
BenchmarkHeadPostingForMatchers-8              1        3501996117 ns/op 61311520 B/op         78 allocs/op

After:
BenchmarkHeadPostingForMatchers-8              1        1403072952 ns/op 69261568 B/op         72 allocs/op

This works out as 3X faster, as the above time includes other things.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-03 10:35:10 +00:00
Krasi Georgiev 48c439d26d
fix statick check errors (#475)
fix the tests for `check_license` and `staticcheck`

the static check also found some actual bugs.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-02 19:48:42 +03:00
Krasi Georgiev eb6586f513
rename createPopulatedBlock to createBlock and use it instead ot createEmptyBlock (#488)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-29 14:20:51 +03:00
Krasi Georgiev a90a7195d4
createPopulatedBlock returns the block dir instead of the block (#487)
It is easy to forget to close the block returned by createPopulatedBlock
which causes failures for windows so instead it returns the block dir
and which can be used by OpenBlock explicitly.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-29 10:23:56 +03:00
Krasi Georgiev 090b6852e1
remove unused PrefixMatcher (#474)
* remove unused `PrefixMatcher`

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-28 21:13:02 +03:00
Krasi Georgiev 2e0571caba
remove unused WALFlushInterval option and NopWAL struct (#468)
The WALFlushInterval is not used anywhere in the code base.
The WAL is not an interface anymore to save some lookup time so can't use NopWAL in the tests. Instead can just pass nil as the code checks for that and it is essentially a noop.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-28 20:42:46 +03:00
Brian Brazil 915d7cf937
Add a tsdbutil command to analyse churn etc. (#478)
This reports the cardinality of each label,
the total number of label pairs,
and how much series worth of time is "uncovered"
by series data. Which is basically how much churn there is.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-12-28 17:06:12 +00:00
Krasi Georgiev 6d489a1004
fix TestWALSegmentSizeOption for windows. (#482)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-19 12:40:47 +03:00
glutamatt 22e3aeb107 Add WALSegmentSize as an option of tsdb creation (#450)
Expose `WALSegmentSize` option to allow overriding the `DefaultOptions.WALSegmentSize`.
2018-12-18 21:56:51 +03:00
Krasi Georgiev 520ab7dc53
re-add the missing prometheus_tsdb_wal_corruptions_total (#473)
closes https://github.com/prometheus/tsdb/issues/471

after implementing the new WAL this metric was missing so adding it again.
Also added it in a test to make sure it works as expected.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-18 13:24:56 +03:00
Krasi Georgiev 79869d9a4d
fix race for minValidTime (#479)
it happens when truncating the WAL and another goroutine creates a new
Appender()

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-14 14:42:07 +03:00
Krasi Georgiev 79aa611d56
release 0.3.1 (#477)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 17:02:32 +03:00
Krasi Georgiev fd24a1c1a8
re add the missing check_license test (#476)
removed the "check_license" by mistake so adding it again.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 16:58:06 +03:00
Krasi Georgiev 2962202ed3
fix windows tests (#469)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 16:29:29 +03:00
Krasi Georgiev a2779cc901
fix flaky tests: TestDisableAutoCompactions,TestBlockRanges (#472)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-12 14:49:03 +03:00
JoeWrightss cbfda5a801 Fixs typo: "compltely" to "completely" (#470)
Fix a small typo.
2018-12-11 23:09:17 +03:00
JoeWrightss c8c03ff85b fix some typos (#466)
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-07 12:24:02 +03:00
Krasi Georgiev b4a2103a12
Release v0.3.0 (#464) 2018-12-04 16:50:11 +03:00
Krasi Georgiev bac9cbed2e
no overlapping on compaction when an existing block is not within default boundaries. (#461)
closes https://github.com/prometheus/prometheus/issues/4643

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-04 13:30:49 +03:00
JoeWrightss 83a4ef1382 Typo fixed: "inlcuding" -> "including" (#462)
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-04 01:14:37 +03:00
Krasi Georgiev 01e8296ee1
remove opaque metrics (#457)
* more descriptive help text for the head metrics unit

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 20:18:12 +02:00
Krasi Georgiev 48efdf8b81
refactor NewSegmentsRangeReader to take multi WAL ranges (#449)
* refactor NewSegmentsRangeReader to take multi WAL ranges

In case of an error when checkpointing the WAL the error doesn't show
the exact WAL index that is corrupter. this is because it uses
MultiReader to read multiply WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range and it reads all of the ranges by iterating each one.

this changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 16:46:16 +02:00
Krasi Georgiev 0493efb7c5
repair wal when the record cannot be decoded (#453)
* repair wal when the record cannot be decoded

Currently repair is run only when the error happens in the reader.

A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 13:37:04 +02:00
Krasi Georgiev 24520727a4
return an error when the last wal segment record is torn. (#451)
* return an error when the  last wal segment record is torn.

this ensures that a repair will be run when the last record in a segment
is torn.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:15:11 +02:00
Simon Pasquier fb32ef6000
Use Go modules (#454)
* *: support Go modules

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update go.mod and Makefile.common

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-11-28 11:39:56 +01:00