Commit graph

265 commits

Author SHA1 Message Date
Krasi Georgiev 6f9bbc7253
Open db in Read only mode (#588)
* Added db read only open mode and use it for the tsdb cli.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-07-23 11:04:48 +03:00
Chris Marchbanks 0cd46f8762
Add logging during WAL replay
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-07-13 11:10:44 -06:00
Ganesh Vernekar b1cd829030
Reuse Chunk Iterator (#642)
* Reset method for chunkenc.Iterator

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Reset method only for XORIterator

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Use Reset(...) in querier.go

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Reuse deletedIterator

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Another way of reusing chunk iterators

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Unexport xorIterator

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix memSeries.iterator(...)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add some comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-07-09 15:19:34 +05:30
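The iterator reuse described in b1cd829030 can be illustrated with a minimal sketch. The `sliceIterator` type and its methods are hypothetical stand-ins, not the actual `chunkenc` iterator; the point is only that a `Reset` method lets one allocated iterator walk many chunks in turn.

```go
package main

import "fmt"

// sliceIterator is a hypothetical iterator over in-memory values.
// It is NOT the chunkenc iterator; it only illustrates the Reset pattern.
type sliceIterator struct {
	vals []float64
	pos  int
}

// Reset points the iterator at new data so the same allocation can be reused.
func (it *sliceIterator) Reset(vals []float64) {
	it.vals = vals
	it.pos = -1
}

// Next advances the iterator and reports whether a value is available.
func (it *sliceIterator) Next() bool {
	if it.pos+1 >= len(it.vals) {
		return false
	}
	it.pos++
	return true
}

// At returns the value at the current position.
func (it *sliceIterator) At() float64 { return it.vals[it.pos] }

// sum drains the iterator and adds up everything it yields.
func sum(it *sliceIterator) float64 {
	total := 0.0
	for it.Next() {
		total += it.At()
	}
	return total
}

func main() {
	it := &sliceIterator{} // allocated once
	for _, chunk := range [][]float64{{1, 2}, {3, 4, 5}} {
		it.Reset(chunk) // reused for every chunk instead of allocating a new iterator
		fmt.Println(sum(it))
	}
}
```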
Krasi Georgiev 31f7990d1d
Re-encode chunks that are still being appended to when snapshoti… (#641)
* re encode all head chunks outside the time range.

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
2019-07-03 13:47:31 +03:00
Krasi Georgiev 69740485c1
move the wal repair logic in db.Open (#633)
* move the wal repair logic in db.Open

This is to allow opening a WAL in read-only mode without triggering a
repair.

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
2019-06-14 17:39:22 +02:00
beorn7 90a7612df3 Make objectives of Summaries explicit
With the next release of client_golang, Summaries will not have
objectives by default.

As it turns out, for prometheus_tsdb_head_gc_duration_seconds and
prometheus_tsdb_wal_truncate_duration_seconds, the objective-less
default makes more sense than the current default.

To make sure we do the right thing before and after the upcoming
release of client_golang, I have set the objectives explicitly
wherever that was not the case so far:

- prometheus_tsdb_head_gc_duration_seconds and
  prometheus_tsdb_wal_truncate_duration_seconds now have no objectives
  explicitly.
- prometheus_tsdb_wal_fsync_duration_seconds now explicitly uses the
  previous default objectives.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-14 14:17:24 +02:00
Brian Brazil be4edbe174
Start a new WAL segment on head truncation. (#605)
This reduces disk space usage to not be a minimum of 3 128MB files
in small setups. This will possibly also help debug wal data issues,
by making things a bit more deterministic.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-06-07 11:35:02 +01:00
Brian Brazil 149c5dc73a
Handle multiple refs for the same series when WAL reading. (#623)
This can happen if a given series is created/truncated/recreated.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-06-06 14:28:54 +01:00
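One way to picture the situation handled in 149c5dc73a: when the same series shows up under several refs, later refs can be mapped back to the first one seen. The record type and the `canonicalRefs` helper below are assumptions made for illustration and do not mirror the actual tsdb code.

```go
package main

import "fmt"

// seriesRecord is a hypothetical, simplified WAL series record.
type seriesRecord struct {
	ref    uint64
	labels string // labels flattened to a string for brevity
}

// canonicalRefs maps every ref to the first ref seen for the same labels,
// so samples written under a later ref can be attributed to one series.
func canonicalRefs(records []seriesRecord) map[uint64]uint64 {
	firstByLabels := make(map[string]uint64, len(records))
	out := make(map[uint64]uint64, len(records))
	for _, r := range records {
		first, seen := firstByLabels[r.labels]
		if !seen {
			first = r.ref
			firstByLabels[r.labels] = first
		}
		out[r.ref] = first
	}
	return out
}

func main() {
	// Series "a" is created with ref 1, truncated, then recreated with ref 7.
	refs := canonicalRefs([]seriesRecord{{1, "a"}, {2, "b"}, {7, "a"}})
	fmt.Println(refs[1], refs[2], refs[7])
}
```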
Callum Styan 562e93e8e6 Always create a new clean segment when starting the WAL. (#608)
* Always create a new clean segment when starting the WAL.
* Ensure we flush the last page after repairing and before recreating the
new segment in Repair.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-05-24 19:33:27 +01:00
Brian Brazil 30d0ea59d7 Don't crash on an unknown tombstone ref. (#604)
Fixes https://github.com/prometheus/prometheus/issues/5562

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-05-16 16:36:44 +03:00
Brian Brazil 6ac81cc7a9
Correctly handle empty labels. (#594)
Currently a time series with empty labels is not treated the same
as one with missing labels. Currently this can only come from
ALERTS&ALERT_FOR_STATE so it's unlikely anyone has actually hit it.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-05-07 11:00:16 +01:00
Brian Brazil dfed85e4a4
Keep series that are still in WAL in checkpoints (#577)
If all the samples are deleted for a series,
we should still keep the series in the WAL as
anything else reading the WAL will still care
about it in order to understand the samples.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-04-09 14:16:24 +01:00
zhulongcheng aed16621c0 Add Head.compactable method (#542)
* Add Head.compactable method

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-04-01 11:19:06 +03:00
zhulongcheng 837ae9aaa0 Update comment for ErrOutOfOrderSample (#563)
Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-03-21 10:53:39 +02:00
zhulongcheng 62cfe4446f Make Head.symbols map with size hint (#552)
To reduce the number of times the map is resized

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-03-20 10:43:07 +02:00
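The size hint mentioned in 62cfe4446f is the optional second argument to Go's built-in `make` for maps. A minimal sketch, where `buildSymbols` is a hypothetical helper and not the actual `Head` code:

```go
package main

import "fmt"

// buildSymbols collects unique strings into a set. Passing len(values) as the
// size hint lets the runtime allocate room up front, which reduces the number
// of times the map has to grow while it is being filled.
func buildSymbols(values []string) map[string]struct{} {
	symbols := make(map[string]struct{}, len(values))
	for _, v := range values {
		symbols[v] = struct{}{}
	}
	return symbols
}

func main() {
	s := buildSymbols([]string{"job", "instance", "job"})
	fmt.Println(len(s))
}
```

The hint only affects the initial allocation; the map still grows as needed if more entries arrive than expected.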
zhulongcheng 95648b33c4 Fix a typo in head.go (#553)
Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-03-18 10:37:20 +02:00
Alec e7436e13f0 Merge encoding_helpers.go to tsdbutil (#526)
remove duplicate encoding helper funcs and move to own package so they can be reused.

Signed-off-by: naivewong <867245430@qq.com>
2019-02-22 19:11:11 +02:00
Ganesh Vernekar c59ed492b2 Vertical query merging and compaction (#370)
* Vertical series iterator

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Select overlapped blocks first in compactor Plan()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Added vertical compaction

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Code cleanup and comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add benchmark for compaction

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Perform vertical compaction only when blocks are overlapping.

Actions for vertical compaction:
* Sorting chunk metas
* Calling chunks.MergeOverlappingChunks on the chunks

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Benchmark for vertical compaction

* BenchmarkNormalCompaction => BenchmarkCompaction
* Moved the benchmark from db_test.go to compact_test.go

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Benchmark for query iterator and seek for non overlapping blocks

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Vertical query merge only for overlapping blocks

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Simplify logging in Compact(...)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Updated CHANGELOG.md

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Calculate overlapping inside populateBlock

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* MinTime and MaxTime for BlockReader.

Using this to find overlapping blocks in populateBlock()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Sort blocks w.r.t. MinTime in reload()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Log about overlapping in LeveledCompactor.write() instead of returning bool

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Log about overlapping inside LeveledCompactor.populateBlock()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Refactor createBlock to take optional []Series

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* review1

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* Updated CHANGELOG and minor nits

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* nits

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Updated CHANGELOG

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Refactor iterator and seek benchmarks for Querier.

Also has overlapping blocks.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Additional test case

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* genSeries takes optional labels. Updated BenchmarkQueryIterator and BenchmarkQuerySeek.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Split genSeries into genSeries and populateSeries

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Check error in benchmark

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Warn about overlapping blocks in reload()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-02-14 14:29:41 +01:00
Krasi Georgiev 0b72f9af4c
Merge pull request #270 from codesome/master
Head: don't create stones, delete samples directly
2019-02-08 12:35:01 +02:00
Ganesh Vernekar d7e505db34
Don't store stones in head, delete samples directly
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-08 22:38:41 +05:30
Simon Pasquier d5d7a097e1 Update Makefile.common
This change also uses the latest staticcheck version which comes with
new verifications, hence some clean up in the code.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 16:08:43 +01:00
Brian Brazil b2d7bbd6b1
Move series fetches out of inner loop of SortedPostings. (#485)
With 1M series:

Before:
BenchmarkHeadPostingForMatchers-8              1        3501996117 ns/op 61311520 B/op         78 allocs/op

After:
BenchmarkHeadPostingForMatchers-8              1        1403072952 ns/op 69261568 B/op         72 allocs/op

This works out as 3X faster, as the above time includes other things.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-03 10:35:10 +00:00
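For reference, the raw ratio of the two ns/op figures quoted in b2d7bbd6b1 comes out at about 2.5x; the 3X figure in the message accounts for the benchmark time including other work. A small sketch of that arithmetic, with `speedup` as a hypothetical helper name:

```go
package main

import "fmt"

// speedup returns how many times faster the new timing is than the old one.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// ns/op figures copied from the commit message above.
	fmt.Printf("%.2fx\n", speedup(3501996117, 1403072952))
}
```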
Krasi Georgiev 48c439d26d
fix staticcheck errors (#475)
fix the tests for `check_license` and `staticcheck`

the static check also found some actual bugs.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-02 19:48:42 +03:00
Krasi Georgiev 520ab7dc53
re-add the missing prometheus_tsdb_wal_corruptions_total (#473)
closes https://github.com/prometheus/tsdb/issues/471

after implementing the new WAL this metric was missing so adding it again.
Also added it in a test to make sure it works as expected.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-18 13:24:56 +03:00
Krasi Georgiev 79869d9a4d
fix race for minValidTime (#479)
it happens when truncating the WAL and another goroutine creates a new
Appender()

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-14 14:42:07 +03:00
Krasi Georgiev 2962202ed3
fix windows tests (#469)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 16:29:29 +03:00
JoeWrightss cbfda5a801 Fixes typo: "compltely" to "completely" (#470)
Fix a small typo.
2018-12-11 23:09:17 +03:00
Krasi Georgiev bac9cbed2e
no overlapping on compaction when an existing block is not within default boundaries. (#461)
closes https://github.com/prometheus/prometheus/issues/4643

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-04 13:30:49 +03:00
Krasi Georgiev 01e8296ee1
remove opaque metrics (#457)
* more descriptive help text for the head metrics unit

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 20:18:12 +02:00
Krasi Georgiev 48efdf8b81
refactor NewSegmentsRangeReader to take multi WAL ranges (#449)
* refactor NewSegmentsRangeReader to take multi WAL ranges

In case of an error when checkpointing the WAL the error doesn't show
the exact WAL index that is corrupted. This is because it uses
MultiReader to read multiple WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range and it reads all of the ranges by iterating each one.

this changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 16:46:16 +02:00
Krasi Georgiev 0493efb7c5
repair wal when the record cannot be decoded (#453)
* repair wal when the record cannot be decoded

Currently repair is run only when the error happens in the reader.

A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 13:37:04 +02:00
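The idea in 0493efb7c5, wrapping a decode failure in the error type that triggers a repair, can be sketched as follows. The `corruptionErr` type, `decode`, `readRecord`, and `needsRepair` are all hypothetical names for illustration; only `errors.As` and `errors.New` are real standard-library calls.

```go
package main

import (
	"errors"
	"fmt"
)

// corruptionErr is a hypothetical stand-in for the WAL's corruption error type.
type corruptionErr struct {
	segment int
	err     error
}

func (e *corruptionErr) Error() string {
	return fmt.Sprintf("corruption in segment %d: %v", e.segment, e.err)
}

// Unwrap exposes the underlying decode error to errors.Is / errors.As.
func (e *corruptionErr) Unwrap() error { return e.err }

// decode is a toy decoder that rejects empty records.
func decode(rec []byte) error {
	if len(rec) == 0 {
		return errors.New("empty record")
	}
	return nil
}

// readRecord wraps any decode failure as a corruptionErr, so the caller
// treats it the same way as a corruption found by the reader itself.
func readRecord(segment int, rec []byte) error {
	if err := decode(rec); err != nil {
		return &corruptionErr{segment: segment, err: err}
	}
	return nil
}

// needsRepair reports whether err is, or wraps, a corruptionErr.
func needsRepair(err error) bool {
	var cerr *corruptionErr
	return errors.As(err, &cerr)
}

func main() {
	err := readRecord(3, nil)
	fmt.Println(err, needsRepair(err))
}
```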
Krasi Georgiev 5a9ddeecef
fix lint errors (#439)
unexported NewMemTombstones as this returns unexported memTombstones
type which will not be shown in godoc.
Added missing comments for exported methods.
Removed unused RecordLogger,RecordReader interfaces.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:40:01 +02:00
Brian Brazil 910f3021b0
Use sampleBuf instead of maintaining lastValue. (#444)
This cuts the size of memSeries by 8B.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-14 14:02:32 +00:00
Brian Brazil 10632217ce
Merge pull request #440 from prometheus/wal-reading
Improve WAL reading
2018-11-14 13:59:41 +00:00
Ganesh Vernekar 3a08a71d86 LabelNames() method to get all unique label names (#369)
* LabelNames() method to get all unique label names

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-07 17:52:41 +02:00
Brian Brazil c7e7fd355e Only send WAL read workers the samples they need.
Calculating the modulus in each worker was a hotspot,
and meant that you had more work to do the more cores you had.
This cuts CPU usage (on my 8 core, 4 real core machine) by
33%, and walltime by 3%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 22:52:26 +00:00
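The change in c7e7fd355e can be pictured as moving the modulus out of the workers and into the dispatcher. This is a rough sketch, assuming samples are sharded by series ref modulo the worker count; the `sample` type and `partition` function are illustrative and not the actual `loadWAL` code.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for a WAL sample record.
type sample struct {
	ref uint64 // series reference
	t   int64
	v   float64
}

// partition splits samples into n shards keyed by ref % n (n must be > 0).
// The modulus is computed once per sample here, instead of every worker
// computing it for every sample and discarding the ones it does not own.
func partition(samples []sample, n int) [][]sample {
	shards := make([][]sample, n)
	for _, s := range samples {
		i := s.ref % uint64(n)
		shards[i] = append(shards[i], s)
	}
	return shards
}

func main() {
	samples := []sample{{ref: 1}, {ref: 2}, {ref: 5}, {ref: 8}}
	for i, shard := range partition(samples, 4) {
		fmt.Printf("worker %d gets %d samples\n", i, len(shard))
	}
}
```

With this layout the dispatcher does one modulus per sample regardless of how many workers exist, so adding cores no longer adds redundant work.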
Brian Brazil a64b0d51c4 Precalculate memSeries.head
This is read far more than it changes.
This cuts ~14% off walltime and ~27% off CPU for WAL reading.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil d8c8e4e6e4 Keep local cache of ids.
With the various goroutines running, the locking
in getByID is notable. This cuts cpu usage by ~25%
and walltime by ~20%.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil f0e79ec264 Actually reuse samples in loadWAL across records.
This cuts walltime by 2.5X and CPU by 2X

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Krasi Georgiev d804a27062
refactor util funcs to allow re-usage. (#419)
* refactor util funcs to allow reusage.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-25 21:06:19 +01:00
Krasi Georgiev 1dd9a6bd29
comments about the 120samples const and link to Gorilla papers. (#423)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-23 13:43:06 +03:00
Thomas Jackson b4132df5f7 Reduce allocations for queries on HEAD (#417)
Add some benchmarks for HEAD and allocate the correct slice size in LabelValues, since we already know what it'll be.

This is ~15% time improvement, and ~25% allocation improvement:


```
benchmark                             old ns/op     new ns/op     delta
BenchmarkHeadPostingForMatchers-4     74452         63514         -14.69%

benchmark                             old allocs     new allocs     delta
BenchmarkHeadPostingForMatchers-4     20             13             -35.00%

benchmark                             old bytes     new bytes     delta
BenchmarkHeadPostingForMatchers-4     5425          3137          -42.18%
```

Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
2018-10-22 13:52:01 +03:00
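The slice preallocation described in b4132df5f7 can be sketched like this. The `labelValues` function below is a hypothetical simplification, assuming the values live in a set whose size is known up front; it is not the actual `LabelValues` implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// labelValues returns the sorted members of a value set. Because the final
// length is known in advance, the slice is allocated once with that capacity
// and append never has to grow it.
func labelValues(set map[string]struct{}) []string {
	out := make([]string, 0, len(set))
	for v := range set {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	vals := labelValues(map[string]struct{}{"prod": {}, "dev": {}})
	fmt.Println(vals, cap(vals))
}
```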
Krasi Georgiev d7492b9350
more descriptive var names and some more logging. (#405)
* more descriptive checkpoint var names and some more logging.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-11 18:23:52 +03:00
Ganesh Vernekar 61b000ee0e
Fix review comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-28 15:00:51 +05:30
Ganesh Vernekar 632dfb349e
Add new metrics.
1. 'prometheus_tsdb_wal_truncate_fail' for failed WAL truncation.
2. 'prometheus_tsdb_checkpoint_delete_fail' for failed old checkpoint delete.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 18:50:57 +05:30
Julius Volz 5ae6c60d39 Handle a bunch of unchecked errors (#365)
As discovered by "gosec".

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-09-20 11:33:52 +03:00
beorn7 3bc6c670fa Revert "Remove prometheus_ prefix from metrics"
This reverts commit 98fe30438c.

After some discussion, it was concluded that we want the full
`prometheus_tsdb_...` prefix hardcoded in the library.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-18 19:19:19 +02:00
Chris Marchbanks a8966cb53d Fix race condition between gc and committing (#378)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-09-17 19:58:42 +03:00
beorn7 98fe30438c Remove prometheus_ prefix from metrics
This can now be added by users of the library as needed with the new
https://godoc.org/github.com/prometheus/client_golang/prometheus#WrapRegistererWithPrefix

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 14:54:28 +02:00
Ganesh Vernekar 2945db18ca Changes in series names (and types) exposed (#376)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-12 14:39:02 +05:30