Commit graph

242 commits

Author SHA1 Message Date
Krasi Georgiev 520ab7dc53
re-add the missing prometheus_tsdb_wal_corruptions_total (#473)
closes https://github.com/prometheus/tsdb/issues/471

after implementing the new WAL this metric was missing so adding it again.
Also added it in a test to make sure it works as expected.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-18 13:24:56 +03:00
Krasi Georgiev 79869d9a4d
fix race for minValidTime (#479)
it happens when truncating the WAL and another goroutine creates a new
Appender()

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-14 14:42:07 +03:00
Krasi Georgiev 2962202ed3
fix windows tests (#469)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-13 16:29:29 +03:00
JoeWrightss cbfda5a801 Fixs typo: "compltely" to "completely" (#470)
Fix a small typo.
2018-12-11 23:09:17 +03:00
Krasi Georgiev bac9cbed2e
no overlapping on compaction when an existing block is not within default boundaries. (#461)
closes https://github.com/prometheus/prometheus/issues/4643

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-12-04 13:30:49 +03:00
Krasi Georgiev 01e8296ee1
remove opaque metrics (#457)
* more descriptive help text for the head metrics unit

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 20:18:12 +02:00
Krasi Georgiev 48efdf8b81
refactor NewSegmentsRangeReader to take multi WAL ranges (#449)
* refactor NewSegmentsRangeReader to take multi WAL ranges

In case of an error when checkpointing the WAL the error doesn't show
the exact WAL index that is corrupter. this is because it uses
MultiReader to read multiply WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range and it reads all of the ranges by iterating each one.

this changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 16:46:16 +02:00
Krasi Georgiev 0493efb7c5
repair wal when the record cannot be decoded (#453)
* repair wal when the record cannot be decoded

Currently repair is run only when the error happens in the reader.

A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 13:37:04 +02:00
Krasi Georgiev 5a9ddeecef
fix lint errors (#439)
unexported NewMemTombstones as this returns unexported memTombstones
type which will not be shows in godoc.
Added missing comments for exported methods.
Removed unused RecordLogger,RecordReader interfaces.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:40:01 +02:00
Brian Brazil 910f3021b0
Use sampleBuf instead of maintaining lastValue. (#444)
This cuts the size of memSize by 8B.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-11-14 14:02:32 +00:00
Brian Brazil 10632217ce
Merge pull request #440 from prometheus/wal-reading
Improve WAL reading
2018-11-14 13:59:41 +00:00
Ganesh Vernekar 3a08a71d86 LabelNames() method to get all unique label names (#369)
* LabelNames() method to get all unique label names

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-07 17:52:41 +02:00
Brian Brazil c7e7fd355e Only send WAL read workers the samples they need.
Calculating the modulus in each worker was a hotspot,
and meant that you had more work to do the more cores you had.
This cuts CPU usage (on my 8 core, 4 real core machine) by
33%, and walltime by 3%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 22:52:26 +00:00
Brian Brazil a64b0d51c4 Precalculate memSeries.head
This is read far more than it changes.
This cuts ~14% off walltme and ~27% off CPU for WAL reading.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil d8c8e4e6e4 Keep local cache of ids.
With the various goroutines running, the locking
in getByID is notable. This cuts cpu usage by ~25%
and walltime by ~20%.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Brian Brazil f0e79ec264 Actually reuse samples in loadWAL across records.
This cuts walltime by 2.5X and CPU by 2X

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-10-31 15:49:42 +00:00
Krasi Georgiev d804a27062
refactor util funcs to allow re-usage. (#419)
* refactor util funcs to allow reusage.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-25 21:06:19 +01:00
Krasi Georgiev 1dd9a6bd29
comments about the 120samples const and link to Gorilla papers. (#423)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-23 13:43:06 +03:00
Thomas Jackson b4132df5f7 Reduce allocations for queries on HEAD (#417)
Some benchmarks for HEAD and allocate the correct slice size in LabelValues , we already know what it'll be

This is ~15% time improvement, and ~25% allocation improvement:


```
benchmark                             old ns/op     new ns/op     delta
BenchmarkHeadPostingForMatchers-4     74452         63514         -14.69%

benchmark                             old allocs     new allocs     delta
BenchmarkHeadPostingForMatchers-4     20             13             -35.00%

benchmark                             old bytes     new bytes     delta
BenchmarkHeadPostingForMatchers-4     5425          3137          -42.18%
```

Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
2018-10-22 13:52:01 +03:00
Krasi Georgiev d7492b9350
more descriptive var names and some more logging. (#405)
* more descriptive checkpoint var names and some more logging.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-11 18:23:52 +03:00
Ganesh Vernekar 61b000ee0e
Fix review comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-28 15:00:51 +05:30
Ganesh Vernekar 632dfb349e
Add new metrics.
1. 'prometheus_tsdb_wal_truncate_fail' for failed WAL truncation.
2. 'prometheus_tsdb_checkpoint_delete_fail' for failed old checkpoint delete.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 18:50:57 +05:30
Julius Volz 5ae6c60d39 Handle a bunch of unchecked errors (#365)
As discovered by "gosec".

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-09-20 11:33:52 +03:00
beorn7 3bc6c670fa Revert "Remove prometheus_ prefix from metrics"
This reverts commit 98fe30438c.

After some discussion, it was concluded that we want the full
`prometheus_tsdb_...` prefix hardcoded in the library.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-18 19:19:19 +02:00
Chris Marchbanks a8966cb53d Fix race condition between gc and committing (#378)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-09-17 19:58:42 +03:00
beorn7 98fe30438c Remove prometheus_ prefix from metrics
This can now be added by users of the library as needed with the new
https://godoc.org/github.com/prometheus/client_golang/prometheus#WrapRegistererWithPrefix

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 14:54:28 +02:00
Ganesh Vernekar 2945db18ca Changes in series names (and types) exposed (#376)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-12 14:39:02 +05:30
Fabian Reinartz 76990518d3
Merge pull request #340 from prometheus/wal_migrate
Migrate write ahead log
2018-08-02 16:58:17 -05:00
Fabian Reinartz 45071c657c Properly initialize head time
This fixes various issues when initializing the head time range
under different starting conditions.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:41:02 -04:00
Fabian Reinartz 1a5573b4ce Migrate write ahead log
On startup, rewrite the old write ahead log into the new format once.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:34:18 -04:00
Fabian Reinartz 3e76f0163e Address comments
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:25:30 -04:00
Fabian Reinartz def912ce0e Integrate new WAL and checkpoints
Remove the old WAL and drop in the new one

Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:25:30 -04:00
codwu 667e539a7a Merge branch 'master' of https://github.com/prometheus/tsdb into tsdb-delete 2018-07-06 20:21:32 +08:00
Benoît Knecht 1e1b2e163d Make interval overlap comparisons more explicit
Blocks are half-open intervals [a, b), while all other intervals
(chunks, head, ...) are closed intervals [a, b].

Make that distinction explicit by defining `OverlapsClosedInterval()`
methods for blocks and chunks, and using them in place of the more
generic `intervalOverlap()` function.

This change also fixes `db.Querier()` and `db.Delete()`, which could
previously return one extraneous block at the end of the specified
interval.

Signed-off-by: Benoît Knecht <benoit.knecht@fsfe.org>
2018-07-02 10:35:08 +02:00
Fabian Reinartz ea607b9fc3 Log series on rollback
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-06-28 09:04:07 -04:00
codwu 84a45cb79a add rwmutex to prevent concurrent map read when delete series
Signed-off-by: codwu <wuhan9087@163.com>
2018-06-08 19:52:01 +08:00
Goutham Veeramachaneni 90d55672d1
Merge pull request #286 from mattbostock/rename_high_timestamp
head: Rename highTimestamp to maxt
2018-04-05 17:41:48 +05:30
Matt Bostock 55ea5ae6b1 head: Rename highTimestamp to maxt
`maxt` seems more consistent with `mint` and other uses of `maxt`
elsewhere in the code, if I've understand the intent correctly.
2018-02-21 16:01:12 +00:00
Simon Pasquier 79defa54df Fix crash when a series has no block 2018-02-21 16:45:06 +01:00
Matt Bostock bc97c6c6be Misc fixes (#285)
* Fix typo in head.go

pralellize -> paralellize

* Remove commented out code

It's dead code, remove it.

* Correct reference to sample buffer
2018-02-21 16:38:59 +01:00
Matt Bostock 50211e89ad Fix typos in comments (#254)
a the -> the
timestmap -> timestamp
badded -> padded
its -> it is
callers -> caller's
2018-01-13 17:51:50 +00:00
Fabian Reinartz 67f0ca8f0e Move index and chunk encoders to own packages 2017-12-21 11:27:54 +01:00
Goutham Veeramachaneni 239cbae154
Merge pull request #228 from Gouthamve/not-matchers
Select series with label unset for != and !~
2017-12-21 12:10:51 +05:30
Goutham Veeramachaneni 3158b03e6c Select series with label unset for != and !~
Fixes https://github.com/prometheus/prometheus/issues/3575

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2017-12-19 02:47:31 +00:00
Goutham Veeramachaneni 05d62ca842 Make sure gc'ed chunks are handled properly
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2017-12-14 19:48:39 +00:00
Fabian Reinartz f1512a368a Expose ChunkSeriesSet and lookups methods. 2017-11-13 14:02:32 +01:00
Fabian Reinartz 3ef4326114 Refactor tombstone reader types 2017-11-13 13:38:07 +01:00
Fabian Reinartz e5ce2bef43 Add explicit error to Querier.Select
This has been a frequent source of debugging pain since errors are
potentially delayed to a much later point. They bubble up in an
unrelated execution path.
2017-11-13 12:16:58 +01:00
Julius Volz 1dad3370fd Close WAL when closing the DB
Also, the `wal` field of the `DB` was not used anywhere, so this removes
it.
2017-11-11 14:56:23 +01:00
Oren Shomron 6ca5e52b69 Typo in prometheus_tsdb_head_samples_appended_total description (#188) 2017-10-25 19:12:18 +01:00