Commit graph

651 commits

Author SHA1 Message Date
Sunny Klair ab02ea4de4 Validate index offset table checksums on read 2017-10-26 13:48:31 -04:00
Sunny Klair 4fdf9b195c Validate index TOC checksum on read 2017-10-25 18:12:13 -04:00
Oren Shomron 6ca5e52b69 Typo in prometheus_tsdb_head_samples_appended_total description (#188) 2017-10-25 19:12:18 +01:00
Fabian Reinartz 5d28c849c7 Merge pull request #187 from prometheus/cutchunk
Ensure near-empty chunks end at correct boundary
2017-10-25 16:52:11 +02:00
Fabian Reinartz 82796db37b Ensure near-empty chunks end at correct boundary
We determined a chunk's end time once the chunk was one quarter full, so
that all chunks hold a uniform number of samples.
This accidentally skipped the case where a series started near the end of
a chunk range/block and never reached that threshold. As a result, its
chunks got persisted but were continued across the range.

This resulted in corrupted persisted data.
2017-10-25 09:51:55 +02:00
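
The boundary rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `series`, `chunk`, `rangeEnd`, `cut`, and `append` names are hypothetical, timestamps are assumed to be non-negative, and the quarter-full refinement of the end time is left out. The point is only that every new chunk gets an upper bound at the end of its range, so a chunk holding a single sample still stops at the boundary.

```go
package main

import "fmt"

// chunk is a hypothetical, simplified chunk: a time range and a sample count.
type chunk struct {
	minTime, maxTime int64
	samples          int
}

// series is a hypothetical stand-in for an in-memory series.
type series struct {
	chunkRange int64 // width of one chunk range/block
	nextAt     int64 // timestamp at which the next chunk must start
	chunks     []*chunk
}

// rangeEnd returns the exclusive end of the chunkRange-aligned window that
// contains t. Assumes t >= 0 and width > 0.
func rangeEnd(t, width int64) int64 {
	return (t/width)*width + width
}

// cut starts a new chunk at t and immediately bounds it by the end of the
// range, regardless of how many samples it will ever receive.
func (s *series) cut(t int64) *chunk {
	c := &chunk{minTime: t, maxTime: t}
	s.chunks = append(s.chunks, c)
	s.nextAt = rangeEnd(t, s.chunkRange)
	return c
}

// append adds a sample timestamp, cutting a new chunk when the series has
// no chunk yet or t has reached the current chunk's boundary.
func (s *series) append(t int64) {
	var c *chunk
	if len(s.chunks) == 0 || t >= s.nextAt {
		c = s.cut(t)
	} else {
		c = s.chunks[len(s.chunks)-1]
	}
	c.maxTime = t
	c.samples++
}

func main() {
	s := &series{chunkRange: 100}
	// The series starts near the end of the range [0, 100).
	for _, ts := range []int64{95, 98, 102, 110} {
		s.append(ts)
	}
	for _, c := range s.chunks {
		fmt.Println(c.minTime, c.maxTime, c.samples)
	}
}
```

In this sketch the first chunk ends at 98 even though it only ever holds two samples, and the sample at 102 opens a new chunk instead of extending the old one across the range.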
Fabian Reinartz d109149d17 Merge pull request #186 from prometheus/closeallblocks
Ensure readers are closed on setup failure.
2017-10-25 09:31:06 +02:00
Fabian Reinartz c93162c751 Merge pull request #185 from prometheus/walslice
Limit WAL sample processing batch size
2017-10-25 09:30:52 +02:00
Fabian Reinartz 820bd8b170 Merge pull request #181 from prometheus/gcchunk
Fix dangling chunk reference panic
2017-10-25 09:30:16 +02:00
Fabian Reinartz 6ecdaa5314 Merge pull request #183 from prometheus/walbrokensegment
Truncate segments on broken header
2017-10-25 09:27:44 +02:00
Fabian Reinartz 7bc07d80b6 Merge pull request #180 from prometheus/fixbugz
Fix race in symbol table re-creation
2017-10-24 11:39:13 +02:00
Fabian Reinartz f8e88bfdb7 Close previous block queriers on error
This ensures we close all previously opened queriers if one of the block
queriers fails to open.
Also swap in new blocks before closing old ones to avoid the situation
in general. Make read locking of blocks more conservative to avoid
unnecessary retries by clients, e.g. when blocks are getting closed
before we can successfully instantiate a querier against them.
2017-10-23 21:56:12 +02:00
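
The cleanup pattern this commit message describes might look like the sketch below. It is an assumption-laden illustration rather than the tsdb code: `opener`, `openAll`, `trackingCloser`, `okOpener`, and `failOpener` are hypothetical names, and queriers are reduced to `io.Closer`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// trackingCloser is a hypothetical stand-in for a block querier; it only
// records whether Close was called.
type trackingCloser struct{ closed bool }

func (c *trackingCloser) Close() error {
	c.closed = true
	return nil
}

// opener is a hypothetical function that opens a querier against one block.
type opener func() (io.Closer, error)

// okOpener returns an opener that always succeeds with c.
func okOpener(c *trackingCloser) opener {
	return func() (io.Closer, error) { return c, nil }
}

// failOpener returns an opener that always fails with msg.
func failOpener(msg string) opener {
	return func() (io.Closer, error) { return nil, errors.New(msg) }
}

// openAll opens one querier per block. If any open fails, it closes every
// querier opened so far and returns the error, so nothing is leaked.
func openAll(openers []opener) ([]io.Closer, error) {
	opened := make([]io.Closer, 0, len(openers))
	for i, open := range openers {
		q, err := open()
		if err != nil {
			for _, o := range opened {
				o.Close() // best effort; the open error is the one reported
			}
			return nil, fmt.Errorf("open querier %d: %w", i, err)
		}
		opened = append(opened, q)
	}
	return opened, nil
}

func main() {
	a, b := &trackingCloser{}, &trackingCloser{}
	_, err := openAll([]opener{okOpener(a), okOpener(b), failOpener("block unavailable")})
	fmt.Println(err, a.closed, b.closed)
}
```

Errors returned by `Close` during cleanup are dropped here for brevity; a fuller version would probably collect them alongside the open error.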
Fabian Reinartz 9749aa2a3e head: limit WAL sample processing batch size 2017-10-23 16:22:24 +02:00
Fabian Reinartz 80055bb95b Truncate segments on broken header 2017-10-20 13:16:44 +02:00
Fabian Reinartz 9e999e8b0b Merge pull request #184 from prometheus/metricprefix
Prefix all metrics with `prometheus_*`
2017-10-20 13:01:15 +02:00
Fabian Reinartz d17104f1f0 Prefix all metrics with prometheus_* 2017-10-20 12:32:32 +02:00
Fabian Reinartz ea817e169b Return nop iterator for invalid chunk references 2017-10-20 09:43:52 +02:00
Fabian Reinartz 6dcca97755 Fix race in symbol table re-creation 2017-10-20 09:29:03 +02:00
Fabian Reinartz ebdc0f4a61 Merge pull request #179 from prometheus/tabw
Remove GetTabWriter from tsdb package
2017-10-20 08:51:07 +02:00
Fabian Reinartz e59b7b8ac4 Remove prometheus/prometheus dev-2.0 branch workaround 2017-10-19 18:24:12 +02:00
Fabian Reinartz 6a10761b50 Remove GetTabWriter from tsdb package 2017-10-19 18:14:37 +02:00
Fabian Reinartz 7f8fa07cf7 Merge pull request #176 from prometheus/races
Races
2017-10-12 15:27:08 +02:00
Fabian Reinartz efe9509d04 Merge pull request #175 from prometheus/seriesnotfound
Clarify postings index semantics, handle staleness
2017-10-12 15:22:44 +02:00
Fabian Reinartz 065f42f58c head: track number of series not found errors in metric 2017-10-12 15:25:12 +02:00
Fabian Reinartz 91a154d228 Fix block printing in cmd/main 2017-10-11 11:02:57 +02:00
Fabian Reinartz 88305e7612 Access chunk time range while holding lock 2017-10-11 10:17:59 +02:00
Fabian Reinartz 106eaf39d1 Ensure workers terminated fully before reading unknownRefs 2017-10-11 10:12:29 +02:00
Fabian Reinartz 665955da48 Clarify postings index semantics, handle staleness
The postings list index may point to series that no longer
exist during garbage collection. This clarifies that this is valid
behavior.
It would be possible, though more complex, to always keep them in sync.
However, series existence means nothing in itself as the queried time
range defines whether there's actual data. Thus our definition is sane
overall as long as drift is kept small.
2017-10-11 09:37:19 +02:00
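
One way to read the semantics described in this commit message: a consumer of a postings list treats a missing series as a normal, countable event and skips it. The sketch below assumes this reading; `resolve` and its map-based series lookup are hypothetical and much simpler than a real index.

```go
package main

import "fmt"

// resolve walks a postings list and looks each ID up in the set of live
// series. IDs whose series were garbage collected are skipped and counted
// instead of being treated as a hard failure.
func resolve(postings []uint64, series map[uint64]string) (found []string, notFound int) {
	for _, id := range postings {
		s, ok := series[id]
		if !ok {
			notFound++ // stale reference: valid, but worth tracking
			continue
		}
		found = append(found, s)
	}
	return found, notFound
}

func main() {
	// Series 2 was garbage collected, but the postings list still names it.
	series := map[uint64]string{1: `up{job="a"}`, 3: `up{job="c"}`}
	found, missing := resolve([]uint64{1, 2, 3}, series)
	fmt.Println(found, missing)
}
```

The returned count is the kind of value that could feed a "series not found" metric, which keeps the drift between postings and series observable.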
Fabian Reinartz 6e94145515 Merge pull request #156 from BasPH/cli-ls
Add list blocks command to CLI
2017-10-11 09:22:41 +02:00
Fabian Reinartz c3e502b194 Merge pull request #168 from prometheus/fasterwal
wal: decode and process in separate threads.
2017-10-10 18:11:44 +02:00
Fabian Reinartz f347eac33d Merge pull request #171 from prometheus/safeclose
Add more verbose error handling for closing, reduce locking
2017-10-10 18:10:06 +02:00
Fabian Reinartz fb9da52b11 Add more verbose error handling for closing, reduce locking
This commit introduces error returns in various places and is explicit
about closing persisted blocks.
{Index,Chunk,Tombstone}Readers are more consistent about their Close()
method. Whenever a reader is retrieved, the corresponding close method
must eventually be called. We use this to track pending readers against
persisted blocks.

Queriers against the DB no longer hold a read lock for their entire
lifecycle. This prevents long-running queriers from starving new ones when
we have to acquire a write lock to reload blocks.
2017-10-10 12:13:37 +02:00
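
Tracking pending readers against a block, as this commit message describes, could be sketched with a `sync.WaitGroup`. The `block` and `reader` types and `errClosing` are hypothetical; the sketch only shows the contract that every retrieved reader must be closed before the block itself finishes closing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosing = errors.New("block is closing")

// block is a hypothetical persisted block that counts its open readers.
type block struct {
	mtx     sync.RWMutex
	closing bool
	pending sync.WaitGroup
}

// reader is a hypothetical handle; Close must be called exactly once per
// retrieval, and extra calls are harmless.
type reader struct {
	b    *block
	once sync.Once
}

// Reader registers a pending reader, or fails if the block is shutting down.
func (b *block) Reader() (*reader, error) {
	b.mtx.RLock()
	defer b.mtx.RUnlock()
	if b.closing {
		return nil, errClosing
	}
	b.pending.Add(1)
	return &reader{b: b}, nil
}

// Close releases the reader's claim on the block.
func (r *reader) Close() error {
	r.once.Do(r.b.pending.Done)
	return nil
}

// Close rejects new readers, then waits until all pending ones are closed.
func (b *block) Close() error {
	b.mtx.Lock()
	b.closing = true
	b.mtx.Unlock()
	b.pending.Wait()
	return nil
}

func main() {
	b := &block{}
	r, _ := b.Reader()

	done := make(chan struct{})
	go func() {
		b.Close() // blocks until r is closed
		close(done)
	}()

	r.Close()
	<-done
	_, err := b.Reader()
	fmt.Println(err)
}
```

The lock is held only while registering a reader, not for the reader's lifetime, so a long-lived reader delays closing that one block without holding up a writer that needs the lock.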
Fabian Reinartz d7cd5b21ea Merge pull request #169 from prometheus/muchfasterwal
wal: parallelize sample processing
2017-10-09 18:22:41 +02:00
Fabian Reinartz 7efb830d70 wal: parallelize sample processing 2017-10-09 15:22:38 +02:00
Fabian Reinartz 963a270885 Merge pull request #170 from simonpasquier/fix-typo-in-variable-names
Fix innocuous typo in variable names
2017-10-09 12:38:53 +02:00
Simon Pasquier e858c0826c Fix innocuous typo in variable names
This change fixes the variable names holding the tsdb_head_max_time and
tsdb_head_min_time metrics. It is a cosmetic change to improve the
code readability as the metric values are taken from the correct
variables.
2017-10-09 12:24:53 +02:00
Fabian Reinartz d3682d701c wal: decode and process in separate threads. 2017-10-06 14:46:52 +02:00
Fabian Reinartz dc87103807 Merge pull request #166 from prometheus/batchpostings
Load postings in batch on startup
2017-10-06 14:41:23 +02:00
Fabian Reinartz 74b0336d06 Merge pull request #167 from simonpasquier/instrument-wal-corruptions
Instrument WAL corruptions
2017-10-06 14:16:41 +02:00
Simon Pasquier 3e17cd1621 Instrument WAL corruptions 2017-10-06 13:50:20 +02:00
Fabian Reinartz cd2e26b7fc Load postings in batch on startup
This allows inserting IDs into postings out of order until
a trigger function is called. This avoids the insertion sort we usually
do, which can be very costly since WAL entries are more out of order than
regular adds.
2017-10-06 10:39:10 +02:00
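
The batching idea in this commit message can be sketched as follows, under the assumption that postings are plain per-label slices of series IDs. `postings`, `addUnordered`, and `ensureOrder` are hypothetical names for this illustration; the trigger is modeled as a single call that sorts every list once.

```go
package main

import (
	"fmt"
	"sort"
)

// postings is a hypothetical in-memory postings index: label -> series IDs.
type postings struct {
	m map[string][]uint64
}

func newPostings() *postings {
	return &postings{m: map[string][]uint64{}}
}

// addUnordered appends id without keeping the list sorted. This is cheap,
// but the list must not be queried until ensureOrder has run.
func (p *postings) addUnordered(label string, id uint64) {
	p.m[label] = append(p.m[label], id)
}

// ensureOrder is the trigger: it sorts each list once, replacing the
// per-insert insertion sort with one sort per list.
func (p *postings) ensureOrder() {
	for _, l := range p.m {
		sort.Slice(l, func(i, j int) bool { return l[i] < l[j] })
	}
}

func main() {
	p := newPostings()
	// IDs arrive out of order, as they might during WAL replay.
	for _, id := range []uint64{5, 1, 3} {
		p.addUnordered(`job="a"`, id)
	}
	p.ensureOrder()
	fmt.Println(p.m[`job="a"`])
}
```

With n IDs in a list, repeated sorted insertion can cost O(n²) element moves in the worst case, while appending and sorting once costs O(n log n), which is the saving the commit message points at.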
Goutham Veeramachaneni 4a7c39d9d8 Merge pull request #160 from dim/fix/snapshot-test
Restore snapshot functionality
2017-10-05 12:57:10 +05:30
Fabian Reinartz 27f1b8aac3 Merge pull request #162 from BasPH/fsync-duration
Instrument WAL fsync
2017-10-05 08:18:36 +02:00
Bas Harenslak 5e1c258a98 Instrument WAL fsync 2017-10-04 22:17:02 +02:00
Goutham Veeramachaneni da565f975e Merge pull request #161 from prometheus/fileutil
Remove dependency on etcd/pkg/fileutil
2017-10-04 17:08:54 +05:30
Goutham Veeramachaneni 203012169a snapshot: Remove truncation check to restore func.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-10-04 16:58:07 +05:30
Fabian Reinartz f04ec031eb compact: sync temporary directory 2017-10-04 12:22:09 +02:00
Fabian Reinartz 665d1fd451 Merge pull request #158 from prometheus/cachesym
Allocate and cache strings for persisted blocks
2017-10-04 12:19:50 +02:00
Fabian Reinartz bbe72dccb9 Remove dependency on etcd/pkg/fileutil 2017-10-04 10:23:41 +02:00
Dimitrij Denissenko c9fc2af6c0 Add test for snapshot 2017-10-03 13:06:26 +01:00
Goutham Veeramachaneni 3b7e71fee9 Merge pull request #159 from dim/fix/wal-flush
Use configurable WAL flush interval
2017-10-03 16:17:55 +05:30