Commit graph

21 commits

Author SHA1 Message Date
Krasi Georgiev 48efdf8b81
refactor NewSegmentsRangeReader to take multi WAL ranges (#449)
* refactor NewSegmentsRangeReader to take multi WAL ranges

In case of an error when checkpointing the WAL the error doesn't show
the exact WAL index that is corrupter. this is because it uses
MultiReader to read multiply WAL files.
This refactoring allows the NewSegmentsRangeReader to take more than a
single WAL range and it reads all of the ranges by iterating each one.

this changes the logs from
create checkpoint: read segments: corruption after 4841144384 bytes:...
to
create checkpoint: read segments: corruption in segment
data/wal/00017351 at 123142208: ...

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 16:46:16 +02:00
Krasi Georgiev 0493efb7c5
repair wal when the record cannot be decoded (#453)
* repair wal when the record cannot be decoded

Currently repair is run only when the error happens in the reader.

A corruption can occur after the record is read and when it is decoded.
This change wraps the error at decoding as a CorruptionErr as this error
is expected to trigger a repair.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-30 13:37:04 +02:00
Krasi Georgiev 24520727a4
return an error when the last wal segment record is torn. (#451)
* return an error when the  last wal segment record is torn.

this ensures that a repair will be run when the last record in a segment
is torn.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-28 15:15:11 +02:00
Krasi Georgiev 3385571ddf
buffer-panic when reading a record after recPageTerm (#429)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-14 18:43:33 +02:00
Krasi Georgiev a9470dd8d5
few more comments to explain the WAL workflow (#430)
More comments for the WAL package.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-08 10:27:16 +02:00
Krasi Georgiev d7492b9350
more descriptive var names and some more logging. (#405)
* more descriptive checkpoint var names and some more logging.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-10-11 18:23:52 +03:00
Camille Janicki 0ce41118ed Add msg parameter to Equals function in testutil (#398)
* Add msg parameter to Equals function in testutil

Co-authored-by: Chris Marchbanks <csmarchbanks@gmail.com>
Signed-off-by: Camille Janicki <camille.janicki@gmail.com>
2018-10-03 11:08:31 +03:00
Ganesh Vernekar 61b000ee0e
Fix review comments
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-28 15:00:51 +05:30
Ganesh Vernekar 632dfb349e
Add new metrics.
1. 'prometheus_tsdb_wal_truncate_fail' for failed WAL truncation.
2. 'prometheus_tsdb_checkpoint_delete_fail' for failed old checkpoint delete.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 18:50:57 +05:30
Goutham Veeramachaneni 9c8ca47399
Fix filehandling for windows (#392)
* Fix filehandling for windows

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Fix more windows filehandling issues

Windows: Close files before deleting Checkpoints.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Windows: Close writers in case of errors so they can be deleted

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Windows: Close block so that it can be deleted.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Windows: Close file to delete it

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Windows: Close dir so that it can be deleted.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

Windows: close files so that they can be deleted.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Review feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2018-09-21 11:01:22 +05:30
Goutham Veeramachaneni c7d0d10da4
Make sure WAL Repair can handle wrapped errors
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2018-09-19 12:12:25 +05:30
beorn7 3bc6c670fa Revert "Remove prometheus_ prefix from metrics"
This reverts commit 98fe30438c.

After some discussion, it was concluded that we want the full
`prometheus_tsdb_...` prefix hardcoded in the library.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-18 19:19:19 +02:00
beorn7 98fe30438c Remove prometheus_ prefix from metrics
This can now be added by users of the library as needed with the new
https://godoc.org/github.com/prometheus/client_golang/prometheus#WrapRegistererWithPrefix

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-09-17 14:54:28 +02:00
Fabian Reinartz 22cae653d8 Fixes for 32bit archs
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-08-07 06:52:16 -04:00
Fabian Reinartz f8ec0074e7 Add Replace function
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-08-02 17:51:49 -04:00
Fabian Reinartz b81e0fbf2a Address comments
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-20 02:26:12 -04:00
Fabian Reinartz 3e76f0163e Address comments
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:25:30 -04:00
Fabian Reinartz 3f538817f8 move WAL lock
Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:25:30 -04:00
Fabian Reinartz d951140ab8 wal: avoid heap allocation in WAL reader
The buffers we allocated were escaping to the heap, resulting in large
memory usage spikes during startup and checkpointing in Prometheus.
This attaches the buffer to the reader object to prevent this.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:25:30 -04:00
Fabian Reinartz 449a2d0db7 wal: add segment type and repair procedure
Allow to repair the WAL based on the error returned by a reader
during a full scan over all records.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:24:40 -04:00
Fabian Reinartz 8e1f97fad4 wal: add write ahead log package
This adds a new WAL that's agnostic to the actual record contents.
It's much simpler and should be more resilient than the existing one.

Signed-off-by: Fabian Reinartz <freinartz@google.com>
2018-07-19 07:24:40 -04:00