* Test Longer Tests in Travis
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
* Make test Target Run All Tests
* Add test-short to run short tests
test is running all the tests now as we are running make tests in
CircleCI and I think the base image is shared across Prometheus Org.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
* Remove Empty Line
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
- checkpointSeriesMapAndHeads accepts a context now to allow
cancelling.
- If a shutdown is initiated, cancel the ongoing checkpoint. (We will
create a final checkpoint anyway.)
- Always wait for at least as long as the last checkpoint took before
starting the next checkpoint (to cap the time spending checkpointing
at 50%).
- If an error has occurred during checkpointing, don't bother to sync
the write.
- Make sure the temporary checkpoint file is deleted, even if an error
has occurred.
- Clean up the checkpoint loop a bit. (The concurrent Timer.Reset(0)
call might have cause a race.)
Fixes#2480. For certain definition of "fixes".
This is something that should never happen. Sadly, it does happen,
albeit extremely rarely. This could be some weird cornercase we
haven't covered yet. Or it happens as a consequesnce of data
corruption or a crash recovery gone bad.
This is not a "real" fix as we don't know the root cause of the
incident reported in #2480. However, this makes sure the server does
not crash, but deals gracefully with the problem: The series in
question is quarantined, which even makes it available for forensics.
An unopenable archived_fingerprint_to_timerange is simply deleted and
will be rebuilt during crash recovery (wich can then take quite some time).
An unopenable archived_fingerprint_to_metric is not deleted but
instructions to the user are logged. A deletion has to be done by the
user explicitly as it means losing all archived series (and a repair
with a 3rd party tool might still be possible).
Sadly, we have a number of places where we use varint encoding for
numbers that cannot be negative. We could have saved a bit by using
uvarint encoding. On the bright side, we now have a 50% chance to
detect data corruption. :-/
Fixes#1800 and #2492.