* storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation.
## Rationales:
In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples.
This means that we unnecessary decode/encode, wasting CPU, time and memory.
This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets
I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers.
All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos).
## Changes
* Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that
Storage will implement both chunked and samples.
* Added Seek to chunks.Iterator interface for iterating over chunks.
* NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added.
* Improved tests.
* Added some TODO for further simplifications in next PRs.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Brian's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Moved s/Labeled/SeriesLabels as per Krasi suggestion.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Krasi's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Second iteration of Krasi comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Another round of comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.
I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.
Also I don't know what TestSelectSorted was testing, but now it's testing sorting.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Fix bug with WAL watcher and Live Reader metrics usage.
Calling NewXMetrics when creating a Watcher or LiveReader results in a
registration error, which we're ignoring, and as a result other than the
first Watcher/Reader created, we had no metrics for either. So we would
only have metrics like Watcher Records Read for the first remote write
config in a users config file.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
I think the previous behavior is problematic as it will leave
`memSeries` around that still have `pendingCommit` set to `true`.
The only case where this can happen in this code path is a failure to
write to the WAL, in which case we are probably in trouble anyway. I
believe, however, we should still try to do the right thing and do the
full rollback. This will implicitly try to write to the WAL again, but
this time without samples, which may even succeed. (But we propagate
the previous error in any case.)
This also adds `a.head.putSeriesBuffer(a.sampleSeries)` to Rollback,
which was previously missing.
Signed-off-by: beorn7 <beorn@grafana.com>
* Change createTimeseries to take values for number of series and number
of samples per series.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Take num of samples to expect in expectSampleCount instead of array of
samples.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add field to TestStorageClient to ignore samples sent waitgroup for
potential tests where we don't care about delivery of all samples.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix up tests a little bit.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Remote store client's `Store` API currently doesn't use passed
context, but instead just constructs a new `context.Background()`
Signed-off-by: Anand Singh Kunwar <anandkunwar95@gmail.com>
The integral accumulator in the remote write sharding code is just a
second way of keeping track of the number of samples pending. Remove
integralAccumulator and use the samplesPending value we already
calculate to calculate the number of shards.
This has the added benefit of fixing a bug where the integralAccumulator
was not being initialized correctly due to not taking into account the
number of ticks being counted, causing the integralAccumulator initial
value to be off by an order of magnitude in some cases.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Also improves TestPopulateLabels: testutil.ErrorEqual just returned a
bool without failing the test.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Track remote write queues via a map so we don't care about index.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Support a job name for remote write/read so we can differentiate between
them using the name.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Remote write/read has Name to not confuse the meaning of the field with
scrape job names.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Split queue/client label into remote_name and url labels.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allow for duplicate remote write/read configs.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Ensure we restart remote write queues if the hash of their config has
not changed, but the remote name has changed.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Include name in remote read/write config hashes, simplify duplicates
check, update test accordingly.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
It is possible that desired shards is always a bit higher than the
number of shards (less than 30%) and by exporting desired shards as the
raw number it will be easy to tell if a Prometheus is in that situation.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Refactor calculateDesiredShards + don't reshard if we're having issues
sending samples.
* Track lastSendTimestamp via an int64 with atomic add/load, add a test
for reshard calculation.
* Simplify conditional for skipping resharding, add samplesIn/Out to shard
testcase struct.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
The WAL Watcher replays a checkpoint after it is created in order to
garbage collect series that no longer exist in the WAL. Currently the
garbage collection process is done serially with reading from the tip of
the WAL which can cause large delays in writing samples to remote
storage just after compaction occurs.
This also fixes a memory leak where dropped series are not cleaned up as
part of the SeriesReset process.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Replaced test validations with testutils on storage/remote/codec_test.go
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* gofmt
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* Removed shouldPass assertion
Signed-off-by: George Felix <gfelixc@gmail.com>
* Fixes to improve readability
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* Fixes based on code review comments
Signed-off-by: George Felix <george.felix@ubeeqo.com>
* Added Fatal method and used it in buffer_test
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Added period to meet contributing guidelines
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Removed fatal testutil method. Refactored test cases to use testutil.Assert
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Added if found condition for clarity
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Show warnings in UI if query have returned some warnings
+ improve warning (error) text if query to remote was finished with error
* Add prefixes for remote_read errors
Signed-off-by: Stan Putrya <root.vagner@gmail.com>
* Update go.mod dependencies before release
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add issue for showing query warnings in promtool
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert json-iterator back to 1.1.6
It produced errors when marshaling Point values with special float
values.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix expected step values in promtool tests after client_golang update
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Update generated protobuf code after proto dep updates
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The desired shards calculation now properly keeps track of the rate of
pending samples, and uses the previously unused integralAccumulator to
adjust for missing information in the desired shards calculation.
Also, configure more capacity for each shard. The default 10 capacity
causes shards to block on each other while
sending remote requests. Default to a 500 sample capacity and explain in
the documentation that having more capacity will help throughput.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Check for duplicate label names in remote read
Also add test to confirm that #5731 is fixed
* Use subtests in TestValidateLabelsAndMetricName
* Really check that expectedErr matches err
Signed-off-by: Vadym Martsynovskyy <vmartsynovskyy@gmail.com>
* Only check last directory when discovering checkpoint number
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
* Comments for checkpointNum
Signed-off-by: Devin Trejo <dtrejo@palantir.com>
* Add benchmark for sample delivery
* Simplify StoreSeries to have only one loop
* Reduce allocations for pending samples in runShard
* Only allocate one send slice per segment
* Cache a buffer in each shard for snappy to use
* Remove queue manager seriesMtx
It is not possible for any of the places protected by the seriesMtx to
be called concurrently so it is safe to remove. By removing the mutex we
can simplify the Append code to one loop.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
This allows other processes to reuse just the remote write code without
having to use the remote read code as well. This will be used to create
a sidecar capable of sending remote write payloads.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Update remote write and remote read separately
* Add external labels to the remote write conf hash
* Add unit tests for remote storage lifecycle
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Don't panic if we try to release a string that is not in the interner.
* Move seriesMtx locking in QueueManager's StoreSeries function.
This stops us from calling release for strings that aren't interned if
there's a race between reading a checkpoint and storing new series
labels, which could happen during checkpointing or reloading config.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Unregister remote write queue manager specific metrics when stopping the
queue manager.
* Use DeleteLabelValues instead of Unregister to remove queue and watcher
related metrics when we stop them. Create those metrics in the structs
start functions rather than in their constructors because of the
ordering of creation, start, and stop in remote storage ApplyConfig.
* Add setMetrics function to WAL watcher so we can set
the watchers metrics in it's Start function, but not
have to call Start in some tests (causes data race).
Signed-off-by: Callum Styan <callumstyan@gmail.com>
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.
This effectively enable keep-alive for the fixed requests.
Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
* Fix ReadCheckpoint to ensure that it actually reads all the contents of
each segment in a checkpoint dir, or returns an error.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
a string when there are no longer any refs. Add tests for interning.
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Unmarshall external_labels config as labels.Labels, add tests.
- Convert some more uses of model.LabelSet to labels.Labels.
- Remove old relabel pkg (fixes#3647).
- Validate external label names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* Consistently pre-lookup the metrics for a given queue in queue manager.
* Don't open the WAL (for writing) in the remote_write code.
* Add some more logging.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Remove prometheus_remote_queue_last_send_timestamp_seconds metric. Its not particularly useful, we have highest_timestamp_seconds.
- Factor out maxGauage, a gauge that only increases.
- Change sharding calculations to use max samples in timestamp - max samples out timestamp (not rates).
- Also include the ratio of samples dropped to correctly predict number of pending samples.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Add a dropped samples EWMA and use it in calculating desired shards.
- Update metric names and a log messages.
- Limit number of entries in the dedupe logging middleware to prevent potential OOM.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- If we're replaying the WAL to get series records, skip that segment when we hit corruptions.
- If we're tailing the WAL for samples, fail the watcher.
- When the watcher fails, restart from the latest checkpoint - and only send new samples by updating startTime.
- Tidy up log lines and error handling, don't return so many errors on quiting.
- Expect EOF when processing checkpoints.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Tests that created a QueueManager were leaving behind files at the end of tests.
- WAL replaying (readToEnd)tests seem to require extra time to finish now.
- Some fixes to make staticcheck happy
Signed-off-by: Callum Styan <callumstyan@gmail.com>