* Add benchmark for sample delivery
* Simplify StoreSeries to have only one loop
* Reduce allocations for pending samples in runShard
* Only allocate one send slice per segment
* Cache a buffer in each shard for snappy to use
* Remove queue manager seriesMtx
It is not possible for any of the places protected by the seriesMtx to
be called concurrently so it is safe to remove. By removing the mutex we
can simplify the Append code to one loop.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
This allows other processes to reuse just the remote write code without
having to use the remote read code as well. This will be used to create
a sidecar capable of sending remote write payloads.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Update remote write and remote read separately
* Add external labels to the remote write conf hash
* Add unit tests for remote storage lifecycle
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* Don't panic if we try to release a string that is not in the interner.
* Move seriesMtx locking in QueueManager's StoreSeries function.
This stops us from calling release for strings that aren't interned if
there's a race between reading a checkpoint and storing new series
labels, which could happen during checkpointing or reloading config.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Unregister remote write queue manager specific metrics when stopping the
queue manager.
* Use DeleteLabelValues instead of Unregister to remove queue and watcher
related metrics when we stop them. Create those metrics in the structs
start functions rather than in their constructors because of the
ordering of creation, start, and stop in remote storage ApplyConfig.
* Add setMetrics function to WAL watcher so we can set
the watchers metrics in it's Start function, but not
have to call Start in some tests (causes data race).
Signed-off-by: Callum Styan <callumstyan@gmail.com>
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.
This effectively enable keep-alive for the fixed requests.
Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
* Fix ReadCheckpoint to ensure that it actually reads all the contents of
each segment in a checkpoint dir, or returns an error.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
a string when there are no longer any refs. Add tests for interning.
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Unmarshall external_labels config as labels.Labels, add tests.
- Convert some more uses of model.LabelSet to labels.Labels.
- Remove old relabel pkg (fixes#3647).
- Validate external label names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* Consistently pre-lookup the metrics for a given queue in queue manager.
* Don't open the WAL (for writing) in the remote_write code.
* Add some more logging.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Remove prometheus_remote_queue_last_send_timestamp_seconds metric. Its not particularly useful, we have highest_timestamp_seconds.
- Factor out maxGauage, a gauge that only increases.
- Change sharding calculations to use max samples in timestamp - max samples out timestamp (not rates).
- Also include the ratio of samples dropped to correctly predict number of pending samples.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Add a dropped samples EWMA and use it in calculating desired shards.
- Update metric names and a log messages.
- Limit number of entries in the dedupe logging middleware to prevent potential OOM.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- If we're replaying the WAL to get series records, skip that segment when we hit corruptions.
- If we're tailing the WAL for samples, fail the watcher.
- When the watcher fails, restart from the latest checkpoint - and only send new samples by updating startTime.
- Tidy up log lines and error handling, don't return so many errors on quiting.
- Expect EOF when processing checkpoints.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
- Tests that created a QueueManager were leaving behind files at the end of tests.
- WAL replaying (readToEnd)tests seem to require extra time to finish now.
- Some fixes to make staticcheck happy
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* storage/remote: adapt tests for Travis CI
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Check filesystems on Travis environment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Run remote/storage tests on CircleCI for troubleshooting
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Try using tmpfs partition
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Revert "Try using tmpfs partition"
This reverts commit 85a30deb72.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Don't store labels in writeToMock
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix data race
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Bump retries to 100 meaning that the total timeout is 10s
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* clean up .travis.yml
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* code fixup
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Remove unneeded empty line
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate erros in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down.
We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.
Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to acheive parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.
As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).
This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure
Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics
Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Currently Prometheus requests show up with a UA of Go-http-client/1.1
which isn't super helpful. Though the X-Prometheus-Remote-* headers
exist they need to be explicitly configured when logging the request in
order to be able to deduce this is a request originating from
Prometheus. By setting the header we remove this ambiguity and make
default server logs just a bit more useful.
This also updates a few other places to consistently capitalize the 'P'
in the user agent, as well as ensure we set a UA to begin with.
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
* *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* scrape: fix TestTargetScrapeScrapeCancel
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Limit the number of samples remote read can return.
- Return 413 entity too large.
- Limit can be set be a flag. Allow 0 to mean no limit.
- Include limit in error message.
- Set default limit to 50M (* 16 bytes = 800MB).
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add Start/End to SelectParams
* Make remote read use the new selectParams for start/end
This commit will continue sending the start/end time of the remote read
query as the overarching promql time and the specific range of data that
the query is intersted in receiving a response to is now part of the
ReadHints (upstream discussion in #4226).
* Remove unused vendored code
The genproto.sh script was updated, but the code wasn't regenerated.
This simply removes the vendored deps that are no longer part of the
codegen output.
Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
This commit fixes a denial-of-service issue of the remote
read endpoint. It limits the size of the POST request body
to 32 MB such that clients cannot write arbitrary amounts
of data to the server memory.
Fixes#4238
Signed-off-by: Andreas Auernhammer <aead@mail.de>
More than one remote_write destination can be configured, in which
case it's essential to know which one each log message refers to.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
* refactor: move targetGroup struct and CheckOverflow() to their own package
* refactor: move auth and security related structs to a utility package, fix import error in utility package
* refactor: Azure SD, remove SD struct from config
* refactor: DNS SD, remove SD struct from config into dns package
* refactor: ec2 SD, move SD struct from config into the ec2 package
* refactor: file SD, move SD struct from config to file discovery package
* refactor: gce, move SD struct from config to gce discovery package
* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil
* refactor: consul, move SD struct from config into consul discovery package
* refactor: marathon, move SD struct from config into marathon discovery package
* refactor: triton, move SD struct from config to triton discovery package, fix test
* refactor: zookeeper, move SD structs from config to zookeeper discovery package
* refactor: openstack, remove SD struct from config, move into openstack discovery package
* refactor: kubernetes, move SD struct from config into kubernetes discovery package
* refactor: notifier, use targetgroup package instead of config
* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup
* refactor: retrieval, use targetgroup package instead of config.TargetGroup
* refactor: storage, use config util package
* refactor: discovery manager, use targetgroup package instead of config.TargetGroup
* refactor: use HTTPClient and TLS config from configUtil instead of config
* refactor: tests, use targetgroup package instead of config.TargetGroup
* refactor: fix tagetgroup.Group pointers that were removed by mistake
* refactor: openstack, kubernetes: drop prefixes
* refactor: remove import aliases forced due to vscode bug
* refactor: move main SD struct out of config into discovery/config
* refactor: rename configUtil to config_util
* refactor: rename yamlUtil to yaml_config
* refactor: kubernetes, remove prefixes
* refactor: move the TargetGroup package to discovery/
* refactor: fix order of imports
For special remote read endpoints which have only data for specific
queries, it is desired to limit the number of queries sent to the
configured remote read endpoint to reduce latency and performance
overhead.
* Decouple remote client from ReadRecent feature.
* Separate remote read filter into a small, testable function.
* Use storage.Queryable interface to compose independent
functionalities.
The labelsets returned from remote read are mutated in higher levels
(like seriesFilter.Labels()) and since the concreteSeriesSet didn't
return a copy, the external mutation affected the labelset in the
concreteSeries itself. This resulted in bizarre bugs where local and
remote series would show with identical label sets in the UI, but not be
deduplicated, since internally, a series might come to look like:
{__name__="node_load5", instance="192.168.1.202:12090", job="node_exporter", node="odroid", node="odroid"}
(note the repetition of the last label)