This adds support for a new query param on the new `api/v1/metadata`
endpoint that provides metadata for a specified metric via the V1 API.
It collapses metadata that is equal across all targets, and aggregates
under the same metric name the ones that differ.
Signed-off-by: gotjosh <josue@grafana.com>
* api: provide per metric metadata
This adds a new endpoint that provides per metric metadata via the V1 API.
It collapses metadata that is equal across all targets, and aggregates under the same metric name the ones that differ.
* Allow tests to be asserted on response length
Some tests e.g. limit on API responses, don't require an assertion on
equality.
This allows us to assert against response length instead of
equality.
Signed-off-by: gotjosh <josue@grafana.com>
* Allows sorting of responses from the API in tests
Fixes flaky test for api/v1/targets/metadata.
Allows sorting of responses from the API. For our tests to be deterministic, we need to ensure the response from the API follows an order. This structure allows us to define one.
Fixes#6431
Signed-off-by: gotjosh <josue@grafana.com>
This commit introduces several test cases for the current /targets/metadata API endpoint.
To achieve so, we use a mock of the metadataStore and inject it to the targets under test.
Currently, three success cases are covered: with a metric name, with a target matcher, and with both. As for the failure scenario, the one where we couldn't match against a particular metric is covered.
Signed-off-by: gotjosh <josue@grafana.com>
Previously, the struct `testTargetRetriever` had hardcoded active and dropped targets. This made it difficult to change the target information depending on the test case.
This change introduces a way to define them as arguments and pass it to a constructor for building. It lays a foundation for dynamically defining targets with various set of arguments to test different scenarios.
Signed-off-by: gotjosh <josue@grafana.com>
According to the documentation, the target metadata API accepts it,
if no value for match_target has been provided. This was not the case
in the implementation.
This commit make the API behave as described in the docs.
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
* Add tests to ensure we can marshal and unmarshal our min/max times
Related to https://github.com/prometheus/client_golang/issues/614
Instead of implementing all the time parsing, we can special-case handle
these 2 times. This means if times in this format show up that
time.Parse can't handle they will still error, but we can marshal/parse
our own min/max time
Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
The goal is to remove almost all references to the
golang.org/x/net/context package.
github.com/gogo/protobuf => v1.2.1
google.golang.org/grpc => v1.19.1
github.com/grpc-ecosystem/grpc-gateway => v1.18.5
It also replaces github.com/cockroachdb/cmux by github.com/soheilhy/cmux
because of [1] which fixes#3909 incidentally.
[1] https://github.com/grpc/grpc-go/issues/2636
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
- Unmarshall external_labels config as labels.Labels, add tests.
- Convert some more uses of model.LabelSet to labels.Labels.
- Remove old relabel pkg (fixes#3647).
- Validate external label names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down.
We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.
Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to acheive parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.
As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).
This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure
Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics
Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* *: bump gRPC dependencies
This change updates the gRPC dependencies to more recent versions:
* github.com/gogo/protobuf => v1.2.0
* github.com/grpc-ecosystem/grpc-gateway => v1.6.3
* google.golang.org/grpc => v1.17.0
In addition scripts/genproto.sh leverages Go modules information instead of
hardcoding SHA1 commits. This ensures that the code is generated from
the exact same sources.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Run 'make proto' in CI
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Revert tabs -> spaces change
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix 'make proto' step
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* 'go get' grpc/protobuf dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Prepopulate cache with go mod download
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* update promlog to latest version
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* Update api tests, fix main setup
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* tidy go.sum
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* revendor prometheus/common
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* only initialize config; use kingpin for remote_storage_adapter
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* actually parse the flags
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* clean up imports
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
By default the gRPC client of the REST API gateway relies on the
HTTP_PROXY variable to connect to the local gRPC server which isn't
desired as the server runs in the same process. This change uses a
custom dialer that connects directly to the server's address.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: move to go 1.11
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Reduce number of places where we specify the Go version
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
The scrape manage receiver's channel now just saves the target sets
and another backgorund runner updates the scrape loops every 5 seconds.
This is so that the scrape manager doesn't block the receiving channel
when it does the long background reloading of the scrape loops.
Active and dropped targets are now saved in each scrape pool instead of
the scrape manager. This is mainly to avoid races when getting the
targets via the web api.
When reloading the scrape loops now happens in parallel to speed up the
final disared state and this also speeds up the prometheus's shutting
down.
Also updated some funcs signatures in the web package for consistency.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
* Limit the number of samples remote read can return.
- Return 413 entity too large.
- Limit can be set be a flag. Allow 0 to mean no limit.
- Include limit in error message.
- Set default limit to 50M (* 16 bytes = 800MB).
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add Start/End to SelectParams
* Make remote read use the new selectParams for start/end
This commit will continue sending the start/end time of the remote read
query as the overarching promql time and the specific range of data that
the query is intersted in receiving a response to is now part of the
ReadHints (upstream discussion in #4226).
* Remove unused vendored code
The genproto.sh script was updated, but the code wasn't regenerated.
This simply removes the vendored deps that are no longer part of the
codegen output.
Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>
This adds a per-target cache of scraped metadata. The metadata is only
available for the lifecycle of the attached target. An API endpoint allows
to select metadata by metric name and a label selection of targets.
Signed-off-by: Fabian Reinartz <freinartz@google.com>
* Move range logic to 'eval'
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make aggregegate range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* PromQL is statically typed, so don't eval to find the type.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Extend rangewrapper to multiple exprs
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Start making function evaluation ranged
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make instant queries a special case of range queries
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Eliminate evalString
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Evaluate range vector functions one series at a time
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make unary operators range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make binops range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Pass time to range-aware functions.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make simple _over_time functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reduce allocs when working with matrix selectors
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add basic benchmark for range evaluation
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse objects for function arguments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Do dropmetricname and allocating output vector only once.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add range-aware support for range vector functions with params
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise holt_winters, cut cpu and allocs by ~25%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make rate&friends range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make more functions range aware. Document calling convention.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make date functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make simple math functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Convert more functions to be range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make more functions range aware
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Specialcase timestamp() with vector selector arg for range awareness
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove transition code for functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove the rest of the engine transition code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove more obselete code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove the last uses of the eval* functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove engine finalizers to prevent corruption
The finalizers set by matrixSelector were being called
just before the value they were retruning to the pool
was then being provided to the caller. Thus a concurrent query
could corrupt the data that the user has just been returned.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add new benchmark suite for range functinos
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Migrate existing benchmarks to new system
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Expand promql benchmarks
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Simply test by removing unused range code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* When testing instant queries, check range queries too.
To protect against subsequent steps in a range query being
affected by the previous steps, add a test that evaluates
an instant query that we know works again as a range query
with the tiimestamp we care about not being the first step.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse ring for matrix iters. Put query results back in pool.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse buffer when iterating over matrix selectors
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Unary minus should remove metric name
Cut down benchmarks for faster runs.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reduce repetition in benchmark test cases
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Work series by series when doing normal vectorSelectors
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise benchmark setup, cuts time by 60%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Have rangeWrapper use an evalNodeHelper to cache across steps
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Use evalNodeHelper with functions
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Cache dropMetricName within a node evaluation.
This saves both the calculations and allocs done by dropMetricName
across steps.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse input vectors in rangewrapper
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Reuse the point slices in the matrixes input/output by rangeWrapper
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make benchmark setup faster using AddFast
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Simplify benchmark code.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add caching in VectorBinop
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Use xor to have one-level resultMetric hash key
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add more benchmarks
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Call Query.Close in apiv1
This allows point slices allocated for the response data
to be reused by later queries, saving allocations.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise histogram_quantile
It's now 5-10% faster with 97% less garbage generated for 1k steps
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make the input collection in rangeVector linear rather than quadratic
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise label_replace, for 1k steps 15x fewer allocs and 3x faster
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Optimise label_join, 1.8x faster and 11x less memory for 1k steps
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Expand benchmarks, cleanup comments, simplify numSteps logic.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Address Fabian's comments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Comments from Alin.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Address jrv's comments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Remove dead code
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Address Simon's comments.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Rename populateIterators, pre-init some sizes
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Handle case where function has non-matrix args first
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Split rangeWrapper out to rangeEval function, improve comments
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Cleanup and make things more consistent
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Make EvalNodeHelper public
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Fabian's comments.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
Fix race by properly locking access to scrape pools. Use separate mutex for information needed by UI so that UI isn't blocked when targets are being updated.
* web: replace deprecated InstrumentHandler()
This change replaces the deprecated InstrumentHandler function by the
equivalent functions from the promhttp package.
The following metrics are removed:
* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).
And the following metrics are added instead:
* prometheus_http_request_duration_seconds (Histogram).
* prometheus_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
* Update github.com/prometheus/common/route package
* web: refactor using the new prometheus/common/route package
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
Federation makes use of dedupedSeriesSet to merge SeriesSets for every
query into one output stream. If many match[] arguments are provided,
many dedupedSeriesSet objects will get chained. This has the downside of
causing a potential O(n*k) running time, where n is the number of series
and k the number of match[] arguments.
In the mean time, the storage package provides a mergeSeriesSet that
accomplishes the same with an O(n*log(k)) running time by making use of
a binary heap. Let's just get rid of dedupedSeriesSet and change all
existing callers to use mergeSeriesSet.