Commit graph

1514 commits

Author SHA1 Message Date
Arve Knudsen 99f3051f45 OTLP: Use PrometheusConverter directly
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-04-30 13:10:27 +02:00
Arve Knudsen 759ca8b207
Merge branch 'main' into refactor/add_max_func_to_maxTimestamp
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-04-30 11:50:37 +02:00
Jesus Vazquez 7554384dac
otlp: Prometheus to own its own copy of the otlptranslator package (#13991)
After a lot of productive discussion between the Prometheus and
OpenTelemetry community we decided that it made sense for Prometheus to
own its own copy of the code in charge for handling OTLP ingestion
traffic.

This commit is removing the README and update-copy.sh files that had the
previous steps to update the code.

Also it is updating the licensing of all the files to make sure the
OpenTelemetry provenance is explicit and to state the new ownership.

Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-04-30 11:29:52 +02:00
komisan19 b974a99279 fix
Signed-off-by: komisan19 <18901496+komisan19@users.noreply.github.com>
2024-04-30 10:45:50 +09:00
komisan19 3d84d4d6dc fix
Signed-off-by: komisan19 <18901496+komisan19@users.noreply.github.com>
2024-04-22 19:04:00 +09:00
komisan19 5ab24a06d0 refactor: add max func to maxTimestamp
Signed-off-by: komisan19 <18901496+komisan19@users.noreply.github.com>
2024-04-21 23:39:25 +09:00
Matthieu MOREL 6f595c6762
golangci-lint: enable whitespace linter (#13905)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-11 09:27:54 +01:00
Matthieu MOREL d496687c8e golangci-lint: enable usestdlibvars linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-08 19:26:23 +00:00
carehabit a672662073
all: fix some typos (#13863)
Signed-off-by: carehabit <shenyuting@outlook.com>
2024-04-01 18:06:05 +02:00
Arve Knudsen d8e4230696 storage: Fix mockChunkQuerier type name
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-22 07:44:38 +01:00
Jan-Otto Kröpke 302e151de8
{discovery,remote_write}/azure: Support default SDK authentication (#13099)
* discovery/azure: Offer default SDK authentication

Signed-off-by: Jan-Otto Kröpke <mail@jkroepke.de>
2024-03-16 11:06:57 +00:00
Julien d1abc3f255
Merge pull request #13777 from roidelapluie/remoteread2
Chunked remote read: close the querier earlier
2024-03-15 14:42:30 +01:00
Julien Pivotto 53091126c2 Chunked remote read: close the querier earlier
I have seen prometheis instances misebehaving because of broken chinked remote
read requests.

In order to avoid OOM's when this happens, I propose to close the
queries used by the streamed remote read requests earlier.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2024-03-15 14:03:16 +01:00
Jakub Čajka 505fd638be
otlptranslator: fix up import paths
Signed-off-by: Jakub Čajka <jcajka@redhat.com>
2024-03-13 15:56:14 +01:00
Björn Rabenstein 9acae57937
Merge pull request #13681 from krajorama/native-latency-histograms
Add native histograms to latency/duration metrics
2024-03-07 20:46:43 +01:00
beorn7 7df0b9b92d storage: simplify sampleRing fix
Signed-off-by: beorn7 <beorn@grafana.com>
2024-03-05 15:41:18 +01:00
Yuri Nikolic d5ab1851dc SampleRingIterator: add currType field
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
2024-03-01 14:59:19 +01:00
György Krajcsovits 4d4d822c36 Add native histograms to latency/duration metrics
Dogfood native histograms.
Allow dependent projects to migrate to native histograms.

I took the defaults from client_golang.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-03-01 14:44:38 +01:00
Robert Fratto a09465baee
storage/remote: disable resharding during active retry backoffs (#13562)
* storage/remote: disable resharding during active retry backoffs

Today, remote_write reshards based on pure throughput. This is
problematic if throughput has been diminished because of HTTP 429s;
increasing the number of shards due to backpressure will only exacerbate
the problem.

This commit disables resharding for twice the retry backoff, ensuring
that resharding will never occur during an active backoff, and that
resharding does not become enabled again until enough time has elapsed
to allow any pending requests to be retried.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* storage/remote: test that resharding is disabled on retry

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* storage/remote: address review feedback

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* storage/remote: track time where resharding initially got disabled

This change introduces a second atomic int64 to roughly track when
resharding got disabled. This int64 is only updated after updating the
disabled timestamp if resharding was previously enabled.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

---------

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2024-02-28 14:28:39 -08:00
machine424 f477e0539a
Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21
Prevent adding back golang.org/x/exp/slices.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 14:54:53 +01:00
Bryan Boreham 2ac1632eec storage/remote: improve symbol-table handling
On the incoming path, `writeHandler.write()` creates a new table for
each request.

`labelProtosToLabels` takes a `ScratchBuilder` now.

Call `NewScratchBuilder` as required in tests.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-23 13:50:27 +00:00
Bryan Boreham 8f525b4ba4 storage/remote tests: refactor: extract function newTestQueueManager
To reduce repetition.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-23 13:50:27 +00:00
Arve Knudsen bf5ca8cf38 otlptranslator: Upgrade to v0.95.0
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-02-22 09:12:07 +01:00
Bryan Boreham aba0071480
Merge pull request #13589 from bboreham/trace_id
Standardise exemplar label as "trace_id"
2024-02-19 09:34:04 +00:00
Owen Williams a28d7865ad UTF-8: Add support for parsing UTF8 metric and label names
This adds support for the new grammar of `{"metric_name", "l1"="val"}` to promql and some of the exposition formats.
This grammar will also be valid for non-UTF-8 names.
UTF-8 names will not be considered valid unless model.NameValidationScheme is changed.

This does not update the go expfmt parser in text_parse.go, which will be addressed by https://github.com/prometheus/common/issues/554/.

Part of https://github.com/prometheus/prometheus/issues/13095

Signed-off-by: Owen Williams <owen.williams@grafana.com>
2024-02-15 14:34:37 -05:00
Bryan Boreham c0e36e6bb3 Standardise exemplar label as "trace_id"
This is consistent with the OpenTelemetry standard, and an example in OpenMetrics.

https://github.com/open-telemetry/opentelemetry-specification/blob/89aa01348139/specification/metrics/data-model.md#exemplars
https://github.com/OpenObservability/OpenMetrics/blob/138654493130/specification/OpenMetrics.md#exemplars-1

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-15 14:20:08 +00:00
Bryan Boreham 17f48f2b3b Tests: use replacement DeepEquals in more places
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:32:33 +00:00
Bryan Boreham 8655fe5401
Merge pull request #13491 from bboreham/faster-store-series
storage/remote: speed up StoreSeries by re-using labels.Builder
2024-02-06 17:16:32 +01:00
Bryan Boreham 41f3eeb048
Merge pull request #13497 from captncraig/cmp_signedheaders
storage/remote: apply custom headers before sigv4 transport
2024-02-04 14:46:14 +01:00
Bryan Boreham fbca054af6 storage: don't wrap single querier in merge-queriers
If given a single querier, just return it instead of constructing a
complicated wrapper. The code in `mergeGenericQuerier` which skipped
merging when there was only one is not needed any more.

This change required a few tests to be tweaked, because they relied on
the specific behaviour of `mergeGenericQuerier.Select()`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-31 12:14:22 +00:00
Bryan Boreham b9fdf3dad1 storage/remote: document why two benchmarks are skipped
One was silently doing nothing; one was doing something but the work
didn't go up linearly with iteration count.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-30 16:48:04 +00:00
Craig Peterson 5b5230deb7 remote client: apply custom headers before sigv4 transport
Signed-off-by: Craig Peterson <192540+captncraig@users.noreply.github.com>
2024-01-30 09:27:00 -05:00
Bryan Boreham dcd024a095 storage/remote: speed up StoreSeries by re-using labels.Builder
Relabeling can take a pre-populated `Builder` instead of making a new
one every time. This is much more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-29 18:49:55 +00:00
Bryan Boreham d9483bb77c storage/remote: add BenchmarkStoreSeries
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-29 16:54:12 +00:00
Marco Pracucci 501bc6419e
Add ShardedPostings() support to TSDB (#10421)
This PR is a reference implementation of the proposal described in #10420.

In addition to what described in #10420, in this PR I've introduced labels.StableHash(). The idea is to offer an hashing function which doesn't change over time, and that's used by query sharding in order to get a stable behaviour over time. The implementation of labels.StableHash() is the hashing function used by Prometheus before stringlabels, and what's used by Grafana Mimir for query sharding (because built before stringlabels was a thing).

Follow up work
As mentioned in #10420, if this PR is accepted I'm also open to upload another foundamental piece used by Grafana Mimir query sharding to accelerate the query execution: an optional, configurable and fast in-memory cache for the series hashes.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 11:57:27 +00:00
Goutham Veeramachaneni fb552d290d
Merge pull request #13464 from aknuds1/arve/fix-update-copy
otlptranslator/update-copy.sh: Fix sed command lines
2024-01-26 17:52:49 +05:30
Arve Knudsen de28494434 Make update-copy.sh work for both OSX and GNU sed
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-01-25 15:11:03 +01:00
Arve Knudsen 660df3488d otlptranslator/update-copy.sh: Fix sed command lines
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-01-25 13:56:04 +01:00
Filip Petkovski 583f3e587c
Optimize histogram iterators (#13340)
Optimize histogram iterators

Histogram iterators allocate new objects in the AtHistogram and
AtFloatHistogram methods, which makes calculating rates over long
ranges expensive.

In #13215 we allowed an existing object to be reused
when converting an integer histogram to a float histogram. This commit follows
the same idea and allows injecting an existing object in the AtHistogram and
AtFloatHistogram methods. When the injected value is nil, iterators allocate
new histograms, otherwise they populate and return the injected object.

The commit also adds a CopyTo method to Histogram and FloatHistogram which
is used in the BufferedIterator to overwrite items in the ring instead of making
new copies.

Note that a specialized HPoint pool is needed for all of this to work 
(`matrixSelectorHPool`).

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-01-23 17:02:14 +01:00
Bryan Boreham 63cdd6dbe1 storage: skip merging when no remote storage configured
Prometheus is hard-coded to use a fanout storage between TSDB and
a remote storage which by default is empty.
This change detects the empty storage and skips merging between
result sets, which would make `Select()` sort results.

Bottom line: we skip a sort unless there really is some remote storage
configured.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-18 17:50:06 +00:00
Goutham aee6896c47
Minor fixes to otlp vendor update script
Signed-off-by: Goutham <gouthamve@gmail.com>
2024-01-18 15:32:06 +05:30
Marc Tudurí 78c5ce3196
Drop old inmemory samples (#13002)
* Drop old inmemory samples

Co-authored-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Avoid copying timeseries when the feature is disabled

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Run gofmt

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Clarify docs

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Add more logging info

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Remove loggers

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* optimize function and add tests

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Simplify filter

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* rename var

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Update help info from metrics

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* use metrics to keep track of drop elements during buildWriteRequest

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* rename var in tests

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* pass time.Now as parameter

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Change buildwriterequest during retries

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Revert "Remove loggers"

This reverts commit 54f91dfcae20488944162335ab4ad8be459df1ab.

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* use log level debug for loggers

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Fix linter

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove noisy debug-level logs; add 'reason' label to drop metrics

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove accidentally committed files

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Propagate logger to buildWriteRequest to log dropped data

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix docs comment

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make drop reason more specific

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove unnecessary pass of logger

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use snake_case for reason label

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix dropped samples metric

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

---------

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Signed-off-by: Marc Tuduri <marctc@protonmail.com>
Signed-off-by: Paschalis Tsilias <tpaschalis@users.noreply.github.com>
Co-authored-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
Co-authored-by: Paschalis Tsilias <tpaschalis@users.noreply.github.com>
2024-01-05 10:40:30 -08:00
Daniel Kerbel b2185d96af
Consider storage.ErrTooOldSample as non-retryable
Signed-off-by: Daniel Kerbel <nmdanny@gmail.com>
2023-12-26 18:44:39 +02:00
Bryan Boreham 8065bef172 Move metric type definitions to common/model
They are used in multiple repos, so common is a better place for them.
Several packages now don't depend on `model/textparse`, e.g.
`storage/remote`.

Also remove `metadata` struct from `api.go`, since it was identical to
a struct in the `metadata` package.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-12-19 18:56:54 +00:00
Bryan Boreham 99c17b4319
Merge pull request #13177 from bboreham/less-madness
scrape: consistent function names for metadata
2023-12-19 17:51:52 +00:00
Björn Rabenstein 775de1a3bd
Merge pull request #13276 from fpetkovski/reuse-float-histograms
Reuse float histogram objects
2023-12-13 13:43:01 +01:00
Filip Petkovski ea356c472e
Add comment on SampleRingIterator methods
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-13 08:35:02 +01:00
Filip Petkovski bb8363dbb3
Add comment on SampleRingIterator
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-13 08:30:02 +01:00
Filip Petkovski 48df9fc020
Export SampleRingIterator
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-11 11:18:08 +01:00
Arthur Silva Sens 5082655392
Append Created Timestamps (#12733)
* Append created timestamps.

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Log when created timestamps are ignored

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Proposed changes to Append CT PR.

Changes:

* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Updates.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Refactor head_appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Fix linter issues

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Use model.Sample in head appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

---------

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
2023-12-11 08:43:42 +00:00
Filip Petkovski e2a9f8ac0f
Reuse float histogram objects
This commit reduces the memory needed to query native histogram objects
by reusing existing HPoint instances.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-11 08:24:58 +01:00
Filip Petkovski 10a82f87fd
Enable reusing memory when converting between histogram types
The 'ToFloat' method on integer histograms currently allocates new memory
each time it is called.

This commit adds an optional *FloatHistogram parameter that can be used
to reuse span and bucket slices. It is up to the caller to make sure the
input float histogram is not used anymore after the call.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-08 10:22:59 +01:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Oleksandr Redko 2a75604f8e
Enable default revive rules (#13068)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-11-29 17:23:34 +00:00
Fiona Liao 5bee0cfce2
Change ChunkReader.Chunk() to ChunkOrIterable()
The ChunkReader interface's Chunk() has been changed to ChunkOrIterable(). 

This is a precursor to OOO native histogram support - with OOO native histograms, the chunks.Meta passed to Chunk() can result in multiple chunks being returned rather than just a single chunk (e.g. if oooMergedChunk has a counter reset in the middle). 

To support this, ChunkOrIterable() requires either a single chunk or an iterable to be returned. If an iterable is returned, the caller has the responsibility of converting the samples from the iterable into possibly multiple chunks. The OOOHeadChunkReader now returns an iterable rather than a chunk to prepare for the native histograms case. Also as a beneficial side effect, oooMergedChunk and boundedChunk has been simplified as they only need to implement the Iterable interface now, not the full Chunk interface.

---------

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-11-28 11:14:29 +01:00
Bryan Boreham 34676a240e scrape: consistent function names for metadata
Too confusing to have `MetadataList` and `ListMetadata`, etc.
I standardised on the ones which are in an interface.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-11-23 09:08:02 +00:00
Bryan Boreham a2d5c02298
Merge pull request #13084 from charleskorn/concatenatingchunkiterator
Fix issue where `concatenatingChunkIterator` can obscure errors
2023-11-22 08:16:39 +00:00
Goutham 3048a88ae7
Add suffixes
Older version already did that. This upgrade needed manual opt-in

Signed-off-by: Goutham <gouthamve@gmail.com>
2023-11-15 15:52:18 +01:00
Goutham a99f48cc9f
Bump OTel Collector dependency to v0.88.0
I initially didn't copy the otlptranslator/prometheus folder because I
assumed it wouldn't get changes. But it did. So this PR fixes that and
updates the Collector version.

Supersedes: https://github.com/prometheus/prometheus/pull/12809

Signed-off-by: Goutham <gouthamve@gmail.com>
2023-11-15 15:18:14 +01:00
machine424 413b713aa8
remote/storage.go: adjust Storage.Notify() to avoid a race condition with Storage.ApplyConfig()
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2023-11-14 10:07:45 +01:00
machine424 08c17df244
remote/storage.go: add a test to highlight a race condition
between Storage.Notify() and Storage.ApplyConfig()

see https://github.com/prometheus/prometheus/issues/12747

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2023-11-13 13:23:53 +01:00
machine424 0996b78326
remote_write: add a unit test to make sure the write client sends
the extra http headers as expected

This will help letting prometheus off the hook from situations like
https://github.com/prometheus/prometheus/issues/13030

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2023-11-09 15:56:48 +01:00
Julien Pivotto cf01ec2119
Merge pull request #13091 from mmorel-35/errorlint/util
util: use Go standard errors package
2023-11-07 21:07:25 -06:00
Linas Medziunas 1f8aea11d6 Move histogram validation code to model/histogram
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2023-11-03 16:17:24 +02:00
Linas Medziunas 1cd6c1cde5 ValidateHistogram: strict Count check in absence of NaNs
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2023-11-03 16:17:24 +02:00
Matthieu MOREL fe057fc60d use Go standard errors package
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-11-03 07:26:31 +00:00
Charles Korn 8274e248ad
Fix issue where concatenatingChunkIterator can obscure errors.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-11-02 15:29:09 +11:00
Björn Rabenstein a43669e611
Merge pull request #12928 from alexandear/ci-enable-godot
ci(lint): enable godot; append dot at the end of comments
2023-11-01 17:15:41 +01:00
Charles Korn 5184368db6
Fix issue where chainSampleIterator can obscure errors (#13006)
* Fix issue where `chainSampleIterator` can obscure errors

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address PR feedback.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

---------

Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-10-31 22:35:17 +00:00
Julien Pivotto f568221610
Merge pull request #13057 from prometheus/release-2.48
Merge release-2.48 back into main
2023-10-31 15:24:39 -04:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
beorn7 4696b46dd5 storage: Fix mixed samples handling in sampleRing
Two issues are fixed here, that lead to the same problem:

1. If `newSampleRing` is called with an unknown ValueType including
   ValueNone, we have initialized the interface buffer (`iBuf`).
   However, we would still use a specialized buffer for the first
   sample, opportunistically assuming that we might still not
   encounter mixed samples and we should go down the more efficient
   road.

2. If the `sampleRing` is `reset`, we leave all buffers alone,
   including `iBuf`, which is generally fine, but not for `iBuf`, see
   below.

In both cases, `iBuf` already contains values, but we will fill one of
the specialized buffers first. Once we then actually encounter mixed
samples, the content of the specialized buffer is copied into `iBuf`
using `append`. That's by itself the right idea because `iBuf` might
be `nil`, and even if not, it might or might not have the right
capacity. However, this approach assumes that `iBuf` is empty, or more
precisely has a length of zero.

This commit makes sure that `iBuf` does not get needlessly initialized
in `newSampleRing` and that it is emptied upon `reset`.

A test case is added to demonstrate both issues above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-10-31 16:18:09 +01:00
Oleksandr Redko 8e5f0387a2
ci(lint): enable nolintlint and remove redundant comments (#12926)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 12:35:13 +01:00
Matthieu MOREL 1ec6e407d0
ci(lint): enable errorlint on storage (#12935)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-10-31 12:15:30 +01:00
Levi Harrison 454a0a2c1b Update dependencies for 2.48 (#12964)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2023-10-15 13:47:42 -04:00
Levi Harrison dcaca86958
Update dependencies for 2.48 (#12964) 2023-10-15 10:53:59 -04:00
Bryan Boreham a5a4eab679
Storage: reduce memory allocations when merging series sets (#12938)
Instead of setting to nil and allocating a new slice every time the
merge is advanced, re-use the previous slice.
This is safe because the `currentSets` member is only used inside member
functions, and explicitly copied in `At()`, the only place it leaves the
struct.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-06 12:28:07 +01:00
rakshith210 cdad64002a
Added Azure OAuth support (#12572)
* Added Azure OAuth support

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Added missing comment

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Addressing comment

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Fixed lint issue

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Fix test

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Addressing comments

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Added documentation and updated unit tests

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

* Addressing comments

Signed-off-by: rakshith210 <rakshith.me@gmail.com>

---------

Signed-off-by: rakshith210 <rakshith.me@gmail.com>
2023-10-04 22:16:36 -04:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Björn Rabenstein ade2ee3fe4
Merge pull request #12716 from bboreham/simplify-seek
storage: simplify Seek on BufferedSeriesIterator
2023-09-21 13:56:29 +02:00
William Dumont ce6ad15422 remote-write: TestClientRetryAfter status code 500
and compare the retryAfter values.

Signed-off-by: William Dumont <william.dumont@grafana.com>
2023-09-20 10:25:43 +00:00
William Dumont febd62a23e remote-write: refactor TestClientRetryAfter
The new version features a set of test cases that simplify the addition
of new HTTP status codes.

Signed-off-by: William Dumont <william.dumont@grafana.com>
2023-09-20 10:24:52 +00:00
Bryan Boreham 9b85354acd remote-write: respect Retry-After header on 5xx errors
If the server sent it to us, we should assume it knows better than we
do and respect it.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-09-20 10:14:38 +00:00
Paschalis Tsilias c173cd57c9
Add a header to count retried remote write requests (#12729)
Header name is `Retry-Attempt`, only set when >0.

Signed-off-by: Marc Tuduri <marctc@protonmail.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
2023-09-20 11:11:03 +01:00
George Krajcsovits 3512b2d678
storage: make histogram reset handling consistent in chainSampleIterator (#12779)
storage: make histogram reset handling consistent in chainSampleIterator

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-09-19 17:06:46 +02:00
zenador 69edd8709b
Add warnings (and annotations) to PromQL query results (#12152)
Return annotations (warnings and infos) from PromQL queries

This generalizes the warnings we have already used before (but only for problems with remote read) as "annotations".

Annotations can be warnings or infos (the latter could be false positives). We do not treat them different in the API for now and return them all as "warnings". It would be easy to distinguish them and return infos separately, should that appear useful in the future.

The new annotations are then used to create a lot of warnings or infos during PromQL evaluations. Partially these are things we have wanted for a long time (e.g. inform the user that they have applied `rate` to a metric that doesn't look like a counter), but the new native histograms have created even more needs for those annotations (e.g. if a query tries to aggregate float numbers with histograms).

The annotations added here are not yet complete. A prominent example would be a warning about a range too short for a rate calculation. But such a warnings is more tricky to create with good fidelity and we will tackle it later.

Another TODO is to take annotations into account when evaluating recording rules.

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-09-14 18:57:31 +02:00
Arve Knudsen 156222cc50
Add context argument to LabelQuerier.LabelValues (#12665)
Add context argument to LabelQuerier.LabelValues and
LabelQuerier.SortedLabelValues.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-14 16:02:04 +02:00
Arve Knudsen a964349e97
Add context argument to LabelQuerier.LabelNames (#12666)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-14 10:39:51 +02:00
beorn7 0521ec12af storage: remove obsolete TODO
This was solved one layer deeper with #11687.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-09-13 15:10:22 +02:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Gregor Zeitlinger f01718262a
Unit tests for native histograms (#12668)
promql: Extend testing framework to support native histograms

This includes both the internal testing framework as well as the rules unit test feature of promtool.

This also adds a bunch of basic tests. Many of the code level tests can now be converted to tests within the framework, and more tests can be added easily.

---------

Signed-off-by: Harold Dost <h.dost@criteo.com>
Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
Signed-off-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
2023-08-25 23:35:42 +02:00
Justin Lei 8ef7dfdeeb
Add a chunk size limit in bytes (#12054)
Add a chunk size limit in bytes

This creates a hard cap for XOR chunks of 1024 bytes.

The limit for histogram chunk is also 1024 bytes, but it is a soft limit as a histogram has a dynamic size, and even a single one could be larger than 1024 bytes.

This also avoids cutting new histogram chunks if the existing chunk has fewer than 10 histograms yet. In that way, we are accepting "jumbo chunks" in order to have at least 10 histograms in a chunk, allowing compression to kick in.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-08-24 15:21:17 +02:00
beorn7 aa82fe198f tsdb: Fix histogram validation
So far, `ValidateHistogram` would not detect if the count did not
include the count in the zero bucket. This commit fixes the problem
and updates all the tests that have been undetected offenders so far.

Note that this problem would only ever create false negatives, so we
never falsely rejected to store a histogram because of it.

On the other hand, `ValidateFloatHistogram` has been to strict with
the count being at least as large as the sum of the counts in all the
buckets. Float precision issues could create false positives here, see
products of PromQL evaluations, it's actually quite hard to put an
upper limit no the floating point imprecision. Users could produce the
weirdest expressions, maxing out float precision problems. Therefore,
this commit simply removes that particular check from
`ValidateFloatHistogram`.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-08-22 23:04:01 +02:00
Michael Hoffmann 4d8e380269
promql: allow tests to be imported (#12050)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2023-08-18 20:48:59 +02:00
Bryan Boreham a018a7ef53 storage: simplify Seek on BufferedSeriesIterator
Small tweak to call a simpler method

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-17 07:42:18 +01:00
Bryan Boreham d2ae8dc3cb remote-write: add http.resend_count tracing attribute
As recommended by the OpenTelemetry semantic conventions.

https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/http/#http-client
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-11 16:20:12 +00:00
Goutham Veeramachaneni ad4f514e66
Add OTLP Ingestion endpoint (#12571)
* Add OTLP Ingestion endpoint

We copy files from the otel-collector-contrib. See the README in
`storage/remote/otlptranslator/README.md`.

This supersedes: https://github.com/prometheus/prometheus/pull/11965

Signed-off-by: gouthamve <gouthamve@gmail.com>

* Return a 200 OK

It is what the OTEL Golang SDK expect :(

https://github.com/open-telemetry/opentelemetry-go/issues/4363

Signed-off-by: Goutham <gouthamve@gmail.com>

---------

Signed-off-by: gouthamve <gouthamve@gmail.com>
Signed-off-by: Goutham <gouthamve@gmail.com>
2023-07-28 12:35:28 +02:00
George Krajcsovits 6cd2d1621f
Hide histogram chunk append and reset header internals (#12352)
tsdb: Hide histogram chunk append and reset header internals

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-07-26 15:08:16 +02:00
LHHDZ 7d8f9b0978
remote-write receiver: reuse 'ref' to optimize multiple samples for same series (#12580)
reuse 'ref' to optimize multi samples processing efficiency

Signed-off-by: changlin.shi <changlin.shi@ly.com>
2023-07-22 14:24:46 +01:00
György Krajcsovits d4e355243a tsdbutil/ChunkFromSamplesGeneric should not panic
Add error handling instead.
Prepares for #12352

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-07-20 17:01:34 +02:00
Julien Pivotto 0f85e4f41d
Merge pull request #12539 from bboreham/slices-sorts
Replace sort.Slice with faster slices.SortFunc
2023-07-11 13:09:02 +02:00
Bryan Boreham ce153e3fff Replace sort.Sort with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-10 09:43:45 +00:00
Julien Pivotto 986fde06b2
Merge pull request #11688 from damnever/fix/datamodelvalidation-remotewriteapi
Validate the metric names and labels in the remote write handler
2023-07-04 13:52:02 +02:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
rakshith210 b1675e23af
Add Azure AD package for remote write (#11944)
* Add Azure AD package for remote write
* Made AzurePublic default and updated configuration.md
* Updated config structure and removed getToken at initialization
* Changed passing context from request

Signed-off-by: Rakshith Padmanabha <rapadman@microsoft.com>
Signed-off-by: rakshith210 <rakshith.me@gmail.com>
2023-06-01 15:20:10 -06:00
Bryan Boreham a073e04a9b
Merge pull request #12366 from prometheus/release-2.44
Merge release 2.44 back to main
2023-05-16 18:06:29 +01:00
Callum Styan 0d2108ad79
[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949)
* WIP implement WAL watcher reading via notifications over a channel from
the TSDB code

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Notify via head appenders Commit (finished all WAL logging) rather than
on each WAL Log call

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix misspelled Notify plus add a metric for dropped Write notifications

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update tests to handle new notification pattern

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* this test maybe needs more time on windows?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* does this test need more time on windows as well?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* read timeout is already a time.Duration

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* remove mistakenly commited benchmark data files

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* address some review feedback

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix missed changes from previous commit

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues from wrapper function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* try fixing race condition in test by allowing tests to overwrite the
read ticker timeout instead of calling the Notify function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix linting

Signed-off-by: Callum Styan <callumstyan@gmail.com>

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2023-05-15 12:31:49 -07:00
George Krajcsovits f5fcaa3872
Fix setting reset header to gauge histogram in seriesToChunkEncoder (#12329)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-05-05 18:04:30 +05:30
Justin Lei 7bbf24b707 Make MemoizedSeriesIterator not implement chunkenc.Iterator
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-03 12:45:39 -07:00
beorn7 b0272255b7 storage: optimise sampleRing
Replace many checks for the lengths of slices with a single tracking
variable.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-05-03 20:09:29 +02:00
Justin Lei 6985dcbe73 Optimize and test MemoizedSeriesIterator
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-02 08:53:18 -07:00
Filip Petkovski 0d049feac7
Fix encoding samples in ChunkSeries (#12185)
The storage.ChunkSeries iterator assumes that a histogram sample can always be
appended to the currently open chunk. This is not the case when there is a counter reset,
or when appending a stale sample to a chunk with non-stale samples. In addition, the open chunk sometimes
needs to be recoded before a sample can be appended.

This commit addresses the issue by implementing a RecodingAppender which can recode incoming
samples in a transparent way. It also detects cases when a sample cannot be appended at all and
returns `false` so that the caller can open a new chunk.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-04-28 16:52:21 -04:00
Jeanette Tan 1102ffd188 Fix according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 02:27:15 +08:00
Jeanette Tan e9a1e26ab7 Perform integer/float histogram type checking on conversions, and use a consistent method for determining integer vs float histogram
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 02:27:15 +08:00
Björn Rabenstein 78cd9ae2c3
Merge pull request #12264 from rabenhorst/sample-ring-iterator-mixed-histograms-fix
Fix for `sampleRingIterator` with mixed histograms
2023-04-20 16:58:18 +02:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Sebastian Rabenhorst 5d4ec08a1f
Fixed sampleRingIterator for mixed histograms
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Fixed sampleRingIterator for mixed histograms

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Fixed lint
2023-04-14 12:45:36 +02:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
beorn7 717a3f8e25 storage: Manually expand genericAdd for specific types
This commit is doing what I would have expected that Go generics do
for me. However, the PromQL benchmarks show a significant runtime and
allocation increase with `genericAdd`, so this replaces it with
hand-coded non-generic versions of it.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
beorn7 817a2396cb Name float values as "floats", not as "values"
In the past, every sample value was a float, so it was fine to call a
variable holding such a float "value" or "sample". With native
histograms, a sample might have a histogram value. And a histogram
value is still a value. Calling a float value just "value" or "sample"
or "V" is therefore misleading. Over the last few commits, I already
renamed many variables, but this cleans up a few more places where the
changes are more invasive.

Note that we do not to attempt naming in the JSON APIs or in the
protobufs. That would be quite a disruption. However, internally, we
can call variables as we want, and we should go with the option of
avoiding misunderstandings.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
beorn7 462240bc78 storage: add specialized buffers to sampleRing
This utilizes the fact that most sampleRings will only contain samples
of one type. In this case, the generic interface is circumvented, and
a bespoke buffer for the one actually occurring sample type is
used. Should a sampleRing receive a sample of a different kind later,
it will transparently switch to the generic behavior.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
beorn7 630bcb494b storage: Use separate sample types for histogram vs. float
Previously, we had one “polymorphous” `sample` type in the `storage`
package. This commit breaks it up into `fSample`, `hSample`, and
`fhSample`, each still implementing the `tsdbutil.Sample` interface.

This reduces allocations in `sampleRing.Add` but inflicts the penalty
of the interface wrapper, which makes things worse in total.

This commit therefore just demonstrates the step taken. The next
commit will tackle the interface overhead problem.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
Björn Rabenstein 6e0a46900b
Merge pull request #12192 from leizor/leizor/prometheus/issues/11204
Add support for native histograms to concreteSeriesIterator
2023-04-11 12:30:35 +02:00
Justin Lei f90013a5a0 Update storage/remote/codec.go
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: Justin Lei <97976793+leizor@users.noreply.github.com>
2023-04-06 09:54:15 -07:00
Justin Lei 83f43982c9 Add support for native histograms to concreteSeriesIterator
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-04-06 09:54:15 -07:00
Xiaochao Dong (@damnever) 2b7202c4cc Validate the metric names and labels in the remote write handler
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2023-04-05 19:09:05 +08:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Björn Rabenstein 559adab471
Merge pull request #12085 from leizor/leizor/prometheus/issues/11204
Handle native histograms in remote read
2023-03-21 17:25:34 +01:00
Oleg Zaytsev beb7d3b80f
remote.Client: store urlString
During remote write, we call url.String() twice:
- to add the Endpoint() to the span
- to actually know where whe should send the request

This value does not change over time, and it's not really that
lightweight to calculate. I wrote this simple benchmark:

    func BenchmarkURLString(b *testing.B) {
        u, err := url.Parse("https://remote.write.com/api/v1")
        require.NoError(b, err)

        b.Run("string", func(b *testing.B) {
            count := 0
            for i := 0; i < b.N; i++ {
                count += len(u.String())
            }
        })
    }

And the results are ~200ns/op, 80B/op, 3 allocs/op.

Yes, we're going to go to the network here, which is a huge amount of
resources compared to this, but still, on agents that send 500 requests
per second, that is 1500 wasteful allocations per second.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-03-16 09:53:10 +01:00
Justin Lei 60ad864667 Remove hacky promql.Test native histogram thing
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-03-09 11:05:53 -08:00
Justin Lei c16b6a0185 Handle native histograms in remote read
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-03-09 09:13:53 -08:00
Arve Knudsen bc9a82f5a1
remote: Improve some comments (#12102)
Improve some comments in storage/remote/queue_manager.go, wrt. general
language and a typo.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-03-09 11:05:24 +00:00
Arve Knudsen 435b500de7
remote: Convert to RecoverableError using errors.As (#12103)
In storage/remote, try converting to RecoverableError using errors.As,
instead of through direct casting.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-03-08 13:58:09 -07:00
Julien Pivotto 475f9984d0
Merge pull request #11787 from damnever/perf/avoid-alloc-if-no-externallabels
Avoid allocation during remote write if external labels is empty
2023-02-22 23:38:21 +01:00
Julien Pivotto dfd2b5340e
Merge pull request #11951 from Fish-pro/chore/httpvar
Use http constants instead of string
2023-02-10 22:44:50 +01:00
Justin Lei af1d9e01c7
Refactor tsdbutil for tests/native histograms (#11948)
* Add float histograms to ChunkFromSamplesGeneric

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* Add Generate*Samples functions to tsdbutil

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* PR responses

Signed-off-by: Justin Lei <justin.lei@grafana.com>

---------

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-02-10 17:09:33 +05:30
Fish-pro 43d77f7c41 Use http constants instead of string
Signed-off-by: Fish-pro <zechun.chen@daocloud.io>
2023-02-10 10:21:05 +08:00
Charles Korn 0a1de58f7e
Mark Histogram.(Positive|Negative)Spans as non-nullable.
As far as I understand it, we'd never expect to receive a nil span,
and remote.spansProtoToSpans would panic if we received a nil span.

Marking the fields as non-nullable also means the generated Golang
code doesn't use pointers for these fields, reducing allocations.

Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-02-03 13:49:22 +11:00
Björn Rabenstein 60d763282e
Merge pull request #11864 from prometheus/beorn7/histogram2
histograms: Return actually useful counter reset hints
2023-01-26 11:22:40 +01:00
beorn7 49c5b1fae4 histograms: Fix counter reset header during merging
See detailed discussion:
https://github.com/prometheus/prometheus/pull/11864#issuecomment-1403963451

Signed-off-by: beorn7 <beorn@grafana.com>
2023-01-25 18:23:10 +01:00
beorn7 1cfc8f65a3 histograms: Return actually useful counter reset hints
This is a bit more conservative than we could be. As long as a chunk
isn't the first in a block, we can be pretty sure that the previous
chunk won't disappear. However, the incremental gain of returning
NotCounterReset in these cases is probably very small and might not be
worth the code complications.

Wwith this, we now also pay attention to an explicitly set counter
reset during ingestion. While the case doesn't show up in practice
yet, there could be scenarios where the metric source knows there was
a counter reset even if it might not be visible from the values in the
histogram. It is also useful for testing.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-01-25 16:57:21 +01:00
György Krajcsovits 2d9a9cbc08 Fix storage/remote/codec ignoreing histogram reset hint
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-01-24 12:56:30 +01:00
Bryan Boreham faac4c066d package storage: fix up test not to access Labels internals
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-01-16 18:48:49 +00:00
Jesus Vazquez 136956cca4
Attempt to append ooo sample at the end first (#11615)
This is an optimization on the existing append in OOOChunk.

What we've been doing so far is find the place inside the out-of-order
slice where the new sample should go in and then place it there and move
any samples to the right if necessary. This is OK but requires a binary
search every time the slice is bigger than 0.

The optimization is opinionated and suggests that although out-of-order
samples can be out-of-order amongst themselves they'll probably be in
order thus we can probably optimistically append at the end and if not
do the binary search.

OOOChunks are capped to 30 samples by default so this is a small
optimization but everything adds up, specially if you handle many active
timeseries with out-of-order samples.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-13 19:00:50 +05:30
Marc Tudurí 721f33dbb0
histograms: Add remote-write support for Float Histograms (#11817)
* adapt code.go and write_handler.go to support float histograms
* adapt watcher.go to support float histograms
* wip adapt queue_manager.go to support float histograms
* address comments for metrics in queue_manager.go
* set test cases for queue manager
* use same counts for histograms and float histograms
* refactor createHistograms tests
* fix float histograms ref in watcher_test.go
* address PR comments

Signed-off-by: Marc Tuduri <marctc@protonmail.com>
2023-01-13 16:39:20 +05:30
beorn7 d121db7a65
federate: Fix PeekBack usage
In most cases, there is no sample at `maxt`, so `PeekBack` has to be
used. So far, `PeekBack` did not return a float histogram, and we
disregarded even any returned normal histogram. This fixes both, and
also tweaks the unit test to discover the problem (by using an earlier
timestamp than "now" for the samples in the TSDB).

Signed-off-by: beorn7 <beorn@grafana.com>
2023-01-12 20:43:02 +05:30
Xiaochao Dong (@damnever) 2d61d012ff Avoid copy during remote write if external labels is empty
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2022-12-30 19:18:30 +08:00
Fish-pro 6ed71a229e Use errors.Is to check for a specific error
Signed-off-by: Fish-pro <zechun.chen@daocloud.io>
2022-12-29 23:23:07 +08:00