Commit graph

178 commits

Author SHA1 Message Date
zenador 32ee1b15de
Fix error on ingesting out-of-order exemplars (#13021)
Fix and improve ingesting exemplars for native histograms.

See code comment for a detailed explanation of the algorithm.

Note that this slightly changes the current behavior for all kinds of samples: We now allow exemplars with the same timestamp as during the last scrape if the value or the labels have changed.

Also note that we no longer ingest exemplars without timestamps for native histograms.

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2023-11-16 15:07:37 +01:00
Matthieu MOREL 7eaefcf379
ci(lint): enable errorlint on scrape (#12923)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
2023-11-01 20:06:46 +01:00
Björn Rabenstein a43669e611
Merge pull request #12928 from alexandear/ci-enable-godot
ci(lint): enable godot; append dot at the end of comments
2023-11-01 17:15:41 +01:00
Julien Pivotto 84aadfc45b scrape: Added trackTimestampsStaleness configuration option
Add the ability to track staleness when an explicit timestamp is set.
Useful for cAdvisor.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-10-31 16:58:42 -04:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Paulin Todev 5752050b42
Scrape metrics can now be registered with a non-default registry.
* A registerer is passed to the scrape Manager,
and all scrape metrics register with it.
* For now the registry which we pass to the scrape
Manager is still the global one.

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-10-11 16:19:00 +01:00
Bartlomiej Plotka 624b973ebf
Added ability to specify scrape protocols to accept during HTTP content type negotiation. (#12738)
* Added ability to specify scrape protocols to accept during HTTP content type negotiation.


This is done via a new option in GlobalConfig and ScrapeConfig: "scrape_protocols"

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fixed readability and log message.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
2023-10-10 11:16:55 +01:00
Bryan Boreham f6d9c84fde
scraping: delay creating buffer, to save memory (#12953)
We don't need the buffer to read the response until the scrape http call
returns; creating it earlier makes the buffer pool larger.

I split `scrape()` into `scrape()` which returns with the http response,
and `readResponse()` which decompresses and copies the data into the
supplied buffer. This design was chosen to minimize impact on the logic.
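
A minimal sketch of the idea, not the actual Prometheus code: the buffer is taken from the pool only after the (possibly slow) fetch has returned. The names `bufPool`, `fakeTarget` and `scrapeOnce` are hypothetical, and the HTTP call is simulated by a function returning an `io.ReadCloser`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufPool stands in for the scrape buffer pool (hypothetical name).
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// fakeTarget returns a fetch function that simulates the HTTP call.
func fakeTarget(payload string) func() (io.ReadCloser, error) {
	return func() (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader(payload)), nil
	}
}

// readResponse copies the response body into the supplied buffer.
func readResponse(body io.Reader, buf *bytes.Buffer) error {
	buf.Reset()
	_, err := io.Copy(buf, body)
	return err
}

// scrapeOnce waits for the response first and only then takes a buffer,
// so no buffer is held while the target is still answering.
func scrapeOnce(fetch func() (io.ReadCloser, error)) (string, error) {
	body, err := fetch() // may block for most of the scrape timeout
	if err != nil {
		return "", err
	}
	defer body.Close()

	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	if err := readResponse(body, buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := scrapeOnce(fakeTarget("up 1\n"))
	fmt.Printf("%q %v\n", out, err)
}
```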

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-09 17:23:53 +01:00
Bryan Boreham 7c934ae18c scraping: hoist labels variable to save garbage
`lset` escapes to heap due to being passed through the text-parser
interface, so we can reduce garbage by hoisting it out of the loop so
only one allocation is done for every series in a scrape.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-05 11:04:59 +00:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Bryan Boreham 611f50bb3d scrape: retain all dropped targets when KeepDroppedTargets is zero
This was a bug.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-20 14:32:23 +01:00
Bryan Boreham 1e3fef6ab0
scraping: limit detail on dropped targets, to save memory (#12647)
It's possible (quite common on Kubernetes) to have a service discovery
return thousands of targets then drop most of them in relabel rules.
The main place this data is used is to display in the web UI, where
you don't want thousands of lines of display.

The new limit is `keep_dropped_targets`, which defaults to 0
for backwards-compatibility.
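
As a rough illustration of how such a limit could behave (a sketch with hypothetical names, not the code in this commit): dropped targets are always counted, but their details are only retained up to the limit, and a limit of 0 keeps everything.

```go
package main

import "fmt"

// droppedTracker counts every dropped target but retains details for at
// most `limit` of them. A limit of 0 means "no limit". Hypothetical type.
type droppedTracker struct {
	limit int
	kept  []string
	total int
}

// add records one dropped target.
func (d *droppedTracker) add(target string) {
	d.total++
	if d.limit == 0 || len(d.kept) < d.limit {
		d.kept = append(d.kept, target)
	}
}

func main() {
	d := &droppedTracker{limit: 2}
	for _, t := range []string{"a:9100", "b:9100", "c:9100"} {
		d.add(t)
	}
	fmt.Println(len(d.kept), d.total)
}
```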

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-14 15:39:25 +01:00
beorn7 0e3f35324b scrape: Enable ingestion of multiple exemplars per sample
This has become a requirement for native histograms, as a single
histogram sample commonly has many buckets, so that providing many
exemplars makes sense.

Since OM text doesn't support native histograms yet, the test had to
be expanded to also support protobuf test cases.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-07-13 14:16:10 +02:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.
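
For illustration, the two styles side by side. This sketch assumes the Go 1.21+ standard library `slices` package, whose comparison function returns an `int`; the package and signature used in the commit itself may differ. The `target` type is hypothetical.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"sort"
)

// target is a hypothetical element type used only for this example.
type target struct {
	name string
}

// sortWithSortSlice uses the reflection-based sort.Slice with index closures.
func sortWithSortSlice(ts []target) {
	sort.Slice(ts, func(i, j int) bool { return ts[i].name < ts[j].name })
}

// sortWithSortFunc uses the generic slices.SortFunc, which compares
// elements directly and avoids reflection.
func sortWithSortFunc(ts []target) {
	slices.SortFunc(ts, func(a, b target) int { return cmp.Compare(a.name, b.name) })
}

func main() {
	ts := []target{{"c"}, {"a"}, {"b"}}
	sortWithSortFunc(ts)
	fmt.Println(ts)
}
```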

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
Julius Volz cb045c0e4b Fix wording from "jitterSeed" -> "offsetSeed" for server-wide scrape offsets
In digital communication, "jitter" usually refers to how much a signal deviates
from true periodicity, see https://en.wikipedia.org/wiki/Jitter. The way we are
using the "jitterSeed" in Prometheus does not affect the true periodicity at
all, but just introduces a constant phase shift (or offset) within the period.
So it would be more correct and less confusing to call the "jitterSeed" an
"offsetSeed" instead.
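
To make the distinction concrete, here is a small sketch (hypothetical helper names and hashing, not the Prometheus implementation): the seed only selects a fixed offset within the interval, and the spacing between consecutive scrapes stays exactly one interval.

```go
package main

import "fmt"

// phaseOffset derives a constant per-target offset within one interval.
// Combining the hash and the seed with XOR is an assumption for this sketch.
func phaseOffset(targetHash, offsetSeed, intervalMs uint64) uint64 {
	return (targetHash ^ offsetSeed) % intervalMs
}

// scrapeTimes lists the first n scrape times (in ms) for a given offset.
func scrapeTimes(offsetMs, intervalMs uint64, n int) []uint64 {
	times := make([]uint64, 0, n)
	for i := 0; i < n; i++ {
		times = append(times, offsetMs+uint64(i)*intervalMs)
	}
	return times
}

func main() {
	off := phaseOffset(12345, 42, 15000)
	fmt.Println(off, scrapeTimes(off, 15000, 3))
}
```

The period is unchanged by the seed; only the phase moves, which is why "offset" describes it better than "jitter".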

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2023-05-25 11:54:00 +02:00
beorn7 9e500345f3 textparse/scrape: Add option to scrape both classic and native histograms
So far, if a target exposes a histogram with both classic and native
buckets, a native-histogram enabled Prometheus would ignore the
classic buckets. With the new scrape config option
`scrape_classic_histograms` set, both buckets will be ingested,
creating all the series of a classic histogram in parallel to the
native histogram series. For example, a histogram `foo` would create a
native histogram series `foo` and classic series called `foo_sum`,
`foo_count`, and `foo_bucket`.
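
The naming described above can be sketched as follows. The function `histogramSeries` is hypothetical and only lists series names; it says nothing about how the samples are actually ingested.

```go
package main

import "fmt"

// histogramSeries lists the series names created for a histogram that
// exposes both native and classic buckets. Hypothetical helper.
func histogramSeries(name string, scrapeClassic bool) []string {
	series := []string{name} // the native histogram series
	if scrapeClassic {
		series = append(series, name+"_sum", name+"_count", name+"_bucket")
	}
	return series
}

func main() {
	fmt.Println(histogramSeries("foo", false))
	fmt.Println(histogramSeries("foo", true))
}
```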

This feature can be used in a migration strategy from classic to
native histograms, where it is desired to have a transition period
during which both native and classic histograms are present.

Note that two bugs in classic histogram parsing were found and fixed
as a byproduct of testing the new feature:

1. Series created from classic _gauge_ histograms didn't get the
   _sum/_count/_bucket suffix set.
2. Values of classic _float_ histograms weren't parsed properly.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-05-13 01:32:25 +02:00
Jeanette Tan 40240c9c1c Update according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-05-05 02:33:00 +08:00
Jeanette Tan 2ad39baa72 Treat bucket limit like sample limit and make it fail the whole scrape and return an error
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:25:07 +08:00
Jeanette Tan 4d21ac23e6 Implement bucket limit for native histograms
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:14:19 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `else if` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.
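
A small made-up example of the same logic in both forms (the function names and thresholds are illustrative only):

```go
package main

import "fmt"

// classifyElseIf uses an else-if cascade: every condition after the
// first sits behind "} else if".
func classifyElseIf(code int) string {
	if code < 300 {
		return "ok"
	} else if code < 500 {
		return "client error"
	} else {
		return "server error"
	}
}

// classifySwitch expresses the same conditions as an aligned list.
func classifySwitch(code int) string {
	switch {
	case code < 300:
		return "ok"
	case code < 500:
		return "client error"
	default:
		return "server error"
	}
}

func main() {
	fmt.Println(classifyElseIf(404), classifySwitch(404))
}
```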

I'm sure the aforementioned wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`else if` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Bryan Boreham 0c09c3feb0 scrape sync: avoid copy of labels for dropped targets
Since the Target object was just created in this function, nobody else
has a reference to it and there are no concerns about it being modified
concurrently so we don't need to copy the value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-16 20:35:13 +00:00
Bryan Boreham 0dfa1e73f8 scrape: use LabelsRange instead of Labels, for performance
Includes a rewrite of `resolveConflictingExposedLabels` to use
`labels.Builder.Get`, which simplifies it considerably.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-16 20:35:13 +00:00
Bryan Boreham f4fd9b0d68 scrape: re-use memory in TargetsFromGroup
Common service discovery mechanisms such as Kubernetes can generate a
lot of target groups, so this function was allocating a lot of memory
which then immediately became garbage. Re-using the structures across
an entire Sync saves effort.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-07 17:21:37 +00:00
Jimmie Han a13249a98f scrape: fix prometheus_target_scrape_pool_target_limit metric not set on creating scrape pool (#12001)
Signed-off-by: Jimmie Han <hanjinming@outlook.com>
2023-02-21 13:14:04 +08:00
Bryan Boreham 75e5d600d9
Merge pull request #11748 from bboreham/safe-scrape
scrape: remove unsafe code
2023-01-16 17:57:12 +00:00
Bryan Boreham d228d1d9cc scrape: remove 'mets' string completely
This makes all usage of maps in scrape.go consistent.

Also remove comment about unsafe strings, since we don't use them any
more in this package.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-01-04 12:05:58 +00:00
Fish-pro 6ed71a229e Use errors.Is to check for a specific error
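
A short sketch of why this matters, using a hypothetical sentinel error: once an error is wrapped with `%w`, a plain `==` comparison no longer matches, while `errors.Is` walks the wrap chain.

```go
package main

import (
	"errors"
	"fmt"
)

// errSampleLimit is a hypothetical sentinel error for this example.
var errSampleLimit = errors.New("sample limit exceeded")

// failingScrape returns the sentinel wrapped with extra context.
func failingScrape() error {
	return fmt.Errorf("scrape failed: %w", errSampleLimit)
}

// isSampleLimit reports whether err is, or wraps, errSampleLimit.
func isSampleLimit(err error) bool {
	return errors.Is(err, errSampleLimit)
}

func main() {
	err := failingScrape()
	fmt.Println(err == errSampleLimit, isSampleLimit(err))
}
```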
Signed-off-by: Fish-pro <zechun.chen@daocloud.io>
2022-12-29 23:23:07 +08:00
Marc Tudurí 9474610baf
Support FloatHistogram in TSDB (#11522)
Extends the Appender.AppendHistogram function to accept a FloatHistogram. TSDB supports appending, querying, and WAL replay for this new type of histogram.

Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-12-28 14:25:07 +05:30
Bryan Boreham bec5abc4dc scrape: remove unsafe code
The `yolostring` routine was intended to avoid an allocation when
converting from a `[]byte` to a `string` for map lookup.
However, since 2014 Go has recognized this pattern and does not make
a copy of the data when looking up a map. So the unsafe code is not
necessary.
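
The pattern in question looks like this (a minimal sketch with hypothetical names). The conversion appears directly inside the map index expression, which is the form the message above says Go handles without copying.

```go
package main

import "fmt"

// lookup finds a value by a []byte key. The string(name) conversion is
// written inline in the index expression rather than stored in a variable.
func lookup(counts map[string]int, name []byte) (int, bool) {
	v, ok := counts[string(name)]
	return v, ok
}

func main() {
	counts := map[string]int{"up": 1}
	fmt.Println(lookup(counts, []byte("up")))
}
```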

In line with this, constants like `scrapeHealthMetricName` also become
`[]byte`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-20 17:26:43 +00:00
Bryan Boreham 91254fb187 Update package scrape for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Xiaochao Dong (@damnever) 9979024a30 Report error if the series contains invalid metric names or labels during scrape
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2022-12-08 20:01:20 +08:00
Björn Rabenstein a61c4b266a
scrape: Fix accept header, now for real (#11552)
This reinstates the behavior of v2.39. The header got messed up in the
sparsehistogram branch when the change of the version in main was merged into
it (and the merge conflict had to be resolved).

I don't think the current state will actually break anyone, although
it is technically possible. I propose to merge this into the bugfix
branch in any case, but I think we can wait for other bugfixes before
cutting a v2.40.1. (Unless, of course, somebody reports an actual
breakage because of the header.)

Signed-off-by: beorn7 <beorn@grafana.com>
2022-11-09 11:19:25 +01:00
Björn Rabenstein 54ce07e9a0
scrape: Fix accept header (#11542)
First of all, there was a typo: `encoding=delimited` was a left-over
in the `scrapeAcceptHeader`.

Second, the recently updated `version=1.0.0` prevents current versions
of client_golang from negotiating OpenMetrics, as they expect
`version=0.0.1` or no version at all. This commit adds the latter (no
version at all) to the accept header, with lower priority.

Fixes #11540.

Signed-off-by: beorn7 <beorn@grafana.com>
2022-11-07 18:22:03 +01:00
Ganesh Vernekar 3cbf87b83d
Enable protobuf negotiation only when histograms are enabled
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 13:27:22 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Bogdan Drutu 3cde9287a6
scrape: remove unused member from cacheEntry (#11281)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2022-09-08 00:01:01 +02:00
Bogdan Drutu f736a9e953
scrape: remove duplicate mutex unlock (#11282)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2022-09-08 00:00:14 +02:00
Bogdan Drutu c8cfe5c25d
scrape: remove unused argument in newScrapeLoop (#11283)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2022-09-07 23:59:57 +02:00
Paschalis Tsilias 5a8e202f94
Append metadata to the WAL in the scrape loop (#10312)
* Append metadata to the WAL

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra whitespace; Reword some docstrings and comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use RLock() for hasNewMetadata check

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use single byte for metric type in RefMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Update proposed WAL format for single-byte type metadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Address first round of review comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Amend description of metadata in wal.md

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Correct key used to retrieve metadata from cache

When we're setting metadata entries in the scrapeCache, we're using the
p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and
use it as the cache key. When checking for cache entries though, we used
p.Series() as the key, which included the metric name _with_ its labels.
That meant that we were never actually hitting the cache. We're fixing
this by utilizing the __name__ internal label for correctly getting the
cache entries after they've been set by setHelp(), setType() or
setUnit().

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Put feature behind a feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Reorder WAL format document

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix CR comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Extract logic about changing metadata in an anonymous function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Implement new proposed WAL format and amend relevant tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use 'const' for metadata field names

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Apply metadata to head memSeries in Commit, not in AppendMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add docstring and rename extracted helper in scrape.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix review comments around TestMetadata* tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rebase with merged TSDB changes; fix duplicate definitions after rebase

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove leftover changes on db_test.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rename feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Simplify updateMetadata helper function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra newline

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
2022-08-31 15:50:05 +02:00
Marc Tudurí f7df3b86ba
histograms: parse float histograms from proto definition (#11149)
* histograms: parse float histograms from proto definition

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Improve comment

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Ignore float buckets

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Refactor Histogram() function

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Fix test_float_histogram

Signed-off-by: Marc Tuduri <marctc@protonmail.com>

* Update model/textparse/protobufparse.go

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Marc Tudurí <marctc@protonmail.com>

* Update protobufparse.go

Signed-off-by: Marc Tudurí <marctc@protonmail.com>

* Update scrape.go

Signed-off-by: Marc Tudurí <marctc@protonmail.com>

* Update scrape/scrape.go

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Marc Tudurí <marctc@protonmail.com>

Signed-off-by: Marc Tuduri <marctc@protonmail.com>
Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2022-08-25 20:37:41 +05:30
Bryan Boreham 8b863c42dd
Optimise relabeling by re-using memory (#11147)
* model/relabel: Add benchmark

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use Builder across relabels

Saves memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* labels.Builder: allow re-use of result slice

This reduces memory allocations where the caller has a suitable slice available.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use source values slice

To reduce memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Unwind one change causing test failures

Restore original behaviour in PopulateLabels, where we must not overwrite the input set.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* relabel: simplify values optimisation

Use a stack-based array for up to 16 source labels, which will be the
vast majority of cases.
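
A sketch of the technique with hypothetical names, not the actual relabel code: a fixed-size array backs the slice, so up to 16 values need no separate allocation, and `append` transparently falls back to the heap for longer inputs. Whether the array really stays on the stack is up to the compiler's escape analysis.

```go
package main

import (
	"fmt"
	"strings"
)

// joinSourceValues collects the values of the given label names and joins
// them with sep. The slice starts out backed by a 16-element array.
func joinSourceValues(get func(string) string, names []string, sep string) string {
	var backing [16]string
	values := backing[:0]
	for _, n := range names {
		values = append(values, get(n)) // grows onto the heap past 16 entries
	}
	return strings.Join(values, sep)
}

func main() {
	lbls := map[string]string{"job": "node", "instance": "a:9100"}
	get := func(n string) string { return lbls[n] }
	fmt.Println(joinSourceValues(get, []string{"job", "instance"}, ";"))
}
```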

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* lint

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-08-19 15:27:52 +05:30
beorn7 c9fd3c235d Merge branch 'main' into sparsehistogram 2022-08-10 17:54:37 +02:00
Levi Harrison d61459d826
no-default-scrape-port feature flag (#9523)
* Add `no-default-scrape-port` flag

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2022-07-20 13:35:47 +02:00
beorn7 28f028e938 Merge branch 'main' into sparsehistogram 2022-07-12 19:07:13 +02:00
Xiaonan Shen 0c3abdc26d
Keep relabeled scrape interval and timeout on reloads (#10916)
* Preserve relabeled scrape interval and timeout on reloads

Signed-off-by: Xiaonan Shen <s@sxn.dev>
2022-06-28 11:58:52 +02:00
beorn7 3bc711e333 Merge branch 'main' into sparsehistogram 2022-05-04 13:37:13 +02:00
Goutham Veeramachaneni 2381d7be57
Send target and metadata cache in context (again) (#10636)
* Send target and metadata cache in context (again)

The previous attempt was rolled back in #10590 due to memory issues.

`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info
in the previous attempt and it was hard to pin-point where the context
was being retained causing the memory increase.

I've experimented a bunch in #10627 to figure out that this approach doesn't
cause memory increase. Beyond that, just using this info in _any_ other context
is causing a memory increase.

The change fixed a bunch of long-standing issues in the OTel Collector that the
community was waiting on, and the release of a few downstream distributions
of OTel Collector is blocked waiting on a fix. I propose to merge this change in while
I investigate what is happening.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Gate the change behind a manager option

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2022-05-03 11:45:52 -07:00