Commit graph

281 commits

Author SHA1 Message Date
Julien Pivotto 965e603fa7
Merge pull request #13184 from bboreham/exemplar-sort
Scraping: use slices.sort for exemplars
2023-11-25 09:34:48 +01:00
Julien Pivotto eda73dd3e5
Merge pull request #13187 from bboreham/refactor-newscrapeloop
Scraping tests: refactor scrapeLoop creation
2023-11-24 19:48:44 +01:00
Bryan Boreham 3e287e0170 Scraping tests: refactor scrapeLoop creation
Pull boilerplate code into a function. Where appropriate we set some
config on the returned object.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-11-24 17:28:09 +00:00
Bryan Boreham 784a2d2c74
Merge pull request #12992 from bboreham/single-scrape-buffer-pool
Scraping: share buffer pool across all scrapes
2023-11-24 16:26:19 +00:00
Bryan Boreham f0e1b592ab Scraping: use slices.sort for exemplars
The sort implementation using Go generics is used everywhere else
in Prometheus.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-11-24 14:42:26 +00:00
Paulin Todev 0102425af1
Use only one scrapeMetrics object per test. (#13051)
The scrape loop and scrape cache should use the same instance.
This brings the tests' behavior more in line with production.

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-11-23 11:24:08 +00:00
Bryan Boreham 9051100aba Scraping: share buffer pool across all scrapes
Previously we had one per scrapePool, and one of those per configured
scraping job. Each pool holds a few unused buffers, so sharing one
across all scrapePools reduces total heap memory.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-11-23 10:23:34 +00:00
Łukasz Mierzwa 870627fbed Add enable_compression scrape config option
Currently Prometheus will always request gzip compression from the target when sending scrape requests.
HTTP compression does reduce the amount of bytes sent over the wire and so is often desirable.
The downside of compression is that it requires extra resources - cpu & memory.

This also affects the resource usage on the target since it has to compress the response
before sending it to Prometheus.

This change adds a new option to the scrape job configuration block: enable_compression.
The default is true so it remains the same as current Prometheus behaviour.

Setting this option to false allows users to disable compression between Prometheus
and the scraped target, which will require more bandwidth but it lowers the resource
usage of both Prometheus and the target.

Fixes #12319.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-11-20 12:02:55 +00:00
zenador 32ee1b15de
Fix error on ingesting out-of-order exemplars (#13021)
Fix and improve ingesting exemplars for native histograms.

See code comment for a detailed explanation of the algorithm.

Note that this changes the current behavior for all kind of samples slightly: We now allow exemplars with the same timestamp as during the last scrape if the value or the labels have changed.

Also note that we now do not ingest exemplars without timestamps for native histograms anymore.

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2023-11-16 15:07:37 +01:00
Julien Pivotto 0fe34f6d78 Follow-up to #13060: Add test to ensure staleness tracking
This commit introduces an additional test in `scrape_test.go` to verify
staleness tracking when `trackTimestampStaleness` is enabled. The new
`TestScrapeLoopAppendStalenessIfTrackTimestampStaleness` function
asserts that the scrape loop correctly appends staleness markers when
necessary, reflecting the expected behavior with the feature flag turned
on.

The previous tests were only testing end of scrape staleness.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-11-09 10:20:35 -06:00
Matthieu MOREL fe057fc60d use Go standard errors package
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-11-03 07:26:31 +00:00
Matthieu MOREL 7eaefcf379
ci(lint): enable errorlint on scrape (#12923)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
2023-11-01 20:06:46 +01:00
George Krajcsovits e399395b01
Native histograms vs labels (#13005)
* Document le and quantile label transition due to native histograms

Fixes: #12984

For full explanation see the related issue. The le and quantile labels
are formatted as float with trailing .0 for whole number values when
native histograms is enabled, e.g. 10.0. This changes the resulting series
in Prometheus if previously we scraped the whole number itself, e.g. 10
over the text format.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-11-01 18:30:34 +01:00
Björn Rabenstein a43669e611
Merge pull request #12928 from alexandear/ci-enable-godot
ci(lint): enable godot; append dot at the end of comments
2023-11-01 17:15:41 +01:00
Julien Pivotto 84aadfc45b scrape: Added trackTimestampsStaleness configuration option
Add the ability to track staleness when an explicit timestamp is set.
Useful for cAdvisor.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-10-31 16:58:42 -04:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Oleksandr Redko 8e5f0387a2
ci(lint): enable nolintlint and remove redundant comments (#12926)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 12:35:13 +01:00
Paulin Todev 5752050b42
Scrape metrics can now be registered with a non-default registry.
* A registerer is passed to the scrape Manager,
and all scrape metrics register with it.
* For now the registry which we pass to the scrape
Manager is still the global one.

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-10-11 16:19:00 +01:00
Bartlomiej Plotka 624b973ebf
Added ability to specify scrape protocols to accept during HTTP content type negotiation. (#12738)
* Added ability to specify scrape protocols to accept during HTTP content type negotiation.


This is done via new option in GlobalConfig and ScrapeConfig: "scrape_protocol"

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fixed readability and log message.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
2023-10-10 11:16:55 +01:00
Bryan Boreham f6d9c84fde
scraping: delay creating buffer, to save memory (#12953)
We don't need the buffer to read the response until the scrape http call
returns; creating it earlier makes the buffer pool larger.

I split `scrape()` into `scrape()` which returns with the http response,
and `readResponse()` which decompresses and copies the data into the
supplied buffer. This design was chosen to minimize impact on the logic.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-09 17:23:53 +01:00
Bryan Boreham 7c934ae18c scraping: hoist labels variable to save garbage
`lset` escapes to heap due to being passed through the text-parser
interface, so we can reduce garbage by hoisting it out of the loop so
only one allocation is done for every series in a scrape.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-05 11:04:59 +00:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Bryan Boreham d73b4acb30
Merge pull request #12737 from prometheus/beorn7/histogram
textparse: fix infinite loop during exemplar parsing
2023-08-23 09:36:56 +01:00
György Krajcsovits 983c0c5e9d Add missing buckets
My previous proposal for a fix was wrong and also missed these.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-08-21 14:44:53 +02:00
György Krajcsovits 2ae8c2bd3d Set expected values in test
The parsing doesn't seem to be perfect as I don't get all classic buckets
possibly another bug found?

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-08-21 13:55:13 +02:00
György Krajcsovits 2a781ec5ac Replicate infinite loop in native-classic histogram scrape
Enable scraping a native histogram with exemplars that leads to
infinite loop.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-08-21 13:12:45 +02:00
Bryan Boreham 611f50bb3d scrape: retain all dropped targets when KeepDroppedTargets is zero
This was a bug.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-20 14:32:23 +01:00
Bryan Boreham 627c99424b scrape: extend TestDroppedTargetsList to check counts
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-20 14:32:23 +01:00
Bryan Boreham 1e3fef6ab0
scraping: limit detail on dropped targets, to save memory (#12647)
It's possible (quite common on Kubernetes) to have a service discovery
return thousands of targets then drop most of them in relabel rules.
The main place this data is used is to display in the web UI, where
you don't want thousands of lines of display.

The new limit is `keep_dropped_targets`, which defaults to 0
for backwards-compatibility.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-14 15:39:25 +01:00
beorn7 536a487af4 scrape: Refactor names of float samples
Continue to remove confusion that histogram samples are also samples
and histogram values are also values etc. by renaming float values and
float samples using the same schema as for histograms.

Concretely:
- result → resultFloats (corresponding to resultHistograms)
- pendingResult → pendingFloats (corresponding to pendingHistograms)
- rolledbackResult → rolledbackFloats (corresponding to rolledbackHistograms)
- sample → floatSample (corresponding to histogramSample)

This also order the fields in `collectResultAppender` more
consistently.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-07-13 14:27:51 +02:00
beorn7 0e3f35324b scrape: Enable ingestion of multiple exemplars per sample
This has become a requirement for native histograms, as a single
histogram sample commonly has many buckets, so that providing many
exemplars makes sense.

Since OM text doesn't support native histograms yet, the test had to
be expanded to also support protobuf test cases.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-07-13 14:16:10 +02:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
Julius Volz ac8abdaacd
Rename remaining jitterSeed -> offsetSeed variables (#12414)
I had changed the naming from "jitter" to "offset" in:

cb045c0e4b

...but I forgot to add this file to the commit to complete the renaming,
doing that now.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2023-06-05 17:36:11 +02:00
Julius Volz cb045c0e4b Fix wording from "jitterSeed" -> "offsetSeed" for server-wide scrape offsets
In digital communication, "jitter" usually refers to how much a signal deviates
from true periodicity, see https://en.wikipedia.org/wiki/Jitter. The way we are
using the "jitterSeed" in Prometheus does not affect the true periodicity at
all, but just introduces a constant phase shift (or offset) within the period.
So it would be more correct and less confusing to call the "jitterSeed" an
"offsetSeed" instead.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2023-05-25 11:54:00 +02:00
beorn7 9e500345f3 textparse/scrape: Add option to scrape both classic and native histograms
So far, if a target exposes a histogram with both classic and native
buckets, a native-histogram enabled Prometheus would ignore the
classic buckets. With the new scrape config option
`scrape_classic_histograms` set, both buckets will be ingested,
creating all the series of a classic histogram in parallel to the
native histogram series. For example, a histogram `foo` would create a
native histogram series `foo` and classic series called `foo_sum`,
`foo_count`, and `foo_bucket`.

This feature can be used in a migration strategy from classic to
native histograms, where it is desired to have a transition period
during which both native and classic histograms are present.

Note that two bugs in classic histogram parsing were found and fixed
as a byproduct of testing the new feature:

1. Series created from classic _gauge_ histograms didn't get the
   _sum/_count/_bucket prefix set.
2. Values of classic _float_ histograms weren't parsed properly.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-05-13 01:32:25 +02:00
Björn Rabenstein bd98fc8c45
Merge pull request #12254 from zenador/histogram-bucket-limit
Implement bucket limit for native histograms
2023-05-10 17:42:29 +02:00
Jeanette Tan 40240c9c1c Update according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-05-05 02:33:00 +08:00
György Krajcsovits 19a4f314f5 Refactor testutil/protobuf.go into scrape package
Renamed to clientprotobuf.go and added comments to indicate the
intended usage.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-05-04 08:36:44 +02:00
Russ Cox 28f5502828 scrape: fix two loop variable scoping bugs in test
Consider code like:

	for i := 0; i < numTargets; i++ {
		stopFuncs = append(stopFuncs, func() {
			time.Sleep(i*20*time.Millisecond)
		})
	}

Because the loop variable i is shared by all closures,
all the stopFuncs sleep for numTargets*20 ms.

If the i were made per-iteration, as we are considering
for a future Go release, the stopFuncs would have sleep
durations ranging from 0 to (numTargets-1)*20 ms.

Two tests had code like this and were checking that the
aggregate sleep was at least numTargets*20 ms
("at least as long as the last target slept"). This is only true
today because i == numTarget during all the sleeps.

To keep the code working even if the semantics of this loop
change, this PR computes

	d := time.Duration((i+1)*20) * time.Millisecond

outside the closure (but inside the loop body), and then each
closure has its own d. Now the sleeps range from 20 ms
to numTargets*20 ms, keeping the test passing
(and probably behaving closer to the intent of the test author).

The failure being fixed can be reproduced by using the current
Go development branch with

	GOEXPERIMENT=loopvar go test

Signed-off-by: Russ Cox <rsc@golang.org>
2023-04-26 10:33:10 -04:00
Jeanette Tan dfabc69303 Add tests according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-25 02:07:36 +08:00
Jeanette Tan 2ad39baa72 Treat bucket limit like sample limit and make it fail the whole scrape and return an error
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:25:07 +08:00
György Krajcsovits 071426f72f Add unit test for bucket limit appender
Refactors textparser test to use a common test utility to create
protobuf representation from MetricFamily

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-04-22 03:14:19 +08:00
Jeanette Tan 4d21ac23e6 Implement bucket limit for native histograms
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:14:19 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Bryan Boreham 0c09c3feb0 scrape sync: avoid copy of labels for dropped targets
Since the Target object was just created in this function, nobody else
has a reference to it and there are no concerns about it being modified
concurrently so we don't need to copy the value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-16 20:35:13 +00:00