Commit graph

105 commits

Author SHA1 Message Date
Ganesh Vernekar 3cbf87b83d
Enable protobuf negotiation only when histograms are enabled
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 13:27:22 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Bryan Boreham 4927e13537 scrape tests: undo EmptyLabels change
Needs other code changes otherwise tests fail

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
Bryan Boreham 14780c3b4e scrape: in tests use labels.FromStrings
And a few cases of `EmptyLabels()`.
Replacing code which assumes the internal structure of `Labels`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
Bogdan Drutu 3cde9287a6
scrape: remove unused member from cacheEntry (#11281)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2022-09-08 00:01:01 +02:00
Bogdan Drutu c8cfe5c25d
scrape: remove unused argument in newScrapeLoop (#11283)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2022-09-07 23:59:57 +02:00
Paschalis Tsilias 5a8e202f94
Append metadata to the WAL in the scrape loop (#10312)
* Append metadata to the WAL

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra whitespace; Reword some docstrings and comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use RLock() for hasNewMetadata check

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use single byte for metric type in RefMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Update proposed WAL format for single-byte type metadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Address first round of review comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Amend description of metadata in wal.md

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Correct key used to retrieve metadata from cache

When we're setting metadata entries in the scrapeCace, we're using the
p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and
use it as the cache key. When checking for cache entries though, we used
p.Series() as the key, which included the metric name _with_ its labels.
That meant that we were never actually hitting the cache. We're fixing
this by utiling the __name__ internal label for correctly getting the
cache entries after they've been set by setHelp(), setType() or
setUnit().

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Put feature behind a feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Reorder WAL format document

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix CR comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Extract logic about changing metadata in an anonymous function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Implement new proposed WAL format and amend relevant tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use 'const' for metadata field names

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Apply metadata to head memSeries in Commit, not in AppendMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add docstring and rename extracted helper in scrape.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix review comments around TestMetadata* tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rebase with merged TSDB changes; fix duplicate definitions after rebase

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove leftover changes on db_test.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rename feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Simplify updateMetadata helper function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra newline

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
2022-08-31 15:50:05 +02:00
beorn7 c9fd3c235d Merge branch 'main' into sparsehistogram 2022-08-10 17:54:37 +02:00
Levi Harrison d61459d826
no-default-scrape-port feature flag (#9523)
* Add `no-default-scrape-port` flag

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2022-07-20 13:35:47 +02:00
beorn7 28f028e938 Merge branch 'main' into sparsehistogram 2022-07-12 19:07:13 +02:00
Julien Pivotto 90583c8906
TestScrapeLoopCache: Display content of the appender (#10937)
This should help identifying windows tests flakiness.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-07-01 14:28:56 +02:00
Xiaonan Shen 0c3abdc26d
Keep relabeled scrape interval and timeout on reloads (#10916)
* Preserve relabeled scrape interval and timeout on reloads

Signed-off-by: Xiaonan Shen <s@sxn.dev>
2022-06-28 11:58:52 +02:00
beorn7 3bc711e333 Merge branch 'main' into sparsehistogram 2022-05-04 13:37:13 +02:00
Goutham Veeramachaneni 2381d7be57
Send target and metadata cache in context (again) (#10636)
* Send target and metadata cache in context (again)

The previous attempt was rolled back in #10590 due to memory issues.

`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info
in the previous attempt and it was hard to pin-point where the context
was being retained causing the memory increase.

I've experimented a bunch in #10627 to figure out that this approach doesn't
cause memory increase. Beyond that, just using this info in _any_ other context
is causing a memory increase.

The change fixed a bunch of long-standing in the OTel Collector that the
community was waiting on and release is blocked on a few downstream distrubutions
of OTel Collector waiting on a fix. I propose to merge this change in while
I investigate what is happening.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Gate the change behind a manager option

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2022-05-03 11:45:52 -07:00
Matthieu MOREL e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
beorn7 4210aac74a Merge branch 'main' into sparsehistogram 2022-03-22 14:47:42 +01:00
Robert Fratto f0ec619eec
scrape: allow providing a custom Dialer for scraping (#10415)
* scrape: allow providing a custom Dialer for scraping

This commit extends config.ScrapeConfig with an optional field to
override how HTTP connections to targets are created. This field is not
set directly in Prometheus, and is only added for the convenience of
downstream importers.

Closes #9706

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* scrape: move custom dial function to scrape.Options

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2022-03-09 00:48:47 +01:00
Jayapriya Pai edfe657b54
scrape: Fix label_limits cache usage (#10370)
Fixes #10344

Signed-off-by: Jayapriya Pai <janantha@redhat.com>
2022-03-03 18:37:53 +01:00
Matheus Pimenta 8d8ce641a4
error for invalid media type should not be completely swallowed (#10186)
* error for invalid media type should not be completely swallowed

Signed-off-by: Matheus Pimenta <matheuscscp@gmail.com>
2022-02-08 10:57:56 +01:00
beorn7 86cc83b13c storage: iterator fixes after merge
Signed-off-by: beorn7 <beorn@grafana.com>
2021-12-18 14:12:01 +01:00
beorn7 64c7bd2b08 Merge branch 'main' into sparsehistogram 2021-12-18 14:04:25 +01:00
Julien Pivotto e94a0b28e1 Append reporting metrics without limit
If reporting metrics fails due to reaching the limit, this makes the
target appear as UP in the UI, but the metrics are missing.

This commit bypasses that limit for report metrics.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-12-16 13:26:53 +01:00
Björn Rabenstein 7e42acd3b1
tsdb: Rework iterators (#9877)
- Pick At... method via return value of Next/Seek.
- Do not clobber returned buckets.
- Add partial FloatHistogram suppert.

Note that the promql package is now _only_ dealing with
FloatHistograms, following the idea that PromQL only knows float
values.

As a byproduct, I have removed the histogramSeries metric. In my
understanding, series can have both float and histogram samples, so
that metric doesn't make sense anymore.

As another byproduct, I have converged the sampleBuf and the
histogramSampleBuf in memSeries into one. The sample type stored in
the sampleBuf has been extended to also contain histograms even before
this commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-29 13:24:23 +05:30
beorn7 5d4db805ac Merge branch 'main' into sparsehistogram 2021-11-17 19:57:31 +01:00
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
Dieter Plaetinck cda025b5b5
TSDB: demistify SeriesRefs and ChunkRefs (#9536)
* TSDB: demistify seriesRefs and ChunkRefs

The TSDB package contains many types of series and chunk references,
all shrouded in uint types.  Often the same uint value may
actually mean one of different types, in non-obvious ways.

This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.

Concretely:

* Use appropriately named types and document their semantics and
  relations.
* Make multiplexing and demuxing of types explicit
  (on the boundaries between concrete implementations and generic
  interfaces).
* Casting between different types should be free.  None of the changes
  should have any impact on how the code runs.

TODO: Implement BlockSeriesRef where appropriate (for a future PR)

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* agent: demistify seriesRefs and ChunkRefs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-11-06 15:40:04 +05:30
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
beorn7 a9008f5423 Merge branch 'main' into sparsehistogram 2021-10-19 17:14:23 +02:00
Shirley Leu c890ea407f
Resolve conflicts between multiple exported label prefixes (#9479)
Resolve conflicts between multiple exported label prefixes

Signed-off-by: Shirley Leu <shirley.w.leu@gmail.com>
2021-10-15 20:31:03 +02:00
beorn7 fd5ea4e0b5 Merge branch 'main' into sparsehistogram 2021-10-07 23:16:42 +02:00
Bryan Boreham 92a3eeac55
Create less garbage when parsing metrics (#9299)
* Refactor: extract function to make scrapeLoop for testing

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Add benchmarks for ScrapeLoopAppend

For Prometheus and OpenMetrics

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Create less garbage when parsing metrics

Exemplar escapes to heap due to being passed through text-parser
interface, but we can reduce the impact by hoisting it out of the loop
and resetting it after every use.

(Note the cost was paid on every line even when exemplars were disabled)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Create less garbage when parsing OpenMetrics

After calling parseLVals() we always append the return value, so pass in
what we want to append it to and save garbage.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-09-08 13:39:21 +05:30
Łukasz Mierzwa f0a26266c0 Add scrape_sample_limit metric
This adds a new metric exposing per target scrape sample_limit value. Metrics are only exposed if extra-scrape-metrics feature flag is enabled.
scrape_sample_limit will make it easy to monitor and alert on targets getting close to configured sample_limit, which is important given than exceeding sample_limit results in the entire scrape results being rejected.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2021-09-03 15:42:41 +01:00
SuperQ 31f4108758
Add scrape_timeout_seconds metric
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.

Signed-off-by: SuperQ <superq@gmail.com>
2021-09-02 12:15:35 +02:00
Levi Harrison 70f597b033
Configure Scrape Interval and Timeout Via Relabeling (#8911)
* Configure scrape interval and timeout with labels

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-31 17:37:32 +02:00
Ganesh Vernekar 8b70e87ab9
Merge remote-tracking branch 'upstream/main' into sparse-refactor
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-05 12:16:08 +05:30
Arunprasad Rajkumar 5527e26efc
scrape: fix 'target_limit exceeded error' when reloading conf with 0
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
2021-07-27 17:34:22 +05:30
beorn7 5de2df752f Hacky implementation of protobuf parsing
This "brings back" protobuf parsing, with the only goal to play with
the new sparse histograms.

The Prom-2.x style parser is highly adapted to the structure of the
Prometheus text format (and later OpenMetrics). Some jumping through
hoops is required to feed protobuf into it.

This is not meant to be a model for the final implementation. It
should just enable sparse histogram ingestion at a reasonable
efficiency.

Following known shortcomings and flaws:

- No tests yet.

- Summaries and legacy histograms, i.e. without sparse buckets, are
  ignored.

- Staleness doesn't work (but this could be fixed in the appender, to
  be discussed).

- No tricks have been tried that would be similar to the tricks the
  text parsers do (like direct pointers into the HTTP response
  body). That makes things weird here. Tricky optimizations only make
  sense once the final format is specified, which will almost
  certainly not be the old protobuf format. (Interestingly, I expect
  this implementation to be in fact much more efficient than the
  original protobuf ingestion in Prom-1.x.)

- This is using a proto3 version of metrics.proto (mostly to be
  consistent with the other protobuf uses). However, proto3 sees no
  difference between an unset field. We depend on that to distinguish
  between an unset timestamp and the timestamp 0 (1970-01-01, 00:00:00
  UTC). In this experimental code, we just assume that timestamp is
  never specified and therefore a timestamp of 0 always is interpreted
  as "not set".

Signed-off-by: beorn7 <beorn@grafana.com>
2021-07-01 01:35:11 +02:00
Julius Volz 9d495afd2c Remove trailing zeros in scrape timeout header
See https://twitter.com/AviKivity/status/1405147699557638145 and
https://twitter.com/juliusvolz/status/1405790211670515712

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2021-06-18 09:38:12 +02:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
hanjm 1df05bfd49 Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827)
Signed-off-by: hanjm <hanjinming@outlook.com>
2021-05-29 07:05:42 +08:00
Damien Grisonnet b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
Callum Styan 289ba11b79
Add circular in-memory exemplars storage (#6635)
* Add circular in-memory exemplars storage

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>

* Fix some comments, clean up exemplar metrics struct and exemplar tests.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix exemplar query api null vs empty array issue.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-03-16 15:17:45 +05:30
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Julien Pivotto 6f13c60219
Scrape: Test that deduplicated targets are started (#7975)
This PR test that de-duplicated targets are actually started.

It is a unit test for this line of code:
072b9649a3/scrape/scrape.go (L457)
which is working and necessary but was not tested yet.

It also tests that scrapes are started in the normal way, in the targets
limit test.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-09-30 20:21:32 +02:00
Julien Pivotto 2899773b01
Do not stop scrapes in progress during reload (#7752)
* Do not stop scrapes in progress during reload.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-07 15:58:16 +02:00
Annanay ec562f152b Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-31 13:03:56 +05:30
Julien Pivotto f482c7bdd7
Add per scrape-config targets limit (#7554)
* Add per scrape-config targets limit

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 14:20:24 +02:00