Commit graph

131 commits

Author SHA1 Message Date
Damien Grisonnet b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
Marco Pracucci 4da5c25ea4
Upgrade prometheus/common to v0.21.0
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 12:19:16 +02:00
Julien Pivotto e14176756f
Merge pull request #8601 from dgl/fix-8243
Ensure that timestamp comparison uses wall clock time
2021-03-16 16:00:25 +01:00
Callum Styan 289ba11b79
Add circular in-memory exemplars storage (#6635)
* Add circular in-memory exemplars storage

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>

* Fix some comments, clean up exemplar metrics struct and exemplar tests.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix exemplar query api null vs empty array issue.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-03-16 15:17:45 +05:30
David Leadbeater 21a282fabe Ensure that timestamp comparison uses wall clock time
It's not possible to assume subtraction and addition of a time.Time will
result in consistent values.

Signed-off-by: David Leadbeater <dgl@dgl.cx>
2021-03-15 13:05:17 +00:00
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
gotjosh 4eca4dffb8
Allow metric metadata to be propagated via Remote Write. (#6815)
* Introduce a metadata watcher

Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage.

Signed-off-by: gotjosh <josue@grafana.com>

* Additional fixes after rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Rework samples/metadata metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Use more descriptive variable names in MetadataWatcher collect.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues caused during rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing metric add and unneeded config code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix metrics and docs

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Replace assert with require

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Bring back max_samples_per_send metric

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-11-19 20:53:03 +05:30
Brian Brazil ebe0da7a72
Protect sp.loops from concurrent access. (#8176)
Manager.reload takes the mutex that would make it safe, however
releases it before the goroutines spawned are finished with it.
Thus more explicit locking of scrapePool.Sync/stop/reload is needed.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-11-12 16:06:25 +00:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Brian Brazil 3f8e51738c
More granular locking for scrapeLoop. (#8104)
Don't lock for all of Sync/stop/reload as that holds up /metrics and the
UI when they want a list of active/dropped targets. Instead take
advantage of the fact that Sync/stop/reload cannot be called
concurrently by the scrape Manager and lock just on the targets
themselves.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-10-26 14:46:20 +00:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Julien Pivotto be5ba1a62d Fix wordings
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 21:44:36 +02:00
Julien Pivotto 671f7c66e5 Adjust comment
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:28:02 +02:00
Julien Pivotto 627ff84599 Adjust flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:25:52 +02:00
Julien Pivotto 536dfb6234 Add an experimental, hidden flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 17:31:46 +02:00
Julien Pivotto b90c7a55da Simplify logic
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-06 21:17:16 +02:00
Julien Pivotto ccc1df3140 Fix comment
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-06 13:48:24 +02:00
Julien Pivotto 98e14611a5 Move the tolerance logic in the loop function.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-05 18:20:10 +02:00
Julien Pivotto 6544f95403 Introduce timestamp tolerance in scrapes
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-05 18:20:10 +02:00
Julien Pivotto 6f13c60219
Scrape: Test that deduplicated targets are started (#7975)
This PR test that de-duplicated targets are actually started.

It is a unit test for this line of code:
072b9649a3/scrape/scrape.go (L457)
which is working and necessary but was not tested yet.

It also tests that scrapes are started in the normal way, in the targets
limit test.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-09-30 20:21:32 +02:00
iurii bd53b5ff37
Unnecessary go routine spawn. (#7879)
* Unnecessary go routine spawn.
* Remove unnecessary local variable creation.

Signed-off-by: iurii <iurii@coins.ph>
Co-authored-by: iurii <iurii@coins.ph>
2020-09-02 16:26:42 +01:00
Andy Bursavich 4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto 64236cf9e8
Use SAN in test certificate (#7789)
go 1.15 deprecated the common name verification.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-12 23:15:38 +02:00
Julien Pivotto 2899773b01
Do not stop scrapes in progress during reload (#7752)
* Do not stop scrapes in progress during reload.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-07 15:58:16 +02:00
johncming 5578c96307
scrape: fix typo. (#7712)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-01 09:56:21 +01:00
Julien Pivotto 7b5507ce4b
Scrape: defer report (#7700)
When I started wotking on target_limit, scrapeAndReport did not exist
yet. Then I simply rebased my work without thinking.

It appears that there is a lot that can be inline if I defer() the
report.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-31 19:11:08 +02:00
Annanay ec562f152b Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-31 13:03:56 +05:30
Julien Pivotto f482c7bdd7
Add per scrape-config targets limit (#7554)
* Add per scrape-config targets limit

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 14:20:24 +02:00
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Annanay 89129cd39a Address comments
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:41:13 +05:30
Julien Pivotto e76c436e9c
Goleak in discoveries, scrape, rules (#7662)
* Add go leak tests for discoveries with goroutines

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Add go leak tests in rules

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Add go leak tests in scrape tests

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-27 09:38:08 +01:00
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
johncming 490f9c664e
scrape: remove two blank lines. (#7610)
Signed-off-by: johncming <johncming@yahoo.com>
2020-07-19 07:34:04 +02:00
Julien Pivotto 22aa21e508
scrape tests: Make appenders more realistic (#7594)
With this, the storage tests inside the scrape package are more
realistic.

Discovered with #7593, but fixed independently as #7593 will probably
take some time.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-17 12:30:22 +02:00
Julien Pivotto 754461b74f
Reuse the same appender for report and scrape (#7562)
Additionally, implement isolation in collectResultAppender.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-16 13:53:39 +02:00
Julien Pivotto 190addffd8
Change Scrape Loop mtx to Mutex (#7553)
It was still RWLock but we never use the read lock..

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-11 15:37:13 +02:00
Kemal Akkoyun 66dfb951c4
*: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251)
* Add errors and Warnings to SeriesSet

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Change Querier interface and refactor accordingly

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor promql/engine to propagate warnings at eval stage

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Make sure all the series from all Selects are pre-advanced

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Separate merge series sets

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor merge querier failure handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactored and simplified fanout with improvements from incoming chunk iterator PRs.

* Secondary logic is hidden, instead of weird failed series set logic we had.
* Fanout is well commented
* Fanout closing record all errors
* MergeQuerier improved API (clearer)
* deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false).

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix formatting

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix CI issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Added final tests for error handling.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

* Moved hints in populate to be allocated only when needed.
* Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic.
* Select after first Next is done will panic.

NOTE: in lazySeriesSet in theory we could just panic, I think however we can
totally just return error, it will panic in expand anyway.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Utilize errWithWarnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix recently introduced expansion issue

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add tests for secondary querier error handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Implement lazy merge

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add name to test cases

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Reorganize

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove redundant warnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix rebase mistake

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-09 17:57:31 +01:00
Brian Brazil f9d21f10ec
Only relabelling should apply for scrape_samples_scraped_post_relabelling. (#7342)
More consistent variable names.

Fixes #7298

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-06-04 16:00:37 +01:00
Brian Brazil c9565f08aa
Pass reference to checkAddError so appendErrors is updated. (#7294)
This was preventing the warnings from being logged.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-05-26 15:14:55 +01:00
ZouYu 2b7437d60e
Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109)
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
2020-04-15 11:17:41 +01:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Julien Pivotto 0c4ec8d9dd
Merge pull request #6911 from mjtrangoni/remove-buildnametocertificate
scrape/target_test.go: remove deprecated function BuildNameToCertificate()
2020-03-27 17:00:19 +01:00
Callum Styan c453def8c5
Separate scrape add error checking out into it's own function. (#6930)
* Separate scrape add error checking out into it's own function.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* pass sampleLimitError to checkAddError instead of returning an error

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Return bool, error from checkAddError so we can properly handle
ErrNotFound for AddFast. This should in theory never happen, but the
previous code path handled this case. Adds a test for this, which master
passes and the previous commit fails.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address comment changes.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move sampleAdded inside the loop iteration within append, since that's
the only block the variable is used in.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-03-25 19:31:48 -07:00
Bartlomiej Plotka c4eefd1b3a storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-20 21:14:43 +01:00
Julien Pivotto d6ad5551c9
Scrape: do not put staleness marker when cache is reused (#7011)
* Scrape: do not put staleness marker when cache is reused

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-20 17:43:26 +01:00
Julien Pivotto 8907ba6235 Make TSDB use storage errors
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-17 22:24:25 +01:00
Björn Rabenstein d80b0810c1
Move crucial actions to defer (#6918)
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.

Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.

The current code should not have any bugs, but moving to defer helps
to avoid future bugs.

This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-13 20:54:47 +01:00
Brian Brazil 5da8990053
Log scrape append failures as debug rather than warn. (#6852)
This is most likely due to an endpoint not producing valid
metrics output, which we should treat the same as a failed
scrape, and thus not spam the application logs with it.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-06 00:46:03 +00:00
李国忠 52025bd7a9
[comments] change word ‘wheter’ to ‘whether’ (#6912)
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-02 13:51:24 +05:30