prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-25 05:34:05 -08:00

Author	SHA1	Message	Date
Levi Harrison	2826fbeeb7	SD: Add target creation failure counter and change failure handling (#8786 ) * Added metric and changed failure/drop strategy Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-28 23:50:59 +02:00
Callum Styan	8fd73b1d28	Add Exemplar Remote Write support (#8296 ) * Write exemplars to the WAL and send them over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update example for exemplars, print data in a more obvious format. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add metrics for remote write of exemplars. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix incorrect slices passed to send in remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * We need to unregister the new metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * Order of exemplar append vs write exemplar to WAL needs to change. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Condense sample/exemplar delivery tests to parameterized sub-tests Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename test methods for clarity now that they also handle exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename counter variable. Fix instances where metrics were not updated correctly Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Add exemplars to LoadWAL benchmark Signed-off-by: Callum Styan <callumstyan@gmail.com> * last exemplars timestamp metric needs to convert value to seconds with ms precision Signed-off-by: Callum Styan <callumstyan@gmail.com> * Process exemplar records in a separate go routine when loading the WAL. 
Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments related to clarifying comments and variable names. Also refactor sample/exemplar to enqueue prompb types. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Regenerate types proto with comments, update protoc version again. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Put remote write of exemplars behind a feature flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some of Ganesh's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move exemplar remote write feature flag to a config file field. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address Bartek's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allocate exemplar buffers in queue_manager if we're not going to send exemplars over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add ValidateExemplar function, validate exemplars when appending to head and log them all to WAL before adding them to exemplar storage. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more review comments from Ganesh. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add exemplar total label length check. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address a few last review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-05-06 13:53:52 -07:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777 ) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide some mechanism to defend against unbound number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceed the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauge, one for each label limit to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very very long. 
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
Marco Pracucci	4da5c25ea4	Upgrade prometheus/common to v0.21.0 Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-04-21 12:19:16 +02:00
Julien Pivotto	e14176756f	Merge pull request #8601 from dgl/fix-8243 Ensure that timestamp comparison uses wall clock time	2021-03-16 16:00:25 +01:00
Callum Styan	289ba11b79	Add circular in-memory exemplars storage (#6635 ) * Add circular in-memory exemplars storage Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Signed-off-by: Martin Disibio <mdisibio@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com> * Fix some comments, clean up exemplar metrics struct and exemplar tests. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix exemplar query api null vs empty array issue. Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-03-16 15:17:45 +05:30
David Leadbeater	21a282fabe	Ensure that timestamp comparison uses wall clock time It's not possible to assume subtraction and addition of a time.Time will result in consistent values. Signed-off-by: David Leadbeater <dgl@dgl.cx>	2021-03-15 13:05:17 +00:00
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489 ) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
gotjosh	4eca4dffb8	Allow metric metadata to be propagated via Remote Write. (#6815 ) * Introduce a metadata watcher Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage. Signed-off-by: gotjosh <josue@grafana.com> * Additional fixes after rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Rework samples/metadata metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Use more descriptive variable names in MetadataWatcher collect. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues caused during rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix missing metric add and unneeded config code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix metrics and docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Replace assert with require Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Bring back max_samples_per_send metric Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-11-19 20:53:03 +05:30
Brian Brazil	ebe0da7a72	Protect sp.loops from concurrent access. (#8176 ) Manager.reload takes the mutex that would make it safe, however releases it before the goroutines spawned are finished with it. Thus more explicit locking of scrapePool.Sync/stop/reload is needed. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-11-12 16:06:25 +00:00
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122 ) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00
Julien Pivotto	1282d1b39c	Refactor test assertions (#8110 ) * Refactor test assertions This pull request gets rid of assert.True where possible to use fine-grained assertions. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-27 11:06:53 +01:00
Brian Brazil	3f8e51738c	More granular locking for scrapeLoop. (#8104 ) Don't lock for all of Sync/stop/reload as that holds up /metrics and the UI when they want a list of active/dropped targets. Instead take advantage of the fact that Sync/stop/reload cannot be called concurrently by the scrape Manager and lock just on the targets themselves. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-10-26 14:46:20 +00:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00
Julien Pivotto	be5ba1a62d	Fix wordings Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 21:44:36 +02:00
Julien Pivotto	671f7c66e5	Adjust comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 18:28:02 +02:00
Julien Pivotto	627ff84599	Adjust flag Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 18:25:52 +02:00
Julien Pivotto	536dfb6234	Add an experimental, hidden flag Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 17:31:46 +02:00
Julien Pivotto	b90c7a55da	Simplify logic Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-06 21:17:16 +02:00
Julien Pivotto	ccc1df3140	Fix comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-06 13:48:24 +02:00
Julien Pivotto	98e14611a5	Move the tolerance logic in the loop function. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-05 18:20:10 +02:00
Julien Pivotto	6544f95403	Introduce timestamp tolerance in scrapes Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-05 18:20:10 +02:00
Julien Pivotto	6f13c60219	Scrape: Test that deduplicated targets are started (#7975) This PR tests that de-duplicated targets are actually started. It is a unit test for this line of code: `072b9649a3/scrape/scrape.go (L457)` which is working and necessary but was not tested yet. It also tests that scrapes are started in the normal way, in the targets limit test. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-09-30 20:21:32 +02:00
iurii	bd53b5ff37	Unnecessary go routine spawn. (#7879 ) * Unnecessary go routine spawn. * Remove unnecessary local variable creation. Signed-off-by: iurii <iurii@coins.ph> Co-authored-by: iurii <iurii@coins.ph>	2020-09-02 16:26:42 +01:00
Andy Bursavich	4e6a94a27d	Invert service discovery dependencies (#7701 ) This also fixes a bug in query_log_file, which now is relative to the config file like all other paths. Signed-off-by: Andy Bursavich <abursavich@gmail.com>	2020-08-20 13:48:26 +01:00
Julien Pivotto	64236cf9e8	Use SAN in test certificate (#7789 ) go 1.15 deprecated the common name verification. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-12 23:15:38 +02:00
Julien Pivotto	2899773b01	Do not stop scrapes in progress during reload (#7752 ) * Do not stop scrapes in progress during reload. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-07 15:58:16 +02:00
johncming	5578c96307	scrape: fix typo. (#7712 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-01 09:56:21 +01:00
Julien Pivotto	7b5507ce4b	Scrape: defer report (#7700) When I started working on target_limit, scrapeAndReport did not exist yet. Then I simply rebased my work without thinking. It appears that there is a lot that can be inlined if I defer() the report. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-31 19:11:08 +02:00
Annanay	ec562f152b	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-31 13:03:56 +05:30
Julien Pivotto	f482c7bdd7	Add per scrape-config targets limit (#7554 ) * Add per scrape-config targets limit Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-30 14:20:24 +02:00
Annanay	9bba8a6eae	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:43:18 +05:30
Annanay	89129cd39a	Address comments Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:41:13 +05:30
Julien Pivotto	e76c436e9c	Goleak in discoveries, scrape, rules (#7662 ) * Add go leak tests for discoveries with goroutines Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Add go leak tests in rules Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Add go leak tests in scrape tests Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-27 09:38:08 +01:00
Annanay	7f98a744e5	Add context to Appender interface Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-24 19:40:51 +05:30
johncming	490f9c664e	scrape: remove two blank lines. (#7610 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-07-19 07:34:04 +02:00
Julien Pivotto	22aa21e508	scrape tests: Make appenders more realistic (#7594 ) With this, the storage tests inside the scrape package are more realistic. Discovered with #7593, but fixed independently as #7593 will probably take some time. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-17 12:30:22 +02:00
Julien Pivotto	754461b74f	Reuse the same appender for report and scrape (#7562 ) Additionally, implement isolation in collectResultAppender. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-16 13:53:39 +02:00
Julien Pivotto	190addffd8	Change Scrape Loop mtx to Mutex (#7553) It was still RWLock but we never use the read lock. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-11 15:37:13 +02:00
Kemal Akkoyun	66dfb951c4	: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251 ) Add errors and Warnings to SeriesSet Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Change Querier interface and refactor accordingly Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor promql/engine to propagate warnings at eval stage Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Make sure all the series from all Selects are pre-advanced Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Separate merge series sets Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Clean Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor merge querier failure handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactored and simplified fanout with improvements from incoming chunk iterator PRs. * Secondary logic is hidden, instead of weird failed series set logic we had. * Fanout is well commented * Fanout closing record all errors * MergeQuerier improved API (clearer) * deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false). Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix formatting Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix CI issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Added final tests for error handling. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. * Moved hints in populate to be allocated only when needed. * Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic. * Select after first Next is done will panic. NOTE: in lazySeriesSet in theory we could just panic, I think however we can totally just return error, it will panic in expand anyway. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Utilize errWithWarnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix recently introduced expansion issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add tests for secondary querier error handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Implement lazy merge Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add name to test cases Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Reorganize Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove redundant warnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix rebase mistake Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-06-09 17:57:31 +01:00
Brian Brazil	f9d21f10ec	Only relabelling should apply for scrape_samples_scraped_post_relabelling. (#7342 ) More consistent variable names. Fixes #7298 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-06-04 16:00:37 +01:00
Brian Brazil	c9565f08aa	Pass reference to checkAddError so appendErrors is updated. (#7294 ) This was preventing the warnings from being logged. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-05-26 15:14:55 +01:00
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	2020-04-15 11:17:41 +01:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
Julien Pivotto	0c4ec8d9dd	Merge pull request #6911 from mjtrangoni/remove-buildnametocertificate scrape/target_test.go: remove deprecated function BuildNameToCertificate()	2020-03-27 17:00:19 +01:00
Callum Styan	c453def8c5	Separate scrape add error checking out into its own function. (#6930) * Separate scrape add error checking out into its own function. Signed-off-by: Callum Styan <callumstyan@gmail.com> * pass sampleLimitError to checkAddError instead of returning an error Signed-off-by: Callum Styan <callumstyan@gmail.com> * Return bool, error from checkAddError so we can properly handle ErrNotFound for AddFast. This should in theory never happen, but the previous code path handled this case. Adds a test for this, which master passes and the previous commit fails. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address comment changes. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move sampleAdded inside the loop iteration within append, since that's the only block the variable is used in. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-25 19:31:48 -07:00
Bartlomiej Plotka	c4eefd1b3a	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically a BREAKING CHANGE, but it was like this from the beginning: I just noticed that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet, which relies on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for no good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-20 21:14:43 +01:00
Julien Pivotto	d6ad5551c9	Scrape: do not put staleness marker when cache is reused (#7011 ) * Scrape: do not put staleness marker when cache is reused Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-20 17:43:26 +01:00
Julien Pivotto	8907ba6235	Make TSDB use storage errors This fixes #6992, which was introduced by #6777. There was an intermediate component which translated TSDB errors into storage errors, but that component was deleted and this bug went unnoticed until we looked at the Prombench results. Without this, scrape will fail instead of dropping samples or using "Add" when the series have been garbage collected. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-17 22:24:25 +01:00
Björn Rabenstein	d80b0810c1	Move crucial actions to defer (#6918 ) With defer having less of a performance penalty, there is no reason not to do those crucial operations via defer. Context: With isolation in place, if we forget to Commit/Rollback, the low watermark will get stuck forever. The current code should not have any bugs, but moving to defer helps to avoid future bugs. This is also moving the `closeAppend` in the `Commit` implementation itself to defer. If logging to the WAL fails, we would have missed the `closeAppend`. Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-13 20:54:47 +01:00

133 commits