prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-18 11:34:05 -08:00

Author	SHA1	Message	Date
beorn7	5de2df752f	Hacky implementation of protobuf parsing This "brings back" protobuf parsing, with the sole goal of playing with the new sparse histograms. The Prom-2.x style parser is highly adapted to the structure of the Prometheus text format (and later OpenMetrics). Some jumping through hoops is required to feed protobuf into it. This is not meant to be a model for the final implementation. It should just enable sparse histogram ingestion at a reasonable efficiency. The following are known shortcomings and flaws: - No tests yet. - Summaries and legacy histograms, i.e. without sparse buckets, are ignored. - Staleness doesn't work (but this could be fixed in the appender, to be discussed). - No tricks have been tried that would be similar to the tricks the text parsers do (like direct pointers into the HTTP response body). That makes things weird here. Tricky optimizations only make sense once the final format is specified, which will almost certainly not be the old protobuf format. (Interestingly, I expect this implementation to be in fact much more efficient than the original protobuf ingestion in Prom-1.x.) - This is using a proto3 version of metrics.proto (mostly to be consistent with the other protobuf uses). However, proto3 sees no difference between an unset field and a field set to its default value. We depend on that difference to distinguish between an unset timestamp and the timestamp 0 (1970-01-01, 00:00:00 UTC). In this experimental code, we just assume that the timestamp is never specified, and therefore a timestamp of 0 is always interpreted as "not set". Signed-off-by: beorn7 <beorn@grafana.com>	2021-07-01 01:35:11 +02:00
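The proto3 caveat above boils down to one rule: a decoded timestamp of 0 means "not set". A minimal sketch of that rule, assuming a hypothetical simplified `protoSample` struct and `sampleTimestamp` helper (not the generated `metrics.proto` types or the parser's real code):

```go
package main

import "fmt"

// protoSample is a hypothetical, simplified stand-in for a decoded proto3
// metric. In proto3, an int64 field that was never set decodes as 0,
// exactly like an explicit 0.
type protoSample struct {
	TimestampMs int64
	Value       float64
}

// sampleTimestamp treats a zero timestamp as "not set" and falls back to the
// scrape time. As a consequence, a real timestamp of 0
// (1970-01-01T00:00:00Z) cannot be expressed.
func sampleTimestamp(s protoSample, scrapeTimeMs int64) int64 {
	if s.TimestampMs == 0 {
		return scrapeTimeMs
	}
	return s.TimestampMs
}

func main() {
	const scrapeTimeMs = 1625096111000
	fmt.Println(sampleTimestamp(protoSample{Value: 1}, scrapeTimeMs))                    // falls back to scrape time
	fmt.Println(sampleTimestamp(protoSample{TimestampMs: 1000, Value: 1}, scrapeTimeMs)) // keeps 1000
}
```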
Julius Volz	9d495afd2c	Remove trailing zeros in scrape timeout header See https://twitter.com/AviKivity/status/1405147699557638145 and https://twitter.com/juliusvolz/status/1405790211670515712 Signed-off-by: Julius Volz <julius.volz@gmail.com>	2021-06-18 09:38:12 +02:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
hanjm	1df05bfd49	Add body_size_limit to prevent a bad target's large response body from causing a Prometheus server OOM (#8827 ) Signed-off-by: hanjm <hanjinming@outlook.com>	2021-05-29 07:05:42 +08:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777 ) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide a mechanism to defend against an unbounded number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceeds the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauges, one for each label limit, to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very long. 
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
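The three label limits can be checked per sample, with any error failing the whole scrape. A sketch under stated assumptions: labels are a plain `map[string]string` (Prometheus uses its own labels type), and `labelLimits`, `verifyLabelLimits` and the error wording are hypothetical. Following the last commit in the series, `__name__` is counted like any other label, and names and values are truncated in error messages:

```go
package main

import (
	"fmt"
	"sort"
)

// labelLimits holds the three per-scrape options; a zero value disables a check.
type labelLimits struct {
	labelLimit            int
	labelNameLengthLimit  int
	labelValueLengthLimit int
}

// verifyLabelLimits checks one sample's labels against the limits. The %.50s
// verb truncates long names and values to 50 characters in error messages.
func verifyLabelLimits(lset map[string]string, l labelLimits) error {
	if l.labelLimit > 0 && len(lset) > l.labelLimit {
		return fmt.Errorf("label_limit exceeded (metric: %.50s, labels: %d, limit: %d)",
			lset["__name__"], len(lset), l.labelLimit)
	}
	names := make([]string, 0, len(lset))
	for name := range lset {
		names = append(names, name)
	}
	sort.Strings(names) // check in a deterministic order
	for _, name := range names {
		if l.labelNameLengthLimit > 0 && len(name) > l.labelNameLengthLimit {
			return fmt.Errorf("label_name_length_limit exceeded (name: %.50s, length: %d, limit: %d)",
				name, len(name), l.labelNameLengthLimit)
		}
		value := lset[name]
		if l.labelValueLengthLimit > 0 && len(value) > l.labelValueLengthLimit {
			return fmt.Errorf("label_value_length_limit exceeded (name: %.50s, value: %.50s, length: %d, limit: %d)",
				name, value, len(value), l.labelValueLengthLimit)
		}
	}
	return nil
}

func main() {
	sample := map[string]string{"__name__": "http_requests_total", "job": "api", "path": "/very/long/path"}
	limits := labelLimits{labelLimit: 5, labelNameLengthLimit: 32, labelValueLengthLimit: 10}
	// In a scrape loop, any non-nil error would fail the whole scrape.
	fmt.Println(verifyLabelLimits(sample, limits))
}
```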
Callum Styan	289ba11b79	Add circular in-memory exemplars storage (#6635 ) * Add circular in-memory exemplars storage Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Signed-off-by: Martin Disibio <mdisibio@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com> * Fix some comments, clean up exemplar metrics struct and exemplar tests. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix exemplar query api null vs empty array issue. Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-03-16 15:17:45 +05:30
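The core idea of a circular exemplar store is a fixed-capacity ring in which each new entry overwrites the oldest once the ring is full, so memory stays bounded. The sketch below shows only that ring; the `exemplar` and `ring` types are hypothetical simplifications, and per-series indexing is deliberately left out:

```go
package main

import "fmt"

// exemplar is a simplified exemplar: a trace ID, a value and a timestamp.
type exemplar struct {
	traceID string
	value   float64
	ts      int64
}

// ring is a fixed-capacity circular buffer. Once full, each add overwrites
// the oldest entry. Capacity must be greater than zero.
type ring struct {
	buf  []exemplar
	next int  // index the next write goes to
	full bool // whether the buffer has wrapped at least once
}

func newRing(capacity int) *ring { return &ring{buf: make([]exemplar, capacity)} }

func (r *ring) add(e exemplar) {
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// all returns a copy of the stored exemplars, oldest first.
func (r *ring) all() []exemplar {
	if !r.full {
		return append([]exemplar(nil), r.buf[:r.next]...)
	}
	return append(append([]exemplar(nil), r.buf[r.next:]...), r.buf[:r.next]...)
}

func main() {
	r := newRing(3)
	for i := 1; i <= 5; i++ {
		r.add(exemplar{traceID: fmt.Sprintf("t%d", i), value: float64(i), ts: int64(i)})
	}
	for _, e := range r.all() {
		fmt.Println(e.traceID) // t3, t4, t5: the two oldest were overwritten
	}
}
```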
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489 ) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
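A single `Append` method can keep the cached-reference fast path by accepting a reference and returning one: a known non-zero reference skips the label lookup, otherwise the labels are resolved and the resulting reference is handed back for the caller to cache. The toy `memAppender` below is hypothetical and only illustrates that contract; it is not the TSDB implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// memAppender is a toy in-memory appender with a cached-ref fast path.
type memAppender struct {
	refs    map[string]uint64 // canonical label-set key -> series ref
	samples map[uint64][]float64
	nextRef uint64
}

func newMemAppender() *memAppender {
	return &memAppender{refs: map[string]uint64{}, samples: map[uint64][]float64{}}
}

// key builds a canonical key for a label set (toy: ignores escaping).
func key(lset map[string]string) string {
	parts := make([]string, 0, len(lset))
	for name, value := range lset {
		parts = append(parts, name+"="+value)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// Append stores v. A known non-zero ref skips the label lookup; otherwise
// the labels are resolved, creating the series if needed, and the ref is
// returned so the caller can reuse it. The timestamp t is ignored here.
func (a *memAppender) Append(ref uint64, lset map[string]string, t int64, v float64) (uint64, error) {
	if _, known := a.samples[ref]; ref != 0 && known {
		a.samples[ref] = append(a.samples[ref], v) // fast path
		return ref, nil
	}
	k := key(lset)
	r, ok := a.refs[k]
	if !ok {
		a.nextRef++
		r = a.nextRef
		a.refs[k] = r
	}
	a.samples[r] = append(a.samples[r], v)
	return r, nil
}

func main() {
	app := newMemAppender()
	lset := map[string]string{"__name__": "up", "job": "node"}
	ref, _ := app.Append(0, lset, 1000, 1)  // slow path: labels resolved
	ref, _ = app.Append(ref, lset, 2000, 1) // fast path: cached ref reused
	fmt.Println(ref, len(app.samples[ref])) // 1 2
}
```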
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122 ) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00
Julien Pivotto	1282d1b39c	Refactor test assertions (#8110 ) * Refactor test assertions This pull request gets rid of assert.True where possible to use fine-grained assertions. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-27 11:06:53 +01:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00
Julien Pivotto	6f13c60219	Scrape: Test that deduplicated targets are started (#7975 ) This PR tests that de-duplicated targets are actually started. It is a unit test for this line of code: `072b9649a3/scrape/scrape.go (L457)`, which works and is necessary but was not tested yet. It also tests that scrapes are started in the normal way, in the targets limit test. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-09-30 20:21:32 +02:00
Julien Pivotto	2899773b01	Do not stop scrapes in progress during reload (#7752 ) * Do not stop scrapes in progress during reload. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-07 15:58:16 +02:00
Annanay	ec562f152b	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-31 13:03:56 +05:30
Julien Pivotto	f482c7bdd7	Add per scrape-config targets limit (#7554 ) * Add per scrape-config targets limit Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-30 14:20:24 +02:00
Annanay	9bba8a6eae	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:43:18 +05:30
Annanay	89129cd39a	Address comments Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:41:13 +05:30
Julien Pivotto	e76c436e9c	Goleak in discoveries, scrape, rules (#7662 ) * Add go leak tests for discoveries with goroutines Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Add go leak tests in rules Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Add go leak tests in scrape tests Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-27 09:38:08 +01:00
Annanay	7f98a744e5	Add context to Appender interface Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-24 19:40:51 +05:30
Julien Pivotto	22aa21e508	scrape tests: Make appenders more realistic (#7594 ) With this, the storage tests inside the scrape package are more realistic. Discovered with #7593, but fixed independently as #7593 will probably take some time. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-17 12:30:22 +02:00
Julien Pivotto	754461b74f	Reuse the same appender for report and scrape (#7562 ) Additionally, implement isolation in collectResultAppender. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-16 13:53:39 +02:00
Kemal Akkoyun	66dfb951c4	: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251 ) Add errors and Warnings to SeriesSet Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Change Querier interface and refactor accordingly Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor promql/engine to propagate warnings at eval stage Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Make sure all the series from all Selects are pre-advanced Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Separate merge series sets Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Clean Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor merge querier failure handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactored and simplified fanout with improvements from incoming chunk iterator PRs. * Secondary logic is hidden, instead of weird failed series set logic we had. * Fanout is well commented * Fanout closing record all errors * MergeQuerier improved API (clearer) * deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false). Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix formatting Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix CI issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Added final tests for error handling. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. * Moved hints in populate to be allocated only when needed. * Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic. * Select after first Next is done will panic. NOTE: in lazySeriesSet in theory we could just panic, I think however we can totally just return error, it will panic in expand anyway. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Utilize errWithWarnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix recently introduced expansion issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add tests for secondary querier error handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Implement lazy merge Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add name to test cases Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Reorganize Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove redundant warnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix rebase mistake Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-06-09 17:57:31 +01:00
Brian Brazil	f9d21f10ec	Only relabelling should apply for scrape_samples_scraped_post_relabelling. (#7342 ) More consistent variable names. Fixes #7298 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-06-04 16:00:37 +01:00
Brian Brazil	c9565f08aa	Pass reference to checkAddError so appendErrors is updated. (#7294 ) This was preventing the warnings from being logged. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-05-26 15:14:55 +01:00
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	2020-04-15 11:17:41 +01:00
Callum Styan	c453def8c5	Separate scrape add error checking out into its own function. (#6930 ) * Separate scrape add error checking out into its own function. Signed-off-by: Callum Styan <callumstyan@gmail.com> * pass sampleLimitError to checkAddError instead of returning an error Signed-off-by: Callum Styan <callumstyan@gmail.com> * Return bool, error from checkAddError so we can properly handle ErrNotFound for AddFast. This should in theory never happen, but the previous code path handled this case. Adds a test for this, which master passes and the previous commit fails. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address comment changes. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move sampleAdded inside the loop iteration within append, since that's the only block the variable is used in. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-25 19:31:48 -07:00
Bartlomiej Plotka	c4eefd1b3a	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically a BREAKING CHANGE, but it was like this from the beginning: I just noticed that Prometheus relies on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet, which relies on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for no good reason. I think I found a good balance between convenience and readability with just one method. The smaller the interface, the better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-20 21:14:43 +01:00
Julien Pivotto	d6ad5551c9	Scrape: do not put staleness marker when cache is reused (#7011 ) * Scrape: do not put staleness marker when cache is reused Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-20 17:43:26 +01:00
Julien Pivotto	8907ba6235	Make TSDB use storage errors This fixes #6992, which was introduced by #6777. There was an intermediate component which translated TSDB errors into storage errors, but that component was deleted and this bug went unnoticed until we looked at the Prombench results. Without this, scrape will fail instead of dropping samples or using "Add" when the series have been garbage collected. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-17 22:24:25 +01:00
Julien Pivotto	ed623f69e2	tsdb: don't allow ingesting empty labelsets (#6891 ) * tsdb: don't allow ingesting empty labelsets When we ingest an empty labelset in the head, further blocks can not be compacted, with the error: ``` level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb msg="compaction failed" err="persist head block: write compaction: add series: out-of-order series added with label set \"{}\" / prev: \"{}\"" ``` We should therefore reject those invalid empty labelsets upfront. This can be reproduced with the following: ``` cat << END > prometheus.yml scrape_configs: - job_name: 'prometheus' scrape_interval: 1s basic_auth: username: test password: test metric_relabel_configs: - regex: ".*" action: labeldrop static_configs: - targets: - 127.0.1.1:9090 END ./prometheus --storage.tsdb.min-block-duration=1m ``` And wait a few minutes. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-02 07:18:05 +00:00
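The reproduction config relies on a `labeldrop` rule with regex `".*"`, which strips every label, including `__name__`. A sketch of how that yields an empty label set and how an upfront check rejects it. `labelDrop`, `validateLabelset` and `errEmptyLabelset` are hypothetical names, and the map-based label set is a simplification:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// errEmptyLabelset is a hypothetical error for a series with no labels.
var errEmptyLabelset = errors.New("empty labelset")

// labelDrop removes every label whose name fully matches expr, like a
// relabel rule with action: labeldrop (relabel regexes are fully anchored).
func labelDrop(lset map[string]string, expr string) map[string]string {
	re := regexp.MustCompile("^(?:" + expr + ")$")
	out := make(map[string]string, len(lset))
	for name, value := range lset {
		if !re.MatchString(name) {
			out[name] = value
		}
	}
	return out
}

// validateLabelset rejects an empty label set before it is ingested, instead
// of letting it break head compaction later.
func validateLabelset(lset map[string]string) error {
	if len(lset) == 0 {
		return errEmptyLabelset
	}
	return nil
}

func main() {
	scraped := map[string]string{"__name__": "up", "job": "prometheus", "instance": "127.0.1.1:9090"}
	fmt.Println(validateLabelset(labelDrop(scraped, ".*")))      // every label dropped
	fmt.Println(validateLabelset(labelDrop(scraped, "instance"))) // some labels remain
}
```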
Boqin Qin	0e51cf65e7	scrape_test: fix send-to-closed-channel bugs (#6849 ) Signed-off-by: BurtonQin <bobbqqin@gmail.com>	2020-02-20 13:40:25 +00:00
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only an unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:54 +00:00
Boqin Qin	cdbd42393e	scrape: fix goroutine leak in test (#6812 ) * scrape: fix goroutine leak in test Signed-off-by: BurtonQin <bobbqqin@gmail.com>	2020-02-13 07:53:07 +00:00
Julien Pivotto	9c67fce6e0	Scrape: test samples_post_metric_relabeling when metrics are dropped (#6720 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-29 17:47:36 +00:00
Julien Pivotto	fafb7940b1	Pass over scrape cache to the next scrape (#6670 ) * Pass over scrape cache to the next scrape Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-22 12:13:47 +00:00
Julien Pivotto	46d18112a3	tsdb: error on series with duplicate labels (#6664 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-20 11:05:27 +00:00
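Detecting a duplicate label name is cheap once labels are sorted by name, because duplicates become adjacent. A minimal sketch, assuming a hypothetical `label` pair type and `duplicateLabelName` helper rather than the TSDB's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// label is a single name/value pair; a series is a slice of them.
type label struct{ Name, Value string }

// duplicateLabelName reports a label name that occurs more than once. It
// sorts a copy by name so that any duplicates end up next to each other.
func duplicateLabelName(ls []label) (string, bool) {
	sorted := append([]label(nil), ls...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for i := 1; i < len(sorted); i++ {
		if sorted[i].Name == sorted[i-1].Name {
			return sorted[i].Name, true
		}
	}
	return "", false
}

func main() {
	series := []label{{"__name__", "up"}, {"job", "a"}, {"job", "b"}}
	if name, dup := duplicateLabelName(series); dup {
		fmt.Printf("rejecting series: duplicate label name %q\n", name)
	}
}
```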
gotjosh	05842176a6	Make the scrape.metricMetadataStore interface public To test the implementation of our metric metadata API, we need to represent various states of metadata in the scrape metadata store. That is currently not possible as the interface and method to set the store are private. This changes the interface, list and get methods, and the SetMetadataStore function to be public. Incidentally, the scrapeCache implementation needs to be renamed to match the new signature. Signed-off-by: gotjosh <josue@grafana.com>	2019-12-05 10:29:58 +00:00
Geoffrey Beausire	5cb7987314	Fix relabeling collision when using exported label When using both a label and the suffix+label in the relabel config, it's possible that Prometheus removes the suffix+label for no obvious reason. It's due to a collision when merging labels from the target and from the sample. Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>	2019-11-26 11:03:11 +01:00
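With `honor_labels: false`, a scraped label that clashes with a target label is kept under the `exported_` prefix. The collision arises when the sample already carries that `exported_<name>` label, because the rename then overwrites it. One way to avoid this is to repeat the prefix until the name is free. In the sketch below, `mergeTargetLabels` and the map-based label sets are hypothetical, and it does not claim to be the exact fix in the commit:

```go
package main

import (
	"fmt"
	"sort"
)

const exportedPrefix = "exported_"

// mergeTargetLabels attaches target labels to a scraped sample's labels.
// A clashing scraped label is renamed with the exported_ prefix, repeated
// until the name is unused, so an existing exported_<name> label on the
// sample is never overwritten.
func mergeTargetLabels(sample, target map[string]string) map[string]string {
	out := make(map[string]string, len(sample)+len(target))
	for name, value := range sample {
		out[name] = value
	}
	names := make([]string, 0, len(target))
	for name := range target {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order
	for _, name := range names {
		if scraped, clash := out[name]; clash {
			renamed := exportedPrefix + name
			for {
				if _, taken := out[renamed]; !taken {
					break
				}
				renamed = exportedPrefix + renamed
			}
			out[renamed] = scraped
		}
		out[name] = target[name]
	}
	return out
}

func main() {
	sample := map[string]string{"job": "app", "exported_job": "batch"}
	target := map[string]string{"job": "kubernetes-pods"}
	fmt.Println(mergeTargetLabels(sample, target))
	// map[exported_exported_job:app exported_job:batch job:kubernetes-pods]
}
```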
Dustin Hooten	ca60bf298c	React UI: Implement /targets page (#6276 ) * Add LastScrapeDuration to targets endpoint Signed-off-by: Dustin Hooten <dhooten@splunk.com> * Add Scrape job name to targets endpoint Signed-off-by: Dustin Hooten <dhooten@splunk.com> * Implement the /targets page in react Signed-off-by: Dustin Hooten <dhooten@splunk.com> * Add state query param to targets endpoint Signed-off-by: Dustin Hooten <dhooten@splunk.com> * Use state filter in api call Signed-off-by: Dustin Hooten <dhooten@splunk.com> * api feedback Signed-off-by: Dustin Hooten <dhooten@splunk.com> * pr feedback frontend Signed-off-by: Dustin Hooten <dhooten@splunk.com> * Implement and use localstorage hook Signed-off-by: Dustin Hooten <dhooten@splunk.com> * PR feedback Signed-off-by: Dustin Hooten <dhooten@splunk.com>	2019-11-11 22:42:24 +01:00
Alex Dzyoba	1a38075f83	scrape: Move tests to testutil (#6187 ) Part of the fix for #3242. Signed-off-by: Alex Dzyoba <alex@dzyoba.com>	2019-11-04 16:43:42 -07:00
yuxiaobo	47e51c8b2b	Correct spelling mistakes Signed-off-by: yuxiaobo <yuxiaobogo@163.com>	2019-10-10 18:46:27 +08:00
Brian Brazil	e62f30d497	Correctly handle empty labels from alert templates. (#5845 ) Fixes https://github.com/prometheus/common/issues/36 Move logic handling this into the labels package, so all the cases are handled in one place and we're less likely to have this come up again. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-08-13 11:19:17 +01:00
Chris Marchbanks	529ccff07b	Remove all usages of stretchr/testify Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-08 19:49:27 -06:00
Chris Marchbanks	0685eb5395	Refactor testutil.NewStorage into a new package This avoids a circular dependency between the testutil and storage packages. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-08 19:43:04 -06:00
Brian Brazil	b98e818876	Add scrape_series_added per-scrape metric. (#5546 ) This is an estimate of churn, with series being added to the cache being considered churn. This will have both false positives (e.g. series appearing and disappearing) and false negatives (e.g. series hit sample_limit, but still created in head block), but should be generally useful as-is. Relevant docs live in another repo. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-05-08 22:24:00 +01:00
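The churn estimate counts series that were not already in the cache. The sketch below uses a hypothetical `seriesCache` that only remembers the previous scrape's series, which simplifies the real cache's eviction. It also reproduces one of the false positives mentioned: a series that disappears and comes back is counted again.

```go
package main

import "fmt"

// seriesCache remembers the series seen in the previous scrape only.
type seriesCache struct {
	prev map[string]struct{}
}

// scrape records one scrape's series keys and returns how many were not in
// the cache, i.e. what a scrape_series_added-style metric would report.
func (c *seriesCache) scrape(keys []string) int {
	cur := make(map[string]struct{}, len(keys))
	added := 0
	for _, k := range keys {
		if _, dup := cur[k]; dup {
			continue // the same series twice in one scrape counts once
		}
		cur[k] = struct{}{}
		if _, ok := c.prev[k]; !ok {
			added++
		}
	}
	c.prev = cur
	return added
}

func main() {
	var c seriesCache
	fmt.Println(c.scrape([]string{`up{job="a"}`, `up{job="b"}`})) // 2
	fmt.Println(c.scrape([]string{`up{job="a"}`}))                // 0
	fmt.Println(c.scrape([]string{`up{job="a"}`, `up{job="b"}`})) // 1: b counted again
}
```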
Simon Pasquier	c1682adb2f	Bump prometheus/common to v0.3.0 (#5344 ) * Reload certificates from disk automatically This change bumps github.com/prometheus/common to include https://github.com/prometheus/common/pull/173 Signed-off-by: Simon Pasquier <spasquie@redhat.com> * scrape: close idle connections on reload/stop Signed-off-by: Simon Pasquier <spasquie@redhat.com> * use v0.3.0 tag Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-04-10 13:20:00 +01:00
Brian Brazil	f7184978f4	Protect against memory exhaustion when scraping. Now that we're not losing the scrape cache across failed scrapes, a scrape that continually failed but had varying series or metadata (e.g. timestamps in metric names, plus hitting sample_limit) would grow the cache indefinitely. Add some code to catch that, and flush the cache anyway. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-04-04 19:09:11 +01:00
Brian Brazil	dd3073616c	Don't lose the scrape cache on a failed scrape. This avoids CPU usage increasing when the target comes back. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-04-04 19:09:11 +01:00
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-26 00:01:12 +01:00
Julien Pivotto	4397916cb2	Add honor_timestamps (#5304 ) Fixes #5302 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2019-03-15 10:04:15 +00:00
xjewer	0d1a69353e	scrape: Add global jitter for HA server (#5181 ) * scrape: Add global jitter for HA server Covers the issue in https://github.com/prometheus/prometheus/pull/4926#issuecomment-449039848 where the HA setup becomes a problem for targets that cannot be scraped simultaneously. The new per-server jitter relies on the hostname and external labels, which must be unique. As before, the scrape offset is calculated with regard to absolute time, so even a restart/reload doesn't change the scrape time per scrape target + Prometheus instance. Use the FQDN if possible, otherwise fall back to the hostname. It adds an extra random seed to the server hash so that machines with the same hostname but in different DCs can be distinguished. Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>	2019-03-12 10:46:15 +00:00


69 commits