prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-10 07:34:04 -08:00

Author	SHA1	Message	Date
Ganesh Vernekar	8b70e87ab9	Merge remote-tracking branch 'upstream/main' into sparse-refactor Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-08-05 12:16:08 +05:30
Arunprasad Rajkumar	5527e26efc	scrape: fix 'target_limit exceeded error' when reloading conf with 0 Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>	2021-07-27 17:34:22 +05:30
beorn7	5de2df752f	Hacky implementation of protobuf parsing This "brings back" protobuf parsing, with the only goal to play with the new sparse histograms. The Prom-2.x style parser is highly adapted to the structure of the Prometheus text format (and later OpenMetrics). Some jumping through hoops is required to feed protobuf into it. This is not meant to be a model for the final implementation. It should just enable sparse histogram ingestion at a reasonable efficiency. Following known shortcomings and flaws: - No tests yet. - Summaries and legacy histograms, i.e. without sparse buckets, are ignored. - Staleness doesn't work (but this could be fixed in the appender, to be discussed). - No tricks have been tried that would be similar to the tricks the text parsers do (like direct pointers into the HTTP response body). That makes things weird here. Tricky optimizations only make sense once the final format is specified, which will almost certainly not be the old protobuf format. (Interestingly, I expect this implementation to be in fact much more efficient than the original protobuf ingestion in Prom-1.x.) - This is using a proto3 version of metrics.proto (mostly to be consistent with the other protobuf uses). However, proto3 sees no difference between an unset field. We depend on that to distinguish between an unset timestamp and the timestamp 0 (1970-01-01, 00:00:00 UTC). In this experimental code, we just assume that timestamp is never specified and therefore a timestamp of 0 always is interpreted as "not set". Signed-off-by: beorn7 <beorn@grafana.com>	2021-07-01 01:35:11 +02:00
Julius Volz	9d495afd2c	Remove trailing zeros in scrape timeout header See https://twitter.com/AviKivity/status/1405147699557638145 and https://twitter.com/juliusvolz/status/1405790211670515712 Signed-off-by: Julius Volz <julius.volz@gmail.com>	2021-06-18 09:38:12 +02:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
hanjm	1df05bfd49	Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827 ) Signed-off-by: hanjm <hanjinming@outlook.com>	2021-05-29 07:05:42 +08:00
Levi Harrison	2826fbeeb7	SD: Add target creation failure counter and change failure handling (#8786 ) * Added metric and changed failure/drop strategy Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-28 23:50:59 +02:00
Callum Styan	8fd73b1d28	Add Exemplar Remote Write support (#8296 ) * Write exemplars to the WAL and send them over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update example for exemplars, print data in a more obvious format. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add metrics for remote write of exemplars. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix incorrect slices passed to send in remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * We need to unregister the new metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * Order of exemplar append vs write exemplar to WAL needs to change. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Condense sample/exemplar delivery tests to parameterized sub-tests Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename test methods for clarity now that they also handle exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename counter variable. Fix instances where metrics were not updated correctly Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Add exemplars to LoadWAL benchmark Signed-off-by: Callum Styan <callumstyan@gmail.com> * last exemplars timestamp metric needs to convert value to seconds with ms precision Signed-off-by: Callum Styan <callumstyan@gmail.com> * Process exemplar records in a separate go routine when loading the WAL. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments related to clarifying comments and variable names. Also refactor sample/exemplar to enqueue prompb types. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Regenerate types proto with comments, update protoc version again. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Put remote write of exemplars behind a feature flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some of Ganesh's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move exemplar remote write feature flag to a config file field. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address Bartek's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allocate exemplar buffers in queue_manager if we're not going to send exemplars over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add ValidateExemplar function, validate exemplars when appending to head and log them all to WAL before adding them to exemplar storage. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more reivew comments from Ganesh. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add exemplar total label length check. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address a few last review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-05-06 13:53:52 -07:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777 ) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide some mechanism to defend against unbound number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceed the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauge, one for each label limit to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very very long. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
Marco Pracucci	4da5c25ea4	Upgrade prometheus/common to v0.21.0 Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-04-21 12:19:16 +02:00
Julien Pivotto	e14176756f	Merge pull request #8601 from dgl/fix-8243 Ensure that timestamp comparison uses wall clock time	2021-03-16 16:00:25 +01:00
Callum Styan	289ba11b79	Add circular in-memory exemplars storage (#6635 ) * Add circular in-memory exemplars storage Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Signed-off-by: Martin Disibio <mdisibio@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com> * Fix some comments, clean up exemplar metrics struct and exemplar tests. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix exemplar query api null vs empty array issue. Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-03-16 15:17:45 +05:30
David Leadbeater	21a282fabe	Ensure that timestamp comparison uses wall clock time It's not possible to assume subtraction and addition of a time.Time will result in consistent values. Signed-off-by: David Leadbeater <dgl@dgl.cx>	2021-03-15 13:05:17 +00:00
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489 ) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
Brian Brazil	ebe0da7a72	Protect sp.loops from concurrent access. (#8176 ) Manager.reload takes the mutex that would make it safe, however releases it before the goroutines spawned are finished with it. Thus more explicit locking of scrapePool.Sync/stop/reload is needed. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-11-12 16:06:25 +00:00
Brian Brazil	3f8e51738c	More granular locking for scrapeLoop. (#8104 ) Don't lock for all of Sync/stop/reload as that holds up /metrics and the UI when they want a list of active/dropped targets. Instead take advantage of the fact that Sync/stop/reload cannot be called concurrently by the scrape Manager and lock just on the targets themselves. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-10-26 14:46:20 +00:00
Julien Pivotto	be5ba1a62d	Fix wordings Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 21:44:36 +02:00
Julien Pivotto	671f7c66e5	Adjust comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 18:28:02 +02:00
Julien Pivotto	627ff84599	Adjust flag Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 18:25:52 +02:00
Julien Pivotto	536dfb6234	Add an experimental, hidden flag Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 17:31:46 +02:00
Julien Pivotto	b90c7a55da	Simplify logic Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-06 21:17:16 +02:00
Julien Pivotto	ccc1df3140	Fix comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-06 13:48:24 +02:00
Julien Pivotto	98e14611a5	Move the tolerance logic in the loop function. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-05 18:20:10 +02:00
Julien Pivotto	6544f95403	Introduce timestamp tolerance in scrapes Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-05 18:20:10 +02:00
iurii	bd53b5ff37	Unnecessary go routine spawn. (#7879 ) * Unnecessary go routine spawn. * Remove unnecessary local variable creation. Signed-off-by: iurii <iurii@coins.ph> Co-authored-by: iurii <iurii@coins.ph>	2020-09-02 16:26:42 +01:00
Andy Bursavich	4e6a94a27d	Invert service discovery dependencies (#7701 ) This also fixes a bug in query_log_file, which now is relative to the config file like all other paths. Signed-off-by: Andy Bursavich <abursavich@gmail.com>	2020-08-20 13:48:26 +01:00
Julien Pivotto	2899773b01	Do not stop scrapes in progress during reload (#7752 ) * Do not stop scrapes in progress during reload. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-07 15:58:16 +02:00
johncming	5578c96307	scrape: fix typo. (#7712 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-01 09:56:21 +01:00
Julien Pivotto	7b5507ce4b	Scrape: defer report (#7700 ) When I started wotking on target_limit, scrapeAndReport did not exist yet. Then I simply rebased my work without thinking. It appears that there is a lot that can be inline if I defer() the report. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-31 19:11:08 +02:00
Annanay	ec562f152b	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-31 13:03:56 +05:30
Julien Pivotto	f482c7bdd7	Add per scrape-config targets limit (#7554 ) * Add per scrape-config targets limit Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-30 14:20:24 +02:00
Annanay	7f98a744e5	Add context to Appender interface Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-24 19:40:51 +05:30
johncming	490f9c664e	scrape: remove two blank lines. (#7610 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-07-19 07:34:04 +02:00
Julien Pivotto	754461b74f	Reuse the same appender for report and scrape (#7562 ) Additionally, implement isolation in collectResultAppender. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-16 13:53:39 +02:00
Julien Pivotto	190addffd8	Change Scrape Loop mtx to Mutex (#7553 ) It was still RWLock but we never use the read lock.. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-11 15:37:13 +02:00
Brian Brazil	f9d21f10ec	Only relabelling should apply for scrape_samples_scraped_post_relabelling. (#7342 ) More consistent variable names. Fixes #7298 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-06-04 16:00:37 +01:00
Brian Brazil	c9565f08aa	Pass reference to checkAddError so appendErrors is updated. (#7294 ) This was preventing the warnings from being logged. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-05-26 15:14:55 +01:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
Callum Styan	c453def8c5	Separate scrape add error checking out into it's own function. (#6930 ) * Separate scrape add error checking out into it's own function. Signed-off-by: Callum Styan <callumstyan@gmail.com> * pass sampleLimitError to checkAddError instead of returning an error Signed-off-by: Callum Styan <callumstyan@gmail.com> * Return bool, error from checkAddError so we can properly handle ErrNotFound for AddFast. This should in theory never happen, but the previous code path handled this case. Adds a test for this, which master passes and the previous commit fails. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address comment changes. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move sampleAdded inside the loop iteration within append, since that's the only block the variable is used in. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-25 19:31:48 -07:00
Julien Pivotto	d6ad5551c9	Scrape: do not put staleness marker when cache is reused (#7011 ) * Scrape: do not put staleness marker when cache is reused Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-20 17:43:26 +01:00
Julien Pivotto	8907ba6235	Make TSDB use storage errors This fixes #6992, which was introduced by #6777. There was an intermediate component which translated TSDB errors into storage errors, but that component was deleted and this bug went unnoticed, until we were watching at the Prombench results. Without this, scrape will fail instead of dropping samples or using "Add" when the series have been garbage collected. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-17 22:24:25 +01:00
Björn Rabenstein	d80b0810c1	Move crucial actions to defer (#6918 ) With defer having less of a performance penalty, there is no reason not to do those crucial operations via defer. Context: With isolation in place, if we forget to Commit/Rollback, the low watermark will get stuck forever. The current code should not have any bugs, but moving to defer helps to avoid future bugs. This is also moving the `closeAppend` in the `Commit` implementation itself to defer. If logging to the WAL fails, we would have missed the `closeAppend`. Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-13 20:54:47 +01:00
Brian Brazil	5da8990053	Log scrape append failures as debug rather than warn. (#6852 ) This is most likely due to an endpoint not producing valid metrics output, which we should treat the same as a failed scrape, and thus not spam the application logs with it. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-03-06 00:46:03 +00:00
李国忠	52025bd7a9	[comments] change word ‘wheter’ to ‘whether’ (#6912 ) * [comments] change word ‘wheter’ to ‘whether’ Signed-off-by: fuling <fuling.lgz@alibaba-inc.com> * [comments] change word ‘wheter’ to ‘whether’ Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2020-03-02 13:51:24 +05:30
Julien Pivotto	ed623f69e2	tsdb: don't allow ingesting empty labelsets (#6891 ) * tsdb: don't allow ingesting empty labelsets When we ingest an empty labelset in the head, further blocks can not be compacted, with the error: ``` level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb msg="compaction failed" err="persist head block: write compaction: add series: out-of-order series added with label set \"{}\" / prev: \"{}\"" ``` We should therefore reject those invalid empty labelsets upfront. This can be reproduced with the following: ``` cat << END > prometheus.yml scrape_configs: - job_name: 'prometheus' scrape_interval: 1s basic_auth: username: test password: test metric_relabel_configs: - regex: ".*" action: labeldrop static_configs: - targets: - 127.0.1.1:9090 END ./prometheus --storage.tsdb.min-block-duration=1m ``` And wait a few minutes. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-02 07:18:05 +00:00
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only a unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:54 +00:00
gotjosh	8b49c9285d	scrape: Add metrics to track bytes and entries in the metadata cache (#6675 ) Signed-off-by: gotjosh <josue@grafana.com>	2020-01-29 11:13:18 +00:00
Julien Pivotto	fafb7940b1	Pass over scrape cache to the next scrape (#6670 ) * Pass over scrape cache to the next scrape Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-22 12:13:47 +00:00
gotjosh	05842176a6	Make the scrape.metricMetadataStore interface public To test the implementation of our metric metadata API, we need to represent various states of metadata in the scrape metadata store. That is currently not possible as the interface and method to set the store are private. This changes the interface, list and get methods, and the SetMetadaStore function to be public. Incidentally, the scrapeCache implementation needs to be renamed to match the new signature. Signed-off-by: gotjosh <josue@grafana.com>	2019-12-05 10:29:58 +00:00
Geoffrey Beausire	5cb7987314	Fix relabaling collision when using exported label When using both a label and the suffix+label in the relabel config. It's possible that Prometheus remove the suffx+label for no obvious reason. It's due to a collision when merging labels from target and from the sample. Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>	2019-11-26 11:03:11 +01:00

1 2

92 commits