8815 commits

Author SHA1 Message Date
hanjm 1df05bfd49 Add body_size_limit to prevent large response bodies from bad targets causing Prometheus server OOM (#8827)
Signed-off-by: hanjm <hanjinming@outlook.com>
2021-05-29 07:05:42 +08:00
Levi Harrison 2826fbeeb7
SD: Add target creation failure counter and change failure handling (#8786)
* Added metric and changed failure/drop strategy

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-28 23:50:59 +02:00
Julien Pivotto ae086c73cb
Merge pull request #8757 from songjiayang/refactor-processExternalLabels
Refactor processExternalLabels method with slice copy for left labels
2021-05-25 18:12:16 +02:00
ide-rea ef584a9df6
Improve wal.go segments sequential validation (#8859)
Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>
2021-05-25 15:38:35 +05:30
Julien Pivotto 03b354d4d9
Merge pull request #8838 from eltociear/patch-1
Fix typo in storage.md
2021-05-22 16:22:36 +02:00
Julien Pivotto 8ccde7f45e
Merge pull request #8851 from kjinan/main
typos correct
2021-05-22 16:22:15 +02:00
Augustin Husson 1838068db5
bump codemirror-promql to v0.16.0 (#8856)
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2021-05-20 23:00:15 +02:00
Julien Pivotto ac49c1be4a
Merge pull request #8828 from roidelapluie/fix_prometheus_sd_discovered_targets
Fix the computation of prometheus_sd_discovered_targets
2021-05-20 21:17:15 +02:00
Ikko Ashimine 446e5cc160 Fix typo in storage.md
mulitple -> multiple

Signed-off-by: Ikko Ashimine <eltociear@gmail.com>
2021-05-20 20:32:03 +09:00
kjinan e1370eecde typos correct
Signed-off-by: kjinan <2008kongxiangsheng@163.com>
2021-05-20 09:52:33 +08:00
Julien Pivotto ea33dbf80f
Merge pull request #8822 from kcx2366425574/main
remove unused param
2021-05-19 23:15:17 +02:00
Ben Ye d95b097250
expose seriesToChunkEncoder (#8845)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-19 13:01:35 +01:00
Julius Volz e6bb865ad4
Add upcoming release shepherds and new releases (#8842)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2021-05-18 21:43:38 +02:00
Ben Ye 0a8912433a
allow compact series merger to be configurable (#8836)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-18 18:38:37 +02:00
Matthias Loibl 7e7efaba32
storage: Split chunks if more than 120 samples (#8582)
* storage: Split chunks if more than 120 samples

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* storage: Don't set maxt which is overwritten right away

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* storage: Improve comments on merge_test

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* storage: Improve comments and move code closer to usage

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* tsdb/tsdbutil: Add comment for GenerateSamples

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2021-05-18 18:37:16 +02:00
Chris Marchbanks 4bf7de3b56
Merge pull request #8841 from prometheus/release-2.27
Merge 2.27.1 into main
2021-05-18 09:04:03 -06:00
Julien Pivotto db7f0bcec2
Merge pull request from GHSA-vx57-7f4q-fpc7
* Do not remove /new because it is not part of the route parameter (CVE-2021-29622)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Release 2.27.1

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-05-18 06:47:45 -06:00
Julien Pivotto 58a6c7d699
Merge pull request #8834 from SuperSandro2000/patch-1
Fix indentation
2021-05-16 23:08:08 +02:00
Sandro 0ffcddbee8
Fix indentation
Signed-off-by: Sandro Jäckel <sandro.jaeckel@gmail.com>
2021-05-16 05:27:05 +02:00
Julien Pivotto e1774b6f83 Fix the computation of prometheus_sd_discovered_targets
prometheus_sd_discovered_targets is wrongly calculated when there are
multiple SD configurations in place. One discovery manager can have
multiple groups coming from multiple service discoveries.

When multiple service discovery configs are used, we do not compute the
metric correctly, and instead just set the metric to one of the service
discoveries.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-05-14 22:38:37 +02:00
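The fix described in the commit above can be illustrated with a minimal Go sketch. It assumes target groups are keyed by a (scrape config, provider) pair and that the metric should report the sum over all providers feeding the same scrape config. The names `poolKey`, `group`, and `discoveredTargets` are hypothetical and are not taken from the Prometheus source.

```go
package main

import "fmt"

// poolKey identifies the groups one provider contributes to one scrape config.
// Hypothetical type for illustration only.
type poolKey struct {
	setName  string // scrape config (job) name
	provider string // service discovery provider feeding that config
}

// group stands in for a target group; only its size matters here.
type group struct {
	targets int
}

// discoveredTargets sums targets per scrape config across all providers,
// instead of letting a single provider's count overwrite the others.
func discoveredTargets(groups map[poolKey][]group) map[string]int {
	totals := make(map[string]int)
	for key, gs := range groups {
		for _, g := range gs {
			totals[key.setName] += g.targets
		}
	}
	return totals
}

func main() {
	groups := map[poolKey][]group{
		{setName: "node", provider: "file/0"}:   {{targets: 2}, {targets: 1}},
		{setName: "node", provider: "consul/1"}: {{targets: 4}},
		{setName: "db", provider: "static/2"}:   {{targets: 3}},
	}
	totals := discoveredTargets(groups)
	fmt.Println("node:", totals["node"], "db:", totals["db"])
}
```

With two providers feeding the "node" config, the total covers both of them rather than only the one processed last.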
Julien Pivotto 11bcb93100
Merge pull request #8826 from kjinan/main
typos correct
2021-05-14 12:46:25 +02:00
Julien Pivotto a40743c4c0
Merge pull request #8819 from prometheus/release-2.27
Merge Release 2.27 back to main
2021-05-14 09:50:31 +02:00
kjinan 24869ff2d0 typos correct
Signed-off-by: kjinan <2008kongxiangsheng@163.com>
2021-05-14 09:34:44 +08:00
kcx2366425574 be9c870b06 remove the param that is not used
Signed-off-by: kcx2366425574 <kuangcx@inspur.com>
2021-05-13 20:15:13 +08:00
Chris Marchbanks 24c9b61221
Release 2.27.0 (#8814)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-12 11:32:12 -06:00
ide-rea 277bac622a
validate exemplar labelSet length first (#8816)
* ignore check exemplar labelSet length when append

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>

* validate exemplar labelSet length firstly

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>
2021-05-12 20:17:05 +05:30
songjiayang 9a01472780 Refactor processExternalLabels method with slice copy for left labels
Signed-off-by: songjiayang <songjiayang1@gmail.com>
2021-05-12 21:31:41 +08:00
Julius Volz e313ffa8ab
Fix "instant selector vector" typo in error messages (#8800)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2021-05-10 23:33:26 +02:00
Chris Marchbanks aedd4fa95c
Cut v2.27.0-rc.0 (#8793)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-09 16:10:58 -06:00
Julien Pivotto 0a28f1ae9d
Merge pull request #8796 from hs0210/work
Fix golint issue
2021-05-08 14:57:30 +02:00
Hu Shuai 996848ef40 Fix golint issue
Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-05-08 11:45:29 +08:00
Chris Marchbanks 5b61ac4412
Merge pull request #8792 from prometheus/v2-27-update-dependencies
Update dependencies
2021-05-07 07:48:13 -06:00
Callum Styan 8fd73b1d28
Add Exemplar Remote Write support (#8296)
* Write exemplars to the WAL and send them over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update example for exemplars, print data in a more obvious format.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add metrics for remote write of exemplars.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix incorrect slices passed to send in remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* We need to unregister the new metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Order of exemplar append vs write exemplar to WAL needs to change.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Condense sample/exemplar delivery tests to parameterized sub-tests

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename test methods for clarity now that they also handle exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename counter variable. Fix instances where metrics were not updated correctly

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Add exemplars to LoadWAL benchmark

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* last exemplars timestamp metric needs to convert value to seconds with
ms precision

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Process exemplar records in a separate go routine when loading the WAL.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Regenerate types proto with comments, update protoc version again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Put remote write of exemplars behind a feature flag.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some of Ganesh's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move exemplar remote write feature flag to a config file field.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address Bartek's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more review comments from Ganesh.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add exemplar total label length check.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address a few last review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-05-06 13:53:52 -07:00
Marco Pracucci 4b49ffbad5
Stop the bleed on chunk mapper panic (#8723)
* Added test to reproduce panic on TSDB head chunks truncated while querying

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added test for Querier too

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Stop the bleed on mmap-ed head chunks panic

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Lower memory pressure in tests to ensure it doesn't OOM

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Experiment to not trigger runtime.GC() continuously

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Try to fix test in CI

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Do not call runtime.GC() at all

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* I have no idea why it's failing in CI, skipping tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-06 14:18:59 -06:00
Chris Marchbanks 063ab7555d
Update javascript dependencies
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 13:16:06 -06:00
Chris Marchbanks 45c7c51a3b
Update go dependencies
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 13:15:16 -06:00
Chris Marchbanks 7c7dafc321
Do not snappy encode if record is too large (#8790)
Snappy cannot encode records larger than ~3.7 GB and will panic if an
encoding is attempted. Check to make sure that the record is smaller
than this before encoding.

In the future, we could improve this behavior to still compress large
records (or break them up into smaller records), but this avoids the
panic for users with very large single scrape targets.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 12:56:45 -06:00
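The size check described in the commit above might look roughly like the following Go sketch. It uses only the standard library and does not call a real Snappy encoder. The 32-bit cap and the `32 + n + n/6` worst-case growth formula are assumptions chosen for illustration; they happen to put the cutoff near the ~3.7 GB figure in the commit message. The function names are hypothetical.

```go
package main

import "fmt"

// maxEncoded is the assumed cap on the compressed size: a 32-bit length.
const maxEncoded = 0xffffffff

// worstCaseEncodedLen returns an assumed upper bound on the compressed size
// of an n-byte record, or -1 if the record is too large to encode safely.
func worstCaseEncodedLen(n uint64) int64 {
	if n > maxEncoded {
		return -1
	}
	worst := 32 + n + n/6 // assumed worst-case expansion
	if worst > maxEncoded {
		return -1
	}
	return int64(worst)
}

// shouldCompress reports whether an n-byte record can be handed to the
// encoder; callers would write the record uncompressed otherwise.
func shouldCompress(n uint64) bool {
	return worstCaseEncodedLen(n) >= 0
}

func main() {
	for _, n := range []uint64{1 << 20, 3_600_000_000, 3_700_000_000} {
		fmt.Printf("%d bytes: compress=%v\n", n, shouldCompress(n))
	}
}
```

Checking the length up front keeps the decision cheap: no large buffer is allocated just to find out that encoding would fail.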
Damien Grisonnet b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against an unbounded number of labels and excessive
label lengths. If a sample from a scrape breaks any of these limits,
the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels per sample in a scrape to a user-defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceeds the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauges, one for each label limit, to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
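The three limits from the "Add label scrape limits" commit above can be sketched in Go as follows. This is a minimal illustration, not the code from the pull request: the `labelLimits` and `label` types and the `verifyLabelLimits` function are hypothetical, lengths are assumed to be measured in bytes, and the sketch follows the later bullet in the commit message by applying the limits to `__name__` as well. A zero value disables a limit, as the commit message states.

```go
package main

import "fmt"

// labelLimits holds the three options named in the commit message.
// A zero value means no limit.
type labelLimits struct {
	labelLimit            int
	labelNameLengthLimit  int
	labelValueLengthLimit int
}

// label is a hypothetical name/value pair.
type label struct{ name, value string }

// verifyLabelLimits returns an error for the first limit that the labels of
// one sample exceed; the caller would then fail the whole scrape.
func verifyLabelLimits(lset []label, l labelLimits) error {
	if l.labelLimit > 0 && len(lset) > l.labelLimit {
		return fmt.Errorf("label_limit exceeded: %d labels, limit %d", len(lset), l.labelLimit)
	}
	for _, lb := range lset {
		if l.labelNameLengthLimit > 0 && len(lb.name) > l.labelNameLengthLimit {
			return fmt.Errorf("label_name_length_limit exceeded: name length %d, limit %d",
				len(lb.name), l.labelNameLengthLimit)
		}
		if l.labelValueLengthLimit > 0 && len(lb.value) > l.labelValueLengthLimit {
			return fmt.Errorf("label_value_length_limit exceeded: value length %d, limit %d",
				len(lb.value), l.labelValueLengthLimit)
		}
	}
	return nil
}

func main() {
	sample := []label{{"__name__", "http_requests_total"}, {"job", "api"}, {"path", "/v1/items"}}
	limits := labelLimits{labelLimit: 2, labelValueLengthLimit: 64}
	if err := verifyLabelLimits(sample, limits); err != nil {
		fmt.Println("scrape failed:", err)
	}
}
```

The error messages here report only lengths and counts, which sidesteps the problem of very long label names or values appearing in the message.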
Ben Ye 8f05cd8f9e
tsdb: move exemplar series labels to index entry (#8783)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-05 18:51:16 +01:00
Ben Ye 9e8df5ade9
check latest exemplar timestamp (#8782)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-05 16:28:48 +01:00
Julien Pivotto 27b78c336e
Merge pull request #8701 from prometheus/revert-8690-conorevans/changelog-include-links
Revert "Changelog: Add hyperlinks to PRs"
2021-05-03 16:21:27 +02:00
Julien Pivotto e69093f8f7
Merge pull request #8778 from owen-d/enhancement/expose-rule-metrics
[Enhancement] Expose rule metrics fields
2021-05-01 03:11:36 +02:00
Owen Diehl 23999df27c expose rule metrics fields
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2021-04-30 13:36:44 -04:00
Julien Pivotto 2a4b8e12bb
Merge pull request #8766 from Nick-Triller/consul-sd-always-send-targetgroups
Send empty targetgroup if nothing discovered [consul_sd]
2021-04-30 10:27:41 +02:00
Goutham Veeramachaneni 2efdf660b1
Increase evaluation failures on Commit() (#8770)
I think we should increment the metric here, we're setting the rule
health anyways. This means even if the "evaluation" succeeded, none of
the samples made it to storage.

This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-04-29 14:28:48 +02:00
Hu Shuai 9d7d818629
Fix golint issues caused by typos (#8769)
Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-04-29 10:05:33 +02:00
Nick Triller 15d328750a
Fix typo in SD docs
Signed-off-by: Nick Triller <nicktriller@gmail.com>
2021-04-29 09:06:52 +02:00
Nick Triller fddf4918c0
Send empty targetgroup if nothing discovered
Signed-off-by: Nick Triller <nicktriller@gmail.com>
2021-04-29 09:06:52 +02:00
Julien Pivotto f3b2d2a998
Fix config tests in main branch (#8767)
The merge of 8761 did not catch that the secrets were off by one
because it was not rebased on top of 8693.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-29 00:00:30 +02:00
Levi Harrison fa184a5fc3
Add OAuth 2.0 Config (#8761)
* Introduced oauth2 config into the codebase

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-04-28 14:47:52 +02:00