
8934 commits

Author SHA1 Message Date
Callum Styan 8fd73b1d28
Add Exemplar Remote Write support (#8296)
* Write exemplars to the WAL and send them over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update example for exemplars, print data in a more obvious format.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add metrics for remote write of exemplars.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix incorrect slices passed to send in remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* We need to unregister the new metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Order of exemplar append vs write exemplar to WAL needs to change.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Condense sample/exemplar delivery tests to parameterized sub-tests

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename test methods for clarity now that they also handle exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename counter variable. Fix instances where metrics were not updated correctly

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Add exemplars to LoadWAL benchmark

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* last exemplars timestamp metric needs to convert value to seconds with
ms precision

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Process exemplar records in a separate go routine when loading the WAL.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Regenerate types proto with comments, update protoc version again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Put remote write of exemplars behind a feature flag.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some of Ganesh's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move exemplar remote write feature flag to a config file field.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address Bartek's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more review comments from Ganesh.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add exemplar total label length check.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address a few last review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-05-06 13:53:52 -07:00
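The "exemplar total label length check" in #8296 can be pictured with a minimal sketch. The `Label` type, the function name, and the 128-rune limit are assumptions for illustration (128 is the figure given by the OpenMetrics specification); none of them are quoted from the commit.

```go
package main

import (
	"errors"
	"unicode/utf8"
)

// Label is a hypothetical stand-in for one name/value pair on an exemplar.
type Label struct{ Name, Value string }

// maxExemplarLabelRunes is an assumed limit taken from the OpenMetrics
// specification, not from the commit message.
const maxExemplarLabelRunes = 128

var errExemplarLabelLength = errors.New("exemplar label set is too long")

// validateExemplarLabels sums the rune counts of every label name and value
// and rejects the set as soon as the running total exceeds the limit.
func validateExemplarLabels(labels []Label) error {
	total := 0
	for _, l := range labels {
		total += utf8.RuneCountInString(l.Name) + utf8.RuneCountInString(l.Value)
		if total > maxExemplarLabelRunes {
			return errExemplarLabelLength
		}
	}
	return nil
}
```

Running a check like this before the exemplar is logged would match the ordering described in the commit (validate, write to WAL, then add to exemplar storage).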
Marco Pracucci 4b49ffbad5
Stop the bleed on chunk mapper panic (#8723)
* Added test to reproduce panic on TSDB head chunks truncated while querying

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added test for Querier too

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Stop the bleed on mmap-ed head chunks panic

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Lower memory pressure in tests to ensure it doesn't OOM

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Experiment to not trigger runtime.GC() continuously

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Try to fix test in CI

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Do not call runtime.GC() at all

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* I have no idea why it's failing in CI, skipping tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-06 14:18:59 -06:00
Chris Marchbanks 063ab7555d
Update javascript dependencies
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 13:16:06 -06:00
Chris Marchbanks 45c7c51a3b
Update go dependencies
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 13:15:16 -06:00
Chris Marchbanks 7c7dafc321
Do not snappy encode if record is too large (#8790)
Snappy cannot encode records larger than ~3.7 GB and will panic if an
encoding is attempted. Check to make sure that the record is smaller
than this before encoding.

In the future, we could improve this behavior to still compress large
records (or break them up into smaller records), but this avoids the
panic for users with very large single scrape targets.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 12:56:45 -06:00
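The "~3.7 GB" figure in #8790 can be reproduced with a small sketch. It assumes the snappy block format's worst-case output size of `32 + n + n/6` bytes and a 32-bit length limit; the function names are hypothetical and this is not the code from the commit.

```go
package main

// maxBlockLen is the largest length a 32-bit length field can describe.
const maxBlockLen = 0xffffffff

// maxEncodedLen returns the assumed worst-case compressed size for an input
// of srcLen bytes, or -1 if the input is too large to encode at all.
func maxEncodedLen(srcLen uint64) int64 {
	if srcLen > maxBlockLen {
		return -1
	}
	n := 32 + srcLen + srcLen/6
	if n > maxBlockLen {
		return -1
	}
	return int64(n)
}

// canCompress reports whether a record of recordLen bytes is small enough
// to hand to the encoder; larger records would be written uncompressed.
func canCompress(recordLen uint64) bool {
	return maxEncodedLen(recordLen) >= 0
}
```

Under these assumptions a 3.6 GB record is still accepted while a 3.7 GB record is not, which is consistent with the limit quoted in the commit message.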
Damien Grisonnet b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against an unbounded number of labels and excessive
label lengths. If any of these limits is exceeded by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceeds the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauges, one for each label limit, to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
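The final behaviour described in #8777 (three limits, zero meaning "no limit", `__name__` included after the third change, truncated values in error messages) might look roughly like the sketch below. The types and function names are hypothetical, lengths are measured in bytes as an assumption, and this is not the code from the PR.

```go
package main

import "fmt"

// Label is a hypothetical name/value pair attached to a sample.
type Label struct{ Name, Value string }

// LabelLimits mirrors the three scrape options; 0 disables a limit.
type LabelLimits struct {
	LabelLimit            int
	LabelNameLengthLimit  int
	LabelValueLengthLimit int
}

// truncate shortens s to at most max runes so error messages stay readable.
func truncate(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max]) + "..."
}

// verifyLabelLimits returns an error for the first limit a sample breaks.
// Every label is checked, including __name__.
func verifyLabelLimits(lset []Label, lim LabelLimits) error {
	if lim.LabelLimit > 0 && len(lset) > lim.LabelLimit {
		return fmt.Errorf("label_limit exceeded: %d labels, limit %d", len(lset), lim.LabelLimit)
	}
	for _, l := range lset {
		if lim.LabelNameLengthLimit > 0 && len(l.Name) > lim.LabelNameLengthLimit {
			return fmt.Errorf("label_name_length_limit exceeded by %q", truncate(l.Name, 20))
		}
		if lim.LabelValueLengthLimit > 0 && len(l.Value) > lim.LabelValueLengthLimit {
			return fmt.Errorf("label_value_length_limit exceeded by %q", truncate(l.Value, 20))
		}
	}
	return nil
}
```

A caller would fail the whole scrape on the first non-nil error, as the commit message describes.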
Ben Ye 8f05cd8f9e
tsdb: move exemplar series labels to index entry (#8783)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-05 18:51:16 +01:00
Ben Ye 9e8df5ade9
check latest exemplar timestamp (#8782)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-05 16:28:48 +01:00
Julien Pivotto 27b78c336e
Merge pull request #8701 from prometheus/revert-8690-conorevans/changelog-include-links
Revert "Changelog: Add hyperlinks to PRs"
2021-05-03 16:21:27 +02:00
Julien Pivotto e69093f8f7
Merge pull request #8778 from owen-d/enhancement/expose-rule-metrics
[Enhancement] Expose rule metrics fields
2021-05-01 03:11:36 +02:00
Owen Diehl 23999df27c expose rule metrics fields
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
2021-04-30 13:36:44 -04:00
Julien Pivotto 2a4b8e12bb
Merge pull request #8766 from Nick-Triller/consul-sd-always-send-targetgroups
Send empty targetgroup if nothing discovered [consul_sd]
2021-04-30 10:27:41 +02:00
Goutham Veeramachaneni 2efdf660b1
Increase evaluation failures on Commit() (#8770)
I think we should increment the metric here, we're setting the rule
health anyways. This means even if the "evaluation" succeeded, none of
the samples made it to storage.

This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2021-04-29 14:28:48 +02:00
Hu Shuai 9d7d818629
Fix golint issues caused by typos (#8769)
Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-04-29 10:05:33 +02:00
Nick Triller 15d328750a
Fix typo in SD docs
Signed-off-by: Nick Triller <nicktriller@gmail.com>
2021-04-29 09:06:52 +02:00
Nick Triller fddf4918c0
Send empty targetgroup if nothing discovered
Signed-off-by: Nick Triller <nicktriller@gmail.com>
2021-04-29 09:06:52 +02:00
Julien Pivotto f3b2d2a998
Fix config tests in main branch (#8767)
The merge of 8761 did not catch that the secrets were off by one
because it was not rebased on top of 8693.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-29 00:00:30 +02:00
Levi Harrison fa184a5fc3
Add OAuth 2.0 Config (#8761)
* Introduced oauth2 config into the codebase

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-04-28 14:47:52 +02:00
n888 7c028d59c2
Add lightsail service discovery (#8693)
Signed-off-by: N888 <drifto@gmail.com>
2021-04-28 11:29:12 +02:00
Matthew Smedberg 8490273bac docs :: querying :: functions :: label_replace
Clarify documentation for label_replace() because of ambiguities
between label keys and label values.

Signed-off-by: Matthew Smedberg <matthew.smedberg@gmail.com>
2021-04-27 10:32:36 -06:00
Julien Pivotto e36e5fa833
Merge pull request #8731 from yeya24/update
Improve grouping label match logic
2021-04-27 00:55:22 +02:00
Fiona Liao 9b83d8330a
Fix memSafeIterator.Seek() (#8748)
* Add range query test cases

This includes a couple of failing ones that double count some points due
to the iterator seek bug.

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Add Seek() implementation for memSafeIterator

Previously, calling memSafeIterator.Seek() would call the Seek() method
on its embedded iterator. This was causing the embedded iterator and the
memSafeIterator to get out of sync because when the embedded Seek()
moved to the next element of the embedded iterator, memSafeIterator
didn't "know" about it. memSafeIterator has to "know" when the embedded
iterator has moved to be able to work out when it should be reading from
its buffer rather than the embedded iterator.

Used same logic as for xorIterator.Seek() (which in runtime is used as
the embedded iterator) - return false if the iterator has an error and
try to move to next element if the required time hasn't been reached, or
if no elements have been read yet. The memSafeIterator.Next() method is
being called so memSafeIterator.i is always accurate.

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Add tsdb package test

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
2021-04-27 00:43:22 +02:00
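The idea behind the `Seek()` fix in #8748 is that a wrapper which tracks its own position must advance through its own `Next()`, never through the embedded iterator's `Seek()`. A minimal sketch follows; the `sample`, `sliceIterator`, and `countingIterator` types are hypothetical stand-ins, not the TSDB types.

```go
package main

// sample is a hypothetical timestamp/value pair.
type sample struct {
	t int64
	v float64
}

// sliceIterator stands in for the embedded iterator.
type sliceIterator struct {
	samples []sample
	idx     int // -1 before the first Next
}

func (it *sliceIterator) Next() bool {
	if it.idx+1 >= len(it.samples) {
		return false
	}
	it.idx++
	return true
}

// countingIterator wraps sliceIterator and keeps its own position i,
// which must stay in sync with the embedded iterator.
type countingIterator struct {
	inner *sliceIterator
	i     int // index of the current sample, -1 before the first Next
	err   error
}

func newCountingIterator(s []sample) *countingIterator {
	return &countingIterator{inner: &sliceIterator{samples: s, idx: -1}, i: -1}
}

func (it *countingIterator) Next() bool {
	if !it.inner.Next() {
		return false
	}
	it.i++
	return true
}

func (it *countingIterator) At() sample { return it.inner.samples[it.inner.idx] }

// Seek advances only through it.Next(), so i is updated on every step.
func (it *countingIterator) Seek(t int64) bool {
	if it.err != nil {
		return false
	}
	// Nothing read yet: move to the first sample.
	if it.i < 0 && !it.Next() {
		return false
	}
	// Keep stepping until the requested time is reached.
	for it.At().t < t {
		if !it.Next() {
			return false
		}
	}
	return true
}
```

Because every step goes through `countingIterator.Next()`, the counter `i` always reflects how far the embedded iterator has moved, which is the invariant the commit message says was being broken.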
Andrew Pickering 5047d36a77
Upgrade cssnano from v4.1.10 to v4.1.11 (#8759)
Removes the is-svg dependency since cssnano no longer depends on it.

Signed-off-by: Andy Pickering <anpicker@redhat.com>
2021-04-26 19:14:19 +02:00
Andrew Pickering 67514a5282
Upgrade sanitize-html from v1.27.5 to v2.3.3 (#8760)
sanitize-html v1.27.5 had several issues that are fixed in newer
versions.

Signed-off-by: Andy Pickering <anpicker@redhat.com>
2021-04-26 12:17:36 +02:00
ZouYu c7262f0d70
Fix some gofmt warnings (#8743)
Signed-off-by: Zou Yu <zouy.fnst@cn.fujitsu.com>
2021-04-22 08:43:30 -06:00
Julien Pivotto 7a2159e308
Merge pull request #8740 from GezimSejdiu/main
Fix a broken link for the bcrypt ref. at the web-config.yml example
2021-04-22 01:53:29 +02:00
Marco Pracucci 52df5ef7a3
TSDB: do not allocate exemplars buffer if exemplars are disabled (#8746)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 20:02:21 +05:30
Julien Pivotto 896f37f1a5
Merge pull request #8744 from pracucci/upgrade-common
Upgrade prometheus/common to v0.21.0
2021-04-21 16:04:14 +02:00
Marco Pracucci 42c6f042cf
Cleanup go.sum
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 12:33:19 +02:00
Marco Pracucci 4da5c25ea4
Upgrade prometheus/common to v0.21.0
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 12:19:16 +02:00
Gezim Sejdiu 97acd170b2 Fix a broken link for the bcrypt ref. at the web-config.yml example
Signed-off-by: Gezim Sejdiu <g.sejdiu@gmail.com>
2021-04-20 22:43:37 +02:00
Julien Pivotto 5f4a5e79ea
Merge pull request #8737 from roidelapluie/scwpointer
scaleway_sd_config: be more cautious with pointers
2021-04-20 12:07:33 +02:00
Julien Pivotto 73237c04bf scaleway_sd_config: be more cautious with pointers
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-19 20:40:14 +02:00
Julien Pivotto a9a5f04ff9
UI: Move away from deprecated node-sass (#8733)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-19 09:01:45 +02:00
Julien Pivotto d78e9e973e
Merge pull request #8729 from code1305/azure-sd-errfix
return right error if any target creation fails
2021-04-19 01:13:39 +02:00
yeya24 d698e062dc improve grouping label match logic
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-04-16 22:04:58 -04:00
code1305 9c705ffdfb err fix if target creation fails
return right error if any target creation fails. Need to wrap the right error.

Signed-off-by: Anshul <anshulkhandelwal.nitj@gmail.com>
2021-04-16 20:37:52 +05:30
Bartlomiej Plotka 80545bfb2e
Instrumented circular exemplar storage. (#8712)
* Instrumented circular storage.

Fixes: https://github.com/prometheus/prometheus/issues/8708
Fixes: https://github.com/prometheus/prometheus/issues/8707

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed CB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Julien comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Callum comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-04-16 13:44:53 +01:00
Julien Pivotto 85670a8040
Merge pull request #8721 from zhangshj-inspur/shaojie-branch
update redirected url
2021-04-16 10:11:03 +02:00
Łukasz Mierzwa 850dbda5c3
Add a dark theme (#8604)
* Upgrade bootstrap and reactstrap to the latest version

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add SASS support

node-sass is needed for cra to handle SCSS files instead of pure CSS.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme

This adds a dark theme and UI controls to switch between themes.
Dark theme will require some CSS changes that will follow in future commits.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a margin to Prometheus brand

There is no space between 'Prometheus' brand text and the toggle button when using mobile device.
This adds a margin to the button that's only rendered on mobile

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for CollapsibleAlertPanel

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for RulesContent

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for Config

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Use bootstrap classes for margins

We can override margins via bootstrap css classes instead of loading custom css module.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for QueryStatsView

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for MetricsExplorer

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for 'Clear time' button
This button had some custom css based on light bootstrap theme so it needs to be adjusted for dark theme.
This change re-uses bootstrap styles used for input components instead of copying color values

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add dark theme for Graph panel input

This makes the whole input group look consistent in dark mode as the old styles were made to blend it with the default bootstrap theme.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for CME expression input

This change splits current CME theme into 3:
1 - base theme used for both light and dark mode
2 - light mode specific theme that overrides base
3 - dark mode specific theme that overrides base

To make it all work we also need to move theme to dynamic config, so when theme value
in ThemeContext changes CME input will apply a new theme.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Add a dark theme for /graph page tabs

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Fix metrics explorer modal scroll

bootstrap-dark breaks scrolling on the metrics modal, so we need an extra rule to fix that.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Move App.css rules to themes/_shared.scss

This completes splitting styles into light and dark theme.
It also fixes some small issues with themes as now all styles from App.css are applied correctly.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Move html{} styles to a dedicated file

html block is root document so styles for it cannot be nested under theme classes.
Move it out and add a bit of documentation to explain what each file does.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Fix reboot styles overriding other FontAwesome classes

Both bootstrap themes we use import reboot classes (https://getbootstrap.com/docs/4.6/content/reboot/) which has the side effect of overriding other classes. We need reboot to be applied as defaults for the browser, so it needs to be moved out of theme class selectors. But because reboot requires scss variables we need to feed it something, for that we use the default light theme, so it gets imported there and browser will use style of the default theme to reset default (unthemed) styles.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>

* Move codicon font to app.scss

This needs to be applied globally, not per theme.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2021-04-15 18:14:07 +02:00
nberkley f9e2dd0697
Add support for smaller block chunk segment allocations (#8478)
* Add support for --storage.tsdb.max-chunk-size to support small chunks for space limited Prometheus instances.

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/compact.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/db.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update cmd/prometheus/main.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Change naming scheme to

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Add a lower bound to --storage.tsdb.max-block-chunk-segment-size

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update storage.md to explain what a chunk segment is

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Force tests

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Fix code style

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-04-15 14:25:01 +05:30
Julien Pivotto 39d79c3cfb
Merge pull request #8719 from Nexucis/feature/cm-promql-v0.15
Bump cm-promql to v0.15.0
2021-04-15 01:21:49 +02:00
zhangshj 1956f07197 update redirected url
Signed-off-by: zhangshj <zhangshj@inspur.com>
2021-04-14 13:54:40 +08:00
Julien Pivotto ea6f6bba74
Enable parsing strings in humanize functions (#8682)
* Enable parsing strings in humanize functions

This is useful to humanize count_values or buckets labels.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-14 00:30:15 +02:00
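For #8682, letting a humanize function accept strings amounts to converting the argument to a float before formatting it. A minimal sketch of such a conversion is shown below; the function name `toFloat` and the set of accepted types are assumptions, not the code from the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// toFloat converts a template argument to float64. Strings such as label
// values from count_values or histogram "le" buckets are parsed as numbers.
func toFloat(v interface{}) (float64, error) {
	switch x := v.(type) {
	case float64:
		return x, nil
	case int:
		return float64(x), nil
	case string:
		return strconv.ParseFloat(x, 64)
	default:
		return 0, fmt.Errorf("cannot convert %T to float", v)
	}
}
```

With a helper like this, a bucket label value such as `"0.5"` can be passed to a humanize function in the same way as a numeric sample value.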
Augustin Husson bfc022fdf4 use the metricsNames in PromQLExtension & update the import path
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2021-04-13 22:13:47 +02:00
Augustin Husson 7071b94a07 remove unused import
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2021-04-13 22:13:02 +02:00
Augustin Husson 4c25b48ed5 bump cm-promql to v0.15.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2021-04-13 22:12:42 +02:00
Ben Kochie 62afcabd01
Merge pull request #8716 from prometheus/superq/bump_tool
Update Makefile.common
2021-04-13 14:40:18 +02:00
Bogdan Drutu d1ced85d7a
Bump k8s.io/* from 0.20.5 to 0.21.0 (#8714)
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
2021-04-13 10:00:00 +02:00
Julien Pivotto 5bce801a09
Rename discovery/dockerswarm to discovery/moby (#8691)
This makes it clear that the dockerswarm package does more than Docker
Swarm: it also handles Docker.

I have picked moby as it is the upstream name: https://mobyproject.org/

There is no user-facing change, except in the case of a bad
configuration. Previously, a user who would have a bad docker sd config
would see an error like:

> field xx not found in type dockerswarm.plain

Now that error would be turned into:

> field xx not found in type moby.plain

While not perfect, it should at least avoid confusion between docker and
dockerswarm.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-13 09:33:54 +02:00