* scrape: add label limits per scrape
Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.
The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.
The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: add metrics and alert to label limits
Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: apply label limits to __name__ label
Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: remove label limits gauges and refactor
Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write. This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series. For now, its easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behing the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
In its current form this configuration clashes in one of the most widely
used configurations (kube-prometheus). This patch scopes the
configuration to prevent this.
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
Currently, it relies on `job, instance` being the labels completely
identifying a Prometheus instance. However, what's intended is to
simply not match on `remote_name, url`.
Signed-off-by: beorn7 <beorn@grafana.com>
There is certainly a potential to add more of these. This is mostly
meant to introduce the concept and cover a few critical parts.
Signed-off-by: beorn7 <beorn@grafana.com>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* add networking.k8s.io for ingress
level=error ts=2020-10-19T08:32:30.544Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:494: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: ingresses.networking.k8s.io is forbidden: User \"system:serviceaccount:monitoring:prometheus\" cannot list resource \"ingresses\" in API group \"networking.k8s.io\" at the cluster scope"
Signed-off-by: root <likerj@inspur.com>
* Update rbac-setup.yml
Signed-off-by: root <likerj@inspur.com>
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.
This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
* Mixin: Ignore unset remote write timestamp
This pull request ignores the zero value of highest_sent_timestamp_seconds
in Highest Timestamp In vs. Highest Timestamp Sent which just show that
remote write has not been successful yet.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* update the doc link in internal_arthitecture.md
* address reviewer's comment to remove out-dated wrapper
Signed-off-by: Luke Chen <showuon@gmail.com>
* .circleci/config.yml: check mixins
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Run jsonnetfmt
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Install tools in the image instead of using coreos/jsonnet-ci
The latter is deprecated
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Update jsonnetfile.json
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Due to https://github.com/grafana/grafana/issues/15642, this prevents users putting this dashboard in a Grafana folder called 'Prometheus'.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* add test to custom-sd/adapter writeOutput() function
Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
* fix Adapter.writeOutput() function to work on Windows
On that platform, files cannot be moved while a process holds a handle
to them. Added an explicit Close() before that move. With this change,
the unit test succeeds.
Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
* add missing dot to comment
Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
* [bugfix] custom SD: when ip out of order, reflect.deepEqual can not correctly identify whether there is a change
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [format] makefile:Makefile.common:116: common-style
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix] custom sd: simonpasquier comment,It would be simpler to sort the targets alphabetically and keep reflect.DeepEqual.
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix]custom SD:fix sort
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix] custom SD : adapter.go need an empty line after "sort"
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix]custom SD:test sign-off
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix]custom SD: fix adaper_test.go
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
The counter is only increased when tsdb.Open() is called which
Prometheus does only once in its lifetime (when it initializes). If the
corruption can't be recovered, tsdb.Open() returns an error and
Prometheus exits. Hence the metric is either 0 (no corruption) or 1
(corruption detected and repaired). If the latter, the alert isn't
actionable and the only way to resolve it is to restart Prometheus which
would reset the counter.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
While doing so, re-introduce the summary/description
annotations. Also, add a few more rules and tweak a few of the
existing ones.
Signed-off-by: beorn7 <beorn@grafana.com>
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.
This effectively enable keep-alive for the fixed requests.
Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
Although it is spelling mistakes, it might make an affects while reading.
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
Fix http link to https link for secure, modify http to https
in the links of project. Have some http links doesn't
redirect into https.
Co-Authored-By: Nguyen Van Trung trungnv@vn.fujitsu.com
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
* *: bump gRPC dependencies
This change updates the gRPC dependencies to more recent versions:
* github.com/gogo/protobuf => v1.2.0
* github.com/grpc-ecosystem/grpc-gateway => v1.6.3
* google.golang.org/grpc => v1.17.0
In addition scripts/genproto.sh leverages Go modules information instead of
hardcoding SHA1 commits. This ensures that the code is generated from
the exact same sources.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Run 'make proto' in CI
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Revert tabs -> spaces change
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix 'make proto' step
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* 'go get' grpc/protobuf dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Prepopulate cache with go mod download
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* update promlog to latest version
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* Update api tests, fix main setup
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* tidy go.sum
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* revendor prometheus/common
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* only initialize config; use kingpin for remote_storage_adapter
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* actually parse the flags
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* clean up imports
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* scrape: fix TestTargetScrapeScrapeCancel
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
As alertmanager needs to be configured in the config file in Prometheus 2, I think it is useful to have it in the example config.
Also renamed the rules in the example config so they are explicitely yml files.
* k8s: Support discovery of ingresses
* Move additional labels below allocation
This makes it more obvious why the additional elements are allocated.
Also fix allocation for node where we only set a single label.
* k8s: Remove port from ingress discovery
* k8s: Add comment to ingress discovery example
Kubernetes 1.7+ no longer exposes cAdvisor metrics on the Kubelet
metrics endpoint. Update the example configuration to scrape cAdvisor
in addition to Kubelet. The provided configuration works for 1.7.3+
and commented notes are given for 1.7.2 and earlier versions.
Also remove the comment about node (Kubelet) CA not matching the master
CA. Since the example no longer connects directly to the nodes, it
doesn't matter what CA they're using.
References:
- https://github.com/kubernetes/kubernetes/issues/48483
- https://github.com/kubernetes/kubernetes/pull/49079