Commit graph

280 commits

Author SHA1 Message Date
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Julien Pivotto 20c6739adc
Merge pull request #8833 from hanjm/feature/add-scape-read-body-limit
Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827)
2021-06-02 09:24:59 +02:00
TJ Hoplock dc22c65349
Add Linode Service Discovery (#8846)
* Add Linode Service Discovery

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2021-06-01 20:32:36 +02:00
hanjm 1df05bfd49 Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827)
Signed-off-by: hanjm <hanjinming@outlook.com>
2021-05-29 07:05:42 +08:00
Levi Harrison 2826fbeeb7
SD: Add target creation failure counter and change failure handling (#8786)
* Added metric and changed failure/drop strategy

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-28 23:50:59 +02:00
Callum Styan 8fd73b1d28
Add Exemplar Remote Write support (#8296)
* Write exemplars to the WAL and send them over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update example for exemplars, print data in a more obvious format.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add metrics for remote write of exemplars.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix incorrect slices passed to send in remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* We need to unregister the new metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Order of exemplar append vs write exemplar to WAL needs to change.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Condense sample/exemplar delivery tests to parameterized sub-tests

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename test methods for clarity now that they also handle exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename counter variable. Fix instances where metrics were not updated correctly

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Add exemplars to LoadWAL benchmark

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* last exemplars timestamp metric needs to convert value to seconds with
ms precision

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Process exemplar records in a separate go routine when loading the WAL.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Regenerate types proto with comments, update protoc version again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Put remote write of exemplars behind a feature flag.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some of Ganesh's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move exemplar remote write feature flag to a config file field.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address Bartek's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more reivew comments from Ganesh.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add exemplar total label length check.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address a few last review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-05-06 13:53:52 -07:00
Damien Grisonnet b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
Gezim Sejdiu 97acd170b2 Fix a broken link for the bcrypt ref. at the web-config.yml example
Signed-off-by: Gezim Sejdiu <g.sejdiu@gmail.com>
2021-04-20 22:43:37 +02:00
zhangshj 1956f07197 update redirected url
Signed-off-by: zhangshj <zhangshj@inspur.com>
2021-04-14 13:54:40 +08:00
Robert Jacob b253056163
Implement Docker discovery (#8629)
* Implement Docker discovery

Signed-off-by: Robert Jacob <xperimental@solidproject.de>
2021-03-29 22:30:23 +02:00
Rémy Léone f690b811c5
add support for scaleway service discovery (#8555)
Co-authored-by: Patrik <patrik@ptrk.io>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>

Signed-off-by: Rémy Léone <rleone@scaleway.com>
2021-03-10 15:10:17 +01:00
Julien Pivotto 432d5ebc6c Rename default branch to main
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-22 20:28:02 +01:00
Julien Pivotto 8787f0aed7 Update common to support credentials type
Most of the backwards compat tests is done in common.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-18 23:28:22 +01:00
Tom Wilkie d479151f1f Various enhancements and refactorings for remote write receiver:
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write.  This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series.  For now, its easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behing the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-08 20:41:23 +00:00
ravilr adc8807851
Update remote-write alert rules mixin (#8423)
Signed-off-by: ravilr <raviprasad_lr@yahoo.com>
2021-01-31 20:07:49 +00:00
Julien Pivotto 5bd7145e55
Merge pull request #8327 from roidelapluie/tlsexemple
https: Add example configuration file
2021-01-15 09:50:52 +01:00
Julien Pivotto 08c259cda6 https: Add example configuration file
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-01-15 01:37:50 +01:00
Frederic Branczyk 62bc755733
mixin: Scope grafana config
In its current form this configuration clashes in one of the most widely
used configurations (kube-prometheus). This patch scopes the
configuration to prevent this.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2020-12-30 17:50:34 +01:00
Nicolas Lamirault aa1ca13025
Add: Custom tags and prefix in Prometheus Mixin (#8287)
* Add: custom tags and prefix

Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>

* Fix: fmt

Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>
2020-12-16 18:49:06 +01:00
Björn Rabenstein 511511324a
Merge pull request #8235 from Allex1/master
Update remote-write grafana mixin
2020-12-08 14:50:47 +01:00
beorn7 553f904f2d mixin: Add a capability to exclude non-prod AM instances
Signed-off-by: beorn7 <beorn@grafana.com>
2020-12-03 20:59:53 +01:00
birca 3ec4161575 Update remote-write grafana mixin
Signed-off-by: birca <birca@adobe.com>
2020-12-02 09:50:15 +02:00
beorn7 638e99c814 prometheus-mixin: Make PrometheusRemoteWriteBehind more generic
Currently, it relies on `job, instance` being the labels completely
identifying a Prometheus instance. However, what's intended is to
simply not match on `remote_name, url`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-11-17 13:29:49 +01:00
beorn7 371ca9ff46 prometheus-mixin: add HA-group aware alerts
There is certainly a potential to add more of these. This is mostly
meant to introduce the concept and cover a few critical parts.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-11-11 19:45:34 +01:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
like-inspur 29b551225b
add networking.k8s.io for ingress (#8091)
* add networking.k8s.io for ingress

level=error ts=2020-10-19T08:32:30.544Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:494: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: ingresses.networking.k8s.io is forbidden: User \"system:serviceaccount:monitoring:prometheus\" cannot list resource \"ingresses\" in API group \"networking.k8s.io\" at the cluster scope"

Signed-off-by: root <likerj@inspur.com>

* Update rbac-setup.yml

Signed-off-by: root <likerj@inspur.com>
2020-10-22 15:08:12 -06:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Matthias Loibl 13ba013a24
Use absolute jsonnet import paths
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.

This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2020-10-20 11:42:30 +02:00
Björn Rabenstein d49f267f76
Merge pull request #8054 from simonpasquier/improve-not-ingesting-samples-alert
documentation/prometheus-mixin: improve PrometheusNotIngestingSamples
2020-10-15 12:29:39 +02:00
Simon Pasquier f381d8a9bd documentation/prometheus-mixin: improve PrometheusNotIngestingSamples
The alert shouldn't fire when there's no target and no rule configured.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-10-15 11:13:17 +02:00
Julien Pivotto 4596abee4d
Mixin: Ignore unset remote write timestamp (#8046)
* Mixin: Ignore unset remote write timestamp

This pull request ignores the zero value of highest_sent_timestamp_seconds
in Highest Timestamp In vs. Highest Timestamp Sent which just show that
remote write has not been successful yet.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-15 09:15:59 +02:00
garanews c38816828f
fix few typo (#8023)
Signed-off-by: garanews <puntogtg@tiscali.it>
2020-10-07 16:51:31 +01:00
Luke Chen 3364875ae5
update the doc link in internal_arthitecture.md (#7966)
* update the doc link in internal_arthitecture.md
* address reviewer's comment to remove out-dated wrapper

Signed-off-by: Luke Chen <showuon@gmail.com>
2020-09-24 09:10:41 +01:00
Julien Pivotto e208afcc95
web: Remove APIv2 (#7935)
* web: Remove APIv2

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-09-15 09:30:55 +02:00
kangwoo 7c0d5ae4e7
Add Eureka Service Discovery (#3369)
Signed-off-by: kangwoo <kangwoo@gmail.com>
2020-08-26 17:36:59 +02:00
Simon Pasquier e693af6c01
.circleci/config.yml: check mixins (#6895)
* .circleci/config.yml: check mixins

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run jsonnetfmt

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Install tools in the image instead of using coreos/jsonnet-ci

The latter is deprecated

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Update jsonnetfile.json

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-08-25 15:59:41 +02:00
Lukas Kämmerling b6955bf1ca
Add hetzner service discovery (#7822)
Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
2020-08-21 15:49:19 +02:00
Julien Pivotto f482c7bdd7
Add per scrape-config targets limit (#7554)
* Add per scrape-config targets limit

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 14:20:24 +02:00
Frederic Branczyk 9f9fb1ab33
documentation: Adapt Kubernetes RBAC to use metrics roles (#3661) 2020-07-24 16:36:56 +02:00
Julien Pivotto 48140e5189 Improve docker swarm configuration exemple
Improve to use the unix socket as this is what is enabled by default.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-20 13:42:57 +02:00
Julien Pivotto be96951c56
Add Docker Swarm configuration example (#7542)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-09 20:05:21 +02:00
John Bampton 98a69b77d1
Fix spelling (#7512)
Signed-off-by: John Bampton <jbampton@users.noreply.github.com>
2020-07-04 14:54:26 +02:00
Tom Wilkie 27b1009acd
Rename the dashboard in the mixin to 'Prometheus Overview'. (#7489)
Due to https://github.com/grafana/grafana/issues/15642, this prevents users putting this dashboard in a Grafana folder called 'Prometheus'.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2020-06-30 15:45:44 +01:00
Julien Pivotto c61141ce51
Add DigitalOcean service discovery (#7407)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-06-18 17:04:41 +02:00
Manuel Fontan 6e7554639b Update Readme since jsonnetfmt is available in the jsonnet go implementation since v0.16.0
Signed-off-by: Manuel Fontan <mfontangarcia@slack-corp.com>
2020-06-16 10:41:58 +01:00
TakumaNakagame 7a541bd9a7
fix document rabbitmq example (#7297)
* remove prometheus.io annotations and add scrape_configs

Signed-off-by: TakumaNakagame <5129906+TakumaNakagame@users.noreply.github.com>
2020-05-27 11:34:05 +01:00
Bartlomiej Plotka 1d13a2cd2f Updated different swagger output.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-23 16:52:14 +01:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Callum Styan 5400e71b91 Update mixin dashboards and alerts for new remote write label names.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2020-04-08 12:56:00 -07:00
qinng e31b7b2679
[Doc] Fix wrong description in kubernetes expamle (#7012)
Signed-off-by: guoruyi1 <guoruyi1@xiaomi.com>

Co-authored-by: guoruyi1 <guoruyi1@xiaomi.com>
2020-03-20 08:03:43 +00:00
Julien Pivotto ef63d8d16d Update vendors
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-25 10:33:41 +01:00
Marco Pracucci 1e1785690a
Fix queue in alerts annotation
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-02-12 12:48:13 +01:00
paulfantom 7321f1d227
documentation/prometheus-mixin: add dependency on grafonnet
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2020-01-11 23:18:04 +01:00
Josh Soref 91d76c8023 Spelling (#6517)
* spelling: alertmanager

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: attributes

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: autocomplete

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: bootstrap

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: caught

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: chunkenc

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: compaction

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: corrupted

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: deletable

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: expected

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: fine-grained

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: initialized

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: iteration

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: javascript

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: multiple

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: number

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: overlapping

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: possible

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: postings

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: procedure

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: programmatic

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: queuing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: querier

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: repairing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: received

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: reproducible

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: retention

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: sample

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: segements

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: semantic

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: software [LICENSE]

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: staging

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: timestamp

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: unfortunately

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: uvarint

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: subsequently

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: ressamples

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-01-02 15:54:09 +01:00
Callum Styan f4fb6dc208 Simplify remote write dashboard in mixin.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-11-18 19:58:07 -08:00
beorn7 9c8f9bfa63 Fix the description template for PrometheusRemoteWriteDesiredShards
Signed-off-by: beorn7 <beorn@grafana.com>
2019-10-30 13:27:37 +01:00
Björn Rabenstein 7c039a6b3b
Merge pull request #6242 from prometheus/beorn7/mixin
Fix PrometheusRemoteWriteDesiredShards
2019-10-29 16:01:09 +01:00
Benoit Gagnon 6d931a2195 Fix Windows support for custom-sd adapter (#6217)
* add test to custom-sd/adapter writeOutput() function

Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>

* fix Adapter.writeOutput() function to work on Windows

On that platform, files cannot be moved while a process holds a handle
to them. Added an explicit Close() before that move. With this change,
the unit test succeeds.

Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>

* add missing dot to comment

Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
2019-10-29 10:41:31 +01:00
beorn7 61617eb2d9 Fix PrometheusRemoteWriteDesiredShards
This rule has the same labels on both sides. We don't want
`group_right` and `on`, we want nothing.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-10-29 00:23:39 +01:00
Callum Styan da6d46625f Repeat shards panels on the queue label.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-10-21 11:03:50 -07:00
Callum Styan 818974ff8f Rewrite remote write dashboard using base grafonnet.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-10-17 15:40:58 -07:00
Callum Styan 81fa63006c Add additional shards/segment graphs to remote write dashboard.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-10-09 09:59:02 -07:00
Simon Pasquier e36ab7e192
prometheus-mixin: improve description of sample alerts (#6050)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-24 17:44:27 +02:00
Björn Rabenstein 3b3eaf3496
Merge pull request #5787 from cstyan/reshard-max-logging
Add metrics for max/min/desired shards to queue manager.
2019-09-09 22:32:54 +02:00
Callum Styan a98599bea8 Update remote write max shards alert; properly template/query for max
shards in description.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-09-09 12:01:11 -07:00
李国忠 d89e783217 [bugfix] custom SD: when ip out of order, reflect.deepEqual can not correctly identify whether there is a change (#5856)
* [bugfix] custom SD: when ip out of order, reflect.deepEqual can not correctly identify whether there is a change

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [format] makefile:Makefile.common:116: common-style

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [bugfix] custom sd: simonpasquier comment,It would be simpler to sort the targets alphabetically and keep reflect.DeepEqual.

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [bugfix]custom SD:fix sort

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [bugfix] custom SD : adapter.go need an empty line after "sort"

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [bugfix]custom SD:test sign-off

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [bugfix]custom SD: fix adaper_test.go

Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2019-08-22 11:49:45 +02:00
Ganesh Vernekar 5ecef3542d
Cleanup after merging tsdb into prometheus
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 14:04:14 +05:30
Callum Styan 3b75614892 Add a warning alert, since the remote write behind alert will probably
already be going off, about desired shards being higher than max shards.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-08-08 06:45:46 -07:00
Simon Pasquier dd174963a2 prometheus-mixin: remove PrometheusTSDBWALCorruptions
The counter is only increased when tsdb.Open() is called which
Prometheus does only once in its lifetime (when it initializes). If the
corruption can't be recovered, tsdb.Open() returns an error and
Prometheus exits. Hence the metric is either 0 (no corruption) or 1
(corruption detected and repaired). If the latter, the alert isn't
actionable and the only way to resolve it is to restart Prometheus which
would reset the counter.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-06 14:36:56 +02:00
Vadym Martsynovskyy a9970a47ef Fix incorrect examples in docs
Signed-off-by: Vadym Martsynovskyy <vmartsynovskyy@gmail.com>
2019-08-04 16:42:42 -07:00
Matthias Loibl 20d12ff1c7
Fix prometheus-mixin dashboards to use grafanaDashboards
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2019-07-11 15:40:26 +02:00
beorn7 4825585834 Tweak tenses
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 17:37:49 +02:00
beorn7 9a2177949d Protect gauge-based alerts against failed scrapes
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 16:46:19 +02:00
beorn7 52707535b8 Remove/improve unused variables and weird doc comments
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-28 15:41:31 +02:00
beorn7 7a25a2586d Sync with alerts from kube-prometheus
While doing so, re-introduce the summary/description
annotations. Also, add a few more rules and tweak a few of the
existing ones.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 23:50:26 +02:00
beorn7 ded0705bdc Update remote repo for grafana-builder dependency
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 14:39:38 +02:00
beorn7 1336a28848 Use a config variable for the Prometheus name
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-27 14:34:11 +02:00
beorn7 613cb5430c Add a "work in progress" disclaimer.
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 23:24:22 +02:00
beorn7 e34af6d4d3 Address various comments from the review
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 23:22:16 +02:00
beorn7 23c03207e9 Fixed indentation
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 20:31:05 +02:00
beorn7 d5845ad05b Fix formatting
This is the outcome of `make fmt`.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 16:23:25 +02:00
beorn7 d45e8a0f61 Adjust to jsonnet v0.13
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 16:22:21 +02:00
beorn7 5c04ef3935 Make README.md immediately useful
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 16:12:59 +02:00
beorn7 ddfabda152 Add Makefile and suitable jsonnet files
This makes the mixins usable as abvertised.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 15:30:55 +02:00
beorn7 e943803a3c Add .gitignore file
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-26 15:22:23 +02:00
Björn Rabenstein 498d31e178
Merge pull request #5681 from prometheus/beorn7/mixin
Merge master into mixin
2019-06-19 23:17:41 +02:00
Callum Styan a5762f3681 Add dashboard for remote write to prometheus-mixin.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-06-17 15:02:42 -07:00
beorn7 5639aaf0a4 Merge branch 'master' into mixin 2019-06-17 13:07:11 +02:00
Romain Baugue 95193fa027 Exhaust every request body before closing it (#5166) (#5479)
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.

This effectively enable keep-alive for the fixed requests.

Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
2019-04-18 09:50:37 +01:00
qinng cc75c27580 Fix multiple response.WriteHeader calls error in remote read adapter (#5159)
* fix multiple response.WriteHeader calls in remote read adapter
* remove useless return

Signed-off-by: qinng <guoruyi1@xiaomi.com>
2019-04-10 13:25:35 +01:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Tom Wilkie 38a9bbbec2 Loosen off PrometheusRemoteWriteBehind alert.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-04 12:47:24 +00:00
Tom Wilkie b615069289 Update metric names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-01 07:39:48 -08:00
LongKB 23480bef43 Remove the duplicated words (#5251)
Although it is spelling mistakes, it might make an affects while reading.

Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
2019-02-22 14:32:34 +01:00
Nguyen Hai Truong 5fbda4c9d7 Secure http links (#5244)
Fix http link to https link for secure, modify http to https
in the links of project. Have some http links doesn't
redirect into https.

Co-Authored-By: Nguyen Van Trung trungnv@vn.fujitsu.com
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
2019-02-21 10:48:47 +01:00
Kim Bao Long 94f5352951 Trivial fix: Fix some typos in comments
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
2019-02-21 09:07:49 +07:00
Tom Wilkie e248ffb220 Add alert for WAL remote write falling behind.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 15:22:58 +00:00
Callum Styan 5358f76c5c update remote write path proto so that Labels/Timeseries can't be nil (#4957)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-01-15 19:13:39 +00:00
Simon Pasquier 375ad1185c
*: bump gRPC dependencies (#5075)
* *: bump gRPC dependencies

This change updates the gRPC dependencies to more recent versions:

* github.com/gogo/protobuf => v1.2.0
* github.com/grpc-ecosystem/grpc-gateway => v1.6.3
* google.golang.org/grpc => v1.17.0

In addition scripts/genproto.sh leverages Go modules information instead of
hardcoding SHA1 commits. This ensures that the code is generated from
the exact same sources.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run 'make proto' in CI

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert tabs -> spaces change

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix 'make proto' step

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* 'go get' grpc/protobuf dependencies

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Prepopulate cache with go mod download

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-15 15:32:05 +01:00
Simon Pasquier f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00