prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Ganesh Vernekar	23ce9ad9f0	Introduce evaluation delay for rule groups (#155) * Allow having evaluation delay for rule groups Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix lint Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Move the option to ManagerOptions Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Include evaluation_delay in the group config Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-03-14 13:20:07 +00:00
paulfantom	151a8daa98	documentation: align kubernetes example with the prom operator and mixins Signed-off-by: paulfantom <pawel@krupa.net.pl>	2021-11-22 11:13:47 +01:00
Björn Rabenstein	2234798f60	Merge pull request #9700 from nikosmeds/nikosmeds/hagroupcrashlooping-mixin-60m Increase time range for PrometheusHAGroupCrashlooping alert	2021-11-19 12:53:55 +01:00
Niko Smeds	53ca693f9e	Be specific Signed-off-by: Niko Smeds <nikosmeds@gmail.com>	2021-11-18 11:28:38 -08:00
Niko Smeds	0bc2cbdd7d	Leave time range for clean restarts as-is Signed-off-by: Niko Smeds <nikosmeds@gmail.com>	2021-11-17 15:14:26 -08:00
Fatih Sarhan	bc89e9e494	mixin: Reorder template variables on Remote Write dashboard Signed-off-by: f9n <f9n@protonmail.com>	2021-11-12 14:38:05 +03:00
Niko Smeds	fdcd423dfe	Increase time range for PrometheusHAGroupCrashlooping alert Signed-off-by: Niko Smeds <nikosmeds@gmail.com>	2021-11-08 15:06:42 -08:00
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Arthur Silva Sens	be2599c853	config: Make remote-write required for Agent mode (#9618) * config: Make remote-write required for Agent mode Signed-off-by: ArthurSens <arthursens2005@gmail.com>	2021-10-30 01:41:40 +02:00
SuperQ	3cd2c033e2	Use Go 1.16+ install for mixin tests Use new `go install` syntax to fetch tools. Signed-off-by: SuperQ <superq@gmail.com>	2021-10-23 22:52:16 +02:00
Julien Pivotto	3458e338c6	docs: Improve PuppetDB example (#9547) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-10-20 21:03:17 +02:00
Witek Bedyk	cda2dbbef6	Add Uyuni service discovery (#8190) * Add Uyuni service discovery Signed-off-by: Witek Bedyk <witold.bedyk@suse.com> Co-authored-by: Joao Cavalheiro <jcavalheiro@suse.de> Co-authored-by: Marcelo Chiaradia <mchiaradia@suse.com> Co-authored-by: Stefano Torresi <stefano@torresi.io> Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>	2021-10-19 01:00:44 +02:00
Julien Pivotto	8920024323	Add PuppetDB service discovery We have been Puppet user for 10 years and we are users of https://github.com/camptocamp/prometheus-puppetdb-sd However, that file_sd implementation contains business logic and assumptions around e.g. the modules which you are using. This pull request adds a simple PuppetDB service discovery, which will enable more use cases than the upstream sd. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-09-16 16:54:26 +02:00
Paweł Szulik	f5563bfe95	tests: Move from t.Errorf and others. (Part 2) (#9309) * Refactor util tests. Signed-off-by: Paweł Szulik <paul.szulik@gmail.com>	2021-09-13 21:19:20 +02:00
Julien Pivotto	d5676fb9e0	Merge pull request #9254 from prometheus/superq/go1.17 Build with Go 1.17 / npm 7 / node 16	2021-08-28 18:36:42 +02:00
Frederic Hemberger	16b8911b1a	docs: Replace `go get` with `go install` for command installation (#9098) `go get` is deprecated for installation of commands as of go v1.17 Ref: https://go.googlesource.com/go/+/ced0fdbad0655d63d535390b1a7126fd1fef8348 Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>	2021-08-27 11:08:21 +02:00
SuperQ	e167a45c65	Add new Go build tags. Add new go:build comments based on 1.17 formatting[0]. [0]: https://golang.org/doc/go1.17#gofmt Signed-off-by: SuperQ <superq@gmail.com>	2021-08-27 10:24:14 +02:00
Björn Rabenstein	9c43ac451c	Merge pull request #9129 from PhilipGough/bz-1984365 mixin: Filter instance by selected job for Prometheus overview dashboard	2021-08-13 14:03:16 +02:00
TJ Hoplock	7baf084092	optimize Linode SD by polling for event changes during refresh (#8980) * optimize Linode SD by polling for event changes during refresh Most accounts are fairly "static", in the sense that they're not cycling through instances constantly. So rather than do a full refresh every interval and potentially make several behind-the-scenes paginated API calls, this will now poll the `/account/events/` endpoint every minute with a list of events that we care about. If a matching event is found, we then do a full refresh. Co-authored-by: William Smith <wsmith@linode.com> Signed-off-by: TJ Hoplock <t.hoplock@gmail.com> Signed-off-by: William Smith <wsmith@linode.com>	2021-08-04 12:05:49 +02:00
Philip Gough	751ca03fad	mixin: Filter instance by job for Prometheus overview dashboard Signed-off-by: Philip Gough <philip.p.gough@gmail.com>	2021-07-28 14:34:26 +01:00
Julius Volz	179b2155d1	Fix: Use json.Unmarshal() instead of json.Decoder (#9033) * Fix: Use json.Unmarshal() instead of json.Decoder See https://ahmet.im/blog/golang-json-decoder-pitfalls/ json.Decoder is for JSON streams, not single JSON objects / bodies. Signed-off-by: Julius Volz <julius.volz@gmail.com> * Revert modifications to targetgroup parsing Signed-off-by: Julius Volz <julius.volz@gmail.com>	2021-07-02 09:38:14 +01:00
Ben Kochie	7cb55d5732	Merge pull request #8802 from mwasilew2/yaml-linting Adds yamllinting to Makefile.common	2021-06-24 15:59:35 +02:00
Levi Harrison	4a4882d4c7	Replace godoc.org links Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-17 07:18:51 -04:00
Julien Duchesne	8855c2e626	Add `prometheus_tsdb_clean_start` metric (#8824) Add cleanup of the lockfile when the db is cleanly closed The metric describes the status of the lockfile on startup 0: Already existed 1: Did not exist -1: Disabled Therefore, if the min value over time of this metric is 0, that means that executions have exited uncleanly We can then use that metric to have a much lower threshold on the crashlooping alert: If the metric exists and it has been zero, two restarts is enough to trigger the alarm If it does not exist (old prom version for example), the current five restarts threshold remains Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Change metric name + set unset value to -1 Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Only check the last value of the clean start alert Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Fix test + nit Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>	2021-06-16 15:03:02 +05:30
Michal Wasilewski	3f686cad8b	fixes yamllint errors Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>	2021-06-12 12:47:47 +02:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
Julien Pivotto	20c6739adc	Merge pull request #8833 from hanjm/feature/add-scape-read-body-limit Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827)	2021-06-02 09:24:59 +02:00
TJ Hoplock	dc22c65349	Add Linode Service Discovery (#8846) * Add Linode Service Discovery Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>	2021-06-01 20:32:36 +02:00
hanjm	1df05bfd49	Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827) Signed-off-by: hanjm <hanjinming@outlook.com>	2021-05-29 07:05:42 +08:00
Levi Harrison	2826fbeeb7	SD: Add target creation failure counter and change failure handling (#8786) * Added metric and changed failure/drop strategy Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-28 23:50:59 +02:00
Callum Styan	8fd73b1d28	Add Exemplar Remote Write support (#8296) * Write exemplars to the WAL and send them over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update example for exemplars, print data in a more obvious format. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add metrics for remote write of exemplars. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix incorrect slices passed to send in remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * We need to unregister the new metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * Order of exemplar append vs write exemplar to WAL needs to change. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Condense sample/exemplar delivery tests to parameterized sub-tests Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename test methods for clarity now that they also handle exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename counter variable. Fix instances where metrics were not updated correctly Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Add exemplars to LoadWAL benchmark Signed-off-by: Callum Styan <callumstyan@gmail.com> * last exemplars timestamp metric needs to convert value to seconds with ms precision Signed-off-by: Callum Styan <callumstyan@gmail.com> * Process exemplar records in a separate go routine when loading the WAL. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments related to clarifying comments and variable names. Also refactor sample/exemplar to enqueue prompb types. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Regenerate types proto with comments, update protoc version again. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Put remote write of exemplars behind a feature flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some of Ganesh's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move exemplar remote write feature flag to a config file field. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address Bartek's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allocate exemplar buffers in queue_manager if we're not going to send exemplars over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add ValidateExemplar function, validate exemplars when appending to head and log them all to WAL before adding them to exemplar storage. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more review comments from Ganesh. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add exemplar total label length check. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address a few last review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-05-06 13:53:52 -07:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide some mechanism to defend against unbound number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceed the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauge, one for each label limit to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very very long. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
Gezim Sejdiu	97acd170b2	Fix a broken link for the bcrypt ref. at the web-config.yml example Signed-off-by: Gezim Sejdiu <g.sejdiu@gmail.com>	2021-04-20 22:43:37 +02:00
zhangshj	1956f07197	update redirected url Signed-off-by: zhangshj <zhangshj@inspur.com>	2021-04-14 13:54:40 +08:00
Robert Jacob	b253056163	Implement Docker discovery (#8629) * Implement Docker discovery Signed-off-by: Robert Jacob <xperimental@solidproject.de>	2021-03-29 22:30:23 +02:00
Rémy Léone	f690b811c5	add support for scaleway service discovery (#8555) Co-authored-by: Patrik <patrik@ptrk.io> Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu> Signed-off-by: Rémy Léone <rleone@scaleway.com>	2021-03-10 15:10:17 +01:00
Julien Pivotto	432d5ebc6c	Rename default branch to main Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-02-22 20:28:02 +01:00
Julien Pivotto	8787f0aed7	Update common to support credentials type Most of the backwards compat tests is done in common. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-02-18 23:28:22 +01:00
Tom Wilkie	d479151f1f	Various enhancements and refactorings for remote write receiver: - Remove unrelated changes - Refactor code out of the API module - that is already getting pretty crowded. - Don't track reference for AddFast in remote write. This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series. For now, it's easier and safer to always use the 'slow' path. - Return 400 on out of order samples. - Use remote.DecodeWriteRequest in the remote write adapters. - Put this behind the 'remote-write-server' feature flag - Add some (very) basic docs. - Used named return & add test for commit error propagation Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-08 20:41:23 +00:00
ravilr	adc8807851	Update remote-write alert rules mixin (#8423) Signed-off-by: ravilr <raviprasad_lr@yahoo.com>	2021-01-31 20:07:49 +00:00
Julien Pivotto	5bd7145e55	Merge pull request #8327 from roidelapluie/tlsexemple https: Add example configuration file	2021-01-15 09:50:52 +01:00
Julien Pivotto	08c259cda6	https: Add example configuration file Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-01-15 01:37:50 +01:00
Frederic Branczyk	62bc755733	mixin: Scope grafana config In its current form this configuration clashes in one of the most widely used configurations (kube-prometheus). This patch scopes the configuration to prevent this. Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>	2020-12-30 17:50:34 +01:00
Nicolas Lamirault	aa1ca13025	Add: Custom tags and prefix in Prometheus Mixin (#8287) * Add: custom tags and prefix Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com> * Fix: fmt Signed-off-by: Nicolas Lamirault <nicolas.lamirault@gmail.com>	2020-12-16 18:49:06 +01:00
Björn Rabenstein	511511324a	Merge pull request #8235 from Allex1/master Update remote-write grafana mixin	2020-12-08 14:50:47 +01:00
beorn7	553f904f2d	mixin: Add a capability to exclude non-prod AM instances Signed-off-by: beorn7 <beorn@grafana.com>	2020-12-03 20:59:53 +01:00
birca	3ec4161575	Update remote-write grafana mixin Signed-off-by: birca <birca@adobe.com>	2020-12-02 09:50:15 +02:00
beorn7	638e99c814	prometheus-mixin: Make PrometheusRemoteWriteBehind more generic Currently, it relies on `job, instance` being the labels completely identifying a Prometheus instance. However, what's intended is to simply not match on `remote_name, url`. Signed-off-by: beorn7 <beorn@grafana.com>	2020-11-17 13:29:49 +01:00
beorn7	371ca9ff46	prometheus-mixin: add HA-group aware alerts There is certainly a potential to add more of these. This is mostly meant to introduce the concept and cover a few critical parts. Signed-off-by: beorn7 <beorn@grafana.com>	2020-11-11 19:45:34 +01:00
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00


255 commits