prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 17:44:06 -08:00

Author	SHA1	Message	Date
Bryan Boreham	b87b88ddc2	Merge branch 'main' into consul-catalog-filter-support Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-10-08 12:20:31 +01:00
TJ Hoplock	6ebfbd2d54	chore!: adopt log/slog, remove go-kit/log For: #14355 This commit updates Prometheus to adopt stdlib's log/slog package in favor of go-kit/log. As part of converting to use slog, several other related changes are required to get prometheus working, including: - removed unused logging util func `RateLimit()` - forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger - move some of the json file logging functionality to use prom/common package functionality - refactored some of the new json file logging for scraping - changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers - updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition - added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>	2024-10-07 15:58:50 -04:00
Oleksandr Redko	f10c3454e9	Enable perfsprint linter and fix up code Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>	2024-05-15 17:51:05 +03:00
Daniel Kimsey	aa3e58358b	consul: Add support for catalog list services filter This adds support for Consul's Catalog [List Services][^1] API's `filter` parameter added in 1.14.x. This parameter grants the operator more flexibility to do server-side filtering of the Catalog, before Prometheus subscribes for updates. Operators can use this to improve both the performance of Prometheus's Consul SD and reduce the impact of enumerating large catalogs. [^1]: https://developer.hashicorp.com/consul/api-docs/v1.14.x/catalog Signed-off-by: Daniel Kimsey <dekimsey@protonmail.com>	2024-03-17 20:32:54 -05:00
Paulin Todev	78411d5e8b	SD Managers taking over responsibility for registration of debug metrics (#13375 ) SD Managers take over responsibility for SD metrics registration --------- Signed-off-by: Paulin Todev <paulin.todev@gmail.com> Signed-off-by: Björn Rabenstein <github@rabenste.in> Co-authored-by: Björn Rabenstein <github@rabenste.in>	2024-01-23 16:53:55 +01:00
Paulin Todev	6de80d7fb0	Allow non-default registry to be used for metrics of SD components Signed-off-by: Paulin Todev <paulin.todev@gmail.com>	2023-12-11 11:14:26 +00:00
Oleksandr Redko	fa90ca46e5	ci(lint): enable godot; append dot at the end of comments Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>	2023-10-31 19:53:38 +02:00
Julien Pivotto	0dc31ade41	Add support for consul path_prefix Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2023-05-17 00:14:58 +02:00
David Cañadillas	51a44e6657	Adding Consul Enterprise Admin Partitions (#11482 ) * Adding Consul Enterprise Admin Partitions Signed-off-by: dcanadillas <dcanadillas@hashicorp.com>	2022-10-21 14:13:01 +02:00
Matthieu MOREL	f43749e82f	refactor (discovery): move from github.com/pkg/errors to 'errors' and 'fmt' (#10807 ) Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com> Co-authored-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>	2022-06-03 13:47:14 +02:00
Conor Evans	c28b9a0574	Add datacenter to Consul service discovery logs (#9668 ) * add datacenter to consul service discovery logs Signed-off-by: Conor Evans <coevans@tcd.ie>	2021-11-08 09:34:21 +01:00
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Julien Pivotto	63b3e4e5ec	Enable HTTP2 again (#9398 ) We are re-enabling HTTP/2. There have been a few bug fixes upstream in Go, and we have also enabled ReadIdleTimeout. Fix #7588 Fix #9068 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-09-26 23:16:12 +02:00
jinglina	ed24e51e7c	remove redundant type conversion (#9126 ) Signed-off-by: jinglina <jinglinax@163.com>	2021-07-28 13:33:46 +05:30
Levi Harrison	faed8df31d	Enable reading consul token from file (#8926 ) * Adopted common http client Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-12 00:06:59 +02:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
Austin Cawley-Edwards	301815e48b	Update prometheus-common and the consul HTTP client (#8913 ) * Update to prometheus-common@v0.29.0 Signed-off-by: austin ce <austin.cawley@gmail.com>	2021-06-11 14:24:41 +02:00
Frederic Hemberger	39a87fd9d2	consul_sd: Add namespace support for Consul Enterprise Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>	2021-06-09 16:35:02 +02:00
songjiayang	b781b5cac5	Refactor file discovery init function (#8891 ) * Refactor file discovery init function Combine to one init function like other discovery. Signed-off-by: songjiayang <songjiayang1@gmail.com>	2021-06-04 14:43:24 +02:00
Nick Triller	fddf4918c0	Send empty targetgroup if nothing discovered Signed-off-by: Nick Triller <nicktriller@gmail.com>	2021-04-29 09:06:52 +02:00
johncming	a5beb627ff	some fixes for consul sd. (#7799 ) * discovery/consul: make duration more accurate. Signed-off-by: johncming <johncming@yahoo.com> * discovery/consul: fix bug when context done. Signed-off-by: johncming <johncming@yahoo.com>	2020-08-25 15:46:14 +02:00
Andy Bursavich	4e6a94a27d	Invert service discovery dependencies (#7701 ) This also fixes a bug in query_log_file, which now is relative to the config file like all other paths. Signed-off-by: Andy Bursavich <abursavich@gmail.com>	2020-08-20 13:48:26 +01:00
Julien Pivotto	3a7120bc07	Consul: Reduce WatchTimeout to 2m and set it as timeout for requests Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-03 00:42:55 +02:00
John Bampton	98a69b77d1	Fix spelling (#7512 ) Signed-off-by: John Bampton <jbampton@users.noreply.github.com>	2020-07-04 14:54:26 +02:00
Pierre Souchay	1508678001	Use 10m timeouts for watches (#7423 ) Using ?wait=10m gives results as fast as usual when data is changing, but performs far fewer requests when services do not change. On large infrastructure, this considerably reduces the QPS on Consul servers while keeping results just as fresh. Signed-off-by: Pierre Souchay <p.souchay@criteo.com>	2020-06-20 20:22:45 +01:00
Mathilde Gilles	9b9c58aea8	[Consul] Add health label to metrics (#5313 ) Label metrics with the target health using consul's /health endpoint. Signed-off-by: Mathilde Gilles <m.gilles@criteo.com>	2020-02-25 13:32:30 +00:00
Simon Pasquier	fe76ccbfe3	discovery/consul: fix logging of tags (#6783 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-02-13 13:11:44 +01:00
Ben Ye	60527de355	keep consul service metrics in global variables (#6764 ) Signed-off-by: yeya24 <yb532204897@gmail.com>	2020-02-06 05:48:58 +00:00
Simon Pasquier	19ce6b7f5f	discovery: fix more error logs on context cancelation (#6133 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-10-18 11:48:51 +02:00
beorn7	dd81912554	Add objectives to Summaries With the next release of client_golang, Summaries will not have objectives by default. To not lose the objectives we have right now, explicitly state the current default objectives. Signed-off-by: beorn7 <beorn@grafana.com>	2019-06-12 02:03:13 +02:00
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-03 15:11:28 +02:00
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-26 00:01:12 +01:00
Simon Pasquier	782d00059a	discovery: factorize for SD based on refresh (#5381 ) * discovery: factorize for SD based on refresh Signed-off-by: Simon Pasquier <spasquie@redhat.com> * discovery: use common metrics for refresh Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-03-25 11:54:22 +01:00
Callum Styan	83c46fd549	update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-03-12 10:31:27 +00:00
Simon Pasquier	f9462d5d44	discovery/consul: pass current context to Consul queries Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-18 14:23:56 +01:00
Simon Pasquier	f678e27eb6	: use latest release of staticcheck (#5057 ) : use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-01-04 14:47:38 +01:00
Samuel Alfageme	240321acee	Add taggedAddress to the labels in ConsulSD (#5001 ) Useful when multiple (tagged) addresses for a node are exposed on the catalog API Ref. https://www.consul.io/api/catalog.html#taggedaddresses Signed-off-by: Samuel Alfageme <samuel@alfage.me>	2018-12-18 11:51:05 +01:00
Simon Pasquier	1cd29f782c	discovery/consul: close idle connections on stop Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-08-01 17:26:52 +02:00
Romain Baugue	b41be4ef52	Discovery consul service meta (#4280 ) * Upgrade Consul client * Add ServiceMeta to the labels in ConsulSD Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>	2018-07-18 05:06:56 +01:00
Julius Volz	5cf0113762	Add "omitempty" to some SD config YAML field tags (#4338 ) Especially for Kubernetes SD, this fixes a bug where the rendered configuration says "api_server: null", which when read back is not interpreted as an un-set API server (thus the default is not applied). Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-07-03 13:43:41 +02:00
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077 ) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	2018-04-25 18:19:06 +01:00
sev3ryn	cc917aee7f	fix endless loop in Consul service discovery (#4044 ) Reloading the Prometheus configuration did not end the loop, which produced a goroutine leak.	2018-04-05 10:41:09 +01:00
Manos Fokas	25f929b772	Yaml UnmarshalStrict implementation. (#4033 ) * Updated yaml vendor package. * remove checkOverflow duplicate in rulefmt * remove duplicated HTTPClientConfig.Validate() * Added yaml static check.	2018-04-04 09:07:39 +01:00
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814 )
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` of `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on a large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.
Benchmarks
----------

```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 60s
    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: "observability-by-tag"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        tag: marathon-user-observability # Used in After
        refresh_interval: 30s # Used in After+delay
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ^(.,)?marathon-user-observability(,.)?$
        action: keep

  - job_name: "observability-by-name"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - observability-cerebro
          - observability-portal-web

  - job_name: "fake-fake-fake"
    scrape_interval: "15s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
| /service-discovery size | 5K | 85MiB | 27k | 27k | 27k |
| `go_memstats_heap_objects` | 100k | 1M | 120k | 110k |
| `go_memstats_heap_alloc_bytes` | 24MB | 150MB | 28MB | 27MB |
| `rate(go_memstats_alloc_bytes_total[5m])` | 0.2MB/s | 28MB/s | 2MB/s | 0.3MB/s |
| `rate(process_cpu_seconds_total[5m])` | 0.1% | 15% | 2% | 0.01% |
| `process_open_fds` | 16 | 1236 | 22 | 22 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])` | ~0 | 1 | 1 | 0.03 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])` | 0.1 | 80 | 0.5 | 0.5 |
| `prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}` | N/A | 200ms | 0.2ms | 0.2ms |
| Network bandwidth | ~10kbps | ~2.8Mbps | ~1.6Mbps | ~10kbps |

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds. Hopefully this will do what people expect it does and will be safer for existing consul infrastructures.	2018-03-23 14:48:43 +00:00
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703 )	2018-01-24 12:14:32 +00:00
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674 ) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	2018-01-11 16:10:25 +01:00
Callum Styan	97464236c7	comments with TargetProvider should read Discoverer instead (#3667 )	2018-01-08 23:59:18 +00:00
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629 )
* refactor: move targetGroup struct and CheckOverflow() to their own package
* refactor: move auth and security related structs to a utility package, fix import error in utility package
* refactor: Azure SD, remove SD struct from config
* refactor: DNS SD, remove SD struct from config into dns package
* refactor: ec2 SD, move SD struct from config into the ec2 package
* refactor: file SD, move SD struct from config to file discovery package
* refactor: gce, move SD struct from config to gce discovery package
* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil
* refactor: consul, move SD struct from config into consul discovery package
* refactor: marathon, move SD struct from config into marathon discovery package
* refactor: triton, move SD struct from config to triton discovery package, fix test
* refactor: zookeeper, move SD structs from config to zookeeper discovery package
* refactor: openstack, remove SD struct from config, move into openstack discovery package
* refactor: kubernetes, move SD struct from config into kubernetes discovery package
* refactor: notifier, use targetgroup package instead of config
* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup
* refactor: retrieval, use targetgroup package instead of config.TargetGroup
* refactor: storage, use config util package
* refactor: discovery manager, use targetgroup package instead of config.TargetGroup
* refactor: use HTTPClient and TLS config from configUtil instead of config
* refactor: tests, use targetgroup package instead of config.TargetGroup
* refactor: fix targetgroup.Group pointers that were removed by mistake
* refactor: openstack, kubernetes: drop prefixes
* refactor: remove import aliases forced due to vscode bug
* refactor: move main SD struct out of config into discovery/config
* refactor: rename configUtil to config_util
* refactor: rename yamlUtil to yaml_config
* refactor: kubernetes, remove prefixes
* refactor: move the TargetGroup package to discovery/
* refactor: fix order of imports	2017-12-29 21:01:34 +01:00
Callum Styan	7776527390	bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout	2017-10-28 16:42:42 -07:00
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333 ) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	2017-10-24 21:21:42 -07:00


62 commits