prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 01:24:04 -08:00

Author	SHA1	Message	Date
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814)

* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more so when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of this kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 60s
    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: "observability-by-tag"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        tag: marathon-user-observability  # Used in After
        refresh_interval: 30s             # Used in After+delay
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ^(.*,)?marathon-user-observability(,.*)?$
        action: keep

  - job_name: "observability-by-name"
    scrape_interval: "60s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - observability-cerebro
          - observability-portal-web

  - job_name: "fake-fake-fake"
    scrape_interval: "15s"
    metrics_path: "/metrics"
    consul_sd_configs:
      - server: consul.service.par.consul.prod.crto.in:8500
        services:
          - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
| /service-discovery size | 5K | 85MiB | 27k | 27k |
| `go_memstats_heap_objects` | 100k | 1M | 120k | 110k |
| `go_memstats_heap_alloc_bytes` | 24MB | 150MB | 28MB | 27MB |
| `rate(go_memstats_alloc_bytes_total[5m])` | 0.2MB/s | 28MB/s | 2MB/s | 0.3MB/s |
| `rate(process_cpu_seconds_total[5m])` | 0.1% | 15% | 2% | 0.01% |
| `process_open_fds` | 16 | 1236 | 22 | 22 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])` | ~0 | 1 | 1 | 0.03 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])` | 0.1 | 80 | 0.5 | 0.5 |
| `prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}` | N/A | 200ms | 0.2ms | 0.2ms |
| Network bandwidth | ~10kbps | ~2.8Mbps | ~1.6Mbps | ~10kbps |

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also makes sure we don't wait another 30s if we already waited 29s in the blocking call, by subtracting the number of elapsed seconds.

Hopefully this will do what people expect it to do and will be safer for existing consul infrastructures.	2018-03-23 14:48:43 +00:00
Yecheng Fu	56ed29fbf7	Map target infos of endpoints to prometheus meta labels. (#3770)	2018-03-09 10:07:00 +00:00
Fabian Reinartz	3e6c890aea	api: add flag to skip head on snapshots	2018-03-08 13:07:12 +01:00
Jeffrey Zhang	21f96caab3	Fix wrong syntax for alert field templates (#3883)	2018-02-24 09:37:43 +00:00
Conor Broderick	99006d3baf	Added dropped targets API to targets endpoint (#3870)	2018-02-21 17:26:18 +00:00
Conor Broderick	1fd20fc954	Add dropped alertmanagers to alertmanagers API (#3865)	2018-02-21 09:00:07 +00:00
Bartek Plotka	93a63ac5fd	api: Added v1/status/flags endpoint. (#3864)

Endpoint URL: /api/v1/status/flags

Example Output:

```json
{
  "status": "success",
  "data": {
    "alertmanager.notification-queue-capacity": "10000",
    "alertmanager.timeout": "10s",
    "completion-bash": "false",
    "completion-script-bash": "false",
    "completion-script-zsh": "false",
    "config.file": "my_cool_prometheus.yaml",
    "help": "false",
    "help-long": "false",
    "help-man": "false",
    "log.level": "info",
    "query.lookback-delta": "5m",
    "query.max-concurrency": "20",
    "query.timeout": "2m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "false",
    "storage.tsdb.path": "data/",
    "storage.tsdb.retention": "15d",
    "version": "false",
    "web.console.libraries": "console_libraries",
    "web.console.templates": "consoles",
    "web.enable-admin-api": "false",
    "web.enable-lifecycle": "false",
    "web.external-url": "",
    "web.listen-address": "0.0.0.0:9090",
    "web.max-connections": "512",
    "web.read-timeout": "5m",
    "web.route-prefix": "/",
    "web.user-assets": ""
  }
}
```

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-02-21 08:49:02 +00:00
Pedro Araújo	575f665944	Add OS type meta label to Azure SD (#3863)

There is currently no way to differentiate Windows instances from Linux ones. This is needed when you have a mix of node_exporters / wmi_exporters for OS-level metrics and you want to have them in separate scrape jobs. This change allows you to do just that.

Example:

```
- job_name: 'node'
  azure_sd_configs:
    - <azure_sd_config>
  relabel_configs:
    - source_labels: [__meta_azure_machine_os_type]
      regex: Linux
      action: keep
```

The way the vendor'd AzureSDK provides to get the OsType is a bit awkward - as far as I can tell, this information can only be gotten from the startup disk. Newer versions of the SDK appear to improve this a bit (by having OS information in the InstanceView), but the current way still works.	2018-02-19 15:40:57 +00:00
Andrea Giardini	3a9637fa3c	docs: Fix remote_read/remote_timeout default (#3829)	2018-02-12 12:52:33 +00:00
Brian Brazil	66b8bdbf4a	Fix docs for #3820 (#3823)	2018-02-11 23:35:08 +00:00
Ben Kochie	40acc632bb	Merge pull request #3505 from rdemachkovych/ansible_prom2.0 Added to documentation Ansible roles for Prometheus 2.0	2018-01-26 11:30:15 +01:00
Roman Demachkovych	8bfc611616	Remove not maintained roles	2018-01-26 09:46:44 +01:00
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703)	2018-01-24 12:14:32 +00:00
James Turnbull	00f4821178	Added missing ingress from role list (#3666)	2018-01-08 21:23:01 +00:00
James Turnbull	380cacd3a4	Readability edits to vector matching (#3624) * Added L3 headings - makes page a little easier to read * Made use of right-hand and left-hand consistent	2017-12-26 10:28:39 +00:00
Brian Brazil	fba80da635	Fix default of read_recent to be false. (#3617) This is what is documented in the migration guide, and the default settings should make sense for a true long term storage. Document the setting.	2017-12-23 17:21:38 +00:00
James Turnbull	c3f9238756	Updated alert templating docs (#3596) The docs suggest that alert templating only works in the summary and description annotation fields. Some testing and a review of the code suggests this is no longer true and that you can template any annotation field.	2017-12-19 08:04:06 +00:00
Brian Brazil	9083d41d3a	Add 2.0 stability guarantees (#3484) As discussed, generally consider SDs unstable, as realistically they are never going to be stable. Drop the words "experimental/beta" from most places in the docs, as users are getting the wrong impression from this.	2017-12-14 12:54:32 +00:00
Simon Pasquier	aa25dff1ea	Update the openstack_sd_config section openstack_sd_config requires a 'role' parameter which wasn't documented.	2017-12-14 12:20:28 +00:00
Krasi Georgiev	08ee713c82	example to show the difference between "sum by" and "sum without" (#3558)	2017-12-14 12:20:28 +00:00
vthriller	b4bd91958a	[minor] docs: recording_rules: fix missing key	2017-12-14 12:20:28 +00:00
Tobias Schmidt	28205f5ca9	Remove wrong statement about alertmanager URL configuration	2017-12-14 12:20:28 +00:00
Mike Rostermund	4648f4c156	New server uses read protocol, to eh, read. (#3444)	2017-12-14 12:20:28 +00:00
Brian Brazil	e0711c2e9b	Document consul sd tls_config (#3440) Fixes https://github.com/prometheus/docs/issues/681	2017-12-14 12:20:28 +00:00
Tom Wilkie	d2f6803d14	'Prometheus lifecycle' should be a subsection of 'Miscellaneous'	2017-12-14 12:20:28 +00:00
Or Elimelech	6e8d192ba0	Wrong URL for remote.proto (#3431) Change wrong URL for remote.proto	2017-12-14 12:20:28 +00:00
phyber	013dc30dee	Fix markdown in recording rules. (#3432) Resolves an issue where rendered markdown was incorrect.	2017-12-14 12:20:28 +00:00
Tobias Schmidt	87f5fe3576	Fix migration documentation title in docs menu	2017-12-14 12:20:28 +00:00
Brian Brazil	5dff97639f	Tweak migration doc (#3430)	2017-12-14 12:20:28 +00:00
Jose Donizetti	b3b6538348	Small changes to migration guide	2017-12-14 12:20:28 +00:00
Goutham Veeramachaneni	bee6864c14	Make the date returned by snapshot script friendly Fixes #3568 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-10 15:14:31 -06:00
Goutham Veeramachaneni	e0d917e2f5	Merge pull request #3523 from Gouthamve/clean-tomb Add endpoint to cleanup tombstones	2017-12-07 14:39:24 -06:00
Goutham Veeramachaneni	f0599d4dbf	Incorporate review-feedback Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-07 09:06:04 -06:00
James Turnbull	330735aca6	Added another full link to the configuration docs (#3553)	2017-12-07 08:31:15 +00:00
Amy Holt	607a675617	Add prefix to relative 3 URLs (#3551)	2017-12-06 21:16:53 +00:00
Goutham Veeramachaneni	311edc5a38	Merge branch 'master' into clean-tomb Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-05 10:23:21 -06:00
Goutham Veeramachaneni	d8515b2580	Move Admin APIs to v1 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-04 00:13:43 +05:30
Goutham Veeramachaneni	41b8f1f8fe	Add admin API docs Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-02 15:37:31 +05:30
Matthias Rampke	cae4538b3e	Docs: state that all regular expressions are RE2. (#3518) We already mentioned that regular expressions are RE2 for [relabeling][0], but left open what the regular expression syntax anywhere else is. In the querying examples and reference, make it explicit that _all_ regular expressions are RE2. [0]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config	2017-12-01 17:26:06 +00:00
Roman Demachkovych	e0ad66f5a6	fix link name	2017-11-27 18:22:27 +01:00
Roman Demachkovych	370d045f5d	Change repo link	2017-11-27 18:14:12 +01:00
James Turnbull	47311bf005	Update configuration.md (#3513) 1. Removed https://prometheus.io prefix 2. Fixed broken file discovery link.	2017-11-27 14:52:32 +00:00
Tom Wilkie	9d4e332137	Merge pull request #3495 from tomwilkie/pod-uid-discovery-master Include Pod UID in the discovery metadata.	2017-11-24 15:37:57 +00:00
Tom Wilkie	7d4f7c4b71	Update docs for __meta_kubernetes_pod_uid	2017-11-24 15:02:53 +00:00
Roman Demachkovych	5e243bc556	fix link	2017-11-22 16:26:06 +01:00
Roman Demachkovych	b758039f80	Added in to documentation Ansible roles for Prometheus 2.0	2017-11-22 16:15:46 +01:00
Ben Kochie	40f33f45cb	Fix docs that use regexp anchors (#3504) Remove/fix docs that use anchors in label regexp matches.	2017-11-22 12:11:21 +00:00
Tobias Schmidt	7098c56474	Add remote read filter option For special remote read endpoints which have only data for specific queries, it is desired to limit the number of queries sent to the configured remote read endpoint to reduce latency and performance overhead.	2017-11-13 23:30:01 +01:00
Tom Wilkie	617e7d0203	Add migration docs for 2.0 (#3374) * Initial draft of migration.md * Edits. * Review feedback. * Review feedback. * Staleness link to video; add docker root example; remote config file section. * s/NB/NOTE/, remove external labels link. * More typos. * Add more details link for removed PromQL features. * s/you/your/ * Expand on prom1.8/2.0 side by side setup. * More feedback. * update links. * --query.lookback-delta flag.	2017-11-08 08:14:33 +01:00
Julius Volz	02ca988bbd	Remove /api/v1/delete_series docs for 2.0 (#3425) This endpoint has moved to /api/v2 (with somewhat different properties) in Prometheus 2.0 and should now be part of a separate admin API page.	2017-11-07 22:37:03 +00:00
Tobias Schmidt	a117f051da	Remove outdated information about next-release doc branch	2017-11-07 22:28:04 +01:00
Julius Volz	ef08df0e6f	Add 2.0 storage docs (#3423) * Add 2.0 storage docs * Review fixups * More review fixups	2017-11-07 22:00:38 +01:00
Brian Brazil	a5b7955ace	Tweak marathon wording around clustering.	2017-11-02 13:03:19 +00:00
Goutham Veeramachaneni	646e33242e	docs: Fix minor issues with the docs. (#3389) Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-11-01 15:35:50 +00:00
Brian Brazil	b6494960d1	docs: Document new recording rule format (#3378)	2017-11-01 12:58:32 +00:00
Brian Brazil	7187771f20	Document new staleness (#3380) Remove "interpolation" for this heading, that hasn't been in these docs for a long time.	2017-11-01 12:40:47 +00:00
James Turnbull	3701a827cf	Updates to alerting rules docs (#3381) 1. Added a further explanation of the for clause. 2. Added further clarification of non identifying labels.	2017-10-31 19:19:17 +00:00
Brian Brazil	8cf279efb1	Document new alerting rule format.	2017-10-31 14:46:34 +00:00
Brian Brazil	efaa8f9ce8	Update getting started with new rules format	2017-10-31 13:58:09 +00:00
Fabian Reinartz	a32e4cbdd8	docs: remove 1.x storage docs The only section that still applies was the one on the default storage directory so those docs seem obsolete. We'll probably have a similar page on the new storage but we'll only find out what caveats etc. we'll have to point out as we get people reporting problems or notable behavior.	2017-10-28 12:11:35 +02:00
Fabian Reinartz	8cc78b36a2	docs: remove obsolete info in getting started Go automatically configures the number of used threads appropriately and tweaking it is no longer relevant for a basic setup of Prometheus. The baseline consumption tied to the storage layer no longer applies.	2017-10-28 12:09:03 +02:00
Fabian Reinartz	8a2b5a3936	docs: update flags to new double-dash syntax	2017-10-28 12:08:33 +02:00
Brian Brazil	faf4bb03ee	Docs: timestamp() function.	2017-10-27 15:54:45 +01:00
Brian Brazil	aeb524ad14	Docs: remove keep_common, count_scalar, drop_common_labels	2017-10-27 15:54:45 +01:00
Tobias Schmidt	f49ae044d7	Import template reference and examples	2017-10-27 16:08:38 +02:00
Tobias Schmidt	f432b8176d	Consolidate configuration and rules docs in docs/configuration/	2017-10-27 09:54:02 +02:00
Tobias Schmidt	4d30a11ab6	Import storage and federation documentation from docs	2017-10-26 22:36:47 +02:00
Tobias Schmidt	e6cdc2d355	Import querying documentation from prometheus/docs	2017-10-26 22:36:47 +02:00
Tobias Schmidt	299802dfd0	Integrate changes from prometheus/docs	2017-10-26 16:14:43 +02:00
Tobias Schmidt	41281aff81	Include 1.8 changes in configuration docs	2017-10-26 16:14:43 +02:00
Tobias Schmidt	53a5f52224	Import first batch of Prometheus documentation In order to provide documentation for each individual version, this commit starts moving Prometheus server specific documentation into the repository itself.	2017-10-26 16:14:43 +02:00
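
Note on 60dafd425c: the `refresh_interval` tweak in that commit is described as subtracting the time already spent in the blocking call, so that a 29s wait is not followed by another full 30s. The arithmetic can be sketched as below. This is an illustration in Python, not the Go code from the commit, and `remaining_wait` is a hypothetical helper name.

```python
def remaining_wait(refresh_interval: float, elapsed: float) -> float:
    """Return the seconds still to wait before the next update.

    refresh_interval: minimum spacing between two updates, in seconds.
    elapsed: seconds already spent in the blocking catalog query.
    The result is clamped at zero: if the query took longer than the
    interval, the next update may start immediately.
    """
    return max(0.0, refresh_interval - elapsed)


if __name__ == "__main__":
    # Already waited 29s of a 30s interval: only the remainder is slept.
    print(remaining_wait(30, 29))
    # The blocking call outlasted the interval: no extra wait.
    print(remaining_wait(30, 45))
```

Clamping at zero is a choice of this sketch; the commit message only states that the elapsed seconds are subtracted.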

121 commits
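
Note on 93a63ac5fd: the example output of `/api/v1/status/flags` in that commit is a JSON envelope with a `status` field and a `data` object mapping flag names to string values. A client could unpack it roughly as follows. This is a sketch under that assumption only; `parse_flags` is a hypothetical helper, and the sample body is a shortened copy of the commit's example rather than a live response.

```python
import json


def parse_flags(body: str) -> dict:
    """Decode a status/flags response body and return the flag mapping.

    Raises ValueError when the envelope does not report success.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("unexpected status: %r" % payload.get("status"))
    return payload["data"]


# Shortened copy of the example output from the commit message.
SAMPLE_BODY = """
{
  "status": "success",
  "data": {
    "query.timeout": "2m",
    "storage.tsdb.retention": "15d",
    "web.listen-address": "0.0.0.0:9090"
  }
}
"""

if __name__ == "__main__":
    flags = parse_flags(SAMPLE_BODY)
    for name in sorted(flags):
        print(name, "=", flags[name])
```

All values arrive as strings, including booleans and durations, so any conversion is left to the caller.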