prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-09 23:24:05 -08:00

Author	SHA1	Message	Date
Julius Volz	fe10b36b30	Fix curl example for deleting series (#4046)	2018-04-05 13:06:18 +01:00
Philippe Laflamme	2aba238f31	Use common HTTPClientConfig for marathon_sd configuration (#4009) This adds support for basic authentication, which closes #3090. The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`. DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this. Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.	2018-04-05 09:08:18 +01:00
albatross0	0245fd55bf	Add a machine type label to GCE SD (#4032)	2018-03-31 09:20:19 +01:00
Kristiyan Nikolov	be85ba3842	discovery/ec2: Support filtering instances in discovery (#4011)	2018-03-31 07:51:11 +01:00
Corentin Chary	60dafd425c	consul: improve consul service discovery (#3814)

* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allows filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog on a large cluster would basically change all the time. No need to discover targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|1236|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|0.03|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|80|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses 100kiB and 23kiB/s per service per job and quite a lot of CPU. It also sends an additional 1Mbps of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation: there won't be more than one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query).

This also makes sure we don't wait another 30s if we already waited 29s in the blocking call by subtracting the number of elapsed seconds.

Hopefully this will do what people expect and will be safer for existing consul infrastructures.	2018-03-23 14:48:43 +00:00
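
The `refresh_interval` tweak described in the commit above boils down to waiting only for whatever part of the interval has not already been spent in the blocking call. A minimal sketch of that arithmetic follows; the function name `remaining_wait` is hypothetical and this is not the actual Go code from the Prometheus consul discovery package.

```python
def remaining_wait(refresh_interval: float, elapsed: float) -> float:
    """Return how many seconds are still left to wait.

    refresh_interval: minimum spacing between two updates, in seconds.
    elapsed: seconds already spent in the blocking catalog call.
    The result is clamped at zero so a slow call never yields a negative wait.
    """
    return max(0.0, refresh_interval - elapsed)


# With a 30s interval and 29s already spent blocking, only 1s remains.
short_wait = remaining_wait(30.0, 29.0)
# If the blocking call took longer than the interval, no extra wait is needed.
no_wait = remaining_wait(30.0, 45.0)
```

Clamping at zero is a design choice of this sketch; it keeps the update rate at no more than one per interval without ever delaying an update that is already overdue.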
Yecheng Fu	56ed29fbf7	Map target infos of endpoints to prometheus meta labels. (#3770)	2018-03-09 10:07:00 +00:00
Fabian Reinartz	3e6c890aea	api: add flag to skip head on snapshots	2018-03-08 13:07:12 +01:00
Jeffrey Zhang	21f96caab3	Fix wrong syntax for alert field templates (#3883)	2018-02-24 09:37:43 +00:00
Conor Broderick	99006d3baf	Added dropped targets API to targets endpoint (#3870)	2018-02-21 17:26:18 +00:00
Conor Broderick	1fd20fc954	Add dropped alertmanagers to alertmanagers API (#3865)	2018-02-21 09:00:07 +00:00
Bartek Plotka	93a63ac5fd	api: Added v1/status/flags endpoint. (#3864)

Endpoint URL: /api/v1/status/flags

Example Output:

```json
{
  "status": "success",
  "data": {
    "alertmanager.notification-queue-capacity": "10000",
    "alertmanager.timeout": "10s",
    "completion-bash": "false",
    "completion-script-bash": "false",
    "completion-script-zsh": "false",
    "config.file": "my_cool_prometheus.yaml",
    "help": "false",
    "help-long": "false",
    "help-man": "false",
    "log.level": "info",
    "query.lookback-delta": "5m",
    "query.max-concurrency": "20",
    "query.timeout": "2m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "false",
    "storage.tsdb.path": "data/",
    "storage.tsdb.retention": "15d",
    "version": "false",
    "web.console.libraries": "console_libraries",
    "web.console.templates": "consoles",
    "web.enable-admin-api": "false",
    "web.enable-lifecycle": "false",
    "web.external-url": "",
    "web.listen-address": "0.0.0.0:9090",
    "web.max-connections": "512",
    "web.read-timeout": "5m",
    "web.route-prefix": "/",
    "web.user-assets": ""
  }
}
```

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-02-21 08:49:02 +00:00
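
As a rough illustration of how a client might consume the flags payload shown in the commit above, the sketch below parses an abridged copy of that example output. The helper name `parse_flags` is hypothetical, and no HTTP request is made; fetching the body from a running server is left out on purpose.

```python
import json

# Abridged copy of the example output from the commit message above.
EXAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "log.level": "info",
    "query.timeout": "2m",
    "storage.tsdb.retention": "15d",
    "web.enable-admin-api": "false",
    "web.listen-address": "0.0.0.0:9090"
  }
}
"""


def parse_flags(body: str) -> dict:
    """Decode a flags response body and return the flag-name -> value mapping."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("unexpected status: %r" % payload.get("status"))
    return payload["data"]


flags = parse_flags(EXAMPLE_RESPONSE)
# All values arrive as strings, so booleans have to be compared textually.
admin_api_enabled = flags["web.enable-admin-api"] == "true"
```

Note that every value in the example output is a string, including booleans and durations, so callers need to convert them explicitly.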
Pedro Araújo	575f665944	Add OS type meta label to Azure SD (#3863)

There is currently no way to differentiate Windows instances from Linux ones. This is needed when you have a mix of node_exporters / wmi_exporters for OS-level metrics and you want to have them in separate scrape jobs.

This change allows you to do just that. Example:

```
- job_name: 'node'
  azure_sd_configs:
    - <azure_sd_config>
  relabel_configs:
    - source_labels: [__meta_azure_machine_os_type]
      regex: Linux
      action: keep
```

The way the vendor'd AzureSDK provides to get the OsType is a bit awkward - as far as I can tell, this information can only be gotten from the startup disk. Newer versions of the SDK appear to improve this a bit (by having OS information in the InstanceView), but the current way still works.	2018-02-19 15:40:57 +00:00
Andrea Giardini	3a9637fa3c	docs: Fix remote_read/remote_timeout default (#3829)	2018-02-12 12:52:33 +00:00
Brian Brazil	66b8bdbf4a	Fix docs for #3820 (#3823)	2018-02-11 23:35:08 +00:00
Ben Kochie	40acc632bb	Merge pull request #3505 from rdemachkovych/ansible_prom2.0 Added to documentation Ansible roles for Prometheus 2.0	2018-01-26 11:30:15 +01:00
Roman Demachkovych	8bfc611616	Remove not maintained roles	2018-01-26 09:46:44 +01:00
zemek	8a01a0fbed	Set consul server default to localhost:8500 (#3703)	2018-01-24 12:14:32 +00:00
James Turnbull	00f4821178	Added missing ingress from role list (#3666)	2018-01-08 21:23:01 +00:00
James Turnbull	380cacd3a4	Readability edits to vector matching (#3624) * Added L3 headings - makes page a little easier to read * Made use of right-hand and left-hand consistent	2017-12-26 10:28:39 +00:00
Brian Brazil	fba80da635	Fix default of read_recent to be false. (#3617) This is what is documented in the migration guide, and the default settings should make sense for a true long term storage. Document the setting.	2017-12-23 17:21:38 +00:00
James Turnbull	c3f9238756	Updated alert templating docs (#3596) The docs suggest that alert templating only works in the summary and description annotation fields. Some testing and a review of the code suggests this is no longer true and that you can template any annotation field.	2017-12-19 08:04:06 +00:00
Brian Brazil	9083d41d3a	Add 2.0 stability guarantees (#3484) As discussed, generally consider SDs as unstable, as realistically they are never going to be stable. Drop the words "experimental/beta" from most places in the docs, as users are getting the wrong impression from this.	2017-12-14 12:54:32 +00:00
Simon Pasquier	aa25dff1ea	Update the openstack_sd_config section openstack_sd_config requires a 'role' parameter which wasn't documented.	2017-12-14 12:20:28 +00:00
Krasi Georgiev	08ee713c82	example to show the difference between "sum by" and "sum without" (#3558)	2017-12-14 12:20:28 +00:00
vthriller	b4bd91958a	[minor] docs: recording_rules: fix missing key	2017-12-14 12:20:28 +00:00
Tobias Schmidt	28205f5ca9	Remove wrong statement about alertmanager URL configuration	2017-12-14 12:20:28 +00:00
Mike Rostermund	4648f4c156	New server uses read protocol, to eh, read. (#3444)	2017-12-14 12:20:28 +00:00
Brian Brazil	e0711c2e9b	Document consul sd tls_config (#3440) Fixes https://github.com/prometheus/docs/issues/681	2017-12-14 12:20:28 +00:00
Tom Wilkie	d2f6803d14	'Prometheus lifecycle' should be a subsection of 'Miscellaneous'	2017-12-14 12:20:28 +00:00
Or Elimelech	6e8d192ba0	Wrong URL for remote.proto (#3431) Change wrong URL for remote.proto	2017-12-14 12:20:28 +00:00
phyber	013dc30dee	Fix markdown in recording rules. (#3432) Resolves an issue where rendered markdown was incorrect.	2017-12-14 12:20:28 +00:00
Tobias Schmidt	87f5fe3576	Fix migration documentation title in docs menu	2017-12-14 12:20:28 +00:00
Brian Brazil	5dff97639f	Tweak migration doc (#3430)	2017-12-14 12:20:28 +00:00
Jose Donizetti	b3b6538348	Small changes to migration guide	2017-12-14 12:20:28 +00:00
Goutham Veeramachaneni	bee6864c14	Make the date returned by snapshot script friendly Fixes #3568 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-10 15:14:31 -06:00
Goutham Veeramachaneni	e0d917e2f5	Merge pull request #3523 from Gouthamve/clean-tomb Add endpoint to cleanup tombstones	2017-12-07 14:39:24 -06:00
Goutham Veeramachaneni	f0599d4dbf	Incorporate review-feedback Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-07 09:06:04 -06:00
James Turnbull	330735aca6	Added another full link to the configuration docs (#3553)	2017-12-07 08:31:15 +00:00
Amy Holt	607a675617	Add prefix to 3 relative URLs (#3551)	2017-12-06 21:16:53 +00:00
Goutham Veeramachaneni	311edc5a38	Merge branch 'master' into clean-tomb Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-05 10:23:21 -06:00
Goutham Veeramachaneni	d8515b2580	Move Admin APIs to v1 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-04 00:13:43 +05:30
Goutham Veeramachaneni	41b8f1f8fe	Add admin API docs Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-12-02 15:37:31 +05:30
Matthias Rampke	cae4538b3e	Docs: state that all regular expressions are RE2. (#3518) We already mentioned that regular expressions are RE2 for [relabeling][0], but left open what the regular expression syntax anywhere else is. In the querying examples and reference, make it explicit that _all_ regular expressions are RE2. [0]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config	2017-12-01 17:26:06 +00:00
Roman Demachkovych	e0ad66f5a6	fix link name	2017-11-27 18:22:27 +01:00
Roman Demachkovych	370d045f5d	Change repo link	2017-11-27 18:14:12 +01:00
James Turnbull	47311bf005	Update configuration.md (#3513) 1. Removed https://prometheus.io prefix 2. Fixed broken file discovery link.	2017-11-27 14:52:32 +00:00
Tom Wilkie	9d4e332137	Merge pull request #3495 from tomwilkie/pod-uid-discovery-master Include Pod UID in the discovery metadata.	2017-11-24 15:37:57 +00:00
Tom Wilkie	7d4f7c4b71	Update docs for __meta_kubernetes_pod_uid	2017-11-24 15:02:53 +00:00
Roman Demachkovych	5e243bc556	fix link	2017-11-22 16:26:06 +01:00
Roman Demachkovych	b758039f80	Added in to documentation Ansible roles for Prometheus 2.0	2017-11-22 16:15:46 +01:00

75 commits