# prometheus/config/testdata/conf.good.yml

# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).

  external_labels:
    monitor: codelab
    foo: bar
rule_files:
- "first.rules"
- "my/*.rules"
remote_write:
  - url: http://remote1/push
    name: drop_expensive
    write_relabel_configs:
      - source_labels: [__name__]
        regex: expensive.*
        action: drop
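      # Write relabeling applies before samples are sent: series whose metric name matches expensive.* are dropped for this endpoint.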
  - url: http://remote2/push
    name: rw_tls
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file
remote_read:
  - url: http://remote1/read
    read_recent: true
    name: default
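    # read_recent: true means this endpoint is also queried for time ranges that local storage is expected to cover.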
  - url: http://remote3/read
    read_recent: false
    name: read_special
    required_matchers:
      job: special
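    # Only queries whose selectors include the equality matcher job="special" use this read endpoint.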
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file
scrape_configs:
  - job_name: prometheus
    honor_labels: true
    # scrape_interval is defined by the configured global (15s).
    # scrape_timeout is defined by the global default (10s).
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    file_sd_configs:
      - files:
          - foo/*.slow.json
          - foo/*.slow.yml
          - single/file.yml
        refresh_interval: 10m
      - files:
          - bar/*.yaml

    static_configs:
      - targets: ['localhost:9090', 'localhost:9191']
        labels:
          my: label
          your: label

    relabel_configs:
      - source_labels: [job, __meta_dns_name]
        regex: (.*)some-[regex]
        target_label: job
        replacement: foo-${1}
        # action defaults to 'replace'
      - source_labels: [abc]
        target_label: cde
      - replacement: static
        target_label: abc
      - regex:
        replacement: static
        target_label: abc
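      # With the default action 'replace' and default regex '(.*)', the last two rules unconditionally set the label 'abc' to the static value 'static'.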
    bearer_token_file: valid_token_file
  - job_name: service-x

    basic_auth:
      username: admin_name
      password: "multiline\nmysecret\ntest"

    scrape_interval: 50s
    scrape_timeout: 5s

    sample_limit: 1000

    metrics_path: /my_path
    scheme: https

    dns_sd_configs:
      - refresh_interval: 15s
        names:
          - first.dns.address.domain.com
          - second.dns.address.domain.com
      - names:
          - first.dns.address.domain.com
        # refresh_interval defaults to 30s.

    relabel_configs:
      - source_labels: [job]
        regex: (.*)some-[regex]
        action: drop
      - source_labels: [__address__]
        modulus: 8
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: 1
        action: keep
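      # The hashmod/keep pair shards the job: only targets whose __address__ hashes to 1 modulo 8 (roughly one in eight) are kept.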
      - action: labelmap
        regex: 1
      - action: labeldrop
        regex: d
      - action: labelkeep
        regex: k
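      # labelmap copies labels whose names match the regex to names given by the replacement (default '$1'); labeldrop and labelkeep remove or retain labels whose names match.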
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: expensive_metric.*
        action: drop

  - job_name: service-y

    consul_sd_configs:
      - server: 'localhost:1234'
        token: mysecret
        services: ['nginx', 'cache', 'mysql']
        tags: ["canary", "v1"]
        node_meta:
          rack: "123"
        allow_stale: true
        scheme: https
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file
          insecure_skip_verify: false
    relabel_configs:
      - source_labels: [__meta_sd_consul_tags]
        separator: ','
        regex: label:([^=]+)=([^,]+)
        target_label: ${1}
        replacement: ${2}
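      # Test rule meant to map a tag of the form label:<name>=<value> to a target label named <name> with value <value>.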
  - job_name: service-z

    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file

    bearer_token: mysecret

  - job_name: service-kubernetes

    kubernetes_sd_configs:
      - role: endpoints
        api_server: 'https://localhost:1234'
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file
        basic_auth:
          username: 'myusername'
          password: 'mysecret'
  - job_name: service-kubernetes-namespaces

    kubernetes_sd_configs:
      - role: endpoints
        api_server: 'https://localhost:1234'
        namespaces:
          names:
            - default

    basic_auth:
      username: 'myusername'
      password_file: valid_password_file

  - job_name: service-marathon
    marathon_sd_configs:
      - servers:
          - 'https://marathon.example.com:443'

        auth_token: "mysecret"
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file
  - job_name: service-ec2
    ec2_sd_configs:
      - region: us-east-1
        access_key: access
        secret_key: mysecret
        profile: profile
        filters:
          - name: tag:environment
            values:
              - prod
          - name: tag:service
            values:
              - web
              - db
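        # Filters behave as on the EC2 API: an instance must match every filter, and matching any listed value satisfies a filter.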
  - job_name: service-azure
    azure_sd_configs:
      - environment: AzurePublicCloud
        authentication_method: OAuth
        subscription_id: 11AAAA11-A11A-111A-A111-1111A1111A11
        tenant_id: BBBB222B-B2B2-2B22-B222-2BB2222BB2B2
        client_id: 333333CC-3C33-3333-CCC3-33C3CCCCC33C
        client_secret: mysecret
        port: 9100

  - job_name: service-nerve
    nerve_sd_configs:
      - servers:
          - localhost
        paths:
          - /monitoring

  - job_name: 0123service-xxx
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090

  - job_name: badfederation
    honor_timestamps: false
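    # With honor_timestamps disabled, Prometheus ignores timestamps exposed by the federate endpoint and records samples at scrape time.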
    metrics_path: /federate
    static_configs:
      - targets:
          - localhost:9090

  - job_name: 測試
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090
  - job_name: service-triton
    triton_sd_configs:
      - account: 'testAccount'
        dns_suffix: 'triton.example.com'
        endpoint: 'triton.example.com'
        port: 9163
        refresh_interval: 1m
        version: 1
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file
  - job_name: digitalocean-droplets
    digitalocean_sd_configs:
      - bearer_token: abcdef

  - job_name: dockerswarm
    dockerswarm_sd_configs:
      - host: http://127.0.0.1:2375
        role: nodes

  - job_name: service-openstack
    openstack_sd_configs:
      - role: instance
        region: RegionOne
        port: 80
        refresh_interval: 1m
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file
alerting:
  alertmanagers:
    - scheme: https
      static_configs:
        - targets:
            - "1.2.3.4:9093"
            - "1.2.3.5:9093"
            - "1.2.3.6:9093"