* consul: improve consul service discovery
Related to #3711
- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`).
Tags and nore-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
because on large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
reads can be costly, even more when you are doing them to a remote
datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
service endpoint. This is needed because of that kind of behavior from
consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
on a large cluster would basically change *all* the time. No need to discover
targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.
Benchmarks
----------
```yaml
scrape_configs:
- job_name: prometheus
scrape_interval: 60s
static_configs:
- targets: ["127.0.0.1:9090"]
- job_name: "observability-by-tag"
scrape_interval: "60s"
metrics_path: "/metrics"
consul_sd_configs:
- server: consul.service.par.consul.prod.crto.in:8500
tag: marathon-user-observability # Used in After
refresh_interval: 30s # Used in After+delay
relabel_configs:
- source_labels: [__meta_consul_tags]
regex: ^(.*,)?marathon-user-observability(,.*)?$
action: keep
- job_name: "observability-by-name"
scrape_interval: "60s"
metrics_path: "/metrics"
consul_sd_configs:
- server: consul.service.par.consul.prod.crto.in:8500
services:
- observability-cerebro
- observability-portal-web
- job_name: "fake-fake-fake"
scrape_interval: "15s"
metrics_path: "/metrics"
consul_sd_configs:
- server: consul.service.par.consul.prod.crto.in:8500
services:
- fake-fake-fake
```
Note: tested with ~1200 services, ~5000 nodes.
| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|
Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.
* consul: tweak `refresh_interval` behavior
`refresh_interval` now does what is advertised in the documentation,
there won't be more that one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).
This also make sure we don't wait another 30s if we already waited 29s
in the blocking call by substracting the number of elapsed seconds.
Hopefully this will do what people expect it does and will be safer
for existing consul infrastructures.
* web: replace deprecated InstrumentHandler()
This change replaces the deprecated InstrumentHandler function by the
equivalent functions from the promhttp package.
The following metrics are removed:
* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).
And the following metrics are added instead:
* prometheus_http_request_duration_seconds (Histogram).
* prometheus_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).
* Update github.com/prometheus/common/route package
* web: refactor using the new prometheus/common/route package
* parser test cleanup
- Test against the exported package functions instead of the private functions.
* Improves readability of TestParseSeries
- Moves package function closer to parser function
This is based on my experience while debugging https://github.com/prometheus/prometheus/issues/3943.
I needed to deduct few things, and all that would be just bit easier with these two logs:
- new block's ULID on each compaction.
- actual list of Blocks (ulid + time range) on Prometheus startup (easy to log that while repairing blocks).
We don't really need blocks that takes part in compaction - that can be deducted easily based on time ranges of blocks we have currently in system.
What do you think?
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>