The Prometheus monitoring system and time series database.
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allows filtering by node-meta and returns a `map[string]string` of `service`->`tags`).
  Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large clusters `/catalog/services` changes all the time.
- Add an `allow_stale` configuration option to allow stale reads. Non-stale
  reads can be costly, even more so when they are made against a remote
  datacenter with 10k+ targets over the WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of this kind of Consul behavior:
  https://github.com/hashicorp/consul/issues/3712, and because the catalog
  on a large cluster basically changes *all* the time. There is no need to discover
  targets every second if we scrape them every minute. (The new options are combined
  in the sketch after this list.)
- Added plenty of unit tests.
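
For reference, here is a minimal sketch of how the new options can be combined in a single `consul_sd_configs` entry (the server address and filter values are placeholders; the `node_meta` key/value map follows the description above):

```yaml
scrape_configs:
  - job_name: "consul-discovered"
    consul_sd_configs:
      - server: "consul.example.org:8500"  # placeholder Consul agent/server address
        services: []                       # empty list: discover all services
        tag: "prometheus-scrape"           # keep only services carrying this tag
        node_meta:                         # keep only nodes with this metadata
          rack: "r1"
        allow_stale: true                  # stale reads avoid hitting the Consul leader
        refresh_interval: 30s              # at most one catalog refresh per interval
```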

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
| `/service-discovery` page size | 5K | 85MiB | 27k | 27k |
| `go_memstats_heap_objects` | 100k | 1M | 120k | 110k |
| `go_memstats_heap_alloc_bytes` | 24MB | 150MB | 28MB | 27MB |
| `rate(go_memstats_alloc_bytes_total[5m])` | 0.2MB/s | 28MB/s | 2MB/s | 0.3MB/s |
| `rate(process_cpu_seconds_total[5m])` | 0.1% | 15% | 2% | 0.01% |
| `process_open_fds` | 16 | *1236* | 22 | 22 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])` | ~0 | 1 | 1 | *0.03* |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])` | 0.1 | *80* | 0.5 | 0.5 |
| `prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}` | N/A | 200ms | 0.2ms | 0.2ms |
| Network bandwidth | ~10kbps | ~2.8Mbps | ~1.6Mbps | ~10kbps |

Filtering by tag using `relabel_configs` uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. It also sends an additional *1Mbps* of traffic to Consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation:
there won't be more than one update per `refresh_interval`. It now
defaults to 30s (which was also the existing waitTime in the Consul query).

This also makes sure we don't wait another 30s if we already waited 29s
in the blocking call, by subtracting the elapsed time.

Hopefully this will do what people expect it to do and will be safer
for existing Consul infrastructures.
2018-03-23 14:48:43 +00:00
.github Update command line flags in issue template (#3317) 2017-10-19 13:59:00 +01:00
cmd Report internal runtime information on status page (#3921) 2018-03-21 16:08:37 +00:00
config consul: improve consul service discovery (#3814) 2018-03-23 14:48:43 +00:00
console_libraries Cut down console template examples to just node and prometheus (#3099) 2017-08-21 16:35:20 +01:00
consoles Fix rendering issues with console templates. (#3744) 2018-01-29 10:38:39 +00:00
discovery consul: improve consul service discovery (#3814) 2018-03-23 14:48:43 +00:00
docs consul: improve consul service discovery (#3814) 2018-03-23 14:48:43 +00:00
documentation minor yaml indentation consistency fix in example configs (#3946) 2018-03-11 23:06:13 +00:00
notifier Add dropped alertmanagers to alertmanagers API (#3865) 2018-02-21 09:00:07 +00:00
pkg Merge pull request #3835 from krasi-georgiev/pool-package-generalize 2018-02-28 14:30:46 +01:00
prompb api: add flag to skip head on snapshots 2018-03-08 13:07:12 +01:00
promql Parser test cleanup (#3977) 2018-03-20 14:30:52 +00:00
relabel rename package retrieve to scrape 2018-02-01 09:55:07 +00:00
rules all: remove unnecessary type conversions (#3992) 2018-03-21 09:25:22 +00:00
scrape all: remove unnecessary type conversions (#3992) 2018-03-21 09:25:22 +00:00
scripts revert ot using the gogofast plugin and regenerate grpc server 2018-03-01 11:57:31 +02:00
storage all: remove unnecessary type conversions (#3992) 2018-03-21 09:25:22 +00:00
template template: all text_template settings before parsing (bugfix "nil-pointer dereference") (#3854) 2018-02-17 07:57:25 +00:00
util General simplifications (#3887) 2018-02-26 07:58:10 +00:00
vendor Update vendor golang.org/x/... 2018-03-22 07:59:39 +00:00
web Report internal runtime information on status page (#3921) 2018-03-21 16:08:37 +00:00
.dockerignore New release process using docker, circleci and a centralized 2016-04-18 22:41:04 +02:00
.gitignore cleanup gitignore (#3869) 2018-02-20 11:03:22 +00:00
.promu.yml promu: Use default Go version again 2016-10-11 11:42:05 +02:00
.travis.yml Check for unused vendored packages (#3892) 2018-03-02 10:20:45 +00:00
CHANGELOG.md *: cut v2.2.0 2018-03-08 15:37:46 +01:00
circle.yml bump to golang 1.10 (#3856) 2018-02-26 09:42:49 +00:00
code-of-conduct.md Add CNCF code of conduct as the Prometheus code of conduct 2016-10-19 21:39:19 +02:00
CONTRIBUTING.md Add section for new contributors 2017-07-27 16:53:34 +05:30
Dockerfile Fix command line flags in Dockerfile 2017-07-13 12:14:49 +02:00
LICENSE Clean up license issues. 2015-01-21 20:07:45 +01:00
MAINTAINERS.md Remove _local storage_ from fabxc's responsibilities again 2017-11-03 12:52:24 +01:00
Makefile web: replace deprecated InstrumentHandler() (#3862) 2018-03-21 08:16:16 +00:00
NOTICE Update NOTICE for gogo/protobuf 2017-11-02 15:28:47 +01:00
README.md bump to golang 1.10 (#3856) 2018-02-26 09:42:49 +00:00
VERSION *: cut v2.2.0 2018-03-08 15:37:46 +01:00

Prometheus

Visit prometheus.io for the full documentation, examples and guides.

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Prometheus' main distinguishing features as compared to other monitoring systems are:

  • a multi-dimensional data model (timeseries defined by metric name and set of key/value dimensions)
  • a flexible query language to leverage this dimensionality
  • no dependency on distributed storage; single server nodes are autonomous
  • timeseries collection happens via a pull model over HTTP
  • pushing timeseries is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support
  • support for hierarchical and horizontal federation

Architecture overview

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Debian packages are available.

Docker images

Docker images are available on Quay.io.

You can launch a Prometheus container for trying it out with

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 quay.io/prometheus/prometheus

Prometheus will now be reachable at http://localhost:9090/.

Building from source

To build Prometheus from the source code yourself, you need a working Go environment with version 1.10 or greater installed.

You can directly use the go tool to download and install the prometheus and promtool binaries into your GOPATH:

$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus --config.file=your_config.yml
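
If you don't have a configuration file yet, a minimal `your_config.yml` that scrapes Prometheus itself could look like the following sketch (adjust the target to your environment):

```yaml
global:
  scrape_interval: 15s               # how frequently to scrape targets

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]  # Prometheus's own metrics endpoint
```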

You can also clone the repository yourself and build using make:

$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • assets: rebuild the static assets
  • docker: build a docker container for the current HEAD

More information

  • The source code is periodically indexed: Prometheus Core.
  • You will find a Travis CI configuration in .travis.yml.
  • See the Community page for how to reach the Prometheus developers and users on various communication channels.

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.