The Prometheus monitoring system and time series database.
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allows filtering by node-meta and returns a `map[string]string` of `service`->`tags`).
  Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large clusters `/catalog/services` changes all the time.
- Add an `allow_stale` configuration option to allow stale reads. Non-stale
  reads can be costly, even more so when they are made against a remote
  datacenter with 10k+ targets over the WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of this kind of Consul behavior:
  https://github.com/hashicorp/consul/issues/3712, and because the catalog
  on a large cluster basically changes *all* the time. There is no need to discover
  targets every second if we scrape them every minute. (The new options are combined
  in the sketch after this list.)
- Added plenty of unit tests.
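
For reference, here is a minimal sketch of how the new options can be combined in a single `consul_sd_configs` entry (the server address and filter values are placeholders; the `node_meta` key/value map follows the description above):

```yaml
scrape_configs:
  - job_name: "consul-discovered"
    consul_sd_configs:
      - server: "consul.example.org:8500"  # placeholder Consul agent/server address
        services: []                       # empty list: discover all services
        tag: "prometheus-scrape"           # keep only services carrying this tag
        node_meta:                         # keep only nodes with this metadata
          rack: "r1"
        allow_stale: true                  # stale reads avoid hitting the Consul leader
        refresh_interval: 30s              # at most one catalog refresh per interval
```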

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
| `/service-discovery` page size | 5K | 85MiB | 27k | 27k |
| `go_memstats_heap_objects` | 100k | 1M | 120k | 110k |
| `go_memstats_heap_alloc_bytes` | 24MB | 150MB | 28MB | 27MB |
| `rate(go_memstats_alloc_bytes_total[5m])` | 0.2MB/s | 28MB/s | 2MB/s | 0.3MB/s |
| `rate(process_cpu_seconds_total[5m])` | 0.1% | 15% | 2% | 0.01% |
| `process_open_fds` | 16 | *1236* | 22 | 22 |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])` | ~0 | 1 | 1 | *0.03* |
| `rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])` | 0.1 | *80* | 0.5 | 0.5 |
| `prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}` | N/A | 200ms | 0.2ms | 0.2ms |
| Network bandwidth | ~10kbps | ~2.8Mbps | ~1.6Mbps | ~10kbps |

Filtering by tag using `relabel_configs` uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. It also sends an additional *1Mbps* of traffic to Consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation:
there won't be more than one update per `refresh_interval`. It now
defaults to 30s (which was also the existing waitTime in the Consul query).

This also makes sure we don't wait another 30s if we already waited 29s
in the blocking call, by subtracting the elapsed time.

Hopefully this will do what people expect it to do and will be safer
for existing Consul infrastructures.
2018-03-23 14:48:43 +00:00
.github Update command line flags in issue template (#3317) 2017-10-19 13:59:00 +01:00
cmd Report internal runtime information on status page (#3921) 2018-03-21 16:08:37 +00:00
config consul: improve consul service discovery (#3814) 2018-03-23 14:48:43 +00:00
console_libraries Cut down console template examples to just node and prometheus (#3099) 2017-08-21 16:35:20 +01:00
consoles Fix rendering issues with console templates. (#3744) 2018-01-29 10:38:39 +00:00
discovery consul: improve consul service discovery (#3814) 2018-03-23 14:48:43 +00:00
docs consul: improve consul service discovery (#3814) 2018-03-23 14:48:43 +00:00
documentation minor yaml indentation consistency fix in example configs (#3946) 2018-03-11 23:06:13 +00:00
notifier Add dropped alertmanagers to alertmanagers API (#3865) 2018-02-21 09:00:07 +00:00
pkg Merge pull request #3835 from krasi-georgiev/pool-package-generalize 2018-02-28 14:30:46 +01:00
prompb api: add flag to skip head on snapshots 2018-03-08 13:07:12 +01:00
promql Parser test cleanup (#3977) 2018-03-20 14:30:52 +00:00
relabel rename package retrieve to scrape 2018-02-01 09:55:07 +00:00
rules all: remove unnecessary type conversions (#3992) 2018-03-21 09:25:22 +00:00
scrape all: remove unnecessary type conversions (#3992) 2018-03-21 09:25:22 +00:00
scripts revert ot using the gogofast plugin and regenerate grpc server 2018-03-01 11:57:31 +02:00
storage all: remove unnecessary type conversions (#3992) 2018-03-21 09:25:22 +00:00
template template: all text_template settings before parsing (bugfix "nil-pointer dereference") (#3854) 2018-02-17 07:57:25 +00:00
util General simplifications (#3887) 2018-02-26 07:58:10 +00:00
vendor Update vendor golang.org/x/... 2018-03-22 07:59:39 +00:00
web Report internal runtime information on status page (#3921) 2018-03-21 16:08:37 +00:00
.dockerignore New release process using docker, circleci and a centralized 2016-04-18 22:41:04 +02:00
.gitignore cleanup gitignore (#3869) 2018-02-20 11:03:22 +00:00
.promu.yml promu: Use default Go version again 2016-10-11 11:42:05 +02:00
.travis.yml Check for unused vendored packages (#3892) 2018-03-02 10:20:45 +00:00
CHANGELOG.md *: cut v2.2.0 2018-03-08 15:37:46 +01:00
circle.yml bump to golang 1.10 (#3856) 2018-02-26 09:42:49 +00:00
code-of-conduct.md Add CNCF code of conduct as the Prometheus code of conduct 2016-10-19 21:39:19 +02:00
CONTRIBUTING.md Add section for new contributors 2017-07-27 16:53:34 +05:30
Dockerfile Fix command line flags in Dockerfile 2017-07-13 12:14:49 +02:00
LICENSE Clean up license issues. 2015-01-21 20:07:45 +01:00
MAINTAINERS.md Remove _local storage_ from fabxc's responsibilities again 2017-11-03 12:52:24 +01:00
Makefile web: replace deprecated InstrumentHandler() (#3862) 2018-03-21 08:16:16 +00:00
NOTICE Update NOTICE for gogo/protobuf 2017-11-02 15:28:47 +01:00
README.md bump to golang 1.10 (#3856) 2018-02-26 09:42:49 +00:00
VERSION *: cut v2.2.0 2018-03-08 15:37:46 +01:00

Prometheus

Visit prometheus.io for the full documentation, examples and guides.

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Prometheus' main distinguishing features as compared to other monitoring systems are:

  • a multi-dimensional data model (timeseries defined by metric name and set of key/value dimensions)
  • a flexible query language to leverage this dimensionality
  • no dependency on distributed storage; single server nodes are autonomous
  • timeseries collection happens via a pull model over HTTP
  • pushing timeseries is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support
  • support for hierarchical and horizontal federation

Architecture overview

Install

There are various ways of installing Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.

Debian packages are available.

Docker images

Docker images are available on Quay.io.

You can launch a Prometheus container for trying it out with

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 quay.io/prometheus/prometheus

Prometheus will now be reachable at http://localhost:9090/.

Building from source

To build Prometheus from the source code yourself, you need a working Go environment with version 1.10 or greater installed.

You can directly use the go tool to download and install the prometheus and promtool binaries into your GOPATH:

$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus --config.file=your_config.yml
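
If you don't have a configuration file yet, a minimal `your_config.yml` that scrapes Prometheus itself could look like the following sketch (adjust the target to your environment):

```yaml
global:
  scrape_interval: 15s               # how frequently to scrape targets

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]  # Prometheus's own metrics endpoint
```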

You can also clone the repository yourself and build using make:

$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • assets: rebuild the static assets
  • docker: build a docker container for the current HEAD

More information

  • The source code is periodically indexed: Prometheus Core.
  • You will find a Travis CI configuration in .travis.yml.
  • See the Community page for how to reach the Prometheus developers and users on various communication channels.

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.