3369422327
The problem reported in #2799 was that in the event that all records for a name were removed, the target group was never updated to be the "empty" set. Essentially, whatever Prometheus last saw as a non-empty list of targets would stay that way forever (or at least until Prometheus restarted...). This came about because of a fairly naive interpretation of what a valid-looking DNS response actually looked like -- essentially, the only valid DNS responses were ones that had a non-empty record list. That's fine as long as your config always lists only target names which have non-empty record sets; if your environment happens to legitimately have empty record sets sometimes, all hell breaks loose (otherwise-cleanly shutdown systems trigger up==0 alerts, for instance). This patch is a refactoring of the DNS lookup behaviour that maintains existing behaviour with regard to search paths, but correctly handles empty and non-existent record sets. RFC1034 s4.3.1 says there's three ways a recursive DNS server can respond: 1. Here is your answer (possibly an empty answer, because of the way DNS considers all records for a name, regardless of type, when deciding whether the name exists). 2. There is no spoon (the name you asked for definitely does not exist). 3. I am a teapot (something has gone terribly wrong). Situations 1 and 2 are fine and dandy; whatever the answer is (empty or otherwise) is the list of targets. If something has gone wrong, then we shouldn't go updating the target list because we don't really *know* what the target list should be. Multiple DNS servers to query is a straightforward augmentation; if you get an error, then try the next server in the list, until you get an answer or run out servers to ask. Only if *all* the servers return errors should you return an error to the calling code. Where things get complicated is the search path. In order to be able to confidently say, "this name does not exist anywhere, you can remove all the targets for this name because it's definitely GORN", at least one server for *all* the possible names need to return either successful-but-empty responses, or NXDOMAIN. If any name errors out, then -- since that one might have been the one where the records came from -- you need to say "maintain the status quo until we get a known-good response". It is possible, though unlikely, that a poorly-configured DNS setup (say, one which had a domain in its search path for which all configured recursive resolvers respond with REFUSED) could result in the same "stuck" records problem we're solving here, but the DNS configuration should be fixed in that case, and there's nothing we can do in Prometheus itself to fix the problem. I've tested this patch on a local scratch instance in all the various ways I can think of: 1. Adding records (targets get scraped) 2. Adding records of a different type 3. Remove records of the requested type, leaving other type records intact (targets don't get scraped) 4. Remove all records for the name (targets don't get scraped) 5. Shutdown the resolver (targets still get scraped) There's no automated test suite additions, because there isn't a test suite for DNS discovery, and I was stretching my Go skills to the limit to make this happen; mock objects are beyond me. |
||
---|---|---|
.github | ||
cmd | ||
config | ||
console_libraries | ||
consoles | ||
discovery | ||
documentation | ||
notifier | ||
promql | ||
relabel | ||
retrieval | ||
rules | ||
scripts | ||
storage | ||
template | ||
util | ||
vendor | ||
web | ||
.codeclimate.yml | ||
.dockerignore | ||
.gitignore | ||
.promu.yml | ||
.travis.yml | ||
CHANGELOG.md | ||
circle.yml | ||
code-of-conduct.md | ||
CONTRIBUTING.md | ||
Dockerfile | ||
LICENSE | ||
MAINTAINERS.md | ||
Makefile | ||
NOTICE | ||
README.md | ||
VERSION |
Prometheus
Visit prometheus.io for the full documentation, examples and guides.
Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
Prometheus' main distinguishing features as compared to other monitoring systems are:
- a multi-dimensional data model (timeseries defined by metric name and set of key/value dimensions)
- a flexible query language to leverage this dimensionality
- no dependency on distributed storage; single server nodes are autonomous
- timeseries collection happens via a pull model over HTTP
- pushing timeseries is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
- support for hierarchical and horizontal federation
Architecture overview
Install
There are various ways of installing Prometheus.
Precompiled binaries
Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way of installing Prometheus. See the Installing chapter in the documentation for all the details.
Debian packages are available.
Docker images
Docker images are available on Quay.io.
You can launch a Prometheus container for trying it out with
$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 quay.io/prometheus/prometheus
Prometheus will now be reachable at http://localhost:9090/.
Building from source
To build Prometheus from the source code yourself you need to have a working Go environment with version 1.8 or greater installed.
You can directly use the go
tool to download and install the prometheus
and promtool
binaries into your GOPATH
. We use Go 1.5's experimental
vendoring feature, so you will also need to set the GO15VENDOREXPERIMENT=1
environment variable in this case:
$ GO15VENDOREXPERIMENT=1 go get github.com/prometheus/prometheus/cmd/...
$ prometheus -config.file=your_config.yml
You can also clone the repository yourself and build using make
:
$ mkdir -p $GOPATH/src/github.com/prometheus
$ cd $GOPATH/src/github.com/prometheus
$ git clone https://github.com/prometheus/prometheus.git
$ cd prometheus
$ make build
$ ./prometheus -config.file=your_config.yml
The Makefile provides several targets:
- build: build the
prometheus
andpromtool
binaries - test: run the tests
- test-short: run the short tests
- format: format the source code
- vet: check the source code for common errors
- assets: rebuild the static assets
- docker: build a docker container for the current
HEAD
More information
- The source code is periodically indexed: Prometheus Core.
- You will find a Travis CI configuration in
.travis.yml
. - See the Community page for how to reach the Prometheus developers and users on various communication channels.
Contributing
Refer to CONTRIBUTING.md
License
Apache License 2.0, see LICENSE.