Commit graph

3997 commits

Author SHA1 Message Date
beorn7 84211bd2df Foward-merge bug fixes and cherry-picks from 'release-1.7' 2017-09-15 13:44:22 +02:00
Matt Palmer 3369422327 Improve DNS response handling to prevent "stuck" records [Fixes #2799] (#3138)
The problem reported in #2799 was that in the event that all records for a
name were removed, the target group was never updated to be the "empty" set.
Essentially, whatever Prometheus last saw as a non-empty list of targets
would stay that way forever (or at least until Prometheus restarted...).  This
came about because of a fairly naive interpretation of what a valid-looking
DNS response actually looked like -- essentially, the only valid DNS responses
were ones that had a non-empty record list.  That's fine as long as your
config always lists only target names which have non-empty record sets; if
your environment happens to legitimately have empty record sets sometimes,
all hell breaks loose (otherwise-cleanly shutdown systems trigger up==0 alerts,
for instance).

This patch is a refactoring of the DNS lookup behaviour that maintains
existing behaviour with regard to search paths, but correctly handles empty
and non-existent record sets.

RFC1034 s4.3.1 says there's three ways a recursive DNS server can respond:

1.  Here is your answer (possibly an empty answer, because of the way DNS
   considers all records for a name, regardless of type, when deciding
   whether the name exists).

2. There is no spoon (the name you asked for definitely does not exist).

3. I am a teapot (something has gone terribly wrong).

Situations 1 and 2 are fine and dandy; whatever the answer is (empty or
otherwise) is the list of targets.  If something has gone wrong, then we
shouldn't go updating the target list because we don't really *know* what
the target list should be.

Multiple DNS servers to query is a straightforward augmentation; if you get
an error, then try the next server in the list, until you get an answer or
run out servers to ask.  Only if *all* the servers return errors should you
return an error to the calling code.

Where things get complicated is the search path.  In order to be able to
confidently say, "this name does not exist anywhere, you can remove all the
targets for this name because it's definitely GORN", at least one server for
*all* the possible names need to return either successful-but-empty
responses, or NXDOMAIN.  If any name errors out, then -- since that one
might have been the one where the records came from -- you need to say
"maintain the status quo until we get a known-good response".

It is possible, though unlikely, that a poorly-configured DNS setup (say,
one which had a domain in its search path for which all configured recursive
resolvers respond with REFUSED) could result in the same "stuck" records
problem we're solving here, but the DNS configuration should be fixed in
that case, and there's nothing we can do in Prometheus itself to fix the
problem.

I've tested this patch on a local scratch instance in all the various ways I
can think of:

1. Adding records (targets get scraped)

2. Adding records of a different type

3. Remove records of the requested type, leaving other type records intact
   (targets don't get scraped)

4. Remove all records for the name (targets don't get scraped)

5. Shutdown the resolver (targets still get scraped)

There's no automated test suite additions, because there isn't a test suite
for DNS discovery, and I was stretching my Go skills to the limit to make
this happen; mock objects are beyond me.
2017-09-15 12:26:10 +02:00
Björn Rabenstein 4b8666b739 Merge pull request #3176 from prometheus/beorn7/release
Backport the templating fix from master
2017-09-14 19:07:52 +02:00
beorn7 a3fd7dd335 Backport the templating fix from master
The original fix is in commit 5f5d77848e
2017-09-14 18:12:00 +02:00
Julius Volz 8ebeed0b44 remote: Expose ClientConfig type (#3165)
The Client type is already exposed, but can't be used without the config for it
also being exposed. Using the remote.Client from other programs is useful to do
full end-to-end tests of Prometheus's remote protocol against adapter
implementations.
2017-09-14 15:25:09 +02:00
Björn Rabenstein df4bc3e407 Merge pull request #3170 from tomwilkie/1.7-2969-negative-shards
Prevent number of remote write shards from going negative.
2017-09-14 13:29:34 +02:00
Tom Wilkie f66f882d08 Merge pull request #3160 from bboreham/remote-keepalive
Re-enable http keepalive on remote storage
2017-09-14 08:23:43 +01:00
Tom Wilkie 4f8efdbd59 Prevent number of remote write shards from going negative.
This can happen in the situation where the system scales up the number of shards massively (to deal with some backlog), then scales it down again as the number of samples sent during the time period is less than the number received.
2017-09-14 08:07:40 +01:00
Ben Kochie 1ab0bbb2c2 Merge pull request #3125 from prometheus/bjk/staticcheck
Enable statitcheck at build time.
2017-09-13 14:42:29 -07:00
Björn Rabenstein 4d8e7ca185 Merge pull request #3159 from mattbostock/1.7_marathon_sd_cherrypick
Marathon SD: Set port index label
2017-09-12 18:53:40 +02:00
Matt Bostock e758260986 Marathon SD: Set port index label
The changes [1][] to Marathon service discovery to support multiple
ports mean that Prometheus now attempts to scrape all ports belonging to
a Marathon service.

You can use port definition or port mapping labels to filter out which
ports to scrape but that requires service owners to update their
Marathon configuration.

To allow for a smoother migration path, add a
`__meta_marathon_port_index` label, whose value is set to the port's
sequential index integer. For example, PORT0 has the value `0`, PORT1
has the value `1`, and so on.

This allows you to support scraping both the first available port (the
previous behaviour) in addition to ports with a `metrics` label.

For example, here's the relabel configuration we might use with
this patch:

    - action: keep
      source_labels: ['__meta_marathon_port_definition_label_metrics', '__meta_marathon_port_mapping_label_metrics', '__meta_marathon_port_index']
      # Keep if port mapping or definition has a 'metrics' label with any
      # non-empty value, or if no 'metrics' port label exists but this is the
      # service's first available port
      regex: ([^;]+;;[^;]+|;[^;]+;[^;]+|;;0)

This assumes that the Marathon API returns the ports in sorted order
(matching PORT0, PORT1, etc), which it appears that it does.

[1]: https://github.com/prometheus/prometheus/pull/2506
2017-09-11 13:40:51 +01:00
Bryan Boreham 9d6b945e41 Default HTTP keep-alive ON for remote read/write 2017-09-11 09:48:30 +00:00
Bryan Boreham e0a4d18301 Allow http keep-alive setting to be overridden in config 2017-09-11 09:07:14 +00:00
Tobias Schmidt 8bee283f8a Merge pull request #2895 from jamiemoore/ec2_discovery_rolearn
Add the ability to assume a role for ec2 discovery
2017-09-09 19:20:47 +02:00
Jamie Moore 7a135e0a1b Add the ability to assume a role for ec2 discovery 2017-09-10 00:36:43 +10:00
Fabian Reinartz 9b4c3d4254 Merge pull request #3146 from prometheus/fixprofpath
web: fix profile paths
2017-09-08 14:19:46 +02:00
Fabian Reinartz 64c7c56df8 Merge pull request #3147 from dvrkps/patch-1
travis: add 1.x to go versions
2017-09-08 09:36:14 +02:00
Davor Kapsa bb853abf24 travis: add 1.x to go versions 2017-09-07 17:24:02 +02:00
Fabian Reinartz 27bdddbf51 web: fix profile paths 2017-09-07 16:24:12 +02:00
Fabian Reinartz 6ab652e3dc Merge pull request #3144 from wgliang/master
should use time.Since instead of time.Now().Sub
2017-09-07 13:51:46 +02:00
Fabian Reinartz a0280cc489 Merge pull request #3142 from prometheus/fish/fix-k8s-ingress-type
k8s: Use versioned struct for ingress discovery
2017-09-07 13:51:20 +02:00
wangguoliang 7e6c6020ff should use time.Since instead of time.Now().Sub
Signed-off-by: wgliang <liangcszzu@163.com>
2017-09-07 18:00:45 +08:00
Johannes 'fish' Ziemke 75aec7d970 k8s: Use versioned struct for ingress discovery 2017-09-06 12:47:03 +02:00
Johannes 'fish' Ziemke 70f3d1e9f9 k8s: Support discovery of ingresses (#3111)
* k8s: Support discovery of ingresses

* Move additional labels below allocation

This makes it more obvious why the additional elements are allocated.
Also fix allocation for node where we only set a single label.

* k8s: Remove port from ingress discovery

* k8s: Add comment to ingress discovery example
2017-09-04 13:10:44 +02:00
Tobias Schmidt 29fff1eca4 Merge pull request #2966 from alkalinecoffee/consul-node-metadata
Add support for consul's node metadata
2017-09-02 18:43:25 +02:00
Tobias Schmidt d0a02703a2 Merge pull request #3105 from sak0/dev
discovery openstack: support discovery hypervisors, add rule option.
2017-08-31 14:08:16 +02:00
CuiHaozhi b1c18bf29b discovery openstack: support discovery hosts, add rule option.
Signed-off-by: CuiHaozhi <cuihz@wise2c.com>
2017-08-29 10:14:00 -04:00
Julius Volz aa5cdcb11e Remove extra space in log output 2017-08-29 15:24:00 +02:00
gdmello 35c952e344 Added logging for remote storage adapter (#3106)
* Added logging for remote storage adapter on startup and on any error condition during /read or /write.

* CR feedback.
2017-08-29 15:22:56 +02:00
Lynn Lin 1bf25dc1b2 fix issues reported by gofmt and spelling typo (#3127) 2017-08-29 09:00:11 +01:00
Ben Kochie 59aca4138b Fix staticcheck issues. 2017-08-28 17:29:01 +02:00
Ben Kochie 0fcfe3209f Add staticcheck to build. 2017-08-28 17:29:01 +02:00
Richard Hartmann 923be6a418 Merge pull request #3113 from prometheus/RichiH-patch-1
Point help to docs, not main Prometheus website
2017-08-26 20:18:13 +02:00
Richard Hartmann aa3fb1e7c4 Point help to docs, not main Prometheus website
No matter how we refactor docs, `/docs/` will stay the prefix, so there's not long-term risk in changing this.

One we version docs, we should probably try and keep link & version in sync.
2017-08-25 10:53:36 +02:00
Tobias Schmidt d6a0f46baf Fix formatting of GitHub issue template
There is actually an easier way to format comments, which doesn't
require a hack and also fixes the dispay in non-monospace fonts.
2017-08-24 13:33:16 +02:00
Tobias Schmidt 57a9de4a9a Merge pull request #3076 from Colstuwjx/fix/nil-target-group
Fix target group foreach nil bug.
2017-08-24 01:00:44 +02:00
Mark Adams 77c816b309 Fix pprof endpoints when -web.route-prefix or -web.external-url is used (#3054)
Whenever a route prefix is applied, the router prepends the prefix to
the URL path on the request. For most handlers, this is not an issue
because the request's path is only used for routing and is not actually
needed by the handler itself. However, Prometheus delegates the handling
of the /debug/* endpoints to the http.DefaultServeMux which has it's own
routing logic that depends on the url.Path. As a result, whenever a
prefix is applied, the prefixed URL is passed to the DefaultServeMux
which has no awareness of the prefix and returns a 404.

This change fixes the issue by creating a new serveDebug handler which
routes requests /debug/* requests to appropriate net/http/pprof handler
and removing the net/http/pprof import in cmd/prometheus since it is no
longer necessary.

Fixes #2183.
2017-08-23 00:00:56 +01:00
Colstuwjx 2b49df2c61 Fix target group foreach nil bug, directly return err. 2017-08-22 08:37:39 +08:00
Tobias Schmidt 32a951ec89 Add a big notice header to the github issue template header (#3103)
Trying to prevent usage questions in Github issue, this change adds a
multi-line notice header directing people to the mailing list.
2017-08-22 00:46:49 +01:00
Brian Brazil 2354c2544b Set timestamp for date functions (#3070) 2017-08-21 17:15:25 +01:00
Max Inden 3101606756 Merge pull request #2711 from mxinden/api-config
Expose current Prometheus config via /status/config
2017-08-14 19:01:13 +02:00
Fabian Reinartz 7d8cd4e6bf Merge pull request #3057 from sak0/dev
discovery openstack: handle instances without ip
2017-08-14 14:46:51 +02:00
Max Leonard Inden 1c96fbb992
Expose current Prometheus config via /status/config
This PR adds the `/status/config` endpoint which exposes the currently
loaded Prometheus config. This is the same config that is displayed on
`/config` in the UI in YAML format. The response payload looks like
such:
```
{
  "status": "success",
  "data": {
    "yaml": <CONFIG>
  }
}
```
2017-08-13 22:21:18 +02:00
Karsten Weiss 5f5d77848e Fix 'predefined escaper "html" disallowed in template' in /targets (#3046) (#3050)
Issue #3046 is triggered by html/template changes in go1.9.

See https://tip.golang.org/pkg/html/template. Quote:

//   To ease migration to Go 1.9 and beyond, "html" and "urlquery" will
//   continue to be allowed as the last command in a pipeline. However, if the
//   pipeline occurs in an unquoted attribute value context, "html" is
//   disallowed. Avoid using "html" and "urlquery" entirely in new templates.

The commit also includes a trivial whitespace fix.
2017-08-11 18:31:46 +01:00
CuiHaozhi 31b6f8b04c discovery openstack: handle instances without ip
Signed-off-by: CuiHaozhi <cuihz@wise2c.com>
2017-08-11 12:36:12 -04:00
Björn Rabenstein f1067f4cf9 Merge pull request #3051 from prometheus/beorn7/web
Update web/ui/bindata.go
2017-08-10 17:16:51 +02:00
beorn7 6cf62fe8ba Update web/ui/bindata.go 2017-08-10 14:40:19 +02:00
Roman Khavronenko 245b8a0b37 Allow to collapse jobs at /targets page (#2628) 2017-08-09 17:10:30 +02:00
Pablo Andres Fuente c79a4db812 Adding tests for util/httputil/client (#3002)
Adding tests for util/httputil/client with a 100% coverage.
Removing the NewDeadlineRoundTripper from util/httputil/client because
is not used.
Adding a new test util to check http.Request in http.RoundTrip interface
implementors.
2017-08-09 13:23:57 +01:00
Tobias Schmidt 1ea9ab601e Merge pull request #2997 from emluque/2831-Healthy_Ready_Endpoints
Add `/-/healthy` and `/-/ready` endpoints #2831
2017-08-07 23:35:07 +02:00