Commit graph

421 commits

Author SHA1 Message Date
Fabian Reinartz 59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
Björn Rabenstein 9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7 a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Jimmi Dyson 9faa7515c6 Kubernetes SD: Refactor to handle missing Kubernetes events 2016-01-19 20:49:58 +00:00
Brian Brazil 4a829e63a2 Merge pull request #1299 from PrFalken/master
Support AirBnB's Smartstack Nerve client for SD
2016-01-18 13:31:04 +00:00
Julien Dehee 061fe2f364 Support AirBnB's Smartstack Nerve client for SD
nerve's registration format differs from serverset. With this commit
there is now a dedicated treecache file in util,
and two separate files for serverset and nerve.

Reference:
https://github.com/airbnb/nerve
2016-01-18 14:07:28 +01:00
Brian Brazil 7a5f019c40 Use up/down in UI for consistency with 'up' metric. 2016-01-12 12:09:20 +00:00
Brian Brazil 6b7629be27 Merge pull request #1242 from tommyulfsparre/watcher-fix
Reduces watches in serverset
2015-12-10 10:43:57 +00:00
Jimmi Dyson c12fb447b8 Kubernetes SD: Use first TCP service port as target port & clean up
example config

Fixes #1256
2015-12-08 10:29:40 +00:00
Tommy Ulfsparre 83e09422bf skip already watched child nodes. 2015-12-02 21:31:05 +01:00
Fabian Reinartz 29a69eecb8 Do not panic in Consul SD creation 2015-11-30 18:41:48 +01:00
Jimmi Dyson 2cca07381b KubernetesSD: Create targets for services as well as service endpoints 2015-11-18 14:15:39 +00:00
Brian Brazil 427bf29db1 Add in default port after relabelling.
For the SNMP and blackbox exporters where
the ports tends to not be 80/443 and indeed
there may not be a port this makes the relabelling
a bit simpler as you don't have to figure out this
logic exists and strip off the :80.

This is a breaking change for the example configs of
those exporters.
2015-11-08 11:42:18 +00:00
Brian Brazil fd2bd81cd8 Allow all instance labels in target groups
With the blackbox exporter, the instance label will commonly
be used for things other than hostnames so remove this restriction.
https://example.com or https://example.com/probe/me are some examples.

To prevent user error, check that urls aren't provided as targets
when there's no relabelling that could potentically fix them.
2015-11-07 14:35:20 +00:00
Fabian Reinartz 9cad147265 Merge pull request #1172 from federicobaldo/ec2_sd_improvements
Minor improvements to ec2 service discovery
2015-11-04 13:02:51 +01:00
Federico Baldo d14d2429ea Minor improvements to ec2 sd:
1. static credentials replaced with defaults.DefaultChainCredentials.
This change ensures that credentials are sourced form all possible
providers available with the aws sdk,           in the following order:
env variables, shared awsconfig file in user folder, ec2 instance role.

2. Added a few labels: AvailabilityZone, PublicDns, VpcId (if
available), SubnetId (if in Vpc)
2015-11-02 14:55:24 +01:00
Jimmi Dyson 87940ec213 Kubernetes SD: Rename masters to api_servers in config 2015-10-24 14:41:14 +01:00
Jimmi Dyson 7ff5cc66ea Kubernetes SD authentication options cleanup 2015-10-23 16:47:52 +01:00
Jimmi Dyson ea9a173008 Kubernetes SD: Use node name as instance label 2015-10-12 21:26:09 +01:00
Julius Volz d88aea7e6f Fix SD mechanism source prefix handling.
The prefixed target provider changed a pointerized target group that was
reused in the wrapped target provider, causing an ever-increasing chain
of source prefixes in target groups from the Consul target provider.

We now make this bug generally impossible by switching the target group
channel from pointer to value type and thus ensuring that target groups
are copied before being passed on to other parts of the system.

I tried to not let the depointerization leak too far outside of the
channel handling (both upstream and downstream) because I tried that
initially and caused some nasty bugs, which I want to minimize.

Fixes https://github.com/prometheus/prometheus/issues/1083
2015-10-09 14:08:22 +02:00
Julius Volz dec9fc9c32 Merge pull request #1148 from prometheus/fix-serverset-multiple-paths
Fix watching multiple Zookeeper paths in serverset SD.
2015-10-08 19:27:06 +02:00
Matt Jibson dcb4856d72 Add SD for Amazon EC2 instances 2015-10-06 18:36:17 -04:00
Julius Volz 60cf4015a4 Fix watching multiple Zookeeper paths in serverset SD.
Fix https://github.com/prometheus/prometheus/issues/1137
2015-10-06 15:54:54 +02:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Jimmi Dyson 0d61605526 Kubernetes SD example: separate out cluster level components & services 2015-09-29 11:22:18 +01:00
Julius Volz 99e8fff872 Fix target manager CPU busyloop caused by bad done-channel handling.
Unfortunately this isn't nicely testable, as it's timing-dependent and
one would have to detect a stray goroutine doing a CPU busyloop...

Fixes https://github.com/prometheus/prometheus/issues/1114
2015-09-28 11:51:16 +02:00
Fabian Reinartz 097d810f37 Merge pull request #1120 from prometheus/flaky-test
retrieval: Reduce flakiness of TestTargetRunScraperScrapes
2015-09-28 09:57:16 +02:00
Brian Brazil ba6688bfce retrieval: Reduce flakiness of TestTargetRunScraperScrapes 2015-09-28 08:34:54 +01:00
Brian Brazil b03569267e retrieval: Add URL parameters to fullLabels too
Move all the special cases into one map, rather than
spreading the logic around.
2015-09-26 16:59:24 +01:00
Brian Brazil 50258929ac Retrieval: Show error message for failed test scrape
This is flaky, and I suspect it was due the to I/O timeout that I've
already fixed. In case that wasn't it, display the error should it
happen again.
2015-09-23 09:24:50 +01:00
Brian Brazil 4bc39dc60e retrieval: Reduce flakiness of TestTargetManagerChan
This will increase test time by a few hundred ms,
this is the 2nd most common cause of flakiness.
2015-09-23 09:00:37 +01:00
Brian Brazil 93145b960a retrieval: Reduce flakiness of target tests
Bump timeouts of tests where we don't want I/O timeouts.

Adjust the full channel test to be much more reliable,
by reducing the ingestion timeout from 1ms to 0.
2015-09-22 19:23:36 +01:00
Fabian Reinartz cac6eea434 Merge pull request #1105 from prometheus/consulnil
Fix nil panic on consul error
2015-09-22 14:55:31 +02:00
Fabian Reinartz 327152862c Update expfmt.NewDecoder usage 2015-09-22 12:11:28 +02:00
Fabian Reinartz 1ce89a4a0b Fix nil panic on consul error 2015-09-22 09:04:31 +02:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
Jimmi Dyson 7ef9399920 Clean up kubernetes http response bodies 2015-09-11 11:44:28 +01:00
Anders Daljord Morken 9fb65a91af Close HTTP connections on HTTP errors too.
Move defer resp.Body.Close() up to make sure it's called even when the
HTTP request returns something other than 200 or Decoder construction
fails. This avoids leaking and eventually running out of file descriptors.
2015-09-10 22:41:05 +02:00
Fabian Reinartz 8456b7e12f Use go1.5.1 2015-09-10 12:11:44 +02:00
Jimmi Dyson a1574aa2b3 Move TLS options to scrape config
Fixes #1013, fixes #989
2015-09-09 09:52:21 +01:00
Julius Volz b7b7b2e883 Merge pull request #1050 from fabric8io/kubernetes-discovery
Kubernetes SD improvements
2015-09-04 14:58:11 +02:00
Jimmi Dyson d7a7fd4589 Kubernetes SD improvements
* Support multiple masters with retries against each master as required.
* Scrape masters' metrics.
* Add role meta label for node/service/master to make it easier for relabeling.
2015-09-04 11:31:20 +01:00
Fabian Reinartz cc1a2a2061 Remove attachment of global labels upon ingestion 2015-09-03 14:16:23 +02:00
Fabian Reinartz ebf417a282 Fix map initialization 2015-09-01 18:06:22 +02:00
Julius Volz f63a899744 Change config regexes to full-string matches.
This anchors all regular expressions entered via the config to match a
full string vs. a substring.

THIS IS A BREAKING CHANGE!

Fixes part of https://github.com/prometheus/prometheus/issues/996
2015-09-01 15:46:41 +02:00
Fabian Reinartz 542da6774e Fix draining of file watcher events 2015-08-28 12:17:22 +02:00
Daniel Lundin 4abf54b747 serverset: extract shard number from serverset data 2015-08-27 16:26:00 +02:00
Julius Volz 29eaa8c7cf Merge pull request #1030 from prometheus/fix-flakey-filesd
Fix flakey FileSD test.
2015-08-26 13:25:00 +02:00
Julius Volz 3fd5826589 Fix flakey FileSD test.
When the test ends, all files matching the watcher's glob are removed
via defer. In that moment, the draining goroutine may still be running
and then detect no files matching the configured glob just before the
test exits.

This is now solved by waiting for the draining goroutine to finish
before leaving the test function and thus causing the deferred file
removal.
2015-08-26 13:06:34 +02:00
Julius Volz 744d5d5a7a Merge pull request #1029 from prometheus/vet-fixes
Fix "go vet" errors.
2015-08-26 12:50:18 +02:00
Julius Volz 995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Julius Volz 963ad82dcb Fix "go vet" errors.
I ignored all errors of the type "composite literal uses unkeyed
fields". Most of them are wrong because of
https://github.com/golang/go/issues/9171.
2015-08-26 02:05:04 +02:00
Fabian Reinartz 6664b77f36 Merge pull request #1021 from prometheus/appenders
move metric modifications into SampleAppenders
2015-08-25 17:47:55 +02:00
Fabian Reinartz 01834fa528 Move metric modifications into SampleAppenders 2015-08-25 15:32:37 +02:00
Fabian Reinartz d6d88f8950 Add missing license headers 2015-08-24 19:19:21 +02:00
Julius Volz d36a7f4e6f Fix busylooping in case of no target providers.
merge() closes the channel that handleUpdates() reads from when there
are zero configured target providers in the configuration. In that case,
the for-select loop in handleUpdates() entered a busy loop. It should
exit when the upstream channel is closed.
2015-08-24 16:42:28 +02:00
Fabian Reinartz 3a0145c09e Reenable blocked appending tests 2015-08-22 09:47:57 +02:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 6d0f58dcf3 sanitize scrape health recording code 2015-08-21 23:01:08 +02:00
Fabian Reinartz 25bf5fdaf5 Timeout sample appends 2015-08-21 18:04:35 +02:00
Fabian Reinartz 11a577fcd0 Switch to common/expfmt for extraction 2015-08-21 13:33:38 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Sharif Nassar 6cb519fe82 Add Consul ServiceID to the discovery meta labels. 2015-08-20 14:04:42 -07:00
Fabian Reinartz 0f5022c091 Add missing Kubernetes doc strings 2015-08-18 14:37:28 +02:00
Fabian Reinartz f592740bac Only exit static target provider on done 2015-08-18 11:51:53 +02:00
Julius Volz b4adf2723d Merge pull request #994 from robbiet480/consul-datacenter-name
Pass through current agent Consul datacenter name
2015-08-18 01:09:24 +02:00
Robbie Trencheny 48e461f7db Pass through current agent Consul datacenter name
Instead of only filling __meta_consul_dc when datacenter is set in
consul_sd_config this change fills the label based on what the agent
reports it's current data center is, if datacenter isn't manually set,
otherwise it uses whatever datacenter was set to.
2015-08-17 16:00:26 -07:00
Fabian Reinartz d0a90964c1 Fix license header 2015-08-17 19:51:12 +02:00
Fabian Reinartz eabbdc6603 Add missing license headers 2015-08-17 19:49:10 +02:00
Julius Volz 47a96bff1a Update constant names in comments. 2015-08-17 15:05:06 +02:00
Brian Brazil e1d5eb52f2 retrieval: Don't include unmatched source of regex in replacement.
ReplaceAllString only replaces the matching part of the regex,
the unmatched bits around it are left in place. This is not the
expected or desired behaviour as the replacement string should
be everything.

This may break users dependant on this behaviour, but
what they're doing is still possible.
2015-08-17 00:31:56 +01:00
Fabian Reinartz 3c6dd161d7 Scrape all services on empty services list. 2015-08-14 17:39:41 +02:00
Fabian Reinartz 9b9ff66212 Merge pull request #977 from prometheus/fabxc/target-dedup
Improve target discovery pipeline
2015-08-14 16:38:16 +02:00
Fabian Reinartz 8fa9ec278b Add application labels as meta labels
Removes built-in conditional scraping based on application's
'prometheus' label.
2015-08-14 15:34:02 +02:00
Fabian Reinartz f269943950 Adjust Kubernetes SD to pipeline changes 2015-08-14 13:30:27 +02:00
Fabian Reinartz 4e84b86510 Improve target discovery pipeline
Replace the TargetProvider Stop method with done channels
that ensure properly broadcasted shutdown of the whole pipeline.
2015-08-14 13:30:27 +02:00
Fabian Reinartz 15b4115a25 Merge pull request #986 from prometheus/fabxc/tpdoc
Clarify docs of TargetProvider
2015-08-14 12:07:15 +02:00
Fabian Reinartz 625374ee36 Clarify docs of TargetProvider 2015-08-14 12:02:22 +02:00
Fabian Reinartz f7e3722388 Rename __meta_dns_srv_name to __meta_dns_name
This is change potentially breaking relabeling rules.
2015-08-13 17:02:56 +02:00
Fabian Reinartz b964da4b75 Merge pull request #905 from fabric8io/kubernetes-discovery
Kubernetes discovery
2015-08-13 15:08:32 +02:00
Fabian Reinartz 24e91720ad Merge pull request #980 from prometheus/map-labels
Retrieval: Add relabel action to map labels names with a regex.
2015-08-13 14:36:59 +02:00
Brian Brazil 4e70a0a14e Retrieval: Add relabel action to map label names with a regex.
The intended use case is where a user has tags/labels coming
from metadata in Kubernetes or EC2, and wants to make
some subset of them into target labels.
2015-08-13 13:19:11 +01:00
Jimmi Dyson 923f8111d4 Initial Kubernetes discovery
Fixes #904
2015-08-13 10:38:52 +01:00
Miek Gieben caaa3de4ff Make HashMod use MD5 instead of FNV
MD5 will will distribute the inputs more uniformly over the output
space than FNV; leading to more evenly balanced load when using HashMod.
2015-08-13 09:42:07 +01:00
Fabian Reinartz 0138d37458 Improve unique target group sources.
Include position of same SD mechanisms within the same scrape configuration.
Move unique prefixing out of SD implementations and target manager into
its own interface.
2015-08-10 11:29:09 +02:00
Fabian Reinartz 54202bc5a8 Merge pull request #902 from xperimental/feature/marathon-discovery
retrieval/discovery: Service discovery using marathon API
2015-08-10 01:43:37 +02:00
Robert Jacob 4d0f974c42 Add service discovery using Marathon API. 2015-08-10 01:36:24 +02:00
Will Rouesnel 7810448dbe Add proxy_url parameter to allow specifying per-job HTTP proxy servers
Allow scrape_configs to have an optional proxy_url option which specifies
a proxy to be used for all connections to hosts in that config.

Internally this modifies the various client functions to take a *url.URL pointer
which currently must point to an HTTP proxy (but has been left open-ended to
allow the url format to be extended to support others, such as maybe SOCKS if
needed).
2015-08-08 04:29:27 +10:00
Jimmi Dyson da4c50a6cf Make scheme relabelable via discovery 2015-08-06 12:00:33 +01:00
Jimmi Dyson 52cf6b3e6e Configuration options for bearer tokens, client certs & CA certs
Fixes #918, fixes #917
2015-08-04 17:18:46 +01:00
Florian Pfitzer 1fa0b0f253 fix consul port label 2015-07-31 16:20:17 +00:00
Brian Brazil adf7f16d1a Merge pull request #934 from prometheus/query-params
Retrieval: Make it possible to relabel query params
2015-07-31 11:01:45 +01:00
Brian Brazil d8875d17d8 Retrieval: Make it possible to relabel query params
This only allows relabelling the first value
for a given parameter, this should be sufficient in practice.
2015-07-31 10:09:28 +01:00
Johannes 'fish' Ziemke 6e7d743cd4 Merge pull request #946 from prometheus/add-sd-dns-a
Add support for A record based DNS SD
2015-07-30 16:01:47 +02:00
Johannes 'fish' Ziemke 9ab340e95e Add support for A record based DNS SD
If using A records, the user needs to specify "port" and set "type" to
"A".
2015-07-30 15:55:38 +02:00
beorn7 645f6772e5 Add Consul Address, ServicePort, and ServiceAddress to the meta labels.
In setups where the ServiceAddress is the relevant address for
scraping, users can relabel the `__address__` label to ServiceAddress
+ ":" + ServicePort.

This needs to be documented, of course. Will do once this is LGTM'd.
2015-07-22 18:19:13 +02:00
Julius Volz 9d98910fca Revert "Use Consul ServiceAddress instead of Address when set"
This reverts commit 0ac7e7217e.

See discussion on https://github.com/prometheus/prometheus/pull/812 for
reasoning. While fixing one use case, it breaks others, and we need a
more generic way of handling this.
2015-07-22 13:04:29 +02:00
Fabian Reinartz d53cc7935d retrieval: avoid race conditions 2015-07-08 21:27:52 +02:00
Brian Brazil 3d268d681e retrieval: Handle serverset node not existing.
This stops configuration loading hanging if
the Znode doesn't exist, and retries until the
node does exist.
2015-07-01 13:56:31 +01:00
Fabian Reinartz 080e067601 Merge pull request #832 from prometheus/fabxc/target-test
retrieval: double timeout in target scrape test.
2015-06-25 17:23:52 +02:00
Brian Brazil 52859b8033 Merge pull request #836 from prometheus/shard
Add 'hashmod' relabel action.
2015-06-24 21:40:10 +01:00
Brian Brazil 682f949ab1 Add 'hashmod' relabel action.
This takes the modulus of a hash of some labels.
Combined with a keep relabel action, this allows
for sharding of targets across multiple prometheus
servers.
2015-06-24 21:14:53 +01:00
Fabian Reinartz 23862c92c4 retrieval/discovery: refresh services in Consul to recover from missing events. 2015-06-24 17:48:27 +02:00
Fabian Reinartz c292979374 retrieval: double timeout in target scrape test. 2015-06-23 21:59:55 +02:00
Julius Volz d868264bb8 Improve UI of /alerts page.
Changes to the UI:
- "Active Since" timestamps are now human-readable.
- Alerting rules are now pretty-printed better.
- Labels are no longer just strings, but alert bubbles (like we do on
  the status page for base labels).
- Alert states and target health states are now capitalized in the
  presentation layer rather than at the source.
2015-06-23 18:48:45 +02:00
Fabian Reinartz 53b9d5917d web: improve target URL handling and display. 2015-06-23 13:45:15 +02:00
Fabian Reinartz dc7d27ab9a retrieval: add honor label handling and parametrized querying.
This commit adds the honor_labels and params arguments to the scrape
config. This allows to specify query parameters used by the scrapers
and handling scraped labels with precedence.
2015-06-23 13:45:14 +02:00
Fabian Reinartz 459d18cf18 Merge pull request #812 from Marmelatze/consul_services
Use Consul ServiceAddress instead of Address when set
2015-06-17 20:10:52 +02:00
Florian Pfitzer 0ac7e7217e Use Consul ServiceAddress instead of Address when set 2015-06-17 15:39:42 +02:00
Brian Brazil 4d895242f9 Add support for Zookeeper Serversets for SD.
It can discover an entire tree of serversets, or just one.
2015-06-16 11:02:08 +01:00
Brian Brazil 0dbae36d36 Allow ingested metrics to be relabeled.
The main purpose of this is to allow for blacklisting
of expensive metrics as a tactical option.
It could also find uses for renaming and removing labels
from federation.
2015-06-13 15:18:27 +01:00
Brian Brazil 58ceae82bc Revert "Allow ingested metrics to be relabeled."
This reverts commit f2f26ca08f.

Was accidentally pushed to master instead of a branch for PR.
2015-06-12 22:12:26 +01:00
Brian Brazil f2f26ca08f Allow ingested metrics to be relabeled.
The main purpose of this is to allow for blacklisting
of expensive metrics as a tactical option.
It could also find uses for renaming and removing labels
from federation.
2015-06-12 22:06:30 +01:00
Fabian Reinartz b5fe2e9afe Merge pull request #773 from prometheus/fabxc/simple-cfg
config: simplify default config handling.
2015-06-08 16:22:06 +02:00
Brian Brazil b8b1d3cbac Web: Add pre-relabel labels to status page.
Figuring out what's going on with the new service discovery
and labels is difficult. Add a popover with the labels
to the target table to make things simpler, and help
discovery of potentially useful labels.
2015-06-08 12:19:01 +01:00
Fabian Reinartz 0af1cff8af config: simplify default config handling. 2015-06-06 09:04:04 +02:00
Fabian Reinartz 8214b4ee78 retrieval/discovery: surround __meta_consul_tags value with tag seperators. 2015-06-05 19:18:34 +02:00
Fabian Reinartz 280d11dca8 main: exit on invalid rule files on startup. 2015-06-02 18:44:41 +02:00
Fabian Reinartz 0de6edbdfc Move pkg/ to util/ 2015-06-01 21:12:32 +02:00
Fabian Reinartz dfaf31a1da Move web/httputils to pkg/httputil and add DeadlineClient to it 2015-06-01 21:12:31 +02:00
Fabian Reinartz a4f179230a Merge pull request #744 from prometheus/fabxc/fix-labels
Fix discarding of labels in file target groups
2015-05-27 19:57:15 +02:00
Fabian Reinartz e9b344abee Fix discarding of labels in file target groups 2015-05-27 18:52:44 +02:00
Fabian Reinartz 8b7e5f9184 Stop holding TargetManager lock when stopping components.
TargetProviders may flush some last changes to the target manager
before actually stopping. To properly read those form the channel
the target manager must not be locked while stopping a provider.
2015-05-27 12:41:37 +02:00
Brian Brazil f34de493d5 Add increase() function, to replace delta(..., 1).
This calculates how much a counter increases over
a given period of time, which is the area under the curve
of it's rate.

increase(x[5m]) is equivilent to rate(x[5m]) * 300.
2015-05-26 22:49:21 +01:00
Fabian Reinartz efb39cfd4e Fix file SD test 2015-05-23 21:20:39 +02:00
Julius Volz 267fd34156 Switch Prometheus to use github.com/prometheus/log.
This change is conceptually very simple, although the diff is large. It
switches logging from "github.com/golang/glog" to
"github.com/prometheus/log", while not actually changing any log
messages. V(1)-style logging has been changed to be log.Debug*().
2015-05-20 18:19:32 +02:00
Fabian Reinartz 7143dff02f Add initial implementation for SD via Consul.
This commit adds service discovery using Consul's HTTP API and watches
(long polling) to retrieve target updates.
2015-05-20 11:46:24 +02:00
Fabian Reinartz b0c181dc0d Add Consul SD configuration. 2015-05-20 11:46:24 +02:00
Fabian Reinartz ff832d2e03 Attach __meta_filepath label to file SD targets. 2015-05-19 15:49:38 +02:00
Fabian Reinartz 8de50619f1 Increase target test wait times
On slow systems such as Travis CI occasionally the tests fail
because the wait times are too short.
2015-05-19 12:06:52 +02:00
Fabian Reinartz 385919a65a Avoid inter-component blocking if ingestion/scraping blocks.
Appending to the storage can block for a long time. Timing out
scrapes can also cause longer blocks. This commit avoids that those
blocks affect other compnents than the target itself.
Also the Target interface was removed.
2015-05-18 17:58:51 +02:00
Fabian Reinartz 1a2d57b45c Move template functionality out of target.
The target implementation and interface contain methods only serving a
specific purpose of the templates. They were moved to the template
as they operate on more fundamental target data.
2015-05-18 13:35:43 +02:00
Fabian Reinartz dbc08d390e Move target status data into its own object 2015-05-18 11:15:42 +02:00
Fabian Reinartz 9ca47869ed Provide full SD configs to discovery constructors.
Some SD configs may have many options. To be readable and consistent, make
all discovery constructors receive the full config rather than the separate
arguments.
2015-05-15 14:54:29 +02:00
Fabian Reinartz 93548a8882 Add initial file based service discovery.
This commits adds file based service discovery which reads target
groups from specified files. It detects changes based on file watches
and regular refreshes.
2015-05-15 14:44:54 +02:00
Fabian Reinartz d5aa012fd0 Make HTTP basic auth configurable for scrape targets. 2015-05-15 12:47:50 +02:00
Fabian Reinartz bb540fd9fd Implement config reloading on SIGHUP.
With this commit, sending SIGHUP to the Prometheus process will reload
and apply the configuration file. The different components attempt
to handle failing changes gracefully.
2015-05-13 16:49:46 +02:00
Fabian Reinartz 86087120dd Replace example config with new YAML format. 2015-05-11 18:14:07 +02:00
Fabian Reinartz 5fbde88919 Switch config to YAML format. 2015-05-07 16:52:14 +02:00
Fabian Reinartz b5a8f7b8fa Cleanup, test, and document config. 2015-04-30 21:17:19 +02:00
Fabian Reinartz 945c49a2dd Add relabelling to target management.
This commit adds a relabelling stage on the set of base
labels from which a target is created. It allows to drop
targets and rewrite any regular or internal label.
2015-04-30 18:46:33 +02:00
Fabian Reinartz 0b619b46d6 Change JobConfig to ScrapeConfig.
This commit changes the configuration interface from job configs to scrape
configs. This includes allowing multiple ways of target definition at once
and moving DNS SD to its own config message. DNS SD can now contain multiple
DNS names per configured discovery.
2015-04-28 23:18:55 +02:00
Fabian Reinartz 5015c2a0e8 Make target manager source based.
This commit shifts responsibility for maintaining targets from providers and
pools to the target manager. Target groups have a source name that identifies
them for updates.
2015-04-24 15:49:35 +02:00
Fabian Reinartz 4f8673aa88 Simplify update sync for targets, format config fixtures. 2015-04-19 10:36:26 +02:00
Fabian Reinartz 36184f3530 Show correct error on wrong DNS response. 2015-04-11 16:14:38 +02:00
beorn7 fa1935a644 Remove /api/targets call and do not show job and instance labels on status.
/api/targets was undocumented and never used and also broken.

Showing instance and job labels on the status page (next to targets)
does not make sense as those labels are set in an obvious way.

Also add a doc comment to TargetStateToClass.
2015-03-18 18:53:43 +01:00
beorn7 be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
Julius Volz 140eede5e0 Rename UNREACHABLE to UNHEALTHY.
The current wording suggests that a target is not reachable at all,
although it might also get set when the target was reachable, but there
was some other error during the scrape (invalid headers or invalid
scrape content). UNHEALTHY is a more general wording that includes all
these cases.

For consistency, ALIVE is also renamed to HEALTHY.
2015-03-07 23:18:18 +01:00
Sergiusz 'q3k' Bazański 0d0bb3c030 Change instance identifiers to be host:port
This changes the PublicURL function into InstanceIdentifier, which now
returns a simple <host>:<port> string instead of a full URL.
2015-02-20 16:21:13 +01:00
Sergiusz 'q3k' Bazański bb69a3d284 Hide HTTP auth parts from URL
This  instroduces an extra function in the Target interface (PublicURL)
which is used to populate the instance field in scraped metrics.
2015-02-19 18:58:47 +01:00
Julius Volz af627bb2b9 Copy vendored deps manually instead of using Godeps.
We were using Godep incorrectly (cloning repos from the internet during
build time instead of including Godeps/_workspace in the GOPATH via
"godep go"). However, to avoid even having to fetch "godeps" from the
internet during build, this now just copies the vendored files into the
GOPATH.

Also, the protocol buffer library moved from Google Code to GitHub,
which is reflected in these updates.

This fixes https://github.com/prometheus/prometheus/issues/525
2015-02-17 02:08:56 +01:00
beorn7 11b3c2387c Improvements after review.
- Increase samplesQueueCapacity.

- Improve docstring for the above.

- Accept a short waiting period for the ingest channel to become
  ready. This should depend on the http timeout, but 100ms is probably
  good enough to cushion bursts bigger than samplesQueueCapacity,
  while it is unlikely that anybody ever will set an HTTP timeout
  similarly short.
2015-02-10 14:58:46 +01:00
beorn7 0f191629c6 Next try to deal with backed-up ingestion.
This is now not even trying to throttle in a benign way, but creates a
fully-fledged error. Advantage: It shows up very visible on the status
page. Disadvantage: The server does not really adjusts to a lower
scraping rate. However, if your ingestion backs up, you are in a very
irregulare state, I'd say it _should_ be considered an error and not
dealt with in a more graceful way.

In different news: I'll work on optimizing ingestion so that we will
not as easily run into that situation in the first place.
2015-02-09 17:32:47 +01:00
beorn7 16a1a6d324 Add another check for stopped scraper. 2015-02-06 18:30:33 +01:00
beorn7 5678a86924 Throttle scraping if a scrape took longer than the configured interval.
The simple algorithm applied here will increase the actual interval
incrementally, whenever and as long as the scrape itself takes longer
than the configured interval. Once it takes shorter again, the actual
interval will iteratively decrease again.
2015-02-06 16:44:56 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein b09453af1d Adjust to new client_golang API. 2015-01-21 15:42:25 +01:00
Julius Volz d6b9e97655 Remove extraction.Result type, simplify code. 2015-01-08 16:34:01 +01:00
juliusv 917acb6baf Merge pull request #429 from brian-brazil/scrape-time
Have scrape time as a pseudovariable, not a prometheus variable.
2015-01-02 13:22:04 +01:00
Brian Brazil e56786b221 Have scrape time as a pseudovariable, not a prometheus variable.
This ensures it has the right timestamp, and is easier to work with.

Switch sd variable away from 'outcome', using total/failed instead.
2014-12-27 00:39:33 +00:00
Brian Brazil 89c43dd0d7 Sort targets on the status page.
Change-Id: I6b59c97ab50093c50b608e29be2304475bc5d9f6
2014-12-26 13:14:19 +00:00
Johannes 'fish' Ziemke ff95a52b0f Rename Address to URL
The "Address" is actually a URL which may contain username and
password. Calling this Address is misleading so we rename it.

Change-Id: I441c7ab9dfa2ceedc67cde7a47e6843a65f60511
2014-12-18 12:18:16 +01:00
Bjoern Rabenstein b1e4956142 Apply a giant code cleanup.
Essentially:

- Remove unused code.

- Make it 'go vet' clean. The only remaining warnings are in generated code.

- Make it 'golint' clean. The only remaining warnings are in gerenated code.

- Smoothed out same minor things.

Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6
2014-12-10 16:16:49 +01:00
Bjoern Rabenstein 89bb376bce Reduce lock-protected area during scrape.
Change-Id: Iaa7faa7c916b1890b568d05bd8bfff6299b6767d
2014-12-05 19:40:41 +01:00
Bjoern Rabenstein fee88a7a77 Remove the remaining races, new and old.
Also, resolve a few other TODOs.

Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691
2014-12-03 18:07:23 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein a2feed343a Convert another occurrence from chan bool to chan struct{}.
Change-Id: I11ba127a934ee3aec0fcd139ad32a7751cff77a0
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 92156ee89d Drain the newBaseLabels channel upon shutdown.
This should help cut down shutdown times.

Change-Id: I6e70a598a9e49aa6eeeb2034105b1bc6e9014324
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 6b37e47f9e Remove unused metrics.
Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 4fc8ad6677 Fix retrieval unit tests.
Change-Id: I299b71406b59539230e5182ccc37bc8a83af60b3
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 4447708c9f Fix a race in target.go.
Also, fix problems in shutdown.
Starting serving and shutdown still has to be cleaned up properly.
It's a mess.

Change-Id: I51061db12064e434066446e6fceac32741c4f84c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 38fc24d0ed Fix targetpool_test.go and other tests.
Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624
2014-11-25 17:08:26 +01:00
Julius Volz 7f5d3c2c29 Fix and improve the fp locker.
Benchmark:
$ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4

OLD
BenchmarkFingerprintLockerParallel        500000              3618 ns/op
BenchmarkFingerprintLockerParallel-2      100000             12257 ns/op
BenchmarkFingerprintLockerParallel-4      500000             10164 ns/op
BenchmarkFingerprintLockerSerial        10000000               283 ns/op
BenchmarkFingerprintLockerSerial-2      10000000               284 ns/op
BenchmarkFingerprintLockerSerial-4      10000000               288 ns/op

NEW
BenchmarkFingerprintLockerParallel       1000000              1018 ns/op
BenchmarkFingerprintLockerParallel-2     1000000              1164 ns/op
BenchmarkFingerprintLockerParallel-4     2000000               910 ns/op
BenchmarkFingerprintLockerSerial        50000000                56.0 ns/op
BenchmarkFingerprintLockerSerial-2      50000000                47.9 ns/op
BenchmarkFingerprintLockerSerial-4      50000000                54.5 ns/op

Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein e0a6cb281e Fix the accept header.
A '/' is a separator and has to be in a quoted string.

Change-Id: If7a3a847f84f8f709074d05dc98b5b21e954030c
2014-11-25 17:02:00 +01:00
Brian Brazil 5edf689133 Stagger scrapes to spread out load.
Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 1909686789 Make metrics exported by the Prometheus server itself more consistent.
- Always spell out the time unit (e.g. milliseconds instead of ms).

- Remove "_total" from the names of metrics that are not counters.

- Make use of the "Namespace" and "Subsystem" fields in the options.

- Removed the "capacity" facet from all metrics about channels/queues.
  These are all fixed via command line flags and will never change
  during the runtime of a process. Also, they should not be part of
  the same metric family. I have added separate metrics for the
  capacity of queues as convenience. (They will never change and are
  only set once.)

- I left "metric_disk_latency_microseconds" unchanged, although that
  metric measures the latency of the storage device, even if it is not
  a spinning disk. "SSD" is read by many as "solid state disk", so
  it's not too far off. (It should be "solid state drive", of course,
  but "metric_drive_latency_microseconds" is probably confusing.)

- Brian suggested to not mix "failure" and "success" outcome in the
  same metric family (distinguished by labels). For now, I left it as
  it is. We are touching some bigger issue here, especially as other
  parts in the Prometheus ecosystem are following the same
  principle. We still need to come to terms here and then change
  things consistently everywhere.

Change-Id: If799458b450d18f78500f05990301c12525197d3
2014-11-25 17:02:00 +01:00
Brian Brazil 4a2b96f848 Remove backoff on scrape failure.
Having metrics with variable timestamps inconsistently
spaced when things fail will make it harder to write correct rules.

Update status page, requires some refactoring to insert a function.

Change-Id: Ie1c586cca53b8f3b318af8c21c418873063738a8
2014-11-25 17:02:00 +01:00
Julius Volz 1bb7074fec Fix HTTP connection leak upon non-OK status.
Change-Id: Ie7fbd7dcc089b8306b40631be3e3d736c23c1cd3
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein bacc31d5cc Remove work-around that required copying all bytes of a scrape.
Now that the subtle bug in matttproud/golang_protobuf_extensions is
fixed, we do not need to copy the bytes of a scrape into a buffer
first before starting to parse it.

Change-Id: Ib73ecae16173ddd219cda56388a8f853332f8853
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein 8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein 814e479723 Treat non-200 HTTP response as error.
Change-Id: I2a9f3b47012b3c4839be53aa44c66d16dd41a24a
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein ca6a4fccef Weed out our homegrown test.Tester.
The Go stdlib has testing.TB now, which fulfills the exact same
purpose.

Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c
2014-05-21 19:27:24 +02:00
Brian Brazil 23255f1499 Fix negative Next Retrieval on status page.
Change-Id: Ifa754034660a251fee71f166dbf057697ec4e872
2014-05-12 15:24:34 +01:00
Bjoern Rabenstein 64811caaec Make Prometheus announce its new super-power: text format!
Change-Id: Ia2ddfb28999c145e4d46c395381a9bf89d43148c
2014-04-22 18:44:52 +02:00
Julius Volz 84df022025 Cleanup server address handling, support IPv6.
This fixes https://github.com/prometheus/prometheus/issues/377, as
IPv6 server addresses are now handled correctly.

Change-Id: Iebde7cfdadb0a52041472517e6fdcff4303a25ab
2014-03-09 23:31:30 +01:00
Julius Volz b382e8b7bd Remove overly verbose DNS-SD logging line.
Change-Id: Ie4534437ab88b9a6b99f5cb6c2f32c9588c1fff6
2014-01-24 16:09:41 +01:00
Julius Volz 0378c2ca1f Nonexistent labels in BY-clauses shouldn't propagate to result.
This fixes bug 2. of https://github.com/prometheus/prometheus/issues/374

Change-Id: Ia4a13153616bafce5bf10597966b071434422d09
2014-01-24 16:05:30 +01:00
Stuart Nelson 48a6326d25 Added DNS-SD lookup counter for successful/unsuccessful lookups
Change-Id: I0a71e994a989cecace280b5134a31ebc2ace7591
2013-12-16 08:48:56 -05:00
Julius Volz fb44580110 Cleanup/fix program termination sequence.
Change-Id: I2bc58a2583fb079c9ef383cfc7a5e0fbe613f1cd
2013-12-11 15:40:32 +01:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Johannes 'fish' Ziemke 8c08a5031f Add search domain support to SRV lookups
This adds search domain support by trying to resolve a name by
appending each search domain configured in /etc/resolv.conf until
the query succeeds (NOERROR) and has at least one answer.

Change-Id: Ibdc5138c5d8cc049e11fab90c3d5243d5a06852c
2013-10-29 17:19:49 +01:00
Julius Volz 274934bcd3 Revert "Revert "Merge pull request #317 from prometheus/fix/miekg-dns-for-srv""
This reverts commit 88099328d1.

Change-Id: I7bf74de5fda458e2e6f9eea2eacd0e256f95bdee
2013-09-10 17:48:05 +02:00
Johannes 'fish' Ziemke 88099328d1 Revert "Merge pull request #317 from prometheus/fix/miekg-dns-for-srv"
This reverts commit e3bc6fc9dc, reversing
changes made to 1cf9e5840a.

Conflicts:
	retrieval/target_provider.go

Change-Id: Icb6e98fb30419e9e2fe9b686c243702ced372014
2013-08-30 16:32:51 +02:00
Julius Volz 788587426b Make scrape timeouts configurable per job.
Change-Id: I77a7514ad9e7969771f873d63d6353ec50082a62
2013-08-19 12:21:47 +02:00
Julius Volz d69b85e6c9 Add global label support via Ingesters. 2013-08-13 16:54:15 +02:00
Julius Volz 0003027dce Add needed trailing spaces in logs. 2013-08-12 18:22:48 +02:00
Julius Volz aa5d251f8d Use github.com/golang/glog for all logging. 2013-08-12 17:54:36 +02:00