Commit graph

354 commits

Author SHA1 Message Date
Fabian Reinartz 84f74b9a84 Apply new scrape config on reload.
This commit updates a target set's scrape configuration
on reload. This will cause all running scrape loops to be
stopped and started again with new parameters.
2016-03-01 13:50:51 +01:00
Fabian Reinartz 02f635dc24 Remove interval/timeout from Target internals 2016-03-01 13:50:51 +01:00
Fabian Reinartz 775316f8d2 Move appender construction from Target to scrapePool 2016-03-01 13:50:51 +01:00
Fabian Reinartz fbe251c2df Fix scrape interval length calculation 2016-03-01 13:48:36 +01:00
Fabian Reinartz 1a3253e8ed Make scrape time unambiguous.
This commit changes the scraper interface to accept a timestamp
so the timestamp reported by the caller and the timestamp
attached to samples do not differ.
2016-03-01 13:48:36 +01:00
Fabian Reinartz 2bb8ef99d1 Test scrape loop behavior. 2016-03-01 13:48:36 +01:00
Fabian Reinartz c7bbe95597 Remove outdated target tests 2016-03-01 13:48:36 +01:00
Fabian Reinartz 05de8b7f8d Extract target scraping into scrape loop.
This commit factors out the scrape loop handling into
its own data structure.
For the transition it will be directly attached to the
target.
2016-03-01 13:48:36 +01:00
Fabian Reinartz cebba3efbb Simplify and fix TargetManager reloading 2016-03-01 13:48:36 +01:00
Fabian Reinartz da99366f85 Consolidate Target.Update into constructor.
The Target.Update method is no longer needed.
2016-03-01 13:48:36 +01:00
Fabian Reinartz d15adfc917 Preserve target state across reloads.
This commit moves Scraper handling into a separate scrapePool type.
TargetSets only manage TargetProvider lifecycles and sync the
retrieved updates to the scrapePool.

TargetProviders are now expected to send a full initial target set
within 5 seconds. The scrapePools preserve target state across reloads
and only drop targets after the initial set was synced.
2016-03-01 13:48:36 +01:00
Fabian Reinartz 5b30bdb610 Change TargetProvider interface.
This commit changes the TargetProvider interface to use a
context.Context and send lists of TargetGroups, rather than
single ones.
2016-03-01 13:48:36 +01:00
Fabian Reinartz bb6dc3ff78 Remove old tests 2016-03-01 13:48:36 +01:00
Fabian Reinartz 5bfa4cdd46 Simplify target update handling.
We group providers by their scrape configuration. Each provider produces
target groups with a unique identifier.

On stopping a set of target providers we cancel the target providers,
stop scraping the targets and wait for the scrapers to finish.

On configuration reload all provider sets are stopped and new ones
are created. This will make targets disappear briefly on configuration
reload. Potentially scrapes are missed but due to the consistent
scrape intervals implemented recently, the impact is minor.
2016-03-01 13:48:36 +01:00
Jimmi Dyson e59b7c15a3 Kubernetes SD: Fix node IP discovery 2016-03-01 12:24:52 +00:00
beorn7 33a50e69f7 Fix a deadlock
Double acquisition of the RLock usually doesn't blow up, but if the
write lock is called for between the two RLock's, we are deadlocked.

This deadlock does not exist in release-0.17, BTW.
2016-02-29 16:34:29 +01:00
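
The deadlock described in the commit above follows from a documented property of Go's sync.RWMutex: recursive read locking is prohibited, because a writer waiting in Lock blocks new readers. A minimal sketch of one common way to avoid it is shown below. The targetSet type and its methods are hypothetical and are not taken from the Prometheus code base.

```go
package main

import (
	"fmt"
	"sync"
)

// targetSet is a hypothetical type used only for illustration.
type targetSet struct {
	mtx     sync.RWMutex
	targets map[string]bool // target name -> healthy
}

// healthyLocked expects the caller to hold mtx; it takes no lock itself.
func (ts *targetSet) healthyLocked(name string) bool {
	return ts.targets[name]
}

// Healthy is the public entry point and acquires the read lock exactly once.
func (ts *targetSet) Healthy(name string) bool {
	ts.mtx.RLock()
	defer ts.mtx.RUnlock()
	return ts.healthyLocked(name)
}

// CountHealthy already holds the read lock, so it calls the lock-free helper.
// Calling Healthy here would acquire RLock a second time and could deadlock
// if another goroutine requested the write lock in between.
func (ts *targetSet) CountHealthy() int {
	ts.mtx.RLock()
	defer ts.mtx.RUnlock()
	n := 0
	for name := range ts.targets {
		if ts.healthyLocked(name) {
			n++
		}
	}
	return n
}

func main() {
	ts := &targetSet{targets: map[string]bool{"a": true, "b": false, "c": true}}
	fmt.Println("healthy targets:", ts.CountHealthy())
}
```

Splitting each accessor into a locking wrapper and a lock-free helper keeps every call path down to a single RLock acquisition.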
beorn7 fd5108b038 Fix a targetmanager test 2016-02-22 16:43:48 +01:00
Fabian Reinartz 6df1f49c13 Remove fullLabels method and fix target updating
With recent changes to a Target's internal data representation
updating by fullLabels() assigns the additional default
instance label. This breaks target identity comparison and causes
identical targets from service discovery to be constantly swapped.
2016-02-22 13:06:30 +01:00
Fabian Reinartz 825831e98f Use fingerprint for target identity comparison
So far we were using the InstanceIdentifier to compare equality of targets.
This is not always accurate, for example for the blackbox exporter where the 
actual target is in the parameter.
2016-02-17 16:34:53 +01:00
Fabian Reinartz 66767121ab Handle scrape timeout on request.
For historic reasons we were enforcing a timeout directly
via the TCP dialer. This has not been necessary for quite a while now.
Switching to context.Context will allow us to properly terminate
requests on shutdown as well.
2016-02-16 11:46:02 +01:00
Julius Volz 293486c7b1 Remove old superfluous calls to setLastScrape().
This is called from within the scrape()->report() flow now.

See https://github.com/prometheus/prometheus/pull/1394/files#r52945817
2016-02-15 22:42:24 +01:00
Fabian Reinartz a0078ec84c Merge pull request #1394 from prometheus/scraperef2
Refactor and test appender modifications
2016-02-15 21:19:40 +01:00
Fabian Reinartz 463dd3ea06 Refactor target scrape reporting. 2016-02-15 18:06:15 +01:00
Fabian Reinartz cd28b88b08 Fix wrong EOF error on successful target scraping 2016-02-15 17:23:04 +01:00
Fabian Reinartz 27d71b08d1 Factor out appender wrapping 2016-02-15 16:47:39 +01:00
Fabian Reinartz fe7e91e2eb Make scraping offset consistent.
To evenly distribute scraping load we currently rely on random
jittering. This commit hashes over the target's identity and calculates
a consistent offset. This also ensures that scrape intervals
are constantly spaced between config/target changes.
2016-02-15 16:46:29 +01:00
Fabian Reinartz a06bc75519 Remove occurrences of 'base' labels 2016-02-15 10:36:57 +01:00
Fabian Reinartz 0d44248fb8 Cleanup cluttered test data 2016-02-13 10:13:38 +01:00
Fabian Reinartz 65eba080a0 Cleanup internal target data 2016-02-13 10:13:38 +01:00
Julius Volz 9b6d69610a Fix various typos in comments.
Helpfully reported by
https://goreportcard.com/report/github.com/prometheus/prometheus :)
2016-02-10 03:47:00 +01:00
Julius Volz 3728b5872f Fix target update error handling.
Fixes https://github.com/prometheus/prometheus/issues/1378
2016-02-08 21:42:59 +01:00
Fabian Reinartz 1f877f3d2a Fix deadlock, structure target logging 2016-02-03 10:39:34 +01:00
Fabian Reinartz d0d2c38c68 Fix tests for append API changes 2016-02-03 10:17:08 +01:00
Fabian Reinartz 59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
Björn Rabenstein 9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate through the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
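
The all-or-nothing batch behavior described in the commit above can be sketched as follows. The sampleAppender interface and memStorage type here are simplified, hypothetical stand-ins; only the method names Append and Throttled come from the commit message, and their real signatures may differ.

```go
package main

import "fmt"

// sampleAppender is a simplified, hypothetical stand-in for the storage API.
type sampleAppender interface {
	Append(v float64)
	Throttled() bool
}

// memStorage is a toy in-memory implementation for the example.
type memStorage struct {
	samples   []float64
	throttled bool
}

func (s *memStorage) Append(v float64) { s.samples = append(s.samples, v) }
func (s *memStorage) Throttled() bool  { return s.throttled }

// appendBatch checks for backpressure once, before the whole batch, so a
// scrape is either ingested completely or skipped entirely.
func appendBatch(a sampleAppender, batch []float64) bool {
	if a.Throttled() {
		return false
	}
	for _, v := range batch {
		a.Append(v)
	}
	return true
}

func main() {
	st := &memStorage{}
	fmt.Println(appendBatch(st, []float64{1, 2, 3}), len(st.samples))

	st.throttled = true
	fmt.Println(appendBatch(st, []float64{4, 5}), len(st.samples))
}
```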
beorn7 a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Jimmi Dyson 9faa7515c6 Kubernetes SD: Refactor to handle missing Kubernetes events 2016-01-19 20:49:58 +00:00
Brian Brazil 4a829e63a2 Merge pull request #1299 from PrFalken/master
Support AirBnB's Smartstack Nerve client for SD
2016-01-18 13:31:04 +00:00
Julien Dehee 061fe2f364 Support AirBnB's Smartstack Nerve client for SD
nerve's registration format differs from serverset. With this commit
there is now a dedicated treecache file in util,
and two separate files for serverset and nerve.

Reference:
https://github.com/airbnb/nerve
2016-01-18 14:07:28 +01:00
Brian Brazil 7a5f019c40 Use up/down in UI for consistency with 'up' metric. 2016-01-12 12:09:20 +00:00
Brian Brazil 6b7629be27 Merge pull request #1242 from tommyulfsparre/watcher-fix
Reduces watches in serverset
2015-12-10 10:43:57 +00:00
Jimmi Dyson c12fb447b8 Kubernetes SD: Use first TCP service port as target port & clean up
example config

Fixes #1256
2015-12-08 10:29:40 +00:00
Tommy Ulfsparre 83e09422bf skip already watched child nodes. 2015-12-02 21:31:05 +01:00
Fabian Reinartz 29a69eecb8 Do not panic in Consul SD creation 2015-11-30 18:41:48 +01:00
Jimmi Dyson 2cca07381b KubernetesSD: Create targets for services as well as service endpoints 2015-11-18 14:15:39 +00:00
Brian Brazil 427bf29db1 Add in default port after relabelling.
For the SNMP and blackbox exporters where
the port tends not to be 80/443 and indeed
there may not be a port this makes the relabelling
a bit simpler as you don't have to figure out this
logic exists and strip off the :80.

This is a breaking change for the example configs of
those exporters.
2015-11-08 11:42:18 +00:00
Brian Brazil fd2bd81cd8 Allow all instance labels in target groups
With the blackbox exporter, the instance label will commonly
be used for things other than hostnames so remove this restriction.
https://example.com or https://example.com/probe/me are some examples.

To prevent user error, check that urls aren't provided as targets
when there's no relabelling that could potentially fix them.
2015-11-07 14:35:20 +00:00
Fabian Reinartz 9cad147265 Merge pull request #1172 from federicobaldo/ec2_sd_improvements
Minor improvements to ec2 service discovery
2015-11-04 13:02:51 +01:00
Federico Baldo d14d2429ea Minor improvements to ec2 sd:
1. static credentials replaced with defaults.DefaultChainCredentials.
This change ensures that credentials are sourced from all possible
providers available with the aws sdk, in the following order:
env variables, shared awsconfig file in user folder, ec2 instance role.

2. Added a few labels: AvailabilityZone, PublicDns, VpcId (if
available), SubnetId (if in Vpc)
2015-11-02 14:55:24 +01:00
Jimmi Dyson 87940ec213 Kubernetes SD: Rename masters to api_servers in config 2015-10-24 14:41:14 +01:00
Jimmi Dyson 7ff5cc66ea Kubernetes SD authentication options cleanup 2015-10-23 16:47:52 +01:00
Jimmi Dyson ea9a173008 Kubernetes SD: Use node name as instance label 2015-10-12 21:26:09 +01:00
Julius Volz d88aea7e6f Fix SD mechanism source prefix handling.
The prefixed target provider changed a pointerized target group that was
reused in the wrapped target provider, causing an ever-increasing chain
of source prefixes in target groups from the Consul target provider.

We now make this bug generally impossible by switching the target group
channel from pointer to value type and thus ensuring that target groups
are copied before being passed on to other parts of the system.

I tried to not let the depointerization leak too far outside of the
channel handling (both upstream and downstream) because I tried that
initially and caused some nasty bugs, which I want to minimize.

Fixes https://github.com/prometheus/prometheus/issues/1083
2015-10-09 14:08:22 +02:00
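
The effect of switching a channel from pointer to value type, as described in the commit above, can be illustrated with a small sketch. The targetGroup struct and addPrefix function are hypothetical reductions, not the real types. Note that sending a struct by value copies only its top level; slices and maps inside it would still be shared.

```go
package main

import "fmt"

// targetGroup is a hypothetical, reduced version of a target group.
type targetGroup struct {
	Source string
}

// addPrefix reads groups by value, so modifying tg only changes the local
// copy and never the value still held by the sender.
func addPrefix(prefix string, in <-chan targetGroup, out chan<- targetGroup) {
	for tg := range in {
		tg.Source = prefix + ":" + tg.Source
		out <- tg
	}
	close(out)
}

func main() {
	reused := targetGroup{Source: "web"} // the provider reuses this value
	in := make(chan targetGroup, 2)
	out := make(chan targetGroup, 2)

	in <- reused
	in <- reused
	close(in)

	addPrefix("consul", in, out)
	for tg := range out {
		fmt.Println("received:", tg.Source)
	}
	fmt.Println("provider still holds:", reused.Source)
}
```

With a chan *targetGroup, both sends would share one struct and the second pass would prefix an already prefixed source, which is the ever-growing chain the commit describes.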
Julius Volz dec9fc9c32 Merge pull request #1148 from prometheus/fix-serverset-multiple-paths
Fix watching multiple Zookeeper paths in serverset SD.
2015-10-08 19:27:06 +02:00
Matt Jibson dcb4856d72 Add SD for Amazon EC2 instances 2015-10-06 18:36:17 -04:00
Julius Volz 60cf4015a4 Fix watching multiple Zookeeper paths in serverset SD.
Fix https://github.com/prometheus/prometheus/issues/1137
2015-10-06 15:54:54 +02:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Jimmi Dyson 0d61605526 Kubernetes SD example: separate out cluster level components & services 2015-09-29 11:22:18 +01:00
Julius Volz 99e8fff872 Fix target manager CPU busyloop caused by bad done-channel handling.
Unfortunately this isn't nicely testable, as it's timing-dependent and
one would have to detect a stray goroutine doing a CPU busyloop...

Fixes https://github.com/prometheus/prometheus/issues/1114
2015-09-28 11:51:16 +02:00
Fabian Reinartz 097d810f37 Merge pull request #1120 from prometheus/flaky-test
retrieval: Reduce flakiness of TestTargetRunScraperScrapes
2015-09-28 09:57:16 +02:00
Brian Brazil ba6688bfce retrieval: Reduce flakiness of TestTargetRunScraperScrapes 2015-09-28 08:34:54 +01:00
Brian Brazil b03569267e retrieval: Add URL parameters to fullLabels too
Move all the special cases into one map, rather than
spreading the logic around.
2015-09-26 16:59:24 +01:00
Brian Brazil 50258929ac Retrieval: Show error message for failed test scrape
This is flaky, and I suspect it was due to the I/O timeout that I've
already fixed. In case that wasn't it, display the error should it
happen again.
2015-09-23 09:24:50 +01:00
Brian Brazil 4bc39dc60e retrieval: Reduce flakiness of TestTargetManagerChan
This will increase test time by a few hundred ms,
this is the 2nd most common cause of flakiness.
2015-09-23 09:00:37 +01:00
Brian Brazil 93145b960a retrieval: Reduce flakiness of target tests
Bump timeouts of tests where we don't want I/O timeouts.

Adjust the full channel test to be much more reliable,
by reducing the ingestion timeout from 1ms to 0.
2015-09-22 19:23:36 +01:00
Fabian Reinartz cac6eea434 Merge pull request #1105 from prometheus/consulnil
Fix nil panic on consul error
2015-09-22 14:55:31 +02:00
Fabian Reinartz 327152862c Update expfmt.NewDecoder usage 2015-09-22 12:11:28 +02:00
Fabian Reinartz 1ce89a4a0b Fix nil panic on consul error 2015-09-22 09:04:31 +02:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
Jimmi Dyson 7ef9399920 Clean up kubernetes http response bodies 2015-09-11 11:44:28 +01:00
Anders Daljord Morken 9fb65a91af Close HTTP connections on HTTP errors too.
Move defer resp.Body.Close() up to make sure it's called even when the
HTTP request returns something other than 200 or Decoder construction
fails. This avoids leaking and eventually running out of file descriptors.
2015-09-10 22:41:05 +02:00
Fabian Reinartz 8456b7e12f Use go1.5.1 2015-09-10 12:11:44 +02:00
Jimmi Dyson a1574aa2b3 Move TLS options to scrape config
Fixes #1013, fixes #989
2015-09-09 09:52:21 +01:00
Julius Volz b7b7b2e883 Merge pull request #1050 from fabric8io/kubernetes-discovery
Kubernetes SD improvements
2015-09-04 14:58:11 +02:00
Jimmi Dyson d7a7fd4589 Kubernetes SD improvements
* Support multiple masters with retries against each master as required.
* Scrape masters' metrics.
* Add role meta label for node/service/master to make it easier for relabeling.
2015-09-04 11:31:20 +01:00
Fabian Reinartz cc1a2a2061 Remove attachment of global labels upon ingestion 2015-09-03 14:16:23 +02:00
Fabian Reinartz ebf417a282 Fix map initialization 2015-09-01 18:06:22 +02:00
Julius Volz f63a899744 Change config regexes to full-string matches.
This anchors all regular expressions entered via the config to match a
full string vs. a substring.

THIS IS A BREAKING CHANGE!

Fixes part of https://github.com/prometheus/prometheus/issues/996
2015-09-01 15:46:41 +02:00
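
To illustrate the difference between substring and full-string matching described in the commit above, a user-supplied expression can be wrapped in a non-capturing group between ^ and $. The compileAnchored helper below is a hypothetical sketch of that technique, not a quote of the project's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAnchored wraps expr so that it must match the entire input. The
// non-capturing group keeps alternations such as "foo|bar" fully anchored.
func compileAnchored(expr string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + expr + ")$")
}

func main() {
	plain := regexp.MustCompile("foo|bar")
	anchored, err := compileAnchored("foo|bar")
	if err != nil {
		panic(err)
	}
	for _, s := range []string{"foo", "foobar", "xbar"} {
		fmt.Printf("%-7s substring=%v full=%v\n", s, plain.MatchString(s), anchored.MatchString(s))
	}
}
```

A config that relied on substring matching would need an explicit `.*` on either side of its expression after such a change.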
Fabian Reinartz 542da6774e Fix draining of file watcher events 2015-08-28 12:17:22 +02:00
Daniel Lundin 4abf54b747 serverset: extract shard number from serverset data 2015-08-27 16:26:00 +02:00
Julius Volz 29eaa8c7cf Merge pull request #1030 from prometheus/fix-flakey-filesd
Fix flakey FileSD test.
2015-08-26 13:25:00 +02:00
Julius Volz 3fd5826589 Fix flakey FileSD test.
When the test ends, all files matching the watcher's glob are removed
via defer. In that moment, the draining goroutine may still be running
and then detect no files matching the configured glob just before the
test exits.

This is now solved by waiting for the draining goroutine to finish
before leaving the test function and thus causing the deferred file
removal.
2015-08-26 13:06:34 +02:00
Julius Volz 744d5d5a7a Merge pull request #1029 from prometheus/vet-fixes
Fix "go vet" errors.
2015-08-26 12:50:18 +02:00
Julius Volz 995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Julius Volz 963ad82dcb Fix "go vet" errors.
I ignored all errors of the type "composite literal uses unkeyed
fields". Most of them are wrong because of
https://github.com/golang/go/issues/9171.
2015-08-26 02:05:04 +02:00
Fabian Reinartz 6664b77f36 Merge pull request #1021 from prometheus/appenders
move metric modifications into SampleAppenders
2015-08-25 17:47:55 +02:00
Fabian Reinartz 01834fa528 Move metric modifications into SampleAppenders 2015-08-25 15:32:37 +02:00
Fabian Reinartz d6d88f8950 Add missing license headers 2015-08-24 19:19:21 +02:00
Julius Volz d36a7f4e6f Fix busylooping in case of no target providers.
merge() closes the channel that handleUpdates() reads from when there
are zero configured target providers in the configuration. In that case,
the for-select loop in handleUpdates() entered a busy loop. It should
exit when the upstream channel is closed.
2015-08-24 16:42:28 +02:00
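
The busy loop described in the commit above arises because a receive from a closed channel never blocks. A minimal sketch of the fix is to use the two-value receive and leave the loop once the channel is closed. The function name handleUpdates comes from the commit message, but its signature and body here are assumptions made for the example.

```go
package main

import "fmt"

// handleUpdates collects updates until the upstream channel is closed or
// done fires. Without the ok check, a closed channel would make the select
// return immediately on every iteration and spin the CPU.
func handleUpdates(updates <-chan string, done <-chan struct{}) []string {
	var seen []string
	for {
		select {
		case u, ok := <-updates:
			if !ok {
				return seen // upstream closed: exit instead of looping
			}
			seen = append(seen, u)
		case <-done:
			return seen
		}
	}
}

func main() {
	updates := make(chan string, 2)
	updates <- "group-1"
	updates <- "group-2"
	close(updates)

	fmt.Println(handleUpdates(updates, make(chan struct{})))
}
```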
Fabian Reinartz 3a0145c09e Reenable blocked appending tests 2015-08-22 09:47:57 +02:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 6d0f58dcf3 sanitize scrape health recording code 2015-08-21 23:01:08 +02:00
Fabian Reinartz 25bf5fdaf5 Timeout sample appends 2015-08-21 18:04:35 +02:00
Fabian Reinartz 11a577fcd0 Switch to common/expfmt for extraction 2015-08-21 13:33:38 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Sharif Nassar 6cb519fe82 Add Consul ServiceID to the discovery meta labels. 2015-08-20 14:04:42 -07:00
Fabian Reinartz 0f5022c091 Add missing Kubernetes doc strings 2015-08-18 14:37:28 +02:00
Fabian Reinartz f592740bac Only exit static target provider on done 2015-08-18 11:51:53 +02:00
Julius Volz b4adf2723d Merge pull request #994 from robbiet480/consul-datacenter-name
Pass through current agent Consul datacenter name
2015-08-18 01:09:24 +02:00