Commit graph

289 commits

Author SHA1 Message Date
beorn7 33a50e69f7 Fix a deadlock
Double acquisition of the RLock usually doesn't blow up, but if the
write lock is called for between the two RLock's, we are deadlocked.

This deadlock does not exist in release-0.17, BTW.
2016-02-29 16:34:29 +01:00
beorn7 fd5108b038 Fix a targetmanager test 2016-02-22 16:43:48 +01:00
Fabian Reinartz 6df1f49c13 Remove fullLabels method and fix target updating
With recent changes to a Target's internal data representation
updating by fullLabels() assigns the additional default
instance label. This breaks target identity comparison and causes
identical targets from service discovery to be constantly swapped.
2016-02-22 13:06:30 +01:00
Fabian Reinartz 825831e98f Use fingerprint for target identity comparison
So far we were using the InstanceIdentifier to compare equality of targets.
This is not always accurate, for example for the blackbox exporter where the 
actual target is in the parameter.
2016-02-17 16:34:53 +01:00
Fabian Reinartz 66767121ab Handle scrape timeout on request.
For historic reasons we were enforcing a timeout directly
via the TCP dialer. This is no longer necessary for quite a while now.
Switching to context.Context will allow us to properly terminate
requests on shutdown as well.
2016-02-16 11:46:02 +01:00
Julius Volz 293486c7b1 Remove old superfluous calls to setLastScrape().
This is called from within the scrape()->report() flow now.

See https://github.com/prometheus/prometheus/pull/1394/files#r52945817
2016-02-15 22:42:24 +01:00
Fabian Reinartz a0078ec84c Merge pull request #1394 from prometheus/scraperef2
Refactor and test appender modifications
2016-02-15 21:19:40 +01:00
Fabian Reinartz 463dd3ea06 Refactor target scrape reporting. 2016-02-15 18:06:15 +01:00
Fabian Reinartz cd28b88b08 Fix wrong EOF error on successful target scraping 2016-02-15 17:23:04 +01:00
Fabian Reinartz 27d71b08d1 Factor out appender wrapping 2016-02-15 16:47:39 +01:00
Fabian Reinartz fe7e91e2eb Make scraping offset consistent.
To evenly distribute scraping load we currently rely on random
jittering. This commit hashes over the target's identity and calculates
a consistent offset. This also ensures that scrape intervals
are constantly spaced between config/target changes.
2016-02-15 16:46:29 +01:00
Fabian Reinartz a06bc75519 Remove occurrences of 'base' labels 2016-02-15 10:36:57 +01:00
Fabian Reinartz 0d44248fb8 Cleanup cluttered test data 2016-02-13 10:13:38 +01:00
Fabian Reinartz 65eba080a0 Cleanup internal target data 2016-02-13 10:13:38 +01:00
Julius Volz 9b6d69610a Fix various typos in comments.
Helpfully reported by
https://goreportcard.com/report/github.com/prometheus/prometheus :)
2016-02-10 03:47:00 +01:00
Julius Volz 3728b5872f Fix target update error handling.
Fixes https://github.com/prometheus/prometheus/issues/1378
2016-02-08 21:42:59 +01:00
Fabian Reinartz 1f877f3d2a Fix deadlock, structure target logging 2016-02-03 10:39:34 +01:00
Fabian Reinartz d0d2c38c68 Fix tests for append API changes 2016-02-03 10:17:08 +01:00
Fabian Reinartz 59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
Björn Rabenstein 9ea3897ea7 Merge pull request #1354 from prometheus/beorn7/storage
Rework the way to communicate backpressure (AKA suspended ingestion)
2016-02-01 15:10:13 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7 a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Jimmi Dyson 9faa7515c6 Kubernetes SD: Refactor to handle missing Kubernetes events 2016-01-19 20:49:58 +00:00
Brian Brazil 4a829e63a2 Merge pull request #1299 from PrFalken/master
Support AirBnB's Smartstack Nerve client for SD
2016-01-18 13:31:04 +00:00
Julien Dehee 061fe2f364 Support AirBnB's Smartstack Nerve client for SD
nerve's registration format differs from serverset. With this commit
there is now a dedicated treecache file in util,
and two separate files for serverset and nerve.

Reference:
https://github.com/airbnb/nerve
2016-01-18 14:07:28 +01:00
Brian Brazil 7a5f019c40 Use up/down in UI for consistency with 'up' metric. 2016-01-12 12:09:20 +00:00
Brian Brazil 6b7629be27 Merge pull request #1242 from tommyulfsparre/watcher-fix
Reduces watches in serverset
2015-12-10 10:43:57 +00:00
Jimmi Dyson c12fb447b8 Kubernetes SD: Use first TCP service port as target port & clean up
example config

Fixes #1256
2015-12-08 10:29:40 +00:00
Tommy Ulfsparre 83e09422bf skip already watched child nodes. 2015-12-02 21:31:05 +01:00
Fabian Reinartz 29a69eecb8 Do not panic in Consul SD creation 2015-11-30 18:41:48 +01:00
Jimmi Dyson 2cca07381b KubernetesSD: Create targets for services as well as service endpoints 2015-11-18 14:15:39 +00:00
Brian Brazil 427bf29db1 Add in default port after relabelling.
For the SNMP and blackbox exporters where
the ports tends to not be 80/443 and indeed
there may not be a port this makes the relabelling
a bit simpler as you don't have to figure out this
logic exists and strip off the :80.

This is a breaking change for the example configs of
those exporters.
2015-11-08 11:42:18 +00:00
Brian Brazil fd2bd81cd8 Allow all instance labels in target groups
With the blackbox exporter, the instance label will commonly
be used for things other than hostnames so remove this restriction.
https://example.com or https://example.com/probe/me are some examples.

To prevent user error, check that urls aren't provided as targets
when there's no relabelling that could potentically fix them.
2015-11-07 14:35:20 +00:00
Fabian Reinartz 9cad147265 Merge pull request #1172 from federicobaldo/ec2_sd_improvements
Minor improvements to ec2 service discovery
2015-11-04 13:02:51 +01:00
Federico Baldo d14d2429ea Minor improvements to ec2 sd:
1. static credentials replaced with defaults.DefaultChainCredentials.
This change ensures that credentials are sourced form all possible
providers available with the aws sdk,           in the following order:
env variables, shared awsconfig file in user folder, ec2 instance role.

2. Added a few labels: AvailabilityZone, PublicDns, VpcId (if
available), SubnetId (if in Vpc)
2015-11-02 14:55:24 +01:00
Jimmi Dyson 87940ec213 Kubernetes SD: Rename masters to api_servers in config 2015-10-24 14:41:14 +01:00
Jimmi Dyson 7ff5cc66ea Kubernetes SD authentication options cleanup 2015-10-23 16:47:52 +01:00
Jimmi Dyson ea9a173008 Kubernetes SD: Use node name as instance label 2015-10-12 21:26:09 +01:00
Julius Volz d88aea7e6f Fix SD mechanism source prefix handling.
The prefixed target provider changed a pointerized target group that was
reused in the wrapped target provider, causing an ever-increasing chain
of source prefixes in target groups from the Consul target provider.

We now make this bug generally impossible by switching the target group
channel from pointer to value type and thus ensuring that target groups
are copied before being passed on to other parts of the system.

I tried to not let the depointerization leak too far outside of the
channel handling (both upstream and downstream) because I tried that
initially and caused some nasty bugs, which I want to minimize.

Fixes https://github.com/prometheus/prometheus/issues/1083
2015-10-09 14:08:22 +02:00
Julius Volz dec9fc9c32 Merge pull request #1148 from prometheus/fix-serverset-multiple-paths
Fix watching multiple Zookeeper paths in serverset SD.
2015-10-08 19:27:06 +02:00
Matt Jibson dcb4856d72 Add SD for Amazon EC2 instances 2015-10-06 18:36:17 -04:00
Julius Volz 60cf4015a4 Fix watching multiple Zookeeper paths in serverset SD.
Fix https://github.com/prometheus/prometheus/issues/1137
2015-10-06 15:54:54 +02:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Jimmi Dyson 0d61605526 Kubernetes SD example: separate out cluster level components & services 2015-09-29 11:22:18 +01:00
Julius Volz 99e8fff872 Fix target manager CPU busyloop caused by bad done-channel handling.
Unfortunately this isn't nicely testable, as it's timing-dependent and
one would have to detect a stray goroutine doing a CPU busyloop...

Fixes https://github.com/prometheus/prometheus/issues/1114
2015-09-28 11:51:16 +02:00
Fabian Reinartz 097d810f37 Merge pull request #1120 from prometheus/flaky-test
retrieval: Reduce flakiness of TestTargetRunScraperScrapes
2015-09-28 09:57:16 +02:00
Brian Brazil ba6688bfce retrieval: Reduce flakiness of TestTargetRunScraperScrapes 2015-09-28 08:34:54 +01:00
Brian Brazil b03569267e retrieval: Add URL parameters to fullLabels too
Move all the special cases into one map, rather than
spreading the logic around.
2015-09-26 16:59:24 +01:00
Brian Brazil 50258929ac Retrieval: Show error message for failed test scrape
This is flaky, and I suspect it was due the to I/O timeout that I've
already fixed. In case that wasn't it, display the error should it
happen again.
2015-09-23 09:24:50 +01:00
Brian Brazil 4bc39dc60e retrieval: Reduce flakiness of TestTargetManagerChan
This will increase test time by a few hundred ms,
this is the 2nd most common cause of flakiness.
2015-09-23 09:00:37 +01:00