Commit graph

576 commits

Author SHA1 Message Date
Brian Brazil d8b4995ddd Check target labels are valid. Check for address after relabelling.
Fixes #2822
Fixes #2819
2017-06-09 16:18:19 +01:00
Fabian Reinartz 669075c6b9 Merge branch 'master' into dev-2.0 2017-06-06 09:36:51 +02:00
Fabian Reinartz eb651233ac Merge pull request #2787 from prometheus/limit2
Rework sample limit to work for 2.0
2017-06-06 08:21:12 +02:00
Chris Goller 42de0ae013 Use log.Logger interface for all discovery services 2017-06-01 11:25:55 -05:00
Brian Brazil 37bc607e96 Rework sample limit to work for 2.0
Correctly update reported series.
Increment prometheus_target_scrapes_exceeded_sample_limit_total.
Add back unittests.
Ignore stale markers when calculating sample limit.

Fixes #2770
2017-05-31 15:41:51 +01:00
Fabian Reinartz c6eed97c77 Merge pull request #2774 from prometheus/stalemem
Fix staleness memory leak
2017-05-30 15:55:37 -07:00
Fabian Reinartz bc7aff8cef retrieval: extract scrape cache 2017-05-30 09:37:23 -07:00
Brian Brazil 72a276e7ed Pass through storage errors in limitAppender. 2017-05-26 11:28:22 +01:00
Fabian Reinartz a83014f53c retrieval: fix memory leak and consumption for caches 2017-05-26 08:44:24 +02:00
Fabian Reinartz 3d8661b8d5 Add comment 2017-05-24 17:05:42 +02:00
Fabian Reinartz 43ca652217 retrieval: Don't allocate map on every scrape 2017-05-24 16:23:48 +02:00
Fabian Reinartz d3f662f15e Merge branch 'dev-2.0' into grobie/reduce-noisy-append-errors 2017-05-24 15:29:30 +02:00
Fabian Reinartz d289dc55c3 storage: update TSDB 2017-05-22 11:53:08 +02:00
Brian Brazil 0920972f79 Initilise scraped sample map, and rename to series map. 2017-05-16 18:33:51 +01:00
Brian Brazil bf38963118 Plumb through logger with target field to scrape loop. 2017-05-16 18:33:51 +01:00
Brian Brazil d657d722dc Log count of dupliates/out of order samples as warnings.
Keep log of each sample as debug log.
2017-05-16 18:33:51 +01:00
Brian Brazil 8b9d3e7547 Put end of run staleness handler in seperate function.
Improve log message.
2017-05-16 18:33:51 +01:00
Brian Brazil d532272520 Add stalemarkers to synthetic series too when target stops. 2017-05-16 18:33:51 +01:00
Brian Brazil b87d3ca9ea Create stale markers when a target is stopped.
When a target is no longer returned from SD stop()
is called. However it may be recreated before the
next scrape interval happens. So we wait to set stalemarkers
until the scrape of the new target would have happened
and been ingested, which is 2 scrape intervals.

If we're shutting down the context will be cancelled,
so return immediately rather than holding things up for potentially
minutes waiting to safely set stalemarkers no newer than now.
If the server starts immediately back up again all is well.
If not, we're missing some stale markers.
2017-05-16 18:33:51 +01:00
Brian Brazil 95162ebc16 Add log messages for out of order samples 2017-05-16 18:33:51 +01:00
Brian Brazil 3c45400130 Don't fail scrape if one sample violates ordering.
In Prometheus 1.x one sample that is out of order
or that has a duplicate timestamp is discarded, and
the rest of the scrape ingestion continues on.
This will now also be true for 2.0.
2017-05-16 18:33:51 +01:00
Brian Brazil fd5c5a50a3 Add stale markers on parse error.
If we fail to parse the result of a scrape,
we should treat that as a failed scrape and
add stale markers.
2017-05-16 18:33:51 +01:00
Brian Brazil c0c7e32e61 Treat a failed scrape as an empty scrape for staleness.
If a target has died but is still in SD, we want the previously
scraped values to go stale. This would also apply to brief blips.
2017-05-16 18:33:51 +01:00
Brian Brazil 850ea412ad If an explicit timestamp is provided, bypass staleness. 2017-05-16 18:33:51 +01:00
Brian Brazil a5cf25743c Move stalness check into a function 2017-05-16 18:33:51 +01:00
Brian Brazil 5060a0fc51 Add unittests for ingestion stale NaNs 2017-05-16 18:33:51 +01:00
Brian Brazil 4f35952cf3 Inject a stale NaN when sample disappears between scrapes. 2017-05-16 18:33:51 +01:00
Brian Brazil beaa7d5a43 Move consistent NaN logic into the parser. 2017-05-16 18:33:51 +01:00
Brian Brazil 76acf7b9b1 Ensure all the NaNs we ingest have the same bit pattern. 2017-05-16 18:33:51 +01:00
Brian Brazil 0eabed8048 Remove unused metric 2017-05-15 15:06:54 +01:00
Fabian Reinartz 76b3378190 retrieval: add missing scrape context cancelation 2017-05-11 17:20:03 +02:00
Julius Volz f160f17a6f retrieval: fix missing scrape context cancellation (#2599) 2017-05-11 16:15:07 +02:00
Tobias Schmidt 368206d2f5 Handle errSeriesDropped correctly
If metrics_relabel_configs are used to drop metrics, an errSeriesDropped
is returned. This shouldn't be used to return an error at the end of a
append() call.
2017-05-05 14:58:36 +02:00
Fabian Reinartz e829dbe2be retrieval: comment out accept header again 2017-04-27 11:46:08 +02:00
Fabian Reinartz 73b8ff0ddc Merge branch 'master' into dev-2.0 2017-04-27 10:19:55 +02:00
Matt Layher 5e4f5fb5ad retrieval: make scrape timeout header consistent with others 2017-04-05 14:56:22 -04:00
Alexey Palazhchenko 17f15d024a Small fixes. (#2578)
Fix typos. Simplify with gofmt -s
2017-04-05 14:24:22 +01:00
Matt Layher fe4b6693f7 retrieval: add Scrape-Timeout-Seconds header to each scrape request (#2565)
Fixes #2508.
2017-04-04 18:26:28 +01:00
Fabian Reinartz 8ffc851147 Merge branch 'master' into dev-2.0 2017-04-04 15:17:56 +02:00
Julius Volz 947c83be3b Sort targets by instance within a job
Fixes https://github.com/prometheus/prometheus/issues/2536
2017-03-31 13:14:20 +02:00
Julius Volz 815762a4ad Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig 2017-03-20 14:17:04 +01:00
Fabian Reinartz c389193b37 Merge branch 'master' into dev-2.0 2017-03-17 16:27:07 +01:00
Fabian Reinartz 5ec1efe622 retrieval: fix test 2017-03-08 15:37:12 +01:00
Fabian Reinartz d9fb57cde4 *: Simplify []byte to string unsafe conversion 2017-03-07 11:41:11 +01:00
Fabian Reinartz 9304179ef7 Merge branch 'master' into dev-2.0 2017-03-02 08:16:58 +01:00
Erdem Agaoglu 8809735d7f Setting User-Agent header (#2447) 2017-02-28 09:59:33 -04:00
Fabian Reinartz cc0ff26f1f retrieval: handle GZIP compression ourselves
The automatic GZIP handling of net/http does not preserve
buffers across requests and thus generates a lot of garbage.
We handle GZIP ourselves to circumvent this.t
2017-02-22 13:25:25 +01:00
Fabian Reinartz 311e7b5069 storage/vendor: update to latest fabxc/tsdb 2017-02-20 11:11:44 +01:00
Fabian Reinartz 5772f1a7ba retrieval/storage: adapt to new interface
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Brian Brazil 34767c2221 Clone lset before relabelling. (#2386)
We need to not change the lset passed into populateLabels, as that
is kept around by the SDs.

Fixes 2377
2017-02-01 19:49:50 +00:00
Fabian Reinartz 1d3cdd0d67 Merge branch 'master' into dev-2.0-rebase 2017-01-30 17:43:01 +01:00
Fabian Reinartz 035976b275 retrieval: handle not found error correctly 2017-01-20 11:27:01 +01:00
Fabian Reinartz 598e2f01c0 retrieval: don't erronously break appending 2017-01-17 08:39:18 +01:00
Fabian Reinartz c691895a0f retrieval: cache series references, use pkg/textparse
With this change the scraping caches series references and only
allocates label sets if it has to retrieve a new reference.
pkg/textparse is used to do the conditional parsing and reduce
allocations from 900B/sample to 0 in the general case.
2017-01-16 12:03:57 +01:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
Fabian Reinartz 3302bb1eb1 Merge pull request #2323 from prometheus/beorn7/retrieval
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still it's weird why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
beorn7 767c0709b1 Retrieval: Avoid copying Target
retreival.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.

It might even have caused the issues reported as #2266 and #2262 .
2017-01-06 18:43:41 +01:00
Fabian Reinartz e631a1260d retrieval: use separate appender per target 2016-12-30 21:35:35 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Brian Brazil 6c07453ec1 Only clone the metric in the one place relabelling needs it. (#2292)
This cuts ~17% off memory allocations related to ingesting data
in a basic setup.
2016-12-21 10:00:33 +00:00
Brian Brazil f421ce0636 Remove label from prometheus_target_skipped_scrapes_total (#2289)
This avoids it not being intialised, and breaking out by
interval wasn't partiuclarly useful.

Fixes #2269
2016-12-16 18:00:52 +00:00
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problemtic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
Brian Brazil c8de1484d5 Add scrape_samples_post_metric_relabeling
This reports the number of samples post any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil 06b9df65ec Refactor and add unittests to scrape result handling. 2016-12-13 16:49:17 +00:00
Brian Brazil b5ded43594 Allow buffering of scraped samples before sending them to storage. 2016-12-13 15:01:35 +00:00
Frederic Branczyk 33b583d50e
web/api: add targets endpoint 2016-12-05 13:13:21 +01:00
Frederic Branczyk 8f8cea4fbd
retrieval: refactor TargetManager to return flat list of Targets 2016-12-02 13:28:58 +01:00
Fabian Reinartz 200bbe1bad config: extract SD and HTTPClient configurations 2016-11-23 18:23:37 +01:00
Fabian Reinartz 47623202c7 retrieval: remove metric namespaces 2016-11-23 09:17:04 +01:00
Fabian Reinartz d7f4f8b879 discovery: move TargetSet into discovery package 2016-11-23 09:14:44 +01:00
Fabian Reinartz d19d1bcad3 discovery: move into top-level package 2016-11-22 12:56:33 +01:00
Fabian Reinartz 7bd9508c9b discovery: move TargetProvider and multi-constructor 2016-11-22 12:56:33 +01:00
Fabian Reinartz bd0048477c discovery: move remaining SDs into own package 2016-11-22 12:56:33 +01:00
Fabian Reinartz 5b72eae1b0 Merge pull request #2203 from prometheus/sdfix
Service discovery fixes
2016-11-21 16:46:20 +01:00
Fabian Reinartz ec66082749 Merge branch 'ec2_sd_profile_support' of https://github.com/Ticketmaster/prometheus into Ticketmaster-ec2_sd_profile_support 2016-11-21 11:49:23 +01:00
Fabian Reinartz 06555bde93 Merge branch 'k8s_sd_metrics' of https://github.com/dominikschulz/prometheus into dominikschulz-k8s_sd_metrics 2016-11-21 11:44:48 +01:00
Fabian Reinartz a1eec447a4 discovery: fix+consolidate Zookeeper discoveries 2016-11-18 13:20:58 +01:00
Fabian Reinartz b4d7ce1370 discovery: respect context cancellation everywhere
This also removes closing of the target group channel everywhere
as the contexts cancels across all stages and we don't care about
draining all events once that happened.
2016-11-18 10:55:29 +01:00
Fabian Reinartz bc7bd7202c discovery: terminate senders before closing channel
Fixes #2200
2016-11-18 10:03:12 +01:00
Frederic Branczyk 0fcea6e9fb retrieval/discovery/kubernetes: fix cache state unknown behavior (#2180)
* retrieval/discovery/kubernetes: fix cache state unknown behavior

* retrieval/discovery/kubernetes: extract type casting

* retrieval/discovery/kubernetes: add tests for possible regressions
2016-11-14 16:21:38 +01:00
Fabian Reinartz fa82c65d15 Merge pull request #2186 from prometheus/fixes
Test fixes
2016-11-14 09:52:15 +01:00
Fabian Reinartz 7ecc271411 Move Fatalf call into main test goroutine 2016-11-13 18:21:42 +01:00
Fabian Reinartz 530cdba103 kubernetes: only use one error logging handler 2016-11-12 14:13:38 +01:00
beorn7 92c0ef1a92 Merge branch 'release-1.2' into beorn7/release 2016-11-03 22:48:39 +01:00
Kraig Amador bec6870ed4 ec2_sd_configs: Support profiles for configuring the ec2 service 2016-11-03 08:38:02 -07:00
beorn7 0fdb74c069 Adjust dns.go to new miekg/dns package and improve error handling.
When hitting the 64kiB limit of DNS, the error message so far was
really misleading.
2016-11-03 15:42:11 +01:00
Brian Brazil 64263f280d Add scrape_samples_scraped to indicate samples scraped. (#2123) 2016-10-26 17:43:01 +01:00
Brian Brazil bbec65d454 Call SD metrics refresh rather than scrape. (#2120)
This avoids confusion with scrape_duration_seconds, and
is more in line with the API naming.
2016-10-26 10:03:35 +01:00
bekbulatov 2bc12fa2fb Set timeout for marathon_sd 2016-10-24 11:27:08 +01:00
bekbulatov c689b35858 Merge branch 'master' into marathon_tls 2016-10-24 10:37:32 +01:00
Dominik Schulz eb10ff9871 Also handle service update in endpoints.go 2016-10-23 13:33:54 +02:00
Dominik Schulz f002fe186a Add Marathon-SD metrics. (#2106) 2016-10-21 11:14:53 +01:00
Mitsuhiro Tanda 296644adeb Expose ec2_instance_type (#2107) 2016-10-21 11:13:47 +01:00
Dominik Schulz 36de163900 Add File-SD metrics (#2103)
* Add File-SD metrics

* Count read errors, not scan errors.
2016-10-21 11:12:19 +01:00
Dominik Schulz 3d0fb0cf17 Avoid too generic label type. 2016-10-21 12:11:15 +02:00
Dominik Schulz e1e30f12cd Add Kubernetes-SD metrics. 2016-10-21 10:48:28 +02:00
Dominik Schulz 552ab61fa1 Change SD metric names to make logical grouping more visible. (#2102) 2016-10-21 09:18:28 +01:00