Brian Brazil
d8b4995ddd
Check target labels are valid. Check for address after relabelling.
...
Fixes #2822
Fixes #2819
2017-06-09 16:18:19 +01:00
Fabian Reinartz
669075c6b9
Merge branch 'master' into dev-2.0
2017-06-06 09:36:51 +02:00
Fabian Reinartz
eb651233ac
Merge pull request #2787 from prometheus/limit2
...
Rework sample limit to work for 2.0
2017-06-06 08:21:12 +02:00
Chris Goller
42de0ae013
Use log.Logger interface for all discovery services
2017-06-01 11:25:55 -05:00
Brian Brazil
37bc607e96
Rework sample limit to work for 2.0
...
Correctly update reported series.
Increment prometheus_target_scrapes_exceeded_sample_limit_total.
Add back unittests.
Ignore stale markers when calculating sample limit.
Fixes #2770
2017-05-31 15:41:51 +01:00
Fabian Reinartz
c6eed97c77
Merge pull request #2774 from prometheus/stalemem
...
Fix staleness memory leak
2017-05-30 15:55:37 -07:00
Fabian Reinartz
bc7aff8cef
retrieval: extract scrape cache
2017-05-30 09:37:23 -07:00
Brian Brazil
72a276e7ed
Pass through storage errors in limitAppender.
2017-05-26 11:28:22 +01:00
Fabian Reinartz
a83014f53c
retrieval: fix memory leak and consumption for caches
2017-05-26 08:44:24 +02:00
Fabian Reinartz
3d8661b8d5
Add comment
2017-05-24 17:05:42 +02:00
Fabian Reinartz
43ca652217
retrieval: Don't allocate map on every scrape
2017-05-24 16:23:48 +02:00
Fabian Reinartz
d3f662f15e
Merge branch 'dev-2.0' into grobie/reduce-noisy-append-errors
2017-05-24 15:29:30 +02:00
Fabian Reinartz
d289dc55c3
storage: update TSDB
2017-05-22 11:53:08 +02:00
Brian Brazil
0920972f79
Initialise scraped sample map, and rename to series map.
2017-05-16 18:33:51 +01:00
Brian Brazil
bf38963118
Plumb through logger with target field to scrape loop.
2017-05-16 18:33:51 +01:00
Brian Brazil
d657d722dc
Log count of duplicates/out of order samples as warnings.
...
Keep log of each sample as debug log.
2017-05-16 18:33:51 +01:00
Brian Brazil
8b9d3e7547
Put end of run staleness handler in separate function.
...
Improve log message.
2017-05-16 18:33:51 +01:00
Brian Brazil
d532272520
Add stale markers to synthetic series too when target stops.
2017-05-16 18:33:51 +01:00
Brian Brazil
b87d3ca9ea
Create stale markers when a target is stopped.
...
When a target is no longer returned from SD, stop()
is called. However, it may be recreated before the
next scrape interval happens. So we wait to set stale markers
until the scrape of the new target would have happened
and been ingested, which is 2 scrape intervals.
If we're shutting down, the context will be cancelled,
so return immediately rather than holding things up for potentially
minutes waiting to safely set stale markers no newer than now.
If the server starts immediately back up again, all is well.
If not, we're missing some stale markers.
2017-05-16 18:33:51 +01:00
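The waiting rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `waitBeforeStaleMarkers` and `demo` are hypothetical names, and only the "two scrape intervals, unless the context is cancelled" behaviour is taken from the commit message.

```go
package main

import (
	"context"
	"time"
)

// waitBeforeStaleMarkers (hypothetical helper) blocks for two scrape intervals
// before stale markers may be written for a stopped target. It returns false
// without waiting out the delay if ctx is cancelled first, e.g. on shutdown.
func waitBeforeStaleMarkers(ctx context.Context, interval time.Duration) bool {
	timer := time.NewTimer(2 * interval)
	defer timer.Stop()

	select {
	case <-timer.C:
		return true // a recreated target would have been scraped and ingested by now
	case <-ctx.Done():
		return false // shutting down: skip the markers rather than block
	}
}

// demo runs the helper once with a live context and once with a cancelled one.
func demo() (afterWait, afterCancel bool) {
	afterWait = waitBeforeStaleMarkers(context.Background(), time.Millisecond)

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	afterCancel = waitBeforeStaleMarkers(ctx, time.Hour)
	return afterWait, afterCancel
}
```

When the helper returns false, no markers are written, which matches the trade-off stated in the message: a quick restart is fine, otherwise some stale markers are missing.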
Brian Brazil
95162ebc16
Add log messages for out of order samples
2017-05-16 18:33:51 +01:00
Brian Brazil
3c45400130
Don't fail scrape if one sample violates ordering.
...
In Prometheus 1.x one sample that is out of order
or that has a duplicate timestamp is discarded, and
the rest of the scrape ingestion continues on.
This will now also be true for 2.0.
2017-05-16 18:33:51 +01:00
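A rough sketch of the "skip the bad sample, keep the rest" behaviour from the commit above, assuming a toy store. The error values, the `sample` type and `appendAll` are hypothetical stand-ins, not the storage layer's real API.

```go
package main

import "errors"

// Hypothetical sentinel errors standing in for the storage layer's errors.
var (
	errOutOfOrder = errors.New("out of order sample")
	errDuplicate  = errors.New("duplicate sample for timestamp")
)

type sample struct {
	series string
	ts     int64
	value  float64
}

// appendAll tries every sample. Ordering violations are counted and skipped;
// any other error aborts the scrape.
func appendAll(samples []sample, add func(sample) error) (added, skipped int, err error) {
	for _, s := range samples {
		switch e := add(s); {
		case e == nil:
			added++
		case errors.Is(e, errOutOfOrder), errors.Is(e, errDuplicate):
			skipped++
		default:
			return added, skipped, e
		}
	}
	return added, skipped, nil
}

// demo feeds three samples to a toy store that remembers the last timestamp
// seen per series. The second sample is older than the first.
func demo() (added, skipped int, err error) {
	last := map[string]int64{}
	add := func(s sample) error {
		if prev, ok := last[s.series]; ok {
			if s.ts < prev {
				return errOutOfOrder
			}
			if s.ts == prev {
				return errDuplicate
			}
		}
		last[s.series] = s.ts
		return nil
	}
	return appendAll([]sample{{"a", 10, 1}, {"a", 5, 2}, {"b", 10, 3}}, add)
}
```

The skipped count is the kind of number the later "Log count of duplicates/out of order samples as warnings" commit reports once per scrape.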
Brian Brazil
fd5c5a50a3
Add stale markers on parse error.
...
If we fail to parse the result of a scrape,
we should treat that as a failed scrape and
add stale markers.
2017-05-16 18:33:51 +01:00
Brian Brazil
c0c7e32e61
Treat a failed scrape as an empty scrape for staleness.
...
If a target has died but is still in SD, we want the previously
scraped values to go stale. This would also apply to brief blips.
2017-05-16 18:33:51 +01:00
Brian Brazil
850ea412ad
If an explicit timestamp is provided, bypass staleness.
2017-05-16 18:33:51 +01:00
Brian Brazil
a5cf25743c
Move staleness check into a function
2017-05-16 18:33:51 +01:00
Brian Brazil
5060a0fc51
Add unittests for ingestion stale NaNs
2017-05-16 18:33:51 +01:00
Brian Brazil
4f35952cf3
Inject a stale NaN when sample disappears between scrapes.
2017-05-16 18:33:51 +01:00
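One way to picture the stale NaN injection from the commit above: compare the series seen in the previous scrape with those in the current one, and emit a marker for each that vanished. The bit pattern, the map-of-names representation and all function names are assumptions made for this sketch.

```go
package main

import "math"

// staleNaNBits is an assumed bit pattern reserved for the staleness marker.
const staleNaNBits uint64 = 0x7ff0000000000002

// staleMarkers returns one stale-marker value for every series that was
// present in the previous scrape but is absent from the current one.
func staleMarkers(prev, cur map[string]struct{}) map[string]float64 {
	out := make(map[string]float64)
	for series := range prev {
		if _, ok := cur[series]; !ok {
			out[series] = math.Float64frombits(staleNaNBits)
		}
	}
	return out
}

// isStale compares bit patterns, because NaN != NaN under float comparison.
func isStale(v float64) bool {
	return math.Float64bits(v) == staleNaNBits
}

// demo: series "b" was scraped last time but is missing now.
func demo() (count int, bIsStale bool) {
	prev := map[string]struct{}{"a": {}, "b": {}}
	cur := map[string]struct{}{"a": {}}
	markers := staleMarkers(prev, cur)
	return len(markers), isStale(markers["b"])
}
```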
Brian Brazil
beaa7d5a43
Move consistent NaN logic into the parser.
2017-05-16 18:33:51 +01:00
Brian Brazil
76acf7b9b1
Ensure all the NaNs we ingest have the same bit pattern.
2017-05-16 18:33:51 +01:00
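The idea of giving every ingested NaN the same bit pattern could look like the sketch below. The canonical pattern and the name `canonicalNaN` are assumptions for illustration; the point is only that many distinct bit patterns are NaN, and collapsing them leaves other patterns free to carry special meaning.

```go
package main

import "math"

// normalNaNBits is an assumed canonical bit pattern for ordinary NaN values.
const normalNaNBits uint64 = 0x7ff8000000000001

// canonicalNaN maps every NaN, whatever its payload, to one bit pattern.
// Non-NaN values pass through unchanged.
func canonicalNaN(v float64) float64 {
	if math.IsNaN(v) {
		return math.Float64frombits(normalNaNBits)
	}
	return v
}

// demo normalises two NaNs with different payloads and one ordinary value.
func demo() bool {
	a := canonicalNaN(math.NaN())
	b := canonicalNaN(math.Float64frombits(0x7ff80000deadbeef))
	return math.Float64bits(a) == math.Float64bits(b) && canonicalNaN(1.5) == 1.5
}
```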
Brian Brazil
0eabed8048
Remove unused metric
2017-05-15 15:06:54 +01:00
Fabian Reinartz
76b3378190
retrieval: add missing scrape context cancelation
2017-05-11 17:20:03 +02:00
Julius Volz
f160f17a6f
retrieval: fix missing scrape context cancellation ( #2599 )
2017-05-11 16:15:07 +02:00
Tobias Schmidt
368206d2f5
Handle errSeriesDropped correctly
...
If metrics_relabel_configs are used to drop metrics, an errSeriesDropped
is returned. This shouldn't be used to return an error at the end of a
an append() call.
2017-05-05 14:58:36 +02:00
Fabian Reinartz
e829dbe2be
retrieval: comment out accept header again
2017-04-27 11:46:08 +02:00
Fabian Reinartz
73b8ff0ddc
Merge branch 'master' into dev-2.0
2017-04-27 10:19:55 +02:00
Matt Layher
5e4f5fb5ad
retrieval: make scrape timeout header consistent with others
2017-04-05 14:56:22 -04:00
Alexey Palazhchenko
17f15d024a
Small fixes. ( #2578 )
...
Fix typos. Simplify with gofmt -s
2017-04-05 14:24:22 +01:00
Matt Layher
fe4b6693f7
retrieval: add Scrape-Timeout-Seconds header to each scrape request ( #2565 )
...
Fixes #2508 .
2017-04-04 18:26:28 +01:00
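A hedged sketch of attaching the scrape timeout to an outgoing request, as the commit above describes. The exact header name and value format are assumptions here (the subject line only says "Scrape-Timeout-Seconds"), and `newScrapeRequest` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// timeoutHeader is an assumed header name, not confirmed by the commit text.
const timeoutHeader = "X-Prometheus-Scrape-Timeout-Seconds"

// newScrapeRequest builds a GET request that tells the target how long the
// scraper is willing to wait, expressed in seconds.
func newScrapeRequest(url string, timeout time.Duration) (*http.Request, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(timeoutHeader, fmt.Sprintf("%f", timeout.Seconds()))
	return req, nil
}

// demo returns the header value for a 10 second timeout.
func demo() string {
	req, err := newScrapeRequest("http://localhost:9100/metrics", 10*time.Second)
	if err != nil {
		return ""
	}
	return req.Header.Get(timeoutHeader)
}
```

A target that does expensive work per scrape can read this header and give up early instead of computing a response nobody will wait for.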
Fabian Reinartz
8ffc851147
Merge branch 'master' into dev-2.0
2017-04-04 15:17:56 +02:00
Julius Volz
947c83be3b
Sort targets by instance within a job
...
Fixes https://github.com/prometheus/prometheus/issues/2536
2017-03-31 13:14:20 +02:00
Julius Volz
815762a4ad
Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig
2017-03-20 14:17:04 +01:00
Fabian Reinartz
c389193b37
Merge branch 'master' into dev-2.0
2017-03-17 16:27:07 +01:00
Fabian Reinartz
5ec1efe622
retrieval: fix test
2017-03-08 15:37:12 +01:00
Fabian Reinartz
d9fb57cde4
*: Simplify []byte to string unsafe conversion
2017-03-07 11:41:11 +01:00
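For readers unfamiliar with the technique named in the commit above, a zero-copy `[]byte` to `string` conversion can be written as below. The function name is hypothetical, and the sketch relies on the string header being a prefix of the slice header, which is an implementation detail rather than a language guarantee.

```go
package main

import "unsafe"

// bytesToString (hypothetical name) reinterprets b as a string without
// copying. The caller must not modify b while the returned string is in use,
// since strings are assumed to be immutable everywhere else.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}
```

The conversion avoids one allocation per call, which matters only on hot paths such as parsing every scraped line.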
Fabian Reinartz
9304179ef7
Merge branch 'master' into dev-2.0
2017-03-02 08:16:58 +01:00
Erdem Agaoglu
8809735d7f
Setting User-Agent header ( #2447 )
2017-02-28 09:59:33 -04:00
Fabian Reinartz
cc0ff26f1f
retrieval: handle GZIP compression ourselves
...
The automatic GZIP handling of net/http does not preserve
buffers across requests and thus generates a lot of garbage.
We handle GZIP ourselves to circumvent this.
2017-02-22 13:25:25 +01:00
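The buffer reuse motivating the commit above might be sketched like this: one `gzip.Reader` and one output buffer are kept per scraper and reset for each response. The `gzipDecoder` type and the `compress` and `demo` helpers are hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// gzipDecoder keeps one gzip.Reader and one output buffer alive across scrapes.
type gzipDecoder struct {
	zr  *gzip.Reader
	buf bytes.Buffer
}

// decode decompresses body into the reused buffer. The returned slice is only
// valid until the next call to decode.
func (d *gzipDecoder) decode(body io.Reader) ([]byte, error) {
	if d.zr == nil {
		zr, err := gzip.NewReader(body)
		if err != nil {
			return nil, err
		}
		d.zr = zr
	} else if err := d.zr.Reset(body); err != nil {
		return nil, err
	}
	d.buf.Reset() // keeps the underlying storage for reuse
	if _, err := io.Copy(&d.buf, d.zr); err != nil {
		return nil, err
	}
	if err := d.zr.Close(); err != nil {
		return nil, err
	}
	return d.buf.Bytes(), nil
}

// compress is a test helper that gzips a string in memory.
func compress(s string) []byte {
	var b bytes.Buffer
	zw := gzip.NewWriter(&b)
	zw.Write([]byte(s)) // writes to a bytes.Buffer do not fail
	zw.Close()
	return b.Bytes()
}

// demo decodes two responses with the same decoder.
func demo() (first, second string, err error) {
	var d gzipDecoder
	out, err := d.decode(bytes.NewReader(compress("up 1\n")))
	if err != nil {
		return "", "", err
	}
	first = string(out)
	out, err = d.decode(bytes.NewReader(compress("up 0\n")))
	if err != nil {
		return "", "", err
	}
	return first, string(out), nil
}
```

For this to apply to real HTTP traffic, the client would also need to request gzip explicitly and disable the transport's transparent decompression; that part is omitted here.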
Fabian Reinartz
311e7b5069
storage/vendor: update to latest fabxc/tsdb
2017-02-20 11:11:44 +01:00
Fabian Reinartz
5772f1a7ba
retrieval/storage: adapt to new interface
...
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Brian Brazil
34767c2221
Clone lset before relabelling. ( #2386 )
...
We need to not change the lset passed into populateLabels, as that
is kept around by the SDs.
Fixes 2377
2017-02-01 19:49:50 +00:00
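The aliasing problem fixed by the commit above can be illustrated with a simplified label set. `populateLabels` is named in the commit message, but its signature here, the `labelSet` type and the relabel callback are assumptions for the sketch.

```go
package main

// labelSet is a simplified stand-in for a discovered target's labels.
type labelSet map[string]string

// clone returns an independent copy of the label set.
func (ls labelSet) clone() labelSet {
	out := make(labelSet, len(ls))
	for k, v := range ls {
		out[k] = v
	}
	return out
}

// populateLabels applies relabelling to a copy, so the set that service
// discovery keeps around is never mutated.
func populateLabels(discovered labelSet, relabel func(labelSet)) labelSet {
	lset := discovered.clone()
	relabel(lset)
	return lset
}

// demo relabels a target and reports what the SD-owned set and the result hold.
func demo() (original, relabelled string) {
	sd := labelSet{"__address__": "10.0.0.1:9100"}
	out := populateLabels(sd, func(ls labelSet) {
		ls["instance"] = ls["__address__"]
		delete(ls, "__address__")
	})
	return sd["__address__"], out["instance"]
}
```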
Fabian Reinartz
1d3cdd0d67
Merge branch 'master' into dev-2.0-rebase
2017-01-30 17:43:01 +01:00
Fabian Reinartz
035976b275
retrieval: handle not found error correctly
2017-01-20 11:27:01 +01:00
Fabian Reinartz
598e2f01c0
retrieval: don't erroneously break appending
2017-01-17 08:39:18 +01:00
Fabian Reinartz
c691895a0f
retrieval: cache series references, use pkg/textparse
...
With this change the scraping caches series references and only
allocates label sets if it has to retrieve a new reference.
pkg/textparse is used to do the conditional parsing and reduce
allocations from 900B/sample to 0 in the general case.
2017-01-16 12:03:57 +01:00
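A minimal sketch of the reference cache idea from the commit above, assuming the raw metric text is used as the cache key. The `refCache` type, its counter, and the way references are handed out are all hypothetical; the real scraper obtains references from storage.

```go
package main

// refCache maps the raw metric text of a series to a storage reference.
type refCache struct {
	refs   map[string]uint64
	next   uint64
	parsed int // number of times the slow path ran
}

// ref returns the cached reference for a metric, taking the slow path
// (label parsing and allocation in a real scraper) only on first sight.
func (c *refCache) ref(metric string) uint64 {
	if r, ok := c.refs[metric]; ok {
		return r // fast path: no label set is allocated
	}
	c.parsed++
	c.next++
	c.refs[metric] = c.next
	return c.next
}

// demo scrapes the same two series three times in a row.
func demo() (slowPathRuns int, stable bool) {
	c := &refCache{refs: map[string]uint64{}}
	first := c.ref(`up{job="node"}`)
	c.ref(`go_goroutines{job="node"}`)
	for i := 0; i < 2; i++ {
		stable = c.ref(`up{job="node"}`) == first
		c.ref(`go_goroutines{job="node"}`)
	}
	return c.parsed, stable
}
```

Since most targets expose the same series on every scrape, the slow path runs once per series rather than once per sample.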
Fabian Reinartz
ad9bc62e4c
storage: extend appender and adapt it
2017-01-13 14:48:01 +01:00
Fabian Reinartz
3302bb1eb1
Merge pull request #2323 from prometheus/beorn7/retrieval
...
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein
ad40d0abbc
Merge pull request #2288 from prometheus/limit-scrape
...
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7
5dc01202d7
Retrieval: Remove some test lines that fail on Travis only
...
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.
Still, it's odd that this fails on Travis:
```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
scrape_test.go:259: Expected count of 1, got 0
scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL github.com/prometheus/prometheus/retrieval 3.603s
```
Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7
3610331eeb
Retrieval: Do not buffer the samples if no sample limit configured
...
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
beorn7
767c0709b1
Retrieval: Avoid copying Target
...
retrieval.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.
It might even have caused the issues reported as #2266 and #2262 .
2017-01-06 18:43:41 +01:00
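The hazard described in the commit above is that copying a struct also copies its mutex, so the copy and the original no longer guard the same state. A sketch of the safer shape, with hypothetical `target` and `pool` types, hands out pointers instead of values:

```go
package main

import "sync"

// target holds mutable state guarded by a mutex, so it must not be copied.
type target struct {
	mtx    sync.Mutex
	health string
}

func (t *target) setHealth(h string) {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	t.health = h
}

func (t *target) getHealth() string {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	return t.health
}

type pool struct {
	targets []*target
}

// all copies the slice of pointers, never the structs, so every caller
// locks the same mutex as the pool does.
func (p *pool) all() []*target {
	out := make([]*target, len(p.targets))
	copy(out, p.targets)
	return out
}

// demo updates a target through the returned list and reads it via the pool.
func demo() string {
	p := &pool{targets: []*target{{health: "unknown"}}}
	p.all()[0].setHealth("up")
	return p.targets[0].getHealth()
}
```

Had `all` returned `[]target`, the update in `demo` would have landed on a copy and `go vet` would flag the copied lock.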
Fabian Reinartz
e631a1260d
retrieval: use separate appender per target
2016-12-30 21:35:35 +01:00
Fabian Reinartz
f8fc1f5bb2
*: migrate ingestion to new batch Appender
2016-12-29 11:03:56 +01:00
Brian Brazil
6c07453ec1
Only clone the metric in the one place relabelling needs it. ( #2292 )
...
This cuts ~17% off memory allocations related to ingesting data
in a basic setup.
2016-12-21 10:00:33 +00:00
Brian Brazil
f421ce0636
Remove label from prometheus_target_skipped_scrapes_total ( #2289 )
...
This avoids it not being initialised, and breaking out by
interval wasn't particularly useful.
Fixes #2269
2016-12-16 18:00:52 +00:00
Brian Brazil
30448286c7
Add sample_limit to scrape config.
...
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problematic metrics.
This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).
Add metric to track how often this happens.
Fixes #2137
2016-12-16 15:10:09 +00:00
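The order of operations in the commit above (relabel first, count second, reject the whole scrape if over the limit) can be sketched as follows. Everything here is a simplified assumption: samples are plain strings, `keep` stands in for metric relabelling, and a limit of 0 is treated as "no limit".

```go
package main

import "errors"

// errSampleLimit is a hypothetical error for a scrape that exceeds the limit.
var errSampleLimit = errors.New("sample limit exceeded")

// ingest buffers the samples that survive relabelling and forwards them to
// store only if their count stays within limit. Nothing is stored otherwise.
func ingest(samples []string, keep func(string) bool, limit int, store func(string)) error {
	var kept []string
	for _, s := range samples {
		if keep(s) {
			kept = append(kept, s)
		}
	}
	if limit > 0 && len(kept) > limit {
		return errSampleLimit
	}
	for _, s := range kept {
		store(s)
	}
	return nil
}

// demo drops one noisy metric via relabelling, then applies the given limit.
func demo(limit int) (stored int, limited bool) {
	samples := []string{"up", "go_goroutines", "http_requests_total", "noisy_debug_metric"}
	keep := func(s string) bool { return s != "noisy_debug_metric" }
	err := ingest(samples, keep, limit, func(string) { stored++ })
	return stored, errors.Is(err, errSampleLimit)
}
```

With a limit of 3 the scrape passes only because relabelling removed the fourth metric first, which is the escape hatch the commit message mentions.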
Brian Brazil
c8de1484d5
Add scrape_samples_post_metric_relabeling
...
This reports the number of samples post any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil
06b9df65ec
Refactor and add unittests to scrape result handling.
2016-12-13 16:49:17 +00:00
Brian Brazil
b5ded43594
Allow buffering of scraped samples before sending them to storage.
2016-12-13 15:01:35 +00:00
Frederic Branczyk
33b583d50e
web/api: add targets endpoint
2016-12-05 13:13:21 +01:00
Frederic Branczyk
8f8cea4fbd
retrieval: refactor TargetManager to return flat list of Targets
2016-12-02 13:28:58 +01:00
Fabian Reinartz
200bbe1bad
config: extract SD and HTTPClient configurations
2016-11-23 18:23:37 +01:00
Fabian Reinartz
47623202c7
retrieval: remove metric namespaces
2016-11-23 09:17:04 +01:00
Fabian Reinartz
d7f4f8b879
discovery: move TargetSet into discovery package
2016-11-23 09:14:44 +01:00
Fabian Reinartz
d19d1bcad3
discovery: move into top-level package
2016-11-22 12:56:33 +01:00
Fabian Reinartz
7bd9508c9b
discovery: move TargetProvider and multi-constructor
2016-11-22 12:56:33 +01:00
Fabian Reinartz
bd0048477c
discovery: move remaining SDs into own package
2016-11-22 12:56:33 +01:00
Fabian Reinartz
5b72eae1b0
Merge pull request #2203 from prometheus/sdfix
...
Service discovery fixes
2016-11-21 16:46:20 +01:00
Fabian Reinartz
ec66082749
Merge branch 'ec2_sd_profile_support' of https://github.com/Ticketmaster/prometheus into Ticketmaster-ec2_sd_profile_support
2016-11-21 11:49:23 +01:00
Fabian Reinartz
06555bde93
Merge branch 'k8s_sd_metrics' of https://github.com/dominikschulz/prometheus into dominikschulz-k8s_sd_metrics
2016-11-21 11:44:48 +01:00
Fabian Reinartz
a1eec447a4
discovery: fix+consolidate Zookeeper discoveries
2016-11-18 13:20:58 +01:00
Fabian Reinartz
b4d7ce1370
discovery: respect context cancellation everywhere
...
This also removes closing of the target group channel everywhere
as the contexts cancels across all stages and we don't care about
draining all events once that happened.
2016-11-18 10:55:29 +01:00
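The pattern behind the commit above is to guard every channel send with the context, so a sender can never block forever once consumers have gone away and the channel itself does not need to be closed. A small sketch, with a hypothetical `send` helper and target groups reduced to string slices:

```go
package main

import "context"

// send delivers a target group unless ctx is cancelled first. It reports
// whether the group was delivered.
func send(ctx context.Context, ch chan<- []string, tg []string) bool {
	select {
	case ch <- tg:
		return true
	case <-ctx.Done():
		return false
	}
}

// demo sends once into a free buffer slot, then again after cancellation
// while the buffer is full and nobody is reading.
func demo() (delivered, afterCancel bool) {
	ch := make(chan []string, 1)
	ctx, cancel := context.WithCancel(context.Background())

	delivered = send(ctx, ch, []string{"10.0.0.1:9100"})

	cancel()
	// The buffer is full, so only the ctx.Done() case can proceed here.
	afterCancel = send(ctx, ch, []string{"10.0.0.2:9100"})
	return delivered, afterCancel
}
```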
Fabian Reinartz
bc7bd7202c
discovery: terminate senders before closing channel
...
Fixes #2200
2016-11-18 10:03:12 +01:00
Frederic Branczyk
0fcea6e9fb
retrieval/discovery/kubernetes: fix cache state unknown behavior ( #2180 )
...
* retrieval/discovery/kubernetes: fix cache state unknown behavior
* retrieval/discovery/kubernetes: extract type casting
* retrieval/discovery/kubernetes: add tests for possible regressions
2016-11-14 16:21:38 +01:00
Fabian Reinartz
fa82c65d15
Merge pull request #2186 from prometheus/fixes
...
Test fixes
2016-11-14 09:52:15 +01:00
Fabian Reinartz
7ecc271411
Move Fatalf call into main test goroutine
2016-11-13 18:21:42 +01:00
Fabian Reinartz
530cdba103
kubernetes: only use one error logging handler
2016-11-12 14:13:38 +01:00
beorn7
92c0ef1a92
Merge branch 'release-1.2' into beorn7/release
2016-11-03 22:48:39 +01:00
Kraig Amador
bec6870ed4
ec2_sd_configs: Support profiles for configuring the ec2 service
2016-11-03 08:38:02 -07:00
beorn7
0fdb74c069
Adjust dns.go to new miekg/dns package and improve error handling.
...
When hitting the 64kiB limit of DNS, the error message so far was
really misleading.
2016-11-03 15:42:11 +01:00
Brian Brazil
64263f280d
Add scrape_samples_scraped to indicate samples scraped. ( #2123 )
2016-10-26 17:43:01 +01:00
Brian Brazil
bbec65d454
Call SD metrics refresh rather than scrape. ( #2120 )
...
This avoids confusion with scrape_duration_seconds, and
is more in line with the API naming.
2016-10-26 10:03:35 +01:00
bekbulatov
2bc12fa2fb
Set timeout for marathon_sd
2016-10-24 11:27:08 +01:00
bekbulatov
c689b35858
Merge branch 'master' into marathon_tls
2016-10-24 10:37:32 +01:00
Dominik Schulz
eb10ff9871
Also handle service update in endpoints.go
2016-10-23 13:33:54 +02:00
Dominik Schulz
f002fe186a
Add Marathon-SD metrics. ( #2106 )
2016-10-21 11:14:53 +01:00
Mitsuhiro Tanda
296644adeb
Expose ec2_instance_type ( #2107 )
2016-10-21 11:13:47 +01:00
Dominik Schulz
36de163900
Add File-SD metrics ( #2103 )
...
* Add File-SD metrics
* Count read errors, not scan errors.
2016-10-21 11:12:19 +01:00
Dominik Schulz
3d0fb0cf17
Avoid too generic label type.
2016-10-21 12:11:15 +02:00
Dominik Schulz
e1e30f12cd
Add Kubernetes-SD metrics.
2016-10-21 10:48:28 +02:00
Dominik Schulz
552ab61fa1
Change SD metric names to make logical grouping more visible. ( #2102 )
2016-10-21 09:18:28 +01:00