Commit graph

63 commits

Author SHA1 Message Date
Frederic Branczyk 5cea27f06a
Merge pull request #3655 from Conorbro/dropped-target-fix
Fix dropped target list growing forever
2018-01-17 11:03:39 +01:00
Adam f64014f70b Fix the incrementing of prometheus_target_scrapes_exceeded_sample_limit_total (#3669)
Once a scrape target has reached the samle_limit, any further samples
from that target will continue to result in errSampleLimit.

This means that after the completion of the append loop `err` must still
be errSampleLimit, and so the metric would not have been incremented.
2018-01-09 15:43:28 +00:00
conorbroderick 658914ba27 Fix dropped target list growing forever 2018-01-05 11:08:37 +00:00
Krasi Georgiev 60ef2016d5 add a cancel func to the scrape pool as it is needed in the scrape loop select block 2017-12-18 17:29:00 +00:00
Krasi Georgiev 9c61f0e8a0 scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage. 2017-12-18 17:22:49 +00:00
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Tobias Schmidt 40c278ee2d Send a HTTP Accept header when scraping 2017-09-25 14:51:29 +02:00
Fabian Reinartz 437f51a85f Fix cache maintenance on changing metric representations
We were not properly maintaining the scrape cache when the same metric
was exposed with a different string representation.
This overall reduces the scraping cache's complexity, which fixes the
issue and saves about 10% of memory in a scraping-only Prometheus
instance.
2017-09-19 15:03:27 +02:00
Goutham Veeramachaneni 3f0267c548 Merge branch 'dev-2.0' into go-kit/log
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-09-15 23:15:27 +05:30
Fabian Reinartz 1121b9f7d4 retrieval: cache dropped series, mutate labels in place 2017-09-14 08:36:19 +02:00
Fabian Reinartz d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Fabian Reinartz 5bed8af4cb retrieval: pool scrape buffers
This adds a bucketed buffer pool to the scrapers so we don't have to
allocate a new buffer on each scrape or hold it fixed to the scrape
loop.

The latter can consume significant amounts of unused memory, e.g. 4GB
when scraping 2MB /metrics from 2000 targets.
2017-09-07 14:43:21 +02:00
Fabian Reinartz 0efecea6d4 Adapt storage APIs to uint64 references 2017-09-07 14:14:41 +02:00
Tom Wilkie db8128ceeb Add label set as first parameter to AddFast, ingored by TSDB adapter. 2017-07-12 15:20:12 +01:00
Goutham Veeramachaneni 643c5837a0 Stop metrics that are 10mins ahead from now
Fixes #2893

Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-07-04 15:34:08 +02:00
Brian Brazil a6ca391e6e Reject scrapes with invalid utf-8 label values. 2017-06-20 10:54:39 +01:00
Fabian Reinartz 98c2d8477a Merge pull request #2844 from Gouthamve/cobra
Move CLI commander to cobra
2017-06-19 11:59:52 +02:00
Goutham Veeramachaneni 507790a357
Rework logging to use explicitly passed logger
Mostly cleaned up the global logger use. Still some uses in discovery
package.

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-06-16 15:52:44 +05:30
Julius Volz 6f66125809 retrieval: Fix "up" reporting for failed scrapes 2017-06-14 22:22:12 -04:00
Fabian Reinartz eb651233ac Merge pull request #2787 from prometheus/limit2
Rework sample limit to work for 2.0
2017-06-06 08:21:12 +02:00
Brian Brazil 37bc607e96 Rework sample limit to work for 2.0
Correctly update reported series.
Increment prometheus_target_scrapes_exceeded_sample_limit_total.
Add back unittests.
Ignore stale markers when calculating sample limit.

Fixes #2770
2017-05-31 15:41:51 +01:00
Fabian Reinartz bc7aff8cef retrieval: extract scrape cache 2017-05-30 09:37:23 -07:00
Fabian Reinartz a83014f53c retrieval: fix memory leak and consumption for caches 2017-05-26 08:44:24 +02:00
Fabian Reinartz 43ca652217 retrieval: Don't allocate map on every scrape 2017-05-24 16:23:48 +02:00
Fabian Reinartz d289dc55c3 storage: update TSDB 2017-05-22 11:53:08 +02:00
Brian Brazil bf38963118 Plumb through logger with target field to scrape loop. 2017-05-16 18:33:51 +01:00
Brian Brazil d532272520 Add stalemarkers to synthetic series too when target stops. 2017-05-16 18:33:51 +01:00
Brian Brazil b87d3ca9ea Create stale markers when a target is stopped.
When a target is no longer returned from SD stop()
is called. However it may be recreated before the
next scrape interval happens. So we wait to set stalemarkers
until the scrape of the new target would have happened
and been ingested, which is 2 scrape intervals.

If we're shutting down the context will be cancelled,
so return immediately rather than holding things up for potentially
minutes waiting to safely set stalemarkers no newer than now.
If the server starts immediately back up again all is well.
If not, we're missing some stale markers.
2017-05-16 18:33:51 +01:00
Brian Brazil 3c45400130 Don't fail scrape if one sample violates ordering.
In Prometheus 1.x one sample that is out of order
or that has a duplicate timestamp is discarded, and
the rest of the scrape ingestion continues on.
This will now also be true for 2.0.
2017-05-16 18:33:51 +01:00
Brian Brazil fd5c5a50a3 Add stale markers on parse error.
If we fail to parse the result of a scrape,
we should treat that as a failed scrape and
add stale markers.
2017-05-16 18:33:51 +01:00
Brian Brazil c0c7e32e61 Treat a failed scrape as an empty scrape for staleness.
If a target has died but is still in SD, we want the previously
scraped values to go stale. This would also apply to brief blips.
2017-05-16 18:33:51 +01:00
Brian Brazil 850ea412ad If an explicit timestamp is provided, bypass staleness. 2017-05-16 18:33:51 +01:00
Brian Brazil 5060a0fc51 Add unittests for ingestion stale NaNs 2017-05-16 18:33:51 +01:00
Brian Brazil 4f35952cf3 Inject a stale NaN when sample disappears between scrapes. 2017-05-16 18:33:51 +01:00
Brian Brazil beaa7d5a43 Move consistent NaN logic into the parser. 2017-05-16 18:33:51 +01:00
Brian Brazil 76acf7b9b1 Ensure all the NaNs we ingest have the same bit pattern. 2017-05-16 18:33:51 +01:00
Fabian Reinartz 73b8ff0ddc Merge branch 'master' into dev-2.0 2017-04-27 10:19:55 +02:00
Matt Layher 5e4f5fb5ad retrieval: make scrape timeout header consistent with others 2017-04-05 14:56:22 -04:00
Matt Layher fe4b6693f7 retrieval: add Scrape-Timeout-Seconds header to each scrape request (#2565)
Fixes #2508.
2017-04-04 18:26:28 +01:00
Fabian Reinartz 1d3cdd0d67 Merge branch 'master' into dev-2.0-rebase 2017-01-30 17:43:01 +01:00
Fabian Reinartz c691895a0f retrieval: cache series references, use pkg/textparse
With this change the scraping caches series references and only
allocates label sets if it has to retrieve a new reference.
pkg/textparse is used to do the conditional parsing and reduce
allocations from 900B/sample to 0 in the general case.
2017-01-16 12:03:57 +01:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still it's weird why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
Fabian Reinartz e631a1260d retrieval: use separate appender per target 2016-12-30 21:35:35 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problemtic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
Brian Brazil c8de1484d5 Add scrape_samples_post_metric_relabeling
This reports the number of samples post any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil 06b9df65ec Refactor and add unittests to scrape result handling. 2016-12-13 16:49:17 +00:00
Brian Brazil b5ded43594 Allow buffering of scraped samples before sending them to storage. 2016-12-13 15:01:35 +00:00