Commit graph

528 commits

Author SHA1 Message Date
Fabian Reinartz 5772f1a7ba retrieval/storage: adapt to new interface
This simplifies the interface to two add methods for
appends with labels or faster reference numbers.
2017-02-02 13:05:46 +01:00
Brian Brazil 34767c2221 Clone lset before relabelling. (#2386)
We need to not change the lset passed into populateLabels, as that
is kept around by the SDs.

Fixes 2377
2017-02-01 19:49:50 +00:00
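
The clone-before-relabel idea in the commit above can be illustrated with a minimal sketch. It assumes a label set is a plain `map[string]string`; `cloneLabels`, `relabel` and `populate` are hypothetical names for illustration, not the functions in the Prometheus source.

```go
package main

import "fmt"

// cloneLabels returns an independent copy of a label set.
func cloneLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// relabel is a hypothetical relabelling step that mutates the set it is given.
func relabel(lset map[string]string) {
	lset["job"] = "rewritten"
	delete(lset, "__meta_dc")
}

// populate clones first, so the set owned by the caller (for example a
// service discovery) is left untouched.
func populate(discovered map[string]string) map[string]string {
	lset := cloneLabels(discovered)
	relabel(lset)
	return lset
}

func main() {
	discovered := map[string]string{"job": "node", "__meta_dc": "eu1"}
	result := populate(discovered)
	fmt.Println(discovered["job"], result["job"]) // node rewritten
}
```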
Fabian Reinartz 1d3cdd0d67 Merge branch 'master' into dev-2.0-rebase 2017-01-30 17:43:01 +01:00
Fabian Reinartz 035976b275 retrieval: handle not found error correctly 2017-01-20 11:27:01 +01:00
Fabian Reinartz 598e2f01c0 retrieval: don't erroneously break appending 2017-01-17 08:39:18 +01:00
Fabian Reinartz c691895a0f retrieval: cache series references, use pkg/textparse
With this change the scraping caches series references and only
allocates label sets if it has to retrieve a new reference.
pkg/textparse is used to do the conditional parsing and reduce
allocations from 900B/sample to 0 in the general case.
2017-01-16 12:03:57 +01:00
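
A rough sketch of the caching idea described in the commit above: the raw series text is used as a cache key, and the expensive path (building a label set and asking storage for a reference) only runs on a miss. The `refCache` type and its fields are assumptions made for illustration and do not mirror the actual scrape loop or the pkg/textparse API.

```go
package main

import "fmt"

// refCache maps the raw text of a series to a storage reference.
type refCache struct {
	refs   map[string]uint64
	next   uint64
	parsed int // number of slow-path resolutions, kept for demonstration
}

func newRefCache() *refCache {
	return &refCache{refs: map[string]uint64{}}
}

// lookup returns the cached reference and only resolves a new one on a miss.
func (c *refCache) lookup(raw string) uint64 {
	if ref, ok := c.refs[raw]; ok {
		return ref // fast path: no label set is allocated
	}
	c.parsed++ // stand-in for parsing labels and requesting a reference
	c.next++
	c.refs[raw] = c.next
	return c.next
}

func main() {
	c := newRefCache()
	a := c.lookup(`http_requests_total{code="200"}`)
	b := c.lookup(`http_requests_total{code="200"}`)
	fmt.Println(a == b, c.parsed) // true 1
}
```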
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
Fabian Reinartz 3302bb1eb1 Merge pull request #2323 from prometheus/beorn7/retrieval
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still, it is unclear why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
beorn7 767c0709b1 Retrieval: Avoid copying Target
retrieval.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.

It might even have caused the issues reported as #2266 and #2262 .
2017-01-06 18:43:41 +01:00
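
One way to avoid copying a struct that embeds a mutex is to hand out pointers instead of values. The sketch below assumes a simplified, hypothetical `Target` and `pool`; it is not the real retrieval.Target, only an illustration of why sharing by pointer keeps the lock and the state it guards together.

```go
package main

import (
	"fmt"
	"sync"
)

// Target is a hypothetical stand-in for a type that contains a mutex.
type Target struct {
	mu     sync.Mutex
	health string
}

func (t *Target) SetHealth(h string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.health = h
}

func (t *Target) Health() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.health
}

type pool struct {
	mu      sync.RWMutex
	targets []*Target
}

// Targets copies the slice of pointers, never the Target values, so every
// caller locks the same mutex as the pool does.
func (p *pool) Targets() []*Target {
	p.mu.RLock()
	defer p.mu.RUnlock()
	out := make([]*Target, len(p.targets))
	copy(out, p.targets)
	return out
}

func main() {
	p := &pool{targets: []*Target{{health: "unknown"}}}
	p.Targets()[0].SetHealth("up")
	fmt.Println(p.Targets()[0].Health()) // up
}
```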
Fabian Reinartz e631a1260d retrieval: use separate appender per target 2016-12-30 21:35:35 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Brian Brazil 6c07453ec1 Only clone the metric in the one place relabelling needs it. (#2292)
This cuts ~17% off memory allocations related to ingesting data
in a basic setup.
2016-12-21 10:00:33 +00:00
Brian Brazil f421ce0636 Remove label from prometheus_target_skipped_scrapes_total (#2289)
This avoids it not being initialised, and breaking out by
interval wasn't particularly useful.

Fixes #2269
2016-12-16 18:00:52 +00:00
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problematic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
Brian Brazil c8de1484d5 Add scrape_samples_post_metric_relabeling
This reports the number of samples post any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil 06b9df65ec Refactor and add unittests to scrape result handling. 2016-12-13 16:49:17 +00:00
Brian Brazil b5ded43594 Allow buffering of scraped samples before sending them to storage. 2016-12-13 15:01:35 +00:00
Frederic Branczyk 33b583d50e web/api: add targets endpoint 2016-12-05 13:13:21 +01:00
Frederic Branczyk 8f8cea4fbd retrieval: refactor TargetManager to return flat list of Targets 2016-12-02 13:28:58 +01:00
Fabian Reinartz 200bbe1bad config: extract SD and HTTPClient configurations 2016-11-23 18:23:37 +01:00
Fabian Reinartz 47623202c7 retrieval: remove metric namespaces 2016-11-23 09:17:04 +01:00
Fabian Reinartz d7f4f8b879 discovery: move TargetSet into discovery package 2016-11-23 09:14:44 +01:00
Fabian Reinartz d19d1bcad3 discovery: move into top-level package 2016-11-22 12:56:33 +01:00
Fabian Reinartz 7bd9508c9b discovery: move TargetProvider and multi-constructor 2016-11-22 12:56:33 +01:00
Fabian Reinartz bd0048477c discovery: move remaining SDs into own package 2016-11-22 12:56:33 +01:00
Fabian Reinartz 5b72eae1b0 Merge pull request #2203 from prometheus/sdfix
Service discovery fixes
2016-11-21 16:46:20 +01:00
Fabian Reinartz ec66082749 Merge branch 'ec2_sd_profile_support' of https://github.com/Ticketmaster/prometheus into Ticketmaster-ec2_sd_profile_support 2016-11-21 11:49:23 +01:00
Fabian Reinartz 06555bde93 Merge branch 'k8s_sd_metrics' of https://github.com/dominikschulz/prometheus into dominikschulz-k8s_sd_metrics 2016-11-21 11:44:48 +01:00
Fabian Reinartz a1eec447a4 discovery: fix+consolidate Zookeeper discoveries 2016-11-18 13:20:58 +01:00
Fabian Reinartz b4d7ce1370 discovery: respect context cancellation everywhere
This also removes closing of the target group channel everywhere
as the context cancels across all stages and we don't care about
draining all events once that happened.
2016-11-18 10:55:29 +01:00
Fabian Reinartz bc7bd7202c discovery: terminate senders before closing channel
Fixes #2200
2016-11-18 10:03:12 +01:00
Frederic Branczyk 0fcea6e9fb retrieval/discovery/kubernetes: fix cache state unknown behavior (#2180)
* retrieval/discovery/kubernetes: fix cache state unknown behavior

* retrieval/discovery/kubernetes: extract type casting

* retrieval/discovery/kubernetes: add tests for possible regressions
2016-11-14 16:21:38 +01:00
Fabian Reinartz fa82c65d15 Merge pull request #2186 from prometheus/fixes
Test fixes
2016-11-14 09:52:15 +01:00
Fabian Reinartz 7ecc271411 Move Fatalf call into main test goroutine 2016-11-13 18:21:42 +01:00
Fabian Reinartz 530cdba103 kubernetes: only use one error logging handler 2016-11-12 14:13:38 +01:00
beorn7 92c0ef1a92 Merge branch 'release-1.2' into beorn7/release 2016-11-03 22:48:39 +01:00
Kraig Amador bec6870ed4 ec2_sd_configs: Support profiles for configuring the ec2 service 2016-11-03 08:38:02 -07:00
beorn7 0fdb74c069 Adjust dns.go to new miekg/dns package and improve error handling.
When hitting the 64kiB limit of DNS, the error message so far was
really misleading.
2016-11-03 15:42:11 +01:00
Brian Brazil 64263f280d Add scrape_samples_scraped to indicate samples scraped. (#2123) 2016-10-26 17:43:01 +01:00
Brian Brazil bbec65d454 Call SD metrics refresh rather than scrape. (#2120)
This avoids confusion with scrape_duration_seconds, and
is more in line with the API naming.
2016-10-26 10:03:35 +01:00
bekbulatov 2bc12fa2fb Set timeout for marathon_sd 2016-10-24 11:27:08 +01:00
bekbulatov c689b35858 Merge branch 'master' into marathon_tls 2016-10-24 10:37:32 +01:00
Dominik Schulz eb10ff9871 Also handle service update in endpoints.go 2016-10-23 13:33:54 +02:00
Dominik Schulz f002fe186a Add Marathon-SD metrics. (#2106) 2016-10-21 11:14:53 +01:00
Mitsuhiro Tanda 296644adeb Expose ec2_instance_type (#2107) 2016-10-21 11:13:47 +01:00
Dominik Schulz 36de163900 Add File-SD metrics (#2103)
* Add File-SD metrics

* Count read errors, not scan errors.
2016-10-21 11:12:19 +01:00
Dominik Schulz 3d0fb0cf17 Avoid too generic label type. 2016-10-21 12:11:15 +02:00
Dominik Schulz e1e30f12cd Add Kubernetes-SD metrics. 2016-10-21 10:48:28 +02:00
Dominik Schulz 552ab61fa1 Change SD metric names to make logical grouping more visible. (#2102) 2016-10-21 09:18:28 +01:00
Dominik Schulz 0c69227616 Add Consul-SD metrics (#2097)
* Add Consul-SD metrics

* Remove unnecessary metric and add labels to summary.

* Do not stutter
2016-10-21 08:59:43 +01:00
Dominik Schulz 255a8c8b4c Fix small typo in EC2 SD metric name (#2100) 2016-10-20 09:01:00 +01:00
Dominik Schulz 00e486a05b Add Azure-SD metrics (#2099) 2016-10-20 08:23:50 +01:00
Dominik Schulz 163d5a8977 Add EC2 SD metrics (#2095)
* Add EC2 SD metrics

* Address review comments
2016-10-19 10:20:00 +01:00
Fabian Reinartz 3c8140f2e6 kubernetes: fix typo in endpoint switch case 2016-10-18 16:20:26 +02:00
bekbulatov ac702f66eb Resolve merge conflicts 2016-10-18 14:14:24 +01:00
Fabian Reinartz 228bfc1bb5 Merge pull request #2040 from prometheus/kubernetes
Add K8S v2 pod discovery
2016-10-17 20:09:22 +02:00
Fabian Reinartz ce45040e47 kubernetes: fix missing port labels
This commit fixes endpoint port labeling, adjusts tests accordingly
and enhances test delta printing
2016-10-17 11:05:13 +02:00
Frederic Branczyk 8f576a8510 retrieval: add kubernetes endpoint discovery tests 2016-10-17 10:32:10 +02:00
Frederic Branczyk 08fa4eaa92 retrieval: add kubernetes pod discovery tests 2016-10-17 10:32:10 +02:00
Frederic Branczyk 3762e39ce5 retrieval: add kubernetes service discovery tests 2016-10-17 10:32:10 +02:00
Frederic Branczyk 397072a482 retrieval: add kubernetes node discovery tests 2016-10-17 10:32:10 +02:00
Frederic Branczyk cc46058802 retrieval: kubernetes nodes are not namespaced 2016-10-17 10:32:10 +02:00
Frederic Branczyk a318d9ad27 retrieval: fix pod label and annotation prefixes 2016-10-17 10:32:10 +02:00
Fabian Reinartz b24602f713 kubernetes: merge back into single configuration 2016-10-17 10:32:10 +02:00
Fabian Reinartz a9cfb66b28 kubernetes: add node discovery 2016-10-17 10:32:10 +02:00
Fabian Reinartz d896a654f9 kubernetes: Add discovery of services 2016-10-17 10:32:10 +02:00
Fabian Reinartz 6d269ed870 kubernetes: infer pod information in endpoints discovery 2016-10-17 10:32:10 +02:00
Fabian Reinartz 7c439a9060 kubernetes: use and vendor 1.5 client 2016-10-17 10:32:10 +02:00
Fabian Reinartz de22524e57 kubernetes: add KubernetesV2 endpoints 2016-10-17 10:32:10 +02:00
Fabian Reinartz 2331701b50 kubernetes: Add K8S v2 pod discovery
This adds plumbing for a parallel version of the new K8S SD
and adds pod discovery as the first role.
2016-10-17 10:32:10 +02:00
Dominik Schulz bfa7099616 Report GCE instance metadata (#2084)
* Report GCE instance metadata

* Fix spelling according to code review guidelines

* Address review comments
2016-10-17 09:45:43 +02:00
Dominik Schulz c73aa82589 Add GCE Instance Status 2016-10-08 08:40:12 +02:00
bekbulatov 01b53c1180 Add tls support 2016-10-07 13:40:22 +01:00
Roman Vynar db63a4bd2a Do not fail Consul discovery on Prometheus startup when Consul is down. 2016-09-26 22:20:56 +03:00
Dominik Schulz f6fbcf9aa2 Expose ec2_instance_state 2016-09-22 15:01:23 +02:00
Tom Wilkie 4520e12440 Add HTTP Basic Auth & TLS support to the generic write path. (#1957)
* Add config, HTTP Basic Auth and TLS support to the generic write path.

- Move generic write path configuration to the config file
- Factor out config.TLSConfig -> tls.Config translation
- Support TLSConfig for generic remote storage
- Rename Run to Start, and make it non-blocking.
- Dedupe code in httputil for TLS config.
- Make remote queue metrics global.
2016-09-19 22:47:51 +02:00
Matt Bostock 4fc619b605 Scrape: Remove JSON from Accept request header
JSON is no longer supported as an exposition format [1] [2] [3]. Remove
it from the `Accept` header added to requests when scraping targets.

[1]: https://github.com/prometheus/prometheus/blob/master/CHANGELOG.md#100--2016-07-18
[2]: https://prometheus.io/docs/instrumenting/exposition_formats/#historical-versions
[3]: https://docs.google.com/document/d/1ZjyKiKxZV83VI9ZKAXRGKaUKK2BIWCT7oiGBKDBpjEY/edit?usp=sharing
2016-09-17 10:28:03 +01:00
Ingo Gottwald 3b546d061f Add support for GCE discovery 2016-09-16 08:55:33 +02:00
Tobias Schmidt 29ced0090f Fix common English misspellings 2016-09-14 23:23:28 -04:00
Tobias Schmidt 27074863b4 Print url.URLs correctly in tests 2016-09-14 23:15:18 -04:00
Tobias Schmidt 8f3b62bfe4 Simplify struct initialization 2016-09-14 23:13:27 -04:00
Dan Milstein 0cb6b9962e Fix broken test which relied on DNS resolution #1962
Switched to testing by way of the static_configs rather than
dns_sd_config parameter.  Verified that the revised test both passes
without network access, and also still catches the bug it's supposed to
cover.
2016-09-08 16:59:46 -04:00
Fabian Reinartz fec3b54cfc Merge pull request #1946 from prometheus/ipv6
Fix IPv6 scraping
2016-09-06 17:18:28 +02:00
Fabian Reinartz a15237a0b8 retrieval: correctly handle IPv6 addresses
This updates all service discoveries to correctly
build the __address__ label for IPv6 addresses.
2016-09-06 15:06:49 +02:00
Fabian Reinartz 17cdd4f966 retrieval: fix IPv6 port default, add tests
This fixes port defaulting for IPv6 addresses and restructures
and tests the construction of target label sets.
2016-09-06 15:06:48 +02:00
Fabian Reinartz 0322c59dc3 retrieval: export NewHTTPClient 2016-09-05 16:44:40 +02:00
Dan Milstein b9fb9742ed Move test helper function into scope of test func 2016-08-29 16:08:40 -04:00
Dan Milstein 79216011cb Add basic test for TargetManager.targetSet
Verify that if the configs change, target groups are cleaned on
TargetManager.reload (rather than having old ones linger around, even if
they are no longer present in the configs).

This covers the bug fixed in #1907 -- I verified that by checking out
source from before that commit.

This is a start on #1906
2016-08-26 14:30:26 -04:00
Björn Rabenstein 4b8f963847 Merge pull request #1915 from prometheus/release-1.0
Forward-merge the bug fix from release-1.0
2016-08-24 13:04:45 +02:00
beorn7 e2b3626e0c retrieval: Clean up target group map on config reload
Also, remove unused `providers` field in targetSet.

If the config file changes, we recreate all providers (by calling
`providersFromConfig`) and retrieve all targets anew from the newly
created providers. From that perspective, it cannot harm to clean up
the target group map in the targetSet. Not doing so (as it was the
case so far) keeps stale targets around. This mattered if an existing
key in the target group map was not overwritten in the initial fetch
of all targets from the providers. Examples where that mattered:

```
scrape_configs:
- job_name: "foo"
  static_configs:
  - targets: ["foo:9090"]
  - targets: ["bar:9090"]
```
updated to:
```
scrape_configs:
- job_name: "foo"
  static_configs:
  - targets: ["foo:9090"]
```

`bar:9090` would still be monitored. (The static provider just
enumerates the target groups. If the number of target groups
decreases, the old ones stay around.)

```
scrape_configs:
- job_name: "foo"
  dns_sd_configs:
  - names:
    - "srv.name.one.example.org"
```
updated to:
```
scrape_configs:
- job_name: "foo"
  dns_sd_configs:
  - names:
    - "srv.name.two.example.org"
```

Now both SRV records are still monitored. The SRV name is part of the
key in the target group map, thus the new one is just added and the
old one stays around.

Obviously, this should have tests, and should have had tests before, not
only for this case. This is the quick fix. I have created
https://github.com/prometheus/prometheus/issues/1906 to track test
creation.

Fixes https://github.com/prometheus/prometheus/issues/1610 .
2016-08-22 19:25:33 +02:00
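
The stale-target problem described in the commit above comes down to merging into an existing map versus replacing it. The sketch below uses a hypothetical `targetSet` keyed by strings such as `static/0`; the real key format and types are not shown in the commit message and are assumed here.

```go
package main

import "fmt"

// targetSet keeps target groups keyed by their source.
type targetSet struct {
	groups map[string][]string
}

// mergeReload only overwrites keys present in the new config, so keys that
// disappeared linger (the behaviour the commit describes as the bug).
func (ts *targetSet) mergeReload(fresh map[string][]string) {
	for k, v := range fresh {
		ts.groups[k] = v
	}
}

// cleanReload starts from an empty map, so stale groups are dropped.
func (ts *targetSet) cleanReload(fresh map[string][]string) {
	ts.groups = make(map[string][]string, len(fresh))
	for k, v := range fresh {
		ts.groups[k] = v
	}
}

func main() {
	old := map[string][]string{"static/0": {"foo:9090"}, "static/1": {"bar:9090"}}
	updated := map[string][]string{"static/0": {"foo:9090"}}

	a := &targetSet{groups: map[string][]string{}}
	a.mergeReload(old)
	a.mergeReload(updated)
	fmt.Println(len(a.groups)) // 2: bar:9090 is still monitored

	b := &targetSet{groups: map[string][]string{}}
	b.cleanReload(old)
	b.cleanReload(updated)
	fmt.Println(len(b.groups)) // 1
}
```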
Anders Daljord Morken 95cadd0702 Run scrape loop with interval 1 instead of 0
0 is considered an invalid interval by time.NewTicker() and will cause a
panic if control reaches that point. Given the vagaries of timekeeping,
this may occasionally happen and make this test unstable.
2016-08-18 09:39:11 +02:00
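
The panic mentioned in the commit above is documented behaviour of `time.NewTicker` for a non-positive duration. The sketch below wraps the call in a hypothetical `newTickerSafe` helper that converts the panic into an error, purely to make the behaviour observable.

```go
package main

import (
	"fmt"
	"time"
)

// newTickerSafe turns the panic raised for a non-positive interval into an
// ordinary error value.
func newTickerSafe(d time.Duration) (t *time.Ticker, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("invalid interval %v: %v", d, r)
		}
	}()
	return time.NewTicker(d), nil
}

func main() {
	if _, err := newTickerSafe(0); err != nil {
		fmt.Println("interval 0 rejected")
	}
	t, err := newTickerSafe(1) // 1ns, the smallest valid interval
	if err == nil {
		t.Stop()
		fmt.Println("interval 1 accepted")
	}
}
```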
Anders Daljord Morken 8633ac180e Strip stray whitespace from bearer token file
Apart from not trying to send a newline in a HTTP header,
this also allows Prometheus to build and pass tests with Go 1.7,
which features stricter checking of HTTP headers.
2016-08-17 15:36:18 +02:00
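
A short sketch of the whitespace fix described in the commit above: file contents usually end in a newline, which must not end up in an HTTP header value. `bearerHeader` is a hypothetical helper; the real code reads the token from the configured bearer token file.

```go
package main

import (
	"fmt"
	"strings"
)

// bearerHeader builds an Authorization header value from raw file contents,
// removing leading and trailing whitespace such as a final newline.
func bearerHeader(raw []byte) string {
	return "Bearer " + strings.TrimSpace(string(raw))
}

func main() {
	fmt.Printf("%q\n", bearerHeader([]byte("s3cr3t\n"))) // "Bearer s3cr3t"
}
```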
Frederic Branczyk 7714b9c781 move relabeling functionality to its own package
also remove the returned error as it was always nil
2016-08-09 14:19:20 +02:00
Jimmi Dyson 6c8080607f Kubernetes SD: Add node name and host IP to pod discovery 2016-07-20 12:00:54 +01:00
Dmitry Vorobev 273e457da4 web: return status code and error message for config resource 2016-07-15 10:15:24 +02:00
beorn7 064b57858e Consistently use the Seconds() method for conversion of durations
This also fixes one remaining case of recording integral numbers
of seconds only for a metric, i.e. this will probably fix #1796.
2016-07-07 15:24:35 +02:00
Fabian Reinartz 4591a2623b discovery/kubernetes: filter pod/container, service/endpoint
This change distinguishes and filters by pod/container and
service/endpoint in the respective sub-SDs.
2016-07-05 14:24:17 +02:00