235 commits

Author SHA1 Message Date
Julien Pivotto 9d65017798 config: fix puppetdb tests
This PR fixes the tests in main. The last merge introduced a failing
test in the config package.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-16 18:31:28 +02:00
Julien Pivotto 8920024323 Add PuppetDB service discovery
We have been Puppet users for 10 years and we are users of
https://github.com/camptocamp/prometheus-puppetdb-sd

However, that file_sd implementation contains business logic and
assumptions around e.g. the modules which you are using.

This pull request adds a simple PuppetDB service discovery, which will
enable more use cases than the upstream sd.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-16 16:54:26 +02:00
DrAuYueng e8be1d0a5c
Check relabel action at yaml unmarshal stage (#9224)
Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2021-08-31 17:52:57 +02:00
Levi Harrison c1b1b826ce HostNetworkHost -> HostNetworkingHost
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-03 05:58:49 -06:00
Levi Harrison 89f154d643 Added tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-07-27 23:33:28 -04:00
austin ce bbc951f50b
Add config tests for kuma SD
Signed-off-by: austin ce <austin.cawley@gmail.com>
2021-07-21 12:55:02 -04:00
3Xpl0it3r a0bac4b488
add kubeconfig support in discovery module (#8811)
Signed-off-by: 3Xpl0it3r <shouc.wang@hotmail.com>
2021-06-17 12:41:50 +02:00
Levi Harrison faed8df31d
Enable reading consul token from file (#8926)
* Adopted common http client

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-12 00:06:59 +02:00
Julien Pivotto c0c22ed042
Merge pull request #8927 from LeviHarrison/move-to-go-kit/log
Migrate From `go-kit/kit/log` to `go-kit/log`
2021-06-11 21:15:56 +02:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Julien Pivotto 9444698ae2
http_sd (#8839)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-06-11 18:04:45 +02:00
Julien Pivotto 20c6739adc
Merge pull request #8833 from hanjm/feature/add-scape-read-body-limit
Add body_size_limit to prevent large response bodies from bad targets causing Prometheus server OOM (#8827)
2021-06-02 09:24:59 +02:00
TJ Hoplock dc22c65349
Add Linode Service Discovery (#8846)
* Add Linode Service Discovery

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2021-06-01 20:32:36 +02:00
hanjm 1df05bfd49 Add body_size_limit to prevent large response bodies from bad targets causing Prometheus server OOM (#8827)
Signed-off-by: hanjm <hanjinming@outlook.com>
2021-05-29 07:05:42 +08:00
Julien Pivotto f3b2d2a998
Fix config tests in main branch (#8767)
The merge of 8761 did not catch that the secrets were off by one
because it was not rebased on top of 8693.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-29 00:00:30 +02:00
Levi Harrison fa184a5fc3
Add OAuth 2.0 Config (#8761)
* Introduced oauth2 config into the codebase

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-04-28 14:47:52 +02:00
n888 7c028d59c2
Add lightsail service discovery (#8693)
Signed-off-by: N888 <drifto@gmail.com>
2021-04-28 11:29:12 +02:00
Julien Pivotto 5bce801a09
Rename discovery/dockerswarm to discovery/moby (#8691)
This makes it clear that the dockerswarm package does more than docker
swarm: it also covers docker.

I have picked moby as it is the upstream name: https://mobyproject.org/

There is no user-facing change, except in the case of a bad
configuration. Previously, a user who would have a bad docker sd config
would see an error like:

> field xx not found in type dockerswarm.plain

Now that error would be turned into:

> field xx not found in type moby.plain

While not perfect, it should at least not cause confusion between docker and
dockerswarm.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-13 09:33:54 +02:00
Julien Pivotto e635ca834b Add environment variable expansion in external label values
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-30 01:36:28 +02:00
Robert Jacob b253056163
Implement Docker discovery (#8629)
* Implement Docker discovery

Signed-off-by: Robert Jacob <xperimental@solidproject.de>
2021-03-29 22:30:23 +02:00
Julien Pivotto 5a6d244b00 Scaleway SD: Add the ability to read token from file
Prometheus adds the ability to read secrets from files. This adds
this feature for the scaleway service discovery.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-25 00:52:33 +01:00
Julien Pivotto 49016994ac Switch to alertmanager api v2
According to the 2.25 release notes, 2.26 should switch to alertmanager
api v2 by default.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-20 01:01:10 +01:00
Rémy Léone f690b811c5
add support for scaleway service discovery (#8555)
Co-authored-by: Patrik <patrik@ptrk.io>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>

Signed-off-by: Rémy Léone <rleone@scaleway.com>
2021-03-10 15:10:17 +01:00
Julien Pivotto 93c6139bc1 Support follow_redirect
This PR introduces support for follow_redirect, to enable users to
disable following HTTP redirects.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-26 22:50:56 +01:00
Harkishen-Singh 79ba53a6c4 Custom headers on remote-read and refactor implementation to roundtripper.
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2021-02-26 17:20:29 +05:30
Julien Pivotto 8787f0aed7 Update common to support credentials type
Most of the backwards compat tests are done in common.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-18 23:28:22 +01:00
Harkishen-Singh 77c20fd2f8 Adds support to configure retry on Rate-Limiting from remote-write config.
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2021-02-16 14:52:49 +05:30
Nándor István Krácser 509000269a
remote_write: allow passing along custom HTTP headers (#8416)
* remote_write: allow passing along custom HTTP headers

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* add warning

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* remote_write: add header validation

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* extend tests for bad remote write headers

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* remote_write: add note about the authorization header

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>
2021-02-04 14:18:13 -07:00
gotjosh 4eca4dffb8
Allow metric metadata to be propagated via Remote Write. (#6815)
* Introduce a metadata watcher

Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage.

Signed-off-by: gotjosh <josue@grafana.com>

* Additional fixes after rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Rework samples/metadata metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Use more descriptive variable names in MetadataWatcher collect.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues caused during rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing metric add and unneeded config code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix metrics and docs

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Replace assert with require

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Bring back max_samples_per_send metric

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-11-19 20:53:03 +05:30
Julien Pivotto 3509647462
Docker swarm: add filtering of services (#8074)
* Docker swarm: add filtering of services

Add filters on all docker swarm roles (nodes, tasks and services).

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-11-09 12:41:02 +01:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
kangwoo 7c0d5ae4e7
Add Eureka Service Discovery (#3369)
Signed-off-by: kangwoo <kangwoo@gmail.com>
2020-08-26 17:36:59 +02:00
Lukas Kämmerling b6955bf1ca
Add hetzner service discovery (#7822)
Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
2020-08-21 15:49:19 +02:00
Andy Bursavich 4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto f8ec72d730 Add digitalocean test
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:04:36 +02:00
Julien Pivotto a197508d09 Add docker swarm test
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:04:36 +02:00
Steffen Neubauer 9c9b872087
OpenStack SD: Add availability config option, to choose endpoint type (#7494)
* OpenStack SD: Add availability config option, to choose endpoint type

In some environments Prometheus must query OpenStack via an alternative
endpoint type (gophercloud calls this `availability`).

This commit implements this option.

Co-Authored-By: Dennis Kuhn <d.kuhn@syseleven.de>
Signed-off-by: Steffen Neubauer <s.neubauer@syseleven.de>
2020-07-02 15:17:56 +01:00
Jop Zinkweg 1f69c38ba4
Add discovery support for triton compute nodes (#7250)
Added optional configuration item role, defaults to 'container' (backwards-compatible).
Setting role to 'cn' will discover compute nodes instead.

Human-friendly compute node hostname discovery depends on cmon 1.7.0:
c1a2aeca36

Adjust testcases to use discovery config per case as two different types are now supported.

Updated documentation:
* new role setting
* clarify what the name 'container' covers as triton uses different names in different locations

Signed-off-by: jzinkweg <jzinkweg@gmail.com>
2020-05-22 16:19:21 +01:00
Aleksandra Gacek 8e53c19f9c discovery/kubernetes: expose label_selector and field_selector
Close #6807

Co-authored-by @shuttie
Signed-off-by: Aleksandra Gacek <algacek@google.com>
2020-02-15 14:57:56 +01:00
Grebennikov Roman b4445ff03f discovery/kubernetes: expose label_selector and field_selector
Closes #6096

Signed-off-by: Grebennikov Roman <grv@dfdx.me>
2020-02-15 14:57:38 +01:00
Julien Pivotto 9d9bc524e5 Add query log (#6520)
* Add query log, make stats logged in JSON like in the API

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-08 13:28:43 +00:00
Callum Styan 67838643ee
Add config option for remote job name (#6043)
* Track remote write queues via a map so we don't care about index.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Support a job name for remote write/read so we can differentiate between
them using the name.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Remote write/read has Name to not confuse the meaning of the field with
scrape job names.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Split queue/client label into remote_name and url labels.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allow for duplicate remote write/read configs.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Ensure we restart remote write queues if the hash of their config has
not changed, but the remote name has changed.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Include name in remote read/write config hashes, simplify duplicates
check, update test accordingly.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-12-12 12:47:23 -08:00
johncming 8d3083e256 config: add test case for scrape interval larger than timeout. (#6037)
Signed-off-by: johncming <johncming@yahoo.com>
2019-09-23 13:26:56 +02:00
Bartek Plotka f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Chris Marchbanks 529ccff07b
Remove all usages of stretchr/testify
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-08-08 19:49:27 -06:00
Max Leonard Inden 41c22effbe
config&notifier: Add option to use Alertmanager API v2
With v0.16.0 Alertmanager introduced a new API (v2). This patch adds a
configuration option for Prometheus to send alerts to the v2 endpoint
instead of the default v1 endpoint.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-06-21 16:33:53 +02:00
Callum Styan 5603b857a9 Check if label value is valid when unmarshaling external labels from
YAML, add a test to config_tests for valid/invalid external label
value.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-18 20:31:12 +00:00
Tom Wilkie c7b3535997 Use pkg/relabelling in remote write.
- Unmarshall external_labels config as labels.Labels, add tests.
- Convert some more uses of model.LabelSet to labels.Labels.
- Remove old relabel pkg (fixes #3647).
- Validate external label names.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-18 20:31:12 +00:00
Julien Pivotto 4397916cb2 Add honor_timestamps (#5304)
Fixes #5302

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-15 10:04:15 +00:00
Callum Styan 83c46fd549 update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-12 10:31:27 +00:00
Simon Pasquier 027d2ece14 config: resolve more file paths (#5284)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-12 10:24:15 +00:00
Simon Pasquier e72c875e63
config: fix Kubernetes config with empty API server (#5256)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-22 15:51:47 +01:00
Simon Pasquier c8a1a5a93c
discovery/kubernetes: fix support for password_file and bearer_token_file (#5211)
* discovery/kubernetes: fix support for password_file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Create and pass custom RoundTripper to Kubernetes client

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use inline HTTPClientConfig

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 11:22:34 +01:00
Simon Pasquier f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
Marcel D. Juhnke c7d83b2b6a discovery: add support for Managed Identity authentication in Azure SD (#4590)
Signed-off-by: Marcel Juhnke <marrat@marrat.de>
2018-12-19 10:03:33 +00:00
Bartek Płotka 62c8337e77 Moved configuration into relabel package. (#4955)
Adapted top dir relabel to use pkg relabel structs.

Removal of this is tracked separately here: https://github.com/prometheus/prometheus/issues/3647

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-12-18 11:26:36 +00:00
Julius Volz d28246e337
Fix config loading panics on nil pointer slice elements (#4942)
Fixes https://github.com/prometheus/prometheus/issues/4902
Fixes https://github.com/prometheus/prometheus/issues/4889

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-12-03 18:09:02 +08:00
mengnan a5d39361ab discovery/azure: Fail hard when Azure authentication parameters are missing (#4907)
* discovery/azure: fail hard when client_id/client_secret is empty

Signed-off-by: mengnan <supernan1994@gmail.com>

* discovery/azure: fail hard when authentication parameters are missing

Signed-off-by: mengnan <supernan1994@gmail.com>

* add unit test

Signed-off-by: mengnan <supernan1994@gmail.com>

* add unit test

Signed-off-by: mengnan <supernan1994@gmail.com>

* format code

Signed-off-by: mengnan <supernan1994@gmail.com>
2018-11-29 16:47:59 +01:00
Ben Kochie c6399296dc
Fix spelling/typos (#4921)
* Fix spelling/typos

Fix spelling/typos reported by codespell/misspell.
* UK -> US spelling changes.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-27 17:44:29 +01:00
Simon Pasquier ff08c40091 discovery/openstack: support tls_config
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-09-25 14:31:32 +02:00
Simon Pasquier 128ff546b8 config: add test for OpenStack SD (#4594)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-09-13 21:44:27 +05:30
Tariq Ibrahim f708fd5c99 Adding support for multiple azure environments (#4569)
Signed-off-by: Tariq Ibrahim <tariq.ibrahim@microsoft.com>
2018-09-04 17:55:40 +02:00
Daisy T 7d01ead689 change time.duration to model.duration for standardization (#4479)
Signed-off-by: Daisy T <daisyts@gmx.com>
2018-08-24 16:55:21 +02:00
Paul Gier d24d2acd11 config: set target group source index during unmarshalling (#4245)
* config: set target group source index during unmarshalling

Fixes issue #4214 where the scrape pool is unnecessarily reloaded for a
config reload where the config hasn't changed.  Previously, the discovery
manager changed the static config after loading which caused the in-memory
config to differ from a freshly reloaded config.

Signed-off-by: Paul Gier <pgier@redhat.com>

* [issue #4214] Test that static targets are not modified by discovery manager

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-06-13 16:34:59 +01:00
Philippe Laflamme 2aba238f31 Use common HTTPClientConfig for marathon_sd configuration (#4009)
This adds support for basic authentication which closes #3090

The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`.

DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this.

Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.
2018-04-05 09:08:18 +01:00
Manos Fokas 25f929b772 Yaml UnmarshalStrict implementation. (#4033)
* Updated yaml vendor package.

* remove checkOverflow duplicate in rulefmt

* remove duplicated HTTPClientConfig.Validate()

* Added yaml static check.
2018-04-04 09:07:39 +01:00
Kristiyan Nikolov be85ba3842 discovery/ec2: Support filtering instances in discovery (#4011) 2018-03-31 07:51:11 +01:00
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allows filtering by node-meta, and returns a `map[string]string` or `service`->`tags`).
  Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of that kind of behavior from
  consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. No need to discover
  targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. It also sends an additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation:
there won't be more than one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also makes sure we don't wait another 30s if we already waited 29s
in the blocking call, by subtracting the number of elapsed seconds.

Hopefully this will do what people expect it to do and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
pasquier-s fc8cf08f42 Prevent invalid label names with labelmap (#3868)
This change ensures that the relabeling configurations using labelmap
can't generate invalid label names.
2018-02-21 10:02:22 +00:00
Shubheksha Jalan 0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Shubheksha Jalan ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix targetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Alberto Cortés 29da2fb9cd testutil: update to go1.9 testing.Helper 2017-12-08 19:06:53 +01:00
Alberto Cortés 8f6a9f7833 config: simplify tests by using testutil.NotOk (#3289)
Also include filename in all LoadFile errors

Also add message to testutil.NotOk so we can identify failing tests when
using table driven tests.
2017-12-08 16:52:25 +00:00
Tobias Schmidt 7098c56474 Add remote read filter option
For special remote read endpoints which have only data for specific
queries, it is desired to limit the number of queries sent to the
configured remote read endpoint to reduce latency and performance
overhead.
2017-11-13 23:30:01 +01:00
Thibault Chataigner bf4a279a91 Remote storage reads based on oldest timestamp in primary storage (#3129)
Currently all read queries are simply pushed to remote read clients.
This is fine, except for remote storage for which it is inefficient and
makes queries slower even if remote read is unnecessary.
So we need instead to compare the oldest timestamp in primary/local
storage with the query range lower boundary. If the oldest timestamp
is older than the mint parameter, then there is no need for remote read.
This is an optional behavior per remote read client.
Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>
2017-10-18 12:08:14 +01:00
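
The decision described in the remote read commit above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the function name `shouldQueryRemote` and the `alwaysRead` flag (standing in for the per-client opt-out) are hypothetical, timestamps are taken to be integer milliseconds, and treating an exact match between the oldest local timestamp and `mint` as "no remote read needed" is a choice made for the example.

```go
package main

import "fmt"

// shouldQueryRemote reports whether a query starting at mint needs the
// remote read endpoint. localOldest is the oldest timestamp held in
// primary/local storage. (Hypothetical helper, for illustration only.)
func shouldQueryRemote(alwaysRead bool, localOldest, mint int64) bool {
	if alwaysRead {
		// The client opted out of the optimisation: always read remotely.
		return true
	}
	// Local storage only covers [localOldest, now]; anything that starts
	// earlier has to be fetched from remote storage.
	return mint < localOldest
}

func main() {
	const localOldest = 1_000_000

	fmt.Println(shouldQueryRemote(false, localOldest, 1_500_000)) // range fully local
	fmt.Println(shouldQueryRemote(false, localOldest, 500_000))   // range starts before local data
	fmt.Println(shouldQueryRemote(true, localOldest, 1_500_000))  // optimisation disabled
}
```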
Alberto Cortés 6c67296423 config: fix error message for unexpected result of yaml marshal 2017-10-12 19:50:07 +02:00
Alberto Cortés 0f3d8ea075 config: use testutil package 2017-10-12 19:50:07 +02:00
Fabian Reinartz 87918f3097 Merge branch 'master' into dev-2.0 2017-09-04 14:09:21 +02:00
Max Leonard Inden 1c96fbb992
Expose current Prometheus config via /status/config
This PR adds the `/status/config` endpoint which exposes the currently
loaded Prometheus config. This is the same config that is displayed on
`/config` in the UI in YAML format. The response payload looks like
this:
```
{
  "status": "success",
  "data": {
    "yaml": <CONFIG>
  }
}
```
2017-08-13 22:21:18 +02:00
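
A client could decode the `/status/config` payload shown in the commit above along these lines. The sketch is in Go and assumes that the `yaml` field is a JSON string holding the configuration text (the commit message only shows a `<CONFIG>` placeholder); the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configResponse mirrors the payload shape from the commit message.
// The yaml field is assumed to be a JSON string.
type configResponse struct {
	Status string `json:"status"`
	Data   struct {
		YAML string `json:"yaml"`
	} `json:"data"`
}

// parseConfigResponse extracts the YAML config from a response body.
func parseConfigResponse(body []byte) (string, error) {
	var r configResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Status != "success" {
		return "", fmt.Errorf("unexpected status %q", r.Status)
	}
	return r.Data.YAML, nil
}

func main() {
	// Sample body made up for the example.
	body := []byte(`{"status":"success","data":{"yaml":"global:\n  scrape_interval: 15s\n"}}`)

	cfg, err := parseConfigResponse(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(cfg)
}
```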
Fabian Reinartz 25f3e1c424 Merge branch 'master' into mergemaster 2017-08-10 17:04:25 +02:00
Yuki Ito 1bf3b91ae0 Make sure that url for remote_read/write is not nil (#3024) 2017-08-07 08:49:45 +01:00
Tom Wilkie 5169f990f9 Review feedback: add yaml struct tags, don't embed queue config.
Also, rename QueueManageConfig to QueueConfig, for consistency with tags.
2017-08-01 14:43:56 +01:00
Tom Wilkie 454b661145 Make queue manager configurable. 2017-07-25 13:47:34 +01:00
Fabian Reinartz dba7586671 Merge branch 'master' into dev-2.0 2017-07-11 17:22:14 +02:00
Fuente, Pablo Andres 902fafb8e7 Fixing tests for Windows
Fixing the config/config_test, the discovery/file/file_test and the
promql/promql_test tests for Windows. For most of the tests, the fix involved
correct handling of path separators. In the case of the promql tests, the
issue was related to the removal of the temporary directories used by the
storage. The issue is that the RemoveAll() call returns an error when it
tries to remove a directory which is not empty, which seems to be true due to
some kind of process that is still running after closing the storage. To fix
it I added some retries to the removal of the temporary directories.
Adding tags file from Universal Ctags to .gitignore
2017-07-09 01:59:30 -03:00
Fabian Reinartz 65b087bcc1 config: resolve file SD paths relative to config 2017-07-04 11:40:26 +02:00
Roman Vynar dbe2eb2afc Hide consul token on UI. (#2797) 2017-06-01 22:14:23 +01:00
Julius Volz 240bb671e2 config: Fix overflow checking in global config (#2783) 2017-05-30 20:58:06 +02:00
Conor Broderick 6766123f93 Replace regex with Secret type and remarshal config to hide secrets (#2775) 2017-05-29 12:46:23 +01:00
Brian Akins 27d66628a1 Allow limiting Kubernetes service discovery to certain namespaces
Allow namespace discovery to be more easily extended in the future by using a struct rather than just a list.

Rename fields for kubernetes namespace discovery
2017-04-27 07:41:36 -04:00
Julius Volz 525da88c35 Merge pull request #2479 from YKlausz/consul-tls
Adding consul capability to connect via tls
2017-03-20 11:40:18 +01:00
Goutham Veeramachaneni 5c89cec65c Stricter Relabel Config Checking for Labeldrop/keep (#2510)
* Minor code cleanup

* Labeldrop/Labelkeep Now *Only* Support Regex

Ref prometheus/prometheus#2368
2017-03-18 22:32:08 +01:00
yklausz 75880b594f Adding consul capability to connect via tls 2017-03-17 22:37:18 +01:00
Julius Volz e9476b35d5 Re-add multiple remote writers
Each remote write endpoint gets its own set of relabeling rules.

This is based on the (yet-to-be-merged)
https://github.com/prometheus/prometheus/pull/2419, which removes legacy
remote write implementations.
2017-02-20 13:23:12 +01:00
Fabian Reinartz 7eb849e6a8 Merge pull request #2307 from joyent/triton_discovery
Add Joyent Triton discovery
2017-01-18 05:08:11 +01:00
Richard Kiene f3d9692d09 Add Joyent Triton discovery 2017-01-17 20:34:32 +00:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problematic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00