Commit graph

56 commits

Author SHA1 Message Date
Nick Triller fddf4918c0
Send empty targetgroup if nothing discovered
Signed-off-by: Nick Triller <nicktriller@gmail.com>
2021-04-29 09:06:52 +02:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
johncming a5beb627ff
some fixes for consul sd. (#7799)
* discovery/consul: make duration more accurate.

Signed-off-by: johncming <johncming@yahoo.com>

* discovery/consul: fix bug when context done.

Signed-off-by: johncming <johncming@yahoo.com>
2020-08-25 15:46:14 +02:00
Andy Bursavich 4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto 9da53391d1
Merge pull request #7739 from prometheus/release-2.20
Merge release-2.20 into the main branch after Consul fix
2020-08-04 20:15:43 +02:00
Julien Pivotto 3a7120bc07 Consul: Reduce WatchTimeout to 2m and set it as timeout for requests
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-03 00:42:55 +02:00
Julien Pivotto 93e9c010f3
Add more Go leak tests (#7652)
* Implement go leak test for promql

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Implement go leak test for Consul SD

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Implement go leak test in discovery manager

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-24 10:10:20 +01:00
John Bampton 98a69b77d1
Fix spelling (#7512)
Signed-off-by: John Bampton <jbampton@users.noreply.github.com>
2020-07-04 14:54:26 +02:00
Pierre Souchay 1508678001
Use 10m timeouts for watches (#7423)
Using ?wait=10m gives results as fast as usual when data is changing,
but performs far fewer requests when services do not change.

On large infrastructures, this considerably reduces the query rate
on Consul servers while keeping the results just as fresh.

Signed-off-by: Pierre Souchay <p.souchay@criteo.com>
2020-06-20 20:22:45 +01:00
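
The `?wait=10m` parameter in the commit above is part of Consul's blocking-query mechanism, where a request carries the last seen `index` and a maximum `wait`. A minimal sketch of how such a URL could be built follows; the helper name `blockingQueryURL`, the service name `web`, and the index value are illustrative assumptions and are not taken from the Prometheus code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// blockingQueryURL is a hypothetical helper that builds a Consul catalog URL
// for a blocking query. The server returns as soon as the data changes past
// lastIndex, or after wait has elapsed if nothing changed.
func blockingQueryURL(base, service string, lastIndex uint64, wait time.Duration) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/catalog/service/" + service

	q := url.Values{}
	q.Set("index", strconv.FormatUint(lastIndex, 10))
	// Express the wait in whole seconds, e.g. "600s" for ten minutes.
	q.Set("wait", fmt.Sprintf("%ds", int(wait.Seconds())))
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	// Assumed values for illustration only.
	s, err := blockingQueryURL("http://localhost:8500", "web", 42, 10*time.Minute)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

With a long wait, an unchanged service costs one request per wait period instead of one per shorter timeout, while a change still returns immediately.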
Mathilde Gilles 9b9c58aea8
[Consul] Add health label to metrics (#5313)
Label metrics with the target health using consul's /health endpoint.

Signed-off-by: Mathilde Gilles <m.gilles@criteo.com>
2020-02-25 13:32:30 +00:00
Simon Pasquier fe76ccbfe3
discovery/consul: fix logging of tags (#6783)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-02-13 13:11:44 +01:00
Ben Ye 60527de355
keep consul service metrics in global variables (#6764)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-02-06 05:48:58 +00:00
Josh Soref 91d76c8023 Spelling (#6517)
* spelling: alertmanager

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: attributes

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: autocomplete

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: bootstrap

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: caught

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: chunkenc

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: compaction

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: corrupted

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: deletable

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: expected

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: fine-grained

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: initialized

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: iteration

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: javascript

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: multiple

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: number

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: overlapping

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: possible

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: postings

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: procedure

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: programmatic

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: queuing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: querier

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: repairing

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: received

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: reproducible

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: retention

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: sample

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: segements

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: semantic

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: software [LICENSE]

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: staging

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: timestamp

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: unfortunately

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: uvarint

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: subsequently

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

* spelling: ressamples

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-01-02 15:54:09 +01:00
Jean-Baptiste Le Duigou 5973227434 adding additional unit tests for getDataCenter() in consul (#6192)
* adding additional unit tests for getDataCenter() in consul

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Tests : update comments to start with uppercase and end with a period

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : using table-driven tests

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : cleaner syntax

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : even cleaner syntax

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Consul Test : update comments

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Fixing naming convention by removing underscore in function name

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>

* Removing duplicated test case for getDatacenter()

Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
2019-11-15 14:52:39 +01:00
Simon Pasquier 19ce6b7f5f
discovery: fix more error logs on context cancelation (#6133)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-10-18 11:48:51 +02:00
Ganesh Vernekar 5ecef3542d
Cleanup after merging tsdb into prometheus
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-08-13 14:04:14 +05:30
AllenZMC 41151ca8dc fix mis-spelling in consul_test.go (#5836)
Signed-off-by: czm <zhongming.chang@daocloud.io>
2019-08-06 06:11:41 +01:00
beorn7 dd81912554 Add objectives to Summaries
With the next release of client_golang, Summaries will not have
objectives by default. To not lose the objectives we have right now,
explicitly state the current default objectives.

Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-12 02:03:13 +02:00
Simon Pasquier 45506841e6
*: enable all default linters (#5504)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 15:11:28 +02:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Simon Pasquier 782d00059a
discovery: factorize for SD based on refresh (#5381)
* discovery: factorize for SD based on refresh

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* discovery: use common metrics for refresh

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-25 11:54:22 +01:00
Mario Trangoni 5354ffff99 Fix some spelling issues (#5361)
See,
$ codespell -S './vendor/*,./.git*,./web/ui/static/vendor*' --ignore-words-list="uint,dur,ue,iff,te,wan"

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2019-03-14 14:38:54 +00:00
Callum Styan 83c46fd549 update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-12 10:31:27 +00:00
Simon Pasquier f9462d5d44 discovery/consul: pass current context to Consul queries
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-18 14:23:56 +01:00
Simon Pasquier f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
Samuel Alfageme 240321acee Add taggedAddress to the labels in ConsulSD (#5001)
Useful when multiple (tagged) addresses for a node are exposed on the catalog API
Ref. https://www.consul.io/api/catalog.html#taggedaddresses

Signed-off-by: Samuel Alfageme <samuel@alfage.me>
2018-12-18 11:51:05 +01:00
Ben Kochie c6399296dc
Fix spelling/typos (#4921)
* Fix spelling/typos

Fix spelling/typos reported by codespell/misspell.
* UK -> US spelling changes.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-27 17:44:29 +01:00
Simon Pasquier 1cd29f782c discovery/consul: close idle connections on stop
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-08-01 17:26:52 +02:00
Romain Baugue b41be4ef52 Discovery consul service meta (#4280)
* Upgrade Consul client
* Add ServiceMeta to the labels in ConsulSD

Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
2018-07-18 05:06:56 +01:00
Julius Volz 5cf0113762
Add "omitempty" to some SD config YAML field tags (#4338)
Especially for Kubernetes SD, this fixes a bug where the rendered
configuration says "api_server: null", which when read back is not
interpreted as an un-set API server (thus the default is not applied).

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-03 13:43:41 +02:00
Elif T. Kuş 57dcdfb15f Rewrote tests with testutil for several test files (#4086)
* promql: Rewrote tests with testutil for functions_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>

* pkg/relabel: Rewrote tests with testutil for relabel_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>

* discovery/consul: Rewrote tests with testutil for consul_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>

* scrape: Rewrote tests with testutil for manager_test

Signed-off-by: Elif T. Kuş <elifkus@gmail.com>
2018-04-27 13:11:16 +01:00
Adam Shannon 809881d7f5 support reading basic_auth password_file for HTTP basic auth (#4077)
Issue: https://github.com/prometheus/prometheus/issues/4076

Signed-off-by: Adam Shannon <adamkshannon@gmail.com>
2018-04-25 18:19:06 +01:00
sev3ryn cc917aee7f fix of endless loop while doing Consul service discovery. (#4044)
Reloading the Prometheus configuration did not end the loop,
which produced a goroutine leak.
2018-04-05 10:41:09 +01:00
Manos Fokas 25f929b772 Yaml UnmarshalStrict implementation. (#4033)
* Updated yaml vendor package.

* remove checkOverflow duplicate in rulefmt

* remove duplicated HTTPClientConfig.Validate()

* Added yaml static check.
2018-04-04 09:07:39 +01:00
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allows filtering by node-meta, and returns a `map[string][]string` of `service`->`tags`).
  Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large clusters `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of that kind of behavior from
  consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. No need to discover
  targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. It also sends an additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation:
there won't be more than one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also makes sure we don't wait another 30s if we already waited 29s
in the blocking call, by subtracting the number of elapsed seconds.

Hopefully this will do what people expect and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
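
The `refresh_interval` tweak described in the commit above comes down to subtracting the time already spent in the blocking call from the interval. A minimal sketch of that arithmetic, assuming a hypothetical helper named `remainingWait` (not the name used in the Prometheus source):

```go
package main

import (
	"fmt"
	"time"
)

// remainingWait returns how much longer to wait so that two updates are at
// least refreshInterval apart, given the time already spent in the blocking
// call. It never returns a negative duration.
func remainingWait(refreshInterval, elapsed time.Duration) time.Duration {
	if elapsed >= refreshInterval {
		return 0
	}
	return refreshInterval - elapsed
}

func main() {
	// The blocking call took 29s of a 30s interval: only 1s is left to wait.
	fmt.Println(remainingWait(30*time.Second, 29*time.Second)) // prints 1s

	// The blocking call overran the interval: no extra wait is needed.
	fmt.Println(remainingWait(30*time.Second, 45*time.Second)) // prints 0s
}
```

Clamping at zero means a slow blocking call never delays the next refresh any further.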
zemek 8a01a0fbed Set consul server default to localhost:8500 (#3703)
2018-01-24 12:14:32 +00:00
Shubheksha Jalan 0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Callum Styan 97464236c7 comments with TargetProvider should read Discoverer instead (#3667)
2018-01-08 23:59:18 +00:00
Shubheksha Jalan ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix targetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Callum Styan 7776527390 bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout
2017-10-28 16:42:42 -07:00
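
The idea behind the commit above is that the HTTP client should give up slightly later than the server-side watch, so that a watch expiring normally is not reported as a client timeout. A minimal sketch, assuming a 30s watch timeout and a hypothetical constructor `newConsulHTTPClient` (neither is confirmed by this log as the actual code):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newConsulHTTPClient is a hypothetical constructor. It sets the client
// timeout 5s above the watch timeout so the two never fire at the same time.
func newConsulHTTPClient(watchTimeout time.Duration) *http.Client {
	return &http.Client{
		Timeout: watchTimeout + 5*time.Second,
	}
}

func main() {
	// Assumed watch timeout for illustration.
	client := newConsulHTTPClient(30 * time.Second)
	fmt.Println(client.Timeout) // prints 35s
}
```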
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Callum Styan 45f9f3c539 use a timeout in the HTTP client used for consul sd (#3303)
2017-10-20 16:56:30 +01:00
Marc Sluiter 6a633eece1 Added go-conntrack for monitoring http connections (#3241)
Added metrics for in- and outgoing traffic with go-conntrack.
2017-10-06 11:22:19 +01:00
Fabian Reinartz d21f149745 *: migrate to go-kit/log
2017-09-08 22:01:51 +05:30
Joe Martin aba41c7d0f add support for consul's node metadata
2017-07-18 16:46:16 -04:00
Roman Vynar dbe2eb2afc Hide consul token on UI. (#2797)
2017-06-01 22:14:23 +01:00
Chris Goller 42de0ae013 Use log.Logger interface for all discovery services
2017-06-01 11:25:55 -05:00
Conor Broderick 6766123f93 Replace regex with Secret type and remarshal config to hide secrets (#2775)
2017-05-29 12:46:23 +01:00