7627 commits

Author SHA1 Message Date
beorn7 bd44e7fe98 Update vendoring of prometheus/common/route to include data race fix
See https://github.com/prometheus/common/pull/125

Signed-off-by: beorn7 <beorn@soundcloud.com>
2018-04-09 17:48:32 +02:00
Krasi Georgiev ddd46de6f4 Races/3994 (#4005)
Fix race by properly locking access to scrape pools. Use separate mutex for information needed by UI so that UI isn't blocked when targets are being updated.
2018-04-09 15:18:25 +01:00
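
The locking approach described in the commit above (one mutex for scrape pool updates, a separate one for the data the UI reads) can be illustrated with a minimal Go sketch. This is not the code from the commit: the `manager` type, its fields, and the `update` and `targetsForUI` methods are hypothetical names chosen for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for a component that owns scrape pools.
type manager struct {
	mtxScrape sync.Mutex          // guards pools during (possibly slow) updates
	pools     map[string][]string // job name -> targets

	mtxUI     sync.RWMutex        // guards only the snapshot read by the UI
	uiTargets map[string][]string // replaced wholesale, never mutated in place
}

func newManager() *manager {
	return &manager{
		pools:     map[string][]string{},
		uiTargets: map[string][]string{},
	}
}

// update changes the pools under mtxScrape and then publishes a copy for the
// UI, holding mtxUI only for the pointer swap.
func (m *manager) update(job string, targets []string) {
	m.mtxScrape.Lock()
	defer m.mtxScrape.Unlock()

	m.pools[job] = targets

	snapshot := make(map[string][]string, len(m.pools))
	for k, v := range m.pools {
		snapshot[k] = append([]string(nil), v...)
	}

	m.mtxUI.Lock()
	m.uiTargets = snapshot
	m.mtxUI.Unlock()
}

// targetsForUI never touches mtxScrape, so a long-running update cannot
// block it. Callers must treat the returned map as read-only.
func (m *manager) targetsForUI() map[string][]string {
	m.mtxUI.RLock()
	defer m.mtxUI.RUnlock()
	return m.uiTargets
}

func main() {
	m := newManager()
	m.update("prometheus", []string{"127.0.0.1:9090"})
	fmt.Println(m.targetsForUI())
}
```

The design choice in this sketch is that readers only ever wait for the brief snapshot swap, at the cost of copying the target lists on every update.
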
Mario Trangoni 464e747f1e fix some comments typos (#4059) 2018-04-08 10:51:54 +01:00
Mario Trangoni b7173eb0e5 fix some comments typos (#315) 2018-04-08 10:28:30 +01:00
Sneha Inguva cbfb207cca vendor: correctly update golang client (#4056) 2018-04-06 18:05:32 +01:00
Tony Lee 7cd56f56df add queue_time slice to query_duration_seconds (#4050) 2018-04-05 19:56:58 +01:00
Fabian Reinartz bd832fc827
Merge pull request #314 from Bplotka/overlap-log-improvement
db: Made overlap String() prettier and more readable.
2018-04-05 18:20:54 +02:00
Bartek Plotka 00594b85cd db: Addressed comments.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-04-05 16:53:24 +01:00
Bartek Plotka 03e94365e1 db: Made overlap String() prettier and more readable.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-04-05 16:01:16 +01:00
Fabian Reinartz 00e13f519a
Merge pull request #310 from Bplotka/overlap-detection-fix
db: Fixed validateBlockSequence, exported it and added tests.
2018-04-05 15:30:22 +02:00
Bartek Plotka 15b5d89222 db: Addressed comments.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-04-05 14:15:24 +01:00
Bartek Plotka cc306ef0d5 Added grouping by overlap range.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-04-05 13:51:33 +01:00
Goutham Veeramachaneni 90d55672d1
Merge pull request #286 from mattbostock/rename_high_timestamp
head: Rename highTimestamp to maxt
2018-04-05 17:41:48 +05:30
Goutham Veeramachaneni d610390427
Merge pull request #312 from gouthamve/nit-1
Simplify stones counting.
2018-04-05 17:37:47 +05:30
Julius Volz fe10b36b30 Fix curl example for deleting series (#4046) 2018-04-05 13:06:18 +01:00
sev3ryn cc917aee7f fix of endless loop while doing Consul service discovery. (#4044)
Reloading the Prometheus configuration did not end the loop,
which caused a goroutine leak.
2018-04-05 10:41:09 +01:00
Philippe Laflamme 2aba238f31 Use common HTTPClientConfig for marathon_sd configuration (#4009)
This adds support for basic authentication which closes #3090

The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`.

DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this.

Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.
2018-04-05 09:08:18 +01:00
Manos Fokas 25f929b772 Yaml UnmarshalStrict implementation. (#4033)
* Updated yaml vendor package.

* remove checkOverflow duplicate in rulefmt

* remove duplicated HTTPClientConfig.Validate()

* Added yaml static check.
2018-04-04 09:07:39 +01:00
Goutham Veeramachaneni f82a1fe4f2
Simplify stones counting.
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2018-04-03 18:17:02 +05:30
Krasi Georgiev 406233e937
Merge pull request #4034 from si74/main_comments
main: actor functionality comments
2018-04-03 12:52:15 +03:00
Goutham Veeramachaneni 8748f33c54
Merge pull request #259 from mattbostock/add_benchout_to_gitignore
Add default benchmark output dir to .gitignore
2018-04-03 12:39:13 +05:30
Goutham Veeramachaneni 8a301b126a
Merge pull request #307 from mjtrangoni/fixes
Fix some megacheck and unconvert issues
2018-04-03 12:27:27 +05:30
Goutham Veeramachaneni 2f37e1eddc
Merge pull request #277 from simonpasquier/delete-series-without-samples
Fix crash when a series has no block
2018-04-03 12:24:12 +05:30
Goutham Veeramachaneni 3733f14dc5
Merge pull request #250 from simonpasquier/update-doc
Update doc
2018-04-03 12:19:29 +05:30
Sneha Inguva 7be846754a main: actor functionality comments 2018-04-01 11:19:30 -07:00
albatross0 0245fd55bf Add a machine type label to GCE SD (#4032) 2018-03-31 09:20:19 +01:00
Kristiyan Nikolov be85ba3842 discovery/ec2: Support filtering instances in discovery (#4011) 2018-03-31 07:51:11 +01:00
Bryan Boreham 93494d8b7e Add an OpenTracing span for each rule (#4027)
* Add an OpenTracing span for each rule

So that tags and child spans can be traced back to the rule that they
refer to.
2018-03-30 21:29:19 +01:00
Björn Rabenstein 6cf725c56d
Merge pull request #4031 from codesome/fix-bug-from-4025
Fix bug from 4025
2018-03-30 16:41:30 +02:00
Ganesh Vernekar b44ce11d1b Added test to check pathPrefix 2018-03-30 11:55:54 +05:30
Ganesh Vernekar cd2820e165 Fix pathPrefix bug from PR-4025 2018-03-30 11:04:15 +05:30
Solomon Van 68e394a56e notifier: update use testutil for testing (#3695) 2018-03-29 16:07:26 +01:00
Elif T. Kuş daebf68ea2 Rewrote tests for relabel and template (#3754)
* relabel: use testutil for testing

* template: use testutil for testing
2018-03-29 16:02:28 +01:00
Björn Rabenstein 61accb51ac
Merge pull request #4025 from codesome/route-prefix
Fixed pathPrefix for web pages
2018-03-29 16:22:54 +02:00
Ganesh Vernekar f30b37e00b Fixed pathPrefix for web pages 2018-03-29 18:02:25 +05:30
Bartek Plotka 7412e2b44b Added more cases and modified one var name.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-29 12:50:46 +01:00
Bartek Plotka f07d829946 db: Tiny tuning of algo + added proper print.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 23:50:42 +01:00
Bartek Plotka 1e60f02066 db: Simplified tests.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 23:19:22 +01:00
Bartek Plotka c8b4a7b839 db: Simplified algorithm.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 23:18:24 +01:00
Bartek Plotka 51ce1cc7ff db: Fixed validateBlockSequence.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 18:33:41 +01:00
Bartek Plotka a9b28a6aa0 db: Added tests for validateBlockSequence to confirm a bug.
(That's why the test fails)

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 15:50:52 +01:00
Fabian Reinartz 184b6e3767
Merge pull request #3968 from zjwzte/fix-magic-number
Fix magic number.
2018-03-28 14:09:43 +02:00
Krasi Georgiev dfd6709a44 update common package (#4015) 2018-03-27 10:21:56 +05:30
Matt Bostock 793c1078dd bench: Fix path to default sample file
The sample file used for benchmarking was renamed in 8326e410d0 but
the `--file` flag default was not updated.
2018-03-25 23:24:30 +08:00
Krasi Georgiev 5fec98d0a7 simplify server error handling (#4006) 2018-03-25 10:05:59 +01:00
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allows filtering by node-meta, and returns a `map[string]string` of `service`->`tags`).
  Tags and node-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on a large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of this behavior in
  Consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. There is no need to
  discover targets within 1 second if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. It also sends an additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.
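
To make the "before" case concrete, the `keep` rule in the configuration above matches a regular expression against the comma-joined tag string. The following Go sketch shows which tag strings that pattern accepts. The regular expression is copied from the configuration; the `keep` helper and the sample tag strings are assumptions made for illustration, and the sketch uses the standard `regexp` package rather than Prometheus's relabeling code.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagRE is the pattern from the relabel_configs example above.
var tagRE = regexp.MustCompile(`^(.*,)?marathon-user-observability(,.*)?$`)

// keep is a hypothetical helper that reports whether a comma-joined tag
// string would pass the `keep` action.
func keep(tags string) bool {
	return tagRE.MatchString(tags)
}

func main() {
	samples := []string{
		"marathon-user-observability",
		"foo,marathon-user-observability,bar",
		",foo,marathon-user-observability,",
		"marathon-user-observability-staging",
		"foo,bar",
	}
	for _, tags := range samples {
		fmt.Printf("%q keep=%v\n", tags, keep(tags))
	}
}
```

The first three samples are kept and the last two are dropped. With this approach every service is still discovered and only filtered afterwards, which is the overhead the `tag` option avoids.
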

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation:
there won't be more than one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also makes sure we don't wait another 30s if we already waited 29s
in the blocking call, by subtracting the number of elapsed seconds.

Hopefully this will do what people expect and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
Ben Kochie 0d9fe18f5e Fix nil context staticcheck error. 2018-03-22 07:59:39 +00:00
Ben Kochie 0f37c02343 Update vendor golang.org/x/...
Update vendor golang.org/x/sys/unix
Update vendor golang.org/x/net/...
2018-03-22 07:59:39 +00:00
Ben Kochie 2b02fcb0cb Update vendor github.com/miekg/dns@v1.0.4
Update vendor `github.com/miekg/dns` to `v1.0.4` release.
* Add dependent vendor `golang.org/x/crypto/ed25519`.
* Add dependent vendor `golang.org/x/crypto/ed25519/internal/edwards25519`.
* Add dependent vendor `golang.org/x/net/bpf`.
* Add dependent vendor `golang.org/x/net/internal/iana`.
* Add dependent vendor `golang.org/x/net/internal/socket`.
* Add dependent vendor `golang.org/x/net/ipv4`.
* Add dependent vendor `golang.org/x/net/ipv6`.
2018-03-22 07:59:39 +00:00
kun 5f929254a3 Fix labels bench test
Signed-off-by: kun <oiooj@qq.com>
2018-03-22 12:28:09 +08:00