Commit graph

7154 commits

Author SHA1 Message Date
Goutham Veeramachaneni 3733f14dc5
Merge pull request #250 from simonpasquier/update-doc
Update doc
2018-04-03 12:19:29 +05:30
Sneha Inguva 7be846754a main: actor functionality comments 2018-04-01 11:19:30 -07:00
albatross0 0245fd55bf Add a machine type label to GCE SD (#4032) 2018-03-31 09:20:19 +01:00
Kristiyan Nikolov be85ba3842 discovery/ec2: Support filtering instances in discovery (#4011) 2018-03-31 07:51:11 +01:00
Bryan Boreham 93494d8b7e Add an OpenTracing span for each rule (#4027)
* Add an OpenTracing span for each rule

So that tags and child spans can be traced back to the rule that they
refer to.
2018-03-30 21:29:19 +01:00
Björn Rabenstein 6cf725c56d
Merge pull request #4031 from codesome/fix-bug-from-4025
Fix bug from 4025
2018-03-30 16:41:30 +02:00
Ganesh Vernekar b44ce11d1b Added test to check pathPrefix 2018-03-30 11:55:54 +05:30
Ganesh Vernekar cd2820e165 Fix pathPrefix bug from PR-4025 2018-03-30 11:04:15 +05:30
Solomon Van 68e394a56e notifier: update use testutil for testing (#3695) 2018-03-29 16:07:26 +01:00
Elif T. Kuş daebf68ea2 Rewrote tests for relabel and template (#3754)
* relabel: use testutil for testing

* template: use testutil for testing
2018-03-29 16:02:28 +01:00
Björn Rabenstein 61accb51ac
Merge pull request #4025 from codesome/route-prefix
Fixed pathPrefix for web pages
2018-03-29 16:22:54 +02:00
Ganesh Vernekar f30b37e00b Fixed pathPrefix for web pages 2018-03-29 18:02:25 +05:30
Bartek Plotka 7412e2b44b Added more cases and modified one var name.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-29 12:50:46 +01:00
Bartek Plotka f07d829946 db: Tiny tuning of algo + added proper print.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 23:50:42 +01:00
Bartek Plotka 1e60f02066 db: Simplified tests.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 23:19:22 +01:00
Bartek Plotka c8b4a7b839 db: Simplified algorithm.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 23:18:24 +01:00
Bartek Plotka 51ce1cc7ff db: Fixed validateBlockSequence.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 18:33:41 +01:00
Bartek Plotka a9b28a6aa0 db: Added tests for validateBlockSequence to confirm a bug.
(That's why test fails)

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-28 15:50:52 +01:00
Fabian Reinartz 184b6e3767
Merge pull request #3968 from zjwzte/fix-magic-number
Fix magic number.
2018-03-28 14:09:43 +02:00
Krasi Georgiev dfd6709a44 update common package (#4015) 2018-03-27 10:21:56 +05:30
Matt Bostock 793c1078dd bench: Fix path to default sample file
The sample file used for benchmarking was renamed in 8326e410d0 but
the `--file` flag default was not updated.
2018-03-25 23:24:30 +08:00
Krasi Georgiev 5fec98d0a7 simplify server error handling (#4006) 2018-03-25 10:05:59 +01:00
Corentin Chary 60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`).
  Tags and nore-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of that kind of behavior from
  consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. No need to discover
  targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation,
there won't be more that one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also make sure we don't wait another 30s if we already waited 29s
in the blocking call by substracting the number of elapsed seconds.

Hopefully this will do what people expect it does and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
Ben Kochie 0d9fe18f5e Fix nil context staticcheck error. 2018-03-22 07:59:39 +00:00
Ben Kochie 0f37c02343 Update vendor golang.org/x/...
Update vendor golang.org/x/sys/unix
Update vendor golang.org/x/net/...
2018-03-22 07:59:39 +00:00
Ben Kochie 2b02fcb0cb Update vendor github.com/miekg/dns@v1.0.4
Update vendor `github.com/miekg/dns` to `v1.0.4` release.
* Add dependent vendor `golang.org/x/crypto/ed25519`.
* Add dependent vendor `golang.org/x/crypto/ed25519/internal/edwards25519`.
* Add dependent vendor `golang.org/x/net/bpf`.
* Add dependent vendor `golang.org/x/net/internal/iana`.
* Add dependent vendor `golang.org/x/net/internal/socket`.
* Add dependent vendor `golang.org/x/net/ipv4`.
* Add dependent vendor `golang.org/x/net/ipv6`.
2018-03-22 07:59:39 +00:00
kun 5f929254a3 Fix labels bench test
Signed-off-by: kun <oiooj@qq.com>
2018-03-22 12:28:09 +08:00
Mario Trangoni e5dabad1d7 fix megacheck issue: simplified return err 2018-03-21 22:39:15 +01:00
Mario Trangoni c2182820ed fix megacheck issue: should omit values from range 2018-03-21 22:39:15 +01:00
Mario Trangoni c0e888e82b fix megacheck issues: os.SEEK_SET is deprecated: Use io.SeekStart, io.SeekCurrent, and io.SeekEnd. 2018-03-21 22:39:15 +01:00
Mario Trangoni 09142e4dd1 fix unconvert issues: unnecessary conversion 2018-03-21 22:39:14 +01:00
Marek Siarkowicz bb86c3f62b Report internal runtime information on status page (#3921)
Add information about tsdb, wal and config reload
2018-03-21 16:08:37 +00:00
Aaron Kirkbride c47fbcb626 Fix moved fsnotify dependency (#3995) 2018-03-21 15:46:31 +00:00
Brian Brazil cc39021b2b Provide custom marshalling for Point
Point has a non-standard marshalling, and is also
where the vast majority of CPU time is spent so
it is worth optimising.
2018-03-21 15:02:01 +00:00
Brian Brazil f35fca1c3f Vendor github.com/json-iterator/go 2018-03-21 15:02:01 +00:00
Brian Brazil 299b78a887 Switch to json-iterator for v1 api.
This makes queries ~15% faster and cuts cpu
time spent on json encoding by ~40%.
2018-03-21 15:02:01 +00:00
Brian Brazil 8ede14b24c Add unittests for Point json output 2018-03-21 15:02:01 +00:00
Brian Brazil ecd0a9c6ba web: Add benchmark for respond() 2018-03-21 15:02:01 +00:00
Anton Tereshchenkov 4cb8f6c260 web: remove unused MetricsPath option (#3964) 2018-03-21 09:29:40 +00:00
ferhat elmas ec8e4d8a7c all: remove unnecessary type conversions (#3992)
excep promql due to not to create conflict with #3966.
2018-03-21 09:25:22 +00:00
Simon Pasquier 83325c8d82 web: replace deprecated InstrumentHandler() (#3862)
* web: replace deprecated InstrumentHandler()

This change replaces the deprecated InstrumentHandler function by the
equivalent functions from the promhttp package.

The following metrics are removed:

* http_request_duration_microseconds (Summary).
* http_request_size_bytes (Summary).
* http_requests_total (Counter).

And the following metrics are added instead:

* prometheus_http_request_duration_seconds (Histogram).
* prometheus_http_response_size_bytes (Histogram).
* promhttp_metric_handler_requests_in_flight (Gauge).
* promhttp_metric_handler_requests_total (Counter).

* Update github.com/prometheus/common/route package

* web: refactor using the new prometheus/common/route package
2018-03-21 08:16:16 +00:00
James Turnbull ba5273a0ab Minor edits to help text (#3990) 2018-03-20 16:54:36 +00:00
Simon Pasquier e1fd96db25 cmd: fix help text (#3989) 2018-03-20 15:58:19 +00:00
Warren Fernandes d49a3df55b Parser test cleanup (#3977)
* parser test cleanup

- Test against the exported package functions instead of the private functions.

* Improves readability of TestParseSeries

- Moves package function closer to parser function
2018-03-20 14:30:52 +00:00
Jeeyoung Kim 5b962c5748 Revert "Feature: Allow getting credentials via EC2 role (#3343)" (#3985)
This reverts commit 808f79f00a.
2018-03-20 12:34:54 +00:00
Goutham Veeramachaneni 195bc0d286
Merge pull request #303 from Bplotka/bp/better-compact-logging
repair + compact: Improved logging for easier future debug purposes.
2018-03-16 00:45:47 +05:30
Bartek Plotka fada85a83c repair + compact: Improved logging for easier future debug purposes.
This is based on my experience while debugging https://github.com/prometheus/prometheus/issues/3943.

I needed to deduct few things, and all that would be just bit easier with these two logs:
- new block's ULID on each compaction.
- actual list of Blocks (ulid + time range) on Prometheus startup (easy to log that while repairing blocks).

We don't really need blocks that takes part in compaction - that can be deducted easily based on time ranges of blocks we have currently in system.

What do you think?

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-03-15 14:59:32 +00:00
Warren Fernandes 58e2a31db8 Cleans up test by removing unused function (#3969) 2018-03-15 08:59:19 +00:00
zjwzte b7a37a1604 Fix magic number. 2018-03-15 10:15:35 +08:00
Fabian Reinartz e87c6c8b28
Merge pull request #3963 from mz-techops/fix-query-err-scope
promql: propagate storage errors
2018-03-14 11:04:02 -04:00