5868 commits

Author SHA1 Message Date
xjewer 0d1a69353e scrape: Add global jitter for HA server (#5181)
* scrape: Add global jitter for HA server

Covers issue in https://github.com/prometheus/prometheus/pull/4926#issuecomment-449039848
where the HA setup becomes a problem for targets unable to be scraped simultaneously.
The new per-server jitter relies on the hostname and external labels, which must be unique.

As before, the scrape offset is calculated with regard to absolute time, so even a
restart/reload doesn't change the scrape time per scrape target + prometheus instance.

Use the FQDN if possible, otherwise fall back to the hostname. It adds an extra random seed
to the server hash calculation to distinguish machines with the same hostname but
different DCs.

Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
2019-03-12 10:46:15 +00:00
Callum Styan 83c46fd549 update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-12 10:31:27 +00:00
Julien Pivotto 04ce817c49 scrape: Rewrite scrape loop options as a struct (#5314)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-12 10:26:18 +00:00
Simon Pasquier 027d2ece14 config: resolve more file paths (#5284)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-12 10:24:15 +00:00
Daisy T 683fbc59ec exponentiation operator to drop metric name in result of op operation (#5329)
Signed-off-by: Daisy T <daisyts@gmx.com>
2019-03-12 10:21:42 +00:00
Ganesh Vernekar 59369491cf
Merge pull request #5333 from codesome/release-2.8.0
*: cut 2.8.0
2019-03-12 09:39:20 +05:30
Ganesh Vernekar 6043e0e715
*: cut 2.8.0
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-03-12 09:16:26 +05:30
Ganesh Vernekar 2df0b5e837
Merge pull request #5335 from simonpasquier/debug-travis-ci-failures
.travis.yml: download modules in advance
2019-03-12 09:06:42 +05:30
Simon Pasquier 758a68a52f .travis.yml: download modules in advance
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-11 21:49:15 +01:00
Krasi Georgiev 9d96ada510 Display correct values for the retention in the flags web gui. (#5322)
* Display correct values for the retention in the flags web gui.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* adding a log entry

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* added the retention info to the runtime status page

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* simplify the retention display

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-03-11 22:48:57 +05:30
Ganesh Vernekar b61c8e30dc
Merge pull request #5320 from roidelapluie/exprsize
ui: Expand expression_select to 220px
2019-03-11 22:24:12 +05:30
Julien Pivotto 6152df44c2 ui: Expand expression_select to 220px
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-11 17:24:25 +01:00
Ganesh Vernekar 46e0587cb3
Merge pull request #5318 from roidelapluie/sdtag
ui: remove extra table tag in service discovery
2019-03-11 20:13:13 +05:30
Ganesh Vernekar 901b82d94c
Merge pull request #5319 from roidelapluie/border
ui: Remove time picker borders
2019-03-11 20:11:22 +05:30
Julien Pivotto 5a162dc1ab
ui: remove extra table tag in service discovery
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-09 11:02:14 +01:00
Julien Pivotto 981f9208fb
ui: Remove time picker borders
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-08 22:58:31 +01:00
Ganesh Vernekar d390497280
*: cut 2.8.0-rc.0 (#5287)
* Release 2.8.0-rc.0

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Mark tsdb/370 experimental and update flag.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* 5290 as BUGFIX

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-03-06 21:35:44 +05:30
David 38fea83c1f Debounce input key press handling (#5309)
- input key handler causes 2 layout cycles on each keypress which can
clog up browser rendering when typing quickly
- this change adds a debounce to the key press handler of 500ms

Fixes #5308
Signed-off-by: David Kaltschmidt <david.kaltschmidt@gmail.com>
2019-03-06 16:16:55 +01:00
Ganesh Vernekar 225bc77448
Merge pull request #5310 from codesome/vendor-tsdb
vendor: Update tsdb to 0.6.1
2019-03-05 22:39:54 +05:30
Ganesh Vernekar adbb57d5f0
Update tsdb to 0.6.1
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-03-05 22:15:22 +05:30
Ganesh Vernekar 25889d6be5
Merge pull request #5298 from prometheus/release-2.7
Merge 2.7.2 changelog forward
2019-03-05 19:16:16 +05:30
Tom Wilkie 2fa93595d6
More WAL remote_write tweaks. (#5300)
* Consistently pre-lookup the metrics for a given queue in queue manager.
* Don't open the WAL (for writing) in the remote_write code.
* Add some more logging.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-05 12:21:11 +00:00
Krasi Georgiev 1684dc750a
updated tsdb to 0.6.0 (#5292)
* updated tsdb to 0.6.0

As part of the update, also added the new storage.tsdb.allow-overlapping-blocks flag and marked it as experimental.
2019-03-04 21:42:45 +02:00
Tariq Ibrahim 1adb91738d fix typo in recordType method of wal_watcher.go (#5297)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-04 17:33:35 +01:00
Ganesh Vernekar cfb9135a41
Merge pull request #5290 from prometheus/scalar-crash
Fix panic when aggregator param is not a literal.
2019-03-04 21:51:14 +05:30
Tom Wilkie 38a9bbbec2 Loosen off PrometheusRemoteWriteBehind alert.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-04 12:47:24 +00:00
Brian Brazil 858c363e94 Fix panic when aggregator param is not a literal.
The return value for checkForSeriesSetExpansion
is always nil, simplify.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-03-04 12:00:05 +00:00
Björn Rabenstein 32130cbbfa
Merge pull request #5285 from prometheus/beorn7/cleanup
Cleanup .gitignore
2019-03-04 11:53:33 +01:00
Tariq Ibrahim 197e5ac597 docs: minor improvements to the service discovery README.md (#5296)
i) Increased the size of the Service Discovery Readme title
ii) Changed `TargetGroups` to "target groups" as it has been relocated and renamed to another package.

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-03 19:48:03 +01:00
Tariq Ibrahim ab8e9b7423 fix typo in queue_manager.go comment (#5294)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-03 11:35:29 +00:00
Goutham Veeramachaneni 82f98c825a
Merge pull request #5291 from gouthamve/sevenpointtwo
*: cut 2.7.2
2019-03-02 06:30:42 -08:00
Goutham Veeramachaneni 535e631621
*: cut 2.7.2
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-03-02 15:19:55 +01:00
Tom Wilkie 67da8e7b46
Refactor and fix queue resharding (#5286)
- Remove prometheus_remote_queue_last_send_timestamp_seconds metric. It's not particularly useful, we have highest_timestamp_seconds.
- Factor out maxGauge, a gauge that only increases.
- Change sharding calculations to use max samples in timestamp - max samples out timestamp (not rates).
- Also include the ratio of samples dropped to correctly predict number of pending samples.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-01 11:04:26 -08:00
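
A gauge that only increases, and the "timestamp in minus timestamp out" calculation mentioned above, can be sketched in a few lines of Go. This is a self-contained illustration that does not use the Prometheus client library; `maxTimestamp` and `sendDelay` are hypothetical names, and timestamps are assumed to be seconds since the epoch.

```go
package main

import "sync"

// maxTimestamp remembers the largest value it has been given.
// Hypothetical stand-in for a gauge that only increases.
type maxTimestamp struct {
	mu    sync.Mutex
	value float64
}

// Set stores v only if it is larger than the current value.
func (m *maxTimestamp) Set(v float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if v > m.value {
		m.value = v
	}
}

// Get returns the largest value seen so far.
func (m *maxTimestamp) Get() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.value
}

// sendDelay is how far the sending side lags behind the receiving side:
// highest timestamp enqueued minus highest timestamp sent.
func sendDelay(highestIn, highestOut *maxTimestamp) float64 {
	return highestIn.Get() - highestOut.Get()
}
```

Because both trackers ignore older values, out-of-order updates cannot make the computed delay jump backwards.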
Tom Wilkie b615069289 Update metric names.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-01 07:39:48 -08:00
Paul Gier d8c06bb2b7 Makefile.common: update promu to v0.3.0 (#5280)
Signed-off-by: Paul Gier <pgier@redhat.com>
2019-02-28 19:00:49 +01:00
Callum Styan b8106dd459 Review feedback:
- Add a dropped samples EWMA and use it in calculating desired shards.
- Update metric names and log messages.
- Limit number of entries in the dedupe logging middleware to prevent potential OOM.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Callum Styan 512f549064 Refactor: inline decodeRecord in readSegment and don't bother decoding samples records if we're not tailing the segment, add a benchmark test and fix some other tests
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie f795942572 Decrement pending sample when queue exits.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie ee7efa93fe Fix some tests.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Callum Styan b69bdfb4d1 Store the checkpoint we read last, so that we don't keep reading the same checkpoint on each tick.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie efbd9559f4 Deal with corruptions in the WAL:
- If we're replaying the WAL to get series records, skip that segment when we hit corruptions.
- If we're tailing the WAL for samples, fail the watcher.
- When the watcher fails, restart from the latest checkpoint - and only send new samples by updating startTime.
- Tidy up log lines and error handling; don't return so many errors on quitting.
- Expect EOF when processing checkpoints.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie 92fcf375b0 Update vendored TSDB version.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie adf5307470 Update wal LiveReader to ensure EOF is correctly propagated.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Callum Styan d6258aea8f Fix up remote write tests:
- Tests that created a QueueManager were leaving behind files at the end of tests.
- WAL replaying (readToEnd) tests seem to require extra time to finish now.
- Some fixes to make staticcheck happy.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie 184f06a981 Combine the record decoding metrics into one; break out garbage collection into a separate function.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie 859cda27ff Remove some 'global' state, moving segment numbers to parameters.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie bdc6b764b0 If reading the WAL fails, try again. Also, read from the segment containing the index for the last checkpoint, not the first segment.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie d6f911b511 Factor out logging ratelimit & dedupe middleware.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie a5c20642b3 Refactor WAL watcher to remove some duplication.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00
Tom Wilkie 37ad4db485 Export timestamps in seconds since epoch.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-28 08:38:39 -08:00