Commit graph

5874 commits

Author SHA1 Message Date
Tom Wilkie b93bafeee1 Various fixes to locking & shutdown for WAL-based remote write.
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate erros in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 11:39:13 +00:00
Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to acheive parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Maria Nemtinova 8e3a39f725 Web UI QoL improvements (#5201)
1. Added an ability to resize text area on mouseclick
2. Remember selected target status button on page reload

Signed-off-by: Maria Nemtinova <nemtinovamasha@gmail.com>
2019-02-12 00:22:05 +01:00
Ganesh Vernekar ce69dcb0e5
Propose @codesome as 2.8 release shepherd
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-02-11 23:44:01 +05:30
JoeWrightss 4cb6c202ff Fix fmt.Errorf error message (#5199)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-10 15:16:20 +05:30
Tariq Ibrahim a2a6e24f9f show list of offending labels in the error message in many-to-many scenarios (#5189)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-02-09 10:17:52 +01:00
Minh-Long Do b26b5c9e96 Add rendering test of template based web endpoints (#5188)
Signed-off-by: Minh-Long  Do <minhlong.langos@gmail.com>
2019-02-08 10:17:47 +00:00
jritchieBAE b8f0a41745 Update to Bootstrap 4.1.3 (#5192)
* web: updated bootstrap3-typeahead file to work with bootstrap 4.0.0

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: Replaced bootstrap-3.3.1 with bootstrap 4.0.0

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: Added bootstrap4-glyphicons as 4.0.0 doesnt include bootstrap3 glyphicons

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated js jquery to 3.3.1

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated _base.html to import new bootstrap 4.0.0, jquery3.3.1 and bootstrap class tags to be 4.0 compatible

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: _base.html missed word out in title tag (Server).

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated alerts.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated config.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated flags.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated service-discovery.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated status.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated targets.html class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: updated graph_template.handlebar class names and tags to be bootstrap 4 compatible.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: alerts.css fix for button color inheritance on alerts page.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: graph.css fix for color inheritance.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: prometheus.css updated to fix nav bar.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* web: previous merge conflict not fixed correctly on _base.html

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* menu.lib and prom.lib imports updated

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* bootstrap 4.1.3 imported

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Bootstrap 4.1.3 imported into _base.html

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* bootstrap 4.1.3 imported into prom.lib

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* menu.lib style adjusted to view sidebar

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Alert colour uplifted to bootstrap 4.1.3

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Alerts display code reformatted similarly to config

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Consoles pages adjusted to account for new navbar

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* LHS Menu fixed in console pages

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Minor changes to prom_console to adjust lhs nav

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Prom.lib and some css updated to fix console graph controls

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Bootstrap 4.0.0 files removed

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Consoles configured so that the graph fits with the new side bar, css files also adjusted

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Import popper.min.js for dropdowns

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Popper.min.js imported locally

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Re-added #4764 and fixed css

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Removed .DS_Store

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Rebuilt assets

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* Spaces between buttons and inputs on graph page removed

Signed-off-by: ksherryBAE <kieran.sherry@baesystems.com>

* fixed spacing in buttons on /targets

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* Updated vfsdata.go

Signed-off-by: Pritam Bhudia <pritam.bhudia@baesystems.com>

* fixed typeahead issue

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>

* added css for dropdown

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>

* changed order of css imports

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>

* tinkered with CSS changes to make keyboard select and mouseover match

Signed-off-by: James Ritchie <james.g.ritchie@baesystems.com>
2019-02-07 22:18:09 +01:00
Simon Pasquier fc10f6d814
Unset GO111MODULE variable in Makefile.common (#5191)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-07 17:22:04 +01:00
Erik Hollensbe 9154e9aca8 Add miekg/dns 1.1.4
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
2019-02-06 21:41:36 +00:00
Goutham Veeramachaneni 9b8bbe3246
Merge pull request #5187 from prometheus/beorn7/release
Merge v2.7 bugfixes into master
2019-02-06 21:32:06 +01:00
beorn7 d26e134bd4 Merge branch 'release-2.7' into beorn7/release
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 15:22:40 +01:00
Björn Rabenstein 3db36f34ec
Merge pull request #5186 from prometheus/beorn7/metrics
Fix prometheus_rule_group_last_evaluation_timestamp_seconds
2019-02-06 15:19:08 +01:00
beorn7 2db1eeb4ec Fix prometheus_rule_group_last_evaluation_timestamp_seconds
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
zhulongcheng fd964426a7 web: predeclare and reuse errors (#5180)
Predeclare and reuse errors to reduce duplicate code

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-02-04 13:06:26 +01:00
zhulongcheng a75f8a8e05 update error message in extractTimeRange (#5179)
Update error message in the extractTimeRange function
to match function's logic

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-02-03 09:29:23 +00:00
JoeWrightss e158c53fa9 Fix some typos in comment (#5175)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-01 14:35:32 +00:00
Brian Brazil c66aeb3fff
In histogram_quantile merge buckets with equivalent le values (#5158)
This makes things generally more resilient, and will
help with OpenMetrics transitions (and inconsistencies).

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-02-01 10:22:44 +00:00
Simon Pasquier a60431f3cd Merge v2.7.1 into master (#5170)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-01 09:54:12 +01:00
Vishnunarayan K I 108b9b0e5f Limit number of merics in prometheus UI (#5139)
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-31 17:03:50 +00:00
Frederic Branczyk 50e1228f88
Merge pull request #5147 from prometheus/brancz-patch-1
docs: Add filesystem POSIX requirement
2019-01-31 16:20:46 +01:00
Frederic Branczyk 32079f351f
docs: Specifically call out NFS and POSIX
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2019-01-31 12:57:48 +01:00
Goutham Veeramachaneni 62e591f928
*: cut 2.7.1 (#5164)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-31 12:13:25 +01:00
Goutham Veeramachaneni b03d6f6eff
Remove custom highlight code, it's not needed. (#5163)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-31 11:27:18 +01:00
Ganesh Vernekar 10ae00ab9d Fix bug from #4898 (#5161)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:14:14 +01:00
Ganesh Vernekar 787eb1e904 Set rule_group_last_duration_seconds to seconds (#5153)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Frederic Branczyk 3de734d8de
docs: Add filesystem POSIX requirement
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2019-01-29 13:51:16 +01:00
Ganesh Vernekar a2ef8cf2f5
Merge pull request #5146 from tariq1890/ineff
fix ineffectual assignment in dns.go
2019-01-29 11:50:34 +05:30
tariqibrahim b173de0c26 fix ineffectual assignment in dns.go
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-28 17:15:43 -08:00
Jannick Fahlbusch ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ 63f375e80a [FIX] Azure DS: Return error when request failed (#4719)
This fixes the issue that the error is swallowed when the request failed.

Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>
2019-01-28 21:31:45 +00:00
Brian Brazil 1dd57765b4
Reduce time that alertmanagers are in flux when reloaded. (#5126)
This no longer waits for all of the scrape reload to complete
before getting a list of AMs again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-28 18:34:12 +00:00
Bryan Boreham 8841692a63 Use the context associated with the inner evaluation span (#5130)
Signed-off-by: Bryan Boreham <bryan@weave.works>
2019-01-28 18:33:30 +00:00
Tariq Ibrahim f4275d2352 Use the latest versions of azure go sdk and go-autorest (#5015)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-28 18:30:29 +00:00
Tariq Ibrahim bfcdba211f remove the prepended watch reactor from the fake k8s client (#5140)
Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>
2019-01-28 16:42:25 +01:00
Goutham Veeramachaneni 410ee9e04a
*: cut 2.7.0 (#5141)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-28 15:37:30 +05:30
Goutham Veeramachaneni 7f7b211047
*: cut 2.7.0-rc.2 (#5134)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-24 18:55:04 +05:30
Goutham Veeramachaneni b454ed3ec2
*: cut 2.7.0-rc.1 (#5123)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-21 18:47:37 +05:30
Goutham Veeramachaneni 4e83f91cfd
Rollback Dockerfile to version @ 2.5.x (#5122)
Fixes https://github.com/prometheus/prometheus/issues/5043

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-21 17:27:16 +05:30
Hrishikesh Barman 9c4e258651 corrected regex string check for anyorigin(*) (#5117)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-21 17:17:27 +05:30
Goutham Veeramachaneni 24f19f03db
*: cut 2.7.0-rc.0 (#5114)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 22:16:02 +05:30
Goutham Veeramachaneni 4068968e12
Protect retention from overflowing (#5112)
Also sanitise the max block duration to max a month.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni 384cba1211
Add flag for size based retention (#5109)
* Add flag for size based retention

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Deprecate the old retention flag for a new one.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add ability to take a suffix for size flag

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Address feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 19:18:36 +05:30
Krasi Georgiev 3bd41cc92c Udpate tsdb to 0.4 (#5110)
* update tsdb to v0.4.0

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* remove unused struct field

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-18 16:32:14 +05:30
Simon Pasquier 68e4c211f2
discovery/azure: more robust handling of go routines (#5106)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-18 09:55:47 +01:00
Hrishikesh Barman a1f34bec2e Added CORS Origin flag (#5011)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-17 15:01:06 +00:00
Matt Layher c44cd7e166
Merge pull request #5102 from prometheus/mdl-gofmt
*: apply gofmt -s
2019-01-16 19:12:43 -05:00
Matt Layher 67c43f3054
Merge pull request #5101 from prometheus/mdl-no-fatal
pkg/runtime: use panic instead of log.Fatal for system call errors
2019-01-16 19:12:29 -05:00
Matt Layher a68d1f2688
Merge pull request #5100 from prometheus/mdl-lexer-subtests
promql: use subtests in TestLexer
2019-01-16 19:12:11 -05:00
Matt Layher 302148fd69 *: apply gofmt -s
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Matt Layher 28eff63eca pkg/runtime: use panic instead of log.Fatal for system call errors
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:03:30 -05:00