Commit graph

5522 commits

Author SHA1 Message Date
Palash Nigam 09208b1a58 queryRange: Add more descriptive error messages (#5229)
Fixes: https://github.com/prometheus/prometheus/issues/4811

Signed-off-by: Palash Nigam <npalash25@gmail.com>
2019-02-19 19:16:14 +00:00
Krasi Georgiev a3c41f4256
use the default time retention value only when no size retention is set (#5216)
fixes https://github.com/prometheus/prometheus/issues/5213

Now that we have time and size base retention time bases should not have a default value. A default is set only when both - time and size flags are not set.

This change will not affect current installations that rely on the default time based value, and will avoid confusions when only the size retention is set and it is expected that the default time based setting would be no longer in place.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:53:43 +02:00
Krasi Georgiev 41dee81554
Makefile.common: add check_license by default. (#5236) 2019-02-19 13:25:10 +02:00
Sylvain Rabot 87c79b0c81 Fix console templates (#5228)
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
2019-02-18 13:14:58 +00:00
Simon Pasquier b41d6d54f2
storage/remote: increase timeouts for Travis CI (#5224)
* storage/remote: adapt tests for Travis CI

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Check filesystems on Travis environment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run remote/storage tests on CircleCI for troubleshooting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Try using tmpfs partition

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "Try using tmpfs partition"

This reverts commit 85a30deb72.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Don't store labels in writeToMock

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix data race

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Bump retries to 100 meaning that the total timeout is 10s

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* clean up .travis.yml

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* code fixup

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Remove unneeded empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-15 16:47:41 +01:00
Goutham Veeramachaneni b7594f650f
Merge pull request #5203 from codesome/shepherd
Propose myself (Ganesh, @codesome) as 2.8 release shepherd
2019-02-14 14:21:17 +01:00
Simon Pasquier 12708acd15
scrape: catch errors when creating HTTP clients (#5182)
* scrape: catch errors when creating HTTP clients

This change makes sure that no scrape pool is created with a nil HTTP
client.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Tariq's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Brian's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-13 14:24:22 +01:00
Callum Styan 37e35f9e0c Various improvements to WAL based remote write.
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Tom Wilkie b93bafeee1 Various fixes to locking & shutdown for WAL-based remote write.
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate erros in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 11:39:13 +00:00
Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to acheive parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Maria Nemtinova 8e3a39f725 Web UI QoL improvements (#5201)
1. Added an ability to resize text area on mouseclick
2. Remember selected target status button on page reload

Signed-off-by: Maria Nemtinova <nemtinovamasha@gmail.com>
2019-02-12 00:22:05 +01:00
Ganesh Vernekar ce69dcb0e5
Propose @codesome as 2.8 release shepherd
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-02-11 23:44:01 +05:30
JoeWrightss 4cb6c202ff Fix fmt.Errorf error message (#5199)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-10 15:16:20 +05:30
Tariq Ibrahim a2a6e24f9f show list of offending labels in the error message in many-to-many scenarios (#5189)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-02-09 10:17:52 +01:00
Minh-Long Do b26b5c9e96 Add rendering test of template based web endpoints (#5188)
Signed-off-by: Minh-Long  Do <minhlong.langos@gmail.com>
2019-02-08 10:17:47 +00:00
Simon Pasquier fc10f6d814
Unset GO111MODULE variable in Makefile.common (#5191)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-07 17:22:04 +01:00
Goutham Veeramachaneni 9b8bbe3246
Merge pull request #5187 from prometheus/beorn7/release
Merge v2.7 bugfixes into master
2019-02-06 21:32:06 +01:00
beorn7 d26e134bd4 Merge branch 'release-2.7' into beorn7/release
Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 15:22:40 +01:00
Björn Rabenstein 3db36f34ec
Merge pull request #5186 from prometheus/beorn7/metrics
Fix prometheus_rule_group_last_evaluation_timestamp_seconds
2019-02-06 15:19:08 +01:00
beorn7 2db1eeb4ec Fix prometheus_rule_group_last_evaluation_timestamp_seconds
It should be a unix timestamp, not the seconds in the minute.

Signed-off-by: beorn7 <beorn@soundcloud.com>
2019-02-06 11:02:49 +01:00
zhulongcheng fd964426a7 web: predeclare and reuse errors (#5180)
Predeclare and reuse errors to reduce duplicate code

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-02-04 13:06:26 +01:00
zhulongcheng a75f8a8e05 update error message in extractTimeRange (#5179)
Update error message in the extractTimeRange function
to match function's logic

Signed-off-by: zhulongcheng <zhulongcheng.me@gmail.com>
2019-02-03 09:29:23 +00:00
JoeWrightss e158c53fa9 Fix some typos in comment (#5175)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-01 14:35:32 +00:00
Brian Brazil c66aeb3fff
In histogram_quantile merge buckets with equivalent le values (#5158)
This makes things generally more resilient, and will
help with OpenMetrics transitions (and inconsistencies).

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-02-01 10:22:44 +00:00
Simon Pasquier a60431f3cd Merge v2.7.1 into master (#5170)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-01 09:54:12 +01:00
Vishnunarayan K I 108b9b0e5f Limit number of merics in prometheus UI (#5139)
Signed-off-by: Vishnunarayan K I <appukuttancr@gmail.com>
2019-01-31 17:03:50 +00:00
Frederic Branczyk 50e1228f88
Merge pull request #5147 from prometheus/brancz-patch-1
docs: Add filesystem POSIX requirement
2019-01-31 16:20:46 +01:00
Frederic Branczyk 32079f351f
docs: Specifically call out NFS and POSIX
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2019-01-31 12:57:48 +01:00
Goutham Veeramachaneni 62e591f928
*: cut 2.7.1 (#5164)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-31 12:13:25 +01:00
Goutham Veeramachaneni b03d6f6eff
Remove custom highlight code, it's not needed. (#5163)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-31 11:27:18 +01:00
Ganesh Vernekar 10ae00ab9d Fix bug from #4898 (#5161)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:14:14 +01:00
Ganesh Vernekar 787eb1e904 Set rule_group_last_duration_seconds to seconds (#5153)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-01-31 11:07:58 +01:00
Frederic Branczyk 3de734d8de
docs: Add filesystem POSIX requirement
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2019-01-29 13:51:16 +01:00
Ganesh Vernekar a2ef8cf2f5
Merge pull request #5146 from tariq1890/ineff
fix ineffectual assignment in dns.go
2019-01-29 11:50:34 +05:30
tariqibrahim b173de0c26 fix ineffectual assignment in dns.go
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-28 17:15:43 -08:00
Jannick Fahlbusch ฏ๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎๎ 63f375e80a [FIX] Azure DS: Return error when request failed (#4719)
This fixes the issue that the error is swallowed when the request failed.

Signed-off-by: Jannick Fahlbusch <git@jf-projects.de>
2019-01-28 21:31:45 +00:00
Brian Brazil 1dd57765b4
Reduce time that alertmanagers are in flux when reloaded. (#5126)
This no longer waits for all of the scrape reload to complete
before getting a list of AMs again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-28 18:34:12 +00:00
Bryan Boreham 8841692a63 Use the context associated with the inner evaluation span (#5130)
Signed-off-by: Bryan Boreham <bryan@weave.works>
2019-01-28 18:33:30 +00:00
Tariq Ibrahim f4275d2352 Use the latest versions of azure go sdk and go-autorest (#5015)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-28 18:30:29 +00:00
Tariq Ibrahim bfcdba211f remove the prepended watch reactor from the fake k8s client (#5140)
Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>
2019-01-28 16:42:25 +01:00
Goutham Veeramachaneni 410ee9e04a
*: cut 2.7.0 (#5141)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-28 15:37:30 +05:30
Goutham Veeramachaneni 7f7b211047
*: cut 2.7.0-rc.2 (#5134)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-24 18:55:04 +05:30
Goutham Veeramachaneni b454ed3ec2
*: cut 2.7.0-rc.1 (#5123)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-21 18:47:37 +05:30
Goutham Veeramachaneni 4e83f91cfd
Rollback Dockerfile to version @ 2.5.x (#5122)
Fixes https://github.com/prometheus/prometheus/issues/5043

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-21 17:27:16 +05:30
Hrishikesh Barman 9c4e258651 corrected regex string check for anyorigin(*) (#5117)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-21 17:17:27 +05:30
Goutham Veeramachaneni 24f19f03db
*: cut 2.7.0-rc.0 (#5114)
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 22:16:02 +05:30
Goutham Veeramachaneni 4068968e12
Protect retention from overflowing (#5112)
Also sanitise the max block duration to max a month.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni 384cba1211
Add flag for size based retention (#5109)
* Add flag for size based retention

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Deprecate the old retention flag for a new one.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add ability to take a suffix for size flag

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Address feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 19:18:36 +05:30
Krasi Georgiev 3bd41cc92c Udpate tsdb to 0.4 (#5110)
* update tsdb to v0.4.0

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* remove unused struct field

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-01-18 16:32:14 +05:30
Simon Pasquier 68e4c211f2
discovery/azure: more robust handling of go routines (#5106)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-18 09:55:47 +01:00