Commit graph

7834 commits

Author SHA1 Message Date
Simon Pasquier c8a1a5a93c
discovery/kubernetes: fix support for password_file and bearer_token_file (#5211)
* discovery/kubernetes: fix support for password_file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Create and pass custom RoundTripper to Kubernetes client

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use inline HTTPClientConfig

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 11:22:34 +01:00
Nguyen Van Duc 89d36a4bf6 Change http to https for security links (#5238)
Signed-off-by: vanduc95 <ducnguyenvan.bk@gmail.com>
2019-02-20 09:50:45 +00:00
Erik Hollensbe be3c082539 discovery/dns/dns.go: fix handling of truncated dns records
https://github.com/miekg/dns/pull/815 goes into the detail, but more or
less the existing solution was no longer supported and needed to be
rewritten to support the new versions of the library. miekg additionally
claims this is more correct in the ticket.

Signed-off-by: Erik Hollensbe <github@hollensbe.org>
2019-02-20 00:36:41 +00:00
Julius Volz f7332c4dcf
Merge pull request #5226 from prometheus/bootstrap4
Update to Bootstrap 4
2019-02-20 00:00:31 +00:00
Julius Volz 795c989d36 Merge branch 'master' into bootstrap4
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-19 22:32:55 +00:00
Palash Nigam 09208b1a58 queryRange: Add more descriptive error messages (#5229)
Fixes: https://github.com/prometheus/prometheus/issues/4811

Signed-off-by: Palash Nigam <npalash25@gmail.com>
2019-02-19 19:16:14 +00:00
Tom Wilkie 77d5a7d47a
LiveReader can get into an infinite loop on corrupt WALs. (#524)
Make WAL live tailer return EOF when the there is a half-written record at the end of the file.

Previously, this would cause an infinite loop as we ignored EOFs when filling the buffer.  We now differentiate between EOFs that read >0 bytes, and EOFs that didn't.

Add some more unit tests for tailing a corrupt WAL, and unify interfaces Reader and LiveReader for the purposes of testing.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-19 14:33:57 +00:00
Krasi Georgiev 8913617a9e
update makefile.common and run make all be default. (#529)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:55:01 +02:00
Krasi Georgiev a3c41f4256
use the default time retention value only when no size retention is set (#5216)
fixes https://github.com/prometheus/prometheus/issues/5213

Now that we have time and size base retention time bases should not have a default value. A default is set only when both - time and size flags are not set.

This change will not affect current installations that rely on the default time based value, and will avoid confusions when only the size retention is set and it is expected that the default time based setting would be no longer in place.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:53:43 +02:00
Krasi Georgiev 41dee81554
Makefile.common: add check_license by default. (#5236) 2019-02-19 13:25:10 +02:00
Tom Wilkie bc3b0bd429
Test to corrupt segments mid-WAL, repair and check we can read the correct number of records. (#528)
Test to corrupt segments mid-WAL, repair and check we can read the correct number of records.

Make segmentBufReader pad short segments with zeros, and only advance curr segment index after fully reading segment.
2019-02-18 19:05:07 +00:00
Simon Pasquier f9462d5d44 discovery/consul: pass current context to Consul queries
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-18 14:23:56 +01:00
Sylvain Rabot 87c79b0c81 Fix console templates (#5228)
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
2019-02-18 13:14:58 +00:00
Julius Volz fdbaef86df Re-add typeahead license header to minified file
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 22:50:20 +00:00
Julius Volz 661f0127bc Rebuild web assets
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 22:50:16 +00:00
Julius Volz ed635190ba Re-add popper.js to fix target label tooltips
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:57 +00:00
Julius Volz 7b724cea3a Whitespace and other cleanups
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:52 +00:00
Julius Volz 7244ef3783 Add more top/bottom spacing for All/Unhealthy buttons
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:48 +00:00
Julius Volz 028e99e3d6 Remove spacing between All/Unhealthy buttons
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:44 +00:00
Julius Volz 45b91e8e80 Fix copy&paste button on /config, move pre style to CSS
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-02-17 19:40:30 +00:00
Julius Volz cd569b51d9 Merge branch 'master' into bootstrap4 2019-02-17 17:22:41 +00:00
Simon Pasquier b41d6d54f2
storage/remote: increase timeouts for Travis CI (#5224)
* storage/remote: adapt tests for Travis CI

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Check filesystems on Travis environment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Run remote/storage tests on CircleCI for troubleshooting

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Try using tmpfs partition

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Revert "Try using tmpfs partition"

This reverts commit 85a30deb72.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Don't store labels in writeToMock

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Fix data race

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Bump retries to 100 meaning that the total timeout is 10s

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* clean up .travis.yml

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* code fixup

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Remove unneeded empty line

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-15 16:47:41 +01:00
Ganesh Vernekar c59ed492b2 Vertical query merging and compaction (#370)
* Vertical series iterator

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Select overlapped blocks first in compactor Plan()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Added vertical compaction

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Code cleanup and comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add benchmark for compaction

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Perform vertical compaction only when blocks are overlapping.

Actions for vertical compaction:
* Sorting chunk metas
* Calling chunks.MergeOverlappingChunks on the chunks

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Benchmark for vertical compaction

* BenchmarkNormalCompaction => BenchmarkCompaction
* Moved the benchmark from db_test.go to compact_test.go

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Benchmark for query iterator and seek for non overlapping blocks

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Vertical query merge only for overlapping blocks

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Simplify logging in Compact(...)

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Updated CHANGELOG.md

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Calculate overlapping inside populateBlock

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* MinTime and MaxTime for BlockReader.

Using this to find overlapping blocks in populateBlock()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Sort blocks w.r.t. MinTime in reload()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Log about overlapping in LeveledCompactor.write() instead of returning bool

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Log about overlapping inside LeveledCompactor.populateBlock()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Refactor createBlock to take optional []Series

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* review1

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* Updated CHANGELOG and minor nits

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* nits

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Updated CHANGELOG

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Refactor iterator and seek benchmarks for Querier.

Also has as overlapping blocks.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Additional test case

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* genSeries takes optional labels. Updated BenchmarkQueryIterator and BenchmarkQuerySeek.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Split genSeries into genSeries and populateSeries

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Check error in benchmark

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Warn about overlapping blocks in reload()

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-02-14 14:29:41 +01:00
Goutham Veeramachaneni b7594f650f
Merge pull request #5203 from codesome/shepherd
Propose myself (Ganesh, @codesome) as 2.8 release shepherd
2019-02-14 14:21:17 +01:00
Callum Styan 89ee5aaed4 clarify which segments are deleted when we find a corrupted segment (#522)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-14 05:44:19 +02:00
Bryan Boreham 74093f0508 Remove pointer indirection on chunk bstream (#499)
It isn't necessary since the member is never changed after
initialization.

Signed-off-by: Bryan Boreham <bryan@weave.works>
2019-02-13 23:41:12 +01:00
Simon Pasquier 12708acd15
scrape: catch errors when creating HTTP clients (#5182)
* scrape: catch errors when creating HTTP clients

This change makes sure that no scrape pool is created with a nil HTTP
client.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Tariq's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Address Brian's comment

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-13 14:24:22 +01:00
Alec 109252f3aa Update encoding_helpers.go (len of be64 should be 8) (#521) 2019-02-13 11:04:21 +02:00
Tom Wilkie e248ffb220 Add alert for WAL remote write falling behind.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 15:22:58 +00:00
Callum Styan 37e35f9e0c Various improvements to WAL based remote write.
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Tom Wilkie b93bafeee1 Various fixes to locking & shutdown for WAL-based remote write.
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate erros in decodeRecord, add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-02-12 11:39:13 +00:00
Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to acheive parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Krasi Georgiev 9f28ffa6f4
Merge pull request #465 from krasi-georgiev/shutdown-during-compaction
use context to cancel compactions
2019-02-12 11:25:40 +02:00
Krasi Georgiev bf79c767f0 new line
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-12 11:08:09 +02:00
Krasi Georgiev beee5c58f3 Merge remote-tracking branch 'upstream/master' into shutdown-during-compaction
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-12 10:56:45 +02:00
Krasi Georgiev 6fce018def
Merge pull request #512 from krasi-georgiev/delete-compact-block-on-reload-error
Delete compact block on reload error
2019-02-12 10:50:21 +02:00
Maria Nemtinova 8e3a39f725 Web UI QoL improvements (#5201)
1. Added an ability to resize text area on mouseclick
2. Remember selected target status button on page reload

Signed-off-by: Maria Nemtinova <nemtinovamasha@gmail.com>
2019-02-12 00:22:05 +01:00
Ganesh Vernekar ce69dcb0e5
Propose @codesome as 2.8 release shepherd
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2019-02-11 23:44:01 +05:30
Krasi Georgiev bf2239079d refactor multi errors
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-11 12:28:46 +02:00
Krasi Georgiev c3a5c1d891 refactor error handling
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-11 12:22:11 +02:00
Krasi Georgiev 07df4fd383 nits
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-11 11:25:57 +02:00
JoeWrightss 4cb6c202ff Fix fmt.Errorf error message (#5199)
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-10 15:16:20 +05:30
Tariq Ibrahim a2a6e24f9f show list of offending labels in the error message in many-to-many scenarios (#5189)
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-02-09 10:17:52 +01:00
Krasi Georgiev 0f8f5027ef remove nested for if
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-08 18:09:23 +02:00
Krasi Georgiev d48606827c simplify closers
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-08 13:35:32 +02:00
Krasi Georgiev e138c7ed7e Merge remote-tracking branch 'upstream/master' into delete-compact-block-on-reload-error
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-08 13:26:28 +02:00
Krasi Georgiev da9da9fbee fix the sleep logic
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-08 12:39:25 +02:00
Krasi Georgiev 0b72f9af4c
Merge pull request #270 from codesome/master
Head: don't create stones, delete samples directly
2019-02-08 12:35:01 +02:00
Minh-Long Do b26b5c9e96 Add rendering test of template based web endpoints (#5188)
Signed-off-by: Minh-Long  Do <minhlong.langos@gmail.com>
2019-02-08 10:17:47 +00:00
Krasi Georgiev 1e91d619de
Merge pull request #493 from simonpasquier/update-makefile-common
fix static check errors and removed a lot of unused code and vars.
updated Makefile.common
2019-02-08 11:47:06 +02:00