prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-09-20 07:47:31 -07:00

Author	SHA1	Message	Date
Tom Wilkie	adf5307470	Update wal LiveReader to ensure EOF is correctly propagated. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	d6258aea8f	Fix up remote write tests: - Tests that created a QueueManager were leaving behind files at the end of tests. - WAL replaying (readToEnd) tests seem to require extra time to finish now. - Some fixes to make staticcheck happy Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	184f06a981	Combine the record decoding metrics into one; break out garbage collection into a separate function. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	859cda27ff	Remove some 'global' state, moving segment numbers to parameters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	bdc6b764b0	If reading the WAL fails, try again. Also, read from the segment containing the index for the last checkpoint, not the first segment. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	d6f911b511	Factor out logging ratelimit & dedupe middleware. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	a5c20642b3	Refactor WAL watcher to remove some duplication. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	37ad4db485	Export timestamps in seconds since epoch. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
JoeWrightss	362873f72b	Fix .Log() error message (#5257 ) Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>	2019-02-22 14:39:37 +00:00
Simon Pasquier	b41d6d54f2	storage/remote: increase timeouts for Travis CI (#5224 ) * storage/remote: adapt tests for Travis CI Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Check filesystems on Travis environment Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Run remote/storage tests on CircleCI for troubleshooting Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Try using tmpfs partition Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Revert "Try using tmpfs partition" This reverts commit `85a30deb72`. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Don't store labels in writeToMock Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Fix data race Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Bump retries to 100 meaning that the total timeout is 10s Signed-off-by: Simon Pasquier <spasquie@redhat.com> * clean up .travis.yml Signed-off-by: Simon Pasquier <spasquie@redhat.com> * code fixup Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Remove unneeded empty line Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-15 16:47:41 +01:00
Callum Styan	37e35f9e0c	Various improvements to WAL based remote write. - Use the queue name in WAL watcher logging. - Don't return from watch if the reader error was EOF. - Fix sample timestamp check logic regarding what samples we send. - Refactor so we don't need readToEnd/readSeriesRecords - Fix wal_watcher tests since readToEnd no longer exists Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Tom Wilkie	b93bafeee1	Various fixes to locking & shutdown for WAL-based remote write. - Remove datarace in the exported highest scrape timestamp. - Backoff on enqueue should be per-sample - reset the result for each sample. - Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor. - Reorder functions in WALWatcher depth-first according to call graph. - Fix vendor/modules.txt. - Split out the various timer periods into consts at the top of the file. - Move w.currentSegmentMetric.Set close to where we set the currentSegment. - Combine r.Next() and isClosed(w.quit) into a single loop. - Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read. - Reorganise checkpoint handling to reduce nesting and make it easier to follow. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-12 11:39:13 +00:00
Callum Styan	6f69e31398	Tail the TSDB WAL for remote_write This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down. We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes. Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases. As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s). This change also includes the following optimisations: - only marshal the proto request once, not once per retry - maintain a single copy of the labels for given series to reduce GC pressure Other minor tweaks: - only reshard if we've also successfully sent recently - add pending samples, latest sent timestamp, WAL events processed metrics Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype) Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Goutham Veeramachaneni	384cba1211	Add flag for size based retention (#5109 ) * Add flag for size based retention Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Deprecate the old retention flag for a new one. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Add ability to take a suffix for size flag Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Address feedback Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2019-01-18 19:18:36 +05:30
Krasi Georgiev	3bd41cc92c	Update tsdb to 0.4 (#5110 ) * update tsdb to v0.4.0 Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * remove unused struct field Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2019-01-18 16:32:14 +05:30
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	2019-01-16 17:28:14 -05:00
Callum Styan	5358f76c5c	update remote write path proto so that Labels/Timeseries can't be nil (#4957 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-01-15 19:13:39 +00:00
Simon Pasquier	f678e27eb6	: use latest release of staticcheck (#5057 ) : use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-01-04 14:47:38 +01:00
glutamatt	5ddde1965b	tune the "Wal segment size" with a flag (#5029 ) Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size" to tune the max size of wal segment files. The addressed base problem is to reduce the disk space used by wal segment files : on a raspberry pi, for instance, we often want to reduce write load of the sd card, then, the wal directory is mounted on a memory (space limited) partition. the default value of the segment max file size, pushed the size of directory to 128 MB for each segment , which is too much ram consumption on a rasp. the initial discussion is at https://github.com/prometheus/tsdb/pull/450	2019-01-03 17:13:21 +03:00
Tom Wilkie	6e08029b56	Move err to be the last return value from storage.Select. (#5054 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-01-02 11:10:13 +00:00
AixesHunter	806632790e	update inconsistent comment (#5046 ) Co-Authored-By: aixeshunter <44970652+aixeshunter@users.noreply.github.com> Signed-off-by: aixeshunter <aixeshunter@gmail.com>	2018-12-27 14:02:36 +00:00
Bartek Płotka	62c8337e77	Moved configuration into `relabel` package. (#4955 ) Adapted top dir relabel to use pkg relabel structs. Removal of this in a separate tracked here: https://github.com/prometheus/prometheus/issues/3647 Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-12-18 11:26:36 +00:00
Alin Sinpalean	44bec482fb	Minor optimization for BufferedSeriesIterator: actually drop the samples falling outside of the new delta from the underlying sampleRing, when ReduceDelta is called. (#4849 ) Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-12-18 11:25:45 +00:00
Alin Sinpalean	d6adfe2ae2	Use a fake SeriesIterator (that generates samples on the fly instead of using a slice) for BufferedSeriesIterator, to reduce the variance of benchmark results due to memory pressure. (#4847 ) Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-12-18 11:22:33 +00:00
Ryota Arai	135d580ab2	Introduce min_shards for remote write to set minimum number of shards. (#4924 ) Signed-off-by: Ryota Arai <ryota.arai@gmail.com>	2018-12-04 17:32:14 +00:00
mknapphrt	f0e9196dca	Return warnings on a remote read fail (#4832 ) Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>	2018-11-30 14:27:12 +00:00
Ben Kochie	c6399296dc	Fix spelling/typos (#4921 ) * Fix spelling/typos Fix spelling/typos reported by codespell/misspell. * UK -> US spelling changes. Signed-off-by: Ben Kochie <superq@gmail.com>	2018-11-27 17:44:29 +01:00
Daniele Sluijters	f25a6baedb	remote: Set User-Agent header in requests (#4891 ) Currently Prometheus requests show up with a UA of Go-http-client/1.1 which isn't super helpful. Though the X-Prometheus-Remote-* headers exist they need to be explicitly configured when logging the request in order to be able to deduce this is a request originating from Prometheus. By setting the header we remove this ambiguity and make default server logs just a bit more useful. This also updates a few other places to consistently capitalize the 'P' in the user agent, as well as ensure we set a UA to begin with. Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>	2018-11-23 22:49:49 +08:00
Krasi Georgiev	bd100182b2	added tsdb/head mint maxt metrics (#4888 ) added the head metrics with the correct suffix. Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2018-11-21 12:57:32 +02:00
Simon Pasquier	ed19373a78	: remove use of golang.org/x/net/context (#4869 ) : remove use of golang.org/x/net/context Signed-off-by: Simon Pasquier <spasquie@redhat.com> scrape: fix TestTargetScrapeScrapeCancel Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-11-19 12:31:16 +01:00
Ganesh Vernekar	ca93fd544b	/api/v1/labels endpoint for getting all label names (#4835 ) * vendor: update tsdb Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * /api/v1/labels endpoint Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * regex matchers for API Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Matchers behaving as OR Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Removed the matchers Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * vendor: update tsdb using go mod Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * vendor update: tsdb Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Added LabelNames() to storage.Querier Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Test for api.labelNames Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Nits Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-11-19 15:51:14 +05:30
fengyuceNv	94fff219ab	improve remote storage enqueue performance (#4772 ) Signed-off-by: fyc <fyc22788@ly.com>	2018-11-13 12:19:05 +00:00
Tariq Ibrahim	3f7ed7de49	Adding new metric type to track in-flight remote read queries. (#4677 ) Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>	2018-10-10 14:48:32 -07:00
Tom Wilkie	d3a1ff1abf	Reduce memory usage of remote read by reducing pointer usage. (#4655 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-09-25 19:14:00 +01:00
yzpeninsula	4ae3bce260	Fix typo (#4497 ) Signed-off-by: yzpeninsula <yzpeninsula@gmail.com>	2018-09-13 16:04:10 +05:30
Tom Wilkie	457e4bb58e	Limit the number of samples remote read can return. (#4532 ) * Limit the number of samples remote read can return. - Return 413 entity too large. - Limit can be set by a flag. Allow 0 to mean no limit. - Include limit in error message. - Set default limit to 50M (* 16 bytes = 800MB). Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-09-05 15:50:50 +02:00
Daisy T	7d01ead689	change time.duration to model.duration for standardization (#4479 ) Signed-off-by: Daisy T <daisyts@gmx.com>	2018-08-24 16:55:21 +02:00
Julius Volz	8fbe1b5133	Handle a bunch of unchecked errors (#4461 ) There are many more (mostly finalizers like Close/Stop/etc.), but most of the others seemed like one couldn't do much about them anyway. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-08-17 17:24:35 +02:00
Henri DF	ffb7836c14	Send "Accept-Encoding" header in read request (#4421 ) We should be doing this since we only accept Snappy-encoded responses. Signed-off-by: Henri DF <henridf@gmail.com>	2018-07-26 12:45:04 +01:00
Henri DF	3abb2cc349	Fix typo (#4423 ) Signed-off-by: Henri DF <henridf@gmail.com>	2018-07-26 08:49:53 +01:00
Alin Sinpalean	372e7652b7	Reuse (copy) overlapping matrix samples between range evaluation steps (#4315 ) * Reuse (copy) overlapping matrix samples between range evaluation steps. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-07-18 11:14:02 +01:00
Goutham Veeramachaneni	c28cc5076c	Saner defaults and metrics for remote-write (#4279 ) * Rename queueCapacity to shardCapacity * Saner defaults for remote write * Reduce allocs on retries Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2018-07-18 05:15:16 +01:00
Alin Sinpalean	e3b775b78b	Simplify BufferedSeriesIterator usage (#4294 ) * Allow for BufferedSeriesIterator instances to be created without an underlying iterator, to simplify their usage. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-07-18 05:10:28 +01:00
Thomas Jackson	92c6f0c92e	Add offset to selectParams (#4226 ) * Add Start/End to SelectParams * Make remote read use the new selectParams for start/end This commit will continue sending the start/end time of the remote read query as the overarching promql time, and the specific range of data that the query is interested in receiving a response to is now part of the ReadHints (upstream discussion in #4226). * Remove unused vendored code The genproto.sh script was updated, but the code wasn't regenerated. This simply removes the vendored deps that are no longer part of the codegen output. Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>	2018-07-18 04:58:00 +01:00
Brian Brazil	fb695fb435	Merge pull request #4285 from prometheus/release-2.3 Merge release-2.3 back to master	2018-06-20 14:51:00 +01:00
Tom Wilkie	b8217720ac	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-06-19 13:03:01 +01:00
Corentin Chary	db9dbeeaec	federation: nil pointer dereference when using remote read ``` level=error ts=2018-06-13T07:19:04.515149169Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56202: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.516199547Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56204: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.51717692Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56206: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.564952878Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56208: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.566575791Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56210: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.567106063Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56212: runtime error: invalid memory address or nil pointer dereference" ``` When remote read is enabled, federation will call `q.Select(nil, mset...)` which will break remote reads because it currently doesn't handle empty SelectParams. Signed-off-by: Corentin Chary <c.chary@criteo.com>	2018-06-19 13:03:01 +01:00
Brian Brazil	78efdc6d6b	Avoid infinite loop on duplicate NaN values. (#4275 ) Fixes #4254 NaNs don't equal themselves, so a duplicate NaN would always hit the break statement and never get popped. We should not be returning multiple data points for the same timestamp, so don't compare values at all. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2018-06-18 17:34:08 +01:00
Tom Wilkie	0b189b2da9	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-06-18 17:21:12 +01:00
Corentin Chary	530107f8ef	federation: nil pointer dereference when using remote read ``` level=error ts=2018-06-13T07:19:04.515149169Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56202: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.516199547Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56204: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.51717692Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56206: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.564952878Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56208: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.566575791Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56210: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.567106063Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56212: runtime error: invalid memory address or nil pointer dereference" ``` When remote read is enabled, federation will call `q.Select(nil, mset...)` which will break remote reads because it currently doesn't handle empty SelectParams. Signed-off-by: Corentin Chary <c.chary@criteo.com>	2018-06-18 17:21:12 +01:00
Andreas Auernhammer	37d1bcf495	limit size of POST requests against remote read endpoint (#4239 ) This commit fixes a denial-of-service issue of the remote read endpoint. It limits the size of the POST request body to 32 MB such that clients cannot write arbitrary amounts of data to the server memory. Fixes #4238 Signed-off-by: Andreas Auernhammer <aead@mail.de>	2018-06-08 08:19:20 +01:00
Fabian Reinartz	fe80dddbc4	Merge pull request #4210 from bboreham/log-remote-name Add queue name to logger for remote writes	2018-06-04 15:49:39 +02:00
Brian Brazil	dd6781add2	Optimise PromQL (#3966 ) * Move range logic to 'eval' Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make aggregate range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * PromQL is statically typed, so don't eval to find the type. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Extend rangewrapper to multiple exprs Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Start making function evaluation ranged Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make instant queries a special case of range queries Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Eliminate evalString Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Evaluate range vector functions one series at a time Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make unary operators range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make binops range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Pass time to range-aware functions. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make simple _over_time functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reduce allocs when working with matrix selectors Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add basic benchmark for range evaluation Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse objects for function arguments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Do dropmetricname and allocating output vector only once. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add range-aware support for range vector functions with params Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise holt_winters, cut cpu and allocs by ~25% Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make rate&friends range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make more functions range aware. Document calling convention. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make date functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make simple math functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Convert more functions to be range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make more functions range aware Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Specialcase timestamp() with vector selector arg for range awareness Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove transition code for functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove the rest of the engine transition code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove more obsolete code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove the last uses of the eval* functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove engine finalizers to prevent corruption The finalizers set by matrixSelector were being called just before the value they were returning to the pool was then being provided to the caller. Thus a concurrent query could corrupt the data that the user has just been returned. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add new benchmark suite for range functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Migrate existing benchmarks to new system Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Expand promql benchmarks Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Simplify test by removing unused range code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * When testing instant queries, check range queries too. To protect against subsequent steps in a range query being affected by the previous steps, add a test that evaluates an instant query that we know works again as a range query with the timestamp we care about not being the first step. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse ring for matrix iters. Put query results back in pool. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse buffer when iterating over matrix selectors Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Unary minus should remove metric name Cut down benchmarks for faster runs. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reduce repetition in benchmark test cases Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Work series by series when doing normal vectorSelectors Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise benchmark setup, cuts time by 60% Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Have rangeWrapper use an evalNodeHelper to cache across steps Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Use evalNodeHelper with functions Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Cache dropMetricName within a node evaluation. This saves both the calculations and allocs done by dropMetricName across steps. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse input vectors in rangewrapper Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Reuse the point slices in the matrixes input/output by rangeWrapper Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make benchmark setup faster using AddFast Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Simplify benchmark code. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add caching in VectorBinop Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Use xor to have one-level resultMetric hash key Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Add more benchmarks Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Call Query.Close in apiv1 This allows point slices allocated for the response data to be reused by later queries, saving allocations. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise histogram_quantile It's now 5-10% faster with 97% less garbage generated for 1k steps Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make the input collection in rangeVector linear rather than quadratic Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise label_replace, for 1k steps 15x fewer allocs and 3x faster Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Optimise label_join, 1.8x faster and 11x less memory for 1k steps Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Expand benchmarks, cleanup comments, simplify numSteps logic. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Address Fabian's comments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Comments from Alin. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Address jrv's comments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Remove dead code Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Address Simon's comments. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Rename populateIterators, pre-init some sizes Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Handle case where function has non-matrix args first Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Split rangeWrapper out to rangeEval function, improve comments Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Cleanup and make things more consistent Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Make EvalNodeHelper public Signed-off-by: Brian Brazil <brian.brazil@robustperception.io> * Fabian's comments. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2018-06-04 15:47:45 +02:00
Bryan Boreham	3277aeefaa	Add queue name to logger for remote writes More than one remote_write destination can be configured, in which case it's essential to know which one each log message refers to. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2018-06-01 13:04:00 +00:00
Tom Wilkie	b58199bf12	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-29 11:35:43 +01:00
Tom Wilkie	3353bbd018	Add proper unclean shutdown handling with a cancellable context. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-29 09:51:29 +01:00
Tom Wilkie	e51d6c4b6c	Make remote flush deadline a command line param. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-23 15:06:01 +01:00
Tom Wilkie	a6c353613a	Make the flush deadline configurable. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-23 15:04:36 +01:00
Tom Wilkie	aa17263edd	Remove WaitGroup and extra goroutine. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-23 15:04:34 +01:00
Tom Wilkie	f3c61f8bb2	Only give remote queues 1 minute to flush samples on shutdown. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-23 15:04:32 +01:00
Tom Wilkie	ba418780be	Dedupe samples in the mergeIterator. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-23 12:15:47 +01:00
Henri DF	2952387ed1	Pass query hints down into remote read query proto (#4122 ) Signed-off-by: Henri DF <henridf@gmail.com>	2018-05-08 09:48:13 +01:00
Adam Shannon	809881d7f5	support reading basic_auth password_file for HTTP basic auth (#4077 ) Issue: https://github.com/prometheus/prometheus/issues/4076 Signed-off-by: Adam Shannon <adamkshannon@gmail.com>	2018-04-25 18:19:06 +01:00
Mario Trangoni	464e747f1e	fix some comments typos (#4059 )	2018-04-08 10:51:54 +01:00
ferhat elmas	ec8e4d8a7c	all: remove unnecessary type conversions (#3992 ) except promql, so as not to create a conflict with #3966.	2018-03-21 09:25:22 +00:00
Tom Wilkie	02a154ced6	Merge pull request #3941 from prometheus/3809-correctly-stop-timer Correctly stop the timer used in the remote write path.	2018-03-13 09:05:52 +00:00
Tom Wilkie	dc860e7d0e	Fix nit.	2018-03-12 16:48:51 +00:00
Tom Wilkie	390b018c90	Test sample timeout delivery.	2018-03-12 15:35:43 +00:00
Tom Wilkie	22d820ef8e	Review feedback.	2018-03-12 14:27:48 +00:00
Brian Brazil	a8c22c85cc	Correctly handle pruning wraparound after ring expansion (#3942 ) Fixes #3939	2018-03-12 13:16:59 +00:00
Tom Wilkie	f8c9d375b6	Correctly stop the timer used in the remote write path.	2018-03-09 12:00:26 +00:00
ferhat elmas	ffa673f7d8	General simplifications (#3887 ) Another try as in #1516	2018-02-26 07:58:10 +00:00
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	2018-02-13 12:17:22 +01:00
Tom Wilkie	a730083cbf	Merge pull request #3731 from bboreham/reuse-timer Re-use timer in remote storage queue	2018-02-05 10:54:08 +01:00
Krasi Georgiev	b75428ec19	rename package retrieve to scrape no functional changes, just renaming retrieval to scrape	2018-02-01 09:55:07 +00:00
Tom Wilkie	3dc5b8eef5	Use sub benchmarks.	2018-01-29 11:37:48 +00:00
Tom Wilkie	da29c09dca	Some benchmarks for the mergeSeries set.	2018-01-26 11:01:59 +00:00
Tom Wilkie	749781edf3	Also, don't make a mergeSeriesSet if there is only one SeriesSet.	2018-01-25 11:17:16 +00:00
Tom Wilkie	48e39068bd	Don't allocate a mergeSeries if there is only one series to merge.	2018-01-25 11:11:55 +00:00
Bryan Boreham	8a4535e6ad	Re-use timer instead of creating new ones on every sample The docs for `time.After()` note that "The underlying Timer is not recovered by the garbage collector until the timer fires".	2018-01-24 12:36:29 +00:00
Tom Wilkie	f2c5399e39	Merge pull request #3561 from twiedenbein/master fixed bug with initialization of queueconfig	2018-01-17 12:24:58 +00:00
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674 ) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	2018-01-11 16:10:25 +01:00
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629 ) * refactor: move targetGroup struct and CheckOverflow() to their own package * refactor: move auth and security related structs to a utility package, fix import error in utility package * refactor: Azure SD, remove SD struct from config * refactor: DNS SD, remove SD struct from config into dns package * refactor: ec2 SD, move SD struct from config into the ec2 package * refactor: file SD, move SD struct from config to file discovery package * refactor: gce, move SD struct from config to gce discovery package * refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil * refactor: consul, move SD struct from config into consul discovery package * refactor: marathon, move SD struct from config into marathon discovery package * refactor: triton, move SD struct from config to triton discovery package, fix test * refactor: zookeeper, move SD structs from config to zookeeper discovery package * refactor: openstack, remove SD struct from config, move into openstack discovery package * refactor: kubernetes, move SD struct from config into kubernetes discovery package * refactor: notifier, use targetgroup package instead of config * refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup * refactor: retrieval, use targetgroup package instead of config.TargetGroup * refactor: storage, use config util package * refactor: discovery manager, use targetgroup package instead of config.TargetGroup * refactor: use HTTPClient and TLS config from configUtil instead of config * refactor: tests, use targetgroup package instead of config.TargetGroup * refactor: fix tagetgroup.Group pointers that were removed by mistake * refactor: openstack, kubernetes: drop prefixes * refactor: remove import aliases forced due to vscode bug * refactor: move main SD struct out of config into discovery/config * refactor: rename configUtil to config_util * refactor: rename yamlUtil to yaml_config * refactor: kubernetes, remove prefixes * refactor: move the TargetGroup package to discovery/ * refactor: fix order of imports	2017-12-29 21:01:34 +01:00
Ed Schouten	bb724f1bef	Deprecate DeduplicateSeriesSet() in favor of NewMergeSeriesSet(). Federation makes use of dedupedSeriesSet to merge SeriesSets for every query into one output stream. If many match[] arguments are provided, many dedupedSeriesSet objects will get chained. This has the downside of causing a potential O(nk) running time, where n is the number of series and k the number of match[] arguments. In the mean time, the storage package provides a mergeSeriesSet that accomplishes the same with an O(nlog(k)) running time by making use of a binary heap. Let's just get rid of dedupedSeriesSet and change all existing callers to use mergeSeriesSet.	2017-12-10 19:51:20 +01:00
Tom Wiedenbein	937ac8c060	fixed bug with initialization of queueconfig QueueConfigs would only ever initialize to the default settings, and would not pick up their respective values from YAML.	2017-12-08 02:11:45 -08:00
Fabian Reinartz	83cd270ea4	*: adapt to storage interface changes	2017-11-23 19:05:04 +01:00
Tobias Schmidt	7098c56474	Add remote read filter option For special remote read endpoints which have only data for specific queries, it is desired to limit the number of queries sent to the configured remote read endpoint to reduce latency and performance overhead.	2017-11-13 23:30:01 +01:00
Tobias Schmidt	434f0374f7	Refactor remote storage querier handling * Decouple remote client from ReadRecent feature. * Separate remote read filter into a small, testable function. * Use storage.Queryable interface to compose independent functionalities.	2017-11-13 23:19:15 +01:00
Tobias Schmidt	9b0091d487	Add storage.Queryable and storage.QueryableFunc In order to compose different querier implementations more easily, this change introduces a separate storage.Queryable interface grouping the query (Querier) function of the storage. Furthermore, it adds a QueryableFunc type to ease writing very simple queryable implementations.	2017-11-13 20:19:37 +01:00
Julius Volz	9f10c63cff	Fix remote read labelset corruption (#3456 ) The labelsets returned from remote read are mutated in higher levels (like seriesFilter.Labels()) and since the concreteSeriesSet didn't return a copy, the external mutation affected the labelset in the concreteSeries itself. This resulted in bizarre bugs where local and remote series would show with identical label sets in the UI, but not be deduplicated, since internally, a series might come to look like: {__name__="node_load5", instance="192.168.1.202:12090", job="node_exporter", node="odroid", node="odroid"} (note the repetition of the last label)	2017-11-12 00:47:47 +01:00
Krasi Georgiev	5d8f93a22a	now using only github.com/gogo/protobuf bumped all grpc-gateway packages to v1.2.2 updated and run the denproto.sh script	2017-11-02 11:31:57 +00:00
Fabian Reinartz	30e777d10d	tsdb: default too small max block duration	2017-10-30 12:09:56 +01:00
Tom Wilkie	48a7a00a38	Fast path the merge querier (#3358 ) * Fast path the merge querier such that it is completely removed from query path when there is no remote storage. * Add NoopQuerier * Add copyright notice. * Avoid global, use a function.	2017-10-27 13:29:05 +02:00
Tom Wilkie	0e572686db	Revert "Bypass the fanout storage merging if no remote storage is configured."	2017-10-26 16:09:39 +01:00
Tom Wilkie	1af3ef431d	s/TestRemoveLabels/TestSeriesSetFilter/	2017-10-26 13:50:39 +01:00
Tom Wilkie	9c3c98e8de	Revert "Port 'Don't disable HTTP keep-alives for remote storage connections.' to 2.0 (see #3173 )" This reverts commit `0997191b18`.	2017-10-26 13:43:48 +01:00
Tom Wilkie	746752b946	Merge external labels in order.	2017-10-26 11:44:49 +01:00
Tom Wilkie	6e4d4ea402	Initialise some counters in remote storage API.	2017-10-26 11:09:45 +01:00
Tom Wilkie	2ae04d0e79	Add license header.	2017-10-26 11:09:16 +01:00
Tom Wilkie	e8c264e47a	Add comment.	2017-10-26 11:09:16 +01:00
Tom Wilkie	ee011d906d	Port remote read server to 2.0.	2017-10-26 11:09:14 +01:00
Bryan Boreham	0997191b18	Port 'Don't disable HTTP keep-alives for remote storage connections.' to 2.0 (see #3173 ) Removes configurability introduced in #3160 in favour of hard-coding, per advice from @brian-brazil.	2017-10-26 11:08:33 +01:00
Tom Wilkie	56820726fa	Move a couple of the encoding/decoding functions into codec.go	2017-10-26 11:08:33 +01:00
Conor Broderick	08b7328669	Port Metric name validation to 2.0 (see #2975 )	2017-10-26 11:08:33 +01:00
Tom Wilkie	8fe0212ff7	Port 'Make queue manager configurable.' to 2.0, see #2991	2017-10-26 11:08:33 +01:00
Tom Wilkie	3760f56c0c	remote: Expose ClientConfig type (see #3165 )	2017-10-26 11:08:33 +01:00
Tom Wilkie	16f71a7723	Port codec.go over from 1.8 branch.	2017-10-26 11:08:33 +01:00
Fabian Reinartz	e53040e2ac	Merge pull request #3339 from tomwilkie/3065-remote-read-bypass Bypass the fanout storage merging if no remote storage is configured.	2017-10-26 09:14:26 +02:00
Fabian Reinartz	bf56ad4233	Merge branch 'master' into master	2017-10-26 09:06:12 +02:00
Paul Gier	c4c3205d76	storage/tsdb: check that max block duration is larger than min If the user accidentally sets the max block duration smaller than the min, the current error is not informative. This change just performs the check earlier and improves the error message.	2017-10-25 19:24:49 -05:00
Fabian Reinartz	ce63a5a855	Merge pull request #3352 from prometheus/rc2 Cut v2.0.0-rc.2	2017-10-25 20:39:39 +02:00
Thibault Chataigner	fc4406201e	Tsdb StartTime : Use a simpler way to compute StartTime	2017-10-25 17:41:00 +02:00
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333 ) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	2017-10-24 21:21:42 -07:00
Tom Wilkie	4bbef0ec30	Bypass the fanout storage merging if no remote storage is configured.	2017-10-23 21:34:53 +01:00
Fabian Reinartz	a57ea79660	Close index reader properly	2017-10-23 21:59:18 +02:00
Julius Volz	c3d6abc8e6	Fix some lint errors (#3334 ) I left the promql ones and some others untouched as I remember that @fabxc prefers them that way.	2017-10-23 14:57:30 +01:00
Julius Volz	2846d62573	Fix staticcheck issue in test (#3331 ) staticcheck fails with: storage/remote/read_test.go:199:27: do not pass a nil Context, even if a function permits it; pass context.TODO if you are unsure about which Context to use (SA1012)	2017-10-23 11:51:48 +01:00
Brian Brazil	4a50f547c8	removeLabels needs a pointer to work. (#3326 )	2017-10-21 08:29:03 +01:00
Thibault Chataigner	bf4a279a91	Remote storage reads based on oldest timestamp in primary storage (#3129 ) Currently all read queries are simply pushed to remote read clients. This is fine, except for remote storage for which it is inefficient and makes queries slower even if remote read is unnecessary. So we need instead to compare the oldest timestamp in primary/local storage with the query range lower boundary. If the oldest timestamp is older than the mint parameter, then there is no need for remote read. This is an optional behavior per remote read client. Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>	2017-10-18 12:08:14 +01:00
Julius Volz	9ef8518b37	Remove "package remote" garbage from license headers (#3304 )	2017-10-17 02:26:38 +01:00
Tobias Schmidt	721050c6cb	Update prometheus/tsdb dependency	2017-10-16 15:36:25 +02:00
Julius Volz	33c1171b9c	Don't add anchoring to exported `Value` matcher field Instead, just make the anchoring part of the internal regex. This helps because some users will want to read back the `Value` field and expect it to be the same as the input value (e.g. some tests in Cortex), or use the value in another context which is already expected to add its own anchoring, leading to superfluous double anchoring (such as when we translate matchers into remote read request matchers).	2017-10-10 10:10:21 -07:00
Brian Brazil	73dc96e7f5	Fix leak of ticker in remote storage queue manager.	2017-10-09 19:44:03 +01:00
Brian Brazil	ee88f0d222	Ensure all values are used or _	2017-10-09 19:44:03 +01:00
Brian Brazil	37ec2d5283	Fix off by one error in concreteSeriesSet (#3262 )	2017-10-09 13:37:58 +01:00
Marc Sluiter	6a633eece1	Added go-conntrack for monitoring http connections (#3241 ) Added metrics for in- and outgoing traffic with go-conntrack.	2017-10-06 11:22:19 +01:00
Julius Volz	f7e8348a88	Re-add contexts to storage.Storage.Querier() (#3230 ) * Re-add contexts to storage.Storage.Querier() These are needed when replacing the storage by a multi-tenant implementation where the tenant is stored in the context. The 1.x query interfaces already had contexts, but they got lost in 2.x. * Convert promql.Engine to use native contexts	2017-10-04 21:04:15 +02:00
Fabian Reinartz	7b02bfee0a	web: start web handler while TSDB is starting up	2017-09-20 15:03:19 +02:00
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	2017-09-08 22:01:51 +05:30
Fabian Reinartz	0efecea6d4	Adapt storage APIs to uint64 references	2017-09-07 14:14:41 +02:00
Fabian Reinartz	0c81d5f719	storage: instantiate correct block ranges	2017-08-24 12:36:07 +02:00
Fabian Reinartz	2037778d14	vendor: update TSDB	2017-08-10 14:51:02 +02:00
Tom Wilkie	b11bc8ae24	Fix some comments.	2017-08-01 11:19:35 +01:00
Tom Wilkie	ec999ff397	Prevent number of remote write shards from going negative. This can happen in the situation where the system scales up the number of shards massively (to deal with some backlog), then scales it down again as the number of samples sent during the time period is less than the number received.	2017-07-19 16:32:09 +01:00
Tom Wilkie	a09acdcc5b	Make concreteSeriersIterator behave.	2017-07-13 18:33:08 +01:00
Tom Wilkie	994a7f27d6	Propagate errors through mergeSeriesSet correctly.	2017-07-13 15:02:01 +01:00
Tom Wilkie	2e0d8487e3	Return zeros if At() is called after Next() returns false.	2017-07-13 14:40:29 +01:00
Tom Wilkie	014bd31a86	Remove unnecessary whitespace changes, add comment.	2017-07-13 11:26:46 +01:00
Tom Wilkie	98ac07f86a	Add unit test for the merging on the read path.	2017-07-13 11:05:38 +01:00
Tom Wilkie	b568ace7ce	Move protos to ./prompb	2017-07-12 22:06:35 +01:00
Tom Wilkie	96e25adc8d	Introduce 'primary' storage in fanout, and have Add return the ref from the primary. Also, ensure all append batches are rolled back when a commit or rollback fails.	2017-07-12 15:51:05 +01:00
Tom Wilkie	db8128ceeb	Add label set as first parameter to AddFast, ignored by TSDB adapter.	2017-07-12 15:20:12 +01:00
Tom Wilkie	2dda5775e3	Initial port of remote storage to v2.	2017-07-12 12:27:57 +01:00
Fabian Reinartz	16464c3a33	Merge pull request #2910 from prometheus/adminapi Admin API	2017-07-11 17:15:49 +02:00
Fabian Reinartz	ccf9e62972	*: add admin grpc API	2017-07-10 09:14:14 +02:00
Goutham Veeramachaneni	243419c007	Return tsdb.ErrOutOfBounds as storage.ErrOutOfBounds Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-07-06 14:18:31 +02:00
Goutham Veeramachaneni	3069bd3996	Handle scrapes with OutOfBounds metrics better fixes #2894 Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>	2017-07-04 11:24:13 +02:00
Goutham Veeramachaneni	d407bd150c	Consolidate the duration params in CLI * All CLI params moved to model.Duration Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 20:20:57 +05:30
Goutham Veeramachaneni	baf5b0f0fc	Fix error where we look into the future. (#2829 ) * Fix error where we look into the future. So currently we are adding values that are in the future for an older timestamp. For example, if we have [(1, 1), (150, 2)] we will end up showing [(1, 1), (2,2)]. Further it is not advisable to call .At() after Next() returns false. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Return early if done Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Handle Seek() where we reach the end of iterator Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Simplify code Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-13 07:22:27 +02:00
Brian Brazil	c02c25d5ba	Allow peeking back further in buffer.	2017-05-24 14:27:17 +01:00
Fabian Reinartz	d289dc55c3	storage: update TSDB	2017-05-22 11:53:08 +02:00
Fabian Reinartz	9b175d48cb	Add flag to disable TSDB lock file	2017-05-09 12:56:51 +02:00
Fabian Reinartz	0f3110487d	Merge remote-tracking branch 'origin/dev-2.0' into dev-2.0	2017-04-27 10:25:04 +02:00
Fabian Reinartz	37deb21c45	vendor: remove unused dependency and last ref to fabxc/tsdb	2017-04-27 10:23:34 +02:00
Brian Brazil	5c9a6ce747	Add license to files. This should fix CI for dev-2.0.	2017-04-19 13:46:22 +01:00
Fabian Reinartz	8ffc851147	Merge branch 'master' into dev-2.0	2017-04-04 15:17:56 +02:00
Fabian Reinartz	cfb2a7f1d5	vendor: sync organisation migration of tsdb	2017-04-04 11:33:51 +02:00
Fabian Reinartz	bbcf20ba01	web: deduplicate series in federation	2017-04-04 11:20:23 +02:00
Fabian Reinartz	4e41987bcb	storage: add deduplication function This adds a function to deduplicate two series sets given that duplicate series have equivalent data points.	2017-04-04 11:07:21 +02:00
Björn Rabenstein	50e4f49b7e	Merge pull request #2561 from prometheus/beorn7/storage2 storage: Evict unused chunk.Descs in crash recovery	2017-04-04 00:05:03 +02:00
beorn7	08fc6cbd39	storage: Evict unused chunk.Descs in crash recovery This is in line with the v1.5 change in paradigm to not keep chunk.Descs without chunks around after a series maintenance. It's mainly motivated by avoiding excessive amounts of RAM usage during crash recovery. The code avoids creating memory time series with zero chunk.Descs as that is prone to trigger weird effects. (Series maintenance would archive series with zero chunk.Descs, but we cannot do that here because the archive indices still have to be checked.)	2017-04-04 00:04:22 +02:00
Björn Rabenstein	1c6240fc40	Merge pull request #2559 from prometheus/beorn7/storage storage: Replace fpIter by sortedFPs	2017-04-03 16:56:21 +02:00
beorn7	d284ffab03	storage: Replace fpIter by sortedFPs The fpIter was kind of cumbersome to use and required a lock for each iteration (which wasn't even needed for the iteration at startup after loading the checkpoint). The new implementation here has an obvious penalty in memory, but it's only 8 byte per series, so 80MiB for a beefy server with 10M memory time series (which would probably need ~100GiB RAM, so the memory penalty is only 0.1% of the total memory need). The big advantage is that now series maintenance happens in order, which leads to the time between two maintenances of the same series being less random. Ideally, after each maintenance, the next maintenance would tackle the series with the largest number of non-persisted chunks. That would be quite an effort to find out or track, but with the approach here, the next maintenance will tackle the series whose previous maintenance is longest ago, which is a good approximation. While this commit won't change the _average_ number of chunks persisted per maintenance, it will reduce the mean time a given chunk has to wait for its persistence and thus reduce the steady-state number of chunks waiting for persistence. Also, the map iteration in Go is non-deterministic but not truly random. In practice, the iteration appears to be somewhat "bucketed". You can often observe a bunch of series with similar duration since their last maintenance, i.e. you see batches of series with similar number of chunks persisted per maintenance. If that batch is relatively young, a whole lot of series are maintained with very few chunks to persist. (See screenshot in PR for a better explanation.)	2017-04-03 15:34:46 +02:00
Tobias Schmidt	eac36d123e	Fix unstable fanin test (#2558 )	2017-04-03 13:02:15 +02:00
Julius Volz	5a896033e3	Add remote read external label handling (#2555 ) * Add remote read external label handling This implements rule 1 and 2 from https://docs.google.com/document/d/188YauRgfF0J4CYMigLsVNN34V_kUwKnApBs2dQMfBbs/edit * Use more descriptive example labels in read test * Add comment for querier.addExternalLabels() * Make argument naming in removeLabels() more generic	2017-04-02 17:48:15 +02:00
Björn Rabenstein	e63d079b59	Merge pull request #2527 from prometheus/beorn7/storage storage: Evict chunks and calculate persistence pressure...	2017-03-27 14:49:42 +02:00
Julius Volz	b5b0e00923	Merge pull request #2499 from prometheus/remote-read Remote Read	2017-03-27 14:43:44 +02:00
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flag, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and using -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics instrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement them and allow them to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisited. One might be tempted to let the user specify an estimated value for the RSS usage, and then internally set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	2017-03-27 14:33:50 +02:00
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series ceases to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deploy turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vain if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	2017-03-26 23:44:50 +02:00
Julius Volz	3f23aa2cc7	Add headers to indicate remote read/write version Also add Content-Type header.	2017-03-24 17:39:51 +01:00
Julius Volz	8fda83ea12	Make rules only read local data	2017-03-21 00:50:04 +01:00
Julius Volz	94acd3f1d8	Add fanin tests and fix uncovered bugs	2017-03-21 00:08:17 +01:00
Julius Volz	9b33cfc457	Fix/unify context-based remote storage timeouts	2017-03-20 14:17:06 +01:00
Julius Volz	815762a4ad	Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig	2017-03-20 14:17:04 +01:00
Fabian Reinartz	397f001ac5	Merge branch 'master' into dev-2.0	2017-03-20 14:12:11 +01:00
Julius Volz	eb14678a25	Make remote read/write use config.HTTPClientConfig	2017-03-20 13:37:50 +01:00
Julius Volz	406b65d0dc	Rename remote.Storage to remote.Writer	2017-03-20 13:15:28 +01:00
Julius Volz	02395a224d	[WIP] Remote Read	2017-03-20 13:13:44 +01:00
Julius Volz	40e41a4776	Merge pull request #2494 from tomwilkie/remote-write-sharding Dynamically reshard the QueueManager based on observed load.	2017-03-20 12:45:17 +01:00
Fabian Reinartz	b586781283	*: update tsdb vendoring and add retention flag	2017-03-17 16:06:04 +01:00
beorn7	48d221c11e	storage: Fix typo in comment	2017-03-16 11:49:41 +01:00
Fabian Reinartz	0ecd205794	promql: Use buffer pool for matrix allocations	2017-03-14 10:57:34 +01:00
Tom Wilkie	75bb0f3253	Review feedback	2017-03-13 21:24:49 +00:00
Tom Wilkie	77cce900b8	Fix tests	2017-03-13 15:21:59 +00:00
Tom Wilkie	b48799a01e	Add license stanza	2017-03-13 14:50:15 +00:00
Tom Wilkie	9d22f030cf	Dynamically reshard the QueueManager based on observed load.	2017-03-13 14:41:16 +00:00
Fabian Reinartz	8a8eb12985	storage/tsdb: don't use partitioned DB.	2017-03-07 11:51:30 +01:00
Fabian Reinartz	9eb1d6c927	remote: take code from master	2017-03-07 11:43:32 +01:00
Fabian Reinartz	9304179ef7	Merge branch 'master' into dev-2.0	2017-03-02 08:16:58 +01:00
Fabian Reinartz	4397b4d508	*: pass Prometheus registry into storage	2017-02-28 09:33:14 +01:00
Tom Wilkie	1ab893c6ec	Limit 'discarding sample' logs to 1 every 10s (#2446 ) * Limit 'discarding sample' logs to 1 every 10s * Include the vendored library * Review feedback	2017-02-23 19:20:39 +01:00
Julius Volz	2f39dbc8b3	Rename StorageQueueManager -> QueueManager	2017-02-21 21:45:43 +01:00
Julius Volz	e9476b35d5	Re-add multiple remote writers Each remote write endpoint gets its own set of relabeling rules. This is based on the (yet-to-be-merged) https://github.com/prometheus/prometheus/pull/2419, which removes legacy remote write implementations.	2017-02-20 13:23:12 +01:00
Björn Rabenstein	089dc1076b	Merge pull request #2435 from jmeulemans/open-chunks-gauge Adding gauge for number of open head chunks.	2017-02-17 16:02:06 +01:00
Jeremy Meulemans	025c828976	Changed to open_head_chunks to address review. Now incrementing numHeadChunks directly.	2017-02-17 07:10:13 -06:00
Jeremy Meulemans	074050b8c0	Updating for failed codeclimate check.	2017-02-16 18:04:28 -06:00
Jeremy Meulemans	f70b52d0b6	Adding gauge for number of open head chunks. Fixes #1710	2017-02-16 17:56:45 -06:00
Julius Volz	beb3c4b389	Remove legacy remote storage implementations This removes legacy support for specific remote storage systems in favor of only offering the generic remote write protocol. An example bridge application that translates from the generic protocol to each of those legacy backends is still provided at: documentation/examples/remote_storage/remote_storage_bridge See also https://github.com/prometheus/prometheus/issues/10 The next step in the plan is to re-add support for multiple remote storages.	2017-02-14 17:52:05 +01:00
beorn7	d771185a43	storage: Fix chunkIndexToStartSeek calculation With a high enough shrink ratio and enough chunks to persist, the cutoff point could be _outside_ of the file, which wreaks havoc in the storage.	2017-02-10 11:42:59 +01:00
beorn7	73bd5e4dff	Merge branch 'beorn7/storage' into beorn7/storage3	2017-02-09 14:44:10 +01:00
beorn7	46a0837816	storage: Fix offset returned by dropAndPersistChunks This is another corner-case that was previously never exercised because the rewriting of a series file was never prevented by the shrink ratio. Scenario: There is an existing series on disk, which is archived. If a new sample comes in for that file, a new chunk in memory is created, and the chunkDescsOffset is set to -1. If series maintenance happens before the series has at least one chunk to persist _and_ an insufficient number of chunks on disk is old enough for purging (so that the shrink ratio kicks in), dropAndPersistChunks would return 0, but it should return the chunk length of the series file.	2017-02-09 14:35:07 +01:00
beorn7	9d12204da5	Merge branch 'release-1.5'	2017-02-09 13:11:53 +01:00
beorn7	bed4934224	storage: One more persist error code path discovered Also, in that code path, set chunkDescsOffset to 0 rather than -1 in case of "dropped more chunks from persistence than from memory" so that no other weird things happen before the series is quarantined for good.	2017-02-09 11:51:40 +01:00
beorn7	242d8edcb5	Merge branch 'release-1.5'	2017-02-08 17:28:09 +01:00
beorn7	8c8baaa558	storage: writeMemorySeries needs to return true for quarantined series This is another fallout of my bug hunt.	2017-02-08 16:28:56 +01:00
Mitsuhiro Tanda	be8b1eb656	storage: optimize dropping chunks by using minShrinkRatio (#2397 ) storage: prevent unnecessary chunk header reading if minShrinkRatio > 0	2017-02-07 17:33:54 +01:00
beorn7	2363a90adc	storage: Do not throw away fully persisted memory series in checkpointing	2017-02-06 17:39:59 +01:00
Fabian Reinartz	ea3ba338dd	main: add flags for new storage	2017-02-05 18:22:06 +01:00
beorn7	244a65fb29	storage: Increase persist watermark before calling append The append call may reuse cds, and thus change its len. (In practice, this wouldn't happen as cds should have len==cap. Still, the previous order of lines was problematic.)	2017-02-05 02:25:09 +01:00
beorn7	75282b27ba	storage: Added checks for invariants	2017-02-04 23:40:22 +01:00
beorn7	31e9db7f0c	storage: Simplify evictChunkDesc method	2017-02-04 22:29:37 +01:00
Fabian Reinartz	5772f1a7ba	retrieval/storage: adapt to new interface This simplifies the interface to two add methods for appends with labels or faster reference numbers.	2017-02-02 13:05:46 +01:00
beorn7	65dc8f44d3	storage: Test for errors returned by MaybePopulateLastTime	2017-02-01 23:43:58 +01:00
beorn7	752fac60ae	storage: Remove race condition from TestLoop	2017-02-01 23:43:58 +01:00
beorn7	4ccfc93dcf	storage: Set shrink ratio in the constructor.	2017-02-01 15:37:16 +01:00
beorn7	b2f086c6c4	storage: Expose bug of not setting the shrink ratio in the constructor	2017-02-01 15:37:10 +01:00
Brian Brazil	c1b547a90e	Only checkpoint chunkdescs and series that need persisting. (#2340 ) This decreases checkpoint size by not checkpointing things that don't actually need checkpointing. This is fully compatible with the v2 checkpoint format, as it makes series appear as though the only chunksdescs in memory are those that need persisting.	2017-01-17 00:59:38 +00:00
Fabian Reinartz	c691895a0f	retrieval: cache series references, use pkg/textparse With this change the scraping caches series references and only allocates label sets if it has to retrieve a new reference. pkg/textparse is used to do the conditional parsing and reduce allocations from 900B/sample to 0 in the general case.	2017-01-16 12:03:57 +01:00
Brian Brazil	f64c231dad	Allow checkpoints and maintenance to happen concurrently. (#2321 ) This is essential on larger Prometheus servers, as otherwise checkpoints prevent sufficient persisting of chunks to disk.	2017-01-13 17:24:19 +00:00
Fabian Reinartz	ad9bc62e4c	storage: extend appender and adapt it	2017-01-13 14:48:01 +01:00
Brian Brazil	1dcb7637f5	Add various persistence related metrics (#2333 ) Add metrics around checkpointing and persistence * Add a metric to say if checkpointing is happening, and another to track total checkpoint time and count. This breaks the existing prometheus_local_storage_checkpoint_duration_seconds by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds as the former name is more appropriate for a summary. * Add metric for last checkpoint size. * Add metric for series/chunks processed by checkpoints. For long checkpoints it'd be useful to see how they're progressing. * Add metric for dirty series * Add metric for number of chunks persisted per series. You can get the number of chunks from chunk_ops, but not the matching number of series. This helps determine the size of the writes being made. * Add metric for chunks queued for persistence Chunks created includes both chunks that'll need persistence and chunks read in for queries. This only includes chunks created for persistence. * Code review comments on new persistence metrics.	2017-01-11 15:11:19 +00:00
Fabian Reinartz	304cae9928	tsdb: Use PartitionedDB constructor	2017-01-06 12:34:54 +01:00
Brian Brazil	f9e581907a	Make index queue bigger. (#2322 ) When a large Prometheus starts up fresh it can take many minutes to warmup and clear out the index queue. A larger queue means less blocking, bigger batches and cuts down startup time by ~50%.	2017-01-05 17:57:42 +00:00
Fabian Reinartz	bc20d93f0a	storage: rename iterator value getters to At()	2017-01-02 13:33:37 +01:00
Fabian Reinartz	7322c46b8e	storage: add mock iterator for test	2016-12-30 10:45:56 +01:00
Fabian Reinartz	f8fc1f5bb2	*: migrate ingestion to new batch Appender	2016-12-29 11:03:56 +01:00
Fabian Reinartz	71fe0c58a8	promql: misc fixes	2016-12-28 11:32:15 +01:00
Mitsuhiro Tanda	7e369b9318	expose max memory chunks metrics (#2303 ) * expose max memory chunks metrics	2016-12-27 18:34:07 +00:00
Fabian Reinartz	fecf9532b9	*: fix misc compile errors	2016-12-25 11:42:57 +01:00
Fabian Reinartz	622ece6273	*: fix recording tests, migrate matcher types	2016-12-25 11:12:57 +01:00
Fabian Reinartz	0492ddbd4d	*: fully decouple tsdb, add new storage interfaces	2016-12-25 01:43:22 +01:00
Fabian Reinartz	d17b5be48a	storage/metric: remove package	2016-12-25 00:42:52 +01:00
Fabian Reinartz	8b84ee5ee6	storage: remove old storage This removes all old storage files and only keeps interfaces to still allow the code to compile.	2016-12-22 23:33:32 +01:00
Fabian Reinartz	11a731ba82	remote: remove hard-coded remote storages This commit removes the flag-configured remote storage integrations in favor of the generic remote write path.	2016-12-22 23:17:35 +01:00
Brian Brazil	93b70ee4ea	Evict chunk descs of all unloaded chunks during maintenance. (#2297) Keeping these around has two problems: 1) Each desc takes 64 bytes, 10 of them is 640B. This is a lot of overhead on a 1024 byte chunk. 2) It can take well over a week to reach a point where this and thus Prometheus memory usage as a whole enters steady state. This makes RAM estimation very hard for users, and makes it difficult to investigate things like memory fragmentation. Instead we'll wipe them during each memory series maintenance cycle, and if a query pulls them in they'll hang around as cache until the next cycle.	2016-12-22 13:49:03 +00:00
Brian Brazil	1b8a474612	Don't clone the metric if there's no remote writes. The metric clone can't be further optimised, and is a non-trivial memory allocation cost so fast path it if there's no remote writes configured.	2016-12-21 11:34:48 +00:00
Tristan Colgate	30be8e0b8a	ignore dotfiles in data directory	2016-12-15 11:48:23 +00:00
Björn Rabenstein	45570e5972	Merge pull request #2277 from prometheus/beorn7/storage2 storage: Sanity-check number of loaded chunk descs	2016-12-14 02:59:10 +01:00
beorn7	253be23c00	storage: Sanity-check number of loaded chunk descs Two cases: - An unarchived metric must have at least one chunk desc loaded upon unarchival. Otherwise, the file is gone or has size 0, which is an inconsistency (because the series is still indexed in the archive index). Hence, quarantining is triggered. - If loading the chunk descs of a series with a known chunkDescsOffset (i.e. != -1), the number of chunks loaded must be equal to chunkDescsOffset. If not, there is a data corruption. An error is returned, which leads to quarantining. In any case, there is a guard added to not access the 1st element of an empty chunkDescs slice. (That's what triggered the crashes in issue 2249.) A time series with unknown chunkDescsOffset and no chunks in memory and no chunks on disk either could trigger that case. I would assume such a "null series" doesn't exist, but it's not entirely unthinkable and unreasonable to happen (perhaps in future uses of the storage). (Create a series, and then something tries to preload chunks before the first sample is added.)	2016-12-13 23:19:39 +01:00
Björn Rabenstein	5f0c0e43cf	Merge pull request #2276 from prometheus/beorn7/storage storage: Catch data corruption that leads to division by zero	2016-12-13 23:13:39 +01:00
beorn7	837c029b16	storage: Fix linter issue Go style tries to avoid indented `else` blocks.	2016-12-13 19:05:30 +01:00
beorn7	4719482f5f	storage: Make tests go-vet and golint clean	2016-12-13 17:07:27 +01:00
beorn7	485ac8dff7	storage: Verify validity of byte length when unmarshalling (double)delta chunks This makes sure a division-by-zero crash cannot happen in the Len() method. Fixes #2773	2016-12-13 17:07:27 +01:00
tattsun	e714079cf2	storage: fix error message (#2270) * storage: add error message	2016-12-09 22:36:27 +00:00
Christopher M. Luciano	148b006e25	Clarify error message when Prometheus data dir finds unexpected files	2016-12-05 10:51:57 -05:00
Julius Volz	127332c56f	Merge pull request #2168 from tomwilkie/chunk-len Add call to estimate number of samples in a chunk to the API	2016-11-17 23:13:50 -08:00
Tom Wilkie	585878cdb2	Add call to estimate number of samples in a chunk to the API	2016-11-17 19:09:59 +00:00
Björn Rabenstein	036715370f	Merge pull request #2184 from huydx/master Fix possible memory leak by defer inside loop	2016-11-14 15:26:39 +01:00
huydx	c999902761	Fix possible memory leak by defer inside loop	2016-11-14 14:08:08 +09:00
Fabian Reinartz	856de30c09	Check error before defer closing If an error is returned the file might be nil and a Close call would cause a panic.	2016-11-13 18:16:02 +01:00

1142 commits