prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Tom Wilkie	adf5307470	Update wal LiveReader to ensure EOF is correctly propagated. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	d6258aea8f	Fix up remote write tests: - Tests that created a QueueManager were leaving behind files at the end of tests. - WAL replaying (readToEnd)tests seem to require extra time to finish now. - Some fixes to make staticcheck happy Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	184f06a981	Combine the record decoding metrics into one; break out garbage collection into a separate function. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	859cda27ff	Remove some 'global' state, moving segment numbers to parameters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	bdc6b764b0	If reading the WAL fails, try again. Also, read from the segment containing the index for the last checkpoint, not the first segment. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	d6f911b511	Factor out logging ratelimit & dedupe middleware. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	a5c20642b3	Refactor WAL watcher to remove some duplication. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	37ad4db485	Export timestamps in seconds since epoch. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
JoeWrightss	362873f72b	Fix .Log() error message (#5257 ) Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>	2019-02-22 14:39:37 +00:00
Simon Pasquier	b41d6d54f2	storage/remote: increase timeouts for Travis CI (#5224 ) * storage/remote: adapt tests for Travis CI Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Check filesystems on Travis environment Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Run remote/storage tests on CircleCI for troubleshooting Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Try using tmpfs partition Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Revert "Try using tmpfs partition" This reverts commit `85a30deb72`. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Don't store labels in writeToMock Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Fix data race Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Bump retries to 100 meaning that the total timeout is 10s Signed-off-by: Simon Pasquier <spasquie@redhat.com> * clean up .travis.yml Signed-off-by: Simon Pasquier <spasquie@redhat.com> * code fixup Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Remove unneeded empty line Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-15 16:47:41 +01:00
Callum Styan	37e35f9e0c	Various improvements to WAL based remote write. - Use the queue name in WAL watcher logging. - Don't return from watch if the reader error was EOF. - Fix sample timestamp check logic regarding what samples we send. - Refactor so we don't need readToEnd/readSeriesRecords - Fix wal_watcher tests since readToEnd no longer exists Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Tom Wilkie	b93bafeee1	Various fixes to locking & shutdown for WAL-based remote write. - Remove datarace in the exported highest scrape timestamp. - Backoff on enqueue should be per-sample - reset the result for each sample. - Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor. - Reorder functions in WALWatcher depth-first according to call graph. - Fix vendor/modules.txt. - Split out the various timer periods into consts at the top of the file. - Move w.currentSegmentMetric.Set close to where we set the currentSegment. - Combine r.Next() and isClosed(w.quit) into a single loop. - Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read. - Reorganise checkpoint handling to reduce nesting and make it easier to follow. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-12 11:39:13 +00:00
Callum Styan	6f69e31398	Tail the TSDB WAL for remote_write This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down. We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes. Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases. As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s). This change also includes the following optimisations: - only marshal the proto request once, not once per retry - maintain a single copy of the labels for given series to reduce GC pressure Other minor tweaks: - only reshard if we've also successfully sent recently - add pending samples, latest sent timestamp, WAL events processed metrics Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype) Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Goutham Veeramachaneni	384cba1211	Add flag for size based retention (#5109 ) * Add flag for size based retention Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Deprecate the old retention flag for a new one. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Add ability to take a suffix for size flag Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Address feedback Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2019-01-18 19:18:36 +05:30
Krasi Georgiev	3bd41cc92c	Update tsdb to 0.4 (#5110 ) * update tsdb to v0.4.0 Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * remove unused struct field Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2019-01-18 16:32:14 +05:30
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	2019-01-16 17:28:14 -05:00
Callum Styan	5358f76c5c	update remote write path proto so that Labels/Timeseries can't be nil (#4957 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-01-15 19:13:39 +00:00
Simon Pasquier	f678e27eb6	: use latest release of staticcheck (#5057 ) : use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-01-04 14:47:38 +01:00
glutamatt	5ddde1965b	tune the "Wal segment size" with a flag (#5029 ) Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size" to tune the max size of wal segment files. The addressed base problem is to reduce the disk space used by wal segment files : on a raspberry pi, for instance, we often want to reduce write load of the sd card, then, the wal directory is mounted on a memory (space limited) partition. the default value of the segment max file size, pushed the size of directory to 128 MB for each segment , which is too much ram consumption on a rasp. the initial discussion is at https://github.com/prometheus/tsdb/pull/450	2019-01-03 17:13:21 +03:00
Tom Wilkie	6e08029b56	Move err to be the last return value from storage.Select. (#5054 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-01-02 11:10:13 +00:00
AixesHunter	806632790e	update inconsistent comment (#5046 ) Co-Authored-By: aixeshunter <44970652+aixeshunter@users.noreply.github.com> Signed-off-by: aixeshunter <aixeshunter@gmail.com>	2018-12-27 14:02:36 +00:00
Bartek Płotka	62c8337e77	Moved configuration into `relabel` package. (#4955 ) Adapted top dir relabel to use pkg relabel structs. Removal of this in a separate tracked here: https://github.com/prometheus/prometheus/issues/3647 Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-12-18 11:26:36 +00:00
Alin Sinpalean	44bec482fb	Minor optimization for BufferedSeriesIterator: actually drop the samples falling outside of the new delta from the underlying sampleRing, when ReduceDelta is called. (#4849 ) Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-12-18 11:25:45 +00:00
Alin Sinpalean	d6adfe2ae2	Use a fake SeriesIterator (that generates samples on the fly instead of using a slice) for BufferedSeriesIterator, to reduce the variance of benchmark results due to memory pressure. (#4847 ) Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-12-18 11:22:33 +00:00
Ryota Arai	135d580ab2	Introduce min_shards for remote write to set minimum number of shards. (#4924 ) Signed-off-by: Ryota Arai <ryota.arai@gmail.com>	2018-12-04 17:32:14 +00:00
mknapphrt	f0e9196dca	Return warnings on a remote read fail (#4832 ) Signed-off-by: Mark Knapp <mknapp@hudson-trading.com>	2018-11-30 14:27:12 +00:00
Ben Kochie	c6399296dc	Fix spelling/typos (#4921 ) * Fix spelling/typos Fix spelling/typos reported by codespell/misspell. * UK -> US spelling changes. Signed-off-by: Ben Kochie <superq@gmail.com>	2018-11-27 17:44:29 +01:00
Daniele Sluijters	f25a6baedb	remote: Set User-Agent header in requests (#4891 ) Currently Prometheus requests show up with a UA of Go-http-client/1.1 which isn't super helpful. Though the X-Prometheus-Remote-* headers exist they need to be explicitly configured when logging the request in order to be able to deduce this is a request originating from Prometheus. By setting the header we remove this ambiguity and make default server logs just a bit more useful. This also updates a few other places to consistently capitalize the 'P' in the user agent, as well as ensure we set a UA to begin with. Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>	2018-11-23 22:49:49 +08:00
Krasi Georgiev	bd100182b2	added tsdb/head mint maxt metrics (#4888 ) added the head metrics with the correct suffix. Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2018-11-21 12:57:32 +02:00
Simon Pasquier	ed19373a78	: remove use of golang.org/x/net/context (#4869 ) : remove use of golang.org/x/net/context Signed-off-by: Simon Pasquier <spasquie@redhat.com> scrape: fix TestTargetScrapeScrapeCancel Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-11-19 12:31:16 +01:00
Ganesh Vernekar	ca93fd544b	/api/v1/labels endpoint for getting all label names (#4835 ) * vendor: update tsdb Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * /api/v1/labels endpoint Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * regex matchers for API Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Matchers behaving as OR Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Removed the matchers Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * vendor: update tsdb using go mod Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * vendor update: tsdb Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Added LabelNames() to storage.Querier Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Test for api.labelNames Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Nits Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-11-19 15:51:14 +05:30
fengyuceNv	94fff219ab	improve remote storage enqueue performance (#4772 ) Signed-off-by: fyc <fyc22788@ly.com>	2018-11-13 12:19:05 +00:00
Tariq Ibrahim	3f7ed7de49	Adding new metric type to track in-flight remote read queries. (#4677 ) Signed-off-by: tariqibrahim <tariq.ibrahim@microsoft.com>	2018-10-10 14:48:32 -07:00
Tom Wilkie	d3a1ff1abf	Reduce memory usage of remote read by reducing pointer usage. (#4655 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-09-25 19:14:00 +01:00
yzpeninsula	4ae3bce260	Fix typo (#4497 ) Signed-off-by: yzpeninsula <yzpeninsula@gmail.com>	2018-09-13 16:04:10 +05:30
Tom Wilkie	457e4bb58e	Limit the number of samples remote read can return. (#4532 ) * Limit the number of samples remote read can return. - Return 413 entity too large. - Limit can be set by a flag. Allow 0 to mean no limit. - Include limit in error message. - Set default limit to 50M (* 16 bytes = 800MB). Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-09-05 15:50:50 +02:00
Daisy T	7d01ead689	change time.duration to model.duration for standardization (#4479 ) Signed-off-by: Daisy T <daisyts@gmx.com>	2018-08-24 16:55:21 +02:00
Julius Volz	8fbe1b5133	Handle a bunch of unchecked errors (#4461 ) There are many more (mostly finalizers like Close/Stop/etc.), but most of the others seemed like one couldn't do much about them anyway. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-08-17 17:24:35 +02:00
Henri DF	ffb7836c14	Send "Accept-Encoding" header in read request (#4421 ) We should be doing this since we only accept Snappy-encoded responses. Signed-off-by: Henri DF <henridf@gmail.com>	2018-07-26 12:45:04 +01:00
Henri DF	3abb2cc349	Fix typo (#4423 ) Signed-off-by: Henri DF <henridf@gmail.com>	2018-07-26 08:49:53 +01:00
Alin Sinpalean	372e7652b7	Reuse (copy) overlapping matrix samples between range evaluation steps (#4315 ) * Reuse (copy) overlapping matrix samples between range evaluation steps. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-07-18 11:14:02 +01:00
Goutham Veeramachaneni	c28cc5076c	Saner defaults and metrics for remote-write (#4279 ) * Rename queueCapacity to shardCapacity * Saner defaults for remote write * Reduce allocs on retries Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2018-07-18 05:15:16 +01:00
Alin Sinpalean	e3b775b78b	Simplify BufferedSeriesIterator usage (#4294 ) * Allow for BufferedSeriesIterator instances to be created without an underlying iterator, to simplify their usage. Signed-off-by: Alin Sinpalean <alin.sinpalean@gmail.com>	2018-07-18 05:10:28 +01:00
Thomas Jackson	92c6f0c92e	Add offset to selectParams (#4226 ) * Add Start/End to SelectParams * Make remote read use the new selectParams for start/end This commit will continue sending the start/end time of the remote read query as the overarching promql time and the specific range of data that the query is interested in receiving a response to is now part of the ReadHints (upstream discussion in #4226). * Remove unused vendored code The genproto.sh script was updated, but the code wasn't regenerated. This simply removes the vendored deps that are no longer part of the codegen output. Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>	2018-07-18 04:58:00 +01:00
Brian Brazil	fb695fb435	Merge pull request #4285 from prometheus/release-2.3 Merge release-2.3 back to master	2018-06-20 14:51:00 +01:00
Tom Wilkie	b8217720ac	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-06-19 13:03:01 +01:00
Corentin Chary	db9dbeeaec	federation: nil pointer dereference when using remote read ``` level=error ts=2018-06-13T07:19:04.515149169Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56202: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.516199547Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56204: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.51717692Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56206: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.564952878Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56208: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.566575791Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56210: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.567106063Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56212: runtime error: invalid memory address or nil pointer dereference" ``` When remote read is enabled, federation will call `q.Select(nil, mset...)` which will break remote reads because it currently doesn't handle empty SelectParams. Signed-off-by: Corentin Chary <c.chary@criteo.com>	2018-06-19 13:03:01 +01:00
Brian Brazil	78efdc6d6b	Avoid infinite loop on duplicate NaN values. (#4275 ) Fixes #4254 NaNs don't equal themselves, so a duplicate NaN would always hit the break statement and never get popped. We should not be returning multiple data point for the same timestamp, so don't compare values at all. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2018-06-18 17:34:08 +01:00
Tom Wilkie	0b189b2da9	Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-06-18 17:21:12 +01:00
Corentin Chary	530107f8ef	federation: nil pointer dereference when using remote read ``` level=error ts=2018-06-13T07:19:04.515149169Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56202: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.516199547Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56204: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.51717692Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56206: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.564952878Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56208: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.566575791Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56210: runtime error: invalid memory address or nil pointer dereference" level=error ts=2018-06-13T07:19:04.567106063Z caller=stdlib.go:89 component=web caller="http: panic serving [::1" msg="]:56212: runtime error: invalid memory address or nil pointer dereference" ``` When remote read is enabled, federation will call `q.Select(nil, mset...)` which will break remote reads because it currently doesn't handle empty SelectParams. Signed-off-by: Corentin Chary <c.chary@criteo.com>	2018-06-18 17:21:12 +01:00

942 commits