prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-18 19:44:06 -08:00

Author	SHA1	Message	Date
Nicole J	5e9bd17b1f	added the prometheus_remote_storage_remote_read_queries_total (#7328 ) * added the prometheus_remote_storage_remote_read_queries_total query Signed-off-by: njingco <jingco.nicole@gmail.com> * adjusted the help label of remoteReadQueriesTotal Signed-off-by: njingco <jingco.nicole@gmail.com>	2020-06-03 10:30:52 -07:00
Cody Boggs	3268eac2dd	Trace Remote Write requests (#7206 ) * Trace Remote Write requests Signed-off-by: Cody Boggs <cboggs@splunk.com> * Refactor store attempts to keep code flow clearer, and avoid so many places to deal with span finishing Signed-off-by: Cody Boggs <cboggs@splunk.com>	2020-06-01 09:21:13 -06:00
Chris Marchbanks	c1f9917e90	Add test for unregistering queue manager metrics Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-05-05 14:14:04 -06:00
Chris Marchbanks	dfad1da296	Remove duplicate metrics in QueueManager. Right now any new metrics added for remote write need to be added to both the QueueManager struct and the queueManagerMetrics struct. Instead, use the queueManagerMetrics struct directly from QueueManager. The newQueueManagerMetrics constructor will now create the metrics for a specific queue with name and endpoint pre-populated, and a new copy of the struct will be created specifically for each queue. This also fixes a bug where prometheus_remote_storage_sent_bytes_total was not being unregistered after a queue was changed. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-05-05 14:13:59 -06:00
Julien Pivotto	7ecd2d1c24	Jaeger: Create child span for remote read (#7187 ) * Jaeger: Create child span for remote read * Jaeger: use middleware to trace client http request Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-05-02 22:41:55 +02:00
qinng	f36ae1c21c	[remote-storage] use warn log level when sending samples to remote fails (#7184 ) [remote] increase sendbatch error log level Signed-off-by: guoruyi1 <guoruyi1@xiaomi.com> Co-authored-by: guoruyi1 <guoruyi1@xiaomi.com>	2020-04-30 17:06:22 -06:00
Vasily Sliouniaev	0393b188c9	Add Jaeger (#7148 ) * Trace remote read Signed-off-by: vas <vasily.sliouniaev@jet.com> * Use jaeger Signed-off-by: vas <vasily.sliouniaev@jet.com>	2020-04-23 02:05:55 +02:00
Marek Slabicki	4b5e7d4984	Adding a shouldReshard function to modularize logic for the QueueManager deciding if it should shard or not (#7143 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-20 16:20:39 -06:00
Chris Marchbanks	cd12f0873c	Merge pull request #7073 from csmarchbanks/fix-md5-remote-write Fix remote write not updating when relabel configs or secrets change	2020-04-16 16:36:25 -06:00
Chris Marchbanks	5ab6b043c1	Always update lastSendTimestamp after a request (#7122 ) If the server is returning non-recoverable errors, such as if we are trying to push samples that are too old, remote write will never reshard. Non-recoverable errors should be treated the same as success for the purpose of resharding, just as we do with sample rates and durations. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-04-15 09:03:28 -06:00
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	2020-04-15 11:17:41 +01:00
Chris Marchbanks	d88a2b0261	Handle secret changes in remote write ApplyConfig Remake the http client whenever ApplyConfig is called. This allows secrets to be updated without needing to restart an otherwise unchanged queue. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-04-13 23:14:15 +00:00
Simon Pasquier	317e73de79	Hash YAML instead of JSON. But it doesn't work either because of secret fields. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-04-13 22:32:37 +00:00
Simon Pasquier	8cc84660fa	storage/remote: add tests for config changes Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-04-13 22:32:37 +00:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
ga	9a21fdcd1b	[storage] clean imports (#7099 ) Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>	2020-04-07 22:05:39 +01:00
Bartlomiej Plotka	d5c33877f9	storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable}. Added generic Merge{SeriesSet/Querier} implementation. (#7005 ) * storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable}. Added generic Merge{SeriesSet/Querier} implementation. Rationale: In many places (e.g. chunk remote read, Thanos Receive fetching chunks from TSDB), we operate on encoded chunks, not samples. This means that we unnecessarily decode and re-encode, wasting CPU, time and memory. This PR adds chunk iterator interfaces and lets the merge code be reused between both kinds of SeriesSet. I will make use of it inside tsdb itself in a following PR. For now, fanout and the mergers implement it. All merges now also allow passing series mergers. This opens the door to custom deduplication other than the TSDB vertical one (e.g. the offline one we have in Thanos). Changes: * Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that Storage will implement both chunks and samples. * Added Seek to the chunks.Iterator interface for iterating over chunks. * NewMergeChunkQuerier was added; both this and NewMergeQuerier now use genericMergeQuerier to share the code. Generic code was added. * Improved tests. * Added some TODOs for further simplifications in next PRs. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Moved s/Labeled/SeriesLabels/ as per Krasi's suggestion. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Krasi's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Second iteration of Krasi's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Another round of comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-24 20:15:47 +00:00
Bartlomiej Plotka	c4eefd1b3a	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically a BREAKING CHANGE, but it was like this from the beginning: I just noticed that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet, which relies on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for no good reason. I think I found a good balance between convenience and readability with just one method. The smaller the interface, the better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-20 21:14:43 +01:00
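Why a merging series set needs sorted input: a merge only ever compares the current head of each input, so an out-of-order element is emitted in the wrong place and duplicates across inputs are missed. The two-way merge below, over series keys rendered as strings, is a hypothetical illustration (`mergeSorted` is not the storage package's API) of the invariant that remote read responses must now satisfy.

```go
package main

// mergeSorted merges two ascending slices of series keys into one ascending
// slice, emitting a key present in both inputs only once. It only inspects
// the head of each input, so unsorted input silently yields wrong output.
func mergeSorted(a, b []string) []string {
	out := make([]string, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default: // same series in both sets: keep one copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}
```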
Callum Styan	f802f1e8ca	Fix bug with WAL watcher and Live Reader metrics usage. (#6998 ) * Fix bug with WAL watcher and Live Reader metrics usage. Calling NewXMetrics when creating a Watcher or LiveReader results in a registration error, which we're ignoring, and as a result other than the first Watcher/Reader created, we had no metrics for either. So we would only have metrics like Watcher Records Read for the first remote write config in a user's config file. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-20 17:34:15 +01:00
Callum Styan	1518083168	Rw testability improvements (#6537 ) * Change createTimeseries to take values for number of series and number of samples per series. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Take num of samples to expect in expectSampleCount instead of array of samples. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add field to TestStorageClient to ignore samples sent waitgroup for potential tests where we don't care about delivery of all samples. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix up tests a little bit. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-02-25 11:10:57 -08:00
Bartlomiej Plotka	fb79f515fc	Fixed second bug. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All TODOs I added will be fixed in follow-up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only an unnecessary glue layer with incompatible structs. No logic was changed, only a different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:54 +00:00
helenxu1221	7df4fe3faa	reset counter after collecting metric (#6798 ) Signed-off-by: HelenXu <helenxu@Helens-MacBook-Pro.local>	2020-02-09 20:51:21 -07:00
Robert Fratto	a53e00f9fd	pass registerer from storage to queue manager for its metrics (#6728 ) * pass registerer from storage to queue manager for its metrics Signed-off-by: Robert Fratto <robert.fratto@grafana.com>	2020-02-03 13:47:03 -08:00
Brian Brazil	38d32e0686	Don't sort postings if we only have one block. Sorting the head's postings can be quite slow. We only need sorted series when merging with another querier, so only sort then. This will make big queries that only touch the head faster, though queries that touch both the head and a block will still be the same speed. This probably won't help much with graphing unless the range is under an hour; however, it should make most recording rules faster. Add guarantee that remote read streaming produces sorted series. PromQL benchmarks for histograms show only 2-3% improvement, but they're only over 1k series.

benchmark                                             old ns/op     new ns/op     delta
BenchmarkQuerierSelect/Head/1of1000000-4              1375486282    507657736     -63.09%
BenchmarkQuerierSelect/Head/10of1000000-4             1387859004    507769850     -63.41%
BenchmarkQuerierSelect/Head/100of1000000-4            1387087935    506029110     -63.52%
BenchmarkQuerierSelect/Head/1000of1000000-4           1386869064    504521986     -63.62%
BenchmarkQuerierSelect/Head/10000of1000000-4          1386213685    505210422     -63.55%
BenchmarkQuerierSelect/Head/100000of1000000-4         1392754988    529842406     -61.96%
BenchmarkQuerierSelect/Head/1000000of1000000-4        1569414722    725059506     -53.80%
BenchmarkQuerierSelect/SortedHead/1of1000000-4        1381019902    1370495863    -0.76%
BenchmarkQuerierSelect/SortedHead/10of1000000-4       1375696209    1366789468    -0.65%
BenchmarkQuerierSelect/SortedHead/100of1000000-4      1386009422    1364519297    -1.55%
BenchmarkQuerierSelect/SortedHead/1000of1000000-4     1377700532    1364486191    -0.96%
BenchmarkQuerierSelect/SortedHead/10000of1000000-4    1383539536    1369545314    -1.01%
BenchmarkQuerierSelect/SortedHead/100000of1000000-4   1410089163    1394731339    -1.09%
BenchmarkQuerierSelect/SortedHead/1000000of1000000-4  1634744148    1581554956    -3.25%
BenchmarkQuerierSelect/Block/1of1000000-4             881741242     879839470     -0.22%
BenchmarkQuerierSelect/Block/10of1000000-4            880381562     882846038     +0.28%
BenchmarkQuerierSelect/Block/100of1000000-4           887519357     881016916     -0.73%
BenchmarkQuerierSelect/Block/1000of1000000-4          902194205     883433524     -2.08%
BenchmarkQuerierSelect/Block/10000of1000000-4         892321964     885130170     -0.81%
BenchmarkQuerierSelect/Block/100000of1000000-4        938604466     933527150     -0.54%
BenchmarkQuerierSelect/Block/1000000of1000000-4       1313510845    1295881124    -1.34%

benchmark                                             old allocs    new allocs    delta
BenchmarkQuerierSelect/Head/1of1000000-4              4000056       4000018       -0.00%
BenchmarkQuerierSelect/Head/10of1000000-4             4000074       4000036       -0.00%
BenchmarkQuerierSelect/Head/100of1000000-4            4000254       4000216       -0.00%
BenchmarkQuerierSelect/Head/1000of1000000-4           4002054       4002016       -0.00%
BenchmarkQuerierSelect/Head/10000of1000000-4          4020054       4020016       -0.00%
BenchmarkQuerierSelect/Head/100000of1000000-4         4200054       4200016       -0.00%
BenchmarkQuerierSelect/Head/1000000of1000000-4        6000054       6000016       -0.00%
BenchmarkQuerierSelect/SortedHead/1of1000000-4        4000071       4000071       +0.00%
BenchmarkQuerierSelect/SortedHead/10of1000000-4       4000089       4000089       +0.00%
BenchmarkQuerierSelect/SortedHead/100of1000000-4      4000269       4000269       +0.00%
BenchmarkQuerierSelect/SortedHead/1000of1000000-4     4002069       4002069       +0.00%
BenchmarkQuerierSelect/SortedHead/10000of1000000-4    4020069       4020069       +0.00%
BenchmarkQuerierSelect/SortedHead/100000of1000000-4   4200069       4200069       +0.00%
BenchmarkQuerierSelect/SortedHead/1000000of1000000-4  6000069       6000069       +0.00%
BenchmarkQuerierSelect/Block/1of1000000-4             6000023       6000022       -0.00%
BenchmarkQuerierSelect/Block/10of1000000-4            6000059       6000058       -0.00%
BenchmarkQuerierSelect/Block/100of1000000-4           6000419       6000418       -0.00%
BenchmarkQuerierSelect/Block/1000of1000000-4          6004019       6004018       -0.00%
BenchmarkQuerierSelect/Block/10000of1000000-4         6040019       6040018       -0.00%
BenchmarkQuerierSelect/Block/100000of1000000-4        6400019       6400018       -0.00%
BenchmarkQuerierSelect/Block/1000000of1000000-4       10000020      10000019      -0.00%

benchmark                                             old bytes     new bytes     delta
BenchmarkQuerierSelect/Head/1of1000000-4              229192200     176001176     -23.21%
BenchmarkQuerierSelect/Head/10of1000000-4             229193352     176002328     -23.21%
BenchmarkQuerierSelect/Head/100of1000000-4            229204872     176013848     -23.21%
BenchmarkQuerierSelect/Head/1000of1000000-4           229320072     176129048     -23.20%
BenchmarkQuerierSelect/Head/10000of1000000-4          230472072     177281048     -23.08%
BenchmarkQuerierSelect/Head/100000of1000000-4         241992072     188801048     -21.98%
BenchmarkQuerierSelect/Head/1000000of1000000-4        357192072     304001048     -14.89%
BenchmarkQuerierSelect/SortedHead/1of1000000-4        229193928     229193928     +0.00%
BenchmarkQuerierSelect/SortedHead/10of1000000-4       229195080     229195080     +0.00%
BenchmarkQuerierSelect/SortedHead/100of1000000-4      229206600     229206600     +0.00%
BenchmarkQuerierSelect/SortedHead/1000of1000000-4     229321800     229321800     +0.00%
BenchmarkQuerierSelect/SortedHead/10000of1000000-4    230473800     230473800     +0.00%
BenchmarkQuerierSelect/SortedHead/100000of1000000-4   241993800     241993800     +0.00%
BenchmarkQuerierSelect/SortedHead/1000000of1000000-4  357193800     357193800     +0.00%
BenchmarkQuerierSelect/Block/1of1000000-4             227201516     227201500     -0.00%
BenchmarkQuerierSelect/Block/10of1000000-4            227202924     227202908     -0.00%
BenchmarkQuerierSelect/Block/100of1000000-4           227217036     227217020     -0.00%
BenchmarkQuerierSelect/Block/1000of1000000-4          227358156     227358140     -0.00%
BenchmarkQuerierSelect/Block/10000of1000000-4         228769356     228769340     -0.00%
BenchmarkQuerierSelect/Block/100000of1000000-4        242881356     242881340     -0.00%
BenchmarkQuerierSelect/Block/1000000of1000000-4       384001616     384001600     -0.00%

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-01-28 09:14:56 +00:00
Anand Singh Kunwar	aa61e392b2	Make remote client `Store` use passed context (#6673 ) * Remote store client's `Store` API currently doesn't use passed context, but instead just constructs a new `context.Background()` Signed-off-by: Anand Singh Kunwar <anandkunwar95@gmail.com>	2020-01-27 07:43:20 -07:00
Julien Pivotto	cf42888e4d	Fix order of testutil.Equals (#6695 ) Equals takes the expected value as first parameter, and the actual value as second parameter. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-27 12:21:59 +00:00
Julien Pivotto	aad8f89ecb	Remote storage: propagate json marshal errors (#6622 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-14 08:40:30 +00:00
Chris Marchbanks	7f3aca62c4	Only reduce the number of shards when caught up. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-01-06 14:53:23 -07:00
Chris Marchbanks	9e24e1f9e8	Use samplesPending rather than integral. The integral accumulator in the remote write sharding code is just a second way of keeping track of the number of samples pending. Remove integralAccumulator and use the samplesPending value we already calculate to compute the number of shards. This has the added benefit of fixing a bug where the integralAccumulator was not being initialized correctly due to not taking into account the number of ticks being counted, causing the integralAccumulator initial value to be off by an order of magnitude in some cases. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-01-06 14:53:23 -07:00
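The commit does not spell out the formula, so what follows is a simplified, hypothetical pending-based desired shards calculation. If each sample costs `timePerSample` shard-seconds to send, keeping up with an incoming rate R needs R × `timePerSample` shards; adding a fraction of `samplesPending` to R also drains the backlog over time. The parameter names and the `catchup` fraction are assumptions, not QueueManager's actual code.

```go
package main

import "math"

// desiredShards estimates how many shards are needed.
//   - samplesInRate:  samples/second arriving from the WAL
//   - samplesOutRate: samples/second successfully sent
//   - sendSeconds:    seconds per second spent in send calls, summed over shards
//   - samplesPending: samples read from the WAL but not yet sent
//   - catchup:        assumed fraction of the backlog to drain per second
func desiredShards(samplesInRate, samplesOutRate, sendSeconds, samplesPending, catchup float64) int {
	if samplesOutRate <= 0 {
		return 1 // nothing measured yet: keep a single shard
	}
	timePerSample := sendSeconds / samplesOutRate // shard-seconds per sample
	want := timePerSample * (samplesInRate + catchup*samplesPending)
	return int(math.Max(1, math.Ceil(want)))
}
```

For example, with 8 samples/s in and out and 2 shard-seconds of sending per second, each sample costs 0.25 shard-seconds, so 2 shards keep up when nothing is pending.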
Chris Marchbanks	847c66a843	Add sharding test Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-01-06 14:53:23 -07:00
Josh Soref	91d76c8023	Spelling (#6517 ) * spelling: alertmanager Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: attributes Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: autocomplete Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: bootstrap Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: caught Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: chunkenc Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: compaction Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: corrupted Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: deletable Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: expected Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: fine-grained Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: initialized Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: iteration Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: javascript Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: multiple Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: number Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: overlapping Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: possible Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: postings Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: procedure Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: programmatic Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: queuing Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: querier Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: repairing Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: received Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: reproducible Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: retention Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: sample Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: segements Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: semantic Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: software [LICENSE] Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: staging Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: timestamp Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: unfortunately Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: uvarint Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: subsequently Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: ressamples Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>	2020-01-02 15:54:09 +01:00
Julien Pivotto	31700a05df	Improve testutil.ErrorEqual (#6471 ) Also improves TestPopulateLabels: testutil.ErrorEqual just returned a bool without failing the test. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2019-12-17 21:11:33 +00:00
Callum Styan	67838643ee	Add config option for remote job name (#6043 ) * Track remote write queues via a map so we don't care about index. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Support a job name for remote write/read so we can differentiate between them using the name. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Remote write/read has Name to not confuse the meaning of the field with scrape job names. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Split queue/client label into remote_name and url labels. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allow for duplicate remote write/read configs. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Ensure we restart remote write queues if the hash of their config has not changed, but the remote name has changed. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Include name in remote read/write config hashes, simplify duplicates check, update test accordingly. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-12-12 12:47:23 -08:00
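This commit keys queues by a hash of their config and includes the name in that hash. Other commits in this log show the project hashing configs first as JSON and later as YAML; the sketch below uses JSON and a hypothetical `remoteWriteConfig` only to stay within the standard library. It shows the two properties the commit relies on: renaming a queue changes its hash (forcing a restart), and identical configs collide (which a duplicate check can detect).

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
)

// remoteWriteConfig is a hypothetical subset of a remote write config.
type remoteWriteConfig struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

// configHash returns a stable identifier for c. Because Name is marshaled,
// a rename produces a different hash even when the URL is unchanged.
func configHash(c remoteWriteConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:]), nil
}
```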
Garrett	5a9c4acfbf	Pushdown aggregator group by through read hint (#6401 ) * Pushdown aggregator group by through read hint Implement https://github.com/prometheus/prometheus/issues/6400 * add temporal aggregation pushdown support Signed-off-by: xiancli <xiancli@ebay.com>	2019-12-05 14:06:28 +00:00
Chris Marchbanks	5000c05378	Merge pull request #6378 from prometheus/accurate-desired-shards-metric Change desired shards metric to report raw calculated value	2019-12-03 15:26:21 -07:00
Callum Styan	5830e03691	Merge pull request #6337 from cstyan/rw-log-replay Log the start and end of the WAL replay within the WAL watcher.	2019-12-03 13:24:26 -08:00
Callum Styan	6a24eee340	Simplify duration check for watcher WAL replay. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-11-26 16:53:11 -08:00
Chris Marchbanks	6f34e35b3e	Record the exact value of desired shards in metric. It is possible that desired shards is always a bit higher than the number of shards (less than 30%), and by exporting desired shards as the raw number it will be easy to tell if a Prometheus is in that situation. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-11-26 06:26:03 -07:00
Chris Marchbanks	0e684ca205	Fix unknown type in sharding up log Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-11-26 06:22:56 -07:00
Callum Styan	c2cb1e4103	Add a metric to track total bytes sent per remote write queue. (#6344 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-11-25 13:25:18 -07:00
Tom Wilkie	de0a772b8e	Port tsdb to use pkg/labels. (#6326 ) * Port tsdb to use pkg/labels. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Get tests passing. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Remove useless cast. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Appease linters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2019-11-18 11:53:33 -08:00
Callum Styan	5f1be2cf45	Refactor calculateDesiredShards + don't reshard if we're having issues sending samples. (#6111 ) * Refactor calculateDesiredShards + don't reshard if we're having issues sending samples. * Track lastSendTimestamp via an int64 with atomic add/load, add a test for reshard calculation. * Simplify conditional for skipping resharding, add samplesIn/Out to shard testcase struct. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-21 15:54:25 -06:00
Krasi Georgiev	81d284f806	Merge the 2.13 release branch to master (#6117 )	2019-10-09 17:41:46 +02:00
Callum Styan	84ff928606	Make sure the remote write storage uses a dedupe logger. (#6113 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-08 11:42:00 -06:00
Chris Marchbanks	8df4bca470	Garbage collect asynchronously in the WAL Watcher. The WAL Watcher replays a checkpoint after it is created in order to garbage collect series that no longer exist in the WAL. Currently the garbage collection process is done serially with reading from the tip of the WAL, which can cause large delays in writing samples to remote storage just after compaction occurs. This also fixes a memory leak where dropped series are not cleaned up as part of the SeriesReset process. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-10-07 14:36:10 -06:00
George Felix	895abbb7d0	Replaced test validations with testutils on storage/remote/codec_test.go (#6097 ) * Replaced test validations with testutils on storage/remote/codec_test.go Signed-off-by: George Felix <george.felix@ubeeqo.com> * gofmt Signed-off-by: George Felix <george.felix@ubeeqo.com> * Removed shouldPass assertion Signed-off-by: George Felix <gfelixc@gmail.com> * Fixes to improve readability Signed-off-by: George Felix <george.felix@ubeeqo.com> * Fixes based on code review comments Signed-off-by: George Felix <george.felix@ubeeqo.com>	2019-10-07 11:35:53 -06:00
陈谭军	103f26d188	fix the wrong word (#6069 ) Signed-off-by: chentanjun <2799194073@qq.com>	2019-09-30 09:54:55 -06:00
Callum Styan	3344bb5c33	Move WAL watcher code to tsdb/wal package. (#5999 ) * Move WAL watcher code to tsdb/wal package. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix tests after moving WAL watcher code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Lint fixes. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-09-19 14:45:41 +05:30
Björn Rabenstein	3b3eaf3496	Merge pull request #5787 from cstyan/reshard-max-logging Add metrics for max/min/desired shards to queue manager.	2019-09-09 22:32:54 +02:00
Chris Marchbanks	b4317768b9	Merge pull request #5849 from csmarchbanks/rw-use-labels Cache labels.Labels to Identify Series in Remote Write	2019-09-04 14:35:52 -06:00
Yao Zengzeng	f65b7c296d	fix TODO: only stop & recreate remote write queues which have changes (#5540 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	2019-09-04 11:21:53 -06:00
Chris Marchbanks	791a2409a2	Pre-allocate pendingSamples to reduce allocations Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-09-03 15:41:47 -06:00
Chris Marchbanks	160186da18	Store labels.Labels instead of []prompb.Label. This will use half the steady-state memory required by prompb.Label. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-09-03 15:41:46 -06:00
Stanislav Putrya	6141a8bd7c	Show warnings in UI if a query has returned warnings (#5964 ) * Show warnings in UI if a query has returned warnings + improve warning (error) text if the query to the remote finished with an error * Add prefixes for remote_read errors Signed-off-by: Stan Putrya <root.vagner@gmail.com>	2019-08-28 14:25:28 +01:00
Bartek Płotka	48b2c9c8ea	remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksummed reader (#5703 ) Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488 Changes: * Extended protobuf for chunked remote read and negotiation. * Added checksummed, chunked Writer/Reader. * Added server-side implementation for chunked streamed remote-read. Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2019-08-19 21:16:10 +01:00
Julius Volz	b5c833ca21	Update go.mod dependencies before release (#5883 ) * Update go.mod dependencies before release Signed-off-by: Julius Volz <julius.volz@gmail.com> * Add issue for showing query warnings in promtool Signed-off-by: Julius Volz <julius.volz@gmail.com> * Revert json-iterator back to 1.1.6 It produced errors when marshaling Point values with special float values. Signed-off-by: Julius Volz <julius.volz@gmail.com> * Fix expected step values in promtool tests after client_golang update Signed-off-by: Julius Volz <julius.volz@gmail.com> * Update generated protobuf code after proto dep updates Signed-off-by: Julius Volz <julius.volz@gmail.com>	2019-08-14 11:00:39 +02:00
Bartek Płotka	32be514845	Merge pull request #5805 from codesome/merge-tsdb Merge tsdb into prometheus	2019-08-13 11:39:41 +01:00
Chris Marchbanks	a6a55c433c	Improve desired shards calculation (#5763 ) The desired shards calculation now properly keeps track of the rate of pending samples, and uses the previously unused integralAccumulator to adjust for missing information in the desired shards calculation. Also, configure more capacity for each shard. The default capacity of 10 causes shards to block on each other while sending remote requests. Default to a 500 sample capacity and explain in the documentation that having more capacity will help throughput. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-13 10:10:21 +01:00
Ganesh Vernekar	5ecef3542d	Cleanup after merging tsdb into prometheus Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2019-08-13 14:04:14 +05:30
ethan	38ccf0157e	cleanup: correct func name in log message (#5852 ) Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>	2019-08-10 16:24:58 +01:00
Chris Marchbanks	529ccff07b	Remove all usages of stretchr/testify Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-08 19:49:27 -06:00
Vadym Martsynovskyy	8318aa2d5d	Check for duplicate label names in remote read (#5829 ) * Check for duplicate label names in remote read Also add test to confirm that #5731 is fixed * Use subtests in TestValidateLabelsAndMetricName * Really check that expectedErr matches err Signed-off-by: Vadym Martsynovskyy <vmartsynovskyy@gmail.com>	2019-08-07 16:13:10 +01:00
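A minimal sketch of a duplicate label name check, assuming a hypothetical `label` type and error text: sorting a copy by name places any duplicates next to each other, so a single linear pass finds them without mutating the caller's slice.

```go
package main

import (
	"fmt"
	"sort"
)

// label is a hypothetical name/value pair standing in for a protobuf label.
type label struct{ Name, Value string }

// validateLabels returns an error if any label name appears more than once.
func validateLabels(ls []label) error {
	sorted := append([]label(nil), ls...) // copy: don't reorder the input
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for i := 1; i < len(sorted); i++ {
		if sorted[i].Name == sorted[i-1].Name {
			return fmt.Errorf("duplicate label with name: %s", sorted[i].Name)
		}
	}
	return nil
}
```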
Callum Styan	c40a83f386	Add metrics for max shards, min shards, and desired shards. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-08-04 20:04:19 -07:00
AllenZMC	758c71b980	fix word `encourter` to `encounter` Signed-off-by: czm <zhongming.chang@daocloud.io>	2019-07-29 22:16:23 +08:00
Devin Trejo	d77f2aa29c	Only check last directory when discovering checkpoint number (#5756 ) * Only check last directory when discovering checkpoint number Signed-off-by: Devin Trejo <dtrejo@palantir.com> * Comments for checkpointNum Signed-off-by: Devin Trejo <dtrejo@palantir.com>	2019-07-15 17:53:58 +01:00
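A sketch of parsing a checkpoint index from the last path element only, assuming checkpoint directories are named `checkpoint.<digits>`; `checkpointNum` is written here as a hypothetical stand-alone function. Using `filepath.Base` means a parent directory that contains a dot, or the word "checkpoint", cannot affect the result.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// checkpointNum extracts N from a path ending in "checkpoint.N".
func checkpointNum(dir string) (int, error) {
	base := filepath.Base(dir) // inspect only the last directory
	const prefix = "checkpoint."
	if !strings.HasPrefix(base, prefix) {
		return 0, fmt.Errorf("not a checkpoint directory: %q", dir)
	}
	return strconv.Atoi(strings.TrimPrefix(base, prefix))
}
```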
Yao Zengzeng	3cde8a9941	pass error up if WALWatcher.segments() returns an error (#5741 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	2019-07-15 17:52:03 +01:00
Xigang Wang	445bcd1251	Update the runShard method and change len(pendingSamples) to n=len(pendingSamples) (#5708 ) Signed-off-by: xigang <wangxigang2014@gmail.com>	2019-07-09 19:09:11 +01:00
Chris Marchbanks	06f1ba73eb	Provide flag to compress the tsdb WAL Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-07-03 08:03:29 -06:00
Chris Marchbanks	475ca2ecd0	Update to tsdb 0.9.1 Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-07-03 07:51:11 -06:00
Chris Marchbanks	06bdaf076f	Remote Write Allocs Improvements (#5614 ) * Add benchmark for sample delivery * Simplify StoreSeries to have only one loop * Reduce allocations for pending samples in runShard * Only allocate one send slice per segment * Cache a buffer in each shard for snappy to use * Remove queue manager seriesMtx It is not possible for any of the places protected by the seriesMtx to be called concurrently so it is safe to remove. By removing the mutex we can simplify the Append code to one loop. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-06-27 19:48:21 +01:00
Chris Marchbanks	a38a54fa11	Split remote write storage into its own type. This allows other processes to reuse just the remote write code without having to use the remote read code as well. This will be used to create a sidecar capable of sending remote write payloads. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-06-27 10:11:02 +01:00
Thomas Jackson	91d7175eaa	Add storage.Warnings to LabelValues and LabelNames (#5673 ) Fixes #5661 Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>	2019-06-17 08:31:17 +01:00
Dmitry Shmulevich	0c0638b080	resolve race condition in maxGauge (#5647 ) * resolve race condition in maxGauge Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com>	2019-06-13 20:55:08 +01:00
Chris Marchbanks	840872a6f8	Fix remote storage config not updating correctly (#5555 ) * Update remote write and remote read separately * Add external labels to the remote write conf hash * Add unit tests for remote storage lifecycle Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-05-17 10:29:49 +01:00
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-03 15:11:28 +02:00
Callum Styan	3639d51eb6	Remote Storage: string interner should not panic in release (#5487 ) * Don't panic if we try to release a string that is not in the interner. * Move seriesMtx locking in QueueManager's StoreSeries function. This stops us from calling release for strings that aren't interned if there's a race between reading a checkpoint and storing new series labels, which could happen during checkpointing or reloading config. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-24 10:46:31 +01:00
Callum Styan	e87449b59d	Remote Write: Queue Manager specific metrics shouldn't exist if the queue no longer exists (#5445 ) * Unregister remote write queue manager specific metrics when stopping the queue manager. * Use DeleteLabelValues instead of Unregister to remove queue and watcher related metrics when we stop them. Create those metrics in the structs start functions rather than in their constructors because of the ordering of creation, start, and stop in remote storage ApplyConfig. * Add setMetrics function to WAL watcher so we can set the watchers metrics in it's Start function, but not have to call Start in some tests (causes data race). Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-23 09:49:17 +01:00
Callum Styan	b7538e7b49	Don't stop, recreate, and start remote storage QueueManagers if the (#5485 ) remote write config hasn't changed at all. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-23 09:47:18 +01:00
Romain Baugue	95193fa027	Exhaust every request body before closing it (#5166 ) (#5479 ) From the documentation: > The default HTTP client's Transport may not > reuse HTTP/1.x "keep-alive" TCP connections if the Body is > not read to completion and closed. This effectively enable keep-alive for the fixed requests. Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>	2019-04-18 09:50:37 +01:00
Vasily Sliouniaev	5be9a1426f	Prevent reshard concurrent with calling stop (#5460 ) * Prevent reshard concurrent with calling stop Signed-off-by: Vasily <v.sliouniaev@gmail.com>	2019-04-16 11:25:19 +01:00
Callum Styan	c2b88992a3	Remote Write: fix checkpoint reading (#5429 ) * Fix ReadCheckpoint to ensure that it actually reads all the contents of each segment in a checkpoint dir, or returns an error. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-09 10:52:44 +01:00
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-26 00:01:12 +01:00
Tom Wilkie	807fd33ecc	Review feedback. - Update read path to use labels.Labels. - Fix the tests. - Remove pack. - Remove unused function. - Fix race in tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Callum Styan	1a7923dde3	Add ref counting to string interning so we can remove a string when there are no longer any refs. Add tests for interning. Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	cbf5f13285	Naive string iterning for labes & values in the remote_write path. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	c7b3535997	Use pkg/relabelling in remote write. - Unmarshall external_labels config as labels.Labels, add tests. - Convert some more uses of model.LabelSet to labels.Labels. - Remove old relabel pkg (fixes #3647). - Validate external label names. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	2fa93595d6	More WAL remote_write tweaks. (#5300 ) * Consistently pre-lookup the metrics for a given queue in queue manager. * Don't open the WAL (for writing) in the remote_write code. * Add some more logging. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-05 12:21:11 +00:00
Tariq Ibrahim	1adb91738d	fix typo in recordType method of wal_watcher.go (#5297 ) Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-04 17:33:35 +01:00
Tariq Ibrahim	ab8e9b7423	fix typo in queue_manager.go comment (#5294 ) Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-03 11:35:29 +00:00
Tom Wilkie	67da8e7b46	Refactor and fix queue resharding (#5286 ) - Remove prometheus_remote_queue_last_send_timestamp_seconds metric. Its not particularly useful, we have highest_timestamp_seconds. - Factor out maxGauage, a gauge that only increases. - Change sharding calculations to use max samples in timestamp - max samples out timestamp (not rates). - Also include the ratio of samples dropped to correctly predict number of pending samples. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-01 11:04:26 -08:00
Callum Styan	b8106dd459	Review feedback: - Add a dropped samples EWMA and use it in calculating desired shards. - Update metric names and a log messages. - Limit number of entries in the dedupe logging middleware to prevent potential OOM. Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	512f549064	Refactor: inline decodeRecord in readSegment and don't bother decoding samples records if we're not tailing the segment, add a benchmark test and fix some other tests Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	f795942572	Decrement pending sample when queue exits. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	ee7efa93fe	Fix some tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	b69bdfb4d1	Store the checkpoint we read last, so that we don't keep reading the same checkpoint on each tick. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	efbd9559f4	Deal with corruptions in the WAL: - If we're replaying the WAL to get series records, skip that segment when we hit corruptions. - If we're tailing the WAL for samples, fail the watcher. - When the watcher fails, restart from the latest checkpoint - and only send new samples by updating startTime. - Tidy up log lines and error handling, don't return so many errors on quiting. - Expect EOF when processing checkpoints. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	adf5307470	Update wal LiveReader to ensure EOF is correctly propagated. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	d6258aea8f	Fix up remote write tests: - Tests that created a QueueManager were leaving behind files at the end of tests. - WAL replaying (readToEnd)tests seem to require extra time to finish now. - Some fixes to make staticcheck happy Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	184f06a981	Combine the record decoding metrics into one; break out garbage collection into a separate function. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
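Commit d77f2aa29c (#5756) changes checkpoint-number discovery so that only the last directory in the path is examined. The sketch below illustrates that idea and assumes checkpoint directories are named checkpoint.N. The function name follows the commit's mention of checkpointNum, but the body is illustrative and is not the actual implementation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// checkpointNum extracts N from a path whose last element is "checkpoint.N".
// Only filepath.Base(dir) is inspected, so dots in parent directories
// (e.g. "/data.v2/wal/checkpoint.000123") cannot confuse the split.
func checkpointNum(dir string) (int, error) {
	parts := strings.Split(filepath.Base(dir), ".")
	if len(parts) != 2 || parts[0] != "checkpoint" {
		return 0, fmt.Errorf("invalid checkpoint dir string: %s", dir)
	}
	// strconv.Atoi accepts zero-padded input such as "000123".
	return strconv.Atoi(parts[1])
}
```

If the split were applied to the full path instead, a parent directory such as data.v2 would add extra dot-separated pieces and the checkpoint number would be misparsed or rejected.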


318 commits