prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-26 06:04:05 -08:00

Author	SHA1	Message	Date
Chris Marchbanks	a11e73edda	Fix a deadlock between Batch and FlushAndShutdown (#10608) If FlushAndShutdown is called with a full batchQueue, and then Batch is called rather than the normal path of reading from a queue a deadlock might be encountered. Rather than having FlushAndShutdown having blocking code while holding a lock retry sending the batch every second. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-04-20 20:50:41 +02:00
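The retry approach described in the entry above can be sketched as follows. This is a minimal illustration, not the code from the commit: the `queue` type, the `tryEnqueue` helper, and the configurable `retry` interval are assumptions made for the example (the commit message only says the batch is retried every second).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// queue is a hypothetical, simplified stand-in for a remote-write shard queue.
type queue struct {
	mtx        sync.Mutex
	batch      []int      // partial batch not yet handed to a reader
	batchQueue chan []int // full batches waiting to be sent
}

// tryEnqueue hands the partial batch to batchQueue without blocking.
// It reports whether nothing is left to flush.
func (q *queue) tryEnqueue() bool {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	if len(q.batch) == 0 {
		return true
	}
	select {
	case q.batchQueue <- q.batch:
		q.batch = nil
		return true
	default:
		// batchQueue is full: give up the lock and let the caller retry later.
		return false
	}
}

// FlushAndShutdown retries the flush until it succeeds or done is closed,
// then closes batchQueue. The lock is never held while waiting.
func (q *queue) FlushAndShutdown(done <-chan struct{}, retry time.Duration) {
	for !q.tryEnqueue() {
		stop := false
		select {
		case <-time.After(retry):
		case <-done:
			stop = true
		}
		if stop {
			break
		}
	}
	close(q.batchQueue)
}

func main() {
	q := &queue{batch: []int{3}, batchQueue: make(chan []int, 1)}
	q.batchQueue <- []int{1, 2} // the queue starts out full

	received := make(chan int)
	go func() {
		n := 0
		for b := range q.batchQueue {
			n += len(b)
		}
		received <- n
	}()

	q.FlushAndShutdown(make(chan struct{}), 10*time.Millisecond)
	fmt.Println("samples received:", <-received)
}
```

Because the waiting happens outside the critical section, a concurrent reader that needs the same mutex can still make progress and free a slot in `batchQueue`.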
beorn7	7ee1836ef5	Merge branch 'main' into sparsehistogram	2022-04-05 18:31:19 +02:00
Wilbert Guo	83a2e52bc2	Add SyncForState Implementation for Ruler HA (#10070 ) * continuously syncing activeAt for alerts Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * add import Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Refactor SyncForState and add unit tests Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Format code Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Add hook for syncForState Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix go lint Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Refactor syncForState override implementation Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Add syncForState override func as argument to Update() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix go formatting Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix circleci test errors Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Remove overrideFunc as argument to run() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * remove the syncForState Signed-off-by: Yijie Qin <qinyijie@amazon.com> * use the override function to decide if need to replace the activeAt or not Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix test case Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix format Signed-off-by: Yijie Qin <qinyijie@amazon.com> * Trigger build Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fixing comments Signed-off-by: Yijie Qin <qinyijie@amazon.com> * return the result of map of alerts instead of single one Signed-off-by: Yijie Qin <qinyijie@amazon.com> * upper case the QueryforStateSeries Signed-off-by: Yijie Qin <qinyijie@amazon.com> * use a more generic rule group post process function type Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix indentation Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix gofmt Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix lint Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fixing naming 
Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix comments Signed-off-by: Yijie Qin <qinyijie@amazon.com> * add the lastEvalTimestamp as parameter Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fmt Signed-off-by: Yijie Qin <qinyijie@amazon.com> * change funcType to func Signed-off-by: Yijie Qin <qinyijie@amazon.com> Co-authored-by: Yijie Qin <qinyijie@amazon.com> Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>	2022-03-29 02:16:46 +02:00
beorn7	4210aac74a	Merge branch 'main' into sparsehistogram	2022-03-22 14:47:42 +01:00
beorn7	79376c1e94	Merge branch 'release-2.33' into beorn7/release	2022-03-08 17:42:49 +01:00
Chris Marchbanks	e970acb085	Fix deadlock between adding to queue and getting batch Do not block when trying to write a batch to the queue. This can cause appends to lock forever if the only thing reading from the queue needs the mutex to write. Instead, if batchQueue is full pop the sample that was just added from the partial batch and return false. The code doing the appending already handles retries with backoff. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-03-07 17:15:57 -07:00
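A non-blocking append of the kind described in the entry above might look like the sketch below. The `queue` type, its fields, and `newQueue` are hypothetical simplifications for illustration; only the behaviour (pop the just-added sample and return false when `batchQueue` is full) is taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a hypothetical, simplified stand-in for a remote-write shard queue.
type queue struct {
	mtx        sync.Mutex
	batch      []int
	batchSize  int
	batchQueue chan []int
}

func newQueue(batchSize, capacity int) *queue {
	return &queue{batchSize: batchSize, batchQueue: make(chan []int, capacity)}
}

// Append adds s to the partial batch. A full batch is handed to batchQueue
// without blocking. If batchQueue has no room, the sample that was just added
// is popped again and false is returned so the caller can retry with backoff.
func (q *queue) Append(s int) bool {
	q.mtx.Lock()
	defer q.mtx.Unlock()

	q.batch = append(q.batch, s)
	if len(q.batch) < q.batchSize {
		return true
	}
	select {
	case q.batchQueue <- q.batch:
		q.batch = make([]int, 0, q.batchSize)
		return true
	default:
		q.batch = q.batch[:len(q.batch)-1]
		return false
	}
}

func main() {
	q := newQueue(2, 1)
	fmt.Println(q.Append(1), q.Append(2)) // first batch fits into batchQueue
	fmt.Println(q.Append(3), q.Append(4)) // second batch finds batchQueue full
	<-q.batchQueue                        // a reader drains one batch
	fmt.Println(q.Append(4))              // the retry now succeeds
}
```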
Chris Marchbanks	afdc1decac	Write a test that reproduces the deadlock Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-03-07 17:15:51 -07:00
Łukasz Mierzwa	a4317bf0ec	Run gofumpt on all files (#10392) * Run gofumpt on all files Getting golangci-lint errors when building on my laptop, possibly because I have newer version of gofumpt than what it was formatted with. Run gofumpt -w -extra on all files as it will be needed in the future anyway. * Update golangci-lint to v1.44.2 v1.44.0 upgraded gofumpt so bumping version in CI will help keep formatting correct for everyone * Address golangci-lint error Getting 'error-strings: error strings should not be capitalized or end with punctuation or a newline' from revive here. Drop new line. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-03-03 17:21:05 +01:00
DrAuYueng	5a6e26556b	Add an option to use the external labels as selectors for the remote read endpoint (#10254) * An option to ignore external_labels Signed-off-by: DrAuYueng <ouyang1204@gmail.com>	2022-02-16 22:12:47 +01:00
Julien Pivotto	b0d70557b7	Merge pull request #10285 from prometheus/release-2.33	2022-02-12 00:02:24 +01:00
Chris Marchbanks	bfb1500a38	Fix deadlock when stopping a shard (#10279) If a queue is stopped and one of its shards happens to hit the batch_send_deadline at the same time a deadlock can occur where stop holds the mutex and will not release it until the send is finished, but the send needs the mutex to retrieve the most recent batch. This is fixed by using a second mutex just for writing. In addition, the test I wrote exposed a case where during shutdown a batch could be sent twice due to concurrent calls to queue.Batch() and queue.FlushAndShutdown(). Protect these with a mutex as well. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-02-11 07:07:41 -07:00
Matej Gera	2c61d29b2a	Tracing: Migrate to OpenTelemetry library (#9724) Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-01-25 11:08:04 +01:00
Eng Zer Jun	3e67654d37	refactor: use `T.TempDir()` and `B.TempDir` to create temporary directory The directory created by `T.TempDir()` and `B.TempDir()` is automatically removed when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.TempDir Reference: https://pkg.go.dev/testing#B.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-01-22 18:57:30 +08:00
Bryan Boreham	954c0e8020	remote_write: round desired shards up before check Previously we would reject an increase from 2 to 2.5 as being within 30%; by rounding up first we see this as an increase from 2 to 3. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
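The effect of rounding before the tolerance check, as described in the entry above, can be reproduced with a small sketch. The function names and the `roundFirst` switch are hypothetical; only the 30% band and the 2 to 2.5 example come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is the 30% band mentioned in the commit message.
const tolerance = 0.3

// withinTolerance reports whether desired lies within +/-30% of current.
func withinTolerance(current int, desired float64) bool {
	lower := float64(current) * (1 - tolerance)
	upper := float64(current) * (1 + tolerance)
	return desired >= lower && desired <= upper
}

// nextShards returns the shard count to use. With roundFirst set, the desired
// value is rounded up before the tolerance check (the new behaviour);
// otherwise the raw value is checked (the old behaviour).
func nextShards(current int, desired float64, roundFirst bool) int {
	if roundFirst {
		desired = math.Ceil(desired)
	}
	if withinTolerance(current, desired) {
		return current
	}
	return int(math.Ceil(desired))
}

func main() {
	fmt.Println("check raw value:  ", nextShards(2, 2.5, false)) // 2.5 is inside [1.4, 2.6]
	fmt.Println("round up first:   ", nextShards(2, 2.5, true))  // 3 is outside [1.4, 2.6]
}
```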
Bryan Boreham	6d01ce8c4d	remote_write: shard up more when backlogged Change the coefficient from 1% to 5%, so instead of targeting to clear the backlog in 100s we target 20s. Update unit test to reflect the new behaviour. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
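The relation between the coefficient and the target time in the entry above is simple arithmetic, sketched here under the assumption that the extra send rate added for catch-up is the coefficient times the number of pending samples per second. The function name and the sample backlog size are made up for the example.

```go
package main

import "fmt"

// secondsToClearBacklog assumes the queue adds coefficient*pending samples
// per second of extra send capacity, so the backlog drains in
// pending / (coefficient * pending) = 1 / coefficient seconds.
// pending must be greater than zero.
func secondsToClearBacklog(pending, coefficient float64) float64 {
	catchUpRate := coefficient * pending // extra samples sent per second
	return pending / catchUpRate
}

func main() {
	const pending = 50000.0 // hypothetical backlog size in samples
	fmt.Printf("coefficient 1%%: about %.0f s\n", secondsToClearBacklog(pending, 0.01))
	fmt.Printf("coefficient 5%%: about %.0f s\n", secondsToClearBacklog(pending, 0.05))
}
```

The backlog size cancels out, which is why the commit message can state the targets (100s and 20s) without mentioning how many samples are pending.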
Bryan Boreham	d588b14d9c	remote_write: detailed test for shard calculation Drive the input parameters to `calculateDesiredShards()` very precisely, to illustrate some questionable behaviour marked with `?!`. See https://github.com/prometheus/prometheus/issues/9178, https://github.com/prometheus/prometheus/issues/9207, Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Chris Marchbanks	ba03f7fc23	Merge pull request #10102 from prometheus/update-metrics-on-rw-fails Update sent timestamp when write irrecoverably fails	2022-01-05 10:46:09 -07:00
beorn7	e7592fe353	sparsehistogram: Address two TODOs Signed-off-by: beorn7 <beorn@grafana.com>	2022-01-04 12:48:59 +01:00
Goutham Veeramachaneni	6696b7a5f0	Don't update metrics on context cancellation Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-01-04 10:46:52 +01:00
Chris Marchbanks	dfa5cb7462	Merge pull request #10038 from charlesxsh/fix-TestReshardRaceWithStop add proper exit for loop	2022-01-03 09:02:45 -07:00
Goutham Veeramachaneni	1af81dc5c9	Update sent timestamp when write irrecoverably fails. We have an alert that fires when prometheus_remote_storage_highest_timestamp_in_seconds - prometheus_remote_storage_queue_highest_sent_timestamp_seconds becomes too high. But we have an agent that fires this when the remote "rate-limits" the user. This is because prometheus_remote_storage_queue_highest_sent_timestamp_seconds doesn't get updated when the remote sends a 429. I think we should update the metrics, and the change I made makes sense. Because if the requests fails because of connectivity issues, etc. we will never exit the `sendWriteRequestWithBackoff` function. It only exits the function when there is a non-recoverable error, like a bad status code, and in that case, I think the metric needs to be updated. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-01-03 11:13:48 +01:00
Shihao Xia	c3e7bfb813	add proper exit for loop Signed-off-by: Shihao Xia <charlesxsh@hotmail.com>	2021-12-29 23:48:11 -05:00
beorn7	86cc83b13c	storage: iterator fixes after merge Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-18 14:12:01 +01:00
beorn7	64c7bd2b08	Merge branch 'main' into sparsehistogram	2021-12-18 14:04:25 +01:00
Julien Pivotto	27343277fa	Merge release-2.32 forward into main (#10032 ) * storage: expose bug in iterators #10027 Signed-off-by: beorn7 <beorn@grafana.com> * storage: fix bug #10027 in iterators' Seek method Signed-off-by: beorn7 <beorn@grafana.com> * Append reporting metrics without limit If reporting metrics fails due to reaching the limit, this makes the target appear as UP in the UI, but the metrics are missing. This commit bypasses that limit for report metrics. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Remove check against cfg so interval/ timeout are always set (#10023) (#10031) Signed-off-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Nicholas Blott <blottn@tcd.ie> * Cut v2.32.1 Signed-off-by: Julius Volz <julius.volz@gmail.com> * Apply suggestions from code review Signed-off-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu> Co-authored-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev>	2021-12-17 23:18:38 +01:00
beorn7	0ede6ae321	storage: fix bug #10027 in iterators' Seek method Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-16 12:07:35 +01:00
beorn7	b042e29569	storage: expose bug in iterators #10027 Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-16 12:02:15 +01:00
beorn7	6f33ab2b35	Merge branch 'main' into sparsehistogram	2021-12-15 13:49:33 +01:00
Chris Marchbanks	0a8d28ea93	Merge pull request #9934 from bboreham/remote-write-struct remote-write: buffer struct instead of interface to reduce garbage-collection	2021-12-09 09:17:45 -07:00
Bryan Boreham	bd6436605d	Review feedback Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-09 14:40:44 +00:00
Sebastian Rabenhorst	d8b8678bd1	Log time series details for out-of-order samples in remote write receiver (#9894 ) * Improved out-of-order sample logs in write handler Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> sign commit Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Inlined logAppendError Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Update storage/remote/write_handler.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Fixed fmt Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> * Improved out-of-order sample logs in write handler Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> sign commit Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Inlined logAppendError Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>	2021-12-08 15:07:51 +00:00
detailyang	3e482c905f	fix:storage:avoid panic when iterator exhausted (#9945) Signed-off-by: detailyang <detailyang@gmail.com>	2021-12-07 19:50:00 +05:30
Bryan Boreham	50878ebe5e	remote-write: buffer struct instead of interface This reduces the amount of individual objects allocated, allowing sends to run a bit faster. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-03 14:30:42 +00:00
Bryan Boreham	c478d6477a	remote-write: benchmark just sending, on 20 shards Previously BenchmarkSampleDelivery spent a lot of effort checking each sample had arrived, so was largely showing the performance of test-only code. Increase the number of shards to be more realistic for a large workload. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-03 14:02:10 +00:00
Chris Marchbanks	e95d4ec3f1	Merge pull request #9830 from prometheus/batch-queues Batch samples before sending them to channels	2021-12-02 08:37:41 -07:00
Chris Marchbanks	c655684142	Subtract from enqueued samples/exemplars upon send Right now the values for enqueuedSamples and enqueuedExemplars is never subtracted leading to inflated values for failedSamples/failedExemplars when a hard shutdown of a shard occurs. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2021-11-30 12:54:50 -07:00
Chris Marchbanks	319249f9db	Batch samples before sending them to channels Channels can cause bottlenecks and tons of context switches when reading hundreds of thousands of samples per second from a single queue. Instead, pre-batch the samples to amortize the cost of the concurrency overhead. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2021-11-30 12:54:45 -07:00
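The pre-batching idea in the entry above can be illustrated by counting channel operations. This is a sketch with a hypothetical `sendBatched` helper and `int` samples, not the structure used in the commit.

```go
package main

import "fmt"

// sendBatched groups samples into slices of at most batchSize elements and
// sends each slice as a single channel operation. It returns the number of
// channel sends performed. batchSize must be greater than zero.
func sendBatched(samples []int, batchSize int, out chan<- []int) int {
	sends := 0
	for start := 0; start < len(samples); start += batchSize {
		end := start + batchSize
		if end > len(samples) {
			end = len(samples)
		}
		out <- samples[start:end]
		sends++
	}
	return sends
}

func main() {
	samples := make([]int, 10)
	out := make(chan []int, len(samples)) // buffered so the example needs no reader
	sends := sendBatched(samples, 4, out)
	close(out)
	fmt.Println("samples:", len(samples), "channel sends:", sends)
}
```

Sending one slice per batch means the synchronization cost of a channel operation is paid once per batch rather than once per sample.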
beorn7	68e02be963	Post-merge fixes Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-30 17:20:28 +01:00
beorn7	e4e24453fa	Merge branch 'main' into beorn7/merge2	2021-11-30 17:19:06 +01:00
Björn Rabenstein	4ce01e9770	storage: Rename ...Values methods to At... (#9889) This mirrors #9888 for the richer iterators we have with histograms in the game. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 16:23:04 +05:30
Björn Rabenstein	d677aa4b29	storage: Consolidate iterator method names (Values -> At) (#9888) `BufferedSeriesIterator` and `MemoizedSeriesIterator` use a method called `Values` for exactly the purpose for which all other iterators of the same kind use a method called `At`. That alone is confusing, but on top of that, the `Values` method only returns a single sample, not multiple values. I assume the naming has historical reasons. This commit makes it more consistent. It is now easier to read, and now `BufferedSeriesIterator` and `MemoizedSeriesIterator` implement `chunkenc.Iterator` like many other iterators, too. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 11:16:40 +01:00
Björn Rabenstein	b866db009b	storage: Fix and improve the Seek method of various iterators (#9878) There was a subtle and nasty bug in listSeriesIterator.Seek. In addition, the Seek call is defined to be a no-op if the current position of the iterator is already pointing to a suitable sample. This commit adds fast paths for this case to several potentially expensive Seek calls. Another bug was in concreteSeriesIterator.Seek. It always searched the whole series and not from the current position of the iterator. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 15:17:56 +05:30
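The two Seek properties named in the entry above (a no-op fast path when the current sample already qualifies, and searching from the current position instead of the start) can be sketched with a toy iterator. The `sample` and `listIterator` types below are hypothetical and much simpler than the iterators touched by the commit.

```go
package main

import (
	"fmt"
	"sort"
)

type sample struct {
	t int64
	v float64
}

// listIterator is a hypothetical iterator over samples sorted by timestamp.
type listIterator struct {
	samples []sample
	idx     int // -1 until the iterator is first advanced
}

func newListIterator(s []sample) *listIterator {
	return &listIterator{samples: s, idx: -1}
}

// Seek advances to the first sample with timestamp >= t and never moves
// backwards. It reports whether such a sample exists.
func (it *listIterator) Seek(t int64) bool {
	if it.idx == -1 {
		it.idx = 0
	}
	if it.idx >= len(it.samples) {
		return false
	}
	// Fast path: the current sample is already suitable, so Seek is a no-op.
	if it.samples[it.idx].t >= t {
		return true
	}
	// Search only the remainder, starting at the current position.
	start := it.idx
	it.idx = start + sort.Search(len(it.samples)-start, func(i int) bool {
		return it.samples[start+i].t >= t
	})
	return it.idx < len(it.samples)
}

// At returns the current sample. It must only be called after a successful Seek.
func (it *listIterator) At() sample { return it.samples[it.idx] }

func main() {
	it := newListIterator([]sample{{10, 1}, {20, 2}, {30, 3}})
	fmt.Println(it.Seek(15), it.At().t) // moves to the sample at t=20
	fmt.Println(it.Seek(5), it.At().t)  // no-op: stays at t=20
	fmt.Println(it.Seek(31))            // no sample at or after t=31
}
```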
Björn Rabenstein	7e42acd3b1	tsdb: Rework iterators (#9877) - Pick At... method via return value of Next/Seek. - Do not clobber returned buckets. - Add partial FloatHistogram support. Note that the promql package is now _only_ dealing with FloatHistograms, following the idea that PromQL only knows float values. As a byproduct, I have removed the histogramSeries metric. In my understanding, series can have both float and histogram samples, so that metric doesn't make sense anymore. As another byproduct, I have converged the sampleBuf and the histogramSampleBuf in memSeries into one. The sample type stored in the sampleBuf has been extended to also contain histograms even before this commit. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 13:24:23 +05:30
Ganesh Vernekar	26c0a433f5	Support appending different sample types to the same series (#9705) * Support appending different sample types to the same series Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix build Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-11-26 17:43:27 +05:30
Matheus Alcantara	e673805d67	storage/remote: use t.TempDir instead of ioutil.TempDir on tests (#9811) Signed-off-by: Matheus Alcantara <matheusssilv97@gmail.com>	2021-11-19 15:21:45 -05:00
Hu Shuai	eb43437d83	Fix golint issue (#9800) Signed-off-by: Hu Shuai <hus.fnst@fujitsu.com>	2021-11-18 09:26:07 +01:00
beorn7	5d4db805ac	Merge branch 'main' into sparsehistogram	2021-11-17 19:57:31 +01:00
Dieter Plaetinck	0fac9bb859	Add basic initial developer docs for TSDB (#9451 ) * Add basic initial developer docs for TSDB There's a decent amount of content already out there (blog posts, conference talks, etc), but: * when they get stale, they don't tend to get updated * they still leave me with questions that I'ld like to answer for developers (like me) who want to use, or work with, TSDB What I propose is developer docs inside the prometheus repository. Easy to find and harness the power of the community to expand it and keep it up to date. * perfect is the enemy of good. Let's have a base and incrementally improve * Markdown docs should be broad but not too deep. Source code comments can complement them, and are the ideal place for implementation details. Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * use example code that works out of the box Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Apply suggestions from code review Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * PR feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * more docs Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * PR feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Apply suggestions from code review Signed-off-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Apply suggestions from code review Signed-off-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Update tsdb/docs/usage.md Signed-off-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * final tweaks Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * workaround docs versioning issue Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Move example code 
to real executable, testable example. Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * cleanup example test and make sure it always reproduces Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * obtain temp dir in a way that works with older Go versions Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Fix Ganesh's comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-11-17 15:51:27 +05:30
beorn7	73858d7f82	storage: histogram support in memoized_iterator Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-15 21:55:58 +01:00
beorn7	9b30ca2598	promql: Support histogram in value string representation Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-15 20:36:44 +01:00
beorn7	4c28d9fac7	Move to histogram.Histogram pointers This is to avoid copying the many fields of a histogram.Histogram all the time. This also fixes a bunch of formerly broken tests. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-12 23:17:35 +01:00
Mateusz Gozdek	d8561dbfd8	storage/remote: make tests use separate remote write configs So tests can be run in parallel without races. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-10 09:40:43 +01:00
Mateusz Gozdek	116552cc58	storage/remote: check errors from ApplyConfig in tests So tests do not produce obscure errors when applying configuration fails. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-10 09:40:43 +01:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
beorn7	8f92c90897	Add TODOs and some minor tweaks Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-07 17:12:04 +01:00
Dieter Plaetinck	cda025b5b5	TSDB: demystify SeriesRefs and ChunkRefs (#9536) * TSDB: demystify seriesRefs and ChunkRefs The TSDB package contains many types of series and chunk references, all shrouded in uint types. Often the same uint value may actually mean one of different types, in non-obvious ways. This PR aims to clarify the code and help navigating to relevant docs, usage, etc much quicker. Concretely: * Use appropriately named types and document their semantics and relations. * Make multiplexing and demuxing of types explicit (on the boundaries between concrete implementations and generic interfaces). * Casting between different types should be free. None of the changes should have any impact on how the code runs. TODO: Implement BlockSeriesRef where appropriate (for a future PR) Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * agent: demystify seriesRefs and ChunkRefs Signed-off-by: Dieter Plaetinck <dieter@grafana.com>	2021-11-06 15:40:04 +05:30
Ganesh Vernekar	c8b267efd6	Get histograms from TSDB to the rate() function implementation Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-11-03 19:04:18 +05:30
Marco Pracucci	9f5ff5b269	Allow to disable trimming when querying TSDB (#9647) * Allow to disable trimming when querying TSDB Signed-off-by: Marco Pracucci <marco@pracucci.com> * Addressed review comments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added unit test Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed TrimDisabled to DisableTrimming Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-11-03 15:38:34 +05:30
sniper	f82e56fbba	fix request bytes size and continue is useless (#9635) Signed-off-by: kalmanzhao <kalmanzhao@tencent.com> Co-authored-by: kalmanzhao <kalmanzhao@tencent.com>	2021-11-03 14:40:31 +05:30
Mateusz Gozdek	b7bdf6fab2	Fix imports formatting According to `2829908806 (r58457095)`. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
lzhfromustc	9da5382103	storage/remote: Prevent two goroutines from endless loop (#8967) Signed-off-by: lzhfromustc <lzhfromustc@gmail.com>	2021-10-29 16:39:02 -07:00
lzhfromustc	d42be7be76	test:Fix two potential goroutine leaks (#8964) Signed-off-by: lzhfromustc <lzhfromustc@gmail.com>	2021-10-29 15:44:32 -07:00
Bryan Boreham	5afa606ecb	Remote-write: reuse memory for marshalling (#9412) By holding a `proto.Buffer` per shard and passing it down to where marshalling is done, we avoid creating a lot of garbage. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-10-29 14:44:40 -07:00
Robert Fratto	bc72a718c4	Initial draft of prometheus-agent (#8785 ) * Initial draft of prometheus-agent This commit introduces a new binary, prometheus-agent, based on the Grafana Agent code. It runs a WAL-only version of prometheus without the TSDB, alerting, or rule evaluations. It is intended to be used to remote_write to Prometheus or another remote_write receiver. By default, prometheus-agent will listen on port 9095 to not collide with the prometheus default of 9090. Truncation of the WAL cooperates on a best-effort case with Remote Write. Every time the WAL is truncated, the minimum timestamp of data to truncate is determined by the lowest sent timestamp of all samples across all remote_write endpoints. This gives loose guarantees that data from the WAL will not try to be removed until the maximum sample lifetime passes or remote_write starts functionining. Signed-off-by: Robert Fratto <robertfratto@gmail.com> * add tests for Prometheus agent (#22) * add tests for Prometheus agent * add tests for Prometheus agent * rearranged tests as per the review comments * update tests for Agent * changes as per code review comments Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com> * incremental changes to prometheus agent Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com> * changes as per code review comments Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com> * Commit feedback from code review Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Robert Fratto <robertfratto@gmail.com> * Port over some comments from grafana/agent Signed-off-by: Robert Fratto <robertfratto@gmail.com> * Rename agent.Storage to agent.DB for tsdb consistency Signed-off-by: Robert Fratto <robertfratto@gmail.com> * Consolidate agentMode ifs in cmd/prometheus/main.go Signed-off-by: Robert Fratto <robertfratto@gmail.com> * Document PreAction usage requirements better for agent mode flags Signed-off-by: Robert 
Fratto <robertfratto@gmail.com> * remove unnecessary defaultListenAddr Signed-off-by: Robert Fratto <robertfratto@gmail.com> * `go fmt ./tsdb/agent` and fix lint errors Signed-off-by: Robert Fratto <robertfratto@gmail.com> Co-authored-by: SriKrishna Paparaju <paparaju@gmail.com>	2021-10-29 16:25:05 +01:00
beorn7	a9008f5423	Merge branch 'main' into sparsehistogram	2021-10-19 17:14:23 +02:00
Ben Ye	fdbc40a9ef	Expose NewChainSampleIterator func (#9475) * expose NewChainSampleIterator func Signed-off-by: Ben Ye <ben.ye@bytedance.com> * add comment Signed-off-by: Ben Ye <ben.ye@bytedance.com> * update comments Signed-off-by: Ben Ye <ben.ye@bytedance.com>	2021-10-14 14:49:00 +05:30
beorn7	7a8bb8222c	Style cleanup of all the changes in sparsehistogram so far A lot of this code was hacked together, literally during a hackathon. This commit intends not to change the code substantially, but just make the code obey the usual style practices. A (possibly incomplete) list of areas: * Generally address linter warnings. * The `pgk` directory is deprecated as per dev-summit. No new packages should be added to it. I moved the new `pkg/histogram` package to `model` anticipating what's proposed in #9478. * Make the naming of the Sparse Histogram more consistent. Including abbreviations, there were just too many names for it: SparseHistogram, Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in general. Only add "Sparse" if it is needed to avoid confusion with conventional Histograms (which is rare because the TSDB really has no notion of conventional Histograms). Use abbreviations only in local scope, and then really abbreviate (not just removing three out of seven letters like in "Histo"). This is in the spirit of https://github.com/golang/go/wiki/CodeReviewComments#variable-names * Several other minor name changes. * A lot of formatting of doc comments. For one, following https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences , but also layout question, anticipating how things will look like when rendered by `godoc` (even where `godoc` doesn't render them right now because they are for unexported types or not a doc comment at all but just a normal code comment - consistency is queen!). * Re-enabled `TestQueryLog` and `TestEndopints` (they pass now, leaving them disabled was presumably an oversight). * Bucket iterator for histogram.Histogram is now created with a method. * HistogramChunk.iterator now allows iterator recycling. (I think @dieterbe only commented it out because he was confused by the question in the comment.) * HistogramAppender.Append panics now because we decided to treat staleness marker differently. 
Signed-off-by: beorn7 <beorn@grafana.com>	2021-10-11 13:02:03 +02:00
beorn7	fd5ea4e0b5	Merge branch 'main' into sparsehistogram	2021-10-07 23:16:42 +02:00
Bryan Boreham	1fb3c1b598	Replace calls to strings.Compare (#9397) < is clearer and faster. As the documentation says, "Basically no one should use strings.Compare." Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-09-27 17:33:53 +05:30
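For readers unfamiliar with the replacement in the entry above, the sketch below shows that a `strings.Compare`-based "less" test and the built-in `<` operator order strings the same way. The helper names and label names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lessWithCompare is the style the commit moves away from.
func lessWithCompare(a, b string) bool { return strings.Compare(a, b) < 0 }

// lessWithOperator uses the built-in comparison operator instead.
func lessWithOperator(a, b string) bool { return a < b }

func main() {
	names := []string{"job", "instance", "__name__"}
	sort.Slice(names, func(i, j int) bool { return lessWithOperator(names[i], names[j]) })
	fmt.Println(names)

	// Both helpers agree on every pair, including equal strings.
	for _, p := range [][2]string{{"a", "b"}, {"b", "a"}, {"a", "a"}} {
		fmt.Println(p, lessWithCompare(p[0], p[1]) == lessWithOperator(p[0], p[1]))
	}
}
```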
Julien Pivotto	63b3e4e5ec	Enable HTTP2 again (#9398) We are re-enabling HTTP/2. There have been a few bugfixes upstream in Go, and we have also enabled ReadIdleTimeout. Fix #7588 Fix #9068 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-09-26 23:16:12 +02:00
Nick Pillitteri	acee8c8a88	Redact remote write URL when used for metric label (#9383) Redact any basic auth passwords in the remote write URL (which are technically allowed although not recommended) when used as metric labels. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>	2021-09-23 12:34:09 -06:00
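One way to redact a basic auth password from a URL in Go is the standard library's `url.URL.Redacted` method, sketched below. Whether the commit uses this method or a different approach is not stated in the message; the `redactedEndpoint` helper and the example URL are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactedEndpoint parses raw and returns it with any password replaced,
// so the result can be used as a metric label value or in logs.
func redactedEndpoint(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Redacted(), nil
}

func main() {
	s, err := redactedEndpoint("https://user:secret@example.com/api/v1/write")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s) // https://user:xxxxx@example.com/api/v1/write
}
```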
Serge Catudal	d77c985f8c	Add initial support for exemplar to the remote write receiver endpoint (#9319) * Add initial support for exemplar to the remote write receiver endpoint Signed-off-by: Serge Catudal <serge.catudal@gmail.com> * Update storage remote write handler tests with exemplars Signed-off-by: Serge Catudal <serge.catudal@gmail.com> * Update remote write handler in order to have a distinct checkAppendExemplarError function from scrape Signed-off-by: Serge Catudal <serge.catudal@gmail.com>	2021-09-21 14:53:27 -06:00
Paweł Szulik	f5563bfe95	tests: Move from t.Errorf and others. (Part 2) (#9309) * Refactor util tests. Signed-off-by: Paweł Szulik <paul.szulik@gmail.com>	2021-09-13 21:19:20 +02:00
Levi Harrison	bd57cd395e	Switch to common/sigv4 Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-08-26 09:37:19 -04:00
Ganesh Vernekar	eedb86783e	Fix queries on blocks for sparse histograms and add unit test (#9209) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-08-16 18:52:29 +05:30
Ganesh Vernekar	095f572d4a	Sync sparsehistogram branch with main (#9189 ) * Fix `kuma_sd` targetgroup reporting (#9157) * Bundle all xDS targets into a single group Signed-off-by: austin ce <austin.cawley@gmail.com> * Snapshot in-memory chunks on shutdown for faster restarts (#7229) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Rename links Signed-off-by: Levi Harrison <git@leviharrison.dev> * Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921) * Moved everything to nPending buffer Signed-off-by: Levi Harrison <git@leviharrison.dev> * Simplify exemplar capacity addition Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added pre-allocation Signed-off-by: Levi Harrison <git@leviharrison.dev> * Don't allocate if not sending exemplars Signed-off-by: Levi Harrison <git@leviharrison.dev> * Avoid deadlock when processing duplicate series record (#9170) * Avoid deadlock when processing duplicate series record `processWALSamples()` needs to be able to send on its output channel before it can read the input channel, so reads to allow this in case the output channel is full. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * processWALSamples: update comment Previous text seems to relate to an earlier implementation. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Optimise WAL loading by removing extra map and caching min-time (#9160) * BenchmarkLoadWAL: close WAL after use So that goroutines are stopped and resources released Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * BenchmarkLoadWAL: make series IDs co-prime with #workers Series are distributed across workers by taking the modulus of the ID with the number of workers, so multiples of 100 are a poor choice. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * BenchmarkLoadWAL: simulate mmapped chunks Real Prometheus cuts chunks every 120 samples, then skips those samples when re-reading the WAL. 
Simulate this by creating a single mapped chunk for each series, since the max time is all the reader looks at. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Fix comment Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Remove series map from processWALSamples() The locks that is commented to reduce contention in are now sharded 32,000 ways, so won't be contended. Removing the map saves memory and goes just as fast. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * loadWAL: Cache the last mmapped chunk time So we can skip calling append() for samples it will reject. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Improvements from code review Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Full stops and capitals on comments Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Cache max time in both places mmappedChunks is updated Including refactor to extract function `setMMappedChunks`, to reduce code duplication. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Update head min/max time when mmapped chunks added This ensures we have the correct values if no WAL samples are added for that series. Note that `mSeries.maxTime()` was always `math.MinInt64` before, since that function doesn't consider mmapped chunks. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Split Go and React Tests (#8897) * Added go-ci and react-ci Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu> Signed-off-by: Levi Harrison <git@leviharrison.dev> * Remove search keymap from new expression editor (#9184) Signed-off-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu> Co-authored-by: Bryan Boreham <bjboreham@gmail.com> Co-authored-by: Julius Volz <julius.volz@gmail.com>	2021-08-11 15:43:17 +05:30
Levi Harrison	fac1b57334	Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921) * Moved everything to nPending buffer Signed-off-by: Levi Harrison <git@leviharrison.dev> * Simplify exemplar capacity addition Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added pre-allocation Signed-off-by: Levi Harrison <git@leviharrison.dev> * Don't allocate if not sending exemplars Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-08-09 15:20:53 -06:00
Ganesh Vernekar	8b70e87ab9	Merge remote-tracking branch 'upstream/main' into sparse-refactor Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-08-05 12:16:08 +05:30
jinglina	ed24e51e7c	remove redundant type conversion (#9126) Signed-off-by: jinglina <jinglinax@163.com>	2021-07-28 13:33:46 +05:30
Bryan Boreham	60804c5a09	remote_write: reduce blocking from garbage-collect of series (#9109 ) * Refactor: pass segment-reading function as param To allow a different implementation to be used when garbage-collecting. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * remote_write: reduce blocking from GC of series Add a method `UpdateSeriesSegment()` which is used together with `SeriesReset()` to garbage-collect old series. This allows us to split the lock around queueManager series data and avoid blocking `Append()` while reading series from the last checkpoint. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Cosmetic: review feedback on comments Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * remote-write benchmark: include GC of series Reduce the total number of samples per iteration from 50005000 (25 million) which is too big for my laptop, to 110000. Extend `createTimeseries()` to add additional labels, so that the queue manager is doing more realistic work. Move the Append() call to a background goroutine - this works because TestWriteClient uses a WaitGroup to signal completion. Call `StoreSeries()` and `SeriesReset()` while adding samples, to simulate the garbage-collection that wal.Watcher does. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Change BenchmarkSampleDelivery to call UpdateSeriesSegment This matches what Watcher.garbageCollectSeries() is doing now Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-07-27 13:21:48 -07:00
Oleg Zaytsev	b1ed4a0a66	LabelNames API with matchers (#9083 ) * Push the matchers for LabelNames all the way into the index. NB This doesn't actually implement it in the index, just plumbs it through for now... Signed-off-by: Tom Wilkie <tom@grafana.com> * Hack it up. Does not work. Signed-off-by: Tom Wilkie <tom@grafana.com> * Revert changes I don't understand Can't see why do we need to hold a mutex on symbols, and the purpose of the LabelNamesFor method. Maybe I'll need to re-add this later. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Implement LabelNamesFor This method provides the label names that appear in the postings provided. We do that deeper than the label values because we know beforehand that most of the label names we'll be the same across different postings, and we don't want to go down an up looking up the same symbols for all different series. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Mutex on symbols should be unlocked However, I still don't understand why do we need a mutex here. 
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Fix head.LabelNamesFor Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Implement mockIndex LabelNames with matchers Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Nitpick on slice initialisation Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add tests for LabelNamesWithMatchers Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Fix the mutex mess on head.LabelValues/LabelNames I still don't see why we need to grab that unrelated mutex, but at least now we're grabbing it consistently Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Check error after iterating postings Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Use the error from posting when there was en error in postings Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update storage/interface.go comment Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * Update tsdb/index/index.go comment Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * Update tsdb/index/index.go wrapped error msg Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * Update tsdb/index/index.go wrapped error msg Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * Update tsdb/index/index.go warpped error msg Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * Remove unneeded comment Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add testcases for LabelNames w/matchers in api.go Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Use t.Cleanup() instead of defer in tests Signed-off-by: Oleg 
Zaytsev <mail@olegzaytsev.com> Co-authored-by: Tom Wilkie <tom@grafana.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2021-07-20 18:08:08 +05:30
Martin Disibio	1bcd13d6b5	Exemplar resize (#8974 ) * Create experimental circular buffer resize method, benchmarks Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Optimize exemplar resize to only replay as many exemplars as needed Signed-off-by: Martin Disibio <mdisibio@gmail.com> * More comments, benchmark AddExemplar Signed-off-by: Martin Disibio <mdisibio@gmail.com> * optimizations Signed-off-by: Martin Disibio <mdisibio@gmail.com> * comment Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Slight refactor of resize benchmark + make use of resize via runtime reloadable storage config. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Some more config related changes. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Refactor to remove usage of noopExemplarStorage and avoid race condition when resizing from Head code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix or add comments to clarify some of the new behaviour. Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix potential panics related to negative exemplar buffer lengths Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Callum Styan <callumstyan@gmail.com>	2021-07-20 10:22:57 +05:30
Ganesh Vernekar	a1087ed37a	Fix scraping of sparse histograms (#9031) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-07-01 18:19:04 +05:30
Ganesh Vernekar	f4d3af73f0	Query histograms from TSDB and unit test for append+query (#9022) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-30 20:18:13 +05:30
Ganesh Vernekar	04ad56d9b8	Append sparse histograms into the Head block (#9013) * Append sparse histograms into the Head block Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Add AtHistogram() to Iterator interface. Make HistoChunk conform to Chunk interface. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-29 20:08:46 +05:30
Ganesh Vernekar	64bea6999e	HistogramAppender interface for sparse histograms (#9007) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-28 20:30:55 +05:30
Levi Harrison	d5c3c567d3	Remote Write: Add max samples per metadata send (#8959) * Added MaxSamplesPerSend Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added tests Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed order of require Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added docs Signed-off-by: Levi Harrison <git@leviharrison.dev> * writes -> writesReceived Signed-off-by: Levi Harrison <git@leviharrison.dev> * Improved send loop Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-24 15:39:50 -07:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
kongxs	632678a461	Fix spelling mistake (#8879) * Fix spelling mistake Signed-off-by: kjinan <2008kongxiangsheng@163.com> * Update discovery/kubernetes/endpoints.go Co-authored-by: Julien Pivotto <roidelapluie@gmail.com> Signed-off-by: kjinan <2008kongxiangsheng@163.com> Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>	2021-06-01 00:49:29 +02:00
Julien Pivotto	ae086c73cb	Merge pull request #8757 from songjiayang/refactor-processExternalLabels Refactor processExternalLabels method with slice copy for left labels	2021-05-25 18:12:16 +02:00
Ben Ye	d95b097250	expose seriesToChunkEncoder (#8845) Signed-off-by: yeya24 <yb532204897@gmail.com>	2021-05-19 13:01:35 +01:00
Matthias Loibl	7e7efaba32	storage: Split chunks if more than 120 samples (#8582 ) * storage: Split chunks if more than 120 samples Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * storage: Don't set maxt which is overwritten right away Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * storage: Improve comments on merge_test Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * storage: Improve comments and move code closer to usage Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * tsdb/tsdbutil: Add comment for GenerateSamples Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>	2021-05-18 18:37:16 +02:00
songjiayang	9a01472780	Refactor processExternalLabels method with slice copy for left labels Signed-off-by: songjiayang <songjiayang1@gmail.com>	2021-05-12 21:31:41 +08:00
Hu Shuai	996848ef40	Fix golint issue Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>	2021-05-08 11:45:29 +08:00
Callum Styan	8fd73b1d28	Add Exemplar Remote Write support (#8296 ) * Write exemplars to the WAL and send them over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update example for exemplars, print data in a more obvious format. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add metrics for remote write of exemplars. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix incorrect slices passed to send in remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * We need to unregister the new metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * Order of exemplar append vs write exemplar to WAL needs to change. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Condense sample/exemplar delivery tests to parameterized sub-tests Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename test methods for clarity now that they also handle exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename counter variable. Fix instances where metrics were not updated correctly Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Add exemplars to LoadWAL benchmark Signed-off-by: Callum Styan <callumstyan@gmail.com> * last exemplars timestamp metric needs to convert value to seconds with ms precision Signed-off-by: Callum Styan <callumstyan@gmail.com> * Process exemplar records in a separate go routine when loading the WAL. 
Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments related to clarifying comments and variable names. Also refactor sample/exemplar to enqueue prompb types. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Regenerate types proto with comments, update protoc version again. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Put remote write of exemplars behind a feature flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some of Ganesh's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move exemplar remote write feature flag to a config file field. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address Bartek's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allocate exemplar buffers in queue_manager if we're not going to send exemplars over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add ValidateExemplar function, validate exemplars when appending to head and log them all to WAL before adding them to exemplar storage. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more reivew comments from Ganesh. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add exemplar total label length check. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address a few last review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-05-06 13:53:52 -07:00
ZouYu	c7262f0d70	Fix some gofmt warnings (#8743) Signed-off-by: Zou Yu <zouy.fnst@cn.fujitsu.com>	2021-04-22 08:43:30 -06:00
Marco Pracucci	4da5c25ea4	Upgrade prometheus/common to v0.21.0 Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-04-21 12:19:16 +02:00
Christian Simon	9781e51f59	Correct spelling of "iterable" (#8713) Signed-off-by: Christian Simon <simon@swine.de>	2021-04-12 21:43:42 +01:00
Bryan Boreham	c7a62b95ce	GetRef() now returns the label set (#8641) The purpose of GetRef() is to allow Append() to be called without the caller needing to copy the labels. To avoid a race where a series is removed from TSDB between the calls to GetRef() and Append(), we return TSDB's copy of the labels. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-03-24 15:24:58 +00:00
Bryan Boreham	d614ae9ecf	[RFC] Add method to get reference number for TSDB Appender (#8600 ) * Add method to get reference number for TSDB Appender In situations where we need to copy labels before calling Add(), GetRef() allows to check first, then call AddFast() in the case that the series is already known. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Add explicit interface for GetRef() method Suggested in code review by @bwplotka Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Rename OptionalGetRef to GetRef Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Simplify return value of GetRef() 0 can be relied on to mean 'no reference' Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-03-19 19:28:55 +00:00
Marco Pracucci	e246670193	Further increase max log line in remote write client (#8616) Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-03-18 16:53:18 +00:00
Callum Styan	289ba11b79	Add circular in-memory exemplars storage (#6635 ) * Add circular in-memory exemplars storage Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Signed-off-by: Martin Disibio <mdisibio@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com> * Fix some comments, clean up exemplar metrics struct and exemplar tests. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix exemplar query api null vs empty array issue. Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-03-16 15:17:45 +05:30
Julien Pivotto	76750d2a96	Merge pull request #8585 from pracucci/optimize-buffered-iterator Optimized vectorSelectorSingle()	2021-03-15 11:50:46 +01:00
Marco Pracucci	6f050f66c7	Update storage/memoized_iterator.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>	2021-03-15 09:27:20 +01:00
Julius Volz	cf4250cff3	Fix sample deduplication in chainSampleIterator Fixes https://github.com/prometheus/prometheus/issues/8558 Signed-off-by: Julius Volz <julius.volz@gmail.com>	2021-03-12 12:34:23 +01:00
Marco Pracucci	b92c03023d	Optimized vector selector Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-03-11 14:32:56 +01:00
Robert Fratto	5b78aa0649	Contribute grafana/agent sigv4 code (#8509 ) * Contribute grafana/agent sigv4 code * address review feedback - move validation logic for RemoteWrite into unmarshal - copy configuration fields from ec2 SD config - remove enabled field, use pointer for enabling sigv4 * Update config/config.go * Don't provide credentials if secret key / access key left blank * Add SigV4 headers to the list of unchangeable headers. * sigv4: don't include all headers in signature * only test for equality in the authorization header, not the signed date * address review feedback 1. s/httpClientConfigEnabled/httpClientConfigAuthEnabled 2. bearer_token tuples to "authorization" 3. Un-export NewSigV4RoundTripper * add x-amz-content-sha256 to list of unchangeable headers * Document sigv4 configuration * add suggestion for using default AWS SDK credentials Signed-off-by: Robert Fratto <robertfratto@gmail.com> Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>	2021-03-08 12:20:09 -07:00
Julien Pivotto	9c4bc38c94	Merge pull request #8516 from Harkishen-Singh/headers-remote-read-on-round-tripper Custom headers on remote-read and refactor implementation to roundtripper	2021-02-26 17:55:07 +01:00
Tom Wilkie	ce97cdd477	Move remote read handler to remote package. (#8536 ) * Move remote read handler to remote package. This follows the pattern I started with the remote write handler. The api/v1 package is getting pretty cluttered. Moving code to other packages helps reduce this size and also makes it reusable - eg Cortex doesn't do streaming remote writes yet, and will very soon. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Deal with a nil remoteReadHandler for tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Remove the global metrics. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Fix test. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Review feedback. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-26 16:43:19 +00:00
Harkishen-Singh	79ba53a6c4	Custom headers on remote-read and refactor implementation to roundtripper. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2021-02-26 17:20:29 +05:30
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
Harkishen-Singh	77c20fd2f8	Adds support to configure retry on Rate-Limiting from remote-write config. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2021-02-16 14:52:49 +05:30
Harkishen Singh	cd412470d7	Consider status code 429 as recoverable errors to avoid resharding (#8237) * Consider status code 429 as recoverable errors to avoid resharding. * Adds support for Retry-After in backoff logic in remote storage. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2021-02-10 15:25:37 -07:00
fuling	47d7e3781f	[fix] remote_storage : change "write_hander.go" to "write_handler.go" Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2021-02-10 14:25:04 +08:00
Mauro Stettler	7715fe3219	Add matchers to LabelValues() call (#8400 ) * Accept matchers in querier LabelValues() Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * create matcher to only select metrics which have searched label Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * test case for merge querier with matchers Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * test LabelValues with matchers on head Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * add test for LabelValues on block Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * formatting fix Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Add comments Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * add missing lock release Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * remove unused parameter Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Benchmarks for LabelValues() methods on block/head Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Better comment Co-authored-by: Julien Pivotto <roidelapluie@gmail.com> Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * update comment Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * minor refactor make code cleaner Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * better comments Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * fix expected errors in test Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Deleting parameter which can only be empty Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * fix comments Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * remove unnecessary lock Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * only lookup label value if label name was looked up Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Return error when there is one Co-authored-by: Ganesh Vernekar 
<15064823+codesome@users.noreply.github.com> Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Call .Get() on decoder before checking errors Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * only lock head.symMtx when necessary Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * remove unnecessary delete() Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * re-use code instead of duplicating it Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Consistently return error from LabelValueFor() Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * move helper func from util.go to querier.go Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Fix test expectation Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * ensure result de-duplication and sorting works Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * return named error from LabelValueFor() Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Julien Pivotto <roidelapluie@gmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2021-02-09 23:08:35 +05:30
Tom Wilkie	d479151f1f	Various enhancements and refactorings for remote write receiver: - Remove unrelated changes - Refactor code out of the API module - that is already getting pretty crowded. - Don't track reference for AddFast in remote write. This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series. For now, its easier and safer to always use the 'slow' path. - Return 400 on out of order samples. - Use remote.DecodeWriteRequest in the remote write adapters. - Put this behing the 'remote-write-server' feature flag - Add some (very) basic docs. - Used named return & add test for commit error propagation Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-08 20:41:23 +00:00
Julien Pivotto	6368227f0c	Merge pull request #8443 from liguozhong/remote_storage [remote storage] remove sendWriteRequestWithBackoff() "s" and "req" param	2021-02-04 23:28:14 +01:00
Nándor István Krácser	509000269a	remote_write: allow passing along custom HTTP headers (#8416 ) * remote_write: allow passing along custom HTTP headers Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * add warning Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * remote_write: add header valadtion Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * extend tests for bad remote write headers Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * remote_write: add note about the authorization header Signed-off-by: Nandor Kracser <bonifaido@gmail.com>	2021-02-04 14:18:13 -07:00
fuling	4a407210f5	[remote storage] remove sendWriteRequestWithBackoff() "s" and "req" param Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2021-02-04 21:38:32 +08:00
Chris Marchbanks	275f7e7766	Log recoverable remote write errors as warnings (#8412) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2021-01-27 09:38:34 -07:00
kevinForMyself	db445844d3	Fix garbage collection about t.droppedSeries in QueueManager.SeriesReset. (#8387) * Fix memory leak about t.droppedSeries in QueueManager.SeriesReset. Signed-off-by: kevinForMyself <zise_2001@163.com> * Fix garbage collection about t.droppedSeries in QueueManager.SeriesReset Signed-off-by: kevinForMyself <zise_2001@163.com>	2021-01-22 08:03:10 -07:00
Ben Ye	caa173d2aa	Support matchers for Labels API (#8301) Signed-off-by: Ben Ye <yb532204897@gmail.com> Co-authored-by: Erik Klockare <eklockare@gmail.com>	2020-12-22 11:02:19 +00:00
gotjosh	4eca4dffb8	Allow metric metadata to be propagated via Remote Write. (#6815 ) * Introduce a metadata watcher Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage. Signed-off-by: gotjosh <josue@grafana.com> * Additional fixes after rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Rework samples/metadata metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Use more descriptive variable names in MetadataWatcher collect. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues caused during rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix missing metric add and unneeded config code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix metrics and docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Replace assert with require Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Bring back max_samples_per_send metric Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-11-19 20:53:03 +05:30
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00
Bartlomiej Plotka	3d8826a3d4	MultiError: Refactored MultiError for more concise and safe usage. (#8066 ) * MultiError: Refactored MultiError for more concise and safe usage. * Less lines * Goland IDE was marking every usage of old MultiError "potential nil" error * It was easy to forgot using Err() when error was returned, now it's safely assured on compile time. NOTE: Potentially I would rename package to merrors. (: In different PR. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed review comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix after rebase. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-10-28 15:24:58 +00:00
Jorge Luis Betancourt	4dc755cd27	Add a metric for tracking max_samples_per_send (#8102) Currently there is no way of tracking the value of the `max_samples_per_send` configuration option, which is commonly tweaked when integrating with a remote write backend. Signed-off-by: Jorge Luis Betancourt Gonzalez <jorge-luis.betancourt@trivago.com>	2020-10-28 11:39:36 +00:00
Julien Pivotto	1282d1b39c	Refactor test assertions (#8110) * Refactor test assertions This pull request gets rid of assert.True where possible to use fine-grained assertions. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-27 11:06:53 +01:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00
Julien Pivotto	8c9850c2bb	Remote: Do not collect non-initialized timestamp metrics (#8060) * Remote: Do not collect non-initialized timestamp metrics Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-15 23:53:59 +02:00
Frederic Branczyk	21a2e8c320	storage/remote/intern_test.go: Fix linting errors Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>	2020-10-08 09:50:37 +02:00
Siddhant Sinha	d9dcf2c90f	Fixes #7982. Increase maxErrMsgLen value remote_write api (#8017) Signed-off-by: Siddhant Sinha <sid.sinha94@gmail.com>	2020-10-08 00:13:09 +01:00
Harkishen Singh	072b9649a3	Refactor vars to avoid test failures in storage/remote with -count > 1 (#7934 ) * Refactor global vars to avoid failure with run test more than once. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com> * Register highestRecvTimestamp metric. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com> * Use local interner vars. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com> * Declare interner in write storage. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2020-09-24 12:44:18 -06:00
Chris Marchbanks	f0f8e50567	Fix missing remote read spans (#7914) The remote read client needs to use the nethttp.Transport wrapper in order for spans to be instrumented properly. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-09-09 14:03:48 +02:00
Julien Pivotto	a6ee1f8517	Merge pull request #7913 from prometheus/release-2.21 Merge release 2.21 into master	2020-09-09 11:08:32 +02:00
Bartlomiej Plotka	088fcc9e48	Fixed iterator regression: Avoid using heap for each sample when iterating. (#7900 ) * Fixed iterator regression: Avoid using heap for each sample when iterating. Fixes: https://github.com/prometheus/prometheus/issues/7873 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Check for .At() called after .Next() returned false Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * More comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-09-08 16:23:01 +02:00
Bartlomiej Plotka	a399227a9f	Revert "Fixed iterator regression: Avoid using heap for each sample when iterating." This reverts commit `2c8b2c5915`.	2020-09-04 17:10:42 +01:00
Julien Pivotto	2c8b2c5915	Fixed iterator regression: Avoid using heap for each sample when iterating. Fixes: https://github.com/prometheus/prometheus/issues/7873 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-09-04 17:09:40 +01:00
Anand Sanmukhani	f4a1b3cef6	Rename `remote.newReadClient()` to `remote.NewReadClient()` (#7881 ) This should make the `NewReadClient()` exported outside the `remote` package Signed-off-by: Anand Sanmukhani <asanmukh@redhat.com>	2020-09-02 17:15:10 +01:00
showuon	dfdc358a5b	Fix the duplicated results issue from /api/v1/series (#7862 ) * Fix the duplicated results issue from /api/v1/series Signed-off-by: Luke Chen <showuon@gmail.com>	2020-08-29 01:21:39 +02:00
Robert Fratto	2bd077ed97	expose UserAgent to make it changeable by importers (#7832 ) Signed-off-by: Robert Fratto <robertfratto@gmail.com>	2020-08-25 10:38:37 -06:00
Joe Elliott	624075eafe	Exposes remote storage http client to allow for customization (#7831 ) * Exposes remote write client to allow for customizations to the http client Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed changes to vendored file Signed-off-by: Joe Elliott <number101010@gmail.com>	2020-08-20 08:45:31 -06:00
Andy Bursavich	4e6a94a27d	Invert service discovery dependencies (#7701 ) This also fixes a bug in query_log_file, which now is relative to the config file like all other paths. Signed-off-by: Andy Bursavich <abursavich@gmail.com>	2020-08-20 13:48:26 +01:00
Julien Pivotto	a55c69c4c3	Apply gofmt -s on storage/remote/write_test.go (#7791 ) Noticed in https://goreportcard.com/report/github.com/prometheus/prometheus This is the only file. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-12 19:55:42 -06:00
Mucahit Kurt	869f1bc587	use remote.Storage in TestSampleDelivery instead of direct creation of QueueManager (#4758 ) Signed-off-by: Mucahit Kurt <mucahitkurt@gmail.com>	2020-08-11 12:37:03 -07:00
johncming	2b75c1b199	storage/remote: rename httpclient name. (#7747 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-10 11:32:43 -07:00
Guangwen Feng	2c4a4548a8	Fix golint issue caused by incorrect func name (#7756 ) Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>	2020-08-06 20:27:37 +01:00
Bartlomiej Plotka	28c5cfaf0d	tsdb: Moved code merge series and iterators to differen files; cleanup. No functional changes just move! (#7714 ) I did not want to move those in previous PR to make it easier to review. Now small cleanup time for readability. (: ## Changes * Merge series goes to `storage/merge.go` leaving `fanout.go` for just fanout code. * Moved `fanout test` code from weird separate package to storage. * Unskiped one test: TestFanout_SelectSorted/chunk_querier * Moved block series set codes responsible for querying blocks to `querier.go` from `compact.go` Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-08-03 11:32:56 +01:00
Julien Pivotto	9848e77678	storage/remote: remove unused code now that tsdb implements ChunkQuerier (#7715 ) * storage/remote: remove unused code now that tsdb implements ChunkQuerier Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * encodeChunks is unused too Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-03 10:13:36 +02:00
Bartlomiej Plotka	e6d7cc5fa4	tsdb: Added ChunkQueryable implementations to db; unified MergeSeriesSets and vertical to single struct. (#7069 ) * tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating. Chained to https://github.com/prometheus/prometheus/pull/7059 * NewMerge(Chunk)Querier now takies multiple primaries allowing tsdb DB code to use it. * Added single SeriesEntry / ChunkEntry for all series implementations. * Unified all vertical, and non vertical for compact and querying to single merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before) * Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples. * Refactored endpoint tests and querier tests to include subtests. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments from Brian and Beorn. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed snapshot test and added chunk iterator support for DBReadOnly. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed race when iterating over Ats first. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed tests. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed populate block tests. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed endpoints test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added test & fixed case of head open chunk. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed DBReadOnly tests and bug producing 1 sample chunks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added cases for partial block overlap for multiple full chunks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added extra tests for chunk meta after compaction. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed small vertical merge bug and added more tests for that. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-07-31 16:03:02 +01:00
Annanay	ec562f152b	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-31 13:03:56 +05:30
Annanay	9bba8a6eae	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:43:18 +05:30
Annanay	89129cd39a	Address comments Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:41:13 +05:30
Javier Palomo Almena	b58a613443	Replace sync/atomic with uber-go/atomic (#7683 ) * storage: Replace usage of sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * tsdb: Replace usage of sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * web: Replace usage of sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * notifier: Replace usage of sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * cmd: Replace usage of sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * scripts: Verify that we are not using restricted packages It checks that we are not directly importing 'sync/atomic'. Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * Reorganise imports in blocks Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * notifier/test: Apply PR suggestions Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * storage/remote: avoid storing references on newEntry Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * Revert "scripts: Verify that we are not using restricted packages" This reverts commit `278d32748e`. Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * web: Group imports accordingly Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>	2020-07-30 13:15:42 +05:30
Joe Elliott	04b028f1e6	Exports recoverable error (#7689 ) Signed-off-by: Joe Elliott <number101010@gmail.com>	2020-07-29 11:08:25 -06:00
Annanay	f40e4579b7	gofmt Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-24 20:40:19 +05:30
Annanay	7f98a744e5	Add context to Appender interface Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-24 19:40:51 +05:30
Zhou Hao	ddedf454d0	add os.RemoveAll err verification (#7540 ) * add os.RemoveAll err verification for watcher_test Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com> * add os.RemoveAll err verification for db_test Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com> * add os.RemoveAll err verification for write_test Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com> * add os.RemoveAll err verification for queue_manager_test Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com> * tsdb/wal/watcher_test: add close operation before delete Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>	2020-07-17 11:47:32 +05:30
Hu Shuai	b962029031	Add a unit test for MergeLabels in storage/remote/codec.go. (#7499 ) This PR is about adding a unit test for MergeLabels in storage/remote/codec.go. Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>	2020-07-04 22:17:19 -06:00
Ganesh Vernekar	a4c2ea1ca3	Merge remote-tracking branch 'upstream/master' into merge-release-2.19 Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-06-26 14:33:50 +05:30
Chris Marchbanks	b299aba6cf	Fix panic when updating a remote write queue (#7452 ) Right now Queue Manager metrics are registered when the metrics struct is created, which happens before a changed queue is shutdown and the old metrics are unregistered. In the case of named queues or updates to external labels the apply config will panic due to duplicate metrics. Instead, register the metrics as part of starting the queue as we always guarantee that Stop will be called before a new Start. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-06-26 12:03:52 +05:30
Chris Marchbanks	d78656c244	Pending Samples metric includes samples in channel (#7335 ) * Pending Samples metric includes samples in channel The pending samples metric should also include samples waiting in the channels to be sent to provide a more accurate measure. In addition, make sure that the pending samples is reset to 0 anytime a queue is started as we remake all of the shards at that time. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com> * Log the number of dropped samples on hard shutdown Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-06-25 14:48:30 -06:00
Bartlomiej Plotka	b788986717	storage: Adjusted fully storage layer support for chunk iterators: Remote read client, readyStorage, fanout. (#7059 ) * Fixed nits introduced by https://github.com/prometheus/prometheus/pull/7334 * Added ChunkQueryable implementation to fanout and readyStorage. * Added more comments. * Changed NewVerticalChunkSeriesMerger to CompactingChunkSeriesMerger, removed tiny interface by reusing VerticalSeriesMergeFunc for overlapping algorithm for both chunks and series, for both querying and compacting (!) + made sure duplicates are merged. * Added ErrChunkSeriesSet * Added Samples interface for seamless []promb.Sample to []tsdbutil.Sample conversion. * Deprecating non chunks serieset based StreamChunkedReadResponses, added chunk one. * Improved tests. * Split remote client into Write (old storage) and read. * Queryable client is now SampleAndChunkQueryable. Since we cannot use nice QueryableFunc I moved all config based options to sampleAndChunkQueryableClient to aboid boilerplate. In next commit: Changes for TSDB. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-06-24 14:41:52 +01:00
Nicole J	d5a8f2afc4	Added the remote read histogram (#7334 ) change remote read queries total metric to a histogram and add read requests counter with status code Signed-off-by: njingco <jingco.nicole@gmail.com>	2020-06-16 07:11:41 -07:00
Kemal Akkoyun	66dfb951c4	: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251 ) Add errors and Warnings to SeriesSet Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Change Querier interface and refactor accordingly Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor promql/engine to propagate warnings at eval stage Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Make sure all the series from all Selects are pre-advanced Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Separate merge series sets Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Clean Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactor merge querier failure handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Refactored and simplified fanout with improvements from incoming chunk iterator PRs. * Secondary logic is hidden, instead of weird failed series set logic we had. * Fanout is well commented * Fanout closing record all errors * MergeQuerier improved API (clearer) * deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false). Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix formatting Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix CI issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Added final tests for error handling. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. * Moved hints in populate to be allocated only when needed. * Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic. * Select after first Next is done will panic. NOTE: in lazySeriesSet in theory we could just panic, I think however we can totally just return error, it will panic in expand anyway. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Utilize errWithWarnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix recently introduced expansion issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add tests for secondary querier error handling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Implement lazy merge Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add name to test cases Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Reorganize Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review comments Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove redundant warnings Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix rebase mistake Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-06-09 17:57:31 +01:00
Bartlomiej Plotka	8f0a924be9	Merge branch 'release-2.18' into get2.18.2-to-master	2020-06-09 10:04:44 +01:00
Marco Pracucci	bc1d7d7f27	Fixed incorrect query results caused by buffer reuse in merge adapter (#7361 ) Signed-off-by: Marco Pracucci <marco@pracucci.com>	2020-06-09 05:45:18 +01:00
Marco Pracucci	1e006d84a0	Fixed incorrect query results caused by buffer reuse in merge adapter (#7361 ) Signed-off-by: Marco Pracucci <marco@pracucci.com>	2020-06-08 19:55:53 +01:00
Bert Hartmann	82c7cd320a	increase the remote write bucket range (#7323 ) * increase the remote write bucket range Increase the range of remote write buckets to capture times above 10s for laggy scenarios Buckets had been: {.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10} Buckets are now: {0.03125, 0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512} Signed-off-by: Bert Hartmann <berthartm@gmail.com> * revert back to DefBuckets with addons to be backwards compatible Signed-off-by: Bert Hartmann <berthartm@gmail.com> * shuffle the buckets to maintain 2-2.5x increases Signed-off-by: Bert Hartmann <berthartm@gmail.com>	2020-06-04 13:54:47 -06:00
Nicole J	5e9bd17b1f	added the prometheus_remote_storage_remote_read_queries_total (#7328 ) * added the prometheus_remote_storage_remote_read_queries_total query Signed-off-by: njingco <jingco.nicole@gmail.com> * adjusted the help label of remoteReadQueriesTotal Signed-off-by: njingco <jingco.nicole@gmail.com>	2020-06-03 10:30:52 -07:00
Cody Boggs	3268eac2dd	Trace Remote Write requests (#7206 ) * Trace Remote Write requests Signed-off-by: Cody Boggs <cboggs@splunk.com> * Refactor store attempts to keep code flow clearer, and avoid so many places to deal with span finishing Signed-off-by: Cody Boggs <cboggs@splunk.com>	2020-06-01 09:21:13 -06:00
Chris Marchbanks	c1f9917e90	Add test for unregistering queue manager metrics Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-05-05 14:14:04 -06:00
Chris Marchbanks	dfad1da296	Remove duplicate metrics in QueueManager Right now any new metrics added for remote write need to be added to both the QueueManager struct, and the queueManagerMetrics struct. Instead, use the queueManagerMetrics struct directly from QueueManager. The newQueueManagerMetrics constructor will now create the metrics for a specific queue with name and endpoint pre-populated, and a new copy of the struct will be created specifically for each queue. This also fixes a bug where prometheus_remote_storage_sent_bytes_total is not being unregistered after a queue is changed. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-05-05 14:13:59 -06:00
Julien Pivotto	7ecd2d1c24	Jaeger: Create child span for remote read (#7187 ) * Jaeger: Create child span for remote read * Jaeger: use middleware to trace client http request Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-05-02 22:41:55 +02:00
qinng	f36ae1c21c	[remote-storage] use warn log level when send samples to remote failed (#7184 ) [remote] increasing sendbatch error log level Signed-off-by: guoruyi1 <guoruyi1@xiaomi.com> Co-authored-by: guoruyi1 <guoruyi1@xiaomi.com>	2020-04-30 17:06:22 -06:00
Vasily Sliouniaev	0393b188c9	Add Jaeger (#7148 ) * Trace remote read Signed-off-by: vas <vasily.sliouniaev@jet.com> * Use jaeger Signed-off-by: vas <vasily.sliouniaev@jet.com>	2020-04-23 02:05:55 +02:00
Marek Slabicki	4b5e7d4984	Adding a shouldReshard function to modularize logic for the QueueManager deciding if it should shard or not (#7143 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-20 16:20:39 -06:00
Chris Marchbanks	cd12f0873c	Merge pull request #7073 from csmarchbanks/fix-md5-remote-write Fix remote write not updating when relabel configs or secrets change	2020-04-16 16:36:25 -06:00
Chris Marchbanks	5ab6b043c1	Always update lastSendTimestamp after a request (#7122 ) If the server is returning non-recoverable errors, such as if we are trying to push samples that are too old, remote write will never reshard. Non-recoverable errors should be treated the same as success for the purpose of resharding, just as we do with sample rates and durations. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-04-15 09:03:28 -06:00
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	2020-04-15 11:17:41 +01:00
Chris Marchbanks	d88a2b0261	Handle secret changes in remote write ApplyConfig Remake the http client whenever ApplyConfig is called. This allows secrets to be updated without needing to restart an otherwise unchanged queue. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-04-13 23:14:15 +00:00
Simon Pasquier	317e73de79	Hash YAML instead of JSON But it doesn't work either because of secret fields. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-04-13 22:32:37 +00:00
Simon Pasquier	8cc84660fa	storage/remote: add tests for config changes Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2020-04-13 22:32:37 +00:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
ga	9a21fdcd1b	[storage] clean imports (#7099 ) Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>	2020-04-07 22:05:39 +01:00
Callum Styan	be13a4ba7e	Compare querier storage to primary storage via reflect.DeepEqual. (#7050 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-25 18:42:32 +00:00
Bartlomiej Plotka	d5c33877f9	storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation. (#7005 ) * storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation. ## Rationales: In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples. This means that we unnecessary decode/encode, wasting CPU, time and memory. This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers. All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos). ## Changes * Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that Storage will implement both chunked and samples. * Added Seek to chunks.Iterator interface for iterating over chunks. * NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added. * Improved tests. * Added some TODO for further simplifications in next PRs. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Moved s/Labeled/SeriesLabels as per Krasi suggestion. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Krasi's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Second iteration of Krasi comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Another round of comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-24 20:15:47 +00:00
beorn7	526cff39b9	Fix tests that were broken by #7009 Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-20 21:22:58 +01:00
Bartlomiej Plotka	c4eefd1b3a	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet which rely on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for not good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-20 21:14:43 +01:00
Callum Styan	f802f1e8ca	Fix bug with WAL watcher and Live Reader metrics usage. (#6998 ) * Fix bug with WAL watcher and Live Reader metrics usage. Calling NewXMetrics when creating a Watcher or LiveReader results in a registration error, which we're ignoring, and as a result other than the first Watcher/Reader created, we had no metrics for either. So we would only have metrics like Watcher Records Read for the first remote write config in a users config file. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-20 17:34:15 +01:00
Chris Marchbanks	3128875ff4	Fix panic when a remote read store errors (#6975 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-03-13 20:57:29 +01:00
beorn7	f6f4fd6556	tsdb: Do a full rollback upon commit error I think the previous behavior is problematic as it will leave `memSeries` around that still have `pendingCommit` set to `true`. The only case where this can happen in this code path is a failure to write to the WAL, in which case we are probably in trouble anyway. I believe, however, we should still try to do the right thing and do the full rollback. This will implicitly try to write to the WAL again, but this time without samples, which may even succeed. (But we propagate the previous error in any case.) This also adds `a.head.putSeriesBuffer(a.sampleSeries)` to Rollback, which was previously missing. Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-10 14:54:41 +01:00
Callum Styan	1518083168	Rw testability improvements (#6537 ) * Change createTimeseries to take values for number of series and number of samples per series. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Take num of samples to expect in expectSampleCount instead of array of samples. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add field to TestStorageClient to ignore samples sent waitgroup for potential tests where we don't care about delivery of all samples. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix up tests a little bit. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-02-25 11:10:57 -08:00
李国忠	f7a322fa6d	[comments] change the word "liike" to "like" (#6859 ) Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2020-02-22 20:48:54 +00:00
Bartlomiej Plotka	59c9d6ef45	Addressed Brian's comments, moved metrics to main.go Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	cfba92a133	Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	fb79f515fc	Fixed second bug. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	2cf637fbf5	Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only a unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:54 +00:00
李国忠	3cd6a5b050	Storage concurrently tests and bug fix (#6808 ) * Storage concurrently tests and bug fix Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2020-02-12 08:58:52 +00:00
李国忠	40dd13b074	Storage concurrently (#6770 ) * Storage concurrently Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2020-02-11 17:19:34 +00:00
helenxu1221	7df4fe3faa	reset counter after collecting metric (#6798 ) Signed-off-by: HelenXu <helenxu@Helens-MacBook-Pro.local>	2020-02-09 20:51:21 -07:00
Robert Fratto	a53e00f9fd	pass registerer from storage to queue manager for its metrics (#6728 ) * pass registerer from storage to queue manager for its metrics Signed-off-by: Robert Fratto <robert.fratto@grafana.com>	2020-02-03 13:47:03 -08:00
Peter Štibraný	08c5549055	Document that NewMergeSeriesSet expects individual sets to be sorted. (#6718 ) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>	2020-01-30 19:30:32 +05:30
Brian Brazil	38d32e0686	Don't sort postings if we only have one block. Sorting the heads postings can be quite slow. We only need sorted series when merging with another querier, so only sort then. This will make big queries that only touch the head faster, though queries that touch both the head and a block will still be the same speed. This probably won't help much with graphing unless the range is under an hour, however it should make most recording rules faster. Add gaurantee that remote read streaming produces sorted series. PromQL benchmarks for histograms show only 2-3% improvement, but they're only over 1k series. benchmark old ns/op new ns/op delta BenchmarkQuerierSelect/Head/1of1000000-4 1375486282 507657736 -63.09% BenchmarkQuerierSelect/Head/10of1000000-4 1387859004 507769850 -63.41% BenchmarkQuerierSelect/Head/100of1000000-4 1387087935 506029110 -63.52% BenchmarkQuerierSelect/Head/1000of1000000-4 1386869064 504521986 -63.62% BenchmarkQuerierSelect/Head/10000of1000000-4 1386213685 505210422 -63.55% BenchmarkQuerierSelect/Head/100000of1000000-4 1392754988 529842406 -61.96% BenchmarkQuerierSelect/Head/1000000of1000000-4 1569414722 725059506 -53.80% BenchmarkQuerierSelect/SortedHead/1of1000000-4 1381019902 1370495863 -0.76% BenchmarkQuerierSelect/SortedHead/10of1000000-4 1375696209 1366789468 -0.65% BenchmarkQuerierSelect/SortedHead/100of1000000-4 1386009422 1364519297 -1.55% BenchmarkQuerierSelect/SortedHead/1000of1000000-4 1377700532 1364486191 -0.96% BenchmarkQuerierSelect/SortedHead/10000of1000000-4 1383539536 1369545314 -1.01% BenchmarkQuerierSelect/SortedHead/100000of1000000-4 1410089163 1394731339 -1.09% BenchmarkQuerierSelect/SortedHead/1000000of1000000-4 1634744148 1581554956 -3.25% BenchmarkQuerierSelect/Block/1of1000000-4 881741242 879839470 -0.22% BenchmarkQuerierSelect/Block/10of1000000-4 880381562 882846038 +0.28% BenchmarkQuerierSelect/Block/100of1000000-4 887519357 881016916 -0.73% BenchmarkQuerierSelect/Block/1000of1000000-4 902194205 883433524 -2.08% BenchmarkQuerierSelect/Block/10000of1000000-4 892321964 885130170 -0.81% BenchmarkQuerierSelect/Block/100000of1000000-4 938604466 933527150 -0.54% BenchmarkQuerierSelect/Block/1000000of1000000-4 1313510845 1295881124 -1.34% benchmark old allocs new allocs delta BenchmarkQuerierSelect/Head/1of1000000-4 4000056 4000018 -0.00% BenchmarkQuerierSelect/Head/10of1000000-4 4000074 4000036 -0.00% BenchmarkQuerierSelect/Head/100of1000000-4 4000254 4000216 -0.00% BenchmarkQuerierSelect/Head/1000of1000000-4 4002054 4002016 -0.00% BenchmarkQuerierSelect/Head/10000of1000000-4 4020054 4020016 -0.00% BenchmarkQuerierSelect/Head/100000of1000000-4 4200054 4200016 -0.00% BenchmarkQuerierSelect/Head/1000000of1000000-4 6000054 6000016 -0.00% BenchmarkQuerierSelect/SortedHead/1of1000000-4 4000071 4000071 +0.00% BenchmarkQuerierSelect/SortedHead/10of1000000-4 4000089 4000089 +0.00% BenchmarkQuerierSelect/SortedHead/100of1000000-4 4000269 4000269 +0.00% BenchmarkQuerierSelect/SortedHead/1000of1000000-4 4002069 4002069 +0.00% BenchmarkQuerierSelect/SortedHead/10000of1000000-4 4020069 4020069 +0.00% BenchmarkQuerierSelect/SortedHead/100000of1000000-4 4200069 4200069 +0.00% BenchmarkQuerierSelect/SortedHead/1000000of1000000-4 6000069 6000069 +0.00% BenchmarkQuerierSelect/Block/1of1000000-4 6000023 6000022 -0.00% BenchmarkQuerierSelect/Block/10of1000000-4 6000059 6000058 -0.00% BenchmarkQuerierSelect/Block/100of1000000-4 6000419 6000418 -0.00% BenchmarkQuerierSelect/Block/1000of1000000-4 6004019 6004018 -0.00% BenchmarkQuerierSelect/Block/10000of1000000-4 6040019 6040018 -0.00% BenchmarkQuerierSelect/Block/100000of1000000-4 6400019 6400018 -0.00% BenchmarkQuerierSelect/Block/1000000of1000000-4 10000020 10000019 -0.00% benchmark old bytes new bytes delta BenchmarkQuerierSelect/Head/1of1000000-4 229192200 176001176 -23.21% BenchmarkQuerierSelect/Head/10of1000000-4 229193352 176002328 -23.21% BenchmarkQuerierSelect/Head/100of1000000-4 229204872 176013848 -23.21% BenchmarkQuerierSelect/Head/1000of1000000-4 229320072 176129048 -23.20% BenchmarkQuerierSelect/Head/10000of1000000-4 230472072 177281048 -23.08% BenchmarkQuerierSelect/Head/100000of1000000-4 241992072 188801048 -21.98% BenchmarkQuerierSelect/Head/1000000of1000000-4 357192072 304001048 -14.89% BenchmarkQuerierSelect/SortedHead/1of1000000-4 229193928 229193928 +0.00% BenchmarkQuerierSelect/SortedHead/10of1000000-4 229195080 229195080 +0.00% BenchmarkQuerierSelect/SortedHead/100of1000000-4 229206600 229206600 +0.00% BenchmarkQuerierSelect/SortedHead/1000of1000000-4 229321800 229321800 +0.00% BenchmarkQuerierSelect/SortedHead/10000of1000000-4 230473800 230473800 +0.00% BenchmarkQuerierSelect/SortedHead/100000of1000000-4 241993800 241993800 +0.00% BenchmarkQuerierSelect/SortedHead/1000000of1000000-4 357193800 357193800 +0.00% BenchmarkQuerierSelect/Block/1of1000000-4 227201516 227201500 -0.00% BenchmarkQuerierSelect/Block/10of1000000-4 227202924 227202908 -0.00% BenchmarkQuerierSelect/Block/100of1000000-4 227217036 227217020 -0.00% BenchmarkQuerierSelect/Block/1000of1000000-4 227358156 227358140 -0.00% BenchmarkQuerierSelect/Block/10000of1000000-4 228769356 228769340 -0.00% BenchmarkQuerierSelect/Block/100000of1000000-4 242881356 242881340 -0.00% BenchmarkQuerierSelect/Block/1000000of1000000-4 384001616 384001600 -0.00% Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-01-28 09:14:56 +00:00
Anand Singh Kunwar	aa61e392b2	Make remote client `Store` use passed context (#6673 ) * Remote store client's `Store` API currently doesn't use passed context, but instead just constructs a new `context.Background()` Signed-off-by: Anand Singh Kunwar <anandkunwar95@gmail.com>	2020-01-27 07:43:20 -07:00
Julien Pivotto	cf42888e4d	Fix order of testutil.Equals (#6695 ) Equals takes the expected value as first parameter, and the actual value as second parameter. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-27 12:21:59 +00:00
Julien Pivotto	aad8f89ecb	Remote storage: propagate json marshal errors (#6622 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-14 08:40:30 +00:00
Chris Marchbanks	7f3aca62c4	Only reduce the number of shards when caught up. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-01-06 14:53:23 -07:00
Chris Marchbanks	9e24e1f9e8	Use samplesPending rather than integral The integral accumulator in the remote write sharding code is just a second way of keeping track of the number of samples pending. Remove integralAccumulator and use the samplesPending value we already calculate to calculate the number of shards. This has the added benefit of fixing a bug where the integralAccumulator was not being initialized correctly due to not taking into account the number of ticks being counted, causing the integralAccumulator initial value to be off by an order of magnitude in some cases. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-01-06 14:53:23 -07:00
Chris Marchbanks	847c66a843	Add sharding test Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2020-01-06 14:53:23 -07:00
Josh Soref	91d76c8023	Spelling (#6517 ) * spelling: alertmanager Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: attributes Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: autocomplete Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: bootstrap Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: caught Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: chunkenc Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: compaction Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: corrupted Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: deletable Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: expected Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: fine-grained Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: initialized Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: iteration Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: javascript Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: multiple Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: number Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: overlapping Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: possible Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: postings Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: procedure Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: programmatic Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: queuing Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: querier Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: repairing Signed-off-by: Josh Soref 
<jsoref@users.noreply.github.com> * spelling: received Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: reproducible Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: retention Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: sample Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: segements Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: semantic Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: software [LICENSE] Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: staging Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: timestamp Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: unfortunately Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: uvarint Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: subsequently Signed-off-by: Josh Soref <jsoref@users.noreply.github.com> * spelling: ressamples Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>	2020-01-02 15:54:09 +01:00
Julien Pivotto	31700a05df	Improve testutil.ErrorEqual (#6471 ) Also improves TestPopulateLabels: testutil.ErrorEqual just returned a bool without failing the test. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2019-12-17 21:11:33 +00:00
Callum Styan	67838643ee	Add config option for remote job name (#6043 ) * Track remote write queues via a map so we don't care about index. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Support a job name for remote write/read so we can differentiate between them using the name. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Remote write/read has Name to not confuse the meaning of the field with scrape job names. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Split queue/client label into remote_name and url labels. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allow for duplicate remote write/read configs. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Ensure we restart remote write queues if the hash of their config has not changed, but the remote name has changed. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Include name in remote read/write config hashes, simplify duplicates check, update test accordingly. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-12-12 12:47:23 -08:00
Garrett	5a9c4acfbf	Pushdown aggregator group by through read hint (#6401 ) * Pushdown aggregator group by through read hint Implement https://github.com/prometheus/prometheus/issues/6400 * add temporal aggregation pushdown support Signed-off-by: xiancli <xiancli@ebay.com>	2019-12-05 14:06:28 +00:00
Chris Marchbanks	5000c05378	Merge pull request #6378 from prometheus/accurate-desired-shards-metric Change desired shards metric to report raw calculated value	2019-12-03 15:26:21 -07:00
Callum Styan	5830e03691	Merge pull request #6337 from cstyan/rw-log-replay Log the start and end of the WAL replay within the WAL watcher.	2019-12-03 13:24:26 -08:00
Callum Styan	6a24eee340	Simplify duration check for watcher WAL replay. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-11-26 16:53:11 -08:00
Chris Marchbanks	6f34e35b3e	Record the exact value of desired shards in metric It is possible that desired shards is always a bit higher than the number of shards (less than 30%) and by exporting desired shards as the raw number it will be easy to tell if a Prometheus is in that situation. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-11-26 06:26:03 -07:00
Chris Marchbanks	0e684ca205	Fix unknown type in sharding up log Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-11-26 06:22:56 -07:00
Callum Styan	c2cb1e4103	Add a metric to track total bytes sent per remote write queue. (#6344 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-11-25 13:25:18 -07:00
Tom Wilkie	de0a772b8e	Port tsdb to use pkg/labels. (#6326 ) * Port tsdb to use pkg/labels. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Get tests passing. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Remove useless cast. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Appease linters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2019-11-18 11:53:33 -08:00
Callum Styan	5f1be2cf45	Refactor calculateDesiredShards + don't reshard if we're having issues sending samples. (#6111 ) * Refactor calculateDesiredShards + don't reshard if we're having issues sending samples. * Track lastSendTimestamp via an int64 with atomic add/load, add a test for reshard calculation. * Simplify conditional for skipping resharding, add samplesIn/Out to shard testcase struct. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-21 15:54:25 -06:00
Krasi Georgiev	81d284f806	Merge the 2.13 release branch to master (#6117 )	2019-10-09 17:41:46 +02:00
Callum Styan	84ff928606	Make sure the remote write storage uses a dedupe logger. (#6113 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-10-08 11:42:00 -06:00
Chris Marchbanks	8df4bca470	Garbage collect asynchronously in the WAL Watcher The WAL Watcher replays a checkpoint after it is created in order to garbage collect series that no longer exist in the WAL. Currently the garbage collection process is done serially with reading from the tip of the WAL which can cause large delays in writing samples to remote storage just after compaction occurs. This also fixes a memory leak where dropped series are not cleaned up as part of the SeriesReset process. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-10-07 14:36:10 -06:00
George Felix	895abbb7d0	Replaced test validations with testutils on storage/remote/codec_test.go (#6097 ) * Replaced test validations with testutils on storage/remote/codec_test.go Signed-off-by: George Felix <george.felix@ubeeqo.com> * gofmt Signed-off-by: George Felix <george.felix@ubeeqo.com> * Removed shouldPass assertion Signed-off-by: George Felix <gfelixc@gmail.com> * Fixes to improve readability Signed-off-by: George Felix <george.felix@ubeeqo.com> * Fixes based on code review comments Signed-off-by: George Felix <george.felix@ubeeqo.com>	2019-10-07 11:35:53 -06:00
Joe Elliott	95dc59ec7e	Replaced t.Fatalf() with testutil.Assert() in buffer_test.go (#6084 ) * Added Fatal method and used it in buffer_test Signed-off-by: Joe Elliott <number101010@gmail.com> * Added period to meet contributing guidelines Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed fatal testutil method. Refactored test cases to use testutil.Assert Signed-off-by: Joe Elliott <number101010@gmail.com> * Added if found condition for clarity Signed-off-by: Joe Elliott <number101010@gmail.com>	2019-10-02 06:28:08 +01:00
陈谭军	103f26d188	fix the wrong word (#6069 ) Signed-off-by: chentanjun <2799194073@qq.com>	2019-09-30 09:54:55 -06:00
Callum Styan	3344bb5c33	Move WAL watcher code to tsdb/wal package. (#5999 ) * Move WAL watcher code to tsdb/wal package. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix tests after moving WAL watcher code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Lint fixes. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-09-19 14:45:41 +05:30
Björn Rabenstein	3b3eaf3496	Merge pull request #5787 from cstyan/reshard-max-logging Add metrics for max/min/desired shards to queue manager.	2019-09-09 22:32:54 +02:00
Chris Marchbanks	b4317768b9	Merge pull request #5849 from csmarchbanks/rw-use-labels Cache labels.Labels to Identify Series in Remote Write	2019-09-04 14:35:52 -06:00
Yao Zengzeng	f65b7c296d	fix TODO: only stop & recreate remote write queues which have changes (#5540 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	2019-09-04 11:21:53 -06:00
Chris Marchbanks	791a2409a2	Pre-allocate pendingSamples to reduce allocations Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-09-03 15:41:47 -06:00
Chris Marchbanks	160186da18	Store labels.Labels instead of []prompb.Label This will use half the steady state memory as required by prompb.Label. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-09-03 15:41:46 -06:00
Stanislav Putrya	6141a8bd7c	Show warnings in UI if query have returned some warnings (#5964 ) * Show warnings in UI if query have returned some warnings + improve warning (error) text if query to remote was finished with error * Add prefixes for remote_read errors Signed-off-by: Stan Putrya <root.vagner@gmail.com>	2019-08-28 14:25:28 +01:00
Bartek Płotka	48b2c9c8ea	remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksummed reader (#5703 ) Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488 Changes: * Extended protobuf for chunked remote read and negotiation. * Added checksummed, chunked Writer/Reader. * Added Server side implementation for chunked streamed remote-read. Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2019-08-19 21:16:10 +01:00
Julius Volz	b5c833ca21	Update go.mod dependencies before release (#5883 ) * Update go.mod dependencies before release Signed-off-by: Julius Volz <julius.volz@gmail.com> * Add issue for showing query warnings in promtool Signed-off-by: Julius Volz <julius.volz@gmail.com> * Revert json-iterator back to 1.1.6 It produced errors when marshaling Point values with special float values. Signed-off-by: Julius Volz <julius.volz@gmail.com> * Fix expected step values in promtool tests after client_golang update Signed-off-by: Julius Volz <julius.volz@gmail.com> * Update generated protobuf code after proto dep updates Signed-off-by: Julius Volz <julius.volz@gmail.com>	2019-08-14 11:00:39 +02:00
Bartek Płotka	32be514845	Merge pull request #5805 from codesome/merge-tsdb Merge tsdb into prometheus	2019-08-13 11:39:41 +01:00
Chris Marchbanks	a6a55c433c	Improve desired shards calculation (#5763 ) The desired shards calculation now properly keeps track of the rate of pending samples, and uses the previously unused integralAccumulator to adjust for missing information in the desired shards calculation. Also, configure more capacity for each shard. The default 10 capacity causes shards to block on each other while sending remote requests. Default to a 500 sample capacity and explain in the documentation that having more capacity will help throughput. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-13 10:10:21 +01:00
Ganesh Vernekar	5ecef3542d	Cleanup after merging tsdb into prometheus Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2019-08-13 14:04:14 +05:30
ethan	38ccf0157e	cleanup: correct func name in log message (#5852 ) Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>	2019-08-10 16:24:58 +01:00
Chris Marchbanks	529ccff07b	Remove all usages of stretchr/testify Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-08 19:49:27 -06:00
Chris Marchbanks	0685eb5395	Refactor testutil.NewStorage into a new package This avoids a circular dependency between the testutil and storage packages. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-08 19:43:04 -06:00
Vadym Martsynovskyy	8318aa2d5d	Check for duplicate label names in remote read (#5829 ) * Check for duplicate label names in remote read Also add test to confirm that #5731 is fixed * Use subtests in TestValidateLabelsAndMetricName * Really check that expectedErr matches err Signed-off-by: Vadym Martsynovskyy <vmartsynovskyy@gmail.com>	2019-08-07 16:13:10 +01:00
Callum Styan	c40a83f386	Add metrics for max shards, min shards, and desired shards. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-08-04 20:04:19 -07:00
AllenZMC	758c71b980	fix word `encourter` to `encounter` Signed-off-by: czm <zhongming.chang@daocloud.io>	2019-07-29 22:16:23 +08:00
Devin Trejo	d77f2aa29c	Only check last directory when discovering checkpoint number (#5756 ) * Only check last directory when discovering checkpoint number Signed-off-by: Devin Trejo <dtrejo@palantir.com> * Comments for checkpointNum Signed-off-by: Devin Trejo <dtrejo@palantir.com>	2019-07-15 17:53:58 +01:00
Yao Zengzeng	3cde8a9941	pass error up if WALWatcher.segments() return err (#5741 ) Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>	2019-07-15 17:52:03 +01:00
Xigang Wang	445bcd1251	Update the runShard method and change len(pendingSamples) to n=len(pendingSamples) (#5708 ) Signed-off-by: xigang <wangxigang2014@gmail.com>	2019-07-09 19:09:11 +01:00
Chris Marchbanks	06f1ba73eb	Provide flag to compress the tsdb WAL Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-07-03 08:03:29 -06:00
Chris Marchbanks	475ca2ecd0	Update to tsdb 0.9.1 Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-07-03 07:51:11 -06:00
Chris Marchbanks	06bdaf076f	Remote Write Allocs Improvements (#5614 ) * Add benchmark for sample delivery * Simplify StoreSeries to have only one loop * Reduce allocations for pending samples in runShard * Only allocate one send slice per segment * Cache a buffer in each shard for snappy to use * Remove queue manager seriesMtx It is not possible for any of the places protected by the seriesMtx to be called concurrently so it is safe to remove. By removing the mutex we can simplify the Append code to one loop. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-06-27 19:48:21 +01:00
Chris Marchbanks	a38a54fa11	Split remote write storage into its own type This allows other processes to reuse just the remote write code without having to use the remote read code as well. This will be used to create a sidecar capable of sending remote write payloads. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-06-27 10:11:02 +01:00
Thomas Jackson	91d7175eaa	Add storage.Warnings to LabelValues and LabelNames (#5673 ) Fixes #5661 Signed-off-by: Thomas Jackson <jacksontj.89@gmail.com>	2019-06-17 08:31:17 +01:00
Dmitry Shmulevich	0c0638b080	resolve race condition in maxGauge (#5647 ) * resolve race condition in maxGauge Signed-off-by: Dmitry Shmulevich <dmitry.shmulevich@sysdig.com>	2019-06-13 20:55:08 +01:00
Chris Marchbanks	840872a6f8	Fix remote storage config not updating correctly (#5555 ) * Update remote write and remote read separately * Add external labels to the remote write conf hash * Add unit tests for remote storage lifecycle Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-05-17 10:29:49 +01:00
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-03 15:11:28 +02:00
Callum Styan	3639d51eb6	Remote Storage: string interner should not panic in release (#5487 ) * Don't panic if we try to release a string that is not in the interner. * Move seriesMtx locking in QueueManager's StoreSeries function. This stops us from calling release for strings that aren't interned if there's a race between reading a checkpoint and storing new series labels, which could happen during checkpointing or reloading config. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-24 10:46:31 +01:00
Callum Styan	e87449b59d	Remote Write: Queue Manager specific metrics shouldn't exist if the queue no longer exists (#5445 ) * Unregister remote write queue manager specific metrics when stopping the queue manager. * Use DeleteLabelValues instead of Unregister to remove queue and watcher related metrics when we stop them. Create those metrics in the structs' start functions rather than in their constructors because of the ordering of creation, start, and stop in remote storage ApplyConfig. * Add setMetrics function to WAL watcher so we can set the watcher's metrics in its Start function, but not have to call Start in some tests (causes data race). Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-23 09:49:17 +01:00
Callum Styan	b7538e7b49	Don't stop, recreate, and start remote storage QueueManagers if the (#5485 ) remote write config hasn't changed at all. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-23 09:47:18 +01:00
Romain Baugue	95193fa027	Exhaust every request body before closing it (#5166 ) (#5479 ) From the documentation: > The default HTTP client's Transport may not > reuse HTTP/1.x "keep-alive" TCP connections if the Body is > not read to completion and closed. This effectively enable keep-alive for the fixed requests. Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>	2019-04-18 09:50:37 +01:00
Vasily Sliouniaev	5be9a1426f	Prevent reshard concurrent with calling stop (#5460 ) * Prevent reshard concurrent with calling stop Signed-off-by: Vasily <v.sliouniaev@gmail.com>	2019-04-16 11:25:19 +01:00
Callum Styan	c2b88992a3	Remote Write: fix checkpoint reading (#5429 ) * Fix ReadCheckpoint to ensure that it actually reads all the contents of each segment in a checkpoint dir, or returns an error. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-04-09 10:52:44 +01:00
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-26 00:01:12 +01:00
Tom Wilkie	807fd33ecc	Review feedback. - Update read path to use labels.Labels. - Fix the tests. - Remove pack. - Remove unused function. - Fix race in tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Callum Styan	1a7923dde3	Add ref counting to string interning so we can remove a string when there are no longer any refs. Add tests for interning. Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	cbf5f13285	Naive string interning for labels & values in the remote_write path. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	c7b3535997	Use pkg/relabelling in remote write. - Unmarshall external_labels config as labels.Labels, add tests. - Convert some more uses of model.LabelSet to labels.Labels. - Remove old relabel pkg (fixes #3647). - Validate external label names. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	2fa93595d6	More WAL remote_write tweaks. (#5300 ) * Consistently pre-lookup the metrics for a given queue in queue manager. * Don't open the WAL (for writing) in the remote_write code. * Add some more logging. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-05 12:21:11 +00:00
Krasi Georgiev	1684dc750a	updated tsdb to 0.6.0 (#5292 ) * updated tsdb to 0.6.0 as part of the update also added the new storage.tsdb.allow-overlapping-blocks flag and mark it as experimental.	2019-03-04 21:42:45 +02:00
Tariq Ibrahim	1adb91738d	fix typo in recordType method of wal_watcher.go (#5297 ) Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-04 17:33:35 +01:00
Tariq Ibrahim	ab8e9b7423	fix typo in queue_manager.go comment (#5294 ) Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-03 11:35:29 +00:00
Tom Wilkie	67da8e7b46	Refactor and fix queue resharding (#5286 ) - Remove prometheus_remote_queue_last_send_timestamp_seconds metric. It's not particularly useful, we have highest_timestamp_seconds. - Factor out maxGauge, a gauge that only increases. - Change sharding calculations to use max samples in timestamp - max samples out timestamp (not rates). - Also include the ratio of samples dropped to correctly predict number of pending samples. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-01 11:04:26 -08:00
Callum Styan	b8106dd459	Review feedback: - Add a dropped samples EWMA and use it in calculating desired shards. - Update metric names and a log messages. - Limit number of entries in the dedupe logging middleware to prevent potential OOM. Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	512f549064	Refactor: inline decodeRecord in readSegment and don't bother decoding samples records if we're not tailing the segment, add a benchmark test and fix some other tests Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	f795942572	Decrement pending sample when queue exits. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	ee7efa93fe	Fix some tests. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	b69bdfb4d1	Store the checkpoint we read last, so that we don't keep reading the same checkpoint on each tick. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	efbd9559f4	Deal with corruptions in the WAL: - If we're replaying the WAL to get series records, skip that segment when we hit corruptions. - If we're tailing the WAL for samples, fail the watcher. - When the watcher fails, restart from the latest checkpoint - and only send new samples by updating startTime. - Tidy up log lines and error handling, don't return so many errors on quitting. - Expect EOF when processing checkpoints. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	adf5307470	Update wal LiveReader to ensure EOF is correctly propagated. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Callum Styan	d6258aea8f	Fix up remote write tests: - Tests that created a QueueManager were leaving behind files at the end of tests. - WAL replaying (readToEnd)tests seem to require extra time to finish now. - Some fixes to make staticcheck happy Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	184f06a981	Combine the record decoding metrics into one; break out garbage collection into a separate function. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	859cda27ff	Remove some 'global' state, moving segment numbers to parameters. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	bdc6b764b0	If reading the WAL fails, try again. Also, read from the segment containing the index for the last checkpoint, not the first segment. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	d6f911b511	Factor out logging ratelimit & dedupe middleware. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	a5c20642b3	Refactor WAL watcher to remove some duplication. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
Tom Wilkie	37ad4db485	Export timestamps in seconds since epoch. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-28 08:38:39 -08:00
JoeWrightss	362873f72b	Fix .Log() error message (#5257 ) Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>	2019-02-22 14:39:37 +00:00
Simon Pasquier	b41d6d54f2	storage/remote: increase timeouts for Travis CI (#5224 ) * storage/remote: adapt tests for Travis CI Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Check filesystems on Travis environment Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Run remote/storage tests on CircleCI for troubleshooting Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Try using tmpfs partition Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Revert "Try using tmpfs partition" This reverts commit `85a30deb72`. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Don't store labels in writeToMock Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Fix data race Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Bump retries to 100 meaning that the total timeout is 10s Signed-off-by: Simon Pasquier <spasquie@redhat.com> * clean up .travis.yml Signed-off-by: Simon Pasquier <spasquie@redhat.com> * code fixup Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Remove unneeded empty line Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-15 16:47:41 +01:00
Callum Styan	37e35f9e0c	Various improvements to WAL based remote write. - Use the queue name in WAL watcher logging. - Don't return from watch if the reader error was EOF. - Fix sample timestamp check logic regarding what samples we send. - Refactor so we don't need readToEnd/readSeriesRecords - Fix wal_watcher tests since readToEnd no longer exists Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Tom Wilkie	b93bafeee1	Various fixes to locking & shutdown for WAL-based remote write. - Remove datarace in the exported highest scrape timestamp. - Backoff on enqueue should be per-sample - reset the result for each sample. - Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor. - Reorder functions in WALWatcher depth-first according to call graph. - Fix vendor/modules.txt. - Split out the various timer periods into consts at the top of the file. - Move w.currentSegmentMetric.Set close to where we set the currentSegment. - Combine r.Next() and isClosed(w.quit) into a single loop. - Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, add some new lines to make it easier to read. - Reorganise checkpoint handling to reduce nesting and make it easier to follow. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-02-12 11:39:13 +00:00
Callum Styan	6f69e31398	Tail the TSDB WAL for remote_write This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down. We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes. Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases. As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s). This change also includes the following optimisations: - only marshal the proto request once, not once per retry - maintain a single copy of the labels for given series to reduce GC pressure Other minor tweaks: - only reshard if we've also successfully sent recently - add pending samples, latest sent timestamp, WAL events processed metrics Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype) Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Goutham Veeramachaneni	384cba1211	Add flag for size based retention (#5109 ) * Add flag for size based retention Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Deprecate the old retention flag for a new one. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Add ability to take a suffix for size flag Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Address feedback Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2019-01-18 19:18:36 +05:30
Krasi Georgiev	3bd41cc92c	Update tsdb to 0.4 (#5110 ) * update tsdb to v0.4.0 Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * remove unused struct field Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2019-01-18 16:32:14 +05:30
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	2019-01-16 17:28:14 -05:00
Callum Styan	5358f76c5c	update remote write path proto so that Labels/Timeseries can't be nil (#4957 ) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-01-15 19:13:39 +00:00
Simon Pasquier	f678e27eb6	: use latest release of staticcheck (#5057 ) : use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-01-04 14:47:38 +01:00
glutamatt	5ddde1965b	tune the "Wal segment size" with a flag (#5029 ) Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size" to tune the max size of wal segment files. The addressed base problem is to reduce the disk space used by wal segment files : on a raspberry pi, for instance, we often want to reduce write load of the sd card, then, the wal directory is mounted on a memory (space limited) partition. the default value of the segment max file size, pushed the size of directory to 128 MB for each segment , which is too much ram consumption on a rasp. the initial discussion is at https://github.com/prometheus/tsdb/pull/450	2019-01-03 17:13:21 +03:00
Tom Wilkie	6e08029b56	Move err to be the last return value from storage.Select. (#5054 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-01-02 11:10:13 +00:00
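
Commits 1a7923dde3 and 3639d51eb6 above describe ref-counted string interning for the remote write path: a string is removed once no references remain, and releasing a string that was never interned must not panic. A minimal Go sketch of that idea follows. It is not the code from those commits; the names `pool`, `entry`, `newPool`, `intern`, `release`, and `size` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// entry pairs an interned string with the number of live references to it.
// (Hypothetical type, not taken from the Prometheus source.)
type entry struct {
	s    string
	refs int
}

// pool is an illustrative ref-counted string interner.
type pool struct {
	mtx  sync.Mutex
	data map[string]*entry
}

func newPool() *pool {
	return &pool{data: make(map[string]*entry)}
}

// intern returns the shared copy of s and increments its reference count,
// adding s to the pool on first use.
func (p *pool) intern(s string) string {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	if e, ok := p.data[s]; ok {
		e.refs++
		return e.s
	}
	p.data[s] = &entry{s: s, refs: 1}
	return s
}

// release decrements the reference count of s and deletes the entry when
// the count reaches zero. Releasing an unknown string is a no-op rather
// than a panic.
func (p *pool) release(s string) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	e, ok := p.data[s]
	if !ok {
		return
	}
	e.refs--
	if e.refs == 0 {
		delete(p.data, s)
	}
}

// size reports how many distinct strings are currently interned.
func (p *pool) size() int {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return len(p.data)
}

func main() {
	p := newPool()
	p.intern("job")
	p.intern("job") // second reference to the same string
	p.release("job")
	fmt.Println(p.size()) // one reference is still held
	p.release("job")
	fmt.Println(p.size()) // last reference dropped, entry removed
	p.release("never-interned") // ignored, does not panic
}
```

Guarding the map with a single mutex keeps the sketch short; whether the real interner uses the same locking strategy is not stated in the log.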

1472 commits