prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Leonardo Zamariola	3326df42bb	Removing global state modification on unit tests (fix #10033 #10034) (#10935) * Removing global state modification on unit tests (fix #10033 #10034) The config.DefaultRemoteReadConfig and config.DefaultRemoteWriteConfig instances hold global state. Unit tests were changing their url.URL reference globally causing false positives when tests were ran through package. Two helper functions were created to copy those global values instead of changing them in place to fix null point when running unit tests by method instead of by package. Signed-off-by: Leonardo Zamariola <leonardo.zamariola@gmail.com> * Fixing pull request suggestions Copying by value from default config Signed-off-by: Leonardo Zamariola <leonardo.zamariola@gmail.com>	2022-06-30 10:20:16 -06:00
Matej Gera	1dd247f68b	Remote Write: Rename confusing `walDir` parameter to `dir` (#10464) * Rename walDir parameter to dir Signed-off-by: Matej Gera <matejgera@gmail.com> * Improve NewQueueManager comment Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-05-30 21:45:30 -07:00
Bryan Boreham	4b9f248e85	unit tests: make all Labels sorted alphabetically (#10532) "Labels is a sorted set of labels. Order has to be guaranteed upon instantiation." says the comment, so fix all the tests that break this rule. For `BenchmarkLabelValuesWithMatchers()` and `BenchmarkHeadLabelValuesWithMatchers()` the amount of work done changes significantly if you put the labels in order, because all series refs get neatly partitioned by the `tens` label, so I renamed the labels to maintain the previous behaviour. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-05-04 23:41:36 +02:00
Matthieu MOREL	e2ede285a2	refactor: move from io/ioutil to io and os packages (#10528) * refactor: move from io/ioutil to io and os packages * use fs.DirEntry instead of os.FileInfo after os.ReadDir Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>	2022-04-27 11:24:36 +02:00
Chris Marchbanks	a11e73edda	Fix a deadlock between Batch and FlushAndShutdown (#10608) If FlushAndShutdown is called with a full batchQueue, and then Batch is called rather than the normal path of reading from a queue a deadlock might be encountered. Rather than having FlushAndShutdown having blocking code while holding a lock retry sending the batch every second. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-04-20 20:50:41 +02:00
Wilbert Guo	83a2e52bc2	Add SyncForState Implementation for Ruler HA (#10070) * continuously syncing activeAt for alerts Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * add import Signed-off-by: Yijie Qin <qinyijie@amazon.com> Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Refactor SyncForState and add unit tests Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Format code Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * Add hook for syncForState Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix go lint Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Refactor syncForState override implementation Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Add syncForState override func as argument to Update() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix go formatting Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Fix circleci test errors Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> Remove overrideFunc as argument to run() Signed-off-by: Wilbert Guo <wilbeguo@amazon.com> * remove the syncForState Signed-off-by: Yijie Qin <qinyijie@amazon.com> * use the override function to decide if need to replace the activeAt or not Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix test case Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix format Signed-off-by: Yijie Qin <qinyijie@amazon.com> * Trigger build Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fixing comments Signed-off-by: Yijie Qin <qinyijie@amazon.com> * return the result of map of alerts instead of single one Signed-off-by: Yijie Qin <qinyijie@amazon.com> * upper case the QueryforStateSeries Signed-off-by: Yijie Qin <qinyijie@amazon.com> * use a more generic rule group post process function type Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix indentation Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix gofmt Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix lint Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fixing naming Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fix comments Signed-off-by: Yijie Qin <qinyijie@amazon.com> * add the lastEvalTimestamp as parameter Signed-off-by: Yijie Qin <qinyijie@amazon.com> * fmt Signed-off-by: Yijie Qin <qinyijie@amazon.com> * change funcType to func Signed-off-by: Yijie Qin <qinyijie@amazon.com> Co-authored-by: Yijie Qin <qinyijie@amazon.com> Co-authored-by: Yijie Qin <63399121+qinxx108@users.noreply.github.com>	2022-03-29 02:16:46 +02:00
beorn7	79376c1e94	Merge branch 'release-2.33' into beorn7/release	2022-03-08 17:42:49 +01:00
Chris Marchbanks	e970acb085	Fix deadlock between adding to queue and getting batch Do not block when trying to write a batch to the queue. This can cause appends to lock forever if the only thing reading from the queue needs the mutex to write. Instead, if batchQueue is full pop the sample that was just added from the partial batch and return false. The code doing the appending already handles retries with backoff. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-03-07 17:15:57 -07:00
Chris Marchbanks	afdc1decac	Write a test that reproduces the deadlock Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-03-07 17:15:51 -07:00
Łukasz Mierzwa	a4317bf0ec	Run gofumpt on all files (#10392) * Run gofumpt on all files Getting golangci-lint errors when building on my laptop, possibly because I have newer version of gofumpt then what it was formatted with. Run gofumpt -w -extra on all files as it will be needed in the future anyway. * Update golangci-lint to v1.44.2 v1.44.0 upgraded gofumpt so bumping version in CI will help keep formatting correct for everyone * Address golangci-lint error Getting 'error-strings: error strings should not be capitalized or end with punctuation or a newline' from revive here. Drop new line. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-03-03 17:21:05 +01:00
DrAuYueng	5a6e26556b	Add an option to use the external labels as selectors for the remote read endpoint (#10254) * An option to ignore external_labels Signed-off-by: DrAuYueng <ouyang1204@gmail.com>	2022-02-16 22:12:47 +01:00
Julien Pivotto	b0d70557b7	Merge pull request #10285 from prometheus/release-2.33	2022-02-12 00:02:24 +01:00
Chris Marchbanks	bfb1500a38	Fix deadlock when stopping a shard (#10279) If a queue is stopped and one of its shards happens to hit the batch_send_deadline at the same time a deadlock can occur where stop holds the mutex and will not release it until the send is finished, but the send needs the mutex to retrieve the most recent batch. This is fixed by using a second mutex just for writing. In addition, the test I wrote exposed a case where during shutdown a batch could be sent twice due to concurrent calls to queue.Batch() and queue.FlushAndShutdown(). Protect these with a mutex as well. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-02-11 07:07:41 -07:00
Matej Gera	2c61d29b2a	Tracing: Migrate to OpenTelemetry library (#9724) Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-01-25 11:08:04 +01:00
Eng Zer Jun	3e67654d37	refactor: use `T.TempDir()` and `B.TempDir` to create temporary directory The directory created by `T.TempDir()` and `B.TempDir()` is automatically removed when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.TempDir Reference: https://pkg.go.dev/testing#B.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-01-22 18:57:30 +08:00
Bryan Boreham	954c0e8020	remote_write: round desired shards up before check Previously we would reject an increase from 2 to 2.5 as being within 30%; by rounding up first we see this as an increase from 2 to 3. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Bryan Boreham	6d01ce8c4d	remote_write: shard up more when backlogged Change the coefficient from 1% to 5%, so instead of targetting to clear the backlog in 100s we target 20s. Update unit test to reflect the new behaviour. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Bryan Boreham	d588b14d9c	remote_write: detailed test for shard calculation Drive the input parameters to `calculateDesiredShards()` very precisely, to illustrate some questionable behaviour marked with `?!`. See https://github.com/prometheus/prometheus/issues/9178, https://github.com/prometheus/prometheus/issues/9207, Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Chris Marchbanks	ba03f7fc23	Merge pull request #10102 from prometheus/update-metrics-on-rw-fails Update sent timestamp when write irrecoverably fails	2022-01-05 10:46:09 -07:00
Goutham Veeramachaneni	6696b7a5f0	Don't update metrics on context cancellation Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-01-04 10:46:52 +01:00
Chris Marchbanks	dfa5cb7462	Merge pull request #10038 from charlesxsh/fix-TestReshardRaceWithStop add proper exit for loop	2022-01-03 09:02:45 -07:00
Goutham Veeramachaneni	1af81dc5c9	Update sent timestamp when write irrecoverably fails. We have an alert that fires when prometheus_remote_storage_highest_timestamp_in_seconds - prometheus_remote_storage_queue_highest_sent_timestamp_seconds becomes too high. But we have an agent that fires this when the remote "rate-limits" the user. This is because prometheus_remote_storage_queue_highest_sent_timestamp_seconds doesn't get updated when the remote sends a 429. I think we should update the metrics, and the change I made makes sense. Because if the requests fails because of connectivity issues, etc. we will never exit the `sendWriteRequestWithBackoff` function. It only exits the function when there is a non-recoverable error, like a bad status code, and in that case, I think the metric needs to be updated. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-01-03 11:13:48 +01:00
Shihao Xia	c3e7bfb813	add proper exit for loop Signed-off-by: Shihao Xia <charlesxsh@hotmail.com>	2021-12-29 23:48:11 -05:00
Julien Pivotto	27343277fa	Merge release-2.32 forward into main (#10032) * storage: expose bug in iterators #10027 Signed-off-by: beorn7 <beorn@grafana.com> * storage: fix bug #10027 in iterators' Seek method Signed-off-by: beorn7 <beorn@grafana.com> * Append reporting metrics without limit If reporting metrics fails due to reaching the limit, this makes the target appear as UP in the UI, but the metrics are missing. This commit bypasses that limit for report metrics. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Remove check against cfg so interval/ timeout are always set (#10023) (#10031) Signed-off-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Nicholas Blott <blottn@tcd.ie> * Cut v2.32.1 Signed-off-by: Julius Volz <julius.volz@gmail.com> * Apply suggestions from code review Signed-off-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu> Co-authored-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev>	2021-12-17 23:18:38 +01:00
beorn7	0ede6ae321	storage: fix bug #10027 in iterators' Seek method Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-16 12:07:35 +01:00
beorn7	b042e29569	storage: expose bug in iterators #10027 Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-16 12:02:15 +01:00
Chris Marchbanks	0a8d28ea93	Merge pull request #9934 from bboreham/remote-write-struct remote-write: buffer struct instead of interface to reduce garbage-collection	2021-12-09 09:17:45 -07:00
Bryan Boreham	bd6436605d	Review feedback Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-09 14:40:44 +00:00
Sebastian Rabenhorst	d8b8678bd1	Log time series details for out-of-order samples in remote write receiver (#9894) * Improved out-of-order sample logs in write handler Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> sign commit Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Inlined logAppendError Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Update storage/remote/write_handler.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Fixed fmt Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> * Improved out-of-order sample logs in write handler Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> sign commit Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Inlined logAppendError Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>	2021-12-08 15:07:51 +00:00
detailyang	3e482c905f	fix:storage:avoid panic when iterater exhauested (#9945) Signed-off-by: detailyang <detailyang@gmail.com>	2021-12-07 19:50:00 +05:30
Bryan Boreham	50878ebe5e	remote-write: buffer struct instead of interface This reduces the amount of individual objects allocated, allowing sends to run a bit faster. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-03 14:30:42 +00:00
Bryan Boreham	c478d6477a	remote-write: benchmark just sending, on 20 shards Previously BenchmarkSampleDelivery spent a lot of effort checking each sample had arrived, so was largely showing the performance of test-only code. Increase the number of shards to be more realistic for a large workload. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-03 14:02:10 +00:00
Chris Marchbanks	e95d4ec3f1	Merge pull request #9830 from prometheus/batch-queues Batch samples before sending them to channels	2021-12-02 08:37:41 -07:00
Chris Marchbanks	c655684142	Subtract from enqueued samples/exemplars upon send Right now the values for enqueuedSamples and enqueuedExemplars is never subtracted leading to inflated values for failedSamples/failedExemplars when a hard shutdown of a shard occurs. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2021-11-30 12:54:50 -07:00
Chris Marchbanks	319249f9db	Batch samples before sending them to channels Channels can cause bottlenecks and tons of context switches when reading hundreds of thousands of samples per second from a single queue. Instead, pre-batch the samples to amortize the cost of the concurrency overhead. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2021-11-30 12:54:45 -07:00
Björn Rabenstein	d677aa4b29	storage: Consolidate iterator method names (Values -> At) (#9888) `BufferedSeriesIterator` and `MemoizedSeriesIterator` use a method called `Values` for exactly the purpose for which all other iterators of the same kind use a method called `At`. That alone is confusing, but on top of that, the `Values` method only returns a single sample, not multiple values. I assume the naming has historical reasons. This commit makes it more consistent. It is now easier to read, and now `BufferedSeriesIterator` and `MemoizedSeriesIterator` implement `chunkenc.Iterator` like many other iterators, too. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 11:16:40 +01:00
Björn Rabenstein	b866db009b	storage: Fix and improve the Seek method of various iterators (#9878) There was a subtle and nasty bug in listSeriesIterator.Seek. In addition, the Seek call is defined to be a no-op if the current position of the iterator is already pointing to a suitable sample. This commit adds fast paths for this case to several potentially expensive Seek calls. Another bug was in concreteSeriesIterator.Seek. It always searched the whole series and not from the current position of the iterator. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 15:17:56 +05:30
Matheus Alcantara	e673805d67	storage/remote: use t.TempDir instead of ioutil.TempDir on tests (#9811) Signed-off-by: Matheus Alcantara <matheusssilv97@gmail.com>	2021-11-19 15:21:45 -05:00
Hu Shuai	eb43437d83	Fix golint issue (#9800) Signed-off-by: Hu Shuai <hus.fnst@fujitsu.com>	2021-11-18 09:26:07 +01:00
Dieter Plaetinck	0fac9bb859	Add basic initial developer docs for TSDB (#9451) * Add basic initial developer docs for TSDB There's a decent amount of content already out there (blog posts, conference talks, etc), but: * when they get stale, they don't tend to get updated * they still leave me with questions that I'ld like to answer for developers (like me) who want to use, or work with, TSDB What I propose is developer docs inside the prometheus repository. Easy to find and harness the power of the community to expand it and keep it up to date. * perfect is the enemy of good. Let's have a base and incrementally improve * Markdown docs should be broad but not too deep. Source code comments can complement them, and are the ideal place for implementation details. Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * use example code that works out of the box Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Apply suggestions from code review Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * PR feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * more docs Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * PR feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Apply suggestions from code review Signed-off-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Apply suggestions from code review Signed-off-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Update tsdb/docs/usage.md Signed-off-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> * final tweaks Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * workaround docs versioning issue Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Move example code to real executable, testable example. Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * cleanup example test and make sure it always reproduces Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * obtain temp dir in a way that works with older Go versions Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * Fix Ganesh's comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-11-17 15:51:27 +05:30
Mateusz Gozdek	d8561dbfd8	storage/remote: make tests use separate remote write configs So tests can be run in parallel without races. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-10 09:40:43 +01:00
Mateusz Gozdek	116552cc58	storage/remote: check errors from ApplyConfig in tests So tests do not produce obscure errors when applying configuration fails. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-10 09:40:43 +01:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
Dieter Plaetinck	cda025b5b5	TSDB: demistify SeriesRefs and ChunkRefs (#9536) * TSDB: demistify seriesRefs and ChunkRefs The TSDB package contains many types of series and chunk references, all shrouded in uint types. Often the same uint value may actually mean one of different types, in non-obvious ways. This PR aims to clarify the code and help navigating to relevant docs, usage, etc much quicker. Concretely: * Use appropriately named types and document their semantics and relations. * Make multiplexing and demuxing of types explicit (on the boundaries between concrete implementations and generic interfaces). * Casting between different types should be free. None of the changes should have any impact on how the code runs. TODO: Implement BlockSeriesRef where appropriate (for a future PR) Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * agent: demistify seriesRefs and ChunkRefs Signed-off-by: Dieter Plaetinck <dieter@grafana.com>	2021-11-06 15:40:04 +05:30
Marco Pracucci	9f5ff5b269	Allow to disable trimming when querying TSDB (#9647) * Allow to disable trimming when querying TSDB Signed-off-by: Marco Pracucci <marco@pracucci.com> * Addressed review comments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added unit test Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed TrimDisabled to DisableTrimming Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-11-03 15:38:34 +05:30
sniper	f82e56fbba	fix request bytes size and continue is useless (#9635) Signed-off-by: kalmanzhao <kalmanzhao@tencent.com> Co-authored-by: kalmanzhao <kalmanzhao@tencent.com>	2021-11-03 14:40:31 +05:30
Mateusz Gozdek	b7bdf6fab2	Fix imports formatting According to `2829908806 (r58457095)`. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
lzhfromustc	9da5382103	storage/remote: Prevent two goroutines from endless loop (#8967) Signed-off-by: lzhfromustc <lzhfromustc@gmail.com>	2021-10-29 16:39:02 -07:00
lzhfromustc	d42be7be76	test:Fix two potential goroutine leaks (#8964) Signed-off-by: lzhfromustc <lzhfromustc@gmail.com>	2021-10-29 15:44:32 -07:00
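
Commit 954c0e8020 in the log above says an increase from 2 to 2.5 desired shards used to be rejected as being within 30%, while rounding up first turns it into an increase from 2 to 3. The arithmetic can be sketched as follows. This is a minimal illustration only: `withinTolerance` and `reshardTarget` are hypothetical names invented here, and the real queue manager code may differ in structure and detail.

```go
package main

import (
	"fmt"
	"math"
)

// shardTolerance is the 30% band mentioned in the commit message.
const shardTolerance = 0.3

// withinTolerance reports whether desired lies within 30% of current.
// Hypothetical helper for illustration, not the Prometheus implementation.
func withinTolerance(current int, desired float64) bool {
	lower := float64(current) * (1 - shardTolerance)
	upper := float64(current) * (1 + shardTolerance)
	return desired >= lower && desired <= upper
}

// reshardTarget returns the shard count to use and whether it changed.
// With roundFirst set, desired is rounded up before the tolerance check.
func reshardTarget(current int, desired float64, roundFirst bool) (int, bool) {
	if roundFirst {
		desired = math.Ceil(desired)
	}
	if withinTolerance(current, desired) {
		return current, false
	}
	return int(math.Ceil(desired)), true
}

func main() {
	// 2.5 lies inside [1.4, 2.6], so without rounding nothing changes.
	fmt.Println(reshardTarget(2, 2.5, false)) // 2 false
	// Rounded up to 3, the value falls outside the band and triggers a reshard.
	fmt.Println(reshardTarget(2, 2.5, true)) // 3 true
}
```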

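Commit e970acb085 describes its deadlock fix as: do not block when writing a batch to the queue; if batchQueue is full, pop the sample that was just added from the partial batch and return false so the caller retries. A rough sketch of that pattern, using Go's `select` with a `default` branch, is shown here. The `queue` type, its fields, and the `int` samples are simplified assumptions for illustration and are not copied from the Prometheus source.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a simplified, hypothetical stand-in for a remote-write shard queue.
type queue struct {
	mtx        sync.Mutex
	batch      []int      // partial batch being filled
	batchQueue chan []int // full batches waiting to be sent
	batchSize  int
}

func newQueue(batchSize, capacity int) *queue {
	return &queue{
		batch:      make([]int, 0, batchSize),
		batchQueue: make(chan []int, capacity),
		batchSize:  batchSize,
	}
}

// Append adds a sample and never blocks while holding the mutex.
// It returns false if the sample could not be accepted; the caller
// is expected to retry later (for example with backoff).
func (q *queue) Append(sample int) bool {
	q.mtx.Lock()
	defer q.mtx.Unlock()

	q.batch = append(q.batch, sample)
	if len(q.batch) < q.batchSize {
		return true
	}
	select {
	case q.batchQueue <- q.batch:
		// The full batch was handed off; start a fresh one.
		q.batch = make([]int, 0, q.batchSize)
		return true
	default:
		// batchQueue is full: undo the append instead of blocking.
		q.batch = q.batch[:len(q.batch)-1]
		return false
	}
}

func main() {
	q := newQueue(2, 1)
	fmt.Println(q.Append(1), q.Append(2)) // true true (first batch queued)
	fmt.Println(q.Append(3), q.Append(4)) // true false (batchQueue is full)
	<-q.batchQueue                        // a reader drains one batch
	fmt.Println(q.Append(4))              // true (retry succeeds)
}
```

The point of the `default` branch is that the goroutine holding the mutex always makes progress, so a reader that needs the same mutex cannot be starved by a blocked writer.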

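Commit 3326df42bb reports that unit tests were changing the url.URL reference of the global default remote read/write configs in place, and that helper functions now copy those defaults by value instead. The idea can be illustrated with the sketch below. The `remoteWriteConfig` type, the `defaultRemoteWriteConfig` variable, and the `testRemoteWriteConfig` helper are hypothetical stand-ins written for this example and are not the actual `config` package definitions.

```go
package main

import (
	"fmt"
	"net/url"
)

// remoteWriteConfig is a hypothetical, reduced config type.
type remoteWriteConfig struct {
	URL  *url.URL
	Name string
}

// defaultRemoteWriteConfig plays the role of a package-level default
// that tests must not modify.
var defaultRemoteWriteConfig = remoteWriteConfig{Name: "default"}

// testRemoteWriteConfig returns a copy of the default with its own URL,
// leaving the global value untouched.
func testRemoteWriteConfig(rawURL string) (*remoteWriteConfig, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	cfg := defaultRemoteWriteConfig // struct assignment copies by value
	cfg.URL = u                     // only the copy gets the new pointer
	return &cfg, nil
}

func main() {
	cfg, err := testRemoteWriteConfig("http://localhost:9090/api/v1/write")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.URL.Host)                        // localhost:9090
	fmt.Println(defaultRemoteWriteConfig.URL == nil) // true
}
```

Note that the struct copy is shallow: pointer fields still point at shared data until they are reassigned, which is why the helper replaces the URL pointer rather than writing through it.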
1199 commits