prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 01:24:04 -08:00

Author	SHA1	Message	Date
Giedrius Statkevičius	d1d2566055	remote/read_handler: pool input to Marshal() (#11357 ) * remote/read_handler: pool input to Marshal() Use a sync.Pool to reuse byte slices between calls to Marshal() in the remote read handler. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * remote: add microbenchmark for remote read handler Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>	2022-11-15 16:29:16 +01:00
Ganesh Vernekar	648be89822	Merge remote-tracking branch 'upstream/main' into fix-conflict Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-12 14:20:02 +05:30
Jesus Vazquez	775d90d5f8	TSDB: Rename wal package to wlog (#11352 ) The wlog.WL type can now be used to create a Write Ahead Log or a Write Behind Log. Before the prefix for wbl metrics was 'prometheus_tsdb_out_of_order_wal_' and has been replaced with 'prometheus_tsdb_out_of_order_wbl_'. Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2022-10-10 20:38:46 +05:30
Jesus Vazquez	e934d0f011	Merge 'main' into sparsehistogram Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-10-05 22:14:49 +02:00
Bryan Boreham	3029320ce6	storage/remote: in tests use labels.FromStrings And a few cases of `EmptyLabels()`. Replacing code which assumes the internal structure of `Labels`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-09-09 13:34:49 +02:00
Ganesh Vernekar	f540c1dbd3	Add support for histograms in WAL checkpointing (#11210 ) * Add support for histograms in WAL checkpointing Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix tests Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-29 17:38:36 +05:30
beorn7	c9fd3c235d	Merge branch 'main' into sparsehistogram	2022-08-10 17:54:37 +02:00
Levi Harrison	0db6b072bc	Export `histogramToHistogramProto()` (#11046 ) Signed-off-by: Levi Harrison <git@leviharrison.dev>	2022-07-21 10:12:50 -04:00
Paschalis Tsilias	d1122e0743	Introduce TSDB changes for appending metadata to the WAL (#10972 ) * Append metadata to the WAL Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra whitespace; Reword some docstrings and comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use RLock() for hasNewMetadata check Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use single byte for metric type in RefMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update proposed WAL format for single-byte type metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement MetadataAppender interface for the Agent Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of review comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Amend description of metadata in wal.md Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Correct key used to retrieve metadata from cache When we're setting metadata entries in the scrapeCache, we're using the p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and use it as the cache key. When checking for cache entries though, we used p.Series() as the key, which included the metric name _with_ its labels. That meant that we were never actually hitting the cache. We're fixing this by utilizing the __name__ internal label for correctly getting the cache entries after they've been set by setHelp(), setType() or setUnit(). 
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Put feature behind a feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix AppendMetadata docstring Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reorder WAL format document Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Change error message of AppendMetadata; Fix access of s.meta in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reuse temporary buffer in Metadata encoder Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Only keep latest metadata for each refID during checkpointing Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix test that's referencing decoding metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Avoid creating metadata block if no new metadata are present Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add tests for corrupt metadata block and relevant record type Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix CR comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Extract logic about changing metadata in an anonymous function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement new proposed WAL format and amend relevant tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use 'const' for metadata field names Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Apply metadata to head memSeries in Commit, not in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add docstring and rename extracted helper in scrape.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add tests for tsdb-related cases Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linter issues vol1 Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linter issues vol2 Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix Windows test by closing WAL reader files Signed-off-by: 
Paschalis Tsilias <paschalist0@gmail.com> * Use switch instead of two if statements in metadata decoding Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments around TestMetadata* tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add code for replaying WAL; test correctness of in-memory data after a replay Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove scrape-loop related code from PR Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify tests by sorting slices before comparison Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix test to use separate transactions Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Empty out buffer and record slices after encoding latest metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linting issue Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update calculation for DroppedMetadata metric Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rename MetadataAppender interface and AppendMetadata method to MetadataUpdater/UpdateMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reuse buffer when encoding latest metadata for each series Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments; Check all returned error values using two helpers Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify use of helpers Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Satisfy linter Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>	2022-07-19 10:58:52 +02:00
beorn7	87351f2318	prompb: Modify layout of histograms Note: This is deliberately an incompatible change. Since we have never used histograms in remote read/write yet, there is no point in keeping compatibility. This _is_, however, compatible to the state in the main branch. This commit flattens the bucket message into top-level fields. This has the disadvantage of now having two triples of fields prefixed with `negative_...` or `positive_...`. However, with this structure, we save one tag on the wire. And, perhaps more importantly, we mirror the structure of the `histogram.Histogram` Go type. This commit also adjusts `repeated` fields to use names in the plural form, as it is also the case for the fields that already existed. This also adds a doc comment to `HistogramProtoToHistogram` and changes its return type to a pointer (which is more convenient and probably more efficient). Signed-off-by: beorn7 <beorn@grafana.com>	2022-07-14 17:47:17 +02:00
Levi Harrison	08f3ddb864	Sparse histogram remote-write support (#11001 )	2022-07-14 09:13:12 -04:00
beorn7	28f028e938	Merge branch 'main' into sparsehistogram	2022-07-12 19:07:13 +02:00
Matthieu MOREL	d56d0a9d52	(storage): move from github.com/pkg/errors to 'errors' and 'fmt' (#10946 ) Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com> Co-authored-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>	2022-07-01 18:59:50 +02:00
Leonardo Zamariola	3326df42bb	Removing global state modification on unit tests (fix #10033 #10034 ) (#10935 ) * Removing global state modification on unit tests (fix #10033 #10034) The config.DefaultRemoteReadConfig and config.DefaultRemoteWriteConfig instances hold global state. Unit tests were changing their url.URL reference globally, causing false positives when tests were run through package. Two helper functions were created to copy those global values instead of changing them in place, to fix a null pointer error when running unit tests by method instead of by package. Signed-off-by: Leonardo Zamariola <leonardo.zamariola@gmail.com> * Fixing pull request suggestions Copying by value from default config Signed-off-by: Leonardo Zamariola <leonardo.zamariola@gmail.com>	2022-06-30 10:20:16 -06:00
beorn7	40ad5e284a	Merge branch 'main' into beorn7/sparsehistogram	2022-06-09 20:50:30 +02:00
Matej Gera	1dd247f68b	Remote Write: Rename confusing `walDir` parameter to `dir` (#10464 ) * Rename walDir parameter to dir Signed-off-by: Matej Gera <matejgera@gmail.com> * Improve NewQueueManager comment Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-05-30 21:45:30 -07:00
Bryan Boreham	4b9f248e85	unit tests: make all Labels sorted alphabetically (#10532 ) "Labels is a sorted set of labels. Order has to be guaranteed upon instantiation." says the comment, so fix all the tests that break this rule. For `BenchmarkLabelValuesWithMatchers()` and `BenchmarkHeadLabelValuesWithMatchers()` the amount of work done changes significantly if you put the labels in order, because all series refs get neatly partitioned by the `tens` label, so I renamed the labels to maintain the previous behaviour. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-05-04 23:41:36 +02:00
beorn7	3bc711e333	Merge branch 'main' into sparsehistogram	2022-05-04 13:37:13 +02:00
Matthieu MOREL	e2ede285a2	refactor: move from io/ioutil to io and os packages (#10528 ) * refactor: move from io/ioutil to io and os packages * use fs.DirEntry instead of os.FileInfo after os.ReadDir Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>	2022-04-27 11:24:36 +02:00
Chris Marchbanks	a11e73edda	Fix a deadlock between Batch and FlushAndShutdown (#10608 ) If FlushAndShutdown is called with a full batchQueue, and then Batch is called rather than the normal path of reading from a queue, a deadlock might be encountered. Rather than having FlushAndShutdown run blocking code while holding a lock, retry sending the batch every second. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-04-20 20:50:41 +02:00
beorn7	4210aac74a	Merge branch 'main' into sparsehistogram	2022-03-22 14:47:42 +01:00
beorn7	79376c1e94	Merge branch 'release-2.33' into beorn7/release	2022-03-08 17:42:49 +01:00
Chris Marchbanks	e970acb085	Fix deadlock between adding to queue and getting batch Do not block when trying to write a batch to the queue. This can cause appends to lock forever if the only thing reading from the queue needs the mutex to write. Instead, if batchQueue is full pop the sample that was just added from the partial batch and return false. The code doing the appending already handles retries with backoff. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-03-07 17:15:57 -07:00
Chris Marchbanks	afdc1decac	Write a test that reproduces the deadlock Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-03-07 17:15:51 -07:00
DrAuYueng	5a6e26556b	Add an option to use the external labels as selectors for the remote read endpoint (#10254 ) * An option to ignore external_labels Signed-off-by: DrAuYueng <ouyang1204@gmail.com>	2022-02-16 22:12:47 +01:00
Julien Pivotto	b0d70557b7	Merge pull request #10285 from prometheus/release-2.33	2022-02-12 00:02:24 +01:00
Chris Marchbanks	bfb1500a38	Fix deadlock when stopping a shard (#10279 ) If a queue is stopped and one of its shards happens to hit the batch_send_deadline at the same time a deadlock can occur where stop holds the mutex and will not release it until the send is finished, but the send needs the mutex to retrieve the most recent batch. This is fixed by using a second mutex just for writing. In addition, the test I wrote exposed a case where during shutdown a batch could be sent twice due to concurrent calls to queue.Batch() and queue.FlushAndShutdown(). Protect these with a mutex as well. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2022-02-11 07:07:41 -07:00
Matej Gera	2c61d29b2a	Tracing: Migrate to OpenTelemetry library (#9724 ) Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-01-25 11:08:04 +01:00
Eng Zer Jun	3e67654d37	refactor: use `T.TempDir()` and `B.TempDir` to create temporary directory The directory created by `T.TempDir()` and `B.TempDir()` is automatically removed when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.TempDir Reference: https://pkg.go.dev/testing#B.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-01-22 18:57:30 +08:00
Bryan Boreham	954c0e8020	remote_write: round desired shards up before check Previously we would reject an increase from 2 to 2.5 as being within 30%; by rounding up first we see this as an increase from 2 to 3. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Bryan Boreham	6d01ce8c4d	remote_write: shard up more when backlogged Change the coefficient from 1% to 5%, so instead of targeting to clear the backlog in 100s we target 20s. Update unit test to reflect the new behaviour. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Bryan Boreham	d588b14d9c	remote_write: detailed test for shard calculation Drive the input parameters to `calculateDesiredShards()` very precisely, to illustrate some questionable behaviour marked with `?!`. See https://github.com/prometheus/prometheus/issues/9178, https://github.com/prometheus/prometheus/issues/9207, Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-01-10 09:57:37 +00:00
Chris Marchbanks	ba03f7fc23	Merge pull request #10102 from prometheus/update-metrics-on-rw-fails Update sent timestamp when write irrecoverably fails	2022-01-05 10:46:09 -07:00
Goutham Veeramachaneni	6696b7a5f0	Don't update metrics on context cancellation Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-01-04 10:46:52 +01:00
Chris Marchbanks	dfa5cb7462	Merge pull request #10038 from charlesxsh/fix-TestReshardRaceWithStop add proper exit for loop	2022-01-03 09:02:45 -07:00
Goutham Veeramachaneni	1af81dc5c9	Update sent timestamp when write irrecoverably fails. We have an alert that fires when prometheus_remote_storage_highest_timestamp_in_seconds - prometheus_remote_storage_queue_highest_sent_timestamp_seconds becomes too high. But we have an agent that fires this when the remote "rate-limits" the user. This is because prometheus_remote_storage_queue_highest_sent_timestamp_seconds doesn't get updated when the remote sends a 429. I think we should update the metrics, and the change I made makes sense. Because if the requests fails because of connectivity issues, etc. we will never exit the `sendWriteRequestWithBackoff` function. It only exits the function when there is a non-recoverable error, like a bad status code, and in that case, I think the metric needs to be updated. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-01-03 11:13:48 +01:00
Shihao Xia	c3e7bfb813	add proper exit for loop Signed-off-by: Shihao Xia <charlesxsh@hotmail.com>	2021-12-29 23:48:11 -05:00
beorn7	86cc83b13c	storage: iterator fixes after merge Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-18 14:12:01 +01:00
beorn7	64c7bd2b08	Merge branch 'main' into sparsehistogram	2021-12-18 14:04:25 +01:00
Julien Pivotto	27343277fa	Merge release-2.32 forward into main (#10032 ) * storage: expose bug in iterators #10027 Signed-off-by: beorn7 <beorn@grafana.com> * storage: fix bug #10027 in iterators' Seek method Signed-off-by: beorn7 <beorn@grafana.com> * Append reporting metrics without limit If reporting metrics fails due to reaching the limit, this makes the target appear as UP in the UI, but the metrics are missing. This commit bypasses that limit for report metrics. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Remove check against cfg so interval/ timeout are always set (#10023) (#10031) Signed-off-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Nicholas Blott <blottn@tcd.ie> * Cut v2.32.1 Signed-off-by: Julius Volz <julius.volz@gmail.com> * Apply suggestions from code review Signed-off-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu> Co-authored-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Julius Volz <julius.volz@gmail.com> Co-authored-by: Levi Harrison <git@leviharrison.dev>	2021-12-17 23:18:38 +01:00
beorn7	0ede6ae321	storage: fix bug #10027 in iterators' Seek method Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-16 12:07:35 +01:00
beorn7	b042e29569	storage: expose bug in iterators #10027 Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-16 12:02:15 +01:00
beorn7	6f33ab2b35	Merge branch 'main' into sparsehistogram	2021-12-15 13:49:33 +01:00
Chris Marchbanks	0a8d28ea93	Merge pull request #9934 from bboreham/remote-write-struct remote-write: buffer struct instead of interface to reduce garbage-collection	2021-12-09 09:17:45 -07:00
Bryan Boreham	bd6436605d	Review feedback Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-09 14:40:44 +00:00
Sebastian Rabenhorst	d8b8678bd1	Log time series details for out-of-order samples in remote write receiver (#9894 ) * Improved out-of-order sample logs in write handler Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> sign commit Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Inlined logAppendError Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Update storage/remote/write_handler.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Fixed fmt Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> * Improved out-of-order sample logs in write handler Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> sign commit Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Inlined logAppendError Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>	2021-12-08 15:07:51 +00:00
Bryan Boreham	50878ebe5e	remote-write: buffer struct instead of interface This reduces the amount of individual objects allocated, allowing sends to run a bit faster. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-03 14:30:42 +00:00
Bryan Boreham	c478d6477a	remote-write: benchmark just sending, on 20 shards Previously BenchmarkSampleDelivery spent a lot of effort checking each sample had arrived, so was largely showing the performance of test-only code. Increase the number of shards to be more realistic for a large workload. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-12-03 14:02:10 +00:00
Chris Marchbanks	e95d4ec3f1	Merge pull request #9830 from prometheus/batch-queues Batch samples before sending them to channels	2021-12-02 08:37:41 -07:00
Chris Marchbanks	c655684142	Subtract from enqueued samples/exemplars upon send Right now the values for enqueuedSamples and enqueuedExemplars are never subtracted, leading to inflated values for failedSamples/failedExemplars when a hard shutdown of a shard occurs. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2021-11-30 12:54:50 -07:00

411 commits