prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 17:44:06 -08:00

Author	SHA1	Message	Date
Ganesh Vernekar	4aa6d561a1	Run OOO compaction after restart if there is OOO data from WBL (#320) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-22 18:26:43 +05:30
Jesus Vazquez	54196bb7c4	Remove useless err check (#319) Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-08-19 12:52:03 +05:30
Peter Štibraný	bec3143c6c	Remove unused tsdb.Options.NewChunkDiskMapper field. (#318) Signed-off-by: Peter Štibraný <pstibrany@gmail.com>	2022-08-18 09:04:13 +00:00
Peter Štibraný	acd4a8a69d	Remove old chunk mapper. (#311) * Remove old chunk mapper. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Revert to code from upstream. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Fix OOO test failures Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix sorting in TestOOOHeadChunkReader_LabelValues Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Peter Štibraný <pstibrany@gmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-18 08:38:17 +00:00
Jesus Vazquez	e9456018fa	Merge pull request #314 from grafana/jvp/fix-ooo-head-postingsformatchers-and-labelvalues Fix OOO Head LabelValues and PostingsForMatchers	2022-08-11 16:23:28 +02:00
Jesus Vazquez	6bff0d9113	Add suggestions to labelValues test Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-08-11 16:08:49 +02:00
Oleg Zaytsev	b87c54de14	Load OutOfOrderTimeWindow only once per appender We're loading the time window once per each Append() call, plus once in the Commit(). While not extremely expensive, atomic operations are also not cheap. Additionally, it makes sense to keep the window consistent for a single append. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2022-08-11 10:46:12 +02:00
Jesus Vazquez	14ec85d4d2	Fix LabelValues test	2022-08-11 10:29:06 +02:00
Jesus Vazquez	3473058073	Start writing OOOHeadIndexReader LabelValues test Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-08-10 23:38:02 +02:00
Jesus Vazquez	7368402054	Fix OOO Head LabelValues and PostingsForMatchers OOOHeadIndexReader was using the headIndexReader PostingsForMatchers() and LabelValues() implementation, which led to a very subtle bug that produced wrong query results. The headIndexReader LabelValues() implementation checks whether the query time range overlaps with the head maxt and mint, and if it doesn't, it returns an empty list of values. Since this code was also used by the OOO head, it led to wrong results that we were not able to see in tests because our queries were always from MinInt64 to MaxInt64. This commit also adds a new test that performs multiple time range queries to make sure this never happens again. Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-08-10 23:14:19 +02:00
Ganesh Vernekar	01b03b7f85	Prevent panic with ApplyConfig (#312) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-09 19:25:17 +05:30
Ganesh Vernekar	9fe7d3a478	Update minOOOTime after truncating Head (#309) * Update minOOOTime after truncating Head Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix lint Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Add a unit test Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-02 19:43:50 +05:30
Ganesh Vernekar	0ca37c62db	Make overlapping block log into a debug Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-02 12:44:12 +05:30
Ganesh Vernekar	4b2198d7ec	Do not double count in OOO histogram (#300) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-07-22 13:47:51 +05:30
Ganesh Vernekar	2836e5513f	Add support to query unmerged, unsorted chunks (#299) * Add support to query unmerged, unsorted chunks Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix unrelated lint issue Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-07-20 15:57:02 +05:30
Ganesh Vernekar	00b379c3a5	Include out of order samples in BenchmarkQueries (#286) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-07-14 18:01:39 +05:30
Ganesh Vernekar	ff0dc75758	Replay WBL even if OOO Time Window is 0 (#296) * Replay WBL even if OOO Time Window is 0 Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Apply feedback Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-07-14 08:28:03 +00:00
Peter Štibraný	ae49ab5ea8	Merge remote-tracking branch 'upstream/main' into update-upstream-prometheus	2022-07-13 10:18:09 +02:00
Ganesh Vernekar	a632c73352	Simplify how OutOfOrderTimeWindow works (#285) * Simplify how OutOfOrderTimeWindow works Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Update test Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-07-08 18:53:23 +05:30
Ganesh Vernekar	c6f3d4ab33	Remove temporary patch for out-of-order (#283) * Remove temporary patch for out-of-order Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Remove ooo_wbl patch and fix tests Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-07-04 13:35:18 +00:00
Jesus Vazquez	6cfe44d7fd	WaitUntilIdle optimize idling time (#10878)
Relates to @bboreham optimization in https://github.com/prometheus/prometheus/pull/10859
Bryan did reduce the sleep time improving the deltas on the benchmark by quite a lot. However I've been working on a similar implementation for out of order and I noticed that we actually get into this method thousands of times. @ywwg had the brilliant idea of not always sleeping before the select but actually make it a case in the select so we only sleep if we need to. The benchmark deltas are amazing
```
❯ benchstat old_implementation.txt new_implementation_using_time_after.txt
name  old time/op  new time/op  delta
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=0,mmappedChunkT=0-8  521ms ±25%  253ms ± 6%  -51.47%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=36,mmappedChunkT=0-8  773ms ± 3%  369ms ±31%  -52.23%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=72,mmappedChunkT=0-8  592ms ±28%  297ms ±28%  -49.80%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=100,samplesPerSeries=7200,exemplarsPerSeries=360,mmappedChunkT=0-8  547ms ± 2%  999ms ±187%  ~  (p=0.690 n=5+5)
LoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=0,mmappedChunkT=0-8  11.3s ± 4%  1.3s ±44%  -88.48%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=10000,samplesPerSeries=50,exemplarsPerSeries=2,mmappedChunkT=0-8  11.1s ± 1%  1.2s ±20%  -89.08%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=0-8  1.24s ± 3%  0.18s ± 7%  -85.76%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=0-8  1.24s ± 2%  0.18s ± 5%  -85.24%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=0-8  1.23s ± 5%  0.27s ±33%  -77.73%  (p=0.008 n=5+5)
LoadWAL/batches=10,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=0-8  1.28s ± 1%  0.36s ± 7%  -71.51%  (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=0,mmappedChunkT=3800-8  12.1s ± 1%  3.1s ± 6%  -74.33%  (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=2,mmappedChunkT=3800-8  12.1s ± 1%  3.4s ± 4%  -71.94%  (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=5,mmappedChunkT=3800-8  12.1s ± 1%  3.8s ±17%  -68.35%  (p=0.008 n=5+5)
LoadWAL/batches=100,seriesPerBatch=1000,samplesPerSeries=480,exemplarsPerSeries=24,mmappedChunkT=3800-8  12.4s ± 1%  4.0s ±18%  -67.71%  (p=0.008 n=5+5)
```
Benchmarked on Linux
```
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/tsdb
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
```
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-06-30 15:00:04 +02:00
Julien Pivotto	bacd776356	Merge pull request #10907 from damnever/fix/panic Fix panic if series is not found when deleting series	2022-06-30 11:23:08 +02:00
Peter Štibraný	ffc60d8397	Reduce chunk write queue memory usage 2 (#10874) * Job queue This PR reimplements chan chunkWriteJob with custom buffered queue that should use less memory, because it doesn't preallocate entire buffer for maximum queue size at once. Instead it allocates individual "segments" with smaller size. As elements are added to the queue, they fill individual segments. When elements are removed from the queue (and segments), empty segments can be thrown away. This doesn't change memory usage of the queue when it's full, but should decrease its memory footprint when it's empty (queue will keep max 1 segment in such case). Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Modify test to work with low resolution timer. Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Improve comments. Signed-off-by: Peter Štibraný <pstibrany@gmail.com>	2022-06-29 17:51:27 +05:30
Ganesh Vernekar	5e8406a1d4	Avoid gaps in in-order data after restart with out-of-order enabled (#277) * Avoid gaps in in-order data after restart with out-of-order enabled Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix tests, do the temporary patch only if OOO is enabled Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Avoid Peter's confusion Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Use latest OutOfOrderTimeWindow Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-06-27 20:26:25 +05:30
Jesus Vazquez	1446b53d87	Merge pull request #276 from grafana/jvp/rename-oooallowance-to-oootimewindow Rename OutOfOrderAllowance to OutOfOrderTimeWindow	2022-06-24 12:40:20 +02:00
Ganesh Vernekar	abde1e0ba1	Update MinOOOTime and MaxOOOTime properly after restart (#275) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-06-24 10:26:33 +00:00
Jesus Vazquez	e70e769889	Rename OutOfOrderAllowance to OutOfOrderTimeWindow After review Allowance is perhaps a bit misleading so we've decided to replace it with a more common term like TimeWindow.	2022-06-24 12:23:38 +02:00
Xiaochao Dong (@damnever)	6b042da2d8	Fix panic if series is not found when deleting series Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>	2022-06-24 15:55:32 +08:00
Ganesh Vernekar	df59320886	Add out-of-order sample support to the TSDB (#269) This implementation is based on this design doc: https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing This commit adds support to accept out-of-order ("OOO") samples into the TSDB up to a configurable time allowance. If OOO is enabled, overlapping queries are automatically enabled. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Dieter Plaetinck <dieter@grafana.com>	2022-06-22 11:45:21 +00:00
Steve Azzopardi	04fe2c9522	fix(tsdb): inc mmap corruption counter on mmap out of sequence error (#10406) What --- When we see out of sequence chunks increase the chunk corruption counter to indicate that one of the chunks was corrupted. Reference: https://github.com/prometheus/prometheus/pull/10406#issuecomment-1142595527 Signed-off-by: Steve Azzopardi <steveazz@outlook.com>	2022-06-22 14:03:12 +05:30
Peter Štibraný	03a2313f7a	Reduce chunk write queue memory usage (#10873) * don't waste space on the chunkRefMap * add time factor * add comments * better readability * add instrumentation and more comments * formatting * uppercase comments * Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000. * double space Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>	2022-06-17 13:11:39 +05:30
Bryan Boreham	9f77d23889	tsdb: commit data periodically in CreateBlock (#10788) To avoid building up data in memory, commit and make a new appender periodically. The number `commitAfter = 10000` was chosen arbitrarily; testing with 10x more or less gives slightly worse results. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-06-17 11:26:19 +05:30
Łukasz Mierzwa	d65f037def	Don't increment prometheus_tsdb_compactions_failed_total when context is canceled (#10772) When restarting Prometheus I sometimes see: caller=db.go:832 level=error component=tsdb msg="compaction failed" err="compact head: persist head block: 2 errors: populate block: context canceled; context canceled" And prometheus_tsdb_compactions_failed_total metric gets incremented. This makes it more difficult to write alerts based on prometheus_tsdb_compactions_failed_total metric since any restart can trigger it. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-06-17 11:21:43 +05:30
Peter Štibraný	cf7aeb59a7	Merge remote-tracking branch 'upstream/main' into update-prometheus	2022-06-14 09:34:59 +02:00
Jesus Vazquez	06f1d3c349	Merge pull request #251 from grafana/codesome/ooopatch Add an option to enable overlapping compaction separately with overlapping queries	2022-06-13 17:11:38 +02:00
Bryan Boreham	542b9ecdbd	tsdb: reduce sleep time when reading WAL (#10859) The code sleeps for a short time to allow goroutines to finish, however it seems the duration can be reduced a lot, speeding up the reading process. I checked using some WAL data from production, and the queue is almost always empty at the time we enter `waitForIdle()` so there is no danger of spinning in the tight loop. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-06-12 11:54:11 +05:30
Ganesh Vernekar	0eb828c179	Add an option to enable overlapping compaction separately with overlapping queries Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-06-09 12:11:42 -07:00
Peter Štibraný	d051065441	Remove use of io/ioutil.	2022-06-09 15:01:34 +02:00
Peter Štibraný	9d51bf50db	Merge upstream Prometheus	2022-06-09 11:29:19 +02:00
Peter Štibraný	55236be04a	Fix comments. (#248)	2022-06-08 10:01:05 +02:00
Bryan Boreham	9f79a6f4b5	tsdb: faster CRC check by avoiding allocations (#10789) Instead of creating a new hashing object every time, call `crc32.Checksum` which computes the answer without allocations. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-06-08 08:00:59 +05:30
Peter Štibraný	1e2d2fb2d8	Job queue (#247) This PR reimplements chan chunkWriteJob with custom buffered queue that should use less memory, because it doesn't preallocate entire buffer for maximum queue size at once. Instead it allocates individual "segments" with smaller size. As elements are added to the queue, they fill individual segments. When elements are removed from the queue (and segments), empty segments can be thrown away. This doesn't change memory usage of the queue when it's full, but should decrease its memory footprint when it's empty (queue will keep max 1 segment in such case).	2022-06-07 17:42:28 +02:00
Mauro Stettler	459f59935c	Reduce chunk write queue memory usage (#131) * don't waste space on the chunkRefMap * add time factor * add comments * better readability * add instrumentation and more comments * formatting * uppercase comments * Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000. * double space Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>	2022-06-02 13:48:30 +02:00
Matej Gera	1dd247f68b	Remote Write: Rename confusing `walDir` parameter to `dir` (#10464) * Rename walDir parameter to dir Signed-off-by: Matej Gera <matejgera@gmail.com> * Improve NewQueueManager comment Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-05-30 21:45:30 -07:00
David Leadbeater	57f4aab27d	Update godoc links and remove note about TSDB versioning (#10754) Signed-off-by: David Leadbeater <dgl@dgl.cx>	2022-05-26 18:34:43 +10:00
maizige	10b677b826	fix typo (#10696) Update doc comment Signed-off-by: gemaizi <864321211@qq.com>	2022-05-25 18:01:45 +02:00
Filip Petkovski	d3cb39044e	Fix typo in symbol table size exceeded error message (#10746) This commit fixes a typo when reporting an error that the symbols table size has been exceeded. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>	2022-05-25 10:40:36 +02:00
Julien Pivotto	6e3a0efe40	Make necessary change to compile promql parser to wasm (#10683) Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2022-05-12 09:12:05 +02:00
Matthias Rampke	78f2645787	test(tsdb): break up repeated test to avoid timeout (#10671) On macOS, the TestTombstoneCleanRetentionLimitsRace performs very poorly. It takes more than a second to write out one block, and as it writes 400 of them, we run into the 10-minute test timeout frequently. While this doesn't fix the actual performance issue, breaking each iteration into a subtest makes the test pass reliably (because each iteration comfortably finishes in under a minute). Related report: https://groups.google.com/g/prometheus-developers/c/jxQ6Ayg6VJ4/m/03H_DS9PDAAJ Signed-off-by: Matthias Rampke <matthias@prometheus.io>	2022-05-09 00:39:26 +02:00
Łukasz Mierzwa	88f9b248b4	Correctly format error message (#10669) Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-05-06 00:42:31 +02:00
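
Commit 9f79a6f4b5 (#10789) above describes replacing a freshly created hashing object with a direct call to `crc32.Checksum`. The following is a minimal sketch of the two styles, not the TSDB code itself: the helper names `checksumWithHasher` and `checksumDirect` are hypothetical, and the choice of the Castagnoli polynomial is an assumption made only for illustration, since the commit message does not name one.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// castagnoliTable is an assumption for this sketch; the commit message
// does not say which polynomial the TSDB uses.
var castagnoliTable = crc32.MakeTable(crc32.Castagnoli)

// checksumWithHasher (hypothetical name) builds a new hash.Hash32 on
// every call, which is the "new hashing object every time" pattern.
func checksumWithHasher(b []byte) uint32 {
	h := crc32.New(castagnoliTable)
	h.Write(b) // Write on a hash.Hash never returns an error.
	return h.Sum32()
}

// checksumDirect (hypothetical name) computes the same value with a
// single call and no intermediate hasher object.
func checksumDirect(b []byte) uint32 {
	return crc32.Checksum(b, castagnoliTable)
}

func main() {
	data := []byte("chunk payload")
	// Both styles must agree on the checksum; only the allocation
	// behaviour is expected to differ.
	fmt.Println(checksumWithHasher(data) == checksumDirect(data))
}
```

Both helpers return the same value for the same input; the difference the commit message points at is only that the direct call avoids constructing a hasher per check.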


580 commits