prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
johncming	a6e18916ab	tsdb: Remove duplicate variables. (#8239 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-11-30 08:55:33 +00:00
Ganesh Vernekar	dff967286e	Set the min time of Head properly after truncation (#8212 ) * Set the min time of Head properly after truncation Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix lint Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Enhance compaction plan logic for completely deleted small block Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-11-25 18:33:30 +05:30
Marco Pracucci	db19e05d93	Add option to customise head chunks write buffer size (#8201 ) * Add option to customise head chunks write buffer size Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed tests Signed-off-by: Marco Pracucci <marco@pracucci.com>	2020-11-19 18:30:47 +05:30
Julien Pivotto	8bc369bf9b	Calculate head chunk size based on actual disk usage (#8139 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-11-03 15:34:59 +05:30
Bartlomiej Plotka	3d8826a3d4	MultiError: Refactored MultiError for more concise and safe usage. (#8066 ) * MultiError: Refactored MultiError for more concise and safe usage. * Less lines * Goland IDE was marking every usage of old MultiError "potential nil" error * It was easy to forget to use Err() when an error was returned, now it's safely assured at compile time. NOTE: Potentially I would rename package to merrors. (: In different PR. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed review comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix after rebase. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-10-28 15:24:58 +00:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00
Bartlomiej Plotka	2fe1e9fa93	Create a checkpoint only at the end of Compact call (#8067 ) * Create a checkpoint only at the end of Compact call Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix Bartek's offline reviews Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Introduce TruncateInMemory and TruncateWAL Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Small enhancements and test fixing attempts Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add TestOneCheckpointPerCompactCall Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Don't truncate WAL on block compaction Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Simplified the algo. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Better protection around calling truncateWAL, truncate WAL on Head compaction error Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-10-19 20:57:08 +05:30
Ling Jin	9145200842	tsdb: fix unknown ref in log (#8048 ) Signed-off-by: JinLingChristopher <jinl1037@hotmail.com>	2020-10-13 20:03:16 +05:30
Arthur Silva Sens	4f45e201cc	Promtool tsdb list now prints block sizes (#7993 ) * promtool tsdb list now prints blocks' size Signed-off-by: arthursens <arthursens2005@gmail.com>	2020-10-12 23:15:40 +02:00
garanews	c38816828f	fix few typo (#8023 ) Signed-off-by: garanews <puntogtg@tiscali.it>	2020-10-07 16:51:31 +01:00
Brian Brazil	073e93c768	Gracefully handle unknown WAL record types. (#8004 ) As we're looking to expand what's in the WAL, having old Prometheus servers ignore the new record types rather than treating them as corruption allows for better upgrade/downgrade paths. Adjust some tests accordingly, so they're still testing what they're meant to test. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-10-05 10:09:59 +01:00
Max Neverov	7e1c27b853	Add tsdb startup duration metric (#7737 ) * Add tsdb wal replay duration metric Signed-off-by: Max Neverov <neverov.max@gmail.com>	2020-09-21 18:25:05 +02:00
Xiaochao Dong	a282d25099	tsdb: remove duplicate values set to reduce memory usage(map overhead) (#7915 ) Signed-off-by: Xiaochao Dong (@damnever) <dxc.wolf@gmail.com>	2020-09-10 20:35:47 +05:30
johncming	75ae384192	tsdb: remove redundant fields. (#7869 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-09-02 17:03:21 +01:00
Ganesh Vernekar	2255b6f62f	Refactor WAL.Segments method to be part of the wal package (#6477 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-09-01 14:46:57 +05:30
Ganesh Vernekar	c806262206	Fix 'chunks.HeadReadWriter: maxt of the files are not set' error (#7856 ) * Fix chunks.HeadReadWriter: maxt of the files are not set Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-08-26 19:59:18 +02:00
Yukun Sun	cfd4e05c9e	fix: return a corruption error when iterator function find a chunk that is out of sequence (#7855 ) Signed-off-by: sunyukun <sunyukun@didiglobal.com> Co-authored-by: sunyukun <sunyukun@didiglobal.com>	2020-08-26 20:36:27 +05:30
Zhou Hao	40ace418d1	fix misspell (#7764 ) Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>	2020-08-07 08:57:25 +01:00
johncming	ac677ed8b3	promql: delete redundant return value. (#7721 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-03 10:45:53 +01:00
Bartlomiej Plotka	e6d7cc5fa4	tsdb: Added ChunkQueryable implementations to db; unified MergeSeriesSets and vertical to single struct. (#7069 ) * tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating. Chained to https://github.com/prometheus/prometheus/pull/7059 * NewMerge(Chunk)Querier now takes multiple primaries allowing tsdb DB code to use it. * Added single SeriesEntry / ChunkEntry for all series implementations. * Unified all vertical, and non vertical for compact and querying to single merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before) * Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples. * Refactored endpoint tests and querier tests to include subtests. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments from Brian and Beorn. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed snapshot test and added chunk iterator support for DBReadOnly. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed race when iterating over Ats first. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed tests. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed populate block tests. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed endpoints test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added test & fixed case of head open chunk. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed DBReadOnly tests and bug producing 1 sample chunks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added cases for partial block overlap for multiple full chunks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added extra tests for chunk meta after compaction. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed small vertical merge bug and added more tests for that. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-07-31 16:03:02 +01:00
Annanay	9bba8a6eae	Merge branch 'master' into appender-context Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:43:18 +05:30
Annanay	89129cd39a	Address comments Signed-off-by: Annanay <annanayagarwal@gmail.com>	2020-07-30 16:41:13 +05:30
Javier Palomo Almena	348ff4285f	tsdb: Replace sync/atomic with uber-go/atomic in tsdb (#7659 ) * tsdb/chunks: Replace sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * tsdb/head: Replace sync/atomic with uber-go/atomic Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * vendor: Make go.uber.org/atomic a direct dependency There are no modifications to go.sum and vendor/ because it was already vendored. Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com> * tsdb: Remove comments referring to the sync/atomic alignment bug Related: https://golang.org/pkg/sync/atomic/#pkg-note-BUG Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>	2020-07-28 10:12:42 +05:30
Julien Pivotto	ffc925dd21	TSDB: Error when we commit/rollback twice (#7593 ) * TSDB: Error when we commit/rollback twice Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-22 11:57:38 +02:00
Krasimir Georgiev	ccab2b30c9	Test no panic after a WAL corruption (#7625 ) * no panic the head memseries has chunks in it Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com> * fix a panic when querying after a wal corruption. Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com> * review nits Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com> * Add test for reading the data after a wal corruption. Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com> Update tsdb/db_test.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Update tsdb/db_test.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com> * spellings Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2020-07-21 12:32:13 +05:30
Krasi Georgiev	d30492cbb0	Avoid panic when the headChunk is nil during isolation. Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>	2020-07-20 18:23:18 +03:00
Ganesh Vernekar	1760c7474c	Replay m-map chunks irrespective of WAL (#7589 ) * Replay m-map chunks irrespective of WAL Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * More logs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-07-16 18:34:08 +05:30
Ganesh Vernekar	ea013343ca	Log when starting to create a checkpoint (#7581 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-07-15 19:15:37 +05:30
Bartlomiej Plotka	823b218e1b	Fixed race between compact (gc, populate) and head append causing unknown symbol error. (#7560 ) * Fixed race between compact (gc, populate) and head append causing unknown symbol error. Fixes https://github.com/prometheus/prometheus/issues/7373 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-07-14 09:36:22 +01:00
Bartlomiej Plotka	492061b24c	Revert "Fix unknown symbol error during head compaction (#7526 )" (#7556 ) This reverts commit `30505a202a`.	2020-07-11 22:37:16 +05:30
Ganesh Vernekar	30505a202a	Fix unknown symbol error during head compaction (#7526 ) * Fix race during head compaction Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Comment out the test Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Skip test instead of commenting it out Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-07-07 17:29:09 +05:30
Harkishen Singh	f32307b656	Increments WAL corruption metric on WAL corruption during checkpointing (#7491 ) * Increments wal corruption metric on error during checkpointing Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com> * check for wal corruption error Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2020-07-05 11:25:42 +05:30
Ganesh Vernekar	082c17b691	Introduce SortedLabelValues/LabelValues to speedup queries for high cardinality (#7448 ) * Introduce LabelValuesUnsorted to speedup queries for high cardinality Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add sort check Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-06-25 14:10:29 +01:00
Peter Štibraný	ff80690a6e	Optimise lowWatermark in Isolation (#7332 ) * Track open appenders in doubly-linked list to make lowWatermark O(1). * Use RW locks. * Added BenchmarkIsolationWithState. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>	2020-06-03 20:09:05 +02:00
Jess G	fdc49fae5b	Added time range parameters to labelNames API (#7288 ) * add time range params to labelNames api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * evaluate min/max time range when reading labels from the head Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add time range params to labelValues api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test, add docs Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add a test for head min max range Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test to match comment Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * address CR comments Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * combine vars only used once Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add time range params to labelNames api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * evaluate min/max time range when reading labels from the head Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add time range params to labelValues api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test, add docs Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add a test for head min max range Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test to match comment Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * address CR comments Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * combine vars only used once Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * restart ci Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * use range expectedLabelNames instead of range actualLabelNames in test Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>	2020-05-30 13:50:09 +01:00
Krasimir Georgiev	09df8d94e0	More explicit chunks and head error handling. (#7277 )	2020-05-22 12:03:23 +03:00
Ganesh Vernekar	1c99adb9fd	Callbacks for lifecycle of series in TSDB (#7159 ) * Callbacks for lifecycle of series in TSDB Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add more comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-20 18:52:08 +05:30
Ganesh Vernekar	d4b9fe801f	M-map full chunks of Head from disk (#6679 ) When appending to the head and a chunk is full, it is flushed to disk and m-mapped (memory mapped) to free up memory. Prometheus startup now happens in these stages: - Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks. - Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series. If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss. [M-mapped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - main difference is that the chunk for m-mapping now also includes the series reference because there is no index for mapping series to chunks. [The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index, which includes the offsets for the chunks in the chunks file - for example, chunks of a series ID have offsets 200, 500, etc. in the chunk files. In case of m-mapped chunks, the offsets are stored in memory and accessed from there. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above, by matching the series ID present in the chunk header and the offset of that chunk in that file. Prombench results _WAL Replay_ 1h WAL replay time: 30% less WAL replay time - 4m31 vs 3m36 2h WAL replay time: 20% less WAL replay time - 8m16 vs 7m _Memory During WAL Replay_ High Churn: 10-15% less RAM - 32gb vs 28gb 20% less RAM after compaction 34gb vs 27gb No Churn: 20-30% less RAM - 23gb vs 18gb 40% less RAM after compaction 32.5gb vs 20gb Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-06 21:00:00 +05:30
Ben Ye	1e4e37144d	Fixed wrongly handled not ready TSDB on web and API. (#7182 ) * fix federate endpoint panic Signed-off-by: yeya24 <yb532204897@gmail.com> * Fixed all cases of not ready TSDB being wrongly handled. * Fixed issue for federation. * Ensured this will never happen again thanks to interfaces * Fixes same issue for stats. * Added tests for readiness. * Fixed bug in stats. It was: status.MaxTime = db.Head().MaxTime() status.MinTime = db.Head().MaxTime() Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-04-29 17:16:14 +01:00
Julien Pivotto	fc3fb3265a	Merge pull request #7145 from prometheus/release-2.17 Backport release 2.17 into master	2020-04-20 14:08:12 +02:00
Julien Pivotto	ed1852ab95	TSDB: Isolation: avoid creating appenderId's without appender (#7135 ) Prior to this commit we could have situations where we are creating an appenderId but never creating an appender to go with it, therefore blocking the low watermark. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-04-17 20:51:03 +02:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
Brian Brazil	cd73b3d33e	Reduce how much old WAL we keep around. (#7098 ) Previously we were keeping up to around 6 hours of WAL around by removing 1/3 every hour. This was excessive, so switch to removing 2/3, which will keep up to around 3 hours of WAL around. This will roughly halve the size of the WAL and halve startup time for those who are I/O bound. This may increase the checkpoint size for those with certain churn patterns, but by much less than we're saving from the segments. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-04-07 15:55:57 +05:30
Julien Pivotto	ceef10cee4	Reset comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-26 00:17:56 +01:00
Julien Pivotto	653f343547	Revert head posting optimization This reverts commit `52630ad0c7`. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-25 20:19:33 +01:00
Julien Pivotto	8907ba6235	Make TSDB use storage errors This fixes #6992, which was introduced by #6777. There was an intermediate component which translated TSDB errors into storage errors, but that component was deleted and this bug went unnoticed until we looked at the Prombench results. Without this, scrape will fail instead of dropping samples or using "Add" when the series have been garbage collected. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-17 22:24:25 +01:00
Julien Pivotto	0f9e78bd88	tsdb: fix races around head chunks (#6985 ) * tsdb: fix races around head chunks Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-16 13:59:22 +01:00
Björn Rabenstein	d80b0810c1	Move crucial actions to defer (#6918 ) With defer having less of a performance penalty, there is no reason not to do those crucial operations via defer. Context: With isolation in place, if we forget to Commit/Rollback, the low watermark will get stuck forever. The current code should not have any bugs, but moving to defer helps to avoid future bugs. This is also moving the `closeAppend` in the `Commit` implementation itself to defer. If logging to the WAL fails, we would have missed the `closeAppend`. Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-13 20:54:47 +01:00
beorn7	f6f4fd6556	tsdb: Do a full rollback upon commit error I think the previous behavior is problematic as it will leave `memSeries` around that still have `pendingCommit` set to `true`. The only case where this can happen in this code path is a failure to write to the WAL, in which case we are probably in trouble anyway. I believe, however, we should still try to do the right thing and do the full rollback. This will implicitly try to write to the WAL again, but this time without samples, which may even succeed. (But we propagate the previous error in any case.) This also adds `a.head.putSeriesBuffer(a.sampleSeries)` to Rollback, which was previously missing. Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-10 14:54:41 +01:00
beorn7	0193b746b1	Defer call to iso.closeAppend This is taken from #6918. Since we probably won't merge #6918 before the release, we have to do this bit of it as it fixes an actual bug (iso.closeAppend is not called if the append fails because of an error logging to the WAL). Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-04 23:33:30 +01:00
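
The WAL retention change described in commit `cd73b3d33e` (dropping two thirds of the old segments at each truncation instead of one third) comes down to simple index arithmetic. The sketch below is only an illustration of that arithmetic, not the Prometheus implementation: the helper name `checkpointCutoff`, its parameters, and the segment numbering are all assumptions made for this example.

```go
package main

import "fmt"

// checkpointCutoff is a hypothetical helper (not a Prometheus API).
// Given the first and last WAL segment indices that are candidates for
// truncation, it returns the index up to which segments would be dropped
// when removing the fraction num/den of that range.
func checkpointCutoff(first, last, num, den int) int {
	// Nothing to drop for an empty range or an invalid fraction.
	if last <= first || den <= 0 {
		return first
	}
	// Integer arithmetic: multiply before dividing to limit rounding loss.
	return first + (last-first)*num/den
}

func main() {
	first, last := 0, 9

	// Old behaviour described in the commit message: remove about 1/3.
	fmt.Println("cutoff at 1/3:", checkpointCutoff(first, last, 1, 3))

	// New behaviour described in the commit message: remove about 2/3.
	fmt.Println("cutoff at 2/3:", checkpointCutoff(first, last, 2, 3))
}
```

With a larger fraction removed on each pass, fewer old segments survive between truncations, which is consistent with the commit message's claim that the WAL size and I/O-bound startup time are roughly halved.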


79 commits