prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Krasi Georgiev	9a75b5f84b	Avoid panic when the headChunk is nil during isolation. Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>	2020-07-24 14:32:17 +05:30
Ganesh Vernekar	48fae12b89	Fix unsequential m-map files (#7414 ) * Fix unsequential m-map files Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-06-18 19:24:58 +05:30
Ganesh Vernekar	1627d234da	Moves the atomically accessed member to the top of the struct (#7365 ) * Moves the 64bit atomically accessed field to the top of the struct. Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com> * Moves the 64bit atomically accessed field to the top of the struct. Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com> * Fixing up go fmt formatting issues. Signed-off-by: Bryan Varner <1652015+bvarner@users.noreply.github.com> Co-authored-by: Bryan Varner <1652015+bvarner@users.noreply.github.com>	2020-06-09 10:55:43 +05:30
Peter Štibraný	ff80690a6e	Optimise lowWatermark in Isolation (#7332 ) * Track open appenders in doubly-linked list to make lowWatermark O(1). * Use RW locks. * Added BenchmarkIsolationWithState. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>	2020-06-03 20:09:05 +02:00
Jess G	fdc49fae5b	Added time range parameters to labelNames API (#7288 ) * add time range params to labelNames api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * evaluate min/max time range when reading labels from the head Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add time range params to labelValues api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test, add docs Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add a test for head min max range Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test to match comment Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * address CR comments Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * combine vars only used once Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add time range params to labelNames api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * evaluate min/max time range when reading labels from the head Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add time range params to labelValues api Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test, add docs Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * add a test for head min max range Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test to match comment Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * address CR comments Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * combine vars only used once Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * fix test Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * restart ci Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com> * use range expectedLabelNames instead of range actualLabelNames in test Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>	2020-05-30 13:50:09 +01:00
Ganesh Vernekar	a1355eb7c7	Remove time based m-map file creation (#7314 ) * Remove time based m-map file creation Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-29 20:08:41 +05:30
Ganesh Vernekar	83619aa9ac	Preallocate m-map file only for Windows (#7306 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-28 20:24:19 +05:30
Guangwen Feng	2393d6137b	Add unit test case for func Type in record.go (#7082 ) Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>	2020-05-27 12:08:33 +05:30
Krasimir Georgiev	f4dd45609a	Use min and maxt of the range head when creating a block (#7282 ) Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>	2020-05-22 17:00:06 +05:30
Krasimir Georgiev	09df8d94e0	More explicit chunks and head error handling. (#7277 )	2020-05-22 12:03:23 +03:00
Ganesh Vernekar	1c99adb9fd	Callbacks for lifecycle of series in TSDB (#7159 ) * Callbacks for lifecycle of series in TSDB Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add more comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-20 18:52:08 +05:30
Ganesh Vernekar	d4b9fe801f	M-map full chunks of Head from disk (#6679 ) When appending to the head and a chunk is full, it is flushed to disk and m-mapped (memory-mapped) to free up memory. Prometheus startup now happens in these stages: - Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks. - Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series. If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss. [M-mapped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - the main difference is that the chunk for m-mapping now also includes the series reference, because there is no index for mapping series to chunks. [The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index, which includes the offsets for the chunks in the chunks file - for example, chunks of series ID have offsets 200, 500 etc. in the chunk files. In the case of m-mapped chunks, the offsets are stored in memory and accessed from there. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above, matching the series ID present in the chunk header with the offset of that chunk in that file. 
Prombench results _WAL Replay_ 1h WAL replay time: 30% less WAL replay time - 4m31 vs 3m36. 2h WAL replay time: 20% less WAL replay time - 8m16 vs 7m. _Memory During WAL Replay_ High Churn: 10-15% less RAM - 32gb vs 28gb; 20% less RAM after compaction - 34gb vs 27gb. No Churn: 20-30% less RAM - 23gb vs 18gb; 40% less RAM after compaction - 32.5gb vs 20gb. Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-06 21:00:00 +05:30
Bartlomiej Plotka	532f7bbac9	Merge pull request #7204 from prometheus/release-2.18 [Merge Without Squash] Merge release-2.18 back to master.	2020-05-05 18:58:45 +01:00
Ben Ye	1e4e37144d	Fixed wrongly handled not ready TSDB on web and API. (#7182 ) * fix federate endpoint panic Signed-off-by: yeya24 <yb532204897@gmail.com> * Fixed all cases of not ready TSDB being wrongly handled. * Fixed issue for federation. * Ensured this will never happen again thanks to interfaces * Fixes same issue for stats. * Added tests for readiness. * Fixed bug in stats. It was: status.MaxTime = db.Head().MaxTime() status.MinTime = db.Head().MaxTime() Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-04-29 17:16:14 +01:00
ga	05038b48bd	Goroutine: Fix ambiguous variable (#7175 ) Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>	2020-04-28 11:02:26 +01:00
Goutham Veeramachaneni	84b4d079c8	Make sure deleted intervals are excluded from Seek (#6980 ) Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2020-04-23 10:00:30 +01:00
Julien Pivotto	fc3fb3265a	Merge pull request #7145 from prometheus/release-2.17 Backport release 2.17 into master	2020-04-20 14:08:12 +02:00
Julien Pivotto	ed1852ab95	TSDB: Isolation: avoid creating appenderId's without appender (#7135 ) Prior to this commit we could have situations where we are creating an appenderId but never creating an appender to go with it, therefore blocking the low watermark. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-04-17 20:51:03 +02:00
ZouYu	2b7437d60e	Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109 ) Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>	2020-04-15 11:17:41 +01:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
Brian Brazil	cd73b3d33e	Reduce how much old WAL we keep around. (#7098 ) Previously we were keeping up to around 6 hours of WAL around by removing 1/3 every hours. This was excessive, so switch to removing 2/3, which will keep up to around 3 hours of WAL around. This will roughly halve the size of the WAL and halve startup time for those who are I/O bound. This may increase the checkpoint size for those with certain churn patterns, but by much less than we're saving from the segments. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-04-07 15:55:57 +05:30
Brad Walker	3348930df5	Replace fileutil.ReadDir with ioutil.ReadDir (#7029 ) (#7033 ) * tsdb: Replace fileutil.ReadDir with ioutil.ReadDir (#7029) Signed-off-by: Brad Walker <brad@bradmwalker.com> * tsdb: Remove fileutil.ReadDir (#7029) Signed-off-by: Brad Walker <brad@bradmwalker.com>	2020-04-06 19:04:20 +05:30
MengZeLee	a7982ffc0f	Fix typo (#7068 ) Fix typo. Signed-off-by: MengZn <adnt587@gmail.com>	2020-03-30 13:18:34 +05:30
Brian Brazil	7646cbca32	Use .UTC everywhere we use time.Unix (#7066 ) time.Unix attaches the local timezone, which can then leak out (e.g. in the alert json). While this is harmless, we should be consistent. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-03-29 17:35:39 +01:00
Julien Pivotto	9057decce2	Merge pull request #7060 from prometheus/release-2.17 Release 2.17	2020-03-27 15:57:07 +01:00
Julien Pivotto	ceef10cee4	Reset comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-26 00:17:56 +01:00
Julien Pivotto	73228b1b68	Those links should not be reverted Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-25 20:37:26 +01:00
Julien Pivotto	653f343547	Revert head posting optimization This reverts commit `52630ad0c7`. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-25 20:19:33 +01:00
Bartlomiej Plotka	d5c33877f9	storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation. (#7005 ) * storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation. ## Rationales: In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples. This means that we unnecessary decode/encode, wasting CPU, time and memory. This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers. All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos). ## Changes * Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that Storage will implement both chunked and samples. * Added Seek to chunks.Iterator interface for iterating over chunks. * NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added. * Improved tests. * Added some TODO for further simplifications in next PRs. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Moved s/Labeled/SeriesLabels as per Krasi suggestion. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Krasi's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Second iteration of Krasi comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Another round of comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-24 20:15:47 +00:00
Ben Kochie	fac7a4a050	Merge pull request #7037 from prometheus/bjk/golint Enable golint in CI	2020-03-24 09:20:08 +01:00
Ben Kochie	269e7c8091	Fix golint issues. Signed-off-by: Ben Kochie <superq@gmail.com>	2020-03-23 20:38:43 +01:00
Ganesh Vernekar	6fdc852813	Fix TestHeadDeleteSimple to test reloaded Head too (#7021 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-03-23 16:55:25 +02:00
Ganesh Vernekar	e64a149984	Close Head in DBReadOnly.FlushWAL (#7022 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-03-23 14:49:44 +05:30
zhulongcheng	e813f60fd6	tsdb: fix sequence check for WAL segments (#7032 ) Signed-off-by: zhulongcheng <zhulongcheng.dev@gmail.com>	2020-03-23 13:16:28 +05:30
zhulongcheng	dbb8f5861d	tsdb: add tombstonesHeaderSize constant (#7028 ) Signed-off-by: zhulongcheng <zhulongcheng.dev@gmail.com>	2020-03-22 12:59:35 +05:30
Julien Pivotto	f1984bb007	Merge pull request #7025 from prometheus/release-2.17 Merge release 2.17 into master	2020-03-21 21:39:07 +01:00
beorn7	526cff39b9	Fix tests that were broken by #7009 Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-20 21:22:58 +01:00
Bartlomiej Plotka	c4eefd1b3a	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet which rely on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for not good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-20 21:14:43 +01:00
Callum Styan	f802f1e8ca	Fix bug with WAL watcher and Live Reader metrics usage. (#6998 ) * Fix bug with WAL watcher and Live Reader metrics usage. Calling NewXMetrics when creating a Watcher or LiveReader results in a registration error, which we're ignoring, and as a result, for every Watcher/Reader other than the first one created, we had no metrics for either. So we would only have metrics like Watcher Records Read for the first remote write config in a user's config file. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2020-03-20 17:34:15 +01:00
Bartlomiej Plotka	8fa4ada9ae	Merge pull request #7010 from prometheus/beorn7/fix-test Fix tests that were broken by #7009	2020-03-19 17:41:02 +00:00
Ganesh Vernekar	e50fdbc70c	Live m-mapping of chunks on disk (#6830 ) * Live m-mapping of chunks on disk Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Part 2 Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Part 3 Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Part 4 Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Attempt to fix windows bug Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-03-19 22:03:44 +05:30
beorn7	c0ecbb38af	Fix tests that were broken by #7009 Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-19 16:28:23 +01:00
Björn Rabenstein	1da83305be	Merge pull request #7009 from prometheus/release-2.17 Merge release-2.17 into master	2020-03-19 13:46:28 +01:00
zhulongcheng	5f5c7a4477	tsdb: sort checkpoints by segment number (#6987 ) Signed-off-by: zhulongcheng <zhulongcheng.dev@gmail.com>	2020-03-18 20:40:41 +05:30
Julien Pivotto	8907ba6235	Make TSDB use storage errors This fixes #6992, which was introduced by #6777. There was an intermediate component which translated TSDB errors into storage errors, but that component was deleted and this bug went unnoticed until we looked at the Prombench results. Without this, scrape will fail instead of dropping samples or using "Add" when the series have been garbage collected. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-17 22:24:25 +01:00
Julien Pivotto	0f9e78bd88	tsdb: fix races around head chunks (#6985 ) * tsdb: fix races around head chunks Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-03-16 13:59:22 +01:00
Björn Rabenstein	d80b0810c1	Move crucial actions to defer (#6918 ) With defer having less of a performance penalty, there is no reason not to do those crucial operations via defer. Context: With isolation in place, if we forget to Commit/Rollback, the low watermark will get stuck forever. The current code should not have any bugs, but moving to defer helps to avoid future bugs. This is also moving the `closeAppend` in the `Commit` implementation itself to defer. If logging to the WAL fails, we would have missed the `closeAppend`. Signed-off-by: beorn7 <beorn@grafana.com>	2020-03-13 20:54:47 +01:00
Bartlomiej Plotka	9d9c45588e	Addressed Goutham's review. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-13 19:18:31 +00:00
Bartlomiej Plotka	cd9516316a	Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-13 14:37:47 +00:00
Bartlomiej Plotka	fe802f29c9	storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response. This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet which rely on sorting. I found during work on https://github.com/prometheus/prometheus/pull/5882 that we do so many repetitions because of this, for not good reason. I think I found a good balance between convenience and readability with just one method. Smaller the interface = better. Also I don't know what TestSelectSorted was testing, but now it's testing sorting. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-03-13 13:06:25 +00:00


170 commits