prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-27 22:49:40 -08:00

Author	SHA1	Message	Date
Fiona Liao	5bee0cfce2	Change `ChunkReader.Chunk()` to `ChunkOrIterable()` The ChunkReader interface's Chunk() has been changed to ChunkOrIterable(). This is a precursor to OOO native histogram support - with OOO native histograms, the chunks.Meta passed to Chunk() can result in multiple chunks being returned rather than just a single chunk (e.g. if oooMergedChunk has a counter reset in the middle). To support this, ChunkOrIterable() requires either a single chunk or an iterable to be returned. If an iterable is returned, the caller has the responsibility of converting the samples from the iterable into possibly multiple chunks. The OOOHeadChunkReader now returns an iterable rather than a chunk to prepare for the native histograms case. Also as a beneficial side effect, oooMergedChunk and boundedChunk have been simplified as they only need to implement the Iterable interface now, not the full Chunk interface. --------- Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>	2023-11-28 11:14:29 +01:00
Charles Korn	59844498f7	Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks (#13115) * Add failing test. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Don't run OOO head garbage collection while reads are running. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Add further test cases for different order of operations. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Ensure all queriers are closed if `DB.blockChunkQuerierForRange()` fails. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Ensure all queriers are closed if `DB.Querier()` fails. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Invert error handling in `DB.Querier()` and `DB.blockChunkQuerierForRange()` to make it clearer Signed-off-by: Charles Korn <charles.korn@grafana.com> * Ensure that queries that touch OOO data can't block OOO head garbage collection forever. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Address PR feedback: fix parameter name in comment Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com> Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com> * Address PR feedback: use `lastGarbageCollectedMmapRef` Signed-off-by: Charles Korn <charles.korn@grafana.com> * Address PR feedback: ensure pending reads are cleaned up if creating an OOO querier fails Signed-off-by: Charles Korn <charles.korn@grafana.com> --------- Signed-off-by: Charles Korn <charles.korn@grafana.com> Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com> Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>	2023-11-24 12:38:38 +01:00
Matthieu MOREL	dd8871379a	Replace errors.Errorf with fmt.Errorf Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-11-14 13:04:31 +00:00
Linas Medziunas	1cd6c1cde5	ValidateHistogram: strict Count check in absence of NaNs Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>	2023-11-03 16:17:24 +02:00
Björn Rabenstein	a43669e611	Merge pull request #12928 from alexandear/ci-enable-godot ci(lint): enable godot; append dot at the end of comments	2023-11-01 17:15:41 +01:00
Julien Pivotto	f568221610	Merge pull request #13057 from prometheus/release-2.48 Merge release-2.48 back into main	2023-10-31 15:24:39 -04:00
Oleksandr Redko	fa90ca46e5	ci(lint): enable godot; append dot at the end of comments Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>	2023-10-31 19:53:38 +02:00
zenador	80e977aae6	Remove `NewPossibleNonCounterInfo` and minimise creating empty annotations (#13012) * Remove NewPossibleNonCounterInfo until it can be made more efficient, and avoid creating empty annotations as much as possible Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-10-24 17:36:07 +01:00
Márcio Carôso	dff1c395f6	Expose --storage.tsdb.retention.time in metric prometheus_tsdb_retention_limit_seconds (#12986) * Expose --storage.tsdb.retention.time in a metric Signed-off-by: Marcio Caroso <msscaroso@gmail.com> --------- Signed-off-by: Marcio Caroso <msscaroso@gmail.com>	2023-10-24 13:34:42 +02:00
Ganesh Vernekar	4df2f2432b	Additionally wrap WBL replay error (#12406) * Additionally wrap WBL replay error Although WBL replay is already wrapped with errLoadWbl, there are other errors that can happen during a WBL replay. We should not try to repair WAL in those cases. This commit additionally wraps the final error in Head.Init again with errLoadWbl so that WBL replay errors can be identified properly. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com> Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com> Signed-off-by: Levi Harrison <git@leviharrison.dev>	2023-10-15 13:47:42 -04:00
Ganesh Vernekar	f5913266a1	Additionally wrap WBL replay error (#12406) * Additionally wrap WBL replay error Although WBL replay is already wrapped with errLoadWbl, there are other errors that can happen during a WBL replay. We should not try to repair WAL in those cases. This commit additionally wraps the final error in Head.Init again with errLoadWbl so that WBL replay errors can be identified properly. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com> Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>	2023-10-13 14:21:35 +02:00
zenador	69edd8709b	Add warnings (and annotations) to PromQL query results (#12152) Return annotations (warnings and infos) from PromQL queries This generalizes the warnings we have already used before (but only for problems with remote read) as "annotations". Annotations can be warnings or infos (the latter could be false positives). We do not treat them differently in the API for now and return them all as "warnings". It would be easy to distinguish them and return infos separately, should that appear useful in the future. The new annotations are then used to create a lot of warnings or infos during PromQL evaluations. Partially these are things we have wanted for a long time (e.g. inform the user that they have applied `rate` to a metric that doesn't look like a counter), but the new native histograms have created even more needs for those annotations (e.g. if a query tries to aggregate float numbers with histograms). The annotations added here are not yet complete. A prominent example would be a warning about a range too short for a rate calculation. But such a warning is trickier to create with good fidelity and we will tackle it later. Another TODO is to take annotations into account when evaluating recording rules. --------- Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-09-14 18:57:31 +02:00
Arve Knudsen	156222cc50	Add context argument to LabelQuerier.LabelValues (#12665) Add context argument to LabelQuerier.LabelValues and LabelQuerier.SortedLabelValues. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-14 16:02:04 +02:00
Arve Knudsen	a964349e97	Add context argument to LabelQuerier.LabelNames (#12666) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-14 10:39:51 +02:00
Arve Knudsen	4451ba10b4	Add context argument to IndexReader.Postings (#12667) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-13 17:45:06 +02:00
Arve Knudsen	6ef9ed0bc3	Add context argument to DB.Delete (#12834) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-13 15:43:06 +02:00
Arve Knudsen	6daee89e5f	Add context argument to Querier.Select (#12660) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-12 12:37:38 +02:00
Justin Lei	8ef7dfdeeb	Add a chunk size limit in bytes (#12054) Add a chunk size limit in bytes This creates a hard cap for XOR chunks of 1024 bytes. The limit for histogram chunk is also 1024 bytes, but it is a soft limit as a histogram has a dynamic size, and even a single one could be larger than 1024 bytes. This also avoids cutting new histogram chunks if the existing chunk has fewer than 10 histograms yet. In that way, we are accepting "jumbo chunks" in order to have at least 10 histograms in a chunk, allowing compression to kick in. Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-08-24 15:21:17 +02:00
beorn7	aa82fe198f	tsdb: Fix histogram validation So far, `ValidateHistogram` would not detect if the count did not include the count in the zero bucket. This commit fixes the problem and updates all the tests that have been undetected offenders so far. Note that this problem would only ever create false negatives, so we never falsely refused to store a histogram because of it. On the other hand, `ValidateFloatHistogram` has been too strict with the count being at least as large as the sum of the counts in all the buckets. Float precision issues could create false positives here (see products of PromQL evaluations). It's actually quite hard to put an upper limit on the floating point imprecision. Users could produce the weirdest expressions, maxing out float precision problems. Therefore, this commit simply removes that particular check from `ValidateFloatHistogram`. Signed-off-by: beorn7 <beorn@grafana.com>	2023-08-22 23:04:01 +02:00
Julien Pivotto	e3fabd5fdf	Merge pull request #12664 from prometheus/superq/cleanup_chunk_snapshots Cleanup temporary chunk snapshot dirs	2023-08-08 13:02:39 +02:00
SuperQ	8d38d59fc5	Cleanup temporary chunk snapshot dirs Similar to cleanup of WAL files on startup, cleanup temporary chunk_snapshot dirs. This prevents storage space leaks due to terminated snapshots on shutdown. Signed-off-by: SuperQ <superq@gmail.com>	2023-08-08 09:43:48 +02:00
Łukasz Mierzwa	3c80963e81	Use a linked list for memSeries.headChunk (#11818) Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks. When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call. If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending our sample to it. Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed. When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes. Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait for it to be mmapped. If there are enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting queries and scrapes. Queries might time out, since by default they have a 2 minute timeout set. Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything. To avoid this we need to remove mmapping from append path, since mmapping is blocking. But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later. This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples, while older, yet to be mmapped, chunks are linked to it. Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2023-07-31 11:10:24 +02:00
György Krajcsovits	d4e355243a	tsdbutil/ChunkFromSamplesGeneric should not panic Add error handling instead. Prepares for #12352 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-07-20 17:01:34 +02:00
Justin Lei	32d87282ad	Add Zstandard compression option for wlog (#11666) Snappy remains as the default compression but there is now a flag to switch the compression algorithm. Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-07-11 14:57:57 +02:00
Nidhey Nitin Indurkar	a8772a4178	Feat: Get block by id directly on promtool analyze & get latest block if ID not provided (#12031) * feat: analyze latest block or block by ID in CLI (promtool) Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io> * address remarks Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io> * address latest review comments Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io> --------- Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io> Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>	2023-06-01 17:13:09 +05:30
Matthieu MOREL	bae9a21200	Merge branch 'main' into linter/nilerr Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-19 19:56:39 +02:00
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golint-ci We haven't updated golint-ci in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset about the "empty block" warning to include `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, though, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:10:10 +02:00
Matthieu MOREL	fb3eb21230	enable gocritic, unconvert and unused linters Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-13 19:20:22 +00:00
beorn7	817a2396cb	Name float values as "floats", not as "values" In the past, every sample value was a float, so it was fine to call a variable holding such a float "value" or "sample". With native histograms, a sample might have a histogram value. And a histogram value is still a value. Calling a float value just "value" or "sample" or "V" is therefore misleading. Over the last few commits, I already renamed many variables, but this cleans up a few more places where the changes are more invasive. Note that we do not attempt to change the naming in the JSON APIs or in the protobufs. That would be quite a disruption. However, internally, we can call variables as we want, and we should go with the option of avoiding misunderstandings. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:24 +02:00
Ganesh Vernekar	e709b0b36e	Merge pull request #12127 from codesome/ooo-mmap-replay Update OOO min/max time properly after replaying m-map chunks	2023-04-04 12:05:57 +05:30
Alex Le	1936868e9d	Allow populate block logic in compact to be overridden outside Prometheus (#11711) Signed-off-by: Alex Le <leqiyue@amazon.com> Signed-off-by: Alex Le <emoc1989@gmail.com>	2023-04-04 12:01:49 +05:30
Ganesh Vernekar	58a8d526e8	Merge pull request #11992 from codesome/no-reencode-chunk Do not re-encode head chunk for ChunkQuerier	2023-03-15 18:30:38 +05:30
Ganesh Vernekar	0a3f203c63	Update tests to not assume the chunk implementation Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-03-15 17:58:37 +05:30
Ganesh Vernekar	0c0c2af7f5	Do not re-encode head chunk in ChunkQuerier Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-03-15 17:58:01 +05:30
Ganesh Vernekar	1c3f1216b3	tsdb: Test querying after missing wbl with snapshots enabled If the snapshot was enabled with some ooo mmap chunks on disk, and wbl was removed between restarts, then we should still be able to query the ooo mmap chunks after a restart. This test shows that we are not able to query those ooo mmap chunks after a restart under this situation. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-03-13 13:14:00 +05:30
Jesus Vazquez	5c3f058755	Add unit test and also protect truncateOOO Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2023-02-10 15:18:17 +01:00
beorn7	1cfc8f65a3	histograms: Return actually useful counter reset hints This is a bit more conservative than we could be. As long as a chunk isn't the first in a block, we can be pretty sure that the previous chunk won't disappear. However, the incremental gain of returning NotCounterReset in these cases is probably very small and might not be worth the code complications. With this, we now also pay attention to an explicitly set counter reset during ingestion. While the case doesn't show up in practice yet, there could be scenarios where the metric source knows there was a counter reset even if it might not be visible from the values in the histogram. It is also useful for testing. Signed-off-by: beorn7 <beorn@grafana.com>	2023-01-25 16:57:21 +01:00
Ganesh Vernekar	38fa151a7c	tsdb: Only initialise out-of-order fields when required Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-01-12 20:29:16 +05:30
Marc Tudurí	9474610baf	Support FloatHistogram in TSDB (#11522) Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, WAL replay, for this new type of histogram. Signed-off-by: Marc Tudurí <marctc@protonmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-12-28 14:25:07 +05:30
Bryan Boreham	10b27dfb84	Simplify IndexReader.Series interface Instead of passing in a `ScratchBuilder` and `Labels`, just pass the builder and the caller can extract labels from it. In many cases the caller didn't use the Labels value anyway. Now in `Labels.ScratchBuilder` we need a slightly different API: one to assign what will be the result, instead of overwriting some other `Labels`. This is safer and easier to reason about. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-19 15:22:09 +00:00
Bryan Boreham	4b6a4d1425	Update package tsdb tests for new labels.Labels type Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-19 15:22:09 +00:00
Bryan Boreham	d3d96ec887	tsdb/index: use ScratchBuilder to create Labels This necessitates a change to the `tsdb.IndexReader` interface: `index.Reader` is used from multiple goroutines concurrently, so we can't have state in it. We do retain a `ScratchBuilder` in `blockBaseSeriesSet` which is iterator-like. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-19 15:22:09 +00:00
Bryan Boreham	3c7de69059	storage: allow re-use of iterators Patterned after `Chunk.Iterator()`: pass the old iterator in so it can be re-used to avoid allocating a new object. (This commit does not do any re-use; it is just changing all the method signatures so re-use is possible in later commits.) Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-15 18:32:45 +00:00
Ganesh Vernekar	d0e683e26d	Add TestCompactHeadWithDeletion to test compaction failure after deletion Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-11-23 17:31:18 +05:30
Ganesh Vernekar	648be89822	Merge remote-tracking branch 'upstream/main' into fix-conflict Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-12 14:20:02 +05:30
Ganesh Vernekar	8e29110949	Add/Improve unit tests for compaction with histogram (#11342) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-12 13:31:12 +05:30
Signed-off-by: Jesus Vazquez	3362bf6d79	Fix merge conflicts Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-11 22:53:37 +05:30
Jesus Vazquez	775d90d5f8	TSDB: Rename wal package to wlog (#11352) The wlog.WL type can now be used to create a Write Ahead Log or a Write Behind Log. Before the prefix for wbl metrics was 'prometheus_tsdb_out_of_order_wal_' and has been replaced with 'prometheus_tsdb_out_of_order_wbl_'. Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2022-10-10 20:38:46 +05:30
Jesus Vazquez	e934d0f011	Merge 'main' into sparsehistogram Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-10-05 22:14:49 +02:00
Ganesh Vernekar	f34aeefe6e	Allow overlapping blocks by default (#11331) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-09-28 19:17:54 +05:30
Jesus Vazquez	c1b669bf9b	Add out-of-order sample support to the TSDB (#11075) * Introduce out-of-order TSDB support This implementation is based on this design doc: https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing This commit adds support to accept out-of-order ("OOO") samples into the TSDB up to a configurable time allowance. If OOO is enabled, overlapping queries are automatically enabled. Most of the additions have been borrowed from https://github.com/grafana/mimir-prometheus/ Here is the list of the original commits cherry-picked from mimir-prometheus into this branch: - `4b2198d7ec` - `2836e5513f` - `00b379c3a5` - `ff0dc75758` - `a632c73352` - `c6f3d4ab33` - `5e8406a1d4` - `abde1e0ba1` - `e70e769889` - `df59320886` Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Dieter Plaetinck <dieter@grafana.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * gofumpt files Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Add license header to missing files Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix OOO tests due to existing chunk disk mapper implementation Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix truncate int overflow Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Add Sync method to the WAL and update tests Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * remove useless sync Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Update minOOOTime after truncating Head * Update minOOOTime after truncating Head Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix lint Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Add a unit test Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Load OutOfOrderTimeWindow only once per appender Signed-off-by: Jesus Vazquez 
<jesus.vazquez@grafana.com> * Fix OOO Head LabelValues and PostingsForMatchers Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix replay of OOO mmap chunks Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Remove unnecessary err check Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Prevent panic with ApplyConfig Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Run OOO compaction after restart if there is OOO data from WBL Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Apply Bartek's suggestions Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Refactor OOO compaction Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Address comments and TODOs - Added a comment explaining why we need the allow overlapping compaction toggle - Clarified TSDBConfig OutOfOrderTimeWindow doc - Added an owner to all the TODOs in the code Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Run go format Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix remaining review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix tests Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix TestWBLAndMmapReplay test failure on windows Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Address most of the feedback Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Refactor the block meta for out of order Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix windows error Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> 
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2022-09-20 22:35:50 +05:30
Ganesh Vernekar	2474c6fb2c	Error on amending histograms on append (#11308) * Error on amending histograms on append Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Rename Matches to Equals Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-09-19 13:10:30 +05:30
Ganesh Vernekar	d354f20c2a	Add a feature flag to control native histogram ingestion (#11253) * Add runtime config to control native histogram ingestion Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Make the config into a CLI flag Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-09-14 17:38:34 +05:30
Bryan Boreham	176fa38e76	tsdb: in tests use labels.FromStrings Replacing code which assumes the internal structure of `Labels`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-09-09 13:34:49 +02:00
Ganesh Vernekar	d209a29a5b	Add unit test for histogram append and various querying scenarios (#11194) * Add unit test for histogram append and various querying scenarios Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * make lint happy Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix tests Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-08-29 15:35:03 +05:30
Łukasz Mierzwa	3196c98bc2	Reduce memSeries memory usage by decoupling metadata (#11152) Metadata was added recently but doesn't seem to be used much, at least as far as I could identify. Yet it's part of memSeries struct and so even when empty takes 48 bytes, which is a lot given that without it memSeries requires 224 bytes. This change turns it into a pointer on the struct, that gets set only when metadata is actually set for a given series. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-08-17 15:32:28 +05:30
beorn7	c9fd3c235d	Merge branch 'main' into sparsehistogram	2022-08-10 17:54:37 +02:00
Paschalis Tsilias	d1122e0743	Introduce TSDB changes for appending metadata to the WAL (#10972) * Append metadata to the WAL Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra whitespace; Reword some docstrings and comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use RLock() for hasNewMetadata check Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use single byte for metric type in RefMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update proposed WAL format for single-byte type metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement MetadataAppender interface for the Agent Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of review comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Amend description of metadata in wal.md Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Correct key used to retrieve metadata from cache When we're setting metadata entries in the scrapeCache, we're using the p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and use it as the cache key. When checking for cache entries though, we used p.Series() as the key, which included the metric name _with_ its labels. That meant that we were never actually hitting the cache. We're fixing this by utilizing the __name__ internal label for correctly getting the cache entries after they've been set by setHelp(), setType() or setUnit(). 
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Put feature behind a feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix AppendMetadata docstring Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reorder WAL format document Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Change error message of AppendMetadata; Fix access of s.meta in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reuse temporary buffer in Metadata encoder Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Only keep latest metadata for each refID during checkpointing Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix test that's referencing decoding metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Avoid creating metadata block if no new metadata are present Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add tests for corrupt metadata block and relevant record type Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix CR comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Extract logic about changing metadata in an anonymous function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement new proposed WAL format and amend relevant tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use 'const' for metadata field names Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Apply metadata to head memSeries in Commit, not in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add docstring and rename extracted helper in scrape.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add tests for tsdb-related cases Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linter issues vol1 Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linter issues vol2 Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix Windows test by closing WAL reader files Signed-off-by: 
Paschalis Tsilias <paschalist0@gmail.com> * Use switch instead of two if statements in metadata decoding Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments around TestMetadata* tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add code for replaying WAL; test correctness of in-memory data after a replay Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove scrape-loop related code from PR Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify tests by sorting slices before comparison Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix test to use separate transactions Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Empty out buffer and record slices after encoding latest metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linting issue Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update calculation for DroppedMetadata metric Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rename MetadataAppender interface and AppendMetadata method to MetadataUpdater/UpdateMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reuse buffer when encoding latest metadata for each series Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments; Check all returned error values using two helpers Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify use of helpers Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Satisfy linter Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>	2022-07-19 10:58:52 +02:00
beorn7	40ad5e284a	Merge branch 'main' into beorn7/sparsehistogram	2022-06-09 20:50:30 +02:00
Matthias Rampke	78f2645787	test(tsdb): break up repeated test to avoid timeout (#10671 ) On macOS, the TestTombstoneCleanRetentionLimitsRace performs very poorly. It takes more than a second to write out one block, and as it writes 400 of them, we run into the 10-minute test timeout frequently. While this doesn't fix the actual performance issue, breaking each iteration into a subtest makes the test pass reliably (because each iteration comfortably finishes in under a minute). Related report: https://groups.google.com/g/prometheus-developers/c/jxQ6Ayg6VJ4/m/03H_DS9PDAAJ Signed-off-by: Matthias Rampke <matthias@prometheus.io>	2022-05-09 00:39:26 +02:00
beorn7	3bc711e333	Merge branch 'main' into sparsehistogram	2022-05-04 13:37:13 +02:00
Matthieu MOREL	e2ede285a2	refactor: move from io/ioutil to io and os packages (#10528 ) * refactor: move from io/ioutil to io and os packages * use fs.DirEntry instead of os.FileInfo after os.ReadDir Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>	2022-04-27 11:24:36 +02:00
beorn7	7ee1836ef5	Merge branch 'main' into sparsehistogram	2022-04-05 18:31:19 +02:00
Howie	1291ec7185	deleting .tmp WAL files on startup (#10317 ) fix issue #10245 Signed-off-by: lihaowei <haoweili35@gmail.com> * minor changes Signed-off-by: lihaowei <haoweili35@gmail.com> * review changes Signed-off-by: lihaowei <haoweili35@gmail.com> * minor changes Signed-off-by: lihaowei <haoweili35@gmail.com>	2022-03-24 16:14:14 +05:30
beorn7	4210aac74a	Merge branch 'main' into sparsehistogram	2022-03-22 14:47:42 +01:00
Björn Rabenstein	d1edb006c1	Merge pull request #10341 from prometheus/release-2.33 Merge release-2.33 forward into main	2022-02-22 22:51:05 +01:00
Ganesh Vernekar	24827782cb	Fix panics when m-mapping head chunks (#10316 ) * Fix panics when m-mapping head chunks Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix reviews Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-02-22 20:35:15 +05:30
Eng Zer Jun	3e67654d37	refactor: use `T.TempDir()` and `B.TempDir` to create temporary directory The directory created by `T.TempDir()` and `B.TempDir()` is automatically removed when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.TempDir Reference: https://pkg.go.dev/testing#B.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-01-22 18:57:30 +08:00
Ganesh Vernekar	129ed4ec8b	Fix Example() function in TSDB (#10153 ) * Fix Example() function in TSDB Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix tests Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-01-11 17:24:03 +05:30
Mauro Stettler	0df3489275	Write chunks via queue, predicting the refs (#10051 ) * Write chunks via queue, predicting the refs Our load tests have shown that there is a latency spike in the remote write handler whenever the head chunks need to be written, because chunkDiskMapper.WriteChunk() blocks until the chunks are written to disk. This adds a queue to the chunk disk mapper which makes the WriteChunk() method non-blocking unless the queue is full. Reads can still be served from the queue. Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * address PR feeddback Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * initialize metrics without .Add(0) Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * change isRunningMtx to normal lock Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * do not re-initialize chunkrefmap Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * update metric outside of lock scope Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * add benchmark for adding job to chunk write queue Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * remove unnecessary "success" var Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * gofumpt -extra Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * avoid WithLabelValues call in addJob Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * format comments Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * addressing PR feedback Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * rename cutExpectRef to cutAndExpectRef Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * use head.Init() instead of .initTime() Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * address PR feedback Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * PR feedback Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * update test according to PR feedback 
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * replace callbackWg -> awaitCb Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * better test of truncation with empty files Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * replace callbackWg -> awaitCb Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2022-01-10 13:36:45 +00:00
beorn7	e4e24453fa	Merge branch 'main' into beorn7/merge2	2021-11-30 17:19:06 +01:00
Björn Rabenstein	7e42acd3b1	tsdb: Rework iterators (#9877 ) - Pick At... method via return value of Next/Seek. - Do not clobber returned buckets. - Add partial FloatHistogram suppert. Note that the promql package is now _only_ dealing with FloatHistograms, following the idea that PromQL only knows float values. As a byproduct, I have removed the histogramSeries metric. In my understanding, series can have both float and histogram samples, so that metric doesn't make sense anymore. As another byproduct, I have converged the sampleBuf and the histogramSampleBuf in memSeries into one. The sample type stored in the sampleBuf has been extended to also contain histograms even before this commit. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 13:24:23 +05:30
Bryan Boreham	1b74a3812e	Fix panic, out of order chunks, and race warning during WAL replay (#9856 ) * Fix panic on WAL replay Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Refactor: introduce walSubsetProcessor walSubsetProcessor packages up the `processWALSamples()` function and its input and output channels, helping to clarify how these things relate. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Refactor: extract more methods onto walSubsetProcessor This makes the main logic easier to follow. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Fix race warning by locking processWALSamples Although we have waited for the processor to finish, we still get a warning from the race detector because it doesn't know how the different parts relate. Add a lock round each batch of samples, so the race detector can see that we never access series owned by the processor outside of a lock. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Added test to reproduce issue 9859 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove redundant unit test Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix out of order chunks during WAL replay Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix nits Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>	2021-11-25 13:36:14 +05:30
Darshan Chaudhary	9dcf8b2208	Add the ability to disable tsdb isolation (#9270 ) * Disable isolation in isolation struct Signed-off-by: darshanime <deathbullet@gmail.com> * Run tsdb tests with isolation disabled Signed-off-by: darshanime <deathbullet@gmail.com> * Check for isolation disabled in isoState.Close() Signed-off-by: darshanime <deathbullet@gmail.com> * use t.Skip to skip isolation tests when disabled Signed-off-by: darshanime <deathbullet@gmail.com> * address review comments Signed-off-by: darshanime <deathbullet@gmail.com> * fix test for defaultIsolationState Signed-off-by: darshanime <deathbullet@gmail.com> * Change flag name. Set flag in DB. Do not init txRing. Close isoState. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Test disabled isolation in CircleCI test_go Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Skip isolation related tests in db_test.go Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-11-19 15:41:32 +05:30
beorn7	5d4db805ac	Merge branch 'main' into sparsehistogram	2021-11-17 19:57:31 +01:00
beorn7	4c28d9fac7	Move to histogram.Histogram pointers This is to avoid copying the many fields of a histogram.Histogram all the time. This also fixes a bunch of formerly broken tests. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-12 23:17:35 +01:00
Robert Fratto	72a9f7fee9	Share TSDB locker code with agent (#9623 ) * share tsdb db locker code with agent Closes #9616 Signed-off-by: Robert Fratto <robertfratto@gmail.com> * add flag to disable lockfile for agent Signed-off-by: Robert Fratto <robertfratto@gmail.com> * use agentOnlySetting instead of PreAction Signed-off-by: Robert Fratto <robertfratto@gmail.com> * tsdb: address review feedback 1. Rename Locker to DirLocker 2. Move DirLocker to tsdb/tsdbutil 3. Name metric using fmt.Sprintf 4. Refine error checking in DirLocker test Signed-off-by: Robert Fratto <robertfratto@gmail.com> * tsdb: create test utilities to assert expected DirLocker behavior Signed-off-by: Robert Fratto <robertfratto@gmail.com> * tsdb/tsdbutil: fix lint errors Signed-off-by: Robert Fratto <robertfratto@gmail.com> * tsdb/agent: fix windows test failure Use new DB variable instead of overriding the old one. Signed-off-by: Robert Fratto <robertfratto@gmail.com>	2021-11-11 11:45:25 -05:00
Mateusz Gozdek	2f312ff4c5	tsdb: mark TestTombstoneCleanRetentionLimitsRace test as slow It takes over 100 seconds to execute this test, so I'd consider it as slow. Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-11 01:37:24 +01:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
Dieter Plaetinck	cda025b5b5	TSDB: demistify SeriesRefs and ChunkRefs (#9536 ) * TSDB: demistify seriesRefs and ChunkRefs The TSDB package contains many types of series and chunk references, all shrouded in uint types. Often the same uint value may actually mean one of different types, in non-obvious ways. This PR aims to clarify the code and help navigating to relevant docs, usage, etc much quicker. Concretely: * Use appropriately named types and document their semantics and relations. * Make multiplexing and demuxing of types explicit (on the boundaries between concrete implementations and generic interfaces). * Casting between different types should be free. None of the changes should have any impact on how the code runs. TODO: Implement BlockSeriesRef where appropriate (for a future PR) Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * agent: demistify seriesRefs and ChunkRefs Signed-off-by: Dieter Plaetinck <dieter@grafana.com>	2021-11-06 15:40:04 +05:30
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Furkan Türkal	0c07663b70	fix: possible race on shared variables in test (#9470 ) Fixes #9433 Signed-off-by: Furkan <furkan.turkal@trendyol.com>	2021-10-25 18:44:40 +05:30
Levi Harrison	06afe6162c	Also ignore `func1` Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-08-28 22:42:22 -04:00
Ganesh Vernekar	59d02b5ef0	tsdb: Block Head GC till pending readers are done reading (#9081 ) * tsdb: Block Head GC till pending readers are done reading Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments 2 Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix the exclusiveness of maxt in WaitForPendingReadersInTimeRange Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-07-20 14:17:20 +05:30
Levi Harrison	437c470c40	Added ignore Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-17 10:04:52 -04:00
Julien Pivotto	b1c179be85	Fix main build (#8948 ) Was broken after the merge of #8824 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-06-16 17:18:32 +02:00
Julien Duchesne	8855c2e626	Add `prometheus_tsdb_clean_start` metric (#8824 ) Add cleanup of the lockfile when the db is cleanly closed The metric describes the status of the lockfile on startup 0: Already existed 1: Did not exist -1: Disabled Therefore, if the min value over time of this metric is 0, that means that executions have exited uncleanly We can then use that metric to have a much lower threshold on the crashlooping alert: If the metric exists and it has been zero, two restarts is enough to trigger the alarm If it does not exist (old prom version for example), the current five restarts threshold remains Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Change metric name + set unset value to -1 Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Only check the last value of the clean start alert Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com> * Fix test + nit Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>	2021-06-16 15:03:02 +05:30
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
Levi Harrison	7bc11dcb06	React UI: Add Starting Screen (#8662 ) * Added walreplay API endpoint Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added starting page to react-ui Signed-off-by: Levi Harrison <git@leviharrison.dev> * Documented the new endpoint Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed typos Signed-off-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Julius Volz <julius.volz@gmail.com> * Removed logo Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed isResponding to isUnexpected Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed width of progress bar Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed width of progress bar Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added DB stats object Signed-off-by: Levi Harrison <git@leviharrison.dev> * Updated starting page to work with new fields Signed-off-by: Levi Harrison <git@leviharrison.dev> * Passing nil Signed-off-by: Levi Harrison <git@leviharrison.dev> * Passing nil (pt. 2) Signed-off-by: Levi Harrison <git@leviharrison.dev> * Passing nil (pt. 3) Signed-off-by: Levi Harrison <git@leviharrison.dev> * Passing nil (and also implementing a method this time) (pt. 4) Signed-off-by: Levi Harrison <git@leviharrison.dev> * Passing nil (and also implementing a method this time) (pt. 5) Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed const to let Signed-off-by: Levi Harrison <git@leviharrison.dev> * Passing nil (pt. 
6) Signed-off-by: Levi Harrison <git@leviharrison.dev> * Remove SetStats method Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added comma Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed api Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed to triple equals Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed data response types Signed-off-by: Levi Harrison <git@leviharrison.dev> * Don't return pointer Signed-off-by: Levi Harrison <git@leviharrison.dev> * Changed version Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed interface issue Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed pointer Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed copying lock value error Signed-off-by: Levi Harrison <git@leviharrison.dev> Co-authored-by: Julius Volz <julius.volz@gmail.com>	2021-06-05 15:29:32 +01:00
Marco Pracucci	4b49ffbad5	Stop the bleed on chunk mapper panic (#8723 ) * Added test to reproduce panic on TSDB head chunks truncated while querying Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added test for Querier too Signed-off-by: Marco Pracucci <marco@pracucci.com> * Stop the bleed on mmap-ed head chunks panic Signed-off-by: Marco Pracucci <marco@pracucci.com> * Lower memory pressure in tests to ensure it doesn't OOM Signed-off-by: Marco Pracucci <marco@pracucci.com> * Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks Signed-off-by: Marco Pracucci <marco@pracucci.com> * Experiment to not trigger runtime.GC() continuously Signed-off-by: Marco Pracucci <marco@pracucci.com> * Try to fix test in CI Signed-off-by: Marco Pracucci <marco@pracucci.com> * Do not call runtime.GC() at all Signed-off-by: Marco Pracucci <marco@pracucci.com> * I have no idea why it's failing in CI, skipping tests Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-05-06 14:18:59 -06:00
Julien Pivotto	889dd0bbd3	Fix DB tests in the default branch The main branch tests are not passing due to the fact that #8489 was not rebased on top of #8007. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-02-18 23:56:27 +01:00
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489 ) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
Arthur Silva Sens	6a3d55db0a	Rolling tombstones clean up (#8007 ) * CleanupTombstones refactored, now reloading blocks after every compaction. The goal is to remove deletable blocks after every compaction and, thus, decrease disk space used when cleaning tombstones. Signed-off-by: arthursens <arthursens2005@gmail.com> * Protect DB against parallel reloads Signed-off-by: ArthurSens <arthursens2005@gmail.com> * Fix typos Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2021-02-17 11:02:43 +05:30
Nguyen Le Vu Long	fbe960f2c1	fix: remove pre-2.21 tmp blocks on start (#8353 ) * fix: remove pre-2.21 tmp blocks on start Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com> * fix: commenting Signed-off-by: Nguyen Le Vu Long <vulongvn98@gmail.com>	2021-01-09 10:02:26 +01:00
Marco Pracucci	db19e05d93	Add option to customise head chunks write buffer size (#8201 ) * Add option to customise head chunks write buffer size Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed tests Signed-off-by: Marco Pracucci <marco@pracucci.com>	2020-11-19 18:30:47 +05:30
Julien Pivotto	8bc369bf9b	Calculate head chunk size based on actual disk usage (#8139 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-11-03 15:34:59 +05:30
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122 ) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00
Ganesh Vernekar	3245b3267b	Don't use returned DB to close resources on TSDB startup error (#8113 ) * Don't use returned DB to close resources on TSDB startup error Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add unit test and fix another panic Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comment Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-10-28 15:39:03 +05:30
Julien Pivotto	1282d1b39c	Refactor test assertions (#8110 ) * Refactor test assertions This pull request gets rid of assert.True where possible to use fine-grained assertions. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-27 11:06:53 +01:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00

205 commits