prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-15 18:14:06 -08:00

Author	SHA1	Message	Date
Matthieu MOREL	6f595c6762	golangci-lint: enable whitespace linter (#13905) Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-04-11 09:27:54 +01:00
Jonathan Halterman	633224886a	Write out of order hint when initially creating meta file (#13894) Signed-off-by: Jonathan Halterman <jonathan@grafana.com> Signed-off-by: Jonathan Halterman <jhalterman@gmail.com> Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>	2024-04-08 17:34:14 +02:00
Łukasz Mierzwa	277f04f0c4	Stop compactions if there's a block to write (#13754) * Stop compactions if there's a block to write db.Compact() checks if there's a block to write with HEAD chunks before calling db.compactBlocks(). This is to ensure that if we need to write a block then it happens ASAP, otherwise memory usage might keep growing. But what can also happen is that we don't need to write any block, we start db.compactBlocks(), compaction takes hours, and in the meantime HEAD needs to write out chunks to a block. This can be especially problematic if, for example, you run Thanos sidecar that's uploading block, which requires that compactions are disabled. Then you disable Thanos sidecar and re-enable compactions. When db.compactBlocks() is finally called it might have a huge number of blocks to compact, which might take a very long time, during which HEAD cannot write out chunks to a new block. In such case memory usage will keep growing until either: - compactions are finally finished and HEAD can write a block - we run out of memory and Prometheus gets OOM-killed This change adds a check for pending HEAD block writes inside db.compactBlocks(), so that we bail out early if there are still compactions to run, but we also need to write a new block. Also add a test for compactBlocks. --------- Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>	2024-04-07 18:28:28 +01:00
carehabit	a672662073	all: fix some typos (#13863) Signed-off-by: carehabit <shenyuting@outlook.com>	2024-04-01 18:06:05 +02:00
Bartlomiej Plotka	25578f2b22	[test] Merge pull request #13790 from aknuds1/arve/retention-commit tsdb.BeyondTimeRetention: Fix comment and test at retention duration	2024-03-26 12:26:32 +01:00
machine424	2a2e2ed28b	chore(tsdb): set the wbl to nil as well in DBReadOnly.loadDataAsQueryable Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2024-03-20 17:13:39 +01:00
Arve Knudsen	9c7a734063	tsdb.BeyondTimeRetention: Fix comment and test at retention duration Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-03-19 09:10:21 +01:00
Bryan Boreham	a0e93e403e	Merge pull request #13764 from bboreham/remove-deprecated-wal [Cleanup] TSDB: Remove old deprecated WAL implementation Deprecated since 2018.	2024-03-17 09:34:57 +00:00
Darshan Chaudhary	b7047f7fcb	Fix retention boundary so 2h retention deletes blocks right at the 2h boundary (#9633) Signed-off-by: darshanime <deathbullet@gmail.com>	2024-03-15 19:35:16 +01:00
Bryan Boreham	d45b5deb75	TSDB: move function only used in tests Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-03-15 08:54:47 +00:00
Bryan Boreham	3274cac0d3	TSDB: remove unused function Was only used in old WAL implementation. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-03-15 08:51:57 +00:00
Bryan Boreham	87edf1f960	[Cleanup] TSDB: Remove old deprecated WAL implementation Deprecated since 2018. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-03-13 15:57:23 +00:00
György Krajcsovits	4d4d822c36	Add native histograms to latency/duration metrics Dogfood native histograms. Allow dependent projects to migrate to native histograms. I took the defaults from client_golang. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2024-03-01 14:44:38 +01:00
machine424	f477e0539a	Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21 Prevent adding back golang.org/x/exp/slices. Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2024-02-28 14:54:53 +01:00
Marco Pracucci	501bc6419e	Add ShardedPostings() support to TSDB (#10421) This PR is a reference implementation of the proposal described in #10420. In addition to what is described in #10420, in this PR I've introduced labels.StableHash(). The idea is to offer a hashing function which doesn't change over time, and that's used by query sharding in order to get a stable behaviour over time. The implementation of labels.StableHash() is the hashing function used by Prometheus before stringlabels, and what's used by Grafana Mimir for query sharding (because it was built before stringlabels was a thing). Follow up work As mentioned in #10420, if this PR is accepted I'm also open to upload another fundamental piece used by Grafana Mimir query sharding to accelerate the query execution: an optional, configurable and fast in-memory cache for the series hashes. Signed-off-by: Marco Pracucci <marco@pracucci.com>	2024-01-29 11:57:27 +00:00
Giedrius Statkevičius	b695e069b8	tsdb/main: wire "EnableOverlappingCompaction" to tsdb.Options (#13398) This added the https://github.com/prometheus/prometheus/pull/13393 "EnableOverlappingCompaction" parameter to the compactor code but not to the tsdb.Options. I forgot about that. Add it to `tsdb.Options` too and set it to `true` in Prometheus. Copy/paste the description from https://github.com/prometheus/prometheus/pull/13393#issuecomment-1891787986 Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>	2024-01-15 16:42:40 +01:00
Giedrius Statkevičius	61b4080a14	tsdb/{index,compact}: allow using custom postings encoding format (#13242) * tsdb/{index,compact}: allow using custom postings encoding format We would like to experiment with a different postings encoding format in Thanos so in this change I am proposing adding another argument to `NewWriter` which would allow users to change the format if needed. Also, wire the leveled compactor so that it would be possible to change the format there too. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * tsdb/compact: use a struct for leveled compactor options As discussed on Slack, let's use a struct for the options in leveled compactor. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * tsdb: make changes after Bryan's review - Make changes less intrusive - Turn the postings encoder type into a function - Add NewWriterWithEncoder() Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>	2024-01-08 09:48:27 +00:00
Giedrius Statkevičius	f36b56a62c	tsdb: remove unused option (#13282) Digging around the TSDB code and I've found that this flag is unused so let's remove it. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>	2023-12-12 09:58:54 +00:00
Matthieu MOREL	8f6cf3aabb	tsdb: use Go standard errors Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-12-11 12:18:54 +00:00
Charles Korn	59844498f7	Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks (#13115) * Add failing test. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Don't run OOO head garbage collection while reads are running. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Add further test cases for different order of operations. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Ensure all queriers are closed if `DB.blockChunkQuerierForRange()` fails. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Ensure all queriers are closed if `DB.Querier()` fails. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Invert error handling in `DB.Querier()` and `DB.blockChunkQuerierForRange()` to make it clearer Signed-off-by: Charles Korn <charles.korn@grafana.com> * Ensure that queries that touch OOO data can't block OOO head garbage collection forever. Signed-off-by: Charles Korn <charles.korn@grafana.com> * Address PR feedback: fix parameter name in comment Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com> Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com> * Address PR feedback: use `lastGarbageCollectedMmapRef` Signed-off-by: Charles Korn <charles.korn@grafana.com> * Address PR feedback: ensure pending reads are cleaned up if creating an OOO querier fails Signed-off-by: Charles Korn <charles.korn@grafana.com> --------- Signed-off-by: Charles Korn <charles.korn@grafana.com> Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com> Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>	2023-11-24 12:38:38 +01:00
Matthieu MOREL	dd8871379a	replace errors.Errorf by fmt.Errorf Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-11-14 13:04:31 +00:00
Márcio Carôso	dff1c395f6	Expose --storage.tsdb.retention.time in metric prometheus_tsdb_retention_limit_seconds (#12986) * Expose --storage.tsdb.retention.time in a metric Signed-off-by: Marcio Caroso <msscaroso@gmail.com> --------- Signed-off-by: Marcio Caroso <msscaroso@gmail.com>	2023-10-24 13:34:42 +02:00
George Krajcsovits	7d7b9eacff	Fix int32 overflow issues (#12978) On a 32 bit architecture the size of int is 32 bits. Thus converting from int64, uint64 can overflow it and flip the sign. Try for yourself in playground: package main import "fmt" func main() { x := int64(0x1F0000001) y := int64(1) z := int32(x - y) // numerically this is 0x1F0000000 fmt.Printf("%v\n", z) } Prints -268435456 as if x was smaller. Followup to #12650 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-10-16 16:23:26 +02:00
Ganesh Vernekar	f5913266a1	Additionally wrap WBL replay error (#12406) * Additionally wrap WBL replay error Although WBL replay is already wrapped with errLoadWbl, there are other errors that can happen during a WBL replay. We should not try to repair WAL in those cases. This commit additionally wraps the final error in Head.Init again with errLoadWbl so that WBL replay errors can be identified properly. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com> Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>	2023-10-13 14:21:35 +02:00
Goutham Veeramachaneni	86729d4d7b	Update exp package (#12650)	2023-09-21 22:53:51 +02:00
Arve Knudsen	4451ba10b4	Add context argument to IndexReader.Postings (#12667) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-13 17:45:06 +02:00
Arve Knudsen	6ef9ed0bc3	Add context argument to DB.Delete (#12834) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-13 15:43:06 +02:00
Arve Knudsen	6daee89e5f	Add context argument to Querier.Select (#12660) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-12 12:37:38 +02:00
Bryan Boreham	0d283effa8	promql: force mmap of head chunks in BenchmarkRangeQuery Otherwise we have a highly unusual situation of over 100 chunks in the headChunks list of each series, which heavily skews performance. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-08-26 09:40:59 +00:00
Julien Pivotto	e3fabd5fdf	Merge pull request #12664 from prometheus/superq/cleanup_chunk_snapshots Cleanup temporary chunk snapshot dirs	2023-08-08 13:02:39 +02:00
SuperQ	8d38d59fc5	Cleanup temporary chunk snapshot dirs Similar to cleanup of WAL files on startup, cleanup temporary chunk_snapshot dirs. This prevents storage space leaks due to terminated snapshots on shutdown. Signed-off-by: SuperQ <superq@gmail.com>	2023-08-08 09:43:48 +02:00
Łukasz Mierzwa	3c80963e81	Use a linked list for memSeries.headChunk (#11818) Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks. When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call. If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending our sample to it. Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed. When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes. Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait for it to be mmapped. If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting queries and scrapes. Queries might timeout, since by default they have a 2 minute timeout set. Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything. To avoid this we need to remove mmapping from append path, since mmapping is blocking. But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later. This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples, while older, yet to be mmapped, chunks are linked to it. Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger it manually, which reduces the risk that it will have to compete for mmap locks with other chunks. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2023-07-31 11:10:24 +02:00
Justin Lei	32d87282ad	Add Zstandard compression option for wlog (#11666) Snappy remains as the default compression but there is now a flag to switch the compression algorithm. Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-07-11 14:57:57 +02:00
Bryan Boreham	5255bf06ad	Replace sort.Slice with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-07-02 22:17:08 +00:00
Nidhey Nitin Indurkar	a8772a4178	Feat: Get block by id directly on promtool analyze & get latest block if ID not provided (#12031) * feat: analyze latest block or block by ID in CLI (promtool) Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io> * address remarks Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io> * address latest review comments Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io> --------- Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io> Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>	2023-06-01 17:13:09 +05:30
zenador	37e5249e33	Use DefaultSamplesPerChunk in tsdb (#12387) Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-05-24 13:00:21 +02:00
Callum Styan	0d2108ad79	[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949) * WIP implement WAL watcher reading via notifications over a channel from the TSDB code Signed-off-by: Callum Styan <callumstyan@gmail.com> * Notify via head appenders Commit (finished all WAL logging) rather than on each WAL Log call Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix misspelled Notify plus add a metric for dropped Write notifications Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update tests to handle new notification pattern Signed-off-by: Callum Styan <callumstyan@gmail.com> * this test maybe needs more time on windows? Signed-off-by: Callum Styan <callumstyan@gmail.com> * does this test need more time on windows as well? Signed-off-by: Callum Styan <callumstyan@gmail.com> * read timeout is already a time.Duration Signed-off-by: Callum Styan <callumstyan@gmail.com> * remove mistakenly committed benchmark data files Signed-off-by: Callum Styan <callumstyan@gmail.com> * address some review feedback Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix missed changes from previous commit Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues from wrapper function Signed-off-by: Callum Styan <callumstyan@gmail.com> * try fixing race condition in test by allowing tests to overwrite the read ticker timeout instead of calling the Notify function Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix linting Signed-off-by: Callum Styan <callumstyan@gmail.com> --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2023-05-15 12:31:49 -07:00
Björn Rabenstein	37fe9b89dc	Merge pull request #12055 from leizor/leizor/prometheus/issues/12009 Adjust samplesPerChunk from 120 to 220	2023-05-10 14:45:12 +02:00
cui fliter	276ca6a883	fix some comments Signed-off-by: cui fliter <imcusg@gmail.com>	2023-04-25 14:19:16 +08:00
Matthieu MOREL	bae9a21200	Merge branch 'main' into linter/nilerr Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-19 19:56:39 +02:00
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforementioned wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:22:31 +02:00
Justin Lei	052993414a	Add storage.tsdb.samples-per-chunk flag Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-04-13 15:59:49 -07:00
Matthieu MOREL	fb3eb21230	enable gocritic, unconvert and unused linters Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-13 19:20:22 +00:00
Vernon Miller	ca0abf26c5	Adds an affirmative log message for successful WAL repair (#12135) * Adds an affirmative log message for successful WAL repair Signed-off-by: Vernon Miller <vernon.miller@grafana.com> Signed-off-by: Vernon Miller <96601789+aldernero@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-03-21 19:33:43 +05:30
Đurica Yuri Nikolić	c9b85afd93	Making the number of CPUs used for WAL replay configurable (#12066) Adds `WALReplayConcurrency` as an option on tsdb `Options` and `HeadOptions`. If it is not set or set <=0, then `GOMAXPROCS` is used, which matches the previous behaviour. Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>	2023-03-07 16:41:33 +00:00
Bryan Boreham	543c318ec2	Update package tsdb for new labels.Labels type Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-19 15:22:09 +00:00
Bryan Boreham	6bdecf377c	Switch from 'sanity' to more inclusive language (#9376) * Switch from 'sanity' to more inclusive language "Removing ableist language in code is important; it helps to create and maintain an environment that welcomes all developers of all backgrounds, while emphasizing that we as developers select the most articulate, precise, descriptive language we can rather than relying on metaphors. The phrase sanity check is ableist, and unnecessarily references mental health in our code bases. It denotes that people with mental illnesses are inferior, wrong, or incorrect, and the phrase sanity continues to be used by employers and other individuals to discriminate against these people." From https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-11-28 17:09:18 +00:00
Ganesh Vernekar	42633bd05c	Merge pull request #11485 from t00350320/prometheus-office GetRefByhash() will query a label's ref with hash value rather than lset.Hash().	2022-11-16 15:09:49 +01:00
tanghengjian	982007ecab	GetRefByhash will query a label's ref with hash value rather than lset.Hash(). Signed-off-by: tanghengjian <1040104807@qq.com>	2022-11-16 14:13:59 +01:00
Ganesh Vernekar	fa6e05903f	Merge pull request #11447 from prometheus/sparsehistogram Add Support for Native Histograms This PR merges all the coding work that has been done in sparsehistogram branch over the last year into main branch. Design doc on native histograms: https://docs.google.com/document/d/1cLNv3aufPZb3fNfaJgdaRBZsInZKKIHo9E6HinJVbpM/edit A sneak peek: https://www.youtube.com/watch?v=T2GvcYNth9U	2022-10-26 17:10:46 -04:00
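
The int32 overflow described in the 7d7b9eacff entry (#12978) can be reproduced with the snippet from that commit message, laid out here as a runnable program. The helper names `truncatingCompare` and `safeCompare` are hypothetical and exist only for this illustration; they are not taken from the Prometheus code base, and the direct comparison is just one possible way to avoid the narrowing, not necessarily the fix the commit applied.

```go
package main

import "fmt"

// truncatingCompare (hypothetical name) mimics the problematic pattern:
// an int64 difference is narrowed to 32 bits, as happens implicitly when
// the result is stored in an int on a 32-bit architecture.
func truncatingCompare(a, b int64) int32 {
	return int32(a - b)
}

// safeCompare (hypothetical name) compares the operands directly and
// returns -1, 0 or +1, so no narrowing conversion is involved.
func safeCompare(a, b int64) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	default:
		return 0
	}
}

func main() {
	x, y := int64(0x1F0000001), int64(1)

	// x - y is 0x1F0000000; its low 32 bits are 0xF0000000, which is
	// negative when interpreted as a signed 32-bit value.
	fmt.Println(truncatingCompare(x, y)) // -268435456
	fmt.Println(safeCompare(x, y))       // 1
}
```

The truncated result is negative even though `x` is larger than `y`, which is the sign flip the commit message warns about.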


143 commits
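
The style argument in the 5b53aa1108 entry (replace `else if` cascades with `switch`) can be illustrated with a small sketch. Both functions below are hypothetical examples written for this page, not code from the repository; they show the same three-way decision in each form.

```go
package main

import "fmt"

// classifyElseIf (hypothetical) uses an else-if cascade: every condition
// after the first sits behind "} else if".
func classifyElseIf(n int) string {
	if n < 0 {
		return "negative"
	} else if n == 0 {
		return "zero"
	} else {
		return "positive"
	}
}

// classifySwitch (hypothetical) expresses the same logic as a tagless
// switch: all conditions share the same indentation.
func classifySwitch(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

func main() {
	for _, n := range []int{-3, 0, 5} {
		fmt.Println(n, classifyElseIf(n), classifySwitch(n))
	}
}
```

The two functions behave identically; the difference is only that the `switch` form lists its conditions at one alignment, which is the readability point the commit message makes.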