Commit graph

126 commits

Author SHA1 Message Date
Giedrius Statkevičius f36b56a62c
tsdb: remove unused option (#13282)
Digging around the TSDB code and I've found that this flag is unused so
let's remove it.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2023-12-12 09:58:54 +00:00
Matthieu MOREL 8f6cf3aabb tsdb: use Go standard errors
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-11 12:18:54 +00:00
Charles Korn 59844498f7
Fix issue where queries can fail or omit OOO samples if OOO head compaction occurs between creating a querier and reading chunks (#13115)
* Add failing test.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Don't run OOO head garbage collection while reads are running.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Add further test cases for different order of operations.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Ensure all queriers are closed if `DB.blockChunkQuerierForRange()` fails.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Ensure all queriers are closed if `DB.Querier()` fails.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Invert error handling in `DB.Querier()` and `DB.blockChunkQuerierForRange()` to make it clearer

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Ensure that queries that touch OOO data can't block OOO head garbage collection forever.

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address PR feedback: fix parameter name in comment

Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>

* Address PR feedback: use `lastGarbageCollectedMmapRef`

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address PR feedback: ensure pending reads are cleaned up if creating an OOO querier fails

Signed-off-by: Charles Korn <charles.korn@grafana.com>

---------

Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
2023-11-24 12:38:38 +01:00
Matthieu MOREL dd8871379a remplace errors.Errorf by fmt.Errorf
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-11-14 13:04:31 +00:00
Márcio Carôso dff1c395f6
Expose --storage.tsdb.retention.time in metric prometheus_tsdb_retention_limit_seconds (#12986)
* Expose --storage.tsdb.retention.time in a metric

Signed-off-by: Marcio Caroso <msscaroso@gmail.com>

---------

Signed-off-by: Marcio Caroso <msscaroso@gmail.com>
2023-10-24 13:34:42 +02:00
George Krajcsovits 7d7b9eacff
Fix int32 overflow issues (#12978)
On a 32 bit architecture the size of int is 32 bits. Thus converting from
int64, uint64 can overflow it and flip the sign.

Try for yourself in playground:
package main

import "fmt"

func main() {
	x := int64(0x1F0000001)
	y := int64(1)
	z := int32(x - y) // numerically this is 0x1F0000000
	fmt.Printf("%v\n", z)
}

Prints -268435456 as if x was smaller.

Followup to #12650

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-10-16 16:23:26 +02:00
Ganesh Vernekar f5913266a1
Additionally wrap WBL replay error (#12406)
* Additionally wrap WBL replay error

Although WBL replay is already wrapped with errLoadWbl,
there are other errors that can happen during a WBL replay.
We should not try to repair WAL in those cases.

This commit additionally wraps the final error in Head.Init again
with errLoadWbl so that WBL replay errors can be identified properly.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
2023-10-13 14:21:35 +02:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Arve Knudsen 4451ba10b4
Add context argument to IndexReader.Postings (#12667)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-13 17:45:06 +02:00
Arve Knudsen 6ef9ed0bc3
Add context argument to DB.Delete (#12834)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-13 15:43:06 +02:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Bryan Boreham 0d283effa8 promql: force mmap of head chunks in BenchmarkRangeQuery
Otherwise we have a highly unusual situation of over 100 chunks
in the headChunks list of each series, which heavily skews
performance.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-26 09:40:59 +00:00
Julien Pivotto e3fabd5fdf
Merge pull request #12664 from prometheus/superq/cleanup_chunk_snapshots
Cleanup temporary chunk snapshot dirs
2023-08-08 13:02:39 +02:00
SuperQ 8d38d59fc5
Cleanup temporary chunk snapshot dirs
Simlar to cleanup of WAL files on startup, cleanup temporary
chunk_snapshot dirs. This prevents storage space leaks due to terminated
snapshots on shutdown.

Signed-off-by: SuperQ <superq@gmail.com>
2023-08-08 09:43:48 +02:00
Łukasz Mierzwa 3c80963e81
Use a linked list for memSeries.headChunk (#11818)
Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks.
When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call.
If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending
our sample to it.

Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed.
When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes.
Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait
for it to be mmapped.
If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting
queries and scrapes.
Queries might timeout, since by default they have a 2 minute timeout set.
Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries
or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything.

To avoid this we need to remove mmapping from append path, since mmapping is blocking.
But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later.
This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples,
while older, yet to be mmapped, chunks are linked to it.
Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger
it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-07-31 11:10:24 +02:00
Justin Lei 32d87282ad
Add Zstandard compression option for wlog (#11666)
Snappy remains as the default compression but there is now a flag to switch 
the compression algorithm.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-07-11 14:57:57 +02:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
Nidhey Nitin Indurkar a8772a4178
Feat: Get block by id directly on promtool analyze & get latest block if ID not provided (#12031)
* feat: analyze latest block or block by ID in CLI (promtool)

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>

* address remarks

Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>

* address latest review comments

Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>

---------

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>
Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>
2023-06-01 17:13:09 +05:30
zenador 37e5249e33
Use DefaultSamplesPerChunk in tsdb (#12387)
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-05-24 13:00:21 +02:00
Callum Styan 0d2108ad79
[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949)
* WIP implement WAL watcher reading via notifications over a channel from
the TSDB code

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Notify via head appenders Commit (finished all WAL logging) rather than
on each WAL Log call

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix misspelled Notify plus add a metric for dropped Write notifications

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update tests to handle new notification pattern

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* this test maybe needs more time on windows?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* does this test need more time on windows as well?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* read timeout is already a time.Duration

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* remove mistakenly commited benchmark data files

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* address some review feedback

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix missed changes from previous commit

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues from wrapper function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* try fixing race condition in test by allowing tests to overwrite the
read ticker timeout instead of calling the Notify function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix linting

Signed-off-by: Callum Styan <callumstyan@gmail.com>

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2023-05-15 12:31:49 -07:00
Björn Rabenstein 37fe9b89dc
Merge pull request #12055 from leizor/leizor/prometheus/issues/12009
Adjust samplesPerChunk from 120 to 220
2023-05-10 14:45:12 +02:00
cui fliter 276ca6a883 fix some comments
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-25 14:19:16 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
Justin Lei 052993414a Add storage.tsdb.samples-per-chunk flag
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-04-13 15:59:49 -07:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
Vernon Miller ca0abf26c5
Adds an affirmative log message for successful WAL repair (#12135)
* Adds an affirmative log message for successful WAL repair

Signed-off-by: Vernon Miller <vernon.miller@grafana.com>
Signed-off-by: Vernon Miller <96601789+aldernero@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-21 19:33:43 +05:30
Đurica Yuri Nikolić c9b85afd93
Making the number of CPUs used for WAL replay configurable (#12066)
Adds `WALReplayConcurrency` as an option on tsdb `Options` and `HeadOptions`.
If it is not set or set <=0, then `GOMAXPROCS` is used, which matches the previous behaviour.

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
2023-03-07 16:41:33 +00:00
Bryan Boreham 543c318ec2 Update package tsdb for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 6bdecf377c
Switch from 'sanity' to more inclusive lanuage (#9376)
* Switch from 'sanity' to more inclusive lanuage

"Removing ableist language in code is important; it helps to create and
maintain an environment that welcomes all developers of all backgrounds,
while emphasizing that we as developers select the most articulate,
precise, descriptive language we can rather than relying on metaphors.

The phrase sanity check is ableist, and unnecessarily references mental
health in our code bases. It denotes that people with mental illnesses
are inferior, wrong, or incorrect, and the phrase sanity continues to be
used by employers and other individuals to discriminate against these
people."

From https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-11-28 17:09:18 +00:00
Ganesh Vernekar 42633bd05c
Merge pull request #11485 from t00350320/prometheus-office
GetRefByhash() will query a label's ref with hash value rather than lset.Hash().
2022-11-16 15:09:49 +01:00
tanghengjian 982007ecab
GetRefByhash will query a label's ref with hash value rather than lset.Hash().
Signed-off-by: tanghengjian <1040104807@qq.com>
2022-11-16 14:13:59 +01:00
Ganesh Vernekar fa6e05903f
Merge pull request #11447 from prometheus/sparsehistogram
Add Support for Native Histograms

This PR merges all the coding work that has been done in sparsehistogram branch over the last 1 year into main branch.

Design doc on native histograms: https://docs.google.com/document/d/1cLNv3aufPZb3fNfaJgdaRBZsInZKKIHo9E6HinJVbpM/edit
Some sneak peak: https://www.youtube.com/watch?v=T2GvcYNth9U
2022-10-26 17:10:46 -04:00
Viacheslav Panasovets 3d2e18bad5
Fix time.Since() in defer. Wrap in anonymous function (#11489)
Function arguments in defer evaluated during definition of defer, not
during execution

Signed-off-by: Slavik Panasovets <slavik@google.com>

Signed-off-by: Slavik Panasovets <slavik@google.com>
2022-10-26 00:26:12 +02:00
Ganesh Vernekar 648be89822
Merge remote-tracking branch 'upstream/main' into fix-conflict
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 14:20:02 +05:30
Signed-off-by: Jesus Vazquez 3362bf6d79
Fix merge conflicts
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-11 22:53:37 +05:30
Jesus Vazquez 775d90d5f8
TSDB: Rename wal package to wlog (#11352)
The wlog.WL type can now be used to create a Write Ahead Log or a Write
Behind Log.

Before the prefix for wbl metrics was
'prometheus_tsdb_out_of_order_wal_' and has been replaced with
'prometheus_tsdb_out_of_order_wbl_'.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2022-10-10 20:38:46 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Ganesh Vernekar f34aeefe6e
Allow overlapping blocks by default (#11331)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-28 19:17:54 +05:30
Bryan Boreham ff00dee262
tsdb: turn off transaction isolation for head compaction (#11317)
* tsdb: add a basic test for read/write isolation

* tsdb: store the min time with isolationAppender
So that we can see when appending has moved past a certain point in time.

* tsdb: allow RangeHead to have isolation disabled
This will be used when for head compaction.

* tsdb: do head compaction with isolation disabled
This saves a lot of work tracking appends done while compaction is ongoing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-27 19:31:23 +05:30
Jesus Vazquez c1b669bf9b
Add out-of-order sample support to the TSDB (#11075)
* Introduce out-of-order TSDB support

This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing

This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.

Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886

Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* gofumpt files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add license header to missing files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO tests due to existing chunk disk mapper implementation

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix truncate int overflow

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add Sync method to the WAL and update tests

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* remove useless sync

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Update minOOOTime after truncating Head

* Update minOOOTime after truncating Head

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add a unit test

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Load OutOfOrderTimeWindow only once per appender

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO Head LabelValues and PostingsForMatchers

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix replay of OOO mmap chunks

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Remove unnecessary err check

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Prevent panic with ApplyConfig

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run OOO compaction after restart if there is OOO data from WBL

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Apply Bartek's suggestions

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Refactor OOO compaction

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address comments and TODOs

- Added a comment explaining why we need the allow overlapping
  compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run go format

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix remaining review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix tests

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix TestWBLAndMmapReplay test failure on windows

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address most of the feedback

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Refactor the block meta for out of order

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix windows error

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2022-09-20 22:35:50 +05:30
Ganesh Vernekar d354f20c2a
Add a feature flag to control native histogram ingestion (#11253)
* Add runtime config to control native histogram ingestion

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Make the config into a CLI flag

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-14 17:38:34 +05:30
Abirdcfly 314aa45c2c
chore: remove duplicate word in comments (#11225)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>

Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-08-27 22:21:41 +02:00
Łukasz Mierzwa d65f037def
Don't increment prometheus_tsdb_compactions_failed_total when context is canceled (#10772)
When restarting Prometheus I sometimes see:

caller=db.go:832 level=error component=tsdb msg="compaction failed" err="compact head: persist head block: 2 errors: populate block: context canceled; context canceled"

And prometheus_tsdb_compactions_failed_total metric gets incremented.
This makes it more difficult to write alerts based on
prometheus_tsdb_compactions_failed_total metric since any restart can
trigger it.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2022-06-17 11:21:43 +05:30
Matthieu MOREL e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
Howie 1291ec7185
deleting *.tmp WAL files on startup (#10317)
* fix issue #10245

Signed-off-by: lihaowei <haoweili35@gmail.com>

* minor changes

Signed-off-by: lihaowei <haoweili35@gmail.com>

* review changes

Signed-off-by: lihaowei <haoweili35@gmail.com>

* minor changes

Signed-off-by: lihaowei <haoweili35@gmail.com>
2022-03-24 16:14:14 +05:30
cui fliter c9b56d1a49
all: fix some typos (#10389)
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-03-03 12:03:07 +00:00
Mauro Stettler 0df3489275
Write chunks via queue, predicting the refs (#10051)
* Write chunks via queue, predicting the refs

Our load tests have shown that there is a latency spike in the
remote write handler whenever the head chunks need to be written,
because chunkDiskMapper.WriteChunk() blocks until the chunks are written
to disk.

This adds a queue to the chunk disk mapper which makes the WriteChunk()
method non-blocking unless the queue is full. Reads can still be served
from the queue.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* address PR feeddback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* initialize metrics without .Add(0)

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* change isRunningMtx to normal lock

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* do not re-initialize chunkrefmap

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update metric outside of lock scope

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add benchmark for adding job to chunk write queue

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove unnecessary "success" var

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* gofumpt -extra

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* avoid WithLabelValues call in addJob

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* format comments

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* addressing PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* rename cutExpectRef to cutAndExpectRef

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* use head.Init() instead of .initTime()

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* address PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* PR feedback

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update test according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* replace callbackWg -> awaitCb

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* better test of truncation with empty files

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* replace callbackWg -> awaitCb

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2022-01-10 13:36:45 +00:00
Darshan Chaudhary 9dcf8b2208
Add the ability to disable tsdb isolation (#9270)
* Disable isolation in isolation struct

Signed-off-by: darshanime <deathbullet@gmail.com>

* Run tsdb tests with isolation disabled

Signed-off-by: darshanime <deathbullet@gmail.com>

* Check for isolation disabled in isoState.Close()

Signed-off-by: darshanime <deathbullet@gmail.com>

* use t.Skip to skip isolation tests when disabled

Signed-off-by: darshanime <deathbullet@gmail.com>

* address review comments

Signed-off-by: darshanime <deathbullet@gmail.com>

* fix test for defaultIsolationState

Signed-off-by: darshanime <deathbullet@gmail.com>

* Change flag name. Set flag in DB. Do not init txRing. Close isoState.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Test disabled isolation in CircleCI test_go

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Skip isolation related tests in db_test.go

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-19 15:41:32 +05:30
Dieter Plaetinck 0fac9bb859
Add basic initial developer docs for TSDB (#9451)
* Add basic initial developer docs for TSDB

There's a decent amount of content already out there (blog posts,
conference talks, etc), but:
* when they get stale, they don't tend to get updated
* they still leave me with questions that I'ld like to answer
  for developers (like me) who want to use, or work with, TSDB

What I propose is developer docs inside the prometheus
repository.  Easy to find and harness the power of the community
to expand it and keep it up to date.

* perfect is the enemy of good.  Let's have a base and incrementally improve
* Markdown docs should be broad but not too deep.  Source code comments
  can complement them, and are the ideal place for implementation details.

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* use example code that works out of the box

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* PR feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* more docs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* PR feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Apply suggestions from code review

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Apply suggestions from code review

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Update tsdb/docs/usage.md

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* final tweaks

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* workaround docs versioning issue

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Move example code to real executable, testable example.

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* cleanup example test and make sure it always reproduces

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* obtain temp dir in a way that works with older Go versions

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* Fix Ganesh's comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-17 15:51:27 +05:30