If the query overlaps the range currently undergoing compaction, we
should only fetch chunks up to that time. Need to store that min time
in `HeadAndOOOIndexReader`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Several things done here:
- Set `max-issues-per-linter` to 0 so that we actually see all linter
warnings and not just 50 per linter. (As we also set
`max-same-issues` to 0, I assume this was the intention from the
beginning.)
- Stop using the golangci-lint default excludes (by setting
`exclude-use-default: false`. Those are too generous and don't match
our style conventions. (I have re-added some of the excludes
explicitly in this commit. See below.)
- Re-add the `errcheck` exclusion we have used so far via the
defaults.
- Exclude the signature requirement `govet` has for `Seek` methods
because we use non-standard `Seek` methods a lot. (But we keep other
requirements, while the default excludes completely disabled the
check for common method segnatures.)
- Exclude warnings about missing doc comments on exported symbols. (We
used to be pretty adamant about doc comments, but stopped that at
some point in the past. By now, we have about 500 missing doc
comments. We may consider reintroducing this check, but that's
outside of the scope of this commit. The default excludes of
golangci-lint essentially ignore doc comments completely.)
- By stop using the default excludes, we now get warnings back on
malformed doc comments. That's the most impactful change in this
commit. It does not enforce doc comments (again), but _if_ there is
a doc comment, it has to have the recommended form. (Most of the
changes in this commit are fixing this form.)
- Improve wording/spelling of some comments in .golangci.yml, and
remove an outdated comment.
- Leave `package-comments` inactive, but add a TODO asking if we
should change that.
- Add a new sub-linter `comment-spacings` (and fix corresponding
comments), which avoids missing spaces after the leading `//`.
Signed-off-by: beorn7 <beorn@grafana.com>
Just query via `HeadAndOOOQuerier`, which will skip series where no
in-order chunks are in range.
Now we don't need `OOORangeHead`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Add `HeadAndOOOQuerier` which iterates just once over series, then
where necessary merges chunks from in-order and out-of-order lists.
Add a ChunkQuerier for in-order and ooo together
Add copy-last-chunk behaviour to HeadAndOOOChunkReader
Out-of-order chunk IDs are distinguished from in-order by setting bit 23.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
avoid simultaneous compactions and reduce stress on shared resources.
This is enabled via `--enable-feature=delayed-compaction`.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
* expose hook for block querier
Signed-off-by: Ben Ye <benye@amazon.com>
* update comment
Signed-off-by: Ben Ye <benye@amazon.com>
* use defined type
Signed-off-by: Ben Ye <benye@amazon.com>
---------
Signed-off-by: Ben Ye <benye@amazon.com>
* add hook to allow head compaction to create multiple output blocks
Signed-off-by: Ben Ye <benye@amazon.com>
* change Compact interface; remove BlockPopulator changes
Signed-off-by: Ben Ye <benye@amazon.com>
* rebase main
Signed-off-by: Ben Ye <benye@amazon.com>
* fix lint
Signed-off-by: Ben Ye <benye@amazon.com>
* fix unit test
Signed-off-by: Ben Ye <benye@amazon.com>
* address feedbacks; add unit test
Signed-off-by: Ben Ye <benye@amazon.com>
* Apply suggestions from code review
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Update tsdb/compact_test.go
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
---------
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* expose hook in tsdb to allow customizing compactor
Signed-off-by: Ben Ye <benye@amazon.com>
* address comment
Signed-off-by: Ben Ye <benye@amazon.com>
---------
Signed-off-by: Ben Ye <benye@amazon.com>
use it in loadDataAsQueryable to make sure the RO Head doesn't truncate or cut new chunks in data/chunks_head/.
add a -sandbox-dir-root flag to "promtool tsdb dump/dump-openmetrics" to control the root of that sandbox dirrectory.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Jonathan Halterman <jonathan@grafana.com>
Signed-off-by: Jonathan Halterman <jhalterman@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
* Stop compactions if there's a block to write
db.Compact() checks if there's a block to write with HEAD chunks before calling db.compactBlocks().
This is to ensure that if we need to write a block then it happens ASAP, otherwise memory usage might keep growing.
But what can also happen is that we don't need to write any block, we start db.compactBlocks(),
compaction takes hours, and in the meantime HEAD needs to write out chunks to a block.
This can be especially problematic if, for example, you run Thanos sidecar that's uploading block,
which requires that compactions are disabled. Then you disable Thanos sidecar and re-enable compactions.
When db.compactBlocks() is finally called it might have a huge number of blocks to compact, which might
take a very long time, during which HEAD cannot write out chunks to a new block.
In such case memory usage will keep growing until either:
- compactions are finally finished and HEAD can write a block
- we run out of memory and Prometheus gets OOM-killed
This change adds a check for pending HEAD block writes inside db.compactBlocks(), so that
we bail out early if there are still compactions to run, but we also need to write a new
block.
Also add a test for compactBlocks.
---------
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
Dogfood native histograms.
Allow dependent projects to migrate to native histograms.
I took the defaults from client_golang.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This PR is a reference implementation of the proposal described in #10420.
In addition to what described in #10420, in this PR I've introduced labels.StableHash(). The idea is to offer an hashing function which doesn't change over time, and that's used by query sharding in order to get a stable behaviour over time. The implementation of labels.StableHash() is the hashing function used by Prometheus before stringlabels, and what's used by Grafana Mimir for query sharding (because built before stringlabels was a thing).
Follow up work
As mentioned in #10420, if this PR is accepted I'm also open to upload another foundamental piece used by Grafana Mimir query sharding to accelerate the query execution: an optional, configurable and fast in-memory cache for the series hashes.
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* tsdb/{index,compact}: allow using custom postings encoding format
We would like to experiment with a different postings encoding format in
Thanos so in this change I am proposing adding another argument to
`NewWriter` which would allow users to change the format if needed.
Also, wire the leveled compactor so that it would be possible to change
the format there too.
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
* tsdb/compact: use a struct for leveled compactor options
As discussed on Slack, let's use a struct for the options in leveled
compactor.
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
* tsdb: make changes after Bryan's review
- Make changes less intrusive
- Turn the postings encoder type into a function
- Add NewWriterWithEncoder()
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
---------
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Digging around the TSDB code and I've found that this flag is unused so
let's remove it.
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
* Add failing test.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Don't run OOO head garbage collection while reads are running.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Add further test cases for different order of operations.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Ensure all queriers are closed if `DB.blockChunkQuerierForRange()` fails.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Ensure all queriers are closed if `DB.Querier()` fails.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Invert error handling in `DB.Querier()` and `DB.blockChunkQuerierForRange()` to make it clearer
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Ensure that queries that touch OOO data can't block OOO head garbage collection forever.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address PR feedback: fix parameter name in comment
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>
* Address PR feedback: use `lastGarbageCollectedMmapRef`
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address PR feedback: ensure pending reads are cleaned up if creating an OOO querier fails
Signed-off-by: Charles Korn <charles.korn@grafana.com>
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
On a 32 bit architecture the size of int is 32 bits. Thus converting from
int64, uint64 can overflow it and flip the sign.
Try for yourself in playground:
package main
import "fmt"
func main() {
x := int64(0x1F0000001)
y := int64(1)
z := int32(x - y) // numerically this is 0x1F0000000
fmt.Printf("%v\n", z)
}
Prints -268435456 as if x was smaller.
Followup to #12650
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Additionally wrap WBL replay error
Although WBL replay is already wrapped with errLoadWbl,
there are other errors that can happen during a WBL replay.
We should not try to repair WAL in those cases.
This commit additionally wraps the final error in Head.Init again
with errLoadWbl so that WBL replay errors can be identified properly.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
Otherwise we have a highly unusual situation of over 100 chunks
in the headChunks list of each series, which heavily skews
performance.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Simlar to cleanup of WAL files on startup, cleanup temporary
chunk_snapshot dirs. This prevents storage space leaks due to terminated
snapshots on shutdown.
Signed-off-by: SuperQ <superq@gmail.com>
Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks.
When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call.
If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending
our sample to it.
Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed.
When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes.
Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait
for it to be mmapped.
If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting
queries and scrapes.
Queries might timeout, since by default they have a 2 minute timeout set.
Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries
or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything.
To avoid this we need to remove mmapping from append path, since mmapping is blocking.
But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later.
This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples,
while older, yet to be mmapped, chunks are linked to it.
Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger
it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>