* Export single ith test histogram generation functions
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Do not set counter reset hint for non-gauge histograms individually
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Apply suggestions from code review
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* tsdb: make sharding function a parameter
Instead of relying on `labels.Hash()`, which may change, have the
caller pass in a shard function if required.
For most purposes `tsdb.Options.ShardFunc` is used, but the compactor
may be created independently so `NewLeveledCompactorWithChunkSize` also
takes a shard function parameter.
Regular Prometheus, which does not use block sharding, will have this
parameter as nil.
Rename WithCache functions as WithOptions
Where they now have 2 or more extra parameters.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This is a bit more conservative than we could be. As long as a chunk
isn't the first in a block, we can be pretty sure that the previous
chunk won't disappear. However, the incremental gain of returning
NotCounterReset in these cases is probably very small and might not be
worth the code complications.
Wwith this, we now also pay attention to an explicitly set counter
reset during ingestion. While the case doesn't show up in practice
yet, there could be scenarios where the metric source knows there was
a counter reset even if it might not be visible from the values in the
histogram. It is also useful for testing.
Signed-off-by: beorn7 <beorn@grafana.com>
- Adjust doc comments to go1.19 style.
- Break down some overly long lines.
- Minor doc comment tweaks and fixes.
- Some renaming.
Some rationales for the last point:
I have renamed “interjections” into “inserts”, mostly because it is
shorter, and the word shows up a lot by now (and the concept is
cryptic enough to not obfuscate it even more with abbreviations).
I have also tried to find more descriptive naming for the “compare
spans” functions.
Signed-off-by: beorn7 <beorn@grafana.com>
This is an optimization on the existing append in OOOChunk.
What we've been doing so far is find the place inside the out-of-order
slice where the new sample should go in and then place it there and move
any samples to the right if necessary. This is OK but requires a binary
search every time the slice is bigger than 0.
The optimization is opinionated and suggests that although out-of-order
samples can be out-of-order amongst themselves they'll probably be in
order thus we can probably optimistically append at the end and if not
do the binary search.
OOOChunks are capped to 30 samples by default so this is a small
optimization but everything adds up, specially if you handle many active
timeseries with out-of-order samples.
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* adapt code.go and write_handler.go to support float histograms
* adapt watcher.go to support float histograms
* wip adapt queue_manager.go to support float histograms
* address comments for metrics in queue_manager.go
* set test cases for queue manager
* use same counts for histograms and float histograms
* refactor createHistograms tests
* fix float histograms ref in watcher_test.go
* address PR comments
Signed-off-by: Marc Tuduri <marctc@protonmail.com>
Instead of relying on `labels.Hash()`, which may change, have the
caller pass in a shard function if required.
For most purposes `tsdb.Options.ShardFunc` is used, but the compactor
may be created independently so `NewLeveledCompactorWithChunkSize` also
takes a shard function parameter.
Regular Prometheus, which does not use block sharding, will have this
parameter as nil.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This is to check if a gauge histogram can be appended to the given chunk.
If not, it tells what changes to make to the chunk and the histogram
if possible.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
In the head and in v1 postings on disk, it makes no difference whether
postings are sorted. Only for v2 does the code step through in order.
So, move the sorting to where it is required, and thus skip it entirely
in the head.
Label values in on-disk blocks are already sorted, but `slices.Sort` is
very fast on already-sorted data so we don't bother checking.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This allocates memory for all the returned values, which skews the
result. We aren't trying to benchmark `ExpandPostings`, so just step
through all the values without storing them to consume them.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Previously all the postings constructed were consumed on the first
iteration, so subsequent iterations did no work.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
When out-of-order is enabled, queries go through both Head and OOOHead,
and they both execute the same PostingsForMatchers call, as memSeries
are shared for both.
In some cases these calls can be heavy, and also frequent. We can
deduplicate those calls by using the PostingsForMatchers cache that we
already use for query sharding.
The usage of this cache can skip a newly appended series in the results
for the duration of the ttl.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, WAL replay, for this new type of histogram.
Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Instead of passing in a `ScratchBuilder` and `Labels`, just pass the
builder and the caller can extract labels from it. In many cases the
caller didn't use the Labels value anyway.
Now in `Labels.ScratchBuilder` we need a slightly different API: one
to assign what will be the result, instead of overwriting some other
`Labels`. This is safer and easier to reason about.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Implement decoding via labels.ScratchBuilder, which we retain and re-use
to reduce memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This necessitates a change to the `tsdb.IndexReader` interface:
`index.Reader` is used from multiple goroutines concurrently, so we
can't have state in it.
We do retain a `ScratchBuilder` in `blockBaseSeriesSet` which is
iterator-like.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use simpler utility function to create Labels objects, making fewer
assumptions about the data structure.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Re-use previous memory if it is already of the correct type.
In `NewListSeries` we hoist the conversion to an interface value out
so it only allocates once.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Re-use previous memory if it is already of the correct type.
Also turn two levels of function closure into a single object that
holds the required data.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Patterned after `Chunk.Iterator()`: pass the old iterator in so it
can be re-used to avoid allocating a new object.
(This commit does not do any re-use; it is just changing all the method
signatures so re-use is possible in later commits.)
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This adds negative buckets and access of float histograms to
TestHistogramChunkSameBuckets and TestHistogramChunkBucketChanges.
It also exercises a specific pattern of reusing an iterator (one where
no access has happened).
This exposes two bugs (where entries for positive buckets where used
where the corresponding entries for negative buckets should have been
used). One was fixed in #11627 (not merged), which triggered the work
in this commit.
This commit fixes both issues, so #11627 can be closed.
It also simplifies the code in the histogramIterator.Next method that
aims to recycle existing slice capacity.
Furthermore, this is on top of the release-2.40 branch because we
should probably cut a bugfix release for this.
Signed-off-by: beorn7 <beorn@grafana.com>
Typical parameters are one hour by 1 minute step, where the
function would allocate a slice of 3.6 million samples instead of 60.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Switch from 'sanity' to more inclusive lanuage
"Removing ableist language in code is important; it helps to create and
maintain an environment that welcomes all developers of all backgrounds,
while emphasizing that we as developers select the most articulate,
precise, descriptive language we can rather than relying on metaphors.
The phrase sanity check is ableist, and unnecessarily references mental
health in our code bases. It denotes that people with mental illnesses
are inferior, wrong, or incorrect, and the phrase sanity continues to be
used by employers and other individuals to discriminate against these
people."
From https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Inverting the test for chunks deleted by tombstones makes all three
rejections consistent, and also avoids the case where a chunk is
excluded but still causes `trimFront` or `trimBack` to be set.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This reduces garbage, hence goes faster, when a short time range is
required compared to the amount of chunks in the block. For example
recording rules and alerts often look only at the last few minutes.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* matcher.go: restore comment from upstream
There's no reason to remove this comment.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* head_append.go: Cortex to Mimir
This repo is used in Mimir.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add BenchmarkOpenBlock
* Use specific types when reading offset table
Instead of reading a generic-ish []string, we can read a generic type
which would be specifically labels.Label.
This avoid allocating a slice that escapes to the heap, making it both
faster and more efficient in terms of memory management.
* Update error message for unexpected number of keys
* s/posting offset table/postings offset table/
* Remove useless lastKey assignment
* Use two []bytes vars, simplify
Applied PR feedback: removed generics, moved the label indices reading
to that specific test as we're not using it in production anyway, we're
just testing what we've just built.
Also using two []bytes variables for name and value that use the backing
buffer instead of using strings, this reduces allocations a lot as we
only copy them when we store them (this is optimized by the compiler).
* Fix the dumb bug
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Function arguments in defer evaluated during definition of defer, not
during execution
Signed-off-by: Slavik Panasovets <slavik@google.com>
Signed-off-by: Slavik Panasovets <slavik@google.com>
With these changes, the "happy path" when the leading and trailing
number of bits don't need an update, fewer operations are needed.
The change is probably very marginal (no change in the benchmark added
here, but the benchmark also doesn't cover non-changing values), and
an argument could me made that avoiding pointers also has its
benefits.
However, I think that reducing the number of return values improves
readability. Which convinced me that I should at least propose this.
Signed-off-by: beorn7 <beorn@grafana.com>
The wlog.WL type can now be used to create a Write Ahead Log or a Write
Behind Log.
Before the prefix for wbl metrics was
'prometheus_tsdb_out_of_order_wal_' and has been replaced with
'prometheus_tsdb_out_of_order_wbl_'.
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
This call was added by PR #11075 merged before #11318 which changed all
similar calls to `sort.Sort` into a faster one.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use new experimental package `golang.org/x/exp/slices`.
slices.Sort works on values that are directly comparable, like ints,
so avoids the overhad of an interface call to `.Less()`.
Left tests unchanged, because they don't need the speed and it may be
a cross-check that slices.Sort gives the same answer.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use new experimental package `golang.org/x/exp/slices`.
Some of the speedup comes from comparing SeriesRef (which is an int64)
directly rather than through an interface `.Less()` call; some comes
from exp/slices using "pattern-defeating quicksort(pdqsort)".
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb/agent: fix application of defaults
MaxTS was being incorrectly constrained to the truncation interval
* add more tests to check validation
* force MaxWALTime = MinWALTime if min > max
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* TSDB chunks: remove race between writing and reading
Because the data is stored as a bit-stream, the last byte in the stream
could change if the stream is appended to after an Iterator is obtained.
Copy the last byte when the Iterator is created, so we don't have to
read it later.
Clarify in comments that concurrent Iterator and Appender are allowed,
but the chunk must not be modified while an Iterator is created.
(This was already the case, in order to copy the bstream slice header.)
* TSDB: stop saving last 4 samples in memSeries
This extra copy of the last 4 samples was introduced to avoid a race
condition between reading the last byte of the chunk and writing to it.
But now we have fixed that by having `bstreamReader` copy the last byte,
we don't need to copy the last 4 samples.
This change saves 56 bytes per series, which is very worthwhile when
you have millions or tens of millions of series.
* TSDB: tidy up stopIterator re-use
Previous changes have left this code duplicating some lines; pull
them out to a separate function and tidy up.
* TSDB head_test: stop checking when iterators are wrapped
The behaviour has changed so chunk iterators are only wrapped when
transaction isolation requires them to stop short of the end.
This makes tests fail which are checking the type.
Tests should check the observable behaviour, not the type.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* tsdb: add a basic test for read/write isolation
* tsdb: store the min time with isolationAppender
So that we can see when appending has moved past a certain point in time.
* tsdb: allow RangeHead to have isolation disabled
This will be used when for head compaction.
* tsdb: do head compaction with isolation disabled
This saves a lot of work tracking appends done while compaction is ongoing.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb: remove chunkRange from memSeries
chunkRange is the (oddly-named) configured duration for the head block.
We don't need a copy of this value per series. Pass it down where
required, and remove the copy.
The value in `Head` is only updated in `resetInMemoryState()`, which
also discards all `memSeries`.
* tsdb: remove oooCapMax from memSeries
oooCapMax is the configured maximum capacity for an out-of-order chunk.
Storing it per-series uses extra memory, and has surprising behaviour
if users change the value in config - series created before the change
will keep their old value.
Instead, pass it down where required, and remove the per-series value.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Introduce out-of-order TSDB support
This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing
This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.
Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886
Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* gofumpt files
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Add license header to missing files
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix OOO tests due to existing chunk disk mapper implementation
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix truncate int overflow
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Add Sync method to the WAL and update tests
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* remove useless sync
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Update minOOOTime after truncating Head
* Update minOOOTime after truncating Head
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix lint
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Add a unit test
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Load OutOfOrderTimeWindow only once per appender
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix OOO Head LabelValues and PostingsForMatchers
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix replay of OOO mmap chunks
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Remove unnecessary err check
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Prevent panic with ApplyConfig
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Run OOO compaction after restart if there is OOO data from WBL
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Apply Bartek's suggestions
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Refactor OOO compaction
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Address comments and TODOs
- Added a comment explaining why we need the allow overlapping
compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Run go format
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix remaining review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix tests
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
* Fix TestWBLAndMmapReplay test failure on windows
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Address most of the feedback
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Refactor the block meta for out of order
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix windows error
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Fix review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
If some shards did not get any samples mapped, the buffer will be empty
so sending it over the chan to `processWALSamples()` is a waste of time.
This is especially likely now we are checking `minValidTime` before
sending.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The chunk pool belongs to the head not to the series. Pass it down where
required, and remove the copy of the pointer that `memSeries` was
holding.
`safeChunk` also needs to hold it, because in scenarios where it is used
we don't have a reference to the head. However it was already holding
`chunkDiskMapper` for the same reason, so no big change.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
There's no point splitting a sample onto the right shard and checking
if the series needs to be re-mapped, if we're only going to discard it
once it arrives at `ProcessWALSamples()`. Simply discard it earlier.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add runtime config to control native histogram ingestion
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Make the config into a CLI flag
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
I find it useful to know why a restriction exists, to check whether that
reason still applies, or in which other places it might apply.
This is based on the note here:
https://github.com/prometheus/prometheus/pull/9270#pullrequestreview-743820956
on the PR where the original comment was added.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update go to 1.19, set min version to 1.18
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
* Update golangci-lint
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>