Move writing memSeries lastHistogramValue and lastFloatHistogramValue
after series creation under lock.
The resulting code isn't totally correct in the sense that we're setting
these values before Commit() , so they might be overwritten/rolled back
later.
Also Append of stale sample checks the values without lock, so there's
still a potential race.
The correct solution would be to set these only in Commit() which we
actually do, but then Commit() would also need to process samples in
order and not floats first, then histograms, then float histograms - which
leads to not knowing what stale marker to write for histograms.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* tsdb: mmapCurrentOOOHeadChunk prepare for multiple ooo chunks
Currently float samples can only create a single ooo head chunk, but
native histograms can result in multiple due to counter resets, etc.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* tsdb: getOOOSeriesChunks prepare for multiple ooo chunks
Currently float samples can only create a single ooo head chunk, but
native histograms can result in multiple due to counter resets, etc.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
In `mmapCurrentOOOHeadChunk`, check if the number is at the maximum and
drop the data with an error log. This is not expected to happen as the
maximum is over 8 million; that's 8 years of 1 sample every second.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Rename a variable.
Add parameters to memSeries.insert function.
No effect on how float samples are handled.
Related to #14546
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Discovered while working on #14546 OOO native histograms.
Not triggered on main before #14546 as the code path is unused.
There was a bug where the min time of a chunk was adjusted even
if it was only recoded and not completely new.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Fix appendable: check whether last val was a histogram
When appending a float, we were checking whether lastValue was equal to
current value, but we didn't check whether last value was a float value.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* TSDB: Document what needs locking in memSeries
* TSDB: Lock around access to series labels
So we can modify them to reset the symbol-table.
* TSDB: Make label locking conditional on build tag
---------
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Now the error will include the timestamp and the existing and new values.
When you are trying to track down the source of this error, it can be
useful to see that the values are close, or alternating, or something
else.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* TSDB: Don't compact the head block when empty
Don't compact the Head block if there have not yet been any samples
appended.
Previously, the logic for determining if the head should be compacted
relied on the default values for min and max time and integer overflow
when they were checked in `Head.compactable()`. The check in
`Head.compactable()` effectively did `math.MinInt64 - math.MaxInt64`
which overflowed and wrapped to `1`. Since `1` is less than `1.5`
times the chunk range, compaction did not happen. This was the correct
behavior but relying on overflow wrapping is surprising.
This change add a method for checking if the min and max time for the
head is unset and uses it to short-circuit compaction in that case.
It also replaces several explicit checks for the default value to
determine if the head has not yet had any samples added.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* tsdb: zero out Labels and memSeries pointers in pool
So that the garbage-collector doesn't see this memory as still in use.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
---------
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Mutex is 8 bytes; RWMutex is 24 bytes and much more complicated. Since
`RLock` is only used in two places, `UpdateMetadata` and `Delete`,
neither of which are hotspots, we should use the cheaper one.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
They are used in multiple repos, so common is a better place for them.
Several packages now don't depend on `model/textparse`, e.g.
`storage/remote`.
Also remove `metadata` struct from `api.go`, since it was identical to
a struct in the `metadata` package.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Append created timestamps.
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
* Log when created timestamps are ignored
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
* Proposed changes to Append CT PR.
Changes:
* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.
Signed-off-by: bwplotka <bwplotka@gmail.com>
* Updates.
Signed-off-by: bwplotka <bwplotka@gmail.com>
* Addressed comments.
Signed-off-by: bwplotka <bwplotka@gmail.com>
* Refactor head_appender test
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
* Fix linter issues
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
* Use model.Sample in head appender test
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
---------
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
* Fix panic during tsdb Commit
Fixes the following
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x19deb45]
goroutine 651118930 [running]:
github.com/prometheus/prometheus/tsdb.(*headAppender).Commit(0xc19100f7c0)
/drone/src/vendor/github.com/prometheus/prometheus/tsdb/head_append.go:855 +0x245
github.com/prometheus/prometheus/tsdb.dbAppender.Commit({{0x35bd6f0?, 0xc19100f7c0?}, 0xc000fa4c00?})
/drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1159 +0x2f
We theorize that the panic happened due the the series referenced by the
exemplar being removed between AppendExemplar and Commit due to being idle.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
When samples are committed in the head, they are also written to the WAL.
The order of WAL records should be sample then exemplar, but this was
not the case for native histogram samples. This PR fixes that.
The problem with the wrong order is that remote write reads the WAL and
sends the recorded timeseries in the WAL order, which means exemplars
arrived before histogram samples. If the receiving side is Prometheus
TSDB and the series has not existed before then the exemplar does not
currently create the series. Which means the exemplar is rejected and lost.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Before cutting a new XOR chunk in case the chunk goes over the size
limit, check that the timestamp is in order and not equal or older
than the latest sample in the old chunk.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Add a chunk size limit in bytes
This creates a hard cap for XOR chunks of 1024 bytes.
The limit for histogram chunk is also 1024 bytes, but it is a soft limit as a histogram has a dynamic size, and even a single one could be larger than 1024 bytes.
This also avoids cutting new histogram chunks if the existing chunk has fewer than 10 histograms yet. In that way, we are accepting "jumbo chunks" in order to have at least 10 histograms in a chunk, allowing compression to kick in.
Signed-off-by: Justin Lei <justin.lei@grafana.com>
So far, `ValidateHistogram` would not detect if the count did not
include the count in the zero bucket. This commit fixes the problem
and updates all the tests that have been undetected offenders so far.
Note that this problem would only ever create false negatives, so we
never falsely rejected to store a histogram because of it.
On the other hand, `ValidateFloatHistogram` has been to strict with
the count being at least as large as the sum of the counts in all the
buckets. Float precision issues could create false positives here, see
products of PromQL evaluations, it's actually quite hard to put an
upper limit no the floating point imprecision. Users could produce the
weirdest expressions, maxing out float precision problems. Therefore,
this commit simply removes that particular check from
`ValidateFloatHistogram`.
Signed-off-by: beorn7 <beorn@grafana.com>
The most common case is to have a nil error when appending series, so
let's check that first instead of checking the 3 error types first.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks.
When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call.
If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending
our sample to it.
Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed.
When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes.
Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait
for it to be mmapped.
If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting
queries and scrapes.
Queries might timeout, since by default they have a 2 minute timeout set.
Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries
or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything.
To avoid this we need to remove mmapping from append path, since mmapping is blocking.
But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later.
This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples,
while older, yet to be mmapped, chunks are linked to it.
Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger
it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
* WIP implement WAL watcher reading via notifications over a channel from
the TSDB code
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Notify via head appenders Commit (finished all WAL logging) rather than
on each WAL Log call
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix misspelled Notify plus add a metric for dropped Write notifications
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update tests to handle new notification pattern
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* this test maybe needs more time on windows?
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* does this test need more time on windows as well?
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* read timeout is already a time.Duration
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* remove mistakenly commited benchmark data files
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* address some review feedback
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix missed changes from previous commit
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix issues from wrapper function
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* try fixing race condition in test by allowing tests to overwrite the
read ticker timeout instead of calling the Notify function
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix linting
Signed-off-by: Callum Styan <callumstyan@gmail.com>
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.
The exceptions that I have found in our codebase are just these two:
* The `if else` is followed by an additional statement before the next
condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
used. In this case, using `switch` would require tagging the `for`
loop, which probably tips the balance.
Why are `switch` statements more readable?
For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.
I'm sure the aforemention wise coders can list even more reasons.
In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.
Signed-off-by: beorn7 <beorn@grafana.com>