Commit graph

128 commits

Author SHA1 Message Date
Carrie Edwards a046417bc0 Use new record type only for NHCB 2024-12-06 13:46:20 -08:00
Carrie Edwards f8a39767a4 Update WAL doc to include native histogram encodings 2024-12-05 09:21:47 -08:00
Carrie Edwards 6684344026 Rename old histogram record type, use old names for new records 2024-12-05 09:21:47 -08:00
Carrie Edwards 454f6d39ca Add separate handling for histograms and custom bucket histograms 2024-12-05 09:21:47 -08:00
Carrie Edwards 37df50adb9 Attempt for record type 2024-12-05 09:21:47 -08:00
Carrie Edwards cfcd51538d Remove references to custom values record 2024-12-05 09:21:47 -08:00
Carrie Edwards 6d413fad36 Use histogram records for custom value handling 2024-12-05 09:21:47 -08:00
Carrie Edwards aa144b7263 Handle custom buckets in WAL and WBL 2024-12-05 09:21:47 -08:00
Bryan Boreham f42b37ff2f
[BUGFIX] TSDB: Fix race on stale values in headAppender (#15322)
* [BUGFIX] TSDB: Fix race on stale values in headAppender

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Simplify

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

---------

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-06 16:51:39 +01:00
Nicolas Takashi b6c538972c
[REFACTORY] simplify appender commit (#15112)
* [REFACTOR] simplify appender commit

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-10-29 12:34:02 +00:00
Pedro Tanaka bab587b9dc
Agent: allow for ingestion of CT samples (#15124)
Some checks failed
CI / Go tests (push) Has been cancelled
CI / More Go tests (push) Has been cancelled
CI / Go tests with previous Go version (push) Has been cancelled
CI / UI tests (push) Has been cancelled
CI / Go tests on Windows (push) Has been cancelled
CI / Mixins tests (push) Has been cancelled
CI / Build Prometheus for common architectures (0) (push) Has been cancelled
CI / Build Prometheus for common architectures (1) (push) Has been cancelled
CI / Build Prometheus for common architectures (2) (push) Has been cancelled
CI / Build Prometheus for all architectures (0) (push) Has been cancelled
CI / Build Prometheus for all architectures (1) (push) Has been cancelled
CI / Build Prometheus for all architectures (10) (push) Has been cancelled
CI / Build Prometheus for all architectures (11) (push) Has been cancelled
CI / Build Prometheus for all architectures (2) (push) Has been cancelled
CI / Build Prometheus for all architectures (3) (push) Has been cancelled
CI / Build Prometheus for all architectures (4) (push) Has been cancelled
CI / Build Prometheus for all architectures (5) (push) Has been cancelled
CI / Build Prometheus for all architectures (6) (push) Has been cancelled
CI / Build Prometheus for all architectures (7) (push) Has been cancelled
CI / Build Prometheus for all architectures (8) (push) Has been cancelled
CI / Build Prometheus for all architectures (9) (push) Has been cancelled
CI / Check generated parser (push) Has been cancelled
CI / golangci-lint (push) Has been cancelled
CI / fuzzing (push) Has been cancelled
CI / codeql (push) Has been cancelled
Scorecards supply-chain security / Scorecards analysis (push) Has been cancelled
CI / Report status of build Prometheus for all architectures (push) Has been cancelled
CI / Publish main branch artifacts (push) Has been cancelled
CI / Publish release artefacts (push) Has been cancelled
CI / Publish UI on npm Registry (push) Has been cancelled
* Remove unused option from HeadOptions

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Improve docs for appendable() method in head appender

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Ingest CT (float) samples in Agent DB

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* allow for ingestion of CT native histogram

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding some verification for ct ts

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Validating CT histogram before append and add newly created series to pending series

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* checking the wal for written samples

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Checking for samples in test

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding case for validations

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing comparison when dedupelabels is enabled

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* unite tests, use table testing

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Implement CT related methods in timestampTracker for write storage

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding error case to test

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* removing unused fields

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Updating lastTs for series when adding CT to invalidate duplicates

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* making sure that updating the lastTS wont cause OOO later on in Commit();

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-27 01:06:34 +01:00
Vanshika cccbe72514
TSDB: Fix some edge cases when OOO is enabled (#14710)
Some checks are pending
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run
Fix some edge cases when OOO is enabled

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-10-23 17:34:28 +02:00
György Krajcsovits bb70370d72 TSDB head: fix race between AppendHistogram and Commit
Move writing memSeries lastHistogramValue and lastFloatHistogramValue
after series creation under lock.

The resulting code isn't totally correct in the sense that we're setting
these values before Commit() , so they might be overwritten/rolled back
later.

Also Append of stale sample checks the values without lock, so there's
still a potential race.

The correct solution would be to set these only in Commit() which we
actually do, but then Commit() would also need to process samples in
order and not floats first, then histograms, then float histograms - which
leads to not knowing what stale marker to write for histograms.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-10 16:59:15 +02:00
TJ Hoplock 6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Arthur Silva Sens 6bd9b1a7cc
Histogram CT Zero ingestion
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-09-26 11:29:22 -03:00
Carrie Edwards 14e3c05ce8
tsdb: Add support for ingestion of out-of-order native histogram samples (#14546)
Add support for ingesting OOO native histograms

* Add flag for enabling and disabling OOO native histogram ingestion
* Update OOO querying tests to include native histogram samples
* Add OOO head tests
* Add test for OOO native histogram counter reset headers

Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored by: Carrie Edwards <edwrdscarrie@gmail.com>
Co-authored by: Jeanette Tan <jeanette.tan@grafana.com>
Co-authored by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored by: Fiona Liao <fiona.liao@grafana.com>
2024-09-17 11:19:06 +02:00
George Krajcsovits 5aa3d8260a
TSDB: OOO native histograms: prep for multiple ooo head chunks (#14850)
Some checks failed
CI / Go tests (push) Has been cancelled
CI / More Go tests (push) Has been cancelled
CI / Go tests with previous Go version (push) Has been cancelled
CI / UI tests (push) Has been cancelled
CI / Go tests on Windows (push) Has been cancelled
CI / Mixins tests (push) Has been cancelled
CI / Build Prometheus for common architectures (0) (push) Has been cancelled
CI / Build Prometheus for common architectures (1) (push) Has been cancelled
CI / Build Prometheus for common architectures (2) (push) Has been cancelled
CI / Build Prometheus for all architectures (0) (push) Has been cancelled
CI / Build Prometheus for all architectures (1) (push) Has been cancelled
CI / Build Prometheus for all architectures (10) (push) Has been cancelled
CI / Build Prometheus for all architectures (11) (push) Has been cancelled
CI / Build Prometheus for all architectures (2) (push) Has been cancelled
CI / Build Prometheus for all architectures (3) (push) Has been cancelled
CI / Build Prometheus for all architectures (4) (push) Has been cancelled
CI / Build Prometheus for all architectures (5) (push) Has been cancelled
CI / Build Prometheus for all architectures (6) (push) Has been cancelled
CI / Build Prometheus for all architectures (7) (push) Has been cancelled
CI / Build Prometheus for all architectures (8) (push) Has been cancelled
CI / Build Prometheus for all architectures (9) (push) Has been cancelled
CI / Check generated parser (push) Has been cancelled
CI / golangci-lint (push) Has been cancelled
CI / fuzzing (push) Has been cancelled
CI / codeql (push) Has been cancelled
Scorecards supply-chain security / Scorecards analysis (push) Has been cancelled
CI / Report status of build Prometheus for all architectures (push) Has been cancelled
CI / Publish main branch artifacts (push) Has been cancelled
CI / Publish release artefacts (push) Has been cancelled
CI / Publish UI on npm Registry (push) Has been cancelled
* tsdb: mmapCurrentOOOHeadChunk prepare for multiple ooo chunks

Currently float samples can only create a single ooo head chunk, but
native histograms can result in multiple due to counter resets, etc.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* tsdb: getOOOSeriesChunks prepare for multiple ooo chunks

Currently float samples can only create a single ooo head chunk, but
native histograms can result in multiple due to counter resets, etc.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-09-11 23:55:39 +01:00
Arve Knudsen 3a78e76282 Upgrade golangci-lint to v1.60.1
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-18 12:13:25 +02:00
Bryan Boreham 512c67ec26 TSDB: Never go over maximum number of OOO chunks
In `mmapCurrentOOOHeadChunk`, check if the number is at the maximum and
drop the data with an error log. This is not expected to happen as the
maximum is over 8 million; that's 8 years of 1 sample every second.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-08-14 13:41:59 +01:00
György Krajcsovits 41656162fc tsdb: prepare inserting native histograms into OOO head
Rename a variable.
Add parameters to memSeries.insert function.

No effect on how float samples are handled.

Related to #14546

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-08-14 11:13:47 +02:00
George Krajcsovits cf62fb5c44
Merge pull request #14629 from krajorama/fix-to-encoded-chunks
Some checks are pending
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run
Fix ToEncodedChunks minT for recoded chunks
2024-08-08 20:00:31 +02:00
György Krajcsovits 1ea3781699 Fix ToEncodedChunks minT for recoded chunks
Discovered while working on #14546 OOO native histograms.
Not triggered on main before #14546 as the code path is unused.

There was a bug where the min time of a chunk was adjusted even
if it was only recoded and not completely new.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-08-08 15:22:46 +02:00
Oleg Zaytsev 0833d2a230
Fix appendable: check whether last val was a histogram (#14613)
* Fix appendable: check whether last val was a histogram

When appending a float, we were checking whether lastValue was equal to
current value, but we didn't check whether last value was a float value.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-08-07 15:02:59 +02:00
Carrie Edwards a28d1974b4
Merge pull request #14438 from prometheus/cedwards/ooo-chunk-encoding
tsdb: Add support for handling multiple chunks in OOO head
2024-07-16 11:12:23 -07:00
Carrie Edwards 7e020bb4e9 Refactor
Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>
2024-07-16 10:53:23 -07:00
Carrie Edwards e0d7cef545 Rename some variables in Commit()
Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>
2024-07-16 10:53:23 -07:00
Carrie Edwards 2e0e4e9ce9 Add support for handling multiple chunks in OOO head
Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>
Co-authored by: Jeanette Tan <jeanette.tan@grafana.com>:
Co-authored by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>:
Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>
Co-authored by: Fiona Liao <fiona.liao@grafana.com>:
2024-07-16 10:53:09 -07:00
Bryan Boreham 709c5d6fc3
TSDB: Lock around access to labels in head under -tags dedupelabels (#14322)
* TSDB: Document what needs locking in memSeries

* TSDB: Lock around access to series labels

So we can modify them to reset the symbol-table.

* TSDB: Make label locking conditional on build tag

---------

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-07-05 10:11:32 +01:00
Bryan Boreham 42b546a43d
tsdb: add details to duplicate sample error (#13277)
Now the error will include the timestamp and the existing and new values.
When you are trying to track down the source of this error, it can be
useful to see that the values are close, or alternating, or something
else.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-06-04 08:54:09 +01:00
Arve Knudsen d699dc3c77
Fix language in docs and comments (#14041)
Fix language in docs and comments

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-05-08 17:57:09 +02:00
Matthieu MOREL 6f595c6762
golangci-lint: enable whitespace linter (#13905)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-11 09:27:54 +01:00
komisan19 0249e080b4 refactor: utilize standard functions max/min
Signed-off-by: komisan19 <18901496+komisan19@users.noreply.github.com>
2024-04-04 03:15:38 +09:00
Nick Pillitteri 481f14e1c0
TSDB: Don't rely on integer overflow in head compaction check (#13755)
* TSDB: Don't compact the head block when empty

Don't compact the Head block if there have not yet been any samples
appended.

Previously, the logic for determining if the head should be compacted
relied on the default values for min and max time and integer overflow
when they were checked in `Head.compactable()`. The check in
`Head.compactable()` effectively did `math.MinInt64 - math.MaxInt64`
which overflowed and wrapped to `1`. Since `1` is less than `1.5`
times the chunk range, compaction did not happen. This was the correct
behavior but relying on overflow wrapping is surprising.

This change add a method for checking if the min and max time for the
head is unset and uses it to short-circuit compaction in that case.
It also replaces several explicit checks for the default value to
determine if the head has not yet had any samples added.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
2024-03-26 12:17:38 +01:00
Bryan Boreham bbe39af99f
tsdb: zero out Labels and memSeries pointers in pool (#13712)
* tsdb: zero out Labels and memSeries pointers in pool

So that the garbage-collector doesn't see this memory as still in use.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

---------

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-03-06 13:36:34 +01:00
Fiona Liao 841a133514 Move histogramsAppended to be more consistent
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2024-02-21 11:15:04 +00:00
Fiona Liao 52389647b2 Add type label to outOfOrderSamplesAppended metric
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2024-02-19 15:24:39 +00:00
Bryan Boreham 66237c1996 tsdb: use cheaper Mutex on series
Mutex is 8 bytes; RWMutex is 24 bytes and much more complicated. Since
`RLock` is only used in two places, `UpdateMetadata` and `Delete`,
neither of which are hotspots, we should use the cheaper one.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-26 11:01:39 +00:00
Bryan Boreham 8065bef172 Move metric type definitions to common/model
They are used in multiple repos, so common is a better place for them.
Several packages now don't depend on `model/textparse`, e.g.
`storage/remote`.

Also remove `metadata` struct from `api.go`, since it was identical to
a struct in the `metadata` package.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-12-19 18:56:54 +00:00
Matthieu MOREL 8f6cf3aabb tsdb: use Go standard errors
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-11 12:18:54 +00:00
Arthur Silva Sens 5082655392
Append Created Timestamps (#12733)
* Append created timestamps.

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Log when created timestamps are ignored

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Proposed changes to Append CT PR.

Changes:

* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Updates.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Refactor head_appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Fix linter issues

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Use model.Sample in head appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

---------

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
2023-12-11 08:43:42 +00:00
George Krajcsovits acc114fe55
Fix panic during tsdb Commit (#13092)
* Fix panic during tsdb Commit

Fixes the following

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x19deb45]

goroutine 651118930 [running]:
github.com/prometheus/prometheus/tsdb.(*headAppender).Commit(0xc19100f7c0)
	/drone/src/vendor/github.com/prometheus/prometheus/tsdb/head_append.go:855 +0x245
github.com/prometheus/prometheus/tsdb.dbAppender.Commit({{0x35bd6f0?, 0xc19100f7c0?}, 0xc000fa4c00?})
	/drone/src/vendor/github.com/prometheus/prometheus/tsdb/db.go:1159 +0x2f

We theorize that the panic happened due the the series referenced by the
exemplar being removed between AppendExemplar and Commit due to being idle.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-11-12 14:51:37 +00:00
George Krajcsovits 39a35d92bc
tsdb/head: wlog exemplars after samples (#13113)
When samples are committed in the head, they are also written to the WAL.
The order of WAL records should be sample then exemplar, but this was
not the case for native histogram samples. This PR fixes that.

The problem with the wrong order is that remote write reads the WAL and
sends the recorded timeseries in the WAL order, which means exemplars
arrived before histogram samples. If the receiving side is Prometheus
TSDB and the series has not existed before then the exemplar does not
currently create the series. Which means the exemplar is rejected and lost.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-11-11 17:30:16 +01:00
Linas Medziunas ebed7d0612 Change Validate to be a method on histogram structs
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2023-11-03 16:47:59 +02:00
Linas Medziunas 1f8aea11d6 Move histogram validation code to model/histogram
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2023-11-03 16:17:24 +02:00
Linas Medziunas 1cd6c1cde5 ValidateHistogram: strict Count check in absence of NaNs
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2023-11-03 16:17:24 +02:00
György Krajcsovits 9dbd100a5e Refactor solution to not repeat code
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-09-20 15:54:00 +02:00
György Krajcsovits 96d03b6f46 Fix duplicate sample detection at chunks size limit
Before cutting a new XOR chunk in case the chunk goes over the size
limit, check that the timestamp is in order and not equal or older
than the latest sample in the old chunk.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-09-20 14:49:56 +02:00
Justin Lei 8ef7dfdeeb
Add a chunk size limit in bytes (#12054)
Add a chunk size limit in bytes

This creates a hard cap for XOR chunks of 1024 bytes.

The limit for histogram chunk is also 1024 bytes, but it is a soft limit as a histogram has a dynamic size, and even a single one could be larger than 1024 bytes.

This also avoids cutting new histogram chunks if the existing chunk has fewer than 10 histograms yet. In that way, we are accepting "jumbo chunks" in order to have at least 10 histograms in a chunk, allowing compression to kick in.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-08-24 15:21:17 +02:00
beorn7 aa82fe198f tsdb: Fix histogram validation
So far, `ValidateHistogram` would not detect if the count did not
include the count in the zero bucket. This commit fixes the problem
and updates all the tests that have been undetected offenders so far.

Note that this problem would only ever create false negatives, so we
never falsely rejected to store a histogram because of it.

On the other hand, `ValidateFloatHistogram` has been to strict with
the count being at least as large as the sum of the counts in all the
buckets. Float precision issues could create false positives here, see
products of PromQL evaluations, it's actually quite hard to put an
upper limit no the floating point imprecision. Users could produce the
weirdest expressions, maxing out float precision problems. Therefore,
this commit simply removes that particular check from
`ValidateFloatHistogram`.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-08-22 23:04:01 +02:00
Oleg Zaytsev cd7d0b69a2
Check nil err first when committing (#12625)
The most common case is to have a nil error when appending series, so
let's check that first instead of checking the 3 error types first.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-08-01 14:04:45 +02:00