Commit graph

798 commits

Author SHA1 Message Date
Dimitar Dimitrov b40865833d
PostingsForMatchers race with creating new series (#12558)
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2023-08-29 11:03:27 +02:00
Bryan Boreham c5671c6d97
Merge pull request #12755 from bboreham/rangequery-benchmark-mmap
promql: force mmap of head chunks in BenchmarkRangeQuery
2023-08-26 15:56:52 +01:00
Bryan Boreham 5d22d422ab
Merge pull request #12690 from michalbiesek/feat-go-bump
Update Go version to 1.21
2023-08-26 14:36:15 +01:00
Bryan Boreham 0d283effa8 promql: force mmap of head chunks in BenchmarkRangeQuery
Otherwise we have a highly unusual situation of over 100 chunks
in the headChunks list of each series, which heavily skews
performance.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-26 09:40:59 +00:00
Gregor Zeitlinger f01718262a
Unit tests for native histograms (#12668)
promql: Extend testing framework to support native histograms

This includes both the internal testing framework as well as the rules unit test feature of promtool.

This also adds a bunch of basic tests. Many of the code level tests can now be converted to tests within the framework, and more tests can be added easily.

---------

Signed-off-by: Harold Dost <h.dost@criteo.com>
Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
Signed-off-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
2023-08-25 23:35:42 +02:00
Michal Biesek 04d7b4dbee
lint: Fix SA1019 Using a deprecated function
`rand.Read` has been deprecated since Go 1.20
`crypto/rand.Read` is more appropriate

Ref: https://tip.golang.org/doc/go1.20

Signed-off-by: Michal Biesek <michalbiesek@gmail.com>
2023-08-25 17:47:41 +02:00
Justin Lei 8ef7dfdeeb
Add a chunk size limit in bytes (#12054)
Add a chunk size limit in bytes

This creates a hard cap for XOR chunks of 1024 bytes.

The limit for histogram chunk is also 1024 bytes, but it is a soft limit as a histogram has a dynamic size, and even a single one could be larger than 1024 bytes.

This also avoids cutting new histogram chunks if the existing chunk has fewer than 10 histograms yet. In that way, we are accepting "jumbo chunks" in order to have at least 10 histograms in a chunk, allowing compression to kick in.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-08-24 15:21:17 +02:00
beorn7 aa82fe198f tsdb: Fix histogram validation
So far, `ValidateHistogram` would not detect if the count did not
include the count in the zero bucket. This commit fixes the problem
and updates all the tests that have been undetected offenders so far.

Note that this problem would only ever create false negatives, so we
never falsely rejected to store a histogram because of it.

On the other hand, `ValidateFloatHistogram` has been to strict with
the count being at least as large as the sum of the counts in all the
buckets. Float precision issues could create false positives here, see
products of PromQL evaluations, it's actually quite hard to put an
upper limit no the floating point imprecision. Users could produce the
weirdest expressions, maxing out float precision problems. Therefore,
this commit simply removes that particular check from
`ValidateFloatHistogram`.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-08-22 23:04:01 +02:00
Mustafa Ateş Uzun e5e51bebef
fix: error message typo
Signed-off-by: Mustafa Ateş Uzun <mustafauzun0@gmail.com>
2023-08-17 16:34:45 +03:00
Julien Pivotto e3fabd5fdf
Merge pull request #12664 from prometheus/superq/cleanup_chunk_snapshots
Cleanup temporary chunk snapshot dirs
2023-08-08 13:02:39 +02:00
SuperQ 8d38d59fc5
Cleanup temporary chunk snapshot dirs
Simlar to cleanup of WAL files on startup, cleanup temporary
chunk_snapshot dirs. This prevents storage space leaks due to terminated
snapshots on shutdown.

Signed-off-by: SuperQ <superq@gmail.com>
2023-08-08 09:43:48 +02:00
Julien Pivotto c3311272d9
Merge pull request #12652 from colega/fix-typo-in-append-histogram-param-name
Fix typo in Appender.AppendHistogram() arg name
2023-08-04 16:37:40 +02:00
Oleg Zaytsev 6ea6def0d3
Use zeropool when replaying agent's DB WAL (#12651)
Same as https://github.com/prometheus/prometheus/pull/12189 but for
tsdb/agent/db.go

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-08-04 10:39:55 +02:00
Oleg Zaytsev c810e7cae3
Fix typo in Appender.AppendHistogram() arg name
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-08-04 10:21:16 +02:00
Oleg Zaytsev 61daa30bb1
Pass ref to SeriesLifecycleCallback.PostDeletion (#12626)
When a particular SeriesLifecycleCallback tries to optimize and run
closer to the Head, keeping track of the HeadSeriesRef instead of the
labelsets, it's impossible to handle the PostDeletion callback properly
as there's no way to know which series refs were deleted from the head.

This changes the callback to provide the series refs alongside the
labelsets, so the implementation can choose what to do.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-08-03 10:56:27 +02:00
Oleg Zaytsev cd7d0b69a2
Check nil err first when committing (#12625)
The most common case is to have a nil error when appending series, so
let's check that first instead of checking the 3 error types first.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-08-01 14:04:45 +02:00
cui fliter f26dfc95e6
fix struct name in comment (#12624)
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-08-01 12:24:42 +02:00
Łukasz Mierzwa 3c80963e81
Use a linked list for memSeries.headChunk (#11818)
Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks.
When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call.
If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending
our sample to it.

Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed.
When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes.
Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait
for it to be mmapped.
If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting
queries and scrapes.
Queries might timeout, since by default they have a 2 minute timeout set.
Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries
or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything.

To avoid this we need to remove mmapping from append path, since mmapping is blocking.
But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later.
This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples,
while older, yet to be mmapped, chunks are linked to it.
Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger
it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-07-31 11:10:24 +02:00
Robert Fratto 886945cda7
tsdb/agent: ensure that new series get written to WAL on rollback (#12592)
If a new series is introduced in a storage.Appender instance, that
series should be written to the WAL once the storage.Appender is closed,
even on Rollback.

Previously, new series would only be written to the WAL when calling
Commit. However, because the series is stored in memory regardless,
subsequent calls to Commit may write samples to the WAL which reference
a series ID which that was never written.

Related to #11589. It's likely that this fix also resolves this issue,
but we need more testing from users to see if the problem persists after
this fix; there may be more cases where samples get written to the WAL
in Prometheus Agent mode without the corresponding series record.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2023-07-27 09:28:26 -04:00
George Krajcsovits 6cd2d1621f
Hide histogram chunk append and reset header internals (#12352)
tsdb: Hide histogram chunk append and reset header internals

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-07-26 15:08:16 +02:00
Björn Rabenstein 0e12f11d61
Merge pull request #12583 from prometheus/release-2.46
Merge release-2.46 into main
2023-07-20 18:29:44 +02:00
György Krajcsovits d4e355243a tsdbutil/ChunkFromSamplesGeneric should not panic
Add error handling instead.
Prepares for #12352

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-07-20 17:01:34 +02:00
Julien Pivotto 7905594b52
Merge pull request #12557 from prometheus/beorn7/histogram
scrape: Enable ingestion of multiple exemplars per sample
2023-07-20 15:19:28 +02:00
Julien Pivotto 1f5934e7be
Merge pull request #10623 from songjiayang/update-index
make sure response error when TOC parse failed
2023-07-18 13:47:27 +02:00
cui fliter 096ceca44f
remove repetitive words (#12556)
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-07-13 15:53:40 +02:00
beorn7 0e3f35324b scrape: Enable ingestion of multiple exemplars per sample
This has become a requirement for native histograms, as a single
histogram sample commonly has many buckets, so that providing many
exemplars makes sense.

Since OM text doesn't support native histograms yet, the test had to
be expanded to also support protobuf test cases.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-07-13 14:16:10 +02:00
Julien Pivotto 89e213bc02
Merge pull request #12546 from roidelapluie/removeimport
TSDB: Remove usused import of sort
2023-07-11 15:06:48 +02:00
Justin Lei 32d87282ad
Add Zstandard compression option for wlog (#11666)
Snappy remains as the default compression but there is now a flag to switch 
the compression algorithm.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-07-11 14:57:57 +02:00
Julien Pivotto bf5bf1a4b3 TSDB: Remove usused import of sort
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-07-11 14:29:31 +02:00
Julien Pivotto 8c8afec116
Merge pull request #12542 from merrickclay/tsdb-doc-comment
improve incorrect doc comment
2023-07-11 13:10:04 +02:00
Julien Pivotto 0f85e4f41d
Merge pull request #12539 from bboreham/slices-sorts
Replace sort.Slice with faster slices.SortFunc
2023-07-11 13:09:02 +02:00
Merrick Clay 70e41fc5ac improve incorrect doc comment
Signed-off-by: Merrick Clay <merrick.e.clay@gmail.com>
2023-07-10 16:52:00 -06:00
Bryan Boreham ce153e3fff Replace sort.Sort with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-10 09:43:45 +00:00
Marc Tudurí 4851ced266
tsdb: Support native histograms in snapshot on shutdown (#12258)
Signed-off-by: Marc Tuduri <marctc@protonmail.com>
2023-07-05 11:44:13 +02:00
Julien Pivotto 9ff1f24efa
Merge pull request #12505 from pracucci/fix-infinite-loop-in-index-writer
Fix infinite loop in index Writer when a series contains duplicated label names
2023-07-04 13:08:36 +02:00
Patrick Oyarzun 68e5937474
Apply relevant label matchers in LabelValues before fetching extra postings (#12274)
* Apply matchers when fetching label values

Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>

* Avoid extra copying of label values

Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>

---------

Signed-off-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>
2023-07-04 10:37:58 +01:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
Marco Pracucci 35069910f5
Fix infinite loop in index Writer when a series contains duplicated label names
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2023-07-01 17:38:08 +02:00
Marco Pracucci 031d22df9e
Fix race condition in ChunkDiskMapper.Truncate() (#12500)
* Fix race condition in ChunkDiskMapper.Truncate()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added unit test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Update tsdb/chunks/head_chunks.go

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-06-30 18:29:59 +05:30
Bartlomiej Plotka 4062f12573
Merge pull request #12396 from leizor/leizor/chunk-opts
Group args to append to memSeries in chunkOpts
2023-06-27 13:08:21 +02:00
Nidhey Nitin Indurkar a8772a4178
Feat: Get block by id directly on promtool analyze & get latest block if ID not provided (#12031)
* feat: analyze latest block or block by ID in CLI (promtool)

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>

* address remarks

Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>

* address latest review comments

Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>

---------

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>
Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>
2023-06-01 17:13:09 +05:30
Alan Protasio 73078bf738
Opmizing Group Regex (#12375)
Signed-off-by: Alan Protasio <alanprot@gmail.com>
2023-05-30 13:49:22 +02:00
Julien Pivotto 6f97641a51
Merge pull request #12380 from mmorel-35/patch-2
ci(lint): enable predeclared linter
2023-05-28 14:43:29 +02:00
Justin Lei e73d8b2084 Also pass chunkOpts into appendPreprocessor
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-25 13:37:18 -07:00
Justin Lei 4c4454e4c9 Group args to append to memSeries in chunkOpts
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-25 13:12:46 -07:00
Justin Lei 89af351730
Remove samplesPerChunk from memSeries (#12390)
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-25 11:18:41 +02:00
zenador 37e5249e33
Use DefaultSamplesPerChunk in tsdb (#12387)
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-05-24 13:00:21 +02:00
Baskar Shanmugam 905a0bd63a
Added 'limit' query parameter support to /api/v1/status/tsdb endpoint (#12336)
* Added 'topN' query parameter support to /api/v1/status/tsdb endpoint

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Updated query parameter for tsdb status to 'limit'

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Corrected Stats() parameter name from topN to limit

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Fixed p.Stats CI failure

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

---------

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>
2023-05-22 14:37:07 +02:00
Alan Protasio 8c5d4b4add
Opmize MatchNotEqual (#12377)
Signed-off-by: Alan Protasio <alanprot@gmail.com>
2023-05-21 10:41:30 +02:00
Matthieu MOREL c8e7f95a3c ci(lint): enable predeclared linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-05-21 07:33:54 +00:00