Commit graph

97 commits

Author SHA1 Message Date
Arve Knudsen e48d4e5835 Merge remote-tracking branch 'prometheus/main' into chore/sync-prometheus
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-18 09:29:42 +02:00
Fiona Liao 4419399e4e
Do WBL mmap marker replay concurrently (#12801)
* Benchmark WBL

Extended WAL benchmark test with WBL parts too - added basic cases for
OOO handling - a percentage of series have a percentage of samples set
as OOO ones.

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
2023-09-12 21:31:10 +02:00
Shirley d3a1044354
WBL loading: don't send empty buffers over chan (#12808)
Signed-off-by: Shirley Leu <4163034+fridgepoet@users.noreply.github.com>
Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>
2023-09-12 16:26:02 +02:00
Fiona Liao f211fcd92d
Remove duplicated ms.mmMaxTime check in WAL
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
2023-09-05 15:23:03 +01:00
Dimitar Dimitrov 77ac7ad40a
Merge remote-tracking branch 'upstream/main' into dimitar/pull-upstream 2023-09-05 16:19:00 +02:00
Mustafa Ateş Uzun e5e51bebef
fix: error message typo
Signed-off-by: Mustafa Ateş Uzun <mustafauzun0@gmail.com>
2023-08-17 16:34:45 +03:00
Łukasz Mierzwa 3c80963e81
Use a linked list for memSeries.headChunk (#11818)
Currently memSeries holds a single head chunk in-memory and a slice of mmapped chunks.
When append() is called on memSeries it might decide that a new headChunk is needed to use for given append() call.
If that happens it will first mmap existing head chunk and only after that happens it will create a new empty headChunk and continue appending
our sample to it.

Since appending samples uses write lock on memSeries no other read or write can happen until any append is completed.
When we have an append() that must create a new head chunk the whole memSeries is blocked until mmapping of existing head chunk finishes.
Mmapping itself uses a lock as it needs to be serialised, which means that the more chunks to mmap we have the longer each chunk might wait
for it to be mmapped.
If there's enough chunks that require mmapping some memSeries will be locked for long enough that it will start affecting
queries and scrapes.
Queries might timeout, since by default they have a 2 minute timeout set.
Scrapes will be blocked inside append() call, which means there will be a gap between samples. This will first affect range queries
or calls using rate() and such, since the time range requested in the query might have too few samples to calculate anything.

To avoid this we need to remove mmapping from append path, since mmapping is blocking.
But this means that when we cut a new head chunk we need to keep the old one around, so we can mmap it later.
This change makes memSeries.headChunk a linked list, memSeries.headChunk still points to the 'open' head chunk that receives new samples,
while older, yet to be mmapped, chunks are linked to it.
Mmapping is done on a schedule by iterating all memSeries one by one. Thanks to this we control when mmapping is done, since we trigger
it manually, which reduces the risk that it will have to compete for mmap locks with other chunks.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-07-31 11:10:24 +02:00
Jeanette Tan 8035c04624 Merge remote-tracking branch 'upstream/main'
Minor conflicts:
rules/manager.go
tsdb/compact.go
tsdb/db.go
go.mod
2023-07-19 21:40:27 +08:00
Justin Lei 32d87282ad
Add Zstandard compression option for wlog (#11666)
Snappy remains as the default compression but there is now a flag to switch 
the compression algorithm.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-07-11 14:57:57 +02:00
Marco Pracucci 7ad111b27e
Merge remote-tracking branch 'remotes/prometheus/main' into update-upstream 2023-07-06 15:13:54 +02:00
Marc Tudurí 4851ced266
tsdb: Support native histograms in snapshot on shutdown (#12258)
Signed-off-by: Marc Tuduri <marctc@protonmail.com>
2023-07-05 11:44:13 +02:00
Justin Lei 7622d5ac5f Group args to append to memSeries in chunkOpts
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-07-04 15:01:01 +00:00
Justin Lei d0fea47a9c Remove samplesPerChunk from memSeries (#12390)
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-07-04 13:38:31 +00:00
Justin Lei 4c4454e4c9 Group args to append to memSeries in chunkOpts
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-25 13:12:46 -07:00
Justin Lei 89af351730
Remove samplesPerChunk from memSeries (#12390)
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-05-25 11:18:41 +02:00
Jeanette Tan 0fccba0db9 Merge remote-tracking branch 'upstream/main' 2023-04-26 21:25:21 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
Oleg Zaytsev 4086a5f042 Merge branch 'main' into prometheus-2023-04-03-3923e83 2023-04-13 09:15:24 +02:00
Oleg Zaytsev 6e2905a4d4
Use zeropool.Pool to workaround SA6002 (#12189)
* Use zeropool.Pool to workaround SA6002

I built a tiny library called https://github.com/colega/zeropool to
workaround the SA6002 staticheck issue.

While searching for the references of that SA6002 staticheck issues on
Github first results was Prometheus itself, with quite a lot of ignores
of it.

This changes the usages of `sync.Pool` to `zeropool.Pool[T]` where a
pointer is not available.

Also added a benchmark for HeadAppender Append/Commit when series
already exist, which is one of the most usual cases IMO, as I didn't find
any.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Improve BenchmarkHeadAppender with more cases

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* A little copying is better than a little dependency

https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix imports order

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add license header

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Copyright should be on one of the first 3 lines

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Use require.Equal for testing

I don't depend on testify in my lib, but here we have it available.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Avoid flaky test

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Also use zeropool for pointsPool in engine.go

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-03-29 20:34:34 +01:00
Yuri Nikolic c7d730f549 Fixing conflicts with commit c9b85afd93 2023-03-08 17:27:44 +01:00
Đurica Yuri Nikolić c9b85afd93
Making the number of CPUs used for WAL replay configurable (#12066)
Adds `WALReplayConcurrency` as an option on tsdb `Options` and `HeadOptions`.
If it is not set or set <=0, then `GOMAXPROCS` is used, which matches the previous behaviour.

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
2023-03-07 16:41:33 +00:00
György Krajcsovits 26936c4e91 Merge remote-tracking branch 'upstream/main' into krajo/merge-upstream-main
At 1f0cc09579
2023-02-01 13:35:06 +01:00
fayzal-g cfa4ea53cc Correctly update chunksRemoved and chunks metrics
Signed-off-by: fayzal-g <fayzal.ghantiwala@grafana.com>
2023-01-18 10:58:48 +00:00
György Krajcsovits cf4537a5f2 Merge remote-tracking branch 'upstream/main' into krajo/merge-upstream-main
At e6f84d5445
2023-01-17 13:42:29 +01:00
Ganesh Vernekar 6e560fe19b
tsdb: Avoid unnecessary allocation from 11779
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-17 16:53:49 +05:30
fayzal-g d2ac103ba9 Remove comments 2023-01-16 15:49:49 +00:00
fayzal-g 7aef6e28fe Merge remote-tracking branch 'upstream/main' into merge-jan-16-upstream 2023-01-16 15:24:00 +00:00
Ganesh Vernekar cb2be6e62f
Merge pull request #11779 from codesome/memseries-ooo
tsdb: Only initialise out-of-order fields when required
2023-01-16 10:58:05 +05:30
Ganesh Vernekar 38fa151a7c
tsdb: Only initialise out-of-order fields when required
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-12 20:29:16 +05:30
György Krajcsovits 103c4fd289 Merge remote-tracking branch 'upstream/main' into main
# Conflicts:
#	.github/workflows/ci.yml
#	tsdb/block.go
#	tsdb/compact.go
#	tsdb/compact_test.go
#	tsdb/head_read.go
#	tsdb/index/index.go
#	tsdb/ooo_head_read.go
#	tsdb/querier_test.go
2023-01-08 14:55:44 +01:00
Ganesh Vernekar e555469ba1
tsdb: Remove isHistogramSeries from memSeries
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-12-28 14:31:55 +05:30
Marc Tudurí 9474610baf
Support FloatHistogram in TSDB (#11522)
Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, WAL replay, for this new type of histogram.

Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-12-28 14:25:07 +05:30
Jeanette Tan 51cf003517 Merge remote-tracking branch 'upstream/main'
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2022-11-23 01:39:23 +08:00
Ganesh Vernekar 8ee4dfd40c
Fix the build after conflict resolution
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 17:59:42 +05:30
Ganesh Vernekar 648be89822
Merge remote-tracking branch 'upstream/main' into fix-conflict
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 14:20:02 +05:30
Signed-off-by: Jesus Vazquez 3362bf6d79
Fix merge conflicts
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-11 22:53:37 +05:30
Jesus Vazquez 775d90d5f8
TSDB: Rename wal package to wlog (#11352)
The wlog.WL type can now be used to create a Write Ahead Log or a Write
Behind Log.

Before the prefix for wbl metrics was
'prometheus_tsdb_out_of_order_wal_' and has been replaced with
'prometheus_tsdb_out_of_order_wbl_'.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2022-10-10 20:38:46 +05:30
Ganesh Vernekar b522fb0b76
Merge remote-tracking branch 'upstream/main' into sync-prom 2022-10-10 18:07:39 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Ganesh Vernekar 83d9ee3ab7
Merge remote-tracking branch 'upstream/main' into sync-prom
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-05 20:33:36 +05:30
Bryan Boreham d166da7b59
tsdb: stop saving a copy of last 4 samples in memSeries (#11296)
* TSDB chunks: remove race between writing and reading

Because the data is stored as a bit-stream, the last byte in the stream
could change if the stream is appended to after an Iterator is obtained.
Copy the last byte when the Iterator is created, so we don't have to
read it later.

Clarify in comments that concurrent Iterator and Appender are allowed,
but the chunk must not be modified while an Iterator is created.
(This was already the case, in order to copy the bstream slice header.)

* TSDB: stop saving last 4 samples in memSeries

This extra copy of the last 4 samples was introduced to avoid a race
condition between reading the last byte of the chunk and writing to it.

But now we have fixed that by having `bstreamReader` copy the last byte,
we don't need to copy the last 4 samples.

This change saves 56 bytes per series, which is very worthwhile when
you have millions or tens of millions of series.

* TSDB: tidy up stopIterator re-use

Previous changes have left this code duplicating some lines; pull
them out to a separate function and tidy up.

* TSDB head_test: stop checking when iterators are wrapped

The behaviour has changed so chunk iterators are only wrapped when
transaction isolation requires them to stop short of the end.
This makes tests fail which are checking the type.

Tests should check the observable behaviour, not the type.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-27 19:32:05 +05:30
Bryan Boreham d0607435a2
tsdb: remove chunkRange and oooCapMax from memSeries (#11288)
* tsdb: remove chunkRange from memSeries

chunkRange is the (oddly-named) configured duration for the head block.

We don't need a copy of this value per series. Pass it down where
required, and remove the copy.

The value in `Head` is only updated in `resetInMemoryState()`, which
also discards all `memSeries`.

* tsdb: remove oooCapMax from memSeries

oooCapMax is the configured maximum capacity for an out-of-order chunk.

Storing it per-series uses extra memory, and has surprising behaviour
if users change the value in config - series created before the change
will keep their old value.

Instead, pass it down where required, and remove the per-series value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-27 13:52:22 +05:30
Jesus Vazquez c1b669bf9b
Add out-of-order sample support to the TSDB (#11075)
* Introduce out-of-order TSDB support

This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing

This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.

Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886

Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* gofumpt files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add license header to missing files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO tests due to existing chunk disk mapper implementation

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix truncate int overflow

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add Sync method to the WAL and update tests

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* remove useless sync

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Update minOOOTime after truncating Head

* Update minOOOTime after truncating Head

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add a unit test

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Load OutOfOrderTimeWindow only once per appender

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO Head LabelValues and PostingsForMatchers

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix replay of OOO mmap chunks

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Remove unnecessary err check

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Prevent panic with ApplyConfig

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run OOO compaction after restart if there is OOO data from WBL

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Apply Bartek's suggestions

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Refactor OOO compaction

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address comments and TODOs

- Added a comment explaining why we need the allow overlapping
  compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run go format

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix remaining review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix tests

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix TestWBLAndMmapReplay test failure on windows

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address most of the feedback

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Refactor the block meta for out of order

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix windows error

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2022-09-20 22:35:50 +05:30
Bryan Boreham af6167df58
WAL loading: don't send empty buffers over chan (#11319)
If some shards did not get any samples mapped, the buffer will be empty
so sending it over the chan to `processWALSamples()` is a waste of time.

This is especially likely now we are checking `minValidTime` before
sending.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-20 19:43:30 +05:30
Bryan Boreham e49d596fb1
WAL loading: check sample time is valid earlier (#11307)
There's no point splitting a sample onto the right shard and checking
if the series needs to be re-mapped, if we're only going to discard it
once it arrives at `ProcessWALSamples()`. Simply discard it earlier.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-15 12:36:57 +05:30
Julien Pivotto ec6c1f17d1
Update dependencies (#11287)
Updating dependencies following CI changes and move to go 1.19

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-09-09 13:28:55 +02:00
Ganesh Vernekar f540c1dbd3
Add support for histograms in WAL checkpointing (#11210)
* Add support for histograms in WAL checkpointing

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix tests

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-08-29 17:38:36 +05:30
Xiaochao Dong 09187fb0cc
Replay WAL concurrently without blocking (#10973)
* Replay WAL concurrently without blocking

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Resolve review comments

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2022-08-17 19:23:57 +05:30