Commit graph

250 commits

Author SHA1 Message Date
Jeanette Tan 1be0816b46 Merge remote-tracking branch 'upstream/main' 2023-05-23 00:20:36 +08:00
Baskar Shanmugam 905a0bd63a
Added 'limit' query parameter support to /api/v1/status/tsdb endpoint (#12336)
* Added 'topN' query parameter support to /api/v1/status/tsdb endpoint

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Updated query parameter for tsdb status to 'limit'

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Corrected Stats() parameter name from topN to limit

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Fixed p.Stats CI failure

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

---------

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>
2023-05-22 14:37:07 +02:00
Callum Styan 0d2108ad79
[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949)
* WIP implement WAL watcher reading via notifications over a channel from
the TSDB code

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Notify via head appenders Commit (finished all WAL logging) rather than
on each WAL Log call

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix misspelled Notify plus add a metric for dropped Write notifications

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update tests to handle new notification pattern

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* this test maybe needs more time on windows?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* does this test need more time on windows as well?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* read timeout is already a time.Duration

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* remove mistakenly commited benchmark data files

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* address some review feedback

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix missed changes from previous commit

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues from wrapper function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* try fixing race condition in test by allowing tests to overwrite the
read ticker timeout instead of calling the Notify function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix linting

Signed-off-by: Callum Styan <callumstyan@gmail.com>

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2023-05-15 12:31:49 -07:00
Björn Rabenstein 37fe9b89dc
Merge pull request #12055 from leizor/leizor/prometheus/issues/12009
Adjust samplesPerChunk from 120 to 220
2023-05-10 14:45:12 +02:00
Bryan Boreham 0ab9553611
tsdb: drop deleted series from the WAL sooner (#12297)
`head.deleted` holds the WAL segment in use at the time each series was
removed from the head. At the end of `truncateWAL()` we will delete
all segments up to `last`, so we can drop any series that were last seen
in a segment at or before that point.

(same change in Prometheus Agent too)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-05-01 16:43:15 +01:00
Jeanette Tan 0fccba0db9 Merge remote-tracking branch 'upstream/main' 2023-04-26 21:25:21 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
Đurica Yuri Nikolić d162bb51b6
Merge pull request #485 from grafana/yuri/bring-prometheus-upstream
Synch with Prometheus up to 2023-04-18 (b028112)
2023-04-18 15:07:26 +02:00
Đurica Yuri Nikolić b028112331
Making the number of CPU cores used for sorting postings lists editable (#12247)
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
2023-04-18 12:13:05 +02:00
Jeanette Tan 1570114ae1 Merge remote-tracking branch 'upstream/main' 2023-04-14 17:34:40 +08:00
Justin Lei 052993414a Add storage.tsdb.samples-per-chunk flag
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-04-13 15:59:49 -07:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
beorn7 817a2396cb Name float values as "floats", not as "values"
In the past, every sample value was a float, so it was fine to call a
variable holding such a float "value" or "sample". With native
histograms, a sample might have a histogram value. And a histogram
value is still a value. Calling a float value just "value" or "sample"
or "V" is therefore misleading. Over the last few commits, I already
renamed many variables, but this cleans up a few more places where the
changes are more invasive.

Note that we do not to attempt naming in the JSON APIs or in the
protobufs. That would be quite a disruption. However, internally, we
can call variables as we want, and we should go with the option of
avoiding misunderstandings.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:24 +02:00
Oleg Zaytsev 4086a5f042 Merge branch 'main' into prometheus-2023-04-03-3923e83 2023-04-13 09:15:24 +02:00
Ganesh Vernekar e709b0b36e
Merge pull request #12127 from codesome/ooo-mmap-replay
Update OOO min/max time properly after replaying m-map chunks
2023-04-04 12:05:57 +05:30
Oleg Zaytsev 6e2905a4d4
Use zeropool.Pool to workaround SA6002 (#12189)
* Use zeropool.Pool to workaround SA6002

I built a tiny library called https://github.com/colega/zeropool to
workaround the SA6002 staticheck issue.

While searching for the references of that SA6002 staticheck issues on
Github first results was Prometheus itself, with quite a lot of ignores
of it.

This changes the usages of `sync.Pool` to `zeropool.Pool[T]` where a
pointer is not available.

Also added a benchmark for HeadAppender Append/Commit when series
already exist, which is one of the most usual cases IMO, as I didn't find
any.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Improve BenchmarkHeadAppender with more cases

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* A little copying is better than a little dependency

https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix imports order

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add license header

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Copyright should be on one of the first 3 lines

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Use require.Equal for testing

I don't depend on testify in my lib, but here we have it available.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Avoid flaky test

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Also use zeropool for pointsPool in engine.go

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2023-03-29 20:34:34 +01:00
Ganesh Vernekar 41649ceb1b
Merge remote-tracking branch 'upstream/main' into codesome/sync-prom
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-22 08:35:08 +05:30
Abhijit Mukherjee 8f6d5dcd45
Fix: getting rid of EncOOOXOR chunk encoding (#12111)
Signed-off-by: mabhi <abhijit.mukherjee@infracloud.io>
2023-03-16 15:53:47 +05:30
Ganesh Vernekar 2af44f9558
tsdb: Update OOO min/max time properly after replaying m-map chunks
Without this fix, if snapshots were enabled, and wbl goes missing
between restarts, then TSDB does not recognize that there are ooo
mmap chunks on disk and we cannot query them until those chunks
are compacted into blocks.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-13 13:14:00 +05:30
Ganesh Vernekar c9d06f2826
tsdb: Replay m-map chunk only when required
M-map chunks replayed on startup are discarded if there
was no WAL and no snapshot loaded, because there is no
series created in the Head that it can map to. So only
load m-map chunks from disk if there is either a snapshot
loaded or there is WAL on disk.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-13 13:13:42 +05:30
Ganesh Vernekar 7e74f73733
Merge remote-tracking branch 'upstream/main' into sync-prom
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-13 12:38:59 +05:30
Ganesh Vernekar 6c008ec56a
Merge pull request #11962 from jesusvazquez/jvp/protect-new-compaction-head-from-uninitialized-wbl
TSDB: Protect NewOOOCompactionHead from an uninitialized wbl
2023-03-13 10:52:03 +05:30
Yuri Nikolic c7d730f549 Fixing conflicts with commit c9b85afd93 2023-03-08 17:27:44 +01:00
Yuri Nikolic 13c2945af0 Fixing conflicts with commit 86d3d5fec6 2023-03-08 16:48:58 +01:00
Đurica Yuri Nikolić c9b85afd93
Making the number of CPUs used for WAL replay configurable (#12066)
Adds `WALReplayConcurrency` as an option on tsdb `Options` and `HeadOptions`.
If it is not set or set <=0, then `GOMAXPROCS` is used, which matches the previous behaviour.

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
2023-03-07 16:41:33 +00:00
ansalamdaniel c1c444504e
Feat: metrics for head_chunks & wal folders (#12013)
Signed-off-by: ansalamdaniel <ansalam.daniel@infracloud.io>
2023-03-02 15:25:56 +05:30
Rens Groothuijsen d33eb3ab17
Automatically remove incorrect snapshot with index that is ahead of WAL (#11859)
Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-03-01 17:51:02 +05:30
Jesus Vazquez 5c3f058755 Add unit test and also protect truncateOOO
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2023-02-10 15:18:17 +01:00
Justin Lei af1d9e01c7
Refactor tsdbutil for tests/native histograms (#11948)
* Add float histograms to ChunkFromSamplesGeneric

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* Add Generate*Samples functions to tsdbutil

Signed-off-by: Justin Lei <justin.lei@grafana.com>

* PR responses

Signed-off-by: Justin Lei <justin.lei@grafana.com>

---------

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-02-10 17:09:33 +05:30
György Krajcsovits 26936c4e91 Merge remote-tracking branch 'upstream/main' into krajo/merge-upstream-main
At 1f0cc09579
2023-02-01 13:35:06 +01:00
George Krajcsovits 1f0cc09579
Export single ith test histogram generation functions (#11911)
* Export single ith test histogram generation functions

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* Do not set counter reset hint for non-gauge histograms individually

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-02-01 16:23:38 +05:30
Marco Pracucci 950c177c72
Hardcode the labels stable hash function instead of taking it as an option
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2023-01-30 14:21:18 +01:00
Marco Pracucci 2461dee551
Merge remote-tracking branch 'remotes/prometheus/main' into update-upstream 2023-01-26 18:41:17 +01:00
Bryan Boreham 2b4c652e10
Merge pull request #369 from grafana/shard-func
* tsdb: make sharding function a parameter

Instead of relying on `labels.Hash()`, which may change, have the
caller pass in a shard function if required.

For most purposes `tsdb.Options.ShardFunc` is used, but the compactor
may be created independently so `NewLeveledCompactorWithChunkSize` also
takes a shard function parameter.

Regular Prometheus, which does not use block sharding, will have this
parameter as nil.

Rename WithCache functions as WithOptions

Where they now have 2 or more extra parameters.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-01-26 11:59:03 +00:00
beorn7 1cfc8f65a3 histograms: Return actually useful counter reset hints
This is a bit more conservative than we could be. As long as a chunk
isn't the first in a block, we can be pretty sure that the previous
chunk won't disappear. However, the incremental gain of returning
NotCounterReset in these cases is probably very small and might not be
worth the code complications.

Wwith this, we now also pay attention to an explicitly set counter
reset during ingestion. While the case doesn't show up in practice
yet, there could be scenarios where the metric source knows there was
a counter reset even if it might not be visible from the values in the
histogram. It is also useful for testing.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-01-25 16:57:21 +01:00
fayzal-g 7aef6e28fe Merge remote-tracking branch 'upstream/main' into merge-jan-16-upstream 2023-01-16 15:24:00 +00:00
Ganesh Vernekar cb2be6e62f
Merge pull request #11779 from codesome/memseries-ooo
tsdb: Only initialise out-of-order fields when required
2023-01-16 10:58:05 +05:30
Ganesh Vernekar 38fa151a7c
tsdb: Only initialise out-of-order fields when required
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-12 20:29:16 +05:30
Bryan Boreham 1aaabfee2d tsdb: make sharding function a parameter
Instead of relying on `labels.Hash()`, which may change, have the
caller pass in a shard function if required.

For most purposes `tsdb.Options.ShardFunc` is used, but the compactor
may be created independently so `NewLeveledCompactorWithChunkSize` also
takes a shard function parameter.

Regular Prometheus, which does not use block sharding, will have this
parameter as nil.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-01-12 11:41:22 +00:00
beorn7 6dcd03dbf3 tsdb: Add integer gauge histogram support
This follows what #11783 has done for float gauge histograms.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-01-11 13:28:43 +01:00
Ganesh Vernekar 57bcbf1888
Merge pull request #11783 from codesome/gauge-histogram
tsdb: Add gauge histogram support
2023-01-10 19:06:08 +05:30
Ganesh Vernekar 3c2ea91a83
tsdb: Test gauge float histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2023-01-10 18:35:37 +05:30
Ganesh Vernekar c94a41c4b2
Merge pull request #11785 from Fish-pro/erroris
Use errors.Is to check for a specific error
2023-01-10 14:56:14 +05:30
György Krajcsovits 103c4fd289 Merge remote-tracking branch 'upstream/main' into main
# Conflicts:
#	.github/workflows/ci.yml
#	tsdb/block.go
#	tsdb/compact.go
#	tsdb/compact_test.go
#	tsdb/head_read.go
#	tsdb/index/index.go
#	tsdb/ooo_head_read.go
#	tsdb/querier_test.go
2023-01-08 14:55:44 +01:00
Fish-pro 6ed71a229e Use errors.Is to check for a specific error
Signed-off-by: Fish-pro <zechun.chen@daocloud.io>
2022-12-29 23:23:07 +08:00
Oleg Zaytsev d23859dee2
Allow forcing usage of PostingsForMatchersCache
When out-of-order is enabled, queries go through both Head and OOOHead,
and they both execute the same PostingsForMatchers call, as memSeries
are shared for both.

In some cases these calls can be heavy, and also frequent. We can
deduplicate those calls by using the PostingsForMatchers cache that we
already use for query sharding.

The usage of this cache can skip a newly appended series in the results
for the duration of the ttl.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2022-12-28 13:44:10 +01:00
Ganesh Vernekar e555469ba1
tsdb: Remove isHistogramSeries from memSeries
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-12-28 14:31:55 +05:30
Marc Tudurí 9474610baf
Support FloatHistogram in TSDB (#11522)
Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, WAL replay, for this new type of histogram.

Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-12-28 14:25:07 +05:30
Jeanette Tan 51cf003517 Merge remote-tracking branch 'upstream/main'
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2022-11-23 01:39:23 +08:00
Julien Pivotto 739494d81b
Fix alignment of atomic int64 (#11547)
* Fix atomix int64 placement
* Test main for 386

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-11-09 11:18:49 +01:00