Commit graph

993 commits

Author SHA1 Message Date
Ben Ye ded35ef20d expose compactor metrics
Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-31 15:10:29 -07:00
Nicolas Takashi 0b762db154
[refactor] moving mergedOOOChunks to ooo_head_read
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
2024-03-29 23:33:15 +00:00
Arve Knudsen 35aab01de0 tsdb/wlog.Checkpoint: Handle also float histograms
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-26 15:27:32 +01:00
Bartlomiej Plotka de05d5e11b
Merge pull request #13776 from aknuds1/arve/checkpoint-histogram-samples
tsdb/wlog.Checkpoint: Fix counting of histogram samples
2024-03-26 12:33:48 +01:00
Bartlomiej Plotka 25578f2b22
[test] Merge pull request #13790 from aknuds1/arve/retention-commit
tsdb.BeyondTimeRetention: Fix comment and test at retention duration
2024-03-26 12:26:32 +01:00
Nick Pillitteri 481f14e1c0
TSDB: Don't rely on integer overflow in head compaction check (#13755)
* TSDB: Don't compact the head block when empty

Don't compact the Head block if there have not yet been any samples
appended.

Previously, the logic for determining if the head should be compacted
relied on the default values for min and max time and integer overflow
when they were checked in `Head.compactable()`. The check in
`Head.compactable()` effectively did `math.MinInt64 - math.MaxInt64`
which overflowed and wrapped to `1`. Since `1` is less than `1.5`
times the chunk range, compaction did not happen. This was the correct
behavior but relying on overflow wrapping is surprising.

This change add a method for checking if the min and max time for the
head is unset and uses it to short-circuit compaction in that case.
It also replaces several explicit checks for the default value to
determine if the head has not yet had any samples added.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
2024-03-26 12:17:38 +01:00
Ben Ye ceca6c4716
[ENHANCEMENT] TSDB: Log more statistics during startup (#13838)
* log chunk snapshot and mmap chunks replay duration together with total replay duration

Signed-off-by: Ben Ye <benye@amazon.com>
2024-03-26 11:16:27 +00:00
Bryan Boreham 78c0fd2f4d
Merge pull request #13799 from machine424/wbl
chore(tsdb): set the wbl to nil as well in DBReadOnly.loadDataAsQuery…
2024-03-25 15:03:56 +01:00
Bryan Boreham 080d440bf8 Merge remote-tracking branch 'origin/main' into pr/13461 2024-03-25 12:14:26 +00:00
machine424 2a2e2ed28b
chore(tsdb): set the wbl to nil as well in DBReadOnly.loadDataAsQueryable
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-03-20 17:13:39 +01:00
Arve Knudsen 07332f7427 TestTimeRetention: Split into two sub-tests
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-20 15:06:36 +01:00
Arve Knudsen af694dc295 Merge TestDB_BeyondTimeRetention into TestTimeRetention
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-20 09:07:16 +01:00
Arve Knudsen 9c7a734063 tsdb.BeyondTimeRetention: Fix comment and test at retention duration
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-19 09:10:21 +01:00
Bryan Boreham a0e93e403e
Merge pull request #13764 from bboreham/remove-deprecated-wal
[Cleanup] TSDB: Remove old deprecated WAL implementation

Deprecated since 2018.
2024-03-17 09:34:57 +00:00
Darshan Chaudhary b7047f7fcb
Fix retention boundary so 2h retention deletes blocks right at the 2h boundary (#9633)
Signed-off-by: darshanime <deathbullet@gmail.com>
2024-03-15 19:35:16 +01:00
Arve Knudsen cef1025ea8 tsdb/wlog.Checkpoint: Fix counting of histogram samples
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-15 10:23:59 +01:00
Bryan Boreham d45b5deb75 TSDB: move function only used in tests
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-15 08:54:47 +00:00
Bryan Boreham 3274cac0d3 TSDB: remove unused function
Was only used in old WAL implementation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-15 08:51:57 +00:00
Arve Knudsen 1de49d5b69
Remove unused function tsdb/chunks.PopulatedChunk (#13763)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-03-14 11:15:17 +01:00
Bryan Boreham 87edf1f960 [Cleanup] TSDB: Remove old deprecated WAL implementation
Deprecated since 2018.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-13 15:57:23 +00:00
Bryan Boreham d08f054950
[ENHANCEMENT] TSDB: Check CRC without allocating (#13742)
Use the existing utility function which does this.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-12 12:24:27 +01:00
carrychair 856f6e49c8 fix function and struct name
Signed-off-by: carrychair <linghuchong404@gmail.com>
2024-03-09 17:53:17 +08:00
Björn Rabenstein 9acae57937
Merge pull request #13681 from krajorama/native-latency-histograms
Add native histograms to latency/duration metrics
2024-03-07 20:46:43 +01:00
Bryan Boreham bbe39af99f
tsdb: zero out Labels and memSeries pointers in pool (#13712)
* tsdb: zero out Labels and memSeries pointers in pool

So that the garbage-collector doesn't see this memory as still in use.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

---------

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-03-06 13:36:34 +01:00
György Krajcsovits 4d4d822c36 Add native histograms to latency/duration metrics
Dogfood native histograms.
Allow dependent projects to migrate to native histograms.

I took the defaults from client_golang.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-03-01 14:44:38 +01:00
machine424 f477e0539a
Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21
Prevent adding back golang.org/x/exp/slices.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 14:54:53 +01:00
Bryan Boreham 925134e6de tsdb tests: make work with labels SymbolTable
Need to initialize decoders with SymbolTable.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Bryan Boreham 93b72ec5dd tsdb: create SymbolTables for labels as required
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Bryan Boreham 4d6bb2e0e4
Merge pull request #13628 from bboreham/cleanup-13583
tsdb/wlog: small cleanup of WAL watcher after #13583
2024-02-23 12:39:17 +00:00
George Krajcsovits c6c8f63516
Merge pull request #13607 from fionaliao/ooo-samples-appended-type
Add sample type label to outOfOrderSamplesAppended metric
2024-02-23 09:41:45 +01:00
Bryan Boreham 6ed56c9f04 WAL watcher: improve comments
Clarify in the first comment that it is `watch()` that waits, and reduce
verbiage.

The second comment was slightly contradictory to the first and otherwise
didn't seem to add much, since `currentSegment` was incremented just a
few lines later.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-22 09:32:46 +00:00
Bryan Boreham a975a83079 tsdb: clean up Watcher debug messages
Print lastSegment after it gets initialized.

Move variable declaration to first use.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-22 09:19:18 +00:00
Bryan Boreham 78f46bccca tsdb/wlog tests: remove unnecessary sleep check
Sleep() is documented to return immediately on negative or zero input.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-22 09:14:52 +00:00
Callum Styan 0c71230784
fix bug that would cause us to endlessly fall behind (#13583)
* fix bug that would cause us to only read from the WAL on the 15s
fallback timer if remote write had fallen behind and is no longer
reading from the WAL segment that is currently being written to

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* remove unintended logging, fix lint, plus allow test to take slightly
longer because cloud CI

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* address review feedback

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix watcher sleeps in test, flu brain is smooth

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* increase timeout, unfortunately cloud CI can require a longer timeout

Signed-off-by: Callum Styan <callumstyan@gmail.com>

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2024-02-21 17:09:07 -08:00
Fiona Liao 841a133514 Move histogramsAppended to be more consistent
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2024-02-21 11:15:04 +00:00
Fiona Liao 52389647b2 Add type label to outOfOrderSamplesAppended metric
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2024-02-19 15:24:39 +00:00
Bryan Boreham c0e36e6bb3 Standardise exemplar label as "trace_id"
This is consistent with the OpenTelemetry standard, and an example in OpenMetrics.

https://github.com/open-telemetry/opentelemetry-specification/blob/89aa01348139/specification/metrics/data-model.md#exemplars
https://github.com/OpenObservability/OpenMetrics/blob/138654493130/specification/OpenMetrics.md#exemplars-1

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-15 14:20:08 +00:00
Bryan Boreham 12cac5bd5c tsdb tests: use go-cmp instead of DeepEquals
Also one simpler call checking nil.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:32:33 +00:00
Bryan Boreham 17f48f2b3b Tests: use replacement DeepEquals in more places
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:32:33 +00:00
Bryan Boreham 39af788dbd Tests: use replacement DeepEquals using go-cmp
Use DeepEqual replacement using go-cmp, which is more flexible.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:30:20 +00:00
Peter Štibraný e2b9cfeeeb
Enforce chunks ordering when writing index. (#8085)
Document conditions on chunks. Add check on chunk time ordering.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2024-02-04 16:31:49 +01:00
Bryan Boreham 98c4889029
Merge pull request #9298 from Creatone/creatone/use-testify
tests: Move from t.Errorf and others.
2024-02-04 16:27:57 +01:00
Mikhail Fesenko 419dd265cc
Fix strange code, add messages to code brought in #8106 (#13509)
Signed-off-by: Mikhail Fesenko <proggga@gmail.com>
2024-02-02 10:00:38 +01:00
Bryan Boreham 16e68c01e4 tests: remove err from message when testify prints it already
For instance `require.NoError` will print the unexpected error; we don't
need to include it in the message.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-01 14:18:01 +00:00
Mikhail Fesenko 5f2c3a5d3e
Small improvements, add const, remove copypasta (#8106)
Signed-off-by: Mikhail Fesenko <proggga@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-02-01 14:30:50 +01:00
Paweł Szulik 5961f78186 Refactor tsdb tests to use testify.
Signed-off-by: Paweł Szulik <paul.szulik@gmail.com>
2024-01-31 16:03:17 +00:00
Bryan Boreham cd4562d3a6
Merge pull request #13473 from bboreham/pure-mutex
tsdb: use cheaper Mutex on series
2024-01-30 09:57:08 +00:00
Marco Pracucci 501bc6419e
Add ShardedPostings() support to TSDB (#10421)
This PR is a reference implementation of the proposal described in #10420.

In addition to what described in #10420, in this PR I've introduced labels.StableHash(). The idea is to offer an hashing function which doesn't change over time, and that's used by query sharding in order to get a stable behaviour over time. The implementation of labels.StableHash() is the hashing function used by Prometheus before stringlabels, and what's used by Grafana Mimir for query sharding (because built before stringlabels was a thing).

Follow up work
As mentioned in #10420, if this PR is accepted I'm also open to upload another foundamental piece used by Grafana Mimir query sharding to accelerate the query execution: an optional, configurable and fast in-memory cache for the series hashes.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 11:57:27 +00:00
Bryan Boreham 66237c1996 tsdb: use cheaper Mutex on series
Mutex is 8 bytes; RWMutex is 24 bytes and much more complicated. Since
`RLock` is only used in two places, `UpdateMetadata` and `Delete`,
neither of which are hotspots, we should use the cheaper one.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-01-26 11:01:39 +00:00
Marco Pracucci ec9cada56e
Remove unused isRegexMetaCharacter()
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-26 06:35:02 +01:00