Commit graph

524 commits

Author SHA1 Message Date
beorn7 4c28d9fac7 Move to histogram.Histogram pointers
This is to avoid copying the many fields of a histogram.Histogram all
the time.

This also fixes a bunch of formerly broken tests.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-12 23:17:35 +01:00
Mauro Stettler 8a4f659126 fix error message
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
2021-11-12 21:45:46 +01:00
Robert Fratto 72a9f7fee9
Share TSDB locker code with agent (#9623)
* share tsdb db locker code with agent

Closes #9616

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* add flag to disable lockfile for agent

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* use agentOnlySetting instead of PreAction

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* tsdb: address review feedback

1. Rename Locker to DirLocker
2. Move DirLocker to tsdb/tsdbutil
3. Name metric using fmt.Sprintf
4. Refine error checking in DirLocker test

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* tsdb: create test utilities to assert expected DirLocker behavior

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* tsdb/tsdbutil: fix lint errors

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* tsdb/agent: fix windows test failure

Use new DB variable instead of overriding the old one.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2021-11-11 11:45:25 -05:00
beorn7 f1065e44a4 model: String method for histogram.Histogram
This includes a regular bucket iterator and a string method for
histogram.Bucket.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-11 17:29:22 +01:00
Peter Štibraný 422e7839d4
Add more size checks when writing individual sections in the index. (#9710)
* Add more size checks when writing individual sections in the index.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>

* Use uint and add comment about it.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
2021-11-11 15:44:28 +05:30
Mateusz Gozdek 83086aee00 tsdb/agent: use unique registry per tests
So tests can run in parallel.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-11 01:37:24 +01:00
Mateusz Gozdek 2f312ff4c5 tsdb: mark TestTombstoneCleanRetentionLimitsRace test as slow
It takes over 100 seconds to execute this test, so I'd consider it as
slow.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-11 01:37:24 +01:00
Levi Harrison 7400e07fa9
Close DB in Agent tests (#9630)
* Close agent db in tests

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Close first DB before opening second

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Use seperate variables for different DBs?

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Close remote storage

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix closing of stuff

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Remove the build flags after a rebase

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix closing of stuff 2

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-10 20:05:54 +05:30
Mateusz Gozdek f4650c27e7 tsdb/wal: fix flaky TestReaderFuzz* tests
It seems sometimes you can get error like:

                Error:          Not equal:
                                expected: []byte(nil)
                                actual  : []byte{}

                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1,2 +1,3 @@
                                -([]uint8) <nil>
                                +([]uint8) {
                                +}

This commit does what bytes.Equal does to silence those differences. I'm
not sure if this is a correct solution or just covering up the actual bug.

Closes #9574

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-09 14:32:20 +01:00
Björn Rabenstein 4c56a193c5
Merge pull request #9478 from prometheus/beorn7/pkg-deprecation
Move packages out of deprecated pkg directory
2021-11-09 11:09:16 +01:00
曹明 a0d31c28fc tsdb: Add windows arm64 support.
Signed-off-by: 曹明 <caoming1@kingsoft.com>
2021-11-09 11:07:27 +01:00
Mateusz Gozdek b319b14431
tsdb/chunks: preallocate at least some space on non-Windows systems (#9581)
To avoid potential chunk corruption read, which I am not sure why is
happening.

Closes #9561.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-09 13:47:00 +05:30
beorn7 c954cd9d1d Move packages out of deprecated pkg directory
This creates a new `model` directory and moves all data-model related
packages over there:
  exemplar labels relabel rulefmt textparse timestamp value

All the others are more or less utilities and have been moved to `util`:
  gate logging modetimevfs pool runtime

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-09 08:03:10 +01:00
beorn7 a1e595edac Fix two trivial lint warnings
Not sure why those show up for me locally but not if run by the CI.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-08 22:32:13 +01:00
beorn7 8f92c90897 Add TODOs and some minor tweaks
Signed-off-by: beorn7 <beorn@grafana.com>
2021-11-07 17:12:04 +01:00
Dieter Plaetinck cda025b5b5
TSDB: demistify SeriesRefs and ChunkRefs (#9536)
* TSDB: demistify seriesRefs and ChunkRefs

The TSDB package contains many types of series and chunk references,
all shrouded in uint types.  Often the same uint value may
actually mean one of different types, in non-obvious ways.

This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.

Concretely:

* Use appropriately named types and document their semantics and
  relations.
* Make multiplexing and demuxing of types explicit
  (on the boundaries between concrete implementations and generic
  interfaces).
* Casting between different types should be free.  None of the changes
  should have any impact on how the code runs.

TODO: Implement BlockSeriesRef where appropriate (for a future PR)

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* agent: demistify seriesRefs and ChunkRefs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-11-06 15:40:04 +05:30
johncming b882d2b7c7
tsdb/wal: Avoid writing closed channel. (#9566)
Signed-off-by: johncming <johncming@yahoo.com>
2021-11-06 15:11:06 +05:30
chenlujjj d18e42c650
refine comments of Checkpoint function (#9655)
Signed-off-by: chenlujjj <953546398@qq.com>
2021-11-06 15:09:16 +05:30
Marco Pracucci 309b094b92
Optimized MemPostings.EnsureOrder() (#9673)
* Optimizes MemPostings.EnsureOrder()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Ignore linter warning

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-11-05 10:01:23 +00:00
Ganesh Vernekar c8b267efd6
Get histograms from TSDB to the rate() function implementation
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-11-03 19:04:18 +05:30
Marco Pracucci 9f5ff5b269
Allow to disable trimming when querying TSDB (#9647)
* Allow to disable trimming when querying TSDB

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Addressed review comments

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added unit test

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Renamed TrimDisabled to DisableTrimming

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-11-03 15:38:34 +05:30
Marco Pracucci edd05d7010
Add Head.AppendableMinValidTime() (#9643)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-11-03 13:09:54 +05:30
Mateusz Gozdek b7bdf6fab2 Fix imports formatting
According to
2829908806 (r58457095).

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Julien Pivotto 6e1d6edb33
Exclude agent from windows tests (#9645)
We are aware of the issue, but while we are working on it,
having main tests broken is an annoyance.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-02 13:58:51 +01:00
chenlujjj 660329d5b3
add tombstoneFormatVersionSize & tombstonesCRCSize constants (#9625)
Signed-off-by: chenlujjj <953546398@qq.com>
2021-11-01 16:05:19 +05:30
Praveen Ghuge 64d9b41998
Use testing.T.TempDir() instead of ioutil.TempDir() in tsdb/wal unit tests (#9602)
Signed-off-by: Praveen Ghuge <praveen.ghuge@outlook.com>
2021-11-01 12:28:18 +05:30
Robert Fratto bc72a718c4
Initial draft of prometheus-agent (#8785)
* Initial draft of prometheus-agent

This commit introduces a new binary, prometheus-agent, based on the
Grafana Agent code. It runs a WAL-only version of prometheus without the
TSDB, alerting, or rule evaluations. It is intended to be used to
remote_write to Prometheus or another remote_write receiver.

By default, prometheus-agent will listen on port 9095 to not collide
with the prometheus default of 9090.

Truncation of the WAL cooperates on a best-effort case with Remote
Write. Every time the WAL is truncated, the minimum timestamp of data to
truncate is determined by the lowest sent timestamp of all samples
across all remote_write endpoints. This gives loose guarantees that data
from the WAL will not try to be removed until the maximum sample
lifetime passes or remote_write starts functionining.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* add tests for Prometheus agent (#22)

* add tests for Prometheus agent

* add tests for Prometheus agent

* rearranged tests as per the review comments

* update tests for Agent

* changes as per code review comments

Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>

* incremental changes to prometheus agent

Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>

* changes as per code review comments

Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>

* Commit feedback from code review

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Port over some comments from grafana/agent

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Rename agent.Storage to agent.DB for tsdb consistency

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Consolidate agentMode ifs in cmd/prometheus/main.go

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Document PreAction usage requirements better for agent mode flags

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* remove unnecessary defaultListenAddr

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* `go fmt ./tsdb/agent` and fix lint errors

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

Co-authored-by: SriKrishna Paparaju <paparaju@gmail.com>
2021-10-29 16:25:05 +01:00
Xiaochao Dong c2d1c85857
close tsdb.head in test case (#9580)
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2021-10-26 11:36:25 +05:30
Furkan Türkal 0c07663b70
fix: possible race on shared variables in test (#9470)
Fixes #9433

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
2021-10-25 18:44:40 +05:30
Dieter Plaetinck d5bfbe3114
improve bstream comments and doc (#9560)
* improve bstream comments and doc

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-10-25 18:44:15 +05:30
Julien Pivotto 73255e15f6 Address golint failures from revive
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-23 00:53:11 +02:00
Serge Catudal 8c3eca84db
Fix remote write receiver endpoint for exemplars (#9414)
Signed-off-by: Serge Catudal <serge.catudal@gmail.com>
2021-10-21 22:58:40 +02:00
beorn7 a9008f5423 Merge branch 'main' into sparsehistogram 2021-10-19 17:14:23 +02:00
beorn7 4998b9750f chunkenc: Bugfix and naming tweaks
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 15:38:32 +02:00
beorn7 78ef9c6359 chunkenc: make xor reading more DRY
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 15:28:33 +02:00
beorn7 4a1b84f8b2 chunkenc: make xor writing more DRY
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 15:28:33 +02:00
Björn Rabenstein 3704c6c20a
Merge pull request #9533 from prometheus/beorn7/sparsehistogram
tsdb: Complete chunk format documentation
2021-10-19 13:51:46 +02:00
beorn7 1a4e54cfbb tsdb: Complete chunk format documentation
This also tweaks and fixes a few things done previously.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-19 13:51:30 +02:00
beorn7 0876d57aea chunkenc: Add test for chunk layout encoding
And fix a bug exposed by it...

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-18 19:37:24 +02:00
beorn7 ad9b4c2b68 Fix typos
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-18 15:44:13 +02:00
beorn7 fe50d6fc14 Update chunk layout documentation
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-15 23:18:41 +02:00
beorn7 ed33aea392 Avoid redundant varint decoding in chunk appender construction
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-15 20:33:14 +02:00
beorn7 d31bb75dc4 Use VarbitUint rather than VarbitInt to encode len(spans)
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-15 15:27:32 +02:00
beorn7 3179215a59 Encode zero threshold first
This guaranees that the zero threshold is byte-aligned. Not sure if
that helps in any way, but at least it won't harm.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-14 14:55:21 +02:00
beorn7 c5522677bf Improve encoding of zero threshold
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-14 14:47:26 +02:00
beorn7 7093b089f2 Use more varbit in histogram chunks
This adds bit buckets for larger numbers to varbit encoding and also
an unsigned version of varbit encoding.

Then, varbit encoding is used for all the histogram chunk data instead
of varint.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-13 20:03:35 +02:00
Björn Rabenstein 7309c20e7e
Merge pull request #9500 from codesome/resettests
Add unit test for counter reset header
2021-10-13 18:19:21 +02:00
Ganesh Vernekar dcaf568279
Metadata -> Layout renaming
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-13 20:27:48 +05:30
Ganesh Vernekar 4e206c7c77
Fix reviews
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-13 20:23:31 +05:30
Dieter Plaetinck d5afe0a577
TSDB: Use a dedicated head chunk reference type (#9501)
* Use dedicated Ref type

Throughout the code base, there are reference types masked as
regular integers.  Let's use dedicated types.  They are
equivalent, but clearer semantically.
This also makes it trivial to find where they are used,
and from uses, find the centralized docs.

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* postpone some work until after possible return

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* clarify

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* rename feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* skip header is up to caller

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-10-13 17:44:32 +05:30
Ganesh Vernekar 85e6686f84
Add unit test for counter reset header
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-13 15:57:42 +05:30
Björn Rabenstein 311673d62e
Save on slice allocations in histogramIterator (#9494)
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-13 14:43:49 +05:30
Björn Rabenstein c450c01eb9
Remove obsolete TODOs about metadata (#9490)
Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-12 17:00:30 +05:30
beorn7 7a8bb8222c Style cleanup of all the changes in sparsehistogram so far
A lot of this code was hacked together, literally during a
hackathon. This commit intends not to change the code substantially,
but just make the code obey the usual style practices.

A (possibly incomplete) list of areas:

* Generally address linter warnings.

* The `pgk` directory is deprecated as per dev-summit. No new packages should
  be added to it. I moved the new `pkg/histogram` package to `model`
  anticipating what's proposed in #9478.

* Make the naming of the Sparse Histogram more consistent. Including
  abbreviations, there were just too many names for it: SparseHistogram,
  Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in
  general. Only add "Sparse" if it is needed to avoid confusion with
  conventional Histograms (which is rare because the TSDB really has no notion
  of conventional Histograms). Use abbreviations only in local scope, and then
  really abbreviate (not just removing three out of seven letters like in
  "Histo"). This is in the spirit of
  https://github.com/golang/go/wiki/CodeReviewComments#variable-names

* Several other minor name changes.

* A lot of formatting of doc comments. For one, following
  https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
  , but also layout question, anticipating how things will look like
  when rendered by `godoc` (even where `godoc` doesn't render them
  right now because they are for unexported types or not a doc comment
  at all but just a normal code comment - consistency is queen!).

* Re-enabled `TestQueryLog` and `TestEndopints` (they pass now,
  leaving them disabled was presumably an oversight).

* Bucket iterator for histogram.Histogram is now created with a
  method.

* HistogramChunk.iterator now allows iterator recycling. (I think
  @dieterbe only commented it out because he was confused by the
  question in the comment.)

* HistogramAppender.Append panics now because we decided to treat
  staleness marker differently.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-11 13:02:03 +02:00
beorn7 fd5ea4e0b5 Merge branch 'main' into sparsehistogram 2021-10-07 23:16:42 +02:00
Ganesh Vernekar 5d4dc7e413
Convert the header into an enum
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-07 19:53:24 +05:30
Ganesh Vernekar 175ef4ebcf
Add a NotCounterReset flag
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-06 16:06:34 +05:30
Ganesh Vernekar a280b6c2da
Fix review comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-06 15:28:10 +05:30
Ganesh Vernekar 10d4cb6dc0
Merge remote-tracking branch 'upstream/main' into release-2.30-merge 2021-10-05 20:35:14 +05:30
Ganesh Vernekar 9d81b2d610
Fix tests
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-05 14:21:24 +05:30
Ganesh Vernekar 59d77ff16d
Fix build
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-05 12:57:49 +05:30
Ganesh Vernekar 10bc6e80ee
Fix panic on failed snapshot replay and don't hard fail replay on disabled exemplars (#9438)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-05 10:51:25 +05:30
Ganesh Vernekar eb9931e961
Add info about counter resets in chunk meta
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-04 18:44:12 +05:30
Ganesh Vernekar b30db03f35
Cut v2.30.2 (#9426)
* Don't error on overlapping m-mapped chunks during WAL replay (#9381)

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Reduce log level during WAL replay on overlapping m-map chunks (#9425)

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Cut v2.30.2

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-01 17:00:22 +05:30
Ganesh Vernekar a7d499e19a
Reduce log level during WAL replay on overlapping m-map chunks (#9425)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-01 15:33:29 +05:30
Ganesh Vernekar 8c597e5166
Don't error on overlapping m-mapped chunks during WAL replay (#9381)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-01 14:34:12 +05:30
Ganesh Vernekar 1dd22ed655
Support stale samples for sparse histograms (#9352)
* Support stale samples for sparse histograms

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Don't cut a new chunk for every stale sample

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Update comments for HistoAppender.Appendable

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-10-01 13:41:51 +05:30
Bryan Boreham 1fb3c1b598
Replace calls to strings.Compare (#9397)
< is clearer and faster. As the documentation says,
"Basically no one should use strings.Compare."

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-09-27 17:33:53 +05:30
Ganesh Vernekar 2bcd9f2f69
Link 2 more TSDB blog posts in tsdb/README.md (#9371)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-09-21 21:35:33 +05:30
Darshan Chaudhary 1f688657bf
Call delete on head if interval overlaps (#9151)
* Call delete on head if interval overlaps

Signed-off-by: darshanime <deathbullet@gmail.com>

* Garbage collect tombstones during head gc

Signed-off-by: darshanime <deathbullet@gmail.com>

* Truncate tombstones before min time during head gc

Signed-off-by: darshanime <deathbullet@gmail.com>

* Lock less by deleting all keys in a single pass

Signed-off-by: darshanime <deathbullet@gmail.com>

* Pass map to DeleteTombstones

Signed-off-by: darshanime <deathbullet@gmail.com>

* Create new slice to replace old one

Signed-off-by: darshanime <deathbullet@gmail.com>
2021-09-16 12:20:03 +05:30
Ganesh Vernekar 30534e99d9
Take snapshot only after closing the WAL (#9328)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-09-13 18:30:41 +05:30
Ganesh Vernekar 8944520ccc
Fix deletion of old snapshots (#9314)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-09-08 19:53:44 +05:30
Bryan Boreham 2327236bb5
Decrement active_appenders metric when no samples added (#9230)
* Decrement active_appenders metric when no samples added

Also add a test that the metric is incremented and decremented as
expected with and without samples.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Fix comment

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-09-08 14:49:58 +05:30
Bryan Boreham 87d909df4a
Remove symbols map from TSDB head (#9301)
This saves memory, effort and locking.

Since every symbol is also added to postings, `Symbols()` can be
implemented there instead. This now has to build a map for
deduplication, but `Symbols()` is only called for compaction, and `gc()`
used to rebuild the symbols map after every compaction so not an
additional cost.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-09-08 14:48:48 +05:30
Callum Styan 93886d8417
Fix div by 0 panic is resize function. (#9286)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2021-09-02 11:08:05 -07:00
Ganesh Vernekar 1315d8ecb6
Remove query hacks in the API and fix metrics (#9275)
* Remove query hacks in the API and fix metrics

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Tests for the metrics

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Better way to count series on restart

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-31 17:01:19 +05:30
Ganesh Vernekar 35b1a82594
Exemplars in snapshot (#9255)
* Exemplars in snapshot

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add docs

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-30 19:34:38 +05:30
Ganesh Vernekar eeace6bcab
Add couple of metrics to track sparse histograms in TSDB (#9271)
* Add couple of metrics to track sparse histograms in TSDB

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix Beorn's comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-30 19:08:44 +05:30
Levi Harrison 06afe6162c
Also ignore func1
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-28 22:42:22 -04:00
Julien Pivotto d5676fb9e0
Merge pull request #9254 from prometheus/superq/go1.17
Build with Go 1.17 / npm 7 / node 16
2021-08-28 18:36:42 +02:00
SuperQ e167a45c65
Add new Go build tags.
Add new go:build comments based on 1.17 formatting[0].

[0]: https://golang.org/doc/go1.17#gofmt

Signed-off-by: SuperQ <superq@gmail.com>
2021-08-27 10:24:14 +02:00
Callum Styan cc55e57c1b
Fix a data race in the loadWAL function caused by reusing the same error var in multiple goroutines (#9259)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2021-08-27 11:49:34 +05:30
Ganesh Vernekar 8a5d8c15e3
Do not replay checkpoint if it is covered by snapshot (#9226)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-25 21:48:55 +05:30
Ganesh Vernekar c373200b75
Cut a new chunk on counter resets for any bucket
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-24 20:02:08 +05:30
Bryan Boreham 9dfdc3eb36
Speed up BenchmarkPostings_Stats (#9213)
The previous code re-used series IDs 1-1000 many times over, so a lot
of time was spent ensuring the lists of series were in ascending order.
The intended use of `MemPostings.Add()` is that all series IDs are
unique, and changing the benchmark to do this lets it finish ten times
faster.

(It doesn't affect the benchmark result, just the setup code)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-08-18 10:27:21 +01:00
Ganesh Vernekar 328a74ca36
Fix bugs and add enhancements to the chunk snapshot (#9185)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-17 18:08:16 +01:00
Ganesh Vernekar eedb86783e
Fix queries on blocks for sparse histograms and add unit test (#9209)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-16 18:52:29 +05:30
Ganesh Vernekar 42f576aa18
Add test for sparse histogram compaction (#9208)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-16 16:32:23 +05:30
Jupiter 84ab705318
32 should better be replaced by "symbolFactor" (#9203)
Signed-off-by: tanghengjian <1040104807@qq.com>
2021-08-13 16:38:53 +05:30
Marco Pracucci 84e786ebc1
Fixed Decoder.Series() error checking (#9201)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-08-13 16:11:41 +05:30
Julien Pivotto cab96a06ef
Merge release 2.29 in main (#9196)
* PromQL: Fix start and end keywords masking label and metric names

This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* Add in additional tests for metrics and/or labels called start/end.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* *: Cut 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* VERSION: bump to 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Remove experimental wording on size-based retention

Followup of #9004

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix PR reference in changelog

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zone IDs at most once per refresh (#9142)

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zones at most once per SD load

Closes #9142.

Signed-off-by: George Brighton <george@gebn.co.uk>

* Incorporate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Integrate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Add a compatibility note for macOS users.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* *: Cut v2.29.0-rc.1

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Fix `kuma_sd` targetgroup reporting (#9157)

* Bundle all xDS targets into a single group

Signed-off-by: austin ce <austin.cawley@gmail.com>

* *: cut v2.29.0-rc.2

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Rename links

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* bump codemirror-promql to 0.17.0

Signed-off-by: Augustin Husson <husson.augustin@gmail.com>

* *: cut v2.29.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* tsdb: align atomically accessed int64 (#9192)

This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG

Fixed #9190

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Release 2.29.1 (#9193)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
2021-08-12 18:38:06 +02:00
Ganesh Vernekar f0688c21d6
Log sparse histograms into WAL and replay from it (#9191)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-11 17:38:48 +05:30
Ganesh Vernekar 095f572d4a
Sync sparsehistogram branch with main (#9189)
* Fix `kuma_sd` targetgroup reporting (#9157)

* Bundle all xDS targets into a single group

Signed-off-by: austin ce <austin.cawley@gmail.com>

* Snapshot in-memory chunks on shutdown for faster restarts (#7229)

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Rename links

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921)

* Moved everything to nPending buffer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Simplify exemplar capacity addition

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added pre-allocation

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Don't allocate if not sending exemplars

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Avoid deadlock when processing duplicate series record (#9170)

* Avoid deadlock when processing duplicate series record

`processWALSamples()` needs to be able to send on its output channel
before it can read the input channel, so reads to allow this in case the
output channel is full.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* processWALSamples: update comment

Previous text seems to relate to an earlier implementation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Optimise WAL loading by removing extra map and caching min-time (#9160)

* BenchmarkLoadWAL: close WAL after use

So that goroutines are stopped and resources released

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: make series IDs co-prime with #workers

Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: simulate mmapped chunks

Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Fix comment

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Remove series map from processWALSamples()

The locks that is commented to reduce contention in are now sharded
32,000 ways, so won't be contended. Removing the map saves memory and
goes just as fast.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* loadWAL: Cache the last mmapped chunk time

So we can skip calling append() for samples it will reject.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Improvements from code review

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Full stops and capitals on comments

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Cache max time in both places mmappedChunks is updated

Including refactor to extract function `setMMappedChunks`, to reduce
code duplication.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Update head min/max time when mmapped chunks added

This ensures we have the correct values if no WAL samples are added for
that series.

Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Split Go and React Tests (#8897)

* Added go-ci and react-ci

Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove search keymap from new expression editor (#9184)

Signed-off-by: Julius Volz <julius.volz@gmail.com>

Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
2021-08-11 15:43:17 +05:30
Bryan Boreham 040ef175eb
Optimise WAL loading by removing extra map and caching min-time (#9160)
* BenchmarkLoadWAL: close WAL after use

So that goroutines are stopped and resources released

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: make series IDs co-prime with #workers

Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: simulate mmapped chunks

Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Fix comment

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Remove series map from processWALSamples()

The locks that is commented to reduce contention in are now sharded
32,000 ways, so won't be contended. Removing the map saves memory and
goes just as fast.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* loadWAL: Cache the last mmapped chunk time

So we can skip calling append() for samples it will reject.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Improvements from code review

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Full stops and capitals on comments

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Cache max time in both places mmappedChunks is updated

Including refactor to extract function `setMMappedChunks`, to reduce
code duplication.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Update head min/max time when mmapped chunks added

This ensures we have the correct values if no WAL samples are added for
that series.

Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-08-10 14:53:31 +05:30
Bryan Boreham 7407457243
Avoid deadlock when processing duplicate series record (#9170)
* Avoid deadlock when processing duplicate series record

`processWALSamples()` needs to be able to send on its output channel
before it can read the input channel, so reads to allow this in case the
output channel is full.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* processWALSamples: update comment

Previous text seems to relate to an earlier implementation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-08-10 13:32:42 +05:30
Ganesh Vernekar ee7e0071d1
Snapshot in-memory chunks on shutdown for faster restarts (#7229)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 17:51:01 +01:00
Ganesh Vernekar 19e98e5469
Support storing the zero threshold in the histogram chunk (#9165)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 18:08:41 +05:30
Ganesh Vernekar 7026e6b4e4
Fix tests in histo_test.go (#9163)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 14:26:56 +05:30
Ganesh Vernekar 8b70e87ab9
Merge remote-tracking branch 'upstream/main' into sparse-refactor
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-05 12:16:08 +05:30