Commit graph

359 commits

Author SHA1 Message Date
Jupiter 84ab705318
32 should better be replaced by "symbolFactor" (#9203)
Signed-off-by: tanghengjian <1040104807@qq.com>
2021-08-13 16:38:53 +05:30
Marco Pracucci 84e786ebc1
Fixed Decoder.Series() error checking (#9201)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-08-13 16:11:41 +05:30
Julien Pivotto cab96a06ef
Merge release 2.29 in main (#9196)
* PromQL: Fix start and end keywords masking label and metric names

This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* Add in additional tests for metrics and/or labels called start/end.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* *: Cut 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* VERSION: bump to 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Remove experimental wording on size-based retention

Followup of #9004

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix PR reference in changelog

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zone IDs at most once per refresh (#9142)

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zones at most once per SD load

Closes #9142.

Signed-off-by: George Brighton <george@gebn.co.uk>

* Incorporate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Integrate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Add a compatibility note for macOS users.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* *: Cut v2.29.0-rc.1

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Fix `kuma_sd` targetgroup reporting (#9157)

* Bundle all xDS targets into a single group

Signed-off-by: austin ce <austin.cawley@gmail.com>

* *: cut v2.29.0-rc.2

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Rename links

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* bump codemirror-promql to 0.17.0

Signed-off-by: Augustin Husson <husson.augustin@gmail.com>

* *: cut v2.29.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* tsdb: align atomically accessed int64 (#9192)

This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG

Fixed #9190

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Release 2.29.1 (#9193)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
2021-08-12 18:38:06 +02:00
Bryan Boreham 040ef175eb
Optimise WAL loading by removing extra map and caching min-time (#9160)
* BenchmarkLoadWAL: close WAL after use

So that goroutines are stopped and resources released

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: make series IDs co-prime with #workers

Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: simulate mmapped chunks

Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Fix comment

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Remove series map from processWALSamples()

The locks that is commented to reduce contention in are now sharded
32,000 ways, so won't be contended. Removing the map saves memory and
goes just as fast.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* loadWAL: Cache the last mmapped chunk time

So we can skip calling append() for samples it will reject.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Improvements from code review

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Full stops and capitals on comments

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Cache max time in both places mmappedChunks is updated

Including refactor to extract function `setMMappedChunks`, to reduce
code duplication.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Update head min/max time when mmapped chunks added

This ensures we have the correct values if no WAL samples are added for
that series.

Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-08-10 14:53:31 +05:30
Bryan Boreham 7407457243
Avoid deadlock when processing duplicate series record (#9170)
* Avoid deadlock when processing duplicate series record

`processWALSamples()` needs to be able to send on its output channel
before it can read the input channel, so reads to allow this in case the
output channel is full.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* processWALSamples: update comment

Previous text seems to relate to an earlier implementation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-08-10 13:32:42 +05:30
Ganesh Vernekar ee7e0071d1
Snapshot in-memory chunks on shutdown for faster restarts (#7229)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 17:51:01 +01:00
Ganesh Vernekar 848cb5a6d6
Enhanced WAL replay for duplicate series record (#7438)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-03 20:03:54 +05:30
Ganesh Vernekar 8002a3ab80
Breakdown tsdb/head.go into multiple files (#9147)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-03 14:14:26 +02:00
jinglina 1a430e5f89
remove redundant parentheses (#9134)
Signed-off-by: jinglina <jinglinax@163.com>
2021-07-29 18:26:57 +05:30
Darshan Chaudhary c4f2e9eec5
Add present_over_time (#9097)
* Add present_over_time

Signed-off-by: darshanime <deathbullet@gmail.com>

* Add tests for present_over_time

Signed-off-by: darshanime <deathbullet@gmail.com>

* Address PR comments

Signed-off-by: darshanime <deathbullet@gmail.com>

* Add documentation for present_over_time

Signed-off-by: darshanime <deathbullet@gmail.com>

* Update documentation

Signed-off-by: darshanime <deathbullet@gmail.com>

* Update documentation comment

Signed-off-by: darshanime <deathbullet@gmail.com>
2021-07-29 12:38:11 +02:00
Oleg Zaytsev f9482c5bf6
Clarify computeChunkEndTime's purpose (#9049)
I was struggling to understand the purpose of this method until I
tweaked the tests, so I decided to write down my observations.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2021-07-28 18:39:05 +05:30
Bryan Boreham 60804c5a09
remote_write: reduce blocking from garbage-collect of series (#9109)
* Refactor: pass segment-reading function as param

To allow a different implementation to be used when garbage-collecting.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* remote_write: reduce blocking from GC of series

Add a method `UpdateSeriesSegment()` which is used together with
`SeriesReset()` to garbage-collect old series. This allows us to
split the lock around queueManager series data and avoid blocking
`Append()` while reading series from the last checkpoint.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Cosmetic: review feedback on comments

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* remote-write benchmark: include GC of series

Reduce the total number of samples per iteration from 5000*5000
(25 million) which is too big for my laptop, to 1*10000.

Extend `createTimeseries()` to add additional labels, so that the
queue manager is doing more realistic work.

Move the Append() call to a background goroutine - this works because
TestWriteClient uses a WaitGroup to signal completion.

Call `StoreSeries()` and `SeriesReset()` while adding samples, to
simulate the garbage-collection that wal.Watcher does.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Change BenchmarkSampleDelivery to call UpdateSeriesSegment

This matches what Watcher.garbageCollectSeries() is doing now

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-07-27 13:21:48 -07:00
Bryan Boreham dea37853d9
tsdb: use dennwc/varint to speed up WAL decoding (#9106)
* tsdb: use dennwc/varint to speed up decoding

This is a tiny library, MIT-licensed, which unrolls the loop to go
about twice as fast.

Needed to copy the sign-inverting logic inline, previously provided by
the `binary` package.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* More comments to explain varint decoding

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-07-27 10:02:57 +05:30
Bryan Boreham 6788760efa
Reduce memory allocation in benchmarkIterator() (#5983)
Previously it was allocating millions of chunks, all containing the
same 250 samples.  Above some ratio of CPU performance to available
memory, the benchmark cannot run.

Make 250 a const and just allocate one chunk which we iterate
repeatedly till we reach the benchmark count.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-07-26 19:36:54 +05:30
Oleg Zaytsev b1ed4a0a66
LabelNames API with matchers (#9083)
* Push the matchers for LabelNames all the way into the index.

NB This doesn't actually implement it in the index, just plumbs it through for now...

Signed-off-by: Tom Wilkie <tom@grafana.com>

* Hack it up.  Does not work.

Signed-off-by: Tom Wilkie <tom@grafana.com>

* Revert changes I don't understand

Can't see why do we need to hold a mutex on symbols, and the purpose of
the LabelNamesFor method.

Maybe I'll need to re-add this later.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Implement LabelNamesFor

This method provides the label names that appear in the postings
provided. We do that deeper than the label values because we know
beforehand that most of the label names we'll be the same across
different postings, and we don't want to go down an up looking up the
same symbols for all different series.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Mutex on symbols should be unlocked

However, I still don't understand why do we need a mutex here.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix head.LabelNamesFor

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Implement mockIndex LabelNames with matchers

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Nitpick on slice initialisation

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add tests for LabelNamesWithMatchers

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix the mutex mess on head.LabelValues/LabelNames

I still don't see why we need to grab that unrelated mutex, but at least
now we're grabbing it consistently

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Check error after iterating postings

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Use the error from posting when there was en error in postings

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update storage/interface.go comment

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* Update tsdb/index/index.go comment

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* Update tsdb/index/index.go wrapped error msg

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* Update tsdb/index/index.go wrapped error msg

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* Update tsdb/index/index.go warpped error msg

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

* Remove unneeded comment

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add testcases for LabelNames w/matchers in api.go

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Use t.Cleanup() instead of defer in tests

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Co-authored-by: Tom Wilkie <tom@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-07-20 18:08:08 +05:30
Jupiter 7337ecf0d3
Log when total symbol size exceeds 2^32 bytes. (#9104)
* Compaction fails when total symbol size exceeds 2^32 bytes.
Signed-off-by: tanghengjian <1040104807@qq.com>

* Compaction fails when total symbol size exceeds 2^32 bytes.

Signed-off-by: tanghengjian <1040104807@qq.com>

* Compaction fails when total symbol size exceeds 2^32 bytes.

Signed-off-by: root <tanghengjian@oppo.com>

Co-authored-by: root <tanghengjian@oppo.com>
2021-07-20 15:51:36 +05:30
Ganesh Vernekar 59d02b5ef0
tsdb: Block Head GC till pending readers are done reading (#9081)
* tsdb: Block Head GC till pending readers are done reading

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments 2

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix the exclusiveness of maxt in WaitForPendingReadersInTimeRange

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-07-20 14:17:20 +05:30
Martin Disibio 1bcd13d6b5
Exemplar resize (#8974)
* Create experimental circular buffer resize method, benchmarks

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Optimize exemplar resize to only replay as many exemplars as needed

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* More comments, benchmark AddExemplar

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* optimizations

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* comment

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Slight refactor of resize benchmark + make use of resize via runtime
reloadable storage config.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Some more config related changes.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Refactor to remove usage of noopExemplarStorage and avoid race condition
when resizing from Head code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix or add comments to clarify some of the new behaviour.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix potential panics related to negative exemplar buffer lengths

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
2021-07-20 10:22:57 +05:30
ide-rea 14b24d15b0
update checkpoint replay status (#8898)
* Consider  wal checkpoint replay status

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>

* Fix tests failed

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>

* Update checkpoint replay status

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>
2021-07-13 15:38:07 +05:30
Bryan Boreham dc8f505595
tsdb: coalesce lock/unlock operations for append (#9061)
Fetch the low watermark value under the same lock as we need for the
appender, rather than releasing then re-aquiring a lock on the same
Mutex.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-07-07 18:58:20 +05:30
Oleg Zaytsev 40126a8494
Use binary literals for xor chunk encoding
An opinionated cosmetic change, but since go 1.13 we have this fancy
0b.... literals so we don't need to write hex and comment the binary
value.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2021-07-05 16:39:24 +02:00
Julien Pivotto fa6b2897f0
Merge pull request #8956 from LeviHarrison/fix-tsdb-test-flake
CI: Ignore goleak in TSDB test
2021-06-20 23:41:55 +02:00
Levi Harrison 437c470c40 Added ignore
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-17 10:04:52 -04:00
Levi Harrison 4a4882d4c7 Replace godoc.org links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-17 07:18:51 -04:00
Julien Pivotto b1c179be85
Fix main build (#8948)
Was broken after the merge of #8824

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-06-16 17:18:32 +02:00
Julien Duchesne 8855c2e626
Add prometheus_tsdb_clean_start metric (#8824)
Add cleanup of the lockfile when the db is cleanly closed

The metric describes the status of the lockfile on startup
0: Already existed
1: Did not exist
-1: Disabled

Therefore, if the min value over time of this metric is 0, that means that executions have exited uncleanly
We can then use that metric to have a much lower threshold on the crashlooping alert:

If the metric exists and it has been zero, two restarts is enough to trigger the alarm
If it does not exist (old prom version for example), the current five restarts threshold remains

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Change metric name + set unset value to -1

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Only check the last value of the clean start alert

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Fix test + nit

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2021-06-16 15:03:02 +05:30
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Levi Harrison 7bc11dcb06
React UI: Add Starting Screen (#8662)
* Added walreplay API endpoint

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added starting page to react-ui

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Documented the new endpoint

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed typos

Signed-off-by: Levi Harrison <git@leviharrison.dev>

Co-authored-by: Julius Volz <julius.volz@gmail.com>

* Removed logo

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed isResponding to isUnexpected

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed width of progress bar

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed width of progress bar

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added DB stats object

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Updated starting page to work with new fields

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 2)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 3)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (and also implementing a method this time) (pt. 4)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (and also implementing a method this time) (pt. 5)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed const to let

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 6)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove SetStats method

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added comma

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed api

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed to triple equals

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed data response types

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Don't return pointer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed version

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed interface issue

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed pointer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed copying lock value error

Signed-off-by: Levi Harrison <git@leviharrison.dev>

Co-authored-by: Julius Volz <julius.volz@gmail.com>
2021-06-05 15:29:32 +01:00
Oleg Zaytsev 6d99731303
Single literal regexp value testcase for querier
It's common to see queries like bar=~"foo" from machine generated
queries in the fronted. These are not evaluated as regexps, but are a
single-value-set, i.e. and equality matchings instead.

This is just a testcase for a single-value case.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2021-06-01 09:57:48 +02:00
ide-rea ef584a9df6
Improve wal.go segments sequential validation (#8859)
Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>
2021-05-25 15:38:35 +05:30
kjinan e1370eecde typos correct
Signed-off-by: kjinan <2008kongxiangsheng@163.com>
2021-05-20 09:52:33 +08:00
Julien Pivotto ea33dbf80f
Merge pull request #8822 from kcx2366425574/main
remove unused param
2021-05-19 23:15:17 +02:00
Ben Ye 0a8912433a
allow compact series merger to be configurable (#8836)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-18 18:38:37 +02:00
Matthias Loibl 7e7efaba32
storage: Split chunks if more than 120 samples (#8582)
* storage: Split chunks if more than 120 samples

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* storage: Don't set maxt which is overwritten right away

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* storage: Improve comments on merge_test

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* storage: Improve comments and move code closer to usage

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* tsdb/tsdbutil: Add comment for GenerateSamples

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2021-05-18 18:37:16 +02:00
kjinan 24869ff2d0 typos correct
Signed-off-by: kjinan <2008kongxiangsheng@163.com>
2021-05-14 09:34:44 +08:00
kcx2366425574 be9c870b06 remove the param that is not used
Signed-off-by: kcx2366425574 <kuangcx@inspur.com>
2021-05-13 20:15:13 +08:00
ide-rea 277bac622a
validate exemplar labelSet length first (#8816)
* ignore check exemplar labelSet length when append

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>

* validate exemplar labelSet length firstly

Signed-off-by: XiaoYu Zhang <ideoutrea@163.com>
2021-05-12 20:17:05 +05:30
Callum Styan 8fd73b1d28
Add Exemplar Remote Write support (#8296)
* Write exemplars to the WAL and send them over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update example for exemplars, print data in a more obvious format.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add metrics for remote write of exemplars.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix incorrect slices passed to send in remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* We need to unregister the new metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Order of exemplar append vs write exemplar to WAL needs to change.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Condense sample/exemplar delivery tests to parameterized sub-tests

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename test methods for clarity now that they also handle exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename counter variable. Fix instances where metrics were not updated correctly

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Add exemplars to LoadWAL benchmark

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* last exemplars timestamp metric needs to convert value to seconds with
ms precision

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Process exemplar records in a separate go routine when loading the WAL.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Regenerate types proto with comments, update protoc version again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Put remote write of exemplars behind a feature flag.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some of Ganesh's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move exemplar remote write feature flag to a config file field.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address Bartek's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more reivew comments from Ganesh.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add exemplar total label length check.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address a few last review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-05-06 13:53:52 -07:00
Marco Pracucci 4b49ffbad5
Stop the bleed on chunk mapper panic (#8723)
* Added test to reproduce panic on TSDB head chunks truncated while querying

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added test for Querier too

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Stop the bleed on mmap-ed head chunks panic

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Lower memory pressure in tests to ensure it doesn't OOM

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Experiment to not trigger runtime.GC() continuously

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Try to fix test in CI

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Do not call runtime.GC() at all

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* I have no idea why it's failing in CI, skipping tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-05-06 14:18:59 -06:00
Chris Marchbanks 7c7dafc321
Do not snappy encode if record is too large (#8790)
Snappy cannot encode records larger than ~3.7 GB and will panic if an
encoding is attempted. Check to make sure that the record is smaller
than this before encoding.

In the future, we could improve this behavior to still compress large
records (or break them up into smaller records), but this avoids the
panic for users with very large single scrape targets.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2021-05-06 12:56:45 -06:00
Ben Ye 8f05cd8f9e
tsdb: move exemplar series labels to index entry (#8783)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-05 18:51:16 +01:00
Ben Ye 9e8df5ade9
check latest exemplar timestamp (#8782)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2021-05-05 16:28:48 +01:00
Fiona Liao 9b83d8330a
Fix memSafeIterator.Seek() (#8748)
* Add range query test cases

This includes a couple of failing ones that double count some points due
to the iterator seek bug.

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Add Seek() implementation for memSafeIterator

Previously, calling memSafeIterator.Seek() would call the Seek() method
on its embedded iterator. This was causing the embedded iterator and the
memSafeIterator to get out of sync because when the embedded Seek()
moved to the next element of the embedded iterator, memSafeIterator
didn't "know" about it. memSafeIterator has to "know" when the embedded
iterator has moved to be able to work out when it should be reading from
its buffer rather than the embedded iterator.

Used same logic as for xorIterator.Seek() (which in runtime is used as
the embedded iterator) - return false if the iterator has an error and
try to move to next element if the required time hasn't been reached, or
if no elements have been read yet. The memSafeIterator.Next() method is
being called so memSafeIterator.i is always accurate.

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Add tsdb package test

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
2021-04-27 00:43:22 +02:00
Marco Pracucci 52df5ef7a3
TSDB: do not allocate exemplars buffer if exemplars are disabled (#8746)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-04-21 20:02:21 +05:30
Bartlomiej Plotka 80545bfb2e
Instrumented circular exemplar storage. (#8712)
* Instrumented circular storage.

Fixes: https://github.com/prometheus/prometheus/issues/8708
Fixes: https://github.com/prometheus/prometheus/issues/8707

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed CB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Julien comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Callum comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-04-16 13:44:53 +01:00
nberkley f9e2dd0697
Add support for smaller block chunk segment allocations (#8478)
* Add support for --storage.tsdb.max-chunk-size to suport small chunks for space limited prometheus instances.

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/compact.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/db.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update cmd/prometheus/main.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Change naming scheme to

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Add a lower bound to --storage.tsdb.max-block-chunk-segment-size

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update storage.md to explain what a chunk segment is

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Force tests

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Fix code style

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-04-15 14:25:01 +05:30
Christian Simon 9781e51f59
Correct spelling of "iterable" (#8713)
Signed-off-by: Christian Simon <simon@swine.de>
2021-04-12 21:43:42 +01:00
Guangwen Feng 7985c4e6af Fix golint issue caused by typo
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2021-04-12 09:57:41 +08:00
jinglina 431ea75a11
remove redundant type conversion (#8692)
Signed-off-by: Christina Jing (荆丽娜) <jinglina_lc@inspur.com>
2021-04-05 17:00:48 +05:30
Björn Rabenstein 9549a15c6f
Merge pull request #7675 from JessicaGreben/jg/11-retroactive-rule-eval
Add rule importer to backfill
2021-03-29 19:09:21 +02:00