While implementing a different feature, I found that Labels.Get() was
performing a linear search. I wondered whether it would perform any
better with a binary search approach, and wrote a benchmark: the answer
is that it's probably not worth it, so I just decided to leave the
benchmark and the results for the next reader.
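For illustration, here is a minimal sketch of the kind of comparison the
benchmark makes; the `label` type and the helper functions below are
stand-ins, not the actual pkg/labels code:

```go
// A standalone micro-benchmark sketch; put it in a *_test.go file.
package labelsbench

import (
	"sort"
	"testing"
)

// label is an illustrative stand-in for labels.Label; labels.Labels is a
// slice of these, kept sorted by Name.
type label struct{ Name, Value string }

var ls = []label{
	{"__name__", "up"}, {"instance", "localhost:9090"}, {"job", "prometheus"},
}

// getLinear scans the slice front to back, like the current Labels.Get().
func getLinear(ls []label, name string) string {
	for _, l := range ls {
		if l.Name == name {
			return l.Value
		}
	}
	return ""
}

// getBinary exploits the sort order with sort.Search instead.
func getBinary(ls []label, name string) string {
	i := sort.Search(len(ls), func(i int) bool { return ls[i].Name >= name })
	if i < len(ls) && ls[i].Name == name {
		return ls[i].Value
	}
	return ""
}

func BenchmarkGetLinear(b *testing.B) {
	for i := 0; i < b.N; i++ {
		getLinear(ls, "job")
	}
}

func BenchmarkGetBinary(b *testing.B) {
	for i := 0; i < b.N; i++ {
		getBinary(ls, "job")
	}
}
```

With label sets this small, the branchier binary search has little room
to win, which matches the conclusion above.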
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use global string map for MatchType.String()
We were unnecessarily creating a new map for each call to String().
This is a 10x improvement in MatchType.String() time, from 53ns to 4ns
on my i7 laptop, and I guess this method is called quite often, so
there's no point in wasting those resources.
I was surprised that the benchmark reports that there are no
allocations made in the old version.
I also tried using `//go:generate stringer`, and the result is even
better, at about 2.8ns, but having to keep the generated code updated
isn't worth the change (at least it's a bigger change than the small
one I intended to make).
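For reference, the global-map version looks roughly like this (a
sketch; the variable name and the panic message are illustrative):

```go
package labels

// MatchType mirrors the existing iota-based type.
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchNotEqual
	MatchRegexp
	MatchNotRegexp
)

// typeToStr is built once at package init instead of on every String() call.
var typeToStr = map[MatchType]string{
	MatchEqual:     "=",
	MatchNotEqual:  "!=",
	MatchRegexp:    "=~",
	MatchNotRegexp: "!~",
}

func (m MatchType) String() string {
	if str, ok := typeToStr[m]; ok {
		return str
	}
	panic("unknown match type")
}
```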
Benchmark comparison:
name \ time/op    old          global_map  stringer
MatchType_String  53.6ns ± 1%  4.1ns ± 1%  2.8ns ± 1%

name \ alloc/op   old          global_map  stringer
MatchType_String  0.00B        0.00B       0.00B

name \ allocs/op  old          global_map  stringer
MatchType_String  0.00         0.00        0.00
Old benchmark:
goos: darwin
goarch: amd64
pkg: github.com/prometheus/prometheus/pkg/labels
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkMatchType_String 21766578 54.36 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 21742339 53.28 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 21985470 53.37 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 21676282 53.50 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 22075573 53.33 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/prometheus/prometheus/pkg/labels 6.252s
New with global map:
goos: darwin
goarch: amd64
pkg: github.com/prometheus/prometheus/pkg/labels
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkMatchType_String 283412692 4.129 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 294859941 4.091 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 295750158 4.113 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 282827982 4.072 ns/op 0 B/op 0 allocs/op
BenchmarkMatchType_String 292942393 4.047 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/prometheus/prometheus/pkg/labels 8.238s
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use array instead of map
Since MatchType is an iota type, we can safely use an array here.
This solution is even better:
name \ time/op    old          global_map  stringer    array
MatchType_String  53.6ns ± 1%  4.1ns ± 1%  2.8ns ± 1%  1.0ns ± 1%

name \ alloc/op   old          global_map  stringer    array
MatchType_String  0.00B        0.00B       0.00B       0.00B

name \ allocs/op  old          global_map  stringer    array
MatchType_String  0.00         0.00        0.00        0.00
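The array variant, using the same MatchType constants as in the sketch
above, is roughly (again illustrative; the real code may name things
differently):

```go
// Indexed directly by the iota value, so String() is a bounds check plus an
// array load instead of a map lookup.
var typeToStr = [...]string{
	MatchEqual:     "=",
	MatchNotEqual:  "!=",
	MatchRegexp:    "=~",
	MatchNotRegexp: "!~",
}

func (m MatchType) String() string {
	if m < MatchEqual || m > MatchNotRegexp {
		panic("unknown match type")
	}
	return typeToStr[m]
}
```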
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Benchmark all MatchType values
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use constants for limits
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
The previous code re-used series IDs 1-1000 many times over, so a lot
of time was spent ensuring the lists of series were in ascending order.
The intended use of `MemPostings.Add()` is that all series IDs are
unique, and changing the benchmark to do this lets it finish ten times
faster.
(It doesn't affect the benchmark result, just the setup code)
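To illustrate why re-used IDs are costly (this is a hypothetical
helper, not the MemPostings code): each postings list is kept in
ascending order, so an ID smaller than the current tail has to be
bubbled back into place, while monotonically increasing IDs are a
plain append.

```go
// addSorted keeps list in ascending order. With unique, increasing IDs the
// loop body never executes; with re-used small IDs it walks a long way back
// on every call, which is where the old benchmark spent its time.
func addSorted(list []uint64, id uint64) []uint64 {
	list = append(list, id)
	for i := len(list) - 1; i > 0 && list[i] < list[i-1]; i-- {
		list[i], list[i-1] = list[i-1], list[i]
	}
	return list
}
```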
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes #9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic on 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG
Fixed #9190
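The shape of the fix, as a sketch with illustrative field names:
sync/atomic only guarantees 64-bit alignment for the first word of an
allocated struct on 32-bit platforms, so atomically accessed int64
fields have to be declared first.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// walStats is illustrative. On 386/arm, only the first word of an allocated
// struct is guaranteed to be 64-bit aligned, so the atomically accessed
// field must come before the others or atomic.AddInt64 may panic.
type walStats struct {
	samplesAppended int64 // accessed with sync/atomic; keep first
	mtx             sync.Mutex
	name            string
}

func main() {
	s := &walStats{name: "example"}
	atomic.AddInt64(&s.samplesAppended, 1)
	fmt.Println(atomic.LoadInt64(&s.samplesAppended))
}
```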
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* BenchmarkLoadWAL: close WAL after use
So that goroutines are stopped and resources released
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* BenchmarkLoadWAL: make series IDs co-prime with #workers
Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.
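A toy illustration of the sharding (the worker count here is made up):
with IDs that are multiples of 100 and, say, 10 workers, `id % workers`
is always 0 and a single worker ends up with every series.

```go
package main

import "fmt"

func main() {
	const workers = 10
	counts := make([]int, workers)
	for id := 100; id <= 1000; id += 100 { // multiples of 100
		counts[id%workers]++
	}
	fmt.Println(counts) // [10 0 0 0 0 0 0 0 0 0]: one worker does all the work
}
```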
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* BenchmarkLoadWAL: simulate mmapped chunks
Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Fix comment
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Remove series map from processWALSamples()
The locks that the comment says reduce contention are now sharded
32,000 ways, so they won't be contended. Removing the map saves memory
and is just as fast.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* loadWAL: Cache the last mmapped chunk time
So we can skip calling append() for samples it will reject.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Improvements from code review
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Full stops and capitals on comments
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Cache max time in both places mmappedChunks is updated
This includes a refactor to extract the function `setMMappedChunks`,
to reduce code duplication.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update head min/max time when mmapped chunks added
This ensures we have the correct values if no WAL samples are added for
that series.
Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Avoid deadlock when processing duplicate series record
`processWALSamples()` needs to be able to send on its output channel
before it can read from its input channel, so reads from the output
channel are added to allow this in case it is full.
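A generic, self-contained sketch of the pattern (the channel names and
the worker are illustrative, not the loadWAL code): the sender keeps
draining the worker's output while trying to hand over new input, so
neither side can block the other forever.

```go
package main

import "fmt"

// sendWork tries to hand an item to the worker, but keeps draining the
// worker's results while doing so. Without the drain both goroutines can
// block forever: the worker stuck sending on a full output channel, the
// feeder stuck sending on a full input channel.
func sendWork(in chan<- int, out <-chan int, item int) {
	for {
		select {
		case in <- item:
			return
		case v := <-out:
			fmt.Println("result while waiting:", v)
		}
	}
}

func main() {
	in := make(chan int)  // unbuffered: worker must be ready to receive
	out := make(chan int) // unbuffered: worker blocks until someone reads
	go func() {
		for item := range in {
			out <- item * 2 // worker sends before reading the next item
		}
		close(out)
	}()
	for i := 0; i < 3; i++ {
		sendWork(in, out, i)
	}
	close(in)
	for v := range out {
		fmt.Println("remaining result:", v)
	}
}
```

If `sendWork` did a bare `in <- item`, the second iteration would
already deadlock: the worker is blocked sending its previous result
while the feeder is blocked sending new input.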
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* processWALSamples: update comment
Previous text seems to relate to an earlier implementation.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Moved everything to nPending buffer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Simplify exemplar capacity addition
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added pre-allocation
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't allocate if not sending exemplars
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* optimize Linode SD by polling for event changes during refresh
Most accounts are fairly "static", in the sense that they're not cycling
through instances constantly. So rather than do a full refresh every
interval and potentially make several behind-the-scenes paginated API
calls, this will now poll the `/account/events/` endpoint every minute
with a list of events that we care about. If a matching event is found,
we then do a full refresh.
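Schematically the refresh loop becomes something like this;
`listRecentEvents`, `fullRefresh`, and the event names are hypothetical
stand-ins for the real linodego calls and event types:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Hypothetical stand-ins for the real Linode API calls.
func listRecentEvents(ctx context.Context, since time.Time) ([]string, error) {
	return []string{"linode_create"}, nil // pretend an instance was created
}

func fullRefresh(ctx context.Context) error {
	fmt.Println("doing the expensive, paginated full refresh")
	return nil
}

// relevant lists the event types that should trigger a full refresh.
var relevant = map[string]bool{
	"linode_create": true,
	"linode_delete": true,
	"linode_boot":   true,
}

// pollOnce hits the cheap events endpoint first and only falls back to the
// full listing when a matching event is found; otherwise the cached targets
// are kept as-is.
func pollOnce(ctx context.Context, lastPoll time.Time) error {
	events, err := listRecentEvents(ctx, lastPoll)
	if err != nil {
		return err
	}
	for _, e := range events {
		if relevant[e] {
			return fullRefresh(ctx)
		}
	}
	return nil
}

func main() {
	_ = pollOnce(context.Background(), time.Now().Add(-time.Minute))
}
```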
Co-authored-by: William Smith <wsmith@linode.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: William Smith <wsmith@linode.com>
I was struggling to understand the purpose of this method until I
tweaked the tests, so I decided to write down my observations.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Refactor: pass segment-reading function as param
To allow a different implementation to be used when garbage-collecting.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* remote_write: reduce blocking from GC of series
Add a method `UpdateSeriesSegment()` which is used together with
`SeriesReset()` to garbage-collect old series. This allows us to
split the lock around queueManager series data and avoid blocking
`Append()` while reading series from the last checkpoint.
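As a rough, generic sketch of the idea (not the actual queueManager
types or signatures): instead of rebuilding the series map under one
long-held lock while the checkpoint is read, each live series just gets
its segment number bumped, and a short final pass drops whatever is
still stuck on an older segment.

```go
package main

import "sync"

// seriesStore is illustrative only; the real code lives in the queue manager.
type seriesStore struct {
	mtx     sync.Mutex
	segment map[uint64]int // series ID -> last WAL segment it was seen in
}

// UpdateSeriesSegment marks a series as still live in the given segment.
// It is called repeatedly while the checkpoint is read, holding the lock
// only briefly each time, so Append() is never blocked for the whole read.
func (s *seriesStore) UpdateSeriesSegment(id uint64, segment int) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.segment[id] = segment
}

// SeriesReset drops every series last seen before the given segment.
func (s *seriesStore) SeriesReset(segment int) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	for id, seg := range s.segment {
		if seg < segment {
			delete(s.segment, id)
		}
	}
}

func main() {
	s := &seriesStore{segment: map[uint64]int{1: 3, 2: 4}}
	s.UpdateSeriesSegment(1, 5) // series 1 found in the checkpoint
	s.SeriesReset(5)            // series 2 was not seen again and is dropped
}
```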
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Cosmetic: review feedback on comments
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* remote-write benchmark: include GC of series
Reduce the total number of samples per iteration from 5000*5000
(25 million), which is too big for my laptop, to 1*10000.
Extend `createTimeseries()` to add additional labels, so that the
queue manager is doing more realistic work.
Move the Append() call to a background goroutine - this works because
TestWriteClient uses a WaitGroup to signal completion.
Call `StoreSeries()` and `SeriesReset()` while adding samples, to
simulate the garbage-collection that wal.Watcher does.
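A toy version of the completion signalling (everything here is
illustrative; in the real benchmark TestWriteClient plays the consumer
role): the producer appends from a background goroutine, the consumer
calls Done() per delivered batch, and the benchmark body is free to do
the series-GC work before waiting.

```go
package main

import "sync"

func main() {
	const batches = 10
	var wg sync.WaitGroup
	wg.Add(batches)

	deliveries := make(chan int, batches)
	go func() { // background "Append" loop
		for i := 0; i < batches; i++ {
			deliveries <- i
		}
		close(deliveries)
	}()
	go func() { // stand-in for the write client acknowledging batches
		for range deliveries {
			wg.Done()
		}
	}()

	// The benchmark body can simulate StoreSeries()/SeriesReset() here,
	// concurrently with delivery, and then wait for everything to land.
	wg.Wait()
}
```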
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Change BenchmarkSampleDelivery to call UpdateSeriesSegment
This matches what Watcher.garbageCollectSeries() is doing now
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>