* support maxBlockDuration for promtool tsdb create-blocks-from rules
Fixes#9465
Signed-off-by: Will Tran <will@autonomic.ai>
* don't hardcode 2h as the default block size in rules test
Signed-off-by: Will Tran <will@autonomic.ai>
* Add a feature flag to enable the new manager
This PR creates a copy of the legacy manager and uses it by default.
It is a companion PR to #9349. With this PR, users can enable the new
discovery manager and provide us with any feedback / side effects that
the new behaviour might have.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Allow to tune the scrape tolerance
In most of the classic monitoring use cases, a few milliseconds
difference can be omitted.
In Prometheus, a few millisecond difference can however make a big
difference.
Currently, Prometheus will ignore up to 2 ms difference in the
alignments.
It turns out that for users who can afford a 10ms difference, there is a
lot of resources and disk space to win, as shown in this graph, which
shows the bytes / samples over a production Prometheus server. You can
clearly see the switch from 2ms to 10ms tolerance.
This pull request enables the adjustment of the scrape timestamp
alignment tolerance.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix golint
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
The compaction analysis which runs under promtool tsdb analyze can be an
intensive process which slows down the entire command.
This commit adds an --extended flag to tsdb analyze which can be toggled
for running long running tasks, such as compaction analysis.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Trade space for speed. Convert all rules into our temporary struct, sort
and then iterate. This is a significant when having many rules.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.
Signed-off-by: SuperQ <superq@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes#9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUGFixed#9190
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* Create experimental circular buffer resize method, benchmarks
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Optimize exemplar resize to only replay as many exemplars as needed
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* More comments, benchmark AddExemplar
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* optimizations
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* comment
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Slight refactor of resize benchmark + make use of resize via runtime
reloadable storage config.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Some more config related changes.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Refactor to remove usage of noopExemplarStorage and avoid race condition
when resizing from Head code.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix or add comments to clarify some of the new behaviour.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix potential panics related to negative exemplar buffer lengths
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
* Extend promtool to support compaction analysis
This commit extends the promtool tsdb analyze command to help
troubleshoot high Prometheus disk usage. The command now plots a
distribution of how full chunks are relative to the maximum capacity of
120 samples per chunk.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
* Update cmd/promtool/tsdb.go
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added feature flag support to unit tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added/fixed tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Addressed review comments
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* promtool: backfill: allow configuring block duration
When backfilling large amounts of data across long periods of time, it
may in certain circumstances be useful to use a longer block duration to
increase the efficiency and speed of the backfilling process. This patch
adds a flag --block-duration-power to allow a user to choose the power N
where the block duration is 2^(N+1)h.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* promtool: use sub-tests in backfill testing
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* backfill: add messages to tests for clarity
When someone new breaks a test, seeing "expected: false, got: true" is
really not useful. A nice message helps here.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* backfill: test long block durations
A test that uses a long block duration to write bigger blocks is added.
The check to make sure all blocks are the default duration is removed.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
When using the backfill command to add data to an ephemeral/test
Prometheus instance, it is not important to see which data was added as
it is often generated ahead of time and mostly irrelevant to the
use-case. The current approach prints information about each block that
is written, but does so in a generally inefficient and costly manner.
This patch adds a `--quiet` flag that allows a user to opt out of this
behavior.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* Added walreplay API endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added starting page to react-ui
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Documented the new endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed typos
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Removed logo
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed isResponding to isUnexpected
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added DB stats object
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Updated starting page to work with new fields
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 2)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 3)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 4)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 5)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed const to let
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 6)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove SetStats method
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added comma
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed api
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed to triple equals
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed data response types
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't return pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed version
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed interface issue
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed copying lock value error
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Update Go modules for 2.26
Bump all Go modules to the latest upstream.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Fix promtool for new client_golang
LabelValues now requires a list of string matchers.
Signed-off-by: Ben Kochie <superq@gmail.com>
By default same value that was hardcoded is used, but with the
new flag added the number of scrapes can be increased to any value.
Signed-off-by: Pau Freixes <pfreixes@gmail.com>
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.
This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Accepting alert_rule_test without alertname is confusing as it will
always pass with empty exp_alerts, and never with non-empty exp_alerts.
Signed-off-by: Raphael Bauduin <raphael.bauduin@tessares.net>
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write. This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series. For now, its easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behing the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
When printing the error, we still need access to the mmapped byte array
of the file. Therefore, we make sure that we run it before closing the
file.
I could have done something more complex like a defer, or not closing
the file, knowing that we would exit the program anyway. However, I
think that in case we extend this in the future, or this is copy/paster
elsewhere, we should continue closing the file. As it is small enough, I
went for the solution to call the function 3 times instead of playing
with a defer.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This commit adds `@ <timestamp>` modifier as per this design doc: https://docs.google.com/document/d/1uSbD3T2beM-iX4-Hp7V074bzBRiRNlqUdcWP6JTDQSs/edit.
An example query:
```
rate(process_cpu_seconds_total[1m])
and
topk(7, rate(process_cpu_seconds_total[1h] @ 1234))
```
which ranks based on last 1h rate and w.r.t. unix timestamp 1234 but actually plots the 1m rate.
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
* Backfilling: optimize for non-consecutive blocks
When you have missing data for > 2 hours, you spend a lot of time
re-reading the complete file. It is not optimal.
This introduces a fastpath for this scenario.
Next, we do parse the metric even when we know we will not use it, based
on its timestamp. This only computes the metric when we know its
timestamp is right.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
I initially thought I could somehow rescue the current column layout
by recycling the tabwriter, but flushing completely blanks
it. However, by setting a minimum width of 13, we get a slightly
broader DURATION column but otherwise nice formatting, unless numbers
get really big, but that's OK, I guess.
Before:
```
BLOCK ULID MIN TIME MAX TIME DURATION NUM SAMPLES NUM CHUNKS NUM SERIES SIZE
01ETN0KGNP5WWK9T5QMQGBG9F1 2020-11-19 07:39:17 +0000 UTC 2020-11-19 07:44:17 +0000 UTC 5m0.001s 8 2 2 624B
01ETN0KGQSFF0AB2QDZVQG3CWC 2020-11-19 10:25:57 +0000 UTC 2020-11-19 10:30:57 +0000 UTC 5m0.001s 8 2 2 622B
01ETN0KGSW8KYP3YPG4X20P60Z 2020-11-19 13:12:37 +0000 UTC 2020-11-19 13:17:37 +0000 UTC 5m0.001s 8 2 2 625B
```
After:
```
BLOCK ULID MIN TIME MAX TIME DURATION NUM SAMPLES NUM CHUNKS NUM SERIES SIZE
01ETN0R72SXN9A1FG732P7KFFN 2020-11-19 07:39:17 +0000 UTC 2020-11-19 07:44:17 +0000 UTC 5m0.001s 8 2 2 624B
01ETN0R74Y9AG1A1MKN4MZK7WM 2020-11-19 10:25:57 +0000 UTC 2020-11-19 10:30:57 +0000 UTC 5m0.001s 8 2 2 622B
01ETN0R76KXZ5VQECMDNES49J6 2020-11-19 13:12:37 +0000 UTC 2020-11-19 13:17:37 +0000 UTC 5m0.001s 8 2 2 625B
```
After without the `-r` flag:
```
BLOCK ULID MIN TIME MAX TIME DURATION NUM SAMPLES NUM CHUNKS NUM SERIES SIZE
01ETN0RFFJ42274NWR1GH0RTV6 1605771557000 1605771857001 5m0.001s 8 2 2 624
01ETN0RFJ1MZCHHS2SBZS8XC27 1605781557000 1605781857001 5m0.001s 8 2 2 622
01ETN0RFM98N3V4KD2DZXFGHGN 1605791557000 1605791857001 5m0.001s 8 2 2 625
```
Signed-off-by: beorn7 <beorn@grafana.com>