* Initial draft of prometheus-agent
This commit introduces a new binary, prometheus-agent, based on the
Grafana Agent code. It runs a WAL-only version of prometheus without the
TSDB, alerting, or rule evaluations. It is intended to be used to
remote_write to Prometheus or another remote_write receiver.
By default, prometheus-agent will listen on port 9095 to not collide
with the prometheus default of 9090.
Truncation of the WAL cooperates on a best-effort case with Remote
Write. Every time the WAL is truncated, the minimum timestamp of data to
truncate is determined by the lowest sent timestamp of all samples
across all remote_write endpoints. This gives loose guarantees that data
from the WAL will not try to be removed until the maximum sample
lifetime passes or remote_write starts functionining.
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* add tests for Prometheus agent (#22)
* add tests for Prometheus agent
* add tests for Prometheus agent
* rearranged tests as per the review comments
* update tests for Agent
* changes as per code review comments
Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>
* incremental changes to prometheus agent
Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>
* changes as per code review comments
Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>
* Commit feedback from code review
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* Port over some comments from grafana/agent
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* Rename agent.Storage to agent.DB for tsdb consistency
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* Consolidate agentMode ifs in cmd/prometheus/main.go
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* Document PreAction usage requirements better for agent mode flags
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* remove unnecessary defaultListenAddr
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* `go fmt ./tsdb/agent` and fix lint errors
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: SriKrishna Paparaju <paparaju@gmail.com>
Avoid using %#v, nothing needs to parse this, so escaping " and so on
leads to hard to read output.
Add new lines, number and indentation to each alert series output.
Signed-off-by: David Leadbeater <dgl@dgl.cx>
* support maxBlockDuration for promtool tsdb create-blocks-from rules
Fixes#9465
Signed-off-by: Will Tran <will@autonomic.ai>
* don't hardcode 2h as the default block size in rules test
Signed-off-by: Will Tran <will@autonomic.ai>
* Add a feature flag to enable the new manager
This PR creates a copy of the legacy manager and uses it by default.
It is a companion PR to #9349. With this PR, users can enable the new
discovery manager and provide us with any feedback / side effects that
the new behaviour might have.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
A lot of this code was hacked together, literally during a
hackathon. This commit intends not to change the code substantially,
but just make the code obey the usual style practices.
A (possibly incomplete) list of areas:
* Generally address linter warnings.
* The `pgk` directory is deprecated as per dev-summit. No new packages should
be added to it. I moved the new `pkg/histogram` package to `model`
anticipating what's proposed in #9478.
* Make the naming of the Sparse Histogram more consistent. Including
abbreviations, there were just too many names for it: SparseHistogram,
Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in
general. Only add "Sparse" if it is needed to avoid confusion with
conventional Histograms (which is rare because the TSDB really has no notion
of conventional Histograms). Use abbreviations only in local scope, and then
really abbreviate (not just removing three out of seven letters like in
"Histo"). This is in the spirit of
https://github.com/golang/go/wiki/CodeReviewComments#variable-names
* Several other minor name changes.
* A lot of formatting of doc comments. For one, following
https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
, but also layout question, anticipating how things will look like
when rendered by `godoc` (even where `godoc` doesn't render them
right now because they are for unexported types or not a doc comment
at all but just a normal code comment - consistency is queen!).
* Re-enabled `TestQueryLog` and `TestEndopints` (they pass now,
leaving them disabled was presumably an oversight).
* Bucket iterator for histogram.Histogram is now created with a
method.
* HistogramChunk.iterator now allows iterator recycling. (I think
@dieterbe only commented it out because he was confused by the
question in the comment.)
* HistogramAppender.Append panics now because we decided to treat
staleness marker differently.
Signed-off-by: beorn7 <beorn@grafana.com>
* Allow to tune the scrape tolerance
In most of the classic monitoring use cases, a few milliseconds
difference can be omitted.
In Prometheus, a few millisecond difference can however make a big
difference.
Currently, Prometheus will ignore up to 2 ms difference in the
alignments.
It turns out that for users who can afford a 10ms difference, there is a
lot of resources and disk space to win, as shown in this graph, which
shows the bytes / samples over a production Prometheus server. You can
clearly see the switch from 2ms to 10ms tolerance.
This pull request enables the adjustment of the scrape timestamp
alignment tolerance.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix golint
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
The compaction analysis which runs under promtool tsdb analyze can be an
intensive process which slows down the entire command.
This commit adds an --extended flag to tsdb analyze which can be toggled
for running long running tasks, such as compaction analysis.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Trade space for speed. Convert all rules into our temporary struct, sort
and then iterate. This is a significant when having many rules.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.
Signed-off-by: SuperQ <superq@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes#9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUGFixed#9190
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* Snapshot in-memory chunks on shutdown for faster restarts (#7229)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921)
* Moved everything to nPending buffer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Simplify exemplar capacity addition
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added pre-allocation
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't allocate if not sending exemplars
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Avoid deadlock when processing duplicate series record (#9170)
* Avoid deadlock when processing duplicate series record
`processWALSamples()` needs to be able to send on its output channel
before it can read the input channel, so reads to allow this in case the
output channel is full.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* processWALSamples: update comment
Previous text seems to relate to an earlier implementation.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Optimise WAL loading by removing extra map and caching min-time (#9160)
* BenchmarkLoadWAL: close WAL after use
So that goroutines are stopped and resources released
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* BenchmarkLoadWAL: make series IDs co-prime with #workers
Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* BenchmarkLoadWAL: simulate mmapped chunks
Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Fix comment
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Remove series map from processWALSamples()
The locks that is commented to reduce contention in are now sharded
32,000 ways, so won't be contended. Removing the map saves memory and
goes just as fast.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* loadWAL: Cache the last mmapped chunk time
So we can skip calling append() for samples it will reject.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Improvements from code review
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Full stops and capitals on comments
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Cache max time in both places mmappedChunks is updated
Including refactor to extract function `setMMappedChunks`, to reduce
code duplication.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update head min/max time when mmapped chunks added
This ensures we have the correct values if no WAL samples are added for
that series.
Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Split Go and React Tests (#8897)
* Added go-ci and react-ci
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove search keymap from new expression editor (#9184)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Create experimental circular buffer resize method, benchmarks
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Optimize exemplar resize to only replay as many exemplars as needed
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* More comments, benchmark AddExemplar
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* optimizations
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* comment
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Slight refactor of resize benchmark + make use of resize via runtime
reloadable storage config.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Some more config related changes.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Refactor to remove usage of noopExemplarStorage and avoid race condition
when resizing from Head code.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix or add comments to clarify some of the new behaviour.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* fix potential panics related to negative exemplar buffer lengths
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
* Extend promtool to support compaction analysis
This commit extends the promtool tsdb analyze command to help
troubleshoot high Prometheus disk usage. The command now plots a
distribution of how full chunks are relative to the maximum capacity of
120 samples per chunk.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
* Update cmd/promtool/tsdb.go
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added feature flag support to unit tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added/fixed tests
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Addressed review comments
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Append sparse histograms into the Head block
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* Add AtHistogram() to Iterator interface. Make HistoChunk conform to Chunk interface.
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
* promtool: backfill: allow configuring block duration
When backfilling large amounts of data across long periods of time, it
may in certain circumstances be useful to use a longer block duration to
increase the efficiency and speed of the backfilling process. This patch
adds a flag --block-duration-power to allow a user to choose the power N
where the block duration is 2^(N+1)h.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* promtool: use sub-tests in backfill testing
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* backfill: add messages to tests for clarity
When someone new breaks a test, seeing "expected: false, got: true" is
really not useful. A nice message helps here.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* backfill: test long block durations
A test that uses a long block duration to write bigger blocks is added.
The check to make sure all blocks are the default duration is removed.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
When using the backfill command to add data to an ephemeral/test
Prometheus instance, it is not important to see which data was added as
it is often generated ahead of time and mostly irrelevant to the
use-case. The current approach prints information about each block that
is written, but does so in a generally inefficient and costly manner.
This patch adds a `--quiet` flag that allows a user to opt out of this
behavior.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
* Added walreplay API endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added starting page to react-ui
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Documented the new endpoint
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed typos
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Removed logo
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed isResponding to isUnexpected
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed width of progress bar
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added DB stats object
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Updated starting page to work with new fields
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 2)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 3)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 4)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (and also implementing a method this time) (pt. 5)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed const to let
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Passing nil (pt. 6)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Remove SetStats method
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Added comma
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed api
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed to triple equals
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed data response types
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Don't return pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Changed version
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed interface issue
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed pointer
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* Fixed copying lock value error
Signed-off-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* Update Go modules for 2.26
Bump all Go modules to the latest upstream.
Signed-off-by: Ben Kochie <superq@gmail.com>
* Fix promtool for new client_golang
LabelValues now requires a list of string matchers.
Signed-off-by: Ben Kochie <superq@gmail.com>
By default same value that was hardcoded is used, but with the
new flag added the number of scrapes can be increased to any value.
Signed-off-by: Pau Freixes <pfreixes@gmail.com>
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.
This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Accepting alert_rule_test without alertname is confusing as it will
always pass with empty exp_alerts, and never with non-empty exp_alerts.
Signed-off-by: Raphael Bauduin <raphael.bauduin@tessares.net>
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write. This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series. For now, its easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behing the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>