Commit graph

791 commits

Author SHA1 Message Date
Sunil Thaha 4bdaea7663
fix: storage.tsdb.path randomly initialised to data-agent/ (#9660)
Using the same variable for storage.tsdb.path and storage.agent.path
as below in main.go causes cfg.localStoragePath to be data/ or
data-agent/ at random.

  a.Flag("storage.tsdb.path", "Base path for metrics storage.").
      PreAction(serverOnlySetting()).
      Default("data/").StringVar(&cfg.localStoragePath)

  a.Flag("storage.agent.path", "Base path for metrics storage.").
      PreAction(agentOnlySetting()).
      Default("data-agent/").StringVar(&cfg.localStoragePath)
This patch fixes it by using a different variable for storage.agent.path

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
2021-11-04 10:08:01 +00:00
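Note on 4bdaea7663: the fix amounts to binding each flag to its own variable. A minimal sketch of that idea follows. It uses the standard library `flag` package rather than the flag library shown in the commit message, and the `config` type and its field names are hypothetical, chosen only for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// config keeps one field per flag so neither default can overwrite the other.
// The field names are hypothetical, not taken from the actual patch.
type config struct {
	serverStoragePath string
	agentStoragePath  string
}

// parseFlags registers both storage flags, each bound to its own field.
func parseFlags(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("prometheus-sketch", flag.ContinueOnError)
	fs.StringVar(&cfg.serverStoragePath, "storage.tsdb.path", "data/",
		"Base path for metrics storage (server mode).")
	fs.StringVar(&cfg.agentStoragePath, "storage.agent.path", "data-agent/",
		"Base path for metrics storage (agent mode).")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseFlags([]string{"--storage.tsdb.path=/prometheus"})
	if err != nil {
		panic(err)
	}
	// The agent default is untouched by the server flag, and vice versa.
	fmt.Println(cfg.serverStoragePath, cfg.agentStoragePath)
}
```

With separate fields, the value of each path depends only on its own flag and default, regardless of the order in which defaults are applied.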
Bartlomiej Plotka e68ccc7708
Fix misleading agent-only/server-only check messages. (#9650)
* Fix misleading agent-only/server-only check messages.

Issue:

```
[root@host01 ~]# docker run -it --net=host --rm -v /root/editor/prom-agent-batcopter.yaml:/etc/prometheus/prometheus.yaml -v /root/prom-batcopter-data:/prometheus -u root --name prom-agent-batcopter quay.io/prometheus/prometheus:main --enable-feature=agent --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/prometheus --web.listen-address=:9091
ts=2021-11-02T16:00:59.789Z caller=main.go:205 level=info msg="Experimental agent mode enabled."
The following flag(s) can not be used in agent mode: ["--enable-feature"]
```

The problem was that PreAction gives us all parsed flags; the context does not tell us which flag clause the action was defined on.

Also added a note to each flag's help text about it being server-only or agent-only.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* gofumpt.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-11-04 09:08:53 +00:00
Mateusz Gozdek c3beca72e2 cmd/prometheus: wait for Prometheus to shutdown in tests
So that the temporary data directory can be successfully removed: on Windows,
a directory cannot be in use during removal.

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 20:14:19 +01:00
Mateusz Gozdek b7bdf6fab2 Fix imports formatting
According to
2829908806 (r58457095).

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Mateusz Gozdek 1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Julien Pivotto 807f46a1ed
Gate agent behind a feature flag, validate mode flags (#9620)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-11-02 13:03:35 +00:00
Darshan Chaudhary a7e554b158
add check service-discovery command (#8970)
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-11-01 14:42:12 +01:00
Hu Shuai 4b799c361a
Fix typo in cmd/prometheus/main.go (#9632)
Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2021-11-01 16:08:23 +05:30
Arthur Silva Sens be2599c853
config: Make remote-write required for Agent mode (#9618)
* config: Make remote-write required for Agent mode

Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-10-30 01:41:40 +02:00
Robert Fratto bc72a718c4
Initial draft of prometheus-agent (#8785)
* Initial draft of prometheus-agent

This commit introduces a new binary, prometheus-agent, based on the
Grafana Agent code. It runs a WAL-only version of prometheus without the
TSDB, alerting, or rule evaluations. It is intended to be used to
remote_write to Prometheus or another remote_write receiver.

By default, prometheus-agent will listen on port 9095 to not collide
with the prometheus default of 9090.

Truncation of the WAL cooperates with Remote Write on a best-effort
basis. Every time the WAL is truncated, the minimum timestamp of data to
truncate is determined by the lowest sent timestamp of all samples
across all remote_write endpoints. This gives loose guarantees that data
will not be removed from the WAL until the maximum sample lifetime
passes or remote_write starts functioning.

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* add tests for Prometheus agent (#22)

* add tests for Prometheus agent

* add tests for Prometheus agent

* rearranged tests as per the review comments

* update tests for Agent

* changes as per code review comments

Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>

* incremental changes to prometheus agent

Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>

* changes as per code review comments

Signed-off-by: SriKrishna Paparaju <paparaju@gmail.com>

* Commit feedback from code review

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Port over some comments from grafana/agent

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Rename agent.Storage to agent.DB for tsdb consistency

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Consolidate agentMode ifs in cmd/prometheus/main.go

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* Document PreAction usage requirements better for agent mode flags

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* remove unnecessary defaultListenAddr

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

* `go fmt ./tsdb/agent` and fix lint errors

Signed-off-by: Robert Fratto <robertfratto@gmail.com>

Co-authored-by: SriKrishna Paparaju <paparaju@gmail.com>
2021-10-29 16:25:05 +01:00
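Note on bc72a718c4: the truncation rule described in that commit message can be sketched as follows. This is one reading of the description, not the agent's actual code. The function name, the millisecond units, and the behaviour with zero endpoints are assumptions made for illustration.

```go
package main

import "fmt"

// truncationCutoff returns the timestamp (in ms) below which WAL data may be
// dropped. It takes the lowest sent timestamp across all remote_write
// endpoints, but never lets data outlive maxSampleLifetime.
// Assumption: with no endpoints, only the lifetime bound applies.
func truncationCutoff(lowestSent []int64, now, maxSampleLifetime int64) int64 {
	lifetimeBound := now - maxSampleLifetime
	if len(lowestSent) == 0 {
		return lifetimeBound
	}
	cutoff := lowestSent[0]
	for _, ts := range lowestSent[1:] {
		if ts < cutoff {
			cutoff = ts
		}
	}
	// Data older than the maximum sample lifetime is removed even if an
	// endpoint has not sent it yet.
	if cutoff < lifetimeBound {
		cutoff = lifetimeBound
	}
	return cutoff
}

func main() {
	// The slowest endpoint (50) holds truncation back.
	fmt.Println(truncationCutoff([]int64{100, 50, 80}, 1000, 2000))
	// The lifetime bound (1000 - 100) overrides a stalled endpoint.
	fmt.Println(truncationCutoff([]int64{100, 50, 80}, 1000, 100))
}
```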
David Leadbeater c91c2bbea5
promtool: Show more human readable got/exp output (#8064)
Avoid using %#v, nothing needs to parse this, so escaping " and so on
leads to hard to read output.

Add new lines, number and indentation to each alert series output.

Signed-off-by: David Leadbeater <dgl@dgl.cx>
2021-10-28 22:17:18 +11:00
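Note on c91c2bbea5: the readability difference between `%#v` and `%v` is easy to see on a string containing double quotes. The sketch below uses a made-up series string. The helper names are hypothetical and are not promtool functions.

```go
package main

import "fmt"

// goSyntax formats s the way %#v does: as a quoted, escaped Go literal.
func goSyntax(s string) string {
	return fmt.Sprintf("%#v", s)
}

// plain formats s the way %v does: the string as-is.
func plain(s string) string {
	return fmt.Sprintf("%v", s)
}

func main() {
	series := `up{job="api"}`
	fmt.Println(goSyntax(series)) // "up{job=\"api\"}"
	fmt.Println(plain(series))    // up{job="api"}
}
```

Since the got/exp output is only read by people, the unescaped form is the easier one to compare by eye.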
DrAuYueng 69e309d202
Expose TargetsFromGroup/AlertmanagerFromGroup func and reuse this for static/file sd config check in promtool (#9343)

Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2021-10-28 02:01:28 +02:00
Julien Pivotto 73255e15f6 Address golint failures from revive
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-23 00:53:11 +02:00
Will Tran 97b0738895
add --max-block-duration in promtool create-blocks-from rules (#9511)
* support maxBlockDuration for promtool tsdb create-blocks-from rules

Fixes #9465

Signed-off-by: Will Tran <will@autonomic.ai>

* don't hardcode 2h as the default block size in rules test

Signed-off-by: Will Tran <will@autonomic.ai>
2021-10-21 23:28:37 +02:00
Furkan Türkal 9d0058a09e
Bind port 0 in main_test (#9558)
Fixes #9499

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
2021-10-21 14:59:20 +02:00
Julien Pivotto 432005826d
Add a feature flag to enable the new discovery manager (#9537)
* Add a feature flag to enable the new manager

This PR creates a copy of the legacy manager and uses it by default.

It is a companion PR to #9349. With this PR, users can enable the new
discovery manager and provide us with any feedback / side effects that
the new behaviour might have.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-20 10:15:54 +02:00
beorn7 a9008f5423 Merge branch 'main' into sparsehistogram 2021-10-19 17:14:23 +02:00
jessicagreben 60d0990886 add more explicit label values
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-10-18 01:04:13 +02:00
jessicagreben 3da87d2f39 add unit test to check label rule labels override
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-10-18 01:04:13 +02:00
Julien Pivotto f8372bc6b9 backfill: Apply rule labels after query labels
Fix #9419

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-18 01:04:13 +02:00
beorn7 7a8bb8222c Style cleanup of all the changes in sparsehistogram so far
A lot of this code was hacked together, literally during a
hackathon. This commit intends not to change the code substantially,
but just make the code obey the usual style practices.

A (possibly incomplete) list of areas:

* Generally address linter warnings.

* The `pkg` directory is deprecated as per dev-summit. No new packages should
  be added to it. I moved the new `pkg/histogram` package to `model`
  anticipating what's proposed in #9478.

* Make the naming of the Sparse Histogram more consistent. Including
  abbreviations, there were just too many names for it: SparseHistogram,
  Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in
  general. Only add "Sparse" if it is needed to avoid confusion with
  conventional Histograms (which is rare because the TSDB really has no notion
  of conventional Histograms). Use abbreviations only in local scope, and then
  really abbreviate (not just removing three out of seven letters like in
  "Histo"). This is in the spirit of
  https://github.com/golang/go/wiki/CodeReviewComments#variable-names

* Several other minor name changes.

* A lot of formatting of doc comments. For one, following
  https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
  , but also layout questions, anticipating how things will look
  when rendered by `godoc` (even where `godoc` doesn't render them
  right now because they are for unexported types or not a doc comment
  at all but just a normal code comment - consistency is queen!).

* Re-enabled `TestQueryLog` and `TestEndpoints` (they pass now,
  leaving them disabled was presumably an oversight).

* Bucket iterator for histogram.Histogram is now created with a
  method.

* HistogramChunk.iterator now allows iterator recycling. (I think
  @dieterbe only commented it out because he was confused by the
  question in the comment.)

* HistogramAppender.Append panics now because we decided to treat
  staleness marker differently.

Signed-off-by: beorn7 <beorn@grafana.com>
2021-10-11 13:02:03 +02:00
beorn7 fd5ea4e0b5 Merge branch 'main' into sparsehistogram 2021-10-07 23:16:42 +02:00
Julien Pivotto bd217c58a7
Backfill: Do not query after --end (#9340)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-15 16:02:41 +02:00
Julien Pivotto 1ea774f184
Merge pull request #9339 from roidelapluie/remove-double-align
backfill: Do not align the start of the group since we align every rule.
2021-09-14 23:46:25 +02:00
Julien Pivotto 2bde71ec5f
Merge pull request #9338 from prometheus/release-2.30
merge back release 2.30
2021-09-14 23:46:11 +02:00
Julien Pivotto 691ce066fb backfill: Do not align the start of the group since we align every rule.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-14 23:13:06 +02:00
jessicagreben b0a21f9eab rm overlap, add label builder to fix name bug
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-09-13 10:32:08 -07:00
Julien Pivotto 0111aa987e
Merge pull request #9312 from fpetkovski/promtool-analyze-compaction
promtool: add extended flag for tsdb analysis
2021-09-08 17:27:01 +02:00
Julien Pivotto 48a101be1b
Allow to tune the scrape tolerance (#9283)
* Allow to tune the scrape tolerance

In most of the classic monitoring use cases, a few milliseconds
difference can be omitted.

In Prometheus, a few millisecond difference can however make a big
difference.

Currently, Prometheus will ignore up to 2 ms difference in the
alignments.

It turns out that for users who can afford a 10ms difference, there are
considerable resources and disk space to save, as shown in this graph, which
shows the bytes / samples over a production Prometheus server. You can
clearly see the switch from 2ms to 10ms tolerance.

This pull request enables the adjustment of the scrape timestamp
alignment tolerance.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix golint

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-08 17:27:33 +05:30
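Note on 48a101be1b: the alignment tolerance can be pictured as snapping a scrape timestamp to its expected, evenly spaced value when the two are close enough. The sketch below is a simplified illustration in millisecond integers. The function name and signature are hypothetical and do not reflect the scrape loop's real code.

```go
package main

import "fmt"

// alignTimestamp returns expectedMs when actualMs is within toleranceMs of
// it, and actualMs otherwise. All values are Unix milliseconds.
func alignTimestamp(actualMs, expectedMs, toleranceMs int64) int64 {
	diff := actualMs - expectedMs
	if diff < 0 {
		diff = -diff
	}
	if diff <= toleranceMs {
		return expectedMs
	}
	return actualMs
}

func main() {
	// A scrape that fired 7ms late on a 15s schedule.
	fmt.Println(alignTimestamp(15007, 15000, 2))  // outside 2ms: kept as-is
	fmt.Println(alignTimestamp(15007, 15000, 10)) // within 10ms: snapped
}
```

Evenly spaced timestamps tend to compress better in the TSDB, which is consistent with the bytes-per-sample saving the commit message describes.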
fpetkovski 449f874679 promtool: add extended flag for tsdb analysis
The compaction analysis which runs under promtool tsdb analyze can be an
intensive process which slows down the entire command.

This commit adds an --extended flag to tsdb analyze which can be toggled
for running long running tasks, such as compaction analysis.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-09-08 10:50:01 +02:00
Julien Pivotto ad642a85c0
Merge pull request #9304 from LeviHarrison/backfill-fix-date
Rules backfill: fix new rule importer message
2021-09-07 18:01:03 +02:00
Julien Pivotto bd24e2fb92
Merge pull request #9303 from LeviHarrison/backfill-return-1
Rules backfill: return 1 if unsuccessful
2021-09-07 18:00:42 +02:00
Levi Harrison ded95ff434
Fix new rule importer message
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-09-06 22:19:29 -04:00
Levi Harrison 34e1b47968
Fixed error handling
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-09-06 21:55:57 -04:00
Holger Hans Peter Freyther 5edec40d60 promtool: Speed up checking for duplicate rules
Trade space for speed. Convert all rules into our temporary struct, sort
and then iterate. This is a significant speedup when there are many rules.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
2021-09-06 23:10:26 +08:00
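Note on 5edec40d60: the "convert, sort, then iterate" approach can be sketched as below. The `ruleKey` struct is a hypothetical, simplified stand-in for the temporary struct the commit mentions. The real one may carry different fields.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleKey is a hypothetical comparison key for a rule: its metric name plus
// a canonical string form of its labels.
type ruleKey struct {
	metric string
	labels string
}

// findDuplicates sorts a copy of the keys and reports each key that occurs
// more than once. After sorting, duplicates are adjacent, so one pass
// suffices: O(n log n) overall instead of comparing every pair.
func findDuplicates(rules []ruleKey) []ruleKey {
	sorted := make([]ruleKey, len(rules))
	copy(sorted, rules)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].metric != sorted[j].metric {
			return sorted[i].metric < sorted[j].metric
		}
		return sorted[i].labels < sorted[j].labels
	})

	var dups []ruleKey
	for i := 1; i < len(sorted); i++ {
		if sorted[i] != sorted[i-1] {
			continue
		}
		// Report each duplicated key only once.
		if len(dups) == 0 || dups[len(dups)-1] != sorted[i] {
			dups = append(dups, sorted[i])
		}
	}
	return dups
}

func main() {
	rules := []ruleKey{
		{"job:up:sum", `env="prod"`},
		{"job:errors:rate5m", ""},
		{"job:up:sum", `env="prod"`},
	}
	fmt.Println(findDuplicates(rules))
}
```

The extra copy of the rules is the space being traded for speed.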
Holger Hans Peter Freyther 3a309c1ae5 promtool: Add simple checkDuplicates benchmark
Add a simple benchmark with a large number of rules.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
2021-09-06 23:10:26 +08:00
Holger Hans Peter Freyther 794937b3d6 promtool: Add testcase for detecting duplicates
Introduce a basic test for checking for duplicate rules.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
2021-09-06 23:10:26 +08:00
SuperQ 31f4108758
Add scrape_timeout_seconds metric
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.

Signed-off-by: SuperQ <superq@gmail.com>
2021-09-02 12:15:35 +02:00
SuperQ e167a45c65
Add new Go build tags.
Add new go:build comments based on 1.17 formatting[0].

[0]: https://golang.org/doc/go1.17#gofmt

Signed-off-by: SuperQ <superq@gmail.com>
2021-08-27 10:24:14 +02:00
Julien Pivotto cab96a06ef
Merge release 2.29 in main (#9196)
* PromQL: Fix start and end keywords masking label and metric names

This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* Add in additional tests for metrics and/or labels called start/end.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* *: Cut 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* VERSION: bump to 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Remove experimental wording on size-based retention

Followup of #9004

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix PR reference in changelog

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zone IDs at most once per refresh (#9142)

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zones at most once per SD load

Closes #9142.

Signed-off-by: George Brighton <george@gebn.co.uk>

* Incorporate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Integrate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Add a compatibility note for macOS users.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* *: Cut v2.29.0-rc.1

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Fix `kuma_sd` targetgroup reporting (#9157)

* Bundle all xDS targets into a single group

Signed-off-by: austin ce <austin.cawley@gmail.com>

* *: cut v2.29.0-rc.2

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Rename links

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* bump codemirror-promql to 0.17.0

Signed-off-by: Augustin Husson <husson.augustin@gmail.com>

* *: cut v2.29.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* tsdb: align atomically accessed int64 (#9192)

This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG

Fixed #9190

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Release 2.29.1 (#9193)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
2021-08-12 18:38:06 +02:00
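Note on cab96a06ef: the "align atomically accessed int64" item in that merge refers to a documented `sync/atomic` caveat on 32-bit platforms. The usual remedy is to keep the 64-bit field first in an allocated struct. The sketch below shows that layout. The `counters` type is hypothetical and is not the struct changed in #9192.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counters keeps the atomically accessed int64 as its first field. The
// sync/atomic documentation states that the first word in an allocated
// struct can be relied upon to be 64-bit aligned, which matters on 32-bit
// architectures such as ARM and 386.
type counters struct {
	samples int64 // must stay first; accessed only via sync/atomic
	name    string
	ready   bool
}

// addSamples atomically adds n and returns the new total.
func (c *counters) addSamples(n int64) int64 {
	return atomic.AddInt64(&c.samples, n)
}

func main() {
	c := &counters{name: "head"}
	c.addSamples(2)
	fmt.Println(c.addSamples(3))
}
```

Moving a smaller field in front of `samples` could misalign it on 32-bit builds and make the atomic call panic there.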
Ganesh Vernekar 095f572d4a
Sync sparsehistogram branch with main (#9189)
* Fix `kuma_sd` targetgroup reporting (#9157)

* Bundle all xDS targets into a single group

Signed-off-by: austin ce <austin.cawley@gmail.com>

* Snapshot in-memory chunks on shutdown for faster restarts (#7229)

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Rename links

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove Individual Data Type Caps in Per-shard Buffering for Remote Write (#8921)

* Moved everything to nPending buffer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Simplify exemplar capacity addition

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added pre-allocation

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Don't allocate if not sending exemplars

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Avoid deadlock when processing duplicate series record (#9170)

* Avoid deadlock when processing duplicate series record

`processWALSamples()` needs to be able to send on its output channel
before it can read the input channel, so reads to allow this in case the
output channel is full.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* processWALSamples: update comment

Previous text seems to relate to an earlier implementation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Optimise WAL loading by removing extra map and caching min-time (#9160)

* BenchmarkLoadWAL: close WAL after use

So that goroutines are stopped and resources released

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: make series IDs co-prime with #workers

Series are distributed across workers by taking the modulus of the
ID with the number of workers, so multiples of 100 are a poor choice.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* BenchmarkLoadWAL: simulate mmapped chunks

Real Prometheus cuts chunks every 120 samples, then skips those samples
when re-reading the WAL. Simulate this by creating a single mapped chunk
for each series, since the max time is all the reader looks at.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Fix comment

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Remove series map from processWALSamples()

The locks that the comment says the map reduces contention on are now
sharded 32,000 ways, so they won't be contended. Removing the map saves
memory and is just as fast.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* loadWAL: Cache the last mmapped chunk time

So we can skip calling append() for samples it will reject.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Improvements from code review

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Full stops and capitals on comments

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Cache max time in both places mmappedChunks is updated

Including refactor to extract function `setMMappedChunks`, to reduce
code duplication.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Update head min/max time when mmapped chunks added

This ensures we have the correct values if no WAL samples are added for
that series.

Note that `mSeries.maxTime()` was always `math.MinInt64` before, since
that function doesn't consider mmapped chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Split Go and React Tests (#8897)

* Added go-ci and react-ci

Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove search keymap from new expression editor (#9184)

Signed-off-by: Julius Volz <julius.volz@gmail.com>

Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
2021-08-11 15:43:17 +05:30
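Note on 095f572d4a: the BenchmarkLoadWAL item about co-prime series IDs can be checked with a few lines. The sketch below counts how many workers receive any series when IDs are multiples of a given step. The worker count of 8 is an assumption for illustration, as is the function name.

```go
package main

import "fmt"

// workersUsed reports how many distinct workers receive at least one series
// when series IDs are step, 2*step, ... and each ID is assigned to worker
// id % workers.
func workersUsed(step, seriesCount, workers int) int {
	used := make(map[int]struct{})
	for i := 1; i <= seriesCount; i++ {
		used[(i*step)%workers] = struct{}{}
	}
	return len(used)
}

func main() {
	const workers = 8 // assumed worker count
	fmt.Println(workersUsed(100, 1000, workers)) // 100 shares a factor with 8
	fmt.Println(workersUsed(101, 1000, workers)) // 101 is co-prime with 8
}
```

With a step of 100 and 8 workers, only two workers ever get work, so the benchmark would understate parallel throughput. A co-prime step spreads series across all of them.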
Ganesh Vernekar ee7e0071d1
Snapshot in-memory chunks on shutdown for faster restarts (#7229)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-06 17:51:01 +01:00
Ganesh Vernekar 8b70e87ab9
Merge remote-tracking branch 'upstream/main' into sparse-refactor
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-08-05 12:16:08 +05:30
jinglina ed24e51e7c
remove redundant type conversion (#9126)
Signed-off-by: jinglina <jinglinax@163.com>
2021-07-28 13:33:46 +05:30
Julien Pivotto 04f33e88f7
Merge pull request #9121 from LeviHarrison/revert-klog-fix
Revert klog fix
2021-07-27 14:07:59 +02:00
Levi Harrison 58556c19be Revert "Fix logging after the move to go-kit/log (#9021)"
This reverts commit 642722e5d0.

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-07-27 07:37:03 -04:00
Ganesh Vernekar 507d61fdeb
Remove experimental tag on --storage.tsdb.allow-overlapping-blocks (#9117)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-07-27 14:38:20 +05:30
Martin Disibio 1bcd13d6b5
Exemplar resize (#8974)
* Create experimental circular buffer resize method, benchmarks

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Optimize exemplar resize to only replay as many exemplars as needed

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* More comments, benchmark AddExemplar

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* optimizations

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* comment

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Slight refactor of resize benchmark + make use of resize via runtime
reloadable storage config.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Some more config related changes.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Refactor to remove usage of noopExemplarStorage and avoid race condition
when resizing from Head code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix or add comments to clarify some of the new behaviour.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix potential panics related to negative exemplar buffer lengths

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
2021-07-20 10:22:57 +05:30
Levi Harrison 3b5257d869
Changed disabled_features to feature_flags
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-07-13 22:03:51 -04:00
Ganesh Vernekar 78d68d5972
Make query_range serve histograms (#9036)
* Modify query_range to serve only sparse histograms

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Finish CumulativeExpandSparseHistogram for positive schema

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix bug and comment out tests for query_range

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint 2

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-07-03 19:23:56 +05:30
Filip Petkovski 7c125aa5fb
Promtool: Add support for compaction analysis (#8940)
* Extend promtool to support compaction analysis

This commit extends the promtool tsdb analyze command to help
troubleshoot high Prometheus disk usage. The command now plots a
distribution of how full chunks are relative to the maximum capacity of
120 samples per chunk.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>

* Update cmd/promtool/tsdb.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-07-02 11:08:52 +01:00
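Note on 7c125aa5fb: a distribution of chunk fullness relative to the 120-sample maximum can be computed by bucketing sample counts. The sketch below is an illustration only. The function name, bucket count, and input data are made up and do not mirror the promtool implementation.

```go
package main

import "fmt"

// maxSamplesPerChunk is the chunk capacity quoted in the commit message.
const maxSamplesPerChunk = 120

// fullnessHistogram places each chunk into one of `buckets` equal-width
// bins by how full it is. Completely full chunks land in the last bin.
func fullnessHistogram(sampleCounts []int, buckets int) []int {
	hist := make([]int, buckets)
	for _, n := range sampleCounts {
		idx := n * buckets / maxSamplesPerChunk
		if idx >= buckets {
			idx = buckets - 1
		}
		hist[idx]++
	}
	return hist
}

func main() {
	// Hypothetical per-chunk sample counts.
	counts := []int{12, 60, 119, 120}
	fmt.Println(fullnessHistogram(counts, 10))
}
```

Many chunks in the low bins would suggest that series are being cut early, which is one way disk usage ends up higher than expected.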
Julius Volz 441e6cd7d6
Merge release-2.28 back into main (#9035)
* Cut v2.28.0-rc.0 (#8954)

* Cut v2.28.0-rc.0

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Changelog fixup

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Address review comments

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Downgrade some features to enhancements

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Adjust release date to today

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Migrate HTTP SD docs from docs repo (#8972)

See discussion in https://github.com/prometheus/docs/pull/1975

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Cut Prometheus v2.28.0 (#8973)

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* HTTP SD: Allow charset in content type (#8981)

* Added content type regex

Signed-off-by: Levi Harrison <git@leviharrison.dev>
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* fixed disappeared target groups in http_sd #9019

Signed-off-by: servak <fservak@gmail.com>

* Add a testcase for http-sd

Signed-off-by: servak <fservak@gmail.com>

* HTTP SD: Simplify logic of disappeared targetgroups (#9026)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix logging after the move to go-kit/log (#9021)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Cut Prometheus v2.28.1 (#9034)

Signed-off-by: Julius Volz <julius.volz@gmail.com>

Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: servak <fservak@gmail.com>
2021-07-01 18:02:13 +02:00
Levi Harrison 90976e7505
Promtool: Add feature flags to unit tests (#8958)
* Added feature flag support to unit tests

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added/fixed tests

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Addressed review comments

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-30 22:43:39 +01:00
Ankit Goel d437cee73a
Move storage.tsdb.retention.size out of experimental #8728 (#9004)
* Move storage.tsdb.retention.size out of experimental #8728

Signed-off-by: Ankit Goel <ankit.goel@deliveryhero.com>
2021-06-30 01:30:11 +02:00
Levi Harrison ca1896c15b
Promtool: Validate service discovery files (#8950)
* Check SD files in promtool

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-29 17:32:59 +02:00
Ganesh Vernekar 04ad56d9b8
Append sparse histograms into the Head block (#9013)
* Append sparse histograms into the Head block

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add AtHistogram() to Iterator interface. Make HistoChunk conform to Chunk interface.

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-29 20:08:46 +05:30
Steve Kuznetsov fd6c852567
promtool: backfill: allow configuring block duration (#8919)
* promtool: backfill: allow configuring block duration

When backfilling large amounts of data across long periods of time, it
may in certain circumstances be useful to use a longer block duration to
increase the efficiency and speed of the backfilling process. This patch
adds a flag --block-duration-power to allow a user to choose the power N
where the block duration is 2^(N+1)h.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>

* promtool: use sub-tests in backfill testing

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>

* backfill: add messages to tests for clarity

When someone new breaks a test, seeing "expected: false, got: true" is
really not useful. A nice message helps here.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>

* backfill: test long block durations

A test that uses a long block duration to write bigger blocks is added.
The check to make sure all blocks are the default duration is removed.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
2021-06-29 14:53:38 +05:30
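Note on fd6c852567: the relationship between the power N and the block duration, 2^(N+1) hours, can be written out directly. The helper below is a sketch with a hypothetical name. It is not the promtool code.

```go
package main

import (
	"fmt"
	"time"
)

// blockDuration returns 2^(n+1) hours for the given power n, following the
// formula in the commit message.
func blockDuration(n uint) time.Duration {
	return time.Duration(int64(1)<<(n+1)) * time.Hour
}

func main() {
	for n := uint(0); n <= 3; n++ {
		fmt.Printf("N=%d -> %v\n", n, blockDuration(n))
	}
}
```

So N=0 gives the 2h default block size, and each increment of N doubles it.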
Ganesh Vernekar 64bea6999e
HistogramAppender interface for sparse histograms (#9007)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-28 20:30:55 +05:30
Ben Kochie 7cb55d5732
Merge pull request #8802 from mwasilew2/yaml-linting
Adds yamllinting to Makefile.common
2021-06-24 15:59:35 +02:00
Julien Pivotto ba76bceb6b
Merge pull request #8917 from stevekuznetsov/skuznets/silence-backfill
promtool: backfill: allow silencing output
2021-06-14 23:27:18 +02:00
Michal Wasilewski 3f686cad8b
fixes yamllint errors
Signed-off-by: Michal Wasilewski <mwasilewski@gmx.com>
2021-06-12 12:47:47 +02:00
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Steve Kuznetsov ee771a2a66
promtool: backfill: allow silencing output
When using the backfill command to add data to an ephemeral/test
Prometheus instance, it is not important to see which data was added as
it is often generated ahead of time and mostly irrelevant to the
use-case. The current approach prints information about each block that
is written, but does so in a generally inefficient and costly manner.
This patch adds a `--quiet` flag that allows a user to opt out of this
behavior.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
2021-06-10 15:31:16 -07:00
Levi Harrison 7bc11dcb06
React UI: Add Starting Screen (#8662)
* Added walreplay API endpoint

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added starting page to react-ui

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Documented the new endpoint

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed typos

Signed-off-by: Levi Harrison <git@leviharrison.dev>

Co-authored-by: Julius Volz <julius.volz@gmail.com>

* Removed logo

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed isResponding to isUnexpected

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed width of progress bar

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed width of progress bar

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added DB stats object

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Updated starting page to work with new fields

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 2)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 3)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (and also implementing a method this time) (pt. 4)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (and also implementing a method this time) (pt. 5)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed const to let

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 6)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove SetStats method

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added comma

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed api

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed to triple equals

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed data response types

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Don't return pointer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed version

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed interface issue

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed pointer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed copying lock value error

Signed-off-by: Levi Harrison <git@leviharrison.dev>

Co-authored-by: Julius Volz <julius.volz@gmail.com>
2021-06-05 15:29:32 +01:00
Levi Harrison 17ea8d006a
Added external URL access
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:26 -04:00
Bartlomiej Plotka 80545bfb2e
Instrumented circular exemplar storage. (#8712)
* Instrumented circular storage.

Fixes: https://github.com/prometheus/prometheus/issues/8708
Fixes: https://github.com/prometheus/prometheus/issues/8707

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed CB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Julien comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Callum comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-04-16 13:44:53 +01:00
nberkley f9e2dd0697
Add support for smaller block chunk segment allocations (#8478)
* Add support for --storage.tsdb.max-chunk-size to support small chunks for space-limited Prometheus instances.

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/compact.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/db.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update cmd/prometheus/main.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Change naming scheme to

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Add a lower bound to --storage.tsdb.max-block-chunk-segment-size

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update storage.md to explain what a chunk segment is

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Force tests

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Fix code style

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-04-15 14:25:01 +05:30
Julien Pivotto ae73a6296a
Merge pull request #8683 from cuirunxing-hub/main
typos correct
2021-04-02 20:14:55 +02:00
cuirunxing-hub 57bc2e94e2 typos correct
Signed-off-by: cuirunxing-hub <cuirunxing@inspur.com>
2021-04-02 09:03:00 +08:00
Jess G 731545ad34
Add documentation for recording rule backfiller (#8674)
* add docs for rule backfiller

Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-04-01 22:38:00 +02:00
Julien Pivotto e635ca834b Add environment variable expansion in external label values
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-30 01:36:28 +02:00
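
The environment variable expansion named in the commit above could look roughly like the following. This is a minimal sketch, not the Prometheus implementation: `expandLabelValue` and the `lookup` map are hypothetical names, and expanding unknown variables to the empty string is an assumption of the sketch. Only `os.Expand` is a real standard-library call.

```go
package main

import (
	"fmt"
	"os"
)

// expandLabelValue replaces $NAME and ${NAME} references in value using
// lookup. Names missing from lookup expand to the empty string; that choice
// is an assumption of this sketch, not documented Prometheus behaviour.
func expandLabelValue(value string, lookup map[string]string) string {
	return os.Expand(value, func(name string) string {
		return lookup[name]
	})
}

func main() {
	// A map stands in for the process environment so the example is
	// deterministic; os.Getenv could be used as the mapping instead.
	env := map[string]string{"REGION": "eu-west-1"}
	fmt.Println(expandLabelValue("${REGION}-prod", env))
}
```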
Björn Rabenstein 9549a15c6f
Merge pull request #7675 from JessicaGreben/jg/11-retroactive-rule-eval
Add rule importer to backfill
2021-03-29 19:09:21 +02:00
jessicagreben 896c828bb5 close writer after flush
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-03-29 06:45:12 -07:00
jessicagreben d89a1d999f add log with start/end times, close blocks before end of func
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-03-28 12:13:58 -07:00
Ben Kochie f0bccba1c3
Update Go modules for 2.26 (#8636)
* Update Go modules for 2.26

Bump all Go modules to the latest upstream.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Fix promtool for new client_golang

LabelValues now requires a list of string matchers.

Signed-off-by: Ben Kochie <superq@gmail.com>
2021-03-24 09:41:12 +00:00
Julien Pivotto c0c36b1155
Improve promql-negative-offset docs (#8631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-22 10:16:43 +01:00
jessicagreben 8de4da3716 add changes per comments, fix tests
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-03-20 12:38:30 -07:00
Callum Styan 289ba11b79
Add circular in-memory exemplars storage (#6635)
* Add circular in-memory exemplars storage

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>

* Fix some comments, clean up exemplar metrics struct and exemplar tests.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix exemplar query api null vs empty array issue.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-03-16 15:17:45 +05:30
jessicagreben e3a8132bb3 fix block alignment, add sample alignment
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-03-15 12:44:58 -07:00
jessicagreben 7c26642460 add block alignment and write in 2 hr blocks
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-03-14 10:10:55 -07:00
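
The block alignment mentioned in the commit above can be illustrated with a small sketch. It assumes millisecond timestamps and a fixed two-hour block length; `alignToBlock` and `blockRanges` are hypothetical helper names, not functions from the Prometheus code base.

```go
package main

import "fmt"

// blockDurationMs is two hours in milliseconds, matching the "2 hr blocks"
// in the commit title.
const blockDurationMs int64 = 2 * 60 * 60 * 1000

// alignToBlock rounds tsMs down to the start of its two-hour block.
// Go's % truncates toward zero, so negative timestamps need one extra step
// to get a true floor.
func alignToBlock(tsMs int64) int64 {
	rem := tsMs % blockDurationMs
	aligned := tsMs - rem
	if rem < 0 {
		aligned -= blockDurationMs
	}
	return aligned
}

// blockRanges returns the half-open [start, end) ranges of every aligned
// block that overlaps the closed interval [mint, maxt].
func blockRanges(mint, maxt int64) [][2]int64 {
	var ranges [][2]int64
	for start := alignToBlock(mint); start <= maxt; start += blockDurationMs {
		ranges = append(ranges, [2]int64{start, start + blockDurationMs})
	}
	return ranges
}

func main() {
	for _, r := range blockRanges(1605771557000, 1605781857001) {
		fmt.Println(r[0], r[1])
	}
}
```

Aligning every block to the same boundaries means samples written in separate runs fall into blocks with identical time ranges rather than partially overlapping ones.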
Julien Pivotto 63ea88af82
Merge pull request #8575 from pfreixes/add-scrapes-parameter
Add num scrapes as tsdb write benchmark command flag
2021-03-11 13:09:50 +01:00
Pau Freixes b1ac4a45e6 Add num scrapes as tsdb write benchmark command flag
By default the previously hardcoded value is used, but the new flag
allows the number of scrapes to be increased to any value.

Signed-off-by: Pau Freixes <pfreixes@gmail.com>
2021-03-10 11:17:07 +01:00
Julien Pivotto ad5ed416ba
Merge pull request #8487 from pschou/dev_neg_offset
allow negative offset
2021-03-08 22:18:45 +01:00
Julien Pivotto 5742a18590 Fix subqueries with default resolution in promql unit tests
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-07 09:20:04 +01:00
jessicagreben 9fc53b7edf fix appender.Add -> appender.Append
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-03-01 05:49:49 -08:00
Arthur Silva Sens 537c0aff49
Prometheus and Promtool binaries now print help and usage to stdout (#8542)
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-02-25 19:52:34 +01:00
jessicagreben 78e84aed89 resolve merge conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-02-24 09:47:29 -08:00
jessicagreben f2db9dc722 add multi rule integration tests
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2021-02-24 09:42:31 -08:00
pschou f80b52be69
Merge branch 'main' into dev_neg_offset
2021-02-23 20:52:57 -05:00
schou 22cd48868a adding feature flag, promql-negative-offset
Signed-off-by: schou <pschou@users.noreply.github.com>
2021-02-23 20:25:56 -05:00
Julien Pivotto 8c8de46003
Merge pull request #8036 from dgl/promtool-alert-err
promtool: Don't end alert tests early, in some failure situations
2021-02-20 22:35:00 +01:00
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
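
The single-method design described in the commit above can be sketched with a toy in-memory appender. Everything here is a simplified assumption: `memAppender`, the string label set, and the `lookups` counter are hypothetical and only illustrate how one `Append` call can cover both the cached-reference path and the label-lookup path.

```go
package main

import (
	"errors"
	"fmt"
)

type sample struct {
	t int64
	v float64
}

// memAppender is a toy store; a plain string stands in for a label set.
type memAppender struct {
	byLabels map[string]uint64
	samples  map[uint64][]sample
	lookups  int // number of slow-path label lookups performed
}

func newMemAppender() *memAppender {
	return &memAppender{byLabels: map[string]uint64{}, samples: map[uint64][]sample{}}
}

// Append stores one sample. A ref of 0 (or an unknown ref) triggers a lookup
// by labels; a known ref skips the lookup. The returned ref can be cached by
// the caller and passed to later calls.
func (a *memAppender) Append(ref uint64, labels string, t int64, v float64) (uint64, error) {
	if _, known := a.samples[ref]; ref == 0 || !known {
		if labels == "" {
			return 0, errors.New("empty label set")
		}
		a.lookups++
		id, found := a.byLabels[labels]
		if !found {
			id = uint64(len(a.byLabels) + 1)
			a.byLabels[labels] = id
		}
		ref = id
	}
	a.samples[ref] = append(a.samples[ref], sample{t: t, v: v})
	return ref, nil
}

func main() {
	app := newMemAppender()
	ref, _ := app.Append(0, `up{job="demo"}`, 1000, 1)   // slow path: lookup by labels
	ref, _ = app.Append(ref, `up{job="demo"}`, 2000, 1) // fast path: cached ref
	fmt.Println(ref, app.lookups)
}
```

Because the labels are always passed, the callee has access to label values even on the fast path, which is the property the commit message says motivated the change.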
Julien Pivotto 1fac1c783b
Merge pull request #8504 from rbauduin/require_alertname
promtool: alert_rule_test items require alertname
2021-02-17 22:07:52 +01:00
Julien Pivotto 2d172d0896
Merge pull request #8508 from prometheus/release-2.25
Merge back release 2.25
2021-02-17 16:26:34 +01:00
Raphael Bauduin a7d64cad21 promtool: alert_rule_test items require alertname
Accepting alert_rule_test without alertname is confusing as it will
always pass with empty exp_alerts, and never with non-empty exp_alerts.

Signed-off-by: Raphael Bauduin <raphael.bauduin@tessares.net>
2021-02-17 16:23:12 +01:00
Ganesh Vernekar c4536fa28c
Increase block writer size for backfilling
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2021-02-17 15:45:41 +05:30
Julien Pivotto a419b75abd
Merge pull request #8485 from hryniuk/promtool-query-errors-details
Print details of API errors received by promtool
2021-02-16 22:47:08 +01:00
Łukasz Hryniuk ab41de68b4 Print details of API errors
Signed-off-by: Łukasz Hryniuk <code@hryniuk.pl>
2021-02-15 23:42:06 +01:00
David Leadbeater 3e30f72af1 promtool: Add more negative alert tests
Signed-off-by: David Leadbeater <dgl@dgl.cx>
2021-02-15 17:00:49 +00:00
Julien Pivotto e29b47b39e
Merge pull request #8440 from mishamo/master
Add optional name property to testgroup for better test failure output
2021-02-09 21:23:24 +01:00
misha 1c3e7b4241 Use strings.Builder for neater error formatting
Signed-off-by: misha <DL-OTTCloudPlatform-Nova@bskyb.internal>
2021-02-09 15:00:26 +00:00
Tom Wilkie d479151f1f Various enhancements and refactorings for remote write receiver:
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write.  This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series.  For now, it's easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behind the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-08 20:41:23 +00:00
fuling 72475b8a0c [ENHANCEMENT] remote storage:Add default api implementation of remote write
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2021-02-07 18:12:48 +00:00
misha c2c5aeb16b Add optional name property to testgroup for better test failure output
Signed-off-by: misha <DL-OTTCloudPlatform-Nova@bskyb.internal>
2021-02-04 10:07:22 +00:00
Julien Pivotto c1f8bd9944
Merge pull request #8432 from roidelapluie/backfillpanic
backfill: move checkErr before we close the mmaped file
2021-02-03 16:32:35 +01:00
Julien Pivotto 9334269f2b backfill: move checkErr before we close the mmaped file
When printing the error, we still need access to the mmapped byte array
of the file. Therefore, we make sure that we run it before closing the
file.

I could have done something more complex like a defer, or not closing
the file, knowing that we would exit the program anyway. However, I
think that in case we extend this in the future, or this is copy/pasted
elsewhere, we should continue closing the file. As it is small enough, I
went for the solution to call the function 3 times instead of playing
with a defer.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-01 21:18:42 +01:00
Jeremy Albinet 4a1f2c097e Typo on plural in checkRules/checkDuplicates
Signed-off-by: Jeremy Albinet <jalbinet@synthesio.com>
2021-02-01 15:43:05 +01:00
Julien Pivotto 2316062d4e Deprecate --alertmanager.timeout
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-01-25 12:36:13 +01:00
Ganesh Vernekar 9199fcb8d1
'@ <timestamp>' modifier (#8121)
This commit adds `@ <timestamp>` modifier as per this design doc: https://docs.google.com/document/d/1uSbD3T2beM-iX4-Hp7V074bzBRiRNlqUdcWP6JTDQSs/edit.

An example query:

```
rate(process_cpu_seconds_total[1m]) 
  and
topk(7, rate(process_cpu_seconds_total[1h] @ 1234))
```

which ranks based on last 1h rate and w.r.t. unix timestamp 1234 but actually plots the 1m rate.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2021-01-20 16:27:39 +05:30
Julien Pivotto ac2626757c Update exporter-toolkit to 0.5.0
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-01-13 21:49:54 +01:00
Guangwen Feng 2df1a482da
Fix misspelled word in comment (#8348)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2021-01-07 10:01:08 +00:00
Julien Pivotto bc9f9ee3aa
Backfilling: fast-path for non-consecutive blocks (#8324)
* Backfilling: optimize for non-consecutive blocks

When you have missing data for > 2 hours, you spend a lot of time
re-reading the complete file. It is not optimal.

This introduces a fastpath for this scenario.

Additionally, we previously parsed the metric even when we knew, based
on its timestamp, that we would not use it. This change only computes
the metric when its timestamp is in range.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-30 02:06:41 +01:00
Julien Pivotto 003d6451fc Promtool: add web config validation
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-29 16:55:29 +01:00
Julien Pivotto 5b4f46a348 Add TLS and basic authentication
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-28 21:33:44 +01:00
Ben Kochie 5055dfbbe4 Listen on web early in startup
Avoid starting up components like the TSDB if we can't bind
to the web listening port.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-12-28 20:13:05 +01:00
beorn7 6bfa33308e promtool: Print block meta-data slightly more nicely
I initially thought I could somehow rescue the current column layout
by recycling the tabwriter, but flushing completely blanks
it. However, by setting a minimum width of 13, we get a slightly
broader DURATION column but otherwise nice formatting, unless numbers
get really big, but that's OK, I guess.

Before:

```
BLOCK ULID                  MIN TIME                       MAX TIME                       DURATION  NUM SAMPLES  NUM CHUNKS  NUM SERIES  SIZE
01ETN0KGNP5WWK9T5QMQGBG9F1  2020-11-19 07:39:17 +0000 UTC  2020-11-19 07:44:17 +0000 UTC  5m0.001s  8            2           2           624B
01ETN0KGQSFF0AB2QDZVQG3CWC  2020-11-19 10:25:57 +0000 UTC  2020-11-19 10:30:57 +0000 UTC  5m0.001s  8  2  2  622B
01ETN0KGSW8KYP3YPG4X20P60Z  2020-11-19 13:12:37 +0000 UTC  2020-11-19 13:17:37 +0000 UTC  5m0.001s  8  2  2  625B
```

After:

```
BLOCK ULID                  MIN TIME                       MAX TIME                       DURATION     NUM SAMPLES  NUM CHUNKS   NUM SERIES   SIZE
01ETN0R72SXN9A1FG732P7KFFN  2020-11-19 07:39:17 +0000 UTC  2020-11-19 07:44:17 +0000 UTC  5m0.001s     8            2            2            624B
01ETN0R74Y9AG1A1MKN4MZK7WM  2020-11-19 10:25:57 +0000 UTC  2020-11-19 10:30:57 +0000 UTC  5m0.001s     8            2            2            622B
01ETN0R76KXZ5VQECMDNES49J6  2020-11-19 13:12:37 +0000 UTC  2020-11-19 13:17:37 +0000 UTC  5m0.001s     8            2            2            625B
```

After without the `-r` flag:

```
BLOCK ULID                  MIN TIME       MAX TIME       DURATION     NUM SAMPLES  NUM CHUNKS   NUM SERIES   SIZE
01ETN0RFFJ42274NWR1GH0RTV6  1605771557000  1605771857001  5m0.001s     8            2            2            624
01ETN0RFJ1MZCHHS2SBZS8XC27  1605781557000  1605781857001  5m0.001s     8            2            2            622
01ETN0RFM98N3V4KD2DZXFGHGN  1605791557000  1605791857001  5m0.001s     8            2            2            625
```

Signed-off-by: beorn7 <beorn@grafana.com>
2020-12-28 16:55:12 +01:00
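
The minimum-width approach from the commit above can be reproduced with the standard `text/tabwriter` package. This is a sketch rather than the promtool code: `renderTable` is a hypothetical helper, and the padding of 2 and the space pad character are assumptions. Only the minimum width of 13 comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"text/tabwriter"
)

// renderTable formats rows as tab-separated cells through a tabwriter whose
// columns are at least minWidth characters wide (padding included).
func renderTable(minWidth int, rows [][]string) string {
	var sb strings.Builder
	// Arguments: output, minwidth, tabwidth, padding, padchar, flags.
	tw := tabwriter.NewWriter(&sb, minWidth, 0, 2, ' ', 0)
	for _, row := range rows {
		fmt.Fprintln(tw, strings.Join(row, "\t"))
	}
	tw.Flush()
	return sb.String()
}

func main() {
	rows := [][]string{
		{"DURATION", "NUM SAMPLES", "NUM CHUNKS", "SIZE"},
		{"5m0.001s", "8", "2", "624B"},
	}
	fmt.Print(renderTable(13, rows))
}
```

With a floor of 13, a short cell such as `8` occupies the same width as the `NUM SAMPLES` header plus padding, so rows flushed separately still line up unless a value grows beyond that width.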
beorn7 651b57b9ab Merge branch 'backfillhr' of git://github.com/roidelapluie/prometheus into review
2020-12-28 16:18:00 +01:00
yeya24 cedd2dbec9 create output directory before backfilling
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-12-24 23:36:36 -05:00
Julien Pivotto 53480c168d Backfill: print created blocks only, add human-readable option
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-23 20:42:30 +01:00
AdaephonBen dca6954b0a
promtool: Add URL scheme when not provided (#7956)
Signed-off-by: AdaephonBen <ma18btech11011@iith.ac.in>
2020-12-23 19:52:04 +01:00
lzhfromustc 27a6e1e174
test: add buffer to channel to avoid goroutine leak (#8274)
Signed-off-by: lzhfromustc <lzhfromustc@gmail.com>
2020-12-10 09:09:21 +00:00
Julien Pivotto 7957731339 Inline defer
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-09 09:23:39 +01:00
Julien Pivotto 82b5f1d8b1 Backfill: Use mmap to reuse parser code
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-08 23:48:31 +01:00
jessicagreben e32e4fcc53 fix unit test
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-30 11:02:45 -08:00
jessicagreben cec3515fa3 fix linter
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-30 08:17:51 -08:00
jessicagreben 2e9946e4d7 add test
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-28 07:58:33 -08:00
jessicagreben ac06d0a657 merge master/resolve conflict
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-26 08:43:07 -08:00
jessicagreben ee85c22adb flush samples to disk every 5k samples
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-26 08:30:06 -08:00
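
Flushing every 5k samples, as in the commit above, amounts to counting appended samples and flushing when a limit is reached. The sketch below assumes a hypothetical `batcher` type with a caller-supplied `flush` callback; it does not show how the rule importer actually writes to disk.

```go
package main

import "fmt"

// batcher counts pending samples and calls flush once limit is reached.
type batcher struct {
	limit   int
	pending int
	flush   func(n int) error // receives the number of samples being flushed
}

// add records one sample and flushes when the batch is full.
func (b *batcher) add() error {
	b.pending++
	if b.pending < b.limit {
		return nil
	}
	return b.flushPending()
}

// flushPending flushes whatever is buffered; call it once at the end so a
// final partial batch is not lost.
func (b *batcher) flushPending() error {
	if b.pending == 0 {
		return nil
	}
	n := b.pending
	b.pending = 0
	return b.flush(n)
}

func main() {
	b := &batcher{limit: 5000, flush: func(n int) error {
		fmt.Println("flushed", n, "samples")
		return nil
	}}
	for i := 0; i < 12000; i++ {
		if err := b.add(); err != nil {
			panic(err)
		}
	}
	if err := b.flushPending(); err != nil {
		panic(err)
	}
}
```

Bounding the batch size keeps memory use roughly constant regardless of how many samples a rule produces.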
Atibhi Agrawal b317b6ab9c
Backfill from OpenMetrics format (#8084)
* get parser working

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* import file created

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Find min and max ts

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* make two passes over file and write to tsdb

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* print error messages

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix Max and Min initializer

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Start with unit tests

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* reset file read

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* align blocks to two hour range

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Add cleanup test

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove .ds_store

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* add license to import_test

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix Circle CI error

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Refactor code
Move backfill from tsdb to promtool directory

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix gitignore

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Remove panic
Rename ContenType

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* adjust mint

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix return statement

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix go modules

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Added unit test for backfill

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix CI error

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix file handling

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Close DB

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Close directory

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Error Handling

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* inline err

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix command line flags

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* add spaces before func
fix pointers

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Add defer'd calls

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* move openmetrics.go content to backfill

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* changed args to flags

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* add tests for wrong OM files

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Added additional tests

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Add comment to warn of func reuse

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Make input required in main.go

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* defer blockwriter close

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix defer

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* defer

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Remove contentType

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove defer from backfilltest

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix defer remove in backfill_test

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* changes to fix CI errors

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix go.mod

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* change package name

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* assert->require

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove todo

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix format

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix todo

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix createblock

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix tests

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix defer

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix return

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* check err for anon func

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* change comments

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* update comment

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix for the Flush Bug

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix formatting, comments, names

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Print Blocks

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* cleanup

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* refactor test to take care of multiple samples

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* refactor tests

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove om

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* I dont know what I fixed

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix tests

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fix tests, add test description, print blocks

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* commit after 5000 samples

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* reviews part 1

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Series Count

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix CI

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove extra func

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* make timestamp into sec

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Reviews 2

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Add Todo

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Fixes

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fixes reviews

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* =0

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove backfill.om

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* add global err var, remove stuff

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* change var name

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* sampleLimit pass as parameter

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Add test when number of samples greater than batch size

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Change name of batchsize

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* revert export

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* nits

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* remove

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* add comment, remove newline,consistent err

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Print Blocks

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* Modify comments

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* db.Querier

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* add sanity check , get maxt and mint

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* ci error

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* comment change

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* nits

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* NoError

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix

Signed-off-by: aSquare14 <atibhi.a@gmail.com>

* fix

Signed-off-by: aSquare14 <atibhi.a@gmail.com>
2020-11-26 10:37:06 +05:30
jessicagreben 5dd3577424 change name of promtool subcommand to create-blocks-from
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-22 15:05:02 -08:00
jessicagreben 19dee0a569 add name and labels to metric, eval all rules for each block
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-22 14:24:38 -08:00
gotjosh 4eca4dffb8
Allow metric metadata to be propagated via Remote Write. (#6815)
* Introduce a metadata watcher

Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage.

Signed-off-by: gotjosh <josue@grafana.com>

* Additional fixes after rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Rework samples/metadata metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Use more descriptive variable names in MetadataWatcher collect.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues caused during rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing metric add and unneeded config code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix metrics and docs

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Replace assert with require

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Bring back max_samples_per_send metric

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-11-19 20:53:03 +05:30
jessicagreben 75654715d3 fix panics
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-11-01 07:54:04 -08:00
jessicagreben 61c9a89120 use milliseconds for blocksize
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-10-31 07:11:54 -07:00
jessicagreben 6980bcf671 unexport backfiller
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-10-31 06:40:56 -07:00
jessicagreben 3ed6457dd4 use blockwriter, rm multiwriter code
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-10-31 06:32:07 -07:00
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Bartlomiej Plotka 3d8826a3d4
MultiError: Refactored MultiError for more concise and safe usage. (#8066)
* MultiError: Refactored MultiError for more concise and safe usage.

* Less lines
* Goland IDE was marking every usage of old MultiError "potential nil" error
* It was easy to forget to use Err() when an error was returned; now it's safely assured at compile time.

NOTE: Potentially I would rename package to merrors. (: In different PR.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed review comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix after rebase.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-10-28 15:24:58 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
David Leadbeater e7e60623ff
promtool: Calculate mint and maxt per test (#8096)
* promtool: Calculate mint and maxt per test

Previously a single test that used a later eval time would make all
other tests in the file share the [mint, maxt] and potentially evaluate
far more samples than needed.

Fixes: #8019

Signed-off-by: David Leadbeater <dgl@dgl.cx>
2020-10-24 12:03:55 +01:00
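
The per-test range calculation described in the commit above can be sketched as follows. The `testGroup` type, the use of seconds, and the choice of 0 as `mint` are assumptions made for illustration; they do not mirror the promtool data structures.

```go
package main

import "fmt"

// testGroup is a hypothetical stand-in for one unit test in a test file.
type testGroup struct {
	name         string
	evalTimesSec []int64 // evaluation times, in seconds from the test start
}

// evalRange returns the [mint, maxt] a single test needs: from the start of
// the test up to its own latest evaluation time.
func evalRange(g testGroup) (mint, maxt int64) {
	for _, t := range g.evalTimesSec {
		if t > maxt {
			maxt = t
		}
	}
	return 0, maxt
}

func main() {
	groups := []testGroup{
		{name: "short", evalTimesSec: []int64{60, 300}},
		{name: "long", evalTimesSec: []int64{86400}},
	}
	// Each group gets its own range, so "short" no longer evaluates
	// a full day of samples just because "long" does.
	for _, g := range groups {
		mint, maxt := evalRange(g)
		fmt.Println(g.name, mint, maxt)
	}
}
```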
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
jessicagreben 36ac0b68f1 merge master, fix conflicts
2020-10-17 08:20:21 -07:00
Björn Rabenstein 71577e45eb
Merge pull request #8044 from prometheus/beorn7/metrics
Instrumentation: Report valid configs in the respective metrics from the beginning
2020-10-12 23:32:02 +02:00
Arthur Silva Sens 4f45e201cc
Promtool tsdb list now prints block sizes (#7993)
* promtool tsdb list now prints blocks' size

Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-10-12 23:15:40 +02:00
beorn7 0f3c1bf6cf Report valid configs in the respective metrics from the beginning
In #7399, an early validity check of the config was introduced to
prevent the scenario where an invalid config is only detected after a
possibly very long startup procedure. However, the respective success
metrics are not updated after the initial validation so that the
success metrics suggest an invalid config. If the startup procedure,
like replaying the WAL, really takes very long, alerts about invalid
config will trigger.

This commit sets the success metrics after initial validation. They
will be set again after the "real" config (re-)load, but that
shouldn't be a problem. The metric now truthfully represents whenever
the config was successfully loaded, no matter if the result was then
thrown away (because it was just for validation) or actually used.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-10-12 21:30:59 +02:00
David Leadbeater 5393ec22cb promtool: Don't end alert tests early, in some failure situations
If an alert test had a failing test, then any other alert test interval
specified after that point would result in the test exiting early.
This made debugging some tests more difficult than needed.

Now only exit early for evaluation failures.

Signed-off-by: David Leadbeater <dgl@dgl.cx>
2020-10-09 12:59:59 +01:00
Frederic Branczyk da3ea43242
Merge pull request #7976 from roidelapluie/tolerance
Introduce timestamp tolerance in scrapes
2020-10-08 09:21:19 +02:00
Julien Pivotto be5ba1a62d Fix wordings
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 21:44:36 +02:00
Julien Pivotto 4617d16b4b Specify the removal
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:32:04 +02:00
Julien Pivotto e2a2bf3c06 Add context
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:30:32 +02:00
Julien Pivotto 627ff84599 Adjust flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:25:52 +02:00
Julien Pivotto 6b618ecf02 Better description
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 17:43:42 +02:00
Julien Pivotto 536dfb6234 Add an experimental, hidden flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 17:31:46 +02:00
Frederic Branczyk 6be3ebdfe7
Merge pull request #8015 from simonpasquier/bump-k8s-deps
Bump k8s dependencies + support k8s.io/klog/v2
2020-10-07 09:54:58 +02:00
Julien Pivotto 946819e16e
cmd/prometheus: Issue a warning on 32 bit archs (#8012)
* cmd/prometheus: Issue a warning on 32 bit archs

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-06 21:42:56 +02:00
Simon Pasquier 9bb3555fe4 cmd/prometheus: support k8s.io/klog/v2
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-10-06 14:56:14 +02:00
David Leadbeater 77c784ac93
Ensure alert rules are marked as restored in unit tests (#7661)
This makes sure the ALERTS timeseries is created when unit testing
alerting rules.

Signed-off-by: David Leadbeater <dgl@dgl.cx>
2020-09-21 18:15:34 +02:00
jessicagreben 2e526cf2a7 add output dir parameter
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-09-13 08:38:32 -07:00
jessicagreben dfa510086b add alignment, mv rule importer to promtool dir, add queryRange
Signed-off-by: jessicagreben <jessicagrebens@gmail.com>
2020-09-13 08:07:59 -07:00
Julien Pivotto 442b3364d7
Promtool: add evaluation time to instant query (#7829)
* Promtool: add evaluation time to instant query

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Apply suggestion

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-25 11:32:25 +01:00
Andy Bursavich 4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Harold Dost 21a753c4e2
Make file permissions set to allow for wider umask options. (#7782)
0644 -> 0666 on all non-vendored code.

Fixes #7717

Signed-off-by: Harold Dost <harolddost@gmail.com>
2020-08-12 23:23:17 +02:00
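
Why requesting 0666 instead of 0644 allows wider umask options can be seen from how the process umask is applied on Unix-like systems: the bits set in the umask are cleared from the requested mode. The helper `effectiveMode` below is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"os"
)

// effectiveMode returns the permissions a new file ends up with when it is
// created with the requested mode under the given umask.
func effectiveMode(requested, umask os.FileMode) os.FileMode {
	return requested &^ umask
}

func main() {
	for _, umask := range []os.FileMode{0022, 0002} {
		fmt.Printf("umask %04o: 0644 -> %04o, 0666 -> %04o\n",
			umask, effectiveMode(0644, umask), effectiveMode(0666, umask))
	}
}
```

With a request of 0644 the group-write bit can never be set, whatever the umask. With 0666 a umask of 0022 still yields 0644, while a umask of 0002 yields group-writable files.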
Julien Pivotto d661f84748 Log duration of reloads
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-06 21:49:26 +02:00
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Julien Pivotto 01e3bfcd1a
Add warnings about NFS (#7691)
* Add warnings about NFS

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 11:22:44 +02:00
Javier Palomo Almena b58a613443
Replace sync/atomic with uber-go/atomic (#7683)
* storage: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* cmd: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* scripts: Verify that we are not using restricted packages

It checks that we are not directly importing 'sync/atomic'.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Reorganise imports in blocks

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier/test: Apply PR suggestions

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* storage/remote: avoid storing references on newEntry

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Revert "scripts: Verify that we are not using restricted packages"

This reverts commit 278d32748e.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Group imports accordingly

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
2020-07-30 13:15:42 +05:30
jessicagreben 7504b5ce7c add rule importer with tsdb block writer
Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
2020-07-27 07:44:49 -07:00
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
chinhnc e05c19da5d
Display block duration in promtool list blocks command (#7653)
* Update tsdb.go

Added DURATION column to `tsdb list` command

Signed-off-by: soup <chicknsoupuds@gmail.com>

* Use time.Duration instead of hardcoded hour

Signed-off-by: soup <chicknsoupuds@gmail.com>
2020-07-24 19:01:20 +05:30
Ben Ye 50c261502e
add tsdb cmds into promtool (#6088)
Signed-off-by: yeya24 <yb532204897@gmail.com>

update tsdb cli in makefile and promu

Signed-off-by: yeya24 <yb532204897@gmail.com>

remove building tsdb bin

Signed-off-by: yeya24 <yb532204897@gmail.com>

remove useless func

Signed-off-by: yeya24 <yb532204897@gmail.com>

refactor analyzeBlock

Signed-off-by: yeya24 <yb532204897@gmail.com>

Fix Makefile

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-07-23 19:35:50 +01:00
Bartlomiej Plotka a0df8a383a
promql: Removed global and add ability to have better interval for subqueries if not specified (#7628)
* promql: Removed global and add ability to have better interval for subqueries if not specified

## Changes
* Refactored tests for better hints testing
* Added various TODO in places to enhance.
* Moved DefaultEvalInterval global to opts with func(rangeMillis int64) int64 function instead

Motivation: At Thanos we would love to have better control over the subquery step/interval.
This is important for choosing a proper resolution. I think having a proper step also does no harm for
Prometheus and remote read users. Especially on a stateless querier, we do not know the evaluation interval,
and relying on a global can be a wrong assumption even for Prometheus.

I think ideally we could try to have at least 3 samples within the range, the same
way the Prometheus UI and Grafana assume.

Anyway, this interface allows deciding on a per-PromQL-user basis.

Open question: Is taking the parent interval a smart move?

Motivation for removing global: I spent 1h fighting with:


=== RUN   TestEvaluations
    TestEvaluations: promql_test.go:31: unexpected error: error evaluating query "absent_over_time(rate(nonexistant[5m])[5m:])" (line 687): unexpected error: runtime error: integer divide by zero
--- FAIL: TestEvaluations (0.32s)
FAIL

In the end I found that this fails on most versions, including this master, if you run this test alone. If run together with many
other tests, it passes. This is due to SetDefaultEvaluationInterval(1 * time.Minute)
in a test that is run before TestEvaluations. Thanks to globals (:

Let's fix it by dropping this global.
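
For illustration only (not code from this PR): a minimal sketch of passing the subquery interval as a per-engine option of type `func(rangeMillis int64) int64` instead of reading a package-level global. The names `engineOpts`, `subqueryIntervalFn`, and `subquerySteps` are hypothetical; the point is that each engine instance carries its own function, so one test cannot change the interval seen by another.

```go
package main

import (
	"errors"
	"fmt"
)

// engineOpts is a hypothetical options struct; the field name is illustrative.
type engineOpts struct {
	// subqueryIntervalFn returns the default subquery step (ms)
	// for a given range (ms).
	subqueryIntervalFn func(rangeMillis int64) int64
}

// subquerySteps returns how many steps fit into the range. It rejects a
// missing or non-positive interval instead of dividing by zero.
func subquerySteps(opts engineOpts, rangeMillis int64) (int64, error) {
	if opts.subqueryIntervalFn == nil {
		return 0, errors.New("no subquery interval function configured")
	}
	interval := opts.subqueryIntervalFn(rangeMillis)
	if interval <= 0 {
		return 0, errors.New("subquery interval must be positive")
	}
	return rangeMillis / interval, nil
}

func main() {
	// A fixed one-minute interval, similar to a default evaluation interval.
	fixed := engineOpts{subqueryIntervalFn: func(int64) int64 { return 60000 }}
	// An interval derived from the range, aiming for three samples.
	adaptive := engineOpts{subqueryIntervalFn: func(r int64) int64 { return r / 3 }}

	for _, o := range []engineOpts{fixed, adaptive} {
		steps, err := subquerySteps(o, 300000) // a 5m range
		fmt.Println(steps, err)
	}
}
```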

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added issue links for TODOs.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Removed irrelevant changes.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-07-22 14:39:51 +01:00
Julien Pivotto b83cbacbdd
Rule manager: remove blocking channel in mail (#7631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:13:24 +02:00
Ben Ye e6ea798c32
promtool range query should exit when fail to parse time (#7505)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-07-16 23:53:04 +01:00
yeya24 797e48c1a3 support time range in promtool query labels
Updated prometheus/client_golang and json-iterator/go

Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-07-03 11:29:39 -04:00
Frederic Branczyk d17d88935c
rules: Use narrower interface for rule manager loading of for state (#7472)
To load ALERT_FOR_STATE only `storage.Queryable` interface is required,
so this patch uses this narrower interface to do so.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2020-06-26 19:06:36 +01:00
Bartlomiej Plotka b788986717
storage: Adjusted fully storage layer support for chunk iterators: Remote read client, readyStorage, fanout. (#7059)
* Fixed nits introduced by https://github.com/prometheus/prometheus/pull/7334
* Added ChunkQueryable implementation to fanout and readyStorage.
* Added more comments.
* Changed NewVerticalChunkSeriesMerger to CompactingChunkSeriesMerger, removed tiny interface by reusing VerticalSeriesMergeFunc for overlapping algorithm for
both chunks and series, for both querying and compacting (!) + made sure duplicates are merged.
* Added ErrChunkSeriesSet
* Added Samples interface for seamless []prompb.Sample to []tsdbutil.Sample conversion.
* Deprecated the non-chunk SeriesSet-based StreamChunkedReadResponses; added a chunk-based one.
* Improved tests.
* Split remote client into Write (old storage) and read.
* Queryable client is now SampleAndChunkQueryable. Since we cannot use the nice QueryableFunc, I moved
all config-based options to sampleAndChunkQueryableClient to avoid boilerplate.

In next commit: Changes for TSDB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-24 14:41:52 +01:00
Harkishen Singh 70b0a34616
Exit early on invalid config file (#7399)
* Reload config file at start

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* relocated config checking

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* change log level

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* add helpful comment

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2020-06-21 21:26:59 +05:30
Ben Kochie 8d3c2f6829
Enable WAL compression by default (#7410)
Enable the `--storage.tsdb.wal-compression` flag by default.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-18 17:59:40 +01:00
Jordan Neufeld 268b4c29e1
Support extended durations in promtool unit tests (Fixes #6285) (#6297)
* Fixed evaluation_time duration parsing in promtool unit tests (Fixes #6285)

Signed-off-by: Jordan Neufeld <jordan@neufeldtech.com>
2020-06-15 16:03:07 +01:00
Arthur Silva Sens 7727b9012e
Correction of misleading help text(#5142) (#7231)
* Correction of misleading help text(#5142)

Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-05-11 12:15:01 +01:00
Julien Pivotto 9e265aba10
Merge pull request #7225 from prometheus/release-2.18
[Merge without Squash] Merge release-2.18 back to master for 2.18.1 fixes.
2020-05-07 21:23:59 +02:00
Hongcai Ren c7e82274c6
replace github.com/prometheus/prometheus/testutil/promlint by github.com/prometheus/client_golang/prometheus/testutil/promlint from our codebase (#7209)
Signed-off-by: RainbowMango <renhongcai@huawei.com>
2020-05-07 11:34:39 +01:00
Julien Pivotto 645b71e9ef
Fix snapshots (#7217)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-07 10:03:48 +01:00
Ganesh Vernekar d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full, it is flushed to disk and m-mapped (memory mapped) to free up memory.

Prom startup now happens in these stages:
- Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series.

If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss.

[M-mapped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - the main difference is that the chunk for m-mapping now also includes the series reference, because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index, which includes the offsets for the chunks in the chunks file - for example, chunks of series ID have offsets 200, 500 etc. in the chunk files.
In the case of m-mapped chunks, the offsets are stored in memory and accessed from there. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above, matching the series ID present in the chunk header and the offset of that chunk in that file.

**Prombench results**

_WAL Replay_

1h WAL replay time
30% less WAL replay time - 4m31 vs 3m36
2h WAL replay time
20% less WAL replay time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ben Ye 1e4e37144d
Fixed wrongly handled not ready TSDB on web and API. (#7182)
* fix federate endpoint panic

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Fixed all cases of not ready TSDB being wrongly handled.

* Fixed issue for federation.
* Ensured this will never happen again thanks to interfaces
* Fixes same issue for stats.
* Added tests for readiness.
* Fixed bug in stats. It was:
   status.MaxTime = db.Head().MaxTime()
   status.MinTime = db.Head().MaxTime()


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 17:16:14 +01:00
Vasily Sliouniaev 0393b188c9
Add Jaeger (#7148)
* Trace remote read

Signed-off-by: vas <vasily.sliouniaev@jet.com>

* Use jaeger

Signed-off-by: vas <vasily.sliouniaev@jet.com>
2020-04-23 02:05:55 +02:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Brian Brazil 7646cbca32
Use .UTC everywhere we use time.Unix (#7066)
time.Unix attaches the local timezone, which can then
leak out (e.g. in the alert json). While this is harmless,
we should be consistent.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-29 17:35:39 +01:00
Ben Kochie 269e7c8091
Fix golint issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-23 20:38:43 +01:00
johncming bbacd2dd09
remove needless break. (#7008)
Signed-off-by: johncming <johncming@yahoo.com>
2020-03-19 11:21:00 +00:00
李国忠 52025bd7a9
[comments] change word ‘wheter’ to ‘whether’ (#6912)
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-02 13:51:24 +05:30
Tobias Guggenmos 4835bbf376
Merge branch 'master' into split_parser
2020-02-19 15:18:13 +01:00
Bartlomiej Plotka 48ead578a0 Moved tsdbconfig to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-18 11:25:36 +00:00
Bartlomiej Plotka a20bebf7eb Moved readyStorage to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 8a775bc468 Moved unit agnostic options to separate pkg.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 59c9d6ef45 Addressed Brian's comments, moved metrics to main.go
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka cfba92a133 Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Tobias Guggenmos 454ba12676 Fix build errors in promtool
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Björn Rabenstein af04cb22c8
Merge pull request #6821 from prometheus/release-2.16
Release 2.16
2020-02-14 13:10:14 +01:00
Julien Pivotto ff0003e072
Make lookbackDelta an option of QueryEngine (#6746)
* Make lookbackDelta an option of QueryEngine

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* julius' suggestion

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* remove trivial getter

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Assume lookback delta is always > 0

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* add debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* don't expose lookback delta

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Specify that lookback delta is also used in federation

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix federation test

While we have added some logic to the promql engine to keep it backwards
compatible and have a 5 minute lookback by default, the web/ package is
likely to really be internal to Prometheus and we should not add the
same kind of heuristics here.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* lookback delta: Fix debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-10 00:58:23 +01:00
Julien Pivotto d799078c88 also test start and end
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-08 16:42:50 +01:00
Julien Pivotto 881dde505a promql: fix promql query log step unit
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-08 16:26:56 +01:00
Julien Pivotto 3c4c01eae2
Fix race in Query Log Test (#6727)
A data race can happen if we run t.Log after the test t is done -- which
in this case is highly possible because of the use of subtests and the
fact that we call t.Log in a goroutine.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-30 13:51:18 -08:00
Julien Pivotto 9adad8ad30 Remove MaxConcurrent from the PromQL engine opts (#6712)
Since we use ActiveQueryTracker to check for concurrency in
d992c36b3a it does not make sense to keep
the MaxConcurrent value as an option of the PromQL engine.

This pull request removes it from the PromQL engine options, sets the
max concurrent metric to -1 if there is no active query tracker, and use
the value of the active query tracker otherwise.

It removes dead code and also will inform people who import the promql
package that we made that change, as it breaks the EngineOpts struct.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-28 20:38:49 +00:00
Julien Pivotto 5f27ac3583 Refactor query log fields (#6694)
* Refactor query log fields

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 09:53:10 +00:00
Julien Pivotto 2b2eb79e8b Add windows tests for query logger (#6653)
* Add windows tests
* Do not rely on time.Time in timer

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-20 13:17:11 +00:00
Julien Pivotto 0eb34299da End-to-end Query Log test (#6600)
* End-to-end Query Log test

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-19 21:56:13 +00:00
Julien Pivotto 1a58d2657d Removed compilation step inside main_test (#6658)
Inspired by https://github.com/prometheus/prometheus/pull/6347 and
https://github.com/prometheus/prometheus/pull/6347#issuecomment-570151979

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-19 07:14:25 +00:00
Harkishen Singh 84e6459c4d Adds support for line-column numbers for invalid rules, promtool (#6533)
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
2020-01-15 18:07:54 +00:00
Julien Pivotto 3885562587 Query Logging styling (#6594)
- Fix Json vs JSON in activequerylogger
- Fix SetQueryLogger always returns nil

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-09 21:11:39 +00:00
Julien Pivotto 9d9bc524e5 Add query log (#6520)
* Add query log, make stats logged in JSON like in the API

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-08 13:28:43 +00:00
Simon Pasquier cccd542891
*: avoid missed Alertmanager targets (#6455)
This change makes sure that nearly-identical Alertmanager configurations
aren't merged together.

The config's identifier was the MD5 hash of the configuration serialized
to JSON but because `relabel.Regexp` has no public field and doesn't
implement the JSON.Marshaler interface, it was always serialized to
"{}".

In practice, the identifier can be based on the index of the
configuration in the list.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 17:00:19 +01:00
Brooks Swinnerton 0ea3a2218d Add time units to storage.tsdb.retention.size flag (#6365)
* Add time units to storage.tsdb.retention.size flag

In an effort to reduce confusion with the `m` option of the
`ParseDuration()` function, this commit adds the available time units to
the `storage.tsdb.retention.time` flag to help showcase that there is no
option for months (which could be assumed to be `m`).

If someone were looking to set the retention to six months, they may
mistakenly do so with `6m`, which would reduce their retention to six
minutes.

Signed-off-by: Brooks Swinnerton <bswinnerton@gmail.com>
2019-11-30 08:00:51 +00:00
johncming ad4bc5701e remove unwanted break (#6338)
Signed-off-by: johncming <johncming@yahoo.com>
2019-11-18 23:01:03 -08:00
akerele abraham 9d39fdad0c unittest: check for rule files existence (#6075)
Signed-off-by: akerele abraham <abrahamakerele38@gmail.com>
2019-11-18 13:54:52 -08:00
Chris Marchbanks 1d1f64b4bc
Fix Promtool showing false duplicate rule warnings (#6270)
Alert rules do not use the Record field, so any alerts with the same
labels and different names would be counted as being duplicates.
Promtool will now consider either field when finding duplicates.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-11-05 11:22:31 -07:00
Simon Pasquier ddff1480a7
cmd/promtool: improve output for PromQL tests (#6052)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-25 09:26:29 +02:00
Harkishen Singh e097c70e6d add checks for metrics and display duplicate fields (#6026)
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2019-09-20 11:29:47 +01:00
Simon Pasquier 06066a3619
*: improve error messages when parsing bad rules (#5965)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-28 17:36:48 +02:00
Sayan Chowdhury cb66e325d8 Show the warnings during label query (#5924)
This patch loops through the warnings while querying the label and spits the
output to stderr

Fixes #5885

Signed-off-by: Sayan Chowdhury <sayan.chowdhury2012@gmail.com>
2019-08-24 19:42:21 +02:00
Bartek Płotka 48b2c9c8ea
remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksumed reader (#5703)
Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488

Changes:
* Extended protobuf for chunked remote read and negotiation.
* Added checksummed, chunked Writer/Reader.
* Added Server side implementation for chunked streamed remote-read.


Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-19 21:16:10 +01:00
Bartek Płotka 5cb32d67f9
Merge pull request #5893 from prometheus/unify-tsdbutil
Removed extra tsdb/testutil after merge.
2019-08-15 12:07:59 +01:00
Bartek Plotka f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Julius Volz b5c833ca21
Update go.mod dependencies before release (#5883)
* Update go.mod dependencies before release

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Add issue for showing query warnings in promtool

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Revert json-iterator back to 1.1.6

It produced errors when marshaling Point values with special float
values.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Fix expected step values in promtool tests after client_golang update

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Update generated protobuf code after proto dep updates

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-08-14 11:00:39 +02:00
Advait Bhatwadekar 5d401f1e1b Added query logging for prometheus. Issue #1315 (#5794)
* Added query logging for prometheus.
Options added:
1) active.queries.filepath: Filename where queries will be recorded
2) active.queries.filesize: Size of the file where queries will be recorded.

Functionality added:
All active queries are now logged in a file. If prometheus crashes unexpectedly, these queries are also printed out on stdout in the rerun.

Queries are written concurrently to an mmapped file and removed once they are done. Their positions in the file are reused. They are written in JSON format. However, due to the dynamic nature of the application, the JSON has an extra comma after the last query and is missing an ending ']'. There may also be null bytes at the tail of the file.

Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>
2019-07-31 16:12:43 +01:00
Simon Pasquier 75886e0464 cmd/promtool: fix panic with empty exp_labels
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-17 17:02:31 +02:00
Chris Marchbanks 06f1ba73eb
Provide flag to compress the tsdb WAL
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-07-03 08:03:29 -06:00
Tom Wilkie 851131b074
Allow injection of arbitrary headers in promtool, for auth etc. (#4389)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-06-30 11:50:23 +01:00
Simon Pasquier be67b8d460
web: fix flaky TestHTTPMetrics() (#5695)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-24 15:48:15 +02:00
Björn Rabenstein dc22f74153
Merge pull request #5608 from simonpasquier/external-labels-for-alert-tests
cmd/promtool: add $externalLabels for alert unit tests
2019-06-20 16:48:12 +02:00
Björn Rabenstein 372b3438e5 Update prometheus/client_golang to v1.0.0 (#5682)
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-17 19:14:36 +01:00
Keenan Romain 55f3a9fe4a Allows globs for rules when unit testing (#5595)
* Includes glob support when unit testing rule_files. 

Signed-off-by: Keenan Romain <Keenan.Romain@mailchimp.com>
2019-06-12 11:31:07 +01:00
Simon Pasquier 74ff35ccdd cmd/promtool: add $externalLabels for alert unit tests
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-29 16:40:01 +02:00
beorn7 aff4738f33 Adjust TestQueryRange to new Prometheus API client
Signed-off-by: beorn7 <bjoern@rabenste.in>
2019-05-17 18:09:47 +02:00
Lee Gaines f4486815c1 logs filesystem type on startup (#5558)
Signed-off-by: Lee Gaines <leetgaines@gmail.com>
2019-05-17 10:16:16 +01:00
Björn Rabenstein 0a34399611 Fix minor punctuation and language issues in flag doc strings (#5568)
This is mostly to create consistency, not because the one or the other
way would be wrong. A few actual corrections are also included.

Signed-off-by: beorn7 <bjoern@rabenste.in>
2019-05-15 16:59:06 +02:00
Simon Pasquier 45506841e6
*: enable all default linters (#5504)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 15:11:28 +02:00
Simon Pasquier 9c69eec82a cmd/promtool: use log.NewNopLogger() (#5531)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 10:00:07 +01:00
Frederic Branczyk c790d7658c
Merge pull request #5491 from metalmatze/rungroup
Use github.com/oklog/run not archived oklog/oklog
2019-04-29 16:22:16 +02:00
Björn Rabenstein 0be9388f8d
Merge pull request #5463 from prometheus/beorn7/templating
Follow-up on #5009
2019-04-24 16:42:23 +02:00
Simon Pasquier abc1994bec
cmd/promtool: return errors from rule evaluations (#5483)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 09:59:03 +02:00
Matthias Loibl 388caa06ac
Use github.com/oklog/run not archived oklog/oklog
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2019-04-19 14:55:28 +02:00
Bjoern Rabenstein 38d518c0fe Rework #5009 after comments
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Bjoern Rabenstein a92ef68dd8 Fix staticcheck errors
Not sure why they only show up now.

Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Sylvain Rabot 335a34486e Add external labels to template expansion
This affects the expansion of templates in alert labels and
annotations and console templates.

Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2019-04-17 01:40:10 +02:00
Simon Pasquier e5dbac7972 cmd/prometheus: group flags properly (#5419)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-10 13:22:05 +01:00
David Symonds 7a60e22c2d cmd/promtool: resolve relative paths in alert test files (#5336)
Like `promtool check config <path/to/foo.yaml>`, which resolves relative
paths inside foo.yaml to be relative to `path/to`, this now makes
`promtool test rules <path/to/test.yaml>` do the same thing.

Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-03-27 10:27:26 +01:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Brian Brazil 0a87dcd416
cmd: Warn rather than Info when retention time wraps (#5403)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-03-25 18:06:38 +00:00
Krasi Georgiev 9d96ada510 Display correct values for the retention in the flags web gui. (#5322)
* Display correct values for the retention in the flags web gui.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* adding a log entry

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* added the retention info to the runtime status page

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* simplify the retention display

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-03-11 22:48:57 +05:30
Krasi Georgiev 1684dc750a
updated tsdb to 0.6.0 (#5292)
* updated tsdb to 0.6.0

as part of the update also added the new storage.tsdb.allow-overlapping-blocks flag and mark it as experimental.
2019-03-04 21:42:45 +02:00
Simon Pasquier c8a1a5a93c
discovery/kubernetes: fix support for password_file and bearer_token_file (#5211)
* discovery/kubernetes: fix support for password_file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Create and pass custom RoundTripper to Kubernetes client

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use inline HTTPClientConfig

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 11:22:34 +01:00
Krasi Georgiev a3c41f4256
use the default time retention value only when no size retention is set (#5216)
fixes https://github.com/prometheus/prometheus/issues/5213

Now that we have both time-based and size-based retention, the time-based one should not have a default value. A default is set only when neither the time nor the size flag is set.

This change will not affect current installations that rely on the default time-based value, and it avoids confusion when only the size retention is set and the default time-based setting is expected to no longer apply.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:53:43 +02:00
Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to achieve parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Brian Brazil 1dd57765b4
Reduce time that alertmanagers are in flux when reloaded. (#5126)
This no longer waits for all of the scrape reload to complete
before getting a list of AMs again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-28 18:34:12 +00:00
Goutham Veeramachaneni 4068968e12
Protect retention from overflowing (#5112)
Also sanitise the max block duration to max a month.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni 384cba1211
Add flag for size based retention (#5109)
* Add flag for size based retention

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Deprecate the old retention flag for a new one.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add ability to take a suffix for size flag

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Address feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 19:18:36 +05:30
Hrishikesh Barman a1f34bec2e Added CORS Origin flag (#5011)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-17 15:01:06 +00:00
Matt Layher 302148fd69 *: apply gofmt -s
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Ryan Leung 45c8b084c6 fix TestFailedStartupExitCode (#5076)
Signed-off-by: rleungx <rleungx@gmail.com>
2019-01-16 10:13:36 +01:00
Lv Jiawei b8ede99767 Fix comment typo (#5087)
According to code, I think it is a typo.

Signed-off-by: MIBc <lvjiawei@cmss.chinamobile.com>
2019-01-09 10:56:47 +00:00
Frederic Branczyk e9ae0b5a1b
Merge pull request #4927 from tariq1890/update_k8s
update client-go to v10.0.0 and other k8s deps to v1.13.1
2019-01-07 10:54:34 +01:00
Simon Pasquier f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
tariqibrahim 9b4a25e7b0 use klog dependency
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-03 13:57:20 -08:00
glutamatt 5ddde1965b tune the "Wal segment size" with a flag (#5029)
Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size" to tune the max size of wal segment files.

The underlying problem is the disk space used by WAL segment files: on a Raspberry Pi, for instance, we often want to reduce the write load on the SD card, so the WAL directory is mounted on a memory-backed (space-limited) partition.

The default maximum segment file size pushes the size of the directory to 128 MB for each segment, which is too much RAM consumption on a Raspberry Pi.

The initial discussion is at https://github.com/prometheus/tsdb/pull/450
2019-01-03 17:13:21 +03:00
Ganesh Vernekar 7d30ccd0eb Sort samples before comparing - PromQL unit test (#5052)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-12-31 10:55:49 +00:00
Ganesh Vernekar dbe55c1352 Subquery (#4831)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-12-22 13:47:13 +00:00
Simon Pasquier a2766a94a3 cmd/prometheus: add tests for sendAlerts() (#4910)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-18 11:15:46 +00:00
AixesHunter 1b166d7174 Fix variable 'notifier' collides with imported package name 'github.com/prometheus/prometheus/notifier', changed to 'notifierManager'. (#4947)
Signed-off-by: aixeshunter <aixeshunter@gmail.com>
2018-12-18 11:13:18 +00:00
Ganesh Vernekar fbadd88ba5 Get unique eval times for alert unit tests (#4964)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-12-18 08:40:03 +00:00
Simon Pasquier ac9d5f3d53
cmd/prometheus: replace glog by glog-gokit (#4931)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-04 15:01:12 +01:00
Krasi Georgiev 080e6ed31a
collect cpu and trace profiles with the promtool debug command (#4897)
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-11-23 17:57:31 +02:00
Alex Yu 5dcce32ef8 update promlog to latest version (#4876)
* update promlog to latest version

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* Update api tests, fix main setup

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* tidy go.sum

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* revendor prometheus/common

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* only initialize config; use kingpin for remote_storage_adapter

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* actually parse the flags

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* clean up imports

Signed-off-by: Alex Yu <yu.alex96@gmail.com>
2018-11-23 14:22:40 +01:00
Ganesh Vernekar cfb3769274 Lazily load samples for unit testing (#4851)
* Lazily load samples for unit testing

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* cleanup

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-11-22 14:21:38 +05:30
achiuBAE a9050c45f6 Allow setting the Prometheus instance document title through a flag. (#4841)
* web: added ability to set page title through flag.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* Reformatted variable names and Flag description for readability.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* assets_vfsdata.go

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* Flag name changed from web.ui-title to web.page-title

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* make assets

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
2018-11-21 12:45:06 +08:00
stuart nelson 6a69471bc2
[promtool] Support writing output as json (#4848)
* Support writing output as json

Oftentimes I'll want to execute something based on
the output from promtool, and supporting json
makes it easy to pull out values with a supporting
tool such as jq.

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2018-11-14 18:40:07 +01:00
Lucas Serven 70c8b2c63c
cmd/prometheus: buffer signal chans
According to the GoDoc for os.Signal [0]:

> Package signal will not block sending to c: the caller must ensure that
> c has sufficient buffer space to keep up with the expected signal rate.
> For a channel used for notification of just one signal value, a buffer
> of size 1 is sufficient.

[0] https://golang.org/pkg/os/signal/#Notify

Signed-off-by: Lucas Serven <lserven@gmail.com>
2018-11-14 10:33:28 +01:00
Frederic Branczyk bda9781ccd
Merge pull request #3839 from brancz/remove-old-alert-record
promql: Remove old and unused alerting/recording syntax
2018-11-06 15:53:27 +01:00
Simon Pasquier a30348f1a4 discovery: add config label to discovered targets metric (#4753)
* discovery: add labels to discovered targets metric

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-10-18 16:46:59 +01:00
Callum Styan 9bca041285 WIP: keep track of samples per query, set a max # of samples (#4513)
* keep track of samples per query, set a max # of samples that can be in
memory at once

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2018-10-02 12:59:19 +01:00
Tom Wilkie 4c52400708
Limit concurrent remote reads. (#4656)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-09-25 20:07:34 +01:00
Ganesh Vernekar 5790d23fd8 Unit testing for rules (#4350)
* Unit testing for rules
* Specifying order of group evaluation in unit tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 17:06:26 +01:00
Tom Wilkie 457e4bb58e
Limit the number of samples remote read can return. (#4532)
* Limit the number of samples remote read can return.

- Return 413 entity too large.
- Limit can be set by a flag.  Allow 0 to mean no limit.
- Include limit in error message.
- Set default limit to 50M (* 16 bytes = 800MB).
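
The arithmetic and the "0 means no limit" rule from the list above can be sketched as follows. The names `bytesPerSample`, `estimateBytes` and `exceedsLimit` are hypothetical and are not taken from the Prometheus code base; the 16 bytes per sample figure comes from the commit message.

```go
package main

import "fmt"

// bytesPerSample is the per-sample size assumed in the commit message.
const bytesPerSample = 16

// estimateBytes returns the approximate memory needed to hold the given
// number of samples.
func estimateBytes(samples int64) int64 {
	return samples * bytesPerSample
}

// exceedsLimit reports whether count is over limit. A limit of 0 disables
// the check, matching "Allow 0 to mean no limit".
func exceedsLimit(count, limit int64) bool {
	return limit > 0 && count > limit
}

func main() {
	const defaultLimit = 50_000_000 // 50M samples

	fmt.Printf("%d samples is roughly %d MB\n", defaultLimit, estimateBytes(defaultLimit)/1_000_000)
	fmt.Println("over default limit:", exceedsLimit(defaultLimit+1, defaultLimit))
	fmt.Println("over limit when disabled:", exceedsLimit(defaultLimit+1, 0))
}
```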

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-09-05 15:50:50 +02:00
Chris Marchbanks 63ed9d1b70 Send EndsAt along with alerts (#4550)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-28 16:05:00 +01:00
Chris Marchbanks 87f1dad16d throttle resends of alerts to 1 minute by default (#4538)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-27 17:41:42 +01:00
Krasi Georgiev 12fe204ea6
move runtime debug funcs in own package (#4494)
To make local debugging with `go run` easier, moved all files into a
dedicated package `runtime`.
This allows running prometheus just by using `go run main.go` instead of
passing many files like `go run main.go limits_default.go ...`

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-08-22 13:41:11 +03:00
Simon Pasquier 08c2f50382
Merge pull request #4418 from simonpasquier/log-vm-limits
prometheus: log virtual memory limits
2018-08-07 16:27:46 +02:00
Frederic Branczyk b0b3e3dd74
promql: Remove old and unused alerting/recording syntax
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2018-08-07 15:14:06 +02:00
Dave Henderson 73a08f0045 promtool - Adding --step flag to 'query range' subcommand (#4454)
Signed-off-by: Dave Henderson <dhenderson@gmail.com>
2018-08-05 11:03:18 +02:00
Julius Volz 90521a65f8
Remove error return value from NotifyFunc() (#4459)
It's always nil and we also forgot to check it.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar f1db699dff Persist alert 'for' state across restarts (#4061)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Simon Pasquier a94450c288 Fix build for openbsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-31 14:41:30 +02:00
Simon Pasquier 141c188ae6 Enforce conversion for freebsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 14:58:56 +02:00
Simon Pasquier 208d21a393 Add comment and print units
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 10:26:58 +02:00
Simon Pasquier ba22b10113 prometheus: log virtual memory limits
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-25 15:51:27 +02:00
Daisy T a3376e8f36 add query labels command to promtool (#4346)
Signed-off-by: Daisy T <daisyts@gmx.com>
2018-07-18 16:27:28 +02:00
Julius Volz 95dfb1b1dd
Add missing import to promtool, fix build (#4395)
Sorry, I used GitHub's web-based merge-conflict-resolution editor on
https://github.com/prometheus/prometheus/pull/4308 and it didn't show me
test errors afterwards, but maybe they didn't run again or I should have
waited or something.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 10:26:45 +02:00
Shubheksha 125da3b812 promtool: add command for querying series (#4308)
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
2018-07-18 10:15:58 +02:00
Julius Volz 03aa3a3de8
main: Improve / clean up error messages (#4286)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 09:58:40 +02:00