Commit graph

427 commits

Author SHA1 Message Date
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Julien Pivotto 01e3bfcd1a
Add warnings about NFS (#7691)
* Add warnings about NFS

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 11:22:44 +02:00
Javier Palomo Almena b58a613443
Replace sync/atomic with uber-go/atomic (#7683)
* storage: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* cmd: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* scripts: Verify that we are not using restricted packages

It checks that we are not directly importing 'sync/atomic'.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Reorganise imports in blocks

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier/test: Apply PR suggestions

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* storage/remote: avoid storing references on newEntry

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Revert "scripts: Verify that we are not using restricted packages"

This reverts commit 278d32748e.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Group imports accordingly

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
2020-07-30 13:15:42 +05:30
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
chinhnc e05c19da5d
Display block duration in promtool list blocks command (#7653)
* Update tsdb.go

Added DURATION column to `tsdb list` command

Signed-off-by: soup <chicknsoupuds@gmail.com>

* Use time.Duration instead of hardcoded hour

Signed-off-by: soup <chicknsoupuds@gmail.com>
2020-07-24 19:01:20 +05:30
Ben Ye 50c261502e
add tsdb cmds into promtool (#6088)
Signed-off-by: yeya24 <yb532204897@gmail.com>

update tsdb cli in makefile and promu

Signed-off-by: yeya24 <yb532204897@gmail.com>

remove building tsdb bin

Signed-off-by: yeya24 <yb532204897@gmail.com>

remove useless func

Signed-off-by: yeya24 <yb532204897@gmail.com>

refactor analyzeBlock

Signed-off-by: yeya24 <yb532204897@gmail.com>

Fix Makefile

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-07-23 19:35:50 +01:00
Bartlomiej Plotka a0df8a383a
promql: Removed global and add ability to have better interval for subqueries if not specified (#7628)
* promql: Removed global and add ability to have better interval for subqueries if not specified

## Changes
* Refactored tests for better hints testing
* Added various TODO in places to enhance.
* Moved DefaultEvalInterval global to opts with func(rangeMillis int64) int64 function instead

Motivation: At Thanos we would love to have better control over the subqueries step/interval.
This is important to choose proper resolution. I think having proper step also does not harm for
Prometheus and remote read users. Especially on stateless querier we do not know evaluation interval
and in fact putting global can be wrong to assume for Prometheus even.

I think ideally we could try to have at least 3 samples within the range, the same
way Prometheus UI and Grafana assumes.

Anyway, this interface allows deciding this on a per-PromQL-user basis.

Open question: Is taking parent interval a smart move?

Motivation for removing global: I spent 1h fighting with:


=== RUN   TestEvaluations
    TestEvaluations: promql_test.go:31: unexpected error: error evaluating query "absent_over_time(rate(nonexistant[5m])[5m:])" (line 687): unexpected error: runtime error: integer divide by zero
--- FAIL: TestEvaluations (0.32s)
FAIL

At the end I found that this fails on most of the versions including this master if you run this test alone. If run together with many
other tests it passes. This is due to SetDefaultEvaluationInterval(1 * time.Minute)
in a test that is run before TestEvaluations. Thanks to globals (:

Let's fix it by dropping this global.
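
The move from a package-level default to an injected function can be sketched roughly as below. This is an illustration only, not the actual promql code: `EngineOpts`, `SubqueryIntervalFn`, `NewEngine` and `SubquerySteps` are hypothetical names, and the one-minute fallback is an assumption made so the divisor can never be zero.

```go
package main

import (
	"fmt"
	"time"
)

// EngineOpts is a hypothetical, trimmed-down options struct.
type EngineOpts struct {
	// SubqueryIntervalFn returns the step (in milliseconds) to use for a
	// subquery that does not specify one, given the subquery range.
	SubqueryIntervalFn func(rangeMillis int64) int64
}

// Engine keeps the interval function per instance instead of in a global,
// so one test cannot change the behaviour of another.
type Engine struct {
	intervalFn func(rangeMillis int64) int64
}

// NewEngine falls back to a fixed one-minute step (an assumption of this
// sketch) when no function is supplied.
func NewEngine(opts EngineOpts) *Engine {
	fn := opts.SubqueryIntervalFn
	if fn == nil {
		fn = func(int64) int64 { return time.Minute.Milliseconds() }
	}
	return &Engine{intervalFn: fn}
}

// SubquerySteps returns how many steps fit into the given range.
func (e *Engine) SubquerySteps(rangeMillis int64) int64 {
	return rangeMillis / e.intervalFn(rangeMillis)
}

func main() {
	fiveMin := (5 * time.Minute).Milliseconds()

	def := NewEngine(EngineOpts{})
	fmt.Println(def.SubquerySteps(fiveMin))

	// A caller that wants roughly three samples within the range.
	three := NewEngine(EngineOpts{SubqueryIntervalFn: func(r int64) int64 {
		if r < 3 {
			return 1
		}
		return r / 3
	}})
	fmt.Println(three.SubquerySteps(fiveMin))
}
```

Because each engine carries its own function, the result no longer depends on which other test happened to set a global first.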

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added issue links for TODOs.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Removed irrelevant changes.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-07-22 14:39:51 +01:00
Julien Pivotto b83cbacbdd
Rule manager: remove blocking channel in mail (#7631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:13:24 +02:00
Ben Ye e6ea798c32
promtool range query should exit when fail to parse time (#7505)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-07-16 23:53:04 +01:00
yeya24 797e48c1a3 support time range in promtool query labels
Updated prometheus/client_golang and json-iterator/go

Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-07-03 11:29:39 -04:00
Frederic Branczyk d17d88935c
rules: Use narrower interface for rule manager loading of for state (#7472)
To load ALERT_FOR_STATE only `storage.Queryable` interface is required,
so this patch uses this narrower interface to perform this.
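
The general idea of depending on the narrowest interface that suffices might look like the sketch below. The interface shapes here are simplified and hypothetical (the real `storage.Queryable` has a different signature); only the pattern is the point.

```go
package main

import "fmt"

// Queryable is a simplified, hypothetical read-only interface.
type Queryable interface {
	Query(series string) []float64
}

// Storage is the wider interface that also allows writes.
type Storage interface {
	Queryable
	Append(series string, v float64)
}

// memStorage is a toy in-memory implementation of Storage.
type memStorage struct {
	data map[string][]float64
}

func (m *memStorage) Query(series string) []float64 { return m.data[series] }

func (m *memStorage) Append(series string, v float64) {
	m.data[series] = append(m.data[series], v)
}

// restoreForState only reads, so it asks for Queryable rather than Storage.
// It returns the number of samples found for the given series.
func restoreForState(q Queryable, series string) int {
	return len(q.Query(series))
}

func main() {
	var s Storage = &memStorage{data: map[string][]float64{}}
	s.Append("for_state", 1)
	s.Append("for_state", 2)
	fmt.Println(restoreForState(s, "for_state"))
}
```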

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2020-06-26 19:06:36 +01:00
Bartlomiej Plotka b788986717
storage: Adjusted fully storage layer support for chunk iterators: Remote read client, readyStorage, fanout. (#7059)
* Fixed nits introduced by https://github.com/prometheus/prometheus/pull/7334
* Added ChunkQueryable implementation to fanout and readyStorage.
* Added more comments.
* Changed NewVerticalChunkSeriesMerger to CompactingChunkSeriesMerger, removed tiny interface by reusing VerticalSeriesMergeFunc for overlapping algorithm for
both chunks and series, for both querying and compacting (!) + made sure duplicates are merged.
* Added ErrChunkSeriesSet
* Added Samples interface for seamless []prompb.Sample to []tsdbutil.Sample conversion.
* Deprecating non chunks serieset based StreamChunkedReadResponses, added chunk one.
* Improved tests.
* Split remote client into Write (old storage) and read.
* Queryable client is now SampleAndChunkQueryable. Since we cannot use nice QueryableFunc I moved
all config based options to sampleAndChunkQueryableClient to avoid boilerplate.

In next commit: Changes for TSDB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-24 14:41:52 +01:00
Harkishen Singh 70b0a34616
Exit early on invalid config file (#7399)
* Reload config file at start

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* relocated config checking

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* change log level

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* add helpful comment

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2020-06-21 21:26:59 +05:30
Ben Kochie 8d3c2f6829
Enable WAL compression by default (#7410)
Enable the `--storage.tsdb.wal-compression` flag by default.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-18 17:59:40 +01:00
Jordan Neufeld 268b4c29e1
Support extended durations in promtool unit tests (Fixes #6285) (#6297)
* Fixed evaluation_time duration parsing in promtool unit tests (Fixes #6285)

Signed-off-by: Jordan Neufeld <jordan@neufeldtech.com>
2020-06-15 16:03:07 +01:00
Arthur Silva Sens 7727b9012e
Correction of misleading help text(#5142) (#7231)
* Correction of misleading help text(#5142)

Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-05-11 12:15:01 +01:00
Julien Pivotto 9e265aba10
Merge pull request #7225 from prometheus/release-2.18
[Merge without Squash] Merge release-2.18 back to master for 2.18.1 fixes.
2020-05-07 21:23:59 +02:00
Hongcai Ren c7e82274c6
replace github.com/prometheus/prometheus/testutil/promlint by github.com/prometheus/client_golang/prometheus/testutil/promlint from our codebase (#7209)
Signed-off-by: RainbowMango <renhongcai@huawei.com>
2020-05-07 11:34:39 +01:00
Julien Pivotto 645b71e9ef
Fix snapshots (#7217)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-07 10:03:48 +01:00
Ganesh Vernekar d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory

Prom startup now happens in these stages
- Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series.

If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss.

[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md)  - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.
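
The two startup stages described above could be sketched roughly like this. All type and function names (`chunkRef`, `mmappedChunk`, `indexChunks`, `replaySeries`) are hypothetical and the on-disk format is reduced to a file number and an offset; this is not the TSDB implementation.

```go
package main

import "fmt"

// chunkRef locates a chunk on disk (hypothetical layout: file number + offset).
type chunkRef struct {
	File   int
	Offset int64
}

// mmappedChunk is what the first pass sees: the chunk header carries the
// series reference because there is no index mapping series to chunks.
type mmappedChunk struct {
	SeriesRef uint64
	Ref       chunkRef
}

// memSeries is an in-memory series holding the offsets of its chunks.
type memSeries struct {
	Ref    uint64
	Chunks []chunkRef
}

// indexChunks is stage 1: iterate all m-mapped chunks and group them by series.
func indexChunks(chunks []mmappedChunk) map[uint64][]chunkRef {
	idx := make(map[uint64][]chunkRef)
	for _, c := range chunks {
		idx[c.SeriesRef] = append(idx[c.SeriesRef], c.Ref)
	}
	return idx
}

// replaySeries is stage 2: for every series created during WAL replay,
// attach the chunks found for it in stage 1.
func replaySeries(walSeries []uint64, idx map[uint64][]chunkRef) map[uint64]*memSeries {
	head := make(map[uint64]*memSeries)
	for _, ref := range walSeries {
		if _, ok := head[ref]; ok {
			continue // series already created earlier in the replay
		}
		head[ref] = &memSeries{Ref: ref, Chunks: idx[ref]}
	}
	return head
}

func main() {
	onDisk := []mmappedChunk{
		{SeriesRef: 1, Ref: chunkRef{File: 1, Offset: 200}},
		{SeriesRef: 1, Ref: chunkRef{File: 1, Offset: 500}},
	}
	head := replaySeries([]uint64{1, 2}, indexChunks(onDisk))
	fmt.Println(len(head[1].Chunks), len(head[2].Chunks))
}
```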

**Prombench results**

_WAL Replay_

1h WAL replay time
30% less WAL replay time - 4m31 vs 3m36
2h WAL replay time
20% less WAL replay time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ben Ye 1e4e37144d
Fixed wrongly handled not ready TSDB on web and API. (#7182)
* fix federate endpoint panic

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Fixed all cases of not ready TSDB being wrongly handled.

* Fixed issue for federation.
* Ensured this will never happen again thanks to interfaces
* Fixes same issue for stats.
* Added tests for readiness.
* Fixed bug in stats. It was:
   status.MaxTime = db.Head().MaxTime()
   status.MinTime = db.Head().MaxTime()


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 17:16:14 +01:00
Vasily Sliouniaev 0393b188c9
Add Jaeger (#7148)
* Trace remote read

Signed-off-by: vas <vasily.sliouniaev@jet.com>

* Use jaeger

Signed-off-by: vas <vasily.sliouniaev@jet.com>
2020-04-23 02:05:55 +02:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Brian Brazil 7646cbca32
Use .UTC everywhere we use time.Unix (#7066)
time.Unix attaches the local timezone, which can then
leak out (e.g. in the alert json). While this is harmless,
we should be consistent.
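
A small sketch of the convention, using only the standard library; `unixUTC` is a hypothetical helper name introduced for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// unixUTC converts Unix seconds to a time.Time pinned to UTC, so the local
// timezone of the machine never shows up in formatted output.
func unixUTC(sec int64) time.Time {
	return time.Unix(sec, 0).UTC()
}

func main() {
	local := time.Unix(0, 0) // carries the local timezone
	utc := unixUTC(0)

	// Both values denote the same instant; only the attached location differs.
	fmt.Println(local.Equal(utc))
	fmt.Println(utc.Format(time.RFC3339))
}
```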

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-29 17:35:39 +01:00
Ben Kochie 269e7c8091
Fix golint issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-23 20:38:43 +01:00
johncming bbacd2dd09
remove needless break. (#7008)
Signed-off-by: johncming <johncming@yahoo.com>
2020-03-19 11:21:00 +00:00
李国忠 52025bd7a9
[comments] change word ‘wheter’ to ‘whether’ (#6912)
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-02 13:51:24 +05:30
Tobias Guggenmos 4835bbf376
Merge branch 'master' into split_parser
2020-02-19 15:18:13 +01:00
Bartlomiej Plotka 48ead578a0 Moved tsdbconfig to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-18 11:25:36 +00:00
Bartlomiej Plotka a20bebf7eb Moved readyStorage to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 8a775bc468 Moved unit agnostic options to separate pkg.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 59c9d6ef45 Addressed Brian's comments, moved metrics to main.go
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka cfba92a133 Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only an unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Tobias Guggenmos 454ba12676 Fix build errors in promtool
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Björn Rabenstein af04cb22c8
Merge pull request #6821 from prometheus/release-2.16
Release 2.16
2020-02-14 13:10:14 +01:00
Julien Pivotto ff0003e072
Make lookbackDelta an option of QueryEngine (#6746)
* Make lookbackDelta an option of QueryEngine

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* julius' suggestion

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* remove trivial getter

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Assume lookback delta is always > 0

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* add debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* don't expose lookback delta

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Specify that lookback delta is also used in federation

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix federation test

While we have added some logic to the promql engine to keep it backwards
compatible and have a 5 minute lookback by default, the web/ package is
likely to really be internal to Prometheus and we should not add the
same kind of heuristics here.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* lookback delta: Fix debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-10 00:58:23 +01:00
Julien Pivotto d799078c88 also test start and end
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-08 16:42:50 +01:00
Julien Pivotto 881dde505a promql: fix promql query log step unit
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-08 16:26:56 +01:00
Julien Pivotto 3c4c01eae2
Fix race in Query Log Test (#6727)
A data race can happen if we run t.Log after the test t is done -- which
in this case is highly possible because of the use of subtests and the
fact that we call t.Log in a goroutine.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-30 13:51:18 -08:00
Julien Pivotto 9adad8ad30 Remove MaxConcurrent from the PromQL engine opts (#6712)
Since we use ActiveQueryTracker to check for concurrency in
d992c36b3a it does not make sense to keep
the MaxConcurrent value as an option of the PromQL engine.

This pull request removes it from the PromQL engine options, sets the
max concurrent metric to -1 if there is no active query tracker, and use
the value of the active query tracker otherwise.
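
As a rough sketch of that rule: `queryTracker`, `fixedTracker` and `maxConcurrentGauge` are hypothetical names invented for this example and do not mirror the real ActiveQueryTracker API.

```go
package main

import "fmt"

// queryTracker is a hypothetical stand-in for an active query tracker.
type queryTracker interface {
	MaxConcurrent() int
}

// fixedTracker is a toy tracker with a constant limit.
type fixedTracker int

func (f fixedTracker) MaxConcurrent() int { return int(f) }

// maxConcurrentGauge returns the value to expose in the metric:
// -1 when there is no tracker, the tracker's limit otherwise.
func maxConcurrentGauge(t queryTracker) float64 {
	if t == nil {
		return -1
	}
	return float64(t.MaxConcurrent())
}

func main() {
	fmt.Println(maxConcurrentGauge(nil))
	fmt.Println(maxConcurrentGauge(fixedTracker(20)))
}
```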

It removes dead code and also will inform people who import the promql
package that we made that change, as it breaks the EngineOpts struct.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-28 20:38:49 +00:00
Julien Pivotto 5f27ac3583 Refactor query log fields (#6694)
* Refactor query log fields

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 09:53:10 +00:00
Julien Pivotto 2b2eb79e8b Add windows tests for query logger (#6653)
* Add windows tests
* Do not rely on time.Time in timer

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-20 13:17:11 +00:00
Julien Pivotto 0eb34299da End-to-end Query Log test (#6600)
* End-to-end Query Log test

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-19 21:56:13 +00:00
Julien Pivotto 1a58d2657d Removed compilation step inside main_test (#6658)
Inspired by https://github.com/prometheus/prometheus/pull/6347 and
https://github.com/prometheus/prometheus/pull/6347#issuecomment-570151979

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-19 07:14:25 +00:00
Harkishen Singh 84e6459c4d Adds support for line-column numbers for invalid rules, promtool (#6533)
Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
2020-01-15 18:07:54 +00:00
Julien Pivotto 3885562587 Query Logging styling (#6594)
- Fix Json vs JSON in activequerylogger
- Fix SetQueryLogger always returns nil

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-09 21:11:39 +00:00
Julien Pivotto 9d9bc524e5 Add query log (#6520)
* Add query log, make stats logged in JSON like in the API

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-08 13:28:43 +00:00
Simon Pasquier cccd542891
*: avoid missed Alertmanager targets (#6455)
This change makes sure that nearly-identical Alertmanager configurations
aren't merged together.

The config's identifier was the MD5 hash of the configuration serialized
to JSON but because `relabel.Regexp` has no public field and doesn't
implement the JSON.Marshaler interface, it was always serialized to
"{}".

In practice, the identifier can be based on the index of the
configuration in the list.
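
The failure mode can be reproduced with a toy type. In this sketch `opaqueRegexp` and `amConfig` are hypothetical stand-ins (not the real `relabel.Regexp` or Alertmanager config types); the point is only that a struct with no exported fields and no custom marshaler serializes to `{}`.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// opaqueRegexp has no exported fields and no MarshalJSON method, so
// encoding/json renders it as "{}" regardless of its contents.
type opaqueRegexp struct {
	pattern string
}

// amConfig is a hypothetical configuration that differs only in its regexp.
type amConfig struct {
	Regex opaqueRegexp `json:"regex"`
}

// hashID derives an identifier from the JSON form of the config.
func hashID(c amConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:])
}

// indexID derives an identifier from the position in the list instead.
func indexID(i int) string {
	return fmt.Sprintf("config-%d", i)
}

func main() {
	a := amConfig{Regex: opaqueRegexp{pattern: "foo.*"}}
	b := amConfig{Regex: opaqueRegexp{pattern: "bar.*"}}

	fmt.Println(hashID(a) == hashID(b)) // the two configs collide
	fmt.Println(indexID(0) == indexID(1))
}
```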

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 17:00:19 +01:00
Brooks Swinnerton 0ea3a2218d Add time units to storage.tsdb.retention.time flag (#6365)
* Add time units to storage.tsdb.retention.time flag

In an effort to reduce confusion with the `m` option of the
`ParseDuration()` function, this commit adds the available time units to
the `storage.tsdb.retention.time` flag to help showcase that there is no
option for months (which could be assumed to be `m`).

If someone were looking to set the retention to six months, they may
mistakenly do so with `6m`, which would reduce their retention to six
minutes.
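
The pitfall is easy to demonstrate. The sketch below uses Go's standard `time.ParseDuration` as a stand-in; the parser behind the flag may accept a different set of units, so treat this purely as an illustration of `m` meaning minutes. `mustParse` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse parses a duration string and panics on error (fine for a demo).
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

func main() {
	// "m" is minutes, not months.
	fmt.Println(mustParse("6m")) // 6m0s

	// Roughly six months has to be spelled out in a supported unit,
	// here 6 * 30 * 24 hours.
	fmt.Println(mustParse("4320h")) // 4320h0m0s
}
```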

Signed-off-by: Brooks Swinnerton <bswinnerton@gmail.com>
2019-11-30 08:00:51 +00:00
johncming ad4bc5701e remove unwanted break (#6338)
Signed-off-by: johncming <johncming@yahoo.com>
2019-11-18 23:01:03 -08:00
akerele abraham 9d39fdad0c unittest: check for rule files existence (#6075)
Signed-off-by: akerele abraham <abrahamakerele38@gmail.com>
2019-11-18 13:54:52 -08:00
Chris Marchbanks 1d1f64b4bc
Fix Promtool showing false duplicate rule warnings (#6270)
Alert rules do not use the Record field, so any alerts with the same
labels and different names would be counted as being duplicates.
Promtool will now consider either field when finding duplicates.
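
One way to picture the fix, as a sketch with hypothetical types (`rule`, `ruleKey`, `duplicates`) rather than promtool's actual code: the duplicate key is built from whichever name field is set, plus the sorted labels.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rule is a simplified rule: recording rules set Record, alerting rules set Alert.
type rule struct {
	Record string
	Alert  string
	Labels map[string]string
}

// ruleKey builds a comparison key from the rule name (either field) and labels.
func ruleKey(r rule) string {
	name := r.Record
	if name == "" {
		name = r.Alert
	}
	pairs := make([]string, 0, len(r.Labels))
	for k, v := range r.Labels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs) // map iteration order is random, so sort for stability
	return name + "{" + strings.Join(pairs, ",") + "}"
}

// duplicates returns the key of every rule that appears more than once.
func duplicates(rules []rule) []string {
	seen := make(map[string]int)
	var dups []string
	for _, r := range rules {
		k := ruleKey(r)
		seen[k]++
		if seen[k] == 2 {
			dups = append(dups, k)
		}
	}
	return dups
}

func main() {
	rules := []rule{
		{Alert: "HighLatency", Labels: map[string]string{"severity": "page"}},
		{Alert: "HighErrors", Labels: map[string]string{"severity": "page"}},
	}
	fmt.Println(len(duplicates(rules))) // different alert names: no duplicates
}
```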

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-11-05 11:22:31 -07:00
Simon Pasquier ddff1480a7
cmd/promtool: improve output for PromQL tests (#6052)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-09-25 09:26:29 +02:00
Harkishen Singh e097c70e6d add checks for metrics and display duplicate fields (#6026)
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2019-09-20 11:29:47 +01:00
Simon Pasquier 06066a3619
*: improve error messages when parsing bad rules (#5965)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-08-28 17:36:48 +02:00
Sayan Chowdhury cb66e325d8 Show the warnings during label query (#5924)
This patch loops through the warnings while querying the label and spits the
output to stderr

Fixes #5885

Signed-off-by: Sayan Chowdhury <sayan.chowdhury2012@gmail.com>
2019-08-24 19:42:21 +02:00
Bartek Płotka 48b2c9c8ea
remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksummed reader (#5703)
Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488

Changes:
* Extended protobuf for chunked remote read and negotiation.
* Added checksummed, chunked Writer/Reader.
* Added Server side implementation for chunked streamed remote-read.


Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-19 21:16:10 +01:00
Bartek Płotka 5cb32d67f9
Merge pull request #5893 from prometheus/unify-tsdbutil
Removed extra tsdb/testutil after merge.
2019-08-15 12:07:59 +01:00
Bartek Plotka f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Julius Volz b5c833ca21
Update go.mod dependencies before release (#5883)
* Update go.mod dependencies before release

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Add issue for showing query warnings in promtool

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Revert json-iterator back to 1.1.6

It produced errors when marshaling Point values with special float
values.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Fix expected step values in promtool tests after client_golang update

Signed-off-by: Julius Volz <julius.volz@gmail.com>

* Update generated protobuf code after proto dep updates

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2019-08-14 11:00:39 +02:00
Advait Bhatwadekar 5d401f1e1b Added query logging for prometheus. Issue #1315 (#5794)
* Added query logging for prometheus.
Options added:
1) active.queries.filepath: Filename where queries will be recorded
2) active.queries.filesize: Size of the file where queries will be recorded.

Functionality added:
All active queries are now logged in a file. If prometheus crashes unexpectedly, these queries are also printed out on stdout in the rerun.

Queries are written concurrently to an mmapped file, and removed once they are done. Their positions in the file are reused. They are written in JSON format. However, due to the dynamic nature of the application, the JSON has an extra comma after the last query, and is missing an ending ']'. There may also be null bytes in the tail of the file.
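
A reader of such a file has to repair the tail before parsing. The following is a minimal sketch under stated assumptions: the entry layout (a single `query` field) and the names `entry` and `parseActiveQueries` are hypothetical, and only the trailing comma, missing `]` and trailing null bytes mentioned above are handled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// entry is a hypothetical shape for one logged query.
type entry struct {
	Query string `json:"query"`
}

// parseActiveQueries trims trailing null bytes, whitespace and the dangling
// comma, appends the missing ']' and then decodes the result as a JSON array.
func parseActiveQueries(raw []byte) ([]entry, error) {
	trimmed := bytes.TrimRight(raw, "\x00, \t\r\n")

	// Copy before appending so the caller's buffer is never modified.
	buf := make([]byte, 0, len(trimmed)+1)
	buf = append(buf, trimmed...)
	buf = append(buf, ']')

	var out []entry
	if err := json.Unmarshal(buf, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	raw := []byte("[{\"query\":\"up\"},{\"query\":\"rate(x[5m])\"},\x00\x00\x00")
	queries, err := parseActiveQueries(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, q := range queries {
		fmt.Println(q.Query)
	}
}
```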

Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>
2019-07-31 16:12:43 +01:00
Simon Pasquier 75886e0464 cmd/promtool: fix panic with empty exp_labels
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-07-17 17:02:31 +02:00
Chris Marchbanks 06f1ba73eb
Provide flag to compress the tsdb WAL
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-07-03 08:03:29 -06:00
Tom Wilkie 851131b074
Allow injection of arbitrary headers in promtool, for auth etc. (#4389)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-06-30 11:50:23 +01:00
Simon Pasquier be67b8d460
web: fix flaky TestHTTPMetrics() (#5695)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-24 15:48:15 +02:00
Björn Rabenstein dc22f74153
Merge pull request #5608 from simonpasquier/external-labels-for-alert-tests
cmd/promtool: add $externalLabels for alert unit tests
2019-06-20 16:48:12 +02:00
Björn Rabenstein 372b3438e5 Update prometheus/client_golang to v1.0.0 (#5682)
Signed-off-by: beorn7 <beorn@grafana.com>
2019-06-17 19:14:36 +01:00
Keenan Romain 55f3a9fe4a Allows globs for rules when unit testing (#5595)
* Includes glob support when unit testing rule_files. 

Signed-off-by: Keenan Romain <Keenan.Romain@mailchimp.com>
2019-06-12 11:31:07 +01:00
Simon Pasquier 74ff35ccdd cmd/promtool: add $externalLabels for alert unit tests
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-29 16:40:01 +02:00
beorn7 aff4738f33 Adjust TestQueryRange to new Prometheus API client
Signed-off-by: beorn7 <bjoern@rabenste.in>
2019-05-17 18:09:47 +02:00
Lee Gaines f4486815c1 logs filesystem type on startup (#5558)
Signed-off-by: Lee Gaines <leetgaines@gmail.com>
2019-05-17 10:16:16 +01:00
Björn Rabenstein 0a34399611 Fix minor punctuation and language issues in flag doc strings (#5568)
This is mostly to create consistency, not because the one or the other
way would be wrong. A few actual corrections are also included.

Signed-off-by: beorn7 <bjoern@rabenste.in>
2019-05-15 16:59:06 +02:00
Simon Pasquier 45506841e6
*: enable all default linters (#5504)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 15:11:28 +02:00
Simon Pasquier 9c69eec82a cmd/promtool: use log.NewNopLogger() (#5531)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-05-03 10:00:07 +01:00
Frederic Branczyk c790d7658c
Merge pull request #5491 from metalmatze/rungroup
Use github.com/oklog/run not archived oklog/oklog
2019-04-29 16:22:16 +02:00
Björn Rabenstein 0be9388f8d
Merge pull request #5463 from prometheus/beorn7/templating
Follow-up on #5009
2019-04-24 16:42:23 +02:00
Simon Pasquier abc1994bec
cmd/promtool: return errors from rule evaluations (#5483)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-23 09:59:03 +02:00
Matthias Loibl 388caa06ac
Use github.com/oklog/run not archived oklog/oklog
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2019-04-19 14:55:28 +02:00
Bjoern Rabenstein 38d518c0fe Rework #5009 after comments
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Bjoern Rabenstein a92ef68dd8 Fix staticcheck errors
Not sure why they only show up now.

Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Sylvain Rabot 335a34486e Add external labels to template expansion
This affects the expansion of templates in alert labels and
annotations and console templates.

Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2019-04-17 01:40:10 +02:00
Simon Pasquier e5dbac7972 cmd/prometheus: group flags properly (#5419)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-10 13:22:05 +01:00
David Symonds 7a60e22c2d cmd/promtool: resolve relative paths in alert test files (#5336)
Like `promtool check config <path/to/foo.yaml>`, which resolves relative
paths inside foo.yaml to be relative to `path/to`, this now makes
`promtool test rules <path/to/test.yaml>` do the same thing.
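
The resolution rule amounts to joining relative paths with the directory of the file that references them. A minimal sketch, where `resolveRelative` is a hypothetical helper rather than promtool's own function:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveRelative interprets p relative to the directory containing baseFile.
// Absolute paths are returned unchanged.
func resolveRelative(baseFile, p string) string {
	if filepath.IsAbs(p) {
		return p
	}
	return filepath.Join(filepath.Dir(baseFile), p)
}

func main() {
	// A rule file named inside path/to/test.yaml is looked up next to it.
	fmt.Println(resolveRelative("path/to/test.yaml", "rules/alerts.yml"))

	// An absolute path is left alone.
	fmt.Println(resolveRelative("path/to/test.yaml", "/etc/prometheus/alerts.yml"))
}
```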

Signed-off-by: David Symonds <dsymonds@gmail.com>
2019-03-27 10:27:26 +01:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
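
For readers unfamiliar with the pattern, here is a rough standard-library analogue. Note the swap: the commit uses `Wrap`/`Wrapf` from pkg/errors, whereas this sketch uses `errors.New` and `fmt.Errorf` with `%w` so that it runs without third-party imports. `errNoRuleFile`, `loadRules` and `isNoRuleFile` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// A message without formatting directives needs errors.New, not fmt.Errorf.
var errNoRuleFile = errors.New("rule file not found")

// loadRules always fails in this sketch; it adds context while keeping the
// original error reachable for callers (stdlib stand-in for errors.Wrapf).
func loadRules(path string) error {
	return fmt.Errorf("loading rules from %q: %w", path, errNoRuleFile)
}

// isNoRuleFile checks the cause through any number of wrapping layers.
func isNoRuleFile(err error) bool {
	return errors.Is(err, errNoRuleFile)
}

func main() {
	err := loadRules("alerts.yml")
	fmt.Println(err)
	fmt.Println(isNoRuleFile(err))
}
```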

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Brian Brazil 0a87dcd416
cmd: Warn rather than Info when retention time wraps (#5403)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-03-25 18:06:38 +00:00
Krasi Georgiev 9d96ada510 Display correct values for the retention in the flags web gui. (#5322)
* Display correct values for the retention in the flags web gui.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* adding a log entry

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* added the retention info to the runtime status page

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* simplify the retention display

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-03-11 22:48:57 +05:30
Krasi Georgiev 1684dc750a
updated tsdb to 0.6.0 (#5292)
* updated tsdb to 0.6.0

as part of the update also added the new storage.tsdb.allow-overlapping-blocks flag and mark it as experimental.
2019-03-04 21:42:45 +02:00
Simon Pasquier c8a1a5a93c
discovery/kubernetes: fix support for password_file and bearer_token_file (#5211)
* discovery/kubernetes: fix support for password_file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Create and pass custom RoundTripper to Kubernetes client

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use inline HTTPClientConfig

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 11:22:34 +01:00
Krasi Georgiev a3c41f4256
use the default time retention value only when no size retention is set (#5216)
fixes https://github.com/prometheus/prometheus/issues/5213

Now that we have both time and size based retention, the time based one should not have a default value of its own. A default is set only when neither the time flag nor the size flag is set.

This change will not affect current installations that rely on the default time based value, and will avoid confusion when only the size retention is set and it is expected that the default time based setting would no longer be in place.
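
The rule can be written down in a few lines. In this sketch a zero value stands for "flag not set", and both `effectiveRetention` and the 15-day `defaultRetention` are assumptions made for illustration, not values taken from the commit.

```go
package main

import (
	"fmt"
	"time"
)

// defaultRetention is an assumed default for this sketch.
const defaultRetention = 15 * 24 * time.Hour

// effectiveRetention applies the time default only when neither the time
// flag nor the size flag was set (zero means "not set").
func effectiveRetention(timeFlag time.Duration, sizeFlagBytes int64) time.Duration {
	if timeFlag == 0 && sizeFlagBytes == 0 {
		return defaultRetention
	}
	return timeFlag
}

func main() {
	fmt.Println(effectiveRetention(0, 0))            // nothing set: default applies
	fmt.Println(effectiveRetention(0, 512<<20))      // size only: no time limit
	fmt.Println(effectiveRetention(48*time.Hour, 0)) // time only: as given
}
```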

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:53:43 +02:00
Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to achieve parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics
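
The back-pressure behaviour of a blocking enqueue can be illustrated with a bounded channel. This is a generic sketch, not the remote_write queue manager; `sample`, `queue`, `newQueue`, `Enqueue` and `Full` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// sample is a simplified data point.
type sample struct {
	Value float64
}

// queue is a bounded buffer between the producer and a sender.
type queue struct {
	ch chan sample
}

func newQueue(capacity int) *queue {
	return &queue{ch: make(chan sample, capacity)}
}

// Enqueue blocks while the queue is full, which slows the producer down
// to the pace of the consumer instead of dropping samples.
func (q *queue) Enqueue(s sample) {
	q.ch <- s
}

// Full reports whether the next Enqueue would block.
func (q *queue) Full() bool {
	return len(q.ch) == cap(q.ch)
}

func main() {
	q := newQueue(2)

	var wg sync.WaitGroup
	wg.Add(1)
	sent := 0
	go func() {
		defer wg.Done()
		for range q.ch { // the sender drains the queue
			sent++
		}
	}()

	for i := 0; i < 10; i++ {
		q.Enqueue(sample{Value: float64(i)}) // may block until there is room
	}
	close(q.ch)
	wg.Wait()
	fmt.Println(sent)
}
```

With a small capacity the producer spends time waiting rather than buffering, which is the trade-off that makes much smaller queue settings workable.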

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Brian Brazil 1dd57765b4
Reduce time that alertmanagers are in flux when reloaded. (#5126)
This no longer waits for all of the scrape reload to complete
before getting a list of AMs again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-28 18:34:12 +00:00
Goutham Veeramachaneni 4068968e12
Protect retention from overflowing (#5112)
Also sanitise the max block duration to max a month.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni 384cba1211
Add flag for size based retention (#5109)
* Add flag for size based retention

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Deprecate the old retention flag for a new one.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add ability to take a suffix for size flag

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Address feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 19:18:36 +05:30
Hrishikesh Barman a1f34bec2e Added CORS Origin flag (#5011)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-17 15:01:06 +00:00
Matt Layher 302148fd69 *: apply gofmt -s
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Ryan Leung 45c8b084c6 fix TestFailedStartupExitCode (#5076)
Signed-off-by: rleungx <rleungx@gmail.com>
2019-01-16 10:13:36 +01:00
Lv Jiawei b8ede99767 Fix comment typo (#5087)
According to code, I think it is a typo.

Signed-off-by: MIBc <lvjiawei@cmss.chinamobile.com>
2019-01-09 10:56:47 +00:00
Frederic Branczyk e9ae0b5a1b
Merge pull request #4927 from tariq1890/update_k8s
update client-go to v10.0.0 and other k8s deps to v1.13.1
2019-01-07 10:54:34 +01:00
Simon Pasquier f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00