Commit graph

493 commits

Author SHA1 Message Date
Ganesh Vernekar 64bea6999e
HistogramAppender interface for sparse histograms (#9007)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2021-06-28 20:30:55 +05:30
Levi Harrison b5f6f8fb36 Switched to go-kit/log
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-11 12:28:36 -04:00
Levi Harrison 7bc11dcb06
React UI: Add Starting Screen (#8662)
* Added walreplay API endpoint

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added starting page to react-ui

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Documented the new endpoint

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed typos

Signed-off-by: Levi Harrison <git@leviharrison.dev>

Co-authored-by: Julius Volz <julius.volz@gmail.com>

* Removed logo

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed isResponding to isUnexpected

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed width of progress bar

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed width of progress bar

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added DB stats object

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Updated starting page to work with new fields

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 2)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 3)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (and also implementing a method this time) (pt. 4)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (and also implementing a method this time) (pt. 5)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed const to let

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Passing nil (pt. 6)

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Remove SetStats method

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added comma

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed api

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed to triple equals

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed data response types

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Don't return pointer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Changed version

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed interface issue

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed pointer

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed copying lock value error

Signed-off-by: Levi Harrison <git@leviharrison.dev>

Co-authored-by: Julius Volz <julius.volz@gmail.com>
2021-06-05 15:29:32 +01:00
Levi Harrison 17ea8d006a
Added external URL access
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-05-30 23:35:26 -04:00
Bartlomiej Plotka 80545bfb2e
Instrumented circular exemplar storage. (#8712)
* Instrumented circular storage.

Fixes: https://github.com/prometheus/prometheus/issues/8708
Fixes: https://github.com/prometheus/prometheus/issues/8707

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed CB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Julien comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Callum comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-04-16 13:44:53 +01:00
nberkley f9e2dd0697
Add support for smaller block chunk segment allocations (#8478)
* Add support for --storage.tsdb.max-chunk-size to suport small chunks for space limited prometheus instances.

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/compact.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update tsdb/db.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update cmd/prometheus/main.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Change naming scheme to

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Add a lower bound to --storage.tsdb.max-block-chunk-segment-size

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Update storage.md to explain what a chunk segment is

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Apply suggestions from code review

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Force tests

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

* Fix code style

Signed-off-by: Nathan Berkley <nberkley@tripadvisor.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2021-04-15 14:25:01 +05:30
Julien Pivotto e635ca834b Add environment variable expansion in external label values
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-30 01:36:28 +02:00
Julien Pivotto c0c36b1155
Improve promql-negative-offset docs (#8631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-22 10:16:43 +01:00
Callum Styan 289ba11b79
Add circular in-memory exemplars storage (#6635)
* Add circular in-memory exemplars storage

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Martin Disibio <mdisibio@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>

* Fix some comments, clean up exemplar metrics struct and exemplar tests.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix exemplar query api null vs empty array issue.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-03-16 15:17:45 +05:30
Julien Pivotto ad5ed416ba
Merge pull request #8487 from pschou/dev_neg_offset
allow negative offset
2021-03-08 22:18:45 +01:00
Arthur Silva Sens 537c0aff49
Prometheus and Promtool binaries now print help and usage to stdout (#8542)
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-02-25 19:52:34 +01:00
schou 22cd48868a adding feature flag, promql-negative-offset
Signed-off-by: schou <pschou@users.noreply.github.com>
2021-02-23 20:25:56 -05:00
Tom Wilkie 7369561305
Combine Appender.Add and AddFast into a single Append method. (#8489)
This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends.

This makes the API easier to consume and implement.  In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-18 17:37:00 +05:30
Tom Wilkie d479151f1f Various enhancements and refactorings for remote write receiver:
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write.  This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series.  For now, its easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behing the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2021-02-08 20:41:23 +00:00
fuling 72475b8a0c [ENHANCEMENT] remote storage:Add default api implementation of remote write
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2021-02-07 18:12:48 +00:00
Julien Pivotto 2316062d4e Deprecate --alertmanager.timeout
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-01-25 12:36:13 +01:00
Ganesh Vernekar 9199fcb8d1
'@ <timestamp>' modifier (#8121)
This commit adds `@ <timestamp>` modifier as per this design doc: https://docs.google.com/document/d/1uSbD3T2beM-iX4-Hp7V074bzBRiRNlqUdcWP6JTDQSs/edit.

An example query:

```
rate(process_cpu_seconds_total[1m]) 
  and
topk(7, rate(process_cpu_seconds_total[1h] @ 1234))
```

which ranks based on last 1h rate and w.r.t. unix timestamp 1234 but actually plots the 1m rate.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2021-01-20 16:27:39 +05:30
Julien Pivotto ac2626757c Update exporter-toolkit to 0.5.0
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-01-13 21:49:54 +01:00
Julien Pivotto 5b4f46a348 Add TLS and basic authentication
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-12-28 21:33:44 +01:00
Ben Kochie 5055dfbbe4 Listen on web early in startup
Avoid starting up components like the TSDB if we can't bind
to the web listening port.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-12-28 20:13:05 +01:00
lzhfromustc 27a6e1e174
test: add buffer to channel to avoid goroutine leak (#8274)
Signed-off-by: lzhfromustc <lzhfromustc@gmail.com>
2020-12-10 09:09:21 +00:00
gotjosh 4eca4dffb8
Allow metric metadata to be propagated via Remote Write. (#6815)
* Introduce a metadata watcher

Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage.

Signed-off-by: gotjosh <josue@grafana.com>

* Additional fixes after rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Rework samples/metadata metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Use more descriptive variable names in MetadataWatcher collect.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues caused during rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing metric add and unneeded config code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix metrics and docs

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Replace assert with require

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Bring back max_samples_per_send metric

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-11-19 20:53:03 +05:30
Julien Pivotto 6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto 1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto 4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
beorn7 0f3c1bf6cf Report valid configs in the respective metrics from the beginning
In #7399, an early validity check of the config was introduced to
prevent the scenario where an invalid config is only detected after a
possibly very long startup procedure. However, the respective success
metrics are not updated after the initial validation so that the
success metrics suggest an invalid config. If the startup procedure,
like replaying the WAL, really takes very long, alerts about invalid
config will trigger.

This commit sets the succes metrics after initial validation. They
will be set again after the "real" config (re-)load, but that
shouldn't be a problem. The metric now truthfully represents whenever
the config was successfully loaded, no matter if the result was then
thrown away (because it was just for validation) or actually used.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-10-12 21:30:59 +02:00
Frederic Branczyk da3ea43242
Merge pull request #7976 from roidelapluie/tolerance
Introduce timestamp tolerance in scrapes
2020-10-08 09:21:19 +02:00
Julien Pivotto be5ba1a62d Fix wordings
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 21:44:36 +02:00
Julien Pivotto 4617d16b4b Specify the removal
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:32:04 +02:00
Julien Pivotto e2a2bf3c06 Add context
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:30:32 +02:00
Julien Pivotto 627ff84599 Adjust flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 18:25:52 +02:00
Julien Pivotto 6b618ecf02 Better description
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 17:43:42 +02:00
Julien Pivotto 536dfb6234 Add an experimental, hidden flag
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-07 17:31:46 +02:00
Frederic Branczyk 6be3ebdfe7
Merge pull request #8015 from simonpasquier/bump-k8s-deps
Bump k8s dependencies + support k8s.io/klog/v2
2020-10-07 09:54:58 +02:00
Julien Pivotto 946819e16e
cmd/prometheus: Issue a warning on 32 bit archs (#8012)
* cmd/prometheus: Issue a warning on 32 bit archs

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-06 21:42:56 +02:00
Simon Pasquier 9bb3555fe4 cmd/prometheus: support k8s.io/klog/v2
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-10-06 14:56:14 +02:00
Andy Bursavich 4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto d661f84748 Log duration of reloads
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-06 21:49:26 +02:00
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Julien Pivotto 01e3bfcd1a
Add warnings about NFS (#7691)
* Add warnings about NFS

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 11:22:44 +02:00
Javier Palomo Almena b58a613443
Replace sync/atomic with uber-go/atomic (#7683)
* storage: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* cmd: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* scripts: Verify that we are not using restricted packages

It checks that we are not directly importing 'sync/atomic'.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Reorganise imports in blocks

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier/test: Apply PR suggestions

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* storage/remote: avoid storing references on newEntry

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Revert "scripts: Verify that we are not using restricted packages"

This reverts commit 278d32748e.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Group imports accordingly

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
2020-07-30 13:15:42 +05:30
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
Bartlomiej Plotka a0df8a383a
promql: Removed global and add ability to have better interval for subqueries if not specified (#7628)
* promql: Removed global and add ability to have better interval for subqueries if not specified

## Changes
* Refactored tests for better hints testing
* Added various TODO in places to enhance.
* Moved DefaultEvalInterval global to opts with func(rangeMillis int64) int64 function instead

Motivation: At Thanos we would love to have better control over the subqueries step/interval.
This is important to choose proper resolution. I think having proper step also does not harm for
Prometheus and remote read users. Especially on stateless querier we do not know evaluation interval
and in fact putting global can be wrong to assume for Prometheus even.

I think ideally we could try to have at least 3 samples within the range, the same
way Prometheus UI and Grafana assumes.

Anyway this interfaces allows to decide on promQL user basis.

Open question: Is taking parent interval a smart move?

Motivation for removing global: I spent 1h fighting with:


=== RUN   TestEvaluations
    TestEvaluations: promql_test.go:31: unexpected error: error evaluating query "absent_over_time(rate(nonexistant[5m])[5m:])" (line 687): unexpected error: runtime error: integer divide by zero
--- FAIL: TestEvaluations (0.32s)
FAIL

At the end I found that this fails on most of the versions including this master if you run this test alone. If run together with many
other tests it passes. This is due to SetDefaultEvaluationInterval(1 * time.Minute)
in test that is ran before TestEvaluations. Thanks to globals (:

Let's fix it by dropping this global.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added issue links for TODOs.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Removed irrelevant changes.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-07-22 14:39:51 +01:00
Julien Pivotto b83cbacbdd
Rule manager: remove blocking channel in mail (#7631)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:13:24 +02:00
Frederic Branczyk d17d88935c
rules: Use narrower interface for rule manager loading of for state (#7472)
To load ALERT_FOR_STATE only `storage.Queryable` interface is required,
so this patch uses this narrower interface for to perform this.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2020-06-26 19:06:36 +01:00
Bartlomiej Plotka b788986717
storage: Adjusted fully storage layer support for chunk iterators: Remote read client, readyStorage, fanout. (#7059)
* Fixed nits introduced by https://github.com/prometheus/prometheus/pull/7334
* Added ChunkQueryable implementation to fanout and readyStorage.
* Added more comments.
* Changed NewVerticalChunkSeriesMerger to CompactingChunkSeriesMerger, removed tiny interface by reusing VerticalSeriesMergeFunc for overlapping algorithm for
both chunks and series, for both querying and compacting (!) + made sure duplicates are merged.
* Added ErrChunkSeriesSet
* Added Samples interface for seamless []promb.Sample to []tsdbutil.Sample conversion.
* Deprecating non chunks serieset based StreamChunkedReadResponses, added chunk one.
* Improved tests.
* Split remote client into Write (old storage) and read.
* Queryable client is now SampleAndChunkQueryable. Since we cannot use nice QueryableFunc I moved
all config based options to sampleAndChunkQueryableClient to aboid boilerplate.

In next commit: Changes for TSDB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-24 14:41:52 +01:00
Harkishen Singh 70b0a34616
Exit early on invalid config file (#7399)
* Reload config file at start

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* relocated config checking

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* change log lever

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* add helpful comment

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2020-06-21 21:26:59 +05:30
Ben Kochie 8d3c2f6829
Enable WAL compression by default (#7410)
Enable the `--storage.tsdb.wal-compression` flag by defualt.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-06-18 17:59:40 +01:00
Arthur Silva Sens 7727b9012e
Correction of misleading help text(#5142) (#7231)
* Correction of misleading help text(#5142)

Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-05-11 12:15:01 +01:00
Julien Pivotto 9e265aba10
Merge pull request #7225 from prometheus/release-2.18
[Merge without Squash] Merge release-2.18 back to master for 2.18.1 fixes.
2020-05-07 21:23:59 +02:00
Julien Pivotto 645b71e9ef
Fix snapshots (#7217)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-07 10:03:48 +01:00
Ganesh Vernekar d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory

Prom startup now happens in these stages
 - Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.

If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.

[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md)  - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.

**Prombench results**

_WAL Replay_

1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ben Ye 1e4e37144d
Fixed wrongly handled not ready TSDB on web and API. (#7182)
* fix federate endpoint panic

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Fixed all cases of not ready TSDB being wrongly handled.

* Fixed issue for federation.
* Ensured this will never happen again thanks to interfaces
* Fixes same issue for stats.
* Added tests for readiness.
* Fixed bug in stats. It was:
   status.MaxTime = db.Head().MaxTime()
   status.MinTime = db.Head().MaxTime()


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 17:16:14 +01:00
Vasily Sliouniaev 0393b188c9
Add Jaeger (#7148)
* Trace remote read

Signed-off-by: vas <vasily.sliouniaev@jet.com>

* Use jaeger

Signed-off-by: vas <vasily.sliouniaev@jet.com>
2020-04-23 02:05:55 +02:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Ben Kochie 269e7c8091
Fix golint issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-23 20:38:43 +01:00
johncming bbacd2dd09
remove needless break. (#7008)
Signed-off-by: johncming <johncming@yahoo.com>
2020-03-19 11:21:00 +00:00
李国忠 52025bd7a9
[comments] change word ‘wheter’ to ‘whether’ (#6912)
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-02 13:51:24 +05:30
Bartlomiej Plotka 48ead578a0 Moved tsdbconfig to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-18 11:25:36 +00:00
Bartlomiej Plotka a20bebf7eb Moved readyStorage to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 8a775bc468 Moved unit agnostic options to separate pkg.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 59c9d6ef45 Addressed Brian's comments, moved metrics to main.go
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka cfba92a133 Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Björn Rabenstein af04cb22c8
Merge pull request #6821 from prometheus/release-2.16
Release 2.16
2020-02-14 13:10:14 +01:00
Julien Pivotto ff0003e072
Make lookbackDelta a option of QueryEngine (#6746)
* Make lookbackDelta a option of QueryEngine

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* julius' suggestion

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* remove trivial getter

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Assume lookback delta is always > 0

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* add debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* don't expose loopback delta

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Specify that lookack delta is also used in federation

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix federation test

While we have added some logic to the promql engine to keep it backwards
compatible and have a 5 minute loopback by default, the web/ package is
likely to really be internal to Prometheus and we should not add the
same kind of heuritstics here.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* loopback delta: Fix debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-10 00:58:23 +01:00
Julien Pivotto d799078c88 also test start and end
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-08 16:42:50 +01:00
Julien Pivotto 881dde505a promql: fix promql query log step unit
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-08 16:26:56 +01:00
Julien Pivotto 3c4c01eae2
Fix race in Query Log Test (#6727)
A data race can happen if we run t.Log after the test t is done -- which
in this case is highly possible because of the use of subtests and the
fact that we call t.Log in a goroutine.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-30 13:51:18 -08:00
Julien Pivotto 9adad8ad30 Remove MaxConcurrent from the PromQL engine opts (#6712)
Since we use ActiveQueryTracker to check for concurrency in
d992c36b3a it does not make sense to keep
the MaxConcurrent value as an option of the PromQL engine.

This pull request removes it from the PromQL engine options, sets the
max concurrent metric to -1 if there is no active query tracker, and use
the value of the active query tracker otherwise.

It removes dead code and also will inform people who import the promql
package that we made that change, as it breaks the EngineOpts struct.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-28 20:38:49 +00:00
Julien Pivotto 5f27ac3583 Refactor query log fields (#6694)
* Refactor query log fields

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-27 09:53:10 +00:00
Julien Pivotto 2b2eb79e8b Add windows tests for query logger (#6653)
* Add windows tests
* Do not rely on time.Time in timer

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-20 13:17:11 +00:00
Julien Pivotto 0eb34299da End-to-end Query Log test (#6600)
* End-to-end Query Log test

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-19 21:56:13 +00:00
Julien Pivotto 1a58d2657d Removed compilation step inside main_test (#6658)
Inspired by https://github.com/prometheus/prometheus/pull/6347 and
https://github.com/prometheus/prometheus/pull/6347#issuecomment-570151979

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-19 07:14:25 +00:00
Julien Pivotto 3885562587 Query Logging styling (#6594)
- Fix Json vs JSON in activequerylogger
- Fix SetQueryLogger always returns nil

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-09 21:11:39 +00:00
Julien Pivotto 9d9bc524e5 Add query log (#6520)
* Add query log, make stats logged in JSON like in the API

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-08 13:28:43 +00:00
Simon Pasquier cccd542891
*: avoid missed Alertmanager targets (#6455)
This change makes sure that nearly-identical Alertmanager configurations
aren't merged together.

The config's identifier was the MD5 hash of the configuration serialized
to JSON but because `relabel.Regexp` has no public field and doesn't
implement the JSON.Marshaler interface, it was always serialized to
"{}".

In practice, the identifier can be based on the index of the
configuration in the list.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 17:00:19 +01:00
Brooks Swinnerton 0ea3a2218d Add time units to storage.tsdb.retention.size flag (#6365)
* Add time units to storage.tsdb.retention.size flag

In an effort to reduce confusion with the `m` option of the
`ParseDuration()` function, this commit adds the available time units to
the `storage.tsdb.retention.time` flag to help showcase that there is no
option for months (which could be assumed to be `m`).

If someone were looking to set the retention to six months, they may
mistakenly do so with `6m`, which would reduce their retention to six
minutes.

Signed-off-by: Brooks Swinnerton <bswinnerton@gmail.com>
2019-11-30 08:00:51 +00:00
johncming ad4bc5701e remove unwanted break (#6338)
Signed-off-by: johncming <johncming@yahoo.com>
2019-11-18 23:01:03 -08:00
Bartek Płotka 48b2c9c8ea
remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksumed reader (#5703)
Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488

Changes:
* Extended protobuf for chunked remote read and negotation.
* Added checksumed, chunked Writer/Reader.
* Added Server side implementation for chunked streamed remote-read.


Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-19 21:16:10 +01:00
Bartek Plotka f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Advait Bhatwadekar 5d401f1e1b Added query logging for prometheus. Issue #1315 (#5794)
* Added query logging for prometheus.
Options added:
1) active.queries.filepath: Filename where queries will be recorded
2) active.queries.filesize: Size of the file where queries will be recorded.

Functionality added:
All active queries are now logged in a file. If prometheus crashes unexpectedly, these queries are also printed out on stdout in the rerun.

Queries are written concurrently to an mmaped file, and removed once they are done. Their positions in the file are reused. They are written in json format. However, due to dynamic nature of application, the json has an extra comma after the last query, and is missing an ending ']'. There may also null bytes in the tail of file.

Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>
2019-07-31 16:12:43 +01:00
Chris Marchbanks 06f1ba73eb
Provide flag to compress the tsdb WAL
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-07-03 08:03:29 -06:00
Simon Pasquier be67b8d460
web: fix flaky TestHTTPMetrics() (#5695)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-06-24 15:48:15 +02:00
Lee Gaines f4486815c1 logs filesystem type on startup (#5558)
Signed-off-by: Lee Gaines <leetgaines@gmail.com>
2019-05-17 10:16:16 +01:00
Björn Rabenstein 0a34399611 Fix minor punctuation and language issues in flag doc strings (#5568)
This is mostly to create consistency, not because the one or the other
way would be wrong. A few actual corrections are also included.

Signed-off-by: beorn7 <bjoern@rabenste.in>
2019-05-15 16:59:06 +02:00
Frederic Branczyk c790d7658c
Merge pull request #5491 from metalmatze/rungroup
Use github.com/oklog/run not archived oklog/oklog
2019-04-29 16:22:16 +02:00
Matthias Loibl 388caa06ac
Use github.com/oklog/run not archived oklog/oklog
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
2019-04-19 14:55:28 +02:00
Bjoern Rabenstein 38d518c0fe Rework #5009 after comments
Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>
2019-04-17 01:40:10 +02:00
Sylvain Rabot 335a34486e Add external labels to template expansion
This affects the expansion of templates in alert labels and
annotations and console templates.

Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2019-04-17 01:40:10 +02:00
Simon Pasquier e5dbac7972 cmd/prometheus: group flags properly (#5419)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-04-10 13:22:05 +01:00
Tariq Ibrahim 8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Brian Brazil 0a87dcd416
cmd: Warn rather than Info when retention time wraps (#5403)
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-03-25 18:06:38 +00:00
Krasi Georgiev 9d96ada510 Display correct values for the retention in the flags web gui. (#5322)
* Display correct values for the retention in the flags web gui.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* adding a log entry

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* added the retention info to the runtime status page

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>

* simplify the retention display

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-03-11 22:48:57 +05:30
Krasi Georgiev 1684dc750a
updated tsdb to 0.6.0 (#5292)
* updated tsdb to 0.6.0

as part of the update also added the new storage.tsdb.allow-overlapping-blocks flag and mark it as experimental.
2019-03-04 21:42:45 +02:00
Krasi Georgiev a3c41f4256
use the default time retention value only when no size retention is set (#5216)
fixes https://github.com/prometheus/prometheus/issues/5213

Now that we have time and size base retention time bases should not have a default value. A default is set only when both - time and size flags are not set.

This change will not affect current installations that rely on the default time based value, and will avoid confusions when only the size retention is set and it is expected that the default time based setting would be no longer in place.

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2019-02-19 13:53:43 +02:00
Callum Styan 6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to acheive parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Brian Brazil 1dd57765b4
Reduce time that alertmanagers are in flux when reloaded. (#5126)
This no longer waits for all of the scrape reload to complete
before getting a list of AMs again.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2019-01-28 18:34:12 +00:00
Goutham Veeramachaneni 4068968e12
Protect retention from overflowing (#5112)
Also sanitise the max block duration to max a month.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni 384cba1211
Add flag for size based retention (#5109)
* Add flag for size based retention

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Deprecate the old retention flag for a new one.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Add ability to take a suffix for size flag

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Address feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2019-01-18 19:18:36 +05:30
Hrishikesh Barman a1f34bec2e Added CORS Origin flag (#5011)
Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>
2019-01-17 15:01:06 +00:00
Matt Layher 302148fd69 *: apply gofmt -s
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2019-01-16 17:28:14 -05:00
Ryan Leung 45c8b084c6 fix TestFailedStartupExitCode (#5076)
Signed-off-by: rleungx <rleungx@gmail.com>
2019-01-16 10:13:36 +01:00
Lv Jiawei b8ede99767 Fix comment typo (#5087)
According to code, I think it is a typo.

Signed-off-by: MIBc <lvjiawei@cmss.chinamobile.com>
2019-01-09 10:56:47 +00:00
tariqibrahim 9b4a25e7b0 use klog dependency
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-01-03 13:57:20 -08:00
glutamatt 5ddde1965b tune the "Wal segment size" with a flag (#5029)
Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size" to tune the max size of wal segment files.

The addressed base problem is to reduce the disk space used by wal segment files : on a raspberry pi, for instance, we often want to reduce write load of the sd card, then, the wal directory is mounted on a memory (space limited) partition.

the default value of the segment max file size, pushed the size of directory to 128 MB for each segment , which is too much ram consumption on a rasp.

the initial discussion is at https://github.com/prometheus/tsdb/pull/450
2019-01-03 17:13:21 +03:00
Ganesh Vernekar dbe55c1352 Subquery (#4831)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-12-22 13:47:13 +00:00
Simon Pasquier a2766a94a3 cmd/prometheus: add tests for sendAlerts() (#4910)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-18 11:15:46 +00:00
AixesHunter 1b166d7174 Fix variable 'notifier' collides with imported package name 'github.com/prometheus/prometheus/notifier', changed to 'notifierManager'. (#4947)
Signed-off-by: aixeshunter <aixeshunter@gmail.com>
2018-12-18 11:13:18 +00:00
Simon Pasquier ac9d5f3d53
cmd/prometheus: replace glog by glog-gokit (#4931)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-12-04 15:01:12 +01:00
Alex Yu 5dcce32ef8 update promlog to latest version (#4876)
* update promlog to latest version

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* Update api tests, fix main setup

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* tidy go.sum

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* revendor prometheus/common

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* only initialize config; use kingpin for remote_storage_adapter

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* actually parse the flags

Signed-off-by: Alex Yu <yu.alex96@gmail.com>

* clean up imports

Signed-off-by: Alex Yu <yu.alex96@gmail.com>
2018-11-23 14:22:40 +01:00
achiuBAE a9050c45f6 Allow setting the Prometheus instance document title through a flag. (#4841)
* web: added ability to set page title through flag.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* Reformatted variable names and Flag description for readability.

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* assets_vfsdata.go

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* Flag name changed from web.ui-title to web.page-title

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>

* make assets

Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
2018-11-21 12:45:06 +08:00
Lucas Serven 70c8b2c63c
cmd/prometheus: buffer signal chans
According to the GoDoc for os.Signal [0]:

> Package signal will not block sending to c: the caller must ensure that
> c has sufficient buffer space to keep up with the expected signal rate.
> For a channel used for notification of just one signal value, a buffer
> of size 1 is sufficient.

[0] https://golang.org/pkg/os/signal/#Notify

Signed-off-by: Lucas Serven <lserven@gmail.com>
2018-11-14 10:33:28 +01:00
Simon Pasquier a30348f1a4 discovery: add config label to discovered targets metric (#4753)
* discovery: add labels to discovered targets metric

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-10-18 16:46:59 +01:00
Callum Styan 9bca041285 WIP: keep track of samples per query, set a max # of samples (#4513)
* keep track of samples per query, set a max # of samples that can be in
memory at once

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2018-10-02 12:59:19 +01:00
Tom Wilkie 4c52400708
Limit concurrent remote reads. (#4656)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-09-25 20:07:34 +01:00
Tom Wilkie 457e4bb58e
Limit the number of samples remote read can return. (#4532)
* Limit the number of samples remote read can return.

- Return 413 entity too large.
- Limit can be set be a flag.  Allow 0 to mean no limit.
- Include limit in error message.
- Set default limit to 50M (* 16 bytes = 800MB).

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-09-05 15:50:50 +02:00
Chris Marchbanks 63ed9d1b70 Send EndsAt along with alerts (#4550)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-28 16:05:00 +01:00
Chris Marchbanks 87f1dad16d throttle resends of alerts to 1 minute by default (#4538)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-27 17:41:42 +01:00
Krasi Georgiev 12fe204ea6
move runtime debug funcs in own package (#4494)
To make local debuging with `go run` easyer moved all files into a
dedicate package `runtime`.
This allows running prometheus just by using `go run main.go` instead of
passing mani files like `go run main.go limits_default.go ...`

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-08-22 13:41:11 +03:00
Simon Pasquier 08c2f50382
Merge pull request #4418 from simonpasquier/log-vm-limits
prometheus: log virtual memory limits
2018-08-07 16:27:46 +02:00
Julius Volz 90521a65f8
Remove error return value from NotifyFunc() (#4459)
It's always nil and we also forgot to check it.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar f1db699dff Persist alert 'for' state across restarts (#4061)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Simon Pasquier a94450c288 Fix build for openbsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-31 14:41:30 +02:00
Simon Pasquier 141c188ae6 Enforce conversion for freebsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 14:58:56 +02:00
Simon Pasquier 208d21a393 Add comment and print units
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 10:26:58 +02:00
Simon Pasquier ba22b10113 prometheus: log virtual memory limits
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-25 15:51:27 +02:00
Julius Volz 03aa3a3de8
main: Improve / clean up error messages (#4286)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 09:58:40 +02:00
Brian Brazil 68e8b80ffe
Reorder startup and shutdown to prevent panics. (#4321)
Start rule manager only after tsdb and config is loaded.
Stop rule manager before tsdb to avoid writing to closed storage.
Wait for any in-progress reloads to complete before shutting
down rule manager, so that rule manager doesn't get updated after
being shut down.

Remove incorrect comment around shutting down query enginge.
Log when config reload is completed.

Fixes #4133
Fixes #4262

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-07-04 13:41:16 +01:00
Michael Khalil 78e0784d04 return error exit status in prometheus cli (#4296)
Signed-off-by: mikeykhalil <mikeyfkhalil@gmail.com>
2018-06-21 08:32:26 +01:00
Tom Wilkie 8acad5f3cd make it compile
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-05-24 15:40:24 +01:00
Tom Wilkie e51d6c4b6c Make remote flush deadline a command line param.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-05-23 15:06:01 +01:00
Mario Trangoni 464e747f1e fix some comments typos (#4059) 2018-04-08 10:51:54 +01:00
Sneha Inguva 7be846754a main: actor functionality comments 2018-04-01 11:19:30 -07:00
Marek Siarkowicz bb86c3f62b Report internal runtime information on status page (#3921)
Add information about tsdb, wal and config reload
2018-03-21 16:08:37 +00:00
James Turnbull ba5273a0ab Minor edits to help text (#3990) 2018-03-20 16:54:36 +00:00
Simon Pasquier e1fd96db25 cmd: fix help text (#3989) 2018-03-20 15:58:19 +00:00
ferhat elmas ffa673f7d8 General simplifications (#3887)
Another try as in #1516
2018-02-26 07:58:10 +00:00
Bartek Plotka 93a63ac5fd api: Added v1/status/flags endpoint. (#3864)
Endpoint URL: /api/v1/status/flags
Example Output:
```json
{
  "status": "success",
  "data": {
    "alertmanager.notification-queue-capacity": "10000",
    "alertmanager.timeout": "10s",
    "completion-bash": "false",
    "completion-script-bash": "false",
    "completion-script-zsh": "false",
    "config.file": "my_cool_prometheus.yaml",
    "help": "false",
    "help-long": "false",
    "help-man": "false",
    "log.level": "info",
    "query.lookback-delta": "5m",
    "query.max-concurrency": "20",
    "query.timeout": "2m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "false",
    "storage.tsdb.path": "data/",
    "storage.tsdb.retention": "15d",
    "version": "false",
    "web.console.libraries": "console_libraries",
    "web.console.templates": "consoles",
    "web.enable-admin-api": "false",
    "web.enable-lifecycle": "false",
    "web.external-url": "",
    "web.listen-address": "0.0.0.0:9090",
    "web.max-connections": "512",
    "web.read-timeout": "5m",
    "web.route-prefix": "/",
    "web.user-assets": ""
  }
}
```

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-02-21 08:49:02 +00:00
Fabian Reinartz 7ccd4b39b8 *: implement query params
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
2018-02-13 12:17:22 +01:00
Conor Broderick 5169ccf258
Merge pull request #3724 from simonpasquier/fix-bad-data-error
Don't reset FiredAt for inactive alerts
2018-02-01 16:18:09 +00:00
Krasi Georgiev b75428ec19 rename package retrieve to scrape
no fucnctinal changes just renaming retrieval to scrape
2018-02-01 09:55:07 +00:00
Krasi Georgiev 7858745c04 rename structs for consistency 2018-01-30 17:49:05 +00:00
Krasi Georgiev acc4197098 remove dicovery race for the context field 2018-01-29 15:18:07 +00:00
Julien Pivotto 8b20cb1e8d last config success time gauge: use SetToCurrentTime() (#3750)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-01-27 07:48:13 +00:00
Simon Pasquier 81c0ab69e0 Don't reset FiredAt for inactive alerts
Otherwise AlertManager receives resolved alerts where StartsAt is zero which
fails the validation.
2018-01-22 17:17:33 +01:00
Krasi Georgiev 719c579f7b refactor main execution reloadReady handling, update some comments 2018-01-17 18:14:24 +00:00
Krasi Georgiev 0eafaf32d3 set the correct config reloading execution for scraper and notifier 2018-01-17 13:06:56 +00:00
Krasi Georgiev 97f0461e29 refactor the config reloading execution 2018-01-17 12:02:13 +00:00
Krasi Georgiev 5260c650ec use the config hash for the map lookup 2018-01-16 11:10:54 +00:00
Krasi Georgiev 8369826808 comment to rethink the map reference for the notifier discovery 2018-01-16 09:47:53 +00:00
Krasi Georgiev d12e6f29fc discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager
reimplement the service discovery for the notify manager

Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-15 13:39:44 +00:00
Goutham Veeramachaneni 35a6ffbaf3
Merge pull request #3587 from krasi-georgiev/web-test-error-check
handle web_test webhandler errors.
2018-01-10 22:03:25 +05:30
Brian Brazil ecc24b554d
Hide block duration flags. (#3618)
Users are starting to use these mistakenly thinking they'll help
with issues, and thus causing some confusion.
Thus hide them and make it clear that they're only there for testing
reasons.
2017-12-24 12:13:48 +00:00
Krasi Georgiev c94fa731aa bypass the proxy for the tests 2017-12-20 18:21:10 +00:00
Krasi Georgiev ad66476c4f fix flaky main.go test and simplify a bit 2017-12-19 15:07:49 +00:00
Fabian Reinartz 2881d73ed8
Merge pull request #3362 from krasi-georgiev/discovery-refactoring
Decouple the discovery and refactor the retrieval package
2017-12-19 12:56:34 +01:00
Goutham Veeramachaneni 9c9f96b2c0
Merge pull request #3529 from krasi-georgiev/main-integration-test
main.go integration test for Startup interrupting.
2017-12-18 22:12:13 -06:00
Krasi Georgiev 587dec9eb9 rebased and resolved conflicts with the new Discovery GUI page
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-18 20:10:03 +00:00
Krasi Georgiev 1ec76d1950 rearange the contexts variables and logic
split the groupsMerge function to set and get
other small nits
2017-12-18 17:23:47 +00:00
Krasi Georgiev 6ff1d5c51e add the scrape manager config reloader
handle errors with invalid scrape config
2017-12-18 17:23:47 +00:00
Krasi Georgiev b0d4f6ee08 resolved merge confilc in main.go 2017-12-18 17:23:46 +00:00
Krasi Georgiev c5cb0d2910 simplify naming and API. 2017-12-18 17:22:50 +00:00
Krasi Georgiev 9c61f0e8a0 scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage. 2017-12-18 17:22:49 +00:00
Krasi Georgiev e405e2f1ea refactored discovery 2017-12-18 17:22:49 +00:00
pasquier-s 2440696961 Log file descriptor limits at startup (#3567)
Fixes #3564
2017-12-11 13:01:53 +00:00
Alberto Cortés 29da2fb9cd testutil: update to go1.9 testing.Helper 2017-12-08 19:06:53 +01:00
Alberto Cortés 8f6a9f7833 config: simplify tests by using testutil.NotOk (#3289)
Also include filename in all LoadFile errors

Also add mesage to testuitl.NotOk so we can identify failing tests when
using table driven tests.
2017-12-08 16:52:25 +00:00
Krasi Georgiev 740662644e write to temp dir and remove it at the end.
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-06 10:45:58 +00:00
Brian Brazil b97f4cf48c Add metrics for rule group interval and last duration. 2017-12-04 11:44:38 +00:00
Krasi Georgiev 2c2a962da3 main.go integration test for Startup interrupting. 2017-12-01 10:58:01 +00:00
Goutham Veeramachaneni 823b7f90b3
Use the files globbed files and not the files in cfg
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-30 17:08:34 +05:30
Fabian Reinartz 62461379b7 rules: decouple notifier packages
The dependency on the notifier packages caused a transitive dependency
on discovery and with that all client libraries our service discovery
uses.
2017-11-27 16:38:14 +01:00
Fabian Reinartz 4d964a0a0d rules: make glob expansion a concern of main 2017-11-24 08:22:57 +01:00
Fabian Reinartz bd9f7460eb rules: remove config package dependency 2017-11-24 07:57:54 +01:00
Fabian Reinartz 2d0e3746ac rules: remove dependency on promql.Engine 2017-11-24 07:57:54 +01:00
Krasi Georgiev e2f4850fea Refactor main.go with oklog/pkg/group actors pattern 2017-11-11 12:33:15 +00:00
Thibault Chataigner fc4406201e Tsdb StartTime : Use a simplier way to compute StartTime 2017-10-25 17:41:00 +02:00
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Julius Volz 9d43176ab3 Remove unused printVersion variable (#3335)
Kingpin now automatically does this via --version.
2017-10-23 08:50:13 +01:00
Julius Volz 82c5b98496 Capitalize Prometheus in startup message (#3332)
Hey, branding :)
2017-10-23 08:49:28 +01:00
Thibault Chataigner bf4a279a91 Remote storage reads based on oldest timestamp in primary storage (#3129)
Currently all read queries are simply pushed to remote read clients.
This is fine, except for remote storage for wich it unefficient and
make query slower even if remote read is unnecessary.
So we need instead to compare the oldest timestamp in primary/local
storage with the query range lower boundary. If the oldest timestamp
is older than the mint parameter, then there is no need for remote read.
This is an optionnal behavior per remote read client.

Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>
2017-10-18 12:08:14 +01:00
Julius Volz 5f715f5733 Fix typo in flag description (#3302) 2017-10-16 23:00:05 +01:00
Tobias Schmidt 3589f2f1d4 Merge pull request #3285 from jlevesy/use-testutils-in-cmd-subpackage
Use testutil assertion helpers in cmd package
2017-10-13 00:12:39 +02:00
Julien Levesy d7b4fa8d78 use testutil assertions in the cmd/prometheus package 2017-10-12 13:45:38 +02:00
Mathieu Pasquet 38afa507bb Provide better errors messages in commandline
Instead or only printing the help message, which is not always helpful.
For example, when upgrading from prometheus v1, the retention time value
format has changed and now only accepts one unit (e.g. "15d") where it
previously allowed more complex strings (e.g. "360h0m0s").

This commit provides the error message as an explanation for the parsing
failure.
2017-10-09 16:25:50 +02:00
Marc Sluiter 6a633eece1 Added go-conntrack for monitoring http connections (#3241)
Added metrics for in- and outgoing traffic with go-conntrack.
2017-10-06 11:22:19 +01:00
Fabian Reinartz 2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
Paul Gier 08af129b4d cmd/prometheus: don't allow quotes at beginning or end of url
This prevents accidental copy/paste error where a the web.external-url
or alertmanager.url params could have an extra set of quotes.
See also: https://github.com/prometheus/prometheus/issues/1229
2017-10-04 10:10:02 -05:00
Paul Gier f79b55d057 cmd/prometheus: remove govalidator for url validation
The usage of govalidator is redundant with the call to url.Parse for
url validation. Removing it has the following benefits:

 - The explicit error message is displayed instead of just a generic
   valid/invalid message
 - Slightly smaller code with one fewer external dependency
 - Speed improvement by removing duplicate call to url.Parse (inside
   govalidator.IsURL()
 - Resolves issue #2717

The only potential drawback of removing govalidator is that certain
URLs will be considered valid which were previously invalid. For example:

 - URLs with hostnames that start and/or end with an underscore (http://_example.com_)
 - URLs with hostnames that contain some special characters (http://foo&*bar.org)

These are valid URIs according to RFC 3986 and valid domain names per RFC 2181,
however they are not valid hostnames per RFC 952.
2017-10-04 10:08:34 -05:00
Fabian Reinartz 7b02bfee0a web: start web handler while TSDB is starting up 2017-09-20 15:03:19 +02:00
Goutham Veeramachaneni f5aed810f9 logging: Port to common/promlog
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-09-15 12:40:50 +05:30
Fabian Reinartz d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Fabian Reinartz c70379e1c7 Merge branch 'dev-2.0' of github.com:prometheus/prometheus into dev-2.0 2017-09-04 13:10:50 +02:00
Fabian Reinartz fffe51fb03 Add mutex and block profiling via envvar 2017-09-04 13:10:32 +02:00
Ben Kochie 59aca4138b Fix staticcheck issues. 2017-08-28 17:29:01 +02:00
Matt Bostock 64973f5c65 cmd/prometheus: Fix capitalisation in log line (#3123)
Change 'Ready' to 'ready'.
2017-08-28 11:03:25 +01:00
Mark Adams 77c816b309 Fix pprof endpoints when -web.route-prefix or -web.external-url is used (#3054)
Whenever a route prefix is applied, the router prepends the prefix to
the URL path on the request. For most handlers, this is not an issue
because the request's path is only used for routing and is not actually
needed by the handler itself. However, Prometheus delegates the handling
of the /debug/* endpoints to the http.DefaultServeMux which has it's own
routing logic that depends on the url.Path. As a result, whenever a
prefix is applied, the prefixed URL is passed to the DefaultServeMux
which has no awareness of the prefix and returns a 404.

This change fixes the issue by creating a new serveDebug handler which
routes requests /debug/* requests to appropriate net/http/pprof handler
and removing the net/http/pprof import in cmd/prometheus since it is no
longer necessary.

Fixes #2183.
2017-08-23 00:00:56 +01:00
Fabian Reinartz 25f3e1c424 Merge branch 'master' into mergemaster 2017-08-10 17:04:25 +02:00
KalivarapuReshma 686050d816 Change -config.file to --config.file in Readme and error message 2017-08-08 12:49:35 +05:30