Commit graph

961 commits

Author SHA1 Message Date
Ganesh Vernekar d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory

Prom startup now happens in these stages
 - Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.

If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.

[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md)  - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.

**Prombench results**

_WAL Replay_

1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ben Ye 1e4e37144d
Fixed wrongly handled not ready TSDB on web and API. (#7182)
* fix federate endpoint panic

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Fixed all cases of not ready TSDB being wrongly handled.

* Fixed issue for federation.
* Ensured this will never happen again thanks to interfaces
* Fixes same issue for stats.
* Added tests for readiness.
* Fixed bug in stats. It was:
   status.MaxTime = db.Head().MaxTime()
   status.MinTime = db.Head().MaxTime()


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 17:16:14 +01:00
Julien Pivotto fc3fb3265a
Merge pull request #7145 from prometheus/release-2.17
Backport release 2.17 into master
2020-04-20 14:08:12 +02:00
Julien Pivotto 9072cf7203
Merge pull request #7137 from roidelapluie/cherrypicks
Cherry-pick three bugfixes from master to release-2.17
2020-04-18 20:21:26 +02:00
beorn7 69ac27e1b4 Make series method return a finalizer, too
Signed-off-by: beorn7 <beorn@grafana.com>
2020-04-17 22:40:39 +02:00
Julian Taylor e2c06a8898 register federation failure metrics (#7081)
Closes gh-7080

Signed-off-by: Julian Taylor <juliantaylor108@gmail.com>
2020-04-17 22:06:16 +02:00
Julien Pivotto a2fcdeb1ef Defer finalizer (#7129)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-17 22:05:38 +02:00
beorn7 f9f423ec0a Ensure queries are closed in API calls
Signed-off-by: beorn7 <beorn@grafana.com>
2020-04-17 20:32:36 +02:00
Julien Pivotto 209d4bb8a1
Defer finalizer (#7129)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-16 20:16:16 +02:00
gotjosh 24af5049bb
API: Allow TargetRetriever to receive a Context (#7125)
Fixes #7103

Signed-off-by: gotjosh <josue@grafana.com>
2020-04-16 09:30:47 +01:00
Marek Slabicki 8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Ben Ye ecda6013ed
Use only local tsdb for federation (#7096)
Signed-off-by: yeya24 <yb532204897@gmail.com>
2020-04-07 16:42:42 +01:00
Julian Taylor 05442b31c8
register federation failure metrics (#7081)
Closes gh-7080

Signed-off-by: Julian Taylor <juliantaylor108@gmail.com>
2020-04-06 09:05:01 +01:00
Chris Marchbanks 62bd77bf93
Fix react tests (#7077)
https://github.com/facebook/create-react-app/issues/8689 is causing our
tests to fail in the CI pipeline. As the comments suggest, downgrading
to react-scripts 3.4.0 fixes the problem.

In addition, fix a test warning due to a missing id field.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-04-01 16:37:52 +02:00
Brian Brazil 7646cbca32
Use .UTC everywhere we use time.Unix (#7066)
time.Unix attaches the local timezone, which can then
leak out (e.g. in the alert json). While this is harmless,
we should be consistent.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-29 17:35:39 +01:00
Bartlomiej Plotka d5c33877f9
storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation. (#7005)
* storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation.

## Rationales:

In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples.
This means that we unnecessary decode/encode, wasting CPU, time and memory.
This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets

I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers.

All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos).

## Changes

* Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that
Storage will implement both chunked and samples.
* Added Seek to chunks.Iterator interface for iterating over chunks.
* NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added.
* Improved tests.
* Added some TODO for further simplifications in next PRs.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Moved s/Labeled/SeriesLabels as per Krasi suggestion.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Krasi's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Second iteration of Krasi comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Another round of comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-24 20:15:47 +00:00
Ben Kochie 269e7c8091
Fix golint issues.
Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-23 20:38:43 +01:00
Ben Kochie 24ecae9956
Update yarn deps (#7027)
Fix security warnings.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-03-22 20:38:48 +01:00
Bartlomiej Plotka c4eefd1b3a storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-20 21:14:43 +01:00
Jaga Santagostino 350f25eed3
changes from PR requests (#6880)
Signed-off-by: Jaga Santagostino <jagasantagostino@gmail.com>
2020-03-16 08:22:08 +01:00
Bartlomiej Plotka fe802f29c9 storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-13 13:06:25 +00:00
Boyko 84b00564f4
API_V1: Extract time param parsing logic (#6860)
* extract API time param parsing logic in a func

Signed-off-by: blalov <boiskila@gmail.com>
2020-03-06 12:33:01 +02:00
李国忠 3fff701b77
[comments] change word ‘dependencie’ to ‘dependencies’ (#6904)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-01 09:34:27 +01:00
Simon Kirsten 45fbed94d6
React UI: Fixed data table for matrix always showing 1 as value (#6896)
Signed-off-by: Simon Kirsten <1972314+skirsten@users.noreply.github.com>

React UI: Fixed data table for matrix always showing 1 as value
2020-02-29 07:45:48 +01:00
Chris Marchbanks 0e78908407
Escape target selector for tooltip on targets page (#6892)
This fixes an issue where the /new/targets page will not load when there
are jobs with invalid CSS characters in them, such as the
namespace/service/0 form used by the Prometheus Operator.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-02-28 20:27:45 +01:00
LongKB 82f7ed208b
Remove some duplicated words (#6882)
Signed-off-by: Pham Duc Hanh <hanhpd@fujitsu.com>
2020-02-27 07:08:31 +01:00
李国忠 ddd4dcec19
[fix] ui: th to td (#6874)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-02-26 09:39:05 +00:00
Ben Kochie 65a19421a4
Update React vendoring (#6855)
* Update React vendoring

Update webpack-scripts to 3.4.0
* Fix security warning for `serialize-javascript`.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Fix eslint errors.

Signed-off-by: Ben Kochie <superq@gmail.com>

* Update test snapshot.

Signed-off-by: Ben Kochie <superq@gmail.com>
2020-02-25 20:12:40 +01:00
Bartlomiej Plotka c4e74e241f
Merge pull request #6744 from slrtbtfs/split_parser
Split promql package into parser and engine
2020-02-25 16:49:19 +00:00
Bartlomiej Plotka c869e046a5
Merge pull request #6840 from pstibrany/context_from_request_should_not_return_error
Don't return error in ContextFromRequest function.
2020-02-25 12:39:46 +00:00
李国忠 d276b5fb65
[format]Js: format checktimedrift() success (#6850)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-02-20 11:36:22 +01:00
Tobias Guggenmos 4835bbf376
Merge branch 'master' into split_parser 2020-02-19 15:18:13 +01:00
Peter Štibraný 318cd413fc Don't return error in ContextFromRequest function.
Previously it could return error if RemoteAddr didn't
have correct format, but since this field has no specified
format, that was little too strict.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2020-02-18 15:58:14 +01:00
Bartlomiej Plotka 48ead578a0 Moved tsdbconfig to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-18 11:25:36 +00:00
Bartlomiej Plotka a20bebf7eb Moved readyStorage to main.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 8a775bc468 Moved unit agnostic options to separate pkg.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 59c9d6ef45 Addressed Brian's comments, moved metrics to main.go
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka cfba92a133 Addressed comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka fb79f515fc Fixed second bug.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:57 +00:00
Bartlomiej Plotka 34426766d8 Unify Iterator interfaces. All point to storage now.
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.

* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.

No logic was changed, only different source of abstractions, so no need for benchmarks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-02-17 18:03:54 +00:00
Harkishen Singh 489a9aa7b9
Adds normalization of localhost urls in targets page react (#6794)
* support for globalurls in targets page react

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* fixed tests

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* removed fmts

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* implemented suggestions

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* formatted

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* implemented suggestions. fixed tests.

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* formated go code

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>

* implemented suggestions

Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>
2020-02-17 18:19:15 +01:00
Tobias Guggenmos 6c00f2ffcb Comment fixes
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Tobias Guggenmos 20b1f596f6 Fix build errors in rest of prometheus
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
2020-02-17 16:09:23 +01:00
Julien Pivotto 135cc30063
rules: Make deleted rule series as stale after a reload (#6745)
* rules: Make deleted rule series as stale after a reload

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-12 16:22:18 +01:00
Boyko 1c321ed047
React-UI: flex-wrap the content (#6796)
* flex-wrap the content

Signed-off-by: blalov <boiskila@gmail.com>

* wrap formatted series in a div

Signed-off-by: blalov <boiskila@gmail.com>
2020-02-11 15:25:39 +01:00
Julien Pivotto ff0003e072
Make lookbackDelta a option of QueryEngine (#6746)
* Make lookbackDelta a option of QueryEngine

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* julius' suggestion

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* remove trivial getter

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Assume lookback delta is always > 0

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* add debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* don't expose loopback delta

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Specify that lookack delta is also used in federation

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix federation test

While we have added some logic to the promql engine to keep it backwards
compatible and have a 5 minute loopback by default, the web/ package is
likely to really be internal to Prometheus and we should not add the
same kind of heuritstics here.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* loopback delta: Fix debug log

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-02-10 00:58:23 +01:00
Drumil Patel b00023344e
Formatting short tables for readability (#6762)
* Formatting short tables for readability

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Solving linting issues in DataTable.tsx

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Increasing maxFormattable Size to 1000 and changing value of toFormatStyle

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Solve tests failure for multiple alerts on size gretaer than 10000

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Solve linting errors

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Add tests for alert of not formatting

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
2020-02-10 00:11:57 +01:00
Julius Volz 0a8acb654e
React UI: Use null value for determining consoles link display (#6790)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2020-02-09 13:39:44 +01:00
Drumil Patel 687a962bd1 Add conditional rendering of Navlink for Consoles (#6761)
* Add conditional rendering of Navlink for Consoles

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Replacing if else with only if conditional rendering

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Add tests and removing global declaration in Navbar

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Correct Navbar Testcases and add types for ConsolesLink

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Change names for Console link as per-naming convention

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>

* Change prop names to AppProps and NavbarProps respectively

Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
2020-02-08 11:00:47 +01:00
Boyko 8c2bc2f57a
Unify react fetcher components (#6629)
* set useFetch loading flag to be true initially

Signed-off-by: blalov <boiskila@gmail.com>

* make extended props optional

Signed-off-by: blalov <boiskila@gmail.com>

* add status indicator to targets page

Signed-off-by: blalov <boiskila@gmail.com>

* add status indicator to tsdb status page

Signed-off-by: blalov <boiskila@gmail.com>

* spread response in Alerts

Signed-off-by: blalov <boiskila@gmail.com>

* disable eslint func retun type rule

Signed-off-by: blalov <boiskila@gmail.com>

* add status indicator to Service Discovery page

Signed-off-by: blalov <boiskila@gmail.com>

* refactor PanelList

Signed-off-by: blalov <boiskila@gmail.com>

* test fix

Signed-off-by: blalov <boiskila@gmail.com>

* use local storage hook in PanelList

Signed-off-by: blalov <boiskila@gmail.com>

* use 'useFetch' for fetching metrics

Signed-off-by: blalov <boiskila@gmail.com>

* left-overs

Signed-off-by: blalov <boiskila@gmail.com>

* remove targets page custom error message

Signed-off-by: Boyko Lalov <boiskila@gmail.com>

* adding components displayName

Signed-off-by: Boyko Lalov <boiskila@gmail.com>

* display more user friendly error messages

Signed-off-by: Boyko Lalov <boiskila@gmail.com>

* update status page snapshot

Signed-off-by: Boyko Lalov <boiskila@gmail.com>

* pr review changes

Signed-off-by: Boyko Lalov <boiskila@gmail.com>

* fix broken tests

Signed-off-by: Boyko Lalov <boiskila@gmail.com>

* fix typos

Signed-off-by: Boyko Lalov <boiskila@gmail.com>
2020-02-03 15:14:25 +01:00