When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory
Prom startup now happens in these stages
- Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.
If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.
[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.
**Prombench results**
_WAL Replay_
1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m
_Memory During WAL Replay_
High Churn:
10-15% less RAM - 32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM - 23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb
Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
https://github.com/facebook/create-react-app/issues/8689 is causing our
tests to fail in the CI pipeline. As the comments suggest, downgrading
to react-scripts 3.4.0 fixes the problem.
In addition, fix a test warning due to a missing id field.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
time.Unix attaches the local timezone, which can then
leak out (e.g. in the alert json). While this is harmless,
we should be consistent.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* storage: Added Chunks{Queryable/Querier/SeriesSet/Series/Iteratable. Added generic Merge{SeriesSet/Querier} implementation.
## Rationales:
In many places (e.g. chunk Remote read, Thanos Receive fetching chunk from TSDB), we operate on encoded chunks not samples.
This means that we unnecessary decode/encode, wasting CPU, time and memory.
This PR adds chunk iterator interfaces and makes the merge code to be reused between both seriesSets
I will make the use of it in following PR inside tsdb itself. For now fanout implements it and mergers.
All merges now also allows passing series mergers. This opens doors for custom deduplications other than TSDB vertical ones (e.g. offline one we have in Thanos).
## Changes
* Added Chunk versions of all iterating methods. It all starts in Querier/ChunkQuerier. The plan is that
Storage will implement both chunked and samples.
* Added Seek to chunks.Iterator interface for iterating over chunks.
* NewMergeChunkQuerier was added; Both this and NewMergeQuerier are now using generigMergeQuerier to share the code. Generic code was added.
* Improved tests.
* Added some TODO for further simplifications in next PRs.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Brian's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Moved s/Labeled/SeriesLabels as per Krasi suggestion.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Addressed Krasi's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Second iteration of Krasi comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Another round of comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.
I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.
Also I don't know what TestSelectSorted was testing, but now it's testing sorting.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.
I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.
Also I don't know what TestSelectSorted was testing, but now it's testing sorting.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
This fixes an issue where the /new/targets page will not load when there
are jobs with invalid CSS characters in them, such as the
namespace/service/0 form used by the Prometheus Operator.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
Previously it could return error if RemoteAddr didn't
have correct format, but since this field has no specified
format, that was little too strict.
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Make lookbackDelta a option of QueryEngine
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* julius' suggestion
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* remove trivial getter
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Assume lookback delta is always > 0
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* add debug log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* don't expose loopback delta
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Specify that lookack delta is also used in federation
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix federation test
While we have added some logic to the promql engine to keep it backwards
compatible and have a 5 minute loopback by default, the web/ package is
likely to really be internal to Prometheus and we should not add the
same kind of heuritstics here.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* loopback delta: Fix debug log
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Add conditional rendering of Navlink for Consoles
Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
* Replacing if else with only if conditional rendering
Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
* Add tests and removing global declaration in Navbar
Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
* Correct Navbar Testcases and add types for ConsolesLink
Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
* Change names for Console link as per-naming convention
Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
* Change prop names to AppProps and NavbarProps respectively
Signed-off-by: Drumil Patel <drumilpatel720@gmail.com>
The function HoldDuration and Duration did the exact same thing.
Let's only keep HoldDuration() as Duration() is more confusing.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Since we use ActiveQueryTracker to check for concurrency in
d992c36b3a it does not make sense to keep
the MaxConcurrent value as an option of the PromQL engine.
This pull request removes it from the PromQL engine options, sets the
max concurrent metric to -1 if there is no active query tracker, and use
the value of the active query tracker otherwise.
It removes dead code and also will inform people who import the promql
package that we made that change, as it breaks the EngineOpts struct.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* React UI: Support local timezone on /graph
This partially implements
https://github.com/prometheus/prometheus/issues/500 in the sense that it
only addresses the /graph page, and only allows toggling between UTC and
local (browser) time, but no arbitrary timezone selection yet.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fixup: Also display TZ offset in tooltip
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Just show offset, not timezone name abbreviation
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* React UI: Send cookies on fetch() on older browsers
Fixes https://github.com/prometheus/prometheus/issues/6428
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix fetch() tests to expect new options
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* add panel state for the expression input
Signed-off-by: blalov <boiskila@gmail.com>
* remove redundant test
Signed-off-by: blalov <boiskila@gmail.com>
Rather than keeping the entire symbol table in memory, keep every nth
offset and walk from there to the entry we need. This ends up slightly
slower, ~360ms per 1M series returned from PostingsForMatchers which is
not much considering the rest of the CPU such a query would go on to
use.
Make LabelValues use the postings tables, rather than having
to do symbol lookups. Use yoloString, as PostingsForMatchers
doesn't need the strings to stick around and adjust the API
call to keep the Querier open until it's all marshalled.
Remove allocatedSymbols memory optimisation, we no longer keep all the
symbol strings in heap memory. Remove LabelValuesFor and LabelIndices,
they're dead code. Ensure we've still tests for label indices,
and add missing test that we can work with old V1 Format index files.
PostingForMatchers performance is slightly better, with a big drop in
allocation counts due to using yoloString for LabelValues:
benchmark old ns/op new ns/op delta
BenchmarkPostingsForMatchers/Block/n="1"-4 36698 36681 -0.05%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 522786 560887 +7.29%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 511652 537680 +5.09%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 522102 564239 +8.07%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 113689911 111795919 -1.67%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 135825572 132871085 -2.18%
BenchmarkPostingsForMatchers/Block/i=~""-4 40782628 38038181 -6.73%
BenchmarkPostingsForMatchers/Block/i!=""-4 31267869 29194327 -6.63%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 112733329 111568823 -1.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 112868153 111232029 -1.45%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 31338257 29349446 -6.35%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 32054482 29972436 -6.50%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 136504654 133968442 -1.86%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 27960350 27264997 -2.49%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 136765564 133860724 -2.12%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 163714583 159453668 -2.60%
benchmark old allocs new allocs delta
BenchmarkPostingsForMatchers/Block/n="1"-4 6 6 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 11 11 +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 11 11 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 17 15 -11.76%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 100012 12 -99.99%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 200040 100040 -49.99%
BenchmarkPostingsForMatchers/Block/i=~""-4 200045 100045 -49.99%
BenchmarkPostingsForMatchers/Block/i!=""-4 200041 100041 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 100017 17 -99.98%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 100023 23 -99.98%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 200046 100046 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 200050 100050 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 200049 100049 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 111150 11150 -89.97%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 200055 100055 -49.99%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 311238 111238 -64.26%
benchmark old bytes new bytes delta
BenchmarkPostingsForMatchers/Block/n="1"-4 296 296 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j="foo"-4 424 424 +0.00%
BenchmarkPostingsForMatchers/Block/j="foo",n="1"-4 424 424 +0.00%
BenchmarkPostingsForMatchers/Block/n="1",j!="foo"-4 552 1544 +179.71%
BenchmarkPostingsForMatchers/Block/i=~".*"-4 1600482 1606125 +0.35%
BenchmarkPostingsForMatchers/Block/i=~".+"-4 17259065 17264709 +0.03%
BenchmarkPostingsForMatchers/Block/i=~""-4 17259150 17264780 +0.03%
BenchmarkPostingsForMatchers/Block/i!=""-4 17259048 17264680 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",j="foo"-4 1600610 1606242 +0.35%
BenchmarkPostingsForMatchers/Block/n="1",i=~".*",i!="2",j="foo"-4 1600813 1606434 +0.35%
BenchmarkPostingsForMatchers/Block/n="1",i!=""-4 17259176 17264808 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i!="",j="foo"-4 17259304 17264936 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",j="foo"-4 17259333 17264965 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~"1.+",j="foo"-4 3142628 3148262 +0.18%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!="2",j="foo"-4 17259509 17265141 +0.03%
BenchmarkPostingsForMatchers/Block/n="1",i=~".+",i!~"2.*",j="foo"-4 20405680 20416944 +0.06%
However overall Select performance is down and involves more allocs, due to
having to do more than a simple map lookup to resolve a symbol and that all the strings
returned are allocated:
benchmark old ns/op new ns/op delta
BenchmarkQuerierSelect/Block/1of1000000-4 506092636 862678244 +70.46%
BenchmarkQuerierSelect/Block/10of1000000-4 505638968 860917636 +70.26%
BenchmarkQuerierSelect/Block/100of1000000-4 505229450 882150048 +74.60%
BenchmarkQuerierSelect/Block/1000of1000000-4 515905414 862241115 +67.13%
BenchmarkQuerierSelect/Block/10000of1000000-4 516785354 874841110 +69.29%
BenchmarkQuerierSelect/Block/100000of1000000-4 540742808 907030187 +67.74%
BenchmarkQuerierSelect/Block/1000000of1000000-4 815224288 1181236903 +44.90%
benchmark old allocs new allocs delta
BenchmarkQuerierSelect/Block/1of1000000-4 4000020 6000020 +50.00%
BenchmarkQuerierSelect/Block/10of1000000-4 4000038 6000038 +50.00%
BenchmarkQuerierSelect/Block/100of1000000-4 4000218 6000218 +50.00%
BenchmarkQuerierSelect/Block/1000of1000000-4 4002018 6002018 +49.97%
BenchmarkQuerierSelect/Block/10000of1000000-4 4020018 6020018 +49.75%
BenchmarkQuerierSelect/Block/100000of1000000-4 4200018 6200018 +47.62%
BenchmarkQuerierSelect/Block/1000000of1000000-4 6000018 8000019 +33.33%
benchmark old bytes new bytes delta
BenchmarkQuerierSelect/Block/1of1000000-4 176001468 227201476 +29.09%
BenchmarkQuerierSelect/Block/10of1000000-4 176002620 227202628 +29.09%
BenchmarkQuerierSelect/Block/100of1000000-4 176014140 227214148 +29.09%
BenchmarkQuerierSelect/Block/1000of1000000-4 176129340 227329348 +29.07%
BenchmarkQuerierSelect/Block/10000of1000000-4 177281340 228481348 +28.88%
BenchmarkQuerierSelect/Block/100000of1000000-4 188801340 240001348 +27.12%
BenchmarkQuerierSelect/Block/1000000of1000000-4 304001340 355201616 +16.84%
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* React UI: Fix issue when changing query then time, the old query is executed
Signed-off-by: Dustin Hooten <dhooten@splunk.com>
* pr feedback
Signed-off-by: Dustin Hooten <dhooten@splunk.com>
* more pr feedback
Signed-off-by: Dustin Hooten <dhooten@splunk.com>
This adds support for a new query param on the new `api/v1/metadata`
endpoint that provides metadata for a specified metric via the V1 API.
It collapses metadata that is equal across all targets, and aggregates
under the same metric name the ones that differ.
Signed-off-by: gotjosh <josue@grafana.com>
* api: provide per metric metadata
This adds a new endpoint that provides per metric metadata via the V1 API.
It collapses metadata that is equal across all targets, and aggregates under the same metric name the ones that differ.
* Allow tests to be asserted on response length
Some tests e.g. limit on API responses, don't require an assertion on
equality.
This allows us to assert against response length instead of
equality.
Signed-off-by: gotjosh <josue@grafana.com>
* Allows sorting of responses from the API in tests
Fixes flaky test for api/v1/targets/metadata.
Allows sorting of responses from the API. For our tests to be deterministic, we need to ensure the response from the API follows an order. This structure allows us to define one.
Fixes#6431
Signed-off-by: gotjosh <josue@grafana.com>
The most common format (used by go, gcc and clang) for compiler error positions seems to be
`filename:line:char:` or `line:char:` if the filename is unknown.
This PR adapts the PromQL parser to use this convention.
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>
This commit introduces several test cases for the current /targets/metadata API endpoint.
To achieve so, we use a mock of the metadataStore and inject it to the targets under test.
Currently, three success cases are covered: with a metric name, with a target matcher, and with both. As for the failure scenario, the one where we couldn't match against a particular metric is covered.
Signed-off-by: gotjosh <josue@grafana.com>
Previously, the struct `testTargetRetriever` had hardcoded active and dropped targets. This made it difficult to change the target information depending on the test case.
This change introduces a way to define them as arguments and pass it to a constructor for building. It lays a foundation for dynamically defining targets with various set of arguments to test different scenarios.
Signed-off-by: gotjosh <josue@grafana.com>
* move graph related files into own folder
Signed-off-by: blalov <boiskila@gmail.com>
* move graph helper functions into own file
Signed-off-by: blalov <boiskila@gmail.com>
* fix typo in file name
Signed-off-by: blalov <boiskila@gmail.com>
* fix typo in file name and lint fixes
Signed-off-by: blalov <boiskila@gmail.com>
* React UI: Fix tests harder
Again not sure why this passed last time (?), but now I was getting an
error about 'NaN' not being a valid value to assign to the 'height'
property of the input element. This changes it so that only the blur()
function is actually mocked out on the active input element.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fixup
Signed-off-by: Julius Volz <julius.volz@gmail.com>
It being a Reach Router <Link> caused the Reach router to not actually
leave the React app, even though the destination path was not a path
handled by the Reach Router.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
According to the documentation, the target metadata API accepts it,
if no value for match_target has been provided. This was not the case
in the implementation.
This commit make the API behave as described in the docs.
Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>