Commit graph

850 commits

Author SHA1 Message Date
Arthur Silva Sens 0305490e4e
Merge pull request #13987 from prometheus/nativeHis-flag-ingestion
bugfix: Decouple native histogram ingestions and protobuf parsing
2024-04-25 10:56:40 -03:00
Arthur Silva Sens 7aacef9b42
bugfix: Decouple native histogram ingestions and protobuf parsing
Up until this point, if a scrape was done with the protobuf format Prometheus would always try to ingest native histograms even with the feature flag disabled. This causes problems with other feature-flags that depend on the protobuf format, like 'created-timestamp-zero-ingestion'. This commit decouples native histogram parsing from ingestion, making sure ingestion only happens when the 'native-histogram' feature-flag is enabled.

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-04-24 17:02:52 -03:00
Will Hegedus bd1878700b
promtool: Fix panic on extended tsdb analyze (#13976)
Currently, running promtool tsdb analyze with the --extended flag
will cause an 'index out of range' error if running it
against a block that does not have any native histogram chunks.

This change ensures that promtool won't try to display data that doesn't exist.

Signed-off-by: Will Hegedus <whegedus@linode.com>
2024-04-24 11:35:34 +10:00
Julius Volz 5827772f69 Merge branch 'main' into mantine-ui
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2024-04-19 14:03:25 +02:00
Julius Volz e8bbe191d4 Build both old & new UI into Prometheus, allow choosing via feature flag
This keeps the old "react-app" directory in its existing location (to make
it easier to merge changes from the main branch), but separates it from the
npm workspaces setup. Thus it now needs to be npm-installed/built/linted
separately. This is a bit hacky, but should only be needed temporarily,
until the old UI can be removed.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2024-04-18 15:08:40 +02:00
machine424 c5a1cc9148
chore(tsdb): add a sandboxDir to DBReadOnly, the directory can be used for transient file writes.
use it in loadDataAsQueryable to make sure the RO Head doesn't truncate or cut new chunks in data/chunks_head/.

add a -sandbox-dir-root flag to "promtool tsdb dump/dump-openmetrics" to control the root of that sandbox dirrectory.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-04-15 17:00:25 +02:00
Matthieu MOREL 6f595c6762
golangci-lint: enable whitespace linter (#13905)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-11 09:27:54 +01:00
Matthieu MOREL d496687c8e golangci-lint: enable usestdlibvars linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-08 19:26:23 +00:00
Bryan Boreham fc567a1df8
Merge pull request #13889 from komisan19/refactor/utilize_standard_functions_max/min
refactor: utilize standard functions max/min in promtool and tsdb
2024-04-06 10:23:18 +01:00
komisan19 0249e080b4 refactor: utilize standard functions max/min
Signed-off-by: komisan19 <18901496+komisan19@users.noreply.github.com>
2024-04-04 03:15:38 +09:00
Arthur Silva Sens db64d2dcdc
Update documentation about existing feature-flags
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-04-02 19:18:57 -03:00
Artur Melanchyk 44dcf02c69
TSDB: make total lock-free by using atomic
Signed-off-by: Artur Melanchyk <artur.melanchyk@gmail.com>
2024-03-23 19:51:29 +01:00
Bryan Boreham 773170f372
Merge pull request #13822 from dgl/promtool-test-errors
promtool: Avoid using testify for user rule tests
2024-03-23 09:42:34 +01:00
David Leadbeater 7ec4a11472 promtool: Avoid using testify for user rule tests
Using testify outside of unit tests results in panics rather than a
useful error for the user.

Fixes #13703

Signed-off-by: David Leadbeater <dgl@dgl.cx>
2024-03-21 22:08:10 +11:00
heyitao c7ca85388f Fix yaml file format and clear ci errors
Signed-off-by: heyitao <heyitao@uniontech.com>
2024-03-21 11:32:02 +08:00
machine424 3eed6c760a
adjust signal termination warning log
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-03-14 18:45:45 +01:00
Arthur Silva Sens 07355c9199
Bump client_golang to 1.19
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-03-06 09:11:13 -03:00
Bryan Boreham e79b9ed2ab
Merge pull request #13194 from machine424/open
promtool: add a "tsdb dump-openmetrics" to dump in OpemMetrics format.
2024-02-28 17:46:59 +00:00
machine424 4b71f6ffc2
promtool: add a "tsdb dump-openmetrics" to dump in OpemMetrics format.
This closes the loop, as the output can be fed into "tsdb create-blocks-from openmetrics"

Native histograms are not supported.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 15:34:18 +01:00
machine424 f477e0539a
Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21
Prevent adding back golang.org/x/exp/slices.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 14:54:53 +01:00
Bryan Boreham eff3a13e19 model/textparse: parsers take a labels SymbolTable
This allows strings to be interned to save memory.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Łukasz Mierzwa 5597020a60 Use github.com/klauspost/compress for gzip and zlib
klauspost/compress is a high quality drop-in replacement for common Go
compression libraries. Since Prometheus sends out a lot of HTTP requests
that often return compressed output having improved compression
libraries helps to save cpu & memory resources.
On a test Prometheus server I was able to see cpu reduction from 31 to
30 cores.

Benchmark results:

name                                old time/op    new time/op    delta
TargetScraperGzip/metrics=1-8         69.4µs ± 4%    69.2µs ± 3%     ~     (p=0.122 n=50+50)
TargetScraperGzip/metrics=100-8       84.3µs ± 2%    80.9µs ± 2%   -4.02%  (p=0.000 n=48+46)
TargetScraperGzip/metrics=1000-8       296µs ± 1%     274µs ±14%   -7.35%  (p=0.000 n=47+45)
TargetScraperGzip/metrics=10000-8     2.06ms ± 1%    1.66ms ± 2%  -19.34%  (p=0.000 n=47+45)
TargetScraperGzip/metrics=100000-8    20.9ms ± 2%    17.5ms ± 3%  -16.50%  (p=0.000 n=49+50)

name                                old alloc/op   new alloc/op   delta
TargetScraperGzip/metrics=1-8         6.06kB ± 0%    6.07kB ± 0%   +0.24%  (p=0.000 n=48+48)
TargetScraperGzip/metrics=100-8       7.04kB ± 0%    6.89kB ± 0%   -2.17%  (p=0.000 n=49+50)
TargetScraperGzip/metrics=1000-8      9.02kB ± 0%    8.35kB ± 1%   -7.49%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=10000-8     18.1kB ± 1%    16.1kB ± 2%  -10.87%  (p=0.000 n=47+47)
TargetScraperGzip/metrics=100000-8    1.21MB ± 0%    1.01MB ± 2%  -16.69%  (p=0.000 n=36+50)

name                                old allocs/op  new allocs/op  delta
TargetScraperGzip/metrics=1-8           71.0 ± 0%      72.0 ± 0%   +1.41%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=100-8         81.0 ± 0%      76.0 ± 0%   -6.17%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=1000-8        92.0 ± 0%      83.0 ± 0%   -9.78%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=10000-8       93.0 ± 0%      91.0 ± 0%   -2.15%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=100000-8       111 ± 0%       135 ± 1%  +21.89%  (p=0.000 n=40+50)

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-02-22 17:08:15 +00:00
Bryan Boreham 5a6c8f9c15 promtool: use go-cmp instead of DeepEqual
go-cmp allows more control over unexported fields and implementation
details.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:30:20 +00:00
Bryan Boreham 39af788dbd Tests: use replacement DeepEquals using go-cmp
Use DeepEqual replacement using go-cmp, which is more flexible.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:30:20 +00:00
Bryan Boreham 16e68c01e4 tests: remove err from message when testify prints it already
For instance `require.NoError` will print the unexpected error; we don't
need to include it in the message.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-01 14:18:01 +00:00
Paweł Szulik d5eb636a89 Refactor cmd tests to use testify.
Signed-off-by: Paweł Szulik <paul.szulik@gmail.com>
2024-02-01 13:51:31 +00:00
Marco Pracucci ac1c6eb3ef
Fix typo in CLI flag description
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:11:50 +01:00
Danny Kopping 5bda33375a
Rename flag
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping e7758d187e
Refactor concurrency control
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:39 +01:00
Danny Kopping ed2933ca60
Add feature flag
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:07 +01:00
Danny Kopping 940f83a540
Implementation
NOTE:
Rebased from main after refactor in #13014

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:07:15 +01:00
Filip Petkovski 583f3e587c
Optimize histogram iterators (#13340)
Optimize histogram iterators

Histogram iterators allocate new objects in the AtHistogram and
AtFloatHistogram methods, which makes calculating rates over long
ranges expensive.

In #13215 we allowed an existing object to be reused
when converting an integer histogram to a float histogram. This commit follows
the same idea and allows injecting an existing object in the AtHistogram and
AtFloatHistogram methods. When the injected value is nil, iterators allocate
new histograms, otherwise they populate and return the injected object.

The commit also adds a CopyTo method to Histogram and FloatHistogram which
is used in the BufferedIterator to overwrite items in the ring instead of making
new copies.

Note that a specialized HPoint pool is needed for all of this to work 
(`matrixSelectorHPool`).

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-01-23 17:02:14 +01:00
Paulin Todev 78411d5e8b
SD Managers taking over responsibility for registration of debug metrics (#13375)
SD Managers take over responsibility for SD metrics registration

---------

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-01-23 16:53:55 +01:00
SuperQ 2c0f9d1eee
Add automatic memory limit handling
Enable automatic detection of memory limits and configure GOMEMLIMIT to
match.
* Also includes a flag to allow controlling the reserved ratio.

Signed-off-by: SuperQ <superq@gmail.com>
2024-01-23 14:53:57 +01:00
Rewanth Tammana 102fd8cc88
Enhanced visibility for promtool test rules with JSON colored formatting (#13342)
* Added diff flag for unit test to improvise readability & debugging

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Removed blank spaces

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Fixed linting error

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Added cli flags to documentation

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Revert unrrelated linting fixes

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Fixed review suggestions

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Cleanup

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Updated flag description

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

* Updated flag description

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>

---------

Signed-off-by: Rewanth Tammana <22347290+rewanthtammana@users.noreply.github.com>
2024-01-18 09:49:16 -05:00
tyltr f97fa2736c remove obsolete build tag
Signed-off-by: tyltr <tylitianrui@126.com>
2024-01-17 22:26:32 +08:00
Giedrius Statkevičius b695e069b8
tsdb/main: wire "EnableOverlappingCompaction" to tsdb.Options (#13398)
This added the https://github.com/prometheus/prometheus/pull/13393
"EnableOverlappingCompaction" parameter to the compactor code but not to
the tsdb.Options. I forgot about that. Add it to `tsdb.Options` too and
set it to `true` in Prometheus.

Copy/paste the description from
https://github.com/prometheus/prometheus/pull/13393#issuecomment-1891787986

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-01-15 16:42:40 +01:00
Ayoub Mrini ace9c8a3da
promtool: allow setting multiple matchers to "promtool tsdb dump" command. (#13296)
Conditions are ANDed inside the same matcher but matchers are ORed

Including unit tests for "promtool tsdb dump".

Refactor some matchers scraping utils.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-01-15 10:29:53 +00:00
zenador 6150e1ca0e
Add analyze histograms command to promtool (#12331)
Add `query analyze` command to promtool

This command analyzes the buckets of classic and native histograms,
based on data queried from the Prometheus query API, i.e. it
doesn't require direct access to the TSDB files.

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-01-10 17:32:36 +01:00
Björn Rabenstein 9825a80498
Merge pull request #13023 from ptodev/prefer-to-not-register-metrics-globally-sd
Allow non-default registry to be used for metrics of SD components
2023-12-12 17:58:36 +01:00
Giedrius Statkevičius f36b56a62c
tsdb: remove unused option (#13282)
Digging around the TSDB code and I've found that this flag is unused so
let's remove it.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2023-12-12 09:58:54 +00:00
Paulin Todev 6279497124
Remove unnecessary else.
This was flagged by the linter.

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 11:14:27 +00:00
Paulin Todev 6a5306a53c
Use const labels for Discovery Manager metrics.
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 11:14:27 +00:00
Paulin Todev 6de80d7fb0
Allow non-default registry to be used for metrics of SD components
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 11:14:26 +00:00
Arthur Silva Sens 5082655392
Append Created Timestamps (#12733)
* Append created timestamps.

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Log when created timestamps are ignored

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Proposed changes to Append CT PR.

Changes:

* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Updates.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Refactor head_appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Fix linter issues

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Use model.Sample in head appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

---------

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
2023-12-11 08:43:42 +00:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Julien Pivotto 74cd1b6a09
Merge branch 'main' into add-focus-flag-to-promtool-test-rules
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-12-05 12:27:15 +01:00
Julien Levesy 501f514389
feat(tsdb/agent): notify remote storage when commit happens (#13223)
Signed-off-by: Julien Levesy <jlevesy@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
2023-12-01 14:00:26 -08:00
Oleksandr Redko 2a75604f8e
Enable default revive rules (#13068)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-11-29 17:23:34 +00:00
Fiona Liao 5bee0cfce2
Change ChunkReader.Chunk() to ChunkOrIterable()
The ChunkReader interface's Chunk() has been changed to ChunkOrIterable(). 

This is a precursor to OOO native histogram support - with OOO native histograms, the chunks.Meta passed to Chunk() can result in multiple chunks being returned rather than just a single chunk (e.g. if oooMergedChunk has a counter reset in the middle). 

To support this, ChunkOrIterable() requires either a single chunk or an iterable to be returned. If an iterable is returned, the caller has the responsibility of converting the samples from the iterable into possibly multiple chunks. The OOOHeadChunkReader now returns an iterable rather than a chunk to prepare for the native histograms case. Also as a beneficial side effect, oooMergedChunk and boundedChunk has been simplified as they only need to implement the Iterable interface now, not the full Chunk interface.

---------

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2023-11-28 11:14:29 +01:00
Julien Pivotto c92fbf3fdf Add feature flag for PromQL experimental functions.
This PR adds an Experimental flag to the functions.

This can be used by https://github.com/prometheus/prometheus/pull/13059
but also xrate and other future functions.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-11-14 17:48:58 +01:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Oleksandr Redko 8e5f0387a2
ci(lint): enable nolintlint and remove redundant comments (#12926)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 12:35:13 +01:00
Björn Rabenstein f2e02c52db
Merge pull request #12958 from ptodev/prefer-to-not-register-metrics-globally-scrape
Metrics in the "scrape" package can now be registered with a non-default registry
2023-10-24 13:40:04 +02:00
Rens Groothuijsen 122f9506e9
Set test group interval default to evaluation interval (#13011)
Signed-off-by: Rens Groothuijsen <l.groothuijsen@alumni.maastrichtuniversity.nl>
2023-10-20 21:32:46 +11:00
Bartlomiej Plotka 6d083312e7
native-histograms: Fixed PrometheusProto scrape format preference. (#13010)
Broken by https://github.com/prometheus/prometheus/pull/12738. We have to update both global variables (as GlobalConfig is not a pointer here).
DefaultConfig is used when no global: section is provided, whereas DefaultGlobalConfig is used when it's provided and for individual scrape configs.

Reported on #prometheus-dev (thanks to @beorn7): https://cloud-native.slack.com/archives/C01AUBA4PFE/p1697733267205649

Tested manually, it would be nice to add test at some point (quick fix for now).

Signed-off-by: bwplotka <bwplotka@gmail.com>
2023-10-19 20:38:45 +01:00
George Krajcsovits 7d7b9eacff
Fix int32 overflow issues (#12978)
On a 32 bit architecture the size of int is 32 bits. Thus converting from
int64, uint64 can overflow it and flip the sign.

Try for yourself in playground:
package main

import "fmt"

func main() {
	x := int64(0x1F0000001)
	y := int64(1)
	z := int32(x - y) // numerically this is 0x1F0000000
	fmt.Printf("%v\n", z)
}

Prints -268435456 as if x was smaller.

Followup to #12650

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2023-10-16 16:23:26 +02:00
Ziqi Zhao 1a6edff882
enhance promtool tsdb analyze command (#12869)
Improve promtool tsdb analyze

- Make it more suitable for variable size float chunks.
- Add support for histogram chunks.

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-10-14 20:34:50 +02:00
Paulin Todev 5752050b42
Scrape metrics can now be registered with a non-default registry.
* A registerer is passed to the scrape Manager,
and all scrape metrics register with it.
* For now the registry which we pass to the scrape
Manager is still the global one.

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-10-11 16:19:00 +01:00
Bartlomiej Plotka 624b973ebf
Added ability to specify scrape protocols to accept during HTTP content type negotiation. (#12738)
* Added ability to specify scrape protocols to accept during HTTP content type negotiation.


This is done via new option in GlobalConfig and ScrapeConfig: "scrape_protocol"

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fixed readability and log message.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
2023-10-10 11:16:55 +01:00
Matthieu MOREL 67dcca5005 ci(lint): enable errorlint linter on cmd
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-09-28 08:21:01 +00:00
ouyang1204 5d233df7ef
Fix rule check broken (#12715)
Signed-off-by: DrAuYueng <ouyang1204@gmail.com>
2023-09-25 17:48:05 +10:00
Goutham Veeramachaneni 86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Bryan Boreham 5ecea3c840 promtool: fix compile error from bad merge
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-09-20 16:34:20 +00:00
Ben Ye c78124427e
Support specifying series matchers to analyze tsdb (#12842)
* support specifying series matchers to analyze tsdb

Signed-off-by: Ben Ye <benye@amazon.com>

* fix cli docs

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2023-09-20 11:37:32 +01:00
Paschalis Tsilias c173cd57c9
Add a header to count retried remote write requests (#12729)
Header name is `Retry-Attempt`, only set when >0.

Signed-off-by: Marc Tuduri <marctc@protonmail.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
2023-09-20 11:11:03 +01:00
zenador 69edd8709b
Add warnings (and annotations) to PromQL query results (#12152)
Return annotations (warnings and infos) from PromQL queries

This generalizes the warnings we have already used before (but only for problems with remote read) as "annotations".

Annotations can be warnings or infos (the latter could be false positives). We do not treat them different in the API for now and return them all as "warnings". It would be easy to distinguish them and return infos separately, should that appear useful in the future.

The new annotations are then used to create a lot of warnings or infos during PromQL evaluations. Partially these are things we have wanted for a long time (e.g. inform the user that they have applied `rate` to a metric that doesn't look like a counter), but the new native histograms have created even more needs for those annotations (e.g. if a query tries to aggregate float numbers with histograms).

The annotations added here are not yet complete. A prominent example would be a warning about a range too short for a rate calculation. But such a warnings is more tricky to create with good fidelity and we will tackle it later.

Another TODO is to take annotations into account when evaluating recording rules.

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-09-14 18:57:31 +02:00
Arve Knudsen 156222cc50
Add context argument to LabelQuerier.LabelValues (#12665)
Add context argument to LabelQuerier.LabelValues and
LabelQuerier.SortedLabelValues.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-14 16:02:04 +02:00
Arve Knudsen a964349e97
Add context argument to LabelQuerier.LabelNames (#12666)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-14 10:39:51 +02:00
Arve Knudsen 4451ba10b4
Add context argument to IndexReader.Postings (#12667)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-13 17:45:06 +02:00
Arve Knudsen 6ef9ed0bc3
Add context argument to DB.Delete (#12834)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-13 15:43:06 +02:00
Arve Knudsen 6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Ziqi Zhao eaaa21aa7f
promtool tsdb dump support native histogram (#12775)
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
2023-09-01 13:21:52 +10:00
Gregor Zeitlinger f01718262a
Unit tests for native histograms (#12668)
promql: Extend testing framework to support native histograms

This includes both the internal testing framework as well as the rules unit test feature of promtool.

This also adds a bunch of basic tests. Many of the code level tests can now be converted to tests within the framework, and more tests can be added easily.

---------

Signed-off-by: Harold Dost <h.dost@criteo.com>
Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
Signed-off-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Harold Dost <h.dost@criteo.com>
Co-authored-by: Stephen Lang <stephen.lang@grafana.com>
Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
2023-08-25 23:35:42 +02:00
Sylvain Rabot 4399959f79
Remove native histograms / memory snapshot restriction
Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
2023-08-18 15:15:55 +02:00
Goutham Veeramachaneni ad4f514e66
Add OTLP Ingestion endpoint (#12571)
* Add OTLP Ingestion endpoint

We copy files from the otel-collector-contrib. See the README in
`storage/remote/otlptranslator/README.md`.

This supersedes: https://github.com/prometheus/prometheus/pull/11965

Signed-off-by: gouthamve <gouthamve@gmail.com>

* Return a 200 OK

It is what the OTEL Golang SDK expect :(

https://github.com/open-telemetry/opentelemetry-go/issues/4363

Signed-off-by: Goutham <gouthamve@gmail.com>

---------

Signed-off-by: gouthamve <gouthamve@gmail.com>
Signed-off-by: Goutham <gouthamve@gmail.com>
2023-07-28 12:35:28 +02:00
Julien Pivotto b3b669fd9a Add experimental flag and docs
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-07-12 22:33:49 +02:00
Rob Skillington e1ace8d00e Add PromQL format and label matcher set/delete commands to promtool
Signed-off-by: Rob Skillington <rob@chronosphere.io>
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-07-12 22:33:44 +02:00
Justin Lei 32d87282ad
Add Zstandard compression option for wlog (#11666)
Snappy remains as the default compression but there is now a flag to switch 
the compression algorithm.

Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-07-11 14:57:57 +02:00
Bryan Boreham 578e2b6a3f re-order imports for linter
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-08 12:39:33 +00:00
Bryan Boreham 5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
João Vilaça 81394ea1c5 Add --run flag to promtool test rules
Signed-off-by: João Vilaça <jvilaca@redhat.com>
2023-06-28 17:57:32 +01:00
Julien Pivotto 1214d314c3
Merge pull request #12225 from fgouteroux/feat/promtool_check_rules_stdin
promtool: read from stdin if no filenames are provided in check rules
2023-06-27 13:22:00 +02:00
Julien Pivotto 771f512757
Merge pull request #12299 from fgouteroux/promtool_push_metrics_cmd
feat(promtool): add push metrics command
2023-06-27 10:42:14 +02:00
François Gouteroux 58d38c4c56 fix: apply suggested changes
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-06-27 09:30:39 +02:00
tyltr 0941ea4afc typo
Signed-off-by: tyltr <tylitianrui@126.com>
2023-06-16 11:09:19 +08:00
François Gouteroux f676d4a756 feat refactoring checkrules func
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-06-01 18:04:53 +02:00
Nidhey Nitin Indurkar a8772a4178
Feat: Get block by id directly on promtool analyze & get latest block if ID not provided (#12031)
* feat: analyze latest block or block by ID in CLI (promtool)

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>

* address remarks

Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>

* address latest review comments

Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>

---------

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>
Signed-off-by: nidhey60@gmail.com <nidhey.indurkar@infracloud.io>
2023-06-01 17:13:09 +05:30
François Gouteroux 6ae4a46845 feat: enhance stdin check and add tests parsing error
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-06-01 10:28:55 +02:00
François Gouteroux 4341b98eb2 fix: apply suggested changes
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-05-28 19:55:00 +02:00
François Gouteroux 934c5ddb8d feat: make push metrics labels generic and repeatable
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-05-24 11:33:07 +02:00
François Gouteroux 3524a16aa0 feat: add suggested changes, tests, and stdin support
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-05-23 11:15:29 +02:00
Baskar Shanmugam 905a0bd63a
Added 'limit' query parameter support to /api/v1/status/tsdb endpoint (#12336)
* Added 'topN' query parameter support to /api/v1/status/tsdb endpoint

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Updated query parameter for tsdb status to 'limit'

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Corrected Stats() parameter name from topN to limit

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

* Fixed p.Stats CI failure

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>

---------

Signed-off-by: Baskar Shanmugam <baskar.shanmugam.career@gmail.com>
2023-05-22 14:37:07 +02:00
Callum Styan 0d2108ad79
[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949)
* WIP implement WAL watcher reading via notifications over a channel from
the TSDB code

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Notify via head appenders Commit (finished all WAL logging) rather than
on each WAL Log call

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix misspelled Notify plus add a metric for dropped Write notifications

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update tests to handle new notification pattern

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* this test maybe needs more time on windows?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* does this test need more time on windows as well?

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* read timeout is already a time.Duration

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* remove mistakenly commited benchmark data files

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* address some review feedback

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix missed changes from previous commit

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues from wrapper function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* try fixing race condition in test by allowing tests to overwrite the
read ticker timeout instead of calling the Notify function

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* fix linting

Signed-off-by: Callum Styan <callumstyan@gmail.com>

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2023-05-15 12:31:49 -07:00
Björn Rabenstein 37fe9b89dc
Merge pull request #12055 from leizor/leizor/prometheus/issues/12009
Adjust samplesPerChunk from 120 to 220
2023-05-10 14:45:12 +02:00
François Gouteroux b1bab7bc54 feat(promtool): add push metrics command
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-04-27 14:49:38 +02:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Ben Ye fd3630b9a3 add ctx to QueryEngine interface
Signed-off-by: Ben Ye <benye@amazon.com>
2023-04-17 21:32:38 -07:00
Justin Lei 052993414a Add storage.tsdb.samples-per-chunk flag
Signed-off-by: Justin Lei <justin.lei@grafana.com>
2023-04-13 15:59:49 -07:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
beorn7 c0879d64cf promql: Separate Point into FPoint and HPoint
In other words: Instead of having a “polymorphous” `Point` that can
either contain a float value or a histogram value, use an `FPoint` for
floats and an `HPoint` for histograms.

This seemingly small change has a _lot_ of repercussions throughout
the codebase.

The idea here is to avoid the increase in size of `Point` arrays that
happened after native histograms had been added.

The higher-level data structures (`Sample`, `Series`, etc.) are still
“polymorphous”. The same idea could be applied to them, but at each
step the trade-offs needed to be evaluated.

The idea with this change is to do the minimum necessary to get back
to pre-histogram performance for functions that do not touch
histograms. Here are comparisons for the `changes` function. The test
data doesn't include histograms yet. Ideally, there would be no change
in the benchmark result at all.

First runtime v2.39 compared to directly prior to this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     542µs ± 1%  +38.58%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     617µs ± 2%  +36.48%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.36ms ± 2%  +21.58%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    8.94ms ± 1%  +14.21%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.30ms ± 1%  +10.67%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.10ms ± 1%  +11.82%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    11.8ms ± 1%  +12.50%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    87.4ms ± 1%  +12.63%  (p=0.000 n=9+9)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    32.8ms ± 1%   +8.01%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.6ms ± 2%   +9.64%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     117ms ± 1%  +11.69%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     876ms ± 1%  +11.83%  (p=0.000 n=9+10)
```

And then runtime v2.39 compared to after this commit:

```
name                                                  old time/op    new time/op    delta
RangeQuery/expr=changes(a_one[1d]),steps=1-16            391µs ± 2%     547µs ± 1%  +39.84%  (p=0.000 n=9+8)
RangeQuery/expr=changes(a_one[1d]),steps=10-16           452µs ± 2%     616µs ± 2%  +36.15%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_one[1d]),steps=100-16         1.12ms ± 1%    1.26ms ± 1%  +12.20%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_one[1d]),steps=1000-16        7.83ms ± 1%    7.95ms ± 1%   +1.59%  (p=0.000 n=10+8)
RangeQuery/expr=changes(a_ten[1d]),steps=1-16           2.98ms ± 0%    3.38ms ± 2%  +13.49%  (p=0.000 n=9+10)
RangeQuery/expr=changes(a_ten[1d]),steps=10-16          3.66ms ± 1%    4.02ms ± 1%   +9.80%  (p=0.000 n=10+9)
RangeQuery/expr=changes(a_ten[1d]),steps=100-16         10.5ms ± 0%    10.8ms ± 1%   +3.08%  (p=0.000 n=8+10)
RangeQuery/expr=changes(a_ten[1d]),steps=1000-16        77.6ms ± 1%    78.1ms ± 1%   +0.58%  (p=0.035 n=9+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1-16       30.4ms ± 2%    33.5ms ± 4%  +10.18%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=10-16      37.1ms ± 2%    40.0ms ± 1%   +7.98%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=100-16      105ms ± 1%     107ms ± 1%   +1.92%  (p=0.000 n=10+10)
RangeQuery/expr=changes(a_hundred[1d]),steps=1000-16     783ms ± 3%     775ms ± 1%   -1.02%  (p=0.019 n=9+9)
```

In summary, the runtime doesn't really improve with this change for
queries with just a few steps. For queries with many steps, this
commit essentially reinstates the old performance. This is good
because the many-step queries are the one that matter most (longest
absolute runtime).

In terms of allocations, though, this commit doesn't make a dent at
all (numbers not shown). The reason is that most of the allocations
happen in the sampleRingIterator (in the storage package), which has
to be addressed in a separate commit.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-13 19:25:16 +02:00
François Gouteroux 8472596fd0 fix: apply suggested changes
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-04-05 14:51:08 +02:00
François Gouteroux 034eb2b3f2 promtool: read from stdin if no filenames are provided in check rules
Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com>
2023-04-05 11:33:47 +02:00
Julien Pivotto 391473141d
Check health & ready: move to flags (#12223)
This makes it more consistent with other command like import rules. We
don't have stricts rules and uniformity accross promtool unfortunately,
but I think it's better to only have the http config on relevant check
commands to avoid thinking Prometheus can e.g. check the config over the
wire.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-04-05 09:45:39 +02:00
Ganesh Vernekar 5588cab8b2
Merge pull request #12173 from bboreham/builder-no-empty-labels
labels: simplify call to get Labels from Builder
2023-04-04 12:02:55 +05:30
Nidhey Nitin Indurkar 3f7beeecc6
feat: health and readiness check of prometheus server in CLI (promtool) (#12096)
* feat: health and readiness check of prometheus server in CLI (promtool)

Signed-off-by: nidhey27 <nidhey.indurkar@infracloud.io>
2023-04-03 22:32:39 +02:00
Łukasz Mierzwa f2b9a39a48 Use a random port in cmd/prometheus tests
There are a few tests that will run prometheus command.
This can test if there's already something listening on port :9090 since --web.listen-address defaults to 0.0.0.0:9090.
To fix that we can tell prometheus to use a random port on loopback interface.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-04-03 11:21:50 +01:00
Bryan Boreham b987afa7ef labels: simplify call to get Labels from Builder
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.

Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.

Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-22 17:05:20 +00:00
Julien Pivotto 1922db0586 Document command line tools
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-13 14:20:55 +01:00
Bryan Boreham b96b89ef8b
Merge pull request #12048 from bboreham/faster-targets
Scraping targets are synced by creating the full set, then adding/removing any which have changed.
This PR speeds up the process of creating the full set.

I added a benchmark for `TargetsFromGroup`; it uses configuration from a typical Kubernetes SD.

The crux of the change is to do relabeling inside labels.Builder instead of converting to labels.Labels and back again for every rule. The change is broken into several commits for easier review.

This is a breaking change to `scrape.PopulateLabels()`, but `relabel.Process` is left as-is, with a new `relabel.ProcessBuilder` option.
2023-03-09 11:10:01 +00:00
Julien Pivotto 0c56e5d014 Update our own dependencies, support proxy from env
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-08 12:00:17 +01:00
Bryan Boreham f4fd9b0d68 scrape: re-use memory in TargetsFromGroup
Common service discovery mechanisms such as Kubernetes can generate a
lot of target groups, so this function was allocating a lot of memory
which then immediately became garbage. Re-using the structures across
an entire Sync saves effort.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-07 17:21:37 +00:00
Bryan Boreham 5cfe759348 scrape: make TargetsFromGroup work with Builder not []Label
Save work converting to `Labels` then to `Builder`.
`PopulateLabels()` now takes as Builder as input.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-03-07 17:21:37 +00:00
Julien Pivotto 599b70a05d Add include scrape configs
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-03-06 23:35:39 +01:00
Martin Chodur f1de2cec3d fix: set the http round tripper fro promtool import command
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
2023-02-10 23:23:38 +01:00
Martin Chodur 3ebe4b48db feat: add promtool http config support
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
2023-02-10 14:53:20 +01:00
Amin Borjian 90d6873c7f promtool: add support of selecting timeseries for TSDB dump
Dumping without any limit on the data being dumped will generate
a large amount of data. Also, sometimes it is necessary to dump
only a part of the data in order to change or transfer it.

This change allows to specify a part of the data to dump and
by default works same as before. (no public API change)

Signed-off-by: Amin Borjian <borjianamin98@outlook.com>
2023-01-20 15:46:23 +03:30
Marc Tudurí 9474610baf
Support FloatHistogram in TSDB (#11522)
Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, WAL replay, for this new type of histogram.

Signed-off-by: Marc Tudurí <marctc@protonmail.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-12-28 14:25:07 +05:30
Bryan Boreham 10b27dfb84 Simplify IndexReader.Series interface
Instead of passing in a `ScratchBuilder` and `Labels`, just pass the
builder and the caller can extract labels from it. In many cases the
caller didn't use the Labels value anyway.

Now in `Labels.ScratchBuilder` we need a slightly different API: one
to assign what will be the result, instead of overwriting some other
`Labels`. This is safer and easier to reason about.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 19f300e6f0 Update package cmd/promtool tests for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham bf2c827d91 Update package cmd/promtool for new labels.Labels type
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham d3d96ec887 tsdb/index: use ScratchBuilder to create Labels
This necessitates a change to the `tsdb.IndexReader` interface:
`index.Reader` is used from multiple goroutines concurrently, so we
can't have state in it.

We do retain a `ScratchBuilder` in `blockBaseSeriesSet` which is
iterator-like.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-19 15:22:09 +00:00
Bryan Boreham 3c7de69059 storage: allow re-use of iterators
Patterned after `Chunk.Iterator()`: pass the old iterator in so it
can be re-used to avoid allocating a new object.

(This commit does not do any re-use; it is just changing all the method
signatures so re-use is possible in later commits.)

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-12-15 18:32:45 +00:00
Ganesh Vernekar 6dd4e907a3
Update dependencies for 2.40 (#11524)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-11-03 11:48:01 -04:00
Ganesh Vernekar 04b370da00
Disable snapshot on shutdown if native histograms are enabled (#11473)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-19 15:21:29 +05:30
Ganesh Vernekar 05b7af28ee
Merge pull request #11450 from codesome/fix-conflict
Sync sparsehistogram branch with main branch
2022-10-12 18:33:26 +05:30
Ganesh Vernekar 648be89822
Merge remote-tracking branch 'upstream/main' into fix-conflict
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 14:20:02 +05:30
Ganesh Vernekar 081ad2d690
Update help text for enable-feature to mention native-histograms
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 13:27:37 +05:30
Ganesh Vernekar 3cbf87b83d
Enable protobuf negotiation only when histograms are enabled
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-12 13:27:22 +05:30
Ganesh Vernekar 46b26c4f09
Fix notifier relabel changing the labels of active alerts (#11427)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-10-07 20:28:17 +05:30
Jesus Vazquez e934d0f011 Merge 'main' into sparsehistogram
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
2022-10-05 22:14:49 +02:00
Ganesh Vernekar f34aeefe6e
Allow overlapping blocks by default (#11331)
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-28 19:17:54 +05:30
Paschalis Tsilias f2ee959354
Remove 'metadata-storage' CLI flag (#11351)
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
2022-09-27 12:05:09 +05:30
Jesus Vazquez c1b669bf9b
Add out-of-order sample support to the TSDB (#11075)
* Introduce out-of-order TSDB support

This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing

This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.

Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec
- 2836e5513f
- 00b379c3a5
- ff0dc75758
- a632c73352
- c6f3d4ab33
- 5e8406a1d4
- abde1e0ba1
- e70e769889
- df59320886

Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* gofumpt files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add license header to missing files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO tests due to existing chunk disk mapper implementation

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix truncate int overflow

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add Sync method to the WAL and update tests

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* remove useless sync

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Update minOOOTime after truncating Head

* Update minOOOTime after truncating Head

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add a unit test

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Load OutOfOrderTimeWindow only once per appender

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO Head LabelValues and PostingsForMatchers

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix replay of OOO mmap chunks

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Remove unnecessary err check

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Prevent panic with ApplyConfig

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run OOO compaction after restart if there is OOO data from WBL

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Apply Bartek's suggestions

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Refactor OOO compaction

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address comments and TODOs

- Added a comment explaining why we need the allow overlapping
  compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run go format

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix remaining review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix tests

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix TestWBLAndMmapReplay test failure on windows

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address most of the feedback

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Refactor the block meta for out of order

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix windows error

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2022-09-20 22:35:50 +05:30
Ganesh Vernekar d354f20c2a
Add a feature flag to control native histogram ingestion (#11253)
* Add runtime config to control native histogram ingestion

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Make the config into a CLI flag

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2022-09-14 17:38:34 +05:30
Bryan Boreham c438b50133 cmd/promtool: in tests use labels.FromStrings
Replacing code which assumes the internal structure of `Labels`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
Bryan Boreham 735914f692 cmd/prometheus: in tests use labels.FromStrings
Replacing code which assumes the internal structure of `Labels`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-09-09 13:34:49 +02:00
Cosrider bef6556ca5
delete redundant alias (#11180)
Signed-off-by: Cosrider <cosrider7@gmail.com>

Signed-off-by: Cosrider <cosrider7@gmail.com>
2022-08-31 15:50:38 +02:00
Paschalis Tsilias 5a8e202f94
Append metadata to the WAL in the scrape loop (#10312)
* Append metadata to the WAL

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra whitespace; Reword some docstrings and comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use RLock() for hasNewMetadata check

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use single byte for metric type in RefMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Update proposed WAL format for single-byte type metadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Address first round of review comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Amend description of metadata in wal.md

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Correct key used to retrieve metadata from cache

When we're setting metadata entries in the scrapeCace, we're using the
p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and
use it as the cache key. When checking for cache entries though, we used
p.Series() as the key, which included the metric name _with_ its labels.
That meant that we were never actually hitting the cache. We're fixing
this by utiling the __name__ internal label for correctly getting the
cache entries after they've been set by setHelp(), setType() or
setUnit().

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Put feature behind a feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Reorder WAL format document

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix CR comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Extract logic about changing metadata in an anonymous function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Implement new proposed WAL format and amend relevant tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use 'const' for metadata field names

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Apply metadata to head memSeries in Commit, not in AppendMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add docstring and rename extracted helper in scrape.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix review comments around TestMetadata* tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rebase with merged TSDB changes; fix duplicate definitions after rebase

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove leftover changes on db_test.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rename feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Simplify updateMetadata helper function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra newline

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
2022-08-31 15:50:05 +02:00
Bryan Boreham 8b863c42dd
Optimise relabeling by re-using memory (#11147)
* model/relabel: Add benchmark

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use Builder across relabels

Saves memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* labels.Builder: allow re-use of result slice

This reduces memory allocations where the caller has a suitable slice available.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* model/relabel: re-use source values slice

To reduce memory allocations.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Unwind one change causing test failures

Restore original behaviour in PopulateLabels, where we must not overwrite the input set.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* relabel: simplify values optimisation

Use a stack-based array for up to 16 source labels, which will be the
vast majority of cases.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* lint

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2022-08-19 15:27:52 +05:30
beorn7 c9fd3c235d Merge branch 'main' into sparsehistogram 2022-08-10 17:54:37 +02:00
Levi Harrison fa9bc5184a
Update and fix interface (#11131)
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2022-08-10 10:14:52 +02:00
Levi Harrison d61459d826
no-default-scrape-port feature flag (#9523)
* Add `no-default-scrape-port` flag

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2022-07-20 13:35:47 +02:00
Paschalis Tsilias d1122e0743
Introduce TSDB changes for appending metadata to the WAL (#10972)
* Append metadata to the WAL

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove extra whitespace; Reword some docstrings and comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use RLock() for hasNewMetadata check

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use single byte for metric type in RefMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Update proposed WAL format for single-byte type metadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Implementa MetadataAppender interface for the Agent

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Address first round of review comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Amend description of metadata in wal.md

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Correct key used to retrieve metadata from cache

When we're setting metadata entries in the scrapeCace, we're using the
p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and
use it as the cache key. When checking for cache entries though, we used
p.Series() as the key, which included the metric name _with_ its labels.
That meant that we were never actually hitting the cache. We're fixing
this by utiling the __name__ internal label for correctly getting the
cache entries after they've been set by setHelp(), setType() or
setUnit().

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Put feature behind a feature flag

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix AppendMetadata docstring

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Reorder WAL format document

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Change error message of AppendMetadata; Fix access of s.meta in AppendMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Reuse temporary buffer in Metadata encoder

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Only keep latest metadata for each refID during checkpointing

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix test that's referencing decoding metadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Avoid creating metadata block if no new metadata are present

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add tests for corrupt metadata block and relevant record type

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix CR comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Extract logic about changing metadata in an anonymous function

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Implement new proposed WAL format and amend relevant tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use 'const' for metadata field names

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Apply metadata to head memSeries in Commit, not in AppendMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add docstring and rename extracted helper in scrape.go

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add tests for tsdb-related cases

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix linter issues vol1

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix linter issues vol2

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix Windows test by closing WAL reader files

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Use switch instead of two if statements in metadata decoding

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix review comments around TestMetadata* tests

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Add code for replaying WAL; test correctness of in-memory data after a replay

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Remove scrape-loop related code from PR

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Address first round of comments

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Simplify tests by sorting slices before comparison

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix test to use separate transactions

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Empty out buffer and record slices after encoding latest metadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix linting issue

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Update calculation for DroppedMetadata metric

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Rename MetadataAppender interface and AppendMetadata method to MetadataUpdater/UpdateMetadata

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Reuse buffer when encoding latest metadata for each series

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Fix review comments; Check all returned error values using two helpers

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Simplify use of helpers

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>

* Satisfy linter

Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
2022-07-19 10:58:52 +02:00
beorn7 28f028e938 Merge branch 'main' into sparsehistogram 2022-07-12 19:07:13 +02:00
Julien Pivotto 7a2d24b76a
Fix flakiness in windows tests (#10983)
Our windows CI is too slow, process takes lots of time to start.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-07-06 10:33:14 +02:00
Julien Pivotto 13bd4fd3c8
Fix promtool check config not erroring properly on failures (#10952)
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-07-01 14:38:49 +02:00
lixin18 735a07444a
Update main_unix_test.go (#10917)
so->,so

Signed-off-by: lixin18 <68135097+lixin963@users.noreply.github.com>
2022-06-27 16:15:51 +02:00