prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
William Dumont	ce6ad15422	remote-write: TestClientRetryAfter status code 500 and compare the retryAfter values. Signed-off-by: William Dumont <william.dumont@grafana.com>	2023-09-20 10:25:43 +00:00
William Dumont	febd62a23e	remote-write: refactor TestClientRetryAfter The new version features a set of test cases that simplify the addition of new HTTP status codes. Signed-off-by: William Dumont <william.dumont@grafana.com>	2023-09-20 10:24:52 +00:00
Bryan Boreham	9b85354acd	remote-write: respect Retry-After header on 5xx errors If the server sent it to us, we should assume it knows better than we do and respect it. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-09-20 10:14:38 +00:00
Paschalis Tsilias	c173cd57c9	Add a header to count retried remote write requests (#12729) Header name is `Retry-Attempt`, only set when >0. Signed-off-by: Marc Tuduri <marctc@protonmail.com> Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>	2023-09-20 11:11:03 +01:00
George Krajcsovits	3512b2d678	storage: make histogram reset handling consistent in chainSampleIterator (#12779) storage: make histogram reset handling consistent in chainSampleIterator --------- Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-09-19 17:06:46 +02:00
zenador	69edd8709b	Add warnings (and annotations) to PromQL query results (#12152) Return annotations (warnings and infos) from PromQL queries This generalizes the warnings we have already used before (but only for problems with remote read) as "annotations". Annotations can be warnings or infos (the latter could be false positives). We do not treat them differently in the API for now and return them all as "warnings". It would be easy to distinguish them and return infos separately, should that appear useful in the future. The new annotations are then used to create a lot of warnings or infos during PromQL evaluations. Partially these are things we have wanted for a long time (e.g. inform the user that they have applied `rate` to a metric that doesn't look like a counter), but the new native histograms have created even more needs for those annotations (e.g. if a query tries to aggregate float numbers with histograms). The annotations added here are not yet complete. A prominent example would be a warning about a range too short for a rate calculation. But such a warning is more tricky to create with good fidelity and we will tackle it later. Another TODO is to take annotations into account when evaluating recording rules. --------- Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-09-14 18:57:31 +02:00
Arve Knudsen	156222cc50	Add context argument to LabelQuerier.LabelValues (#12665) Add context argument to LabelQuerier.LabelValues and LabelQuerier.SortedLabelValues. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-14 16:02:04 +02:00
Arve Knudsen	a964349e97	Add context argument to LabelQuerier.LabelNames (#12666) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-14 10:39:51 +02:00
beorn7	0521ec12af	storage: remove obsolete TODO This was solved one layer deeper with #11687. Signed-off-by: beorn7 <beorn@grafana.com>	2023-09-13 15:10:22 +02:00
Arve Knudsen	6daee89e5f	Add context argument to Querier.Select (#12660) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2023-09-12 12:37:38 +02:00
Gregor Zeitlinger	f01718262a	Unit tests for native histograms (#12668) promql: Extend testing framework to support native histograms This includes both the internal testing framework as well as the rules unit test feature of promtool. This also adds a bunch of basic tests. Many of the code level tests can now be converted to tests within the framework, and more tests can be added easily. --------- Signed-off-by: Harold Dost <h.dost@criteo.com> Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com> Signed-off-by: Stephen Lang <stephen.lang@grafana.com> Co-authored-by: Harold Dost <h.dost@criteo.com> Co-authored-by: Stephen Lang <stephen.lang@grafana.com> Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>	2023-08-25 23:35:42 +02:00
Justin Lei	8ef7dfdeeb	Add a chunk size limit in bytes (#12054) Add a chunk size limit in bytes This creates a hard cap for XOR chunks of 1024 bytes. The limit for histogram chunk is also 1024 bytes, but it is a soft limit as a histogram has a dynamic size, and even a single one could be larger than 1024 bytes. This also avoids cutting new histogram chunks if the existing chunk has fewer than 10 histograms yet. In that way, we are accepting "jumbo chunks" in order to have at least 10 histograms in a chunk, allowing compression to kick in. Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-08-24 15:21:17 +02:00
beorn7	aa82fe198f	tsdb: Fix histogram validation So far, `ValidateHistogram` would not detect if the count did not include the count in the zero bucket. This commit fixes the problem and updates all the tests that have been undetected offenders so far. Note that this problem would only ever create false negatives, so we never falsely rejected to store a histogram because of it. On the other hand, `ValidateFloatHistogram` has been too strict with the count being at least as large as the sum of the counts in all the buckets. Float precision issues could create false positives here, see products of PromQL evaluations, it's actually quite hard to put an upper limit on the floating point imprecision. Users could produce the weirdest expressions, maxing out float precision problems. Therefore, this commit simply removes that particular check from `ValidateFloatHistogram`. Signed-off-by: beorn7 <beorn@grafana.com>	2023-08-22 23:04:01 +02:00
Michael Hoffmann	4d8e380269	promql: allow tests to be imported (#12050) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>	2023-08-18 20:48:59 +02:00
Bryan Boreham	d2ae8dc3cb	remote-write: add http.resend_count tracing attribute As recommended by the OpenTelemetry semantic conventions. https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/http/#http-client Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-08-11 16:20:12 +00:00
Goutham Veeramachaneni	ad4f514e66	Add OTLP Ingestion endpoint (#12571) * Add OTLP Ingestion endpoint We copy files from the otel-collector-contrib. See the README in `storage/remote/otlptranslator/README.md`. This supersedes: https://github.com/prometheus/prometheus/pull/11965 Signed-off-by: gouthamve <gouthamve@gmail.com> * Return a 200 OK It is what the OTEL Golang SDK expects :( https://github.com/open-telemetry/opentelemetry-go/issues/4363 Signed-off-by: Goutham <gouthamve@gmail.com> --------- Signed-off-by: gouthamve <gouthamve@gmail.com> Signed-off-by: Goutham <gouthamve@gmail.com>	2023-07-28 12:35:28 +02:00
George Krajcsovits	6cd2d1621f	Hide histogram chunk append and reset header internals (#12352) tsdb: Hide histogram chunk append and reset header internals Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>	2023-07-26 15:08:16 +02:00
LHHDZ	7d8f9b0978	remote-write receiver: reuse 'ref' to optimize multiple samples for same series (#12580) reuse 'ref' to optimize multi samples processing efficiency Signed-off-by: changlin.shi <changlin.shi@ly.com>	2023-07-22 14:24:46 +01:00
György Krajcsovits	d4e355243a	tsdbutil/ChunkFromSamplesGeneric should not panic Add error handling instead. Prepares for #12352 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-07-20 17:01:34 +02:00
Julien Pivotto	0f85e4f41d	Merge pull request #12539 from bboreham/slices-sorts Replace sort.Slice with faster slices.SortFunc	2023-07-11 13:09:02 +02:00
Bryan Boreham	ce153e3fff	Replace sort.Sort with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-07-10 09:43:45 +00:00
Julien Pivotto	986fde06b2	Merge pull request #11688 from damnever/fix/datamodelvalidation-remotewriteapi Validate the metric names and labels in the remote write handler	2023-07-04 13:52:02 +02:00
Bryan Boreham	5255bf06ad	Replace sort.Slice with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-07-02 22:17:08 +00:00
rakshith210	b1675e23af	Add Azure AD package for remote write (#11944) * Add Azure AD package for remote write * Made AzurePublic default and updated configuration.md * Updated config structure and removed getToken at initialization * Changed passing context from request Signed-off-by: Rakshith Padmanabha <rapadman@microsoft.com> Signed-off-by: rakshith210 <rakshith.me@gmail.com>	2023-06-01 15:20:10 -06:00
Bryan Boreham	a073e04a9b	Merge pull request #12366 from prometheus/release-2.44 Merge release 2.44 back to main	2023-05-16 18:06:29 +01:00
Callum Styan	0d2108ad79	[tsdb] re-implement WAL watcher to read via a "notification" channel (#11949) * WIP implement WAL watcher reading via notifications over a channel from the TSDB code Signed-off-by: Callum Styan <callumstyan@gmail.com> * Notify via head appenders Commit (finished all WAL logging) rather than on each WAL Log call Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix misspelled Notify plus add a metric for dropped Write notifications Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update tests to handle new notification pattern Signed-off-by: Callum Styan <callumstyan@gmail.com> * this test maybe needs more time on windows? Signed-off-by: Callum Styan <callumstyan@gmail.com> * does this test need more time on windows as well? Signed-off-by: Callum Styan <callumstyan@gmail.com> * read timeout is already a time.Duration Signed-off-by: Callum Styan <callumstyan@gmail.com> * remove mistakenly committed benchmark data files Signed-off-by: Callum Styan <callumstyan@gmail.com> * address some review feedback Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix missed changes from previous commit Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues from wrapper function Signed-off-by: Callum Styan <callumstyan@gmail.com> * try fixing race condition in test by allowing tests to overwrite the read ticker timeout instead of calling the Notify function Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix linting Signed-off-by: Callum Styan <callumstyan@gmail.com> --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2023-05-15 12:31:49 -07:00
George Krajcsovits	f5fcaa3872	Fix setting reset header to gauge histogram in seriesToChunkEncoder (#12329) Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-05-05 18:04:30 +05:30
Justin Lei	7bbf24b707	Make MemoizedSeriesIterator not implement chunkenc.Iterator Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-05-03 12:45:39 -07:00
beorn7	b0272255b7	storage: optimise sampleRing Replace many checks for the lengths of slices with a single tracking variable. Signed-off-by: beorn7 <beorn@grafana.com>	2023-05-03 20:09:29 +02:00
Justin Lei	6985dcbe73	Optimize and test MemoizedSeriesIterator Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-05-02 08:53:18 -07:00
Filip Petkovski	0d049feac7	Fix encoding samples in ChunkSeries (#12185) The storage.ChunkSeries iterator assumes that a histogram sample can always be appended to the currently open chunk. This is not the case when there is a counter reset, or when appending a stale sample to a chunk with non-stale samples. In addition, the open chunk sometimes needs to be recoded before a sample can be appended. This commit addresses the issue by implementing a RecodingAppender which can recode incoming samples in a transparent way. It also detects cases when a sample cannot be appended at all and returns `false` so that the caller can open a new chunk. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2023-04-28 16:52:21 -04:00
Jeanette Tan	1102ffd188	Fix according to code review Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-04-22 02:27:15 +08:00
Jeanette Tan	e9a1e26ab7	Perform integer/float histogram type checking on conversions, and use a consistent method for determining integer vs float histogram Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-04-22 02:27:15 +08:00
Björn Rabenstein	78cd9ae2c3	Merge pull request #12264 from rabenhorst/sample-ring-iterator-mixed-histograms-fix Fix for `sampleRingIterator` with mixed histograms	2023-04-20 16:58:18 +02:00
Matthieu MOREL	bae9a21200	Merge branch 'main' into linter/nilerr Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-19 19:56:39 +02:00
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforementioned wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:22:31 +02:00
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golint-ci We haven't updated golint-ci in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset about the "empty block" warning to include `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, though, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:10:10 +02:00
Sebastian Rabenhorst	5d4ec08a1f	Fixed sampleRingIterator for mixed histograms Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Fixed sampleRingIterator for mixed histograms Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Fixed lint	2023-04-14 12:45:36 +02:00
Matthieu MOREL	fb3eb21230	enable gocritic, unconvert and unused linters Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-13 19:20:22 +00:00
beorn7	717a3f8e25	storage: Manually expand `genericAdd` for specific types This commit is doing what I would have expected that Go generics do for me. However, the PromQL benchmarks show a significant runtime and allocation increase with `genericAdd`, so this replaces it with hand-coded non-generic versions of it. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:24 +02:00
beorn7	817a2396cb	Name float values as "floats", not as "values" In the past, every sample value was a float, so it was fine to call a variable holding such a float "value" or "sample". With native histograms, a sample might have a histogram value. And a histogram value is still a value. Calling a float value just "value" or "sample" or "V" is therefore misleading. Over the last few commits, I already renamed many variables, but this cleans up a few more places where the changes are more invasive. Note that we do not attempt renaming in the JSON APIs or in the protobufs. That would be quite a disruption. However, internally, we can call variables as we want, and we should go with the option of avoiding misunderstandings. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:24 +02:00
beorn7	462240bc78	storage: add specialized buffers to sampleRing This utilizes the fact that most sampleRings will only contain samples of one type. In this case, the generic interface is circumvented, and a bespoke buffer for the one actually occurring sample type is used. Should a sampleRing receive a sample of a different kind later, it will transparently switch to the generic behavior. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:24 +02:00
beorn7	630bcb494b	storage: Use separate sample types for histogram vs. float Previously, we had one “polymorphous” `sample` type in the `storage` package. This commit breaks it up into `fSample`, `hSample`, and `fhSample`, each still implementing the `tsdbutil.Sample` interface. This reduces allocations in `sampleRing.Add` but inflicts the penalty of the interface wrapper, which makes things worse in total. This commit therefore just demonstrates the step taken. The next commit will tackle the interface overhead problem. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-13 19:25:24 +02:00
Björn Rabenstein	6e0a46900b	Merge pull request #12192 from leizor/leizor/prometheus/issues/11204 Add support for native histograms to concreteSeriesIterator	2023-04-11 12:30:35 +02:00
Justin Lei	f90013a5a0	Update storage/remote/codec.go Co-authored-by: Björn Rabenstein <github@rabenste.in> Signed-off-by: Justin Lei <97976793+leizor@users.noreply.github.com>	2023-04-06 09:54:15 -07:00
Justin Lei	83f43982c9	Add support for native histograms to concreteSeriesIterator Signed-off-by: Justin Lei <justin.lei@grafana.com>	2023-04-06 09:54:15 -07:00
Xiaochao Dong (@damnever)	2b7202c4cc	Validate the metric names and labels in the remote write handler Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>	2023-04-05 19:09:05 +08:00
Bryan Boreham	b987afa7ef	labels: simplify call to get Labels from Builder It took a `Labels` where the memory could be re-used, but in practice this hardly ever benefitted. Especially after converting `relabel.Process` to `relabel.ProcessBuilder`. Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil` so the slice was reallocated multiple times by `append`. Lastly `Builder.Labels()` now estimates that the final size will depend on labels added and deleted. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-22 17:05:20 +00:00
Björn Rabenstein	559adab471	Merge pull request #12085 from leizor/leizor/prometheus/issues/11204 Handle native histograms in remote read	2023-03-21 17:25:34 +01:00
Oleg Zaytsev	beb7d3b80f	remote.Client: store urlString During remote write, we call url.String() twice: - to add the Endpoint() to the span - to actually know where we should send the request This value does not change over time, and it's not really that lightweight to calculate. I wrote this simple benchmark: func BenchmarkURLString(b *testing.B) { u, err := url.Parse("https://remote.write.com/api/v1") require.NoError(b, err) b.Run("string", func(b *testing.B) { count := 0 for i := 0; i < b.N; i++ { count += len(u.String()) } }) } And the results are ~200ns/op, 80B/op, 3 allocs/op. Yes, we're going to go to the network here, which is a huge amount of resources compared to this, but still, on agents that send 500 requests per second, that is 1500 wasteful allocations per second. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2023-03-16 09:53:10 +01:00
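
The idea described in commit beb7d3b80f (compute the endpoint string once instead of calling url.String() on every request) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the type `cachedClient`, its fields, and the constructor `newCachedClient` are hypothetical names made up for this example; only `url.Parse` and `(*url.URL).String` are real standard-library calls.

```go
package main

import (
	"fmt"
	"net/url"
)

// cachedClient is a hypothetical stand-in for a remote-write client.
// It keeps the parsed URL and a string form computed once at construction.
type cachedClient struct {
	endpoint    *url.URL
	endpointStr string // cached result of endpoint.String()
}

// newCachedClient parses the raw URL and caches its string form, so later
// calls to Endpoint do not rebuild (and re-allocate) the string.
func newCachedClient(raw string) (*cachedClient, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	return &cachedClient{endpoint: u, endpointStr: u.String()}, nil
}

// Endpoint returns the cached string without calling url.String() again.
func (c *cachedClient) Endpoint() string {
	return c.endpointStr
}

func main() {
	c, err := newCachedClient("https://remote.write.com/api/v1")
	if err != nil {
		panic(err)
	}
	// Both the tracing span and the HTTP request could reuse this value.
	fmt.Println(c.Endpoint())
}
```

The trade-off is that the cached string goes stale if the URL is mutated after construction; the commit message states that the value does not change over time, which is what makes caching safe in that setting.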


1333 commits