prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
György Krajcsovits	2a781ec5ac	Replicate infinite loop in native-classic histogram scrape Enable scraping a native histogram with exemplars that leads to an infinite loop. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-08-21 13:12:45 +02:00
Bryan Boreham	611f50bb3d	scrape: retain all dropped targets when KeepDroppedTargets is zero This was a bug. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-08-20 14:32:23 +01:00
Bryan Boreham	627c99424b	scrape: extend TestDroppedTargetsList to check counts Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-08-20 14:32:23 +01:00
Bryan Boreham	1e3fef6ab0	scraping: limit detail on dropped targets, to save memory (#12647) It's possible (quite common on Kubernetes) to have a service discovery return thousands of targets then drop most of them in relabel rules. The main place this data is used is for display in the web UI, where you don't want thousands of lines of display. The new limit is `keep_dropped_targets`, which defaults to 0 (no limit) for backwards-compatibility. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-08-14 15:39:25 +01:00
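A minimal Go sketch of the `keep_dropped_targets` behaviour described above, including the fix in 611f50bb3d where a limit of zero keeps every dropped target. The `droppedTargets` type and its fields are hypothetical, not the scrape package's actual API; the point is that the total is always counted while only the retained detail is capped.

```go
package main

// droppedTargets is a hypothetical holder for dropped-target detail.
type droppedTargets struct {
	keep    uint     // keep_dropped_targets; 0 means no limit
	count   int      // every dropped target is counted
	targets []string // detail retained for display, at most keep entries
}

// add records a dropped target, keeping its detail only while under the limit.
func (d *droppedTargets) add(target string) {
	d.count++
	if d.keep == 0 || uint(len(d.targets)) < d.keep {
		d.targets = append(d.targets, target)
	}
}
```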
beorn7	536a487af4	scrape: Refactor names of float samples Continue to remove confusion that histogram samples are also samples and histogram values are also values etc. by renaming float values and float samples using the same schema as for histograms. Concretely: - result → resultFloats (corresponding to resultHistograms) - pendingResult → pendingFloats (corresponding to pendingHistograms) - rolledbackResult → rolledbackFloats (corresponding to rolledbackHistograms) - sample → floatSample (corresponding to histogramSample) This also orders the fields in `collectResultAppender` more consistently. Signed-off-by: beorn7 <beorn@grafana.com>	2023-07-13 14:27:51 +02:00
beorn7	0e3f35324b	scrape: Enable ingestion of multiple exemplars per sample This has become a requirement for native histograms, as a single histogram sample commonly has many buckets, so that providing many exemplars makes sense. Since OM text doesn't support native histograms yet, the test had to be expanded to also support protobuf test cases. Signed-off-by: beorn7 <beorn@grafana.com>	2023-07-13 14:16:10 +02:00
Bryan Boreham	5255bf06ad	Replace sort.Slice with faster slices.SortFunc The generic version is more efficient. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-07-02 22:17:08 +00:00
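As an illustration of the change above, this sketch sorts a hypothetical `target` slice with the generic `slices.SortFunc`. It assumes the Go 1.21 standard library `slices` and `cmp` packages, where the comparator returns an int; the commit itself may have used `golang.org/x/exp/slices`, whose signature differed at the time. The generic version avoids the reflection-based swapping that `sort.Slice` relies on.

```go
package main

import (
	"cmp"
	"slices"
)

// target is a hypothetical record type used only for this example.
type target struct {
	job  string
	port int
}

// sortTargets orders targets by job, then by port. The comparator returns
// a negative, zero, or positive int instead of sort.Slice's bool "less".
func sortTargets(ts []target) {
	slices.SortFunc(ts, func(a, b target) int {
		if c := cmp.Compare(a.job, b.job); c != 0 {
			return c
		}
		return cmp.Compare(a.port, b.port)
	})
}
```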
Julius Volz	ac8abdaacd	Rename remaining jitterSeed -> offsetSeed variables (#12414) I had changed the naming from "jitter" to "offset" in: `cb045c0e4b` ...but I forgot to add this file to the commit to complete the renaming, doing that now. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2023-06-05 17:36:11 +02:00
Julius Volz	cb045c0e4b	Fix wording from "jitterSeed" -> "offsetSeed" for server-wide scrape offsets In digital communication, "jitter" usually refers to how much a signal deviates from true periodicity, see https://en.wikipedia.org/wiki/Jitter. The way we are using the "jitterSeed" in Prometheus does not affect the true periodicity at all, but just introduces a constant phase shift (or offset) within the period. So it would be more correct and less confusing to call the "jitterSeed" an "offsetSeed" instead. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2023-05-25 11:54:00 +02:00
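The distinction above can be sketched as follows: a per-target offset derived once from a hash and the server-wide seed is a constant phase shift, not jitter. The FNV-1a hash over the target URL and the `scrapeOffset` name are assumptions for illustration; Prometheus's own derivation may differ.

```go
package main

import (
	"hash/fnv"
	"time"
)

// scrapeOffset returns a hypothetical per-target offset in [0, interval).
// interval must be positive. Because the result depends only on the target
// and the seed, every scrape of that target lands at the same phase within
// the period: a constant shift, not a random deviation per scrape.
func scrapeOffset(targetURL string, offsetSeed uint64, interval time.Duration) time.Duration {
	h := fnv.New64a()
	h.Write([]byte(targetURL)) // hash.Hash writes never return an error
	return time.Duration((h.Sum64() ^ offsetSeed) % uint64(interval))
}
```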
beorn7	9e500345f3	textparse/scrape: Add option to scrape both classic and native histograms So far, if a target exposes a histogram with both classic and native buckets, a native-histogram-enabled Prometheus would ignore the classic buckets. With the new scrape config option `scrape_classic_histograms` set, both kinds of buckets will be ingested, creating all the series of a classic histogram in parallel to the native histogram series. For example, a histogram `foo` would create a native histogram series `foo` and classic series called `foo_sum`, `foo_count`, and `foo_bucket`. This feature can be used in a migration strategy from classic to native histograms, where it is desired to have a transition period during which both native and classic histograms are present. Note that two bugs in classic histogram parsing were found and fixed as a byproduct of testing the new feature: 1. Series created from classic _gauge_ histograms didn't get the _sum/_count/_bucket suffix set. 2. Values of classic _float_ histograms weren't parsed properly. Signed-off-by: beorn7 <beorn@grafana.com>	2023-05-13 01:32:25 +02:00
Björn Rabenstein	bd98fc8c45	Merge pull request #12254 from zenador/histogram-bucket-limit Implement bucket limit for native histograms	2023-05-10 17:42:29 +02:00
Jeanette Tan	40240c9c1c	Update according to code review Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-05-05 02:33:00 +08:00
György Krajcsovits	19a4f314f5	Refactor testutil/protobuf.go into scrape package Renamed to clientprotobuf.go and added comments to indicate the intended usage. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-05-04 08:36:44 +02:00
Russ Cox	28f5502828	scrape: fix two loop variable scoping bugs in test Consider code like: `for i := 0; i < numTargets; i++ { stopFuncs = append(stopFuncs, func() { time.Sleep(i*20*time.Millisecond) }) }` Because the loop variable i is shared by all closures, all the stopFuncs sleep for numTargets*20 ms. If i were made per-iteration, as we are considering for a future Go release, the stopFuncs would have sleep durations ranging from 0 to (numTargets-1)*20 ms. Two tests had code like this and were checking that the aggregate sleep was at least numTargets*20 ms ("at least as long as the last target slept"). This is only true today because i == numTargets during all the sleeps. To keep the code working even if the semantics of this loop change, this PR computes `d := time.Duration((i+1)*20) * time.Millisecond` outside the closure (but inside the loop body), and then each closure has its own d. Now the sleeps range from 20 ms to numTargets*20 ms, keeping the test passing (and probably behaving closer to the intent of the test author). The failure being fixed can be reproduced by using the current Go development branch with `GOEXPERIMENT=loopvar go test` Signed-off-by: Russ Cox <rsc@golang.org>	2023-04-26 10:33:10 -04:00
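The per-iteration copy described above can be sketched like this. `makeStopFuncs` is a hypothetical helper that returns the duration instead of sleeping, so it runs instantly. Since Go 1.22 made `for` loop variables per-iteration, the explicit copy matters only for older toolchains, but it is harmless either way.

```go
package main

import "time"

// makeStopFuncs builds one closure per target. Computing d inside the loop
// body, but outside the closure, gives every closure its own copy, so the
// result does not depend on whether i is shared or per-iteration.
func makeStopFuncs(numTargets int) []func() time.Duration {
	var stopFuncs []func() time.Duration
	for i := 0; i < numTargets; i++ {
		d := time.Duration((i+1)*20) * time.Millisecond
		stopFuncs = append(stopFuncs, func() time.Duration {
			// The real test sleeps for d; returning it keeps this sketch fast.
			return d
		})
	}
	return stopFuncs
}
```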
Jeanette Tan	dfabc69303	Add tests according to code review Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-04-25 02:07:36 +08:00
Jeanette Tan	2ad39baa72	Treat bucket limit like sample limit and make it fail the whole scrape and return an error Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-04-22 03:25:07 +08:00
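A hypothetical sketch of a bucket limit that, like the sample limit, fails the whole scrape by returning an error rather than silently dropping a series. The `checkBucketLimit` name, representing buckets as two slices, and treating 0 as "no limit" are assumptions for illustration, not the scrape package's code.

```go
package main

import "errors"

// errBucketLimit is a hypothetical sentinel; a caller would abort the
// whole scrape when it sees this error.
var errBucketLimit = errors.New("histogram bucket limit exceeded")

// checkBucketLimit reports whether a native histogram has more buckets
// than allowed. A limit of 0 disables the check (an assumption here).
func checkBucketLimit(positive, negative []int64, limit int) error {
	if limit > 0 && len(positive)+len(negative) > limit {
		return errBucketLimit
	}
	return nil
}
```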
György Krajcsovits	071426f72f	Add unit test for bucket limit appender Refactors textparser test to use a common test utility to create protobuf representation from MetricFamily Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2023-04-22 03:14:19 +08:00
Jeanette Tan	4d21ac23e6	Implement bucket limit for native histograms Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>	2023-04-22 03:14:19 +08:00
Matthieu MOREL	bae9a21200	Merge branch 'main' into linter/nilerr Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-19 19:56:39 +02:00
beorn7	5b53aa1108	style: Replace `else if` cascades with `switch` Wiser coders than myself have come to the conclusion that a `switch` statement is almost always superior to a statement that includes any `else if`. The exceptions that I have found in our codebase are just these two: * The `if else` is followed by an additional statement before the next condition (separated by a `;`). * The whole thing is within a `for` loop and `break` statements are used. In this case, using `switch` would require tagging the `for` loop, which probably tips the balance. Why are `switch` statements more readable? For one, fewer curly braces. But more importantly, the conditions all have the same alignment, so the whole thing follows the natural flow of going down a list of conditions. With `else if`, in contrast, all conditions but the first are "hidden" behind `} else if `, harder to spot and (for no good reason) presented differently from the first condition. I'm sure the aforementioned wise coders can list even more reasons. In any case, I like it so much that I have found myself recommending it in code reviews. I would like to make it a habit in our code base, without making it a hard requirement that we would test on the CI. But for that, there has to be a role model, so this commit eliminates all `if else` occurrences, unless it is autogenerated code or fits one of the exceptions above. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:22:31 +02:00
beorn7	c3c7d44d84	lint: Adjust to the lint warnings raised by current versions of golangci-lint We haven't updated golangci-lint in our CI yet, but this commit prepares for that. There are a lot of new warnings, and it is mostly because the "revive" linter got updated. I agree with most of the new warnings, mostly around not naming unused function parameters (although it is justified in some cases for documentation purposes – while things like mocks are a good example where not naming the parameter is clearer). I'm pretty upset that the "empty block" warning includes `for` loops. It's such a common pattern to do something in the head of the `for` loop and then have an empty block. There is still an open issue about this: https://github.com/mgechev/revive/issues/810 I have disabled "revive" altogether in files where empty blocks are used excessively, and I have made the effort to add individual `// nolint:revive` where empty blocks are used just once or twice. It's borderline noisy, but let's go with it for now. I should mention that none of the "empty block" warnings for `for` loop bodies were legitimate. Signed-off-by: beorn7 <beorn@grafana.com>	2023-04-19 17:10:10 +02:00
Matthieu MOREL	fb3eb21230	enable gocritic, unconvert and unused linters Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2023-04-13 19:20:22 +00:00
Bryan Boreham	b987afa7ef	labels: simplify call to get Labels from Builder It took a `Labels` where the memory could be re-used, but in practice this hardly ever benefitted. Especially after converting `relabel.Process` to `relabel.ProcessBuilder`. Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil` so the slice was reallocated multiple times by `append`. Lastly `Builder.Labels()` now estimates that the final size will depend on labels added and deleted. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-22 17:05:20 +00:00
Bryan Boreham	0c09c3feb0	scrape sync: avoid copy of labels for dropped targets Since the Target object was just created in this function, nobody else has a reference to it and there are no concerns about it being modified concurrently so we don't need to copy the value. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-16 20:35:13 +00:00
Bryan Boreham	0dfa1e73f8	scrape: use LabelsRange instead of Labels, for performance Includes a rewrite of `resolveConflictingExposedLabels` to use `labels.Builder.Get`, which simplifies it considerably. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-16 20:35:13 +00:00
Bryan Boreham	2fde2fb37d	scrape: add Target.LabelsRange This allows users of a Target to iterate labels without allocating heap memory. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-16 20:35:13 +00:00
Bryan Boreham	b96b89ef8b	Merge pull request #12048 from bboreham/faster-targets Scraping targets are synced by creating the full set, then adding/removing any which have changed. This PR speeds up the process of creating the full set. I added a benchmark for `TargetsFromGroup`; it uses configuration from a typical Kubernetes SD. The crux of the change is to do relabeling inside labels.Builder instead of converting to labels.Labels and back again for every rule. The change is broken into several commits for easier review. This is a breaking change to `scrape.PopulateLabels()`, but `relabel.Process` is left as-is, with a new `relabel.ProcessBuilder` option.	2023-03-09 11:10:01 +00:00
Julien Pivotto	1fd59791e1	Update tests Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2023-03-08 16:32:39 +01:00
Julien Pivotto	0c56e5d014	Update our own dependencies, support proxy from env Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2023-03-08 12:00:17 +01:00
Bryan Boreham	f4fd9b0d68	scrape: re-use memory in TargetsFromGroup Common service discovery mechanisms such as Kubernetes can generate a lot of target groups, so this function was allocating a lot of memory which then immediately became garbage. Re-using the structures across an entire Sync saves effort. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-07 17:21:37 +00:00
Bryan Boreham	5cfe759348	scrape: make TargetsFromGroup work with Builder not []Label Save work converting to `Labels` then to `Builder`. `PopulateLabels()` now takes a Builder as input. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-07 17:21:37 +00:00
Bryan Boreham	c1dbc7b838	scrape: make PopulateLabels work with Builder not Labels Save work converting to and fro. Uses the recently-added relabel.ProcessBuilder variant. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-07 17:21:37 +00:00
Bryan Boreham	95fc032a61	scrape: add benchmark for TargetsFromGroup `loadConfiguration` is made more general. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-03-07 09:46:19 +00:00
Julien Pivotto	599b70a05d	Add include scrape configs Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2023-03-06 23:35:39 +01:00
Jimmie Han	a13249a98f	scrape: fix prometheus_target_scrape_pool_target_limit metric not set on creating scrape pool (#12001) Signed-off-by: Jimmie Han <hanjinming@outlook.com>	2023-02-21 13:14:04 +08:00
Bryan Boreham	75e5d600d9	Merge pull request #11748 from bboreham/safe-scrape scrape: remove unsafe code	2023-01-16 17:57:12 +00:00
Bryan Boreham	d228d1d9cc	scrape: remove 'mets' string completely This makes all usage of maps in scrape.go consistent. Also remove comment about unsafe strings, since we don't use them any more in this package. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2023-01-04 12:05:58 +00:00
Fish-pro	6ed71a229e	Use errors.Is to check for a specific error Signed-off-by: Fish-pro <zechun.chen@daocloud.io>	2022-12-29 23:23:07 +08:00
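A short sketch of why `errors.Is` is preferred over `==`: once an error is wrapped with `%w`, only `errors.Is` finds the sentinel in the chain. The `errNotFound` sentinel and `lookup` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error for this example.
var errNotFound = errors.New("not found")

// lookup wraps the sentinel with %w, adding context without hiding it.
func lookup(key string) error {
	return fmt.Errorf("lookup %q: %w", key, errNotFound)
}

// isNotFound walks the wrap chain; err == errNotFound would be false here.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}
```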
Marc Tudurí	9474610baf	Support FloatHistogram in TSDB (#11522) Extends Appender.AppendHistogram function to accept the FloatHistogram. TSDB supports appending, querying, and WAL replay for this new type of histogram. Signed-off-by: Marc Tudurí <marctc@protonmail.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-12-28 14:25:07 +05:30
Łukasz Mierzwa	e1b7082008	Show individual scrape pools on /targets page (#11142) * Add API endpoints for getting scrape pool names This adds an api/v1/scrape_pools endpoint that returns the names of all configured scrape pools. Having it makes it possible to find out which scrape pools are defined without having to list and parse all targets. The second change adds scrapePool query parameter support to the api/v1/targets endpoint, which allows filtering the returned targets to those of the given scrape pool. Both changes make it possible to query data for a specific scrape pool, rather than getting all the targets for all possible scrape pools. The problem with the api/v1/targets endpoint is that it returns a huge amount of data if you configure a lot of scrape pools. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> * Add a scrape pool selector on /targets page The current targets page lists all possible targets. This works great if you only have a few scrape pools configured, but for systems with a lot of scrape pools and targets this slows things down a lot. Not only does the /targets page load very slowly in such a case (waiting for a huge API response), but it also takes a long time to render, due to the huge number of elements. This change adds a dropdown selector so it's possible to select only the scrape pool of interest to view. There's also a scrapePool query param that will open the selected pool automatically. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com> Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2022-12-23 11:55:08 +01:00
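A sketch of a client-side request for a single pool's targets, using the `scrapePool` query parameter on `api/v1/targets` described above. The `targetsURL` helper and the example base address are assumptions; the function only builds the URL and performs no request.

```go
package main

import "net/url"

// targetsURL builds an api/v1/targets URL filtered to one scrape pool.
// It replaces any path on base, so it assumes Prometheus is served at "/".
func targetsURL(base, pool string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/targets"
	q := u.Query()
	q.Set("scrapePool", pool)
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```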
Bryan Boreham	bec5abc4dc	scrape: remove unsafe code The `yolostring` routine was intended to avoid an allocation when converting from a `[]byte` to a `string` for map lookup. However, since 2014 Go has recognized this pattern and does not make a copy of the data when looking up a map. So the unsafe code is not necessary. In line with this, constants like `scrapeHealthMetricName` also become `[]byte`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-20 17:26:43 +00:00
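A sketch of the map-lookup pattern that makes `yolostring` unnecessary: indexing a `map[string]T` with `string(b)` directly lets the gc compiler skip the copy. The `seriesCache` type and `scrapeHealthName` variable are hypothetical stand-ins.

```go
package main

// seriesCache is a hypothetical cache keyed by raw series text.
type seriesCache map[string]uint64

// get looks up a []byte key. The compiler recognizes the m[string(b)]
// form and does not allocate a new string for the lookup, so no unsafe
// []byte-to-string cast is needed.
func (c seriesCache) get(b []byte) (uint64, bool) {
	ref, ok := c[string(b)]
	return ref, ok
}

// scrapeHealthName illustrates keeping such names as []byte.
var scrapeHealthName = []byte("up")
```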
Bryan Boreham	9bc6d7a7db	Update package scrape tests for new labels.Labels type Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-19 15:22:09 +00:00
Bryan Boreham	91254fb187	Update package scrape for new labels.Labels type Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-19 15:22:09 +00:00
Bryan Boreham	3c7de69059	storage: allow re-use of iterators Patterned after `Chunk.Iterator()`: pass the old iterator in so it can be re-used to avoid allocating a new object. (This commit does not do any re-use; it is just changing all the method signatures so re-use is possible in later commits.) Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-12-15 18:32:45 +00:00
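The re-use pattern described above, sketched with hypothetical `series` and `sliceIterator` types: the caller hands back its previous iterator, and if it has the expected concrete type it is reset instead of reallocated. This mirrors the shape of `Chunk.Iterator()` but is not the storage package's code.

```go
package main

// Iterator is a hypothetical minimal sample iterator.
type Iterator interface {
	Next() bool
	At() float64
}

// sliceIterator iterates over an in-memory slice of values.
type sliceIterator struct {
	vals []float64
	i    int
}

func (it *sliceIterator) Next() bool {
	it.i++
	return it.i <= len(it.vals)
}

func (it *sliceIterator) At() float64 { return it.vals[it.i-1] }

// series is a hypothetical stand-in for a stored series.
type series struct{ vals []float64 }

// Iterator returns an iterator over s. If old is a *sliceIterator it is
// reset and returned, so a caller looping over many series allocates once.
func (s series) Iterator(old Iterator) Iterator {
	if it, ok := old.(*sliceIterator); ok {
		it.vals, it.i = s.vals, 0
		return it
	}
	return &sliceIterator{vals: s.vals}
}
```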
Xiaochao Dong (@damnever)	9979024a30	Report error if the series contains invalid metric names or labels during scrape Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>	2022-12-08 20:01:20 +08:00
Björn Rabenstein	a61c4b266a	scrape: Fix accept header, now for real (#11552) This reinstates the behavior of v2.39. The header got messed up in the sparsehistogram branch when the change of the version in main was merged into it (and the merge conflict had to be resolved). I don't think the current state will actually break anyone, although it is technically possible. I propose to merge this into the bugfix branch in any case, but I think we can wait for other bugfixes before cutting a v2.40.1. (Unless, of course, somebody reports an actual breakage because of the header.) Signed-off-by: beorn7 <beorn@grafana.com>	2022-11-09 11:19:25 +01:00
Björn Rabenstein	54ce07e9a0	scrape: Fix accept header (#11542) First of all, there was a typo: `encoding=delimited` was a left-over in the `scrapeAcceptHeader`. Second, the recently updated `version=1.0.0` prevents current versions of client_golang from negotiating OpenMetrics, as they expect `version=0.0.1` or no version at all. This commit adds, with lower priority, the latter (no version at all) to the accept header. Fixes #11540. Signed-off-by: beorn7 <beorn@grafana.com>	2022-11-07 18:22:03 +01:00
Ganesh Vernekar	3cbf87b83d	Enable protobuf negotiation only when histograms are enabled Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2022-10-12 13:27:22 +05:30
Jesus Vazquez	e934d0f011	Merge 'main' into sparsehistogram Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>	2022-10-05 22:14:49 +02:00
Bryan Boreham	4927e13537	scrape tests: undo EmptyLabels change Needs other code changes otherwise tests fail Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-09-09 13:34:49 +02:00
Bryan Boreham	14780c3b4e	scrape: in tests use labels.FromStrings And a few cases of `EmptyLabels()`. Replacing code which assumes the internal structure of `Labels`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-09-09 13:34:49 +02:00
Bogdan Drutu	3cde9287a6	scrape: remove unused member from cacheEntry (#11281) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2022-09-08 00:01:01 +02:00
Bogdan Drutu	f736a9e953	scrape: remove duplicate mutex unlock (#11282) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com> Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2022-09-08 00:00:14 +02:00
Bogdan Drutu	c8cfe5c25d	scrape: remove unused argument in newScrapeLoop (#11283) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com> Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>	2022-09-07 23:59:57 +02:00
Cosrider	bef6556ca5	delete redundant alias (#11180) Signed-off-by: Cosrider <cosrider7@gmail.com> Signed-off-by: Cosrider <cosrider7@gmail.com>	2022-08-31 15:50:38 +02:00
Paschalis Tsilias	5a8e202f94	Append metadata to the WAL in the scrape loop (#10312) * Append metadata to the WAL Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra whitespace; Reword some docstrings and comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use RLock() for hasNewMetadata check Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use single byte for metric type in RefMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update proposed WAL format for single-byte type metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of review comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Amend description of metadata in wal.md Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Correct key used to retrieve metadata from cache When we're setting metadata entries in the scrapeCache, we're using the p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and use it as the cache key. When checking for cache entries though, we used p.Series() as the key, which included the metric name _with_ its labels. That meant that we were never actually hitting the cache. We're fixing this by using the __name__ internal label for correctly getting the cache entries after they've been set by setHelp(), setType() or setUnit().
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Put feature behind a feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reorder WAL format document Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix CR comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Extract logic about changing metadata in an anonymous function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement new proposed WAL format and amend relevant tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use 'const' for metadata field names Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Apply metadata to head memSeries in Commit, not in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add docstring and rename extracted helper in scrape.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments around TestMetadata* tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rebase with merged TSDB changes; fix duplicate definitions after rebase Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove leftover changes on db_test.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rename feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify updateMetadata helper function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra newline Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>	2022-08-31 15:50:05 +02:00
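The cache-key fix in the commit above can be illustrated with a hypothetical metadata cache keyed by metric name. Looking it up with the full series text (name plus labels) never matches; stripping to the name, which is what the `__name__` label holds, does. The `metricName` parsing here is a deliberate simplification, not the parser's logic.

```go
package main

import "strings"

// metaEntry is a hypothetical metadata record (TYPE/HELP/UNIT).
type metaEntry struct{ typ, help, unit string }

// metadataCache is keyed by metric name only, e.g. "http_requests_total".
type metadataCache map[string]metaEntry

// metricName returns the part of a series string before any '{'.
func metricName(series string) string {
	if i := strings.IndexByte(series, '{'); i >= 0 {
		return series[:i]
	}
	return series
}

// lookup uses the metric name, not the full series text, as the key.
func (c metadataCache) lookup(series string) (metaEntry, bool) {
	m, ok := c[metricName(series)]
	return m, ok
}
```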
Marc Tudurí	f7df3b86ba	histograms: parse float histograms from proto definition (#11149) * histograms: parse float histograms from proto definition Signed-off-by: Marc Tuduri <marctc@protonmail.com> * Improve comment Signed-off-by: Marc Tuduri <marctc@protonmail.com> * Ignore float buckets Signed-off-by: Marc Tuduri <marctc@protonmail.com> * Refactor Histogram() function Signed-off-by: Marc Tuduri <marctc@protonmail.com> * Fix test_float_histogram Signed-off-by: Marc Tuduri <marctc@protonmail.com> * Update model/textparse/protobufparse.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Marc Tudurí <marctc@protonmail.com> * Update protobufparse.go Signed-off-by: Marc Tudurí <marctc@protonmail.com> * Update scrape.go Signed-off-by: Marc Tudurí <marctc@protonmail.com> * Update scrape/scrape.go Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Signed-off-by: Marc Tudurí <marctc@protonmail.com> Signed-off-by: Marc Tuduri <marctc@protonmail.com> Signed-off-by: Marc Tudurí <marctc@protonmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>	2022-08-25 20:37:41 +05:30
Bryan Boreham	8b863c42dd	Optimise relabeling by re-using memory (#11147) * model/relabel: Add benchmark Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * model/relabel: re-use Builder across relabels Saves memory allocations. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * labels.Builder: allow re-use of result slice This reduces memory allocations where the caller has a suitable slice available. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * model/relabel: re-use source values slice To reduce memory allocations. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Unwind one change causing test failures Restore original behaviour in PopulateLabels, where we must not overwrite the input set. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * relabel: simplify values optimisation Use a stack-based array for up to 16 source labels, which will be the vast majority of cases. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * lint Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2022-08-19 15:27:52 +05:30
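The "stack-based array for up to 16 source labels" optimisation can be sketched as below. Labels are modelled as a plain `map[string]string` for brevity (an assumption; the real code uses `labels.Labels`), and `;` is used as the separator because it is relabeling's default. Whether `buf` actually stays on the stack is up to the compiler's escape analysis.

```go
package main

import "strings"

// joinSourceValues collects the values of the source labels and joins them
// with sep. The [16]string array backs the slice in the common case of 16 or
// fewer source labels; append moves to a heap-allocated slice beyond that.
func joinSourceValues(lbls map[string]string, sourceLabels []string, sep string) string {
	var buf [16]string
	values := buf[:0]
	for _, name := range sourceLabels {
		values = append(values, lbls[name]) // missing labels yield ""
	}
	return strings.Join(values, sep)
}
```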
beorn7	c9fd3c235d	Merge branch 'main' into sparsehistogram	2022-08-10 17:54:37 +02:00
Levi Harrison	d61459d826	`no-default-scrape-port` feature flag (#9523) * Add `no-default-scrape-port` flag Signed-off-by: Levi Harrison <git@leviharrison.dev>	2022-07-20 13:35:47 +02:00
Paschalis Tsilias	d1122e0743	Introduce TSDB changes for appending metadata to the WAL (#10972) * Append metadata to the WAL Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove extra whitespace; Reword some docstrings and comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use RLock() for hasNewMetadata check Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use single byte for metric type in RefMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update proposed WAL format for single-byte type metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement a MetadataAppender interface for the Agent Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of review comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Amend description of metadata in wal.md Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Correct key used to retrieve metadata from cache When we're setting metadata entries in the scrapeCache, we're using the p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and use it as the cache key. When checking for cache entries though, we used p.Series() as the key, which included the metric name _with_ its labels. That meant that we were never actually hitting the cache. We're fixing this by using the __name__ internal label for correctly getting the cache entries after they've been set by setHelp(), setType() or setUnit().
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Put feature behind a feature flag Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix AppendMetadata docstring Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reorder WAL format document Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Change error message of AppendMetadata; Fix access of s.meta in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reuse temporary buffer in Metadata encoder Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Only keep latest metadata for each refID during checkpointing Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix test that's referencing decoding metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Avoid creating metadata block if no new metadata are present Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add tests for corrupt metadata block and relevant record type Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix CR comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Extract logic about changing metadata in an anonymous function Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Implement new proposed WAL format and amend relevant tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Use 'const' for metadata field names Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Apply metadata to head memSeries in Commit, not in AppendMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add docstring and rename extracted helper in scrape.go Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add tests for tsdb-related cases Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linter issues vol1 Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linter issues vol2 Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix Windows test by closing WAL reader files Signed-off-by: 
Paschalis Tsilias <paschalist0@gmail.com> * Use switch instead of two if statements in metadata decoding Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments around TestMetadata* tests Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Add code for replaying WAL; test correctness of in-memory data after a replay Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Remove scrape-loop related code from PR Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Address first round of comments Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify tests by sorting slices before comparison Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix test to use separate transactions Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Empty out buffer and record slices after encoding latest metadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix linting issue Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Update calculation for DroppedMetadata metric Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Rename MetadataAppender interface and AppendMetadata method to MetadataUpdater/UpdateMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Reuse buffer when encoding latest metadata for each series Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Fix review comments; Check all returned error values using two helpers Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Simplify use of helpers Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com> * Satisfy linter Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>	2022-07-19 10:58:52 +02:00
beorn7	28f028e938	Merge branch 'main' into sparsehistogram	2022-07-12 19:07:13 +02:00
Julien Pivotto	90583c8906	TestScrapeLoopCache: Display content of the appender (#10937) This should help identify the cause of flaky Windows tests. Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2022-07-01 14:28:56 +02:00
Xiaonan Shen	0c3abdc26d	Keep relabeled scrape interval and timeout on reloads (#10916) * Preserve relabeled scrape interval and timeout on reloads Signed-off-by: Xiaonan Shen <s@sxn.dev>	2022-06-28 11:58:52 +02:00
beorn7	40ad5e284a	Merge branch 'main' into beorn7/sparsehistogram	2022-06-09 20:50:30 +02:00
Alban Hurtaud	41630b8e88	Add hidden flag to configure discovery loop interval (#10634) * Add hidden flag to configure discovery loop interval Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>	2022-05-06 00:42:04 +02:00
beorn7	3bc711e333	Merge branch 'main' into sparsehistogram	2022-05-04 13:37:13 +02:00
Goutham Veeramachaneni	2381d7be57	Send target and metadata cache in context (again) (#10636) * Send target and metadata cache in context (again) The previous attempt was rolled back in #10590 due to memory issues. `sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info in the previous attempt and it was hard to pin-point where the context was being retained causing the memory increase. I've experimented a bunch in #10627 to figure out that this approach doesn't cause a memory increase. Beyond that, just using this info in _any_ other context is causing a memory increase. The change fixed a bunch of long-standing issues in the OTel Collector that the community was waiting on, and a release is blocked on a few downstream distributions of OTel Collector waiting on a fix. I propose to merge this change in while I investigate what is happening. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Gate the change behind a manager option Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-05-03 11:45:52 -07:00
Matthieu MOREL	e2ede285a2	refactor: move from io/ioutil to io and os packages (#10528) * refactor: move from io/ioutil to io and os packages * use fs.DirEntry instead of os.FileInfo after os.ReadDir Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>	2022-04-27 11:24:36 +02:00
Julien Pivotto	685ce9964d	Merge pull request #10599 from prometheus/release-2.35 Merge back release 2.35	2022-04-15 00:10:06 +02:00
Goutham Veeramachaneni	ec3d02019e	Pass the correct context to staleness Appender (#10588 ) OTel Collector prints the following error when a target disappears: ``` 2022-04-13T14:20:24.932-0400 warn scrape/scrape.go:1408 Stale append failed {"kind": "receiver", "name": "prometheus", "scrape_pool": "beep-boop", "target": "http://localhost:9090/metrics", "error": "transaction aborted"} ``` This `transaction aborted` error is returned by the custom appender that is used by the collector when the context of the appender is cancelled: `b7bf11174e/receiver/prometheusreceiver/internal/otlp_transaction.go (L81-L82)` We call `endOfRunStaleness` after `sl.stop()` which cancels `sl.ctx`. The other `.Appender()` calls use `parentCtx` for the same reason. This hasn't come up so far because Prometheus' Appender implementation just ignores the context passed. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-04-14 10:03:07 -04:00
Julien Pivotto	db8c550570	Revoke storing target and metadata cache in context. (#10590 ) Storing the scrape cache and the target (which also contains that cache) is apparently causing hige memory increase. I think me might not control the lifespan of the context enough, therefore old objects keep living in memory for longer than needed. Let's unblock the release and look for an alternative so that downstream consumers can get access to that data. Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>	2022-04-14 15:18:46 +02:00
Jayapriya Pai	580e852f10	scrape: Update error message for label limits Signed-off-by: Jayapriya Pai <janantha@redhat.com>	2022-04-14 11:43:17 +02:00
beorn7	7ee1836ef5	Merge branch 'main' into sparsehistogram	2022-04-05 18:31:19 +02:00
Robert Fratto	44a5e705be	discovery: Expose custom HTTP client options to discoverers (#10462 ) * discovery: expose HTTP client options to discoverers Signed-off-by: Robert Fratto <robertfratto@gmail.com> * discovery/http: use HTTP client options for created client Signed-off-by: Robert Fratto <robertfratto@gmail.com> * scrape: use a list of HTTP client options instead of just dial context Signed-off-by: Robert Fratto <robertfratto@gmail.com> * discovery: rephrase comment Signed-off-by: Robert Fratto <robertfratto@gmail.com>	2022-03-24 18:16:59 -04:00
Goutham Veeramachaneni	4d8bbfd416	Add target to context (#10473 ) Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-03-24 16:53:04 +01:00
beorn7	4210aac74a	Merge branch 'main' into sparsehistogram	2022-03-22 14:47:42 +01:00
Alvin Lin	8b5eb562b1	Re-generate test cert to fix test_windows test failures Signed-off-by: Alvin Lin <alvinlin@amazon.com>	2022-03-17 19:37:18 +01:00
Goutham Veeramachaneni	c4f8020dca	Embed MetadaStore in scrape context (#10450 ) This will allow downstream users to easily access metadata required. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2022-03-16 09:45:15 +01:00
Robert Fratto	f0ec619eec	scrape: allow providing a custom Dialer for scraping (#10415 ) * scrape: allow providing a custom Dialer for scraping This commit extends config.ScrapeConfig with an optional field to override how HTTP connections to targets are created. This field is not set directly in Prometheus, and is only added for the convenience of downstream importers. Closes #9706 Signed-off-by: Robert Fratto <robertfratto@gmail.com> * scrape: move custom dial function to scrape.Options Signed-off-by: Robert Fratto <robertfratto@gmail.com>	2022-03-09 00:48:47 +01:00
Jayapriya Pai	edfe657b54	scrape: Fix label_limits cache usage (#10370 ) Fixes #10344 Signed-off-by: Jayapriya Pai <janantha@redhat.com>	2022-03-03 18:37:53 +01:00
Julien Pivotto	f695df843f	Improve content-type error handling - Call err everywhere - Change log message to underscore-separated field Followup on #10186 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2022-02-08 11:02:51 +01:00
Matheus Pimenta	8d8ce641a4	error for invalid media type should not be completely swallowed (#10186 ) * error for invalid media type should not be completely swallowed Signed-off-by: Matheus Pimenta <matheuscscp@gmail.com>	2022-02-08 10:57:56 +01:00
Jonatan Ivanov	b6df3b6f67	Prefer 1.0.0 in the accept header for application/openmetrics-text (#9431 ) related: https://github.com/prometheus/client_java/issues/702 fixes gh-9430 Signed-off-by: Jonatan Ivanov <jonatan.ivanov@gmail.com>	2022-01-28 00:37:51 +01:00
beorn7	86cc83b13c	storage: iterator fixes after merge Signed-off-by: beorn7 <beorn@grafana.com>	2021-12-18 14:12:01 +01:00
beorn7	64c7bd2b08	Merge branch 'main' into sparsehistogram	2021-12-18 14:04:25 +01:00
Julius Volz	fa552b98bb	Merge pull request #9996 from roidelapluie/fixreportlimit Fix reporting metrics when sample limit is reached during the report	2021-12-17 13:17:07 +01:00
Julien Pivotto	67a64ee092	Remove check against cfg so interval/ timeout are always set (#10023 ) (#10031 ) Signed-off-by: Nicholas Blott <blottn@tcd.ie> Co-authored-by: Nicholas Blott <blottn@tcd.ie>	2021-12-16 16:46:14 +01:00
Julien Pivotto	e94a0b28e1	Append reporting metrics without limit If reporting metrics fails due to reaching the limit, this makes the target appear as UP in the UI, but the metrics are missing. This commit bypasses that limit for report metrics. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-12-16 13:26:53 +01:00
Björn Rabenstein	7e42acd3b1	tsdb: Rework iterators (#9877 ) - Pick At... method via return value of Next/Seek. - Do not clobber returned buckets. - Add partial FloatHistogram suppert. Note that the promql package is now _only_ dealing with FloatHistograms, following the idea that PromQL only knows float values. As a byproduct, I have removed the histogramSeries metric. In my understanding, series can have both float and histogram samples, so that metric doesn't make sense anymore. As another byproduct, I have converged the sampleBuf and the histogramSampleBuf in memSeries into one. The sample type stored in the sampleBuf has been extended to also contain histograms even before this commit. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-29 13:24:23 +05:30
beorn7	5d4db805ac	Merge branch 'main' into sparsehistogram	2021-11-17 19:57:31 +01:00
beorn7	4c28d9fac7	Move to histogram.Histogram pointers This is to avoid copying the many fields of a histogram.Histogram all the time. This also fixes a bunch of formerly broken tests. Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-12 23:17:35 +01:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
Dieter Plaetinck	cda025b5b5	TSDB: demistify SeriesRefs and ChunkRefs (#9536 ) * TSDB: demistify seriesRefs and ChunkRefs The TSDB package contains many types of series and chunk references, all shrouded in uint types. Often the same uint value may actually mean one of different types, in non-obvious ways. This PR aims to clarify the code and help navigating to relevant docs, usage, etc much quicker. Concretely: * Use appropriately named types and document their semantics and relations. * Make multiplexing and demuxing of types explicit (on the boundaries between concrete implementations and generic interfaces). * Casting between different types should be free. None of the changes should have any impact on how the code runs. TODO: Implement BlockSeriesRef where appropriate (for a future PR) Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * feedback Signed-off-by: Dieter Plaetinck <dieter@grafana.com> * agent: demistify seriesRefs and ChunkRefs Signed-off-by: Dieter Plaetinck <dieter@grafana.com>	2021-11-06 15:40:04 +05:30
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Darshan Chaudhary	a7e554b158	add check service-discovery command (#8970 ) Signed-off-by: darshanime <deathbullet@gmail.com>	2021-11-01 14:42:12 +01:00
DrAuYueng	69e309d202	Expose TargetsFromGroup/AlertmanagerFromGroup func and reuse this for (#9343 ) static/file sd config check in promtool Signed-off-by: DrAuYueng <ouyang1204@gmail.com>	2021-10-28 02:01:28 +02:00
Furkan Türkal	a6e6011d55	Add scrape_body_size_bytes metric (#9569 ) Fixes #9520 Signed-off-by: Furkan <furkan.turkal@trendyol.com>	2021-10-24 23:45:31 +02:00
Levi Harrison	5d409b0637	Remove `interval` and `timeout` parameters (#9578 )	2021-10-24 10:38:21 -04:00
Julien Pivotto	b0c98e01c8	Include scrape labels in the hash (#9551 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-10-20 23:44:45 +02:00
beorn7	a9008f5423	Merge branch 'main' into sparsehistogram	2021-10-19 17:14:23 +02:00
beorn7	b8d953a5a0	scrape: Avoid creating a label map during conflict resolution This also avoids the recursive function call. I think it is quite readable. And much less code. Signed-off-by: beorn7 <beorn@grafana.com>	2021-10-15 21:56:48 +02:00
Shirley Leu	c890ea407f	Resolve conflicts between multiple exported label prefixes (#9479 ) Resolve conflicts between multiple exported label prefixes Signed-off-by: Shirley Leu <shirley.w.leu@gmail.com>	2021-10-15 20:31:03 +02:00
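When `honor_labels` is false, a scraped label that clashes with a target label is kept under an `exported_` prefix. A conflict arises when that prefixed name is itself already taken. The following sketch uses plain maps and a hypothetical `mergeTargetLabels` helper rather than Prometheus' `labels.Builder`. It keeps prepending the prefix until it finds a free name, and it resolves conflicts in a fixed order so the result does not depend on map iteration.

```go
package main

import (
	"fmt"
	"sort"
)

// exportedLabelPrefix is prepended to scraped labels that clash with target labels.
const exportedLabelPrefix = "exported_"

// mergeTargetLabels is a hypothetical sketch: target labels override scraped
// labels of the same name, and each displaced scraped label is re-added with
// the prefix repeated until an unused name is found.
func mergeTargetLabels(scraped, target map[string]string) map[string]string {
	out := make(map[string]string, len(scraped)+len(target))
	for name, value := range scraped {
		out[name] = value
	}

	// Record scraped labels that a target label is about to overwrite.
	var conflicts []string
	for name, value := range target {
		if _, ok := scraped[name]; ok {
			conflicts = append(conflicts, name)
		}
		out[name] = value
	}

	// Resolve shorter names first (ties broken alphabetically) for determinism.
	sort.Slice(conflicts, func(i, j int) bool {
		if len(conflicts[i]) != len(conflicts[j]) {
			return len(conflicts[i]) < len(conflicts[j])
		}
		return conflicts[i] < conflicts[j]
	})

	// Loop instead of recursing: add one more prefix until the name is free.
	for _, name := range conflicts {
		newName := name
		for {
			newName = exportedLabelPrefix + newName
			if _, taken := out[newName]; !taken {
				out[newName] = scraped[name]
				break
			}
		}
	}
	return out
}

func main() {
	scraped := map[string]string{"job": "app", "exported_job": "legacy"}
	target := map[string]string{"job": "prometheus"}

	out := mergeTargetLabels(scraped, target)
	fmt.Println(out["job"], out["exported_job"], out["exported_exported_job"])
	// prints: prometheus legacy app
}
```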
beorn7	7a8bb8222c	Style cleanup of all the changes in sparsehistogram so far A lot of this code was hacked together, literally during a hackathon. This commit intends not to change the code substantially, but just make the code obey the usual style practices. A (possibly incomplete) list of areas: * Generally address linter warnings. * The `pgk` directory is deprecated as per dev-summit. No new packages should be added to it. I moved the new `pkg/histogram` package to `model` anticipating what's proposed in #9478. * Make the naming of the Sparse Histogram more consistent. Including abbreviations, there were just too many names for it: SparseHistogram, Histogram, Histo, hist, his, shs, h. The idea is to call it "Histogram" in general. Only add "Sparse" if it is needed to avoid confusion with conventional Histograms (which is rare because the TSDB really has no notion of conventional Histograms). Use abbreviations only in local scope, and then really abbreviate (not just removing three out of seven letters like in "Histo"). This is in the spirit of https://github.com/golang/go/wiki/CodeReviewComments#variable-names * Several other minor name changes. * A lot of formatting of doc comments. For one, following https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences , but also layout question, anticipating how things will look like when rendered by `godoc` (even where `godoc` doesn't render them right now because they are for unexported types or not a doc comment at all but just a normal code comment - consistency is queen!). * Re-enabled `TestQueryLog` and `TestEndopints` (they pass now, leaving them disabled was presumably an oversight). * Bucket iterator for histogram.Histogram is now created with a method. * HistogramChunk.iterator now allows iterator recycling. (I think @dieterbe only commented it out because he was confused by the question in the comment.) * HistogramAppender.Append panics now because we decided to treat staleness marker differently. 
Signed-off-by: beorn7 <beorn@grafana.com>	2021-10-11 13:02:03 +02:00
beorn7	fd5ea4e0b5	Merge branch 'main' into sparsehistogram	2021-10-07 23:16:42 +02:00
Julien Pivotto	63b3e4e5ec	Enable HTTP2 again (#9398 ) We are re-enabling HTTP 2 again. There has been a few bugfixes upstream in go, and we have also enabled ReadIdleTimeout. Fix #7588 Fix #9068 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-09-26 23:16:12 +02:00
Robert Fratto	daf2887fd4	expose scrape.userAgentHeader like remote.UserAgent Signed-off-by: Robert Fratto <robertfratto@gmail.com>	2021-09-13 14:10:34 -04:00
Julien Pivotto	48a101be1b	Allow to tune the scrape tolerance (#9283 ) * Allow to tune the scrape tolerance In most of the classic monitoring use cases, a few milliseconds difference can be omitted. In Prometheus, a few millisecond difference can however make a big difference. Currently, Prometheus will ignore up to 2 ms difference in the alignments. It turns out that for users who can afford a 10ms difference, there is a lot of resources and disk space to win, as shown in this graph, which shows the bytes / samples over a production Prometheus server. You can clearly see the switch from 2ms to 10ms tolerance. This pull request enables the adjustment of the scrape timestamp alignment tolerance. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Fix golint Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-09-08 17:27:33 +05:30
Bryan Boreham	92a3eeac55	Create less garbage when parsing metrics (#9299 ) * Refactor: extract function to make scrapeLoop for testing Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Add benchmarks for ScrapeLoopAppend For Prometheus and OpenMetrics Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Create less garbage when parsing metrics Exemplar escapes to heap due to being passed through text-parser interface, but we can reduce the impact by hoisting it out of the loop and resetting it after every use. (Note the cost was paid on every line even when exemplars were disabled) Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Create less garbage when parsing OpenMetrics After calling parseLVals() we always append the return value, so pass in what we want to append it to and save garbage. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-09-08 13:39:21 +05:30
Łukasz Mierzwa	f0a26266c0	Add scrape_sample_limit metric This adds a new metric exposing per target scrape sample_limit value. Metrics are only exposed if extra-scrape-metrics feature flag is enabled. scrape_sample_limit will make it easy to monitor and alert on targets getting close to configured sample_limit, which is important given than exceeding sample_limit results in the entire scrape results being rejected. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2021-09-03 15:42:41 +01:00
SuperQ	31f4108758	Add scrape_timeout_seconds metric Add a new built-in metric `scrape_timeout_seconds` to allow monitoring of the ratio of scrape duration to the scrape timeout. Hide behind a feature flag to avoid additional cardinality by default. Signed-off-by: SuperQ <superq@gmail.com>	2021-09-02 12:15:35 +02:00
Levi Harrison	70f597b033	Configure Scrape Interval and Timeout Via Relabeling (#8911 ) * Configure scrape interval and timeout with labels Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-08-31 17:37:32 +02:00
Ganesh Vernekar	8b70e87ab9	Merge remote-tracking branch 'upstream/main' into sparse-refactor Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-08-05 12:16:08 +05:30
Arunprasad Rajkumar	5527e26efc	scrape: fix 'target_limit exceeded error' when reloading conf with 0 Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>	2021-07-27 17:34:22 +05:30
austin ce	5bdfba1d20	Extract and export GetFQDN() Signed-off-by: austin ce <austin.cawley@gmail.com>	2021-07-21 12:55:02 -04:00
Naka Masato	a1c1313b3c	fix typo in comment for scrape manager (#9094 ) Signed-off-by: Masato Naka <masatonaka1989@gmail.com>	2021-07-19 15:55:13 +05:30
beorn7	5de2df752f	Hacky implementation of protobuf parsing This "brings back" protobuf parsing, with the only goal to play with the new sparse histograms. The Prom-2.x style parser is highly adapted to the structure of the Prometheus text format (and later OpenMetrics). Some jumping through hoops is required to feed protobuf into it. This is not meant to be a model for the final implementation. It should just enable sparse histogram ingestion at a reasonable efficiency. Following known shortcomings and flaws: - No tests yet. - Summaries and legacy histograms, i.e. without sparse buckets, are ignored. - Staleness doesn't work (but this could be fixed in the appender, to be discussed). - No tricks have been tried that would be similar to the tricks the text parsers do (like direct pointers into the HTTP response body). That makes things weird here. Tricky optimizations only make sense once the final format is specified, which will almost certainly not be the old protobuf format. (Interestingly, I expect this implementation to be in fact much more efficient than the original protobuf ingestion in Prom-1.x.) - This is using a proto3 version of metrics.proto (mostly to be consistent with the other protobuf uses). However, proto3 sees no difference between an unset field. We depend on that to distinguish between an unset timestamp and the timestamp 0 (1970-01-01, 00:00:00 UTC). In this experimental code, we just assume that timestamp is never specified and therefore a timestamp of 0 always is interpreted as "not set". Signed-off-by: beorn7 <beorn@grafana.com>	2021-07-01 01:35:11 +02:00
Ganesh Vernekar	04ad56d9b8	Append sparse histograms into the Head block (#9013 ) * Append sparse histograms into the Head block Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Add AtHistogram() to Iterator interface. Make HistoChunk conform to Chunk interface. Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-29 20:08:46 +05:30
Ganesh Vernekar	64bea6999e	HistogramAppender interface for sparse histograms (#9007 ) Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>	2021-06-28 20:30:55 +05:30
Julius Volz	9d495afd2c	Remove trailing zeros in scrape timeout header See https://twitter.com/AviKivity/status/1405147699557638145 and https://twitter.com/juliusvolz/status/1405790211670515712 Signed-off-by: Julius Volz <julius.volz@gmail.com>	2021-06-18 09:38:12 +02:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
hanjm	1df05bfd49	Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827 ) Signed-off-by: hanjm <hanjinming@outlook.com>	2021-05-29 07:05:42 +08:00
Levi Harrison	2826fbeeb7	SD: Add target creation failure counter and change failure handling (#8786 ) * Added metric and changed failure/drop strategy Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-05-28 23:50:59 +02:00
Callum Styan	8fd73b1d28	Add Exemplar Remote Write support (#8296 ) * Write exemplars to the WAL and send them over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update example for exemplars, print data in a more obvious format. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add metrics for remote write of exemplars. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix incorrect slices passed to send in remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * We need to unregister the new metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * Order of exemplar append vs write exemplar to WAL needs to change. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Condense sample/exemplar delivery tests to parameterized sub-tests Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename test methods for clarity now that they also handle exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename counter variable. Fix instances where metrics were not updated correctly Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Add exemplars to LoadWAL benchmark Signed-off-by: Callum Styan <callumstyan@gmail.com> * last exemplars timestamp metric needs to convert value to seconds with ms precision Signed-off-by: Callum Styan <callumstyan@gmail.com> * Process exemplar records in a separate go routine when loading the WAL. 
Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments related to clarifying comments and variable names. Also refactor sample/exemplar to enqueue prompb types. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Regenerate types proto with comments, update protoc version again. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Put remote write of exemplars behind a feature flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some of Ganesh's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move exemplar remote write feature flag to a config file field. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address Bartek's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allocate exemplar buffers in queue_manager if we're not going to send exemplars over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add ValidateExemplar function, validate exemplars when appending to head and log them all to WAL before adding them to exemplar storage. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more reivew comments from Ganesh. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add exemplar total label length check. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address a few last review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-05-06 13:53:52 -07:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777 ) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide some mechanism to defend against unbound number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceed the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauge, one for each label limit to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very very long. 
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
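The three checks described in this commit fit in a single validation function. `Label`, `labelLimits` and `verifyLabelLimits` are hypothetical simplifications; Prometheus has its own `labels.Labels` type. The sketch counts `__name__` like any other label, as the later part of this change does. It measures lengths in bytes, treats 0 as no limit, and reports lengths rather than the offending strings, which the message notes can be very long.

```go
package main

import "fmt"

// Label is a minimal stand-in for a Prometheus label pair.
type Label struct {
	Name, Value string
}

// labelLimits mirrors the three scrape config options; zero disables a limit.
type labelLimits struct {
	labelLimit            int // label_limit
	labelNameLengthLimit  int // label_name_length_limit
	labelValueLengthLimit int // label_value_length_limit
}

// verifyLabelLimits returns an error if a sample's final label set
// (including __name__) breaks any configured limit.
func verifyLabelLimits(lset []Label, limits labelLimits) error {
	if limits.labelLimit > 0 && len(lset) > limits.labelLimit {
		return fmt.Errorf("label_limit exceeded (metric has %d labels, limit: %d)", len(lset), limits.labelLimit)
	}
	for _, l := range lset {
		if limits.labelNameLengthLimit > 0 && len(l.Name) > limits.labelNameLengthLimit {
			return fmt.Errorf("label_name_length_limit exceeded (name length %d, limit: %d)", len(l.Name), limits.labelNameLengthLimit)
		}
		if limits.labelValueLengthLimit > 0 && len(l.Value) > limits.labelValueLengthLimit {
			return fmt.Errorf("label_value_length_limit exceeded (value length %d, limit: %d)", len(l.Value), limits.labelValueLengthLimit)
		}
	}
	return nil
}

func main() {
	sample := []Label{{"__name__", "http_requests_total"}, {"job", "api"}, {"instance", "host:9100"}}
	fmt.Println(verifyLabelLimits(sample, labelLimits{}))              // <nil>
	fmt.Println(verifyLabelLimits(sample, labelLimits{labelLimit: 2})) // label_limit exceeded (metric has 3 labels, limit: 2)
}
```

A single failing sample is enough to make the whole scrape fail, so a caller would typically stop at the first non-nil error.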
Marco Pracucci	4da5c25ea4	Upgrade prometheus/common to v0.21.0 Signed-off-by: Marco Pracucci <marco@pracucci.com>	2021-04-21 12:19:16 +02:00
Julien Pivotto	e14176756f	Merge pull request #8601 from dgl/fix-8243 Ensure that timestamp comparison uses wall clock time	2021-03-16 16:00:25 +01:00
Callum Styan	289ba11b79	Add circular in-memory exemplars storage (#6635 ) * Add circular in-memory exemplars storage Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com> Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Signed-off-by: Martin Disibio <mdisibio@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com> * Fix some comments, clean up exemplar metrics struct and exemplar tests. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix exemplar query api null vs empty array issue. Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-03-16 15:17:45 +05:30
David Leadbeater	21a282fabe	Ensure that timestamp comparison uses wall clock time It's not possible to assume subtraction and addition of a time.Time will result in consistent values. Signed-off-by: David Leadbeater <dgl@dgl.cx>	2021-03-15 13:05:17 +00:00
Tom Wilkie	7369561305	Combine Appender.Add and AddFast into a single Append method. (#8489 ) This moves the label lookup into TSDB, whilst still keeping the cached-ref optimisation for repeated Appends. This makes the API easier to consume and implement. In particular this change is motivated by the scrape-time-aggregation work, which I don't think is possible to implement without it as it needs access to label values. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2021-02-18 17:37:00 +05:30
gotjosh	4eca4dffb8	Allow metric metadata to be propagated via Remote Write. (#6815 ) * Introduce a metadata watcher Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage. Signed-off-by: gotjosh <josue@grafana.com> * Additional fixes after rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Rework samples/metadata metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Use more descriptive variable names in MetadataWatcher collect. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues caused during rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix missing metric add and unneeded config code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix metrics and docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Replace assert with require Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Bring back max_samples_per_send metric Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-11-19 20:53:03 +05:30
Brian Brazil	ebe0da7a72	Protect sp.loops from concurrent access. (#8176 ) Manager.reload takes the mutex that would make it safe, however releases it before the goroutines spawned are finished with it. Thus more explicit locking of scrapePool.Sync/stop/reload is needed. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-11-12 16:06:25 +00:00
Julien Pivotto	6c56a1faaa	Testify: move to require (#8122 ) * Testify: move to require Moving testify to require to fail tests early in case of errors. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * More moves Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-29 09:43:23 +00:00
Julien Pivotto	1282d1b39c	Refactor test assertions (#8110 ) * Refactor test assertions This pull request gets rid of assert.True where possible to use fine-grained assertions. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-27 11:06:53 +01:00
Brian Brazil	3f8e51738c	More granular locking for scrapeLoop. (#8104 ) Don't lock for all of Sync/stop/reload as that holds up /metrics and the UI when they want a list of active/dropped targets. Instead take advantage of the fact that Sync/stop/reload cannot be called concurrently by the scrape Manager and lock just on the targets themselves. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-10-26 14:46:20 +00:00
Julien Pivotto	4e5b1722b3	Move away from testutil, refactor imports (#8087 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-22 11:00:08 +02:00
Julien Pivotto	be5ba1a62d	Fix wordings Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 21:44:36 +02:00
Julien Pivotto	671f7c66e5	Adjust comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 18:28:02 +02:00
Julien Pivotto	627ff84599	Adjust flag Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 18:25:52 +02:00
Julien Pivotto	536dfb6234	Add an experimental, hidden flag Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-07 17:31:46 +02:00
Julien Pivotto	b90c7a55da	Simplify logic Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-06 21:17:16 +02:00
Julien Pivotto	ccc1df3140	Fix comment Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-06 13:48:24 +02:00
Julien Pivotto	98e14611a5	Move the tolerance logic in the loop function. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-05 18:20:10 +02:00
Julien Pivotto	6544f95403	Introduce timestamp tolerance in scrapes Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-10-05 18:20:10 +02:00
Julien Pivotto	6f13c60219	Scrape: Test that deduplicated targets are started (#7975 ) This PR test that de-duplicated targets are actually started. It is a unit test for this line of code: `072b9649a3/scrape/scrape.go (L457)` which is working and necessary but was not tested yet. It also tests that scrapes are started in the normal way, in the targets limit test. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-09-30 20:21:32 +02:00
iurii	bd53b5ff37	Unnecessary go routine spawn. (#7879 ) * Unnecessary go routine spawn. * Remove unnecessary local variable creation. Signed-off-by: iurii <iurii@coins.ph> Co-authored-by: iurii <iurii@coins.ph>	2020-09-02 16:26:42 +01:00
Andy Bursavich	4e6a94a27d	Invert service discovery dependencies (#7701 ) This also fixes a bug in query_log_file, which now is relative to the config file like all other paths. Signed-off-by: Andy Bursavich <abursavich@gmail.com>	2020-08-20 13:48:26 +01:00
Julien Pivotto	64236cf9e8	Use SAN in test certificate (#7789 ) go 1.15 deprecated the common name verification. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-12 23:15:38 +02:00
Julien Pivotto	2899773b01	Do not stop scrapes in progress during reload (#7752 ) * Do not stop scrapes in progress during reload. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-08-07 15:58:16 +02:00
johncming	5578c96307	scrape: fix typo. (#7712 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-08-01 09:56:21 +01:00

355 commits