* Write exemplars to the WAL and send them over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update example for exemplars, print data in a more obvious format.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add metrics for remote write of exemplars.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix incorrect slices passed to send in remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* We need to unregister the new metrics.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Order of exemplar append vs write exemplar to WAL needs to change.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Condense sample/exemplar delivery tests to parameterized sub-tests
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename test methods for clarity now that they also handle exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename counter variable. Fix instances where metrics were not updated correctly
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Add exemplars to LoadWAL benchmark
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* last exemplars timestamp metric needs to convert value to seconds with
ms precision
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Process exemplar records in a separate go routine when loading the WAL.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Regenerate types proto with comments, update protoc version again.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Put remote write of exemplars behind a feature flag.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some of Ganesh's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Move exemplar remote write feature flag to a config file field.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address Bartek's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more reivew comments from Ganesh.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add exemplar total label length check.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address a few last review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
* Added test to reproduce panic on TSDB head chunks truncated while querying
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Added test for Querier too
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Stop the bleed on mmap-ed head chunks panic
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Lower memory pressure in tests to ensure it doesn't OOM
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Skip TestQuerier_ShouldNotPanicIfHeadChunkIsTruncatedWhileReadingQueriedChunks
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Experiment to not trigger runtime.GC() continuously
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Try to fix test in CI
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* Do not call runtime.GC() at all
Signed-off-by: Marco Pracucci <marco@pracucci.com>
* I have no idea why it's failing in CI, skipping tests
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Snappy cannot encode records larger than ~3.7 GB and will panic if an
encoding is attempted. Check to make sure that the record is smaller
than this before encoding.
In the future, we could improve this behavior to still compress large
records (or break them up into smaller records), but this avoids the
panic for users with very large single scrape targets.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* scrape: add label limits per scrape
Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.
The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.
The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: add metrics and alert to label limits
Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: apply label limits to __name__ label
Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
* scrape: remove label limits gauges and refactor
Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
I think we should increment the metric here, we're setting the rule
health anyways. This means even if the "evaluation" suceeded, none of
the samples made it to storage.
This is a simplified solution to: https://github.com/prometheus/prometheus/pull/8410/
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
The merge of 8761 did not catch that the secrets were off by one
because it was not rebased on top of 8693.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Clarify documentation for label_replace() because of ambiguities
between label keys and label values.
Signed-off-by: Matthew Smedberg <matthew.smedberg@gmail.com>
* Add range query test cases
This includes a couple of failing ones that double count some points due
to the iterator seek bug.
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add Seek() implementation for memSafeIterator
Previously, calling memSafeIterator.Seek() would call the Seek() method
on its embedded iterator. This was causing the embedded iterator and the
memSafeIterator to get out of sync because when the embedded Seek()
moved to the next element of the embedded iterator, memSafeIterator
didn't "know" about it. memSafeIterator has to "know" when the embedded
iterator has moved to be able to work out when it should be reading from
its buffer rather than the embedded iterator.
Used same logic as for xorIterator.Seek() (which in runtime is used as
the embedded iterator) - return false if the iterator has an error and
try to move to next element if the required time hasn't been reached, or
if no elements have been read yet. The memSafeIterator.Next() method is
being called so memSafeIterator.i is always accurate.
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add tsdb package test
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>