Would love to propose @npazosmendez and @alexgreenbank to help us own remote storage (especially write).
Nico & Alex are not yet Prometheus maintainer, but I think that’s fine.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
[ENHANCEMENT] Docker SD: add MatchFirstNetwork for containers with multiple networks
Fixes docker sd service misssing in shared mode and deduplicate targets by network
* Add draining of queued notifications to `notifier.Manager`
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Update docs
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address PR feedback
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Add more logging
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address offline feedback: remove timeout
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Ensure stopping takes priority over further processing, make tests more robust
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Make channel unbuffered
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Update docs
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Fix race in test
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Remove unnecessary context
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Make Stop safe to call multiple times
Signed-off-by: Charles Korn <charles.korn@grafana.com>
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* expose hook for block querier
Signed-off-by: Ben Ye <benye@amazon.com>
* update comment
Signed-off-by: Ben Ye <benye@amazon.com>
* use defined type
Signed-off-by: Ben Ye <benye@amazon.com>
---------
Signed-off-by: Ben Ye <benye@amazon.com>
* fix: try to reproduce the bug from https://github.com/prometheus/prometheus/issues/13979 in a test case
Signed-off-by: David Vavra <sevenood@gmail.com>
* fix: data corruption in remote write if max_sample_age is applied
Signed-off-by: David Vavra <sevenood@gmail.com>
* add benchmark for buildTimeSeries which does the filtering
Signed-off-by: Callum Styan <callumstyan@gmail.com>
---------
Signed-off-by: David Vavra <sevenood@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: David Vavra <sevenood@gmail.com>
Co-authored-by: Callum Styan <callumstyan@gmail.com>
goyacc is installed using 'install-goyacc' and ends up in GOPATH/bin.
GOPATH isn't usually part of standard PATH, so when make tries to run goyacc it fails, unless PATH includes GOPATH/bin.
Other Go tools, like golangci-lint, are also installed via go install into GOPATH/bin but they run correctly because make invocations for them use FIRST_GOPATH viriable to use full path.
Call goyacc using FIRST_GOPATH/bin as well so it works without GOPATH being included in PATH.
Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
* discovery: aws: expose Primary IPv6 addresses as label
Add __meta_ec2_primary_ipv6_addresses label. This label contains the
Primary IPv6 address for every ENI attached to the EC2 instance. It is
ordered by the DeviceIndex and the missing elements (interface without
Primary IPv6 address) are kept in the list.
---------
Signed-off-by: Arpad Kunszt <akunszt@hiya.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
that was introcued in https://github.com/prometheus/prometheus/pull/13554
The same motivation for adding the metric applies: To avoid silent SD failures,
as existing logs may not be regularly checked and can be missed.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
This is done to prevent the latter operation from blocking/starving the former, as previously, the `tsets` channel was consumed by the same goroutine that consumes and feeds the buffered `n.more` channel, the `tsets` channel was less likely to be ready as it's unbuffered and only fed every `SDManager.updatert` seconds.
See https://github.com/prometheus/prometheus/issues/13676 and https://github.com/prometheus/prometheus/issues/8768
The synchronization with the sendLoop goroutine is managed through the n.mtx mutex.
This uses a similar approach than scrape manager's efbd6e41c5/scrape/manager.go (L115-L117)
The old TestHangingNotifier was replaced by the new one to more closely reflect reality.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
to show "targets groups update" starvation when the notifications queue is full and an Alertmanager
is down.
The existing `TestHangingNotifier` that was added in https://github.com/prometheus/prometheus/pull/10948 doesn't really reflect the reality as the SD changes are manually fed into `syncCh` in a continuous way, whereas in reality, updates are only resent every `updatert`.
The test added here sets up an SD manager and links it to the notifier. The SD changes will be triggered by that manager as it's done in reality.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Ethan Hunter <ehunter@hudson-trading.com>
* clarify backup requirements for storage
After reading this (again) recently, I was under the impression that our backup strategy ("just throw Bacula at it") was just not good enough and that our backups were inconsistent. I filed [an issue internally][41627] about this because of that concern.
But reading a conversation with @SuperQ on IRC, I came under the impression that only the WAL files would be lost. This is an attempt at documenting this more clearly.
[41627]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41627
---------
Signed-off-by: anarcat <anarcat@users.noreply.github.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Pass affected labels to MemPostings.Delete
As suggested by @bboreham, we can track the labels of the deleted series
and avoid iterating through all the label/value combinations.
This looks much faster on the MemPostings.Delete call. We don't have a
benchmark on stripeSeries.gc() where we'll pay the price of iterating
the labels of each one of the deleted series.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>