* [ENHANCEMENT] Alerts: remove metrics for removed Alertmanagers
So they don't continue to report stale values.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This test keeps timing out on our arm64 CI server, it does use a very slow timeout and that 5ms doesn't seem to be enough.
But it 10x.
Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
* Add draining of queued notifications to `notifier.Manager`
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Update docs
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address PR feedback
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Add more logging
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Address offline feedback: remove timeout
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Ensure stopping takes priority over further processing, make tests more robust
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Make channel unbuffered
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Update docs
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Fix race in test
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Remove unnecessary context
Signed-off-by: Charles Korn <charles.korn@grafana.com>
* Make Stop safe to call multiple times
Signed-off-by: Charles Korn <charles.korn@grafana.com>
---------
Signed-off-by: Charles Korn <charles.korn@grafana.com>
This is done to prevent the latter operation from blocking/starving the former, as previously, the `tsets` channel was consumed by the same goroutine that consumes and feeds the buffered `n.more` channel, the `tsets` channel was less likely to be ready as it's unbuffered and only fed every `SDManager.updatert` seconds.
See https://github.com/prometheus/prometheus/issues/13676 and https://github.com/prometheus/prometheus/issues/8768
The synchronization with the sendLoop goroutine is managed through the n.mtx mutex.
This uses a similar approach than scrape manager's efbd6e41c5/scrape/manager.go (L115-L117)
The old TestHangingNotifier was replaced by the new one to more closely reflect reality.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
to show "targets groups update" starvation when the notifications queue is full and an Alertmanager
is down.
The existing `TestHangingNotifier` that was added in https://github.com/prometheus/prometheus/pull/10948 doesn't really reflect the reality as the SD changes are manually fed into `syncCh` in a continuous way, whereas in reality, updates are only resent every `updatert`.
The test added here sets up an SD manager and links it to the notifier. The SD changes will be triggered by that manager as it's done in reality.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Ethan Hunter <ehunter@hudson-trading.com>
When all alerts were dropped after alert relabeling, the `sendAll()`
function didn't release the lock properly which created a deadlock with
the Alertmanager target discovery.
In addition, the commit detects early when there are no Alertmanager
endpoint to notify to avoid unnecessary work.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Avoids possible false sharing between loops.
Plausibly there is no problem in the current code, but it's easy enough to write it more safely.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Addresses: #12536
This commit adds support for configuring sigv4 to an
`alertmanager_config`. Based heavily on the sigv4 work in the remote
write client.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Use a label builder instead of a slice when creating labels for the
target alertmanagers. This can be passed directly to
`relabel.ProcessBuilder`, skipping a copy.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
It took a `Labels` where the memory could be re-used, but in practice
this hardly ever benefitted. Especially after converting `relabel.Process`
to `relabel.ProcessBuilder`.
Comparing the parameter to `nil` was a bug; `EmptyLabels` is not `nil`
so the slice was reallocated multiple times by `append`.
Lastly `Builder.Labels()` now estimates that the final size will depend
on labels added and deleted.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: Add benchmark
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: re-use Builder across relabels
Saves memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* labels.Builder: allow re-use of result slice
This reduces memory allocations where the caller has a suitable slice available.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* model/relabel: re-use source values slice
To reduce memory allocations.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Unwind one change causing test failures
Restore original behaviour in PopulateLabels, where we must not overwrite the input set.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* relabel: simplify values optimisation
Use a stack-based array for up to 16 source labels, which will be the
vast majority of cases.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* lint
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir
Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
This creates a new `model` directory and moves all data-model related
packages over there:
exemplar labels relabel rulefmt textparse timestamp value
All the others are more or less utilities and have been moved to `util`:
gate logging modetimevfs pool runtime
Signed-off-by: beorn7 <beorn@grafana.com>
We are re-enabling HTTP 2 again. There has been a few bugfixes upstream
in go, and we have also enabled ReadIdleTimeout.
Fix#7588Fix#9068
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix: Use json.Unmarshal() instead of json.Decoder
See https://ahmet.im/blog/golang-json-decoder-pitfalls/
json.Decoder is for JSON streams, not single JSON objects / bodies.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert modifications to targetgroup parsing
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Refactor test assertions
This pull request gets rid of assert.True where possible to use
fine-grained assertions.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.
Signed-off-by: Andy Bursavich <abursavich@gmail.com>
* storage: Replace usage of sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* tsdb: Replace usage of sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* web: Replace usage of sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* notifier: Replace usage of sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* cmd: Replace usage of sync/atomic with uber-go/atomic
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* scripts: Verify that we are not using restricted packages
It checks that we are not directly importing 'sync/atomic'.
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* Reorganise imports in blocks
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* notifier/test: Apply PR suggestions
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* storage/remote: avoid storing references on newEntry
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* Revert "scripts: Verify that we are not using restricted packages"
This reverts commit 278d32748e.
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
* web: Group imports accordingly
Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>