Rule Concurrency: Test safe abort of rule evaluations (#15797)
Some checks are pending
buf.build / lint and publish (push) Waiting to run
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run

This test was added in the Grafana fork a while ago: https://github.com/grafana/mimir-prometheus/pull/714 and has been helpful to make sure we can safely terminate rule evaluations early
The new rule evaluation logic (done here: https://github.com/prometheus/prometheus/pull/15681) does not have the bug, but the test was useful to verify that

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
This commit is contained in:
Julien Duchesne 2025-01-08 11:32:48 -05:00 committed by GitHub
parent 1ea9b72997
commit a768a3b95e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -2326,6 +2326,41 @@ func TestUpdateWhenStopped(t *testing.T) {
require.NoError(t, err)
}
func TestGroup_Eval_RaceConditionOnStoppingGroupEvaluationWhileRulesAreEvaluatedConcurrently(t *testing.T) {
storage := teststorage.New(t)
t.Cleanup(func() { storage.Close() })
var (
inflightQueries atomic.Int32
maxInflight atomic.Int32
maxConcurrency int64 = 10
)
files := []string{"fixtures/rules_multiple_groups.yaml"}
files2 := []string{"fixtures/rules.yaml"}
ruleManager := NewManager(optsFactory(storage, &maxInflight, &inflightQueries, maxConcurrency))
go func() {
ruleManager.Run()
}()
<-ruleManager.block
// Update the group a decent number of times to simulate start and stopping in the middle of an evaluation.
for i := 0; i < 10; i++ {
err := ruleManager.Update(time.Second, files, labels.EmptyLabels(), "", nil)
require.NoError(t, err)
// Wait half of the query execution duration and then change the rule groups loaded by the manager
// so that the previous rule group will be interrupted while the query is executing.
time.Sleep(artificialDelay / 2)
err = ruleManager.Update(time.Second, files2, labels.EmptyLabels(), "", nil)
require.NoError(t, err)
}
ruleManager.Stop()
}
const artificialDelay = 250 * time.Millisecond
func optsFactory(storage storage.Storage, maxInflight, inflightQueries *atomic.Int32, maxConcurrent int64) *ManagerOptions {