It is now also used in label matching, so the name of the metric
changed from `prometheus_local_storage_invalid_preload_requests_total`
to `non_existent_series_matches_total'.
Only return an error where callers are doing something with it except
simply logging and ignoring.
All the errors touched in this commit flag the storage as dirty
anyway, and that fact is logged anyway. So most of what is being
removed here is just log spam.
As discussed earlier, the class of errors that flags the storage as
dirty signals fundamental corruption, no even bubbling up a one-time
warning to the user (e.g. about incomplete results) isn't helping much
because _anything_ happening in the storage has to be doubted from
that point on (and in fact retroactively into the past, too). Flagging
the storage dirty, and alerting on it (plus marking the state in the
web UI) is the only way I can see right now.
As a byproduct, I cleaned up the setDirty method a bit and improved
the logged errors.
WIP: This needs more tests.
It now gets a from and through value, which it may opportunistically
use to optimize the retrieval. With possible future range indices,
this could be used in a very efficient way. This change merely applies
some easy checks, which should nevertheless solve the use case of
heavy rule evaluations on servers with a lot of series churn.
Idea is the following:
- Only archive series that are at least as old as the headChunkTimeout
(which was already extremely unlikely to happen).
- Then maintain a high watermark for the last archival, i.e. no
archived series has a sample more recent than that watermark.
- Any query that doesn't reach to a time before that watermark doesn't
have to touch the archive index at all. (A production server at
Soundcloud with the aforementioned series churn and heavy rule
evaluations spends 50% of its CPU time in archive index
lookups. Since rule evaluations usually only touch very recent
values, most of those lookup should disappear with this change.)
- Federation with a very broad label matcher will profit from this,
too.
As a byproduct, the un-needed MetricForFingerprint method was removed
from the Storage interface.
This fixes the case where a target provider closes the update
channel and exits before the context is canceled.
This should only be true for the static provider but it's safer
to generally handle this case.