To evenly distribute scraping load we currently rely on random
jittering. This commit hashes over the target's identity and calculates
a consistent offset. This also ensures that scrape intervals
are constantly spaced between config/target changes.
The First time is kind of trivial as we always know it when we create
a new chunkDesc.
The last time is only know when the chunk is closed, so we have to set
it at that time.
The change saves a lot of digging down into the chunk
itself. Especially the last time is relative expensive as it involves
the creation of an iterator. The first time access now doesn't require
locking, which is also a nice gain.
Although they are only in examples/tests and don't affect anything, they
could be confusing (the label has been renamed in the rest of the code a
while ago).
This has the advantage that the user doesn't need
to list all labels they want to keep (as with "by")
but without having to worry about inconsistent labels
as when there's only one time series (as with "keeping_common").
Almost all aggregation should use this rather than the existing
two options as it's much less error prone and easier to maintain
due to not having to always add in "job" plus whatever other common
job-level labels you have like "region".
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.
Logging of throttled mode has been streamlined and will create at most
one message per minute.
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).