The curator requires the existence of a curator remark table, which
stores the progress for a given curation policy. The tests for the
curator create an ad hoc table, but core Prometheus presently lacks
said table, which this commit adds.
Secondarily, the error handling for the LevelDB lifecycle functions
in the metric persistence have been wrapped into an UncertaintyGroup,
which mirrors some of the functions of sync.WaitGroup but adds error
capturing capability to the mix.
This commit introduces to Prometheus a batch database sample curator,
which corroborates the high watermarks for sample series against the
curation watermark table to see whether a curator of a given type
needs to be run.
The curator is an abstract executor, which runs various curation
strategies across the database. It remarks the progress for each
type of curation processor that runs for a given sample series.
A curation procesor is responsible for effectuating the underlying
batch changes that are request. In this commit, we introduce the
CompactionProcessor, which takes several bits of runtime metadata and
combine sparse sample entries in the database together to form larger
groups. For instance, for a given series it would be possible to
have the curator effectuate the following grouping:
- Samples Older than Two Weeks: Grouped into Bunches of 10000
- Samples Older than One Week: Grouped into Bunches of 1000
- Samples Older than One Day: Grouped into Bunches of 100
- Samples Older than One Hour: Grouped into Bunches of 10
The benefits hereof of such a compaction are 1. a smaller search
space in the database keyspace, 2. better employment of compression
for repetious values, and 3. reduced seek times.
Unfortunately ``cp`` on Darwin regards some flags as positional and
requires them to be in a specific place. The new Protocol Buffer
descriptor bundling fails on Mac OS.
The Protocol Buffer compiler supports generating a machine-readable
descriptor file encoded as a provided Protocol Buffer message type,
which can be used to decode messages that have been encoded with it
after-the-fact. The generated descriptor also bundles in dependent
message types.
We can use this to perform forensics on old Prometheus clients, if
necessary.
Go's time.Time represents time as UTC in its fundamental data type.
That said, when using ``time.Unix(...)``, it sets the zone for the
time representation to the local. Unfortunately with diagnosis and
our tests, it is a PITA to jump between various zones, even though
the serialized version remains the same.
To keep things easy, all places where times are generated or read
are converted into UTC. These conversions are cheap, for
``Time.In`` merely changes a pointer reference in the struct,
nothing more. This enables me to diagnose test failures with fixture
data very easily.
For the forthcoming Curator, we don't record timezone information in
the samples, nor do we in the curation remarks. All times are
recorded UTC. That said, for the test environment to better match
production, the special instant should be in UTC.
The curator work can be done easier if dto.SampleKey is no longer
directly accessed but rather has a higher level type around it that
captures a certain modicum of business logic. This doesn't look
terribly interesting today, but it will get more so.