This commit introduces to Prometheus a batch database sample curator,
which corroborates the high watermarks for sample series against the
curation watermark table to see whether a curator of a given type
needs to be run.
The curator is an abstract executor, which runs various curation
strategies across the database. It remarks the progress for each
type of curation processor that runs for a given sample series.
A curation procesor is responsible for effectuating the underlying
batch changes that are request. In this commit, we introduce the
CompactionProcessor, which takes several bits of runtime metadata and
combine sparse sample entries in the database together to form larger
groups. For instance, for a given series it would be possible to
have the curator effectuate the following grouping:
- Samples Older than Two Weeks: Grouped into Bunches of 10000
- Samples Older than One Week: Grouped into Bunches of 1000
- Samples Older than One Day: Grouped into Bunches of 100
- Samples Older than One Hour: Grouped into Bunches of 10
The benefits hereof of such a compaction are 1. a smaller search
space in the database keyspace, 2. better employment of compression
for repetious values, and 3. reduced seek times.
Primary changes:
* Strictly typed unmarshalling of metric values
* Schema types are contained by the processor (no "type entity002")
Minor changes:
* Added ProcessorFunc type for expressing processors as simple
functions.
* Added non-destructive `Merge` method to `model.LabelSet`
The Protocol Buffer compiler supports generating a machine-readable
descriptor file encoded as a provided Protocol Buffer message type,
which can be used to decode messages that have been encoded with it
after-the-fact. The generated descriptor also bundles in dependent
message types.
We can use this to perform forensics on old Prometheus clients, if
necessary.
Go's time.Time represents time as UTC in its fundamental data type.
That said, when using ``time.Unix(...)``, it sets the zone for the
time representation to the local. Unfortunately with diagnosis and
our tests, it is a PITA to jump between various zones, even though
the serialized version remains the same.
To keep things easy, all places where times are generated or read
are converted into UTC. These conversions are cheap, for
``Time.In`` merely changes a pointer reference in the struct,
nothing more. This enables me to diagnose test failures with fixture
data very easily.
The curator work can be done easier if dto.SampleKey is no longer
directly accessed but rather has a higher level type around it that
captures a certain modicum of business logic. This doesn't look
terribly interesting today, but it will get more so.
The curator doesn't do anything yet; rather, this is the type
definition including the anciliary testing scaffold.
Improve Makefile and Git developer experience.
The top-level Makefile was a bit overloaded in terms of generation of
assets and their management. This has been offloaded into separate
Makefiles.
The Git developer experience sucked due to lack of .gitignore
policies.
Also: Fix faulty skiplist naming from old merge.
The old system relies off of super-careful notion that the serialized
form of a Protocol Buffer should be used for fingerprint formulation.
Of course this is both wrong and inefficient. This commit breaks
ground for swapping to a pure attribute-oriented digest.
Kill LevelDB watermarks due to redundancy.
General interface documentation has begun.
Creating custom types for the model to prevent errors down the
road.
Renaming of components for easier comprehension.
Exposition of interface in LevelDB.
Slew of simple refactorings.