Implement histogram statistics decoder
This commit speeds up histogram_count and histogram_sum
functions on native histograms. The idea is to have separate decoders which can be
used by the engine to only read count/sum values from histogram objects. This should help
with reducing allocations when decoding histograms, as well as with speeding up aggregations
like sum since they will be done on floats and not on histogram objects.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
---------
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
This can give a more precise result, by keeping a separate running
compensation value to accumulate small errors.
See https://en.wikipedia.org/wiki/Kahan_summation_algorithm
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The function `rangeEvalTimestampFunctionOverVectorSelector` appeared to be checking histogram size, however the value it used was always 0 due to subtle variable shadowing.
However we don't need to pass sample values to the `timestamp` function, since the latter only cares about timestamps. This also affects peak sample count in statistics, since we are no longer copying histogram samples.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
This saves memory in other kinds of aggregation.
We don't need `orderedResult` in `aggregationCountValues`; the ordering
is not guaranteed.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
They aggregate results in different ways.
topk/bottomk don't consider histograms so can simplify data collection.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This is a cleaner split of responsibilities.
We now check the sample count after calling rangeEvalAgg.
Changed re-use of samples to use `Clone` and `defer`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Pass it as a float64 not as interface{}.
Make k a simple int, since that is the parameter to make().
Pull invalid quantile warning out of the loop.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The new function `rangeEvalAgg` is mostly a copy of `rangeEval`, but
without `initSeries` which we don't need and inlining the callback to
`aggregation()`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The existing aggregation function is very long and covers very different
cases.
`aggregationCountValues` is just for `count_values`, which differs from
other aggregations in that it outputs as many series per group as there
are values in the input.
Remove the top-level switch on string parameter type; use the same `Op`
check there as elswehere.
Pull checking parameters out to caller, where it is only executed once.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* add custom buckets to native histogram model
* simple copy for custom bounds
* return errors for unsupported add/sub operations
* add test cases for string and update appendhistogram in scrape to account for new schema
* check fields which are supposed to be unused but may affect results in equals
* allow appending custom buckets histograms regardless of max schema
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>