Commit graph

301 commits

Author SHA1 Message Date
Julius Volz 9d5c367745 Fix incorrect interval op advancement.
This fixes a bug where an interval op might advance too far past the end
of the currently extracted chunk, effectively skipping over relevant
(to-be-extracted) values in the subsequent chunk. The result: missing
samples at chunk boundaries in the resulting view.

Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8
2014-03-18 16:22:50 +01:00
Julius Volz cc04238a85 Switch to new "__name__" metric name label.
This also fixes the compaction test, which before worked only because
the input sample sorting was accidentally equal to the resulting on-disk
sample sorting.

Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
2014-03-14 16:52:37 +01:00
Bjoern Rabenstein c3b282bd14 Add regression tests for 'loop until op is consumed' bug.
- Most of this is the actual regression test in tiered_test.go.

- Working on that regression tests uncovered problems in
  tiered_test.go that are fixed in this commit.

- The 'op.consumed = false' line added to freelist.go was actually not
  fixing a bug. Instead, there was no bug at all. So this commit
  removes that line again, but adds a regression test to make sure
  that the assumed bug is indeed not there (cf. freelist_test.go).

- Removed more code duplication in operation.go (following the same
  approach as before, i.e. embedding op type A into op type B if
  everything in A is the same as in B with the exception of String()
  and ExtractSample()). (This change make struct literals for ops more
  clunky, but that only affects tests. No code change whatsoever was
  necessary in the actual code after this refactoring.)

- Fix another op leak in tiered.go.

Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517
2014-03-12 18:40:24 +01:00
Julius Volz 86fc13a52e Convert metric.Values to slice of values.
The initial impetus for this was that it made unmarshalling sample
values much faster.

Other relevant benchmark changes in ns/op:

Benchmark                                 old        new   speedup
==================================================================
BenchmarkMarshal                       179170     127996     1.4x
BenchmarkUnmarshal                     404984     132186     3.1x

BenchmarkMemoryGetValueAtTime           57801      50050     1.2x
BenchmarkMemoryGetBoundaryValues        64496      53194     1.2x
BenchmarkMemoryGetRangeValues           66585      54065     1.2x

BenchmarkStreamAdd                       45.0       75.3     0.6x
BenchmarkAppendSample1                   1157       1587     0.7x
BenchmarkAppendSample10                  4090       4284     0.95x
BenchmarkAppendSample100                45660      44066     1.0x
BenchmarkAppendSample1000              579084     582380     1.0x
BenchmarkMemoryAppendRepeatingValues 22796594   22005502     1.0x

Overall, this gives us good speedups in the areas where they matter
most: decoding values from disk and accessing the memory storage (which
is also used for views).

Some of the smaller append examples take minimally longer, but the cost
seems to get amortized over larger appends, so I'm not worried about
these. Also, we're currently not bottlenecked on the write path and have
plenty of other optimizations available in that area if it becomes
necessary.

Memory allocations during appends don't change measurably at all.

Change-Id: I7dc7394edea09506976765551f35b138518db9e8
2014-03-11 18:23:37 +01:00
Julius Volz a7d0973fe3 Add version field to LevelDB sample format.
This doesn't add complex discriminator logic yet, but adds a single
version byte to the beginning of each samples chunk. If we ever need to
change the disk format again, this will make it easy to do so without
having to wipe the entire database.

Change-Id: I60c39274256f790bc2da83167a1effaa174588fe
2014-03-11 14:08:40 +01:00
Julius Volz 1eee448bc1 Store samples in custom binary encoding.
This has been shown to provide immense decoding speed benefits.

See also:

https://groups.google.com/forum/#!topic/prometheus-developers/FeGl_qzGrYs

Change-Id: I7d45b4650e44ddecaa91dad9d7fdb3cd0b9f15fe
2014-03-09 22:31:38 +01:00
Julius Volz c2a2a20f36 Remove obsolete scanjobs timer.
Change-Id: Ifb29b4d93c9c1c6cacb8b098d5237866925c9fac
2014-03-07 17:10:28 +01:00
Julius Volz dd4892dcad Ensure no ops are leaked in renderView().
Change-Id: I6970a9098be305fcd010d46443b040d864d9740a
2014-03-07 14:33:13 +01:00
Julius Volz 5745ce0a60 Fixups for single-op-per-fingerprint view rendering.
Change-Id: Ie496d4529b65a3819c6042f43d7cf99e0e1ac60b
2014-03-07 00:54:28 +01:00
Björn Rabenstein 8b43497002 Merge "Fix memory series indexing bug." 2014-03-06 11:53:10 +01:00
Björn Rabenstein 0bb33b6525 Merge "Remove unused labelname -> fingerprints index." 2014-03-06 11:40:09 +01:00
Julius Volz d6827b6898 Fix memory series indexing bug.
This fixes https://github.com/prometheus/prometheus/issues/381.

For any stale series we dropped from memory, this bug caused us to also drop
any other series from the labelpair->fingerprints memory index if they had any
label/value-pairs in common with the intentionally dropped series.

To fix this issue more easily, I converted the labelpair->fingerprints index
map values to a utility.Set of clientmodel.Fingerprints. This makes handling
this index much easier in general.

Change-Id: If5e81e202e8c542261bbd9797aa1257376c5c074
2014-03-06 01:23:22 +01:00
Julius Volz c6013ff309 Remove unused labelname -> fingerprints index.
Change-Id: Ie4ccea3a230532e670030ca64ede9435b1b3e506
2014-03-05 23:49:33 +01:00
Bjoern Rabenstein 9ea9189dd1 Remove the multi-op-per-fingerprint capability.
Currently, rendering a view is capable of handling multiple ops for
the same fingerprint efficiently. However, this capability requires a
lot of complexity in the code, which we are not using at all because
the way we assemble a viewRequest will never have more than one
operation per fingerprint.

This commit weeds out the said capability, along with all the code
needed for it. It is still possible to have more than one operation
for the same fingerprint, it will just be handled in a less efficient
way (as proven by the unit tests).

As a result, scanjob.go could be removed entirely.

This commit also contains a few related refactorings and removals of
dead code in operation.go, view,go, and freelist.go. Also, the
docstrings received some love.

Change-Id: I032b976e0880151c3f3fdb3234fb65e484f0e2e5
2014-03-04 16:29:56 +01:00
Bjoern Rabenstein e11e8c7a23 Unify LevelDB.*Options.
We have seven different types all called like LevelDB.*Options.  One
of them is the plain LevelDBOptions. All others are just wrapping that
type without adding anything except clunkier handling.

If there ever was a plan to add more specific options to the various
LevelDB.*Options types, history has proven that nothing like that is
going to happen anytime soon.

To keep the code a bit shorter and more focused on the real (quite
significant) complexities we have to deal with here, this commit
reduces all uses of LevelDBOptions to the actual LevelDBOptions type.

1576 fewer characters to read...

Change-Id: I3d7a2b7ffed78b337aa37f812c53c058329ecaa6
2014-02-27 16:03:58 +01:00
Bjoern Rabenstein 6bc083f38b Major code cleanup in storage.
- Mostly docstring fixed/additions.
  (Please review these carefully, since most of them were missing, I
  had to guess them from an outsider's perspective. (Which on the
  other hand proves how desperately required many of these docstrings
  are.))

- Removed all uses of new(...) to meet our own style guide (draft).

- Fixed all other 'go vet' and 'golint' issues (except those that are
  not fixable (i.e. caused by bugs in or by design of 'go vet' and
  'golint')).

- Some trivial refactorings, like reorder functions, minor renames, ...

- Some slightly less trivial refactoring, mostly to reduce code
  duplication by embedding types instead of writing many explicit
  forwarders.

- Cleaned up the interface structure a bit. (Most significant probably
  the removal of the View-like methods from MetricPersistenc. Now they
  are only in View and not duplicated anymore.)

- Removed dead code. (Probably not all of it, but it's a first
  step...)

- Fixed a leftover in storage/metric/end_to_end_test.go (that made
  some parts of the code never execute (incidentally, those parts
  were broken (and I fixed them, too))).

Change-Id: Ibcac069940d118a88f783314f5b4595dce6641d5
2014-02-27 15:22:37 +01:00
Björn Rabenstein 59febe771a Merge "Minor code cleanups." 2014-02-13 15:29:16 +01:00
Julius Volz c4adfc4f25 Minor code cleanups.
Change-Id: Ib3729cf38b107b7f2186ccf410a745e0472e3630
2014-02-13 15:24:43 +01:00
Julius Volz 8cadae6102 Merge "Fix LevelDB closing order." 2014-02-03 23:22:30 +01:00
Julius Volz 94666e20b7 Minor test error reporting cleanup.
Change-Id: Ie11c16b4e60de7c179c6d2a86e063f4432e2000f
2014-02-03 12:27:01 +01:00
Julius Volz fd2158e746 Store copy of metric during fingerprint caching
Problem description:
====================
If a rule evaluation referencing a metric/timeseries M happens at a time
when M doesn't have a memory timeseries yet, looking up the fingerprint
for M (via TieredStorage.GetMetricForFingerprint()) will create a new
Metric object for M which gets both: a) attached to a new empty memory
timeseries (so we don't have to ask disk for the Metric's fingerprint
next time), and b) returned to the rule evaluation layer. However, the
rule evaluation layer replaces the name label (and possibly other
labels) of the metric with the name of the recorded rule.  Since both
the rule evaluator and the memory storage share a reference to the same
Metric object, the original memory timeseries will now also be
incorrectly renamed.

Fix:
====
Instead of storing a reference to a shared metric object, take a copy of
the object when creating an empty memory timeseries for caching
purposes.

Change-Id: I9f2172696c16c10b377e6708553a46ef29390f1e
2014-02-02 17:11:08 +01:00
Julius Volz 718ad2224b Fix LevelDB closing order.
The storage itself should be closed before any of the objects passed into it
are closed (otherwise closing the storage can randomly freeze). Defers are
executed in reverse order, so closing the storage should be the last of the
defer statements.

Change-Id: Id920318b876f5b94767ed48c81221b3456770620
2014-01-28 15:16:06 +01:00
Bjoern Rabenstein c342ad33a0 Fix OperatorError.
This used to work with Go 1.1, but only because of a compiler bug.
The bug is fixed in Go 1.2, so we have to fix our code now.

Change-Id: I5a9f3a15878afd750e848be33e90b05f3aa055e1
2014-01-21 16:49:51 +01:00
Julius Volz d5ef0c64dc Merge "Add optional sample replication to OpenTSDB." 2014-01-08 17:45:08 +01:00
Julius Volz 61d26e8445 Add optional sample replication to OpenTSDB.
Prometheus needs long-term storage. Since we don't have enough resources
to build our own timeseries storage from scratch ontop of Riak,
Cassandra or a similar distributed datastore at the moment, we're
planning on using OpenTSDB as long-term storage for Prometheus. It's
data model is roughly compatible with that of Prometheus, with some
caveats.

As a first step, this adds write-only replication from Prometheus to
OpenTSDB, with the following things worth noting:

1)
I tried to keep the integration lightweight, meaning that anything
related to OpenTSDB is isolated to its own package and only main knows
about it (essentially it tees all samples to both the existing storage
and TSDB). It's not touching the existing TieredStorage at all to avoid
more complexity in that area. This might change in the future,
especially if we decide to implement a read path for OpenTSDB through
Prometheus as well.

2)
Backpressure while sending to OpenTSDB is handled by simply dropping
samples on the floor when the in-memory queue of samples destined for
OpenTSDB runs full.  Prometheus also only attempts to send samples once,
rather than implementing a complex retry algorithm. Thus, replication to
OpenTSDB is best-effort for now.  If needed, this may be extended in the
future.

3)
Samples are sent in batches of limited size to OpenTSDB. The optimal
batch size, timeout parameters, etc. may need to be adjusted in the
future.

4)
OpenTSDB has different rules for legal characters in tag (label) values.
While Prometheus allows any characters in label values, OpenTSDB limits
them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal
characters in Prometheus label values are simply replaced by an
underscore. Especially when integrating OpenTSDB with the read path in
Prometheus, we'll need to reconsider this: either we'll need to
introduce the same limitations for Prometheus labels or escape/encode
illegal characters in OpenTSDB in such a way that they are fully
decodable again when reading through Prometheus, so that corresponding
timeseries in both systems match in their labelsets.

Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5
2014-01-02 18:21:38 +01:00
Stuart Nelson 0c58e388f6 rename curation metrics to prometheus_curation
Change-Id: I6a0bf277e88ea8eb737670b7e865ae20f2cbfb91
2013-12-13 17:45:01 -05:00
Stuart Nelson 28f59edf16 Added telemetry for counting stored samples
Change-Id: I0f36f7c2738d070ca2f107fcb315f98e46803af3
2013-12-12 10:06:41 -05:00
Tobias Schmidt 6947ee9bc9 Try to create metrics root directory if missing
This change tries to be nice and create the metrics directoy first
before erroring out.

Change-Id: I72691cdc32469708cd671c6ef1fb7db55fe60430
2013-12-03 18:16:13 +07:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Julius Volz 6b7de31a3c Upgrade to LevelDB 1.14.0 to fix LevelDB bugs.
This tentatively fixes https://github.com/prometheus/prometheus/issues/368 due
to an upstream bugfix in snapshotted LevelDB iterator handling, which got fixed
in LevelDB 1.14.0:

https://code.google.com/p/leveldb/issues/detail?id=200

Change-Id: Ib0cc67b7d3dc33913a1c16736eff32ef702c63bf
2013-12-03 09:07:15 +01:00
Julius Volz db015de65b Comment and "go fmt" fixups in compaction tests.
Change-Id: Iaa0eda6a22a5caa0590bae87ff579f9ace21e80a
2013-10-30 17:06:17 +01:00
Julius Volz 51408bdfe8 Merge changes I3ffeb091,Idffefea4
* changes:
  Add chunk sanity checking to dumper tool.
  Add compaction regression tests.
2013-10-24 13:58:14 +02:00
Julius Volz 2162e57784 Merge "Fix watermarker default time / LevelDB key ordering bug." 2013-10-24 13:57:48 +02:00
Julius Volz 5e18255920 Merge "Fix chunk corruption compaction bug." 2013-10-24 13:57:31 +02:00
Julius Volz eb461a707d Add chunk sanity checking to dumper tool.
Also, move codecs/filters to common location so they can be used in subsequent
test.

Change-Id: I3ffeb09188b8f4552e42683cbc9279645f45b32e
2013-10-23 01:06:49 +02:00
Julius Volz 6ea22f2bf9 Add compaction regression tests.
This adds regression tests that catch the two error cases reported in

  https://github.com/prometheus/prometheus/issues/367

It also adds a commented-out test case for the crash in

  https://github.com/prometheus/prometheus/issues/368

but there's no fix for the latter crash yet.

Change-Id: Idffefea4ed7cc281caae660bcad2e3c13ec3bd17
2013-10-23 01:06:28 +02:00
Conor Hennessy 9a48010cec Add a check for metrics directory existence.
Previously on startup the program would just quit without stating
explicitly why.

Change-Id: I833b85eb74d2dd27cdc3f0f2e65d7bb1c42caa39
2013-10-22 20:54:34 +02:00
Julius Volz b5f6e3c90c Fix watermarker default time / LevelDB key ordering bug.
This fixes part 2) of https://github.com/prometheus/prometheus/issues/367
(uninitialized time.Time mapping to a higher LevelDB key than "normal"
timestamps).

Change-Id: Ib079974110a7b7c4757948f81fc47d3d29ae43c9
2013-10-21 14:32:21 +02:00
Julius Volz a1a97ed064 Fix chunk corruption compaction bug.
This fixes part 1) of https://github.com/prometheus/prometheus/issues/367 (the
storing of samples with the wrong fingerprint into a compacted chunk, thus
corrupting it).

Change-Id: I4c36d0d2e508e37a0aba90b8ca2ecc78ee03e3f1
2013-10-21 14:30:22 +02:00
Matt T. Proud 86fcbe5bde Retain DTO on each cycle.
Change-Id: Ifc6f68f98eacb01097771d0dbf043c98bba1d518
2013-09-05 10:14:34 +02:00
Matt T. Proud 4a87c002e8 Update low-level i'faces to reflect wireformats.
This commit fixes a critique of the old storage API design, whereby
the input parameters were always as raw bytes and never Protocol
Buffer messages that encapsulated the data, meaning every place a
read or mutation was conducted needed to manually perform said
translations on its own.  This is taxing.

Change-Id: I4786938d0d207cefb7782bd2bd96a517eead186f
2013-09-04 17:13:58 +02:00
Matt T. Proud 7910f6e863 Prevent total storage locking during memory flush.
While a hack, this change should allow us to serve queries
expeditiously during a flush operation.

Change-Id: I9a483fd1dd2b0638ab24ace960df08773c4a5079
2013-08-29 11:33:38 +02:00
Matt T. Proud 12d5e6ca5a Curation should not starve user-interactive ops.
The background curation should be staggered to ensure that disk
I/O yields to user-interactive operations in a timely manner. The
lack of routine prioritization necessitates this.

Change-Id: I9b498a74ccd933ffb856e06fedc167430e521d86
2013-08-26 19:40:55 +02:00
Matt T. Proud 2b42fd0068 Snapshot of no more frontier.
Change-Id: Icd52da3f52bfe4529829ea70b4865ed7c9f6c446
2013-08-23 17:13:58 +02:00
Matt T. Proud 7db518d3a0 Abstract high watermark cache into standard LRU.
Conflicts:
	storage/metric/memory.go
	storage/metric/tiered.go
	storage/metric/watermark.go

Change-Id: Iab2aedbd8f83dc4ce633421bd4a55990fa026b85
2013-08-19 12:26:55 +02:00
Matt T. Proud d74c2c54d4 Interfacification of stream.
Move the stream to an interface, for a number of additional changes
around it are underway.

Conflicts:
	storage/metric/memory.go

Change-Id: I4a5fc176f4a5274a64ebdb1cad52600954c463c3
2013-08-16 17:35:21 +02:00
Matt T. Proud c262907fec Kill interface cruft.
These pieces were never used and should be thusly removed.

Change-Id: I8dd151ec4c40b6d3ccffad1bb9b8b75a92e9ee37
2013-08-15 11:39:07 +02:00
Matt T. Proud b23acccea8 Kill AppendSample interface definition.
AppendSample will be repcated with AppendSamples, which will take
advantage of bulks appends.  This is a necessary step for indexing
pipeline decoupling.

Change-Id: Ia83811a87bcc89973d3b64d64b85a28710253ebc
2013-08-15 11:35:50 +02:00
Matt T. Proud aaaf3367d6 Include forgotten imports.
This fixes the build.

Change-Id: Id132f4342adb9ed20116191086f157ca7f7cf515
2013-08-14 18:52:55 +02:00
Matt T. Proud acf91f38bd Build layered indexers.
The indexers will be extracted in a short while and wrapped accordingly with
these types.

Change-Id: I4d1abda4e46117210babad5aa0d42f9ca1f6594f
2013-08-14 13:32:53 +02:00