Commit graph

618 commits

Author SHA1 Message Date
beorn7 a052d32609 Comment improvement. 2015-04-14 10:49:43 +02:00
beorn7 66fc61f9b7 Make bufPool a member of the persistence struct. 2015-04-14 10:43:09 +02:00
beorn7 b02d900e61 Improve chunk and chunkDesc loading.
Also, clean up some things in the code (especially introduction of the
chunkLenWithHeader constant to avoid the same expression all over the place).

Benchmark results:

BEFORE
BenchmarkLoadChunksSequentially     5000            283580 ns/op          152143 B/op        312 allocs/op
BenchmarkLoadChunksRandomly        20000             82936 ns/op           39310 B/op         99 allocs/op
BenchmarkLoadChunkDescs            10000            110833 ns/op           15092 B/op        345 allocs/op

AFTER
BenchmarkLoadChunksSequentially    10000            146785 ns/op          152285 B/op        315 allocs/op
BenchmarkLoadChunksRandomly        20000             67598 ns/op           39438 B/op        103 allocs/op
BenchmarkLoadChunkDescs            20000             99631 ns/op           12636 B/op        192 allocs/op

Note that everything is obviously loaded from the page cache (as the
benchmark runs thousands of times with very small series files). In a
real-world scenario, I expect a larger impact, as the disk operations
will more often actually hit the disk. To load ~50 sequential chunks,
this reduces the iops from 100 seeks and 100 reads to 1 seek and 1
read.
2015-04-13 21:06:04 +02:00
beorn7 c563398c68 Remove obsolete debug message. 2015-04-13 16:59:52 +02:00
beorn7 c5fa0b90c3 Fix the case where a series in memory has 0 chunks, but chunks on disk.
This is actually completely normal for a freshly unarchived series.

Test added to expose.
2015-04-09 15:57:11 +02:00
Björn Rabenstein d8e515e9cb Merge pull request #617 from prometheus/influxdb-write-support
Add experimental InfluxDB write support.
2015-04-07 13:23:06 +02:00
Julius Volz 593e565688 Allow writing to InfluxDB/OpenTSDB at the same time. 2015-04-02 20:24:38 +02:00
beorn7 3035b8bfdd Adaptively reduce the wait time for memory series maintenance.
This will make in-memory series maintenance the faster the more chunks
are waiting for persistence.
2015-04-01 17:52:03 +02:00
Julius Volz 61fb688dd9 Add experimental InfluxDB write support. 2015-04-01 02:03:16 +02:00
beorn7 fbc44d8f95 Add benchmark for loading chunks and chunk descs. 2015-03-19 19:28:21 +01:00
beorn7 6a21f73898 Fixes after review. 2015-03-19 17:54:59 +01:00
beorn7 51d35f4481 Instrument series maintenance durations. 2015-03-19 17:06:16 +01:00
beorn7 12ae6e9203 Increase resilience of the storage against data corruption - step 4.
Step 4: Add a configurable sync'ing of series files after modification.
2015-03-19 15:58:02 +01:00
beorn7 11bd9ce1bd Increase resilience of the storage against data corruption - step 3.
Step 3: Remember the mtime of series files and make use of it to
detect series files that are not the one the checkpoint thinks they
are.
2015-03-19 15:44:11 +01:00
beorn7 e25cca823c Increase resilience of the storage against data corruption - step 2.
Step 2: Add a flag -storage.local.pedantic-checks to check every
series file.

Also, remove countPersistedHeadChunks channel, which is unused.
2015-03-19 12:06:15 +01:00
beorn7 3d8d8928be Increase resilience of the storage against data corruption - step 1.
Step 1: Admit the problem by turning the various "panic"s into logged
errors, followed by marking the persistence as dirty.
2015-03-19 11:49:18 +01:00
beorn7 da7c0461c6 Rename persist queue len/cap to num/max chunks to persist.
Remove deprecated flag storage.incoming-samples-queue-capacity.
2015-03-18 19:36:41 +01:00
beorn7 a075900f9a Merge branch 'beorn7/persistence' into beorn7/ingestion-tweaks 2015-03-18 19:09:31 +01:00
beorn7 1d8fc7d56f Change minor things after code review. 2015-03-18 19:09:07 +01:00
beorn7 be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
beorn7 0056eaeb4f Redesign series maintenance and chunk persistence. 2015-03-14 22:05:23 +01:00
beorn7 5bea942d8e Improve various things around chunk encoding.
A number of mostly minor things:

- Rename chunk type -> chunk encoding.

- After all, do not carry around the chunk encoding to all parts of
  the system, but just have one place where the encoding for new
  chunks is set based on the flag. The new approach has caveats as
  well, but the polution of so many method signatures is worse.

- Use the default chunk encoding for new chunks of existing
  series. (Previously, only new _series_ would get chunks with the
  default encoding.)

- Use an enum for chunk encoding. (But keep the version number for the
  flag, for reasons discussed previously.)

- Add encoding() to the chunk interface (so that a chunk knows its own
  encoding - no need to have that in a different top-level function).

- Got rid of newFollowUpChunk (which would keep the existing encoding
  for all chunks of a time series). Now only use newChunk(), which
  will create a chunk encoding according to the flag.

- Simplified transcodeAndAdd.

- Reordered methods of deltaEncodedChunk and doubleDeltaEncoded chunk
  to match the order in the chunk interface.

- Only transcode if the chunk is not yet half full. If more than half
  full, add a new chunk instead.
2015-03-14 19:03:20 +01:00
beorn7 9ecf93526d Sync the checkpoints.
Because that's what should be done with checkpoints.
2015-03-11 19:10:51 +01:00
beorn7 853f971540 Actually use double-delta encoding for transcoding. :-o 2015-03-11 16:52:58 +01:00
beorn7 23ba8a5516 Make floats exact again.
This should do the right thing for the old delta chunks, too.
2015-03-06 17:03:56 +01:00
beorn7 a8d4f8af9a Improve minor things after review.
The problem of float precision will be addressed in the next commit.
2015-03-06 12:53:00 +01:00
beorn7 13fcf1ddbc Implement double-delta encoded chunks. 2015-03-05 20:33:26 +01:00
beorn7 5ed8f6c205 Update persistQueueLength after chunks were persisted. 2015-03-04 18:46:16 +01:00
beorn7 0167083da6 Improvements after review. 2015-03-03 18:59:39 +01:00
beorn7 ebac14eff3 Add version guard to persistence. 2015-03-03 18:34:01 +01:00
Julius Volz 795704f0df Merge pull request #565 from fabxc/fabxc/labelmatcher_test
Tests for retrieving fingerprints for label matchers added.
2015-02-27 14:52:37 +01:00
Fabian Reinartz 4bff5d29bf Add tests for retrieving fingerprints for label matchers.
This checks for the basic behaviour of GetFingerprintsForLabelMatchers, that is, whether the different matcher types filter the correct fingerprints and intersections are correct.
2015-02-27 14:41:43 +01:00
beorn7 92991026bb Fix chunkDescsTotal count in case of errors.
Only increment the counter if we actually add the memory series to the
fingerprintToSeries map.
2015-02-27 02:21:12 +01:00
beorn7 1db7589081 Reduce the capacity of countPersistedHeadChunks.
The capacity is basically how many persisted head chunks we will count
at most while doing other things, in particular checkpointing. To
limit the amount of already counted head chunks, keep this number low,
otherwise we will easily checkpoint too often if checkpoints take long
anyway.
2015-02-27 00:53:52 +01:00
beorn7 9406afad72 Do not double-count non-persisted head chunks on loading. 2015-02-27 00:06:16 +01:00
beorn7 dbc22b972c Check last time in head chunk for head chunk timeout, not first. 2015-02-26 23:40:42 +01:00
beorn7 edd716e63c Fix the embarrassing bug introduced in commit 0851945.
In that commit, the 'maintainSeries' call was accidentally removed.

This commit refactors things a bit so that there is now a clean
'maintainMemorySeries' and a 'maintainArchivedSeries' call.

Straighten the nomenclature a bit (consistently use 'drop' for
chunks and 'purge' for series/metrics).

Remove the annoying 'Completed maintenance sweep through archived
fingerprints' message if there were no archived fingerprints to do
maintenance on.
2015-02-26 18:30:33 +01:00
beorn7 af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end wit "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressable will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
beorn7 e22f26bc58 Move to a queue model for appending samples after all.
Starting a goroutine takes 1-2µs on my laptop. From the "numbers every
Go programmer should know", I had 300ns for a channel send in my
mind. Turns out, on my laptop, it takes only 60ns. That's fast enough
to warrant the machinery of yet another channel with a fixed set of
worker goroutines feeding from it. The number chosen (8 for now) is
low enough to not really afflict a measurable overhead (a big
Prometheus server has >1000 goroutines running), but high enough to
not make sample ingestion a bottleneck.
2015-02-13 14:26:54 +01:00
beorn7 fe518fdb28 Simplify AppendSamples by allowing it to be goroutine-unsafe. 2015-02-13 12:13:22 +01:00
beorn7 8a1c195b54 Move emptiness check to the receivers. 2015-02-12 19:47:24 +01:00
beorn7 5d3cd65a5d Improve performance of ingestion.
- Parallelize AppendSamples as much as possible without breaking the
  contract about temporal order.

- Allocate more fingerprint locker slots.

- Do not run early checkpoints if we are behind on chunk persistence.

- Increase fpMinWaitDuration to give the disk more time for more
  important things.

Also, switch math.MaxInt64 and math.MinInt64 to the new constants.
2015-02-12 18:12:37 +01:00
beorn7 d2ab49c396 Make the persist queue length configurable.
Also, set a much higher default value.

Chunk persist requests can be quite spiky. If you collect a large
number of time series that are very similar, they will tend to finish
up a chunk at about the same time. There is no reason we need to back
up scraping just because of that. The rationale of the new default
value is "1/8 of the chunks in memory".
2015-02-06 14:54:53 +01:00
Julius Volz 9412b296d5 Remove labels on persist error counter.
This fixes https://github.com/prometheus/prometheus/issues/496
2015-02-01 14:03:34 +01:00
Bjoern Rabenstein 3948e2a7f8 Move lost files to an "orphaned" directory.
Previously, those were simply deleted. The orphaned files can now be
used for forensics if needed.
2015-01-29 14:52:12 +01:00
Bjoern Rabenstein c24bfdf701 Move crash related code into separate file.
persistence.go is way too long anyway, and a lot of code is just crash
recovery, which is not important to understand the normal operation.

Also, remove unused `exists` function.
2015-01-29 13:13:16 +01:00
Bjoern Rabenstein ab386d1f5d Declare storage.local.index-cache-size.* default values as tweaked. 2015-01-29 13:04:54 +01:00
Bjoern Rabenstein 73f6dc4d44 Make KeyValueStore.Delete report if the key to delete was found.
Previously, it would return an error instead. Now we can distinguish
the cases 'error while deleting known key' vs. 'key not in index'
without testing for leveldb-internal kinds of errors.
2015-01-29 12:57:50 +01:00
Bjoern Rabenstein 2c8d324ca4 Remove check that did not check anything. 2015-01-26 13:48:24 +01:00
Julius Volz d4374a9265 More efficient JSON query result format.
This depends on https://github.com/prometheus/client_golang/pull/51.

For vectors, the result format looks like this:

```json
{
   "version": 1,
   "type" : "vector",
   "value" : [
      {
         "timestamp" : 1421765411.045,
         "value" : "65.475000",
         "metric" : {
            "quantile" : "0.5",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "/static/",
            "method" : "get",
            "code" : "304"
         }
      },
      {
         "timestamp" : 1421765411.045,
         "value" : "5826.339000",
         "metric" : {
            "quantile" : "0.9",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "prometheus",
            "method" : "get",
            "code" : "200"
         }
      },
      /* ... */
   ]
}
```

For matrices, it looks like this:

```json
{
   "version": 1,
   "type" : "matrix",
   "value" : [
      {
         "metric" : {
            "quantile" : "0.99",
            "instance" : "http://localhost:9090/metrics",
            "job" : "prometheus",
            "__name__" : "http_request_duration_microseconds",
            "handler" : "/static/",
            "method" : "get",
            "code" : "200"
         },
         "values" : [
            [
               1421765547.659,
               "29162.953000"
            ],
            [
               1421765548.659,
               "29162.953000"
            ],
            [
               1421765549.659,
               "29162.953000"
            ],
            /* ... */
         ]
      }
   ]
}
```
2015-01-26 13:06:22 +01:00
Bjoern Rabenstein 2c8fdcbc23 Remove a deadlock during shutdown.
If queries are still running when the shutdown is initiated, they will
finish _during_ the shutdown. In that case, they might request chunk
eviction upon unpinning their pinned chunks. That might completely
fill the evict request queue _after_ draining it during storage
shutdown. If that ever happens (which is the case if there are _many_
queries still running during shutdown), the affected queries will be
stuck while keeping a fingerprint locked. The checkpointing can then
not process that fingerprint (or one that shares the same lock). And
then we are deadlocked.
2015-01-22 14:42:15 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
Bjoern Rabenstein f298af5756 Use named returns in flock.New. 2015-01-19 14:31:16 +01:00
Bjoern Rabenstein baca6faa1c Add double-start protection.
This mimics the locking leveldb is performing anyway. Advantages of
doing it separately:

- Should we ever replace the leveldb implementation by one without
  double-start protection, we are still good.

- In contrast to leveldb, the new code creates a meaningful error
  message.
2015-01-14 17:13:42 +01:00
Bjoern Rabenstein ae70eac97d Adjust the partitioning by outcome. 2015-01-13 18:34:56 +01:00
Julius Volz a6bc42bc61 Minor formatting/spelling fixups. 2015-01-09 11:04:20 +01:00
Bjoern Rabenstein 0851945054 Add a heuristics to checkpoint early if there are many "dirty" series.. 2015-01-08 20:15:58 +01:00
Bjoern Rabenstein 622e8350cd Fix a bug handling freshly unarchived series.
Usually, if you unarchive a series, it is to add something to it,
which will create a new head chunk. However, if a series in
unarchived, and before anything is added to it, it is handled by the
maintenance loop, it will be archived again. In that case, we have to
load the chunkDescs to know the lastTime of the series to be
archived. Usually, this case will happen only rarely (as a race, has
never happened so far, possibly because the locking around unarchiving
and the subsequent sample append is smart enough). However, during
crash recovery, we sometimes treat series as "freshly unarchived"
without directly appending a sample. We might add more cases of that
type later, so better deal with archiving properly and load chunkDescs
if required.
2015-01-08 16:25:50 +01:00
Bjoern Rabenstein eb932d1524 Remove a deadlock during shutdown. 2015-01-07 19:02:38 +01:00
Brian Brazil e56786b221 Have scrape time as a pseudovariable, not a prometheus variable.
This ensures it has the right timestamp, and is easier to work with.

Switch sd variable away from 'outcome', using total/failed instead.
2014-12-27 00:39:33 +00:00
Bjoern Rabenstein ff24070a03 Fix embarrassing bug in crash recovery.
(And yes, we always knew we need tests for that. I have added a TODO now.)

Change-Id: I9cf52bbf98e263e0b79404bda4c442beba9696a8
2014-12-17 17:18:04 +01:00
Julius Volz c9618d11e8 Introduce copy-on-write for metrics in AST.
This depends on changes in:

https://github.com/prometheus/client_golang/tree/cow-metrics.

Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142
2014-12-12 20:34:55 +01:00
Bjoern Rabenstein afd864e7f4 Adjust to the new version of goleveldb.
(And yes, we do want vendoring for that... This is just the quick fix.)

Change-Id: I9d347a64d96de6b3390a0e35c8d466f14bb83e4e
2014-12-10 18:04:29 +01:00
Bjoern Rabenstein fee88a7a77 Remove the remaining races, new and old.
Also, resolve a few other TODOs.

Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691
2014-12-03 18:07:23 +01:00
Bjoern Rabenstein 66c80b5ebd Fix typo.
Change-Id: I72608c7841c00145458807d3c3ee29db7b5ac2bc
2014-11-28 12:50:19 +01:00
Bjoern Rabenstein 674624f1c8 Completed more TODOs.
- Documented checkpoint file format.
- High-level description of series sanitation.
- Replace fp.LoadFromString panic with an error.
  (Change in client_golang already submitted.)
- Introduced checks for series file size where appropriate.
- Removed two Law of Demeter violations.

Change-Id: I555d97a2c8f4769820c2fc8bf5d6f4e160222abc
2014-11-27 20:46:45 +01:00
Bjoern Rabenstein 7d11019aa2 Squash a few trivial TODOs.
- Delete unneeded file view_adapter.go.
- Assessed that we still need the fingerprints in nodes
  (to create iterators).
- Turned numMemChunkDescs into a metric.

Change-Id: I29be963c795a075ec00c095f76bf26405535609d
2014-11-27 18:26:06 +01:00
Bjoern Rabenstein 49683c0c20 Avoid test flags in normal binary.
Change-Id: If1fba813a73bf93ea5918dcda326e3ffa81a797d
2014-11-27 18:04:48 +01:00
Bjoern Rabenstein 9bc05052ad Add line that has mysteriously disappeared after rebase.
Change-Id: I3612eb0b626e66e607b363e9801f187d2ba637a3
2014-11-25 17:15:56 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein 9ea808cd8b Remove debug log line.
Change-Id: Icdd2351b89f2d37ac2b615f9cf872e054c694ad1
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein bb42cc2e2d Evict based on memory pressure. Evict recently used chunks last.
Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein e23ee0f7cc Fix race in test.
Change-Id: I53e1a4c5a6b5f846acd76043166b6cb7bf7d5dc7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein d73e851b14 Tweak timing in the maintenance loop.
Change-Id: I9801c4f9a22c3b3dc1ce1af81fdd9e992a4f4dd7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 2672aa8ece Instrument series maintenance.
Change-Id: Ie4269d07ad4d23d44230c95a523088b472718e54
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 3f61d304ce Reorganize maintenance loop.
Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein c087ee35f7 Remove archiveMtx.
Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 7af42eda65 Optimize purging.
Now only purge if there is something to purge.
Also, set savedFirstTime and archived time range appropriately.
(Which is needed for the optimization.)

Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 33b959b898 Persist savedFirstTime in checkpoint.
Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 7a9efc9c59 Fix typo in test.
Change-Id: I3c2fd76bc5f50446c58f8ef693d9c6595197feaa
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 4efc60174b Tweak and verify a few parameters.
Remove TODOs accordingly.

Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 5f8e9617ef Add more tests.
Add an end-to-end fuzz and race test.

Fix a race exposed by the above.

Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein d215e013b7 Fix the weird chunkDesc shuffling bug.
The root cause was that after chunkDesc eviction, the offset between
memory representation of chunk layout (via chunkDescs in memory) was
shiftet against chunks as layed out on disk. Keeping the offset up to
date is by no means trivial, so this commit is pretty involved.

Also, found a race that for some reason didn't bite us so far:
Persisting chunks was completel unlocked, so if chunks were purged on
disk at the same time, disaster would strike. However, locking the
persisting of chunk revealed interesting dead locks. Basically, never
queue under the fp lock.

Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein a617269b12 Avoid unnecessary cloning of the head chunk.
Change-Id: I5da774515d5493166a197b5814d0a720628cfaff
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped..
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 74c9b34a5e Improve storage instrumentation even more.
Add gauge for chunks and chunkdescs in memory (backed by a global
variable to be used later not only for instrumentation but also for
memory management).

Refactored instrumentation code once more (instrumentation.go is back :).

Change-Id: Ife39947e22a48cac4982db7369c231947f446e17
2014-11-25 17:09:04 +01:00
Julius Volz c3fcea45e3 Support finer time resolutions than 1 second.
Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 443dd33805 Improve instrumentation in storage.
Also, fix some other minor bugs.

Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 1936a40e75 Minor loging improvement.
Change-Id: I7875d1a58ef9c5ff149f18e36f65959a4712fea2
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 192bf52c41 Evict chunkDescs, too.
Change-Id: I8b70f22fbf1dfcbc49f9ec391985144649e6ce9c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 95f392fb2c Prevent an indexing death spiral.
Change-Id: I86b20cd0830d02f87b2f020767257e2d3fb2033c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 40354eaa29 Reduce directory depth by one.
Change-Id: I7f89df61135ff19169ed97633a662685d414c448
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 096fa0f8b2 Squash a number of TODOs.
- Staleness delta is no a proper function parameter and not replicated
  from package ast.

- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.

- For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'.

- Verified that math.Modf is not a speed enhancement over conversion
  (actually 5x slower).

- Renamed firstTimeField, lastTimeField into chunkFirstTime and
  chunkLastTime.

- Verified unpin() is sufficiently goroutine-safe.

- Decided not to update archivedFingerprintToTimeRange upon series
  truncation and added a rationale why.

Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 427c8d53a5 Fix handling of empty chunkDescs while preloading chunks.
Change-Id: I73ce89fe0ef90c6eda78218e5be2cbfa0207c364
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein ecee5d8281 Fix head chunk persisting and a chunkDesc race condition.
- Head chunk persisting only happens in evictOlderThan, so do it
  there.  (With the previous code, it would never happen.)

- Raw accesses to chunkDesc.chunk are now done via isEvicted (with
  locking).

Change-Id: I48b07b56dfea4899b50df159b4ea566954396fcd
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 6b37e47f9e Remove unused metrics.
Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 2b4ff620aa Return a nop iterator for series that have been purged completely.
Change-Id: I6e92cac4472486feefdecba8593c17867e8c710d
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 6e3a366f91 Only archive a time series when none of its chunks is pinned.
Change-Id: I7e4b67c34b417b8980173bc5dc3b213bd7d698e5
2014-11-25 17:09:03 +01:00
Julius Volz bfa64248b7 Deal with missing series in preloading.
Change-Id: Ibf3a57b329f40a3d5e0b98464a2f45d2f1bd07bf
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein ca42a22e20 Add safety panic to seriesMap.put.
Change-Id: I4d4d2e45cc0f908a33eb1ae6e3ee6796adfcbd1e
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 83b4fa868d Fix GetBoundaryValues.
Change-Id: I8f8bbdb88e9b24e4c37ff869126ed9343f261ce2
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 4447708c9f Fix a race in target.go.
Also, fix problems in shutdown.
Starting serving and shutdown still has to be cleaned up properly.
It's a mess.

Change-Id: I51061db12064e434066446e6fceac32741c4f84c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein fd6600850a Fix race in chunkDesc.
Change-Id: Id7bae115d75886e10d44184a690a76777b1531fe
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 1c53c09558 Treat empty chunkDescs properly in preloadChunksForRange.
Change-Id: Ida1bd3fe1f9fb0ea2d5dbb9704be926f0824f873
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 934d09f738 Fix race during shutdown.
Change-Id: I2f8bf48d92a14f1e5ecde27c1b138734d7653394
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 38fc24d0ed Fix targetpool_test.go and other tests.
Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624
2014-11-25 17:08:26 +01:00
Julius Volz 7f5d3c2c29 Fix and improve the fp locker.
Benchmark:
$ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4

OLD
BenchmarkFingerprintLockerParallel        500000              3618 ns/op
BenchmarkFingerprintLockerParallel-2      100000             12257 ns/op
BenchmarkFingerprintLockerParallel-4      500000             10164 ns/op
BenchmarkFingerprintLockerSerial        10000000               283 ns/op
BenchmarkFingerprintLockerSerial-2      10000000               284 ns/op
BenchmarkFingerprintLockerSerial-4      10000000               288 ns/op

NEW
BenchmarkFingerprintLockerParallel       1000000              1018 ns/op
BenchmarkFingerprintLockerParallel-2     1000000              1164 ns/op
BenchmarkFingerprintLockerParallel-4     2000000               910 ns/op
BenchmarkFingerprintLockerSerial        50000000                56.0 ns/op
BenchmarkFingerprintLockerSerial-2      50000000                47.9 ns/op
BenchmarkFingerprintLockerSerial-4      50000000                54.5 ns/op

Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7ad55ef83c Actually close the iterator channels.
Change-Id: I6f6a2aef5ff55c6b2d21ad91d02ae6b0ecba4ae8
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 8fba3302bc Bold changes to concurrency.
(WIP. Probably doesn't work yet.)

Change-Id: Id1537dfcca53831a1d428078a5863ece7bdf4875
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein fcdf5a8ee7 Fix bugs in chunk evict code.
Also, simplify code by re-looking up metric in metric map.

Change-Id: Ib2092f9184374e5a543e87d3a9f4a74fda64b193
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7e6a03fbf9 Fix a few concurrency issues before starting to use the new fp locker.
Change-Id: I8615e8816e79ef0882e123163ee590c739b79d12
2014-11-25 17:07:45 +01:00
Julius Volz db92620163 Instrument eviction and purge durations.
Change-Id: Ia5b2319363ad2644674c9b7a94162a89bcc296fb
2014-11-25 17:07:45 +01:00
Julius Volz e0ee7ec7ab Add fingerprintLocker for locking individual fingerprints.
Change-Id: Id41ba555715229edf7d6543f56736b82f6eff1ef
2014-11-25 17:07:45 +01:00
Julius Volz df1b2a2422 Fix indexing latency instrumentation.
Change-Id: I532c170121cd2996d1a378adbb1fd551cd5a4e38
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 01dd618a20 Fix a locking bug.
Change-Id: I183780785991d0b4165ce9186f53eb8201fb3ed5
2014-11-25 17:07:44 +01:00
Julius Volz a746fbb8bc Instrument indexing: queue length, batch sizes and latencies.
Change-Id: I60bcbd24b160e47d418a485d8cffa39344a257c6
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein aea32b0b4b Avoid redundant fingerprint calculation.
Change-Id: Ief8a165dcfa5030226953346ec9dfe4a7787df1f
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein e9ff29c547 Comment/code cleanup.
Change-Id: I38736e3d0fec79759a2bafa35aecf914480ff810
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 0031a448e2 Add WaitForIndexing.
Change-Id: I5a5c975c4246632f937413322c855bbe63d00802
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein c7aad110fb Add an indexing queue and batch the ops.
Some other improvements on the way, in particular codec -> codable
renaming and addition of LookupSet methods.

Change-Id: I978f8f3f84ca8e4d39a9d9f152ae0ad274bbf4e2
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 71206dbc06 More code cleanups.
Add license text everywhere.
And others....

Change-Id: I11ccde267a2ef7eb366c4788ba7aeae14ba7545c
2014-11-25 17:07:44 +01:00
Julius Volz f0d5d4bda3 Fix bug around index purging.
Change-Id: I8cea00e03f72bbeead2cbd2d26b34d986059ced0
2014-11-25 17:07:44 +01:00
Julius Volz 630b5a087a Also consider on-disk fingerprints during purge.
This reintroduces LevelDB iterators so that we can iterate through all
the on-disk fingerprints.

Change-Id: I007ee4638d038d2a4461bbda27f30fcaad411474
2014-11-25 17:07:35 +01:00
Bjoern Rabenstein f5f9f3514a Major code cleanup.
- Make it go-vet and golint clean.
- Add comments, TODOs, etc.

Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2
2014-11-25 17:02:53 +01:00
Bjoern Rabenstein 3592dc2359 Implement series eviction.
Change-Id: I7a503e0ba78aae3761d032851b06f2807122b085
2014-11-25 17:02:52 +01:00
Bjoern Rabenstein bbf49200ab Implement methods in persistence.go.
Change-Id: I804cdd0b30420e171825fd86fe1281eca0d5e638
2014-11-25 17:02:23 +01:00
Bjoern Rabenstein 5a128a04a9 Major reorganization of the storage.
Most important, the heads file will now persist all the chunk descs,
too. Implicitly, it will serve as the persisted form of the
fp-to-series map.

Change-Id: Ic867e78f2714d54c3b5733939cc5aef43f7bd08d
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein e7cb9ddb9f Use a sync.pool for the staging buffer in codec.go.
Change-Id: I1aae6847f77b5a7c75582b07c199b1943cf90552
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein 4770cf76a4 Make index package more self-contained.
Moved interna from diskPersistence into the indexer.
TotalIndexer now called diskIndexer.

Change-Id: I6c8c62cb171f12bbd8a5474773af7786d71ba388
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein 89f10e8eb2 Move to using the standard library interfaces for encoding/decoding.
BinaryMarshaler instead of encodable.
BinaryUnmarshaler instead of decodable.

Left 'codable' in place for lack of a better word.

Change-Id: I8a104be7d6db916e8dbc47ff95e6ff73b845ac22
2014-11-25 17:02:01 +01:00
Bjoern Rabenstein af77d5ef0b Added a few missing implementations in index.go.
Also, added closing of persistence and mem storage.

Change-Id: Iacf0d22c3520dd2584d9546984c1f8a5ed6cd54e
2014-11-25 17:02:01 +01:00
Julius Volz cca7ebe906 Some more cleanups / obsolete code removals.
Change-Id: I584144ceeeedafdb114266d8a6d2513e67b1d010
2014-11-25 17:02:00 +01:00
Julius Volz 7e85711df0 Beginnings of a tiered index implementation.
This reintroduces a LevelDB-based metrics index.

Change-Id: I4111540301c52255a07b2f570761707a32f72c05
2014-11-25 17:02:00 +01:00
Julius Volz 8dfaa5ecd2 Remove use of freelists for chunk bufs.
Change-Id: Ib887fdb61e1d96da0cd32545817b925ba88831c1
2014-11-25 17:02:00 +01:00
Julius Volz 7b35e0f0b8 Use constants from math package instead of literals.
Change-Id: I55427ba32c2cbb32ee42ec1e3153160965ab8b3c
2014-11-25 17:02:00 +01:00
Julius Volz 15929eece2 Unpin any already loaded chunks upon preloading error.
Change-Id: Ib451136e3ef21bce8b814c21b66eaab727ab341b
2014-11-25 17:02:00 +01:00
Julius Volz fd01d07589 Check that chunk buffer length fits in 16 bit.
Change-Id: Id086a54aa8a1990c1979e747c1c02e53bed6d447
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 1ca7f24137 Remove float diff tolerance altogether.
Change-Id: I9ea9683a4665d5800fca75560bb4b8a8b4406d55
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein d742edfe0d Fix precision loss.
Large delta values often imply a difference between a large base value
and the large delta value, potentially resulting in small numbers with
a huge precision error. Since large delta values need 8 bytes anyway,
we are not even saving memory.

As a solution, always save the absoluto value rather than a delta once
8 bytes would be needed for the delta. Timestamps are then saved as 8
byte integers, while values are always saved as float64 in that case.

Change-Id: I01100d600515e16df58ce508b50982ffd762cc49
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein dc2e463a97 Improvements after review.
Change-Id: I484359282d4c7113518bbbb131f4f18383c08fdb
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 52c9dc43a3 Improve testing.
In particular, create a fuzz test for time series.

Change-Id: I523a17912405a0b6b46bd395c781d201dfe55036
2014-11-25 17:02:00 +01:00
Julius Volz 3b25867d61 Add chunk persistence tests, fix storage tests.
Change-Id: Id0b8f5382e99efa839cc0f826e92bbda985fe9a9
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein ecdf5ab14f Index-persistence switched from gob to a hand-coded solution.
Change-Id: Ib4ec42535bd08df16d34d4774bb638e35c5a1841
2014-11-25 17:02:00 +01:00
Julius Volz e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00
Julius Volz c6e9f085a3 Update used Go version to 1.3.
Go downloads moved to a different URL and require following redirects
(curl's '-L' option) now.

Go 1.3 deliberately randomizes ranges over maps, which uncovered some
bugs in our tests. These are fixed too.

Change-Id: Id2d9e185d8d2379a9b7b8ad5ba680024565d15f4
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 1909686789 Make metrics exported by the Prometheus server itself more consistent.
- Always spell out the time unit (e.g. milliseconds instead of ms).

- Remove "_total" from the names of metrics that are not counters.

- Make use of the "Namespace" and "Subsystem" fields in the options.

- Removed the "capacity" facet from all metrics about channels/queues.
  These are all fixed via command line flags and will never change
  during the runtime of a process. Also, they should not be part of
  the same metric family. I have added separate metrics for the
  capacity of queues as convenience. (They will never change and are
  only set once.)

- I left "metric_disk_latency_microseconds" unchanged, although that
  metric measures the latency of the storage device, even if it is not
  a spinning disk. "SSD" is read by many as "solid state disk", so
  it's not too far off. (It should be "solid state drive", of course,
  but "metric_drive_latency_microseconds" is probably confusing.)

- Brian suggested to not mix "failure" and "success" outcome in the
  same metric family (distinguished by labels). For now, I left it as
  it is. We are touching some bigger issue here, especially as other
  parts in the Prometheus ecosystem are following the same
  principle. We still need to come to terms here and then change
  things consistently everywhere.

Change-Id: If799458b450d18f78500f05990301c12525197d3
2014-11-25 17:02:00 +01:00
Julius Volz 80b3d3bf34 Speed up disk flushes by removing unnecessary sort.
The first sort in groupByFingerprint already ensures that all resulting sample
lists contain only one fingerprint. We also already assume that all
samples passed into AppendSamples (and thus groupByFingerprint) are
chronologically sorted within each fingerprint.

The extra chronological sort is thus superfluous. Furthermore, this
second sort didn't only sort chronologically, but also compared all
metric fingerprints again (although we already know that we're only
sorting within samples for the same fingerprint). This caused a huge
memory and runtime overhead.

In a heavily loaded real Prometheus, this brought down disk flush times
from ~9 minutes to ~1 minute.

OLD:
BenchmarkLevelDBAppendRepeatingValues   5  331391808 ns/op  44542953 B/op   597788 allocs/op
BenchmarkLevelDBAppendsRepeatingValues  5  329893512 ns/op  46968288 B/op  3104373 allocs/op

NEW:
BenchmarkLevelDBAppendRepeatingValues   5  299298635 ns/op  43329497 B/op   567616 allocs/op
BenchmarkLevelDBAppendsRepeatingValues 20   92204601 ns/op   1779454 B/op    70975 allocs/op

Change-Id: Ie2d8db3569b0102a18010f9e106e391fda7f7883
2014-11-25 17:01:59 +01:00
Julius Volz 21cafe6cd7 Only evict memory series after they are on disk.
This fixes the problem where samples become temporarily unavailable for
queries while they are being flushed to disk. Although the entire
flushing code could use some major refactoring, I'm explicitly trying to
do the minimal change to fix the problem since there's a whole new
storage implementation in the pipeline.

Change-Id: I0f5393a30b88654c73567456aeaea62f8b3756d9
2014-11-25 17:01:59 +01:00
Bjoern Rabenstein 8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Brian Brazil e041c0cd46 Add console and alert templates with access to all data.
Move rulemanager to it's own package to break cicrular dependency.
Make NewTestTieredStorage available to tests, remove duplication.

Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
2014-05-30 16:24:56 +01:00
Bjoern Rabenstein ca6a4fccef Weed out our homegrown test.Tester.
The Go stdlib has testing.TB now, which fulfills the exact same
purpose.

Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c
2014-05-21 19:27:24 +02:00
Julius Volz 4df5c7ab18 Optimize label matcher memory and runtime behavior.
This optimizes the runtime and memory allocation behavior for label matchers
other than type "Equal". Instead of creating a new set for every union of
fingerprints, this simply adds new fingerprints to the existing set to achieve
the same effect.

The current behavior made a production Prometheus unresponsive when running a
NotEqual match against the "instance" label (a label with high value
cardinality).

BEFORE:
BenchmarkGetFingerprintsForNotEqualMatcher        10   170430297 ns/op  39229944 B/op    40709 allocs/op

AFTER:
BenchmarkGetFingerprintsForNotEqualMatcher      5000      706260 ns/op    217717 B/op     1116 allocs/op

Change-Id: Ifd78e81e7dfbf5d7249e50ad1903a5d9c42c347a
2014-05-05 11:29:17 -04:00
Bjoern Rabenstein de9a88b964 Ensure temporal order in streams.
BenchmarkAppendSample.* before this change:

BenchmarkAppendSample1   1000000              1142 ns/op
--- BENCH: BenchmarkAppendSample1
        memory_test.go:81: 1 cycles with 9992.000000 bytes per cycle, totalling 9992
        memory_test.go:81: 100 cycles with 250.399994 bytes per cycle, totalling 25040
        memory_test.go:81: 10000 cycles with 239.428802 bytes per cycle, totalling 2394288
        memory_test.go:81: 1000000 cycles with 255.504684 bytes per cycle, totalling 255504688
BenchmarkAppendSample10   500000              3823 ns/op
--- BENCH: BenchmarkAppendSample10
        memory_test.go:81: 1 cycles with 15536.000000 bytes per cycle, totalling 15536
        memory_test.go:81: 100 cycles with 662.239990 bytes per cycle, totalling 66224
        memory_test.go:81: 10000 cycles with 601.937622 bytes per cycle, totalling 6019376
        memory_test.go:81: 500000 cycles with 598.582764 bytes per cycle, totalling 299291408
BenchmarkAppendSample100           50000             41111 ns/op
--- BENCH: BenchmarkAppendSample100
        memory_test.go:81: 1 cycles with 79824.000000 bytes per cycle, totalling 79824
        memory_test.go:81: 100 cycles with 4924.479980 bytes per cycle, totalling 492448
        memory_test.go:81: 10000 cycles with 4278.019043 bytes per cycle, totalling 42780192
        memory_test.go:81: 50000 cycles with 4275.242676 bytes per cycle, totalling 213762144
BenchmarkAppendSample1000           5000            533933 ns/op
--- BENCH: BenchmarkAppendSample1000
        memory_test.go:81: 1 cycles with 840224.000000 bytes per cycle, totalling 840224
        memory_test.go:81: 100 cycles with 62789.281250 bytes per cycle, totalling 6278928
        memory_test.go:81: 5000 cycles with 55208.601562 bytes per cycle, totalling 276043008
ok      github.com/prometheus/prometheus/storage/metric/tiered  27.828s

BenchmarkAppendSample.* after this change:

BenchmarkAppendSample1   1000000              1109 ns/op
--- BENCH: BenchmarkAppendSample1
        memory_test.go:131: 1 cycles with 9992.000000 bytes per cycle, totalling 9992
        memory_test.go:131: 100 cycles with 250.399994 bytes per cycle, totalling 25040
        memory_test.go:131: 10000 cycles with 239.220795 bytes per cycle, totalling 2392208
        memory_test.go:131: 1000000 cycles with 255.492630 bytes per cycle, totalling 255492624
BenchmarkAppendSample10   500000              3663 ns/op
--- BENCH: BenchmarkAppendSample10
        memory_test.go:131: 1 cycles with 15536.000000 bytes per cycle, totalling 15536
        memory_test.go:131: 100 cycles with 662.239990 bytes per cycle, totalling 66224
        memory_test.go:131: 10000 cycles with 601.889587 bytes per cycle, totalling 6018896
        memory_test.go:131: 500000 cycles with 598.550903 bytes per cycle, totalling 299275472
BenchmarkAppendSample100           50000             40694 ns/op
--- BENCH: BenchmarkAppendSample100
        memory_test.go:131: 1 cycles with 78976.000000 bytes per cycle, totalling 78976
        memory_test.go:131: 100 cycles with 4928.319824 bytes per cycle, totalling 492832
        memory_test.go:131: 10000 cycles with 4277.961426 bytes per cycle, totalling 42779616
        memory_test.go:131: 50000 cycles with 4275.054199 bytes per cycle, totalling 213752720
BenchmarkAppendSample1000           5000            530744 ns/op
--- BENCH: BenchmarkAppendSample1000
        memory_test.go:131: 1 cycles with 842192.000000 bytes per cycle, totalling 842192
        memory_test.go:131: 100 cycles with 62765.441406 bytes per cycle, totalling 6276544
        memory_test.go:131: 5000 cycles with 55209.812500 bytes per cycle, totalling 276049056
ok      github.com/prometheus/prometheus/storage/metric/tiered  27.468s

Change-Id: Idaa339cd83539b5e4391614541a2c3a04002d66d
2014-04-22 15:22:54 +02:00
Julius Volz 1b29975865 Fix RWLock memory storage deadlock.
This fixes https://github.com/prometheus/prometheus/issues/390

The cause for the deadlock was a lock semantic in Go that wasn't
obvious to me when introducing this bug:

http://golang.org/pkg/sync/#RWMutex.Lock

Key phrase: "To ensure that the lock eventually becomes available, a
blocked Lock call excludes new readers from acquiring the lock."

In the memory series storage, we have one function
(GetFingerprintsForLabelMatchers) acquiring an RLock(), which calls
another function also acquiring the same RLock()
(GetLabelValuesForLabelName). That normally doesn't deadlock, unless a
Lock() call from another goroutine happens right in between the two
RLock() calls, blocking both the Lock() and the second RLock() call from
ever completing.

  GoRoutine 1          GoRoutine 2
  ======================================
  RLock()
  ...                  Lock() [DEADLOCK]
  RLock() [DEADLOCK]   Unlock()
  RUnlock()
  RUnlock()

Testing deadlocks is tricky, but the regression test I added does
reliably detect the deadlock in the original code on my machine within a
normal concurrent reader/writer run duration of 250ms.

Change-Id: Ib34c2bb8df1a80af44550cc2bf5007055cdef413
2014-04-17 13:43:13 +02:00
Julius Volz 01f652cb4c Separate storage implementation from interfaces.
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:

rule checker -> query layer -> tiered storage layer -> leveldb

This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:

- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
                         and LevelDB storage.

I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.

The query layers and most other parts of Prometheus now have notion of
the storage implementation anymore and just use whatever implementation
they get passed in via interfaces.

The rule_checker is now a static binary :)

Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
2014-04-16 13:30:19 +02:00
Matt T. Proud 3e969a8ca2 Parameterize the buffer for marshal/unmarshal.
We are not reusing buffers yet.  This could introduce problems,
so the behavior is disabled for now.

Cursory benchmark data:
- Marshal for 10,000 samples: -30% overhead.
- Unmarshal for 10,000 samples: -15% overhead.

Change-Id: Ib006bdc656af45dca2b92de08a8f905d8d728cac
2014-04-16 12:16:59 +02:00
Matt T. Proud 58ef638e72 Merge "Use idiomatic one-to-many one-time signal pattern." 2014-04-15 21:26:31 +02:00
Matt T. Proud 6ec72393c4 Correct size of unmarshalling destination buffer.
The format header size is not deducted from the size of the byte
stream when calculating the output buffer size for samples.  I have
yet to notice problems directly as a result of this, but it is good
to fix.

Change-Id: Icb07a0718366c04ddac975d738a6305687773af0
2014-04-15 11:55:44 +02:00
Matt T. Proud 81367893fd Use idiomatic one-to-many one-time signal pattern.
The idiomatic pattern for signalling a one-time message to multiple
consumers from a single producer is as follows:

```
  c := make(chan struct{})
  w := new(sync.WaitGroup)  // Boilerplate to ensure synchronization.

  for i := 0; i < 1000; i++ {
    w.Add(1)
    go func() {
      defer w.Done()

      for {
        select {
        case _, ok := <- c:
          if !ok {
            return
          }
        default:
          // Do something here.
        }
      }
    }()
  }

  close(c)  // Signal the one-to-many single-use message.
  w.Wait()

```

Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
2014-04-15 10:15:25 +02:00
Julius Volz c7c0b33d0b Add regex-matching support for labels.
There are four label-matching ops for selecting timeseries now:

- Equal: =
- NotEqual: !=
- RegexMatch: =~
- RegexNoMatch: !~

Instead of looking up labels by a simple clientmodel.LabelSet (basically
an equals op for every key/value pair in the set), timeseries
fingerprint selection is now done via a list of metric.LabelMatchers.

Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96
2014-04-01 14:24:53 +02:00
Julius Volz ae30453214 Add label names -> label values index.
Change-Id: Ie39b4044558afc4d1aa937de7dcf8df61f821fb4
2014-03-28 15:16:37 +01:00
Julius Volz 7a577b86b7 Fix interval op special case.
In the case that a getValuesAtIntervalOp's ExtractSamples() is called
with a current time after the last chunk time, we return without
extracting any further values beyond the last one in the chunk
(correct), but also without advancing the op's time (incorrect). This
leads to an infinite loop in renderView(), since the op is called
repeatedly without ever being advanced and consumed.

This adds handling for this special case. When detecting this case, we
immediately set the op to be consumed, since we would always get a value
after the current time passed in if there was one.

Change-Id: Id99149e07b5188d655331382b8b6a461b677005c
2014-03-26 13:29:03 +01:00
Bjoern Rabenstein 257b720e87 Fix typo.
Change-Id: I6e7edcb48ace7fe4d6de4ff16519da5bb326b6ce
2014-03-25 12:22:18 +01:00
Bjoern Rabenstein caf47b2fbc New encoding for OpenTSDB tag values (and metric names).
Change-Id: I0f4393f638c6e2bb2b2ce14e58e38b49ce456da8
2014-03-21 17:18:44 +01:00
Julius Volz 9d5c367745 Fix incorrect interval op advancement.
This fixes a bug where an interval op might advance too far past the end
of the currently extracted chunk, effectively skipping over relevant
(to-be-extracted) values in the subsequent chunk. The result: missing
samples at chunk boundaries in the resulting view.

Change-Id: Iebf5d086293a277d330039c69f78e1eaf084b3c8
2014-03-18 16:22:50 +01:00
Julius Volz cc04238a85 Switch to new "__name__" metric name label.
This also fixes the compaction test, which before worked only because
the input sample sorting was accidentally equal to the resulting on-disk
sample sorting.

Change-Id: I2a21c4b46ba562424b27058fc02eba84fa6a6006
2014-03-14 16:52:37 +01:00
Bjoern Rabenstein c3b282bd14 Add regression tests for 'loop until op is consumed' bug.
- Most of this is the actual regression test in tiered_test.go.

- Working on that regression tests uncovered problems in
  tiered_test.go that are fixed in this commit.

- The 'op.consumed = false' line added to freelist.go was actually not
  fixing a bug. Instead, there was no bug at all. So this commit
  removes that line again, but adds a regression test to make sure
  that the assumed bug is indeed not there (cf. freelist_test.go).

- Removed more code duplication in operation.go (following the same
  approach as before, i.e. embedding op type A into op type B if
  everything in A is the same as in B with the exception of String()
  and ExtractSample()). (This change make struct literals for ops more
  clunky, but that only affects tests. No code change whatsoever was
  necessary in the actual code after this refactoring.)

- Fix another op leak in tiered.go.

Change-Id: Ia165c52e33290ad4f6aba9c83d92318d4f583517
2014-03-12 18:40:24 +01:00
Julius Volz 86fc13a52e Convert metric.Values to slice of values.
The initial impetus for this was that it made unmarshalling sample
values much faster.

Other relevant benchmark changes in ns/op:

Benchmark                                 old        new   speedup
==================================================================
BenchmarkMarshal                       179170     127996     1.4x
BenchmarkUnmarshal                     404984     132186     3.1x

BenchmarkMemoryGetValueAtTime           57801      50050     1.2x
BenchmarkMemoryGetBoundaryValues        64496      53194     1.2x
BenchmarkMemoryGetRangeValues           66585      54065     1.2x

BenchmarkStreamAdd                       45.0       75.3     0.6x
BenchmarkAppendSample1                   1157       1587     0.7x
BenchmarkAppendSample10                  4090       4284     0.95x
BenchmarkAppendSample100                45660      44066     1.0x
BenchmarkAppendSample1000              579084     582380     1.0x
BenchmarkMemoryAppendRepeatingValues 22796594   22005502     1.0x

Overall, this gives us good speedups in the areas where they matter
most: decoding values from disk and accessing the memory storage (which
is also used for views).

Some of the smaller append examples take minimally longer, but the cost
seems to get amortized over larger appends, so I'm not worried about
these. Also, we're currently not bottlenecked on the write path and have
plenty of other optimizations available in that area if it becomes
necessary.

Memory allocations during appends don't change measurably at all.

Change-Id: I7dc7394edea09506976765551f35b138518db9e8
2014-03-11 18:23:37 +01:00
Julius Volz a7d0973fe3 Add version field to LevelDB sample format.
This doesn't add complex discriminator logic yet, but adds a single
version byte to the beginning of each samples chunk. If we ever need to
change the disk format again, this will make it easy to do so without
having to wipe the entire database.

Change-Id: I60c39274256f790bc2da83167a1effaa174588fe
2014-03-11 14:08:40 +01:00
Julius Volz 1eee448bc1 Store samples in custom binary encoding.
This has been shown to provide immense decoding speed benefits.

See also:

https://groups.google.com/forum/#!topic/prometheus-developers/FeGl_qzGrYs

Change-Id: I7d45b4650e44ddecaa91dad9d7fdb3cd0b9f15fe
2014-03-09 22:31:38 +01:00
Julius Volz c2a2a20f36 Remove obsolete scanjobs timer.
Change-Id: Ifb29b4d93c9c1c6cacb8b098d5237866925c9fac
2014-03-07 17:10:28 +01:00
Julius Volz dd4892dcad Ensure no ops are leaked in renderView().
Change-Id: I6970a9098be305fcd010d46443b040d864d9740a
2014-03-07 14:33:13 +01:00
Julius Volz 5745ce0a60 Fixups for single-op-per-fingerprint view rendering.
Change-Id: Ie496d4529b65a3819c6042f43d7cf99e0e1ac60b
2014-03-07 00:54:28 +01:00
Björn Rabenstein 8b43497002 Merge "Fix memory series indexing bug." 2014-03-06 11:53:10 +01:00
Björn Rabenstein 0bb33b6525 Merge "Remove unused labelname -> fingerprints index." 2014-03-06 11:40:09 +01:00
Julius Volz d6827b6898 Fix memory series indexing bug.
This fixes https://github.com/prometheus/prometheus/issues/381.

For any stale series we dropped from memory, this bug caused us to also drop
any other series from the labelpair->fingerprints memory index if they had any
label/value-pairs in common with the intentionally dropped series.

To fix this issue more easily, I converted the labelpair->fingerprints index
map values to a utility.Set of clientmodel.Fingerprints. This makes handling
this index much easier in general.

Change-Id: If5e81e202e8c542261bbd9797aa1257376c5c074
2014-03-06 01:23:22 +01:00
Julius Volz c6013ff309 Remove unused labelname -> fingerprints index.
Change-Id: Ie4ccea3a230532e670030ca64ede9435b1b3e506
2014-03-05 23:49:33 +01:00
Bjoern Rabenstein 9ea9189dd1 Remove the multi-op-per-fingerprint capability.
Currently, rendering a view is capable of handling multiple ops for
the same fingerprint efficiently. However, this capability requires a
lot of complexity in the code, which we are not using at all because
the way we assemble a viewRequest will never have more than one
operation per fingerprint.

This commit weeds out the said capability, along with all the code
needed for it. It is still possible to have more than one operation
for the same fingerprint, it will just be handled in a less efficient
way (as proven by the unit tests).

As a result, scanjob.go could be removed entirely.

This commit also contains a few related refactorings and removals of
dead code in operation.go, view,go, and freelist.go. Also, the
docstrings received some love.

Change-Id: I032b976e0880151c3f3fdb3234fb65e484f0e2e5
2014-03-04 16:29:56 +01:00
Bjoern Rabenstein e11e8c7a23 Unify LevelDB.*Options.
We have seven different types all called like LevelDB.*Options.  One
of them is the plain LevelDBOptions. All others are just wrapping that
type without adding anything except clunkier handling.

If there ever was a plan to add more specific options to the various
LevelDB.*Options types, history has proven that nothing like that is
going to happen anytime soon.

To keep the code a bit shorter and more focused on the real (quite
significant) complexities we have to deal with here, this commit
reduces all uses of LevelDBOptions to the actual LevelDBOptions type.

1576 fewer characters to read...

Change-Id: I3d7a2b7ffed78b337aa37f812c53c058329ecaa6
2014-02-27 16:03:58 +01:00
Bjoern Rabenstein 6bc083f38b Major code cleanup in storage.
- Mostly docstring fixed/additions.
  (Please review these carefully, since most of them were missing, I
  had to guess them from an outsider's perspective. (Which on the
  other hand proves how desperately required many of these docstrings
  are.))

- Removed all uses of new(...) to meet our own style guide (draft).

- Fixed all other 'go vet' and 'golint' issues (except those that are
  not fixable (i.e. caused by bugs in or by design of 'go vet' and
  'golint')).

- Some trivial refactorings, like reorder functions, minor renames, ...

- Some slightly less trivial refactoring, mostly to reduce code
  duplication by embedding types instead of writing many explicit
  forwarders.

- Cleaned up the interface structure a bit. (Most significant probably
  the removal of the View-like methods from MetricPersistenc. Now they
  are only in View and not duplicated anymore.)

- Removed dead code. (Probably not all of it, but it's a first
  step...)

- Fixed a leftover in storage/metric/end_to_end_test.go (that made
  some parts of the code never execute (incidentally, those parts
  were broken (and I fixed them, too))).

Change-Id: Ibcac069940d118a88f783314f5b4595dce6641d5
2014-02-27 15:22:37 +01:00
Björn Rabenstein 59febe771a Merge "Minor code cleanups." 2014-02-13 15:29:16 +01:00
Julius Volz c4adfc4f25 Minor code cleanups.
Change-Id: Ib3729cf38b107b7f2186ccf410a745e0472e3630
2014-02-13 15:24:43 +01:00
Julius Volz 8cadae6102 Merge "Fix LevelDB closing order." 2014-02-03 23:22:30 +01:00
Julius Volz 94666e20b7 Minor test error reporting cleanup.
Change-Id: Ie11c16b4e60de7c179c6d2a86e063f4432e2000f
2014-02-03 12:27:01 +01:00
Julius Volz fd2158e746 Store copy of metric during fingerprint caching
Problem description:
====================
If a rule evaluation referencing a metric/timeseries M happens at a time
when M doesn't have a memory timeseries yet, looking up the fingerprint
for M (via TieredStorage.GetMetricForFingerprint()) will create a new
Metric object for M which gets both: a) attached to a new empty memory
timeseries (so we don't have to ask disk for the Metric's fingerprint
next time), and b) returned to the rule evaluation layer. However, the
rule evaluation layer replaces the name label (and possibly other
labels) of the metric with the name of the recorded rule.  Since both
the rule evaluator and the memory storage share a reference to the same
Metric object, the original memory timeseries will now also be
incorrectly renamed.

Fix:
====
Instead of storing a reference to a shared metric object, take a copy of
the object when creating an empty memory timeseries for caching
purposes.

Change-Id: I9f2172696c16c10b377e6708553a46ef29390f1e
2014-02-02 17:11:08 +01:00
Julius Volz 718ad2224b Fix LevelDB closing order.
The storage itself should be closed before any of the objects passed into it
are closed (otherwise closing the storage can randomly freeze). Defers are
executed in reverse order, so closing the storage should be the last of the
defer statements.

Change-Id: Id920318b876f5b94767ed48c81221b3456770620
2014-01-28 15:16:06 +01:00
Bjoern Rabenstein c342ad33a0 Fix OperatorError.
This used to work with Go 1.1, but only because of a compiler bug.
The bug is fixed in Go 1.2, so we have to fix our code now.

Change-Id: I5a9f3a15878afd750e848be33e90b05f3aa055e1
2014-01-21 16:49:51 +01:00
Julius Volz d5ef0c64dc Merge "Add optional sample replication to OpenTSDB." 2014-01-08 17:45:08 +01:00
Julius Volz 61d26e8445 Add optional sample replication to OpenTSDB.
Prometheus needs long-term storage. Since we don't have enough resources
to build our own timeseries storage from scratch ontop of Riak,
Cassandra or a similar distributed datastore at the moment, we're
planning on using OpenTSDB as long-term storage for Prometheus. It's
data model is roughly compatible with that of Prometheus, with some
caveats.

As a first step, this adds write-only replication from Prometheus to
OpenTSDB, with the following things worth noting:

1)
I tried to keep the integration lightweight, meaning that anything
related to OpenTSDB is isolated to its own package and only main knows
about it (essentially it tees all samples to both the existing storage
and TSDB). It's not touching the existing TieredStorage at all to avoid
more complexity in that area. This might change in the future,
especially if we decide to implement a read path for OpenTSDB through
Prometheus as well.

2)
Backpressure while sending to OpenTSDB is handled by simply dropping
samples on the floor when the in-memory queue of samples destined for
OpenTSDB runs full.  Prometheus also only attempts to send samples once,
rather than implementing a complex retry algorithm. Thus, replication to
OpenTSDB is best-effort for now.  If needed, this may be extended in the
future.

3)
Samples are sent in batches of limited size to OpenTSDB. The optimal
batch size, timeout parameters, etc. may need to be adjusted in the
future.

4)
OpenTSDB has different rules for legal characters in tag (label) values.
While Prometheus allows any characters in label values, OpenTSDB limits
them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal
characters in Prometheus label values are simply replaced by an
underscore. Especially when integrating OpenTSDB with the read path in
Prometheus, we'll need to reconsider this: either we'll need to
introduce the same limitations for Prometheus labels or escape/encode
illegal characters in OpenTSDB in such a way that they are fully
decodable again when reading through Prometheus, so that corresponding
timeseries in both systems match in their labelsets.

Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5
2014-01-02 18:21:38 +01:00
Stuart Nelson 0c58e388f6 rename curation metrics to prometheus_curation
Change-Id: I6a0bf277e88ea8eb737670b7e865ae20f2cbfb91
2013-12-13 17:45:01 -05:00
Stuart Nelson 28f59edf16 Added telemetry for counting stored samples
Change-Id: I0f36f7c2738d070ca2f107fcb315f98e46803af3
2013-12-12 10:06:41 -05:00
Tobias Schmidt 6947ee9bc9 Try to create metrics root directory if missing
This change tries to be nice and create the metrics directoy first
before erroring out.

Change-Id: I72691cdc32469708cd671c6ef1fb7db55fe60430
2013-12-03 18:16:13 +07:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Julius Volz 6b7de31a3c Upgrade to LevelDB 1.14.0 to fix LevelDB bugs.
This tentatively fixes https://github.com/prometheus/prometheus/issues/368 due
to an upstream bugfix in snapshotted LevelDB iterator handling, which got fixed
in LevelDB 1.14.0:

https://code.google.com/p/leveldb/issues/detail?id=200

Change-Id: Ib0cc67b7d3dc33913a1c16736eff32ef702c63bf
2013-12-03 09:07:15 +01:00
Julius Volz db015de65b Comment and "go fmt" fixups in compaction tests.
Change-Id: Iaa0eda6a22a5caa0590bae87ff579f9ace21e80a
2013-10-30 17:06:17 +01:00
Julius Volz 51408bdfe8 Merge changes I3ffeb091,Idffefea4
* changes:
  Add chunk sanity checking to dumper tool.
  Add compaction regression tests.
2013-10-24 13:58:14 +02:00
Julius Volz 2162e57784 Merge "Fix watermarker default time / LevelDB key ordering bug." 2013-10-24 13:57:48 +02:00