Commit graph

399 commits

Author SHA1 Message Date
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein 9ea808cd8b Remove debug log line.
Change-Id: Icdd2351b89f2d37ac2b615f9cf872e054c694ad1
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein bb42cc2e2d Evict based on memory pressure. Evict recently used chunks last.
Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein e23ee0f7cc Fix race in test.
Change-Id: I53e1a4c5a6b5f846acd76043166b6cb7bf7d5dc7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein d73e851b14 Tweak timing in the maintenance loop.
Change-Id: I9801c4f9a22c3b3dc1ce1af81fdd9e992a4f4dd7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 2672aa8ece Instrument series maintenance.
Change-Id: Ie4269d07ad4d23d44230c95a523088b472718e54
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 3f61d304ce Reorganize maintenance loop.
Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein c087ee35f7 Remove archiveMtx.
Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 7af42eda65 Optimize purging.
Now only purge if there is something to purge.
Also, set savedFirstTime and archived time range appropriately.
(Which is needed for the optimization.)

Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 33b959b898 Persist savedFirstTime in checkpoint.
Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 7a9efc9c59 Fix typo in test.
Change-Id: I3c2fd76bc5f50446c58f8ef693d9c6595197feaa
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 4efc60174b Tweak and verify a few parameters.
Remove TODOs accordingly.

Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 5f8e9617ef Add more tests.
Add an end-to-end fuzz and race test.

Fix a race exposed by the above.

Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein d215e013b7 Fix the weird chunkDesc shuffling bug.
The root cause was that after chunkDesc eviction, the offset between
memory representation of chunk layout (via chunkDescs in memory) was
shiftet against chunks as layed out on disk. Keeping the offset up to
date is by no means trivial, so this commit is pretty involved.

Also, found a race that for some reason didn't bite us so far:
Persisting chunks was completel unlocked, so if chunks were purged on
disk at the same time, disaster would strike. However, locking the
persisting of chunk revealed interesting dead locks. Basically, never
queue under the fp lock.

Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein a617269b12 Avoid unnecessary cloning of the head chunk.
Change-Id: I5da774515d5493166a197b5814d0a720628cfaff
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped..
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 74c9b34a5e Improve storage instrumentation even more.
Add gauge for chunks and chunkdescs in memory (backed by a global
variable to be used later not only for instrumentation but also for
memory management).

Refactored instrumentation code once more (instrumentation.go is back :).

Change-Id: Ife39947e22a48cac4982db7369c231947f446e17
2014-11-25 17:09:04 +01:00
Julius Volz c3fcea45e3 Support finer time resolutions than 1 second.
Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 443dd33805 Improve instrumentation in storage.
Also, fix some other minor bugs.

Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 1936a40e75 Minor loging improvement.
Change-Id: I7875d1a58ef9c5ff149f18e36f65959a4712fea2
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 192bf52c41 Evict chunkDescs, too.
Change-Id: I8b70f22fbf1dfcbc49f9ec391985144649e6ce9c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 95f392fb2c Prevent an indexing death spiral.
Change-Id: I86b20cd0830d02f87b2f020767257e2d3fb2033c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 40354eaa29 Reduce directory depth by one.
Change-Id: I7f89df61135ff19169ed97633a662685d414c448
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 096fa0f8b2 Squash a number of TODOs.
- Staleness delta is no a proper function parameter and not replicated
  from package ast.

- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.

- For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'.

- Verified that math.Modf is not a speed enhancement over conversion
  (actually 5x slower).

- Renamed firstTimeField, lastTimeField into chunkFirstTime and
  chunkLastTime.

- Verified unpin() is sufficiently goroutine-safe.

- Decided not to update archivedFingerprintToTimeRange upon series
  truncation and added a rationale why.

Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 427c8d53a5 Fix handling of empty chunkDescs while preloading chunks.
Change-Id: I73ce89fe0ef90c6eda78218e5be2cbfa0207c364
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein ecee5d8281 Fix head chunk persisting and a chunkDesc race condition.
- Head chunk persisting only happens in evictOlderThan, so do it
  there.  (With the previous code, it would never happen.)

- Raw accesses to chunkDesc.chunk are now done via isEvicted (with
  locking).

Change-Id: I48b07b56dfea4899b50df159b4ea566954396fcd
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 6b37e47f9e Remove unused metrics.
Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 2b4ff620aa Return a nop iterator for series that have been purged completely.
Change-Id: I6e92cac4472486feefdecba8593c17867e8c710d
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 6e3a366f91 Only archive a time series when none of its chunks is pinned.
Change-Id: I7e4b67c34b417b8980173bc5dc3b213bd7d698e5
2014-11-25 17:09:03 +01:00
Julius Volz bfa64248b7 Deal with missing series in preloading.
Change-Id: Ibf3a57b329f40a3d5e0b98464a2f45d2f1bd07bf
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein ca42a22e20 Add safety panic to seriesMap.put.
Change-Id: I4d4d2e45cc0f908a33eb1ae6e3ee6796adfcbd1e
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 83b4fa868d Fix GetBoundaryValues.
Change-Id: I8f8bbdb88e9b24e4c37ff869126ed9343f261ce2
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 4447708c9f Fix a race in target.go.
Also, fix problems in shutdown.
Starting serving and shutdown still has to be cleaned up properly.
It's a mess.

Change-Id: I51061db12064e434066446e6fceac32741c4f84c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein fd6600850a Fix race in chunkDesc.
Change-Id: Id7bae115d75886e10d44184a690a76777b1531fe
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 1c53c09558 Treat empty chunkDescs properly in preloadChunksForRange.
Change-Id: Ida1bd3fe1f9fb0ea2d5dbb9704be926f0824f873
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 934d09f738 Fix race during shutdown.
Change-Id: I2f8bf48d92a14f1e5ecde27c1b138734d7653394
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 38fc24d0ed Fix targetpool_test.go and other tests.
Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624
2014-11-25 17:08:26 +01:00
Julius Volz 7f5d3c2c29 Fix and improve the fp locker.
Benchmark:
$ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4

OLD
BenchmarkFingerprintLockerParallel        500000              3618 ns/op
BenchmarkFingerprintLockerParallel-2      100000             12257 ns/op
BenchmarkFingerprintLockerParallel-4      500000             10164 ns/op
BenchmarkFingerprintLockerSerial        10000000               283 ns/op
BenchmarkFingerprintLockerSerial-2      10000000               284 ns/op
BenchmarkFingerprintLockerSerial-4      10000000               288 ns/op

NEW
BenchmarkFingerprintLockerParallel       1000000              1018 ns/op
BenchmarkFingerprintLockerParallel-2     1000000              1164 ns/op
BenchmarkFingerprintLockerParallel-4     2000000               910 ns/op
BenchmarkFingerprintLockerSerial        50000000                56.0 ns/op
BenchmarkFingerprintLockerSerial-2      50000000                47.9 ns/op
BenchmarkFingerprintLockerSerial-4      50000000                54.5 ns/op

Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7ad55ef83c Actually close the iterator channels.
Change-Id: I6f6a2aef5ff55c6b2d21ad91d02ae6b0ecba4ae8
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 8fba3302bc Bold changes to concurrency.
(WIP. Probably doesn't work yet.)

Change-Id: Id1537dfcca53831a1d428078a5863ece7bdf4875
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein fcdf5a8ee7 Fix bugs in chunk evict code.
Also, simplify code by re-looking up metric in metric map.

Change-Id: Ib2092f9184374e5a543e87d3a9f4a74fda64b193
2014-11-25 17:07:45 +01:00
Bjoern Rabenstein 7e6a03fbf9 Fix a few concurrency issues before starting to use the new fp locker.
Change-Id: I8615e8816e79ef0882e123163ee590c739b79d12
2014-11-25 17:07:45 +01:00
Julius Volz db92620163 Instrument eviction and purge durations.
Change-Id: Ia5b2319363ad2644674c9b7a94162a89bcc296fb
2014-11-25 17:07:45 +01:00
Julius Volz e0ee7ec7ab Add fingerprintLocker for locking individual fingerprints.
Change-Id: Id41ba555715229edf7d6543f56736b82f6eff1ef
2014-11-25 17:07:45 +01:00
Julius Volz df1b2a2422 Fix indexing latency instrumentation.
Change-Id: I532c170121cd2996d1a378adbb1fd551cd5a4e38
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein 01dd618a20 Fix a locking bug.
Change-Id: I183780785991d0b4165ce9186f53eb8201fb3ed5
2014-11-25 17:07:44 +01:00
Julius Volz a746fbb8bc Instrument indexing: queue length, batch sizes and latencies.
Change-Id: I60bcbd24b160e47d418a485d8cffa39344a257c6
2014-11-25 17:07:44 +01:00