Commit graph

11131 commits

Author SHA1 Message Date
Bjoern Rabenstein 0ae1d8889a Fix tests after merge.
Change-Id: Ia90da9a3e48ed780ec38c4a6a1fd9ea34e7f6a58
2014-11-25 17:13:04 +01:00
Julius Volz b7bf11230a Add absent() function.
A common problem in Prometheus alerting is to detect when no timeseries
exist for a given metric name and label combination. Unfortunately,
Prometheus alert expressions need to be of vector type, and
"count(nonexistent_metric)" results in an empty vector, yielding no
output vector elements to base an alert on. The newly introduced
absent() function solves this issue:

  ALERT FooAbsent IF absent(foo{job="myjob"}) [...]

absent() has the following behavior:

- if the vector passed to it has any elements, it returns an empty
  vector.

- if the vector passed to it has no elements, it returns a 1-element
  vector with the value 1.

In the second case, absent() tries to be smart about deriving labels of
the 1-element output vector from the input vector:

  absent(nonexistent{job="myjob"})                   => {job="myjob"}
  absent(nonexistent{job="myjob",instance=~".*"})    => {job="myjob"}
  absent(sum(nonexistent{job="myjob"}))              => {}

That is, if the passed vector is a literal vector selector, it takes all
"=" label matchers as the basis for the output labels, but ignores all
non-equals or regex matchers. Also, if the passed vector results from a
non-selector expression, no labels can be derived.

Change-Id: I948505a1488d50265ab5692a3286bd7c8c70cd78
2014-11-25 17:13:04 +01:00
Julius Volz 3d47f94149 Drop metric names after transformations.
After many transformations, it doesn't make sense to keep the metric
names, since the result of the transformation is no longer that metric.
This drops the metric name after such transformations and makes the web
UI deal well with missing metric names.

This depends on the current branch on the following things:

- prometheus/client_golang needs to be at
  e237cf15c6
  in branch "julius/int-fingerprints" (to be merged with new storage)

- prometheus/promdash needs to be at
  dd7691c9c2

Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d
2014-11-25 17:13:04 +01:00
Bjoern Rabenstein 53c0a43754 Update README.md.
Change-Id: Ife51ed266333ff8bc228260187ea4bcf377bde31
2014-11-25 17:13:04 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein a2feed343a Convert another occurrence from chan bool to chan struct{}.
Change-Id: I11ba127a934ee3aec0fcd139ad32a7751cff77a0
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 006b5517e2 Simplify makefiles.
This removes the dependancy on C leveldb and snappy.
It also takes care of fewer dependencies as they would
anyway not work on any non-Debian, non-Brew system.

Change-Id: Ia70dce1ba8a816a003587927e0b3a3f8ad2fd28c
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 9ea808cd8b Remove debug log line.
Change-Id: Icdd2351b89f2d37ac2b615f9cf872e054c694ad1
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein bb42cc2e2d Evict based on memory pressure. Evict recently used chunks last.
Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein e23ee0f7cc Fix race in test.
Change-Id: I53e1a4c5a6b5f846acd76043166b6cb7bf7d5dc7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein d73e851b14 Tweak timing in the maintenance loop.
Change-Id: I9801c4f9a22c3b3dc1ce1af81fdd9e992a4f4dd7
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 2672aa8ece Instrument series maintenance.
Change-Id: Ie4269d07ad4d23d44230c95a523088b472718e54
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 371445f4da Fix typo in comment.
Change-Id: Ib3ee493bb29e7e9967214627e543baedb897ab67
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 92156ee89d Drain the newBaseLabels channel upon shutdown.
This should help cut down shutdown times.

Change-Id: I6e70a598a9e49aa6eeeb2034105b1bc6e9014324
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 3f61d304ce Reorganize maintenance loop.
Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein a5f56639b8 Instrument unwritten samples queue.
Change-Id: Id77387314d340a5118490cf08e7bbc37c7366b25
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein c087ee35f7 Remove archiveMtx.
Change-Id: Ie8019f860bbda68621f74380c90a4e57930d3d7a
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 7af42eda65 Optimize purging.
Now only purge if there is something to purge.
Also, set savedFirstTime and archived time range appropriately.
(Which is needed for the optimization.)

Change-Id: Idcd33319a84def3ce0318d886f10c6800369e7f9
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 33b959b898 Persist savedFirstTime in checkpoint.
Change-Id: Ibdfdea16fad0608ec104fbccc749e824a171f227
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 7a9efc9c59 Fix typo in test.
Change-Id: I3c2fd76bc5f50446c58f8ef693d9c6595197feaa
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 4efc60174b Tweak and verify a few parameters.
Remove TODOs accordingly.

Change-Id: Ic062e13b6ae89a9135d3f14011114fe1cca1cef8
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein 5f8e9617ef Add more tests.
Add an end-to-end fuzz and race test.

Fix a race exposed by the above.

Change-Id: Ifaa39a90cefbde8d4c29bda197cc92592ded21bb
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein d215e013b7 Fix the weird chunkDesc shuffling bug.
The root cause was that after chunkDesc eviction, the offset between
memory representation of chunk layout (via chunkDescs in memory) was
shiftet against chunks as layed out on disk. Keeping the offset up to
date is by no means trivial, so this commit is pretty involved.

Also, found a race that for some reason didn't bite us so far:
Persisting chunks was completel unlocked, so if chunks were purged on
disk at the same time, disaster would strike. However, locking the
persisting of chunk revealed interesting dead locks. Basically, never
queue under the fp lock.

Change-Id: I1ea9e4e71024cabbc1f9601b28e74db0c5c55db8
2014-11-25 17:09:17 +01:00
Bjoern Rabenstein a617269b12 Avoid unnecessary cloning of the head chunk.
Change-Id: I5da774515d5493166a197b5814d0a720628cfaff
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped..
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 74c9b34a5e Improve storage instrumentation even more.
Add gauge for chunks and chunkdescs in memory (backed by a global
variable to be used later not only for instrumentation but also for
memory management).

Refactored instrumentation code once more (instrumentation.go is back :).

Change-Id: Ife39947e22a48cac4982db7369c231947f446e17
2014-11-25 17:09:04 +01:00
Julius Volz c3fcea45e3 Support finer time resolutions than 1 second.
Change-Id: I4c5f1d6d2361e841999b23283d1961b1bd0c2859
2014-11-25 17:09:04 +01:00
Julius Volz 0712d738d1 Allow alternative "by"-clause position in grammar.
In addition to the existing by-clause syntax:

  sum(<expression>) by (<labels>) [keeping_extra]

...this allows the following new syntax:

  sum by (<labels>) [keeping_extra] (<expression>)

Both orderings may be used in a single expression. It is up to the users
to establish guidelines around their usage.

Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992
2014-11-25 17:09:04 +01:00
Brian Brazil f114bbd4e7 Make query_range more robust.
Gracefully handle decimal values, by truncating them.

Limit amount of steps, to avoid accidentally pulling too much data.
This limit returns up to ~500kB per timeseries, and allows
for 60s granularity for a week and 1h granularity for a year.

Change-Id: Ie549fc24deb2eecbc6c5d1b6088a548a6b02e849
2014-11-25 17:09:04 +01:00
Brian Brazil 75e37db55b Don't alert() when a query is aborted,
such as when you change the range.

Change-Id: I574504f97446ac5f3dda737fe054ae83f17dbbc2
2014-11-25 17:09:04 +01:00
Julius Volz 0e48c18bbf Allow omitting the metric name in queries.
This allows the following expression syntaxes for selecting timeseries:

  foo                    (already valid before)
  foo{}                  (already valid before)
  {job="prometheus"}     (new, select all timeseries for job "prometheus")

Omitting both the metric name *and* any label matchers ("" or "{}") will
still yield a syntax error.

To get all timeseries, you could do:

    {__name__=~".*"}
       or, without relying on knowledge about __metric__:
    {job=~".*"}

Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 443dd33805 Improve instrumentation in storage.
Also, fix some other minor bugs.

Change-Id: If72f1c058b0f47d3e378fdf80228d7e9a8db06c7
2014-11-25 17:09:04 +01:00
Julius Volz 351e66c5d2 Remove obsolete "coding" directory.
Change-Id: I9e71098fd526336b4af7f7eab6d2a43504079658
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 1936a40e75 Minor loging improvement.
Change-Id: I7875d1a58ef9c5ff149f18e36f65959a4712fea2
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 192bf52c41 Evict chunkDescs, too.
Change-Id: I8b70f22fbf1dfcbc49f9ec391985144649e6ce9c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 95f392fb2c Prevent an indexing death spiral.
Change-Id: I86b20cd0830d02f87b2f020767257e2d3fb2033c
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 40354eaa29 Reduce directory depth by one.
Change-Id: I7f89df61135ff19169ed97633a662685d414c448
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 096fa0f8b2 Squash a number of TODOs.
- Staleness delta is no a proper function parameter and not replicated
  from package ast.

- Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion.

- For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'.

- Verified that math.Modf is not a speed enhancement over conversion
  (actually 5x slower).

- Renamed firstTimeField, lastTimeField into chunkFirstTime and
  chunkLastTime.

- Verified unpin() is sufficiently goroutine-safe.

- Decided not to update archivedFingerprintToTimeRange upon series
  truncation and added a rationale why.

Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 427c8d53a5 Fix handling of empty chunkDescs while preloading chunks.
Change-Id: I73ce89fe0ef90c6eda78218e5be2cbfa0207c364
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein ecee5d8281 Fix head chunk persisting and a chunkDesc race condition.
- Head chunk persisting only happens in evictOlderThan, so do it
  there.  (With the previous code, it would never happen.)

- Raw accesses to chunkDesc.chunk are now done via isEvicted (with
  locking).

Change-Id: I48b07b56dfea4899b50df159b4ea566954396fcd
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 9c3ecc2134 Remove unused flags.
Change-Id: Ie1bcbb0743d65e92072628811706d49753023205
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 6b37e47f9e Remove unused metrics.
Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 2b4ff620aa Return a nop iterator for series that have been purged completely.
Change-Id: I6e92cac4472486feefdecba8593c17867e8c710d
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 6e3a366f91 Only archive a time series when none of its chunks is pinned.
Change-Id: I7e4b67c34b417b8980173bc5dc3b213bd7d698e5
2014-11-25 17:09:03 +01:00
Julius Volz bfa64248b7 Deal with missing series in preloading.
Change-Id: Ibf3a57b329f40a3d5e0b98464a2f45d2f1bd07bf
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein ca42a22e20 Add safety panic to seriesMap.put.
Change-Id: I4d4d2e45cc0f908a33eb1ae6e3ee6796adfcbd1e
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein 83b4fa868d Fix GetBoundaryValues.
Change-Id: I8f8bbdb88e9b24e4c37ff869126ed9343f261ce2
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 4fc8ad6677 Fix retrieval unit tests.
Change-Id: I299b71406b59539230e5182ccc37bc8a83af60b3
2014-11-25 17:08:45 +01:00