962 commits

Author SHA1 Message Date
Julius Volz 6dc36d0c3e Don't keep extra labels in aggregations by default.
MIN/MAX/SUM/AVG/COUNT aggregations will now by default drop all labels that are
not specifically part of a BY-clause, even if a label value is the same within
all timeseries of an aggregation group. The old behavior of keeping extra
labels may still be switched on by adding KEEPING_EXTRA to the end of an
aggregation statement:

  sum(http_requests) by (job, method) keeping_extra

I'm open to better syntax/naming suggestions.
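
The label-dropping behavior described above can be illustrated with a minimal Go sketch. The Sample type and the sumBy helper are hypothetical names invented for this illustration and are not taken from the Prometheus code base; the sketch only shows that each output group carries the BY labels and nothing else.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified stand-in for one timeseries value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// sumBy sums samples grouped by the given label names. Each output group
// carries only the BY labels; every other label is dropped, even if its
// value is identical across the whole group.
func sumBy(samples []Sample, by []string) []Sample {
	groups := map[string]*Sample{}
	var keys []string
	for _, s := range samples {
		kept := make(map[string]string, len(by))
		parts := make([]string, 0, len(by))
		for _, name := range by {
			kept[name] = s.Labels[name]
			parts = append(parts, name+"="+s.Labels[name])
		}
		// Simplistic group key; assumes label values contain no ",".
		key := strings.Join(parts, ",")
		g, ok := groups[key]
		if !ok {
			g = &Sample{Labels: kept}
			groups[key] = g
			keys = append(keys, key)
		}
		g.Value += s.Value
	}
	sort.Strings(keys) // deterministic output order
	out := make([]Sample, 0, len(keys))
	for _, k := range keys {
		out = append(out, *groups[k])
	}
	return out
}

func main() {
	samples := []Sample{
		{Labels: map[string]string{"job": "api", "method": "GET", "instance": "a"}, Value: 1},
		{Labels: map[string]string{"job": "api", "method": "GET", "instance": "b"}, Value: 2},
		{Labels: map[string]string{"job": "api", "method": "POST", "instance": "a"}, Value: 5},
	}
	for _, g := range sumBy(samples, []string{"job", "method"}) {
		fmt.Println(g.Labels, g.Value)
	}
}
```

In this sketch the "instance" label never appears in the result, which corresponds to the new default; KEEPING_EXTRA would be the opt-in for the old behavior.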

Change-Id: I21d3fe7af9e98552ce3dffa3ce7c0a4ba4c0b4a4
2013-12-16 12:53:10 +01:00
Stuart Nelson 0c58e388f6 rename curation metrics to prometheus_curation
Change-Id: I6a0bf277e88ea8eb737670b7e865ae20f2cbfb91
2013-12-13 17:45:01 -05:00
Julius Volz 20bfaf80ab Merge "Display filename when encountering bad rule file." 2013-12-13 15:01:02 +01:00
Stuart Nelson 28f59edf16 Added telemetry for counting stored samples
Change-Id: I0f36f7c2738d070ca2f107fcb315f98e46803af3
2013-12-12 10:06:41 -05:00
Julius Volz 3bf3a555b2 Merge "add evalDuration histogram and ruleCount counter for rules" 2013-12-11 22:52:19 +01:00
Stuart Nelson b75adfebad add evalDuration histogram and ruleCount counter for rules
Change-Id: I3508fe72526348d96b8158828388c3ac8d7c3fa9
2013-12-11 15:42:53 -05:00
Julius Volz 77a79d1fc0 Display filename when encountering bad rule file.
Change-Id: I4729371be92c5659a6938145c5fde66771d7be22
2013-12-11 15:44:11 +01:00
Julius Volz fb44580110 Cleanup/fix program termination sequence.
Change-Id: I2bc58a2583fb079c9ef383cfc7a5e0fbe613f1cd
2013-12-11 15:40:32 +01:00
Tobias Schmidt 6947ee9bc9 Try to create metrics root directory if missing
This change tries to be nice and create the metrics directory first
before erroring out.

Change-Id: I72691cdc32469708cd671c6ef1fb7db55fe60430
2013-12-03 18:16:13 +07:00
Tobias Schmidt 4300ce3dc8 Merge "Ensure that job names are unique in parsed configs." 2013-12-03 12:13:03 +01:00
Julius Volz 78ebc1a61f Ensure that job names are unique in parsed configs.
Change-Id: I6bd89e6401bd924315981db797af21bdf0b81252
2013-12-03 12:10:22 +01:00
Julius Volz 436f3df0e8 Merge "Add note that pbcopy is only available in OSX" 2013-12-03 12:08:55 +01:00
Tobias Schmidt ee7f81b665 Add note that pbcopy is only available in OSX
Change-Id: I4eda3a5a9117b5021fbc6e3625afa01100c39fa6
2013-12-03 18:06:04 +07:00
Julius Volz 740d448983 Use custom timestamp type for sample timestamps and related code.
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:

- there could be time.Time values which were out of the range/precision of the
  time type that we persist to disk, therefore causing incorrectly ordered keys.
  One bug caused by this was:

  https://github.com/prometheus/prometheus/issues/367

  It would be good to use a timestamp type that's more closely aligned with
  what the underlying storage supports.

- sizeof(time.Time) is 192 bits, while Prometheus should be ok with a single 64-bit
  Unix timestamp (possibly even a 32-bit one). Since we store samples in large
  numbers, this seriously affects memory usage. Furthermore, copying/working
  with the data will be faster if it's smaller.

*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.

For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
  passed into Target.Scrape()).

When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
  telemetry exporting, timers that indicate how frequently to execute some
  action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use operator optimization code anymore, but it still lives in
the code as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
2013-12-03 09:11:28 +01:00
Julius Volz 6b7de31a3c Upgrade to LevelDB 1.14.0 to fix LevelDB bugs.
This tentatively fixes https://github.com/prometheus/prometheus/issues/368 due
to an upstream bugfix in snapshotted LevelDB iterator handling, which got fixed
in LevelDB 1.14.0:

https://code.google.com/p/leveldb/issues/detail?id=200

Change-Id: Ib0cc67b7d3dc33913a1c16736eff32ef702c63bf
2013-12-03 09:07:15 +01:00
Julius Volz db015de65b Comment and "go fmt" fixups in compaction tests.
Change-Id: Iaa0eda6a22a5caa0590bae87ff579f9ace21e80a
2013-10-30 17:06:17 +01:00
Johannes 'fish' Ziemke 8c08a5031f Add search domain support to SRV lookups
This adds search domain support by trying to resolve a name by
appending each search domain configured in /etc/resolv.conf until
the query succeeds (NOERROR) and has at least one answer.

Change-Id: Ibdc5138c5d8cc049e11fab90c3d5243d5a06852c
2013-10-29 17:19:49 +01:00
Julius Volz 39417f93ee Merge "Remove usage of gorest." 2013-10-28 10:29:33 +01:00
Julius Volz fceef4137c Fix /metrics endpoint in sample config.
Change-Id: I2daca6a31f536b87aa8e49a2190787ad9d803595
2013-10-28 08:03:58 +01:00
Julius Volz 51408bdfe8 Merge changes I3ffeb091,Idffefea4
* changes:
  Add chunk sanity checking to dumper tool.
  Add compaction regression tests.
2013-10-24 13:58:14 +02:00
Julius Volz 2162e57784 Merge "Fix watermarker default time / LevelDB key ordering bug." 2013-10-24 13:57:48 +02:00
Julius Volz 5e18255920 Merge "Fix chunk corruption compaction bug." 2013-10-24 13:57:31 +02:00
Julius Volz 6f6f56021a Merge changes I53a24c06,Ibe1def5c,Ife68c9c6,Ia3284a90
* changes:
  fix link to CONTRIBUTING.md in README.md
  moved CONTRIBUTING.md to top of repo; link to CONTRIBUTING.md in README.md
  change double quotes to backticks for md awesomeness
  add contributing.md
2013-10-24 13:03:10 +02:00
Julius Volz b70d5ca143 Merge changes I76203973,I38646c2b
* changes:
  More updates for first time users.
  Update example config file from json to new protobuf format.
2013-10-24 12:45:55 +02:00
Julius Volz 98007b8289 Merge "Add a check for metrics directory existence." 2013-10-24 12:42:25 +02:00
Stuart Nelson 1e357cf859 fix link to CONTRIBUTING.md in README.md
Change-Id: I53a24c061d0610a9c4b3c515c7d5ba7c04ae9f54
2013-10-23 16:26:39 -04:00
Stuart Nelson 28b055554f moved CONTRIBUTING.md to top of repo; link to CONTRIBUTING.md in README.md
Change-Id: Ibe1def5c0c5e1e7f6eb0da344badc53d18f2ecb3
2013-10-23 16:21:35 -04:00
Stuart Nelson dd2b5e0e1c change double quotes to backticks for md awesomeness
Change-Id: Ife68c9c67d36ffec24927176ab519f7cb08976a8
2013-10-23 10:16:25 -04:00
Stuart Nelson af5114d81e add contributing.md
Change-Id: Ia3284a90dfbbaaf655facd885a8ef13858bdb2c9
2013-10-23 10:11:43 -04:00
Conor Hennessy eba01d1119 Remove usage of gorest.
Due to ongoing issues, we've decided to remove gorest. It started with gorest
not being thread-safe (it does introspection to create a new handler which is
an easy process to mess up with multiple threads of execution):
    https://code.google.com/p/gorest/issues/detail?id=15
While the issue has been marked fixed, it looks like the patch has introduced
more problems than the original issue and simply doesn't work properly.
I'm not sure the behaviour was thought through properly. If a new instance is
needed every request then a handler-factory is needed or the library needs to
set expectations about how the new objects should interact with their
constructor state.
While it was tempting to try out another routing library, I think for now
it's better to use dumb vanilla Go routing. At least until we decide which
URL format we intend to standardize on.

Change-Id: Ica3da135d05f8ab8fc206f51eeca4f684f8efa0e
2013-10-23 14:19:14 +02:00
Stuart Nelson 72b861bebb remove duplicate users word from README
Change-Id: I3a9c84f16731c76f957155e58d05beda26505924
2013-10-22 23:25:08 -04:00
Julius Volz eb461a707d Add chunk sanity checking to dumper tool.
Also, move codecs/filters to a common location so they can be used in subsequent
tests.

Change-Id: I3ffeb09188b8f4552e42683cbc9279645f45b32e
2013-10-23 01:06:49 +02:00
Julius Volz 6ea22f2bf9 Add compaction regression tests.
This adds regression tests that catch the two error cases reported in

  https://github.com/prometheus/prometheus/issues/367

It also adds a commented-out test case for the crash in

  https://github.com/prometheus/prometheus/issues/368

but there's no fix for the latter crash yet.

Change-Id: Idffefea4ed7cc281caae660bcad2e3c13ec3bd17
2013-10-23 01:06:28 +02:00
Conor Hennessy aada5ded85 Replace some uses of obsolete /metrics.json with /metrics
(haven't touched test files yet).

Change-Id: I48c7c0cf27a39d596627a06cbb4f5913fb3da13c
2013-10-22 20:54:43 +02:00
Conor Hennessy 2d2c434d48 More updates for first time users.
- Modified sample conf so it is usable by default, also added some
  comments from the 'hello world' configuration.
- Updated README so there's a clear two step start for newbies.
- Added extra vim swap files to gitignore.

Change-Id: I76203973db4a7b332014662fcfb2ce5e7d137bd8
2013-10-22 20:54:43 +02:00
Conor Hennessy 986adfa557 Update example config file from json to new protobuf format.
Change-Id: I38646c2be53b6993abe464d9cdd9b211678de496
2013-10-22 20:54:43 +02:00
Conor Hennessy 9a48010cec Add a check for metrics directory existence.
Previously on startup the program would just quit without stating
explicitly why.

Change-Id: I833b85eb74d2dd27cdc3f0f2e65d7bb1c42caa39
2013-10-22 20:54:34 +02:00
Julius Volz b5f6e3c90c Fix watermarker default time / LevelDB key ordering bug.
This fixes part 2) of https://github.com/prometheus/prometheus/issues/367
(uninitialized time.Time mapping to a higher LevelDB key than "normal"
timestamps).

Change-Id: Ib079974110a7b7c4757948f81fc47d3d29ae43c9
2013-10-21 14:32:21 +02:00
Julius Volz a1a97ed064 Fix chunk corruption compaction bug.
This fixes part 1) of https://github.com/prometheus/prometheus/issues/367 (the
storing of samples with the wrong fingerprint into a compacted chunk, thus
corrupting it).

Change-Id: I4c36d0d2e508e37a0aba90b8ca2ecc78ee03e3f1
2013-10-21 14:30:22 +02:00
Julius Volz a50ee8df30 Always set CORS headers at beginning of API handler.
Change-Id: Icde9a74260c4bb919f09c3e10c6dd5f372ccdaec
2013-10-16 15:59:47 +02:00
Julius Volz c7daedc840 Merge "Add scalar() function." 2013-10-16 15:49:54 +02:00
Conor Hennessy 2d26db7e37 Trivial regeneration of config proto (wasn't regenerated with new comments).
Change-Id: Iaa97c8b96b40b7f13884e85a39d5a229c2d33f37
2013-10-10 20:58:30 +02:00
Julius Volz be8024e18c Add scalar() function.
Change-Id: I1d1183e926a18fc98c9e94bbb9a808a3fb313102
2013-09-17 15:01:16 +02:00
Julius Volz 3ee34fa29b Merge "Revert "Revert "Merge pull request #317 from prometheus/fix/miekg-dns-for-srv""" 2013-09-11 10:21:25 +02:00
Julius Volz 274934bcd3 Revert "Revert "Merge pull request #317 from prometheus/fix/miekg-dns-for-srv""
This reverts commit 88099328d1.

Change-Id: I7bf74de5fda458e2e6f9eea2eacd0e256f95bdee
2013-09-10 17:48:05 +02:00
Matt T. Proud d91a40c3cf Merge "Retain DTO on each cycle." 2013-09-05 11:11:35 +02:00
Matt T. Proud 86fcbe5bde Retain DTO on each cycle.
Change-Id: Ifc6f68f98eacb01097771d0dbf043c98bba1d518
2013-09-05 10:14:34 +02:00
Matt Proud 189b3a2eee Merge "Fix arguments for format string." 2013-09-04 18:39:07 +02:00
Matt T. Proud df2f9e47b8 Fix arguments for format string.
Change-Id: Ie0d55a70969c992b1afc6fa96b0e3f2171f0de5a
2013-09-04 18:38:29 +02:00
Matt Proud 434675efc3 Merge "Remove unused import." 2013-09-04 18:15:12 +02:00