Commit graph

10024 commits

Author SHA1 Message Date
beorn7 9e7c3e3bcd Add the histogram_quantile function.
Since we are now getting really deep into floating point calculation,
the tests had to take into account the precision loss. Since the rule
tests are based on direct line matching in the output, implementing
the "almost equal" semantics was pretty cumbersome, but here we are.
2015-02-22 01:04:51 +01:00
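
The "almost equal" semantics mentioned above can be illustrated with a relative-tolerance comparison. This is only a minimal sketch: the function name `almostEqual` and the tolerance value are assumptions made for illustration, not the helper actually used in the rule tests.

```go
package main

import (
	"fmt"
	"math"
)

// almostEqual reports whether a and b differ by at most epsilon relative to
// the larger of their magnitudes. The name and the tolerance passed in are
// illustrative assumptions, not taken from the Prometheus test code.
func almostEqual(a, b, epsilon float64) bool {
	if a == b {
		return true // Covers exact matches, including equal infinities.
	}
	diff := math.Abs(a - b)
	largest := math.Max(math.Abs(a), math.Abs(b))
	return diff <= epsilon*largest
}

func main() {
	fmt.Println(almostEqual(0.30000000000000004, 0.3, 1e-9))
	fmt.Println(almostEqual(1.0, 1.1, 1e-9))
}
```

A relative tolerance scales with the magnitude of the values, which is usually what is wanted when results come out of chained floating point operations.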
Julius Volz 452c88964a Merge pull request #546 from prometheus/fix-label-grouping
Fix aggregation grouping key calculation.
2015-02-21 17:55:26 +01:00
Julius Volz 42601acfde Replace labelsToKey() with metric Fingerprint (fixes grouping bug). 2015-02-21 17:45:47 +01:00
Julius Volz 7fefccd929 Write() directly into hash and use model.SeparatorByte. 2015-02-21 17:19:13 +01:00
Julius Volz 645cf57bed Fix aggregation grouping key calculation. 2015-02-21 14:05:50 +01:00
Julius Volz 79834edcb5 Merge pull request #544 from q3k/feature/hide-instance-auth
Hide HTTP auth parts from instance label
2015-02-20 17:34:04 +01:00
Sergiusz 'q3k' Bazański 0d0bb3c030 Change instance identifiers to be host:port
This changes the PublicURL function into InstanceIdentifier, which now
returns a simple <host>:<port> string instead of a full URL.
2015-02-20 16:21:13 +01:00
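
One way to derive a `<host>:<port>` string that carries no HTTP auth parts is to parse the target URL and keep only its host component. The sketch below uses Go's standard `net/url` package; the lower-case function name `instanceIdentifier` is a hypothetical stand-in and is not claimed to match the method added in this commit.

```go
package main

import (
	"fmt"
	"net/url"
)

// instanceIdentifier returns the host[:port] part of rawURL, dropping the
// scheme, any user:password prefix, and the path. It is an illustrative
// stand-in, not the Target method from the commit.
func instanceIdentifier(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// url.URL.Host never contains the userinfo part of the URL.
	return u.Host, nil
}

func main() {
	id, err := instanceIdentifier("http://user:secret@example.org:9100/metrics")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(id) // example.org:9100
}
```

If the URL carries no explicit port, this sketch returns the bare host name.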
Sergiusz 'q3k' Bazański bb69a3d284 Hide HTTP auth parts from URL
This introduces an extra function in the Target interface (PublicURL)
which is used to populate the instance field in scraped metrics.
2015-02-19 18:58:47 +01:00
Brian Brazil 79bf5a278e Merge pull request #543 from kormat/cpucount
Add cpu count to rhs table on node cpu console
2015-02-19 11:19:44 +00:00
Stephen Shirley 05a746bf95 Add cpu count to rhs table
Also fix formatting of bounded values to be more readable.
2015-02-19 12:07:52 +01:00
Julius Volz 2dd70b8c13 Merge pull request #542 from prometheus/I-cant-believe-its-not-federation
Proof of concept for federation via console templates.
2015-02-19 00:23:39 +01:00
Brian Brazil 5adcec3018 Proof of concept for federation via console templates.
This exposes samples via the console templates in the
text exposition format, which the parser will fall back to.

This is not a proper federation solution, but should tide us
over for now. Extensions could include passing in a query or queries in
the url parameters.
2015-02-18 23:21:33 +00:00
Julius Volz c069c0dafa Merge pull request #536 from prometheus/offset
Implement offset operator.
2015-02-18 14:48:56 +01:00
Brian Brazil 40b0a1ac0d Merge pull request #537 from kormat/master
Fix available memory calculation.
2015-02-18 10:36:08 +00:00
Stephen Shirley fbcbb6a635 Fix available memory calculation.
Also account for buffers, making the value match the output of free(1)
2015-02-18 11:26:29 +01:00
Julius Volz 15b2b5aa66 Add tests for invalid uses of "offset". 2015-02-18 02:56:40 +01:00
Julius Volz 67e20acc6c Lower-case some package-internal names. 2015-02-18 02:45:54 +01:00
Julius Volz 72d7b325a1 Implement offset operator.
This allows changing the time offset for individual instant and range
vectors in a query.

For example, this returns the value of `foo` 5 minutes in the past
relative to the current query evaluation time:

    foo offset 5m

Note that the `offset` modifier always needs to follow the selector
immediately. I.e. the following would be correct:

    sum(foo offset 5m) // GOOD.

While the following would be *incorrect*:

    sum(foo) offset 5m // INVALID.

The same works for range vectors. This returns the 5-minute rate that
`foo` had a week ago:

    rate(foo[5m] offset 1w)

This change touches the following components:

* Lexer/parser: additions to correctly parse the new `offset`/`OFFSET`
  keyword.
* AST: vector and matrix nodes now have an additional `offset` field.
  This is used during their evaluation to adjust query and result times
  appropriately.
* Query analyzer: now works on separate sets of ranges and instants per
  offset. Isolating different offsets from each other completely in this
  way keeps the preloading code relatively simple.

No storage engine changes were needed by this change.

The rules tests have been changed to not probe the internal
implementation details of the query analyzer anymore (how many instants
and ranges have been preloaded). This would also become too cumbersome
to test with the new model, and measuring the result of the query should
be sufficient.

This fixes https://github.com/prometheus/prometheus/issues/529
This fixed https://github.com/prometheus/promdash/issues/201
2015-02-18 02:41:27 +01:00
Julius Volz 79a4a6d8e8 Merge pull request #532 from prometheus/beorn7/ingestion-tweaks
Improve persisting chunks to disk.
2015-02-17 17:31:00 +01:00
Julius Volz b0c8b56603 Merge pull request #527 from prometheus/nogodep-build
Copy vendored deps manually instead of using Godeps.
2015-02-17 17:17:45 +01:00
beorn7 af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end up with "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressible will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
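
The bucketing idea described above can be sketched as a queue keyed by fingerprint that always hands out the fullest bucket first. All names here (`persistQueue`, `add`, `next`) are assumptions made for illustration, and the linear scan in `next` is a simplification rather than the data structure used in the commit.

```go
package main

import "fmt"

type fingerprint uint64
type chunk []byte

// persistQueue groups queued chunks by the fingerprint of their series.
// It is a hypothetical, single-goroutine sketch.
type persistQueue struct {
	pending map[fingerprint][]chunk
}

func newPersistQueue() *persistQueue {
	return &persistQueue{pending: make(map[fingerprint][]chunk)}
}

// add appends a chunk to the bucket of its fingerprint.
func (q *persistQueue) add(fp fingerprint, c chunk) {
	q.pending[fp] = append(q.pending[fp], c)
}

// next removes and returns the bucket holding the most chunks, so that
// "double hits" and "triple hits" are written in one go before single
// chunks. It reports false if the queue is empty.
func (q *persistQueue) next() (fingerprint, []chunk, bool) {
	var best fingerprint
	found := false
	for fp, chunks := range q.pending {
		if !found || len(chunks) > len(q.pending[best]) {
			best, found = fp, true
		}
	}
	if !found {
		return 0, nil, false
	}
	chunks := q.pending[best]
	delete(q.pending, best)
	return best, chunks, true
}

func main() {
	q := newPersistQueue()
	q.add(1, chunk("a"))
	q.add(2, chunk("b"))
	q.add(2, chunk("c"))

	fp, chunks, _ := q.next()
	fmt.Println(fp, len(chunks)) // 2 2
}
```

Writing all chunks of one series together pays off only if each write is dominated by a disk seek, which is the assumption stated in the commit message.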
Julius Volz 13048b7468 Simplify GOPATH/dependency setup. 2015-02-17 02:20:16 +01:00
Julius Volz 464524fa44 Make config/Makefile use user Go installation.
Otherwise it will fail to "go get" the "github.com/golang/protobuf"
package because its dir already exists in the vendored packages,
but the copy isn't a git repository.
2015-02-17 02:20:16 +01:00
Julius Volz 5643587cc9 Also vendor Prometheus-internal deps and update existing ones. 2015-02-17 02:20:12 +01:00
Julius Volz b85c72bc50 Get rid of unnecessary tabs in Makefile.INCLUDE. 2015-02-17 02:08:56 +01:00
Julius Volz 740a99a9ac Remove Google Code files, add GitHub files for protobuf lib. 2015-02-17 02:08:56 +01:00
Julius Volz af627bb2b9 Copy vendored deps manually instead of using Godeps.
We were using Godep incorrectly (cloning repos from the internet during
build time instead of including Godeps/_workspace in the GOPATH via
"godep go"). However, to avoid even having to fetch "godeps" from the
internet during build, this now just copies the vendored files into the
GOPATH.

Also, the protocol buffer library moved from Google Code to GitHub,
which is reflected in these updates.

This fixes https://github.com/prometheus/prometheus/issues/525
2015-02-17 02:08:56 +01:00
juliusv 4d135b8b36 Merge pull request #530 from prometheus/fix-mac-build
Fix embedded files generation on OS X.
2015-02-16 20:27:49 +01:00
Julius Volz 5917a04554 Fix embedded files generation on OS X. 2015-02-16 16:52:11 +01:00
juliusv 0ae2c1fc18 Merge pull request #526 from prometheus/lightening-embedding
Dramatically decrease resources for file embedding.
2015-02-13 17:55:31 +01:00
Julius Volz 8cb2c802a0 Dramatically decrease resources for file embedding.
This dramatically decreases the needed time and memory for building the
blob files. The memory numbers are measured via the
memory.max_usage_in_bytes value from cgroups.

* generating files.go:
OLD: 466MB   19s
NEW:  80MB    1s

* building files.go:
OLD: 1210MB  2.25s
NEW:    7MB  0.05s
2015-02-13 17:16:44 +01:00
Björn Rabenstein 498e1b5154 Merge pull request #524 from prometheus/beorn7/ingestion-tweaks
Improve performance of ingestion.
2015-02-13 14:49:16 +01:00
beorn7 e22f26bc58 Move to a queue model for appending samples after all.
Starting a goroutine takes 1-2µs on my laptop. From the "numbers every
Go programmer should know", I had 300ns for a channel send in my
mind. Turns out, on my laptop, it takes only 60ns. That's fast enough
to warrant the machinery of yet another channel with a fixed set of
worker goroutines feeding from it. The number chosen (8 for now) is
low enough to not really inflict a measurable overhead (a big
Prometheus server has >1000 goroutines running), but high enough to
not make sample ingestion a bottleneck.
2015-02-13 14:26:54 +01:00
beorn7 fe518fdb28 Simplify AppendSamples by allowing it to be goroutine-unsafe. 2015-02-13 12:13:22 +01:00
beorn7 8a1c195b54 Move emptiness check to the receivers. 2015-02-12 19:47:24 +01:00
beorn7 5d3cd65a5d Improve performance of ingestion.
- Parallelize AppendSamples as much as possible without breaking the
  contract about temporal order.

- Allocate more fingerprint locker slots.

- Do not run early checkpoints if we are behind on chunk persistence.

- Increase fpMinWaitDuration to give the disk more time for more
  important things.

Also, switch math.MaxInt64 and math.MinInt64 to the new constants.
2015-02-12 18:12:37 +01:00
Johannes 'fish' Ziemke 3d0fb51648 Merge pull request #521 from sammcj/patch-1
Clean fetched package cache
2015-02-11 11:21:56 +01:00
Sam ddc065b943 Clean fetched package cache
To further reduce image size
2015-02-11 11:23:35 +11:00
juliusv fd9ee9b009 Merge pull request #518 from prometheus/beorn7/ingestion-tweaks
Next try to deal with backed-up ingestion.
2015-02-10 14:59:13 +01:00
beorn7 11b3c2387c Improvements after review.
- Increase samplesQueueCapacity.

- Improve docstring for the above.

- Accept a short waiting period for the ingest channel to become
  ready. This should depend on the http timeout, but 100ms is probably
  good enough to cushion bursts bigger than samplesQueueCapacity,
  while it is unlikely that anybody ever will set an HTTP timeout
  similarly short.
2015-02-10 14:58:46 +01:00
beorn7 0f191629c6 Next try to deal with backed-up ingestion.
This is now not even trying to throttle in a benign way, but creates a
fully-fledged error. Advantage: It shows up very visibly on the status
page. Disadvantage: The server does not really adjust to a lower
scraping rate. However, if your ingestion backs up, you are in a very
irregular state, and I'd say it _should_ be considered an error and not
dealt with in a more graceful way.

In different news: I'll work on optimizing ingestion so that we will
not as easily run into that situation in the first place.
2015-02-09 17:32:47 +01:00
juliusv 5a4fe403ff Merge pull request #514 from prometheus/fix-graph-js-errors
Fix graph JS glitches and simplify graphing code.
2015-02-09 13:17:28 +01:00
juliusv af410426de Merge pull request #517 from prometheus/flag-cleanup
Make flag names consistent across projects.
2015-02-09 12:25:00 +01:00
Julius Volz 989bc86bcb Make flag names consistent across projects. 2015-02-08 23:29:57 +01:00
juliusv ab551888a2 Merge pull request #516 from prometheus/grobie/help-href
Update help URL
2015-02-07 16:59:58 +01:00
Tobias Schmidt 655dffe393 Update help URL 2015-02-07 05:10:12 -05:00
Julius Volz 0229a89925 Fix graph JS glitches and simplify graphing code.
- original series data is saved so it can be re-transformed after
  Rickshaw's stacking modified the series data

- always reconstruct graphs from scratch instead of updating the
  settings of an existing one (simplification)

- always wipe and recreate all graph-related DOM elements completely so
  that no left-over event handlers cause background event handlers
2015-02-06 23:51:37 +01:00
juliusv 4e6a807bde Merge pull request #513 from prometheus/beorn7/ingestion-tweaks
Ingestion tweaks
2015-02-06 18:39:10 +01:00
beorn7 16a1a6d324 Add another check for stopped scraper. 2015-02-06 18:30:33 +01:00
beorn7 5678a86924 Throttle scraping if a scrape took longer than the configured interval.
The simple algorithm applied here will increase the actual interval
incrementally, whenever and as long as the scrape itself takes longer
than the configured interval. Once it takes shorter again, the actual
interval will iteratively decrease again.
2015-02-06 16:44:56 +01:00