1554 commits

Author SHA1 Message Date
beorn7 af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end with "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressible will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
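
The bucketing idea described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the types `fingerprint`, `chunk`, and `persistQueue` and the methods `add` and `next` are hypothetical names, and the linear scan in `next` is chosen for brevity rather than speed.

```go
package main

// fingerprint and chunk are hypothetical stand-ins for the storage types.
type fingerprint uint64
type chunk []byte

// persistQueue buckets chunks waiting for persistence by the fingerprint
// of the series they belong to.
type persistQueue struct {
	pending map[fingerprint][]chunk
}

func newPersistQueue() *persistQueue {
	return &persistQueue{pending: make(map[fingerprint][]chunk)}
}

// add queues one chunk for the given series.
func (q *persistQueue) add(fp fingerprint, c chunk) {
	q.pending[fp] = append(q.pending[fp], c)
}

// next removes and returns the bucket with the most pending chunks, so
// "triple hits" are served before "double hits", which are served before
// single chunks. All chunks of the bucket can then be written in one go.
// ok is false if the queue is empty.
func (q *persistQueue) next() (fp fingerprint, chunks []chunk, ok bool) {
	for candidate, cs := range q.pending {
		if len(cs) > len(chunks) {
			fp, chunks, ok = candidate, cs, true
		}
	}
	if ok {
		delete(q.pending, fp)
	}
	return fp, chunks, ok
}
```

With one chunk queued for series 1 and two for series 2, `next` returns the two chunks of series 2 first. A real implementation would probably index buckets by size instead of scanning the whole map on every call.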
Julius Volz 13048b7468 Simplify GOPATH/dependency setup. 2015-02-17 02:20:16 +01:00
Julius Volz 464524fa44 Make config/Makefile use user Go installation.
Otherwise it will fail to "go get" the "github.com/golang/protobuf"
package because its dir already exists in the vendored packages,
but the copy isn't a git repository.
2015-02-17 02:20:16 +01:00
Julius Volz 5643587cc9 Also vendor Prometheus-internal deps and update existing ones. 2015-02-17 02:20:12 +01:00
Julius Volz b85c72bc50 Get rid of unnecessary tabs in Makefile.INCLUDE. 2015-02-17 02:08:56 +01:00
Julius Volz 740a99a9ac Remove Google Code files, add GitHub files for protobuf lib. 2015-02-17 02:08:56 +01:00
Julius Volz af627bb2b9 Copy vendored deps manually instead of using Godeps.
We were using Godep incorrectly (cloning repos from the internet during
build time instead of including Godeps/_workspace in the GOPATH via
"godep go"). However, to avoid even having to fetch "godeps" from the
internet during build, this now just copies the vendored files into the
GOPATH.

Also, the protocol buffer library moved from Google Code to GitHub,
which is reflected in these updates.

This fixes https://github.com/prometheus/prometheus/issues/525
2015-02-17 02:08:56 +01:00
juliusv 4d135b8b36 Merge pull request #530 from prometheus/fix-mac-build
Fix embedded files generation on OS X.
2015-02-16 20:27:49 +01:00
Julius Volz 5917a04554 Fix embedded files generation on OS X. 2015-02-16 16:52:11 +01:00
juliusv 0ae2c1fc18 Merge pull request #526 from prometheus/lightening-embedding
Dramatically decrease resources for file embedding.
2015-02-13 17:55:31 +01:00
Julius Volz 8cb2c802a0 Dramatically decrease resources for file embedding.
This dramatically decreases the needed time and memory for building the
blob files. The memory numbers are measured via the
memory.max_usage_in_bytes value from cgroups.

* generating files.go:
OLD: 466MB   19s
NEW:  80MB    1s

* building files.go:
OLD: 1210MB  2.25s
NEW:    7MB  0.05s
2015-02-13 17:16:44 +01:00
Björn Rabenstein 498e1b5154 Merge pull request #524 from prometheus/beorn7/ingestion-tweaks
Improve performance of ingestion.
2015-02-13 14:49:16 +01:00
beorn7 e22f26bc58 Move to a queue model for appending samples after all.
Starting a goroutine takes 1-2µs on my laptop. From the "numbers every
Go programmer should know", I had 300ns for a channel send in my
mind. Turns out, on my laptop, it takes only 60ns. That's fast enough
to warrant the machinery of yet another channel with a fixed set of
worker goroutines feeding from it. The number chosen (8 for now) is
low enough to not really incur a measurable overhead (a big
Prometheus server has >1000 goroutines running), but high enough to
not make sample ingestion a bottleneck.
2015-02-13 14:26:54 +01:00
beorn7 fe518fdb28 Simplify AppendSamples by allowing it to be goroutine-unsafe. 2015-02-13 12:13:22 +01:00
beorn7 8a1c195b54 Move emptiness check to the receivers. 2015-02-12 19:47:24 +01:00
beorn7 5d3cd65a5d Improve performance of ingestion.
- Parallelize AppendSamples as much as possible without breaking the
  contract about temporal order.

- Allocate more fingerprint locker slots.

- Do not run early checkpoints if we are behind on chunk persistence.

- Increase fpMinWaitDuration to give the disk more time for more
  important things.

Also, switch math.MaxInt64 and math.MinInt64 to the new constants.
2015-02-12 18:12:37 +01:00
Johannes 'fish' Ziemke 3d0fb51648 Merge pull request #521 from sammcj/patch-1
Clean fetched package cache
2015-02-11 11:21:56 +01:00
Sam ddc065b943 Clean fetched package cache
To further reduce image size
2015-02-11 11:23:35 +11:00
juliusv fd9ee9b009 Merge pull request #518 from prometheus/beorn7/ingestion-tweaks
Next try to deal with backed-up ingestion.
2015-02-10 14:59:13 +01:00
beorn7 11b3c2387c Improvements after review.
- Increase samplesQueueCapacity.

- Improve docstring for the above.

- Accept a short waiting period for the ingest channel to become
  ready. This should depend on the http timeout, but 100ms is probably
  good enough to cushion bursts bigger than samplesQueueCapacity,
  while it is unlikely that anybody ever will set an HTTP timeout
  similarly short.
2015-02-10 14:58:46 +01:00
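
The short waiting period described in the last bullet above could be sketched with a `select` over the ingest channel and a timer. The identifiers `enqueue`, `ingestTimeout`, and `errIngestionBackedUp` are hypothetical; only the 100ms figure is taken from the commit message.

```go
package main

import (
	"errors"
	"time"
)

// errIngestionBackedUp is a hypothetical error returned when the queue
// stays full for the whole waiting period.
var errIngestionBackedUp = errors.New("ingestion queue is full")

// ingestTimeout is the short waiting period from the commit message.
const ingestTimeout = 100 * time.Millisecond

// enqueue tries to hand a batch of sample values to the ingest channel.
// It waits up to ingestTimeout for the channel to become ready, which
// cushions short bursts, and reports an error if the wait runs out.
func enqueue(queue chan<- []float64, batch []float64) error {
	select {
	case queue <- batch:
		return nil
	case <-time.After(ingestTimeout):
		return errIngestionBackedUp
	}
}
```

If the channel has free capacity, the call returns immediately. The error path is only taken when the queue stays full for the entire 100ms.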
beorn7 0f191629c6 Next try to deal with backed-up ingestion.
This is now not even trying to throttle in a benign way, but creates a
fully-fledged error. Advantage: It shows up very visibly on the status
page. Disadvantage: The server does not really adjust to a lower
scraping rate. However, if your ingestion backs up, you are in a very
irregular state; I'd say it _should_ be considered an error and not
dealt with in a more graceful way.

In different news: I'll work on optimizing ingestion so that we will
not as easily run into that situation in the first place.
2015-02-09 17:32:47 +01:00
juliusv 5a4fe403ff Merge pull request #514 from prometheus/fix-graph-js-errors
Fix graph JS glitches and simplify graphing code.
2015-02-09 13:17:28 +01:00
juliusv af410426de Merge pull request #517 from prometheus/flag-cleanup
Make flag names consistent across projects.
2015-02-09 12:25:00 +01:00
Julius Volz 989bc86bcb Make flag names consistent across projects. 2015-02-08 23:29:57 +01:00
juliusv ab551888a2 Merge pull request #516 from prometheus/grobie/help-href
Update help URL
2015-02-07 16:59:58 +01:00
Tobias Schmidt 655dffe393 Update help URL 2015-02-07 05:10:12 -05:00
Julius Volz 0229a89925 Fix graph JS glitches and simplify graphing code.
- original series data is saved so it can be re-transformed after
  Rickshaw's stacking modified the series data

- always reconstruct graphs from scratch instead of updating the
  settings of an existing one (simplification)

- always wipe and recreate all graph-related DOM elements completely so
  that no left-over event handlers cause background event handlers
2015-02-06 23:51:37 +01:00
juliusv 4e6a807bde Merge pull request #513 from prometheus/beorn7/ingestion-tweaks
Ingestion tweaks
2015-02-06 18:39:10 +01:00
beorn7 16a1a6d324 Add another check for stopped scraper. 2015-02-06 18:30:33 +01:00
beorn7 5678a86924 Throttle scraping if a scrape took longer than the configured interval.
The simple algorithm applied here will increase the actual interval
incrementally, whenever and as long as the scrape itself takes longer
than the configured interval. Once it takes shorter again, the actual
interval will iteratively decrease again.
2015-02-06 16:44:56 +01:00
juliusv 1c9b3c4c45 Merge pull request #511 from prometheus/remove-custom-flip
Remove custom hover flip code. Fixed upstream.
2015-02-06 15:21:21 +01:00
Julius Volz 0e8c0b67ad Remove custom hover flip code. Fixed upstream. 2015-02-06 15:02:13 +01:00
beorn7 d2ab49c396 Make the persist queue length configurable.
Also, set a much higher default value.

Chunk persist requests can be quite spiky. If you collect a large
number of time series that are very similar, they will tend to finish
up a chunk at about the same time. There is no reason we need to back
up scraping just because of that. The rationale of the new default
value is "1/8 of the chunks in memory".
2015-02-06 14:54:53 +01:00
juliusv 198ac9538b Merge pull request #510 from prometheus/unlimited-autocomplete
Show unlimited number of metrics in autocomplete.
2015-02-06 11:00:03 +01:00
Julius Volz 517a731ebf Show unlimited number of metrics in autocomplete. 2015-02-06 02:03:08 +01:00
juliusv 6e296648ed Merge pull request #508 from brian-brazil/round
Change the 2nd argument of round to toNearest.
2015-02-05 18:15:52 +01:00
Brian Brazil 60271d58bf Change the 2nd argument of round to toNearest.
This is more useful if you want to get a multiple of 2 or 5, while
still working for .001.
2015-02-05 16:13:40 +00:00
juliusv 0d6c958847 Merge pull request #507 from prometheus/i686-build
Fix Go download path for i686 architecture.
2015-02-05 16:48:18 +01:00
Julius Volz d9ff2f7edb Fix Go download path for FreeBSD. 2015-02-05 16:35:53 +01:00
Julius Volz 753113f21a Fix Go download path for x86-based architectures.
This fixes https://github.com/prometheus/prometheus/issues/503.
2015-02-05 16:35:48 +01:00
Julius Volz 82613527f3 Remove unnecessary float64() conversion in round(). 2015-02-05 15:14:05 +01:00
juliusv 982923f0c4 Merge pull request #502 from mmikulicic/floor_ceil
Add floor, ceil and round functions. Closes #402
2015-02-04 17:23:27 +01:00
Marko Mikulicic 8fdacbdf17 Add floor, ceil and round functions. Closes #402 2015-02-04 17:20:56 +01:00
juliusv 9e6b3bcefa Merge pull request #498 from fabxc/feature/query_timeout
Implement query timeouts
2015-02-03 13:51:47 +01:00
Fabian Reinartz fa1e90003b Query timeout added.
This is related to #454. Queries now timeout after a duration set by
the -query.timeout flag. The TotalEvalTimer is now started/stopped
inside any of the ast.Eval* functions.
2015-02-03 08:04:27 +01:00
juliusv 199a94b619 Merge pull request #501 from prometheus/fix-d3-version
Fix Rickshaw/D3 version mismatch.
2015-02-03 00:29:43 +01:00
Julius Volz b3978fe869 Fix Rickshaw/D3 version mismatch.
When Rickshaw was updated to 1.5.1 in
fd43daf82e,
the Rickshaw upstream package now contained 3 different D3 files:

d3.min.js
d3.v2.js
d3.v3.js

For details on why that is, see
https://groups.google.com/forum/#!topic/d3-js/lXQgKA7mtEw

For the 1.5.1 Rickshaw to work properly (being able to format dates with
D3 without causing a JS error), it needs d3.v2.js or d3.v3.js, not the
d3.min.js one. I chose to update us to d3.v3.js now, since that is the
most recent and minified version, and I didn't see any problems with it
(also, the current Rickshaw examples are using that D3 version).

Currently, displaying graphs with a range >14d is broken. This fixes
that.
2015-02-02 23:41:36 +01:00
Björn Rabenstein 63a79821fc Merge pull request #499 from prometheus/beorn7/makefile
Improve comments about embedding.
2015-02-02 13:37:51 +01:00
Bjoern Rabenstein f568bbc19f Improve comments about embedding. 2015-02-02 12:37:39 +01:00
juliusv 3012de7f5e Merge pull request #497 from prometheus/remove-persist-error-labels
Remove labels on persist error counter.
2015-02-01 14:04:21 +01:00