Commit graph

7154 commits

Author SHA1 Message Date
Fabian Reinartz a52980e0a8 Add workaround for deadlocks
This adds a workaround to avoid deadlocks for inconsistent write lock
order across headBlocks.
Things keep working if transactions only append data for the same
timestamp, which is generally the case for Prometheus.

Full behavior should be restored in a subsequent change.
2017-03-27 19:05:34 +02:00
Björn Rabenstein 29f05680a2 Merge pull request #2528 from prometheus/beorn7/storage2
main.go: Set GOGC to 40 by default
2017-03-27 15:00:37 +02:00
Björn Rabenstein e63d079b59 Merge pull request #2527 from prometheus/beorn7/storage
storage: Evict chunks and calculate persistence pressure...
2017-03-27 14:49:42 +02:00
Julius Volz b5b0e00923 Merge pull request #2499 from prometheus/remote-read
Remote Read
2017-03-27 14:43:44 +02:00
beorn7 434ab2a6a3 storage: Evict chunks and calculate persistence pressure based on target heap size
This is a fairly easy attempt to dynamically evict chunks based on the
heap size. A target heap size has to be set as a command line flage,
so that users can essentially say "utilize 4GiB of RAM, and please
don't OOM".

The -storage.local.max-chunks-to-persist and
-storage.local.memory-chunks flags are deprecated by this
change. Backwards compatibility is provided by ignoring
-storage.local.max-chunks-to-persist and use
-storage.local.memory-chunks to set the new
-storage.local.target-heap-size to a reasonable (and conservative)
value (both with a warning).

This also makes the metrics intstrumentation more consistent (in
naming and implementation) and cleans up a few quirks in the tests.

Answers to anticipated comments:

There is a chance that Go 1.9 will allow programs better control over
the Go memory management. I don't expect those changes to be in
contradiction with the approach here, but I do expect them to
complement them and allow them to be more precise and controlled. In
any case, once those Go changes are available, this code has to be
revisted.

One might be tempted to let the user specify an estimated value for
the RSS usage, and then internall set a target heap size of a certain
fraction of that. (In my experience, 2/3 is a fairly safe bet.)
However, investigations have shown that RSS size and its relation to
the heap size is really really complicated. It depends on so many
factors that I wouldn't even start listing them in a commit
description. It depends on many circumstances and not at least on the
risk trade-off of each individual user between RAM utilization and
probability of OOMing during a RAM usage peak. To not add even more to
the confusion, we need to stick to the well-defined number we also use
in the targeting here, the sum of the sizes of heap objects.
2017-03-27 14:33:50 +02:00
Björn Rabenstein e1a84b6256 Merge pull request #2529 from prometheus/beorn7/storage3
storage: Use staleness delta as head chunk timeout
2017-03-27 14:25:08 +02:00
Goutham Veeramachaneni 141499ff19
Add Tests For bigEndianPostings 2017-03-27 15:46:55 +05:30
Goutham Veeramachaneni 7b94a4e17d
Rename bytePostings To bigEndianPostings
* To be more specific about the contents of the byte slice.
2017-03-27 14:04:42 +05:30
Fabian Reinartz 6a87e1a926 Merge pull request #22 from Gouthamve/master
Add Sample Back
2017-03-27 09:55:55 +02:00
beorn7 96a303b348 storage: Use staleness delta as head chunk timeout
Currently, if a series stops to exist, its head chunk will be kept
open for an hour. That prevents it from being persisted. Which
prevents it from being evicted. Which prevents the series from being
archived.

Most of the time, once no sample has been added to a series within the
staleness limit, we can be pretty confident that this series will not
receive samples anymore. The whole chain as described above can be
started after 5m instead of 1h. In the relaxed case, this doesn't
change a lot as the head chunk timeout is only checked during series
maintenance, and usually, a series is only maintained every six
hours. However, there is the typical scenario where a large service is
deployed, the deoply turns out to be bad, and then it is deployed
again within minutes, and quite quickly the number of time series has
tripled. That's the point where the Prometheus server is stressed and
switches (rightfully) into rushed mode. In that mode, time series are
processed as quickly as possible, but all of that is in vein if all of
those recently ended time series cannot be persisted yet for another
hour. In that scenario, this change will help most, and it's exactly
the scenario where help is most desperately needed.
2017-03-26 23:44:50 +02:00
beorn7 04ccf84559 main.go: Set GOGC to 40 by default
Rationale: The default value for GOGC is 100, i.e. a garbage collected
is initialized once as many heap space has been allocated as was in
use after the last GC was done. This ratio doesn't make a lot of sense
in Prometheus, as typically about 60% of the heap is allocated for
long-lived memory chunks (most of which are around for many hours if
not days). Thus, short-lived heap objects are accumulated for quite
some time until they finally match the large amount of memory used by
bulk memory chunks and a gigantic GC cyle is invoked. With GOGC=40, we
are essentially reinstating "normal" GC behavior by acknowledging that
about 60% of the heap are used for long-term bulk storage.

The median Prometheus production server at SoundCloud runs a GC cycle
every 90 seconds. With GOGC=40, a GC cycle is run every 35 seconds
(which is still not very often). However, the effective RAM usage is
now reduced by about 30%. If settings are updated to utilize more RAM,
the time between GC cycles goes up again (as the heap size is larger
with more long-lived memory chunks, but the frequency of creating
short-lived heap objects does not change). On a quite busy large
Prometheus server, the timing changed from one GC run every 20s to one
GC run every 12s.

In the former case (just changing GOGC, leave everything else as it
is), the CPU usage increases by about 10% (on a mid-size referenc
server from 8.1 to 8.9). If settings are adjusted, the CPU
consumptions increases more drastically (from 8 cores to 13 cores on a
large reference server), despite GCs happening more rarely, presumably
because a 50% larger set of memory chunks is managed now. Having more
memory chunks is good in many regards, and most servers are running
out of memory long before they run out of CPU cycles, so the tradeoff
is overwhelmingly positive in most cases.

Power users can still set the GOGC environment variable as usual, as
the implementation in this commit honors an explicitly set variable.
2017-03-26 21:55:37 +02:00
Goutham Veeramachaneni efb0dfe1be
Implement Postings Iterator Over Bytes
Closes fabxc/tsdb#18
2017-03-26 23:40:12 +05:30
Goutham Veeramachaneni 61f866bb94
Add Sample Back
The compilation and tests are broken as head.go requires sample which
has been moved to another package while moving BufferedSeriesIterator.

Duplication seemed better compared to exposing sample from tsdbutil.
2017-03-26 23:22:58 +05:30
Julius Volz 3f23aa2cc7 Add headers to indicate remote read/write version
Also add Content-Type header.
2017-03-24 17:39:51 +01:00
Fabian Reinartz 3be4ef94ce Move BufferedSeriesIterator in own package
This functionality is useful for a lot of clients but not relevant to
the TSDB's core features.
2017-03-24 13:23:32 +01:00
Fabian Reinartz f85d89abc0 Move BufferedSeriesIterator in own package
This functionality is useful for a lot of clients but not relevant to
the TSDB's core features.
2017-03-24 10:20:39 +01:00
Fabian Reinartz a2e7b0b934 venodr: update tsdb and go-kit/log 2017-03-23 18:44:15 +01:00
Fabian Reinartz e478d0e3bc Actually close olds blocks in reloadBlocks
This fixes a bug leaking memory because blocks were not actually closed
as the closing call references the initial, empty slice
2017-03-23 18:27:20 +01:00
Tobias Schmidt 6dbd779099 Merge pull request #2519 from prometheus/update-arch-diag-link
Update architecture diagram link
2017-03-23 14:18:38 +02:00
Julius Volz a20105ddb0 Update architecture diagram link 2017-03-23 13:16:54 +01:00
Julius Volz c34257d069 Merge pull request #2518 from prometheus/update-arch-diag
Remove PromDash from architecture diagram
2017-03-23 13:13:14 +01:00
Julius Volz 428e1ad42c Remove PromDash from architecture diagram 2017-03-23 13:11:05 +01:00
Björn Rabenstein ddcf04a768 Merge pull request #2515 from leitzler/leitzler-patch-1
Use go env to fetch GOPATH to support Go 1.8
2017-03-23 11:58:30 +01:00
Pontus Leitzler 4774d6736a Use go env to fetch GOPATH to support Go 1.8
Go 1.8 do not require env GOPATH to be set and make will fail if it isn't set.
2017-03-22 19:04:20 +01:00
Fabian Reinartz 70909ca8ad Ensure GC runs after each compactor call
GC is triggered rarely, which may cause unnecessarily high memory
spikes when running several compaction cycles in a row. Explicitly run
GC so we don't have idle bytes marked as used from the previous cycle.
2017-03-21 12:21:02 +01:00
Fabian Reinartz 789e8224ff Fix wrong comparison in head block resorting 2017-03-21 12:12:33 +01:00
Fabian Reinartz 55ee4b5b3b Merge branch 'master' of github.com:fabxc/tsdb 2017-03-21 10:11:39 +01:00
Fabian Reinartz c18e055d7c Fix races and add comments on remaining ones 2017-03-21 10:11:23 +01:00
Fabian Reinartz d3669bd8b1 Merge pull request #15 from Gouthamve/lint-vet
Lint and Vet Fixes
2017-03-21 09:58:45 +01:00
Fabian Reinartz a4be181d3c Merge branch 'master' into lint-vet 2017-03-21 09:58:34 +01:00
Fabian Reinartz e837034360 Merge pull request #14 from Gouthamve/log-update
Update kit/log To New API
2017-03-21 09:56:32 +01:00
Julius Volz 8fda83ea12 Make rules only read local data 2017-03-21 00:50:04 +01:00
Julius Volz 94acd3f1d8 Add fanin tests and fix uncovered bugs 2017-03-21 00:08:17 +01:00
Fabian Reinartz 9c93f8f2aa Fix various races
This fixes different race condition encoutnered when running Prometheus.
It reduces the overall performance in the synthetic benchmark a fair bit
but has no indiciations of impacting a real-world setup notably.
2017-03-20 14:45:27 +01:00
Julius Volz 9b33cfc457 Fix/unify context-based remote storage timeouts 2017-03-20 14:17:06 +01:00
Julius Volz 815762a4ad Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig 2017-03-20 14:17:04 +01:00
Fabian Reinartz 397f001ac5 Merge branch 'master' into dev-2.0 2017-03-20 14:12:11 +01:00
Fabian Reinartz fc2e56c13f vendor: update tsdb 2017-03-20 14:07:25 +01:00
Julius Volz eb14678a25 Make remote read/write use config.HTTPClientConfig 2017-03-20 13:37:50 +01:00
Julius Volz 406b65d0dc Rename remote.Storage to remote.Writer 2017-03-20 13:15:28 +01:00
Julius Volz 02395a224d [WIP] Remote Read 2017-03-20 13:13:44 +01:00
Julius Volz 40e41a4776 Merge pull request #2494 from tomwilkie/remote-write-sharding
Dynamically reshard the QueueManager based on observed load.
2017-03-20 12:45:17 +01:00
Julius Volz 525da88c35 Merge pull request #2479 from YKlausz/consul-tls
Adding consul capability to connect via tls
2017-03-20 11:40:18 +01:00
Fabian Reinartz 2ef3682560 Hotfix erroneous "label index missing" error 2017-03-20 11:37:06 +01:00
Fabian Reinartz 3635569257 Trigger reload correctly on interrupted compaction 2017-03-20 10:41:43 +01:00
Fabian Reinartz 2c999836fb Add Queryable interface to Block
This adds the Queryable interface to the Block interface. Head and
persisted blocks now implement their own Querier() method and thus
isolate customization (e.g. remapPostings) more cleanly.
2017-03-20 10:21:21 +01:00
Fabian Reinartz 11be2cc585 Add composed Block interfaces, remove head generation
This adds more lower-leve interfaces which are used to compose
to different Block interfaces.
The DB only uses interfaces instead of explicit persistedBlock and
headBlock. The headBlock generation property is dropped as the use-case
can be implemented using block sequence numbers.
2017-03-20 09:02:36 +01:00
Fabian Reinartz 0958c83d5d Merge pull request #2511 from prometheus/fix-go-build
Only truncate buildVersion if it's set
2017-03-20 08:46:57 +01:00
Julius Volz 107c33545b Don't truncate build version 2017-03-19 18:37:23 +01:00
Goutham Veeramachaneni 761e4768f3
Lint and Vet Fixes 2017-03-19 21:35:01 +05:30