prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-10-01 05:07:32 -07:00

Author	SHA1	Message	Date
Fabian Reinartz	a52980e0a8	Add workaround for deadlocks This adds a workaround to avoid deadlocks for inconsistent write lock order across headBlocks. Things keep working if transactions only append data for the same timestamp, which is generally the case for Prometheus. Full behavior should be restored in a subsequent change.	2017-03-27 19:05:34 +02:00
Björn Rabenstein	29f05680a2	Merge pull request #2528 from prometheus/beorn7/storage2 main.go: Set GOGC to 40 by default	2017-03-27 15:00:37 +02:00
Björn Rabenstein	e63d079b59	Merge pull request #2527 from prometheus/beorn7/storage storage: Evict chunks and calculate persistence pressure...	2017-03-27 14:49:42 +02:00
Julius Volz	b5b0e00923	Merge pull request #2499 from prometheus/remote-read Remote Read	2017-03-27 14:43:44 +02:00
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flag, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and using -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics instrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement it and allow it to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisited. One might be tempted to let the user specify an estimated value for the RSS usage, and then internally set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	2017-03-27 14:33:50 +02:00
Björn Rabenstein	e1a84b6256	Merge pull request #2529 from prometheus/beorn7/storage3 storage: Use staleness delta as head chunk timeout	2017-03-27 14:25:08 +02:00
Goutham Veeramachaneni	141499ff19	Add Tests For bigEndianPostings	2017-03-27 15:46:55 +05:30
Goutham Veeramachaneni	7b94a4e17d	Rename bytePostings To bigEndianPostings * To be more specific about the contents of the byte slice.	2017-03-27 14:04:42 +05:30
Fabian Reinartz	6a87e1a926	Merge pull request #22 from Gouthamve/master Add Sample Back	2017-03-27 09:55:55 +02:00
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series ceases to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deploy turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vain if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	2017-03-26 23:44:50 +02:00
beorn7	04ccf84559	main.go: Set GOGC to 40 by default Rationale: The default value for GOGC is 100, i.e. a garbage collection is initiated once as much heap space has been allocated as was in use after the last GC was done. This ratio doesn't make a lot of sense in Prometheus, as typically about 60% of the heap is allocated for long-lived memory chunks (most of which are around for many hours if not days). Thus, short-lived heap objects are accumulated for quite some time until they finally match the large amount of memory used by bulk memory chunks and a gigantic GC cycle is invoked. With GOGC=40, we are essentially reinstating "normal" GC behavior by acknowledging that about 60% of the heap is used for long-term bulk storage. The median Prometheus production server at SoundCloud runs a GC cycle every 90 seconds. With GOGC=40, a GC cycle is run every 35 seconds (which is still not very often). However, the effective RAM usage is now reduced by about 30%. If settings are updated to utilize more RAM, the time between GC cycles goes up again (as the heap size is larger with more long-lived memory chunks, but the frequency of creating short-lived heap objects does not change). On a quite busy large Prometheus server, the timing changed from one GC run every 20s to one GC run every 12s. In the former case (just changing GOGC, leaving everything else as it is), the CPU usage increases by about 10% (on a mid-size reference server from 8.1 to 8.9). If settings are adjusted, the CPU consumption increases more drastically (from 8 cores to 13 cores on a large reference server), despite GCs happening more rarely, presumably because a 50% larger set of memory chunks is managed now. Having more memory chunks is good in many regards, and most servers are running out of memory long before they run out of CPU cycles, so the tradeoff is overwhelmingly positive in most cases. Power users can still set the GOGC environment variable as usual, as the implementation in this commit honors an explicitly set variable.	2017-03-26 21:55:37 +02:00
Goutham Veeramachaneni	efb0dfe1be	Implement Postings Iterator Over Bytes Closes fabxc/tsdb#18	2017-03-26 23:40:12 +05:30
Goutham Veeramachaneni	61f866bb94	Add Sample Back The compilation and tests are broken as head.go requires sample which has been moved to another package while moving BufferedSeriesIterator. Duplication seemed better compared to exposing sample from tsdbutil.	2017-03-26 23:22:58 +05:30
Julius Volz	3f23aa2cc7	Add headers to indicate remote read/write version Also add Content-Type header.	2017-03-24 17:39:51 +01:00
Fabian Reinartz	3be4ef94ce	Move BufferedSeriesIterator in own package This functionality is useful for a lot of clients but not relevant to the TSDB's core features.	2017-03-24 13:23:32 +01:00
Fabian Reinartz	f85d89abc0	Move BufferedSeriesIterator in own package This functionality is useful for a lot of clients but not relevant to the TSDB's core features.	2017-03-24 10:20:39 +01:00
Fabian Reinartz	a2e7b0b934	vendor: update tsdb and go-kit/log	2017-03-23 18:44:15 +01:00
Fabian Reinartz	e478d0e3bc	Actually close old blocks in reloadBlocks This fixes a bug leaking memory because blocks were not actually closed as the closing call references the initial, empty slice	2017-03-23 18:27:20 +01:00
Tobias Schmidt	6dbd779099	Merge pull request #2519 from prometheus/update-arch-diag-link Update architecture diagram link	2017-03-23 14:18:38 +02:00
Julius Volz	a20105ddb0	Update architecture diagram link	2017-03-23 13:16:54 +01:00
Julius Volz	c34257d069	Merge pull request #2518 from prometheus/update-arch-diag Remove PromDash from architecture diagram	2017-03-23 13:13:14 +01:00
Julius Volz	428e1ad42c	Remove PromDash from architecture diagram	2017-03-23 13:11:05 +01:00
Björn Rabenstein	ddcf04a768	Merge pull request #2515 from leitzler/leitzler-patch-1 Use go env to fetch GOPATH to support Go 1.8	2017-03-23 11:58:30 +01:00
Pontus Leitzler	4774d6736a	Use go env to fetch GOPATH to support Go 1.8 Go 1.8 does not require env GOPATH to be set and make will fail if it isn't set.	2017-03-22 19:04:20 +01:00
Fabian Reinartz	70909ca8ad	Ensure GC runs after each compactor call GC is triggered rarely, which may cause unnecessarily high memory spikes when running several compaction cycles in a row. Explicitly run GC so we don't have idle bytes marked as used from the previous cycle.	2017-03-21 12:21:02 +01:00
Fabian Reinartz	789e8224ff	Fix wrong comparison in head block resorting	2017-03-21 12:12:33 +01:00
Fabian Reinartz	55ee4b5b3b	Merge branch 'master' of github.com:fabxc/tsdb	2017-03-21 10:11:39 +01:00
Fabian Reinartz	c18e055d7c	Fix races and add comments on remaining ones	2017-03-21 10:11:23 +01:00
Fabian Reinartz	d3669bd8b1	Merge pull request #15 from Gouthamve/lint-vet Lint and Vet Fixes	2017-03-21 09:58:45 +01:00
Fabian Reinartz	a4be181d3c	Merge branch 'master' into lint-vet	2017-03-21 09:58:34 +01:00
Fabian Reinartz	e837034360	Merge pull request #14 from Gouthamve/log-update Update kit/log To New API	2017-03-21 09:56:32 +01:00
Julius Volz	8fda83ea12	Make rules only read local data	2017-03-21 00:50:04 +01:00
Julius Volz	94acd3f1d8	Add fanin tests and fix uncovered bugs	2017-03-21 00:08:17 +01:00
Fabian Reinartz	9c93f8f2aa	Fix various races This fixes different race conditions encountered when running Prometheus. It reduces the overall performance in the synthetic benchmark a fair bit but has no indications of impacting a real-world setup notably.	2017-03-20 14:45:27 +01:00
Julius Volz	9b33cfc457	Fix/unify context-based remote storage timeouts	2017-03-20 14:17:06 +01:00
Julius Volz	815762a4ad	Move retrieval.NewHTTPClient -> httputil.NewClientFromConfig	2017-03-20 14:17:04 +01:00
Fabian Reinartz	397f001ac5	Merge branch 'master' into dev-2.0	2017-03-20 14:12:11 +01:00
Fabian Reinartz	fc2e56c13f	vendor: update tsdb	2017-03-20 14:07:25 +01:00
Julius Volz	eb14678a25	Make remote read/write use config.HTTPClientConfig	2017-03-20 13:37:50 +01:00
Julius Volz	406b65d0dc	Rename remote.Storage to remote.Writer	2017-03-20 13:15:28 +01:00
Julius Volz	02395a224d	[WIP] Remote Read	2017-03-20 13:13:44 +01:00
Julius Volz	40e41a4776	Merge pull request #2494 from tomwilkie/remote-write-sharding Dynamically reshard the QueueManager based on observed load.	2017-03-20 12:45:17 +01:00
Julius Volz	525da88c35	Merge pull request #2479 from YKlausz/consul-tls Adding consul capability to connect via tls	2017-03-20 11:40:18 +01:00
Fabian Reinartz	2ef3682560	Hotfix erroneous "label index missing" error	2017-03-20 11:37:06 +01:00
Fabian Reinartz	3635569257	Trigger reload correctly on interrupted compaction	2017-03-20 10:41:43 +01:00
Fabian Reinartz	2c999836fb	Add Queryable interface to Block This adds the Queryable interface to the Block interface. Head and persisted blocks now implement their own Querier() method and thus isolate customization (e.g. remapPostings) more cleanly.	2017-03-20 10:21:21 +01:00
Fabian Reinartz	11be2cc585	Add composed Block interfaces, remove head generation This adds more lower-level interfaces which are used to compose the different Block interfaces. The DB only uses interfaces instead of explicit persistedBlock and headBlock. The headBlock generation property is dropped as the use-case can be implemented using block sequence numbers.	2017-03-20 09:02:36 +01:00
Fabian Reinartz	0958c83d5d	Merge pull request #2511 from prometheus/fix-go-build Only truncate buildVersion if it's set	2017-03-20 08:46:57 +01:00
Julius Volz	107c33545b	Don't truncate build version	2017-03-19 18:37:23 +01:00
Goutham Veeramachaneni	761e4768f3	Lint and Vet Fixes	2017-03-19 21:35:01 +05:30
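
The behavior described in commit 04ccf84559 (default GOGC to 40, but honor an explicitly set GOGC environment variable) can be sketched roughly as follows. This is a minimal illustration and not the code from main.go: the function name gcPercentToApply and the constant defaultGCPercent are hypothetical, and the only real APIs used are os.LookupEnv and runtime/debug.SetGCPercent from the Go standard library.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// defaultGCPercent is the default named in the commit message (hypothetical constant name).
const defaultGCPercent = 40

// gcPercentToApply is a hypothetical helper. It reports which GC percentage
// should be applied and whether it should be applied at all. The lookup
// function is injected so the decision can be exercised without touching the
// real process environment.
func gcPercentToApply(lookup func(string) (string, bool)) (int, bool) {
	if v, ok := lookup("GOGC"); ok && v != "" {
		// The user set GOGC explicitly; the Go runtime has already
		// applied it, so leave it alone.
		return 0, false
	}
	return defaultGCPercent, true
}

func main() {
	if pct, apply := gcPercentToApply(os.LookupEnv); apply {
		// SetGCPercent returns the previous setting.
		old := debug.SetGCPercent(pct)
		fmt.Printf("GOGC not set: GC percent changed from %d to %d\n", old, pct)
	} else {
		fmt.Println("GOGC set explicitly: leaving the runtime setting untouched")
	}
}
```

Running the program with GOGC unset takes the first branch; running it as `GOGC=100 ./prog` takes the second. Treating an empty GOGC value as "not set" is an assumption of this sketch.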

7154 commits