Commit graph

3720 commits

Author SHA1 Message Date
beorn7 434ab2a6a3 storage: Evict chunks and calculate persistence pressure based on target heap size
This is a fairly easy attempt to dynamically evict chunks based on the
heap size. A target heap size has to be set as a command line flage,
so that users can essentially say "utilize 4GiB of RAM, and please
don't OOM".

The -storage.local.max-chunks-to-persist and
-storage.local.memory-chunks flags are deprecated by this
change. Backwards compatibility is provided by ignoring
-storage.local.max-chunks-to-persist and use
-storage.local.memory-chunks to set the new
-storage.local.target-heap-size to a reasonable (and conservative)
value (both with a warning).

This also makes the metrics intstrumentation more consistent (in
naming and implementation) and cleans up a few quirks in the tests.

Answers to anticipated comments:

There is a chance that Go 1.9 will allow programs better control over
the Go memory management. I don't expect those changes to be in
contradiction with the approach here, but I do expect them to
complement them and allow them to be more precise and controlled. In
any case, once those Go changes are available, this code has to be
revisted.

One might be tempted to let the user specify an estimated value for
the RSS usage, and then internall set a target heap size of a certain
fraction of that. (In my experience, 2/3 is a fairly safe bet.)
However, investigations have shown that RSS size and its relation to
the heap size is really really complicated. It depends on so many
factors that I wouldn't even start listing them in a commit
description. It depends on many circumstances and not at least on the
risk trade-off of each individual user between RAM utilization and
probability of OOMing during a RAM usage peak. To not add even more to
the confusion, we need to stick to the well-defined number we also use
in the targeting here, the sum of the sizes of heap objects.
2017-03-27 14:33:50 +02:00
Björn Rabenstein e1a84b6256 Merge pull request #2529 from prometheus/beorn7/storage3
storage: Use staleness delta as head chunk timeout
2017-03-27 14:25:08 +02:00
beorn7 96a303b348 storage: Use staleness delta as head chunk timeout
Currently, if a series stops to exist, its head chunk will be kept
open for an hour. That prevents it from being persisted. Which
prevents it from being evicted. Which prevents the series from being
archived.

Most of the time, once no sample has been added to a series within the
staleness limit, we can be pretty confident that this series will not
receive samples anymore. The whole chain as described above can be
started after 5m instead of 1h. In the relaxed case, this doesn't
change a lot as the head chunk timeout is only checked during series
maintenance, and usually, a series is only maintained every six
hours. However, there is the typical scenario where a large service is
deployed, the deoply turns out to be bad, and then it is deployed
again within minutes, and quite quickly the number of time series has
tripled. That's the point where the Prometheus server is stressed and
switches (rightfully) into rushed mode. In that mode, time series are
processed as quickly as possible, but all of that is in vein if all of
those recently ended time series cannot be persisted yet for another
hour. In that scenario, this change will help most, and it's exactly
the scenario where help is most desperately needed.
2017-03-26 23:44:50 +02:00
Tobias Schmidt 6dbd779099 Merge pull request #2519 from prometheus/update-arch-diag-link
Update architecture diagram link
2017-03-23 14:18:38 +02:00
Julius Volz a20105ddb0 Update architecture diagram link 2017-03-23 13:16:54 +01:00
Julius Volz c34257d069 Merge pull request #2518 from prometheus/update-arch-diag
Remove PromDash from architecture diagram
2017-03-23 13:13:14 +01:00
Julius Volz 428e1ad42c Remove PromDash from architecture diagram 2017-03-23 13:11:05 +01:00
Björn Rabenstein ddcf04a768 Merge pull request #2515 from leitzler/leitzler-patch-1
Use go env to fetch GOPATH to support Go 1.8
2017-03-23 11:58:30 +01:00
Pontus Leitzler 4774d6736a Use go env to fetch GOPATH to support Go 1.8
Go 1.8 do not require env GOPATH to be set and make will fail if it isn't set.
2017-03-22 19:04:20 +01:00
Julius Volz 40e41a4776 Merge pull request #2494 from tomwilkie/remote-write-sharding
Dynamically reshard the QueueManager based on observed load.
2017-03-20 12:45:17 +01:00
Julius Volz 525da88c35 Merge pull request #2479 from YKlausz/consul-tls
Adding consul capability to connect via tls
2017-03-20 11:40:18 +01:00
Fabian Reinartz 0958c83d5d Merge pull request #2511 from prometheus/fix-go-build
Only truncate buildVersion if it's set
2017-03-20 08:46:57 +01:00
Julius Volz 107c33545b Don't truncate build version 2017-03-19 18:37:23 +01:00
Goutham Veeramachaneni 5c89cec65c Stricter Relabel Config Checking for Labeldrop/keep (#2510)
* Minor code cleanup

* Labeldrop/Labelkeep Now *Only* Support Regex

Ref promtheus/prometheus#2368
2017-03-18 22:32:08 +01:00
Robson Roberto Souza Peixoto cc3e859d9e Add support for multiple ports in Marathon (#2506)
- create a target for every port
- add meta labels for Marathon labels in portMappings and portDefinitions
2017-03-18 22:10:44 +02:00
yklausz 75880b594f Adding consul capability to connect via tls 2017-03-17 22:37:18 +01:00
Fabian Reinartz 0a7c8e9da1 Merge pull request #2504 from prometheus/grobie/fix-discovery-naming
Follow golang naming conventions in discovery packages
2017-03-17 08:01:48 +01:00
Tobias Schmidt 7bde44e98e Remove testing.T usage in goroutines
The staticcheck warns about testing.T usage in goroutines. Moving the
t.Fatal* calls to the main thread showed immediately that this is a good
practice, as one of the test setups didn't work.
2017-03-16 23:40:46 -03:00
Tobias Schmidt 58cd39aacd Follow golang naming conventions in discovery packages 2017-03-16 23:40:46 -03:00
Bplotka 1823ae8bc4 Fixed int64 overflow for timestamp in v1/api parseDuration and parseTime (#2501)
* Fixed int64 overflow for timestamp in v1/api parseDuration and parseTime

This led to unexpected results on wrong query with "(...)&start=148966367200.372&end=1489667272.372"
That query is wrong because of `start > end` but actually internal int64 overflow caused start to be something around MinInt64 (huge negative value) and was passing validation.

BTW: Not sure if negative timestamp makes sense even.. But model.Earliest is actually MinInt64, can someone explain me why?

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>

* Added missing trailing periods on comments.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>

* MOved to only `<` and `>`. Removed equal.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2017-03-16 15:16:20 +01:00
beorn7 48d221c11e storage: Fix typo in comment 2017-03-16 11:49:41 +01:00
Robert Neumayer feb7670929 Add tests for consul service discovery (#2490)
* Add tests for consul service discovery

* Add license header

* Address comments

* inline variables
* check for extra error

* Fix error formatting
2017-03-15 09:33:53 +01:00
Tom Wilkie 75bb0f3253 Review feedback 2017-03-13 21:24:49 +00:00
Tom Wilkie 77cce900b8 Fix tests 2017-03-13 15:21:59 +00:00
Tom Wilkie b48799a01e Add license stanza 2017-03-13 14:50:15 +00:00
Tom Wilkie 9d22f030cf Dynamically reshard the QueueManager based on observed load. 2017-03-13 14:41:16 +00:00
Wéber Gyula 5aa90c075b added docker run command to readme (#2491)
* added docker run command to readme

* updated codebox in readme
2017-03-13 11:37:25 +01:00
Fabian Reinartz de1e4322d7 Merge pull request #2474 from Gouthamve/custom-timeouts-1399
Support Custom Timeout for Queries
2017-03-12 14:20:59 +01:00
Fabian Reinartz 2677f3eaf2 Merge pull request #2462 from m-kraus/master
Allow the use of bearer_token or bearer_token_file for MarathonSD authorization
2017-03-08 10:33:26 +01:00
Julius Volz e22553edd2 Merge pull request #2468 from agaoglu/version-statics
Adding version to names of static files
2017-03-07 22:22:01 +01:00
Erdem Agaoglu 90625b0400 Use revision as cachebuster 2017-03-07 18:03:52 +03:00
Goutham Veeramachaneni 4b0270290b
Fix comments to match convention 2017-03-06 23:21:27 +05:30
Goutham Veeramachaneni c6b329c55b
Support Custom Timeouts for Queries 2017-03-06 23:02:21 +05:30
Goutham Veeramachaneni 6634984a38
Comments and Typo Fixes 2017-03-06 17:16:37 +05:30
Fabian Reinartz 6aee1551e1 Merge pull request #2470 from StephanErb/zk-deadlock
Prevent deadlock in ZK TreeCache constructor by deferring the initial sync.
2017-03-06 12:36:51 +01:00
Michael Kraus 690b49e503 Fix marathon tests 2017-03-06 11:36:55 +01:00
Michael Kraus 31252cc1b5 Clarify explicit use of authorization header 2017-03-06 11:36:36 +01:00
Stephan Erb 3038d0eb9b Prevent deadlock in ZK TreeCache constructor by deferring the initial sync.
Fixes #2254
2017-03-03 23:58:46 +01:00
Erdem Agaoglu 241da87f7f Adding version to names of static files
to prevent browsers using old files in local caches after an upgrade.
2017-03-03 23:36:06 +03:00
Michael Kraus 04eadf6e20 Allow Marathon SD without bearer_token and bearer_token_file 2017-03-02 13:17:19 +01:00
Michael Kraus 47bdcf0f67 Allow the use of bearer_token or bearer_token_file for MarathonSD 2017-03-02 09:44:20 +01:00
Derek Marcotte 0a7fb56b16 Expose PromConsole.Graph.buildQueryUrl, refactor dispatch to use (#2461)
Expose buildQueryUrl, refactor dispatch to use

buildQueryUrl will allow users to execute queries over the range of an
existing graph.  This will be helpful to select data series they wish to
annotate the graph with, for example.
2017-03-01 22:37:50 +00:00
Julius Volz 7e14533b32 Merge pull request #2396 from lightpriest/fuzzy-fix-lookup
Fix fuzzy search lookup issues
2017-02-28 19:06:30 +01:00
Erdem Agaoglu 8809735d7f Setting User-Agent header (#2447) 2017-02-28 09:59:33 -04:00
Julius Volz 07491c9f1b Merge pull request #2452 from prometheus/swappable-post
notifier: Allow swapping out HTTP Doer
2017-02-27 22:15:24 +01:00
Julius Volz f152ac5e23 notifier: Allow swapping out HTTP Doer
We need to be able to modify the HTTP POST in Weave Cortex to add
multitenancy information to a notification. Since we only really need a
special header in the end, the other option would be to just allow
passing in headers to the notifier. But swapping out the whole Doer is
more general and allows others to swap out the network-talky bits of the
notifier for their own use. Doing this via contexts here wouldn't work
well, due to the decoupled flow of data in the notifier.

There was no existing interface containing the ctxhttp.Post() or
ctxhttp.Do() methods, so I settled on just using Do() as a swappable
function directly (and with a more minimal signature than Post).
2017-02-27 20:36:22 +01:00
Tom Wilkie 1ab893c6ec Limit 'discarding sample' logs to 1 every 10s (#2446)
* Limit 'discarding sample' logs to 1 every 10s

* Include the vendored library

* Review feedback
2017-02-23 19:20:39 +01:00
Julius Volz 16bd5c8ebe Merge pull request #2423 from prometheus/multiple-remote-writers
Re-add multiple remote writers
2017-02-23 09:48:40 +01:00
Julius Volz 2f39dbc8b3 Rename StorageQueueManager -> QueueManager 2017-02-21 21:45:43 +01:00
Julius Volz e9476b35d5 Re-add multiple remote writers
Each remote write endpoint gets its own set of relabeling rules.

This is based on the (yet-to-be-merged)
https://github.com/prometheus/prometheus/pull/2419, which removes legacy
remote write implementations.
2017-02-20 13:23:12 +01:00