prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
beorn7	d290340367	Fix and improve chunkDesc locking	2016-02-19 16:24:38 +01:00
beorn7	0e202dacb4	Streamline series iterator creation. This will fix issue #1035 and will also help to make issue #1264 less bad. The fundamental problem in the current code: In the preload phase, we quite accurately determine which chunks will be used for the query being executed. However, in the subsequent step of creating series iterators, the created iterators are referencing _all_ in-memory chunks in their series, even the un-pinned ones. In iterator creation, we copy a pointer to each in-memory chunk of a series into the iterator. While this creates a certain amount of allocation churn, the worst thing about it is that copying the chunk pointer out of the chunkDesc requires a mutex acquisition. (Remember that the iterator will also reference un-pinned chunks, so we need to acquire the mutex to protect against concurrent eviction.) The worst case happens if a series doesn't even contain any relevant samples for the query time range. We notice that during preloading but then we will still create a series iterator for it. But even for series that do contain relevant samples, the overhead is quite bad for instant queries that retrieve a single sample from each series, but still go through all the effort of series iterator creation. All of that is particularly bad if a series has many in-memory chunks. This commit addresses the problem from two sides: First, it merges preloading and iterator creation into one step, i.e. the preload call returns an iterator for exactly the preloaded chunks. Second, the required mutex acquisition in chunkDesc has been greatly reduced. That was enabled by a side effect of the first step, which is that the iterator is only referencing pinned chunks, so there is no risk of concurrent eviction anymore, and chunks can be accessed without mutex acquisition. To simplify the code changes for the above, the long-planned change of ValueAtTime to ValueAtOrBeforeTime was performed at the same time. (It should have been done first, but it kind of accidentally happened while I was in the middle of writing the series iterator changes. Sorry for that.) So far, we actively filtered the up to two values that were returned by ValueAtTime, i.e. we invested work to retrieve up to two values, and then we invested more work to throw one of them away. The SeriesIterator.BoundaryValues method can be removed once #1401 is fixed. But I really didn't want to load even more changes into this PR. Benchmarks: The BenchmarkFuzz.* benchmarks take 83% less time (i.e. run about six times faster) and allocate 95% fewer bytes. The reason for that is that the benchmark reads one sample after another from the time series and creates a new series iterator for each sample read. To find out how much these improvements matter in practice, I have mirrored a beefy Prometheus server at SoundCloud that suffers from both issues #1035 and #1264. To reach steady state that would be comparable, the server needs to run for 15d. So far, it has run for 1d. The test server currently has only half as many memory time series and 60% of the memory chunks the main server has. The 90th percentile rule evaluation cycle time is ~11s on the main server and only ~3s on the test server. However, these numbers might get much closer over time. In addition to performance improvements, this commit removes about 150 LOC.	2016-02-19 16:24:38 +01:00
Fabian Reinartz	fce17b41c5	Merge pull request #1408 from prometheus/hostname Log argument parse errors	2016-02-19 12:22:12 +01:00
Fabian Reinartz	e62677d7ba	Log argument parse errors. Fixes #1407	2016-02-19 12:20:10 +01:00
Brian Brazil	cd85352fe1	Merge pull request #1403 from igncp/master Fix minor typo	2016-02-17 22:58:05 +00:00
Ignacio Carbajo	6a323b1e6d	Fix minor typo	2016-02-17 22:52:44 +00:00
Brian Brazil	b447002309	Merge pull request #1402 from prometheus/fabxc/target-identity Use fingerprint for target identity comparison	2016-02-17 15:37:10 +00:00
Fabian Reinartz	825831e98f	Use fingerprint for target identity comparison. So far we were using the InstanceIdentifier to compare equality of targets. This is not always accurate, for example for the blackbox exporter, where the actual target is in the parameter.	2016-02-17 16:34:53 +01:00
Fabian Reinartz	c24c5e6fb3	Merge pull request #1400 from prometheus/beorn7/instrumentation Fix the instrumentation fixes	2016-02-17 15:57:48 +01:00
beorn7	663a1550d0	Fix the instrumentation fixes	2016-02-17 15:50:55 +01:00
Fabian Reinartz	73e38c534a	Merge pull request #1398 from prometheus/scraperef2 Handle scrape timeout on request.	2016-02-16 15:11:09 +01:00
Fabian Reinartz	66767121ab	Handle scrape timeout on request. For historic reasons we were enforcing a timeout directly via the TCP dialer. This has not been necessary for quite a while now. Switching to context.Context will allow us to properly terminate requests on shutdown as well.	2016-02-16 11:46:02 +01:00
Fabian Reinartz	1f70345d0c	Merge pull request #1397 from prometheus/remove-old-scrapetime-setting Remove old superfluous calls to setLastScrape().	2016-02-15 22:46:09 +01:00
Julius Volz	293486c7b1	Remove old superfluous calls to setLastScrape(). This is called from within the scrape()->report() flow now. See https://github.com/prometheus/prometheus/pull/1394/files#r52945817	2016-02-15 22:42:24 +01:00
Fabian Reinartz	a0078ec84c	Merge pull request #1394 from prometheus/scraperef2 Refactor and test appender modifications	2016-02-15 21:19:40 +01:00
Fabian Reinartz	463dd3ea06	Refactor target scrape reporting.	2016-02-15 18:06:15 +01:00
Fabian Reinartz	f1101590ee	Merge pull request #1395 from prometheus/fabxc/eof Fix wrong EOF error on successful target scraping	2016-02-15 17:26:34 +01:00
Fabian Reinartz	cd28b88b08	Fix wrong EOF error on successful target scraping	2016-02-15 17:23:04 +01:00
Fabian Reinartz	cb86a4300b	Merge pull request #1393 from prometheus/scraperef Make scraping offset consistent.	2016-02-15 16:52:03 +01:00
Fabian Reinartz	27d71b08d1	Factor out appender wrapping	2016-02-15 16:47:39 +01:00
Fabian Reinartz	fe7e91e2eb	Make scraping offset consistent. To evenly distribute scraping load we currently rely on random jittering. This commit hashes over the target's identity and calculates a consistent offset. This also ensures that scrape intervals are constantly spaced between config/target changes.	2016-02-15 16:46:29 +01:00
Brian Brazil	65d226b17a	Merge pull request #1392 from prometheus/scrapetimeout Fix global config YAML issues	2016-02-15 13:21:17 +00:00
Björn Rabenstein	7e41f45fe7	Merge pull request #1387 from prometheus/beorn7/storage Populate first and last time in the chunk descriptor earlier	2016-02-15 14:18:01 +01:00
Fabian Reinartz	37c709f917	Fix global config YAML issues	2016-02-15 14:08:25 +01:00
beorn7	ef3ab96111	Populate first and last time in the chunk descriptor earlier. The first time is kind of trivial as we always know it when we create a new chunkDesc. The last time is only known when the chunk is closed, so we have to set it at that time. The change saves a lot of digging down into the chunk itself. Especially the last time is relatively expensive as it involves the creation of an iterator. The first time access now doesn't require locking, which is also a nice gain.	2016-02-15 14:06:09 +01:00
Brian Brazil	b3fb91ec87	Merge pull request #1391 from prometheus/scrapetimeout Fix scrape timeout config checks	2016-02-15 11:12:28 +00:00
Fabian Reinartz	44a5e860ed	Fix scrape timeout config checks	2016-02-15 12:07:46 +01:00
Brian Brazil	938ebe78c2	Merge pull request #1390 from prometheus/scraperef Adjust labels on status page	2016-02-15 10:19:45 +00:00
Fabian Reinartz	915a7c09a8	Adjust labels on status page	2016-02-15 11:10:14 +01:00
Fabian Reinartz	70336c6f5b	Merge pull request #1385 from prometheus/scraperef Cleanup target internals	2016-02-15 10:47:03 +01:00
Fabian Reinartz	a06bc75519	Remove occurrences of 'base' labels	2016-02-15 10:36:57 +01:00
Brian Brazil	718098a4df	Merge pull request #1388 from prometheus/update-dns-meta-refs Update two more __meta_dns_srv_name references.	2016-02-14 22:36:52 +00:00
Julius Volz	829a029dda	Update two more __meta_dns_srv_name references. Although they are only in examples/tests and don't affect anything, they could be confusing (the label has been renamed in the rest of the code a while ago).	2016-02-14 22:20:39 +01:00
Fabian Reinartz	0d44248fb8	Cleanup cluttered test data	2016-02-13 10:13:38 +01:00
Fabian Reinartz	65eba080a0	Cleanup internal target data	2016-02-13 10:13:38 +01:00
Brian Brazil	738e6f41d4	Merge pull request #1384 from prometheus/scraperef Restrict scrape timeout to interval length	2016-02-12 12:00:19 +00:00
Fabian Reinartz	e26e4b6e89	Restrict scrape timeout to interval length	2016-02-12 12:52:22 +01:00
Björn Rabenstein	abeeebeed4	Merge pull request #1383 from prometheus/beorn7/race Remove race condition from TestRetentionCutoff	2016-02-12 12:42:13 +01:00
beorn7	9a3edea477	Remove race condition from TestRetentionCutoff	2016-02-12 12:13:19 +01:00
Fabian Reinartz	90b9fae638	Merge pull request #1382 from prometheus/beorn7/vendoring Update common/expfmt vendoring	2016-02-11 17:06:56 +01:00
beorn7	6946fb2058	Update common/expfmt vendoring	2016-02-11 16:08:29 +01:00
Fabian Reinartz	83c5ef7c03	Merge pull request #1380 from prometheus/fix-typos Fix various typos in comments.	2016-02-10 08:31:11 +01:00
Julius Volz	9b6d69610a	Fix various typos in comments. Helpfully reported by https://goreportcard.com/report/github.com/prometheus/prometheus :)	2016-02-10 03:47:00 +01:00
Julius Volz	1c1dcd0255	Merge pull request #1379 from prometheus/fix-target-init Fix target update error handling.	2016-02-10 01:14:24 +01:00
Julius Volz	3728b5872f	Fix target update error handling. Fixes https://github.com/prometheus/prometheus/issues/1378	2016-02-08 21:42:59 +01:00
Brian Brazil	c0df1c7e81	Merge pull request #1376 from prometheus/without Add without aggregator modifier.	2016-02-08 14:09:14 +00:00
Brian Brazil	9d0112d7cf	Add without aggregator modifier. This has the advantage that the user doesn't need to list all labels they want to keep (as with "by") but without having to worry about inconsistent labels as when there's only one time series (as with "keeping_common"). Almost all aggregation should use this rather than the existing two options as it's much less error prone and easier to maintain due to not having to always add in "job" plus whatever other common job-level labels you have like "region".	2016-02-08 14:05:33 +00:00
Julius Volz	e3baa35e9f	Fix typo in documentation/examples/kubernetes-rabbitmq/README.md	2016-02-08 02:00:10 +01:00
Julius Volz	7f6acef4d5	Merge pull request #1314 from katcipis/master Adding RabbitMQ deploy for kubernetes + prometheus integration	2016-02-08 01:59:11 +01:00
Brian Brazil	b7ef0b45e8	Break aggregation tests out. Add missing tests.	2016-02-07 18:02:51 +00:00

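Commit 9d0112d7cf above contrasts the new "without" modifier with "by": "by" lists the labels to keep, "without" lists the labels to drop. The sketch below models only that difference in how the output label set of a group is chosen. It is not the query engine's code; the Labels type and the helpers groupBy, groupWithout, and key are hypothetical, and the metric name is ignored.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Labels is a hypothetical stand-in for a series' label set.
type Labels map[string]string

// groupBy keeps only the listed labels (the "by" behaviour).
func groupBy(ls Labels, names ...string) Labels {
	out := Labels{}
	for _, n := range names {
		if v, ok := ls[n]; ok {
			out[n] = v
		}
	}
	return out
}

// groupWithout drops the listed labels and keeps everything else
// (the "without" behaviour).
func groupWithout(ls Labels, names ...string) Labels {
	drop := make(map[string]bool, len(names))
	for _, n := range names {
		drop[n] = true
	}
	out := Labels{}
	for k, v := range ls {
		if !drop[k] {
			out[k] = v
		}
	}
	return out
}

// key renders a label set in sorted order so that equal sets compare equal.
func key(ls Labels) string {
	names := make([]string, 0, len(ls))
	for k := range ls {
		names = append(names, k)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, k := range names {
		parts = append(parts, k+"="+ls[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Two made-up series that differ only in their instance label.
	a := Labels{"job": "api", "region": "eu", "instance": "a:9090"}
	b := Labels{"job": "api", "region": "eu", "instance": "b:9090"}

	// Dropping "instance" puts both series into the same group and keeps
	// "job" and "region" without naming them.
	fmt.Println(key(groupWithout(a, "instance")) == key(groupWithout(b, "instance")))

	// With "by", every label to keep has to be listed explicitly.
	fmt.Println(key(groupBy(a, "job")), "|", key(groupBy(a, "job", "region")))
}
```

The maintenance argument from the commit message shows up here: if a new job-level label is added later, groupWithout keeps it automatically, while every groupBy call would need to be updated.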

2563 commits