prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-18 11:34:05 -08:00

Author	SHA1	Message	Date
beorn7	4b503ed9a5	Merge branch 'master' into beorn7/storage2	2016-02-24 14:03:49 +01:00
beorn7	059295332f	Merge remote-tracking branch 'origin/master' into beorn7/storage	2016-02-24 14:02:27 +01:00
beorn7	53005c3085	Merge branch 'beorn7/storage' into beorn7/storage2	2016-02-24 14:00:56 +01:00
beorn7	28e9bbc15f	Populate chunkDesc.chunkLastTime during checkpoint loading, too	2016-02-24 13:58:34 +01:00
Björn Rabenstein	a8c79f0a0c	Merge pull request #1422 from prometheus/release-0.17 Merge more commits from 0.17.	2016-02-23 23:07:44 +01:00
Björn Rabenstein	5eff37ccbe	Merge pull request #1421 from prometheus/beorn7/fix Fix a very special case of handling the checkpoint timer	2016-02-23 22:25:27 +01:00
beorn7	8fa1560e48	Fix a very special case of handling the checkpoint timer	2016-02-23 16:48:35 +01:00
Björn Rabenstein	17bfe798eb	Merge pull request #1419 from prometheus/release-note-fixes Improve 0.17.0 changelog	2016-02-23 11:21:35 +01:00
Tobias Schmidt	b7e6651e06	Improve 0.17.0 changelog * remove wrong release date until 0.17.0 gets actually released * fix wrong alertmanager version number * add example for regex anchor change	2016-02-22 19:49:33 -05:00
Brian Brazil	e4e00b6f24	Merge pull request #1418 from igncp/patch-1 Fix minor typo	2016-02-22 23:44:46 +00:00
Ignacio Carbajo	0c537d6af6	Fix minor typo	2016-02-22 23:25:17 +00:00
beorn7	41e44f6ab9	Merge branch 'master' into beorn7/storage2	2016-02-22 16:54:33 +01:00
Björn Rabenstein	888c77cb06	Merge pull request #1416 from prometheus/beorn7/fix-test Fix a targetmanager test	2016-02-22 16:53:59 +01:00
beorn7	fd5108b038	Fix a targetmanager test	2016-02-22 16:43:48 +01:00
Björn Rabenstein	d9eb624322	Merge pull request #1415 from prometheus/release-0.17 Forward-merge release-0.17 into master	2016-02-22 16:39:48 +01:00
Björn Rabenstein	51aad630b6	Merge pull request #1414 from prometheus/beorn7/rushed-race Fix a race condition in calculatePersistenceUrgencyScore	2016-02-22 16:09:19 +01:00
beorn7	4d1f7b49b6	Fix a race condition in calculatePersistenceUrgencyScore	2016-02-22 15:48:39 +01:00
Brian Brazil	04946afd0a	Merge pull request #1412 from prometheus/fingerprintfix Remove fullLabels method and fix target updating	2016-02-22 12:11:08 +00:00
Fabian Reinartz	6df1f49c13	Remove fullLabels method and fix target updating With recent changes to a Target's internal data representation updating by fullLabels() assigns the additional default instance label. This breaks target identity comparison and causes identical targets from service discovery to be constantly swapped.	2016-02-22 13:06:30 +01:00
beorn7	454ecf3f52	Rework the way ranges and instants are handled In a way, our instants were also ranges, just with the staleness delta as range length. They are now treated equally, just that in one case, the range length is set as range, in the other the staleness delta. However, there are "real" instants where start and end time of a query is the same. In those cases, we only want to return a single value (the one closest before or at the equal start and end time). If that value is the last sample in the series, odds are we have it already in the series object. In that case, there is no need to pin or load any chunks. A special singleSampleSeriesIterator is created for that. This should greatly speed up instant queries as they happen frequently for rule evaluations.	2016-02-22 01:47:18 +01:00
Fabian Reinartz	209c4ad64f	Merge pull request #1410 from bluecmd/patch-1 Allow custom ldflags for go build	2016-02-21 10:35:00 +01:00
Christian Svensson	69ebf45649	Allow custom ldflags for go build This allows users to use CGO and external linker when building Prometheus.	2016-02-20 17:34:36 +01:00
beorn7	b876f8e6a5	Move lastSamplePair method up to memorySeries This implies a slight change of behavior as only samples added to the respective instance of a memorySeries are returned. However, this is most likely what we want anyway. Following cases: - Server has been restarted: Given the time it takes to cleanly shutdown and start up a server, the series are now stale anyway. An improved staleness handling (still to be implemented) will be based on tracking if a given target is continuing to expose samples for a given time series. In that case, we need a full scrape cycle to decide about staleness. So again, it makes sense to consider everything stale directly after a server restart. - Series unarchived due to a read request: The series is definitely stale so we don't want to return anything anyway. - Freshly created time series or series unarchived because of a sample append: That happens because appending a sample is imminent. Before the fingerprint lock is released, the series will have received a sample, and lastSamplePair will always return the expected value.	2016-02-19 18:16:41 +01:00
beorn7	1e13f89039	Return SamplePair instead of *SamplePair consistently Formalize ZeroSamplePair as return value for non-existing samples. Change LastSamplePairForFingerprint to return a SamplePair (and not a pointer to it), which saves allocations in a potentially extremely frequent call.	2016-02-19 17:00:40 +01:00
beorn7	d290340367	Fix and improve chunkDesc locking	2016-02-19 16:24:38 +01:00
beorn7	0e202dacb4	Streamline series iterator creation This will fix issue #1035 and will also help to make issue #1264 less bad. The fundamental problem in the current code: In the preload phase, we quite accurately determine which chunks will be used for the query being executed. However, in the subsequent step of creating series iterators, the created iterators are referencing _all_ in-memory chunks in their series, even the un-pinned ones. In iterator creation, we copy a pointer to each in-memory chunk of a series into the iterator. While this creates a certain amount of allocation churn, the worst thing about it is that copying the chunk pointer out of the chunkDesc requires a mutex acquisition. (Remember that the iterator will also reference un-pinned chunks, so we need to acquire the mutex to protect against concurrent eviction.) The worst case happens if a series doesn't even contain any relevant samples for the query time range. We notice that during preloading but then we will still create a series iterator for it. But even for series that do contain relevant samples, the overhead is quite bad for instant queries that retrieve a single sample from each series, but still go through all the effort of series iterator creation. All of that is particularly bad if a series has many in-memory chunks. This commit addresses the problem from two sides: First, it merges preloading and iterator creation into one step, i.e. the preload call returns an iterator for exactly the preloaded chunks. Second, the required mutex acquisition in chunkDesc has been greatly reduced. That was enabled by a side effect of the first step, which is that the iterator is only referencing pinned chunks, so there is no risk of concurrent eviction anymore, and chunks can be accessed without mutex acquisition. To simplify the code changes for the above, the long-planned change of ValueAtTime to ValueAtOrBefore time was performed at the same time. 
(It should have been done first, but it kind of accidentally happened while I was in the middle of writing the series iterator changes. Sorry for that.) So far, we actively filtered the up to two values that were returned by ValueAtTime, i.e. we invested work to retrieve up to two values, and then we invested more work to throw one of them away. The SeriesIterator.BoundaryValues method can be removed once #1401 is fixed. But I really didn't want to load even more changes into this PR. Benchmarks: The BenchmarkFuzz.* benchmarks run 83% faster (i.e. about six times faster) and allocate 95% fewer bytes. The reason for that is that the benchmark reads one sample after another from the time series and creates a new series iterator for each sample read. To find out how much these improvements matter in practice, I have mirrored a beefy Prometheus server at SoundCloud that suffers from both issues #1035 and #1264. To reach steady state that would be comparable, the server needs to run for 15d. So far, it has run for 1d. The test server currently has only half as many memory time series and 60% of the memory chunks the main server has. The 90th percentile rule evaluation cycle time is ~11s on the main server and only ~3s on the test server. However, these numbers might get much closer over time. In addition to performance improvements, this commit removes about 150 LOC.	2016-02-19 16:24:38 +01:00
Fabian Reinartz	fce17b41c5	Merge pull request #1408 from prometheus/hostname Log argument parse errors	2016-02-19 12:22:12 +01:00
Fabian Reinartz	e62677d7ba	Log argument parse errors Fixes #1407	2016-02-19 12:20:10 +01:00
Brian Brazil	cd85352fe1	Merge pull request #1403 from igncp/master Fix minor typo	2016-02-17 22:58:05 +00:00
Ignacio Carbajo	6a323b1e6d	Fix minor typo	2016-02-17 22:52:44 +00:00
Brian Brazil	b447002309	Merge pull request #1402 from prometheus/fabxc/target-identity Use fingerprint for target identity comparison	2016-02-17 15:37:10 +00:00
Fabian Reinartz	825831e98f	Use fingerprint for target identity comparison So far we were using the InstanceIdentifier to compare equality of targets. This is not always accurate, for example for the blackbox exporter where the actual target is in the parameter.	2016-02-17 16:34:53 +01:00
Fabian Reinartz	c24c5e6fb3	Merge pull request #1400 from prometheus/beorn7/instrumentation Fix the instrumentation fixes	2016-02-17 15:57:48 +01:00
beorn7	663a1550d0	Fix the instrumentation fixes	2016-02-17 15:50:55 +01:00
Fabian Reinartz	73e38c534a	Merge pull request #1398 from prometheus/scraperef2 Handle scrape timeout on request.	2016-02-16 15:11:09 +01:00
Fabian Reinartz	66767121ab	Handle scrape timeout on request. For historic reasons we were enforcing a timeout directly via the TCP dialer. This is no longer necessary for quite a while now. Switching to context.Context will allow us to properly terminate requests on shutdown as well.	2016-02-16 11:46:02 +01:00
Fabian Reinartz	1f70345d0c	Merge pull request #1397 from prometheus/remove-old-scrapetime-setting Remove old superfluous calls to setLastScrape().	2016-02-15 22:46:09 +01:00
Julius Volz	293486c7b1	Remove old superfluous calls to setLastScrape(). This is called from within the scrape()->report() flow now. See https://github.com/prometheus/prometheus/pull/1394/files#r52945817	2016-02-15 22:42:24 +01:00
Fabian Reinartz	a0078ec84c	Merge pull request #1394 from prometheus/scraperef2 Refactor and test appender modifications	2016-02-15 21:19:40 +01:00
Fabian Reinartz	463dd3ea06	Refactor target scrape reporting.	2016-02-15 18:06:15 +01:00
Fabian Reinartz	f1101590ee	Merge pull request #1395 from prometheus/fabxc/eof Fix wrong EOF error on successful target scraping	2016-02-15 17:26:34 +01:00
Fabian Reinartz	cd28b88b08	Fix wrong EOF error on successful target scraping	2016-02-15 17:23:04 +01:00
Fabian Reinartz	cb86a4300b	Merge pull request #1393 from prometheus/scraperef Make scraping offset consistent.	2016-02-15 16:52:03 +01:00
Fabian Reinartz	27d71b08d1	Factor out appender wrapping	2016-02-15 16:47:39 +01:00
Fabian Reinartz	fe7e91e2eb	Make scraping offset consistent. To evenly distribute scraping load we currently rely on random jittering. This commit hashes over the target's identity and calculates a consistent offset. This also ensures that scrape intervals are constantly spaced between config/target changes.	2016-02-15 16:46:29 +01:00
Brian Brazil	65d226b17a	Merge pull request #1392 from prometheus/scrapetimeout Fix global config YAML issues	2016-02-15 13:21:17 +00:00
Björn Rabenstein	7e41f45fe7	Merge pull request #1387 from prometheus/beorn7/storage Populate first and last time in the chunk descriptor earlier	2016-02-15 14:18:01 +01:00
Fabian Reinartz	37c709f917	Fix global config YAML issues	2016-02-15 14:08:25 +01:00
beorn7	ef3ab96111	Populate first and last time in the chunk descriptor earlier The first time is kind of trivial as we always know it when we create a new chunkDesc. The last time is only known when the chunk is closed, so we have to set it at that time. The change saves a lot of digging down into the chunk itself. Especially the last time is relatively expensive as it involves the creation of an iterator. The first time access now doesn't require locking, which is also a nice gain.	2016-02-15 14:06:09 +01:00
Brian Brazil	b3fb91ec87	Merge pull request #1391 from prometheus/scrapetimeout Fix scrape timeout config checks	2016-02-15 11:12:28 +00:00


13544 commits