prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
beorn7	059295332f	Merge remote-tracking branch 'origin/master' into beorn7/storage	2016-02-24 14:02:27 +01:00
beorn7	28e9bbc15f	Populate chunkDesc.chunkLastTime during checkpoint loading, too	2016-02-24 13:58:34 +01:00
Björn Rabenstein	a8c79f0a0c	Merge pull request #1422 from prometheus/release-0.17 Merge more commits from 0.17.	2016-02-23 23:07:44 +01:00
beorn7	8fa1560e48	Fix a very special case of handling the checkpoint timer	2016-02-23 16:48:35 +01:00
Björn Rabenstein	d9eb624322	Merge pull request #1415 from prometheus/release-0.17 Forward-merge release-0.17 into master	2016-02-22 16:39:48 +01:00
beorn7	4d1f7b49b6	Fix a race condition in calculatePersistenceUrgencyScore	2016-02-22 15:48:39 +01:00
beorn7	d290340367	Fix and improve chunkDesc locking	2016-02-19 16:24:38 +01:00
beorn7	0e202dacb4	Streamline series iterator creation This will fix issue #1035 and will also help to make issue #1264 less bad. The fundamental problem in the current code: In the preload phase, we quite accurately determine which chunks will be used for the query being executed. However, in the subsequent step of creating series iterators, the created iterators reference _all_ in-memory chunks in their series, even the un-pinned ones. In iterator creation, we copy a pointer to each in-memory chunk of a series into the iterator. While this creates a certain amount of allocation churn, the worst thing about it is that copying the chunk pointer out of the chunkDesc requires a mutex acquisition. (Remember that the iterator will also reference un-pinned chunks, so we need to acquire the mutex to protect against concurrent eviction.) The worst case happens if a series doesn't even contain any relevant samples for the query time range. We notice that during preloading, but we still create a series iterator for it. But even for series that do contain relevant samples, the overhead is quite bad for instant queries that retrieve a single sample from each series but still go through all the effort of series iterator creation. All of that is particularly bad if a series has many in-memory chunks. This commit addresses the problem from two sides: First, it merges preloading and iterator creation into one step, i.e. the preload call returns an iterator for exactly the preloaded chunks. Second, the required mutex acquisition in chunkDesc has been greatly reduced. That was enabled by a side effect of the first step, which is that the iterator only references pinned chunks, so there is no risk of concurrent eviction anymore, and chunks can be accessed without mutex acquisition. To simplify the code changes for the above, the long-planned change from ValueAtTime to ValueAtOrBeforeTime was made at the same time. 
(It should have been done first, but it kind of accidentally happened while I was in the middle of writing the series iterator changes. Sorry for that.) Previously, we actively filtered the up to two values returned by ValueAtTime, i.e. we invested work to retrieve up to two values, and then we invested more work to throw one of them away. The SeriesIterator.BoundaryValues method can be removed once #1401 is fixed. But I really didn't want to load even more changes into this PR. Benchmarks: The BenchmarkFuzz.* benchmarks take 83% less time (i.e. run about six times faster) and allocate 95% fewer bytes. This is because the benchmark reads one sample after another from the time series and creates a new series iterator for each sample read. To find out how much these improvements matter in practice, I have mirrored a beefy Prometheus server at SoundCloud that suffers from both issues #1035 and #1264. To reach a comparable steady state, the server needs to run for 15d. So far, it has run for 1d. The test server currently has only half as many memory time series and 60% of the memory chunks the main server has. The 90th percentile rule evaluation cycle time is ~11s on the main server and only ~3s on the test server. However, these numbers might get much closer over time. In addition to performance improvements, this commit removes about 150 LOC.	2016-02-19 16:24:38 +01:00
beorn7	ef3ab96111	Populate first and last time in the chunk descriptor earlier The first time is kind of trivial, as we always know it when we create a new chunkDesc. The last time is only known when the chunk is closed, so we have to set it at that time. The change saves a lot of digging down into the chunk itself. The last time in particular is relatively expensive to obtain, as it involves the creation of an iterator. Accessing the first time now doesn't require locking, which is also a nice gain.	2016-02-15 14:06:09 +01:00
beorn7	9a3edea477	Remove race condition from TestRetentionCutoff	2016-02-12 12:13:19 +01:00
Julius Volz	9b6d69610a	Fix various typos in comments. Helpfully reported by https://goreportcard.com/report/github.com/prometheus/prometheus :)	2016-02-10 03:47:00 +01:00
Fabian Reinartz	1f877f3d2a	Fix deadlock, structure target logging	2016-02-03 10:39:34 +01:00
Fabian Reinartz	59f1e722df	Return error on sample appending	2016-02-02 14:01:44 +01:00
beorn7	ec08c9a391	Rework the way to communicate backpressure (AKA suspended ingestion) This gives up on the idea of communicating through the Append() call (either by not returning, as it is now, or by returning an error, as suggested/explored elsewhere). Here I have added a Throttled() call, which has the advantage that it can be called before a whole _batch_ of Append() calls. Scrapes will happen completely or not at all. Same for rule group evaluations. That's a highly desired behavior (as discussed elsewhere). The code is even simpler now, as the whole ingestion buffer could be removed. Logging of throttled mode has been streamlined and will create at most one message per minute.	2016-02-01 14:45:44 +01:00
beorn7	87ef24cd25	Add instrumentation and refactor things around "rushed mode"	2016-01-26 17:44:21 +01:00
beorn7	a2cd479058	Fix calculation of chunks to persist after restart Since we are not overestimating the number of chunks to persist anymore, this commit also adjusts the default value for -storage.local.memory-chunks. Update of documentation will follow.	2016-01-25 19:33:51 +01:00
beorn7	972d94433a	Introduce a hysteresis for "rushed mode" "Rushed mode" was formerly known as "degraded mode"; the rename happens with this commit, too. The name "degraded" was very misleading. Also, switch into rushed mode if we have too many chunks in memory and at least a reasonable number of chunks to persist, so that speeding up persistence can help.	2016-01-25 19:24:37 +01:00
beorn7	14796bdb60	Improve chunkMaxBatchSize doc comment	2016-01-25 18:57:51 +01:00
beorn7	582af1618c	Streamline chunk writing This helps to avoid allocations in the same way we were already doing it during reading.	2016-01-25 16:36:36 +01:00
beorn7	99b9611351	Remove a race condition from TestRetentionCutoff	2016-01-25 16:36:14 +01:00
beorn7	3f4d22e4c7	Update doc comment This should have gone into a previous commit, but I forgot to save this particular file.	2016-01-12 12:38:18 +01:00
beorn7	add2ebdd56	Tolerate the lost+found directory in the data directory	2016-01-11 18:05:36 +01:00
Björn Rabenstein	6293f3a374	Merge pull request #1304 from prometheus/beorn7/storage Improve handling of series file truncation	2016-01-11 17:27:08 +01:00
beorn7	cb117d8346	Add a series ops metric "purge_on_request" It counts series deletions triggered via the API.	2016-01-11 17:22:16 +01:00
beorn7	4221c7de5c	Improve handling of series file truncation If only very few chunks are to be truncated from a very large series file, the rewrite of the file is a large overhead. With this change, a certain ratio of the file has to be dropped before a rewrite happens. While causing disk overhead of only about the same ratio (by default 10%), it cuts down I/O by a lot in the above scenario.	2016-01-11 16:42:10 +01:00
Corentin Chary	7b6c3e556c	Use '.' instead of '=' to separate labels from their values in Graphite Using .label=value. was weird to use in Graphite and didn't bring much value.	2016-01-11 13:57:14 +01:00
Julius Volz	75fdcf5698	Merge pull request #1197 from iksaif/master Add support for remote storage on Graphite	2015-11-10 09:46:17 +01:00
Corentin Chary	a2e4439086	Add support for remote storage on Graphite Allows using Graphite over TCP or UDP. Metric labels and values are used to construct a valid Graphite path in a way that will allow us to eventually read them back and reconstruct the metrics. For example, this metric: `model.Metric{ model.MetricNameLabel: "test:metric", "testlabel": "test:value", "testlabel2": "test:value", }` will become: `test:metric.testlabel=test:value.testlabel2=test:value`. escape.go takes care of escaping values to match the Graphite character set; it basically uses percent-encoding as a fallback, which works pretty well in the Graphite/Grafana world. The remote storage module also has an optional 'prefix' parameter to prefix all metrics with a path (for example, 'prometheus.'). Graphite URLs are simply in the form tcp://host:port or udp://host:port.	2015-11-10 07:58:57 +01:00
Fabian Reinartz	33aab4169c	Anchor regexes in vector matching This commit makes the regex behavior of vector matching consistent with configuration and label_replace() by anchoring it. Fixes #1200	2015-11-05 11:23:43 +01:00
Fabian Reinartz	e3b6ec9784	Switch to common/log	2015-10-03 10:21:43 +02:00
Julius Volz	dac26cef71	Rename global "labels" config option to "external_labels".	2015-09-29 20:54:20 +02:00
Julius Volz	eeb1da36ac	Fix InfluxDB write support to work with InfluxDB 0.9.x. Because the InfluxDB client library currently pulls in multiple MBs of unnecessary dependencies, I have modified and cut up the vendored version to only pull in the few pieces that are actually needed. On InfluxDB's side, this dependency issue is tracked in: https://github.com/influxdb/influxdb/issues/3447 Hopefully, it will be resolved soon. If a password is needed for InfluxDB, it may be supplied via the INFLUXDB_PW environment variable.	2015-09-16 17:40:03 +02:00
Julius Volz	5f77fce578	Improve remote storage queue manager metrics.	2015-09-16 17:20:23 +02:00
beorn7	22d3a4311a	Increase waiting time in TestEvictAndLoadChunkDescs The test had become flaky with Go1.5. Theory here is that with Go1.5.x, sleeping for 10ms might not be enough to wake up another goroutine, possibly because it is used for GC. 50ms should always be enough due to GC pause guarantees with the new GC.	2015-09-14 21:09:46 +02:00
Julius Volz	af513468eb	Fix some dead code, missing error checks, shadowings. I applied https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567 and was greeted with a deluge of warnings, most of which were either not applicable or not realistically fixable. These are some of the first ones I decided to fix.	2015-09-14 12:21:34 +02:00
beorn7	daeccdd0e9	Fix DropMetricsForFingerprints It now deletes the series file also for archived series. Also, fix a naming error in a doc comment.	2015-09-11 15:47:23 +02:00
Julius Volz	ffc5142c54	Merge pull request #1058 from prometheus/check-errors Fix error checking and logging around checkpointing.	2015-09-07 19:57:16 +02:00
Julius Volz	6774a73878	Fix error checking and logging around checkpointing.	2015-09-07 19:34:59 +02:00
Julius Volz	011faf9057	Fix typo in comment.	2015-09-07 19:15:28 +02:00
Fabian Reinartz	8fa719f778	Attach global labels to remote storage samples	2015-09-03 16:38:04 +02:00
Dieter Plaetinck	e1dacc56e6	Fix comment. The sample doesn't get appended to the list of sampleappenders.	2015-08-30 16:26:46 +02:00
Julius Volz	744d5d5a7a	Merge pull request #1029 from prometheus/vet-fixes Fix "go vet" errors.	2015-08-26 12:50:18 +02:00
Julius Volz	995d3b831d	Fix most golint warnings. This is with `golint -min_confidence=0.5`. I left several lint warnings untouched because they were either incorrect or I felt it was better not to change them at the moment.	2015-08-26 12:44:46 +02:00
Julius Volz	963ad82dcb	Fix "go vet" errors. I ignored all errors of the type "composite literal uses unkeyed fields". Most of them are wrong because of https://github.com/golang/go/issues/9171.	2015-08-26 02:05:04 +02:00
Fabian Reinartz	d6b8da8d43	Switch promql types to common/model	2015-08-25 13:49:14 +02:00
Fabian Reinartz	e061595352	Move COWMetric into storage/metric package	2015-08-25 11:59:07 +02:00
Brian Brazil	fdf0d0642e	Cast value to float, as that's what the console templates expect.	2015-08-24 16:59:08 +01:00
Fabian Reinartz	1535ef1457	Replace metric.SamplePair with model.SamplePair	2015-08-22 14:52:35 +02:00
Fabian Reinartz	c9d396f476	Replace metric.LabelPair with model.LabelPair	2015-08-22 13:32:13 +02:00
Fabian Reinartz	438e232c9b	Fix grouping of import blocks	2015-08-22 09:42:45 +02:00
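Commit 0e202dacb4 replaces ValueAtTime, which could return up to two values that then had to be filtered, with ValueAtOrBeforeTime. The following is a minimal sketch of the "at or before" lookup, assuming samples sit in one slice sorted by timestamp; the real iterator works across chunks, and the `samplePair` type and `valueAtOrBeforeTime` function here are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sort"
)

// samplePair is a hypothetical, simplified timestamped sample.
type samplePair struct {
	Timestamp int64
	Value     float64
}

// valueAtOrBeforeTime returns the latest sample whose timestamp is <= t.
// samples must be sorted by ascending Timestamp. ok is false if no sample
// is at or before t. Only one value is ever produced, so nothing has to be
// retrieved and then thrown away.
func valueAtOrBeforeTime(samples []samplePair, t int64) (sp samplePair, ok bool) {
	// Binary search for the first sample strictly after t.
	i := sort.Search(len(samples), func(i int) bool {
		return samples[i].Timestamp > t
	})
	if i == 0 {
		return samplePair{}, false // all samples are after t, or there are none
	}
	return samples[i-1], true
}

func main() {
	s := []samplePair{{10, 1}, {20, 2}, {30, 3}}
	v, ok := valueAtOrBeforeTime(s, 25)
	fmt.Println(v.Value, ok) // 2 true
}
```

Because the search stops at a single index, the cost is O(log n) per lookup with no allocation.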
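Commit ec08c9a391 moves backpressure out of Append() into a separate Throttled() call that is checked before a whole batch, so a scrape is ingested completely or not at all. A sketch of that calling pattern follows; the `appender` interface, `memStorage` type, and `ingestBatch` function are hypothetical illustrations, not the actual Prometheus storage interface.

```go
package main

import "fmt"

// sample is a hypothetical, simplified sample type.
type sample struct {
	Name  string
	Value float64
}

// appender is a hypothetical interface pairing Append with Throttled.
type appender interface {
	Append(s sample)
	Throttled() bool
}

// memStorage is a toy appender whose throttled state is set by hand.
type memStorage struct {
	samples   []sample
	throttled bool
}

func (m *memStorage) Append(s sample) { m.samples = append(m.samples, s) }

func (m *memStorage) Throttled() bool { return m.throttled }

// ingestBatch checks Throttled once, up front, and then appends the whole
// batch. It never leaves a batch half-ingested.
func ingestBatch(app appender, batch []sample) bool {
	if app.Throttled() {
		return false // skip the entire batch
	}
	for _, s := range batch {
		app.Append(s)
	}
	return true
}

func main() {
	st := &memStorage{}
	batch := []sample{{"up", 1}, {"up", 0}}

	ok := ingestBatch(st, batch)
	fmt.Println(ok, len(st.samples)) // true 2

	st.throttled = true
	ok = ingestBatch(st, batch)
	fmt.Println(ok, len(st.samples)) // false 2
}
```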
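Commit 972d94433a introduces a hysteresis for rushed mode. The sketch below shows the general technique: enter the mode above one threshold and leave it only below a lower one, so the mode does not flap when the input hovers near a single cut-off. The urgency input, the `rushedModeSwitch` type, and the 0.8/0.7 thresholds are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// rushedModeSwitch is a hypothetical hysteresis controller. It enters rushed
// mode when urgency rises above enterAt and leaves it only when urgency falls
// below exitAt, where exitAt < enterAt.
type rushedModeSwitch struct {
	enterAt, exitAt float64
	rushed          bool
}

// update feeds in a new urgency value and reports whether rushed mode is on.
// Between exitAt and enterAt, the previous state is simply kept.
func (r *rushedModeSwitch) update(urgency float64) bool {
	switch {
	case !r.rushed && urgency > r.enterAt:
		r.rushed = true
	case r.rushed && urgency < r.exitAt:
		r.rushed = false
	}
	return r.rushed
}

func main() {
	sw := &rushedModeSwitch{enterAt: 0.8, exitAt: 0.7}
	for _, u := range []float64{0.5, 0.85, 0.75, 0.72, 0.65} {
		fmt.Println(u, sw.update(u))
	}
}
```

With these values, 0.75 and 0.72 keep rushed mode on after 0.85 switched it on, and only 0.65 switches it off again.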
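Commit 4221c7de5c only rewrites a series file when a certain ratio of it is to be dropped (10% by default). A minimal sketch of that decision, assuming the ratio is measured in chunks; the function name and parameters are hypothetical.

```go
package main

import "fmt"

// shouldRewriteSeriesFile reports whether dropping chunksToDrop out of
// totalChunks chunks justifies rewriting the series file. Below
// minShrinkRatio, the dropped chunks stay on disk for now, trading a bounded
// amount of disk overhead for much less I/O.
func shouldRewriteSeriesFile(chunksToDrop, totalChunks int, minShrinkRatio float64) bool {
	if totalChunks <= 0 || chunksToDrop <= 0 {
		return false // nothing to drop
	}
	return float64(chunksToDrop)/float64(totalChunks) >= minShrinkRatio
}

func main() {
	const defaultRatio = 0.1 // the 10% default from the commit message
	fmt.Println(shouldRewriteSeriesFile(5, 1000, defaultRatio))   // false: 0.5% dropped
	fmt.Println(shouldRewriteSeriesFile(150, 1000, defaultRatio)) // true: 15% dropped
}
```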
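Commit 33aab4169c anchors regexes in vector matching so that they must match the whole label value, as in the configuration and label_replace(). A common way to do this with Go's standard `regexp` package is to wrap the expression as `"^(?:" + expr + ")$"`; treat this as an illustration of the idea rather than the exact code from that commit. The helper name `compileAnchored` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAnchored compiles expr so that it must match the entire input.
// The non-capturing group makes alternations like "api|web" bind to both
// anchors; without it, "^api|web$" would anchor only one side of each branch.
func compileAnchored(expr string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + expr + ")$")
}

func main() {
	unanchored := regexp.MustCompile("api|web")
	anchored, err := compileAnchored("api|web")
	if err != nil {
		panic(err)
	}
	fmt.Println(unanchored.MatchString("webserver")) // true: substring match
	fmt.Println(anchored.MatchString("webserver"))   // false: must match the whole value
	fmt.Println(anchored.MatchString("web"))         // true
}
```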


569 commits