Commit graph

3755 commits

Author SHA1 Message Date
Björn Rabenstein 9688a312ed Merge pull request #2355 from prometheus/beorn7/lint
Remove auto-generated protobuf code from codeclimate
2017-01-20 11:31:51 +01:00
beorn7 4392aa43d4 Remove auto-generated protobuf code from codeclimate 2017-01-20 11:07:20 +01:00
Björn Rabenstein d717175104 Merge pull request #2354 from prometheus/beorn7/lint
Documentation: Add Code Climate badges to README.md
2017-01-20 10:51:05 +01:00
beorn7 0c8b753f6e Documentation: Add Code Climate badges to README.md 2017-01-19 23:22:22 +01:00
Scott Larkin e5a75b2b30 Code Climate config (#2351)
Created a Code Climate config with gofmt, golint, and govet enabled
2017-01-19 22:19:32 +01:00
Alex Somesan b22eb65d0f Cleaner separation between ServiceAccount and custom authentication in K8S SD (#2348)
* Canonical usage of cluster service-account in K8S SD

* Early validation for opt-in custom auth in K8S SD

* Fix typo in condition
2017-01-19 10:52:52 +01:00
Fabian Reinartz 7eb849e6a8 Merge pull request #2307 from joyent/triton_discovery
Add Joyent Triton discovery
2017-01-18 05:08:11 +01:00
Richard Kiene f3d9692d09 Add Joyent Triton discovery 2017-01-17 20:34:32 +00:00
Brian Brazil c1b547a90e Only checkpoint chunkdescs and series that need persisting. (#2340)
This decreases checkpoint size by not checkpointing things
that don't actually need checkpointing.

This is fully compatible with the v2 checkpoint format,
as it makes series appear as though the only chunkdescs
in memory are those that need persisting.
2017-01-17 00:59:38 +00:00
Fabian Reinartz 5418a42965 Merge pull request #2345 from Bplotka/fixed-alertmanager-flag-auth
Fixed regression in the `-alertmanager.url` flag. Basic auth was ignored.
2017-01-16 18:29:51 +01:00
Bartek Plotka 579e33f19a Fixed style issues. 2017-01-16 16:45:58 +00:00
Bartek Plotka d7febe97fa Fixed regression in -alertmanager.url flag. Basic auth was ignored.
- Included basic auth parsing when parsing into AlertmanagerConfig
- Added test case

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2017-01-16 16:39:20 +00:00
Fabian Reinartz 990e40c959 Merge pull request #2338 from brancz/alertmanager-api
web/api: add alertmanager api
2017-01-16 12:08:14 +01:00
Frederic Branczyk bd92571bdd web/api: make target and alertmanager api responses consistent 2017-01-16 11:53:00 +01:00
Fabian Reinartz 022714b60a Merge pull request #2341 from mattbostock/patch-1
Correct notifications_dropped description
2017-01-16 09:23:46 +01:00
Matt Bostock 4160892109 Correct notifications_dropped description
The current description does not accurately describe when the metric is incremented.

Aside from the case where Alertmanager is missing from the configuration, `prometheus_notifications_dropped_total` is incremented when errors occur while sending alert notifications to Alertmanager, when the notifications queue is full, or when the number of notifications to be sent exceeds the queue capacity.

I think calling these cases 'errors' in a generic sense is more useful than the current description.
2017-01-13 23:36:00 +00:00
Brian Brazil f64c231dad Allow checkpoints and maintenance to happen concurrently. (#2321)
This is essential on larger Prometheus servers, as otherwise
checkpoints prevent sufficient persisting of chunks to disk.
2017-01-13 17:24:19 +00:00
Frederic Branczyk 389c6d0043 web/api: add alertmanager api 2017-01-13 15:30:20 +01:00
Brian Brazil 1dcb7637f5 Add various persistence related metrics (#2333)
Add metrics around checkpointing and persistence

* Add a metric to say if checkpointing is happening,
and another to track total checkpoint time and count.

This breaks the existing prometheus_local_storage_checkpoint_duration_seconds
by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds
as the former name is more appropriate for a summary.

* Add metric for last checkpoint size.

* Add metric for series/chunks processed by checkpoints.

For long checkpoints it'd be useful to see how they're progressing.

* Add metric for dirty series

* Add metric for number of chunks persisted per series.

You can get the number of chunks from chunk_ops,
but not the matching number of series. This helps determine
the size of the writes being made.

* Add metric for chunks queued for persistence

Chunks created includes both chunks that'll need persistence
and chunks read in for queries. This only includes chunks created
for persistence.

* Code review comments on new persistence metrics.
2017-01-11 15:11:19 +00:00
Björn Rabenstein 6ce97837ab Merge pull request #2327 from prometheus/beorn7/vendoring
vendoring: Update prometheus/common to pull in bug fixes
2017-01-09 13:28:36 +01:00
beorn7 86ec87b78f vendoring: Update prometheus/common to pull in bug fixes
In particular the one for https://github.com/prometheus/common/issues/72.
2017-01-09 12:25:17 +01:00
Fabian Reinartz 3302bb1eb1 Merge pull request #2323 from prometheus/beorn7/retrieval
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still, it's unclear why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
André Carvalho c43dfaba1c Add max concurrent and current queries engine metrics (#2326)
* Add max concurrent and current queries engine metrics

This commit adds two metrics to the promql/engine: the
number of max concurrent queries, as configured by the flag, and
the number of current queries being served+blocked in the engine.
2017-01-07 14:41:25 +00:00
beorn7 767c0709b1 Retrieval: Avoid copying Target
retrieval.Target contains a mutex. It was copied in the Targets()
call, which can potentially wreak a lot of havoc.

It might even have caused the issues reported as #2266 and #2262.
2017-01-06 18:43:41 +01:00
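
For illustration, a minimal Go sketch (with made-up types, not the actual retrieval package) of why value copies of a mutex-bearing struct are hazardous and why returning pointers avoids the problem:

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative type, not the actual retrieval.Target.
type target struct {
	mtx    sync.RWMutex
	health string
}

func (t *target) setHealth(h string) {
	t.mtx.Lock()
	defer t.mtx.Unlock()
	t.health = h
}

// Returning values would copy the embedded mutex (go vet's copylocks check
// can flag such copies); returning pointers keeps one shared lock per target.
func targets(pool []*target) []*target {
	out := make([]*target, len(pool))
	copy(out, pool)
	return out
}

func main() {
	pool := []*target{{health: "unknown"}, {health: "unknown"}}
	for _, t := range targets(pool) {
		t.setHealth("up") // callers lock the same mutexes as the pool's targets
	}
	fmt.Println(len(pool), "targets updated")
}
```
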
Brian Brazil f9e581907a Make index queue bigger. (#2322)
When a large Prometheus starts up fresh, it can take many minutes
to warm up and clear out the index queue. A larger queue means less
blocking and bigger batches, and cuts startup time by ~50%.
2017-01-05 17:57:42 +00:00
Fabian Reinartz c9f4aea8e2 Merge pull request #2305 from alicebob/favicon
Add a favicon to the web GUI
2017-01-04 10:15:27 +01:00
Martin Lehmann 78fae3155f Make relative links in README.md absolute (#2316)
The relative links don't work in other pages that render the README (for example https://hub.docker.com/r/prom/prometheus/). As they are (hopefully) not due to change any time soon, I think using absolute links is better.
2017-01-03 20:07:33 +00:00
Julius Volz 90dd216646 Merge pull request #2306 from EdSchouten/sorted-alerts
Use lexicographic order to sort alerts by name.
2016-12-31 13:12:30 +01:00
Mitsuhiro Tanda 7e369b9318 expose max memory chunks metrics (#2303)
* expose max memory chunks metrics
2016-12-27 18:34:07 +00:00
Ed Schouten b3a39ccd8a Use lexicographic order to sort alerts by name.
Right now the /alerts page of Prometheus sorts alerts by severity
(firing, pending, inactive). Once multiple alerts have the same
severity, their order seems to correlate to how they are placed in the
configuration files, but not always. Looking at the code, we make use of
sort.Sort(), which is documented not to provide a stable sort. The
Less() function also only takes the alert state into account.

This change extends the Less() function to provide a lexicographic order
on both the alert state and the name. This means I can finally find the
alerts I'm looking for without using my browser's search feature.
2016-12-27 14:28:44 +01:00
Harmen 135d32ea22 make assets 2016-12-27 13:59:20 +01:00
Harmen dfa4f79bcd add favicon 2016-12-27 13:58:51 +01:00
Brian Brazil 93b70ee4ea Evict chunk descs of all unloaded chunks during maintenance. (#2297)
Keeping these around has two problems:
1) Each desc takes 64 bytes, so 10 of them take 640B. This is a lot of
overhead on a 1024-byte chunk.
2) It can take well over a week before this, and thus Prometheus memory
usage as a whole, reaches a steady state. This makes RAM estimation very
hard for users, and makes it difficult to investigate things like memory
fragmentation.

Instead we'll wipe them during each memory series maintenance cycle, and
if a query pulls them in they'll hang around as cache until the next
cycle.
2016-12-22 13:49:03 +00:00
Brian Brazil bed4635802 Use irate consistently in console template examples. (#2296)
I must have forgotten my 'g' when switching these.
2016-12-21 13:19:23 +00:00
Fabian Reinartz d6d03a966f Merge pull request #2295 from prometheus/fast-path-remote
Don't clone the metric if there are no remote writes.
2016-12-21 12:36:41 +01:00
Brian Brazil 1b8a474612 Don't clone the metric if there are no remote writes.
The metric clone can't be further optimised, and it is a non-trivial
memory allocation cost, so take a fast path when no remote writes
are configured.
2016-12-21 11:34:48 +00:00
Brian Brazil 6c07453ec1 Only clone the metric in the one place relabelling needs it. (#2292)
This cuts ~17% off memory allocations related to ingesting data
in a basic setup.
2016-12-21 10:00:33 +00:00
Brian Brazil 2e3b42ad6c Correctly handle the end time being 0 in the URL. (#2290) 2016-12-18 19:30:52 +00:00
Brian Brazil f421ce0636 Remove label from prometheus_target_skipped_scrapes_total (#2289)
This avoids it not being initialised, and breaking it out by
interval wasn't particularly useful.

Fixes #2269
2016-12-16 18:00:52 +00:00
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problematic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
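
A hypothetical Go sketch of the enforcement described above; the names are illustrative, not the actual scrape-loop code:

```go
package main

import (
	"errors"
	"fmt"
)

var errSampleLimit = errors.New("sample limit exceeded")

type sample struct {
	Name  string
	Value float64
}

// enforceSampleLimit rejects a scrape outright when, after relabelling, it
// yields more samples than the configured sample_limit (0 disables the check).
func enforceSampleLimit(samples []sample, limit int, exceeded func()) ([]sample, error) {
	if limit > 0 && len(samples) > limit {
		exceeded() // e.g. bump a "scrapes exceeded sample limit" counter
		return nil, errSampleLimit
	}
	return samples, nil
}

func main() {
	scraped := []sample{{"a", 1}, {"b", 2}, {"c", 3}}
	if _, err := enforceSampleLimit(scraped, 2, func() { fmt.Println("limit exceeded") }); err != nil {
		fmt.Println("scrape dropped:", err)
	}
}
```
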
Björn Rabenstein f3f798fbcf Merge pull request #2283 from tcolgate/ignoredots
ignore dotfiles in data directory
2016-12-15 13:32:03 +01:00
Tristan Colgate 30be8e0b8a ignore dotfiles in data directory 2016-12-15 11:48:23 +00:00
Tristan Colgate-McFarlane 4d9134e6d8 Add labeldrop and labelkeep actions. (#2279)
Introduce two new relabel actions: labeldrop and labelkeep.
These can be used to filter the set of labels by regex match:

- labeldrop: drops all labels that match the regex
- labelkeep: drops all labels that do not match the regex
2016-12-14 10:17:42 +00:00
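
A minimal Go sketch of the labeldrop/labelkeep semantics, operating on a plain label map rather than the real relabelling types:

```go
package main

import (
	"fmt"
	"regexp"
)

// applyLabelFilter mimics labeldrop/labelkeep on a plain label map: with
// keep=true only labels whose names match the regex survive (labelkeep),
// with keep=false matching labels are removed (labeldrop).
func applyLabelFilter(labels map[string]string, re *regexp.Regexp, keep bool) map[string]string {
	out := make(map[string]string, len(labels))
	for name, value := range labels {
		if re.MatchString(name) == keep {
			out[name] = value
		}
	}
	return out
}

func main() {
	labels := map[string]string{"job": "node", "instance": "a:9100", "tmp_build_id": "42"}
	// Drop every label whose name starts with tmp_ (Prometheus anchors
	// relabel regexes; here we anchor explicitly).
	re := regexp.MustCompile(`^tmp_.*$`)
	fmt.Println(applyLabelFilter(labels, re, false)) // map[instance:a:9100 job:node]
}
```
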
Björn Rabenstein 45570e5972 Merge pull request #2277 from prometheus/beorn7/storage2
storage: Sanity-check number of loaded chunk descs
2016-12-14 02:59:10 +01:00
beorn7 253be23c00 storage: Sanity-check number of loaded chunk descs
Two cases:

- An unarchived metric must have at least one chunk desc loaded upon
  unarchival. Otherwise, the file is gone or has size 0, which is an
  inconsistency (because the series is still indexed in the archive
  index). Hence, quarantining is triggered.

- When loading the chunk descs of a series with a known chunkDescsOffset
  (i.e. != -1), the number of chunks loaded must be equal to
  chunkDescsOffset. If not, there is data corruption. An error is
  returned, which leads to quarantining.

In any case, there is a guard added to not access the 1st element of
an empty chunkDescs slice. (That's what triggered the crashes in issue
2249.)  A time series with unknown chunkDescsOffset and no chunks in
memory and no chunks on disk either could trigger that case. I would
assume such a "null series" doesn't exist, but it's not entirely
unthinkable or unreasonable for it to happen (perhaps in future uses of the
storage). (Create a series, and then something tries to preload chunks
before the first sample is added.)
2016-12-13 23:19:39 +01:00
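
A simplified Go sketch of the two checks described above, with stand-in types instead of the real local-storage series code:

```go
package main

import "fmt"

// Stand-in type; the real memorySeries in storage/local has many more fields.
type memorySeries struct {
	chunkDescsOffset int           // -1 means "unknown"
	chunkDescs       []interface{} // loaded chunk descs
}

// checkLoadedChunkDescs mirrors the two checks described above.
func checkLoadedChunkDescs(s *memorySeries, unarchived bool) error {
	// An unarchived series must come back with at least one chunk desc;
	// otherwise the series file is gone or empty, so quarantine it.
	if unarchived && len(s.chunkDescs) == 0 {
		return fmt.Errorf("unarchived series has no chunk descs, quarantining")
	}
	// With a known offset, the number of loaded chunk descs must match it;
	// a mismatch indicates data corruption.
	if s.chunkDescsOffset != -1 && len(s.chunkDescs) != s.chunkDescsOffset {
		return fmt.Errorf("expected %d chunk descs, loaded %d",
			s.chunkDescsOffset, len(s.chunkDescs))
	}
	// Never touch s.chunkDescs[0] without checking len(s.chunkDescs) first.
	return nil
}

func main() {
	s := &memorySeries{chunkDescsOffset: -1}
	fmt.Println(checkLoadedChunkDescs(s, true)) // unarchived series with no chunk descs -> error
}
```
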
Björn Rabenstein 5f0c0e43cf Merge pull request #2276 from prometheus/beorn7/storage
storage: Catch data corruption that leads to division by zero
2016-12-13 23:13:39 +01:00
Björn Rabenstein a4c8292232 Merge pull request #2278 from prometheus/beorn7/style
storage: Fix linter issue
2016-12-13 23:13:05 +01:00