prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-13 09:04:06 -08:00

Author	SHA1	Message	Date
Julius Volz	85497e3f38	Add function to drop common labels in a vector. This fixes https://github.com/prometheus/prometheus/issues/384. Change-Id: I2973c4baeb8a4618ec3875fb11c6fcf5d111784b	2014-11-25 17:02:00 +01:00
Julius Volz	3fdb74e571	Add more topk() / bottomk() tests. Test what happens if k > number of input elements. Change-Id: Ie724b850939e297ebf085f0a5a3522e9cfcc6534	2014-11-25 17:02:00 +01:00
Julius Volz	c582ae73c2	Implement topk() and bottomk() functions. To achieve O(log n * k) runtime, this uses a heap to track the current bottom-k or top-k elements while iterating over the full set of available elements. It would be possible to reuse more code between topk and bottomk, but I decided for some more duplication for the sake of clarity. This fixes https://github.com/prometheus/prometheus/issues/399 Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	1909686789	Make metrics exported by the Prometheus server itself more consistent. - Always spell out the time unit (e.g. milliseconds instead of ms). - Remove "_total" from the names of metrics that are not counters. - Make use of the "Namespace" and "Subsystem" fields in the options. - Removed the "capacity" facet from all metrics about channels/queues. These are all fixed via command line flags and will never change during the runtime of a process. Also, they should not be part of the same metric family. I have added separate metrics for the capacity of queues as convenience. (They will never change and are only set once.) - I left "metric_disk_latency_microseconds" unchanged, although that metric measures the latency of the storage device, even if it is not a spinning disk. "SSD" is read by many as "solid state disk", so it's not too far off. (It should be "solid state drive", of course, but "metric_drive_latency_microseconds" is probably confusing.) - Brian suggested to not mix "failure" and "success" outcome in the same metric family (distinguished by labels). For now, I left it as it is. We are touching some bigger issue here, especially as other parts in the Prometheus ecosystem are following the same principle. We still need to come to terms here and then change things consistently everywhere. Change-Id: If799458b450d18f78500f05990301c12525197d3	2014-11-25 17:02:00 +01:00
Brian Brazil	4a2b96f848	Remove backoff on scrape failure. Having metrics with variable timestamps inconsistently spaced when things fail will make it harder to write correct rules. Update status page, requires some refactoring to insert a function. Change-Id: Ie1c586cca53b8f3b318af8c21c418873063738a8	2014-11-25 17:02:00 +01:00
Julius Volz	00b9489f1c	Fix time() behavior. time() should return the timestamp for which the query is executed, not the actual current time. Change-Id: I430a45cabad7785cd58f95b1028a71dff4c87710	2014-11-25 17:02:00 +01:00
Julius Volz	c5984f1818	Add abs() and over-time aggregation functions. This implements aggregation functions over time as requested in https://github.com/prometheus/prometheus/issues/383. Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e	2014-11-25 17:02:00 +01:00
Julius Volz	1bb7074fec	Fix HTTP connection leak upon non-OK status. Change-Id: Ie7fbd7dcc089b8306b40631be3e3d736c23c1cd3	2014-11-25 17:02:00 +01:00
Brian Brazil	144d5bb9fd	Add 'tmpl', a 'template' for non-string literal names. Change-Id: I6a03a5c5d20029cf414562efa7745ed6c53b2731	2014-11-25 17:02:00 +01:00
Brian Brazil	f525ca5d9e	Let consoles get graph links from expressions. Rename ConsoleLinkFromExpression, as we now have consoles. Change-Id: I7ed2c9c83863adb390b51121dd9736845f7bcdfc	2014-11-25 17:01:59 +01:00
Brian Brazil	eba205fcac	Expose path used to get to console to console. Change-Id: I72386a2d4e53863da302ecc5c7e44d6c310197e0	2014-11-25 17:01:59 +01:00
Johannes 'fish' Ziemke	aed1d384a9	Build prometheus tools as well Change-Id: I49d5ca4d6ff715e8a6631caf052de309b91b0b1b	2014-11-25 17:01:59 +01:00
Brian Brazil	eb5d928da7	Fix console handler. This was accidentally broken in `2128d9d811`. Change-Id: I50ea1fdb8ae4d28ae4555410bee97e5037692aa5	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	bacc31d5cc	Remove work-around that required copying all bytes of a scrape. Now that the subtle bug in matttproud/golang_protobuf_extensions is fixed, we do not need to copy the bytes of a scrape into a buffer first before starting to parse it. Change-Id: Ib73ecae16173ddd219cda56388a8f853332f8853	2014-11-25 17:01:59 +01:00
Julius Volz	74de633a3a	Prometheus version 0.6.0. Change-Id: I50f6b69cca952eedf9a62b9a8f58e0fb633a83ed	2014-11-25 17:01:59 +01:00
Julius Volz	80b3d3bf34	Speed up disk flushes by removing unnecessary sort. The first sort in groupByFingerprint already ensures that all resulting sample lists contain only one fingerprint. We also already assume that all samples passed into AppendSamples (and thus groupByFingerprint) are chronologically sorted within each fingerprint. The extra chronological sort is thus superfluous. Furthermore, this second sort didn't only sort chronologically, but also compared all metric fingerprints again (although we already know that we're only sorting within samples for the same fingerprint). This caused a huge memory and runtime overhead. In a heavily loaded real Prometheus, this brought down disk flush times from ~9 minutes to ~1 minute. OLD: BenchmarkLevelDBAppendRepeatingValues 5 331391808 ns/op 44542953 B/op 597788 allocs/op BenchmarkLevelDBAppendsRepeatingValues 5 329893512 ns/op 46968288 B/op 3104373 allocs/op NEW: BenchmarkLevelDBAppendRepeatingValues 5 299298635 ns/op 43329497 B/op 567616 allocs/op BenchmarkLevelDBAppendsRepeatingValues 20 92204601 ns/op 1779454 B/op 70975 allocs/op Change-Id: Ie2d8db3569b0102a18010f9e106e391fda7f7883	2014-11-25 17:01:59 +01:00
Julius Volz	21cafe6cd7	Only evict memory series after they are on disk. This fixes the problem where samples become temporarily unavailable for queries while they are being flushed to disk. Although the entire flushing code could use some major refactoring, I'm explicitly trying to do the minimal change to fix the problem since there's a whole new storage implementation in the pipeline. Change-Id: I0f5393a30b88654c73567456aeaea62f8b3756d9	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	8956faeccb	Migrate to new client_golang. This change will only be submitted when the new client_golang has been moved to the new version. Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	814e479723	Treat non-200 HTTP response as error. Change-Id: I2a9f3b47012b3c4839be53aa44c66d16dd41a24a	2014-11-25 17:01:59 +01:00
Brian Brazil	e27447da5c	Remove the broken "User Dashboard" link. Due to the lack of a </a>, this makes the entire header render badly. Accordingly it's safe to assume no one is using it, so remove it. With the new console template support, we'll need to do something a bit more nuanced later. Change-Id: I3424bed6aea18cbd4c63ad48f98808098dadc3ad	2014-11-25 17:01:59 +01:00
Brian Brazil	2f76f434a5	Add humanizeDuration function. This attempts to reasonably handle things from weekly cronjobs, to rpcs taking ns to things that are usually ms but jump to over a second. For consistency, stop putting spaces before prefixes. Change-Id: I6407879187b25680b323cd70254e205315b5fc3c	2014-11-25 17:01:59 +01:00
Brian Brazil	960ede66dc	Use html/template for console templates and add template library support. Add a function to bypass the new auto-escaping. Add a function to work around Go's templates only allowing passing in one argument. Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99	2014-11-25 17:01:59 +01:00
Brian Brazil	0f5874ff97	Make Prometheus in header link to status page. This is consistent with alertmanager, and more intuitive for users. The graphs page just has graphs, so remove mention of consoles. Change-Id: I87780a4ade33697a6095423e1a7de47d341d2838	2014-11-25 17:01:59 +01:00
Brian Brazil	cd3592aebc	Add title and match functions. Change-Id: Ifd376c2935e22d378e7afa06122642847a237d78	2014-11-25 17:01:59 +01:00
Brian Brazil	1828b1f55c	Only log every query when debugging. Change-Id: I4f988d81cda6f6deb0ed7f497de4aa75409b158f	2014-11-25 17:01:59 +01:00
Brian Brazil	9b74324d9e	Add functions for regex replacement, sorting and humanizing. Change-Id: I471c7a8087cd5432b51afce811b591b11583a0c3	2014-11-25 17:01:59 +01:00
Julius Volz	00fd10e24f	Update GeneratorURL field name in notification tests. Change-Id: Ic4357999b6ebcf54008869a395e56d12a0ead211	2014-11-20 18:10:43 +01:00
Julius Volz	459f551259	Merge "Eliminate modal alerts in graphing UI."	2014-10-30 17:00:57 +01:00
Julius Volz	0da8b2add1	Make tabular view the default (vs. graphing view). Change-Id: I9f0961f2c474e8cce5e376ce4e20040644f89370	2014-10-30 16:38:25 +01:00
Julius Volz	921ebbf744	Eliminate modal alerts in graphing UI. This shows errors in a pane under the expression input instead. Change-Id: Iec209e1628a3b102cce9f34b2467621772dfb8ff	2014-10-30 16:18:05 +01:00
Julius Volz	2c4cab07b1	Fix acronym caps in GeneratorURL. Change-Id: Ib18c1f617dcde1039e848059545a6d8831d9bf66	2014-10-27 17:03:00 +01:00
Julius Volz	f1aac54104	Allow alternative "by"-clause position in grammar. In addition to the existing by-clause syntax: sum(<expression>) by (<labels>) [keeping_extra] ...this allows the following new syntax: sum by (<labels>) [keeping_extra] (<expression>) Both orderings may be used in a single expression. It is up to the users to establish guidelines around their usage. Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992	2014-10-22 11:57:20 +02:00
Brian Brazil	2aa8c8669e	Make query_range more robust. Gracefully handle decimal values, by truncating them. Limit amount of steps, to avoid accidentally pulling too much data. This limit returns up to ~500kB per timeseries, and allows for 60s granularity for a week and 1h granularity for a year. Change-Id: Ie549fc24deb2eecbc6c5d1b6088a548a6b02e849	2014-10-20 18:39:46 +01:00
Brian Brazil	50a995c8de	Don't alert() when a query is aborted, such as when you change the range. Change-Id: I574504f97446ac5f3dda737fe054ae83f17dbbc2	2014-10-15 15:38:09 +01:00
Julius Volz	080b952647	Allow omitting the metric name in queries. This allows the following expression syntaxes for selecting timeseries: foo (already valid before) foo{} (already valid before) {job="prometheus"} (new, select all timeseries for job "prometheus") Omitting both the metric name and any label matchers ("" or "{}") will still yield a syntax error. To get all timeseries, you could do: {__name__=~"."} or, without relying on knowledge about __metric__: {job=~"."} Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda	2014-10-14 17:43:37 +02:00
Brian Brazil	35fb5378bc	Add back consoles link. Goes in index.html in consoles or else user data, if present. Change-Id: I5303d30aa24ca0c20d2e0f49121e04a260b9c4f4	2014-10-02 15:44:47 +01:00
Andres Suarez	dba246e97a	Focus expression after selection from dropdown Change-Id: Id7f67e558e3611ab4c7188cc428c342d8d3e67db	2014-09-16 19:02:01 +02:00
Andres Suarez	76527bae8b	Allow selecting metric from Insert Metric Change-Id: I99e0539cab2749a8aeabc0a13015889ff45834f7	2014-09-16 19:01:14 +02:00
Bjoern Rabenstein	de337e6404	Cut v0.8.0. Change-Id: Ie8d49793e78f10bdeb7ebe19cc2dc729ff7ef590	2014-09-04 15:41:13 +02:00
Bjoern Rabenstein	943a939c29	Fix the accept header. A '/' is a separator and has to be in a quoted string. Change-Id: If7a3a847f84f8f709074d05dc98b5b21e954030c	2014-09-03 16:46:29 +02:00
Julius Volz	f739980dfe	Format changelog properly. Change-Id: I62c5bf8c5b880272d207da564a3fc45490c5db5e	2014-08-25 15:14:25 +02:00
Julius Volz	e995cda75c	Merge "Stagger scrapes to spread out load."	2014-08-20 18:13:19 +02:00
Brian Brazil	3b3ec604c3	Stagger scrapes to spread out load. Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35	2014-08-20 17:07:10 +01:00
Julius Volz	0ca5be127f	Prometheus version 0.7.0. Change-Id: I73468f72b43654f4bf57627c2f49fe802b18f637	2014-08-06 14:13:17 +02:00
Julius Volz	bfb64321de	Merge "Update used Go version to 1.3."	2014-08-06 13:10:09 +02:00
Julius Volz	ef3b512dcf	Update used Go version to 1.3. Go downloads moved to a different URL and require following redirects (curl's '-L' option) now. Go 1.3 deliberately randomizes ranges over maps, which uncovered some bugs in our tests. These are fixed too. Change-Id: Id2d9e185d8d2379a9b7b8ad5ba680024565d15f4	2014-08-06 12:51:53 +02:00
Julius Volz	b65c5dd752	Add function to drop common labels in a vector. This fixes https://github.com/prometheus/prometheus/issues/384. Change-Id: I2973c4baeb8a4618ec3875fb11c6fcf5d111784b	2014-08-05 20:43:52 +02:00
Julius Volz	f7cd18abdf	Add more topk() / bottomk() tests. Test what happens if k > number of input elements. Change-Id: Ie724b850939e297ebf085f0a5a3522e9cfcc6534	2014-08-05 20:14:04 +02:00
Julius Volz	200d02effe	Implement topk() and bottomk() functions. To achieve O(log n * k) runtime, this uses a heap to track the current bottom-k or top-k elements while iterating over the full set of available elements. It would be possible to reuse more code between topk and bottomk, but I decided for some more duplication for the sake of clarity. This fixes https://github.com/prometheus/prometheus/issues/399 Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336	2014-08-05 19:05:36 +02:00
Bjoern Rabenstein	24ece38f7c	Make metrics exported by the Prometheus server itself more consistent. - Always spell out the time unit (e.g. milliseconds instead of ms). - Remove "_total" from the names of metrics that are not counters. - Make use of the "Namespace" and "Subsystem" fields in the options. - Removed the "capacity" facet from all metrics about channels/queues. These are all fixed via command line flags and will never change during the runtime of a process. Also, they should not be part of the same metric family. I have added separate metrics for the capacity of queues as convenience. (They will never change and are only set once.) - I left "metric_disk_latency_microseconds" unchanged, although that metric measures the latency of the storage device, even if it is not a spinning disk. "SSD" is read by many as "solid state disk", so it's not too far off. (It should be "solid state drive", of course, but "metric_drive_latency_microseconds" is probably confusing.) - Brian suggested to not mix "failure" and "success" outcome in the same metric family (distinguished by labels). For now, I left it as it is. We are touching some bigger issue here, especially as other parts in the Prometheus ecosystem are following the same principle. We still need to come to terms here and then change things consistently everywhere. Change-Id: If799458b450d18f78500f05990301c12525197d3	2014-07-31 15:44:31 +02:00
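
The topk() / bottomk() commits above describe tracking the current top-k elements in a heap while iterating over all input elements. A minimal sketch of that general technique follows. It is not the code from the commit: the names `minHeap` and `topK` are hypothetical, and it works on plain float64 values rather than Prometheus vector samples. It keeps at most k values in a min-heap, so each heap operation costs O(log k), and it returns every element when k exceeds the input size, which is the case the extra tests cover.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap keeps the smallest retained value at the root, so it can be
// replaced cheaply when a larger value arrives. (Hypothetical type.)
type minHeap []float64

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest values in descending order.
// If k exceeds len(values), all values are returned.
func topK(values []float64, k int) []float64 {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, v := range values {
		if h.Len() < k {
			// Heap not full yet: always keep the value.
			heap.Push(h, v)
		} else if v > (*h)[0] {
			// v beats the smallest retained value: replace the root
			// and restore the heap property.
			(*h)[0] = v
			heap.Fix(h, 0)
		}
	}
	// Copy the retained values and sort them largest first.
	out := append([]float64(nil), *h...)
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	return out
}

func main() {
	fmt.Println(topK([]float64{3, 1, 4, 1, 5}, 2)) // prints [5 4]
	fmt.Println(topK([]float64{2, 7}, 5))          // k > n: prints [7 2]
}
```

A bottomk counterpart would flip the comparisons (a max-heap whose root is replaced by smaller values). The commit message notes that the real implementation deliberately duplicates some code between the two for clarity.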

7777 commits