prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-25 13:44:05 -08:00

Author	SHA1	Message	Date
Fabian Reinartz	fa1e90003b	Query timeout added. This is related to #454. Queries now time out after a duration set by the -query.timeout flag. The TotalEvalTimer is now started/stopped inside any of the ast.Eval* functions.	2015-02-03 08:04:27 +01:00
Bjoern Rabenstein	26e22e6ad6	Fix rule manager shutdown.	2015-01-29 15:05:10 +01:00
Julius Volz	d4374a9265	More efficient JSON query result format. This depends on https://github.com/prometheus/client_golang/pull/51. For vectors, the result format looks like this: ```json { "version": 1, "type" : "vector", "value" : [ { "timestamp" : 1421765411.045, "value" : "65.475000", "metric" : { "quantile" : "0.5", "instance" : "http://localhost:9090/metrics", "job" : "prometheus", "__name__" : "http_request_duration_microseconds", "handler" : "/static/", "method" : "get", "code" : "304" } }, { "timestamp" : 1421765411.045, "value" : "5826.339000", "metric" : { "quantile" : "0.9", "instance" : "http://localhost:9090/metrics", "job" : "prometheus", "__name__" : "http_request_duration_microseconds", "handler" : "prometheus", "method" : "get", "code" : "200" } }, /* ... */ ] } ``` For matrices, it looks like this: ```json { "version": 1, "type" : "matrix", "value" : [ { "metric" : { "quantile" : "0.99", "instance" : "http://localhost:9090/metrics", "job" : "prometheus", "__name__" : "http_request_duration_microseconds", "handler" : "/static/", "method" : "get", "code" : "200" }, "values" : [ [ 1421765547.659, "29162.953000" ], [ 1421765548.659, "29162.953000" ], [ 1421765549.659, "29162.953000" ], /* ... */ ] } ] } ```	2015-01-26 13:06:22 +01:00
Brian Brazil	a31730e88b	Make 2nd arg to delta optional. Add a deriv() function. The 2nd isCounter argument to delta is ugly, make it optional as the first step of deprecating it. This will make delta only ever apply to gauges. Add a deriv function to calculate the least squares slope of a gauge. This is more useful for prediction than delta, as it isn't as heavily influenced by outliers at the boundaries.	2015-01-23 14:50:27 +00:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Bjoern Rabenstein	b09453af1d	Adjust to new client_golang API.	2015-01-21 15:42:25 +01:00
Julius Volz	bb1e49383e	Log rule evaluation errors.	2015-01-08 17:50:55 +01:00
Julius Volz	d6b9e97655	Remove extraction.Result type, simplify code.	2015-01-08 16:34:01 +01:00
Julius Volz	9a4ca68a61	Add metrics for rule evaluation failures. Fixes https://github.com/prometheus/prometheus/issues/417	2015-01-08 16:33:35 +01:00
Brian Brazil	ffa2e73803	Fix regression from `5e8d57bec1` 0 is a false value, so shortcutting no longer works. Update other places in the code that assumed graph was the default.	2014-12-27 00:28:36 +00:00
Julius Volz	cc27fb8aab	Rename remaining all-caps constants in AST layer. Change-Id: Ibe97e30981969056ffcdb89e63c1468ea1ffa140	2014-12-25 01:30:47 +01:00
Julius Volz	895523ad14	Include necessary Makefile.INCLUDE from rules/Makefile. Change-Id: I077d018dbe4093cd40ddf38d66a996df222bf5e4	2014-12-25 01:13:59 +01:00
Julius Volz	2ade9d40cf	Clarify why we need int constants for expression types. Change-Id: I053fc5d32c118dbdb204dc8193337f981aff796e	2014-12-25 00:45:30 +01:00
Julius Volz	00a2a93a05	Add regression tests for metrics mutations in AST. It turned out in the end, that only drop_common_metrics() produced any erroneous output in the old system. The second expression in the test ("sum(testmetric) keeping_extra") already worked in the old code, but why not keep it in... The way to test ranged evaluations is a bit clumsy so far, so I want to build a nicer test framework in the end, where all the test cases can be specified as text files which specify desired inputs, outputs, query step widths, etc. Change-Id: I821859789e69b8232bededf670a1b76e9e8c8ca4	2014-12-12 20:34:55 +01:00
Julius Volz	c9618d11e8	Introduce copy-on-write for metrics in AST. This depends on changes in: https://github.com/prometheus/client_golang/tree/cow-metrics. Change-Id: I80b94833a60ddf954c7cd92fd2cfbebd8dd46142	2014-12-12 20:34:55 +01:00
Bjoern Rabenstein	b1e4956142	Apply a giant code cleanup. Essentially: - Remove unused code. - Make it 'go vet' clean. The only remaining warnings are in generated code. - Make it 'golint' clean. The only remaining warnings are in generated code. - Smoothed out some minor things. Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6	2014-12-10 16:16:49 +01:00
Bjoern Rabenstein	fee88a7a77	Remove the remaining races, new and old. Also, resolve a few other TODOs. Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691	2014-12-03 18:07:23 +01:00
Bjoern Rabenstein	7d11019aa2	Squash a few trivial TODOs. - Delete unneeded file view_adapter.go. - Assessed that we still need the fingerprints in nodes (to create iterators). - Turned numMemChunkDescs into a metric. Change-Id: I29be963c795a075ec00c095f76bf26405535609d	2014-11-27 18:26:06 +01:00
Julius Volz	6eecee55b7	Fix acronym caps in GeneratorURL. Change-Id: Ib18c1f617dcde1039e848059545a6d8831d9bf66	2014-11-25 17:13:04 +01:00
Bjoern Rabenstein	0ae1d8889a	Fix tests after merge. Change-Id: Ia90da9a3e48ed780ec38c4a6a1fd9ea34e7f6a58	2014-11-25 17:13:04 +01:00
Julius Volz	b7bf11230a	Add absent() function. A common problem in Prometheus alerting is to detect when no timeseries exist for a given metric name and label combination. Unfortunately, Prometheus alert expressions need to be of vector type, and "count(nonexistent_metric)" results in an empty vector, yielding no output vector elements to base an alert on. The newly introduced absent() function solves this issue: ALERT FooAbsent IF absent(foo{job="myjob"}) [...] absent() has the following behavior: - if the vector passed to it has any elements, it returns an empty vector. - if the vector passed to it has no elements, it returns a 1-element vector with the value 1. In the second case, absent() tries to be smart about deriving labels of the 1-element output vector from the input vector: absent(nonexistent{job="myjob"}) => {job="myjob"} absent(nonexistent{job="myjob",instance=~".*"}) => {job="myjob"} absent(sum(nonexistent{job="myjob"})) => {} That is, if the passed vector is a literal vector selector, it takes all "=" label matchers as the basis for the output labels, but ignores all non-equals or regex matchers. Also, if the passed vector results from a non-selector expression, no labels can be derived. Change-Id: I948505a1488d50265ab5692a3286bd7c8c70cd78	2014-11-25 17:13:04 +01:00
Julius Volz	3d47f94149	Drop metric names after transformations. After many transformations, it doesn't make sense to keep the metric names, since the result of the transformation is no longer that metric. This drops the metric name after such transformations and makes the web UI deal well with missing metric names. This depends on the current branch on the following things: - prometheus/client_golang needs to be at `e237cf15c6` in branch "julius/int-fingerprints" (to be merged with new storage) - prometheus/promdash needs to be at `dd7691c9c2` Change-Id: Ib3c8cad8d647d9854e8c653c424b8c235ccc231d	2014-11-25 17:13:04 +01:00
Bjoern Rabenstein	14bda4180c	Changes after pair code review. Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455	2014-11-25 17:12:59 +01:00
Bjoern Rabenstein	006b5517e2	Simplify makefiles. This removes the dependency on C leveldb and snappy. It also takes care of fewer dependencies as they would anyway not work on any non-Debian, non-Brew system. Change-Id: Ia70dce1ba8a816a003587927e0b3a3f8ad2fd28c	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	74c143c4c9	Improve scraper shutdown time. - Stop target pools in parallel. - Stop individual scrapers in goroutines, too. - Timing tweaks. Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696	2014-11-25 17:10:39 +01:00
Julius Volz	0712d738d1	Allow alternative "by"-clause position in grammar. In addition to the existing by-clause syntax: sum(<expression>) by (<labels>) [keeping_extra] ...this allows the following new syntax: sum by (<labels>) [keeping_extra] (<expression>) Both orderings may be used in a single expression. It is up to the users to establish guidelines around their usage. Change-Id: Iba10c9cc5fb6ac62edfcf246d281473e82467992	2014-11-25 17:09:04 +01:00
Julius Volz	0e48c18bbf	Allow omitting the metric name in queries. This allows the following expression syntaxes for selecting timeseries: foo (already valid before) foo{} (already valid before) {job="prometheus"} (new, select all timeseries for job "prometheus") Omitting both the metric name and any label matchers ("" or "{}") will still yield a syntax error. To get all timeseries, you could do: {__name__=~".*"} or, without relying on knowledge about __metric__: {job=~".*"} Change-Id: Ifee000b9ac0184ef6ced18411069c7f2699a2dda	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	096fa0f8b2	Squash a number of TODOs. - Staleness delta is now a proper function parameter and not replicated from package ast. - Named type 'chunks' replaced by explicit '[]chunk' to avoid confusion. - For the same reason, replaced 'chunkDescs' by '[]*chunkDescs'. - Verified that math.Modf is not a speed enhancement over conversion (actually 5x slower). - Renamed firstTimeField, lastTimeField into chunkFirstTime and chunkLastTime. - Verified unpin() is sufficiently goroutine-safe. - Decided not to update archivedFingerprintToTimeRange upon series truncation and added a rationale why. Change-Id: I863b8d785e5ad9f71eb63e229845eacf1bed8534	2014-11-25 17:09:04 +01:00
Bjoern Rabenstein	b3ed9aa7a2	Clean up start-up and shut-down. Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	38fc24d0ed	Fix targetpool_test.go and other tests. Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624	2014-11-25 17:08:26 +01:00
Julius Volz	7f5d3c2c29	Fix and improve the fp locker. Benchmark: $ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4 OLD BenchmarkFingerprintLockerParallel 500000 3618 ns/op BenchmarkFingerprintLockerParallel-2 100000 12257 ns/op BenchmarkFingerprintLockerParallel-4 500000 10164 ns/op BenchmarkFingerprintLockerSerial 10000000 283 ns/op BenchmarkFingerprintLockerSerial-2 10000000 284 ns/op BenchmarkFingerprintLockerSerial-4 10000000 288 ns/op NEW BenchmarkFingerprintLockerParallel 1000000 1018 ns/op BenchmarkFingerprintLockerParallel-2 1000000 1164 ns/op BenchmarkFingerprintLockerParallel-4 2000000 910 ns/op BenchmarkFingerprintLockerSerial 50000000 56.0 ns/op BenchmarkFingerprintLockerSerial-2 50000000 47.9 ns/op BenchmarkFingerprintLockerSerial-4 50000000 54.5 ns/op Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581	2014-11-25 17:07:45 +01:00
Julius Volz	358f97791d	Minor cleanups. Change-Id: Ia8685d8439a421fe2143d9ec7120d5bb5ab88d78	2014-11-25 17:07:44 +01:00
Bjoern Rabenstein	f5f9f3514a	Major code cleanup. - Make it go-vet and golint clean. - Add comments, TODOs, etc. Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2	2014-11-25 17:02:53 +01:00
Julius Volz	e7ed39c9a6	Initial experimental snapshot of next-gen storage. Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d	2014-11-25 17:02:00 +01:00
Julius Volz	85497e3f38	Add function to drop common labels in a vector. This fixes https://github.com/prometheus/prometheus/issues/384. Change-Id: I2973c4baeb8a4618ec3875fb11c6fcf5d111784b	2014-11-25 17:02:00 +01:00
Julius Volz	3fdb74e571	Add more topk() / bottomk() tests. Test what happens if k > number of input elements. Change-Id: Ie724b850939e297ebf085f0a5a3522e9cfcc6534	2014-11-25 17:02:00 +01:00
Julius Volz	c582ae73c2	Implement topk() and bottomk() functions. To achieve O(log n * k) runtime, this uses a heap to track the current bottom-k or top-k elements while iterating over the full set of available elements. It would be possible to reuse more code between topk and bottomk, but I decided for some more duplication for the sake of clarity. This fixes https://github.com/prometheus/prometheus/issues/399 Change-Id: I7487ddaadbe7acb22ca2cf2283ba6e7915f2b336	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	1909686789	Make metrics exported by the Prometheus server itself more consistent. - Always spell out the time unit (e.g. milliseconds instead of ms). - Remove "_total" from the names of metrics that are not counters. - Make use of the "Namespace" and "Subsystem" fields in the options. - Removed the "capacity" facet from all metrics about channels/queues. These are all fixed via command line flags and will never change during the runtime of a process. Also, they should not be part of the same metric family. I have added separate metrics for the capacity of queues as convenience. (They will never change and are only set once.) - I left "metric_disk_latency_microseconds" unchanged, although that metric measures the latency of the storage device, even if it is not a spinning disk. "SSD" is read by many as "solid state disk", so it's not too far off. (It should be "solid state drive", of course, but "metric_drive_latency_microseconds" is probably confusing.) - Brian suggested to not mix "failure" and "success" outcome in the same metric family (distinguished by labels). For now, I left it as it is. We are touching some bigger issue here, especially as other parts in the Prometheus ecosystem are following the same principle. We still need to come to terms here and then change things consistently everywhere. Change-Id: If799458b450d18f78500f05990301c12525197d3	2014-11-25 17:02:00 +01:00
Julius Volz	00b9489f1c	Fix time() behavior. time() should return the timestamp for which the query is executed, not the actual current time. Change-Id: I430a45cabad7785cd58f95b1028a71dff4c87710	2014-11-25 17:02:00 +01:00
Julius Volz	c5984f1818	Add abs() and over-time aggregation functions. This implements aggregation functions over time as requested in https://github.com/prometheus/prometheus/issues/383. Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e	2014-11-25 17:02:00 +01:00
Brian Brazil	f525ca5d9e	Let consoles get graph links from expressions. Rename ConsoleLinkFromExpression, as we now have consoles. Change-Id: I7ed2c9c83863adb390b51121dd9736845f7bcdfc	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	8956faeccb	Migrate to new client_golang. This change will only be submitted when the new client_golang has been moved to the new version. Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581	2014-11-25 17:01:59 +01:00
Brian Brazil	960ede66dc	Use html/template for console templates and add template library support. Add a function to bypass the new auto-escaping. Add a function to work around Go's templates only allowing passing in one argument. Change-Id: Id7aa3f95e7c227692dc22108388b1d9b1e2eec99	2014-11-25 17:01:59 +01:00
Brian Brazil	e041c0cd46	Add console and alert templates with access to all data. Move rulemanager to its own package to break circular dependency. Make NewTestTieredStorage available to tests, remove duplication. Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03	2014-05-30 16:24:56 +01:00
Bjoern Rabenstein	ca6a4fccef	Weed out our homegrown test.Tester. The Go stdlib has testing.TB now, which fulfills the exact same purpose. Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c	2014-05-21 19:27:24 +02:00
Julius Volz	6297a405f2	Do not indent API JSON responses. In one example response, this reduced the uncompressed size by 25% and the gzipped size by 11%. Change-Id: Ie80d44253124b9f8601b8ef9fc978e92dacff523	2014-04-22 15:16:37 +02:00
Julius Volz	01f652cb4c	Separate storage implementation from interfaces. This was initially motivated by wanting to distribute the rule checker tool under `tools/rule_checker`. However, this was not possible without also distributing the LevelDB dynamic libraries because the tool transitively depended on Levigo: rule checker -> query layer -> tiered storage layer -> leveldb This change separates external storage interfaces from the implementation (tiered storage, leveldb storage, memory storage) by putting them into separate packages: - storage/metric: public, implementation-agnostic interfaces - storage/metric/tiered: tiered storage implementation, including memory and LevelDB storage. I initially also considered splitting up the implementation into separate packages for tiered storage, memory storage, and LevelDB storage, but these are currently so intertwined that it would be another major project in itself. The query layers and most other parts of Prometheus now have no notion of the storage implementation anymore and just use whatever implementation they get passed in via interfaces. The rule_checker is now a static binary :) Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0	2014-04-16 13:30:19 +02:00
Julius Volz	d411a7d810	Allow reversing vector and scalar arguments in binops. This allows putting a scalar as the first argument of a binary operator in which the second argument is a vector: <scalar> <binop> <vector> For example, 1 / http_requests_total ...will output a vector in which every sample value is 1 divided by the respective input vector element. This even works for filter binary operators now: 1 == http_requests_total Returns a vector with all values set to 1 for every element in http_requests_total whose initial value was 1. Note: For filter binary operators, the resulting values are always taken from the left-hand-side of the operation, no matter whether the scalar or the vector argument is the left-hand-side. That is, 1 != http_requests_total ...will set all result vector sample values to 1, although these are exactly the sample elements that were != 1 in the input vector. If you want to just filter elements without changing their sample values, you still need to do: http_requests_total != 1 The new filter form is a bit exotic, and so probably won't be used often. But it was easier to implement it than disallow it completely or change its behavior. Change-Id: Idd083f2bd3a1219ba1560cf4ace42f5b82e797a5	2014-04-08 17:16:18 +02:00
Julius Volz	c7c0b33d0b	Add regex-matching support for labels. There are four label-matching ops for selecting timeseries now: - Equal: = - NotEqual: != - RegexMatch: =~ - RegexNoMatch: !~ Instead of looking up labels by a simple clientmodel.LabelSet (basically an equals op for every key/value pair in the set), timeseries fingerprint selection is now done via a list of metric.LabelMatchers. Change-Id: I510a83f761198e80946146770ebb64e4abc3bb96	2014-04-01 14:24:53 +02:00
Bjoern Rabenstein	0a65b691cc	Disallow ":" in identifiers, but still allow it in metric names. Change-Id: Iace925ab1b71a360bd63357e87f68e727f7afbcb	2014-03-21 13:44:37 +01:00
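
Commit c582ae73c2 above describes topk() and bottomk() as keeping a heap of the current top-k (or bottom-k) elements while iterating over the full input. The following is a minimal sketch of that idea, not the code from the commit: the `sample` type, the `minHeap` type, and the `topK` helper are hypothetical names chosen for illustration, and only the standard `container/heap` and `sort` packages are used.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// sample is a hypothetical stand-in for one vector element.
type sample struct {
	Name  string
	Value float64
}

// minHeap keeps the smallest value at the root, so the weakest of the
// current top-k candidates can be replaced cheaply.
type minHeap []sample

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Value < h[j].Value }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(sample)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest samples in descending order. If k exceeds
// the number of inputs, all inputs are returned.
func topK(k int, in []sample) []sample {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, s := range in {
		if h.Len() < k {
			heap.Push(h, s)
			continue
		}
		// Replace the current minimum only if s beats it.
		if s.Value > (*h)[0].Value {
			(*h)[0] = s
			heap.Fix(h, 0)
		}
	}
	out := make([]sample, h.Len())
	copy(out, *h)
	sort.Slice(out, func(i, j int) bool { return out[i].Value > out[j].Value })
	return out
}

func main() {
	in := []sample{{"a", 3}, {"b", 9}, {"c", 1}, {"d", 7}}
	fmt.Println(topK(2, in))
	fmt.Println(topK(10, in)) // k larger than the input, as tested in 3fdb74e571
}
```

A bottomk() variant would flip the comparison in `Less` and in the replacement check. The heap never grows beyond k elements, so each input element costs at most one heap operation on a heap of size k.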

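Commit a31730e88b above adds deriv() to calculate the least squares slope of a gauge. As a rough illustration of what a least squares slope over samples looks like, here is a sketch under assumptions: the `point` type and the `slope` function are hypothetical, timestamps are taken to be in seconds, and this is ordinary simple linear regression rather than the commit's actual code.

```go
package main

import "fmt"

// point is a hypothetical (timestamp, value) pair; T is in seconds.
type point struct {
	T float64
	V float64
}

// slope fits v = a + b*t by ordinary least squares and returns b.
// The second result is false when no slope is defined (fewer than two
// points, or all timestamps identical).
func slope(pts []point) (float64, bool) {
	if len(pts) < 2 {
		return 0, false
	}
	n := float64(len(pts))
	var sumT, sumV, sumTV, sumTT float64
	for _, p := range pts {
		// Shift timestamps to start at zero to limit floating-point
		// cancellation with large Unix timestamps.
		t := p.T - pts[0].T
		sumT += t
		sumV += p.V
		sumTV += t * p.V
		sumTT += t * t
	}
	denom := n*sumTT - sumT*sumT
	if denom == 0 {
		return 0, false
	}
	return (n*sumTV - sumT*sumV) / denom, true
}

func main() {
	pts := []point{{1000, 1}, {1001, 3}, {1002, 5}}
	b, ok := slope(pts)
	fmt.Println(b, ok)
}
```

Because every sample contributes to the fit, a single outlier at either end of the range shifts the result less than it would in a first-versus-last difference, which matches the motivation given in the commit message.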
176 commits
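
Commit b7bf11230a above spells out the behavior of absent(): an empty result for a non-empty input vector, and a 1-element vector with value 1 otherwise, with labels derived only from the "=" matchers of a literal vector selector. The sketch below restates those rules in Go. All types and names (`matcher`, `element`, `absentLabels`, `absent`) are hypothetical, and it assumes the metric name reaches the function as an equality matcher on `__name__`, which is skipped so that the output matches the examples in the commit message.

```go
package main

import "fmt"

// matchType mirrors the four label-matching ops listed in c7c0b33d0b.
type matchType int

const (
	Equal matchType = iota
	NotEqual
	RegexMatch
	RegexNoMatch
)

// matcher is a hypothetical label matcher from a vector selector.
type matcher struct {
	Type  matchType
	Name  string
	Value string
}

// element is a hypothetical vector element.
type element struct {
	Labels map[string]string
	Value  float64
}

// absentLabels keeps only "=" matchers; non-equals and regex matchers
// are ignored. The metric name is skipped (assumption, see lead-in).
func absentLabels(selector []matcher) map[string]string {
	labels := map[string]string{}
	for _, m := range selector {
		if m.Type == Equal && m.Name != "__name__" {
			labels[m.Name] = m.Value
		}
	}
	return labels
}

// absent returns an empty vector if input has any elements, else a
// 1-element vector with value 1. Pass a nil selector when the argument
// is not a literal vector selector, so no labels are derived.
func absent(input []element, selector []matcher) []element {
	if len(input) > 0 {
		return []element{}
	}
	return []element{{Labels: absentLabels(selector), Value: 1}}
}

func main() {
	sel := []matcher{
		{Equal, "__name__", "nonexistent"},
		{Equal, "job", "myjob"},
		{RegexMatch, "instance", ".*"},
	}
	fmt.Println(absent(nil, sel)) // labels: job="myjob" only
	fmt.Println(absent(nil, nil)) // non-selector expression: no labels
}
```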