prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 17:44:06 -08:00

Author	SHA1	Message	Date
Florian Pfitzer	1fa0b0f253	fix consul port label	2015-07-31 16:20:17 +00:00
Brian Brazil	adf7f16d1a	Merge pull request #934 from prometheus/query-params Retrieval: Make it possible to relabel query params	2015-07-31 11:01:45 +01:00
Brian Brazil	d8875d17d8	Retrieval: Make it possible to relabel query params This only allows relabelling the first value for a given parameter, this should be sufficient in practice.	2015-07-31 10:09:28 +01:00
Johannes 'fish' Ziemke	6e7d743cd4	Merge pull request #946 from prometheus/add-sd-dns-a Add support for A record based DNS SD	2015-07-30 16:01:47 +02:00
Johannes 'fish' Ziemke	9ab340e95e	Add support for A record based DNS SD If using A records, the user needs to specify "port" and set "type" to "A".	2015-07-30 15:55:38 +02:00
beorn7	645f6772e5	Add Consul Address, ServicePort, and ServiceAddress to the meta labels. In setups where the ServiceAddress is the relevant address for scraping, users can relabel the `__address__` label to ServiceAddress + ":" + ServicePort. This needs to be documented, of course. Will do once this is LGTM'd.	2015-07-22 18:19:13 +02:00
Julius Volz	9d98910fca	Revert "Use Consul ServiceAddress instead of Address when set" This reverts commit `0ac7e7217e`. See discussion on https://github.com/prometheus/prometheus/pull/812 for reasoning. While fixing one use case, it breaks others, and we need a more generic way of handling this.	2015-07-22 13:04:29 +02:00
Fabian Reinartz	d53cc7935d	retrieval: avoid race conditions	2015-07-08 21:27:52 +02:00
Brian Brazil	3d268d681e	retrieval: Handle serverset node not existing. This stops configuration loading hanging if the Znode doesn't exist, and retries until the node does exist.	2015-07-01 13:56:31 +01:00
Fabian Reinartz	080e067601	Merge pull request #832 from prometheus/fabxc/target-test retrieval: double timeout in target scrape test.	2015-06-25 17:23:52 +02:00
Brian Brazil	52859b8033	Merge pull request #836 from prometheus/shard Add 'hashmod' relabel action.	2015-06-24 21:40:10 +01:00
Brian Brazil	682f949ab1	Add 'hashmod' relabel action. This takes the modulus of a hash of some labels. Combined with a keep relabel action, this allows for sharding of targets across multiple prometheus servers.	2015-06-24 21:14:53 +01:00
Fabian Reinartz	23862c92c4	retrieval/discovery: refresh services in Consul to recover from missing events.	2015-06-24 17:48:27 +02:00
Fabian Reinartz	c292979374	retrieval: double timeout in target scrape test.	2015-06-23 21:59:55 +02:00
Julius Volz	d868264bb8	Improve UI of /alerts page. Changes to the UI: - "Active Since" timestamps are now human-readable. - Alerting rules are now pretty-printed better. - Labels are no longer just strings, but alert bubbles (like we do on the status page for base labels). - Alert states and target health states are now capitalized in the presentation layer rather than at the source.	2015-06-23 18:48:45 +02:00
Fabian Reinartz	53b9d5917d	web: improve target URL handling and display.	2015-06-23 13:45:15 +02:00
Fabian Reinartz	dc7d27ab9a	retrieval: add honor label handling and parametrized querying. This commit adds the honor_labels and params arguments to the scrape config. This allows to specify query parameters used by the scrapers and handling scraped labels with precedence.	2015-06-23 13:45:14 +02:00
Fabian Reinartz	459d18cf18	Merge pull request #812 from Marmelatze/consul_services Use Consul ServiceAddress instead of Address when set	2015-06-17 20:10:52 +02:00
Florian Pfitzer	0ac7e7217e	Use Consul ServiceAddress instead of Address when set	2015-06-17 15:39:42 +02:00
Brian Brazil	4d895242f9	Add support for Zookeeper Serversets for SD. It can discover an entire tree of serversets, or just one.	2015-06-16 11:02:08 +01:00
Brian Brazil	0dbae36d36	Allow ingested metrics to be relabeled. The main purpose of this is to allow for blacklisting of expensive metrics as a tactical option. It could also find uses for renaming and removing labels from federation.	2015-06-13 15:18:27 +01:00
Brian Brazil	58ceae82bc	Revert "Allow ingested metrics to be relabeled." This reverts commit `f2f26ca08f`. Was accidentally pushed to master instead of a branch for PR.	2015-06-12 22:12:26 +01:00
Brian Brazil	f2f26ca08f	Allow ingested metrics to be relabeled. The main purpose of this is to allow for blacklisting of expensive metrics as a tactical option. It could also find uses for renaming and removing labels from federation.	2015-06-12 22:06:30 +01:00
Fabian Reinartz	b5fe2e9afe	Merge pull request #773 from prometheus/fabxc/simple-cfg config: simplify default config handling.	2015-06-08 16:22:06 +02:00
Brian Brazil	b8b1d3cbac	Web: Add pre-relabel labels to status page. Figuring out what's going on with the new service discovery and labels is difficult. Add a popover with the labels to the target table to make things simpler, and help discovery of potentially useful labels.	2015-06-08 12:19:01 +01:00
Fabian Reinartz	0af1cff8af	config: simplify default config handling.	2015-06-06 09:04:04 +02:00
Fabian Reinartz	8214b4ee78	retrieval/discovery: surround __meta_consul_tags value with tag separators.	2015-06-05 19:18:34 +02:00
Fabian Reinartz	280d11dca8	main: exit on invalid rule files on startup.	2015-06-02 18:44:41 +02:00
Fabian Reinartz	0de6edbdfc	Move pkg/ to util/	2015-06-01 21:12:32 +02:00
Fabian Reinartz	dfaf31a1da	Move web/httputils to pkg/httputil and add DeadlineClient to it	2015-06-01 21:12:31 +02:00
Fabian Reinartz	a4f179230a	Merge pull request #744 from prometheus/fabxc/fix-labels Fix discarding of labels in file target groups	2015-05-27 19:57:15 +02:00
Fabian Reinartz	e9b344abee	Fix discarding of labels in file target groups	2015-05-27 18:52:44 +02:00
Fabian Reinartz	8b7e5f9184	Stop holding TargetManager lock when stopping components. TargetProviders may flush some last changes to the target manager before actually stopping. To properly read those from the channel the target manager must not be locked while stopping a provider.	2015-05-27 12:41:37 +02:00
Brian Brazil	f34de493d5	Add increase() function, to replace delta(..., 1). This calculates how much a counter increases over a given period of time, which is the area under the curve of its rate. increase(x[5m]) is equivalent to rate(x[5m]) * 300.	2015-05-26 22:49:21 +01:00
Fabian Reinartz	efb39cfd4e	Fix file SD test	2015-05-23 21:20:39 +02:00
Julius Volz	267fd34156	Switch Prometheus to use github.com/prometheus/log. This change is conceptually very simple, although the diff is large. It switches logging from "github.com/golang/glog" to "github.com/prometheus/log", while not actually changing any log messages. V(1)-style logging has been changed to be log.Debug*().	2015-05-20 18:19:32 +02:00
Fabian Reinartz	7143dff02f	Add initial implementation for SD via Consul. This commit adds service discovery using Consul's HTTP API and watches (long polling) to retrieve target updates.	2015-05-20 11:46:24 +02:00
Fabian Reinartz	b0c181dc0d	Add Consul SD configuration.	2015-05-20 11:46:24 +02:00
Fabian Reinartz	ff832d2e03	Attach __meta_filepath label to file SD targets.	2015-05-19 15:49:38 +02:00
Fabian Reinartz	8de50619f1	Increase target test wait times On slow systems such as Travis CI occasionally the tests fail because the wait times are too short.	2015-05-19 12:06:52 +02:00
Fabian Reinartz	385919a65a	Avoid inter-component blocking if ingestion/scraping blocks. Appending to the storage can block for a long time. Timing out scrapes can also cause longer blocks. This commit keeps those blocks from affecting components other than the target itself. Also the Target interface was removed.	2015-05-18 17:58:51 +02:00
Fabian Reinartz	1a2d57b45c	Move template functionality out of target. The target implementation and interface contain methods only serving a specific purpose of the templates. They were moved to the template as they operate on more fundamental target data.	2015-05-18 13:35:43 +02:00
Fabian Reinartz	dbc08d390e	Move target status data into its own object	2015-05-18 11:15:42 +02:00
Fabian Reinartz	9ca47869ed	Provide full SD configs to discovery constructors. Some SD configs may have many options. To be readable and consistent, make all discovery constructors receive the full config rather than the separate arguments.	2015-05-15 14:54:29 +02:00
Fabian Reinartz	93548a8882	Add initial file based service discovery. This commit adds file based service discovery which reads target groups from specified files. It detects changes based on file watches and regular refreshes.	2015-05-15 14:44:54 +02:00
Fabian Reinartz	d5aa012fd0	Make HTTP basic auth configurable for scrape targets.	2015-05-15 12:47:50 +02:00
Fabian Reinartz	bb540fd9fd	Implement config reloading on SIGHUP. With this commit, sending SIGHUP to the Prometheus process will reload and apply the configuration file. The different components attempt to handle failing changes gracefully.	2015-05-13 16:49:46 +02:00
Fabian Reinartz	86087120dd	Replace example config with new YAML format.	2015-05-11 18:14:07 +02:00
Fabian Reinartz	5fbde88919	Switch config to YAML format.	2015-05-07 16:52:14 +02:00
Fabian Reinartz	b5a8f7b8fa	Cleanup, test, and document config.	2015-04-30 21:17:19 +02:00
Fabian Reinartz	945c49a2dd	Add relabelling to target management. This commit adds a relabelling stage on the set of base labels from which a target is created. It allows to drop targets and rewrite any regular or internal label.	2015-04-30 18:46:33 +02:00
Fabian Reinartz	0b619b46d6	Change JobConfig to ScrapeConfig. This commit changes the configuration interface from job configs to scrape configs. This includes allowing multiple ways of target definition at once and moving DNS SD to its own config message. DNS SD can now contain multiple DNS names per configured discovery.	2015-04-28 23:18:55 +02:00
Fabian Reinartz	5015c2a0e8	Make target manager source based. This commit shifts responsibility for maintaining targets from providers and pools to the target manager. Target groups have a source name that identifies them for updates.	2015-04-24 15:49:35 +02:00
Fabian Reinartz	4f8673aa88	Simplify update sync for targets, format config fixtures.	2015-04-19 10:36:26 +02:00
Fabian Reinartz	36184f3530	Show correct error on wrong DNS response.	2015-04-11 16:14:38 +02:00
beorn7	fa1935a644	Remove /api/targets call and do not show job and instance labels on status. /api/targets was undocumented and never used and also broken. Showing instance and job labels on the status page (next to targets) does not make sense as those labels are set in an obvious way. Also add a doc comment to TargetStateToClass.	2015-03-18 18:53:43 +01:00
beorn7	be11cb2b07	Remove the sample ingestion channel. The one central sample ingestion channel has caused a variety of trouble. This commit removes it. Targets and rule evaluation call an Append method directly now. To incorporate multiple storage backends (like OpenTSDB), storage.Tee forks the Append into two different appenders. Note that the tsdb queue manager had its own queue anyway. It was a queue after a queue... Much queue, so overhead... Targets have their own little buffer (implemented as a channel) to avoid stalling during an http scrape. But a new scrape will only be started once the old one is fully ingested. The contraption of three pipelined ingesters was removed. A Target is an ingester itself now. Despite more logic in Target, things should be less confusing now. Also, remove lint and vet warnings in ast.go.	2015-03-15 14:08:22 +01:00
Julius Volz	140eede5e0	Rename UNREACHABLE to UNHEALTHY. The current wording suggests that a target is not reachable at all, although it might also get set when the target was reachable, but there was some other error during the scrape (invalid headers or invalid scrape content). UNHEALTHY is a more general wording that includes all these cases. For consistency, ALIVE is also renamed to HEALTHY.	2015-03-07 23:18:18 +01:00
Sergiusz 'q3k' Bazański	0d0bb3c030	Change instance identifiers to be host:port This changes the PublicURL function into InstanceIdentifier, which now returns a simple <host>:<port> string instead of a full URL.	2015-02-20 16:21:13 +01:00
Sergiusz 'q3k' Bazański	bb69a3d284	Hide HTTP auth parts from URL This introduces an extra function in the Target interface (PublicURL) which is used to populate the instance field in scraped metrics.	2015-02-19 18:58:47 +01:00
Julius Volz	af627bb2b9	Copy vendored deps manually instead of using Godeps. We were using Godep incorrectly (cloning repos from the internet during build time instead of including Godeps/_workspace in the GOPATH via "godep go"). However, to avoid even having to fetch "godeps" from the internet during build, this now just copies the vendored files into the GOPATH. Also, the protocol buffer library moved from Google Code to GitHub, which is reflected in these updates. This fixes https://github.com/prometheus/prometheus/issues/525	2015-02-17 02:08:56 +01:00
beorn7	11b3c2387c	Improvements after review. - Increase samplesQueueCapacity. - Improve docstring for the above. - Accept a short waiting period for the ingest channel to become ready. This should depend on the http timeout, but 100ms is probably good enough to cushion bursts bigger than samplesQueueCapacity, while it is unlikely that anybody ever will set an HTTP timeout similarly short.	2015-02-10 14:58:46 +01:00
beorn7	0f191629c6	Next try to deal with backed-up ingestion. This is now not even trying to throttle in a benign way, but creates a fully-fledged error. Advantage: It shows up very visibly on the status page. Disadvantage: The server does not really adjust to a lower scraping rate. However, if your ingestion backs up, you are in a very irregular state, I'd say it _should_ be considered an error and not dealt with in a more graceful way. In different news: I'll work on optimizing ingestion so that we will not as easily run into that situation in the first place.	2015-02-09 17:32:47 +01:00
beorn7	16a1a6d324	Add another check for stopped scraper.	2015-02-06 18:30:33 +01:00
beorn7	5678a86924	Throttle scraping if a scrape took longer than the configured interval. The simple algorithm applied here will increase the actual interval incrementally, whenever and as long as the scrape itself takes longer than the configured interval. Once it takes shorter again, the actual interval will iteratively decrease again.	2015-02-06 16:44:56 +01:00
Bjoern Rabenstein	5859b74f1b	Clean up license issues. - Move CONTRIBUTORS.md to the more common AUTHORS. - Added the required NOTICE file. - Changed "Prometheus Team" to "The Prometheus Authors". - Reverted the erroneous changes to the Apache License.	2015-01-21 20:07:45 +01:00
Bjoern Rabenstein	b09453af1d	Adjust to new client_golang API.	2015-01-21 15:42:25 +01:00
Julius Volz	d6b9e97655	Remove extraction.Result type, simplify code.	2015-01-08 16:34:01 +01:00
juliusv	917acb6baf	Merge pull request #429 from brian-brazil/scrape-time Have scrape time as a pseudovariable, not a prometheus variable.	2015-01-02 13:22:04 +01:00
Brian Brazil	e56786b221	Have scrape time as a pseudovariable, not a prometheus variable. This ensures it has the right timestamp, and is easier to work with. Switch sd variable away from 'outcome', using total/failed instead.	2014-12-27 00:39:33 +00:00
Brian Brazil	89c43dd0d7	Sort targets on the status page. Change-Id: I6b59c97ab50093c50b608e29be2304475bc5d9f6	2014-12-26 13:14:19 +00:00
Johannes 'fish' Ziemke	ff95a52b0f	Rename Address to URL The "Address" is actually a URL which may contain username and password. Calling this Address is misleading so we rename it. Change-Id: I441c7ab9dfa2ceedc67cde7a47e6843a65f60511	2014-12-18 12:18:16 +01:00
Bjoern Rabenstein	b1e4956142	Apply a giant code cleanup. Essentially: - Remove unused code. - Make it 'go vet' clean. The only remaining warnings are in generated code. - Make it 'golint' clean. The only remaining warnings are in generated code. - Smoothed out some minor things. Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6	2014-12-10 16:16:49 +01:00
Bjoern Rabenstein	89bb376bce	Reduce lock-protected area during scrape. Change-Id: Iaa7faa7c916b1890b568d05bd8bfff6299b6767d	2014-12-05 19:40:41 +01:00
Bjoern Rabenstein	fee88a7a77	Remove the remaining races, new and old. Also, resolve a few other TODOs. Change-Id: Icb39b5a5e8ca22ebcb48771cd8951c5d9e112691	2014-12-03 18:07:23 +01:00
Bjoern Rabenstein	14bda4180c	Changes after pair code review. Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455	2014-11-25 17:12:59 +01:00
Bjoern Rabenstein	a2feed343a	Convert another occurrence from chan bool to chan struct{}. Change-Id: I11ba127a934ee3aec0fcd139ad32a7751cff77a0	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	74c143c4c9	Improve scraper shutdown time. - Stop target pools in parallel. - Stop individual scrapers in goroutines, too. - Timing tweaks. Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	92156ee89d	Drain the newBaseLabels channel upon shutdown. This should help cut down shutdown times. Change-Id: I6e70a598a9e49aa6eeeb2034105b1bc6e9014324	2014-11-25 17:10:39 +01:00
Bjoern Rabenstein	6b37e47f9e	Remove unused metrics. Change-Id: Icf03ba4ce92a5e38daf12930f9661daba79c83bb	2014-11-25 17:09:03 +01:00
Bjoern Rabenstein	4fc8ad6677	Fix retrieval unit tests. Change-Id: I299b71406b59539230e5182ccc37bc8a83af60b3	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	b3ed9aa7a2	Clean up start-up and shut-down. Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	4447708c9f	Fix a race in target.go. Also, fix problems in shutdown. Starting serving and shutdown still has to be cleaned up properly. It's a mess. Change-Id: I51061db12064e434066446e6fceac32741c4f84c	2014-11-25 17:08:45 +01:00
Bjoern Rabenstein	38fc24d0ed	Fix targetpool_test.go and other tests. Change-Id: I91a4dd1d39e01f174e1aaae653ce1ed7aecaa624	2014-11-25 17:08:26 +01:00
Julius Volz	7f5d3c2c29	Fix and improve the fp locker. Benchmark: $ go test -bench 'Fingerprint' -test.run 'Fingerprint' -test.cpu=1,2,4 OLD BenchmarkFingerprintLockerParallel 500000 3618 ns/op BenchmarkFingerprintLockerParallel-2 100000 12257 ns/op BenchmarkFingerprintLockerParallel-4 500000 10164 ns/op BenchmarkFingerprintLockerSerial 10000000 283 ns/op BenchmarkFingerprintLockerSerial-2 10000000 284 ns/op BenchmarkFingerprintLockerSerial-4 10000000 288 ns/op NEW BenchmarkFingerprintLockerParallel 1000000 1018 ns/op BenchmarkFingerprintLockerParallel-2 1000000 1164 ns/op BenchmarkFingerprintLockerParallel-4 2000000 910 ns/op BenchmarkFingerprintLockerSerial 50000000 56.0 ns/op BenchmarkFingerprintLockerSerial-2 50000000 47.9 ns/op BenchmarkFingerprintLockerSerial-4 50000000 54.5 ns/op Change-Id: I3c65a43822840e7e64c3c3cfe759e1de51272581	2014-11-25 17:07:45 +01:00
Bjoern Rabenstein	e0a6cb281e	Fix the accept header. A '/' is a separator and has to be in a quoted string. Change-Id: If7a3a847f84f8f709074d05dc98b5b21e954030c	2014-11-25 17:02:00 +01:00
Brian Brazil	5edf689133	Stagger scrapes to spread out load. Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	1909686789	Make metrics exported by the Prometheus server itself more consistent. - Always spell out the time unit (e.g. milliseconds instead of ms). - Remove "_total" from the names of metrics that are not counters. - Make use of the "Namespace" and "Subsystem" fields in the options. - Removed the "capacity" facet from all metrics about channels/queues. These are all fixed via command line flags and will never change during the runtime of a process. Also, they should not be part of the same metric family. I have added separate metrics for the capacity of queues as convenience. (They will never change and are only set once.) - I left "metric_disk_latency_microseconds" unchanged, although that metric measures the latency of the storage device, even if it is not a spinning disk. "SSD" is read by many as "solid state disk", so it's not too far off. (It should be "solid state drive", of course, but "metric_drive_latency_microseconds" is probably confusing.) - Brian suggested to not mix "failure" and "success" outcome in the same metric family (distinguished by labels). For now, I left it as it is. We are touching some bigger issue here, especially as other parts in the Prometheus ecosystem are following the same principle. We still need to come to terms here and then change things consistently everywhere. Change-Id: If799458b450d18f78500f05990301c12525197d3	2014-11-25 17:02:00 +01:00
Brian Brazil	4a2b96f848	Remove backoff on scrape failure. Having metrics with variable timestamps inconsistently spaced when things fail will make it harder to write correct rules. Update status page, requires some refactoring to insert a function. Change-Id: Ie1c586cca53b8f3b318af8c21c418873063738a8	2014-11-25 17:02:00 +01:00
Julius Volz	1bb7074fec	Fix HTTP connection leak upon non-OK status. Change-Id: Ie7fbd7dcc089b8306b40631be3e3d736c23c1cd3	2014-11-25 17:02:00 +01:00
Bjoern Rabenstein	bacc31d5cc	Remove work-around that required copying all bytes of a scrape. Now that the subtle bug in matttproud/golang_protobuf_extensions is fixed, we do not need to copy the bytes of a scrape into a buffer first before starting to parse it. Change-Id: Ib73ecae16173ddd219cda56388a8f853332f8853	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	8956faeccb	Migrate to new client_golang. This change will only be submitted when the new client_golang has been moved to the new version. Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	814e479723	Treat non-200 HTTP response as error. Change-Id: I2a9f3b47012b3c4839be53aa44c66d16dd41a24a	2014-11-25 17:01:59 +01:00
Bjoern Rabenstein	ca6a4fccef	Weed out our homegrown test.Tester. The Go stdlib has testing.TB now, which fulfills the exact same purpose. Change-Id: I0db9c73400e208ca376b932a02b7e3402234b87c	2014-05-21 19:27:24 +02:00
Brian Brazil	23255f1499	Fix negative Next Retrieval on status page. Change-Id: Ifa754034660a251fee71f166dbf057697ec4e872	2014-05-12 15:24:34 +01:00
Bjoern Rabenstein	64811caaec	Make Prometheus announce its new super-power: text format! Change-Id: Ia2ddfb28999c145e4d46c395381a9bf89d43148c	2014-04-22 18:44:52 +02:00
Julius Volz	84df022025	Cleanup server address handling, support IPv6. This fixes https://github.com/prometheus/prometheus/issues/377, as IPv6 server addresses are now handled correctly. Change-Id: Iebde7cfdadb0a52041472517e6fdcff4303a25ab	2014-03-09 23:31:30 +01:00
Julius Volz	b382e8b7bd	Remove overly verbose DNS-SD logging line. Change-Id: Ie4534437ab88b9a6b99f5cb6c2f32c9588c1fff6	2014-01-24 16:09:41 +01:00
Julius Volz	0378c2ca1f	Nonexistent labels in BY-clauses shouldn't propagate to result. This fixes bug 2. of https://github.com/prometheus/prometheus/issues/374 Change-Id: Ia4a13153616bafce5bf10597966b071434422d09	2014-01-24 16:05:30 +01:00
Stuart Nelson	48a6326d25	Added DNS-SD lookup counter for successful/unsuccessful lookups Change-Id: I0a71e994a989cecace280b5134a31ebc2ace7591	2013-12-16 08:48:56 -05:00

230 commits