prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-09-20 07:47:31 -07:00

Author	SHA1	Message	Date
Krasi Georgiev	638818a974	some Discoverers send nil targetgroup so need to check for it when updating a group	2018-01-04 13:57:34 +00:00
Krasi Georgiev	7e28397a2c	discovery - handle Discoverers that send only target Group updates rather than all Targets on every update. Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2018-01-04 13:28:37 +00:00
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629) * refactor: move targetGroup struct and CheckOverflow() to their own package * refactor: move auth and security related structs to a utility package, fix import error in utility package * refactor: Azure SD, remove SD struct from config * refactor: DNS SD, remove SD struct from config into dns package * refactor: ec2 SD, move SD struct from config into the ec2 package * refactor: file SD, move SD struct from config to file discovery package * refactor: gce, move SD struct from config to gce discovery package * refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil * refactor: consul, move SD struct from config into consul discovery package * refactor: marathon, move SD struct from config into marathon discovery package * refactor: triton, move SD struct from config to triton discovery package, fix test * refactor: zookeeper, move SD structs from config to zookeeper discovery package * refactor: openstack, remove SD struct from config, move into openstack discovery package * refactor: kubernetes, move SD struct from config into kubernetes discovery package * refactor: notifier, use targetgroup package instead of config * refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup * refactor: retrieval, use targetgroup package instead of config.TargetGroup * refactor: storage, use config util package * refactor: discovery manager, use targetgroup package instead of config.TargetGroup * refactor: use HTTPClient and TLS config from configUtil instead of config * refactor: tests, use targetgroup package instead of config.TargetGroup * refactor: fix targetgroup.Group pointers that were removed by mistake * refactor: openstack, kubernetes: drop prefixes * refactor: remove import aliases forced due to vscode bug * refactor: move main SD struct out of config into discovery/config * refactor: rename configUtil to config_util * refactor: rename yamlUtil to yaml_config * refactor: kubernetes, remove prefixes * refactor: move the TargetGroup package to discovery/ * refactor: fix order of imports	2017-12-29 21:01:34 +01:00
Callum Styan	d76d5de66f	refactor to make timestamp collector work for multiple file_sd's	2017-12-23 10:13:11 +00:00
KalivarapuReshma	a00fc883c3	Add metric for timestamp of the files file_sd is using.	2017-12-23 10:13:11 +00:00
pasquier-s	78625f85a7	Fix race condition on file SD (#3468) The file discovery should only stop the watcher if it has been created, otherwise it may trigger a segmentation fault.	2017-12-21 10:07:43 +00:00
Krasi Georgiev	587dec9eb9	rebased and resolved conflicts with the new Discovery GUI page Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2017-12-18 20:10:03 +00:00
Krasi Georgiev	80182a5d82	use poolKey as the pool map key to avoid multi dimensional maps	2017-12-18 17:23:47 +00:00
Krasi Georgiev	1ec76d1950	rearrange the context variables and logic; split the groupsMerge function into set and get; other small nits	2017-12-18 17:23:47 +00:00
Krasi Georgiev	f2df712166	updated README	2017-12-18 17:22:50 +00:00
Krasi Georgiev	aca8f85699	fixed the tests	2017-12-18 17:22:50 +00:00
Krasi Georgiev	fe6c544532	some renaming and comment fixes. remove some select state that is most likely obsolete and hopefully doesn't break anything :) merge targets will sort by Discoverer name so we can have consistent tests for the maps.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	f5c2c5ff8f	break up the start provider func so that unit tests can run against it.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	c5cb0d2910	simplify naming and API.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	9c61f0e8a0	scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage.	2017-12-18 17:22:49 +00:00
Krasi Georgiev	e405e2f1ea	refactored discovery	2017-12-18 17:22:49 +00:00
Brian Brazil	81db4716c1	Mention SD moratorium in README (#3573)	2017-12-11 15:38:23 +00:00
Will Howard	6a80fc24cf	Parse the normalized container.PortMappings presented by the Marathon 1.5.x API Fixes #3465	2017-12-06 11:23:12 -05:00
Brian Brazil	d7b3df5ae1	Fix staticcheck errors	2017-12-02 14:52:13 +00:00
Krasi Georgiev	29506e0bca	one meaningless write to the config file to trigger another fsnotify (#3492)	2017-12-01 17:32:27 +00:00
Tom Wilkie	099c50ce93	Avoid empty pod UID in test.	2017-11-24 15:02:42 +00:00
Tom Wilkie	9811e90d65	Fix tests.	2017-11-24 12:24:13 +00:00
Tom Wilkie	06dc1e8797	Include Pod UID in the discovery metadata.	2017-11-20 21:09:47 +00:00
Tobias Schmidt	91be55ebf0	Merge pull request #3458 from grandbora/test-race Fix race in test	2017-11-13 17:57:21 +01:00
Bora Tunca	493fd6bd1f	Fix race in test	2017-11-13 11:47:59 -05:00
Krasi Georgiev	1005ef0a70	Fix flaky file discovery tests - sync the channel draining goroutine	2017-11-13 12:12:01 +00:00
Bora Tunca	3cc01a3088	Add more discovery tests for updating target groups (#3426) * Adds a test covering the case where a target provider sends updated versions of the same target groups and the system should reconcile to the latest version of each of the target groups * Refactors how input data is represented in the tests. It used to be literal declarations of necessary structs, now it is parsing yaml. Yaml declarations are half as long as the former. And these can be put in a fixture file * Adds a tiny bit of refactoring on test timeouts	2017-11-12 03:39:08 +01:00
Krasi Georgiev	c8a735ceb6	Fix flaky file discovery tests (#3438) * flaky test caused by invalid fsnotify updates before the test files are written to disk causing the fd service to send empty `group[]` struct * `close(filesReady)` needs to be before the file closing so that fsnotify triggers a new loop of the discovery service. * nits * use filepath.Join for the path url to be cross platform * stupid mistake revert	2017-11-11 17:20:39 +01:00
Bora Tunca	e63219ae6a	Add discovery test (#3417)	2017-11-06 17:33:52 +00:00
Bora Tunca	09be10a553	Add test to prove redundant calls to identical target providers (#3404)	2017-11-06 16:14:15 +00:00
beorn7	348ea482ea	Merge branch 'beorn7/release'	2017-11-04 12:32:49 +01:00
Dominik Schulz	a731a43302	Guard against tags being nil in EC2 discovery Fixes #3001	2017-11-03 13:23:01 +01:00
Callum Styan	7776527390	bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout	2017-10-28 16:42:42 -07:00
Jason Anderson	808f79f00a	Feature: Allow getting credentials via EC2 role (#3343) * Allow getting credentials via EC2 role This is subtly different than the existing `role_arn` solution, which allows Prometheus to assume an IAM role given some set of credentials already in-scope. With EC2 roles, one specifies the role at instance launch time (via an instance profile.) The instance then exposes temporary credentials via its metadata. The AWS Go SDK exposes a credential provider that polls the [instance metadata endpoint][1] already, so we can simply use that and it will take care of renewing the credentials when they expire. Without this, if this is being used inside EC2, it is difficult to cleanly allow the use of STS credentials. One has to set up a proxy role that can assume the role you really want, and launch the EC2 instance with the proxy role. This isn't very clean, and also doesn't seem to be [supported very well][2]. [1]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html [2]: https://github.com/aws/aws-cli/issues/1390 * Automatically try to detect EC2 role credentials The `Available()` function exposed on ec2metadata returns a simple true/false if the ec2 metadata is available. This is the best way to know if we're actually running in EC2 (which is the only valid use-case for this credential provider.) This allows this to "just work" if you are using EC2 instance roles.	2017-10-25 14:15:39 +01:00
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	2017-10-24 21:21:42 -07:00
Julius Volz	c3d6abc8e6	Fix some lint errors (#3334) I left the promql ones and some others untouched as I remember that @fabxc prefers them that way.	2017-10-23 14:57:30 +01:00
Callum Styan	45f9f3c539	use a timeout in the HTTP client used for consul sd (#3303)	2017-10-20 16:56:30 +01:00
Alexander Kazarin	2c163f32a5	fix for issue 2976 (#3313) fix for null pointer exception in ZookeeperLogger	2017-10-18 17:02:20 +01:00
pasquier-s	88e4815bb7	Get OpenStack variables from env as fallback (#3293) This change enables the OpenStack service discovery to read the authentication parameters from the OS_* environment variables when the identity endpoint URL is not defined in the Prometheus configuration file.	2017-10-16 18:01:50 +01:00
Marc Sluiter	6a633eece1	Added go-conntrack for monitoring http connections (#3241) Added metrics for in- and outgoing traffic with go-conntrack.	2017-10-06 11:22:19 +01:00
Fabian Reinartz	2d0b8e8b94	Merge branch 'master' into dev-2.0	2017-10-05 13:09:18 +02:00
Goutham Veeramachaneni	3f0267c548	Merge branch 'dev-2.0' into go-kit/log Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-09-15 23:15:27 +05:30
beorn7	84211bd2df	Forward-merge bug fixes and cherry-picks from 'release-1.7'	2017-09-15 13:44:22 +02:00
Matt Palmer	3369422327	Improve DNS response handling to prevent "stuck" records [Fixes #2799] (#3138) The problem reported in #2799 was that in the event that all records for a name were removed, the target group was never updated to be the "empty" set. Essentially, whatever Prometheus last saw as a non-empty list of targets would stay that way forever (or at least until Prometheus restarted...). This came about because of a fairly naive interpretation of what a valid-looking DNS response actually looked like -- essentially, the only valid DNS responses were ones that had a non-empty record list. That's fine as long as your config always lists only target names which have non-empty record sets; if your environment happens to legitimately have empty record sets sometimes, all hell breaks loose (otherwise-cleanly shutdown systems trigger up==0 alerts, for instance). This patch is a refactoring of the DNS lookup behaviour that maintains existing behaviour with regard to search paths, but correctly handles empty and non-existent record sets. RFC1034 s4.3.1 says there's three ways a recursive DNS server can respond: 1. Here is your answer (possibly an empty answer, because of the way DNS considers all records for a name, regardless of type, when deciding whether the name exists). 2. There is no spoon (the name you asked for definitely does not exist). 3. I am a teapot (something has gone terribly wrong). Situations 1 and 2 are fine and dandy; whatever the answer is (empty or otherwise) is the list of targets. If something has gone wrong, then we shouldn't go updating the target list because we don't really know what the target list should be. Multiple DNS servers to query is a straightforward augmentation; if you get an error, then try the next server in the list, until you get an answer or run out of servers to ask. Only if all the servers return errors should you return an error to the calling code. Where things get complicated is the search path. In order to be able to confidently say, "this name does not exist anywhere, you can remove all the targets for this name because it's definitely GORN", at least one server for all the possible names needs to return either successful-but-empty responses, or NXDOMAIN. If any name errors out, then -- since that one might have been the one where the records came from -- you need to say "maintain the status quo until we get a known-good response". It is possible, though unlikely, that a poorly-configured DNS setup (say, one which had a domain in its search path for which all configured recursive resolvers respond with REFUSED) could result in the same "stuck" records problem we're solving here, but the DNS configuration should be fixed in that case, and there's nothing we can do in Prometheus itself to fix the problem. I've tested this patch on a local scratch instance in all the various ways I can think of: 1. Adding records (targets get scraped) 2. Adding records of a different type 3. Remove records of the requested type, leaving other type records intact (targets don't get scraped) 4. Remove all records for the name (targets don't get scraped) 5. Shutdown the resolver (targets still get scraped) There's no automated test suite additions, because there isn't a test suite for DNS discovery, and I was stretching my Go skills to the limit to make this happen; mock objects are beyond me.	2017-09-15 12:26:10 +02:00
Goutham Veeramachaneni	f5aed810f9	logging: Port to common/promlog Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-09-15 12:40:50 +05:30
Matt Bostock	e758260986	Marathon SD: Set port index label The changes [1][] to Marathon service discovery to support multiple ports mean that Prometheus now attempts to scrape all ports belonging to a Marathon service. You can use port definition or port mapping labels to filter out which ports to scrape but that requires service owners to update their Marathon configuration. To allow for a smoother migration path, add a `__meta_marathon_port_index` label, whose value is set to the port's sequential index integer. For example, PORT0 has the value `0`, PORT1 has the value `1`, and so on. This allows you to support scraping both the first available port (the previous behaviour) in addition to ports with a `metrics` label. For example, here's the relabel configuration we might use with this patch: - action: keep source_labels: ['__meta_marathon_port_definition_label_metrics', '__meta_marathon_port_mapping_label_metrics', '__meta_marathon_port_index'] # Keep if port mapping or definition has a 'metrics' label with any # non-empty value, or if no 'metrics' port label exists but this is the # service's first available port regex: ([^;]+;;[^;]+|;[^;]+;[^;]+|;;0) This assumes that the Marathon API returns the ports in sorted order (matching PORT0, PORT1, etc), which it appears that it does. [1]: https://github.com/prometheus/prometheus/pull/2506	2017-09-11 13:40:51 +01:00
Fabian Reinartz	e746282772	Merge branch 'master' into dev-2.0	2017-09-11 10:55:19 +02:00
Jamie Moore	7a135e0a1b	Add the ability to assume a role for ec2 discovery	2017-09-10 00:36:43 +10:00
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	2017-09-08 22:01:51 +05:30
Johannes 'fish' Ziemke	75aec7d970	k8s: Use versioned struct for ingress discovery	2017-09-06 12:47:03 +02:00


98 commits
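
Note on commit e758260986 (Marathon port index label): the `keep` rule quoted in that message can be checked outside Prometheus. The sketch below assumes the documented relabeling behaviour (source label values are joined with `;` and the regex is anchored at both ends) and writes the alternation with plain `|`. The helper `keepTarget` and the sample label values are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The regex from the commit message, wrapped in ^(?:...)$ to mimic the
// full-string anchoring that Prometheus applies to relabel regexes.
var keepRE = regexp.MustCompile(`^(?:([^;]+;;[^;]+|;[^;]+;[^;]+|;;0))$`)

// keepTarget joins the three source label values with ";" (the default
// relabel separator) and reports whether the keep rule would match.
func keepTarget(definitionLabel, mappingLabel, portIndex string) bool {
	joined := strings.Join([]string{definitionLabel, mappingLabel, portIndex}, ";")
	return keepRE.MatchString(joined)
}

func main() {
	fmt.Println(keepTarget("yes", "", "1")) // port definition has a metrics label
	fmt.Println(keepTarget("", "yes", "2")) // port mapping has a metrics label
	fmt.Println(keepTarget("", "", "0"))    // no metrics label, first port
	fmt.Println(keepTarget("", "", "1"))    // no metrics label, not the first port
}
```

The first three calls match the rule and the last does not, which is the migration behaviour the commit message describes: labelled ports are kept, and unlabelled services fall back to their first port.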