Commit graph

183 commits

Author SHA1 Message Date
Yecheng Fu 56ed29fbf7 Map target infos of endpoints to prometheus meta labels. (#3770) 2018-03-09 10:07:00 +00:00
Marek Siarkowicz 86011047ca Validate required fields in sd configuration (#3911) 2018-03-05 19:27:54 +00:00
Krasi Georgiev 6b0e9ef183 Validate json parse for TargetGroup Unmarshal (#3614)
Using DisallowUnknownFields in golang 1.10 to forbid unknown fields in targetGroups
added the license header for the targetGroup test
2018-02-27 12:33:27 +00:00
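The strict parsing described in 6b0e9ef183 can be sketched roughly as follows. This is a minimal illustration rather than the Prometheus code: the `targetGroup` struct and the `parseStrict` helper are hypothetical names, and only `json.Decoder.DisallowUnknownFields` (available since Go 1.10) is the real standard-library call.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// targetGroup is a hypothetical stand-in for a JSON target group.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// parseStrict decodes data and rejects any field that targetGroup
// does not declare, instead of silently ignoring it.
func parseStrict(data []byte) (targetGroup, error) {
	var g targetGroup
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&g)
	return g, err
}

func main() {
	// "label" is a typo for "labels", so decoding fails.
	_, err := parseStrict([]byte(`{"targets":["a:80"],"label":{"x":"y"}}`))
	fmt.Println("typo rejected:", err != nil)
}
```

Without the `DisallowUnknownFields` call, the misspelled key would be dropped without any error, which is the situation the commit aims to prevent.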
Krasi Georgiev 4fa7e719f4 race in Triton SD Test (#3885) 2018-02-26 10:03:50 +00:00
ferhat elmas ffa673f7d8 General simplifications (#3887)
Another try as in #1516
2018-02-26 07:58:10 +00:00
Pedro Araújo 575f665944 Add OS type meta label to Azure SD (#3863)
There is currently no way to differentiate Windows instances from Linux
ones. This is needed when you have a mix of node_exporters /
wmi_exporters for OS-level metrics and you want to have them in separate
scrape jobs.

This change allows you to do just that. Example:

```
  - job_name: 'node'
    azure_sd_configs:
      - <azure_sd_config>
    relabel_configs:
      - source_labels: [__meta_azure_machine_os_type]
        regex: Linux
        action: keep
```

The way the vendor'd AzureSDK provides to get the OsType is a bit
awkward - as far as I can tell, this information can only be gotten from
the startup disk. Newer versions of the SDK appear to improve this a
bit (by having OS information in the InstanceView), but the current way
still works.
2018-02-19 15:40:57 +00:00
Simon Pasquier 2072bbc824 Send update when pod's IP address is empty
When the pod gets evicted, its IP address becomes empty and it needs to
be removed from the targets.
2018-02-14 14:23:52 +01:00
Krasi Georgiev b75428ec19 rename package retrieve to scrape
no functional changes, just renaming retrieval to scrape
2018-02-01 09:55:07 +00:00
Frederic Branczyk d3ae1ac40e
Merge pull request #3741 from krasi-georgiev/discovery-race
read/write race for the context field in the discovery package
2018-01-30 18:17:09 +01:00
pasquier-s bde64cf5a6 Fix Kubernetes endpoints SD for empty subsets (#3660)
* Fix Kubernetes endpoints SD for empty subsets

When an endpoints object has no associated pods (replica scaled to zero
for instance), the endpoints SD should return a target group with no
targets so that the SD manager propagates this information to the scrape
manager.

Fixes #3659

* Don't send nil target groups from the Kubernetes SD

This is to be consistent with the endpoints SD part.
2018-01-30 15:00:33 +00:00
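The behaviour described in bde64cf5a6 (an empty group clears the targets for its source, a nil group carries no information) might look like this on the receiving side. The `group` type and `applyUpdate` function are hypothetical simplifications, not the discovery manager's actual types.

```go
package main

import "fmt"

// group is a hypothetical, simplified target group:
// a source key plus the targets currently known for it.
type group struct {
	Source  string
	Targets []string
}

// applyUpdate merges updates into the per-source state.
// A nil group is skipped; a group with no targets overwrites
// the previous entry, so stale targets disappear.
func applyUpdate(state map[string][]string, updates []*group) {
	for _, g := range updates {
		if g == nil {
			continue
		}
		state[g.Source] = g.Targets
	}
}

func main() {
	state := map[string][]string{}
	src := "endpoints/default/web"
	applyUpdate(state, []*group{{Source: src, Targets: []string{"10.0.0.1:8080"}}})
	// Replicas scaled to zero: the SD sends the same source with no targets.
	applyUpdate(state, []*group{nil, {Source: src}})
	fmt.Println(len(state[src])) // prints 0
}
```

If the SD sent nothing at all when the subsets became empty, the first update would remain in `state` and the old target would keep being scraped.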
Krasi Georgiev 818dda72db updated the sd tests 2018-01-29 15:19:15 +00:00
Krasi Georgiev acc4197098 remove discovery race for the context field 2018-01-29 15:18:07 +00:00
Frederic Branczyk 47538cf6ce
Merge pull request #3747 from prometheus/sched-update-throttle
Update throttle & tsdb update
2018-01-29 16:05:05 +01:00
Frederic Branczyk 73e829137b
discovery: Cleanup ticker 2018-01-29 13:51:04 +01:00
Ganesh Vernekar 66b0aa3b45 Fixed race condition in map iteration and map write in Discovery (#3735) (#3738)
* Fixed concurrent map iteration and map write in Discovery (#3735)

* discovery: Changed Lock to RLock in Collect
2018-01-28 22:24:31 +05:30
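The fix in 66b0aa3b45 follows a common Go pattern: writers take the exclusive lock, while code that only iterates the map takes the read lock. A minimal sketch, with a hypothetical `registry` type standing in for the real discovery code:

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical map guarded by a sync.RWMutex.
type registry struct {
	mtx     sync.RWMutex
	targets map[string]int
}

func newRegistry() *registry {
	return &registry{targets: map[string]int{}}
}

// set writes to the map, so it needs the exclusive lock.
func (r *registry) set(name string, n int) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.targets[name] = n
}

// total only iterates, so the shared read lock is enough.
func (r *registry) total() int {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	sum := 0
	for _, n := range r.targets {
		sum += n
	}
	return sum
}

func main() {
	r := newRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.set(fmt.Sprintf("job-%d", i), 1)
			_ = r.total() // concurrent iteration is safe under RLock
		}(i)
	}
	wg.Wait()
	fmt.Println(r.total()) // prints 8
}
```

Using `RLock` for read-only paths lets several readers proceed at once, which is presumably why the second bullet switches `Collect` from `Lock` to `RLock`.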
Krasi Georgiev fe926e7829 update the discover tests
the discovery test is now only testing update and get groups.
It doesn't do an e2e test but just a unit test of setting and receiving
target groups
2018-01-27 12:03:06 +00:00
Callum Styan 7dc05538f7 docs: SD implementations do not have to only send new/changed target groups (#3713) 2018-01-26 22:03:11 +00:00
Frederic Branczyk cfa0253ed8
discovery: Schedule updates to throttle 2018-01-26 16:24:44 +01:00
zemek 8a01a0fbed Set consul server default to localhost:8500 (#3703) 2018-01-24 12:14:32 +00:00
Julius Volz 09e460a647
discovery: Rename file SD mtime metric (#3723)
- "timestamp" -> "mtime" to be in line with node exporter and clearer.
- add unit suffix
2018-01-22 14:02:24 +01:00
Krasi Georgiev ec26751fd2 use mutexes for the discovery manager instead of a loop as this was a stupid idea 2018-01-17 18:12:58 +00:00
Krasi Georgiev 767faa44b6 fixed the tests
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-15 13:39:47 +00:00
Krasi Georgiev d12e6f29fc discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager
reimplement the service discovery for the notify manager

Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-15 13:39:44 +00:00
Goutham Veeramachaneni b20a1b1b1b
Merge pull request #3654 from krasi-georgiev/discovery-handle-discoverer-updates
discovery - handle Discoverers that send only target Group updates.
2018-01-15 18:53:22 +05:30
Krasi Georgiev 790cf30fcb remove unneeded check 2018-01-15 11:52:20 +00:00
Krasi Georgiev 38938ba493 comment nits 2018-01-15 11:47:36 +00:00
Krasi Georgiev febebcd49a more comments for the future ME, and reverted the Discovery manager execution changes as these were correct in the first place 2018-01-12 22:07:21 +00:00
Krasi Georgiev 78ba5e62a6 a few more useful comments 2018-01-12 13:58:23 +00:00
Krasi Georgiev cabce21b70 delete empty targets sets to avoid memory leaks 2018-01-12 13:10:59 +00:00
Krasi Georgiev abfd9f1920 nits 2018-01-12 12:19:52 +00:00
Shubheksha Jalan 0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Krasi Georgiev 546c29af5b return early for nil target groups 2018-01-09 16:34:23 +00:00
Callum Styan 97464236c7 comments with TargetProvider should read Discoverer instead (#3667) 2018-01-08 23:59:18 +00:00
Krasi Georgiev 77bf6bece0 discovery-manager comment update 2018-01-04 21:57:28 +00:00
Krasi Georgiev 135ea0f793 discovery manager - doesn't need sorting of the target groups so move it in the discovery manager tests as we only need it there.
discovery manager - refactor the discovery tests.

Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-04 21:41:54 +00:00
Krasi Georgiev 638818a974 some Discoverers send nil targetgroup so need to check for it when updating a group 2018-01-04 13:57:34 +00:00
Krasi Georgiev 7e28397a2c discovery - handle Discoverers that send only target Group updates rather than all Targets on every update.
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-04 13:28:37 +00:00
Shubheksha Jalan ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix targetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Callum Styan d76d5de66f refactor to make timestamp collector work for multiple file_sd's 2017-12-23 10:13:11 +00:00
KalivarapuReshma a00fc883c3 Add metric for timestamp of the files file_sd is using. 2017-12-23 10:13:11 +00:00
pasquier-s 78625f85a7 Fix race condition on file SD (#3468)
The file discovery should only stop the watcher if it has been created
otherwise it may trigger a segmentation fault.
2017-12-21 10:07:43 +00:00
Krasi Georgiev 587dec9eb9 rebased and resolved conflicts with the new Discovery GUI page
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-18 20:10:03 +00:00
Krasi Georgiev 80182a5d82 use poolKey as the pool map key to avoid multi dimensional maps 2017-12-18 17:23:47 +00:00
Krasi Georgiev 1ec76d1950 rearrange the context variables and logic
split the groupsMerge function to set and get
other small nits
2017-12-18 17:23:47 +00:00
Krasi Georgiev f2df712166 updated README 2017-12-18 17:22:50 +00:00
Krasi Georgiev aca8f85699 fixed the tests 2017-12-18 17:22:50 +00:00
Krasi Georgiev fe6c544532 some renaming and comments fixes.
remove some select state that is most likely obsolete and hopefully doesn't break anything :)
merge targets will sort by Discoverer name so we can have consistent tests for the maps.
2017-12-18 17:22:50 +00:00
Krasi Georgiev f5c2c5ff8f break up the start provider func so that unit tests can run against it. 2017-12-18 17:22:50 +00:00
Krasi Georgiev c5cb0d2910 simplify naming and API. 2017-12-18 17:22:50 +00:00
Krasi Georgiev 9c61f0e8a0 scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage. 2017-12-18 17:22:49 +00:00
Krasi Georgiev e405e2f1ea refactored discovery 2017-12-18 17:22:49 +00:00
Brian Brazil 81db4716c1
Mention SD moratorium in README (#3573) 2017-12-11 15:38:23 +00:00
Will Howard 6a80fc24cf Parse the normalized container.PortMappings presented by the Marathon 1.5.x API
Fixes #3465
2017-12-06 11:23:12 -05:00
Brian Brazil d7b3df5ae1 Fix staticcheck errors 2017-12-02 14:52:13 +00:00
Krasi Georgiev 29506e0bca one meaningless write to the config file to trigger another fsnotify (#3492) 2017-12-01 17:32:27 +00:00
Tom Wilkie 099c50ce93 Avoid empty pod UID in test. 2017-11-24 15:02:42 +00:00
Tom Wilkie 9811e90d65 Fix tests. 2017-11-24 12:24:13 +00:00
Tom Wilkie 06dc1e8797 Include Pod UID in the discovery metadata. 2017-11-20 21:09:47 +00:00
Tobias Schmidt 91be55ebf0
Merge pull request #3458 from grandbora/test-race
Fix race in test
2017-11-13 17:57:21 +01:00
Bora Tunca 493fd6bd1f Fix race in test 2017-11-13 11:47:59 -05:00
Krasi Georgiev 1005ef0a70 Fix flaky file discovery tests - sync the channel draining goroutine 2017-11-13 12:12:01 +00:00
Bora Tunca 3cc01a3088 Add more discovery tests for updating target groups (#3426)
* Adds a test covering the case where a target provider sends updated versions of the same target groups and the system should reconcile to the latest version of each of the target groups
* Refactors how input data is represented in the tests. It used to be literal declarations of necessary structs, now it is parsing yaml. Yaml declarations are half as long as the former. And these can be put in a fixture file
* Adds a tiny bit of refactoring on test timeouts
2017-11-12 03:39:08 +01:00
Krasi Georgiev c8a735ceb6 Fix flaky file discovery tests (#3438)
* flaky test caused by invalid fsnotify updates before the test files are written to disk causing the fd service to send empty `group[]` struct

* `close(filesReady)` needs to be before the file closing so that fsnotify triggers a new loop of the discovery service.

* nits

* use filepath.Join for the path url to be cross platform

* stupid mistake revert
2017-11-11 17:20:39 +01:00
Bora Tunca e63219ae6a Add discovery test (#3417) 2017-11-06 17:33:52 +00:00
Bora Tunca 09be10a553 Add test to prove redundant calls to identical target providers (#3404) 2017-11-06 16:14:15 +00:00
beorn7 348ea482ea Merge branch 'beorn7/release' 2017-11-04 12:32:49 +01:00
Dominik Schulz a731a43302 Guard against tags being nil in EC2 discovery
Fixes #3001
2017-11-03 13:23:01 +01:00
Callum Styan 7776527390 bump consul HTTP client timeout by 5s so it doesn't match up exactly with the consul SD watch timeout 2017-10-28 16:42:42 -07:00
Jason Anderson 808f79f00a Feature: Allow getting credentials via EC2 role (#3343)
* Allow getting credentials via EC2 role

This is subtly different than the existing `role_arn` solution, which
allows Prometheus to assume an IAM role given some set of credentials
already in-scope. With EC2 roles, one specifies the role at instance
launch time (via an instance profile.) The instance then exposes
temporary credentials via its metadata. The AWS Go SDK exposes a
credential provider that polls the [instance metadata endpoint][1]
already, so we can simply use that and it will take care of renewing the
credentials when they expire.

Without this, if this is being used inside EC2, it is difficult to
cleanly allow the use of STS credentials. One has to set up a proxy role
that can assume the role you really want, and launch the EC2 instance
with the proxy role. This isn't very clean, and also doesn't seem to be
[supported very well][2].

[1]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
[2]: https://github.com/aws/aws-cli/issues/1390

* Automatically try to detect EC2 role credentials

The `Available()` function exposed on ec2metadata returns a simple
true/false if the ec2 metadata is available. This is the best way to
know if we're actually running in EC2 (which is the only valid use-case
for this credential provider.)

This allows this to "just work" if you are using EC2 instance roles.
2017-10-25 14:15:39 +01:00
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Julius Volz c3d6abc8e6 Fix some lint errors (#3334)
I left the promql ones and some others untouched as I remember that @fabxc
prefers them that way.
2017-10-23 14:57:30 +01:00
Callum Styan 45f9f3c539 use a timeout in the HTTP client used for consul sd (#3303) 2017-10-20 16:56:30 +01:00
Alexander Kazarin 2c163f32a5 fix for issue 2976 (#3313)
fix for null pointer exception in ZookeeperLogger
2017-10-18 17:02:20 +01:00
pasquier-s 88e4815bb7 Get OpenStack variables from env as fallback (#3293)
This change enables the OpenStack service discovery to read the
authentication parameters from the OS_* environment variables when the
identity endpoint URL is not defined in the Prometheus configuration
file.
2017-10-16 18:01:50 +01:00
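The fallback in 88e4815bb7 amounts to "use the configured value, otherwise look at the environment". A minimal sketch for the identity endpoint only, assuming the conventional `OS_AUTH_URL` variable; the `identityEndpoint` helper is hypothetical, and the actual change may delegate this to an OpenStack client library instead.

```go
package main

import (
	"fmt"
	"os"
)

// identityEndpoint returns the endpoint from the configuration file if
// one is set, and otherwise falls back to the OS_AUTH_URL variable.
// getenv is injected so the lookup can be tested without touching
// the real process environment.
func identityEndpoint(configured string, getenv func(string) string) string {
	if configured != "" {
		return configured
	}
	return getenv("OS_AUTH_URL")
}

func main() {
	// With no configured endpoint, the environment decides.
	fmt.Println(identityEndpoint("", os.Getenv))
}
```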
Marc Sluiter 6a633eece1 Added go-conntrack for monitoring http connections (#3241)
Added metrics for in- and outgoing traffic with go-conntrack.
2017-10-06 11:22:19 +01:00
Fabian Reinartz 2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
Goutham Veeramachaneni 3f0267c548 Merge branch 'dev-2.0' into go-kit/log
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-09-15 23:15:27 +05:30
beorn7 84211bd2df Forward-merge bug fixes and cherry-picks from 'release-1.7' 2017-09-15 13:44:22 +02:00
Matt Palmer 3369422327 Improve DNS response handling to prevent "stuck" records [Fixes #2799] (#3138)
The problem reported in #2799 was that in the event that all records for a
name were removed, the target group was never updated to be the "empty" set.
Essentially, whatever Prometheus last saw as a non-empty list of targets
would stay that way forever (or at least until Prometheus restarted...).  This
came about because of a fairly naive interpretation of what a valid-looking
DNS response actually looked like -- essentially, the only valid DNS responses
were ones that had a non-empty record list.  That's fine as long as your
config always lists only target names which have non-empty record sets; if
your environment happens to legitimately have empty record sets sometimes,
all hell breaks loose (otherwise-cleanly shutdown systems trigger up==0 alerts,
for instance).

This patch is a refactoring of the DNS lookup behaviour that maintains
existing behaviour with regard to search paths, but correctly handles empty
and non-existent record sets.

RFC1034 s4.3.1 says there's three ways a recursive DNS server can respond:

1.  Here is your answer (possibly an empty answer, because of the way DNS
   considers all records for a name, regardless of type, when deciding
   whether the name exists).

2. There is no spoon (the name you asked for definitely does not exist).

3. I am a teapot (something has gone terribly wrong).

Situations 1 and 2 are fine and dandy; whatever the answer is (empty or
otherwise) is the list of targets.  If something has gone wrong, then we
shouldn't go updating the target list because we don't really *know* what
the target list should be.

Multiple DNS servers to query is a straightforward augmentation; if you get
an error, then try the next server in the list, until you get an answer or
run out of servers to ask.  Only if *all* the servers return errors should you
return an error to the calling code.

Where things get complicated is the search path.  In order to be able to
confidently say, "this name does not exist anywhere, you can remove all the
targets for this name because it's definitely GORN", at least one server for
*all* the possible names need to return either successful-but-empty
responses, or NXDOMAIN.  If any name errors out, then -- since that one
might have been the one where the records came from -- you need to say
"maintain the status quo until we get a known-good response".

It is possible, though unlikely, that a poorly-configured DNS setup (say,
one which had a domain in its search path for which all configured recursive
resolvers respond with REFUSED) could result in the same "stuck" records
problem we're solving here, but the DNS configuration should be fixed in
that case, and there's nothing we can do in Prometheus itself to fix the
problem.

I've tested this patch on a local scratch instance in all the various ways I
can think of:

1. Adding records (targets get scraped)

2. Adding records of a different type

3. Remove records of the requested type, leaving other type records intact
   (targets don't get scraped)

4. Remove all records for the name (targets don't get scraped)

5. Shutdown the resolver (targets still get scraped)

There's no automated test suite additions, because there isn't a test suite
for DNS discovery, and I was stretching my Go skills to the limit to make
this happen; mock objects are beyond me.
2017-09-15 12:26:10 +02:00
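The decision logic described in 3369422327 can be sketched as below. This is an illustration under stated assumptions, not the patch itself: `queryFunc`, `fakeQuery`, `queryAnyServer`, `resolve`, and the sentinel errors are all hypothetical names, a real implementation would issue DNS queries rather than call a fake, and returning the first non-empty answer is one reading of "existing behaviour with regard to search paths".

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNXDomain      = errors.New("name does not exist")           // case 2 in the message
	errServerFailure = errors.New("server failure")                // case 3 in the message
	errKeepTargets   = errors.New("no valid answer; keep targets") // status quo
)

// queryFunc asks one server about one fully qualified name.
type queryFunc func(server, name string) ([]string, error)

// fakeQuery is a stand-in resolver used only for this example.
func fakeQuery(server, name string) ([]string, error) {
	if server == "down" {
		return nil, errServerFailure
	}
	if name == "web.example.com." {
		return []string{"10.0.0.1"}, nil
	}
	return nil, errNXDomain
}

// queryAnyServer tries each server until one gives a definite answer
// (records, an empty answer, or NXDOMAIN). It fails only if all fail.
func queryAnyServer(q queryFunc, servers []string, name string) ([]string, error) {
	lastErr := errors.New("no servers configured")
	for _, s := range servers {
		recs, err := q(s, name)
		if err == nil || errors.Is(err, errNXDomain) {
			return recs, err
		}
		lastErr = err
	}
	return nil, lastErr
}

// resolve walks the search-path candidates. It returns the first
// non-empty answer, an empty list only if every candidate was answered
// definitively, and errKeepTargets if any candidate errored out.
func resolve(q queryFunc, servers, names []string) ([]string, error) {
	allValid := true
	for _, name := range names {
		recs, err := queryAnyServer(q, servers, name)
		switch {
		case err == nil && len(recs) > 0:
			return recs, nil
		case err == nil || errors.Is(err, errNXDomain):
			// Definitive but empty: try the next candidate.
		default:
			allValid = false
		}
	}
	if !allValid {
		return nil, errKeepTargets
	}
	return nil, nil
}

func main() {
	names := []string{"web.corp.", "web.example.com."}
	fmt.Println(resolve(fakeQuery, []string{"down", "up"}, names))
	fmt.Println(resolve(fakeQuery, []string{"up"}, []string{"gone.example.com."}))
	fmt.Println(resolve(fakeQuery, []string{"down"}, names))
}
```

The caller would replace the target list on a nil error (even when the list is empty) and leave it untouched on `errKeepTargets`, which corresponds to test scenarios 4 and 5 in the message.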
Goutham Veeramachaneni f5aed810f9 logging: Port to common/promlog
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-09-15 12:40:50 +05:30
Matt Bostock e758260986 Marathon SD: Set port index label
The changes [1][] to Marathon service discovery to support multiple
ports mean that Prometheus now attempts to scrape all ports belonging to
a Marathon service.

You can use port definition or port mapping labels to filter out which
ports to scrape but that requires service owners to update their
Marathon configuration.

To allow for a smoother migration path, add a
`__meta_marathon_port_index` label, whose value is set to the port's
sequential index integer. For example, PORT0 has the value `0`, PORT1
has the value `1`, and so on.

This allows you to support scraping both the first available port (the
previous behaviour) in addition to ports with a `metrics` label.

For example, here's the relabel configuration we might use with
this patch:

    - action: keep
      source_labels: ['__meta_marathon_port_definition_label_metrics', '__meta_marathon_port_mapping_label_metrics', '__meta_marathon_port_index']
      # Keep if port mapping or definition has a 'metrics' label with any
      # non-empty value, or if no 'metrics' port label exists but this is the
      # service's first available port
      regex: ([^;]+;;[^;]+|;[^;]+;[^;]+|;;0)

This assumes that the Marathon API returns the ports in sorted order
(matching PORT0, PORT1, etc), which it appears that it does.

[1]: https://github.com/prometheus/prometheus/pull/2506
2017-09-11 13:40:51 +01:00
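To see which ports the relabel rule in e758260986 keeps, the matching step can be reproduced in a few lines. Prometheus joins the source label values with `;` by default and anchors the regex at both ends; the `keep` helper below is a hypothetical reimplementation of just that step, not the relabelling code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepRE is the regex from the commit message, anchored the way
// Prometheus anchors relabel regexes.
var keepRE = regexp.MustCompile(`^(?:([^;]+;;[^;]+|;[^;]+;[^;]+|;;0))$`)

// keep reports whether a port with the given label values would
// survive the "keep" action.
func keep(definitionLabel, mappingLabel, portIndex string) bool {
	joined := strings.Join([]string{definitionLabel, mappingLabel, portIndex}, ";")
	return keepRE.MatchString(joined)
}

func main() {
	fmt.Println(keep("yes", "", "2")) // port definition has a metrics label
	fmt.Println(keep("", "yes", "1")) // port mapping has a metrics label
	fmt.Println(keep("", "", "0"))    // no label, first port
	fmt.Println(keep("", "", "1"))    // no label, not the first port
}
```

The first three calls return true and the last returns false, matching the comment in the configuration: labelled ports are kept regardless of index, and an unlabelled port is kept only when its index is `0`.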
Fabian Reinartz e746282772 Merge branch 'master' into dev-2.0 2017-09-11 10:55:19 +02:00
Jamie Moore 7a135e0a1b Add the ability to assume a role for ec2 discovery 2017-09-10 00:36:43 +10:00
Fabian Reinartz d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Johannes 'fish' Ziemke 75aec7d970 k8s: Use versioned struct for ingress discovery 2017-09-06 12:47:03 +02:00
Fabian Reinartz 87918f3097 Merge branch 'master' into dev-2.0 2017-09-04 14:09:21 +02:00
Johannes 'fish' Ziemke 70f3d1e9f9 k8s: Support discovery of ingresses (#3111)
* k8s: Support discovery of ingresses

* Move additional labels below allocation

This makes it more obvious why the additional elements are allocated.
Also fix allocation for node where we only set a single label.

* k8s: Remove port from ingress discovery

* k8s: Add comment to ingress discovery example
2017-09-04 13:10:44 +02:00
Tobias Schmidt 29fff1eca4 Merge pull request #2966 from alkalinecoffee/consul-node-metadata
Add support for consul's node metadata
2017-09-02 18:43:25 +02:00
Tobias Schmidt d0a02703a2 Merge pull request #3105 from sak0/dev
discovery openstack: support discovery hypervisors, add rule option.
2017-08-31 14:08:16 +02:00
CuiHaozhi b1c18bf29b discovery openstack: support discovery hosts, add rule option.
Signed-off-by: CuiHaozhi <cuihz@wise2c.com>
2017-08-29 10:14:00 -04:00
Colstuwjx 2b49df2c61 Fix target group foreach nil bug, directly return err. 2017-08-22 08:37:39 +08:00
CuiHaozhi 31b6f8b04c discovery openstack: handle instances without ip
Signed-off-by: CuiHaozhi <cuihz@wise2c.com>
2017-08-11 12:36:12 -04:00
Fabian Reinartz 25f3e1c424 Merge branch 'master' into mergemaster 2017-08-10 17:04:25 +02:00
Fabian Reinartz ac511ecf30 Merge pull request #2970 from Gouthamve/docs/sd-interface
Add docs about SD interface
2017-08-01 22:44:28 +02:00
Goutham Veeramachaneni ab96e79bc8 Add docs about SD interface
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-08-01 13:53:50 +05:30
Fabian Reinartz 40db026381 Merge pull request #2957 from prometheus/sd-doc
Tweaks to SD README from review
2017-07-28 08:51:50 +02:00
Joe Martin aba41c7d0f add support for consul's node metadata 2017-07-18 16:46:16 -04:00
J. Taylor O'Connor 5a19ffb315 A few spelling corrections. (#2960) 2017-07-17 22:13:50 +01:00
Brian Brazil 84be97bd98 Tweaks to SD README from review 2017-07-17 14:20:54 +01:00
Brian Brazil 2a9ca394dd Document how/when to write service discovery (#2943) 2017-07-14 15:22:09 +01:00