prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Takahito Yamatoya	738a51bea6	#2371 fix to display utc date at datetime picker	2017-09-16 11:38:29 +09:00
Matt Palmer	3369422327	Improve DNS response handling to prevent "stuck" records [Fixes #2799] (#3138) The problem reported in #2799 was that in the event that all records for a name were removed, the target group was never updated to be the "empty" set. Essentially, whatever Prometheus last saw as a non-empty list of targets would stay that way forever (or at least until Prometheus restarted...). This came about because of a fairly naive interpretation of what a valid-looking DNS response actually looked like -- essentially, the only valid DNS responses were ones that had a non-empty record list. That's fine as long as your config always lists only target names which have non-empty record sets; if your environment happens to legitimately have empty record sets sometimes, all hell breaks loose (otherwise-cleanly shutdown systems trigger up==0 alerts, for instance). This patch is a refactoring of the DNS lookup behaviour that maintains existing behaviour with regard to search paths, but correctly handles empty and non-existent record sets. RFC1034 s4.3.1 says there are three ways a recursive DNS server can respond: 1. Here is your answer (possibly an empty answer, because of the way DNS considers all records for a name, regardless of type, when deciding whether the name exists). 2. There is no spoon (the name you asked for definitely does not exist). 3. I am a teapot (something has gone terribly wrong). Situations 1 and 2 are fine and dandy; whatever the answer is (empty or otherwise) is the list of targets. If something has gone wrong, then we shouldn't go updating the target list because we don't really know what the target list should be. Multiple DNS servers to query is a straightforward augmentation; if you get an error, then try the next server in the list, until you get an answer or run out of servers to ask. Only if all the servers return errors should you return an error to the calling code. Where things get complicated is the search path. 
In order to be able to confidently say, "this name does not exist anywhere, you can remove all the targets for this name because it's definitely GORN", at least one server for each of the possible names needs to return either a successful-but-empty response or NXDOMAIN. If any name errors out, then -- since that one might have been the one where the records came from -- you need to say "maintain the status quo until we get a known-good response". It is possible, though unlikely, that a poorly-configured DNS setup (say, one which had a domain in its search path for which all configured recursive resolvers respond with REFUSED) could result in the same "stuck" records problem we're solving here, but the DNS configuration should be fixed in that case, and there's nothing we can do in Prometheus itself to fix the problem. I've tested this patch on a local scratch instance in all the various ways I can think of: 1. Adding records (targets get scraped) 2. Adding records of a different type 3. Remove records of the requested type, leaving other type records intact (targets don't get scraped) 4. Remove all records for the name (targets don't get scraped) 5. Shutdown the resolver (targets still get scraped) There are no automated test suite additions, because there isn't a test suite for DNS discovery, and I was stretching my Go skills to the limit to make this happen; mock objects are beyond me.	2017-09-15 12:26:10 +02:00
Björn Rabenstein	4b8666b739	Merge pull request #3176 from prometheus/beorn7/release Backport the templating fix from master	2017-09-14 19:07:52 +02:00
beorn7	a3fd7dd335	Backport the templating fix from master The original fix is in commit `5f5d77848e`	2017-09-14 18:12:00 +02:00
Björn Rabenstein	df4bc3e407	Merge pull request #3170 from tomwilkie/1.7-2969-negative-shards Prevent number of remote write shards from going negative.	2017-09-14 13:29:34 +02:00
Tom Wilkie	4f8efdbd59	Prevent number of remote write shards from going negative. This can happen in the situation where the system scales up the number of shards massively (to deal with some backlog), then scales it down again as the number of samples sent during the time period is less than the number received.	2017-09-14 08:07:40 +01:00
Björn Rabenstein	4d8e7ca185	Merge pull request #3159 from mattbostock/1.7_marathon_sd_cherrypick Marathon SD: Set port index label	2017-09-12 18:53:40 +02:00
Matt Bostock	e758260986	Marathon SD: Set port index label The changes [1][] to Marathon service discovery to support multiple ports mean that Prometheus now attempts to scrape all ports belonging to a Marathon service. You can use port definition or port mapping labels to filter out which ports to scrape but that requires service owners to update their Marathon configuration. To allow for a smoother migration path, add a `__meta_marathon_port_index` label, whose value is set to the port's sequential index integer. For example, PORT0 has the value `0`, PORT1 has the value `1`, and so on. This allows you to support scraping both the first available port (the previous behaviour) in addition to ports with a `metrics` label. For example, here's the relabel configuration we might use with this patch: - action: keep source_labels: ['__meta_marathon_port_definition_label_metrics', '__meta_marathon_port_mapping_label_metrics', '__meta_marathon_port_index'] # Keep if port mapping or definition has a 'metrics' label with any # non-empty value, or if no 'metrics' port label exists but this is the # service's first available port regex: ([^;]+;;[^;]+|;[^;]+;[^;]+|;;0) This assumes that the Marathon API returns the ports in sorted order (matching PORT0, PORT1, etc), which it appears that it does. [1]: https://github.com/prometheus/prometheus/pull/2506	2017-09-11 13:40:51 +01:00
Björn Rabenstein	a5ddcf5fb2	Merge pull request #2979 from prometheus/beorn7/storage2 Fix iterator issue in varbit chunk	2017-07-21 19:38:23 +02:00
beorn7	ea5e7eafde	Fix #2965 We would overscan when hitting a value directly, interspersed with samples in between timestamps. Apparently, that happens rarely enough that it was only noticed recently.	2017-07-21 16:35:15 +02:00
beorn7	c06292af2f	Add test to expose #2965	2017-07-21 16:25:24 +02:00
Alexey Palazhchenko	b6f89a1982	Parse custom step parameter correctly. (#2928) Backport of `6a767b736b`. Refs #2827, #2861.	2017-07-10 21:05:40 +02:00
Frederic Branczyk	3afb3fffa3	Merge pull request #2836 from brancz/cut-1.7.1 cut 1.7.1	2017-06-12 13:41:21 +02:00
Frederic Branczyk	aef7104791	cut 1.7.1	2017-06-12 11:44:04 +02:00
Fabian Reinartz	f71d39a053	Merge pull request #2832 from brancz/fix-redirect web: fix double prefix redirect	2017-06-12 10:40:43 +02:00
Frederic Branczyk	9063f8dedd	web: fix double prefix	2017-06-10 12:07:43 +02:00
Frederic Branczyk	bfa37c8ee3	Merge pull request #2814 from brancz/cut-1.7.0 cut 1.7.0	2017-06-07 11:40:29 +02:00
Frederic Branczyk	5d9ac3565e	Merge pull request #2786 from tomwilkie/report-error-in-remote-write Remote write: read first line of response and include it in the error.	2017-06-07 10:21:34 +02:00
Frederic Branczyk	6d29f0c19f	cut 1.7.0	2017-06-07 10:00:48 +02:00
Frederic Branczyk	d26f789cb0	Merge pull request #2816 from brancz/gophercloud-vendoring vendor: fix mixed versions of gophercloud packages	2017-06-07 09:54:31 +02:00
Frederic Branczyk	d6a55c3013	vendor: fix mixed versions of gophercloud packages	2017-06-07 09:21:43 +02:00
Christian Groschupp	8f781e411c	Openstack Service Discovery (#2701) * Add openstack service discovery. * Add gophercloud code for openstack service discovery. * first changes for juliusv comments. * add gophercloud code for floatingip. * Add tests to openstack sd. * Add testify suite vendor files. * add copyright and make changes for code climate. * Fixed typos in provider openstack. * Renamed tenant to project in openstack sd. * Change type of password to Secret in openstack sd.	2017-06-01 23:49:02 +02:00
Roman Vynar	dbe2eb2afc	Hide consul token on UI. (#2797)	2017-06-01 22:14:23 +01:00
Fabian Reinartz	a391156dfb	Merge pull request #2667 from goller/go-discovery-logger Add logger injection into discovery services	2017-06-01 10:20:10 -07:00
Chris Goller	42de0ae013	Use log.Logger interface for all discovery services	2017-06-01 11:25:55 -05:00
Tom Wilkie	24a113bb09	Review feedback: limit number of bytes read under error.	2017-06-01 11:21:48 +01:00
Tom Wilkie	46abe8cbf2	Remote write: read first line of response and include it in the error.	2017-05-31 13:46:08 +01:00
Tobias Schmidt	1c9499bbbd	Merge pull request #2785 from prometheus/grobie/fix-target-group-naming Fix outdated target_group naming in error message	2017-05-31 11:28:46 +02:00
Tobias Schmidt	287ec6e6cc	Fix outdated target_group naming in error message The target_groups config has been renamed to static_configs, the error message for overflow attributes should reflect that.	2017-05-31 11:01:13 +02:00
Julius Volz	240bb671e2	config: Fix overflow checking in global config (#2783)	2017-05-30 20:58:06 +02:00
Julius Volz	e0f046396a	Fix InfluxDB retention policy usage in read adapter (#2781)	2017-05-29 16:24:24 +02:00
Stephan Erb	14eee34da3	Update vendored go-zookeeper client (#2778) It is likely this will fix #2758.	2017-05-29 15:59:30 +02:00
Benjamin	51626f2573	change deprecated maintainer to label (#2724)	2017-05-29 15:58:40 +02:00
Conor Broderick	6766123f93	Replace regex with Secret type and remarshal config to hide secrets (#2775)	2017-05-29 12:46:23 +01:00
Brian Brazil	d66799d7f3	Show gaps in graphs. (#2766) Fixes #345	2017-05-26 16:17:48 +01:00
Tobias Schmidt	2a426bfead	Revert "Use tag names consistently (#2743 )" Apparently, a decision was made at some point to only use the v prefix in tags and similar contexts where other things can appear. There was a vote to stick to that decision. For more information, read https://github.com/prometheus/prometheus/pull/2743. This reverts commit `5405a4724f`.	2017-05-25 08:57:01 +02:00
conorbroderick	9c953064c3	check if result is a scalar in order to display correct number of returned time series	2017-05-24 14:07:24 +01:00
Fabian Reinartz	09fcbf78df	Merge pull request #2755 from brancz/redirect-prefix prefix redirect with external url path	2017-05-24 10:09:47 +02:00
Tobias Schmidt	5405a4724f	Use tag names consistently (#2743)	2017-05-23 14:14:15 +02:00
Frederic Branczyk	ad22606a3d	web: prefix redirect with ExternalURL path	2017-05-22 14:56:52 +02:00
Frederic Branczyk	45df5c2daf	Merge branch 'release-1.6'	2017-05-22 13:44:44 +02:00
Jacky Wu	75b89739de	Fix go version hint. (#2750)	2017-05-20 18:33:14 +02:00
Frederic Branczyk	7d17ecbd48	Merge pull request #2735 from brancz/cut-1.6.3 cut 1.6.3	2017-05-18 16:56:54 +02:00
Frederic Branczyk	53a2bd71b9	*: cut 1.6.3	2017-05-18 16:51:46 +02:00
Tobias Schmidt	2ae2b663a9	Create sha256 checksums file during release	2017-05-18 16:50:44 +02:00
Tom Wilkie	e9787382b4	Ensure ewma int64s are always aligned. (#2675)	2017-05-18 16:50:44 +02:00
Frederic Branczyk	363554f675	Merge pull request #2739 from Conorbro/stack-graph-fix Fixed graph ui max/min logic to accommodate for toggling of stacked graph option	2017-05-18 16:49:30 +02:00
conorbroderick	9287a01bbf	Fixed fixed yaxis of stacked graph being cut off	2017-05-18 15:18:29 +01:00
Frederic Branczyk	b916b3784b	Merge pull request #2731 from brancz/lset-non-cloned notifier: clone and not reuse LabelSet in AM discovery	2017-05-18 10:59:38 +02:00
Frederic Branczyk	94e8b43aae	notifier: clone and not reuse LabelSet in AM discovery	2017-05-18 10:12:42 +02:00
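Commit 3369422327 describes its DNS resolution rules only in prose. The Go sketch below restates them under stated assumptions; it is not Prometheus's code and performs no real DNS traffic. `Outcome`, `QueryFunc`, `Lookup` and `ErrUnknown` are hypothetical names, and each server query is faked by a caller-supplied function. It models the three RFC 1034 outcomes, tries servers in order until one does not error, walks the search-path candidates, and reports an empty target list only when no candidate failed on every server.

```go
package main

import (
	"errors"
	"fmt"
)

// Outcome is one of the three response kinds from RFC 1034 section 4.3.1.
type Outcome int

const (
	Answer   Outcome = iota // "here is your answer" (the record list may be empty)
	NXDomain                // "there is no spoon": the name does not exist
	Failure                 // "I am a teapot": SERVFAIL, REFUSED, timeout, ...
)

// QueryFunc is a hypothetical stand-in for sending one query to one server.
type QueryFunc func(server, name string) (records []string, out Outcome)

// ErrUnknown tells the caller to keep its current targets unchanged.
var ErrUnknown = errors.New("dns: no usable answer for at least one search-path name")

// queryAnyServer asks each server in turn and stops at the first non-failure.
func queryAnyServer(q QueryFunc, servers []string, name string) ([]string, Outcome) {
	for _, s := range servers {
		if recs, out := q(s, name); out != Failure {
			return recs, out
		}
	}
	return nil, Failure // every server errored for this name
}

// Lookup tries each search-path candidate in order. A non-empty answer wins
// immediately; an empty target list is returned only if no candidate failed.
func Lookup(q QueryFunc, servers, candidates []string) ([]string, error) {
	allValid := true
	for _, name := range candidates {
		recs, out := queryAnyServer(q, servers, name)
		switch {
		case out == Answer && len(recs) > 0:
			return recs, nil
		case out == Failure:
			allValid = false // this name might have held the records
		}
		// Empty answers and NXDOMAIN fall through to the next candidate.
	}
	if !allValid {
		return nil, ErrUnknown
	}
	return []string{}, nil // confirmed empty: safe to drop all targets
}

func main() {
	// A fake network: ns1 is broken, ns2 answers from a fixed table.
	zone := map[string][]string{"web.example.org": {"10.0.0.1"}, "empty.example.org": {}}
	q := func(server, name string) ([]string, Outcome) {
		if server == "ns1" {
			return nil, Failure
		}
		if recs, ok := zone[name]; ok {
			return recs, Answer
		}
		return nil, NXDomain
	}
	servers := []string{"ns1", "ns2"}
	fmt.Println(Lookup(q, servers, []string{"web.example.com", "web.example.org"}))
	fmt.Println(Lookup(q, servers, []string{"empty.example.com", "empty.example.org"}))
	fmt.Println(Lookup(q, []string{"ns1"}, []string{"web.example.org"}))
}
```

One design choice here is an assumption: a candidate that fails does not block a later candidate that returns records. It only prevents the "definitely empty" conclusion, which is the case the commit message is concerned with.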

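The relabel rule in commit e758260986 is easier to check with concrete label values. Prometheus joins `source_labels` with the default `;` separator and anchors relabeling regexes at both ends. The Go sketch below imitates `action: keep` for the commit's three labels. `keep` and `keepRe` are hypothetical helpers, not Prometheus APIs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepRe is the regex from the commit message, anchored at both ends the way
// Prometheus anchors relabeling regexes.
var keepRe = regexp.MustCompile(`^(?:([^;]+;;[^;]+|;[^;]+;[^;]+|;;0))$`)

// keep mimics `action: keep` for the three source labels:
// port definition "metrics" label, port mapping "metrics" label, port index.
func keep(definitionMetrics, mappingMetrics, portIndex string) bool {
	joined := strings.Join([]string{definitionMetrics, mappingMetrics, portIndex}, ";")
	return keepRe.MatchString(joined)
}

func main() {
	cases := [][3]string{
		{"true", "", "1"}, // port definition labelled metrics
		{"", "true", "2"}, // port mapping labelled metrics
		{"", "", "0"},     // unlabelled first port (previous behaviour)
		{"", "", "1"},     // unlabelled non-first port
	}
	for _, c := range cases {
		fmt.Printf("%q -> keep=%v\n", strings.Join(c[:], ";"), keep(c[0], c[1], c[2]))
	}
}
```

The first three cases are kept and the fourth is dropped. As written, the pattern expects at most one of the two `metrics` labels to be set. A port carrying both would produce `true;true;N`, which matches none of the alternatives.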
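Commit 4f8efdbd59 notes that the computed number of remote write shards could go negative after a large scale-up followed by a scale-down. A minimal guard is to clamp whatever the scaling formula proposes. The sketch below assumes a lower bound of one shard and a configured maximum. `clampShards` and `maxShards` are hypothetical names, and the sketch does not reproduce Prometheus's actual scaling formula.

```go
package main

import (
	"fmt"
	"math"
)

// clampShards rounds a proposed (possibly negative) shard count up and
// bounds it to [1, maxShards], so the queue always keeps at least one shard.
func clampShards(proposed float64, maxShards int) int {
	n := int(math.Ceil(proposed))
	if n > maxShards {
		return maxShards
	}
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	// Proposals a rate-based estimate might produce, including a negative one
	// seen after a backlog has drained faster than new samples arrive.
	for _, p := range []float64{-3.2, 0.4, 2.1, 25.1} {
		fmt.Println(p, "->", clampShards(p, 10))
	}
}
```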

3905 commits