prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-03-05 20:59:13 -08:00

Author	SHA1	Message	Date
Tom Wilkie	4d9b917d11	Instrument Prometheus with OpenTracing (#2554) * Use request.Context() instead of a global map of contexts. * Add some basic opentracing instrumentation on the query path. * Remove tracehandler endpoint.	2017-05-02 18:49:29 -05:00
Stephan Erb	0b9fca983b	Fix reload of ZooKeeper service discovery config (#2669) Rationale: * When the config is reloaded and the provider context is canceled, we need to exit the current ZK `TargetProvider.Run` method as a new provider will be instantiated. * In case `Stop` is called on the `ZookeeperTreeCache`, the update/events channel may not be closed as it is shared by multiple caches and would thus be double closed. * Stopping all `zookeeperTreeCacheNode`s on teardown ensures all associated watcher goroutines will be closed eagerly rather than implicitly on connection close events.	2017-05-02 18:21:37 -05:00
Fabian Reinartz	86426c0566	Merge pull request #2672 from svend/kubernetes-pods-port-comment Document what ports are scraped by default in k8s example	2017-05-02 11:12:13 +02:00
Svend Sorensen	94a3e863e4	Document what ports are scraped by default in k8s example The Kubernetes pod SD creates a target for each declared port, as documented: https://prometheus.io/docs/operating/configuration/#pod > The pod role discovers all pods and exposes their containers as targets. For > each declared port of a container, a single target is generated. If a > container has no specified ports, a port-free target per container is created > for manually adding a port via relabeling. This results in the default port being the declared port, or no port if none are declared.	2017-05-01 15:58:48 -07:00
Conor Broderick	314b81062d	Updated vendoring for log level reporting issue (#2660)	2017-04-27 14:25:13 +01:00
Julius Volz	fe11c5933a	Fix mutation of active alert elements by notifier (#2656) This caused the external label application in the notifier to bleed back into the rule manager's active alerting elements.	2017-04-26 10:29:42 -05:00
Fabian Reinartz	5248118b10	Merge pull request #2654 from dsymonds/master Add maintainers' GitHub usernames to MAINTAINERS.md.	2017-04-25 08:43:36 +02:00
David Symonds	8bb07490a2	Add maintainers' GitHub usernames to MAINTAINERS.md. CONTRIBUTING.md instructs people to loop them in using that mechanism, but nothing lists the right username.	2017-04-25 16:32:23 +10:00
Fabian Reinartz	60d9138b6b	Merge pull request #2653 from dsymonds/master Preserve Alertmanager URLs as *url.URL.	2017-04-25 08:27:31 +02:00
David Symonds	04ad889751	Preserve Alertmanager URLs as *url.URL. Render a nicer link in the web UI.	2017-04-25 16:17:46 +10:00
Conor Broderick	9eb1a5d6bf	Handle invalid query in graph UI (#2652)	2017-04-24 10:50:57 +01:00
Brian Brazil	8b8ba26129	Merge pull request #2644 from prometheus/release-1.6 Merge 1.6.1 release from 1.6 branch	2017-04-19 15:22:24 +01:00
Brian Brazil	8097a3c523	Cut v1.6.1 (#2640)	2017-04-19 14:23:56 +01:00
beorn7	e499ef8cac	Merge bug fixes from branch 'release-1.6'	2017-04-18 18:06:01 +02:00
Björn Rabenstein	872ed88166	Merge pull request #2638 from prometheus/beorn7/storage storage: Don't panic if storage has no FPs even after initial wait	2017-04-18 17:02:07 +02:00
beorn7	1dd737d7c3	storage: Don't panic if storage has no FPs even after initial wait	2017-04-18 15:59:12 +02:00
Matt Layher	1faf33acac	Add promlint check for histogram/summary reserved names (#2626)	2017-04-15 22:38:01 +01:00
Tobias Schmidt	09a977a782	Create sha256 checksums file during release	2017-04-15 12:26:51 -03:00
Tobias Schmidt	619cc0e0ff	Merge pull request #2625 from mdlayher/promlint-cleanup Simplify promlint problems gathering, use protobuf accessors	2017-04-14 22:47:30 +02:00
Matt Layher	cc4198f421	Simplify promlint problems gathering, use protobuf accessors	2017-04-14 16:40:40 -04:00
Matt Layher	34a4813464	Initial promlint counter _total suffix check (#2624)	2017-04-14 22:09:54 +02:00
Matt Layher	254cb1ec29	Use untyped metrics for some promlint tests (#2623)	2017-04-14 19:38:57 +01:00
Björn Rabenstein	67d511784d	Merge pull request #2619 from prometheus/release-1.6 Cut v1.6.0	2017-04-14 20:12:22 +02:00
beorn7	10f6453829	Cut v1.6.0	2017-04-14 19:53:58 +02:00
Jack Neely	896f951e68	Force buckets in a histogram to be monotonic for quantile estimation (#2610) * Force buckets in a histogram to be monotonic for quantile estimation The assumption that bucket counts increase monotonically with increasing upperBound may be violated during: * Recording rule evaluation of histogram_quantile, especially when rate() has been applied to the underlying bucket timeseries. * Evaluation of histogram_quantile computed over federated bucket timeseries, especially when rate() has been applied This is because scraped data is not made available to RR evaluation or federation atomically, so some buckets are computed with data from the N most recent scrapes, but the other buckets are missing the most recent observations. Monotonicity is usually guaranteed because if a bucket with upper bound u1 has count c1, then any bucket with a higher upper bound u > u1 must have counted all c1 observations and perhaps more, so that c >= c1. Randomly interspersed partial sampling breaks that guarantee, and rate() exacerbates it. Specifically, suppose bucket le=1000 has a count of 10 from 4 samples but the bucket with le=2000 has a count of 7, from 3 samples. The monotonicity is broken. It is exacerbated by rate() because under normal operation, cumulative counting of buckets will cause the bucket counts to diverge such that small differences from missing samples are not a problem. rate() removes this divergence. bucketQuantile depends on that monotonicity to do a binary search for the bucket with the qth percentile count, so breaking the monotonicity guarantee causes bucketQuantile() to return undefined (nonsense) results. As a somewhat hacky solution until the Prometheus project is ready to accept the changes required to make scrapes atomic, we calculate the "envelope" of the histogram buckets, essentially removing any decreases in the count between successive buckets. 
* Fix up comment docs for ensureMonotonic * ensureMonotonic: Use switch statement Use switch statement rather than if/else for better readability. Process the most frequent cases first.	2017-04-14 16:21:49 +02:00
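The "envelope" calculation described in commit 896f951e68 above can be sketched in Go roughly as follows. This is a minimal illustration and not the code from the commit: the `bucket` type, its field names, and this body of `ensureMonotonic` are assumptions made for the example, and the buckets are assumed to be sorted by ascending upper bound already.

```go
package main

import "fmt"

// bucket is a hypothetical stand-in for one cumulative histogram bucket.
type bucket struct {
	upperBound float64
	count      float64
}

// ensureMonotonic raises any bucket count that is lower than the highest
// count seen so far, so that counts never decrease as upperBound grows.
// It assumes buckets is sorted by ascending upperBound.
func ensureMonotonic(buckets []bucket) {
	if len(buckets) == 0 {
		return
	}
	highest := buckets[0].count
	for i := 1; i < len(buckets); i++ {
		switch {
		case buckets[i].count > highest:
			// Normal case: the cumulative count grew.
			highest = buckets[i].count
		case buckets[i].count < highest:
			// A decrease, for example from a missing scrape: clamp it.
			buckets[i].count = highest
		}
	}
}

func main() {
	// The le=1000 / le=2000 example from the commit message, plus one more bucket.
	bs := []bucket{{1000, 10}, {2000, 7}, {5000, 12}}
	ensureMonotonic(bs)
	fmt.Println(bs)
}
```

With the example from the commit message, the le=2000 bucket is lifted from 7 to 10, so a later binary search over the counts sees a non-decreasing sequence.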
Matt Layher	283756c503	Initial commit of 'promtool check-metrics', promlint package (#2605)	2017-04-13 23:53:41 +02:00
Conor Broderick	ee62807b62	Added min/max to graph to accommodate for constant time series (#2612) Added min/max to graph to accommodate constant time series	2017-04-12 14:25:25 +01:00
Björn Rabenstein	1fb2190eeb	Merge pull request #2607 from prometheus/beorn7/storage Vendoring update prior to 1.6 release	2017-04-11 14:31:58 +02:00
beorn7	c53f256a09	storage: Fix use of counter (Set -> Add)	2017-04-11 12:58:24 +02:00
beorn7	1ae50b1d1b	vendoring: Update client_golang/prometheus This is mostly required to enable summaries without quantiles	2017-04-11 12:58:24 +02:00
beorn7	92d4cf7663	vendoring: Remove unused packages	2017-04-11 12:58:24 +02:00
Brian Brazil	0e0fc5a7f4	Correct example name to adapter. (#2590)	2017-04-10 17:24:53 +01:00
Björn Rabenstein	acd72ae1a7	Merge pull request #2591 from prometheus/beorn7/storage storage: Several optimizations of checkpointing	2017-04-07 20:02:14 +02:00
Goutham Veeramachaneni	cffb1acf7f	Test Longer Tests in Travis (#2570) * Test Longer Tests in Travis Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Make test Target Run All Tests * Add test-short to run short tests test is running all the tests now as we are running make tests in CircleCI and I think the base image is shared across Prometheus Org. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in> * Remove Empty Line Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-04-07 13:46:06 +02:00
beorn7	f20b84e816	flags: Improve doc strings for checkpoint flags	2017-04-07 13:10:12 +02:00
beorn7	f338d791d2	storage: Several optimizations of checkpointing - checkpointSeriesMapAndHeads accepts a context now to allow cancelling. - If a shutdown is initiated, cancel the ongoing checkpoint. (We will create a final checkpoint anyway.) - Always wait for at least as long as the last checkpoint took before starting the next checkpoint (to cap the time spent checkpointing at 50%). - If an error has occurred during checkpointing, don't bother to sync the write. - Make sure the temporary checkpoint file is deleted, even if an error has occurred. - Clean up the checkpoint loop a bit. (The concurrent Timer.Reset(0) call might have caused a race.)	2017-04-07 13:10:12 +02:00
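The pacing rule in commit f338d791d2 above (wait at least as long as the last checkpoint took, and stop when a shutdown cancels the context) might look roughly like the sketch below. All names here (`nextCheckpointDelay`, `checkpointLoop`, the `checkpoint` callback) are hypothetical and chosen for illustration; this is not the Prometheus storage code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// nextCheckpointDelay returns how long to wait before the next checkpoint:
// the configured interval, or the duration of the last checkpoint if that
// was longer. Waiting at least as long as the last run keeps the share of
// time spent checkpointing at or below 50%.
func nextCheckpointDelay(interval, lastDuration time.Duration) time.Duration {
	if lastDuration > interval {
		return lastDuration
	}
	return interval
}

// checkpointLoop runs checkpoint repeatedly until ctx is cancelled.
// The same ctx is passed to checkpoint so that an ongoing run can be
// abandoned when a shutdown is initiated.
func checkpointLoop(ctx context.Context, interval time.Duration, checkpoint func(context.Context) error) {
	delay := interval
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(delay):
		}
		start := time.Now()
		if err := checkpoint(ctx); err != nil {
			fmt.Println("checkpoint failed:", err)
		}
		delay = nextCheckpointDelay(interval, time.Since(start))
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	checkpointLoop(ctx, 10*time.Millisecond, func(ctx context.Context) error {
		time.Sleep(20 * time.Millisecond) // pretend the checkpoint is slow
		return nil
	})
}
```

Because the simulated checkpoint takes longer than the interval, each wait stretches to roughly the checkpoint's own duration rather than the configured 10 ms.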
Björn Rabenstein	934d86b936	Merge pull request #2593 from prometheus/beorn7/storage2 storage: Recover from corrupted indices for archived series	2017-04-07 12:55:35 +02:00
Goutham Veeramachaneni	0f48d07f95	Fix Map Race by Moving Locking closer to the Write (#2476)	2017-04-07 08:55:01 +02:00
Julius Volz	182d7de9cd	Merge pull request #2597 from richardkiene/CMON-53 Add triton zone brand metadata	2017-04-07 01:02:02 +02:00
Björn Rabenstein	38bcba11fe	Merge pull request #2594 from prometheus/beorn7/storage3 storage: Guard against a corner case of data corruption	2017-04-07 00:52:28 +02:00
Björn Rabenstein	f0076aca01	Merge pull request #2595 from prometheus/beorn7/storage4 storage: Guard against appending to evicted chunk	2017-04-07 00:51:53 +02:00
Tom Wilkie	e5d7bbfc3c	Remote writes: retry on recoverable errors. (#2552) * Remote writes: retry on recoverable errors. * Add comments * Review feedback * Comments * Review feedback * Final spelling misteak (I hope). Plus, record failed samples correctly.	2017-04-07 00:15:41 +02:00
Richard Kiene	ec692f6161	Add triton zone brand metadata	2017-04-06 21:35:42 +00:00
beorn7	7199a9d9d4	storage: Guard against appending to evicted chunk Fixes #2480. For a certain definition of "fixes". This is something that should never happen. Sadly, it does happen, albeit extremely rarely. This could be some weird corner case we haven't covered yet. Or it happens as a consequence of data corruption or a crash recovery gone bad. This is not a "real" fix as we don't know the root cause of the incident reported in #2480. However, this makes sure the server does not crash, but deals gracefully with the problem: The series in question is quarantined, which even makes it available for forensics.	2017-04-06 20:02:52 +02:00
beorn7	3d12906286	storage: Guard against a corner case of data corruption Fixes #2475.	2017-04-06 19:50:32 +02:00
beorn7	4fcc73a04c	storage: Recover from corrupted indices for archived series An unopenable archived_fingerprint_to_timerange is simply deleted and will be rebuilt during crash recovery (which can then take quite some time). An unopenable archived_fingerprint_to_metric is not deleted but instructions to the user are logged. A deletion has to be done by the user explicitly as it means losing all archived series (and a repair with a 3rd party tool might still be possible).	2017-04-06 19:26:39 +02:00
Julius Volz	9775ad4754	Merge pull request #2588 from prometheus/read-multi Separate out remote read responses.	2017-04-06 17:10:31 +02:00
Conor Broderick	c72692fd75	Fixed issue of partially hidden y-axis values on graph (#2589)	2017-04-06 16:04:44 +01:00
Brian Brazil	c813c824d4	Separate out remote read responses. Fixes #2574	2017-04-06 15:49:47 +01:00
Björn Rabenstein	516a96d9a3	Merge pull request #2587 from prometheus/beorn7/storage2 storage: Mark storage as dirty if indexing fails	2017-04-06 16:42:06 +02:00

3827 commits