prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-14 09:34:05 -08:00

Author	SHA1	Message	Date
Julien Pivotto	645b71e9ef	Fix snapshots (#7217 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-05-07 10:03:48 +01:00
Ganesh Vernekar	d4b9fe801f	M-map full chunks of Head from disk (#6679 ) When appending to the head and a chunk is full, it is flushed to the disk and m-mapped (memory mapped) to free up memory. Prom startup now happens in these stages: - Iterate the m-mapped chunks from disk and keep a map of series reference to its slice of m-mapped chunks. - Iterate the WAL as usual. Whenever we create a new series, look for its m-mapped chunks in the map created before and add them to that series. If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption is recovered from the existing WAL, which means that a corruption in m-mapped files results in NO data loss. [M-mapped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md) - main difference is that the chunk for m-mapping now also includes the series reference because there is no index for mapping series to chunks. [The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index, which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files. In case of m-mapped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file. Prombench results _WAL Replay_ 1h WAL replay time: 30% less WAL replay time - 4m31 vs 3m36 2h WAL replay time: 20% less WAL replay time - 8m16 vs 7m _Memory During WAL Replay_ High Churn: 10-15% less RAM - 32gb vs 28gb 20% less RAM after compaction 34gb vs 27gb No Churn: 20-30% less RAM - 23gb vs 18gb 40% less RAM after compaction 32.5gb vs 20gb Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-05-06 21:00:00 +05:30
Ben Ye	1e4e37144d	Fixed wrongly handled not ready TSDB on web and API. (#7182 ) * fix federate endpoint panic Signed-off-by: yeya24 <yb532204897@gmail.com> * Fixed all cases of not ready TSDB being wrongly handled. * Fixed issue for federation. * Ensured this will never happen again thanks to interfaces * Fixes same issue for stats. * Added tests for readiness. * Fixed bug in stats. It was: status.MaxTime = db.Head().MaxTime() status.MinTime = db.Head().MaxTime() Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Brian's comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-04-29 17:16:14 +01:00
Vasily Sliouniaev	0393b188c9	Add Jaeger (#7148 ) * Trace remote read Signed-off-by: vas <vasily.sliouniaev@jet.com> * Use jaeger Signed-off-by: vas <vasily.sliouniaev@jet.com>	2020-04-23 02:05:55 +02:00
Marek Slabicki	8224ddec23	Capitalizing first letter of all log lines (#7043 ) Signed-off-by: Marek Slabicki <thaniri@gmail.com>	2020-04-11 09:22:18 +01:00
Brian Brazil	7646cbca32	Use .UTC everywhere we use time.Unix (#7066 ) time.Unix attaches the local timezone, which can then leak out (e.g. in the alert json). While this is harmless, we should be consistent. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2020-03-29 17:35:39 +01:00
Ben Kochie	269e7c8091	Fix golint issues. Signed-off-by: Ben Kochie <superq@gmail.com>	2020-03-23 20:38:43 +01:00
johncming	bbacd2dd09	remove needless break. (#7008 ) Signed-off-by: johncming <johncming@yahoo.com>	2020-03-19 11:21:00 +00:00
李国忠	52025bd7a9	[comments] change word ‘wheter’ to ‘whether’ (#6912 ) * [comments] change word ‘wheter’ to ‘whether’ Signed-off-by: fuling <fuling.lgz@alibaba-inc.com> * [comments] change word ‘wheter’ to ‘whether’ Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>	2020-03-02 13:51:24 +05:30
Tobias Guggenmos	4835bbf376	Merge branch 'master' into split_parser	2020-02-19 15:18:13 +01:00
Bartlomiej Plotka	48ead578a0	Moved tsdbconfig to main. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-18 11:25:36 +00:00
Bartlomiej Plotka	a20bebf7eb	Moved readyStorage to main. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	8a775bc468	Moved unit agnostic options to separate pkg. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	59c9d6ef45	Addressed Brian's comments, moved metrics to main.go Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	cfba92a133	Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:57 +00:00
Bartlomiej Plotka	34426766d8	Unify Iterator interfaces. All point to storage now. This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things. All todos I added will be fixed in follow up PRs. * querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged with storage interface.go. All imports that. * querier.SeriesIterator replaced by chunkenc.Iterator * Added chunkenc.Iterator.Seek method and tests for xor implementation (?) * Since we properly handle SelectParams for Select methods I adjusted min max based on that. This should help in terms of performance for queries with functions like offset. * added Seek to deletedIterator and test. * storage/tsdb was removed as it was only a unnecessary glue with incompatible structs. No logic was changed, only different source of abstractions, so no need for benchmarks. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>	2020-02-17 18:03:54 +00:00
Tobias Guggenmos	454ba12676	Fix build errors in promtool Signed-off-by: Tobias Guggenmos <tguggenm@redhat.com>	2020-02-17 16:09:23 +01:00
Björn Rabenstein	af04cb22c8	Merge pull request #6821 from prometheus/release-2.16 Release 2.16	2020-02-14 13:10:14 +01:00
Julien Pivotto	ff0003e072	Make lookbackDelta an option of QueryEngine (#6746 ) * Make lookbackDelta an option of QueryEngine Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * julius' suggestion Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * remove trivial getter Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Assume lookback delta is always > 0 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * add debug log Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * don't expose lookback delta Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Specify that lookback delta is also used in federation Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * Fix federation test While we have added some logic to the promql engine to keep it backwards compatible and have a 5 minute lookback by default, the web/ package is likely to really be internal to Prometheus and we should not add the same kind of heuristics here. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu> * lookback delta: Fix debug log Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-02-10 00:58:23 +01:00
Julien Pivotto	d799078c88	also test start and end Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-02-08 16:42:50 +01:00
Julien Pivotto	881dde505a	promql: fix promql query log step unit Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-02-08 16:26:56 +01:00
Julien Pivotto	3c4c01eae2	Fix race in Query Log Test (#6727 ) A data race can happen if we run t.Log after the test t is done -- which in this case is highly possible because of the use of subtests and the fact that we call t.Log in a goroutine. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-30 13:51:18 -08:00
Julien Pivotto	9adad8ad30	Remove MaxConcurrent from the PromQL engine opts (#6712 ) Since we use ActiveQueryTracker to check for concurrency in `d992c36b3a` it does not make sense to keep the MaxConcurrent value as an option of the PromQL engine. This pull request removes it from the PromQL engine options, sets the max concurrent metric to -1 if there is no active query tracker, and use the value of the active query tracker otherwise. It removes dead code and also will inform people who import the promql package that we made that change, as it breaks the EngineOpts struct. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-28 20:38:49 +00:00
Julien Pivotto	5f27ac3583	Refactor query log fields (#6694 ) * Refactor query log fields Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-27 09:53:10 +00:00
Julien Pivotto	2b2eb79e8b	Add windows tests for query logger (#6653 ) * Add windows tests * Do not rely on time.Time in timer Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-20 13:17:11 +00:00
Julien Pivotto	0eb34299da	End-to-end Query Log test (#6600 ) * End-to-end Query Log test Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-19 21:56:13 +00:00
Julien Pivotto	1a58d2657d	Removed compilation step inside main_test (#6658 ) Inspired by https://github.com/prometheus/prometheus/pull/6347 and https://github.com/prometheus/prometheus/pull/6347#issuecomment-570151979 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-19 07:14:25 +00:00
Harkishen Singh	84e6459c4d	Adds support for line-column numbers for invalid rules, promtool (#6533 ) Signed-off-by: Harkishen Singh <harkishensingh@hotmail.com>	2020-01-15 18:07:54 +00:00
Julien Pivotto	3885562587	Query Logging styling (#6594 ) - Fix Json vs JSON in activequerylogger - Fix SetQueryLogger always returns nil Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-09 21:11:39 +00:00
Julien Pivotto	9d9bc524e5	Add query log (#6520 ) * Add query log, make stats logged in JSON like in the API Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-08 13:28:43 +00:00
Simon Pasquier	cccd542891	*: avoid missed Alertmanager targets (#6455 ) This change makes sure that nearly-identical Alertmanager configurations aren't merged together. The config's identifier was the MD5 hash of the configuration serialized to JSON but because `relabel.Regexp` has no public field and doesn't implement the JSON.Marshaler interface, it was always serialized to "{}". In practice, the identifier can be based on the index of the configuration in the list. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-12-12 17:00:19 +01:00
Brooks Swinnerton	0ea3a2218d	Add time units to storage.tsdb.retention.size flag (#6365 ) * Add time units to storage.tsdb.retention.size flag In an effort to reduce confusion with the `m` option of the `ParseDuration()` function, this commit adds the available time units to the `storage.tsdb.retention.time` flag to help showcase that there is no option for months (which could be assumed to be `m`). If someone were looking to set the retention to six months, they may mistakenly do so with `6m`, which would reduce their retention to six minutes. Signed-off-by: Brooks Swinnerton <bswinnerton@gmail.com>	2019-11-30 08:00:51 +00:00
johncming	ad4bc5701e	remove unwanted break (#6338 ) Signed-off-by: johncming <johncming@yahoo.com>	2019-11-18 23:01:03 -08:00
akerele abraham	9d39fdad0c	unittest: check for rule files existence (#6075 ) Signed-off-by: akerele abraham <abrahamakerele38@gmail.com>	2019-11-18 13:54:52 -08:00
Chris Marchbanks	1d1f64b4bc	Fix Promtool showing false duplicate rule warnings (#6270 ) Alert rules do not use the Record field, so any alerts with the same labels and different names would be counted as being duplicates. Promtool will now consider either field when finding duplicates. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-11-05 11:22:31 -07:00
Simon Pasquier	ddff1480a7	cmd/promtool: improve output for PromQL tests (#6052 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-09-25 09:26:29 +02:00
Harkishen Singh	e097c70e6d	add checks for metrics and display duplicate fields (#6026 ) Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2019-09-20 11:29:47 +01:00
Simon Pasquier	06066a3619	*: improve error messages when parsing bad rules (#5965 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-08-28 17:36:48 +02:00
Sayan Chowdhury	cb66e325d8	Show the warnings during label query (#5924 ) This patch loops through the warnings while querying the label and spits the output to stderr Fixes #5885 Signed-off-by: Sayan Chowdhury <sayan.chowdhury2012@gmail.com>	2019-08-24 19:42:21 +02:00
Bartek Płotka	48b2c9c8ea	remote-read: streamed chunked server side; Extended protobuf; Added chunked, checksummed reader (#5703 ) Part of: https://github.com/prometheus/prometheus/issues/4517 and https://github.com/improbable-eng/thanos/issues/488 Changes: * Extended protobuf for chunked remote read and negotiation. * Added checksummed, chunked Writer/Reader. * Added Server side implementation for chunked streamed remote-read. Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2019-08-19 21:16:10 +01:00
Bartek Płotka	5cb32d67f9	Merge pull request #5893 from prometheus/unify-tsdbutil Removed extra tsdb/testutil after merge.	2019-08-15 12:07:59 +01:00
Bartek Plotka	f0863a604e	Removed extra tsdb/testutil after merge. Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2019-08-14 10:12:32 +01:00
Julius Volz	b5c833ca21	Update go.mod dependencies before release (#5883 ) * Update go.mod dependencies before release Signed-off-by: Julius Volz <julius.volz@gmail.com> * Add issue for showing query warnings in promtool Signed-off-by: Julius Volz <julius.volz@gmail.com> * Revert json-iterator back to 1.1.6 It produced errors when marshaling Point values with special float values. Signed-off-by: Julius Volz <julius.volz@gmail.com> * Fix expected step values in promtool tests after client_golang update Signed-off-by: Julius Volz <julius.volz@gmail.com> * Update generated protobuf code after proto dep updates Signed-off-by: Julius Volz <julius.volz@gmail.com>	2019-08-14 11:00:39 +02:00
Advait Bhatwadekar	5d401f1e1b	Added query logging for prometheus. Issue #1315 (#5794 ) * Added query logging for prometheus. Options added: 1) active.queries.filepath: Filename where queries will be recorded 2) active.queries.filesize: Size of the file where queries will be recorded. Functionality added: All active queries are now logged in a file. If prometheus crashes unexpectedly, these queries are also printed out on stdout in the rerun. Queries are written concurrently to an mmapped file, and removed once they are done. Their positions in the file are reused. They are written in JSON format. However, due to the dynamic nature of the application, the JSON has an extra comma after the last query, and is missing an ending ']'. There may also be null bytes in the tail of the file. Signed-off-by: Advait Bhatwadekar <advait123@ymail.com>	2019-07-31 16:12:43 +01:00
Simon Pasquier	75886e0464	cmd/promtool: fix panic with empty exp_labels Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-07-17 17:02:31 +02:00
Chris Marchbanks	06f1ba73eb	Provide flag to compress the tsdb WAL Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-07-03 08:03:29 -06:00
Tom Wilkie	851131b074	Allow injection of arbitrary headers in promtool, for auth etc. (#4389 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-06-30 11:50:23 +01:00
Simon Pasquier	be67b8d460	web: fix flaky TestHTTPMetrics() (#5695 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-06-24 15:48:15 +02:00
Björn Rabenstein	dc22f74153	Merge pull request #5608 from simonpasquier/external-labels-for-alert-tests cmd/promtool: add $externalLabels for alert unit tests	2019-06-20 16:48:12 +02:00
Björn Rabenstein	372b3438e5	Update prometheus/client_golang to v1.0.0 (#5682 ) Signed-off-by: beorn7 <beorn@grafana.com>	2019-06-17 19:14:36 +01:00
Keenan Romain	55f3a9fe4a	Allows globs for rules when unit testing (#5595 ) * Includes glob support when unit testing rule_files. Signed-off-by: Keenan Romain <Keenan.Romain@mailchimp.com>	2019-06-12 11:31:07 +01:00
Simon Pasquier	74ff35ccdd	cmd/promtool: add $externalLabels for alert unit tests Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-29 16:40:01 +02:00
beorn7	aff4738f33	Adjust TestQueryRange to new Prometheus API client Signed-off-by: beorn7 <bjoern@rabenste.in>	2019-05-17 18:09:47 +02:00
Lee Gaines	f4486815c1	logs filesystem type on startup (#5558 ) Signed-off-by: Lee Gaines <leetgaines@gmail.com>	2019-05-17 10:16:16 +01:00
Björn Rabenstein	0a34399611	Fix minor punctuation and language issues in flag doc strings (#5568 ) This is mostly to create consistency, not because the one or the other way would be wrong. A few actual corrections are also included. Signed-off-by: beorn7 <bjoern@rabenste.in>	2019-05-15 16:59:06 +02:00
Simon Pasquier	45506841e6	*: enable all default linters (#5504 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-03 15:11:28 +02:00
Simon Pasquier	9c69eec82a	cmd/promtool: use log.NewNopLogger() (#5531 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-05-03 10:00:07 +01:00
Frederic Branczyk	c790d7658c	Merge pull request #5491 from metalmatze/rungroup Use github.com/oklog/run not archived oklog/oklog	2019-04-29 16:22:16 +02:00
Björn Rabenstein	0be9388f8d	Merge pull request #5463 from prometheus/beorn7/templating Follow-up on #5009	2019-04-24 16:42:23 +02:00
Simon Pasquier	abc1994bec	cmd/promtool: return errors from rule evaluations (#5483 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-04-23 09:59:03 +02:00
Matthias Loibl	388caa06ac	Use github.com/oklog/run not archived oklog/oklog Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>	2019-04-19 14:55:28 +02:00
Bjoern Rabenstein	38d518c0fe	Rework #5009 after comments Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>	2019-04-17 01:40:10 +02:00
Bjoern Rabenstein	a92ef68dd8	Fix staticcheck errors Not sure why they only show up now. Signed-off-by: Bjoern Rabenstein <bjoern@rabenste.in>	2019-04-17 01:40:10 +02:00
Sylvain Rabot	335a34486e	Add external labels to template expansion This affects the expansion of templates in alert labels and annotations and console templates. Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>	2019-04-17 01:40:10 +02:00
Simon Pasquier	e5dbac7972	cmd/prometheus: group flags properly (#5419 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-04-10 13:22:05 +01:00
David Symonds	7a60e22c2d	cmd/promtool: resolve relative paths in alert test files (#5336 ) Like `promtool check config <path/to/foo.yaml>`, which resolves relative paths inside foo.yaml to be relative to `path/to`, this now makes `promtool test rules <path/to/test.yaml>` do the same thing. Signed-off-by: David Symonds <dsymonds@gmail.com>	2019-03-27 10:27:26 +01:00
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-26 00:01:12 +01:00
Brian Brazil	0a87dcd416	cmd: Warn rather than Info when retention time wraps (#5403 ) Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-03-25 18:06:38 +00:00
Krasi Georgiev	9d96ada510	Display correct values for the retention in the flags web gui. (#5322 ) * Display correct values for the retention in the flags web gui. Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * adding a log entry Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * added the retention info to the runtime status page Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com> * simplify the retention display Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2019-03-11 22:48:57 +05:30
Krasi Georgiev	1684dc750a	updated tsdb to 0.6.0 (#5292 ) * updated tsdb to 0.6.0 as part of the update also added the new storage.tsdb.allow-overlapping-blocks flag and mark it as experimental.	2019-03-04 21:42:45 +02:00
Simon Pasquier	c8a1a5a93c	discovery/kubernetes: fix support for password_file and bearer_token_file (#5211 ) * discovery/kubernetes: fix support for password_file Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Create and pass custom RoundTripper to Kubernetes client Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Use inline HTTPClientConfig Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-20 11:22:34 +01:00
Krasi Georgiev	a3c41f4256	use the default time retention value only when no size retention is set (#5216 ) fixes https://github.com/prometheus/prometheus/issues/5213 Now that we have time and size based retention, the time based one should not have a default value. A default is set only when both the time and size flags are not set. This change will not affect current installations that rely on the default time based value, and will avoid confusion when only the size retention is set and it is expected that the default time based setting would no longer be in place. Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2019-02-19 13:53:43 +02:00
Callum Styan	6f69e31398	Tail the TSDB WAL for remote_write This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down. We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes. Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases. As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s). This change also includes the following optimisations: - only marshal the proto request once, not once per retry - maintain a single copy of the labels for given series to reduce GC pressure Other minor tweaks: - only reshard if we've also successfully sent recently - add pending samples, latest sent timestamp, WAL events processed metrics Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype) Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Brian Brazil	1dd57765b4	Reduce time that alertmanagers are in flux when reloaded. (#5126 ) This no longer waits for all of the scrape reload to complete before getting a list of AMs again. Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2019-01-28 18:34:12 +00:00
Goutham Veeramachaneni	4068968e12	Protect retention from overflowing (#5112 ) Also sanitise the max block duration to max a month. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2019-01-18 20:18:06 +05:30
Goutham Veeramachaneni	384cba1211	Add flag for size based retention (#5109 ) * Add flag for size based retention Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Deprecate the old retention flag for a new one. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Add ability to take a suffix for size flag Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Address feedback Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>	2019-01-18 19:18:36 +05:30
Hrishikesh Barman	a1f34bec2e	Added CORS Origin flag (#5011 ) Signed-off-by: Hrishikesh Barman <hrishikeshbman@gmail.com>	2019-01-17 15:01:06 +00:00
Matt Layher	302148fd69	*: apply gofmt -s Signed-off-by: Matt Layher <mdlayher@gmail.com>	2019-01-16 17:28:14 -05:00
Ryan Leung	45c8b084c6	fix TestFailedStartupExitCode (#5076 ) Signed-off-by: rleungx <rleungx@gmail.com>	2019-01-16 10:13:36 +01:00
Lv Jiawei	b8ede99767	Fix comment typo (#5087 ) According to code, I think it is a typo. Signed-off-by: MIBc <lvjiawei@cmss.chinamobile.com>	2019-01-09 10:56:47 +00:00
Frederic Branczyk	e9ae0b5a1b	Merge pull request #4927 from tariq1890/update_k8s update client-go to v10.0.0 and other k8s deps to v1.13.1	2019-01-07 10:54:34 +01:00
Simon Pasquier	f678e27eb6	*: use latest release of staticcheck (#5057 ) *: use latest release of staticcheck It also fixes a couple of things in the code flagged by the additional checks. Signed-off-by: Simon Pasquier <spasquie@redhat.com> Use official release of staticcheck Also run 'go list' before staticcheck to avoid failures when downloading packages. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-01-04 14:47:38 +01:00
tariqibrahim	9b4a25e7b0	use klog dependency Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-01-03 13:57:20 -08:00
glutamatt	5ddde1965b	tune the "Wal segment size" with a flag (#5029 ) Add WALSegmentSize as an option, and the corresponding flag "storage.tsdb.wal-segment-size" to tune the max size of WAL segment files. The underlying problem is to reduce the disk space used by WAL segment files: on a Raspberry Pi, for instance, we often want to reduce the write load on the SD card, so the WAL directory is mounted on a (space-limited) in-memory partition. The default max segment file size pushed the size of the directory to 128 MB for each segment, which is too much RAM consumption on a Raspberry Pi. The initial discussion is at https://github.com/prometheus/tsdb/pull/450	2019-01-03 17:13:21 +03:00
Ganesh Vernekar	7d30ccd0eb	Sort samples before comparing - PromQL unit test (#5052 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-12-31 10:55:49 +00:00
Ganesh Vernekar	dbe55c1352	Subquery (#4831 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-12-22 13:47:13 +00:00
Simon Pasquier	a2766a94a3	cmd/prometheus: add tests for sendAlerts() (#4910 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-12-18 11:15:46 +00:00
AixesHunter	1b166d7174	Fix variable 'notifier' collides with imported package name 'github.com/prometheus/prometheus/notifier', changed to 'notifierManager'. (#4947 ) Signed-off-by: aixeshunter <aixeshunter@gmail.com>	2018-12-18 11:13:18 +00:00
Ganesh Vernekar	fbadd88ba5	Get unique eval times for alert unit tests (#4964 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-12-18 08:40:03 +00:00
Simon Pasquier	ac9d5f3d53	cmd/prometheus: replace glog by glog-gokit (#4931 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-12-04 15:01:12 +01:00
Krasi Georgiev	080e6ed31a	collect cpu and trace profiles with the promtool debug command (#4897 ) Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2018-11-23 17:57:31 +02:00
Alex Yu	5dcce32ef8	update promlog to latest version (#4876 ) * update promlog to latest version Signed-off-by: Alex Yu <yu.alex96@gmail.com> * Update api tests, fix main setup Signed-off-by: Alex Yu <yu.alex96@gmail.com> * tidy go.sum Signed-off-by: Alex Yu <yu.alex96@gmail.com> * revendor prometheus/common Signed-off-by: Alex Yu <yu.alex96@gmail.com> * only initialize config; use kingpin for remote_storage_adapter Signed-off-by: Alex Yu <yu.alex96@gmail.com> * actually parse the flags Signed-off-by: Alex Yu <yu.alex96@gmail.com> * clean up imports Signed-off-by: Alex Yu <yu.alex96@gmail.com>	2018-11-23 14:22:40 +01:00
Ganesh Vernekar	cfb3769274	Lazily load samples for unit testing (#4851 ) * Lazily load samples for unit testing Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * cleanup Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-11-22 14:21:38 +05:30
achiuBAE	a9050c45f6	Allow setting the Prometheus instance document title through a flag. (#4841 ) * web: added ability to set page title through flag. Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com> * Reformatted variable names and Flag description for readability. Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com> * assets_vfsdata.go Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com> * Flag name changed from web.ui-title to web.page-title Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com> * make assets Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>	2018-11-21 12:45:06 +08:00
stuart nelson	6a69471bc2	[promtool] Support writing output as json (#4848 ) * Support writing output as json Oftentimes I'll want to execute something based on the output from promtool, and supporting json makes it easy to pull out values with a supporting tool such as jq. Signed-off-by: stuart nelson <stuartnelson3@gmail.com>	2018-11-14 18:40:07 +01:00
Lucas Serven	70c8b2c63c	cmd/prometheus: buffer signal chans According to the GoDoc for os.Signal [0]: > Package signal will not block sending to c: the caller must ensure that > c has sufficient buffer space to keep up with the expected signal rate. > For a channel used for notification of just one signal value, a buffer > of size 1 is sufficient. [0] https://golang.org/pkg/os/signal/#Notify Signed-off-by: Lucas Serven <lserven@gmail.com>	2018-11-14 10:33:28 +01:00
Frederic Branczyk	bda9781ccd	Merge pull request #3839 from brancz/remove-old-alert-record promql: Remove old and unused alerting/recording syntax	2018-11-06 15:53:27 +01:00
Simon Pasquier	a30348f1a4	discovery: add config label to discovered targets metric (#4753 ) * discovery: add labels to discovered targets metric Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-10-18 16:46:59 +01:00
Callum Styan	9bca041285	WIP: keep track of samples per query, set a max # of samples (#4513 ) * keep track of samples per query, set a max # of samples that can be in memory at once Signed-off-by: Callum Styan <callumstyan@gmail.com>	2018-10-02 12:59:19 +01:00
Tom Wilkie	4c52400708	Limit concurrent remote reads. (#4656 ) Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-09-25 20:07:34 +01:00
Ganesh Vernekar	5790d23fd8	Unit testing for rules (#4350 ) * Unit testing for rules * Specifying order of group evaluation in unit tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-09-25 17:06:26 +01:00
Tom Wilkie	457e4bb58e	Limit the number of samples remote read can return. (#4532 ) * Limit the number of samples remote read can return. - Return 413 entity too large. - Limit can be set by a flag. Allow 0 to mean no limit. - Include limit in error message. - Set default limit to 50M (* 16 bytes = 800MB). Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-09-05 15:50:50 +02:00
Chris Marchbanks	63ed9d1b70	Send EndsAt along with alerts (#4550 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2018-08-28 16:05:00 +01:00
Chris Marchbanks	87f1dad16d	throttle resends of alerts to 1 minute by default (#4538 ) Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2018-08-27 17:41:42 +01:00
Krasi Georgiev	12fe204ea6	move runtime debug funcs in own package (#4494 ) To make local debugging with `go run` easier, moved all files into a dedicated package `runtime`. This allows running prometheus just by using `go run main.go` instead of passing many files like `go run main.go limits_default.go ...` Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>	2018-08-22 13:41:11 +03:00
Simon Pasquier	08c2f50382	Merge pull request #4418 from simonpasquier/log-vm-limits prometheus: log virtual memory limits	2018-08-07 16:27:46 +02:00
Frederic Branczyk	b0b3e3dd74	promql: Remove old and unused alerting/recording syntax Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>	2018-08-07 15:14:06 +02:00
Dave Henderson	73a08f0045	promtool - Adding --step flag to 'query range' subcommand (#4454 ) Signed-off-by: Dave Henderson <dhenderson@gmail.com>	2018-08-05 11:03:18 +02:00
Julius Volz	90521a65f8	Remove error return value from NotifyFunc() (#4459 ) It's always nil and we also forgot to check it. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-08-04 21:31:12 +02:00
Ganesh Vernekar	f1db699dff	Persist alert 'for' state across restarts (#4061 ) Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2018-08-02 11:18:24 +01:00
Simon Pasquier	a94450c288	Fix build for openbsd Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-07-31 14:41:30 +02:00
Simon Pasquier	141c188ae6	Enforce conversion for freebsd Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-07-26 14:58:56 +02:00
Simon Pasquier	208d21a393	Add comment and print units Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-07-26 10:26:58 +02:00
Simon Pasquier	ba22b10113	prometheus: log virtual memory limits Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2018-07-25 15:51:27 +02:00
Daisy T	a3376e8f36	add query labels command to promtool (#4346 ) Signed-off-by: Daisy T <daisyts@gmx.com>	2018-07-18 16:27:28 +02:00
Julius Volz	95dfb1b1dd	Add missing import to promtool, fix build (#4395 ) Sorry, I used GitHub's web-based merge-conflict-resolution editor on https://github.com/prometheus/prometheus/pull/4308 and it didn't show me test errors afterwards, but maybe they didn't run again or I should have waited or something. Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-07-18 10:26:45 +02:00
Shubheksha	125da3b812	promtool: add command for querying series (#4308 ) Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>	2018-07-18 10:15:58 +02:00
Julius Volz	03aa3a3de8	main: Improve / clean up error messages (#4286 ) Signed-off-by: Julius Volz <julius.volz@gmail.com>	2018-07-18 09:58:40 +02:00
Chih-Hung Yeh	912d19fb85	Add 3 commands in `promtool` for getting debug information from prometheus server (#4247 ) `debug all` - all information `debug metrics` - metrics information `debug pprof` - profiling information the final result is compressed in a `tar.gz` file Signed-off-by: chyeh <chyeh.taiwan@gmail.com>	2018-07-18 10:52:01 +03:00
Brian Brazil	68e8b80ffe	Reorder startup and shutdown to prevent panics. (#4321 ) Start rule manager only after tsdb and config is loaded. Stop rule manager before tsdb to avoid writing to closed storage. Wait for any in-progress reloads to complete before shutting down rule manager, so that rule manager doesn't get updated after being shut down. Remove incorrect comment around shutting down query engine. Log when config reload is completed. Fixes #4133 Fixes #4262 Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>	2018-07-04 13:41:16 +01:00
Michael Khalil	78e0784d04	return error exit status in prometheus cli (#4296 ) Signed-off-by: mikeykhalil <mikeyfkhalil@gmail.com>	2018-06-21 08:32:26 +01:00
Tom Wilkie	8acad5f3cd	make it compile Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-24 15:40:24 +01:00
Tom Wilkie	e51d6c4b6c	Make remote flush deadline a command line param. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2018-05-23 15:06:01 +01:00
Sneha Inguva	c1a851074b	promtool: add query instant and query range commands (#4085 ) * promtool: add QueryInstant and QueryRange cmds * promtool: add more query functions * promtool: finished query Instant * promtool: add range query * promtool: add query command and address arguments * vendor client and api	2018-04-26 20:41:56 +02:00
Mario Trangoni	464e747f1e	fix some comments typos (#4059 )	2018-04-08 10:51:54 +01:00
Sneha Inguva	7be846754a	main: actor functionality comments	2018-04-01 11:19:30 -07:00
Marek Siarkowicz	bb86c3f62b	Report internal runtime information on status page (#3921 ) Add information about tsdb, wal and config reload	2018-03-21 16:08:37 +00:00
James Turnbull	ba5273a0ab	Minor edits to help text (#3990 )	2018-03-20 16:54:36 +00:00
Simon Pasquier	e1fd96db25	cmd: fix help text (#3989 )	2018-03-20 15:58:19 +00:00
ferhat elmas	ffa673f7d8	General simplifications (#3887 ) Another try as in #1516	2018-02-26 07:58:10 +00:00
Bartek Plotka	93a63ac5fd	api: Added v1/status/flags endpoint. (#3864 ) Endpoint URL: /api/v1/status/flags Example Output: ```json { "status": "success", "data": { "alertmanager.notification-queue-capacity": "10000", "alertmanager.timeout": "10s", "completion-bash": "false", "completion-script-bash": "false", "completion-script-zsh": "false", "config.file": "my_cool_prometheus.yaml", "help": "false", "help-long": "false", "help-man": "false", "log.level": "info", "query.lookback-delta": "5m", "query.max-concurrency": "20", "query.timeout": "2m", "storage.tsdb.max-block-duration": "36h", "storage.tsdb.min-block-duration": "2h", "storage.tsdb.no-lockfile": "false", "storage.tsdb.path": "data/", "storage.tsdb.retention": "15d", "version": "false", "web.console.libraries": "console_libraries", "web.console.templates": "consoles", "web.enable-admin-api": "false", "web.enable-lifecycle": "false", "web.external-url": "", "web.listen-address": "0.0.0.0:9090", "web.max-connections": "512", "web.read-timeout": "5m", "web.route-prefix": "/", "web.user-assets": "" } } ``` Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-02-21 08:49:02 +00:00
Fabian Reinartz	7ccd4b39b8	*: implement query params This adds a parameter to the storage selection interface which allows query engine(s) to pass information about the operations surrounding a data selection. This can for example be used by remote storage backends to infer the correct downsampling aggregates that need to be provided.	2018-02-13 12:17:22 +01:00
Conor Broderick	5169ccf258	Merge pull request #3724 from simonpasquier/fix-bad-data-error Don't reset FiredAt for inactive alerts	2018-02-01 16:18:09 +00:00
Krasi Georgiev	b75428ec19	rename package retrieve to scrape: no functional changes, just renaming retrieval to scrape	2018-02-01 09:55:07 +00:00
Krasi Georgiev	7858745c04	rename structs for consistency	2018-01-30 17:49:05 +00:00
Krasi Georgiev	acc4197098	remove discovery race for the context field	2018-01-29 15:18:07 +00:00
Julien Pivotto	8b20cb1e8d	last config success time gauge: use SetToCurrentTime() (#3750 ) Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2018-01-27 07:48:13 +00:00
Simon Pasquier	81c0ab69e0	Don't reset FiredAt for inactive alerts Otherwise AlertManager receives resolved alerts where StartsAt is zero which fails the validation.	2018-01-22 17:17:33 +01:00
Krasi Georgiev	719c579f7b	refactor main execution reloadReady handling, update some comments	2018-01-17 18:14:24 +00:00
Krasi Georgiev	0eafaf32d3	set the correct config reloading execution for scraper and notifier	2018-01-17 13:06:56 +00:00
Krasi Georgiev	97f0461e29	refactor the config reloading execution	2018-01-17 12:02:13 +00:00
Krasi Georgiev	5260c650ec	use the config hash for the map lookup	2018-01-16 11:10:54 +00:00
Krasi Georgiev	8369826808	comment to rethink the map reference for the notifier discovery	2018-01-16 09:47:53 +00:00
Krasi Georgiev	d12e6f29fc	discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager reimplement the service discovery for the notify manager Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2018-01-15 13:39:44 +00:00
Shubheksha Jalan	0471e64ad1	Use shared types from the `common` repo (#3674 ) * refactor: use shared types from common repo, remove util/config * vendor: add common/config * fix nit	2018-01-11 16:10:25 +01:00
Goutham Veeramachaneni	35a6ffbaf3	Merge pull request #3587 from krasi-georgiev/web-test-error-check handle web_test webhandler errors.	2018-01-10 22:03:25 +05:30
Shubheksha Jalan	ec94df49d4	Refactor SD configuration to remove `config` dependency (#3629 ) * refactor: move targetGroup struct and CheckOverflow() to their own package * refactor: move auth and security related structs to a utility package, fix import error in utility package * refactor: Azure SD, remove SD struct from config * refactor: DNS SD, remove SD struct from config into dns package * refactor: ec2 SD, move SD struct from config into the ec2 package * refactor: file SD, move SD struct from config to file discovery package * refactor: gce, move SD struct from config to gce discovery package * refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil * refactor: consul, move SD struct from config into consul discovery package * refactor: marathon, move SD struct from config into marathon discovery package * refactor: triton, move SD struct from config to triton discovery package, fix test * refactor: zookeeper, move SD structs from config to zookeeper discovery package * refactor: openstack, remove SD struct from config, move into openstack discovery package * refactor: kubernetes, move SD struct from config into kubernetes discovery package * refactor: notifier, use targetgroup package instead of config * refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup * refactor: retrieval, use targetgroup package instead of config.TargetGroup * refactor: storage, use config util package * refactor: discovery manager, use targetgroup package instead of config.TargetGroup * refactor: use HTTPClient and TLS config from configUtil instead of config * refactor: tests, use targetgroup package instead of config.TargetGroup * refactor: fix tagetgroup.Group pointers that were removed by mistake * refactor: openstack, kubernetes: drop prefixes * refactor: remove import aliases forced due to vscode bug * refactor: move main SD struct out of config into discovery/config * refactor: rename configUtil 
to config_util * refactor: rename yamlUtil to yaml_config * refactor: kubernetes, remove prefixes * refactor: move the TargetGroup package to discovery/ * refactor: fix order of imports	2017-12-29 21:01:34 +01:00
Brian Brazil	ecc24b554d	Hide block duration flags. (#3618 ) Users are starting to use these mistakenly thinking they'll help with issues, and thus causing some confusion. Thus hide them and make it clear that they're only there for testing reasons.	2017-12-24 12:13:48 +00:00
Krasi Georgiev	c94fa731aa	bypass the proxy for the tests	2017-12-20 18:21:10 +00:00
Krasi Georgiev	ad66476c4f	fix flaky main.go test and simplify a bit	2017-12-19 15:07:49 +00:00
Fabian Reinartz	2881d73ed8	Merge pull request #3362 from krasi-georgiev/discovery-refactoring Decouple the discovery and refactor the retrieval package	2017-12-19 12:56:34 +01:00
Goutham Veeramachaneni	9c9f96b2c0	Merge pull request #3529 from krasi-georgiev/main-integration-test main.go integration test for Startup interrupting.	2017-12-18 22:12:13 -06:00
Krasi Georgiev	587dec9eb9	rebased and resolved conflicts with the new Discovery GUI page Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2017-12-18 20:10:03 +00:00
Krasi Georgiev	1ec76d1950	rearrange the context variables and logic, split the groupsMerge function to set and get, other small nits	2017-12-18 17:23:47 +00:00
Krasi Georgiev	6ff1d5c51e	add the scrape manager config reloader handle errors with invalid scrape config	2017-12-18 17:23:47 +00:00
Krasi Georgiev	b0d4f6ee08	resolved merge conflict in main.go	2017-12-18 17:23:46 +00:00
Krasi Georgiev	c5cb0d2910	simplify naming and API.	2017-12-18 17:22:50 +00:00
Krasi Georgiev	9c61f0e8a0	scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage.	2017-12-18 17:22:49 +00:00
Krasi Georgiev	e405e2f1ea	refactored discovery	2017-12-18 17:22:49 +00:00
pasquier-s	2440696961	Log file descriptor limits at startup (#3567 ) Fixes #3564	2017-12-11 13:01:53 +00:00
Alberto Cortés	29da2fb9cd	testutil: update to go1.9 testing.Helper	2017-12-08 19:06:53 +01:00
Alberto Cortés	8f6a9f7833	config: simplify tests by using testutil.NotOk (#3289 ) Also include filename in all LoadFile errors Also add message to testutil.NotOk so we can identify failing tests when using table driven tests.	2017-12-08 16:52:25 +00:00
Krasi Georgiev	740662644e	write to temp dir and remove it at the end. Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>	2017-12-06 10:45:58 +00:00
Brian Brazil	b97f4cf48c	Add metrics for rule group interval and last duration.	2017-12-04 11:44:38 +00:00
Krasi Georgiev	2c2a962da3	main.go integration test for Startup interrupting.	2017-12-01 10:58:01 +00:00
Goutham Veeramachaneni	823b7f90b3	Use the globbed files and not the files in cfg Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-11-30 17:08:34 +05:30
Fabian Reinartz	62461379b7	rules: decouple notifier packages The dependency on the notifier packages caused a transitive dependency on discovery and with that all client libraries our service discovery uses.	2017-11-27 16:38:14 +01:00
Fabian Reinartz	4d964a0a0d	rules: make glob expansion a concern of main	2017-11-24 08:22:57 +01:00
Fabian Reinartz	bd9f7460eb	rules: remove config package dependency	2017-11-24 07:57:54 +01:00
Fabian Reinartz	2d0e3746ac	rules: remove dependency on promql.Engine	2017-11-24 07:57:54 +01:00
Krasi Georgiev	e2f4850fea	Refactor main.go with oklog/pkg/group actors pattern	2017-11-11 12:33:15 +00:00
Thibault Chataigner	fc4406201e	Tsdb StartTime : Use a simpler way to compute StartTime	2017-10-25 17:41:00 +02:00
Julius Volz	099df0c5f0	Migrate "golang.org/x/net/context" -> "context" (#3333 ) In some places, where ctxhttp or gRPC are concerned, we still need to use the old contexts.	2017-10-24 21:21:42 -07:00
Julius Volz	9d43176ab3	Remove unused printVersion variable (#3335 ) Kingpin now automatically does this via --version.	2017-10-23 08:50:13 +01:00
Julius Volz	82c5b98496	Capitalize Prometheus in startup message (#3332 ) Hey, branding :)	2017-10-23 08:49:28 +01:00
Thibault Chataigner	bf4a279a91	Remote storage reads based on oldest timestamp in primary storage (#3129 ) Currently all read queries are simply pushed to remote read clients. This is fine, except for remote storage, for which it is inefficient and makes queries slower even if remote read is unnecessary. So we need instead to compare the oldest timestamp in primary/local storage with the query range lower boundary. If the oldest timestamp is older than the mint parameter, then there is no need for remote read. This is an optional behavior per remote read client. Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>	2017-10-18 12:08:14 +01:00
Julius Volz	5f715f5733	Fix typo in flag description (#3302 )	2017-10-16 23:00:05 +01:00
Tobias Schmidt	3589f2f1d4	Merge pull request #3285 from jlevesy/use-testutils-in-cmd-subpackage Use testutil assertion helpers in cmd package	2017-10-13 00:12:39 +02:00
Julien Levesy	d7b4fa8d78	use testutil assertions in the cmd/prometheus package	2017-10-12 13:45:38 +02:00
Mathieu Pasquet	38afa507bb	Provide better error messages in commandline Instead of only printing the help message, which is not always helpful. For example, when upgrading from prometheus v1, the retention time value format has changed and now only accepts one unit (e.g. "15d") where it previously allowed more complex strings (e.g. "360h0m0s"). This commit provides the error message as an explanation for the parsing failure.	2017-10-09 16:25:50 +02:00
Marc Sluiter	6a633eece1	Added go-conntrack for monitoring http connections (#3241 ) Added metrics for in- and outgoing traffic with go-conntrack.	2017-10-06 11:22:19 +01:00
Fabian Reinartz	2d0b8e8b94	Merge branch 'master' into dev-2.0	2017-10-05 13:09:18 +02:00
Paul Gier	08af129b4d	cmd/prometheus: don't allow quotes at beginning or end of url This prevents accidental copy/paste error where the web.external-url or alertmanager.url params could have an extra set of quotes. See also: https://github.com/prometheus/prometheus/issues/1229	2017-10-04 10:10:02 -05:00
Paul Gier	f79b55d057	cmd/prometheus: remove govalidator for url validation The usage of govalidator is redundant with the call to url.Parse for url validation. Removing it has the following benefits: - The explicit error message is displayed instead of just a generic valid/invalid message - Slightly smaller code with one fewer external dependency - Speed improvement by removing duplicate call to url.Parse (inside govalidator.IsURL()) - Resolves issue #2717 The only potential drawback of removing govalidator is that certain URLs will be considered valid which were previously invalid. For example: - URLs with hostnames that start and/or end with an underscore (http://_example.com_) - URLs with hostnames that contain some special characters (http://foo&*bar.org) These are valid URIs according to RFC 3986 and valid domain names per RFC 2181, however they are not valid hostnames per RFC 952.	2017-10-04 10:08:34 -05:00
Fabian Reinartz	7b02bfee0a	web: start web handler while TSDB is starting up	2017-09-20 15:03:19 +02:00
Goutham Veeramachaneni	f5aed810f9	logging: Port to common/promlog Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-09-15 12:40:50 +05:30
Fabian Reinartz	d21f149745	*: migrate to go-kit/log	2017-09-08 22:01:51 +05:30
Fabian Reinartz	c70379e1c7	Merge branch 'dev-2.0' of github.com:prometheus/prometheus into dev-2.0	2017-09-04 13:10:50 +02:00
Fabian Reinartz	fffe51fb03	Add mutex and block profiling via envvar	2017-09-04 13:10:32 +02:00
Ben Kochie	59aca4138b	Fix staticcheck issues.	2017-08-28 17:29:01 +02:00
Matt Bostock	64973f5c65	cmd/prometheus: Fix capitalisation in log line (#3123 ) Change 'Ready' to 'ready'.	2017-08-28 11:03:25 +01:00
Mark Adams	77c816b309	Fix pprof endpoints when -web.route-prefix or -web.external-url is used (#3054 ) Whenever a route prefix is applied, the router prepends the prefix to the URL path on the request. For most handlers, this is not an issue because the request's path is only used for routing and is not actually needed by the handler itself. However, Prometheus delegates the handling of the /debug/* endpoints to the http.DefaultServeMux which has its own routing logic that depends on the url.Path. As a result, whenever a prefix is applied, the prefixed URL is passed to the DefaultServeMux which has no awareness of the prefix and returns a 404. This change fixes the issue by creating a new serveDebug handler which routes /debug/* requests to the appropriate net/http/pprof handler and removing the net/http/pprof import in cmd/prometheus since it is no longer necessary. Fixes #2183.	2017-08-23 00:00:56 +01:00
Callum Styan	8912f81ffe	check if file_sd files exist in checkConfig	2017-08-22 15:25:30 -07:00
Fabian Reinartz	25f3e1c424	Merge branch 'master' into mergemaster	2017-08-10 17:04:25 +02:00
KalivarapuReshma	686050d816	Change -config.file to --config.file in Readme and error message	2017-08-08 12:49:35 +05:30
emluque	ff54c5c11a	2831 Add Healthy and Ready endpoints	2017-08-07 17:34:04 -03:00
Fabian Reinartz	4d3d8ee229	Merge pull request #2850 from tomwilkie/dev-2.0-remote Remote APIs for v2	2017-08-03 13:39:09 +02:00
Julius Volz	cc50aa2c6b	main: Consistently end flag descriptions with periods. (#2977 )	2017-07-20 23:48:35 +02:00
Tom Wilkie	2dda5775e3	Initial port of remote storage to v2.	2017-07-12 12:27:57 +01:00
Fabian Reinartz	32226e30f5	Guard reload and quit endpoints by flag	2017-07-11 14:25:07 +02:00
Fabian Reinartz	45ac064669	web: disable Admin APIs by default	2017-07-10 09:29:41 +02:00
Fabian Reinartz	ccf9e62972	*: add admin grpc API	2017-07-10 09:14:14 +02:00
Fabian Reinartz	be32afd6df	cmd/prometheus: add back tsdb.no-lockfile flag	2017-06-22 15:02:10 +02:00
Goutham Veeramachaneni	f9202c6511	Move from .yaml to .yml in update rules Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-21 18:38:37 +05:30
Goutham Veeramachaneni	e3701077c3	Move promtool to kingpin Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-21 17:42:57 +05:30
Fabian Reinartz	867b8d108f	cmd/prometheus: cleanup	2017-06-21 11:38:13 +02:00
Fabian Reinartz	34ab7a885a	cmd/prometheus: switch to kingpin	2017-06-20 17:38:01 +02:00
Goutham Veeramachaneni	592cb00c2f	Remove version from RuleGroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-19 16:38:46 +05:30
Goutham Veeramachaneni	37e7b69f56	Merge remote-tracking branch 'upstream/dev-2.0' into rulegroups Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-19 16:34:55 +05:30
Goutham Veeramachaneni	67dc73fd59	Flag changes for 2.0 Fixes: prometheus/prometheus#2087 Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 20:21:41 +05:30
Goutham Veeramachaneni	d407bd150c	Consolidate the duration params in CLI * All CLI params moved to model.Duration Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 20:20:57 +05:30
Goutham Veeramachaneni	6b70a4d850	Incorporate PR feedback * Move fingerprint to Hash() * Move away from tsdb.MultiError * 0777 -> 0666 for files * checkOverflow of extra fields Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 16:44:33 +05:30
Goutham Veeramachaneni	6c1617fd13	Simplify usage string Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 15:55:13 +05:30
Goutham Veeramachaneni	507790a357	Rework logging to use explicitly passed logger Mostly cleaned up the global logger use. Still some uses in discovery package. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 15:52:44 +05:30
Goutham Veeramachaneni	dc69645e92	Move back to go-yaml Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-16 10:46:21 +05:30
Goutham Veeramachaneni	8abb91f656	Move CLI commander to cobra Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-15 16:38:08 +05:30
Goutham Veeramachaneni	1c08743721	Update check-rules to new format. Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 13:32:26 +05:30
Goutham Veeramachaneni	cea1e99f78	Add update-rules command to promtool Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>	2017-06-14 11:38:54 +05:30
Fabian Reinartz	669075c6b9	Merge branch 'master' into dev-2.0	2017-06-06 09:36:51 +02:00
Chris Goller	42de0ae013	Use log.Logger interface for all discovery services	2017-06-01 11:25:55 -05:00
Conor Broderick	6766123f93	Replace regex with Secret type and remarshal config to hide secrets (#2775 )	2017-05-29 12:46:23 +01:00
Fabian Reinartz	4c31061251	Merge branch 'master' into dev-2.0	2017-05-24 15:36:17 +02:00
Fabian Reinartz	d289dc55c3	storage: update TSDB	2017-05-22 11:53:08 +02:00
Shashank Varanasi	dea60bb553	Fix malformed uname string (#2727 ) * Fix malformed uname string * Make fix better * Reformat code for simplicity	2017-05-16 18:44:11 +02:00
Fabian Reinartz	06c2b76cd4	Merge branch 'master' into uptsdb	2017-05-16 16:48:37 +02:00
Shashank Varanasi	61235fd851	Print system information (uname) at Prometheus startup (#2709 ) * Print uname on prom startup * Make uname file linux-only * Add missing license headers Add missing license headers * Print OS when uname is not available * Print only OS name when uname not available * Remove extra space, fix cmd/prometheus/main.go license header * Add fix for int8 and uint8 systems * Better formatting for build tags in cmd/prometheus/uname files * Remove newline	2017-05-13 20:42:29 +02:00
Frederic Branczyk	c50a3eccce	prometheus: default max-block-duration to 10% of retention	2017-05-12 11:48:51 +02:00
Michal Witkowski	4177c35eba	Fixup sighup for P2 TSDB init #2699	2017-05-09 17:00:54 +01:00
Fabian Reinartz	9b175d48cb	Add flag to disable TSDB lock file	2017-05-09 12:56:51 +02:00
Fabian Reinartz	73b8ff0ddc	Merge branch 'master' into dev-2.0	2017-04-27 10:19:55 +02:00
Matt Layher	283756c503	Initial commit of 'promtool check-metrics', promlint package (#2605 )	2017-04-13 23:53:41 +02:00
Fabian Reinartz	757cba7c31	cmd/prometheus: Undo GOGC adjustment	2017-04-10 16:22:01 +02:00
beorn7	f20b84e816	flags: Improve doc strings for checkpoint flags	2017-04-07 13:10:12 +02:00
Fabian Reinartz	8ffc851147	Merge branch 'master' into dev-2.0	2017-04-04 15:17:56 +02:00
Julius Volz	589061919a	Merge pull request #2465 from Gouthamve/alert-metrics-2429 Better Metrics For Alerts	2017-03-31 21:45:05 +02:00
Goutham Veeramachaneni	f27ce34a13	Use Registerer to Register All Metrics * Made Metric a Gauge so that it can be registered.	2017-04-01 00:14:30 +05:30
Goutham Veeramachaneni	0d0c9d5440	Move Registerer to Config Struct in Notifier	2017-03-31 21:20:12 +05:30
Björn Rabenstein	29f05680a2	Merge pull request #2528 from prometheus/beorn7/storage2 main.go: Set GOGC to 40 by default	2017-03-27 15:00:37 +02:00
Björn Rabenstein	e63d079b59	Merge pull request #2527 from prometheus/beorn7/storage storage: Evict chunks and calculate persistence pressure...	2017-03-27 14:49:42 +02:00
Julius Volz	b5b0e00923	Merge pull request #2499 from prometheus/remote-read Remote Read	2017-03-27 14:43:44 +02:00
beorn7	434ab2a6a3	storage: Evict chunks and calculate persistence pressure based on target heap size This is a fairly easy attempt to dynamically evict chunks based on the heap size. A target heap size has to be set as a command line flag, so that users can essentially say "utilize 4GiB of RAM, and please don't OOM". The -storage.local.max-chunks-to-persist and -storage.local.memory-chunks flags are deprecated by this change. Backwards compatibility is provided by ignoring -storage.local.max-chunks-to-persist and using -storage.local.memory-chunks to set the new -storage.local.target-heap-size to a reasonable (and conservative) value (both with a warning). This also makes the metrics instrumentation more consistent (in naming and implementation) and cleans up a few quirks in the tests. Answers to anticipated comments: There is a chance that Go 1.9 will allow programs better control over the Go memory management. I don't expect those changes to be in contradiction with the approach here, but I do expect them to complement them and allow them to be more precise and controlled. In any case, once those Go changes are available, this code has to be revisited. One might be tempted to let the user specify an estimated value for the RSS usage, and then internally set a target heap size of a certain fraction of that. (In my experience, 2/3 is a fairly safe bet.) However, investigations have shown that RSS size and its relation to the heap size is really really complicated. It depends on so many factors that I wouldn't even start listing them in a commit description. It depends on many circumstances and not least on the risk trade-off of each individual user between RAM utilization and probability of OOMing during a RAM usage peak. To not add even more to the confusion, we need to stick to the well-defined number we also use in the targeting here, the sum of the sizes of heap objects.	2017-03-27 14:33:50 +02:00
beorn7	96a303b348	storage: Use staleness delta as head chunk timeout Currently, if a series stops to exist, its head chunk will be kept open for an hour. That prevents it from being persisted. Which prevents it from being evicted. Which prevents the series from being archived. Most of the time, once no sample has been added to a series within the staleness limit, we can be pretty confident that this series will not receive samples anymore. The whole chain as described above can be started after 5m instead of 1h. In the relaxed case, this doesn't change a lot as the head chunk timeout is only checked during series maintenance, and usually, a series is only maintained every six hours. However, there is the typical scenario where a large service is deployed, the deploy turns out to be bad, and then it is deployed again within minutes, and quite quickly the number of time series has tripled. That's the point where the Prometheus server is stressed and switches (rightfully) into rushed mode. In that mode, time series are processed as quickly as possible, but all of that is in vain if all of those recently ended time series cannot be persisted yet for another hour. In that scenario, this change will help most, and it's exactly the scenario where help is most desperately needed.	2017-03-26 23:44:50 +02:00
beorn7	04ccf84559	main.go: Set GOGC to 40 by default Rationale: The default value for GOGC is 100, i.e. a garbage collection is initiated once as much heap space has been allocated as was in use after the last GC was done. This ratio doesn't make a lot of sense in Prometheus, as typically about 60% of the heap is allocated for long-lived memory chunks (most of which are around for many hours if not days). Thus, short-lived heap objects are accumulated for quite some time until they finally match the large amount of memory used by bulk memory chunks and a gigantic GC cycle is invoked. With GOGC=40, we are essentially reinstating "normal" GC behavior by acknowledging that about 60% of the heap is used for long-term bulk storage. The median Prometheus production server at SoundCloud runs a GC cycle every 90 seconds. With GOGC=40, a GC cycle is run every 35 seconds (which is still not very often). However, the effective RAM usage is now reduced by about 30%. If settings are updated to utilize more RAM, the time between GC cycles goes up again (as the heap size is larger with more long-lived memory chunks, but the frequency of creating short-lived heap objects does not change). On a quite busy large Prometheus server, the timing changed from one GC run every 20s to one GC run every 12s. In the former case (just changing GOGC, leave everything else as it is), the CPU usage increases by about 10% (on a mid-size reference server from 8.1 to 8.9). If settings are adjusted, the CPU consumption increases more drastically (from 8 cores to 13 cores on a large reference server), despite GCs happening more rarely, presumably because a 50% larger set of memory chunks is managed now. Having more memory chunks is good in many regards, and most servers are running out of memory long before they run out of CPU cycles, so the tradeoff is overwhelmingly positive in most cases. 
Power users can still set the GOGC environment variable as usual, as the implementation in this commit honors an explicitly set variable.	2017-03-26 21:55:37 +02:00
Julius Volz	8fda83ea12	Make rules only read local data	2017-03-21 00:50:04 +01:00
Julius Volz	406b65d0dc	Rename remote.Storage to remote.Writer	2017-03-20 13:15:28 +01:00
Julius Volz	02395a224d	[WIP] Remote Read	2017-03-20 13:13:44 +01:00
Fabian Reinartz	b586781283	*: update tsdb vendoring and add retention flag	2017-03-17 16:06:04 +01:00
Goutham Veeramachaneni	f35816613e	Refactored Notifier to use Registerer * Brought metrics back into Notifier Notifier still implements a Collector. Check if that is needed.	2017-03-03 02:53:16 +05:30
Fabian Reinartz	9304179ef7	Merge branch 'master' into dev-2.0	2017-03-02 08:16:58 +01:00
Fabian Reinartz	4397b4d508	*: pass Prometheus registry into storage	2017-02-28 09:33:14 +01:00
Julius Volz	beb3c4b389	Remove legacy remote storage implementations This removes legacy support for specific remote storage systems in favor of only offering the generic remote write protocol. An example bridge application that translates from the generic protocol to each of those legacy backends is still provided at: documentation/examples/remote_storage/remote_storage_bridge See also https://github.com/prometheus/prometheus/issues/10 The next step in the plan is to re-add support for multiple remote storages.	2017-02-14 17:52:05 +01:00
Fabian Reinartz	ea3ba338dd	main: add flags for new storage	2017-02-05 18:22:06 +01:00
Fabian Reinartz	5772f1a7ba	retrieval/storage: adapt to new interface This simplifies the interface to two add methods for appends with labels or faster reference numbers.	2017-02-02 13:05:46 +01:00
Fabian Reinartz	1d3cdd0d67	Merge branch 'master' into dev-2.0-rebase	2017-01-30 17:43:01 +01:00
Fabian Reinartz	035976b275	retrieval: handle not found error correctly	2017-01-20 11:27:01 +01:00
Bartek Plotka	579e33f19a	Fixed style issues.	2017-01-16 16:45:58 +00:00
Bartek Plotka	d7febe97fa	Fixed regression in -alertmanager.url flag. Basic auth was ignored. - Included basic auth parsing while parsing to AlertmanagerConfig - Added test case Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2017-01-16 16:39:20 +00:00
Fabian Reinartz	ad9bc62e4c	storage: extend appender and adapt it	2017-01-13 14:48:01 +01:00
Fabian Reinartz	e631a1260d	retrieval: use separate appender per target	2016-12-30 21:35:35 +01:00
Fabian Reinartz	68dc358496	cmd/prometheus: remove tests for old flags	2016-12-29 16:55:22 +01:00
Fabian Reinartz	f8fc1f5bb2	*: migrate ingestion to new batch Appender	2016-12-29 11:03:56 +01:00
Fabian Reinartz	1becee3f6c	main: remove Alertmanager legacy flag configuration	2016-12-25 00:43:41 +01:00
Fabian Reinartz	15a931dbdb	promql: migrate model types, use tsdb interfaces	2016-12-24 00:39:52 +01:00
Fabian Reinartz	8b84ee5ee6	storage: remove old storage This removes all old storage files and only keeps interfaces to still allow the code to compile.	2016-12-22 23:33:32 +01:00
Fabian Reinartz	11a731ba82	remote: remove hard-coded remote storages This commit removes the flag-configured remote storage integrations in favor of the generic remote write path.	2016-12-22 23:17:35 +01:00
Erdem Agaoglu	054f8ebbfb	Increase default max-connections	2016-12-06 17:45:19 +03:00
Erdem Agaoglu	e487477a17	LimitListener to limit max number of connections This also drops tcp keep-alive in ListenAndServe but it's no longer necessary since we now close idle connections long before that.	2016-12-06 12:45:59 +03:00
Erdem Agaoglu	9986b28380	Set read-timeout for http.Server This also specifies a timeout for idle client connections, which may otherwise cause "too many open files" errors. See #2238	2016-12-01 16:29:45 +03:00
Fabian Reinartz	3fb4d1191b	config: rename AlertingConfig, resolve file paths	2016-11-24 15:19:37 +01:00
Fabian Reinartz	d4deb8bbf2	web: show discovered Alertmanagers in UI	2016-11-24 15:06:50 +01:00
Fabian Reinartz	f210d96497	notifier: use dynamic service discovery	2016-11-23 18:23:37 +01:00
Fabian Reinartz	200bbe1bad	config: extract SD and HTTPClient configurations	2016-11-23 18:23:37 +01:00
beorn7	5c41ca84e5	Catch negative staleness delta set on the command line	2016-11-01 15:17:59 +01:00
Brian Brazil	6bc29ba857	Fix regression from #1957, specify non-zero default timeout. (#2121) Fixes #2075	2016-10-26 14:47:41 +01:00
Julius Volz	ab80ced756	storage: separate chunk package, publish more names This is a followup to https://github.com/prometheus/prometheus/pull/2011. This publishes more of the methods and other names of the chunk code and moves the chunk code to its own package. There's some unavoidable ugliness: the chunk and chunkDesc metrics are used by both packages, so I had to move them to the chunk package. That isn't great, but I don't see how to do it better without a larger redesign of everything. Same for the evict requests and some other types.	2016-09-26 13:25:11 +02:00
Fabian Reinartz	57b358b82a	vendor: update govalidator (#2023) Fixes #2022	2016-09-23 01:06:51 +02:00
Matt Bostock	dd98766b32	cmd/prometheus/main.go: Fix typo in comment	2016-09-21 21:59:25 +01:00
Tom Wilkie	4520e12440	Add HTTP Basic Auth & TLS support to the generic write path. (#1957) * Add config, HTTP Basic Auth and TLS support to the generic write path. - Move generic write path configuration to the config file - Factor out config.TLSConfig -> tls.Config translation - Support TLSConfig for generic remote storage - Rename Run to Start, and make it non-blocking. - Dedupe code in httputil for TLS config. - Make remote queue metrics global.	2016-09-19 22:47:51 +02:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation is supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Julius Volz	ed5a0f0abe	promql: Allow per-query contexts. For Weaveworks' Frankenstein, we need to support multitenancy. In Frankenstein, we initially solved this without modifying the promql package at all: we constructed a new promql.Engine for every query and injected a storage implementation into that engine which would be primed to only collect data for a given user. This is problematic to upstream, however. Prometheus assumes that there is only one engine: the query concurrency gate is part of the engine, and the engine contains one central cancellable context to shut down all queries. Also, creating a new engine for every query seems like overkill. Thus, we want to be able to pass per-query contexts into a single engine. This change gets rid of the promql.Engine's built-in base context and allows passing in a per-query context instead. Central cancellation of all queries is still possible by deriving all passed-in contexts from one central one, but this is now the responsibility of the caller. The central query context is now created in main() and passed into the relevant components (web handler / API, rule manager). In a next step, the per-query context would have to be passed to the storage implementation, so that the storage can implement multi-tenancy or other features based on the contextual information.	2016-09-19 15:38:17 +02:00
Julius Volz	5f5a78e807	Merge pull request #1974 from prometheus/disable-local-storage Allow disabling local storage.	2016-09-17 18:40:01 +02:00
Tom Wilkie	d83879210c	Switch back to protos over HTTP, instead of GRPC. My aim is to support the new grpc generic write path in Frankenstein. On the surface this seems easy - however I've hit a number of problems that make me think it might be better to not use grpc just yet. The explanation of the problems requires a little background. At weave, traffic to frankenstein needs to go through a couple of services first, for SSL and to be authenticated. So traffic goes: internet -> frontend -> authfe -> frankenstein - The frontend is Nginx, and adds/removes SSL. It's done this way for legacy reasons, so the certs can be managed in one place, although eventually we imagine we'll merge it with authfe. All traffic from frontend is sent to authfe. - Authfe checks the auth tokens / cookie etc and then picks the service to forward the RPC to. - Frankenstein accepts the reads and does the right thing with them. First problem I hit was Nginx won't proxy http2 requests - it can accept them, but all calls downstream are http1 (see https://trac.nginx.org/nginx/ticket/923). This wasn't such a big deal, so it now looks like: internet --(grpc/http2)--> frontend --(grpc/http1)--> authfe --(grpc/http1)--> frankenstein Next problem was golang grpc server won't accept http1 requests (see https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms). It is possible to link a grpc server in with a normal go http mux, as long as the mux server is serving over SSL, as the golang http client & server won't do http2 over anything other than an SSL connection. This would require making all our service to service comms SSL. So I had a go at writing a grpc http1 server, and got pretty far. But it was a bit of a mess. So finally I thought I'd make a separate grpc frontend for this, running in parallel with the frontend/authfe combo on a different port - and first up I'd need a grpc reverse proxy. 
Ideally we'd have some nice, generic reverse proxy that only knew about a map from service names -> downstream service, and didn't need to decode & re-encode every request as it went through. It seems like this can't be done with golang's grpc library - see https://github.com/mwitkow/grpc-proxy/issues/1. And then I was surprised to find you can't do grpc from browsers! See http://www.grpc.io/faq/ - not important to us, but I'm starting to question why we decided to use grpc in the first place? It would seem we could have most of the benefits of grpc with protos over HTTP, and this wouldn't preclude moving to grpc when it's a bit more mature? In fact, the grpc FAQ even admits as much: > Why is gRPC better than any binary blob over HTTP/2? > This is largely what gRPC is on the wire.	2016-09-15 23:21:54 +01:00
Tobias Schmidt	29ced0090f	Fix common english misspellings	2016-09-14 23:23:28 -04:00
Julius Volz	b24e5d63bc	Add noop local storage engine. This adds a flag -storage.local.engine which allows turning off local storage in Prometheus. Instead of adding if-conditions and nil checks to all parts of Prometheus that deal with Prometheus's local storage (including the web interface), disabling local storage simply means replacing the normal local storage with a noop version that throws samples away and returns empty query results. We also don't add the noop storage to the fanout appender to decrease internal overhead. Instead of returning empty results, an alternate behavior could be to return errors on any query that point out that the local storage is disabled. Not sure which one is more preferable, so I went with the empty result option for now.	2016-09-14 13:18:05 +02:00
Julius Volz	a88e950d1f	Mark remote write address flag as experimental.	2016-09-01 00:58:53 +02:00
Julius Volz	aa3f2b7216	Generic write cleanups and changes. - fold metric name into labels - return initialization errors back to main - add snappy compression - better context handling - pre-allocation of labels - remove generic naming - other cleanups	2016-08-30 17:24:48 +02:00
Brian Brazil	36d2c4bd0b	Add generic write path using grpc. This uses a new proto format, with scope for multiple samples per timeseries in future. This will allow users to pump samples out to whatever they like without having to change the core Prometheus code. There's also an example receiver to save users figuring out the boilerplate themselves.	2016-08-30 17:19:18 +02:00
Julius Volz	4a866c13be	Fix ApplyConfig() error handling Currently, Prometheus starts up without any error when there is an invalid rule file :-/	2016-08-13 00:59:02 +02:00
Julius Volz	08891beb5f	Merge pull request #1828 from drawks/iss-1821 Error on non-flag commandline arguments	2016-07-21 00:35:53 +02:00
Björn Rabenstein	12709af249	Merge pull request #1838 from prometheus/release-1.0 Explicitly add logging flags to our custom flag set	2016-07-21 00:33:12 +02:00
Dave Rawks	00ea36cdbe	Error on non-flag commandline arguments - Added minor cmdline parsing logic change to bail on unconsumed arguments. Fixes #1821	2016-07-20 10:28:26 -07:00
beorn7	bf6201483c	Improve wording on log flag comment	2016-07-20 17:32:42 +02:00
beorn7	25385aafcb	Explicitly add logging flags to our custom flag set In https://github.com/prometheus/prometheus/pull/1782, we moved to a custom flag set to avoid getting test flags into the main prometheus binary. However, that removed the logging flags, too. This commit updates the vendoring to a version of the log package that allows adding the log flags to our flag set explicitly.	2016-07-20 17:27:39 +02:00
Dmitry Vorobev	273e457da4	web: return status code and error message for config resource	2016-07-15 10:15:24 +02:00
Fabian Reinartz	59d26e8536	web: add -web.route-prefix flag Fixes #1191	2016-07-07 11:49:16 +02:00
Fabian Reinartz	8c24dfdb86	cmd/prometheus: use own flag set Fixes #1743	2016-07-03 14:23:31 +02:00
Fabian Reinartz	dd57e7ef5c	Merge pull request #1699 from prometheus/fabxc-multiam notifier: dispatch to multiple Alertmanagers	2016-06-06 12:01:41 +02:00
Fabian Reinartz	9baf120cd5	notifier: dispatch to multiple Alertmanagers This commit extends the notifier to dispatch alert batches to multiple Alertmanagers concurrently. It changes the `-alertmanager.url` flag to accept a comma separated list of URLs and/or to be set multiple times.	2016-06-06 11:41:10 +02:00
beorn7	99881ded63	Make the number of fingerprint mutexes configurable With a lot of series accessed in a short timeframe (by a query, a large scrape, checkpointing, ...), there is actually quite a significant amount of lock contention if something similar is running at the same time. In those cases, the number of locks needs to be increased. On the same front, as our fingerprints don't have a lot of entropy, I introduced some additional shuffling. With the current state, only changes in the least significant bits of a FP would matter.	2016-06-02 19:18:00 +02:00
beorn7	da8cb10b43	Partition the status tab into items in a dropdown I got feedback from different sources about rules and targets being too heavy in the status tab if there are lots of them. This change also allows for more fine-granular locking.	2016-05-18 18:13:55 +02:00
Steve Durrheimer	399d5c6375	Make version information consistent between prometheus components	2016-05-05 22:33:18 +02:00
beorn7	865d16f870	Rename Gorilla into varbit	2016-03-23 16:30:41 +01:00
beorn7	8cdced3850	Implement Gorilla-inspired chunk encoding This is not a verbatim implementation of the Gorilla encoding. First of all, it could not, even if we wanted, because Prometheus has a different chunking model (constant size, not constant time). Second, this adds a number of changes that improve the encoding in general or at least for the specific use case of Prometheus (and are partially only possible in the context of Prometheus). See comments in the code for details.	2016-03-17 14:47:08 +01:00
Tobias Schmidt	2f151d02eb	Merge pull request #1456 from prometheus/validate-alertmanager-url Validate alertmanager URL	2016-03-07 20:09:46 -05:00
Tobias Schmidt	7763bbd993	Validate alertmanager URL	2016-03-07 20:07:17 -05:00
beorn7	b6fdb355d7	Move dump-heads into its own tool	2016-03-07 16:30:19 +01:00
beorn7	f193f2b8ef	Add a command to promtool that dumps metadata of heads.db I needed this today for debugging. It can certainly be improved, but it's already quite helpful. I refactored the reading of heads.db files out of persistence, which is an improvement, too. I made minor changes to the cli package to allow outputting via the io.Writer interface.	2016-03-07 16:21:57 +01:00
Fabian Reinartz	bfa8aaa017	Rename notification to notifier	2016-03-01 12:39:08 +01:00
Fabian Reinartz	fce17b41c5	Merge pull request #1408 from prometheus/hostname Log argument parse errors	2016-02-19 12:22:12 +01:00
Fabian Reinartz	e62677d7ba	Log argument parse errors Fixes #1407	2016-02-19 12:20:10 +01:00
Ignacio Carbajo	6a323b1e6d	Fix minor typo	2016-02-17 22:52:44 +00:00
beorn7	ec08c9a391	Rework the way to communicate backpressure (AKA suspended ingestion) This gives up on the idea to communicate through the Append() call (by either not returning as it is now or returning an error as suggested/explored elsewhere). Here I have added a Throttled() call, which has the advantage that it can be called before a whole _batch_ of Append()'s. Scrapes will happen completely or not at all. Same for rule group evaluations. That's a highly desired behavior (as discussed elsewhere). The code is even simpler now as the whole ingestion buffer could be removed. Logging of throttled mode has been streamlined and will create at most one message per minute.	2016-02-01 14:45:44 +01:00
Fabian Reinartz	d9f836e5b8	Merge pull request #1340 from prometheus/validate-externa-url Validate URL parameters	2016-01-27 15:49:08 +01:00
beorn7	a2cd479058	Fix calculation of chunks to persist after restart Since we are not overestimating the number of chunks to persist anymore, this commit also adjusts the default value for -storage.local.memory-chunks. Update of documentation will follow.	2016-01-25 19:33:51 +01:00
Tobias Schmidt	122d73858d	Validate URL parameters	2016-01-25 00:37:09 -05:00
Julius Volz	b150c5768c	Add missing word in comment.	2016-01-21 01:37:08 +01:00
Fabian Reinartz	7e1b39c682	Fix startup/teardown order, add documentation	2016-01-18 17:34:25 +01:00
beorn7	4221c7de5c	Improve handling of series file truncation If only very few chunks are to be truncated from a very large series file, the rewrite of the file is a large overhead. With this change, a certain ratio of the file has to be dropped to make it happen. While only causing disk overhead at about the same ratio (by default 10%), it will cut down I/O by a lot in the above scenario.	2016-01-11 16:42:10 +01:00
Fabian Reinartz	37d80c4b25	Fix premature rule evaluation This commit prevents rule evaluation from starting until after the storage is ready.	2016-01-08 17:51:22 +01:00
Richard Hartmann	7da42eee6e	main.go: Remove warning about external_labels	2016-01-07 11:15:14 +01:00
Julius Volz	87d1831f12	Document INFLUXDB_PW env var in username flag Fixes https://github.com/prometheus/prometheus/issues/1281	2016-01-04 00:18:41 +01:00
Fabian Reinartz	62075aa037	Reduce noisy no-alertmanager warning	2015-12-17 15:42:26 +01:00
Fabian Reinartz	52e5224f5a	Refactor rules/ package	2015-12-17 15:42:25 +01:00
Fabian Reinartz	2c8a96ecdc	Adjust notification handler flags	2015-12-11 15:17:32 +01:00
Fabian Reinartz	e114ce0ff7	Refactor notification handler	2015-12-11 15:17:32 +01:00
Fabian Reinartz	a542cc8609	Remove -web.use-local-assets	2015-11-11 17:58:03 +01:00
Corentin Chary	a2e4439086	Add support for remote storage on Graphite Allows using Graphite over tcp or udp. Metrics labels and values are used to construct a valid Graphite path in a way that will allow us to eventually read them back and reconstruct the metrics. For example, this metric: model.Metric{ model.MetricNameLabel: "test:metric", "testlabel": "test:value", "testlabel2": "test:value", } Will become: test:metric.testlabel=test:value.testlabel2=test:value escape.go takes care of escaping values to match Graphite character set, it basically uses percent-encoding as a fallback which will work pretty well in the graphite/grafana world. The remote storage module also has an optional 'prefix' parameter to prefix all metrics with a path (for example, 'prometheus.'). Graphite URLs are simply in the form tcp://host:port or udp://host:port.	2015-11-10 07:58:57 +01:00
Fabian Reinartz	e3b6ec9784	Switch to common/log	2015-10-03 10:21:43 +02:00
Julius Volz	dac26cef71	Rename global "labels" config option to "external_labels".	2015-09-29 20:54:20 +02:00
Julius Volz	24d0d9190e	Make -web.external-url help string more verbose.	2015-09-16 20:35:23 +02:00
Julius Volz	eeb1da36ac	Fix InfluxDB write support to work with InfluxDB 0.9.x. Because the InfluxDB client library currently pulls in multiple MBs of unnecessary dependencies, I have modified and cut up the vendored version to only pull in the few pieces that are actually needed. On InfluxDB's side, this dependency issue is tracked in: https://github.com/influxdb/influxdb/issues/3447 Hopefully, it will be resolved soon. If a password is needed for InfluxDB, it may be supplied via the INFLUXDB_PW environment variable.	2015-09-16 17:40:03 +02:00
Julius Volz	af513468eb	Fix some dead code, missing error checks, shadowings. I applied https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567 and was greeted with a deluge of warnings, most of which were not applicable or really fixable realistically. These are some of the first ones I decided to fix.	2015-09-14 12:21:34 +02:00
Julius Volz	d73c8a4f0b	Remove notice about 0.14.x config file format change.	2015-09-11 16:43:04 +02:00
Jimmi Dyson	ec04ba38a2	Kubernetes SD config check	2015-09-09 13:24:44 +01:00
Jimmi Dyson	a1574aa2b3	Move TLS options to scrape config Fixes #1013, fixes #989	2015-09-09 09:52:21 +01:00
Fabian Reinartz	d839980fcb	Merge pull request #1051 from prometheus/globallabels Change global label handling	2015-09-03 16:52:59 +02:00
Fabian Reinartz	8fa719f778	Attach global labels to remote storage samples	2015-09-03 16:38:04 +02:00
Fabian Reinartz	5fed076a76	Attach global labels to outgoing alerts.	2015-09-03 16:38:04 +02:00
Fabian Reinartz	9bbd9264e2	Add global labels to federation	2015-09-03 16:38:03 +02:00
Silas Snider	b2cb637f97	Add instrumentation around configuration reloads. This commit enables automation (and alerting) around attempts to update prometheus server configuration automatically.	2015-09-02 10:08:51 -07:00
Julius Volz	995d3b831d	Fix most golint warnings. This is with `golint -min_confidence=0.5`. I left several lint warnings untouched because they were either incorrect or I felt it was better not to change them at the moment.	2015-08-26 12:44:46 +02:00
Julius Volz	274e9d6955	Exit when web server encounters a startup error	2015-08-20 18:23:57 +02:00
Fabian Reinartz	18c0f347a3	Fix loop-reloading on shutdown	2015-08-14 16:29:34 +02:00
Jan Berktold	2bf7048dbb	Add reload handler to web	2015-08-11 11:27:15 +02:00
Fabian Reinartz	73f1cc807d	Check token and cert file existence in promtool	2015-08-10 11:42:29 +02:00
Fabian Reinartz	7a67472fc1	Resolve relative paths on configuration loading This moves the concern of resolving the files relative to the config file into the configuration loading itself. It also fixes #921 which did not load the cert and token files relatively.	2015-08-05 18:08:04 +02:00
Fabian Reinartz	7e615dcdf0	cmd/promtool: resolve rule files relative to config file	2015-07-03 15:10:37 +02:00
Fabian Reinartz	feb8a03503	rules: load rule files relative to a base dir	2015-07-03 15:10:37 +02:00
Julius Volz	fcff35b43e	Consolidate external reachability flags into one. Besides fixing https://github.com/prometheus/prometheus/issues/805 by making the entire externally reachable server URL configurable, this adds tests for the "globalURL" template function and makes it easier to test other such functions in the future. This breaks the `web.Hostname` flag (and introduces `web.external-url`). This flag is likely only used by few users, so I hope that's justifiable. Fixes https://github.com/prometheus/prometheus/issues/805	2015-07-03 13:39:10 +02:00
Fabian Reinartz	b201725d1c	cmd/prometheus: fix remote storage fanout	2015-06-26 01:34:51 +02:00
Julius Volz	8887f1e1a2	Merge pull request #853 from prometheus/fabxc/help cmd/prometheus: improve help output	2015-06-25 18:59:01 +02:00
Fabian Reinartz	525070419b	cmd/prometheus: improve help output	2015-06-25 18:53:51 +02:00
Fabian Reinartz	bcc8101d9e	cmd/promtool: fix missing builddate in version info	2015-06-25 17:21:24 +02:00
Fabian Reinartz	23e77450ff	main: cleanup initialization of remote storage.	2015-06-23 18:24:48 +02:00
Fabian Reinartz	ccbc801d19	Merge pull request #816 from prometheus/fabxc/promctl Create promtool command	2015-06-22 16:40:09 +02:00
Fabian Reinartz	890c1a7e74	cmd/promtool: add promtool command. The promtool command should bundle multiple commands that help in maintaining a running Prometheus server.	2015-06-22 16:06:18 +02:00
Fabian Reinartz	f97db8d4e5	cmd/prometheus: fix version output	2015-06-18 12:53:00 +02:00
Fabian Reinartz	39edc2df7a	version: move version information into separate package. Version information is determined at build-time and thus there is no need to pass it down from main. In its own package it can be used from various other packages.	2015-06-16 14:48:29 +02:00
Fabian Reinartz	de66e32a4d	cmd/prometheus: create new main package.	2015-06-15 19:01:06 +02:00

759 commits