prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-11-17 19:14:04 -08:00

Author	SHA1	Message	Date
Jesus Vazquez	c1b669bf9b	Add out-of-order sample support to the TSDB (#11075 ) * Introduce out-of-order TSDB support This implementation is based on this design doc: https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing This commit adds support to accept out-of-order ("OOO") sample into the TSDB up to a configurable time allowance. If OOO is enabled, overlapping querying are automatically enabled. Most of the additions have been borrowed from https://github.com/grafana/mimir-prometheus/ Here is the list ist of the original commits cherry picked from mimir-prometheus into this branch: - `4b2198d7ec` - `2836e5513f` - `00b379c3a5` - `ff0dc75758` - `a632c73352` - `c6f3d4ab33` - `5e8406a1d4` - `abde1e0ba1` - `e70e769889` - `df59320886` Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Dieter Plaetinck <dieter@grafana.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * gofumpt files Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Add license header to missing files Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix OOO tests due to existing chunk disk mapper implementation Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix truncate int overflow Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Add Sync method to the WAL and update tests Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * remove useless sync Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Update minOOOTime after truncating Head * Update minOOOTime after truncating Head Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix lint Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Add a unit test Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Load OutOfOrderTimeWindow only once per appender Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix OOO Head LabelValues and PostingsForMatchers Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix replay of OOO mmap chunks Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Remove unnecessary err check Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Prevent panic with ApplyConfig Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Run OOO compaction after restart if there is OOO data from WBL Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Apply Bartek's suggestions Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Refactor OOO compaction Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Address comments and TODOs - Added a comment explaining why we need the allow overlapping compaction toggle - Clarified TSDBConfig OutOfOrderTimeWindow doc - Added an owner to all the TODOs in the code Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Run go format Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix remaining review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix tests Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> * Fix TestWBLAndMmapReplay test failure on windows Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Address most of the feedback Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Refactor the block meta for out of order Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix windows error Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> * Fix review comments Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com> Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com> Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com> Co-authored-by: Dieter Plaetinck <dieter@grafana.com> Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2022-09-20 22:35:50 +05:30
Cosrider	bef6556ca5	delete redundant alias (#11180 ) Signed-off-by: Cosrider <cosrider7@gmail.com> Signed-off-by: Cosrider <cosrider7@gmail.com>	2022-08-31 15:50:38 +02:00
Matthieu MOREL	8a01943abc	refactor (package config): move from github.com/pkg/errors to 'errors' and 'fmt' packages (#10724 ) Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>	2022-05-23 09:54:32 +02:00
Matthieu MOREL	e2ede285a2	refactor: move from io/ioutil to io and os packages (#10528 ) * refactor: move from io/ioutil to io and os packages * use fs.DirEntry instead of os.FileInfo after os.ReadDir Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>	2022-04-27 11:24:36 +02:00
Julien Pivotto	fb2da1f26a	Followup on tracing (#10338 ) * Simplify code by letting common deal with empty TLS config * Improve error message if we notice a user is putting an authorization header into its configuration. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2022-02-22 21:44:36 +01:00
Matej Gera	0acbe5e3f5	Tracing: Add additional options to align with the upstream exporter (#10276 ) * Enhance configuration Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-02-22 17:07:30 +01:00
DrAuYueng	5a6e26556b	Add an option to use the external labels as selectors for the remote read endpoint (#10254 ) * An option to ignore external_labels Signed-off-by: DrAuYueng <ouyang1204@gmail.com>	2022-02-16 22:12:47 +01:00
Julien Pivotto	9a2e93228e	Switch to grafana/regexp everywhere (#10268 ) Let's have a consistent library for regexp. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2022-02-13 00:58:27 +01:00
Julien Pivotto	8cb733d04c	Followup on OpenTelemetry migration (#10203 ) * Followup on OpenTelemetry migration - tracing_config: Change with_insecure to insecure, default to false. - tracing_config: Call SetDirectory to make TLS certificates relative to the Prometheus configuration - documentation: Change bool to boolean in the configuration - documentation: document type float - tracing: Always restart the tracing manager when TLS config is set to reload certificates - tracing: Always set TLS config, which could be used e.g. in case of potential redirects. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>\\	2022-01-29 23:56:44 +01:00
Matej Gera	2c61d29b2a	Tracing: Migrate to OpenTelemetry library (#9724 ) Signed-off-by: Matej Gera <matejgera@gmail.com>	2022-01-25 11:08:04 +01:00
Filip Petkovski	4855a0c067	Allow escaping a dollar sign when expanding external labels (#10129 ) * Allow escaping a dollar sign when expanding external labels There is currently no mechanism to natively escape a dollar sign in the os.Expand function. As a workaround, this commit modifies the external label expansion logic to treat a double dollar ($$) as a mechanism for escaping the dollar character. Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>	2022-01-17 16:43:55 +01:00
Bryan Boreham	1ed94142fc	remote-write: slow down retries to avoid DDOS (#9634 ) * remote-write: slow down retries to avoid DDOS Increase the default max retry time from 100ms to 5 seconds. Remote write calls are retried after a recoverable error such as the back-end returning 500. Prometheus waits the minimum time and retries, then doubles the wait on each subsequent retry until the maximum is reached. If some data is still getting through, remote-write will also increase shards, and the default maximum is 200. 200 shards sending every 100ms is 20 calls per second, to a back-end that is already in trouble. 5 seconds was chosen to match the default BatchSendDeadline: if we can afford to wait that long for no response, then we can wait the same time to retry. We will reach 5 seconds after 9 successive failures. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> * Update config doc for max_backoff change Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2021-11-09 14:08:24 -08:00
beorn7	c954cd9d1d	Move packages out of deprecated pkg directory This creates a new `model` directory and moves all data-model related packages over there: exemplar labels relabel rulefmt textparse timestamp value All the others are more or less utilities and have been moved to `util`: gate logging modetimevfs pool runtime Signed-off-by: beorn7 <beorn@grafana.com>	2021-11-09 08:03:10 +01:00
Mateusz Gozdek	1a6c2283a3	Format Go source files using 'gofumpt -w -s -extra' Part of #9557 Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>	2021-11-02 19:52:34 +01:00
Arthur Silva Sens	be2599c853	config: Make remote-write required for Agent mode (#9618 ) * config: Make remote-write required for Agent mode Signed-off-by: ArthurSens <arthursens2005@gmail.com>	2021-10-30 01:41:40 +02:00
Levi Harrison	bd57cd395e	Switch to common/sigv4 Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-08-26 09:37:19 -04:00
Martin Disibio	1bcd13d6b5	Exemplar resize (#8974 ) * Create experimental circular buffer resize method, benchmarks Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Optimize exemplar resize to only replay as many exemplars as needed Signed-off-by: Martin Disibio <mdisibio@gmail.com> * More comments, benchmark AddExemplar Signed-off-by: Martin Disibio <mdisibio@gmail.com> * optimizations Signed-off-by: Martin Disibio <mdisibio@gmail.com> * comment Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Slight refactor of resize benchmark + make use of resize via runtime reloadable storage config. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Some more config related changes. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Refactor to remove usage of noopExemplarStorage and avoid race condition when resizing from Head code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix or add comments to clarify some of the new behaviour. Signed-off-by: Callum Styan <callumstyan@gmail.com> * fix potential panics related to negative exemplar buffer lengths Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Callum Styan <callumstyan@gmail.com>	2021-07-20 10:22:57 +05:30
Levi Harrison	d5c3c567d3	Remote Write: Add max samples per metadata send (#8959 ) * Added MaxSamplesPerSend Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added tests Signed-off-by: Levi Harrison <git@leviharrison.dev> * Fixed order of require Signed-off-by: Levi Harrison <git@leviharrison.dev> * Added docs Signed-off-by: Levi Harrison <git@leviharrison.dev> * writes -> writesReceived Signed-off-by: Levi Harrison <git@leviharrison.dev> * Improved send loop Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-24 15:39:50 -07:00
Levi Harrison	b5f6f8fb36	Switched to go-kit/log Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-06-11 12:28:36 -04:00
hanjm	1df05bfd49	Add body_size_limit to prevent bad targets response large body cause Prometheus server OOM (#8827 ) Signed-off-by: hanjm <hanjinming@outlook.com>	2021-05-29 07:05:42 +08:00
Callum Styan	8fd73b1d28	Add Exemplar Remote Write support (#8296 ) * Write exemplars to the WAL and send them over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Update example for exemplars, print data in a more obvious format. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add metrics for remote write of exemplars. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix incorrect slices passed to send in remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * We need to unregister the new metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> * Order of exemplar append vs write exemplar to WAL needs to change. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Condense sample/exemplar delivery tests to parameterized sub-tests Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename test methods for clarity now that they also handle exemplars Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Rename counter variable. Fix instances where metrics were not updated correctly Signed-off-by: Martin Disibio <mdisibio@gmail.com> * Add exemplars to LoadWAL benchmark Signed-off-by: Callum Styan <callumstyan@gmail.com> * last exemplars timestamp metric needs to convert value to seconds with ms precision Signed-off-by: Callum Styan <callumstyan@gmail.com> * Process exemplar records in a separate go routine when loading the WAL. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address review comments related to clarifying comments and variable names. Also refactor sample/exemplar to enqueue prompb types. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Regenerate types proto with comments, update protoc version again. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Put remote write of exemplars behind a feature flag. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some of Ganesh's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Move exemplar remote write feature flag to a config file field. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address Bartek's review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allocate exemplar buffers in queue_manager if we're not going to send exemplars over remote write. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add ValidateExemplar function, validate exemplars when appending to head and log them all to WAL before adding them to exemplar storage. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address more reivew comments from Ganesh. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Add exemplar total label length check. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address a few last review comments Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Martin Disibio <mdisibio@gmail.com>	2021-05-06 13:53:52 -07:00
Damien Grisonnet	b50f9c1c84	Add label scrape limits (#8777 ) * scrape: add label limits per scrape Add three new limits to the scrape configuration to provide some mechanism to defend against unbound number of labels and excessive label lengths. If any of these limits are broken by a sample from a scrape, the whole scrape will fail. For all of these configuration options, a zero value means no limit. The `label_limit` configuration will provide a mechanism to bound the number of labels per-scrape of a certain sample to a user defined limit. This limit will be tested against the sample labels plus the discovery labels, but it will exclude the __name__ from the count since it is a mandatory Prometheus label to which applying constraints isn't meaningful. The `label_name_length_limit` and `label_value_length_limit` will prevent having labels of excessive lengths. These limits also skip the __name__ label for the same reasons as the `label_limit` option and will also make the scrape fail if any sample has a label name/value length that exceed the predefined limits. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: add metrics and alert to label limits Add three gauge, one for each label limit to easily access the limit set by a certain scrape target. Also add a counter to count the number of targets that exceeded the label limits and thus were dropped. This is useful for the `PrometheusLabelLimitHit` alert that will notify the users that scraping some targets failed because they had samples exceeding the label limits defined in the scrape configuration. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: apply label limits to __name__ label Apply limits to the __name__ label that was previously skipped and truncate the label names and values in the error messages as they can be very very long. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com> * scrape: remove label limits gauges and refactor Remove `prometheus_target_scrape_pool_label_limit`, `prometheus_target_scrape_pool_label_name_length_limit`, and `prometheus_target_scrape_pool_label_value_length_limit` as they are not really useful since we don't have the information on the labels in it. Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>	2021-05-06 09:56:21 +01:00
Levi Harrison	fa184a5fc3	Add OAuth 2.0 Config (#8761 ) * Introduced oauth2 config into the codebase Signed-off-by: Levi Harrison <git@leviharrison.dev>	2021-04-28 14:47:52 +02:00
Julien Pivotto	e635ca834b	Add environment variable expansion in external label values Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-03-30 01:36:28 +02:00
Julien Pivotto	49016994ac	Switch to alertmanager api v2 According to the 2.25 release notes, 2.26 should switch to alertmanager api v2 by default. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-03-20 01:01:10 +01:00
Robert Fratto	5b78aa0649	Contribute grafana/agent sigv4 code (#8509 ) * Contribute grafana/agent sigv4 code * address review feedback - move validation logic for RemoteWrite into unmarshal - copy configuration fields from ec2 SD config - remove enabled field, use pointer for enabling sigv4 * Update config/config.go * Don't provide credentials if secret key / access key left blank * Add SigV4 headers to the list of unchangeable headers. * sigv4: don't include all headers in signature * only test for equality in the authorization header, not the signed date * address review feedback 1. s/httpClientConfigEnabled/httpClientConfigAuthEnabled 2. bearer_token tuples to "authorization" 3. Un-export NewSigV4RoundTripper * add x-amz-content-sha256 to list of unchangeable headers * Document sigv4 configuration * add suggestion for using default AWS SDK credentials Signed-off-by: Robert Fratto <robertfratto@gmail.com> Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>	2021-03-08 12:20:09 -07:00
Julien Pivotto	93c6139bc1	Support follow_redirect This PR introduces support for follow_redirect, to enable users to disable following HTTP redirects. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-02-26 22:50:56 +01:00
Harkishen-Singh	79ba53a6c4	Custom headers on remote-read and refactor implementation to roundtripper. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2021-02-26 17:20:29 +05:30
Julien Pivotto	8787f0aed7	Update common to support credentials type Most of the backwards compat tests is done in common. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2021-02-18 23:28:22 +01:00
Harkishen-Singh	77c20fd2f8	Adds support to configure retry on Rate-Limiting from remote-write config. Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>	2021-02-16 14:52:49 +05:30
Nándor István Krácser	509000269a	remote_write: allow passing along custom HTTP headers (#8416 ) * remote_write: allow passing along custom HTTP headers Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * add warning Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * remote_write: add header valadtion Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * extend tests for bad remote write headers Signed-off-by: Nandor Kracser <bonifaido@gmail.com> * remote_write: add note about the authorization header Signed-off-by: Nandor Kracser <bonifaido@gmail.com>	2021-02-04 14:18:13 -07:00
gotjosh	4eca4dffb8	Allow metric metadata to be propagated via Remote Write. (#6815 ) * Introduce a metadata watcher Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage. Signed-off-by: gotjosh <josue@grafana.com> * Additional fixes after rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Rework samples/metadata metrics. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Use more descriptive variable names in MetadataWatcher collect. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix issues caused during rebasing. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix missing metric add and unneeded config code. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Address some review comments. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Fix metrics and docs Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Replace assert with require Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Bring back max_samples_per_send metric Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix tests Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> Co-authored-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>	2020-11-19 20:53:03 +05:30
Bryan Boreham	90fc6be70f	Default to bigger remote_write sends (#5267 ) * Default to bigger remote_write sends Raise the default MaxSamplesPerSend to amortise the cost of remote calls across more samples. Lower MaxShards to keep the expected max memory usage within reason. Signed-off-by: Bryan Boreham <bryan@weave.works> * Change default Capacity to 2500 To maintain ratio with MaxSamplesPerSend Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2020-09-09 14:00:23 -06:00
Andy Bursavich	4e6a94a27d	Invert service discovery dependencies (#7701 ) This also fixes a bug in query_log_file, which now is relative to the config file like all other paths. Signed-off-by: Andy Bursavich <abursavich@gmail.com>	2020-08-20 13:48:26 +01:00
Julien Pivotto	f482c7bdd7	Add per scrape-config targets limit (#7554 ) * Add per scrape-config targets limit Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-30 14:20:24 +02:00
Julien Pivotto	0cca23d3ed	DigitalOcean, Docker Swarm: properly load files Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-07-22 00:01:19 +02:00
Julien Pivotto	9d9bc524e5	Add query log (#6520 ) * Add query log, make stats logged in JSON like in the API Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2020-01-08 13:28:43 +00:00
Callum Styan	67838643ee	Add config option for remote job name (#6043 ) * Track remote write queues via a map so we don't care about index. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Support a job name for remote write/read so we can differentiate between them using the name. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Remote write/read has Name to not confuse the meaning of the field with scrape job names. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Split queue/client label into remote_name and url labels. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Don't allow for duplicate remote write/read configs. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Ensure we restart remote write queues if the hash of their config has not changed, but the remote name has changed. Signed-off-by: Callum Styan <callumstyan@gmail.com> * Include name in remote read/write config hashes, simplify duplicates check, update test accordingly. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-12-12 12:47:23 -08:00
Simon Pasquier	cccd542891	*: avoid missed Alertmanager targets (#6455 ) This change makes sure that nearly-identical Alertmanager configurations aren't merged together. The config's identifier was the MD5 hash of the configuration serialized to JSON but because `relabel.Regexp` has no public field and doesn't implement the JSON.Marshaler interface, it was always serialized to "{}". In practice, the identifier can be based on the index of the configuration in the list. Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-12-12 17:00:19 +01:00
Chris Marchbanks	a6a55c433c	Improve desired shards calculation (#5763 ) The desired shards calculation now properly keeps track of the rate of pending samples, and uses the previously unused integralAccumulator to adjust for missing information in the desired shards calculation. Also, configure more capacity for each shard. The default 10 capacity causes shards to block on each other while sending remote requests. Default to a 500 sample capacity and explain in the documentation that having more capacity will help throughput. Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>	2019-08-13 10:10:21 +01:00
Max Leonard Inden	41c22effbe	config&notifier: Add option to use Alertmanager API v2 With v0.16.0 Alertmanager introduced a new API (v2). This patch adds a configuration option for Prometheus to send alerts to the v2 endpoint instead of the defautl v1 endpoint. Signed-off-by: Max Leonard Inden <IndenML@gmail.com>	2019-06-21 16:33:53 +02:00
Callum Styan	e9129abeff	Remove max_retries from queue_config since it's not used in remote write anymore. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-06-10 12:43:08 -07:00
Tariq Ibrahim	8fdfa8abea	refine error handling in prometheus (#5388 ) i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors. ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives. iii) Does away with the use of fmt package for errors in favour of pkg/errors Signed-off-by: tariqibrahim <tariq181290@gmail.com>	2019-03-26 00:01:12 +01:00
Callum Styan	5603b857a9	Check if label value is valid when unmarhsaling external labels from YAML, add a test to config_tests for valid/invalid external label value. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-03-18 20:31:12 +00:00
Tom Wilkie	c7b3535997	Use pkg/relabelling in remote write. - Unmarshall external_labels config as labels.Labels, add tests. - Convert some more uses of model.LabelSet to labels.Labels. - Remove old relabel pkg (fixes #3647). - Validate external label names. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>	2019-03-18 20:31:12 +00:00
Julien Pivotto	4397916cb2	Add honor_timestamps (#5304 ) Fixes #5302 Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>	2019-03-15 10:04:15 +00:00
Simon Pasquier	027d2ece14	config: resolve more file paths (#5284 ) Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-03-12 10:24:15 +00:00
Simon Pasquier	c8a1a5a93c	discovery/kubernetes: fix support for password_file and bearer_token_file (#5211 ) * discovery/kubernetes: fix support for password_file Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Create and pass custom RoundTripper to Kubernetes client Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Use inline HTTPClientConfig Signed-off-by: Simon Pasquier <spasquie@redhat.com>	2019-02-20 11:22:34 +01:00
Callum Styan	6f69e31398	Tail the TSDB WAL for remote_write This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down. We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes. Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to acheive parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases. As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s). This changes also includes the following optimisations: - only marshal the proto request once, not once per retry - maintain a single copy of the labels for given series to reduce GC pressure Other minor tweaks: - only reshard if we've also successfully sent recently - add pending samples, latest sent timestamp, WAL events processed metrics Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype) Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes) Signed-off-by: Callum Styan <callumstyan@gmail.com>	2019-02-12 11:39:13 +00:00
Bartek Płotka	62c8337e77	Moved configuration into `relabel` package. (#4955 ) Adapted top dir relabel to use pkg relabel structs. Removal of this in a separate tracked here: https://github.com/prometheus/prometheus/issues/3647 Signed-off-by: Bartek Plotka <bwplotka@gmail.com>	2018-12-18 11:26:36 +00:00

1 2 3 4 5

236 commits