Commit graph

314 commits

Author SHA1 Message Date
Lucas Serven 70c8b2c63c
cmd/prometheus: buffer signal chans
According to the GoDoc for os.Signal [0]:

> Package signal will not block sending to c: the caller must ensure that
> c has sufficient buffer space to keep up with the expected signal rate.
> For a channel used for notification of just one signal value, a buffer
> of size 1 is sufficient.

[0] https://golang.org/pkg/os/signal/#Notify

Signed-off-by: Lucas Serven <lserven@gmail.com>
2018-11-14 10:33:28 +01:00
Frederic Branczyk bda9781ccd
Merge pull request #3839 from brancz/remove-old-alert-record
promql: Remove old and unused alerting/reconding syntax
2018-11-06 15:53:27 +01:00
Simon Pasquier a30348f1a4 discovery: add config label to discovered targets metric (#4753)
* discovery: add labels to discovered targets metric

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-10-18 16:46:59 +01:00
Callum Styan 9bca041285 WIP: keep track of samples per query, set a max # of samples (#4513)
* keep track of samples per query, set a max # of samples that can be in
memory at once

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2018-10-02 12:59:19 +01:00
Tom Wilkie 4c52400708
Limit concurrent remote reads. (#4656)
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-09-25 20:07:34 +01:00
Ganesh Vernekar 5790d23fd8 Unit testing for rules (#4350)
* Unit testing for rules
* Specifying order of group evaluation in unit tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-09-25 17:06:26 +01:00
Tom Wilkie 457e4bb58e
Limit the number of samples remote read can return. (#4532)
* Limit the number of samples remote read can return.

- Return 413 entity too large.
- Limit can be set be a flag.  Allow 0 to mean no limit.
- Include limit in error message.
- Set default limit to 50M (* 16 bytes = 800MB).

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-09-05 15:50:50 +02:00
Chris Marchbanks 63ed9d1b70 Send EndsAt along with alerts (#4550)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-28 16:05:00 +01:00
Chris Marchbanks 87f1dad16d throttle resends of alerts to 1 minute by default (#4538)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2018-08-27 17:41:42 +01:00
Krasi Georgiev 12fe204ea6
move runtime debug funcs in own package (#4494)
To make local debuging with `go run` easyer moved all files into a
dedicate package `runtime`.
This allows running prometheus just by using `go run main.go` instead of
passing mani files like `go run main.go limits_default.go ...`

Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
2018-08-22 13:41:11 +03:00
Simon Pasquier 08c2f50382
Merge pull request #4418 from simonpasquier/log-vm-limits
prometheus: log virtual memory limits
2018-08-07 16:27:46 +02:00
Frederic Branczyk b0b3e3dd74
promql: Remove old and unused alerting/reconding syntax
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
2018-08-07 15:14:06 +02:00
Dave Henderson 73a08f0045 promtool - Adding --step flag to 'query range' subcommand (#4454)
Signed-off-by: Dave Henderson <dhenderson@gmail.com>
2018-08-05 11:03:18 +02:00
Julius Volz 90521a65f8
Remove error return value from NotifyFunc() (#4459)
It's always nil and we also forgot to check it.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-08-04 21:31:12 +02:00
Ganesh Vernekar f1db699dff Persist alert 'for' state across restarts (#4061)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2018-08-02 11:18:24 +01:00
Simon Pasquier a94450c288 Fix build for openbsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-31 14:41:30 +02:00
Simon Pasquier 141c188ae6 Enforce conversion for freebsd
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 14:58:56 +02:00
Simon Pasquier 208d21a393 Add comment and print units
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-26 10:26:58 +02:00
Simon Pasquier ba22b10113 prometheus: log virtual memory limits
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-07-25 15:51:27 +02:00
Daisy T a3376e8f36 add query labels command to promtool (#4346)
Signed-off-by: Daisy T <daisyts@gmx.com>
2018-07-18 16:27:28 +02:00
Julius Volz 95dfb1b1dd
Add missing import to promtool, fix build (#4395)
Sorry, I used GitHub's web-based merge-conflict-resolution editor on
https://github.com/prometheus/prometheus/pull/4308 and it didn't show me
test errors afterwards, but maybe they didn't run again or I should have
waited or something.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 10:26:45 +02:00
Shubheksha 125da3b812 promtool: add command for querying series (#4308)
Signed-off-by: Shubheksha Jalan <jshubheksha@gmail.com>
2018-07-18 10:15:58 +02:00
Julius Volz 03aa3a3de8
main: Improve / clean up error messages (#4286)
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-07-18 09:58:40 +02:00
Chih-Hung Yeh 912d19fb85 Add 3 commands in promtool for getting debug information from prometheus server (#4247)
`debug all` - all information
`debug metrics` - metrics  information
`debug pprof` - profiling  information

the final result is compressed in a `tar.gz` file

Signed-off-by: chyeh <chyeh.taiwan@gmail.com>
2018-07-18 10:52:01 +03:00
Brian Brazil 68e8b80ffe
Reorder startup and shutdown to prevent panics. (#4321)
Start rule manager only after tsdb and config is loaded.
Stop rule manager before tsdb to avoid writing to closed storage.
Wait for any in-progress reloads to complete before shutting
down rule manager, so that rule manager doesn't get updated after
being shut down.

Remove incorrect comment around shutting down query enginge.
Log when config reload is completed.

Fixes #4133
Fixes #4262

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2018-07-04 13:41:16 +01:00
Michael Khalil 78e0784d04 return error exit status in prometheus cli (#4296)
Signed-off-by: mikeykhalil <mikeyfkhalil@gmail.com>
2018-06-21 08:32:26 +01:00
Tom Wilkie 8acad5f3cd make it compile
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-05-24 15:40:24 +01:00
Tom Wilkie e51d6c4b6c Make remote flush deadline a command line param.
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2018-05-23 15:06:01 +01:00
Sneha Inguva c1a851074b promtool: add query instant and query range commands (#4085)
* promtool: add QueryInstant and QueryRange cmds

* promtool: add more query functions

* promtool: finished query Instant

* promtool: add range query

* promtool: add query command and address arguments

* vendor client and api
2018-04-26 20:41:56 +02:00
Mario Trangoni 464e747f1e fix some comments typos (#4059) 2018-04-08 10:51:54 +01:00
Sneha Inguva 7be846754a main: actor functionality comments 2018-04-01 11:19:30 -07:00
Marek Siarkowicz bb86c3f62b Report internal runtime information on status page (#3921)
Add information about tsdb, wal and config reload
2018-03-21 16:08:37 +00:00
James Turnbull ba5273a0ab Minor edits to help text (#3990) 2018-03-20 16:54:36 +00:00
Simon Pasquier e1fd96db25 cmd: fix help text (#3989) 2018-03-20 15:58:19 +00:00
ferhat elmas ffa673f7d8 General simplifications (#3887)
Another try as in #1516
2018-02-26 07:58:10 +00:00
Bartek Plotka 93a63ac5fd api: Added v1/status/flags endpoint. (#3864)
Endpoint URL: /api/v1/status/flags
Example Output:
```json
{
  "status": "success",
  "data": {
    "alertmanager.notification-queue-capacity": "10000",
    "alertmanager.timeout": "10s",
    "completion-bash": "false",
    "completion-script-bash": "false",
    "completion-script-zsh": "false",
    "config.file": "my_cool_prometheus.yaml",
    "help": "false",
    "help-long": "false",
    "help-man": "false",
    "log.level": "info",
    "query.lookback-delta": "5m",
    "query.max-concurrency": "20",
    "query.timeout": "2m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "false",
    "storage.tsdb.path": "data/",
    "storage.tsdb.retention": "15d",
    "version": "false",
    "web.console.libraries": "console_libraries",
    "web.console.templates": "consoles",
    "web.enable-admin-api": "false",
    "web.enable-lifecycle": "false",
    "web.external-url": "",
    "web.listen-address": "0.0.0.0:9090",
    "web.max-connections": "512",
    "web.read-timeout": "5m",
    "web.route-prefix": "/",
    "web.user-assets": ""
  }
}
```

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-02-21 08:49:02 +00:00
Fabian Reinartz 7ccd4b39b8 *: implement query params
This adds a parameter to the storage selection interface which allows
query engine(s) to pass information about the operations surrounding a
data selection.
This can for example be used by remote storage backends to infer the
correct downsampling aggregates that need to be provided.
2018-02-13 12:17:22 +01:00
Conor Broderick 5169ccf258
Merge pull request #3724 from simonpasquier/fix-bad-data-error
Don't reset FiredAt for inactive alerts
2018-02-01 16:18:09 +00:00
Krasi Georgiev b75428ec19 rename package retrieve to scrape
no fucnctinal changes just renaming retrieval to scrape
2018-02-01 09:55:07 +00:00
Krasi Georgiev 7858745c04 rename structs for consistency 2018-01-30 17:49:05 +00:00
Krasi Georgiev acc4197098 remove dicovery race for the context field 2018-01-29 15:18:07 +00:00
Julien Pivotto 8b20cb1e8d last config success time gauge: use SetToCurrentTime() (#3750)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2018-01-27 07:48:13 +00:00
Simon Pasquier 81c0ab69e0 Don't reset FiredAt for inactive alerts
Otherwise AlertManager receives resolved alerts where StartsAt is zero which
fails the validation.
2018-01-22 17:17:33 +01:00
Krasi Georgiev 719c579f7b refactor main execution reloadReady handling, update some comments 2018-01-17 18:14:24 +00:00
Krasi Georgiev 0eafaf32d3 set the correct config reloading execution for scraper and notifier 2018-01-17 13:06:56 +00:00
Krasi Georgiev 97f0461e29 refactor the config reloading execution 2018-01-17 12:02:13 +00:00
Krasi Georgiev 5260c650ec use the config hash for the map lookup 2018-01-16 11:10:54 +00:00
Krasi Georgiev 8369826808 comment to rethink the map reference for the notifier discovery 2018-01-16 09:47:53 +00:00
Krasi Georgiev d12e6f29fc discovery manager ApplyConfig now takes a direct ServiceDiscoveryConfig so that it can be used for the notify manager
reimplement the service discovery for the notify manager

Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2018-01-15 13:39:44 +00:00
Shubheksha Jalan 0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Goutham Veeramachaneni 35a6ffbaf3
Merge pull request #3587 from krasi-georgiev/web-test-error-check
handle web_test webhandler errors.
2018-01-10 22:03:25 +05:30
Shubheksha Jalan ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix tagetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Brian Brazil ecc24b554d
Hide block duration flags. (#3618)
Users are starting to use these mistakenly thinking they'll help
with issues, and thus causing some confusion.
Thus hide them and make it clear that they're only there for testing
reasons.
2017-12-24 12:13:48 +00:00
Krasi Georgiev c94fa731aa bypass the proxy for the tests 2017-12-20 18:21:10 +00:00
Krasi Georgiev ad66476c4f fix flaky main.go test and simplify a bit 2017-12-19 15:07:49 +00:00
Fabian Reinartz 2881d73ed8
Merge pull request #3362 from krasi-georgiev/discovery-refactoring
Decouple the discovery and refactor the retrieval package
2017-12-19 12:56:34 +01:00
Goutham Veeramachaneni 9c9f96b2c0
Merge pull request #3529 from krasi-georgiev/main-integration-test
main.go integration test for Startup interrupting.
2017-12-18 22:12:13 -06:00
Krasi Georgiev 587dec9eb9 rebased and resolved conflicts with the new Discovery GUI page
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-18 20:10:03 +00:00
Krasi Georgiev 1ec76d1950 rearange the contexts variables and logic
split the groupsMerge function to set and get
other small nits
2017-12-18 17:23:47 +00:00
Krasi Georgiev 6ff1d5c51e add the scrape manager config reloader
handle errors with invalid scrape config
2017-12-18 17:23:47 +00:00
Krasi Georgiev b0d4f6ee08 resolved merge confilc in main.go 2017-12-18 17:23:46 +00:00
Krasi Georgiev c5cb0d2910 simplify naming and API. 2017-12-18 17:22:50 +00:00
Krasi Georgiev 9c61f0e8a0 scrape pool doesn't rely on context as Stop() needs to be blocking to prevent Scrape loops trying to write to a closed TSDB storage. 2017-12-18 17:22:49 +00:00
Krasi Georgiev e405e2f1ea refactored discovery 2017-12-18 17:22:49 +00:00
pasquier-s 2440696961 Log file descriptor limits at startup (#3567)
Fixes #3564
2017-12-11 13:01:53 +00:00
Alberto Cortés 29da2fb9cd testutil: update to go1.9 testing.Helper 2017-12-08 19:06:53 +01:00
Alberto Cortés 8f6a9f7833 config: simplify tests by using testutil.NotOk (#3289)
Also include filename in all LoadFile errors

Also add mesage to testuitl.NotOk so we can identify failing tests when
using table driven tests.
2017-12-08 16:52:25 +00:00
Krasi Georgiev 740662644e write to temp dir and remove it at the end.
Signed-off-by: Krasi Georgiev <krasi.root@gmail.com>
2017-12-06 10:45:58 +00:00
Brian Brazil b97f4cf48c Add metrics for rule group interval and last duration. 2017-12-04 11:44:38 +00:00
Krasi Georgiev 2c2a962da3 main.go integration test for Startup interrupting. 2017-12-01 10:58:01 +00:00
Goutham Veeramachaneni 823b7f90b3
Use the files globbed files and not the files in cfg
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-11-30 17:08:34 +05:30
Fabian Reinartz 62461379b7 rules: decouple notifier packages
The dependency on the notifier packages caused a transitive dependency
on discovery and with that all client libraries our service discovery
uses.
2017-11-27 16:38:14 +01:00
Fabian Reinartz 4d964a0a0d rules: make glob expansion a concern of main 2017-11-24 08:22:57 +01:00
Fabian Reinartz bd9f7460eb rules: remove config package dependency 2017-11-24 07:57:54 +01:00
Fabian Reinartz 2d0e3746ac rules: remove dependency on promql.Engine 2017-11-24 07:57:54 +01:00
Krasi Georgiev e2f4850fea Refactor main.go with oklog/pkg/group actors pattern 2017-11-11 12:33:15 +00:00
Thibault Chataigner fc4406201e Tsdb StartTime : Use a simplier way to compute StartTime 2017-10-25 17:41:00 +02:00
Julius Volz 099df0c5f0 Migrate "golang.org/x/net/context" -> "context" (#3333)
In some places, where ctxhttp or gRPC are concerned, we still need to use the
old contexts.
2017-10-24 21:21:42 -07:00
Julius Volz 9d43176ab3 Remove unused printVersion variable (#3335)
Kingpin now automatically does this via --version.
2017-10-23 08:50:13 +01:00
Julius Volz 82c5b98496 Capitalize Prometheus in startup message (#3332)
Hey, branding :)
2017-10-23 08:49:28 +01:00
Thibault Chataigner bf4a279a91 Remote storage reads based on oldest timestamp in primary storage (#3129)
Currently all read queries are simply pushed to remote read clients.
This is fine, except for remote storage for wich it unefficient and
make query slower even if remote read is unnecessary.
So we need instead to compare the oldest timestamp in primary/local
storage with the query range lower boundary. If the oldest timestamp
is older than the mint parameter, then there is no need for remote read.
This is an optionnal behavior per remote read client.

Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>
2017-10-18 12:08:14 +01:00
Julius Volz 5f715f5733 Fix typo in flag description (#3302) 2017-10-16 23:00:05 +01:00
Tobias Schmidt 3589f2f1d4 Merge pull request #3285 from jlevesy/use-testutils-in-cmd-subpackage
Use testutil assertion helpers in cmd package
2017-10-13 00:12:39 +02:00
Julien Levesy d7b4fa8d78 use testutil assertions in the cmd/prometheus package 2017-10-12 13:45:38 +02:00
Mathieu Pasquet 38afa507bb Provide better errors messages in commandline
Instead or only printing the help message, which is not always helpful.
For example, when upgrading from prometheus v1, the retention time value
format has changed and now only accepts one unit (e.g. "15d") where it
previously allowed more complex strings (e.g. "360h0m0s").

This commit provides the error message as an explanation for the parsing
failure.
2017-10-09 16:25:50 +02:00
Marc Sluiter 6a633eece1 Added go-conntrack for monitoring http connections (#3241)
Added metrics for in- and outgoing traffic with go-conntrack.
2017-10-06 11:22:19 +01:00
Fabian Reinartz 2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
Paul Gier 08af129b4d cmd/prometheus: don't allow quotes at beginning or end of url
This prevents accidental copy/paste error where a the web.external-url
or alertmanager.url params could have an extra set of quotes.
See also: https://github.com/prometheus/prometheus/issues/1229
2017-10-04 10:10:02 -05:00
Paul Gier f79b55d057 cmd/prometheus: remove govalidator for url validation
The usage of govalidator is redundant with the call to url.Parse for
url validation. Removing it has the following benefits:

 - The explicit error message is displayed instead of just a generic
   valid/invalid message
 - Slightly smaller code with one fewer external dependency
 - Speed improvement by removing duplicate call to url.Parse (inside
   govalidator.IsURL()
 - Resolves issue #2717

The only potential drawback of removing govalidator is that certain
URLs will be considered valid which were previously invalid. For example:

 - URLs with hostnames that start and/or end with an underscore (http://_example.com_)
 - URLs with hostnames that contain some special characters (http://foo&*bar.org)

These are valid URIs according to RFC 3986 and valid domain names per RFC 2181,
however they are not valid hostnames per RFC 952.
2017-10-04 10:08:34 -05:00
Fabian Reinartz 7b02bfee0a web: start web handler while TSDB is starting up 2017-09-20 15:03:19 +02:00
Goutham Veeramachaneni f5aed810f9 logging: Port to common/promlog
Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2017-09-15 12:40:50 +05:30
Fabian Reinartz d21f149745 *: migrate to go-kit/log 2017-09-08 22:01:51 +05:30
Fabian Reinartz c70379e1c7 Merge branch 'dev-2.0' of github.com:prometheus/prometheus into dev-2.0 2017-09-04 13:10:50 +02:00
Fabian Reinartz fffe51fb03 Add mutex and block profiling via envvar 2017-09-04 13:10:32 +02:00
Ben Kochie 59aca4138b Fix staticcheck issues. 2017-08-28 17:29:01 +02:00
Matt Bostock 64973f5c65 cmd/prometheus: Fix capitalisation in log line (#3123)
Change 'Ready' to 'ready'.
2017-08-28 11:03:25 +01:00
Mark Adams 77c816b309 Fix pprof endpoints when -web.route-prefix or -web.external-url is used (#3054)
Whenever a route prefix is applied, the router prepends the prefix to
the URL path on the request. For most handlers, this is not an issue
because the request's path is only used for routing and is not actually
needed by the handler itself. However, Prometheus delegates the handling
of the /debug/* endpoints to the http.DefaultServeMux which has it's own
routing logic that depends on the url.Path. As a result, whenever a
prefix is applied, the prefixed URL is passed to the DefaultServeMux
which has no awareness of the prefix and returns a 404.

This change fixes the issue by creating a new serveDebug handler which
routes requests /debug/* requests to appropriate net/http/pprof handler
and removing the net/http/pprof import in cmd/prometheus since it is no
longer necessary.

Fixes #2183.
2017-08-23 00:00:56 +01:00
Callum Styan 8912f81ffe check if file_sd files exist in checkConfig 2017-08-22 15:25:30 -07:00
Fabian Reinartz 25f3e1c424 Merge branch 'master' into mergemaster 2017-08-10 17:04:25 +02:00
KalivarapuReshma 686050d816 Change -config.file to --config.file in Readme and error message 2017-08-08 12:49:35 +05:30