Commit graph

5543 commits

Author SHA1 Message Date
Richard Kiene f3d9692d09 Add Joyent Triton discovery 2017-01-17 20:34:32 +00:00
Fabian Reinartz 598e2f01c0 retrieval: don't erronously break appending 2017-01-17 08:39:18 +01:00
Fabian Reinartz d80a3de235 pkg/textparse: add documentation 2017-01-17 08:16:47 +01:00
Brian Brazil c1b547a90e Only checkpoint chunkdescs and series that need persisting. (#2340)
This decreases checkpoint size by not checkpointing things
that don't actually need checkpointing.

This is fully compatible with the v2 checkpoint format,
as it makes series appear as though the only chunksdescs
in memory are those that need persisting.
2017-01-17 00:59:38 +00:00
Fabian Reinartz 5418a42965 Merge pull request #2345 from Bplotka/fixed-alertmanager-flag-auth
Fixed regression in `-alertmanager.url flag`. Basic auth was ignored.
2017-01-16 18:29:51 +01:00
Bartek Plotka 579e33f19a Fixed style issues. 2017-01-16 16:45:58 +00:00
Bartek Plotka d7febe97fa Fixed regression in -alertmanager.url flag. Basic auth was ignored.
- Included basic auth parsing while parsing to AlertmanagerConfig
- Added test case

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2017-01-16 16:39:20 +00:00
Fabian Reinartz db48726a6b pkg/textparse: allocate single string per metric 2017-01-16 17:24:00 +01:00
Fabian Reinartz 157e698958 web/api: fix min/max timestamps to valid range 2017-01-16 14:09:59 +01:00
Fabian Reinartz 990e40c959 Merge pull request #2338 from brancz/alertmanager-api
web/api: add alertmanager api
2017-01-16 12:08:14 +01:00
Fabian Reinartz c691895a0f retrieval: cache series references, use pkg/textparse
With this change the scraping caches series references and only
allocates label sets if it has to retrieve a new reference.
pkg/textparse is used to do the conditional parsing and reduce
allocations from 900B/sample to 0 in the general case.
2017-01-16 12:03:57 +01:00
Frederic Branczyk bd92571bdd
web/api: make target and alertmanager api responses consistent 2017-01-16 11:53:00 +01:00
Fabian Reinartz 022714b60a Merge pull request #2341 from mattbostock/patch-1
Correct notifications_dropped description
2017-01-16 09:23:46 +01:00
Fabian Reinartz fb3ab9bdb7 pkg/textparse: add more benchmarking, align lex defs 2017-01-15 17:32:57 +01:00
Fabian Reinartz e44d80314d pkg/textparse: add tests and method to retrieve full labels 2017-01-14 19:30:19 +01:00
Fabian Reinartz 091a7f2395 pkg/textparse: add initial text parser 2017-01-14 16:39:04 +01:00
Matt Bostock 4160892109 Correct notifications_dropped description
The current description does not accurately describe when the metric is incremented.

Aside from Alertmanger missing from the configuration, `prometheus_notifications_dropped_total` is incremented when errors occur while sending alert notifications to Alertmanager, or because the notifications queue is full, or because the number of notifications to be sent exceeds the queue capacity.

I think calling these cases 'errors' in a generic sense is more useful than the current description.
2017-01-13 23:36:00 +00:00
Brian Brazil f64c231dad Allow checkpoints and maintenance to happen concurrently. (#2321)
This is essential on larger Prometheus servers, as otherwise
checkpoints prevent sufficient persisting of chunks to disk.
2017-01-13 17:24:19 +00:00
Frederic Branczyk 389c6d0043
web/api: add alertmanager api 2017-01-13 15:30:20 +01:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
Brian Brazil 1dcb7637f5 Add various persistence related metrics (#2333)
Add metrics around checkpointing and persistence

* Add a metric to say if checkpointing is happening,
and another to track total checkpoint time and count.

This breaks the existing prometheus_local_storage_checkpoint_duration_seconds
by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds
as the former name is more appropriate for a summary.

* Add metric for last checkpoint size.

* Add metric for series/chunks processed by checkpoints.

For long checkpoints it'd be useful to see how they're progressing.

* Add metric for dirty series

* Add metric for number of chunks persisted per series.

You can get the number of chunks from chunk_ops,
but not the matching number of series. This helps determine
the size of the writes being made.

* Add metric for chunks queued for persistence

Chunks created includes both chunks that'll need persistence
and chunks read in for queries. This only includes chunks created
for persistence.

* Code review comments on new persistence metrics.
2017-01-11 15:11:19 +00:00
Björn Rabenstein 6ce97837ab Merge pull request #2327 from prometheus/beorn7/vendoring
vendoring: Update prometheus/common to pull in bug fixes
2017-01-09 13:28:36 +01:00
beorn7 86ec87b78f vendoring: Update prometheus/common to pull in bug fixes
In particular the one for https://github.com/prometheus/common/issues/72.
2017-01-09 12:25:17 +01:00
Fabian Reinartz 3302bb1eb1 Merge pull request #2323 from prometheus/beorn7/retrieval
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still it's weird why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
André Carvalho c43dfaba1c Add max concurrent and current queries engine metrics (#2326)
* Add max concurrent and current queries engine metrics

This commit adds two metrics to the promql/engine: the
number of max concurrent queries, as configured by the flag, and
the number of current queries being served+blocked in the engine.
2017-01-07 14:41:25 +00:00
beorn7 767c0709b1 Retrieval: Avoid copying Target
retreival.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.

It might even have caused the issues reported as #2266 and #2262 .
2017-01-06 18:43:41 +01:00
Fabian Reinartz 304cae9928 tsdb: Use PartitionedDB constructor 2017-01-06 12:34:54 +01:00
Brian Brazil f9e581907a Make index queue bigger. (#2322)
When a large Prometheus starts up fresh it can take many minutes
to warmup and clear out the index queue. A larger queue means less
blocking, bigger batches and cuts down startup time by ~50%.
2017-01-05 17:57:42 +00:00
Fabian Reinartz c9f4aea8e2 Merge pull request #2305 from alicebob/favicon
Add a favicon to the web GUI
2017-01-04 10:15:27 +01:00
Martin Lehmann 78fae3155f Make relative links in README.md absolute (#2316)
The relative links don't work in other pages that render the README (for example https://hub.docker.com/r/prom/prometheus/). As they are (hopefully) not due to change any time soon, I think using absolute links is better.
2017-01-03 20:07:33 +00:00
Fabian Reinartz bc20d93f0a storage: rename iterator value getters to At() 2017-01-02 13:33:37 +01:00
Julius Volz 90dd216646 Merge pull request #2306 from EdSchouten/sorted-alerts
Use lexicographic order to sort alerts by name.
2016-12-31 13:12:30 +01:00
Fabian Reinartz 89b6b3be9f vendor: remove unused dependencies 2016-12-31 09:32:55 +01:00
Fabian Reinartz e631a1260d retrieval: use separate appender per target 2016-12-30 21:35:35 +01:00
Fabian Reinartz 61bd698143 web: implement federation for new storage 2016-12-30 19:34:45 +01:00
Fabian Reinartz 7322c46b8e storage: add mock iterator for test 2016-12-30 10:45:56 +01:00
Fabian Reinartz 28f547bcc7 api/v1: fix tests, restore series queries 2016-12-30 10:43:44 +01:00
Fabian Reinartz e94b0899ee rules: fix tests, remove model types 2016-12-29 17:31:14 +01:00
Fabian Reinartz 68dc358496 cmd/prometheus: remove tests for old flags 2016-12-29 16:55:22 +01:00
Fabian Reinartz 8b4e4a9d2b notifier: fully use labels.Labels 2016-12-29 16:53:11 +01:00
Fabian Reinartz f8fc1f5bb2 *: migrate ingestion to new batch Appender 2016-12-29 11:03:56 +01:00
Fabian Reinartz 86cb0f30fd pkg/relabel: add relabel pkg using new labels 2016-12-28 19:04:59 +01:00
Fabian Reinartz 0987a72ec9 pkg/timestamp: create timestamp package 2016-12-28 11:33:00 +01:00
Fabian Reinartz 71fe0c58a8 promql: misc fixes 2016-12-28 11:32:15 +01:00
Mitsuhiro Tanda 7e369b9318 expose max memory chunks metrics (#2303)
* expose max memory chunks metrics
2016-12-27 18:34:07 +00:00
Ed Schouten b3a39ccd8a Use lexicographic order to sort alerts by name.
Right now the /alerts page of Prometheus sorts alerts by severity
(firing, pending, inactive). Once multiple alerts have the same
severity, their order seems to correlate to how they are placed in the
configuration files, but not always. Looking at the code, we make use of
sort.Sort(), which is documented not to provide a stable sort. The
Less() function also only takes the alert state into account.

This change extends the Less() function to provide a lexicographic order
on both the alert state and the name. This means I can finally find the
alerts I'm looking for without using my browser's search feature.
2016-12-27 14:28:44 +01:00
Harmen 135d32ea22 make assets 2016-12-27 13:59:20 +01:00