3879 commits

Author SHA1 Message Date
Björn Rabenstein 3e133a9312 Merge pull request #2400 from prometheus/beorn7/storage2
storage: Fix checkpointing of fully persisted memory series.
2017-02-07 00:21:14 +01:00
Fabian Reinartz e4f58d9860 Merge pull request #2401 from svend/kubernetes-config
Kubernetes SD: Fix namespace meta label in example config
2017-02-06 23:25:37 +01:00
Svend Sorensen 3a96d0e267 Kubernetes SD: Fix namespace meta label
Replace one more instance of `__meta_kubernetes_service_namespace` with
`__meta_kubernetes_namespace`.
2017-02-06 13:28:12 -08:00
beorn7 2363a90adc storage: Do not throw away fully persisted memory series in checkpointing 2017-02-06 17:39:59 +01:00
Or Cohen 93d20d2d2b Improve fuzzy search
The fuzzy library didn't try to find a "best match", but settled on the
first fuzzy match that exists. This patch includes a modified version of
the fuzzy library, which recursively tries on the rest of the search
string to find a better match. If found, returns that one.

Another small modification is that if a pattern fully matches, it
skips the lookup entirely and returns the highest score possible for
that match.
2017-02-05 17:38:05 +02:00
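The difference between "first match" and "best match" can be illustrated with a small sketch. This is not the patched fuzzy library; the scoring rule (1 point per matched character, plus a bonus of 2 when it directly follows the previous match) and the names `firstMatch` and `bestMatch` are assumptions made for illustration only. The full-match shortcut mentioned above is not covered.

```go
package main

import "fmt"

// firstMatch greedily takes the first occurrence of each pattern byte.
// Hypothetical scoring: 1 per matched byte, +2 if adjacent to the previous match.
func firstMatch(pattern, text string) (int, bool) {
	score, pi, prev := 0, 0, -2
	for ti := 0; ti < len(text) && pi < len(pattern); ti++ {
		if text[ti] == pattern[pi] {
			score++
			if ti == prev+1 {
				score += 2
			}
			prev = ti
			pi++
		}
	}
	return score, pi == len(pattern)
}

// bestMatch tries every occurrence of the next pattern byte and recurses on
// the rest of the text, keeping the highest-scoring alignment.
func bestMatch(pattern, text string) (int, bool) {
	return best(pattern, text, 0, -2)
}

func best(pattern, text string, start, prev int) (int, bool) {
	if pattern == "" {
		return 0, true
	}
	bestScore, found := 0, false
	for ti := start; ti < len(text); ti++ {
		if text[ti] != pattern[0] {
			continue
		}
		s := 1
		if ti == prev+1 {
			s += 2
		}
		rest, ok := best(pattern[1:], text, ti+1, ti)
		if ok && (!found || s+rest > bestScore) {
			bestScore, found = s+rest, true
		}
	}
	return bestScore, found
}

func main() {
	first, _ := firstMatch("up", "uxp_up")
	better, _ := bestMatch("up", "uxp_up")
	fmt.Println(first, better) // greedy settles on "u.p"; the recursive search finds the adjacent "up"
}
```

The recursive search is exponential in the worst case, which is acceptable for short metric names but would need memoization for long inputs.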
Or Cohen 81d37a04aa Fix autocomplete misses certain queries
For some of the queries, the fuzzy lookup was not filtering properly.
The problem is due to the "replace" being made on the query itself. It
accidentally removes only the first underscore. This patch changes it so
that it removes all whitespace, letting the fuzzy algorithm do
its magic, also fixing this problem.

Originally, underscores were replaced by a space for this specific
reason, to let the user type a space and have the lookup treat it as the
word break.

Fixes #2380
2017-02-05 16:20:52 +02:00
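The "only the first occurrence is replaced" pitfall described above exists in many string APIs. The following Go sketch is an analogue, not the patched UI code (which is not shown here); the function names `normalizeFirst` and `normalizeAll` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeFirst mimics the bug: with a count of 1, only the first
// underscore is replaced and the rest of the query is left untouched.
func normalizeFirst(query string) string {
	return strings.Replace(query, "_", " ", 1)
}

// normalizeAll removes every run of whitespace from the query, so the
// fuzzy matcher receives one uninterrupted pattern.
func normalizeAll(query string) string {
	return strings.Join(strings.Fields(query), "")
}

func main() {
	fmt.Println(normalizeFirst("node_cpu_seconds_total"))
	fmt.Println(normalizeAll("node cpu  seconds"))
}
```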
beorn7 244a65fb29 storage: Increase persist watermark before calling append
The append call may reuse cds, and thus change its len.
(In practice, this wouldn't happen as cds should have len==cap.
Still, the previous order of lines was problematic.)
2017-02-05 02:25:09 +01:00
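The underlying Go behavior can be shown in isolation: `append` writes into the existing backing array whenever the slice has spare capacity, and allocates a new one only when `len == cap`. The sketch below is not the storage code; `prepend` and `appendReusesBacking` are hypothetical helpers that show the defensive ordering of reading the length before calling `append`.

```go
package main

import "fmt"

// prepend returns loaded followed by existing, plus the number of loaded
// items. The count is read before append so that it cannot be affected by
// whatever append does with the slice.
func prepend(loaded, existing []int) ([]int, int) {
	n := len(loaded)
	return append(loaded, existing...), n
}

// appendReusesBacking reports whether append wrote into the backing array
// of s instead of allocating a new one.
func appendReusesBacking(s []int) bool {
	grown := append(s, 42)
	return len(s) > 0 && &grown[0] == &s[0]
}

func main() {
	fmt.Println(appendReusesBacking(make([]int, 2, 4))) // spare capacity
	fmt.Println(appendReusesBacking(make([]int, 2, 2))) // len == cap
	merged, n := prepend([]int{1, 2}, []int{3})
	fmt.Println(merged, n)
}
```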
beorn7 75282b27ba storage: Added checks for invariants 2017-02-04 23:40:22 +01:00
beorn7 31e9db7f0c storage: Simplify evictChunkDesc method 2017-02-04 22:29:37 +01:00
beorn7 65dc8f44d3 storage: Test for errors returned by MaybePopulateLastTime 2017-02-01 23:43:58 +01:00
beorn7 752fac60ae storage: Remove race condition from TestLoop 2017-02-01 23:43:58 +01:00
beorn7 4daffbef12 Merge branch 'release-1.5'
This merges forward the bug fixes from the release-1.5 branch.
2017-02-01 23:43:05 +01:00
Brian Brazil 34767c2221 Clone lset before relabelling. (#2386)
We need to not change the lset passed into populateLabels, as that
is kept around by the SDs.

Fixes #2377
2017-02-01 19:49:50 +00:00
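The reason for cloning is that Go maps are reference types: mutating the argument also mutates the caller's copy. A minimal sketch, assuming a map-based label set; `LabelSet`, `relabel`, and this `populateLabels` are simplified stand-ins, not the actual Prometheus types or signatures.

```go
package main

import "fmt"

// LabelSet is a hypothetical stand-in for a label set.
type LabelSet map[string]string

// Clone returns an independent copy of the label set.
func (ls LabelSet) Clone() LabelSet {
	out := make(LabelSet, len(ls))
	for k, v := range ls {
		out[k] = v
	}
	return out
}

// relabel is a hypothetical relabelling step that mutates its argument.
func relabel(ls LabelSet) LabelSet {
	ls["job"] = "rewritten"
	delete(ls, "__address__")
	return ls
}

// populateLabels clones first, so the set retained by the caller
// (for example, a service discovery mechanism) stays unchanged.
func populateLabels(lset LabelSet) LabelSet {
	return relabel(lset.Clone())
}

func main() {
	orig := LabelSet{"job": "node", "__address__": "10.0.0.1:9100"}
	res := populateLabels(orig)
	fmt.Println(orig["job"], res["job"])
}
```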
Björn Rabenstein 7db4447390 Merge pull request #2385 from prometheus/beorn7/storage
Fix embarrassing bug of not setting the shrink ratio
2017-02-01 16:58:56 +01:00
beorn7 4ccfc93dcf storage: Set shrink ratio in the constructor. 2017-02-01 15:37:16 +01:00
beorn7 b2f086c6c4 storage: Expose bug of not setting the shrink ratio in the constructor 2017-02-01 15:37:10 +01:00
Julius Volz d5f6079029 Merge pull request #2381 from prometheus/remote-storage-bridge-example
Add standalone remote storage bridge example
2017-02-01 13:23:06 +01:00
Julius Volz b16371595d Add standalone remote storage bridge example
In preparation for removing specific remote storage implementations,
this offers an example of how to achieve the same in a separate process.
Rather than having three separate bridges for OpenTSDB, InfluxDB, and
Graphite, I decided to support all in one binary.

For now, this is in the example documentation directory, but perhaps we
will want to make a first-class project / repository out of it.
2017-02-01 13:22:41 +01:00
Julius Volz 5e985f24de Merge pull request #2179 from prometheus/update-mailing-list-ref
Replace mailing list / IRC mention with link to Community page
2017-01-26 17:08:16 +01:00
Julius Volz 2e1d8dd6bd Replace mailing list / IRC mention with link to Community page 2017-01-26 17:07:27 +01:00
Björn Rabenstein 22a8fb4bc9 Merge pull request #2361 from larkinscott/patch-1
Update .codeclimate.yml
2017-01-24 11:51:51 +01:00
Scott Larkin 5319e1da09 Update .codeclimate.yml
Changed the vendor/ path in the exclude paths node.
2017-01-23 14:58:53 -05:00
Frederic Branczyk d840f2c400 Merge pull request #2359 from brancz/cut-1.5.0
*: cut 1.5.0
2017-01-23 14:05:51 +01:00
Frederic Branczyk fb17493f66 *: cut 1.5.0 2017-01-23 12:59:01 +01:00
Björn Rabenstein 9688a312ed Merge pull request #2355 from prometheus/beorn7/lint
Remove auto-generated protobuf code from codeclimate
2017-01-20 11:31:51 +01:00
beorn7 4392aa43d4 Remove auto-generated protobuf code from codeclimate 2017-01-20 11:07:20 +01:00
Björn Rabenstein d717175104 Merge pull request #2354 from prometheus/beorn7/lint
Documentation: Add Code Climate badges to README.md
2017-01-20 10:51:05 +01:00
beorn7 0c8b753f6e Documentation: Add Code Climate badges to README.md 2017-01-19 23:22:22 +01:00
Scott Larkin e5a75b2b30 Code Climate config (#2351)
Created a Code Climate config with gofmt, golint, and govet enabled
2017-01-19 22:19:32 +01:00
Alex Somesan b22eb65d0f Cleaner separation between ServiceAccount and custom authentication in K8S SD (#2348)
* Canonical usage of cluster service-account in K8S SD

* Early validation for opt-in custom auth in K8S SD

* Fix typo in condition
2017-01-19 10:52:52 +01:00
Fabian Reinartz 7eb849e6a8 Merge pull request #2307 from joyent/triton_discovery
Add Joyent Triton discovery
2017-01-18 05:08:11 +01:00
Richard Kiene f3d9692d09 Add Joyent Triton discovery 2017-01-17 20:34:32 +00:00
Brian Brazil c1b547a90e Only checkpoint chunkdescs and series that need persisting. (#2340)
This decreases checkpoint size by not checkpointing things
that don't actually need checkpointing.

This is fully compatible with the v2 checkpoint format,
as it makes series appear as though the only chunkdescs
in memory are those that need persisting.
2017-01-17 00:59:38 +00:00
Fabian Reinartz 5418a42965 Merge pull request #2345 from Bplotka/fixed-alertmanager-flag-auth
Fixed regression in `-alertmanager.url` flag. Basic auth was ignored.
2017-01-16 18:29:51 +01:00
Bartek Plotka 579e33f19a Fixed style issues. 2017-01-16 16:45:58 +00:00
Bartek Plotka d7febe97fa Fixed regression in -alertmanager.url flag. Basic auth was ignored.
- Included basic auth parsing while parsing to AlertmanagerConfig
- Added test case

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2017-01-16 16:39:20 +00:00
Fabian Reinartz 990e40c959 Merge pull request #2338 from brancz/alertmanager-api
web/api: add alertmanager api
2017-01-16 12:08:14 +01:00
Frederic Branczyk bd92571bdd web/api: make target and alertmanager api responses consistent 2017-01-16 11:53:00 +01:00
Fabian Reinartz 022714b60a Merge pull request #2341 from mattbostock/patch-1
Correct notifications_dropped description
2017-01-16 09:23:46 +01:00
Matt Bostock 4160892109 Correct notifications_dropped description
The current description does not accurately describe when the metric is incremented.

Aside from Alertmanager missing from the configuration, `prometheus_notifications_dropped_total` is also incremented when errors occur while sending alert notifications to Alertmanager, when the notifications queue is full, or when the number of notifications to be sent exceeds the queue capacity.

I think calling these cases 'errors' in a generic sense is more useful than the current description.
2017-01-13 23:36:00 +00:00
Brian Brazil f64c231dad Allow checkpoints and maintenance to happen concurrently. (#2321)
This is essential on larger Prometheus servers, as otherwise
checkpoints prevent sufficient persisting of chunks to disk.
2017-01-13 17:24:19 +00:00
Frederic Branczyk 389c6d0043 web/api: add alertmanager api 2017-01-13 15:30:20 +01:00
Brian Brazil 1dcb7637f5 Add various persistence related metrics (#2333)
Add metrics around checkpointing and persistence

* Add a metric to say if checkpointing is happening,
and another to track total checkpoint time and count.

This breaks the existing prometheus_local_storage_checkpoint_duration_seconds
by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds
as the former name is more appropriate for a summary.

* Add metric for last checkpoint size.

* Add metric for series/chunks processed by checkpoints.

For long checkpoints it'd be useful to see how they're progressing.

* Add metric for dirty series

* Add metric for number of chunks persisted per series.

You can get the number of chunks from chunk_ops,
but not the matching number of series. This helps determine
the size of the writes being made.

* Add metric for chunks queued for persistence

Chunks created includes both chunks that'll need persistence
and chunks read in for queries. This only includes chunks created
for persistence.

* Code review comments on new persistence metrics.
2017-01-11 15:11:19 +00:00
Björn Rabenstein 6ce97837ab Merge pull request #2327 from prometheus/beorn7/vendoring
vendoring: Update prometheus/common to pull in bug fixes
2017-01-09 13:28:36 +01:00
beorn7 86ec87b78f vendoring: Update prometheus/common to pull in bug fixes
In particular the one for https://github.com/prometheus/common/issues/72.
2017-01-09 12:25:17 +01:00
Fabian Reinartz 3302bb1eb1 Merge pull request #2323 from prometheus/beorn7/retrieval
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still, it is unclear why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
André Carvalho c43dfaba1c Add max concurrent and current queries engine metrics (#2326)
* Add max concurrent and current queries engine metrics

This commit adds two metrics to the promql/engine: the
number of max concurrent queries, as configured by the flag, and
the number of current queries being served+blocked in the engine.
2017-01-07 14:41:25 +00:00
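The two values described above (a fixed maximum and a live count of queries that are either running or waiting) can be sketched with the standard library alone. This is not the promql/engine code; the `gate` type and its methods are hypothetical, and a real implementation would export the values as metrics instead of printing them.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// gate limits concurrency and tracks how many queries are inside it.
type gate struct {
	slots   chan struct{} // buffered channel used as a semaphore
	current int64         // queries currently served or blocked
}

func newGate(max int) *gate {
	return &gate{slots: make(chan struct{}, max)}
}

// Max returns the configured concurrency limit.
func (g *gate) Max() int { return cap(g.slots) }

// Current returns the number of queries being served or waiting for a slot.
func (g *gate) Current() int64 { return atomic.LoadInt64(&g.current) }

// Run counts the query, waits for a free slot, then executes it.
func (g *gate) Run(query func()) {
	atomic.AddInt64(&g.current, 1) // counted before blocking on the semaphore
	defer atomic.AddInt64(&g.current, -1)
	g.slots <- struct{}{}
	defer func() { <-g.slots }()
	query()
}

func main() {
	g := newGate(2)
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Run(func() {})
		}()
	}
	wg.Wait()
	fmt.Println(g.Max(), g.Current())
}
```

Incrementing the counter before acquiring a slot is what makes it cover blocked queries as well as served ones.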