Commit graph

3839 commits

Author SHA1 Message Date
Fabian Reinartz 0492ddbd4d *: fully decouple tsdb, add new storage interfaces 2016-12-25 01:43:22 +01:00
Fabian Reinartz 1becee3f6c main: remove Alertmanager legacy flag configuration 2016-12-25 00:43:41 +01:00
Fabian Reinartz d17b5be48a storage/metric: remove package 2016-12-25 00:42:52 +01:00
Fabian Reinartz 5817cb5bde *: migrate from model.* to promql.* types 2016-12-25 00:37:46 +01:00
Fabian Reinartz 9ea10d5265 promql: use labels.Builder to modify labels 2016-12-24 14:35:24 +01:00
Fabian Reinartz c6cd998905 promql: use local labels, add conversion 2016-12-24 14:01:37 +01:00
Fabian Reinartz ff504af2aa promql: undo accidental exports 2016-12-24 11:41:37 +01:00
Fabian Reinartz 6dedf89cc3 promql: rename SampleStream to Series 2016-12-24 11:32:42 +01:00
Fabian Reinartz c5f225b920 promql: export Sample 2016-12-24 11:32:10 +01:00
Fabian Reinartz 65581a3d46 promql: export SampleStream 2016-12-24 11:29:39 +01:00
Fabian Reinartz 6315d00942 promql: export String value 2016-12-24 11:25:26 +01:00
Fabian Reinartz ac5d3bc05e promql: scalar T/V and Point 2016-12-24 11:23:06 +01:00
Fabian Reinartz 09666e2e2a promql: make scalar public 2016-12-24 10:44:04 +01:00
Fabian Reinartz b3f71df350 promql: make matrix exported 2016-12-24 10:42:54 +01:00
Fabian Reinartz a62df87022 promql: rename vector 2016-12-24 10:40:09 +01:00
Fabian Reinartz 15a931dbdb promql: migrate model types, use tsdb interfaces 2016-12-24 00:39:52 +01:00
Fabian Reinartz 8b84ee5ee6 storage: remove old storage
This removes all old storage files and only keeps interfaces
to still allow the code to compile.
2016-12-22 23:33:32 +01:00
Fabian Reinartz 313ab48b45 vendor: drop unused dependencies 2016-12-22 23:20:34 +01:00
Fabian Reinartz 11a731ba82 remote: remove hard-coded remote storages
This commit removes the flag-configured remote storage integrations
in favor of the generic remote write path.
2016-12-22 23:17:35 +01:00
Brian Brazil 93b70ee4ea Evict chunk descs of all unloaded chunks during maintenance. (#2297)
Keeping these around has two problems:
1) Each desc takes 64 bytes, so ten of them add 640B of overhead to a
1024-byte chunk, roughly 62% of the chunk's size.
2) It can take well over a week before chunk-desc memory, and thus
Prometheus memory usage as a whole, reaches a steady state. This makes RAM
estimation very hard for users, and makes it difficult to investigate
things like memory fragmentation.

Instead we'll wipe them during each memory series maintenance cycle, and
if a query pulls them in they'll hang around as cache until the next
cycle.
2016-12-22 13:49:03 +00:00
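A hedged sketch of the wipe-and-recache behaviour described above; the types are illustrative stand-ins, not the actual Prometheus storage code:

```go
package storage

// Hypothetical types; the real memory-series code is more involved.
type chunkDesc struct {
	chunk []byte // nil once the chunk data itself has been evicted
}

type memorySeries struct {
	chunkDescs []*chunkDesc
}

// maintain wipes the descs of unloaded chunks during a maintenance
// cycle; a query that touches them re-creates the descs, which then
// act as cache until the next cycle wipes them again.
func (s *memorySeries) maintain() {
	kept := s.chunkDescs[:0]
	for _, cd := range s.chunkDescs {
		if cd.chunk != nil { // keep descs whose chunks are still loaded
			kept = append(kept, cd)
		}
	}
	s.chunkDescs = kept
}
```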
Brian Brazil bed4635802 Use irate consistently in console template examples. (#2296)
I must have forgotten my 'g' when switching these.
2016-12-21 13:19:23 +00:00
Fabian Reinartz d6d03a966f Merge pull request #2295 from prometheus/fast-path-remote
Don't clone the metric if there are no remote writes.
2016-12-21 12:36:41 +01:00
Brian Brazil 1b8a474612 Don't clone the metric if there are no remote writes.
The metric clone can't be optimised further, and it is a non-trivial
memory allocation cost, so fast-path it when no remote writes are
configured.
2016-12-21 11:34:48 +00:00
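For illustration, a minimal sketch of such a fast path, with hypothetical names and a simplified metric type rather than the real ingestion code:

```go
package ingest

type metric map[string]string

func (m metric) clone() metric {
	c := make(metric, len(m))
	for k, v := range m {
		c[k] = v
	}
	return c
}

// appendSample clones the metric only when remote-write queues exist,
// since remote writers may retain it; otherwise the copy is skipped.
func appendSample(m metric, local func(metric), remoteQueues []chan<- metric) {
	if len(remoteQueues) == 0 {
		local(m) // fast path: no clone
		return
	}
	c := m.clone()
	for _, q := range remoteQueues {
		q <- c
	}
	local(m)
}
```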
Brian Brazil 6c07453ec1 Only clone the metric in the one place relabelling needs it. (#2292)
This cuts ~17% off memory allocations related to ingesting data
in a basic setup.
2016-12-21 10:00:33 +00:00
Brian Brazil 2e3b42ad6c Correctly handle the end time being 0 in the URL. (#2290) 2016-12-18 19:30:52 +00:00
Brian Brazil f421ce0636 Remove label from prometheus_target_skipped_scrapes_total (#2289)
This avoids the metric never being initialised, and breaking it out by
interval wasn't particularly useful.

Fixes #2269
2016-12-16 18:00:52 +00:00
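The initialisation issue is a known client_golang behaviour: a plain Counter is registered and exported as 0 immediately, whereas a CounterVec child only appears after its first use with a given label value. A sketch of the label-free form (the Help string is paraphrased):

```go
package scrape

import "github.com/prometheus/client_golang/prometheus"

// A plain Counter is exported as 0 from startup. With a CounterVec
// keyed by interval, a given child would only appear after the first
// skipped scrape for that interval, so it could stay uninitialised.
var targetSkippedScrapes = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "prometheus_target_skipped_scrapes_total",
	Help: "Total number of scrapes that were skipped.", // paraphrased
})

func init() {
	prometheus.MustRegister(targetSkippedScrapes)
}
```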
Brian Brazil 30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problematic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
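An illustrative scrape configuration using the new option; the job name, limit, and target are arbitrary:

```yaml
scrape_configs:
  - job_name: "node"
    # Hard cap on samples per scrape, counted after metric relabelling.
    sample_limit: 5000
    static_configs:
      - targets: ["localhost:9100"]
```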
Björn Rabenstein f3f798fbcf Merge pull request #2283 from tcolgate/ignoredots
ignore dotfiles in data directory
2016-12-15 13:32:03 +01:00
Tristan Colgate 30be8e0b8a ignore dotfiles in data directory 2016-12-15 11:48:23 +00:00
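A minimal sketch of such a check (the function name is illustrative):

```go
package storage

import (
	"path/filepath"
	"strings"
)

// isDotFile reports whether a data-directory entry is a hidden file
// (e.g. editor swap files or .nfs* leftovers) and should be skipped.
func isDotFile(path string) bool {
	return strings.HasPrefix(filepath.Base(path), ".")
}
```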
Tristan Colgate-McFarlane 4d9134e6d8 Add labeldrop and labelkeep actions. (#2279)
Introduce two new relabel actions: labeldrop and labelkeep.
These can be used to filter the set of labels by matching a regex:

- labeldrop: drops all labels that match the regex
- labelkeep: drops all labels that do not match the regex
2016-12-14 10:17:42 +00:00
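Illustrative usage; the regexes are examples only:

```yaml
scrape_configs:
  - job_name: "example"
    metric_relabel_configs:
      - action: labeldrop   # drop every label matching the regex
        regex: "kubernetes_.*"
      - action: labelkeep   # drop every label NOT matching the regex
        regex: "instance|job|.*_name"
```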
Björn Rabenstein 45570e5972 Merge pull request #2277 from prometheus/beorn7/storage2
storage: Sanity-check number of loaded chunk descs
2016-12-14 02:59:10 +01:00
beorn7 253be23c00 storage: Sanity-check number of loaded chunk descs
Two cases:

- An unarchived metric must have at least one chunk desc loaded upon
  unarchival. Otherwise, the file is gone or has size 0, which is an
  inconsistency (because the series is still indexed in the archive
  index). Hence, quarantining is triggered.

- If loading the chunk descs of a series with a known chunkDescsOffset
  (i.e. != -1), the number of chunks loaded must be equal to
  chunkDescsOffset. If not, there is data corruption, and an error is
  returned, which leads to quarantining.

In any case, a guard is added so that the first element of an empty
chunkDescs slice is never accessed. (That's what triggered the crashes in
issue 2249.) A time series with an unknown chunkDescsOffset, no chunks in
memory, and no chunks on disk could trigger that case. I would assume such
a "null series" doesn't exist, but it's not entirely unthinkable, and it
could plausibly happen in future uses of the storage: create a series, and
then something tries to preload chunks before the first sample is added.
2016-12-13 23:19:39 +01:00
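A hedged sketch of the two checks; all identifiers here are illustrative, not the actual storage code:

```go
package storage

import "errors"

var (
	errQuarantine = errors.New("series file missing or empty for unarchived series")
	errCorruption = errors.New("loaded chunk count does not match chunkDescsOffset")
)

type chunkDesc struct{}

// checkLoadedChunkDescs mirrors the two cases above: an unarchived
// series must load at least one desc, and with a known offset (!= -1)
// the number of loaded descs must equal that offset.
func checkLoadedChunkDescs(cds []*chunkDesc, chunkDescsOffset int, justUnarchived bool) error {
	if justUnarchived && len(cds) == 0 {
		return errQuarantine
	}
	if chunkDescsOffset != -1 && len(cds) != chunkDescsOffset {
		return errCorruption
	}
	return nil
}
```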
Björn Rabenstein 5f0c0e43cf Merge pull request #2276 from prometheus/beorn7/storage
storage: Catch data corruption that leads to division by zero
2016-12-13 23:13:39 +01:00
Björn Rabenstein a4c8292232 Merge pull request #2278 from prometheus/beorn7/style
storage: Fix linter issue
2016-12-13 23:13:05 +01:00
beorn7 837c029b16 storage: Fix linter issue
Go style tries to avoid indented `else` blocks.
2016-12-13 19:05:30 +01:00
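For illustration, the shape golint prefers: return early so the happy path stays unindented.

```go
package example

import "errors"

var errFailed = errors.New("precondition failed")

// Preferred shape: handle the error and return early, instead of
// indenting the happy path inside an else block.
func process(ok bool) (string, error) {
	if !ok {
		return "", errFailed
	}
	return "result", nil // happy path stays at the top level
}
```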
Brian Brazil c8de1484d5 Add scrape_samples_post_metric_relabeling
This reports the number of samples remaining after any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil 06b9df65ec Refactor and add unittests to scrape result handling. 2016-12-13 16:49:17 +00:00
Björn Rabenstein 568fd8a8cb Merge pull request #2155 from prometheus/beorn7/vendoring2
Update vendoring for Azure
2016-12-13 17:10:59 +01:00
beorn7 4719482f5f storage: Make tests go-vet and golint clean 2016-12-13 17:07:27 +01:00
beorn7 485ac8dff7 storage: Verify validity of byte length when unmarshalling (double)delta chunks
This makes sure a division-by-zero crash cannot happen in the Len()
method.

Fixes #2773
2016-12-13 17:07:27 +01:00
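A hedged illustration of the failure mode with a deliberately simplified encoding (not the real (double)delta layout): if a corrupted header yields a sample width of zero, Len() divides by zero unless unmarshalling validates the width first.

```go
package chunk

import (
	"errors"
	"fmt"
)

type deltaChunk struct {
	sampleWidth int    // bytes per sample, taken from the chunk header
	data        []byte // encoded samples
}

// unmarshal validates the header-derived sample width so that Len()
// can never divide by zero on corrupted data.
func (c *deltaChunk) unmarshal(b []byte) error {
	if len(b) < 1 {
		return fmt.Errorf("chunk too short: %d bytes", len(b))
	}
	w := int(b[0])
	if w == 0 {
		return errors.New("invalid zero sample width in chunk header")
	}
	c.sampleWidth, c.data = w, b[1:]
	return nil
}

func (c *deltaChunk) Len() int { return len(c.data) / c.sampleWidth }
```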
Brian Brazil b5ded43594 Allow buffering of scraped samples before sending them to storage. 2016-12-13 15:01:35 +00:00
beorn7 906c3a2237 Update vendoring for Azure
Also, actually record the vendored version in vendor.json.
2016-12-13 14:21:16 +01:00
tattsun e714079cf2 storage: fix error message (#2270)
* storage: add error message
2016-12-09 22:36:27 +00:00
Fabian Reinartz 9ecea36ef9 Merge pull request #2259 from prometheus/federationerr
web: don't return federation errors over HTTP
2016-12-06 16:18:03 +01:00
Fabian Reinartz cef2e04aa3 web: add error counter for federation responses 2016-12-06 16:09:50 +01:00
Fabian Reinartz 0ea0a19848 Merge pull request #2240 from agaoglu/read-timeout
Set read-timeout for http.Server
2016-12-06 16:01:45 +01:00
Fabian Reinartz 9d68e81b32 web: don't return federation errors over HTTP
Federation responses are written in a streaming fashion, so after the
first byte is written, the status header is fixed. We cannot return an
HTTP error for an intermediate failure; we should just abort and log
instead.
2016-12-06 15:52:50 +01:00
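A minimal net/http sketch of the constraint (gatherSeries is hypothetical): the status header goes out with the first write, so once streaming has begun the handler can only abort and log.

```go
package web

import (
	"fmt"
	"log"
	"net/http"
)

func gatherSeries() ([]string, error) { return []string{"up 1"}, nil }

func federate(w http.ResponseWriter, r *http.Request) {
	series, err := gatherSeries()
	if err != nil {
		// Nothing written yet, so a real HTTP error is still possible.
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	for _, s := range series {
		if _, err := fmt.Fprintln(w, s); err != nil {
			// The 200 header is already sent; just abort and log.
			log.Println("federation: aborted response:", err)
			return
		}
	}
}
```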
Erdem Agaoglu 054f8ebbfb Increase default max-connections 2016-12-06 17:45:19 +03:00
Erdem Agaoglu 2260079c12 Vendor x/net/netutil 2016-12-06 12:52:29 +03:00
Erdem Agaoglu e487477a17 LimitListener to limit max number of connections
This also drops TCP keep-alive in ListenAndServe, but it's no longer
necessary since we now close idle connections long before keep-alive
would kick in.
2016-12-06 12:45:59 +03:00
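A sketch combining the real golang.org/x/net/netutil API with the read timeout from the PR above; the port, limit, and timeout values are illustrative:

```go
package main

import (
	"net"
	"net/http"
	"time"

	"golang.org/x/net/netutil"
)

func main() {
	ln, err := net.Listen("tcp", ":9090")
	if err != nil {
		panic(err)
	}
	// Cap concurrent connections; Accept blocks once the limit is hit.
	ln = netutil.LimitListener(ln, 500)

	srv := &http.Server{
		ReadTimeout: 30 * time.Second, // closes idle/slow connections
	}
	if err := srv.Serve(ln); err != nil {
		panic(err)
	}
}
```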