Fabian Reinartz
11a731ba82
remote: remove hard-coded remote storages
...
This commit removes the flag-configured remote storage integrations
in favor of the generic remote write path.
2016-12-22 23:17:35 +01:00
Brian Brazil
93b70ee4ea
Evict chunk descs of all unloaded chunks during maintenance. (#2297)
...
Keeping these around has two problems:
1) Each desc takes 64 bytes, 10 of them is 640B. This is a lot of
overhead on a 1024 byte chunk.
2) It can take well over a week to reach a point where this and thus
Prometheus memory usage as a whole enters steady state. This makes RAM
estimation very hard for users, and makes it difficult to investigate
things like memory fragmentation.
Instead we'll wipe them during each memory series maintenance cycle, and
if a query pulls them in they'll hang around as cache until the next
cycle.
2016-12-22 13:49:03 +00:00
Brian Brazil
bed4635802
Use irate consistently in console template examples. (#2296)
...
I must have forgotten my 'g' when switching these.
2016-12-21 13:19:23 +00:00
Fabian Reinartz
d6d03a966f
Merge pull request #2295 from prometheus/fast-path-remote
...
Don't clone the metric if there's no remote writes.
2016-12-21 12:36:41 +01:00
Brian Brazil
1b8a474612
Don't clone the metric if there's no remote writes.
...
The metric clone can't be optimised further and is a
non-trivial memory allocation cost, so take a fast path
when no remote writes are configured.
2016-12-21 11:34:48 +00:00
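The fast path described in this commit could look roughly like the sketch below. It is an illustration only: `Metric`, `fanout`, and the `clones` counter are hypothetical names made up for the example, not the types used in Prometheus.

```go
package main

import "fmt"

// Metric is a hypothetical stand-in for a label set (not the Prometheus type).
type Metric map[string]string

// Clone returns a copy of the metric, which allocates a new map.
func (m Metric) Clone() Metric {
	c := make(Metric, len(m))
	for k, v := range m {
		c[k] = v
	}
	return c
}

// fanout hands each appended metric to the configured remote writers.
type fanout struct {
	remotes []func(Metric)
	clones  int // counts clones made, for demonstration only
}

// Append clones the metric only when at least one remote writer exists.
func (f *fanout) Append(m Metric) {
	if len(f.remotes) == 0 {
		return // fast path: nothing to send, so skip the allocation
	}
	c := m.Clone()
	f.clones++
	for _, send := range f.remotes {
		send(c)
	}
}

func main() {
	m := Metric{"__name__": "up", "job": "node"}

	idle := &fanout{}
	idle.Append(m)

	busy := &fanout{remotes: []func(Metric){func(Metric) {}}}
	busy.Append(m)

	fmt.Println("clones without remotes:", idle.clones)
	fmt.Println("clones with one remote:", busy.clones)
}
```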
Brian Brazil
6c07453ec1
Only clone the metric in the one place relabelling needs it. (#2292)
...
This cuts ~17% off memory allocations related to ingesting data
in a basic setup.
2016-12-21 10:00:33 +00:00
Brian Brazil
2e3b42ad6c
Correctly handle the end time being 0 in the URL. (#2290)
2016-12-18 19:30:52 +00:00
Brian Brazil
f421ce0636
Remove label from prometheus_target_skipped_scrapes_total (#2289)
...
This avoids it not being initialised, and breaking out by
interval wasn't particularly useful.
Fixes #2269
2016-12-16 18:00:52 +00:00
Brian Brazil
30448286c7
Add sample_limit to scrape config.
...
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problematic metrics.
This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).
Add metric to track how often this happens.
Fixes #2137
2016-12-16 15:10:09 +00:00
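A minimal sketch of the ordering described in this commit: samples are filtered first (a prefix-based drop rule stands in for metric relabelling here), and only the survivors count against the limit. The names `sample`, `limitSamples`, and `errSampleLimit` are hypothetical, and rejecting the whole batch when the limit is exceeded is an assumption of this sketch, not something the commit message states.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sample is a hypothetical, simplified scraped sample.
type sample struct {
	name  string
	value float64
}

var errSampleLimit = errors.New("sample limit exceeded")

// limitSamples drops samples whose name starts with dropPrefix (a stand-in
// for metric relabelling), then enforces limit on what remains.
// A limit of 0 means "no limit" in this sketch.
func limitSamples(in []sample, dropPrefix string, limit int) ([]sample, error) {
	kept := make([]sample, 0, len(in))
	for _, s := range in {
		if dropPrefix != "" && strings.HasPrefix(s.name, dropPrefix) {
			continue
		}
		kept = append(kept, s)
	}
	// Assumption: exceeding the limit rejects the whole batch.
	if limit > 0 && len(kept) > limit {
		return nil, errSampleLimit
	}
	return kept, nil
}

func main() {
	scraped := []sample{
		{"http_requests_total", 10},
		{"debug_user_email", 1},
		{"debug_user_id", 1},
		{"up", 1},
	}

	// Four samples scraped, but two are dropped before counting.
	kept, err := limitSamples(scraped, "debug_", 2)
	fmt.Println(len(kept), err)

	// Without the drop rule the same limit is exceeded.
	_, err = limitSamples(scraped, "", 2)
	fmt.Println(err)
}
```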
Björn Rabenstein
f3f798fbcf
Merge pull request #2283 from tcolgate/ignoredots
...
ignore dotfiles in data directory
2016-12-15 13:32:03 +01:00
Tristan Colgate
30be8e0b8a
ignore dotfiles in data directory
2016-12-15 11:48:23 +00:00
Tristan Colgate-McFarlane
4d9134e6d8
Add labeldrop and labelkeep actions. (#2279)
...
Introduce two new relabel actions, labeldrop and labelkeep.
These can be used to filter the set of labels by matching a regex:
- labeldrop: drops all labels that match the regex
- labelkeep: drops all labels that do not match the regex
2016-12-14 10:17:42 +00:00
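The two actions are complements of each other, which a short sketch makes concrete. The helper names `anchored`, `labelDrop`, and `labelKeep` are invented for this example. Matching against label names and anchoring the regex to the whole name are assumptions of the sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored compiles expr so that it must match an entire label name.
// Full anchoring is an assumption of this sketch.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

// labelDrop returns a copy of labels without the names that match re.
func labelDrop(labels map[string]string, re *regexp.Regexp) map[string]string {
	out := make(map[string]string, len(labels))
	for name, value := range labels {
		if !re.MatchString(name) {
			out[name] = value
		}
	}
	return out
}

// labelKeep returns a copy of labels with only the names that match re.
func labelKeep(labels map[string]string, re *regexp.Regexp) map[string]string {
	out := make(map[string]string, len(labels))
	for name, value := range labels {
		if re.MatchString(name) {
			out[name] = value
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"job":       "node",
		"instance":  "host-a:9100",
		"tmp_debug": "1",
	}
	fmt.Println(labelDrop(labels, anchored("tmp_.*")))
	fmt.Println(labelKeep(labels, anchored("job|instance")))
}
```

Both calls in `main` leave the same two labels, `job` and `instance`: one by naming what to remove, the other by naming what to retain.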
Björn Rabenstein
45570e5972
Merge pull request #2277 from prometheus/beorn7/storage2
...
storage: Sanity-check number of loaded chunk descs
2016-12-14 02:59:10 +01:00
beorn7
253be23c00
storage: Sanity-check number of loaded chunk descs
...
Two cases:
- An unarchived metric must have at least one chunk desc loaded upon
unarchival. Otherwise, the file is gone or has size 0, which is an
inconsistency (because the series is still indexed in the archive
index). Hence, quarantining is triggered.
- If loading the chunk descs of a series with a known chunkDescsOffset
(i.e. != -1), the number of chunks loaded must be equal to
chunkDescsOffset. If not, there is a data corruption. An error is
returned, which leads to quarantining.
In any case, there is a guard added to not access the 1st element of
an empty chunkDescs slice. (That's what triggered the crashes in issue
2249.) A time series with unknown chunkDescsOffset and no chunks in
memory and no chunks on disk either could trigger that case. I would
assume such a "null series" doesn't exist, but it's not entirely
unthinkable that one could occur (perhaps in future uses of the
storage), e.g. if a series is created and something then tries to
preload chunks before the first sample is added.
2016-12-13 23:19:39 +01:00
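The checks described in this commit can be illustrated with a small sketch. The type `chunkDesc` and the functions `checkLoaded` and `firstTimeOf` are simplified, hypothetical stand-ins. Only the two rules themselves (a known offset must equal the number of loaded descs, and the first element of an empty slice must never be accessed) come from the commit message.

```go
package main

import (
	"fmt"
)

// chunkDesc is a hypothetical, heavily simplified chunk descriptor.
type chunkDesc struct {
	firstTime int64
}

// checkLoaded verifies the number of descs loaded from disk.
// An offset of -1 means "unknown", in which case nothing can be verified.
func checkLoaded(loaded, chunkDescsOffset int) error {
	if chunkDescsOffset != -1 && loaded != chunkDescsOffset {
		return fmt.Errorf("loaded %d chunk descs, expected %d", loaded, chunkDescsOffset)
	}
	return nil
}

// firstTimeOf returns the first timestamp of the series, guarding against
// an empty slice instead of indexing descs[0] unconditionally.
func firstTimeOf(descs []chunkDesc) (int64, bool) {
	if len(descs) == 0 {
		return 0, false
	}
	return descs[0].firstTime, true
}

func main() {
	fmt.Println(checkLoaded(3, 3))  // consistent
	fmt.Println(checkLoaded(2, 3))  // mismatch: treated as corruption
	fmt.Println(checkLoaded(0, -1)) // unknown offset: not checked

	_, ok := firstTimeOf(nil)
	fmt.Println("has first chunk:", ok)
}
```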
Björn Rabenstein
5f0c0e43cf
Merge pull request #2276 from prometheus/beorn7/storage
...
storage: Catch data corruption that leads to division by zero
2016-12-13 23:13:39 +01:00
Björn Rabenstein
a4c8292232
Merge pull request #2278 from prometheus/beorn7/style
...
storage: Fix linter issue
2016-12-13 23:13:05 +01:00
beorn7
837c029b16
storage: Fix linter issue
...
Go style tries to avoid indented `else` blocks.
2016-12-13 19:05:30 +01:00
Brian Brazil
c8de1484d5
Add scrape_samples_post_metric_relabeling
...
This reports the number of samples post any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil
06b9df65ec
Refactor and add unittests to scrape result handling.
2016-12-13 16:49:17 +00:00
Björn Rabenstein
568fd8a8cb
Merge pull request #2155 from prometheus/beorn7/vendoring2
...
Update vendoring for Azure
2016-12-13 17:10:59 +01:00
beorn7
4719482f5f
storage: Make tests go-vet and golint clean
2016-12-13 17:07:27 +01:00
beorn7
485ac8dff7
storage: Verify validity of byte length when unmarshalling (double)delta chunks
...
This makes sure a division-by-zero crash cannot happen in the Len()
method.
Fixes #2773
2016-12-13 17:07:27 +01:00
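The general idea of validating a length at unmarshal time, so that a later `Len()` cannot divide by zero, can be sketched as follows. The chunk layout here (a two-byte header whose first byte is the width of one sample) is invented purely for illustration and is not the actual (double)delta chunk format.

```go
package main

import (
	"errors"
	"fmt"
)

// headerLen is part of a hypothetical layout: byte 0 holds the number of
// bytes per sample, byte 1 is reserved.
const headerLen = 2

type chunk []byte

// unmarshal validates buf before accepting it as a chunk, so that Len
// never sees a zero sample width or a truncated payload.
func unmarshal(buf []byte) (chunk, error) {
	if len(buf) < headerLen {
		return nil, errors.New("chunk shorter than header")
	}
	width := int(buf[0])
	if width == 0 {
		return nil, errors.New("invalid sample width 0")
	}
	if (len(buf)-headerLen)%width != 0 {
		return nil, errors.New("payload is not a whole number of samples")
	}
	return chunk(buf), nil
}

// Len divides by the sample width; this is safe only because unmarshal
// rejected a width of zero.
func (c chunk) Len() int {
	return (len(c) - headerLen) / int(c[0])
}

func main() {
	good, err := unmarshal([]byte{4, 0, 1, 2, 3, 4, 5, 6, 7, 8})
	fmt.Println(good.Len(), err)

	_, err = unmarshal([]byte{0, 0, 1, 2})
	fmt.Println(err)
}
```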
Brian Brazil
b5ded43594
Allow buffering of scraped samples before sending them to storage.
2016-12-13 15:01:35 +00:00
beorn7
906c3a2237
Update vendoring for Azure
...
Also, actually record the vendored version in vendor.json.
2016-12-13 14:21:16 +01:00
tattsun
e714079cf2
storage: fix error message (#2270)
...
* storage: add error message
2016-12-09 22:36:27 +00:00
Fabian Reinartz
9ecea36ef9
Merge pull request #2259 from prometheus/federationerr
...
web: don't return federation errors over HTTP
2016-12-06 16:18:03 +01:00
Fabian Reinartz
cef2e04aa3
web: add error counter for federation responses
2016-12-06 16:09:50 +01:00
Fabian Reinartz
0ea0a19848
Merge pull request #2240 from agaoglu/read-timeout
...
Set read-timeout for http.Server
2016-12-06 16:01:45 +01:00
Fabian Reinartz
9d68e81b32
web: don't return federation errors over HTTP
...
Federation responses are streamed, so once the first byte has been
written the status header is fixed. We cannot return an HTTP error
for an intermediate error; we should just abort and log instead.
2016-12-06 15:52:50 +01:00
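The constraint behind this commit is a property of HTTP rather than of Prometheus, and can be shown with the standard library alone. In the sketch below, `streamHandler`, `serve`, and the `federationErrors` counter are hypothetical names. A failure before the first write can still become a 500, while a failure after it can only be counted and logged.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// federationErrors is a hypothetical counter standing in for a real metric.
var federationErrors int

// streamHandler writes chunks one by one and simulates a failure at index
// failAt (use -1 for no failure).
func streamHandler(chunks []string, failAt int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for i, c := range chunks {
			if i == failAt {
				if i == 0 {
					// Nothing written yet: a proper HTTP error is still possible.
					http.Error(w, "encoding failed", http.StatusInternalServerError)
					return
				}
				// The 200 status is already on the wire: count, log, abort.
				federationErrors++
				log.Printf("federation failed after %d chunks", i)
				return
			}
			io.WriteString(w, c)
		}
	}
}

// serve runs the handler against an in-memory recorder.
func serve(chunks []string, failAt int) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/federate", nil)
	streamHandler(chunks, failAt)(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	chunks := []string{"a 1\n", "b 2\n", "c 3\n"}

	code, body := serve(chunks, 2)
	fmt.Printf("mid-stream failure: status=%d body=%q\n", code, body)

	code, _ = serve(chunks, 0)
	fmt.Printf("failure before first write: status=%d\n", code)
}
```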
Erdem Agaoglu
054f8ebbfb
Increase default max-connections
2016-12-06 17:45:19 +03:00
Erdem Agaoglu
2260079c12
Vendor x/net/netutil
2016-12-06 12:52:29 +03:00
Erdem Agaoglu
e487477a17
LimitListener to limit max number of connections
...
This also drops tcp keep-alive in ListenAndServe but it's no longer
necessary since we now close idle connections long before that.
2016-12-06 12:45:59 +03:00
Fabian Reinartz
893390e0c6
Merge pull request #2248 from msiebuhr/cwd-in-status
...
web: Display current working directory on status-page
2016-12-05 21:41:37 +01:00
Morten Siebuhr
c5b17263a6
web: Display current working directory on status-page
2016-12-05 19:46:41 +01:00
Björn Rabenstein
a932c1a4b6
Merge pull request #1794 from cmluciano/cml/persistenceerror
...
Clarify error message when Prometheus data dir finds unexpected files
2016-12-05 18:40:51 +01:00
Christopher M. Luciano
148b006e25
Clarify error message when Prometheus data dir finds unexpected files
2016-12-05 10:51:57 -05:00
Fabian Reinartz
0459dcd2e2
Merge pull request #2234 from brancz/targets-api
...
web/api: add targets endpoint
2016-12-05 14:14:04 +01:00
Frederic Branczyk
33b583d50e
web/api: add targets endpoint
2016-12-05 13:13:21 +01:00
Frederic Branczyk
8f8cea4fbd
retrieval: refactor TargetManager to return flat list of Targets
2016-12-02 13:28:58 +01:00
Erdem Agaoglu
9986b28380
Set read-timeout for http.Server
...
This also specifies a timeout for idle client connections, which may
otherwise cause "too many open files" errors.
See #2238
2016-12-01 16:29:45 +03:00
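In Go's standard library, a read timeout is set through the `ReadTimeout` field of `http.Server`. The sketch below shows only that mechanism. The `newServer` helper, the address, and the 30-second value are illustrative assumptions and are not taken from the commit.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer builds an http.Server whose connections are closed if a
// request is not fully read within readTimeout.
func newServer(addr string, h http.Handler, readTimeout time.Duration) *http.Server {
	return &http.Server{
		Addr:        addr,
		Handler:     h,
		ReadTimeout: readTimeout,
	}
}

func main() {
	// Address and timeout are example values only.
	srv := newServer("localhost:9090", http.NotFoundHandler(), 30*time.Second)
	fmt.Println(srv.Addr, srv.ReadTimeout)
	// srv.ListenAndServe() would start serving; omitted so the example exits.
}
```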
Fabian Reinartz
63fe65bf2f
Merge pull request #2235 from prometheus/beorn7/doc
...
Kubernetes SD: More fixes to example config
2016-11-30 09:55:09 +01:00
beorn7
5770d9e545
Kubernetes SD: More fixes to example config
...
- Avoid mentioning the `in_cluster` option. (It doesn't exist anymore.)
- Replace `__meta_kubernetes_service_namespace` and
`__meta_kubernetes_pod_namespace` (which don't exist anymore) by
`__meta_kubernetes_namespace`.
2016-11-29 18:42:35 +01:00
Fabian Reinartz
2a89e8733f
Merge pull request #2230 from prometheus/cut-1.4.1
...
*: cut 1.4.1
2016-11-28 09:33:26 +01:00
Fabian Reinartz
6be1e98278
*: cut 1.4.1
2016-11-28 09:29:23 +01:00
Fabian Reinartz
d95e61d418
Merge pull request #2223 from prometheus/consulfix
...
consul: start service watch as goroutine
2016-11-28 08:00:41 +01:00
Fabian Reinartz
35da23fd82
consul: start service watch as goroutine
2016-11-27 11:01:16 +01:00
Fabian Reinartz
56f57a826f
Merge pull request #2219 from prometheus/builderimg
...
circle: update golang-builder image version
2016-11-25 16:05:53 +01:00
Fabian Reinartz
340de6c31c
circle: update golang-builder image version
2016-11-25 14:29:07 +01:00
Fabian Reinartz
ecad074e46
Merge pull request #2218 from prometheus/cut-1.4.0
...
*: cut 1.4.0
2016-11-25 13:35:04 +01:00
Fabian Reinartz
80455950ee
*: cut 1.4.0
2016-11-25 13:28:29 +01:00