Commit graph

3658 commits

Author SHA1 Message Date
beorn7 253be23c00 storage: Sanity-check number of loaded chunk descs
Two cases:

- An unarchived metric must have at least one chunk desc loaded upon
  unarchival. Otherwise, the file is gone or has size 0, which is an
  inconsistency (because the series is still indexed in the archive
  index). Hence, quarantining is triggered.

- When loading the chunk descs of a series with a known chunkDescsOffset
  (i.e. != -1), the number of chunks loaded must be equal to
  chunkDescsOffset. If not, the data is corrupted. An error is
  returned, which leads to quarantining.

In any case, a guard is added so that the first element of an empty
chunkDescs slice is never accessed. (That's what triggered the crashes
in issue 2249.) A time series with an unknown chunkDescsOffset, no
chunks in memory, and no chunks on disk could trigger that case. I
would assume such a "null series" doesn't exist, but it is not entirely
unthinkable (perhaps in future uses of the storage), for example if a
series is created and something then tries to preload chunks before
the first sample is added.
2016-12-13 23:19:39 +01:00
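
The two checks and the guard described in commit 253be23c00 above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from the commit: `chunkDesc`, `checkLoaded`, `firstTimeOf`, `unknownOffset` and `errQuarantine` are hypothetical names, and only the "-1 means unknown" convention is taken from the message.

```go
package main

import (
	"errors"
	"fmt"
)

// unknownOffset mirrors the "-1 means unknown" convention from the commit message.
const unknownOffset = -1

// chunkDesc is a hypothetical, heavily simplified chunk descriptor.
type chunkDesc struct {
	firstTime int64
}

// errQuarantine marks inconsistencies that should lead to quarantining (assumed name).
var errQuarantine = errors.New("series must be quarantined")

// checkLoaded applies the two sanity checks to freshly loaded chunk descs.
func checkLoaded(loaded []chunkDesc, chunkDescsOffset int, unarchived bool) error {
	// Case 1: an unarchived series must come with at least one chunk desc.
	if unarchived && len(loaded) == 0 {
		return fmt.Errorf("no chunk descs for unarchived series: %w", errQuarantine)
	}
	// Case 2: with a known offset, the loaded count must match it exactly.
	if chunkDescsOffset != unknownOffset && len(loaded) != chunkDescsOffset {
		return fmt.Errorf("loaded %d chunk descs, want %d: %w",
			len(loaded), chunkDescsOffset, errQuarantine)
	}
	return nil
}

// firstTimeOf never indexes into an empty slice; ok is false if there is nothing to read.
func firstTimeOf(descs []chunkDesc) (t int64, ok bool) {
	if len(descs) == 0 {
		return 0, false
	}
	return descs[0].firstTime, true
}

func main() {
	descs := []chunkDesc{{firstTime: 100}, {firstTime: 200}}
	fmt.Println(checkLoaded(descs, 2, false))          // consistent load
	fmt.Println(checkLoaded(descs, 3, false))          // count mismatch
	fmt.Println(checkLoaded(nil, unknownOffset, true)) // nothing loaded on unarchival
	fmt.Println(firstTimeOf(nil))                      // guarded access to an empty slice
}
```

Returning an error instead of panicking leaves the decision to quarantine with the caller, which matches the "an error is returned, which leads to quarantining" wording of the message.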
Björn Rabenstein 5f0c0e43cf Merge pull request #2276 from prometheus/beorn7/storage
storage: Catch data corruption that leads to division by zero
2016-12-13 23:13:39 +01:00
Björn Rabenstein a4c8292232 Merge pull request #2278 from prometheus/beorn7/style
storage: Fix linter issue
2016-12-13 23:13:05 +01:00
beorn7 837c029b16 storage: Fix linter issue
Go style tries to avoid indented `else` blocks.
2016-12-13 19:05:30 +01:00
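
The style point in commit 837c029b16 above can be illustrated with a small, made-up function (`describe` is a hypothetical name, not taken from the storage package): when the `if` branch ends in a `return`, the remaining code stays unindented instead of sitting in an `else` block.

```go
package main

import "fmt"

// describe returns early from the if branch, so no else block is needed.
// The linter-flagged form would wrap the final return in "else { ... }".
func describe(n int) string {
	if n < 0 {
		return "negative"
	}
	return "non-negative"
}

func main() {
	fmt.Println(describe(-1))
	fmt.Println(describe(3))
}
```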
Brian Brazil c8de1484d5 Add scrape_samples_post_metric_relabeling
This reports the number of samples post any keep/drop
from metric relabelling.
2016-12-13 17:32:11 +00:00
Brian Brazil 06b9df65ec Refactor and add unittests to scrape result handling. 2016-12-13 16:49:17 +00:00
Björn Rabenstein 568fd8a8cb Merge pull request #2155 from prometheus/beorn7/vendoring2
Update vendoring for Azure
2016-12-13 17:10:59 +01:00
beorn7 4719482f5f storage: Make tests go-vet and golint clean 2016-12-13 17:07:27 +01:00
beorn7 485ac8dff7 storage: Verify validity of byte length when unmarshalling (double)delta chunks
This makes sure a division-by-zero crash cannot happen in the Len()
method.

Fixes #2773
2016-12-13 17:07:27 +01:00
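
As a rough illustration of the idea in commit 485ac8dff7 above, the sketch below validates byte widths while unmarshalling so that a later `Len()` call cannot divide by zero. The two-byte header layout, `headerLen`, `validWidth` and `unmarshal` are assumptions made for this example and do not reflect the actual (double)delta chunk format.

```go
package main

import (
	"errors"
	"fmt"
)

// headerLen and the header layout are assumptions for this sketch:
// byte 0 holds the timestamp width, byte 1 the value width (both in bytes).
const headerLen = 2

// chunk is a hypothetical encoded chunk: header followed by fixed-size samples.
type chunk []byte

// validWidth reports whether w is an accepted byte width in this sketch.
func validWidth(w byte) bool {
	switch w {
	case 1, 2, 4, 8:
		return true
	}
	return false
}

// unmarshal rejects corrupted headers up front, so Len never sees a zero sample size.
func unmarshal(buf []byte) (chunk, error) {
	if len(buf) < headerLen {
		return nil, errors.New("chunk shorter than its header")
	}
	if !validWidth(buf[0]) || !validWidth(buf[1]) {
		return nil, fmt.Errorf("invalid byte widths: time=%d value=%d", buf[0], buf[1])
	}
	return chunk(buf), nil
}

// Len returns the number of samples. The division is safe only because
// unmarshal has already verified that both widths are non-zero.
func (c chunk) Len() int {
	sampleSize := int(c[0]) + int(c[1])
	return (len(c) - headerLen) / sampleSize
}

func main() {
	c, err := unmarshal([]byte{1, 1, 10, 20, 11, 21})
	fmt.Println(c.Len(), err)

	// A corrupted header with zero widths is rejected instead of crashing later.
	_, err = unmarshal([]byte{0, 0, 10, 20})
	fmt.Println(err)
}
```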
Brian Brazil b5ded43594 Allow buffering of scraped samples before sending them to storage. 2016-12-13 15:01:35 +00:00
beorn7 906c3a2237 Update vendoring for Azure
Also, actually record the vendored version in vendor.json.
2016-12-13 14:21:16 +01:00
tattsun e714079cf2 storage: fix error message (#2270)
* storage: add error message
2016-12-09 22:36:27 +00:00
Fabian Reinartz 9ecea36ef9 Merge pull request #2259 from prometheus/federationerr
web: don't return federation errors over HTTP
2016-12-06 16:18:03 +01:00
Fabian Reinartz cef2e04aa3 web: add error counter for federation responses 2016-12-06 16:09:50 +01:00
Fabian Reinartz 0ea0a19848 Merge pull request #2240 from agaoglu/read-timeout
Set read-timeout for http.Server
2016-12-06 16:01:45 +01:00
Fabian Reinartz 9d68e81b32 web: don't return federation errors over HTTP
We write federation responses in a streaming fashion, so after
the first byte has been written, the status header is fixed. We cannot
return an HTTP error for an intermediate error and should just abort
and log instead.
2016-12-06 15:52:50 +01:00
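
The behavior described in commit 9d68e81b32 above can be reproduced with Go's standard library alone. In the sketch below, the `federation` type, its `errCount` field (standing in for the error counter added in cef2e04aa3) and the `encode` callback are hypothetical. The sketch only shows that once the first write has gone out, the status is already set, so the handler logs and aborts instead of trying to send an HTTP error.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// federation is hypothetical handler state; errCount stands in for an error
// counter metric and is not safe for concurrent use in this sketch.
type federation struct {
	errCount int
}

// stream writes one line per sample. On a failure it counts the error, logs it,
// and aborts without attempting to change the response status.
func (f *federation) stream(w http.ResponseWriter, samples []string, encode func(string) (string, error)) {
	for _, s := range samples {
		line, err := encode(s)
		if err != nil {
			f.errCount++
			log.Printf("federation aborted: %v", err)
			return
		}
		// The first write implicitly commits the 200 status header.
		fmt.Fprintln(w, line)
	}
}

// runDemo streams three samples of which the second fails to encode.
func runDemo() (status int, body string, errCount int) {
	f := &federation{}
	rec := httptest.NewRecorder()
	f.stream(rec, []string{"a", "bad", "c"}, func(s string) (string, error) {
		if s == "bad" {
			return "", errors.New("cannot encode sample")
		}
		return s, nil
	})
	return rec.Code, rec.Body.String(), f.errCount
}

func main() {
	status, body, errCount := runDemo()
	fmt.Printf("status=%d body=%q errors=%d\n", status, body, errCount)
}
```

The client receives a truncated body with a success status, which is why the failure has to be made visible through logging and a counter rather than through the HTTP response.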
Erdem Agaoglu 054f8ebbfb Increase default max-connections 2016-12-06 17:45:19 +03:00
Erdem Agaoglu 2260079c12 Vendor x/net/netutil 2016-12-06 12:52:29 +03:00
Erdem Agaoglu e487477a17 LimitListener to limit max number of connections
This also drops TCP keep-alive in ListenAndServe, but it's no longer
necessary since we now close idle connections long before that.
2016-12-06 12:45:59 +03:00
Fabian Reinartz 893390e0c6 Merge pull request #2248 from msiebuhr/cwd-in-status
web: Display current working directory on status-page
2016-12-05 21:41:37 +01:00
Morten Siebuhr c5b17263a6 web: Display current working directory on status-page 2016-12-05 19:46:41 +01:00
Björn Rabenstein a932c1a4b6 Merge pull request #1794 from cmluciano/cml/persistenceerror
Clarify error message when Prometheus data dir finds unexpected files
2016-12-05 18:40:51 +01:00
Christopher M. Luciano 148b006e25 Clarify error message when Prometheus data dir finds unexpected files 2016-12-05 10:51:57 -05:00
Fabian Reinartz 0459dcd2e2 Merge pull request #2234 from brancz/targets-api
web/api: add targets endpoint
2016-12-05 14:14:04 +01:00
Frederic Branczyk 33b583d50e web/api: add targets endpoint 2016-12-05 13:13:21 +01:00
Frederic Branczyk 8f8cea4fbd retrieval: refactor TargetManager to return flat list of Targets 2016-12-02 13:28:58 +01:00
Erdem Agaoglu 9986b28380 Set read-timeout for http.Server
This also specifies a timeout for idle client connections, which may
otherwise cause "too many open files" errors.
See #2238
2016-12-01 16:29:45 +03:00
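
In Go's `net/http`, the setting named in commit 9986b28380 above corresponds to the `ReadTimeout` field of `http.Server`. A minimal sketch follows. The `newServer` helper, the address, and the 30-second value are illustrative assumptions, not the values used by the commit.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer returns an http.Server whose connections are closed if reading a
// request takes longer than readTimeout. Helper name and values are assumptions.
func newServer(addr string, h http.Handler, readTimeout time.Duration) *http.Server {
	return &http.Server{
		Addr:        addr,
		Handler:     h,
		ReadTimeout: readTimeout,
	}
}

func main() {
	srv := newServer("localhost:9090", http.NotFoundHandler(), 30*time.Second)
	fmt.Println(srv.Addr, srv.ReadTimeout)
	// srv.ListenAndServe() would start serving; it is omitted so the sketch exits.
}
```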
Fabian Reinartz 63fe65bf2f Merge pull request #2235 from prometheus/beorn7/doc
Kubernetes SD: More fixes to example config
2016-11-30 09:55:09 +01:00
beorn7 5770d9e545 Kubernetes SD: More fixes to example config
- Avoid mentioning the `in_cluster` option. (It doesn't exist anymore.)
- Replace `__meta_kubernetes_service_namespace` and
  `__meta_kubernetes_pod_namespace` (which don't exist anymore) by
  `__meta_kubernetes_namespace`.
2016-11-29 18:42:35 +01:00
Fabian Reinartz 2a89e8733f Merge pull request #2230 from prometheus/cut-1.4.1
*: cut 1.4.1
2016-11-28 09:33:26 +01:00
Fabian Reinartz 6be1e98278 *: cut 1.4.1 2016-11-28 09:29:23 +01:00
Fabian Reinartz d95e61d418 Merge pull request #2223 from prometheus/consulfix
consul: start service watch as goroutine
2016-11-28 08:00:41 +01:00
Fabian Reinartz 35da23fd82 consul: start service watch as goroutine 2016-11-27 11:01:16 +01:00
Fabian Reinartz 56f57a826f Merge pull request #2219 from prometheus/builderimg
circle: update golang-builder image version
2016-11-25 16:05:53 +01:00
Fabian Reinartz 340de6c31c circle: update golang-builder image version 2016-11-25 14:29:07 +01:00
Fabian Reinartz ecad074e46 Merge pull request #2218 from prometheus/cut-1.4.0
*: cut 1.4.0
2016-11-25 13:35:04 +01:00
Fabian Reinartz 80455950ee *: cut 1.4.0 2016-11-25 13:28:29 +01:00
Fabian Reinartz b97f19a85e travis: update used Go compiler version 2016-11-25 13:28:19 +01:00
Fabian Reinartz 9b7f5c7f29 Merge pull request #2217 from prometheus/alertingsd
Extract alertmanager into interface
2016-11-25 11:28:38 +01:00
Fabian Reinartz 2ad56aabd4 notifier: extract alertmanager into interface 2016-11-25 11:19:43 +01:00
Fabian Reinartz cc35104504 config: fix naming and typo 2016-11-25 11:04:33 +01:00
Fabian Reinartz fd51ab46e5 Merge pull request #2215 from prometheus/alertingsd2
Discover Alertmanagers dynamically
2016-11-25 10:38:00 +01:00
Fabian Reinartz b1f28b48a3 Fix typo 2016-11-25 08:47:04 +01:00
Fabian Reinartz 3fb4d1191b config: rename AlertingConfig, resolve file paths 2016-11-24 15:19:37 +01:00
Fabian Reinartz d4deb8bbf2 web: show discovered Alertmanagers in UI 2016-11-24 15:06:50 +01:00
Fabian Reinartz f210d96497 notifier: use dynamic service discovery 2016-11-23 18:23:37 +01:00
Fabian Reinartz 183c5749b9 config: add Alertmanager configuration 2016-11-23 18:23:37 +01:00
Fabian Reinartz 200bbe1bad config: extract SD and HTTPClient configurations 2016-11-23 18:23:37 +01:00
Fabian Reinartz dd1a656cc4 Merge pull request #2212 from prometheus/alertingsd
Extract discovery package
2016-11-23 17:46:48 +01:00
Fabian Reinartz 47623202c7 retrieval: remove metric namespaces 2016-11-23 09:17:04 +01:00