prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2024-12-28 23:19:41 -08:00

Author	SHA1	Message	Date
Fabian Reinartz	5e57fa85c0	Merge pull request #2016 from mattbostock/fix_race_between_reload_and_stop rules/manager.go: Fix race between reload and stop	2016-09-22 19:30:56 +02:00
Fabian Reinartz	71b332278d	Merge pull request #2021 from prometheus/beorn7/storage Avoid `defer` in seriesMap.get	2016-09-22 18:12:21 +02:00
beorn7	ca98382943	Avoid `defer` in seriesMap.get This is related to https://github.com/golang/go/issues/14939. It's probably the only occurrence where it matters.	2016-09-22 17:50:58 +02:00
Brian Brazil	2ee00b4461	Merge pull request #2019 from dominikschulz/ec2state Expose ec2_instance_state	2016-09-22 14:57:19 +01:00
Dominik Schulz	f6fbcf9aa2	Expose ec2_instance_state	2016-09-22 15:01:23 +02:00
Fabian Reinartz	e5c633ed14	Merge pull request #2015 from mattbostock/fix_typo cmd/prometheus/main.go: Fix typo in comment	2016-09-21 23:41:06 +02:00
Matt Bostock	926a5ab3dd	rules/manager.go: Fix race between reload and stop On one relatively large Prometheus instance (1.7M series), I noticed that upgrades were frequently resulting in Prometheus undergoing crash recovery on start-up. On closer examination, I found that Prometheus was panicking on shutdown. It seems that our configuration management (or misconfiguration thereof) is reloading Prometheus then immediately restarting it, which I suspect is causing this race: Sep 21 15:12:42 host systemd[1]: Reloading prometheus monitoring system. Sep 21 15:12:42 host prometheus[18734]: time="2016-09-21T15:12:42Z" level=info msg="Loading configuration file /etc/prometheus/config.yaml" source="main.go:221" Sep 21 15:12:42 host systemd[1]: Reloaded prometheus monitoring system. Sep 21 15:12:44 host systemd[1]: Stopping prometheus monitoring system... Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=warning msg="Received SIGTERM, exiting gracefully..." source="main.go:203" Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="See you next time!" source="main.go:210" Sep 21 15:12:44 host prometheus[18734]: time="2016-09-21T15:12:44Z" level=info msg="Stopping target manager..." source="targetmanager.go:90" Sep 21 15:12:52 host prometheus[18734]: time="2016-09-21T15:12:52Z" level=info msg="Checkpointing in-memory metrics and chunks..." 
source="persistence.go:548" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=1 source="scrape.go:467" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84" Sep 21 15:12:56 host prometheus[18734]: time="2016-09-21T15:12:56Z" level=error msg="Error adding file watch for \"/etc/prometheus/targets\": no such file or directory" source="file.go:84" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping rule manager..." source="manager.go:366" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Rule manager stopped." source="manager.go:372" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping notification handler..." source="notifier.go:325" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping local storage..." source="storage.go:381" Sep 21 15:13:01 host prometheus[18734]: time="2016-09-21T15:13:01Z" level=info msg="Stopping maintenance loop..." 
source="storage.go:383" Sep 21 15:13:01 host prometheus[18734]: panic: close of closed channel Sep 21 15:13:01 host prometheus[18734]: goroutine 7686074 [running]: Sep 21 15:13:01 host prometheus[18734]: panic(0xba57a0, 0xc60c42b500) Sep 21 15:13:01 host prometheus[18734]: /usr/local/go/src/runtime/panic.go:500 +0x1a1 Sep 21 15:13:01 host prometheus[18734]: github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1(0xc6645a9901, 0xc420271ef0, 0xc420338ed0, 0xc60c42b4f0, 0xc6645a9900) Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:412 +0x3c Sep 21 15:13:01 host prometheus[18734]: created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig Sep 21 15:13:01 host prometheus[18734]: /home/build/packages/prometheus/tmp/build/gopath/src/github.com/prometheus/prometheus/rules/manager.go:423 +0x56b Sep 21 15:13:03 host systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT	2016-09-21 22:03:02 +01:00
Matt Bostock	dd98766b32	cmd/prometheus/main.go: Fix typo in comment	2016-09-21 21:59:25 +01:00
Matthew Campbell	67d76e3a5d	timeseries: store varbit encoded data into cassandra	2016-09-21 17:56:55 +02:00
Julius Volz	f92532f254	api: Consolidate web API contexts This is based on the common/route changes in https://github.com/prometheus/common/pull/61.	2016-09-21 03:22:20 +02:00
Julius Volz	766074568e	Update vendoring of github.com/prometheus/common/route	2016-09-21 03:22:14 +02:00
Tom Wilkie	4520e12440	Add HTTP Basic Auth & TLS support to the generic write path. (#1957) * Add config, HTTP Basic Auth and TLS support to the generic write path. - Move generic write path configuration to the config file - Factor out config.TLSConfig -> tls.Config translation - Support TLSConfig for generic remote storage - Rename Run to Start, and make it non-blocking. - Dedupe code in httputil for TLS config. - Make remote queue metrics global.	2016-09-19 22:47:51 +02:00
Julius Volz	6dda28dbd4	Merge pull request #2000 from prometheus/contextify-storage Contextify storage and PromQL interfaces.	2016-09-19 16:31:12 +02:00
Julius Volz	c187308366	storage: Contextify storage interfaces. This is based on https://github.com/prometheus/prometheus/pull/1997. This adds contexts to the relevant Storage methods and already passes PromQL's new per-query context into the storage's query methods. The immediate motivation is supporting multi-tenancy in Frankenstein, but this could also be used by Prometheus's normal local storage to support cancellations and timeouts at some point.	2016-09-19 16:29:07 +02:00
Julius Volz	ed5a0f0abe	promql: Allow per-query contexts. For Weaveworks' Frankenstein, we need to support multitenancy. In Frankenstein, we initially solved this without modifying the promql package at all: we constructed a new promql.Engine for every query and injected a storage implementation into that engine which would be primed to only collect data for a given user. This is problematic to upstream, however. Prometheus assumes that there is only one engine: the query concurrency gate is part of the engine, and the engine contains one central cancellable context to shut down all queries. Also, creating a new engine for every query seems like overkill. Thus, we want to be able to pass per-query contexts into a single engine. This change gets rid of the promql.Engine's built-in base context and allows passing in a per-query context instead. Central cancellation of all queries is still possible by deriving all passed-in contexts from one central one, but this is now the responsibility of the caller. The central query context is now created in main() and passed into the relevant components (web handler / API, rule manager). In a next step, the per-query context would have to be passed to the storage implementation, so that the storage can implement multi-tenancy or other features based on the contextual information.	2016-09-19 15:38:17 +02:00
Julius Volz	c9c2663a54	Merge pull request #2004 from redbaron/no-false-sharing Avoid having contended mutexes on same cacheline	2016-09-19 00:40:03 +02:00
Maxim Ivanov	bdc53098fc	Avoid having contended mutexes on same cacheline CPUs have to serialise write access to a single cache line, effectively reducing the level of possible parallelism. Placing mutexes on different cache lines avoids this problem. Most gains will be seen on NUMA servers, where CPU interconnect traffic is especially expensive. Before: go test . -run none -bench BenchmarkFingerprintLocker BenchmarkFingerprintLockerParallel-4 2000000 932 ns/op BenchmarkFingerprintLockerSerial-4 30000000 49.6 ns/op After: go test . -run none -bench BenchmarkFingerprintLocker BenchmarkFingerprintLockerParallel-4 3000000 569 ns/op BenchmarkFingerprintLockerSerial-4 30000000 51.0 ns/op	2016-09-18 23:32:55 +01:00
Julius Volz	5f5a78e807	Merge pull request #1974 from prometheus/disable-local-storage Allow disabling local storage.	2016-09-17 18:40:01 +02:00
Julius Volz	06199268b5	Merge pull request #2003 from mattbostock/remove_json_from_accept Scrape: Remove JSON from Accept request header	2016-09-17 14:41:27 +02:00
Matt Bostock	4fc619b605	Scrape: Remove JSON from Accept request header JSON is no longer supported as an exposition format [1] [2] [3]. Remove it from the `Accept` header added to requests when scraping targets. [1]: https://github.com/prometheus/prometheus/blob/master/CHANGELOG.md#100--2016-07-18 [2]: https://prometheus.io/docs/instrumenting/exposition_formats/#historical-versions [3]: https://docs.google.com/document/d/1ZjyKiKxZV83VI9ZKAXRGKaUKK2BIWCT7oiGBKDBpjEY/edit?usp=sharing	2016-09-17 10:28:03 +01:00
Julius Volz	a9b96be3fd	Merge pull request #2002 from grandbora/graphPageUrlClientSideMigration Graph page url client side migration	2016-09-17 00:26:09 +02:00
Bora Tunca	2e9de70267	generate assets	2016-09-16 18:20:12 -04:00
Bora Tunca	44377dc458	Add backward compatibility to old query format	2016-09-16 18:20:00 -04:00
Brian Brazil	5ac89fbc1f	Merge pull request #1985 from tommyulfsparre/resync Resync state after ZooKeeper failure	2016-09-16 20:48:04 +01:00
Tobias Schmidt	874cb44bb6	Merge pull request #1996 from ton31337/Fix/allow_numbers_as_first_letter Allow number to be the first letter as well for `job_name`	2016-09-16 11:08:52 -04:00
beorn7	1f2785ebb7	Merge branch 'release-1.1'	2016-09-16 16:33:28 +02:00
Björn Rabenstein	ac374aa674	Merge pull request #2001 from prometheus/beorn7/release Cut v1.1.3	2016-09-16 13:33:59 +02:00
beorn7	f4656acd1f	Cut v1.1.3	2016-09-16 13:08:16 +02:00
Donatas Abraitis	1aa8898b66	Allow number to be the first letter as well for `job_name`	2016-09-16 14:06:47 +03:00
Tobias Schmidt	304f971e3e	Merge pull request #1769 from gottwald/gce-discovery GCE discovery	2016-09-16 03:11:51 -04:00
Ingo Gottwald	3b546d061f	Add support for GCE discovery	2016-09-16 08:55:33 +02:00
Ingo Gottwald	fefcd6eef2	Add deps for google cloud support	2016-09-16 08:51:58 +02:00
Tobias Schmidt	4b850970a2	Merge pull request #1999 from prometheus/print-promu Add promu installation logging to Makefile	2016-09-15 19:12:15 -04:00
Julius Volz	099adab253	Merge pull request #1987 from tomwilkie/1982-die-grpc-die Switch back to protos over HTTP, instead of grpc.	2016-09-16 01:01:47 +02:00
Julius Volz	92d60ba4c0	Add promu installation logging to Makefile Due to bad GitHub connectivity, "make" frequently got stuck at the promu step for me, and I was thinking that "format" was taking a long time because the promu step wasn't logged. All other Makefile targets have log statements...	2016-09-16 00:59:56 +02:00
Tom Wilkie	d83879210c	Switch back to protos over HTTP, instead of GRPC. My aim is to support the new grpc generic write path in Frankenstein. On the surface this seems easy - however I've hit a number of problems that make me think it might be better to not use grpc just yet. The explanation of the problems requires a little background. At weave, traffic to frankenstein needs to go through a couple of services first, for SSL and to be authenticated. So traffic goes: internet -> frontend -> authfe -> frankenstein - The frontend is Nginx, and adds/removes SSL. It's done this way for legacy reasons, so the certs can be managed in one place, although eventually we imagine we'll merge it with authfe. All traffic from frontend is sent to authfe. - Authfe checks the auth tokens / cookie etc and then picks the service to forward the RPC to. - Frankenstein accepts the reads and does the right thing with them. First problem I hit was Nginx won't proxy http2 requests - it can accept them, but all calls downstream are http1 (see https://trac.nginx.org/nginx/ticket/923). This wasn't such a big deal, so it now looks like: internet --(grpc/http2)--> frontend --(grpc/http1)--> authfe --(grpc/http1)--> frankenstein Next problem was golang grpc server won't accept http1 requests (see https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms). It is possible to link a grpc server in with a normal go http mux, as long as the mux server is serving over SSL, as the golang http client & server won't do http2 over anything other than an SSL connection. This would require making all our service-to-service comms SSL. So I had a go at writing a grpc http1 server, and got pretty far. But it was a bit of a mess. So finally I thought I'd make a separate grpc frontend for this, running in parallel with the frontend/authfe combo on a different port - and first up I'd need a grpc reverse proxy. 
Ideally we'd have some nice, generic reverse proxy that only knew about a map from service names -> downstream service, and didn't need to decode & re-encode every request as it went through. It seems like this can't be done with golang's grpc library - see https://github.com/mwitkow/grpc-proxy/issues/1. And then I was surprised to find you can't do grpc from browsers! See http://www.grpc.io/faq/ - not important to us, but I'm starting to question why we decided to use grpc in the first place? It would seem we could have most of the benefits of grpc with protos over HTTP, and this wouldn't preclude moving to grpc when it's a bit more mature? In fact, the grpc FAQ even admits as much: > Why is gRPC better than any binary blob over HTTP/2? > This is largely what gRPC is on the wire.	2016-09-15 23:21:54 +01:00
Tom Wilkie	e0989fde89	Remove grpc vendoring.	2016-09-15 23:15:56 +01:00
Tom Wilkie	bcd43e82c6	Add go_import_path to travis so it works on a fork. (#1995)	2016-09-15 17:05:56 -04:00
Tobias Schmidt	3eef95962b	Merge pull request #1965 from prometheus/beorn7/federation federation: Collapse time series of the same name	2016-09-15 17:03:13 -04:00
beorn7	717dd8adac	web: add more federation test scenarios	2016-09-15 15:23:55 +02:00
beorn7	784a8ad7c5	web: Inline httptest.NewRequest because it only exists in Go1.7+	2016-09-15 15:06:36 +02:00
Fabian Reinartz	737ae60cea	Merge pull request #1993 from prometheus/grobie/include-go-report Link to goreport from README	2016-09-15 08:00:14 +02:00
Fabian Reinartz	4f7e6e8bf0	Merge pull request #1994 from prometheus/grobie/fix-small-issues Fix low hanging code issues	2016-09-15 07:57:55 +02:00
Tobias Schmidt	29ced0090f	Fix common english misspellings	2016-09-14 23:23:28 -04:00
Tobias Schmidt	e2c12dcdb5	Add missing error check in persistence test	2016-09-14 23:16:47 -04:00
Tobias Schmidt	27074863b4	Print url.URLs correctly in tests	2016-09-14 23:15:18 -04:00
Tobias Schmidt	8f3b62bfe4	Simplify struct initialization	2016-09-14 23:13:27 -04:00
Tobias Schmidt	b41a240c36	Link to goreport from README	2016-09-14 23:09:26 -04:00
beorn7	39c4915401	federation: Collapse time series of the same name This will avoid duplicate MetricFamilies, thereby shrinking the size of the federation payload and also creating legal text format. Also, add unit tests for federation. They were also needed for the previous state of the code, but were missing.	2016-09-14 19:35:20 +02:00
Julius Volz	b24e5d63bc	Add noop local storage engine. This adds a flag -storage.local.engine which allows turning off local storage in Prometheus. Instead of adding if-conditions and nil checks to all parts of Prometheus that deal with Prometheus's local storage (including the web interface), disabling local storage simply means replacing the normal local storage with a noop version that throws samples away and returns empty query results. We also don't add the noop storage to the fanout appender to decrease internal overhead. Instead of returning empty results, an alternate behavior could be to return errors on any query that point out that the local storage is disabled. Not sure which one is preferable, so I went with the empty result option for now.	2016-09-14 13:18:05 +02:00
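
The noop approach described in b24e5d63bc (swap the storage implementation rather than scatter nil checks) might look roughly like the sketch below. It is an illustration only, not the code from that commit: the `storage` interface, the `sample` type, the `newStorage` helper, and the engine names "memory" and "noop" are all hypothetical and chosen for this example.

```go
package main

import "fmt"

// sample is a hypothetical, simplified data point.
type sample struct {
	metric string
	value  float64
}

// storage is a hypothetical interface that every caller depends on.
// Callers never need to know whether local storage is enabled.
type storage interface {
	Append(s sample) error
	Query(metric string) ([]sample, error)
}

// memoryStorage keeps samples in a slice and stands in for a real engine.
type memoryStorage struct {
	samples []sample
}

func (m *memoryStorage) Append(s sample) error {
	m.samples = append(m.samples, s)
	return nil
}

func (m *memoryStorage) Query(metric string) ([]sample, error) {
	var out []sample
	for _, s := range m.samples {
		if s.metric == metric {
			out = append(out, s)
		}
	}
	return out, nil
}

// noopStorage throws samples away and returns empty query results.
type noopStorage struct{}

func (noopStorage) Append(sample) error            { return nil }
func (noopStorage) Query(string) ([]sample, error) { return nil, nil }

// newStorage picks an implementation from an engine name
// (the names are assumptions made for this sketch).
func newStorage(engine string) (storage, error) {
	switch engine {
	case "memory":
		return &memoryStorage{}, nil
	case "noop":
		return noopStorage{}, nil
	default:
		return nil, fmt.Errorf("unknown storage engine %q", engine)
	}
}

func main() {
	for _, engine := range []string{"memory", "noop"} {
		st, err := newStorage(engine)
		if err != nil {
			panic(err)
		}
		_ = st.Append(sample{metric: "up", value: 1})
		res, _ := st.Query("up")
		fmt.Printf("%s: %d sample(s)\n", engine, len(res))
	}
}
```

The alternative mentioned in the commit message, returning an error from every query, would only change the body of `noopStorage.Query` in this sketch.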

3531 commits
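
The `panic: close of closed channel` reported in 926a5ab3dd occurs when two code paths close the same channel, here a reload racing with a stop. One common way to guard against this class of bug is sketched below. It is not claimed to be the patch that was merged: the `manager` and `group` types are hypothetical stand-ins, and the sketch combines a mutex (so reload and stop cannot interleave) with `sync.Once` (so a channel is closed at most once).

```go
package main

import (
	"fmt"
	"sync"
)

// group is a hypothetical stand-in for a unit of work with a stop channel.
type group struct {
	done     chan struct{}
	stopOnce sync.Once
}

func newGroup() *group { return &group{done: make(chan struct{})} }

// stop closes done at most once, so repeated or concurrent calls
// cannot trigger "close of closed channel".
func (g *group) stop() {
	g.stopOnce.Do(func() { close(g.done) })
}

// manager is a hypothetical stand-in for a component that is reloaded
// and stopped from different goroutines.
type manager struct {
	mtx    sync.Mutex
	groups map[string]*group
}

func newManager() *manager { return &manager{groups: map[string]*group{}} }

// applyConfig swaps in a fresh set of groups and stops the old ones.
// Holding mtx keeps it from interleaving with stopAll.
func (m *manager) applyConfig(names []string) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	old := m.groups
	m.groups = make(map[string]*group, len(names))
	for _, n := range names {
		m.groups[n] = newGroup()
	}

	var wg sync.WaitGroup
	for _, g := range old {
		wg.Add(1)
		go func(g *group) {
			defer wg.Done()
			g.stop()
		}(g)
	}
	wg.Wait()
}

// stopAll stops every current group.
func (m *manager) stopAll() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	for _, g := range m.groups {
		g.stop()
	}
}

func main() {
	m := newManager()
	m.applyConfig([]string{"a", "b"})

	// Reload and stop race with each other, as in the reported scenario.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m.applyConfig([]string{"a", "b"}) }()
	go func() { defer wg.Done(); m.stopAll() }()
	wg.Wait()

	m.stopAll() // stopping again is harmless
	fmt.Println("stopped without panic")
}
```

Either mechanism alone covers part of the problem: the mutex serialises reload and stop, while `sync.Once` makes a double stop safe even if a caller forgets the lock.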