Commit graph

49 commits

Author SHA1 Message Date
Goutham Veeramachaneni 0d0c9d5440
Move Registerer to Config Struct in Notifier 2017-03-31 21:20:12 +05:30
Julius Volz beb3c4b389 Remove legacy remote storage implementations
This removes legacy support for specific remote storage systems in favor
of only offering the generic remote write protocol. An example bridge
application that translates from the generic protocol to each of those
legacy backends is still provided at:

documentation/examples/remote_storage/remote_storage_bridge

See also https://github.com/prometheus/prometheus/issues/10

The next step in the plan is to re-add support for multiple remote
storages.
2017-02-14 17:52:05 +01:00
Bartek Plotka 579e33f19a Fixed style issues. 2017-01-16 16:45:58 +00:00
Bartek Plotka d7febe97fa Fixed regression in -alertmanager.url flag. Basic auth was ignored.
- Included basic auth parsing while parsing to AlertmanagerConfig
- Added test case

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2017-01-16 16:39:20 +00:00
Erdem Agaoglu 054f8ebbfb Increase default max-connections 2016-12-06 17:45:19 +03:00
Erdem Agaoglu e487477a17 LimitListener to limit max number of connections
This also drops tcp keep-alive in ListenAndServe but it's no longer
necessary since we now close idle connections long before that.
2016-12-06 12:45:59 +03:00
Erdem Agaoglu 9986b28380 Set read-timeout for http.Server
This also specifies a timeout for idle client connections, which may
cause "too many open files" errors.
See #2238
2016-12-01 16:29:45 +03:00
Fabian Reinartz f210d96497 notifier: use dynamic service discovery 2016-11-23 18:23:37 +01:00
beorn7 5c41ca84e5 Catch negative staleness delta set on the command line 2016-11-01 15:17:59 +01:00
Brian Brazil 6bc29ba857 Fix regression from #1957, specify non-zero default timeout. (#2121)
Fixes #2075
2016-10-26 14:47:41 +01:00
Julius Volz ab80ced756 storage: separate chunk package, publish more names
This is a followup to https://github.com/prometheus/prometheus/pull/2011.

This publishes more of the methods and other names of the chunk code and
moves the chunk code to its own package. There's some unavoidable
ugliness: the chunk and chunkDesc metrics are used by both packages, so
I had to move them to the chunk package. That isn't great, but I don't
see how to do it better without a larger redesign of everything. Same
for the evict requests and some other types.
2016-09-26 13:25:11 +02:00
Tom Wilkie 4520e12440 Add HTTP Basic Auth & TLS support to the generic write path. (#1957)
* Add config, HTTP Basic Auth and TLS support to the generic write path.

- Move generic write path configuration to the config file
- Factor out config.TLSConfig -> tlf.Config translation
- Support TLSConfig for generic remote storage
- Rename Run to Start, and make it non-blocking.
- Dedupe code in httputil for TLS config.
- Make remote queue metrics global.
2016-09-19 22:47:51 +02:00
Julius Volz 5f5a78e807 Merge pull request #1974 from prometheus/disable-local-storage
Allow disabling local storage.
2016-09-17 18:40:01 +02:00
Tom Wilkie d83879210c Switch back to protos over HTTP, instead of GRPC.
My aim is to support the new grpc generic write path in Frankenstein.  On the surface this seems easy - however I've hit a number of problems that make me think it might be better to not use grpc just yet.

The explanation of the problems requires a little background.  At weave, traffic to frankenstein need to go through a couple of services first, for SSL and to be authenticated.  So traffic goes:

    internet -> frontend -> authfe -> frankenstein

- The frontend is Nginx, and adds/removes SSL.  Its done this way for legacy reasons, so the certs can be managed in one place, although eventually we imagine we'll merge it with authfe.  All traffic from frontend is sent to authfe.
- Authfe checks the auth tokens / cookie etc and then picks the service to forward the RPC to.
- Frankenstein accepts the reads and does the right thing with them.

First problem I hit was Nginx won't proxy http2 requests - it can accept them, but all calls downstream are http1 (see https://trac.nginx.org/nginx/ticket/923).  This wasn't such a big deal, so it now looks like:

    internet --(grpc/http2)--> frontend --(grpc/http1)--> authfe --(grpc/http1)--> frankenstein

Next problem was golang grpc server won't accept http1 requests (see https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms).  It is possible to link a grpc server in with a normal go http mux, as long as the mux server is serving over SSL, as the golang http client & server won't do http2 over anything other than an SSL connection.  This would require making all our service to service comms SSL.  So I had a go a writing a grpc http1 server, and got pretty far.  But is was a bit of a mess.

So finally I thought I'd make a separate grpc frontend for this, running in parallel with the frontend/authfe combo on a different port - and first up I'd need a grpc reverse proxy.  Ideally we'd have some nice, generic reverse proxy that only knew about a map from service names -> downstream service, and didn't need to decode & re-encode every request as it went through.  It seems like this can't be done with golang's grpc library - see https://github.com/mwitkow/grpc-proxy/issues/1.

And then I was surprised to find you can't do grpc from browsers! See http://www.grpc.io/faq/ - not important to us, but I'm starting to question why we decided to use grpc in the first place?

It would seem we could have most of the benefits of grpc with protos over HTTP, and this wouldn't preclude moving to grpc when its a bit more mature?  In fact, the grcp FAQ even admits as much:

> Why is gRPC better than any binary blob over HTTP/2?
> This is largely what gRPC is on the wire.
2016-09-15 23:21:54 +01:00
Julius Volz b24e5d63bc Add noop local storage engine.
This adds a flag -storage.local.engine which allows turning off local
storage in Prometheus. Instead of adding if-conditions and nil checks to
all parts of Prometheus that deal with Prometheus's local storage
(including the web interface), disabling local storage simply means
replacing the normal local storage with a noop version that throws
samples away and returns empty query results. We also don't add the noop
storage to the fanout appender to decrease internal overhead.

Instead of returning empty results, an alternate behavior could be to
return errors on any query that point out that the local storage is
disabled. Not sure which one is more preferable, so I went with the
empty result option for now.
2016-09-14 13:18:05 +02:00
Julius Volz a88e950d1f Mark remote write address flag as experimental. 2016-09-01 00:58:53 +02:00
Julius Volz aa3f2b7216 Generic write cleanups and changes.
- fold metric name into labels
- return initialization errors back to main
- add snappy compression
- better context handling
- pre-allocation of labels
- remove generic naming
- other cleanups
2016-08-30 17:24:48 +02:00
Brian Brazil 36d2c4bd0b Add generic write path using grpc.
This uses a new proto format, with scope for multiple samples per
timeseries in future. This will allow users to pump samples out to
whatever they like without having to change the core Prometheus code.

There's also an example receiver to save users figuring out the
boilerplate themselves.
2016-08-30 17:19:18 +02:00
Julius Volz 08891beb5f Merge pull request #1828 from drawks/iss-1821
Error on non-flag commandline arguments
2016-07-21 00:35:53 +02:00
Dave Rawks 00ea36cdbe Error on non-flag commandline arguments
- Added minor cmdline parsing logic change to bail on
  unconsumed arguments. Fixes #1821
2016-07-20 10:28:26 -07:00
beorn7 bf6201483c Improve wording on log flag comment 2016-07-20 17:32:42 +02:00
beorn7 25385aafcb Explicitly add logging flags to our custom flag set
In https://github.com/prometheus/prometheus/pull/1782 , we moved to a
custom flag set to avoid getting test flags into the main prometheus
binary. However, that removed the logging flags, too. This commit
updates the vendoring to a version of the log package that allows
adding the log flags to our flag set explicitly.
2016-07-20 17:27:39 +02:00
Fabian Reinartz 59d26e8536 web: add -web.route-prefix flag
Fixes #1191
2016-07-07 11:49:16 +02:00
Fabian Reinartz 8c24dfdb86 cmd/prometheus: use own flag set
Fixes #1743
2016-07-03 14:23:31 +02:00
Fabian Reinartz dd57e7ef5c Merge pull request #1699 from prometheus/fabxc-multiam
notifier: dispatch to multiple Alertmanagers
2016-06-06 12:01:41 +02:00
Fabian Reinartz 9baf120cd5 notifier: dispatch to multiple Alertmanagers
This commit extends the notifier to dispatch alert batches
to multiple Alertmanagers concurrently.
It changes the `-alertmanager.url` flag to accept a comma
separated list of URLs and/or to be set multiple times.
2016-06-06 11:41:10 +02:00
beorn7 99881ded63 Make the number of fingerprint mutexes configurable
With a lot of series accessed in a short timeframe (by a query, a
large scrape, checkpointing, ...), there is actually quite a
significant amount of lock contention if something similar is running
at the same time.

In those cases, the number of locks needs to be increased.

On the same front, as our fingerprints don't have a lot of entropy, I
introduced some additional shuffling. With the current state, anly
changes in the least singificant bits of a FP would matter.
2016-06-02 19:18:00 +02:00
beorn7 865d16f870 Rename Gorilla into varbit 2016-03-23 16:30:41 +01:00
beorn7 8cdced3850 Implement Gorilla-inspired chunk encoding
This is not a verbatim implementation of the Gorilla encoding.  First
of all, it could not, even if we wanted, because Prometheus has a
different chunking model (constant size, not constant time).  Second,
this adds a number of changes that improve the encoding in general or
at least for the specific use case of Prometheus (and are partially
only possible in the context of Prometheus). See comments in the code
for details.
2016-03-17 14:47:08 +01:00
Tobias Schmidt 7763bbd993 Validate alertmanager URL 2016-03-07 20:07:17 -05:00
Fabian Reinartz bfa8aaa017 Rename notification to notifier 2016-03-01 12:39:08 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
Fabian Reinartz d9f836e5b8 Merge pull request #1340 from prometheus/validate-externa-url
Validate URL parameters
2016-01-27 15:49:08 +01:00
beorn7 a2cd479058 Fix calculation of chunks to persist after restart
Since we are not overestimating the number of chunks to persist
anymore, this commit also adjusts the default value for
-storage.local.memory-chunks. Update of documentation will follow.
2016-01-25 19:33:51 +01:00
Tobias Schmidt 122d73858d Validate URL parameters 2016-01-25 00:37:09 -05:00
beorn7 4221c7de5c Improve handling of series file truncation
If only very few chunks are to be truncated from a very large series
file, the rewrite of the file is a lorge overhead. With this change, a
certain ratio of the file has to be dropped to make it happen. While
only causing disk overhead at about the same ratio (by default 10%),
it will cut down I/O by a lot in above scenario.
2016-01-11 16:42:10 +01:00
Julius Volz 87d1831f12 Document INFLUXDB_PW env var in username flag
Fixes https://github.com/prometheus/prometheus/issues/1281
2016-01-04 00:18:41 +01:00
Fabian Reinartz 2c8a96ecdc Adjust notification handler flags 2015-12-11 15:17:32 +01:00
Fabian Reinartz e114ce0ff7 Refactor notification handler 2015-12-11 15:17:32 +01:00
Fabian Reinartz a542cc8609 Remove -web.use-local-assets 2015-11-11 17:58:03 +01:00
Corentin Chary a2e4439086 Add support for remote storage on Graphite
Allows to use graphite over tcp or udp. Metrics labels
and values are used to construct a valid Graphite path
in a way that will allow us to eventually read them back
and reconstruct the metrics.

For example, this metric:

model.Metric{
	model.MetricNameLabel: "test:metric",
	"testlabel":           "test:value",
	"testlabel2":           "test:value",
)

Will become:

test:metric.testlabel=test:value.testlabel2=test:value

escape.go takes care of escaping values to match Graphite
character set, it basically uses percent-encoding as a fallback
wich will work pretty will in the graphite/grafana world.

The remote storage module also has an optional 'prefix' parameter
to prefix all metrics with a path (for example, 'prometheus.').

Graphite URLs are simply in the form tcp://host:port or
udp://host:port.
2015-11-10 07:58:57 +01:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
Julius Volz 24d0d9190e Make -web.external-url help string more verbose. 2015-09-16 20:35:23 +02:00
Julius Volz eeb1da36ac Fix InfluxDB write support to work with InfluxDB 0.9.x.
Because the InfluxDB client library currently pulls in multiple MBs of
unnecessary dependencies, I have modified and cut up the vendored
version to only pull in the few pieces that are actually needed.

On InfluxDB's side, this dependency issue is tracked in:

https://github.com/influxdb/influxdb/issues/3447

Hopefully, it will be resolved soon.

If a password is needed for InfluxDB, it may be supplied via the
INFLUXDB_PW environment variable.
2015-09-16 17:40:03 +02:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
Julius Volz fcff35b43e Consolidate external reachability flags into one.
Besides fixing https://github.com/prometheus/prometheus/issues/805 by
making the entire externally reachable server URL configurable, this
adds tests for the "globalURL" template function and makes it easier to
test other such functions in the future.

This breaks the `web.Hostname` flag (and introduces `web.external-url`).
This flag is likely only used by few users, so I hope that's
justifiable.

Fixes https://github.com/prometheus/prometheus/issues/805
2015-07-03 13:39:10 +02:00
Fabian Reinartz 525070419b cmd/prometheus: improve help output 2015-06-25 18:53:51 +02:00
Fabian Reinartz 23e77450ff main: cleanup initialization of remote storage. 2015-06-23 18:24:48 +02:00
Fabian Reinartz de66e32a4d cmd/prometheus: create new main package. 2015-06-15 19:01:06 +02:00