* scrape kubelet metrics via api node proxy
* add manifests to setup serviceaccount, clusterrole and clusterrolebinding to work with rbac
* removed .cluster.local and added newline to address comments
The Kubernetes pod SD creates a target for each declared port, as documented:
https://prometheus.io/docs/operating/configuration/#pod
> The pod role discovers all pods and exposes their containers as targets. For
> each declared port of a container, a single target is generated. If a
> container has no specified ports, a port-free target per container is created
> for manually adding a port via relabeling.
This results in the default port being the declared port, or no port if none are
declared.
This change corrects a bug introduced by PR
https://github.com/prometheus/prometheus/pull/2427
The regex uses three groups: the hostname, an optional port, and the
prefered port from a kubernetes annotation.
Previously, the second group should have been ignored if a :port was not
present in the input. However, making the port group optional with the
"?" had the unintended side-effect of allowing the hostname regex "(.+)"
to match greedily, which included the ":port" patterns up to the ";"
separating the hostname from the kubernetes port annotation.
This change updates the regex for the hostname to match any non-":"
characters. This forces the regex to stop if a ":port" is present and
allow the second group to match the optional port.
This change updates port relabeling for pod and service discovery so the
relabeling regex matches addresses with or without declared ports. As
well, this change uses a consistent style in the replacement pattern
for the two expressions.
Previously, for both services or pods that did not have declared ports, the
relabel config regex would fail to match:
__meta_kubernetes_service_annotation_prometheus_io_port
regex: (.+)(?::\d+);(\d+)
__meta_kubernetes_pod_annotation_prometheus_io_port
regex: (.+):(?:\d+);(\d+)
Both regexes expected a <host>:<port> pattern.
The new regex matches addresses with or without declared ports by making
the :<port> pattern optional.
__meta_kubernetes_service_annotation_prometheus_io_port
__meta_kubernetes_pod_annotation_prometheus_io_port
regex: (.+)(?::\d+)?;(\d+)
This removes legacy support for specific remote storage systems in favor
of only offering the generic remote write protocol. An example bridge
application that translates from the generic protocol to each of those
legacy backends is still provided at:
documentation/examples/remote_storage/remote_storage_bridge
See also https://github.com/prometheus/prometheus/issues/10
The next step in the plan is to re-add support for multiple remote
storages.
In preparation for removing specific remote storage implementations,
this offers an example of how to achieve the same in a separate process.
Rather than having three separate bridges for OpenTSDB, InfluxDB, and
Graphite, I decided to support all in one binary.
For now, this is in the example documenation directory, but perhaps we
will want to make a first-class project / repository out of it.
My aim is to support the new grpc generic write path in Frankenstein. On the surface this seems easy - however I've hit a number of problems that make me think it might be better to not use grpc just yet.
The explanation of the problems requires a little background. At weave, traffic to frankenstein need to go through a couple of services first, for SSL and to be authenticated. So traffic goes:
internet -> frontend -> authfe -> frankenstein
- The frontend is Nginx, and adds/removes SSL. Its done this way for legacy reasons, so the certs can be managed in one place, although eventually we imagine we'll merge it with authfe. All traffic from frontend is sent to authfe.
- Authfe checks the auth tokens / cookie etc and then picks the service to forward the RPC to.
- Frankenstein accepts the reads and does the right thing with them.
First problem I hit was Nginx won't proxy http2 requests - it can accept them, but all calls downstream are http1 (see https://trac.nginx.org/nginx/ticket/923). This wasn't such a big deal, so it now looks like:
internet --(grpc/http2)--> frontend --(grpc/http1)--> authfe --(grpc/http1)--> frankenstein
Next problem was golang grpc server won't accept http1 requests (see https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms). It is possible to link a grpc server in with a normal go http mux, as long as the mux server is serving over SSL, as the golang http client & server won't do http2 over anything other than an SSL connection. This would require making all our service to service comms SSL. So I had a go a writing a grpc http1 server, and got pretty far. But is was a bit of a mess.
So finally I thought I'd make a separate grpc frontend for this, running in parallel with the frontend/authfe combo on a different port - and first up I'd need a grpc reverse proxy. Ideally we'd have some nice, generic reverse proxy that only knew about a map from service names -> downstream service, and didn't need to decode & re-encode every request as it went through. It seems like this can't be done with golang's grpc library - see https://github.com/mwitkow/grpc-proxy/issues/1.
And then I was surprised to find you can't do grpc from browsers! See http://www.grpc.io/faq/ - not important to us, but I'm starting to question why we decided to use grpc in the first place?
It would seem we could have most of the benefits of grpc with protos over HTTP, and this wouldn't preclude moving to grpc when its a bit more mature? In fact, the grcp FAQ even admits as much:
> Why is gRPC better than any binary blob over HTTP/2?
> This is largely what gRPC is on the wire.
- fold metric name into labels
- return initialization errors back to main
- add snappy compression
- better context handling
- pre-allocation of labels
- remove generic naming
- other cleanups
This uses a new proto format, with scope for multiple samples per
timeseries in future. This will allow users to pump samples out to
whatever they like without having to change the core Prometheus code.
There's also an example receiver to save users figuring out the
boilerplate themselves.
CA and Bearer Token are config of `kubernetes_sd_configs`, not the
`scrape_config`. Also updated misleading top-level comment and removed
unnecessary global config.
* Support multiple masters with retries against each master as required.
* Scrape masters' metrics.
* Add role meta label for node/service/master to make it easier for relabeling.