* optimize Linode SD by polling for event changes during refresh
Most accounts are fairly "static", in the sense that they're not cycling
through instances constantly. So rather than do a full refresh every
interval and potentially make several behind-the-scenes paginated API
calls, this will now poll the `/account/events/` endpoint every minute
with a list of events that we care about. If a matching event is found,
we then do a full refresh.
Co-authored-by: William Smith <wsmith@linode.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: William Smith <wsmith@linode.com>
* Fix: Use json.Unmarshal() instead of json.Decoder
See https://ahmet.im/blog/golang-json-decoder-pitfalls/
json.Decoder is for JSON streams, not single JSON objects / bodies.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert modifications to targetgroup parsing
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Write exemplars to the WAL and send them over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Update example for exemplars, print data in a more obvious format.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add metrics for remote write of exemplars.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Fix incorrect slices passed to send in remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* We need to unregister the new metrics.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Order of exemplar append vs write exemplar to WAL needs to change.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Condense sample/exemplar delivery tests to parameterized sub-tests
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename test methods for clarity now that they also handle exemplars
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Rename counter variable. Fix instances where metrics were not updated correctly
Signed-off-by: Martin Disibio <mdisibio@gmail.com>
* Add exemplars to LoadWAL benchmark
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* last exemplars timestamp metric needs to convert value to seconds with
ms precision
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Process exemplar records in a separate go routine when loading the WAL.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Regenerate types proto with comments, update protoc version again.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Put remote write of exemplars behind a feature flag.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address some of Ganesh's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Move exemplar remote write feature flag to a config file field.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address Bartek's review comments.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address more reivew comments from Ganesh.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Add exemplar total label length check.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* Address a few last review comments
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
- Remove unrelated changes
- Refactor code out of the API module - that is already getting pretty crowded.
- Don't track reference for AddFast in remote write. This has the potential to consume unlimited server-side memory if a malicious client pushes a different label set for every series. For now, its easier and safer to always use the 'slow' path.
- Return 400 on out of order samples.
- Use remote.DecodeWriteRequest in the remote write adapters.
- Put this behing the 'remote-write-server' feature flag
- Add some (very) basic docs.
- Used named return & add test for commit error propagation
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* add networking.k8s.io for ingress
level=error ts=2020-10-19T08:32:30.544Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:494: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: ingresses.networking.k8s.io is forbidden: User \"system:serviceaccount:monitoring:prometheus\" cannot list resource \"ingresses\" in API group \"networking.k8s.io\" at the cluster scope"
Signed-off-by: root <likerj@inspur.com>
* Update rbac-setup.yml
Signed-off-by: root <likerj@inspur.com>
* add test to custom-sd/adapter writeOutput() function
Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
* fix Adapter.writeOutput() function to work on Windows
On that platform, files cannot be moved while a process holds a handle
to them. Added an explicit Close() before that move. With this change,
the unit test succeeds.
Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
* add missing dot to comment
Signed-off-by: Benoit Gagnon <benoit.gagnon@ubisoft.com>
* [bugfix] custom SD: when ip out of order, reflect.deepEqual can not correctly identify whether there is a change
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [format] makefile:Makefile.common:116: common-style
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix] custom sd: simonpasquier comment,It would be simpler to sort the targets alphabetically and keep reflect.DeepEqual.
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix]custom SD:fix sort
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix] custom SD : adapter.go need an empty line after "sort"
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix]custom SD:test sign-off
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [bugfix]custom SD: fix adaper_test.go
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.
This effectively enable keep-alive for the fixed requests.
Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
Although it is spelling mistakes, it might make an affects while reading.
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
Fix http link to https link for secure, modify http to https
in the links of project. Have some http links doesn't
redirect into https.
Co-Authored-By: Nguyen Van Trung trungnv@vn.fujitsu.com
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* update promlog to latest version
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* Update api tests, fix main setup
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* tidy go.sum
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* revendor prometheus/common
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* only initialize config; use kingpin for remote_storage_adapter
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* actually parse the flags
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* clean up imports
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* scrape: fix TestTargetScrapeScrapeCancel
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
As alertmanager needs to be configured in the config file in Prometheus 2, I think it is useful to have it in the example config.
Also renamed the rules in the example config so they are explicitely yml files.
* k8s: Support discovery of ingresses
* Move additional labels below allocation
This makes it more obvious why the additional elements are allocated.
Also fix allocation for node where we only set a single label.
* k8s: Remove port from ingress discovery
* k8s: Add comment to ingress discovery example
Kubernetes 1.7+ no longer exposes cAdvisor metrics on the Kubelet
metrics endpoint. Update the example configuration to scrape cAdvisor
in addition to Kubelet. The provided configuration works for 1.7.3+
and commented notes are given for 1.7.2 and earlier versions.
Also remove the comment about node (Kubelet) CA not matching the master
CA. Since the example no longer connects directly to the nodes, it
doesn't matter what CA they're using.
References:
- https://github.com/kubernetes/kubernetes/issues/48483
- https://github.com/kubernetes/kubernetes/pull/49079
* Compress remote storage requests and responses with unframed/raw snappy, for compatibility with other languages.
* Remove backwards compatibility code from remote_storage_adapter, update example_write_adapter
* Add /documentation/examples/remote_storage/example_write_adapter/example_writer_adapter to .gitignore
* scrape kubelet metrics via api node proxy
* add manifests to setup serviceaccount, clusterrole and clusterrolebinding to work with rbac
* removed .cluster.local and added newline to address comments
The Kubernetes pod SD creates a target for each declared port, as documented:
https://prometheus.io/docs/operating/configuration/#pod
> The pod role discovers all pods and exposes their containers as targets. For
> each declared port of a container, a single target is generated. If a
> container has no specified ports, a port-free target per container is created
> for manually adding a port via relabeling.
This results in the default port being the declared port, or no port if none are
declared.
This change corrects a bug introduced by PR
https://github.com/prometheus/prometheus/pull/2427
The regex uses three groups: the hostname, an optional port, and the
prefered port from a kubernetes annotation.
Previously, the second group should have been ignored if a :port was not
present in the input. However, making the port group optional with the
"?" had the unintended side-effect of allowing the hostname regex "(.+)"
to match greedily, which included the ":port" patterns up to the ";"
separating the hostname from the kubernetes port annotation.
This change updates the regex for the hostname to match any non-":"
characters. This forces the regex to stop if a ":port" is present and
allow the second group to match the optional port.
This change updates port relabeling for pod and service discovery so the
relabeling regex matches addresses with or without declared ports. As
well, this change uses a consistent style in the replacement pattern
for the two expressions.
Previously, for both services or pods that did not have declared ports, the
relabel config regex would fail to match:
__meta_kubernetes_service_annotation_prometheus_io_port
regex: (.+)(?::\d+);(\d+)
__meta_kubernetes_pod_annotation_prometheus_io_port
regex: (.+):(?:\d+);(\d+)
Both regexes expected a <host>:<port> pattern.
The new regex matches addresses with or without declared ports by making
the :<port> pattern optional.
__meta_kubernetes_service_annotation_prometheus_io_port
__meta_kubernetes_pod_annotation_prometheus_io_port
regex: (.+)(?::\d+)?;(\d+)
This removes legacy support for specific remote storage systems in favor
of only offering the generic remote write protocol. An example bridge
application that translates from the generic protocol to each of those
legacy backends is still provided at:
documentation/examples/remote_storage/remote_storage_bridge
See also https://github.com/prometheus/prometheus/issues/10
The next step in the plan is to re-add support for multiple remote
storages.
In preparation for removing specific remote storage implementations,
this offers an example of how to achieve the same in a separate process.
Rather than having three separate bridges for OpenTSDB, InfluxDB, and
Graphite, I decided to support all in one binary.
For now, this is in the example documenation directory, but perhaps we
will want to make a first-class project / repository out of it.
My aim is to support the new grpc generic write path in Frankenstein. On the surface this seems easy - however I've hit a number of problems that make me think it might be better to not use grpc just yet.
The explanation of the problems requires a little background. At weave, traffic to frankenstein need to go through a couple of services first, for SSL and to be authenticated. So traffic goes:
internet -> frontend -> authfe -> frankenstein
- The frontend is Nginx, and adds/removes SSL. Its done this way for legacy reasons, so the certs can be managed in one place, although eventually we imagine we'll merge it with authfe. All traffic from frontend is sent to authfe.
- Authfe checks the auth tokens / cookie etc and then picks the service to forward the RPC to.
- Frankenstein accepts the reads and does the right thing with them.
First problem I hit was Nginx won't proxy http2 requests - it can accept them, but all calls downstream are http1 (see https://trac.nginx.org/nginx/ticket/923). This wasn't such a big deal, so it now looks like:
internet --(grpc/http2)--> frontend --(grpc/http1)--> authfe --(grpc/http1)--> frankenstein
Next problem was golang grpc server won't accept http1 requests (see https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms). It is possible to link a grpc server in with a normal go http mux, as long as the mux server is serving over SSL, as the golang http client & server won't do http2 over anything other than an SSL connection. This would require making all our service to service comms SSL. So I had a go a writing a grpc http1 server, and got pretty far. But is was a bit of a mess.
So finally I thought I'd make a separate grpc frontend for this, running in parallel with the frontend/authfe combo on a different port - and first up I'd need a grpc reverse proxy. Ideally we'd have some nice, generic reverse proxy that only knew about a map from service names -> downstream service, and didn't need to decode & re-encode every request as it went through. It seems like this can't be done with golang's grpc library - see https://github.com/mwitkow/grpc-proxy/issues/1.
And then I was surprised to find you can't do grpc from browsers! See http://www.grpc.io/faq/ - not important to us, but I'm starting to question why we decided to use grpc in the first place?
It would seem we could have most of the benefits of grpc with protos over HTTP, and this wouldn't preclude moving to grpc when its a bit more mature? In fact, the grcp FAQ even admits as much:
> Why is gRPC better than any binary blob over HTTP/2?
> This is largely what gRPC is on the wire.
- fold metric name into labels
- return initialization errors back to main
- add snappy compression
- better context handling
- pre-allocation of labels
- remove generic naming
- other cleanups
This uses a new proto format, with scope for multiple samples per
timeseries in future. This will allow users to pump samples out to
whatever they like without having to change the core Prometheus code.
There's also an example receiver to save users figuring out the
boilerplate themselves.
CA and Bearer Token are config of `kubernetes_sd_configs`, not the
`scrape_config`. Also updated misleading top-level comment and removed
unnecessary global config.
* Support multiple masters with retries against each master as required.
* Scrape masters' metrics.
* Add role meta label for node/service/master to make it easier for relabeling.
- Modified sample conf so it is useable by default, also added some
comments from the 'hello world' configuration.
- Updated README so there's a clear two step start for newbies.
- Added extra vim swap files to gitignore.
Change-Id: I76203973db4a7b332014662fcfb2ce5e7d137bd8