This commit introduces a new metric to count the number of failed
requests to Linode's API when using Linode SD. Resolves#10672, inspired
by #10476.
_Note_: this doens't count failures when polling the `/account/events`
endpoint, as a `401` there is how we determine if the supplied token has
the needed API scopes to do event polling vs full refreshes each
interval.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir
Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
* discovery: expose HTTP client options to discoverers
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* discovery/http: use HTTP client options for created client
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* scrape: use a list of HTTP client options instead of just dial context
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* discovery: rephrase comment
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* Run gofumpt on all files
Getting golangci-lint errors when building on my laptop, possibly because I have newer version of gofumpt then what it was formatted with.
Run gofumpt -w -extra on all files as it will be needed in the future anyway.
* Update golangci-lint to v1.44.2
v1.44.0 upgraded gofumpt so bumping version in CI will help keep formatting correct for everyone
* Address golangci-lint error
Getting 'error-strings: error strings should not be capitalized or end with punctuation or a newline' from revive here.
Drop new line.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
The upgraded client adds order=creation_date_desc to the query
parameters causing the tests to fail. Instead of checking the full URI,
just check that the path is correct.
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
* discovery/targetgroup: support marshaling to JSON
targetgroup.Group is able to be marshaled to and from YAML, but not with
JSON. JSON is used for the http_sd API representation, so users
implementing the API were required to implement their own type which is
expected by the JSON unmarshaler.
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* fix lint error
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* Update discovery/targetgroup/targetgroup.go
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
* s/Json/JSON
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Fail configuration unmarshalling if kubeconfig or api url are set with
"own namespace"
Only read namespace file if needed.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
When using Kubernetes service discovery on a Prometheus instance that's
not running inside Kubernetes, the creation of the service discovery
fails with a "no such file or directory" error as the special
`/var/run/secrets/kubernetes.io/serviceaccount/namespace` file is not
there. This commit moves the code that reads this file into the
if-branch where no `APIServer.URL` is given (that one basically makes
Prometheus assume it is running inside of a Kubernetes cluster).
Signed-off-by: Georg Gadinger <nilsding@nilsding.org>
When using Kubernetes service discovery on a Prometheus instance that's
not running inside Kubernetes, the creation of the service discovery
fails with a "no such file or directory" error as the special
`/var/run/secrets/kubernetes.io/serviceaccount/namespace` file is not
there. This commit moves the code that reads this file into the
if-branch where no `APIServer.URL` is given (that one basically makes
Prometheus assume it is running inside of a Kubernetes cluster).
Signed-off-by: Georg Gadinger <nilsding@nilsding.org>
This commit adds support for discovering targets from the same
Kubernetes namespace as the Prometheus pod itself. Own-namespace
discovery can be indicated by using "." as the namespace.
Fixes#9782
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
When using Kubernetes on cloud providers, nodes will have the
spec.providerID field populated to contain the cloud provider specific
name of the EC2/GCE/... instance.
Let's expose this information as an additional label, so that it's
easier to annotate metrics and alerts to contain the cloud provider
specific name of the instance to which it pertains.
Signed-off-by: Ed Schouten <eschouten@apple.com>
* Add basic initial developer docs for TSDB
There's a decent amount of content already out there (blog posts,
conference talks, etc), but:
* when they get stale, they don't tend to get updated
* they still leave me with questions that I'ld like to answer
for developers (like me) who want to use, or work with, TSDB
What I propose is developer docs inside the prometheus
repository. Easy to find and harness the power of the community
to expand it and keep it up to date.
* perfect is the enemy of good. Let's have a base and incrementally improve
* Markdown docs should be broad but not too deep. Source code comments
can complement them, and are the ideal place for implementation details.
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* use example code that works out of the box
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Apply suggestions from code review
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* PR feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* more docs
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* PR feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Apply suggestions from code review
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Apply suggestions from code review
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* feedback
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Update tsdb/docs/usage.md
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
* final tweaks
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* workaround docs versioning issue
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Move example code to real executable, testable example.
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* cleanup example test and make sure it always reproduces
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* obtain temp dir in a way that works with older Go versions
Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
* Fix Ganesh's comments
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
When running tests in parallel, 10 milliseconds may not be enough for
all discoverers to register, which will make test flaky.
This commit changes the waiting logic to wait for number of discoverers
to stop increasing during given time frame, which should be large enough
for single discoverer to register in test environment.
A following run passes with this commit:
go test -failfast -race -count 100 -v ./discovery/kubernetes/
Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
* Add a feature flag to enable the new manager
This PR creates a copy of the legacy manager and uses it by default.
It is a companion PR to #9349. With this PR, users can enable the new
discovery manager and provide us with any feedback / side effects that
the new behaviour might have.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We are re-enabling HTTP 2 again. There has been a few bugfixes upstream
in go, and we have also enabled ReadIdleTimeout.
Fix#7588Fix#9068
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We have been Puppet user for 10 years and we are users of
https://github.com/camptocamp/prometheus-puppetdb-sd
However, that file_sd implementation contains business logic and
assumptions around e.g. the modules which you are using.
This pull request adds a simple PuppetDB service discovery, which will
enable more use cases than the upstream sd.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This change sets the scheme to https when a rule specified by Ingress
matches a wildcard DNS entry in the ingress TLS hosts
Signed-off-by: Philip Gough <philip.p.gough@gmail.com>
* PromQL: Fix start and end keywords masking label and metric names
This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* Add in additional tests for metrics and/or labels called start/end.
Signed-off-by: Clayton Peters <clayton.peters@man.com>
* *: Cut 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* VERSION: bump to 2.29.0-rc.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Remove experimental wording on size-based retention
Followup of #9004
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix PR reference in changelog
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zone IDs at most once per refresh (#9142)
Signed-off-by: George Brighton <george@gebn.co.uk>
* Describe EC2 availability zones at most once per SD load
Closes#9142.
Signed-off-by: George Brighton <george@gebn.co.uk>
* Incorporate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Integrate feedback
Signed-off-by: George Brighton <george@gebn.co.uk>
* Add a compatibility note for macOS users.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* *: Cut v2.29.0-rc.1
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Fix `kuma_sd` targetgroup reporting (#9157)
* Bundle all xDS targets into a single group
Signed-off-by: austin ce <austin.cawley@gmail.com>
* *: cut v2.29.0-rc.2
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* Rename links
Signed-off-by: Levi Harrison <git@leviharrison.dev>
* bump codemirror-promql to 0.17.0
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
* *: cut v2.29.0
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
* tsdb: align atomically accessed int64 (#9192)
This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUGFixed#9190
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Release 2.29.1 (#9193)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
* optimize Linode SD by polling for event changes during refresh
Most accounts are fairly "static", in the sense that they're not cycling
through instances constantly. So rather than do a full refresh every
interval and potentially make several behind-the-scenes paginated API
calls, this will now poll the `/account/events/` endpoint every minute
with a list of events that we care about. If a matching event is found,
we then do a full refresh.
Co-authored-by: William Smith <wsmith@linode.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: William Smith <wsmith@linode.com>
* Fix: Use json.Unmarshal() instead of json.Decoder
See https://ahmet.im/blog/golang-json-decoder-pitfalls/
json.Decoder is for JSON streams, not single JSON objects / bodies.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert modifications to targetgroup parsing
Signed-off-by: Julius Volz <julius.volz@gmail.com>
prometheus_sd_discovered_targets is wrongly calculated when there are
multiple SD configurations in place. One discovery manager can have
multiple groups coming from multiple service discoveries.
When multiple service discovery configs are used, we do not compute the
metric correctly, and instead just set the metric to one of the service
discoveries.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This makes it clear that the dockerswarm package does more than docker
swarm, but does also docker.
I have picked moby as it is the upstream name: https://mobyproject.org/
There is no user-facing change, except in the case of a bad
configuration. Previously, a user who would have a bad docker sd config
would see an error like:
> field xx not found in type dockerswarm.plain
Now that error would be turned into:
> field xx not found in type moby.plain
While not perfect, it should at not be confusing between docker and
dockerswarm.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Prometheus adds the ability to read secrets from files. This add
this feature for the scaleway service discovery.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This PR introduces support for follow_redirect, to enable users to
disable following HTTP redirects.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
The label `__meta_digitalocean_image` expose the `slug` of the image and
the `slug` is only present in the public images.
To refer a user-generated image (`snapshot` or `custom`) we can use
the image's display name.
See: https://developers.digitalocean.com/documentation/v2/#images
Signed-off-by: Matteo Valentini <matteo.valentini@nethesis.it>
Last change in 4efca5a introduced a problem where NewDiscovery would
just return a nil value, which is not handled well and didn't allow for
fixing configuration issues at runtime without a reload.
Signed-off-by: Alfred Krohmer <alfred.krohmer@logmein.com>
This also caches credentials that are obtained e.g. via IRSA on AWS EKS.
Previously, every refresh cycle would request the credentials again.
Signed-off-by: Alfred Krohmer <alfred.krohmer@logmein.com>
Label selector can be
"set-based"(https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement)
but such a selector causes Prometheus start failure with the "unexpected
error: parsing YAML file ...: invalid selector: 'foo in (bar,baz)';
can't understand 'baz)'"-like error.
This is caused by the `fields.ParseSelector(string)` function that
simply splits an expression as a CSV-list, so a comma confuses such a
parsing method and lead to the error.
Use `labels.Parse(string)` to use a valid lexer to parse a selector
expression.
Closes#8284.
Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com>
* Testify: move to require
Moving testify to require to fail tests early in case of errors.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* More moves
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Refactor test assertions
This pull request gets rid of assert.True where possible to use
fine-grained assertions.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Java timestamps are causing issues when unmarshalling with a 32 bit
prometheus. It appears that we do not use those fields, so let's remove
them.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Fix Hetzner Robot SD trying to decode response when a non 2xx HTTP code was returned
Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.
Signed-off-by: Andy Bursavich <abursavich@gmail.com>
* discovery: check for nil triton_sd_config
Note: this was discovered thanks to the added test.
The test is pretty low-level but also effective.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Implement go leak test for promql
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Implement go leak test for Consul SD
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* Implement go leak test in discovery manager
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We were missing testing on the behaviour of the configuration
unmarshalling.
This PR adds a refresh command that can be used to test that we
use the correct refresh function.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Since we dependend on go1.14 now, we can use T.Cleanup
https://golang.org/pkg/testing/#T.Cleanup
This provides a nicer approach to shut down the test server.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
* OpenStack SD: Add availability config option, to choose endpoint type
In some environments Prometheus must query OpenStack via an alternative
endpoint type (gophercloud calls this `availability`.
This commit implements this option.
Co-Authored-By: Dennis Kuhn <d.kuhn@syseleven.de>
Signed-off-by: Steffen Neubauer <s.neubauer@syseleven.de>
use ?wait=10m will give results as fast as usual when data is changing
but will perform far less requests when services do not change.
On large infrastructure, this will reduce quite a lot the number of
qps on Consul servers while having the same performance for freshness
of results.
Signed-off-by: Pierre Souchay <p.souchay@criteo.com>
Previously `max` results stopped reading from results in tests
prematurely, as it stopped when `max` number of items were received from
the channel instead of `max` number of unique target groups received.
This caused flaky tests where the same target group was received
multiple times, as Kubernetes informers may emit the same event multiple
times.
Before this patch, running this test repeatedly failed eventually. After
this patch I have run the test many thousand times without failure.
```bash
go test -run TestEndpointsDiscoveryNamespaces -count 1000 -test.v
```
Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>
Added optional configuration item role, defaults to 'container' (backwards-compatible).
Setting role to 'cn' will discover compute nodes instead.
Human-friendly compute node hostname discovery depends on cmon 1.7.0:
c1a2aeca36
Adjust testcases to use discovery config per case as two different types are now supported.
Updated documentation:
* new role setting
* clarify what the name 'container' covers as triton uses different names in different locations
Signed-off-by: jzinkweg <jzinkweg@gmail.com>
Add extra meta labels which will be useful in the case
Prometheus discovery hypervisor .
Signed-off-by: pzqu <pzqu@qq.com>
Co-authored-by: pzqu <pzqu@example.com>
We can assume that not all target groups are nil in normal scernarios,
so we can create targets[poolKey] outside the loop.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
The Kubernetes client records workqueue duration and latency metrics as
seconds so there's no need to convert the values from microseconds to
seconds anymore.
The cache metrics (prometheus_sd_kubernetes_cache_*) are removed because
they aren't used anymore by the client though still exposed by its API.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* adding additional unit tests for getDataCenter() in consul
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consult Tests : update comments to start with uppercase and end with point
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consult Test : using table-driven tests
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : cleaner syntax
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : even cleaner syntax
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Consul Test : update comments
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Fixing naming convention by removing underscore in function name
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Removing duplicated test case for getDatacenter()
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* adding unit test for target group
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Improve unit tests for target group
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Fix imports
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
* Improve test by asserting on whole Target Group object
Signed-off-by: Jean-Baptiste Le Duigou <jb.leduigou@gmail.com>
- Use testutil.ToFloat64 to collect testing metrics
- Declare ServiceDiscoveryConfig directly instead of calling Unmarshal on a piece of YAML
Signed-off-by: Nevill <nevill.dutt@gmail.com>
* Update go.mod dependencies before release
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add issue for showing query warnings in promtool
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert json-iterator back to 1.1.6
It produced errors when marshaling Point values with special float
values.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix expected step values in promtool tests after client_golang update
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Update generated protobuf code after proto dep updates
Signed-off-by: Julius Volz <julius.volz@gmail.com>