177 commits

Author SHA1 Message Date
Ayoub Mrini 581d8d86b4
Pod status changes not discovered by Kube Endpoints SD (#13337)
* fix(discovery/kubernetes/endpoints): react to changes on Pods, because some modifications can occur on them without triggering an update on the related Endpoints (e.g. the Pod phase changing from Pending to Running).

---------

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Guillermo Sanchez Gavier <gsanchez@newrelic.com>
2024-02-01 12:34:37 +00:00
Paulin Todev 78411d5e8b
SD Managers taking over responsibility for registration of debug metrics (#13375)
SD Managers take over responsibility for SD metrics registration

---------

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-01-23 16:53:55 +01:00
machine424 2d01e56695
chore(kubernetes): check preconditions earlier and avoid unnecessary checks or iterations
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-01-16 12:10:35 +01:00
Paulin Todev d2e997030e
Fix linter issues
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 14:28:37 +00:00
Paulin Todev 27bb57a37b
Define metric label values in one place
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 13:39:01 +00:00
Paulin Todev 108a749a45
Set up labels for counters in advance
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 13:39:00 +00:00
Paulin Todev 6de80d7fb0
Allow non-default registry to be used for metrics of SD components
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 11:14:26 +00:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Matthieu MOREL 68e6b4dd34
ci(lint): enable errorlint on discovery (#12918)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-10-31 12:46:55 +01:00
Oleksandr Redko 8e5f0387a2
ci(lint): enable nolintlint and remove redundant comments (#12926)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 12:35:13 +01:00
Julien Pivotto 2bdb2e627f
Merge pull request #10914 from wangrzneu/add-endpointslice-label
Add more labels for endpointslice and endpoints role in k8s discovery
2023-07-18 13:35:03 +02:00
Julien Pivotto 076056ccdf
Merge pull request #11642 from zoonage/main
Do not add pods to target group if the PodIP status is not set
2023-07-05 23:10:50 +02:00
renzheng.wang b2c5de2e65 fix lint issue
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
2023-05-30 20:35:04 +08:00
renzheng.wang 98ffad01b8 update tests and docs
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
2023-05-30 20:13:52 +08:00
renzheng.wang 866fa25b20 add label and labelpresent for endpointslice role in k8s discovery
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
2023-05-30 20:13:38 +08:00
Mickael Carl 2f35619710 discovery/kubernetes: attach node labels when the endpoints TargetRef's kind is Node
Signed-off-by: Mickael Carl <mcarl@apple.com>
2023-05-11 10:11:56 +01:00
cui fliter 276ca6a883 fix some comments
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-25 14:19:16 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `else if` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforementioned wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`else if` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
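The readability argument in the commit message above can be illustrated with a small sketch. The two functions below are hypothetical and not taken from the Prometheus code base; they show the same three-way decision written first as an `else if` cascade and then as a tagless `switch`.

```go
package main

import "fmt"

// describeElseIf uses an else-if cascade: every condition after the first
// sits behind "} else if" and is indented differently from the first one.
func describeElseIf(n int) string {
	if n < 0 {
		return "negative"
	} else if n == 0 {
		return "zero"
	} else {
		return "positive"
	}
}

// describeSwitch expresses the same logic as a tagless switch, so all
// conditions line up in one column and read like a list.
func describeSwitch(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

func main() {
	for _, n := range []int{-1, 0, 1} {
		fmt.Println(n, describeElseIf(n), describeSwitch(n))
	}
}
```

Both functions return the same result for every input; only the layout of the conditions differs.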
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golangci-lint
We haven't updated golangci-lint in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset that the "empty block" warning now includes `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
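As a rough illustration of the pattern discussed in the commit message above, the sketch below does all of its work in the `for` loop header and leaves the body empty. The function `skipSpaces` is a hypothetical example, and the exact placement of the `// nolint:revive` comment that the linter accepts is an assumption.

```go
package main

import "fmt"

// skipSpaces returns the index of the first non-space byte in s.
// All the work happens in the loop header, so the body is intentionally
// empty. The nolint comment silences the "empty block" warning.
func skipSpaces(s string) int {
	i := 0
	for ; i < len(s) && s[i] == ' '; i++ { // nolint:revive
	}
	return i
}

func main() {
	fmt.Println(skipSpaces("   abc"))
}
```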
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
Peter Nicholson 138a1362d8 Add support for EndpointSlice conditions
Signed-off-by: Peter Nicholson <petergoods@hotmail.com>
2023-01-19 18:56:02 +01:00
Ben Whetstone 32e9f6a39c Add container ID as a meta label for pod targets
Signed-off-by: Ben Whetstone <ben.whetstone@sysdig.com>
2023-01-11 11:44:36 -05:00
Jens Erat 728fdc959e
Kubernetes SD: disable resync timer
While the resync period originally also forced a refresh from the Kubernetes API server, this was removed some years ago because watching the API server became more stable [1]. Today, the resync just results in all entities being sent to the service discovery again, which is valid from a general Prometheus perspective but causes unnecessary CPU load and also breaks service discovery metrics. In particular, it makes monitoring "do we actually observe changes from the Kubernetes API server" impossible (receiving constant updates from Kubernetes service discovery is a pretty valid assumption; for example, nodes get frequent status updates).

Signed-off-by: Jens Erat <jens.erat@mercedes-benz.com>
2022-12-22 13:26:03 +01:00
Julien Pivotto 3677d61a4b Update kubernetes dependencies
A new API is available for AddEventHandlers, to get errors but also be
able to cancel handlers.

Doing the easy thing for the release, which is just to log errors.

We could see how to improve this in the future to handle the errors
properly and cancel the handlers.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-12-14 10:46:20 +01:00
Aaron George d542483e8c k8s discovery: Ensure that the pod IP is in the status before adding to target group
Signed-off-by: Aaron George <aaron@ometria.com>
2022-11-30 09:04:14 +00:00
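The check described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions: the `pod` type and the `targetAddresses` function are hypothetical stand-ins, not the actual Kubernetes or Prometheus types.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
type pod struct {
	Name  string
	PodIP string // empty until an IP shows up in the Pod status
}

// targetAddresses returns one address per pod that already has an IP.
// Pods without a PodIP are skipped instead of producing a broken target.
func targetAddresses(pods []pod, port int) []string {
	addrs := make([]string, 0, len(pods))
	for _, p := range pods {
		if p.PodIP == "" {
			continue
		}
		addrs = append(addrs, net.JoinHostPort(p.PodIP, strconv.Itoa(port)))
	}
	return addrs
}

func main() {
	pods := []pod{
		{Name: "ready", PodIP: "10.0.0.1"},
		{Name: "pending"}, // no IP assigned yet
	}
	fmt.Println(targetAddresses(pods, 9090))
}
```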
Maciej Borsz 56eba3ace2 Use protobuf encoding in client-go
Signed-off-by: Maciej Borsz <maciejborsz@google.com>
2022-09-26 12:54:33 +00:00
Karl Piplies 3782cb40d5 add loadbalancerip to service labels
Signed-off-by: Karl Piplies <karl.piplies@mercedes-benz.com>
2022-08-10 12:40:11 +02:00
Frederic Branczyk 414c3e549c
Merge pull request #11002 from yngwiewang/feature/k8s-service-port-number
feat:(kubernetes_sd): add __meta_kubernetes_service_port_number (#10945)
2022-07-22 16:13:55 +02:00
Robert Fratto 97be65387d discovery/kubernetes: fix broken tests
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2022-07-18 09:10:12 -04:00
Robert Fratto 823d24d1e9 discovery/kubernetes: add container image as metadata
This commit adds __meta_kubernetes_pod_container_image as a new
metadata label. This can be used to alert on mismatched versions of
targets that don't have a build_info metric, as well as injecting it into
log lines for other consumers of discovery/kubernetes (e.g., Promtail).

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2022-07-18 08:35:12 -04:00
yngwiewang 1abbf5a5c5 add __meta_kubernetes_service_port_number (#10945)
Signed-off-by: yngwiewang <yngwiewang@163.com>
2022-07-09 17:04:25 +08:00
Filip Petkovski 05da373dcb
kubernetes_sd: Allow attaching node labels for endpoint role
The Kubernetes service discovery can only add node labels to
targets from the pod role.

This commit extends this functionality to the endpoints and
endpointslices roles.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2022-06-16 12:22:55 +02:00
Filip Petkovski 7a78897d0b
Improve reliability of Kubernetes SD tests (#10761)
The tests for Kubernetes SD rely on comparing target groups by first
serializing them to JSON. However, the target group MarshalJSON function
only serializes the __address__ label, which eliminates all other
labels from the comparison.

This commit implements a separate marshaling function intended for use in
Kubernetes SD tests. The function serializes all target labels, making
comparisons much more reliable. The commit also fixes all tests that
started to fail due to the newly introduced change.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2022-06-07 16:19:40 +01:00
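The problem described in the commit above can be reproduced with a small sketch. The `group` type and both marshaling helpers are simplified, hypothetical stand-ins rather than the real Prometheus target group or its test helper; they only show why comparing address-only JSON hides label differences.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// group is a simplified, hypothetical stand-in for a target group.
type group struct {
	Targets []map[string]string
	Labels  map[string]string
}

// addressesOnlyJSON mimics a marshaler that keeps only the __address__
// label of each target and drops every other target label.
func addressesOnlyJSON(g group) (string, error) {
	addrs := make([]string, 0, len(g.Targets))
	for _, t := range g.Targets {
		addrs = append(addrs, t["__address__"])
	}
	b, err := json.Marshal(struct {
		Targets []string          `json:"targets"`
		Labels  map[string]string `json:"labels,omitempty"`
	}{addrs, g.Labels})
	return string(b), err
}

// allLabelsJSON serializes every label of every target, so two groups
// compare equal only if all of their labels match.
func allLabelsJSON(g group) (string, error) {
	b, err := json.Marshal(struct {
		Targets []map[string]string `json:"targets"`
		Labels  map[string]string   `json:"labels,omitempty"`
	}{g.Targets, g.Labels})
	return string(b), err
}

func main() {
	a := group{Targets: []map[string]string{{"__address__": "10.0.0.1:9090", "pod": "a"}}}
	b := group{Targets: []map[string]string{{"__address__": "10.0.0.1:9090", "pod": "b"}}}

	aAddr, _ := addressesOnlyJSON(a)
	bAddr, _ := addressesOnlyJSON(b)
	aAll, _ := allLabelsJSON(a)
	bAll, _ := allLabelsJSON(b)

	fmt.Println(aAddr == bAddr) // true: the differing "pod" label is invisible
	fmt.Println(aAll == bAll)   // false: the difference is detected
}
```

`encoding/json` writes map keys in sorted order, so the serialized strings are stable enough to compare directly.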
Matthieu MOREL f43749e82f
refactor (discovery): move from github.com/pkg/errors to 'errors' and 'fmt' (#10807)
Signed-off-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>

Co-authored-by: Matthieu MOREL <mmorel-35@users.noreply.github.com>
2022-06-03 13:47:14 +02:00
Matthieu MOREL e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
nixargh e76c6aac50 Fix #10507: explicitly include gcp auth from k8s.io to kubernetes discovery
Signed-off-by: nixargh <nixargh@protonmail.com>
2022-04-01 14:56:37 +03:00
Furkan 2939966634
Kubernetes SD: Support discovery.k8s.io/v1 EndpointSlice
Fixes #9498

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
Signed-off-by: Erkan Zileli <erkan.zileli@trendyol.com>
Co-authored-by: Batuhan Apaydin <batuhan.apaydin@trendyol.com>
2022-03-19 00:42:16 +03:00
fpetkovski 16bd0d7d5c
Index pods by node name
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2022-03-10 15:34:56 +01:00
fpetkovski fa798d3042
Allow attaching node metadata
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2022-03-10 08:40:52 +01:00
cui fliter c9b56d1a49
all: fix some typos (#10389)
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-03-03 12:03:07 +00:00
beorn7 35010daa90 Merge branch 'release-2.33' into beorn7/cleaning-up-cherrypicking-fallout
2022-02-02 16:49:40 +01:00
Julien Pivotto 9d63502204 k8s: improve 'own_namespace'
Fail configuration unmarshalling if kubeconfig or api url are set with
"own namespace"

Only read namespace file if needed.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2022-02-01 14:59:09 +01:00
Georg Gadinger c85efa02d9 Fix k8s target discovery when not running inside k8s
When using Kubernetes service discovery on a Prometheus instance that's
not running inside Kubernetes, the creation of the service discovery
fails with a "no such file or directory" error as the special
`/var/run/secrets/kubernetes.io/serviceaccount/namespace` file is not
there.  This commit moves the code that reads this file into the
if-branch where no `APIServer.URL` is given (that one basically makes
Prometheus assume it is running inside of a Kubernetes cluster).

Signed-off-by: Georg Gadinger <nilsding@nilsding.org>
2022-02-01 14:41:25 +01:00
Georg Gadinger 4663f814d6 Fix k8s target discovery when not running inside k8s
When using Kubernetes service discovery on a Prometheus instance that's
not running inside Kubernetes, the creation of the service discovery
fails with a "no such file or directory" error as the special
`/var/run/secrets/kubernetes.io/serviceaccount/namespace` file is not
there.  This commit moves the code that reads this file into the
if-branch where no `APIServer.URL` is given (that one basically makes
Prometheus assume it is running inside of a Kubernetes cluster).

Signed-off-by: Georg Gadinger <nilsding@nilsding.org>
2022-02-01 10:20:03 +01:00
fpetkovski de87515b24 Implement target discovery in own k8s namespace
This commit adds support for discovering targets from the same
Kubernetes namespace as the Prometheus pod itself. Own-namespace
discovery can be indicated by using "." as the namespace.

Fixes #9782

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-12-14 13:41:38 +01:00
Ed Schouten a3e9628e0c
Kubernetes service discovery: add provider ID label (#9603)
When using Kubernetes on cloud providers, nodes will have the
spec.providerID field populated to contain the cloud provider specific
name of the EC2/GCE/...  instance.

Let's expose this information as an additional label, so that it's
easier to annotate metrics and alerts to contain the cloud provider
specific name of the instance to which it pertains.

Signed-off-by: Ed Schouten <eschouten@apple.com>
2021-12-06 22:27:11 +01:00
Mateusz Gozdek ea924746b3
discovery/kubernetes: improve test logic for waiting for discoverers (#9584)
When running tests in parallel, 10 milliseconds may not be enough for
all discoverers to register, which makes the test flaky.

This commit changes the waiting logic to wait for the number of discoverers
to stop increasing during a given time frame, which should be large enough
for a single discoverer to register in a test environment.

The following run passes with this commit:

go test -failfast -race -count 100 -v ./discovery/kubernetes/

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 22:17:32 +01:00
Mateusz Gozdek b7bdf6fab2 Fix imports formatting
According to
2829908806 (r58457095).

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00