Commit graph

197 commits

Author SHA1 Message Date
Aniket Kaulavkar f7685caf0d Parallelize discovery/kubernetes tests using t.Parallel()
Signed-off-by: Aniket Kaulavkar <aniket.kaulavkar@gmail.com>
2024-11-14 10:44:03 +05:30
Matthieu MOREL af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Giedrius Statkevičius 58fedb6b61 discovery/kubernetes: optimize more gets
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-28 17:17:37 +02:00
Giedrius Statkevičius 716fd5b11f discovery/kubernetes: use namespacedName
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-28 16:19:56 +02:00
Giedrius Statkevičius e452308e37 discovery/kubernetes: optimize resolvePodRef
resolvePodRef is in a hot path:

```
ROUTINE ======================== github.com/prometheus/prometheus/discovery/kubernetes.(*Endpoints).resolvePodRef in discovery/kubernetes/endpoints.go
    2.50TB     2.66TB (flat, cum) 22.28% of Total
         .          .    447:func (e *Endpoints) resolvePodRef(ref *apiv1.ObjectReference) *apiv1.Pod {
         .          .    448:   if ref == nil || ref.Kind != "Pod" {
         .          .    449:           return nil
         .          .    450:   }
    2.50TB     2.50TB    451:   p := &apiv1.Pod{}
         .          .    452:   p.Namespace = ref.Namespace
         .          .    453:   p.Name = ref.Name
         .          .    454:
         .   156.31GB    455:   obj, exists, err := e.podStore.Get(p)
         .          .    456:   if err != nil {
         .          .    457:           level.Error(e.logger).Log("msg", "resolving pod ref failed", "err", err)
         .          .    458:           return nil
         .          .    459:   }
         .          .    460:   if !exists {
```

This is some low hanging fruit that we can easily optimize. The key of
an object has format "namespace/name" so generate that inside of
Prometheus itself and use pooling.

```
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/discovery/kubernetes
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                 │   olddisc    │               newdisc               │
                 │    sec/op    │   sec/op     vs base                │
ResolvePodRef-16   516.3n ± 17%   289.5n ± 7%  -43.92% (p=0.000 n=10)

                 │   olddisc    │              newdisc               │
                 │     B/op     │    B/op     vs base                │
ResolvePodRef-16   1168.00 ± 0%   24.00 ± 0%  -97.95% (p=0.000 n=10)

                 │  olddisc   │            newdisc             │
                 │ allocs/op  │ allocs/op   vs base            │
ResolvePodRef-16   2.000 ± 0%   2.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-28 12:12:40 +02:00
machine424 b1c356beea
fix(discovery): Handle cache.DeletedFinalStateUnknown in node informers' DeleteFunc
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-10-16 10:20:37 +02:00
TJ Hoplock 6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
bas smit 73997289c3 tests: update discovery tests with new labael
Some checks are pending
CI / Go tests (push) Waiting to run
CI / More Go tests (push) Waiting to run
CI / Go tests with previous Go version (push) Waiting to run
CI / UI tests (push) Waiting to run
CI / Go tests on Windows (push) Waiting to run
CI / Mixins tests (push) Waiting to run
CI / Build Prometheus for common architectures (0) (push) Waiting to run
CI / Build Prometheus for common architectures (1) (push) Waiting to run
CI / Build Prometheus for common architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (0) (push) Waiting to run
CI / Build Prometheus for all architectures (1) (push) Waiting to run
CI / Build Prometheus for all architectures (10) (push) Waiting to run
CI / Build Prometheus for all architectures (11) (push) Waiting to run
CI / Build Prometheus for all architectures (2) (push) Waiting to run
CI / Build Prometheus for all architectures (3) (push) Waiting to run
CI / Build Prometheus for all architectures (4) (push) Waiting to run
CI / Build Prometheus for all architectures (5) (push) Waiting to run
CI / Build Prometheus for all architectures (6) (push) Waiting to run
CI / Build Prometheus for all architectures (7) (push) Waiting to run
CI / Build Prometheus for all architectures (8) (push) Waiting to run
CI / Build Prometheus for all architectures (9) (push) Waiting to run
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions
CI / Check generated parser (push) Waiting to run
CI / golangci-lint (push) Waiting to run
CI / fuzzing (push) Waiting to run
CI / codeql (push) Waiting to run
CI / Publish main branch artifacts (push) Blocked by required conditions
CI / Publish release artefacts (push) Blocked by required conditions
CI / Publish UI on npm Registry (push) Blocked by required conditions
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run
Previous commit added the pod_container_init label to discovery, so all
the tests need to reflect that.

Signed-off-by: bas smit <bsmit@bol.com>
2024-10-01 10:26:58 +02:00
bas smit a10dc9298e sd k8s: support sidecar containers in endpoint discovery
Sidecar containers are a newish feature in k8s. They're implemented
similar to init containers but actually stay running and allow you to
delay startup of your application pod until the sidecar started (like
init containers always do).

This adds the ports of the sidecar container to the list of discovered
endpoint(slice), allowing you to target those containers as well.
The implementation is a copy of that of Pod discovery

fixes: #14927

Signed-off-by: bas smit <bsmit@bol.com>
2024-10-01 10:26:58 +02:00
bas smit 7a90d73fa6 sd k8s: test for sidecar container support in endpoints
This test is expected to fail, the followup will add the feature

Signed-off-by: bas smit <bsmit@bol.com>
2024-10-01 10:26:58 +02:00
Jan Fajerski 5138922b0d Merge branch 'main' into 3.0-main-sync-24-08-21 2024-08-21 09:09:36 +02:00
Arve Knudsen 3a78e76282 Upgrade golangci-lint to v1.60.1
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-18 12:13:25 +02:00
cuiweiyuan 1800af54f0 chore: fix some function names
Signed-off-by: cuiweiyuan <cuiweiyuan@aliyun.com>
2024-08-15 13:57:21 +08:00
Simon Pasquier 145988d48f
discovery(k8s): remove support for API versions no longer served
This commit removes support for the following API versions:
* `discovery.k8s.io/v1beta1` API version of EndpointSlice (no longer
  served as of v1.25).
* `networking.k8s.io/v1beta1` API version of Ingress (no longer served
  as of v1.22).

Closes #12884

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-07-04 14:54:27 +02:00
Simon Pasquier 7704cde4ea
discovery(k8s): add metadata labels to endpointslices
This commit adds 2 new metadata labels for the endpointslice role:
* `__meta_kubernetes_endpointslice_endpoint_node_name`
* `__meta_kubernetes_endpointslice_endpoint_zone`

The latter is only present when the `discovery.k8s.io/v1` API group is
available.

I also updated the configuration doc and added an entry for the
`__meta_kubernetes_endpointslice_endpoint_hostname` label which was
missing.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-04-17 11:20:19 +02:00
hanghuge c14a158d03 Signed-off-by: hanghuge <cmoman@outlook.com>
Fix unavailable link

Signed-off-by: hanghuge <cmoman@outlook.com>
2024-04-08 18:44:22 +08:00
machine424 0e81ab44a2
discovery(k8s): add a metric to track failed requests, failures will still be logged.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 19:55:46 +01:00
machine424 92544c00bf
discovery: kubernetes: Avoid creating unnecessary Kubernetes indexers in RoleEndpointSlice
This was due to a missing "return", see https://github.com/prometheus/prometheus/pull/13554#discussion_r1490965817

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-16 14:29:28 +01:00
Bryan Boreham 6005ac6f9d
Merge pull request #9311 from Creatone/creatone/use-testify-3
tests: Move from t.Errorf and others. (Part 3)
2024-02-05 18:48:59 +01:00
Ayoub Mrini 581d8d86b4
Pod status changes not discovered by Kube Endpoints SD (#13337)
* fix(discovery/kubernetes/endpoints): react to changes on Pods because some modifications can occur on them without triggering an update on the related Endpoints (The Pod phase changing from Pending to Running e.g.).

---------

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Guillermo Sanchez Gavier <gsanchez@newrelic.com>
2024-02-01 12:34:37 +00:00
Paweł Szulik 7f24efccdb Refactor discovery tests to use testify.
Signed-off-by: Paweł Szulik <paul.szulik@gmail.com>
2024-01-31 16:42:11 +00:00
Paulin Todev 78411d5e8b
SD Managers taking over responsibility for registration of debug metrics (#13375)
SD Managers take over responsibility for SD metrics registration

---------

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-01-23 16:53:55 +01:00
machine424 2d01e56695
chore(kubernetes): check preconditions earlier and avoid unnecessary checks or iterations
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-01-16 12:10:35 +01:00
Paulin Todev d2e997030e
Fix linter issues
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 14:28:37 +00:00
Paulin Todev 27bb57a37b
Define metric label values in one place
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 13:39:01 +00:00
Paulin Todev 108a749a45
Set up labels for counters in advance
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 13:39:00 +00:00
Paulin Todev 6de80d7fb0
Allow non-default registry to be used for metrics of SD components
Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-12-11 11:14:26 +00:00
Matthieu MOREL 9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Oleksandr Redko fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Matthieu MOREL 68e6b4dd34
ci(lint): enable errorlint on discovery (#12918)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-10-31 12:46:55 +01:00
Oleksandr Redko 8e5f0387a2
ci(lint): enable nolintlint and remove redundant comments (#12926)
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 12:35:13 +01:00
Julien Pivotto 2bdb2e627f
Merge pull request #10914 from wangrzneu/add-endpointslice-label
Add more labels for endpointslice and endpoints role in k8s discovery
2023-07-18 13:35:03 +02:00
Julien Pivotto 076056ccdf
Merge pull request #11642 from zoonage/main
Do not add pods to target group if the PodIP status is not set
2023-07-05 23:10:50 +02:00
renzheng.wang b2c5de2e65 fix lint issue
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
2023-05-30 20:35:04 +08:00
renzheng.wang 98ffad01b8 update tests and docs
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
2023-05-30 20:13:52 +08:00
renzheng.wang 866fa25b20 add label and labelpresent for endpointslice role in k8s discovery
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
2023-05-30 20:13:38 +08:00
Mickael Carl 2f35619710 discovery/kubernetes: attach node labels when the endpoints TargetRef's kind are Node
Signed-off-by: Mickael Carl <mcarl@apple.com>
2023-05-11 10:11:56 +01:00
cui fliter 276ca6a883 fix some comments
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-25 14:19:16 +08:00
Matthieu MOREL bae9a21200
Merge branch 'main' into linter/nilerr
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-19 19:56:39 +02:00
beorn7 5b53aa1108 style: Replace else if cascades with switch
Wiser coders than myself have come to the conclusion that a `switch`
statement is almost always superior to a statement that includes any
`else if`.

The exceptions that I have found in our codebase are just these two:

* The `if else` is followed by an additional statement before the next
  condition (separated by a `;`).
* The whole thing is within a `for` loop and `break` statements are
  used. In this case, using `switch` would require tagging the `for`
  loop, which probably tips the balance.

Why are `switch` statements more readable?

For one, fewer curly braces. But more importantly, the conditions all
have the same alignment, so the whole thing follows the natural flow
of going down a list of conditions. With `else if`, in contrast, all
conditions but the first are "hidden" behind `} else if `, harder to
spot and (for no good reason) presented differently from the first
condition.

I'm sure the aforemention wise coders can list even more reasons.

In any case, I like it so much that I have found myself recommending
it in code reviews. I would like to make it a habit in our code base,
without making it a hard requirement that we would test on the CI. But
for that, there has to be a role model, so this commit eliminates all
`if else` occurrences, unless it is autogenerated code or fits one of
the exceptions above.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:22:31 +02:00
beorn7 c3c7d44d84 lint: Adjust to the lint warnings raised by current versions of golint-ci
We haven't updated golint-ci in our CI yet, but this commit prepares
for that.

There are a lot of new warnings, and it is mostly because the "revive"
linter got updated. I agree with most of the new warnings, mostly
around not naming unused function parameters (although it is justified
in some cases for documentation purposes – while things like mocks are
a good example where not naming the parameter is clearer).

I'm pretty upset about the "empty block" warning to include `for`
loops. It's such a common pattern to do something in the head of the
`for` loop and then have an empty block. There is still an open issue
about this: https://github.com/mgechev/revive/issues/810 I have
disabled "revive" altogether in files where empty blocks are used
excessively, and I have made the effort to add individual
`// nolint:revive` where empty blocks are used just once or twice.
It's borderline noisy, though, but let's go with it for now.

I should mention that none of the "empty block" warnings for `for`
loop bodies were legitimate.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-04-19 17:10:10 +02:00
Matthieu MOREL fb3eb21230 enable gocritic, unconvert and unused linters
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-04-13 19:20:22 +00:00
Peter Nicholson 138a1362d8 Add support for EndpointSlice conditions
Signed-off-by: Peter Nicholson <petergoods@hotmail.com>
2023-01-19 18:56:02 +01:00
Ben Whetstone 32e9f6a39c Add container ID as a meta label for pod targets
Signed-off-by: Ben Whetstone <ben.whetstone@sysdig.com>
2023-01-11 11:44:36 -05:00
Jens Erat 728fdc959e
Kubernetes SD: disable resync timer
While originally the resync period also forced refreshing from Kubernetes API server, this has been removed for some years now because watching the API server got more stable [1]. Today, this just results in all entities being sent to the service discovery again, which is valid from a general Prometheus perspective, but results in unnecessary CPU load and also breaks service discovery metrics. In especially, this makes monitoring "do we actually observe changes from Kubernetes API server" impossible (receiving constant updates from Kubernetes service discovery is a pretty valid assumption, for example nodes get frequent status updates, ...).

Signed-off-by: Jens Erat <jens.erat@mercedes-benz.com>
2022-12-22 13:26:03 +01:00
Julien Pivotto 3677d61a4b Update kubernetes dependencies
A new API is available for AddEventHandlers, to get errors but also be
able to cancel handlers.

Doing the easy thing for the release, which is just to log errors.

We could see how to improve this in the future to handle the errors
properly and cancel the handlers.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2022-12-14 10:46:20 +01:00
Aaron George d542483e8c k8s discovery: Ensure that the pod IP is in the status before adding to target group
Signed-off-by: Aaron George <aaron@ometria.com>

Signed-off-by: Aaron George <aaron@ometria.com>
2022-11-30 09:04:14 +00:00
Maciej Borsz 56eba3ace2 Use protobuf encoding in client-go
Signed-off-by: Maciej Borsz <maciejborsz@google.com>
2022-09-26 12:54:33 +00:00
Karl Piplies 3782cb40d5 add loadbalancerip to service labels
Signed-off-by: Karl Piplies <karl.piplies@mercedes-benz.com>
2022-08-10 12:40:11 +02:00
Frederic Branczyk 414c3e549c
Merge pull request #11002 from yngwiewang/feature/k8s-service-port-number
feat:(kubernetes_sd): add __meta_kubernetes_service_port_number (#10945)
2022-07-22 16:13:55 +02:00