* add description for __meta_kubernetes_endpoints_label_* and __meta_kubernetes_endpoints_labelpresent_*
Signed-off-by: renzheng.wang <wangrzneu@gmail.com>
This follow a simple function-based approach to access the count and
sum fields of a native Histogram. It might be more elegant to
implement “accessors” via the dot operator, as considered in the
brainstorming doc [1]. However, that would require the introduction of
a whole new concept in PromQL. For the PoC, we should be fine with the
function-based approch. Even the obvious inefficiencies (rate'ing a
whole histogram twice when we only want to rate each the count and the
sum once) could be optimized behind the scenes.
Note that the function-based approach elegantly solves the problem of
detecting counter resets in the sum of observations in the case of
negative observations. (Since the whole native Histogram is rate'd,
the counter reset is detected for the Histogram as a whole.)
We will decide later if an “accessor” approach is really needed. It
would change the example expression for average duration in
functions.md from
histogram_sum(rate(http_request_duration_seconds[10m]))
/
histogram_count(rate(http_request_duration_seconds[10m]))
to
rate(http_request_duration_seconds.sum[10m])
/
rate(http_request_duration_seconds.count[10m])
[1]: https://docs.google.com/document/d/1ch6ru8GKg03N02jRjYriurt-CZqUVY09evPg6yKTA1s/edit
Signed-off-by: beorn7 <beorn@grafana.com>
The Kubernetes service discovery can only add node labels to
targets from the pod role.
This commit extends this functionality to the endpoints and
endpointslices roles.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Since Prometheus documentation is versioned, do not write down that a
specific function was added in Prom 2.0, for consistency.
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
* Provide a callout about the vector matching keywords and the group
modifiers.
Relates prometheus/docs#2106
Signed-off-by: Harold Dost <h.dost@criteo.com>
We always track total samples queried and add those to the standard set
of stats queries can report.
We also allow optionally tracking per-step samples queried. This must be
enabled both at the engine and query level to be tracked and rendered.
The engine flag is exposed via a Prometheus feature flag, while the
query flag is set when stats=all.
Co-authored-by: Alan Protasio <approtas@amazon.com>
Co-authored-by: Andrew Bloomgarden <blmgrdn@amazon.com>
Co-authored-by: Harkishen Singh <harkishensingh@hotmail.com>
Signed-off-by: Andrew Bloomgarden <blmgrdn@amazon.com>
Update the wording of the `labelmap` relabel action to make it more
clear that it acts on all the label names, rather than the list provided
by source_labels.
Signed-off-by: SuperQ <superq@gmail.com>
* Fix documentation for Docker API filters
Signed-off-by: Robert Jacob <xperimental@solidproject.de>
* Undo indentation change
Signed-off-by: Robert Jacob <xperimental@solidproject.de>
* Added pathPrefix function to the template reference documentation
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
* PR suggestions
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
* Switch wording back
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
* Followup on OpenTelemetry migration
- tracing_config: Change with_insecure to insecure, default to false.
- tracing_config: Call SetDirectory to make TLS certificates relative to the Prometheus
configuration
- documentation: Change bool to boolean in the configuration
- documentation: document type float
- tracing: Always restart the tracing manager when TLS config is set to
reload certificates
- tracing: Always set TLS config, which could be used e.g. in case of
potential redirects.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>\\
* Allow escaping a dollar sign when expanding external labels
There is currently no mechanism to natively escape a dollar sign
in the os.Expand function. As a workaround, this commit modifies
the external label expansion logic to treat a double dollar ($$)
as a mechanism for escaping the dollar character.
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Split out the note about regex-matching into a separate paragraph to
make it more obvious. Move it up, closer to the definition.
Signed-off-by: SuperQ <superq@gmail.com>
This follows the line of argument that the invariant of not looking
ahead of the query time was merely emerging behavior and not a
documented stable feature. Any query that looks ahead of the query
time was simply invalid before the introduction of the negative offset
and the @ modifier.
Signed-off-by: beorn7 <beorn@grafana.com>
Following the argument that breaking the invariant that PromQL does
not look ahead of the evaluation time implies a breaking change, we
still need to keep the feature flag around, but at least we can
communicate that the feature is considered stable, and that the
feature flags will be ignored from v3 on.
Signed-off-by: beorn7 <beorn@grafana.com>
Since `/api/v1/write` is a mutating endpoint, we should still activate
the remote-write-receiver explicitly. But we should do it in the same
way as the other mutating endpoints, i.e. via a flag
`--web.enable-remote-write-receiver`.
This commit marks the feature flag as deprecated, i.e. it still works
but logs a warning on startup. This enables users to seamlessly
migrate. With the next minor release, we can start ignoring the
feature flag (but still warn a user that is trying to use it).
Signed-off-by: beorn7 <beorn@grafana.com>
This commit adds support for discovering targets from the same
Kubernetes namespace as the Prometheus pod itself. Own-namespace
discovery can be indicated by using "." as the namespace.
Fixes#9782
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
When using Kubernetes on cloud providers, nodes will have the
spec.providerID field populated to contain the cloud provider specific
name of the EC2/GCE/... instance.
Let's expose this information as an additional label, so that it's
easier to annotate metrics and alerts to contain the cloud provider
specific name of the instance to which it pertains.
Signed-off-by: Ed Schouten <eschouten@apple.com>
This can be useful when generating rules, a query may use a duration,
and it may be useful to template that into a URL parameter. Therefore
this allows interfacing with systems that don't implement Prometheus
style duration parsing.
Signed-off-by: David Leadbeater <dgl@dgl.cx>
* remote-write: slow down retries to avoid DDOS
Increase the default max retry time from 100ms to 5 seconds.
Remote write calls are retried after a recoverable error such as the
back-end returning 500. Prometheus waits the minimum time and retries,
then doubles the wait on each subsequent retry until the maximum is
reached.
If some data is still getting through, remote-write will also increase
shards, and the default maximum is 200. 200 shards sending every 100ms
is 20 calls per second, to a back-end that is already in trouble.
5 seconds was chosen to match the default BatchSendDeadline: if we can
afford to wait that long for no response, then we can wait the same time
to retry. We will reach 5 seconds after 9 successive failures.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Update config doc for max_backoff change
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Add a feature flag to enable the new manager
This PR creates a copy of the legacy manager and uses it by default.
It is a companion PR to #9349. With this PR, users can enable the new
discovery manager and provide us with any feedback / side effects that
the new behaviour might have.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
We have been Puppet user for 10 years and we are users of
https://github.com/camptocamp/prometheus-puppetdb-sd
However, that file_sd implementation contains business logic and
assumptions around e.g. the modules which you are using.
This pull request adds a simple PuppetDB service discovery, which will
enable more use cases than the upstream sd.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
This adds a new metric exposing per target scrape sample_limit value. Metrics are only exposed if extra-scrape-metrics feature flag is enabled.
scrape_sample_limit will make it easy to monitor and alert on targets getting close to configured sample_limit, which is important given than exceeding sample_limit results in the entire scrape results being rejected.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Add a new built-in metric `scrape_timeout_seconds` to allow monitoring
of the ratio of scrape duration to the scrape timeout. Hide behind a
feature flag to avoid additional cardinality by default.
Signed-off-by: SuperQ <superq@gmail.com>