This fixes#6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.
Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.
The current code should not have any bugs, but moving to defer helps
to avoid future bugs.
This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.
Signed-off-by: beorn7 <beorn@grafana.com>
This is most likely due to an endpoint not producing valid
metrics output, which we should treat the same as a failed
scrape, and thus not spam the application logs with it.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
* tsdb: don't allow ingesting empty labelsets
When we ingest an empty labelset in the head, further blocks can not be
compacted, with the error:
```
level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb
msg="compaction failed" err="persist head block: write compaction:
add series: out-of-order series added with label set \"{}\" / prev:
\"{}\""
```
We should therefore reject those invalid empty labelsets upfront.
This can be reproduced with the following:
```
cat << END > prometheus.yml
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 1s
basic_auth:
username: test
password: test
metric_relabel_configs:
- regex: ".*"
action: labeldrop
static_configs:
- targets:
- 127.0.1.1:9090
END
./prometheus --storage.tsdb.min-block-duration=1m
```
And wait a few minutes.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Related to eb93c684d4
See,
$ make lint
>> running golangci-lint
GO111MODULE=on go list -e -compiled -test=true -export=false -deps=true -find=false -tags= -- ./... > /dev/null
GO111MODULE=on /home/mt/go/packages/bin/golangci-lint run ./...
scrape/target_test.go:260:2: SA1019: tlsConfig.BuildNameToCertificate is deprecated: NameToCertificate only allows associating a single certificate with a given name. Leave that field nil to let the library select the first compatible chain from Certificates. (staticcheck)
tlsConfig.BuildNameToCertificate()
^
scrape/target_test.go:357:2: SA1019: tlsConfig.BuildNameToCertificate is deprecated: NameToCertificate only allows associating a single certificate with a given name. Leave that field nil to let the library select the first compatible chain from Certificates. (staticcheck)
tlsConfig.BuildNameToCertificate()
^
make: *** [Makefile.common:181: common-lint] Error 1
$ go version
go version go1.14 linux/amd64
Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
This is part of https://github.com/prometheus/prometheus/pull/5882 that can be done to simplify things.
All todos I added will be fixed in follow up PRs.
* querier.Querier, querier.Appender, querier.SeriesSet, and querier.Series interfaces merged
with storage interface.go. All imports that.
* querier.SeriesIterator replaced by chunkenc.Iterator
* Added chunkenc.Iterator.Seek method and tests for xor implementation (?)
* Since we properly handle SelectParams for Select methods I adjusted min max
based on that. This should help in terms of performance for queries with functions like offset.
* added Seek to deletedIterator and test.
* storage/tsdb was removed as it was only a unnecessary glue with incompatible structs.
No logic was changed, only different source of abstractions, so no need for benchmarks.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Also improves TestPopulateLabels: testutil.ErrorEqual just returned a
bool without failing the test.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
To test the implementation of our metric metadata API, we need to represent various states of metadata in the scrape metadata store. That is currently not possible as the interface and method to set the store are private.
This changes the interface, list and get methods, and the SetMetadaStore function to be public.
Incidentally, the scrapeCache implementation needs to be renamed to match the new signature.
Signed-off-by: gotjosh <josue@grafana.com>
When using both a label and the suffix+label in the
relabel config. It's possible that Prometheus remove
the suffx+label for no obvious reason. It's due to a
collision when merging labels from target and from
the sample.
Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
* Update go.mod dependencies before release
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add issue for showing query warnings in promtool
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Revert json-iterator back to 1.1.6
It produced errors when marshaling Point values with special float
values.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Fix expected step values in promtool tests after client_golang update
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Update generated protobuf code after proto dep updates
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Fixes https://github.com/prometheus/common/issues/36
Move logic handling this into the labels package,
so all the cases are handled in one place and we're
less likely to have this come up again.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
This is an estimate of churn, with series being added to the cache being
considered churn. This will have both false positives (e.g. series
appearing and disappearing) and false negatives (e.g. series hit
sample_limit, but still created in head block), but should be generally
useful as-is.
Relevant docs live in another repo.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
From the documentation:
> The default HTTP client's Transport may not
> reuse HTTP/1.x "keep-alive" TCP connections if the Body is
> not read to completion and closed.
This effectively enable keep-alive for the fixed requests.
Signed-off-by: Romain Baugue <romain.baugue@elwinar.com>
* Reload certificates from disk automatically
This change bumps github.com/prometheus/common to include
https://github.com/prometheus/common/pull/173
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* scrape: close idle connections on reload/stop
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* use v0.3.0 tag
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Now that we're not losing the scrape cache across failed
scrape, a scrape that continually failed but had varying
series or metadata (e.g. timestamps in metric names,
plus hitting smaple_limit) would grow the cache indefinitely.
Add some code to catch that, and flush the cache anyway.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors
Signed-off-by: tariqibrahim <tariq181290@gmail.com>
* scrape: Add global jitter for HA server
Covers issue in https://github.com/prometheus/prometheus/pull/4926#issuecomment-449039848
where the HA setup become a problem for targets unable to be scraped simultaneously.
The new jitter per server relies on the hostname and external labels which necessarily to be uniq.
As before, scrape offset will be calculated with regard the absolute time, so even
restart/reload doesn't change scrape time per scrape target + prometheus instance.
Use fqdn if possible, otherwise fall back to the hostname. It adds extra random seed
to calculate server hash to be distinguish on machines with the same hostname, but
different DC.
Signed-off-by: Aleksei Semiglazov <xjewer@gmail.com>
Although it is spelling mistakes, it might make an affects
while reading.
Co-Authored-By: Kim Bao Long longkb@vn.fujitsu.com
Signed-off-by: Nguyen Hai Truong <truongnh@vn.fujitsu.com>
* scrape: catch errors when creating HTTP clients
This change makes sure that no scrape pool is created with a nil HTTP
client.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Tariq's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Address Brian's comment
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* buf Has been declared in scrape.go in line 785, I think it is unnecessary to declare a new variable again here.
Signed-off-by: arugakiWei <arugaki.wei@daocloud.io>
* delete the buf in line 785 because it is never used.
Signed-off-by: arugakiWei <arugaki.wei@daocloud.io>
* *: remove use of golang.org/x/net/context
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* scrape: fix TestTargetScrapeScrapeCancel
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Pass content type down to text parser.
Add layer of indirection in front of text parser,
and rename to avoid future clashes.
Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
* Add evaluationTimestamp (Last Evaluation) column to display on /rules
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Add lastScrapeDuration ("Scrape Duration") to display on /targets
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Updates based on Julius' feedback
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Update to set timestamp to when eval started (after eval completes)
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Update /rules to display time since last evaluation
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Re-order Last Eval/Eval Time to be consistent with targets page
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
The scrape manage receiver's channel now just saves the target sets
and another backgorund runner updates the scrape loops every 5 seconds.
This is so that the scrape manager doesn't block the receiving channel
when it does the long background reloading of the scrape loops.
Active and dropped targets are now saved in each scrape pool instead of
the scrape manager. This is mainly to avoid races when getting the
targets via the web api.
When reloading the scrape loops now happens in parallel to speed up the
final disared state and this also speeds up the prometheus's shutting
down.
Also updated some funcs signatures in the web package for consistency.
Signed-off-by: Krasi Georgiev <kgeorgie@redhat.com>
There are many more (mostly finalizers like Close/Stop/etc.), but most of
the others seemed like one couldn't do much about them anyway.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
This adds a per-target cache of scraped metadata. The metadata is only
available for the lifecycle of the attached target. An API endpoint allows
to select metadata by metric name and a label selection of targets.
Signed-off-by: Fabian Reinartz <freinartz@google.com>
Extends the parser to allow retrieving metadata.
The lexer now yields proper tokens that are fed into a hand-written
parser on top.
Signed-off-by: Fabian Reinartz <freinartz@google.com>
* promql: Rewrote tests with testutil for functions_test
Signed-off-by: Elif T. Kuş <elifkus@gmail.com>
* pkg/relabel: Rewrote tests with testutil for relabel_test
Signed-off-by: Elif T. Kuş <elifkus@gmail.com>
* discovery/consul: Rewrote tests with testutil for consul_test
Signed-off-by: Elif T. Kuş <elifkus@gmail.com>
* scrape: Rewrote tests with testutil for manager_test
Signed-off-by: Elif T. Kuş <elifkus@gmail.com>
This commit avoids passing the full scrape configuration down to the
scrape loop to fix data races when the scrape configuration is being
reloaded.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Fix race by properly locking access to scrape pools. Use separate mutex for information needed by UI so that UI isn't blocked when targets are being updated.
The semantics of honor_labels are that if a target exposes
and empty label it will override the target labels. This PR
fixes that by once again distinguishing between empty labels
and missing labels in this one use case.
Beyond that empty labels should be pruned and not added to storage,
which this also fixes.
Fixes#3841