Commit graph

7761 commits

Author SHA1 Message Date
Julien Pivotto 8907ba6235 Make TSDB use storage errors
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-17 22:24:25 +01:00
Julien Pivotto ccc511456a
Release 2.17.0-rc.1 (#6991)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-16 16:59:12 +01:00
Tobias Guggenmos 012161d90d
PromQL: Fix lexer error handling (#6958)
* PromQL: Fix lexer error handling

This fixes bugs in the handling of lexer errors that are only noticeable for users of the language server and caused https://github.com/prometheus-community/promql-langserver/issues/104 .

Signed-off-by: Tobias Guggenmos <tobias.guggenmos@uni-ulm.de>

* Add test for error position ranges

Signed-off-by: Tobias Guggenmos <tobias.guggenmos@uni-ulm.de>
2020-03-16 15:47:47 +01:00
Julien Pivotto 0f9e78bd88
tsdb: fix races around head chunks (#6985)
* tsdb: fix races around head chunks

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-16 13:59:22 +01:00
Jaga Santagostino 350f25eed3
changes from PR requests (#6880)
Signed-off-by: Jaga Santagostino <jagasantagostino@gmail.com>
2020-03-16 08:22:08 +01:00
Julien Pivotto ef138a11c8
Update client_golang to fix promhttp error handling (#6986)
Fixes #6139

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-15 00:42:14 +01:00
Bartlomiej Plotka 4d6f9b75e2
Merge pull request #6971 from prometheus/select-sorted
storage: Removed SelectSorted method while maintaining logic; Simplified interface.
2020-03-14 11:20:37 +00:00
Björn Rabenstein a28fa010ee
TSDB: Extract parts out of populateSeries (#6983)
This addresses fabxc's TODO.

More importantly, it now properly defers the
querier.Close(). Previously, if a panic happened after creation of the
querier within the populateSeries function, querier.Close() was never called.

The latter was responsible for #6977.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-14 09:03:40 +01:00
Chris Marchbanks 3128875ff4
Fix panic when a remote read store errors (#6975)
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-03-13 20:57:29 +01:00
Björn Rabenstein d80b0810c1
Move crucial actions to defer (#6918)
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.

Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.

The current code should not have any bugs, but moving to defer helps
to avoid future bugs.

This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-13 20:54:47 +01:00
Bartlomiej Plotka 9d9c45588e Addressed Goutham's review.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-13 19:18:31 +00:00
Bartlomiej Plotka cd9516316a Addressed Brian's comments.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-13 14:37:47 +00:00
Bartlomiej Plotka fe802f29c9 storage: Removed SelectSorted method; Simplified interface; Added requirement for remote read to sort response.
This is technically BREAKING CHANGE, but it was like this from the beginning: I just notice that we rely in
Prometheus on remote read being sorted. This is because we use selected data from remote reads in MergeSeriesSet
which rely on sorting.

I found during work on https://github.com/prometheus/prometheus/pull/5882 that
we do so many repetitions because of this, for not good reason. I think
I found a good balance between convenience and readability with just one method.
Smaller the interface = better.

Also I don't know what TestSelectSorted was testing, but now it's testing sorting.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-03-13 13:06:25 +00:00
Julien Pivotto bf0055536d
Release 2.17.0-rc.0 (#6966)
* Release 2.17.0-rc.0

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-12 07:43:24 +01:00
Björn Rabenstein bc703b6456
Use struct{} as underlying type for context keys (#6965)
This is an alternative to #6963.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-11 15:05:35 +01:00
coding3min 4dfbf328f2
[OpenStack SD] Add HypervisorID meta labels about id (#6962)
Add extra meta labels which will be useful in the case
Prometheus discovery hypervisor .

Signed-off-by: pzqu <pzqu@qq.com>

Co-authored-by: pzqu <pzqu@example.com>
2020-03-11 08:38:14 +00:00
Diana Payton 6ab413ab41
Docs: Update getting_started.md (#6955)
* Update getting_started.md

Signed-off-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
2020-03-10 20:02:41 +00:00
Björn Rabenstein 6029fa0b1b
Merge pull request #6951 from prometheus/beorn7/tsdb
tsdb: Do a full rollback upon commit error
2020-03-10 18:48:57 +01:00
beorn7 f6f4fd6556 tsdb: Do a full rollback upon commit error
I think the previous behavior is problematic as it will leave
`memSeries` around that still have `pendingCommit` set to `true`.

The only case where this can happen in this code path is a failure to
write to the WAL, in which case we are probably in trouble anyway. I
believe, however, we should still try to do the right thing and do the
full rollback. This will implicitly try to write to the WAL again, but
this time without samples, which may even succeed. (But we propagate
the previous error in any case.)

This also adds `a.head.putSeriesBuffer(a.sampleSeries)` to Rollback,
which was previously missing.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-10 14:54:41 +01:00
李国忠 261cbab8e9
remove Unused parameter 'reg' in wal.Open function (#6941)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-10 11:01:47 +05:30
Julien Pivotto a17ddcf0f9
update vendor (#6936)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-09 15:27:19 +01:00
johncming df14bca643
tsdb: fix typo for wrong metric name (#6938)
Signed-off-by: johncming <johncming@yahoo.com>
2020-03-09 08:25:31 +00:00
Julien Pivotto 5ddd1dcf0f
Fix panic when parsing varags (#6940)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-08 13:09:24 +01:00
Boyko 84b00564f4
API_V1: Extract time param parsing logic (#6860)
* extract API time param parsing logic in a func

Signed-off-by: blalov <boiskila@gmail.com>
2020-03-06 12:33:01 +02:00
Tobias Guggenmos 1dbd799354
PromQL: Fix regression tests (#6935)
This PR fixes the regression tests for the issue fixed in #6931 .

The reason for that is that all of the invalid queries that triggered the regression have become more or less valid syntax in #6933 (they might still fail typechecking).

Signed-off-by: Tobias Guggenmos <tobias.guggenmos@uni-ulm.de>
2020-03-06 08:17:01 +00:00
Brian Brazil 5da8990053
Log scrape append failures as debug rather than warn. (#6852)
This is most likely due to an endpoint not producing valid
metrics output, which we should treat the same as a failed
scrape, and thus not spam the application logs with it.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-03-06 00:46:03 +00:00
Brian Brazil 44ad28dd5e
PromQL: Allow more keywords as metric names (#6933)
* Allow more keywords as metric names
* Add documentation about forbidden keywords

Signed-off-by: Tobias Guggenmos <tobias.guggenmos@uni-ulm.de>
2020-03-05 13:20:53 +00:00
Brian Brazil 7164b58945
PromQL: Fix parser panic (#6931)
Signed-off-by: Tobias Guggenmos <tobias.guggenmos@uni-ulm.de>
2020-03-05 08:03:38 +00:00
beorn7 0193b746b1 Defer call to iso.closeAppend
This is taken from #6918. Since we probably won't merge #6918 before
the relase, we have to do this bit of it as it fixes an actual bug
(iso.closeAppend is not called if the append fails because of an error
logging to the WAL).

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-04 23:33:30 +01:00
Hrishikesh Barman d1c00bb7c2
Added GITHUB_RUN_ID for direct link to running workflow. (#6926)
* Added prominfra docker image path.

Signed-off-by: Hrishikesh Barman <plain.hrishikeshbman@gmail.com>

* Added GITHUB_RUN_ID usage.

Signed-off-by: Hrishikesh Barman <plain.hrishikeshbman@gmail.com>
2020-03-04 15:06:02 +02:00
Ganesh Vernekar 2cc32c332d Log WAL replay duration
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-03-03 18:00:04 +01:00
Guangwen Feng a128fc3534
Add unit tests in labels.go (#6921)
* Add unit test for func Equal in labels.go
* Add unit test for func Compare in labels.go

Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-03-03 14:17:54 +00:00
李国忠 2bf4952049
remove Unused parameter 'sf' in calcTrendValue function (#6900)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-03 08:23:27 +00:00
Alex Gaganov df92a00838
Expose EC2 instance lifecycle as label (#6914)
Signed-off-by: Alex Gaganov <alex.gaganov@fiverr.com>
2020-03-03 08:03:16 +00:00
李国忠 52025bd7a9
[comments] change word ‘wheter’ to ‘whether’ (#6912)
* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>

* [comments] change word ‘wheter’ to ‘whether’
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-02 13:51:24 +05:30
Julien Pivotto c67f81937c
discovery: updateGroup should not create targets[poolKey] in the loop (#6903)
We can assume that not all target groups are nil in normal scernarios,
so we can create targets[poolKey] outside the loop.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-02 07:35:02 +00:00
Julien Pivotto ed623f69e2
tsdb: don't allow ingesting empty labelsets (#6891)
* tsdb: don't allow ingesting empty labelsets

When we ingest an empty labelset in the head, further blocks can not be
compacted, with the error:

```
level=error ts=2020-02-27T21:26:58.379Z caller=db.go:659 component=tsdb
msg="compaction failed" err="persist head block: write compaction:
add series: out-of-order series added with label set \"{}\" / prev:
\"{}\""
```

We should therefore reject those invalid empty labelsets upfront.

This can be reproduced with the following:

```
cat << END > prometheus.yml
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 1s
    basic_auth:
      username: test
      password: test
    metric_relabel_configs:
    - regex: ".*"
      action: labeldrop

    static_configs:
    - targets:
      - 127.0.1.1:9090
END
./prometheus --storage.tsdb.min-block-duration=1m
```
And wait a few minutes.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-02 07:18:05 +00:00
Julien Pivotto 2051ae2e6a Revert "Fix race condition in Rule manager.Update() function"
This reverts commit 8b11d2cfb6.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-02 08:01:21 +01:00
Björn Rabenstein babadf13e8
Merge pull request #6899 from prometheus/beorn7/isolation
Do not attempt isolation for appendID == 0
2020-03-01 16:21:39 +01:00
Mario Trangoni d9cb4a14d3 scrape/target_test.go: remove deprecated function BuildNameToCertificate()
Related to eb93c684d4

See,
$ make lint
>> running golangci-lint
GO111MODULE=on go list -e -compiled -test=true -export=false -deps=true -find=false -tags= -- ./... > /dev/null
GO111MODULE=on /home/mt/go/packages/bin/golangci-lint run  ./...
scrape/target_test.go:260:2: SA1019: tlsConfig.BuildNameToCertificate is deprecated: NameToCertificate only allows associating a single certificate with a given name. Leave that field nil to let the library select the first compatible chain from Certificates.  (staticcheck)
	tlsConfig.BuildNameToCertificate()
	^
scrape/target_test.go:357:2: SA1019: tlsConfig.BuildNameToCertificate is deprecated: NameToCertificate only allows associating a single certificate with a given name. Leave that field nil to let the library select the first compatible chain from Certificates.  (staticcheck)
	tlsConfig.BuildNameToCertificate()
	^
make: *** [Makefile.common:181: common-lint] Error 1

$ go version
go version go1.14 linux/amd64

Signed-off-by: Mario Trangoni <mjtrangoni@gmail.com>
2020-03-01 15:58:22 +01:00
fuling 8b11d2cfb6 Fix race condition in Rule manager.Update() function
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-01 12:00:51 +01:00
李国忠 3fff701b77
[comments] change word ‘dependencie’ to ‘dependencies’ (#6904)
Signed-off-by: fuling <fuling.lgz@alibaba-inc.com>
2020-03-01 09:34:27 +01:00
beorn7 d9af11e636 Do not attempt isolation for appendID == 0
Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-01 02:48:35 +01:00
Simon Kirsten 45fbed94d6
React UI: Fixed data table for matrix always showing 1 as value (#6896)
Signed-off-by: Simon Kirsten <1972314+skirsten@users.noreply.github.com>

React UI: Fixed data table for matrix always showing 1 as value
2020-02-29 07:45:48 +01:00
Chris Marchbanks 0e78908407
Escape target selector for tooltip on targets page (#6892)
This fixes an issue where the /new/targets page will not load when there
are jobs with invalid CSS characters in them, such as the
namespace/service/0 form used by the Prometheus Operator.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-02-28 20:27:45 +01:00
Björn Rabenstein d137cddd12
Merge pull request #6841 from prometheus/beorn7/isolation
Port isolation from old TSDB PR
2020-02-28 17:48:04 +01:00
beorn7 7f30b0984d Implement isolation
This has been ported from https://github.com/prometheus/tsdb/pull/306.

Original implementation by @brian-brazil, explained in detail in the
2nd half of this talk:
https://promcon.io/2017-munich/talks/staleness-in-prometheus-2-0/

The implementation was then processed by @gouthamve into the PR linked
above. Relevant slide deck:
https://docs.google.com/presentation/d/1-ICg7PEmDHYcITykD2SR2xwg56Tzf4gr8zfz1OerY5Y/edit?usp=drivesdk

Signed-off-by: beorn7 <beorn@grafana.com>
Co-authored-by: Brian Brazil <brian.brazil@robustperception.io>
Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2020-02-28 14:18:39 +01:00
beorn7 6b8181370f Fix punctuation nits
Signed-off-by: beorn7 <beorn@grafana.com>
2020-02-28 14:17:33 +01:00
LongKB 82f7ed208b
Remove some duplicated words (#6882)
Signed-off-by: Pham Duc Hanh <hanhpd@fujitsu.com>
2020-02-27 07:08:31 +01:00
Peter Štibraný 1d396b96dc
Specify that returned samples must be ordered by timestamp. (#6877)
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2020-02-26 13:11:55 +00:00