Commit graph

1089 commits

Author SHA1 Message Date
Chris Marchbanks f0f8e50567
Fix missing remote read spans (#7914)
The remote read client needs to use the nethttp.Transport wrapper in
order for spans to be instrumented properly.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-09-09 14:03:48 +02:00
Julien Pivotto a6ee1f8517
Merge pull request #7913 from prometheus/release-2.21
Merge release 2.21 into master
2020-09-09 11:08:32 +02:00
Bartlomiej Plotka 088fcc9e48
Fixed iterator regression: Avoid using heap for each sample when iterating. (#7900)
* Fixed iterator regression: Avoid using heap for each sample when iterating.

Fixes: https://github.com/prometheus/prometheus/issues/7873

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Check for .At() called after .Next() returned false

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* More comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-09-08 16:23:01 +02:00
Bartlomiej Plotka a399227a9f Revert "Fixed iterator regression: Avoid using heap for each sample when iterating."
This reverts commit 2c8b2c5915.
2020-09-04 17:10:42 +01:00
Julien Pivotto 2c8b2c5915 Fixed iterator regression: Avoid using heap for each sample when iterating.
Fixes: https://github.com/prometheus/prometheus/issues/7873

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-09-04 17:09:40 +01:00
Anand Sanmukhani f4a1b3cef6
Rename remote.newReadClient() to remote.NewReadClient() (#7881)
This should make the `NewReadClient()` exported outside the `remote` package

Signed-off-by: Anand Sanmukhani <asanmukh@redhat.com>
2020-09-02 17:15:10 +01:00
showuon dfdc358a5b
Fix the duplicated results issue from /api/v1/series (#7862)
* Fix the duplicated results issue from /api/v1/series

Signed-off-by: Luke Chen <showuon@gmail.com>
2020-08-29 01:21:39 +02:00
Robert Fratto 2bd077ed97
expose UserAgent to make it changeable by importers (#7832)
Signed-off-by: Robert Fratto <robertfratto@gmail.com>
2020-08-25 10:38:37 -06:00
Joe Elliott 624075eafe
Exposes remote storage http client to allow for customization (#7831)
* Exposes remote write client to allow for customizations to the http client

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Removed changes to vendored file

Signed-off-by: Joe Elliott <number101010@gmail.com>
2020-08-20 08:45:31 -06:00
Andy Bursavich 4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto a55c69c4c3
Apply gofmt -s on storage/remote/write_test.go (#7791)
Noticed in
https://goreportcard.com/report/github.com/prometheus/prometheus

This is the only file.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-12 19:55:42 -06:00
Mucahit Kurt 869f1bc587
use remote.Storage in TestSampleDelivery instead of direct creation of QueueManager (#4758)
Signed-off-by: Mucahit Kurt <mucahitkurt@gmail.com>
2020-08-11 12:37:03 -07:00
johncming 2b75c1b199
storage/remote: rename httpclient name. (#7747)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-10 11:32:43 -07:00
Guangwen Feng 2c4a4548a8
Fix golint issue caused by incorrect func name (#7756)
Signed-off-by: Guangwen Feng <fenggw-fnst@cn.fujitsu.com>
2020-08-06 20:27:37 +01:00
Bartlomiej Plotka 28c5cfaf0d
tsdb: Moved code merge series and iterators to differen files; cleanup. No functional changes just move! (#7714)
I did not want to move those in previous PR to make it easier to review. Now small cleanup time for readability. (:

## Changes

* Merge series goes to `storage/merge.go` leaving `fanout.go` for just fanout code.
* Moved `fanout test` code from weird separate package to storage.
* Unskiped one test: TestFanout_SelectSorted/chunk_querier
* Moved block series set codes responsible for querying blocks to `querier.go` from `compact.go`



Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-08-03 11:32:56 +01:00
Julien Pivotto 9848e77678
storage/remote: remove unused code now that tsdb implements ChunkQuerier (#7715)
* storage/remote: remove unused code now that tsdb implements ChunkQuerier

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* encodeChunks is unused too

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-08-03 10:13:36 +02:00
Bartlomiej Plotka e6d7cc5fa4
tsdb: Added ChunkQueryable implementations to db; unified MergeSeriesSets and vertical to single struct. (#7069)
* tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating.

Chained to https://github.com/prometheus/prometheus/pull/7059

* NewMerge(Chunk)Querier now takies multiple primaries allowing tsdb DB code to use it.
* Added single SeriesEntry / ChunkEntry for all series implementations.
* Unified all vertical, and non vertical for compact and querying to single
merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before)
* Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples.
* Refactored endpoint tests and querier tests to include subtests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments from Brian and Beorn.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed snapshot test and added chunk iterator support for DBReadOnly.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed race when iterating over Ats first.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed populate block tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed endpoints test.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed test.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added test & fixed case of head open chunk.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed DBReadOnly tests and bug producing 1 sample chunks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added cases for partial block overlap for multiple full chunks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added extra tests for chunk meta after compaction.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed small vertical merge bug and added more tests for that.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-07-31 16:03:02 +01:00
Annanay ec562f152b Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-31 13:03:56 +05:30
Annanay 9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Annanay 89129cd39a Address comments
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:41:13 +05:30
Javier Palomo Almena b58a613443
Replace sync/atomic with uber-go/atomic (#7683)
* storage: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* cmd: Replace usage of sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* scripts: Verify that we are not using restricted packages

It checks that we are not directly importing 'sync/atomic'.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Reorganise imports in blocks

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* notifier/test: Apply PR suggestions

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* storage/remote: avoid storing references on newEntry

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* Revert "scripts: Verify that we are not using restricted packages"

This reverts commit 278d32748e.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* web: Group imports accordingly

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
2020-07-30 13:15:42 +05:30
Joe Elliott 04b028f1e6
Exports recoverable error (#7689)
Signed-off-by: Joe Elliott <number101010@gmail.com>
2020-07-29 11:08:25 -06:00
Annanay f40e4579b7 gofmt
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 20:40:19 +05:30
Annanay 7f98a744e5 Add context to Appender interface
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-24 19:40:51 +05:30
Zhou Hao ddedf454d0
add os.RemoveAll err verification (#7540)
* add os.RemoveAll err verification for watcher_test

Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>

* add os.RemoveAll err verification for db_test

Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>

* add os.RemoveAll err verification for write_test

Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>

* add os.RemoveAll err verification for queue_manager_test

Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>

* tsdb/wal/watcher_test: add close operation before delete

Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>
2020-07-17 11:47:32 +05:30
Hu Shuai b962029031
Add a unit test for MergeLabels in storage/remote/codec.go. (#7499)
This PR is about adding a unit test for MergeLabels in storage/remote/codec.go.

Signed-off-by: Hu Shuai <hus.fnst@cn.fujitsu.com>
2020-07-04 22:17:19 -06:00
Ganesh Vernekar a4c2ea1ca3
Merge remote-tracking branch 'upstream/master' into merge-release-2.19
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-06-26 14:33:50 +05:30
Chris Marchbanks b299aba6cf
Fix panic when updating a remote write queue (#7452)
Right now Queue Manager metrics are registered when the metrics struct
is created, which happens before a changed queue is shutdown and the old
metrics are unregistered. In the case of named queues or updates to
external labels the apply config will panic due to duplicate metrics.

Instead, register the metrics as part of starting the queue as we always
guarantee that Stop will be called before a new Start.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-06-26 12:03:52 +05:30
Chris Marchbanks d78656c244
Pending Samples metric includes samples in channel (#7335)
* Pending Samples metric includes samples in channel

The pending samples metric should also include samples waiting in the
channels to be sent to provide a more accurate measure. In addition,
make sure that the pending samples is reset to 0 anytime a queue is
started as we remake all of the shards at that time.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>

* Log the number of dropped samples on hard shutdown

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-06-25 14:48:30 -06:00
Bartlomiej Plotka b788986717
storage: Adjusted fully storage layer support for chunk iterators: Remote read client, readyStorage, fanout. (#7059)
* Fixed nits introduced by https://github.com/prometheus/prometheus/pull/7334
* Added ChunkQueryable implementation to fanout and readyStorage.
* Added more comments.
* Changed NewVerticalChunkSeriesMerger to CompactingChunkSeriesMerger, removed tiny interface by reusing VerticalSeriesMergeFunc for overlapping algorithm for
both chunks and series, for both querying and compacting (!) + made sure duplicates are merged.
* Added ErrChunkSeriesSet
* Added Samples interface for seamless []promb.Sample to []tsdbutil.Sample conversion.
* Deprecating non chunks serieset based StreamChunkedReadResponses, added chunk one.
* Improved tests.
* Split remote client into Write (old storage) and read.
* Queryable client is now SampleAndChunkQueryable. Since we cannot use nice QueryableFunc I moved
all config based options to sampleAndChunkQueryableClient to aboid boilerplate.

In next commit: Changes for TSDB.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-24 14:41:52 +01:00
Nicole J d5a8f2afc4
Added the remote read histogram (#7334)
change remote read queries total metric to a histogram and add read requests counter with status code

Signed-off-by: njingco <jingco.nicole@gmail.com>
2020-06-16 07:11:41 -07:00
Kemal Akkoyun 66dfb951c4
*: Consistent Error/Warning handling for SeriesSet iterator: Allowing Async Select (#7251)
* Add errors and Warnings to SeriesSet

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Change Querier interface and refactor accordingly

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor promql/engine to propagate warnings at eval stage

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Make sure all the series from all Selects are pre-advanced

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Separate merge series sets

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactor merge querier failure handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Refactored and simplified fanout with improvements from incoming chunk iterator PRs.

* Secondary logic is hidden, instead of weird failed series set logic we had.
* Fanout is well commented
* Fanout closing record all errors
* MergeQuerier improved API (clearer)
* deferredGenericMergeSeriesSet is not needed as we return no samples anyway for failed series sets (next = false).

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix formatting

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix CI issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Added final tests for error handling.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

* Moved hints in populate to be allocated only when needed.
* Used sync.Once in secondary Querier to achieve all-or-nothing partial response logic.
* Select after first Next is done will panic.

NOTE: in lazySeriesSet in theory we could just panic, I think however we can
totally just return error, it will panic in expand anyway.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Utilize errWithWarnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix recently introduced expansion issue

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add tests for secondary querier error handling

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Implement lazy merge

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add name to test cases

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Reorganize

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review comments

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove redundant warnings

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix rebase mistake

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-06-09 17:57:31 +01:00
Bartlomiej Plotka 8f0a924be9 Merge branch 'release-2.18' into get2.18.2-to-master 2020-06-09 10:04:44 +01:00
Marco Pracucci bc1d7d7f27 Fixed incorrect query results caused by buffer reuse in merge adapter (#7361)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-06-09 05:45:18 +01:00
Marco Pracucci 1e006d84a0
Fixed incorrect query results caused by buffer reuse in merge adapter (#7361)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-06-08 19:55:53 +01:00
Bert Hartmann 82c7cd320a
increase the remote write bucket range (#7323)
* increase the remote write bucket range

Increase the range of remote write buckets to capture times above 10s for laggy scenarios
Buckets had been: {.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}
Buckets are now: {0.03125, 0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512}

Signed-off-by: Bert Hartmann <berthartm@gmail.com>

* revert back to DefBuckets with addons to be backwards compatible

Signed-off-by: Bert Hartmann <berthartm@gmail.com>

* shuffle the buckets to maintain 2-2.5x increases

Signed-off-by: Bert Hartmann <berthartm@gmail.com>
2020-06-04 13:54:47 -06:00
Nicole J 5e9bd17b1f
added the prometheus_remote_storage_remote_read_queries_total (#7328)
* added the prometheus_remote_storage_remote_read_queries_total query

Signed-off-by: njingco <jingco.nicole@gmail.com>

* adjusted the help label of remoteReadQueriesTotal

Signed-off-by: njingco <jingco.nicole@gmail.com>
2020-06-03 10:30:52 -07:00
Cody Boggs 3268eac2dd
Trace Remote Write requests (#7206)
* Trace Remote Write requests

Signed-off-by: Cody Boggs <cboggs@splunk.com>

* Refactor store attempts to keep code flow clearer, and avoid so many places to deal with span finishing

Signed-off-by: Cody Boggs <cboggs@splunk.com>
2020-06-01 09:21:13 -06:00
Chris Marchbanks c1f9917e90
Add test for unregistering queue manager metrics
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-05-05 14:14:04 -06:00
Chris Marchbanks dfad1da296
Remove duplicate metrics in QueueManager
Right now any new metrics added for remote write need to be added to
both the QueueManager struct, and the queueManagerMetrics struct.
Instead, use the queueManagerMetrics struct directly from QueueManager.

The newQueueManagerMetrics constructor will now create the metrics for a
specific queue with name and endpoint pre-populated, and a new copy of
the struct will be created specifically for each queue.

This also fixes a bug where prometheus_remote_storage_sent_bytes_total
is not being unregistered after a queue is changed.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-05-05 14:13:59 -06:00
Julien Pivotto 7ecd2d1c24
Jaeger: Create child span for remote read (#7187)
* Jaeger: Create child span for remote read
* Jaeger: use middleware to trace client http request

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-05-02 22:41:55 +02:00
qinng f36ae1c21c
[remote-storage] use warn log level when send samples to remote failed (#7184)
[remote] increasing sendbatch error log level

Signed-off-by: guoruyi1 <guoruyi1@xiaomi.com>
Co-authored-by: guoruyi1 <guoruyi1@xiaomi.com>
2020-04-30 17:06:22 -06:00
Vasily Sliouniaev 0393b188c9
Add Jaeger (#7148)
* Trace remote read

Signed-off-by: vas <vasily.sliouniaev@jet.com>

* Use jaeger

Signed-off-by: vas <vasily.sliouniaev@jet.com>
2020-04-23 02:05:55 +02:00
Marek Slabicki 4b5e7d4984
Adding a shouldReshard function to modularize logic for the QueueManager deciding if it should shard or not (#7143)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-20 16:20:39 -06:00
Chris Marchbanks cd12f0873c
Merge pull request #7073 from csmarchbanks/fix-md5-remote-write
Fix remote write not updating when relabel configs or secrets change
2020-04-16 16:36:25 -06:00
Chris Marchbanks 5ab6b043c1
Always update lastSendTimestamp after a request (#7122)
If the server is returning non-recoverable errors, such as if we are
trying to push samples that are too old, remote write will never
reshard. Non-recoverable errors should be treated the same as success
for the purpose of resharding, just as we do with sample rates and
durations.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-04-15 09:03:28 -06:00
ZouYu 2b7437d60e
Fix some warnings: 'redundant type from array, slice, or map composite literal' (#7109)
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
2020-04-15 11:17:41 +01:00
Chris Marchbanks d88a2b0261 Handle secret changes in remote write ApplyConfig
Remake the http client whenever ApplyConfig is called. This allows
secrets to be updated without needing to restart an otherwise unchanged
queue.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2020-04-13 23:14:15 +00:00
Simon Pasquier 317e73de79 Hash YAML instead of JSON
But it doesn't work either because of secret fields.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-04-13 22:32:37 +00:00
Simon Pasquier 8cc84660fa storage/remote: add tests for config changes
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2020-04-13 22:32:37 +00:00