This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote end point is down.
We use the new LiveReader from TSDB to tail WAL segments. Logic for finding the tracking segment is included in this PR. The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.
Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to acheive parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.
As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).
This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure
Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics
Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
1. Added an ability to resize text area on mouseclick
2. Remember selected target status button on page reload
Signed-off-by: Maria Nemtinova <nemtinovamasha@gmail.com>
* *: bump gRPC dependencies
This change updates the gRPC dependencies to more recent versions:
* github.com/gogo/protobuf => v1.2.0
* github.com/grpc-ecosystem/grpc-gateway => v1.6.3
* google.golang.org/grpc => v1.17.0
In addition scripts/genproto.sh leverages Go modules information instead of
hardcoding SHA1 commits. This ensures that the code is generated from
the exact same sources.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Run 'make proto' in CI
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Revert tabs -> spaces change
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Fix 'make proto' step
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* 'go get' grpc/protobuf dependencies
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Prepopulate cache with go mod download
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: use latest release of staticcheck
It also fixes a couple of things in the code flagged by the additional
checks.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Use official release of staticcheck
Also run 'go list' before staticcheck to avoid failures when downloading packages.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* added `Copy to clipboard` button
Signed-off-by: Stafford Williams <stafford.williams@gmail.com>
* generate vsfdata
Signed-off-by: Stafford Williams <stafford.williams@gmail.com>
* new lines
Signed-off-by: Stafford Williams <stafford.williams@gmail.com>
* single newline
Signed-off-by: Stafford Williams <stafford.williams@gmail.com>
When a metric has a null value, number formatters like
`humanizeNoSmallPrefix` will throw "Uncaught TypeError: Cannot read
property 'toPrecision' of null".
This is fixed by explicitly checking for `null` and returning the string
"null".
Note: This is usually not seen as rickshaw doesn't show annotations for
null values, but still calls the formatter.
Signed-off-by: David Coles <coles.david@gmail.com>
* update promlog to latest version
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* Update api tests, fix main setup
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* tidy go.sum
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* revendor prometheus/common
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* only initialize config; use kingpin for remote_storage_adapter
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* actually parse the flags
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* clean up imports
Signed-off-by: Alex Yu <yu.alex96@gmail.com>
* web: added ability to set page title through flag.
Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
* Reformatted variable names and Flag description for readability.
Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
* assets_vfsdata.go
Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
* Flag name changed from web.ui-title to web.page-title
Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
* make assets
Signed-off-by: Andrew Chiu <andrew.chiu2@baesystems.com>
By default the gRPC client of the REST API gateway relies on the
HTTP_PROXY variable to connect to the local gRPC server which isn't
desired as the server runs in the same process. This change uses a
custom dialer that connects directly to the server's address.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* *: move to go 1.11
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Reduce number of places where we specify the Go version
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
* Add evaluationTimestamp (Last Evaluation) column to display on /rules
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Add lastScrapeDuration ("Scrape Duration") to display on /targets
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Updates based on Julius' feedback
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Update to set timestamp to when eval started (after eval completes)
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Update /rules to display time since last evaluation
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
* Re-order Last Eval/Eval Time to be consistent with targets page
Signed-off-by: Will Hegedus <wbhegedus@liberty.edu>
With the addition of the errors in the views list, it is now difficult
to have a view on all the rules in a screen witdh.
This commit adds wrapping to improve the overall display of the rules
page.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>