If the user accidentally sets the max block duration smaller than the min,
the current error is not informative. This change just performs the check
earlier and improves the error message.
* Allow getting credentials via EC2 role
This is subtly different than the existing `role_arn` solution, which
allows Prometheus to assume an IAM role given some set of credentials
already in-scope. With EC2 roles, one specifies the role at instance
launch time (via an instance profile.) The instance then exposes
temporary credentials via its metadata. The AWS Go SDK exposes a
credential provider that polls the [instance metadata endpoint][1]
already, so we can simply use that and it will take care of renewing the
credentials when they expire.
Without this, if this is being used inside EC2, it is difficult to
cleanly allow the use of STS credentials. One has to set up a proxy role
that can assume the role you really want, and launch the EC2 instance
with the proxy role. This isn't very clean, and also doesn't seem to be
[supported very well][2].
[1]:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
[2]: https://github.com/aws/aws-cli/issues/1390
* Automatically try to detect EC2 role credentials
The `Available()` function exposed on ec2metadata returns a simple
true/false if the ec2 metadata is available. This is the best way to
know if we're actually running in EC2 (which is the only valid use-case
for this credential provider.)
This allows this to "just work" if you are using EC2 instance roles.
We were determining a chunk's end time once it was one quarter full to
compute it so all chunks have uniform number of samples.
This accidentally skipped the case where series started near the end of
a chunk range/block and never reached that threshold. As a result they
got persisted but were continued across the range.
This resulted in corrupted persisted data.
This ensures we close all previously opened queriers if on of the block
querier fails to open.
Also swap in new blocks before closing old ones to avoid the situation
in general. Make read locking of blocks more conservative to avoid
unnecessary retries by clients, e.g. when blocks are getting closed
before we can successfully instantiate querier against them.
If the other Prometheus has an external label that matches that of
the Prometheus being read from, then we need to remove that matcher
from the request as it's not actually stored in the database - it's
only added for alerts, federation and on the output of the remote read
endpoint.
Instead we check for that label being empty, in case there is a time
series with a different label value for that external label.
staticcheck fails with:
storage/remote/read_test.go:199:27: do not pass a nil Context, even if a function permits it; pass context.TODO if you are unsure about which Context to use (SA1012)