- fold metric name into labels
- return initialization errors back to main
- add snappy compression
- better context handling
- pre-allocation of labels
- remove generic naming
- other cleanups
This uses a new proto format, with scope for multiple samples per
timeseries in future. This will allow users to pump samples out to
whatever they like without having to change the core Prometheus code.
There's also an example receiver to save users figuring out the
boilerplate themselves.
Turns out its valid to have an overall chunk which is smaller than the
full doubleDeltaHeaderBytes size -- if it has a single sample, it
doesn't fill the whole header. Updated unmarshalling check to respect
this.
This is (hopefully) a fix for #1653
Specifically, this makes it so that if the length for the stored
delta/doubleDelta is somehow corrupted to be too small, the attempt to
unmarshal will return an error.
The current (broken) behavior is to return a malformed chunk, which can
then lead to a panic when there is an attempt to read header values.
The referenced issue proposed creating chunks with a minimum length -- I
instead opted to just error on the attempt to unmarshal, since I'm not
clear on how it could be safe to proceed when the length is
incorrect/unknown.
The issue also talked about possibly "quarantining series", but I don't
know the surrounding code well enough to understand how to make that
happen.
Verify that if the configs change, target groups are cleaned on
TargetManager.reload (rather than having old ones linger around, even if
they are no longer present in the configs).
This covers the bug fixed in #1907 -- I verified that by checking out
source from before that commit.
This is a start on #1906
As described in #1898 'go test -race' detects a race in lexer code. This
pacth fixes it and also add '-race' option to test target to prevent
regression.
This was only relevant so far for the benchmark suite as it would
recycle Expr for repetitions. However, the append is unnecessary as
each node is only inspected once when populating iterators, and
population must always start from scratch.
This also introduces error checking during benchmarks and fixes the so
far undetected test errors during benchmarking.
Also, remove a style nit (two golint warnings less…).
Specifically, the TestSpawnNotMoreThanMaxConcurrentSendsGoroutines was failing on a fresh checkout of master.
The test had a race condition -- it would only pass if one of the
spawned goroutines happened to very quickly pull a set of samples off an
internal queue.
This patch rewrites the test so that it deterministically waits until
all samples have been pulled off that queue. In case of errors, it also
now reports on the difference between what it expected and what it found.
I verified that, if the code under test is deliberately broken, the test
successfully reports on that.
Also, remove unused `providers` field in targetSet.
If the config file changes, we recreate all providers (by calling
`providersFromConfig`) and retrieve all targets anew from the newly
created providers. From that perspective, it cannot harm to clean up
the target group map in the targetSet. Not doing so (as it was the
case so far) keeps stale targets around. This mattered if an existing
key in the target group map was not overwritten in the initial fetch
of all targets from the providers. Examples where that mattered:
```
scrape_configs:
- job_name: "foo"
static_configs:
- targets: ["foo:9090"]
- targets: ["bar:9090"]
```
updated to:
```
scrape_configs:
- job_name: "foo"
static_configs:
- targets: ["foo:9090"]
```
`bar:9090` would still be monitored. (The static provider just
enumerates the target groups. If the number of target groups
decreases, the old ones stay around.
```
scrape_configs:
- job_name: "foo"
dns_sd_configs:
- names:
- "srv.name.one.example.org"
```
updated to:
```
scrape_configs:
- job_name: "foo"
dns_sd_configs:
- names:
- "srv.name.two.example.org"
```
Now both SRV records are still monitored. The SRV name is part of the
key in the target group map, thus the new one is just added and the
old ane stays around.
Obviously, this should have tests, and should have tests before, not
only for this case. This is the quick fix. I have created
https://github.com/prometheus/prometheus/issues/1906 to track test
creation.
Fixes https://github.com/prometheus/prometheus/issues/1610 .
0 is considered an invalid interval by time.NewTicker() and will cause a
panic if control reaches that point. Given the vagaries of timekeeping,
this may occasionally happen and make this test unstable.
Apart from not trying to send a newline in a HTTP header,
this also allows Prometheus to build and pass tests with Go 1.7,
which features stricter checking of HTTP headers.