Commit graph

10024 commits

Author SHA1 Message Date
Matt Bostock 4160892109 Correct notifications_dropped description
The current description does not accurately describe when the metric is incremented.

Aside from Alertmanger missing from the configuration, `prometheus_notifications_dropped_total` is incremented when errors occur while sending alert notifications to Alertmanager, or because the notifications queue is full, or because the number of notifications to be sent exceeds the queue capacity.

I think calling these cases 'errors' in a generic sense is more useful than the current description.
2017-01-13 23:36:00 +00:00
Brian Brazil f64c231dad Allow checkpoints and maintenance to happen concurrently. (#2321)
This is essential on larger Prometheus servers, as otherwise
checkpoints prevent sufficient persisting of chunks to disk.
2017-01-13 17:24:19 +00:00
Fabian Reinartz 1c80c33e72 Fix bug of unsorted postings lists being created
The former approach created unordered postings list by either
map iteration of new series being unsorted (fixable) or concurrent
writers creating new series interleaved.

We switch back to generating ephemeral references for a single batch.
Newly created series have to be re-set upon the next insert.
2017-01-13 16:22:20 +01:00
Frederic Branczyk 389c6d0043
web/api: add alertmanager api 2017-01-13 15:30:20 +01:00
Fabian Reinartz c7f5590a71 Ensure order of postings when adding new series 2017-01-13 15:25:11 +01:00
Fabian Reinartz ad9bc62e4c storage: extend appender and adapt it 2017-01-13 14:48:01 +01:00
Fabian Reinartz d970f0256a Add Rollback() and docs to Appender interface 2017-01-12 20:17:49 +01:00
Fabian Reinartz 22db9c3413 Remove old appendBatch methods 2017-01-12 20:04:49 +01:00
Fabian Reinartz fde69dab49 Use buffer pool for head appenders 2017-01-12 20:03:44 +01:00
Fabian Reinartz a317f252b9 Expose series references to clients
This exposes a reference number of a series represented by a label set
to clients.
Subsequent samples can be directly added via the reference rather than
repeatedly passing in the full labels. This drasitcally speeds up the
append process.

The appender chain uses different sections of the reference number for
assignment to child appenders and invalidating reference numbers as
necessary.

Clients can either pass out reference numbers themselves or have their
own optimized lookup, i.e. by directly associating unparsed metric
descriptors strings with reference numbers.
2017-01-12 20:00:54 +01:00
Fabian Reinartz 5e028710d5 Add fast past to validation after lock switch 2017-01-12 15:51:08 +01:00
Brian Brazil 1dcb7637f5 Add various persistence related metrics (#2333)
Add metrics around checkpointing and persistence

* Add a metric to say if checkpointing is happening,
and another to track total checkpoint time and count.

This breaks the existing prometheus_local_storage_checkpoint_duration_seconds
by renaming it to prometheus_local_storage_checkpoint_last_duration_seconds
as the former name is more appropriate for a summary.

* Add metric for last checkpoint size.

* Add metric for series/chunks processed by checkpoints.

For long checkpoints it'd be useful to see how they're progressing.

* Add metric for dirty series

* Add metric for number of chunks persisted per series.

You can get the number of chunks from chunk_ops,
but not the matching number of series. This helps determine
the size of the writes being made.

* Add metric for chunks queued for persistence

Chunks created includes both chunks that'll need persistence
and chunks read in for queries. This only includes chunks created
for persistence.

* Code review comments on new persistence metrics.
2017-01-11 15:11:19 +00:00
Fabian Reinartz 1b39887baa Revalidate series existance after lock switch 2017-01-11 14:05:58 +01:00
Fabian Reinartz ca5791efbc Simplify creation of new series 2017-01-11 13:58:26 +01:00
Fabian Reinartz 0ca755b4ae Replace single head chunk per series with memSeries
This adds a memory series holding several chunk to replace
the single head chunk per series so far.
This is necessary for uniform maximum chunk sizes in cases
where some series have higher frequency samples than others.
2017-01-11 13:02:38 +01:00
Fabian Reinartz 80affd98a8 Add barrier to benchmark writer
This adds a barrier to avoid issues with unfair goroutine scheduling
that causes some fake scrapers to run away from the other ones.
2017-01-11 13:01:30 +01:00
Fabian Reinartz c32a94d409 Unexport HeadBlock, export Block interface 2017-01-10 15:41:57 +01:00
Fabian Reinartz d86e8a63c7 Report correct number of appended samples 2017-01-10 11:17:37 +01:00
Fabian Reinartz 29883a18fc Add own Appender() method for DB 2017-01-09 22:54:08 +01:00
Fabian Reinartz 4c4e0c614e Simplify position mapper updating 2017-01-09 19:24:05 +01:00
Fabian Reinartz 142c89b8b0 Fix/update metrics 2017-01-09 19:14:21 +01:00
Fabian Reinartz 0dffd52238 Use page writer in compaction 2017-01-09 18:47:43 +01:00
Fabian Reinartz 89d8467f5c Add missing lock 2017-01-09 18:07:45 +01:00
Fabian Reinartz 8c31c6e934 Make concurrent head chunk reads safe, fix misc races
This adds a 4 sample buffer to every head chunk. The XOR
compression scheme may edit bytes in place. The minimum size
of a sample is 2 bits. So keeping the last 4 samples in an in-memory
buffer makes it safe to query the preceeding ones while samples
are added
2017-01-09 16:51:39 +01:00
Björn Rabenstein 6ce97837ab Merge pull request #2327 from prometheus/beorn7/vendoring
vendoring: Update prometheus/common to pull in bug fixes
2017-01-09 13:28:36 +01:00
beorn7 86ec87b78f vendoring: Update prometheus/common to pull in bug fixes
In particular the one for https://github.com/prometheus/common/issues/72.
2017-01-09 12:25:17 +01:00
Fabian Reinartz 3302bb1eb1 Merge pull request #2323 from prometheus/beorn7/retrieval
Retrieval: Avoid copying Target
2017-01-08 06:49:47 +01:00
Björn Rabenstein ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
beorn7 5dc01202d7 Retrieval: Remove some test lines that fail on Travis only
These lines exercise an append in
TestScrapeLoopWrapSampleAppender. Arguably, append shouldn't be tested
there in the first place.

Still it's weird why this fails on Travis:

```
--- FAIL: TestScrapeLoopWrapSampleAppender (0.00s)
    scrape_test.go:259: Expected count of 1, got 0
    scrape_test.go:290: Expected count of 1, got 0
2017/01/07 22:48:26 http: TLS handshake error from 127.0.0.1:50716: read tcp 127.0.0.1:40265->127.0.0.1:50716: read: connection reset by peer
FAIL
FAIL	github.com/prometheus/prometheus/retrieval	3.603s
```

Should anybody ever find out why, please revert this commit accordingly.
2017-01-08 00:01:46 +01:00
beorn7 3610331eeb Retrieval: Do not buffer the samples if no sample limit configured
Also, simplify and streamline the code a bit.
2017-01-07 18:18:54 +01:00
Fabian Reinartz 1943f8d1bb Fix head block stats races 2017-01-07 18:02:17 +01:00
Fabian Reinartz 6aa922c5a6 Fix races 2017-01-07 16:20:32 +01:00
André Carvalho c43dfaba1c Add max concurrent and current queries engine metrics (#2326)
* Add max concurrent and current queries engine metrics

This commit adds two metrics to the promql/engine: the
number of max concurrent queries, as configured by the flag, and
the number of current queries being served+blocked in the engine.
2017-01-07 14:41:25 +00:00
beorn7 767c0709b1 Retrieval: Avoid copying Target
retreival.Target contains a mutex. It was copied in the Targets()
call. This potentially can wreak a lot of havoc.

It might even have caused the issues reported as #2266 and #2262 .
2017-01-06 18:43:41 +01:00
Fabian Reinartz 54f5027406 Put WAL lock down into encoder 2017-01-06 18:36:42 +01:00
Fabian Reinartz 300f4e2abf Use separate lock for series creation
This uses the head block's own lock to only lock if new series were
encountered.
In the general append case we just need to hold a
2017-01-06 18:10:50 +01:00
Fabian Reinartz 63e12807da Don't update head postings mapper on every append 2017-01-06 16:43:18 +01:00
Fabian Reinartz 71efd2e08d Periodically fsync WAL, make head cut async 2017-01-06 15:18:06 +01:00
Fabian Reinartz c61b310210 Naive size-based compaction
This adds naive compaction that tries to compact three
blocks of roughly equal size.
It decides based on samples present in a block and has no
safety measures considering the actual file size.
2017-01-06 13:53:05 +01:00
Fabian Reinartz 2eb544c98e Change file names and maker parsing safer 2017-01-06 13:13:22 +01:00
Fabian Reinartz 96c2bd249f Handle compaction trigger and reinitializing in DB 2017-01-06 13:03:23 +01:00
Fabian Reinartz 304cae9928 tsdb: Use PartitionedDB constructor 2017-01-06 12:34:54 +01:00
Fabian Reinartz 3ed2c2a14b Rename Partition to regular DB, DB to PartitionedDB 2017-01-06 11:40:09 +01:00
Fabian Reinartz 937cdb579c Switch to sequential block names
This changes block directory names from the int64 timestamp
to sequential numbering.
2017-01-06 10:45:03 +01:00
Fabian Reinartz 4590b61343 Rename shard to partition 2017-01-06 08:08:02 +01:00
Brian Brazil f9e581907a Make index queue bigger. (#2322)
When a large Prometheus starts up fresh it can take many minutes
to warmup and clear out the index queue. A larger queue means less
blocking, bigger batches and cuts down startup time by ~50%.
2017-01-05 17:57:42 +00:00
Fabian Reinartz 9790aa98ac Add postings wrapper that emits head postings in label set order
This adds a position mapper that takes series from a head block
in the order they were appended and creates a mapping representing
them in order of their label sets.

Write-repair of the postings list would cause very expensive writing.
Hence, we keep them as they are and only apply the postition mapping
at the very end, after a postings list has been sufficienctly reduced
through intersections etc.
2017-01-05 16:05:42 +01:00
Fabian Reinartz 5aa7f7cce8 Compact head block into persisted block 2017-01-04 21:11:15 +01:00
Fabian Reinartz 3f72d5d027 Fix last timestamp initialization
This initializes the chunkDesc's last timestamp to the minimum
value so initial samples with a timestamp of 0 (e.g. in tests)
are not accidentally dropped.
2017-01-04 14:06:40 +01:00
Fabian Reinartz c9f4aea8e2 Merge pull request #2305 from alicebob/favicon
Add a favicon to the web GUI
2017-01-04 10:15:27 +01:00