Commit graph

219 commits

Author SHA1 Message Date
Björn Rabenstein a8c79f0a0c Merge pull request #1422 from prometheus/release-0.17
Merge more commits from 0.17.
2016-02-23 23:07:44 +01:00
beorn7 8fa1560e48 Fix a very special case of handling the checkpoint timer 2016-02-23 16:48:35 +01:00
Björn Rabenstein d9eb624322 Merge pull request #1415 from prometheus/release-0.17
Forward-merge release-0.17 into master
2016-02-22 16:39:48 +01:00
beorn7 4d1f7b49b6 Fix a race condition in calculatePersistenceUrgencyScore 2016-02-22 15:48:39 +01:00
beorn7 ef3ab96111 Populate first and last time in the chunk descriptor earlier
The First time is kind of trivial as we always know it when we create
a new chunkDesc.

The last time is only know when the chunk is closed, so we have to set
it at that time.

The change saves a lot of digging down into the chunk
itself. Especially the last time is relative expensive as it involves
the creation of an iterator. The first time access now doesn't require
locking, which is also a nice gain.
2016-02-15 14:06:09 +01:00
beorn7 9a3edea477 Remove race condition from TestRetentionCutoff 2016-02-12 12:13:19 +01:00
Julius Volz 9b6d69610a Fix various typos in comments.
Helpfully reported by
https://goreportcard.com/report/github.com/prometheus/prometheus :)
2016-02-10 03:47:00 +01:00
Fabian Reinartz 1f877f3d2a Fix deadlock, structure target logging 2016-02-03 10:39:34 +01:00
Fabian Reinartz 59f1e722df Return error on sample appending 2016-02-02 14:01:44 +01:00
beorn7 ec08c9a391 Rework the way to communicate backpressure (AKA suspended ingestion)
This gives up on the idea to communicate throuh the Append() call (by
either not returning as it is now or returning an error as
suggested/explored elsewhere). Here I have added a Throttled() call,
which has the advantage that it can be called before a whole _batch_
of Append()'s. Scrapes will happen completely or not at all. Same for
rule group evaluations. That's a highly desired behavior (as discussed
elsewhere). The code is even simpler now as the whole ingestion buffer
could be removed.

Logging of throttled mode has been streamlined and will create at most
one message per minute.
2016-02-01 14:45:44 +01:00
beorn7 87ef24cd25 Add instrumentation and refactor things around "rushed mode" 2016-01-26 17:44:21 +01:00
beorn7 a2cd479058 Fix calculation of chunks to persist after restart
Since we are not overestimating the number of chunks to persist
anymore, this commit also adjusts the default value for
-storage.local.memory-chunks. Update of documentation will follow.
2016-01-25 19:33:51 +01:00
beorn7 972d94433a Introduce a hysteresis for "rushed mode"
"Rushed mode" is formerly known as "degraded mode", which is changed
with this commit, too. The name "degraded" was very misleading.

Also, switch into rushed mode if we have too many chunks in memory and
an at least reasonable amount of chunks to persist so that speeding up
persisting chunks can help.
2016-01-25 19:24:37 +01:00
beorn7 14796bdb60 Improve chunkMaxBatchSize doc comment 2016-01-25 18:57:51 +01:00
beorn7 582af1618c Streamline chunk writing
This helps to avoid allocations in the same way we were already doing
it during reading.
2016-01-25 16:36:36 +01:00
beorn7 99b9611351 Remove a race condition from TestRetentionCutoff 2016-01-25 16:36:14 +01:00
beorn7 3f4d22e4c7 Update doc comment
This should have gone into a previous commit, but I forgot to save
this particular file.
2016-01-12 12:38:18 +01:00
beorn7 add2ebdd56 Tolerate the lost+found directory in the data directory 2016-01-11 18:05:36 +01:00
Björn Rabenstein 6293f3a374 Merge pull request #1304 from prometheus/beorn7/storage
Improve handling of series file truncation
2016-01-11 17:27:08 +01:00
beorn7 cb117d8346 Add a series ops metric "purge_on_request"
It counts series deletions triggered via the API.
2016-01-11 17:22:16 +01:00
beorn7 4221c7de5c Improve handling of series file truncation
If only very few chunks are to be truncated from a very large series
file, the rewrite of the file is a lorge overhead. With this change, a
certain ratio of the file has to be dropped to make it happen. While
only causing disk overhead at about the same ratio (by default 10%),
it will cut down I/O by a lot in above scenario.
2016-01-11 16:42:10 +01:00
Fabian Reinartz e3b6ec9784 Switch to common/log 2015-10-03 10:21:43 +02:00
beorn7 22d3a4311a Increase waiting time in TestEvictAndLoadChunkDescs
The test had become flaky with Go1.5.

Theory here is that with Go1.5.x, sleeping for 10ms might not be
enough to wake up another goroutine, possibly because it is used for
GC. 50ms should always be enough due to GC pause guarantees with the
new GC.
2015-09-14 21:09:46 +02:00
Julius Volz af513468eb Fix some dead code, missing error checks, shadowings.
I applied
https://medium.com/@jgautheron/quality-pipeline-for-go-projects-497e34d6567
and was greeted with a deluge of warnings, most of which were not
applicable or really fixable realistically. These are some of the first
ones I decided to fix.
2015-09-14 12:21:34 +02:00
beorn7 daeccdd0e9 Fix DropMetricsForFingerprints
It now deletes the series file also for archived series.

Also, fix a naming error in a doc comment.
2015-09-11 15:47:23 +02:00
Julius Volz ffc5142c54 Merge pull request #1058 from prometheus/check-errors
Fix error checking and logging around checkpointing.
2015-09-07 19:57:16 +02:00
Julius Volz 6774a73878 Fix error checking and logging around checkpointing. 2015-09-07 19:34:59 +02:00
Julius Volz 011faf9057 Fix typo in comment. 2015-09-07 19:15:28 +02:00
Julius Volz 995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Fabian Reinartz e061595352 Move COWMetric into storage/metric package 2015-08-25 11:59:07 +02:00
Brian Brazil fdf0d0642e Cast value to float, as that's what the console templates expect. 2015-08-24 16:59:08 +01:00
Fabian Reinartz 1535ef1457 Replace metric.SamplePair with model.SamplePair 2015-08-22 14:52:35 +02:00
Fabian Reinartz c9d396f476 Replace metric.LabelPair with model.LabelPair 2015-08-22 13:32:13 +02:00
Fabian Reinartz 438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz 306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Julius Volz f65ef1ed10 Fix wording in shutdown warning. 2015-08-17 14:26:53 +02:00
Brian Brazil 0ec71442cd Storage: Tell users how to avoid crash recovery.
If users see the crash recovery error, the chances are
they aren't shutting down Prometheus correctly. Telling
them how to do so will help them debug and fix the problem.
2015-08-16 10:42:31 +01:00
Laurie Malau 20ad403587 Don't warn/increment metric upon equal timestamps during append.
Perhaps it would be even better to still warn in case the sample value has
changed but the timestamps are equal, but we don't have efficient access
to the last value.
2015-08-09 23:49:49 +02:00
Julius Volz 517badc21d Only do regex lookups when there was no equality match.
For the label matching index-based preselection phase, don't do an OR
between equality and non-equality matchers. Execute only one of the two
(with equality matchers preferred when present).

Fixes https://github.com/prometheus/prometheus/issues/924
2015-07-23 23:13:30 +02:00
beorn7 699946bf32 Fix chunk desc loading.
If all samples in consecutive chunks have the same timestamp, the way
we used to load chunks will fail. With this change, the persist
watermark is used to load the right amount of chunkDescs from disk.

This bug is a possible reason for the rare storage corruption we have
observed.
2015-07-16 13:09:20 +02:00
beorn7 4203849c92 Test chunkDesc eviction and loading 2015-07-16 13:09:13 +02:00
beorn7 37e12df9ff Improve TestAppendOutOfOrder 2015-07-16 12:48:33 +02:00
beorn7 502aa9ded5 Use Has instead of Get for existence test. 2015-07-16 12:26:50 +02:00
beorn7 ff08f0b6fe storage: ensure timestamp monotonicity within series.
Fixes https://github.com/prometheus/prometheus/issues/481

While doing so, clean up and fix a few other things:

- Fix `go vet` warnings (@fabxc to blame ;).

- Fix a racey problem with unarchiving: Whenever we unarchive a
  series, we essentially want to do something with it. However, until
  we have done something with it, it appears like a series that is
  ready to be archived or even purged. So e.g. it would be ignored
  during checkpointing. With this fix, we always load the chunkDescs
  upon unarchiving. This is wasteful if we only want to add a new
  sample to an archived time series, but the (presumably more common)
  case where we access an archived time series in a query doesn't
  become more expensive.

- The change above streamlined the getOrCreateSeries ond
  newMemorySeries flow. Also, the modTime is now always set correctly.

- Fix the leveldb-backed implementation of KeyValueStore.Delete. It
  had the wrong behavior of still returning true, nil if a
  non-existing key has been passed in.
2015-07-15 18:56:53 +02:00
Julius Volz acbc2b8cb6 storage: Fix float->uint conversions on some compilers.
See https://github.com/prometheus/prometheus/issues/887, which will at
least be partially fixed by this.

From the spec https://golang.org/ref/spec#Conversions:

"In all non-constant conversions involving floating-point or complex
values, if the result type cannot represent the value the conversion
succeeds but the result value is implementation-dependent."

This ended up setting the converted values to 0 on Debian's Go 1.4.2
compiler, at least on 32-bit Debians.
2015-07-13 11:19:11 +02:00
beorn7 8c196c1028 Minor doc fixes. 2015-06-23 17:07:18 +02:00
Fabian Reinartz 6bfb4549a6 storage: add LastSamplePairForFingerprint method 2015-06-23 13:45:15 +02:00
Fabian Reinartz dc7d27ab9a retrieval: add honor label handling and parametrized querying.
This commit adds the honor_labels and params arguments to the scrape
config. This allows to specify query parameters used by the scrapers
and handling scraped labels with precedence.
2015-06-23 13:45:14 +02:00
beorn7 9016917d1c Increment dirty counter only if setDirty(true) is called.
Currently, we increment the counter even if setDirty(false) is called,
which sets the storage clean.
2015-06-22 18:12:55 +02:00
Fabian Reinartz 1eff186555 Merge pull request #810 from prometheus/fabxc/lmatch
Match empty labels.
2015-06-22 15:45:50 +02:00