142 commits

Author SHA1 Message Date
Tobias Schmidt 7d71d354fd Remove special listing of config.file in usage
The -config.file parameter isn't required or any more special than the
other flags. In order to avoid confusion, this change removes the
special mention again. Instead, the error message if a config file
couldn't be loaded is changed to mention the flag name.
2015-04-08 17:36:15 -04:00
Tobias Schmidt 35a44509fb Improve readability of usage text
Separates flag and description by a newline to make it easier to read
the flags with long descriptions.
2015-04-08 17:33:25 -04:00
Fabian Reinartz c012ca6039 Make help output readable.
This commit increases the usability by grouping flags based on their
first dot-separated group. Long flag descriptions are broken into lines
printed with indentation.
2015-04-08 12:41:49 +02:00
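The grouping and wrapping described in c012ca6039 can be sketched roughly as follows. This is a minimal illustration, not the code from that commit: the helpers `groupFlags` and `wrap` are hypothetical names, and the description text and the width of 40 are made up for the example. Only the flag names are taken from this log.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupFlags buckets flag names by the part before the first dot.
// A name without a dot forms a group of its own.
// (Hypothetical helper, not taken from the commit.)
func groupFlags(names []string) map[string][]string {
	groups := make(map[string][]string)
	for _, name := range names {
		group := strings.SplitN(name, ".", 2)[0]
		groups[group] = append(groups[group], name)
	}
	for _, flags := range groups {
		sort.Strings(flags)
	}
	return groups
}

// wrap breaks text at word boundaries once a line would exceed width
// characters and prefixes every resulting line with indent.
func wrap(text string, width int, indent string) string {
	var lines []string
	line := ""
	for _, word := range strings.Fields(text) {
		if line != "" && len(line)+1+len(word) > width {
			lines = append(lines, indent+line)
			line = word
			continue
		}
		if line != "" {
			line += " "
		}
		line += word
	}
	if line != "" {
		lines = append(lines, indent+line)
	}
	return strings.Join(lines, "\n")
}

func main() {
	groups := groupFlags([]string{
		"storage.local.persistence-queue-capacity",
		"config.file",
		"storage.local.pedantic-checks",
	})

	// Print the groups in a stable order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		fmt.Printf("%s:\n", k)
		for _, name := range groups[k] {
			fmt.Printf("  -%s\n", name)
			// Placeholder description, invented for this example.
			fmt.Println(wrap("An illustrative description that is long enough to need wrapping.", 40, "      "))
		}
	}
}
```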
Björn Rabenstein d8e515e9cb Merge pull request #617 from prometheus/influxdb-write-support
Add experimental InfluxDB write support.
2015-04-07 13:23:06 +02:00
Ceesjan Luiten 0e18784c64 Make all paths absolute to support proxies 2015-04-02 20:36:47 +02:00
Julius Volz 593e565688 Allow writing to InfluxDB/OpenTSDB at the same time. 2015-04-02 20:24:38 +02:00
Julius Volz 61fb688dd9 Add experimental InfluxDB write support. 2015-04-01 02:03:16 +02:00
Julius Volz 33702da8a8 Use simple Now() func in API instead of utility.Time. 2015-03-27 23:43:47 +01:00
Julius Volz 3f2686d0b3 Remove unused fields from MetricsService. 2015-03-27 18:51:13 +01:00
beorn7 12ae6e9203 Increase resilience of the storage against data corruption - step 4.
Step 4: Add a configurable sync'ing of series files after modification.
2015-03-19 15:58:02 +01:00
beorn7 11bd9ce1bd Increase resilience of the storage against data corruption - step 3.
Step 3: Remember the mtime of series files and make use of it to
detect series files that are not the one the checkpoint thinks they
are.
2015-03-19 15:44:11 +01:00
beorn7 e25cca823c Increase resilience of the storage against data corruption - step 2.
Step 2: Add a flag -storage.local.pedantic-checks to check every
series file.

Also, remove countPersistedHeadChunks channel, which is unused.
2015-03-19 12:06:15 +01:00
beorn7 da7c0461c6 Rename persist queue len/cap to num/max chunks to persist.
Remove deprecated flag storage.incoming-samples-queue-capacity.
2015-03-18 19:36:41 +01:00
beorn7 be11cb2b07 Remove the sample ingestion channel.
The one central sample ingestion channel has caused a variety of
trouble. This commit removes it. Targets and rule evaluation call an
Append method directly now. To incorporate multiple storage backends
(like OpenTSDB), storage.Tee forks the Append into two different
appenders.

Note that the tsdb queue manager had its own queue anyway. It was a
queue after a queue... Much queue, so overhead...

Targets have their own little buffer (implemented as a channel) to
avoid stalling during an http scrape. But a new scrape will only be
started once the old one is fully ingested.

The contraption of three pipelined ingesters was removed. A Target is
an ingester itself now. Despite more logic in Target, things should be
less confusing now.

Also, remove lint and vet warnings in ast.go.
2015-03-15 14:08:22 +01:00
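The idea of forking one Append call into two appenders, as be11cb2b07 describes for storage.Tee, might look like the sketch below. The `Sample` type, the `Appender` interface, the field names of `Tee`, and `memoryAppender` are simplified assumptions for illustration; the actual types in the repository may differ.

```go
package main

import "fmt"

// Sample is a simplified stand-in for a scraped sample (assumed shape).
type Sample struct {
	Metric      string
	Value       float64
	TimestampMs int64
}

// Appender is anything samples can be appended to (assumed interface).
type Appender interface {
	Append(s *Sample)
}

// Tee forwards every appended sample to two appenders, so callers
// only ever deal with a single Appender.
type Tee struct {
	First, Second Appender
}

// Append passes the sample to both underlying appenders in order.
func (t Tee) Append(s *Sample) {
	t.First.Append(s)
	t.Second.Append(s)
}

// memoryAppender collects samples in memory, for demonstration only.
type memoryAppender struct {
	samples []*Sample
}

func (m *memoryAppender) Append(s *Sample) {
	m.samples = append(m.samples, s)
}

func main() {
	local := &memoryAppender{}  // stands in for local storage
	remote := &memoryAppender{} // stands in for a remote backend such as OpenTSDB

	var app Appender = Tee{First: local, Second: remote}
	app.Append(&Sample{Metric: "up", Value: 1, TimestampMs: 1426424902000})

	fmt.Println(len(local.samples), len(remote.samples))
}
```

Because `Tee` itself satisfies `Appender`, more than two backends could be served by nesting one `Tee` inside another.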
beorn7 0056eaeb4f Redesign series maintenance and chunk persistence. 2015-03-14 22:05:23 +01:00
beorn7 5bea942d8e Improve various things around chunk encoding.
A number of mostly minor things:

- Rename chunk type -> chunk encoding.

- After all, do not carry around the chunk encoding to all parts of
  the system, but just have one place where the encoding for new
  chunks is set based on the flag. The new approach has caveats as
  well, but the pollution of so many method signatures is worse.

- Use the default chunk encoding for new chunks of existing
  series. (Previously, only new _series_ would get chunks with the
  default encoding.)

- Use an enum for chunk encoding. (But keep the version number for the
  flag, for reasons discussed previously.)

- Add encoding() to the chunk interface (so that a chunk knows its own
  encoding - no need to have that in a different top-level function).

- Got rid of newFollowUpChunk (which would keep the existing encoding
  for all chunks of a time series). Now only use newChunk(), which
  will create a chunk encoding according to the flag.

- Simplified transcodeAndAdd.

- Reordered methods of deltaEncodedChunk and doubleDeltaEncodedChunk
  to match the order in the chunk interface.

- Only transcode if the chunk is not yet half full. If more than half
  full, add a new chunk instead.
2015-03-14 19:03:20 +01:00
beorn7 66e768f05e Improve docstring for chunk type flag. 2015-03-06 17:04:07 +01:00
beorn7 13fcf1ddbc Implement double-delta encoded chunks. 2015-03-05 20:33:26 +01:00
beorn7 af91fb8e31 Improve persisting chunks to disk.
This is done by bucketing chunks by fingerprint. If the persisting to
disk falls behind, more and more chunks are in the queue. As soon as
there are "double hits", we will now persist both chunks in one go,
doubling the disk throughput (assuming it is limited by disk
seeks). Should even more pile up so that we end up with "triple hits", we
will persist those first, and so on.

Even if we have millions of time series, this will still help,
assuming not all of them are growing with the same speed. Series that
get many samples and/or are not very compressible will accumulate
chunks faster, and they will soon get double- or triple-writes.

To improve the chance of double writes,
-storage.local.persistence-queue-capacity could be set to a higher
value. However, that will slow down shutdown a lot (as the queue has
to be worked through). So we leave it to the user to set it to a
really high value. A more fundamental solution would be to checkpoint
not only head chunks, but also chunks still in the persist queue. That
would be quite complicated for a rather limited use-case (running many
time series with high ingestion rate on slow spinning disks).
2015-02-17 16:02:09 +01:00
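The bucketing described in af91fb8e31 can be illustrated with a small sketch: queued chunks are grouped by series fingerprint, and series with the most queued chunks are handled first so that several chunks go to the same series file in one write. The type and function names here (`Fingerprint`, `persistRequest`, `bucketByFingerprint`, `orderBySize`) are assumptions made for this example, not the names used in the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// Fingerprint identifies a time series (assumed representation).
type Fingerprint uint64

// persistRequest is one queued chunk waiting to be written to disk.
type persistRequest struct {
	fp    Fingerprint
	chunk []byte
}

// bucketByFingerprint groups queued chunks by series, keeping the
// queue order within each series.
func bucketByFingerprint(queue []persistRequest) map[Fingerprint][][]byte {
	buckets := make(map[Fingerprint][][]byte)
	for _, req := range queue {
		buckets[req.fp] = append(buckets[req.fp], req.chunk)
	}
	return buckets
}

// orderBySize returns the fingerprints so that series with the most
// queued chunks ("triple hits", then "double hits", ...) come first.
// Ties are broken by fingerprint to keep the order deterministic.
func orderBySize(buckets map[Fingerprint][][]byte) []Fingerprint {
	fps := make([]Fingerprint, 0, len(buckets))
	for fp := range buckets {
		fps = append(fps, fp)
	}
	sort.Slice(fps, func(i, j int) bool {
		ni, nj := len(buckets[fps[i]]), len(buckets[fps[j]])
		if ni != nj {
			return ni > nj
		}
		return fps[i] < fps[j]
	})
	return fps
}

func main() {
	queue := []persistRequest{
		{fp: 1, chunk: []byte("a")},
		{fp: 2, chunk: []byte("b")},
		{fp: 1, chunk: []byte("c")},
		{fp: 3, chunk: []byte("d")},
		{fp: 2, chunk: []byte("e")},
		{fp: 1, chunk: []byte("f")},
	}
	buckets := bucketByFingerprint(queue)
	for _, fp := range orderBySize(buckets) {
		// One disk seek per series, regardless of how many chunks it has queued.
		fmt.Printf("series %d: persist %d chunk(s) in one go\n", fp, len(buckets[fp]))
	}
}
```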
beorn7 8a1c195b54 Move emptiness check to the receivers. 2015-02-12 19:47:24 +01:00
beorn7 11b3c2387c Improvements after review.
- Increase samplesQueueCapacity.

- Improve docstring for the above.

- Accept a short waiting period for the ingest channel to become
  ready. This should depend on the http timeout, but 100ms is probably
  good enough to cushion bursts bigger than samplesQueueCapacity,
  while it is unlikely that anybody ever will set an HTTP timeout
  similarly short.
2015-02-10 14:58:46 +01:00
beorn7 d2ab49c396 Make the persist queue length configurable.
Also, set a much higher default value.

Chunk persist requests can be quite spiky. If you collect a large
number of time series that are very similar, they will tend to finish
up a chunk at about the same time. There is no reason we need to back
up scraping just because of that. The rationale of the new default
value is "1/8 of the chunks in memory".
2015-02-06 14:54:53 +01:00
Bjoern Rabenstein 5859b74f1b Clean up license issues.
- Move CONTRIBUTORS.md to the more common AUTHORS.
- Added the required NOTICE file.
- Changed "Prometheus Team" to "The Prometheus Authors".
- Reverted the erroneous changes to the Apache License.
2015-01-21 20:07:45 +01:00
juliusv cca2e58f20 Merge pull request #442 from prometheus/beorn7/fix-crash-recovery
Fix ALL the crash-recovery related problems.
2015-01-09 10:56:02 +01:00
Bjoern Rabenstein 0851945054 Add a heuristic to checkpoint early if there are many "dirty" series. 2015-01-08 20:15:58 +01:00
Julius Volz d6b9e97655 Remove extraction.Result type, simplify code. 2015-01-08 16:34:01 +01:00
Bjoern Rabenstein b1e4956142 Apply a giant code cleanup.
Essentially:

- Remove unused code.

- Make it 'go vet' clean. The only remaining warnings are in generated code.

- Make it 'golint' clean. The only remaining warnings are in generated code.

- Smoothed out some minor things.

Change-Id: I3fe5c1fbead27b0e7a9c247fee2f5a45bc2d42c6
2014-12-10 16:16:49 +01:00
Bjoern Rabenstein ea86f7e8f8 Fix weird things after merge.
And I swear I'll never use 'rebase' to 'clean something up' ever again,
even if Julius tells me to do so...

Change-Id: Ifeabab20445279bf693c95f062da769b60fe195f
2014-11-25 17:39:02 +01:00
Bjoern Rabenstein 3a17aeabfd Merge branch 'beorn/storage-ng-with-commit-history-cleaned-up'
Conflicts:
	Makefile
	Makefile.INCLUDE
	VERSION
	main.go
	notification/notification.go
	retrieval/target.go
	retrieval/target_test.go
	retrieval/targetmanager.go
	retrieval/targetmanager_test.go
	retrieval/targetpool.go
	retrieval/targetpool_test.go
	rules/ast/functions.go
	rules/rules_test.go
	storage/metric/interface.go
	storage/metric/tiered/curator.go
	storage/metric/tiered/end_to_end_test.go
	storage/metric/tiered/leveldb.go
	storage/metric/tiered/memory.go
	storage/metric/tiered/memory_test.go
	storage/metric/tiered/tiered.go
	storage/remote/queue_manager.go
	templates/templates.go
	templates/templates_test.go
	web/api/query.go
	web/consoles.go
	web/web.go

Change-Id: I96e6312b51e877d4434fe96c494e9558fe2e1d16
2014-11-25 17:36:17 +01:00
Bjoern Rabenstein 14bda4180c Changes after pair code review.
Change-Id: Ib72d40f8e9027818cfbbd32a7a7201eebda07455
2014-11-25 17:12:59 +01:00
Bjoern Rabenstein bb42cc2e2d Evict based on memory pressure. Evict recently used chunks last.
Change-Id: Ie6168f0cdb3917bdc63b6fe15585dd70c1e42afe
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 74c143c4c9 Improve scraper shutdown time.
- Stop target pools in parallel.
- Stop individual scrapers in goroutines, too.
- Timing tweaks.

Change-Id: I9dff1ee18616694f14b04408eaf1625d0f989696
2014-11-25 17:10:39 +01:00
Bjoern Rabenstein 3f61d304ce Reorganize maintenance loop.
Change-Id: Iac10f988ba3e93ffb188f49c30f92e0b6adce5a3
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein a5f56639b8 Instrument unwritten samples queue.
Change-Id: Id77387314d340a5118490cf08e7bbc37c7366b25
2014-11-25 17:10:30 +01:00
Bjoern Rabenstein 904acd43da Add crash recovery.
Fix the behavior if preload for non-existent series is requested.

Instead of returning an error (which triggers a panic further up),
simply count those incidents. They can happen regularly, we just want
to know if they happen too frequently because that would mean the
indexing is behind or broken.

Change-Id: I4b2d1b93c4146eeea897d188063cb9574a270f8b
2014-11-25 17:09:43 +01:00
Bjoern Rabenstein f1de5b0c4e Run checkpointing of in-memory metrics and head chunks periodically.
Checkpointing interval is now a command line flag.

Along the way, several things were refactored.
- Restructure the way the storage is started and stopped.
- Number of series in checkpoint is now a uint64, not a varint.
  (Breaks old checkpoints, needs wipe!)
- More consistent naming and order of methods.

Change-Id: I883d9170c9a608ee716bb0ab3d0ded8ca03760d9
2014-11-25 17:09:04 +01:00
Bjoern Rabenstein 9c3ecc2134 Remove unused flags.
Change-Id: Ie1bcbb0743d65e92072628811706d49753023205
2014-11-25 17:09:03 +01:00
Bjoern Rabenstein b3ed9aa7a2 Clean up start-up and shut-down.
Change-Id: Idff4bbb0a15a9f879bfbb3da5b1025179cab5e2c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 4447708c9f Fix a race in target.go.
Also, fix problems in shutdown.
Starting serving and shutdown still has to be cleaned up properly.
It's a mess.

Change-Id: I51061db12064e434066446e6fceac32741c4f84c
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 934d09f738 Fix race during shutdown.
Change-Id: I2f8bf48d92a14f1e5ecde27c1b138734d7653394
2014-11-25 17:08:45 +01:00
Bjoern Rabenstein 8fba3302bc Bold changes to concurrency.
(WIP. Probably doesn't work yet.)

Change-Id: Id1537dfcca53831a1d428078a5863ece7bdf4875
2014-11-25 17:07:45 +01:00
Julius Volz a746fbb8bc Instrument indexing: queue length, batch sizes and latencies.
Change-Id: I60bcbd24b160e47d418a485d8cffa39344a257c6
2014-11-25 17:07:44 +01:00
Bjoern Rabenstein f5f9f3514a Major code cleanup.
- Make it go-vet and golint clean.
- Add comments, TODOs, etc.

Change-Id: If1392d96f3d5b4cdde597b10c8dff1769fcfabe2
2014-11-25 17:02:53 +01:00
Bjoern Rabenstein 3592dc2359 Implement series eviction.
Change-Id: I7a503e0ba78aae3761d032851b06f2807122b085
2014-11-25 17:02:52 +01:00
Bjoern Rabenstein af77d5ef0b Added a few missing implementations in index.go.
Also, added closing of persistence and mem storage.

Change-Id: Iacf0d22c3520dd2584d9546984c1f8a5ed6cd54e
2014-11-25 17:02:01 +01:00
Julius Volz e7ed39c9a6 Initial experimental snapshot of next-gen storage.
Change-Id: Ifb8709960dbedd1d9f5efd88cdd359ee9fa9d26d
2014-11-25 17:02:00 +01:00
Brian Brazil 5edf689133 Stagger scrapes to spread out load.
Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35
2014-11-25 17:02:00 +01:00
Bjoern Rabenstein 8956faeccb Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-11-25 17:01:59 +01:00
Brian Brazil 3b3ec604c3 Stagger scrapes to spread out load.
Change-Id: Ib141b271e4adfb817886871f86051c207b05cf35
2014-08-20 17:07:10 +01:00
Bjoern Rabenstein 2128d9d811 Migrate to new client_golang.
This change will only be submitted when the new client_golang has been
moved to the new version.

Change-Id: Ifceb59333072a08286a8ac910709a8ba2e3a1581
2014-06-19 16:03:50 +02:00