Move rulemanager to its own package to break a circular dependency.
Make NewTestTieredStorage available to tests, remove duplication.
Change-Id: I33b321245a44aa727bfc3614a7c9ae5005b34e03
This was initially motivated by wanting to distribute the rule checker
tool under `tools/rule_checker`. However, this was not possible without
also distributing the LevelDB dynamic libraries because the tool
transitively depended on Levigo:
rule checker -> query layer -> tiered storage layer -> leveldb
This change separates external storage interfaces from the
implementation (tiered storage, leveldb storage, memory storage) by
putting them into separate packages:
- storage/metric: public, implementation-agnostic interfaces
- storage/metric/tiered: tiered storage implementation, including memory
and LevelDB storage.
I initially also considered splitting up the implementation into
separate packages for tiered storage, memory storage, and LevelDB
storage, but these are currently so intertwined that it would be another
major project in itself.
The query layers and most other parts of Prometheus no longer have any notion
of the storage implementation and just use whatever implementation
they get passed in via interfaces.
The rule_checker is now a static binary :)
Change-Id: I793bbf631a8648ca31790e7e772ecf9c2b92f7a0
The closing of Prometheus is now wrapped in a sync.Once to prevent
accidental multiple invocations of it, which could trigger
corruption or a race condition. The shutdown process is made more
verbose through logging.
A web handler, not enabled by default, has been provided to trigger a
remote shutdown on request for debugging purposes.
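As a rough sketch of the pattern (the type name and handler path below are
hypothetical stand-ins, not the actual Prometheus code):

```
package main

import (
	"log"
	"net/http"
	"sync"
)

// prometheus is a hypothetical stand-in for the actual server type.
type prometheus struct {
	closeOnce sync.Once
}

// Close may be called multiple times (signal handler, debug web handler,
// deferred call in main) but runs the teardown exactly once.
func (p *prometheus) Close() {
	p.closeOnce.Do(func() {
		log.Println("shutdown started")
		// ... flush storage, stop scrapers, etc. ...
		log.Println("shutdown complete")
	})
}

func main() {
	p := &prometheus{}
	defer p.Close()
	// A disabled-by-default handler could trigger a remote shutdown
	// (server startup omitted in this sketch).
	http.HandleFunc("/-/quit", func(w http.ResponseWriter, r *http.Request) {
		go p.Close()
	})
}
```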
Change-Id: If4fee75196bbff1fb1e4a4ef7e1cfa53fef88f2e
The idiomatic pattern for signalling a one-time message to multiple
consumers from a single producer is as follows:
```
package main

import "sync"

func main() {
	c := make(chan struct{})
	w := new(sync.WaitGroup) // Boilerplate to ensure synchronization.
	for i := 0; i < 1000; i++ {
		w.Add(1)
		go func() {
			defer w.Done()
			for {
				select {
				case _, ok := <-c:
					if !ok {
						return
					}
				default:
					// Do something here.
				}
			}
		}()
	}
	close(c) // Signal the one-to-many single-use message.
	w.Wait()
}
```
Change-Id: I755f73ba4c70a923afd342a4dea63365bdf2144b
Idiomatic semaphore usage in Go, unless it is wrapping a concrete type,
should use anonymous empty structs (``struct{}``). This has several
features that are worthwhile:
1. It conveys that the object in the channel is likely used for
resource limiting / semaphore use. This is by idiom.
2. Due to magic under the hood, empty structs have a width of zero,
meaning they consume no space. It is presumed that slices,
channels, and other values of them can be represented specially
with alternative optimizations. Dmitry Vyukov has investigated
improvements that can be made to the channel design in Go and
concludes that there are already nice short-circuiting behaviors
at work for this type.
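As a minimal sketch of the idiom (illustrative only, not the curator code
itself):

```
package main

import (
	"fmt"
	"sync"
)

func main() {
	// A semaphore limiting concurrency to 3, using zero-width empty structs.
	sem := make(chan struct{}, 3)

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it, even if the work panics
			fmt.Println("working on", i)
		}(i)
	}
	wg.Wait()
}
```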
This is the first change of several that apply this type of change to
suitable places.
In this one change, we fix a bug in the previous revision, whereby a
semaphore can be acquired for curation and never released back for
subsequent work: http://goo.gl/70Y2qK. Compare that with the
compaction definition above.
On top of that, using the semaphore in this mode better supports
system shutdown idioms through the closing of channels.
Change-Id: Idb4fca310f26b73c9ec690bbdd4136180d14c32d
Prometheus needs long-term storage. Since we don't have enough resources
to build our own timeseries storage from scratch on top of Riak,
Cassandra or a similar distributed datastore at the moment, we're
planning on using OpenTSDB as long-term storage for Prometheus. Its
data model is roughly compatible with that of Prometheus, with some
caveats.
As a first step, this adds write-only replication from Prometheus to
OpenTSDB, with the following things worth noting:
1)
I tried to keep the integration lightweight, meaning that anything
related to OpenTSDB is isolated to its own package and only main knows
about it (essentially it tees all samples to both the existing storage
and TSDB). It's not touching the existing TieredStorage at all to avoid
more complexity in that area. This might change in the future,
especially if we decide to implement a read path for OpenTSDB through
Prometheus as well.
2)
Backpressure while sending to OpenTSDB is handled by simply dropping
samples on the floor when the in-memory queue of samples destined for
OpenTSDB runs full. Prometheus also only attempts to send samples once,
rather than implementing a complex retry algorithm. Thus, replication to
OpenTSDB is best-effort for now. If needed, this may be extended in the
future.
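A minimal sketch of this drop-on-full behavior (the queue capacity and the
sample type are illustrative assumptions):

```
package main

import "fmt"

type sample struct {
	name  string
	value float64
}

func main() {
	// Bounded queue of samples destined for OpenTSDB; capacity is illustrative.
	queue := make(chan sample, 4096)

	s := sample{name: "up", value: 1}

	// Non-blocking enqueue: if the queue is full, drop the sample on the
	// floor rather than applying backpressure to the scrape/storage path.
	select {
	case queue <- s:
	default:
		fmt.Println("OpenTSDB queue full, dropping sample")
	}
}
```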
3)
Samples are sent in batches of limited size to OpenTSDB. The optimal
batch size, timeout parameters, etc. may need to be adjusted in the
future.
4)
OpenTSDB has different rules for legal characters in tag (label) values.
While Prometheus allows any characters in label values, OpenTSDB limits
them to a to z, A to Z, 0 to 9, -, _, . and /. Currently any illegal
characters in Prometheus label values are simply replaced by an
underscore. Especially when integrating OpenTSDB with the read path in
Prometheus, we'll need to reconsider this: either we'll need to
introduce the same limitations for Prometheus labels or escape/encode
illegal characters in OpenTSDB in such a way that they are fully
decodable again when reading through Prometheus, so that corresponding
timeseries in both systems match in their labelsets.
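A minimal sketch of that replacement (the regexp variable name is
illustrative):

```
package main

import (
	"fmt"
	"regexp"
)

// illegalCharsRE matches everything outside OpenTSDB's legal tag-value
// alphabet (a-z, A-Z, 0-9, -, _, . and /).
var illegalCharsRE = regexp.MustCompile(`[^a-zA-Z0-9\-_./]`)

func main() {
	v := "some value:with/illegal@chars"
	fmt.Println(illegalCharsRE.ReplaceAllString(v, "_"))
	// Output: some_value_with/illegal_chars
}
```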
Change-Id: I8394c9c55dbac3946a0fa497f566d5e6e2d600b5
So far we've been using Go's native time.Time for anything related to sample
timestamps. Since the range of time.Time is much bigger than what we need, this
has created two problems:
- there could be time.Time values which were out of the range/precision of the
time type that we persist to disk, therefore causing incorrectly ordered keys.
One bug caused by this was:
https://github.com/prometheus/prometheus/issues/367
It would be good to use a timestamp type that's more closely aligned with
what the underlying storage supports.
- sizeof(time.Time) is 192, while Prometheus should be ok with a single 64-bit
Unix timestamp (possibly even a 32-bit one). Since we store samples in large
numbers, this seriously affects memory usage. Furthermore, copying/working
with the data will be faster if it's smaller.
*MEMORY USAGE RESULTS*
Initial memory usage comparisons for a running Prometheus with 1 timeseries and
100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my
tests, this advantage for some reason decreased a bit the more samples the
timeseries had (to 5-7% for millions of samples). This I can't fully explain,
but perhaps garbage collection issues were involved.
*WHEN TO USE THE NEW TIMESTAMP TYPE*
The new clientmodel.Timestamp type should be used whenever time
calculations are either directly or indirectly related to sample
timestamps.
For example:
- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp
passed into Target.Scrape()).
When to still use time.Time:
- for measuring durations/times not related to sample timestamps, like duration
telemetry exporting, timers that indicate how frequently to execute some
action, etc.
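For illustration, a compact timestamp type along these lines might look as
follows; this is a sketch assuming a millisecond-resolution int64, and the
actual clientmodel.Timestamp API may differ in detail:

```
package main

import (
	"fmt"
	"time"
)

// Timestamp is a sketch of a compact sample timestamp: a single int64 of
// milliseconds since the Unix epoch (8 bytes instead of an entire
// time.Time value).
type Timestamp int64

// TimestampFromTime converts a time.Time into the compact representation.
func TimestampFromTime(t time.Time) Timestamp {
	return Timestamp(t.UnixNano() / int64(time.Millisecond))
}

// Time converts back to a time.Time for interoperability.
func (t Timestamp) Time() time.Time {
	return time.Unix(int64(t)/1000, (int64(t)%1000)*int64(time.Millisecond))
}

func main() {
	now := TimestampFromTime(time.Now())
	fmt.Println(now, now.Time())
}
```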
*NOTE ON OPERATOR OPTIMIZATION TESTS*
We don't use the operator optimization code anymore, but it still lives on
as dead code. It still has tests, but I couldn't get all of them to
pass with the new timestamp format. I commented out the failing cases for now,
but we should probably remove the dead code soon. I just didn't want to do that
in the same change as this.
Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
This commit fixes a critique of the old storage API design, whereby
the input parameters were always as raw bytes and never Protocol
Buffer messages that encapsulated the data, meaning every place a
read or mutation was conducted needed to manually perform said
translations on its own. This is taxing.
Change-Id: I4786938d0d207cefb7782bd2bd96a517eead186f
The background curation should be staggered to ensure that disk
I/O yields to user-interactive operations in a timely manner. The
lack of routine prioritization necessitates this.
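One possible way to stagger such a background task is an initial random
offset before it settles into its regular interval, sketched below (the
interval and structure are assumptions, not the actual curator code):

```
package main

import (
	"log"
	"math/rand"
	"time"
)

// runCuration is a placeholder for the actual curation pass.
func runCuration() { log.Println("curation pass") }

func main() {
	interval := 1 * time.Hour // illustrative value

	go func() {
		// Initial random offset so that curation does not start in lockstep
		// with user-interactive work competing for disk I/O.
		time.Sleep(time.Duration(rand.Int63n(int64(interval))))
		for {
			runCuration()
			time.Sleep(interval)
		}
	}()

	time.Sleep(2 * time.Second) // keep the sketch alive briefly
}
```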
Change-Id: I9b498a74ccd933ffb856e06fedc167430e521d86
Curation is an expensive process. We can perform it less frequently, and in
the case of compaction processors, we can afford to let them group more
samples together.
Change-Id: I421e27958ed7a98dfacaababefad19462772b6a3
The ConsoleLinkForExpression() function now escapes console URLs in such a way
that they work both in emails and in HTML.
Change-Id: I917bae0b526cbbac28ccd2a4ec3c5ac03ee4c647
Again, playing around.
Change-Id: I5413f7723a38ae18792802c5ed91753101adf491
Move arguments into priority order.
Testing out patch sets.
Change-Id: I5413f7723a38ae18792802c5ed91753101adf491
This includes required refactorings to enable replacing the http client (for
testing) and moving the NotificationReq type definitions to the "notifications"
package, so that this package doesn't need to depend on "rules" anymore and
can instead use a representation of the required data which only
includes the necessary fields.
Until meta-monitoring is in place, we need a way to corroborate whether a
Prometheus instance has flapped, so we ought to provide the instance's
start time in the console to aid in diagnostics.
This commit simplifies the way that compactions across a database's
keyspace occur, informed by a reading of the LevelDB internals. Secondarily, it
introduces the database size estimation mechanisms.
Include database health and help interfaces.
Add database statistics; remove status goroutines.
This commit kills the use of goroutines to expose status throughout
the web components of Prometheus. It also dumps raw LevelDB status
on a separate /databases endpoint.
This commit introduces the long-tail deletion mechanism, which will
automatically cull old sample values. It is an acceptable
hold-over until we get a resampling pipeline implemented.
Kill legacy OS X documentation, too.
This does two things:
1) Make TieredStorage.AppendSamples() write directly to memory instead of
buffering to a channel first. This is needed in cases where a rule might
immediately need the data generated by a previous rule.
2) Replace the single storage mutex by two new ones:
- memoryMutex - needs to be locked at any time that two concurrent
goroutines could be accessing (via read or write) the
TieredStorage memoryArena.
- memoryDeleteMutex - used to prevent any deletion of samples from
memoryArena as long as renderView is running and
assembling data from it.
The LevelDB disk storage does not need to be protected by a mutex when
rendering a view since renderView works off a LevelDB snapshot.
The rationale against adding memoryMutex directly to the memory storage: taking
a mutex does come with a small inherent time cost, and taking it is only
required in few places. In fact, no locking is required for the memory storage
instance which is part of a view (and not the TieredStorage).
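A rough sketch of the locking scheme described above (types and fields are
simplified stand-ins, not the actual implementation):

```
package main

import "sync"

type memorySeriesStorage struct {
	samples []float64
}

type TieredStorage struct {
	memoryMutex       sync.Mutex // guards concurrent access to memoryArena
	memoryDeleteMutex sync.Mutex // blocks deletions while renderView runs
	memoryArena       memorySeriesStorage
}

// AppendSamples writes directly to the in-memory arena.
func (t *TieredStorage) AppendSamples(samples []float64) {
	t.memoryMutex.Lock()
	defer t.memoryMutex.Unlock()
	t.memoryArena.samples = append(t.memoryArena.samples, samples...)
}

// renderView assembles data from memory (and a LevelDB snapshot, which
// needs no locking); deletions from memory are held off for its duration.
func (t *TieredStorage) renderView() {
	t.memoryDeleteMutex.Lock()
	defer t.memoryDeleteMutex.Unlock()
	t.memoryMutex.Lock()
	_ = len(t.memoryArena.samples) // read from the arena under memoryMutex
	t.memoryMutex.Unlock()
}

func main() {
	t := &TieredStorage{}
	t.AppendSamples([]float64{1, 2, 3})
	t.renderView()
}
```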
This commit introduces three background compactors, which compact
sparse samples together.
1. Samples older than five minutes are grouped together into chunks of 50
every 30 minutes.
2. Samples older than 60 minutes are grouped together into chunks of 250
every 50 minutes.
3. Samples older than one day are grouped together into chunks of 5000
every 70 minutes.
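Purely as illustration, the three schedules can be summarized like this (the
config type is hypothetical, not the actual processor code):

```
package main

import (
	"fmt"
	"time"
)

// compactorConfig is an illustrative representation of the three schedules
// described above, not the actual processor types.
type compactorConfig struct {
	minimumAge time.Duration // only samples older than this are compacted
	groupSize  int           // samples per compacted chunk
	interval   time.Duration // how often the compactor runs
}

func main() {
	compactors := []compactorConfig{
		{5 * time.Minute, 50, 30 * time.Minute},
		{60 * time.Minute, 250, 50 * time.Minute},
		{24 * time.Hour, 5000, 70 * time.Minute},
	}
	for _, c := range compactors {
		fmt.Printf("age > %v: chunks of %d every %v\n", c.minimumAge, c.groupSize, c.interval)
	}
}
```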
This commit drops the Storage interface and just replaces it with a
publicized TieredStorage type. Storage had been anticipated to be
used as a wrapper for testability but just was not used due to
practicality. Merely overengineered. My bad. Anyway, we will
eventually instantiate the TieredStorage dependencies in main.go and
pass them in for more intelligent lifecycle management.
These changes will pave the way for managing the curators without
Law of Demeter violations.
This commit employs explicit memory freeing for the in-memory storage
arenas. Secondarily, we take advantage of smaller channel buffer sizes
in the test.
The original append queue telemetry never worked, because it was
updated only upon the exit of the select statement, which would
usually liberate the queues of contents. This has been fixed to
be reported arbitrarily.
The queue sizes are now parameterizable via flags.
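A minimal sketch of such flags (names and defaults here are assumptions, not
the actual Prometheus flags):

```
package main

import (
	"flag"
	"fmt"
)

var (
	// Flag names and default capacities are illustrative only.
	viewQueueSize   = flag.Int("storage.viewQueueSize", 100, "capacity of the view request queue")
	appendQueueSize = flag.Int("storage.appendQueueSize", 10000, "capacity of the sample append queue")
)

func main() {
	flag.Parse()
	fmt.Println(*viewQueueSize, *appendQueueSize)
}
```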
client_golang was updated to support full label-oriented telemetry,
which introduced interface incompatibilities with the previous
version of Prometheus. To alleviate this, a general fetching and
processing dispatching system has been created, which discriminates
and processes according to the version of input.
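A rough sketch of version-based dispatching (the version token and processor
names are hypothetical):

```
package main

import "fmt"

// processor consumes a scraped payload in a particular telemetry version.
type processor interface {
	Process(payload []byte) error
}

type processor001 struct{}

func (processor001) Process(payload []byte) error { return nil }

// processorForVersion is an illustrative dispatcher; the real code keys off
// the version information actually sent by client_golang.
func processorForVersion(version string) (processor, error) {
	switch version {
	case "0.0.1": // hypothetical version token
		return processor001{}, nil
	default:
		return nil, fmt.Errorf("unsupported telemetry version %q", version)
	}
}

func main() {
	p, err := processorForVersion("0.0.1")
	fmt.Println(p, err)
}
```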
``Target`` will be refactored down the road to support various
nuanced endpoint types. Thus, incorporating the scheduling
behavior within it will be problematic. To that end, the scheduling
behavior has been moved into a separate assistance type to improve
conciseness and testability.
``make format`` was also run.
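A minimal sketch of pulling scheduling out into its own type (the names and
the back-off policy are illustrative assumptions, not the actual code):

```
package main

import (
	"fmt"
	"time"
)

// scheduler decides when a target should next be scraped; keeping it as a
// separate type makes the policy easy to swap out and to test in isolation.
type scheduler interface {
	Reschedule(earliest time.Time, healthy bool)
	ScheduledFor() time.Time
}

// healthScheduler backs off unhealthy targets; the policy shown here is a
// simplified stand-in.
type healthScheduler struct {
	next time.Time
}

func (s *healthScheduler) Reschedule(earliest time.Time, healthy bool) {
	backoff := 30 * time.Second
	if !healthy {
		backoff = 5 * time.Minute
	}
	s.next = earliest.Add(backoff)
}

func (s *healthScheduler) ScheduledFor() time.Time { return s.next }

func main() {
	var s scheduler = &healthScheduler{}
	s.Reschedule(time.Now(), false)
	fmt.Println("next scrape at", s.ScheduledFor())
}
```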