Merge pull request #3367 from prometheus/docs
A few 2.0 doc adjustments

commit 94d9a200aa

@@ -94,5 +94,6 @@ on top of the simple alert definitions. In Prometheus's ecosystem, the
 [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) takes on this
 role. Thus, Prometheus may be configured to periodically send information about
 alert states to an Alertmanager instance, which then takes care of dispatching
-the right notifications. The Alertmanager instance may be configured via the
-`-alertmanager.url` command line flag.
+the right notifications.
+Prometheus can be [configured](configuration.md) to automatically discover available
+Alertmanager instances through its service discovery integrations.

@@ -65,8 +65,8 @@ directory containing the Prometheus binary and run:
 
 ```bash
 # Start Prometheus.
-# By default, Prometheus stores its database in ./data (flag -storage.local.path).
-./prometheus -config.file=prometheus.yml
+# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path).
+./prometheus --config.file=prometheus.yml
 ```
 
 Prometheus should start up. You should also be able to browse to a status page

@@ -77,17 +77,6 @@ You can also verify that Prometheus is serving metrics about itself by
 navigating to its metrics endpoint:
 [localhost:9090/metrics](http://localhost:9090/metrics)
 
-The number of OS threads executed by Prometheus is controlled by the
-`GOMAXPROCS` environment variable. As of Go 1.5 the default value is
-the number of cores available.
-
-Blindly setting `GOMAXPROCS` to a high value can be counterproductive. See the
-relevant [Go FAQs](http://golang.org/doc/faq#Why_no_multi_CPU).
-
-Prometheus by default uses around 3GB in memory. If you have a
-smaller machine, you can tune Prometheus to use less memory. For details,
-see the [memory usage documentation](storage.md#memory-usage).
-
 ## Using the expression browser
 
 Let us try looking at some data that Prometheus has collected about itself. To

@@ -15,5 +15,4 @@ The documentation is available alongside all the project documentation at
 - [Getting started](getting_started.md)
 - [Configuration](configuration/configuration.md)
 - [Querying](querying/basics.md)
-- [Storage](storage.md)
 - [Federation](federation.md)

@@ -51,7 +51,7 @@ Or use an additional volume for the config:
 
 ```bash
 docker run -p 9090:9090 -v /prometheus-data \
-prom/prometheus -config.file=/prometheus-data/prometheus.yml
+prom/prometheus --config.file=/prometheus-data/prometheus.yml
 ```
 
 ### Custom image

docs/storage.md (deleted, 357 lines)

@@ -1,357 +0,0 @@

---
title: Storage
sort_rank: 5
---

# Storage

Prometheus has a sophisticated local storage subsystem. For indexes,
it uses [LevelDB](https://github.com/google/leveldb). For the bulk
sample data, it has its own custom storage layer, which organizes
sample data in chunks of constant size (1024 bytes payload). These
chunks are then stored on disk in one file per time series.

This section deals with the various configuration settings and issues you
might run into. To dive deeper into the topic, check out the following talks:

* [The Prometheus Time Series Database](https://www.youtube.com/watch?v=HbnGSNEjhUc).
* [Configuring Prometheus for High Performance](https://www.youtube.com/watch?v=hPC60ldCGm8).

## Memory usage

Prometheus keeps all the currently used chunks in memory. In addition, it keeps
as many most recently used chunks in memory as possible. You have to tell
Prometheus how much memory it may use for this caching. The flag
`storage.local.target-heap-size` allows you to set the heap size (in bytes)
Prometheus aims not to exceed. Note that the amount of physical memory the
Prometheus server will use is the result of complex interactions of the Go
runtime and the operating system and is very hard to predict precisely. As a rule
of thumb, you should have at least 50% headroom in physical memory over the
configured heap size. (Or, in other words, set `storage.local.target-heap-size`
to a value of two thirds of the physical memory limit Prometheus should not
exceed.)

The default value of `storage.local.target-heap-size` is 2GiB and thus tailored
to 3GiB of physical memory usage. If you have less physical memory available,
you have to lower the flag value. If you have more memory available, you should
raise the value accordingly. Otherwise, Prometheus will not make use of the
memory and thus will perform much worse than it could.
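
As a concrete illustration (the numbers are hypothetical and only follow the
two-thirds rule of thumb above): on a host where Prometheus may use up to about
12GiB of physical memory, you would target a heap size of roughly 8GiB.

```bash
# Hypothetical example: cap the heap at ~8GiB (two thirds of a 12GiB budget).
# The flag value is in bytes.
./prometheus -config.file=prometheus.yml \
  -storage.local.target-heap-size=8589934592
```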

Because Prometheus uses most of its heap for long-lived allocations of memory
chunks, the
[garbage collection target percentage](https://golang.org/pkg/runtime/debug/#SetGCPercent)
is set to 40 by default. You can still override this setting via the `GOGC`
environment variable as usual. If you need to conserve CPU capacity and can
accept running with fewer memory chunks, try higher values.
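
For instance, a run that trades some extra memory pressure for lower GC CPU
usage might look like this (the value 100 is merely Go's usual default, shown
here as an illustration of "a higher value" than the tuned 40):

```bash
# Hypothetical example: relax garbage collection from the tuned default of 40.
GOGC=100 ./prometheus -config.file=prometheus.yml
```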

For high-performance set-ups, you might need to adjust more flags. Please read
through the sections below for details.

NOTE: Prior to v1.6, there was no flag `storage.local.target-heap-size`.
Instead, the number of chunks kept in memory had to be configured using the
flags `storage.local.memory-chunks` and `storage.local.max-chunks-to-persist`.
These flags still exist for compatibility reasons. However,
`storage.local.max-chunks-to-persist` has no effect anymore, and if
`storage.local.memory-chunks` is set to a non-zero value _x_, it is used to
override the value for `storage.local.target-heap-size` to 3072*_x_.

## Disk usage

Prometheus stores its on-disk time series data under the directory specified by
the flag `storage.local.path`. The default path is `./data` (relative to the
working directory), which is good to try something out quickly but most likely
not what you want for actual operations. The flag `storage.local.retention`
allows you to configure the retention time for samples. Adjust it to your needs
and your available disk space.
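
A production-style invocation might combine both flags; the directory and the
30-day retention below are placeholder values, not recommendations:

```bash
# Hypothetical example: store data on a dedicated volume and keep 30 days of samples.
./prometheus -config.file=prometheus.yml \
  -storage.local.path=/var/lib/prometheus/data \
  -storage.local.retention=720h
```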

## Chunk encoding

Prometheus currently offers three different types of chunk encodings. The chunk
encoding for newly created chunks is determined by the
`-storage.local.chunk-encoding-version` flag. The valid values are 0, 1,
or 2.

Type 0 is the simple delta encoding implemented for Prometheus's first chunked
storage layer. Type 1 is the current default encoding, a double-delta encoding
with much better compression behavior than type 0. Both encodings feature a
fixed byte width per sample over the whole chunk, which allows fast random
access. While type 0 is the fastest encoding, the difference in encoding cost
compared to encoding 1 is tiny. Due to the better compression behavior of type
1, there is really no reason to select type 0 except compatibility with very
old Prometheus versions.

Type 2 is a variable bit-width encoding, i.e. each sample in the chunk can use
a different number of bits. Timestamps are double-delta encoded, too, but with
a slightly different algorithm. A number of different encoding schemes are
available for sample values. The choice is made per chunk based on the nature
of the sample values (constant, integer, regularly increasing, random…). Major
parts of the type 2 encoding are inspired by a paper published by Facebook
engineers:
[_Gorilla: A Fast, Scalable, In-Memory Time Series Database_](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf).

With type 2, access within a chunk has to happen sequentially, and the encoding
and decoding cost is a bit higher. Overall, type 2 will cause more CPU usage
and increased query latency compared to type 1 but offers a much improved
compression ratio. The exact numbers depend heavily on the data set and the
kind of queries. Below are results from a typical production server with a
fairly expensive set of recording rules.

Chunk type | bytes per sample | cores | rule evaluation duration
:------:|:-----:|:----:|:----:
1 | 3.3 | 1.6 | 2.9s
2 | 1.3 | 2.4 | 4.9s

You can change the chunk encoding each time you start the server, so
experimenting with your own use case is encouraged. Take into account, however,
that only newly created chunks will use the newly selected chunk encoding, so
it will take a while until you see the effects.
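
Switching encodings is therefore a restart-time decision, for example (reusing
the basic start-up command from the getting-started docs):

```bash
# Hypothetical example: create new chunks with the varbit (type 2) encoding.
# Chunks written before the restart keep whatever encoding they already have.
./prometheus -config.file=prometheus.yml \
  -storage.local.chunk-encoding-version=2
```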

For more details about the trade-off between the chunk encodings, see
[this blog post](https://prometheus.io/blog/2016/05/08/when-to-use-varbit-chunks/).

## Settings for high numbers of time series

Prometheus can handle millions of time series. However, with the above
mentioned default setting for `storage.local.target-heap-size`, you will be
limited to about 200,000 time series simultaneously present in memory. For more
series, you need more memory, and you need to configure Prometheus to make use
of it as described above.

Each of the aforementioned chunks contains samples of a single time series. A
time series is thus represented as a series of chunks, which ultimately end up
in a time series file (one file per time series) on disk.

A series that has recently received new samples will have an open incomplete
_head chunk_. Once that chunk is completely filled, or the series hasn't
received samples in a while, the head chunk is closed and becomes a chunk
waiting to be appended to its corresponding series file, i.e. it is _waiting
for persistence_. After the chunk has been persisted to disk, it becomes
_evictable_, provided it is not currently used by a query. Prometheus will
evict evictable chunks from memory to satisfy the configured target heap
size. A series with an open head chunk is called an _active series_. This is
different from a _memory series_, which also includes series without an open
head chunk but still other chunks in memory (whether waiting for persistence,
used in a query, or evictable). A series without any chunks in memory may be
_archived_, upon which it ceases to have any mandatory memory footprint.

The number of chunks Prometheus can keep in memory depends on the flag value
for `storage.local.target-heap-size` and on the amount of memory used by
everything else. If there are not enough chunks evictable to satisfy the target
heap size, Prometheus will throttle ingestion of more samples (by skipping
scrapes and rule evaluations) until the heap has shrunk enough. _Throttled
ingestion is really bad for various reasons. You really do not want to be in
that situation._

Open head chunks, chunks still waiting for persistence, and chunks being used
in a query are not evictable. Thus, the reasons for the inability to evict
enough chunks include the following:

1. Queries that use too many chunks.
2. Chunks are piling up waiting for persistence because the storage layer
   cannot keep up writing chunks.
3. There are too many active time series, which results in too many open head
   chunks.

Currently, Prometheus has no defence against case (1). Abusive queries will
essentially OOM the server.

To defend against case (2), there is a concept of persistence urgency explained
in the next section.

Case (3) depends on the targets you monitor. To mitigate an unplanned explosion
of the number of series, you can limit the number of samples per individual
scrape (see `sample_limit` in the [scrape config](configuration/configuration.md#scrape_config)).
If the number of active time series exceeds the number of memory chunks the
Prometheus server can afford, the server will quickly throttle ingestion as
described above. The only way out of this is to give Prometheus more RAM or
reduce the number of time series to ingest.

In fact, you want many more memory chunks than you have series in
memory. Prometheus tries to batch up disk writes as much as possible as it
helps for both HDD (write as much as possible after each seek) and SSD (tiny
writes create write amplification, which limits the effective throughput and
burns much more quickly through the lifetime of the device). The more
Prometheus can batch up writes, the more efficient is the process of persisting
chunks to disk, which helps case (2).

In conclusion, to keep the Prometheus server healthy, make sure it has plenty
of headroom of memory chunks available for the number of memory series. A
factor of three is a good starting point. Refer to the
[section about helpful metrics](#helpful-metrics) to find out what to look
for. A very broad rule of thumb for an upper limit of memory series is the
total available physical memory divided by 10,000, e.g. about 6M memory series
on a 64GiB server.

If you combine a high number of time series with very fast and/or large
scrapes, the number of pre-allocated mutexes for series locking might not be
sufficient. If you see scrape hiccups while Prometheus is writing a checkpoint
or processing expensive queries, try increasing the value of the
`storage.local.num-fingerprint-mutexes` flag. Sometimes tens of thousands or
even more are required.
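
For example (the count below is only an illustration of the "tens of thousands"
order of magnitude mentioned above, not a recommended value):

```bash
# Hypothetical example: pre-allocate far more series-locking mutexes.
./prometheus -config.file=prometheus.yml \
  -storage.local.num-fingerprint-mutexes=40960
```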

PromQL queries that involve a high number of time series will make heavy use of
the LevelDB-backed indexes. If you need to run queries of that kind, tweaking
the index cache sizes might be required. The following flags are relevant:

* `-storage.local.index-cache-size.label-name-to-label-values`: For regular
  expression matching.
* `-storage.local.index-cache-size.label-pair-to-fingerprints`: Increase the
  size if a large number of time series share the same label pair or name.
* `-storage.local.index-cache-size.fingerprint-to-metric` and
  `-storage.local.index-cache-size.fingerprint-to-timerange`: Increase the size
  if you have a large number of archived time series, i.e. series that have not
  received samples in a while but are still not old enough to be purged
  completely.

You have to experiment with the flag values to find out what helps. If a query
touches 100,000+ time series, hundreds of MiB might be reasonable. If you have
plenty of memory available, using more of it for LevelDB cannot harm. More
memory for LevelDB, however, will effectively reduce the number of memory chunks
Prometheus can afford.
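
A tuning attempt for a server that runs heavy label-matcher queries could look
like this; the 256MiB figures are placeholders in the "hundreds of MiB" range
suggested above, assuming the flag values are byte counts:

```bash
# Hypothetical example: enlarge the two index caches most relevant to
# regex matching and to label pairs shared by many series (values in bytes).
./prometheus -config.file=prometheus.yml \
  -storage.local.index-cache-size.label-name-to-label-values=268435456 \
  -storage.local.index-cache-size.label-pair-to-fingerprints=268435456
```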

## Persistence urgency and “rushed mode”

Naively, Prometheus would try at all times to persist completed chunks to disk
as soon as possible. Such a strategy would lead to many tiny write operations,
using up most of the I/O bandwidth and keeping the server quite busy. Spinning
disks will appear to be very slow because of the many slow seeks required, and
SSDs will suffer from write amplification. Prometheus tries instead to batch up
write operations as much as possible, which works better if it is allowed to
use more memory.

Prometheus will also sync series files after each write (with
`storage.local.series-sync-strategy=adaptive`, which is the default) and use
the disk bandwidth for more frequent checkpoints (based on the count of “dirty
series”, see [below](#crash-recovery)), both attempting to minimize data loss
in case of a crash.

But what to do if the number of chunks waiting for persistence grows too much?
Prometheus calculates a score for urgency to persist chunks. The score is
between 0 and 1, where 1 corresponds to the highest urgency. Depending on the
score, Prometheus will write to disk more frequently. Should the score ever
pass the threshold of 0.8, Prometheus enters “rushed mode” (which you can see
in the logs). In rushed mode, the following strategies are applied to speed up
persisting chunks:

* Series files are not synced after write operations anymore (making better use
  of the OS's page cache at the price of an increased risk of losing data in
  case of a server crash – this behavior can be overridden with the flag
  `storage.local.series-sync-strategy`).
* Checkpoints are only created as often as configured via the
  `storage.local.checkpoint-interval` flag (freeing more disk bandwidth for
  persisting chunks at the price of more data loss in case of a crash and an
  increased time to run the subsequent crash recovery).
* Write operations to persist chunks are not throttled anymore and performed as
  fast as possible.

Prometheus leaves rushed mode once the score has dropped below 0.7.

Throttling of ingestion happens if the urgency score reaches 1. Thus, the
rushed mode is not _per se_ something to be avoided. It is, on the contrary, a
measure the Prometheus server takes to avoid the really bad situation of
throttled ingestion. Occasionally entering rushed mode is OK, if it helps and
ultimately leads to leaving rushed mode again. _If rushed mode is entered but
the urgency score still goes up, the server has a real problem._

## Settings for very long retention time

If you have set a very long retention time via the `storage.local.retention`
flag (more than a month), you might want to increase the flag value
`storage.local.series-file-shrink-ratio`.

Whenever Prometheus needs to cut off some chunks from the beginning of a series
file, it will simply rewrite the whole file. (Some file systems support “head
truncation”, which Prometheus currently does not use for several reasons.) To
not rewrite a very large series file to get rid of very few chunks, the rewrite
only happens if at least 10% of the chunks in the series file are removed. This
value can be changed via the mentioned `storage.local.series-file-shrink-ratio`
flag. If you have a lot of disk space but want to minimize rewrites (at the
cost of wasted disk space), increase the flag value to higher values, e.g. 0.3
for 30% of required chunk removal.
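
Putting the two flags together, a long-retention server could be started like
this (one year of retention is just an illustrative figure):

```bash
# Hypothetical example: keep roughly a year of data and only rewrite a series
# file once at least 30% of its chunks would be dropped.
./prometheus -config.file=prometheus.yml \
  -storage.local.retention=8760h \
  -storage.local.series-file-shrink-ratio=0.3
```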

## Crash recovery

Prometheus saves chunks to disk as soon as possible after they are
complete. Incomplete chunks are saved to disk during regular
checkpoints. You can configure the checkpoint interval with the flag
`storage.local.checkpoint-interval`. Prometheus creates checkpoints
more frequently than that if too many time series are in a “dirty”
state, i.e. their current incomplete head chunk is not the one that is
contained in the most recent checkpoint. This limit is configurable
via the `storage.local.checkpoint-dirty-series-limit` flag.

More active time series to cycle through generally lead to more chunks waiting
for persistence, which in turn leads to larger checkpoints and ultimately more
time needed for checkpointing. There is a clear trade-off between limiting the
loss of data in case of a crash and the ability to scale to high numbers of
active time series. To not spend the majority of the disk throughput on
checkpointing, you have to increase the checkpoint interval. Prometheus itself
limits the time spent in checkpointing to 50% by waiting after each
checkpoint's completion for at least as long as the previous checkpoint took.
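
As an illustration of that trade-off (the interval and limit below are made-up
values, not recommendations):

```bash
# Hypothetical example: checkpoint at most every 15 minutes, unless more than
# 250000 series become dirty first, accepting a larger window of potential
# data loss in exchange for disk bandwidth.
./prometheus -config.file=prometheus.yml \
  -storage.local.checkpoint-interval=15m \
  -storage.local.checkpoint-dirty-series-limit=250000
```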

Nevertheless, should your server crash, you might still lose data, and
your storage might be left in an inconsistent state. Therefore,
Prometheus performs a crash recovery after an unclean shutdown,
similar to an `fsck` run for a file system. Details about the crash
recovery are logged, so you can use it for forensics if required. Data
that cannot be recovered is moved to a directory called `orphaned`
(located under `storage.local.path`). Remember to delete that data if
you do not need it anymore.

The crash recovery usually takes less than a minute. Should it take much
longer, consult the log to find out what is going on. With an increasing number
of time series in the storage (archived or not), the re-indexing tends to
dominate the recovery time and can take tens of minutes in extreme cases.

## Data corruption

If you suspect problems caused by corruption in the database, you can
enforce a crash recovery by starting the server with the flag
`storage.local.dirty`.
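
For example (assuming the flag can be passed as a bare boolean switch, as is
usual for Go-style flags):

```bash
# Hypothetical example: force a full crash-recovery pass on the next start-up.
./prometheus -config.file=prometheus.yml -storage.local.dirty
```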

If that does not help, or if you simply want to erase the existing
database, you can easily start fresh by deleting the contents of the
storage directory:

1. Stop Prometheus.
1. `rm -r <storage path>/*`
1. Start Prometheus.

## Helpful metrics

Out of the metrics that Prometheus exposes about itself, the following are
particularly useful to tweak flags and find out about the required
resources. They also help to create alerts to find out in time if a Prometheus
server has problems or is out of capacity.

* `prometheus_local_storage_memory_series`: The current number of series held
  in memory.
* `prometheus_local_storage_open_head_chunks`: The number of open head chunks.
* `prometheus_local_storage_chunks_to_persist`: The number of memory chunks
  that still need to be persisted to disk.
* `prometheus_local_storage_memory_chunks`: The current number of chunks held
  in memory. If you subtract the previous two, you get the number of persisted
  chunks (which are evictable if not currently in use by a query).
* `prometheus_local_storage_series_chunks_persisted`: A histogram of the number
  of chunks persisted per batch.
* `prometheus_local_storage_persistence_urgency_score`: The urgency score as
  discussed [above](#persistence-urgency-and-rushed-mode).
* `prometheus_local_storage_rushed_mode` is 1 if Prometheus is in “rushed
  mode”, 0 otherwise. Can be used to calculate the percentage of time
  Prometheus is in rushed mode.
* `prometheus_local_storage_checkpoint_last_duration_seconds`: How long the
  last checkpoint took.
* `prometheus_local_storage_checkpoint_last_size_bytes`: Size of the last
  checkpoint in bytes.
* `prometheus_local_storage_checkpointing` is 1 while Prometheus is
  checkpointing, 0 otherwise. Can be used to calculate the percentage of time
  Prometheus is checkpointing.
* `prometheus_local_storage_inconsistencies_total`: Counter for storage
  inconsistencies found. If this is greater than 0, restart the server for
  recovery.
* `prometheus_local_storage_persist_errors_total`: Counter for persist errors.
* `prometheus_local_storage_memory_dirty_series`: Current number of dirty series.
* `process_resident_memory_bytes`: Broadly speaking, the physical memory
  occupied by the Prometheus process.
* `go_memstats_alloc_bytes`: Go heap size (allocated objects in use plus allocated
  objects not in use anymore but not yet garbage-collected).
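
To spot-check a few of these metrics on a running server, you can scrape its own
metrics endpoint directly (the address assumes the default localhost:9090 listen
port used elsewhere in these docs):

```bash
# Hypothetical quick check of the local storage health metrics.
curl -s http://localhost:9090/metrics | grep -E \
  'prometheus_local_storage_(memory_series|memory_chunks|chunks_to_persist|persistence_urgency_score)'
```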