prometheus-metrics-documentation

Signed-off-by: Tomiwa <tomiwaaribisala@gmail.com>
2025-03-05 20:59:13 -08:00 · 2024-07-18 14:54:56 +01:00 · 2024-07-18 14:54:56 +01:00 · ffedf3fb6c
parent b75e635374
commit ffedf3fb6c
1 changed files with 166 additions and 0 deletions
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -0,0 +1,166 @@
+## Prometheus Metrics
+
+### List of Metrics Exported By Prometheus
+
+| Metric Name                                                  | Type      | Description                                                                                                                                                                         |
+|--------------------------------------------------------------|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| prometheus_api_remote_read_queries               | gauge   | The current number of remote read queries being executed or waiting.                                                                                                                        |
+| prometheus_build_info | gauge | A metric with a constant '1' value labeled by the version, revision, branch, and goversion from which Prometheus was built, and by the goos and goarch of the build. |
+| prometheus_config_last_reload_success_timestamp_seconds             | gauge   | Timestamp of the last successful configuration reload.                                                                                                                        |
+| prometheus_config_last_reload_successful          | gauge   | Whether the last configuration reload attempt was successful.                                                                                                                        |
+| prometheus_engine_queries        | gauge   | The current number of queries being executed or waiting.                                                                                                                        |
+| prometheus_engine_queries_concurrent_max      | gauge   | The max number of concurrent queries.                                                                                                                        |
+| prometheus_engine_query_duration_seconds     | summary  | Query timings.                                                                                                                        |
+| prometheus_engine_query_log_enabled | gauge | State of the query log: 1 if enabled, 0 otherwise. |
+| prometheus_engine_query_log_failures_total   | counter | The number of query log failures.                                                                                                                        |
+| prometheus_engine_query_samples_total   | counter | The total number of samples loaded by all queries.                                                                                                                        |
+| prometheus_http_request_duration_seconds  | histogram | Histogram of latencies for HTTP requests.                                                                                                                        |
+| prometheus_http_requests_total  | counter | Counter of HTTP requests.                                                                                                                        |
+| prometheus_http_response_size_bytes  | histogram | Histogram of response size for HTTP requests.                                                                                                                        |
+| prometheus_notifications_alertmanagers_discovered | gauge | The number of Alertmanagers discovered and active. |
+| prometheus_notifications_dropped_total  | counter | Total number of alerts dropped due to errors when sending to Alertmanager.                                                                                                                        |
+| prometheus_notifications_errors_total  | counter | Total number of errors sending alert notifications.                                                                                                                        |
+| prometheus_notifications_latency_seconds  | summary | Latency quantiles for sending alert notifications.                                                                                                                        |
+| prometheus_notifications_queue_capacity  | gauge | The capacity of the alert notifications queue.                                                                                                                        |
+| prometheus_notifications_queue_length | gauge | The number of alert notifications in the queue.                                                                                                                        |
+| prometheus_notifications_sent_total | counter | Total number of alerts sent.                                                                                                                        |
+| prometheus_ready | gauge | Whether Prometheus startup was fully completed and the server is ready for normal operation.                                                                                                                        |
+| prometheus_remote_storage_exemplars_in_total | counter | Exemplars into remote storage; compare with exemplars out for queue managers. |
+| prometheus_remote_storage_highest_timestamp_in_seconds | gauge | Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch. |
+| prometheus_remote_storage_histograms_in_total | counter | Histogram samples into remote storage; compare with histograms out for queue managers. |
+| prometheus_remote_storage_samples_in_total | counter | Samples into remote storage; compare with samples out for queue managers. |
+| prometheus_remote_storage_string_interner_zero_reference_releases_total | counter | The number of times release has been called for strings that are not interned.                                                                                                                        |
+| prometheus_rule_evaluation_duration_seconds | summary | The duration for a rule to execute.                                                                                                                        |
+| prometheus_rule_evaluation_failures_total | counter | The total number of rule evaluation failures.                                                                                                                        |
+| prometheus_rule_evaluations_total | counter | The total number of rule evaluations.                                                                                                                        |
+| prometheus_rule_group_duration_seconds | summary | The duration of rule group evaluations.                                                                                                                        |
+| prometheus_rule_group_interval_seconds | gauge | The interval of a rule group.                                                                                                                        |
+| prometheus_rule_group_iterations_missed_total | counter | The total number of rule group evaluations missed due to slow rule group evaluation.                                                                                                                        |
+| prometheus_rule_group_iterations_total | counter | The total number of scheduled rule group evaluations, whether executed or missed.                                                                                                                        |
+| prometheus_rule_group_last_duration_seconds | gauge | The duration of the last rule group evaluation.                                                                                                                        |
+| prometheus_rule_group_last_evaluation_samples | gauge | The number of samples returned during the last rule group evaluation.                                                                                                                        |
+| prometheus_rule_group_last_evaluation_timestamp_seconds | gauge | The timestamp of the last rule group evaluation in seconds.                                                                                                                        |
+| prometheus_rule_group_rules | gauge | The number of rules in the rule group. |
+| prometheus_sd_azure_cache_hit_total | counter | Number of cache hits during refresh. |
+| prometheus_sd_azure_failures_total | counter | Number of Azure service discovery refresh failures.                                                                                                                        |
+| prometheus_sd_consul_rpc_duration_seconds | summary | The duration of a Consul RPC call in seconds.                                                                                                                        |
+| prometheus_sd_consul_rpc_failures_total | counter | The number of Consul RPC call failures.                                                                                                                        |
+| prometheus_sd_discovered_targets | gauge | Current number of discovered targets.                                                                                                                        |
+| prometheus_sd_dns_lookup_failures_total | counter | The number of DNS-SD lookup failures.                                                                                                                        |
+| prometheus_sd_dns_lookups_total | counter | The number of DNS-SD lookups.                                                                                                                        |
+| prometheus_sd_failed_configs | gauge | Current number of service discovery configurations that failed to load.                                                                                                                        |
+| prometheus_sd_file_mtime_seconds | gauge | Timestamp (mtime) of files read by FileSD. Timestamp is set at read time.                                                                                                                        |
+| prometheus_sd_file_read_errors_total | counter | The number of File-SD read errors.                                                                                                                        |
+| prometheus_sd_file_scan_duration_seconds | summary | The duration of the File-SD scan in seconds.                                                                                                                        |
+| prometheus_sd_file_watcher_errors_total | counter | The number of File-SD errors caused by filesystem watch failures.                                                                                                                        |
+| prometheus_sd_http_failures_total | counter | Number of HTTP service discovery refresh failures.                                                                                                                        |
+| prometheus_sd_kubernetes_events_total | counter | The number of Kubernetes events handled.                                                                                                                        |
+| prometheus_sd_kubernetes_failures_total | counter | The number of failed WATCH/LIST requests.                                                                                                                        |
+| prometheus_sd_kuma_fetch_duration_seconds | summary | The duration of a Kuma MADS fetch call.                                                                                                                        |
+| prometheus_sd_kuma_fetch_failures_total | counter | The number of Kuma MADS fetch call failures.                                                                                                                        |
+| prometheus_sd_kuma_fetch_skipped_updates_total | counter | The number of Kuma MADS fetch calls that result in no updates to the targets.                                                                                                                        |
+| prometheus_sd_linode_failures_total | counter | Number of Linode service discovery refresh failures.                                                                                                                        |
+| prometheus_sd_nomad_failures_total | counter | Number of Nomad service discovery refresh failures. |
+| prometheus_sd_received_updates_total | counter | Total number of update events received from the SD providers.                                                                                                                        |
+| prometheus_sd_updates_delayed_total | counter | Total number of update events that couldn't be sent immediately.                                                                                                                        |
+| prometheus_sd_updates_total | counter | Total number of update events sent to the SD consumers.                                                                                                                        |
+| prometheus_target_interval_length_seconds | summary | Actual intervals between scrapes.                                                                                                                        |
+| prometheus_target_metadata_cache_bytes | gauge | The number of bytes that are currently used for storing metric metadata in the cache.                                                                                                                        |
+| prometheus_target_metadata_cache_entries | gauge | Total number of metric metadata entries in the cache.                                                                                                                        |
+| prometheus_target_scrape_pool_exceeded_label_limits_total | counter | Total number of times scrape pools hit the label limits during sync or config reload. |
+| prometheus_target_scrape_pool_exceeded_target_limit_total | counter | Total number of times scrape pools hit the target limit during sync or config reload. |
+| prometheus_target_scrape_pool_reloads_failed_total | counter | Total number of failed scrape pool reloads.                                                                                                                        |
+| prometheus_target_scrape_pool_reloads_total | counter | Total number of scrape pool reloads.                                                                                                                        |
+| prometheus_target_scrape_pool_sync_total | counter | Total number of syncs that were executed on a scrape pool.                                                                                                                        |
+| prometheus_target_scrape_pool_target_limit | gauge | Maximum number of targets allowed in this scrape pool.                                                                                                                        |
+| prometheus_target_scrape_pool_targets | gauge | Current number of targets in this scrape pool.                                                                                                                        |
+| prometheus_target_scrape_pools_failed_total | counter  | Total number of scrape pool creations that failed.                                                                                                                        |
+| prometheus_target_scrape_pools_total | counter  | Total number of scrape pool creation attempts.                                                                                                                        |
+| prometheus_target_scrapes_cache_flush_forced_total | counter  | How many times a scrape cache was flushed due to getting big while scrapes are failing.                                                                                                                        |
+| prometheus_target_scrapes_exceeded_body_size_limit_total | counter  | Total number of scrapes that hit the body size limit.                                                                                                                        |
+| prometheus_target_scrapes_exceeded_native_histogram_bucket_limit_total | counter  | Total number of scrapes that hit the native histogram bucket limit and were rejected.                                                                                                                        |
+| prometheus_target_scrapes_exceeded_sample_limit_total | counter  | Total number of scrapes that hit the sample limit and were rejected.                                                                                                                        |
+| prometheus_target_scrapes_exemplar_out_of_order_total | counter | Total number of exemplars rejected for being out of the expected order. |
+| prometheus_target_scrapes_sample_duplicate_timestamp_total | counter | Total number of samples rejected due to duplicate timestamps but different values. |
+| prometheus_target_scrapes_sample_out_of_bounds_total | counter | Total number of samples rejected due to their timestamp falling outside of the time bounds. |
+| prometheus_target_scrapes_sample_out_of_order_total | counter | Total number of samples rejected for being out of the expected order. |
+| prometheus_target_sync_failed_total | counter  | Total number of target sync failures.                                                                                                                        |
+| prometheus_target_sync_length_seconds | summary  | Actual interval to sync the scrape pool.                                                                                                                        |
+| prometheus_template_text_expansion_failures | counter  | The total number of template text expansion failures.                                                                                                                        |
+| prometheus_template_text_expansion_failures_total | counter  | The total number of template text expansion failures.                                                                                                                        |
+| prometheus_template_text_expansions | counter  | The total number of template text expansions.                                                                                                                        |
+| prometheus_template_text_expansions_total | counter  | The total number of template text expansions.                                                                                                                        |
+| prometheus_treecache_watcher_goroutines | gauge  | The current number of watcher goroutines.                                                                                                                        |
+| prometheus_treecache_zookeeper_failures_total | counter  | The total number of ZooKeeper failures.                                                                                                                        |
+| prometheus_tsdb_blocks_loaded | gauge  | Number of currently loaded data blocks.                                                                                                                        |
+| prometheus_tsdb_checkpoint_creations_failed_total | counter  | Total number of checkpoint creations that failed.                                                                                                                        |
+| prometheus_tsdb_checkpoint_creations_total | counter  | Total number of checkpoint creations attempted.                                                                                                                        |
+| prometheus_tsdb_checkpoint_deletions_failed_total | counter  | Total number of checkpoint deletions that failed.                                                                                                                        |
+| prometheus_tsdb_checkpoint_deletions_total | counter  | Total number of checkpoint deletions attempted.                                                                                                                        |
+| prometheus_tsdb_clean_start | gauge  | -1: lockfile is disabled. 0: a lockfile from a previous execution was replaced. 1: lockfile creation was clean.                                                                                                                        |
+| prometheus_tsdb_compaction_chunk_range_seconds | histogram  | Final time range of chunks on their first compaction.                                                                                                                        |
+| prometheus_tsdb_compaction_chunk_samples | histogram  | Final number of samples on their first compaction.                                                                                                                        |
+| prometheus_tsdb_compaction_chunk_size_bytes | histogram  | Final size of chunks on their first compaction.                                                                                                                        |
+| prometheus_tsdb_compaction_duration_seconds | histogram  | Duration of compaction runs.                                                                                                                        |
+| prometheus_tsdb_compaction_populating_block | gauge  | Set to 1 when a block is currently being written to the disk.                                                                                                                        |
+| prometheus_tsdb_compactions_failed_total | counter  | Total number of compactions that failed for the partition.                                                                                                                        |
+| prometheus_tsdb_compactions_skipped_total | counter  | Total number of skipped compactions due to disabled auto compaction.                                                                                                                        |
+| prometheus_tsdb_compactions_total | counter  | Total number of compactions that were executed for the partition.                                                                                                                        |
+| prometheus_tsdb_compactions_triggered_total | counter  | Total number of triggered compactions for the partition.                                                                                                                        |
+| prometheus_tsdb_data_replay_duration_seconds | gauge  | Time taken to replay the data on disk.                                                                                                                        |
+| prometheus_tsdb_exemplar_exemplars_appended_total | counter | Total number of appended exemplars.                                                                                                                        |
+| prometheus_tsdb_exemplar_exemplars_in_storage | gauge | Number of exemplars currently in circular storage.                                                                                                                        |
+| prometheus_tsdb_exemplar_last_exemplars_timestamp_seconds | gauge | The timestamp of the oldest exemplar stored in circular storage. Useful to check what time range the current exemplar buffer limit allows. In a typical setup this is the last timestamp for all exemplars; this does not hold if one series has timestamps in the future compared to the rest of the series. |
+| prometheus_tsdb_exemplar_max_exemplars | gauge | Total number of exemplars the exemplar storage can store; the storage is resizable. |
+| prometheus_tsdb_exemplar_out_of_order_exemplars_total | counter | Total number of failed attempts to ingest out-of-order exemplars. |
+| prometheus_tsdb_exemplar_series_with_exemplars_in_storage | gauge | Number of series with exemplars currently in circular storage.                                                                                                                        |
+| prometheus_tsdb_head_active_appenders | gauge | Number of currently active appender transactions.                                                                                                                        |
+| prometheus_tsdb_head_chunks | gauge | Total number of chunks in the head block.                                                                                                                        |
+| prometheus_tsdb_head_chunks_created_total | counter | Total number of chunks created in the head.                                                                                                                        |
+| prometheus_tsdb_head_chunks_removed_total | counter | Total number of chunks removed in the head.                                                                                                                        |
+| prometheus_tsdb_head_chunks_storage_size_bytes | gauge | Size of the chunks_head directory.                                                                                                                        |
+| prometheus_tsdb_head_gc_duration_seconds | summary | Runtime of garbage collection in the head block.                                                                                                                        |
+| prometheus_tsdb_head_max_time | gauge | Maximum timestamp of the head block. The unit is decided by the library consumer.                                                                                                                        |
+| prometheus_tsdb_head_max_time_seconds | gauge | Maximum timestamp of the head block.                                                                                                                        |
+| prometheus_tsdb_head_min_time | gauge | Minimum time bound of the head block. The unit is decided by the library consumer.                                                                                                                        |
+| prometheus_tsdb_head_min_time_seconds | gauge | Minimum time bound of the head block.                                                                                                                        |
+| prometheus_tsdb_head_out_of_order_samples_appended_total | counter | Total number of appended out of order samples.                                                                                                                        |
+| prometheus_tsdb_head_samples_appended_total | counter | Total number of appended samples.                                                                                                                        |
+| prometheus_tsdb_head_series | gauge | Total number of series in the head block.                                                                                                                        |
+| prometheus_tsdb_head_series_created_total | counter | Total number of series created in the head.                                                                                                                        |
+| prometheus_tsdb_head_series_not_found_total | counter | Total number of requests for series that were not found.                                                                                                                        |
+| prometheus_tsdb_head_series_removed_total | counter | Total number of series removed in the head.                                                                                                                        |
+| prometheus_tsdb_head_truncations_failed_total | counter | Total number of head truncations that failed.                                                                                                                        |
+| prometheus_tsdb_head_truncations_total | counter | Total number of head truncations attempted.                                                                                                                        |
+| prometheus_tsdb_isolation_high_watermark | gauge | The highest TSDB append ID that has been given out.                                                                                                                        |
+| prometheus_tsdb_isolation_low_watermark | gauge | The lowest TSDB append ID that is still referenced.                                                                                                                        |
+| prometheus_tsdb_lowest_timestamp | gauge | Lowest timestamp value stored in the database. The unit is decided by the library consumer.                                                                                                                        |
+| prometheus_tsdb_lowest_timestamp_seconds | gauge | Lowest timestamp value stored in the database.                                                                                                                        |
+| prometheus_tsdb_mmap_chunk_corruptions_total | counter | Total number of memory-mapped chunk corruptions.                                                                                                                        |
+| prometheus_tsdb_mmap_chunks_total | counter | Total number of chunks that were memory-mapped.                                                                                                                        |
+| prometheus_tsdb_out_of_bound_samples_total | counter | Total number of failed ingestion attempts for out-of-bounds samples while out-of-order support is disabled.                                                                                                                        |
+| prometheus_tsdb_out_of_order_samples_total | counter | Total number of failed ingestion attempts for out-of-order samples because out-of-order support is disabled.                                                                                                                        |
+| prometheus_tsdb_reloads_failures_total | counter | Number of times the database failed to reload block data from disk.                                                                                                                        |
+| prometheus_tsdb_reloads_total | counter | Number of times the database reloaded block data from disk.                                                                                                                        |
+| prometheus_tsdb_retention_limit_bytes | gauge | Maximum number of bytes to be retained in the TSDB blocks. A configured value of 0 means the limit is disabled.                                                                                                                        |
+| prometheus_tsdb_retention_limit_seconds | gauge | How long to retain samples in storage.                                                                                                                        |
+| prometheus_tsdb_size_retentions_total | counter | The number of times that blocks were deleted because the maximum number of bytes was exceeded.                                                                                                                        |
+| prometheus_tsdb_snapshot_replay_error_total | counter | Total number of snapshot replays that failed.                                                                                                                        |
+| prometheus_tsdb_storage_blocks_bytes | gauge | The number of bytes that are currently used for local storage by all blocks.                                                                                                                        |
+| prometheus_tsdb_symbol_table_size_bytes | gauge | Size of the in-memory symbol table for loaded blocks.                                                                                                                        |
+| prometheus_tsdb_time_retentions_total | counter | The number of times that blocks were deleted because the maximum time limit was exceeded.                                                                                                                        |
+| prometheus_tsdb_tombstone_cleanup_seconds | histogram | The time taken to recompact blocks to remove tombstones.                                                                                                                        |
+| prometheus_tsdb_too_old_samples_total | counter | Total number of failed ingestion attempts for out-of-order samples while out-of-order support is enabled but the sample falls outside the allowed time window.                                                                                                                        |
+| prometheus_tsdb_vertical_compactions_total | counter | Total number of compactions done on overlapping blocks.                                                                                                                        |
+| prometheus_tsdb_wal_completed_pages_total | counter | Total number of completed WAL pages.                                                                                                                        |
+| prometheus_tsdb_wal_corruptions_total | counter | Total number of WAL corruptions.                                                                                                                        |
+| prometheus_tsdb_wal_fsync_duration_seconds | summary | Duration of write log fsync.                                                                                                                        |
+| prometheus_tsdb_wal_page_flushes_total | counter | Total number of WAL page flushes.                                                                                                                        |
+| prometheus_tsdb_wal_segment_current | gauge | Write log segment index that TSDB is currently writing to.                                                                                                                        |
+| prometheus_tsdb_wal_storage_size_bytes | gauge | Size of the write log directory.                                                                                                                        |
+| prometheus_tsdb_wal_truncate_duration_seconds | summary | Duration of WAL truncation.                                                                                                                        |
+| prometheus_tsdb_wal_truncations_failed_total | counter | Total number of write log truncations that failed.                                                                                                                        |
+| prometheus_tsdb_wal_truncations_total | counter | Total number of write log truncations attempted.                                                                                                                        |
+| prometheus_tsdb_wal_writes_failed_total | counter | Total number of write log writes that failed.                                                                                                                        |
+| prometheus_web_federation_errors_total | counter | Total number of errors that occurred while sending federation responses.                                                                                                                        |
+| prometheus_web_federation_warnings_total | counter | Total number of warnings that occurred while sending federation responses.                                                                                                                        |