Simplify the Getting Started documentation. (#7193)

- Lower the barrier to entry for gathering metrics with Prometheus by
  suggesting pre-built exporter binaries instead of requiring the reader
  to install an entire Go toolchain and check out a project (a sketch of
  the new workflow follows below).
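
For reference, the pre-built-binary workflow the updated page now describes looks roughly like this. The release version and platform in the URL are illustrative placeholders; pick the current one from https://github.com/prometheus/node_exporter/releases:

```bash
# Download and unpack a pre-built node_exporter release.
# v1.0.0 / linux-amd64 are placeholders; substitute the latest release
# and your platform from the node_exporter releases page.
wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
tar -xzvf node_exporter-1.0.0.linux-amd64.tar.gz
cd node_exporter-1.0.0.linux-amd64

# Run it; metrics are exposed on port 9100 by default.
./node_exporter
```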

Fix #6956

Signed-off-by: Harold Dost <h.dost@criteo.com>
Harold Dost 2020-05-04 12:49:45 +02:00 committed by GitHub
parent 7ecd2d1c24
commit 0e2004f6fb

@@ -118,7 +118,7 @@ For more about the expression language, see the
 To graph expressions, navigate to http://localhost:9090/graph and use the "Graph"
 tab.
 
 For example, enter the following expression to graph the per-second rate of chunks
 being created in the self-scraped Prometheus:
 
 ```
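
Note: the PromQL expression is truncated at the end of this hunk; in the published page it is almost certainly `rate(prometheus_tsdb_head_chunks_created_total[1m])`. Such expressions can also be evaluated outside the graph UI through Prometheus' HTTP query API; a minimal sketch, assuming a self-scraping Prometheus on localhost:9090:

```bash
# Evaluate a PromQL expression via the HTTP API rather than the graph tab.
# Assumes a Prometheus server scraping itself on localhost:9090.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_tsdb_head_chunks_created_total[1m])'
```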
@@ -132,36 +132,26 @@ Experiment with the graph range parameters and other settings.
 Let us make this more interesting and start some example targets for Prometheus
 to scrape.
 
-The Go client library includes an example which exports fictional RPC latencies
-for three services with different latency distributions.
-
-Ensure you have the [Go compiler installed](https://golang.org/doc/install) and
-have a [working Go build environment](https://golang.org/doc/code.html) (with
-correct `GOPATH`) set up.
-
-Download the Go client library for Prometheus and run three of these example
-processes:
+The Node Exporter is used as an example target; for more information on using
+it, [see these instructions](https://prometheus.io/docs/guides/node-exporter/).
 
 ```bash
-# Fetch the client library code and compile example.
-git clone https://github.com/prometheus/client_golang.git
-cd client_golang/examples/random
-go get -d
-go build
+tar -xzvf node_exporter-*.*.tar.gz
+cd node_exporter-*.*
 
 # Start 3 example targets in separate terminals:
-./random -listen-address=:8080
-./random -listen-address=:8081
-./random -listen-address=:8082
+./node_exporter --web.listen-address 127.0.0.1:8080
+./node_exporter --web.listen-address 127.0.0.1:8081
+./node_exporter --web.listen-address 127.0.0.1:8082
 ```
 
 You should now have example targets listening on http://localhost:8080/metrics,
 http://localhost:8081/metrics, and http://localhost:8082/metrics.
 
-## Configuring Prometheus to monitor the sample targets
+## Configure Prometheus to monitor the sample targets
 
 Now we will configure Prometheus to scrape these new targets. Let's group all
-three endpoints into one job called `example-random`. However, imagine that the
+three endpoints into one job called `node`. However, imagine that the
 first two endpoints are production targets, while the third one represents a
 canary instance. To model this in Prometheus, we can add several groups of
 endpoints to a single job, adding extra labels to each group of targets. In
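
Note: before wiring the three targets into Prometheus, it is worth confirming they are actually serving metrics; a quick sketch:

```bash
# Each node_exporter instance should answer on its /metrics endpoint.
for port in 8080 8081 8082; do
  curl -s "http://localhost:${port}/metrics" | head -n 3
done
```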
@@ -173,7 +163,7 @@ section in your `prometheus.yml` and restart your Prometheus instance:
 ```yaml
 scrape_configs:
-  - job_name: 'example-random'
+  - job_name: 'node'
 
     # Override the global default and scrape targets from this job every 5 seconds.
     scrape_interval: 5s
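
Note: the hunk shows only the first lines of the `node` job. Below is a sketch of the complete stanza the surrounding prose describes, with the production/canary split expressed as a `group` target label (the label name follows the docs' convention; the append assumes `scrape_configs` is the last section of `prometheus.yml`):

```bash
# Append the full 'node' job under scrape_configs in prometheus.yml.
# Assumes scrape_configs is the final top-level section of the file.
cat >> prometheus.yml <<'EOF'
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
EOF
```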
@@ -189,8 +179,7 @@ scrape_configs:
 ```
 
 Go to the expression browser and verify that Prometheus now has information
-about time series that these example endpoints expose, such as the
-`rpc_durations_seconds` metric.
+about time series that these example endpoints expose, such as `node_cpu_seconds_total`.
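
Note: the same check works from the command line through the query API; a sketch, assuming Prometheus on localhost:9090:

```bash
# Confirm node_cpu_seconds_total is being ingested from the new targets.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=node_cpu_seconds_total'
```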
 ## Configure rules for aggregating scraped data into new time series
@@ -198,27 +187,26 @@ Though not a problem in our example, queries that aggregate over thousands of
 time series can get slow when computed ad-hoc. To make this more efficient,
 Prometheus allows you to prerecord expressions into completely new persisted
 time series via configured recording rules. Let's say we are interested in
-recording the per-second rate of example RPCs
-(`rpc_durations_seconds_count`) averaged over all instances (but
-preserving the `job` and `service` dimensions) as measured over a window of 5
-minutes. We could write this as:
+recording the per-second rate of CPU time (`node_cpu_seconds_total`) averaged
+over all CPUs per instance (but preserving the `job`, `instance` and `mode`
+dimensions) as measured over a window of 5 minutes. We could write this as:
 
 ```
-avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
+avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
 ```
 
 Try graphing this expression.
 
 To record the time series resulting from this expression into a new metric
-called `job_service:rpc_durations_seconds_count:avg_rate5m`, create a file
+called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file
 with the following recording rule and save it as `prometheus.rules.yml`:
 
 ```
 groups:
-  - name: example
+  - name: cpu-node
     rules:
-      - record: job_service:rpc_durations_seconds_count:avg_rate5m
-        expr: avg(rate(rpc_durations_seconds_count[5m])) by (job, service)
+      - record: job_instance_mode:node_cpu_seconds:avg_rate5m
+        expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
 ```
 
 To make Prometheus pick up this new rule, add a `rule_files` statement in your `prometheus.yml`. The config should now
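
Note: the rule file can be validated before it is wired in. `promtool` ships in the Prometheus release tarball; the append below assumes `prometheus.yml` has no `rule_files` section yet:

```bash
# Check the recording rule file for syntax errors.
./promtool check rules prometheus.rules.yml

# Reference it from prometheus.yml (assumes no rule_files section exists yet).
cat >> prometheus.yml <<'EOF'
rule_files:
  - 'prometheus.rules.yml'
EOF
```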
@@ -245,7 +233,7 @@ scrape_configs:
     static_configs:
       - targets: ['localhost:9090']
 
-  - job_name: 'example-random'
+  - job_name: 'node'
 
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
@@ -261,5 +249,5 @@ scrape_configs:
 ```
 
 Restart Prometheus with the new configuration and verify that a new time series
-with the metric name `job_service:rpc_durations_seconds_count:avg_rate5m`
+with the metric name `job_instance_mode:node_cpu_seconds:avg_rate5m`
 is now available by querying it through the expression browser or graphing it.
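
Note: instead of a full restart, a Prometheus started with `--web.enable-lifecycle` can reload its configuration in place, and the recorded series can be checked over the API; a sketch:

```bash
# Reload the configuration without restarting
# (requires Prometheus to have been started with --web.enable-lifecycle).
curl -X POST http://localhost:9090/-/reload

# Verify the recorded series now exists.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=job_instance_mode:node_cpu_seconds:avg_rate5m'
```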