node_exporter/vendor/github.com/hodgesds/perf-utils/README.md

121 lines
4.3 KiB
Markdown
Raw Normal View History

# Perf
[![GoDoc](https://godoc.org/github.com/hodgesds/perf-utils?status.svg)](https://godoc.org/github.com/hodgesds/perf-utils)
This package is a go library for interacting with the `perf` subsystem in
Linux. It allows you to do things like see how many CPU instructions a function
takes, profile a process for various hardware events, and other interesting
things. The library is by no means finalized and should be considered pre-alpha
at best.
# Use Cases
A majority of the utility methods in this package should only be used for
testing and/or debugging performance issues. Due to the nature of the go
runtime profiling on the goroutine level is extremely tricky, with the
exception of a long running worker goroutine locked to an OS thread. Eventually
this library could be used to implement many of the features of `perf` but in
accessible via Go directly.
## Caveats
* Some utility functions will call
[`runtime.LockOSThread`](https://golang.org/pkg/runtime/#LockOSThread) for
you, they will also unlock the thread after profiling. ***Note*** using these
utility functions will incur significant overhead.
* Overflow handling is not implemented.
# Setup
Most likely you will need to tweak some system settings unless you are running as root. From `man perf_event_open`:
```
perf_event related configuration files
Files in /proc/sys/kernel/
/proc/sys/kernel/perf_event_paranoid
The perf_event_paranoid file can be set to restrict access to the performance counters.
2 allow only user-space measurements (default since Linux 4.6).
1 allow both kernel and user measurements (default before Linux 4.6).
0 allow access to CPU-specific data but not raw tracepoint samples.
-1 no restrictions.
The existence of the perf_event_paranoid file is the official method for determining if a kernel supports perf_event_open().
/proc/sys/kernel/perf_event_max_sample_rate
This sets the maximum sample rate. Setting this too high can allow users to sample at a rate that impacts overall machine performance and potentially lock up the machine. The default value is 100000 (samples per
second).
/proc/sys/kernel/perf_event_max_stack
This file sets the maximum depth of stack frame entries reported when generating a call trace.
/proc/sys/kernel/perf_event_mlock_kb
Maximum number of pages an unprivileged user can mlock(2). The default is 516 (kB).
```
# Example
Say you wanted to see how many CPU instructions a particular function took:
```
package main
import (
"fmt"
"log"
"github.com/hodgesds/perf-utils"
)
func foo() error {
var total int
for i:=0;i<1000;i++ {
total++
}
return nil
}
func main() {
profileValue, err := perf.CPUInstructions(foo)
if err != nil {
log.Fatal(err)
}
fmt.Printf("CPU instructions: %+v\n", profileValue)
}
```
# Benchmarks
To profile a single function call there is an overhead of ~0.4ms.
```
$ go test -bench=BenchmarkCPUCycles .
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkCPUCycles-8 3000 397924 ns/op 32 B/op 1 allocs/op
PASS
ok github.com/hodgesds/perf-utils 1.255s
```
The `Profiler` interface has low overhead and suitable for many use cases:
```
$ go test -bench=BenchmarkProfiler .
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkProfiler-8 3000000 488 ns/op 32 B/op 1 allocs/op
PASS
ok github.com/hodgesds/perf-utils 1.981s
```
# BPF Support
BPF is supported by using the `BPFProfiler` which is available via the
`ProfileTracepoint` function. To use BPF you need to create the BPF program and
then call `AttachBPF` with the file descriptor of the BPF program. This is not
well tested so use at your own peril.
# Misc
Originally I set out to use `go generate` to build Go structs that were
compatible with perf, I found a really good
[article](https://utcc.utoronto.ca/~cks/space/blog/programming/GoCGoCompatibleStructs)
on how to do so. Eventually, after digging through some of the `/x/sys/unix`
code I found pretty much what I was needed. However, I think if you are
interested in interacting with the kernel it is a worthwhile read.