...

make bench PKG=./pkg/bar BENCHES=BenchmarkFoo TESTFLAGS=-cpuprofile=/tmp/cpu.out.

Then you can run go tool pprof ./bar.test /tmp/cpu.out to jump into a CPU profile (or whatever other profile you are looking for) of that benchmark run. There is also a -memprofile flag for allocations.
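
If you are curious what -cpuprofile roughly does for you, here is a minimal, self-contained sketch that profiles a benchmark by hand using runtime/pprof and testing.Benchmark. It is not how make bench is implemented. The fib workload and the profileBenchmark helper are made up for illustration; only the names BenchmarkFoo and /tmp/cpu.out are borrowed from the command above.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"testing"
)

// fib is a deliberately slow, hypothetical workload to profile.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// BenchmarkFoo has the usual benchmark shape: run the workload b.N times.
func BenchmarkFoo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fib(20)
	}
}

// profileBenchmark (hypothetical helper) runs bench with CPU profiling
// enabled and writes the profile to path.
func profileBenchmark(path string, bench func(*testing.B)) (testing.BenchmarkResult, error) {
	f, err := os.Create(path)
	if err != nil {
		return testing.BenchmarkResult{}, err
	}
	// Deferred calls run last-in first-out, so the profile is stopped
	// (and flushed) before the file is closed.
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return testing.BenchmarkResult{}, err
	}
	defer pprof.StopCPUProfile()
	return testing.Benchmark(bench), nil
}

func main() {
	res, err := profileBenchmark("/tmp/cpu.out", BenchmarkFoo)
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	// Prints the iteration count and ns/op, like a normal benchmark line.
	fmt.Println(res)
}
```

The resulting file can be opened with go tool pprof in the same way as one produced by the -cpuprofile flag.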

Which profile to use when?

...

  • web: the most useful tool. It fires up your browser with an SVG image of the profiler's current view. The variables listed below (the kind written x=y) manipulate the state of pprof, and each subsequent call to web draws a new image reflecting the new view.
  • pdf: like web, but generate a PDF instead of an SVG. Useful for uploading damning evidence to GitHub, which does not allow SVG uploads.
  • top<N>: shows you the top N functions (by whatever metric you are currently profiling). Some people use this, but I find it confusing and prefer to look at the web view.
  • list: outputs annotated source for functions matching regexp. This is great for seeing where in a function resources are being used.
  • weblist: like list, but opens up in a browser for easier viewing.
  • call_tree: by default, pprof draws a directed acyclic graph. This can get a little messy to understand, particularly if a hot node in the graph has multiple callers and you care which of them it is being called from. call_tree draws a tree instead (so child nodes from different callers are separated into different subtrees).
  • focus=foo: focus on functions named foo (i.e. draw a root node for each function that matches the regexp foo).
  • nodefraction=xyz: by default, pprof tries to intelligently "prune" the tree to only show a subset of nodes so that the graph is manageable. But sometimes this is too aggressive, and you want to relax it/make your browser suffer. Do note that on each SVG/PDF, the legend box shows the current nodefraction cutoff (under "Dropped nodes (cum <=xyzs)").
  • cum: set pprof into "cumulative" mode. By default pprof starts in flat mode, which shows how much time was spent in that function, whereas cum shows how much time was spent in that function including all children. Note that cum/flat just sets pprof into cumulative/flat mode, which affects subsequent calls to top10 and web, etc.
  • edgefraction, nodecount: like nodefraction, different ways to prune the graph.
  • peek foo: shows info about time spent in the callers and callees of functions matching foo.
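
To make the flat/cum distinction above concrete, here is a small toy program you can profile yourself. Everything in it (parent, child, writeProfile, the /tmp/flatcum.out path, the loop sizes) is hypothetical and chosen only for illustration. The point is the shape: parent does a little work of its own and then calls child, so parent's flat number covers only its own loop, while its cum number also includes the time spent in child.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// child does all of its work itself, so its flat and cum times coincide.
//
//go:noinline
func child(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// parent does some work of its own (counted in both flat and cum) and
// then calls child (counted for parent only in cum).
//
//go:noinline
func parent(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 3
	}
	return sum + child(4*n)
}

// writeProfile (hypothetical helper) runs work under the CPU profiler
// and writes the profile to path.
func writeProfile(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	total := 0
	err := writeProfile("/tmp/flatcum.out", func() {
		// Repeat the call so the sampling profiler has something to see.
		for i := 0; i < 200; i++ {
			total += parent(1000000)
		}
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	// Print the result so the compiler cannot discard the work.
	fmt.Println("total:", total)
}
```

After building and running this, open /tmp/flatcum.out with go tool pprof and try top, peek parent and list parent, then switch between flat and cum to see how the ordering and the numbers for parent change.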

What's this cgocall that takes up all my CPU?

...