VGprof

VGprof is a modified version of gprof used for interpreting the files generated by the vgprof skin for Valgrind.  The output is essentially the same as for normal gprof usage, except that:
  1. any program can be profiled without recompiling;
  2. vgprof works with threaded programs;
  3. vgprof can profile within and between shared libraries, as well as the main program.

Getting vgprof

The patches against binutils-2.13 are available here.  Here is a pre-compiled version of vgprof.

Changes to gprof

New somap tag

The vgprof skin can include a new somap record in the vgmon.out file.  These records contain a mapping between a range of virtual addresses and a full path to a shared library.  This allows vgprof itself to extract symbolic information from the shared libraries.
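
As a rough illustration of how such records could be used, the sketch below resolves an address to the shared library covering it.  The SoMapRecord type, its field names, the half-open range convention and the example paths are all assumptions made for this example; the actual layout of a somap record is not described here.

```python
import bisect
from typing import List, NamedTuple, Optional


class SoMapRecord(NamedTuple):
    """Hypothetical in-memory form of one somap record."""
    start: int  # first virtual address covered (inclusive, assumed)
    end: int    # one past the last address covered (exclusive, assumed)
    path: str   # full path to the shared library


def find_object(records: List[SoMapRecord], addr: int) -> Optional[SoMapRecord]:
    """Return the record whose range contains addr, or None.

    records must be sorted by start address and must not overlap.
    """
    starts = [r.start for r in records]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and records[i].start <= addr < records[i].end:
        return records[i]
    return None


# Made-up mappings, for illustration only.
somap = [
    SoMapRecord(0x40000000, 0x40100000, "/lib/libc.so.6"),
    SoMapRecord(0x40200000, 0x40240000, "/usr/lib/libz.so.1"),
]
hit = find_object(somap, 0x40200010)
miss = find_object(somap, 0x40150000)
```

Once the owning library is known, its symbol table can be read from the recorded path to turn the address into a function name.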

Histograms over the whole address space

Standard gprof only allows histogram records to specify a single address range (that is, if more than one histogram record exists in the file, all must cover the same address range).  Since the vgmon.out files generated by the vgprof skin can contain information about code mapped all over the address space, they include multiple histogram records with addresses spread wherever samples were recorded.  I modified gprof to handle a sparse array of histogram samples covering the whole address space.
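
The idea can be sketched as a map keyed by bucket address, so that only buckets which actually received samples take up space.  This is a simplified model, not the data structure used in the modified gprof; the function name, the dictionary representation and the example addresses are assumptions.

```python
def add_histogram(sparse, low_pc, counts, bucket_bytes=16):
    """Fold one histogram record into a sparse {bucket address: count} map.

    low_pc is the address of the record's first bucket; counts holds one
    sample count per bucket of bucket_bytes bytes.
    """
    for index, count in enumerate(counts):
        if count:  # skip empty buckets so the map stays sparse
            addr = low_pc + index * bucket_bytes
            sparse[addr] = sparse.get(addr, 0) + count
    return sparse


samples = {}
add_histogram(samples, 0x08048000, [4, 0, 9])  # e.g. main program text
add_histogram(samples, 0x40012340, [0, 7])     # e.g. a library mapped far away
```

Two records covering unrelated address ranges end up in the same structure, which is what a single dense array indexed from one base address cannot do cheaply.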

Performance improvements with large numbers of symbols

Several algorithms in gprof were sped up (mostly with hash tables) to deal with large numbers of source files and symbols.  Unfortunately, some algorithms are still too slow, particularly those used for line-level profiling.

Invoking the vgprof skin

The basic form is:
valgrind --skin=vgprof program
By default, this will record function call graph edges and an instruction count in the histogram.  It will not record any basic-block level information or generate an somap.

Command line options are (defaults in bold):
--histo=yes | no
Include histogram information in the output file.
--histo-scale=16
How much code is accounted into each histogram bucket.  The default is 16 bytes per bucket.
--units=instructions | walltime | cputime
What is accumulated in the histogram.  The instructions unit counts x86 instructions executed, with a simple weighting scheme to account for their relative execution times (this is almost completely bogus).  The walltime unit counts real time, in milliseconds.  Since this doesn't count across system calls (at the moment), it doesn't account for any time spent blocked, and is therefore much the same as cputime (which isn't implemented at all yet).
--unit-scale=auto | N
The scale factor applied to each histogram sample before it is written to the file.  One limitation of the gmon.out file format is that histograms are only 16 bits per bucket, whereas the counts generated by vgprof are often considerably larger; scaling the counts as the file is generated prevents information loss due to clipping.  The default, auto, determines a power-of-10 scaling factor which prevents any bucket from overflowing.  This is only computed the first time a profile output is generated; subsequent profiles use the same scale factor, even if this would lead to overflow.
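
A minimal sketch of how an auto scale factor of this kind could be chosen is shown below.  It assumes unsigned 16-bit buckets, truncating division and clipping at the maximum value; the function names are hypothetical and the skin's actual rounding behaviour is not documented here.

```python
MAX_BUCKET = 0xFFFF  # largest value an unsigned 16-bit bucket can hold (assumed)


def auto_unit_scale(counts):
    """Smallest power of 10 that brings the largest count within 16 bits."""
    peak = max(counts, default=0)
    scale = 1
    while peak // scale > MAX_BUCKET:
        scale *= 10
    return scale


def scale_counts(counts, scale):
    """Apply the scale factor, clipping anything that still overflows."""
    return [min(c // scale, MAX_BUCKET) for c in counts]


first_profile = [7_000_000, 120_000, 999]
scale = auto_unit_scale(first_profile)      # chosen once, on the first dump
first_written = scale_counts(first_profile, scale)

# A later profile reuses the same factor, so a much larger count is clipped.
later_written = scale_counts([100_000_000], scale)
```

The last line shows the caveat described above: because the factor is fixed after the first dump, a later and much larger count no longer fits and is clipped.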
--bbcount=yes | no
Include a count of each basic block's usage.
--call-graph=yes | no
Include edges in the control flow of the program (the resolution is set by --calls-only).
--calls-only=yes | no
Include only function calls in the call graph.  Otherwise, the call-graph will contain every edge from basic-block to basic-block.
--text-only=yes | no
Only instrument code in the text segment of an object.  This excludes basic blocks which are part of the dynamic linker's mechanism, which only confuse the output (running the program with LD_BIND_NOW set also helps).
--somap=yes | no
Include a set of somap records in the output.
--prof-output=vgmon.out
Sets the output file name. The actual name used is vgmon.out.pid[.count], where count is appended if more than one profile file is generated.
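
The naming scheme can be sketched as follows.  The helper name is hypothetical, and the value at which count starts is not stated here, so the caller supplies it.

```python
import os


def profile_name(base="vgmon.out", pid=None, count=None):
    """Build a name of the form base.pid[.count]."""
    pid = os.getpid() if pid is None else pid
    name = f"{base}.{pid}"
    if count is not None:  # only appended when several files are written
        name += f".{count}"
    return name


single = profile_name(pid=1234)
numbered = profile_name("run", 1234, 2)
```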

Using vgprof to view results

The basic usage is:
vgprof exe vgmon.out [vgmon.out...]
If multiple vgmon.out files are specified, they are added together and treated as one.
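
Conceptually, adding profiles together means summing matching counters.  The sketch below models each profile as a pair of maps (histogram buckets and call-graph arcs); this representation, the function name and the example addresses are assumptions for illustration, not vgprof's internal format.

```python
from collections import Counter


def merge_profiles(profiles):
    """Sum histogram buckets and call-graph arc counts across profiles.

    Each profile is a (histogram, arcs) pair, where histogram maps a bucket
    address to a sample count and arcs maps (from_pc, to_pc) to a call count.
    """
    total_hist, total_arcs = Counter(), Counter()
    for hist, arcs in profiles:
        total_hist.update(hist)  # Counter.update adds counts
        total_arcs.update(arcs)
    return total_hist, total_arcs


run1 = ({0x08048000: 4}, {(0x08048010, 0x08048200): 1})
run2 = ({0x08048000: 6, 0x08048020: 2}, {(0x08048010, 0x08048200): 3})
hist, arcs = merge_profiles([run1, run2])
```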

The only new command line option is:
--object-path=path:path
This is the path searched when looking up objects listed in an somap record.  This allows profiles to be viewed from a machine other than the one which recorded the profile, while still allowing symbols to be extracted over NFS.
All the options which write out summary gmon.out files are currently disabled.

VGprof skin client requests

VGprof implements a couple of client requests.  These are:
VALGRIND_DUMP_PROFILE(zero)
This dumps out a snapshot of the current profiling information.  If zero is true, then the counters are atomically set to zero after writing out the file.
VALGRIND_ZERO_STATS()
This zeros the current counters.  This is useful for timing a specific piece of code.
vgprof will take multiple vgmon.out files and accumulate the results, so repeatedly using VALGRIND_DUMP_PROFILE with zero set doesn't actually lose any information.
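
Why no information is lost can be seen in a small model of the dump-and-zero behaviour.  This is a Python simulation of the semantics described above, not the client request macros themselves; the Profiler class and its methods are hypothetical.

```python
from collections import Counter


class Profiler:
    """Toy model of the skin's counters and its dump/zero requests."""

    def __init__(self):
        self.counters = Counter()
        self.dumps = []  # stands in for the vgmon.out files written so far

    def sample(self, bucket, n=1):
        self.counters[bucket] += n

    def dump_profile(self, zero):
        # Write a snapshot, then optionally reset the counters.
        self.dumps.append(Counter(self.counters))
        if zero:
            self.counters.clear()


p = Profiler()
p.sample("f", 3)
p.dump_profile(zero=True)
p.sample("f", 2)
p.sample("g", 5)
p.dump_profile(zero=True)

# Accumulating every dump recovers the totals for the whole run.
merged = sum(p.dumps, Counter())
```

Each snapshot covers only the interval since the previous dump, so the snapshots partition the run and their sum equals what a single dump at exit would have contained.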