Chapter 1. Introduction

When you develop a program, one of the last steps is usually to make it as fast as possible (while keeping it correct!). You don't want to waste your time optimizing functions that are rarely used, so you need to know in which parts of your program most of the time is spent.

This is called profiling. The program runs under the control of a profiling tool, which produces a trace of the execution. After examining the trace, you will usually know where to optimize; afterwards, you verify the success of the optimization with another profiling run.

The best-known profiling tool is the GNU profiler gprof, used together with GCC: you compile your program with the option -pg, and running it generates a file gmon.out, which gprof transforms into human-readable form. The disadvantage is the extra compilation step needed to prepare the executable, which has to be statically linked.

Another profiling tool is Cachegrind, part of Valgrind. It uses Valgrind's processor emulation to run the executable and intercepts all memory accesses for the trace. The program does not need to be recompiled; it can use shared libraries and plugins, and the measurement itself does not influence the results. The trace includes the number of instruction and data memory accesses and of first- and second-level cache misses, and relates them to the source lines and functions of the profiled program. A disadvantage is the slowdown caused by the processor emulation: unfortunately, the program runs around 50 times slower.

A patch for the Valgrind 1.0.x sources adds call tree tracing, i.e. it records how the functions call each other and how many events happen while a function runs (including all the functions it calls).

KCachegrind is a visualization tool for the profiling data generated by Cachegrind. Adding call tree tracing support to Cachegrind is strongly recommended, because KCachegrind is much more useful with it.