Performance and build accuracy

Based on experience with tools such as SCons, users may be concerned about performance and assume that no build system based on an interpreted language such as Python can scale. This section describes why this is not the case for Waf and why Waf should be chosen for building very large projects.

Comparing Waf against other build systems

Because Waf takes file contents into account during the build, it is often assumed to be much slower than Make. For a test project of 5000 files (generated by the script located in tools/genbench.py) on a 1.5 GHz computer, the Waf runtime is actually slightly faster than that of GNU Make (less than one second). The reason is the time needed to launch new processes: make is usually called recursively, once per directory.

For huge projects, calling make recursively is necessary for flexibility, but it hurts performance (many processes must be launched) and CPU utilization (fewer tasks can run in parallel). Make-based build systems such as CMake or Autotools inherit the limitations of Make.

Although Waf uses a design similar to that of SCons, it is about 15 times faster for similar features, without sacrificing build accuracy. The main reasons are the following:

  • The Waf data structures (file system representation, tasks) have been carefully chosen to minimize memory usage and data duplication
  • For a project of the same size, SCons requires at least 10 times as many function calls

A few benchmarks are maintained at this location

Waf hashing schemes and build accuracy

To rebuild targets when source files change, the file contents are hashed and compared. The hashes are used to identify the tasks and to retrieve files from a cache (a folder defined by the environment variable WAFCACHE). Besides command lines, this scheme also takes file dependencies into account, which makes it more accurate than caching systems such as ccache.
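The idea of a signature that covers both the command line and the dependency contents can be illustrated with a minimal sketch. The function names (hash_file, task_signature, needs_rebuild) are hypothetical and are not part of the Waf API; the sketch only assumes that a task is described by a command string and a list of dependency paths.

```python
import hashlib


def hash_file(path, chunk_size=65536):
    """Return the MD5 digest of a file's contents, read in chunks."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            m.update(chunk)
    return m.digest()


def task_signature(command, dependencies):
    """Combine the command line and the content hash of each dependency.

    Hypothetical helper: any change to the command or to the contents
    of a dependency yields a different signature.
    """
    m = hashlib.md5()
    m.update(command.encode("utf-8"))
    for path in sorted(dependencies):
        m.update(hash_file(path))
    return m.hexdigest()


def needs_rebuild(command, dependencies, previous_signature):
    """A task is rebuilt when its signature differs from the stored one."""
    return task_signature(command, dependencies) != previous_signature
```

A signature of this kind could also serve as a lookup key in a cache folder, since two tasks with the same command and the same input contents would be expected to produce the same outputs.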

The Waf hashing scheme uses the MD5 algorithm provided by the Python distribution. It is fast enough for up to about 100 MB of data and about 10000 files, and very safe (virtually no risk of collision).

If the project contains more than 100 MB of data, it may be necessary to use a faster hashing algorithm. An implementation of the FNV algorithm is present in the Waf distribution and can replace MD5 without significantly degrading accuracy.
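For readers unfamiliar with FNV, the following is a pure-Python sketch of the 64-bit FNV-1a variant. The choice of variant and width is an assumption made for illustration, and this is not the implementation shipped with Waf. A byte-by-byte loop in Python is also unlikely to outperform the MD5 of the standard library, so the sketch shows the algorithm rather than the speed gain.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data, h=FNV64_OFFSET_BASIS):
    """Return the 64-bit FNV-1a hash of a bytes object.

    The optional h argument allows hashing to continue from a
    previous value, so large inputs can be processed in chunks.
    """
    for byte in data:
        h ^= byte                      # mix in the next byte
        h = (h * FNV64_PRIME) & MASK64  # multiply, keep 64 bits
    return h


def fnv1a_64_file(path, chunk_size=65536):
    """Hash the contents of a file in chunks (hypothetical helper)."""
    h = FNV64_OFFSET_BASIS
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h = fnv1a_64(chunk, h)
    return h
```

FNV produces a much shorter digest than MD5, which is why the risk of collision, although still small for a typical project, is somewhat higher.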

If the project contains more than 10000 files, it may be necessary to replace the hashing system with a scheme that hashes the file name, size and timestamp. An example is provided in the comment section of the module Utils.py. That scheme is more efficient but less accurate: the Waf cache should not be used with it.
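A scheme of this kind might look like the following sketch, which is independent of the example in Utils.py. The function name hash_file_stat is hypothetical; the sketch assumes that the file name, the size and the modification time reported by os.stat are enough to detect most changes.

```python
import hashlib
import os


def hash_file_stat(path):
    """Hash the file name, size and modification time.

    The file contents are never read, so this is cheap, but a change
    that preserves both size and timestamp goes undetected.
    """
    st = os.stat(path)
    m = hashlib.md5()
    m.update(os.fsencode(path))
    m.update(str(st.st_size).encode("ascii"))
    m.update(str(st.st_mtime_ns).encode("ascii"))
    return m.digest()
```

Because such a hash depends on the path and on the timestamp rather than on the contents, identical files in two different checkouts get different hashes, and a modified file can keep its old hash. This is consistent with the advice not to combine the scheme with the cache.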