Overview
Example that uses parallel_while to do parallel preorder traversal of a sparse graph.
Each vertex in the graph is called a "cell".
Each cell has a value.
The value is a matrix.
Some of the cells have operators
that compute the cell's value, using other cell's values as input.
A cell that uses the value of cell x is called a successor of x.
The algorithm works as follows.
- Compute the set of cells that have no inputs. This set is called root_set.
- Each cell has an associated field ref_count that is an atomic integer.
Initialize ref_count to the number of inputs for the Cell.
- Update each cell in root_set, by applying a parallel_while to a stream
that iterates over root_set
- After updating a cell, for each of its successors
- Atomically decrement the successor's ref_count
- If the count became zero, add the cell to the set of cells to be updated,
by calling parallel_while::add.
The times printed are for the traversal and update,
and do not include time for computing the root_set.
NOTE: It is important to understand that this example is unlikely to show speedup
if the cell values are changed to type "float". The reason is twofold.
- The smaller value type causes each Cell to be significantly smaller than a cache line,
which leads to false sharing conflicts.
- The time to update the cells becomes very small, and consequently the overhead of
parallel_while swamps the useful work.
Files
- parallel_preorder.cpp
- Source code for example.
- Graph.cpp
- Source code for example.
- Graph.h
- Source code for example.
- Matrix.h
- Source code for example.
- Makefile
- Makefile for building example.
Directories
- vc7.1
- Contains Microsoft* Visual Studio* .NET 2003 workspace for building and running the example.
- vc8
- Contains Microsoft* Visual Studio* 2005 workspace for building and running the example.
- xcode
- Contains Xcode* IDE workspace for building and running the example.
To Build
General build directions can be found here.
Usage
- parallel_preorder [M[:N] [Rounds ['pause']]]
- M and N are a range of numbers of threads to be used.
- Rounds is the number of rounds the example runs internally. Default value
is 50; reduce it to shorten example run time.
- If 'pause' is specified, the application will wait for a user to hit return before it exits.
- To run a short version of this example, e.g., for use with Intel® Threading Tools:
- Build a debug version of the example
(see the build directions).
Run it with the desired number of threads and smaller number of rounds, e.g., parallel_preorder 4 5.
Up to parent directory
Copyright © 2005-2007 Intel Corporation. All Rights Reserved.
Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
registered trademarks or trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.