Zoltan:
Parallel Partitioning, Load Balancing and Data-Management Services
Frequently Asked Questions
- What does the following message mean during
compilation of Zoltan?
Makefile:28: mem.d: No such file or directory
- On some platforms, why do Zoltan partitioning
methods RCB and RIB use an increasing amount of memory over multiple
invocations?
- Why does compilation of the Fortran90 driver zfdrive fail in fdr_const.f90?
- During runs (particularly on RedStorm), MPI
reports that it is out of resources or too many messages have been posted.
What does this mean and what can I do?
- What does the following message mean during
compilation of Zoltan?
Makefile:28: mem.d: No such file or directory
Every time Zoltan is built, gmake looks for a dependency file filename.d
for each source file filename.c. The first time Zoltan is built for a
given platform, the dependency files do not exist. The dependency files are
also removed by "gmake clean". The message is a harmless warning: after
printing it, gmake creates the dependency files it needs and continues
compilation.
- On some platforms, why do Zoltan partitioning
methods RCB and RIB use an increasing amount of memory over multiple
invocations?
Zoltan partitioning methods RCB and RIB use MPI_Comm_dup and MPI_Comm_split
to recursively create communicators with subsets of processors.
Some implementations of
MPI (e.g., the default MPI on Sandia's Thunderbird cluster) do not correctly
release memory associated with these communicators during MPI_Comm_free,
resulting in growing memory use over multiple invocations of RCB or RIB.
An undocumented workaround in
Zoltan is to set the TFLOPS_SPECIAL parameter to 1 (e.g.,
Zoltan_Set_Param(zz,"TFLOPS_SPECIAL","1");), which selects an
implementation that does not use MPI_Comm_split.
- Why does compilation of the Fortran90 driver zfdrive fail in fdr_const.f90?
The Fortran90 driver zfdrive uses user-defined data types for a mesh
data structure. It passes these data types to the Zoltan query functions
through the void *data argument. Strict type checking in Fortran90
requires that the query interface have these user-defined data types compiled
into the interface. The solution is to replace the default
zoltan_user_data.f90 in Zoltan/fort with a link to the driver's version
and rebuild:
cd Zoltan/fort
mv zoltan_user_data.f90 zoltan_user_data.f90.zoltan
ln -s ../fdriver/zoltan_user_data.f90 .
cd ..
touch fdriver/zoltan_user_data.f90 fort/zoltan_user_data.f90
gmake zfdrive
See the Fortran90 API
description in the User's Guide
and instructions for using zfdrive
in the Developer's Guide for more details.
- During runs (particularly on RedStorm), MPI
reports that it is out of resources or too many messages have been posted.
What does this mean and what can I do?
Some implementations of MPI (including RedStorm's implementation) limit
the number of message receives that can be posted simultaneously. Some
communications in Zoltan (including hashing of IDs to processors in the
Zoltan Distributed Data Directory) can require messages from large numbers
of processors, triggering this error on certain platforms.
To avoid this problem, Zoltan contains logic to use AllToAll communication
instead of point-to-point communication when a large number
of receives is needed. The maximum number of simultaneous receives allowed
can be set as a compile-time option to Zoltan.
In the native Zoltan
build environment, add -DMPI_RECV_LIMIT=# to the
DEFS line of zoltan/src/Utilities/Config/Config.<platform>,
where # is the maximum number of simultaneous receives allowed.
In the Autotools build
environment, the option --enable-mpi-recv-limit=# sets the
maximum number of simultaneous receives allowed. The default value is 2000.
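The effect of the compile-time limit can be pictured with a minimal C sketch. This is not Zoltan's source code: the names comm_strategy and choose_comm_strategy are hypothetical, and the assumption that the switch happens when the number of receives exceeds the limit (rather than when it equals the limit) is ours. The sketch only shows how a -DMPI_RECV_LIMIT=# definition would override a built-in default and how such a limit could drive the choice between point-to-point and AllToAll communication.

```c
#include <assert.h>

/* Default used when the build does not pass -DMPI_RECV_LIMIT=#.
   2000 is the default value stated in this FAQ. */
#ifndef MPI_RECV_LIMIT
#define MPI_RECV_LIMIT 2000
#endif

/* Hypothetical names for illustration only; not Zoltan's internal API. */
typedef enum { COMM_POINT_TO_POINT, COMM_ALL_TO_ALL } comm_strategy;

/* Pick a strategy from the number of receives a process would post.
   Assumption: AllToAll is used only when nrecvs exceeds the limit. */
static comm_strategy choose_comm_strategy(int nrecvs, int recv_limit)
{
    assert(nrecvs >= 0 && recv_limit > 0);
    return (nrecvs > recv_limit) ? COMM_ALL_TO_ALL : COMM_POINT_TO_POINT;
}
```

With the default limit, choose_comm_strategy(10, MPI_RECV_LIMIT) selects point-to-point communication, while a process expecting 2001 receives would be switched to AllToAll. Compiling the sketch with -DMPI_RECV_LIMIT=500 lowers the threshold without changing the code.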
Copyright (c) 2000-2007, Sandia National Laboratories.
The Zoltan Library and its documentation are released
under the GNU Lesser General Public License (LGPL).
See the README file in the main Zoltan directory for more information.