LAM / MPI Parallel Computing
LAM 6.3 Release Notes


    Installation

    Please see the installation guide for instructions on installing LAM/MPI 6.3.

    Supported Systems

    LAM 6.3 has been tested on the following systems:
    • Solaris 2.5.1, 2.6
    • IRIX 6.2, 6.3, 6.4, 6.5
    • AIX 4.1.4, 4.2.1
    • Linux 2.0.36, 2.2.10
    • HP-UX B 11.0
    Since this is still "beta" level software, we request that you also download the "lamtest" package from the LAM web site and run it on your system. If any errors are encountered, please send us all the output (see the README file in that package).

    The LAM Team would greatly appreciate your time and effort in helping to verify LAM/MPI on a wide variety of systems. Please see the LAM test suite page to see how you can help.

    New feature overview

    • Added ROMIO MPI I/O package (chapter 9 from the MPI-2 standard) from the Argonne National Laboratory.
    • Added the MPI-2 C++ bindings package (chapter 10 from the MPI-2 standard) from the Laboratory for Scientific Computing at the University of Notre Dame.
    • Pseudo-tty support for remote IO (e.g., line buffered output).
    • Ability to pass environment variables through mpirun.
    • Ability to mpirun shell scripts/debuggers/etc. (that eventually run LAM/MPI programs).
    • Ability to execute non-MPI programs across the multicomputer.
    • Added configurable ability to zero-fill internal LAM buffers before they are used (for development tools such as Purify).
    • Greatly expanded error messages; provided for customizable local help files.
    • Expanded and updated documentation.
    • Wrapper compilers now use the same compilers as specified in ./configure.
    • Made -lmpi implicit in wrapper compilers.
    • Fixed compatibility problems with some compilers.
    • Fixed problem with some compilers not getting prototypes properly from <mpi.h>.
    • Various bug fixes (see the HISTORY file).

    Caveats about MPI_CANCEL

    LAM is fully MPI-1 compliant with the exception of MPI_CANCEL. MPI_CANCEL works properly for receives, but will almost never work on sends. MPI_CANCEL is most frequently used with unmatched MPI_IRECVs that were posted "in case" a matching message arrived. Cancelling such a receive simply entails removing the receive request from the local queue, and is fairly straightforward to implement.

    Actually canceling a send operation is much more difficult because some meta information about a message is usually sent immediately. As such, the message is usually at least partially sent before an MPI_CANCEL is issued. Trying to chase down all the particular cases is a nightmare, to say the least.

    As such, the LAM Team decided not to implement MPI_CANCEL on sends, and instead concentrate on other features.

    Backward Compatibility

    LAM provides full source code backward compatibility with previous versions of LAM. Old applications that compiled properly with older versions of LAM can simply be recompiled with this version of LAM.

    Binary compatibility, however, is not provided -- applications that have been compiled with previous versions of LAM will need to be recompiled in order to run properly with this version of LAM. If applications are not re-compiled with this LAM, their behavior will be unpredictable.

    LAM and Linux

    LAM is frequently used on Linux-based machines (iX86 and otherwise). It works correctly under 2.0.36 (we didn't test under 2.0.37, but we have no reason to believe that it wouldn't work under that version as well, since it is really only minor changes from 2.0.36) and 2.2.x.

    However, kernel versions 2.2.0 through 2.2.9 had some TCP/IP performance problems. It seems that version 2.2.10 fixed these problems; if you are using a Linux version between 2.2.0 and 2.2.9, LAM may exhibit poor C2C performance due to the Linux TCP/IP kernel bugs. We recommend that you upgrade to 2.2.10 (or the latest version).

    See the LAM Linux page for a full discussion of the problem.

    LAM help file

    The following LAM binaries have had their help messages greatly expanded:
    • hboot
    • hcc / hcp / hf77
    • lamboot
    • recon
    • tkill
    • wipe

    The messages should be much more helpful in trying to diagnose problems, especially for first-time users. The help messages generally try to identify the problem and suggest solutions. It is possible for multiple error messages to be printed; one failure may cause other failures. As such, the first error message is generally (but not always) the most relevant message -- solving that error may solve the rest.

    Additionally, much more information is now output when the "-d" switch (which enables debugging output) is used with all of these programs.

    The help messages are all contained in a single ASCII file which is initially installed into the following file (where $prefix is the option supplied to --prefix in the ./configure script):

    $prefix/share/lam/lam-6.3b-helpfile

    The format of the file is simple: delimiter lines separate help topic messages. It should be very obvious which message corresponds to which program/topic name combination.

    This file allows system administrators to customize help messages for their users according to the local environment. When LAM tries to find the helpfile to print out a help message, it actually searches for the file in the following order:

    $LAMHELPFILE
    $HOME/lam-helpfile
    $HOME/lam-6.3b-helpfile
    $HOME/share/lam/lam-helpfile
    $HOME/share/lam/lam-6.3b-helpfile
    $LAMHELPDIR/lam-helpfile
    $LAMHELPDIR/lam-6.3b-helpfile
    $LAMHOME/share/lam/lam-helpfile
    $LAMHOME/share/lam/lam-6.3b-helpfile
    $TROLLIUSHOME/share/lam/lam-helpfile
    $TROLLIUSHOME/share/lam/lam-6.3b-helpfile
    $prefix/share/lam/lam-helpfile
    $prefix/share/lam/lam-6.3b-helpfile
    

    This seemingly over-complicated scheme allows maximum flexibility for system administrators and/or users to define the location of customized help files.

    Zeroing out LAM buffers before use

    LAM has several structures that are used in many situations. One example is the "struct nmsg"; one of the underlying message constructs used to pass data between LAM entities. But since the "struct nmsg" is used in so many places, it is a generalized structure and contains fields that are not used in every situation.

    By default, LAM only zeros out relevant struct members before using a structure. "Using" a structure may involve sending the entire structure (including uninitialized members) to a remote host. This is not a problem, because the remote host will also ignore irrelevant struct members (depending on the specific function being invoked). More to the point -- LAM was designed this way to avoid setting variables that will not be used; this is a slight optimization in run-time performance.

    Memory-checking debuggers are quite popular (such as Purify and the Solaris Workshop bcheck program), and quite useful for finding memory leaks, indexing past the end of arrays, and other types of Heisenbugs. Since LAM "uses" uninitialized memory, it tends to generate many warnings with these types of debuggers.

    The --with-purify option has been added to the ./configure script; it forces LAM to zero out all memory before it is used. This eliminates the "read before initialized" types of warnings that memory-checking debuggers identify deep inside LAM. However, this option incurs a slight run-time performance penalty, so it is not the default.

    Mpirun

    The default behavior of mpirun has changed. The default options now correspond to -w -c2c -nger. That is, wait for the application to terminate, use the fast client-to-client communication mode, and disable GER. To get the old behavior, use the options -lamd -ger -nw.

    Mpirun now recognizes command lines of the form

    % mpirun -np <nprocs> {LAM specific mpirun args} \
    	<program> {program args}
    
    For example,
    % mpirun -np 4 -lamd n0 n1 /bin/foobar 12 a b c
    
    runs 4 copies of the program /bin/foobar on nodes n0 and n1, passing the arguments "12 a b c" to the program. The new syntax is equivalent to the following in the "-c" syntax, which is still supported:
    % mpirun -c <nprocs> {LAM specific mpirun args} \
    	<program> -- {program args}
    

    Ability to pass environment variables.

    All environment variables named LAM_MPI_* are now automatically passed to remote nodes (unless disabled via the "-nx" option to mpirun). The "-x" option enables exporting of specific environment variables to the remote nodes:
    % LAM_MPI_FOO="green eggs and lam"
    % export LAM_MPI_FOO
    % mpirun N -x DISPLAY,ME=author lamIam
    

    This will launch the "lamIam" application on all remote nodes. The LAM_MPI_FOO, DISPLAY, and ME variables will be created on all nodes before the user's program is invoked.

    Note that the parser for the "-x" option is currently not very sophisticated -- it cannot even handle quoted values when defining new environment variables. Users are advised to set variables in the environment prior to invoking mpirun, and only use "-x" to export the variables to the remote nodes (not to define new variables), if possible.

    Pseudo-tty support.

    The "-pty" option to mpirun enabled pseudo tty support. Among other things, this gives line-buffered output from the remote nodes (which is probably what you want). It is not currently a default option because it has not been tested in a wide variety of Unixes yet.

    Ability to change to arbitrary directories.

    The "-wd" option to mpirun allows the user to change to an arbitrary directory before their program is invoked. It can also be used in application schema files to specify working directories on specific nodes and/or for specific applications.

    If the "-wd" option appears both in a schema file and on the command line, the schema file directory will override the command line value.

    Ability to run shell scripts/debuggers/etc.

    mpirun can now also run non-LAM/MPI programs. That is, one can mpirun a shell script, debugger, or any other program that will eventually either exec a LAM/MPI program or spawn a LAM/MPI program as a child.

    This is extremely helpful for batch systems and debugging environments. For example:

    % mpirun N gdb
    

    lamexec

    The lamexec command has been added to LAM/MPI's repertoire. It is an "mpirun clone", but is specifically for running non-MPI programs. That is, one can do the following:
    % lamexec N ps
    

    which will run "ps" on all nodes in the multicomputer. It can take most of the same command line arguments as mpirun; it does not support the flags that do not make sense for non-MPI programs (e.g., -c2c, -lamd, etc.). See lamexec(1) for more details.

    hcc / hcp / hf77 / mpicc / mpiCC / mpif77

    The hcc, hcp, and hf77 wrapper compilers have previously not automatically passed the "-lmpi" option to the underlying compiler. The rationale behind this decision was that the "mpicc" and "mpif77" wrapper compilers added this functionality; the "h" wrappers were intended as Trollius compilers, not LAM/MPI compilers.

    But hcc, hcp, and hf77 have become the de facto wrapper compilers (vs. mpicc and mpif77). Indeed, some users have been confused about why -lmpi is not implicit to the "h" wrapper compilers.

    Hence, "-lmpi" is now automatically passed to the underlying compiler in the hcc, hcp, and hf77 wrapper compilers. The mpicc and mpif77 compilers are now symbolic links to hcc and hf77, respectively.

    For symmetry, mpiCC has been created as a symbolic link to hcp.

    Root execution disallowed

    It is a Very Bad Idea to run the LAM executables as root.

    LAM was designed to be run by individual users; it was not designed to be run as a root-level service where multiple users use the same LAM daemons in a client-server fashion (see "Typical Usage" in the INSTALL file). LAM should be booted by each individual user who wishes to run MPI programs. There are a wide array of security issues when root runs a service-level daemon; LAM does not even attempt to address any of these issues.

    Especially with today's propensity for hackers to scan for root-owned network daemons, it could be tragic to run this program as root. While LAM is known to be quite stable, and LAM does not leave network sockets open for random connections after the initial setup, several factors should strike fear into system administrators' hearts if LAM were to be constantly running for all users to utilize:

    • LAM leaves a unix domain socket open on each machine in the /tmp directory. So if someone breaks into root on one machine, they effectively have root on all machines that are connected via LAM.
    • Indeed, for root to boot LAM on remote nodes, a .rhosts file (or some other trust mechanism) for root must exist. Depending on your local setup, this may not be safe.
    • LAM has never been checked for buffer overflows and other malicious-input types of errors. While we don't think that there are any buffer-overflow situations in LAM, we've never checked explicitly (hence, per Mr. Murphy, there are certainly some hiding somewhere).
    • LAM programs are not audited or tracked in any way. This could present a sneaky way to execute binaries without log trails (especially as root).
    Hence, it's a Very Bad Idea to run LAM as root. LAM binaries will quit immediately if root runs them. Login as a different user to run LAM.

    RPI transport layers

    LAM 6.3 provides three client-to-client transport layers which implement the request progression interface (RPI). As in previous versions, the LAM daemon RPI transport is always available. It is no longer the default transport, however, and must be explicitly invoked via the -lamd option to mpirun.

    The three client-to-client transports are:

    • tcp

      The tcp transport uses TCP sockets for all interprocess communication.

    • usysv

      The usysv transport is multi-protocol. Processes on the same node communicate via SYSV shared memory and processes on different nodes communicate via TCP sockets. It uses spin-locks for shared memory synchronization as well as a SYSV semaphore or pthread mutex for synchronizing access to a per node global shared memory pool.

      The spin-locks require that the architecture has strongly ordered writes and this transport is only supported on such platforms. It should be relatively easy to modify this transport to work on systems with weakly ordered writes by adding memory barriers in appropriate places.

    • sysv

      The sysv transport is the same as the usysv transport except that SYSV semaphores are used for message synchronization rather than spin-locks. On some uniprocessor systems (e.g. Linux) the blocking nature of semaphores can lead to better performance than when using spin-locks.

      The usysv transport should give the best performance on SMPs.

      Please refer to the tuning notes for more on the performance of the various transports and on tuning them.

    Signal catching

    LAM MPI now catches the signals SEGV, BUS, FPE, and ILL. The signal handler terminates the application. This is useful in batch jobs to help ensure that mpirun returns if an application process dies. To disable signal catching, use the -nsigs option to mpirun.

    Internal signal

    The signal used internally by LAM has been changed from SIGUSR1 to SIGUSR2 to reduce the chance of conflicts with the Linux pthreads library. The signal used is configurable.

    New basic datatypes

    Support has been added for the MPI_LONG_LONG_INT, MPI_UNSIGNED_LONG_LONG and MPI_WCHAR basic datatypes.

    MPI-2 Support

    C++ bindings

    C++ bindings for MPI-1 are provided from the MPI-2 C++ bindings package from the University of Notre Dame (http://www.mpi.nd.edu/research/mpi2c++/), version 1.0.3. The MPI-1 C++ bindings are described in Chapter 10 and Appendix B of the MPI-2 standard, which can be found at http://www.mpi-forum.org/.

    The C++ bindings package is compiled, by default, with LAM, and the LAM wrapper compilers (hcc/hcp/hf77) will automatically do "the right things" to compile/link user programs that use MPI C++ bindings function calls.

    Note that the C++ bindings have requirements on the degree of conformance that your C++ compiler supports; see the file mpi2c++/README for more details. If your C++ compiler cannot support the requirements of the C++ bindings package, it is safest just to disable MPI C++ bindings support in LAM.

    MPI C++ bindings support can be disabled via the LAM ./configure script; see the INSTALL file for specific instructions.

    Please see the "Contact Information" section of the mpi2c++/README file for how to submit questions and bug reports about the MPI 2 C++ bindings package (that do not specifically pertain to LAM).

    MPI-IO / ROMIO

    MPI-IO support has been added by including the ROMIO package from Argonne National Labs (http://www.mcs.anl.gov/romio/), version 1.0.1. The MPI-IO functions are described in chapter 9 of the MPI-2 standard, which can be found at http://www.mpi-forum.org/.

    The ROMIO package can be compiled with LAM, and the LAM wrapper compilers (hcc/hcp/hf77) will automatically do "the right things" to compile/link user programs that use ROMIO function calls.

    Please note that this is the first version of ROMIO that has been configured to work with LAM. As such, some custom modifications were made to the initial ROMIO 1.0.1 distribution; a vanilla ROMIO 1.0.1 distribution will not compile correctly with LAM (conversely, versions of LAM prior to 6.3b will not compile with ROMIO either -- there were incompatibilities in both directions). The ROMIO modifications have been conveyed back to the ROMIO team; the next release of ROMIO will be able to natively compile with LAM 6.3b (and higher), aside from the possible limitations mentioned below.

    ROMIO support can be enabled via the LAM ./configure script; see the INSTALL file for specific instructions.

    There are some important limitations to ROMIO that are discussed in the romio/README file.

    One limitation that is not currently listed in the ROMIO README file is that atomic file access will not work with AFS. This is because of file locking problems with AFS. The ROMIO test program "atomicity" will fail if you specify an output file on AFS.

    Additionally, ROMIO does not support the following LAM functionality:

    • LAM MPI-2 datatypes cannot be used with ROMIO; ROMIO makes the fundamental assumption that MPI-2 datatypes are built upon MPI-1 datatypes. LAM builds MPI-2 datatypes natively -- ROMIO cannot presently handle this case.

      This will hopefully be fixed in some future release of ROMIO.

      The ROMIO test programs "coll_test", "fcoll_test", "large_array", and "coll_perf" will fail because they use the MPI-2 datatype MPI_DARRAY.

      Please see the sections "ROMIO Users Mailing List" and "Reporting Bugs" in romio/README for how to submit questions and bug reports about ROMIO (that do not specifically pertain to LAM).

    Inter-language interoperability

    Inter-language interoperability is supported. It is now possible to initialize LAM MPI from either C or Fortran and mix MPI calls from both languages.

    One-sided communication

    Support is provided for get/put/accumulate data transfer operations and for the post/wait/start/complete and fence synchronization operations. No support is provided for window locking.

    The datatypes used in the get/put/accumulate operations are restricted to being basic datatypes or single level contigs/vectors of basic datatypes.

    The implementation of the one-sided operations is layered on top of the point-to-point functions and will thus perform no better than them. Nevertheless it is hoped that providing this support will aid developers in developing and debugging codes using one-sided communication.

    The following functions related to one-sided communication have been implemented.

    MPI_Win_create
    MPI_Win_free 
    MPI_Win_get_group 
    
    MPI_Get
    MPI_Put
    MPI_Accumulate
    
    MPI_Win_fence 
    MPI_Win_post 
    MPI_Win_wait 
    MPI_Win_start 
    MPI_Win_complete

    Dynamic processes

    The dynamic process support provided in LAM 6.2 has been extended and the function names changed to conform to the final MPI 2.0 standard. The following functions related to dynamic process support are provided.

    MPI_Comm_spawn
    MPI_Comm_spawn_multiple
    MPI_Comm_get_parent 
    MPI_Comm_accept
    MPI_Comm_connect
    MPI_Comm_disconnect
    MPI_Comm_join
    
    MPI_Lookup_name 
    MPI_Publish_name 
    MPI_Unpublish_name 
    
    MPI_Open_port 
    MPI_Close_port

    Info

    Full support for info objects is provided.

    MPI_Info_create 
    MPI_Info_free 
    MPI_Info_delete 
    MPI_Info_dup 
    MPI_Info_get 
    MPI_Info_get_nkeys 
    MPI_Info_get_nthkey 
    MPI_Info_get_valuelen 
    MPI_Info_set

    Communicator and window error handling

    The new communicator error handler functions are supported, as are window error handlers.

    MPI_Comm_create_errhandler
    MPI_Comm_get_errhandler
    MPI_Comm_set_errhandler
    
    MPI_Win_create_errhandler
    MPI_Win_get_errhandler 
    MPI_Win_set_errhandler

    Handle conversions

    Handle conversions for inter-language interoperability are supported.

    MPI_Comm_f2c 
    MPI_Comm_c2f 
    
    MPI_Group_f2c 
    MPI_Group_c2f 
    
    MPI_Type_f2c 
    MPI_Type_c2f 
    
    MPI_Request_f2c 
    MPI_Request_c2f 
    
    MPI_Info_f2c 
    MPI_Info_c2f 
    
    MPI_Win_f2c 
    MPI_Win_c2f 
    
    MPI_Status_f2c
    MPI_Status_c2f

    Attributes on communicators, datatypes and windows

    Attributes may now be set on and retrieved from datatypes and windows. The new communicator attribute handling functions are also supported.

    MPI_Comm_create_keyval
    MPI_Comm_free_keyval 
    MPI_Comm_delete_attr 
    MPI_Comm_get_attr 
    MPI_Comm_set_attr 
    
    MPI_Type_create_keyval
    MPI_Type_free_keyval 
    MPI_Type_delete_attr 
    MPI_Type_get_attr 
    MPI_Type_set_attr 
    
    MPI_Win_create_keyval
    MPI_Win_free_keyval 
    MPI_Win_delete_attr 
    MPI_Win_get_attr 
    MPI_Win_set_attr

    New derived type constructors and type enquiry functions

    Support has been added for the following new derived type constructors:

    MPI_Type_create_struct
    MPI_Type_create_hindexed 
    MPI_Type_create_hvector
    MPI_Type_dup 
    MPI_Type_create_resized
    MPI_Type_create_subarray
    MPI_Type_create_darray
    and for the type enquiry functions:

    MPI_Type_get_contents
    MPI_Type_get_envelope
    MPI_Type_get_extent 
    MPI_Type_get_true_extent

    Miscellaneous

    Implementations of the following functions are provided. LAM 6.3 reports its MPI version as 1.2.

    MPI_Get_version
    MPI_Get_address


    This site is located in:

    Notre Dame, IN, USA
    Copyright ©1996-1999
    LAM Team / UND
    16-Sep-1999 / 08:29:31 EST