Programmer's Manual for LIBPSIO: The PSI I/O Library

T. Daniel Crawford
22 October 1998
Updated: 27 July 2006
crawdad@vt.edu


I. The structure and philosophy of the library

Many I/O libraries for quantum chemistry packages (including those in the old PSI2 code) expect the programmer to know the byte-by-byte layout of the given binary file. Accordingly, the primary read and write functions in such libraries require as an argument a global bytewise file pointer to the beginning of the desired data. As a result, when this pointer is defined to be an unsigned four-byte integer (common on 32-bit computers), the total size of the direct access file is limited to 4 GB (2^32 bytes). Furthermore, in order to avoid code duplication, this I/O design requires that one construct specialized libraries of functions (e.g., libfile30 in PSI2) for interaction with particularly complicated files such as a checkpoint file. Even slight modification of the file layout can require substantial changes to such libraries.

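This PSI3 I/O library, libpsio, is intended to overcome these problems in two ways: data are addressed by page/offset pairs rather than by a single global byte pointer (see Section II), and each file carries a table of contents (TOC) that locates every data item by name.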

Data items in the TOC are identified by keyword strings (e.g., "Nuclear Repulsion Energy"). To read or write an entire TOC entry, the programmer need provide only the TOC keyword and the entry size (in bytes); the entry's global starting address is supplied by the TOC. It is also possible to read pieces of TOC entries (say, a single buffer of a large list of two-electron integrals) by providing the appropriate TOC keyword, a size, and a starting address relative to the beginning of the TOC entry. In short, the TOC design hides all information about the global structure of the file from the programmer and allows him/her to be concerned only with the structure of individual entries.

II. The structure of a libpsio file

The first element in every libpsio file is a single integer, toclen, giving the number of entries in the file. Each entry is stored together with its TOC "header", i.e., the keyword string and global-address information for the data. When the file is opened, the first entry's TOC header is read from the file into an in-core TOC list. If a second entry exists, the ending address of the first entry is used to lseek() to the next entry, whose header is then read into the in-core TOC, and so on. Whenever a new entry is added or an existing entry is modified (e.g., extended), both the in-core TOC and the corresponding on-disk TOC header are updated automatically. Because the on-disk TOC is thus kept consistent with the data, most cases of file corruption in the event of a program crash are prevented.

Apart from the toclen integer, the library views the file as a series of pages, each containing the same number of bytes. The global address of the beginning of a given entry is stored in the TOC as a page/offset pair: the starting page and the byte offset on that page where the data begin. The entry-relative page/offset addresses that the programmer must provide work in exactly the same manner, except that the 0/0 (PSIO_ZERO) position is the beginning of the desired entry rather than the beginning of the file.

III. The user interface

All of the functions needed to carry out basic I/O are described in this section. Proper declarations of these routines are provided by the header file psio.h. Note that before any open/close functions may be called, the input parsing library, libipv1, must be initialized so that the necessary file-striping information can be read from user input. (See the PSI3 programmer's manual for details on the current version of the input parser.) Also note that ULI is used as an abbreviation for unsigned long int in the remainder of this manual.

int psio_init(void): Before any files may be opened or the basic read/write functions of libpsio may be used, the global data needed by the library functions must be initialized using this function.

int psio_ipv1_config(void): If the library is operating within a PSI module, calling this function lets it read its configuration data from the input file or from the .psirc file. It should therefore be called immediately after psio_init().

int psio_done(void): When all interaction with the files is complete, this function is used to free the library's global memory.

int psio_open(ULI unit, int status): Opens the binary file identified by unit. The status flag indicates whether the file is new (PSIO_OPEN_NEW) or already exists and is being re-opened (PSIO_OPEN_OLD). If specified in the user input file, the file is automatically opened as a multivolume (striped) file, and successive pages of data are read from or written to the volumes in turn. (Note that a non-existent file can still be opened with status PSIO_OPEN_OLD.)

int psio_close(ULI unit, int keep): Closes the binary file identified by unit. The keep flag is a boolean indicating whether the file's volumes should be deleted (0) or retained (1) after being closed.

int psio_read_entry(ULI unit, char *key, char *buffer, ULI size): Used to read an entire TOC entry identified by the string key from unit into the array buffer. The number of bytes to be read is given by size, but this value is only used to ensure that the read request does not exceed the end of the entry. If the entry does not exist, an error is printed to stderr and the program will exit.

int psio_write_entry(ULI unit, char *key, char *buffer, ULI size): Used to write an entire TOC entry identified by the string key from the array buffer to unit. The number of bytes to be written is given by size. If the entry already exists and its data is being overwritten, the value of size is used to ensure that the write request does not exceed the end of the entry.

int psio_read(ULI unit, char *key, char *buffer, ULI size, psio_address sadd, psio_address *eadd): Used to read a fragment of size bytes of a given TOC entry identified by key from unit into the array buffer. The starting address is given by sadd, and the ending address (that is, the entry-relative address of the byte following the fragment) is returned in *eadd.

int psio_write(ULI unit, char *key, char *buffer, ULI size, psio_address sadd, psio_address *eadd): Used to write a fragment of size bytes of a given TOC entry identified by key from the array buffer to unit. The starting address is given by sadd, and the ending address (that is, the entry-relative address of the byte following the fragment) is returned in *eadd.

The page/offset address pairs required by the preceding read and write functions are supplied via variables of the data type psio_address, defined by:

typedef struct {
  ULI page;
  ULI offset;
} psio_address;

The global variable PSIO_ZERO provides a convenient value for the 0/0 page/offset address.

IV. Manipulating the table of contents

In addition to the basic open/close/read/write functions described above, the programmer also has a limited ability to directly manipulate or examine the data in the TOC itself.

int psio_tocprint(ULI unit, FILE *outfile): Prints the TOC of unit in a readable form to outfile, including entry keywords and starting/ending addresses.

int psio_toclen(ULI unit, FILE *outfile): Returns the number of entries in the TOC of unit.

int psio_tocdel(ULI unit, char *key): Deletes the TOC entry corresponding to key. NB: Do not use this function if you are not a PSI3 expert. This function only deletes the entry's reference from the TOC itself and does not remove the corresponding data from the file. Hence, it is possible to introduce data "holes" into the file.

V. Some simple examples

The following code illustrates the basic use of the library, as well as when/how the psio_init() and psio_done() functions should be called in relation to initialization of libipv1.

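#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libipv1/ip_lib.h>
#include <libpsio/psio.h>
#include <libciomr/libciomr.h>

/* Globals filled in by psi_start() */
FILE *infile, *outfile;
char *psi_file_prefix;

char *gprgid(void);

int main(int argc, char *argv[])
{
  int i, M, N;
  double enuc, *some_data;
  char *progid;
  psio_address next;  /* Special page/offset structure */

  /* Open input/output files and initialize the input parser (libipv1) */
  psi_start(&infile,&outfile,&psi_file_prefix,argc-1,argv+1,0);

  /* Add this module's input section (":CODE_NAME") to the keyword path */
  progid = (char *) malloc(strlen(gprgid())+2);
  sprintf(progid, ":%s", gprgid());
  ip_cwk_add(progid);

  /* Initialize the I/O system (only after libipv1 is ready) */
  psio_init(); psio_ipv1_config();

  /* Open the file and write an energy */
  psio_open(31, PSIO_OPEN_NEW);
  enuc = 12.3456789;
  psio_write_entry(31, "Nuclear Repulsion Energy", (char *) &enuc,
                   sizeof(double));
  psio_close(31,1);  /* 1 = keep the file */

  /* Read M rows of an MxN matrix from a file */
  M = 10; N = 20;  /* example dimensions; the entry must hold at least M*N doubles */
  some_data = init_array(M*N);  /* contiguous storage for M*N doubles */

  psio_open(91, PSIO_OPEN_OLD);
  next = PSIO_ZERO;  /* Note use of the special variable */
  for(i=0; i < M; i++)
      psio_read(91, "Some Coefficients", (char *) (some_data + i*N),
                N*sizeof(double), next, &next);  /* next advances row by row */
  psio_close(91,0);  /* 0 = delete the file's volumes */

  free(some_data);

  /* Close the I/O system */
  psio_done();

  ip_done();
  free(progid);
  return 0;
}

char *gprgid(void)
{
   char *prgid = "CODE_NAME";
   return(prgid);
}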

