Chapter 4 Extended File Name Syntax
4.1 Overview
CFITSIO supports an extended syntax when specifying the name of the
data file to be opened or created that includes the following
features:
- CFITSIO can read IRAF format images which have header file names that
end with the '.imh' extension, as well as reading and writing FITS
files. This feature is implemented in CFITSIO by first converting the
IRAF image into a temporary FITS format file in memory, then opening
the FITS file. Any of the usual CFITSIO routines then may be used to
read the image header or data. Similarly, raw binary data arrays can
be read by converting them on the fly into virtual FITS images.
- FITS files on the internet can be read (and sometimes written) using the FTP,
HTTP, or ROOT protocols.
- FITS files can be piped between tasks on the stdin and stdout streams.
- FITS files can be read and written in shared memory. This can potentially
achieve much better data I/O performance compared to reading and
writing the same FITS files on magnetic disk.
- Compressed FITS files in gzip or Unix COMPRESS format can be directly read.
- Output FITS files can be written directly in compressed gzip format,
thus saving disk space.
- FITS table columns can be created, modified, or deleted 'on-the-fly' as
the table is opened by CFITSIO. This creates a virtual FITS file containing
the modifications that is then opened by the application program.
- Table rows may be selected, or filtered out, on the fly when the table
is opened by CFITSIO, based on an arbitrary user-specified expression.
Only rows for which the expression evaluates to 'TRUE' are retained
in the copy of the table that is opened by the application program.
- Histogram images may be created on the fly by binning the values in
table columns, resulting in a virtual N-dimensional FITS image. The
application program then only sees the FITS image (in the primary
array) instead of the original FITS table.
The latter 3 features in particular add very powerful data processing
capabilities directly into CFITSIO, and hence into every task that uses
CFITSIO to read or write FITS files. For example, these features
transform a very simple program that just copies an input FITS file to
a new output file (like the `fitscopy' program that is distributed with
CFITSIO) into a multipurpose FITS file processing tool. By appending
fairly simple qualifiers onto the name of the input FITS file, the user
can perform quite complex table editing operations (e.g., create new
columns, or filter out rows in a table) or create FITS images by
binning or histogramming the values in table columns. In addition,
these functions have been coded using new state-of-the-art algorithms
that are, in some cases, 10 to 100 times faster than previous widely
used implementations.
Before describing the complete syntax for the extended FITS file names
in the next section, here are a few examples of FITS file names that
give a quick overview of the allowed syntax:
The full extended CFITSIO FITS file name can contain several different
components depending on the context. These components are described in
the following sections:
When creating a new file:
filetype://BaseFilename(templateName)
When opening an existing primary array or image HDU:
filetype://BaseFilename(outName)[HDUlocation][ImageSection]
When opening an existing table HDU:
filetype://BaseFilename(outName)[HDUlocation][colFilter][rowFilter][binSpec]
The filetype, BaseFilename, outName, HDUlocation, and ImageSection
components, if present, must be given in that order, but the colFilter,
rowFilter, and binSpec specifiers may follow in any order. Regardless
of the order, however, the colFilter specifier, if present, will be
processed first by CFITSIO, followed by the rowFilter specifier, and
finally by the binSpec specifier.
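The fixed ordering of these components can be illustrated with a short C sketch that assembles a table-HDU file name from its parts. The function build_ext_name (and its helper append_part) is hypothetical, written only for this illustration and not part of the CFITSIO API; it concatenates strings in the documented order and does not check the syntax of the individual components.

```c
#include <stdio.h>
#include <string.h>

/* Append pre + s + post to the string already in out. An omitted (NULL or
 * empty) component adds nothing. Returns -1 if the buffer is too small. */
static int append_part(char *out, size_t outlen, const char *pre,
                       const char *s, const char *post)
{
    size_t used = strlen(out);
    int n;

    if (s == NULL || *s == '\0')
        return 0;
    n = snprintf(out + used, outlen - used, "%s%s%s", pre, s, post);
    return (n < 0 || (size_t)n >= outlen - used) ? -1 : 0;
}

/* Hypothetical helper (not part of CFITSIO): build an extended file name for
 * opening a table HDU. The filters are written colFilter, rowFilter, binSpec,
 * which is also the order in which CFITSIO applies them.
 * Returns 0 on success, -1 on a missing base name or a too-small buffer. */
int build_ext_name(char *out, size_t outlen,
                   const char *filetype,  /* e.g. "ftp://"; NULL means file:// */
                   const char *base,      /* required */
                   const char *outname,   /* placed in (...) */
                   const char *hdu,       /* placed in [...] */
                   const char *colfilter, /* placed in [...], e.g. "col Time;rate" */
                   const char *rowfilter, /* placed in [...], e.g. "GRADE==50" */
                   const char *binspec)   /* placed in [...] */
{
    if (out == NULL || outlen == 0 || base == NULL || *base == '\0')
        return -1;
    out[0] = '\0';
    if (append_part(out, outlen, "", filetype, "") ||
        append_part(out, outlen, "", base, "") ||
        append_part(out, outlen, "(", outname, ")") ||
        append_part(out, outlen, "[", hdu, "]") ||
        append_part(out, outlen, "[", colfilter, "]") ||
        append_part(out, outlen, "[", rowfilter, "]") ||
        append_part(out, outlen, "[", binspec, "]"))
        return -1;
    return 0;
}
```

The resulting string, e.g. "file.fits[events][col Time;rate][GRADE==50]", would then be passed unchanged as the file name argument of fits_open_file.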
4.2 Filetype
The type of file determines the medium on which the file is located
(e.g., disk or network) and, hence, which internal device driver is used by
CFITSIO to read and/or write the file. Currently supported types are
file:// - file on local magnetic disk (default)
ftp:// - a readonly file accessed with the anonymous FTP protocol.
It also supports ftp://username:password@hostname/...
for accessing password-protected ftp sites.
http:// - a readonly file accessed with the HTTP protocol. It
does not support username:password like the ftp driver.
Proxy HTTP servers are supported using the http_proxy
environment variable.
root:// - uses the CERN root protocol for writing as well as
reading files over the network.
shmem:// - opens or creates a file which persists in the computer's
shared memory.
mem:// - opens a temporary file in core memory. The file
disappears when the program exits so this is mainly
useful for test purposes when a permanent output file
is not desired.
If the filetype is not specified, then type file:// is assumed.
The double slashes '//' are optional and may be omitted in most cases.
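To make the default rule concrete, here is a minimal C sketch of how a program might classify a name by its filetype prefix. The function get_filetype is hypothetical and not part of the CFITSIO API (CFITSIO performs this detection internally when fits_open_file or fits_create_file is called); it recognizes only the types listed above, matches them case-sensitively, and treats the "//" after the colon as optional.

```c
#include <string.h>

/* Hypothetical sketch (not CFITSIO's own parser): return the filetype named
 * at the start of an extended file name, or "file" if none is given. */
const char *get_filetype(const char *name)
{
    static const char *const types[] = { "file", "ftp", "http", "root", "shmem", "mem" };
    size_t i;

    for (i = 0; i < sizeof types / sizeof types[0]; i++) {
        size_t len = strlen(types[i]);
        /* a filetype is the type name immediately followed by ':' */
        if (strncmp(name, types[i], len) == 0 && name[len] == ':')
            return types[i];
    }
    return "file";   /* default when no filetype is specified */
}
```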
4.2.1 Notes about HTTP proxy servers
A proxy HTTP server may be used by defining the address (URL) and port
number of the proxy server with the http_proxy environment variable.
For example
setenv http_proxy http://heasarc.gsfc.nasa.gov:3128
will cause CFITSIO to use port 3128 on the heasarc proxy server whenever
reading a FITS file with HTTP.
4.2.2 Notes about the root filetype
The original rootd server can be obtained from:
ftp://root.cern.ch/root/rootd.tar.gz
but for it to work correctly with CFITSIO, one has to use a modified
version which supports a command to return the length of the file.
This modified version is available in the rootd subdirectory
in the CFITSIO ftp area at
ftp://legacy.gsfc.nasa.gov/software/fitsio/c/root/rootd.tar.gz.
This small server is started either by inetd when a client requests a
connection to a rootd server or by hand (i.e. from the command line).
The rootd server works with the ROOT TNetFile class. It allows remote
access to ROOT database files in either read or write mode. By default
TNetFile assumes port 432 (which requires rootd to be started as root).
To run rootd via inetd add the following line to /etc/services:
rootd 432/tcp
and to /etc/inetd.conf, add the following line:
rootd stream tcp nowait root /user/rdm/root/bin/rootd rootd -i
Force inetd to reread its conf file with "kill -HUP <pid inetd>".
You can also start rootd by hand running directly under your private
account (no root system privileges needed). For example to start
rootd listening on port 5151 just type: rootd -p 5151
Notice: no & is needed. Rootd will go into background by itself.
Rootd arguments:
-i says we were started by inetd
-p port# specifies a different port to listen on
-d level level of debug info written to syslog
0 = no debug (default)
1 = minimum
2 = medium
3 = maximum
Rootd can also be configured for anonymous usage (like anonymous ftp).
To set up rootd to accept anonymous logins, do the following (while being
logged in as root):
- Add the following line to /etc/passwd:
rootd:*:71:72:Anonymous rootd:/var/spool/rootd:/bin/false
where you may modify the uid, gid (71, 72) and the home directory
to suit your system.
- Add the following line to /etc/group:
rootd:*:72:rootd
where the gid must match the gid in /etc/passwd.
- Create the directories:
mkdir /var/spool/rootd
mkdir /var/spool/rootd/tmp
chmod 777 /var/spool/rootd/tmp
Where /var/spool/rootd must match the rootd home directory as
specified in the rootd /etc/passwd entry.
- To make directories writable by anonymous users, do, for example:
mkdir /var/spool/rootd/pub
chown rootd:rootd /var/spool/rootd/pub
That's all. Several additional remarks: you can login to an anonymous
server either with the names "anonymous" or "rootd". The password should
be of type user@host.do.main. Only the @ is enforced for the time
being. In anonymous mode the top of the file tree is set to the rootd
home directory, therefore only files below the home directory can be
accessed. Anonymous mode only works when the server is started via
inetd.
4.2.3 Notes about the shmem filetype
Shared memory files are currently supported on most Unix platforms,
where the shared memory segments are managed by the operating system
kernel and `live' independently of processes. They are not deleted (by
default) when the process which created them terminates, although they
will disappear if the system is rebooted. Applications can create
shared memory files in CFITSIO by calling:
fits_create_file(&fitsfileptr, "shmem://h2", &status);
where the root `file' names are currently restricted to be 'h0', 'h1',
'h2', 'h3', etc., up to a maximum number defined by the value of
SHARED_MAXSEG (equal to 16 by default). This is a prototype
implementation of the shared memory interface and a more robust
interface, which will have fewer restrictions on the number of files
and on their names, may be developed in the future.
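A program that manages several shared memory files might generate their names as in the following sketch. The helper shmem_name and the constant SHARED_MAXSEG_DEFAULT are hypothetical; the sketch assumes that the valid names run from h0 to h(SHARED_MAXSEG-1) and that SHARED_MAXSEG has its default value of 16, which may differ in a given CFITSIO build.

```c
#include <stdio.h>
#include <string.h>

/* Default quoted in the text; the real limit is a CFITSIO build setting. */
#define SHARED_MAXSEG_DEFAULT 16

/* Hypothetical helper: format the name of shared memory file number n
 * ("shmem://h0", "shmem://h1", ...). Returns -1 if n is outside the
 * assumed range 0 .. maxseg-1 or if the buffer is too small. */
int shmem_name(char *out, size_t outlen, int n, int maxseg)
{
    int len;

    if (n < 0 || n >= maxseg)
        return -1;
    len = snprintf(out, outlen, "shmem://h%d", n);
    return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}
```

The string produced for n = 2 is the same "shmem://h2" name used in the fits_create_file call shown above.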
When opening an already existing FITS file in shared memory one calls
the usual CFITSIO routine:
fits_open_file(&fitsfileptr, "shmem://h7", mode, &status);
The file mode can be READWRITE or READONLY just as with disk files.
More than one process can operate on READONLY mode files at the same
time. CFITSIO supports proper file locking (both in READONLY and
READWRITE modes), so calls to fits_open_file may be locked out until
another process closes the file.
When an application is finished accessing a FITS file in a shared
memory segment, it may close it (and the file will remain in the
system) with fits_close_file, or delete it with fits_delete_file.
Physical deletion is postponed until the last process calls
ffclos/ffdelt (the short names of fits_close_file and fits_delete_file).
fits_delete_file tries to obtain a READWRITE lock on
the file to be deleted, thus it can be blocked if the object was not
opened in READWRITE mode.
A shared memory management utility program called `smem' is included
with the CFITSIO distribution. It can be built by typing `make smem';
then type `smem -h' to get a list of valid options. Executing smem
without any options causes it to list all the shared memory segments
currently residing in the system and managed by the shared memory
driver. To get a list of all the shared memory objects, run the system
utility program `ipcs [-a]'.
4.3 Base Filename
The base filename is the name of the file optionally including the
directory/subdirectory path, and in the case of `ftp', `http', and `root'
filetypes, the machine identifier. Examples:
myfile.fits
!data.fits
/data/myfile.fits
fits.gsfc.nasa.gov/ftp/sampledata/myfile.fits.gz
When creating a new output file on magnetic disk (of type file://) if
the base filename begins with an exclamation point (!) then any
existing file with that same basename will be deleted prior to creating
the new FITS file. Otherwise if the file to be created already exists,
then CFITSIO will return an error and will not overwrite the existing
file. Note that the exclamation point, '!', is a special UNIX character,
so if it is used on the command line rather than entered at a task
prompt, it must be preceded by a backslash to force the UNIX
shell to pass it verbatim to the application program.
If the output disk file name ends with the suffix '.gz', then CFITSIO
will compress the file using the gzip compression algorithm before
writing it to disk. This can reduce the amount of disk space used by
the file. Note that this feature requires that the uncompressed file
be constructed in memory before it is compressed and written to disk,
so it can fail if there is insufficient available memory.
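The two naming conventions above (a leading '!' and a trailing '.gz') can be summarized by the following sketch, which splits a requested output name into the disk file name and two flags. The struct out_spec and the function parse_output_name are hypothetical illustrations; CFITSIO interprets these conventions itself when fits_create_file is called.

```c
#include <string.h>

/* Hypothetical summary of how an output file name is interpreted. */
struct out_spec {
    const char *path;  /* disk file name with any leading '!' removed */
    int clobber;       /* 1 if an existing file may be deleted first */
    int gzip;          /* 1 if the file is to be gzip compressed */
};

struct out_spec parse_output_name(const char *name)
{
    struct out_spec s;
    size_t len;

    s.clobber = (name[0] == '!');
    s.path = s.clobber ? name + 1 : name;
    len = strlen(s.path);
    /* a ".gz" suffix on the output name requests gzip compression */
    s.gzip = (len >= 3 && strcmp(s.path + len - 3, ".gz") == 0);
    return s;
}
```

Within a C program no backslash is needed, so a call such as fits_create_file(&fptr, "!out.fits.gz", &status) would overwrite any existing out.fits.gz and write gzip-compressed output.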
An input FITS file may be compressed with the gzip or Unix compress
algorithms, in which case CFITSIO will uncompress the file on the fly
into a temporary file (in memory or on disk). Compressed files may
only be opened with read-only permission. When specifying the name of
a compressed FITS file it is not necessary to append the file suffix
(e.g., `.gz' or `.Z'). If CFITSIO cannot find the input file name
without the suffix, then it will automatically search for a compressed
file with the same root name. In the case of reading ftp and http type
files, CFITSIO generally looks for a compressed version of the file
first, before trying to open the uncompressed file. By default,
CFITSIO copies (and uncompresses if necessary) the ftp or http FITS
file into memory on the local machine before opening it. This will
fail if the local machine does not have enough memory to hold the whole
FITS file, so in this case, the output filename specifier (see the next
section) can be used to further control how CFITSIO reads ftp and http
files.
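The fallback search for compressed local files can be sketched as follows. The function find_input_file is hypothetical and tries only the '.gz' and '.Z' suffixes named above, in that order; the actual list of suffixes and the search order used by CFITSIO may differ, and the ftp/http case (where the compressed version is tried first) is not modeled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the suffix search for a local input file.
 * Returns the index of the first candidate that can be opened
 * (0 = the name as given, 1 = name + ".gz", 2 = name + ".Z"),
 * or -1 if none exists or the buffer is too small. On success,
 * found holds the matching file name. */
int find_input_file(const char *name, char *found, size_t foundlen)
{
    static const char *const suffixes[] = { "", ".gz", ".Z" };
    int i;

    for (i = 0; i < 3; i++) {
        int n = snprintf(found, foundlen, "%s%s", name, suffixes[i]);
        if (n < 0 || (size_t)n >= foundlen)
            return -1;                 /* buffer too small */
        FILE *fp = fopen(found, "rb");
        if (fp != NULL) {              /* candidate exists and is readable */
            fclose(fp);
            return i;
        }
    }
    return -1;
}
```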
If the input file is an IRAF image file (*.imh file) then CFITSIO will
automatically convert it on the fly into a virtual FITS image before it
is opened by the application program. IRAF images can only be opened
with READONLY file access.
Similarly, if the input file is a raw binary data array, then CFITSIO
will convert it on the fly into a virtual FITS image with the basic set
of required header keywords before it is opened by the application
program (with READONLY access). In this case the data type and
dimensions of the image must be specified in square brackets following
the filename (e.g. rawfile.dat[ib512,512]). The first character (case
insensitive) defines the datatype of the array:
b 8-bit unsigned byte
i 16-bit signed integer
u 16-bit unsigned integer
j 32-bit signed integer
r or f 32-bit floating point
d 64-bit floating point
An optional second character specifies the byte order of the array
values: b or B indicates big endian (as in FITS files and the native
format of SUN UNIX workstations and Mac PCs) and l or L indicates
little endian (native format of DEC OSF workstations and IBM PCs). If
this character is omitted then the array is assumed to have the native
byte order of the local machine. These datatype characters are then
followed by a series of one or more integer values separated by commas
which define the size of each dimension of the raw array. Arrays with
up to 5 dimensions are currently supported. Finally, a byte offset to
the position of the first pixel in the data file may be specified by
separating it with a ':' from the last dimension value. If omitted, it
is assumed that the offset = 0. This parameter may be used to skip
over any header information in the file that precedes the binary data.
Further examples:
raw.dat[b10000] 1-dimensional 10000 pixel byte array
raw.dat[rb400,400,12] 3-dimensional floating point big-endian array
img.fits[ib512,512:2880] reads the 512 x 512 short integer array in
a FITS file, skipping over the 2880 byte header
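The raw-array specifier has a small grammar: a datatype letter, an optional byte-order letter, up to 5 comma-separated dimensions, and an optional ':' byte offset. The following C sketch parses that grammar and computes the number of data bytes implied by the specifier. The names struct raw_spec, parse_raw_spec, and raw_data_bytes are hypothetical, and this is not the parser CFITSIO uses internally.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parsed form of the text inside the brackets, e.g. "ib512,512:2880". */
struct raw_spec {
    char type;     /* datatype code: b, i, u, j, r (f is mapped to r), d */
    char order;    /* 'b' = big endian, 'l' = little endian, 'n' = native */
    int  ndim;     /* number of dimensions (1..5) */
    long dims[5];  /* size of each dimension */
    long offset;   /* byte offset of the first pixel (0 if omitted) */
};

static int type_size(char t)
{
    switch (t) {
    case 'b': return 1;
    case 'i': case 'u': return 2;
    case 'j': case 'r': return 4;
    case 'd': return 8;
    default:  return 0;
    }
}

/* Returns 0 on success, -1 on a syntax error. */
int parse_raw_spec(const char *spec, struct raw_spec *rs)
{
    const char *p = spec;
    char *end;

    rs->type = (char)tolower((unsigned char)*p++);   /* case insensitive */
    if (rs->type == 'f')
        rs->type = 'r';                  /* r and f both mean 32-bit float */
    if (type_size(rs->type) == 0)
        return -1;

    rs->order = 'n';                     /* default: native byte order */
    if (*p == 'b' || *p == 'B') { rs->order = 'b'; p++; }
    else if (*p == 'l' || *p == 'L') { rs->order = 'l'; p++; }

    rs->ndim = 0;
    rs->offset = 0;
    for (;;) {
        if (rs->ndim == 5)
            return -1;                   /* at most 5 dimensions */
        long d = strtol(p, &end, 10);
        if (end == p || d <= 0)
            return -1;
        rs->dims[rs->ndim++] = d;
        p = end;
        if (*p != ',')
            break;
        p++;
    }
    if (*p == ':') {                     /* optional byte offset */
        rs->offset = strtol(p + 1, &end, 10);
        if (end == p + 1 || rs->offset < 0)
            return -1;
        p = end;
    }
    return (*p == '\0') ? 0 : -1;
}

/* Number of bytes of pixel data described by the specifier. */
long raw_data_bytes(const struct raw_spec *rs)
{
    long n = type_size(rs->type);
    for (int i = 0; i < rs->ndim; i++)
        n *= rs->dims[i];
    return n;
}
```

For example, parsing "ib512,512:2880" yields a 2-dimensional big-endian 16-bit array of 512 x 512 pixels (524288 data bytes) starting 2880 bytes into the file.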
One special case of input file is where the filename = `-' (a dash or
minus sign) or 'stdin' or 'stdout', which signifies that the input file
is to be read from the stdin stream, or written to the stdout stream if
a new output file is being created. In the case of reading from stdin,
CFITSIO first copies the whole stream into a temporary FITS file (in
memory or on disk), and subsequent reading of the FITS file occurs in
this copy. When writing to stdout, CFITSIO first constructs the whole
file in memory (since random access is required), then flushes it out
to the stdout stream when the file is closed. In addition, if the
output filename = '-.gz' or 'stdout.gz' then it will be gzip compressed
before being written to stdout.
This ability to read and write on the stdin and stdout streams allows
FITS files to be piped between tasks in memory rather than having to
create temporary intermediate FITS files on disk. For example, if task1
creates an output FITS file, and task2 reads an input FITS file, the
FITS file may be piped between the 2 tasks by specifying
task1 - | task2 -
where the vertical bar is the Unix piping symbol. This assumes that the 2
tasks read the name of the FITS file off of the command line.
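The special stream names can be recognized with a simple comparison, as in the hypothetical helper stream_kind below (not a CFITSIO function). It only encodes the names listed above; in practice a program such as task1 or task2 would simply pass "-" to fits_create_file or fits_open_file and let CFITSIO handle the stream.

```c
#include <string.h>

/* Hypothetical classifier for the special stream names.
 * Returns 1 for the stdin/stdout stream, 2 for gzip-compressed
 * output to stdout, and 0 for an ordinary file name. */
int stream_kind(const char *name)
{
    if (strcmp(name, "-") == 0 || strcmp(name, "stdin") == 0 ||
        strcmp(name, "stdout") == 0)
        return 1;
    if (strcmp(name, "-.gz") == 0 || strcmp(name, "stdout.gz") == 0)
        return 2;
    return 0;
}
```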
4.4 Output File Name when Opening an Existing File
An optional output filename may be specified in parentheses immediately
following the base file name to be opened. This is mainly useful in
those cases where CFITSIO creates a temporary copy of the input FITS
file before it is opened and passed to the application program. This
happens by default when opening a network FTP or HTTP-type file, when
reading a compressed FITS file on a local disk, when reading from the
stdin stream, or when a column filter, row filter, or binning specifier
is included as part of the input file specification. By default this
temporary file is created in memory. If there is not enough memory to
create the file copy, then CFITSIO will exit with an error. In these
cases one can force a permanent file to be created on disk, instead of
a temporary file in memory, by supplying the name in parentheses
immediately following the base file name. The output filename can
include the '!' clobber flag.
Thus, if the input filename to CFITSIO is:
file1.fits.gz(file2.fits)
then CFITSIO will uncompress `file1.fits.gz' into the local disk file
`file2.fits' before opening it. CFITSIO does not automatically delete
the output file, so it will still exist after the application program
exits.
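Separating the base file name from the output file name amounts to finding the first '(' that precedes any '['. The following sketch shows one way to do this; split_outname is a hypothetical helper, not part of CFITSIO, and it ignores everything after the closing parenthesis.

```c
#include <string.h>

/* Hypothetical sketch: split "file1.fits.gz(file2.fits)" into the base file
 * name and the output file name in parentheses. A '(' that appears only
 * after the first '[' belongs to a later specifier (e.g. "[3; images(17)]")
 * and is not an output name. Returns 1 if an output name was found, 0 if
 * there was none, and -1 on a syntax error or a too-small buffer. */
int split_outname(const char *spec, char *base, size_t baselen,
                  char *outname, size_t outlen)
{
    const char *lp = strchr(spec, '(');
    const char *lb = strchr(spec, '[');
    const char *rp;
    size_t n;

    if (lp != NULL && lb != NULL && lb < lp)
        lp = NULL;                    /* parenthesis lies inside a bracket */

    n = lp ? (size_t)(lp - spec) : lb ? (size_t)(lb - spec) : strlen(spec);
    if (n >= baselen)
        return -1;
    memcpy(base, spec, n);
    base[n] = '\0';

    if (lp == NULL) {
        if (outlen > 0)
            outname[0] = '\0';
        return 0;
    }
    rp = strchr(lp + 1, ')');
    if (rp == NULL)
        return -1;                    /* unbalanced parenthesis */
    n = (size_t)(rp - lp - 1);
    if (n >= outlen)
        return -1;
    memcpy(outname, lp + 1, n);
    outname[n] = '\0';
    return 1;
}
```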
In some cases, several different temporary FITS files will be created
in sequence, for instance, if one opens a remote file using FTP, then
filters rows in a binary table extension, then creates an image by
binning a pair of columns. In this case, the remote file will be
copied to a temporary local file, then a second temporary file will be
created containing the filtered rows of the table, and finally a third
temporary file containing the binned image will be created. In cases
like this where multiple files are created, the outfile specifier will
be interpreted as the name of one of these files, according to the
following rules in descending order of priority:
- as the name of the final image file if an image within a single binary
table cell is opened or if an image is created by binning a table column.
- as the name of the file containing the filtered table if a column filter
and/or a row filter are specified.
- as the name of the local copy of the remote FTP or HTTP file.
- as the name of the uncompressed version of the FITS file, if a
compressed FITS file on local disk has been opened.
- otherwise, the output filename is ignored.
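The priority list can be expressed as a chain of tests, most specific first. The enumeration out_target, the struct open_steps, and the function outfile_target are hypothetical names used only to model the rules listed above.

```c
/* Which intermediate file receives the name given in the outfile specifier. */
enum out_target {
    OUT_IGNORED = 0,         /* no suitable file: the name is ignored */
    OUT_UNCOMPRESSED_COPY,   /* uncompressed copy of a local compressed file */
    OUT_REMOTE_COPY,         /* local copy of the remote FTP or HTTP file */
    OUT_FILTERED_TABLE,      /* table after column and/or row filtering */
    OUT_FINAL_IMAGE          /* image from a table cell or from binning */
};

/* Flags describing which steps an open operation performs (1 = yes). */
struct open_steps {
    int image_from_cell_or_binning;
    int column_or_row_filter;
    int remote_ftp_or_http;
    int local_compressed;
};

/* Apply the rules in descending order of priority. */
enum out_target outfile_target(const struct open_steps *s)
{
    if (s->image_from_cell_or_binning) return OUT_FINAL_IMAGE;
    if (s->column_or_row_filter)       return OUT_FILTERED_TABLE;
    if (s->remote_ftp_or_http)         return OUT_REMOTE_COPY;
    if (s->local_compressed)           return OUT_UNCOMPRESSED_COPY;
    return OUT_IGNORED;
}
```

For the FTP, row-filter, and binning example described above, all of the first three flags are set, so the outfile name is applied to the binned image.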
The output file specifier is useful when reading FTP or HTTP-type
FITS files since it can be used to create a local disk copy of the file
that can be reused in the future. If the output file name = `*' then a
local file with the same name as the network file will be created.
Note that CFITSIO will behave differently depending on whether the
remote file is compressed or not as shown by the following examples:
- `ftp://remote.machine/tmp/myfile.fits.gz(*)' - the remote compressed
file is copied to the local compressed file `myfile.fits.gz', which
is then uncompressed in local memory before being opened and passed
to the application program.
- `ftp://remote.machine/tmp/myfile.fits.gz(myfile.fits)' - the remote
compressed file is copied and uncompressed into the local file
`myfile.fits'. This example requires less local memory than the
previous example since the file is uncompressed on disk instead of
in memory.
- `ftp://remote.machine/tmp/myfile.fits(myfile.fits.gz)' - this will
usually produce an error since CFITSIO itself cannot compress files.
The exact behavior of CFITSIO in the latter case depends on the type of
ftp server running on the remote machine and how it is configured. In
some cases, if the file `myfile.fits.gz' exists on the remote machine,
then the server will copy it to the local machine. In other cases the
ftp server will automatically create and transmit a compressed version
of the file if only the uncompressed version exists. This can get
rather confusing, so users should use a certain amount of caution when
using the output file specifier with FTP or HTTP file types, to make
sure they get the behavior that they expect.
4.5 Template File Name when Creating a New File
When a new FITS file is created with a call to fits_create_file, the
name of a template file may be supplied in parentheses immediately
following the name of the new file to be created. This template is
used to define the structure of one or more HDUs in the new file. The
template file may be another FITS file, in which case the newly created
file will have exactly the same keywords in each HDU as in the template
FITS file, but all the data units will be filled with zeros. The
template file may also be an ASCII text file, where each line (in
general) describes one FITS keyword record. The format of the ASCII
template file is described below.
4.6 HDU Location Specification
The optional HDU location specifier defines which HDU (Header-Data
Unit, also known as an `extension') within the FITS file to initially
open. It must immediately follow the base file name (or the output
file name if present). If it is not specified then the first HDU (the
primary array) is opened. The HDU location specifier is required if
the colFilter, rowFilter, or binSpec specifiers are present, because
the primary array is not a valid HDU for these operations. The HDU may
be specified either by absolute position number, starting with 0 for
the primary array, or by reference to the HDU name, and optionally, the
version number and the HDU type of the desired extension. The location
of an image within a single cell of a binary table may also be
specified, as described below.
The absolute position of the extension is specified either by enclosing
the number in square brackets (e.g., `[1]' = the first extension
following the primary array) or by preceding the number with a plus sign
(`+1'). To specify the HDU by name, give the name of the desired HDU
(the value of the EXTNAME or HDUNAME keyword) and optionally the
extension version number (value of the EXTVER keyword) and the
extension type (value of the XTENSION keyword: IMAGE, ASCII or TABLE,
or BINTABLE), separated by commas and all enclosed in square brackets.
If the value of EXTVER and XTENSION are not specified, then the first
extension with the correct value of EXTNAME is opened. The extension
name and type are not case sensitive, and the extension type may be
abbreviated to a single letter (e.g., I = IMAGE extension or primary
array, A or T = ASCII table extension, and B = binary table BINTABLE
extension). If the HDU location specifier is equal to `[PRIMARY]' or
`[P]', then the primary array (the first HDU) will be opened.
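The case-insensitive matching of the extension type, including the single-letter abbreviations, can be sketched as a lookup that returns the XTENSION value being selected. The function hdu_type_name is hypothetical; note that in the FITS standard the XTENSION value of an ASCII table is 'TABLE', which is why both 'A' and 'T' map to it here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: map the extension-type field of an HDU location
 * specifier such as "[events,2,b]" to the XTENSION value it selects.
 * Returns NULL for an unrecognized type. */
const char *hdu_type_name(const char *t)
{
    char buf[16];
    size_t i, n = strlen(t);

    if (n == 0 || n >= sizeof buf)
        return NULL;
    for (i = 0; i <= n; i++)          /* upper-case copy, including the NUL */
        buf[i] = (char)toupper((unsigned char)t[i]);

    if (n == 1) {                     /* single-letter abbreviations */
        switch (buf[0]) {
        case 'I': return "IMAGE";     /* image extension or primary array */
        case 'A': case 'T': return "TABLE";
        case 'B': return "BINTABLE";
        default:  return NULL;
        }
    }
    if (strcmp(buf, "IMAGE") == 0) return "IMAGE";
    if (strcmp(buf, "ASCII") == 0 || strcmp(buf, "TABLE") == 0) return "TABLE";
    if (strcmp(buf, "BINTABLE") == 0) return "BINTABLE";
    return NULL;
}
```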
FITS images are most commonly stored in the primary array or an image
extension, but images can also be stored as a vector in a single cell
of a binary table (i.e. each row of the vector column contains a
different image). Such an image can be opened with CFITSIO by
specifying the desired column name and the row number after the binary
table HDU specifier as shown in the following examples. The column name
is separated from the HDU specifier by a semicolon and the row number
is enclosed in parentheses. In this case CFITSIO copies the image from
the table cell into a temporary primary array before it is opened. The
application program then just sees the image in the primary array,
without any extensions. The particular row to be opened may be
specified either by giving an absolute integer row number (starting
with 1 for the first row), or by specifying a boolean expression that
evaluates to TRUE for the desired row. The first row that satisfies
the expression will be used. The row selection expression has the same
syntax as described in the Row Filter Specifier section, below.
Examples:
myfile.fits[3] - open the 3rd HDU following the primary array
myfile.fits+3 - same as above, but using the FTOOLS-style notation
myfile.fits[EVENTS] - open the extension that has EXTNAME = 'EVENTS'
myfile.fits[EVENTS, 2] - same as above, but also requires EXTVER = 2
myfile.fits[events,2,b] - same, but also requires XTENSION = 'BINTABLE'
myfile.fits[3; images(17)] - opens the image in row 17 of the 'images'
column in the 3rd extension of the file.
myfile.fits[3; images(exposure > 100)] - as above, but opens the image
in the first row that has an 'exposure' column value
greater than 100.
4.7 Image Section
A virtual file containing a rectangular subsection of an image can be
extracted and opened by specifying the range of pixels (start:end)
along each axis to be extracted from the original image. One can also
specify an optional pixel increment (start:end:step) for each axis of
the input image. A pixel step = 1 will be assumed if it is not
specified. If the start pixel is larger than the end pixel, then the
image will be flipped (producing a mirror image) along that dimension.
An asterisk, '*', may be used to specify the entire range of an axis,
and '-*' will flip the entire axis. The input image can be in the
primary array, in an image extension, or contained in a vector cell of
a binary table. In the latter 2 cases the extension name or number must
be specified before the image section specifier.
Examples:
myfile.fits[1:512:2, 2:512:2] - open a 256x256 pixel image
consisting of the odd numbered columns (1st axis) and
the even numbered rows (2nd axis) of the image in the
primary array of the file.
myfile.fits[*, 512:256] - open an image consisting of all the columns
in the input image, but only rows 256 through 512.
The image will be flipped along the 2nd axis since
the starting pixel is greater than the ending pixel.
myfile.fits[*:2, 512:256:2] - same as above but keeping only
every other row and column in the input image.
myfile.fits[-*, *] - copy the entire image, flipping it along
the first axis.
myfile.fits[3][1:256,1:256] - opens a subsection of the image that
is in the 3rd extension of the file.
myfile.fits[4; images(12)][1:10,1:10] - open an image consisting
of the first 10 pixels in both dimensions. The original
image resides in the 12th row of the 'images' vector
column in the table in the 4th extension of the file.
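The dimensions quoted in these examples follow from counting the pixels in an inclusive range: an axis section start:end:step contains |end - start| / step + 1 pixels, using integer division. For example, 1:512:2 gives 511/2 + 1 = 256 pixels, and 512:256 gives 256 + 1 = 257 rows. The helper section_length below is a hypothetical illustration of that arithmetic, not a CFITSIO routine.

```c
#include <stdlib.h>

/* Number of pixels along one axis of the section start:end:step.
 * The range is inclusive, and start > end selects the same pixels
 * in reverse (flipped) order. Returns 0 for a non-positive step. */
long section_length(long start, long end, long step)
{
    if (step <= 0)
        return 0;
    return labs(end - start) / step + 1;
}
```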
When CFITSIO opens an image section it first creates a temporary file
containing the image section plus a copy of any other HDUs in the
file. This temporary file is then opened by the application program,
so it is not possible to write to or modify the input file when
specifying an image section. Note that CFITSIO automatically updates
the world coordinate system keywords in the header of the image
section, if they exist, so that the coordinate associated with each
pixel in the image section will be computed correctly.
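The text does not spell out how the world coordinate keywords are updated. For a simple linear axis described by CRVAL, CRPIX, and CDELT, the required change can be derived: original pixel p and section pixel q are related by p = start + (q - 1) * step (or p = start - (q - 1) * step for a flipped axis), so keeping world = CRVAL + CDELT * (p - CRPIX) unchanged requires the new CRPIX and CDELT computed below. This is an assumption-based sketch that ignores CD and PC matrices and other WCS conventions; section_axis and struct lin_axis are hypothetical names, and this is not CFITSIO's code.

```c
/* Linear world coordinate description of one image axis. */
struct lin_axis { double crval, crpix, cdelt; };

/* Rescale CRPIX and CDELT for the section start:end:step (step > 0)
 * so that every section pixel keeps the world coordinate it had in
 * the original image. CRVAL is unchanged. */
struct lin_axis section_axis(struct lin_axis a, long start, long end, long step)
{
    struct lin_axis s = a;
    if (start <= end) {                       /* p = start + (q - 1) * step */
        s.crpix = (a.crpix - start) / step + 1.0;
        s.cdelt = a.cdelt * step;
    } else {                                  /* flipped: p = start - (q - 1) * step */
        s.crpix = (start - a.crpix) / step + 1.0;
        s.cdelt = -a.cdelt * step;
    }
    return s;
}
```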
4.8 Column and Keyword Filtering Specification
The optional column/keyword filtering specifier is used to modify the
column structure and/or the header keywords in the HDU that was
selected with the previous HDU location specifier. This filtering
specifier must be enclosed in square brackets and can be distinguished
from a general row filter specifier (described below) by the fact that
it begins with the string 'col ' and is not immediately followed by an
equals sign. The original file is not changed by this filtering
operation, and instead the modifications are made on a copy of the
input FITS file (usually in memory), which also contains a copy of all
the other HDUs in the file. This temporary file is passed to the
application program and will persist only until the file is closed or
until the program exits, unless the outfile specifier (see above) is
also supplied.
The column/keyword filter can be used to perform the following
operations. More than one operation may be specified by separating
them with semi-colons.
- Copy only a specified list of columns to the filtered input file.
The list of column names should be separated by semi-colons. Wild card
characters may be used in the column names to match multiple columns.
If the expression contains both a list of columns to be included and
columns to be deleted, then all the columns in the original table
except the explicitly deleted columns will appear in the filtered
table (i.e., there is no need to explicitly list the columns to
be included if any columns are being deleted).
- Delete a column or keyword by listing the name preceded by a minus
sign or an exclamation mark (!), e.g., '-TIME' will delete the TIME
column if it exists, otherwise the TIME keyword. An error is returned
if neither a column nor keyword with this name exists. Note that the
exclamation point, '!', is a special UNIX character, so if it is used
on the command line rather than entered at a task prompt, it must be
preceded by a backslash to force the UNIX shell to pass it verbatim
to the application program.
- Rename an existing column or keyword with the syntax 'NewName ==
OldName'. An error is returned if neither a column nor keyword with
this name exists.
- Append a new column or keyword to the table. To create a column,
give the new name, optionally followed by the datatype in parentheses,
followed by a single equals sign and an expression to be used to
compute the value (e.g., 'newcol(1J) = 0' will create a new 32-bit
integer column called 'newcol' filled with zeros). The datatype is
specified using the same syntax that is allowed for the value of the
FITS TFORMn keyword (e.g., 'I', 'J', 'E', 'D', etc. for binary tables,
and 'I8', 'F12.3', 'E20.12', etc. for ASCII tables). If the datatype is
not specified then an appropriate datatype will be chosen depending on
the form of the expression (may be a character string, logical, bit, long
integer, or double column). An appropriate vector count (in the case
of binary tables) will also be added if not explicitly specified.
When creating a new keyword, the keyword name must be preceded by a
pound sign '#', and the expression must evaluate to a scalar
(i.e., cannot have a column name in the expression). The comment
string for the keyword may be specified in parentheses immediately
following the keyword name (instead of supplying a datatype as in
the case of creating a new column).
- Recompute (overwrite) the values in an existing column or keyword by
giving the name followed by an equals sign and an arithmetic
expression.
The expression that is used when appending or recomputing columns or
keywords can be arbitrarily complex and may be a function of other
header keyword values and other columns (in the same row). The full
syntax and available functions for the expression are described below
in the row filter specification section.
For complex or commonly used operations, one can also place the
operations into an external text file and import it into the column
filter using the syntax '[col @filename.txt]'. The operations can
extend over multiple lines of the file, but multiple operations must
still be separated by semicolons. Any lines in the external text file
that begin with 2 slash characters ('//') will be ignored and may be
used to add comments into the file.
Examples:
[col Time;rate] - only the Time and rate columns will
appear in the filtered input file.
[col Time;*raw] - include the Time column and any other
columns whose name ends with 'raw'.
[col -TIME; Good == STATUS] - deletes the TIME column and
renames the status column to 'Good'
[col PI=PHA * 1.1 + 0.2] - creates new PI column from PHA values
[col rate = rate/exposure] - recomputes the rate column by dividing
it by the EXPOSURE keyword value.
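The rules for recognizing a column filter and telling its operations apart can be sketched as below. The functions is_column_filter and classify_col_op are hypothetical: they compare 'col ' case-sensitively, look only at the first '=' of each operation, assume the operations have already been split at the semicolons, and do not parse the expressions themselves.

```c
#include <ctype.h>
#include <string.h>

enum col_op { COL_KEEP, COL_DELETE, COL_RENAME, COL_COMPUTE };

/* A bracketed specifier is a column filter if it begins with "col " and is
 * not followed by '='. Skipping blanks before the '=' test is an assumption
 * about what "immediately followed" means. */
int is_column_filter(const char *spec)
{
    const char *p;

    if (strncmp(spec, "col ", 4) != 0)
        return 0;
    p = spec + 4;
    while (*p == ' ')
        p++;
    return *p != '=';
}

/* Classify one operation of a column filter. */
enum col_op classify_col_op(const char *op)
{
    const char *eq;

    while (isspace((unsigned char)*op))
        op++;
    if (*op == '-' || *op == '!')
        return COL_DELETE;            /* "-TIME" or "!TIME" */
    eq = strchr(op, '=');
    if (eq == NULL)
        return COL_KEEP;              /* a name, possibly with wildcards */
    if (eq[1] == '=')
        return COL_RENAME;            /* "NewName == OldName" */
    return COL_COMPUTE;               /* "name = expr" or "#KEY = expr" */
}
```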
4.9 Row Filtering Specification
When entering the name of a FITS table that is to be opened by a
program, an optional row filter may be specified to select a subset
of the rows in the table. A temporary new FITS file is created on
the fly which contains only those rows for which the row filter
expression evaluates to true. (The primary array and any other
extensions in the input file are also copied to the temporary
file). The original FITS file is closed and the new virtual file
is opened by the application program. The row filter expression is
enclosed in square brackets following the file name and extension
name (e.g., 'file.fits[events][GRADE==50]' selects only those rows
where the GRADE column value equals 50). When dealing with tables
where each row has an associated time and/or 2D spatial position,
the row filter expression can also be used to select rows based on
the times in a Good Time Intervals (GTI) extension, or on spatial
position as given in a SAO-style region file.
4.9.1 General Syntax
The row filtering expression can be an arbitrarily complex series
of operations performed on constants, keyword values, and column
data taken from the specified FITS TABLE extension. The expression
must evaluate to a boolean value for each row of the table, where
a value of FALSE means that the row will be excluded.
For complex or commonly used filters, one can place the expression
into a text file and import it into the row filter using the syntax
'[@filename.txt]'. The expression can be arbitrarily complex and
extend over multiple lines of the file. Any lines in the external
text file that begin with 2 slash characters ('//') will be ignored
and may be used to add comments into the file.
Keyword and column data are referenced by name. Any string of
characters not surrounded by quotes (i.e., a constant string) or
followed by an open parenthesis (i.e., a function name) will be
initially interpreted as a column name and its contents for the
current row inserted into the expression. If no such column exists,
a keyword of that name will be searched for and its value used, if
found. To force the name to be interpreted as a keyword (in case
there is both a column and keyword with the same name), precede the
keyword name with a single pound sign, '#', as in '#NAXIS2'. Because
FITS column and keyword names may contain almost any character, a
name that contains a space or a character that could be mistaken for
an arithmetic operator must be enclosed in '$' characters, as in
$MAX PHA$ or #$MAX-PHA$. Names are case insensitive.
To access a table entry in a row other than the current one, follow
the column's name with a row offset within curly braces. For
example, 'PHA{-3}' will evaluate to the value of column PHA, 3 rows
above the row currently being processed. One cannot specify an
absolute row number, only a relative offset. Rows that fall outside
the table will be treated as undefined, or NULLs.
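The relative-offset rule can be sketched in C, assuming the column has already been read into an array of doubles. The function value_at_offset() is a hypothetical helper. Its false return value stands in for the NULL that the expression evaluator produces for rows outside the table.

```c
#include <stdbool.h>

/* Hypothetical helper illustrating 'COL{offset}': fetch the value that lies
   `offset` rows away from the current (1-based) row. Rows outside 1..nrows
   are undefined, which is signalled here by returning false; `*value` is
   then left untouched. */
bool value_at_offset(const double *col, long nrows, long row, long offset,
                     double *value)
{
    long target = row + offset;   /* offsets are relative, never absolute */
    if (target < 1 || target > nrows)
        return false;             /* falls outside the table: NULL */
    *value = col[target - 1];     /* convert 1-based row to 0-based index */
    return true;
}
```

With a five-row table, 'PHA{-3}' evaluated at row 4 yields the value from row 1. Evaluated at row 2, it is undefined.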
Boolean operators can be used in the expression in either their
Fortran or C forms. The following boolean operators are available:
"equal" .eq. .EQ. == "not equal" .ne. .NE. !=
"less than" .lt. .LT. < "less than/equal" .le. .LE. <= =<
"greater than" .gt. .GT. > "greater than/equal" .ge. .GE. >= =>
"or" .or. .OR. || "and" .and. .AND. &&
"negation" .not. .NOT. ! "approx. equal(1e-7)" ~
Note that the exclamation
point, '!', is a special UNIX character, so if it is used on the
command line rather than entered at a task prompt, it must be preceded
by a backslash to force the UNIX shell to ignore it.
The expression may also include arithmetic operators and functions.
Trigonometric functions use radians, not degrees. The following
arithmetic operators and functions can be used in the expression
(function names are case insensitive):
"addition" + "subtraction" -
"multiplication" * "division" /
"negation" - "exponentiation" ** ^
"absolute value" abs(x) "cosine" cos(x)
"sine" sin(x) "tangent" tan(x)
"arc cosine" arccos(x) "arc sine" arcsin(x)
"arc tangent" arctan(x) "arc tangent" arctan2(x,y)
"exponential" exp(x) "square root" sqrt(x)
"natural log" log(x) "common log" log10(x)
"modulus" i % j "random # [0.0,1.0)" random()
"minimum" min(x,y) "maximum" max(x,y)
"if-then-else" b?x:y
An alternate syntax for the min and max functions has only a single
argument which should be a vector value (see below). The result
will be the minimum/maximum element contained within the vector.
There are three functions that are primarily for use with SAO region
files and the FSAOI task, but they can be used directly. They
return a boolean true or false depending on whether a two
dimensional point is in the region or not:
"point in a circular region"
circle(xcntr,ycntr,radius,Xcolumn,Ycolumn)
"point in an elliptical region"
ellipse(xcntr,ycntr,xhlf_wdth,yhlf_wdth,rotation,Xcolumn,Ycolumn)
"point in a rectangular region"
box(xcntr,ycntr,xfll_wdth,yfll_wdth,rotation,Xcolumn,Ycolumn)
where
(xcntr,ycntr) are the (x,y) position of the center of the region
(xhlf_wdth,yhlf_wdth) are the (x,y) half widths of the region
(xfll_wdth,yfll_wdth) are the (x,y) full widths of the region
(radius) is half the diameter of the circle
(rotation) is the angle (in degrees) by which the region is rotated
about (xcntr,ycntr)
(Xcolumn,Ycolumn) are the (x,y) coordinates to test, usually column
names
NOTE: each parameter can itself be an expression, not merely a
column name or constant.
There is also a function for testing whether two values are close to
each other, i.e., whether they are "near" each other to within a
user-specified tolerance. The arguments value_1 and value_2 can be
integer or real and represent the two values whose proximity is
being tested; the tolerance can also be integer or real:
near(value_1, value_2, tolerance)
When a NULL, or undefined, value is encountered in the FITS table,
the expression will evaluate to NULL unless the undefined value is
not actually required for evaluation, e.g. "TRUE .or. NULL"
evaluates to TRUE. The following two functions allow some NULL
detection and handling: ISNULL(x) and DEFNULL(x,y). The former
returns a boolean value of TRUE if the argument x is NULL. The
latter "defines" a value to be substituted for NULL values; it
returns the value of x if x is not NULL, otherwise it returns the
value of y.
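The NULL rules can be modeled with a small three-valued type. The following sketch is an illustration only, and the nbool type and its functions are hypothetical. An operator returns NULL only when the undefined operand could change the result. This matches the "TRUE .or. NULL" example, and by the same reasoning "FALSE .and. NULL" is FALSE. ISNULL(x) corresponds to reading the is_null field.

```c
#include <stdbool.h>

/* Hypothetical NULL-aware boolean: either NULL, or a defined TRUE/FALSE. */
typedef struct {
    bool is_null;
    bool value;   /* meaningful only when !is_null */
} nbool;

/* OR needs the other operand only when the known one is FALSE. */
nbool nbool_or(nbool a, nbool b)
{
    if ((!a.is_null && a.value) || (!b.is_null && b.value))
        return (nbool){false, true};     /* TRUE .or. NULL is TRUE */
    if (a.is_null || b.is_null)
        return (nbool){true, false};     /* result undetermined: NULL */
    return (nbool){false, false};
}

/* AND needs the other operand only when the known one is TRUE. */
nbool nbool_and(nbool a, nbool b)
{
    if ((!a.is_null && !a.value) || (!b.is_null && !b.value))
        return (nbool){false, false};    /* FALSE .and. NULL is FALSE */
    if (a.is_null || b.is_null)
        return (nbool){true, false};
    return (nbool){false, true};
}

/* DEFNULL(x, y): x when x is defined, otherwise the substitute y. */
nbool defnull(nbool x, nbool y)
{
    return x.is_null ? y : x;
}
```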
The following type casting operators are available, where the
enclosing parentheses are required, following C language usage. The
integer to real cast converts values to double precision:
"real to integer" (int) x (INT) x
"integer to real" (float) i (FLOAT) i
Bit masks can be used to select out rows from bit columns (TFORMn =
#X) in FITS files. To represent the mask, binary, octal, and hex
formats are allowed:
binary: b0110xx1010000101xxxx0001
octal: o720x1 -> (b111010000xxx001)
hex: h0FxD -> (b00001111xxxx1101)
In all the representations, an x or X is allowed in the mask as a
wild card. Note that the x represents a different number of wild
card bits in each representation. All representations are case
insensitive.
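One octal digit covers 3 bits and one hex digit covers 4, so each 'x' in those forms expands to 3 or 4 wildcard bits. The hypothetical helper expand_mask() makes this expansion explicit. It reproduces the two conversions listed above and rejects the binary 'b' form, which needs no expansion.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: expand an octal ('o...') or hex ('h...') bit mask
   into the equivalent string of binary digits. Each 'x' stands for 3
   wildcard bits in octal and 4 in hex. Returns 0 on success, or -1 on
   bad input or when `out` is too small. */
int expand_mask(const char *mask, char *out, size_t outsz)
{
    int bits;
    char kind = (char)tolower((unsigned char)mask[0]);
    if (kind == 'o') bits = 3;
    else if (kind == 'h') bits = 4;
    else return -1;

    size_t used = 0;
    for (const char *p = mask + 1; *p != '\0'; p++) {
        int c = tolower((unsigned char)*p);
        int digit;
        if (used + (size_t)bits + 1 > outsz) return -1;
        if (c == 'x') {                        /* wildcard digit */
            memset(out + used, 'x', (size_t)bits);
            used += (size_t)bits;
            continue;
        }
        if (isdigit(c)) digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else return -1;
        if (digit >= (1 << bits)) return -1;   /* e.g. '8' in an octal mask */
        for (int b = bits - 1; b >= 0; b--)    /* most significant bit first */
            out[used++] = (char)('0' + ((digit >> b) & 1));
    }
    if (used + 1 > outsz) return -1;
    out[used] = '\0';
    return 0;
}
```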
To construct a boolean expression with a mask, compare a bit table
column with the mask using the boolean equal operator described above.
For example, if you had a 7 bit column named flags in a FITS table and
wanted all rows having the bit pattern 0010011, the selection
expression would be:
flags == b0010011
or
flags .eq. b10011
It is also possible to test whether a range of bits is less than,
less than or equal to, greater than, or greater than or equal to a
particular binary value:
flags <= bxxx010xx
flags .gt. bxxx100xx
flags .le. b1xxxxxxx
Notice the use of the x bit value to limit the range of bits being
compared.
It is not necessary to specify the leading (most significant) zero
(0) bits in the mask, as shown in the 'flags .eq. b10011' example above.
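The matching rule for bit masks can be sketched as a string comparison. In the hypothetical helper mask_equal(), the mask is right-aligned against the field, leading positions that the mask does not cover are treated as 0 bits, and 'x' matches either bit value. Rejecting a mask that is wider than the field is an assumption of this sketch and is not documented CFITSIO behavior.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: compare a bit field (binary digits, most significant
   first) with a mask such as "10011" or "xxx010xx". The mask is
   right-aligned against the field. Uncovered leading positions must be '0',
   and 'x' or 'X' matches either bit. */
bool mask_equal(const char *field, const char *mask)
{
    size_t nf = strlen(field), nm = strlen(mask);
    if (nm > nf) return false;          /* assumption: wider mask never matches */
    for (size_t i = 0; i < nf; i++) {
        char want;
        if (i < nf - nm) want = '0';     /* implicit leading zero bit */
        else want = mask[i - (nf - nm)];
        if (want == 'x' || want == 'X') continue;
        if (field[i] != want) return false;
    }
    return true;
}
```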
Bitwise AND, OR and NOT operations are also possible on two or
more bit fields using the '&'(AND), '|'(OR), and the '!'(NOT)
operators. All of these operators result in a bit field which can
then be used with the equal operator. For example:
(!flags) == b1101100
(flags & b1000001) == bx000001
Bit fields can be appended as well using the '+' operator. Strings
can be concatenated this way, too.
In addition, several constants are built in for use in numerical
expressions:
#pi 3.1415... #e 2.7182...
#deg #pi/180 #row current row number
#null undefined value #snull undefined string
A string constant must be enclosed in quotes as in 'Crab'. The
"null" constants are useful for conditionally setting table values
to a NULL, or undefined, value (e.g., "col1==-99 ? #NULL : col1").
4.9.2 Vector Columns
Vector columns can also be used in building the expression. No
special syntax is required if one wants to operate on all elements
of the vector. Simply use the column name as for a scalar column.
Vector columns can be freely intermixed with scalar columns or
constants in virtually all expressions. The result will be of the
same dimension as the vector. Two vectors in an expression, though,
need to have the same number of elements and have the same
dimensions. The only places a vector column cannot be used (for
now, anyway) are the SAO region functions and the NEAR boolean
function.
Arithmetic and logical operations are all performed on an element by
element basis. Comparing two vector columns, e.g., "COL1 == COL2",
thus results in another vector of boolean values indicating which
elements of the two vectors are equal. Two functions are available
which operate on vectors: SUM(x) and NELEM(x). The former
literally sums all the elements in x, returning a scalar value. If
x is a boolean vector, SUM returns the number of TRUE elements.
The latter, NELEM, returns the number of elements in vector x.
(NELEM also operates on bit and string columns, returning their
column widths.) As an example, to test whether all elements of two
vectors satisfy a given logical comparison, one can use the
expression
SUM( COL1 > COL2 ) == NELEM( COL1 )
which will return TRUE if all elements of COL1 are greater than
their corresponding elements in COL2.
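In C terms, the element-by-element comparison produces one boolean per element, and SUM() counts the TRUE ones. Comparing that count with NELEM() therefore tests whether every element passed. The functions below are hypothetical helpers that operate on a single row's vectors.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: SUM( COL1 > COL2 ) for one row. Each element-wise
   comparison contributes 1 when TRUE and 0 when FALSE. */
size_t sum_greater(const double *col1, const double *col2, size_t nelem)
{
    size_t ntrue = 0;
    for (size_t i = 0; i < nelem; i++)
        ntrue += (col1[i] > col2[i]);
    return ntrue;
}

/* SUM( COL1 > COL2 ) == NELEM( COL1 ): true only if every element passes. */
bool all_greater(const double *col1, const double *col2, size_t nelem)
{
    return sum_greater(col1, col2, nelem) == nelem;
}
```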
To specify a single element of a vector, give the column name
followed by a comma-separated list of coordinates enclosed in
square brackets. For example, if a vector column named PHAS exists
in the table as a one dimensional, 256 component list of numbers
from which you wanted to select the 57th component for use in the
expression, then PHAS[57] would do the trick. Higher dimensional
arrays of data may appear in a column. But in order to interpret
them, the TDIMn keyword must appear in the header. Assuming that a
(4,4,4,4) array is packed into each row of a column named ARRAY4D,
the (1,2,3,4) component element of each row is accessed by
ARRAY4D[1,2,3,4]. Arrays up to dimension 5 are currently
supported. Each vector index can itself be an expression, although
it must evaluate to an integer value within the bounds of the
vector. Vector columns which contain spaces or arithmetic operators
must have their names enclosed in "$" characters as with
$ARRAY-4D$[1,2,3,4].
A more C-like syntax for specifying vector indices is also
available. The element used in the preceding example alternatively
could be specified with the syntax ARRAY4D[4][3][2][1]. Note the
reverse order of indices (as in C), as well as the fact that the
values are still one-based (as in Fortran -- adopted to avoid
ambiguity for 1D vectors). With this syntax, one does not need to
specify all of the indices. To extract a 3D slice of this 4D
array, use ARRAY4D[4].
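Both index notations address the same element of the flattened row. The following sketch assumes TDIMn-style storage, in which the first index varies fastest. The functions fortran_offset() and c_style_offset() are hypothetical names, and the offset they return is 0-based.

```c
/* Hypothetical helper: map 1-based Fortran-order indices (first index
   varies fastest, as described by TDIMn) to a 0-based offset within the
   flattened row. */
long fortran_offset(const long *idx, const long *dims, int ndim)
{
    long offset = 0;
    for (int k = ndim - 1; k >= 0; k--)     /* Horner form, last axis first */
        offset = offset * dims[k] + (idx[k] - 1);
    return offset;
}

/* Hypothetical helper: the C-like form ARRAY4D[i4][i3][i2][i1] lists the
   same indices in reverse order. Reversing them recovers the Fortran-order
   list. Returns -1 for unsupported dimensionality. */
long c_style_offset(const long *c_idx, const long *dims, int ndim)
{
    long idx[5];                             /* up to 5 dimensions */
    if (ndim < 1 || ndim > 5)
        return -1;
    for (int k = 0; k < ndim; k++)
        idx[k] = c_idx[ndim - 1 - k];
    return fortran_offset(idx, dims, ndim);
}
```

For the (4,4,4,4) ARRAY4D example, ARRAY4D[1,2,3,4] and ARRAY4D[4][3][2][1] both resolve to offset 228 of the 256 elements in the row.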
Variable-length vector columns are not supported.
Vectors can be manually constructed within the expression using a
comma-separated list of elements surrounded by curly braces ('{}').
For example, '{1,3,6,1}' is a 4-element vector containing the values
1, 3, 6, and 1. The vector can contain only boolean, integer, and
real values (or expressions). The elements will be promoted to the
highest datatype present. Any elements that are themselves vectors
will be expanded, with each of their elements becoming an element of
the constructed vector.
4.9.3 Good Time Interval Filtering
A common filtering method involves selecting rows which have a time
value which lies within what is called a Good Time Interval or GTI.
The time intervals are defined in a separate FITS table extension
which contains 2 columns giving the start and stop time of each
good interval. The filtering operation accepts only those rows of
the input table which have an associated time which falls within
one of the time intervals defined in the GTI extension. A high
level function, gtifilter(a,b,c,d), is available which evaluates
each row of the input table and returns TRUE or FALSE depending on
whether the row is inside or outside the good time interval. The
syntax is
gtifilter( [ "gtifile" [, expr [, "STARTCOL", "STOPCOL" ] ] ] )
where each "[]" marks optional parameters. Note that the quotes
around the gtifile and the START/STOP column names are required. The
gtifile, if specified, can be one of the following: blank (""), which
means to use the first extension with the name "*GTI*" in the current
file; a plain extension specifier (e.g., "+2", "[2]", or "[STDGTI]"),
which will be used to select an extension in the current file; or a
regular filename with or without an extension specifier, where a
filename without a specifier means to use the first extension in that
file whose name matches "*GTI*". Expr can be any arithmetic
expression, including simply the time column name. A vector time
expression will produce a vector boolean result. STARTCOL and STOPCOL
are the names of the START/STOP columns in the GTI extension. If one
of them is specified, they both must be.
In its simplest form, no parameters need to be provided -- default
values will be used. The expression "gtifilter()" is equivalent to
gtifilter( "", TIME, "*START*", "*STOP*" )
This will search the current file for a GTI extension, filter the
TIME column in the current table, using START/STOP times taken from
columns in the GTI extension with names containing the strings
"START" and "STOP". The wildcards ('*') allow slight variations in
naming conventions such as "TSTART" or "STARTTIME". The same
default values apply for unspecified parameters when the first one
or two parameters are specified. The function automatically
searches for TIMEZERO (or TIMEZERI and TIMEZERF) keywords in the
current and GTI extensions, applying a relative time offset, if
necessary.
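The per-row test that gtifilter() performs can be sketched as a search over the START/STOP pairs. This is not CFITSIO's code. The function in_gti() and its timezero_offset parameter are hypothetical, and the offset stands in for whatever correction the TIMEZERO-type keywords imply. Treating both ends of each interval as inside the interval is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is time `t`, shifted by `timezero_offset`, inside
   any of the `ngti` good time intervals [start[i], stop[i]]? Both interval
   ends are assumed to count as inside. */
bool in_gti(double t, double timezero_offset,
            const double *start, const double *stop, size_t ngti)
{
    double tt = t + timezero_offset;
    for (size_t i = 0; i < ngti; i++)
        if (tt >= start[i] && tt <= stop[i])
            return true;
    return false;
}
```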
4.9.4 Spatial Region Filtering
Another common filtering method selects rows based on whether the
spatial position associated with each row is located within a given
2-dimensional region. The syntax for this high-level filter is
regfilter( "regfilename" [ , Xexpr, Yexpr [ , "wcs cols" ] ] )
where each "[]" marks optional parameters. The region file name
is required and must be enclosed in quotes. The remaining
parameters are optional. The region file is an ASCII text file
which contains a list of one or more geometric shapes (circle,
ellipse, box, etc.) which define a region on the celestial sphere
or an area within a particular 2D image. The region file is
typically generated using an image display program such as fv/POW
(distributed by the HEASARC), or ds9 (distributed by the Smithsonian
Astrophysical Observatory). Users should refer to the documentation
provided with these programs for more details on the syntax used in
the region files.
In its simplest form (e.g., regfilter("region.reg")), the
coordinates in the default 'X' and 'Y' columns will be used to
determine if each row is inside or outside the area specified in
the region file. Alternate position column names, or expressions,
may be entered if needed, as in
regfilter("region.reg", XPOS, YPOS)
Region filtering can be applied most unambiguously if the positions
in the region file and in the table to be filtered are both given in
terms of absolute celestial coordinate units. In this case the
locations and sizes of the geometric shapes in the region file are
specified in angular units on the sky (e.g., positions given in
R.A. and Dec. and sizes in arcseconds or arcminutes). Similarly,
each row of the filtered table will have a celestial coordinate
associated with it. This association is usually implemented using
a set of so-called 'World Coordinate System' (or WCS) FITS keywords
that define the coordinate transformation that must be applied to
the values in the 'X' and 'Y' columns to calculate the coordinate.
Alternatively, one can perform spatial filtering using unitless
'pixel' coordinates for the regions and row positions. In this
case the user must be careful to ensure that the positions in the 2
files are self-consistent. A typical problem is that the region
file may be generated using a binned image, but the unbinned
coordinates are given in the event table. The ROSAT events files,
for example, have X and Y pixel coordinates that range from 1 to
15360. These coordinates are typically binned by a factor of 32 to
produce a 480x480 pixel image. If one then uses a region file
generated from this image (in image pixel units) to filter the
ROSAT events file, then the X and Y column values must be converted
to corresponding pixel units as in:
regfilter("rosat.reg", X/32.+.5, Y/32.+.5)
Note that this binning conversion is not necessary if the region
file is specified using celestial coordinate units instead of pixel
units because CFITSIO is then able to directly compare the
celestial coordinate of each row in the table with the celestial
coordinates in the region file without having to know anything
about how the image may have been binned.
The last "wcs cols" parameter should rarely be needed. If supplied,
this string contains the names of the 2 columns (space or comma
separated) which have the associated WCS keywords. If not supplied,
the filter will scan the X and Y expressions for column names.
If only one is found in each expression, those columns will be
used, otherwise an error will be returned.
These region shapes are supported (names are case insensitive):
Point ( X1, Y1 )
Line ( X1, Y1, X2, Y2 )
Polygon ( X1, Y1, X2, Y2, ... )
Rectangle ( X1, Y1, X2, Y2, A )
Box ( Xc, Yc, Wdth, Hght, A )
Diamond ( Xc, Yc, Wdth, Hght, A )
Circle ( Xc, Yc, R )
Annulus ( Xc, Yc, Rin, Rout )
Ellipse ( Xc, Yc, Rx, Ry, A )
Elliptannulus ( Xc, Yc, Rinx, Riny, Routx, Routy, Ain, Aout )
Sector ( Xc, Yc, Amin, Amax )
A Point defines a one pixel square region and a Line defines a one
pixel wide region. Each of the remaining shapes defines its interior,
with the boundary considered to be within the region.
where (Xc,Yc) is the coordinate of the shape's center; (X#,Y#) are
the coordinates of the points that define the shape's edges (its
vertices or endpoints); Rxxx are the shapes' various radii or
semimajor/minor axes; and Axxx are the angles of rotation (or
bounding angles for Sector) in degrees. For rotated shapes, the
rotation angle can be left off, indicating no rotation. Common
alternate names for the regions can also be used: rotbox = box;
rotrectangle = rectangle; (rot)rhombus = (rot)diamond; and pie
= sector. When a shape's name is preceded by a minus sign, '-',
the defined region is instead the area *outside* its boundary (i.e.,
the region is inverted). All the shapes within a single region
file are OR'd together to create the region, and the order is
significant. If the first region in the file is an excluded region,
a dummy included region covering the whole detector is inserted in
front of it. Each region specification, as it is processed, then
overrides any selection made by previous regions for the area inside
that region. One consequence is that if an excluded region lies
completely inside a subsequent included region, the excluded region
is ignored.
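One reading of these ordering rules is that each shape containing a point updates a running include/exclude verdict for that point. The following sketch uses circles only, to keep it short. The shape type and the in_region() function are hypothetical, and CFITSIO's own region code may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circle-only shape. `exclude` marks a shape written with a
   leading minus sign, e.g. "-circle(...)". */
typedef struct {
    double xc, yc, r;
    bool exclude;
} shape;

static bool inside(const shape *s, double x, double y)
{
    double dx = x - s->xc, dy = y - s->yc;
    return dx * dx + dy * dy <= s->r * s->r;
}

/* Process shapes in file order. Each shape overrides the verdict for points
   within its boundary. If the first shape is excluded, start from an
   implicit include of the whole detector. */
bool in_region(const shape *shapes, size_t n, double x, double y)
{
    bool selected = (n > 0 && shapes[0].exclude);
    for (size_t i = 0; i < n; i++)
        if (inside(&shapes[i], x, y))
            selected = !shapes[i].exclude;
    return selected;
}
```

Under this model, a small excluded circle followed by a larger included circle around it has no effect, because the later include overrides the earlier exclude everywhere inside it.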
The positional coordinates may be given either in pixel units,
decimal degrees or hh:mm:ss.s, dd:mm:ss.s units. The shape sizes
may be given in pixels, degrees, arcminutes, or arcseconds. Look
at examples of region files produced by fv/POW or ds9 for further
details of the region file format.
4.9.5 Example Row Filters
[ binary && mag <= 5.0] - Extract all binary stars brighter
than fifth magnitude (note that
the initial space is necessary to
prevent it from being treated as a
binning specification)
[#row >= 125 && #row <= 175] - Extract row numbers 125 through 175
[IMAGE[4,5] .gt. 100] - Extract all rows that have the
(4,5) component of the IMAGE column
greater than 100
[abs(sin(theta * #deg)) < 0.5] - Extract all rows having the
absolute value of the sine of theta
less than a half where the angles
are tabulated in degrees
[SUM( SPEC > 3*BACKGRND )>=1] - Extract all rows containing a
spectrum, held in vector column
SPEC, with at least one value 3
times greater than the background
level held in a keyword, BACKGRND
[VCOL=={1,4,2}] - Extract all rows whose vector column
VCOL contains the 3 elements 1, 4, and
2.
[@rowFilter.txt] - Extract rows using the expression
contained within the text file
rowFilter.txt
[gtifilter()] - Search the current file for a GTI
extension, filter the TIME
column in the current table, using
START/STOP times taken from
columns in the GTI extension
[regfilter("pow.reg")] - Extract rows which have a coordinate
(as given in the X and Y columns)
within the spatial region specified
in the pow.reg region file.
[regfilter("pow.reg", Xs, Ys)] - Same as above, except that the
Xs and Ys columns will be used to
determine the coordinate of each
row in the table.
4.10 Binning or Histogramming Specification
The optional binning specifier is enclosed in square brackets and can
be distinguished from a general row filter specification by the fact
that it begins with the keyword 'bin' not immediately followed by an
equals sign. When binning is specified, a temporary N-dimensional FITS
primary array is created by computing the histogram of the values in
the specified columns of a FITS table extension. After the histogram
is computed the input FITS file containing the table is then closed and
the temporary FITS primary array is opened and passed to the
application program. Thus, the application program never sees the
original FITS table and only sees the image in the new temporary file
(which has no additional extensions). Obviously, the application
program must be expecting to open a FITS image and not a FITS table in
this case.
The data type of the FITS histogram image may be specified by appending
'b' (for 8-bit byte), 'i' (for 16-bit integer), 'j' (for 32-bit
integer), 'r' (for 32-bit floating point), or 'd' (for 64-bit double
precision floating point) to the 'bin' keyword (e.g. '[binr X]'
creates a real floating point image). If the datatype is not
explicitly specified then a 32-bit integer image will be created by
default, unless the weighting option is also specified in which case
the image will have a 32-bit floating point data type by default.
The histogram image may have from 1 to 4 dimensions (axes), depending
on the number of columns that are specified. The general form of the
binning specification is:
[bin{bijrd} Xcol=min:max:binsize, Ycol= ..., Zcol=..., Tcol=...; weight]
in which up to 4 columns, each corresponding to an axis of the image,
are listed. The column names are case insensitive, and the column
number may be given instead of the name, preceded by a pound sign
(e.g., [bin #4=1:512]). If the column name is not specified, then
CFITSIO will first try to use the 'preferred column' as specified by
the CPREF keyword if it exists (e.g., CPREF = 'DETX,DETY'), otherwise
column names 'X', 'Y', 'Z', and 'T' will be assumed for each of the 4
axes, respectively.
Each column name may be followed by an equals sign and then the lower
and upper range of the histogram, and the size of the histogram bins,
separated by colons. Spaces are allowed before and after the equals
sign but not within the 'min:max:binsize' string. The min, max and
binsize values may be integer or floating point numbers, or they may be
the names of keywords in the header of the table. If the latter, then
the value of that keyword is substituted into the expression.
Default values for the min, max and binsize quantities will be
used if not explicitly given in the binning expression as shown
in these examples:
[bin x = :512:2] - use default minimum value
[bin x = 1::2] - use default maximum value
[bin x = 1:512] - use default bin size
[bin x = 1:] - use default maximum value and bin size
[bin x = :512] - use default minimum value and bin size
[bin x = 2] - use default minimum and maximum values
[bin x] - use default minimum, maximum and bin size
[bin 4] - default 2-D image, bin size = 4 in both axes
[bin] - default 2-D image
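The defaulting rules in these examples amount to recording which of the three colon-separated fields are present. The following parser is a hypothetical sketch and not CFITSIO's parser. It handles numeric fields only, so keyword names such as TSTART are not resolved. As in '[bin x = 2]', a lone number is treated as the bin size, and any embedded space makes the string invalid.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: which parts of "min:max:binsize" were given. */
typedef struct {
    bool has_min, has_max, has_size;
    double min, max, size;
} bin_range;

/* Parse one field. An empty field means "use the default". */
static bool parse_field(const char *s, size_t len, bool *has, double *val)
{
    char buf[64];
    char *end;
    *has = false;
    if (len == 0) return true;
    if (len >= sizeof buf) return false;
    memcpy(buf, s, len);
    buf[len] = '\0';
    *val = strtod(buf, &end);
    if (end == buf || *end != '\0') return false;  /* not a plain number */
    *has = true;
    return true;
}

/* Hypothetical sketch: parse the text that follows "Xcol=". */
bool parse_bin_range(const char *spec, bin_range *r)
{
    const char *c1 = strchr(spec, ':');
    memset(r, 0, sizeof *r);
    if (c1 == NULL)                                /* no colon: binsize only */
        return parse_field(spec, strlen(spec), &r->has_size, &r->size);

    const char *c2 = strchr(c1 + 1, ':');
    if (!parse_field(spec, (size_t)(c1 - spec), &r->has_min, &r->min))
        return false;
    if (c2 == NULL)                                /* "min:max" form */
        return parse_field(c1 + 1, strlen(c1 + 1), &r->has_max, &r->max);
    return parse_field(c1 + 1, (size_t)(c2 - c1 - 1), &r->has_max, &r->max)
        && parse_field(c2 + 1, strlen(c2 + 1), &r->has_size, &r->size);
}
```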
CFITSIO will use the value of the TLMINn, TLMAXn, and TDBINn keywords,
if they exist, for the default min, max, and binsize, respectively. If
they do not exist then CFITSIO will use the actual minimum and maximum
values in the column for the histogram min and max values. The default
binsize will be set to 1, or (max - min) / 10., whichever is smaller,
so that the histogram will have at least 10 bins along each axis.
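The default bin size rule fits in a few lines of C. The bin_count() helper is a hypothetical addition that rounds to the nearest whole number of bins. CFITSIO's exact rounding may differ.

```c
/* Default bin size when TDBINn is absent: 1 or (max - min) / 10,
   whichever is smaller. */
double default_binsize(double min, double max)
{
    double tenth = (max - min) / 10.0;
    return tenth < 1.0 ? tenth : 1.0;
}

/* Hypothetical helper: number of bins spanning [min, max], rounded to the
   nearest integer (an assumption of this sketch). */
long bin_count(double min, double max, double binsize)
{
    return (long)((max - min) / binsize + 0.5);
}
```

For example, a range of 8000. to 8100. with 0.1 unit bins gives 1000 bins. A column spanning 0 to 5 with no TDBINn keyword gets a default bin size of 0.5, and therefore 10 bins.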
A shortcut notation is allowed if all the columns/axes have the same
binning specification. In this case all the column names may be listed
within parentheses, followed by the (single) binning specification, as
in:
[bin (X,Y)=1:512:2]
[bin (X,Y) = 5]
The optional weighting factor is the last item in the binning specifier
and, if present, is separated from the list of columns by a
semi-colon. As the histogram is accumulated, this weight is used to
increment the value of the appropriate bin in the histogram. If the
weighting factor is not specified, then the default weight = 1 is
assumed. The weighting factor may be a constant integer or floating
point number, or the name of a keyword containing the weighting value.
Or the weighting factor may be the name of a table column in which case
the value in that column, on a row by row basis, will be used.
In some cases, the column or keyword may give the reciprocal of the
actual weight value that is needed. In this case, precede the weight
keyword or column name by a slash '/' to tell CFITSIO to use the
reciprocal of the value when constructing the histogram.
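The weighting rules can be sketched as a 1-D accumulation loop. In the hypothetical helper accumulate(), a NULL weight array means the default weight of 1. A per-row array stands in for a weight column (a constant or keyword weight would simply repeat one value), and the reciprocal flag models the '/' prefix. Treating each bin as half-open and skipping out-of-range values are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: accumulate a 1-D histogram of x[0..n-1] into
   hist[0..nbins-1]. The bins start at `min` and are `binsize` wide. Each
   value adds weight[i], or 1/weight[i] when `reciprocal` is set; a NULL
   weight array means a weight of 1. Values outside the bins are skipped. */
void accumulate(const double *x, const double *weight, size_t n,
                double min, double binsize, double *hist, long nbins,
                bool reciprocal)
{
    for (size_t i = 0; i < n; i++) {
        double pos = (x[i] - min) / binsize;
        if (pos < 0.0 || pos >= (double)nbins)
            continue;                     /* outside [min, min + nbins*binsize) */
        long bin = (long)pos;
        double w = weight ? weight[i] : 1.0;
        hist[bin] += reciprocal ? 1.0 / w : w;
    }
}
```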
For complex or commonly used histograms, one can also place its
description into a text file and import it into the binning
specification using the syntax '[bin @filename.txt]'. The file's
contents can extend over multiple lines, although it must still
conform to the no-spaces rule for the min:max:binsize syntax and each
axis specification must still be comma-separated. Any lines in the
external text file that begin with 2 slash characters ('//') will be
ignored and may be used to add comments into the file.
Examples:
[bini detx, dety] - 2-D, 16-bit integer histogram
of DETX and DETY columns, using
default values for the histogram
range and binsize
[bin (detx, dety)=16; /exposure] - 2-D, 32-bit real histogram of DETX
and DETY columns with a bin size = 16
in both axes. The histogram values
are divided by the EXPOSURE keyword
value.
[bin time=TSTART:TSTOP:0.1] - 1-D lightcurve, range determined by
the TSTART and TSTOP keywords,
with 0.1 unit size bins.
[bin pha, time=8000.:8100.:0.1] - 2-D image using default binning
of the PHA column for the X axis,
and 1000 bins in the range
8000. to 8100. for the Y axis.
[bin @binFilter.txt] - Use the contents of the text file
binFilter.txt for the binning
specifications.