Funtext: support for column-based text files

Summary

This document contains a summary of the options for processing column-based text files.

Description

Funtools will automatically sense and process "standard" column-based text files as if they were FITS binary tables without any change in Funtools syntax. In particular, you can filter text files using the same syntax as FITS binary tables:

  fundisp foo'[cir 512 512 .1]'
  fundisp -T foo
  funtable foo'[pha=1:10,cir 512 512 10]' foo.fits

The first example displays a filtered selection of a text file. The second example converts a text file to an RDB file. The third example converts a filtered selection of a text file to a FITS binary table.

Text files can also be used in Funtools image programs. In this case, you must provide binning parameters (as with raw event files), using the bincols keyword specifier:

  bincols=([xname[:tlmin[:tlmax[:binsiz]]]],[yname[:tlmin[:tlmax[:binsiz]]]])

For example:

  funcnts foo'[bincols=(x:1024,y:1024)]' "ann 512 512 0 10 n=10"

Standard Text Files

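Standard text files have the following characteristics:

1. Optional comment lines start with "#", and blank lines are treated as comments.

2. An optional header consists of a line of column names, optionally followed by a line of units and a line of dashes, each with the same number of columns.

3. Data rows follow the header, with tokens separated by standard delimiters (spaces, tabs, commas, semi-colons, or vertical bars).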

Examples:

  # rdb file
  foo1	foo2	foo3	foos
  ----	----	----	----
  1	2.2	3	xxxx
  10	20.2	30	yyyy

  # multiple consecutive whitespace and dashes
  foo1   foo2    foo3 foos
  ---    ----    ---- ----
     1    2.2    3    xxxx
    10   20.2    30   yyyy

  # comma delims and blank lines
  foo1,foo2,foo3,foos

  1,2.2,3,xxxx
  10,20.2,30,yyyy

  # bar delims with null values
  foo1|foo2|foo3|foos
  1||3|xxxx
  10|20.2||yyyy

  # header-less data
  1	2.2   3	xxxx
  10	20.2 30	yyyy

The default set of token delimiters consists of spaces, tabs, commas, semi-colons, and vertical bars. Several parsers are tried simultaneously, each analyzing a line of text in a different way. One parser squashes any combination of spaces, tabs, and commas into a single delimiter, so that consecutive delimiters never produce a null value. Another treats each tab, semi-colon, or vertical bar as a separate delimiter, so that two consecutive delimiters imply a null value (as in an RDB file). A parser is successful if it returns a consistent number of columns for all rows, with each column having a consistent data type. Parsers that fail this test are discarded on the fly. More than one parser can be successful; for now, it is assumed that all successful parsers return the same tokens for a given line (pathological cases are theoretically possible and are not yet handled).
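
The parser-selection idea described above can be sketched in Python. This is an illustration only, not the Funtools implementation: the function names are hypothetical, the sketch checks column counts but not per-column data types, and it adds one simplifying assumption of its own (a parser that cannot split a row into at least two columns is treated as unsuccessful).

```python
import re

def squash_tokens(line):
    # Runs of spaces, tabs, and commas collapse into one delimiter,
    # so this tokenizer never produces a null (empty) token.
    return [t for t in re.split(r"[ \t,]+", line.strip()) if t]

def null_tokens(line):
    # Each tab, semi-colon, or vertical bar is a delimiter of its own,
    # so two consecutive delimiters yield an empty (null) token.
    return [t.strip() for t in re.split(r"[\t;|]", line)]

PARSERS = {"squash": squash_tokens, "nulls": null_tokens}

def successful_parsers(rows):
    """Return the names of parsers giving one column count for all rows."""
    alive = dict(PARSERS)
    ncols = {}
    for row in rows:
        for name in list(alive):
            n = len(alive[name](row))
            # The first row fixes the expected count; a parser that later
            # disagrees with it is discarded on the fly.
            # Sketch-only assumption: fewer than two columns is a failure.
            if n < 2 or ncols.setdefault(name, n) != n:
                del alive[name]
    return sorted(alive)

rdb_with_nulls = ["foo1\tfoo2\tfoo3\tfoos", "1\t\t3\txxxx", "10\t20.2\t\tyyyy"]
print(successful_parsers(rdb_with_nulls))   # only the null-aware parser survives
```

With tab-delimited rows that contain no nulls, both parsers survive and return the same tokens, which is the situation the text assumes.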

If there is no header, then the names "col1", "col2", etc. are assigned to the columns to allow filtering. Furthermore, the data type of each column is determined by the value found in that column of the first data line and can be one of the following: string, int, or double. Thus, all of the above examples return the following display:

  fundisp foo'[foo1>5]'
        FOO1                  FOO2       FOO3         FOOS
  ---------- --------------------- ---------- ------------
          10           20.20000000         30         yyyy
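
The rule for default column names and first-row data types can be sketched as follows. The helper names are hypothetical, and Python's int() and float() are only an approximation of whatever numeric syntax Funtools actually accepts.

```python
def infer_type(token):
    """Classify one token as 'int', 'double', or 'string'."""
    for name, cast in (("int", int), ("double", float)):
        try:
            cast(token)
            return name
        except ValueError:
            pass
    return "string"

def describe_columns(first_row, header=None):
    # Without a header, fall back to the default names col1, col2, ...
    names = header or ["col%d" % (i + 1) for i in range(len(first_row))]
    return list(zip(names, (infer_type(t) for t in first_row)))

print(describe_columns(["1", "2.2", "3", "xxxx"]))
# [('col1', 'int'), ('col2', 'double'), ('col3', 'int'), ('col4', 'string')]
```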

Comments Convert to Header Params

Comments which precede data rows are converted into header parameters and will be written out as such using funimage or funhead. Two styles of comments are recognized:

1. FITS-style comments have an equal sign "=" between the keyword and value and an optional slash "/" to signify a comment. The strict FITS rules on column positions are not enforced. In addition, strings only need to be quoted if they contain whitespace. For example, the following are valid FITS-style comments:

  # fits0 = 100
  # fits1 = /usr/local/bin
  # fits2 = "/usr/local/bin /opt/local/bin"
  # fits3c = /usr/local/bin /opt/local/bin /usr/bin
  # fits4c = "/usr/local/bin /opt/local/bin" / path dir

Note that the fits3c value is not quoted, and therefore its value is the single token "/usr/local/bin" and its comment is "opt/local/bin /usr/bin". This is different from the quoted value in fits4c.

2. Free-form comments can have an optional colon separator between the keyword and value. In the absence of quotes, all tokens after the keyword are part of the value, i.e. no comment is allowed. If a string is quoted, then a slash "/" after the string will signify a comment. For example:

  # foo1 /usr/local/bin
  # foo2 "/usr/local/bin /opt/local/bin"
  # foo3 /usr/local/bin /opt/local/bin /usr/bin
  # foo4c "/usr/local/bin /opt/local/bin" / path dir
  
  # goo1: /usr/local/bin
  # goo2: "/usr/local/bin /opt/local/bin"
  # goo3: /usr/local/bin /opt/local/bin /usr/bin
  # goo4c: "/usr/local/bin /opt/local/bin" / path dir

Note that foo3 and goo3 are not quoted, so the whole string is part of the value, while foo4c and goo4c are quoted and have comments following the values.
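
The two comment styles can be told apart with a pair of regular expressions. The sketch below is an assumption-laden illustration rather than the Funtools parser: the pattern and function names are hypothetical, values are returned as plain strings, and keywords are limited to letters, digits, and underscores.

```python
import re

# FITS-style: "# key = value [/ comment]"; an unquoted value is one token.
FITS_STYLE = re.compile(
    r'^#\s*(?P<key>\w+)\s*=\s*'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))'
    r'\s*(?:/\s*(?P<comment>.*))?$'
)

# Free-form: "# key[:] value"; only a quoted value may carry a comment.
FREE_FORM = re.compile(
    r'^#\s*(?P<key>\w+)(?:\s*:\s*|\s+)'
    r'(?:"(?P<quoted>[^"]*)"\s*(?:/\s*(?P<comment>.*))?|(?P<bare>.+))$'
)

def parse_comment(line):
    """Return (key, value, comment) for a header comment, or None."""
    text = line.strip()
    # Try the stricter FITS style first, then fall back to free form.
    for pattern in (FITS_STYLE, FREE_FORM):
        m = pattern.match(text)
        if m:
            quoted = m.group("quoted")
            value = quoted if quoted is not None else m.group("bare")
            return m.group("key"), value, m.group("comment")
    return None

print(parse_comment('# fits3c = /usr/local/bin /opt/local/bin /usr/bin'))
# ('fits3c', '/usr/local/bin', 'opt/local/bin /usr/bin')
print(parse_comment('# foo3 /usr/local/bin /opt/local/bin /usr/bin'))
# ('foo3', '/usr/local/bin /opt/local/bin /usr/bin', None)
```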

Multiple Tables in a Single File

Multiple tables are supported in a single file. If an RDB-style file is sensed, then a ^L will signify end of table. Otherwise, an end of table is sensed when a new header (i.e., all alphanumeric columns) is found. Also, for standard parsers, end of table is sensed when a comment is found, i.e. comments are not mixed with data rows (although blank lines can be mixed).

You can access the nth table (starting from 0) in a multi-table file by enclosing the table number in brackets, as with a FITS extension:

  fundisp foo'[2]'

The above example will display the third table in the file.
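
As a rough illustration of header-based table detection and zero-based indexing, the sketch below starts a new table whenever a row looks like a header. The names are hypothetical, rows are split on whitespace only, and the definition of "all alphanumeric columns" used here (every token is an identifier-like name) is an assumption.

```python
import re

def looks_like_header(tokens):
    # Assumption: a header row consists solely of name-like tokens
    # (a letter or underscore followed by letters, digits, underscores).
    return bool(tokens) and all(re.fullmatch(r"[A-Za-z_]\w*", t) for t in tokens)

def split_on_headers(rows):
    """Group rows into tables, starting a new table at each header row."""
    tables = []
    for row in rows:
        if looks_like_header(row.split()) or not tables:
            tables.append([])
        tables[-1].append(row)
    return tables

rows = ["x y", "1 2", "3 4", "a b", "5 6", "u v", "7 8"]
tables = split_on_headers(rows)
print(tables[2])   # the third table, as selected by foo'[2]'
# ['u v', '7 8']
```

A table whose data rows are all strings would be split at every row by this test, which is the ambiguity the header= keyword described below is meant to resolve.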

TEXT() Specifier

As with the ARRAY() and EVENTS() specifiers for raw image arrays and raw event lists respectively, you can use the TEXT() specifier on text files to pass key=value options to the parsers. An empty set of keywords is equivalent to not having TEXT() at all, that is:

  fundisp foo
  fundisp foo'[TEXT()]'

are equivalent. When indexing into a multi-table file, the table index number is placed inside the TEXT() specifier as the first token:

  fundisp foo'[TEXT(2,...)]'

The filter specification is placed after the TEXT() specifier, separated by a comma, or in an entirely separate bracket:

  fundisp foo'[TEXT(...),circle 512 512 .1]'
  fundisp foo'[TEXT(2,...)][circle 512 512 .1]'

TEXT() Keyword Options

The following is a list of keywords that can be used within the TEXT() specifier (the first three are the most important ones):

delims="[delims]"
Specify token delimiters for this file. Only a single parser having these delimiters will be used to process the file.
  fundisp foo.fits'[TEXT(delims="!")]'
  fundisp foo.fits'[TEXT(delims="\t%")]'

comchars="[comchars]"
Specify comment characters. You must include "\n" to allow blank lines. These comment characters will be used for all standard parsers (unless delims are also specified).
  fundisp foo.fits'[TEXT(comchars="!\n")]'

cols="[name1:type1 ...]"
Specify names and data types of columns. This overrides the names taken from the header and the data types determined from the first data row, as well as the default names and data types assigned to header-less tables.
  fundisp foo.fits'[TEXT(cols="x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e")]'

If the column specifier is the only keyword, then the cols= is not required (in analogy with EVENTS()):

  fundisp foo.fits'[TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
A table index is also allowed in this case:
  fundisp foo.fits'[TEXT(2,x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'

eot="[eot delim]"
Specify the end-of-table string for multi-table files. RDB files support ^L. The end-of-table specifier is a string, and the whole string must be found alone on a line to signify EOT. For example:
  fundisp foo.fits'[TEXT(eot="END")]'
will end the table when a line consisting solely of "END" is found. Multiple lines are supported, so that:
  fundisp foo.fits'[TEXT(eot="END\nGAME")]'
will end the table when a line consisting of "END" is followed by a line consisting of "GAME".

In the absence of an EOT delimiter, a new table will be sensed when a new header (all alphanumeric columns) is found.

null1="[datatype]"
Specify the data type of a single null value in row 1. Since column data types are determined by the first row, a null value in that row will result in an error and a request to specify names and data types using cols=. If you have only one null in row 1, you don't need to specify all names and data types. Instead, use null1="type" to specify its data type.

alen=[n]
Specify size in bytes to save for ASCII type columns. FITS binary tables only support fixed length ASCII columns and so a size value must be specified. The default is 16 bytes.

nullvalues=["true"|"false"]
Specify whether to expect null values. Give the parsers a hint as to whether null values should be allowed. The default is to try to determine this from the data.

whitespace=["true"|"false"]
Specify whether surrounding white space should be kept as part of string tokens. By default surrounding white space is removed from tokens.

header=["true"|"false"]
Specify whether to require a header. This is needed by tables containing all string columns (and with no row containing dashes), in order to be able to tell whether the first row is a header or part of the data. The default is false, meaning that the first row will be treated as data. If a row of dashes is present, the previous row is considered the column name row.

units=["true"|"false"]
Specify whether to require a units line. Give the parsers a hint as to whether a row specifying units should be allowed. The default is to try to determine this from the data.

i2f=["true"|"false"]
Specify whether to allow int to float conversions. If a column in row 1 contains an integer value, the data type for that column will be set to int. If a subsequent row contains a float in that column, an error will be signaled. This flag specifies that, instead of an error, the float should be silently truncated to int. Usually, you will want an error to be signaled, so that you can specify the data type using cols= (or by editing the column in row 1).

comeot=["true"|"false"|0|1|2]
Specify whether a comment signifies end of table. If comeot is 0 or false, then comments do not signify end of table and can be interspersed with data rows. If the value is true or 1 (the default for standard parsers), then non-blank comment lines (e.g. lines beginning with '#') signify end of table, but blank lines are allowed between rows. If the value is 2, then all comments, including blank lines, signify end of table.

debug=["true"|"false"]
Specify whether to display debugging information during parsing.
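
Three of the keywords above (eot, comeot, and i2f) describe small decision rules that can be sketched in Python. The function names are hypothetical, and two details are assumptions of the sketch rather than documented behavior: eot and comeot are checked together, and truncation uses Python's int(), which rounds toward zero.

```python
def table_ends_at(lines, i, eot=None, comeot=1, comchars="#"):
    """Return how many lines starting at index i mark end of table (0 = none)."""
    if eot is not None:
        marker = eot.split("\n")            # "END\nGAME" -> two whole lines
        if lines[i:i + len(marker)] == marker:
            return len(marker)
    text = lines[i].strip()
    if text == "":
        return 1 if comeot == 2 else 0      # blank lines count only at level 2
    if text[0] in comchars:
        return 1 if comeot in (1, 2) else 0 # True/False behave like 1/0 here
    return 0                                # data rows never end the table

def to_int(token, i2f=False):
    """Convert a token for an int column, honoring the i2f flag."""
    try:
        return int(token)
    except ValueError:
        if not i2f:
            raise                           # default: signal an error
        return int(float(token))            # i2f: silently truncate

lines = ["1 2", "END", "GAME", "3 4", "", "# note"]
print(table_ends_at(lines, 1, eot="END\nGAME"))   # 2
print(table_ends_at(lines, 4), table_ends_at(lines, 4, comeot=2))   # 0 1
print(to_int("20.7", i2f=True))                   # 20
```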

Environment Variables

Environment variables are defined to allow many of these values to be set without having to include them in TEXT() every time a file is processed:

  keyword	environment variable
  -------	--------------------
  delims	TEXT_DELIMS
  comchars	TEXT_COMCHARS
  cols		TEXT_COLUMNS
  eot		TEXT_EOT
  null1		TEXT_NULL1
  alen		TEXT_ALEN
  bincols	TEXT_BINCOLS
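
When Funtools programs are launched from a script, the mapping above can be applied programmatically. The following sketch builds an environment dictionary from keyword options; the helper name is hypothetical, and it assumes that each variable takes the same value string that would be given to the corresponding TEXT() keyword.

```python
import os

# Keyword -> environment variable, as listed in the table above.
TEXT_ENV = {
    "delims": "TEXT_DELIMS",
    "comchars": "TEXT_COMCHARS",
    "cols": "TEXT_COLUMNS",
    "eot": "TEXT_EOT",
    "null1": "TEXT_NULL1",
    "alen": "TEXT_ALEN",
    "bincols": "TEXT_BINCOLS",
}

def text_environment(base=None, **options):
    """Return a copy of the environment with TEXT_* variables set."""
    env = dict(os.environ if base is None else base)
    for keyword, value in options.items():
        env[TEXT_ENV[keyword]] = str(value)   # KeyError for unknown keywords
    return env

env = text_environment(delims="|", alen=32)
print(env["TEXT_DELIMS"], env["TEXT_ALEN"])
```

The resulting dictionary could then be passed as the env argument of subprocess.run() when invoking a program such as fundisp.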

Restrictions

As with raw event files, the '+' (copy extensions) specifier is not supported for programs such as funtable.

Last updated: February 4, 2005