Besides the regular UNIX editors and utilities, a good way to view the data is, of course, the NoSQL operator that prints such datafiles: 'nsq-pr' (named after the UNIX 'pr' utility).
The relation, or table, structure is achieved by separating the columns with ASCII TAB characters and terminating the rows with ASCII NEWLINE characters. That is, each row of data in a file contains the data values (one per field) separated by TAB characters and terminated by a NEWLINE character. A fundamental rule, therefore, is that data values must NOT contain TAB characters.
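A minimal Python sketch of this row layout: one line per row, fields separated by TAB. The helper names `split_row` and `join_row` are hypothetical and not part of NoSQL. `join_row` also refuses NEWLINE inside a value, since a NEWLINE would end the row early.

```python
def split_row(line):
    """Split one NEWLINE-terminated rdbtable line into its field values."""
    if not line.endswith("\n"):
        raise ValueError("row is not terminated by a NEWLINE")
    return line[:-1].split("\t")


def join_row(values):
    """Build one rdbtable line from field values, enforcing the no-TAB rule."""
    for v in values:
        if "\t" in v:
            raise ValueError(f"data value contains a TAB: {v!r}")
        if "\n" in v:
            # A NEWLINE inside a value would terminate the row prematurely.
            raise ValueError(f"data value contains a NEWLINE: {v!r}")
    return "\t".join(values) + "\n"
```

For example, `split_row("Bush\t44\tA\t133\tAnother\tThis\n")` yields the six values of the first SAMPLE row, and `join_row` turns them back into the same line.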
The first section of the file, called the header, contains the file structure information used by the operators. The header also contains optional embedded documentation relating to the entire datafile (table documentation) and/or each data column (column documentation). The rest of the file, called the body, contains the actual data values. A file of data, so structured, is said to be an 'rdbtable'.
The header consists of two or more lines: zero or more lines of table documentation, followed by exactly two lines that contain the structure information (the column name row and the column definition row). Each table documentation line starts either with a sharp sign (#) followed by a space character, or with one or more space characters followed by a sharp sign (#); the rest of the line may contain any documentation desired. Note that the table documentation lines are the only lines in an rdbtable that are not required to conform to the table structure defined above. The fields in the column name row contain the names of the columns. The fields in the column definition row contain the data definitions and optional column documentation for each column.
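The header layout described above can be sketched in Python as follows. `split_header` is a hypothetical helper, not a NoSQL operator; it assumes the file has already been split into lines with the NEWLINE characters removed, and it recognizes documentation lines only by the two prefixes described above.

```python
import re

# A table documentation line starts with "# " or with one or more spaces then "#".
DOC_LINE = re.compile(r"^(# | +#)")


def split_header(lines):
    """Return (doc_lines, name_row, definition_row, body_lines)."""
    i = 0
    while i < len(lines) and DOC_LINE.match(lines[i]):
        i += 1
    if len(lines) - i < 2:
        raise ValueError("missing column name row or column definition row")
    names = lines[i].split("\t")
    defs = lines[i + 1].split("\t")
    if len(names) != len(defs):
        raise ValueError("name row and definition row differ in field count")
    return lines[:i], names, defs, lines[i + 2:]
```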
Column names are case sensitive, i.e. 'COUNT' is different from 'Count'. As a guideline, alphabetic and numeric characters, plus non-alphanumeric characters that are not special to the UNIX shell, are good choices for column names. Column names must include at least one alphabetic character. It is highly recommended (but not required) that column names start with an alphabetic or numeric character.
Non-alphanumeric characters that are acceptable in column names are the percent sign (%), colon (:), at sign (@), equals sign (=), comma (,) and dot (.). The sharp sign (#), underscore (_) and dash (-) characters may also be used, but they must not be the first character in a column name. The TAB character must never be used in column names, nor should internal spaces or UNIX I/O redirection characters (<, >, |) be used.
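One possible reading of these rules as a checker, sketched in Python. `check_column_name` is hypothetical; it treats the characters listed above as a strict whitelist (ASCII letters and digits only) and does not enforce the recommendation that names start with a letter or digit.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
ANYWHERE = set("%:@=,.")   # acceptable non-alphanumerics, any position
NOT_FIRST = set("#_-")     # acceptable, but never as the first character
ALLOWED = LETTERS | DIGITS | ANYWHERE | NOT_FIRST


def check_column_name(name):
    """Return a list of rule violations for `name`; an empty list means it passes."""
    if not name:
        return ["empty name"]
    problems = []
    bad = sorted(set(name) - ALLOWED)  # catches TAB, spaces, <, >, | and others
    if bad:
        problems.append(f"disallowed characters: {bad!r}")
    if not (LETTERS & set(name)):
        problems.append("needs at least one alphabetic character")
    if name[0] in NOT_FIRST:
        problems.append(f"{name[0]!r} may not be the first character")
    return problems
```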
The data definitions include column width, data type and justification. The column width must be specified explicitly; the other two are optional and are frequently left to their defaults.
The data definitions are specified by adjacent characters in a single word. The width of each field is specified by a numeric count. The data type is "string", "numeric" or "month", specified by an 'S', 'N' or 'M' respectively; the default type is string. Printout justification is 'left' or 'right', specified by a '<' or '>' character respectively. If no justification is given, string and month data are left-justified and numeric data are right-justified.
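A sketch of how a definition word such as '5N' or '8>' could be decoded in Python. `parse_definition` is hypothetical; it assumes the width digits come first and that the type and justification characters follow in any order, which the text does not state explicitly. It takes the definition word alone, without any column documentation.

```python
import re

DEF_WORD = re.compile(r"^(\d+)([SNM<>]*)$")
TYPES = {"S": "string", "N": "numeric", "M": "month"}


def parse_definition(word):
    """Return (width, data_type, justification) for a definition word."""
    m = DEF_WORD.match(word)
    if not m:
        raise ValueError(f"bad data definition: {word!r}")
    width = int(m.group(1))
    flags = m.group(2)
    types = [c for c in flags if c in TYPES]
    justs = [c for c in flags if c in "<>"]
    if len(types) > 1 or len(justs) > 1:
        raise ValueError(f"conflicting flags in: {word!r}")
    dtype = TYPES[types[0]] if types else "string"  # default type is string
    if justs:
        just = "left" if justs[0] == "<" else "right"
    else:
        # Defaults: numeric is right-justified, string and month are left-justified.
        just = "right" if dtype == "numeric" else "left"
    return width, dtype, just
```

Applied to the SAMPLE definition row, '5N' gives `(5, "numeric", "right")` and '8>' gives `(8, "string", "right")`.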
Note that the column width is used primarily by the 'nsq-pr' operator and in no way limits the actual data size. It is not an error if some data in a column is wider than the defined width; a listing produced with 'nsq-pr' may, however, be out of alignment.
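To illustrate why over-wide data only disturbs alignment, here is a padding helper in Python. It is a hypothetical sketch, not how 'nsq-pr' is implemented: Python's `str.ljust` and `str.rjust` pad short values but return longer values unchanged, so nothing is truncated.

```python
def format_cell(value, width, justification):
    """Pad `value` to `width` characters for display; never truncate it."""
    if justification == "right":
        return value.rjust(width)
    return value.ljust(width)
```

A value wider than its column, such as `format_cell("Another value", 8, "left")`, comes back intact, so every column printed after it on that line is shifted.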
The optional documentation for each column follows the data definition word in the field. There must be one or more space characters after the data definition word and before the column documentation; the column documentation may be as long as necessary. Note that the data definition and the optional column documentation are contained in a single field in the row.
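Because the definition word and the column documentation share one field, a reader has to split them at the first run of spaces. A minimal Python sketch; the helper name `split_definition_field` is hypothetical.

```python
def split_definition_field(field):
    """Split a column definition field into (definition_word, documentation)."""
    parts = field.split(maxsplit=1)  # split at the first run of whitespace only
    word = parts[0] if parts else ""
    doc = parts[1] if len(parts) > 1 else ""
    return word, doc
```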
If the column name and column definition rows contain a lot of information or column documentation, they can become long and hard to read. To help with this, the operators 'nsq-valid' and 'nsq-headchg' have options to print the header contents as a 'template' file, an organized list of information about the header.
A sample rdbtable (named SAMPLE) that will be used in later examples is shown in Table 1. The picture in Table 1 is for illustrative purposes; what the file would actually look like is shown in Table 2, where a TAB character is represented by '<T>' and a NEWLINE character is represented by '<N>'.
Table 1 rdbtable (SAMPLE)

    # Table documentation lines. These describe and
    # identify the rdbtable contents.
    # They may be read by many normal UNIX utilities,
    # which is useful to easily identify a file.
    # May also contain RCS or SCCS control information.
    NAME    COUNT   TYP     AMT     OTHER   RIGHT
    6       5N      3       5N      8       8>
    Bush    44      A       133     Another This
    Hansen  44      A       23      One     Is
    Jones   77      X       77      Here    On
    Perry   77      B       244     And     The
    Hart    77      D       1111    So      Right
    Holmes  65      D       1111    On      Edge

Table 2 rdbtable (SAMPLE) actual content

    # Table documentation lines. These describe and<N>
    # identify the rdbtable contents.<N>
    # They may be read by many normal UNIX utilities,<N>
    # which is useful to easily identify a file.<N>
    # May also contain RCS or SCCS control information.<N>
    NAME<T>COUNT<T>TYP<T>AMT<T>OTHER<T>RIGHT<N>
    6<T>5N<T>3<T>5N<T>8<T>8><N>
    Bush<T>44<T>A<T>133<T>Another<T>This<N>
    Hansen<T>44<T>A<T>23<T>One<T>Is<N>
    Jones<T>77<T>X<T>77<T>Here<T>On<N>
    Perry<T>77<T>B<T>244<T>And<T>The<N>
    Hart<T>77<T>D<T>1111<T>So<T>Right<N>
    Holmes<T>65<T>D<T>1111<T>On<T>Edge<N>
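The following Python sketch builds the exact Table 2 content from plain values, which makes the TAB/NEWLINE layout explicit. `write_rdbtable` is a hypothetical helper, not a NoSQL operator.

```python
def write_rdbtable(doc_lines, names, definitions, rows):
    """Serialize an rdbtable: doc lines, name row, definition row, then the body."""
    out = [line + "\n" for line in doc_lines]  # doc lines need not follow the TAB layout
    for fields in [names, definitions, *rows]:
        if any("\t" in f or "\n" in f for f in fields):
            raise ValueError(f"field contains TAB or NEWLINE: {fields!r}")
        out.append("\t".join(fields) + "\n")
    return "".join(out)


SAMPLE = write_rdbtable(
    ["# Table documentation lines. These describe and",
     "# identify the rdbtable contents.",
     "# They may be read by many normal UNIX utilities,",
     "# which is useful to easily identify a file.",
     "# May also contain RCS or SCCS control information."],
    ["NAME", "COUNT", "TYP", "AMT", "OTHER", "RIGHT"],
    ["6", "5N", "3", "5N", "8", "8>"],
    [["Bush", "44", "A", "133", "Another", "This"],
     ["Hansen", "44", "A", "23", "One", "Is"],
     ["Jones", "77", "X", "77", "Here", "On"],
     ["Perry", "77", "B", "244", "And", "The"],
     ["Hart", "77", "D", "1111", "So", "Right"],
     ["Holmes", "65", "D", "1111", "On", "Edge"]],
)
```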
It is important to note that only actual data is stored in the data fields, with no leading or trailing space characters. This fact can (and usually does) have a major effect on the size of the resulting datafiles (rdbtables) compared to data stored in "fixed field width" systems. The datafiles in NoSQL are almost always smaller, sometimes dramatically smaller.
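A rough size comparison for the SAMPLE body, sketched in Python. The fixed-width model is an assumption made for illustration (every field padded to its defined width, no separators, one NEWLINE per row); real fixed-width systems differ in detail.

```python
WIDTHS = [6, 5, 3, 5, 8, 8]  # widths from the SAMPLE column definition row

ROWS = [
    ["Bush", "44", "A", "133", "Another", "This"],
    ["Hansen", "44", "A", "23", "One", "Is"],
    ["Jones", "77", "X", "77", "Here", "On"],
    ["Perry", "77", "B", "244", "And", "The"],
    ["Hart", "77", "D", "1111", "So", "Right"],
    ["Holmes", "65", "D", "1111", "On", "Edge"],
]


def tab_size(rows):
    # Only the actual values, one TAB between fields, one NEWLINE per row.
    return sum(sum(len(v) for v in row) + (len(row) - 1) + 1 for row in rows)


def fixed_size(rows, widths):
    # Every field padded to its full width, no separators, one NEWLINE per row.
    return sum(sum(widths) + 1 for _ in rows)
```

For the six SAMPLE rows this gives 143 bytes in the TAB-separated form versus 216 bytes in the fixed-width model.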
Besides NoSQL there are other UNIX DBMSs, both commercial and free, that are based on ASCII tables. A commercial implementation is /rdb, by Revolutionary Software; free ones include Starbase, developed at the Harvard Smithsonian Astrophysical Observatory, and Gunnar Stefansson's reldb, a collection of interesting tools available from the archives of the comp.sources.unix Usenet newsgroup.
The ASCII table formats of those database engines are very close to that of NoSQL, so data can easily be converted back and forth between them and NoSQL. To help with that, NoSQL provides a few simple conversion filters, namely nsq-n2r, nsq-r2n, nsq-tabletolist and nsq-listtotable.
Here is what the basic /rdb and Starbase table format looks like:
Table 1a Starbase table (SAMPLE)

    Table documentation lines. These describe and
    identify the rdbtable contents.
    They may be read by many normal UNIX utilities,
    which is useful to easily identify a file.
    May also contain RCS or SCCS control information.

    NAME    COUNT   TYP     AMT
    ----    -----   ---     ---
    Bush    44      A       133
    Hansen  44      A       23
    Jones   77      X       77
    Perry   77      B       244
    Hart    77      D       1111
    Holmes  65      D       1111
As with the NoSQL format, the actual table contents are:
Table 2a Starbase table (SAMPLE) actual content

    Table documentation lines. These describe and<N>
    identify the rdbtable contents.<N>
    They may be read by many normal UNIX utilities,<N>
    which is useful to easily identify a file.<N>
    May also contain RCS or SCCS control information.<N>
    <N>
    NAME<T>COUNT<T>TYP<T>AMT<N>
    ----<T>-----<T>---<T>---<N>
    Bush<T>44<T>A<T>133<N>
    Hansen<T>44<T>A<T>23<N>
    Jones<T>77<T>X<T>77<N>
    Perry<T>77<T>B<T>244<N>
    Hart<T>77<T>D<T>1111<N>
    Holmes<T>65<T>D<T>1111<N>
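Comparing Table 2 with Table 2a suggests what a NoSQL-to-Starbase conversion involves: drop the '# ' prefix from the documentation lines, add an empty line, keep the column name row, replace the column definition row with a row of dashes as long as each column name, and copy the body unchanged. The Python sketch below does only that. `nosql_to_starbase` is hypothetical and is not the actual nsq-n2r or nsq-r2n code, which may handle more cases.

```python
import re

DOC_PREFIX = re.compile(r"^(# | +# ?)")  # the two NoSQL documentation-line prefixes


def nosql_to_starbase(text):
    """Convert NoSQL rdbtable text into the Starbase-style layout of Table 2a."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final NEWLINE
    i = 0
    doc = []
    while i < len(lines) and DOC_PREFIX.match(lines[i]):
        doc.append(DOC_PREFIX.sub("", lines[i], count=1))
        i += 1
    names = lines[i].split("\t")
    body = lines[i + 2:]  # skip the NoSQL column definition row
    out = doc + [""] if doc else []  # empty line after the documentation, as in Table 2a
    out.append("\t".join(names))
    out.append("\t".join("-" * len(n) for n in names))
    out.extend(body)
    return "\n".join(out) + "\n"
```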
And here is its corresponding list format:
    NAME    Bush
    COUNT   44
    TYP     A
    AMT     133

    NAME    Hansen
    COUNT   44
    TYP     A
    AMT     23

    NAME    Jones
    COUNT   77
    TYP     X
    AMT     77

    NAME    Perry
    COUNT   77
    TYP     B
    AMT     244

    NAME    Hart
    COUNT   77
    TYP     D
    AMT     1111

    NAME    Holmes
    COUNT   65
    TYP     D
    AMT     1111
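A round-trip sketch of the table and list formats in Python. The hypothetical helpers `table_to_list` and `list_to_table` assume a TAB between each name and value and one empty line between records; the exact output of nsq-tabletolist and nsq-listtotable may differ.

```python
def table_to_list(names, rows):
    """Render each row as "name<TAB>value" lines, with an empty line between records."""
    blocks = []
    for row in rows:
        blocks.append("".join(f"{n}\t{v}\n" for n, v in zip(names, row)))
    return "\n".join(blocks)  # each block ends in NEWLINE, so this adds an empty line


def list_to_table(text):
    """Inverse of table_to_list; assumes every record lists the same columns in order."""
    names, rows = [], []
    for block in text.strip("\n").split("\n\n"):
        pairs = [line.split("\t", 1) for line in block.split("\n")]
        if not names:
            names = [n for n, _ in pairs]
        rows.append([v for _, v in pairs])
    return names, rows
```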