 

Chapter 3: Tutorials

You will be the key that opens every lock,
you will be the light, the limitless light,
you will be the boundary where the dawn begins,
you will be wheat, an illuminated ladder!
—M'aclame a tu
Lyrics: Vicent Andrés i Estellés
Music: Ovidi Montllor

This chapter consists of a series of simple yet comprehensive tutorials that will enable you to understand PyTables' main features. If you would like more information about some particular instance variable, global function, or method, look at the doc strings or go to the library reference in chapter 4. If you are reading this in PDF or HTML formats, follow the corresponding hyperlink near each newly introduced entity.

Please note that throughout this document the terms column and field will be used interchangeably, as will the terms row and record.

3.1 Getting started

In this section, we will see how to define our own records in Python and save collections of them (i.e. a table) into a file. Then we will select some of the data in the table using Python cuts and create numarray arrays to store this selection as separate objects in a tree.

In examples/tutorial1-1.py you will find the working version of all the code in this section. Nonetheless, this tutorial series has been written to allow you to reproduce it in a Python interactive console. I encourage you to do parallel testing and inspect the created objects (variables, docs, children objects, etc.) during the course of the tutorial!

3.1.1 Importing tables objects

Before starting you need to import the public objects in the tables package. You normally do that by executing:

>>> import tables
	  

This is the recommended way to import tables if you don't want to pollute your namespace. However, PyTables has a very reduced set of first-level primitives, so you may consider using the alternative:

>>> from tables import *
	  

which will export into your application's namespace the following objects: openFile, isHDF5, isPyTablesFile and IsDescription. This is a rather reduced set of objects and, for convenience, we will use this technique to access them.

If you are going to work with numarray or Numeric arrays (and normally, you will) you will also need to import objects from them. So most PyTables programs begin with:

>>> import tables        # but in this tutorial we use "from tables import *"
>>> from numarray import *  # or "from Numeric import *"
	  

3.1.2 Declaring a Column Descriptor

Now, imagine that we have a particle detector and we want to create a table object in order to save data retrieved from it. First, you need to define the table: the number of columns it has, what kind of object is contained in each column, and so on.

Our particle detector has a TDC (Time to Digital Converter) counter with a dynamic range of 8 bits and an ADC (Analog to Digital Converter) with a range of 16 bits. For these values, we will define 2 fields in our record object called TDCcount and ADCcount. We also want to save the grid position in which the particle has been detected, so we will add two new fields called grid_i and grid_j. Our instrumentation can also obtain the pressure and energy of the particle. The resolution of the pressure gauge allows us to use a single-precision float to store pressure readings, while the energy value will need a double-precision float. Finally, to track the particle we want to assign it a name to identify the kind of particle it is and a unique numeric identifier. So we will add two more fields: name will be a string of up to 16 characters, and idnumber will be an integer of 64 bits (to allow us to store records for extremely large numbers of particles).
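As a quick sanity check on these type choices, the ranges involved can be verified with plain Python (this is illustrative arithmetic only, not PyTables code):

```python
# A quick plain-Python sanity check of the ranges behind the type
# choices above (not PyTables code):
assert 2 ** 8 - 1 == 255        # UInt8 holds the full 8-bit TDC range
assert 2 ** 16 - 1 == 65535     # UInt16 holds the full 16-bit ADC range

# Int64 leaves ample headroom for identifiers such as i * 2**34
# (used later in this tutorial), even for hundreds of millions of particles:
assert 10 ** 8 * 2 ** 34 < 2 ** 63 - 1
```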

Having determined our columns and their types, we can now declare a new Particle class that will contain all this information:

>>> class Particle(IsDescription):
...     name      = StringCol(16)   # 16-character String
...     idnumber  = Int64Col()      # Signed 64-bit integer
...     ADCcount  = UInt16Col()     # Unsigned short integer
...     TDCcount  = UInt8Col()      # unsigned byte
...     grid_i    = Int32Col()      # integer
...     grid_j    = IntCol()        # integer (equivalent to Int32Col)
...     pressure  = Float32Col()    # float  (single-precision)
...     energy    = FloatCol()      # double (double-precision)
...
>>>
	  

This definition class is self-explanatory. Basically, you declare a class variable for each field you need. As its value you assign an instance of the appropriate Col subclass, according to the kind of column defined (the data type, the length, the shape, etc). See the section 4.12.2 for a complete description of these subclasses. See also appendix A for a list of data types supported by the Col constructor.

From now on, we can use Particle instances as a descriptor for our detector data table. We will see later on how to pass this object to construct the table. But first, we must create a file where all the actual data pushed into our table will be saved.

3.1.3 Creating a PyTables file from scratch

Use the first-level openFile (see 4.1.2) function to create a PyTables file:

>>> h5file = openFile("tutorial1.h5", mode = "w", title = "Test file")
	  

openFile (see 4.1.2) is one of the objects imported by the "from tables import *" statement. Here, we are saying that we want to create a new file in the current working directory called "tutorial1.h5" in "w"rite mode and with a descriptive title string ("Test file"). This function attempts to open the file and, if successful, returns the File (see 4.2) object instance h5file. The root of the object tree is specified in the instance's root attribute.

3.1.4 Creating a new group

Now, to better organize our data, we will create a group called detector that branches from the root node. We will save our particle data table in this group.

>>> group = h5file.createGroup("/", 'detector', 'Detector information')
	  

Here, we have taken the File instance h5file and invoked its createGroup method (see 4.2.2) to create a new group called detector branching from "/" (another way to refer to the h5file.root object we mentioned above). This will create a new Group (see 4.3) object instance that will be assigned to the variable group.

3.1.5 Creating a new table

Let's now create a Table (see 4.5) object as a branch off the newly-created group. We do that by calling the createTable (see 4.2.2) method of the h5file object:

>>> table = h5file.createTable(group, 'readout', Particle, "Readout example")
	  

We create the Table instance under group. We assign this table the node name "readout". The Particle class declared before is the description parameter (to define the columns of the table) and finally we set "Readout example" as the Table title. With all this information, a new Table instance is created and assigned to the variable table.

If you are curious about how the object tree looks right now, simply print the File instance variable h5file, and examine the output:

>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'

	  

As you can see, a dump of the object tree is displayed. It's easy to see the Group and Table objects we have just created. If you want more information, just type the variable containing the File instance:

>>> h5file
File(filename='tutorial1.h5', title='Test file', mode='w', trMap={}, rootUEP='/')
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
  description := {
    "ADCcount": Col('UInt16', shape=1, itemsize=2, dflt=0),
    "TDCcount": Col('UInt8', shape=1, itemsize=1, dflt=0),
    "energy": Col('Float64', shape=1, itemsize=8, dflt=0.0),
    "grid_i": Col('Int32', shape=1, itemsize=4, dflt=0),
    "grid_j": Col('Int32', shape=1, itemsize=4, dflt=0),
    "idnumber": Col('Int64', shape=1, itemsize=8, dflt=0),
    "name": Col('CharType', shape=1, itemsize=16, dflt=None),
    "pressure": Col('Float32', shape=1, itemsize=4, dflt=0.0) }
  byteorder := little

	  

More detailed information is displayed about each object in the tree. Note how Particle, our table descriptor class, is printed as part of the readout table description information. In general, you can obtain much more information about the objects and their children by just printing them. That introspection capability is very useful, and I recommend that you use it extensively.

The time has come to fill this table with some values. First we will get a pointer to the Row (see 4.5.4) instance of this table instance:

>>> particle = table.row
	  

The row attribute of table points to the Row instance that will be used to write data rows into the table. We write data simply by assigning the Row instance the values for each row as if it were a dictionary (although it is actually an extension class), using the column names as keys.

Below is an example of how to write rows:

>>> for i in xrange(10):
...     particle['name']  = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
...
>>>
	  

This code should be easy to understand. The lines inside the loop just assign values to the different columns in the Row instance particle (see 4.5.4). A call to its append() method writes this information to the table I/O buffer.
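If you want to double-check what the loop computes, the same arithmetic can be reproduced with plain Python dictionaries (no PyTables involved); the resulting values match the h5ls dump shown at the end of this section:

```python
# Plain-Python reconstruction of the values assigned by the loop above.
rows = []
for i in range(10):
    pressure = float(i * i)
    rows.append({
        'name': 'Particle: %6d' % i,
        'TDCcount': i % 256,
        'ADCcount': (i * 256) % (1 << 16),
        'grid_i': i,
        'grid_j': 10 - i,
        'pressure': pressure,
        'energy': float(pressure ** 4),   # (i*i)**4 == i**8
        'idnumber': i * (2 ** 34),
    })

assert rows[2]['energy'] == 256.0          # 2**8
assert rows[3]['energy'] == 6561.0         # 3**8
assert rows[1]['idnumber'] == 17179869184  # 2**34
```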

After we have processed all our data, we should flush the table's I/O buffer if we want to write all this data to disk. We achieve that by calling the table.flush() method.

>>> table.flush()
	  

3.1.6 Reading (and selecting) data in a table

OK, we have our data on disk, and now we need to access it and select the values we are interested in from specific columns. See the example below:

>>> table = h5file.root.detector.readout
>>> pressure = [ x['pressure'] for x in table.iterrows()
...              if x['TDCcount']>3 and 20<=x['pressure']<50 ]
>>> pressure
[25.0, 36.0, 49.0]
	  

The first line creates a "shortcut" to the readout table deeper in the object tree. As you can see, we use the natural naming schema to access it. We also could have used the h5file.getNode() method, as we will do later on.

You will recognize the last two lines as a Python list comprehension. It loops over the rows in table as they are provided by the table.iterrows() iterator (see 4.5.2). The iterator returns values until all the data in table is exhausted. These rows are filtered using the expression:
	      x['TDCcount'] > 3 and 20 <= x['pressure'] < 50
	    
We select the value of the pressure column from the filtered records to create the final list and assign it to the pressure variable.

We could have used a normal for loop to accomplish the same purpose, but I find comprehension syntax to be more compact and elegant.
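For comparison, here is the same selection written as an explicit for loop; plain dictionaries stand in for the table rows so that the snippet is self-contained:

```python
# `table` mimics the rows filled earlier: TDCcount = i, pressure = i*i.
table = [{'TDCcount': i % 256, 'pressure': float(i * i)} for i in range(10)]

pressure = []
for x in table:
    if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50:
        pressure.append(x['pressure'])

print(pressure)  # [25.0, 36.0, 49.0]
```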

Let's select the name column for the same set of cuts:

>>> names=[ x['name'] for x in table if x['TDCcount']>3 and 20<=x['pressure']<50 ]
>>> names
['Particle:      5', 'Particle:      6', 'Particle:      7']
	  

Note how we have omitted the iterrows() call in the list comprehension. The Table class has an implementation of the special method __iter__() that iterates over all the rows in the table. In fact, iterrows() internally calls this special __iter__() method. Accessing all the rows in a table using this method is very convenient, especially when working with the data interactively.

That's enough about selections for now. The next section will show you how to save these selected results to a file.

3.1.7 Creating new array objects

In order to separate the selected data from the mass of detector data, we will create a new group columns branching off the root group. Afterwards, under this group, we will create two arrays that will contain the selected data. First, we create the group:

>>> gcolumns = h5file.createGroup(h5file.root, "columns", "Pressure and Name")
	  

Note that this time we have specified the first parameter using natural naming (h5file.root) instead of with an absolute path string ("/").

Now, create the first of the two Array objects we've just mentioned:

>>> h5file.createArray(gcolumns, 'pressure', array(pressure),
...                     "Pressure column selection")
/columns/pressure (Array(3,)) 'Pressure column selection'
  type = Float64
  itemsize = 8
  flavor = 'NumArray'
  byteorder = 'little'
	  

We already know the first two parameters of the createArray (see 4.2.2) method (these are the same as the first two in createTable): they are the parent group where the Array will be created and the Array instance name. The third parameter is the object we want to save to disk. In this case, it is a numarray array that is built from the selection list we created before. The fourth parameter is the title.

Now, we will save the second array. It contains the list of strings we selected before: we save this object as-is, with no further conversion.

>>> h5file.createArray(gcolumns, 'name', names, "Name column selection")
/columns/name (Array(3,)) 'Name column selection'
  type = 'CharType'
  itemsize = 16
  flavor = 'List'
  byteorder = 'little'
	  

As you can see, createArray() accepts names (which is a regular Python list) as an object parameter. Actually, it accepts a variety of different regular objects (see 4.2.2) as parameters. The flavor attribute (see the output above) saves the original kind of object that was saved. Based on this flavor, PyTables will be able to retrieve exactly the same object from disk later on.

Note that in these examples, the createArray method returns an Array instance that is not assigned to any variable. Don't worry, this is intentional to show the kind of object we have created by displaying its representation. The Array objects have been attached to the object tree and saved to disk, as you can see if you print the complete object tree:

>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

	  

3.1.8 Closing the file and looking at its content

To finish this first tutorial, we use the close method of the h5file File object to close the file before exiting Python:

>>> h5file.close()
>>> ^D
	  

You have now created your first PyTables file with a table and two arrays. You can examine it with any generic HDF5 tool, such as h5dump or h5ls. Here is what tutorial1.h5 looks like when read with the h5ls program:

$ h5ls -rd tutorial1.h5
/columns                 Group
/columns/name            Dataset {3}
    Data:
        (0) "Particle:      5", "Particle:      6", "Particle:      7"
/columns/pressure        Dataset {3}
    Data:
        (0) 25, 36, 49
/detector                Group
/detector/readout        Dataset {10/Inf}
    Data:
        (0) {0, 0, 0, 0, 10, 0, "Particle:      0", 0},
        (1) {256, 1, 1, 1, 9, 17179869184, "Particle:      1", 1},
        (2) {512, 2, 256, 2, 8, 34359738368, "Particle:      2", 4},
        (3) {768, 3, 6561, 3, 7, 51539607552, "Particle:      3", 9},
        (4) {1024, 4, 65536, 4, 6, 68719476736, "Particle:      4", 16},
        (5) {1280, 5, 390625, 5, 5, 85899345920, "Particle:      5", 25},
        (6) {1536, 6, 1679616, 6, 4, 103079215104, "Particle:      6", 36},
        (7) {1792, 7, 5764801, 7, 3, 120259084288, "Particle:      7", 49},
        (8) {2048, 8, 16777216, 8, 2, 137438953472, "Particle:      8", 64},
        (9) {2304, 9, 43046721, 9, 1, 154618822656, "Particle:      9", 81}
	  

Here is the output as displayed by the ptdump PyTables utility (located in the utils/ directory):

$ ptdump tutorial1.h5
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

	  

You can pass the -v or -d options to ptdump if you want more verbosity. Try them out!

3.2 Browsing the object tree and appending to tables

In this section, we will learn how to browse the tree and retrieve meta-information about the actual data, then append some rows to an existing table to show how table objects can be enlarged.

In examples/tutorial1-2.py you will find the working version of all the code in this section. As before, you are encouraged to use a python shell and inspect the object tree during the course of the tutorial.

3.2.1 Traversing the object tree

Let's start by opening the file we created in the last tutorial section.

>>> h5file = openFile("tutorial1.h5", "a")
	  

This time, we have opened the file in "a"ppend mode. We use this mode to add more information to the file.

PyTables, following the Python tradition, offers powerful introspection capabilities, i.e. you can easily ask for information about any component of the object tree, as well as search the tree.

To start with, you can get a preliminary overview of the object tree by simply printing the existing File instance:

>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

	  

It looks like all of our objects are there. Now let's make use of the File iterator to list all the nodes in the object tree:

>>> for node in h5file:
...   print node
...
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector/readout (Table(10,)) 'Readout example'
	  

We can use the walkGroups method (see 4.2.2) of the File class to list only the groups in the tree:

>>> for group in h5file.walkGroups("/"):
...   print group
...
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
	  

Note that walkGroups() actually returns an iterator, not a list of objects. Using this iterator with the listNodes() method is a powerful combination. Let's see an example listing of all the arrays in the tree:

>>> for group in h5file.walkGroups("/"):
...     for array in h5file.listNodes(group, classname = 'Array'):
...         print array
...
/columns/name Array(3,) 'Name column selection'
/columns/pressure Array(3,) 'Pressure column selection'
	  

listNodes() (see 4.2.2) returns a list containing all the nodes hanging off a specific Group. If the classname keyword is specified, the method will filter out all instances which are not descendants of the class. We have asked for only Array instances.

We can combine both calls by using the walkNodes(where, classname) special method of the File object (see 4.2.2). For example:

>>> for array in h5file.walkNodes("/", "Array"):
...   print array
...
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
	  

This is a nice shortcut when working interactively.
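To make the relationship concrete, here is a toy model of how walkNodes() can be assembled from walkGroups() plus listNodes(). The Group, Array and Node classes below are minimal stand-ins, not the PyTables ones:

```python
class Node:
    def __init__(self, name):
        self.name = name

class Array(Node):
    pass

class Group(Node):
    def __init__(self, name, children=()):
        Node.__init__(self, name)
        self.children = list(children)

def walk_groups(group):
    # Yield this group, then recurse into its child groups.
    yield group
    for child in group.children:
        if isinstance(child, Group):
            for g in walk_groups(child):
                yield g

def list_nodes(group, classname=None):
    # Children of `group`, optionally filtered by class name.
    return [c for c in group.children
            if classname is None or type(c).__name__ == classname]

def walk_nodes(root, classname=None):
    # walkNodes == walkGroups plus listNodes on each visited group.
    for group in walk_groups(root):
        for node in list_nodes(group, classname):
            yield node

root = Group('/', [Group('columns', [Array('name'), Array('pressure')]),
                   Group('detector', [Node('readout')])])
print([n.name for n in walk_nodes(root, 'Array')])  # ['name', 'pressure']
```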

Finally, we will list all the Leaf nodes, i.e. Table and Array instances (see 4.4 for detailed information on the Leaf class), in the /detector group. Note that only one instance of the Table class (i.e. readout) will be selected in this group (as should be the case):

>>> for leaf in h5file.root.detector._f_walkNodes('Leaf'):
...   print leaf
...
/detector/readout (Table(10,)) 'Readout example'
	  

We have used a call to the Group._f_walkNodes(classname, recursive) method (4.3.2), using the natural naming path specification.

Of course you can do more sophisticated node selections using these powerful methods. But first, let's take a look at some important PyTables object instance variables.

3.2.2 Setting and getting user attributes

PyTables provides an easy and concise way to complement the meaning of your node objects on the tree by using the AttributeSet class (see section 4.11). You can access this object through the standard attribute attrs in Leaf nodes and _v_attrs in Group nodes.

For example, let's imagine that we want to save the date indicating when the data in /detector/readout table has been acquired, as well as the temperature during the gathering process:

>>> table = h5file.root.detector.readout
>>> table.attrs.gath_date = "Wed, 06/12/2003 18:33"
>>> table.attrs.temperature = 18.4
>>> table.attrs.temp_scale = "Celsius"
	  

Now, let's set a somewhat more complex attribute in the /detector group:

>>> detector = h5file.root.detector
>>> detector._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]
	  

Note how the AttributeSet instance is accessed with the _v_attrs attribute because detector is a Group node. In general, you can save any standard Python data structure as an attribute node. See section 4.11 for a more detailed explanation of how they are serialized for export to disk.
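The round trip can be sketched with the standard pickle module (this only illustrates the idea; the exact on-disk representation used by PyTables is described in section 4.11):

```python
import pickle

# A general Python structure is serialized to a byte string for storage
# as an HDF5 attribute, and rebuilt when the attribute is read back.
stuff = [5, (2.3, 4.5), "Integer and tuple"]
stored = pickle.dumps(stuff)      # what would be written to disk
restored = pickle.loads(stored)   # what attribute access would return
assert restored == stuff
```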

Retrieving the attributes is equally simple:

>>> table.attrs.gath_date
'Wed, 06/12/2003 18:33'
>>> table.attrs.temperature
18.399999999999999
>>> table.attrs.temp_scale
'Celsius'
>>> detector._v_attrs.stuff
[5, (2.2999999999999998, 4.5), 'Integer and tuple']
	  

You can probably guess how to delete attributes:

>>> del table.attrs.gath_date
	  

If you want to examine the current complete attribute set of /detector/readout, you can print its representation (try hitting the TAB key twice if you are on a Unix Python console with the rlcompleter module active):

>>> table.attrs
/detector/readout (AttributeSet), 14 attributes:
   [CLASS := 'TABLE',
    FIELD_0_NAME := 'ADCcount',
    FIELD_1_NAME := 'TDCcount',
    FIELD_2_NAME := 'energy',
    FIELD_3_NAME := 'grid_i',
    FIELD_4_NAME := 'grid_j',
    FIELD_5_NAME := 'idnumber',
    FIELD_6_NAME := 'name',
    FIELD_7_NAME := 'pressure',
    NROWS := 10,
    TITLE := 'Readout example',
    VERSION := '2.0',
    temp_scale := 'Celsius',
    temperature := 18.399999999999999]
	  

You can get a list of only the user or system attributes with the _f_list() method.

>>> print table.attrs._f_list("user")
['temp_scale', 'temperature']
>>> print table.attrs._f_list("sys")
['CLASS', 'FIELD_0_NAME', 'FIELD_1_NAME', 'FIELD_2_NAME', 'FIELD_3_NAME',
 'FIELD_4_NAME', 'FIELD_5_NAME', 'FIELD_6_NAME', 'FIELD_7_NAME', 'NROWS',
 'TITLE', 'VERSION']
	  

You can also rename attributes:

>>> table.attrs._f_rename("temp_scale","tempScale")
>>> print table.attrs._f_list()
['tempScale', 'temperature']
	  

However, you can't set, delete or rename read-only attributes:

>>> table.attrs._f_rename("VERSION", "version")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/falted/PyTables/pytables-0.7/tables/AttributeSet.py", 
  line 249, in _f_rename
    raise RuntimeError, \
RuntimeError: Read-only attribute ('VERSION') cannot be renamed
	  

If you were to terminate your session now, you would be able to use the h5ls command to read the /detector/readout attributes from the file written to disk:

$ h5ls -vr tutorial1.h5/detector/readout
Opened "tutorial1.h5" with sec2 driver.
/detector/readout        Dataset {10/Inf}
    Attribute: CLASS     scalar
        Type:      6-byte null-terminated ASCII string
        Data:  "TABLE"
    Attribute: VERSION   scalar
        Type:      4-byte null-terminated ASCII string
        Data:  "2.0"
    Attribute: TITLE     scalar
        Type:      16-byte null-terminated ASCII string
        Data:  "Readout example"
    Attribute: FIELD_0_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "ADCcount"
    Attribute: FIELD_1_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "TDCcount"
    Attribute: FIELD_2_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "energy"
    Attribute: FIELD_3_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "grid_i"
    Attribute: FIELD_4_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "grid_j"
    Attribute: FIELD_5_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "idnumber"
    Attribute: FIELD_6_NAME scalar
        Type:      5-byte null-terminated ASCII string
        Data:  "name"
    Attribute: FIELD_7_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "pressure"
    Attribute: tempScale scalar
        Type:      8-byte null-terminated ASCII string
        Data:  "Celsius"
    Attribute: temperature {1}
        Type:      native double
        Data:  18.4
    Attribute: NROWS     {1}
        Type:      native int
        Data:  10
    Location:  0:1:0:1952
    Links:     1
    Modified:  2003-07-24 13:59:19 CEST
    Chunks:    {2048} 96256 bytes
    Storage:   470 logical bytes, 96256 allocated bytes, 0.49% utilization
    Type:      struct {
                   "ADCcount"         +0    native unsigned short
                   "TDCcount"         +2    native unsigned char
                   "energy"           +3    native double
                   "grid_i"           +11   native int
                   "grid_j"           +15   native int
                   "idnumber"         +19   native long long
                   "name"             +27   16-byte null-terminated ASCII string
                   "pressure"         +43   native float
               } 47 bytes
 

	  

Attributes are a useful mechanism to add persistent (meta) information to your data.
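Incidentally, the +0, +2, +3, ... figures in the struct listing above are byte offsets within each packed 47-byte record; a few lines of plain Python reproduce them from the column item sizes:

```python
# Columns are stored alphabetically with no padding; each offset is the
# running sum of the preceding item sizes.
itemsizes = [
    ("ADCcount", 2), ("TDCcount", 1), ("energy", 8), ("grid_i", 4),
    ("grid_j", 4), ("idnumber", 8), ("name", 16), ("pressure", 4),
]

offsets = {}
offset = 0
for name, size in itemsizes:
    offsets[name] = offset
    offset += size

print(offsets['energy'], offsets['name'], offset)  # 3 27 47
```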

3.2.3 Getting object metadata

Each object in PyTables has metadata information about the data in the file. Normally this metainformation is accessible through the node instance variables. Let's take a look at some examples:

>>> print "Object:", table
Object: /detector/readout Table(10,) 'Readout example'
>>> print "Table name:", table.name
Table name: readout
>>> print "Table title:", table.title
Table title: Readout example
>>> print "Number of rows in table:", table.nrows
Number of rows in table: 10
>>> print "Table variable names with their type and shape:"
Table variable names with their type and shape:
>>> for name in table.colnames:
...   print name, ':= %s, %s' % (table.coltypes[name], table.colshapes[name])
...
ADCcount := UInt16, 1
TDCcount := UInt8, 1
energy := Float64, 1
grid_i := Int32, 1
grid_j := Int32, 1
idnumber := Int64, 1
name := CharType, 1
pressure := Float32, 1
	  

Here, the name, title, nrows, colnames, coltypes and colshapes attributes (see 4.2.1 for a complete attribute list) of the Table object give us quite a bit of information about the table data.

You can interactively retrieve general information about the public objects in PyTables by printing their internal doc strings:

>>> print table.__doc__
Represent a table in the object tree.
    It provides methods to create new tables or open existing ones, as
    well as to write/read data to/from table objects over the
    file. A method is also provided to iterate over the rows without
    loading the entire table or column in memory.

    Data can be written or read both as Row instances or as numarray
    (NumArray or RecArray) objects.
    
    Methods:

        __getitem__(key)
        __iter__()
        __setitem__(key, value)
        append(rows)
        flushRowsToIndex()
        iterrows(start, stop, step)
        itersequence(sequence)
        modifyRows(start, rows)
        modifyColumns(start, columns, names)
        read([start] [, stop] [, step] [, field [, flavor]])
        reIndex()
        reIndexDirty()
        removeRows(start, stop)
        removeIndex(column)
        where(condition [, start] [, stop] [, step])
        whereIndexed(condition [, start] [, stop] [, step])
        whereInRange(condition [, start] [, stop] [, step])
        getWhereList(condition [, flavor])

    Instance variables:

        description -- the metaobject describing this table
        row -- a reference to the Row object associated with this table
        nrows -- the number of rows in this table
        rowsize -- the size, in bytes, of each row
        cols -- accessor to the columns using a natural name schema
        colnames -- the field names for the table (list)
        coltypes -- the type class for the table fields (dictionary)
        colshapes -- the shapes for the table fields (dictionary)
        colindexed -- whether the table fields are indexed (dictionary)
        indexed -- whether or not some field in Table is indexed
        indexprops -- properties of an indexed Table. Exists only
            if the Table is indexed

	  

The help function is also a handy way to see PyTables reference documentation online. Try it yourself with other object docs:

>>> help(table.__class__)
>>> help(table.removeRows)
	  

To examine metadata in the /columns/pressure Array object:

>>> pressureObject = h5file.getNode("/columns", "pressure")
>>> print "Info on the object:", repr(pressureObject)
Info on the object: /columns/pressure (Array(3,)) 'Pressure column selection'
  type = Float64
  itemsize = 8
  flavor = 'NumArray'
  byteorder = 'little'
>>> print "  shape: ==>", pressureObject.shape
  shape: ==> (3,)
>>> print "  title: ==>", pressureObject.title
  title: ==> Pressure column selection
>>> print "  type: ==>", pressureObject.type
  type: ==> Float64
	  

Observe that we have used the getNode() method of the File class to access a node in the tree, instead of the natural naming method. Both are useful, and depending on the context you will prefer one or the other. getNode() has the advantage that it can get a node from the pathname string (as in this example) and can also act as a filter to show only nodes in a particular location that are instances of class classname. In general, however, I consider natural naming to be more elegant and easier to use, especially if you are using the name completion capability present in an interactive console. Try this powerful combination of natural naming and completion capabilities present in most Python consoles, and see how pleasant it is to browse the object tree (at least, as pleasant as such an activity can be).

If you look at the type attribute of the pressureObject object, you can verify that it is a "Float64" array. By looking at its shape attribute, you can deduce that the array on disk is unidimensional and has 3 elements. See 4.7.1 or the internal string docs for the complete Array attribute list.

3.2.4 Reading data from Array objects

Once you have found the desired Array, use the read() method of the Array object to retrieve its data:

>>> pressureArray = pressureObject.read()
>>> pressureArray
array([ 25.,  36.,  49.])
>>> print "pressureArray is an object of type:", type(pressureArray)
pressureArray is an object of type: <class 'numarray.numarraycore.NumArray'>
>>> nameArray = h5file.root.columns.name.read()
>>> nameArray
['Particle:      5', 'Particle:      6', 'Particle:      7']
>>> print "nameArray is an object of type:", type(nameArray)
nameArray is an object of type: <type 'list'>
>>>
>>> print "Data on arrays nameArray and pressureArray:"
Data on arrays nameArray and pressureArray:
>>> for i in range(pressureObject.shape[0]):
...   print nameArray[i], "-->", pressureArray[i]
...
Particle:      5 --> 25.0
Particle:      6 --> 36.0
Particle:      7 --> 49.0
>>> pressureObject.name
'pressure'
	  

You can see that the read() method (see section 4.7.2) returns an authentic numarray object for the pressureObject instance by looking at the output of the type() call. A read() on the name Array instance returns a native Python list (of strings). The type of the object saved is stored as an HDF5 attribute (named FLAVOR) for objects on disk. This attribute is then read as Array metainformation (accessible through the Array.attrs.FLAVOR variable), enabling the read array to be converted into the original object. This provides a means to save a large variety of objects as arrays with the guarantee that you will be able to later recover them in their original form. See section 4.2.2 for a complete list of supported objects for the Array object class.
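That conversion logic can be sketched as follows (stand-in code, not the actual PyTables implementation): the container kind is recorded when the object is written and consulted when it is read back:

```python
def save(obj):
    # Record the original container kind alongside a normalized form.
    flavor = type(obj).__name__       # e.g. 'list'
    data = list(obj)                  # normalized on-disk form
    return data, flavor

def read(data, flavor):
    # Rebuild the kind of object that was originally saved.
    if flavor == 'list':
        return list(data)
    if flavor == 'tuple':
        return tuple(data)
    return data                       # default: return as stored

data, flavor = save(['Particle:      5', 'Particle:      6'])
assert flavor == 'list'
assert read(data, flavor) == ['Particle:      5', 'Particle:      6']
```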

3.2.5 Appending data to an existing table

Now, let's have a look at how we can add records to an existing table on disk. Let's use our well-known readout Table object and append some new values to it:

>>> table = h5file.root.detector.readout
>>> particle = table.row
>>> for i in xrange(10, 15):
...     particle['name']  = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
...
>>> table.flush()
	  

It's the same method we used to fill a new table. PyTables knows that this table is on disk, and when you add new records, they are appended to the end of the table5).

If you look carefully at the code you will see that we have used the table.row attribute to create a table row and fill it with the new values. Each time its append() method is called, the current row is committed to the output buffer and the row pointer is incremented to point to the next table record. When the buffer is full, the data is saved on disk, and the buffer is reused for the next cycle.
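The buffering scheme just described can be sketched in plain Python. This BufferedTable class is a toy illustration of the buffer-and-flush pattern, not PyTables' actual implementation:

```python
# Toy sketch (NOT the PyTables implementation) of the buffering pattern:
# rows accumulate in an in-memory buffer and are written out in batches.
class BufferedTable:
    def __init__(self, bufsize=3):
        self.bufsize = bufsize     # rows held in memory before a write
        self.buffer = []           # the in-memory row buffer
        self.disk = []             # stands in for the rows on disk

    def append(self, row):
        self.buffer.append(dict(row))
        if len(self.buffer) >= self.bufsize:
            self.flush()           # buffer full: write it out

    def flush(self):
        self.disk.extend(self.buffer)   # "save the data on disk"
        self.buffer = []                # reuse the buffer for the next cycle

t = BufferedTable(bufsize=3)
for i in range(4):
    t.append({'grid_i': i})
# After 4 appends with bufsize=3, one automatic flush has happened:
# 3 rows are "on disk", 1 is still buffered until flush() is called.
```

This also shows why the explicit flush() call below matters: without it, the last partially-filled buffer would never reach disk.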

Caveat emptor: Do not forget to always call the .flush() method after a write operation, or else your tables will not be updated!

Let's have a look at some rows in the modified table and verify that our new data has been appended:

>>> for r in table.iterrows():
...     print "%-16s | %11.1f | %11.4g | %6d | %6d | %8d |" % \
...        (r['name'], r['pressure'], r['energy'], r['grid_i'], r['grid_j'],
...         r['TDCcount'])
...
Particle:      0 |         0.0 |           0 |      0 |     10 |        0 |
Particle:      1 |         1.0 |           1 |      1 |      9 |        1 |
Particle:      2 |         4.0 |         256 |      2 |      8 |        2 |
Particle:      3 |         9.0 |        6561 |      3 |      7 |        3 |
Particle:      4 |        16.0 |   6.554e+04 |      4 |      6 |        4 |
Particle:      5 |        25.0 |   3.906e+05 |      5 |      5 |        5 |
Particle:      6 |        36.0 |    1.68e+06 |      6 |      4 |        6 |
Particle:      7 |        49.0 |   5.765e+06 |      7 |      3 |        7 |
Particle:      8 |        64.0 |   1.678e+07 |      8 |      2 |        8 |
Particle:      9 |        81.0 |   4.305e+07 |      9 |      1 |        9 |
Particle:     10 |       100.0 |       1e+08 |     10 |      0 |       10 |
Particle:     11 |       121.0 |   2.144e+08 |     11 |     -1 |       11 |
Particle:     12 |       144.0 |     4.3e+08 |     12 |     -2 |       12 |
Particle:     13 |       169.0 |   8.157e+08 |     13 |     -3 |       13 |
Particle:     14 |       196.0 |   1.476e+09 |     14 |     -4 |       14 |
	  

3.2.6 And finally... how to delete rows from a table

We'll finish this tutorial by deleting some rows from the table we have. Suppose that we want to delete the 5th to 9th rows (inclusive):

>>> table.removeRows(5,10)
5
	  

removeRows(start, stop) (see section 4.5.2) deletes the rows in the half-open range [start, stop), i.e. from row start up to, but not including, row stop. It returns the number of rows actually removed.
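The half-open range semantics can be mimicked with a plain Python list. Here remove_rows is a made-up helper for illustration, not the PyTables API:

```python
# Illustration (plain Python, not the PyTables API) of the half-open
# range semantics of removeRows(start, stop): rows start..stop-1 go away.
def remove_rows(rows, start, stop):
    removed = rows[start:stop]     # the rows in [start, stop)
    del rows[start:stop]
    return len(removed)            # number of rows actually removed

rows = list(range(15))             # 15 rows, like our table
n = remove_rows(rows, 5, 10)       # delete the 5th to 9th rows (inclusive)
# n == 5; rows 5..9 are gone, while row 10 survives
```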

We have reached the end of this first tutorial. Don't forget to close the file when you finish:

>>> h5file.close()
>>> ^D
$ 
	  

In figure 3.1 you can see a graphical view of the PyTables file with the datasets we have just created. Figure 3.2 displays the general properties of the table /detector/readout.

Figure 3.1: The final version of the data file for tutorial 1, with a view of the data objects.
Figure 3.2: General properties of the /detector/readout table.

3.3 Multidimensional table cells and automatic sanity checks

Now it's time for a more real-life example (i.e. with errors in the code). We will create two groups that branch directly from the root node, Particles and Events. Then, we will put three tables in each group. In Particles we will put tables based on the Particle descriptor and in Events, tables based on the Event descriptor.

Afterwards, we will fill the tables with a number of records. Finally, we will read the newly-created table /Events/TEvent3 and select some values from it, using a list comprehension.

Look at the next script (you can find it in examples/tutorial2.py). It appears to do all of the above, but it contains some small bugs. Note that this Particle class is not directly related to the one defined in the last tutorial; this class is simpler (note, however, the multidimensional columns called pressure and temperature).

We also introduce a new way to describe a Table: as a dictionary, as you can see in the Event description. See section 4.2.2 about the different kinds of descriptor objects that can be passed to the createTable() method.

from numarray import *
from tables import *

# Describe a particle record
class Particle(IsDescription):
    name        = StringCol(length=16) # 16-character String
    lati        = IntCol()             # integer
    longi       = IntCol()             # integer
    pressure    = Float32Col(shape=(2,3)) # array of floats (single-precision)
    temperature = FloatCol(shape=(2,3))   # array of doubles (double-precision)

# Another way to describe the columns of a table
Event = {
    "name"    : Col('CharType', 16),    # 16-character String
    "TDCcount": Col("UInt8", 1),        # unsigned byte
    "ADCcount": Col("UInt16", 1),       # Unsigned short integer
    "xcoord"  : Col("Float32", 1),      # integer
    "ycoord"  : Col("Float32", 1),      # integer
    }

# Open a file in "w"rite mode
fileh = openFile("tutorial2.h5", mode = "w")
# Get the HDF5 root group
root = fileh.root
# Create the groups:
for groupname in ("Particles", "Events"):
    group = fileh.createGroup(root, groupname)
# Now, create and fill the tables in the Particles group
gparticles = root.Particles
# Create 3 new tables
for tablename in ("TParticle1", "TParticle2", "TParticle3"):
    # Create a table
    table = fileh.createTable("/Particles", tablename, Particle,
                           "Particles: "+tablename)
    # Get the record object associated with the table:
    particle = table.row
    # Fill the table with data for 257 particles
    for i in xrange(257):
        # First, assign the values to the Particle record
        particle['name'] = 'Particle: %6d' % (i)
        particle['lati'] = i 
        particle['longi'] = 10 - i
        ########### Detectable errors start here. Play with them!
        particle['pressure'] = array(i*arange(2*3), shape=(2,4))  # Incorrect
        #particle['pressure'] = array(i*arange(2*3), shape=(2,3))  # Correct
        ########### End of errors
        particle['temperature'] = (i**2)     # Broadcasting
        # This injects the Record values
        particle.append()      
    # Flush the table buffers
    table.flush()

# Now Events:
for tablename in ("TEvent1", "TEvent2", "TEvent3"):
    # Create a table in the Events group
    table = fileh.createTable(root.Events, tablename, Event,
                           "Events: "+tablename)
    # Get the record object associated with the table:
    event = table.row
    # Fill the table with data on 257 events
    for i in xrange(257):
        # First, assign the values to the Event record
        event['name']  = 'Event: %6d' % (i)
        event['TDCcount'] = i % (1<<8)   # Correct range
        ########### Detectable errors start here. Play with them!
        #event['xcoord'] = float(i**2)   # Correct spelling
        event['xcoor'] = float(i**2)     # Wrong spelling
        event['ADCcount'] = i * 2        # Correct type
        #event['ADCcount'] = "sss"          # Wrong type
        ########### End of errors
        event['ycoord'] = float(i)**4
        # This injects the Record values
        event.append()

    # Flush the buffers
    table.flush()

# Read the records from table "/Events/TEvent3" and select some
table = root.Events.TEvent3
e = [ p['TDCcount'] for p in table
      if p['ADCcount'] < 20 and 4 <= p['TDCcount'] < 15 ]
print "Last record ==>", p
print "Selected values ==>", e
print "Total selected records ==> ", len(e)
# Finally, close the file (this also will flush all the remaining buffers)
fileh.close()
	

3.3.1 Shape checking

If you look at the code carefully, you'll see that it won't work. You will get the following error:

$ python tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 53, in ?
    particle['pressure'] = array(i*arange(2*3), shape=(2,4))  # Incorrect
  File  "/usr/local/lib/python2.2/site-packages/numarray/numarraycore.py",
 line 281, in array
  a.setshape(shape)
  File "/usr/local/lib/python2.2/site-packages/numarray/generic.py", 
 line 530, in setshape
    raise ValueError("New shape is not consistent with the old shape")
ValueError: New shape is not consistent with the old shape
	

This error indicates that you are trying to assign an array with an incompatible shape to a table cell. Looking at the source, we see that we were trying to assign an array of shape (2,4) to a pressure element, which was defined with the shape (2,3).

In general, these kinds of operations are forbidden, with one valid exception: when you assign a scalar value to a multidimensional column cell, all the cell elements are populated with the value of the scalar. For example:

        particle['temperature'] = (i**2)    # Broadcasting
	  

The value i**2 is assigned to all the elements of the temperature table cell. This capability is provided by the numarray package and is known as broadcasting.
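A pure-Python sketch of what such a broadcast means for a (2, 3) cell follows. The broadcast_scalar helper is made up for illustration; numarray performs this expansion internally:

```python
# Sketch of scalar broadcasting into a multidimensional cell:
# every element of the (nrows, ncols) cell receives the scalar value.
def broadcast_scalar(value, shape):
    nrows, ncols = shape
    return [[value] * ncols for _ in range(nrows)]

# Like particle['temperature'] = i**2 with i == 7 and a (2, 3) column:
cell = broadcast_scalar(7 ** 2, (2, 3))
# cell == [[49, 49, 49], [49, 49, 49]]
```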

3.3.2 Field name checking

After fixing the previous error and rerunning the program, we encounter another error:

$ python tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 74, in ?
    event['xcoor'] = float(i**2)     # Wrong spelling
  File "src/hdf5Extension.pyx",
 line 1812, in hdf5Extension.Row.__setitem__
    raise KeyError, "Error setting \"%s\" field.\n %s" % \
KeyError: Error setting "xcoor" field.
 Error was: "exceptions.KeyError: xcoor"
	  

This error indicates that we are attempting to assign a value to a non-existent field in the event table object. By looking carefully at the Event class attributes, we see that we misspelled the xcoord field (we wrote xcoor instead). This is unusual behavior for Python, as normally when you assign a value to a non-existent instance variable, Python creates a new variable with that name. Such a feature can be dangerous when dealing with an object that contains a fixed list of field names. PyTables checks that the field exists and raises a KeyError if the check fails.
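The check can be sketched in a few lines of plain Python. FixedRow is a toy stand-in for PyTables' Row class, not its real implementation:

```python
# Toy sketch (not the real Row class) of the fixed-field check:
# assigning to a name that is not a declared column raises KeyError
# instead of silently creating a new attribute.
class FixedRow:
    def __init__(self, fields):
        self._fields = dict.fromkeys(fields)

    def __setitem__(self, key, value):
        if key not in self._fields:
            raise KeyError("Error setting %r field." % key)
        self._fields[key] = value

event = FixedRow(['name', 'TDCcount', 'ADCcount', 'xcoord', 'ycoord'])
event['xcoord'] = 4.0        # fine: the field exists
try:
    event['xcoor'] = 4.0     # misspelled: rejected
    caught = False
except KeyError:
    caught = True
```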

3.3.3 Data type checking

Finally, in order to test type checking, we will change the next line:

	    event['ADCcount'] = i * 2        # Correct type
	  

to read:

	    event.ADCcount = "sss"          # Wrong type
	  

This modification will cause the following TypeError exception to be raised when the script is executed:

$ python tutorial2.py
Traceback (most recent call last):
  File "tutorial2.py", line 76, in ?
    event['ADCcount'] = "sss"          # Wrong type
  File "src/hdf5Extension.pyx",
 line 1812, in hdf5Extension.Row.__setitem__
    raise KeyError, "Error setting \"%s\" field.\n %s" % \
KeyError: Error setting "ADCcount" field.
 Error was: "exceptions.TypeError: NA_setFromPythonScalar: bad value type."
	  
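The idea behind this check can be sketched in plain Python. The set_uint16 validator below is hypothetical, not the actual numarray routine:

```python
# Hypothetical sketch of a per-column type check for an unsigned
# 16-bit integer column: a value of the wrong Python type is rejected
# with TypeError, just as "sss" is rejected for ADCcount above.
def set_uint16(value):
    if not isinstance(value, int):
        raise TypeError("bad value type for a UInt16 column")
    if not 0 <= value < 1 << 16:
        raise ValueError("value out of range for a UInt16 column")
    return value

adc = set_uint16(10 * 2)     # fine: an in-range integer
try:
    set_uint16("sss")        # wrong type, like the tutorial's error
    rejected = False
except TypeError:
    rejected = True
```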

You can see the structure created with this (corrected) script in figure 3.3. In particular, note the multidimensional column cells in table /Particles/TParticle2.

Figure 3.3: Table hierarchy for tutorial 2.

Feel free to examine the rest of the examples in the examples directory and try to understand them. I've written several practical sample scripts to give you an idea of PyTables' capabilities, its way of dealing with HDF5 objects, and how it can be used in the real world.


5) Note that you can append not only scalar values to tables, but also fully multidimensional array objects.
