
Chapter 4: Library Reference

PyTables implements several classes to represent the different nodes in the object tree. They are named File, Group, Leaf, Table, Array, EArray, VLArray and UnImplemented. Another class, AttributeSet, lets the user complement the information on these objects. Finally, an important class called IsDescription lets you build a Table record description by declaring a subclass of it. Many other classes are defined in PyTables, but they can be regarded as helpers whose main goal is to declare the data type properties of the different first-class objects; they are described at the end of this chapter.

An important function, openFile, is responsible for creating, opening or appending to files. In addition, a few utility functions are defined to guess whether a user-supplied file is a PyTables or an HDF5 file. These are called isPyTablesFile and isHDF5, respectively. Finally, a function called whichLibVersion reports the versions of the underlying C libraries (for example, HDF5 or Zlib).

Let's start by discussing the first-level variables and functions available to the user, and then the different classes defined in PyTables.

4.1 tables variables and functions

4.1.1 Global variables

__version__
The PyTables version number.
ExtVersion
The version of the Pyrex extension module. This might be useful when reporting bugs.
HDF5Version
The underlying HDF5 library version number.

4.1.2 Global functions

copyFile(srcFilename=None, dstFilename=None, title=None, filters=None, copyuserattrs=1, overwrite=0)

Copy a closed PyTables (or generic HDF5) file specified by srcFilename to dstFilename. Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.

title
The title for the new file. If not specified, the source file title will be copied.
filters
A Filters instance (see 4.13.1). If specified, it will override the original filter properties in all source nodes.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to 0. The default is to copy them.
overwrite
If the dstFilename file already exists and overwrite is 1, it will be silently overwritten. The default is not to overwrite it.

isHDF5(filename)

Determines whether filename is in the HDF5 format or not. When successful, returns a positive value, for TRUE, or 0 (zero), for FALSE. Otherwise returns a negative value. For this function to work, the file must be closed.

isPyTablesFile(filename)

Determines whether a file is in the PyTables format. When successful, returns the format version string, for TRUE, or 0 (zero), for FALSE. Otherwise returns a negative value. For this function to work, the file must be closed.
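The three-way return convention of these two functions is easy to misread in Python, because a negative error value is still truthy. The following is a minimal sketch, not part of PyTables: classify_check is a hypothetical helper that only encodes the convention described above.

```python
def classify_check(result):
    """Map an isHDF5/isPyTablesFile style result to 'yes', 'no' or 'error'.

    Hypothetical helper: it only encodes the documented convention
    (positive value or version string = true, 0 = false, negative = failure).
    """
    if isinstance(result, str):
        return "yes"      # isPyTablesFile reports the format version string
    if result > 0:
        return "yes"
    if result == 0:
        return "no"
    return "error"        # negative value: the check itself failed
```

Under this convention, a plain truth test on the result would treat a negative error value as true, so comparing against 0 explicitly is the safer pattern.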

openFile(filename, mode='r', title='', trMap={}, rootUEP="/", filters=None)

Opens a PyTables (or generic HDF5) file and returns a File object.

filename
The name of the file (supports environment variable expansion). Using one of the ".h5", ".hdf" or ".hdf5" extensions is suggested, although this is not mandatory.
mode
The mode to open the file. It can be one of the following:
'r'
read-only; no data can be modified.
'w'
write; a new file is created (an existing file with the same name would be deleted).
'a'
append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+'
is similar to 'a', but the file must already exist.
title
If filename is new, this will set a title for the root group in this file. If filename is not new, the title will be read from disk, and this will not have any effect.
trMap
A dictionary to map names in the object tree Python namespace into different HDF5 names in file namespace. The keys are the Python names, while the values are the HDF5 names. This is useful when you need to use HDF5 node names that are invalid or reserved words in Python.
rootUEP
The root User Entry Point. This is a group in the HDF5 hierarchy which will be taken as the starting point to create the object tree. The group has to be named after its HDF5 name and can be a path. If it does not exist, a RuntimeError exception is issued. Use this if you do not want to build the entire object tree, but rather only a subtree of it.
filters
An instance of the Filters class (see section 4.13.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from root (unless other filter properties are specified for these leaves). In addition, child groups for which you do not specify filter properties will inherit these ones. So, if you open a new file with this parameter set, all the leaves created in the file will recursively inherit these filtering properties (again, unless you prevent that by specifying other filters on the child groups or leaves).
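The inheritance rule for filters can be pictured as a lookup that walks up the tree until it finds explicitly specified properties. The sketch below is a toy model under stated assumptions: ToyNode and effective_filters are hypothetical names, the filter values are placeholder strings rather than Filters instances, and the real library may resolve inheritance when a node is created rather than on lookup.

```python
class ToyNode:
    """Hypothetical stand-in for a group or leaf; not a PyTables class."""

    def __init__(self, name, parent=None, filters=None):
        self.name = name
        self.parent = parent
        self.filters = filters  # None means "nothing specified here"


def effective_filters(node):
    """Return the nearest explicitly specified filters, walking up to the root."""
    while node is not None:
        if node.filters is not None:
            return node.filters
        node = node.parent
    return None  # nothing specified anywhere in the chain


# Placeholder strings stand in for Filters instances.
root = ToyNode("/", filters="root-filters")     # set when the file is opened
group = ToyNode("group", parent=root)           # inherits from root
leaf = ToyNode("leaf", parent=group)            # inherits through group
special = ToyNode("special", parent=group, filters="own-filters")  # overrides
```

Here leaf ends up with the properties given for the root, while special keeps its own.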

whichLibVersion(libname)

Returns information about the versions of the underlying C libraries. libname can be one of "hdf5", "zlib", "lzo" or "ucl". It always returns a tuple of 3 elements. The first element has a positive value when successful, and is 0 (zero) when the library is not available (for example LZO or UCL). If the library is available, the second element of the tuple contains the library version and the third element the date (if available) of that version.
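As an illustration of how the returned tuple might be consumed, here is a small sketch. describe_library is a hypothetical helper, not a PyTables function, and it takes the tuple as an argument so that it does not depend on any installed library.

```python
def describe_library(libname, info):
    """Format a 3-tuple shaped like the whichLibVersion result.

    Hypothetical helper; `info` is (status, version, date), where a
    status of 0 means the library is not available.
    """
    status, version, date = info
    if status == 0:
        return "%s: not available" % libname
    if date:
        return "%s: version %s (%s)" % (libname, version, date)
    return "%s: version %s" % (libname, version)
```

Checking the first element before reading the other two avoids relying on what they contain when the library is missing, which the description above does not specify.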

4.2 The File class

This class is returned when a PyTables file is opened with the openFile function. It has methods to flush and close files. Also, the File class offers methods to create, rename and delete nodes, as well as to traverse the object tree. One of its attributes (rootUEP) represents the user entry point to the object tree attached to the file.

Next, we will discuss the attributes and methods of the File class.

4.2.1 File instance variables

filename
Filename opened.
format_version
The PyTables version number of this file.
isopen
It takes the value 1 if the underlying file is open, 0 otherwise.
mode
Mode in which the filename was opened.
root
The root of the object tree hierarchy. It is a Group instance.
rootUEP
The UEP (User Entry Point) group in file (see 4.1.2).
title
The title of the root group in file.
trMap
This is a dictionary that maps node names between the Python and HDF5 namespaces. Its initial values are set from the trMap parameter passed to the openFile function. You can change its contents after a file is opened, and the new map will take effect for any new object added to the tree.
filters
Container for filter properties associated to this file. See section 4.13.1 for more information on this object.
objects
Dictionary with all objects (groups or leaves) on tree.
groups
Dictionary with all object groups on tree.
leaves
Dictionary with all object leaves on tree.

4.2.2 File methods

createGroup(where, name, title='', filters=None)

Create a new Group instance with name name in where location.

where
The parent group from which the new group will hang. The where parameter can be a path string (for example "/level1/group5"), or another Group instance.
name
The name of the new group.
title
A description for this group.
filters
An instance of the Filters class (see section 4.13.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from this new group (unless other filter properties are specified for these leaves). In addition, child groups for which you do not specify filter properties will inherit these ones.

createTable(where, name, description, title='', filters=None, expectedrows=10000)

Create a new Table instance with name name in where location.

where
The parent group where the new table will hang from. where parameter can be a path string (for example "/level1/leaf5"), or Group instance.
name
The name of the new table.
description
An instance of a user-defined class (derived from the IsDescription class) where table fields are defined. However, in certain situations it is handier to supply this description as a dictionary (for example, when you do not know beforehand which structure your table will have). In such cases, you can pass the description as a dictionary instead. See section 3.3 for an example of use. Finally, a RecArray object from the numarray package is also accepted, and all the information about columns and other metadata is used as a basis to create the Table object. Moreover, if the RecArray has actual data, this is also injected into the newly created Table object.
title
A description for this object.
filters
An instance of the Filters class (see section 4.13.1) that provides information about the desired I/O filters to be applied during the life of this object.
expectedrows
A user estimate of the number of records that will be in the table. If not provided, the default value is appropriate for tables up to about 10 MB in size. If you plan to save bigger tables you should provide a guess; this will optimize the HDF5 B-Tree creation and management process time and the memory used. See section 6.1 for a discussion of this issue.

createArray(where, name, object, title='')

Create a new Array instance with name name in where location.

object
The regular array to be saved. Currently accepted values are: lists, tuples, scalars (int and float), strings and (multidimensional) Numeric and NumArray arrays (including CharArrays string arrays). However, these objects must be regular (i.e. they cannot be like, for example, [[1,2],2]). Also, objects that have some of their dimensions equal to zero are not supported (use an EArray object if you want to create an array with one of its dimensions equal to 0).

See the createTable description (4.2.2) for more information on the where, name and title parameters.
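The two restrictions on object (it must be regular, and none of its dimensions may be zero) can be checked for nested lists and tuples with a few lines of plain Python. This is only an illustrative sketch; regular_shape and acceptable_for_array are hypothetical helpers and do not reflect how createArray validates its input internally.

```python
def regular_shape(obj):
    """Return the shape of a nested list/tuple, or raise ValueError if irregular.

    Hypothetical helper for illustration; scalars and strings have shape ().
    """
    if not isinstance(obj, (list, tuple)):
        return ()
    child_shapes = [regular_shape(item) for item in obj]
    if any(shape != child_shapes[0] for shape in child_shapes):
        raise ValueError("irregular object: children differ in shape")
    inner = child_shapes[0] if child_shapes else ()
    return (len(obj),) + inner


def acceptable_for_array(obj):
    """True if obj is regular and none of its dimensions is zero."""
    try:
        shape = regular_shape(obj)
    except ValueError:
        return False
    return 0 not in shape
```

With these definitions, [[1,2],[3,4]] has shape (2, 2) and is acceptable, while [[1,2],2] is rejected as irregular and an empty list is rejected because its only dimension is zero.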

createEArray(where, name, atom, title='', filters=None, expectedrows=1000)

Create a new EArray instance with name name in where location.

atom
An Atom instance representing the shape, type and flavor of the atomic objects to be saved. One (and only one) of the shape dimensions must be 0. The dimension being 0 means that the resulting EArray object can be extended along it. Multiple enlargeable dimensions are not supported right now. See section 4.12.3 for the supported set of Atom class descendants.
expectedrows
In the case of enlargeable arrays, this represents a user estimate of the number of row elements that will be added to the growable dimension of the EArray object. If not provided, the default value is 1000 rows. If you plan to create either much smaller or much bigger EArrays, try providing a guess; this will optimize the HDF5 B-Tree creation and management process time and the amount of memory used.

See createTable description 4.2.2 for more information on the where, name, title, and filters parameters.
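The rule that exactly one dimension of the atom shape must be 0 can be expressed as a small check. The sketch below uses a hypothetical helper, extendable_dimension, that works on a plain shape tuple; it is not part of the Atom class.

```python
def extendable_dimension(shape):
    """Return the index of the single zero-length dimension in shape.

    Hypothetical helper; raises ValueError unless exactly one dimension is 0.
    """
    zero_dims = [i for i, n in enumerate(shape) if n == 0]
    if len(zero_dims) != 1:
        raise ValueError(
            "exactly one dimension must be 0, found %d" % len(zero_dims))
    return zero_dims[0]
```

For example, a shape of (0, 3) would be extendable along its first dimension, while (2, 3) and (0, 0) would both be rejected.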

createVLArray(where, name, atom=None, title='', filters=None, expectedsizeinMB=1.0)

Create a new VLArray instance with name name in where location. See section 4.9 for a description of the VLArray class.

atom
An Atom instance representing the shape, type and flavor of the atomic object to be saved. See section 4.12.3 for the supported set of Atom class descendants.
expectedsizeinMB
A user estimate of the size (in MB) of the final VLArray object. If not provided, the default value is 1 MB. If you plan to create either much smaller or much bigger VLArrays, try providing a guess; this will optimize the HDF5 B-Tree creation and management process time and the amount of memory used.

See createTable description 4.2.2 for more information on the where, name, title, and filters parameters.

getNode(where, name='', classname='')

Returns the object node name under where location.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a ValueError is raised.
name
The name of the desired object. If name is a null string (''), or not supplied, this method assumes that where itself is the object.
classname
If supplied, returns only an instance of this class name. Possible values are: 'Group', 'Leaf', 'Table', 'Array', 'EArray', 'VLArray' and 'UnImplemented'. Note that these values are strings.

getAttrNode(where, attrname, name='' )

Returns the attribute attrname under where.name location.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a ValueError is raised.
attrname
The name of the attribute to get.
name
The name of the desired node. If name is a null string (''), or not supplied, this method assumes that where itself is the node.

setAttrNode(where, attrname, attrvalue, name='')

Sets the attribute attrname with value attrvalue under where.name location.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a ValueError is raised.
attrname
The name of the attribute to set on disk.
attrvalue
The value of the attribute to set. Any scalar (string, int or float) attribute is supported natively. However, (c)Pickle is automatically used to serialize other kinds of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save.
name
The name of the desired node. If name is a null string (''), or not supplied, this method assumes that where itself is the node.
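The split between natively supported scalars and pickled objects described for attrvalue can be sketched as follows. This is a toy model, not the library's storage code: encode_attribute and decode_attribute are hypothetical names, and the standard pickle module stands in for (c)Pickle.

```python
import pickle


def encode_attribute(value):
    """Return ('native', value) for scalars, else ('pickled', bytes).

    Hypothetical helper that mirrors the rule described for attrvalue.
    """
    if isinstance(value, (str, int, float)):
        return ("native", value)
    return ("pickled", pickle.dumps(value))


def decode_attribute(kind, payload):
    """Inverse of encode_attribute."""
    return pickle.loads(payload) if kind == "pickled" else payload
```

A list such as [1, 2] would take the pickled route and come back unchanged after decoding, while the integer 3 would be kept as is.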

delAttrNode(where, attrname, name = "")

Delete the attribute attrname in where.name location.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a ValueError is raised.
attrname
The name of the attribute to delete on disk.
name
The name of the desired node. If name is a null string (''), or not supplied, this method assumes that where itself is the node.

copyAttrs(where, name="", dstNode=None)

Copy the attributes from node where.name to dstNode.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a LookupError is raised.
name
If name is a null string (""), or not supplied, this method assumes that where itself is the node.
dstNode
This is the destination node where the attributes will be copied. It can be either a path string or a Node object.

listNodes(where, classname='')

Returns a list with all the object nodes (Group or Leaf) hanging from where. The list is alpha-numerically sorted by node name.

where
The parent group. Can be a path string or Group instance.
classname
If a classname parameter is supplied, the list will contain only instances of this class (or subclasses of it). The only supported classes in classname are 'Group', 'Leaf', 'Table', 'Array', 'EArray', 'VLArray' and 'UnImplemented'. Note that these values are strings.

removeNode(where, name = "", recursive=0)

Removes the object node name under where location.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a LookupError is raised.
name
The name of the node to be removed. If not provided, the where node is removed.
recursive
If not supplied, the object will be removed only if it has no children. If supplied with a true value, the object and all its descendants will be completely removed.

renameNode(where, newname, name)

Rename the object node name under where location.

where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a LookupError is raised.
newname
The new name to be assigned to the node.
name
The name of the node to be changed. If not provided, the where node is changed.

walkGroups(where='/')

Iterator that returns the list of Groups (not Leaves) hanging from where. If where is not supplied, the root object is taken as origin. The returned Group list is in a top-bottom order, and alpha-numerically sorted when they are at the same level.

where
The origin group. Can be a path string or Group instance.

walkNodes(where="/", classname="")

Recursively iterate over the nodes in the File instance. It takes two parameters:

where
If supplied, the iteration starts from this group.
classname
(String) If supplied, only instances of this class are returned.

Example of use:

    # Recursively print all the nodes hanging from '/detector'
    print("Nodes hanging from group '/detector':")
    for node in h5file.walkNodes("/detector"):
        print(node)
	    

copyChildren(whereSrc, whereDst, recursive=0, filters=None, copyuserattrs=1, start=0, stop=None, step=1, overwrite = 0)

Copy (recursively) the children of a group into another location. Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.

whereSrc
The parent group where the children to be copied are hanging on. This parameter can be a path string (for example "/level1/group5"), or another Group instance.
whereDst
The parent group where the source children will be copied to. This group must exist or a LookupError will be issued. This parameter can be a path string (for example "/level1/group6"), or another Group instance.
recursive
Specifies whether the copy should recurse into subgroups or not. The default is not to recurse.
filters
Whether or not to override the original filter properties present in source nodes. This parameter must be an instance of the Filters class (see section 4.13.1). The default is to copy the filter attributes from source children.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to 0. The default is to copy them.
start, stop, step
Specifies the range of rows in child leaves to be copied; the default is to copy all the rows.
overwrite
Whether existing children hanging from whereDst that have the same names as whereSrc children should be overwritten or not.

copyFile(dstFilename=None, title=None, filters=None, copyuserattrs=1, overwrite=0)

Copy the contents of this file to dstFilename. If the file already exists, it won't be overwritten unless overwrite is set to true (see below). Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.

title
The title for the new file. If not specified, the source file title will be copied.
filters
Whether or not to override the original filter properties present in source nodes. This parameter must be an instance of the Filters class (see section 4.13.1). The default is to copy the filter attributes from source children.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to 0. The default is to copy them.
overwrite
Whether or not to overwrite a possibly existing dstFilename file. The default is not to overwrite it.

flush()

Flush all the leaves in the object tree.

close()

Flush all the leaves in object tree and close the file.

4.2.3 File special methods

The following methods automatically trigger actions when a File instance is accessed in a special way.

__iter__()

Iterate over the children on the File instance. However, this does not accept parameters. This iterator is recursive.

Example of use:

    # Recursively list all the nodes in the object tree
    h5file = tables.openFile("vlarray1.h5")
    print("All nodes in the object tree:")
    for node in h5file:
        print(node)
	    

__str__()

Prints a short description of the File object.

Example of use:

>>> f=tables.openFile("data/test.h5")
>>> print(f)
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
	    

__repr__()

Prints a detailed description of the File object.

4.3 The Group class

Instances of this class are a grouping structure containing instances of zero or more groups or leaves, together with supporting metadata.

Working with groups and leaves is similar in many ways to working with directories and files, respectively, in a Unix filesystem. As with Unix directories and files, objects in the object tree are often described by giving their full (or absolute) path names. This full path can be specified either as a string (like in '/group1/group2') or as a complete object path written in natural name schema (like in file.root.group1.group2), as discussed in section 1.2.

A side effect of the natural naming schema is that, when assigning a new attribute to a Group object, you must take care not to collide with existing child node names. For this reason, and to avoid polluting the children namespace, it is explicitly forbidden to assign "normal" attributes to Group instances; the only ones allowed must start with a reserved prefix, like "_f_" (for methods) or "_v_" (for instance variables). Any attempt to assign a new attribute that does not start with one of these prefixes will raise a NameError exception.
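The attribute restriction can be illustrated with a toy class whose __setattr__ rejects ordinary attributes. This is a sketch under assumptions, not the actual Group implementation: ToyGroup and ToyLeaf are hypothetical names, and the model assumes that child nodes may still be assigned by name (see 4.3.2) while any other value needs a reserved prefix.

```python
class ToyLeaf:
    """Hypothetical stand-in for a child leaf node."""


class ToyGroup:
    """Toy model of the attribute rule; not the PyTables Group class."""

    _reserved_prefixes = ("_f_", "_v_")

    def __setattr__(self, name, value):
        # Assumption: child nodes are allowed under any name,
        # everything else must use a reserved prefix.
        is_node = isinstance(value, (ToyGroup, ToyLeaf))
        if not is_node and not name.startswith(self._reserved_prefixes):
            raise NameError(
                "attribute %r must start with one of %r"
                % (name, self._reserved_prefixes))
        object.__setattr__(self, name, value)
```

With this model, setting _v_title on a ToyGroup succeeds and so does attaching a ToyLeaf under a plain name, whereas assigning a string to an attribute called color raises NameError.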

Another effect is that you cannot use reserved Python names or other names that are not valid in Python (for example "$a" or "44") as node names. You can, however, use the trMap (translation map dictionary) parameter of the openFile function (see section 4.1.2) in order to use non-valid Python names as node names in the file.
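To make the role of the translation map concrete, the sketch below shows which HDF5 names would need an entry and how a lookup with an identity fallback could work. The helper names and the entries in tr_map are made up for illustration; only the key/value direction (Python name to HDF5 name) comes from the trMap description.

```python
import keyword


def needs_translation(hdf5_name):
    """True if hdf5_name cannot be used as a Python attribute name."""
    return not hdf5_name.isidentifier() or keyword.iskeyword(hdf5_name)


def to_hdf5_name(python_name, tr_map):
    """Translate a Python-side name to its HDF5 name (identity if unmapped)."""
    return tr_map.get(python_name, python_name)


# Keys are Python names, values are HDF5 names, as described for trMap.
# The specific names below are hypothetical.
tr_map = {"pyclass": "class", "n44": "44", "dollar_a": "$a"}
```

In this model, a node stored as "class" in the file would be reached from Python as pyclass, and names without an entry map to themselves.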

4.3.1 Group instance variables

_v_title
A description for this group.
_v_name
The name of this group.
_v_hdf5name
The name of this group in HDF5 file namespace.
_v_pathname
A string representation of the group location in tree.
_v_parent
The parent Group instance.
_v_rootgroup
Pointer to the root group object.
_v_file
Pointer to the associated File object.
_v_depth
The depth level in tree for this group.
_v_nchildren
The number of children (groups or leaves) hanging from this instance.
_v_children
Dictionary with all nodes (groups or leaves) hanging from this instance.
_v_groups
Dictionary with all node groups hanging from this instance.
_v_leaves
Dictionary with all node leaves hanging from this instance.
_v_attrs
The associated AttributeSet instance (see 4.11).
_v_filters
Container for filter properties. See section 4.13.1 for more information on this object.

4.3.2 Group methods

This class defines the __setattr__, __getattr__ and __delattr__ methods, and they work as normally intended. So, you can access, assign or delete children of a group by just using the following constructs:
    # Add a Table child instance under group with name "tablename"
    group.tablename = Table(recordDict, "Record instance")
    table = group.tablename     # Get the table child instance
    del group.tablename         # Delete the table child instance
	    

Caveat: The following methods are documented for completeness, and they can be used without any problem. However, you should use the high-level counterpart methods in the File class, because these are most used in documentation and examples, and are a bit more powerful than those exposed here.

_f_join(name)

Helper method to correctly concatenate a name child object with the pathname of this group.
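One plausible reading of "correctly" here is that the root group must not produce a doubled slash. The sketch below assumes that interpretation; join_path is a hypothetical re-implementation operating on plain strings and is not taken from the library source.

```python
def join_path(group_pathname, name):
    """Join a group's pathname with a child name without doubling slashes.

    Hypothetical re-implementation for illustration only.
    """
    if group_pathname == "/":
        return "/" + name       # the root already ends with a slash
    return group_pathname + "/" + name
```

Under this assumption, joining "/" with "table" gives "/table", and joining "/group0" with "tuple1" gives "/group0/tuple1".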

_f_rename(newname)

Change the name of this group to newname.

_f_remove(recursive=0)

Remove this object. If recursive is true, force the removal even if this group has children.

_f_getAttr(attrname)

Gets the HDF5 attribute attrname of this group.

_f_setAttr(attrname, attrvalue)

Sets the attribute attrname of this group to the value attrvalue. Any scalar (string, ints or floats) attribute is supported natively. However, (c)Pickle is automatically used so as to serialize other kind of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save.

_f_delAttr(attrname)

Delete the attribute attrname of this group.

_f_listNodes(classname='')

Returns a list with all the object nodes hanging from this instance. The list is alpha-numerically sorted by node name. If a classname parameter is supplied, it will only return instances of this class (or subclasses of it). The supported classes in classname are 'Group', 'Leaf', 'Table', 'Array', 'EArray', 'VLArray' and 'UnImplemented'.

_f_walkGroups()

Iterate over the list of Groups (not Leaves) hanging from self. The returned Group list is in a top-bottom order, and alpha-numerically sorted when they are at the same level.

_f_walkNodes(classname="", recursive=0)

Iterate over the nodes in the Group instance. It takes two parameters:

classname
(String) If supplied, only instances of this class are returned.
recursive
(Integer) If false, only children hanging immediately after the group are returned. If true, a recursion over all the groups hanging from it is performed.

Example of use:

    # Recursively print all the arrays hanging from '/'
    print("Arrays in the object tree '/':")
    for array in h5file.root._f_walkNodes("Array", recursive=1):
        print(array)
	    

_f_close()

Close this group, making it and its children inaccessible in the object tree.

_f_copyChildren(where, recursive=0, filters=None, copyuserattrs=1, start=0, stop=None, step=1, overwrite=0)

Copy (recursively) the children of this group into another location specified by where (it can be a path string or a Group object). Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.

recursive
Specifies whether the copy should recurse into subgroups or not. The default is not to recurse.
filters
Whether or not to override the original filter properties present in source nodes. This parameter must be an instance of the Filters class (see section 4.13.1). The default is to copy the filter attributes from source children.
copyuserattrs
You can prevent the user attributes from being copied by setting this parameter to 0. The default is to copy them.
start, stop, step
Specifies the range of rows in child leaves to be copied; the default is to copy all the rows.
overwrite
Whether children of this group should overwrite existing children of where that have the same names.

4.3.3 Group special methods

The following methods automatically trigger actions when a Group instance is accessed in a special way.

__iter__()

Iterate over the children on the group instance. However, this does not accept parameters. This iterator is not recursive.

Example of use:

    # Non-recursively list all the nodes hanging from '/detector'
    print("Nodes in '/detector' group:")
    for node in h5file.root.detector:
        print(node)
	    

__str__()

Prints a short description of the Group object.

Example of use:

>>> f=tables.openFile("data/test.h5")
>>> print(f.root.group0)
/group0 (Group) 'First Group'
>>>
	    

__repr__()

Prints a detailed description of the Group object.

Example of use:

>>> f=tables.openFile("data/test.h5")
>>> f.root.group0
/group0 (Group) 'First Group'
  children := ['tuple1' (Table), 'group1' (Group)]
>>>
	    

4.4 The Leaf class

The goal of this class is to provide a place for the functionality common to all its descendants, as well as a way to help classify objects in the tree. A Leaf object is an end-node, that is, a node that can hang directly from a group object, but that is not a group itself and thus cannot have descendants. Right now, the set of end-nodes is composed of Table, Array, EArray, VLArray and UnImplemented class instances. In fact, all these classes inherit from the Leaf class.

4.4.1 Leaf instance variables

The public variables and methods that class descendants inherit from Leaf are listed below.

name
The Leaf node name in Python namespace.
hdf5name
The Leaf node name in HDF5 namespace.
objectID
The HDF5 object ID of the Leaf node.
title
The Leaf title (actually a property rather than a plain attribute).
shape
The shape of the associated data in the Leaf.
byteorder
The byteorder of the associated data of the Leaf.
attrs
The associated AttributeSet instance (see 4.11).
filters
Container for filter properties. See section 4.13.1 for more information on this object.

In addition, the following instance variables are also defined, and they have the same meaning as their counterparts in the Group class:

_v_hdf5name
The name of this leaf in HDF5 file namespace.
_v_pathname
A string representation of the leaf location in tree.
_v_parent
The parent Group instance.
_v_rootgroup
Pointer to the root Group object.
_v_file
Pointer to the associated File object.
_v_depth
The depth level in tree for this leaf.

4.4.2 Leaf methods

copy(where, name, title=None, filters=None, copyuserattrs=1, start=0, stop=None, step=1, overwrite=0)

Copy this leaf into another location. It returns a tuple (object, nbytes) where object is the newly created object and nbytes is the number of bytes copied. The meaning of the parameters is explained below:
where
Can be a path string or Group instance. If where doesn't exist or does not have a child called name, a LookupError is raised.
name
The name of the destination node.
title
The new title for destination. If None, the original title is copied.
filters
An instance of the Filters class (see section 4.13.1). A None value means that the source properties are copied as is.
copyuserattrs
Whether or not to copy the user attributes of the source leaf to the destination. The default is to copy them.
start, stop, step
Specifies the range of rows to be copied; the default is to copy all the rows.
overwrite
If the destination node name already exists, this specifies whether it should be overwritten or not. The default is not to overwrite it.

remove()

Remove this leaf.

rename(newname)

Change the name of this leaf to newname.

getAttr(attrname)

Gets the HDF5 attribute attrname of this leaf.

setAttr(attrname, attrvalue)

Sets the attribute attrname of this leaf to the value attrvalue.

delAttr(attrname)

Delete the HDF5 attribute attrname of this leaf.

flush()

Flush the leaf buffers (if any).

close()

Flush the leaf buffers (if any) and close the dataset.

4.5 The Table class

Instances of this class represent table objects in the object tree. It provides methods to read data from and write data to table objects in the file.

Data can be read from or written to tables by accessing a special object that hangs from Table. This object is an instance of the Row class (see 4.5.4). See the tutorial sections in chapter 3 on how to use the Row interface. The columns of the tables can also be easily accessed (more specifically, they can be read but not written) by making use of the Column class, through an extension of the natural naming schema applied inside the tables. See section 4.6 for some examples of use of this capability.

Note that this object inherits all the public attributes and methods that Leaf already has.

4.5.1 Table instance variables

description
The metaobject describing this table.
row
The Row instance for this table (see 4.5.4).
nrows
The number of rows in this table.
rowsize
The size, in bytes, of each row.
cols
A Cols (see section 4.5.5) instance that serves as accessor to Column (see section 4.6) objects.
colnames
The field names for the table (list).
coltypes
The data types for the table fields (dictionary).
colshapes
The shapes for the table fields (dictionary).
colindexed
Whether the table fields are indexed (dictionary).
indexed
Whether the table fields are indexed (dictionary).
indexprops
Properties of an indexed Table (see 4.13.2). This attribute (dictionary) exists only if the Table is indexed.

4.5.2 Table methods

append(rows=None)

Append a series of rows to this Table instance. rows is an object that can hold the rows to be appended in several formats, like a RecArray, a list of tuples, a list of Numeric/NumArray/CharArray objects, a string, a Python buffer or None (no append will result). Of course, this rows object has to be compliant with the underlying format of the Table instance or a ValueError will be issued.

Example of use:
from tables import *
class Particle(IsDescription):
    name        = StringCol(16, pos=1)   # 16-character String
    lati        = IntCol(pos=2)        # integer
    longi       = IntCol(pos=3)        # integer
    pressure    = Float32Col(pos=4)    # float  (single-precision)
    temperature = FloatCol(pos=5)      # double (double-precision)

fileh = openFile("test4.h5", mode = "w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle:     10", 10, 0, 10*10, 10**2),
              ("Particle:     11", 11, -1, 11*11, 11**2),
              ("Particle:     12", 12, -2, 12*12, 12**2)])
fileh.close()
	      

iterrows(start=None, stop=None, step=1)

Returns an iterator yielding Row (see section 4.5.4) instances built from rows in table. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.5.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
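The range rules above can be modelled in a few lines of plain Python 3. This is only an illustrative sketch, not the PyTables implementation: the helper name resolve_range is hypothetical, non-negative bounds are assumed, and the default used when only stop is given (start at row 0) is an assumption borrowed from range().

```python
def resolve_range(nrows, start=None, stop=None, step=1):
    """Return the row numbers selected by (start, stop, step).

    Illustrative model of the documented rules; hypothetical helper.
    """
    if step is None:
        step = 1
    if step < 1:
        # Negative (and zero) steps are rejected in this model.
        raise ValueError("step must be a positive integer")
    if start is None and stop is None:
        # Neither bound given: select every row.
        start, stop = 0, nrows
    elif stop is None:
        # Only start given: select that single row.
        stop = start + 1
    elif start is None:
        # Assumption: a missing start defaults to the first row.
        start = 0
    return list(range(start, min(stop, nrows), step))


all_rows = resolve_range(10)             # every row
single_row = resolve_range(10, start=3)  # just row 3
strided = resolve_range(10, 2, 8, 3)     # rows 2 and 5
```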

Example of use:

	      result = [ row['var2'] for row in table.iterrows(step=5)
                                     if row['var1'] <= 20 ]
	    

itersequence(sequence=None, sort=1)

Iterate over a sequence of row coordinates.

sequence
Can be any object that supports the __getitem__ special method, like lists, tuples, Numeric/numarray objects, etc.
sort
If true, sequence will be sorted so that the I/O process gets better performance. If your sequence is already sorted or you don't want to sort it, set this parameter to 0. The default is to sort the sequence.
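As a rough illustration of the sort parameter, the following plain Python 3 sketch visits a list of rows by coordinate. The function name iter_by_coordinates is hypothetical, and the sketch assumes that sorting simply means the rows are visited in ascending coordinate order; the real method works on data on disk.

```python
def iter_by_coordinates(rows, sequence, sort=True):
    """Yield (coordinate, row) pairs for the requested coordinates.

    Hypothetical stand-in: 'rows' is an in-memory list here.
    """
    # Ascending coordinates give a sequential access pattern.
    coords = sorted(sequence) if sort else sequence
    for coord in coords:
        yield coord, rows[coord]


rows = ["r0", "r1", "r2", "r3"]
sorted_visit = list(iter_by_coordinates(rows, [3, 0, 2]))
unsorted_visit = list(iter_by_coordinates(rows, [3, 0, 2], sort=False))
```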

read(start=None, stop=None, step=1, field=None, flavor="numarray")

Returns the actual data in Table. If field is not supplied, it returns the data as a RecArray object table.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

The rest of the parameters are described next:

field
If specified, only the column field is returned as a NumArray object. If this is not supplied, all the fields are selected and a RecArray is returned.
flavor
When a field in the table is selected, passing a flavor parameter causes an additional conversion to be applied to the default "numarray" returned object. flavor must have one of the following values: "numarray" (i.e. no conversion is made), "Numeric", "Tuple" or "List".

modifyRows(start=None, stop=None, step=1, rows=None)

Modify a series of rows in the [start:stop:step] extended slice range. If you pass None to stop, all the rows existing in rows will be used.

rows can be either a RecArray object or a structure that is able to be converted to a RecArray and compliant with the table format.

Returns the number of modified rows.

It raises a ValueError if the rows parameter cannot be converted to an object compliant with the table description.

It raises an IndexError if the modification would exceed the length of the table.
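The way the target range is derived when stop is None can be sketched with an ordinary Python 3 list standing in for the table. This is an illustrative model under stated assumptions, not the PyTables code: modify_rows is a hypothetical function, and the sketch assumes one target position per supplied row.

```python
def modify_rows(table, start=0, stop=None, step=1, rows=()):
    """Overwrite table[start:stop:step] with rows; return the number modified.

    Hypothetical list-based model of the documented behaviour.
    """
    rows = list(rows)
    if stop is None:
        # Use every supplied row: one target position per row.
        stop = start + len(rows) * step
    targets = range(start, stop, step)
    if targets and targets[-1] >= len(table):
        raise IndexError("modification exceeds the length of the table")
    for index, row in zip(targets, rows):
        table[index] = row
    return min(len(targets), len(rows))


table = [(i, "x") for i in range(5)]
modified = modify_rows(table, start=1, step=2, rows=[(10, "a"), (30, "b")])
```

Here rows 1 and 3 are replaced, and a call such as modify_rows(table, start=4, rows=[(1, "a"), (2, "b")]) raises IndexError because the second row would fall past the end of the table.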

modifyColumns(start=None, stop=None, step=1, columns=None, names=None)

Modify a series of rows in the [start:stop:step] extended slice row range. If you pass None to stop, all the rows existing in columns will be used.

columns can be either a RecArray or a list of arrays (the columns) that is able to be converted to a RecArray compliant with the specified column names subset of the table format.

names specifies the column names of the table to be modified.

Returns the number of modified rows.

It raises a ValueError if the columns parameter cannot be converted to an object compliant with the table description.

It raises an IndexError if the modification would exceed the length of the table.

removeRows(start=None, stop=None)

Removes a range of rows in the table. If only start is supplied, this row is to be deleted. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed. A step parameter is not supported, and it is not foreseen to implement it anytime soon.

start
Sets the starting row to be removed. It accepts negative values meaning that the count starts from the end. A value of 0 means the first row.
stop
Sets the last row to be removed to stop - 1, i.e. the end point is omitted (in the Python range tradition). Like start, it accepts negative values. A special value of None (the default) means removing just the row supplied in start.
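The start and stop conventions described above map directly onto Python list slicing, as this small Python 3 sketch shows. It is a model only: remove_rows is a hypothetical function operating on an in-memory list, and returning the number of removed rows is a convenience of the sketch, not something this section states about the real method.

```python
def remove_rows(rows, start, stop=None):
    """Delete rows[start:stop] in place; return how many rows were removed."""
    nrows = len(rows)
    if start < 0:
        # Negative values count from the end.
        start += nrows
    if stop is None:
        # Only start given: remove that single row.
        stop = start + 1
    elif stop < 0:
        stop += nrows
    removed = len(rows[start:stop])
    del rows[start:stop]
    return removed


rows = list(range(10))
remove_rows(rows, 2)       # removes the row at position 2
remove_rows(rows, -1)      # removes the last row
remove_rows(rows, 1, 4)    # removes positions 1, 2 and 3 (end point omitted)
```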

removeIndex(index=None)

Remove the index associated with the specified column. Only Index instances (see 4.13.3) are accepted as parameter. This index can be recreated again by calling the createIndex (see 4.6.2) method of the appropriate Column object.

flushRowsToIndex()

Add remaining rows in buffers to non-dirty indexes. This can be useful when you have chosen non-automatic indexing for the table (see section 4.13.2) and want to update the indexes on it.

reIndex()

Recompute all the existing indexes in table. This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.

reIndexDirty()

Recompute the existing indexes in the table, but only if they are dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.13.2) for the table and want to update the indexes after an operation that invalidates them (Table.removeRows, for example).

where(condition, start=None, stop=None, step=None)

Returns an iterator yielding Row (see section 4.5.4) instances built from rows in the table that satisfy a condition over a column. If the column to which the condition is applied is indexed, this index will be used in order to accelerate the search. Otherwise, the in-kernel iterator (with better performance than the regular iterator) will be chosen instead.

Moreover, if a range is supplied (i.e. some of the start, stop or step parameters are passed), only the rows in that range that fulfill the condition are returned. Otherwise, all the rows that fulfill the condition are returned.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows that fulfill the condition are selected.

You can mix this method with regular selections in order to have complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.

Example of use:

              passvalues=[]
              for row in table.where(0 < table.cols.col1 < 0.3, step=5):
                  if row['var1'] <= 20:
                      passvalues.append(row['var2'])
              print "Values that pass the cuts:", passvalues
	    

See also the whereIndexed and whereInRange methods below for more specific ways to call this iterator.
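The advice to pass the most restrictive condition to where() can be illustrated with a plain Python 3 model of a two-stage selection. Everything here is hypothetical (the function two_stage_select, the sample rows and both conditions); the sketch only counts how many rows reach the second, Python-level test, and says nothing about actual PyTables timings.

```python
def two_stage_select(rows, first_condition, second_condition):
    """Apply first_condition, then second_condition, counting second-stage tests."""
    hits = []
    second_stage_tests = 0
    for row in rows:
        if first_condition(row):
            second_stage_tests += 1
            if second_condition(row):
                hits.append(row)
    return hits, second_stage_tests


rows = [{"var1": i, "var2": i % 7} for i in range(100)]
restrictive = lambda row: row["var1"] < 5    # passes 5 rows
loose = lambda row: row["var2"] != 0         # passes 85 rows

hits_a, tests_a = two_stage_select(rows, restrictive, loose)
hits_b, tests_b = two_stage_select(rows, loose, restrictive)
```

Both orderings select the same rows, but putting the restrictive condition first leaves far fewer rows for the second stage to examine.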

whereIndexed(condition, start=None, stop=None, step=None)

Iterator that selects values fulfilling the condition parameter. This only works for conditions over an indexed column. If you try to use it over a non-indexed column, an AssertionError will be raised.

The meaning of the condition, start, stop and step parameters is the same as in the where method (see 4.5.2) described above.

whereInRange(condition, start=None, stop=None, step=None)

Iterator that selects values fulfilling the condition parameter. This method will use the in-kernel search method, i.e. it won't take advantage of a possible indexed column.

The meaning of the condition, start, stop and step parameters is the same as in the where method (see 4.5.2) described above.

getWhereList(condition, flavor="List")

Get the row coordinates that fulfill the condition parameter. This method will take advantage of an indexed column to speed up the search.

flavor is the desired type of the returned list. It can take the 'List' (the default), 'Tuple' or 'NumArray' values.

4.5.3 Table special methods

The following methods automatically trigger actions when a Table instance is accessed in a special way (e.g., table["var2"] is equivalent to a call to table.__getitem__("var2")).

__iter__()

It returns the same iterator as Table.iterrows(0,0,1). However, this does not accept parameters.

Example of use:

	      result = [ row['var2'] for row in table 
                                     if row['var1'] <= 20 ]
	    

Which is equivalent to:

	      result = [ row['var2'] for row in table.iterrows() 
                                     if row['var1'] <= 20 ]
	    

__getitem__(key)

It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding table row is returned as a RecArray.Record object.
key is a Slice
The row slice determined by key is returned as a RecArray object.
key is a String
The key is interpreted as a column name of the table, and, if it exists, it is read and returned as a NumArray or CharArray object (whatever is appropriate).

Example of use:

	      record = table[4]
	      recarray = table[4:1000:2]
	      narray = table["var2"]
	    

Which is equivalent to:

	      record = table.read(start=4)[0]
	      recarray = table.read(start=4, stop=1000, step=2)
	      narray = table.read(field="var2")
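The dispatch on the type of key can be sketched with a small in-memory class in plain Python 3. MiniTable is a hypothetical stand-in, not a PyTables class: rows are plain tuples and a column is returned as a list, whereas the real Table returns RecArray, Record, NumArray or CharArray objects as described above.

```python
class MiniTable:
    """Hypothetical in-memory table used only to show the key dispatch."""

    def __init__(self, colnames, rows):
        self.colnames = list(colnames)
        self.rows = [tuple(row) for row in rows]

    def __getitem__(self, key):
        if isinstance(key, int):
            # Integer key: a single row.
            return self.rows[key]
        if isinstance(key, slice):
            # Slice key: a range of rows.
            return self.rows[key]
        if isinstance(key, str):
            # String key: a whole column, looked up by name.
            position = self.colnames.index(key)
            return [row[position] for row in self.rows]
        raise TypeError("unsupported key type: %r" % type(key))


table = MiniTable(["var1", "var2"], [(i, i * i) for i in range(6)])
record = table[4]
some_rows = table[1:6:2]
column = table["var2"]
```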
	    

__setitem__(key, value)

It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding table row is set to value. value must be a List or Tuple capable of being converted to the table field format.
key is a Slice
The row slice determined by key is set to value. value must be a RecArray object or a list of rows capable of being converted to the table field format.

Example of use:

	      # Modify just one existing row
	      table[2] = [456,'db2',1.2]
	      # Modify two existing rows
	      rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
	                                    formats="i4,a3,f8")
	      table[1:3:2] = rows
	    

Which is equivalent to:

	      table.modifyRows(start=2, rows=[456,'db2',1.2])
	      rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
	                                    formats="i4,a3,f8")
	      table.modifyRows(start=1, step=2, rows=rows)
	    

4.5.4 The Row class

This class is used to fetch and set values on the table fields. It works very much like a dictionary, where the keys are the field names of the associated table and the values are the values of those fields in a specific row.

This object is actually an extension type, so you won't be able to access its documentation interactively. Nor will you be able to access its internal attributes (they are not directly accessible from Python), although accessors (i.e. methods that return an internal attribute) have been defined for some important variables.

Row methods

append()
Once you have filled the proper fields for the current row, calling this method commits the data to disk (strictly speaking, the data are written to the output buffer).
nrow()
Accessor that returns the current row number in the table. It is useful to know which row is being dealt with in the middle of a loop.
getTable()
Accessor that returns the associated Table object.

4.5.5 The Cols class

This class is used as an accessor to the table columns following the natural naming convention: it exposes one attribute per column, named after the column, that refers to the associated Column instance. In addition, like the Row class, it works similarly to a dictionary, where the keys are the column names of the associated table and the values are Column instances. See section 4.6 for examples of use.
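The combination of attribute-style and dictionary-style access can be imitated in plain Python 3 as follows. This is a minimal sketch, assuming nothing about how Cols is really implemented; MiniCols is a hypothetical class and plain lists stand in for Column instances.

```python
class MiniCols:
    """Hypothetical accessor: one attribute and one key per column."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, name):
        # Dictionary-style access by column name.
        return self._columns[name]


cols = MiniCols({"name": ["a", "b"], "lati": [1, 2]})
by_attribute = cols.lati
by_key = cols["lati"]
```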

4.6 The Column class

Each instance of this class is associated with one column of a table. These instances are mainly used to fetch and set actual data from the table columns, but there are a few other associated methods to deal with indexes.

4.6.1 Column instance variables

table
The parent Table instance.
name
The name of the associated column.
type
The data type of the column.
index
The Index object (see 4.13.3) associated with this column (None if it does not exist).
dirty
Whether the index is dirty or not (property).

4.6.2 Column methods

createIndex()

Create an Index (see 4.13.3) object for this column.

reIndex()

Recompute the index associated with this column. This can be useful when you suspect that, for any reason, the index information is no longer valid and want to rebuild it.

reIndexDirty()

Recompute the existing index only if it is dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.13.2) for the table and want to update the column's index after an operation that invalidates it (Table.removeRows, for example).

removeIndex()

Delete the associated column's index. After doing that, you will lose the indexation information on disk. However, you can always re-create it using the createIndex() method (see 4.6.2).

closeIndex()

Close the index of this column. After that, the column will look as if it has no index, although the index will reappear when the file is reopened later on.

4.6.3 Column special methods

__getitem__(key)

Returns a column element or slice. It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding element in the column is returned as a scalar object or as a NumArray/CharArray object, depending on its shape.
key is a Slice
The row range determined by this slice is returned as a NumArray or CharArray object (whichever is appropriate).
Example of use:
print "Column handlers:"
for name in table.colnames:
    print table.cols[name]
print
print "Some selections:"
print "Select table.cols.name[1]-->", table.cols.name[1]
print "Select table.cols.name[1:2]-->", table.cols.name[1:2]
print "Select table.cols.lati[1:3]-->", table.cols.lati[1:3]
print "Select table.cols.pressure[:]-->", table.cols.pressure[:]
print "Select table.cols['temperature'][:]-->", table.cols['temperature'][:]
	      
and the output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(1,), CharType)
/table.cols.lati (Column(2,), Int32)
/table.cols.longi (Column(1,), Int32)
/table.cols.pressure (Column(1,), Float32)
/table.cols.temperature (Column(1,), Float64)

Some selections:
Select table.cols.name[1]--> Particle:     11
Select table.cols.name[1:2]--> ['Particle:     11']
Select table.cols.lati[1:3]--> [[11 12]
 [12 13]]
Select table.cols.pressure[:]--> [  90.  110.  132.]
Select table.cols['temperature'][:]--> [ 100.  121.  144.]
	      
See the examples/table2.py for a more complete example.

__setitem__(key, value)

It takes different actions depending on the type of the key parameter:

key is an Integer
The corresponding element in the column is set to value. value must be a scalar or NumArray/CharArray, depending on column's shape.
key is a Slice
The row slice determined by key is set to value. value must be a list of elements or a NumArray/CharArray.

Example of use:

	      # Modify row 1
	      table.cols.col1[1] = -1
	      # Modify rows 1 and 3
	      table.cols.col1[1::2] = [2,3]
	    

Which is equivalent to:

	      # Modify row 1
	      table.modifyColumns(start=1, columns=[[-1]], names=["col1"])
	      # Modify rows 1 and 3
	      columns = numarray.records.fromarrays([[2,3]], formats="i4")
	      table.modifyColumns(start=1, step=2, columns=columns, names=["col1"])
	    

4.7 The Array class

Represents an array on file. It provides methods to write/read data to/from array objects in the file. This class does not allow you to enlarge the datasets on disk; see the EArray descendant in section 4.8 if you want enlargeable dataset support and/or compression features.

The array data types supported are the same as the set provided by Numeric and numarray. For details of these data types see appendix A, or the numarray reference manual.

Note that this object inherits all the public attributes and methods that Leaf already provides.

4.7.1 Array instance variables

flavor
The object representation for this array. It can be any of "NumArray", "CharArray", "Numeric", "List", "Tuple", "String", "Int" or "Float" values.
nrows
The length of the first dimension of Array.
nrow
On iterators, this is the index of the current row.
type
The type class of the represented array.
itemsize
The size of the base items. Especially useful for CharArray objects.

4.7.2 Array methods

Note that, as this object has no internal I/O buffers, it is not necessary to use the flush() method inherited from Leaf in order to save its internal state to disk. When a writing method call returns, all the data is already on disk.

iterrows(start=None, stop=None, step=1)

Returns an iterator yielding numarray instances built from rows in the array. The returned rows are taken from the first dimension in the case of an Array instance and from the enlargeable dimension in the case of an EArray instance. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Otherwise, all the rows are returned. See also the __iter__() special method in section 4.7.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

Example of use:

	      result = [ row for row in arrayInstance.iterrows(step=4) ]
	    

read(start=None, stop=None, step=1)

Read the array from disk and return it as a numarray (default) object, or as an object with the same flavor it was originally saved with. It accepts start, stop and step parameters to select rows (the first dimension in the case of an Array instance and the enlargeable dimension in the case of an EArray) for reading.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

4.7.3 Array special methods

The following methods automatically trigger actions when an Array instance is accessed in a special way (e.g., array[2:3,...,::2] is equivalent to a call to array.__getitem__((slice(2, 3, None), Ellipsis, slice(None, None, 2)))).
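To see exactly what Python hands to __getitem__ for an extended slice, a tiny recorder class is enough. This is standard Python 3 behaviour rather than anything specific to PyTables; KeyRecorder is a hypothetical helper that simply returns the key it receives.

```python
class KeyRecorder:
    """Return whatever key the indexing expression produces."""

    def __getitem__(self, key):
        return key


recorder = KeyRecorder()
# The comma-separated items are packed into a single tuple argument.
key = recorder[2:3, ..., ::2]
single = recorder[4]
```

The multi-item key arrives as one tuple of slice objects and Ellipsis, while a single index such as recorder[4] arrives unwrapped.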

__iter__()

It returns the same iterator as Array.iterrows(0,0,1). However, this does not accept parameters.

Example of use:

	      result = [ row[2] for row in array ]

	    

Which is equivalent to:

	      result = [ row[2] for row in array.iterrows(0, 0, 1) ]
	    

__getitem__(key)

It returns a numarray (default) object (or an object with the same flavor it was originally saved with) containing the slice of rows stated in the key parameter. The set of allowed tokens in key is the same as for extended slicing in Python (the Ellipsis token included).

Example of use:

	      array1 = array[4]   # array1.shape == array.shape[1:]
	      array2 = array[4:1000:2]  # len(array2.shape) == len(array.shape)
	      array3 = array[::2, 1:4, :]
	      array4 = array[1, ..., ::2, 1:4, 4:] # General slice selection
	    

__setitem__(key, value)

Sets an Array element, row or extended slice. It takes different actions depending on the type of the key parameter:

key is an integer:
The corresponding row is assigned to value. If needed, this value is broadcasted to fit the specified row.
key is a slice:
The row slice determined by it is assigned to value. If needed, this value is broadcast to fit in the desired range. If the slice to be updated exceeds the actual shape of the array, only the values in the existing range are updated, i.e. the index error will be silently ignored. If value is a multidimensional object, then its shape must be compatible with the slice specified in key; otherwise, a ValueError will be raised.

Example of use:

	      a1[0] = 333       # Assign an integer to an Integer Array row
	      a2[0] = "b"       # Assign a string to a string Array row
	      a3[1:4] = 5       # Broadcast 5 to slice 1:4
	      a4[1:4:2] = "xXx" # Broadcast "xXx" to slice 1:4:2
	      # General slice update (a5.shape = (4,3,2,8,5,10))
	      a5[1, ..., ::2, 1:4, 4:] = arange(1728, shape=(4,3,2,4,3,6))
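The silent clipping of a slice that runs past the end of the array, together with scalar broadcasting, can be modelled on a one-dimensional Python 3 list. This is only a sketch: set_rows is a hypothetical function, and it relies on the standard slice.indices() method to clip the slice to the existing range.

```python
def set_rows(rows, key, value):
    """Broadcast a scalar value over rows[key], ignoring out-of-range positions."""
    # slice.indices() clips start and stop to the current length.
    for index in range(*key.indices(len(rows))):
        rows[index] = value


rows = [0] * 5
set_rows(rows, slice(3, 10), 7)       # positions 5..9 do not exist and are skipped
set_rows(rows, slice(0, 3, 2), -1)    # strided update of positions 0 and 2
```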
	    

4.8 The EArray class

This is a child of the Array class (see 4.7) and as such, EArray represents an array on the file. The difference is that EArray allows you to enlarge datasets along any single dimension you select. Another important difference is that it also supports compression.

So, in addition to the attributes and methods that EArray inherits from Array, it supports a few more that provide a way to enlarge the arrays on disk. The new variables and methods are described below, as well as some that already exist in Array but differ somewhat in meaning and/or functionality in the EArray context.

4.8.1 EArray instance variables

atom
The class instance chosen for the atom object (see section 4.12.3).
extdim
The enlargeable dimension.
nrows
The length of the enlargeable dimension.

4.8.2 EArray methods

append(object)

Appends an object to the underlying dataset. Obviously, this object has to have the same type as the EArray instance; otherwise a TypeError is issued. In the same way, the dimensions of the object have to conform to those of EArray, that is, all the dimensions have to be the same except, of course, that of the enlargeable dimension which can be of any length (even 0!).

Example of use (code available in examples/earray1.py):

import tables
from numarray import strings

fileh = tables.openFile("earray1.h5", mode = "w")
a = tables.StringAtom(shape=(0,), length=8)
# Use 'a' as the object type for the enlargeable array
array_c = fileh.createEArray(fileh.root, 'array_c', a, "Chars")
array_c.append(strings.array(['a'*2, 'b'*4], itemsize=8))
array_c.append(strings.array(['a'*6, 'b'*8, 'c'*10], itemsize=8))

# Read the string EArray we have created on disk
for s in array_c:
    print "array_c[%s] => '%s'" % (array_c.nrow, s)
# Close the file
fileh.close()
	    

and the output is:

	      array_c[0] => 'aa'
	      array_c[1] => 'bbbb'
	      array_c[2] => 'aaaaaa'
	      array_c[3] => 'bbbbbbbb'
	      array_c[4] => 'cccccccc'
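The shape rule for append() stated at the start of this section (every dimension must match except the enlargeable one) can be written as a short Python 3 check. The function can_append is hypothetical and works on shape tuples only; it does not model the type check that the real method also performs.

```python
def can_append(array_shape, extdim, object_shape):
    """Return True if object_shape conforms to array_shape outside extdim."""
    if len(array_shape) != len(object_shape):
        return False
    return all(
        array_len == object_len
        for dim, (array_len, object_len) in enumerate(zip(array_shape, object_shape))
        if dim != extdim  # the enlargeable dimension may have any length
    )


ok = can_append((0, 3, 4), 0, (5, 3, 4))         # longer along extdim only
empty_ok = can_append((0, 3, 4), 0, (0, 3, 4))   # zero rows also conform
mismatch = can_append((0, 3, 4), 0, (5, 3, 5))   # last dimension differs
```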
	    

4.9 The VLArray class

Instances of this class represent array objects in the object tree with the property that their rows can have a variable number of (homogeneous) elements (called atomic objects, or just atoms). Variable length arrays (or VLAs for short), like Table instances, can have only one dimension, and, as in a Table, the compound elements (the atoms) of the rows of VLArrays can be fully multidimensional objects.

VLArray provides methods to read/write data from/to variable length array objects residing on disk. Also, note that this object inherits all the public attributes and methods that Leaf already has.

4.9.1 VLArray instance variables

atom
The class instance chosen for the atom object (see section 4.12.3).
nrow
On iterators, this is the index of the current row.
nrows
The total number of rows.

4.9.2 VLArray methods

append(object1, object2, ...)

Append the objects passed as parameters to a single row in the VLArray instance. The type of the objects has to be compliant with the VLArray.atom instance type.

Example of use (code available in examples/vlarray1.py):

	      import tables
	      from Numeric import *   # or, from numarray import *

	      # Create a VLArray:
	      fileh = tables.openFile("vlarray1.h5", mode = "w")
	      vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
	                                    tables.Int32Atom(flavor="Numeric"),
	                                    "ragged array of ints",
	                                    tables.Filters(complevel=1))
	      # Append some (variable length) rows
	      # All these different parameter specification are accepted:
	      vlarray.append(array([5, 6]))
	      vlarray.append(array([5, 6, 7]))
	      vlarray.append([5, 6, 9, 8])
	      vlarray.append(5, 6, 9, 10, 12)

	      # Now, read it through an iterator
	      for x in vlarray:
	          print vlarray.name+"["+str(vlarray.nrow)+"]-->", x

	      # Close the file
	      fileh.close()
	    

And the output for this looks like:

	      vlarray1[0]--> [5 6]
	      vlarray1[1]--> [5 6 7]
	      vlarray1[2]--> [5 6 9 8]
	      vlarray1[3]--> [ 5  6  9 10 12]
	    

iterrows(start=None, stop=None, step=1)

Returns an iterator yielding one row per iteration. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Otherwise, all the rows are returned. See also the __iter__() special method in section 4.9.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

Example of use:

	      for row in vlarray.iterrows(step=4):
	          print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
	    

read(start=None, stop=None, step=1)

Returns the actual data in VLArray. As the lengths of the different rows are variable, the returned value is a python list, with as many entries as specified rows in the range parameters.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

4.9.3 VLArray special methods

The following methods automatically trigger actions when a VLArray instance is accessed in a special way (e.g., vlarray[2:5] is equivalent to a call to vlarray.__getitem__(slice(2,5,None))).

__iter__()

It returns the same iterator as VLArray.iterrows(0,0,1). However, this does not accept parameters.

Example of use:

	      result = [ row for row in vlarray ]
	    

Which is equivalent to:

	      result = [ row for row in vlarray.iterrows() ]
	    

__getitem__(key)

It returns the slice of rows determined by key, which can be an integer index or an extended slice. The returned value is a list of objects of type array.atom.type.

Example of use:

	      list1 = vlarray[4]
	      list2 = vlarray[4:1000:2]
	    

__setitem__(keys, value)

Updates a vlarray row described by keys by setting it to value. Depending on the value of keys, the action taken is different:

keys is an integer:
It refers to the number of the row to be modified. The value object must be type and shape compatible with the object that exists in the vlarray row.
keys is a tuple:
The first element refers to the row to be modified, and the second element to the range (so, it can be an integer or a slice) of the row that will be updated. As above, the value object must be type and shape compatible with the object specified in the vlarray row and range.

Note: When updating VLStrings (UTF-8 encoded) or Object atoms, there is a limitation: a value can only be replaced by one that occupies exactly the same number of bytes as the original row. With UTF-8 encoding this is problematic because, for instance, 'c' takes 1 byte, but 'ç' takes two. The same applies to Object atoms, because cPickle does not guarantee that two instances, even of the same class, serialize to the same number of bytes. These facts effectively limit the number of objects that can be updated in VLArrays.

Example of use:

	      vlarray[0] = vlarray[0]*2+3
	      vlarray[99,3:] = arange(96)*2+3
	      # Negative values for start and stop (but not step) are supported
	      vlarray[99,-99:-89:2] = vlarray[5]*2+3 
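The byte-size limitation mentioned in the note above for VLStrings and Object atoms can be checked with standard Python 3 tools. This sketch does not touch PyTables: same_encoded_size is a hypothetical helper, and the pickle module stands in for cPickle.

```python
import pickle


def same_encoded_size(old, new, encoding="utf-8"):
    """Return True if both strings occupy the same number of encoded bytes."""
    return len(old.encode(encoding)) == len(new.encode(encoding))


plain_c = "c"
c_cedilla = "\u00e7"  # the character 'ç'

size_plain = len(plain_c.encode("utf-8"))
size_cedilla = len(c_cedilla.encode("utf-8"))
replaceable = same_encoded_size(plain_c, c_cedilla)

# Two objects of the same type can still pickle to different lengths.
small = pickle.dumps({"x": 1})
large = pickle.dumps({"x": 10 ** 20})
```

Both strings are one character long, yet they differ in encoded size, so one could not replace the other in place under the rule stated in the note.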
	    

4.10 The UnImplemented class

Instances of this class represent an unimplemented dataset in a generic HDF5 file. When reading such a file (i.e. one that has not been created with PyTables, but with some other HDF5 library based tool), chances are that the specific combination of datatypes and/or dataspaces in some dataset might not be supported by PyTables yet. In such a case, this dataset will be mapped into the UnImplemented class, so the user will still be able to build the complete object tree of this generic HDF5 file, and to access (both read and write) the attributes of this dataset and some metadata. Of course, the user won't be able to read the actual data in it.

This is an elegant way to allow users to work with generic HDF5 files despite the fact that some of their datasets are not supported by PyTables. However, if you are really interested in having access to an unimplemented dataset, please get in contact with the developer team.

This class does not have any public instance variables, except those inherited from the Leaf class (see 4.4).

4.11 The AttributeSet class

Represents the set of attributes of a node (Leaf or Group). It provides methods to create new attributes, open, rename or delete existing ones.

As in Group instances, AttributeSet instances make use of the natural naming convention, i.e. you can access the attributes on disk as if they were normal AttributeSet attributes. This offers the user a very convenient way to access (but also to set and delete) node attributes by simply specifying them like normal class attributes.

Caveat: All Python data types are supported. The scalar ones (i.e. String, Int and Float) are mapped directly to the HDF5 counterparts, so you can correctly visualize them with any HDF5 tool. However, the rest of the data types and more general objects are serialized using cPickle, so you will be able to correctly retrieve them only from a Python-aware HDF5 library. Hopefully, the list of supported native attributes will be extended to fully multidimensional arrays sometime in the future.
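The split between natively stored scalars and pickled objects can be pictured with the following Python 3 sketch. It is an illustration under stated assumptions, not the AttributeSet implementation: both function names are hypothetical, a tagged tuple stands in for the on-disk representation, and the pickle module stands in for cPickle.

```python
import pickle


def serialize_attribute(value):
    """Keep scalars as they are; pickle everything else into bytes."""
    if isinstance(value, (str, int, float)):
        return ("native", value)
    return ("pickled", pickle.dumps(value))


def deserialize_attribute(stored):
    """Recover the original value from the tagged representation."""
    kind, payload = stored
    return payload if kind == "native" else pickle.loads(payload)


stored_scalar = serialize_attribute("str attr")
stored_object = serialize_attribute([3, (1, 2)])
restored = deserialize_attribute(stored_object)
```

The pickled payload is an opaque byte string to any tool that does not understand Python pickles, which is why only the scalar attributes are readable from generic HDF5 tools.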

4.11.1 AttributeSet instance variables

_v_node
The parent node instance.
_v_attrnames
List with all attribute names.
_v_attrnamessys
List with system attribute names.
_v_attrnamesuser
List with user attribute names.

4.11.2 AttributeSet methods

Note that this class defines the __setattr__, __getattr__ and __delattr__ special methods, and they work as normally intended. Any scalar (string, int or float) attribute is supported natively as an attribute. However, (c)Pickle is automatically used to serialize other kinds of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save.

With these special methods, you can access, assign or delete attributes on disk by just using the next constructs:
	      leaf.attrs.myattr = "str attr"  # Set a string (native support)
	      leaf.attrs.myattr2 = 3          # Set an integer (native support)
	      leaf.attrs.myattr3 = [3,(1,2)]  # A generic object (Pickled)
	      attrib = leaf.attrs.myattr      # Get the attribute myattr
	      del leaf.attrs.myattr           # Delete the attribute myattr
	    
_f_copy(where)
Copy the user attributes to where object. where has to be a Group or Leaf instance.
_f_list(attrset = "user")
Return the list of attributes of the parent node. attrset selects the attribute set to be returned. A "user" value returns only the user attributes, and this is the default. "sys" returns only the system attributes (some of which are read-only). "readonly" returns the system read-only attributes. "all" returns both the system and user attributes.
_f_rename(oldattrname, newattrname)
Rename an attribute.
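
The behaviour described in this section can be imitated in plain Python. The following is only a minimal sketch, not the PyTables implementation: the class name AttributeSetSketch is hypothetical, a dictionary stands in for the HDF5 file, and the modern pickle module replaces cPickle. It shows how __setattr__, __getattr__ and __delattr__ can store scalars as they are while pickling every other object.

```python
import pickle


class AttributeSetSketch:
    """Toy stand-in for an attribute set; a dict plays the role of the file."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the backing store.
        object.__setattr__(self, "_store", {})

    def __setattr__(self, name, value):
        if isinstance(value, (str, int, float)):
            # Scalars are kept as they are ("native support").
            self._store[name] = ("native", value)
        else:
            # Anything else is serialized first.
            self._store[name] = ("pickled", pickle.dumps(value))

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for stored attributes.
        try:
            kind, payload = self._store[name]
        except KeyError:
            raise AttributeError(name)
        return payload if kind == "native" else pickle.loads(payload)

    def __delattr__(self, name):
        try:
            del self._store[name]
        except KeyError:
            raise AttributeError(name)


attrs = AttributeSetSketch()
attrs.myattr = "str attr"      # stored natively
attrs.myattr2 = 3              # stored natively
attrs.myattr3 = [3, (1, 2)]    # pickled
value = attrs.myattr3          # unpickled on access
del attrs.myattr               # removed from the store
```

A reader that does not know about pickling would only see the raw payload of myattr3, which is the reason for the caveat about Python-aware HDF5 libraries given above.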

4.12 Declarative classes

This section describes a series of classes that are meant to declare the datatypes required by primary PyTables objects (like Table or VLArray).

4.12.1 The IsDescription class

This class is in fact a so-called metaclass object. There is nothing special about this fact, except that the attributes of its subclasses are transformed during their instantiation phase, and new methods for instances are defined based on the values of the class attributes.

It is designed to be used as an easy, yet meaningful way to describe the properties of Table objects through the use of classes that inherit properties from it. In order to define such a special class, you have to declare it as a descendant of IsDescription, with as many attributes as columns you want in your table. The names of these attributes will become the names of the columns, while their values are the properties of the columns, obtained through the use of the Col class constructor (see 4.12.2).

Then, you can pass an instance of this object to the Table constructor, where all the information it contains will be used to define the table structure. See section 3.3 for an example of how that works.

Moreover, you can change the properties of the index creation process by creating an instance of the IndexProps class (see 4.13.2) and assigning it to a special attribute called _v_indexprops.
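
To make the metaclass idea more concrete, here is a small, self-contained sketch in plain Python. It is not the PyTables code: ColSketch, DescriptionMeta, DescriptionSketch and Particle are hypothetical names, and the only transformation performed is collecting the declared columns and ordering their names alphanumerically (the default ordering mentioned for the pos parameter of Col).

```python
class ColSketch:
    """Hypothetical column declaration that only records a type name."""

    def __init__(self, dtype="Float64"):
        self.dtype = dtype


class DescriptionMeta(type):
    """Collect the ColSketch class attributes when a subclass is created."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        columns = {key: value for key, value in namespace.items()
                   if isinstance(value, ColSketch)}
        cls.columns = columns
        # Attribute names become column names, in alphanumerical order.
        cls.colnames = sorted(columns)
        return cls


class DescriptionSketch(metaclass=DescriptionMeta):
    """Base class playing the role that IsDescription plays in the text."""


class Particle(DescriptionSketch):
    name = ColSketch("CharType")
    energy = ColSketch("Float64")
    idnumber = ColSketch("Int64")


colnames = Particle.colnames
energy_type = Particle.columns["energy"].dtype
```

The point of the design is that the user only writes a declarative class body; the bookkeeping happens once, when the class object itself is built.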

4.12.2 The Col class and its descendants

The Col class is used as a means to declare the different properties of a table column. In addition, a series of descendant classes are offered in order to make these column descriptions easier for the user. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code.

Note that the only public method accessible in these classes is the constructor itself.

Col(dtype="Float64", shape=1, dflt=None, pos=None, indexed=0)
Declare the properties of a Table column.
dtype
The data type for the column. All types listed in appendix A are valid data types for columns. The type description is accepted both in string format and as a numarray data type.
shape
An integer or a tuple that specifies the number of dtype items for each element (or the shape, for multidimensional elements) of this column. For CharType columns, the last dimension is used as the length of the character strings. However, for this kind of object, the use of the StringCol subclass is strongly recommended.
dflt
The default value for elements of this column. If the user does not supply a value for an element while filling a table, this default value will be written to disk. If the user supplies a scalar value for a multidimensional column, this value is automatically broadcast to all the elements in the column cell. If dflt is not supplied, an appropriate zero value (or null string) will be chosen by default.
pos
By default, columns are arranged in memory following an alphanumerical order of the column names. In some situations, however, it is convenient to impose a user-defined ordering. The pos parameter allows the user to force the desired ordering.
indexed
Whether this column should be indexed for better performance in table selections.
StringCol(length=None, dflt=None, shape=1, pos=None, indexed=0)
Declare a column to be of type CharType. The length parameter sets the length of the strings. The meaning of the other parameters is the same as in the Col class.
BoolCol(dflt=0, shape=1, pos=None, indexed=0)
Define a column to be of type Bool. The meaning of the parameters is the same as in the Col class.
IntCol(dflt=0, shape=1, itemsize=4, sign=1, pos=None, indexed=0)
Declare a column to be of type IntXX, depending on the value of the itemsize parameter, which sets the number of bytes of the integers in the column. sign determines whether the integers are signed or not. The meaning of the other parameters is the same as in the Col class.

This class has several descendants:

Int8Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int8.
UInt8Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt8.
Int16Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int16.
UInt16Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt16.
Int32Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int32.
UInt32Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt32.
Int64Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type Int64.
UInt64Col(dflt=0, shape=1, pos=None, indexed=0)
Define a column of type UInt64.
FloatCol(dflt=0.0, shape=1, itemsize=8, pos=None, indexed=0)
Define a column to be of type FloatXX, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the floats in the column and the default is 8 bytes (double precision). The meaning of the other parameters is the same as in the Col class.

This class has two descendants:

Float32Col(dflt=0.0, shape=1, pos=None, indexed=0)
Define a column of type Float32.
Float64Col(dflt=0.0, shape=1, pos=None, indexed=0)
Define a column of type Float64.
ComplexCol(dflt=0.+0.j, shape=1, itemsize=16, pos=None)
Define a column to be of type ComplexXX, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the complex types in the column and the default is 16 bytes (double precision complex). The meaning of the other parameters is the same as in the Col class.

This class has two descendants:

Complex32Col(dflt=0.+0.j, shape=1, pos=None)
Define a column of type Complex32.
Complex64Col(dflt=0.+0.j, shape=1, pos=None)
Define a column of type Complex64.

ComplexCol columns and their descendants do not support indexing.
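
The relation between the itemsize and sign parameters and the resulting IntXX, FloatXX and ComplexXX type names can be sketched as follows. These helper functions are hypothetical and are not part of PyTables. The complex case assumes the numarray naming convention, where the number in the name counts the bits of each of the two components, which is consistent with the 16-byte default being Complex64.

```python
def int_type_name(itemsize=4, sign=1):
    """Name of the integer type for a given size in bytes and signedness."""
    if itemsize not in (1, 2, 4, 8):
        raise ValueError("unsupported integer itemsize: %r" % (itemsize,))
    prefix = "Int" if sign else "UInt"
    return "%s%d" % (prefix, itemsize * 8)


def float_type_name(itemsize=8):
    """Name of the floating point type for a given size in bytes."""
    if itemsize not in (4, 8):
        raise ValueError("unsupported float itemsize: %r" % (itemsize,))
    return "Float%d" % (itemsize * 8)


def complex_type_name(itemsize=16):
    """Name of the complex type; the number counts the bits per component."""
    if itemsize not in (8, 16):
        raise ValueError("unsupported complex itemsize: %r" % (itemsize,))
    return "Complex%d" % (itemsize * 4)


default_int = int_type_name()                # itemsize=4, sign=1
unsigned_short = int_type_name(2, sign=0)    # itemsize=2, unsigned
default_float = float_type_name()            # itemsize=8
default_complex = complex_type_name()        # itemsize=16
```

In practice the descendant classes (Int32Col, Float64Col and so on) make this mapping explicit in the class name, which is why they are the recommended way to declare columns.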

4.12.3 The Atom class and its descendants

The Atom class is meant to declare the different properties of the base element (also known as atom) of EArray and VLArray objects. The Atom instances have the property that their length is always the same. However, you can grow objects along the extendable dimension in the case of EArray or put a variable number of them on a VLArray row. Moreover, the atoms are not restricted to scalar values, and they can be fully multidimensional objects.

A series of descendant classes are offered in order to make the use of these element descriptions easier. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code. Note that the only public methods accessible in these classes are the atomsize() method and the constructor itself. The atomsize() method returns the total length, in bytes, of the element base atom.
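
As a rough illustration of what the total length of an atom means, the sketch below computes the number of bytes as the item size multiplied by the number of items in the atom shape. The function is hypothetical and only covers VLArray-style shapes (the integer 1 or a tuple of positive dimensions); how an EArray shape containing a 0 is treated is not specified here and is left out on purpose.

```python
def atomsize_sketch(itemsize, shape=1):
    """Total bytes of one atom: item size times the number of items."""
    dims = (shape,) if isinstance(shape, int) else tuple(shape)
    nitems = 1
    for dim in dims:
        if dim < 1:
            raise ValueError("this sketch only handles positive dimensions")
        nitems *= dim
    return itemsize * nitems


scalar_int32 = atomsize_sketch(4)              # one 4-byte integer
float64_matrix = atomsize_sketch(8, (2, 3))    # six 8-byte floats
```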

A description of the different constructors with their parameters follows:

Atom(dtype="Float64", shape=1, flavor="NumArray")
Define properties for the base elements of EArray and VLArray objects.
dtype
The data type for the base element. See appendix A for a list of the supported data types. The type description is accepted both in string format and as a numarray data type.
shape
In an EArray context, it is a tuple specifying the shape of the object, and one (and only one) of its dimensions must be 0, meaning that the EArray object will be enlarged along this axis. In the case of a VLArray, it can be an integer with a value of 1 (one) or a tuple, which specifies whether the atom is a scalar (in the case of a 1) or has multiple dimensions (in the case of a tuple). For CharType elements, the last dimension is used as the length of the character strings. However, for this kind of object, the use of the StringAtom subclass is strongly recommended.
flavor
The object representation for this atom. It can be either "CharArray" or "String" for the CharType type, and "NumArray", "Numeric", "List" or "Tuple" for the rest of the types. If the specified value differs from "CharArray" or "NumArray", the atoms read will be converted to that specific flavor. If not specified, the atoms will remain in their native format (i.e. CharArray or NumArray).
StringAtom(shape=1, length=None, flavor="CharArray")
Define an atom to be of CharType type. The meaning of the shape parameter is the same as in the Atom class. length sets the length of the string atoms. flavor can be either "CharArray" or "String". Unicode strings are not supported by this type; see the VLStringAtom class if you want Unicode support (only available for VLArray objects).
BoolAtom(shape=1, flavor="NumArray")
Define an atom to be of type Bool. The meaning of the parameters is the same as in the Atom class.
IntAtom(shape=1, itemsize=4, sign=1, flavor="NumArray")
Define an atom to be of type IntXX, depending on the value of the itemsize parameter, which sets the number of bytes of the integers that make up the atom. sign determines whether the integers are signed or not. The meaning of the other parameters is the same as in the Atom class.

This class has several descendants:

Int8Atom(shape=1, flavor="NumArray")
Define an atom of type Int8.
UInt8Atom(shape=1, flavor="NumArray")
Define an atom of type UInt8.
Int16Atom(shape=1, flavor="NumArray")
Define an atom of type Int16.
UInt16Atom(shape=1, flavor="NumArray")
Define an atom of type UInt16.
Int32Atom(shape=1, flavor="NumArray")
Define an atom of type Int32.
UInt32Atom(shape=1, flavor="NumArray")
Define an atom of type UInt32.
Int64Atom(shape=1, flavor="NumArray")
Define an atom of type Int64.
UInt64Atom(shape=1, flavor="NumArray")
Define an atom of type UInt64.
FloatAtom(shape=1, itemsize=8, flavor="NumArray")
Define an atom to be of FloatXX type, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the floats in the atom and the default is 8 bytes (double precision). The meaning of the other parameters is the same as in the Atom class.

This class has two descendants:

Float32Atom(shape=1, flavor="NumArray")
Define an atom of type Float32.
Float64Atom(shape=1, flavor="NumArray")
Define an atom of type Float64.
ComplexAtom(shape=1, itemsize=16, flavor="NumArray")
Define an atom to be of ComplexXX type, depending on the value of itemsize. The itemsize parameter sets the number of bytes of the complex values in the atom and the default is 16 bytes (double precision complex). The meaning of the other parameters is the same as in the Atom class.

This class has two descendants:

Complex32Atom(shape=1, flavor="NumArray")
Define an atom of type Complex32.
Complex64Atom(shape=1, flavor="NumArray")
Define an atom of type Complex64.

Finally, there are two special classes, ObjectAtom and VLStringAtom, that do not actually descend from Atom, but whose goal is so similar that they are described here. The difference between them and the Atom class and its descendants is that these special classes do not allow multidimensional atoms, nor multiple values per row. A flavor cannot be specified either, as it is immutable (see below).

Caveat emptor: You are only allowed to use these classes to create VLArray objects, not EArray objects.

ObjectAtom()
This class is meant to fit any kind of object in a row of a VLArray instance by using cPickle behind the scenes. Because you cannot foresee how long the output of the cPickle serialization will be (i.e. the atom already has a variable length), you can only fit one such object per row. However, you can still pass several parameters to the VLArray.append() method, as they will be regarded as a tuple of compound objects (the parameters), so that there is still only one object to be saved in a single row. It does not accept parameters and its flavor is automatically set to "Object", so reading a row always returns an arbitrary Python object. You can regard ObjectAtom types as an easy way to save an arbitrary number of generic Python objects in a VLArray object.
VLStringAtom()
This class describes a row of the VLArray class, rather than an atom. It differs from the StringAtom class in that you can only add one instance of it to one specific row, i.e. the VLArray.append() method only accepts one object when the base atom is of this type. Besides, it supports Unicode strings (contrary to StringAtom) because it uses the UTF-8 encoding when serializing to disk (this is why its atomsize() method always returns 1). It does not accept any parameter and, because its flavor is automatically set to "VLString", reading a row always returns a Python string. See appendix C.3.4 if you are curious about how this is implemented at the low level. You can regard VLStringAtom types as an easy way to save generic variable length strings.

See examples/vlarray1.py and examples/vlarray2.py for further examples on VLArrays, including object serialization and Unicode string management.
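
The reason why these two classes describe a whole row can be seen with the standard library alone. The sketch below does not use PyTables, and it uses the modern pickle module in place of cPickle. It shows that the serialized size of an object is not known in advance, and that the UTF-8 encoding of a string takes a variable number of bytes per character, so the natural unit of such a row is the single byte.

```python
import pickle

# Arbitrary objects produce pickles of very different lengths.
objects = [3, [3, (1, 2)], {"a": "b" * 50}]
pickled_sizes = [len(pickle.dumps(obj)) for obj in objects]

# UTF-8 uses between one and four bytes per character.
texts = ["abc", "a\u00f1o", "\u65e5\u672c\u8a9e"]
encoded = [text.encode("utf-8") for text in texts]
encoded_sizes = [len(data) for data in encoded]

# Decoding gives back the original strings unchanged.
decoded = [data.decode("utf-8") for data in encoded]
```

Here encoded_sizes is [3, 4, 9] even though every string has three characters, which is why a fixed atom length in characters would not work for Unicode text.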

4.13 Helper classes

This section lists classes that do not fit in any other section and that mainly serve ancillary purposes.

4.13.1 The Filters class

This class is meant to serve as a container that keeps information about the filter properties associated with the enlargeable leaves, that is Table, EArray and VLArray.

The public variables of Filters are listed below:

complevel
The compression level (0 means no compression).
complib
The compression filter used (in case of compressed dataset).
shuffle
Whether the shuffle filter is active or not.
fletcher32
Whether the fletcher32 filter is active or not.

Filters has no public methods other than the constructor itself, which is described next.

Filters(complevel=0, complib="zlib", shuffle=1, fletcher32=0)

The parameters that can be passed to the Filters class constructor are:

complevel
Specifies a compression level for data. The allowed range is 0-9. A value of 0 disables compression, and this is the default.
complib
Specifies the compression library to be used. Right now, "zlib" (default), "lzo" and "ucl" values are supported. See section 6.3 for some advice on which library is better suited to your needs.
shuffle
Whether or not to use the shuffle filter present in the HDF5 library. This is normally used to improve the compression ratio (at the cost of consuming a little bit more CPU time). A value of 0 disables shuffling and 1 makes it active. The default value depends on whether compression is enabled or not; if compression is enabled, shuffling defaults to be active, else shuffling is disabled.
fletcher32
Whether or not to use the fletcher32 filter in the HDF5 library. This is used to add a checksum on each data chunk. A value of 0 disables the checksum and it is the default.
Of course, you can also create an instance first and then assign new values to the variables you want to change. For example:
import numarray as na
from tables import *

fileh = openFile("test5.h5", mode = "w")
atom = Float32Atom(shape=(0,2))
filters = Filters(complevel=1, complib = "lzo")
filters.fletcher32 = 1
arr = fileh.createEArray(fileh.root, 'earray', atom, "A growable array",
                         filters = filters)
# Append several rows in only one call
arr.append(na.array([[1., 2.],
                     [2., 3.],
                     [3., 4.]], type=na.Float32))

# Print information on that enlargeable array
print "Result Array:"
print repr(arr)

fileh.close()
	      
This enforces the use of the LZO library, a compression level of 1 and a fletcher32 checksum filter as well. See the output of this example:
Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
  type = Float32
  shape = (3L, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'NumArray'
  byteorder = 'little'
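
To see why the shuffle filter normally improves the compression ratio, the following sketch reproduces the idea with the standard library only. It is an illustration under stated assumptions, not the HDF5 implementation: the shuffle and unshuffle functions are hypothetical, and zlib at level 1 stands in for whatever compression library is selected through complib. Shuffling groups the first byte of every item together, then the second byte, and so on, so that slowly varying data turns into long runs of similar bytes.

```python
import struct
import zlib


def shuffle(data, itemsize):
    """Group byte 0 of every item, then byte 1, and so on."""
    return b"".join(data[i::itemsize] for i in range(itemsize))


def unshuffle(data, itemsize):
    """Inverse of shuffle(): interleave the byte planes again."""
    nitems = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * nitems:(i + 1) * nitems]
    return bytes(out)


# 1000 consecutive 32-bit little-endian integers.
raw = struct.pack("<1000i", *range(1000))

plain_size = len(zlib.compress(raw, 1))
shuffled_size = len(zlib.compress(shuffle(raw, 4), 1))
restored = unshuffle(shuffle(raw, 4), 4)
```

For this kind of data shuffled_size comes out smaller than plain_size, at the cost of the extra pass over the bytes, which matches the trade-off described for the shuffle parameter.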
	      

4.13.2 The IndexProps class

You can use this class to set or unset the properties of the indexing process of a Table column. To use it, create an instance and assign it to the special attribute _v_indexprops in a table description class (see 4.12.1) or dictionary.

The public variables of IndexProps are listed below:

auto
Whether an existing index should be updated or not after a table append operation.
reindex
Whether the table columns are to be re-indexed after an invalidating index operation.
filters
The filter settings for the different Table indexes.

IndexProps has no public methods other than the constructor itself, which is described next.

IndexProps(auto=1, reindex=1, filters=None)

The parameters that can be passed to the IndexProps class constructor are:

auto
Specifies whether an existing index should be updated or not after a table append operation. The default is to enable automatic index updates.
reindex
Specifies whether the table columns are to be re-indexed after an operation that invalidates the index (for example, after a Table.removeRows call). The default is to reindex after operations that invalidate indexes.
filters
Sets the filter properties for Column indexes. It has to be an instance of the Filters class (see section 4.13.1). A None value means that the default settings for the Filters object are selected.

4.13.3 The Index class

This class is used to keep the indexing information for table columns. It is actually a descendant of the Group class, with some added functionality.

It has no methods intended for the programmer's use, but it has some attributes that may be of interest.

Index instance variables

column
The column object this index belongs to.
type
The type class for the index.
itemsize
The size of the atomic items. Especially useful for columns of CharType type.
nelements
The total number of elements in index.
dirty
Whether the index is dirty or not.
sorted
The IndexArray object (see 4.13.4) with the sorted values information.
indices
The IndexArray object (see 4.13.4) with the sorted indices information.
filters
The Filters (see section 4.13.1) instance for this index.
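
The roles of the sorted and indices variables can be pictured with a small, purely illustrative sketch. It is not how PyTables stores or searches its indexes: the data, the variable names and the rows_in_range function are hypothetical. The idea is simply that keeping the column values in sorted order, together with the row number each value came from, allows a range selection to be answered with a binary search instead of a full scan.

```python
from bisect import bisect_left, bisect_right

# Values of one table column, in row order.
column = [42, 7, 19, 7, 88, 3]

# Row numbers ordered by the value they hold, and the values in that order.
indices = sorted(range(len(column)), key=column.__getitem__)
sorted_values = [column[row] for row in indices]


def rows_in_range(low, high):
    """Return the row numbers whose value v satisfies low <= v <= high."""
    start = bisect_left(sorted_values, low)
    stop = bisect_right(sorted_values, high)
    return sorted(indices[start:stop])


selected = rows_in_range(7, 19)
```

Appending or removing rows makes such a pair of arrays stale, which is the situation that the dirty variable and the auto and reindex settings of IndexProps are concerned with.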

4.13.4 The IndexArray class

This class is used to keep part of the indexing information for table columns. It is actually a descendant of the EArray class, with some added functionality.

It has no methods intended for the programmer's use, and although it has some attributes with potentially useful information, all of it is accessible through the Index class (see 4.13.3), so it will not be replicated here.


6) In the following, the term Leaf will refer to either a Table, Array, EArray, VLArray or UnImplemented node object.
7) In the future, multiple enlargeable dimensions might be implemented as well.
