PyTables implements several classes to represent the different nodes in the object tree. They are named File, Group, Leaf, Table, Array, EArray, VLArray and UnImplemented. Another class, AttributeSet, lets the user complement the information on these objects. Finally, an important class called IsDescription lets you build a Table record description by declaring a subclass of it. Many other classes are defined in PyTables, but they can be regarded as helpers whose main goal is to declare the data type properties of the different first-class objects; they are described at the end of this chapter.
An important function, called openFile, is responsible for creating, opening or appending to files. In addition, a few utility functions are defined to check whether a user-supplied file is a PyTables or an HDF5 file. These are called isPyTablesFile and isHDF5, respectively. Finally, there is a function called whichLibVersion that reports the versions of the underlying C libraries (for example, HDF5 or Zlib).
Let's start by discussing the first-level variables and functions available to the user, and then the different classes defined in PyTables.
Copy a closed PyTables (or generic HDF5) file specified by srcFilename to dstFilename. Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.
Determines whether filename is in the HDF5 format or not. When successful, returns a positive value, for TRUE, or 0 (zero), for FALSE. Otherwise returns a negative value. For this function to work, the file must be closed.
Determines whether a file is in the PyTables format. When successful, returns the format version string, for TRUE, or 0 (zero), for FALSE. Otherwise returns a negative value. For this function to work, the file must be closed.
Opens a PyTables (or generic HDF5) file and returns a File object.
Returns information about the versions of the underlying C libraries. libname can be one of "hdf5", "zlib", "lzo" or "ucl". It always returns a tuple of 3 elements. The first element is positive when the library is available, and 0 (zero) when it is not (as may happen, for example, with LZO or UCL). If the library is available, the second element of the tuple contains the library version and the third element the date (if available) of that version.
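The 3-element tuple described above can be unpacked as in the following minimal sketch. It is plain Python and does not call PyTables; describe_library is a hypothetical helper, and the sample tuples are made up for illustration (in particular, the contents of the second and third elements when the library is unavailable are an assumption).

```python
def describe_library(info):
    """Format a (status, version, date) tuple like the one described above.

    Hypothetical helper, not part of PyTables.
    """
    status, version, date = info
    if status == 0:
        # First element is 0: the library is not available.
        return "not available"
    if date:
        return "version %s (%s)" % (version, date)
    return "version %s" % (version,)


# Made-up sample tuples, for illustration only.
print(describe_library((1, "1.6.2", None)))
print(describe_library((0, None, None)))
```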
This class is returned when a PyTables file is opened with the openFile function. It has methods to flush and close files. Also, the File class offers methods to create, rename and delete nodes, as well as to traverse the object tree. One of its attributes (rootUEP) represents the user entry point to the object tree attached to the file.
Next, we will discuss the attributes and methods of the File class.
Create a new Group instance with name name in where location.
Create a new Table instance with name name in where location.
Create a new Array instance with name name in where location.
See the createTable description (4.2.2) for more information on the where, name and title parameters.
Create a new EArray instance with name name in where location.
See createTable description 4.2.2 for more information on the where, name, title, and filters parameters.
Create a new VLArray instance with name name in where location. See section 4.9 for a description of the VLArray class.
See createTable description 4.2.2 for more information on the where, name, title, and filters parameters.
Returns the object node name under where location.
Returns the attribute attrname under where.name location.
Sets the attribute attrname with value attrvalue under where.name location.
Delete the attribute attrname in where.name location.
Copy the attributes from node where.name to dstNode.
Returns a list with all the object nodes (Group or Leaf) hanging from where. The list is alpha-numerically sorted by node name.
Removes the object node name under where location.
Rename the object node name under where location.
Iterator that returns the Groups (not Leaves) hanging from where. If where is not supplied, the root object is taken as origin. The Groups are returned in top-down order, and alphanumerically sorted when they are at the same level.
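To illustrate the ordering, here is a toy model in plain Python that walks a nested dict standing in for a group hierarchy. It is only a sketch: walk_groups and the sample tree are hypothetical, a depth-first traversal is assumed, and the real iterator may visit groups at different levels in a different sequence.

```python
def walk_groups(tree, path="/"):
    """Yield group paths, parents first and siblings sorted by name.

    `tree` is a plain nested dict standing in for a group hierarchy;
    a value of None marks a leaf. Toy model, not PyTables code.
    """
    yield path
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):  # only groups are visited, not leaves
            child_path = path.rstrip("/") + "/" + name
            for sub in walk_groups(child, child_path):
                yield sub


# Hypothetical hierarchy: two groups and one table under the root.
tree = {"group0": {"group1": {}, "tuple1": None}, "detector": {}, "tuple0": None}
print(list(walk_groups(tree)))
```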
Recursively iterate over the nodes in the File instance. It takes two parameters:
Example of use:
# Recursively print all the nodes hanging from '/detector'
print "Nodes hanging from group '/detector':"
for node in h5file.walkNodes("/detector"):
    print node
Copy (recursively) the children of a group into another location. Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.
Copy the contents of this file to dstFilename. If the destination file already exists, it won't be overwritten unless overwrite is set to true (see later). Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.
Flush all the leaves in the object tree.
Flush all the leaves in object tree and close the file.
The following methods automatically trigger actions when a File instance is accessed in a special way.
Iterates recursively over all the nodes in the File instance. It does not accept parameters.
Example of use:
# Recursively list all the nodes in the object tree
h5file = tables.openFile("vlarray1.h5")
print "All nodes in the object tree:"
for node in h5file:
    print node
Prints a short description of the File object.
Example of use:
>>> f=tables.openFile("data/test.h5")
>>> print f
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
Instances of this class are a grouping structure containing instances of zero or more groups or leaves, together with supporting metadata.
Working with groups and leaves is similar in many ways to working with directories and files, respectively, in a Unix filesystem. As with Unix directories and files, objects in the object tree are often described by giving their full (or absolute) path names. This full path can be specified either as a string (as in '/group1/group2') or as a complete object path written in the natural naming schema (as in file.root.group1.group2), as discussed in section 1.2.
A side effect of the natural naming schema is that, when assigning a new attribute to a Group object, you must take care not to collide with existing child node names. For this reason, and to avoid polluting the children namespace, it is explicitly forbidden to assign "normal" attributes to Group instances; the only ones allowed must start with a reserved prefix, such as "_f_" (for methods) or "_v_" (for instance variables). Any attempt to assign a new attribute that does not start with one of these prefixes will raise a NameError exception.
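The prefix rule can be pictured with a small stand-alone class. This is a toy model in plain Python, not the actual Group implementation: ToyGroup is a hypothetical name, and it deliberately leaves out the handling of child nodes assigned through natural naming.

```python
class ToyGroup(object):
    """Toy stand-in for a group that only accepts reserved-prefix attributes."""

    _reserved = ("_f_", "_v_")

    def __setattr__(self, name, value):
        # Reject any name that lacks one of the reserved prefixes.
        if not name.startswith(self._reserved):
            raise NameError("attribute %r must start with one of %r"
                            % (name, self._reserved))
        object.__setattr__(self, name, value)


g = ToyGroup()
g._v_title = "A group"        # allowed: reserved prefix
try:
    g.title = "A group"       # rejected: could clash with a child name
except NameError as exc:
    print("rejected:", exc)
```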
Another effect is that you cannot use reserved Python names or other names that are not valid in Python (for example "$a" or "44") as node names. You can, however, use the trMap (translation map dictionary) parameter of the openFile function (see section 4.1.2) in order to use non-valid Python names as node names in the file.
# Add a Table child instance under group with name "tablename"
group.tablename = Table(recordDict, "Record instance")
table = group.tablename     # Get the table child instance
del group.tablename         # Delete the table child instance
Caveat: The following methods are documented for completeness, and they can be used without any problem. However, you should use the high-level counterpart methods in the File class, because these are most used in documentation and examples, and are a bit more powerful than those exposed here.
Helper method to correctly concatenate a name child object with the pathname of this group.
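As a rough picture of what such a join has to handle, the sketch below concatenates a child name with a group path in plain Python, treating the root group '/' specially so that no double slash appears. join_path is a hypothetical function; the behaviour of the real helper is assumed, not confirmed.

```python
def join_path(group_path, name):
    """Join a child name to a group path (hypothetical helper)."""
    if group_path == "/":
        # The root already ends with a slash.
        return "/" + name
    return group_path + "/" + name


print(join_path("/", "detector"))
print(join_path("/group1", "group2"))
```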
Remove this object. If recursive is true, force the removal even if this group has children.
Sets the attribute attrname of this group to the value attrvalue. Any scalar (string, int or float) attribute is supported natively. However, (c)Pickle is automatically used to serialize other kinds of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save.
Returns a list with all the object nodes hanging from this instance. The list is alphanumerically sorted by node name. If a classname parameter is supplied, it will only return instances of this class (or subclasses of it). The supported classes in classname are 'Group', 'Leaf', 'Table', 'Array', 'EArray', 'VLArray' and 'UnImplemented'.
Iterate over the Groups (not Leaves) hanging from self. The Groups are returned in top-down order, and alphanumerically sorted when they are at the same level.
Iterate over the nodes in the Group instance. It takes two parameters:
Example of use:
# Recursively print all the arrays hanging from '/'
print "Arrays in the object tree '/':"
for array in h5file.root._f_walkNodes("Array", recursive=1):
    print array
Copy (recursively) the children of this group into another location specified by where (it can be a path string or a Group object). Returns a tuple in the form (ngroups, nleaves, nbytes) specifying the number of groups, leaves and bytes copied.
The following methods automatically trigger actions when a Group instance is accessed in a special way.
Iterates over the children of the Group instance. It does not accept parameters, and it is not recursive.
Example of use:
# Non-recursively list all the nodes hanging from '/detector'
print "Nodes in '/detector' group:"
for node in h5file.root.detector:
    print node
The goal of this class is to provide a place for the functionality common to all its descendants, as well as a way to help classify objects in the tree. A Leaf object is an end-node, that is, a node that can hang directly from a group object but that is not a group itself and thus cannot have descendants. Right now, the set of end-nodes is composed of Table, Array, EArray, VLArray and UnImplemented class instances. In fact, all these classes inherit from the Leaf class.
The public variables and methods that descendant classes inherit from Leaf are listed below.
In addition, the following instance variables are also defined, with meanings similar to their counterparts in the Group class:
Instances of this class represent table objects in the object tree. It provides methods to read/write data from/to table objects in the file.
Data can be read from or written to tables by accessing a special object that hangs from Table. This object is an instance of the Row class (see 4.5.4). See the tutorial sections in chapter 3 on how to use the Row interface. The columns of the tables can also be easily accessed (more specifically, they can be read but not written) through the Column class, using an extension of the natural naming schema applied inside the tables. See section 4.6 for some examples of this capability.
Note that this object inherits all the public attributes and methods that Leaf already has.
Appends a series of rows to this Table instance. rows is an object that can hold the rows to be appended in several formats, such as a RecArray, a list of tuples, a list of Numeric/NumArray/CharArray objects, a string, a Python buffer or None (no append will result). Of course, this rows object has to be compliant with the underlying format of the Table instance, or a ValueError will be raised.
from tables import *

class Particle(IsDescription):
    name        = StringCol(16, pos=1)  # 16-character String
    lati        = IntCol(pos=2)         # integer
    longi       = IntCol(pos=3)         # integer
    pressure    = Float32Col(pos=4)     # float (single-precision)
    temperature = FloatCol(pos=5)       # double (double-precision)

fileh = openFile("test4.h5", mode = "w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle: 10", 10, 0, 10*10, 10**2),
              ("Particle: 11", 11, -1, 11*11, 11**2),
              ("Particle: 12", 12, -2, 12*12, 12**2)])
fileh.close()
Returns an iterator yielding Row (see section 4.5.4) instances built from rows in table. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.5.3 for a shorter way to call this iterator.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
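The range rules above can be modelled in a few lines of plain Python. This is a sketch, not PyTables code: resolve_range is a hypothetical function, clipping stop to the number of rows is an assumption, and negative start or stop values are not handled.

```python
def resolve_range(nrows, start=None, stop=None, step=None):
    """Return the row indices selected by start/stop/step.

    Plain-Python model of the rules above, not PyTables code.
    """
    if step is None:
        step = 1
    if step < 0:
        raise ValueError("negative values of step are not allowed")
    if start is None and stop is None:
        start, stop = 0, nrows          # neither given: all rows
    elif stop is None:
        stop = start + 1                # only start given: a single row
    elif start is None:
        start = 0
    # Assumption: a stop beyond the last row is clipped.
    return list(range(start, min(stop, nrows), step))


print(resolve_range(10, start=4))
print(resolve_range(10, step=5))
```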
Example of use:
result = [ row['var2'] for row in table.iterrows(step=5) if row['var1'] <= 20 ]
Iterate over a sequence of row coordinates.
Returns the actual data in Table. If field is not supplied, it returns the data as a RecArray object table.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
The rest of the parameters are described next:
Modify a series of rows in the [start:stop:step] extended slice range. If you pass None to stop, all the rows existing in rows will be used.
rows can be either a RecArray object or a structure that is able to be converted to a RecArray and compliant with the table format.
Returns the number of modified rows.
It raises a ValueError if the rows parameter cannot be converted to an object compliant with the table description.
It raises an IndexError if the modification would exceed the length of the table.
Modify a series of rows in the [start:stop:step] extended slice row range. If you pass None to stop, all the rows existing in columns will be used.
columns can be either a RecArray or a list of arrays (the columns) that is able to be converted to a RecArray compliant with the specified column names subset of the table format.
names specifies the column names of the table to be modified.
Returns the number of modified rows.
It raises a ValueError if the columns parameter cannot be converted to an object compliant with the table description.
It raises an IndexError if the modification would exceed the length of the table.
Removes a range of rows in the table. If only start is supplied, this row is to be deleted. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed. A step parameter is not supported, and it is not foreseen to implement it anytime soon.
Remove the index associated with the specified column. Only Index instances (see 4.13.3) are accepted as parameter. This index can be recreated again by calling the createIndex (see 4.6.2) method of the appropriate Column object.
Add remaining rows in buffers to non-dirty indexes. This can be useful when you have chosen non-automatic indexing for the table (see section 4.13.2) and want to update the indexes on it.
Recompute all the existing indexes in table. This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.
Recompute the existing indexes in table, but only if they are dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.13.2) for the table and want to update the indexes after an operation that invalidates them (Table.removeRows, for example).
Returns an iterator yielding Row (see section 4.5.4) instances built from rows in table that satisfy a condition over a column. If the column to which the condition is applied is indexed, this index will be used in order to accelerate the search. Otherwise, the in-kernel iterator (with better performance than the regular iterator) will be chosen instead.
Moreover, if a range is supplied (i.e. some of the start, stop or step parameters are passed), only the rows in that range that fulfill the condition are returned. Otherwise, all the rows that fulfill the condition are returned.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows that fulfill the condition are selected.
You can mix this method with regular selections in order to have complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.
Example of use:
passvalues=[]
for row in table.where(0 < table.cols.col1 < 0.3, step=5):
    if row['var1'] <= 20:
        passvalues.append(row['var2'])
print "Values that pass the cuts:", passvalues
See also the whereIndexed and whereInRange methods below for more specific ways to call this iterator.
Iterator that selects values fulfilling the condition parameter. This only works for conditions over an indexed column. If you try to use it over a non-indexed column, an AssertionError will be raised.
The meaning of the condition, start, stop and step parameters is the same as in the where method (see 4.5.2) described above.
Iterator that selects values fulfilling the condition parameter. This method will use the in-kernel search method, i.e. it won't take advantage of a possible indexed column.
The meaning of the condition, start, stop and step parameters is the same as in the where method (see 4.5.2) described above.
The following methods automatically trigger actions when a Table instance is accessed in a special way (e.g., table["var2"] will be equivalent to a call to table.__getitem__("var2")).
It returns the same iterator as Table.iterrows(0,0,1). However, it does not accept parameters.
Example of use:
result = [ row['var2'] for row in table if row['var1'] <= 20 ]
Which is equivalent to:
result = [ row['var2'] for row in table.iterrows() if row['var1'] <= 20 ]
It takes different actions depending on the type of the key parameter:
Example of use:
record = table[4]
recarray = table[4:1000:2]
narray = table["var2"]
Which is equivalent to:
record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
narray = table.read(field="var2")
It takes different actions depending on the type of the key parameter:
Example of use:
# Modify just one existing row
table[2] = [456,'db2',1.2]
# Modify two existing rows
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table[1:3:2] = rows
Which is equivalent to:
table.modifyRows(start=2, rows=[456,'db2',1.2])
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table.modifyRows(start=1, step=2, rows=rows)
This class is used to fetch and set values on the table fields. It works very much like a dictionary, where the keys are the field names of the associated table and the values are the values of those fields in a specific row.
This object is actually an extension type, so you won't be able to access its documentation interactively. Nor will you be able to access its internal attributes (they are not directly accessible from Python), although accessors (i.e. methods that return an internal attribute) have been defined for some important variables.
This class is used as an accessor to the table columns following the natural naming convention: it has one attribute, named after the column, for each associated Column instance. Besides, like the Row class, it works similarly to a dictionary, where the keys are the column names of the associated table and the values are Column instances. See section 4.6 for examples of use.
Each instance of this class is associated with one column of a table. These instances are mainly used to fetch and set actual data from the table columns, but there are a few other associated methods to deal with indexes.
Recompute the index associated with this column. This can be useful when you suspect that, for any reason, the index information is no longer valid and want to rebuild it.
Recompute the existing index only if it is dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.13.2) for the table and want to update the column's index after an operation that invalidates it (Table.removeRows, for example).
Delete the associated column's index. After doing that, you will lose the indexing information on disk. However, you can always re-create it using the createIndex() method (see 4.6.2).
Returns a column element or slice. It takes different actions depending on the type of the key parameter:
print "Column handlers:"
for name in table.colnames:
    print table.cols[name]
print
print "Some selections:"
print "Select table.cols.name[1]-->", table.cols.name[1]
print "Select table.cols.name[1:2]-->", table.cols.name[1:2]
print "Select table.cols.lati[1:3]-->", table.cols.lati[1:3]
print "Select table.cols.pressure[:]-->", table.cols.pressure[:]
print "Select table.cols['temperature'][:]-->", table.cols['temperature'][:]
and the output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(1,), CharType)
/table.cols.lati (Column(2,), Int32)
/table.cols.longi (Column(1,), Int32)
/table.cols.pressure (Column(1,), Float32)
/table.cols.temperature (Column(1,), Float64)

Some selections:
Select table.cols.name[1]--> Particle: 11
Select table.cols.name[1:2]--> ['Particle: 11']
Select table.cols.lati[1:3]--> [[11 12] [12 13]]
Select table.cols.pressure[:]--> [ 90. 110. 132.]
Select table.cols['temperature'][:]--> [ 100. 121. 144.]
See examples/table2.py for a more complete example.
It takes different actions depending on the type of the key parameter:
Example of use:
# Modify row 1
table.cols.col1[1] = -1
# Modify rows 1 and 3
table.cols.col1[1::2] = [2,3]
Which is equivalent to:
# Modify row 1
table.modifyColumns(start=1, columns=[[-1]], names=["col1"])
# Modify rows 1 and 3
columns = numarray.records.fromarrays([[2,3]], formats="i4")
table.modifyColumns(start=1, step=2, columns=columns, names=["col1"])
Represents an array on file. It provides methods to write/read data to/from array objects in the file. This class does not allow you to enlarge the datasets on disk; see the EArray descendant in section 4.8 if you want enlargeable dataset support and/or compression features.
The array data types supported are the same as the set provided by Numeric and numarray. For details of these data types see appendix A, or the numarray reference manual.
Note that this object inherits all the public attributes and methods that Leaf already provides.
Note that, as this object has no internal I/O buffers, it is not necessary to use the flush() method inherited from Leaf in order to save its internal state to disk. When a writing method call returns, all the data is already on disk.
Returns an iterator yielding numarray instances built from rows in array. The returned rows are taken from the first dimension in the case of an Array instance and from the enlargeable dimension in the case of an EArray instance. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Otherwise, all the rows are returned. See also the __iter__() special method in section 4.7.3 for a shorter way to call this iterator.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
Example of use:
result = [ row for row in arrayInstance.iterrows(step=4) ]
Read the array from disk and return it as a numarray (default) object, or as an object with the same flavor it was originally saved with. It accepts start, stop and step parameters to select rows (the first dimension in the case of an Array instance and the enlargeable dimension in the case of an EArray) for reading.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
The following methods automatically trigger actions when an Array instance is accessed in a special way (e.g., array[2:3,...,::2] will be equivalent to a call to array.__getitem__((slice(2, 3, None), Ellipsis, slice(None, None, 2)))).
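The translation from subscript syntax to a __getitem__ call can be checked with a tiny stand-alone class that simply returns the key it receives. KeyRecorder is a hypothetical name used only for this illustration; it is not a PyTables class.

```python
class KeyRecorder(object):
    """Return the key it is indexed with, to show what __getitem__ receives."""

    def __getitem__(self, key):
        return key


array = KeyRecorder()
key = array[2:3, ..., ::2]
print(key)
```

Note that Python packs the comma-separated items into a single tuple before handing them to __getitem__.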
It returns the same iterator as Array.iterrows(0,0,1). However, it does not accept parameters.
Example of use:
result = [ row[2] for row in array ]
Which is equivalent to:
result = [ row[2] for row in array.iterrows(0, 0, 1) ]
It returns a numarray (default) object (or an object with the same original flavor that it was saved) containing the slice of rows stated in the key parameter. The set of allowed tokens in key is the same as extended slicing in python (the Ellipsis token included).
Example of use:
array1 = array[4]                     # array1.shape == array.shape[1:]
array2 = array[4:1000:2]              # len(array2.shape) == len(array.shape)
array3 = array[::2, 1:4, :]
array4 = array[1, ..., ::2, 1:4, 4:]  # General slice selection
Sets an Array element, row or extended slice. It takes different actions depending on the type of the key parameter:
Example of use:
a1[0] = 333        # Assign an integer to an Integer Array row
a2[0] = "b"        # Assign a string to a string Array row
a3[1:4] = 5        # Broadcast 5 to slice 1:4
a4[1:4:2] = "xXx"  # Broadcast "xXx" to slice 1:4:2
# General slice update (a5.shape = (4,3,2,8,5,10))
a5[1, ..., ::2, 1:4, 4:] = arange(1728, shape=(4,3,2,4,3,6))
This is a child of the Array class (see 4.7) and, as such, EArray represents an array on the file. The difference is that EArray allows datasets to be enlarged along any single dimension you select. Another important difference is that it also supports compression.
So, in addition to the attributes and methods that EArray inherits from Array, it supports a few more that provide a way to enlarge the arrays on disk. The new variables and methods are described below, together with some that already exist in Array but differ somewhat in meaning and/or functionality in the EArray context.
Appends an object to the underlying dataset. Obviously, this object has to have the same type as the EArray instance; otherwise a TypeError is issued. In the same way, the dimensions of the object have to conform to those of EArray, that is, all the dimensions have to be the same except, of course, that of the enlargeable dimension which can be of any length (even 0!).
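The shape rule can be expressed as a short check in plain Python. This is a sketch of the rule as stated above, not the check PyTables actually performs: conforms is a hypothetical function, extdim denotes the index of the enlargeable dimension, and requiring equal rank is an assumption.

```python
def conforms(obj_shape, earray_shape, extdim):
    """Check whether an object of shape obj_shape can be appended.

    `extdim` is the index of the enlargeable dimension. Plain-Python
    model of the rule above, not PyTables code.
    """
    if len(obj_shape) != len(earray_shape):
        return False
    # Every dimension except the enlargeable one must match exactly.
    return all(o == e
               for i, (o, e) in enumerate(zip(obj_shape, earray_shape))
               if i != extdim)


print(conforms((5, 3), (0, 3), 0))   # rows of width 3: accepted
print(conforms((5, 4), (0, 3), 0))   # wrong width: rejected
```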
Example of use (code available in examples/earray1.py):
import tables
from numarray import strings

fileh = tables.openFile("earray1.h5", mode = "w")
a = tables.StringAtom(shape=(0,), length=8)
# Use 'a' as the object type for the enlargeable array
array_c = fileh.createEArray(fileh.root, 'array_c', a, "Chars")
array_c.append(strings.array(['a'*2, 'b'*4], itemsize=8))
array_c.append(strings.array(['a'*6, 'b'*8, 'c'*10], itemsize=8))

# Read the string EArray we have created on disk
for s in array_c:
    print "array_c[%s] => '%s'" % (array_c.nrow, s)
# Close the file
fileh.close()
and the output is:
array_c[0] => 'aa'
array_c[1] => 'bbbb'
array_c[2] => 'aaaaaa'
array_c[3] => 'bbbbbbbb'
array_c[4] => 'cccccccc'
Instances of this class represent array objects in the object tree with the property that their rows can have a variable number of (homogeneous) elements (called atomic objects, or just atoms). Variable length arrays (or VLAs for short), like Table instances, can have only one dimension and, as in Table, the compound elements (the atoms) of the rows of VLArrays can be fully multidimensional objects.
VLArray provides methods to read/write data from/to variable length array objects residing on disk. Also, note that this object inherits all the public attributes and methods that Leaf already has.
Append the objects passed as parameters to a single row in the VLArray instance. The type of the objects has to be compliant with the VLArray.atom instance type.
Example of use (code available in examples/vlarray1.py):
import tables
from Numeric import *   # or, from numarray import *

# Create a VLArray:
fileh = tables.openFile("vlarray1.h5", mode = "w")
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
                              tables.Int32Atom(flavor="Numeric"),
                              "ragged array of ints",
                              tables.Filters(complevel=1))
# Append some (variable length) rows
# All these different parameter specifications are accepted:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])
vlarray.append(5, 6, 9, 10, 12)

# Now, read it through an iterator
for x in vlarray:
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", x

# Close the file
fileh.close()
And the output for this looks like:
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
vlarray1[3]--> [ 5 6 9 10 12]
Returns an iterator yielding one row per iteration. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special methods in section 4.9.3 for a shorter way to call this iterator.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
Example of use:
for row in vlarray.iterrows(step=4):
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
Returns the actual data in VLArray. As the lengths of the different rows are variable, the returned value is a python list, with as many entries as specified rows in the range parameters.
The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.
The following methods automatically trigger actions when a VLArray instance is accessed in a special way (e.g., vlarray[2:5] will be equivalent to a call to vlarray.__getitem__(slice(2,5,None))).
It returns the same iterator as VLArray.iterrows(0,0,1). However, it does not accept parameters.
Example of use:
result = [ row for row in vlarray ]
Which is equivalent to:
result = [ row for row in vlarray.iterrows() ]
It returns the slice of rows determined by key, which can be an integer index or an extended slice. The returned value is a list of objects of type array.atom.type.
Example of use:
list1 = vlarray[4]
list2 = vlarray[4:1000:2]
Updates a vlarray row described by keys by setting it to value. Depending on the value of keys, the action taken is different:
Note: When updating VLStrings (UTF-8 encoded) or Object atoms, there is a problem: a row can only be updated with a value that takes exactly the same number of bytes as the original one. With UTF-8 encoding this is problematic because, for instance, 'c' takes 1 byte, but 'ç' takes two. The same applies to Object atoms, because when cPickle is applied to a class instance (for example), it is not guaranteed to produce the same number of bytes as for another instance, even one of the same class. These facts effectively limit the number of objects that can be updated in VLArrays.
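The byte-size difference mentioned in the note is easy to verify in plain Python. The helper same_encoded_size below is hypothetical and only illustrates the constraint; it is not part of PyTables.

```python
def same_encoded_size(old, new, encoding="utf-8"):
    """True if `new` occupies exactly as many bytes as `old` when encoded."""
    return len(old.encode(encoding)) == len(new.encode(encoding))


plain = u"c"
cedilla = u"\xe7"   # the character 'ç'
print(len(plain.encode("utf-8")), len(cedilla.encode("utf-8")))
print(same_encoded_size(plain, cedilla))
```

Two strings with the same number of characters can therefore still differ in encoded size, which is what makes an in-place update fail.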
Example of use:
vlarray[0] = vlarray[0]*2+3
vlarray[99,3:] = arange(96)*2+3
# Negative values for start and stop (but not step) are supported
vlarray[99,-99:-89:2] = vlarray[5]*2+3
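The byte-length restriction noted above for VLString and Object atoms can be observed with the standard library alone. The sketch below uses pickle on plain dictionaries rather than on class instances, and fits_in_place is a hypothetical helper that only mirrors the "same number of bytes" condition; neither is taken from PyTables.

```python
import pickle

# 'c' and 'ç' (U+00E7) are one character each, but differ in encoded size.
plain = u"c".encode("utf-8")
cedilla = u"\u00e7".encode("utf-8")
assert len(plain) == 1
assert len(cedilla) == 2

# Two objects of the same type can serialize to different lengths.
short = pickle.dumps({"label": "a"})
longer = pickle.dumps({"label": "a much longer label"})
assert len(short) != len(longer)


def fits_in_place(old_bytes, new_bytes):
    """Hypothetical check: an in-place update needs identical byte counts."""
    return len(old_bytes) == len(new_bytes)


assert not fits_in_place(plain, cedilla)
assert fits_in_place(u"abc".encode("utf-8"), u"xyz".encode("utf-8"))
```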
Instances of this class represent an unimplemented dataset in a generic HDF5 file. When reading such a file (i.e. one that has not been created with PyTables, but with some other HDF5 library based tool), chances are that the specific combination of datatypes and/or dataspaces in some dataset might not be supported by PyTables yet. In such a case, this dataset will be mapped into the UnImplemented class and hence, the user will still be able to build the complete object tree of this generic HDF5 file, as well as to access (both read and write) the attributes of this dataset and some metadata. Of course, the user won't be able to read the actual data in it.
This is an elegant way to allow users to work with generic HDF5 files despite the fact that some of their datasets are not supported by PyTables. However, if you are really interested in having access to an unimplemented dataset, please get in contact with the developer team.
This class does not have any public instance variables, except those inherited from the Leaf class (see 4.4).
Represents the set of attributes of a node (Leaf or Group). It provides methods to create new attributes and to open, rename or delete existing ones.
As in Group instances, AttributeSet instances make use of the natural naming convention, i.e. you can access the attributes on disk as if they were normal AttributeSet attributes. This offers the user a very convenient way to access (but also to set and delete) node attributes by simply specifying them like normal class attributes.
Caveat: All Python data types are supported. The scalar ones (i.e. String, Int and Float) are mapped directly to the HDF5 counterparts, so you can correctly visualize them with any HDF5 tool. However, the rest of the data types and more general objects are serialized using cPickle, so you will be able to correctly retrieve them only from a Python-aware HDF5 library. Hopefully, the list of supported native attributes will be extended to fully multidimensional arrays sometime in the future.
Note that this class defines the __setattr__, __getattr__ and __delattr__ methods, and they work as normally intended. Any scalar attribute (string, int or float) is supported natively. However, (c)Pickle is automatically used to serialize other kinds of objects (like lists, tuples, dicts, small Numeric/numarray objects, ...) that you might want to save.
leaf.attrs.myattr = "str attr"    # Set a string (native support)
leaf.attrs.myattr2 = 3            # Set an integer (native support)
leaf.attrs.myattr3 = [3,(1,2)]    # A generic object (Pickled)
attrib = leaf.attrs.myattr        # Get the attribute myattr
del leaf.attrs.myattr             # Delete the attribute myattr
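The natural naming behaviour described above can be imitated with a toy class. This is a minimal sketch, assuming a dictionary in place of the HDF5 file; ToyAttributeSet and its internals are hypothetical and do not reflect how AttributeSet is actually implemented.

```python
import pickle


class ToyAttributeSet(object):
    """Hypothetical stand-in for AttributeSet; a dict plays the role of the file."""

    # Scalar types stored as they are; everything else is pickled.
    _NATIVE = (str, int, float)

    def __init__(self):
        # Bypass our own __setattr__ when creating the backing store.
        object.__setattr__(self, "_store", {})

    def __setattr__(self, name, value):
        if isinstance(value, self._NATIVE):
            self._store[name] = ("native", value)
        else:
            self._store[name] = ("pickled", pickle.dumps(value))

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for stored attributes.
        try:
            kind, payload = self._store[name]
        except KeyError:
            raise AttributeError(name)
        if kind == "native":
            return payload
        return pickle.loads(payload)

    def __delattr__(self, name):
        try:
            del self._store[name]
        except KeyError:
            raise AttributeError(name)


attrs = ToyAttributeSet()
attrs.myattr = "str attr"      # kept as a native scalar
attrs.myattr3 = [3, (1, 2)]    # serialized with pickle
```

Reading attrs.myattr3 returns an equal list, rebuilt from its pickled form, while del attrs.myattr removes the entry from the backing store.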
This section describes a series of classes that are meant to declare the datatypes required by primary PyTables objects (like Table or VLArray).
This class is in fact a so-called metaclass object. There is nothing special about this fact, except that the attributes of its subclasses are transformed during their instantiation phase, and new methods for instances are defined based on the values of the class attributes.
It is designed to be used as an easy, yet meaningful way to describe the properties of Table objects through the use of classes that inherit properties from it. In order to define such a special class, you have to declare it as a descendant of IsDescription, with as many attributes as columns you want in your table. The names of these attributes will become the names of the columns, while their values are the properties of the columns, which are obtained through the use of the Col class constructor (see 4.12.2).
Then, you can pass an instance of this object to the Table constructor, where all the information it contains will be used to define the table structure. See section 3.3 for an example of how that works.
Moreover, you can change the properties of the index creation process by creating an instance of the IndexProps class (see 4.13.2) and assigning it to a special attribute called _v_indexprops.
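The general idea of a metaclass that turns class attributes into a column description can be sketched as follows. All names here (ToyCol, DescriptionMeta, ToyDescription, the columns attribute and the type labels) are hypothetical and chosen for illustration; the real IsDescription and Col classes have their own signatures and behaviour.

```python
class ToyCol(object):
    """Hypothetical column declaration (not the real Col signature)."""

    def __init__(self, dtype, shape=1):
        self.dtype = dtype
        self.shape = shape


class DescriptionMeta(type):
    """Hypothetical metaclass that gathers ToyCol attributes at class creation."""

    def __new__(mcs, name, bases, namespace):
        columns = {}
        # Inherit the columns already declared by parent descriptions.
        for base in bases:
            columns.update(getattr(base, "columns", {}))
        # Attribute names become column names; values hold their properties.
        for key, value in namespace.items():
            if isinstance(value, ToyCol):
                columns[key] = value
        cls = super(DescriptionMeta, mcs).__new__(mcs, name, bases, namespace)
        cls.columns = columns
        return cls


# Creating the base class by calling the metaclass works in Python 2 and 3.
ToyDescription = DescriptionMeta("ToyDescription", (object,), {})


class Particle(ToyDescription):
    name = ToyCol("CharType", shape=16)   # arbitrary type label
    energy = ToyCol("Float64")            # arbitrary type label
```

After the class statement runs, Particle.columns maps "name" and "energy" to their ToyCol declarations, which is the kind of information a table constructor would need.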
The Col class is used as a means to declare the different properties of a table column. In addition, a series of descendant classes are offered in order to make these column descriptions easier for the user. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code.
Note that the only public method accessible in these classes is the constructor itself.
This class has several descendants:
This class has two descendants:
This class has two descendants:
ComplexCol columns and their descendants do not support indexing.
The Atom class is meant to declare the different properties of the base element (also known as atom) of EArray and VLArray objects. The Atom instances have the property that their length is always the same. However, you can grow objects along the extendable dimension in the case of EArray or put a variable number of them on a VLArray row. Moreover, the atoms are not restricted to scalar values, and they can be fully multidimensional objects.
A series of descendant classes are offered in order to make the use of these element descriptions easier. In general, it is recommended to use these descendant classes, as they are more meaningful when found in the middle of the code. Note that the only public methods accessible in these classes are the atomsize() method and the constructor itself. The atomsize() method returns the total length, in bytes, of the element base atom.
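As a rough illustration of what a total atom length in bytes means for a multidimensional atom, the sketch below multiplies the size of one item by the number of items in the atom. ToyAtom, its constructor arguments and this formula are assumptions made for the example; they are not taken from the Atom implementation.

```python
from functools import reduce
import operator


class ToyAtom(object):
    """Hypothetical atom: a fixed-size block of equally sized items."""

    def __init__(self, itemsize, shape=()):
        self.itemsize = itemsize    # bytes taken by one item
        self.shape = tuple(shape)   # dimensions of the atom; () means scalar

    def atomsize(self):
        # Bytes per item times the number of items in the atom.
        return self.itemsize * reduce(operator.mul, self.shape, 1)


scalar = ToyAtom(itemsize=8)
matrix = ToyAtom(itemsize=4, shape=(2, 3))
assert scalar.atomsize() == 8
assert matrix.atomsize() == 24
```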
A description of the different constructors with their parameters follows:
This class has several descendants:
This class has two descendants:
This class has two descendants:
Next come two special classes, ObjectAtom and VLString, that do not actually descend from Atom, but whose goal is so similar that they should be described here. The difference between them and the Atom class and its descendants is that these special classes do not allow multidimensional atoms, nor multiple values per row. A flavor can't be specified either, as it is immutable (see below).
Caveat emptor: You are only allowed to use these classes to create VLArray objects, not EArray objects.
See examples/vlarray1.py and examples/vlarray2.py for further examples on VLArrays, including object serialization and Unicode string management.
This section lists classes that do not fit in any other section and that mainly serve ancillary purposes.
This class is meant to serve as a container that keeps information about the filter properties associated with the enlargeable leaves, that is Table, EArray and VLArray.
The public variables of Filters are listed below:
There are no Filters public methods with the exception of the constructor itself that is described next.
The parameters that can be passed to the Filters class constructor are:
import numarray as na
from tables import *

fileh = openFile("test5.h5", mode = "w")
atom = Float32Atom(shape=(0,2))
filters = Filters(complevel=1, complib = "lzo")
filters.fletcher32 = 1
arr = fileh.createEArray(fileh.root, 'earray', atom, "A growable array", filters = filters)
# Append several rows in only one call
arr.append(na.array([[1., 2.], [2., 3.], [3., 4.]], type=na.Float32))
# Print information on that enlargeable array
print "Result Array:"
print repr(arr)
fileh.close()

This enforces the use of the LZO library, a compression level of 1 and a fletcher32 checksum filter as well. See the output of this example:

Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
  type = Float32
  shape = (3L, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'NumArray'
  byteorder = 'little'
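What a compression level and a checksum filter provide can be illustrated with the standard library alone. This sketch uses zlib in place of LZO and Adler-32 in place of Fletcher32, purely because both are available in the Python standard library; it does not go through PyTables or HDF5 filters at all.

```python
import zlib

# Repetitive data, similar in spirit to rows of floating point values.
payload = b"1.0 2.0 3.0 4.0 " * 256

# Level 1 favours speed, level 9 favours size; both must round-trip exactly.
fast = zlib.compress(payload, 1)
small = zlib.compress(payload, 9)
assert zlib.decompress(fast) == payload
assert zlib.decompress(small) == payload
assert len(fast) < len(payload)
assert len(small) < len(payload)

# A checksum stored next to the data lets a reader detect corruption.
checksum = zlib.adler32(payload)
corrupted = b"X" + payload[1:]
assert zlib.adler32(corrupted) != checksum
```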
You can use this class to set/unset the properties in the indexing process of a Table column. To use it, create an instance, and assign it to the special attribute _v_indexprops in a table description class (see 4.12.1) or dictionary.
The public variables of IndexProps are listed below:
There are no IndexProps public methods with the exception of the constructor itself that is described next.
The parameters that can be passed to the IndexProps class constructor are:
This class is used to keep the indexing information for table columns. It is actually a descendant of the Group class, with some added functionality.
It has no methods intended for programmer's use, but it has some attributes that may be of interest.
This class is used to keep part of the indexing information for table columns. It is actually a descendant of the EArray class, with some added functionality.
It has no methods intended for programmer's use, and although it has some attributes with potentially useful information, all of it is accessible through the Index class (see 4.13.3), so it will not be replicated here.