Last modified: February 03, 2010
Contents
Gamera provides a rudimentary framework for dealing with graph structures. While this is a clearly Python-centric library, Many of the algorithms are written in C++ so are generally faster than a pure Python approach.
This chapter assumes a basic understanding of graph theory and graph algorithms. While this document describes the API, it does not describe in detail how the algorithms work. (Hopefully this can be added in a future release of this documentation.)
Most operations on graphs in the Gamera graph library are methods on Graph objects.
Each node is identified by an arbitrary Python value. Importantly, the same value may not be assigned to more than one node within the same graph. Whenever a node is needed as an argument, either the node object or the node's value may be passed in. In most cases, it is more convenient to refer to nodes by their associated values rather than keeping track of their associated Node objects.
The following example code covers basic usage. For more detailed descriptions of the properties and methods, see the Reference section.
Let's look at some simple code to create the following trivially small graph structure:
>>> from gamera import graph
>>> from gamera import graph_util
>>>
>>> g = graph.Graph()
>>>
>>> g.add_edge('a', 'b')
1
>>> g.add_edge('a', 'c')
1
>>> g.add_edge('a', 'd')
1
>>> g.add_edge('b', 'd')
1
>>> g.add_edge('c', 'e')
1
>>> g.add_edge('f', 'g')
1
>>> g.add_edge('f', 'g')
1
>>> g.add_edge('e', 'e')
1
>>>
add_edge creates nodes that don't already exist, so add_node is not necessary to create this graph.
The number of nodes and edges can be queried:
>>> g.nnodes # Number of nodes
7
>>> g.nedges # Number of edges
8
Now, the graph can be traversed, using either a breadth-first search (BFS) or depth-first search (DFS). Each search algorithm takes a node as a starting point:
>>> # node is a Node instance. Call node() to get its value
>>> for node in g.BFS('a'):
... print node(),
...
a b c d e
>>> for node in g.DFS('a'):
... print node(),
...
a d c e b
>>> # No other nodes reachable from 'e'
>>> for node in g.BFS('e'):
... print node(),
...
e
>>> for node in g.BFS('f'):
... print node(),
...
f g
Note that the search algorithms, like many other things in the Gamera graph library, return lazy iterators, so the results are determined on demand. Importantly, this means you can not get the length of the result until it has been entirely evaluated.
>>> g.BFS('a')
<gamera.graph.Iterator object at 0xf6fa54d0>
>>> len(g.BFS('a'))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: len() of unsized object
>>> list(g.BFS('a'))
[<Node of 'a'>, <Node of 'b'>, <Node of 'c'>, <Node of 'd'>, <Node of 'e'>]
>>> [node() for node in g.BFS('a')]
['a', 'b', 'c', 'd', 'e']
>>>
Note that this graph is composed of two distinct subgraphs. There are ways to determine the number of subgraphs and iterate over the roots of those subgraphs. For instance, to do a breadth-first search of all subgraphs:
>>> g.nsubgraphs
2
>>> for root in g.get_subgraph_roots():
... for node in g.BFS(root):
... print node.data,
... print
...
a b c d e
f g
The graph can be restricted based on a number of different properties by passing flags to the constructor. These properties include directedness, cyclicalness, multi-connectedness and self-connectedness. (See Graph constructor).
For instance, let's attempt to make a tree, which is a graph that is undirected and contains no cycles, using the same graph structure:
>>> g = graph.Graph(graph.TREE)
>>> g.add_edge('a', 'b')
1
>>> g.add_edge('a', 'c')
1
>>> g.add_edge('a', 'd')
1
>>> g.add_edge('b', 'd')
0
>>> g.add_edge('c', 'e')
1
>>> g.add_edge('f', 'g')
1
>>> g.add_edge('f', 'g')
0
>>> g.add_edge('e', 'e')
0
Note that some edges were not created, since they would have violated the restrictions of a tree. This is indicated by the return value of add_edge. In this case, our graph now looks like this:
The reference section below provides a complete list of the algorithms and methods on graph objects.
Each graph is represented by instances of the Graph class. All modifications to the graph structure, including adding and removing nodes and edges, is performed through this class.
Graph (flags = FREE)
Construct a new graph.
The flags are used to set certain restrictions on the graph. When adding an edge violates one of these restrictions, the edge is not added and None is returned. Note that exceptions are not raised. The graph type may be changed at any time after creation using methods such as make_directed or make_undirected, but conversion may take some time.
The flags may be any combination of the following values (use bitwise-or to combine flags). The values of these flags are defined in the graph module. By default, all flags are True:
DIRECTED:
When True, the graph will have directed edges. Nodes will only traverse to other nodes in the direction of the edge. (Implementation detail: When False, each edge will be represented by two edges, one pointing in each direction.
CYCLIC:
When True, the graph may contain cycles. When False, edges are added to the graph only when they do not create cycles. (When False, MULTI_CONNECTED and SELF_CONNECTED are set to False.)
BLOB:
A "blob" is defined as the opposite of a tree. (When False, DIRECTED and CYCLIC will be set to False).
MULTI_CONNECTED:
When True, the graph may contain multiple edges between a single pair of nodes.
SELF_CONNECTED:
When True, the graph may contain edges that start and end at the same node.
In addition to these raw flags, there are some convenience values for common combinations of these flags.
- FREE: Equivalent to all flags being set. There are no restrictions on the graph morphology.
- TREE: Tree structure (no flags set).
- FLAG_DAG: Directed, acyclic graph.
- UNDIRECTED: Undirected, cyclic graph.
add_node (value)
Add a node identified by the given value. The newly-created node has no edges.
Returns 1 if a new node was created. Returns 0 if a node already exists with the associated value.
Complexity: Nodes are added in constant time, except when requiring a reallocation of the node vector.
add_nodes (list_of_values)
Add nodes identified by each value in a list. The newly-created nodes have no edges.
Returns the number of new nodes that were created.
Complexity: add_nodes is moderately faster than multiple calls to add_node. Nodes are added in constant time, except when requiring a reallocation of the node vector.
get_node (value)
Returns the Node object identified by the given value.
Raises a ValueError exception if there is no node associated with the given value.
get_nodes ()
Returns a lazy iterator over all nodes in the graph. The ordering of the nodes is undefined.
remove_node (value)
Remove a node identified by value from the graph, stitching together the broken edges.
For instance, given the graph:
a -> b -> c
.remove_node('b') will result in:
a -> c
Complexity: Removing a node takes O (n + e) where n is the number of nodes in the graph and e is the number of edges attached to the given node.
remove_node_and_edges (value)
Remove the node identifed by value from the graph, and remove all edges pointing inward or outward from that node.
For instance, given the graph:
a -> b -> c
.remove_node_and_edges('b') will result in:
a c
Complexity: Removing a node takes O (n + e) where n is the number of nodes in the graph and e is the number of edges attached to the given node.
add_edge (from_value, to_value, cost = 1.0, label = None)
Add an edge between the two nodes identified by from_value and to_value.
The return value is the number of edges created. If the edge violates any of the restrictions specified by the flags to the graph's constructor, the edge will not be created.
If the graph is DIRECTED, the edge goes from from_value to to_value.
If a node representing one of the values is not present, a nodes will be implicitly created.
Optionally, a cost and label can be associated with the edge. These values are used by some higher-level graph algorithms such as create_minimum_spanning_tree.
Complexity: Edges are added in constant time, except when requiring a reallocation of the edge vector, or when CYCLIC is False.
add_edges (list_of_tuples)
Add edges specified by a list of tuples of the form:
(from_value, to_value, [cost*[, *label]]).
The members of this tuple correspond to the arguments to add_edge.
The return value is the number of edges created. If an edge violates any of the restrictions specified by the flags to the graph's constructor, that edge will not be created.
If a node representing any of the values are not present, a node will be implicitly created.
Complexity: add_edges is moderately faster than multiple calls to add_edge. Edges are added in constant time, except when requiring a reallocation of the edge vector, or when CYCLIC is False.
get_edges ()
Returns an iterator over all edges in the graph. The ordering of the edges is undefined.
has_edge (from_value, to_value)
or
has_edge (from_node, to_node)
or
has_edge (edge)
Returns True if graph contains the given edge. The edge can be specified as either a pair of values identifying nodes, a pair of Node objects, or a single Edge object.
remove_edge (from_value, to_value)
Remove an edge between two nodes identified by from_value and to_value.
If the edge does not exist in the graph, a RuntimeError exception is raised.
When the graph is DIRECTED, only the edge going from from_value to to_value is removed.
If the graph is MULTI_CONNECTED, all edges from from_value to to_value are removed.
Complexity: Edges can be removed in O*(*e) time where e is the number of edges in the graph.
remove_all_edges ()
Remove all the edges in the graph, leaving all nodes as islands.
Complexity: remove_all_edges takes O ( n + e) time where n is the number of nodes in the graph and e is the number of edges in the graph.
make_directed ()
If the graph is undirected, converts it into an undirected graph by adding a complementary edge for each existing edge.
make_undirected ()
If the graph is directed, converts it into an undirected graph. Each edge in the existing graph will become a non-directional edge in the resulting graph.
is_cyclic ()
Returns True if the graph is defined as cyclic. Note that this is True even if the graph does not currently have any cycles.
make_cyclic ()
Allow the graph to include cycles from this point on. This does nothing except set the CYCLIC flag.
make_acyclic ()
Remove any cycles (using a depth-first search technique) and disallow cycles from this point on.
This may not be the most appropriate cycle-removing technique for all applications.
See create_spanning_tree for other ways to do this.
is_blob ()
Returns True if the graph is defined as being a blob (the opposite of a tree). Note that this will return True even if the graph currently conforms to the restrictions of a tree.
make_tree ()
Turns the graph into a tree by calling make_acyclic followed by make_undirected. Sets the BLOB flag to False.
This approach may not be reasonable for all applications. For other ways to convert blobs to trees, see spanning trees.
make_blob ()
Make the graph into a blob (the opposite of a tree). This does nothing except set the BLOB flag.
is_multi_connected ()
Returns True if the graph is defined as being multi-connected (i.e. multiple edges between a single pair of nodes). Note that this returns True even if there are no multi-connections in the graph.
is_singly_connected ()
Returns True if the graph is defined as being singly-connected (i.e. at most one edge between a single pair of nodes). Note that this will return False if the graph is defined as multi-connected, even if it contains no multi-connections.
make_multi_connected ()
Allow the graph to be multi-connected from this point on. This does nothing except set the MULTI_CONNECTED flag.
make_singly_connected ()
For each pair of nodes, leave only one remaining edge in either direction. Restrict the graph to being singly-connected from this point on.
is_self_connected ()
Returns True if the graph is defined as self-connected (having edges that point from one node to that same node.) Note that this returns True even if the graph does not have any self-connections.
make_self_connected ()
Allow the graph to be self-conncted from this point on. This does nothing except set the SELF_CONNECTED flag.
make_not_self_connected ()
Remove all self-connections and restrict the graph to have no self-connections from this point on.
Here, a "subgraph" is defined as a connected group of nodes. A graph contains multiple subgraphs when the graph does not have a single root node from which all other nodes can be reached.
get_subgraph_roots ()
Returns a lazy iterator over each of the subgraph roots. Performing a breadth-first or depth-first search from each of this notes will visit every node in the graph.
size_of_subgraph (value)
or
size_of_subgraph (node)
Returns the size of the subgraph rooted at the given node. In other words, this returns the number of nodes reachable from the given node.
copy (flags = FREE)
Copies a graph (optionally specifying new flags for the new graph).
In some cases, copying the graph to a new graph type may be faster than using one of the in-place conversion functions.
See Graph constructor for a definition of flags.
The number of nodes in the graph.
The number of edges in the graph.
The number of subgraphs.
BFS (value or node)
A lazy iterator that returns the nodes in breadth-first order starting from the given value or node.
DFS (value or node)
A lazy iterator that returns the nodes in depth-first order starting from the given value or node.
djikstra_shortest_path (value or node)
Calculates the shortest path from the given node to all other reachable nodes using Djikstra's algorithm.
The return value is a dictionary of paths. The keys are destination node identifiers and the values are tuples of the form
(distance, nodes)
where distance is the distance traveled from the given node to the destination node and nodes is a list of node identifiers in the shortest path to reach the destination node.
This algorithm will use the cost values associated with each edge if they are given.
djikstra_all_pairs_shortest_path ()
Calculates the shortest paths between all pairs of nodes in the graph by calling djikstra_shortest_path multiple times.
The return value is a dictionary where the keys are source node identifiers and the values are dictionaries of paths keyed by destination node identifiers (of the same form as djikstra_shortest_path). The values of the internal dictionaries are tuples of the form
(distance, nodes)
where distance is the distance traveled from the given node to the destination node and nodes is a list of node identifiers in the shortest path to reach the destination node.
This algorithm will use the cost values associated with each edge if they are given.
all_pairs_shortest_path ()
Calculates the shortest paths between all pairs of nodes in the graph using a tight-inner loop that is
generally faster than djikstra_all_pairs_shortest_path when the graph is reasonably large.
The return value is of the same form as djikstra_all_pairs_shortest_path. It is a dictionary where the keys are source node identifiers and the values are dictionaries of paths keyed by destination node identifiers (of the same form as djikstra_shortest_path). The values of the internal dictionaries are tuples of the form
(distance, nodes)
where distance is the distance traveled from the given node to the destination node and nodes is a list of node identifiers in the shortest path to reach the destination node.
This algorithm will use the cost values associated with each edge if they are given.
create_spanning_tree (value or node)
Returns a new graph which is a (probably non-optimal) spanning tree of all nodes reachable from the given node.
create_minimum_spanning_tree ()
Creates a minimum spanning tree of the entire graph in place using Kruskal's algorithm. A minimum spanning tree connects all nodes using the minimum total edge cost.
optimize_partitions (root_node, fittness_func, max_parts_per_group = 5, max_subgraph_size = 16)
A partition is defined as a way to divide a subgraph into groups. This algorithm finds an optimal partition according to the given fitness function.
- root_node
- The root node of the subgraph to be optimized.
- fitness_func
- A user-provided Python function, that given a partition as a nested list of groups, where each value is a node identifier, returns a floating-point score. Higher values indicate greater fitness.
- max_parts_per_group
- Limits the number of nodes that will be placed into a single group.
- max_subgraph_size
- If the subgraph rooted at root_node has more than max_subgraph_size nodes, the partitions will not be optimized.
Nodes contain a reference to an arbitrary Python value. Importantly, the value is used to identify the node, so the same value may not be assigned to more than one node. In many cases, it is much more convenient to refer to nodes by their associated values rather than keeping track of their associated Node objects.
The value that identifies this node. (This value can also be obtained by "calling" the node as in node()).
A lazy iterator over the edges pointing out from the node.
A lazy iterator over the nodes that can be reached directly (through a single edge) from this node.
The number of edges going out from this node.
traverse (node)
Returns the node on the other end of the edge. (Useful only for undirected graphs).
The starting node of this edge.
The ending node of this edge.
The cost associated with traversing this edge (used by algorithms such as create_minimum_spanning_tree).
The label associated with this edge.
The following functions are available in the graph_util module.
graphviz_output (graph, filename, string_function = str)
Writes a graph in the format used by the dot tool in the graphviz package for graph visualisation.
There are many ways to represent graphs in memory for different applications. Unfortunately, this library only uses one, which may not be suitable for all applications. It is as follows:
- The graph object contains arrays of all nodes and edges. Therefore, random access to nodes and edges is constant time (O(1)). Deleting nodes and edges is in linear time (O(n)).
- The graph contains a hash table from node data to nodes, so given a particular piece of data associated with a node, the node can be looked up in O(n ln n) time.
- Each node contains an array of coincident edges. If the graph is directed, this includes edges going in and out from the node.