DataManager is a simple class for managing data produced by multiple
runs of a simulation carried out in separate processes or on separate
machines. Each process is assigned a unique ID and a Python Shelf object
to write its data to. Each shelf behaves like a dictionary whose keys
must be strings. The DataManager can collate information across multiple
shelves using the get(key) method, which returns a dictionary whose keys
are the unique session names and whose values are the values written in
those sessions (typically only the values are of interest). If each
value is a tuple or list, you can use get_merged(key) to get a single
concatenated list. If the data type is more complicated, use get(key)
and merge the results by hand. The idea is that each process writes to
files whose names do not clash with those of any other process, so there
are no file concurrency issues; then, in the data analysis phase, the
data generated separately by each process is merged together.
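The write-separately, merge-later pattern can be sketched with the standard-library shelve module. This is not DataManager's own code: the helper names open_session, session_names, get and get_merged are hypothetical, and the dbm.dumb backend is an assumption chosen so that every shelf maps to a predictable <name>.dat file that can be found by globbing.

```python
import dbm.dumb
import glob
import os
import shelve
import tempfile
import uuid


def open_session(basepath):
    """Open a new shelf with a random uuid4 name, so processes never clash."""
    name = uuid.uuid4().hex
    return name, shelve.Shelf(dbm.dumb.open(os.path.join(basepath, name), "c"))


def session_names(basepath):
    """Every session under basepath, found via its '<name>.dat' file."""
    return sorted(
        os.path.splitext(os.path.basename(path))[0]
        for path in glob.glob(os.path.join(basepath, "*.dat"))
    )


def get(basepath, key):
    """Map each session name to the value it stored under key."""
    result = {}
    for name in session_names(basepath):
        path = os.path.join(basepath, name)
        with shelve.Shelf(dbm.dumb.open(path, "r")) as shelf:
            if key in shelf:
                result[name] = shelf[key]
    return result


def get_merged(basepath, key):
    """Concatenate the per-session lists or tuples stored under key."""
    merged = []
    for value in get(basepath, key).values():
        merged.extend(value)
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as basepath:
        for run in range(3):  # stand-ins for three separate processes
            _, shelf = open_session(basepath)
            shelf["spikes"] = [run, run + 10]
            shelf.close()
        print(len(get(basepath, "spikes")))            # 3
        print(sorted(get_merged(basepath, "spikes")))  # [0, 1, 2, 10, 11, 12]
```

Because the session names are random, the order of the merged list follows the (arbitrary) session order, which is why the usage sorts it before printing.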
Methods:
- get(key)
  Returns a dictionary whose keys are the session names and whose values
  are the values stored in those sessions for the given key.
- get_merged(key)
  Returns a single list formed by concatenating the values for the given
  key, provided the value in every session is a list or tuple.
- get_matching(match)
  Returns a dictionary whose keys are the keys matching match and whose
  values are get(key) for each such key. If match is a string, a key
  matches if it starts with that string. If match is a function, a key
  matches if match(key) is true.
- get_merged_matching(match)
  Like get_merged(key), but concatenated across all keys that match.
- get_flat_matching(match)
  Returns a flat list of every value session[key], for all sessions and
  all keys matching match.
- iteritems()
  Returns an iterator over all (key, value) pairs in each Shelf file
  (useful for files with too much data to be loaded into memory at once).
- itemcount()
  Returns the total number of items across all the Shelf files.
- keys()
  Returns a list of all the keys across all sessions.
- session()
  Returns a randomly named session Shelf. Because each call gets its own
  file, multiple processes can write to their own sessions without
  worrying about concurrency issues.
- computer_session()
  Returns a consistently named Shelf specific to the current user and
  computer. Only one process at a time can safely write to it.
- locking_session(), locking_computer_session()
  Return a LockingSession object, a limited proxy to the underlying Shelf
  that acquires a lock before every operation and releases it afterwards,
  making it safe for concurrent access.
- session_filenames()
  Returns a list of the shelf filenames for all sessions.
- make_unique_key()
  Generates a unique key (using uuid4) for inserting an element into a
  session without overwriting existing data.
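The matching rules for get_matching and its relatives (string prefix versus predicate function) can be illustrated on plain in-memory data. In this hypothetical sketch each session is a dict rather than a Shelf, and the free functions only mirror the documented behaviour of the DataManager methods with the same names; the ordering (sorted keys, sessions in insertion order) is an assumption.

```python
import uuid
from typing import Any, Callable, Dict, List, Union

# Assumption for brevity: a session is a plain dict, not a Shelf.
Sessions = Dict[str, Dict[str, Any]]
Match = Union[str, Callable[[str], bool]]


def _matches(match: Match, key: str) -> bool:
    """A string matches keys starting with it; a function is called on the key."""
    if isinstance(match, str):
        return key.startswith(match)
    return bool(match(key))


def keys(sessions: Sessions) -> List[str]:
    """All distinct keys across all sessions (sorted)."""
    return sorted({key for shelf in sessions.values() for key in shelf})


def get(sessions: Sessions, key: str) -> Dict[str, Any]:
    """Session name -> value, for the sessions that contain key."""
    return {name: shelf[key] for name, shelf in sessions.items() if key in shelf}


def get_matching(sessions: Sessions, match: Match) -> Dict[str, Dict[str, Any]]:
    """Matching key -> get(key)."""
    return {key: get(sessions, key) for key in keys(sessions) if _matches(match, key)}


def get_merged_matching(sessions: Sessions, match: Match) -> List[Any]:
    """Concatenate every list/tuple value stored under any matching key."""
    merged: List[Any] = []
    for per_session in get_matching(sessions, match).values():
        for value in per_session.values():
            merged.extend(value)
    return merged


def get_flat_matching(sessions: Sessions, match: Match) -> List[Any]:
    """Every value session[key], unmerged, for all sessions and matching keys."""
    return [shelf[key] for shelf in sessions.values()
            for key in shelf if _matches(match, key)]


def make_unique_key() -> str:
    """A uuid4-based key, so inserts never overwrite existing entries."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    sessions = {
        "a": {"run_1": [1, 2], "run_2": [3], "meta": "x"},
        "b": {"run_1": [4], "other": [9]},
    }
    print(get_matching(sessions, "run_"))
    # {'run_1': {'a': [1, 2], 'b': [4]}, 'run_2': {'a': [3]}}
    print(get_merged_matching(sessions, "run_"))                    # [1, 2, 4, 3]
    print(get_flat_matching(sessions, lambda k: k.endswith("_1")))  # [[1, 2], [4]]
    sessions["a"][make_unique_key()] = [5]  # added without clobbering anything
```

The difference between the last two calls is the point: get_merged_matching concatenates the lists into one, while get_flat_matching keeps each stored value as a separate element.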
Attributes:
- basepath
  The base path for data files.
- computer_name
  A (hopefully) unique identifier for the user and computer, made up of
  the username and the computer's network name.
- computer_session_filename
  The filename of the computer-specific session file. This file should
  only be accessed by one process at a time; nothing prevents concurrent
  writes from corrupting it.
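The concurrency caveat on the computer-specific session file is the situation locking_computer_session() is meant for. The following is a minimal sketch of a lock-per-operation proxy in the spirit of LockingSession; the class body, the reopen-the-shelf-on-every-call strategy and the dbm.dumb backend are assumptions, not the real implementation. A threading.Lock stands in for the lock here; writers in separate processes would instead need an inter-process lock, such as a multiprocessing.Lock shared with the worker processes.

```python
import dbm.dumb
import os
import shelve
import tempfile
import threading


class LockingSession:
    """Hypothetical proxy: each operation acquires the lock, opens the
    shelf, does its work, closes (flushing) the shelf, then releases."""

    def __init__(self, filename, lock):
        self.filename = filename
        self.lock = lock

    def _open(self):
        # Assumption: dbm.dumb backend, created on first use ("c" flag).
        return shelve.Shelf(dbm.dumb.open(self.filename, "c"))

    def __setitem__(self, key, value):
        with self.lock, self._open() as shelf:
            shelf[key] = value

    def __getitem__(self, key):
        with self.lock, self._open() as shelf:
            return shelf[key]

    def keys(self):
        with self.lock, self._open() as shelf:
            return list(shelf.keys())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as basepath:
        # "user.host" is a placeholder for a per-computer filename.
        session = LockingSession(os.path.join(basepath, "user.host"),
                                 threading.Lock())
        threads = [threading.Thread(target=session.__setitem__,
                                    args=(f"result_{i}", i * i))
                   for i in range(8)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(len(session.keys()), session["result_3"])  # 8 9
```

Reopening the shelf inside the lock means each write is flushed to disk before the next writer gets in, at the cost of an open and close per operation.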