
Module CedarBackup2.extend.mbox

Provides an extension to back up mbox email files.

Backing up email

Email folders (often stored as mbox flatfiles) are not well suited to being backed up with an incremental backup like the one offered by Cedar Backup. This is because mbox files often change on a daily basis, forcing the incremental backup process to back them up in full every day in order to avoid losing data. This can waste a great deal of space when backing up large folders. (Note that the alternative maildir format does not share this problem, since it typically uses one file per message.)

One solution to this problem is to design a smarter incremental backup process, which backs up baseline content on the first day of the week, and then backs up only new messages added to that folder on every other day of the week. This way, the backup for any single day is only as large as the messages placed into the folder on that day. The backup isn't as "perfect" as the incremental backup process, because it doesn't preserve information about messages deleted from the backed-up folder. However, it should be much more space-efficient, and in a recovery situation, it seems better to restore too much data rather than too little.
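The selection step described above can be sketched in a few lines. This is only an illustration, not the extension's own code: the extension relies on the grepmail utility, whereas this sketch swaps in Python's standard mailbox module. The function name messages_since is hypothetical, and the sketch assumes last_revision is a timezone-aware datetime.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def messages_since(mbox_path, last_revision):
    """Return the messages whose Date header is later than last_revision.

    Hypothetical helper. If last_revision is None, every message is
    returned, which corresponds to the weekly baseline (full) backup.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        selected = []
        for message in box:
            if last_revision is None:
                selected.append(message)
                continue
            header = message.get("Date")
            try:
                sent = parsedate_to_datetime(header) if header else None
            except (TypeError, ValueError):
                sent = None
            if sent is not None and sent.tzinfo is None:
                # Assumption: treat dates without a zone as UTC.
                sent = sent.replace(tzinfo=timezone.utc)
            # Undated messages are kept: restoring too much is the safer error.
            if sent is None or sent > last_revision:
                selected.append(message)
        return selected
    finally:
        box.close()
```

Calling messages_since(path, None) on the first day of the week and messages_since(path, last_backup_date) on the other days reproduces the baseline-plus-new-messages pattern.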

What is this extension?

This is a Cedar Backup extension used to back up mbox email files via the Cedar Backup command line. Individual mbox files or directories containing mbox files can be backed up using the same collect modes allowed for filesystems in the standard Cedar Backup collect action: weekly, daily, incremental. It implements the "smart" incremental backup process discussed above, using functionality provided by the grepmail utility.

This extension requires a new configuration section <mbox> and is intended to be run either immediately before or immediately after the standard collect action. Aside from its own configuration, it requires the options and collect configuration sections in the standard Cedar Backup configuration file.

The mbox action is conceptually similar to the standard collect action, except that mbox directories are not collected recursively. This implies some configuration changes (i.e. there's no need for global exclusions or an ignore file). If you back up a directory, all of the mbox files in that directory are backed up into a single tar file using the indicated compression method.
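As a rough illustration of the non-recursive directory behaviour, the sketch below archives only the regular files found directly inside a directory into one tar file. It is a simplification under stated assumptions: it covers the full-backup case only, the function name tar_mbox_dir is hypothetical, and the mode strings are those of Python's standard tarfile module rather than anything defined by Cedar Backup.

```python
import os
import tarfile


def tar_mbox_dir(directory, tar_path, mode="w:gz"):
    """Archive the regular files directly inside directory (no recursion).

    Hypothetical helper; returns the list of archived file names.
    """
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    with tarfile.open(tar_path, mode) as archive:
        for name in names:
            # arcname keeps archive entries relative to the directory.
            archive.add(os.path.join(directory, name), arcname=name)
    return names
```

Subdirectories are skipped entirely, which is why global exclusions and ignore files are not needed here.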

Author: Kenneth J. Pronovici <pronovic@ieee.org>

Classes
LocalConfig Class representing this extension's configuration document.
MboxConfig Class representing mbox configuration.
MboxDir Class representing mbox directory configuration.
MboxFile Class representing mbox file configuration.

Function Summary
  executeAction(configPath, options, config)
Executes the mbox backup action.
  _backupMboxDir(config, absolutePath, fullBackup, collectMode, compressMode, lastRevision, newRevision, excludePaths, excludePatterns)
Backs up a directory containing mbox files.
  _backupMboxFile(config, absolutePath, fullBackup, collectMode, compressMode, lastRevision, newRevision, targetDir)
Backs up an individual mbox file.
  _getBackupPath(config, mboxPath, compressMode, newRevision, targetDir)
Gets the backup file path (including correct extension) associated with an mbox path.
  _getCollectMode(local, item)
Gets the collect mode that should be used for an mbox file or directory.
  _getCompressMode(local, item)
Gets the compress mode that should be used for an mbox file or directory.
  _getExclusions(config, mboxDir)
Gets exclusions (file and patterns) associated with an mbox directory.
  _getOutputFile(backupPath, compressMode)
Opens the output file used for saving backup information.
  _getRevisionPath(config, item)
Gets the path to the revision file associated with an mbox file or directory.
  _getTarfilePath(config, mboxPath, compressMode, newRevision)
Gets the tarfile backup file path (including correct extension) associated with an mbox path.
  _loadLastRevision(config, item, fullBackup, collectMode)
Loads the last revision date for this item from disk and returns it.
  _writeNewRevision(config, item, newRevision)
Writes new revision information to disk.

Variable Summary
list GREPMAIL_COMMAND = ['grepmail']
Logger logger = <logging.Logger instance at 0x3aefcb0c>
str REVISION_PATH_EXTENSION = 'mboxlast'

Function Details

executeAction(configPath, options, config)

Executes the mbox backup action.
Parameters:
configPath - Path to configuration file on disk.
           (type=String representing a path on disk.)
options - Program command-line options.
           (type=Options object.)
config - Program configuration.
           (type=Config object.)
Raises:
ValueError - Under many generic error conditions.
IOError - If a backup could not be written for some reason.

_backupMboxDir(config, absolutePath, fullBackup, collectMode, compressMode, lastRevision, newRevision, excludePaths, excludePatterns)

Backs up a directory containing mbox files.
Parameters:
config - Cedar Backup configuration.
absolutePath - Path to mbox directory to back up.
fullBackup - Indicates whether this should be a full backup.
collectMode - Indicates the collect mode for this item.
compressMode - Compress mode of file ("none", "gzip", "bzip2")
lastRevision - Date of last backup as datetime.datetime
newRevision - Date of new (current) backup as datetime.datetime
excludePaths - List of absolute paths to exclude.
excludePatterns - List of patterns to exclude.
Raises:
ValueError - If some value is missing or invalid.
IOError - If there is a problem backing up the mbox file.

_backupMboxFile(config, absolutePath, fullBackup, collectMode, compressMode, lastRevision, newRevision, targetDir=None)

Backs up an individual mbox file.
Parameters:
config - Cedar Backup configuration.
absolutePath - Path to mbox file to back up.
fullBackup - Indicates whether this should be a full backup.
collectMode - Indicates the collect mode for this item.
compressMode - Compress mode of file ("none", "gzip", "bzip2")
lastRevision - Date of last backup as datetime.datetime
newRevision - Date of new (current) backup as datetime.datetime
targetDir - Target directory to write the backed-up file into
Raises:
ValueError - If some value is missing or invalid.
IOError - If there is a problem backing up the mbox file.

_getBackupPath(config, mboxPath, compressMode, newRevision, targetDir=None)

Gets the backup file path (including correct extension) associated with an mbox path.

If a target directory is passed in, we assume that we're backing up a directory. In that case, we just use the basename of the individual path as the output file name.
Parameters:
config - Cedar Backup configuration.
mboxPath - Path to the indicated mbox file or directory
compressMode - Compress mode to use for this mbox path
newRevision - Revision this backup path represents
targetDir - Target directory in which the path should exist
Returns:
Absolute path to the backup file associated with the mbox path.

Note: The backup path only contains the current date in YYYYMMDD format, but that's OK because the index information (stored elsewhere) is the actual date object.
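A minimal sketch of how such a path might be assembled is shown below. The page confirms only three things: the YYYYMMDD date stamp, the compression-specific extension, and the use of the basename when a target directory is given. Everything else is an assumption made for illustration, including the function name backup_path, the collect_dir parameter, the "mbox-" prefix, and the way the path is flattened.

```python
import os
from datetime import datetime

# Hypothetical extension map keyed by compress mode.
EXTENSIONS = {"none": "", "gzip": ".gz", "bzip2": ".bz2"}


def backup_path(collect_dir, mbox_path, compress_mode, new_revision, target_dir=None):
    """Build an output path for one mbox file (illustrative naming only)."""
    extension = EXTENSIONS[compress_mode]
    if target_dir is not None:
        # Directory backup: use the basename of the individual file.
        return os.path.join(target_dir, os.path.basename(mbox_path) + extension)
    stamp = new_revision.strftime("%Y%m%d")
    # Assumed scheme: flatten the path so it fits in a single file name.
    flat = mbox_path.strip(os.sep).replace(os.sep, "-")
    return os.path.join(collect_dir, "mbox-%s-%s%s" % (stamp, flat, extension))
```

Only the date, not the time, ends up in the file name, which matches the note above.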

_getCollectMode(local, item)

Gets the collect mode that should be used for an mbox file or directory. Use file- or directory-specific value if possible, otherwise take from mbox section.
Parameters:
local - LocalConfig object.
item - Mbox file or directory
Returns:
Collect mode to use.
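The fallback rule can be sketched as follows. The two dataclasses are hypothetical stand-ins for the real configuration objects, whose attribute names are not given on this page; the same pattern would apply to the compress mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MboxItem:
    """Hypothetical stand-in for an mbox file or directory entry."""
    absolute_path: str
    collect_mode: Optional[str] = None


@dataclass
class MboxSection:
    """Hypothetical stand-in for the section-wide mbox configuration."""
    collect_mode: Optional[str] = None


def get_collect_mode(section, item):
    """Prefer the item-specific mode, else fall back to the section default."""
    if item.collect_mode is not None:
        return item.collect_mode
    return section.collect_mode
```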

_getCompressMode(local, item)

Gets the compress mode that should be used for an mbox file or directory. Use file- or directory-specific value if possible, otherwise take from mbox section.
Parameters:
local - LocalConfig object.
item - Mbox file or directory
Returns:
Compress mode to use.

_getExclusions(config, mboxDir)

Gets exclusions (file and patterns) associated with an mbox directory.

The returned files value is a list of absolute paths to be excluded from the backup for a given directory. It is derived from the mbox directory's relative exclude paths.

The returned patterns value is a list of patterns to be excluded from the backup for a given directory. It is derived from the mbox directory's list of patterns.
Parameters:
config - Cedar Backup configuration.
mboxDir - Mbox directory object.
Returns:
Tuple (files, patterns) indicating what to exclude.
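A small sketch of this derivation, assuming the directory's absolute path, its relative exclude paths, and its patterns are passed in directly (the real function reads them from the mbox directory object, whose attribute names are not shown here). The name get_exclusions is hypothetical.

```python
import os


def get_exclusions(absolute_path, relative_exclude_paths, exclude_patterns):
    """Return (files, patterns) for one mbox directory.

    Relative exclude paths are resolved against the directory's own path;
    patterns are passed through unchanged.
    """
    files = [
        os.path.normpath(os.path.join(absolute_path, relative))
        for relative in (relative_exclude_paths or [])
    ]
    patterns = list(exclude_patterns or [])
    return files, patterns
```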

_getOutputFile(backupPath, compressMode)

Opens the output file used for saving backup information.

If the compress mode is "gzip", we'll open a GzipFile, and if the compress mode is "bzip2", we'll open a BZ2File. Otherwise, we'll just return an object from the normal open() method.
Parameters:
backupPath - Path to file to open.
compressMode - Compress mode of file ("none", "gzip", "bzip2").
Returns:
Output file object.
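The dispatch described above might look like the sketch below, written against Python's standard gzip and bz2 modules. The function name get_output_file and the choice to open files in binary mode are assumptions of this sketch.

```python
import bz2
import gzip


def get_output_file(backup_path, compress_mode):
    """Open backup_path for writing with the requested compression."""
    if compress_mode == "gzip":
        return gzip.GzipFile(backup_path, "wb")
    if compress_mode == "bzip2":
        return bz2.BZ2File(backup_path, "wb")
    # Any other mode (for example "none") falls back to a plain file.
    return open(backup_path, "wb")
```

All three return values support write() and close(), so the caller can treat them uniformly.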

_getRevisionPath(config, item)

Gets the path to the revision file associated with an mbox file or directory.
Parameters:
config - Cedar Backup configuration.
item - Mbox file or directory
Returns:
Absolute path to the revision file associated with the mbox file or directory.

_getTarfilePath(config, mboxPath, compressMode, newRevision)

Gets the tarfile backup file path (including correct extension) associated with an mbox path.

Along with the path, the tar archive mode is returned in a form that can be used with BackupFileList.generateTarfile.
Parameters:
config - Cedar Backup configuration.
mboxPath - Path to the indicated mbox file or directory
compressMode - Compress mode to use for this mbox path
newRevision - Revision this backup path represents
Returns:
Tuple of (absolute path to tarfile, tar archive mode)

Note: The tarfile path only contains the current date in YYYYMMDD format, but that's OK because the index information (stored elsewhere) is the actual date object.
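The pairing of path and archive mode can be sketched as below. The mode strings accepted by BackupFileList.generateTarfile are not listed on this page, so this sketch substitutes the modes of Python's standard tarfile.open. The function name, the collect_dir parameter, and the file naming scheme are likewise hypothetical.

```python
import os
from datetime import datetime

# Assumed mapping: compress mode -> (file extension, tarfile.open mode).
TAR_SETTINGS = {
    "none": (".tar", "w"),
    "gzip": (".tar.gz", "w:gz"),
    "bzip2": (".tar.bz2", "w:bz2"),
}


def tarfile_path(collect_dir, mbox_path, compress_mode, new_revision):
    """Return (absolute tarfile path, archive mode) for an mbox directory."""
    extension, archive_mode = TAR_SETTINGS[compress_mode]
    stamp = new_revision.strftime("%Y%m%d")
    # Assumed scheme: flatten the path so it fits in a single file name.
    flat = mbox_path.strip(os.sep).replace(os.sep, "-")
    name = "mbox-%s-%s%s" % (stamp, flat, extension)
    return os.path.join(collect_dir, name), archive_mode
```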

_loadLastRevision(config, item, fullBackup, collectMode)

Loads the last revision date for this item from disk and returns it.

If this is a full backup, or if the revision file cannot be loaded for some reason, then None is returned. This indicates that there is no previous revision, so the entire mail file or directory should be backed up.
Parameters:
config - Cedar Backup configuration.
item - Mbox file or directory
fullBackup - Indicates whether this is a full backup
collectMode - Indicates the collect mode for this item
Returns:
Revision date as a datetime.datetime object or None.

Note: We write the actual revision object to disk via pickle, so we don't deal with the datetime precision or format at all. Whatever's in the object is what we write.
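A minimal sketch of the loading behaviour, assuming the revision file path has already been resolved. The function name load_last_revision is hypothetical, and the collect mode parameter is left out because this page does not say how it affects the result.

```python
import logging
import pickle

logger = logging.getLogger(__name__)


def load_last_revision(revision_path, full_backup):
    """Return the pickled revision date, or None if there is none to use."""
    if full_backup:
        # A full backup ignores any earlier revision.
        return None
    try:
        with open(revision_path, "rb") as handle:
            return pickle.load(handle)
    except Exception as error:
        # Any failure means "no previous revision": back up everything.
        logger.warning("Could not load revision file %s: %s", revision_path, error)
        return None
```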

_writeNewRevision(config, item, newRevision)

Writes new revision information to disk.

If we can't write the revision file successfully for any reason, we'll log the condition but won't throw an exception.
Parameters:
config - Cedar Backup configuration.
item - Mbox file or directory
newRevision - Revision date as a datetime.datetime object.

Note: We write the actual revision object to disk via pickle, so we don't deal with the datetime precision or format at all. Whatever's in the object is what we write.
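The log-but-do-not-raise behaviour can be sketched as follows. The function name write_new_revision and the boolean return value are additions made for this sketch so that the outcome can be observed; the documented function does not state a return value.

```python
import logging
import pickle

logger = logging.getLogger(__name__)


def write_new_revision(revision_path, new_revision):
    """Pickle new_revision to revision_path; log failures, never raise."""
    try:
        with open(revision_path, "wb") as handle:
            # The datetime object is pickled as-is, so no format is chosen here.
            pickle.dump(new_revision, handle)
        return True
    except (OSError, pickle.PicklingError) as error:
        logger.error("Failed to write revision file %s: %s", revision_path, error)
        return False
```

If the write fails, the next run will simply find no usable revision file and fall back to backing up the whole item.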


Variable Details

GREPMAIL_COMMAND

Type:
list
Value:
['grepmail']                                                           

logger

Type:
Logger
Value:
<logging.Logger instance at 0x3aefcb0c>                                

REVISION_PATH_EXTENSION

Type:
str
Value:
'mboxlast'                                                             

Generated by Epydoc 2.1 on Mon Sep 4 13:49:34 2006 http://epydoc.sf.net