Package CedarBackup2 :: Module util

Module util

Provides general-purpose utilities.


Author: Kenneth J. Pronovici <pronovic@ieee.org>

Classes
  AbsolutePathList
Class representing a list of absolute paths.
  ObjectTypeList
Class representing a list containing only objects with a certain type.
  RestrictedContentList
Class representing a list containing only objects with certain values.
  RegexMatchList
Class representing a list containing only strings that match a regular expression.
  RegexList
Class representing a list of valid regular expression strings.
  _Vertex
Represents a vertex (or node) in a directed graph.
  DirectedGraph
Represents a directed graph.
  PathResolverSingleton
Singleton used for resolving executable paths.
  UnorderedList
Class representing an "unordered list".
  Pipe
Specialized pipe class for use by executeCommand.
  Diagnostics
Class holding runtime diagnostic information.
Functions
 
sortDict(d)
Returns the keys of the dictionary sorted by value.
 
convertSize(size, fromUnit, toUnit)
Converts a size in one unit to a size in another unit.
 
getUidGid(user, group)
Get the uid/gid associated with a user/group pair.
 
changeOwnership(path, user, group)
Changes ownership of path to match the user and group.
 
splitCommandLine(commandLine)
Splits a command line string into a list of arguments.
 
resolveCommand(command)
Resolves the real path to a command through the path resolver mechanism.
 
executeCommand(command, args, returnOutput=False, ignoreStderr=False, doNotLog=False, outputFile=None)
Executes a shell command, hopefully in a safe way.
 
calculateFileAge(path)
Calculates the age (in days) of a file.
 
encodePath(path)
Safely encodes a filesystem path.
 
nullDevice()
Attempts to portably return the null device on this system.
 
deriveDayOfWeek(dayName)
Converts English day name to numeric day of week as from time.localtime.
 
isStartOfWeek(startingDay)
Indicates whether "today" is the backup starting day per configuration.
 
buildNormalizedPath(path)
Returns a "normalized" path based on a path name.
 
removeKeys(d, keys)
Removes all of the indicated keys from the dictionary.
 
displayBytes(bytes, digits=2)
Format a byte quantity so it can be sensibly displayed.
 
getFunctionReference(module, function)
Gets a reference to a named function.
 
isRunningAsRoot()
Indicates whether the program is running as the root user.
 
mount(devicePath, mountPoint, fsType)
Mounts the indicated device at the indicated mount point.
 
unmount(mountPoint, removeAfter=False, attempts=1, waitSeconds=0)
Unmounts whatever device is mounted at the indicated mount point.
 
deviceMounted(devicePath)
Indicates whether a specific filesystem device is currently mounted.
 
sanitizeEnvironment()
Sanitizes the operating system environment.
 
dereferenceLink(path, absolute=True)
Dereference a soft link, optionally normalizing it to an absolute path.
 
checkUnique(prefix, values)
Checks that all values are unique.
 
parseCommaSeparatedString(commaString)
Parses a list of values out of a comma-separated string.
Variables
  ISO_SECTOR_SIZE = 2048.0
Size of an ISO image sector, in bytes.
  BYTES_PER_SECTOR = 2048.0
Number of bytes (B) per ISO sector.
  BYTES_PER_KBYTE = 1024.0
Number of bytes (B) per kilobyte (kB).
  BYTES_PER_MBYTE = 1048576.0
Number of bytes (B) per megabyte (MB).
  BYTES_PER_GBYTE = 1073741824.0
Number of bytes (B) per gigabyte (GB).
  KBYTES_PER_MBYTE = 1024.0
Number of kilobytes (kB) per megabyte (MB).
  MBYTES_PER_GBYTE = 1024.0
Number of megabytes (MB) per gigabyte (GB).
  SECONDS_PER_MINUTE = 60.0
Number of seconds per minute.
  MINUTES_PER_HOUR = 60.0
Number of minutes per hour.
  HOURS_PER_DAY = 24.0
Number of hours per day.
  SECONDS_PER_DAY = 86400.0
Number of seconds per day.
  UNIT_BYTES = 0
Constant representing the byte (B) unit for conversion.
  UNIT_KBYTES = 1
Constant representing the kilobyte (kB) unit for conversion.
  UNIT_MBYTES = 2
Constant representing the megabyte (MB) unit for conversion.
  UNIT_GBYTES = 4
Constant representing the gigabyte (GB) unit for conversion.
  UNIT_SECTORS = 3
Constant representing the ISO sector unit for conversion.
  _UID_GID_AVAILABLE = True
  logger = logging.getLogger("CedarBackup2.log.util")
  outputLogger = logging.getLogger("CedarBackup2.output")
  MTAB_FILE = '/etc/mtab'
  MOUNT_COMMAND = ['mount']
  UMOUNT_COMMAND = ['umount']
  DEFAULT_LANGUAGE = 'C'
  LANG_VAR = 'LANG'
  LOCALE_VARS = ['LC_ADDRESS', 'LC_ALL', 'LC_COLLATE', 'LC_CTYPE...
Function Details

sortDict(d)

Returns the keys of the dictionary sorted by value.

There are cuter ways to do this in Python 2.4, but we were originally attempting to stay compatible with Python 2.3.

Parameters:
  • d - Dictionary to operate on
Returns:
List of dictionary keys sorted in order by dictionary value.
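On Python 2.4 or later, the "cuter way" is the key argument to the built-in sorted(). The following is a minimal sketch of the same contract, not necessarily the code CedarBackup2 ships; because sorted() is stable, keys with equal values keep the dictionary's iteration order.

```python
def sortDict(d):
    """Return the keys of d, ordered by their associated values."""
    # d.get maps each key to its value, so keys are compared by value.
    return sorted(d, key=d.get)


print(sortDict({"b": 2, "a": 3, "c": 1}))  # ['c', 'b', 'a']
```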

convertSize(size, fromUnit, toUnit)

Converts a size in one unit to a size in another unit.

This is just a convenience function so that the functionality can be implemented in just one place. Internally, we convert values to bytes and then to the final unit.

The available units are:

  • UNIT_BYTES - Bytes
  • UNIT_KBYTES - Kilobytes, where 1 kB = 1024 B
  • UNIT_MBYTES - Megabytes, where 1 MB = 1024 kB
  • UNIT_GBYTES - Gigabytes, where 1 GB = 1024 MB
  • UNIT_SECTORS - Sectors, where 1 sector = 2048 B
Parameters:
  • size (Integer or float value in units of fromUnit) - Size to convert
  • fromUnit (One of the units listed above) - Unit to convert from
  • toUnit (One of the units listed above) - Unit to convert to
Returns:
Number converted to new unit, as a float.
Raises:
  • ValueError - If one of the units is invalid.
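The "convert to bytes, then to the final unit" approach can be sketched with a lookup table built from the constants listed under Variables. This is an illustrative sketch under that assumption; the table name _BYTES_PER_UNIT is hypothetical.

```python
UNIT_BYTES = 0
UNIT_KBYTES = 1
UNIT_MBYTES = 2
UNIT_GBYTES = 4
UNIT_SECTORS = 3

# Hypothetical table: how many bytes one of each unit represents.
_BYTES_PER_UNIT = {
    UNIT_BYTES: 1.0,
    UNIT_KBYTES: 1024.0,        # BYTES_PER_KBYTE
    UNIT_MBYTES: 1048576.0,     # BYTES_PER_MBYTE
    UNIT_GBYTES: 1073741824.0,  # BYTES_PER_GBYTE
    UNIT_SECTORS: 2048.0,       # BYTES_PER_SECTOR
}


def convertSize(size, fromUnit, toUnit):
    """Convert size from fromUnit to toUnit, going through bytes."""
    if fromUnit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown 'from' unit %s." % fromUnit)
    if toUnit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown 'to' unit %s." % toUnit)
    byteValue = float(size) * _BYTES_PER_UNIT[fromUnit]  # step 1: to bytes
    return byteValue / _BYTES_PER_UNIT[toUnit]           # step 2: to target unit


print(convertSize(650, UNIT_MBYTES, UNIT_SECTORS))  # 332800.0
```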

getUidGid(user, group)

Get the uid/gid associated with a user/group pair.

This is a no-op if user/group functionality is not available on the platform.

Parameters:
  • user (User name as a string) - User name
  • group (Group name as a string) - Group name
Returns:
Tuple (uid, gid) matching passed-in user and group.
Raises:
  • ValueError - If the ownership user/group values are invalid
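On POSIX systems, the lookup can be sketched with the standard pwd and grp modules, translating their KeyError into the documented ValueError. This sketch assumes those modules are importable and does not reproduce the no-op fallback for platforms that lack them.

```python
import grp
import pwd


def getUidGid(user, group):
    """Return (uid, gid) for the named user and group (POSIX only)."""
    try:
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        # pwd/grp signal unknown names with KeyError; the documented contract is ValueError.
        raise ValueError("Unable to look up uid/gid for user %r and group %r." % (user, group))
    return (uid, gid)
```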

changeOwnership(path, user, group)

Changes ownership of path to match the user and group.

This is a no-op if user/group functionality is not available on the platform, or if either the passed-in user or group is None. Further, we won't even try to do it unless running as root, since it's unlikely to work.

Parameters:
  • path - Path whose ownership to change.
  • user - User which owns file.
  • group - Group which owns file.
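The guard order described above (missing names first, then the root check, then the actual chown) can be sketched as follows. This is an assumption-laden illustration: it uses os.geteuid() for the root check and lets unknown names surface as KeyError, whereas the real function routes the lookup through getUidGid.

```python
import grp
import os
import pwd


def changeOwnership(path, user, group):
    """Change ownership of path to user/group, skipping cases that cannot work."""
    if user is None or group is None:
        return  # Nothing to do when either name is missing.
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        return  # Not POSIX, or not root: chown would almost certainly fail.
    uid = pwd.getpwnam(user).pw_uid
    gid = grp.getgrnam(group).gr_gid
    os.chown(path, uid, gid)
```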

splitCommandLine(commandLine)

Splits a command line string into a list of arguments.

Unfortunately, there is no "standard" way to parse a command line string, and it's actually not an easy problem to solve portably (essentially, we have to emulate the shell argument-processing logic). This code only respects double quotes (") for grouping arguments, not single quotes ('). Make sure you take this into account when building your command line.

Incidentally, I found this particular parsing method while digging around in Google Groups, and I tweaked it for my own use.

Parameters:
  • commandLine (String, e.g. "cback --verbose stage store") - Command line string
Returns:
List of arguments, suitable for passing to popen2.
Raises:
  • ValueError - If the command line is None.
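A regular-expression approach matches the behaviour described above: double quotes group arguments and single quotes get no special treatment. This is a sketch of that idea, not the exact Google Groups recipe the author adapted.

```python
import re


def splitCommandLine(commandLine):
    """Split commandLine into arguments, grouping on double quotes only."""
    if commandLine is None:
        raise ValueError("Cannot split command line of None.")
    # A token is either a run of non-space, non-quote characters or a
    # double-quoted group; single quotes are treated as ordinary characters.
    fields = re.findall(r'[^\s"]+|"[^"]*"', commandLine)
    # Remove the double quotes that delimited grouped arguments.
    return [field.replace('"', "") for field in fields]


print(splitCommandLine('cback --verbose "stage store"'))  # ['cback', '--verbose', 'stage store']
```

One consequence of this particular expression: an argument like --name="a b" splits into two arguments, "--name=" and "a b".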

resolveCommand(command)

Resolves the real path to a command through the path resolver mechanism.

Both extensions and standard Cedar Backup functionality need a way to resolve the "real" location of various executables. Normally, they assume that these executables are on the system path, but some callers need to specify an alternate location.

Ideally, we want to handle this configuration in a central location. The Cedar Backup path resolver mechanism (a singleton called PathResolverSingleton) provides the central location to store the mappings. This function wraps access to the singleton, and is what all functions (extensions or standard functionality) should call if they need to find a command.

The passed-in command must actually be a list, in the standard form used by all existing Cedar Backup code (something like ["svnlook", ]). The lookup will actually be done on the first element in the list, and the returned command will always be in list form as well.

If the passed-in command can't be resolved or no mapping exists, then the command itself will be returned unchanged. This way, we neatly fall back on default behavior if we have no sensible alternative.

Parameters:
  • command (List form of command, e.g. ["svnlook", ].) - Command to resolve.
Returns:
Command in list form, with the first element replaced by its resolved path; or the command itself, unchanged, if no mapping exists.
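The lookup-on-the-first-element contract can be sketched with a plain dictionary standing in for PathResolverSingleton. The _COMMAND_PATHS mapping, its contents, and the mapping parameter are all hypothetical; the real singleton's API is not shown on this page.

```python
# Hypothetical stand-in for PathResolverSingleton: command name -> configured path.
_COMMAND_PATHS = {"svnlook": "/usr/local/bin/svnlook"}


def resolveCommand(command, mapping=_COMMAND_PATHS):
    """Return command (a list) with its first element replaced by the mapped path."""
    if not command:
        return command
    # Fall back to the original name when no mapping exists.
    resolved = mapping.get(command[0], command[0])
    return [resolved] + list(command[1:])


print(resolveCommand(["svnlook", "youngest"]))  # ['/usr/local/bin/svnlook', 'youngest']
```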

executeCommand(command, args, returnOutput=False, ignoreStderr=False, doNotLog=False, outputFile=None)

Executes a shell command, hopefully in a safe way.

This function exists to replace direct calls to os.popen in the Cedar Backup code. It's not safe to call a function such as os.popen() with untrusted arguments, since that can cause problems if the string contains non-safe variables or other constructs (imagine that the argument is $WHATEVER, but $WHATEVER contains something like "; rm -fR ~/; echo" in the current environment).

Instead, it's safer to pass a list of arguments in the style supported by popen2 or popen4. This function actually uses a specialized Pipe class implemented using either subprocess.Popen or popen2.Popen4.

In the normal case, this function will return a tuple of (status, None) where the status is the wait-encoded return status of the call per the popen2.Popen4 documentation. If returnOutput is passed in as True, the function will return a tuple of (status, output) where output is a list of strings, one entry per line in the output from the command. Output is always logged to the outputLogger.info() target, regardless of whether it's returned.

By default, stdout and stderr will be intermingled in the output. However, if you pass in ignoreStderr=True, then only stdout will be included in the output.

The doNotLog parameter exists so that callers can force the function to not log command output to the debug log. Normally, you would want to log. However, if you're using this function to write huge output files (e.g. database backups written to stdout) then you might want to avoid putting all that information into the debug log.

The outputFile parameter exists to make it easier for a caller to push output into a file, i.e. as a substitute for redirection to a file. If this value is passed in, each time a line of output is generated, it will be written to the file using outputFile.write(). At the end, the file descriptor will be flushed using outputFile.flush(). The caller maintains responsibility for closing the file object appropriately.

Parameters:
  • command (List of individual arguments that make up the command) - Shell command to execute
  • args (List of additional arguments to the command) - List of arguments to the command
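  • returnOutput (Boolean True or False) - Indicates whether to return the output of the command
  • ignoreStderr (Boolean True or False) - Indicates whether stderr should be excluded from the output
  • doNotLog (Boolean True or False) - Indicates that output should not be logged.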
  • outputFile (File object as returned from open() or file().) - File object that all output should be written to.
Returns:
Tuple of (result, output) as described above.
Notes:
  • I know that it's a bit confusing that the command and the arguments are both lists. I could have just required the caller to pass in one big list. However, I think it makes some sense to keep the command (the constant part of what we're executing, e.g. "scp -B") separate from its arguments, even if they both end up looking kind of similar.
  • You cannot redirect output via shell constructs (e.g. >file, 2>/dev/null, etc.) using this function. The redirection string would be passed to the command just like any other argument. However, you can implement the equivalent to redirection using ignoreStderr and outputFile, as discussed above.
  • The operating system environment is partially sanitized before the command is invoked. See sanitizeEnvironment for details.
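The list-based, shell-free execution model can be sketched with Python 3's subprocess.Popen. This is a simplified illustration, not the Pipe-based implementation: it returns Popen's plain exit code instead of a wait-encoded status, does not call sanitizeEnvironment, and assumes that doNotLog suppresses the per-line log call.

```python
import logging
import subprocess
import sys

outputLogger = logging.getLogger("CedarBackup2.output")


def executeCommand(command, args, returnOutput=False, ignoreStderr=False,
                   doNotLog=False, outputFile=None):
    """Run command + args without a shell; return (status, output-or-None)."""
    fields = list(command) + list(args)
    # Merge stderr into stdout unless the caller asked to drop it.
    stderr = subprocess.DEVNULL if ignoreStderr else subprocess.STDOUT
    proc = subprocess.Popen(fields, stdout=subprocess.PIPE, stderr=stderr,
                            universal_newlines=True)
    output = []
    for line in proc.stdout:
        if not doNotLog:
            outputLogger.info(line.rstrip("\n"))
        if outputFile is not None:
            outputFile.write(line)  # stand-in for ">file" redirection
        if returnOutput:
            output.append(line)
    proc.stdout.close()
    status = proc.wait()  # plain exit code, not a wait-encoded status
    if outputFile is not None:
        outputFile.flush()  # the caller remains responsible for closing it
    return (status, output if returnOutput else None)


# The argument reaches the child verbatim; no shell ever expands "$HOME" or runs "echo".
print(executeCommand([sys.executable, "-c", "import sys; print(sys.argv[1])"],
                     ["$HOME; echo oops"], returnOutput=True))
```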

calculateFileAge(path)

Calculates the age (in days) of a file.

The "age" of a file is the amount of time since the file was last used, per the most recent of the file's st_atime and st_mtime values.

Technically, we only intend this function to work with files, but it will probably work with anything on the filesystem.

Parameters:
  • path - Path to a file on disk.
Returns:
Age of the file in days (possibly fractional).
Raises:
  • OSError - If the file doesn't exist.
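The calculation amounts to "now minus the later of st_atime and st_mtime, divided by SECONDS_PER_DAY". A minimal sketch of that arithmetic, not the shipped code:

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def calculateFileAge(path):
    """Return days since the later of path's last access and last modification."""
    info = os.stat(path)  # raises OSError if path does not exist
    lastUse = max(info.st_atime, info.st_mtime)
    return (time.time() - lastUse) / SECONDS_PER_DAY
```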

encodePath(path)

Safely encodes a filesystem path.

Many Python filesystem functions, such as os.listdir, behave differently if they are passed unicode arguments versus simple string arguments. For instance, os.listdir generally returns unicode path names if it is passed a unicode argument, and string pathnames if it is passed a string argument.

However, this behavior often isn't as consistent as we might like. As an example, os.listdir "gives up" if it finds a filename that it can't properly encode given the current locale settings. This means that the returned list is a mixed set of unicode and simple string paths. This has consequences later, because other filesystem functions like os.path.join will blow up if they are given one string path and one unicode path.

On comp.lang.python, Martin v. Löwis explained the os.listdir behavior like this:

  The operating system (POSIX) does not have the inherent notion that file
  names are character strings. Instead, in POSIX, file names are primarily
  byte strings. There are some bytes which are interpreted as characters
  (e.g. '\x2e', which is '.', or '\x2f', which is '/'), but apart from
  that, most OS layers think these are just bytes.

  Now, most *people* think that file names are character strings.  To
  interpret a file name as a character string, you need to know what the
  encoding is to interpret the file names (which are byte strings) as
  character strings.

  There is, unfortunately, no operating system API to carry the notion of a
  file system encoding. By convention, the locale settings should be used
  to establish this encoding, in particular the LC_CTYPE facet of the
  locale. This is defined in the environment variables LC_CTYPE, LC_ALL,
  and LANG (searched in this order).

  If LANG is not set, the "C" locale is assumed, which uses ASCII as its
  file system encoding. In this locale, '\xe2\x99\xaa\xe2\x99\xac' is not a
  valid file name (at least it cannot be interpreted as characters, and
  hence not be converted to Unicode).

  Now, your Python script has requested that all file names *should* be
  returned as character (ie. Unicode) strings, but Python cannot comply,
  since there is no way to find out what this byte string means, in terms
  of characters.

  So we have three options:

  1. Skip this string, only return the ones that can be converted to Unicode. 
     Give the user the impression the file does not exist.
  2. Return the string as a byte string
  3. Refuse to listdir altogether, raising an exception (i.e. return nothing)

  Python has chosen alternative 2, allowing the application to implement 1
  or 3 on top of that if it wants to (or come up with other strategies,
  such as user feedback).

As a solution, he suggests that, rather than passing unicode paths into the filesystem functions, I should sensibly encode the path first. That is what this function accomplishes. Any function which takes a filesystem path as an argument should encode it first, before using it for any other purpose.

I confess I still don't completely understand how this works. On a system with filesystem encoding "ISO-8859-1", a path u"\xe2\x99\xaa\xe2\x99\xac" is converted into the string "\xe2\x99\xaa\xe2\x99\xac". However, on a system with a "utf-8" encoding, the result is a completely different string: "\xc3\xa2\xc2\x99\xc2\xaa\xc3\xa2\xc2\x99\xc2\xac". A quick test where I write to the first filename and open the second proves that the two strings represent the same file on disk, which is all I really care about.

Parameters:
  • path - Path to encode
Returns:
Path, as a string, encoded appropriately
Raises:
  • ValueError - If the path cannot be encoded properly.
Notes:
  • As a special case, if path is None, then this function will return None.
  • To provide several examples of encoding values: my Debian sarge box with an ext3 filesystem has Python filesystem encoding ISO-8859-1. User Anarcat's Debian box with an xfs filesystem has filesystem encoding ANSI_X3.4-1968. My iBook G4 running Mac OS X 10.4 and user Dag Rende's SuSE 9.3 box both have filesystem encoding UTF-8.
  • Just because a filesystem has UTF-8 encoding doesn't mean that it will be able to handle all extended-character filenames. For instance, certain extended-character (but not UTF-8) filenames -- like the ones in the regression test tar file test/data/tree13.tar.gz -- are not valid under Mac OS X, and it's not even possible to extract them from the tarfile on that platform.
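The encoding step can be sketched as "encode text with the filesystem encoding, pass byte strings through, and turn encoding failures into ValueError". The encoding parameter below is a hypothetical addition so that the ISO-8859-1 and UTF-8 results quoted above can be reproduced on any machine; under Python 3 the result is bytes rather than a Python 2 str.

```python
import sys


def encodePath(path, encoding=None):
    """Encode a text path to bytes using the filesystem (or a given) encoding."""
    if path is None:
        return None  # documented special case
    if isinstance(path, bytes):
        return path  # already a byte string; nothing to encode
    if encoding is None:  # hypothetical override, defaults to the real behaviour
        encoding = sys.getfilesystemencoding() or "ascii"
    try:
        return path.encode(encoding)
    except UnicodeError:
        raise ValueError("Path could not be safely encoded as %s." % encoding)


name = u"\xe2\x99\xaa\xe2\x99\xac"
print(encodePath(name, "iso-8859-1"))  # b'\xe2\x99\xaa\xe2\x99\xac'
print(encodePath(name, "utf-8"))       # b'\xc3\xa2\xc2\x99\xc2\xaa\xc3\xa2\xc2\x99\xc2\xac'
```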

nullDevice()

Attempts to portably return the null device on this system.

The null device is something like /dev/null on a UNIX system. The name varies on other platforms.

deriveDayOfWeek(dayName)

Converts English day name to numeric day of week as from time.localtime.

For instance, the day "monday" would be converted to the number 0.

Parameters:
  • dayName (string, e.g. "monday", "tuesday", etc.) - Day of week to convert
Returns:
Integer, where Monday is 0 and Sunday is 6; or -1 if no conversion is possible.
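Because time.localtime().tm_wday counts Monday as 0 and Sunday as 6, the conversion reduces to an index lookup in an ordered tuple of names. This is a sketch; the case-insensitive match is an assumption.

```python
# Order matches time.localtime().tm_wday: Monday is 0, Sunday is 6.
DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")


def deriveDayOfWeek(dayName):
    """Map an English day name to its tm_wday number, or -1 if unrecognized."""
    try:
        return DAY_NAMES.index(dayName.lower())  # assumption: case-insensitive
    except (AttributeError, ValueError):
        return -1  # None or an unknown name
```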

isStartOfWeek(startingDay)

Indicates whether "today" is the backup starting day per configuration.

If the current day's English name matches the indicated starting day, then today is a starting day.

Parameters:
  • startingDay (string, e.g. "monday", "tuesday", etc.) - Configured starting day.
Returns:
Boolean indicating whether today is the starting day.
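One way to get today's English name without depending on the current locale is to index a fixed name tuple with tm_wday instead of formatting with strftime("%A"). This is a sketch of that choice, which the original may or may not make.

```python
import time

# Index 0 is Monday, matching time.localtime().tm_wday.
DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")


def isStartOfWeek(startingDay):
    """Return True if today's English day name matches startingDay."""
    today = DAY_NAMES[time.localtime().tm_wday]
    return startingDay is not None and startingDay.lower() == today
```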

buildNormalizedPath(path)

Returns a "normalized" path based on a path name.

A normalized path is a representation of a path that is also a valid file name. To make a valid file name out of a complete path, we have to convert or remove some characters that are significant to the filesystem -- in particular, the path separator and any leading '.' character (which would cause the file to be hidden in a file listing).

Note that this is a one-way transformation -- you can't safely derive the original path from the normalized path.

To normalize a path, we begin by looking at the first character. If the first character is '/' or '\', it gets removed. If the first character is '.', it gets converted to '_'. Then, we look through the rest of the path and convert all remaining '/' or '\' characters to '-', and all remaining whitespace characters to '_'.

As a special case, a path consisting only of a single '/' or '\' character will be converted to '-'.

Parameters:
  • path - Path to normalize
Returns:
Normalized path as described above.
Raises:
  • ValueError - If the path is None
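The rules above translate almost line for line into code. This is a sketch of those rules as written, not the shipped implementation; slicing path[:1] is a defensive choice so that an empty string passes through instead of raising.

```python
import re


def buildNormalizedPath(path):
    """Turn a path into a single valid file name, per the rules above."""
    if path is None:
        raise ValueError("Cannot normalize path None.")
    if path in ("/", "\\"):
        return "-"  # special case: a lone separator
    if path[:1] in ("/", "\\"):
        path = path[1:]  # drop a leading separator
    elif path[:1] == ".":
        path = "_" + path[1:]  # avoid creating a hidden file
    path = re.sub(r"[/\\]", "-", path)  # remaining separators become '-'
    return re.sub(r"\s", "_", path)     # whitespace becomes '_'


print(buildNormalizedPath("/opt/backup/data"))  # opt-backup-data
```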

removeKeys(d, keys)

Removes all of the indicated keys from the dictionary. The dictionary is altered in-place. Each key must exist in the dictionary.

Parameters:
  • d - Dictionary to operate on
  • keys - List of keys to remove
Raises:
  • KeyError - If one of the keys does not exist

displayBytes(bytes, digits=2)

Format a byte quantity so it can be sensibly displayed.

It's rather difficult to look at a number like "72372224 bytes" and get any meaningful information out of it. It would be more useful to see something like "69.02 MB". That's what this function does. Any time you would otherwise display a byte value like this:

  print "Size: %s bytes" % bytes

Call this function instead:

  print "Size: %s" % displayBytes(bytes)

What comes out will be sensibly formatted. The indicated number of digits will be listed after the decimal point, rounded based on whatever rules are used by Python's standard %f string format specifier. (Values less than 1 kB will be listed in bytes and will not have a decimal point, since the concept of a fractional byte is nonsensical.)

Parameters:
  • bytes (Integer number of bytes.) - Byte quantity.
  • digits (Integer value, typically 2-5.) - Number of digits to display after the decimal point.
Returns:
String, formatted for sensible display.
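The formatting can be sketched as "pick the largest unit the value reaches, then format with %.*f". The unit labels ("bytes", "kB", "MB", "GB") are assumptions based on the "69.02 MB" example above; the parameter keeps the documented name bytes even though it shadows the built-in.

```python
BYTES_PER_KBYTE = 1024.0
BYTES_PER_MBYTE = 1048576.0
BYTES_PER_GBYTE = 1073741824.0


def displayBytes(bytes, digits=2):
    """Format a byte count with a kB/MB/GB suffix and `digits` decimal places."""
    if bytes < BYTES_PER_KBYTE:
        return "%d bytes" % bytes  # no fractional bytes
    if bytes < BYTES_PER_MBYTE:
        value, unit = bytes / BYTES_PER_KBYTE, "kB"
    elif bytes < BYTES_PER_GBYTE:
        value, unit = bytes / BYTES_PER_MBYTE, "MB"
    else:
        value, unit = bytes / BYTES_PER_GBYTE, "GB"
    # "%.*f" takes the precision from the argument tuple.
    return "%.*f %s" % (digits, value, unit)


print(displayBytes(72372224))  # 69.02 MB
```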

getFunctionReference(module, function)

Gets a reference to a named function.

This does some hokey-pokey to get back a reference to a dynamically named function. For instance, say you wanted to get a reference to the os.path.isdir function. You could use:

  myfunc = getFunctionReference("os.path", "isdir")

Although we won't bomb out directly, behavior is pretty much undefined if you pass in None or "" for either module or function.

The only validation we enforce is that whatever we get back must be callable.

I derived this code based on the internals of the Python unittest implementation. I don't claim to completely understand how it works.

Parameters:
  • module (Something like "os.path" or "CedarBackup2.util") - Name of module associated with function.
  • function (Something like "isdir" or "getUidGid") - Name of function
Returns:
Reference to function associated with name.
Raises:
  • ImportError - If the function cannot be found.
  • ValueError - If the resulting reference is not callable.

Copyright: Some of this code, prior to customization, was originally part of the Python 2.3 codebase. Python code is copyright (c) 2001, 2002 Python Software Foundation; All Rights Reserved.
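The same contract can be sketched with importlib.import_module (Python 2.7 and later) in place of the unittest-derived __import__ logic; this swaps in a different technique from the one described above. A missing attribute is reported as ImportError to match the documented behaviour.

```python
import importlib


def getFunctionReference(module, function):
    """Import `module` by dotted name and return its callable attribute `function`."""
    mod = importlib.import_module(module)  # raises ImportError if the module is missing
    try:
        ref = getattr(mod, function)
    except AttributeError:
        raise ImportError("Module %s has no function %s." % (module, function))
    if not callable(ref):
        raise ValueError("Reference %s.%s is not callable." % (module, function))
    return ref


myfunc = getFunctionReference("os.path", "isdir")
```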

mount(devicePath, mountPoint, fsType)

Mounts the indicated device at the indicated mount point.

For instance, to mount a CD, you might use device path /dev/cdrw, mount point /media/cdrw and filesystem type iso9660. You can safely use any filesystem type that is supported by mount on your platform. If the type is None, we'll attempt to let mount auto-detect it. This may not work on all systems.

Parameters:
  • devicePath - Path of device to be mounted.
  • mountPoint - Path that device should be mounted at.
  • fsType - Type of the filesystem assumed to be available via the device.
Raises:
  • IOError - If the device cannot be mounted.

Note: This only works on platforms that have a concept of "mounting" a filesystem through a command-line "mount" command, like UNIXes. It won't work on Windows.
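A sketch of how the mount invocation might be assembled, assuming the standard "mount -t TYPE DEVICE DIR" form and omitting -t when fsType is None so mount can auto-detect. The buildMountCommand helper is hypothetical; it is split out so the argument list can be inspected without actually mounting anything.

```python
import subprocess

MOUNT_COMMAND = ["mount"]


def buildMountCommand(devicePath, mountPoint, fsType):
    """Hypothetical helper: build the argument list for mounting devicePath."""
    args = list(MOUNT_COMMAND)
    if fsType is not None:
        args += ["-t", fsType]  # leave -t off so mount can try to auto-detect
    return args + [devicePath, mountPoint]


def mount(devicePath, mountPoint, fsType):
    """Mount devicePath at mountPoint, raising IOError on failure."""
    status = subprocess.call(buildMountCommand(devicePath, mountPoint, fsType))
    if status != 0:
        raise IOError("Error [%d] mounting %s at %s." % (status, devicePath, mountPoint))


print(buildMountCommand("/dev/cdrw", "/media/cdrw", "iso9660"))
# ['mount', '-t', 'iso9660', '/dev/cdrw', '/media/cdrw']
```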

unmount(mountPoint, removeAfter=False, attempts=1, waitSeconds=0)

Unmounts whatever device is mounted at the indicated mount point.

Sometimes, it might not be possible to unmount the mount point immediately, if there are still files open there. Use the attempts and waitSeconds arguments to indicate how many unmount attempts to make and how many seconds to wait between attempts. If you pass in zero attempts, no attempts will be made (duh).

If the indicated mount point is not really a mount point per os.path.ismount(), then it will be ignored. This seems to be a safer check than looking through /etc/mtab, since ismount() is already in the Python standard library and is documented as working on all POSIX systems.

If removeAfter is True, then the mount point will be removed using os.rmdir() after the unmount action succeeds. If for some reason the mount point is not a directory, then it will not be removed.

Parameters:
  • mountPoint - Mount point to be unmounted.
  • removeAfter - Remove the mount point after unmounting it.
  • attempts - Number of times to attempt the unmount.
  • waitSeconds - Number of seconds to wait between repeated attempts.
Raises:
  • IOError - If the mount point is still mounted after attempts are exhausted.

Note: This only works on platforms that have a concept of "mounting" a filesystem through a command-line "mount" command, like UNIXes. It won't work on Windows.
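The retry behaviour can be sketched as a bounded loop that re-checks os.path.ismount() after each umount call and sleeps between attempts. This sketch assumes that removeAfter is also skipped when the path was never a mount point, and it raises IOError if zero attempts leave the mount in place.

```python
import os
import subprocess
import time

UMOUNT_COMMAND = ["umount"]


def unmount(mountPoint, removeAfter=False, attempts=1, waitSeconds=0):
    """Unmount mountPoint, retrying up to `attempts` times; optionally rmdir it."""
    if not os.path.ismount(mountPoint):
        return  # Not a mount point: ignored entirely (assumption: no removal either).
    for attempt in range(attempts):
        subprocess.call(UMOUNT_COMMAND + [mountPoint])
        if not os.path.ismount(mountPoint):
            break
        if attempt < attempts - 1:
            time.sleep(waitSeconds)  # give open files a chance to close
    if os.path.ismount(mountPoint):
        raise IOError("Unable to unmount %s after %d attempt(s)." % (mountPoint, attempts))
    if removeAfter and os.path.isdir(mountPoint):
        os.rmdir(mountPoint)
```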

deviceMounted(devicePath)

Indicates whether a specific filesystem device is currently mounted.

We determine whether the device is mounted by looking through the system's mtab file. This file shows every currently-mounted filesystem, ordered by device. We only do the check if the mtab file exists and is readable. Otherwise, we assume that the device is not mounted.

Parameters:
  • devicePath - Path of device to be checked
Returns:
True if device is mounted, False otherwise.

Note: This only works on platforms that have a concept of an mtab file to show mounted volumes, like UNIXes. It won't work on Windows.
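Each mtab line starts with the device field, so the check can be sketched as a scan of the first whitespace-separated field on every line. The mtabFile parameter is a hypothetical addition so the sketch can be exercised against a temporary file instead of /etc/mtab.

```python
import os

MTAB_FILE = "/etc/mtab"


def deviceMounted(devicePath, mtabFile=MTAB_FILE):
    """Return True if devicePath is the first field of any line in mtabFile."""
    if not (os.path.exists(mtabFile) and os.access(mtabFile, os.R_OK)):
        return False  # missing or unreadable mtab: assume not mounted
    with open(mtabFile) as mtab:
        for line in mtab:
            fields = line.split()
            if fields and fields[0] == devicePath:
                return True
    return False
```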

sanitizeEnvironment()

Sanitizes the operating system environment.

The operating system environment is contained in os.environ. This method sanitizes the contents of that dictionary.

Currently, all it does is reset the locale (removing $LC_*) and set the default language ($LANG) to DEFAULT_LANGUAGE. This way, we can count on consistent localization regardless of what the end-user has configured. This is important for code that needs to parse program output.

The os.environ dictionary is modified in-place. If $LANG is already set to the proper value, it is not re-set, so we can avoid the memory leaks that are documented to occur on BSD-based systems.

Returns:
Copy of the sanitized environment.
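Because the LOCALE_VARS list is truncated on this page, this sketch removes every variable whose name starts with "LC_" as a stand-in; that rule, and the environ parameter that lets the sketch run against an ordinary dictionary, are assumptions. LANG is written only when its value would actually change, as described above.

```python
import os

DEFAULT_LANGUAGE = "C"
LANG_VAR = "LANG"


def sanitizeEnvironment(environ=None):
    """Strip LC_* settings, force LANG to DEFAULT_LANGUAGE, and return a copy."""
    if environ is None:
        environ = os.environ
    for name in list(environ.keys()):
        if name.startswith("LC_"):  # assumption: stands in for LOCALE_VARS
            del environ[name]  # modified in place
    if environ.get(LANG_VAR) != DEFAULT_LANGUAGE:
        environ[LANG_VAR] = DEFAULT_LANGUAGE  # only set when it would change
    return dict(environ)
```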

dereferenceLink(path, absolute=True)

Dereference a soft link, optionally normalizing it to an absolute path.

Parameters:
  • path - Path of link to dereference
  • absolute - Whether to normalize the result to an absolute path
Returns:
Dereferenced path, or original path if original is not a link.
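A link's target may be relative to the directory that contains the link, so normalizing it means joining it with that directory before calling os.path.abspath(). This is a sketch; returning the raw os.readlink() value when absolute is False is an assumption.

```python
import os


def dereferenceLink(path, absolute=True):
    """Return the target of symlink `path`, or `path` itself if it is not a link."""
    if not os.path.islink(path):
        return path
    target = os.readlink(path)
    if absolute:
        # A relative target is interpreted relative to the link's own directory.
        target = os.path.abspath(os.path.join(os.path.dirname(path), target))
    return target
```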

checkUnique(prefix, values)

Checks that all values are unique.

The values list is checked for duplicate values. If there are duplicates, an exception is raised. All duplicate values are listed in the exception.

Parameters:
  • prefix - Prefix to use in the raised exception
  • values - List of values to check
Raises:
  • ValueError - If there are duplicates in the list
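A sketch of the check that collects each repeated value once, in first-seen order, and reports them together. The "prefix, then list" message format is an assumption, and lists are used instead of sets so unhashable values also work (at quadratic cost).

```python
def checkUnique(prefix, values):
    """Raise ValueError listing every value that appears more than once."""
    seen = []
    duplicates = []
    for value in values:
        if value in seen:
            if value not in duplicates:
                duplicates.append(value)  # report each duplicate only once
        else:
            seen.append(value)
    if duplicates:
        raise ValueError("%s %s" % (prefix, duplicates))
```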

parseCommaSeparatedString(commaString)

Parses a list of values out of a comma-separated string.

The items in the list are split by comma, and then have whitespace stripped. As a special case, if commaString is None, then None will be returned.

Parameters:
  • commaString - List of values in comma-separated string format.
Returns:
Values from commaString split into a list, or None.
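The split-then-strip behaviour is a one-line list comprehension. In this sketch, empty items (as in "a,,b") are kept as empty strings, which is an assumption because the description does not say whether they are filtered out.

```python
def parseCommaSeparatedString(commaString):
    """Split on commas and strip whitespace from each item; None passes through."""
    if commaString is None:
        return None
    return [item.strip() for item in commaString.split(",")]


print(parseCommaSeparatedString(" collect, stage ,store"))  # ['collect', 'stage', 'store']
```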

Variables Details

LOCALE_VARS

Value:
['LC_ADDRESS',
 'LC_ALL',
 'LC_COLLATE',
 'LC_CTYPE',
 'LC_IDENTIFICATION',
 'LC_MEASUREMENT',
 'LC_MESSAGES',
 'LC_MONETARY',
...