Package logilab-common-0 :: Package 39 :: Package 0 :: Module textutils

Module textutils

Some text manipulation utility functions.

:author:    Logilab
:copyright: 2003-2008 LOGILAB S.A. (Paris, FRANCE), all rights reserved.
:contact: http://www.logilab.fr/ -- mailto:contact@logilab.fr
:license: General Public License version 2 - http://www.gnu.org/licenses

:group text formatting: normalize_text, normalize_paragraph, pretty_match, unquote, colorize_ansi
:group text manipulation: searchall, get_csv
:sort: text formatting, text manipulation

:type ANSI_STYLES: dict(str)
:var ANSI_STYLES: dictionary mapping style identifier to ANSI terminal code

:type ANSI_COLORS: dict(str)
:var ANSI_COLORS: dictionary mapping color identifier to ANSI terminal code

:type ANSI_PREFIX: str
:var ANSI_PREFIX:
  ANSI terminal code marking the start of an ANSI escape sequence

:type ANSI_END: str
:var ANSI_END:
  ANSI terminal code marking the end of an ANSI escape sequence

:type ANSI_RESET: str
:var ANSI_RESET:
  ANSI terminal code resetting the format defined by a previous ANSI escape sequence

Functions
 
unormalize(ustring, ignorenonascii=False)
replace diacritical characters with their corresponding ascii characters...
 
unquote(string)
remove optional quotes (simple or double) from the string
 
normalize_text(text, line_len=80, indent='', rest=False)
normalize a text to display it with a maximum line size and optionally arbitrary indentation.
 
normalize_paragraph(text, line_len=80, indent='')
normalize a text to display it with a maximum line size and optionally arbitrary indentation.
 
normalize_rest_paragraph(text, line_len=80, indent='')
normalize a ReST text to display it with a maximum line size and optionally arbitrary indentation.
 
splittext(text, line_len)
split the given text on space according to the given max line size
 
get_csv(string, sep=',')
return a list of strings from a csv formatted line
 
apply_units(string, units, inter=None, final=float, blank_reg=_BLANK_RE, value_reg=_VALUE_RE)
Parse the string applying the units defined in units (e.g. "1.5m", {'m': 60} -> 90).
 
pretty_match(match, string, underline_char='^')
return a string with the match location underlined:
 
colorize_ansi(msg, color=None, style=None)
colorize message by wrapping it with ansi escape codes
Variables
  linesep = '\n'
  MANUAL_UNICODE_MAP = {u'\xa1': u'!', u'\u0142': u'l', u'\u2044...
  BYTE_UNITS = {"B": 1, "KB": 1024, "MB": 1024** 2, "GB": 1024**...
  TIME_UNITS = {"ms": 0.0001, "s": 1, "min": 60, "h": 60* 60, "d...
  ANSI_PREFIX = '\033['
  ANSI_END = 'm'
  ANSI_RESET = '\033[0m'
  ANSI_STYLES = {'reset': "0", 'bold': "1", 'italic': "3", 'unde...
  ANSI_COLORS = {'reset': "0", 'black': "30", 'red': "31", 'gree...
Function Details

unormalize(ustring, ignorenonascii=False)

replace diacritical characters with their corresponding ascii characters
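
The replacement described above can be approximated with the standard `unicodedata` module. This is only a sketch under stated assumptions, not the module's actual code: `unormalize_sketch` and `MANUAL_MAP_SKETCH` are hypothetical names, the map holds just a few of the `MANUAL_UNICODE_MAP` entries listed on this page, and the `ignorenonascii` parameter is not modelled.

```python
import unicodedata

# Hypothetical subset of MANUAL_UNICODE_MAP: characters that have no
# Unicode decomposition and therefore need an explicit replacement.
MANUAL_MAP_SKETCH = {u'\xe6': u'ae', u'\xdf': u'ss', u'\u0142': u'l', u'\xf8': u'o'}


def unormalize_sketch(ustring):
    """Approximate unormalize: map each character to an ASCII-like base."""
    result = []
    for char in ustring:
        if char in MANUAL_MAP_SKETCH:
            result.append(MANUAL_MAP_SKETCH[char])
            continue
        # NFKD splits an accented letter into base letter + combining marks.
        decomposed = unicodedata.normalize('NFKD', char)
        # Keep the base characters, drop the combining marks.
        result.append(u''.join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return u''.join(result)
```

Characters that are neither decomposable nor in the manual map pass through unchanged in this sketch; the real function may treat them differently.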
    

unquote(string)

remove optional quotes (simple or double) from the string

:type string: str or unicode
:param string: an optionally quoted string

:rtype: str or unicode
:return: the unquoted string (or the input string if it wasn't quoted)
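
A minimal sketch of the behaviour described above. The name `unquote_sketch` is hypothetical, and the sketch assumes that a leading and a trailing quote are each stripped independently; the real function may behave differently on unbalanced quotes.

```python
def unquote_sketch(string):
    """Strip one optional leading and one optional trailing quote."""
    if not string:
        return string
    if string[0] in '"\'':
        string = string[1:]
    # Re-check emptiness: the input may have been a single quote character.
    if string and string[-1] in '"\'':
        string = string[:-1]
    return string
```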

normalize_text(text, line_len=80, indent='', rest=False)

normalize a text to display it with a maximum line size and
optionally arbitrary indentation. Line breaks are normalized but blank
lines are kept. The indentation string may be used to insert a
comment (#) or quoting (>) mark, for instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len`, optionally prefixed by an indentation string
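
The effect described above (wrap each paragraph, keep paragraph breaks, prefix every line) can be illustrated with the standard `textwrap` module. This is a sketch under assumptions rather than the module's implementation: `normalize_text_sketch` is a hypothetical name, the `rest` flag is ignored, the indentation is assumed to count toward `line_len`, and paragraphs are rejoined with a single empty line.

```python
import re
import textwrap


def normalize_text_sketch(text, line_len=80, indent=''):
    """Wrap every paragraph of `text` to `line_len`, prefixing each line."""
    wrapped = []
    # One or more blank lines separate paragraphs.
    for paragraph in re.split(r'\n\s*\n', text):
        # Collapse line breaks and runs of whitespace inside the paragraph.
        flat = ' '.join(paragraph.split())
        wrapped.append(textwrap.fill(flat, width=line_len,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
    return '\n\n'.join(wrapped)
```

For instance, passing `indent='# '` turns a paragraph into a wrapped comment block.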

normalize_paragraph(text, line_len=80, indent='')

normalize a text to display it with a maximum line size and
optionally arbitrary indentation. Line breaks are normalized. The
indentation string may be used to insert a comment mark, for
instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len`, optionally prefixed by an indentation string

normalize_rest_paragraph(text, line_len=80, indent='')

normalize a ReST text to display it with a maximum line size and
optionally arbitrary indentation. Line breaks are normalized. The
indentation string may be used to insert a comment mark, for
instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len`, optionally prefixed by an indentation string

splittext(text, line_len)

split the given text on space according to the given max line size

return a 2-tuple:
* a line <= line_len if possible
* the rest of the text, which has to be carried over to another line
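
The 2-tuple contract above can be sketched as follows. `splittext_sketch` is a hypothetical name, and the fallback used when no space occurs early enough (break at the first later space, or return the whole text) is an assumption about what "if possible" means.

```python
def splittext_sketch(text, line_len):
    """Return (line, rest), splitting on a space at or before line_len."""
    if len(text) <= line_len:
        return text, ''
    # Last space that still leaves a line of at most line_len characters.
    pos = text.rfind(' ', 0, line_len + 1)
    if pos <= 0:
        # No usable space before the limit: accept an over-long line and
        # break at the first space after it, if there is one.
        pos = text.find(' ', line_len)
        if pos == -1:
            return text, ''
    return text[:pos], text[pos + 1:]
```

Calling the function repeatedly on the returned rest yields the successive lines of a wrapped paragraph.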

get_csv(string, sep=',')

return a list of strings from a csv formatted line

>>> get_csv('a, b, c   ,  4')
['a', 'b', 'c', '4']
>>> get_csv('a')
['a']
>>>

:type string: str or unicode
:param string: a csv line

:type sep: str or unicode
:param sep: field separator, defaults to the comma (',')

:rtype: list of str or unicode
:return: the fields found in the line, stripped of surrounding whitespace
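
A one-line sketch that reproduces the doctest above. `get_csv_sketch` is a hypothetical name, and dropping empty fields is an assumption; the page does not say how the real function treats them.

```python
def get_csv_sketch(string, sep=','):
    """Split on `sep`, strip each field and drop fields left empty."""
    return [field.strip() for field in string.split(sep) if field.strip()]
```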

apply_units(string, units, inter=None, final=float, blank_reg=_BLANK_RE, value_reg=_VALUE_RE)

Parse the string applying the units defined in units
(e.g. "1.5m", {'m': 60} -> 90).
    
:type string: str or unicode
:param string: the string to parse

:type units: dict (or any object with __getitem__ using basestring key)
:param units: a dict mapping a unit string repr to its value

:type inter: type
:param inter: used to parse every intermediate value (need __sum__)

:type blank_reg: regexp
:param blank_reg: should match every blank char to ignore.

:type value_reg: regexp with "value" and optional "unit" group
:param value_reg: match a value and its unit into the
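
The parsing described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `apply_units_sketch` and `_VALUE_SKETCH_RE` are hypothetical names, the regular expression is not the module's `_VALUE_RE`, the `inter`, `blank_reg` and `value_reg` parameters are not modelled, and several value/unit pairs in one string are assumed to be summed.

```python
import re

# Hypothetical pattern with a "value" group and an optional "unit" group.
_VALUE_SKETCH_RE = re.compile(
    r'(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z]+)?')


def apply_units_sketch(string, units, final=float):
    """Sum every value in `string`, scaled by its unit when one is given."""
    total = 0.0
    found = False
    for match in _VALUE_SKETCH_RE.finditer(string):
        value = float(match.group('value'))
        unit = match.group('unit')
        if unit is not None:
            value *= units[unit]  # raises KeyError for an unknown unit
        total += value
        found = True
    if not found:
        raise ValueError('no value found in %r' % string)
    return final(total)
```

With `{'m': 60}` as units, `"1.5m"` evaluates to 1.5 * 60 = 90.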

pretty_match(match, string, underline_char='^')

return a string with the match location underlined:

>>> import re
>>> print pretty_match(re.search('mange', 'il mange du bacon'), 'il mange du bacon')
il mange du bacon
   ^^^^^
>>>

:type match: _sre.SRE_Match
:param match: object returned by re.match, re.search or re.finditer

:type string: str or unicode
:param string:
  the string on which the regular expression has been applied to
  obtain the `match` object

:type underline_char: str or unicode
:param underline_char:
  character to use to underline the matched section, defaults to the
  caret '^'

:rtype: str or unicode
:return:
  the original string with an inserted line to underline the match
  location
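
The underlining shown in the doctest can be sketched as below. `pretty_match_sketch` is a hypothetical name, and clipping the underline to the line on which the match starts is an assumption about multi-line input.

```python
import re


def pretty_match_sketch(match, string, underline_char='^'):
    """Insert an underline below the line where `match` starts."""
    start, end = match.span()
    # Bounds of the line containing the start of the match.
    line_start = string.rfind('\n', 0, start) + 1
    line_end = string.find('\n', start)
    if line_end == -1:
        line_end = len(string)
    end = min(end, line_end)  # do not underline past the end of the line
    underline = ' ' * (start - line_start) + underline_char * (end - start)
    return string[:line_end] + '\n' + underline + string[line_end:]


text = 'il mange du bacon'
print(pretty_match_sketch(re.search('mange', text), text))
```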

colorize_ansi(msg, color=None, style=None)

colorize message by wrapping it with ansi escape codes

:type msg: str or unicode
:param msg: the message string to colorize

:type color: str or None
:param color:
  the color identifier (see `ANSI_COLORS` for available values)

:type style: str or None
:param style:
  style string (see `ANSI_STYLES` for available values). To get
  several style effects at the same time, use a comma as separator.

:raise KeyError: if a nonexistent color or style identifier is given

:rtype: str or unicode
:return: the ansi escaped string
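
The wrapping described above can be sketched from the constants documented on this page. `colorize_ansi_sketch` is a hypothetical name; the order of the codes (styles first, then color) and returning the message unchanged when neither is given are assumptions.

```python
# Values copied from the Variables Details section of this page.
ANSI_PREFIX = '\033['
ANSI_END = 'm'
ANSI_RESET = '\033[0m'
ANSI_STYLES = {'reset': "0", 'bold': "1", 'italic': "3", 'underline': "4",
               'blink': "5", 'inverse': "7", 'strike': "9"}
ANSI_COLORS = {'reset': "0", 'black': "30", 'red': "31", 'green': "32",
               'yellow': "33", 'blue': "34", 'magenta': "35", 'cyan': "36",
               'white': "37"}


def colorize_ansi_sketch(msg, color=None, style=None):
    """Wrap `msg` in an escape sequence built from style and color codes."""
    codes = []
    if style:
        # Several styles may be given, separated by commas.
        for name in style.split(','):
            codes.append(ANSI_STYLES[name.strip()])  # KeyError if unknown
    if color:
        codes.append(ANSI_COLORS[color])  # KeyError if unknown
    if not codes:
        return msg
    return ANSI_PREFIX + ';'.join(codes) + ANSI_END + msg + ANSI_RESET
```

For example, `colorize_ansi_sketch('error', color='red', style='bold')` yields the message preceded by `\033[1;31m` and followed by `ANSI_RESET`.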


Variables Details

MANUAL_UNICODE_MAP

Value:
{u'\xa1': u'!', u'\u0142': u'l', u'\u2044': u'/', u'\xc6': u'AE',
 u'\xa9': u'(c)', u'\xab': u'"', u'\xe6': u'ae', u'\xae': u'(r)',
 u'\u0153': u'oe', u'\u0152': u'OE', u'\xd8': u'O', u'\xf8': u'o',
 u'\xbb': u'"', u'\xdf': u'ss',}

BYTE_UNITS

Value:
{"B": 1, "KB": 1024, "MB": 1024** 2, "GB": 1024** 3, "TB": 1024** 4,}

TIME_UNITS

Value:
{"ms": 0.0001, "s": 1, "min": 60, "h": 60* 60, "d": 60* 60* 24,}

ANSI_STYLES

Value:
{'reset': "0", 'bold': "1", 'italic': "3", 'underline': "4",
 'blink': "5", 'inverse': "7", 'strike': "9",}

ANSI_COLORS

Value:
{'reset': "0", 'black': "30", 'red': "31", 'green': "32",
 'yellow': "33", 'blue': "34", 'magenta': "35", 'cyan': "36",
 'white': "37",}