org.apache.uima.internal.util
Class CharacterUtils

java.lang.Object
  extended by org.apache.uima.internal.util.CharacterUtils

public class CharacterUtils
extends java.lang.Object

Collection of utilities for character handling. Contains utilities for semi-automatically creating lexer rules.


Constructor Summary
CharacterUtils()
          Constructor for CharacterUtils.
 
Method Summary
static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getDigitRange()
          Generate an ArrayList of CharRanges for what Java considers to be a digit.
static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getLetterRange()
          Generate an ArrayList of CharRanges for what Java considers to be a letter.
static void main(java.lang.String[] args)
           
static void printAntlrLexRule(java.lang.String name, java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)
           
static void printJavaCCLexRule(java.lang.String name, java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)
           
static java.lang.String toHexString(char c)
          Create a hex representation of the UTF-16 encoding of a Java char, in the form understood by the JavaCC lexer (for example, "0x0041").
static java.lang.String toUnicodeChar(char c)
          Create a Unicode-escape representation of the UTF-16 encoding of a Java char, in the form understood by Java source code (for example, "\u0041").
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CharacterUtils

public CharacterUtils()
Constructor for CharacterUtils.

Method Detail

toUnicodeChar

public static java.lang.String toUnicodeChar(char c)
Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by Java when reading source code.

Parameters:
c - The char to be encoded.
Returns:
String Hex representation of character. For example, the result of encoding 'A' would be "\u0041".
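
The escape form described above can be reproduced with the standard library alone. The following is a minimal sketch, not the UIMA implementation: the class name UnicodeEscapeSketch and the method name toUnicodeEscape are hypothetical, and the use of uppercase hex digits is an assumption, since the documentation does not state the case.

```java
class UnicodeEscapeSketch {

  // Hypothetical helper, not the UIMA implementation: formats a char as a
  // backslash-u escape with exactly four hex digits. Uppercase digits are an
  // assumption; the documentation does not state the case.
  static String toUnicodeEscape(char c) {
    // The doubled backslash yields one literal backslash at runtime.
    return String.format("\\u%04X", (int) c);
  }

  public static void main(String[] args) {
    // Prints a backslash followed by u0041.
    System.out.println(toUnicodeEscape('A'));
  }
}
```

The four-digit zero padding matters because Java source code only accepts Unicode escapes with exactly four hex digits.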

toHexString

public static java.lang.String toHexString(char c)
Create a hex representation of the UTF-16 encoding of a Java char. This is the representation that's understood by the JavaCC lexer.

Parameters:
c - The char to be encoded.
Returns:
String Hex representation of character. For example, the result of encoding 'A' would be "0x0041".
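
As an illustration of the "0x0041" form, here is a minimal sketch using only the standard library. It is not the UIMA implementation: the class name HexStringSketch and the method name toHex are hypothetical, and uppercase hex digits are an assumption (the documented example contains no letters, so the case cannot be inferred from it).

```java
class HexStringSketch {

  // Hypothetical helper, not the UIMA implementation: formats a char as
  // "0x" followed by four zero-padded hex digits of its UTF-16 code unit.
  // Uppercase digits are an assumption.
  static String toHex(char c) {
    return String.format("0x%04X", (int) c);
  }

  public static void main(String[] args) {
    System.out.println(toHex('A')); // prints 0x0041
  }
}
```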

getLetterRange

public static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getLetterRange()
Generate an ArrayList of CharRanges for what Java considers to be a letter. The result is intended as input for Unicode-agnostic lexer generators such as ANTLR.

Returns:
ArrayList A list of character ranges.

getDigitRange

public static java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> getDigitRange()
Generate an ArrayList of CharRanges for what Java considers to be a digit. The result is intended as input for Unicode-agnostic lexer generators such as ANTLR.

Returns:
ArrayList A list of character ranges.
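
The range-building idea behind getLetterRange and getDigitRange can be sketched by scanning every char value and grouping consecutive matches into inclusive ranges. This is an illustration under stated assumptions, not the UIMA code: the Range class is a hypothetical stand-in for CharacterUtils.CharRange (whose fields are not documented here), collectRanges is a hypothetical name, and "what Java considers to be a letter or digit" is assumed to mean Character.isLetter(char) and Character.isDigit(char).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class CharRangeSketch {

  /** Hypothetical stand-in for CharacterUtils.CharRange: an inclusive char range. */
  static final class Range {
    final char start;
    final char end;

    Range(char start, char end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Collects maximal runs of consecutive char values accepted by the predicate. */
  static List<Range> collectRanges(IntPredicate accepts) {
    List<Range> ranges = new ArrayList<>();
    int runStart = -1; // -1 means "not currently inside a run"
    // An int loop variable avoids the overflow a char counter would hit at 0xFFFF.
    for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
      if (accepts.test(c)) {
        if (runStart < 0) {
          runStart = c;
        }
      } else if (runStart >= 0) {
        ranges.add(new Range((char) runStart, (char) (c - 1)));
        runStart = -1;
      }
    }
    if (runStart >= 0) {
      // Close a run that extends to the last char value.
      ranges.add(new Range((char) runStart, Character.MAX_VALUE));
    }
    return ranges;
  }

  public static void main(String[] args) {
    List<Range> letters = collectRanges(c -> Character.isLetter((char) c));
    List<Range> digits = collectRanges(c -> Character.isDigit((char) c));
    System.out.println("letter ranges: " + letters.size());
    System.out.println("digit ranges: " + digits.size());
    // The first digit range covers the ASCII digits.
    System.out.println(digits.get(0).start + ".." + digits.get(0).end); // prints 0..9
  }
}
```

Collapsing individual characters into ranges keeps the generated lexer rule short: a handful of range alternatives instead of one alternative per character.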

printAntlrLexRule

public static void printAntlrLexRule(java.lang.String name,
                                     java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)
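
No description is given for this method, so its exact output format is not known from this page. Purely as an illustration of how a list of character ranges could be turned into an ANTLR-style lexer rule, the sketch below renders each range as a pair of escaped character literals joined by "..", with alternatives separated by "|". Everything in it is an assumption: the class AntlrRuleSketch, the Range stand-in for CharacterUtils.CharRange, the method names, the returned String (the documented method prints and returns void), and the rule layout.

```java
import java.util.ArrayList;
import java.util.List;

class AntlrRuleSketch {

  /** Hypothetical inclusive char range; stands in for CharacterUtils.CharRange. */
  static final class Range {
    final char start;
    final char end;

    Range(char start, char end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Quoted char literal with a four-digit escape (uppercase hex is an assumption). */
  static String literal(char c) {
    return String.format("'\\u%04X'", (int) c);
  }

  /** Builds "NAME : alt | alt ;" where each alternative is a literal or a range. */
  static String antlrRule(String name, List<Range> ranges) {
    List<String> alternatives = new ArrayList<>();
    for (Range r : ranges) {
      alternatives.add(r.start == r.end
          ? literal(r.start)
          : literal(r.start) + ".." + literal(r.end));
    }
    return name + " : " + String.join(" | ", alternatives) + " ;";
  }

  public static void main(String[] args) {
    List<Range> ranges = new ArrayList<>();
    ranges.add(new Range('A', 'Z'));
    ranges.add(new Range('a', 'z'));
    ranges.add(new Range('_', '_'));
    System.out.println(antlrRule("LETTER", ranges));
  }
}
```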

printJavaCCLexRule

public static void printJavaCCLexRule(java.lang.String name,
                                      java.util.ArrayList<org.apache.uima.internal.util.CharacterUtils.CharRange> charRanges)

main

public static void main(java.lang.String[] args)


Copyright © 2010 The Apache Software Foundation. All Rights Reserved.