

Package Glib.Unicode

This package provides functions for handling Unicode characters and UTF-8 strings. See also Glib.Convert.

Types

type G_Unicode_Type is 
    (Unicode_Control,
     Unicode_Format, 
     Unicode_Unassigned, 
     Unicode_Private_Use, 
     Unicode_Surrogate, 
     Unicode_Lowercase_Letter, 
     Unicode_Modifier_Letter, 
     Unicode_Other_Letter, 
     Unicode_Titlecase_Letter, 
     Unicode_Uppercase_Letter, 
     Unicode_Combining_Mark, 
     Unicode_Enclosing_Mark, 
     Unicode_Non_Spacing_Mark, 
     Unicode_Decimal_Number, 
     Unicode_Letter_Number, 
     Unicode_Other_Number, 
     Unicode_Connect_Punctuation, 
     Unicode_Dash_Punctuation, 
     Unicode_Close_Punctuation, 
     Unicode_Final_Punctuation, 
     Unicode_Initial_Punctuation, 
     Unicode_Other_Punctuation, 
     Unicode_Open_Punctuation, 
     Unicode_Currency_Symbol, 
     Unicode_Modifier_Symbol, 
     Unicode_Math_Symbol, 
     Unicode_Other_Symbol, 
     Unicode_Line_Separator, 
     Unicode_Paragraph_Separator, 
     Unicode_Space_Separator); 

The possible character classifications. See http://www.unicode.org/Public/UNIDATA/UnicodeData.html


Subprograms

procedure UTF8_Validate        
  (Str                :        UTF8_String;
   Valid              : out    Boolean;
   Invalid_Pos        : out    Natural);

Validate a UTF8 string.
Valid is set to True if Str is valid UTF8. Otherwise, Invalid_Pos is set to the position of the first invalid byte.
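
As an illustration, the following minimal sketch calls UTF8_Validate on one well-formed and one truncated byte sequence. It assumes that UTF8_String is visible through package Glib and that the program is linked against GtkAda. The procedure name Validate_Demo and the sample strings are hypothetical.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Validate_Demo is
   --  "caf" followed by the two-byte UTF-8 encoding of U+00E9.
   Good : constant UTF8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);

   --  Same text with the continuation byte dropped: not valid UTF-8.
   Bad : constant UTF8_String := "caf" & Character'Val (16#C3#);

   Valid       : Boolean;
   Invalid_Pos : Natural;
begin
   UTF8_Validate (Good, Valid, Invalid_Pos);
   Put_Line ("Good is valid: " & Boolean'Image (Valid));

   UTF8_Validate (Bad, Valid, Invalid_Pos);
   if not Valid then
      --  Invalid_Pos is only meaningful when Valid is False.
      Put_Line ("Bad fails at position" & Natural'Image (Invalid_Pos));
   end if;
end Validate_Demo;
```

Checking Valid before reading Invalid_Pos is the conservative choice, since the text above only defines Invalid_Pos for the invalid case.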


Character classes


function Is_Space              
  (Char               :        Gunichar)
   return Boolean;

True if Char is a space character


function Is_Alnum              
  (Char               :        Gunichar)
   return Boolean;

True if Char is an alphanumeric character


function Is_Alpha              
  (Char               :        Gunichar)
   return Boolean;

True if Char is an alphabetical character


function Is_Digit              
  (Char               :        Gunichar)
   return Boolean;

True if Char is a digit


function Is_Lower              
  (Char               :        Gunichar)
   return Boolean;

True if Char is a lower-case character


function Is_Upper              
  (Char               :        Gunichar)
   return Boolean;

True if Char is an upper-case character


function Is_Punct              
  (Char               :        Gunichar)
   return Boolean;

True if Char is a punctuation character


function Unichar_Type          
  (Char               :        Gunichar)
   return G_Unicode_Type;

Return the Unicode character type of a given character
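
The classification functions can be combined as in the sketch below, which prints a few predicates and the G_Unicode_Type for some ASCII code points. It assumes Gunichar is an integer type declared in package Glib. The names Classify_Demo and Describe are hypothetical helpers, not part of the package.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Classify_Demo is

   --  Print a few classification results for one code point.
   procedure Describe (Char : Gunichar) is
   begin
      Put_Line
        ("Code point" & Gunichar'Image (Char)
         & ": alpha=" & Boolean'Image (Is_Alpha (Char))
         & " digit=" & Boolean'Image (Is_Digit (Char))
         & " space=" & Boolean'Image (Is_Space (Char))
         & " punct=" & Boolean'Image (Is_Punct (Char))
         & " type=" & G_Unicode_Type'Image (Unichar_Type (Char)));
   end Describe;

begin
   Describe (Character'Pos ('A'));
   Describe (Character'Pos ('7'));
   Describe (Character'Pos (' '));
   Describe (Character'Pos ('!'));
end Classify_Demo;
```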


Case handling


function To_Lower              
  (Char               :        Gunichar)
   return Gunichar;

Convert Char to lower case


function To_Upper              
  (Char               :        Gunichar)
   return Gunichar;

Convert Char to upper case


function UTF8_Strdown          
  (Str                :        ICS.chars_ptr;
   Len                :        Integer)
   return ICS.chars_ptr;




function UTF8_Strdown          
  (Str                :        UTF8_String)
   return UTF8_String;

Convert Str to lower case


function UTF8_Strup            
  (Str                :        ICS.chars_ptr;
   Len                :        Integer)
   return ICS.chars_ptr;




function UTF8_Strup            
  (Str                :        UTF8_String)
   return UTF8_String;

Convert Str to upper case
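
A minimal sketch of the case-handling subprograms, using the UTF8_String overloads of UTF8_Strup and UTF8_Strdown and the single-character To_Upper. The procedure name Case_Demo and the sample text are hypothetical. The exact results may depend on the current locale, so no output is shown.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Case_Demo is
   Text  : constant UTF8_String := "Hello, World";
   Small : constant Gunichar    := Character'Pos ('a');
begin
   --  Whole-string conversions (UTF8_String overloads).
   Put_Line (UTF8_Strup (Text));
   Put_Line (UTF8_Strdown (Text));

   --  Single-character conversion, printed as a numeric code point.
   Put_Line ("Upper case of" & Gunichar'Image (Small)
             & " is" & Gunichar'Image (To_Upper (Small)));
end Case_Demo;
```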


Manipulating strings


function UTF8_Strlen           
  (Str                :        ICS.chars_ptr;
   Max                :        Integer := -1)
   return Glong;




function UTF8_Strlen           
  (Str                :        UTF8_String)
   return Glong;

Return the number of characters in Str
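
The difference between byte length and character count can be seen in the sketch below. It assumes Glong is declared in package Glib. Strlen_Demo and the sample string are hypothetical.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Strlen_Demo is
   --  Four characters stored in five bytes: 'c', 'a', 'f', then U+00E9,
   --  which occupies two bytes in UTF-8.
   Text : constant UTF8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);
begin
   Put_Line ("Bytes:     " & Integer'Image (Text'Length));
   Put_Line ("Characters:" & Glong'Image (UTF8_Strlen (Text)));
end Strlen_Demo;
```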


function UTF8_Find_Next_Char   
  (Str                :        ICS.chars_ptr;
   Str_End            :        ICS.chars_ptr := ICS.Null_Ptr)
   return ICS.chars_ptr;




function UTF8_Find_Next_Char   
  (Str                :        UTF8_String;
   Index              :        Natural)
   return Natural;




function UTF8_Find_Prev_Char   
  (Str_Start          :        ICS.chars_ptr;
   Str                :        ICS.chars_ptr)
   return ICS.chars_ptr;




function UTF8_Find_Prev_Char   
  (Str                :        UTF8_String;
   Index              :        Natural)
   return Natural;

Find the start of the previous UTF8 character before the Index-th byte.
Index doesn't need to be on the start of a character. The returned value is smaller than Str'First if there is no previous character.
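
A minimal sketch of the UTF8_String overload of UTF8_Find_Prev_Char, assuming the result is the byte index at which the preceding character starts. Prev_Char_Demo and the sample string are hypothetical.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Prev_Char_Demo is
   --  Bytes: 'a' (index 1), U+00E9 (indices 2 .. 3), 'b' (index 4).
   Text : constant UTF8_String :=
     "a" & Character'Val (16#C3#) & Character'Val (16#A9#) & "b";
   Prev : Natural;
begin
   --  Ask for the character that precedes the last byte of Text.
   Prev := UTF8_Find_Prev_Char (Text, Text'Last);

   if Prev < Text'First then
      Put_Line ("No previous character");
   else
      Put_Line ("Previous character starts at byte" & Natural'Image (Prev));
   end if;
end Prev_Char_Demo;
```

Testing the result against Text'First before using it as an index avoids reading outside the string when there is no previous character.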


Conversions


function Unichar_To_UTF8       
  (C                  :        Gunichar;
   Buffer             :        ICS.chars_ptr := ICS.Null_Ptr)
   return Natural;




procedure Unichar_To_UTF8      
  (C                  :        Gunichar;
   Buffer             : out    UTF8_String;
   Last               : out    Natural);

Encode C into Buffer. Buffer must have at least 6 bytes free.
Last is set to the index of the last byte written in Buffer.


function UTF8_Get_Char         
  (Str                :        UTF8_String)
   return Gunichar;

Convert a sequence of bytes encoded as UTF8 to a Unicode character.
If Str doesn't point to a valid UTF8 encoded character, the result is undefined.
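
The two conversions can be checked against each other with a round trip, sketched below: a code point is encoded with the procedure form of Unichar_To_UTF8 and decoded again with UTF8_Get_Char. Round_Trip_Demo and the chosen code point are hypothetical, and the 6-byte buffer follows the requirement stated for Unichar_To_UTF8.

```ada
with Ada.Text_IO;  use Ada.Text_IO;
with Glib;         use Glib;
with Glib.Unicode; use Glib.Unicode;

procedure Round_Trip_Demo is
   Euro   : constant Gunichar := 16#20AC#;  --  EURO SIGN
   Buffer : UTF8_String (1 .. 6);           --  at least 6 bytes free
   Last   : Natural;
   Back   : Gunichar;
begin
   --  Encode: Buffer (Buffer'First .. Last) holds the UTF-8 bytes.
   Unichar_To_UTF8 (Euro, Buffer, Last);
   Put_Line ("Encoded length:"
             & Natural'Image (Last - Buffer'First + 1));

   --  Decode only the bytes that were actually written.
   Back := UTF8_Get_Char (Buffer (Buffer'First .. Last));
   Put_Line ("Round trip ok: " & Boolean'Image (Back = Euro));
end Round_Trip_Demo;
```

Slicing the buffer at Last keeps the uninitialized tail of Buffer out of the decoding step.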


function UTF8_Get_Char_Validated
  (Str                :        UTF8_String)
   return Gunichar;

Same as above. However, if the sequence is an incomplete start of a
possibly valid character, it returns -2. If the sequence is invalid, it returns -1.


