CAlphabet Class Reference

Public Member Functions

 CAlphabet (char *alpha, int32_t len)
 CAlphabet (EAlphabet alpha)
 CAlphabet (CAlphabet *alpha)
 ~CAlphabet ()
bool set_alphabet (EAlphabet alpha)
EAlphabet get_alphabet ()
int32_t get_num_symbols ()
int32_t get_num_bits ()
uint8_t remap_to_bin (uint8_t c)
uint8_t remap_to_char (uint8_t c)
void clear_histogram ()
 clear histogram
void add_string_to_histogram (char *p, int64_t len)
void add_string_to_histogram (uint8_t *p, int64_t len)
void add_string_to_histogram (int16_t *p, int64_t len)
void add_string_to_histogram (uint16_t *p, int64_t len)
void add_string_to_histogram (int32_t *p, int64_t len)
void add_string_to_histogram (uint32_t *p, int64_t len)
void add_string_to_histogram (int64_t *p, int64_t len)
void add_string_to_histogram (uint64_t *p, int64_t len)
void add_byte_to_histogram (uint8_t p)
void print_histogram ()
 print histogram
void get_hist (int64_t **h, int32_t *len)
const int64_t * get_histogram ()
 get pointer to histogram
bool check_alphabet (bool print_error=true)
bool check_alphabet_size (bool print_error=true)
int32_t get_num_symbols_in_histogram ()
int32_t get_max_value_in_histogram ()
int32_t get_num_bits_in_histogram ()

Static Public Member Functions

static const char * get_alphabet_name (EAlphabet alphabet)

Static Public Attributes

static const uint8_t B_A = 0
static const uint8_t B_C = 1
static const uint8_t B_G = 2
static const uint8_t B_T = 3
static const uint8_t MAPTABLE_UNDEF = 0xff
static const char * alphabet_names [11] = {"DNA", "RAWDNA", "RNA", "PROTEIN", "ALPHANUM", "CUBE", "RAW", "IUPAC_NUCLEIC_ACID", "IUPAC_AMINO_ACID", "NONE", "UNKNOWN"}

Protected Member Functions

void init_map_table ()
void copy_histogram (CAlphabet *src)

Protected Attributes

EAlphabet alphabet
int32_t num_symbols
int32_t num_bits
uint8_t valid_chars [1<< (sizeof(uint8_t)*8)]
uint8_t maptable_to_bin [1<< (sizeof(uint8_t)*8)]
uint8_t maptable_to_char [1<< (sizeof(uint8_t)*8)]
int64_t histogram [1<< (sizeof(uint8_t)*8)]


Detailed Description

The class CAlphabet implements an alphabet together with utility functions to remap characters to more bit-efficient representations, check whether a string is valid for the alphabet, compute symbol histograms, etc.

Currently supported alphabets are DNA, RAWDNA, RNA, PROTEIN, ALPHANUM, CUBE, RAW, IUPAC_NUCLEIC_ACID and IUPAC_AMINO_ACID.

Definition at line 65 of file Alphabet.h.


Constructor & Destructor Documentation

CAlphabet::CAlphabet ( char *  alpha,
int32_t  len 
)

constructor

Parameters:
alpha alphabet to use
len length of alpha

Definition at line 25 of file Alphabet.cpp.

CAlphabet::CAlphabet ( EAlphabet  alpha  ) 

constructor

Parameters:
alpha alphabet (type) to use

Definition at line 56 of file Alphabet.cpp.

CAlphabet::CAlphabet ( CAlphabet *  alpha  ) 

constructor

Parameters:
alpha alphabet to use

Definition at line 62 of file Alphabet.cpp.

CAlphabet::~CAlphabet (  ) 

Definition at line 70 of file Alphabet.cpp.


Member Function Documentation

void CAlphabet::add_byte_to_histogram ( uint8_t  p  ) 

add element to histogram

Parameters:
p element

Definition at line 205 of file Alphabet.h.

void CAlphabet::add_string_to_histogram ( uint64_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 437 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( int64_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 429 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( uint32_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 421 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( int32_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 413 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( uint16_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 397 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( int16_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 405 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( uint8_t *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 385 of file Alphabet.cpp.

void CAlphabet::add_string_to_histogram ( char *  p,
int64_t  len 
)

make histogram for whole string

Parameters:
p string
len length of string

Definition at line 391 of file Alphabet.cpp.
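
The histogram functions above can be illustrated with a minimal standalone sketch. It is not the Shogun implementation: ByteHistogram and its members are hypothetical names, and only the uint8_t case is shown. The 256-entry table mirrors the histogram[1<< (sizeof(uint8_t)*8)] member listed under Protected Attributes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the histogram member; not the Shogun class.
struct ByteHistogram
{
    // One counter per possible uint8_t value.
    int64_t counts[1 << (sizeof(uint8_t) * 8)];

    ByteHistogram() { clear(); }

    // Reset every counter to zero (the role of clear_histogram()).
    void clear() { std::memset(counts, 0, sizeof(counts)); }

    // Count a single element (the role of add_byte_to_histogram()).
    void add_byte(uint8_t b) { counts[b]++; }

    // Count every element of a byte string of the given length.
    void add_string(const uint8_t* p, int64_t len)
    {
        assert(p != nullptr || len == 0);
        for (int64_t i = 0; i < len; i++)
            add_byte(p[i]);
    }
};
```

Calling add_string on the five bytes "ACAGA" would leave counts['A'] at 3 and counts['C'] and counts['G'] at 1 each. How the wider overloads (int16_t through uint64_t) map their elements onto the 256 bins is not stated in this reference, so the sketch does not attempt it.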

bool CAlphabet::check_alphabet ( bool  print_error = true  ) 

check whether all symbols in the histogram are valid in the alphabet, e.g. for DNA, whether only the letters ACGT appear

Parameters:
print_error if errors shall be printed
Returns:
true if all symbols in histogram are valid in alphabet

Definition at line 490 of file Alphabet.cpp.
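
As a rough illustration of the check described above, the following standalone sketch compares a byte histogram against a table of valid characters. The helper names mark_valid and histogram_is_valid are hypothetical, and the real function additionally takes a print_error flag that is omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mark every character of `allowed` as valid.
void mark_valid(bool (&valid)[256], const char* allowed)
{
    assert(allowed != nullptr);
    for (const char* c = allowed; *c != '\0'; c++)
        valid[static_cast<uint8_t>(*c)] = true;
}

// Hypothetical helper, not the Shogun implementation: returns true if every
// symbol with a non-zero count in `histogram` is marked in `valid`.
bool histogram_is_valid(const int64_t (&histogram)[256],
                        const bool (&valid)[256])
{
    for (int i = 0; i < 256; i++)
    {
        if (histogram[i] > 0 && !valid[i])
            return false;
    }
    return true;
}
```

With "ACGT" marked as valid, a histogram that counted only A and T passes the check, while one that also counted an N fails it.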

bool CAlphabet::check_alphabet_size ( bool  print_error = true  ) 

check whether symbols in histogram ALL fit in alphabet

Parameters:
print_error if errors shall be printed
Returns:
true if ALL symbols in histogram fit in alphabet

Definition at line 512 of file Alphabet.cpp.

void CAlphabet::clear_histogram (  ) 

clear histogram

Definition at line 379 of file Alphabet.cpp.

void CAlphabet::copy_histogram ( CAlphabet *  src  )  [protected]

copy histogram

Parameters:
src alphabet to copy histogram from

Definition at line 529 of file Alphabet.cpp.

EAlphabet CAlphabet::get_alphabet (  ) 

get alphabet

Returns:
alphabet

Definition at line 98 of file Alphabet.h.

const char * CAlphabet::get_alphabet_name ( EAlphabet  alphabet  )  [static]

return alphabet name

Parameters:
alphabet alphabet type to get name from

Definition at line 534 of file Alphabet.cpp.

void CAlphabet::get_hist ( int64_t **  h,
int32_t *  len 
)

get histogram

Parameters:
h where the histogram will be stored
len length of histogram

Definition at line 218 of file Alphabet.h.

const int64_t* CAlphabet::get_histogram (  ) 

get pointer to histogram

Definition at line 230 of file Alphabet.h.

int32_t CAlphabet::get_max_value_in_histogram (  ) 

return maximum value in histogram

Returns:
maximum value in histogram

Definition at line 445 of file Alphabet.cpp.

int32_t CAlphabet::get_num_bits (  ) 

get number of bits necessary to store all symbols in alphabet

Returns:
number of necessary storage bits

Definition at line 117 of file Alphabet.h.
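
The reference does not give the formula behind this value. Assuming the usual reading, the number of bits needed to distinguish n symbols is the smallest b with 2^b >= n, i.e. ceil(log2(n)). The function name bits_for_symbols below is hypothetical and the sketch is not taken from Alphabet.h.

```cpp
#include <cassert>
#include <cstdint>

// Smallest b such that 2^b >= num_symbols, i.e. ceil(log2(num_symbols)).
// Returns 0 for zero or one symbol, since no bits are needed to tell them apart.
int32_t bits_for_symbols(int32_t num_symbols)
{
    assert(num_symbols >= 0);
    int32_t bits = 0;
    while ((int64_t(1) << bits) < num_symbols)
        bits++;
    return bits;
}
```

Under this assumption a four-symbol alphabet such as ACGT needs 2 bits per symbol, and a 26-symbol alphabet needs 5.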

int32_t CAlphabet::get_num_bits_in_histogram (  ) 

return number of bits required to store all symbols in histogram

Returns:
number of bits required to store all symbols in histogram

Definition at line 472 of file Alphabet.cpp.

int32_t CAlphabet::get_num_symbols (  ) 

get number of symbols in alphabet

Returns:
number of symbols

Definition at line 107 of file Alphabet.h.

int32_t CAlphabet::get_num_symbols_in_histogram (  ) 

return number of symbols in histogram

Returns:
number of symbols in histogram

Definition at line 460 of file Alphabet.cpp.
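
A standalone sketch of the two histogram statistics documented above (number of symbols and number of bits), under the assumption that "symbols in histogram" means the bins with a non-zero count. The function names are hypothetical and this is not the Shogun code.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct symbols observed, i.e. bins with a non-zero count
// (assumed meaning of get_num_symbols_in_histogram()).
int32_t distinct_symbols(const int64_t (&histogram)[256])
{
    int32_t n = 0;
    for (int i = 0; i < 256; i++)
    {
        if (histogram[i] > 0)
            n++;
    }
    return n;
}

// Bits needed to give each observed symbol its own code: the smallest b
// with 2^b >= number of distinct symbols.
int32_t bits_for_histogram(const int64_t (&histogram)[256])
{
    int32_t n = distinct_symbols(histogram);
    assert(n <= 256);
    int32_t bits = 0;
    while ((1 << bits) < n)
        bits++;
    return bits;
}
```

A histogram that has seen only A, C and G would report 3 symbols and 2 bits. Comparing that bit count against the alphabet's own bit count is one plausible way to decide whether the observed symbols "fit", though this reference does not say how check_alphabet_size does it.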

void CAlphabet::init_map_table (  )  [protected]

init map table

Definition at line 124 of file Alphabet.cpp.

void CAlphabet::print_histogram (  ) 

print histogram

Definition at line 481 of file Alphabet.cpp.

uint8_t CAlphabet::remap_to_bin ( uint8_t  c  ) 

remap element, e.g. translate ACGT to 0123

Parameters:
c element to remap
Returns:
remapped element

Definition at line 127 of file Alphabet.h.

uint8_t CAlphabet::remap_to_char ( uint8_t  c  ) 

remap element, e.g. translate 0123 to ACGT

Parameters:
c element to remap
Returns:
remapped element

Definition at line 137 of file Alphabet.h.
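
The two remap functions above amount to table lookups in both directions. The sketch below shows the idea for the DNA case only, using the codes given by B_A to B_T and 0xff for unmapped entries as with MAPTABLE_UNDEF. DnaMap is a hypothetical standalone type, and filling both tables with the undefined marker by default is an assumption, not something this reference states.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical standalone type; not the Shogun class.
struct DnaMap
{
    static const uint8_t UNDEF = 0xff;  // same value as MAPTABLE_UNDEF

    uint8_t to_bin[256];   // character -> compact code
    uint8_t to_char[256];  // compact code -> character

    DnaMap()
    {
        // Assumption: entries without a mapping hold the undefined marker.
        std::memset(to_bin, UNDEF, sizeof(to_bin));
        std::memset(to_char, UNDEF, sizeof(to_char));

        const char symbols[] = "ACGT";  // A=0, C=1, G=2, T=3
        for (uint8_t code = 0; code < 4; code++)
        {
            to_bin[static_cast<uint8_t>(symbols[code])] = code;
            to_char[code] = static_cast<uint8_t>(symbols[code]);
        }

        // Sanity check: the two tables are inverses on valid symbols.
        assert(remap_to_char(remap_to_bin('G')) == 'G');
    }

    uint8_t remap_to_bin(uint8_t c) const { return to_bin[c]; }
    uint8_t remap_to_char(uint8_t c) const { return to_char[c]; }
};
```

In this sketch remap_to_bin('T') yields 3, remap_to_char(2) yields 'G', and a character outside ACGT such as 'N' yields 0xff.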

bool CAlphabet::set_alphabet ( EAlphabet  alpha  ) 

set alphabet and initialize mapping table (for remap)

Parameters:
alpha new alphabet

Definition at line 74 of file Alphabet.cpp.


Member Data Documentation

EAlphabet CAlphabet::alphabet [protected]

alphabet

Definition at line 302 of file Alphabet.h.

const char * CAlphabet::alphabet_names[11] = {"DNA", "RAWDNA", "RNA", "PROTEIN", "ALPHANUM", "CUBE", "RAW", "IUPAC_NUCLEIC_ACID", "IUPAC_AMINO_ACID", "NONE", "UNKNOWN"} [static]

alphabet names

Definition at line 298 of file Alphabet.h.

const uint8_t CAlphabet::B_A = 0 [static]

B_A

Definition at line 288 of file Alphabet.h.

const uint8_t CAlphabet::B_C = 1 [static]

B_C

Definition at line 290 of file Alphabet.h.

const uint8_t CAlphabet::B_G = 2 [static]

B_G

Definition at line 292 of file Alphabet.h.

const uint8_t CAlphabet::B_T = 3 [static]

B_T

Definition at line 294 of file Alphabet.h.

int64_t CAlphabet::histogram[1<< (sizeof(uint8_t)*8)] [protected]

histogram

Definition at line 314 of file Alphabet.h.

uint8_t CAlphabet::maptable_to_bin[1<< (sizeof(uint8_t)*8)] [protected]

maptable to bin

Definition at line 310 of file Alphabet.h.

uint8_t CAlphabet::maptable_to_char[1<< (sizeof(uint8_t)*8)] [protected]

maptable to char

Definition at line 312 of file Alphabet.h.

const uint8_t CAlphabet::MAPTABLE_UNDEF = 0xff [static]

MAPTABLE UNDEF

Definition at line 296 of file Alphabet.h.

int32_t CAlphabet::num_bits [protected]

number of bits

Definition at line 306 of file Alphabet.h.

int32_t CAlphabet::num_symbols [protected]

number of symbols

Definition at line 304 of file Alphabet.h.

uint8_t CAlphabet::valid_chars[1<< (sizeof(uint8_t)*8)] [protected]

valid chars

Definition at line 308 of file Alphabet.h.


The documentation for this class was generated from the following files:

Alphabet.h
Alphabet.cpp

SHOGUN Machine Learning Toolbox - Documentation