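It basically uses the algorithm in the unix "comm" command (hence the name) to compute $k(\mathbf{x}, \mathbf{x}') = \Phi_k(\mathbf{x}) \cdot \Phi_k(\mathbf{x}')$, where $\Phi_k$ maps a sequence $\mathbf{x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector each entry denotes how often the corresponding k-mer appears in $\mathbf{x}$.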
Note that this representation is especially tuned to small alphabets (like the 2-bit alphabet DNA), for which it enables spectrum kernels of order 8.
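The comm-style computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual code: sorted_kmers and spectrum_kernel are hypothetical helper names, the A/C/G/T to 2-bit mapping is assumed, and normalization and the use_sign option are ignored. Each k-mer is packed into one 16-bit word, the words are sorted, and two sorted lists are then walked in parallel the way comm walks two sorted files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: pack every k-mer of a DNA string into one 16-bit word
// (2 bits per letter) and return the words in sorted order.
std::vector<std::uint16_t> sorted_kmers(const std::string& seq, std::size_t k)
{
    assert(k >= 1 && k <= 8);  // 8 letters * 2 bits fill a 16-bit word
    std::vector<std::uint16_t> words;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        std::uint16_t w = 0;
        for (std::size_t j = 0; j < k; ++j) {
            std::uint16_t code = 0;
            switch (seq[i + j]) {  // assumed letter-to-code mapping
                case 'A': code = 0; break;
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                case 'T': code = 3; break;
                default: assert(false && "letter outside the DNA alphabet");
            }
            w = static_cast<std::uint16_t>((w << 2) | code);
        }
        words.push_back(w);
    }
    std::sort(words.begin(), words.end());
    return words;
}

// Hypothetical helper: unnormalized spectrum kernel of two sorted word lists.
// Words present in only one list are skipped; for a shared word the two
// occurrence counts are multiplied and added to the result.
double spectrum_kernel(const std::vector<std::uint16_t>& a,
                       const std::vector<std::uint16_t>& b)
{
    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;
        } else if (a[i] > b[j]) {
            ++j;
        } else {
            const std::uint16_t w = a[i];
            std::size_t count_a = 0, count_b = 0;
            while (i < a.size() && a[i] == w) { ++i; ++count_a; }
            while (j < b.size() && b[j] == w) { ++j; ++count_b; }
            result += static_cast<double>(count_a) * static_cast<double>(count_b);
        }
    }
    return result;
}
```

Because both lists are sorted, one kernel evaluation costs time linear in the two sequence lengths and never materializes the full feature vector of size $|\Sigma|^k$.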
For this kernel the linadd speedups are quite efficiently implemented using direct maps.
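A direct-map linadd scheme of this kind might look like the sketch below. It is an assumption-laden illustration, not the library's implementation: LinaddDictionary is a hypothetical type whose method names merely echo the members documented on this page, word lists are passed in directly instead of feature indices, and normalization is left out. The idea is that the weighted k-mer counts of all support vectors are accumulated once into an array indexed by the 16-bit word, after which scoring a sequence is a single pass over its words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a direct-map dictionary over all 16-bit words.
struct LinaddDictionary {
    static constexpr std::size_t kDictionarySize = std::size_t{1} << 16;
    std::vector<double> weights;

    LinaddDictionary() : weights(kDictionarySize, 0.0) {}

    // Add one support vector: every occurrence of a word contributes alpha.
    void add_to_normal(const std::vector<std::uint16_t>& words, double alpha)
    {
        assert(weights.size() == kDictionarySize);
        for (std::uint16_t w : words) weights[w] += alpha;
    }

    // Score a sequence: sum the accumulated weight of each of its words.
    // This equals sum_i alpha_i * k(x_i, x) for the unnormalized kernel.
    double compute_optimized(const std::vector<std::uint16_t>& words) const
    {
        double sum = 0.0;
        for (std::uint16_t w : words) sum += weights[w];
        return sum;
    }

    // Reset all accumulated weights to zero.
    void clear_normal() { weights.assign(kDictionarySize, 0.0); }
};
```

The array lookup replaces one kernel evaluation per support vector, so the cost of scoring a sequence no longer depends on the number of support vectors.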
Definition at line 43 of file CommWordStringKernel.h.
CCommWordStringKernel::CCommWordStringKernel | ( | INT | size, | |
bool | use_sign, | |||
ENormalizationType | normalization_ = FULL_NORMALIZATION | |||
) |
constructor
size | cache size | |
use_sign | if sign shall be used | |
normalization_ | type of normalization |
Definition at line 16 of file CommWordStringKernel.cpp.
CCommWordStringKernel::CCommWordStringKernel | ( | CStringFeatures< WORD > * | l, | |
CStringFeatures< WORD > * | r, | |||
bool | use_sign = false , |
|||
ENormalizationType | normalization_ = FULL_NORMALIZATION , |
|||
INT | size = 10 | |||
) |
constructor
l | features of left-hand side | |
r | features of right-hand side | |
use_sign | if sign shall be used | |
normalization_ | type of normalization | |
size | cache size |
Definition at line 26 of file CommWordStringKernel.cpp.
CCommWordStringKernel::~CCommWordStringKernel | ( | ) | [virtual] |
Definition at line 51 of file CommWordStringKernel.cpp.
initialize kernel
l | features of left-hand side | |
r | features of right-hand side |
Reimplemented from CStringKernel< ST >.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 92 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::cleanup | ( | ) | [virtual] |
clean up kernel
Reimplemented from CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 169 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::load_init | ( | FILE * | src | ) | [virtual] |
load kernel init_data
src | file to load from |
Implements CKernel.
Definition at line 187 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::save_init | ( | FILE * | dest | ) | [virtual] |
save kernel init_data
dest | file to save to |
Implements CKernel.
Definition at line 192 of file CommWordStringKernel.cpp.
virtual EKernelType CCommWordStringKernel::get_kernel_type | ( | ) | [virtual] |
return what type of kernel we are
Implements CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 100 of file CommWordStringKernel.h.
virtual const CHAR* CCommWordStringKernel::get_name | ( | ) | [virtual] |
return the kernel's name
Implements CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 106 of file CommWordStringKernel.h.
bool CCommWordStringKernel::init_dictionary | ( | INT | size | ) | [virtual] |
initialize optimization
count | count | |
IDX | index | |
weights | weights |
Reimplemented from CKernel.
Definition at line 409 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::delete_optimization | ( | ) | [virtual] |
delete optimization
Reimplemented from CKernel.
Definition at line 436 of file CommWordStringKernel.cpp.
compute optimized
idx | index to compute |
Reimplemented from CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 444 of file CommWordStringKernel.cpp.
add to normal
idx | where to add | |
weight | what to add |
Reimplemented from CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 366 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::clear_normal | ( | ) | [virtual] |
void CCommWordStringKernel::remove_lhs | ( | ) | [virtual] |
remove lhs from kernel
Reimplemented from CKernel.
Definition at line 59 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::remove_rhs | ( | ) | [virtual] |
remove rhs from kernel
Reimplemented from CKernel.
Definition at line 79 of file CommWordStringKernel.cpp.
virtual EFeatureType CCommWordStringKernel::get_feature_type | ( | ) | [virtual] |
return feature type the kernel can deal with
Reimplemented from CStringKernel< ST >.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 157 of file CommWordStringKernel.h.
get dictionary
dsize | dictionary size will be stored in here | |
dweights | dictionary weights will be stored in here |
Definition at line 164 of file CommWordStringKernel.h.
DREAL * CCommWordStringKernel::compute_scoring | ( | INT | max_degree, | |
INT & | num_feat, | |||
INT & | num_sym, | |||
DREAL * | target, | |||
INT | num_suppvec, | |||
INT * | IDX, | |||
DREAL * | alphas, | |||
bool | do_init = true | |||
) | [virtual] |
compute scoring
max_degree | maximum degree | |
num_feat | number of features | |
num_sym | number of symbols | |
target | target | |
num_suppvec | number of support vectors | |
IDX | IDX | |
alphas | alphas | |
do_init | if initialization shall be performed |
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 490 of file CommWordStringKernel.cpp.
CHAR * CCommWordStringKernel::compute_consensus | ( | INT & | num_feat, | |
INT | num_suppvec, | |||
INT * | IDX, | |||
DREAL * | alphas | |||
) |
compute consensus
num_feat | number of features | |
num_suppvec | number of support vectors | |
IDX | IDX | |
alphas | alphas |
Definition at line 612 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::set_use_dict_diagonal_optimization | ( | bool | flag | ) |
set_use_dict_diagonal_optimization
flag | enable diagonal optimization |
Definition at line 202 of file CommWordStringKernel.h.
bool CCommWordStringKernel::get_use_dict_diagonal_optimization | ( | ) |
get_use_dict_diagonal_optimization
Definition at line 211 of file CommWordStringKernel.h.
compute kernel function for features a and b. idx_{a,b} denote the indices of the feature vectors in the corresponding feature object
idx_a | index a | |
idx_b | index b |
Implements CKernel.
Definition at line 225 of file CommWordStringKernel.h.
DREAL CCommWordStringKernel::compute_helper | ( | INT | idx_a, | |
INT | idx_b, | |||
bool | do_sort | |||
) | [protected, virtual] |
helper for compute
idx_a | index a | |
idx_b | index b | |
do_sort | if sorting shall be performed |
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 239 of file CommWordStringKernel.cpp.
helper to compute only diagonal normalization for training
idx_a | index a |
Definition at line 197 of file CommWordStringKernel.cpp.
DREAL CCommWordStringKernel::normalize_weight | ( | DREAL * | weights, | |
DREAL | value, | |||
INT | seq_num, | |||
INT | seq_len, | |||
ENormalizationType | p_normalization | |||
) | [protected] |
normalize weight
weights | weights | |
value | value | |
seq_num | sequence number | |
seq_len | length of sequence | |
p_normalization | type of normalization |
Definition at line 254 of file CommWordStringKernel.h.
virtual EFeatureClass CStringKernel< ST >::get_feature_class | ( | ) | [virtual, inherited] |
return feature class the kernel can deal with
Implements CKernel.
Definition at line 63 of file StringKernel.h.
get kernel matrix
dst | destination where matrix will be stored | |
m | dimension m of matrix | |
n | dimension n of matrix |
Definition at line 79 of file Kernel.cpp.
get kernel matrix real
m | dimension m of matrix | |
n | dimension n of matrix | |
target | the kernel matrix |
Definition at line 216 of file Kernel.cpp.
SHORTREAL * CKernel::get_kernel_matrix_shortreal | ( | int & | m, | |
int & | n, | |||
SHORTREAL * | target | |||
) | [virtual, inherited] |
get kernel matrix shortreal
m | dimension m of matrix | |
n | dimension n of matrix | |
target | target for kernel matrix |
Reimplemented in CCustomKernel.
Definition at line 146 of file Kernel.cpp.
bool CKernel::load | ( | CHAR * | fname | ) | [inherited] |
load the kernel matrix
fname | filename to load from |
Definition at line 322 of file Kernel.cpp.
bool CKernel::save | ( | CHAR * | fname | ) | [inherited] |
save kernel matrix
fname | filename to save to |
Definition at line 327 of file Kernel.cpp.
CFeatures* CKernel::get_lhs | ( | ) | [inherited] |
CFeatures* CKernel::get_rhs | ( | ) | [inherited] |
INT CKernel::get_num_vec_lhs | ( | ) | [inherited] |
INT CKernel::get_num_vec_rhs | ( | ) | [inherited] |
bool CKernel::has_features | ( | ) | [inherited] |
void CKernel::remove_lhs_and_rhs | ( | ) | [virtual, inherited] |
remove lhs and rhs from kernel
Definition at line 358 of file Kernel.cpp.
void CKernel::set_cache_size | ( | INT | size | ) | [inherited] |
int CKernel::get_cache_size | ( | ) | [inherited] |
void CKernel::list_kernel | ( | ) | [inherited] |
list kernel
Definition at line 389 of file Kernel.cpp.
bool CKernel::has_property | ( | EKernelProperty | p | ) | [inherited] |
EOptimizationType CKernel::get_optimization_type | ( | ) | [inherited] |
virtual void CKernel::set_optimization_type | ( | EOptimizationType | t | ) | [virtual, inherited] |
bool CKernel::get_is_initialized | ( | ) | [inherited] |
bool CKernel::init_optimization_svm | ( | CSVM * | svm | ) | [inherited] |
initialize optimization
svm | svm model |
Definition at line 644 of file Kernel.cpp.
void CKernel::compute_batch | ( | INT | num_vec, | |
INT * | vec_idx, | |||
DREAL * | target, | |||
INT | num_suppvec, | |||
INT * | IDX, | |||
DREAL * | alphas, | |||
DREAL | factor = 1.0 | |||
) | [virtual, inherited] |
computes the output for a batch of examples in an optimized fashion (favorable if the kernel supports it, i.e. has KP_BATCHEVALUATION). The outputs for the examples enumerated in vec_idx are added to the output vector target (of length num_vec elements); therefore make sure that target is initialized with zero. The remaining arguments num_suppvec, IDX and alphas are the number of support vectors, their indices and their weights.
Reimplemented in CCombinedKernel, CWeightedDegreePositionStringKernel, and CWeightedDegreeStringKernel.
Definition at line 568 of file Kernel.cpp.
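The accumulate-into-target contract of compute_batch can be illustrated with a naive reference loop. This is only a sketch of the documented semantics under assumptions: compute_batch_sketch is a hypothetical free function, the kernel is passed in as a callback, and applying factor as a multiplier on each example's sum is an assumed reading of the factor parameter. An optimized kernel would replace the inner loop with something faster.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical reference loop: for every example in vec_idx, add
// factor * sum_j alphas[j] * kernel(sv_idx[j], example) to target.
void compute_batch_sketch(const std::vector<int>& vec_idx,
                          std::vector<double>& target,
                          const std::vector<int>& sv_idx,
                          const std::vector<double>& alphas,
                          const std::function<double(int, int)>& kernel,
                          double factor = 1.0)
{
    assert(target.size() == vec_idx.size());
    assert(sv_idx.size() == alphas.size());
    for (std::size_t i = 0; i < vec_idx.size(); ++i) {
        double out = 0.0;
        for (std::size_t j = 0; j < sv_idx.size(); ++j)
            out += alphas[j] * kernel(sv_idx[j], vec_idx[i]);
        target[i] += factor * out;  // added, not assigned: caller zeroes target
    }
}
```

Since results are added rather than assigned, calling the function on a target that still holds old values silently mixes them into the new outputs, which is why the zero initialization matters.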
DREAL CKernel::get_combined_kernel_weight | ( | ) | [inherited] |
void CKernel::set_combined_kernel_weight | ( | double | nw | ) | [inherited] |
INT CKernel::get_num_subkernels | ( | ) | [virtual, inherited] |
get number of subkernels
Reimplemented in CCombinedKernel, CWeightedDegreePositionStringKernel, and CWeightedDegreeStringKernel.
Definition at line 583 of file Kernel.cpp.
void CKernel::compute_by_subkernel | ( | INT | vector_idx, | |
DREAL * | subkernel_contrib | |||
) | [virtual, inherited] |
compute by subkernel
vector_idx | index | |
subkernel_contrib | subkernel contribution |
Reimplemented in CCombinedKernel, CWeightedDegreePositionStringKernel, and CWeightedDegreeStringKernel.
Definition at line 588 of file Kernel.cpp.
get subkernel weights
num_weights | number of weights will be stored here |
Reimplemented in CCombinedKernel, CWeightedDegreePositionStringKernel, and CWeightedDegreeStringKernel.
Definition at line 593 of file Kernel.cpp.
set subkernel weights
weights | subkernel weights | |
num_weights | number of weights |
Reimplemented in CCombinedKernel, CWeightedDegreePositionStringKernel, and CWeightedDegreeStringKernel.
Definition at line 599 of file Kernel.cpp.
bool CKernel::get_precompute_matrix | ( | ) | [inherited] |
bool CKernel::get_precompute_subkernel_matrix | ( | ) | [inherited] |
virtual void CKernel::set_precompute_matrix | ( | bool | flag, | |
bool | subkernel_flag | |||
) | [virtual, inherited] |
set precompute matrix
flag | flag | |
subkernel_flag | subkernel flag |
Reimplemented in CCombinedKernel.
void CKernel::set_property | ( | EKernelProperty | p | ) | [protected, inherited] |
void CKernel::unset_property | ( | EKernelProperty | p | ) | [protected, inherited] |
void CKernel::set_is_initialized | ( | bool | p_init | ) | [protected, inherited] |
void CKernel::do_precompute_matrix | ( | ) | [protected, inherited] |
DREAL* CCommWordStringKernel::sqrtdiag_lhs [protected] |
sqrt diagonal of left-hand side
Definition at line 287 of file CommWordStringKernel.h.
DREAL* CCommWordStringKernel::sqrtdiag_rhs [protected] |
sqrt diagonal of right-hand side
Definition at line 289 of file CommWordStringKernel.h.
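One plausible use of the two sqrt-diagonal arrays is diagonal normalization, where a raw kernel value is divided by the square roots of the two self-kernel values so that every sequence has kernel value 1 with itself. The sketch below assumes that reading; the page itself does not spell out the formula, and sqrt_diagonal and normalized_kernel are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: square roots of the self-kernel values k(x_i, x_i),
// computed once per feature object and cached.
std::vector<double> sqrt_diagonal(const std::vector<double>& self_kernels)
{
    std::vector<double> out;
    out.reserve(self_kernels.size());
    for (double v : self_kernels) {
        assert(v >= 0.0);  // a self-kernel value cannot be negative
        out.push_back(std::sqrt(v));
    }
    return out;
}

// Hypothetical helper: k(a, b) / (sqrt(k(a, a)) * sqrt(k(b, b))).
double normalized_kernel(double raw, double sqrtdiag_a, double sqrtdiag_b)
{
    assert(sqrtdiag_a > 0.0 && sqrtdiag_b > 0.0);
    return raw / (sqrtdiag_a * sqrtdiag_b);
}
```

Caching the square roots avoids recomputing two self-kernel values on every kernel evaluation.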
bool CCommWordStringKernel::initialized [protected] |
if kernel is initialized
Definition at line 291 of file CommWordStringKernel.h.
INT CCommWordStringKernel::dictionary_size [protected] |
size of dictionary (number of possible strings)
Definition at line 294 of file CommWordStringKernel.h.
DREAL* CCommWordStringKernel::dictionary_weights [protected] |
dictionary weights - array to hold counters for all possible strings
Definition at line 297 of file CommWordStringKernel.h.
bool CCommWordStringKernel::use_sign [protected] |
if sign shall be used
Definition at line 300 of file CommWordStringKernel.h.
type of normalization
Definition at line 302 of file CommWordStringKernel.h.
bool CCommWordStringKernel::use_dict_diagonal_optimization [protected] |
whether diagonal optimization shall be used
Definition at line 305 of file CommWordStringKernel.h.
INT* CCommWordStringKernel::dict_diagonal_optimization [protected] |
array to hold counters for all strings
Definition at line 307 of file CommWordStringKernel.h.
INT CKernel::cache_size [protected, inherited] |
KERNELCACHE_ELEM* CKernel::kernel_matrix [protected, inherited] |
SHORTREAL* CKernel::precomputed_matrix [protected, inherited] |
bool CKernel::precompute_subkernel_matrix [protected, inherited] |
bool CKernel::precompute_matrix [protected, inherited] |
CFeatures* CKernel::lhs [protected, inherited] |
CFeatures* CKernel::rhs [protected, inherited] |
DREAL CKernel::combined_kernel_weight [protected, inherited] |
bool CKernel::optimization_initialized [protected, inherited] |
EOptimizationType CKernel::opt_type [protected, inherited] |
ULONG CKernel::properties [protected, inherited] |
CParallel CSGObject::parallel [static, inherited] |
Definition at line 105 of file SGObject.h.
CIO CSGObject::io [static, inherited] |
Definition at line 106 of file SGObject.h.
CVersion CSGObject::version [static, inherited] |
Definition at line 107 of file SGObject.h.