weka.attributeSelection
Class ScatterSearchV1

java.lang.Object
  extended by weka.attributeSelection.ASSearch
      extended by weka.attributeSelection.ScatterSearchV1
All Implemented Interfaces:
java.io.Serializable, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class ScatterSearchV1
extends ASSearch
implements OptionHandler, TechnicalInformationHandler

Class for performing the Sequential Scatter Search.

Scatter Search :

Performs an Scatter Search through the space of attribute subsets. Start with a population of many significants and diverses subset stops when the result is higher than a given treshold or there's not more improvement
For more information see:

F?lix Garc?a L?pez (2004). Solving feature subset selection problem by a Parallel Scatter Search. Elsevier.

Valid options are:

 -Z <num>
  Specify the number of subsets to generate 
  in the initial population..
 -T <threshold>
  Specify the treshold used for considering when a subset is significant.
 -R <0 = greedy combination | 1 = reduced greedy combination >
  Specify the kind of combiantion 
  for using it in the combination method.
 -S <seed>
  Set the random number seed.
  (default = 1)
 -D
  Verbose output for monitoring the search.
BibTeX:
 @book{L?pez2004,
    author = {F?lix Garc?a L?pez},
    month = {October},
    publisher = {Elsevier},
    title = {Solving feature subset selection problem by a Parallel Scatter Search},
    year = {2004},
    language = {English}
 }
 

from the Book: Solving feature subset selection problem by a Parallel Scatter Search, F?lix Garc?a L?pez.

Version:
$Revision: 4848 $
Author:
Adrian Pino (apinoa@facinf.uho.edu.cu)
See Also:
Serialized Form

Nested Class Summary
 class ScatterSearchV1.Subset
           
 
Field Summary
static Tag[] TAGS_SELECTION
           
 
Constructor Summary
ScatterSearchV1()
           
 
Method Summary
 int[] attributeList(java.util.BitSet group)
          converts a BitSet into a list of attribute indexes
 java.util.List<ScatterSearchV1.Subset> bubbleSubsetSort(java.util.List<ScatterSearchV1.Subset> subsetList)
          Sort a List of subsets according to their merits
 double calculateTreshhold()
          Calculate the treshold of a dataSet given an evaluator
 java.lang.String combinationTipText()
          Returns the tip text for this property
 void CombineParents()
          Combine all the posible pair solutions existing in the Population
 void CreatePopulation(int popSize)
          Create the initial Population
 java.lang.String debugTipText()
          Returns the tip text for this property
 java.util.List<ScatterSearchV1.Subset> filterSubset(java.util.List<ScatterSearchV1.Subset> subsetList, int preferredSize)
          Filter a given Lis of Subsets removing the equals subsets
 int generateRandomNumber(int limit)
           
 void GenerateReferenceSet(java.util.List<ScatterSearchV1.Subset> ReferenceSet, int bestSolutions, int divSolutions)
          Generate the a ReferenceSet containing the n best solutions and the m most diverse solutions of the initial Population.
 java.util.BitSet getAllBits(java.util.List<ScatterSearchV1.Subset> subsets)
          Save in Bitset all the gens that are in many others subsets.
 ScatterSearchV1.Subset getBestgen(ScatterSearchV1.Subset subset, java.util.BitSet gens)
          Evaluate each gen of a BitSet inserted in a Subset and get the most significant for that Subset
 SelectedTag getCombination()
          Get the combination
 boolean getDebug()
          Get whether output is to be verbose
 int getIndexofBiggest(java.util.List<java.lang.Integer> simDif)
          get the index in a List where this have the biggest number
 java.lang.String[] getOptions()
          Gets the current settings of ReliefFAttributeEval.
 int getPopulationSize()
          Get the population size
 java.lang.String getRevision()
          Returns the revision string.
 int getSeed()
          get the value of the random number generator's seed
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 double getTreshold()
          Get the treshold
 java.lang.String globalInfo()
          Returns a string describing this search method
 void ImproveSolutions()
          Improve the solutions previously combined by adding the attributes that improve that solution
 void InitPopulation(int popSize)
          Creating space for introducing the population
 ScatterSearchV1.Subset intersectSubsets(ScatterSearchV1.Subset subset1, ScatterSearchV1.Subset subset2)
          Intersects two subsets
 ScatterSearchV1.Subset joinSubsets(ScatterSearchV1.Subset subset1, ScatterSearchV1.Subset subset2)
          Join two subsets
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String populationSizeTipText()
          Returns the tip text for this property
 java.lang.String printSubset(ScatterSearchV1.Subset subset)
           
 java.util.List<ScatterSearchV1.Subset> RankEachAttribute()
          Rank all the attributes individually acording to their merits
 int[] search(ASEvaluation ASEval, Instances data)
          Searches the attribute subset space using Scatter Search.
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setCombination(SelectedTag c)
          Set the kind of combination
 void setDebug(boolean d)
          Set whether verbose output should be generated.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPopulationSize(int size)
          Set the population size
 void setSeed(int s)
          set the seed for random number generation
 void setTreshold(double treshold)
          Set the treshold
 ScatterSearchV1.Subset simetricDif(ScatterSearchV1.Subset subset1, ScatterSearchV1.Subset subset2, int mode)
           
 int SimetricDiference(ScatterSearchV1.Subset subset, java.util.BitSet bitset)
          Calculate the Simetric Diference of two subsets
 java.lang.String toString()
          returns a description of the search.
 java.lang.String tresholdTipText()
          Returns the tip text for this property
 void UpdateReferenceSet(int numBestSolutions, int numDivsSolutions)
          Update the ReferenceSet putting the new obtained Solutions there
 
Methods inherited from class weka.attributeSelection.ASSearch
forName, makeCopies
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TAGS_SELECTION

public static final Tag[] TAGS_SELECTION
Constructor Detail

ScatterSearchV1

public ScatterSearchV1()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this search method

Returns:
a description of the search suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class ASSearch
Returns:
the revision

tresholdTipText

public java.lang.String tresholdTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setTreshold

public void setTreshold(double treshold)
Set the treshold

Parameters:
treshold - for identifyng significant subsets

getTreshold

public double getTreshold()
Get the treshold

Returns:
the treshold that subsets most overcome to be considered as significants

populationSizeTipText

public java.lang.String populationSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPopulationSize

public void setPopulationSize(int size)
Set the population size

Parameters:
size - the number of subset in the initial population

getPopulationSize

public int getPopulationSize()
Get the population size

Returns:
the number of subsets to generate in the initial population

combinationTipText

public java.lang.String combinationTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCombination

public void setCombination(SelectedTag c)
Set the kind of combination

Parameters:
c - the kind of combination of the search

getCombination

public SelectedTag getCombination()
Get the combination

Returns:
the kind of combination used in the Combination method

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int s)
set the seed for random number generation

Parameters:
s - seed value

getSeed

public int getSeed()
get the value of the random number generator's seed

Returns:
the seed for random number generation

debugTipText

public java.lang.String debugTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean d)
Set whether verbose output should be generated.

Parameters:
d - true if output is to be verbose.

getDebug

public boolean getDebug()
Get whether output is to be verbose

Returns:
true if output will be verbose

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-Z
Specify the number of subsets to generate in the initial population.

-T
Specify the treshold used for considering when a subset is significant.

-R
Specify the kind of combiantion.

-S
Set the random number seed. (default = 1) -D
Verbose output for monitoring the search (default = false)

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of ReliefFAttributeEval.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

toString

public java.lang.String toString()
returns a description of the search.

Overrides:
toString in class java.lang.Object
Returns:
a description of the search as a String.

search

public int[] search(ASEvaluation ASEval,
                    Instances data)
             throws java.lang.Exception
Searches the attribute subset space using Scatter Search.

Specified by:
search in class ASSearch
Parameters:
ASEvaluator - the attribute evaluator to guide the search
data - the training instances.
Returns:
an array of selected attribute indexes
Throws:
java.lang.Exception - if the search can't be completed

GenerateReferenceSet

public void GenerateReferenceSet(java.util.List<ScatterSearchV1.Subset> ReferenceSet,
                                 int bestSolutions,
                                 int divSolutions)
Generate the a ReferenceSet containing the n best solutions and the m most diverse solutions of the initial Population.

Parameters:
ReferenceSet - the ReferenceSet for storing these solutions
bestSolutions - the number of the most pure solutions.
divSolutions - the number of the most diverses solutions acording to the bestSolutions.

UpdateReferenceSet

public void UpdateReferenceSet(int numBestSolutions,
                               int numDivsSolutions)
Update the ReferenceSet putting the new obtained Solutions there

Parameters:
bestSolutions - the number of the most pure solutions.
divSolutions - the number of the most diverses solutions acording to the bestSolutions.

ImproveSolutions

public void ImproveSolutions()
                      throws java.lang.Exception
Improve the solutions previously combined by adding the attributes that improve that solution

Throws:
java.lang.Exception - if there is some trouble evaluating the candidate solutions

CombineParents

public void CombineParents()
                    throws java.lang.Exception
Combine all the posible pair solutions existing in the Population

Throws:
java.lang.Exception - if there is some trouble evaluating the new childs

CreatePopulation

public void CreatePopulation(int popSize)
                      throws java.lang.Exception
Create the initial Population

Parameters:
popSize - the size of the initial population
Throws:
java.lang.Exception - if there is a trouble evaluating any solution

RankEachAttribute

public java.util.List<ScatterSearchV1.Subset> RankEachAttribute()
                                                         throws java.lang.Exception
Rank all the attributes individually acording to their merits

Returns:
an ordered List of Subsets with just one attribute
Throws:
java.lang.Exception - if the evaluation can not be completed

getBestgen

public ScatterSearchV1.Subset getBestgen(ScatterSearchV1.Subset subset,
                                         java.util.BitSet gens)
                                  throws java.lang.Exception
Evaluate each gen of a BitSet inserted in a Subset and get the most significant for that Subset

Returns:
a new Subset with the union of subset and the best gen of gens. in case that there's not improvement with each gen return null
Throws:
java.lang.Exception - if the evaluation of can not be completed

bubbleSubsetSort

public java.util.List<ScatterSearchV1.Subset> bubbleSubsetSort(java.util.List<ScatterSearchV1.Subset> subsetList)
Sort a List of subsets according to their merits

Parameters:
subsetList - the subsetList to be ordered
Returns:
a List with ordered subsets

getIndexofBiggest

public int getIndexofBiggest(java.util.List<java.lang.Integer> simDif)
get the index in a List where this have the biggest number

Parameters:
simDif - the Lists of numbers for getting from them the index of the bigger
Returns:
an index that represents where the bigest number is.

getAllBits

public java.util.BitSet getAllBits(java.util.List<ScatterSearchV1.Subset> subsets)
Save in Bitset all the gens that are in many others subsets.

Parameters:
subsets - the Lists of subsets for getting from them all their gens
Returns:
a Bitset with all the gens contained in many others subsets.

InitPopulation

public void InitPopulation(int popSize)
Creating space for introducing the population

Parameters:
size - the number of subset in the initial population

joinSubsets

public ScatterSearchV1.Subset joinSubsets(ScatterSearchV1.Subset subset1,
                                          ScatterSearchV1.Subset subset2)
                                   throws java.lang.Exception
Join two subsets

Parameters:
subset1 - one of the subsets
subset2 - the other subset
Returns:
a new Subset that is te result of the Join
Throws:
java.lang.Exception - if the evaluation of the subsets can not be completed

intersectSubsets

public ScatterSearchV1.Subset intersectSubsets(ScatterSearchV1.Subset subset1,
                                               ScatterSearchV1.Subset subset2)
                                        throws java.lang.Exception
Intersects two subsets

Parameters:
subset1 - one of the subsets
subset2 - the other subset
Returns:
a new Subset that is te result of the intersection
Throws:
java.lang.Exception - if the evaluation of the subsets can not be completed

simetricDif

public ScatterSearchV1.Subset simetricDif(ScatterSearchV1.Subset subset1,
                                          ScatterSearchV1.Subset subset2,
                                          int mode)
                                   throws java.lang.Exception
Throws:
java.lang.Exception

generateRandomNumber

public int generateRandomNumber(int limit)

calculateTreshhold

public double calculateTreshhold()
                          throws java.lang.Exception
Calculate the treshold of a dataSet given an evaluator

Returns:
the treshhold of the dataSet
Throws:
java.lang.Exception - if the calculation can not be completed

SimetricDiference

public int SimetricDiference(ScatterSearchV1.Subset subset,
                             java.util.BitSet bitset)
Calculate the Simetric Diference of two subsets

Returns:
the Simetric Diference
Throws:
java.lang.Exception - if the calculation can not be completed

filterSubset

public java.util.List<ScatterSearchV1.Subset> filterSubset(java.util.List<ScatterSearchV1.Subset> subsetList,
                                                           int preferredSize)
Filter a given Lis of Subsets removing the equals subsets

Parameters:
subsetList - to filter
preferredSize - the preferred size of the new List (if it is -1, then the filter is make it for all subsets, else then the filter method stops when the given preferred size is reached or all the subset have been filtered).
Returns:
a new List filtered
Throws:
java.lang.Exception - if the calculation can not be completed

printSubset

public java.lang.String printSubset(ScatterSearchV1.Subset subset)

attributeList

public int[] attributeList(java.util.BitSet group)
converts a BitSet into a list of attribute indexes

Parameters:
group - the BitSet to convert
Returns:
an array of attribute indexes