weka.attributeSelection
Class FCBFSearch

java.lang.Object
  extended by weka.attributeSelection.ASSearch
      extended by weka.attributeSelection.FCBFSearch
All Implemented Interfaces:
java.io.Serializable, RankedOutputSearch, StartSetHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class FCBFSearch
extends ASSearch
implements RankedOutputSearch, StartSetHandler, OptionHandler, TechnicalInformationHandler

FCBF:

Feature selection method based on a correlation measure and relevance and redundancy analysis. Use in conjunction with an attribute set evaluator (SymmetricalUncertAttributeEval).

For more information see:

Lei Yu, Huan Liu: Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In: Proceedings of the Twentieth International Conference on Machine Learning, 856-863, 2003.

BibTeX:

 @inproceedings{Yu2003,
    author = {Lei Yu and Huan Liu},
    booktitle = {Proceedings of the Twentieth International Conference on Machine Learning},
    pages = {856-863},
    publisher = {AAAI Press},
    title = {Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution},
    year = {2003}
 }
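
The relevance and redundancy analysis described in the paper can be illustrated with a minimal, self-contained sketch. It is not Weka's implementation: the class name FcbfSketch, its method names, and the integer-coded nominal inputs are assumptions made for illustration. The sketch scores each feature against the class with symmetrical uncertainty, SU(X,Y) = 2 * (H(X) + H(Y) - H(X,Y)) / (H(X) + H(Y)), drops features that score below a threshold, and then removes any feature that is at least as correlated with an already kept, higher-ranked feature as it is with the class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and data layout are assumptions, not Weka API.
final class FcbfSketch {

    /** Shannon entropy (base 2) of a discrete variable given as integer codes. */
    static double entropy(int[] x) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : x) counts.merge(v, 1, Integer::sum);
        return entropyOfCounts(counts.values(), x.length);
    }

    /** Joint entropy H(X,Y) of two equally long discrete variables. */
    static double jointEntropy(int[] x, int[] y) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            // Pack the value pair into one long key.
            long key = ((long) x[i] << 32) | (y[i] & 0xffffffffL);
            counts.merge(key, 1, Integer::sum);
        }
        return entropyOfCounts(counts.values(), x.length);
    }

    private static double entropyOfCounts(Iterable<Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** SU(X,Y) in [0,1]; returns 0 when both variables are constant. */
    static double symmetricalUncertainty(int[] x, int[] y) {
        double hx = entropy(x);
        double hy = entropy(y);
        if (hx + hy == 0.0) return 0.0;
        double gain = hx + hy - jointEntropy(x, y);
        return 2.0 * gain / (hx + hy);
    }

    /** features[f] holds the values of feature f; returns kept feature indexes. */
    static List<Integer> select(int[][] features, int[] cls, double threshold) {
        double[] suClass = new double[features.length];
        List<Integer> candidates = new ArrayList<>();
        for (int f = 0; f < features.length; f++) {
            suClass[f] = symmetricalUncertainty(features[f], cls);
            if (suClass[f] >= threshold) candidates.add(f); // relevance filter
        }
        // Most relevant first; List.sort is stable, so ties keep index order.
        candidates.sort((a, b) -> Double.compare(suClass[b], suClass[a]));

        boolean[] removed = new boolean[features.length];
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            int p = candidates.get(i);
            if (removed[p]) continue;
            selected.add(p);
            for (int j = i + 1; j < candidates.size(); j++) {
                int q = candidates.get(j);
                // q is redundant if p predicts it at least as well as q predicts the class.
                if (!removed[q]
                        && symmetricalUncertainty(features[p], features[q]) >= suClass[q]) {
                    removed[q] = true;
                }
            }
        }
        return selected;
    }
}
```

In this sketch a feature that merely duplicates a higher-ranked feature is dropped, and a feature that carries no information about the class never passes the relevance filter.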
 

Valid options are:

 -D <create dataset>
  Specify whether the selector generates a new dataset.
 -P <start set>
  Specify a starting set of attributes.
   E.g. 1,3,5-7.
  Any starting attributes specified are
  ignored during the ranking.
 -T <threshold>
  Specify a threshold by which attributes
  may be discarded from the ranking.
 -N <num to select>
  Specify number of attributes to select

Version:
$Revision: 1.7 $
Author:
Zheng Zhao: zhaozheng at asu.edu
See Also:
Serialized Form

Constructor Summary
FCBFSearch()
          Constructor
 
Method Summary
 java.lang.String generateDataOutputTipText()
          Returns the tip text for this property
 java.lang.String generateRankingTipText()
          Returns the tip text for this property
 int getCalculatedNumToSelect()
          Gets the calculated number to select.
 boolean getGenerateDataOutput()
          Returns the flag by which the AttributeSelection module decides whether to create a new dataset from the selected features.
 boolean getGenerateRanking()
          This is a dummy method.
 int getNumToSelect()
          Gets the number of attributes to be retained.
 java.lang.String[] getOptions()
          Gets the current settings of FCBFSearch.
 java.lang.String getRevision()
          Returns the revision string.
 java.lang.String getStartSet()
          Returns a list of attributes (and/or attribute ranges) as a String
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 double getThreshold()
          Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.
 java.lang.String globalInfo()
          Returns a string describing this search method
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String numToSelectTipText()
          Returns the tip text for this property
 double[][] rankedAttributes()
          Sorts the evaluated attribute list
 int[] search(ASEvaluation ASEval, Instances data)
          Kind of a dummy search algorithm.
 void setGenerateDataOutput(boolean doGenerate)
          Sets the flag by which the AttributeSelection module decides whether to create a new dataset from the selected features.
 void setGenerateRanking(boolean doRank)
          This is a dummy set method: this search is only capable of producing a ranked list of attributes for attribute evaluators.
 void setNumToSelect(int n)
          Specify the number of attributes to select from the ranked list.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setStartSet(java.lang.String startSet)
          Sets a starting set of attributes for the search.
 void setThreshold(double threshold)
          Set the threshold by which the AttributeSelection module can discard attributes.
 java.lang.String startSetTipText()
          Returns the tip text for this property
 java.lang.String thresholdTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Returns a description of the search as a String
 
Methods inherited from class weka.attributeSelection.ASSearch
forName, makeCopies
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

FCBFSearch

public FCBFSearch()
Constructor

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this search method

Returns:
a description of the search suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

numToSelectTipText

public java.lang.String numToSelectTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumToSelect

public void setNumToSelect(int n)
Specify the number of attributes to select from the ranked list. -1 indicates that all attributes are to be retained.

Specified by:
setNumToSelect in interface RankedOutputSearch
Parameters:
n - the number of attributes to retain

getNumToSelect

public int getNumToSelect()
Gets the number of attributes to be retained.

Specified by:
getNumToSelect in interface RankedOutputSearch
Returns:
the number of attributes to retain

getCalculatedNumToSelect

public int getCalculatedNumToSelect()
Gets the calculated number to select. This might be computed from a threshold; if a value below 0 is set as the number to select, it is set to the number of attributes in the (transformed) data.

Specified by:
getCalculatedNumToSelect in interface RankedOutputSearch
Returns:
the calculated number of attributes to select
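
The rule above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: the name NumToSelectSketch, the parameter list, and the strict "greater than" comparison against the threshold are all hypothetical.

```java
// Illustrative sketch only; names and the comparison rule are assumptions.
final class NumToSelectSketch {

    /**
     * @param merits      evaluation scores of the ranked attributes
     * @param numToSelect requested number of attributes, negative for "not set"
     * @param threshold   attributes scoring at or below this value are discarded
     */
    static int calculatedNumToSelect(double[] merits, int numToSelect, double threshold) {
        if (numToSelect >= 0) {
            // An explicit request wins, capped at the number of attributes available.
            return Math.min(numToSelect, merits.length);
        }
        // Otherwise count the attributes that survive the threshold.
        int count = 0;
        for (double m : merits) {
            if (m > threshold) count++;
        }
        return count;
    }
}
```

With a negative number to select and a threshold lower than every score (for example -Double.MAX_VALUE), the sketch returns the total number of attributes.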

thresholdTipText

public java.lang.String thresholdTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setThreshold

public void setThreshold(double threshold)
Set the threshold by which the AttributeSelection module can discard attributes.

Specified by:
setThreshold in interface RankedOutputSearch
Parameters:
threshold - the threshold.

getThreshold

public double getThreshold()
Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.

Specified by:
getThreshold in interface RankedOutputSearch
Returns:
the threshold

generateRankingTipText

public java.lang.String generateRankingTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setGenerateRanking

public void setGenerateRanking(boolean doRank)
This is a dummy set method: this search is only capable of producing a ranked list of attributes for attribute evaluators.

Specified by:
setGenerateRanking in interface RankedOutputSearch
Parameters:
doRank - this parameter is N/A and is ignored

getGenerateRanking

public boolean getGenerateRanking()
This is a dummy method. This search can only be used with attribute evaluators and as such can only produce a ranked list of attributes.

Specified by:
getGenerateRanking in interface RankedOutputSearch
Returns:
true all the time.

generateDataOutputTipText

public java.lang.String generateDataOutputTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setGenerateDataOutput

public void setGenerateDataOutput(boolean doGenerate)
Sets the flag by which the AttributeSelection module decides whether to create a new dataset from the selected features.

Parameters:
doGenerate - the flag by which the AttributeSelection module decides whether to create a new dataset from the selected features

getGenerateDataOutput

public boolean getGenerateDataOutput()
Returns the flag by which the AttributeSelection module decides whether to create a new dataset from the selected features.

Returns:
the flag by which the AttributeSelection module decides whether to create a new dataset from the selected features.

startSetTipText

public java.lang.String startSetTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setStartSet

public void setStartSet(java.lang.String startSet)
                 throws java.lang.Exception
Sets a starting set of attributes for the search. It is the search method's responsibility to report this start set (if any) in its toString() method.

Specified by:
setStartSet in interface StartSetHandler
Parameters:
startSet - a string containing a list of attributes (and/or ranges), e.g. 1,2,6,10-15.
Throws:
java.lang.Exception - if start set can't be set.
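
To make the start set format concrete, here is a minimal sketch of how a string such as 1,2,6,10-15 could be expanded into attribute indexes. It is not the parser used by this class: StartSetSketch and parse are hypothetical names, and the sketch assumes the string uses 1-based indexes and handles only plain numbers and dash ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the parser used by FCBFSearch.
final class StartSetSketch {

    /** Expands a 1-based list such as "1,3,5-7" into 0-based attribute indexes. */
    static List<Integer> parse(String startSet) {
        List<Integer> indexes = new ArrayList<>();
        for (String part : startSet.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) continue;
            int dash = token.indexOf('-');
            int first;
            int last;
            if (dash < 0) {
                first = last = Integer.parseInt(token);
            } else {
                first = Integer.parseInt(token.substring(0, dash).trim());
                last = Integer.parseInt(token.substring(dash + 1).trim());
            }
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Invalid range: " + token);
            }
            for (int i = first; i <= last; i++) {
                indexes.add(i - 1); // convert to a 0-based index
            }
        }
        return indexes;
    }
}
```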

getStartSet

public java.lang.String getStartSet()
Returns a list of attributes (and/or attribute ranges) as a String

Specified by:
getStartSet in interface StartSetHandler
Returns:
a list of attributes (and/or attribute ranges)

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -D <create dataset>
  Specify whether the selector generates a new dataset.
 -P <start set>
  Specify a starting set of attributes.
   E.g. 1,3,5-7.
  Any starting attributes specified are
  ignored during the ranking.
 -T <threshold>
  Specify a threshold by which attributes
  may be discarded from the ranking.
 -N <num to select>
  Specify number of attributes to select

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
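
As an illustration of the option list format, the sketch below reads a flag/value array into a map and rejects unknown flags. It is a hypothetical stand-in, not this class's parser: OptionSketch is an invented name, and the sketch assumes that each of the four flags is followed by exactly one value, as the placeholders in the list above suggest.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; assumes every flag is followed by one value.
final class OptionSketch {

    static Map<String, String> parse(String[] options) {
        Map<String, String> parsed = new HashMap<>();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            boolean known = flag.equals("-D") || flag.equals("-P")
                    || flag.equals("-T") || flag.equals("-N");
            if (!known) {
                throw new IllegalArgumentException("Unsupported option: " + flag);
            }
            if (i + 1 >= options.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            parsed.put(flag, options[++i]); // consume the value that follows the flag
        }
        return parsed;
    }
}
```

For example, the array {"-P", "1,3,5-7", "-T", "0.05", "-N", "10"} yields three entries, one per flag.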

getOptions

public java.lang.String[] getOptions()
Gets the current settings of FCBFSearch.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

search

public int[] search(ASEvaluation ASEval,
                    Instances data)
             throws java.lang.Exception
Kind of a dummy search algorithm. Calls an attribute evaluator to evaluate each attribute not included in the startSet and then sorts them to produce a ranked list of attributes.

Specified by:
search in class ASSearch
Parameters:
ASEval - the attribute evaluator to guide the search
data - the training instances.
Returns:
an array (not necessarily ordered) of selected attribute indexes
Throws:
java.lang.Exception - if the search can't be completed

rankedAttributes

public double[][] rankedAttributes()
                            throws java.lang.Exception
Sorts the evaluated attribute list

Specified by:
rankedAttributes in interface RankedOutputSearch
Returns:
an array of sorted (highest eval to lowest) attribute indexes
Throws:
java.lang.Exception - if sorting can't be done.
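
The highest-to-lowest ordering can be sketched as below. The row layout is an assumption made for illustration (each row holds an attribute index followed by its evaluation score), and RankingSketch is a hypothetical name rather than part of this class.

```java
import java.util.Arrays;

// Illustrative sketch only; assumes each row is {attribute index, score}.
final class RankingSketch {

    /** Pairs every attribute index with its score and sorts by score, highest first. */
    static double[][] rank(double[] merits) {
        double[][] ranked = new double[merits.length][2];
        for (int i = 0; i < merits.length; i++) {
            ranked[i][0] = i;         // attribute index
            ranked[i][1] = merits[i]; // evaluation score
        }
        Arrays.sort(ranked, (a, b) -> Double.compare(b[1], a[1]));
        return ranked;
    }
}
```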

toString

public java.lang.String toString()
Returns a description of the search as a String

Overrides:
toString in class java.lang.Object
Returns:
a description of the search

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class ASSearch
Returns:
the revision