System documentation of the GNU Image-Finding Tool

CAcInvertedFile Class Reference

An accessor to an inverted file.

#include <CAcInvertedFile.h>

Inheritance diagram for CAcInvertedFile:

[Diagram not reproduced. Classes shown: CAcURL2FTS, CAccessor, CAccessorImplementation, CAcIFFileSystem.]

Public Member Functions

virtual bool operator() () const =0
 Tests whether the inverted file is correctly constructed.
virtual string IDToURL (TID inID) const =0
 Translate a DocumentID to a URL (for output).
virtual pair< bool, TID > URLToID (const string &inURL) const =0
 Translate a URL to its document ID.
virtual list< TID > * getAllFeatureIDs () const =0
 Getting a list of all features contained in this.
bool operator() () const
 Tests whether the inverted file is correctly constructed.
 CAcInvertedFile (const CXMLElement &inCollectionElement)
 Opens an existing inverted file, then initializes this structure.
bool init (bool)
 Called by constructors.
 ~CAcInvertedFile ()
 Destructor.
string IDToURL (TID inID) const
 Translate a DocumentID to a URL (for output).
TID URLToID (const string &inURL) const
 Translate a URL to its document ID.
TID getMaximumFeatureID () const
 This is interesting for browsing.
list< TID > * getAllFeatureIDs () const
 Getting a list of all features contained in this.
The proper inverted file access
virtual CDocumentFrequencyList * FeatureToList (TFeatureID inFID) const =0
 Returns the list of documents containing the feature inFID.
virtual CDocumentFrequencyList * URLToFeatureList (string inURL) const =0
 List of features contained by a document with URL inURL.
virtual CDocumentFrequencyList * DIDToFeatureList (TID inDID) const =0
 List of features contained by a document with ID inDID.
Accessing information about features
virtual double FeatureToCollectionFrequency (TFeatureID) const =0
 Collection frequency for a given feature.
virtual unsigned int getFeatureDescription (TID inFeatureID) const =0
 What kind of feature is the feature with ID inFeatureID?
Accessing additional document information
virtual double DIDToMaxDocumentFrequency (TID) const =0
 Returns the maximum document frequency for one document ID.
virtual double DIDToDFSquareSum (TID) const =0
 Returns the document-frequency square sum for a given document ID.
virtual double DIDToSquareDFLogICFSum (TID) const =0
 Returns the square DF*log(ICF) sum for a given document ID.
virtual bool generateInvertedFile ()=0
 Generates an inverted file if there is none.
virtual bool checkConsistency ()=0
 Check the consistency of the inverted file system accessed by this accessor.
The proper inverted file access
CDocumentFrequencyList * FeatureToList (TFeatureID) const
 List of documents containing the feature.
CDocumentFrequencyList * URLToFeatureList (string inURL) const
 List of features contained by a document.
CDocumentFrequencyList * DIDToFeatureList (TID inDID) const
 List of features contained by a document with ID inDID.
Accessing information about features
double FeatureToCollectionFrequency (TFeatureID) const
 Collection frequency for a given feature.
unsigned int getFeatureDescription (TID inFeatureID) const
 What kind of feature is the feature with ID inFeatureID?
Accessing additional document information
double DIDToMaxDocumentFrequency (TID) const
 Returns the maximum document frequency for one document ID.
double DIDToDFSquareSum (TID) const
 Returns the document-frequency square sum for a given document ID.
double DIDToSquareDFLogICFSum (TID) const
 Returns the square DF*log(ICF) sum for a given document ID.
bool generateInvertedFile ()
 Generates an inverted file if there is none.
bool newGenerateInvertedFile ()
 Generates an inverted file if there is none.
bool checkConsistency ()
 Check the consistency of the inverted file system accessed by this accessor.
bool findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const
 Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?

Protected Types

typedef hash_map< TID, unsigned int > CIDToOffset
 Map from feature ID to the offset for this feature.

Protected Member Functions

void writeOffsetFileElement (TID inFeatureID, int inPosition, ostream &inOpenOffsetFile)
 Adds a (FeatureID, Offset) pair to the open offset file (helper function for inverted file construction).
CDocumentFrequencyList * getFeatureFile (string inFileName) const
 Loads a *.fts file.

Protected Attributes

TID mMaximumFeatureID
 The maximum feature ID arising in this file.
CArraySelfDestroyPointer< char > mInvertedFileBuffer
 A buffer, if the inverted file is to be held in RAM.
CSelfDestroyPointer< istream > mInvertedFile
 The inverted file.
ifstream mOffsetFile
 Feature -> Offset in inverted file.
ifstream mFeatureDescriptionFile
 File of feature descriptions.
string mInvertedFileName
 Name of the inverted file.
string mOffsetFileName
 Name of the Offset file.
string mFeatureDescriptionFileName
 Name for the file with the feature description.
CIDToOffset mIDToOffset
 Map from feature ID to the offset for this feature.
hash_map< TID, double > mFeatureToCollectionFrequency
 Map from feature to the collection frequency, for fast access.
hash_map< TID, unsigned int > mFeatureDescription
 Map from the feature ID to the feature description.
CADIHash mDocumentInformation
 Additional information about the document, e.g. the Euclidean length of the feature list.

Detailed Description

An accessor to an inverted file.

This access is currently done "by hand", which is not really efficient. We plan to move to memory-mapped files.
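
As an illustration of what "by hand" access can look like, here is a minimal sketch that looks up a feature's offset in a hash map (in the spirit of mIDToOffset) and then seeks in a stream (in the spirit of mInvertedFile). The names OffsetMapSketch and readPostingsSketch are hypothetical, and the record layout used here (a 32-bit count followed by that many document ID and frequency pairs) is an assumption made for the example. It is not the GIFT on-disk format.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for in-memory trials
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for CIDToOffset: feature ID -> byte offset.
using OffsetMapSketch = std::unordered_map<unsigned int, unsigned int>;

// ASSUMED record layout (not the GIFT format): a uint32 count, followed by
// count pairs of (uint32 document ID, float frequency).
std::vector<std::pair<std::uint32_t, float>>
readPostingsSketch(std::istream& inFile, const OffsetMapSketch& inOffsets,
                   unsigned int inFeatureID) {
    std::vector<std::pair<std::uint32_t, float>> result;
    auto found = inOffsets.find(inFeatureID);
    if (found == inOffsets.end()) return result;  // feature has no entry

    inFile.clear();  // reset any EOF flag left by an earlier read
    inFile.seekg(static_cast<std::streamoff>(found->second), std::ios::beg);

    std::uint32_t count = 0;
    inFile.read(reinterpret_cast<char*>(&count), sizeof count);
    for (std::uint32_t i = 0; inFile && i < count; ++i) {
        std::uint32_t id = 0;
        float frequency = 0.0f;
        inFile.read(reinterpret_cast<char*>(&id), sizeof id);
        inFile.read(reinterpret_cast<char*>(&frequency), sizeof frequency);
        if (inFile) result.emplace_back(id, frequency);
    }
    return result;
}
```

Every lookup in this sketch costs one seek plus a few small reads, which is the kind of overhead a memory-mapped file would avoid.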


Constructor & Destructor Documentation

CAcInvertedFile::CAcInvertedFile (const CXMLElement &inCollectionElement)
 

Opens an existing inverted file, then initializes this structure.

After that it is fully usable.

As a parameter it takes an XMLElement which contains a "collection" element and its content.

If the attribute vi-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.

The REAL constructor.


Member Function Documentation

virtual CDocumentFrequencyList* CAcInvertedFile::FeatureToList (TFeatureID inFID) const [pure virtual]
 

Returns the list of documents containing the feature inFID.

CORNELIA: CDocumentFrequencyList is nothing more than a list of (int, float) pairs:

struct { int mID; float mFrequency; }

Implemented in CAcIFFileSystem.
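
The note above can be pictured with a minimal, self-contained sketch. Only the two fields mID and mFrequency come from the note. The names DocumentFrequencyElementSketch, DocumentFrequencyListSketch, and maxFrequency are hypothetical, and the real CDocumentFrequencyList declaration may differ.

```cpp
#include <list>

// Hypothetical stand-in for one entry of a CDocumentFrequencyList:
// an (ID, frequency) pair, as described in the note above.
struct DocumentFrequencyElementSketch {
    int mID;
    float mFrequency;
};

// Hypothetical alias; the real class may use another container.
using DocumentFrequencyListSketch = std::list<DocumentFrequencyElementSketch>;

// Return the largest frequency in the list, or 0 if the list is empty.
inline float maxFrequency(const DocumentFrequencyListSketch& inList) {
    float result = 0.0f;
    for (const auto& element : inList) {
        if (element.mFrequency > result) result = element.mFrequency;
    }
    return result;
}
```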

bool CAcInvertedFile::generateInvertedFile ()
 

Generates an inverted file if there is none.

Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping results, virtually halting the inverted file creation.

Reimplemented in CAcIFFileSystem.
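
To make the memory trade-off concrete, here is a minimal sketch of an in-memory inversion. It is not the GIFT implementation. The type names and invertInMemory are hypothetical, and the input is assumed to be a map from document ID to that document's (feature, frequency) list.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only.
using FeatureIDSketch = unsigned int;
using DocumentIDSketch = unsigned int;
// (document, frequency) entry in a feature's list.
using PostingSketch = std::pair<DocumentIDSketch, float>;
// The (feature, frequency) list of one document.
using FeatureListSketch = std::vector<std::pair<FeatureIDSketch, float>>;

// Build the complete feature -> postings map in memory in a single pass.
// The whole result must fit in RAM, which is why this approach degrades
// badly once the collection no longer fits.
std::map<FeatureIDSketch, std::vector<PostingSketch>>
invertInMemory(const std::map<DocumentIDSketch, FeatureListSketch>& inDocuments) {
    std::map<FeatureIDSketch, std::vector<PostingSketch>> inverted;
    for (const auto& document : inDocuments) {
        for (const auto& feature : document.second) {
            inverted[feature.first].push_back({document.first, feature.second});
        }
    }
    return inverted;
}
```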

list<TID>* CAcInvertedFile::getAllFeatureIDs () const
 

Getting a list of all features contained in this.

This function is necessary, because in the present system only about 50 percent of the features are really used.

A feature is considered used if it occurs in mIDToOffset.

Reimplemented in CAcIFFileSystem.
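
The rule above (a feature counts as used if it has an entry in the offset map) can be sketched as follows. This is an illustration only: TIDSketch, IDToOffsetSketch, and allFeatureIDs are hypothetical names, std::unordered_map stands in for the hash_map behind CIDToOffset, and the sketch returns the list by value rather than through a pointer.

```cpp
#include <list>
#include <unordered_map>

using TIDSketch = unsigned int;  // hypothetical stand-in for TID
// std::unordered_map stands in for the hash_map behind CIDToOffset.
using IDToOffsetSketch = std::unordered_map<TIDSketch, unsigned int>;

// Collect every feature ID that has an offset entry, in ascending order.
std::list<TIDSketch> allFeatureIDs(const IDToOffsetSketch& inIDToOffset) {
    std::list<TIDSketch> result;
    for (const auto& entry : inIDToOffset) {
        result.push_back(entry.first);
    }
    result.sort();  // hash maps are unordered; sort for a stable result
    return result;
}
```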

virtual list<TID>* CAcInvertedFile::getAllFeatureIDs () const [pure virtual]
 

Getting a list of all features contained in this.

This function is necessary, because in the present system only about 50 percent of the features are really used.

A feature is considered used if it occurs in at least one image.

Implemented in CAcIFFileSystem.

CDocumentFrequencyList* CAcInvertedFile::getFeatureFile (string inFileName) const [protected]
 

Loads a *.fts file and returns the feature list.

Reimplemented in CAcIFFileSystem.

bool CAcInvertedFile::newGenerateInvertedFile ()
 

Generates an inverted file if there is none.

Employs the two-way merge method described in "Managing Gigabytes", chapter 5.2, "Sort-based inversion" (page 181).

Reimplemented in CAcIFFileSystem.
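
The general shape of a sort-based inversion with two-way merging can be sketched as below. This is not the GIFT code. TripleSketch, tripleLess, and sortBasedInversion are hypothetical names, and in-memory vectors stand in for the sorted runs that a real implementation would keep on disk.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <tuple>
#include <utility>
#include <vector>

// One (feature, document, frequency) record; hypothetical layout.
struct TripleSketch {
    unsigned int feature;
    unsigned int document;
    float frequency;
};

// Order by feature first, then by document.
inline bool tripleLess(const TripleSketch& a, const TripleSketch& b) {
    return std::tie(a.feature, a.document) < std::tie(b.feature, b.document);
}

// Sort fixed-size runs, then merge runs pairwise until one run remains.
std::vector<TripleSketch> sortBasedInversion(
        const std::vector<TripleSketch>& inTriples, std::size_t inRunSize) {
    if (inRunSize == 0) inRunSize = 1;  // guard against an endless loop
    std::vector<std::vector<TripleSketch>> runs;
    for (std::size_t i = 0; i < inTriples.size(); i += inRunSize) {
        std::size_t end = std::min(i + inRunSize, inTriples.size());
        runs.emplace_back(inTriples.begin() + i, inTriples.begin() + end);
        std::sort(runs.back().begin(), runs.back().end(), tripleLess);
    }
    while (runs.size() > 1) {
        std::vector<std::vector<TripleSketch>> merged;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            std::vector<TripleSketch> out;
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(out), tripleLess);
            merged.push_back(std::move(out));
        }
        // An odd run out is carried over unchanged to the next pass.
        if (runs.size() % 2 == 1) merged.push_back(std::move(runs.back()));
        runs = std::move(merged);
    }
    return runs.empty() ? std::vector<TripleSketch>{} : runs.front();
}
```

After the final pass, all records for one feature are adjacent, so each feature's document list could be written out in a single sequential scan.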


Member Data Documentation

CADIHash CAcInvertedFile::mDocumentInformation [protected]
 

Additional information about the document, e.g. the Euclidean length of the feature list.

Reimplemented in CAcIFFileSystem.
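
For reference, the Euclidean length of a feature list is the square root of the sum of its squared frequencies. The sketch below uses hypothetical helper names (squareSum, euclideanLength). Whether mDocumentInformation stores exactly these values, and how they relate to DIDToDFSquareSum, is an assumption and is not stated in this documentation.

```cpp
#include <cmath>
#include <vector>

// Sum of the squared frequencies of one document's features.
inline double squareSum(const std::vector<double>& inFrequencies) {
    double sum = 0.0;
    for (double frequency : inFrequencies) {
        sum += frequency * frequency;
    }
    return sum;
}

// Euclidean length: the square root of the square sum.
inline double euclideanLength(const std::vector<double>& inFrequencies) {
    return std::sqrt(squareSum(inFrequencies));
}
```

Precomputing such a value once per document avoids rescanning the document's feature list every time a score has to be normalized.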


The documentation for this class was generated from the following files: