#include <CAcIFFileSystem.h>
Inheritance diagram for CAcIFFileSystem:
Public Member Functions | |
bool | operator() () const |
for testing if the inverted file is correctly constructed | |
CAcIFFileSystem (const CXMLElement &inCollectionElement) | |
This opens an existing inverted file, and then initializes this structure. | |
bool | init (bool) |
called by constructors | |
~CAcIFFileSystem () | |
Destructor. | |
string | IDToURL (TID inID) const |
Translate a DocumentID to a URL (for output). | |
bool | generateInvertedFile () |
Generating an inverted File, if there is none. | |
bool | newGenerateInvertedFile () |
Generating an inverted File, if there is none. | |
bool | checkConsistency () |
Check the consistency of the inverted file system accessed by this accessor. | |
bool | findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const |
Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same? | |
virtual pair< bool, TID > | URLToID (const string &inURL) const |
void | getAllIDs (list< TID > &) const |
List of the IDs of all documents present in the inverted file. | |
void | getAllAccessorElements (list< CAccessorElement > &) const |
List of triplets (ID,imageURL,thumbnailURL) of all the documents present in the inverted file. | |
void | getRandomIDs (list< TID > &, list< TID >::size_type) const |
Get a given number of random CAccessorElements. | |
void | getRandomAccessorElements (list< CAccessorElement > &outResult, list< CAccessorElement >::size_type inSize) const |
For drawing random sets. | |
int | size () const |
The number of images in this accessor. | |
TID | getMaximumFeatureID () const |
This is interesting for browsing. | |
list< TID > * | getAllFeatureIDs () const |
Getting a list of all features contained in this. | |
virtual pair< bool, CAccessorElement > | IDToAccessorElement (TID inID) const |
operator bool () const | |
is this well constructed? | |
The proper inverted file access | |
CDocumentFrequencyList * | FeatureToList (TFeatureID) const |
List of documents containing the feature. | |
CDocumentFrequencyList * | URLToFeatureList (string inURL) const |
List of features contained by a document. | |
CDocumentFrequencyList * | DIDToFeatureList (TID inDID) const |
List of features contained by a document with ID inDID. | |
Accessing information about features | |
double | FeatureToCollectionFrequency (TFeatureID) const |
Collection frequency for a given feature. | |
unsigned int | getFeatureDescription (TID inFeatureID) const |
What kind of feature is the feature with ID inFeatureID? | |
Accessing additional document information | |
double | DIDToMaxDocumentFrequency (TID) const |
returns the maximum document frequency for one document ID | |
double | DIDToDFSquareSum (TID) const |
Returns the document-frequency square sum for a given document ID. | |
double | DIDToSquareDFLogICFSum (TID) const |
Returns the square DF-log-ICF sum for a given document ID. | |
Protected Types | |
typedef HASH_MAP< TID, streampos > | CIDToOffset |
map from feature id to the offset for this feature | |
Protected Member Functions | |
void | writeOffsetFileElement (TID inFeatureID, streampos inPosition, ostream &inOpenOffsetFile) |
add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction) | |
CDocumentFrequencyList * | getFeatureFile (string inFileName) const |
loads a *.fts file. | |
Protected Attributes | |
CMutex | mMutex |
the mutex for multi threading | |
CSelfDestroyPointer< CAcURL2FTS > | mURL2FTS |
In order to have just one parent, I have to limit myself to single inheritance. | |
TID | mMaximumFeatureID |
the maximum feature ID arising in this file | |
string | mInvertedFileBuffer |
A buffer, if the inverted file is to be held in ram. | |
string | mTemporaryIndexingFileBase |
Some place for putting temporary indexing data. | |
CSelfDestroyPointer< istream > | mInvertedFile |
The inverted file. | |
ifstream | mOffsetFile |
Feature -> Offset in inverted file. | |
ifstream | mFeatureDescriptionFile |
File of feature descriptions. | |
string | mInvertedFileName |
Name of the inverted file. | |
string | mOffsetFileName |
Name of the Offset file. | |
string | mFeatureDescriptionFileName |
Name for the file with the feature description. | |
CIDToOffset | mIDToOffset |
map from feature id to the offset for this feature | |
HASH_MAP< TID, double > | mFeatureToCollectionFrequency |
map from feature to the collection frequency | |
for fast access... | |
HASH_MAP< TID, unsigned int > | mFeatureDescription |
map from the feature ID to the feature description | |
CADIHash | mDocumentInformation |
additional information about the document like, e.g. the Euclidean length of the feature list. |
This access is done "by hand".
For a long time we wanted to move to memory mapped files (like SWISH++) but currently I think this is not the best idea.
|
This opens an existing inverted file, and then initializes this structure. After that it is fully usable. As a parameter it takes an XMLElement which contains a "collection" element and its content. If the attribute cui-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. In that case you will NOT be able to use *this afterwards. Like every accessor, this accessor takes a <collection /> MRML element as input (
|
|
Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?
Reimplemented from CAcInvertedFile. |
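The check described above can be sketched as follows. This is only an illustration, assuming the document frequency list of one feature is already in memory as a vector. `Posting`, `containsWithFrequency` and the tolerance parameter are hypothetical names that do not appear in the class, and the real method works on the inverted file stream.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for one entry of a document frequency list.
struct Posting {
    long documentID;
    double documentFrequency;
};

// Returns true if inDocumentID occurs in inList and the stored frequency
// matches inDocumentFrequency up to inTolerance. The tolerance is an
// assumption of this sketch; the original may compare exactly.
bool containsWithFrequency(const std::vector<Posting>& inList,
                           long inDocumentID,
                           double inDocumentFrequency,
                           double inTolerance = 1e-9) {
    assert(inTolerance >= 0.0);
    for (const Posting& p : inList) {
        if (p.documentID == inDocumentID) {
            return std::fabs(p.documentFrequency - inDocumentFrequency) <= inTolerance;
        }
    }
    return false;  // document not present in this feature's list
}
```

A check of this kind is useful as a building block for a consistency test: every (feature, document, frequency) triple known from the feature files should be found again in the inverted file.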
|
Generating an inverted File, if there is none. Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping is the result, virtually halting the inverted file creation. Implements CAcInvertedFile. |
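To illustrate why memory is the limiting factor, here is a minimal sketch of an in-memory inversion. It is not the code of this class. `Document`, `InvertedIndex`, `invertInMemory` and the two ID aliases are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using FeatureID = long;   // hypothetical alias for the feature ID type
using DocumentID = long;  // hypothetical alias for the document ID type

// One document: its ID and its (feature ID, document frequency) pairs.
struct Document {
    DocumentID id;
    std::vector<std::pair<FeatureID, double>> features;
};

// For each feature, the list of (document ID, document frequency) pairs.
using InvertedIndex =
    std::map<FeatureID, std::vector<std::pair<DocumentID, double>>>;

// Builds the whole index in RAM. Every posting of every document is held
// in memory at once, so the collection has to fit for this to be fast.
InvertedIndex invertInMemory(const std::vector<Document>& inDocuments) {
    InvertedIndex index;
    for (const Document& doc : inDocuments) {
        for (const auto& fd : doc.features) {
            assert(fd.second >= 0.0);  // assumption: frequencies are non-negative
            index[fd.first].emplace_back(doc.id, fd.second);
        }
    }
    return index;
}
```

Nothing is written out until the map is complete, which is what makes the approach fast for small collections and prone to swapping for large ones.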
|
Getting a list of all features contained in this. This function is necessary, because in the present system only about 50 percent of the features are really used. A feature is considered used if it arises in mIDToOffset. Implements CAcInvertedFile. |
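The rule "a feature is used if it arises in mIDToOffset" amounts to collecting the keys of the offset map. A minimal sketch, assuming `std::unordered_map` in place of HASH_MAP and a plain integer in place of streampos (both are assumptions, as are the names `usedFeatureIDs`, `FeatureID` and `Offset`):

```cpp
#include <cassert>
#include <list>
#include <unordered_map>

using FeatureID = long;    // hypothetical alias for the feature ID type
using Offset = long long;  // hypothetical stand-in for streampos

// Collects the IDs of all features that have an entry in the offset map,
// that is, all features that are actually used. The result is sorted so
// that it does not depend on the iteration order of the hash map.
std::list<FeatureID> usedFeatureIDs(
    const std::unordered_map<FeatureID, Offset>& inIDToOffset) {
    std::list<FeatureID> result;
    for (const auto& entry : inIDToOffset) {
        result.push_back(entry.first);
    }
    result.sort();
    assert(result.size() == inIDToOffset.size());
    return result;
}
```

The sketch returns the list by value, whereas the member documented here returns a pointer to a list.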
|
Loads a *.fts file and returns the feature list. Reimplemented from CAcInvertedFile. |
|
For drawing random sets. Why is this part of a CAccessorImplementation? The way the accessor is organised might influence the way random sets can be drawn. At present everything happens in RAM, but we do not want to be fixed on that.
Implements CAccessor. |
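As an illustration of the in-RAM case mentioned above, a random set can be drawn by shuffling a copy of all IDs and keeping the first few. This is a sketch under that assumption, not the implementation of this class. `drawRandomIDs` and `DocumentID` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <random>
#include <vector>

using DocumentID = long;  // hypothetical alias for the document ID type

// Draws up to inSize distinct IDs at random from inAllIDs and appends them
// to outResult. The whole ID list is copied into RAM for the shuffle.
void drawRandomIDs(const std::list<DocumentID>& inAllIDs,
                   std::list<DocumentID>& outResult,
                   std::list<DocumentID>::size_type inSize,
                   std::mt19937& inGenerator) {
    std::vector<DocumentID> pool(inAllIDs.begin(), inAllIDs.end());
    std::shuffle(pool.begin(), pool.end(), inGenerator);
    if (inSize > pool.size()) {
        inSize = pool.size();  // cannot return more IDs than exist
    }
    assert(inSize <= pool.size());
    outResult.insert(outResult.end(), pool.begin(),
                     pool.begin() + static_cast<std::ptrdiff_t>(inSize));
}
```

An accessor that does not keep its ID list in memory would need a different strategy, which is one reason to leave the drawing to the accessor.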
|
Get a given number of random CAccessorElements.
Implements CAccessor. |
|
Translate a DocumentID to an accessor element. Implements CAccessor. |
|
Generating an inverted File, if there is none. Employs the two-way merge method described in "Managing Gigabytes", chapter 5.2, "Sort-based inversion" (page 181). Reimplemented from CAcInvertedFile. |
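The general idea of sort-based inversion with two-way merging can be sketched as follows: sort runs of (feature, document, frequency) triples that fit in memory, then merge the sorted runs pairwise until one run is left. This is a simplified illustration, not the algorithm of the book verbatim and not the code of this class. The runs are kept in vectors rather than in temporary files, and `Triple`, `tripleLess`, `sortRun`, `mergeRuns` and `mergeAllRuns` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical record: one (feature, document, frequency) triple.
struct Triple {
    long featureID;
    long documentID;
    double documentFrequency;
};

// Order by feature ID first, then by document ID.
bool tripleLess(const Triple& a, const Triple& b) {
    if (a.featureID != b.featureID) return a.featureID < b.featureID;
    return a.documentID < b.documentID;
}

// Sorts one run that is small enough to fit in memory.
void sortRun(std::vector<Triple>& ioRun) {
    std::sort(ioRun.begin(), ioRun.end(), tripleLess);
}

// Merges two sorted runs into one sorted run (one two-way merge step).
std::vector<Triple> mergeRuns(const std::vector<Triple>& inA,
                              const std::vector<Triple>& inB) {
    assert(std::is_sorted(inA.begin(), inA.end(), tripleLess));
    assert(std::is_sorted(inB.begin(), inB.end(), tripleLess));
    std::vector<Triple> merged;
    merged.reserve(inA.size() + inB.size());
    std::merge(inA.begin(), inA.end(), inB.begin(), inB.end(),
               std::back_inserter(merged), tripleLess);
    return merged;
}

// Repeats two-way merges until a single sorted run is left.
std::vector<Triple> mergeAllRuns(std::vector<std::vector<Triple>> inRuns) {
    while (inRuns.size() > 1) {
        std::vector<std::vector<Triple>> next;
        for (std::size_t i = 0; i + 1 < inRuns.size(); i += 2) {
            next.push_back(mergeRuns(inRuns[i], inRuns[i + 1]));
        }
        if (inRuns.size() % 2 == 1) {
            next.push_back(inRuns.back());  // odd run is carried over unchanged
        }
        inRuns.swap(next);
    }
    return inRuns.empty() ? std::vector<Triple>() : inRuns.front();
}
```

In the final sorted run all triples of one feature are adjacent, so the document frequency list of each feature can be written out in a single sequential pass.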
|
Translate a URL to its document ID. Implements CAcInvertedFile. |
|
Additional information about the document, e.g. the Euclidean length of the feature list. Reimplemented from CAcInvertedFile. |
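For reference, the Euclidean length of a feature list can be computed as the square root of the sum of its squared document frequencies. The sketch below assumes that this is the quantity meant here and that the square sum corresponds to what DIDToDFSquareSum returns. Both are assumptions, and `dfSquareSum` and `euclideanLength` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of the squared document frequencies of one document's feature list.
double dfSquareSum(const std::vector<double>& inDocumentFrequencies) {
    double sum = 0.0;
    for (double df : inDocumentFrequencies) {
        sum += df * df;
    }
    return sum;
}

// Euclidean length of the feature list: square root of the square sum.
double euclideanLength(const std::vector<double>& inDocumentFrequencies) {
    const double sum = dfSquareSum(inDocumentFrequencies);
    assert(sum >= 0.0);
    return std::sqrt(sum);
}
```

Storing such per-document values once at indexing time avoids re-reading the whole feature list of a document at query time.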
|
In order to have just one parent, I have to limit myself to single inheritance. I cannot use virtual base classes, because then I cannot downcast. |