writer2latex.office

Class OfficeReader


public class OfficeReader
extends java.lang.Object

This class reads and collects global information about an OOo (OpenOffice.org) document. This includes styles, forms, and information about indexes, references, etc.

Constructor Summary

OfficeReader(OfficeDocument oooDoc, boolean bAllParagraphsAreSoft)
Constructor; read a document

Method Summary

void
addFigureSequenceName(String sName)
Add a sequence name for figure captions.

OpenDocument has a very weak notion of figure captions: A caption is a paragraph containing a text:sequence element.

void
addTableSequenceName(String sName)
Add a sequence name for table captions.

OpenDocument has a very weak notion of table captions: A caption is a paragraph containing a text:sequence element.

boolean
bookmarkInHeading(String sName)
Is this bookmark contained in a heading?
StyleWithProperties
getCellStyle(String sName)
OfficeStyleFamily
getCellStyles()
static int
getCharacterCount(Node node)
Counts the number of characters in the text nodes of this element, excluding footnotes etc.
StyleWithProperties
getColumnStyle(String sName)
OfficeStyleFamily
getColumnStyles()
Element
getContent()
Get the content element

In the old (OOo 1.0) file format, this means the office:body element.

In the OpenDocument format, this means an office:text, office:spreadsheet or office:presentation element.

StyleWithProperties
getDefaultCellStyle()
StyleWithProperties
getDefaultDrawingPageStyle()
StyleWithProperties
getDefaultFrameStyle()
StyleWithProperties
getDefaultParStyle()
StyleWithProperties
getDefaultPresentationStyle()
StyleWithProperties
getDrawingPageStyle(String sName)
OfficeStyleFamily
getDrawingPageStyles()
PropertySet
getEndnotesConfiguration()
MasterPage
getFirstMasterPage()
Returns the first master page used in the document.
FontDeclaration
getFontDeclaration(String sName)
Get a specific font declaration
OfficeStyleFamily
getFontDeclarations()
Get the collection of all font declarations.
PropertySet
getFootnotesConfiguration()
FormsReader
getForms()
Get the forms belonging to this document.
StyleWithProperties
getFrameStyle(String sName)
OfficeStyleFamily
getFrameStyles()
StyleWithProperties
getHeadingStyle(int nLevel)
Returns the paragraph style associated with headings of a specific level.
ListStyle
getListStyle(String sName)
OfficeStyleFamily
getListStyles()
String
getMajorityLanguage()
Return the ISO language used in most paragraph styles (in a well-structured document, this will be the default language). TODO: Base on content rather than style
MasterPage
getMasterPage(String sName)
OfficeStyleFamily
getMasterPages()
static char
getNextChar(Node node)
Return the next character in logical order
ListStyle
getOutlineStyle()
PageLayout
getPageLayout(String sName)
OfficeStyleFamily
getPageLayouts()
StyleWithProperties
getParStyle(String sName)
OfficeStyleFamily
getParStyles()
StyleWithProperties
getPresentationStyle(String sName)
OfficeStyleFamily
getPresentationStyles()
StyleWithProperties
getRowStyle(String sName)
OfficeStyleFamily
getRowStyles()
StyleWithProperties
getSectionStyle(String sName)
OfficeStyleFamily
getSectionStyles()
String
getSequenceFromRef(String sRefName)
Get the sequence name associated with a reference name
String
getSequenceName(Element par)
Get the sequence name associated with a paragraph
TableReader
getTableReader(Element node)
Read a table from a table:table node
StyleWithProperties
getTableStyle(String sName)
OfficeStyleFamily
getTableStyles()
String
getTextContent(Node node)
StyleWithProperties
getTextStyle(String sName)
OfficeStyleFamily
getTextStyles()
TocReader
getTocReader(Element onode)
Returns a reader for a specific table of contents
boolean
hasBookmarkRefTo(String sName)
Is there a reference to this bookmark?
boolean
hasEndnoteRefTo(String sId)
Is there a reference to this endnote?
boolean
hasFootnoteRefTo(String sId)
Is there a reference to this footnote id?
boolean
hasLinkTo(String sName)
Is there a link to this sequence anchor name?
boolean
hasReferenceRefTo(String sName)
Is there a reference to this reference mark?
boolean
hasSequenceRefTo(String sId)
Is there a reference to this sequence field?
static boolean
isDrawElement(Node node)
Checks if a node is an element in the draw namespace
boolean
isFigureSequenceName(String sName)
Does this sequence name belong to a list of figures?
boolean
isInPackage(String sUrl)
Checks whether this URL is internal to the package
boolean
isIndexSourceStyle(String sStyleName)
Is this style used in some table of contents as an index source style?
static boolean
isNoteElement(Node node)
Checks if a node is an element representing a note (footnote/endnote)
boolean
isOpenDocument()
Is this an OASIS OpenDocument or an OOo 1.0 document?
boolean
isPackageFormat()
Checks whether or not this document is in package format
boolean
isPresentation()
Is this a presentation document?
static boolean
isSingleParagraph(Node node)
Checks if this node contains at most one child element and, if it contains one, that this element is a paragraph.
boolean
isSpreadsheet()
Is this a spreadsheet document?
static boolean
isTableElement(Node node)
Checks if a node is an element in the table namespace
boolean
isTableSequenceName(String sName)
Does this sequence name belong to a list of tables?
boolean
isText()
Is this a text document?
static boolean
isTextElement(Node node)
Checks if a node is an element in the text namespace
static boolean
isWhitespace(String s)
Checks if this text is whitespace
static boolean
isWhitespaceContent(Node node)
Checks if the only text content of this node is whitespace
boolean
referenceMarkInHeading(String sName)
Is this reference mark contained in a heading?

Constructor Details

OfficeReader

public OfficeReader(OfficeDocument oooDoc,
                    boolean bAllParagraphsAreSoft)
Constructor; read a document

Method Details

addFigureSequenceName

public void addFigureSequenceName(String sName)
Add a sequence name for figure captions.

OpenDocument has a very weak notion of figure captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source for identifying which sequence is used for figure captions is the list(s) of figures. If there is no list of figures, captions cannot be identified. This method therefore lets the user add a sequence name that identifies figure captions.

Parameters:
sName - the name to add
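
To make the caption rule concrete, here is a minimal Java sketch using the standard org.w3c.dom API. It assumes ODF 1.x namespaces and that the first text:sequence in a paragraph decides whether it is a caption; the class CaptionSketch and its helper parseRoot are hypothetical and not part of writer2latex.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not writer2latex code: a paragraph counts as a figure
// caption if it contains a text:sequence whose text:name was registered.
final class CaptionSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private final Set<String> figureSequenceNames = new HashSet<>();

    // Register a sequence name (e.g. "Illustration") as marking figure captions.
    void addFigureSequenceName(String sName) {
        figureSequenceNames.add(sName);
    }

    // Return the text:name of the first text:sequence inside the paragraph, or null.
    static String getSequenceName(Element par) {
        NodeList seqs = par.getElementsByTagNameNS(TEXT_NS, "sequence");
        if (seqs.getLength() == 0) {
            return null;
        }
        return ((Element) seqs.item(0)).getAttributeNS(TEXT_NS, "name");
    }

    boolean isFigureCaption(Element par) {
        String name = getSequenceName(par);
        return name != null && figureSequenceNames.contains(name);
    }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

Without a registered name, a paragraph with a text:sequence is indistinguishable from any other numbered paragraph, which is why the registration step exists.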

addTableSequenceName

public void addTableSequenceName(String sName)
Add a sequence name for table captions.

OpenDocument has a very weak notion of table captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source for identifying which sequence is used for table captions is the list(s) of tables. If there is no list of tables, captions cannot be identified. This method therefore lets the user add a sequence name that identifies table captions.

Parameters:
sName - the name to add

bookmarkInHeading

public boolean bookmarkInHeading(String sName)
Is this bookmark contained in a heading?
Parameters:
sName - the name of the bookmark
Returns:
true if so

getCellStyle

public StyleWithProperties getCellStyle(String sName)

getCellStyles

public OfficeStyleFamily getCellStyles()

getCharacterCount

public static int getCharacterCount(Node node)
Counts the number of characters in the text nodes of this element, excluding footnotes etc.
Parameters:
node - the node to count in
Returns:
the number of characters
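
The counting rule can be sketched as a recursive walk over the DOM. This is a hypothetical illustration, not the writer2latex implementation: it assumes ODF 1.x, treats only text:note as a note element, and does not expand space elements such as text:s, which a complete version would probably need to handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch, not writer2latex code: sums the lengths of text nodes,
// recursing into child elements but skipping ODF note elements (text:note).
final class CharCountSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    static boolean isNoteElement(Node node) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && TEXT_NS.equals(node.getNamespaceURI())
                && "note".equals(node.getLocalName());
    }

    static int getCharacterCount(Node node) {
        int count = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                count += child.getNodeValue().length();
            } else if (child.getNodeType() == Node.ELEMENT_NODE && !isNoteElement(child)) {
                count += getCharacterCount(child); // e.g. text:span
            }
        }
        return count;
    }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```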

getColumnStyle

public StyleWithProperties getColumnStyle(String sName)

getColumnStyles

public OfficeStyleFamily getColumnStyles()

getContent

public Element getContent()
Get the content element

In the old (OOo 1.0) file format, this means the office:body element.

In the OpenDocument format, this means an office:text, office:spreadsheet or office:presentation element.

Returns:
the content Element
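
The following sketch shows one way to locate the content element in both formats. The two office namespace URIs are assumptions stated in the code (ODF 1.x and OOo 1.0); the class ContentSketch and its methods are hypothetical, and the real method may work differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch, not writer2latex code. Assumed namespace URIs:
// ODF 1.x office namespace and the OOo 1.0 office namespace.
final class ContentSketch {
    static final String ODF_OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String OOO_OFFICE_NS = "http://openoffice.org/2000/office";

    // OOo 1.0: return office:body. ODF: return the office:text,
    // office:spreadsheet or office:presentation child of office:body.
    static Element getContent(Document dom) {
        Element root = dom.getDocumentElement();
        boolean isOpenDocument = ODF_OFFICE_NS.equals(root.getNamespaceURI());
        String ns = isOpenDocument ? ODF_OFFICE_NS : OOO_OFFICE_NS;
        Element body = firstChildElement(root, ns, "body");
        if (body == null || !isOpenDocument) {
            return body;
        }
        for (String kind : new String[] {"text", "spreadsheet", "presentation"}) {
            Element content = firstChildElement(body, ns, kind);
            if (content != null) {
                return content;
            }
        }
        return null;
    }

    private static Element firstChildElement(Element parent, String ns, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ns.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Helper for the example: parse an XML string namespace-aware.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```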

getDefaultCellStyle

public StyleWithProperties getDefaultCellStyle()

getDefaultDrawingPageStyle

public StyleWithProperties getDefaultDrawingPageStyle()

getDefaultFrameStyle

public StyleWithProperties getDefaultFrameStyle()

getDefaultParStyle

public StyleWithProperties getDefaultParStyle()

getDefaultPresentationStyle

public StyleWithProperties getDefaultPresentationStyle()

getDrawingPageStyle

public StyleWithProperties getDrawingPageStyle(String sName)

getDrawingPageStyles

public OfficeStyleFamily getDrawingPageStyles()

getEndnotesConfiguration

public PropertySet getEndnotesConfiguration()

getFirstMasterPage

public MasterPage getFirstMasterPage()
Returns the first master page used in the document. If no master page is used explicitly, the first master page found in the styles is returned. Returns null if no master pages exist.
Returns:
a MasterPage object representing the master page

getFontDeclaration

public FontDeclaration getFontDeclaration(String sName)
Get a specific font declaration
Parameters:
sName - the name of the font declaration
Returns:
a FontDeclaration representing the font

getFontDeclarations

public OfficeStyleFamily getFontDeclarations()
Get the collection of all font declarations.
Returns:
the OfficeStyleFamily of font declarations

getFootnotesConfiguration

public PropertySet getFootnotesConfiguration()

getForms

public FormsReader getForms()
Get the forms belonging to this document.
Returns:
a FormsReader representing the forms

getFrameStyle

public StyleWithProperties getFrameStyle(String sName)

getFrameStyles

public OfficeStyleFamily getFrameStyles()

getHeadingStyle

public StyleWithProperties getHeadingStyle(int nLevel)
Returns the paragraph style associated with headings of a specific level. Returns null if no such style is known.

In principle, a different style can be used for each heading; in practice, the same (soft) style is used for all headings of a specific level.

Parameters:
nLevel - the level of the heading
Returns:
a StyleWithProperties object representing the style

getListStyle

public ListStyle getListStyle(String sName)

getListStyles

public OfficeStyleFamily getListStyles()

getMajorityLanguage

public String getMajorityLanguage()
Return the ISO language used in most paragraph styles (in a well-structured document, this will be the default language). TODO: Base on content rather than style
Returns:
the ISO language
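
A majority vote over style languages could look like this. The sketch assumes each paragraph style's language is already available as a string (for example taken from its fo:language property) and that ties go to the language that reached the highest count first; both the class MajorityLanguageSketch and the tie rule are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not writer2latex code. Input: the language of each
// paragraph style, with null where a style sets no language.
final class MajorityLanguageSketch {
    static String getMajorityLanguage(List<String> styleLanguages) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String lang : styleLanguages) {
            if (lang == null) {
                continue; // styles without a language do not vote
            }
            int count = counts.merge(lang, 1, Integer::sum);
            // Strictly greater: on a tie, the earlier leader is kept.
            if (count > bestCount) {
                best = lang;
                bestCount = count;
            }
        }
        return best; // null if no style has a language
    }
}
```

As the TODO notes, counting styles rather than content means a document with many unused styles in another language could skew the result.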

getMasterPage

public MasterPage getMasterPage(String sName)

getMasterPages

public OfficeStyleFamily getMasterPages()

getNextChar

public static char getNextChar(Node node)
Return the next character in logical order

getOutlineStyle

public ListStyle getOutlineStyle()

getPageLayout

public PageLayout getPageLayout(String sName)

getPageLayouts

public OfficeStyleFamily getPageLayouts()

getParStyle

public StyleWithProperties getParStyle(String sName)

getParStyles

public OfficeStyleFamily getParStyles()

getPresentationStyle

public StyleWithProperties getPresentationStyle(String sName)

getPresentationStyles

public OfficeStyleFamily getPresentationStyles()

getRowStyle

public StyleWithProperties getRowStyle(String sName)

getRowStyles

public OfficeStyleFamily getRowStyles()

getSectionStyle

public StyleWithProperties getSectionStyle(String sName)

getSectionStyles

public OfficeStyleFamily getSectionStyles()

getSequenceFromRef

public String getSequenceFromRef(String sRefName)
Get the sequence name associated with a reference name
Parameters:
sRefName - the reference name to use
Returns:
the sequence name or null
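
The mapping behind getSequenceFromRef can be sketched by scanning text:sequence elements, which in ODF carry a text:name (the sequence) and an optional text:ref-name (the target of cross-references). The class SequenceRefSketch is hypothetical and assumes ODF 1.x namespaces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not writer2latex code: map reference names to sequence names.
final class SequenceRefSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private final Map<String, String> refToSequence = new HashMap<>();

    // Scan all text:sequence fields once; only those with a text:ref-name can be referenced.
    SequenceRefSketch(Element content) {
        NodeList seqs = content.getElementsByTagNameNS(TEXT_NS, "sequence");
        for (int i = 0; i < seqs.getLength(); i++) {
            Element seq = (Element) seqs.item(i);
            String refName = seq.getAttributeNS(TEXT_NS, "ref-name");
            if (!refName.isEmpty()) {
                refToSequence.put(refName, seq.getAttributeNS(TEXT_NS, "name"));
            }
        }
    }

    // Sequence name for a reference name, or null if unknown.
    String getSequenceFromRef(String sRefName) {
        return refToSequence.get(sRefName);
    }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```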

getSequenceName

public String getSequenceName(Element par)
Get the sequence name associated with a paragraph
Parameters:
par - the paragraph to look up
Returns:
the sequence name or null

getTableReader

public TableReader getTableReader(Element node)
Read a table from a table:table node
Parameters:
node - the table:table Element node
Returns:
a TableReader object representing the table

getTableStyle

public StyleWithProperties getTableStyle(String sName)

getTableStyles

public OfficeStyleFamily getTableStyles()

getTextContent

public String getTextContent(Node node)

getTextStyle

public StyleWithProperties getTextStyle(String sName)

getTextStyles

public OfficeStyleFamily getTextStyles()

getTocReader

public TocReader getTocReader(Element onode)
Returns a reader for a specific table of contents
Parameters:
onode - the text:table-of-content node
Returns:
the reader, or null

hasBookmarkRefTo

public boolean hasBookmarkRefTo(String sName)
Is there a reference to this bookmark?
Parameters:
sName - the name of the bookmark
Returns:
true if there is a reference

hasEndnoteRefTo

public boolean hasEndnoteRefTo(String sId)
Is there a reference to this endnote?
Parameters:
sId - the id of the endnote
Returns:
true if there is a reference

hasFootnoteRefTo

public boolean hasFootnoteRefTo(String sId)
Is there a reference to this footnote id?
Parameters:
sId - the id of the footnote
Returns:
true if there is a reference

hasLinkTo

public boolean hasLinkTo(String sName)
Is there a link to this sequence anchor name?
Parameters:
sName - the name of the anchor
Returns:
true if there is a link

hasReferenceRefTo

public boolean hasReferenceRefTo(String sName)
Is there a reference to this reference mark?
Parameters:
sName - the name of the reference mark
Returns:
true if there is a reference

hasSequenceRefTo

public boolean hasSequenceRefTo(String sId)
Is there a reference to this sequence field?
Parameters:
sId - the id of the sequence field
Returns:
true if there is a reference
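
The has...RefTo queries above can be answered by collecting referenced names in one pass. This sketch assumes the ODF 1.x reference fields text:bookmark-ref, text:reference-ref, text:note-ref and text:sequence-ref, each pointing to its target through text:ref-name, with text:note-class telling footnote and endnote references apart. RefIndexSketch is a hypothetical class, not writer2latex code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not writer2latex code: index the targets of ODF 1.x
// reference fields once, then answer "is there a reference to X?" queries.
final class RefIndexSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private final Set<String> bookmarkRefs = new HashSet<>();
    private final Set<String> referenceRefs = new HashSet<>();
    private final Set<String> sequenceRefs = new HashSet<>();
    private final Set<String> footnoteRefs = new HashSet<>();
    private final Set<String> endnoteRefs = new HashSet<>();

    RefIndexSketch(Element content) {
        collect(content, "bookmark-ref", bookmarkRefs);
        collect(content, "reference-ref", referenceRefs);
        collect(content, "sequence-ref", sequenceRefs);
        // text:note-ref covers both note kinds; text:note-class tells them apart.
        NodeList noteRefs = content.getElementsByTagNameNS(TEXT_NS, "note-ref");
        for (int i = 0; i < noteRefs.getLength(); i++) {
            Element ref = (Element) noteRefs.item(i);
            String target = ref.getAttributeNS(TEXT_NS, "ref-name");
            if ("endnote".equals(ref.getAttributeNS(TEXT_NS, "note-class"))) {
                endnoteRefs.add(target);
            } else {
                footnoteRefs.add(target);
            }
        }
    }

    private static void collect(Element content, String localName, Set<String> into) {
        NodeList refs = content.getElementsByTagNameNS(TEXT_NS, localName);
        for (int i = 0; i < refs.getLength(); i++) {
            into.add(((Element) refs.item(i)).getAttributeNS(TEXT_NS, "ref-name"));
        }
    }

    boolean hasBookmarkRefTo(String sName) { return bookmarkRefs.contains(sName); }
    boolean hasReferenceRefTo(String sName) { return referenceRefs.contains(sName); }
    boolean hasSequenceRefTo(String sId) { return sequenceRefs.contains(sId); }
    boolean hasFootnoteRefTo(String sId) { return footnoteRefs.contains(sId); }
    boolean hasEndnoteRefTo(String sId) { return endnoteRefs.contains(sId); }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

Indexing once keeps each query a constant-time set lookup, which matters if a converter asks about every bookmark and note it emits.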

isDrawElement

public static boolean isDrawElement(Node node)
Checks if a node is an element in the draw namespace
Parameters:
node - the node to check
Returns:
true if this is a draw element
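
Namespace checks like isDrawElement, isTableElement and isTextElement can be sketched with namespace-aware DOM parsing. This sketch compares ODF 1.x namespace URIs; writer2latex itself may compare qualified-name prefixes such as draw: instead, so treat NamespaceSketch as hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch, not writer2latex code: namespace membership via URI comparison.
final class NamespaceSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";

    // Non-element nodes (text, comments) never belong to a namespace here.
    private static boolean isElementIn(Node node, String namespaceUri) {
        return node != null
                && node.getNodeType() == Node.ELEMENT_NODE
                && namespaceUri.equals(node.getNamespaceURI());
    }

    static boolean isTextElement(Node node) { return isElementIn(node, TEXT_NS); }
    static boolean isTableElement(Node node) { return isElementIn(node, TABLE_NS); }
    static boolean isDrawElement(Node node) { return isElementIn(node, DRAW_NS); }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```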

isFigureSequenceName

public boolean isFigureSequenceName(String sName)
Does this sequence name belong to a list of figures?
Parameters:
sName - the name of the sequence
Returns:
true if it belongs to an index

isInPackage

public boolean isInPackage(String sUrl)
Checks whether this URL is internal to the package
Parameters:
sUrl - the URL to check
Returns:
true if the URL is internal to the package

isIndexSourceStyle

public boolean isIndexSourceStyle(String sStyleName)
Is this style used in some table of contents as an index source style?
Parameters:
sStyleName - the name of the style
Returns:
true if this is an index source style

isNoteElement

public static boolean isNoteElement(Node node)
Checks if a node is an element representing a note (footnote/endnote)
Parameters:
node - the node to check
Returns:
true if this is a note element

isOpenDocument

public boolean isOpenDocument()
Is this an OASIS OpenDocument or an OOo 1.0 document?
Returns:
true if it's an OASIS OpenDocument

isPackageFormat

public boolean isPackageFormat()
Checks whether or not this document is in package format
Returns:
true if it's in package format

isPresentation

public boolean isPresentation()
Is this a presentation document?
Returns:
true if it's a presentation document

isSingleParagraph

public static boolean isSingleParagraph(Node node)
Checks if this node contains at most one child element and, if it contains one, that this element is a paragraph.
Parameters:
node - the node to check
Returns:
true if the node contains a single paragraph or nothing
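
A sketch of the single-paragraph test, following the contract stated above (a node without child elements also counts). ParagraphSketch is hypothetical and assumes the ODF 1.x text namespace; text nodes between elements are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch, not writer2latex code.
final class ParagraphSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // True if the node has no child elements, or exactly one child element that is text:p.
    static boolean isSingleParagraph(Node node) {
        int elementCount = 0;
        Node lastElement = null; // equals the only element when elementCount == 1
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                elementCount++;
                lastElement = child;
            }
        }
        if (elementCount == 0) {
            return true;
        }
        return elementCount == 1
                && TEXT_NS.equals(lastElement.getNamespaceURI())
                && "p".equals(lastElement.getLocalName());
    }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```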

isSpreadsheet

public boolean isSpreadsheet()
Is this a spreadsheet document?
Returns:
true if it's a spreadsheet document

isTableElement

public static boolean isTableElement(Node node)
Checks if a node is an element in the table namespace
Parameters:
node - the node to check
Returns:
true if this is a table element

isTableSequenceName

public boolean isTableSequenceName(String sName)
Does this sequence name belong to a list of tables?
Parameters:
sName - the name of the sequence
Returns:
true if it belongs to an index

isText

public boolean isText()
Is this a text document?
Returns:
true if it's a text document

isTextElement

public static boolean isTextElement(Node node)
Checks if a node is an element in the text namespace
Parameters:
node - the node to check
Returns:
true if this is a text element

isWhitespace

public static boolean isWhitespace(String s)
Checks if this text is whitespace
Parameters:
s - the String to check
Returns:
true if the String contains whitespace only

isWhitespaceContent

public static boolean isWhitespaceContent(Node node)
Checks if the only text content of this node is whitespace
Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
Returns:
true if the node contains whitespace only
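
The two whitespace checks can be sketched as follows. isWhitespace relies on Character.isWhitespace, so under this sketch an empty string counts as whitespace; the rule that isWhitespaceContent ignores note elements and nodes other than text and elements is an assumption made for this hypothetical WhitespaceSketch class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch, not writer2latex code.
final class WhitespaceSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // True if every character is whitespace per Character.isWhitespace (so "" is true).
    static boolean isWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumption: text:note elements and nodes that are neither text nor elements are ignored.
    static boolean isWhitespaceContent(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return isWhitespace(node.getNodeValue());
        }
        if (node.getNodeType() != Node.ELEMENT_NODE
                || (TEXT_NS.equals(node.getNamespaceURI()) && "note".equals(node.getLocalName()))) {
            return true;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!isWhitespaceContent(child)) {
                return false;
            }
        }
        return true;
    }

    // Helper for the example: parse an XML string namespace-aware and return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```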

referenceMarkInHeading

public boolean referenceMarkInHeading(String sName)
Is this reference mark contained in a heading?
Parameters:
sName - the name of the reference mark
Returns:
true if so