java.lang.Object
au.id.jericho.lib.html.OutputDocument
An OutputDocument represents an original Source document that has been modified by substituting segments of it with other text.
Each of these substitutions must be registered in the output document, which is most commonly done using the various replace, remove or insert methods in this class. These methods internally register one or more OutputSegment objects to define each substitution.
After all of the substitutions have been registered, the modified text can be retrieved using the writeTo(Writer) or toString() methods.
The registered output segments must not overlap each other, but may be adjacent. Multiple output segments may be added at the same begin position provided that they are all zero-length, except for at most one segment, which may end at a different position.
For efficiency reasons, violations of these rules do not throw an exception when the segment is registered; instead, an OverlappingOutputSegmentsException is thrown when the output is generated.
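The rules above can be made concrete with a small, self-contained model. The sketch below is hypothetical and is not Jericho's implementation: the class SketchOutputDocument and its Substitution record are invented for illustration, insert is assumed to be a zero-length replacement, IllegalStateException stands in for OverlappingOutputSegmentsException, and segments sharing a begin position are assumed to be output shortest first. As described, overlaps are only detected when the output is generated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the substitution rules; NOT Jericho's implementation.
// Each substitution replaces source[begin, end) with text.
final class SketchOutputDocument {
    private record Substitution(int begin, int end, String text) {}

    private final String source;
    private final List<Substitution> substitutions = new ArrayList<>();

    SketchOutputDocument(String source) {
        this.source = source;
    }

    // Registration does no checking; a null text is treated as "" (a removal).
    void replace(int begin, int end, CharSequence text) {
        substitutions.add(new Substitution(begin, end, text == null ? "" : text.toString()));
    }

    // Assumption: an insertion is a zero-length substitution at pos.
    void insert(int pos, CharSequence text) {
        replace(pos, pos, text);
    }

    // Overlaps are detected only here, when the output is generated.
    @Override
    public String toString() {
        List<Substitution> sorted = new ArrayList<>(substitutions);
        // Stable sort by begin, then end, so zero-length segments at a position come first.
        sorted.sort(Comparator.comparingInt(Substitution::begin)
                .thenComparingInt(Substitution::end));
        StringBuilder out = new StringBuilder();
        int pos = 0; // end of the source text consumed so far
        for (Substitution s : sorted) {
            if (s.begin() < pos) {
                // stand-in for OverlappingOutputSegmentsException
                throw new IllegalStateException("overlapping output segments at " + s.begin());
            }
            out.append(source, pos, s.begin()).append(s.text());
            pos = s.end();
        }
        return out.append(source, pos, source.length()).toString();
    }
}
```

Deferring the check to output time keeps each registration cheap; once the segments are sorted, a single pass that tracks the end of the last consumed segment is enough to detect any overlap, while adjacent segments and zero-length insertions at a shared position pass the check.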
The following example converts all externally referenced style sheets to internal style sheets:
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Iterator;
import java.util.List;
import au.id.jericho.lib.html.*;

URL sourceUrl=new URL(sourceUrlString);
String htmlText=Util.getString(new InputStreamReader(sourceUrl.openStream()));
Source source=new Source(htmlText);
OutputDocument outputDocument=new OutputDocument(source);
StringBuffer sb=new StringBuffer();
List linkStartTags=source.findAllStartTags(Tag.LINK);
for (Iterator i=linkStartTags.iterator(); i.hasNext();) {
    StartTag startTag=(StartTag)i.next();
    Attributes attributes=startTag.getAttributes();
    // only convert <link rel="stylesheet"> tags that have an href attribute
    String rel=attributes.getValue("rel");
    if (!"stylesheet".equalsIgnoreCase(rel)) continue;
    String href=attributes.getValue("href");
    if (href==null) continue;
    String styleSheetContent;
    try {
        // resolve href relative to the page URL and read the style sheet
        styleSheetContent=Util.getString(new InputStreamReader(new URL(sourceUrl,href).openStream()));
    } catch (Exception ex) {
        continue; // don't convert if URL is invalid
    }
    // build the replacement <style> element, keeping any type attribute
    sb.setLength(0);
    sb.append("<style");
    Attribute typeAttribute=attributes.get("type");
    if (typeAttribute!=null) sb.append(' ').append(typeAttribute);
    sb.append(">\n").append(styleSheetContent).append("\n</style>");
    outputDocument.replace(startTag,sb.toString()); // pass a String copy, since sb is reused on the next iteration
}
String convertedHtmlText=outputDocument.toString();
- See Also:
OutputSegment, StringOutputSegment
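Constructor Summary
OutputDocument(CharSequence sourceText) - Deprecated. Use the OutputDocument(Source) constructor instead.
OutputDocument(Source source) - Constructs a new output document based on the specified source document.

Method Summary
void add(FormControl formControl) - Deprecated. Use the replace(FormControl) method instead.
void add(FormFields formFields) - Deprecated. Use the replace(FormFields) method instead.
void add(OutputSegment outputSegment) - Deprecated. Use the register(OutputSegment) method instead.
long getEstimatedMaximumOutputLength()
Reader getReader() - Deprecated. Use CharStreamSourceUtil.getReader(this) instead.
CharSequence getSourceText() - Returns the original source text upon which this output document is based.
void insert(int pos, CharSequence text) - Inserts the specified text at the specified character position in this output document.
void output(Writer writer) - Deprecated. Use the writeTo(Writer) method instead.
void register(OutputSegment outputSegment) - Registers the specified output segment in this output document.
void remove(Collection segments) - Removes all the segments from this output document represented by the specified source Segment objects.
void remove(Segment segment) - Removes the specified segment from this output document.
void replace(Attributes attributes, Map map) - Replaces the specified attributes segment in this output document with the name/value entries in the specified Map.
Map replace(Attributes attributes, boolean convertNamesToLowerCase) - Replaces the specified Attributes segment in this output document with the name/value entries in the returned Map.
void replace(FormControl formControl) - Replaces the specified FormControl in this output document.
void replace(FormFields formFields) - Replaces all the constituent form controls from the specified FormFields in this output document.
void replace(Segment segment, CharSequence text) - Replaces the specified segment in this output document with the specified text.
void replace(int begin, int end, CharSequence text) - Replaces the specified segment of this output document with the specified text.
void replace(int begin, int end, char ch) - Replaces the specified segment of this output document with the specified character.
void replaceWithSpaces(int begin, int end) - Replaces the specified segment of this output document with a string of spaces of the same length.
String toString() - Returns the final content of this output document as a String.
void writeTo(Writer writer) - Writes the final content of this output document to the specified Writer.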
public OutputDocument(CharSequence sourceText)
Deprecated. Use the OutputDocument(Source) constructor instead.
Constructs a new output document based on the specified source text. This constructor has been deprecated as of version 2.2 in favour of the OutputDocument(Source) constructor, as most of the methods in this class assume that the argument supplied to this constructor is the entire source document.
- Parameters:
sourceText
- the source text.
public OutputDocument(Source source)
Constructs a new output document based on the specified source document.
- Parameters:
source
- the source document.
public void add(FormControl formControl)
Deprecated. Use the replace(FormControl) method instead.
Replaces the specified FormControl in this output document. This method has been deprecated as of version 2.2 in favour of the identical replace(FormControl) method in an effort to make this class and its methods more intuitive.
- Parameters:
formControl
- the form control to replace.
public void add(FormFields formFields)
Deprecated. Use the replace(FormFields) method instead.
Replaces all the constituent form controls from the specified FormFields in this output document. This method has been deprecated as of version 2.2 in favour of the identical replace(FormFields) method in an effort to make this class and its methods more intuitive.
- Parameters:
formFields
- the form fields to replace.
public void add(OutputSegment outputSegment)
Deprecated. Use the register(OutputSegment) method instead.
Registers the specified output segment in this output document. This method has been deprecated as of version 2.2 in favour of the identical register(OutputSegment) method in an effort to make this class and its methods more intuitive.
- Parameters:
outputSegment
- the output segment to register.
public long getEstimatedMaximumOutputLength()
- Specified by:
- getEstimatedMaximumOutputLength in interface CharStreamSource
public Reader getReader()
Deprecated. Use CharStreamSourceUtil.getReader(this) instead.
Returns a Reader that reads the final content of this output document. This method has been deprecated as of version 2.2 in favour of calling the CharStreamSourceUtil.getReader(CharStreamSource) method, passing this object as the argument.
- Returns:
- a Reader that reads the final content of this output document.
public CharSequence getSourceText()
Returns the original source text upon which this output document is based.
- Returns:
- the original source text upon which this output document is based.
public void insert(int pos, CharSequence text)
Inserts the specified text at the specified character position in this output document.
- Parameters:
pos
- the character position at which to insert the text.
text
- the text to insert.
public void output(Writer writer) throws IOException
Deprecated. Use the writeTo(Writer) method instead.
Outputs the final content of this output document to the specified Writer. This method has been deprecated as of version 2.2 in favour of the identical writeTo(Writer) method in order for this class to implement CharStreamSource.
- Parameters:
writer
- the destination java.io.Writer for the output.
public void register(OutputSegment outputSegment)
Registers the specified output segment in this output document. Use this method if you want to use a customised OutputSegment class.
- Parameters:
outputSegment
- the output segment to register.
public void remove(Collection segments)
Removes all the segments from this output document represented by the specified source Segment objects. This is equivalent to the following code:
for (Iterator i=segments.iterator(); i.hasNext();) remove((Segment)i.next());
- Parameters:
segments
- a collection of segments to remove, represented by source Segment objects.
public void remove(Segment segment)
Removes the specified segment from this output document. This is equivalent to replace(segment,null).
- Parameters:
segment
- the segment to remove.
public void replace(Attributes attributes, Map map)
Replaces the specified attributes segment in this output document with the name/value entries in the specified Map. This method might be used if the Map containing the new attribute values should not be preloaded with the same entries as the source attributes, or if a map implementation other than LinkedHashMap is required. Otherwise, the replace(Attributes, boolean convertNamesToLowerCase) method is generally more useful.
Keys in the map must be String objects, and values must implement the CharSequence interface. An attribute with no value is represented by a map entry with a null value. Attribute values are stored unencoded in the map, and are automatically encoded if necessary during output. The use of invalid characters in attribute names results in unspecified behaviour.
Note that methods in the Attributes class treat attribute names as case insensitive, whereas the Map treats them as case sensitive.
- Parameters:
attributes
- the Attributes object defining the span of the segment to replace.
map
- the Map containing the name/value entries.
- See Also:
replace(Attributes, boolean convertNamesToLowerCase)
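To illustrate the map conventions described above (String keys, CharSequence values, a null value for an attribute with no value, and encoding applied only at output time), the following hypothetical helper renders such a map back into attribute text. The class AttributeRendering is invented for this sketch, and the set of characters it encodes (&, <, > and the double quote) is an assumption; Jericho's own encoding rules may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only, NOT Jericho's output code: renders a
// name/value map of the kind described above as attribute text.
final class AttributeRendering {
    // Assumed minimal encoding for a double-quoted attribute value.
    static String encode(CharSequence value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Values are stored unencoded in the map and encoded only here;
    // a null value produces an attribute with no value, e.g. " checked".
    static String render(Map<String, ? extends CharSequence> attributes) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, ? extends CharSequence> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey());
            if (e.getValue() != null) {
                sb.append("=\"").append(encode(e.getValue())).append('"');
            }
        }
        return sb.toString();
    }
}
```

Using a LinkedHashMap as the argument keeps the rendered attributes in insertion order.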
public Map replace(Attributes attributes, boolean convertNamesToLowerCase)
Replaces the specified Attributes segment in this output document with the name/value entries in the returned Map. The returned map initially contains entries representing the attributes from the source document, which can be modified before output. The documentation of the replace(Attributes,Map) method contains more information about the requirements of the map entries.
Specifying a value of true as an argument to the convertNamesToLowerCase parameter causes all original attribute names to be converted to lower case in the map. This simplifies the process of finding and updating specific attributes, since map keys are case sensitive.
Attribute values are automatically decoded before being loaded into the map.
This method is logically equivalent to:
replace(attributes, attributes.populateMap(new LinkedHashMap(),convertNamesToLowerCase))
The use of LinkedHashMap to implement the map ensures (probably unnecessarily) that existing attributes are output in the same order as they appear in the source document, and new attributes are output in the same order as they are added.
The following example sets the background colour of the BODY element to green:
Source source=new Source(htmlDocument);
Attributes bodyAttributes=source.findNextStartTag(0,Tag.BODY).getAttributes();
OutputDocument outputDocument=new OutputDocument(source);
Map attributesMap=outputDocument.replace(bodyAttributes,true);
// names are lower case in the map, so this also replaces an existing BGCOLOR attribute
attributesMap.put("bgcolor","green");
String htmlDocumentWithGreenBackground=outputDocument.toString();
- Parameters:
attributes
- the Attributes segment defining the span of the segment and initial name/value entries of the returned map.
convertNamesToLowerCase
- specifies whether all attribute names are converted to lower case in the map.
- Returns:
- a Map containing the name/value entries to be output.
- See Also:
replace(Attributes,Map)
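The effect of convertNamesToLowerCase can be seen with plain java.util collections. In this hypothetical sketch, AttributeNames.copy only imitates the documented behaviour of copying the source attributes into a LinkedHashMap, optionally lower-casing the names; it is not Jericho's populateMap.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, NOT Jericho code: mimics loading attributes into a
// LinkedHashMap with optional lower-casing of the names.
final class AttributeNames {
    static Map<String, String> copy(Map<String, String> source, boolean convertNamesToLowerCase) {
        Map<String, String> result = new LinkedHashMap<>(); // preserves source order
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String name = convertNamesToLowerCase
                    ? entry.getKey().toLowerCase(Locale.ROOT)
                    : entry.getKey();
            result.put(name, entry.getValue());
        }
        return result;
    }
}
```

Without lower-casing, put("bgcolor", ...) adds a second entry next to an original BGCOLOR one, because map keys are case sensitive. With lower-casing, the existing entry is updated instead, and LinkedHashMap keeps it at its original position, since re-inserting an existing key does not change the iteration order.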
public void replace(FormControl formControl)
Replaces the specified FormControl in this output document.
The effect of this method is to register zero or more output segments in the output document as required to reflect previous modifications to the control's state. The state of a control includes its submission value, output style, and whether it has been disabled. The state of the form control should not be modified after this method is called, as there is no guarantee that subsequent changes either will or will not be reflected in the final output. A second call to this method with the same parameter is not allowed. It is therefore recommended to call this method as the last action before the output is generated. Although the specifics of the number and nature of the output segments added in any particular circumstance are not defined in the specification, it can generally be assumed that only the minimum changes necessary are made to the original document. If the state of the control has not been modified, calling this method has no effect at all.
- Parameters:
formControl
- the form control to replace.
- See Also:
replace(FormFields)
public void replace(FormFields formFields)
Replaces all the constituent form controls from the specified FormFields in this output document. This is equivalent to the following code:
for (Iterator i=formFields.getFormControls().iterator(); i.hasNext();) replace((FormControl)i.next());
The state of any of the form controls in the specified form fields should not be modified after this method is called, as there is no guarantee that subsequent changes either will or will not be reflected in the final output. A second call to this method with the same parameter is not allowed. It is therefore recommended to call this method as the last action before the output is generated.
- Parameters:
formFields
- the form fields to replace.
- See Also:
replace(FormControl)
public void replace(Segment segment, CharSequence text)
Replaces the specified segment in this output document with the specified text. Specifying a null argument to the text parameter is exactly equivalent to specifying an empty string, and results in the segment being completely removed from the output document.
- Parameters:
segment
- the segment to replace.
text
- the replacement text, or null to remove the segment.
public void replace(int begin, int end, CharSequence text)
Replaces the specified segment of this output document with the specified text. Specifying a null argument to the text parameter is exactly equivalent to specifying an empty string, and results in the segment being completely removed from the output document.
- Parameters:
begin
- the character position at which to begin the replacement.
end
- the character position at which to end the replacement.
text
- the replacement text, or null to remove the segment.
public void replace(int begin, int end, char ch)
Replaces the specified segment of this output document with the specified character.
- Parameters:
begin
- the character position at which to begin the replacement.
end
- the character position at which to end the replacement.
ch
- the replacement character.
public void replaceWithSpaces(int begin, int end)
Replaces the specified segment of this output document with a string of spaces of the same length. This method is used internally to implement the functionality available through the Segment.ignoreWhenParsing() method. It is included in the public API in the unlikely event that it has other practical uses for the developer. To remove a segment from the output document completely, use the remove(Segment) method instead.
- Parameters:
begin
- the character position at which to begin the replacement.
end
- the character position at which to end the replacement.
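Because the replacement has exactly the same length as the segment, every character after the segment keeps its original position, which is presumably why spaces are used rather than removal when a segment is to be ignored during parsing. The hypothetical helper below shows this on a plain String; the class Blanking is invented for illustration and is not Jericho's implementation.

```java
// Hypothetical helper, NOT Jericho's implementation: blanks out text[begin, end)
// with the same number of spaces, so every later character keeps its index.
final class Blanking {
    static String replaceWithSpaces(String text, int begin, int end) {
        return text.substring(0, begin) + " ".repeat(end - begin) + text.substring(end);
    }
}
```

For example, blanking the comment in "a<!-- x -->b" leaves the final "b" at the same index it had in the original text.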
public String toString()
Returns the final content of this output document as a String.
- Returns:
- the final content of this output document as a String.
- See Also:
writeTo(Writer)
public void writeTo(Writer writer) throws IOException
Writes the final content of this output document to the specified Writer. An OverlappingOutputSegmentsException is thrown if any of the output segments overlap. For efficiency reasons, this condition is not detected when the offending output segment is added. If the output is required in the form of a Reader, use CharStreamSourceUtil.getReader(this) instead.
- Specified by:
- writeTo in interface CharStreamSource
- Parameters:
writer
- the destination java.io.Writer for the output.
- See Also:
toString()
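Since toString() and writeTo(Writer) both produce the final content of the output document, a String form can be derived from the Writer-based one. The following is a hypothetical sketch: the StreamableText interface merely mirrors the two CharStreamSource methods named on this page, and using getEstimatedMaximumOutputLength() as an initial buffer size (clamped to ignore negative or very large estimates) is an assumption, not documented behaviour.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical interface mirroring the two CharStreamSource methods named on this page.
interface StreamableText {
    void writeTo(Writer writer) throws IOException;

    long getEstimatedMaximumOutputLength();

    // Derives the String form from writeTo(Writer), so both always agree.
    static String asString(StreamableText text) {
        // Assumption: the estimate is only a sizing hint, clamped to [16, 1 MiB].
        long estimate = text.getEstimatedMaximumOutputLength();
        int initialSize = (int) Math.min(Math.max(estimate, 16L), 1L << 20);
        StringWriter writer = new StringWriter(initialSize);
        try {
            text.writeTo(writer);
        } catch (IOException e) {
            // writeTo declares IOException; rethrow it unchecked.
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }
}
```

Deriving one form from the other keeps a single code path for generating output, so any overlap check performed in writeTo also applies to the String form.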