Package org.codelibs.nekohtml
Class HTMLTagBalancer
- java.lang.Object
-
- org.codelibs.nekohtml.HTMLTagBalancer
-
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, HTMLComponent
public class HTMLTagBalancer extends java.lang.Object implements org.apache.xerces.xni.parser.XMLDocumentFilter, HTMLComponent
Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can:
- add missing parent elements;
- automatically close elements with optional end tags; and
- handle mis-matched inline element tags.
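As a rough illustration of closing elements with optional end tags, the sketch below keeps a stack of open elements and synthesizes a missing end tag when a sibling of the same name starts. It is a simplified, hypothetical model: the class OptionalEndTagSketch, its rule set, and its string events are assumptions made for illustration, not the logic HTMLTagBalancer actually uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of closing elements with optional end tags.
class OptionalEndTagSketch {
    // Assumption for this sketch only: starting one of these elements
    // implicitly closes an open element of the same name directly above it.
    private static final Set<String> IMPLICITLY_CLOSED =
            new HashSet<>(Arrays.asList("li", "p"));

    private final Deque<String> open = new ArrayDeque<>(); // head = innermost
    private final List<String> events = new ArrayList<>();

    void startElement(String name) {
        if (IMPLICITLY_CLOSED.contains(name) && name.equals(open.peek())) {
            endElement(name); // synthesize the missing end tag
        }
        open.push(name);
        events.add("<" + name + ">");
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            return; // stray end tag with no matching open element: ignore it
        }
        // Close everything nested inside the matching element, then the element.
        String closed;
        do {
            closed = open.pop();
            events.add("</" + closed + ">");
        } while (!closed.equals(name));
    }

    List<String> events() {
        return events;
    }
}
```

Feeding this sketch the start tags ul, li, li followed by the end tag ul yields a balanced event sequence in which both li elements are closed before the ul end tag is emitted.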
This component recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/balance-tags/document-fragment
- http://cyberneko.org/html/features/balance-tags/ignore-outside-content
This component recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/error-reporter
- http://cyberneko.org/html/properties/balance-tags/current-stack
- Version:
- $Id: HTMLTagBalancer.java,v 1.20 2005/02/14 04:06:22 andyc Exp $
- Author:
- Andy Clark, Marc Guillemot
- See Also:
HTMLElements
-
-
Nested Class Summary
Nested Classes
- static class HTMLTagBalancer.Info: Element info for each start element.
- static class HTMLTagBalancer.InfoStack: Unsynchronized stack of element information.
-
Field Summary
Fields
- protected static java.lang.String AUGMENTATIONS: Include infoset augmentations.
- protected static java.lang.String DOCUMENT_FRAGMENT: Document fragment balancing only.
- protected static java.lang.String DOCUMENT_FRAGMENT_DEPRECATED: Document fragment balancing only (deprecated).
- protected static java.lang.String ERROR_REPORTER: Error reporter.
- protected boolean fAllowSelfclosingIframe: Allows self closing iframe tags.
- protected boolean fAllowSelfclosingTags: Allows self closing tags.
- protected boolean fAugmentations: Include infoset augmentations.
- protected boolean fDocumentFragment: Document fragment balancing only.
- protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler: The document handler.
- protected org.apache.xerces.xni.parser.XMLDocumentSource fDocumentSource: The document source.
- protected HTMLTagBalancer.InfoStack fElementStack: The element stack.
- protected HTMLErrorReporter fErrorReporter: Error reporter.
- protected boolean fIgnoreOutsideContent: Ignore outside content.
- protected HTMLTagBalancer.InfoStack fInlineStack: The inline stack.
- protected short fNamesAttrs: Modify HTML attribute names.
- protected short fNamesElems: Modify HTML element names.
- protected boolean fNamespaces: Namespaces.
- protected boolean fOpenedForm: True if a form is in the stack (allows the opening of nested forms to be discarded).
- static java.lang.String FRAGMENT_CONTEXT_STACK: Name of the property holding the stack of elements in which context a document fragment should be parsed.
- protected boolean fReportErrors: Report errors.
- protected boolean fSeenAnything: True if seen anything.
- protected boolean fSeenBodyElement: True if seen <body> element.
- protected boolean fSeenDoctype: True if a doctype declaration has been seen.
- protected boolean fSeenHeadElement: True if seen <head> element.
- protected boolean fSeenRootElement: True if root element has been seen.
- protected boolean fSeenRootElementEnd: True if seen the end of the document element.
- protected static java.lang.String IGNORE_OUTSIDE_CONTENT: Ignore outside content.
- protected static java.lang.String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static java.lang.String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- protected static short NAMES_LOWERCASE: Lowercase HTML names.
- protected static short NAMES_MATCH: Match HTML element names.
- protected static short NAMES_NO_CHANGE: Don't modify HTML names.
- protected static short NAMES_UPPERCASE: Uppercase HTML names.
- protected static java.lang.String NAMESPACES: Namespaces.
- protected static java.lang.String REPORT_ERRORS: Report errors.
- protected static HTMLEventInfo SYNTHESIZED_ITEM: Synthesized event info item.
- protected HTMLTagBalancingListener tagBalancingListener
-
Constructor Summary
Constructors
- HTMLTagBalancer()
-
Method Summary
Methods
- protected void callEndElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs): Call document handler end element.
- protected void callStartElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs): Call document handler start element.
- void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs): Characters.
- void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs): Comment.
- void doctypeDecl(java.lang.String rootElementName, java.lang.String publicId, java.lang.String systemId, org.apache.xerces.xni.Augmentations augs): Doctype declaration.
- protected org.apache.xerces.xni.XMLAttributes emptyAttributes(): Returns a set of empty attributes.
- void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs): Empty element.
- void endCDATA(org.apache.xerces.xni.Augmentations augs): End CDATA section.
- void endDocument(org.apache.xerces.xni.Augmentations augs): End document.
- void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs): End element.
- void endGeneralEntity(java.lang.String name, org.apache.xerces.xni.Augmentations augs): End entity.
- void endPrefixMapping(java.lang.String prefix, org.apache.xerces.xni.Augmentations augs): End prefix mapping.
- org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource(): Returns the document source.
- protected HTMLElements.Element getElement(org.apache.xerces.xni.QName elementName): Returns an HTML element.
- protected int getElementDepth(HTMLElements.Element element): Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
- java.lang.Boolean getFeatureDefault(java.lang.String featureId): Returns the default state for a feature.
- protected static short getNamesValue(java.lang.String value): Converts HTML names string value to constant value.
- protected int getParentDepth(HTMLElements.Element[] parents, short bounds): Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
- java.lang.Object getPropertyDefault(java.lang.String propertyId): Returns the default state for a property.
- java.lang.String[] getRecognizedFeatures(): Returns recognized features.
- java.lang.String[] getRecognizedProperties(): Returns recognized properties.
- void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs): Ignorable whitespace.
- protected static java.lang.String modifyName(java.lang.String name, short mode): Modifies the given name based on the specified mode.
- void processingInstruction(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs): Processing instruction.
- void reset(org.apache.xerces.xni.parser.XMLComponentManager manager): Resets the component.
- void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler): Sets the document handler.
- void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source): Sets the document source.
- void setFeature(java.lang.String featureId, boolean state): Sets a feature.
- void setProperty(java.lang.String propertyId, java.lang.Object value): Sets a property.
- void startCDATA(org.apache.xerces.xni.Augmentations augs): Start CDATA section.
- void startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs): Start document.
- void startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs): Start document.
- void startElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs): Start element.
- void startGeneralEntity(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier id, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs): Start entity.
- void startPrefixMapping(java.lang.String prefix, java.lang.String uri, org.apache.xerces.xni.Augmentations augs): Start prefix mapping.
- protected org.apache.xerces.xni.Augmentations synthesizedAugs(): Returns an augmentations object with a synthesized item added.
- void textDecl(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs): Text declaration.
- void xmlDecl(java.lang.String version, java.lang.String encoding, java.lang.String standalone, org.apache.xerces.xni.Augmentations augs): XML declaration.
-
-
-
Field Detail
-
NAMESPACES
protected static final java.lang.String NAMESPACES
Namespaces.
- See Also:
- Constant Field Values
-
AUGMENTATIONS
protected static final java.lang.String AUGMENTATIONS
Include infoset augmentations.
- See Also:
- Constant Field Values
-
REPORT_ERRORS
protected static final java.lang.String REPORT_ERRORS
Report errors.
- See Also:
- Constant Field Values
-
DOCUMENT_FRAGMENT_DEPRECATED
protected static final java.lang.String DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).
- See Also:
- Constant Field Values
-
DOCUMENT_FRAGMENT
protected static final java.lang.String DOCUMENT_FRAGMENT
Document fragment balancing only.
- See Also:
- Constant Field Values
-
IGNORE_OUTSIDE_CONTENT
protected static final java.lang.String IGNORE_OUTSIDE_CONTENT
Ignore outside content.
- See Also:
- Constant Field Values
-
NAMES_ELEMS
protected static final java.lang.String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
- See Also:
- Constant Field Values
-
NAMES_ATTRS
protected static final java.lang.String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
- See Also:
- Constant Field Values
-
ERROR_REPORTER
protected static final java.lang.String ERROR_REPORTER
Error reporter.
- See Also:
- Constant Field Values
-
FRAGMENT_CONTEXT_STACK
public static final java.lang.String FRAGMENT_CONTEXT_STACK
Name of the property holding the stack of elements in which context a document fragment should be parsed.
- See Also:
- Constant Field Values
-
NAMES_NO_CHANGE
protected static final short NAMES_NO_CHANGE
Don't modify HTML names.
- See Also:
- Constant Field Values
-
NAMES_MATCH
protected static final short NAMES_MATCH
Match HTML element names.
- See Also:
- Constant Field Values
-
NAMES_UPPERCASE
protected static final short NAMES_UPPERCASE
Uppercase HTML names.
- See Also:
- Constant Field Values
-
NAMES_LOWERCASE
protected static final short NAMES_LOWERCASE
Lowercase HTML names.
- See Also:
- Constant Field Values
-
SYNTHESIZED_ITEM
protected static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.
-
fNamespaces
protected boolean fNamespaces
Namespaces.
-
fAugmentations
protected boolean fAugmentations
Include infoset augmentations.
-
fReportErrors
protected boolean fReportErrors
Report errors.
-
fDocumentFragment
protected boolean fDocumentFragment
Document fragment balancing only.
-
fIgnoreOutsideContent
protected boolean fIgnoreOutsideContent
Ignore outside content.
-
fAllowSelfclosingIframe
protected boolean fAllowSelfclosingIframe
Allows self closing iframe tags.
-
fAllowSelfclosingTags
protected boolean fAllowSelfclosingTags
Allows self closing tags.
-
fNamesElems
protected short fNamesElems
Modify HTML element names.
-
fNamesAttrs
protected short fNamesAttrs
Modify HTML attribute names.
-
fErrorReporter
protected HTMLErrorReporter fErrorReporter
Error reporter.
-
fDocumentSource
protected org.apache.xerces.xni.parser.XMLDocumentSource fDocumentSource
The document source.
-
fDocumentHandler
protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
The document handler.
-
fElementStack
protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.
-
fInlineStack
protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.
-
fSeenAnything
protected boolean fSeenAnything
True if anything has been seen. Important for the XML declaration.
-
fSeenDoctype
protected boolean fSeenDoctype
True if a doctype declaration has been seen.
-
fSeenRootElement
protected boolean fSeenRootElement
True if root element has been seen.
-
fSeenRootElementEnd
protected boolean fSeenRootElementEnd
True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
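The role of such a flag can be sketched as follows. This is a hypothetical, simplified guard (the class RootEndGuardSketch and its string events are assumptions for illustration); the real balancer's handling of content after the document element may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: once the document element has ended, later element
// events are dropped so the forwarded stream keeps a single root element.
class RootEndGuardSketch {
    private int depth = 0;
    private boolean seenRootElementEnd = false; // false until the root closes
    private final List<String> forwarded = new ArrayList<>();

    void startElement(String name) {
        if (seenRootElementEnd) {
            return; // extraneous start tag after the root ended: ignore it
        }
        depth++;
        forwarded.add("<" + name + ">");
    }

    void endElement(String name) {
        if (seenRootElementEnd || depth == 0) {
            return; // nothing is open, or the root has already ended
        }
        forwarded.add("</" + name + ">");
        depth--;
        if (depth == 0) {
            seenRootElementEnd = true; // the document element just closed
        }
    }

    List<String> forwarded() {
        return forwarded;
    }
}
```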
-
fSeenHeadElement
protected boolean fSeenHeadElement
True if seen <head> element.
-
fSeenBodyElement
protected boolean fSeenBodyElement
True if seen <body> element.
-
fOpenedForm
protected boolean fOpenedForm
True if a form is in the stack (allows the opening of nested forms to be discarded).
-
tagBalancingListener
protected HTMLTagBalancingListener tagBalancingListener
-
-
Method Detail
-
getFeatureDefault
public java.lang.Boolean getFeatureDefault(java.lang.String featureId)
Returns the default state for a feature.
- Specified by:
getFeatureDefault in interface HTMLComponent
- Specified by:
getFeatureDefault in interface org.apache.xerces.xni.parser.XMLComponent
-
getPropertyDefault
public java.lang.Object getPropertyDefault(java.lang.String propertyId)
Returns the default state for a property.
- Specified by:
getPropertyDefault in interface HTMLComponent
- Specified by:
getPropertyDefault in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedFeatures
public java.lang.String[] getRecognizedFeatures()
Returns recognized features.
- Specified by:
getRecognizedFeatures in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedProperties
public java.lang.String[] getRecognizedProperties()
Returns recognized properties.
- Specified by:
getRecognizedProperties in interface org.apache.xerces.xni.parser.XMLComponent
-
reset
public void reset(org.apache.xerces.xni.parser.XMLComponentManager manager)
Resets the component.
- Specified by:
reset in interface org.apache.xerces.xni.parser.XMLComponent
-
setFeature
public void setFeature(java.lang.String featureId, boolean state)
Sets a feature.
- Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLComponent
-
setProperty
public void setProperty(java.lang.String propertyId, java.lang.Object value)
Sets a property.
- Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLComponent
-
setDocumentHandler
public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
- Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
getDocumentHandler
public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
- Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs)
Start document.
- Specified by:
startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
-
xmlDecl
public void xmlDecl(java.lang.String version, java.lang.String encoding, java.lang.String standalone, org.apache.xerces.xni.Augmentations augs)
XML declaration.
- Specified by:
xmlDecl in interface org.apache.xerces.xni.XMLDocumentHandler
-
doctypeDecl
public void doctypeDecl(java.lang.String rootElementName, java.lang.String publicId, java.lang.String systemId, org.apache.xerces.xni.Augmentations augs)
Doctype declaration.
- Specified by:
doctypeDecl in interface org.apache.xerces.xni.XMLDocumentHandler
-
endDocument
public void endDocument(org.apache.xerces.xni.Augmentations augs)
End document.
- Specified by:
endDocument in interface org.apache.xerces.xni.XMLDocumentHandler
-
comment
public void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
Comment.
- Specified by:
comment in interface org.apache.xerces.xni.XMLDocumentHandler
-
processingInstruction
public void processingInstruction(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs)
Processing instruction.
- Specified by:
processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
-
startElement
public void startElement(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs)
Start element.
- Specified by:
startElement in interface org.apache.xerces.xni.XMLDocumentHandler
-
emptyElement
public void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs)
Empty element.
- Specified by:
emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
-
startGeneralEntity
public void startGeneralEntity(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier id, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
Start entity.
- Specified by:
startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
-
textDecl
public void textDecl(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
Text declaration.
- Specified by:
textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
-
endGeneralEntity
public void endGeneralEntity(java.lang.String name, org.apache.xerces.xni.Augmentations augs)
End entity.
- Specified by:
endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
-
startCDATA
public void startCDATA(org.apache.xerces.xni.Augmentations augs)
Start CDATA section.
- Specified by:
startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
-
endCDATA
public void endCDATA(org.apache.xerces.xni.Augmentations augs)
End CDATA section.
- Specified by:
endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
-
characters
public void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
Characters.
- Specified by:
characters in interface org.apache.xerces.xni.XMLDocumentHandler
-
ignorableWhitespace
public void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
Ignorable whitespace.
- Specified by:
ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
-
endElement
public void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
End element.
- Specified by:
endElement in interface org.apache.xerces.xni.XMLDocumentHandler
-
setDocumentSource
public void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source)
Sets the document source.
- Specified by:
setDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
-
getDocumentSource
public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
Returns the document source.
- Specified by:
getDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
Start document.
-
startPrefixMapping
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri, org.apache.xerces.xni.Augmentations augs)
Start prefix mapping.
-
endPrefixMapping
public void endPrefixMapping(java.lang.String prefix, org.apache.xerces.xni.Augmentations augs)
End prefix mapping.
-
getElement
protected HTMLElements.Element getElement(org.apache.xerces.xni.QName elementName)
Returns an HTML element.
-
callStartElement
protected final void callStartElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs)
Call document handler start element.
-
callEndElement
protected final void callEndElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
Call document handler end element.
-
getElementDepth
protected final int getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
- Parameters:
element - The element.
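A minimal sketch of this kind of lookup, using plain strings instead of HTMLElements.Element. It assumes that depth is counted from the top of the open-element stack, with 1 meaning the innermost open element; that convention, and the class ElementDepthSketch itself, are assumptions for illustration, and the real method may apply further rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a depth lookup on a stack of open elements.
class ElementDepthSketch {
    // The last entry is the innermost (most recently opened) element.
    private final List<String> openElements = new ArrayList<>();

    void push(String name) {
        openElements.add(name);
    }

    // Assumption: depth counts from the top of the stack, 1 = innermost.
    // Returns -1 when no open element has the given name.
    int getElementDepth(String name) {
        for (int i = openElements.size() - 1; i >= 0; i--) {
            if (openElements.get(i).equals(name)) {
                return openElements.size() - i;
            }
        }
        return -1;
    }
}
```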
-
getParentDepth
protected int getParentDepth(HTMLElements.Element[] parents, short bounds)
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
- Parameters:
parents - The parent elements.
-
emptyAttributes
protected final org.apache.xerces.xni.XMLAttributes emptyAttributes()
Returns a set of empty attributes.
-
synthesizedAugs
protected final org.apache.xerces.xni.Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.
-
modifyName
protected static final java.lang.String modifyName(java.lang.String name, short mode)
Modifies the given name based on the specified mode.
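A name modification of this kind might look like the sketch below. The class NameModeSketch and the numeric values of its constants are hypothetical stand-ins chosen for illustration; the real values are listed under Constant Field Values.

```java
import java.util.Locale;

// Hypothetical sketch of mode-based name modification.
class NameModeSketch {
    // Illustrative values only; not taken from HTMLTagBalancer.
    static final short NO_CHANGE = 0;
    static final short UPPERCASE = 1;
    static final short LOWERCASE = 2;

    static String modifyName(String name, short mode) {
        switch (mode) {
            case UPPERCASE:
                return name.toUpperCase(Locale.ENGLISH);
            case LOWERCASE:
                return name.toLowerCase(Locale.ENGLISH);
            default:
                return name; // any other mode leaves the name untouched
        }
    }
}
```

Passing an explicit Locale keeps the result independent of the default locale of the JVM, which matters for names containing the letter i under a Turkish locale.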
-
getNamesValue
protected static final short getNamesValue(java.lang.String value)
Converts HTML names string value to constant value.
- See Also:
NAMES_NO_CHANGE, NAMES_LOWERCASE, NAMES_UPPERCASE
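The conversion can be pictured as a mapping from the property strings "upper", "lower" and "default" to mode constants. The sketch below assumes that "default", and any other or missing value, means no change; the class NamesValueSketch and its constant values are hypothetical.

```java
// Hypothetical sketch of converting a names property string to a mode constant.
class NamesValueSketch {
    // Illustrative values only; not taken from HTMLTagBalancer.
    static final short NO_CHANGE = 0;
    static final short UPPERCASE = 1;
    static final short LOWERCASE = 2;

    static short getNamesValue(String value) {
        if ("lower".equals(value)) {
            return LOWERCASE;
        }
        if ("upper".equals(value)) {
            return UPPERCASE;
        }
        // Assumption: "default", null, and unrecognized values mean no change.
        return NO_CHANGE;
    }
}
```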
-
-