org.htmlcleaner
Class TagNode

java.lang.Object
  extended by org.htmlcleaner.BaseTokenImpl
      extended by org.htmlcleaner.TagToken
          extended by org.htmlcleaner.TagNode
All Implemented Interfaces:
BaseToken, HtmlNode

public class TagNode
extends TagToken
implements HtmlNode

XML node tag: the basic node of the cleaned HTML tree. It also represents a start-tag token after the HTML parsing phase and before the cleaning phase. After cleaning, the tree consists of tag nodes (TagNode), text content (ContentNode), comments (CommentNode) and, optionally, a doctype node (DoctypeToken).


Field Summary
 
Fields inherited from class org.htmlcleaner.TagToken
name
 
Constructor Summary
TagNode(String name)
          Creates a new tag node with the specified name.
 
Method Summary
 void addAttribute(String attName, String attValue)
          Adds the specified attribute to this tag, or overrides the existing one.
 void addChild(Object child)
 void addChildren(List<? extends BaseToken> newChildren)
          Adds all elements from the specified list to this node.
 void addNamespaceDeclaration(String nsPrefix, String nsURI)
          Adds a namespace declaration to this node.
 Object[] evaluateXPath(String xPathExpression)
          Evaluates an XPath expression on this node.
 TagNode findElementByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
 TagNode findElementByName(String findName, boolean isRecursive)
 TagNode findElementHavingAttribute(String attName, boolean isRecursive)
 List<? extends BaseToken> getAllChildren()
 TagNode[] getAllElements(boolean isRecursive)
 List<? extends TagNode> getAllElementsList(boolean isRecursive)
 String getAttributeByName(String attName)
          Returns the value of the specified attribute, or null if this tag doesn't contain it.
 Map<String,String> getAttributes()
          Returns the attributes of this tag node.
 Map<String,String> getAttributesInLowerCase()
          Returns the attributes of this tag node, with attribute names converted to lower case.
 int getChildIndex(HtmlNode child)
          Returns the index of the specified child among this node's children, or -1 if it is not a child.
 List<TagNode> getChildren()
          Deprecated. Use getChildTagList() instead; this method will be refactored and possibly removed in future versions. TODO: This method should be refactored because it does not follow the usual Java getter/setter conventions.
 List<TagNode> getChildTagList()
          Returns the list of child TagNode objects.
 TagNode[] getChildTags()
          Returns an array of child TagNode instances.
 DoctypeToken getDocType()
 List<? extends TagNode> getElementList(ITagNodeCondition condition, boolean isRecursive)
          Gets all elements in the tree that satisfy the specified condition.
 List<? extends TagNode> getElementListByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
 List<? extends TagNode> getElementListByName(String findName, boolean isRecursive)
 List<? extends TagNode> getElementListHavingAttribute(String attName, boolean isRecursive)
 TagNode[] getElementsByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
 TagNode[] getElementsByName(String findName, boolean isRecursive)
 TagNode[] getElementsHavingAttribute(String attName, boolean isRecursive)
 String getName()
 Map<String,String> getNamespaceDeclarations()
 TagNode getParent()
          Returns the parent of this node, or null if this is the root node.
 CharSequence getText()
          Returns the text content of this node and its subelements.
 boolean hasAttribute(String attName)
          Checks whether the specified attribute exists.
 boolean hasChildren()
          Returns whether this node has child elements.
 void insertChild(int index, HtmlNode childToAdd)
          Inserts the specified node at the specified position in the list of children.
 void insertChildAfter(HtmlNode node, HtmlNode nodeToInsert)
          Inserts the specified node into the list of children after the specified child.
 void insertChildBefore(HtmlNode node, HtmlNode nodeToInsert)
          Inserts the specified node into the list of children before the specified child.
 boolean isAutoGenerated()
 boolean isCopy()
 boolean isEmpty()
 boolean isForeignMarkup()
 boolean isPruned()
          Returns true if this node was marked to be pruned.
 TagNode makeCopy()
 void removeAllChildren()
          Removes all children (subelements and text content).
 void removeAttribute(String attName)
          Removes the specified attribute from this tag.
 boolean removeChild(Object child)
          Removes the specified child from this node.
 boolean removeFromTree()
          Removes this node from the tree.
 void serialize(Serializer serializer, Writer writer)
 void setAttributes(Map<String,String> attributes)
          Replaces the current set of attributes with a new set.
 void setAutoGenerated(boolean autoGenerated)
 void setChildren(List<? extends BaseToken> children)
 void setDocType(DoctypeToken docType)
 void setForeignMarkup(boolean isForeignMarkup)
 void setPruned(boolean pruned)
 void traverse(TagNodeVisitor visitor)
          Traverses the tree and performs the visitor's action on each node.
 
Methods inherited from class org.htmlcleaner.TagToken
toString
 
Methods inherited from class org.htmlcleaner.BaseTokenImpl
getCol, getRow, setCol, setRow
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.htmlcleaner.BaseToken
getCol, getRow, setCol, setRow
 

Constructor Detail

TagNode

public TagNode(String name)
Creates a new tag node with the specified name.

Method Detail

getName

public String getName()
Overrides:
getName in class TagToken

getAttributeByName

public String getAttributeByName(String attName)
Parameters:
attName - name of the attribute
Returns:
Value of the specified attribute, or null if this tag doesn't contain it.

getAttributes

public Map<String,String> getAttributes()
Returns the attributes of this tag node.

Returns:
Map instance containing all attribute name/value pairs.

getAttributesInLowerCase

public Map<String,String> getAttributesInLowerCase()
Returns the attributes of this tag node, with attribute names converted to lower case.

Returns:
Map instance containing all attribute name/value pairs, with attribute names transformed to lower case.

setAttributes

public void setAttributes(Map<String,String> attributes)
Replace the current set of attributes with a new set.

Parameters:
attributes - new set of attribute name/value pairs that replaces the current one

hasAttribute

public boolean hasAttribute(String attName)
Checks whether the specified attribute exists.

Parameters:
attName - name of the attribute to check
Returns:
true if this tag node has the specified attribute

addAttribute

public void addAttribute(String attName,
                         String attValue)
Adds the specified attribute to this tag, or overrides the existing one.

Parameters:
attName - name of the attribute
attValue - value of the attribute

removeAttribute

public void removeAttribute(String attName)
Removes the specified attribute from this tag.

Parameters:
attName - name of the attribute to remove
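
The attribute methods above behave like operations on a name-to-value map. The following minimal sketch illustrates those documented contracts with a hypothetical AttributeSketch class backed by a LinkedHashMap; it is not HtmlCleaner's implementation. Lookups in the sketch are exact-match, which is an assumption: how HtmlCleaner itself treats attribute-name case may depend on the version and cleaner settings.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for TagNode's attribute handling; not HtmlCleaner code.
class AttributeSketch {
    // Insertion-ordered storage, chosen here only for predictable iteration.
    private final Map<String, String> attributes = new LinkedHashMap<String, String>();

    // Adds the attribute, or overrides the value if the name already exists.
    void addAttribute(String attName, String attValue) {
        attributes.put(attName, attValue);
    }

    // Returns the value, or null when the attribute is absent (exact-match lookup).
    String getAttributeByName(String attName) {
        return attributes.get(attName);
    }

    boolean hasAttribute(String attName) {
        return attributes.containsKey(attName);
    }

    void removeAttribute(String attName) {
        attributes.remove(attName);
    }

    // Returns a copy whose attribute names are lower-cased; values are unchanged.
    Map<String, String> getAttributesInLowerCase() {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            result.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return result;
    }
}
```

For example, calling addAttribute("HREF", "a.html") and then addAttribute("HREF", "b.html") leaves a single attribute whose value is "b.html", and getAttributesInLowerCase() exposes it under the key "href".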

getChildren

@Deprecated
public List<TagNode> getChildren()
Deprecated. Use getChildTagList() instead; this method will be refactored and possibly removed in future versions. TODO: This method should be refactored because it does not follow the usual Java getter/setter conventions.

Returns:
List of child TagNode objects.

setChildren

public void setChildren(List<? extends BaseToken> children)

getAllChildren

public List<? extends BaseToken> getAllChildren()

getChildTagList

public List<TagNode> getChildTagList()
Returns:
List of child TagNode objects.

hasChildren

public boolean hasChildren()
Returns:
Whether this node has child elements or not.

getChildTags

public TagNode[] getChildTags()
Returns:
An array of child TagNode instances.

getText

public CharSequence getText()
Returns:
Text content of this node and its subelements.
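
To illustrate what getText() collects, here is a minimal sketch over a hypothetical two-type tree (TextLeaf for text content, TextElement for tags). It assumes that text is concatenated in document order from text children and, recursively, from element children. The sketch models no comment nodes and performs no whitespace normalization; these classes are illustrative only and are not part of HtmlCleaner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree: text leaves and element nodes.
interface TextTreeNode {}

class TextLeaf implements TextTreeNode {
    final String content;

    TextLeaf(String content) {
        this.content = content;
    }
}

class TextElement implements TextTreeNode {
    final String name;
    final List<TextTreeNode> children = new ArrayList<TextTreeNode>();

    TextElement(String name) {
        this.name = name;
    }

    // Appends a child and returns this element to allow chained construction.
    TextElement add(TextTreeNode child) {
        children.add(child);
        return this;
    }

    // Concatenates the text of this node and all its subelements in document order.
    CharSequence getText() {
        StringBuilder text = new StringBuilder();
        for (TextTreeNode child : children) {
            if (child instanceof TextLeaf) {
                text.append(((TextLeaf) child).content);
            } else if (child instanceof TextElement) {
                text.append(((TextElement) child).getText());
            }
        }
        return text;
    }
}
```

For a p element containing "Hello, ", a b element with "world", and "!", the sketch's getText() yields "Hello, world!".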

getChildIndex

public int getChildIndex(HtmlNode child)
Parameters:
child - Child whose index to find
Returns:
Index of the specified child node among this node's children, or -1 if the node is not a child of this node.

insertChild

public void insertChild(int index,
                        HtmlNode childToAdd)
Inserts the specified node at the specified position in the list of children.

Parameters:
index - position at which to insert the node
childToAdd - node to insert

insertChildBefore

public void insertChildBefore(HtmlNode node,
                              HtmlNode nodeToInsert)
Inserts the specified node into the list of children before the specified child.

Parameters:
node - Child before which to insert the new node
nodeToInsert - Node to be inserted at the specified position

insertChildAfter

public void insertChildAfter(HtmlNode node,
                             HtmlNode nodeToInsert)
Inserts the specified node into the list of children after the specified child.

Parameters:
node - Child after which to insert the new node
nodeToInsert - Node to be inserted at the specified position
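
The three insertion methods can be related through getChildIndex(): inserting before a child amounts to insertChild() at that child's index, and inserting after it uses the index plus one. The sketch below models this with a hypothetical ChildListSketch class whose children are plain objects. Identity comparison in getChildIndex() and doing nothing when the reference node is not a child are assumptions of the sketch, not documented HtmlCleaner behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node's child list; not HtmlCleaner code.
class ChildListSketch {
    final List<Object> children = new ArrayList<Object>();

    // Index of the child, or -1 if it is not a child (identity comparison assumed).
    int getChildIndex(Object child) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == child) {
                return i;
            }
        }
        return -1;
    }

    void insertChild(int index, Object childToAdd) {
        children.add(index, childToAdd);
    }

    // Inserts before the reference child; does nothing if it is not a child.
    void insertChildBefore(Object node, Object nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index, nodeToInsert);
        }
    }

    // Inserts after the reference child; does nothing if it is not a child.
    void insertChildAfter(Object node, Object nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index + 1, nodeToInsert);
        }
    }
}
```

Starting from children [a, c], insertChildBefore(c, "b") followed by insertChildAfter(c, "d") gives [a, b, c, d].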

getParent

public TagNode getParent()
Returns:
Parent of this node, or null if this is the root node.

getDocType

public DoctypeToken getDocType()

setDocType

public void setDocType(DoctypeToken docType)

addChild

public void addChild(Object child)

addChildren

public void addChildren(List<? extends BaseToken> newChildren)
Adds all elements from the specified list to this node as children.

Parameters:
newChildren - list of nodes to add

getElementList

public List<? extends TagNode> getElementList(ITagNodeCondition condition,
                                              boolean isRecursive)
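Gets all elements in the tree that satisfy the specified condition.

Parameters:
condition - condition an element must satisfy to be included
isRecursive - if true, the whole subtree is searched; if false, only direct children
Returns:
List of TagNode instances that satisfy the specified condition.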
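
The effect of isRecursive can be sketched with a hypothetical SearchNode class in which java.util.function.Predicate stands in for ITagNodeCondition. With isRecursive set to false only direct children are tested; with true the whole subtree is searched, each child being tested before its own descendants. That ordering is an assumption of this sketch, not a documented guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical element node; Predicate stands in for ITagNodeCondition.
class SearchNode {
    final String name;
    final List<SearchNode> children = new ArrayList<SearchNode>();

    SearchNode(String name, SearchNode... kids) {
        this.name = name;
        for (SearchNode kid : kids) {
            children.add(kid);
        }
    }

    // Collects matching children; with isRecursive, also searches their subtrees,
    // testing each child before its own descendants. This node itself is not tested.
    List<SearchNode> getElementList(Predicate<SearchNode> condition, boolean isRecursive) {
        List<SearchNode> result = new ArrayList<SearchNode>();
        for (SearchNode child : children) {
            if (condition.test(child)) {
                result.add(child);
            }
            if (isRecursive) {
                result.addAll(child.getElementList(condition, true));
            }
        }
        return result;
    }
}
```

With a body whose children are a div (containing one a) and an a, searching for a elements returns one match non-recursively and two recursively.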

getAllElementsList

public List<? extends TagNode> getAllElementsList(boolean isRecursive)

getAllElements

public TagNode[] getAllElements(boolean isRecursive)

findElementByName

public TagNode findElementByName(String findName,
                                 boolean isRecursive)

getElementListByName

public List<? extends TagNode> getElementListByName(String findName,
                                                    boolean isRecursive)

getElementsByName

public TagNode[] getElementsByName(String findName,
                                   boolean isRecursive)

findElementHavingAttribute

public TagNode findElementHavingAttribute(String attName,
                                          boolean isRecursive)

getElementListHavingAttribute

public List<? extends TagNode> getElementListHavingAttribute(String attName,
                                                             boolean isRecursive)

getElementsHavingAttribute

public TagNode[] getElementsHavingAttribute(String attName,
                                            boolean isRecursive)

findElementByAttValue

public TagNode findElementByAttValue(String attName,
                                     String attValue,
                                     boolean isRecursive,
                                     boolean isCaseSensitive)

getElementListByAttValue

public List<? extends TagNode> getElementListByAttValue(String attName,
                                                        String attValue,
                                                        boolean isRecursive,
                                                        boolean isCaseSensitive)

getElementsByAttValue

public TagNode[] getElementsByAttValue(String attName,
                                       String attValue,
                                       boolean isRecursive,
                                       boolean isCaseSensitive)
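
The find..., getElementList... and getElements... families above appear to differ mainly in their return shape: a single TagNode, a List, or an array. The following sketch, built on a hypothetical AttNode class, derives all three from one search. It assumes that find... returns the first match in list order (or null when nothing matches), that attribute names are matched exactly, and that isCaseSensitive applies only to the comparison of attribute values. For brevity, the find variant builds the full list instead of stopping at the first match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element node illustrating the find/list/array search variants.
class AttNode {
    final Map<String, String> attributes = new HashMap<String, String>();
    final List<AttNode> children = new ArrayList<AttNode>();

    AttNode attr(String name, String value) {
        attributes.put(name, value);
        return this;
    }

    AttNode child(AttNode kid) {
        children.add(kid);
        return this;
    }

    // Assumed rule: exact attribute name; value compared with or without case.
    private static boolean matches(AttNode n, String attName, String attValue, boolean isCaseSensitive) {
        String value = n.attributes.get(attName);
        if (value == null) {
            return false;
        }
        return isCaseSensitive ? value.equals(attValue) : value.equalsIgnoreCase(attValue);
    }

    // List variant: matching children, plus their subtrees when isRecursive is true.
    List<AttNode> getElementListByAttValue(String attName, String attValue,
                                           boolean isRecursive, boolean isCaseSensitive) {
        List<AttNode> result = new ArrayList<AttNode>();
        for (AttNode kid : children) {
            if (matches(kid, attName, attValue, isCaseSensitive)) {
                result.add(kid);
            }
            if (isRecursive) {
                result.addAll(kid.getElementListByAttValue(attName, attValue, true, isCaseSensitive));
            }
        }
        return result;
    }

    // Array variant: same elements as the list variant.
    AttNode[] getElementsByAttValue(String attName, String attValue,
                                    boolean isRecursive, boolean isCaseSensitive) {
        return getElementListByAttValue(attName, attValue, isRecursive, isCaseSensitive)
            .toArray(new AttNode[0]);
    }

    // "find" variant: first match, or null if there is none.
    AttNode findElementByAttValue(String attName, String attValue,
                                  boolean isRecursive, boolean isCaseSensitive) {
        List<AttNode> list = getElementListByAttValue(attName, attValue, isRecursive, isCaseSensitive);
        return list.isEmpty() ? null : list.get(0);
    }
}
```

In this sketch, searching recursively for class="menu" without case sensitivity matches both "Menu" and "menu"; with case sensitivity only "menu" matches.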

evaluateXPath

public Object[] evaluateXPath(String xPathExpression)
                       throws XPatherException
Evaluates an XPath expression on this node.
This is not a fully compliant XPath parser and evaluator. The examples below show supported constructs:
  • //div//a
  • //div//a[@id][@class]
  • /body/*[1]/@type
  • //div[3]//a[@id][@href='r/n4']
  • //div[last() >= 4]//./div[position() = last()])[position() > 22]//li[2]//a
  • //div[2]/@*[2]
  • data(//div//a[@id][@class])
  • //p/last()
  • //body//div[3][@class]//span[12.2
  • data(//a['v' < @id])

Parameters:
xPathExpression - XPath expression to evaluate
Returns:
Result of the XPather evaluation.
Throws:
XPatherException - if the expression cannot be parsed or evaluated
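
For comparison with standard XPath semantics, the first two example expressions can be run through the JDK's own javax.xml.xpath API on a small well-formed document. This is a swapped-in technique: it uses the JDK's XPath 1.0 engine, not HtmlCleaner's XPather, and it reports DOM node counts rather than the Object[] that evaluateXPath produces. The StandardXPathDemo class and its sample markup are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Uses the JDK's standard XPath 1.0 engine, not HtmlCleaner's XPather.
class StandardXPathDemo {
    // Hypothetical sample markup; must be well-formed XML for the JDK parser.
    static final String XML =
        "<html><body>"
        + "<div><a id='x' class='c'>one</a><a id='y'>two</a></div>"
        + "<a id='z' class='c'>three</a>"
        + "</body></html>";

    // Parses the sample and returns the number of nodes the expression selects.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }
}
```

On the sample markup, //div//a selects two nodes and //div//a[@id][@class] selects one; whether XPather returns the same elements for a given expression should be checked against HtmlCleaner itself.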

removeFromTree

public boolean removeFromTree()
Removes this node from the tree.

Returns:
True if the element was removed, i.e., if it is not the root node.

removeChild

public boolean removeChild(Object child)
Removes the specified child from this node.

Parameters:
child - child to remove
Returns:
True if the child object existed in the children list.
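
removeFromTree() can be understood as asking the parent to remove this node, which is why it returns false for the root. The minimal sketch below models this with a hypothetical TreeNodeSketch class; clearing the parent link after removal is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with a parent link; not HtmlCleaner code.
class TreeNodeSketch {
    TreeNodeSketch parent;
    final List<TreeNodeSketch> children = new ArrayList<TreeNodeSketch>();

    void addChild(TreeNodeSketch child) {
        children.add(child);
        child.parent = this;
    }

    // Returns true if the child was present and has now been removed.
    boolean removeChild(TreeNodeSketch child) {
        boolean removed = children.remove(child);
        if (removed) {
            child.parent = null; // assumption of this sketch
        }
        return removed;
    }

    // A root node has no parent, so there is nothing to remove it from.
    boolean removeFromTree() {
        return parent != null && parent.removeChild(this);
    }
}
```

After root.addChild(child), child.removeFromTree() returns true; a second call, or a call on root, returns false.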

removeAllChildren

public void removeAllChildren()
Removes all children (subelements and text content).


setAutoGenerated

public void setAutoGenerated(boolean autoGenerated)
Parameters:
autoGenerated - true to mark this node as automatically generated

isAutoGenerated

public boolean isAutoGenerated()
Returns:
true if this node is marked as automatically generated

isPruned

public boolean isPruned()
Returns:
true if the node was marked to be pruned.

setPruned

public void setPruned(boolean pruned)

isEmpty

public boolean isEmpty()

addNamespaceDeclaration

public void addNamespaceDeclaration(String nsPrefix,
                                    String nsURI)
Adds a namespace declaration to this node.

Parameters:
nsPrefix - Namespace prefix
nsURI - Namespace URI

getNamespaceDeclarations

public Map<String,String> getNamespaceDeclarations()
Returns:
Map of namespace declarations for this node

serialize

public void serialize(Serializer serializer,
                      Writer writer)
               throws IOException
Specified by:
serialize in interface BaseToken
Throws:
IOException

makeCopy

public TagNode makeCopy()

isCopy

public boolean isCopy()

traverse

public void traverse(TagNodeVisitor visitor)
Traverses the tree and performs the visitor's action on each node. Traversal stops when the whole tree has been visited or when the visitor returns false.

Parameters:
visitor - TagNodeVisitor implementation
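
The early-stop contract can be sketched with a hypothetical WalkNode tree and a VisitSketch callback loosely modeled on TagNodeVisitor (it receives the parent and the current node). The sketch assumes a depth-first, pre-order walk that visits the starting node first; HtmlCleaner's exact visiting order and its handling of non-tag children are not modeled. Unlike TagNode.traverse(), which returns void, the sketch's traverse() returns a boolean so that the stop signal propagates up the recursion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback loosely modeled on TagNodeVisitor: return false to stop.
interface VisitSketch {
    boolean visit(WalkNode parent, WalkNode node);
}

class WalkNode {
    final String name;
    WalkNode parent;
    final List<WalkNode> children = new ArrayList<WalkNode>();

    WalkNode(String name) {
        this.name = name;
    }

    // Appends a child, sets its parent link, and returns this node for chaining.
    WalkNode add(WalkNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Pre-order, depth-first walk; returns false as soon as the visitor asks to stop.
    boolean traverse(VisitSketch visitor) {
        if (!visitor.visit(parent, this)) {
            return false;
        }
        for (WalkNode child : children) {
            if (!child.traverse(visitor)) {
                return false;
            }
        }
        return true;
    }
}
```

On a tree html(head, body(p)), a visitor that always returns true sees html, head, body and p; one that returns false at head sees only html and head.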

isForeignMarkup

public boolean isForeignMarkup()
Returns:
true if this node is marked as foreign markup

setForeignMarkup

public void setForeignMarkup(boolean isForeignMarkup)
Parameters:
isForeignMarkup - true to mark this node as foreign markup


Copyright © 2006-2014. All Rights Reserved.