How can I implement an algorithm that walks an HTML tree in Java?

I have to walk a tree that I receive as a NodeList. I need an algorithm that traverses all the nodes in order, most likely depth-first, but I don't know how to implement it. I think I need some recursion. Can anybody help?

The relevant part of the code is:

NodeList nodeLista = documento.getElementsByTagName("html");

for (int s = 0; s < nodeLista.getLength(); s++) {
    Node Raiz = nodeLista.item(s);

....

    for (int h = 0; h < nodeLista.getLength(); h++) {

    //Level of depth 1.
    Node Primer_Hijo = nodeLista.item(h); // The first iteration handles HEAD, the second handles BODY.

    //Level of depth 2.
    Element SegundoElemento = (Element) Primer_Hijo;
    NodeList ListadeNodos2 = SegundoElemento.getChildNodes();

.....

Answers


Something like this:

public static void main(String[] args) {
    //get the nodeList
    //...
    for (int h = 0; h < nodeLista.getLength(); h++) {
        Node Primer_Hijo = nodeLista.item(h); 
        navegate(Primer_Hijo);
    }

    //or (better) the root node
    navegate(rootNode);
}

static void navegate(Node node) {
    // do something with node
    node.getAttributes();
    //...

    // visit every child recursively (depth-first)
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        navegate(children.item(i));
    }
}
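
For reference, here is a minimal runnable sketch of the same recursive idea. It assumes the input is well-formed XHTML so that the JDK's built-in XML parser can read it (real-world HTML often is not). The class name DomWalk, the helpers parse and walk, and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {

    /** Parses well-formed XHTML (assumption); checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    /** Visits the node first, then each child in document order (pre-order, depth-first). */
    static void walk(Node node, int depth, List<String> out) {
        // Only element names are recorded; text nodes are still traversed but not recorded.
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add("  ".repeat(depth) + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>");
        List<String> lines = new ArrayList<>();
        // Start from the root element rather than looping over a NodeList.
        walk(doc.getDocumentElement(), 0, lines);
        lines.forEach(System.out::println);
    }
}
```

Each element name is indented by two spaces per level, so the nesting of head, title, body and the two p elements is visible in the output. String.repeat needs Java 11 or later.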

Recursive descent is exactly what you are looking for.

http://en.wikipedia.org/wiki/Recursive_descent_parser
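
If the tree can get very deep and you are worried about the call stack, the same depth-first order can be produced without recursion by keeping your own stack. This is only a sketch under the same assumption of well-formed XHTML; IterativeWalk, parse and preorder are illustrative names, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IterativeWalk {

    /** Parses well-formed XHTML (assumption); checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    /** Returns element names in pre-order, using an explicit stack instead of recursion. */
    static List<String> preorder(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
            // Push children right-to-left so the leftmost child is popped first.
            for (Node child = node.getLastChild(); child != null; child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>t</title></head><body><p>a</p></body></html>");
        System.out.println(preorder(doc.getDocumentElement()));
    }
}
```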


For parsing HTML I have used Jerry in the past.

It bills itself as jQuery for Java and lets you use CSS-style selectors. I think several libraries now implement CSS-style selectors.

It leads to more readable code, though it might not fit your use case.


This is the pseudocode:

traverse_tree(node) {
    childNodes = node.getChildNodes();
    if (childNodes is empty) {
        print valueOf(node);
        return;
    }
    for each childNode in childNodes {
        traverse_tree(childNode);
    }
}

Start the traversal by calling traverse_tree(rootNode), where rootNode is the root node of the tree.
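
In Java with org.w3c.dom, the pseudocode might look roughly like the sketch below. It assumes well-formed XHTML input, and it interprets valueOf(node) as the node's text value, falling back to the tag name for empty elements such as br; that interpretation, and the names LeafCollector, parse and collectLeaves, are my own assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LeafCollector {

    /** Parses well-formed XHTML (assumption); checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    /** Collects the leaves of the tree in document order, mirroring traverse_tree. */
    static void collectLeaves(Node node, List<String> leaves) {
        NodeList children = node.getChildNodes();
        if (children.getLength() == 0) {
            // Leaf: text nodes carry a value, empty elements (e.g. <br/>) do not.
            String value = node.getNodeValue();
            leaves.add(value != null ? value.trim() : "<" + node.getNodeName() + "/>");
            return;
        }
        for (int i = 0; i < children.getLength(); i++) {
            collectLeaves(children.item(i), leaves);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p><br/><p>World</p></body></html>");
        List<String> leaves = new ArrayList<>();
        collectLeaves(doc.getDocumentElement(), leaves);
        System.out.println(leaves); // prints [Hello, <br/>, World]
    }
}
```

Note that in a DOM tree the leaves are mostly text nodes, so if the markup contains whitespace between tags, those whitespace-only text nodes show up as (empty, after trimming) leaves too.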

