JSOUP using Nodes to get specific text that is outside HTML tags

I've been using jsoup to parse a website for some metadata, which works great. The problem is that some of the metadata I need is not inside any tag, and I don't know how to get it.

Here is an example of the data I would need to get from my URL:

<div class="newclass ">
        <div>
            <p>     
                    <strong>Arist:</strong>&nbsp;Picasso Biggie <em>|</em>
                    <strong>Released:</strong>&nbsp;3 years ago <em>|</em>
                    <strong>Album:</strong>&nbsp;Picasso Biggie: The Big OneUp <em>|</em>                       
                    <strong>Producer:</strong>&nbsp;Various <em>|</em>                      
                    <strong>Featuring:</strong>&nbsp;Mount Kimbie <em>|</em>                                        
            </p>
        </div>
</div>

What I'm looking for in this HTML are things like the artist ("Picasso Biggie"), when it was released ("3 years ago"), the album ("Picasso Biggie: The Big OneUp"), and so on. I've looked into using nodes with jsoup, but I can only find a few examples and cannot figure out how to get jsoup to do this.

This is the code I've tried and it returns nothing:

Document doc = Jsoup.connect(URL).get();
Elements dakss1 = doc.select(".newclass ");
for(Element dakss : rayz1) { 
     TextNode quill = (TextNode) rayz1.nextSibling().childNode(0);
     System.out.println("" + quill);
}

UPDATE: The answer by Shaowei Ling works great for getting all of the text outside of the tags, but is there a way to target specific nodes so that I get only specific values? For example, instead of getting

    Picasso Biggie
    3 years ago
    Picasso Biggie: The Big OneUp
    Various
    Mount Kimbie

I would get only:

    3 years ago

if all I need is when the album was released?

UPDATE #2: To solve my second problem, where I was parsing multiple items with the same HTML structure as above, I included the specific element I wanted in my jsoup selector query. For example, to get all the release dates for Picasso Biggie's albums, this is the code I used:

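    Document doc = Jsoup.connect(URL).get();
    // Match only the <strong> labels whose text contains "Released"
    Elements dakss1 = doc.select(".newclass p strong:contains(Released)");
    for (Element dakss : dakss1) {
        // The release date is the bare text node right after the label
        Node nodeWithReleaseDates = dakss.nextSibling();
        System.out.println("" + nodeWithReleaseDates);
    }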

This returned all the release dates I wanted for Picasso Biggie's various albums, as follows:

    3 years ago
    2 years ago
    7 months ago
    1 month ago

Answers


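The code in your question uses an undefined variable, rayz1: the selected elements are stored in dakss1, but the loop iterates over and reads from rayz1.

The following example may help. I have run it and it works. It selects each <strong> label and prints the node that follows it: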

    String html = 
            "<div class=\"newclass \">\n"
            + "        <div>\n"
            + "            <p>     \n"
            + "                    <strong>Arist:</strong>&nbsp;Picasso Biggie <em>|</em>\n"
            + "                    <strong>Released:</strong>&nbsp;3 years ago <em>|</em>\n"
            + "                    <strong>Album:</strong>&nbsp;Picasso Biggie: The Big OneUp <em>|</em>                       \n"
            + "                    <strong>Producer:</strong>&nbsp;Various <em>|</em>                      \n"
            + "                    <strong>Featuring:</strong>&nbsp;Mount Kimbie <em>|</em>                                        \n"
            + "            </p>\n"
            + "        </div>\n"
            + "</div>";
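    Document doc = Jsoup.parse(html);
    Elements dakss1 = doc.select("div p strong");
    for (Element dakss : dakss1) {
        // The value is the sibling node right after each <strong>;
        // drop the &nbsp; entity and the surrounding whitespace
        System.out.println(dakss.nextSibling().toString().replace("&nbsp;", "").trim());
    }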

The result would be:

    Picasso Biggie
    3 years ago
    Picasso Biggie: The Big OneUp
    Various
    Mount Kimbie
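
If you only need one of these values, one possible approach is to collect each label and the text that follows it into a map and then look up the field you want. This is only a sketch under a few assumptions: the class name MetaDataFields and the helper readFields are made up for illustration, the selector assumes the markup shown in the question, and it reads the value with TextNode.getWholeText() and swaps the non-breaking space for a regular one instead of replacing the "&nbsp;" string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class MetaDataFields {

    // Hypothetical helper: maps each <strong> label to the bare text that follows it.
    static Map<String, String> readFields(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element label : doc.select(".newclass p strong")) {
            Node next = label.nextSibling();
            if (!(next instanceof TextNode)) {
                continue; // nothing but another tag (or nothing at all) after this label
            }
            // "Released:" becomes the key "Released"
            String key = label.text().replace(":", "").trim();
            // &nbsp; is parsed into U+00A0, which trim() does not remove on its own
            String value = ((TextNode) next).getWholeText().replace('\u00a0', ' ').trim();
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<div class=\"newclass \"><div><p>"
                + "<strong>Released:</strong>&nbsp;3 years ago <em>|</em>"
                + "<strong>Album:</strong>&nbsp;Picasso Biggie: The Big OneUp <em>|</em>"
                + "</p></div></div>";
        Map<String, String> fields = readFields(html);
        System.out.println(fields.get("Released")); // 3 years ago
        System.out.println(fields.get("Album"));    // Picasso Biggie: The Big OneUp
    }
}
```

Note that a map keeps only one value per label, so if the page lists several albums with the same structure, a later "Released" entry would overwrite an earlier one. In that case a selector such as strong:contains(Released), as in UPDATE #2 of the question, is probably the better fit.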
