Trouble Parsing HTML in PHP

I'm having issues using PHP Simple HTML DOM Parser when trying to parse a string. Here is my code:

$description = $item["description"]; 
$html = str_get_html($description);

foreach($html->find('img') as $e){
    echo $e->src . '<br>';
}

However, nothing is printed even though my string has img tags. Here are the contents of $description, which I get back using LastRSS to parse an RSS feed.

<div class="field field-name-field-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/pattern/popover-tip-billionaire"><img src="http://patterntap.com/sites/default/files/styles/pattern-thumb/public/newpatterns/506f0e6a77a47.png" /></a></div></div></div><div class="field field-name-field-source field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="http://www.billionaire.com/" target="_blank" class="source">http://www.billionaire.com/</a></div></div></div>

Doing a var_dump($description) gives me:

string(773) "<div class="field field-name-field-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/pattern/announcement-pinterest"><img src="http://patterntap.com/sites/default/files/styles/pattern-thumb/public/newpatterns/52558136bc854.png" /></a></div></div></div><div class="field field-name-field-source field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="http://www.pinterest.com/" target="_blank" class="source">http://www.pinterest.com/</a></div></div></div>" 

So I think I figured out what my issue is... For some reason $description does not equal the text above, even though I just echoed it out and pasted it. I need to replace only the first and last quotation marks.

$description2 = str_replace('"',"'",$description);

However, doing this tells me the two strings are not equal...

$test = '<div class="field field-name-field-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/pattern/popover-tip-billionaire"><img src="http://patterntap.com/sites/default/files/styles/pattern-thumb/public/newpatterns/506f0e6a77a47.png" /></a></div></div></div><div class="field field-name-field-source field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="http://www.billionaire.com/" target="_blank" class="source">http://www.billionaire.com/</a></div></div></div>';
$description = $item["description"];
$description2 = str_replace('"',"'",$description);

//var_dump($description1);

//echo $description1;

if ($description2==$test){
    echo "ITS the same!?!";
}
else {
    echo "NOT THE SAME!!!!";
}

Answers


I never had a lot of luck with SimpleHTML DOM Parser - I never found it to be very simple, so I stuck with the built-in DOMDocument class.

This code extracts what you want:

$description = "Your HTML from wherever";
// loadHTML() is an instance method, so create the document first
$dom = new DOMDocument();
$dom->loadHTML($description) or die("HTML parse error");

// Print the src attribute of every <img> element
$nodelist = $dom->getElementsByTagName('img');
$nb = $nodelist->length;
for($pos=0; $pos<$nb; $pos++) {
    $node = $nodelist->item($pos);
    echo $node->getAttribute('src') . "\n";
}
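
A slightly more defensive variant of the same idea, as a rough sketch: it wraps the lookup in a function, silences libxml warnings for sloppy markup, and iterates the node list with foreach. The function name extractImageSources and the example.com URL are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original answer):
// collect the src attribute of every <img> in an HTML fragment.
function extractImageSources(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return [];
    }

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Sample input is invented for this sketch.
$sample = '<div><a href="/pattern/x"><img src="http://example.com/a.png" /></a></div>';
print_r(extractImageSources($sample));
```

Returning an array instead of echoing makes it easier to check whether any images were found at all.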



Your code works for me. Sometimes, though, creating the DOM object explicitly and loading the HTML into it solves the problem, so instead of $html = str_get_html($description); use:

//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($description);

I tested your code and I get http://patterntap.com/sites/default/files/styles/pattern-thumb/public/newpatterns/506f0e6a77a47.png as expected. Are you sure your $description is what you wrote here? Also double-check simple_html_dom.php is included properly.
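
One way to check whether $description really is what it looks like in the browser, sketched under the assumption that the feed might be delivering entity-escaped markup (&lt;img ...&gt; instead of <img ...>). The var_dump in the question reports string(773), while the visible markup is only about 560 characters long, which would be consistent with escaped characters. The helper name inspectMarkup is hypothetical.

```php
<?php
// Hypothetical diagnostic helper: report whether a string contains
// real tags, entity-escaped tags, or both.
function inspectMarkup(string $s): array
{
    return [
        'length'           => strlen($s),
        'has_real_tags'    => strpos($s, '<img') !== false,
        'has_escaped_tags' => stripos($s, '&lt;img') !== false,
    ];
}

// Invented sample: the same tag in both forms.
var_dump(inspectMarkup('<img src="a.png" />'));
var_dump(inspectMarkup('&lt;img src=&quot;a.png&quot; /&gt;'));
```

A browser renders both forms identically when echoed, so viewing the page source or the flags above is more reliable than looking at the rendered output.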

If you can't make it work and are looking for an alternative, I suggest trying phpQuery (https://code.google.com/p/phpquery/wiki/Manual). I use phpQuery and have never needed to look for another alternative, but I'll also test simple-html-dom to compare pros and cons.


Would you like to try another class for parsing HTML code? I think it's fast and uses less memory.

github.com/tag-parse


Just wanted to share my solution in case anyone ever stumbles on this later.

I ended up using:

$desc = mb_convert_encoding($description, "UTF-8", "HTML-ENTITIES");

After that, I was able to parse through and find all the images using LastRSS.
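
For anyone landing here later, a rough sketch of the same fix using html_entity_decode(), on the assumption that the feed description arrives entity-escaped. This matters on newer PHP versions because, as far as I know, the "HTML-ENTITIES" pseudo-encoding in mbstring has been deprecated since PHP 8.2. The function name and sample string below are invented for illustration, and the sketch uses DOMDocument rather than Simple HTML DOM.

```php
<?php
// Hypothetical helper, not part of LastRSS or Simple HTML DOM.
function imageSourcesFromEscapedHtml(string $description): array
{
    // Assumption: the feed delivered markup as entities,
    // e.g. &lt;img src=&quot;...&quot; /&gt;. Turn it back into real tags.
    $decoded = html_entity_decode($description, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($decoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Invented sample in the escaped form.
$escaped = '&lt;div&gt;&lt;img src=&quot;http://example.com/a.png&quot; /&gt;&lt;/div&gt;';
print_r(imageSourcesFromEscapedHtml($escaped));
```

If the description is not escaped in the first place, the decode step is unnecessary and may turn &amp; in URLs into a bare &, so it is worth checking the raw string first.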

