DOM Error - ID 'someAnchor' already defined in Entity, line X

If I try to load an HTML document into PHP DOM I get an error along the lines of:

Error DOMDocument::loadHTML() [domdocument.loadhtml]: ID someAnchor already defined in Entity, line: 9

I cannot work out why. Here is some code that loads an HTML string into DOM.

First without containing an anchor tag and second with one. The second document produces an error.

Hopefully you should be able to cut and paste it into a script and run it to see the same output:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);


$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
</body>
</html>
EOT;

$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body>
</html>
EOT;

class domGrabber
    {
    public $_FileErrorStr = '';

    /**
    *@desc DOM object factory does the work of loading the DOM object
    */
    public function getLoadAsDOMObj($htmlString)
        {
        $this->_FileErrorStr =''; //reset error container
        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed
        $xmlDoc->loadHTML($htmlString);
        restore_error_handler();
        return $xmlDoc;
        }

    /**
    *@desc public so that it can catch errors from outside this class
    */
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
        {
        if ($this->_FileErrorStr === null)
            {
            $this->_FileErrorStr = $errstr;
            }
        else    {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
            }
        }
    }

$domGrabber = new  domGrabber();
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithNoAnchor );

echo 'PHP Version: '. phpversion() .'<br />'."\n";

echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
    {
    echo 'Error'. $domGrabber->_FileErrorStr;
    }



$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithAnchor);
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
    {
    echo 'Error'. $domGrabber->_FileErrorStr;
    }

I get the following out put in my Firefox source code view:

PHP Version: 5.2.9<br />
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
</body></html>
</pre>
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>

</body></html>
</pre>
Error
DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: ID someAnchor already defined in Entity, line: 9

So, why is DOM saying that someAnchor is already defined?


Update:

I experimented with both

  • Instead of using loadHTML() I used the loadXML() method - and that fixed it
  • Instead of having both id and name I used just id - Attribute and that fixed it.

See the comparison script here for the sake of completion:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);


$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithNoAnchor</p>
</body>
</html>
EOT;

$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchor</p>
<a  name="someAnchor" id="someAnchor" ></a>
</body>
</html>
EOT;

$stringWithAnchorButOnlyIdAtt = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchorButOnlyIdAtt</p>
<a id="someAnchor"></a>
</body>
</html>
EOT;

class domGrabber
    {
    public $_FileErrorStr = '';
    public $useHTMLMethod = TRUE;

    /**
    *@desc DOM object factory does the work of loading the DOM object
    */
    public function loadDOMObjAndWriteOut($htmlString)
        {
        $this->_FileErrorStr ='';

        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed


        if ($this->useHTMLMethod)
            {
            $xmlDoc->loadHTML($htmlString);
            }
        else    {
            $xmlDoc->loadXML($htmlString);
            }


        restore_error_handler();

        echo "<h1>";
        echo ($this->useHTMLMethod) ? 'using xmlDoc->loadHTML() ' : 'using $xmlDoc->loadXML()';
        echo "</h1>";
        echo '<pre>';
        print $xmlDoc->saveXML();
        echo '</pre>'."\n";
        if ($this->_FileErrorStr)
            {
            echo 'Error'. $this->_FileErrorStr;
            }
        }

    /**
    *@desc public so that it can catch errors from outside this class
    */
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
        {
        if ($this->_FileErrorStr === null)
            {
            $this->_FileErrorStr = $errstr;
            }
        else    {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
            }
        }
    }

$domGrabber = new  domGrabber();

echo 'PHP Version: '. phpversion() .'<br />'."\n";

$domGrabber->useHTMLMethod = TRUE; //DOM->loadHTML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);

$domGrabber->useHTMLMethod = FALSE; //use DOM->loadXML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);

Answers


If you are loading XML files (that's the case, XHTML is XML), then you should use DOMDocument::loadXML(), not DOMDocument::loadHTML().

In HTML, both name and id introduce an ID. So you are repeating the id "someAnchor", hence the error.

However, the W3C validator allows repeated IDs in the form you show <a id="someAnchor" name="someAnchor"></a>. This may be a bug of libmxl2.

In this bug report for libxml2, a user proposes a patch to only consider the name attribute as an ID:

According to the HTML and XHTML specs, only the a element's name attribute shares name space with id attributes. For some of the elements it can be argued that multiple instances with the same name don't make sense, but they should nevertheless not be considered in the same namespace as other elements' id attributes.

See http://www.zvon.org/xxl/xhtmlReference/Output/Strict/attr_name.html for all the elements that take name attributes and their semantics.


Need Your Help

Facebook connect "An error occurred. Please try again later."

php facebook

I have a link on my page and the links has the href set by this $facebook-&gt;getLoginUrl(); when I click on the link and go to facebook I get this error: An error occurred. Please try again later....

Firefox Plugin Downloading

javascript firefox firefox-addon firebug firebug1.5

I am trying to create a SIMPLE plugin, no interface is necessary, that will automatically download and save to "Desktop/MyFolder/" everything that the page loads. My thought was to make an extension