HTML5 syntax - HTML vs XHTML

Even with HTML5 being the path forward for HTML we get two options as developers: XHTML syntax and HTML syntax. I've been using XHTML as my main doctype for 5 or so years so I'm very comfortable with it.

My question: given that non-XML syntax will be allowed, is there any reason to stick with valid XML syntax? Do you gain anything by choosing one over the other (compatibility, etc.) beyond personal preference? Personally I'd feel a little dirty going back to not closing tags, since closing them is second nature to me now, but would I gain something by going back to HTML syntax?

Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and I'm not sure there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

Answers


I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and I'm not sure there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

As mentioned in a previous answer, text/html gets parsed as HTML and application/xhtml+xml gets parsed as XML. Thus, you should use the syntax that matches the MIME type you serve.
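
To make that rule concrete, here is a rough Python sketch of the decision a client makes. The function name `parser_for` and its return labels are invented for illustration, and real browsers apply more elaborate content-type handling than this.

```python
def parser_for(content_type: str) -> str:
    """Return which parser family a Content-Type header value selects."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    # application/xhtml+xml and the generic XML types go to the XML parser.
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        return "xml"
    return "other"


print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml"))
```

Note that the doctype and the syntax used inside the document play no part in the choice; only the declared MIME type does.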

If you are now serving text/html but using XHTML syntax, then you should fix your content to use the HTML5 syntax. You may already be close, since HTML5 allows the XMLesque /> empty element syntax for void elements (elements that are always empty, such as img and br).

If you are now using application/xhtml+xml and you care about supporting IE, that is a reason to switch to text/html and the HTML syntax.

Trying to write polyglot documents that are correct HTML5 and XHTML5 (for serving different MIME types to different browsers with the same payload bytes) is harder than it seems at first sight and not worth the trouble.


The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.

Nonsense! The HTML5 spec defines how to parse HTML in a way that is relatively easy to implement, and off-the-shelf parsers are being developed that can be easily integrated into tool chains. It's even possible for an HTML5 parser to be integrated into an XML tool chain in place of an XML parser.

But what you need to understand is that in practice, you're most likely using HTML anyway, even if you think you're using XHTML based on the DOCTYPE. If your content is being served as text/html, instead of application/xhtml+xml or another XML MIME type, then your content will be processed as HTML.

With HTML5, you can choose to use HTML-only syntax, meaning that it is only compatible with being served and processed as text/html because it is not well-formed XML. Or you can use XHTML-only syntax, meaning that it is well-formed XML but uses XML features that are not compatible with HTML. Or you can write a polyglot document, which is conforming and compatible with both HTML and XHTML processing (in principle, this is similar to writing XHTML 1.0 that conforms to the Appendix C guidelines).
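
One way to see the difference between HTML-only and XML-compatible syntax is to hand both to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed_xml` and the two sample fragments are made up for illustration. It only tests XML well-formedness on small fragments, not whether a full document is conforming HTML5 or a valid polyglot.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-only syntax: unquoted attribute value, unclosed <br>, omitted </p>.
html_only = "<div><p class=note>One<br>Two</div>"

# XML-compatible syntax: attribute quoted, every element closed.
xml_friendly = '<div><p class="note">One<br />Two</p></div>'

print(is_well_formed_xml(html_only))
print(is_well_formed_xml(xml_friendly))
```

An HTML parser is specified to handle both fragments, while the XML parser rejects the first one outright.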


The HTML5 draft is very clear about which syntax to use:

  • use HTML syntax when sending pages as text/html
  • use XHTML syntax when sending pages as application/xhtml+xml

Reference: http://dev.w3.org/html5/spec/Overview.html#authors-using-xhtml


When using XHTML you can mix it with other XML content, e.g. MathML, SVG or your own proprietary format, just by switching namespaces at some point. Also, you can embed XHTML inside other XML documents.

(well, actually MathML and SVG can be used in non-XML HTML5 too, but they are special-cased)
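
As a small illustration of the namespace switching described above, the following Python sketch parses an XHTML document with inline SVG using the standard `xml.etree.ElementTree` module. The sample document is invented for this example; the two namespace URIs are the standard ones for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

document = f"""<html xmlns="{XHTML_NS}">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8" />
    </svg>
  </body>
</html>"""

root = ET.fromstring(document)

# ElementTree reports each tag as "{namespace}localname", so elements
# can be grouped by the vocabulary they belong to.
svg_elements = [el.tag for el in root.iter() if el.tag.startswith(f"{{{SVG_NS}}}")]
xhtml_elements = [el.tag for el in root.iter() if el.tag.startswith(f"{{{XHTML_NS}}}")]

print(svg_elements)
print(xhtml_elements)
```

A generic XML tool needs no knowledge of XHTML or SVG to tell the two vocabularies apart; the `xmlns` declarations carry that information.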


You shouldn't use XHTML to serve content on the Web (or any network including Internet Explorer clients); see Sending XHTML as text/html Considered Harmful for the full rationale.


Most of the benefits of XHTML have failed to materialise. While I wouldn't recommend it for new projects, XHTML served as text/html seems to be quite manageable and widespread, as long as you follow the compatibility guidelines. It probably isn't worthwhile changing any significant projects back to the HTML serialisation.


The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.

But ultimately, it is just a matter of syntax. Both forms are allowed for HTML5.


Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and I'm not sure there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

You really have to consider two things: the language you are writing and the language you are sending. The Web is defined by 3 components:

  • URI
  • A resource - Markup Language (document)
  • A protocol - HTTP (tool for managing information space)

You can write a document with an XML syntax on your desktop such as using XHTML. In this specific environment, if you give the extension ".xhtml" to the filename and open it with your local browser, it will be parsed as XML. If you give the extension ".html" to the filename, it will be parsed as HTML. Basically in your authoring tool, it is XML, but this doesn't matter anymore once you process it with a tool.

On the Web, your resource identified by a URI will be sent with a specific MIME type. Most of the time, these days, people are using text/html. The MIME type defines how the client (browser, search engine bot, etc.) must process your document. If you are using an XML syntax but send it with text/html, the document will be processed by an HTML parser.

To send your documents over the wire as XML, you have to configure your server to send them as application/xhtml+xml. (Note that IE8 and earlier versions do not understand application/xhtml+xml and will offer to save the file instead.)
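
How you configure this depends entirely on your server. As a stand-in, here is a minimal sketch using Python's built-in `http.server`, which maps file extensions to MIME types through the handler's `extensions_map`. The class name `XhtmlAwareHandler`, the `serve` helper and the port are assumptions made for this example, not a recommendation for production use.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XhtmlAwareHandler(SimpleHTTPRequestHandler):
    # Copy the default map and add explicit entries, so that .xhtml files
    # are sent as XML while .html files stay text/html.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".xhtml": "application/xhtml+xml",
        ".html": "text/html",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), XhtmlAwareHandler).serve_forever()
```

Calling `serve()` from a directory containing the same markup saved as both `page.html` and `page.xhtml` lets you compare how a browser treats the two responses.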

The HTML 5 abstract model has been designed so that you can almost write it with either an HTML syntax or an XML syntax in text/html. Almost, because even if you write with an XML syntax (closing empty elements, quoting attribute values, etc.) you will run into trouble on complex pages that use scripting and namespaces, due to the way XML parsers and HTML parsers deal with those.
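
Inline scripts are one common place where the two parsers disagree. The Python sketch below, using the standard `xml.etree.ElementTree` module, shows the XML side of it; the sample script and the helper name `xml_accepts` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Fine for an HTML parser, which treats script content as raw text,
# but a bare "<" is a well-formedness error for an XML parser.
raw_script = "<script>if (a < b) { run(); }</script>"

# Wrapping the code in a CDATA section makes it acceptable as XML.
cdata_script = "<script><![CDATA[if (a < b) { run(); }]]></script>"


def xml_accepts(markup: str) -> bool:
    """True if the XML parser can parse the markup without error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(xml_accepts(raw_script))
print(xml_accepts(cdata_script))

# The XML parser hands back the script text with the CDATA wrapper removed.
script_text = ET.fromstring(cdata_script).text
print(script_text)
```

The catch is that an HTML parser does not treat the CDATA markers inside a script element as markup, so the script engine sees them as part of the code unless they are hidden in comments. That kind of mismatch is what makes a single payload for both MIME types awkward.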


I like XHTML because it forces me to write a good page. I see real advantages in XHTML: browsers parse it faster, and you have to produce well-formed XML rather than just HTML. However, you need to serve the page with the MIME type application/xhtml+xml, or you don't get any of the advantages of the X. The only problem with XHTML is that it won't display in IE8 and earlier.


2019 UPDATE

The W3C's own words about XHTML:

"A newer specification exists that is recommended for new adoption in place of this specification. New implementations should follow the latest version of the HTML specification."

So, you should use HTML 5.*

