David Madore's WebLog: XHTML or no XHTML?

Ian Hickson has written an interesting note on why not to use XHTML for the moment. He raises some very interesting issues. One of them is that the overwhelming majority of Web authors are hopelessly clueless and will just copy their HTML code from some other site or some poorly written book. Now when they start copying things like <?xml version="1.0" encoding="utf-8"?> and <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> without understanding what they mean, then we have problems. The same goes for when they start writing <br /> when they should have written <br> or vice versa, because they haven't heard of XHTML or don't know how it differs from SGML-based HTML. Hell will not break loose now, because existing Web browsers have been built to be very fault-tolerant, but it may break loose in the future.

So, my important advice to Web authors: if you don't want to write markup that validates (or if this all sounds Chinese to you), fine, but then make sure of one thing: don't include in your HTML code anything which contains the characters <? or the word DOCTYPE. Just don't. Unless you know exactly what they mean, that is, and are prepared to face the consequences. If you don't, what you're writing is known as tag soup, and the correct way to start an HTML tag soup is with <html> (or perhaps <html lang="en"> or some such thing). If you start the document with <?xml version="1.0" encoding="utf-8"?> then you are promising well-formed XML, so you had better know what this means. If you include a line such as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> then you are promising markup that validates against a specific DTD, so you had better check that it does validate. If you aren't prepared to go through all that, then start with <html> and write tag soup, which just works.
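
To make the "promise" idea concrete, here is a minimal sketch in Python of a check for the first of those promises only: a document that opens with an XML declaration had better be well-formed XML. The function name keeps_xml_promise is made up for illustration, and the check relies on the standard library's xml.etree.ElementTree parser, which tests well-formedness but does not validate against a DTD, so the DOCTYPE promise still needs a real validator.

```python
import xml.etree.ElementTree as ET


def keeps_xml_promise(document: bytes) -> bool:
    """Hypothetical helper: True if the document either makes no XML
    promise (no XML declaration) or actually is well-formed XML."""
    if not document.startswith(b"<?xml"):
        # Tag soup: nothing was promised, so nothing can be broken.
        return True
    try:
        # Well-formedness check only; no DTD validation happens here.
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


XML_DECL = b'<?xml version="1.0" encoding="utf-8"?>'

tag_soup = b"<html><body><p>Hello<br></body></html>"
broken_promise = XML_DECL + b"<html><body><p>Hello<br></p></body></html>"
kept_promise = XML_DECL + b"<html><body><p>Hello<br /></p></body></html>"
```

With these three sample documents, the tag soup passes because it promises nothing, the second fails because its <br> is never closed, and the third passes.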

Now why do I write XHTML when Ian Hickson quite rightly points out that it will bring me no advantage whatsoever (since it is served as text/html and not application/xhtml+xml)? Well, for one thing, my XHTML is valid: but the point of being valid is not that it makes the page any better per se, it simply helps me check for some basic mistakes that even using two-and-forty different Web browsers wouldn't catch. But also, quite trivially, I find XHTML simpler to write than HTML4: writing <br> without ever closing the tag, for example, just seems wrong. And when the pages are computer-generated it's even more obvious: it is such a pain to write a program that will have to remember that the <br> tag must not be closed, for example, whereas in XHTML we simply close every tag, no questions asked.
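
The point about computer-generated pages can be sketched with two toy serializers in Python. Everything here is an assumption made for illustration: a node is either a string or a (tag, children) pair, the function names to_html4 and to_xhtml are invented, and attributes and text escaping are left out entirely. The only outside fact used is the list of elements that the HTML 4.01 DTD declares EMPTY.

```python
# Elements declared EMPTY in the HTML 4.01 DTD: they take no end tag,
# so an HTML4 generator has to carry this list around.
HTML4_EMPTY = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def to_html4(node):
    """Serialize a toy node as HTML4 (no attributes, no escaping)."""
    if isinstance(node, str):
        return node
    tag, children = node
    if tag in HTML4_EMPTY:
        # Special case: this tag must be remembered as never closed.
        return f"<{tag}>"
    inner = "".join(to_html4(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


def to_xhtml(node):
    """Serialize a toy node as XHTML: every tag is closed, no lookup table."""
    if isinstance(node, str):
        return node
    tag, children = node
    if not children:
        return f"<{tag} />"
    inner = "".join(to_xhtml(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


paragraph = ("p", ["line one", ("br", []), "line two"])
html4_output = to_html4(paragraph)
xhtml_output = to_xhtml(paragraph)
```

The HTML4 version needs the table of special cases; the XHTML version needs none. One caveat for pages served as text/html: the compatibility guidelines in the XHTML 1.0 specification advise against the minimized form for elements that merely happen to be empty, such as an empty paragraph, so a real generator might still want to write <p></p> in that case.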