MIME talk

September 28th, 2008  · stk

Article updated with new information from the W3C - July 2009

I've advocated for XHTML and CSS, thinking they were the future of the web. I'm no longer convinced of this. We've decided to go back to well-formed "tag soup" XHTML after realizing that the benefits of serving the "application/xhtml+xml" MIME type weren't worth the cost. Find out why.

Back to XHTML v1.0 Strict and text/html
In other words: "Well-formed Tag Soup"

Since late 2005, we've been serving our pages as XHTML v1.1, using the application/xhtml+xml MIME type for those browsers - notably Firefox, Opera & Safari - that understand it. (To do this, we used server-side scripting to set the MIME type in the header. For more about the technique, read this 2005 article - "Are You Serving XHTML with the Wrong MIME Type?")
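For readers who haven't seen the technique, here is a minimal sketch of the idea: look at the browser's Accept request header and send application/xhtml+xml only when the browser explicitly lists it. The 2005 article does this in PHP; the Python below is just an illustration, and the function name choose_content_type is made up for this example rather than taken from our scripts.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a Content-Type based on the browser's Accept header.

    Hypothetical helper: returns "application/xhtml+xml" only when the
    client explicitly lists that type with a non-zero q-value, and falls
    back to "text/html" in every other case.
    """
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q-value as "not acceptable"
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    # A header that lists the XHTML type, and one that only offers a wildcard.
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("image/gif, image/jpeg, */*"))
```

A browser that only sends a wildcard (as Internet Explorer did) never mentions the XHTML type, so it ends up with text/html. The returned string would then go into the Content-Type response header.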

XHTML v1.1 has only negligible coding changes from XHTML v1.0 Strict. However, unlike XHTML v1.0, it's supposed to be served as an XML document (hence the MIME type). So what? Well, serving XML-based web documents (XHTML v1.1 as application/xhtml+xml) comes at a huge price and we're tired of paying it (and our readers are too - *cough* most notably ¥åßßå).

Originally, we viewed XHTML v1.0 as the successor to HTML, since it was standards-based and eliminated the problems of proprietary tags and sloppy coding. We blindly migrated to XHTML v1.1, thinking we were further future-proofing our pages. HA!

The future direction of the web (XHTML and HTML) is muddled. Consider: HTML isn't being phased out; developers of browsers such as Firefox, Opera and Safari are lobbying for (and developing) HTML 5; the W3C has renewed the HTML working group; and the Chief Technical Officer of Opera says, "I don't think XHTML is a realistic option for the masses. HTML 5 is it." [sources]

To find out what price our readers will no longer have to pay, and more about XHTML v2.0 and HTML 5 ... read on

The XML Price Tag

An XHTML document is meant to be cleanly written and well formed. Web pages that are sent and read as true XML documents (XHTML v1.1 Doctype with an application/xhtml+xml MIME type) are not flexible with errors. If a browser (or user agent) hits a page with even a single error, it stops reading the page and spits out a "parsing error" message, rather than a web page.
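To make the "single error" point concrete, here is a small sketch using the XML parser in Python's standard library. It is not what a browser runs internally, only an illustration of the same all-or-nothing rule; the sample markup and the name parse_strictly are invented for this example.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = "<p>Hello, <em>world</em>!</p>"
BROKEN = "<p>Hello, <em>world</p>"   # the <em> element is never closed
BARE_AMPERSAND = "<p>Fish & chips</p>"  # "&" must be written as "&amp;" in XML


def parse_strictly(markup: str) -> str:
    """Parse markup as XML; report the first error instead of any content."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # An XML parser gives up at the first well-formedness error.
        return f"parsing error: {err}"
    return "ok"


if __name__ == "__main__":
    for sample in (WELL_FORMED, BROKEN, BARE_AMPERSAND):
        print(parse_strictly(sample))
```

Only the first sample parses. The other two produce an error and nothing else, which is roughly what a visitor gets when a page served as XML contains one stray tag or one unescaped ampersand.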

This is supposed to be a benefit?! To paraphrase, "By requiring documents to be well-formed (error-free), it eliminates compatibility issues between incorrectly written code and browser-specific error handling. It also (theoretically) provides a web page author immediate indication of an error."

The trouble is (for us) that we often write/view articles on this website in Internet Explorer, which can't do the whole XML thing, so we never see this error. (IE treats the page like plain old HTML and tries to get around the error - often successfully - rather than toss out a fat error message).

Visitors landing on a page with an error, in Firefox, Opera or Safari, see a fat error message, but no content. Until we're alerted to the problem (or stumble upon it ourselves) ... no one using those browsers sees a thing, other than some error they can't do anything about. Frustrating for them. Frustrating for us.

I don't think XHTML is a realistic option for the masses. HTML5 is it.

- the CTO of Opera

Likewise, visitors leaving comments often introduce an XML error - (imagine ... not everyone who comments here is a coding expert, which is very shocking I know) - and BLAMMO, the article goes belly up (again, until we stumble upon it or someone tells us about the problem).

Conversely, HTML browsers are written to accept any input (well-formed or not) and then try to make something sensible out of it. Adding in this "error detection and correction" makes browsers incredibly difficult to code, especially if all browsers are expected to handle all the errors in a similar way. (Such handling promotes sloppy code. Because error-filled pages display okay, the author isn't even aware there ARE errors). However, at least visitors get to see something (and oftentimes, it looks like it's supposed to). Negatives aside, this seems like a better thing for everyone, even if it means long nights for the browser developers!
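For contrast, here is a rough sketch of the forgiving approach, using the HTML tokenizer from Python's standard library. It is far simpler than a real browser's error recovery (it only walks the tags and text it finds), and the TextCollector class and visible_text function are names made up for this illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text content of a page, ignoring how the tags nest."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, well-formed or not.
        self.chunks.append(data)


def visible_text(markup: str) -> str:
    """Return the text a reader would see, even if the markup has errors."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    # The <em> element is never closed, which would be fatal in XML.
    print(visible_text("<p>Hello, <em>world</p>"))
```

The unclosed tag goes unreported and the text still comes out, which is the trade-off described above: the reader sees the content and the author never learns about the mistake.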

It's one of those situations where they were trying to push the right idea (well-formed, standardized code), but got a tad carried away.

Håkon Wium Lie, Chief Technical Officer at Opera, sums it up well: " ... it's unrealistic to think that all web authors will switch to an XML-based syntax which demands that browsers stop processing the document on the first error. XML's draconian policy was an attempt to clean up the web," right before he concludes, "I don't think XHTML is a realistic option for the masses. HTML5 is it."


Which Leads us to: XHTML -vs- HTML

First, there are many myths about XHTML and to be honest, I've done my part to propagate some of them. Here's a short list of what's actually true:

  • XHTML doesn't provide any greater separation of content and presentation than HTML
  • Most XHTML pages are processed as HTML anyway, not as XML
  • XHTML isn't a replacement for HTML
  • XHTML v1.x isn't future compatible with XHTML v2.x
  • XHTML doesn't have good browser support
  • XHTML pages aren't necessarily any cleaner than HTML pages

In fact, through the WHATWG group, company representatives from Apple, Opera and Mozilla (Firefox) have successfully lobbied to renew the W3C HTML Working Group and put forth a new draft recommendation for the "next HTML" - HTML 5. Here's the latest HTML 5 Working Draft.

"Given this information (and XML parsing problems), what's the advantage to using XHTML anyway?" you might ask.

Well, in essence, none. The Randsco pages (just like many other XHTML pages) just use XHTML 1.0 and don't include any other mark-up languages. XHTML shines when you begin to use other XML tools, such as XSLT, XForms, XPath (XML Path Language) or XQuery. Pages that include MathML, SMIL or SVG couldn't be done in HTML, but we don't use any of those tools or language tags. Our page is simply a cleanly written HTML page and (at present) there's really no reason to treat it any differently.
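As a small taste of what those XML tools offer, here is a sketch that runs an XPath-style query over a well-formed XHTML page with Python's standard library. ElementTree supports only a subset of XPath, and the sample page and the link_targets function are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; "x" is just a local prefix.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>MIME talk</title></head>
  <body>
    <p>See <a href="/one">one</a> and <a href="/two">two</a>.</p>
  </body>
</html>"""


def link_targets(xhtml: str) -> list:
    """Return the href of every link, found with an XPath-style query."""
    root = ET.fromstring(xhtml)  # raises ParseError if the page is not well-formed
    return [a.get("href") for a in root.findall(".//x:a[@href]", XHTML_NS)]


if __name__ == "__main__":
    print(link_targets(PAGE))
```

This only works because the page is well-formed XML. If a site never runs queries or transformations like this over its own pages, the strictness buys it very little.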



I'm not a hundred percent certain of my conclusions. I guess they're really dependent upon how HTML 5 comes along, how widely adopted XHTML might be and whether we have need for all the XML goodies that XHTML might offer that won't be in any HTML draft. Here's where I'm at now:

  • I can no longer point clients to XHTML as "the way forward" (however ... I still advocate clean, standardized, error-free code, separation of content & styling and avoiding proprietary tags).
  • The future of the web appears just as cloudy as it did a number of years ago.
  • There's presently no overriding reason Randsco should be served as an XML document (XHTML v1.1), or even as XHTML for that matter (although, because there's a validation benefit - to me - in XHTML Strict, we will continue using XHTML v1.0 Strict, for now).
  • While the role of XHTML on the web is unclear, the future of CSS looks bright and it remains a good standard.

UPDATE July 2009 - The W3C has announced that XHTML 2 will be dropped when the Working Group's charter expires on 31-Dec-2009. "The W3C XHTML 2 draft specification showed up in 2002 and was last edited in 2006. It was an ambitious re-working of the language for the web, looking to address inconsistencies, banish obvious presentational tags and implement clear and concise mark-up. Some of the concepts were excellent and some will be ported to the HTML 5 specification. Unfortunately, it was possibly too revolutionary, too strict and offered little backward compatibility." (source)


Resources and Further Reading

In no particular order:

XHTML versus HTML - Someone, like me, coming to terms with the realization that XHTML isn't necessarily the way forward.

HTML 5: We don't need no XHTML - A look at the revival of HTML, what it means and offers.

Beware of XHTML - An excellent article promoting the success of XHTML as a standard, by outlining what XHTML is and what it is not.

Interview with Håkon Wium Lie - Chief Technical Officer of Opera discusses the future of web standards.

HTML & XHTML Frequently Asked Questions - The W3C page that answers questions about how a two-day workshop led to a year's worth of XHTML and HTML questions.

Are You Serving XHTML with the wrong MIME type? - A Randsco article that shows how to use PHP to serve application/xhtml+xml to those browsers that understand it.

Char Sets & encoding in XHTML, HTML & CSS - W3C Internationalization Tutorial that discusses (among other things) the proper encoding for XHTML and HTML.

W3C FAQ about the Future of XHTML - W3C has compiled a list of questions to help the public and W3C members understand the future direction of XHTML, given that the XHTML 2 specification will be dropped.

Updated: 27-Jul-2009