
Kimler Adventure Pages: Journal Entries


Are You Serving XHTML with the Wrong MIME Type?

November 10th, 2005  · stk

20-Feb-2006 UPDATE: Code modifications strengthen the PHP script.

We've had our b2evolution blog for just under a year. One of the reasons we picked b2evolution was that it was written to XHTML standards. Like many b2evolution "skins", ours has a W3C validation button, which boasts of our XHTML compliance. Great!

So I'm stretching my technological legs, validating to the XHTML 1.0 (Strict) standard. Then, six months later, I realize there's a newer version (XHTML 1.1). What? To keep up with the technological Joneses, I begin validating to THAT. However, a few months later, I find out that THIS WHOLE TIME (despite my learning XHTML, careful coding and validation frustration) our pages are being served as PLAIN OLD HTML!

Plain old HTML?

Pace Picante sauce aside, this gives me pause. What am I doing wrong? I'm validating against the XHTML 1.1 standard (it says so when I hit the W3C validation button). I've got the XHTML 1.1 DocType statement. I even have a Meta tag that SAYS the "Content-Type" is "application/xhtml+xml". I should be all set, right? Wrong.

In a totally twisted plot that involves the Pope, GWB & a horde of marauding Vikings, I discover that I'm not alone. MOST PEOPLE using XHTML serve their pages the same way ... as plain old HTML.

To find out why and to find out how to serve your pages as true XHTML ... read on.

Enlightenment from an Odd Corner

Some time ago, I wrote an article about valid imageMaps in XHTML 1.0 (Strict). When I switched to XHTML 1.1, I noticed that the old code yielded validation errors, so I investigated and wrote a new article. A reader pointed out that I wasn't using the correct "Content-Type" for XHTML 1.1 ("application/xhtml+xml") and that adding it would FIX the problem, in Firefox at least. That's when I fixed the Meta tag, but it didn't take care of the problem. Another reader, posting yesterday, indicated that he had run into the same problem and that the solution was to use PHP to send out the correct "Content-Type" in the HTTP header, BEFORE the page itself is sent!

BEFORE? That didn't make sense to me. Why should one have to go to PROGRAMMING effort just to serve a page correctly? Isn't that a LOT of work?

So I spent last night hacking into the topic ... here's what I learned ...

Serve Your XHTML with the Right MIME Type

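MIME, originally an extension to email, is used by HTTP as a way to declare the type of content being served. Each resource has a specific type, made up of two parts: the main type and a subtype, separated by a slash (e.g., "text/html" or "application/xhtml+xml"). The server sends this type in the HTTP "Content-Type" header, ahead of the document itself, and it TELLS browsers how to treat the document as they receive it. Browsers go by that header, not by a Meta tag inside the page, when deciding whether to parse a page as HTML or as XML.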
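To make the two-part structure concrete, here's a small PHP sketch that splits a Content-Type value into its main type, subtype and parameters. The function name split_mime_type is hypothetical (it's not a PHP built-in), and it doesn't handle quoted parameter values, so treat it as an illustration rather than a full parser.

```php
<?php
// Hypothetical helper: split a MIME type such as
// "application/xhtml+xml; charset=iso-8859-1" into main type, subtype
// and parameters. Simplified sketch: quoted parameter values are not handled.
function split_mime_type(string $value): array
{
    $parts = array_map('trim', explode(';', $value));
    $fullType = strtolower(array_shift($parts));
    // main type and subtype are separated by a single slash
    [$type, $subtype] = array_pad(explode('/', $fullType, 2), 2, '');

    $params = [];
    foreach ($parts as $param) {
        if ($param === '' || strpos($param, '=') === false) {
            continue; // skip empty or malformed parameters
        }
        [$name, $val] = explode('=', $param, 2);
        $params[strtolower(trim($name))] = trim($val);
    }
    return ['type' => $type, 'subtype' => $subtype, 'params' => $params];
}

$t = split_mime_type("application/xhtml+xml; charset=iso-8859-1");
echo $t['type'], " / ", $t['subtype'], " / ", $t['params']['charset'], "\n";
// prints "application / xhtml+xml / iso-8859-1"
```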

"Which MIME type should XHTML be served with?" the WaSP asked the W3C.

The W3C recommends that ALL XHTML documents be served as "application/xhtml+xml". Served that way, XHTML is handled by the browser's XML parser, which demands well-formed code and doesn't have to cope with the myriad of malformed markup (AKA "tag soup") that "text/html" parsers must tolerate (one claimed benefit is that XHTML renders faster).

A problem arises, however, because some browsers (most notably IE) don't understand the "application/xhtml+xml" MIME type. So, as a fall-back, XHTML 1.0 can be served as "text/html" (plain old HTML). (Note that this applies ONLY to XHTML 1.0; newer versions, such as XHTML 1.1, shouldn't be served as "text/html".)

IF You're validating to XHTML 1.0 - BOTTOM LINE

Chances are that you're NOT serving your pages in the recommended fashion, and you know what? That's OKAY. Just keep writing to XHTML standards and validating, knowing that you CAN SWITCH your MIME type when the majority of browsers get around to supporting "application/xhtml+xml". (It'll probably be even easier then: once ALL browsers understand the MIME type, you likely won't need PHP to check first whether the browser can handle it.)

BTW ... Don't look for "application/xhtml+xml" to be supported in IE7 either, according to a Sept 15, 2005 article in the IEBlog by Chris Wilson. It's going to be a while before IE supports it. (I'm glad they've decided to wait till they get it "right", but shake a leg, boys! Even with legacy issues ... why is IE so FAR behind the times? Don't get me started ...)

Using PHP to Serve It Up

My commenter presented some PHP code that serves the correct MIME type depending on the Accept header sent by the visitor's browser (which PHP exposes as the HTTP_ACCEPT server variable). This works, but it doesn't take into account the special case of the W3C validator, nor does it make the more elegant use of browser "Q" values that I found in Neil Crosby's (undated) article, "Serving XHTML with the Correct MIME Type using PHP".

The HTTP_ACCEPT variable is the browser's list of the MIME types it accepts. IF "application/xhtml+xml" is on the list, then the browser can render pages served with that MIME type. But accepting a type doesn't mean the browser prefers it. That's what the "Q" value indicates: it's a quality factor between 0 and 1 (with up to three decimal places) that ranks how much the browser prefers each type, and a type listed without one is assumed to have q=1. Neil's script uses PHP to send a browser its preferred MIME type, rather than just one it "accepts".
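Here's a rough PHP sketch of reading those Q values out of an Accept header. The function name parse_accept is my own (hypothetical); it doesn't expand wildcards such as "*/*", and, unlike the regexes in Neil's script, it tolerates spaces around "q=". It's meant to illustrate how Q values work, not to replace the script.

```php
<?php
// Hypothetical helper: parse an HTTP Accept header into media type => q-value.
// Sketch only: media-type parameters other than q are ignored and
// wildcards such as "*/*" or "text/*" are stored but never expanded.
function parse_accept(string $accept): array
{
    $prefs = [];
    foreach (explode(',', $accept) as $item) {
        $params = array_map('trim', explode(';', $item));
        $type = strtolower(array_shift($params));
        if ($type === '') {
            continue; // empty list entry
        }
        $q = 1.0; // a type listed without a q-value is fully preferred
        foreach ($params as $param) {
            if (preg_match('/^q\s*=\s*([01](?:\.\d{0,3})?)$/i', $param, $m)) {
                $q = min(1.0, (float) $m[1]); // q-values never exceed 1
            }
        }
        $prefs[$type] = $q;
    }
    return $prefs;
}

$prefs = parse_accept("text/html;q=0.9, application/xhtml+xml, */*;q=0.1");
$xhtmlQ = $prefs['application/xhtml+xml'] ?? 0.0;
$htmlQ  = $prefs['text/html'] ?? 0.0;
// XHTML only if it's explicitly accepted and liked at least as much as HTML
echo ($xhtmlQ > 0 && $xhtmlQ >= $htmlQ) ? "application/xhtml+xml\n" : "text/html\n";
// prints "application/xhtml+xml"
```

The `$xhtmlQ > 0` check matters: a browser like IE6 that lists neither type would otherwise compare 0 >= 0 and wrongly get XHTML.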

Browsers send an HTTP_ACCEPT list, but the W3C validation service doesn't. Neil's code handles this special case by checking the validator's user-agent string, which lets your pages be validated against the standard you're actually writing to, regardless of which browser you clicked the validation button from. Sweet.

And lastly, the script sends a MIME type of "text/html" to not-so-current browsers (*cough-cough* ... like Internet Explorer). GREAT!

<?php
$charset = "iso-8859-1";
$mime = "text/html";

# NOTE: To allow for q-values with one space (text/html; q=0.5),
# use the following regex:
# "/text\/html;[\ ]{0,1}q=([0-1]{0,1}\.\d{0,4})/i"
if (isset($_SERVER["HTTP_ACCEPT"]) && stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml")) {
   # does the browser give XHTML an explicit q-value?
   if (preg_match("/application\/xhtml\+xml;q=([0-1]{0,1}\.\d{0,4})/i", $_SERVER["HTTP_ACCEPT"], $matches)) {
      $xhtml_q = $matches[1];
      # serve XHTML only if it is preferred at least as much as HTML
      if (preg_match("/text\/html;q=([0-1]{0,1}\.\d{0,4})/i", $_SERVER["HTTP_ACCEPT"], $matches)) {
         $html_q = $matches[1];
         if ((float)$xhtml_q >= (float)$html_q)
            $mime = "application/xhtml+xml";
      }
   } else {
      # XHTML is accepted with no q-value (i.e., q=1)
      $mime = "application/xhtml+xml";
   }
}

# special check for the W3C_Validator
# (allows IE page validation as XHTMLv1.1)
# but still serves page for IE to "understand" (i.e. text/html)
if (isset($_SERVER["HTTP_USER_AGENT"]) && stristr($_SERVER["HTTP_USER_AGENT"], "W3C_Validator")) {
   $mime = "application/xhtml+xml";
}

# set the prolog_type according to the mime type which was determined
if ($mime == "application/xhtml+xml") {
   $prolog_type  = '<?xml version="1.0" encoding="'.$charset.'" ?>'."\n";
   $prolog_type .= '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" ';
   $prolog_type .= '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'."\n";
   $prolog_type .= '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'."\n\n";
} else {
   # XHTML 1.0 Strict, no XML declaration (keeps IE6 out of quirks mode)
   $prolog_type  = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ';
   $prolog_type .= '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'."\n";
   $prolog_type .= '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'."\n\n";
}

# output the mime type and prolog type to your page
header("Content-Type: $mime; charset=$charset");
header("Vary: Accept");
print $prolog_type;

?>
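If you'd like to check the script's decisions without a web server, one option is to pull the logic into a function that takes the Accept and User-Agent strings as arguments. This is a hypothetical refactoring (choose_mime is my name, not Neil's) that uses the same regexes and the same W3C_Validator check; the header() and print calls would stay in the page.

```php
<?php
// Hypothetical refactoring of the script's MIME decision into a pure function.
// Same q-value regexes and same W3C_Validator user-agent check as the script.
function choose_mime(?string $accept, ?string $userAgent): string
{
    $mime = "text/html";
    if ($accept !== null && stripos($accept, "application/xhtml+xml") !== false) {
        if (preg_match("/application\/xhtml\+xml;q=([0-1]{0,1}\.\d{0,4})/i", $accept, $matches)) {
            $xhtml_q = (float) $matches[1];
            // XHTML only if text/html has an explicit q-value that XHTML matches or beats
            if (preg_match("/text\/html;q=([0-1]{0,1}\.\d{0,4})/i", $accept, $matches)
                && $xhtml_q >= (float) $matches[1]) {
                $mime = "application/xhtml+xml";
            }
        } else {
            // XHTML accepted without a q-value, i.e. q=1
            $mime = "application/xhtml+xml";
        }
    }
    // the validator always gets XHTML, whatever it sends in Accept
    if ($userAgent !== null && stripos($userAgent, "W3C_Validator") !== false) {
        $mime = "application/xhtml+xml";
    }
    return $mime;
}

// In a page, you'd pass the real request values:
$mime = choose_mime($_SERVER["HTTP_ACCEPT"] ?? null, $_SERVER["HTTP_USER_AGENT"] ?? null);
```

Since the result depends on the User-Agent (the validator check) as well as on Accept, a "Vary: Accept, User-Agent" header would describe the response more accurately than "Vary: Accept" alone, at the cost of less effective caching by proxies.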

The instructions are simple. Copy and paste the above code into a file named "myMIME.php". Then replace whatever you have at the top of your page before the <head> tag (the XML declaration, DOCTYPE and opening <html> tag, all of which the script now prints) with a simple PHP include: [ <?php include "/path/to/myMIME.php" ?> ] (minus the brackets).

Now you'll be correctly serving "application/xhtml+xml" to browsers that prefer it, and "text/html" to those that don't have a clue. (Note: Because the script sends HTTP headers, it's imperative that "<?php" be the VERY FIRST characters of your XHTML file; otherwise you'll get an error message saying that "headers have already been sent".)

UPDATE (20-Feb-2006): I have modified Neil's original code to make it more robust. The following modifications were made and are reflected in the code above:

  1. The code checks to see if the HTTP_ACCEPT variable is set. This prevents PHP errors if the variable isn't set.
  2. A new regex evaluates the "Q value" contained in HTTP_ACCEPT, and the comparison uses float values rather than integer values. (Thanks to Darrin Yeager for that bit of code.)
  3. If the visitor's browser doesn't accept the "application/xhtml+xml" MIME type, an XHTML 1.0 (Strict) DOCTYPE is sent (rather than an XHTML 1.1), using a "text/html" MIME type. (This is a perfectly valid MIME type for XHTML 1.0. Note: Omitting the XML declaration keeps IE6 out of 'quirks' mode.)
  4. Because the Doctype is XHTML 1.0 (Strict) and not HTML 4.01, there is no longer a need to "fix" self-closing tags (i.e., "/>"), so the "fix_code" function in his original script is no longer necessary.
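To see why item 2 matters, compare two made-up Q values both ways. (The original comparison isn't reproduced here; this only shows what integer casting does to fractional Q values in PHP.)

```php
<?php
// Casting "0.5" or "0.9" to an integer truncates both to 0, so an integer
// comparison can no longer tell which type the browser prefers.
$xhtml_q = "0.5"; // browser accepts XHTML...
$html_q  = "0.9"; // ...but prefers HTML

var_dump((int) $xhtml_q >= (int) $html_q);     // bool(true): 0 >= 0, wrongly picks XHTML
var_dump((float) $xhtml_q >= (float) $html_q); // bool(false): 0.5 < 0.9, keeps HTML
```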
Views: 33231 views
19 Comments · GuestBook
Updated: 4-Mar-2006

Your Two Sense:


1. Vkaryl
02/18/07
I'm using your re-version of the original mimetype.php on a couple of sites to test. While the mime type sent is working fine, in Firefox 2.0.0.1 and Opera 9.10 I'm having a problem with google ads - the ad blocks don't load if the mime type sent is xhtml 1.1 (or even 1.0, which I also tried). I tried the following without effect:

1. Direct input of the ad js into the index.php file (they're normally run through the php include function)
2. Changing the extension on the ad include files from .html to .php
3. Changing the charset from utf-8 (which is what I set up all my sites to use) back to iso-8859-1 which is what the script originally used
4. Changing the entire index.php file to have no includes at all even though the footer and menu includes work just fine

Site I'm testing here: http://bytehaven.com/. Note that the version served to IE displays the js just fine. The version served to Opera at least displays the entire page background color, unlike in FF where the page is "missing" halfway down.

If you have any thoughts about this, I'd appreciate knowing them! I really would love to use this function, but if the js is problematic.... (Oh, I did make sure I'm not using document.write anywhere - saw that in a comment on the workingwithme article....)

Thanks for any help you can give.
2. Vkaryl
02/18/07
Forgot to say, I also tried putting the ads into separate .js files and calling them using the src element. Also didn't work in FF or Opera, though it was still fine in IE.... (tested IE both 6 and 7, btw).
3. stk
02/18/07
Vkaryl,

Yes ... don't use Google ads! :p

1) They're ugly.
2) They're only advertising competitors for your hosting plans ... is that what you really want?
3) IMO, ppl don't make much $$ from them.
4) They require javascript.
5) Did I say that they detract from a page?

(I don't like ads on pages). ;)

Having said that, I can add that I don't know a lot about the js Google ads uses.

I did notice that your "show_ads.js" file is yielding errors "b has no properties" (line 3) ... six times. (Have you tried the JS on a FFox page that's served as a normal text/html doctype? - i.e., verified that it's a MIME-type-only issue?)

I'd contact Google ad support, or search to see if anyone has run into this problem and what a workaround might be.

(I assume you've tried extracting the Google ad JS from the page and verified that it all works fine in FFox & Opera?)

Sorry I can't be of more assistance.

Good luck. (Please let me know what you end up doing to solve this issue, as other people might well have this problem.) Contrary to my sagacious opinions, it seems that MANY people use/like Google ad-sense ads! :p

Cheers,

-stk



4. stk
02/18/07
Vkaryl,

Not sure why the body bg color isn't showing thru, but it appears to be the #maincontainer that determines how far the bg color goes (same color) ... delete the "height" line and boom ... full page.

I'd have to take a longer look at the XHTML and CSS.

Hope this helps.

-stk
5. Vkaryl
02/18/07
Well, the site's up for a redesign - which was why I was planning to use this. The goog ads are there because I have bunches of clients who use them, and I have to design sites around them - so I have to know how they work, and what tweaks need doing, and so on. The whacked display was a byproduct, and since I'm going to redo the site I just won't worry about it. (For one thing, the OTHER thing that doesn't work with the mimetype.php setup is the aardvark FF extension - which makes troubleshooting a PITA since I can't "see" what's going on on-site....)

The goog ads were all working perfectly (yes, I know - I really don't like ads on sites either, but this is "business" - as in clients) before I set up the new "play nice with the browsers" mimetype server. And since IE (which is served text/html etc. instead of the xml mimetype) displays the goog ads fine, I have to assume it's the prolog.

Which means that while I can redo my own sites to use this (since I normally do NOT use goog, or any other, ads) I won't be able to migrate clients until I can figure this out.

I did a search earlier on google in the adsense webmasters info pages and didn't find anything. I'll check further, and see if I can get them to answer an email....

I'll let you know what if anything I find out.

[Hmmm. I need to find some other inline js to try, don't I?]
6. stk
02/18/07
Vkaryl,

Understood. FWIW, none of our JS failed during the migration from text/html to application/xhtml+xml. (Which is why I asked whether it failed as text/html in FF.)

If I turn anything up, I'll let you know. Likewise, will be curious to see what's up.

Good luck.

-stk
7. Vkaryl
02/18/07
Thanks for that info, Scott. Point of info: is your js inline or external? If external files, is it called in the head only, or in the file body, or the footer?

Don't know if that info will help but you never know!
8. stk
02/18/07
Vkaryl,

Hmm ... appears to be a pretty common problem. Maybe this page will help?

Cheers,
-stk

We have JS that is called in the head and body, some from .js files, some from external references and some inline. (But, from what I've read, it's clearly a google adsense problem.) ;)
9. Vkaryl
02/18/07
I just had the bright idea of commenting out the first $prolog line (appx. line 31 in your code block up there) - the line which actually outputs the prolog header in FF and Opera.

Guess what? The ads display just fine. So it's definitely the prolog itself as output in FF and Opera. Now whether google will have a workaround or even care about this, I don't know. Time will tell on that.

Very interesting. I hadn't thought to do that earlier.... and that fixes the problem with aardvark as well. Hmmm. Wonder what it is about the prolog....

Well, thanks for your time. If I get any help from google I'll post back.
10. Vkaryl
02/18/07
Thanks much - I will read over the Keystones article for sure. I didn't do a plain goog search earlier, since I figured if it was a problem I could find it on google's AdSense site - obviously wrong!

Take care, thanks again.
11. Vkaryl
02/18/07
And as a final fillip to the whole thing:

"CSS properties applied to the body element don't apply to the whole viewport in XHTML. This is most notable when a background colour or image is applied. In HTML, a background applied to the body element will cover the entire page. In XHTML, you need to style the html element as well. There is a demonstration of this behaviour in CSS body Element Test at Juicy Studio."

Sheesh. I obviously need to do some serious study here.
12. NewGuy
01/10/09
Does anyone know why IE7 does not format xml on the page with a css? If the page is .xml it works, but if .php it does not. Thanks.
13. Andreas Elf
05/02/09
I just want to say that it works like a charm.
14. stk
05/02/09
Andreas - Thanks for commenting and letting us know it's still working great!
15. Mbenjamin
07/05/09
stk - great article and very helpful. thanks. just a few questions if you don't mind:

(1) with the arrival of FF3.5 & IE8 (and 3 years gone by), is the code in the blue box still valid?

(2) i will need to change the charset to utf-8. i don't foresee any issues with that move. are you aware of any problems?

(3) once i implement the code, will i need to change my page extensions from .html to .php?

(4) some of my pages will need to carry google ads. what was the final remedy that was reached in your exchanges with vkaryl?

any feedback you can provide would be greatly appreciated. thanks.
16. stk
07/05/09
Mbenjamin - Answers I hope will help:

1) The code should be valid, as it relies on what browsers accept and not specific browser versions. (One would have to test it, though, to be certain).

2) I don't use UTF-8, having used and relied on ISO-8859-1 since we started this blog. No ideas.

3) No or Yes, depending. We've told our server to parse HTML extensions as PHP, so we can use either. (If you're on an Apache server, just add the following to your .htaccess file.)


# Parse php commands in HTML files
# AddHandler application/x-httpd-php .htm .html
AddHandler application/x-httpd-php5 .htm .html


4) I believe I used this solution (or a modification thereof).

Sorry I can't be more definitive. We've actually backed off of the XHTMLv1.1 and application/xhtml+xml MIME type a while ago, for some of the reasons outlined in this article.

Cheers.

17. Mbenjamin
07/06/09
stk - thank you, again VERY MUCH. your articles and answers to my questions have been very helpful. mbenjamin
18. Pianom4n
08/01/09
Instead of stristr, you should use stripos. stripos is faster because it only looks for one instance of the search term, while stristr looks for all of them.
19. DaveW
06/26/12
WOW - you solved a completely different problem for me. I was updating my website to be "mobile friendly", and using "content-type: application/xhtml+xml" to trigger the strict evaluation of everything so I could make the changes and get everything right. All was working well, until I started my IE tests, and suddenly, IE was trying to download a PHP file rather than PHP interpreted output from that file. Searched high and low for an answer and found this - which pointed me in the right direction and provided the solution. Thanks again..!!!