MAMA: HTTP headers report
Introduction
Before we discuss the main part of MAMA's results on markup, CSS, and script, we need to start at the logical beginning of our story. Any page that a browser displays begins with an HTTP transaction. The requester asks a Web server for a document, and the server responds with important meta-information about the response before delivering the bulk of it: the document itself. This HTTP response contains many items worth examining. Some may consider this topic dull; it is easy to claim that most authors will never need to know the details of the HTTP transport layer in order to write their documents. However, it all starts with the HTTP headers. They are the basis for everything that follows, and looking at them is critical to a cohesive exploration of Web documents.
Read the first report in the series - MAMA: Markup validation report - if you haven't already done so. For more on MAMA, check out the MAMA homepage. For more on HTTP headers, check out the full HTTP header results.
MAMA's HTTP request headers
We must first look at MAMA's HTTP request headers before looking at how Web servers responded to MAMA. The initial HTTP request is the first link in the chain, and it shapes much of what follows. For this study, MAMA tried to experience the Web as closely as possible to how Opera would experience it. Of particular interest is MAMA's User-Agent string:
Header name | Header value |
---|---|
Accept | "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1" |
Accept-Charset | "windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1" |
Accept-Encoding | "identity, *;q=0" |
Accept-Language | "en" |
Connection | "Keep-Alive" |
User-Agent | "Opera/9.10 (Windows NT 5.1; U; en)" |
Note: The Accept-Language and Accept-Charset values chosen reflect the author's own particular language bias. This can have an effect on what is served in the HTTP response.
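For readers who want to reproduce a similar request, the following is a minimal sketch of how the header values listed above could be attached to a request using Python's standard library. It is not MAMA's own code; the names `MAMA_REQUEST_HEADERS` and `build_request` are hypothetical and exist only for illustration.

```python
import urllib.request

# Header values copied from the request header table above.
MAMA_REQUEST_HEADERS = {
    "Accept": (
        "text/html, application/xml;q=0.9, application/xhtml+xml, "
        "image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
    ),
    "Accept-Charset": "windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1",
    "Accept-Encoding": "identity, *;q=0",
    "Accept-Language": "en",
    "Connection": "Keep-Alive",
    "User-Agent": "Opera/9.10 (Windows NT 5.1; U; en)",
}


def build_request(url):
    """Return an unsent Request object carrying the header values above.

    Hypothetical helper: nothing is sent over the network here.
    """
    return urllib.request.Request(url, headers=MAMA_REQUEST_HEADERS)


if __name__ == "__main__":
    request = build_request("http://example.com/")
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

Note that `urllib.request` normalizes the capitalization of stored header names (for example `User-agent`), which is harmless because HTTP field names are case-insensitive. When the request is actually sent with `urlopen`, the library may also replace the `Connection` value with its own.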
The HTTP response: general anatomy
The HTTP response block consists of a "status line" followed by any number of newline-separated header field/value pairs. HTTP/1.1 (RFC 2616) provides considerable detail about the anatomy of an HTTP response, but in this overview we will only examine some of its most popular field/value components. HTTP header name/value pairs generally follow this format:
[Field Name (case-insensitive)]: [Field Value]
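As an illustration of this anatomy, here is a minimal sketch that splits a raw response head into its status line and its field/value pairs. The function name `parse_response_head` is hypothetical, and the sketch deliberately ignores rarer cases such as folded continuation lines.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (status_line, fields).

    `fields` is a list of (name, value) tuples. Names are lower-cased
    because HTTP field names are case-insensitive.
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    status_line = lines[0]
    fields = []
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header block
        name, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines that are not "name: value" pairs
        fields.append((name.strip().lower(), value.strip()))
    return status_line, fields


if __name__ == "__main__":
    sample = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "SERVER: Apache\r\n"
        "\r\n"
    )
    status, fields = parse_response_head(sample)
    print(status)
    for name, value in fields:
        print(name, "=>", value)
```

Only the first colon on a line separates the name from the value, so values that themselves contain colons (dates, for example) are kept whole.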
HTTP header response fields
First we will take a look at the most popular HTTP header fields in MAMA's URL set. Three fields were found to be nearly universal: Content-Type (the MIME type of the document), Date (the date of the HTTP transaction) and Server (information about the Web server sending the HTTP response).
Header name | Quantity | Percentage | Header name | Quantity | Percentage |
---|---|---|---|---|---|
Content-Type | 3,508,919 | 100.0% | Last-Modified | 2,129,100 | 60.7% |
Date | 3,504,603 | 99.9% | Content-Range | 2,068,687 | 59.0% |
Server | 3,465,179 | 98.8% | ETag | 1,954,567 | 55.7% |
Connection | 2,851,099 | 81.3% | Accept-Ranges | 1,870,170 | 53.3% |
Content-Length | 2,534,203 | 72.2% | X-Powered-By | 1,348,347 | 38.4% |
Although the other HTTP header fields are not as ubiquitous as the top three, the top 10 show that several of them are also used very frequently. The average HTTP header block contains 9 fields.
Number of fields | Quantity | Percentage |
---|---|---|
9 | 1,601,744 | 45.6% |
10 | 377,903 | 10.8% |
7 | 373,266 | 10.6% |
6 | 327,701 | 9.3% |
8 | 307,215 | 8.8% |
5 | 196,896 | 5.6% |
11 | 160,630 | 4.6% |
12 | 82,386 | 2.4% |
13 | 33,923 | 1.0% |
4 | 15,652 | 0.5% |
The average length of an HTTP header block in MAMA's URL set is 381 characters. The smallest HTTP header block encountered was 66 characters, while the longest was 9,725 characters.
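Statistics of this kind can be gathered with very little code. The sketch below tallies the number of fields per header block and the block lengths for a list of raw header strings. It is only an assumption about how such figures could be computed, not MAMA's actual method; in particular, whether MAMA counted the status line or line endings is not stated, and the names `count_fields` and `summarize` are hypothetical.

```python
from collections import Counter


def count_fields(header_block):
    """Count the "name: value" lines that follow the status line."""
    lines = header_block.replace("\r\n", "\n").split("\n")
    return sum(1 for line in lines[1:] if ":" in line)


def summarize(header_blocks):
    """Return the field-count distribution and length statistics."""
    field_counts = Counter(count_fields(block) for block in header_blocks)
    lengths = [len(block) for block in header_blocks]
    return {
        "field_count_distribution": field_counts.most_common(),
        "mean_length": sum(lengths) / len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


if __name__ == "__main__":
    blocks = [
        "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: Apache",
        "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: Microsoft-IIS/6.0",
        "HTTP/1.1 404 Not Found\r\nContent-Type: text/html",
    ]
    print(summarize(blocks))
```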
Content-Type HTTP header "charset" value
The purpose of the Content-Type header is to describe a document's MIME type, so it is naturally ubiquitous. MAMA's URL selection process eliminated MIME types that it could not analyze, so the URLs that MAMA surveyed were almost all "text/html", although other values were also encountered.
One optional portion of the Content-Type header field deserves some extra scrutiny: the "charset" parameter, found in 688,819 of MAMA's URLs (19.6%):
Ex: Content-Type: text/html; charset=utf-8
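To show how the parameter can be pulled out of such a header value, here is a minimal sketch that relies on the MIME header parsing in Python's standard `email` package. The helper name `charset_from_content_type` is hypothetical, and this is not how MAMA itself extracted the value.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lower-cased charset parameter, or None if it is absent."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=utf-8"))
    print(charset_from_content_type("text/html"))
```

Lower-casing the result is convenient for tallying, since servers send the same encoding name with varying capitalization.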
The charset parameter describes the encoding used for the document. Two values dominate and are roughly equal here: "utf-8" and "iso-8859-1", with the former having a slight edge.
Encoding | Quantity | Percentage overall | Percentage using HTTP header charset |
---|---|---|---|
utf-8 | 318,351 | 9.1% | 46.2% |
iso-8859-1 | 286,967 | 8.2% | 41.7% |
windows-1251 | 32,154 | 0.9% | 4.7% |
iso-8859-2 | 9,033 | 0.3% | 1.3% |
iso-8859-15 | 4,476 | 0.1% | 0.7% |
windows-1252 | 3,424 | 0.1% | 0.5% |
us-ascii | 3,228 | 0.1% | 0.5% |
shift_jis | 2,869 | 0.1% | 0.4% |
[none/empty] | 2,701 | 0.1% | 0.4% |
euc-jp | 2,589 | 0.1% | 0.4% |
The Server HTTP header
This is an interesting value to examine. The main story here is the long-standing competition between the two most popular Web servers, Apache and Microsoft's IIS. MAMA's numbers show the distribution heavily skewed in Apache's favor (67.7% versus 25.9% for IIS). This balance does not match Netcraft's numbers for the same time period: Netcraft's data also shows Apache as the dominant Web server, but by a smaller margin than in MAMA's data (Apache: 50.76%, IIS: 35.84%).
The URL set used in Rene Saarsoo's previous study of "Coding practices of Web pages" shares a great deal of overlap with MAMA, and its HTTP header Server field usage ratio is very similar to MAMA's. The main conclusion we can draw from this is that the Dmoz URL set itself (which is MAMA's main URL source in this study) has a slight bias towards Apache over the Web-at-large that Netcraft appears to cover.
Web server | Number of domains | Percentage of domains |
---|---|---|
Apache | 2,011,088 | 67.7% |
Microsoft IIS | 769,375 | 25.9% |
Other | 189,275 | 6.4% |
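A grouping like the one in this table can be approximated by looking at the start of the Server value. The sketch below is an assumption about how such bucketing might be done; the report does not describe MAMA's actual rules, and the function name `classify_server` and the sample values are hypothetical.

```python
def classify_server(server_value):
    """Bucket a Server header value into Apache, Microsoft IIS, or Other.

    Assumption: a simple prefix match is good enough. It would also count
    values such as "Apache-Coyote/1.1" as Apache, which may or may not
    match the grouping used in the report.
    """
    product = server_value.strip().lower()
    if product.startswith("apache"):
        return "Apache"
    if product.startswith("microsoft-iis"):
        return "Microsoft IIS"
    return "Other"


if __name__ == "__main__":
    for value in ("Apache/2.2.3 (Unix)", "Microsoft-IIS/6.0", "nginx"):
        print(value, "=>", classify_server(value))
```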
The Content-Length HTTP header
This header field, used in roughly 72% of MAMA's URLs, tells the requester exactly how long the incoming document is. Does it tell the truth? Luckily, we have a way to measure this. MAMA assessed the length of every document it analyzed by using Perl's length function against the received content body. Comparing these two values, MAMA finds that the Content-Length values are rarely incorrect. When they are, they are off by only 1. Out of 2,533,890 URLs using the Content-Length HTTP header, all matched MAMA's measured length value, except:
Comparison condition | Quantity |
---|---|
Content-Length exceeded MAMA's count by 1 | 663 |
MAMA's length exceeded Content-Length by 1 | 2 |
Content-Length value could not be compared to MAMA's length | 58 |
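The comparison summarized in the table can be sketched as follows. MAMA used Perl's length function; this Python version is only an analogous illustration that measures the body in bytes, and the function name `compare_content_length` and its result strings are hypothetical.

```python
def compare_content_length(header_value, body):
    """Compare a declared Content-Length with the measured body length.

    `body` is the received content as bytes. Returns a short description
    of the outcome.
    """
    try:
        declared = int(header_value.strip())
    except ValueError:
        return "not comparable"  # the header value is not a plain integer
    difference = declared - len(body)
    if difference == 0:
        return "match"
    if difference > 0:
        return f"Content-Length exceeds measured length by {difference}"
    return f"measured length exceeds Content-Length by {-difference}"


if __name__ == "__main__":
    print(compare_content_length("5", b"hello"))
    print(compare_content_length("6", b"hello"))
    print(compare_content_length("abc", b"hello"))
```

Measuring bytes rather than decoded characters matters here, because Content-Length counts the bytes on the wire and a multi-byte encoding would otherwise produce false mismatches.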
Summary
It is very likely that the typical Web page author will never have seen the HTTP headers with which their pages are served. Many of the common HTTP headers are the moral equivalent of boring paperwork that has little effect on the author. However, authors should not make the mistake of dismissing them entirely, because they lay the groundwork for the document that follows. Some of the popular header fields (like those influencing a document's caching and encoding behavior) directly influence the end-user's experience of the pages that authors have spent so much effort creating.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.