MAMA: HTTP headers report

By Brian Wilson

Introduction

Before we discuss the main part of MAMA's results on markup, CSS, and script, we need to start at the logical beginning of our story. Any page that a browser displays begins with an HTTP transaction. The requester asks a Web server for a document, and the server responds with important meta-information about the response before delivering the bulk of it: the document itself. This HTTP response contains many items worth examining. Some may consider this topic dull; it is easy to claim that most authors will never need to know the details of the HTTP transport layer when writing their documents. However, it all starts with the HTTP headers. They are the basis for everything that follows, and we must look at them to produce a cohesive exploration of Web documents.

Read the first report in the series - MAMA: Markup validation report - if you haven't already done so. For more on MAMA, check out the MAMA homepage. For more on HTTP headers, check out the full HTTP header results.

MAMA's HTTP request headers

We must first look at MAMA's HTTP request headers before looking at how Web servers responded to MAMA. The initial HTTP request is the first link in the chain, and it shapes much of what follows. For this study, MAMA tried to experience the Web as closely as possible to how Opera would experience it. Of particular interest is MAMA's User-Agent string:

MAMA's HTTP request headers
Header name Header value
Accept "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
Accept-Charset "windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1"
Accept-Encoding "identity, *;q=0"
Accept-Language "en"
Connection "Keep-Alive"
User-Agent "Opera/9.10 (Windows NT 5.1; U; en)"

Note: The Accept-Language and Accept-Charset values chosen reflect the author's own particular language bias. This can have an effect on what is served in the HTTP response.
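As an illustration only, the request headers in the table above could be attached to a request as in the following sketch. It uses Python's standard urllib purely as a stand-in; the tooling MAMA actually used to send requests is not described here, and the function name and URL are placeholders.

```python
from urllib.request import Request

# Header values copied from the table above.
MAMA_REQUEST_HEADERS = {
    "Accept": (
        "text/html, application/xml;q=0.9, application/xhtml+xml, "
        "image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
    ),
    "Accept-Charset": "windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1",
    "Accept-Encoding": "identity, *;q=0",
    "Accept-Language": "en",
    "Connection": "Keep-Alive",
    "User-Agent": "Opera/9.10 (Windows NT 5.1; U; en)",
}


def build_request(url):
    """Return an unsent Request object carrying the header values above.

    Hypothetical helper for illustration. Nothing is sent over the network.
    """
    return Request(url, headers=MAMA_REQUEST_HEADERS)


# Placeholder URL, not taken from MAMA's URL set.
request = build_request("http://example.com/")

# urllib stores header names with only the first letter capitalized,
# so "User-Agent" is looked up as "User-agent".
print(request.get_header("User-agent"))
```

One caveat with this stand-in: when urllib actually sends a request it replaces the Connection header with "close", so the Keep-Alive value would not survive on the wire.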

The HTTP response: general anatomy

The HTTP response block consists of a "status line" followed by any number of newline-separated header field/value pairs. HTTP/1.1 (RFC 2616) provides considerable detail about the anatomy of an HTTP response, but in this overview we will only examine some of its most popular field/value components. HTTP header name/value pairs generally follow this format:

[Field Name (case-insensitive)]: [Field Value]
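The anatomy described above can be sketched in a few lines of Python. This is a minimal parser under simplifying assumptions: it ignores repeated fields and folded continuation lines, and the sample response is invented for illustration rather than taken from MAMA's data.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (status_line, headers).

    Field names are lowercased because they are case-insensitive.
    Simplification: a repeated field overwrites the earlier value.
    """
    lines = raw.replace("\r\n", "\n").strip("\n").split("\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "name: value" pairs
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


# Invented sample data for illustration.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 01 Jan 2008 00:00:00 GMT\r\n"
    "Server: Apache\r\n"
    "content-type: text/html; charset=utf-8\r\n"
    "\r\n"
)

status_line, headers = parse_response_head(sample)
print(status_line)
print(headers["content-type"])
```

Storing the names in lowercase means a lookup works regardless of how the server capitalized the field name.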

HTTP header response fields

First we will take a look at the most popular HTTP header fields in MAMA's URL set. Three fields were found to be nearly universal: Content-Type (the MIME type of the document), Date (the date of the HTTP transaction) and Server (information about the Web server sending the HTTP response).

Most popular HTTP header fields (top 10)
Header name Quantity Percentage
Content-Type 3,508,919 100.0%
Date 3,504,603 99.9%
Server 3,465,179 98.8%
Connection 2,851,099 81.3%
Content-Length 2,534,203 72.2%
Last-Modified 2,129,100 60.7%
Content-Range 2,068,687 59.0%
ETag 1,954,567 55.7%
Accept-Ranges 1,870,170 53.3%
X-Powered-By 1,348,347 38.4%

Although the other HTTP header fields are not as ubiquitous as the top 3, we can see from the top 10 that many of them are used very frequently. The average HTTP response carries 9 header fields, which is also by far the most common count (45.6% of URLs).

Popularity of HTTP header field quantity (top 10)
Number of fields Quantity Percentage
9 1,601,744 45.6%
10 377,903 10.8%
7 373,266 10.6%
6 327,701 9.3%
8 307,215 8.8%
5 196,896 5.6%
11 160,630 4.6%
12 82,386 2.4%
13 33,923 1.0%
4 15,652 0.5%

The average HTTP header block in MAMA's URL set is 381 characters long. The smallest HTTP header block encountered was 66 characters, while the longest was 9,725 characters.
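The kinds of tallies reported in this section (field popularity, fields per response, and header block length) can be sketched as follows. The function name and the two sample blocks are invented for illustration, and whether MAMA's character counts included the status line or the line terminators is not stated, so the length measure here is an assumption.

```python
from collections import Counter


def summarize_header_blocks(blocks):
    """Tally field popularity, fields per block, and block lengths.

    Each block is a raw response head: a status line followed by fields.
    Assumption: length is the character count of the whole raw block.
    """
    field_popularity = Counter()   # number of blocks containing each field
    fields_per_block = Counter()   # number of blocks having N fields
    lengths = []
    for block in blocks:
        field_lines = [line for line in block.splitlines()[1:] if ":" in line]
        names = {line.split(":", 1)[0].strip().lower() for line in field_lines}
        field_popularity.update(names)
        fields_per_block[len(field_lines)] += 1
        lengths.append(len(block))
    return field_popularity, fields_per_block, lengths


# Invented sample data for illustration.
block_a = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Date: Tue, 01 Jan 2008 00:00:00 GMT\r\n"
    "Server: Apache\r\n"
)
block_b = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Server: Microsoft-IIS/6.0\r\n"
)

popularity, per_block, lengths = summarize_header_blocks([block_a, block_b])
print(popularity.most_common())
print(per_block)
print(min(lengths), max(lengths))
```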

Content-Type HTTP header "charset" value

The Content-Type header, which describes a document's MIME type, is naturally ubiquitous. MAMA's URL selection process eliminated MIME types that it could not analyze, so the URLs that MAMA surveyed were almost all "text/html", although other values were also encountered.

One optional portion of the Content-Type header field deserves extra scrutiny: the "charset" parameter, found in 688,819 of MAMA's URLs (19.6%):

Ex: Content-Type: text/html; charset=utf-8

The charset parameter describes the character encoding used for the document. Two values are dominant and roughly equal here, "utf-8" and "iso-8859-1", with the former having a slight edge.

Popular Content-Type HTTP header charset values
Encoding Quantity Percentage (overall) Percentage (of URLs using HTTP header charset)
utf-8 318,351 9.1% 46.2%
iso-8859-1 286,967 8.2% 41.7%
windows-1251 32,154 0.9% 4.7%
iso-8859-2 9,033 0.3% 1.3%
iso-8859-15 4,476 0.1% 0.7%
windows-1252 3,424 0.1% 0.5%
us-ascii 3,228 0.1% 0.5%
shift_jis 2,869 0.1% 0.4%
[none/empty] 2,701 0.1% 0.4%
euc-jp 2,589 0.1% 0.4%
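To make the charset parameter concrete, here is a small sketch that pulls it out of a Content-Type value. The function name is hypothetical and the parsing is deliberately simple; MAMA's own extraction logic is not described in this report.

```python
def charset_from_content_type(value):
    """Return the lowercased charset parameter, or None if it is absent.

    An empty string is returned for "charset=" with no value, which would
    correspond to the "[none/empty]" row in the table above.
    """
    _media_type, _, params = value.partition(";")
    for param in params.split(";"):
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return val.strip().strip('"').lower()
    return None


print(charset_from_content_type("text/html; charset=utf-8"))
print(charset_from_content_type("text/html; Charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
```

Lowercasing the result lets values such as "UTF-8" and "utf-8" be counted together, since charset names are case-insensitive.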

The Server HTTP header

This is an interesting value to examine. The main story we see here illustrates a long-standing competition between the two most popular Web servers, Apache and Microsoft's IIS. MAMA's numbers show the distribution heavily skewed in Apache's favor (67.7% versus 25.9% for IIS). This balance does not match Netcraft's numbers for the same time period: Netcraft's data also shows Apache as the dominant Web server, but by a smaller margin than in MAMA's case (Apache: 50.76%, IIS: 35.84%).

The URL set used in Rene Saarsoo's previous study of "Coding practices of Web pages" shares a great deal of overlap with MAMA, and its HTTP header Server field usage ratio is very similar to MAMA's. The main conclusion we can draw from this is that the Dmoz URL set itself (which is MAMA's main URL source in this study) has a slight bias towards Apache over the Web-at-large that Netcraft appears to cover.

Popular Server HTTP header values
Web server Number of domains Percentage of domains
Apache 2,011,088 67.7%
Microsoft IIS 769,375 25.9%
Other 189,275 6.4%
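A grouping like the one in the table above could be approximated by matching substrings in the Server header value. This is a rough sketch, not MAMA's actual classification method: the substring rules are assumptions, the sample values are invented, and the table counts domains whereas this code simply tallies whatever values it is given.

```python
from collections import Counter


def classify_server(server_value):
    """Bucket a Server header value as Apache, Microsoft IIS, or Other.

    Assumption: a plain case-insensitive substring match is good enough.
    """
    value = (server_value or "").lower()
    if "apache" in value:
        return "Apache"
    if "iis" in value:
        return "Microsoft IIS"
    return "Other"


# Invented sample values for illustration.
samples = ["Apache/2.2.3 (Unix)", "Microsoft-IIS/6.0", "Apache", "nginx", None]
tally = Counter(classify_server(value) for value in samples)

for family, count in tally.most_common():
    print(f"{family}: {count} ({count / len(samples):.1%})")
```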

The Content-Length HTTP header

This header field, used in ~70% of MAMA's URLs, tells the requester exactly how long the incoming document is. Does it tell the truth? Luckily, we have a way to measure this. MAMA assessed the length of every document it analyzed by using Perl's length function against the received content body. Comparing these two values, MAMA finds that the Content-Length values are rarely incorrect, and when they are, they are off by only 1. Of the 2,533,890 URLs using the Content-Length HTTP header, all matched MAMA's measured length except the following:

Comparison of MAMA's actual document length to the stated Content-Length HTTP header
Comparison condition Quantity
Content-Length exceeded MAMA's count by 1 663
MAMA's length exceeded Content-Length by 1 2
Content-Length value could not be compared to MAMA's length 58
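The comparison summarized in the table above can be sketched as follows. MAMA used Perl's length function; this Python version is only an analogue, measuring the body in bytes (Content-Length declares a size in octets), and the function name and category labels are made up for illustration.

```python
def compare_content_length(header_value, body):
    """Compare a Content-Length header value with the measured body size.

    `body` is the received content as bytes, so len() counts octets.
    Returns a short label describing the outcome.
    """
    try:
        declared = int(header_value.strip())
    except (AttributeError, ValueError):
        # Missing or non-numeric header value: nothing to compare.
        return "not comparable"
    difference = declared - len(body)
    if difference == 0:
        return "match"
    if difference == 1:
        return "Content-Length exceeds measured length by 1"
    if difference == -1:
        return "measured length exceeds Content-Length by 1"
    return "other mismatch"


print(compare_content_length("5", b"hello"))
print(compare_content_length("6", b"hello"))
print(compare_content_length("n/a", b"hello"))
```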

Summary

It is very likely that the typical Web page author has never seen the HTTP headers with which their pages are served. Many of the common HTTP headers are the moral equivalent of boring paperwork that has little effect on the author. However, authors should not make the mistake of dismissing them entirely, because they lay the groundwork for the document that follows. Some of the popular header fields (like those influencing a document's caching and encoding behavior) directly influence the end-user's experience of the pages that authors have spent so much effort creating.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
