MAMA: HEAD structure

By Brian Wilson

Index:

  1. Introduction
  2. HEAD and its sub-elements
  3. TITLE element
  4. META element
  5. SCRIPT element
  6. LINK element
  7. STYLE element
  8. BASE element

Introduction

The elements that constitute the Head section of a page contain meta information about the document, providing data used to access crucial external resources or other important information. The importance of these elements is highlighted by their dominant usage above other elements.

NOTE:
The elements listed in this section usually exist within the HEAD element, but the statistics presented for these elements/attributes are for occurrences anywhere in the document.

HEAD and its sub-elements

The HEAD element is the most popular of any markup element, and its top 5 sub-elements are also in the top 20 of ALL markup elements. Not all of the sub-elements are so popular. The obsolete ISINDEX element was only found in 63 URLs. It seems well and truly dead, having long since been replaced by the much richer interactive Forms we know and love today.

Fig 2-1: Elements in the HEAD block
ELEMENT   Frequency    ELEMENT   Frequency
HEAD      3,464,519    LINK      2,018,510
TITLE     3,459,207    STYLE     1,313,454
META      3,276,347    BASE        266,149
SCRIPT    2,528,823    ISINDEX          63

The Profile attribute of the HEAD element

This attribute was detected 19,030 times in MAMA's URLs. It specifies the URL of a metadata profile. Roughly 90% of the values point to the XFN metadata system, which is used by the Rel attribute of hyperlinks to indicate relationships between authors and other people. Other types occurring with some frequency were Dublin Core and RSS syndication metadata.

Fig 2-2: Popular values for the HEAD Profile attribute
(See also the complete frequency table.)
Profile attribute value                      Frequency
http://gmpg.org/xfn/11/                         15,770
http://gmpg.org/xfn/1/                           1,149
http://dublincore.org/documents/dcq-html/          395
http://purl.org/metadata/dublin_core/              313

TITLE element

With 3,459,207 occurrences, this is the second most popular element overall. When I was planning MAMA, storing the contents of the TITLE element did not seem all that compelling, and I did not know how long those contents might be. So I settled for clamping the stored value to 255 characters; any content after the first 255 characters was not stored. This makes some statistics about TITLE length meaningless, but others are still useful: 128,874 URLs had empty TITLE elements, and 23,322 URLs had the maximum stored TITLE length (255 characters).
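
The effect of the 255-character clamp can be sketched in a few lines of Python. This is only an illustration, not MAMA's actual storage code: the function names are hypothetical, and treating "empty" as a zero-length string with no whitespace trimming is an assumption. A stored length of exactly 255 can only be read as "255 characters or more", which is why only the two counts mentioned above stay meaningful.

```python
MAX_TITLE_CHARS = 255  # the storage limit described in the text


def clamp_title(raw_title: str, limit: int = MAX_TITLE_CHARS) -> str:
    """Keep only the first `limit` characters of a TITLE's text."""
    return raw_title[:limit]


def title_length_stats(stored_titles: list[str], limit: int = MAX_TITLE_CHARS) -> dict[str, int]:
    """Count the two figures that remain meaningful after clamping."""
    return {
        # Assumption: "empty" means a zero-length stored value.
        "empty": sum(1 for t in stored_titles if t == ""),
        # A stored length equal to the limit means "at least `limit` characters".
        "at_limit": sum(1 for t in stored_titles if len(t) == limit),
    }


# Made-up sample titles: one empty, one short, one too long, one exactly at the limit.
titles = ["", "Home", "x" * 300, "y" * 255]
stored = [clamp_title(t) for t in titles]
print(title_length_stats(stored))  # {'empty': 1, 'at_limit': 2}
```

Note that the 300-character and the 255-character titles become indistinguishable once stored, so an average or maximum TITLE length computed from the stored values would be biased low.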

META element

This element ranks 6th overall in usage. Because this level of popularity was expected, MAMA stored additional details about some of the attribute values that were expected to be interesting. Values for the HTTP-Equiv and Name attributes were saved, as well as specific values for the Content attribute for the Content-Encoding and Generator usages.

Fig 4-1: META element/attribute frequency
ELEMENT/Attribute    Frequency
META                 3,276,347
    Content          3,273,610
    Http-equiv       2,826,859
    Name             2,710,638
    Scheme              34,807

META Name attribute values

The numbers here show that "keywords" and "description" are about equally popular. Perhaps they are two great tastes that go great together on the Web? Yes. Over 90% of the time, these two types of META are indeed used together.

Fig 4-2: Top META Name attribute values
(See also the complete frequency table.)
META Name value    Frequency    META Name value    Frequency
keywords           2,170,259    copyright            419,554
description        2,098,529    progid               280,923
generator            942,051    distribution         235,943
robots               931,622    rating               227,732
author               815,415    language             206,990
revisit-after        471,573
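
One way to measure how often "keywords" and "description" appear together is sketched below using Python's standard html.parser module. It is a sketch under stated assumptions rather than MAMA's method: the class and function names are hypothetical, Name values are compared case-insensitively, and the ratio is taken over pages that use at least one of the two names (the exact denominator behind the "over 90%" figure is not stated here).

```python
from html.parser import HTMLParser


class MetaNameCollector(HTMLParser):
    """Collect the lower-cased Name values of every META element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.names: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag == "meta":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.add(value.strip().lower())


def meta_names(html: str) -> set[str]:
    """Return the set of META Name values found anywhere in the markup."""
    collector = MetaNameCollector()
    collector.feed(html)
    collector.close()
    return collector.names


def co_usage(pages: list[str], a: str = "keywords", b: str = "description") -> float:
    """Share of pages using either name that use both (assumed denominator)."""
    either = both = 0
    for html in pages:
        names = meta_names(html)
        if a in names or b in names:
            either += 1
            if a in names and b in names:
                both += 1
    return both / either if either else 0.0


# Made-up sample pages: one uses both names, one only "keywords", one neither.
sample_pages = [
    '<meta name="Keywords" content="a"><meta name="description" content="b">',
    '<META NAME="keywords" CONTENT="a">',
    '<meta name="robots" content="noindex">',
]
print(co_usage(sample_pages))  # 0.5
```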

META Http-equiv attribute values

The big story here is that most documents declare their MIME type using a META statement (over 75% of all URLs analyzed). All other usages are dwarfed by the "Content-type" value.

Fig 4-3: Top META Http-equiv attribute values
(See also the complete frequency table.)
META Http-equiv value    Frequency    META Http-equiv value    Frequency
content-type             2,679,505    cache-control               71,245
content-language           456,078    keywords                    56,833
pragma                     167,801    reply-to                    50,307
refresh                    163,413    content-script-type         46,839
expires                    163,350    description                 43,694
content-style-type         114,828    pics-label                  35,359
imagetoolbar                86,502    page-enter                  34,575

META Http-equiv="content-type" charset values

Of the 2,679,505 URLs that used the META Http-equiv="content-type" value, 2,363,865 (88.22%) also specified a "charset" parameter to provide encoding details. The top value was the Western encoding "iso-8859-1", which was more than 4 times as likely as any other detected value. Encoding names can be a bit cryptic, so the following guide to languages and their encoding values may help when reading the short summary table below (Fig 4-4) and the full frequency table for the "charset" value:

  • Cyrillic (includes Russian): windows-1251, koi8-r, iso-8859-5
  • Japanese: shift_jis, euc_jp, x-sjis, iso-2022-jp, shift-jis
  • Chinese: Traditional: big5, x-x-big5; Simplified: gb2312, gbk
  • Korean: euc-kr, ks_c_5601-1987

Fig 4-4: Top META content-type/charset component values
META charset value    Frequency    META charset value    Frequency
iso-8859-1            1,424,697    windows-1251             45,674
windows-1252            330,123    windows-1250             31,470
utf-8                   249,084    gb2312                   25,378
shift_jis                90,517    big5                     12,282
iso-8859-2               59,850    windows-1254             10,318
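
The "charset" component lives inside the Content attribute value, after the MIME type, as in "text/html; charset=iso-8859-1". A minimal Python sketch of pulling it out follows. The function name is hypothetical, and the lower-casing and quote stripping are assumptions about normalization, not a description of what MAMA did.

```python
from typing import Optional


def charset_from_content(content: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content value, if any."""
    # Parameters follow the MIME type and are separated by semicolons.
    for param in content.split(";")[1:]:
        key, sep, value = param.partition("=")
        if sep and key.strip().lower() == "charset":
            cleaned = value.strip().strip("\"'").lower()
            return cleaned or None
    return None


print(charset_from_content("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content("text/html"))                      # None
```

Normalizing case matters for a frequency table like Fig 4-4, since "ISO-8859-1" and "iso-8859-1" would otherwise be counted as different values.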

META Name="generator": Editors and Content Management Systems (CMS) used

MAMA also looked at these values in the section on markup validation. The most noticeable nugget here is that the many incarnations of Microsoft FrontPage are the definite leaders for this value. FrontPage occurs more than 8 times as often as any other META Name="generator" value. The following two tables are summary totals for the individual values found in the full per-URL frequency table.

Fig 4-5: Editor usage tracked via META Name="generator"
Editor substring       Frequency    Editor substring           Frequency
Microsoft FrontPage      347,095    Microsoft Visual Studio       22,936
Adobe GoLive              41,865    Adobe PageMill                15,148
Microsoft MSHTML          40,030    Claris Home Page               6,259
IBM WebSphere             32,218    Adobe Dreamweaver              5,954
NetObjects Fusion         26,355    Apple iWeb                     2,504
Microsoft Word            24,892

Fig 4-6: CMS usage tracked via META Name="generator"
CMS substring    Frequency
Joomla              34,852
Typo3               18,067
WordPress           16,594
Blogger              9,907
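
Summary totals like those in Figs 4-5 and 4-6 can be produced by matching each full generator value against a list of product substrings. The Python sketch below shows the idea. The substring list is a subset taken from the two tables, while the function names, the case-insensitive matching and the sample generator strings are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

# A subset of the product substrings listed in Figs 4-5 and 4-6.
SUBSTRINGS = (
    "Microsoft FrontPage",
    "Adobe GoLive",
    "Joomla",
    "Typo3",
    "WordPress",
    "Blogger",
)


def classify_generator(value: str) -> Optional[str]:
    """Return the first product substring found in a generator value."""
    lowered = value.lower()
    for product in SUBSTRINGS:
        if product.lower() in lowered:
            return product
    return None


def tally_generators(values: Iterable[str]) -> Counter:
    """Sum per-product totals over many individual generator values."""
    counts: Counter = Counter()
    for value in values:
        product = classify_generator(value)
        if product is not None:
            counts[product] += 1
    return counts


# Made-up generator strings, for illustration only.
samples = ["Microsoft FrontPage 4.0", "microsoft frontpage 5.0", "WordPress 2.5", "Notepad"]
print(tally_generators(samples))
```

This is how many version-specific values collapse into a single row per product.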

SCRIPT element

We will be looking at scripting in much greater depth in a future section on Script, so for now we will just take a quick look at the element and its attributes.

Note: Many people seem to have problems spelling "language"; a number of misspellings of it occur fairly frequently.

Fig 5-1: SCRIPT element/attribute frequency
ELEMENT/Attribute    Frequency
SCRIPT               2,528,823
    Language         1,965,725
    Type             1,769,337
    Src              1,649,468
    Charset             25,776
    Defer               19,941
    For                  4,177
    Event                3,973
    xml:space              520

LINK element

We will be looking at the dominant use of the LINK element, CSS, in much greater depth in a future section on CSS, so for now we will just take a quick look at the element and its attributes, plus a detailed view of the values for the Rel and Type attributes. Although the Href and Rel attributes are not required by HTML 4.0, authors appear to treat them that way: both are used in over 99% of all LINK usages. The frequency of the Rev attribute is higher than expected, but a random sampling reveals that only the "made" value is in wide use.

Fig 6-1: LINK element/attribute frequency
ELEMENT/Attribute    Frequency
LINK                 2,018,510
    Href             2,016,007
    Rel              2,001,105
    Type             1,777,982
    Media              288,862
    Title              234,355
    Rev                 43,977
    Charset              4,306
    Hreflang             2,335
    Target               1,585

LINK Rel attribute values

This attribute was tracked in MAMA by breaking the value down into space-separated components. External style sheet statements are present in 90% of LINK instances, over 20% use the shortcut icon syntax, and ~8.5% of LINK elements specify an alternate form (most likely RSS or similar based on the Type attribute values in the next section).

Fig 6-2: Top LINK Rel attribute values
(See also the complete frequency table.)
LINK Rel value    Frequency    LINK Rel value    Frequency
stylesheet        1,836,140    search               22,296
icon                451,386    edit-time-data       22,217
shortcut            434,440    schema.dc            19,193
alternate           170,342    pingback             18,270
file-list            46,894    home                 12,145
edituri              35,383    author               11,894
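
Breaking a Rel value into space-separated components can be sketched as follows in Python. The function names are hypothetical, and both the lower-casing and the choice to count a component at most once per LINK are assumptions rather than documented MAMA behavior.

```python
from collections import Counter
from typing import Iterable


def rel_components(rel_value: str) -> list[str]:
    """Split a Rel attribute value into lower-cased, whitespace-separated parts."""
    return rel_value.lower().split()


def count_rel_components(rel_values: Iterable[str]) -> Counter:
    """Count how many Rel values contain each component."""
    counts: Counter = Counter()
    for value in rel_values:
        # A set ensures a repeated component counts once per LINK.
        counts.update(set(rel_components(value)))
    return counts


# Made-up Rel values, for illustration only.
rels = ["stylesheet", "Shortcut Icon", "alternate stylesheet", "icon"]
print(count_rel_components(rels))
```

Under this scheme Rel="shortcut icon" contributes to both the "shortcut" and the "icon" rows, which would explain why those two counts in Fig 6-2 are so close to each other.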

LINK Type attribute values

The Type attribute is not required, but most authors seem to supply it for style sheet and RSS uses. The Type usage ratio appears to fall off considerably for shortcut icons, though: there are ~450,000 uses of the Rel="icon" syntax, but "image/*" MIME types appear only about 1/3 as often.

Fig 6-3: Top LINK Type attribute values
(See also the complete frequency table.)
LINK Type value          Frequency    LINK Type value                          Frequency
text/css                 1,720,750    image/png                                    9,770
application/rss+xml        147,654    application/opensearchdescription+xml        8,846
image/x-icon               103,102    image/gif                                    6,859
application/rsd+xml         35,387    text/html                                    4,280
application/atom+xml        34,866    image/vnd.microsoft.icon                     3,169
image/ico                   20,149    application/wlwmanifest+xml                  2,615
text/xml                    14,293    application/xml                              1,094
application/rdf+xml         10,351    images/x-icon                                1,006
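
The "1/3" estimate can be roughly reproduced from the figures in Figs 6-2 and 6-3. The Python sketch below sums only the image types that appear in the summary table above (including the misspelled "images/x-icon") and assumes they all belong to icon links; rarer values from the full frequency table are not included, so the result is an approximation.

```python
# Image-type frequencies copied from Fig 6-3 (top values only).
image_type_counts = {
    "image/x-icon": 103_102,
    "image/ico": 20_149,
    "image/png": 9_770,
    "image/gif": 6_859,
    "image/vnd.microsoft.icon": 3_169,
    "images/x-icon": 1_006,  # misspelled type, as found in the data
}
icon_rel_count = 451_386  # Rel component "icon" from Fig 6-2

image_total = sum(image_type_counts.values())
share = image_total / icon_rel_count
print(f"{image_total:,} image types for {icon_rel_count:,} icon links = {share:.1%}")
# 144,055 image types for 451,386 icon links = 31.9%
```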

STYLE element

As was already mentioned, we will be looking at style sheets in much greater depth in a future section on CSS, so for now we will just take a quick look at the element and its attributes.

Fig 7-1: STYLE element/attribute frequency
ELEMENT/Attribute    Frequency
STYLE                1,313,454
    Type             1,028,840
    Media              116,336
    Title                7,236
    xml:space              140

BASE element

This element's original purpose was to declare a common root URL for relative URLs in a document, so it is a bit surprising to find that the original usage has been overtaken in popularity by frame target control.

Fig 8-1: BASE element/attribute frequency
ELEMENT/Attribute    Frequency
BASE                   266,149
    Target             159,479
    Href               109,404

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
