MAMA: HEAD structure
Index:
- Introduction
- HEAD and its sub-elements
- TITLE element
- META element
- SCRIPT element
- LINK element
- STYLE element
- BASE element
 
Introduction
The elements that make up the Head section of a page contain meta information about the document: data used to access crucial external resources, and other important information. Their importance shows in their usage, which is higher than that of most other elements.
NOTE:
The elements listed in this section usually 
   exist within the HEAD element, but the statistics 
   presented for these elements/attributes are for occurrences anywhere in 
   the document.
HEAD and its sub-elements
The HEAD element is the most popular of all markup elements, and its top 5 sub-elements are also in the top 20 of ALL markup elements. Not all of the sub-elements are so popular, though. The obsolete ISINDEX element was found in only 63 URLs. It seems well and truly dead, having long since been replaced by the much richer interactive Forms we know and love today.
| ELEMENT | Frequency | ELEMENT | Frequency |
|---|---|---|---|
| HEAD | 3,464,519 | LINK | 2,018,510 |
| TITLE | 3,459,207 | STYLE | 1,313,454 |
| META | 3,276,347 | BASE | 266,149 |
| SCRIPT | 2,528,823 | ISINDEX | 63 |
The Profile attribute of the HEAD element
This attribute was detected 19,030 times in MAMA's URLs. It specifies the URL of a metadata profile. Roughly 90% of the values point to the XFN metadata system, which is used by the Rel attribute of hyperlinks to indicate relationships between authors and other people. Other types occurring with some frequency were Dublin Core and RSS syndication metadata.
| Profile attribute value | Frequency | 
|---|---|
| http://gmpg.org/xfn/11/ | 15,770 | 
| http://gmpg.org/xfn/1/ | 1,149 | 
| http://dublincore.org/documents/dcq-html/ | 395 | 
| http://purl.org/metadata/dublin_core/ | 313 | 
TITLE element
With 3,459,207 occurrences, this is the second most popular element overall. When I was planning MAMA, storing the contents of the TITLE element did not seem all that compelling, and I did not know how long the contents might be. So I settled for clamping the stored value to 255 characters; any content after the first 255 characters was not stored. This makes some statistics about TITLE length meaningless, but others can still be useful: 128,874 URLs had empty TITLE elements, and 23,322 URLs had the maximum TITLE length (255 characters).
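To illustrate why the clamp limits what can be said about TITLE lengths, here is a minimal Python sketch. Only the 255-character limit comes from the text above; the function names `clamp_title` and `title_length_stats` and the sample titles are hypothetical and are not MAMA's actual code.

```python
MAX_TITLE_LENGTH = 255  # storage limit described in the text


def clamp_title(title: str, limit: int = MAX_TITLE_LENGTH) -> str:
    """Keep only the first `limit` characters of a TITLE value."""
    return title[:limit]


def title_length_stats(titles, limit: int = MAX_TITLE_LENGTH) -> dict:
    """Count stored titles that are empty or that reach the clamp limit."""
    stored = [clamp_title(t, limit) for t in titles]
    return {
        "empty": sum(1 for t in stored if len(t) == 0),
        "at_limit": sum(1 for t in stored if len(t) == limit),
    }


# Hypothetical sample: an empty title, a short one, one of exactly
# 255 characters, and one of 400 characters.
sample = ["", "Home", "x" * 255, "y" * 400]
stats = title_length_stats(sample)
print(stats)
```

In this sketch the 255-character and 400-character titles both land in the "at_limit" bucket, so a count at the maximum length really means "255 characters or more".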
META element
This element ranks 6th overall in usage. Because this level of popularity was expected, MAMA stored additional details about some attribute values that seemed likely to be interesting. Values for the Http-equiv and Name attributes were saved, as well as specific values for the Content attribute for the Content-Encoding and Generator usages.
| ELEMENT/Attribute | Frequency | 
|---|---|
| META | 3,276,347 |
| Content | 3,273,610 |
| Http-equiv | 2,826,859 |
| Name | 2,710,638 |
| Scheme | 34,807 |
META Name attribute values
The numbers here show that "keywords" and 
   "description" are about equally popular. Perhaps 
   they are two great tastes that go great together on the Web? Yes. Over 90% of 
   the time, these two types of META are indeed used together.
| META Name value | frequency | META Name value | frequency | |
|---|---|---|---|---|
| keywords | 2,170,259 | copyright | 419,554 | |
| description | 2,098,529 | progid | 280,923 | |
| generator | 942,051 | distribution | 235,943 | |
| robots | 931,622 | rating | 227,732 | |
| author | 815,415 | language | 206,990 | |
| revisit-after | 471,573 | 
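One way to measure how often "keywords" and "description" appear together is sketched below in Python, using the standard library's `html.parser`. The text does not say how MAMA computed the figure, so the definition used here (documents with both names, divided by documents with at least one) is an assumption, and the helper names and sample documents are hypothetical.

```python
from html.parser import HTMLParser


class MetaNameCollector(HTMLParser):
    """Collect the lowercased META Name values found in one document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name.strip().lower())


def meta_names(html: str) -> set:
    parser = MetaNameCollector()
    parser.feed(html)
    parser.close()
    return parser.names


def co_occurrence(docs, a="keywords", b="description") -> float:
    """Share of documents using either name that use both (assumed definition)."""
    either = both = 0
    for html in docs:
        names = meta_names(html)
        if a in names or b in names:
            either += 1
            if a in names and b in names:
                both += 1
    return both / either if either else 0.0


# Hypothetical sample documents.
docs = [
    '<head><meta name="keywords" content="a, b">'
    '<meta name="Description" content="x"></head>',
    '<head><META NAME="keywords" CONTENT="a"></head>',
    "<head><title>No meta here</title></head>",
]
print(co_occurrence(docs))
```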
META Http-equiv attribute values
The big story here is that most documents declare their MIME type using a 
   META statement (over 75% of all URLs analyzed). 
   All other usages are dwarfed by the "Content-type" value.
| META Http-equiv value | frequency | META Http-equiv value | frequency | |
|---|---|---|---|---|
| content-type | 2,679,505 | cache-control | 71,245 | |
| content-language | 456,078 | keywords | 56,833 | |
| pragma | 167,801 | reply-to | 50,307 | |
| refresh | 163,413 | content-script-type | 46,839 | |
| expires | 163,350 | description | 43,694 | |
| content-style-type | 114,828 | pics-label | 35,359 | |
| imagetoolbar | 86,502 | page-enter | 34,575 | 
META Http-equiv="content-type" charset values
Of the 2,679,505 URLs that used the META Http-equiv="content-type" value, 2,363,865 (88.22%) also specified a "charset" parameter to provide encoding details. The top value was the western encoding "iso-8859-1", which was more than 4 times as common as any other detected value. Encoding names can be a bit cryptic, so the following guide to languages and encoding values may help when reading the short summary table below (Fig 4-4) and the full frequency table for the "charset" value:
- Cyrillic (includes Russian): windows-1251, koi8-r, iso-8859-5
- Japanese: shift_jis, euc_jp, x-sjis, iso-2022-jp, shift-jis
- Chinese (Traditional): big5, x-x-big5
- Chinese (Simplified): gb2312, gbk
- Korean: euc-kr, ks_c_5601-1987
 
| META charset value | frequency | META charset value | frequency | |
|---|---|---|---|---|
| iso-8859-1 | 1,424,697 | windows-1251 | 45,674 | |
| windows-1252 | 330,123 | windows-1250 | 31,470 | |
| utf-8 | 249,084 | gb2312 | 25,378 | |
| shift_jis | 90,517 | big5 | 12,282 | |
| iso-8859-2 | 59,850 | windows-1254 | 10,318 | 
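The "charset" parameter lives inside the Content attribute value, for example `text/html; charset=iso-8859-1`. The following Python sketch shows one way to pull it out and map it to the language groups listed in the guide above. The function names and the lowercasing and quote stripping are assumptions for illustration, not a description of how MAMA parsed the value.

```python
from typing import Optional

# Encoding-to-language mapping copied from the guide above.
ENCODING_LANGUAGES = {
    "windows-1251": "Cyrillic", "koi8-r": "Cyrillic", "iso-8859-5": "Cyrillic",
    "shift_jis": "Japanese", "euc_jp": "Japanese", "x-sjis": "Japanese",
    "iso-2022-jp": "Japanese", "shift-jis": "Japanese",
    "big5": "Traditional Chinese", "x-x-big5": "Traditional Chinese",
    "gb2312": "Simplified Chinese", "gbk": "Simplified Chinese",
    "euc-kr": "Korean", "ks_c_5601-1987": "Korean",
}


def charset_from_content(content: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content value, if any."""
    for part in content.split(";")[1:]:  # skip the MIME type itself
        key, sep, value = part.partition("=")
        if sep and key.strip().lower() == "charset":
            return value.strip().strip("\"'").lower() or None
    return None


def language_for(content: str) -> Optional[str]:
    """Look up the language group for a Content value, if the guide lists it."""
    charset = charset_from_content(content)
    return ENCODING_LANGUAGES.get(charset) if charset else None


print(charset_from_content("text/html; charset=ISO-8859-1"))
print(language_for("text/html; charset=Shift_JIS"))
```

Encodings that the guide does not list, such as "iso-8859-1" or "utf-8", simply return no language group in this sketch.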
META Name="generator": Editors and Content Management Systems (CMS) used
MAMA also looked at these values in the section on markup validation. The most 
   noticeable nugget here is that the many incarnations of Microsoft FrontPage are 
   the definite leaders for this value. FrontPage occurs more than 8 times as 
   often as any other META 
   Name="generator" value. 
   The following two tables are summary totals for the individual values found in 
   the full per-URL frequency table.
| Editor substring | frequency | Editor substring | frequency | |
|---|---|---|---|---|
| Microsoft FrontPage | 347,095 | Microsoft Visual Studio | 22,936 | |
| Adobe GoLive | 41,865 | Adobe PageMill | 15,148 | |
| Microsoft MSHTML | 40,030 | Claris Home Page | 6,259 | |
| IBM WebSphere | 32,218 | Adobe Dreamweaver | 5,954 | |
| NetObjects Fusion | 26,355 | Apple iWeb | 2,504 | |
| Microsoft Word | 24,892 | 
| CMS substring | Frequency | 
|---|---|
| Joomla | 34,852 | 
| Typo3 | 18,067 | 
| WordPress | 16,594 | 
| Blogger | 9,907 | 
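Summary totals like those in the two tables above can be built by grouping individual generator values under a shared substring. The Python sketch below shows that idea. The case-insensitive matching rule is an assumption and the sample generator strings are made up for illustration; only the substrings themselves come from the tables.

```python
from collections import Counter

# A few of the substrings from the tables above.
SUBSTRINGS = ["Microsoft FrontPage", "Joomla", "WordPress"]


def tally_by_substring(values, substrings):
    """Count generator values containing each substring (case-insensitive)."""
    counts = Counter()
    for value in values:
        lowered = value.lower()
        for substring in substrings:
            if substring.lower() in lowered:
                counts[substring] += 1
    return counts


# Hypothetical generator values, one per URL.
generators = [
    "Microsoft FrontPage 4.0",
    "Microsoft FrontPage 6.0",
    "WordPress 2.3",
    "Joomla! 1.5 - Open Source Content Management",
    "Notepad",
]
totals = tally_by_substring(generators, SUBSTRINGS)
for name, count in totals.most_common():
    print(name, count)
```

Grouping this way folds the many version-specific values into one total per product, which is why the many incarnations of FrontPage show up as a single row.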
SCRIPT element
We will be looking at scripting in much greater depth in a future section on Script, so for now we will just take a quick look at the element and its attributes.
Note: Many people seem to have problems spelling "language"; a number of misspellings of it occur fairly frequently.
| ELEMENT/Attribute | frequency | 
|---|---|
| SCRIPT | 2,528,823 |
| Language | 1,965,725 |
| Type | 1,769,337 |
| Src | 1,649,468 |
| Charset | 25,776 |
| Defer | 19,941 |
| For | 4,177 |
| Event | 3,973 |
| xml:space | 520 |
LINK element
We will look at the dominant use of the LINK element for CSS in much greater depth in a future section on CSS, so for now we will just take a quick look at the element and its attributes, plus a detailed view of the values for the Rel and Type attributes. Although the Href and Rel attributes are not required by HTML 4.0, authors appear to treat them that way: both are used in over 99% of all LINK usages. The frequency of the Rev attribute is higher than expected, but a random sampling reveals that only the "made" value is in wide use.
| ELEMENT/Attribute | frequency | 
|---|---|
| LINK | 2,018,510 |
| Href | 2,016,007 |
| Rel | 2,001,105 |
| Type | 1,777,982 |
| Media | 288,862 |
| Title | 234,355 |
| Rev | 43,977 |
| Charset | 4,306 |
| Hreflang | 2,335 |
| Target | 1,585 |
LINK Rel attribute values
This attribute was tracked in MAMA by breaking the value down into space-separated 
   components. External style sheet statements are present in 90% of LINK 
   instances, over 20% use the shortcut icon syntax, and ~8.5% of LINK 
   elements specify an alternate form (most likely RSS or similar based on the 
   Type attribute values in the next section).
| LINK Rel value | frequency | LINK Rel value | frequency | |
|---|---|---|---|---|
| stylesheet | 1,836,140 | search | 22,296 | |
| icon | 451,386 | edit-time-data | 22,217 | |
| shortcut | 434,440 | schema.dc | 19,193 | |
| alternate | 170,342 | pingback | 18,270 | |
| file-list | 46,894 | home | 12,145 | |
| edituri | 35,383 | author | 11,894 | 
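Breaking Rel values into space-separated components can be sketched in a few lines of Python. The lowercasing and the choice to count each component once per LINK are assumptions, and the sample values are hypothetical.

```python
from collections import Counter


def rel_components(rel_value: str) -> list:
    """Split a Rel value on whitespace and lowercase each component."""
    return rel_value.lower().split()


def tally_rel(values) -> Counter:
    """Count each component once per LINK element."""
    counts = Counter()
    for value in values:
        counts.update(set(rel_components(value)))
    return counts


# Hypothetical Rel values, one per LINK element.
rels = ["stylesheet", "shortcut icon", "SHORTCUT ICON",
        "alternate stylesheet", "icon"]
counts = tally_rel(rels)
print(counts["icon"], counts["shortcut"], counts["stylesheet"])
```

Counting components rather than whole values is why "icon" and "shortcut" appear as separate rows in the table, with "icon" slightly ahead because it also occurs on its own.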
LINK Type attribute values
The Type attribute is not required, but most authors seem to include it for stylesheet and RSS uses. The Type usage ratio appears to fall off considerably for shortcut icons, though: there are ~450,000 uses of the Rel="icon" syntax, but "image/*" MIME types appear only about 1/3 as often.
| LINK Type value | frequency | LINK Type value | frequency | |
|---|---|---|---|---|
| text/css | 1,720,750 | image/png | 9,770 | |
| application/rss+xml | 147,654 | application/opensearchdescription+xml | 8,846 | |
| image/x-icon | 103,102 | image/gif | 6,859 | |
| application/rsd+xml | 35,387 | text/html | 4,280 | |
| application/atom+xml | 34,866 | image/vnd.microsoft.icon | 3,169 | |
| image/ico | 20,149 | application/wlwmanifest+xml | 2,615 | |
| text/xml | 14,293 | application/xml | 1,094 | |
| application/rdf+xml | 10,351 | images/x-icon | 1,006 | 
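The "about 1/3" figure can be checked against the numbers in the tables. The Python sketch below sums the "image/*" Type values listed above and divides by the Rel "icon" count. It assumes that the listed image types all belong to icon links and that unlisted image types are negligible, so treat the result as an approximation.

```python
# Frequencies copied from the LINK Type table above.
LINK_TYPE_COUNTS = {
    "image/x-icon": 103_102,
    "image/ico": 20_149,
    "image/png": 9_770,
    "image/gif": 6_859,
    "image/vnd.microsoft.icon": 3_169,
    "images/x-icon": 1_006,  # misspelled type; not matched by "image/" below
}
REL_ICON_COUNT = 451_386  # "icon" row of the LINK Rel table

image_total = sum(
    count for mime, count in LINK_TYPE_COUNTS.items()
    if mime.startswith("image/")
)
ratio = image_total / REL_ICON_COUNT
print(f"{image_total:,} of {REL_ICON_COUNT:,} = {ratio:.1%}")
# prints "143,049 of 451,386 = 31.7%"
```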
STYLE element
As was already mentioned, we will be looking at style sheets in much greater depth in a future section on CSS, so for now we will just take a quick look at the element and its attributes.
| ELEMENT/Attribute | frequency | 
|---|---|
| STYLE | 1,313,454 |
| Type | 1,028,840 |
| Media | 116,336 |
| Title | 7,236 |
| xml:space | 140 |
BASE element
This element's original purpose was to declare a common root URL for relative URLs in a document, so it is a bit surprising to find that the original usage has been usurped in popularity by frame target control.
| ELEMENT/Attribute | frequency | 
|---|---|
| BASE | 266,149 |
| Target | 159,479 |
| Href | 109,404 |
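As a reminder of what the original Href usage does, the Python sketch below resolves relative URLs against a base URL with the standard library's `urllib.parse.urljoin`. The base URL and the relative paths are hypothetical examples.

```python
from urllib.parse import urljoin

# Hypothetical value of a document's <base href="..."> declaration.
BASE_HREF = "http://example.com/docs/"


def resolve(relative_url: str, base: str = BASE_HREF) -> str:
    """Resolve a URL found in the document against the declared base."""
    return urljoin(base, relative_url)


print(resolve("images/logo.png"))   # http://example.com/docs/images/logo.png
print(resolve("../style.css"))      # http://example.com/style.css
print(resolve("http://other.example/page.html"))  # absolute URLs are unchanged
```

The Target attribute, by contrast, has no effect on URL resolution; it only sets the default frame or window in which the document's links open.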
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.