MAMA: HEAD structure
Index:
- Introduction
- HEAD and its sub-elements
- TITLE element
- META element
- SCRIPT element
- LINK element
- STYLE element
- BASE element
Introduction
The elements that constitute the Head section of a page contain meta information about the document, providing data used to access crucial external resources or other important information. The importance of these elements is highlighted by their dominant usage above other elements.
NOTE: The elements listed in this section usually exist within the HEAD element, but the statistics presented for these elements/attributes are for occurrences anywhere in the document.
HEAD and its sub-elements
The HEAD element is the most popular of all markup elements, and its top 5 sub-elements are also in the top 20 of ALL markup elements. Not all of the sub-elements are so popular. The obsolete ISINDEX element was found in only 63 URLs. It seems well and truly dead, having long since been replaced by the much richer interactive Forms we know and love today.
ELEMENT | Frequency | ELEMENT | Frequency |
---|---|---|---|
HEAD | 3,464,519 | LINK | 2,018,510 |
TITLE | 3,459,207 | STYLE | 1,313,454 |
META | 3,276,347 | BASE | 266,149 |
SCRIPT | 2,528,823 | ISINDEX | 63 |
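As a rough illustration of what "occurrences anywhere in the document" means for a per-URL count, here is a minimal Python sketch. It is not MAMA's actual code: the class `ElementPresence`, the helper `elements_in`, and the two sample documents are hypothetical, and it assumes each element is counted once per document no matter how often or where it appears.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements from the table above; HTMLParser reports tag names in lowercase.
HEAD_ELEMENTS = {"head", "title", "meta", "script", "link", "style", "base", "isindex"}


class ElementPresence(HTMLParser):
    """Record which of the tracked elements occur anywhere in one document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        if tag in HEAD_ELEMENTS:
            self.seen.add(tag)


def elements_in(html):
    """Return the set of tracked elements found in one document."""
    parser = ElementPresence()
    parser.feed(html)
    parser.close()
    return parser.seen


# Hypothetical sample documents standing in for crawled URLs.
documents = [
    "<html><head><title>A</title><meta name='x' content='y'></head>"
    "<body><script>var a = 1;</script><script>var b = 2;</script></body></html>",
    "<html><head><title>B</title></head><body><p>text</p></body></html>",
]

frequency = Counter()
for html in documents:
    frequency.update(elements_in(html))  # each element counts once per document
```

In this sample the two SCRIPT elements sit in the body of the first document, yet SCRIPT still gets a count of one for that document.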
The Profile attribute of the HEAD element
This attribute was detected 19,030 times in MAMA's URLs. It is used to specify the URL of a metadata profile. Roughly 90% of the values point to the XFN metadata system, which is used by the Rel attribute of hyperlinks to indicate relationships between authors and other people. Other types occurring with some frequency were Dublin Core and RSS syndication metadata.
Profile attribute value | Frequency |
---|---|
http://gmpg.org/xfn/11/ | 15,770 |
http://gmpg.org/xfn/1/ | 1,149 |
http://dublincore.org/documents/dcq-html/ | 395 |
http://purl.org/metadata/dublin_core/ | 313 |
TITLE element
With 3,459,207 occurrences, this is the second most popular element overall. When planning MAMA, storing the contents of the TITLE element did not seem all that compelling, and I did not know how long the contents might be either. So I settled for clamping the stored value to 255 characters; any content after the first 255 characters was not stored. This makes some statistics about TITLE length meaningless, but others can still be useful: 128,874 URLs had empty TITLE elements, and 23,322 URLs had the maximum TITLE length (255 characters).
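To see why the clamp spoils some length statistics while leaving others intact, here is a minimal Python sketch. The helper `store_title` and the sample titles are hypothetical; only the 255-character limit comes from the text above.

```python
MAX_TITLE_LENGTH = 255  # the clamp described above


def store_title(raw_title, limit=MAX_TITLE_LENGTH):
    """Return the part of a TITLE that would be kept (hypothetical helper)."""
    return raw_title[:limit]


# Hypothetical titles: empty, short, exactly at the limit, and far beyond it.
titles = ["", "Home", "x" * 255, "y" * 4000]
stored = [store_title(title) for title in titles]

empty_count = sum(1 for title in stored if title == "")
at_limit_count = sum(1 for title in stored if len(title) == MAX_TITLE_LENGTH)
```

After clamping, the 255-character title and the 4,000-character title both have a stored length of 255, so the "maximum length" count is best read as "255 characters or more". The empty-title count is unaffected.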
META element
This element ranks 6th overall in usage. Because this level of popularity was expected, MAMA stored additional details about some of the attribute values that were expected to be interesting. Values for the Http-equiv and Name attributes were saved, as well as specific values of the Content attribute for the character encoding (the "charset" parameter of "content-type") and Generator usages.
ELEMENT/Attribute | Frequency |
---|---|
META | 3,276,347 |
Content | 3,273,610 |
Http-equiv | 2,826,859 |
Name | 2,710,638 |
Scheme | 34,807 |
META Name attribute values
The numbers here show that "keywords" and "description" are about equally popular. Perhaps they are two great tastes that go great together on the Web? Yes: over 90% of the time, these two types of META are indeed used together.
META Name value | frequency | META Name value | frequency |
---|---|---|---|
keywords | 2,170,259 | copyright | 419,554 |
description | 2,098,529 | progid | 280,923 |
generator | 942,051 | distribution | 235,943 |
robots | 931,622 | rating | 227,732 |
author | 815,415 | language | 206,990 |
revisit-after | 471,573 | | |
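How often "keywords" and "description" appear together could be measured along the following lines. This is only a sketch in Python: the class `MetaNameCollector`, the helper `meta_names`, and the sample pages are hypothetical, and the exact definition behind the "over 90%" figure is not stated above. Here the share is taken over pages that use at least one of the two names.

```python
from html.parser import HTMLParser


class MetaNameCollector(HTMLParser):
    """Collect the lowercased Name values of all META elements in a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.add(value.strip().lower())


def meta_names(html):
    """Return the set of META Name values used in one page."""
    collector = MetaNameCollector()
    collector.feed(html)
    collector.close()
    return collector.names


# Hypothetical sample pages standing in for crawled URLs.
pages = [
    '<head><meta name="keywords" content="a,b">'
    '<meta name="description" content="d"></head>',
    '<head><META NAME="Keywords" CONTENT="x"></head>',
    '<head><meta name="description" content="only"/></head>',
    "<head><title>none</title></head>",
]

PAIR = {"keywords", "description"}
name_sets = [meta_names(page) for page in pages]
either = [names for names in name_sets if PAIR & names]  # at least one of the two
both = [names for names in either if PAIR <= names]      # both present
share_both = len(both) / len(either)
```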
META Http-equiv attribute values
The big story here is that most documents declare their MIME type using a META statement (over 75% of all URLs analyzed). All other usages are dwarfed by the "Content-type" value.
META Http-equiv value | frequency | META Http-equiv value | frequency |
---|---|---|---|
content-type | 2,679,505 | cache-control | 71,245 |
content-language | 456,078 | keywords | 56,833 |
pragma | 167,801 | reply-to | 50,307 |
refresh | 163,413 | content-script-type | 46,839 |
expires | 163,350 | description | 43,694 |
content-style-type | 114,828 | pics-label | 35,359 |
imagetoolbar | 86,502 | page-enter | 34,575 |
META Http-equiv="content-type" charset values
Of the 2,679,505 URLs that used the META Http-equiv="content-type" value, 2,363,865 (88.22%) also specified a "charset" parameter to provide encoding details. The top value was the Western encoding "iso-8859-1", which was more than 4 times as likely as any other detected value. Encodings can sometimes be a bit cryptic, so the following guide to languages and encoding values may be helpful when reading the short summary table below (Fig 4-4), as well as the full frequency table for the "charset" value:
- Cyrillic (includes Russian): windows-1251, koi8-r, iso-8859-5
- Japanese: shift_jis, euc_jp, x-sjis, iso-2022-jp, shift-jis
- Chinese: big5, x-x-big5 (Traditional Chinese); gb2312, gbk (Simplified Chinese)
- Korean: euc-kr, ks_c_5601-1987
META charset value | frequency | META charset value | frequency |
---|---|---|---|
iso-8859-1 | 1,424,697 | windows-1251 | 45,674 |
windows-1252 | 330,123 | windows-1250 | 31,470 |
utf-8 | 249,084 | gb2312 | 25,378 |
shift_jis | 90,517 | big5 | 12,282 |
iso-8859-2 | 59,850 | windows-1254 | 10,318 |
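For readers who want to pull the "charset" parameter out of a Content value themselves, here is a minimal Python sketch. The function `charset_from_content` is a hypothetical helper, not MAMA's parser, and it assumes the simple "type; charset=value" form. The `CHARSET_FAMILIES` mapping just restates the language guide above.

```python
# Language guide taken from the list above (not exhaustive).
CHARSET_FAMILIES = {
    "windows-1251": "Cyrillic", "koi8-r": "Cyrillic", "iso-8859-5": "Cyrillic",
    "shift_jis": "Japanese", "euc_jp": "Japanese", "x-sjis": "Japanese",
    "iso-2022-jp": "Japanese", "shift-jis": "Japanese",
    "big5": "Traditional Chinese", "x-x-big5": "Traditional Chinese",
    "gb2312": "Simplified Chinese", "gbk": "Simplified Chinese",
    "euc-kr": "Korean", "ks_c_5601-1987": "Korean",
}


def charset_from_content(content):
    """Return the lowercased charset parameter of a Content value, or None."""
    for part in content.split(";")[1:]:  # skip the MIME type itself
        key, sep, value = part.partition("=")
        if sep and key.strip().lower() == "charset":
            # Drop surrounding whitespace and quotes; empty values count as missing.
            return value.strip().strip("\"'").lower() or None
    return None


# Hypothetical Content value as it might appear in a META element.
charset = charset_from_content("text/html; charset=Shift_JIS")
family = CHARSET_FAMILIES.get(charset, "other")
```

Lowercasing before the lookup matters because authors write the same encoding in many letter cases.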
META Name="generator": Editors and Content Management Systems (CMS) used
MAMA also looked at these values in the section on markup validation. The most noticeable nugget here is that the many incarnations of Microsoft FrontPage are the definite leaders for this value. FrontPage occurs more than 8 times as often as any other META Name="generator" value. The following two tables are summary totals for the individual values found in the full per-URL frequency table.
Editor substring | frequency | Editor substring | frequency |
---|---|---|---|
Microsoft FrontPage | 347,095 | Microsoft Visual Studio | 22,936 |
Adobe GoLive | 41,865 | Adobe PageMill | 15,148 |
Microsoft MSHTML | 40,030 | Claris Home Page | 6,259 |
IBM WebSphere | 32,218 | Adobe Dreamweaver | 5,954 |
NetObjects Fusion | 26,355 | Apple iWeb | 2,504 |
Microsoft Word | 24,892 | | |
CMS substring | Frequency |
---|---|
Joomla | 34,852 |
Typo3 | 18,067 |
WordPress | 16,594 |
Blogger | 9,907 |
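Summary totals like these can be produced by matching each individual generator value against a list of substrings. The Python sketch below shows one way to do it, assuming a case-insensitive match. The function `summarize_generators` and the sample generator strings are hypothetical, and only a few of the substrings from the tables are included.

```python
from collections import Counter

# A few of the substrings from the two tables above.
SUBSTRINGS = [
    "Microsoft FrontPage", "Adobe GoLive",
    "Joomla", "Typo3", "WordPress", "Blogger",
]


def summarize_generators(values, substrings=SUBSTRINGS):
    """Count how many generator values contain each substring (case-insensitive)."""
    totals = Counter()
    for value in values:
        lowered = value.lower()
        for substring in substrings:
            if substring.lower() in lowered:
                totals[substring] += 1
    return totals


# Hypothetical generator values; the version numbers are illustrative only.
sample = [
    "Microsoft FrontPage 4.0",
    "Microsoft FrontPage 5.0",
    "WordPress 2.3.1",
    "Joomla! 1.5 - Open Source Content Management",
    "TYPO3 4.1 CMS",
    "Hand-written",
]

totals = summarize_generators(sample)
```

Grouping by substring is what lets the many versioned FrontPage values collapse into a single row.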
SCRIPT element
We will be looking at scripting in much greater depth in a future section on Script, so for now we will just take a quick look at the element and its attributes.
Note: Many people seem to have problems spelling "language": a number of misspellings occur fairly frequently.
ELEMENT/Attribute | frequency |
---|---|
SCRIPT | 2,528,823 |
Language | 1,965,725 |
Type | 1,769,337 |
Src | 1,649,468 |
Charset | 25,776 |
Defer | 19,941 |
For | 4,177 |
Event | 3,973 |
xml:space | 520 |
LINK element
We will be looking at the dominant use of the LINK element for CSS in much greater depth in a future section on CSS, so for now we'll just take a quick look at the element and its attributes, plus a detailed view of the values for the Rel and Type attributes. Although the Href and Rel attributes are not required by HTML 4.0, authors appear to treat them that way: they are both used in over 99% of all LINK usages. The frequency of the Rev attribute is higher than expected, but a random sampling reveals that only the "made" value is in wide use.
ELEMENT/Attribute | frequency |
---|---|
LINK | 2,018,510 |
Href | 2,016,007 |
Rel | 2,001,105 |
Type | 1,777,982 |
Media | 288,862 |
Title | 234,355 |
Rev | 43,977 |
Charset | 4,306 |
Hreflang | 2,335 |
Target | 1,585 |
LINK Rel attribute values
This attribute was tracked in MAMA by breaking the value down into space-separated components. External style sheet statements are present in 90% of LINK instances, over 20% use the shortcut icon syntax, and ~8.5% of LINK elements specify an alternate form (most likely RSS or similar, based on the Type attribute values in the next section).
LINK Rel value | frequency | LINK Rel value | frequency |
---|---|---|---|
stylesheet | 1,836,140 | search | 22,296 |
icon | 451,386 | edit-time-data | 22,217 |
shortcut | 434,440 | schema.dc | 19,193 |
alternate | 170,342 | pingback | 18,270 |
file-list | 46,894 | home | 12,145 |
edituri | 35,383 | author | 11,894 |
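Breaking a Rel value into space-separated components can be sketched in a few lines of Python. The helper `rel_components` and the sample values are hypothetical, and lowercasing plus counting each component once per value are assumptions rather than documented MAMA behavior.

```python
from collections import Counter


def rel_components(rel_value):
    """Split a Rel value into lowercase, space-separated components."""
    return rel_value.lower().split()


# Hypothetical Rel values as they might appear on LINK elements.
rel_values = ["stylesheet", "SHORTCUT ICON", "alternate stylesheet", "icon", "alternate"]

counts = Counter()
for value in rel_values:
    counts.update(set(rel_components(value)))  # count each component once per value
```

A split like this would explain why "icon" and "shortcut" show up as separate rows with similar totals: "shortcut icon" contributes to both, while a bare "icon" contributes only to the first.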
LINK Type attribute values
The Type attribute is not required, but most authors seem to use it for stylesheet and RSS uses. The Type usage ratio appears to fall off considerably when specifying a shortcut icon though: there are ~450,000 uses of the Rel="icon" syntax, but "image/*" MIME types appear only about 1/3 as often.
LINK Type value | frequency | LINK Type value | frequency |
---|---|---|---|
text/css | 1,720,750 | image/png | 9,770 |
application/rss+xml | 147,654 | application/opensearchdescription+xml | 8,846 |
image/x-icon | 103,102 | image/gif | 6,859 |
application/rsd+xml | 35,387 | text/html | 4,280 |
application/atom+xml | 34,866 | image/vnd.microsoft.icon | 3,169 |
image/ico | 20,149 | application/wlwmanifest+xml | 2,615 |
text/xml | 14,293 | application/xml | 1,094 |
application/rdf+xml | 10,351 | images/x-icon | 1,006 |
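The "about 1/3" estimate can be roughly reproduced from the numbers shown in this section. The Python sketch below uses only the "image/*" rows from the table above and the Rel="icon" total from the previous table. It is approximate, because rarer image types in the full frequency table are not included and the counts may not line up one-to-one.

```python
# "image/*" rows copied from the LINK Type table above.
image_type_counts = {
    "image/x-icon": 103_102,
    "image/ico": 20_149,
    "image/png": 9_770,
    "image/gif": 6_859,
    "image/vnd.microsoft.icon": 3_169,
}

REL_ICON_COUNT = 451_386  # Rel="icon" frequency from the previous section

image_total = sum(image_type_counts.values())
image_share = image_total / REL_ICON_COUNT  # fraction of icon uses with an image type
```

The misspelled "images/x-icon" row is left out here because it does not match "image/*"; adding its 1,006 occurrences barely changes the ratio.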
STYLE element
As was already mentioned, we will be looking at style sheets in much greater depth in a future section on CSS, so for now we will just take a quick look at the element and its attributes.
ELEMENT/Attribute | frequency |
---|---|
STYLE | 1,313,454 |
Type | 1,028,840 |
Media | 116,336 |
Title | 7,236 |
xml:space | 140 |
BASE element
This element's original purpose was to declare a common root URL for relative URLs in a document, so it is a bit surprising to find that the original usage has been usurped in popularity by frame target control.
ELEMENT/Attribute | frequency |
---|---|
BASE | 266,149 |
Target | 159,479 |
Href | 109,404 |
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.