MAMA: Markup report, part 3: Basic BODY markup
Introduction
This is it. This week we begin to cover the real heart of HTML markup. These elements are what make HTML tick: they give documents their life and primary expression. Hyperlinks create the "web" of the Web, images are the primary avenue for authors to incorporate multimedia, and the many phrase and block elements covered here imbue content with semantics and basic formatting. We will take a look at each of these areas in turn. Other MAMA articles published this week take a deeper look at these areas and more.
To read more details of MAMA's findings, check out the MAMA home page.
Hyperlinks
Hyperlinks make the Web the web that it is. It is little wonder, then, that the A element is the most popular of BODY's child elements. MAMA counted each occurrence of an Href attribute in an A or AREA element as a hyperlink and kept a running tally for each URL analyzed. It also compared the domain of each hyperlink's URL with the domain of the page being analyzed to discover how many external domains a document referenced. Of the 3,337,666 URLs that contained hyperlinks, 72.2% had at least one hyperlink pointing outside the domain of the analyzed URL. The average number of hyperlinks per document was 38.4.
ELEMENT/Attribute | Frequency | % Total element usage | ELEMENT/Attribute | Frequency | % Total element usage |
---|---|---|---|---|---|
A | 3,307,397 | -- | AREA | 453,187 | -- |
Href | 3,304,834 | 99.9% | Coords | 452,272 | 99.8% |
Target | 1,978,018 | 59.8% | Href | 450,478 | 99.4% |
Title | 658,820 | 19.9% | Shape | 439,720 | 97.0% |
Name | 485,168 | 14.7% | Alt | 203,624 | 44.9% |
Rel | 96,613 | 2.9% | Nohref | 13,570 | 3.0% |
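The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not MAMA's actual implementation: the class and function names are hypothetical, and comparing exact host names is an assumption about what "domain" means here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class HyperlinkCounter(HTMLParser):
    """Counts Href attributes on A and AREA elements and collects external hosts."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = (urlsplit(page_url).hostname or "").lower()
        self.hyperlinks = 0
        self.external_hosts = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in ("a", "area"):
            return
        for name, value in attrs:
            if name != "href" or value is None:
                continue
            self.hyperlinks += 1
            # Resolve relative links against the page URL before comparing hosts.
            # Assumption: "domain" is treated as the exact host name.
            host = (urlsplit(urljoin(self.page_url, value)).hostname or "").lower()
            if host and host != self.page_host:
                self.external_hosts.add(host)


def count_hyperlinks(page_url, html):
    """Return (number of hyperlinks, set of external hosts) for one document."""
    counter = HyperlinkCounter(page_url)
    counter.feed(html)
    counter.close()
    return counter.hyperlinks, counter.external_hosts
```

A document counts toward the 72.2% figure when the returned set of external hosts is non-empty. An A element that carries only a Name attribute is not counted as a hyperlink.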
The A Rel attribute
The Rel attribute for the A element expresses the relationship that the destination URL has to the current URL. Until relatively recently this attribute was underused, but its use has grown in the last few years as microformats have been embraced. The most popular value for this attribute is "nofollow", which appears more than twice as often as the next-nearest values, "bookmark" and "tag".
Attribute value | Frequency | Attribute value | Frequency |
---|---|---|---|
nofollow | 46,179 | license | 6,330 |
bookmark | 20,524 | alternate | 5,252 |
tag | 20,445 | lightbox | 2,917 |
category | 13,012 | me | 1,929 |
external | 7,473 | self | 1,630 |
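A tally like the one above could be produced along the following lines. The sketch assumes that Rel values are lowercased and split on whitespace, since a single Rel attribute may list several link types. Whether MAMA normalized values this way is not stated, and the function name is hypothetical.

```python
from collections import Counter


def tally_rel_values(rel_attributes):
    """Count individual link types across a collection of raw Rel attribute strings."""
    counts = Counter()
    for raw in rel_attributes:
        # Assumption: values are case-insensitive and whitespace-separated.
        # Each token is counted once per attribute, even if an author repeats it.
        counts.update(set(raw.lower().split()))
    return counts
```

Calling `tally_rel_values(["nofollow", "external nofollow"])` counts "nofollow" twice and "external" once.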
Images
MAMA kept track of how many images were detected in each document, including duplicate references to the same image. It tallied the total number of image references (avg: 22.6), the number of unique images (avg: 12.3), and the largest number of times any single image was referenced (avg: 15.2). MAMA found 3,233,208 URLs (92.14%) using images via the following methods:
- The IMG element
- The form INPUT Type="image"/Src widget
- Elements with a Background attribute
ELEMENT[Attribute] | Frequency | Percentage |
---|---|---|
IMG[Src] | 3,219,304 | 91.7% |
TD[Background] | 714,706 | 20.4% |
BODY[Background] | 634,617 | 18.1% |
INPUT[Src] | 337,286 | 9.6% |
TABLE[Background] | 281,209 | 8.0% |
TH[Background] | 5,354 | 0.2% |
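The three per-document tallies (total references, unique images, and the highest repeat count) can be illustrated with a short Python sketch. It covers the three methods listed above and is not MAMA's actual code. The names are hypothetical, and resolving every reference to an absolute URL before comparing is an assumption.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageReferenceCollector(HTMLParser):
    """Collects image references from IMG, INPUT Type="image", and Background attributes."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.references = Counter()

    def _record(self, value):
        if value:
            # Assumption: two references are "the same image" when they
            # resolve to the same absolute URL.
            self.references[urljoin(self.page_url, value.strip())] += 1

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            self._record(attributes.get("src"))
        elif tag == "input" and (attributes.get("type") or "").lower() == "image":
            self._record(attributes.get("src"))
        # Any element may carry a Background attribute (BODY, TABLE, TD, TH, ...).
        self._record(attributes.get("background"))


def image_stats(page_url, html):
    """Return total references, unique images, and the highest repeat count."""
    collector = ImageReferenceCollector(page_url)
    collector.feed(html)
    collector.close()
    refs = collector.references
    return {
        "total": sum(refs.values()),
        "unique": len(refs),
        "max_repeats": max(refs.values(), default=0),
    }
```

Keeping the references in a `Counter` means all three figures fall out of a single pass over the document.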
The IMG element
The IMG element was by far the most popular method for using images in a document. Of the child elements of BODY, IMG is second in popularity only to hyperlinks, appearing in 91.7% of MAMA's URL set. Several of its attributes rank among the top 10 of all markup attributes.
ELEMENT/Attribute | Frequency | % Total element usage | ELEMENT/Attribute | Frequency | % Total element usage |
---|---|---|---|---|---|
IMG | 3,219,487 | -- | Align | 1,134,698 | 35.2% |
Src | 3,219,304 | 99.99% | Name | 875,461 | 27.2% |
Width | 2,957,808 | 91.9% | Hspace | 526,348 | 16.3% |
Height | 2,945,989 | 91.5% | Usemap | 447,774 | 13.9% |
Border | 2,810,265 | 87.3% | Vspace | 445,580 | 13.8% |
Alt | 2,520,939 | 78.3% | Title | 367,132 | 11.4% |
Image formats
Authors use images in many ways, and there is definitely room on the Web for the many formats currently in play. In addition to keeping track of image totals, MAMA looked at the popularity of the GIF, JPEG, and PNG formats. MAMA first used an image's file extension to judge its format. If the extension alone identified the format, MAMA did not dig any deeper. If the file extension check was inconclusive, MAMA issued an HTTP HEAD request for the referenced image and examined the MIME type in the response to detect the format.
JPEG has no real competition in depicting photographs or realistic scenes, but the PNG format and the dominant GIF format are at odds for the same use cases. Due to a number of historical issues, uptake of the PNG format has been slower than many expected. Authors seem to have no problem with both formats coexisting on their Web sites.
Image Format | Total occurrences | Percentage | Maximum quantity encountered | Average in sample |
---|---|---|---|---|
GIF | 2,854,113 | 81.3% | 1,610 | 9.0 |
JPEG | 2,451,507 | 69.9% | 1,201 | 6.1 |
PNG | 374,408 | 10.7% | 539 | 3.2 |
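The detection order described in this section (file extension first, MIME type only as a fallback) might look roughly like this in Python. The lookup tables and function names are assumptions made for illustration, since the article does not list the exact extensions or MIME types MAMA recognized.

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit

# Assumed lookup tables; the article does not enumerate MAMA's own lists.
EXTENSION_FORMATS = {".gif": "GIF", ".jpg": "JPEG", ".jpeg": "JPEG", ".png": "PNG"}
MIME_FORMATS = {"image/gif": "GIF", "image/jpeg": "JPEG", "image/png": "PNG"}


def head_content_type(url, timeout=10):
    """Send a HEAD request and return the bare MIME type from Content-Type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


def detect_image_format(url, fetch_mime=head_content_type):
    """Return "GIF", "JPEG", "PNG", or None if the format cannot be determined."""
    # Step 1: trust the file extension when it is conclusive.
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    if extension in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[extension]
    # Step 2: otherwise fall back to the MIME type reported by the server.
    try:
        mime = fetch_mime(url)
    except OSError:  # covers urllib's URLError and HTTPError
        return None
    return MIME_FORMATS.get(mime.lower())
```

Checking the extension first avoids a network round trip for most images. The `fetch_mime` parameter exists only so the fallback can be replaced in tests.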
Image formats in combination: Venn diagram
The following diagram shows the usage overlap of the three dominant image formats. The relationship between GIF and PNG is usually characterized as a competitive one, so these numbers were expected to show authors clearly preferring one or the other in their pages. That is definitely not the case. PNGs were detected in 374,408 URLs, and of those, 311,827 URLs (83.3%) also used the GIF format. If that is what constitutes a format war, the battle is a subtle one.
[Figure: Venn diagram of GIF, JPEG, and PNG usage overlap. Note: Region sizes are not to scale.]
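For readers who want to reproduce this kind of overlap analysis, the following Python sketch computes the seven regions of a three-set Venn diagram from per-format URL sets, then rechecks the 83.3% figure from the two counts quoted above. The function name and region labels are hypothetical, and only those two counts come from the article.

```python
def format_overlap(urls_by_format):
    """Return the number of URLs in each exclusive region of a three-set Venn diagram."""
    gif = set(urls_by_format.get("GIF", ()))
    jpeg = set(urls_by_format.get("JPEG", ()))
    png = set(urls_by_format.get("PNG", ()))
    return {
        "GIF only": len(gif - jpeg - png),
        "JPEG only": len(jpeg - gif - png),
        "PNG only": len(png - gif - jpeg),
        "GIF+JPEG": len((gif & jpeg) - png),
        "GIF+PNG": len((gif & png) - jpeg),
        "JPEG+PNG": len((jpeg & png) - gif),
        "all three": len(gif & jpeg & png),
    }


# Figures quoted in the text: URLs using PNG, and those that also use GIF.
png_urls = 374_408
png_and_gif_urls = 311_827
share = png_and_gif_urls / png_urls
print(f"{share:.1%}")  # prints 83.3%
```

Note that the 311,827 figure corresponds to the "GIF+PNG" and "all three" regions combined, because it counts every PNG-using URL that also uses GIF, regardless of JPEG.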
Phrase elements
The purpose of some of these inline elements is to assign semantic meaning to text content. Many of the other elements in this category set their sights lower; they convey simple formatting and appearance information. In fact, the most popular of these elements is still FONT, an element that exists purely to convey formatting. Markup purists will be dismayed that FONT remains in such high usage, but they can console themselves that the overall use of CSS (80.4%) now exceeds that of FONT (58.7%) by a comfortable margin. Other findings of note include SMALL being twice as popular as BIG, and SUP being used almost 8 times as much as SUB.
ELEMENT | Frequency | ELEMENT | Frequency |
---|---|---|---|
FONT | 2,061,417 | EM | 351,594 |
B | 1,805,495 | U | 342,612 |
SPAN | 1,527,964 | SMALL | 155,962 |
STRONG | 1,102,056 | BIG | 76,946 |
I | 668,742 | SUP | 73,309 |
The FONT element
Zeroing in on this element can tell us a lot about "old school" HTML. When it was introduced at the end of 1994, it filled an early gap in the typographical capabilities available to authors. CSS has since subsumed all the features that FONT first brought to the Web. Each of the main attributes of the FONT element has a dominant preferred value: the Color is usually white ("#ffffff" or "white"), the Face is typically "Arial", and the Size is most often "2".
ELEMENT/Attribute | Frequency |
---|---|
FONT | 2,061,417 |
Size | 1,709,405 |
Color | 1,634,714 |
Face | 1,379,110 |
Color attribute value | Frequency | Face attribute value | Frequency | Size attribute value | Frequency |
---|---|---|---|---|---|
#ffffff | 535,698 | arial | 1,036,962 | 2 | 967,193 |
#000000 | 442,291 | helvetica | 660,926 | 1 | 785,227 |
#ff0000 | 318,323 | verdana | 548,563 | 3 | 488,919 |
#0000ff | 188,611 | sans-serif | 486,824 | 4 | 485,621 |
#000080 | 101,950 | times new roman | 197,881 | 5 | 332,907 |
Block and replaced elements
These elements are used in a wide variety of situations to accomplish an assortment of tasks. Some of them are widely used and others are not. The only relationship that many of these elements have is that they share little in common with the other main MAMA categories.
The BR element is used most frequently in this group, but P and DIV are also favored by authors. The BR element ranks a little higher than the numbers below indicate, because MAMA detected <br> separately from <br/>. Taken together, the two variants of BR were detected in 2,884,356 of MAMA's URLs (82.2%).
The heading elements (H1-H6) followed an expected popularity path: H2 is found less often than H1, H3 less often than H2, and so on. UL is found at almost 20 times the frequency of the OL element; it is a little surprising that authors do not show a tendency to rank things.
ELEMENT | Frequency | Percentage | ELEMENT | Frequency | Percentage |
---|---|---|---|---|---|
BR | 2,859,662 | 81.5% | H1 | 769,344 | 21.9% |
P | 2,702,935 | 77.0% | HR | 729,380 | 20.8% |
DIV | 2,499,779 | 71.2% | H2 | 573,002 | 16.3% |
CENTER | 1,076,535 | 30.7% | H3 | 438,496 | 12.5% |
UL | 809,915 | 23.1% | BLOCKQUOTE | 188,947 | 5.4% |
Conclusion
This week's topics mark a turning point in the MAMA results. Although some other topics are also very popular (such as next week's tables and forms), hyperlinks and images constitute arguably THE most important part of HTML. Next week, we will wrap up MAMA's look at markup by covering the remaining popular markup topics: forms, plug-ins, tables, and XML.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.