MAMA: Markup report, part 3: Basic BODY markup

By Brian Wilson

Introduction

This is it. This week we begin to cover the real heart of HTML markup. These elements are what make HTML tick—they give documents their life and primary expression. Hyperlinks create the "web" of the Web; images are the primary avenue for authors to incorporate multimedia, and the many phrase and block elements covered here imbue content with semantics and basic formatting. We will take a look at each of these areas in turn. For a deeper look at these areas and more, the following MAMA article topics are also available this week:

To read more details of MAMA's findings, check out the MAMA home page.

Hyperlinks

Hyperlinks make the Web the web that it is. It is little wonder then that the A element is the most popular of any of the BODY's child elements. MAMA considered each occurrence of an Href attribute in an AREA or A element as a hyperlink and kept a running tally for each URL analyzed. It also compared the domain of the URL in a hyperlink with the domain of the page being analyzed in order to discover how many external domains were being referenced in a document. Of the 3,337,666 URLs that contained hyperlinks, 72.2% had at least one link pointing outside the domain of the URL that was analyzed. The average number of hyperlinks per document was 38.4.
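
The tallying described above can be sketched roughly as follows. This is a minimal illustration, not MAMA's actual code: the names LinkCounter and link_stats are hypothetical, and comparing full hostnames is a simplifying assumption (MAMA's exact notion of "domain" is not spelled out here, so www.example.com and example.com would count as different hosts in this sketch).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCounter(HTMLParser):
    """Collects every Href value found on an A or AREA element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def link_stats(page_url, html):
    """Return (hyperlink_count, set_of_external_hosts) for one document."""
    parser = LinkCounter()
    parser.feed(html)
    page_host = urlparse(page_url).hostname
    external = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(page_url, href)).hostname
        if host and host != page_host:
            external.add(host)
    return len(parser.hrefs), external
```

Calling link_stats("http://www.example.com/", html) on a fetched page would give the per-document hyperlink count and the set of outside hosts; a document counts toward the 72.2% figure when that set is non-empty.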

Top attribute frequencies for the A and AREA elements

ELEMENT/Attribute    Frequency    % of total element usage
A                    3,307,397    --
    Href             3,304,834    99.9%
    Target           1,978,018    59.8%
    Title              658,820    19.9%
    Name               485,168    14.7%
    Rel                 96,613     2.9%
AREA                   453,187    --
    Coords             452,272    99.8%
    Href               450,478    99.4%
    Shape              439,720    97.0%
    Alt                203,624    44.9%
    Nohref              13,570     3.0%

The A Rel attribute

The Rel attribute for the A element expresses the relationship that the destination URL has to the current URL. Until relatively recently this attribute was underused. However, its use has grown in the last few years as microformats have been embraced. The most popular value is "nofollow", which outnumbers the next-nearest values, "bookmark" and "tag", by more than 2 to 1.

Top values for A Rel

Attribute value    Frequency
nofollow              46,179
bookmark              20,524
tag                   20,445
category              13,012
external               7,473
license                6,330
alternate              5,252
lightbox               2,917
me                     1,929
self                   1,630
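
A Rel attribute can hold several space-separated link types (for example "category tag"), so a tally like the one above needs some policy for splitting values. The sketch below assumes values are lowercased and split on whitespace, and that a token repeated inside one attribute counts once. The function name tally_rel_tokens is hypothetical, and whether MAMA split values this way is an assumption.

```python
from collections import Counter


def tally_rel_tokens(rel_values):
    """Count link-type tokens across raw Rel attribute strings."""
    counts = Counter()
    for raw in rel_values:
        # set() so a token repeated inside one attribute is counted once.
        counts.update(set(raw.lower().split()))
    return counts
```

For instance, tally_rel_tokens(["nofollow", "NoFollow external"]) counts "nofollow" twice and "external" once.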

Images

MAMA kept track of how many images were detected in each document, including duplicate references to the same image. It tallied the total image references encountered (avg: 22.6), the number of unique images encountered (avg: 12.3), and the maximum number of times any one image was referenced (avg: 15.2). MAMA found 3,233,208 URLs (92.14%) using images via the following methods:

  • The IMG element
  • The forms INPUT Type="image"/Src widget usage
  • Elements with a Background attribute

MAMA's Image usage detections

ELEMENT[Attribute]    Frequency    Percentage
IMG[Src]              3,219,304    91.7%
TD[Background]          714,706    20.4%
BODY[Background]        634,617    18.1%
INPUT[Src]              337,286     9.6%
TABLE[Background]       281,209     8.0%
TH[Background]            5,354     0.2%
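
The three detection methods and the per-document tallies (total references, unique images, and the most-repeated image) could be sketched like this. ImageRefCollector and image_stats are hypothetical names. The sketch compares raw attribute strings, so the same image written as two different relative paths would count as two images; how MAMA handled that case is not stated.

```python
from collections import Counter
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Gathers image references from IMG, INPUT Type="image", and Background."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        is_image_input = tag == "input" and (a.get("type") or "").lower() == "image"
        if (tag == "img" or is_image_input) and a.get("src"):
            self.refs.append(a["src"])
        # Any element carrying a Background attribute (BODY, TABLE, TD, TH, ...).
        if a.get("background"):
            self.refs.append(a["background"])


def image_stats(html):
    """Return total, unique, and most-repeated image reference counts."""
    collector = ImageRefCollector()
    collector.feed(html)
    counts = Counter(collector.refs)
    return {
        "total": len(collector.refs),
        "unique": len(counts),
        "max_repeats": max(counts.values(), default=0),
    }
```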

The IMG element

The IMG element was by far the most popular method for using images in a document. Of the child elements of BODY, IMG is second in popularity only to hyperlinks—used in 91.7% of MAMA's URL set. Several of its attributes rank among the top 10 of all markup attributes.

Top attribute frequencies for the IMG element

ELEMENT/Attribute    Frequency    % of total element usage
IMG                  3,219,487    --
    Src              3,219,304    99.99%
    Width            2,957,808    91.9%
    Height           2,945,989    91.5%
    Border           2,810,265    87.3%
    Alt              2,520,939    78.3%
    Align            1,134,698    35.2%
    Name               875,461    27.2%
    Hspace             526,348    16.3%
    Usemap             447,774    13.9%
    Vspace             445,580    13.8%
    Title              367,132    11.4%

Image formats

Authors use images in many ways, and there is definitely room on the Web for the many formats currently in play. In addition to keeping track of image totals, MAMA looked at the popularity of the GIF, JPEG, and PNG formats. MAMA defaulted to using an image's file extension to judge the format. If the extension alone was conclusive, MAMA did not dig any deeper. If the file extension check was inconclusive, MAMA would issue an HTTP HEAD request for the referenced image and examine the MIME type in the response to detect the format.
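
A rough sketch of that two-step check follows. The extension and MIME-type tables are illustrative assumptions rather than MAMA's actual lists, and the names detect_format and fetch_content_type are hypothetical. The HEAD lookup is passed in as a parameter so the extension logic can be exercised without touching the network.

```python
import os
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Assumed mappings; the article does not list the exact ones MAMA used.
EXTENSION_FORMATS = {".gif": "GIF", ".jpg": "JPEG", ".jpeg": "JPEG", ".png": "PNG"}
MIME_FORMATS = {"image/gif": "GIF", "image/jpeg": "JPEG", "image/png": "PNG"}


def fetch_content_type(url):
    """Issue an HTTP HEAD request and return the bare, lowercased MIME type."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get_content_type()


def detect_format(url, head=fetch_content_type):
    """Return "GIF", "JPEG", "PNG", or None if the format stays unknown."""
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[extension]  # extension was conclusive
    try:
        mime_type = head(url)  # fall back to the server's Content-Type
    except (OSError, ValueError):
        return None  # request failed or URL unusable: leave unclassified
    return MIME_FORMATS.get(mime_type)
```

With this shape, a URL such as photo.JPG is classified without any request, while something like image.php?id=3 triggers the HEAD fallback.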

JPEG has no real competition in depicting photographs or realistic scenes, but the PNG format and the dominant GIF format are at odds for the same use cases. Due to a number of historical issues, uptake of the PNG format has been slower than many expected. Authors seem to have no problem with both formats coexisting on their Web sites.

Image format statistics

Image format    Total occurrences    Percentage    Maximum quantity encountered    Average in sample
GIF                     2,854,113    81.3%                                1,610                  9.0
JPEG                    2,451,507    69.9%                                1,201                  6.1
PNG                       374,408    10.7%                                  539                  3.2

Image formats in combination: Venn diagram

The following diagram shows the usage overlap of the three dominant image formats. The relationship between GIF and PNG is usually characterized as a competitive one, so it was expected that these numbers would demonstrate authors showing a clear preference for one or the other in their pages. That is definitely not the case. PNGs were detected in 374,408 URLs, and of those, 311,827 URLs (83.3%) also used the GIF format. If that is what constitutes a format war, the battle is a subtle one.

[Figure: Venn diagram for image format usage types. Note: region sizes are not to scale.]
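
The overlap figures behind a diagram like this can be derived from a per-URL record of which formats were seen. The sketch below assumes each URL is represented by a set of format names. The functions venn_regions and share_also_using are hypothetical helpers, not part of MAMA.

```python
from collections import Counter


def venn_regions(formats_per_url):
    """Count URLs in each exclusive region of the GIF/JPEG/PNG diagram."""
    # frozenset({"GIF", "PNG"}) means "GIF and PNG but not JPEG", and so on.
    return Counter(frozenset(formats) for formats in formats_per_url if formats)


def share_also_using(formats_per_url, base, other):
    """Fraction of the URLs using `base` that also use `other`."""
    using_base = [formats for formats in formats_per_url if base in formats]
    if not using_base:
        return 0.0
    both = sum(1 for formats in using_base if other in formats)
    return both / len(using_base)
```

Applied to the article's own numbers, the PNG-to-GIF share is 311,827 / 374,408, which rounds to the 83.3% quoted above.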

Phrase elements

Some of these inline elements assign semantic meaning to text content. Many of the others set their sights lower: they convey simple formatting and appearance information. In fact, the most popular of these elements is still FONT, an element that exists purely to convey formatting. Markup purists will be dismayed that FONT remains in such high usage, but they can console themselves that the overall use of CSS (80.4%) now outpaces FONT (58.7%) by a comfortable margin. Other notable findings include SMALL being twice as popular as BIG, and SUP being used almost 8 times as often as SUB.

Top phrase markup elements

ELEMENT    Frequency
FONT       2,061,417
B          1,805,495
SPAN       1,527,964
STRONG     1,102,056
I            668,742
EM           351,594
U            342,612
SMALL        155,962
BIG           76,946
SUP           73,309

The FONT element

Zeroing in on this element can tell us a lot about "old school" HTML. When FONT was introduced at the end of 1994, it filled an early gap in the typographical capabilities available to authors. CSS has since subsumed all the features that FONT first brought to the Web. Each of the main FONT attributes has a clearly dominant value: the Color is usually white ("#ffffff" or "white"), the Face is typically "Arial", and the Size is most often "2".

FONT element/attribute usage

ELEMENT/Attribute    Frequency
FONT                 2,061,417
    Size             1,709,405
    Color            1,634,714
    Face             1,379,110

Top attribute values for FONT attributes

Color value    Frequency    Face value         Frequency    Size value    Frequency
#ffffff          535,698    arial              1,036,962    2               967,193
#000000          442,291    helvetica            660,926    1               785,227
#ff0000          318,323    verdana              548,563    3               488,919
#0000ff          188,611    sans-serif           486,824    4               485,621
#000080          101,950    times new roman      197,881    5               332,907
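
Tallying attribute values like these requires some normalization, since "White" and "#FFFFFF" name the same color and a Face value is often a comma-separated list. The sketch below is one way to do it under stated assumptions: the color-name map is a small illustrative subset of the HTML 4 color names, Face lists are split on commas, and all names (NAMED_COLORS, normalize_color, split_faces, tally_font_values) are hypothetical. The article does not say whether MAMA normalized values this way.

```python
from collections import Counter

# Illustrative subset of HTML 4 color names (an assumption, not MAMA's list).
NAMED_COLORS = {"white": "#ffffff", "black": "#000000", "red": "#ff0000",
                "blue": "#0000ff", "navy": "#000080"}


def normalize_color(value):
    """Lowercase a Color value and fold known color names into hex form."""
    value = value.strip().lower()
    return NAMED_COLORS.get(value, value)


def split_faces(value):
    """Split a Face list such as "Arial, Helvetica, sans-serif" into names."""
    faces = (part.strip().strip("'\"").lower() for part in value.split(","))
    return [face for face in faces if face]


def tally_font_values(font_tags):
    """font_tags: iterable of dicts mapping attribute name to raw value."""
    tallies = {"color": Counter(), "face": Counter(), "size": Counter()}
    for attrs in font_tags:
        if "color" in attrs:
            tallies["color"][normalize_color(attrs["color"])] += 1
        if "face" in attrs:
            tallies["face"].update(split_faces(attrs["face"]))
        if "size" in attrs:
            tallies["size"][attrs["size"].strip()] += 1
    return tallies
```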

Block and replaced elements

These elements are used in a wide variety of situations to accomplish an assortment of tasks. Some of them are widely used and others are not. The only relationship that many of these elements have is that they share little in common with the other main MAMA categories.

The BR element is used most frequently in this group, but P and DIV are also favored by authors. The BR element ranks a little higher than the numbers below indicate, because MAMA detected <br> separately from <br/>. The two variants of BR were detected in 2,884,356 of MAMA's URLs (82.2%). The heading elements (H1-H6) followed an expected popularity path: H2 is found less often than H1, H3 less often than H2 and so on. UL is found at almost 20 times the frequency of the OL element; it is a little surprising that authors do not show a tendency to rank things.

Top block and replaced elements

ELEMENT       Frequency    Percentage
BR            2,859,662    81.5%
P             2,702,935    77.0%
DIV           2,499,779    71.2%
CENTER        1,076,535    30.7%
UL              809,915    23.1%
H1              769,344    21.9%
HR              729,380    20.8%
H2              573,002    16.3%
H3              438,496    12.5%
BLOCKQUOTE      188,947     5.4%
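
The separate detection of <br> and <br/> mentioned above, and the combined per-URL figure, can be illustrated with Python's html.parser, which reports the two spellings through different callbacks. This is only a sketch of the idea: BrVariantDetector and uses_br are hypothetical names, and MAMA's own detection mechanism is not described in the article.

```python
from html.parser import HTMLParser


class BrVariantDetector(HTMLParser):
    """Counts <br> and self-closed <br/> occurrences separately."""

    def __init__(self):
        super().__init__()
        self.plain = 0        # <br>
        self.self_closed = 0  # <br/> or <br />

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.plain += 1

    def handle_startendtag(self, tag, attrs):
        # Overriding this keeps <br/> from being routed to handle_starttag.
        if tag == "br":
            self.self_closed += 1


def uses_br(html):
    """True if the document uses either variant of BR."""
    detector = BrVariantDetector()
    detector.feed(html)
    return detector.plain > 0 or detector.self_closed > 0
```

Summing uses_br over a set of pages gives a combined count of the kind reported above, in which a page that uses both variants is counted only once.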

Conclusion

This week's topics mark a turning point in the MAMA results. Although some other topics are also very popular (such as next week's tables and forms), hyperlinks and images constitute arguably THE most important part of HTML. Next week, we will wrap up MAMA's look at markup by covering the remaining popular markup topics: forms, plug-ins, tables, and XML.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
