MAMA: Phrase, block, list, and other elements

By Brian Wilson

Index:

  1. Introduction
  2. Phrase elements
  3. FONT element: markup dedicated to visual formatting
  4. Basic block elements
  5. List elements
  6. Ruby elements
  7. Obsolete elements
  8. Browser extension elements/attributes
  9. Miscellaneous elements

Introduction

This section is a catch-all, covering the "rest" of popular markup in use that did not easily fall into the other markup categories. It covers basic phrase and block markup, list-related elements, browser extensions, and other elements and attributes. While none of these elements are among the 10 most popular, they make up a majority of the next 10 and of the 20 after that.

Phrase elements

The purpose of many markup elements is to declare the appearance or nature of the embedded text. The following 22 elements either describe some simple formatting instructions for text content or have a loftier goal of describing the intrinsic nature of the content. Applying a bold effect is the most popular usage for this type of element (STRONG almost always equates to bold formatting). The almost eight-fold preference of SUP over the comparable SUB element is puzzling.

Fig 2-1: Frequencies of phrase markup
ELEMENT   Frequency    ELEMENT   Frequency
B         1,805,495    CITE         12,562
SPAN      1,527,964    ACRONYM      10,983
STRONG    1,102,056    SUB           9,378
I           668,742    S             7,767
EM          351,594    CODE          7,595
U           342,612    STRIKE        5,991
SMALL       155,962    DFN           3,584
BIG          76,946    Q             1,785
SUP          73,309    SAMP          1,609
TT           15,480    KBD           1,313
ABBR         14,902    VAR           1,258

FONT element: markup dedicated to visual formatting

This element was originally a Netscape extension that was later included in HTML 3.2 and then immediately deprecated, because much greater control is attainable using CSS. That has not stopped it from remaining extremely popular with authors. In fact, it is the most popular phrase markup element, more popular even than the B or SPAN elements.

Fig 3-1: FONT element/attribute usage
ELEMENT/Attribute  Frequency
FONT               2,061,417
   Size            1,709,405
   Color           1,634,714
   Face            1,379,110

Attribute values

Because this element is so popular, and because other studies have focused on other areas of markup (Hickson) or on CSS (Saarsoo), it was instructive to look at values for some HTML attributes that had not been examined before. The three main FONT attributes were ripe for this type of analysis.

Size attribute

With the Size attribute, absolute font size values are preferred over relative font size changes. Font size "2" is the most popular, with "1" also having high representation.

Fig 3-2: FONT Size values with highest frequency
(Also see the full frequency table.)
Attribute value  Frequency    Attribute value  Frequency
2                  967,193    -1                 311,017
1                  785,227    6                  204,675
3                  488,919    -2                 182,372
4                  485,621    +1                 170,310
5                  332,907    +2                 115,593
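
The preference for absolute sizes can be checked against the ten values in Fig 3-2. The following is a minimal sketch, assuming that a leading "+" or "-" is what marks a relative size; the function names are illustrative and are not part of MAMA. Each frequency is a count of URLs using that value, and one URL can use several values, so the two sums compare counts rather than distinct URLs.

```python
# Top-10 FONT Size values and their frequencies, copied from Fig 3-2.
SIZE_FREQUENCIES = {
    "2": 967_193, "1": 785_227, "3": 488_919, "4": 485_621, "5": 332_907,
    "-1": 311_017, "6": 204_675, "-2": 182_372, "+1": 170_310, "+2": 115_593,
}


def is_relative(size_value: str) -> bool:
    """Assumption: a leading sign marks a change relative to the current size."""
    return size_value.strip().startswith(("+", "-"))


def split_totals(frequencies):
    """Sum the frequencies of absolute and relative values separately."""
    absolute = sum(n for value, n in frequencies.items() if not is_relative(value))
    relative = sum(n for value, n in frequencies.items() if is_relative(value))
    return absolute, relative


absolute_total, relative_total = split_totals(SIZE_FREQUENCIES)
print(f"absolute: {absolute_total:,}  relative: {relative_total:,}")
```

Within this top-10 list, the absolute values add up to roughly four times the relative ones.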

Color attribute

The hexadecimal (#RRGGBB syntax) method is preferred over the friendly color names. White and black have the clear edge in the frequency list, while blue, red, yellow, and gray are also very popular.

Fig 3-3: FONT Color values with highest frequency
(Also see the full frequency table.)
Attribute value  Frequency    Attribute value  Frequency
#ffffff            535,698    red                 90,996
#000000            442,291    white               90,452
#ff0000            318,323    #666666             84,941
#0000ff            188,611    #ffff00             81,481
#000080            101,950    black               77,329
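
Several entries in Fig 3-3 name the same color in two ways ("white" and "#ffffff", for example). The sketch below folds the keyword spellings into their hexadecimal equivalents to give a rough per-color total. The mapping covers only the three keywords that appear in the table, and the helper names are illustrative rather than anything MAMA provides. Because a URL that uses both spellings is counted twice, the merged figures are upper bounds.

```python
# Top-10 FONT Color values and their frequencies, copied from Fig 3-3.
COLOR_FREQUENCIES = {
    "#ffffff": 535_698, "#000000": 442_291, "#ff0000": 318_323,
    "#0000ff": 188_611, "#000080": 101_950, "red": 90_996,
    "white": 90_452, "#666666": 84_941, "#ffff00": 81_481, "black": 77_329,
}

# Only the color keywords that occur in the table above.
NAMED_COLORS = {"white": "#ffffff", "black": "#000000", "red": "#ff0000"}


def normalize(value: str) -> str:
    """Lower-case a color value and map a known keyword to #rrggbb."""
    cleaned = value.strip().lower()
    return NAMED_COLORS.get(cleaned, cleaned)


def merge_by_color(frequencies):
    """Add up frequencies of values that normalize to the same color."""
    totals = {}
    for value, count in frequencies.items():
        key = normalize(value)
        totals[key] = totals.get(key, 0) + count
    return totals


merged = merge_by_color(COLOR_FREQUENCIES)
for color, count in sorted(merged.items(), key=lambda item: -item[1]):
    print(f"{color}  {count:,}")
```

Merging does not change the ranking at the top: white, black, and red remain the three most frequent colors.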

Face attribute

This attribute allows you to specify the type of font used, in the same manner as the font-family CSS property. The "Arial" font is a runaway favorite here, while "Helvetica", "Verdana" and "sans-serif" are each roughly half to two-thirds as popular. The attribute value allows a comma-separated list of font names, so "Arial" may be very popular as a primary font choice, or it may also be a popular fallback font used when designating other specific fonts.

Fig 3-4: FONT Face values with highest frequency
(Also see the full frequency table.)
Attribute value  Frequency    Attribute value  Frequency
Arial            1,036,962    times              101,094
Helvetica          660,926    tahoma              93,718
Verdana            548,563    geneva              86,016
sans-serif         486,824    serif               72,203
Times New Roman    197,881    comic sans ms       68,708
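
The question of whether a font is mostly a primary choice or a fallback could be answered by splitting each Face value on commas and tallying the first name separately from the rest. The sketch below shows one way to do that. The sample values are hypothetical and are not taken from MAMA's data, and nothing here is meant to describe how MAMA itself processed the attribute.

```python
from collections import Counter


def split_face(face_value):
    """Split a FONT Face value into (primary font, list of fallback fonts)."""
    names = [name.strip().strip("'\"") for name in face_value.split(",")]
    names = [name for name in names if name]  # drop empty list entries
    if not names:
        return None, []
    return names[0], names[1:]


def tally_roles(face_values):
    """Count, case-insensitively, how often each font is primary or fallback."""
    primary, fallback = Counter(), Counter()
    for value in face_values:
        first, rest = split_face(value)
        if first is not None:
            primary[first.lower()] += 1
        fallback.update(name.lower() for name in rest)
    return primary, fallback


# Hypothetical attribute values, for illustration only.
samples = ["Arial", "Verdana, Arial, Helvetica, sans-serif", "Arial, Helvetica"]
primary, fallback = tally_roles(samples)
print("arial as primary:", primary["arial"], "as fallback:", fallback["arial"])
```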

About Face!: Fonts used in Japan

This is a good time to make use of MAMA's country tracking in order to locate regional differences in specific font use. Japanese font usage on Web pages, as a break-out example, would likely show trends very different from those of the rest of the Web at large. There were 124,976 URLs tracked in MAMA identified as being from Japan, and only 23,466 of them (about 19%) used the FONT Face attribute. In this restricted population, the "Arial" font surprisingly takes the top spot again. This is a bit odd because "Arial" does not contain any Japanese character glyphs. Perhaps "Arial" is again being used as a fallback font. The Japan-specific fonts "Osaka" and "MS UI Gothic" also have high representation here.

Note: Some of the values in the full frequency table are meaningless gibberish, due to a bug in MAMA's storage of multi-byte Asian characters.

Fig 3-5: FONT Face values with highest frequency in Japan
(Also see the full frequency table.)
Attribute value  Frequency    Attribute value  Frequency
Arial                3,333    verdana              1,303
Osaka                2,615    sans-serif           1,235
Helvetica            1,738    times                  607
Times New Roman      1,677    comic sans ms          567
MS UI Gothic         1,627    century                542

Basic block elements

The P and DIV elements are the workhorses of HTML, as expected, but the popularity of CENTER is higher than expected, being encountered more often than any of the Hx heading levels. The decreasing popularity of the heading levels from H1-H6 is not a surprise.

Fig 4-1: Frequencies of Block elements/attributes
ELEMENT/Attribute  Frequency    ELEMENT/Attribute  Frequency
P                  2,702,935    H4                   185,110
DIV                2,499,779    H5                   103,060
CENTER             1,076,535    ADDRESS               50,269
H1                   769,344    H6                    45,676
H2                   573,002    PRE                   36,620
H3                   438,496       Width                  63
BLOCKQUOTE           188,947
   Cite                1,402

A common attribute for block elements: Align

Many of the basic block elements have the Align attribute in common, which controls horizontal alignment of the content within the block. More than half of all URLs with P elements also use the Align attribute on at least one of the element instances, but the prize for highest usage goes to the DIV element: 67.82% of the URLs that use DIV also have at least one DIV using the Align attribute. The heading elements (H1-H6) have much lower usage rates for the Align attribute, ranging from roughly 13% to 23%, with the highest rates on the less common H4-H6 levels.

Fig 4-2: Usage of the Align attribute in block elements
ELEMENT  Frequency  % Total element usage    ELEMENT  Frequency  % Total element usage
DIV      1,695,287                 67.82%    H3          60,477                 13.79%
P        1,375,278                 50.88%    H4          30,682                 16.58%
H1         110,342                 14.34%    H5          21,302                 20.67%
H2          76,291                 13.31%    H6          10,342                 22.64%
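
The "% Total element usage" column in Fig 4-2 is the number of URLs using Align on an element divided by the number of URLs using that element at all (Fig 4-1). A short sketch of that arithmetic follows; the dictionary and function names are illustrative.

```python
# URLs containing each element, copied from Fig 4-1.
ELEMENT_URLS = {
    "DIV": 2_499_779, "P": 2_702_935, "H1": 769_344, "H2": 573_002,
    "H3": 438_496, "H4": 185_110, "H5": 103_060, "H6": 45_676,
}

# URLs using the Align attribute on each element, copied from Fig 4-2.
ALIGN_URLS = {
    "DIV": 1_695_287, "P": 1_375_278, "H1": 110_342, "H2": 76_291,
    "H3": 60_477, "H4": 30_682, "H5": 21_302, "H6": 10_342,
}


def align_share(element: str) -> float:
    """Percentage of the URLs using `element` that also use Align on it."""
    return round(100 * ALIGN_URLS[element] / ELEMENT_URLS[element], 2)


for element in ELEMENT_URLS:
    print(f"{element:<4}{align_share(element):6.2f}%")
```

Run over all eight elements, this reproduces the percentages in the table and makes the spread among the heading levels easy to see.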

List elements

It is surprising that the unordered list (UL) is about 17 times as popular as the ordered list (OL); one would think that ranking things would be a more popular activity on the Web (hotornot.com, anyone?). The UL and OL elements are used in the same documents in 29,564 cases (~63% of the OL total). The DL list also finishes the race far ahead of the OL element, at almost a 2-to-1 ratio. The deprecated MENU and DIR elements are both definitely on the way out: if usage of OL compared to UL seemed rare, the situation for MENU and DIR is dire.

Fig 5-1: Frequencies of List elements/attributes
ELEMENT/Attribute  Frequency    ELEMENT/Attribute  Frequency
LI                   843,869    DT                    74,984
   Type                5,854    DD                    71,477
   Value                 597    OL                    47,196
UL                   809,915       Start               2,266
   Type               23,996       Type                3,425
DL                    84,257    DIR                    4,397
   Compact               583    MENU                   1,906

The Type attribute of LI, UL and OL

MAMA stored the values of the Type attribute for the LI element and combined the Type values for the UL and OL elements. The expected keywords are dominant in both cases, but a prominent swap occurs between the two frequency tables. The most popular value for LI/Type is "square", followed by "disc". The top two values for the combined UL/OL Type attribute are the same pair in the opposite order: "disc" first, then "square". There may be an obvious explanation for this that I am missing.

Ruby elements

The RUBY elements were introduced to HTML in XHTML 1.1 for primary use in select Asian language situations. The various Ruby-related elements still appear to suffer from low adoption. As expected, the clear majority of cases (265 of 289) are Japanese URLs.

Fig 6-1: Usage of Ruby elements
ELEMENT  Frequency
RUBY           289
RT             278
RB             216
RP             204

Obsolete elements

These elements were obsolete back when HTML 2.0 was the latest and greatest version in the land. As you can see, they have all but disappeared from authors' development lexicons.

Fig 7-1: Usage of obsolete elements
ELEMENT    Frequency
XMP              311
PLAINTEXT        189
LISTING           32

Browser extension elements/attributes

Some elements and attributes have been created by browser vendors in the past but have not been embraced in the standards. The functionality of the oft-maligned BLINK element was absorbed into CSS to become "text-decoration: blink". With time, usage of the BLINK element will disappear, but currently BLINK's usage by 26,807 URLs is ~36% of the usage of the "text-decoration: blink" version. Some extensions catch on, while others do not. The LAYER and ILAYER elements, both specific to Netscape 4, are fairly popular, while usage of MULTICOL, an element introduced and only supported in the same Netscape version, is almost nonexistent. Of all the browser extensions that have not gained standards traction, the crown definitely belongs to MARQUEE, an element that originated in Microsoft's Internet Explorer 2.0 and is now embraced by all major browsers.

Fig 8-1: Usage of browser extensions
ELEMENT/Attribute  Frequency    ELEMENT/Attribute  Frequency
MARQUEE              140,328    SPACER                34,470
   Scrollamount       73,551    BLINK                 26,807
   Width              67,071    LAYER                 26,305
   Scrolldelay        66,185    ILAYER                21,391
   Height             58,552    NOLAYER               12,389
   Direction          43,831    MULTICOL                  75
   Behavior           37,835
   Bgcolor            32,466
   Align              23,804
   Loop               16,055
   Border              9,059
   Truespeed           6,182
   Hspace              2,159
   Vspace              2,052

Miscellaneous elements

These elements are a grab bag of markup that does not really fit in other categories, but as you can see by the popularity of BR (the 11th most popular element overall), they cannot be ignored. The revision control elements INS and DEL, along with the bi-directional control element BDO, were all introduced in HTML 4.0, but none of them has any significant authoring traction "in the wild" even though they have been around for over 10 years.

Note: The value in parentheses for the BR element indicates the frequency of "<br />", which was detected separately from "<br>".

Fig 9-1: Usage of misc. elements
ELEMENT/Attribute  Frequency              ELEMENT/Attribute  Frequency
BR                 2,859,662 (168,642)    WBR                    4,883
   Clear             107,624              INS                    1,344
HR                   729,380                 Datetime              139
   Size              227,745                 Cite                   82
   Width             226,657              DEL                    1,243
   Noshade           117,978                 Datetime              165
   Align             100,044                 Cite                   33
NOBR                  89,903              BDO                      167
BASEFONT              13,158                 Dir                   147
   Size               10,113
   Color               7,231
   Face                1,810
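
The note above says that "<br />" was detected separately from "<br>". How MAMA did this is not described here, so the following is only a rough sketch of one way to tell the two forms apart with a regular expression. It counts tag instances in a single document, whereas MAMA's figures count URLs, and a regular expression is a shortcut that a real HTML parser would handle more robustly.

```python
import re

# Matches any BR start tag, with or without attributes, in either case.
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)


def count_br_forms(html: str):
    """Return (plain, self_closing) counts of BR tags found in `html`."""
    plain = self_closing = 0
    for match in BR_TAG.finditer(html):
        # Drop the closing ">" and any spaces before it, then look for "/".
        inner = match.group(0)[:-1].rstrip()
        if inner.endswith("/"):
            self_closing += 1
        else:
            plain += 1
    return plain, self_closing


sample = 'one<br>two<BR />three<br/>four<br clear="all">'
print(count_br_forms(sample))
```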

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
