MAMA: Common attributes

By Brian Wilson

Index:

  1. Introduction
  2. Name
  3. Class
  4. Style
  5. Id
  6. Title
  7. Lang
  8. Dir
  9. Accesskey
  10. Tabindex
  11. Longdesc
  12. Disabled

Introduction

The common attributes are those that are used across a multitude of elements. They are often attributes of critical importance to the most popular features that Web browsers have. They are listed here under the same umbrella for a single goal: when viewed together, comparing the use of these attributes across their many applicable elements can expand our understanding of each attribute and of how and when authors tend to use it.

Fig 1-1: Common attribute usage
Attribute Frequency   Attribute Frequency
Name 3,220,308 Dir 136,997
Class 2,139,184 Accesskey 80,026
Style 1,878,916 Tabindex 49,081
Id 1,782,769 Longdesc 26,641
Title 1,010,147 Disabled 6,643
Lang 475,336    

The Name attribute

This attribute has a number of heterogeneous uses, many of them very popular. It comes as no surprise, then, that it is very widely used; 3,220,308 of MAMA's URLs (91.77%) carry the attribute in some fashion. Comparing the usage of Name to the Id attribute (which shares at least part of its functionality) demonstrates clear differences. Name has especially deep coverage in its uses with the META, PARAM, MAP, and FRAME elements, not to mention its often-paired use with Form fields. In many of these cases, usage runs above 95%. There are also some noticeable uses with vendor-specific elements (such as CSACTION, MENUMACHINE, and CSSEQUENCE) where penetration is almost 100%. That is a sure sign that a program is responsible for the attribute creation; humans just are not that reliable!

Fig 2-1: Elements using Name and relative attribute popularities
[Please also see the full frequency table.]
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
META 2,710,638 82.73%   MAP 456,648 99.72%   TABLE 20,507 0.71%
INPUT 990,058 98.17%   FRAME 349,820 92.52%   CSACTION 17,222 99.99%
IMG 875,460 27.19%   SELECT 275,323 96.48%   DIV 17,018 0.68%
PARAM 576,508 99.97%   EMBED 138,809 25.44%   OBJECT 9,284 1.74%
FORM 570,643 54.83%   IFRAME 87,763 39.45%   SPAN 8,778 0.57%
A 485,168 14.67%   TEXTAREA 32,500 89.26%   LAYER 7,583 28.83%

The Name attribute frequency

Pages in MAMA's URL set that used the Name attribute carried it 13.0 times on average. This average is a bit lower than that of its sister attribute Id, and this is also reflected in the extreme use cases for Name. As usual, there is an extreme (some might say absurd, but I will not pass judgment) use case (49,260 Name attributes in one document), with the next-nearest neighbor falling into a very distant second place. The full frequency table is begging for your attention.

Fig 2-2: URLs with the most Name attributes
URL Name quantity
http://genforum.genealogy.com/ny/all.html/ 49,260
http://www.notisum.se/rnp/SLS/lag/19920300.htm/ 4,614
http://www.broekemasierbestrating.nl/Default.htm/ (URL no longer active) 598
http://www.BeerHatsOnline.com/ 359

The Name attribute values

The list of values for this attribute can be a bit confusing, as it is a combined value list representing all types of Name usage. The top 15 slots are almost all from META, and other distinct uses like PARAM, Forms, CSIM and hyperlink anchors stand out with high representation as well.

Fig 2-3: Popular values for the Name attribute
[Please also see the full frequency table.]
Value Frequency   Value Frequency   Value Frequency
keywords 2,189,708   quality 504,666   bgcolor 228,085
description 2,100,858   revisit-after 475,765   language 215,683
generator 943,496   copyright 423,210   top 205,657
robots 937,844   progid 281,339   submit 192,095
author 818,017   distribution 236,615   q 182,848
movie 530,989   rating 230,894   map 176,556

The Class attribute

This attribute is used to identify document structures for use in CSS. Multiple elements can carry the same class name, thereby allowing authors to create arbitrary groupings to control presentation. In all, 2,139,184 of the URLs in MAMA (60.96%) had at least one occurrence of the Class attribute. In addition, 98.52% of URLs that use Class also use CSS in some manner. The value for the Class attribute is a space-separated list of class names, but the typical usage is a single class name: only 296,136 of the URLs using the attribute (13.84%) had any Class value carrying more than one class name.
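
As a minimal sketch of the two measurements in this paragraph, a Class value can be split on whitespace to count its class names, and the 13.84% share follows from the two URL counts quoted above. The helper names are hypothetical and are not MAMA's own code.

```python
def class_names(value: str) -> list:
    """Split a Class attribute value into its individual class names."""
    # str.split() with no argument splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return value.split()


def has_multiple_classes(value: str) -> bool:
    """True when a Class value carries more than one class name."""
    return len(class_names(value)) > 1


# URL counts quoted in the text.
urls_with_class = 2_139_184
urls_with_multi_name_value = 296_136

share = urls_with_multi_name_value / urls_with_class * 100
print(f"{share:.2f}% of URLs using Class have a multi-name value")
```

For example, `has_multiple_classes("nav main")` is true while `has_multiple_classes("footer")` is false.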

Elements using the Class attribute

By relative percentage, the Class attribute shows a strong usage tendency. Block- and Form-related elements have high usage but inline/phrasal elements usually have much lower representation. The only real exceptions to this are hyperlinks (A), and the SPAN element (and also, inexplicably, the ABBR element). In fact, the generic block and inline elements (DIV and SPAN respectively) have some of the highest relative representation of any elements. On the other hand, basic structural elements like HTML and BODY have surprisingly low relative Class usage. Adobe GoLive's editor appears to generate a Class attribute for its custom CSACTION element in every page it touches.

Fig 3-1: Elements using Class and relative attribute popularities
[Please also see the full frequency table.]
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
A 1,111,526 33.61%   IMG 320,281 9.95%   H2 93,530 16.32%
TD 1,082,979 37.45%   LI 228,422 27.07%   SELECT 90,781 31.81%
SPAN 1,046,840 68.51%   UL 197,729 24.41%   H3 62,411 14.23%
DIV 1,031,384 41.26%   FONT 165,186 8.01%   FORM 44,247 4.25%
P 736,885 27.26%   TR 147,474 5.10%   HR 37,689 5.17%
TABLE 580,281 20.05%   BODY 119,899 3.47%   B 36,812 2.04%
INPUT 438,516 43.48%   H1 101,917 13.25%   STRONG 31,615 2.87%

The Class attribute frequency

Most pages were found to use this attribute, and did they ever! Some of the extreme cases employed the Class attribute in impressive quantities. Because the attribute is used as an aggregation mechanism with disparate elements, usage in high quantities is to be expected. The average number of Class attributes in a page (when they were used) was found to be 48.4. The highest quantity of the attribute recorded by MAMA was 98,439, but the live version at the time of writing is even higher: 102,627! It is a spreadsheet application, a gigantic grid of cells that each carry a Class attribute, which is a sure way to inflate an attribute count if ever there was one. That single case has 4 times as many Class attributes as any other case found in MAMA. A full frequency table of Class attribute quantities is available.

Fig 3-2: URLs with the most Class attributes
URL Class quantity
http://spreadsheets.google.com/pub?key=pk_yT3zL-5Gp45a3UyKbMOA/ 102,627
http://www.amphilsoc.org/library/mole/c/chargaff.htm/ 25,417
http://rpo.library.utoronto.ca/poem/19.html/ (URL no longer active) 15,940
http://nzvillage.com/newzealand/cms/front_content.php?idcat=164/ 13,691

The Class name values

Hickson's Google research takes a close look at Class attribute values. Since Hickson is the main editorial force behind the WHATWG's and the W3C's HTML 5, this was definitely useful and informative data for him to gather and examine. Hickson said that one value of the Class attribute was "baffling": the value "link". In the URLs sampled from MAMA, the class seems to be used often in relation to hyperlinks. The obvious question then is why an author would not just use an A or AREA element selector instead of creating a custom class. Well, a small sampling of URLs using this class value showed that it was applied to structures related to a hyperlink just as often as it was applied to a hyperlink itself (for example, to a TD table cell encapsulating nothing but a link). In such cases, using a simple CSS element selector would not be sufficient. Yes, other methods could be used to reference it (and probably are; CSS selectors are not yet tracked in MAMA), but this method at least is widely used.

The frequency table of Class attribute values compares favorably to Hickson's Google research. In all, 15 of the top 20 values from MAMA's list are in the top 20 from Google's list, and the top 2 values ("footer" and "menu") are in the same order in both. The most popular value "footer" is twice as popular as its natural companion "header"; so, could one say that authors prefer page footers to page headers in their designs? One big noticeable trend from the Class value list: there is a high number of class names of the form /style\d+/. The popularity of each such class value decreases as the integer at the end increases. MAMA detected values like this going at least up to "style117" and probably higher. A high (but untested) correlation was noticed between class names of this type and the use of Macromedia Dreamweaver scripting library functions. As Macromedia Dreamweaver is not always the easiest editor to detect, this correlation will remain a theory.

Fig 3-3: Popular values of the Class attribute
[Please also see the full frequency table.]
Value Frequency   Value Frequency   Value Frequency
footer 179,528   style2 89,851   nav 68,634
menu 146,673   header 89,274   clear 68,571
style1 138,308   copyright 86,979   search 59,802
msonormal 123,374   button 81,503   style4 56,032
text 122,911   main 69,620   logo 48,831
content 113,951   style3 69,349   body 48,052
title 91,957   small 68,995   left 47,822
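
The /style\d+/ pattern mentioned above can be tried out with a short sketch like the following. Anchoring the pattern to the whole class name and ignoring case are assumptions made here for illustration, and the function name and sample list are hypothetical.

```python
import re
from typing import Optional

# The article's /style\d+/ pattern, applied to the whole class name.
# Full-string matching and case-insensitivity are assumptions of this sketch.
STYLE_N = re.compile(r"style(\d+)", re.IGNORECASE)


def style_counter(class_name: str) -> Optional[int]:
    """Return the trailing integer of a 'styleN' class name, else None."""
    match = STYLE_N.fullmatch(class_name)
    return int(match.group(1)) if match else None


# Hypothetical sample mixing generated-looking and hand-written names.
sample = ["footer", "style1", "menu", "style117", "styles", "mystyle2"]
generated = [name for name in sample if style_counter(name) is not None]
print(generated)
```

With full-string matching, names such as "styles" or "mystyle2" are left out; an unanchored search would include "mystyle2" as well.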

The Style attribute

This attribute is used to specify CSS at ground level, "in the trenches" so to speak. Using CSS in this way negates many of its broad control advantages; styles applied this way affect only the current element and its descendants. In all, 1,878,916 of MAMA's URLs (53.54%) use it in some fashion. It is used most often with the generic DIV and SPAN elements, which is to be expected since these elements don't have any special intrinsic rendering behaviors of their own. There is a noticeable authoring fondness for using Style with Form-related elements, and Table-related elements (although there are some exceptions with the latter, like the TR element). Most pages use Style with Inline/Phrasal markup elements sparingly, while its popularity with most block elements fares much better.

Fig 4-1: Elements using Style and relative attribute popularities
[Please also see the full frequency table.]
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
DIV 884,825 35.40%   INPUT 270,155 26.79%   H1 48,767 6.34%
SPAN 637,422 41.72%   BODY 217,890 6.31%   LI 45,570 5.40%
TD 616,249 21.31%   FONT 171,646 8.33%   IFRAME 44,729 20.11%
TABLE 586,223 20.26%   FORM 126,233 12.13%   BR 44,337 1.46%
IMG 464,347 14.42%   SELECT 77,080 27.01%   MARQUEE 41,076 29.27%
A 462,297 13.98%   TR 68,615 2.37%   B 35,819 1.98%
P 434,288 16.07%   UL 52,927 6.53%   H2 33,929 5.92%

The Id attribute

This attribute is used to create a document-wide unique identifier for an element. The Id attribute was originally meant to supersede the use of the Name attribute, but with 1,782,769 of MAMA's URLs (50.80%) using Id and 3,220,308 using Name (91.77%), it is a transition that seems to be still very much "in progress". DIV uses the Id attribute more than twice as often as its nearest neighbors in the frequency table, TABLE and IMG. DIV usage of Id is also 5 times as popular as its related cousin SPAN. IFRAME and Form-related elements have rather high usage rates. UL representation is also quite high, but there doesn't seem to be any obvious justification for that outcome. Netscape 4.x's proprietary LAYER and ILAYER elements each have over 50% relative usage of Id.

Fig 5-1: Elements using Id and relative attribute popularities
[Please also see the full frequency table.]
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
DIV 1,085,482 43.42%   TD 230,312 7.96%   P 81,128 3.00%
TABLE 482,760 16.68%   UL 192,453 23.76%   SELECT 68,087 23.86%
IMG 471,807 14.65%   SPAN 180,553 11.82%   MAP 58,141 12.70%
INPUT 372,905 36.97%   OBJECT 165,628 31.05%   IFRAME 57,001 25.62%
A 319,619 9.66%   LI 91,022 10.79%   H1 41,281 5.37%
FORM 266,886 25.64%   BODY 90,883 2.63%   TR 33,053 1.14%

The Id attribute frequency

About half of the pages in MAMA used the Id attribute. Pages that used the attribute averaged about 15.8 uses per document. As with many other cases in MAMA, the extreme use case stood alone: it used nearly 3 times as many Id attributes as the next-nearest URL. A full frequency table of Id quantities is ready for any curious readers.

Fig 5-2: URLs with the most Id attributes
URL Id quantity
http://www.gibson.com/en-us/Divisions/Gibson%20Original/Gibson%20Mandolins/ 12,084
http://compas.ifrance.com/compas/ 4,452
http://cabokarate.spaces.live.com/ 3,519
http://www.townhall.com/columnists/ 1,984

The Id attribute values

At first glance, it does not seem like examining these values would be interesting. For an attribute that is supposed to contain unique values, the chances of value overlap between URLs should be much lower than with many other attributes, right? Not so fast. The really interesting thing to note is that there is considerable overlap in Id and Class attribute values. "Footer" is the most popular value for each attribute, but many of the most popular values for each attribute hold different relative positions in the value lists. #2 on the Id list, "Content", is #6 on the Class list, #3 on the Id list is #9 on the Class list, and so on. Hickson's Google study only looked at Class attribute values, but perhaps should have looked at Id as well. It is apparent from the top values in both Class and Id rankings that authors continually have to work around unfilled semantic niches in the standards.

Note: There is an interesting discrepancy between HTML and CSS treatment of the Id value. In HTML, an Id value must begin with a letter ([A-Za-z]), but in CSS there is no such restriction in referencing an Id value. In theory, the HTML constraint should limit Id values to the more limited HTML form, but browsers are usually more forgiving and allow the CSS interpretation of an Id value, so in "the wild" there are many cases where Id values begin with a different character. In MAMA, 135,994 of the URLs using Id (7.63%) had at least one value that began with a character other than [A-Za-z].
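
The HTML-side constraint in this note can be sketched as a pair of regular-expression checks. The first mirrors the "begins with a letter" test that the 7.63% figure is based on; the second applies the full HTML 4 ID token syntax (a letter followed by letters, digits, hyphens, underscores, colons, or periods). The function names are hypothetical, and this is only an illustration, not the check MAMA itself ran.

```python
import re

# HTML 4 ID token: a letter, then any number of letters, digits,
# hyphens, underscores, colons, and periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")


def starts_with_letter(id_value: str) -> bool:
    """The looser test from the note: first character in [A-Za-z]."""
    return re.match(r"[A-Za-z]", id_value) is not None


def is_html4_id(id_value: str) -> bool:
    """The stricter test: the whole value fits the HTML 4 ID token syntax."""
    return HTML4_ID.fullmatch(id_value) is not None


# URL counts quoted in the note.
urls_with_id = 1_782_769
urls_with_non_letter_start = 135_994
share = urls_with_non_letter_start / urls_with_id * 100
print(f"{share:.2f}% of URLs using Id have a value not starting with a letter")
```

Values such as "1col" or "_top" fail the first test, while a value like "main nav" passes the first test but fails the second because of the space.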

Fig 5-3: Id attribute values
[Please also see the full frequency table.]
Value Frequency   Value Frequency   Value Frequency
footer 288,061   table1 101,677   wrapper 66,730
content 228,661   menu 96,161   top 66,615
header 223,726   layer1 93,920   table2 57,934
logo 121,351   autonumber1 77,350   layer2 56,823
container 119,877   search 74,887   sidebar 52,416
main 106,327   nav 72,057   image1 48,922

Some Id value trends

The full list of Id attribute values also points out one other interesting tendency: the top 100 consists largely of repeating Id archetypes whose values vary only by the addition of a numeric counter. This obviously indicates cases where more than one of a single type is used or expected, such as with "table", "image", or "menu". Of the top 100 Id values, over half are variations on just the 7 substrings shown in the table below (Fig 5-4). These patterns are used over and over and do not stop with the top 100. Many other values also show templated trends, and a substring like "table" does not appear just 10 times in the entire value list; that is only the number of times it appears in the top 100 values. In actuality, it appears 95 times in the Id value list (and probably more, given an exhaustive list of Id values).

Fig 5-4: Popularity of Id attribute templates in Top 100 values
Value substring Frequency   Value substring Frequency
layer 10/100   nav 7/100
table 10/100   autonumber 6/100
image 7/100   main 4/100
menu 7/100
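
One way to surface such templates is to strip a trailing numeric counter from each Id value and tally the remaining stems. The sketch below does that for a handful of values; most come from Fig 5-3, while "menu2" is a hypothetical addition. Treating a "template" as a stem plus trailing digits is an assumption of this sketch, narrower than a general substring match.

```python
import re
from collections import Counter


def id_stem(value: str) -> str:
    """Strip a trailing numeric counter, e.g. 'table12' -> 'table'."""
    return re.sub(r"\d+$", "", value)


# A few Id values from Fig 5-3, plus the hypothetical "menu2".
sample_ids = [
    "table1", "table2", "layer1", "layer2",
    "autonumber1", "image1", "menu", "menu2", "footer",
]

# Count how many distinct sample values share each stem.
stems = Counter(id_stem(value) for value in sample_ids)
for stem, count in stems.items():
    print(stem, count)
```

Run over a full value list, the same tally would give counts comparable to the "10/100" style figures in Fig 5-4.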

The Title attribute

This attribute is used to set "advisory information" about an element. In practical terms, this means authors can specify any value they want. It was found in 1,010,147 of MAMA's URLs (28.79%). It is most popular with hyperlinks (A and AREA), as well as the IMG and LINK elements. Usage on TH elements eclipses the relative usage by TD elements nearly 4:1. A few elements have extraordinarily high usage rates: ABBR and ACRONYM, probably because HTML 4 goes out of its way to define special Title behavior for these elements.

At the request of a co-worker, one extra Title-related feature was tracked by MAMA: newlines embedded in Title values in the source code. Historically, these have caused various problems in some browsers, and it was hoped that tracking them could be useful for testing at some point down the line. Of the pages that used the Title attribute, 21,759 (2.15%) had at least 1 embedded newline.
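
A check of this kind can be sketched with Python's standard `html.parser` module. The class name and sample markup are hypothetical, and this is not how MAMA itself was implemented; it only illustrates counting Title values that contain a line break.

```python
from html.parser import HTMLParser


class TitleNewlineCounter(HTMLParser):
    """Count Title attributes, and those with an embedded newline."""

    def __init__(self):
        super().__init__()
        self.titles = 0
        self.titles_with_newline = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased,
        # and value is None for attributes written without a value.
        for name, value in attrs:
            if name == "title" and value is not None:
                self.titles += 1
                if "\n" in value or "\r" in value:
                    self.titles_with_newline += 1


# Hypothetical markup: the first Title value spans two source lines.
sample = '<a href="#" title="first line\nsecond line">x</a><img src="a.png" title="ok">'

counter = TitleNewlineCounter()
counter.feed(sample)
counter.close()
print(counter.titles, counter.titles_with_newline)
```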

Fig 6-1: Elements using Title and relative attribute popularities
[Please also see the full frequency table.]
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
A 658,820 19.92%   FRAME 17,257 4.56%   IFRAME 5,887 2.65%
IMG 367,132 11.40%   TD 16,717 0.58%   TABLE 5,048 0.17%
LINK 234,355 11.61%   DIV 16,191 0.65%   H1 3,381 0.44%
INPUT 72,471 7.19%   ABBR 14,455 97.00%   TH 3,293 2.22%
AREA 61,137 13.49%   LABEL 14,443 9.05%   LI 3,163 0.37%
OBJECT 23,163 4.34%   ACRONYM 10,329 94.05%   FORM 2,864 0.28%
SPAN 17,942 1.17%   STYLE 7,236 0.55%   FONT 2,819 0.14%

The Lang attribute

This attribute indicates the base language of the content. Its value should be an RFC 1766 language code. In practice, some values in the full frequency table stray from this ideal: they occasionally include encodings such as "utf-8" and other types of "languages" (such as "javascript1.2"). The relative usage popularity offers a few head-scratchers: SPAN is a much more popular user of Lang than DIV. HTML is also quite popular, with considerably higher usage than the BODY element. Both ACRONYM and ABBR usages also stand out with especially high representation.
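
To see why values like "utf-8" and "javascript1.2" stray from the ideal, the RFC 1766 tag syntax (a primary tag and optional hyphen-separated subtags, each 1 to 8 letters) can be expressed as a regular expression. This is a syntax-only sketch with a hypothetical function name; it does not check whether a tag is actually a registered language code.

```python
import re

# RFC 1766 syntax: Primary-tag *( "-" Subtag ), each part 1 to 8 letters.
RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def looks_like_rfc1766(value: str) -> bool:
    """True when the value fits the RFC 1766 language tag syntax."""
    return RFC1766_TAG.fullmatch(value) is not None


for value in ["en", "en-US", "utf-8", "javascript1.2"]:
    print(value, looks_like_rfc1766(value))
```

"utf-8" fails because its subtag is a digit, and "javascript1.2" fails on both length and characters.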

Fig 7-1: Elements using Lang and relative attribute popularities
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
HTML 314,584 9.11%   FONT 2,423 0.12%   HEAD 978 0.03%
SPAN 87,675 5.74%   DIV 2,316 0.09%   ACRONYM 770 7.01%
META 55,003 1.68%   LINK 1,552 0.08%   TITLE 626 0.02%
BODY 38,157 1.11%   P 1,390 0.05%   FORM 625 0.06%
A 3,830 0.12%   SCRIPT 1,231 0.05%   ABBR 593 3.98%

The Dir attribute

This attribute is meant to give the "base direction of directionally neutral text" for an element's content and attribute values. There are two acceptable values, "ltr" (left-to-right) and "rtl" (yep, right-to-left). The attribute was detected at least once in 136,997 of the URLs MAMA analyzed (only 3.90%). The complete list of values detected for this attribute shows some other uses not defined by HTML, aside from the usual typos and other uninteresting noise. One use is apparently to define a base directory ("dir") associated with the element to which it is applied. However, these other usages are absolutely trampled by the two main accepted values in terms of popularity. The less popular of the two, "rtl", occurs more than 100 times as often as the next nearest non-HTML value. The left-to-right value "ltr" is more than 10 times as popular as "rtl". Authors have a clear preference for using this attribute on the TABLE and HTML elements over others.

Dir attribute value popularity

Fig 8-1: Dir attribute values
Value Frequency
ltr 129,893
rtl 11,321
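
As a small worked check of the figures in Fig 8-1, the sketch below classifies a Dir value against the two values HTML defines and computes the "ltr" to "rtl" ratio from the table. The function name is hypothetical; the comparison ignores case because HTML 4 marks the Dir values as case-insensitive.

```python
# The two Dir values defined by HTML.
VALID_DIR = {"ltr", "rtl"}


def is_valid_dir(value: str) -> bool:
    """True when the value is 'ltr' or 'rtl', ignoring case."""
    return value.lower() in VALID_DIR


# URL counts from Fig 8-1.
ltr_urls = 129_893
rtl_urls = 11_321

ratio = ltr_urls / rtl_urls
print(f"ltr appears on about {ratio:.1f} times as many URLs as rtl")
```

The two counts add up to more than the 136,997 URLs that use Dir at all, presumably because some pages carry both values.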

Elements using the Dir attribute

The TABLE and HTML elements show the strongest Dir attribute tendency, with the BLOCKQUOTE element actually having the highest relative usage. The TABLE and HTML usages are understandable but BLOCKQUOTE may not be as obvious. A moment's reflection reveals why—to maintain the full integrity of a quotation, the natural language and internal direction of the content must be preserved.

Fig 8-2: Elements using Dir and relative attribute popularities
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
TABLE 60,922 2.10%   BLOCKQUOTE 4,919 2.60%
HTML 47,011 1.36%   INPUT 1,892 0.19%
P 14,028 0.52%   UL 720 0.09%
BODY 8,474 0.25%   A 651 0.02%
TD 7,535 0.26%   SELECT 643 0.23%
SPAN 7,085 0.46%   FONT 517 0.03%
DIV 5,592 0.22%   H1 505 0.07%

The Accesskey attribute

An accesskey is supposed to be a single character used to give focus to an element. In all, 80,026 URLs had at least one element carrying an Accesskey attribute, and the two most popular elements that use it are A (68.57% of those URLs) and INPUT (44.36%). Three other elements also used this attribute on a regular basis. The full frequency table of values shows that authors do well at restricting their values to a single character. Curiously, the most popular character used is "s" (with no obvious rationale to its popularity). After that, numbers dominate the list. Ten of the top 15 Accesskey values are the digits "0"-"9" (with "1" being most popular). Looking beyond the numbers, we find that the entire English alphabet ranks next before anything else. The top 36 spots consist of the digits "0"-"9" and the English alphabet "a"-"z" (MAMA ignored case-sensitivity when generating this frequency table). The INPUT and LABEL elements seem to have a greater affinity for the attribute than other elements, and A enjoys a much higher relative usage rate than the analogous AREA element.

Fig 9-1: Elements using Accesskey and relative attribute popularities
ELEMENT Frequency % of total element usage
A 54,876 1.66%
INPUT 35,501 3.52%
LABEL 5,330 3.34%
SELECT 601 0.21%
AREA 518 0.11%
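
The value tally described in this section (single characters only, case ignored) might look roughly like the sketch below. The function name and the sample values are hypothetical; they are not taken from MAMA's frequency table.

```python
from collections import Counter
from typing import Optional


def normalize_accesskey(value: str) -> Optional[str]:
    """Return the lowercased key for a single-character value, else None."""
    return value.lower() if len(value) == 1 else None


# Hypothetical Accesskey values as they might appear in markup.
sample_values = ["s", "S", "1", "0", "home", "1", ""]

# Tally single-character keys, folding case as the article describes.
tally = Counter(
    key for key in map(normalize_accesskey, sample_values) if key is not None
)
print(tally)
```

Here "s" and "S" are counted together, while "home" and the empty value are skipped as not being single characters.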

The Tabindex attribute

This attribute gives an explicit position in the tabbing order for the current page. In MAMA, 49,081 URLs use the Tabindex attribute at least once—usage on the INPUT element represents over 70% of that total, while its next most-popular use on the A element trails far behind at just over 30%. HTML 4 defines only a narrow set of elements that can use Tabindex, but in common usage some elements that aren't in the HTML 4 allowed set are more popular than some of those that are—for example, IMG and DIV usages are more popular than OBJECT and BUTTON. A look at element-relative usage shows that authors prefer to use this attribute with form widgets (INPUT, SELECT, TEXTAREA and BUTTON) over other elements.

Fig 10-1: Elements using Tabindex and relative attribute popularities
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
INPUT 34,725 3.44%   DIV 623 0.03%   LI 308 0.04%
A 14,898 0.45%   OBJECT 523 0.10%   TABLE 279 0.01%
SELECT 5,282 1.85%   BUTTON 464 4.05%   SPAN 244 0.02%
TEXTAREA 1,570 4.31%   AREA 384 0.09%   LABEL 166 0.10%
IMG 1,376 0.04%   TD 384 0.01%   IFRAME 96 0.04%
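
The comparison between allowed and non-allowed elements can be reproduced from Fig 10-1 with a short sketch. The set below lists the elements for which HTML 4 defines Tabindex (A, AREA, BUTTON, INPUT, OBJECT, SELECT, and TEXTAREA); the function name is hypothetical.

```python
# Elements for which HTML 4 defines the tabindex attribute.
HTML4_TABINDEX_ELEMENTS = {
    "A", "AREA", "BUTTON", "INPUT", "OBJECT", "SELECT", "TEXTAREA",
}

# URL counts per element, copied from Fig 10-1.
tabindex_urls = {
    "INPUT": 34_725, "A": 14_898, "SELECT": 5_282, "TEXTAREA": 1_570,
    "IMG": 1_376, "DIV": 623, "OBJECT": 523, "BUTTON": 464,
    "AREA": 384, "TD": 384, "LI": 308, "TABLE": 279,
    "SPAN": 244, "LABEL": 166, "IFRAME": 96,
}


def outranked_allowed(element: str) -> list:
    """Allowed elements that appear on fewer URLs than the given element."""
    return sorted(
        allowed for allowed in HTML4_TABINDEX_ELEMENTS
        if tabindex_urls[allowed] < tabindex_urls[element]
    )


# Report non-allowed elements that outrank at least one allowed element.
for element, count in tabindex_urls.items():
    if element not in HTML4_TABINDEX_ELEMENTS:
        beaten = outranked_allowed(element)
        if beaten:
            print(element, count, "outranks", beaten)
```

On these figures, IMG and DIV each outrank AREA, BUTTON, and OBJECT, which matches the observation in the text.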

The Longdesc attribute

This attribute is a URL that should provide a "longer description" of the resource. In MAMA, 26,641 URLs used Longdesc in some manner. Only 4 elements were found to use this attribute in noticeable quantities: IMG, FRAME, IFRAME and INPUT, with the IMG usage occurring far more than any other type (95.39% of all URLs with Longdesc usage). All the values here should be absolute or relative URLs, but the reality is a bit different. The full frequency table for this attribute at first seems rather unremarkable. All of the frequencies are quite low, indicating the unique nature of the attribute values. What also stands out is how many of the values are definitely not URLs, such as:

"boy hunched alone with hands in arms"

By a rough estimate, just over 1/3 of Longdesc attribute values are not URLs.
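
The article does not say how this estimate was made. As one crude possibility, a value that contains internal whitespace reads more like a prose description than a URL. The sketch below applies only that heuristic, under a hypothetical function name, and will miss non-URL values that happen to be a single word.

```python
def looks_like_prose(longdesc: str) -> bool:
    """Crude heuristic: more than one whitespace-separated word
    suggests a prose description rather than a URL."""
    return len(longdesc.split()) > 1


# The first value is quoted in the text; the other two are hypothetical.
samples = [
    "boy hunched alone with hands in arms",
    "descriptions/photo1.html",
    "http://example.com/desc.html",
]

for value in samples:
    print(repr(value), looks_like_prose(value))
```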

Fig 11-1: Elements using Longdesc and relative attribute popularities
ELEMENT Frequency % of total element usage
IMG 25,413 0.79%
FRAME 643 0.17%
IFRAME 362 0.16%
INPUT 251 0.03%

The Disabled attribute

This attribute is not very widely used—only 6,643 URLs were detected to carry the attribute in any capacity. The INPUT, SELECT and OPTION elements are where it is used most, but there are some other oddities in the list below; how exactly does a SPAN element set to Disabled react any differently than a SPAN that doesn't carry the attribute?

Fig 12-1: Elements using Disabled and relative attribute popularities
ELEMENT Frequency % of total element usage   ELEMENT Frequency % of total element usage
INPUT 2,688 0.27%   SPAN 106 0.01%
SELECT 1,515 0.53%   TEXTAREA 79 0.22%
OPTION 1,325 0.47%   STYLE 74 0.01%
A 556 0.02%   IMG 63 0.00%
LINK 355 0.02%   BUTTON 59 0.52%

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
