MAMA: Markup report, part 4: Forms, tables, and plug-ins, oh my!

By Brian Wilson

Introduction

In this week's overview we wrap up MAMA's look at markup by covering its most complex structures: forms, tables, and plug-ins. These topics take Web pages from a simple series of text, links, images, and lists to an entirely different level. Forms greatly expand user interaction possibilities. Tables express axial relationships, which authors have creatively distorted for their most popular (and questionable) use: pixel-perfect, grid-based layouts. Plug-ins afford extensibility beyond HTML's stock capabilities. Without these features, HTML would be a barren, unexciting markup language. More detailed MAMA articles on these areas and others are also available this week.

Forms

Aside from hyperlinks, forms are the main way in which users interact with the Web. Among their varied critical uses, forms allow people to find things with search engines, publish their thoughts with blogging systems, and buy things on e-commerce sites. Forms in general are very popular—found on up to one-third of all pages analyzed.

Elements used in forms

The popularity of the main types of form elements varies widely, and sometimes surprisingly. For example, almost every FORM has an INPUT, but relatively few make use of TEXTAREA. Such variations may be due to a number of factors, including inherent biases in MAMA's current URL set (a majority of MAMA's URLs are Surface/Home pages, which rarely have forms on them, apart from the increasingly popular search field). The intended use of a Web page often dictates the types of elements used, including form elements.

Frequencies of form-related elements
ELEMENT    Frequency    ELEMENT     Frequency
FORM       1,040,771    TEXTAREA    36,410
INPUT      1,008,545    FIELDSET    31,673
SELECT     285,362      LEGEND      18,269
OPTION     281,923      BUTTON      11,455
LABEL      159,631      OPTGROUP    5,348

The FORM element

We will start our look at form elements with their main container element: FORM. It was detected in 1,040,771 of MAMA's URLs. The Action attribute, which specifies what to do with the information the form collects, is used on most of these pages (94.0% of URLs using FORM). This attribute is required, so its dominance is understandable. The Method attribute is only slightly less popular (89.4% of URLs using FORM).

Top attributes of the FORM element
FORM Attribute    Frequency
Action            977,934
Method            930,343
Name              570,643
Id                266,886
Target            199,085

The Method attribute

Approximately 70% of pages that specify an explicit HTTP Method use the "post" method, while ~46% use the "get" method. This would indicate a clear authoring preference for the "post" method, but there are a few factors to consider. About 15% of the pages specifying the Method attribute use multiple forms on the page that mix both "post" and "get" methods. There are 110,428 URLs that used the FORM element with no Method attribute; "get" is the implied default value in such cases. This brings the relative preferences for Method amongst all FORM usages much closer: 62.2% for "post" and 51.6% for an explicit or implied "get" value.

FORM Method attribute explicit values
Method value    Frequency
post            647,234
get             426,192
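
The Method percentages quoted above can be re-derived from the published counts. The following is only a rough sketch of that arithmetic; the variable names and the pct helper are our own, and the overlap figure is a lower bound obtained by inclusion-exclusion rather than a number MAMA reported directly.

```python
# Counts of URLs, copied from the tables and text above.
form_urls = 1_040_771     # URLs using the FORM element
method_urls = 930_343     # URLs with an explicit Method attribute
post_urls = 647_234       # URLs with at least one Method="post"
get_urls = 426_192        # URLs with at least one Method="get"


def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# URLs that use FORM without any Method attribute ("get" is implied).
no_method_urls = form_urls - method_urls

# URLs counted under both "post" and "get" (inclusion-exclusion lower bound).
mixed_urls = post_urls + get_urls - method_urls

print(no_method_urls)                              # 110428
print(pct(post_urls, method_urls))                 # 69.6
print(pct(get_urls, method_urls))                  # 45.8
print(pct(mixed_urls, method_urls))                # 15.4
print(pct(post_urls, form_urls))                   # 62.2
print(pct(get_urls + no_method_urls, form_urls))   # 51.6
```

Because a single URL can contain several forms, the "post" and "get" shares overlap and do not need to add up to 100%.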

The INPUT element

This popular element is used in 96.9% of all documents using forms. With the element's functionality being as overloaded as it is, this popularity is both understandable and expected. Some of its attributes are also very popular.

Popular attributes of the INPUT element
ELEMENT/Attribute    Frequency    ELEMENT/Attribute    Frequency
INPUT                1,008,545       Maxlength         329,415
   Type              1,005,152       Alt               213,924
   Name              990,058         Border            172,843
   Value             947,403         Checked           135,049
   Size              656,354         Width             120,420
   Src               335,990         Height            119,902

The Type attribute

Many of the attributes for the INPUT element are only applicable to specific Type attribute values, so we must examine this attribute's values first.

Popular values of the INPUT Type attribute
Attribute value    Frequency    Attribute value    Frequency
text               806,926      radio              159,626
hidden             733,126      empty              110,971
submit             568,445      checkbox           81,260
image              337,286      button             71,031
password           167,098      reset              17,417

We can now look more deeply at the various uses of the messy INPUT element:

  • The "empty" value indicates that an INPUT element did not have a Type value at all. In such situations, a widget is interpreted as Type="Text". In all, 79,050 URLs used INPUT elements where none of them specified a Type attribute.
  • In the early days of forms, "Submit" buttons were usually paired with a "Reset" button, but today, that seems to be passé. By comparison, "Reset" is rarely encountered now.
  • The "Submit" and "Image" types: Because "Image" is a type of submittal, and each will often be used to the exclusion of the other, looking at their combined totals shows that submittal is the most popular function of forms (more popular than "Text"). This is actually an expected result.
  • The Type="Image" related attributes: Width and Hspace (horizontal dimensions) have just a slight edge over Height and Vspace (vertical dimensions), just like they do with the IMG element.
  • The exclusive choice widget, Type="Radio", is twice as popular as the multi-choice Type="Checkbox" widget.
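
To make the "empty" category concrete, here is a small sketch of how the Type values used in a single document might be collected. It is not MAMA's actual code: the InputTypeCollector class and input_types function are hypothetical names, and treating a blank Type the same as a missing one is our assumption.

```python
from html.parser import HTMLParser


class InputTypeCollector(HTMLParser):
    """Collect the set of INPUT Type values used in one document."""

    def __init__(self):
        super().__init__()
        self.types = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "input":
            return
        value = dict(attrs).get("type")
        if value and value.strip():
            self.types.add(value.strip().lower())
        else:
            # Assumption: a missing or blank Type is recorded as "empty".
            self.types.add("empty")


def input_types(html):
    """Return the set of INPUT Type values found in an HTML string."""
    collector = InputTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.types


sample = """
<form action="/search" method="get">
  <input name="q">
  <input type="HIDDEN" name="lang" value="en">
  <input type="image" src="go.gif" alt="Go">
</form>
"""
types = input_types(sample)
print(sorted(types))  # ['empty', 'hidden', 'image']
```

Counting each value once per document, as this sketch does, matches the per-URL frequencies reported in the table rather than a count of individual widgets.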

Tables

Tables have a bad reputation among the markup purists in the development community, because many authors often use them solely for Web page layout. Tables generally increase the complexity of documents and can make them more difficult to maintain. Authors do not really see these factors as significant drawbacks, though, judging by the overwhelming popularity of layout tables in the MAMA result set. In practice, the use of presentational tables by authors is what makes the main table-related elements some of the most popular sub-elements of BODY, after the A and IMG elements. The most frequently occurring of these is the TABLE element, found in 2,894,184 of MAMA's URLs (82.5%). Authors have a definite preference for the table elements they use. Almost every table uses the TABLE, TR and TD elements. All of the other elements are used rarely by comparison. CAPTION, COL, THEAD, COLGROUP, and TFOOT are all used in less than 1% of TABLE occurrences.

Table-related elements
ELEMENT    Frequency    ELEMENT     Frequency
TABLE      2,894,184    CAPTION     23,306
TD         2,891,972    COL         21,775
TR         2,891,205    THEAD       21,474
TBODY      364,542      COLGROUP    12,225
TH         148,344      TFOOT       3,947

Attributes of the TABLE element

This wrapper element for table structures is (naturally) the most popular element of its type. It ranks #8 overall in element popularity, used in 82.47% of all MAMA's URLs. Many attributes were detected for this element, only some of which are in the standards. A few of these attributes are very popular with authors: Border, Width, Cellpadding, and Cellspacing are each used in roughly 90% of all URLs that use tables. Usage of other attributes, like Rules and Frame, barely registers; they are used in less than 0.5% of all TABLE cases.

Popular attributes of the TABLE element
Attribute      Frequency    Attribute      Frequency
Border         2,691,899    Height         1,220,050
Width          2,637,117    Bgcolor        893,573
Cellpadding    2,585,020    Bordercolor    417,650
Cellspacing    2,578,416    Background     281,209
Align          1,226,047    Valign         87,291

The TD and TH elements

These two elements are grouped together because they mostly share the same attributes and have very similar usage. But their usage rates could not be more different. The most popular table sub-element is TD (detected in 2,891,972 URLs), and it is the 9th most popular element overall (used in 82.4% of all URLs in MAMA and 99.9% of all URLs using the TABLE element). The TH sub-element, on the other hand, is used in only 5.1% of URLs using the TABLE element. Because of the inherent attribute overlap between TD and TH, it can be interesting to compare attribute usage rates between the two elements. Percentages of the total element usage are also provided to help cross-comparisons.

Top attributes of the TD and TH elements
TD Attribute    Frequency    % of Element    TH Attribute    Frequency    % of Element
TD              2,891,972    --              TH              148,344      --
Width           2,324,752    80.4%           Valign          46,799       31.6%
Valign          2,189,287    75.7%           Width           45,709       30.8%
Align           1,977,367    68.4%           Colspan         38,587       26.0%
Colspan         1,711,437    59.2%           Align           35,710       24.1%
Height          1,672,129    57.8%           Scope           30,111       20.3%
Bgcolor         1,306,542    45.2%           Height          28,195       19.0%
Rowspan         901,303      31.2%           Bgcolor         22,406       15.1%
Background      714,706      24.7%           Nowrap          10,469       7.1%
Nowrap          353,572      12.2%           Rowspan         6,324        4.3%

How deeply are tables nested?

One of the features requested for MAMA was the ability to detect deeply-nested tables. Such structures can be excellent stress tests for a browser. In theory, every TABLE open tag should have a corresponding closing tag. As MAMA traversed a document, any TABLE open tags added 1 to the current depth counter. A closing TABLE tag would subtract 1 from the depth counter. When the depth counter hit a new high score for the document, that value became the new "maximum table depth". This rather simplistic system yielded a number for a document's "maximum table nesting depth"—it does not necessarily mean that the open and closing tags are properly nested; that is another issue entirely. The average nesting depth when tables were used was 2.77. The maximum nesting depth discovered was an astounding 745 deep at http://www.artsforeveryone.com/.
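
The depth counter described above is simple enough to sketch in a few lines. This is an illustration under our own assumptions, not MAMA's implementation: the max_table_depth name is hypothetical, and tags are found with a regular expression rather than a real HTML parser, so TABLE tags inside comments or scripts would also be counted.

```python
import re

# Matches opening <table ...> and closing </table> tags, in any letter case.
TABLE_TAG = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)


def max_table_depth(html):
    """Return the highest value the TABLE depth counter reaches."""
    depth = 0
    max_depth = 0
    for match in TABLE_TAG.finditer(html):
        if match.group(1):
            # Closing tag: subtract 1 (no check that it matches an open tag).
            depth -= 1
        else:
            # Opening tag: add 1 and record any new high score.
            depth += 1
            max_depth = max(max_depth, depth)
    return max_depth


nested = "<table><tr><td><TABLE border='0'></TABLE></td></tr></table>"
print(max_table_depth(nested))                    # 2
print(max_table_depth("<table><table><table>"))   # 3
```

The second call shows the limitation already noted: unclosed tags still raise the counter, so the result says nothing about whether the tables are properly nested.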

Plug-ins

The Web has multiple elements to handle plug-ins because of simple evolution. At first, there was no standardized way to use plug-ins, so solutions arose haphazardly—APPLET, EMBED, and PARAM. The standards process produced a cohesive solution in the OBJECT element, but authoring inertia seems to indicate that APPLET and EMBED are not going anywhere. Rather than the OBJECT element being used instead of EMBED, the majority of OBJECT tags are used in conjunction with EMBED elements. In all, 503,783 URLs use both EMBED and OBJECT elements (94.5% of all OBJECT and 92.3% of all EMBED instances).

Plugin-related elements
ELEMENT    Frequency
PARAM      576,702
OBJECT     533,343
EMBED      545,734
APPLET     52,160
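
The overlap between EMBED and OBJECT can be checked with a little arithmetic on the published counts. This is a rough sketch with variable names of our own choosing; the "only" figures are derived here and were not reported by MAMA directly.

```python
# Counts of URLs, copied from the table and text above.
object_urls = 533_343   # URLs using OBJECT
embed_urls = 545_734    # URLs using EMBED
both_urls = 503_783     # URLs using both EMBED and OBJECT

share_of_object = round(100 * both_urls / object_urls, 1)
share_of_embed = round(100 * both_urls / embed_urls, 1)

# URLs that use one element without the other.
object_only = object_urls - both_urls
embed_only = embed_urls - both_urls

print(share_of_object, share_of_embed)   # 94.5 92.3
print(object_only, embed_only)           # 29560 41951
```

By this reckoning, fewer than 30,000 URLs use OBJECT entirely on its own.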

Flash usage

MAMA tried to discover evidence of Flash usage in every document it analyzed. It had to look for a number of different signals, as authors can use Flash in many ways. A document was counted as using Flash if it satisfied one or more of the following criteria:

  • Any PARAM element that contained the substrings ".swf" or "flash"
  • Any EMBED[Src] or OBJECT[Data] URL that, when fetched, returned a MIME type containing the substring "flash"
  • Any scripting component containing the substring "flash" or ".swf"

Using these criteria, 1,176,227 URLs were found to be using Flash. This is a MUCH higher result than one would expect by looking solely at the EMBED and OBJECT elements. This means that either some aspect(s) of MAMA's detection mechanism are too relaxed, or that some part of the analysis is flagging a lot of positive matches that EMBED or OBJECT detection alone does not catch. If any part of the above detection is suspect, it is likely to be the scripting detection of Flash (due to the simplistic nature of its substring search). Judging by anecdotal evidence seen over the years, the number is probably pretty accurate; scripting is frequently given the duty of dynamically generating plug-in markup.
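
To see why the substring search is both effective and fragile, consider this minimal sketch of the three Flash criteria. It is an assumption-laden illustration rather than MAMA's code: the looks_like_flash function and its three list parameters are hypothetical, and matching case-insensitively is our choice.

```python
def looks_like_flash(param_texts, mime_types, script_texts):
    """Return True if any of the three substring criteria matches.

    param_texts:  text of each PARAM element in the document
    mime_types:   MIME types returned for EMBED[Src] / OBJECT[Data] URLs
    script_texts: text of each scripting component
    """
    def has(text, needles):
        lowered = text.lower()  # assumption: matching ignores letter case
        return any(needle in lowered for needle in needles)

    return (
        any(has(text, (".swf", "flash")) for text in param_texts)
        or any(has(mime, ("flash",)) for mime in mime_types)
        or any(has(text, (".swf", "flash")) for text in script_texts)
    )


# Markup written dynamically by script is caught by the third criterion.
print(looks_like_flash([], [], ["document.write('<embed src=\"a.swf\">')"]))  # True

# So is an unrelated identifier that happens to contain "flash".
print(looks_like_flash([], [], ["var flashlightOn = true;"]))  # True
```

The second call is the kind of false positive that would inflate the total, while the first is the kind of legitimate match that EMBED and OBJECT detection alone would miss.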

Java usage

As with Flash, there were a number of methods MAMA used to detect Java usage. The following criteria were used to judge whether or not Java was being used in a URL and resulted in the detection of 53,688 matches:

  • Any usage of the APPLET element
  • Any PARAM element that contained the substrings ".class" or "java"
  • Any OBJECT[Data] URL that, when fetched, returned a MIME type containing the substring "java"
  • Any scripting component containing the substring "application/java-vm"

Conclusion

Now that we have spent several weeks looking intensely at HTML's many markup topics (and rightly so), we will next be turning our attention to other important Web page technologies that are vital to address in any examination of the Web. Next week we will look at the details of CSS usage: the whos, whats, wheres, whens, whys, and hows of the way CSS is used.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
