MAMA: Markup report, part 4: Forms, tables, and plug-ins, oh my!
Introduction
In this week's overview we wrap up MAMA's look at markup by covering its most complex structures: forms, tables, and plug-ins. These topics take Web pages from a simple series of text, links, images, and lists to an entirely different level. Forms greatly expand user interaction possibilities. Tables generate axial relationships, which authors have creatively distorted for their most popular (and questionable) use: creating pixel-perfect, grid-based layouts. Plug-ins afford extensibility beyond HTML's stock capabilities. Without any of these features, HTML would be a barren, unexciting markup language. For a deeper look at these areas and more, the following MAMA article topics are also available this week:
Forms
Aside from hyperlinks, forms are the main way in which users interact with the Web. Among their varied critical uses, forms allow people to find things with search engines, publish their thoughts with blogging systems, and buy things on e-commerce sites. Forms in general are very popular—found on up to one-third of all pages analyzed.
Elements used in forms
The popularity of the main types of form elements varies widely, and sometimes surprisingly. For example, almost every FORM has an INPUT, but relatively few make use of TEXTAREA. Such variations may be due to a number of factors, including inherent biases in MAMA's current URL set (a majority of MAMA's URLs are Surface/Home pages, which rarely have forms on them, apart from the increasingly popular search field). The intended use of a Web page often dictates the types of elements used, including form elements.
ELEMENT | Frequency | ELEMENT | Frequency |
---|---|---|---|
FORM | 1,040,771 | TEXTAREA | 36,410 |
INPUT | 1,008,545 | FIELDSET | 31,673 |
SELECT | 285,362 | LEGEND | 18,269 |
OPTION | 281,923 | BUTTON | 11,455 |
LABEL | 159,631 | OPTGROUP | 5,348 |
The FORM element
We will start our look at form elements with their main container element: FORM. It was detected in 1,040,771 of MAMA's URLs. Notice that the Action attribute is used on most of these pages; it specifies what to do with the information the form is collecting. This attribute is required, so the dominance here is understandable. The Method attribute is only slightly less popular than the Action attribute (89.4% of all FORM usage).
Form Attribute | Frequency |
---|---|
Action | 977,934 |
Method | 930,343 |
Name | 570,643 |
Id | 266,886 |
Target | 199,085 |
The Method attribute
Approximately 70% of pages that specify an explicit HTTP Method use the "post" method, while ~46% use the "get" method. This would indicate a clear authoring preference for the "post" method, but there are a few factors to consider. About 15% of the pages specifying the Method attribute use multiple forms on the page that mix both "post" and "get" methods. There are 110,428 URLs that used the FORM element with no Method attribute; "get" is the implied default value in such cases. This brings the relative preferences for Method amongst all FORM usages much closer: 62.2% for "post" and 51.6% for an explicit or implied "get" value.
Method value | Frequency |
---|---|
post | 647,234 |
get | 426,192 |
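The percentages quoted above can be reproduced from the tables with a few lines of arithmetic. The following is a minimal sketch, not MAMA's own code: the variable names are made up for illustration, and the overlap figure assumes that every explicit Method value is either "post" or "get" (invalid values are ignored).

```python
# Figures copied from the FORM attribute and Method value tables.
form_urls = 1_040_771    # URLs containing a FORM element
method_urls = 930_343    # URLs with an explicit Method attribute
post_urls = 647_234      # URLs with at least one Method="post"
get_urls = 426_192       # URLs with at least one Method="get"

# URLs using FORM with no Method attribute; "get" is implied for these.
no_method_urls = form_urls - method_urls

# Assumption: explicit Method values are only ever "post" or "get",
# so URLs mixing both follow from inclusion-exclusion.
mixed_urls = post_urls + get_urls - method_urls


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


post_share_all = pct(post_urls, form_urls)
get_share_all = pct(get_urls + no_method_urls, form_urls)
mixed_share = pct(mixed_urls, method_urls)

print(no_method_urls, mixed_urls, post_share_all, get_share_all, mixed_share)
```

Under that assumption, the sketch yields the 110,428 URLs without a Method attribute and the 62.2% and 51.6% shares given in the text, plus a mixed-method share consistent with the "about 15%" figure.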
The INPUT element
This popular element is used in 96.9% of all documents using forms. With the element's functionality being as overloaded as it is, this popularity is both understandable and expected. Some of its attributes are also very popular.
ELEMENT/Attribute | Frequency | ELEMENT/Attribute | Frequency |
---|---|---|---|
INPUT | 1,008,545 | Maxlength | 329,415 |
Type | 1,005,152 | Alt | 213,924 |
Name | 990,058 | Border | 172,843 |
Value | 947,403 | Checked | 135,049 |
Size | 656,354 | Width | 120,420 |
Src | 335,990 | Height | 119,902 |
The Type attribute
Many of the attributes for the INPUT element are only applicable to specific Type attribute values, so we must examine this attribute's values first.
Attribute value | Frequency | Attribute value | Frequency |
---|---|---|---|
text | 806,926 | radio | 159,626 |
hidden | 733,126 | empty | 110,971 |
submit | 568,445 | checkbox | 81,260 |
image | 337,286 | button | 71,031 |
password | 167,098 | reset | 17,417 |
We can now look more deeply at the various uses of the messy INPUT element:
- The "empty" value indicates that an INPUT element did not have a Type value at all. In such situations, a widget is interpreted as Type="Text". In all, 79,050 URLs used INPUT elements where none of them specified a Type attribute.
- In the early days of forms, "Submit" buttons were usually paired with a "Reset" button, but today, that seems to be passé. By comparison, "Reset" is rarely encountered now.
- The "Submit" and "Image" types: Because "Image" is a type of submittal, and each will often be used to the exclusion of the other, looking at their combined totals shows that submittal is the most popular function of forms (more popular than "Text"). This is actually an expected result.
- The Type="Image" related attributes: Width and Hspace (horizontal dimensions) have just a slight edge over Height and Vspace (vertical dimensions), just like they do with the IMG element.
- The exclusive choice widget, Type="Radio", is twice as popular as the multi-choice Type="Checkbox" widget.
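The "empty" bucket described in the first item above can be illustrated with a small tally of Type values. This is a hypothetical sketch using Python's standard html.parser module, not MAMA's actual implementation. The class and function names are invented for the example, and treating a blank Type value the same as a missing one is an assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class InputTypeCounter(HTMLParser):
    """Tally INPUT Type values, bucketing a missing or blank Type as "empty"."""

    def __init__(self):
        super().__init__()
        self.types = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "input":
            return
        value = dict(attrs).get("type")
        # Assumption: a blank Type is counted like a missing one.
        key = value.strip().lower() if value and value.strip() else "empty"
        self.types[key] += 1


def count_input_types(html):
    """Return a Counter of Type values for every INPUT in the markup."""
    parser = InputTypeCounter()
    parser.feed(html)
    parser.close()
    return parser.types


sample = """
<form action="/search" method="get">
  <input name="q">
  <input type="TEXT" name="city">
  <input type="hidden" name="lang" value="en">
  <input type="submit" value="Go">
</form>
"""
counts = count_input_types(sample)
print(counts)
```

In the sample, the first INPUT has no Type attribute, so it lands in the "empty" bucket even though a browser would render it as a text field.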
Tables
Tables have a bad reputation among the markup purists in the development community, because many authors use them solely for Web page layout. Tables generally increase the complexity of documents and can make them more difficult to maintain. Authors do not really see these factors as significant drawbacks, though, judging by the overwhelming popularity of layout tables in the MAMA result set. In practice, the use of presentational tables by authors is what makes the main table-related elements some of the most popular sub-elements of BODY, after the A and IMG elements. The most frequently occurring of these is the TABLE element, found in 2,894,184 of MAMA's URLs (82.5%). Authors have a definite preference for the table elements they use. Almost every table uses the TABLE, TR, and TD elements. All of the other elements are used rarely by comparison. CAPTION, COL, THEAD, COLGROUP, and TFOOT are all used in less than 1% of TABLE occurrences.
ELEMENT | Frequency | ELEMENT | Frequency |
---|---|---|---|
TABLE | 2,894,184 | CAPTION | 23,306 |
TD | 2,891,972 | COL | 21,775 |
TR | 2,891,205 | THEAD | 21,474 |
TBODY | 364,542 | COLGROUP | 12,225 |
TH | 148,344 | TFOOT | 3,947 |
Attributes of the TABLE element
This wrapper element for table structures is (naturally) the most popular element of its type. It ranks #8 overall in element popularity, used in 82.47% of all MAMA's URLs. Many attributes were detected for this element, only some of which are in the standards. A few of these attributes are very popular with authors: Border, Width, Cellpadding, and Cellspacing are each used in ~90% of all URLs that use tables. Usage of other attributes, like Rules and Frame, barely registers; they are used in less than 0.5% of all TABLE cases.
Attribute | Frequency | Attribute | Frequency |
---|---|---|---|
Border | 2,691,899 | Height | 1,220,050 |
Width | 2,637,117 | Bgcolor | 893,573 |
Cellpadding | 2,585,020 | Bordercolor | 417,650 |
Cellspacing | 2,578,416 | Background | 281,209 |
Align | 1,226,047 | Valign | 87,291 |
The TD and TH elements
These two elements are grouped together because they mostly share the same attributes and have very similar usage. But their usage rates could not be more different. The most popular table sub-element is TD (detected in 2,891,972 URLs), and it is the 9th most popular element overall (used in 82.4% of all URLs in MAMA and 99.9% of all URLs using the TABLE element). The TH sub-element, on the other hand, is used in only 5.1% of URLs using the TABLE element. Because of the inherent attribute overlap between TD and TH, it can be interesting to compare attribute usage rates between the two elements. Percentages of the total element usage are also provided to help cross-comparisons.
TD Attribute | Frequency | % of Element | TH Attribute | Frequency | % of Element |
---|---|---|---|---|---|
TD | 2,891,972 | -- | TH | 148,344 | -- |
Width | 2,324,752 | 80.4% | Valign | 46,799 | 31.6% |
Valign | 2,189,287 | 75.7% | Width | 45,709 | 30.8% |
Align | 1,977,367 | 68.4% | Colspan | 38,587 | 26.0% |
Colspan | 1,711,437 | 59.2% | Align | 35,710 | 24.1% |
Height | 1,672,129 | 57.8% | Scope | 30,111 | 20.3% |
Bgcolor | 1,306,542 | 45.2% | Height | 28,195 | 19.0% |
Rowspan | 901,303 | 31.2% | Bgcolor | 22,406 | 15.1% |
Background | 714,706 | 24.7% | Nowrap | 10,469 | 7.1% |
Nowrap | 353,572 | 12.2% | Rowspan | 6,324 | 4.3% |
How deeply are tables nested?
One of the features requested for MAMA was the ability to detect deeply nested tables. Such structures can be excellent stress tests for a browser. In theory, every TABLE open tag should have a corresponding closing tag. As MAMA traversed a document, any TABLE open tags added 1 to the current depth counter. A closing TABLE tag would subtract 1 from the depth counter. When the depth counter hit a new high score for the document, that value became the new "maximum table depth". This rather simplistic system yielded a number for a document's "maximum table nesting depth". It does not necessarily mean that the open and closing tags are properly nested; that is another issue entirely. The average nesting depth when tables were used was 2.77. The maximum nesting depth discovered was an astounding 745 deep at http://www.artsforeveryone.com/.
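The depth counter described above can be sketched in a few lines. This is an illustrative reconstruction built on Python's standard html.parser module, not MAMA's actual code; the class and function names are hypothetical. Like the system described, it only counts open and closing TABLE tags and makes no attempt to verify that they are properly nested.

```python
from html.parser import HTMLParser


class TableDepthCounter(HTMLParser):
    """Track the running TABLE depth and the highest value it reaches."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # An open tag adds 1; record any new "high score".
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table":
            # A closing tag subtracts 1, whether or not it is well placed.
            self.depth -= 1


def max_table_depth(html):
    """Return the maximum table nesting depth seen in the markup."""
    parser = TableDepthCounter()
    parser.feed(html)
    parser.close()
    return parser.max_depth


nested = (
    "<table><tr><td>"
    "<table><tr><td><table></table></td></tr></table>"
    "</td></tr></table>"
    "<table></table>"
)
print(max_table_depth(nested))
```

Because nothing is validated, a document with unclosed TABLE tags simply keeps counting upward, which is one way very large depth values can arise.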
Plug-ins
The Web has multiple elements to handle plug-ins because of simple evolution. At first, there was no standardized way to use plug-ins, so solutions arose haphazardly: APPLET, EMBED, and PARAM. The standards process produced a cohesive solution in the OBJECT element, but authoring inertia seems to indicate that APPLET and EMBED are not going anywhere. Rather than the OBJECT element being used instead of EMBED, the majority of OBJECT tags are used in conjunction with EMBED elements. In all, 503,783 URLs use both EMBED and OBJECT elements (94.5% of all OBJECT and 92.3% of all EMBED instances).
ELEMENT | Frequency |
---|---|
PARAM | 576,702 |
OBJECT | 533,343 |
EMBED | 545,734 |
APPLET | 52,160 |
Flash usage
MAMA tried to discover evidence of Flash usage in every document it analyzed. It had to resort to looking for a number of different factors, as authors can use Flash in many ways. Its use was detected by satisfying one or more of the following components:
- Any PARAM element that contained the substrings ".swf" or "flash"
- Any MIME types containing the substring "flash" from getting any EMBED[Src] or OBJECT[Data] URLs
- Any scripting component containing the substring "flash" or ".swf"
Using these criteria, 1,176,227 URLs were found to be using Flash. This is a much higher result than one would expect by looking solely at the EMBED and OBJECT elements. This means that either some aspects of MAMA's detection mechanism are too relaxed, or that some part of the analysis is flagging a lot of positive matches that EMBED or OBJECT detection alone does not catch. If any part of the above detection is suspect, it is likely to be the scripting detection of Flash (due to the simplistic nature of its substring search). Even so, judging by anecdotal evidence seen over the years, the number is probably pretty accurate; scripting is frequently given the duty of dynamically generating plug-in markup.
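The three Flash criteria listed above can be approximated with a short sketch. This is not MAMA's implementation, and all names are hypothetical. Instead of fetching EMBED[Src] and OBJECT[Data] URLs, it takes a caller-supplied mime_types mapping from URL to MIME type. It inspects inline SCRIPT content only, which is a narrower reading of "scripting component" than MAMA may have used.

```python
from html.parser import HTMLParser


class FlashDetector(HTMLParser):
    """Collect the reasons a document looks like it uses Flash."""

    def __init__(self, mime_types=None):
        super().__init__()
        # Assumption: MIME types were fetched elsewhere, keyed by URL.
        self.mime_types = mime_types or {}
        self.reasons = set()
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "param":
            # Criterion 1: substring search over the PARAM attribute values.
            text = " ".join(v for v in attrs.values() if v).lower()
            if ".swf" in text or "flash" in text:
                self.reasons.add("param")
        elif tag in ("embed", "object"):
            # Criterion 2: MIME type of the EMBED[Src] or OBJECT[Data] URL.
            url = attrs.get("src" if tag == "embed" else "data")
            if url and "flash" in self.mime_types.get(url, "").lower():
                self.reasons.add("mime")
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Criterion 3: simplistic substring search over inline script text.
        if self._in_script:
            text = data.lower()
            if "flash" in text or ".swf" in text:
                self.reasons.add("script")


def detect_flash(html, mime_types=None):
    """Return the set of matched criteria; non-empty means Flash detected."""
    detector = FlashDetector(mime_types)
    detector.feed(html)
    detector.close()
    return detector.reasons


page = '<script>var movie = "intro.swf";</script>'
print(detect_flash(page))
```

The sample page contains no EMBED or OBJECT element at all yet still matches, which illustrates how script-based detection can raise the total well above the element counts.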
Java usage
As with Flash, there were a number of methods MAMA used to detect Java usage. The following criteria were used to judge whether or not Java was being used in a URL and resulted in the detection of 53,688 matches:
- Any usage of the APPLET element
- Any PARAM element that contained the substrings ".class" or "java"
- Any MIME types containing the substring "java" from getting any OBJECT[Data] URLs
- Any scripting component containing the substring "application/java-vm"
Conclusion
Now that we have spent several weeks looking intensely at HTML's many markup topics (and rightly so), we will next be turning our attention to other important Web page technologies that are vital to address in any examination of the Web. Next week we will look at the details of CSS usage: the whos, whats, wheres, whens, whys, and hows of the way CSS is used.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.