MAMA: Markup report, part 2: Primary functional and structural markup

By Brian Wilson

Introduction

This time we will look at some of the basic document structural elements, the ones that form the backbone of most documents. Some of this week's topics carry so much detail (such as the child elements of the HEAD element) that we can only touch on them briefly here. For a deeper look at these areas and more, see the full MAMA articles published this week.

Some of the topics involved with the HEAD element, such as CSS (the LINK and STYLE elements) and script (the SCRIPT element), will receive much more attention in other articles coming soon.

To read more details of MAMA's findings, check out the MAMA home page.

Frames

The document layout concept for Web pages known as "frames" was first implemented in Netscape 2.0 in 1995. It allows the browser window to be subdivided into any number of rows or columns of smaller windowed documents. The concept has many design and usability problems, yet it is popular enough (and easy enough) that its usage has blossomed over the years. Many authors and designers have a special place of fury in their hearts for frames, the same place where disdain for other reviled constructs like the BLINK and MARQUEE elements lives. Even so, frames enjoy wide deployment "in the wild" and retain a degree of authoring inertia despite the general disfavor. Many authors probably do not care enough about the arguments against frames to use alternatives, or else they are simply not being inventive enough to come up with them. Despite being dropped in XHTML 1.1, frames are not going to go away any time soon.

Usage of Frame-related elements

The almost identical usage numbers for FRAMESET and FRAME are to be expected: neither element does anything useful without the other. The FRAME and IFRAME elements, on the other hand, are not used together very often; only 19,472 of the IFRAME cases (8.8%) also use FRAME. Although IFRAME usage is lower than either the FRAMESET or FRAME totals, its real-world usage is likely much higher, because many Web page ad systems create IFrames dynamically via script.

Top Frame-related elements
ELEMENT     Frequency
FRAMESET    378,033
FRAME       378,107
IFRAME      222,462
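
The overlap percentage quoted above can be reproduced from the table with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the counts are copied from this article.

```python
# URL counts copied from the table and the paragraph above.
FRAME_URLS = 378_107
IFRAME_URLS = 222_462
BOTH_FRAME_AND_IFRAME = 19_472  # URLs using FRAME and IFRAME together


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of IFRAME-using URLs that also use FRAME (the figure in the text).
iframe_overlap_pct = share(BOTH_FRAME_AND_IFRAME, IFRAME_URLS)
# The same overlap seen from the FRAME side.
frame_overlap_pct = share(BOTH_FRAME_AND_IFRAME, FRAME_URLS)

print(iframe_overlap_pct, frame_overlap_pct)
```

Seen from either side, the overlap is small, which supports the observation that the two elements are rarely combined.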

An interesting frame-related attribute: Target

Use of the Target attribute is far greater than the general usage of frames would indicate. It was detected in 2,077,198 of MAMA's URLs, with the A element leading the way at 1,978,018 URLs, more than 3 times the overall use of FRAME and IFRAME. Why is this? Authors are most likely concerned with the frame situation the hyperlinks in their documents will end up in, so they take steps to control it with the Target attribute.

Target attribute frequency
ELEMENT    Frequency
A          1,978,018
FORM       199,085
BASE       159,479
AREA       146,703
LINK       1,585

Popular Target attribute values

The Target attribute can accept a wide variety of values, but it also has four special reserved keywords, all beginning with the underscore character ("_"): "_blank", "_top", "_self" and "_parent". Naturally, these values are the most popular, along with "_new", which is not a reserved keyword in HTML but is widely used as if it were. Values resembling these keywords (such as "blank" or "new") are also very common, as are those which stress the parent-child relationship of frame documents to their content documents (including "main" and "contents", and even German equivalents of the same: "hauptframe" and "inhalt").

Top Target attribute values
Target attribute value    Frequency     Target attribute value    Frequency
_blank                    1,548,594     blank                     43,287
_top                      550,637       mainframe                 31,691
_self                     306,182       google_window             20,905
_parent                   121,225       contents                  18,076
_new                      84,293        hauptframe                15,829
main                      82,075        inhalt                    12,828
new                       52,756        content                   10,316
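
The three groups of values described above (underscore-prefixed values, look-alikes that drop the underscore, and ordinary frame names) can be told apart mechanically. The following is a minimal sketch, not MAMA's actual method; the function name, category labels and sample list are hypothetical.

```python
from collections import Counter

# Underscore-prefixed values listed in the text above.
UNDERSCORE_VALUES = {"_blank", "_top", "_self", "_parent", "_new"}


def classify_target(value: str) -> str:
    """Sort a Target value into one of three illustrative categories."""
    v = value.strip().lower()
    if v in UNDERSCORE_VALUES:
        return "keyword"
    if "_" + v in UNDERSCORE_VALUES:
        # e.g. "blank" or "new": resembles a keyword but lacks the underscore
        return "keyword-lookalike"
    return "frame-name"


# Hypothetical sample; a real survey would feed in every Target value found.
sample_targets = ["_blank", "Blank", "main", "_top", "new", "hauptframe"]
category_counts = Counter(classify_target(t) for t in sample_targets)
print(category_counts)
```

Lowercasing before comparison is an assumption made here so that variants such as "Blank" and "blank" land in the same bucket.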

The HEAD element and its children

HEAD is the most popular element in MAMA's URL set, found in 98.7% of MAMA's URLs. Its top 5 sub-elements are also in the top 20 of ALL markup elements used. This overview will not spend much time on the topic, because many of these child elements belong to very important Web page topics, such as CSS and scripting, that are covered in their own articles.

Top elements in the HEAD block
ELEMENT    Frequency     ELEMENT    Frequency
HEAD       3,464,519     LINK       2,018,510
TITLE      3,459,207     STYLE      1,313,454
META       3,276,347     BASE       266,149
SCRIPT     2,528,823     ISINDEX    63

The META Name and Http-equiv attributes

The META element is a popular way to assign and designate extra information about the document. It accomplishes important authoring tasks that are not possible in any other way, so its use is extremely high. This usage is rather evenly divided between two functional attributes: Http-equiv and Name.

Top META Http-equiv and Name attribute values
Http-equiv attribute value    Frequency     Name attribute value    Frequency
content-type                  2,679,505     keywords                2,170,259
content-language              456,078       description             2,098,529
pragma                        167,801       generator               942,051
refresh                       163,413       robots                  931,622
expires                       163,350       author                  815,415
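
To make the split between the two attributes concrete, here is a rough sketch of how such a tally could be collected for a single document with Python's standard html.parser module. The class name and sample markup are invented for illustration; this is not a description of how MAMA itself gathers its numbers.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaTally(HTMLParser):
    """Count the Http-equiv and Name values seen on META elements."""

    def __init__(self):
        super().__init__()
        self.http_equiv = Counter()
        self.name = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag != "meta":
            return
        for key, value in attrs:
            if value is None:  # attribute written without a value
                continue
            if key == "http-equiv":
                self.http_equiv[value.strip().lower()] += 1
            elif key == "name":
                self.name[value.strip().lower()] += 1


# Hypothetical sample document.
SAMPLE = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<META NAME="Keywords" CONTENT="markup, statistics">
<meta name="description" content="A sample page" />
</head><body></body></html>"""

tally = MetaTally()
tally.feed(SAMPLE)
tally.close()
print(tally.http_equiv, tally.name)
```

Values are lowercased before counting on the assumption that "Keywords" and "keywords" should be treated as the same value, as the lowercase entries in the table suggest.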

Common attributes

There are a number of attributes that are nearly universal in scope and usage with HTML; they can be applied to most, if not all elements. The following sections examine some of these in more detail.

Common attribute usage
Attribute                            Frequency
Name                                 3,220,308
Class                                2,139,184
Style                                1,878,916
Id                                   1,782,769
Event handlers ("on*" attributes)    1,692,823
Title                                1,010,147

The Name and Id attributes

These are two similar attributes that both assign unique identifiers to individual elements. Of the two, Name is encountered more often; it is actually the most popular of all the common attributes (used in some form on 91.8% of MAMA's URLs). The Id attribute is the newer method for uniquely labeling an element, while the Name attribute has considerable historical traction with authors under a variety of different uses.

Top elements using Name and Id attributes, with relative attribute popularities
ELEMENTs using Name    Frequency    % Total element usage     ELEMENTs using Id    Frequency    % Total element usage
META                   2,710,638    82.7%                     DIV                  1,085,482    43.4%
INPUT                  990,058      98.2%                     TABLE                482,760      16.7%
IMG                    875,460      27.2%                     IMG                  471,807      14.7%
PARAM                  576,508      99.97%                    INPUT                372,905      37.0%
FORM                   570,643      54.8%                     A                    319,619      9.7%
A                      485,168      14.7%                     FORM                 266,886      25.6%
MAP                    456,648      99.7%                     TD                   230,312      8.0%
FRAME                  349,820      92.5%                     UL                   192,453      23.8%
SELECT                 275,323      96.5%                     SPAN                 180,553      11.8%
EMBED                  138,809      25.4%                     OBJECT               165,628      31.1%

Name and Id attribute values

There are extreme differences between the most popular values these two attributes carry. The top values of the Name attribute demonstrate their ancestry of specific usage in the popular META, IMG, A, PARAM, and form elements. On the other hand, top values for the Id attribute show a templating or classification behavior akin to the use of the Class attribute. The most frequent Id values also include sequential unique labels for certain categories; for instance, the images in a typical document might all sport successive Id attributes (e.g. "image1", "image2", "image3"...). The full attribute value lists for Name and Id demonstrate these behaviors more clearly than the shorter top 10 lists here are able to do.

Popular values for the Name and Id attributes
Name attribute value    Frequency     Id attribute value    Frequency
keywords                2,189,708     footer                288,061
description             2,100,858     content               228,661
generator               943,496       header                223,726
robots                  937,844       logo                  121,351
author                  818,017       container             119,877
movie                   530,989       main                  106,327
quality                 504,666       table1                101,677
revisit-after           475,765       menu                  96,161
copyright               423,210       layer1                93,920
progid                  281,339       autonumber1           77,350
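
The sequential labelling pattern (values such as "table1", "layer1" or "image1", "image2", "image3") is easy to spot programmatically by splitting each Id into a stem and a trailing number. The sketch below is one possible way to do that; the function name and the exact pattern are assumptions made for illustration.

```python
import re
from collections import defaultdict

# A stem ending in a letter, underscore or hyphen, followed by trailing digits.
_SEQUENTIAL = re.compile(r"^(?P<stem>.*?[A-Za-z_-])(?P<num>\d+)$")


def group_sequential_ids(ids):
    """Group Id values like 'image1', 'image2' under their shared stem."""
    groups = defaultdict(list)
    for value in ids:
        match = _SEQUENTIAL.match(value)
        if match:
            groups[match.group("stem").lower()].append(int(match.group("num")))
    # Sort the numbers so gaps in a sequence are easy to see.
    return {stem: sorted(nums) for stem, nums in groups.items()}


# Hypothetical Id values from a single document.
print(group_sequential_ids(["image1", "image3", "image2", "footer", "table1"]))
```

Values without a trailing number, such as "footer", are simply ignored by this grouping.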

The Class attribute

This attribute offers a degree of categorization and classification not possible with the inherent element semantics of a markup language. The Class attribute allows multiple elements to share the same grouping, and a single element instance can belong to multiple categories. The attribute sees its greatest expression with CSS (which we will cover more later), but the category names themselves that authors assign are interesting to examine.
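
Because a Class value is a whitespace-separated list of names, the many-to-many relationship described above falls out of a simple split. A minimal sketch, with hypothetical element labels and a made-up function name:

```python
from collections import defaultdict


def elements_by_class(elements):
    """Map each class name to the elements that carry it.

    `elements` is an iterable of (element_label, class_attribute_value) pairs.
    """
    members = defaultdict(list)
    for label, class_attr in elements:
        # One attribute value may hold several whitespace-separated names.
        for token in class_attr.split():
            members[token].append(label)
    return dict(members)


# Hypothetical elements: the first belongs to two categories at once.
print(elements_by_class([
    ("p#1", "note warning"),
    ("div#2", "note"),
    ("span#3", ""),
]))
```

Here "note" groups two different elements, while the first element belongs to both "note" and "warning".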

Top elements using Class attribute, with relative attribute popularities
ELEMENT    Frequency    % Total element usage     ELEMENT    Frequency    % Total element usage
A          1,111,526    33.6%                     TABLE      580,281      20.1%
TD         1,082,979    37.5%                     INPUT      438,516      43.5%
SPAN       1,046,840    68.5%                     IMG        320,281      10.0%
DIV        1,031,384    41.3%                     LI         228,422      27.1%
P          736,885      27.3%                     UL         197,729      24.4%
P736,88527.3%UL197,72924.4%

Class attribute values

The most popular Class value, "footer", is twice as popular as its natural companion "header". One noticeable trend in the full Class value list is the large number of class names of the form /style\d+/. The popularity of each such class value decreases as the integer at the end increases. MAMA detected values like this going at least up to "style117" and probably higher. A high (but untested) correlation was noticed between class names of this type and the use of Macromedia Dreamweaver scripting library functions. As Macromedia Dreamweaver is not always the easiest editor to detect, this correlation will remain a theory.

Top values for the Class attribute
Value        Frequency     Value        Frequency
footer       179,528       content      113,951
menu         146,673       title        91,957
style1       138,308       style2       89,851
msonormal    123,374       header       89,274
text         122,911       copyright    86,979
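
The /style\d+/ pattern mentioned above translates directly into a regular expression. The sketch below counts, per trailing integer, how many Class attribute values contain such a name. It assumes the whole class name must match (so "mystyle3" is excluded) and that matching is case-insensitive; neither detail is stated in the text, and the function name is invented.

```python
import re
from collections import Counter

# Assumption: the entire class name is "style" plus digits, in any letter case.
STYLE_N = re.compile(r"style(\d+)", re.IGNORECASE)


def numbered_style_counts(class_attrs):
    """Count attribute values containing 'styleN', keyed by the integer N."""
    counts = Counter()
    for attr in class_attrs:
        # set() so a name repeated within one attribute is counted once.
        for token in set(attr.split()):
            match = STYLE_N.fullmatch(token)
            if match:
                counts[int(match.group(1))] += 1
    return counts


# Hypothetical Class attribute values.
print(numbered_style_counts(
    ["style1 menu", "Style1", "style2", "mystyle3", "style1 style1"]
))
```

Sorting the resulting counter by key would show whether the counts fall off as the integer grows, which is the trend described above.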

Event-handler attributes

As mentioned previously, we will discuss scripting in greater detail soon. For now, we will take a look at the HTML markup portals to scripting: the event-handler attributes. Event handlers were detected in roughly two-thirds of the 2,617,305 MAMA URLs using script. MAMA found 52 unique event-handler attribute names occurring more than 4 times. Each event-handler attribute generally showed its greatest affinity with a single element.

Top event-handler attribute usage
Event handler    Element with highest usage    Frequency when used with element    Total overall attribute frequency
Onmouseover      A                             829,262                             1,051,631
Onmouseout       A                             781,567                             998,854
Onload           BODY                          741,946                             772,567
Onclick          A                             492,092                             684,117
Onchange         SELECT                        158,761                             163,476
Onsubmit         FORM                          151,699                             152,286
Onfocus          INPUT                         146,043                             197,235
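
The pairing of each event handler with its most common element can be sketched for a single document using Python's standard html.parser module. This is an illustration only: it treats any attribute whose name starts with "on" as an event handler, which is an assumption, and the class, function and sample markup are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class EventHandlerTally(HTMLParser):
    """Count (handler attribute, element) pairs in a document."""

    def __init__(self):
        super().__init__()
        self.by_pair = Counter()

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            # Assumption: every "on*" attribute is an event handler.
            if name.startswith("on") and len(name) > 2:
                self.by_pair[(name, tag)] += 1


def top_element_per_handler(by_pair):
    """For each handler, return the (element, count) pair with the highest count."""
    best = {}
    for (handler, tag), count in by_pair.items():
        if handler not in best or count > best[handler][1]:
            best[handler] = (tag, count)
    return best


# Hypothetical sample document.
SAMPLE = (
    '<body onload="init()">'
    '<a href="#" onmouseover="a()" onmouseout="b()">x</a>'
    '<a href="#" onmouseover="c()">y</a>'
    '<img src="i.png" onmouseover="d()">'
    "</body>"
)

handler_tally = EventHandlerTally()
handler_tally.feed(SAMPLE)
handler_tally.close()
top_elements = top_element_per_handler(handler_tally.by_pair)
print(top_elements)
```

Note that this counts attribute occurrences within one document, whereas the table above counts URLs, so the two are not directly comparable.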

Conclusion

Now that we are starting to see the general shape that markup documents take, we should pause to consider what to look at next. The full writeups for this week offer our first real glimpses of what makes most documents tick (especially the thorough treatment of a document's HEAD structure and the use of common attributes). Looking ahead to next week, our natural progression leads us to the elements that most authors use in the BODY section of their documents: images (IMG) and hyperlinks (A). Next week's overview will also start dipping into the bulk of the basic semantic phrase and block markup. See you soon!

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
