MAMA: Key findings

By Brian Wilson

Index:
  1. Introduction
  2. Web servers used
  3. Document structure and size
  4. HTML markup validation
  5. Flash detection
  6. XMLHttpRequest object detection
  7. CSS
  8. Scripting

Introduction

This article provides some of MAMA's most interesting findings, to offer a quick glimpse of what MAMA is capable of and to whet the reader's appetite for the more intricate results found in the rest of the study.

In this study, MAMA examined 3,509,180 URLs in 3,011,668 domains. More details about MAMA's URL set and how it was selected are available in another document.

Web servers used

The eternal tug-of-war continues between the two biggest Web server giants, Apache and Microsoft's IIS. Data from Netcraft at the time of MAMA's analysis details the split as follows:

  • Apache: 50.76%
  • IIS: 35.84%

The balance between these Web servers as represented in MAMA is skewed more in favor of Apache:

  • Apache: 2,011,088 domains (67.72%)
  • IIS: 769,375 domains (25.91%)

MAMA's data on Web server usage is extracted from the Server field of each HTTP Header analyzed.
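
As a rough illustration of how Server header values could be grouped into the two families compared above, here is a minimal Python sketch. It is not MAMA's actual code: the function names `server_family` and `tally_servers` and the substring rules are assumptions made for this example only.

```python
from collections import Counter


def server_family(server_value):
    """Map a raw Server header value to a coarse family name.

    Assumption: simple case-insensitive substring matching is enough
    for a first-pass grouping. It is deliberately coarse.
    """
    if not server_value:
        return "unknown"  # header missing or empty
    value = server_value.lower()
    if "apache" in value:
        return "Apache"
    if "iis" in value:
        return "IIS"
    return "other"


def tally_servers(server_values):
    """Count how many Server header values fall into each family."""
    return Counter(server_family(v) for v in server_values)


if __name__ == "__main__":
    sample = ["Apache/2.2.3 (CentOS)", "Microsoft-IIS/6.0", "nginx/0.5.35", None]
    for family, count in tally_servers(sample).items():
        print(family, count)
```

A real survey would need finer rules than this, since substring matching groups every product whose Server value happens to contain "apache" into one family.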

For more details on HTTP headers, please see the HTTP headers report; you can also find even greater detail in the full HTTP headers results.

Document structure and size

At the heart of the Web are the markup documents themselves. To start with, we will give you the following quick findings:

  • The size of the average primary document in MAMA is almost 16,500 characters.
  • 1,788,294 URLs carried a Doctype statement, just over 50% of all URLs analyzed.
  • HTML Doctypes outnumbered XHTML Doctypes by about 2 to 1.
  • "Transitional" Doctype flavors dominated over their "strict" and "frameset" variants by more than 10 to 1.
  • Approximately 85% of all of MAMA's pages would be rendered in browsers using their "Quirks" modes.

Which HTML elements do authors prefer? Structural ones. Five of the top six elements in MAMA are structural markup, although functional elements are certainly not sitting in the corner weeping. Hyperlinks, images, and tables are popular as well. Hyperlinks (the A element) make the Web into an actual "web" of documents, so its primacy as the top functional element is expected. Likewise, the grandfather of all multimedia elements, IMG, is the second most popular functional element. The overwhelming use of tables will probably disturb some markup purists ... after all, those instances cannot ALL be actual data tables (not even the majority).

Speaking of semantics, markup elements with the lowest MAMA representation are generally phrase elements with defined semantics. Authors do not seem to embrace them in significant numbers. Enough talk; let's get on to the raw numbers. The first table below shows the 10 most popular HTML 4.x elements.

The 10 most popular HTML 4.x elements
Element Frequency   Element Frequency
HEAD 3,464,519 META 3,276,347
TITLE 3,459,207 IMG 3,219,487
HTML 3,452,975 TABLE 2,894,184
BODY 3,452,907 TD 2,891,972
A 3,307,397 TR 2,891,205

Next, we have the 10 least popular HTML 4.x elements.

The 10 least popular HTML 4.x elements
Element Frequency   Element Frequency
TFOOT 3,947 INS 1,344
DFN 3,584 KBD 1,313
MENU 1,906 VAR 1,258
Q 1,785 DEL 1,243
SAMP 1,609 BDO 167

See the full lists of markup elements encountered, and markup attributes found.

HTML markup validation

MAMA ran every single URL it analyzed through the W3C validator; the validator's SOAP response contains a binary true/false result of the validation. A "true" value is considered a successful validation.

MAMA found that 145,009 out of 3,509,180 URLs passed validation, only 4.13%. Even though this ratio shows great improvement over the results of previous validation studies, it is a very worrying figure, and it shows that there is a lot of Web standards education still to be done to raise these levels. The table below shows the trend of improvement between MAMA and previous studies.

Markup validation studies
Study     Date       Total validated   Passed validation   Percentage
Parnas    Dec 2001   2,034,788         14,563              0.71%
Saarsoo   Jun 2006   1,002,350         25,890              2.58%
MAMA      Jan 2008   3,509,180         145,009             4.13%

Another related statistic MAMA uncovered was that, of the number of sites proudly displaying "W3C validation badges", only ~50% of them actually validate. There are likely many reasons for this disparity, but it is obvious that such badges are not effective at representing the current validation state of a page.

For more details on the subject of markup validation, see the Markup validation report, or the full validation study.

Flash detection

The total number of MAMA URLs using the Flash plugin is 1,176,227 (33.5%). Usage of Flash was determined by looking for any of the following items:

  • Any PARAM element containing the substrings ".swf" or "flash"
  • Any EMBED/Src or OBJECT/Data attribute values pointing to content with a MIME type using the substring "flash"
  • Any scripting content with the substring "flash" or ".swf"
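
The three checks listed above can be sketched with Python's standard `html.parser` module. This is only an approximation under stated assumptions, not MAMA's implementation: the class and function names are invented for the example, and the MIME type test uses the `type` attribute declared on the EMBED or OBJECT element as a stand-in, because the sketch does not fetch the referenced content.

```python
from html.parser import HTMLParser

FLASH_MARKERS = (".swf", "flash")


def has_marker(text):
    """True if text contains '.swf' or 'flash', ignoring case."""
    text = (text or "").lower()
    return any(marker in text for marker in FLASH_MARKERS)


class FlashDetector(HTMLParser):
    """Hypothetical detector that applies the three checks to one page."""

    def __init__(self):
        super().__init__()
        self.uses_flash = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "param":
            # Check 1: any PARAM attribute value mentioning .swf or flash.
            if any(has_marker(value) for value in attr_map.values()):
                self.uses_flash = True
        elif tag in ("embed", "object"):
            # Check 2 (assumption): EMBED/Src or OBJECT/Data is present and
            # the element's declared type contains "flash".
            source = attr_map.get("src" if tag == "embed" else "data")
            declared_type = (attr_map.get("type") or "").lower()
            if source and "flash" in declared_type:
                self.uses_flash = True
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Check 3: script content mentioning .swf or flash.
        if self._in_script and has_marker(data):
            self.uses_flash = True


def page_uses_flash(html):
    detector = FlashDetector()
    detector.feed(html)
    detector.close()
    return detector.uses_flash
```

Note that ordinary page text such as "flash sale" does not trigger the sketch; only PARAM attributes, declared EMBED/OBJECT types, and script content are inspected.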

The table below details the incidence of Flash in MAMA's URLs, broken down into the top 20 countries. The usage rates overall are fairly high, as one would expect from a cross-platform plugin with the popularity of Flash. Even at its "worst", usage never dips below 25%: at least 1 in 4 pages uses Flash. Flash usage in some countries can even be considered extraordinary; China wins the prize for highest use at just over 67%, and Turkey comes next at almost 60%. Usage rates by country are typically between 30% and 40%.

URLs using Flash in MAMA's top 20 countries
Country          Total URLs from country   URLs using Flash   % using Flash
United States    1,477,436                 481,250            32.57%
Germany          407,638                   101,914            25.00%
Great Britain    244,554                   74,037             30.27%
France           139,400                   57,968             41.58%
Italy            137,070                   55,270             40.32%
Canada           133,506                   41,316             30.95%
Japan            124,976                   39,674             31.75%
Netherlands      79,562                    29,600             37.20%
Spain            76,421                    35,339             46.24%
Poland           58,929                    24,971             42.37%
Denmark          50,875                    12,888             25.33%
Australia        49,982                    15,069             30.15%
Switzerland      49,683                    13,714             27.60%
Russia           40,790                    13,370             32.78%
Sweden           33,654                    9,321              27.70%
China            31,345                    21,010             67.03%
Czech Republic   26,728                    11,520             43.10%
Austria          24,563                    6,783              27.61%
Norway           21,185                    7,878              37.19%
Turkey           18,621                    11,145             59.85%

There is a further article available providing more detailed information on Plugin usage.

XMLHttpRequest object detection

The XMLHttpRequest DOM object is an important part of AJAX, which facilitates responsiveness and interactivity in Web applications. MAMA detected XMLHttpRequest usage by tokenizing all identifiers in script components and looking for the complete string "XMLHttpRequest" among them. The table later in this section shows the number of MAMA pages using this DOM feature in MAMA's top 20 countries.
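
The idea of matching a complete identifier, rather than a substring, can be sketched in Python as follows. This is a simplified stand-in for MAMA's tokenizer, not a description of it: the regular expression and the function names are assumptions for this example, and the sketch does not distinguish code from comments or string literals.

```python
import re

# Assumption: an identifier is a run of letters, digits, "_" or "$"
# that does not start with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def script_identifiers(source):
    """Return the set of identifier-like tokens found in script source."""
    return set(IDENTIFIER.findall(source))


def uses_xmlhttprequest(source):
    """True only if 'XMLHttpRequest' appears as a complete token."""
    return "XMLHttpRequest" in script_identifiers(source)


if __name__ == "__main__":
    print(uses_xmlhttprequest("var r = new XMLHttpRequest();"))
    print(uses_xmlhttprequest("function myXMLHttpRequestHelper() {}"))
```

Matching whole tokens means that a longer name such as `myXMLHttpRequestHelper`, or the older `Microsoft.XMLHTTP` ActiveX identifier, does not satisfy the condition on its own.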

Overall, XMLHttpRequest was used in 112,277 of MAMA's URLs (3.20% of all its Web pages, or 4.29% of all MAMA's Web pages that used script). Japan showed the lowest usage rate at 0.87%, while Norway (Opera's home country) exhibited the highest at 10.18%. Saarsoo's previous study from June 2006 indicated overall usage of only 1.90%, so there is a significant upward trend in usage of XMLHttpRequest.

URLs using XMLHttpRequest in MAMA's top 20 countries
Country          Total URLs from country   URLs using XMLHttpRequest   % using XMLHttpRequest
United States    1,477,436                 52,640                      3.56%
Germany          407,638                   9,147                       2.24%
Great Britain    244,554                   7,402                       3.03%
France           139,400                   5,129                       3.68%
Italy            137,070                   2,641                       1.93%
Canada           133,506                   4,391                       3.29%
Japan            124,976                   1,092                       0.87%
Netherlands      79,562                    4,101                       5.15%
Spain            76,421                    1,894                       2.48%
Poland           58,929                    1,593                       2.70%
Denmark          50,875                    1,966                       3.86%
Australia        49,982                    1,681                       3.36%
Switzerland      49,683                    1,514                       3.05%
Russia           40,790                    1,219                       2.99%
Sweden           33,654                    1,387                       4.12%
China            31,345                    1,582                       5.05%
Czech Republic   26,728                    771                         2.88%
Austria          24,563                    511                         2.08%
Norway           21,185                    2,157                       10.18%
Turkey           18,621                    864                         4.64%

There are further articles available providing more detailed information on script tokenization; links can be found at this URL.

CSS

CSS is clearly a dominant Web technology, found in 2,821,141 MAMA URLs (80.39%). Several methods are available to authors for using style sheets, and MAMA detected all of them. Let us look at some quick results:

  • Embedded and Inline CSS (via the STYLE element and Style attributes respectively) each had an average length of about 1,000 characters.
  • The average length of external CSS (referenced via the LINK element) is about 8,500 characters.
  • External and Inline CSS were both used in approximately 65% of all CSS cases, while Embedded CSS is found in ~45% of all CSS ... there is obviously some significant overlap between these CSS inclusion methods.

CSS was meant to supplant the previous markup-only attempts at specifying a Web page's look-and-feel. The old methods are perhaps best typified by the use of the FONT element. As of now, CSS usage is more popular than FONT use (FONT was found in 2,061,417 URLs vs. 2,821,141 URLs for CSS). FONT is still often used as a visual fallback: in the majority of cases where it appears, CSS is present as well, with 1,592,546 URLs using both CSS and the FONT element.

With CSS being so prevalent, which properties are the primary choice of page authors? And which properties do they stay away from? The following are popularity lists for the top 10 and bottom 10 CSS properties encountered from CSS 2.1. Control of font characteristics is the dominant use of CSS, while properties with narrow browser support are used the least. The mantra of "if you build it, they will come" may apply here. First, the 10 most popular:

10 most popular CSS 2.1 properties
CSS Property Frequency   CSS Property Frequency
Color 2,400,643 Background-color 1,698,366
Font-size 2,336,689 Width 1,596,974
Font-family 2,223,829 Text-align 1,448,336
Text-decoration 2,113,412 Height 1,428,991
Font-weight 2,012,992 Border 1,376,821

Now, the 10 least popular:

10 least popular CSS 2.1 properties
CSS Property Frequency   CSS Property Frequency
Page-break-inside 5,075 Outline-color 1,653
Caption-side 4,666 Outline-width 1,571
Quotes 2,849 Orphans 1,499
Widows 2,092 Counter-increment 292
Outline-style 1,744 Counter-reset 247

See the CSS properties list for all the CSS properties MAMA detected.

Scripting

Scripting is the other dominant and integral technology used with HTML markup. Some short facts follow:

  • MAMA found at least one form of scripting in 2,617,305 MAMA URLs (74.58%).
  • MAMA detected the 4 main methods of including script in documents. Of these methods, the most popular was embedding code in a SCRIPT element, used in over 88% of all pages using scripting! External scripts and event handler attributes had similar popularity, being used in ~65% of all URLs that used script. JavaScript protocol URLs were found in less than 20% of scripted URLs.
  • The sizes of scripts used in documents varied greatly, depending on the method used. External scripts were typically the largest, averaging ~26,500 characters. Embedded scripts averaged ~2,500 characters, while event-handler attributes and JavaScript protocol URLs each averaged about 500 characters.
  • Finally, scripting was dominated by JavaScript by a WIDE margin, with VBScript relegated to also-ran status. The substring "vbscript" was searched for in opening SCRIPT tags, as well as in any script content. It was found in only 103,485 (3.95%) of the URLs that used scripting.
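
To make the four inclusion methods concrete, here is a hedged Python sketch that reports which of them a page uses. It is an illustration only and does not reproduce MAMA's detection logic: the labels ("embedded", "external", "event-handler", "javascript-url"), the class and function names, the choice of attributes checked for JavaScript protocol URLs, and the rule that any attribute starting with "on" counts as an event handler are all assumptions of this example.

```python
from html.parser import HTMLParser


class ScriptMethodFinder(HTMLParser):
    """Hypothetical finder for the four ways of including script."""

    def __init__(self):
        super().__init__()
        self.methods = set()
        self._in_embedded_script = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            value = (value or "").strip().lower()
            if name.startswith("on") and len(name) > 2:
                # Crude assumption: onclick, onload, etc. are event handlers.
                self.methods.add("event-handler")
            elif name in ("href", "src", "action") and value.startswith("javascript:"):
                self.methods.add("javascript-url")
        if tag == "script":
            if dict(attrs).get("src"):
                self.methods.add("external")
            else:
                self._in_embedded_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_embedded_script = False

    def handle_data(self, data):
        # Only non-empty content of a SCRIPT element without Src counts.
        if self._in_embedded_script and data.strip():
            self.methods.add("embedded")


def script_methods(html):
    finder = ScriptMethodFinder()
    finder.feed(html)
    finder.close()
    return finder.methods
```

Because one page can use several methods at once, the result is a set, which is consistent with the overlapping percentages quoted above.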

There are further articles available providing more detailed information on Scripting syntax and features.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
