MAMA: Key findings
- Introduction
- Web servers used
- Document structure and size
- HTML markup validation
- Flash detection
- XMLHttpRequest object detection
- CSS
- Scripting
Introduction
This article provides some of MAMA's most interesting findings, to offer a quick glimpse of what MAMA is capable of and to whet the reader's appetite for the more intricate results found in the rest of the study.
In this study, MAMA examined 3,509,180 URLs in 3,011,668 domains. More details about MAMA's URL set and how it was selected are available in another document.
Web servers used
The eternal tug-of-war continues between the two biggest Web server giants, Apache and Microsoft's IIS. Data from Netcraft at the time of MAMA's analysis details the split as follows:
- Apache: 50.76%
- IIS: 35.84%
The balance between these Web servers as represented in MAMA is skewed more in favor of Apache:
- Apache: 2,011,088 domains (67.72%)
- IIS: 769,375 domains (25.91%)
MAMA's data on Web server usage is extracted from the Server field of each HTTP header analyzed.
For more details on HTTP headers, please see the HTTP headers report; you can find even greater detail in the full HTTP headers results.
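
As a rough illustration of how Server header values could be tallied into the two families compared above, here is a minimal Python sketch. The substring rules in `server_family`, the function names, and the sample domains are assumptions made for this example; the article does not describe how MAMA actually grouped Server values.

```python
from collections import Counter

def server_family(server_header):
    """Map a raw Server header value to a coarse family name.

    The substring rules below are assumptions for illustration only.
    """
    value = server_header.lower()
    if "apache" in value:
        return "Apache"
    if "microsoft-iis" in value:
        return "IIS"
    return "Other"

def tally_servers(headers_by_domain):
    """Count one Server header value per domain, grouped by family."""
    return Counter(server_family(v) for v in headers_by_domain.values())

# Hypothetical sample: one Server header value per domain.
sample = {
    "example.org": "Apache/2.2.3 (Unix)",
    "example.com": "Microsoft-IIS/6.0",
    "example.net": "Apache",
    "example.edu": "lighttpd/1.4.18",
}
counts = tally_servers(sample)
shares = {family: n / len(sample) for family, n in counts.items()}
```

Counting per domain rather than per URL matches the way the MAMA figures above are reported.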
Document structure and size
At the heart of the Web are the markup documents themselves. To start with, we will give you the following quick findings:
- The size of the average primary document in MAMA is almost 16,500 characters.
- 1,788,294 URLs carried a Doctype statement, just over 50%!
- HTML Doctypes outnumbered XHTML Doctypes by about 2 to 1.
- "Transitional" Doctype flavors dominated over their "strict" and "frameset" variants by more than 10 to 1.
- Approximately 85% of all of MAMA's pages would be rendered in browsers using their "Quirks" modes.
Which HTML elements do authors prefer? Structural ones. Five of the top six elements in MAMA are structural markup, although functional elements are certainly not sitting in the corner weeping. Hyperlinks, images, and tables are popular as well. Hyperlinks (the A element) make the Web into an actual "web" of documents, so the element's primacy as the top functional element is expected. Likewise, the grandfather of all multimedia elements, IMG, is the second most popular functional element. The overwhelming use of tables will probably disturb some markup purists ... after all, those instances cannot ALL be actual data tables (not even the majority).
Speaking of semantics, markup elements with the lowest MAMA representation are generally phrase elements with defined semantics. Authors do not seem to embrace them in significant numbers. Enough talk; let's get on to the raw numbers. The first table below shows the 10 most popular HTML 4.x elements.
Element | Frequency | Element | Frequency
---|---|---|---
HEAD | 3,464,519 | META | 3,276,347
TITLE | 3,459,207 | IMG | 3,219,487
HTML | 3,452,975 | TABLE | 2,894,184
BODY | 3,452,907 | TD | 2,891,972
A | 3,307,397 | TR | 2,891,205
Next, we have the 10 least popular HTML 4.x elements.
Element | Frequency | Element | Frequency
---|---|---|---
TFOOT | 3,947 | INS | 1,344
DFN | 3,584 | KBD | 1,313
MENU | 1,906 | VAR | 1,258
Q | 1,785 | DEL | 1,243
SAMP | 1,609 | BDO | 167
See the full lists of markup elements encountered, and markup attributes found.
HTML markup validation
MAMA ran every single URL it analyzed through the W3C validator; the validator's SOAP response contains a binary true/false result of the validation. A "true" value is considered a successful validation.
MAMA found that 145,009 out of 3,509,180 URLs passed validation, only 4.13%. Even though this ratio is a great improvement over the results of previous validation studies, it is still a very worrying figure, which shows that a lot of Web standards education remains to be done. The table below shows the trend of improvement between MAMA and previous studies.
Study | Date | Total Validated | Passed Validation | Percentage
---|---|---|---|---
Parnas | Dec 2001 | 2,034,788 | 14,563 | 0.71% |
Saarsoo | Jun 2006 | 1,002,350 | 25,890 | 2.58% |
MAMA | Jan 2008 | 3,509,180 | 145,009 | 4.13% |
Another related statistic MAMA uncovered was that, of the sites proudly displaying "W3C validation badges", only ~50% actually validate. There are likely many reasons for this disparity, but it is obvious that such badges are not a reliable indicator of the current validation state of a page.
For more details on the subject of markup validation, see the Markup validation report, or the full validation study.
Flash detection
The total number of MAMA URLs using the Flash plugin is 1,176,227 (33.5%). Usage of Flash was determined by looking for any of the following items:
- Any PARAM element containing the substrings ".swf" or "flash"
- Any EMBED/Src or OBJECT/Data attribute values pointing to content with a MIME type using the substring "flash"
- Any scripting content with the substring "flash" or ".swf"
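
The three checks above can be approximated with Python's standard `html.parser` module. This is only a sketch under stated assumptions, not MAMA's implementation: the class and function names are hypothetical, the MIME type is read from the element's own `type` attribute instead of from the referenced content, and "scripting content" is limited to the text inside SCRIPT elements.

```python
from html.parser import HTMLParser

FLASH_MARKERS = (".swf", "flash")

def _has_marker(text):
    """True if the text contains ".swf" or "flash", ignoring case."""
    lowered = text.lower()
    return any(marker in lowered for marker in FLASH_MARKERS)

class FlashDetector(HTMLParser):
    """Hypothetical detector that approximates the three checks above."""

    def __init__(self):
        super().__init__()
        self.uses_flash = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        declared_type = attr_map.get("type", "").lower()
        if tag == "param" and any(_has_marker(v) for v in attr_map.values()):
            self.uses_flash = True
        # Assumption: the declared type attribute stands in for the MIME
        # type of the content that Src/Data points to.
        elif tag == "embed" and "src" in attr_map and "flash" in declared_type:
            self.uses_flash = True
        elif tag == "object" and "data" in attr_map and "flash" in declared_type:
            self.uses_flash = True
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside SCRIPT elements counts as scripting content here.
        if self._in_script and _has_marker(data):
            self.uses_flash = True

def uses_flash(html):
    detector = FlashDetector()
    detector.feed(html)
    detector.close()
    return detector.uses_flash
```

Because the markers are plain substrings, this kind of check can over-count: any script that merely mentions "flash" is flagged, whether or not the plugin is actually loaded.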
The table below details the incidence of Flash in MAMA's URLs, broken down into the top 20 countries. The usage rates overall are fairly high, as one would expect from a cross-platform plugin with the popularity of Flash. Even at its "worst", it never dips below 25%—at least 1 in 4 pages uses Flash. Flash usage in some countries can even be considered extraordinary; China wins the prize for highest use at just over 67%, and Turkey comes next at almost 60%. Usage rates by country are typically between 30-40%.
Country | Total URLs From Country | # Usage Of Flash | % Usage Of Flash | Country | Total URLs From Country | # Usage Of Flash | % Usage Of Flash
---|---|---|---|---|---|---|---
United States | 1,477,436 | 481,250 | 32.57% | Denmark | 50,875 | 12,888 | 25.33%
Germany | 407,638 | 101,914 | 25.00% | Australia | 49,982 | 15,069 | 30.15%
Great Britain | 244,554 | 74,037 | 30.27% | Switzerland | 49,683 | 13,714 | 27.60%
France | 139,400 | 57,968 | 41.58% | Russia | 40,790 | 13,370 | 32.78%
Italy | 137,070 | 55,270 | 40.32% | Sweden | 33,654 | 9,321 | 27.70%
Canada | 133,506 | 41,316 | 30.95% | China | 31,345 | 21,010 | 67.03%
Japan | 124,976 | 39,674 | 31.75% | Czech Republic | 26,728 | 11,520 | 43.10%
Netherlands | 79,562 | 29,600 | 37.20% | Austria | 24,563 | 6,783 | 27.61%
Spain | 76,421 | 35,339 | 46.24% | Norway | 21,185 | 7,878 | 37.19%
Poland | 58,929 | 24,971 | 42.37% | Turkey | 18,621 | 11,145 | 59.85%
There is a further article available providing more detailed information on Plugin usage.
XMLHttpRequest object detection
The XMLHttpRequest DOM object is an important part of AJAX, which facilitates responsiveness and interactivity in Web applications. MAMA detected XMLHttpRequest usage by tokenizing all identifiers in script components, and looking for the complete string "XMLHttpRequest" to satisfy the condition. The following table shows the number of MAMA pages using this DOM feature in MAMA's top 20 countries.
Overall, XMLHttpRequest was used in 112,277 of MAMA's URLs (3.20% of all its Web pages or 4.29% of all MAMA's Web pages that used script). Japan showed the least usage, while Norway (Opera's home country) exhibited the highest usage rates at 10.18%. Saarsoo's previous study from June 2006 indicated overall usage of only 1.90%, so there is a significant upward trend in usage of XMLHttpRequest.
Country | Total URLs From Country | # Usage Of XMLHttpRequest | % Usage Of XMLHttpRequest | Country | Total URLs From Country | # Usage Of XMLHttpRequest | % Usage Of XMLHttpRequest
---|---|---|---|---|---|---|---
United States | 1,477,436 | 52,640 | 3.56% | Denmark | 50,875 | 1,966 | 3.86%
Germany | 407,638 | 9,147 | 2.24% | Australia | 49,982 | 1,681 | 3.36%
Great Britain | 244,554 | 7,402 | 3.03% | Switzerland | 49,683 | 1,514 | 3.05%
France | 139,400 | 5,129 | 3.68% | Russia | 40,790 | 1,219 | 2.99%
Italy | 137,070 | 2,641 | 1.93% | Sweden | 33,654 | 1,387 | 4.12%
Canada | 133,506 | 4,391 | 3.29% | China | 31,345 | 1,582 | 5.05%
Japan | 124,976 | 1,092 | 0.87% | Czech Republic | 26,728 | 771 | 2.88%
Netherlands | 79,562 | 4,101 | 5.15% | Austria | 24,563 | 511 | 2.08%
Spain | 76,421 | 1,894 | 2.48% | Norway | 21,185 | 2,157 | 10.18%
Poland | 58,929 | 1,593 | 2.70% | Turkey | 18,621 | 864 | 4.64%
There are further articles available providing more detailed information on script tokenization - links can be found at this URL.
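
The identifier-based detection described in this section can be sketched in a few lines of Python. The regular expression below is a simplified stand-in for a real script tokenizer (it does not skip string literals or comments), and the function names are hypothetical; the point is only that the complete identifier must match, not a substring of a longer name.

```python
import re

# Simplified identifier pattern; an assumption, not MAMA's tokenizer.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def script_identifiers(script_source):
    """Return the set of identifier-like tokens found in a script."""
    return set(IDENTIFIER.findall(script_source))

def uses_xmlhttprequest(script_source):
    """True only when the complete token "XMLHttpRequest" appears."""
    return "XMLHttpRequest" in script_identifiers(script_source)

# Hypothetical script fragments for illustration.
direct_use = uses_xmlhttprequest("var req = new XMLHttpRequest();")
longer_name = uses_xmlhttprequest("var myXMLHttpRequestWrapper = 1;")
activex_only = uses_xmlhttprequest('new ActiveXObject("Microsoft.XMLHTTP");')
```

Matching whole tokens means a script that reaches the same functionality only through an ActiveX object name would not be counted by this check.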
CSS
CSS is clearly a dominant Web technology, found in 2,821,141 MAMA URLs (80.39%). Several methods are available to authors for using style sheets, and MAMA detected all of them. Let us look at some quick results:
- Embedded and Inline CSS (via the STYLE element and Style attributes respectively) each had an average length of about 1,000 characters.
- The average length of external CSS (referenced via the LINK element) is about 8,500 characters.
- External and Inline CSS were both used in approximately 65% of all CSS cases, while Embedded CSS is found in ~45% of all CSS ... there is obviously some significant overlap between these CSS inclusion methods.
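
To make the three inclusion methods concrete, here is a minimal Python sketch that labels which of them a page uses. The class name, the method labels, and the rule that a LINK element counts only when its `rel` value contains `stylesheet` are assumptions for this example; the article does not spell out MAMA's exact criteria.

```python
from html.parser import HTMLParser

class CssMethodDetector(HTMLParser):
    """Hypothetical detector for embedded, inline, and external CSS."""

    def __init__(self):
        super().__init__()
        self.methods = set()

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "style":
            self.methods.add("embedded")   # STYLE element
        if "style" in attr_map:
            self.methods.add("inline")     # Style attribute on any element
        rel_tokens = attr_map.get("rel", "").lower().split()
        if tag == "link" and "stylesheet" in rel_tokens:
            self.methods.add("external")   # LINK to an external style sheet

def css_methods(html):
    detector = CssMethodDetector()
    detector.feed(html)
    detector.close()
    return detector.methods

# Hypothetical page that uses all three methods at once.
page = (
    '<link rel="stylesheet" href="site.css">'
    "<style>p { color: red; }</style>"
    '<p style="font-weight: bold">Hello</p>'
)
found = css_methods(page)
```

A single page can land in all three sets, which is why the per-method percentages above add up to well over 100%.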
CSS was meant to supplant the previous markup-only attempts at specifying a Web page's look-and-feel. The old methods are perhaps best typified by the use of the FONT element. As of now, CSS usage is more popular than FONT use (FONT was used 2,061,417 times vs. 2,821,141 times for CSS). FONT is still used often as a visual fallback in the majority of cases - 1,592,546 URLs used both CSS and the FONT element.
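
The overlap between the two approaches follows from the three figures above by simple set arithmetic. The sketch below assumes that the FONT figure counts URLs, like the CSS figure does; the variable names are illustrative.

```python
# Figures quoted in this section (assumed to be URL counts).
css_urls = 2_821_141    # URLs using CSS
font_urls = 2_061_417   # URLs using the FONT element
both_urls = 1_592_546   # URLs using both

# Share of FONT-using URLs that also use CSS.
font_with_css_share = both_urls / font_urls

# Inclusion-exclusion: URLs using at least one of the two approaches.
either_urls = css_urls + font_urls - both_urls

# URLs relying on only one approach.
font_only_urls = font_urls - both_urls
css_only_urls = css_urls - both_urls
```

Under that assumption, a little over three quarters of the URLs that use FONT also use CSS, which is consistent with the "majority of cases" remark above.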
With CSS being so prevalent, which properties are the primary choice of page authors? And which properties do they stay away from? The following are popularity lists for the top 10 and bottom 10 CSS properties encountered from CSS 2.1. Control of font characteristics is the dominant use of CSS, while properties with narrow browser support are used the least. The mantra of "if you build it, they will come" may apply here. First, the 10 most popular:
CSS Property | Frequency | CSS Property | Frequency
---|---|---|---
Color | 2,400,643 | Background-color | 1,698,366
Font-size | 2,336,689 | Width | 1,596,974
Font-family | 2,223,829 | Text-align | 1,448,336
Text-decoration | 2,113,412 | Height | 1,428,991
Font-weight | 2,012,992 | Border | 1,376,821
Now, the 10 least popular:
CSS Property | Frequency | CSS Property | Frequency
---|---|---|---
Page-break-inside | 5,075 | Outline-color | 1,653
Caption-side | 4,666 | Outline-width | 1,571
Quotes | 2,849 | Orphans | 1,499
Widows | 2,092 | Counter-increment | 292
Outline-style | 1,744 | Counter-reset | 247
See the CSS properties list for all the CSS properties MAMA detected.
Scripting
Scripting is the other dominant and integral technology used with HTML markup. Some short facts follow:
- MAMA found at least one form of scripting in 2,617,305 MAMA URLs (74.58%).
- MAMA detected the 4 main methods of including script in documents. Of these methods, the most popular was embedding code in a SCRIPT element, used in over 88% of all pages using scripting! External scripts and event handler attributes had similar popularity, being used in ~65% of all URLs that used script. JavaScript protocol URLs were found in less than 20% of scripted URLs.
- The sizes of scripts used in documents varied greatly, depending on the method used. External scripts were typically the largest, averaging ~26,500 characters. Embedded scripts averaged ~2,500 characters, while event-handler attributes and JavaScript protocol URLs each averaged about 500 characters.
- Finally, the use of scripting was dominated by JavaScript by a WIDE margin, with VBScript relegated to also-ran status. The substring "vbscript" was searched for in opening SCRIPT tags, as well as in any script content. It was found in only 103,485 (3.95%) of the URLs that used scripting.
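
The four inclusion methods and the "vbscript" substring check listed above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the class name, the method labels, treating any attribute whose name starts with "on" as an event handler, and limiting "script content" to the text inside SCRIPT elements.

```python
from html.parser import HTMLParser

class ScriptMethodDetector(HTMLParser):
    """Hypothetical detector for the four script inclusion methods."""

    def __init__(self):
        super().__init__()
        self.methods = set()
        self.mentions_vbscript = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "script":
            self._in_script = True
            if "src" in attr_map:
                self.methods.add("external")
            # Substring check on the opening SCRIPT tag's attribute values.
            if any("vbscript" in v.lower() for v in attr_map.values()):
                self.mentions_vbscript = True
        for name, value in attr_map.items():
            if name.startswith("on"):
                self.methods.add("event_handler")
            if value.strip().lower().startswith("javascript:"):
                self.methods.add("javascript_url")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Non-empty text inside a SCRIPT element counts as embedded script.
        if self._in_script and data.strip():
            self.methods.add("embedded")
            if "vbscript" in data.lower():
                self.mentions_vbscript = True

def analyze_scripting(html):
    detector = ScriptMethodDetector()
    detector.feed(html)
    detector.close()
    return detector

# Hypothetical page that uses all four methods.
page = (
    '<script src="lib.js"></script>'
    "<script>var x = 1;</script>"
    '<a href="javascript:void(0)" onclick="go()">x</a>'
)
result = analyze_scripting(page)
vb_result = analyze_scripting('<script language="VBScript">MsgBox "hi"</script>')
```

Since one page can use several methods at once, the per-method percentages in the list above are not expected to sum to 100%.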
There are further articles available providing more detailed information on Scripting syntax and features.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.