MAMA: Scripting syntax
- Previous article—MAMA: Scripting - quantities and sizes
- Next article—MAMA: Script identifier tokenization
- Table of contents
Index:
- Introduction
- External script file names
- External script MIME types
- JavaScript function names
- DHTML menu/library usage
- Scripts dynamically creating/writing other technologies
- Mentioning specific browsers in script content
- Other miscellaneous script detections
Introduction
Scripting was detected in 2,617,305 of MAMA's URLs (74.58%), coming from 4 different sources: JavaScript URLs, event-handler attributes, external scripts and embedded scripts. A number of strategies were employed to extract information out of the scripting content that MAMA explored. Substring matching was used, in addition to regular expressions and complex scripting language tokenization. This last method was added fairly late to MAMA's analysis, but it has been an excellent way to discover various factors that were not otherwise available through the simpler substring/regular expression processes.
External script file names
This item of scripting metadata was originally requested to allow easier tracking of JavaScript/DHTML code libraries. A quick survey of authors' use of script libraries shows that they usually do not alter the filenames. Looking at the full frequency table of external script file names, one can easily pick out the names of common script libraries: Prototype, Scriptaculous, Lightbox, Milonic ... it goes on and on. Also noticeable in the list are a number of scripts used for a variety of purposes. Google's Urchin tracking script "urchin.js" is the most popular script name by more than a factor of 2, while their ad-syndication script "show_ads.js" comes in second place. A variety of Adobe/Macromedia scripts can also be found in this list. If you examine file name data and compare them to the frequency tables for script function names (below), it is easy to uncover direct relationships between the two. Of course, often a script is just created from scratch by the page author, and they often do nothing to disguise that fact—external scripts using the file names "script.js" and "scripts.js" are the most popular "obvious" file names found.
File name | Frequency | % Using external script | File name | Frequency | % Using external script |
---|---|---|---|---|---|
urchin.js | 383,870 | 23.25% | init.js | 21,858 | 1.32% |
show_ads.js | 178,697 | 10.82% | script.js | 21,849 | 1.32% |
counter.js | 71,294 | 4.32% | scripts.js | 21,301 | 1.29% |
AC_RunActiveContent.js | 60,428 | 3.66% | getcod.cgi | 20,099 | 1.22% |
menu.js | 47,936 | 2.90% | hb.js | 19,866 | 1.20% |
swfobject.js | 43,751 | 2.65% | global.js | 19,655 | 1.19% |
mm_menu.js | 35,992 | 2.18% | functions.js | 19,503 | 1.18% |
prototype.js | 31,162 | 1.89% | code-end.js | 19,416 | 1.18% |
rollover.js | 28,215 | 1.71% | code-start.js | 19,416 | 1.18% |
common.js | 26,297 | 1.59% | code-middle.js | 19,416 | 1.18% |
animate.js | 23,656 | 1.43% | lycosRating.js.php | 19,414 | 1.18% |
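As a rough illustration of how a file-name frequency table like this could be built, the sketch below reduces each script URL to its last path segment and counts each name once per page. The helper names and the once-per-page counting rule are assumptions made for the example; the article does not describe MAMA's actual implementation.

```python
from collections import Counter
from urllib.parse import urlsplit
import posixpath


def script_file_name(src):
    """Return the last path segment of a script URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(src).path)


def tally_file_names(srcs_by_page):
    """Count each file name once per page URL (an assumed counting rule)."""
    counts = Counter()
    for srcs in srcs_by_page.values():
        names = {script_file_name(s) for s in srcs}
        names.discard("")  # a URL ending in "/" has no file name
        counts.update(names)
    return counts


pages = {
    "http://a.example/": ["http://www.google-analytics.com/urchin.js",
                          "/js/urchin.js?v=2"],
    "http://b.example/": ["urchin.js", "scripts/menu.js#x"],
}
counts = tally_file_names(pages)
```

Here "urchin.js" is counted twice (once per page) even though the first page references it from two locations.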
External script MIME types
MAMA tracked the returned MIME type of external scripts that were downloaded. It did not trust any explicit values for the Type attribute (if present) for this information; it relied solely upon the information received when actually fetching the external resource. The value "application/x-javascript" is the most popular by more than a factor of two over the second-place MIME type, "text/javascript". At over 10% of the external script population, the MIME type "text/html" is surprising in its popularity. One hopes that this is the result of misconfigured servers and not Web page 404-redirects; there is currently very little MAMA can do to tell the difference. In all, ~800 scripts reported a VBScript MIME type, although MAMA found over 100,000 cases of the keyword "vbscript" in both embedded and external scripts. While there are some believable scenarios where VBScript could be used in embedded form 10-to-1 compared to external scripts, the overall ratio of external to embedded scripts does not support this outcome. It is very possible that a number of servers are not delivering external VBScript files with the correct MIME type. As with the external CSS MIME case, many other MIME types were also observed in the full MIME frequency table.
MIME type | Frequency | % Using external script | MIME type | Frequency | % Using external script |
---|---|---|---|---|---|
application/x-javascript | 1,282,922 | 77.69% | none | 4,242 | 0.26% |
text/javascript | 559,688 | 33.89% | application/octetstream | 1,222 | 0.07% |
text/html | 176,863 | 10.71% | text/js | 1,051 | 0.06% |
text/plain | 16,684 | 1.01% | text/vbscript | 797 | 0.05% |
application/javascript | 12,522 | 0.76% | text/css | 461 | 0.03% |
application/octet-stream | 8,420 | 0.51% | image/gif | 313 | 0.02% |
text/x-js | 6,283 | 0.38% | texthtml | 298 | 0.02% |
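A minimal sketch of tallying MIME types from the Content-Type header of each fetched script follows. Treating a missing header as "none", stripping parameters such as charset, and lowercasing the value are assumptions made for illustration, as are the function names. Malformed values such as "texthtml" are deliberately left as the server sent them.

```python
from collections import Counter


def mime_type(content_type):
    """Reduce a Content-Type header value to a bare, lowercased MIME type."""
    if not content_type:
        return "none"  # assumed label for a missing header
    bare = content_type.split(";", 1)[0].strip().lower()
    return bare or "none"


def tally_mime_types(header_values):
    """Count responses per normalized MIME type."""
    return Counter(mime_type(v) for v in header_values)


tally = tally_mime_types([
    "application/x-javascript",
    "Text/JavaScript; charset=UTF-8",
    None,
    "text/javascript",
])
```

Note that this counts individual responses, whereas the percentages in the table are per URL, which is why they can add up to more than 100%.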
JavaScript function names
MAMA tracked the function names declared in script code. For a number of reasons, library scripts are fairly easy to pick out in the full frequency list, especially near the top:
- Common script libraries are used by many different URLs, but they use the same function naming schemes, and often the same external script file name.
- These libraries almost always have the same or similar frequency rates, so they cluster together in the list for easier detection.
- Because of their proximities in the list, function naming schemes used by libraries stand out.
Top scripting libraries detected by function name
To see script library activity in action, we can look at the top 75 entries in the function name list (cutoff value chosen to demonstrate the proximity effect of libraries in the list):
- The most popular values are Macromedia-related (function names prefixed by "MM_"). The first two have similar frequencies, and the next pair have similar frequencies as well.
- Google's Urchin tracker comes next, with 29 of the top 75 spots, all with VERY similar frequencies (all within the range of 384,000-394,000 times each). The function names are prefixed with "__utm" or "_u". Not coincidentally, an external script file name "urchin.js" was found 383,870 times.
- Google's ad-syndication platform is also well represented in the function name list. The function names are all very compact, typically 1-2 letters long. The entire code for this ad-syndication script is also compacted, with no linefeeds or extra spacing. These function names are all adjacent in the frequency list, being used 160,000-185,000 times. It is again no coincidence that the external script file name "show_ads.js" was used 178,697 times.
- The following image control/rollover effect functions are very popular and all seem to be related, based on their similar naming schemes and proximities in the frequency list: changeImages (66,867), preloadImages (62,570), newImage (60,512).
- Adobe's "Active Content" controls Flash instances in Web pages. These 5 "Active Content" functions have names prefixed by "AC_" and occurred between 60,000 and 64,000 times in MAMA. A corresponding external script with the name "AC_RunActiveContent.js" was found 60,428 times and is no doubt related to these instances.
- 2 adjacent entries read and write browser cookies: getCookie and setCookie.
- In the top 75, two function names (hideMenu and Menu) can be found, but if you go below position 75 you can find many more functions obviously relating to menus.
This is just a small sample; a number of other unique prefixes are noticeable by glancing further down the frequency list—Adobe GoLive has many functions prefixed by "CS" (after finding 100 such unique function names, I stopped counting). Functions common to Lycos/Angelfire/Tripod scripts were well represented with the common prefixes "lhb_" (17 times), "LR_" (18 times) and "lycos_" (11 times).
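The prefix clustering described above can be approximated with a short sketch. It pulls declared function names out of script text with a regular expression and groups the distinct names by an underscore-terminated prefix such as "MM_", "AC_" or "lhb_". The regular expressions and helper names are assumptions for illustration; MAMA's own extraction may differ, and this simple rule would not catch prefixes like "__utm" that have no trailing underscore.

```python
import re
from collections import Counter

# Matches "function name(" declarations only, not anonymous functions.
FUNC_DECL = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")


def declared_functions(script):
    """Return the names of functions declared in the script text."""
    return FUNC_DECL.findall(script)


def prefix_of(name):
    """Return a leading 'letters_' prefix, or None if there is none."""
    m = re.match(r"[A-Za-z]+_", name)
    return m.group(0) if m else None


def prefix_counts(names):
    """Count distinct function names per prefix."""
    prefixes = (prefix_of(n) for n in set(names))
    return Counter(p for p in prefixes if p is not None)


sample = (
    "function MM_swapImage() {}\n"
    "function MM_preloadImages() {}\n"
    "function getCookie(name) {}\n"
    "var f = function () {};"
)
names = declared_functions(sample)
```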
DHTML menu/library usage
This part of MAMA's research began as a desire to locate real-life examples of specific popular DHTML menu systems and libraries so that we could test their functionality in Opera and investigate various issues. I worked with a colleague to identify 1 or 2 substrings from each of these menus/libraries that would uniquely distinguish them from other JavaScript code. Every effort was made to guarantee that the patterns were distinctive, but the criteria used may not be totally reliable. There can, of course, always be the occasional false positive, and future versions of these script libraries may alter some of the (currently) unique criteria that MAMA seeks.
MAMA detected 1,084,593 URLs using at least 1 of the following DHTML Menus or Libraries. In the URLs where these systems were detected, over 60% used the Macromedia functions, while over 1/3 used Google's Urchin tracking system. By comparison, all of the other code libraries were used far less often.
Note: All of the search criteria are case-sensitive regular expressions.
DHTML menu/library name | Search criteria (regexp) | Frequency |
---|---|---|
Macromedia functions from Dreamweaver/Fireworks | Script: / MM_/ | 682,019 |
Google Analytics/Urchin Tracker | Script: /function\s+urchinTracker/ Filename: /^urchin\.js$/ | 384,756 |
Prototype JavaScript Framework | Script: /var\s+Prototype\s+=\s+{\s+Version:\s+/ | 31,423 |
Omniture/SiteCatalyst Analytics | Script: /SiteCatalyst/ , /Omniture/ Filename: /^s_code\.js$/ | 18,468 |
jQuery Library | Script: /jQuery./ Filename: /^jquery.*?\.js$/ | 17,027 |
Dynamic Drive HV Menu | Script: /MbrSetUp/ , /ChildVerticalOverlap/ | 15,111 |
Milonic DHTML Menu | Script: /closeMenusByArray/ , /milonic/ | 13,585 |
WebSideStory/HitBox Analytics | Script: /function\s+_hbEvent/ Filename: /^hbx\.js$/ | 10,963 |
Yahoo! UI Library (YUI) | Script: /YAHOO.namespace/ | 7,953 |
Jupitermedia HierMenus | Script: /HM_/ | 7,631 |
Likno AllWebMenus | Script: /awmCreateMenu/ | 5,705 |
OpenCube QuickMenu Pro | Script: /dqm__/ , /DQM_/ | 4,837 |
Dan Steinman's DynAPI | Script: /dynapi/ | 3,471 |
TinyMCE Text Editor | Script: /tinyMCE./ Filename: /tiny_mce\.js/ | 3,432 |
Ultimate Drop Down Menu | Script: /um.menuClasses/ , /\/\/UDMv3/ | 3,334 |
xFx Menu | Script: /dmbtbB/ , /rjsPath/ | 2,490 |
Siteexpert/Xtreeme Menu | Script: /m1.bIncBorder/ | 2,044 |
Freestyle Menu (Angus Turnbull) | Script: /FSMenu.prototype/ | 1,770 |
Cascading Popup Menu (Angus Turnbull) | Script: /PopupMenu.prototype/ | 840 |
MochiKit Library | Script: /MochiKit.MochiKit/ | 248 |
Dojo JavaScript Toolkit | Script: /dojo.js/ | 220 |
Tree Menu | Script: /MTMOutputString/ | 110 |
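To make the search criteria concrete, here is a small sketch that applies a few of the patterns from the table to a page's script text and external script file names. It assumes that a library counts as detected when any one of its script or file name patterns matches. The table does not spell out how the criteria were combined, so that rule and the function name are assumptions.

```python
import re

# A subset of the criteria from the table; matching is case-sensitive.
LIBRARIES = {
    "Macromedia functions": {
        "script": [r" MM_"], "filename": []},
    "Google Analytics/Urchin Tracker": {
        "script": [r"function\s+urchinTracker"], "filename": [r"^urchin\.js$"]},
    "Milonic DHTML Menu": {
        "script": [r"closeMenusByArray", r"milonic"], "filename": []},
    "WebSideStory/HitBox Analytics": {
        "script": [r"function\s+_hbEvent"], "filename": [r"^hbx\.js$"]},
}


def detect_libraries(script_text, file_names):
    """Return the set of library names whose criteria match (any-match rule)."""
    found = set()
    for name, rules in LIBRARIES.items():
        in_script = any(re.search(p, script_text) for p in rules["script"])
        in_files = any(re.search(p, f)
                       for p in rules["filename"] for f in file_names)
        if in_script or in_files:
            found.add(name)
    return found
```

For example, `detect_libraries("", ["urchin.js"])` reports the Urchin tracker from the file name alone, while a lowercase "mm_menu" in script text does not trigger the case-sensitive / MM_/ pattern.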
Scripts dynamically creating/writing other technologies
During MAMA's development process, a number of URL examples exhibited behaviors that appeared to be distressingly common. So common, in fact, that it seemed imperative for MAMA to measure just how frequently it was happening in the wild. Scripts have the ability to dynamically add markup and code to a document, and some even go so far as to dynamically create other scripts. Full script parsing and execution would be necessary to track down, detect, and analyze ALL of these cases, but MAMA is not able to do that in the current version. Instead, MAMA settled for detecting situations where external dependencies are dynamically written in order to gauge the relative importance of this type of behavior. MAMA's discovery that as many as 25% of the URLs using scripting matched its rudimentary "Script writing a Script" criteria definitely warrants future investigation!
Dynamically created CSS and Frames occurrences were much less frequent than the script->script case. All of the checks used simple regular expression substring matches, but in the script->script instance, MAMA added an additional detection in the JavaScript tokenization routine, looking for adjacent quoted string tokens joined by JavaScript's "+" operator. A simple analysis then looked for aggregate strings satisfying MAMA's search criteria.
Scenario | What was detected | Frequency | % Total script usage |
---|---|---|---|
Script writing Script | Substring/Regexp: /<scr/ or parsed JS String tokens containing: /<script/ && /\ssrc\s*=/ | 675,902 | 25.82% |
Script writing CSS | Substring/Regexp: /rel=[\'\"]?stylesheet/ | 95,066 | 3.63% |
Script writing Frames | Substring/Regexp: /<frameset/ | 14,840 | 0.57% |
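The string-joining check for the script->script case can be sketched as follows. This is not MAMA's tokenizer: a regular expression stands in for real tokenization, so comments and regex literals are not handled, and the function names are hypothetical. It finds runs of quoted strings joined by "+", concatenates them, and tests the result against the two patterns from the table.

```python
import re

# One single- or double-quoted JavaScript string literal (no line breaks).
STRING_LITERAL = r"'(?:[^'\\\n]|\\.)*'" + "|" + r'"(?:[^"\\\n]|\\.)*"'
PIECE = re.compile(STRING_LITERAL)
# A run of string literals joined only by "+" operators.
CHAIN = re.compile(r"(?:%s)(?:\s*\+\s*(?:%s))*" % (STRING_LITERAL, STRING_LITERAL))


def joined_strings(script):
    """Concatenate each run of adjacent string literals joined by '+'."""
    joined = []
    for chain in CHAIN.finditer(script):
        parts = PIECE.findall(chain.group(0))
        joined.append("".join(p[1:-1] for p in parts))  # strip the quotes
    return joined


def writes_external_script(script):
    """True if an aggregate string contains both <script and a src attribute."""
    return any(re.search(r"<script", s) and re.search(r"\ssrc\s*=", s)
               for s in joined_strings(script))


code = 'document.write("<scr" + "ipt src=\'a.js\'><\\/scr" + "ipt>");'
```

Splitting the tag as "<scr" + "ipt" defeats a plain /<script/ substring search, which is why the aggregate string is checked instead.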
Mentioning specific browsers in script content
This feature began as a generic question many at Opera had: "How many authors write their Web pages with Opera in mind?". Opera already had evidence that some authors make use of browser-specific workarounds, and this is especially true of scripting. For a simple answer to this question, MAMA detected the use of browser name keywords (case-insensitively)—these were expected to be unique enough to give a good idea of how many authors were considering specific browsers in their development. MAMA's approach searched against all scripting content, including script comments. This method does not give 100% reliable numbers—it can be fairly easy for simple keyword matching to give false-positives, after all. The choice of the keywords used was expected to reveal true browser name mentions in the majority of cases.
It turns out that the most difficult of all the browsers to detect in script is Opera, because authors generally refer to Opera with the single "opera" keyword. This keyword can also match "operator"; for example, about 25,000 of MAMA's URLs used the keywords "operator" or "operators". Authors also typically use a single keyword with Safari, but this is not a problem since "safari" is not a substring of any other common word (well, that I know of, anyway).
Browser | Keywords | Frequency | % Total script usage |
---|---|---|---|
Microsoft Internet Explorer | "Internet Explorer", "MSIE" | 916,306 | 35.01% |
Opera | "Opera" | 766,274 | 29.28% |
Mozilla Firefox | "Mozilla", "Gecko", "Firefox" | 475,628 | 18.17% |
Apple Safari | "Safari" | 279,946 | 10.70% |
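The keyword approach and its "operator" false positive can be shown in a few lines. The first function is a sketch of case-insensitive substring matching using the keywords from the table. The second, `mentions_opera_strict`, is a hypothetical refinement that is not part of MAMA: it rejects "opera" when another letter follows, which removes "operator" but would also miss identifiers such as "operaVersion".

```python
import re

BROWSER_KEYWORDS = {
    "Microsoft Internet Explorer": ["Internet Explorer", "MSIE"],
    "Opera": ["Opera"],
    "Mozilla Firefox": ["Mozilla", "Gecko", "Firefox"],
    "Apple Safari": ["Safari"],
}


def browsers_mentioned(script):
    """Case-insensitive substring match of each browser's keywords."""
    lowered = script.lower()
    return {browser for browser, keywords in BROWSER_KEYWORDS.items()
            if any(k.lower() in lowered for k in keywords)}


def mentions_opera_strict(script):
    """Hypothetical stricter check: 'opera' not followed by another letter."""
    return re.search(r"opera(?![a-z])", script, re.IGNORECASE) is not None
```

With this sketch, `browsers_mentioned("var operator = '+';")` reports Opera, while `mentions_opera_strict` does not.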
Other miscellaneous script detections
Many of the items here are detections added to satisfy special requests from those at Opera who needed to quickly gather statistics on script usage. There used to be many more of these simple checks, but with the advent of MAMA's newer basic JavaScript tokenizer, they became redundant and were removed. The items below are what remains of that older strategy. Some of them are important, while others would definitely be considered esoteric or "fringe" data based on the usage numbers. Mostly, this list serves as a reminder that you can find any sort of information you like from scripting if you just know how to look for it.
Factor | Motivation | What was detected | Frequency |
---|---|---|---|
Window.open | To help study pop-up-blocking trends | Substring: "window.open" in any script content | 938,210 |
Frame breaking | Internal tool defeated by frame breakers | Substring: "top.location.href" in any script content | 115,564 |
VBScript usage | To find scripting cases using Microsoft's scripting language | Substring "vbscript" (CI) in all opening SCRIPT tags, as well as in any script content | 103,485 |
CSS .filter property | To find sites using MSIE CSS 'filter' property via script (could be name collisions with DOM Traversal) | Substring: ".filter" in any script content | 198,487 |
CSS .display set to "block" | Sites use to dynamically toggle sections | Regexp: /style.display\s*=\s*[\'\"]block/ in any script content | 238,917 |
CSS .display set to "table" or "inline-table" | Testing sites that use this CSS property/value combination | Regexp: /style.display\s*=\s*[\'\"](inline-)?table/ in any script content | 1,543 |
Use of the "eval" keyword | Script engine developer needed live test cases | eval used as a parsed JavaScript identifier | 13,067 |
Aliasing "eval" to another variable | Script engine developer needed live test cases | Regexp: /\=\s*eval[^\w]/ && !~ /\=\s*eval\s*\(/ | 303 |
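The "eval" aliasing check in the last row combines a positive and a negative pattern. A direct transcription into Python might look like the sketch below; the function name is an assumption, and applying both patterns to the whole script text (rather than statement by statement) is a literal reading of the criteria as listed.

```python
import re

ASSIGNS_EVAL = re.compile(r"=\s*eval[^\w]")   # "= eval" followed by a non-word character
CALLS_EVAL = re.compile(r"=\s*eval\s*\(")     # "= eval(" is an ordinary call


def aliases_eval(script):
    """True if eval is assigned somewhere and the call pattern never matches."""
    return bool(ASSIGNS_EVAL.search(script)) and not CALLS_EVAL.search(script)
```

Under this reading, `var e = eval;` matches, `var r = eval('1+1');` does not, and a script containing both forms is excluded because the negative pattern matches anywhere in the text.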
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.