MAMA script tokenization: DOM

By Brian Wilson

Index:

  1. Introduction
  2. Character- and text-related objects, properties, and methods
  3. CSS related objects, properties, and methods
  4. The Document object
  5. The Element object
  6. Event-related objects, properties, and methods
  7. The Location object
  8. Miscellaneous objects, properties, and methods
  9. The Navigator object
  10. The Node object
  11. The Range object
  12. The Screen object
  13. Table related objects, properties, and methods
  14. Traversal-related objects, properties, and methods
  15. The Window object
  16. XML related objects, properties, and methods

Introduction

Scripting use was found in 2,617,305 of the URLs that MAMA analyzed. This section is devoted to the results uncovered for 294 DOM-related keywords in 15 categories encompassing the largest objects and conceptual areas of the DOM. JavaScript/ECMAScript syntax keywords are covered in the JavaScript/ECMAScript tokenization section.

Fig 1-1: Overall use of DOM keyword groups
Object/group            Frequency
Window                  2,366,008
Document                2,353,632
Navigator               1,553,086
Location                1,511,874
Event-related           1,379,606
Element                 1,336,464
CSS-related             1,066,861
Node                      946,815
Screen                    899,431
XML-related               694,702
Miscellaneous             263,161
Character-related         146,606
Range                      83,590
Table-related              75,110
Traversal-related          14,414

Character- and text-related objects, properties, and methods

These properties and methods of the CharacterData and Text objects were discovered in 146,606 URLs. The data keyword is clearly the one inflating this tally. It is possible that every instance of data found is the CharacterData object property, but that outcome strains credibility too far. The word "data" is too common for this particular usage to be the only one, so take that number with a grain of salt. The other properties and methods here were detected very rarely, and they are dwarfed by the use of simple JavaScript String object functions.

Fig 2-1: DOM Character and Text-related properties and methods
Property/method         Frequency
data                      144,526
splitText                   2,527
insertData                    659
replaceData                   304
appendData                    118
deleteData                     47
substringData                  20

CSS related objects, properties, and methods

These keywords are properties and methods of the CSS2Properties and CSSStyleSheet objects. They were detected in 1,066,861 of MAMA's URLs, and the style keyword was used in almost every one of those cases. "Style" is a rather common concept when dealing with pages, so some of the uses in the keyword's overall count may have nothing to do with controlling CSS properties. However, unlike other keywords that have multiple uses, this one is expected to be used mainly for controlling CSS: historically, it is the easiest method for accessing and changing CSS property information via script. The next most popular keyword, cssText, shows significantly higher use rates than the remaining ones. It is another simple, older method for changing CSS property information, except that it works on an entire block of declarations (or an entire rule) instead of individual properties.

Adding CSS to a document (addRule and insertRule) is apparently more popular than removing CSS. The IE-specific method for adding CSS (addRule) was found to be significantly more popular than the W3C DOM method (insertRule), but the W3C DOM method for removing CSS (deleteRule) was slightly more popular than the IE-specific method (removeRule). MAMA found a slight clustering in the 15,000 range with the cssRules, selectorText, and addRule keywords—suggesting that they might be commonly used together, but this only held true for cssRules and selectorText, which were used in the same URLs 12,162 times. The similar usage rate of addRule seems to be merely a coincidence.
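
The two competing rule APIs are typically wrapped in a small helper. The following is a minimal sketch of that pattern. It assumes sheet is either a W3C CSSStyleSheet (insertRule/deleteRule, with a cssRules list) or an old-IE style sheet object (addRule/removeRule, with a rules list). The helper names addCssRule and removeCssRule are hypothetical and do not come from any library.

```javascript
// Hypothetical helpers: prefer the W3C DOM rule methods and fall back to
// the IE-specific ones. `sheet` is assumed to be a style sheet object.
function addCssRule(sheet, selector, declarations) {
  if (sheet.insertRule) {
    // W3C DOM: insertRule(ruleText, index); append after the last rule.
    var index = sheet.cssRules.length;
    return sheet.insertRule(selector + " { " + declarations + " }", index);
  }
  if (sheet.addRule) {
    // IE: addRule(selector, declarations, index); append after the last rule.
    var ieIndex = sheet.rules.length;
    sheet.addRule(selector, declarations, ieIndex);
    return ieIndex;
  }
  throw new Error("No supported method for adding CSS rules");
}

function removeCssRule(sheet, index) {
  if (sheet.deleteRule) {
    sheet.deleteRule(index);   // W3C DOM
  } else if (sheet.removeRule) {
    sheet.removeRule(index);   // IE
  }
}
```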

Fig 3-1: DOM CSS-related properties and methods
Property/method         Frequency
style                   1,066,508
cssText                    75,677
cssRules                   16,056
selectorText               15,960
addRule                    14,520
insertRule                  4,110
deleteRule                    752
removeRule                    569

The Document object

Forty keywords were associated with the Document object for MAMA, and 2,353,632 of the URLs contained at least one of them (89.93% of all URLs using script). The object name document itself is the most popular keyword here. In fact, it has the highest occurrence of ANY tokenized keyword detected by MAMA (89.63% of all URLs using script). This could be taken as persuasive evidence that dynamically changing the document is the most popular use of JavaScript.

The getElementById and write keywords are understandably quite popular, being the basic historic methods for addressing and dynamically creating parts of a document; each was found in over 50% of all script cases. The W3C DOM method of addressing content document.getElementById is more popular than the MSIE-originated document.all by a comfortable margin. The getElementById method is almost twice as popular as getElementsByTagName, and both trounce getElementsByName by a wide margin. The write method is clearly preferred by authors over writeln 4.5 to 1.

Other keywords from the Document object can tell us a lot about many aspects of Web-page authoring. Testing for the layers keyword (document.layers) is the most common way to browser-sniff Netscape Navigator 4. This explains why the keyword is so common in script compared to the LAYER element in markup: the script keyword is used over 34 times as often! The cookie keyword gives a good measure of how often script uses client-side cookies (22.41% of all Web pages). This is probably a much better measure than the Navigator object's cookieEnabled property, which appears in only 45,411 cases.

The images keyword is one useful factor in determining whether script is dynamically controlling images. Top keywords from the token remainders list (src, width, Image and height) also suggest Image usage, and they could be leveraged to discover scripts that manipulate images. Direct use of the FORM, INPUT or SELECT elements in markup was detected in 1,068,842 cases, while the DOM level 0 forms keyword was detected 665,305 times. However, these factors occurred together only 293,048 times. It is not clear what this disparity says about forms control via script. Perhaps, in a significant number of cases, form widgets are generated dynamically.
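
Scripts that count toward the cookie tally usually have to parse the cookie string themselves, because document.cookie returns every cookie visible to the page as one semicolon-separated string. The following is a minimal sketch of that parsing. The helper name readCookie is hypothetical, and the sketch assumes that values were stored with encodeURIComponent.

```javascript
// Hypothetical helper: look up one cookie in a document.cookie-style
// string ("name1=value1; name2=value2"). In a browser you would call
// readCookie(document.cookie, "name").
function readCookie(cookieString, name) {
  var pairs = cookieString.split(";");
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i].replace(/^\s+/, "");   // drop the space after ";"
    var eq = pair.indexOf("=");
    if (eq === -1) continue;                   // skip malformed entries
    if (pair.substring(0, eq) === name) {
      // Assumes the value was written with encodeURIComponent.
      return decodeURIComponent(pair.substring(eq + 1));
    }
  }
  return null;   // cookie not set
}
```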

Usage of the createRange keyword generally agrees with the numbers in the section on the Range object. The various Range keywords were detected in 83,590 cases, while the Document object's createRange keyword was used 75,703 times. As for createTextNode: the data keyword (144,526 URLs) was flagged earlier as possibly inflated. With createTextNode found in 125,308 URLs, however, the data rate suddenly doesn't seem that unnatural.

Fig 4-1: DOM Document object properties and methods
(Please see also the full frequency table.)
Property/method         Frequency
document                2,345,827
getElementById          1,484,601
write                   1,401,743
all                     1,145,064
referrer                  959,234
images                    901,477
layers                    898,064
getElementsByTagName      797,464
cookie                    786,427
body                      746,071
createElement             731,116
forms                     665,305
domain                    528,066
documentElement           419,297
URL                       382,120
writeln                   312,995
lastModified              229,841
links                     173,607
createTextNode            125,308
anchors                   122,835
defaultView                92,977

The Element object

The keywords collected under the Element object umbrella were found in 1,336,464 of MAMA's URLs. The MSIE shorthand innerHTML, which is used to read and dynamically write content in a document, is very popular (695,329 URLs). Compared with document.createElement (731,116) or the Node object's appendChild (713,711), however, it appears to be slightly less popular these days than the equivalent W3C DOM methods. Writing attributes with the setAttribute method appears to be a more frequent authoring task than merely reading them with getAttribute. By comparison, the removeAttribute method barely registers against either of them.
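
To make the comparison concrete, the following sketch adds the same list item both ways. The function names are hypothetical. The sketch assumes doc is the document and list is an existing UL or OL element.

```javascript
// Two hypothetical ways to add a linked list item to a list element.

// W3C DOM route: build nodes explicitly, then attach them.
function addItemDom(doc, list, text, href) {
  var li = doc.createElement("li");
  var a = doc.createElement("a");
  a.setAttribute("href", href);             // write an attribute
  a.appendChild(doc.createTextNode(text));  // text is never parsed as markup
  li.appendChild(a);
  list.appendChild(li);
  return li;
}

// innerHTML route: build a markup string and let the browser parse it.
// Note: `text` and `href` are inserted unescaped here.
function addItemHtml(list, text, href) {
  list.innerHTML += '<li><a href="' + href + '">' + text + "</a></li>";
}
```

The two routes differ in one practical way. The createTextNode route never parses the text as markup, while the innerHTML route does, and += also re-parses the list's existing content.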

The currentStyle keyword (used 111,964 times) comes from IE and is only slightly more widespread than the W3C DOM version window.getComputedStyle (used 99,815 times). These two methods of accessing a browser's CSS interpretation share usage in a large majority of the cases (92,505 times), indicating an author preference to get the job done using any and all methods at their disposal.
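
The shared usage of currentStyle and getComputedStyle typically comes from a fallback wrapper along the following lines. This is a sketch. The helper name getStyle is hypothetical, and the caller is assumed to pass both the hyphenated and the camel-cased form of the property name.

```javascript
// Hypothetical helper: read the browser's computed value of a CSS property,
// trying the W3C DOM method first and IE's currentStyle second.
// `win` is assumed to be the window object; `el` is an element.
function getStyle(win, el, cssName, camelName) {
  if (win.getComputedStyle) {
    // W3C DOM: hyphenated property name, e.g. "font-size"
    return win.getComputedStyle(el, null).getPropertyValue(cssName);
  }
  if (el.currentStyle) {
    // IE: camel-cased property name, e.g. "fontSize"
    return el.currentStyle[camelName];
  }
  return el.style[camelName];   // last resort: inline style only
}
```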

The offset/scroll properties that originated in IE show some interesting trends. offsetTop and offsetLeft are each more popular than either offsetHeight or offsetWidth. Similarly, scrollTop and scrollLeft are both more popular than scrollHeight and scrollWidth. In both the offset and scroll groups, the Top and Height properties are always more popular than the Left and Width properties. When the Left and Width properties are used, the overwhelming majority of cases (more than 90% each) also use the more dominant Top/Height properties.
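
Much of the offsetTop/offsetLeft pairing probably comes from code that computes an element's position on the page by walking up the offsetParent chain. The following is a simplified sketch of that technique. The helper name getPagePosition is hypothetical, and the sketch ignores borders and scrolled containers.

```javascript
// Hypothetical helper: compute an element's position relative to the page
// by summing offsetLeft/offsetTop up the offsetParent chain.
// Simplification: borders and scrolled ancestors are not accounted for.
function getPagePosition(el) {
  var left = 0, top = 0;
  while (el) {
    left += el.offsetLeft;
    top += el.offsetTop;
    el = el.offsetParent;   // null once we reach the top of the chain
  }
  return { left: left, top: top };
}
```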

Fig 5-1: DOM Element object properties and methods
(Please see also the full frequency table.)
Property/method         Frequency
id                      1,007,621
innerHTML                 695,329
setAttribute              413,403
offsetTop                 370,397
offsetLeft                361,448
className                 359,699
offsetHeight              353,416
scrollTop                 352,061
offsetWidth               339,529
offsetParent              330,524
getAttribute              299,346
scrollLeft                283,749
scrollHeight              252,315
tagName                   245,805
currentStyle              111,964

Event-related objects, properties, and methods

These keywords were detected in 1,379,606 URLs, which suggests that more than half of all URLs using script use events in some manner. The onload keyword was the most popular of all the keywords, and it was used almost twice as often as any other directly addressed event. The next most popular are the W3C DOM addEventListener keyword and the MSIE-originated attachEvent. addEventListener and attachEvent are used together 539,193 times, indicating that authors clearly prefer to cover their bases by using both methods. In both event models, adding/attaching events is far more popular than removing/detaching them, by a factor of about 7 to 1. When removeEventListener and detachEvent are used, it is almost always together with the related add/attach methods. addEventListener and removeEventListener are used together 94,918 times, while attachEvent and detachEvent are found together in 80,114 URLs.
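
The 539,193 pages that use both addEventListener and attachEvent most likely contain a wrapper similar to the following sketch. The names addEvent and removeEvent are hypothetical, and target can be any element or the window.

```javascript
// Hypothetical helpers covering both event models.
// `type` is the bare event name, e.g. "load" or "click".
function addEvent(target, type, handler) {
  if (target.addEventListener) {
    target.addEventListener(type, handler, false);   // W3C DOM
  } else if (target.attachEvent) {
    target.attachEvent("on" + type, handler);        // IE: "onload", "onclick"
  }
}

function removeEvent(target, type, handler) {
  if (target.removeEventListener) {
    target.removeEventListener(type, handler, false);
  } else if (target.detachEvent) {
    target.detachEvent("on" + type, handler);
  }
}
```

The IE branch has one known quirk. Handlers registered with attachEvent run with this set to the window rather than the target element, so some real-world wrappers add code to compensate.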

Looking at some of the specific event properties, the client coordinate properties are approximately three times as popular as the offset coordinate properties. clientX and clientY are used in very similar frequencies, indicating an affinity for being used together, and this is definitely the case—they are used together in 159,487 instances (>95% of all client* cases). The offsetX and offsetY properties manifest the same pattern; they are used together in 49,780 URLs (>97% of all offset* cases).
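
The "used together" figures throughout this article are co-occurrence counts over URLs. This article does not describe MAMA's actual tokenizer or database queries, so the following is only a sketch of the arithmetic. It assumes each page has already been reduced to a set of the keywords found in its scripts. The function name cooccurrence is hypothetical.

```javascript
// Hypothetical co-occurrence count. `pages` is an array of Sets, one per
// URL, holding the keywords found in that URL's scripts (an assumption;
// MAMA's real data model is not shown in this article).
function cooccurrence(pages, a, b) {
  var withA = 0, withB = 0, both = 0;
  pages.forEach(function (keywords) {
    var hasA = keywords.has(a);
    var hasB = keywords.has(b);
    if (hasA) withA++;
    if (hasB) withB++;
    if (hasA && hasB) both++;
  });
  return {
    withA: withA,
    withB: withB,
    both: both,
    // Share of each keyword's URLs that also contain the other keyword.
    shareOfA: withA ? both / withA : 0,
    shareOfB: withB ? both / withB : 0
  };
}
```

Applied to the totals in Fig 6-1, 159,487 / 166,210 ≈ 0.96 of the clientX URLs also contain clientY. Likewise, 159,487 / 163,762 ≈ 0.97 of the clientY URLs also contain clientX.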

Two forms-related events (reset and submit) can be triggered through the DOM. The submit keyword was detected 279,761 times. This is only 30.89% of the number of URLs using submit triggers in HTML forms (905,731 times for INPUT Type="submit" and INPUT Type="Image" combined). On the other hand, authors emphatically prefer to control form reset behavior via script rather than leave it to a reset button in markup. INPUT Type="Reset" was found 17,417 times, but the reset keyword was detected in 69,536 URLs.

Note: The reset keyword is generic enough that it may be experiencing name collisions with non-forms usages, which could possibly cause inflated numbers.

Fig 6-1: DOM Event-related object properties and methods
(Please see also the full frequency table.)
Property/method         Frequency
onload                    661,791
addEventListener          640,385
attachEvent               570,029
onclick                   356,291
onresize                  317,844
onerror                   294,842
submit                    279,761
onmouseout                265,180
onmouseover               263,944
srcElement                238,556
onmousedown               177,502
clientX                   166,210
clientY                   163,762
onmouseup                 161,678
cancelBubble              153,350
keyCode                   144,397
returnValue               125,882
stopPropagation           111,280
onunload                  106,075
preventDefault            105,364
onmousemove                97,793
removeEventListener        95,032
toElement                  91,460
detachEvent                80,137

DOM Event type/Event-handler usage comparison

The following comparisons are intriguing in what they reveal about event usage and authoring tendencies. For some events, authors clearly prefer either the HTML event-handler version or the DOM version; only in a few cases is there no clear bias. Authors opt for the HTML event-handler version with Onclick, Onmouseout, Onmouseover, Onsubmit, Onfocus, Onblur and Onchange (essentially, basic mouse and form events). The DOM event version is favored for onresize, onerror, onmousedown, onmouseup, onunload, onmousemove, onkeydown, onkeyup, onabort, ondblclick and onreset. Some events pair up by their nature. For example, a mousedown event is followed by a mouseup event (otherwise a reader's hand would get really tired). No coupling demonstrates this connection better than the onmouseover and onmouseout keywords, which are used together in 253,222 cases (over 95%).

Fig 6-2: DOM event vs. HTML event handler usage comparison
Event type      DOM event frequency  HTML event handler frequency
onload                      661,791                       772,567
onclick                     356,291                       684,117
onresize                    317,844                        17,950
onerror                     294,842                         4,892
onmouseout                  265,180                       998,854
onmouseover                 263,944                     1,051,631
onmousedown                 177,502                        57,049
onmouseup                   161,678                        41,497
onunload                    106,075                        34,612
onmousemove                  97,793                         7,173
onsubmit                     55,652                       152,286
onfocus                      50,100                       197,235
onkeydown                    46,186                        14,518
onkeypress                   42,782                        28,601
onblur                       31,190                        88,175
onchange                     26,861                       163,476
onkeyup                      17,129                         9,874
onabort                       8,169                           255
ondblclick                    6,421                         2,416
onreset                       1,561                           200
onselect                      1,106                           736

The Location object

Overall, the keywords from this group were found in 1,511,874 of MAMA's URLs. Most of these keywords have name collisions with other objects, so the frequency amounts are most definitely inflated beyond any totals that could be tallied solely for Location object usages. The main sources of name collision are the replace/search keywords also used by the String object, and the Location object shares all of its properties with the Link object.
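
The following sketch illustrates the shared property names. It reads them from a standard URL object, which exposes the same href, protocol, host, hostname, port, pathname, search and hash properties as Location. In a browser, you could pass window.location instead. The sketch assumes a runtime with the global URL class, and describeLocation is a hypothetical name. The last line shows the kind of collision described above: search is also a String method.

```javascript
// Sketch: collect the Location-style properties from any object that has
// them. URL objects (and window.location in a browser) share these names.
function describeLocation(loc) {
  return {
    href: loc.href,
    protocol: loc.protocol,   // "http:"
    host: loc.host,           // hostname plus port, if any
    hostname: loc.hostname,
    port: loc.port,           // a string; empty for the default port
    pathname: loc.pathname,
    search: loc.search,       // query string including "?"
    hash: loc.hash            // fragment including "#"
  };
}

var parts = describeLocation(
  new URL("http://www.example.com:8080/dir/page.html?q=dom#fig7")
);

// The same token as a String method, one source of name collisions:
var stringSearch = "keyword tally".search("tally");   // 8
```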

Another factor to consider with these (and other) keywords is their use in popular script libraries. Some libraries are used so often that a single object, property or method appearing in one of them can strongly skew usage numbers upward. For instance, the protocol keyword is used by Google's Urchin tracker, and that alone represents over 75% of its overall usage. The effects of script library usage cannot be treated lightly!

Fig 7-1: DOM Location object properties and methods
Property/method         Frequency
href                    1,156,937
replace                   710,059
search                    658,995
protocol                  506,825
hash                      484,143
hostname                  474,023
pathname                  466,921
reload                    304,513
host                       93,132
force                      13,392
port                       10,443

Miscellaneous objects, properties, and methods

This was a catch-all group for keywords that didn't fit into the other DOM object groups. Some of the keywords were leftover legacy checks that MAMA had sought before the tokenization effort. Others were hand-picked from objects with only one or a few properties that might be of later interest. To reiterate, putting keywords here simply enabled faster searching; otherwise they would have ended up in the final remainder token group I termed "The Others". The item keyword has the highest usage here. It can be used to access the components of a variety of different objects, but it is also a generic keyword that may suffer from name collision. The remaining curiosity in this group is the hasFeature keyword. With only 4,629 instances in a churning sea of heavy DOM usage, this shortcut method for detecting feature support has gained almost NO authoring traction and appears to be a failure.

Fig 8-1: Misc. DOM objects, properties, and methods
Property/method         Frequency
item                      238,099
specified                  23,664
getBoundingClientRect      16,574
createDocument             11,822
getClientRects              5,611
hasFeature                  4,629
namedItem                     356
ownerElement                   21

The Navigator object

The use of appVersion, appName and userAgent in conjunction with parseInt and indexOf/substring has previously been discussed, resulting in a proclamation that those methods are usually coupled to Navigator object usage to enable browser sniffing. Now we can look at how these 3 Navigator object keywords are used together by authors.

At least one of the Navigator object keywords was found in 1,553,086 of MAMA's URLs. For the top three keywords alone, the count is still a very high 1,345,468 URLs, or 51.41% of all script cases. All three are used together in only 319,289 of those instances. The strongest affinity is between appName and appVersion, which are used together in 664,239 cases, or about 75% of either keyword's total.
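
The following hypothetical example shows the sniffing style discussed above. It assumes nav is the navigator object. The function compares appName and then extracts a version number from userAgent or appVersion with indexOf, substring and parseInt. The function name sniffBrowser is an assumption made for illustration.

```javascript
// Hypothetical sniffing function in the classic style. `nav` is assumed
// to be the navigator object (or anything with the same three properties).
function sniffBrowser(nav) {
  var ua = nav.userAgent;
  var msie = ua.indexOf("MSIE ");
  if (nav.appName === "Microsoft Internet Explorer" && msie !== -1) {
    // IE's appVersion starts with "4.0", so read the real version from
    // the userAgent: "...; MSIE 6.0; ..." -> 6
    return { name: "IE", version: parseInt(ua.substring(msie + 5), 10) };
  }
  // Others: appVersion starts with a number, e.g. "4.79 [en] (Win98; U)"
  return { name: nav.appName, version: parseInt(nav.appVersion, 10) };
}
```

The sketch also shows why this approach is fragile. Gecko-based browsers report an appName of "Netscape" and an appVersion beginning with "5.0", so the function cannot tell them apart.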

Elsewhere in MAMA's research, we looked at the criteria MAMA used to judge when Java and Flash were in use. We can attempt to do the same here by looking at two specific keywords from the Navigator object. The javaEnabled keyword was detected 669,819 times, compared with MAMA's method of discovery tallying 53,688 times—a HUGE difference. MAMA's Java usage detection tricks were not exhaustive but should catch most cases. Perhaps most Java applet references are written dynamically by Web pages these days. This fragile theory may not be as flimsy as it seems if we also look at the plugins keyword compared to MAMA's other parameters for finding Flash usage. MAMA's basic Flash detection methods pointed toward a very strong interaction of Flash and script, and the heavy use of the plugins keyword fits in nicely with that.
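
For comparison with MAMA's own detection, here is a sketch of the script-side checks that contribute to the plugins and javaEnabled counts. It assumes nav is the navigator object. It also assumes Flash registers itself under the plugin name "Shockwave Flash" with a description such as "Shockwave Flash 9.0 r115". The name detectPlugins is hypothetical, and IE-specific ActiveX detection is not covered.

```javascript
// Hypothetical helper: detect Flash and Java through navigator.plugins
// and navigator.javaEnabled(). `nav` is assumed to be the navigator object.
function detectPlugins(nav) {
  var flash = nav.plugins && nav.plugins["Shockwave Flash"];
  var flashVersion = null;
  if (flash && flash.description) {
    // Assumed description format: "Shockwave Flash 9.0 r115" -> 9
    var match = /(\d+)\.\d+/.exec(flash.description);
    if (match) flashVersion = parseInt(match[1], 10);
  }
  return {
    flash: !!flash,
    flashVersion: flashVersion,
    java: typeof nav.javaEnabled === "function" && nav.javaEnabled()
  };
}
```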

Fig 9-1: DOM Navigator object properties and methods
Property/method         Frequency
appVersion                885,564
appName                   877,345
userAgent                 812,382
plugins                   683,748
javaEnabled               669,819
mimeTypes                 323,142
platform                  167,109
cookieEnabled              45,411
appCodeName                 3,398

The Node object

The appendChild keyword was especially popular in this group—authors apparently like to dynamically add content to documents. What a surprise! It was detected in 713,711 of MAMA's URLs—more than twice as often as the next-nearest Node object keyword. This number may seem unusually high compared to its other keyword siblings, but not if we look outside the Node object for a correlation. The related DOM method document.createElement is a likely companion to appendChild, and it was seen 731,116 times.

Some other relative comparisons are also interesting. appendChild is four times as popular as removeChild, while removeChild is MUCH more popular than replaceChild. firstChild is approximately three times as popular as lastChild, and nextSibling is more than three times as popular as previousSibling. nodeType and nodeName are used a similar number of times and appear together about 2/3 of the time (93,546 cases). Authors do not seem to use the hasAttributes method (found only 75 times), so they presumably check for the existence of attributes by some other means.
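
The firstChild, nextSibling and nodeType keywords often appear together in loops like the following, which collects an element's child elements while skipping text and comment nodes. This is a sketch. The name elementChildren is hypothetical, and parent can be any DOM node.

```javascript
// Hypothetical helper: collect the element children of a node by walking
// firstChild/nextSibling and keeping only nodeType 1 (ELEMENT_NODE),
// which skips text (3) and comment (8) nodes.
var ELEMENT_NODE = 1;

function elementChildren(parent) {
  var names = [];
  for (var child = parent.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === ELEMENT_NODE) {
      names.push(child.nodeName);
    }
  }
  return names;
}
```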

Fig 10-1: DOM Node object properties and methods
(Please see also the full frequency table.)
Property/method         Frequency
appendChild               713,711
parentNode                317,411
childNodes                236,865
firstChild                186,788
removeChild               174,231
insertBefore              152,605
nodeType                  150,297
nodeName                  144,836
attributes                127,841
nodeValue                 116,097
hasChildNodes             115,660
nextSibling               102,171
prefix                     93,197
lastChild                  62,872
ownerDocument              60,851
xml                        48,824
replaceChild               47,405
cloneNode                  47,233
previousSibling            28,972
normalize                  10,107
selectSingleNode            7,679

The Range object

Usage of the various properties and methods of the Range object was detected 83,590 times (only 3.19% of URLs using script). By comparison, the createRange method of the Document object was used by 75,703 URLs. Of these Range keywords, collapse is used the most, with both setStartBefore and setStartAfter also being very popular. The selectNodeContents method was found to be considerably more popular than selectNode—by almost a factor of eight. Other related start/end keywords have usage rates that are similar to each other.

Fig 11-1: DOM Range object properties and methods
(Please see also the full frequency table.)
Property/method         Frequency
collapse                   51,435
setStartBefore             43,138
setStartAfter              40,270
selectNodeContents         37,027
collapsed                  12,862
selectNode                  4,636
deleteContents              3,935
setStart                    3,171
startOffset                 3,150
setEnd                      3,086
detach                      2,732
startContainer              2,659
endOffset                   2,647
insertNode                  2,321
cloneContents               2,261
endContainer                2,236
cloneRange                  1,993
setEndAfter                 1,911

The Screen object

In the midst of compiling this research, a few surprising tidbits of information came to light. Based on my experience in the past, I didn't expect the colorDepth property to be as popular as it was (just over 32% of all MAMA's script cases). The next closest keyword was used with only 30% of the frequency of colorDepth. The availWidth and availHeight properties are almost always used together (253,148 cases). Similarly, the screenX/screenY and screenTop/screenLeft properties are usually paired together as well (19,022 and 14,823 times respectively).

Fig 12-1: DOM Screen object properties and methods
Property/method         Frequency
colorDepth                843,022
availWidth                257,025
availHeight               255,641
screenX                    19,735
screenY                    19,173
screenTop                  15,778
screenLeft                 15,020

Table related objects, properties, and methods

These table keywords were found in 75,110 cases. Some aspects of the results seem a little amiss, though. The CAPTION markup element had a lower frequency than the caption keyword in script, so it is doubtful that the keyword is used solely in a table-related context. As a keyword, caption can also apply to image captions, so there may be some name collision going on there. This also raises questions about the usage rates of the other simple (and popular) generic table keywords, cells and rows. However, their numbers don't show signs of significant name collision. The rows and cells keywords are used together 21,574 times, which covers the large majority of the cells cases. However, caption is rarely used with either cells or rows (719 and 1,971 times respectively).

The tHead keyword is used only 40% as often as tBodies, while tBodies is used almost 29 times as often as tFoot. Dealing with rows is more popular than dealing with individual cells (rows:cells = 47,401:25,321; deleteRow:deleteCell = 2,970:599). However, authors use the DOM to insert rows and cells at similar rates (insertRow: 4,493; insertCell: 4,314).
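
The following is a minimal sketch of the insertRow/insertCell pattern. It assumes tbody is a TBODY (or TABLE) element and doc is the document. The name appendRows is hypothetical. An index of -1 appends the new row or cell at the end.

```javascript
// Hypothetical helper: append rows of values to a table section using the
// DOM table methods. insertRow(-1) and insertCell(-1) append at the end.
function appendRows(doc, tbody, rows) {
  rows.forEach(function (values) {
    var row = tbody.insertRow(-1);
    values.forEach(function (value) {
      var cell = row.insertCell(-1);
      cell.appendChild(doc.createTextNode(String(value)));
    });
  });
  return tbody.rows.length;   // total rows now in the section
}
```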

Fig 13-1: DOM Table-related object properties and methods
Property/method         Frequency
rows                       47,401
cells                      25,321
caption                    24,146
tBodies                     4,824
insertRow                   4,493
insertCell                  4,314
rowIndex                    3,703
deleteRow                   2,970
tHead                       1,944
deleteCell                    599
createCaption                 406
deleteTHead                   368
tFoot                         167
sectionRowIndex               101
createTHead                    21
deleteCaption                  11
createTFoot                     7
deleteTFoot                     6

Traversal-related objects, properties, and methods

This section covers selected keywords used by the NodeFilter, NodeIterator, and TreeWalker objects. It was originally added to balance the search criteria for the Range object; after all, the W3C specification covers both Range AND Traversal. The results from this section are inconclusive. The nextNode method is used at much higher rates than any of the other keywords tracked, and by comparison all of the others are very rarely used. Because generic names like nextNode can also belong to objects outside the Traversal specification, it is hard to draw many conclusions.

Fig 14-1: DOM Traversal-related objects, properties, and methods
Property/method         Frequency
nextNode                   14,360
createTreeWalker               85
NodeFilter                     84
whatToShow                     46
acceptNode                     39
previousNode                   10

The Window object

This object represents a browser window or sub-frame. Of all the keywords in this group, window was obviously going to be the most popular. There are a number of interesting comparisons to be made between the various keyword couplings.

Dialogs are generated in JavaScript using the alert, confirm, and prompt methods of the Window object. Of these, alert is used most—17.84% of URLs using script utilize it in some fashion; confirm and prompt are only found in 4.07% and 1.19% of scripted pages respectively.

setTimeout is more than 1.6 times as popular as clearTimeout, but clearTimeout is almost NEVER used without setTimeout (they are found together in 490,124 URLs). Similarly, setInterval is noticeably more popular than its companion clearInterval, but clearInterval is almost always paired with setInterval (they are detected together 311,890 times).
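
The tight setTimeout/clearTimeout pairing fits patterns such as the following debounce sketch. Every call cancels the pending timer before scheduling a new one. The helper name debounce is used here for illustration; it is a widely used idiom, not a built-in.

```javascript
// Hypothetical debounce helper: each call cancels the pending timer with
// clearTimeout and schedules a fresh one with setTimeout, so `fn` runs
// once, `wait` ms after the calls stop.
function debounce(fn, wait) {
  var timer = null;
  return function () {
    var args = arguments, self = this;
    if (timer !== null) clearTimeout(timer);   // cancel the earlier request
    timer = setTimeout(function () {
      timer = null;
      fn.apply(self, args);
    }, wait);
  };
}
```

For example, window.onresize = debounce(relayout, 100) would call a hypothetical relayout function once, 100 ms after the user stops resizing the window.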

The (move/scroll/resize)To methods are always more popular than the related (move/scroll/resize)By methods. The move* methods are preferred over the scroll* methods, which in turn see higher use than the resize* methods. pageXOffset is found only 81.58% as often as pageYOffset, but pageXOffset is rarely seen without a pageYOffset present. The innerHeight and innerWidth properties have very similar frequencies because they are usually used together (in 641,857 cases).
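
Paired X/Y and width/height properties usually come from code like the following sketch. It prefers innerWidth/innerHeight and pageXOffset/pageYOffset, and falls back to documentElement properties for older IE in standards mode. The helper name getViewport is hypothetical, and win is assumed to be the window object.

```javascript
// Hypothetical helper: read viewport size and scroll position, preferring
// the window properties and falling back to documentElement properties
// (older IE in standards mode).
function getViewport(win) {
  var root = win.document.documentElement;
  var hasInner = typeof win.innerWidth === "number";
  var hasPageOffset = typeof win.pageYOffset === "number";
  return {
    width: hasInner ? win.innerWidth : root.clientWidth,
    height: hasInner ? win.innerHeight : root.clientHeight,
    scrollX: hasPageOffset ? win.pageXOffset : root.scrollLeft,
    scrollY: hasPageOffset ? win.pageYOffset : root.scrollTop
  };
}
```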

Some of the keywords in this group are generic and can be used across multiple objects. The keywords focus and blur were placed here, but they also apply to other objects (like Input and Link). The simple keyword open certainly applies as a Window object method. As a concept, however, open is very generic, so there may be some name collision (the XMLHttpRequest object, for example, has its own open method).

The relationship between getComputedStyle and currentStyle was covered in the section above on the Element object. The History DOM object was not given its own MAMA category, but its major (generically named) methods could be extracted from the "rest" section (go: 72,381, back: 41,836, forward: 4,572). It seems that authors send readers forward in the browsing history far less often than they send them back.

Fig 15-1: DOM Window object properties and methods
(Please see also the full frequency table.)
Property/method         Frequency
window                  1,812,773
navigator               1,570,402
location                1,475,171
screen                  1,049,650
open                    1,021,945
parent                    836,445
setTimeout                812,357
frames                    790,893
self                      739,456
innerWidth                668,432
innerHeight               657,440
event                     525,373
clearTimeout              493,937
focus                     475,947
alert                     467,055
status                    464,370
setInterval               392,436
close                     355,895
clearInterval             316,922
history                   254,699
pageYOffset               254,325

Double checking the use of other DOM object names

Some of the keywords in the Window object group represent the parent object names for other DOM groupings in MAMA. Looking at these keyword frequencies and the overall uses for the other keyword groups, we find that the totals are quite compatible:

Fig 15-2: Comparison of DOM Window object keywords to other MAMA category totals
Window object keyword       Keyword frequency    Keyword group frequency
"navigator"                         1,570,402                  1,553,086
"location"                          1,475,171                  1,511,874
"screen"                            1,049,650                    899,431

XML related objects, properties, and methods

Not all of these keywords are dedicated solely to XML processing. The keyword with the highest detected frequency here was ActiveXObject, which is MSIE's generic mechanism for using ActiveX controls in Web pages. How do we filter out non-XML-related uses of ActiveXObject? First, authors who want to use XMLHttpRequest these days typically allow for both types of object. These two keywords are used together in 105,013 cases (93.53% of XMLHttpRequest cases). Another notable pairing is the onreadystatechange keyword, which also tracks very closely with XMLHttpRequest (94.94%). readyState is a vital part of XMLHTTP processing, so tracking its numbers can also expose MSIE-only uses of XMLHTTP. The keywords readyState and onreadystatechange were used together 104,763 times. The remaining readyState cases (45,329 URLs) are likely to be MSIE-centric syntax.
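
The pairing of XMLHttpRequest with ActiveXObject, and of readyState with onreadystatechange, typically comes from request code along the following lines. This is a sketch. The names createRequest and fetchText are hypothetical, and win is assumed to be the window object. "Microsoft.XMLHTTP" is the classic ProgID used with ActiveXObject in older IE versions.

```javascript
// Hypothetical factory covering both request objects. `win` is assumed to
// be the window object.
function createRequest(win) {
  if (typeof win.XMLHttpRequest !== "undefined") {
    return new win.XMLHttpRequest();                    // native object
  }
  if (typeof win.ActiveXObject !== "undefined") {
    return new win.ActiveXObject("Microsoft.XMLHTTP");  // older IE
  }
  return null;
}

// onreadystatechange fires on each readyState change; readyState 4 means
// the request has completed.
function fetchText(win, url, done) {
  var req = createRequest(win);
  if (!req) return false;
  req.onreadystatechange = function () {
    if (req.readyState === 4) {
      done(req.status, req.responseText);
    }
  };
  req.open("GET", url, true);
  req.send(null);
  return true;
}
```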

Saarsoo also looked for "XMLHttpRequest" usage and only encountered it 6,125 times—1.90% of the pages that were determined to be using JavaScript in his study. By comparison, MAMA's usage rate is quite a bit higher. Considering only the same metric (use of XMLHttpRequest), it was found in 4.29% of MAMA's URLs that were using Script.

Fig 16-1: DOM XML-related object properties and methods
Property/method         Frequency
ActiveXObject             652,356
readyState                150,092
XMLHttpRequest            112,277
send                      109,029
onreadystatechange        106,599
responseText               95,262
setRequestHeader           73,413
responseXML                42,272
getResponseHeader          32,187
statusText                 22,358
parseFromString            15,266
getAllResponseHeaders      11,492

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
