MAMA: CSS syntax

By Brian Wilson

Index:

  1. Introduction
  2. At-rules
  3. External CSS file names
  4. External CSS MIME types
  5. Media types
  6. Pseudo-classes and pseudo-elements
  7. CSS properties
  8. Notable CSS syntax: Inherit and !important
  9. Miscellaneous CSS property values and other syntax
  10. Saarsoo's CSS study

Introduction

MAMA's look at CSS covered a number of different areas. It looked at external CSS (the LINK element), embedded CSS (via the STYLE element), and inline CSS (using the common Style attribute). It also delved into CSS specified using the @import syntax and did its best to reveal CSS usage in XML with the "xml-stylesheet" processing instruction. Overall, CSS was detected in 2,821,141 (80.39%) of the URLs that MAMA analyzed. CSS properties, MIME types, media types, and other syntax were tracked, but more can be done to analyze CSS usage; Saarsoo's study proved that. We will look at the factors, many unique to this analysis, that MAMA tracked in its study of CSS, and then we will compare some of the commonalities and differences between the results MAMA found and what Saarsoo was able to discover in his study.

At-rules

MAMA tracked 3 types of CSS at-rule syntax: @import, @media and @charset. For @charset, only the existence of the at-rule is tracked. In the case of @media, the existence is tracked, along with the stated media type values (see the Media types section below for more). Lastly, @import statements were dissected and analyzed for their file names and media types.

Fig 2-1: Popularity of CSS At-rules

At-rule type    Frequency    % Total CSS usage
@import         191,496      6.79%
@media          63,293       2.24%
@charset        30,022       1.06%

@import usage: quantities

This syntax represents an additional source of CSS content when used in a document. The CSS @import syntax is necessary to analyze but tricky to handle - an @import statement can point to other @import statements; it can even point to itself in endless recursion. In order to sidestep such logistical headaches, only the first-level @import URLs were resolved, downloaded and added to the CSS analysis queue for MAMA. These first-level @import situations were detected in 191,496 URLs. The quantities and sizes of @import statements were also tracked. The most extreme case originally registered as having 1,224 @import statements, but more recent scrutiny exposed only 68 (still high, but not astronomical like before). When @import is used, the average number of statements was 2.3 and the most popular number was 1. The top @import case (verifiable at the time of writing) was http://www.lmmeteoven.org/, which points out an issue in MAMA's detection strategy. Sure, 151 @import statements are detected, but the majority of those are repeated declarations—there are only 1-2 dozen unique URLs represented there. A full frequency table of @import quantities is available.
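The first-level handling described above can be sketched in a few lines of Python. This is only an illustration, not MAMA's actual code: the regular expression `IMPORT_RE`, the helper `first_level_imports`, and the example.com URLs are all assumptions made for the example. It finds @import statements in one style sheet, resolves each target against the sheet's URL without following nested imports, and compares the raw count with the number of unique URLs.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: matches @import "x.css"; and @import url(x.css) media;
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)["']?\s*\)?\s*([^;]*);""",
    re.IGNORECASE,
)

def first_level_imports(css_text, base_url):
    """Return (absolute URL, media list) pairs for each @import in css_text.

    Nested @import statements inside the imported sheets are not followed.
    """
    results = []
    for target, media in IMPORT_RE.findall(css_text):
        media_types = [m.strip().lower() for m in media.split(",") if m.strip()]
        results.append((urljoin(base_url, target), media_types))
    return results

css = """
@import url("base.css");
@import 'print.css' print;
@import url(base.css) screen, projection;
"""
imports = first_level_imports(css, "http://example.com/styles/main.css")
total = len(imports)                        # every statement, repeats included
unique = len({url for url, _ in imports})   # distinct targets only
print(total, unique)
```

Counting `unique` alongside `total` is one way to keep repeated declarations from inflating the figures.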

Fig 2-2: URLs with the most detected @import statements

URL                              Quantity @import
http://www.lmmeteoven.org/       151
http://www.esu1.org/             97
http://www.educarchile.cl/...    68

@import usage: sizes

As with external style sheets, the extreme size values here point out some problems with MAMA's strategies for deciding what to do with @import content. The list of URLs that MAMA analyzes is not reduced to unique URLs, which results in inflated sums. This is an issue when the same @import object is referenced in multiple sub-frames, but some pages repeat an @import reference multiple times, even in the same document!

Fig 2-3: @import sizes

Size range               Frequency
=0                       3,331,624
>0 && <=500              14,350
>500 && <=1000           7,569
>1000 && <=2000          17,123
>2000 && <=3000          17,252
>3000 && <=4000          9,426
>4000 && <=5000          8,751
>5000 && <=6000          8,693
>6000 && <=7000          7,024
>7000 && <=8000          6,881
>8000 && <=9000          5,355
>9000 && <=10000         4,737
>10000 && <=12000        8,461
>12000 && <=14000        6,765
>14000 && <=16000        5,396
>16000 && <=18000        5,108
>18000 && <=20000        4,138
>20000 && <=25000        8,280
>25000 && <=30000        6,089
>30000 && <=35000        4,732
>35000 && <=40000        8,540
>40000 && <=45000        2,423
>45000 && <=50000        2,055
>50000 && <=75000        4,791
>75000 && <=100000       1,533
>100000 && <=150000      1,523
>150000                  561

Fig 2-4: URLs with the largest @import size

URL                                                                    @import size (chars)
http://www.boone.k12.ky.us/index.htm (URL no longer active)            874,643
http://www.waitrose.com/inspiration/wfi.aspx (URL no longer active)    751,734
http://www.educarchile.cl/...                                          646,393

External CSS file names

Tracking CSS file names is an example of a MAMA feature in search of a reason for existing. The URLs of external style sheets from LINK and @import-ed CSS were reduced to just the final filename portion and this was stored in MAMA. This originally began as a request from a co-worker to track file names used by external scripts. With scripts, I knew that this data would be compelling and useful. The code for tracking script file names was easy to replicate for external CSS files, but I did not know what the result would be—it turns out it is not very compelling. The popular file names used for external CSS files are rather tedious and obvious: "style.css", "main.css", "default.css", and the inspired "css.css" are among the devastatingly insightful author choices for CSS file names. Yes, the full frequency table is also available.
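Reducing a style sheet URL to its final file name portion can be sketched as follows. This is a minimal illustration, not MAMA's implementation: the helper name `css_file_name` and the sample URLs are hypothetical, and dropping the query string and fragment is an assumption about how the reduction might be done.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

def css_file_name(url):
    """Return the last path segment of a URL (query and fragment dropped)."""
    return posixpath.basename(urlsplit(url).path)

# Hypothetical sample of external style sheet URLs.
urls = [
    "http://example.com/css/style.css",
    "http://example.org/theme/style.css?v=2",
    "http://example.net/main.css#top",
]
name_counts = Counter(css_file_name(u) for u in urls)
print(name_counts.most_common())
```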

External CSS MIME types

This feature tracked the actual returned MIME type of the external CSS files (via LINK and @import references). It did not trust any reported LINK Type attribute, if present. The actual result is what one would expect: almost 99% of all external style sheets are delivered with a "text/css" MIME type. Other types were encountered, but some are puzzling. Why would some external CSS be served with an image or JavaScript MIME type? Are these servers just misconfigured? The distant second-place value of "text/html" has two easy explanations: misconfigured Web servers (again), or pseudo-404 errors redirecting to full HTML error pages. In addition, 134,839 URLs with external style sheet references had at least one reference that returned no MIME type at all. Once again, MAMA comes through with a full frequency table for your viewing pleasure.
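A tally like this can be sketched by normalizing each response's Content-Type header before counting. The sketch below is an assumption about one reasonable approach, not a description of MAMA's code: the function `mime_type`, the "(none)" label, and the sample header values are all hypothetical.

```python
from collections import Counter

def mime_type(content_type):
    """Reduce a Content-Type header value to a bare, lowercase MIME type.

    Parameters such as charset are dropped; a missing or empty header is
    reported as "(none)".
    """
    if not content_type:
        return "(none)"
    return content_type.split(";", 1)[0].strip().lower() or "(none)"

# Hypothetical header values as returned for four external style sheets.
headers = ["text/css", "Text/CSS; charset=utf-8", "text/html; charset=iso-8859-1", None]
tally = Counter(mime_type(h) for h in headers)
print(tally.most_common())
```

Note that the declared Type attribute on the LINK element plays no part here; only the server's response is counted.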

Media types

In all, 404,212 pages specified at least one CSS media type. Media types were detected by looking at the Media attribute of all LINK and STYLE elements, as well as the CSS @media at-rule syntax. The resulting list of media types was then matched against the following regular expression:

Regexp:
/(all|screen|print|speech|aural|handheld|projection|tv)/i

Any media type that was not recognized fell into a catch-all category termed "other". What were some of the "other" media types? The 3 main types that were noticeable in significant quantities are all from CSS2—"braille", "embossed" and "tty". These values will definitely be added to the regular expression above the next time a big analysis is done.
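The matching step, including the "other" catch-all, can be sketched in Python using the same pattern. The function name `classify_media` and the comma-splitting of the value are assumptions for illustration; the article does not say how MAMA tokenized media lists.

```python
import re

# Same alternation as the regular expression above, case-insensitive.
MEDIA_RE = re.compile(
    r"(all|screen|print|speech|aural|handheld|projection|tv)", re.IGNORECASE
)

def classify_media(value):
    """Return the set of categories for one comma-separated media value.

    Items that do not match the pattern fall into the "other" category.
    """
    categories = set()
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        match = MEDIA_RE.search(item)
        categories.add(match.group(1).lower() if match else "other")
    return categories

print(classify_media("screen, Print"))
print(classify_media("braille"))
```

Because the pattern is not anchored, it matches anywhere inside an item, so an unknown value that happens to contain one of the names (for example "small", which contains "all") would be counted under that name rather than under "other".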

Fig 5-1: Popular CSS media types

Media type    Frequency    % Total media usage
screen        252,948      62.58%
print         171,328      42.39%
all           130,227      32.22%
projection    31,651       7.83%
handheld      22,316       5.52%
other         7,119        1.76%
tv            5,770        1.43%
aural         2,533        0.63%
speech        301          0.07%

Pseudo-classes and pseudo-elements

There are a number of these constructs defined in CSS2 and CSS3. A subset of pseudo-classes and pseudo-elements was chosen for tracking in MAMA. Some obvious and important pseudo-classes were overlooked in this analysis, specifically ":active", ":link" and ":visited". It must be stressed that the tracked items are not the only possible pseudo-classes and pseudo-elements, just most of the ones that were widely (or soon to be widely) implemented by browsers at the time they were added to MAMA.

A simple regular expression match was performed on all CSS content looking for the following pattern:

Regexp:
/:(hover|focus|root|empty|not|first-child|first-node|last-node|last-child|lang|before|after|first-letter|first-line)/gios

The pseudo-class :hover is used in about two-thirds of all pages that use CSS. The pseudo-element :after is (strangely) almost 3 times as popular as :before. The pseudo-class :first-child is more than 4 times as frequent as :last-child (although that can probably be attributed to :first-child being in CSS2, while :last-child was not added until CSS3). The typographic pseudo-elements :first-letter and :first-line are not that widely used, although authors clearly prefer to control the initial letter of a block, which they do about 3 times as often as the initial line.
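A per-document version of this match can be sketched in Python. The Perl flags map only loosely: `i` becomes `re.IGNORECASE`, `s` becomes `re.DOTALL`, `g` corresponds to using `findall`, and `o` has no direct equivalent beyond compiling the pattern once. The function name `pseudo_names` is hypothetical.

```python
import re

# Same alternation as the regular expression above.
PSEUDO_RE = re.compile(
    r":(hover|focus|root|empty|not|first-child|first-node|last-node|last-child"
    r"|lang|before|after|first-letter|first-line)",
    re.IGNORECASE | re.DOTALL,
)

def pseudo_names(css_text):
    """Return the set of tracked pseudo-class/element names found in css_text."""
    return {name.lower() for name in PSEUDO_RE.findall(css_text)}

css = "a:HOVER { color: red } p::first-letter { float: left } li:first-child { margin: 0 }"
print(sorted(pseudo_names(css)))
```

Returning a set means each name counts once per document, which matches a "found in N URLs" style of frequency. The pattern is a plain text match, so it can also fire on a colon followed by one of these words outside a selector.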

Fig 6-1: Popular CSS Pseudo-classes and pseudo-elements

Pseudo-class/element    Frequency    % Total CSS usage
hover                   1,918,442    68.00%
after                   96,541       3.42%
focus                   94,953       3.37%
before                  34,558       1.22%
first-child             24,769       0.88%
first-letter            15,804       0.56%
last-child              5,826        0.21%
first-line              4,476        0.16%
not                     2,785        0.10%
lang                    2,546        0.09%
root                    2,267        0.08%
empty                   392          0.01%
last-node               75           0.00%
first-node              74           0.00%

CSS properties

The most popular CSS properties are the replacements for standard "old school" HTML presentational markup. Three of the top five properties replicate the functionality of the FONT element, and the remaining ones take over for the U, S, STRIKE and B elements. For CSS Box Model properties ('border', 'margin', and 'padding'), the shorthand versions are more popular than their component forms, but the reverse is true for the 'font' and 'background' properties. Among the properties listed in Fig 7-1, the most popular CSS Box Model side properties are top for 'margin', bottom for 'border', and left for 'padding'.

Fig 7-1: Popular CSS properties
[Please also see the complete frequency table.]

CSS property        Frequency    % Total CSS usage
color               2,400,643    85.09%
font-size           2,336,689    82.83%
font-family         2,223,829    78.83%
text-decoration     2,113,412    74.91%
font-weight         2,012,992    71.35%
background-color    1,698,366    60.20%
width               1,596,974    56.61%
text-align          1,448,336    51.34%
height              1,428,991    50.65%
border              1,376,821    48.80%
margin              1,317,016    46.68%
padding             1,276,661    45.25%
margin-top          1,241,997    44.02%
line-height         1,179,743    41.82%
margin-bottom       1,173,093    41.58%
margin-left         1,125,675    39.90%
position            1,095,461    38.83%
padding-left        989,492      35.07%
background          958,127      33.96%
display             954,047      33.82%
margin-right        936,379      33.19%
font-style          933,690      33.10%
background-image    922,566      32.70%
padding-top         905,852      32.11%
border-bottom       894,900      31.72%
top                 890,666      31.57%
left                857,439      30.39%
padding-bottom      828,349      29.36%

Browser vendor CSS property extensions

The major browser makers have extended CSS over the years, and documents on the Web show just how much effect this has had on authoring practice. Mozilla's '-moz-opacity' is the most popular one, with the standardized version 'opacity' being only slightly more popular. Microsoft Office CSS extensions (prefixed by "mso-") have the highest representation overall, with 202 (!!) different CSS properties in the frequency table. Adobe ("-adbe-"), Apple/Safari ("-webkit-"), KDE ("-khtml-"), Microsoft ("-ms-") and Opera ("-o-") are all also represented by CSS browser extensions.
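Grouping property names by vendor prefix can be sketched as a simple lookup. The prefixes and vendor labels below are the ones named in the paragraph above; the table `VENDOR_PREFIXES` and the function `vendor_of` are hypothetical names for illustration, and this is not presented as how MAMA classified them.

```python
# Prefix-to-vendor table built from the prefixes named in the text.
VENDOR_PREFIXES = {
    "-moz-": "Mozilla",
    "mso-": "Microsoft Office",
    "-adbe-": "Adobe",
    "-webkit-": "Apple/Safari",
    "-khtml-": "KDE",
    "-ms-": "Microsoft",
    "-o-": "Opera",
}

def vendor_of(property_name):
    """Return the vendor label for a prefixed property, or None if unprefixed."""
    name = property_name.strip().lower()
    for prefix, vendor in VENDOR_PREFIXES.items():
        if name.startswith(prefix):
            return vendor
    return None

for prop in ["-moz-opacity", "mso-style-name", "-ms-filter", "opacity"]:
    print(prop, "->", vendor_of(prop))
```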

Notable CSS syntax: inherit and !important

Two keywords in CSS have special meaning: they are not selectors, and they are not properties. The "inherit" keyword is a special global property value used to explicitly pass on a particular value from a parent to a child. Just under 10% of all URLs using CSS (278,743 URLs) use this keyword at least once. The other special keyword is "!important", which specifies a shift in the bias of a document's cascade order toward a specific CSS rule. It was found in 155,449 of MAMA's URLs (over 5% of all cases using CSS). These numbers seem significant, but if one frames the numbers in perspective with the CSS property frequency table, optimism is quickly deflated. For instance, there are almost 75 CSS properties that are more popular than the "!important" keyword, including the non-standard 'filter' and most of the scrollbar properties.
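Detecting whether a document uses either keyword at least once can be sketched with two patterns. The article does not describe how MAMA detected them, so the expressions and function names here are assumptions: "inherit" is only counted when it is the whole value of a declaration, and "!important" allows optional whitespace after the exclamation mark.

```python
import re

# Hypothetical patterns, not taken from MAMA.
INHERIT_RE = re.compile(
    r":\s*inherit\s*(?:!\s*important\s*)?(?:;|\}|$)", re.IGNORECASE
)
IMPORTANT_RE = re.compile(r"!\s*important\b", re.IGNORECASE)

def uses_inherit(css_text):
    """True if some declaration's value is the inherit keyword."""
    return bool(INHERIT_RE.search(css_text))

def uses_important(css_text):
    """True if !important appears anywhere in css_text."""
    return bool(IMPORTANT_RE.search(css_text))

print(uses_inherit("a { color: inherit; }"))
print(uses_important("p { margin: 0 ! important }"))
```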

Miscellaneous CSS property values

MAMA generally tracked only CSS properties in this version. Future MAMA versions plan to gather more details about CSS. Some other parts of CSS syntax were also harvested this time, but MAMA generally stayed away from the values used by the CSS properties. There were some exceptions—due to requests from co-workers, a few select property values were compiled.

Fig 9-1: Popularity of some CSS property values

Property[value]           Frequency
overflow[auto]            175,474
text-decoration[blink]    73,781
display[table]            53,517
position[fixed]           22,459
overflow[scroll]          12,602

Saarsoo's CSS study

Rene Saarsoo's university thesis work "Coding Practices of Web Pages" was groundbreaking in its coverage of both the breadth and depth of CSS usage on a large-scale URL set. I discovered this study very late in MAMA's most recent development cycle and was impressed with the scope of the information presented, especially compared with the CSS information that MAMA was gathering. Saarsoo was able to discover a number of things that MAMA did not, but the reverse is also true. Together, these two studies reveal a substantial amount of information about CSS usage on the Web.

When developing code, some things are easy ... and some are hard. For MAMA and the way it was designed, information about CSS selectors, property values, and units was among the harder things to analyze. Saarsoo's study represented these areas very well. In the future, the Perl CSS::SAC parser used in Saarsoo's study will be integrated into MAMA in the hopes of gathering similar data for it to scrutinize and correlate.

By analyzing CSS selectors, Saarsoo was able to look at the actual Class and Id attributes referenced by CSS. MAMA did not do this, but it did look at all Class and Id attributes used in markup. By combining these two, an interesting comparison could be generated about how the attributes specified in a page are used, and left unused, by CSS.

Some loose comparisons between MAMA and Saarsoo's CSS results

Saarsoo's study looked at some factors that do not have direct comparisons in MAMA, but we can look at data of a similar nature for instructive parallels. For instance, Saarsoo's study looks at CSS usage of image formats for various purposes. It showed that the GIF format is used almost 4-to-1 over the JPEG format, with PNG trailing FAR behind both. MAMA's look at inline and background image usage in markup also shows that the GIF format is dominant, but JPEG usage is only slightly less popular, and the margin to PNG's third place ranking is much smaller.

MAMA's look at the FONT element reveals trends in font name usage and colors specified for them. These findings can be compared to Saarsoo's look at the CSS 'font-family' property and the general usage of CSS color units. We can see that the popularity of the top font names is almost exactly the same between these studies: "Arial", "Helvetica", "Verdana", and "sans-serif" are definitely kings. Saarsoo concluded that the #rrggbb color syntax was the most popular in CSS, and this is also true with the FONT/Color markup usage. His results regarding the most popular colors also agree with MAMA's findings about the FONT element.


This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
