MAMA: CSS syntax
- Previous article—MAMA: CSS quantities and sizes
- Next article—MAMA: Scripting - quantities and sizes
- Table of contents
Index:
- Introduction
- At-rules
- External CSS file names
- External CSS MIME types
- Media types
- Pseudo-classes and pseudo-elements
- CSS properties
- Notable CSS syntax: Inherit and !important
- Miscellaneous CSS property values and other syntax
- Saarsoo's CSS study
Introduction
MAMA's look at CSS covered a number of different areas. It looked at external CSS
(the LINK element), embedded CSS (via the STYLE element), and inline CSS (using the
common Style attribute). It also delved into CSS specified using the @import
syntax and did its best to reveal CSS usage in XML with the "xml-stylesheet"
processing instruction. Overall, CSS was detected in 2,821,141 (80.39%) of the
URLs that MAMA analyzed. CSS properties, MIME types, media types, and other
syntax were tracked, but more can be done to analyze CSS usage; Saarsoo's study
proved that. We will look at the factors, many unique to this analysis, that MAMA
tracked in its study of CSS, and then we will compare some of the commonalities
and differences between the results MAMA found and what Saarsoo was able to
discover in his study.
At-rules
MAMA tracked 3 types of CSS at-rule syntax: @import, @media and @charset.
For @charset, only the existence of the at-rule was tracked. In the case of
@media, the existence was tracked, along with the stated media type values (see
the Media types section below for more). Lastly, @import statements were
dissected and analyzed for their file names and media types.
At-rule type | Frequency | % Total CSS usage |
---|---|---|
@import | 191,496 | 6.79% |
@media | 63,293 | 2.24% |
@charset | 30,022 | 1.06% |
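As a rough illustration of this kind of tracking, the sketch below checks a block of CSS text for the three at-rules and recomputes the percentage column from the totals quoted in this article. The pattern and the function names are assumptions made for this example; MAMA's actual detection code is not shown here.

```python
import re

# Hypothetical pattern (not MAMA's): any of the three tracked at-rule keywords.
# It is naive on purpose and would also match keywords inside CSS comments.
AT_RULE_RE = re.compile(r"@(import|media|charset)\b", re.IGNORECASE)


def detect_at_rules(css_text: str) -> set[str]:
    """Return the set of tracked at-rule names that appear in css_text."""
    return {m.group(1).lower() for m in AT_RULE_RE.finditer(css_text)}


def percent_of_css_usage(frequency: int, css_total: int = 2_821_141) -> float:
    """Share of the CSS-using URLs (2,821,141 in this study), in percent."""
    return round(100 * frequency / css_total, 2)


sample = '@charset "utf-8"; @IMPORT url(extra.css); p { color: red }'
found = detect_at_rules(sample)            # {'charset', 'import'}
share = percent_of_css_usage(191_496)      # 6.79, as in the table above
```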
@import usage: quantities
When used in a document, this syntax represents an additional source of CSS
content. The CSS @import syntax is necessary to analyze but tricky to handle:
an @import statement can point to other @import statements, and it can even
point to itself in endless recursion. To sidestep such logistical headaches,
only the first-level @import URLs were resolved, downloaded and added to the
CSS analysis queue for MAMA. These first-level @import situations were detected
in 191,496 URLs.
The quantities and sizes of @import statements were also tracked. The most
extreme case originally registered as having 1,224 @import statements, but more
recent scrutiny exposed only 68 (still high, but not astronomical like before).
When @import is used, the average number of statements was 2.3 and the most
popular number was 1. The top @import case (verifiable at the time of writing)
was http://www.lmmeteoven.org/, which points out an issue in MAMA's detection
strategy. Sure, 151 @import statements are detected, but the majority of those
are repeated declarations; there are only one to two dozen unique URLs
represented there. A full frequency table of @import quantities is available.
URL | Quantity @import |
---|---|
http://www.lmmeteoven.org/ | 151 |
http://www.esu1.org/ | 97 |
http://www.educarchile.cl/... | 68 |
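The gap between detected statements and unique URLs can be seen with a small sketch. It pulls first-level @import statements out of a style sheet, resolves them against a base URL, and compares the raw count with the de-duplicated count. The regular expression, the function name and the example.com URLs are assumptions for illustration only, not MAMA's implementation, and the pattern does not try to cover every legal @import form.

```python
import re
from urllib.parse import urljoin

# Assumed pattern covering the common forms:
#   @import "x.css";   @import url(x.css);   @import url("x.css") print, screen;
IMPORT_RE = re.compile(
    r"""@import\s*(?:url\(\s*)?["']?([^"')\s;]+)["']?\s*\)?\s*([^;]*);""",
    re.IGNORECASE,
)


def first_level_imports(css_text, base_url):
    """Return (absolute_url, media_list) for each @import statement found.

    Only the given text is scanned; imported sheets are never followed,
    which mirrors the first-level-only strategy described above.
    """
    results = []
    for match in IMPORT_RE.finditer(css_text):
        url = urljoin(base_url, match.group(1))
        media = [m.strip().lower() for m in match.group(2).split(",") if m.strip()]
        results.append((url, media))
    return results


css = """
@import url("base.css");
@import "base.css";
@import url(print.css) print;
"""
imports = first_level_imports(css, "http://example.com/styles/")
total = len(imports)                         # 3 statements, repeats included
unique = len({url for url, _ in imports})    # 2 distinct URLs
```

Counting `unique` rather than `total` is the kind of adjustment that would keep a page with many repeated declarations from dominating the quantity table.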
@import usage: sizes
As with external style sheets, the extreme size values here point out some
problems with MAMA's strategies for deciding what to do with @import content.
The list of URLs that MAMA analyzes is not reduced to unique URLs, which results
in inflated sums. This is an issue when the same @import object is referenced in
multiple sub-frames, but some pages repeat an @import reference multiple times,
even in the same document!
Size range | Frequency | Size range | Frequency | Size range | Frequency |
---|---|---|---|---|---|
=0 | 3,331,624 | >7000 && <=8000 | 6,881 | >25000 && <=30000 | 6,089 |
>0 && <=500 | 14,350 | >8000 && <=9000 | 5,355 | >30000 && <=35000 | 4,732 |
>500 && <=1000 | 7,569 | >9000 && <=10000 | 4,737 | >35000 && <=40000 | 8,540 |
>1000 && <=2000 | 17,123 | >10000 && <=12000 | 8,461 | >40000 && <=45000 | 2,423 |
>2000 && <=3000 | 17,252 | >12000 && <=14000 | 6,765 | >45000 && <=50000 | 2,055 |
>3000 && <=4000 | 9,426 | >14000 && <=16000 | 5,396 | >50000 && <=75000 | 4,791 |
>4000 && <=5000 | 8,751 | >16000 && <=18000 | 5,108 | >75000 && <=100000 | 1,533 |
>5000 && <=6000 | 8,693 | >18000 && <=20000 | 4,138 | >100000 && <=150000 | 1,523 |
>6000 && <=7000 | 7,024 | >20000 && <=25000 | 8,280 | >150000 | 561 |
URL | @import size (chars) |
---|---|
http://www.boone.k12.ky.us/index.htm (URL no longer active) | 874,643 |
http://www.waitrose.com/inspiration/wfi.aspx (URL no longer active) | 751,734 |
http://www.educarchile.cl/... | 646,393 |
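For readers who want to reproduce the size-range grouping, the following sketch maps a cumulative @import size (in characters) to one of the ranges used in the table above. The boundaries are copied from that table; the function name and the use of a binary search are choices made for this example only.

```python
from bisect import bisect_left

# Inclusive upper bounds of the size ranges in the table, in characters.
UPPER_BOUNDS = [
    0, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
    12000, 14000, 16000, 18000, 20000, 25000, 30000, 35000, 40000, 45000,
    50000, 75000, 100000, 150000,
]


def size_range_label(size: int) -> str:
    """Return the range label for a non-negative @import size."""
    i = bisect_left(UPPER_BOUNDS, size)   # first bound that is >= size
    if i == 0:
        return "=0"
    if i == len(UPPER_BOUNDS):
        return ">150000"
    return f">{UPPER_BOUNDS[i - 1]} && <={UPPER_BOUNDS[i]}"


label = size_range_label(7_500)    # '>7000 && <=8000'
```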
External CSS file names
Tracking CSS file names is an example of a MAMA feature in search of a reason for
existing. The URLs of external style sheets from LINK and @import-ed CSS were
reduced to just the final file name portion and this was stored in MAMA. This
originally began as a request from a co-worker to track file names used by
external scripts. With scripts, I knew that this data would be compelling and
useful. The code for tracking script file names was easy to replicate for
external CSS files, but I did not know what the result would be. It turns out it
is not very compelling. The popular file names used for external CSS files are
rather tedious and obvious: "style.css", "main.css", "default.css", and the
inspired "css.css" are among the devastatingly insightful author choices for CSS
file names. Yes, the full frequency table is also available.
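Reducing a style sheet URL to its final file name portion can be sketched in a few lines. The function name, the lowercasing step and the example.com and example.org URLs are assumptions for this illustration; the article does not say whether MAMA normalized case.

```python
from collections import Counter
from posixpath import basename
from urllib.parse import urlsplit


def css_file_name(url: str) -> str:
    """Keep only the last path segment of a URL, dropping query and fragment."""
    # Lowercasing is an assumption so that "Main.CSS" and "main.css" tally together.
    return basename(urlsplit(url).path).lower()


urls = [
    "http://example.com/css/style.css?v=2",
    "http://example.org/Main.CSS",
    "http://example.org/themes/style.css",
]
name_counts = Counter(css_file_name(u) for u in urls)
# name_counts["style.css"] is 2 and name_counts["main.css"] is 1
```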
External CSS MIME types
This feature tracked the actual returned MIME type of the external CSS files
(via LINK and @import references). It did not trust any reported LINK Type
attribute, if present. The actual result is what one would expect: almost 99% of
all external CSS files are delivered with a "text/css" MIME type. Other types
were encountered, but some are puzzling. Why would some external CSS be served
as an image or JavaScript MIME type? Are these servers just misconfigured? The
distant second-place value of "text/html" has two easy explanations:
misconfigured Web servers (again), or pseudo-404 errors redirecting to full
HTML error pages. 134,839 URLs with external style sheet references had at least
one that had no MIME type at all. Once again, MAMA comes through with a
full frequency table for your viewing pleasure.
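A tally like this depends on comparing bare MIME types rather than full header values. The sketch below assumes the returned type arrives as a Content-Type header string, possibly with parameters such as a charset, and that a missing header should be counted under its own label. Both the function name and the "(none)" label are invented for this example.

```python
from collections import Counter


def bare_mime_type(content_type):
    """Strip parameters (e.g. '; charset=utf-8') from a Content-Type value."""
    if not content_type:
        return "(none)"          # hypothetical label for a missing MIME type
    return content_type.split(";", 1)[0].strip().lower()


headers = ["text/css", "text/css; charset=utf-8", "Text/HTML", None]
mime_counts = Counter(bare_mime_type(h) for h in headers)
# mime_counts["text/css"] is 2; "text/html" and "(none)" are 1 each
```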
Media types
In all, 404,212 pages specified at least one CSS media type. Media types were
detected by looking at the Media attribute of all LINK and STYLE elements, as
well as the CSS @media at-rule syntax. The resulting list of media types was
then matched against the following regular expression:
Regexp:
/(all|screen|print|speech|aural|handheld|projection|tv)/i
Any media type that was not recognized fell into a catch-all category termed "other". What were some of the "other" media types? The 3 main types that were noticeable in significant quantities are all from CSS2—"braille", "embossed" and "tty". These values will definitely be added to the regular expression above the next time a big analysis is done.
Media type | Frequency | % Total media usage | Media type | Frequency | % Total media usage |
---|---|---|---|---|---|
screen | 252,948 | 62.58% | other | 7,119 | 1.76% |
print | 171,328 | 42.39% | tv | 5,770 | 1.43% |
all | 130,227 | 32.22% | aural | 2,533 | 0.63% |
projection | 31,651 | 7.83% | speech | 301 | 0.07% |
handheld | 22,316 | 5.52% | | | |
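The matching step with its catch-all category can be sketched as follows. The alternation is the one from the regular expression above; splitting the value on commas and requiring each item to match in full are assumptions of this example, not something the article states about MAMA.

```python
import re

MEDIA_RE = re.compile(
    r"(all|screen|print|speech|aural|handheld|projection|tv)", re.IGNORECASE
)


def classify_media(media_value: str) -> list[str]:
    """Map each comma-separated media type to a known name or to 'other'."""
    labels = []
    for item in media_value.split(","):
        item = item.strip()
        if not item:
            continue
        match = MEDIA_RE.fullmatch(item)   # assumption: whole-item match only
        labels.append(match.group(1).lower() if match else "other")
    return labels


known = classify_media("Screen, print")     # ['screen', 'print']
unknown = classify_media("braille, tty")    # ['other', 'other']
```

Adding "braille", "embossed" and "tty" to the alternation would move those values out of the "other" bucket.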
Pseudo-classes and pseudo-elements
There are a number of these constructs defined in CSS2 and CSS3. A subset of
pseudo-classes and pseudo-elements was chosen for tracking in MAMA. Some
obvious/important pseudo-classes were overlooked in this analysis, specifically
":active", ":link" and ":visited". It must be stressed that these are not the
only possible pseudo-classes and pseudo-elements, just most of the ones that
were widely (or soon to be widely) implemented by browsers at the time they
were added to MAMA.
A simple regular expression match was performed on all CSS content looking for the following pattern:
Regexp:
/:(hover|focus|root|empty|not|first-child|first-node|last-node|last-child|lang|before|after|first-letter|first-line)/gios
The pseudo-class :hover is used in two-thirds of all pages that use CSS.
The pseudo-element :after is (strangely) almost 3 times more popular than
:before. The pseudo-class :first-child is more than 4 times as frequent as
:last-child (although that can probably be attributed to :first-child being in
CSS2, while :last-child was not added until CSS3). The typography distinctions
that are :first-letter and :first-line are not that widely used, although
authors clearly prefer to control the initial letter of a block more than 3
times as much as the initial line.
Pseudo-class/element | Frequency | % Total CSS usage | Pseudo-class/element | Frequency | % Total CSS usage |
---|---|---|---|---|---|
hover | 1,918,442 | 68.00% | first-line | 4,476 | 0.16% |
after | 96,541 | 3.42% | not | 2,785 | 0.10% |
focus | 94,953 | 3.37% | lang | 2,546 | 0.09% |
before | 34,558 | 1.22% | root | 2,267 | 0.08% |
first-child | 24,769 | 0.88% | empty | 392 | 0.01% |
first-letter | 15,804 | 0.56% | last-node | 75 | 0.00% |
last-child | 5,826 | 0.21% | first-node | 74 | 0.00% |
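A Python rendering of the same simple match is sketched below. The alternation is taken from the regular expression above. Counting each name at most once per page is an assumption about how the frequencies were tallied, and the sample pages are invented. As with the original pattern, this is a plain text match and not a CSS parser, so it can be fooled by look-alike text.

```python
import re
from collections import Counter

PSEUDO_RE = re.compile(
    r":(hover|focus|root|empty|not|first-child|first-node|last-node|last-child"
    r"|lang|before|after|first-letter|first-line)",
    re.IGNORECASE | re.DOTALL,
)


def pseudo_names(css_text: str) -> set[str]:
    """Return the tracked pseudo-class/element names found in css_text."""
    return {m.group(1).lower() for m in PSEUDO_RE.finditer(css_text)}


pages = {
    "page-1": "a:hover { color: red } p:first-line { font-weight: bold }",
    "page-2": "a:HOVER { color: blue } a:hover { color: green }",
}
url_counts = Counter()
for css in pages.values():
    url_counts.update(pseudo_names(css))   # a page counts once per name
# url_counts["hover"] is 2 and url_counts["first-line"] is 1
```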
CSS properties
The most popular CSS properties are the replacements for standard "old school"
HTML presentational markup. Three of the top five properties replicate the
functionality of the FONT element, and the remaining ones take over for the U,
S, STRIKE and B elements. For CSS Box Model properties ('border', 'margin', and
'padding'), the shorthand versions are more popular than their component forms,
but the reverse is true for the 'font' and 'background' properties. Going by the
table below, the most popular CSS Box Model side properties are top for
'margin', bottom for 'border', and left for 'padding'.
CSS property | Frequency | % Total CSS usage | CSS property | Frequency | % Total CSS usage |
---|---|---|---|---|---|
color | 2,400,643 | 85.09% | margin-bottom | 1,173,093 | 41.58% |
font-size | 2,336,689 | 82.83% | margin-left | 1,125,675 | 39.90% |
font-family | 2,223,829 | 78.83% | position | 1,095,461 | 38.83% |
text-decoration | 2,113,412 | 74.91% | padding-left | 989,492 | 35.07% |
font-weight | 2,012,992 | 71.35% | background | 958,127 | 33.96% |
background-color | 1,698,366 | 60.20% | display | 954,047 | 33.82% |
width | 1,596,974 | 56.61% | margin-right | 936,379 | 33.19% |
text-align | 1,448,336 | 51.34% | font-style | 933,690 | 33.10% |
height | 1,428,991 | 50.65% | background-image | 922,566 | 32.70% |
border | 1,376,821 | 48.80% | padding-top | 905,852 | 32.11% |
margin | 1,317,016 | 46.68% | border-bottom | 894,900 | 31.72% |
padding | 1,276,661 | 45.25% | top | 890,666 | 31.57% |
margin-top | 1,241,997 | 44.02% | left | 857,439 | 30.39% |
line-height | 1,179,743 | 41.82% | padding-bottom | 828,349 | 29.36% |
Browser vendor CSS property extensions
The major browser makers have extended CSS over the years, and documents on the
Web show just how much effect this has had on authoring practice. Mozilla's
'-moz-opacity' is the most popular one, with the standardized version 'opacity'
being only slightly more popular. Microsoft Office CSS extensions (prefixed by
"mso-") have the highest representation overall, with 202 (!!) different CSS
properties in the frequency table. Adobe ("-adbe-"), Apple/Safari ("-webkit-"),
KDE ("-khtml-"), Microsoft ("-ms-") and Opera ("-o-") are all also represented
by CSS browser extensions.
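Grouping properties by vendor comes down to a prefix lookup. The sketch below uses only the prefixes named in the paragraph above; the mapping, the function name and the sample property list are assumptions for illustration.

```python
from collections import Counter

# Prefix-to-vendor mapping, limited to the prefixes mentioned in this article.
VENDOR_PREFIXES = {
    "-moz-": "Mozilla",
    "mso-": "Microsoft Office",
    "-adbe-": "Adobe",
    "-webkit-": "Apple/Safari",
    "-khtml-": "KDE",
    "-ms-": "Microsoft",
    "-o-": "Opera",
}


def vendor_of(property_name: str):
    """Return the vendor for a prefixed property, or None if it has no known prefix."""
    name = property_name.strip().lower()
    for prefix, vendor in VENDOR_PREFIXES.items():
        if name.startswith(prefix):
            return vendor
    return None


properties = ["-moz-opacity", "opacity", "-webkit-border-radius", "-MOZ-opacity"]
vendor_counts = Counter(v for v in map(vendor_of, properties) if v)
# vendor_counts["Mozilla"] is 2 and vendor_counts["Apple/Safari"] is 1
```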
Notable CSS syntax: inherit and !important
Two keywords in CSS have special meaning: they are not selectors, and they are
not properties. The "inherit" keyword is a special global property value used to
explicitly pass on a particular value from a parent to a child. Just under 10%
of all URLs using CSS (278,743 URLs) use this keyword at least once. The other
special keyword is "!important", which specifies a shift in the bias of a
document's cascade order toward a specific CSS rule. It was found in 155,449 of
MAMA's URLs (over 5% of all cases using CSS). These numbers seem significant,
but if one frames the numbers in perspective with the CSS property frequency
table, optimism is quickly deflated. For instance, there are almost 75 CSS
properties that are more popular than the "!important" keyword, including the
non-standard 'filter' and most of the scrollbar properties.
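Detecting these two keywords in raw CSS text might look like the sketch below. Both patterns are assumptions for this example: "inherit" is only counted when it stands as a whole property value, and whitespace is allowed between "!" and "important". Neither pattern accounts for comments or strings.

```python
import re

# Assumption: "inherit" must be the entire value of a declaration.
INHERIT_RE = re.compile(
    r":\s*inherit\s*(?:!\s*important\s*)?(?:;|\}|$)", re.IGNORECASE
)
IMPORTANT_RE = re.compile(r"!\s*important\b", re.IGNORECASE)


def uses_inherit(css_text: str) -> bool:
    """True if any declaration in css_text has the value 'inherit'."""
    return INHERIT_RE.search(css_text) is not None


def uses_important(css_text: str) -> bool:
    """True if css_text contains an '!important' marker."""
    return IMPORTANT_RE.search(css_text) is not None


sheet = "p { color: inherit; } a { color: red !important }"
flags = (uses_inherit(sheet), uses_important(sheet))   # (True, True)
```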
Miscellaneous CSS property values
MAMA generally tracked only CSS properties in this version. Future MAMA versions plan to gather more details about CSS. Some other parts of CSS syntax were also harvested this time, but MAMA generally stayed away from the values used by the CSS properties. There were some exceptions—due to requests from co-workers, a few select property values were compiled.
Property [value] | Frequency | Property [value] | Frequency |
---|---|---|---|
overflow [auto] | 175,474 | position [fixed] | 22,459 |
text-decoration [blink] | 73,781 | overflow [scroll] | 12,602 |
display [table] | 53,517 | | |
Saarsoo's CSS study
Rene Saarsoo's university thesis work "Coding Practices of Web Pages" was groundbreaking in its coverage of both the breadth and depth of CSS usage on a large-scale URL set. I discovered this study very late in MAMA's most recent development cycle and was impressed with the scope of the information presented, especially compared with the CSS information that MAMA was gathering. Saarsoo was able to discover a number of things that MAMA did not, but the reverse is also true. Together, these two studies reveal a substantial amount of information about CSS usage on the Web.
When developing code, some things are easy ... and some are hard. For MAMA and the way it was designed, information about CSS selectors, property values, and units was among the harder things to analyze. Saarsoo's study represented these areas very well. In the future, the Perl CSS::SAC parser used in Saarsoo's study will be integrated into MAMA in the hopes of gathering similar data for it to scrutinize and correlate.
By analyzing CSS selectors, Saarsoo was able to look at the actual Class and Id
attributes referenced by CSS. MAMA did not do this, but it did look at all Class
and Id attributes used in markup. By combining these two, an interesting
comparison could be generated about how the attributes specified in a page are
used (and disused) by CSS.
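Once both lists exist, such a comparison is plain set arithmetic. The sketch below assumes the class names from markup and from CSS selectors have already been collected by some other means. The function name, the result keys and the sample class names are all hypothetical.

```python
def compare_class_usage(markup_classes, css_classes):
    """Split class names by whether markup and CSS agree on them."""
    markup, css = set(markup_classes), set(css_classes)
    return {
        "styled": markup & css,          # used in markup and referenced by CSS
        "unstyled": markup - css,        # used in markup, never referenced by CSS
        "unused_rules": css - markup,    # referenced by CSS, absent from markup
    }


report = compare_class_usage(
    markup_classes=["header", "nav", "promo"],
    css_classes=["header", "nav", "footer"],
)
# report["unstyled"] is {'promo'} and report["unused_rules"] is {'footer'}
```

The same function would work unchanged for Id values.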
Some loose comparisons between MAMA and Saarsoo's CSS results
Saarsoo's study looked at some factors that do not have direct comparisons in MAMA, but we can look at data of a similar nature for instructive parallels. For instance, Saarsoo's study looks at CSS usage of image formats for various purposes. It showed that the GIF format is used almost 4-to-1 over the JPEG format, with PNG trailing FAR behind both. MAMA's look at inline and background image usage in markup also shows that the GIF format is dominant, but JPEG usage is only slightly less popular, and the margin to PNG's third place ranking is much smaller.
MAMA's look at the FONT element reveals trends in font name usage and colors
specified for them. These findings can be compared to Saarsoo's look at the CSS
'font-family' property and the general usage of CSS color units. We can see that
the popularity of the top font names is almost exactly the same between these
studies: "Arial", "Helvetica", "Verdana", and "sans-serif" are definitely kings.
Saarsoo concluded that the #rrggbb color syntax was the most popular in CSS, and
this is also true with the FONT/Color markup usage. His results regarding the
most popular colors also agree with MAMA's findings about the FONT element.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.