MAMA: CSS quantities and sizes
Index:
- Introduction
- Quantities of CSS components
- Venn diagram breakdown: CSS usage
- Sizes: External CSS (LINK element)
- Sizes: Embedded CSS (STYLE element)
- Sizes: Inline CSS (Style attribute)
Introduction
CSS was detected in 2,821,141 of MAMA's URLs (80.39%). "Using CSS" was defined as having any sort of content from 1 or more of the following 3 sources:
- External CSS via the LINK element in the HEAD of the document
- Embedded CSS by way of STYLE element content
- Style attribute content
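The three-source definition above can be sketched with Python's standard `html.parser` module. This is a hypothetical illustration, not MAMA's actual implementation. It counts a LINK only when its rel attribute contains "stylesheet" and it has an href, and it counts STYLE elements and Style attributes only when they have non-whitespace content. Unlike the definition above, it does not check that the LINK actually sits in the HEAD. The names `CssSourceCounter`, `css_sources` and `uses_css` are made up for this sketch.

```python
from html.parser import HTMLParser


class CssSourceCounter(HTMLParser):
    """Counts the three CSS sources in one HTML document (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.counts = {"external": 0, "embedded": 0, "inline": 0}
        self._in_style = False
        self._style_has_content = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased; values may be None
        rel_tokens = (attrs.get("rel") or "").lower().split()
        # External CSS: LINK rel="stylesheet" with an href (HEAD placement not checked).
        if tag == "link" and "stylesheet" in rel_tokens and attrs.get("href"):
            self.counts["external"] += 1
        # Inline CSS: any element carrying a non-empty style attribute.
        if (attrs.get("style") or "").strip():
            self.counts["inline"] += 1
        if tag == "style":
            self._in_style = True
            self._style_has_content = False

    def handle_data(self, data):
        if self._in_style and data.strip():
            self._style_has_content = True

    def handle_endtag(self, tag):
        # Embedded CSS: a STYLE element that contained non-whitespace text.
        if tag == "style" and self._in_style:
            if self._style_has_content:
                self.counts["embedded"] += 1
            self._in_style = False


def css_sources(html):
    parser = CssSourceCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def uses_css(counts):
    """A page "uses CSS" if at least one of the three sources is present."""
    return any(n > 0 for n in counts.values())
```

Because a single pass produces all three counters, the same parser also yields the per-page occurrence counts discussed in the next section.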
MAMA also detected CSS usage in XML using the "xml-stylesheet" processing instruction but discovered this in only 568 cases. We will ignore those cases for simplicity in the following discussion about quantities and sizes.
Examining the URLs with the most extreme External CSS (and @import CSS) values exposed some flaws in MAMA's analysis strategy. Some of these issues had even been anticipated and workarounds created, but developers on the Web find ways to create the most outlandish code. These shortcomings will all be addressed in the next version; for now, they help demonstrate that analyzing a Web page is always a more complex proposition than expected.
Quantities of CSS components in Web pages
MAMA detected CSS usage in 3 distinct ways: via the common Style attribute, external CSS using the LINK element, and embedded CSS from the STYLE element. The Style attribute was the most common method, barely exceeding the total for the external CSS construct (LINK element). The "times per page" values and other counters represent the number of occurrences of the specific syntax encountered for a URL. For example, the maximum number of LINKed style sheets discovered in any single page was 44; the maximum number of Style attributes found was 21,293. The average "per-page" numbers listed in the table below (Fig 2-1) are calculated only over the URLs where each type of CSS was used; they do not cover the total MAMA URL space.
CSS type | Description | Total URLs containing CSS type | % Total CSS usage | Most popular quantity | Max. quantity per page | Avg. quantity per page |
---|---|---|---|---|---|---|
Style attribute | Contents of all Style attributes | 1,898,513 | 67.30% | 1 | 21,293 | 25.6 |
External CSS | Content from all LINK/Href/Rel="stylesheet" URLs | 1,836,260 | 65.09% | 1 | 44 | 1.5 |
Embedded CSS | Contents of all STYLE elements | 1,321,006 | 46.83% | 1 | 708 | 1.6 |
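The "% Total CSS usage" column divides each type's URL count by the 2,821,141 CSS-using URLs from the introduction, not by every URL MAMA analyzed. Because many pages mix methods, the three percentages sum to more than 100%. The minimal sketch below reproduces the column from the table's own figures. It also uses a made-up list of per-page counts to show how an average restricted to pages that contain a type ignores the pages without it. The function names are hypothetical.

```python
# Figures copied from the table above. The denominator is the number of URLs
# that used CSS at all (from the introduction), not every URL MAMA analyzed.
CSS_URLS = 2_821_141
URLS_WITH_TYPE = {
    "Style attribute": 1_898_513,
    "External CSS": 1_836_260,
    "Embedded CSS": 1_321_006,
}


def share_of_css_urls(count, css_urls=CSS_URLS):
    """Percentage of CSS-using URLs that contain a given CSS type."""
    return round(100 * count / css_urls, 2)


def avg_per_page(per_page_counts):
    """Average occurrences per page, over only the pages where the type appears."""
    present = [n for n in per_page_counts if n > 0]
    return sum(present) / len(present) if present else 0.0


if __name__ == "__main__":
    for css_type, count in URLS_WITH_TYPE.items():
        print(css_type, share_of_css_urls(count))
    # Made-up per-page counts: the conditional average ignores the zeros.
    print(avg_per_page([0, 0, 3, 1]))  # 2.0
```

Running it prints 67.3, 65.09 and 46.83 for the three types, matching the table.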
Maximum quantities of the CSS usage types
URL | External CSS qty |
---|---|
http://www.rocblanchotels.com/ | 45 |
http://www2k.biglobe.ne.jp/~hideko/ (URL no longer active) | 39 |
http://www.frelimo.org.mz/ | 34 |
URL | Embedded CSS qty |
---|---|
http://www.1000irani.com/ | 708 |
http://www.australianet.net/tra/ (URL no longer active) | 665 |
http://mocambique1.blogs.sapo.pt/ (URL no longer active) | 196 |
URL | Style attribute qty |
---|---|
http://www.albumdefamille.neufblog.com/ (URL no longer active) | 21,293 |
http://www.azeribook.com/folk/molla_nasreddin.htm (URL no longer active) | 20,602 |
http://members.1012surfnet.at/sabatieu/ (URL no longer active) | 19,041 |
Venn diagram: CSS usage by type
The most popular combination of CSS methods is external CSS in conjunction with inline CSS. The least popular combination is external CSS paired with embedded CSS. A Venn diagram gives a clearer view of how the different CSS methods are used and where they intersect:
[Figure: Venn diagram of CSS usage by type (external, embedded, inline). Note: region sizes are not to scale.]
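The regions of such a diagram can be derived by grouping URLs by the exact set of CSS types each one uses. The sketch below is a hypothetical Python illustration. It assumes each URL has already been reduced to a set such as `{"external", "inline"}`, and it treats each region as exclusive, so every URL is counted once for its exact combination. The helper `overlap_total` then sums every region containing the requested types, which gives the total for an overlap regardless of any third method.

```python
from collections import Counter


def venn_regions(per_url_types):
    """Count URLs per exclusive Venn region (the exact set of CSS types used)."""
    regions = Counter()
    for types in per_url_types:
        if types:  # URLs with no CSS fall outside the diagram
            regions[frozenset(types)] += 1
    return regions


def overlap_total(regions, *types):
    """URLs using at least all of `types`: the sum of every region containing them."""
    wanted = set(types)
    return sum(n for key, n in regions.items() if wanted <= key)
```

The distinction matters when reading "most popular combination": an exclusive region and a total overlap count can rank the pairings differently.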
Sizes: External CSS (LINK element)
Tracking external CSS generated some errors and exposed a few problems that will be changed in MAMA's next version. A number of cases (~30) had a total analyzed external CSS size of about 5 megabytes, but these sizes could not be reproduced at all with the live versions of those URLs. Five megabytes is the maximum size MAMA allowed any downloadable component to reach before aborting, which explains the clustering around that value. A runaway or race condition may have occurred in any number of places: the Web server, network conditions, the Perl LWP fetching code, etc. There is no way to be sure what happened. Some of the remaining extreme size cases exhibit aberrant behavior; for example, one URL had an external CSS reference to an MP4 video! Another URL had a reasonably sized external CSS file, but with a large number of garbage characters at the very end (pushing the file size over a megabyte). Other instances demonstrated a weakness in MAMA's approach to serializing framed content for analysis: sometimes an external CSS file was referenced in multiple frames and was therefore added more than once to the overall size sums.
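Two of the failure modes above suggest simple guards. One is runaway downloads that hit the cap; the other is the same style sheet being counted once per frame. The sketch below is a hypothetical Python illustration, not MAMA's Perl LWP code. `read_capped` stops reading a response at a byte cap and reports whether the cap cut it off. `external_css_total` de-duplicates style sheet URLs across frames before summing. Reading "5 megabytes" as 5 × 1024 × 1024 bytes is an assumption. Note that sizes here are in bytes, whereas the article reports characters.

```python
import io

# Assumption: "5 megabytes" read as 5 * 1024 * 1024 bytes; the article does not say which.
MAX_COMPONENT_BYTES = 5 * 1024 * 1024


def read_capped(stream, limit=MAX_COMPONENT_BYTES, chunk_size=64 * 1024):
    """Read at most `limit` bytes from a file-like object.

    Returns (body, truncated); truncated is True only if more data remained.
    """
    buf = bytearray()
    while len(buf) < limit:
        chunk = stream.read(min(chunk_size, limit - len(buf)))
        if not chunk:  # end of stream before the cap
            return bytes(buf), False
        buf.extend(chunk)
    # Cap reached: peek one more byte to tell "exactly limit" from "longer".
    return bytes(buf), bool(stream.read(1))


def external_css_total(frames, css_sizes):
    """Sum external CSS sizes across a frameset, counting each style sheet URL once.

    frames: one list of absolute style sheet URLs per frame or IFRAME.
    css_sizes: mapping from style sheet URL to its fetched size.
    """
    seen = set()
    total = 0
    for frame_links in frames:
        for url in frame_links:
            if url not in seen:
                seen.add(url)
                total += css_sizes.get(url, 0)
    return total


if __name__ == "__main__":
    body, truncated = read_capped(io.BytesIO(b"x" * 10), limit=4)
    print(len(body), truncated)  # 4 True
    print(external_css_total([["a.css", "b.css"], ["a.css"]],
                             {"a.css": 100, "b.css": 50}))  # 150
```

Flagging truncated bodies lets an analysis exclude them, so they no longer pile up at the cap value.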
The largest case of external CSS use that was both realistic and verifiable at the time of writing was http://www.goldeneaglecoin.com/, with 868,426 characters. The site http://www.skatteinform.dk/ (URL no longer active) had a high number of garbage characters, and http://www.vanderlande.com was the case where MAMA repeatedly added the external CSS referenced in all of its IFRAMEs to the overall sizes. To partially offset these problems, all cases above 900,000 characters were discarded, giving an average of 8,387.1 characters for external style sheets. MAMA attempted to filter out Web servers silently redirecting to non-CSS content, but more could be done: there are probably a number of cases where external CSS redirects to the URL's home page or other HTML pages. An example that demonstrates the effects of failing to catch this is http://www.dmc.tv/, where 5 of the 13 detected external CSS references silently pointed to HTML pages.
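The two remedies described here can be sketched as follows. The first discards extreme totals before averaging; the second catches HTML served in place of a style sheet. Both functions are hypothetical. The 900,000-character cutoff comes from the text. The HTML-sniffing rules are assumptions, not MAMA's actual filter: a Content-Type mentioning html, a leading doctype or `<html>` tag, or a `<body>` tag near the start of the response. The size list is assumed to hold one total per URL that used external CSS.

```python
def mean_below_cutoff(sizes, cutoff=900_000):
    """Average of the sizes that do not exceed `cutoff`; larger values are dropped."""
    kept = [s for s in sizes if s <= cutoff]
    return sum(kept) / len(kept) if kept else 0.0


def looks_like_html(body, content_type=""):
    """Heuristic check for an HTML page served where a style sheet was expected."""
    if "html" in content_type.lower():
        return True
    head = body.lstrip()[:1024].lower()
    return head.startswith(("<!doctype html", "<html")) or "<body" in head
```

A Content-Type check alone would miss servers that label an HTML error page as text/css. That is why the sketch also inspects the start of the body.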
Size distribution of external CSS (LINK element)
Size range (chars) | Frequency | Size range (chars) | Frequency | Size range (chars) | Frequency |
---|---|---|---|---|---|
=0 | 1,727,948 | >7000 && <=8000 | 63,619 | >25000 && <=30000 | 32,007 |
>0 && <=500 | 121,838 | >8000 && <=9000 | 54,192 | >30000 && <=35000 | 28,166 |
>500 && <=1000 | 152,027 | >9000 && <=10000 | 47,634 | >35000 && <=40000 | 15,931 |
>1000 && <=2000 | 267,587 | >10000 && <=12000 | 79,179 | >40000 && <=45000 | 9,562 |
>2000 && <=3000 | 202,354 | >12000 && <=14000 | 61,637 | >45000 && <=50000 | 10,112 |
>3000 && <=4000 | 150,273 | >14000 && <=16000 | 42,080 | >50000 && <=75000 | 19,512 |
>4000 && <=5000 | 115,056 | >16000 && <=18000 | 33,164 | >75000 && <=100000 | 6,874 |
>5000 && <=6000 | 105,108 | >18000 && <=20000 | 28,657 | >100000 && <=150000 | 4,403 |
>6000 && <=7000 | 80,456 | >20000 && <=25000 | 48,152 | >150000 | 1,652 |
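Each size range in these distribution tables reads as a half-open interval: a value belongs to ">lower && <=upper". A hypothetical sketch of that bucketing with Python's `bisect` module follows. The bound list is copied from the external CSS table. The embedded and inline tables would need their own, coarser list passed as `bounds`.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the size ranges in the external CSS table above.
EXTERNAL_CSS_BOUNDS = [0, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000,
                       8000, 9000, 10000, 12000, 14000, 16000, 18000, 20000,
                       25000, 30000, 35000, 40000, 45000, 50000, 75000,
                       100000, 150000]


def range_label(size, bounds=EXTERNAL_CSS_BOUNDS):
    """Label `size` with the (lower, upper] range it falls into."""
    i = bisect_left(bounds, size)  # index of the first bound >= size
    if i == 0:
        return "=0"  # size 0: no content of this type
    if i == len(bounds):
        return f">{bounds[-1]}"
    return f">{bounds[i - 1]} && <={bounds[i]}"


def histogram(sizes, bounds=EXTERNAL_CSS_BOUNDS):
    """Frequency of each size range, in the same form as the tables."""
    return Counter(range_label(s, bounds) for s in sizes)
```

`bisect_left` returns the position of the first bound that is greater than or equal to the size, which is exactly the inclusive upper end of that size's range.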
Maximum sizes of external CSS (LINK element)
URL | External CSS size (chars) |
---|---|
http://www.skatteinform.dk/ (URL no longer active) | 1,051,671 |
http://www.goldeneaglecoin.com/ | 868,426 |
http://www.vanderlande.com/Pages/default.aspx (URL no longer active) | 763,681 |
Sizes: Embedded CSS (STYLE element)
MAMA detected a URL with embedded CSS sections totalling 1,572,504 characters, but that points out a potential problem with this kind of tally: whitespace. The huge sizes reported for the top 3 URLs listed below are the result of EXCESSIVE use of extra whitespace. Such situations are probably the result of misconfigured Web servers and large amounts of code from pre-processed languages (such as PHP or ASP). The pre-processed code itself is not present in the rendered document, but extra whitespace is emitted for every line of that code, resulting in an astronomical number of spaces that dominate the document's downloaded size. With that caveat in mind, MAMA's average embedded CSS size was 953.6 characters.
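One way to gauge how much of a STYLE block's size is whitespace is to measure it twice. Take the raw size, then the size with every whitespace run collapsed to a single space. The sketch below is a hypothetical Python illustration using the standard `html.parser` module, and it is not how MAMA computed its figures. The collapsing rule is a crude assumption; for instance, it also collapses whitespace inside CSS strings.

```python
import re
from html.parser import HTMLParser


class StyleTextCollector(HTMLParser):
    """Collects the text content of every STYLE element."""

    def __init__(self):
        super().__init__()
        self._in_style = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.chunks.append(data)


def embedded_css_sizes(html):
    """Return (raw size, size with whitespace runs collapsed to one space)."""
    parser = StyleTextCollector()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    collapsed = re.sub(r"\s+", " ", raw).strip()
    return len(raw), len(collapsed)
```

A large gap between the two numbers points to the whitespace-inflation pattern described above rather than to genuinely large style sheets.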
Size distribution of embedded CSS (STYLE element)
Size range (chars) | Frequency | Size range (chars) | Frequency | Size range (chars) | Frequency |
---|---|---|---|---|---|
=0 | 2,196,509 | >5000 && <=6000 | 12,575 | >14000 && <=16000 | 1,237 |
>0 && <=500 | 865,538 | >6000 && <=7000 | 13,346 | >16000 && <=18000 | 920 |
>500 && <=1000 | 183,220 | >7000 && <=8000 | 7,252 | >18000 && <=20000 | 1,002 |
>1000 && <=2000 | 124,580 | >8000 && <=9000 | 5,544 | >20000 && <=40000 | 2,830 |
>2000 && <=3000 | 44,621 | >9000 && <=10000 | 2,249 | >40000 && <=60000 | 584 |
>3000 && <=4000 | 24,977 | >10000 && <=12000 | 4,455 | >60000 && <=80000 | 111 |
>4000 && <=5000 | 15,466 | >12000 && <=14000 | 2,040 | >80000 | 124 |
Maximum sizes of embedded CSS (STYLE element)
URL | STYLE element size (chars) |
---|---|
http://www.moundsviewschools.org/belair/home.htm | 1,572,504 |
http://www.macuisinechezvous.com/ | 1,049,159 |
http://www.procolharum.com/ | 1,048,724 |
http://www.kshamm.de/ | 956,362 |
Sizes: Inline CSS (Style attribute)
The maximum recorded aggregate size of inline CSS was an enormous 2,589,039 characters (the main document at that URL totalled over 3.5 megabytes, so over 70% of this big document's size was inline CSS! Excessive!). The average inline CSS size was 1,006.8 characters. All of the URLs with the largest inline CSS sizes listed below were created by Microsoft Office and make heavy use of its "mso-" CSS extensions. In the future, it could be interesting to look deeper at the relative sizes of inline CSS in Microsoft Office documents versus non-MS Office pages.
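The follow-up question raised here could start from a sketch like the one below: how much of a page's inline CSS consists of Microsoft Office's "mso-" extensions? It is hypothetical and uses Python's `html.parser` to collect every style attribute. It splits each value on semicolons, which mishandles semicolons inside quoted strings or url() values. It then counts the characters in declarations whose property name begins with "mso-". The names `InlineStyleCollector` and `mso_share` are made up for this illustration.

```python
from html.parser import HTMLParser


class InlineStyleCollector(HTMLParser):
    """Collects every non-empty style attribute value in a document."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                self.values.append(value)


def mso_share(html):
    """Return (total inline CSS chars, chars in declarations whose property starts with "mso-").

    Declaration lengths exclude the separating semicolons but include any
    whitespace around the declaration.
    """
    parser = InlineStyleCollector()
    parser.feed(html)
    parser.close()
    total = sum(len(v) for v in parser.values)
    mso = 0
    for value in parser.values:
        for decl in value.split(";"):
            if decl.strip().lower().startswith("mso-"):
                mso += len(decl)
    return total, mso
```

Dividing the second number by the first gives the "mso-" share for one page. Comparing that ratio across Office-generated and other pages is the comparison suggested above.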
Size distribution of inline CSS (Style attribute)
Size range (chars) | Frequency | Size range (chars) | Frequency | Size range (chars) | Frequency |
---|---|---|---|---|---|
=0 | 1,610,667 | >5000 && <=6000 | 14,804 | >14000 && <=16000 | 2,468 |
>0 && <=500 | 1,235,564 | >6000 && <=7000 | 10,296 | >16000 && <=18000 | 2,840 |
>500 && <=1000 | 262,306 | >7000 && <=8000 | 7,386 | >18000 && <=20000 | 1,267 |
>1000 && <=2000 | 196,019 | >8000 && <=9000 | 5,252 | >20000 && <=40000 | 4,587 |
>2000 && <=3000 | 77,092 | >9000 && <=10000 | 4,111 | >40000 && <=60000 | 1,092 |
>3000 && <=4000 | 39,154 | >10000 && <=12000 | 6,249 | >60000 && <=80000 | 403 |
>4000 && <=5000 | 23,166 | >12000 && <=14000 | 3,780 | >80000 | 677 |
Maximum sizes of inline CSS (Style attribute)
URL | Style attribute size (chars) |
---|---|
http://www.albumdefamille.neufblog.com/ (URL no longer active) | 2,589,039 |
http://www.azeribook.com/folk/molla_nasreddin.htm (URL no longer active) | 1,423,099 |
http://www.fw.hu/eventoj/steb/vortaroj/elektra-terminaro.htm (URL no longer active) | 959,898 |
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
Comments
The forum archive of this article is still available on My Opera.