MAMA: CSS quantities and sizes

By Brian Wilson

Index:

  1. Introduction
  2. Quantities of CSS components
  3. Venn diagram breakdown: CSS usage
  4. Sizes: External CSS (LINK element)
  5. Sizes: Embedded CSS (STYLE element)
  6. Sizes: Inline CSS (Style attribute)

Introduction

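CSS was detected in 2,821,141 of MAMA's URLs (80.39%). "Using CSS" was defined as having any sort of content from one or more of the following three sources: external CSS (the LINK element), embedded CSS (the STYLE element), and inline CSS (the Style attribute).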

MAMA also detected CSS usage in XML using the "xml-stylesheet" processing instruction but discovered this in only 568 cases. We will ignore those cases for simplicity in the following discussion about quantities and sizes.

Looking at some of the URLs for extreme cases with External CSS (and @import CSS) pointed out some flaws in MAMA's analysis strategy. Some of these issues were even anticipated and workarounds were created, but developers on the Web find ways to create the most outlandish code. These shortcomings will all be addressed in the next version, but for now they help demonstrate that analyzing a Web page is always a more complex proposition than expected.

Quantities of CSS components in Web pages

MAMA detected CSS usage in 3 distinct ways: via the common Style attribute, external CSS using the LINK element, and embedded CSS from the STYLE element. The Style attribute was the most common method, barely exceeding the total for the external CSS construct (LINK element). The "times per page" values and other counters represent the number of occurrences of the specific syntax encountered for a URL. For example, the maximum number of LINKed style sheets discovered in any single page was 44; the maximum number of Style attributes found was 21,293. The average "per-page" numbers listed in the table below (Fig 2-1) cover only the pages where each type of CSS was used, not the total MAMA URL space.
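The three detection paths described above can be sketched as a small counter. This is only an illustration using Python's standard `html.parser`, not MAMA's actual (Perl-based) implementation; the class name `CssUsageCounter` and the helper `count_css_usage` are hypothetical.

```python
from html.parser import HTMLParser


class CssUsageCounter(HTMLParser):
    """Counts the three CSS inclusion methods in one HTML document."""

    def __init__(self):
        super().__init__()
        self.link_stylesheets = 0   # LINK elements with rel="stylesheet" and an href
        self.style_elements = 0     # STYLE elements
        self.style_attributes = 0   # Style attributes on any element

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        attr_map = {name: (value or "") for name, value in attrs}
        if "style" in attr_map:
            self.style_attributes += 1
        if tag == "style":
            self.style_elements += 1
        elif tag == "link":
            rel_tokens = attr_map.get("rel", "").lower().split()
            if "stylesheet" in rel_tokens and attr_map.get("href"):
                self.link_stylesheets += 1


def count_css_usage(html_text):
    """Return per-page quantities for the three CSS types."""
    counter = CssUsageCounter()
    counter.feed(html_text)
    counter.close()
    return {
        "external": counter.link_stylesheets,
        "embedded": counter.style_elements,
        "inline": counter.style_attributes,
    }


sample = (
    '<html><head><LINK REL="stylesheet" HREF="a.css">'
    '<link rel="icon" href="f.ico"><style>p{color:red}</style></head>'
    '<body><p STYLE="color:blue">x</p><div style="margin:0"></div></body></html>'
)
print(count_css_usage(sample))
```

A page "uses CSS" in the sense of this article if any of the three counts is non-zero.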

Fig 2-1: Totals for different methods of CSS inclusion

| CSS type | Description | Total URLs containing CSS type | % Total CSS usage | Most popular quantity | Max. quantity per page | Avg. quantity per page |
|---|---|---|---|---|---|---|
| Style attribute | Contents of all Style attributes | 1,898,513 | 67.30% | 1 | 21,293 | 25.6 |
| External CSS | Content from all LINK/Href/Rel="stylesheet" URLs | 1,836,260 | 65.09% | 1 | 44 | 1.5 |
| Embedded CSS | Contents of all STYLE elements | 1,321,006 | 46.83% | 1 | 708 | 1.6 |
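The "% Total CSS usage" column appears to be each type's URL count divided by the 2,821,141 URLs that used any CSS. The following sketch reproduces that arithmetic under this assumption; the variable and function names are illustrative only.

```python
CSS_URLS = 2_821_141  # URLs in which MAMA detected any CSS (from the Introduction)

# URL counts per CSS type, copied from Fig 2-1.
URLS_BY_TYPE = {
    "Style attribute": 1_898_513,
    "External CSS": 1_836_260,
    "Embedded CSS": 1_321_006,
}


def share_of_css_usage(count, total=CSS_URLS):
    """Percentage of CSS-using URLs that contain a given CSS type."""
    return round(100 * count / total, 2)


shares = {name: share_of_css_usage(n) for name, n in URLS_BY_TYPE.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

The percentages sum to well over 100% because a single page can use more than one CSS type.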

Maximum quantities of the CSS usage types

Fig 2-2: URLs with the most external CSS

| URL | External CSS qty |
|---|---|
| http://www.rocblanchotels.com/ | 45 |
| http://www2k.biglobe.ne.jp/~hideko/ (URL no longer active) | 39 |
| http://www.frelimo.org.mz/ | 34 |

Fig 2-3: URLs with the most embedded CSS

| URL | Embedded CSS qty |
|---|---|
| http://www.1000irani.com/ | 708 |
| http://www.australianet.net/tra/ (URL no longer active) | 665 |
| http://mocambique1.blogs.sapo.pt/ (URL no longer active) | 196 |

Fig 2-4: URLs with the most Style attributes

| URL | Style attribute qty |
|---|---|
| http://www.albumdefamille.neufblog.com/ (URL no longer active) | 21,293 |
| http://www.azeribook.com/folk/molla_nasreddin.htm (URL no longer active) | 20,602 |
| http://members.1012surfnet.at/sabatieu/ (URL no longer active) | 19,041 |

Venn diagram: CSS usage by type

The most popular combination of CSS methods is external CSS in conjunction with inline CSS. The least popular combination is external CSS paired with embedded CSS. To get a clearer view of the uses and intersections of the different CSS methods, a Venn diagram is helpful:

[Figure: Venn diagram for CSS usage types. Note: region sizes are not to scale.]
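As a rough illustration of how the regions of such a diagram can be tallied, the sketch below counts each combination of CSS types from per-URL flags. The per-URL input format and the function name `venn_regions` are assumptions for the example; the sample rows are invented and are not MAMA data. The final lines apply the inclusion-exclusion identity to the published totals, assuming the 2,821,141 figure is exactly the union of the three types.

```python
from collections import Counter


def venn_regions(pages):
    """Tally Venn regions from (has_inline, has_external, has_embedded) flags per URL."""
    regions = Counter()
    for has_inline, has_external, has_embedded in pages:
        used = tuple(
            name
            for name, flag in (
                ("inline", has_inline),
                ("external", has_external),
                ("embedded", has_embedded),
            )
            if flag
        )
        if used:  # pages with no CSS at all fall outside the diagram
            regions[used] += 1
    return regions


# Invented sample rows, only to show the shape of the output.
toy_pages = [
    (True, True, False),
    (True, True, False),
    (False, True, True),
    (True, False, False),
    (False, False, False),
]
toy_regions = venn_regions(toy_pages)
print(toy_regions)

# Inclusion-exclusion on the published totals:
# |A| + |B| + |C| - |A or B or C| = (sum of pairwise overlaps) - (triple overlap)
overlap_excess = (1_898_513 + 1_836_260 + 1_321_006) - 2_821_141
print(overlap_excess)
```

The totals alone fix only this combined overlap figure; the individual regions need the per-URL data.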

Sizes: External CSS (LINK element)

Tracking external CSS generated some errors and exposed a few problems that will be addressed in MAMA's next version. A number of cases (~30) had a total analyzed external CSS size of about 5 megabytes, but these sizes could not be reproduced from the live versions of those URLs. Five megabytes is the maximum that MAMA allowed any downloadable component to reach before aborting, which explains the clustering around that value. There may have been a runaway or race condition in any number of places: the Web server, network conditions, the Perl LWP fetching code, etc. There is no way to be sure what happened. Some of the remaining extreme size cases exhibit aberrant behavior; for example, one URL had an external CSS reference to an MP4 video! Another URL had a reasonably sized external CSS file with a large number of garbage characters at the very end (making the file size over a megabyte). Other instances demonstrated a weakness in MAMA's approach to serializing framed content for analysis: an external CSS file was sometimes referenced in multiple frames and therefore added more than once to the overall size sums.
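Two of the issues above lend themselves to a short sketch: aborting a download at a size cap, and counting a style sheet only once when several frames reference it. This is a hypothetical Python illustration, not MAMA's Perl LWP code; the names `read_capped` and `total_external_css_size`, the chunk size, and the reading of "five megabytes" as 5 * 1024 * 1024 bytes are all assumptions.

```python
import io

MAX_COMPONENT_BYTES = 5 * 1024 * 1024  # assumed interpretation of the 5 MB cap


def read_capped(stream, limit=MAX_COMPONENT_BYTES, chunk_size=64 * 1024):
    """Read from a binary stream, aborting once more than `limit` bytes arrive.

    Returns (data, truncated). Truncated data clusters at exactly `limit` bytes.
    """
    chunks, total = [], 0
    while total <= limit:
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks), False  # stream ended below or at the cap
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks)[:limit], True  # cap exceeded, download aborted


def total_external_css_size(stylesheets_by_frame):
    """Sum style sheet sizes, counting each URL once across all frames.

    `stylesheets_by_frame` is a list with one entry per frame, each entry a
    list of (url, css_text) pairs.
    """
    size_by_url = {}
    for frame in stylesheets_by_frame:
        for url, css_text in frame:
            size_by_url.setdefault(url, len(css_text))
    return sum(size_by_url.values())


frames = [
    [("a.css", "x" * 100)],
    [("a.css", "x" * 100), ("b.css", "y" * 50)],
]
print(total_external_css_size(frames))
print(read_capped(io.BytesIO(b"a" * 10), limit=4, chunk_size=3))
```

Recording whether a fetch was truncated makes it possible to exclude such cases later instead of treating the cap value as a real file size.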

The largest case of external CSS use that was both realistic and verifiable at the time of writing was http://www.goldeneaglecoin.com/, which had 868,426 characters. The site http://www.skatteinform.dk/ (URL no longer active) had a high number of garbage characters, and http://www.vanderlande.com was the case where MAMA repeatedly added the external CSS references in all IFRAMEs to the overall sizes. To offset these problems somewhat, all cases above 900,000 characters were discarded before averaging, which gave an average of 8,387.1 characters for external style sheets. MAMA attempted to filter out Web servers silently redirecting to non-CSS content, but more could be done; there are probably a number of cases where external CSS redirects to the URL's home page or other HTML pages. An example that demonstrates the effects of failing to do this is http://www.dmc.tv/, in which 5 of the 13 detected external CSS references silently pointed to HTML pages.
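The cutoff-based average and the filtering of non-CSS responses can be sketched as follows. Both functions are hypothetical: the article does not say how MAMA detected redirects to HTML, so `is_probably_css` is only one plausible heuristic, and whether zero-size cases were part of the published average is not stated.

```python
def is_probably_css(content_type, body):
    """Heuristic check that a response fetched as a style sheet is not an HTML page."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return False
    # Some servers mislabel content, so also peek at the start of the body.
    head = body.lstrip()[:256].lower()
    return not head.startswith(("<!doctype", "<html"))


def average_below_cutoff(sizes, cutoff=900_000):
    """Average the sizes after discarding every case above `cutoff` characters."""
    kept = [size for size in sizes if size <= cutoff]
    return sum(kept) / len(kept) if kept else 0.0


print(is_probably_css("text/css", "body { margin: 0 }"))
print(is_probably_css("text/html; charset=utf-8", "<html></html>"))
print(average_below_cutoff([1000, 3000, 1_051_671]))
```

A fixed cutoff is a blunt instrument: it removes the obviously broken cases but leaves inflated ones that happen to fall below the threshold.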

Size distribution of external CSS (LINK element)

Fig 4-1: External CSS sizes (LINK element)

| Size range | Frequency | Size range | Frequency | Size range | Frequency |
|---|---|---|---|---|---|
| =0 | 1,727,948 | >7000 && <=8000 | 63,619 | >25000 && <=30000 | 32,007 |
| >0 && <=500 | 121,838 | >8000 && <=9000 | 54,192 | >30000 && <=35000 | 28,166 |
| >500 && <=1000 | 152,027 | >9000 && <=10000 | 47,634 | >35000 && <=40000 | 15,931 |
| >1000 && <=2000 | 267,587 | >10000 && <=12000 | 79,179 | >40000 && <=45000 | 9,562 |
| >2000 && <=3000 | 202,354 | >12000 && <=14000 | 61,637 | >45000 && <=50000 | 10,112 |
| >3000 && <=4000 | 150,273 | >14000 && <=16000 | 42,080 | >50000 && <=75000 | 19,512 |
| >4000 && <=5000 | 115,056 | >16000 && <=18000 | 33,164 | >75000 && <=100000 | 6,874 |
| >5000 && <=6000 | 105,108 | >18000 && <=20000 | 28,657 | >100000 && <=150000 | 4,403 |
| >6000 && <=7000 | 80,456 | >20000 && <=25000 | 48,152 | >150000 | 1,652 |
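A distribution like Fig 4-1 can be produced by assigning each per-URL size to a half-open range. The sketch below uses the range boundaries from the table with Python's `bisect`; the function names are illustrative and the sample sizes are invented.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the ranges used in Fig 4-1; sizes above the last bound
# fall into the open-ended ">150000" bucket.
UPPER_BOUNDS = [
    0, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
    12000, 14000, 16000, 18000, 20000, 25000, 30000, 35000, 40000, 45000,
    50000, 75000, 100000, 150000,
]


def bucket_label(size, bounds=UPPER_BOUNDS):
    """Return the Fig 4-1 style label for a size in characters."""
    if size < 0:
        raise ValueError("size must be non-negative")
    index = bisect_left(bounds, size)  # first bound that is >= size
    if index == 0:
        return "=0"
    if index == len(bounds):
        return f">{bounds[-1]}"
    return f">{bounds[index - 1]} && <={bounds[index]}"


def size_histogram(sizes):
    """Count how many sizes fall into each range."""
    return Counter(bucket_label(size) for size in sizes)


print(bucket_label(8387))
print(size_histogram([0, 0, 750, 868_426]))
```

The same function covers the embedded and inline tables if it is given their coarser list of bounds.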

Maximum sizes of external CSS (LINK element)

Fig 4-2: URLs with the largest external CSS sizes (LINK element)

| URL | External CSS size (chars) |
|---|---|
| http://www.skatteinform.dk/ (URL no longer active) | 1,051,671 |
| http://www.goldeneaglecoin.com/ | 868,426 |
| http://www.vanderlande.com/Pages/default.aspx (URL no longer active) | 763,681 |

Sizes: Embedded CSS (STYLE element)

MAMA detected a URL with embedded CSS sections totalling 1,572,504 characters, but that points out a potential problem with this kind of tally: whitespace. The huge sizes reported for the top 3 URLs listed below are the result of excessive use of extra whitespace. Such situations are probably the result of misconfigured Web servers and large amounts of code from pre-processed languages (such as PHP or ASP). The pre-processed code itself is not present in the rendered document, but extra whitespace is added for every line of that code, resulting in an astronomical number of spaces that dominate a document's downloaded size. With that in mind, MAMA's average embedded CSS size was 953.6 characters.
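One way to see how much of a tally is whitespace is to compare the raw length with the length after collapsing whitespace runs. This is a minimal sketch and not something MAMA is described as doing; the function name `css_size_report` is hypothetical, and collapsing is used here for measurement only (it would alter whitespace inside CSS strings).

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def css_size_report(css_text):
    """Compare the raw size of a style block with its whitespace-collapsed size."""
    collapsed = _WHITESPACE_RUN.sub(" ", css_text).strip()
    raw_size = len(css_text)
    removed = raw_size - len(collapsed)
    return {
        "raw": raw_size,
        "collapsed": len(collapsed),
        "removed_share": removed / raw_size if raw_size else 0.0,
    }


# A tiny rule padded with 1,000 spaces, mimicking whitespace-inflated output.
padded_css = "p {\n" + " " * 1000 + "color: red;\n}"
report = css_size_report(padded_css)
print(report)
```

Reporting both numbers side by side keeps the raw download size visible while exposing how much of it carries no style information.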

Size distribution of embedded CSS (STYLE element)

Fig 5-1: Embedded CSS sizes (STYLE element)

| Size range | Frequency | Size range | Frequency | Size range | Frequency |
|---|---|---|---|---|---|
| =0 | 2,196,509 | >5000 && <=6000 | 12,575 | >14000 && <=16000 | 1,237 |
| >0 && <=500 | 865,538 | >6000 && <=7000 | 13,346 | >16000 && <=18000 | 920 |
| >500 && <=1000 | 183,220 | >7000 && <=8000 | 7,252 | >18000 && <=20000 | 1,002 |
| >1000 && <=2000 | 124,580 | >8000 && <=9000 | 5,544 | >20000 && <=40000 | 2,830 |
| >2000 && <=3000 | 44,621 | >9000 && <=10000 | 2,249 | >40000 && <=60000 | 584 |
| >3000 && <=4000 | 24,977 | >10000 && <=12000 | 4,455 | >60000 && <=80000 | 111 |
| >4000 && <=5000 | 15,466 | >12000 && <=14000 | 2,040 | >80000 | 124 |

Maximum sizes of embedded CSS (STYLE element)

Fig 5-2: URLs with the largest STYLE element sizes

| URL | STYLE element size (chars) |
|---|---|
| http://www.moundsviewschools.org/belair/home.htm | 1,572,504 |
| http://www.macuisinechezvous.com/ | 1,049,159 |
| http://www.procolharum.com/ | 1,048,724 |
| http://www.kshamm.de/ | 956,362 |

Sizes: Inline CSS (Style attribute)

The maximum recorded aggregate size of inline CSS was an enormous 2,589,039 characters. The main document at that URL was over 3.5 megabytes in total, so over 70% of this big document's size was inline CSS. Excessive! The average inline size was 1,006.8 characters. All of the URLs with the highest inline CSS sizes listed below were created by Microsoft Office and make heavy use of its "mso-" CSS extensions. In the future, it could be interesting to look deeper at the relative sizes of inline CSS in Microsoft Office documents versus non-Office pages.
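The suggested follow-up (how much inline CSS consists of "mso-" declarations, and how much of a document is inline CSS) could be approached along these lines. This is a hedged sketch using Python's `html.parser`; the names `InlineStyleCollector` and `inline_css_report` are hypothetical, and splitting declarations on ";" is a simplification that ignores semicolons inside strings or `url()` values.

```python
from html.parser import HTMLParser


class InlineStyleCollector(HTMLParser):
    """Collects the value of every Style attribute in a document."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                self.styles.append(value)


def inline_css_report(html_text):
    """Measure aggregate inline CSS size and the part made up of mso- declarations."""
    collector = InlineStyleCollector()
    collector.feed(html_text)
    collector.close()
    inline_chars = sum(len(style) for style in collector.styles)
    mso_chars = 0
    for style in collector.styles:
        for declaration in style.split(";"):  # naive split, see lead-in
            if declaration.strip().lower().startswith("mso-"):
                mso_chars += len(declaration)
    return {
        "inline_chars": inline_chars,
        "mso_chars": mso_chars,
        "inline_share_of_document": inline_chars / len(html_text) if html_text else 0.0,
    }


office_like = '<p style="mso-margin-top-alt:auto;color:red">x</p>'
print(inline_css_report(office_like))
```

Run over many pages, the ratio of `mso_chars` to `inline_chars` would give a rough measure of how much inline CSS is Office-generated.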

Size distribution of inline CSS (Style attribute)

Fig 6-1: Inline CSS sizes (Style attribute)

| Size range | Frequency | Size range | Frequency | Size range | Frequency |
|---|---|---|---|---|---|
| =0 | 1,610,667 | >5000 && <=6000 | 14,804 | >14000 && <=16000 | 2,468 |
| >0 && <=500 | 1,235,564 | >6000 && <=7000 | 10,296 | >16000 && <=18000 | 2,840 |
| >500 && <=1000 | 262,306 | >7000 && <=8000 | 7,386 | >18000 && <=20000 | 1,267 |
| >1000 && <=2000 | 196,019 | >8000 && <=9000 | 5,252 | >20000 && <=40000 | 4,587 |
| >2000 && <=3000 | 77,092 | >9000 && <=10000 | 4,111 | >40000 && <=60000 | 1,092 |
| >3000 && <=4000 | 39,154 | >10000 && <=12000 | 6,249 | >60000 && <=80000 | 403 |
| >4000 && <=5000 | 23,166 | >12000 && <=14000 | 3,780 | >80000 | 677 |

Maximum sizes of inline CSS (Style attribute)

Fig 6-2: URLs with the largest Style attribute sizes

| URL | Style attribute size (chars) |
|---|---|
| http://www.albumdefamille.neufblog.com/ (URL no longer active) | 2,589,039 |
| http://www.azeribook.com/folk/molla_nasreddin.htm (URL no longer active) | 1,423,099 |
| http://www.fw.hu/eventoj/steb/vortaroj/elektra-terminaro.htm (URL no longer active) | 959,898 |

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
