21: Lesser-known semantic elements

By Mark Norman Francis

11th October 2012: Material moved to webplatform.org

The Opera web standards curriculum has now been moved to the docs section of the W3C webplatform.org site. Go there to find updated versions of these docs, and much more besides!

12th April 2012: This article is obsolete

The web standards curriculum has been donated to the W3C web education community group, to become part of a much bigger educational resource. It is constantly being updated so that it remains current with modern web design practices and technologies. To find the most up-to-date web standards curriculum, visit the web education community group Wiki. Please make changes to this Wiki yourself, or suggest changes to Chris Mills, who is also the chair of the web education community group.

Introduction

In this article I will introduce you to some of the more obscure and less well-known and used semantic elements in HTML. I’ll look at marking up programming code, interaction with computers, citations and abbreviations, showing changes made to documents and more. I will also finish up by looking at some of the proposals for new extra semantics made in the draft of HTML 5.

Note: After each code example, there is a “View source” link, which when clicked will take you to the actual rendered output of that source code, contained within a different file—it is provided so you can view live examples of how the source code is actually rendered in the browser, as well as looking at the code.

Highlighting contact information

The address element is probably the most badly named and misunderstood element in HTML. At first glance, with a name like “address” it would appear that it is used to encapsulate addresses, email, postal or otherwise. This is only partially the case.

The actual meaning of address is to supply contact information for the author or authors of the page, or the major section of the page, that it appears within. This can take the form of a name, an email address, a postal address or a link to another page with more contact information. For example:

<address> 
  <span>Mark Norman Francis</span>, 
  <span class="tel">1-800-555-4865</span>
</address>

View source

In the following example, the address element follows the footer paragraph and simply links to another page on the site. The page that this link targets can then hold much more detailed contact information, which saves repeating it endlessly across the entire site.

<p class="footer">© Copyright 2008</p>
<address>
<a href="/contact/">Mark Norman Francis</a>
</address>

View source

Of course, if the site had more than one author, the same pattern could be used, just linking to different contact pages for the different authors.
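As a rough sketch of that multi-author pattern (the second author's name and both contact URLs here are made up purely for illustration):

```html
<address>
  <a href="/contact/mark/">Mark Norman Francis</a>,
  <a href="/contact/jane/">Jane Example</a>
</address>
```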

It is *incorrect* to use the address element to indicate any other type of address, such as this:

<p> Our company address: </p>
  <address>
    Opera Software ASA,
    Waldemar Thranes gate 98,
    NO-0175 OSLO,
    NORWAY
  </address>

View source

(Of course, if Opera were taking collective responsibility for this article, this would be correct, even though I, and not all of Opera, am the author of this particular page.)

For any general address, you can use something called a "microformat" to indicate that a paragraph contains an address. There is more information on Microformats in other articles on dev.opera.com.
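As one possible illustration, the hCard microformat marks up a general address using agreed class names on ordinary elements. The sketch below assumes hCard is the microformat chosen and reuses the Opera address from the previous example; treat it as a starting point rather than a complete hCard:

```html
<div class="vcard">
  <span class="fn org">Opera Software ASA</span>
  <div class="adr">
    <span class="street-address">Waldemar Thranes gate 98</span>,
    <span class="postal-code">NO-0175</span>
    <span class="locality">OSLO</span>,
    <span class="country-name">NORWAY</span>
  </div>
</div>
```

Because the meaning is carried by the class attribute, no special elements are needed and the address element stays free for its proper purpose.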

Programming languages and code

The code element is used to indicate computer code or programming languages, such as PHP, JavaScript, CSS, XML and so on. For short samples within a sentence, you would simply wrap the element around the code snippet, like so:

<p>It is bad form to use event handlers like
<code>onclick</code> directly in the HTML.</p>

View source

For larger samples of code which span multiple lines, you can use a preformatted block as shown in the Marking up textual content in HTML article.
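A minimal sketch of such a block, wrapping code inside pre so that line breaks and indentation are preserved (the JavaScript function shown is just a made-up sample):

```html
<pre><code>function greet(name) {
  return "Hello, " + name + "!";
}</code></pre>
```

If the code being shown contains characters such as < or &, they need to be written as entities (&lt; and &amp;) so that the browser does not treat them as markup.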

Although there is no defined method of indicating which programming language or code is shown within the code element, you can use the class attribute. A common practice (mentioned in the HTML 5 draft) is to use the prefix language- and then append the language name. So the above example would become:

<p>It is bad form to use event handlers like 
<code class="language-javascript">onclick</code>
directly in the HTML.</p>

View source

Some programming languages have names that cannot be easily represented in classes, such as C# (C-Sharp). In the HTML you could write “class='language-c#'”, but any CSS selector targeting that class would have to escape the hash as “.language-c\#”, which could be confusing and easily mistyped. I would recommend using a class consisting of only letters and hyphens, and spelling the name out completely. So in this case, use “class='language-csharp'” instead.
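Put together, a short C# sample using the spelled-out class might look something like this (the line of C# is only an illustrative snippet):

```html
<pre><code class="language-csharp">Console.WriteLine("Hello world!");</code></pre>
```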

Displaying computer interaction

The two elements samp and kbd can be used to indicate the input and output of interaction with a computer program. For example:

<p>One common method of determining if a computer is connected
to the internet is to use the computer program <code>ping</code>
to see if a computer likely to be running is reachable.</p>

<pre><samp class="prompt">kittaghy%</samp> <kbd>ping -o google.com</kbd>
  <samp>PING google.com (64.233.187.99): 56 data bytes
  64 bytes from 64.233.187.99: icmp_seq=0 ttl=242 time=108.995 ms

  --- google.com ping statistics ---
  1 packets transmitted, 1 packets received, 0% packet loss
  round-trip min/avg/max/stddev = 108.995/108.995/108.995/0.000 ms
  </samp></pre>

View source

The samp element indicates sample output from a computer program. As shown in the example, different types of output can be indicated using the class attribute. There are no widely adopted conventions for which kind of classes to use, however.

The kbd element indicates input from the user interacting with the computer. Although this is traditionally keyboard input (hence the “kbd” contraction used) it should also be used to indicate other types of input, such as spoken voice.
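For instance, spoken input to a voice-controlled phone could be marked up along these lines (the command and the reply are invented for the sake of the example):

```html
<p>To start a call hands-free, say <kbd>call home</kbd> and wait for
the phone to reply with <samp>Calling home</samp>.</p>
```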

Variables

The var element is used to indicate variables in textual content. These can be variables in algebraic expressions or in programming code. For example:

<p>The value of <var>x</var> in
 3<var>x</var>+2=14 is 4.</p>
    
<pre><code class="language-perl">
my <var>$welcome</var> = "Hello world!";
</code></pre>

View source

Citations

The cite element is used to indicate where the nearby content comes from—when quoting a person, a book or other publication, or generally referring people to another source, that source should be wrapped in a cite element. For example:

<p>The saying <q>Everything should be made as simple as possible,
but not simpler</q> is often attributed to <cite>Albert
Einstein</cite>, but it is actually a paraphrasing of a quote which
is much less easy to understand.</p>

View source

Abbreviations

The abbr and acronym elements are used to indicate where abbreviations occur, and provide a method for expanding upon them without unnecessarily interrupting the flow of the document.

The text that is the abbreviation gets wrapped in the abbr element, and the full version is placed in the title attribute, like so:

<p>Styling is added to 
<abbr title="Hypertext Markup Language">HTML</abbr> documents
using <abbr title="Cascading Style Sheets">CSS</abbr>.</p>

View source

An acronym is a type of abbreviation, with the difference that the result is accepted to be, and spoken as if it were, an actual word. An example is scuba, which is formed from the phrase “self-contained underwater breathing apparatus”. Whilst the HTML 4.01 specification allows for both abbr and acronym elements, there is some trouble trying to do the right thing here…

Internet Explorer before version 7 doesn't recognise the abbr element, but does recognise acronym (and even version 7 doesn't provide the dotted underline underneath abbreviations that other browsers do). Unfortunately, acronyms are a subset of abbreviations and it is incorrect to mark up something like “HTML” using the acronym element.

Also, in the draft of HTML 5, the acronym element has been dropped in favour of standardising on abbr for both, as any acronym is also a valid abbreviation.

The best thing to do is to avoid using acronym and just stick to using abbr throughout your code. If you need to apply some visual styling to the abbr, you can place a span inside it and target that instead of the abbr so that all browsers will get the visual styles applied. More details will appear in a later article on “styling text”.
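A minimal sketch of that span technique follows. The class name abbr on the span is my own choice, not a convention, and the style rules are just one possible appearance:

```html
<style type="text/css">
  /* Target the inner span, not the abbr itself, so that browsers
     which do not recognise abbr still apply the rule. */
  span.abbr {
    border-bottom: 1px dotted;
    cursor: help;
  }
</style>

<p>Styling is added using
<abbr title="Cascading Style Sheets"><span class="abbr">CSS</span></abbr>.</p>
```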

Defining instances

There is some confusion over the proper use of dfn, which is described in the HTML specification as “the defining instance of the enclosed term”. This is remarkably close to the idea of the dt element (definition term) used in definition lists.

The difference is that the term used in dfn does not have to be a part of a list of terms and descriptions and can instead be used as part of the normal flow of text, even in conversational style prose. So, let's look at an example of using dfn:

<p><dfn>HTML</dfn>: HTML stands for "HyperText Markup Language". This is 
the language used to describe the contents of web documents.</p>

The term HTML appears, and is followed immediately by a definition of what it is; therefore this is an ideal place for the dfn element to be used. You should only really use it once on a page, where a term is first defined, but terms should only really be defined once on a page anyway, so this is not too troubling.

This is all well and good, but an isolated example is not very practical - the use of dfn is recommended when an abbreviation is used more than once on a page. For example, in the article The basics of HTML earlier in this series, the abbreviation HTML appeared over forty times. To use the code “<abbr title="HyperText Markup Language">HTML</abbr>” each and every time it is used would be a waste of bandwidth, visually distracting and for screen reader users probably quite tiresome as HTML is expanded over and over, even though they would already have been told what it stands for. Instead, the code could be inserted at the point where it is first defined for the readers:

<p><dfn><abbr>HTML</abbr></dfn> ("HyperText Markup Language") is 
a language to describe the contents of web documents.</p>

View source

Then later, whenever HTML is used, it can be marked up simply as “<abbr>HTML</abbr>”. A user agent could then make available to the user some method of retrieving the defining instance of that abbreviation. Unfortunately, no user agent currently does this, including screen readers. It would be better, then, to use the title attribute as well to provide this information:

<p><dfn><abbr title="HyperText Markup Language">HTML</abbr></dfn> ("HyperText Markup Language") is a language to describe the contents of web documents.</p>

View source

Unfortunately, we have now doubled up on the expanded term for HTML, which can be a problem for screen reader users. However, leaving out the visible expansion makes the document less useful for sighted users, who will be the greater proportion of users in almost every case.

I would suggest that this is an acceptable trade-off when there are only one or two items requiring a definition (in pages that require a larger number of definitions, it might be better to create a glossary section or page where the more rigorous definition list markup can be used). If you are very concerned about this, the code could instead appear as:

<p><abbr title="HyperText Markup Language">HTML</abbr> 
(<dfn>HyperText Markup Language</dfn>) is a language to 
describe the contents of web documents.</p>

View source

However, the user agent would still have to have some method of connecting the definition with all the instances of the defined term. No browser currently does anything useful with dfn, although it is still a useful hook for CSS to style. The solution suggested above is currently the best we’ve got.
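For pages with many terms, the glossary approach mentioned earlier might be sketched like this, with each defining instance sitting in the dt of a definition list (the heading level and the wording of the definitions are only illustrative):

```html
<h2>Glossary</h2>
<dl>
  <dt><dfn><abbr title="HyperText Markup Language">HTML</abbr></dfn></dt>
  <dd>The language used to describe the contents of web documents.</dd>
  <dt><dfn><abbr title="Cascading Style Sheets">CSS</abbr></dfn></dt>
  <dd>The language used to add styling to HTML documents.</dd>
</dl>
```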

This is an unfortunate instance where the specification has been created without clear guidelines on how an element is supposed to be used, and probably was not based upon any real-world usage of that element — otherwise there would be a method of combining the term with its full description or definition. The HTML 5 specification goes into a lot more detail about how dfn is to be used, but this is still in draft and not suitable for use on the web yet.

Superscript and subscript

To mark up a part of some text as being super- or subscripted (slightly raised or lowered compared to the rest of the text) you use the sup and sub elements.

Some languages require these elements for correct usage of abbreviations, and they can be used when a small amount of mathematical content is being marked up, without resorting to using MathML (a specific, rather heavyweight mathematical markup language, created for the sole purpose of marking up heavyweight mathematical formulae).

An example of both types:

<p>The chemical formula for water is H<sub>2</sub>O, and it
is also known as hydrogen hydroxide.</p>
<p>The famous formula for mass-energy equivalence as derived
by Albert Einstein is E=mc<sup>2</sup> — energy 
is equal to the mass multiplied by the speed of light 
squared.</p>

View source
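As an illustration of the language case, French conventionally raises the endings of some abbreviations and ordinals. A small sketch (the sentence itself is invented):

```html
<p lang="fr">M<sup>lle</sup> Dupont habite au 1<sup>er</sup> étage.</p>
```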

Line breaks

Because of the way HTML defines white space, it is not possible to control where lines of text break (such as marking up a postal address as a paragraph, but wanting the visual appearance to have each part of the address appear on a separate line) by simply pressing the Return key whilst writing the text.

A line break can be introduced into the document using the br element. However, this should only be used to force line breaks where they are required, and never to apply more vertical spacing between paragraphs or such in a document—that is more properly done with CSS.

Sometimes it might be easier to use the preformatted text block rather than inserting br elements. Or, if one particular part of some text is desired to be on a line by itself, but this is just a styling issue, it can be surrounded by a span element and set to display as a block level element.

So for example you could write the Opera contact address seen earlier in this article when talking about the address element like this instead:

<p>Our company address: </p>
<address>
Opera Software ASA,<br>Waldemar Thranes gate 98,<br>
NO-0175 OSLO,<br>NORWAY
</address>

View source

Of course, if you are writing XHTML rather than HTML, the element should be self-closing, like so: <br />.
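The styling-only alternative described above, where a span is set to display as a block, could be sketched roughly as follows. The class name postal is hypothetical, and I have used a plain paragraph here since this is a general postal address:

```html
<style type="text/css">
  /* Each span inside the postal paragraph starts on its own line. */
  p.postal span {
    display: block;
  }
</style>

<p class="postal">
  <span>Opera Software ASA,</span>
  <span>Waldemar Thranes gate 98,</span>
  <span>NO-0175 OSLO,</span>
  <span>NORWAY</span>
</p>
```

With style sheets disabled the address runs together on one line, which is why this approach only suits cases where the line breaks are purely presentational.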

Horizontal rules

A horizontal rule is created in HTML with the hr element. It inserts a line into the document, which is described as representing a boundary between different sections of a document.

Whilst some argue that this is inherently non-semantic and purely a visual, presentational effect, there is actually some precedent in literature for such an element to exist. Within a chapter (which could be described as a section within a book), a horizontal rule will appear between scenes that occur in different times and/or places. Also, poetry can use decorative breaks to separate different stanzas of the poem.

Neither use would justify the existence of a new heading element, which is the accepted way of marking the boundaries between document sections.

The hr element has no uncommon attributes and should be styled using CSS if the default appearance is unsatisfactory.

Also, like the line break, if you are writing XHTML and not HTML, use the self-closing form—<hr />.
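A scene change of the kind described above might be marked up like this (the two sentences of story are made up for the example):

```html
<p>The train pulled out of the station, and she did not look back.</p>
<hr>
<p>Three years later, in a different city, the letter finally arrived.</p>
```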

Changes to documents (inserting and deleting)

If a document has been changed since the first time it was available, you can mark these changes so that return visitors or automated processes can tell what has changed, and when.

New text (insertions) should be surrounded by the ins element. Text that has been removed (deletions) should be surrounded by the del element. If a deletion and insertion have been made at the same point in the document, good form suggests having the deleted text first, followed by the insertion.

Both elements can take two attributes that give more meaning to the edits.

If the reason for the change is stated in the page or elsewhere on the web, you should link to that document or fragment in the cite attribute. This effectively says “This change happened because of this reason.”

You can also indicate the time at which the change was made by using a datetime attribute. The value should be an ISO-standard timestamp, which is generally of the form “YYYY-MM-DDThh:mm:ssTZD”, where TZD is either Z (for UTC) or an offset such as +01:00 (more information is available on Wikipedia).

An example using both attributes:

<p>We should only solve problems that actually arise. As
  <cite><del datetime="2008-03-25T18:26:55Z"
  cite="/changes.html#revision-4">Donald Knuth</del><ins
  datetime="2008-03-25T18:26:55Z"
  cite="/changes.html#revision-4">C. A. R. Hoare</ins></cite>
  said: <q>premature optimization is the root of all 
  evil</q>.</p>

View source

Some future HTML elements

As has been noted several times in this and some other articles, HTML version 5 is being drafted at the moment. This will be the most radical update to HTML since its inception. By actually studying the patterns of HTML being used right now on the internet, rather than thinking about what might be useful to people, it stands a good chance of taking document semantics that are currently little more than convention and inserting them directly into the specification.

Some example elements slated to be introduced in HTML 5 that could really improve the way we encode and use documents include:

header—contains the header (masthead) of a page; normally consisting of a logo and title, maybe a short “about” area and some site-global navigation such as login/logout/profile links.
footer—contains the footer of a page, which normally consists of further links within a site, copyright and other legal information.
nav—contains the primary navigation links of a page.
article—contains the part of a page that is the main content area, excluding all other page elements such as navigation, header and footer.
aside—contains sidebar information on a given area of the page, and can also be used for pull quotes or notes within the main content.

There are more, which you can find in the HTML 5 specification itself.
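To give a feel for how the elements listed above might fit together, here is a rough page skeleton. It is only a sketch based on the descriptions in this article: the specification is still a draft, so the elements and their exact meanings may change, and the site name, links and text are placeholders.

```html
<body>
  <header>
    <h1>Example site</h1>
  </header>
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/contact/">Contact</a></li>
    </ul>
  </nav>
  <article>
    <h2>Article title</h2>
    <p>The main content of the page goes here.</p>
    <aside>
      <p>A pull quote or note related to the article.</p>
    </aside>
  </article>
  <footer>
    <p>© Copyright 2008</p>
  </footer>
</body>
```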

Summary

In this article, I have described some of the lesser known and more infrequently used semantic elements available in HTML. In the next article, available soon, we will examine further how to correctly use the two semantically-neutral elements in HTML, div and span.

About the author


Photo credit: Andy Budd.

Mark Norman Francis has been working with the internet since before the web was invented. In his last job he worked at Yahoo! as a Front End Architect for the world’s biggest website, defining best practices, coding standards and quality in web development internationally.

Previous to Yahoo! he worked at Formula One Management, Purple Interactive and City University in various roles including web development, backend CGI programming and systems architecture. He pretends to blog at http://marknormanfrancis.com/.


This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
