21: Lesser-known semantic elements
11th October 2012: Material moved to webplatform.org
The Opera web standards curriculum has now been moved to the docs section of the W3C webplatform.org site. Go there to find updated versions of these docs, and much more besides!
12th April 2012: This article is obsolete
The web standards curriculum has been donated to the W3C web education community group, to become part of a much bigger educational resource. It is constantly being updated so that it remains current with modern web design practices and technologies. To find the most up-to-date web standards curriculum, visit the web education community group Wiki. Please make changes to this Wiki yourself, or suggest changes to Chris Mills, who is also the chair of the web education community group.
- Previous article - HTML forms—the basics
- Next article—Generic containers - the div and span elements
- Table of contents
Introduction
In this article I will introduce you to some of the more obscure, lesser-known and less-used semantic elements in HTML. I’ll look at marking up programming code, interaction with computers, citations and abbreviations, showing changes made to documents and more. I will finish up by looking at some of the new semantic elements proposed in the draft of HTML 5.
- Highlighting contact information
- Programming languages and code
- Displaying computer interaction
- Variables
- Citations
- Abbreviations
- Defining instances
- Superscript and subscript
- Line breaks
- Horizontal rules
- Changes to documents (inserting and deleting)
- Some future HTML elements
- Summary
Note: After each code example, there is a “View source” link, which when clicked will take you to the actual rendered output of that source code, contained within a different file—it is provided so you can view live examples of how the source code is actually rendered in the browser, as well as looking at the code.
Highlighting contact information
The address element is probably the most badly named and misunderstood element in HTML. At first glance, with a name like “address”, it would appear that it is used to encapsulate addresses, email, postal or otherwise. This is only partially the case.
The actual meaning of address is to supply contact information for the author or authors of the page, or of the major section of the page, that it appears within. This can take the form of a name, an email address, a postal address or a link to another page with more contact information. For example:
<address> <span>Mark Norman Francis</span>, <span class="tel">1-800-555-4865</span> </address>View source
In the following example, the address appears alongside the footer paragraph and simply links to another page on the site. That page can then hold much more detailed contact information, which saves repeating it endlessly across the entire site.
<p class="footer">© Copyright 2008</p> <address> <a href="/contact/">Mark Norman Francis</a> </address>View source
Of course, if the site had more than one author, the same pattern could be used, just linking to different contact pages for the different authors.
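As a sketch of the multiple-author case, the snippet below gives each author a separate link inside a single address element. The second author's name and both contact URLs are hypothetical placeholders, not pages on any real site.

```html
<p class="footer">© Copyright 2008</p>
<address>
  <!-- Each author links to their own (hypothetical) contact page -->
  <a href="/contact/mark/">Mark Norman Francis</a> and
  <a href="/contact/jane/">Jane Example</a>
</address>
```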
It is *incorrect* to use the address element to indicate any other type of address, such as this:
<p> Our company address: </p> <address> Opera Software ASA, Waldemar Thranes gate 98, NO-0175 OSLO, NORWAY </address>View source
(Of course, if Opera was taking collective responsibility for this article, this would be correct, even though I, and not all of Opera, am the author of this particular page.)
For any general address, you can use something called a "microformat" to indicate that a paragraph contains an address. There is more information on Microformats in other articles on dev.opera.com.
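For illustration, here is how the Opera company address might look using the hCard and adr microformats, assuming their standard class names (vcard, fn, org, adr, street-address, postal-code, locality, country-name). Note that it uses an ordinary paragraph rather than the address element.

```html
<p>Our company address:</p>
<!-- hCard for an organisation: "fn org" marks the organisation's name as the formatted name -->
<p class="vcard">
  <span class="fn org">Opera Software ASA</span>,
  <span class="adr">
    <span class="street-address">Waldemar Thranes gate 98</span>,
    NO-<span class="postal-code">0175</span>
    <span class="locality">OSLO</span>,
    <span class="country-name">NORWAY</span>
  </span>
</p>
```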
Programming languages and code
The code element is used to indicate computer code or programming languages, such as PHP, JavaScript, CSS, XML and so on. For short samples within a sentence, you would simply wrap the element around the code snippet, like so:
<p>It is bad form to use event handlers like <code>onclick</code> directly in the HTML.</p>View source
For larger samples of code which span multiple lines, you can use a preformatted block as shown in the Marking up textual content in HTML article.
Although there is no defined method of indicating which programming language or code is shown within the code element, you can use the class attribute. A common practice (mentioned in the HTML 5 draft) is to use the prefix language- and then append the language name. So the above example would become:
<p>It is bad form to use event handlers like <code class="language-javascript">onclick</code> directly in the HTML.</p>View source
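The same class convention carries over to larger, multi-line samples inside a preformatted block. A minimal sketch, assuming the pre plus code pattern; the JavaScript is only an illustrative snippet and the help id is hypothetical:

```html
<pre><code class="language-javascript">// Attach the handler from script rather than an onclick attribute
var link = document.getElementById('help');
link.onclick = function () {
  alert('Help is on its way');
  return false;
};</code></pre>
```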
Some programming languages have names that cannot be easily represented in classes, such as C# (C-Sharp). Although “class='language-c#'” is valid HTML, a CSS selector targeting it has to escape the hash as “.language-c\#”, which could be confusing and easily mistyped. I would recommend using a class consisting of only letters and hyphens, and spelling the name out completely. So in this case, use “class='language-csharp'” instead.
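The difference shows up when you write a style sheet rule for these classes. In this sketch (the colour is arbitrary), the letters-only class needs no escaping, whereas a rule for the hash version would need a backslash:

```html
<style>
  /* Letters and hyphens only: no escaping required */
  code.language-csharp { color: #060; }
  /* The equivalent rule for class="language-c#" would have to be written as
     code.language-c\# { color: #060; } */
</style>
<p>Declare a string with <code class="language-csharp">string s = "Hello";</code></p>
```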
Displaying computer interaction
The two elements samp and kbd can be used to indicate the output and input of interaction with a computer program. For example:
<p>One common method of determining if a computer is connected to the internet is to use the computer program <code>ping</code> to see if a computer likely to be running is reachable.</p>
<pre><samp class="prompt">kittaghy%</samp> <kbd>ping -o google.com</kbd>
<samp>PING google.com (64.233.187.99): 56 data bytes
64 bytes from 64.233.187.99: icmp_seq=0 ttl=242 time=108.995 ms
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 108.995/108.995/108.995/0.000 ms
</samp></pre>View source
The samp element indicates sample output from a computer program. As shown in the example, different types of output can be indicated using the class attribute. There are no widely adopted conventions for which kind of classes to use, however.
The kbd element indicates input from the user interacting with the computer. Although this is traditionally keyboard input (hence the “kbd” contraction), it should also be used to indicate other types of input, such as spoken voice.
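Two further sketches of kbd, one for a keyboard shortcut and one for spoken input; the voice command is invented for illustration:

```html
<p>To copy the selected text, press <kbd>Ctrl</kbd>+<kbd>C</kbd>.</p>
<p>To dial your home number, say <kbd>call home</kbd> to the phone.</p>
```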
Variables
The var element is used to indicate variables in textual content. These can appear in algebraic mathematical expressions or within programming code. For example:
<p>The value of <var>x</var> in 3<var>x</var>+2=14 is 4.</p> <pre><code class="language-perl"> my <var>$welcome</var> = "Hello world!"; </code></pre>View source
Citations
The cite element is used to indicate where the nearby content comes from: when quoting a person, a book or other publication, or generally referring people to another source, that source should be wrapped in a cite element. For example:
<p>The saying <q>Everything should be made as simple as possible, but not simpler</q> is often attributed to <cite>Albert Einstein</cite>, but it is actually a paraphrasing of a quote which is much less easy to understand.</p>View source
Abbreviations
The abbr and acronym elements are used to indicate where abbreviations occur, and provide a method for expanding upon them without unnecessarily interrupting the flow of the document. The text that is the abbreviation gets wrapped in the abbr element, and the full version is placed in the title attribute, like so:
<p>Styling is added to <abbr title="Hypertext Markup Language">HTML</abbr> documents using <abbr title="Cascading Style Sheets">CSS</abbr>.</p>View source
An acronym is a type of abbreviation, with the difference that the result is accepted to be, and spoken as if it were, an actual word. An example is scuba, which is formed from the phrase “self-contained underwater breathing apparatus”. Whilst the HTML 4.01 specification allows for both abbr and acronym elements, there is some trouble trying to do the right thing here…
Internet Explorer before version 7 doesn't recognise the abbr element, but does recognise acronym (and even version 7 doesn't provide the dotted underline that other browsers show beneath abbreviations). Unfortunately, acronyms are a subset of abbreviations, and it is incorrect to mark up something like “HTML” using the acronym element.
Also, in the draft of HTML 5, the acronym element has been dropped in favour of standardising on abbr for both, as any acronym is also a valid abbreviation.
The best thing to do is to avoid using acronym and just stick to using abbr throughout your code. If you need to apply some visual styling to the abbr, you can place a span inside it and target that instead of the abbr, so that all browsers will get the visual styles applied. More details will appear in a later article on “styling text”.
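A minimal sketch of that technique follows. The class name abbr on the span is an assumption, not a convention from any specification. The selector targets the span on its own rather than “abbr span”, because a browser that doesn't recognise abbr may not treat the span as its descendant.

```html
<style>
  /* Hook the styling on the span so it applies even where abbr is unknown */
  span.abbr {
    border-bottom: 1px dotted;
    cursor: help;
  }
</style>
<p>Styling is added to <abbr title="Hypertext Markup Language"><span class="abbr">HTML</span></abbr> documents using <abbr title="Cascading Style Sheets"><span class="abbr">CSS</span></abbr>.</p>
```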
Defining instances
There is some confusion over the proper use of dfn, which is described in the HTML specification as “the defining instance of the enclosed term”. This is remarkably close to the idea of the dt element (definition term) used in definition lists.
The difference is that the term used in dfn does not have to be a part of a list of terms and descriptions, and can instead be used as part of the normal flow of text, even in conversational-style prose. So, let's look at an example of using dfn:
<p><dfn>HTML</dfn>: HTML stands for "HyperText Markup Language". This is the language used to describe the contents of web documents.</p>
The term HTML appears and is followed immediately by a definition of what it is, so this is an ideal place for the dfn element to be used. You should only really use it once on a page, where a term is first defined, but terms should only really be defined once on a page anyway, so this is not too troubling.
This is all well and good, but an isolated example is not very practical. The use of dfn is recommended when an abbreviation is used more than once on a page. For example, in the article The basics of HTML earlier in this series, the abbreviation HTML appeared over forty times. Using the code “<abbr title="HyperText Markup Language">HTML</abbr>” each and every time would be a waste of bandwidth, visually distracting and, for screen reader users, probably quite tiresome, as HTML is expanded over and over even though they have already been told what it stands for. Instead, the code could be inserted at the point where the term is first defined for the readers:
<p><dfn><abbr>HTML</abbr></dfn> ("HyperText Markup Language") is a language to describe the contents of web documents.</p>View source
Then later, whenever HTML is used, it can be marked up simply as “<abbr>HTML</abbr>”. A user agent could then make available to the user some method of retrieving the defining instance of that abbreviation. Unfortunately, no user agent currently does this, including screen readers. It would be better, then, to use the title attribute as well to provide this information:
<p><dfn><abbr title="HyperText Markup Language">HTML</abbr></dfn> ("HyperText Markup Language") is a language to describe the contents of web documents.</p>View source
Unfortunately, we have now doubled up on the expanded term for HTML, which can be a problem for screen reader users. However, leaving out the visible expansion makes the document less useful for sighted users, who will make up the greater proportion of users in almost every case.
I would suggest that this is an acceptable trade-off when there are only one or two items requiring a definition (in pages that require a larger number of definitions, it might be better to create a glossary section or page where the more rigorous definition list markup can be used). If you are very concerned about this, the code could instead appear as:
<p><abbr title="HyperText Markup Language">HTML</abbr> (<dfn>HyperText Markup Language</dfn>) is a language to describe the contents of web documents.</p>
However, the user agent would still need some method of connecting the definition with all the instances of the defined term. No browser currently does anything useful with dfn, although it is still a useful hook for CSS to style. The solution suggested above is currently the best we’ve got.
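As an example of that CSS hook, this sketch restyles defining instances. Most browsers render dfn in italics by default; the bold replacement is just one possible choice.

```html
<style>
  /* Make defining instances stand out from ordinary emphasis */
  dfn {
    font-style: normal;
    font-weight: bold;
  }
</style>
<p><abbr title="HyperText Markup Language">HTML</abbr> (<dfn>HyperText Markup Language</dfn>) is a language to describe the contents of web documents.</p>
```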
This is an unfortunate instance where the specification has been created without clear guidelines on how an element is supposed to be used, and was probably not based upon any real-world usage of that element; otherwise there would be a method of combining the term with its full description or definition. The HTML 5 specification goes into a lot more detail about how dfn is to be used, but it is still in draft and not suitable for use on the web yet.
Superscript and subscript
To mark up a part of some text as being superscripted or subscripted (slightly raised or lowered compared to the rest of the text), you use the sup and sub elements.
Some languages require these elements for correct usage of abbreviations, and they can be used when a small amount of mathematical content is being marked up, without resorting to MathML (a rather heavyweight markup language created specifically for marking up complex mathematical formulae).
An example of both types:
<p>The chemical formula for water is H<sub>2</sub>O, and it is also known as hydrogen hydroxide.</p> <p>The famous formula for mass-energy equivalence as derived by Albert Einstein is E=mc<sup>2</sup> — energy is equal to the mass multiplied by the speed of light squared.</p>View source
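For the abbreviation case, French offers a familiar example: Mademoiselle is abbreviated to Mlle with the letters after the M raised, and premier (first) to 1er. A sketch combining sup with abbr and a lang attribute; the surname is a placeholder:

```html
<p lang="fr">
  <abbr title="Mademoiselle">M<sup>lle</sup></abbr> Dupont habite au
  1<sup>er</sup> étage.
</p>
```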
Line breaks
Because of the way HTML defines white space, it is not possible to control where lines of text break (such as marking up a postal address as a paragraph, but wanting the visual appearance to have each part of the address appear on a separate line) by simply pressing the Return key whilst writing the text.
A line break can be introduced into the document using the br element. However, this should only be used to force line breaks where they are required, and never to apply more vertical spacing between paragraphs or such in a document; that is more properly done with CSS.
Sometimes it might be easier to use the preformatted text block rather than inserting br elements. Or, if you want one particular part of some text on a line by itself but this is purely a styling issue, you can surround it with a span element and use CSS to display it as a block-level element.
So, for example, you could write the Opera contact address seen earlier in this article, when talking about the address element, like this instead:
<p>Our company address: </p> <address> Opera Software ASA,<br>Waldemar Thranes gate 98,<br> NO-0175 OSLO,<br>NORWAY </address>View source
Of course, if you are writing XHTML rather than HTML, the element should be self-closing, like so: <br />.
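For the span alternative mentioned above, a minimal sketch follows; the own-line class name and the sample text are hypothetical.

```html
<style>
  /* Purely presentational: put the wrapped text on a line of its own */
  span.own-line {
    display: block;
  }
</style>
<p>Call us today. <span class="own-line">Lines are open from 9am to 5pm.</span></p>
```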
Horizontal rules
A horizontal rule is created in HTML with the hr element. It inserts a line into the document, which is intended to represent a boundary between different sections of a document.
Whilst some argue that this is inherently non-semantic and purely a visual, presentational effect, there is actually some precedent in literature for such an element to exist. Within a chapter (which could be described as a section within a book), a horizontal rule will appear between scenes that occur in different times and/or places. Also, poetry can use decorative breaks to separate different stanzas of the poem.
Neither use would justify a new heading, which is the accepted way of marking the boundaries between document sections.
The hr element has no attributes of its own beyond the common ones (its presentational attributes are deprecated), and should be styled using CSS if the default appearance is unsatisfactory.
Also, like the line break, if you are writing XHTML and not HTML, use the self-closing form—<hr />.
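As a sketch of such styling, the rule below swaps the default bevelled line for a thin grey one with extra space around it; the colour and spacing are arbitrary choices, and the scene text is placeholder content.

```html
<style>
  hr {
    border: 0;                  /* remove the default bevelled border */
    border-top: 1px solid #999; /* draw a single thin line instead */
    height: 0;
    margin: 2em 0;              /* breathing room between scenes */
  }
</style>
<p>And the door closed behind her.</p>
<hr>
<p>The next morning, the house was silent.</p>
```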
Changes to documents (inserting and deleting)
If a document has been changed since the first time it was available, you can mark these changes so that return visitors or automated processes can tell what has changed, and when.
New text (insertions) should be surrounded by the ins element.
Text that has been removed (deletions) should be surrounded by the del element. If a deletion and insertion have been made at the same point in the document, good form suggests having the deleted text first, followed by the insertion.
Both elements can take two attributes that give more meaning to the edits.
If the reason for the change is stated in the page or elsewhere on the web, you should link to that document or fragment in the cite attribute. This effectively says “This change happened because of this reason.”
You can also indicate the time at which the change was made by using a datetime attribute. The value should be an ISO-standard timestamp of the form “YYYY-MM-DDThh:mm:ss±hh:mm”, with “Z” in place of the offset for times given in UTC (more information is available on wikipedia).
An example using both attributes:
<p>We should only solve problems that actually arise. As <cite><del datetime="2008-03-25T18:26:55Z" cite="/changes.html#revision-4">Donald Knuth</del><ins datetime="2008-03-25T18:26:55Z" cite="/changes.html#revision-4">C. A. R. Hoare</ins></cite> said: <q>premature optimization is the root of all evil</q>.</p>View source
Some future HTML elements
As has been noted several times in this and some other articles, HTML version 5 is being drafted at the moment. This will be the most radical update to HTML since its inception. By actually studying the patterns of HTML being used right now on the internet, rather than thinking about what might be useful to people, it stands a good chance of taking document semantics that are currently little more than convention and inserting them directly into the specification.
Some example elements slated to be introduced in HTML 5 that could really improve the way we encode and use documents include:
- header: contains the header (masthead) of a page, normally consisting of a logo and title, maybe a short “about” area and some site-global navigation such as login/logout/profile links.
- footer: contains the footer of a page, which normally consists of further links within a site, copyright and other legal information.
- nav: contains the primary navigation links of a page.
- article: contains the part of a page that is the main content area, excluding all other page elements such as navigation, header and footer.
- aside: contains sidebar information on a given area of the page, and can also be used for pull quotes or notes within the main content.
There are more, which you can find in the HTML 5 specification itself.
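To give a feel for how these might fit together, here is a sketch of a page skeleton using the elements as described above. It follows the draft descriptions, so the final specification may define them differently, and all content is placeholder text.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <header>
    <h1>Example site</h1>
    <!-- site-global links such as login/logout/profile could go here -->
  </header>
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/articles/">Articles</a></li>
    </ul>
  </nav>
  <article>
    <h2>Article title</h2>
    <p>The main content of the page.</p>
    <aside>
      <p>A pull quote or note related to the main content.</p>
    </aside>
  </article>
  <footer>
    <p>© Copyright 2008</p>
  </footer>
</body>
</html>
```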
Summary
In this article, I have described some of the lesser-known and more infrequently used semantic elements available in HTML. In the next article, available soon, we will examine how to correctly use the two semantically neutral elements in HTML, div and span.
About the author
Photo credit: Andy Budd.
Mark Norman Francis has been working with the internet since before the web was invented. In his last job he worked at Yahoo! as a Front End Architect for the world’s biggest website, defining best practices, coding standards and quality in web development internationally.
Previous to Yahoo! he worked at Formula One Management, Purple Interactive and City University in various roles including web development, backend CGI programming and systems architecture. He pretends to blog at http://marknormanfrancis.com/.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
Comments
The forum archive of this article is still available on My Opera.