3.2 Elements, Tags, and Attributes

All the vocabularies written in XML share certain characteristics. This is hardly surprising, as the philosophy behind XML will inevitably show through. One of the most obvious manifestations of this philosophy is that of content and elements.

Your documentation (whether it is a single web page, or a lengthy book) is considered to consist of content. This content is then divided (and further subdivided) into elements. The purpose of adding markup is to name and identify the boundaries of these elements for further processing.

For example, consider a typical book. At the very top level, the book is itself an element. This “book” element obviously contains chapters, which can be considered to be elements in their own right. Each chapter will contain more elements, such as paragraphs, quotations, and footnotes. Each paragraph might contain further elements, identifying content that was direct speech, or the name of a character in the story.

You might like to think of this as “chunking” content. At the very top level you have one chunk, the book. Look a little deeper, and you have more chunks, the individual chapters. These are chunked further into paragraphs, footnotes, character names, and so on.

Notice how you can make this differentiation between different elements of the content without resorting to any XML terms. It really is surprisingly straightforward. You could do this with a highlighter pen and a printout of the book, using different colors to indicate different chunks of content.

Of course, we do not have an electronic highlighter pen, so we need some other way of indicating which element each piece of content belongs to. In languages written in XML (XHTML, DocBook, and others) this is done by means of tags.

A tag is used to identify where a particular element starts, and where the element ends. The tag is not part of the element itself. Because each grammar is normally written to mark up specific types of information, each one will recognize different elements, and will therefore have different names for the tags.

For an element called element-name the start tag will normally look like <element-name>. The corresponding closing tag for this element is </element-name>.

Example 3-1. Using an Element (Start and End Tags)

XHTML has an element for indicating that the content enclosed by the element is a paragraph, called <p>.

<p>This is a paragraph.  It starts with the start tag for
  the 'p' element, and it will end with the end tag for the 'p'
  element.</p>

<p>This is another paragraph.  But this one is much shorter.</p>

Some elements have no content. For example, in XHTML you can indicate that you want a horizontal line to appear in the document.

Example 3-2. Using an Element (Without Content)

XHTML has an element for indicating a horizontal rule, called <hr>. This element does not wrap content, so it looks like this.

<p>One paragraph.</p>
<hr></hr>

<p>This is another paragraph.  A horizontal rule separates this
  from the previous paragraph.</p>

For such elements that have no content at all, XML introduced a shorthand form, which is completely equivalent to the above form:

<p>One paragraph.</p>
<hr/>

<p>This is another paragraph.  A horizontal rule separates this
  from the previous paragraph.</p>
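
The equivalence of the two forms can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which is not part of the Documentation Project toolchain and is used here only for illustration. The helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

def describe(markup):
    """Parse one element and return (name, text content, number of children)."""
    element = ET.fromstring(markup)
    return (element.tag, element.text, len(element))

long_form = describe("<hr></hr>")
short_form = describe("<hr/>")

# Both spellings describe the same empty element, so this prints True.
print(long_form == short_form)
```

The parser reports the same name, no text, and no child elements for both spellings, which is what "completely equivalent" means in practice.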

If it is not obvious by now, elements can contain other elements. In the book example earlier, the book element contained all the chapter elements, which in turn contained all the paragraph elements, and so on.

Example 3-3. Elements within Elements; <em>

<p>This is a simple <em>paragraph</em> where some
  of the <em>words</em> have been <em>emphasized</em>.</p>

The grammar will specify the rules detailing which elements can contain other elements, and exactly what they can contain.
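
The nesting described above can be made visible by parsing a small document and printing its element tree. This is a sketch only, assuming Python's standard xml.etree.ElementTree module. The element names in the sample markup (book, chapter, para, speech) are made up for the book example and do not belong to any particular grammar, and the outline function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the book example; the element names are
# illustrative and are not taken from DocBook or XHTML.
BOOK = """
<book>
  <chapter>
    <para>It was a dark night. <speech>Who goes there?</speech></para>
    <para>No one answered.</para>
  </chapter>
  <chapter>
    <para>Morning came.</para>
  </chapter>
</book>
"""

def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(BOOK))))
```

Each level of indentation in the output corresponds to one level of "chunking": the book contains chapters, chapters contain paragraphs, and a paragraph may contain direct speech.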

Important: People often confuse the terms tags and elements, and use the terms as if they were interchangeable. They are not.

An element is a conceptual part of your document. An element has a defined start and end. The tags mark where the element starts and ends.

When this document (or anyone else knowledgeable about XML) refers to “the <p> tag” they mean the literal text consisting of the three characters <, p, and >. But the phrase “the <p> element” refers to the whole element.

This distinction is very subtle. But keep it in mind.

Elements can have attributes. An attribute has a name and a value, and is used for adding extra information to the element. This might be information that indicates how the content should be rendered, or might be something that uniquely identifies that occurrence of the element, or it might be something else.

An element's attributes are written inside the start tag for that element, and take the form attribute-name="attribute-value".

In XHTML, the <p> element has an attribute called align, which suggests an alignment (justification) for the paragraph to the program displaying the XHTML.

The align attribute can take one of four defined values, left, center, right and justify. If the attribute is not specified then the default is left.

Example 3-4. Using An Element with An Attribute

<p align="left">The inclusion of the align attribute
  on this paragraph was superfluous, since the default is left.</p>

<p align="center">This may appear in the center.</p>

Some attributes will only take specific values, such as left or justify. Others will allow you to enter anything you want.

Example 3-5. Single Quotes Around Attributes

<p align='right'>I am on the right!</p>

XML requires you to quote each attribute value with either single or double quotes. Double quotes are more common, but single quotes work just as well. Single quotes are practical if you want to include double quotes in the attribute value.
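
The quoting rules can be tried out with a small script. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The helper name align_of is hypothetical, and the title attribute is used only to show a value that contains double quotes.

```python
import xml.etree.ElementTree as ET

def align_of(markup):
    """Return the value of the align attribute, or None if it is absent."""
    return ET.fromstring(markup).get("align")

# Single and double quotes yield the same attribute value.
print(align_of('<p align="right">I am on the right!</p>'))
print(align_of("<p align='right'>I am on the right!</p>"))

# Single quotes make it easy to include double quotes in the value.
print(ET.fromstring("<p title='Say \"hello\"'>Hi</p>").get("title"))

# An unquoted value is not well-formed XML, so the parser rejects it.
try:
    align_of("<p align=right>I am on the right!</p>")
except ET.ParseError as error:
    print("rejected:", error)
```

Note that the parser only checks the quoting. Whether right is one of the values the grammar allows for align is a separate question that is answered by validation.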

The information on attributes, elements, and tags is stored in XML catalogs. The various Documentation Project tools use these catalog files to validate your work. The tools in textproc/docproj include a variety of XML catalog files. The FreeBSD Documentation Project includes its own set of catalog files. Your tools need to know about both sorts of catalog files.

3.2.1 For You to Do…

In order to run the examples in this document you will need to install some software on your system and ensure that an environment variable is set correctly.

  1. Download and install textproc/docproj from the FreeBSD ports system. This is a meta-port that should download and install all of the programs and supporting files that are used by the Documentation Project.

  2. Add lines to your shell startup files to set SGML_CATALOG_FILES. (If you are not working on the English version of the documentation, you will want to substitute the correct directory for your language.)

    Example 3-6. .profile, for sh(1) and bash(1) Users

    SGML_ROOT=/usr/local/share/xml
    SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
    SGML_CATALOG_FILES=${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=/usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=/usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATALOG_FILES
    export SGML_CATALOG_FILES
    

    Example 3-7. .cshrc, for csh(1) and tcsh(1) Users

    setenv SGML_ROOT /usr/local/share/xml
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES /usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES /usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATALOG_FILES
    

    Then either log out, and log back in again, or run those commands from the command line to set the variable values.
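
After setting the variable, it can be useful to confirm that every entry actually points at a file. The following sketch assumes only that SGML_CATALOG_FILES is a colon-separated list of paths, as in the examples above. The function name missing_catalogs is hypothetical, and the script is not part of textproc/docproj.

```python
import os

def missing_catalogs(value):
    """Return the entries of a colon-separated catalog list that are not files."""
    entries = [entry for entry in value.split(":") if entry]
    return [entry for entry in entries if not os.path.isfile(entry)]

catalogs = os.environ.get("SGML_CATALOG_FILES", "")
if not catalogs:
    print("SGML_CATALOG_FILES is not set")
else:
    for path in missing_catalogs(catalogs):
        print("missing catalog:", path)
```

If the script prints nothing, the variable is set and every listed catalog file exists.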

  1. Create example.xml, and enter the following text:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>An Example XHTML File</title>
      </head>
    
      <body>
        <p>This is a paragraph containing some text.</p>
    
        <p>This paragraph contains some more text.</p>
    
        <p align="right">This paragraph might be right-justified.</p>
      </body>
    </html>
    
  2. Try to validate this file using an XML parser.

    Part of textproc/docproj is the xmllint validating parser.

    Use xmllint in the following way to check that your document is valid:

    % xmllint --valid --noout example.xml
    

    As you will see, xmllint returns without displaying any output. This means that your document validated successfully.

  3. See what happens when required elements are omitted. Try removing the <title> and </title> tags, and re-run the validation.

    % xmllint --valid --noout example.xml
    example.xml:5: element head: validity error : Element head content does not follow the DTD, expecting ((script | style | meta | link | object | isindex)* , ((title , (script | style | meta | link | object | isindex)* , (base , (script | style | meta | link | object | isindex)*)?) | (base , (script | style | meta | link | object | isindex)* , title , (script | style | meta | link | object | isindex)*))), got ()
    

    This line tells you that the validation error comes from the fifth line of the example.xml file and that the content of the <head> element is the part that does not follow the rules described by the XHTML grammar.

    Below this line xmllint will show you the line where the error has been found and will also mark the exact character position with a ^ sign.

  4. Put the <title> element back in.
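
The error in the steps above is a validity error: the file without <title> is still correctly formed XML, it just breaks the rules of the XHTML grammar. The sketch below illustrates the difference, assuming Python's standard xml.etree.ElementTree module, which checks only that the markup is well-formed and does not validate against a DTD. The sample document is a shortened copy of example.xml with the <title> removed, and has_title is a hypothetical helper that stands in for one of the many rules a validating parser such as xmllint checks.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes element names with the namespace from xmlns.
XHTML = "{http://www.w3.org/1999/xhtml}"

# A shortened example.xml with the <title> element removed.
DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>
  </body>
</html>
"""

def has_title(markup):
    """Report whether the document's <head> contains a <title> element."""
    root = ET.fromstring(markup)
    head = root.find(XHTML + "head")
    return head is not None and head.find(XHTML + "title") is not None

# Parsing succeeds because the markup is well-formed; the missing
# <title> is only noticed by a check written specifically for it.
print(has_title(DOCUMENT))
```

A hand-written check like this covers a single rule. Running xmllint with --valid applies every rule in the DTD at once, which is why the Documentation Project relies on it.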