XML Namespace Basics

The Taxonomy XML schema makes use of XML Namespaces. If you are familiar with XML namespaces you may skip to the next section. Continue reading for an introduction to namespaces.

Sometimes it is useful to combine XML data content from different sources in a single XML document. Suppose you have a lot of fish photos you shot last time you went diving with your waterproof camera in your grandma's fishtank. You uploaded your photos to a photo sharing service which provides photo metadata in XML format. In this format, the <date> element contains the date when you took the photo, <exposure> and <focalLength> give technical data about the photo etc.

In addition to the photo sharing service you are also using a biology site to get information about the fish you photographed, also in XML. This XML format contains elements such as <species>, <genus>, <family> etc. Over the weekend you wrote a little application that combines data from both sources and consolidates them in a single XML file. Your plan is to use the combined XML data over the next couple of dozen weekends to build an enterprise-strength application that produces a beautiful illustrated catalog of fish in PDF format.

So far so good. Now, you know what's coming: the biology XML format also uses an element called <date>, which this time means something completely different: the date the fish was first observed. This is a problem, because now it is impossible to distinguish between these two different dates. The solution is to use prefixes to distinguish your photo sharing service's XML vocabulary from the one used by the biology site. For example you could use <photo:date> for the first element and <bio:date> for the second element. The colon is a valid character in XML names so everything is fine. You don't really need fancy namespaces to build this renaming into your application. However, using the W3C recommendation has the advantage that the XML tools and libraries that you are using take care of namespace issues automatically. The Taxonomy of Human Services XML format uses namespaces for the same reason: to make it easier for different vendors to integrate the Taxonomy data with other data sources while using widely available processing tools. Here's what your XML might look like using W3C namespaces:


<?xml version="1.0"?>
<fc:fishPic
  xmlns:fc="http://joe.example.org/namespaces/fishphotos"
  xmlns:bio="http://biology.example.org/classification.html"
  xmlns:photo="http://photosh.example.org/metadata.xsd">

  <fc:title>Photo of a deepwater stingray from
            my grandmother's fishtank</fc:title>

  <photo:date>2006-08-30</photo:date>
  <photo:exposure>1/15</photo:exposure>
  <photo:focalLength>50 mm</photo:focalLength>
  ...
  <bio:date>1899-06-06</bio:date>
  <bio:species>Plesiobatis daviesi</bio:species>
  ...
</fc:fishPic>
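
In this example we are using three different namespaces. Now, pay close attention: the names of these namespaces are:

http://joe.example.org/namespaces/fishphotos
http://biology.example.org/classification.html
http://photosh.example.org/metadata.xsd

Each of these namespaces is declared using the special xmlns attribute and associated with one of the namespace prefixes fc, bio, and photo.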
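
To see what "the tools take care of it" can look like in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It parses a shortened copy of the fish photo document and looks up the two <date> elements by namespace name. The choice of Python and the constant names PHOTO_NS and BIO_NS are assumptions made for this illustration; any namespace-aware XML library works along the same lines.

```python
import xml.etree.ElementTree as ET

# The fish photo document from above, shortened to the two <date> elements.
DOC = """<?xml version="1.0"?>
<fc:fishPic
  xmlns:fc="http://joe.example.org/namespaces/fishphotos"
  xmlns:bio="http://biology.example.org/classification.html"
  xmlns:photo="http://photosh.example.org/metadata.xsd">
  <photo:date>2006-08-30</photo:date>
  <bio:date>1899-06-06</bio:date>
</fc:fishPic>"""

# Illustrative constant names; the values are the namespace names declared above.
PHOTO_NS = "http://photosh.example.org/metadata.xsd"
BIO_NS = "http://biology.example.org/classification.html"

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with its namespace name and stores the
# element name as "{namespace-name}local-name".
child_tags = [child.tag for child in root]

# Both elements are called "date", but their namespace names differ,
# so each lookup finds exactly one of them.
photo_date = root.find(f"{{{PHOTO_NS}}}date").text
bio_date = root.find(f"{{{BIO_NS}}}date").text

print("photo taken on:", photo_date)
print("species first observed on:", bio_date)
```

Note that the lookups never mention the prefixes photo or bio. The parser has already translated them into the namespace names they stand for.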

If these names look like something you can type in your browser that's... a pure accident! Well, not exactly: the W3C recommendation says that namespace names must be URIs, and URLs are just one kind of URI. This means that other kinds of URIs, such as a tel URI or a URN built from a book's ISBN, would be just as suitable as namespace names as the ones above.

The idea behind using URIs for namespace names is that URIs are designed to be unique and persistent. For example, if 212-555-0123 were your phone number it is highly unlikely that anybody else would use the tel URI tel:+1-212-555-0123 as a namespace name. Also, if an XML vocabulary is described in a book, using the book's ISBN in the form of its URN (also a form of URI, proposed in RFC 3187) would make it unlikely that another person uses the same string for describing an XML vocabulary of things completely unrelated to the content of that book.

People usually use a URL starting with http:// as I did in the original example. It is important to note that using a namespace name such as http://joe.example.org/namespaces/fishphotos does not mean that there is a page retrievable under that URL or even that the domain joe.example.org exists. It is a good idea, however, to use a domain name that you control. If everybody acts that way namespace clashes can't happen. While you are at it you might as well put a file behind the URL. You can use an HTML page explaining the vocabulary or another useful file related to the vocabulary, such as an XML schema definition file that formalizes the vocabulary.

Namespace names are URIs in order to be globally unique. Unfortunately, that usually makes them quite long. It would be very clumsy if we had to prefix each XML element with the full namespace name. It would even lead to invalid XML because element and attribute names must not contain characters such as slashes. That's why we have prefixes. In our example fc, bio, and photo stand for their respective namespaces. Prefixes have meaning only inside the document where they are declared. In the above example I could have used a different set of prefixes such as f, b, and p. The resulting document would be completely equivalent.
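
The claim that prefixes are interchangeable is easy to check. The following sketch, again assuming Python's standard xml.etree.ElementTree module, parses a shortened version of the fish document twice, once with the prefixes fc, bio, and photo and once with f, b, and p, and compares the namespace-qualified element names. The helper name expanded_names is made up for this example.

```python
import xml.etree.ElementTree as ET

# Same content, long prefixes.
LONG = """<fc:fishPic
  xmlns:fc="http://joe.example.org/namespaces/fishphotos"
  xmlns:bio="http://biology.example.org/classification.html"
  xmlns:photo="http://photosh.example.org/metadata.xsd">
  <photo:date>2006-08-30</photo:date>
  <bio:date>1899-06-06</bio:date>
</fc:fishPic>"""

# Same content, short prefixes bound to the same namespace names.
SHORT = """<f:fishPic
  xmlns:f="http://joe.example.org/namespaces/fishphotos"
  xmlns:b="http://biology.example.org/classification.html"
  xmlns:p="http://photosh.example.org/metadata.xsd">
  <p:date>2006-08-30</p:date>
  <b:date>1899-06-06</b:date>
</f:fishPic>"""


def expanded_names(xml_text):
    """Return the "{namespace-name}local-name" tag of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


long_names = expanded_names(LONG)
short_names = expanded_names(SHORT)

# The prefixes differ, the namespace-qualified names do not.
same = long_names == short_names
print(same)  # True
```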

So far we have talked about namespaces without mentioning XML schemas. We could do that because XML namespaces can be used without schemas. Taxonomy of Human Services uses an XML schema so that's what we'll talk about next.

XML Schema

In our example we had XML elements belonging to different vocabularies. For example the <exposure> element belonged to the vocabulary of our fictitious photo sharing service. Any constraints on the form and content of those vocabularies were implicit, i.e. in the programmer's head, in code, or maybe in documentation. For example we never formally specified that the <fishPic> element from the namespace http://joe.example.org/namespaces/fishphotos may contain a <title> element from the same namespace and an <exposure> element from the namespace http://photosh.example.org/metadata.xsd. Also, we never said in what format the two date elements should be represented.

XML schema languages are languages for specifying these kinds of constraints. Taxonomy of Human Services uses the XML Schema language from W3C. The current version of the XML Schema Definition file for the Taxonomy can always be found at http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd.

W3C's XML Schema definition language is itself an XML vocabulary which makes heavy use of namespaces. We'll explain the XML Schema basics as we explain the way it is used in Taxonomy of Human Services.

Taxonomy of Human Services XML Instance and XML Schema

The Instance

Look at the top of the taxonomy.xml file and you'll notice that it contains an xmlns attribute on the top level element <taxonomy>:
<taxonomy name="Taxonomy of Human Services"
 releaseDate="2006-08-14T18:27:31Z"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd taxonomy.xsd"
 xmlns="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd">
  <record code="B">
   <name>Basic Needs</name>
   <definition>Programs that furnish survival level resources...</definition>
   <createdDate>1992-03-10</createdDate>
   <lastModifiedDate>2005-03-02</lastModifiedDate>
   ...
</taxonomy>

Notice two aspects of this declaration: first, for the namespace name we are using the URL of the XSD file. Although this is quite a common practice, remember that we could just as well have used the URL of the schema documentation, the home page or any other valid URI. Second, notice that we are not using a prefix. This is a feature of XML Namespaces that we didn't mention before: a namespace can be declared without a prefix, which makes it the default namespace. Every element written without a prefix then belongs to that namespace (attributes written without a prefix do not). This way our document is more readable.
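
Here is a small sketch of how a namespace-aware parser sees the default namespace, assuming Python's standard xml.etree.ElementTree module. The instance is a shortened, well-formed version of the one above (the xsi attributes are left out and the <record> element is closed), and TX_NS is just a constant name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative constant name; the value is the Taxonomy namespace name.
TX_NS = "http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd"

# Shortened, well-formed version of the instance shown above.
INSTANCE = f"""<taxonomy name="Taxonomy of Human Services"
 releaseDate="2006-08-14T18:27:31Z"
 xmlns="{TX_NS}">
  <record code="B">
   <name>Basic Needs</name>
  </record>
</taxonomy>"""

root = ET.fromstring(INSTANCE)

# Elements without a prefix are placed in the default namespace...
record = root.find(f"{{{TX_NS}}}record")
name = record.find(f"{{{TX_NS}}}name")
print(root.tag)

# ...but attributes without a prefix stay in no namespace at all,
# so their keys are plain local names.
print(record.attrib)
print(name.text)
```

So although no prefix appears anywhere in the document, code that reads it still has to ask for elements by their full namespace name.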

Now, let's take a look at these two lines of taxonomy.xml:

 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd taxonomy.xsd"

Here, we are declaring a special namespace http://www.w3.org/2001/XMLSchema-instance and immediately using the attribute schemaLocation from that namespace. This is one of the few attributes defined in the XML Schema specification intended for (optional) use in the document instance (as opposed to the schema definition file). As you might have guessed this attribute gives a hint at where the actual XML Schema definition file is located.

Notice that the value of our schemaLocation attribute consists of a two-element space-separated list. The first list element is:

http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd

While the second list element is:

taxonomy.xsd

According to the spec, the first element is the name of the namespace for which we are locating the XSD file. The second element is the file's URI. The above value for the file's URI may seem wrong at first. Shouldn't we put the full URL of the file as well? That would certainly be a valid choice. So why did we just put taxonomy.xsd and how is that to be interpreted? This is simply a relative URI. If you have ever worked with HTML, you have used relative URIs all the time in anchor elements such as <a href="foo.html">...</a>. So, taxonomy.xsd will be retrieved from the directory where the instance document taxonomy.xml is stored. It is a good idea to store the XSD file locally so that validation works even if the 211taxonomy.org website isn't reachable. Putting the XSD file next to the instance document is an easy way to achieve this without special configuration steps.
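
The two steps described above, splitting the attribute value into namespace/location pairs and resolving the relative URI against the location of the instance document, can be sketched with Python's standard library. This only illustrates the idea and is not how any particular schema processor is implemented. The function name schema_locations and the instance location http://intranet.example.org/data/taxonomy.xml are made up for the example.

```python
from urllib.parse import urljoin


def schema_locations(value):
    """Split an xsi:schemaLocation value into {namespace name: schema URI}."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    # Even positions are namespace names, odd positions are schema URIs.
    return dict(zip(tokens[0::2], tokens[1::2]))


VALUE = "http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd taxonomy.xsd"

# Hypothetical location of the instance document (an assumption for this sketch).
INSTANCE_URI = "http://intranet.example.org/data/taxonomy.xml"

locations = schema_locations(VALUE)

# A relative schema URI is resolved against the instance document's own URI,
# exactly like a relative href in an HTML page.
resolved = {ns: urljoin(INSTANCE_URI, loc) for ns, loc in locations.items()}

for namespace_name, schema_uri in resolved.items():
    print(namespace_name, "->", schema_uri)
```

With these inputs the schema for the Taxonomy namespace resolves to a taxonomy.xsd file in the same directory as the instance document.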

What about users who still want to retrieve the XSD file over the Internet rather than keep a local copy? They should be fine as well: the XML Schema spec says that schema-aware processors may try to resolve the namespace URI and grab the XSD from there. If taxonomy.xsd is nowhere to be found locally, most such processors will probably try to do that. All this isn't that important, really, as every schema-aware XML processor should let you configure these things separately so that any undesired effects of these choices can be overridden.

Finally, let's take a look at the Taxonomy XML Schema itself.

The Schema

Here's a representative part of the XML Schema for Taxonomy of Human Services:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tx="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd" targetNamespace="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd" elementFormDefault="qualified" attributeFormDefault="unqualified" version="8">
  <annotation>
    <appinfo>
      <csv-id>$Id: explanation.adp,v 1.2 2014/01/13 18:05:02 dbauer Exp $</csv-id>
    </appinfo>
    <documentation>
The AIRS/INFO LINE Taxonomy of Human Services. For schema documentation see <a href="http://211taxonomy.org/resources/xml_schema/docs/docs.html">http://211taxonomy.org/resources/xml_schema/docs/docs.html</a> as well as <a href="http://211taxonomy.org/resources/xml_schema/explanation">http://211taxonomy.org/resources/xml_schema/explanation</a>
  </documentation>
  </annotation>
  <element name="bibliographicReference" type="string">
    <annotation>
      <documentation>A list of references which credits sources used in writing taxonomy definitions or structuring taxonomy sections.</documentation>
    </annotation>
  </element>
  <element name="comments" type="string"> 
    <annotation>
      <documentation>Comments on the term in plain text.</documentation>
    </annotation>
  </element>
  <element name="createdDate" type="date">
    <annotation>
      <documentation>The date a term was first added to the taxonomy.</documentation>
    </annotation>
  </element>
  <element name="definition" type="string">
    <annotation>
      <documentation>A plain text description of the meaning of the taxonomy term.</documentation>
    </annotation>
  </element>
  <element name="externalCode" type="token">
    <annotation>
      <documentation>A code in an external classification system.  Not to be confused with a taxonomy code.</documentation>
    </annotation>
  </element>
  <element name="externalTerm">
    <annotation>
      <documentation>A term in another classification system which corresponds to this taxonomy term.</documentation>
    </annotation>
    <complexType>
      <sequence>
        <element ref="tx:system">
          <annotation>
            <documentation>Values like NPC, NTEE, UWASIS go here.</documentation>
          </annotation>
        </element>
        <element ref="tx:externalCode">
          <annotation>
            <documentation>Code in the external system. Not to be confused with the code attribute of a taxonomy record.</documentation>
          </annotation>
        </element>
        <element ref="tx:name">
          <annotation>
            <documentation/>
          </annotation>
        </element>
      </sequence>
    </complexType>
  </element>  ...
</schema>

On the top element we have two xmlns declarations. The namespace http://www.w3.org/2001/XMLSchema represents the vocabulary of the XML Schema language. We chose this namespace to be the default so that all the XML Schema elements can be used without prefixes. This makes the file much easier to read. The second namespace is the familiar http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd. For this one we chose the prefix tx.

It may seem odd that we had to declare this second namespace here. We are certainly not using any elements and attributes from that namespace in this file, since this file is used to define those elements and attributes! That's one of the peculiarities of XML Schema: in the schema definition namespaces are used not only for correct resolution of element and attribute names but also for resolving the content of certain attributes. For example <element ref="tx:system"> has to use tx:system in order to make it clear that it is referring to the system element defined in our namespace further down in the file. On the other hand <element name="externalCode" type="token"/> refers to the token data type which belongs to the XML Schema namespace. There is no prefix before token because its namespace, http://www.w3.org/2001/XMLSchema, happens to be the default namespace. Had we chosen a prefix such as xsd we would have had to write type="xsd:token" instead.
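
To make the resolution of names inside attribute values more concrete, here is a minimal sketch in Python using only the standard library. It collects the prefix declarations from a cut-down version of the schema and expands the two kinds of values discussed above. The helper names prefix_map and resolve_qname are invented for this example, the cut-down schema uses ref="tx:externalCode" instead of tx:system because the declaration of system is not shown in the excerpt, and the prefix map ignores the scoping rules of nested declarations, which is acceptable here only because everything is declared on the top element.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down version of the schema excerpt shown above.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:tx="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd"
  targetNamespace="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd"
  elementFormDefault="qualified">
  <element name="externalCode" type="token"/>
  <element name="externalTerm">
    <complexType>
      <sequence>
        <element ref="tx:externalCode"/>
      </sequence>
    </complexType>
  </element>
</schema>"""


def prefix_map(xml_text):
    """Collect prefix -> namespace name declarations ('' is the default namespace)."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ET.iterparse(source, events=("start-ns",))
    # Simplification: a later redeclaration of a prefix would overwrite an earlier one.
    return {prefix: uri for _, (prefix, uri) in events}


def resolve_qname(qname, prefixes):
    """Expand a name used as an attribute value, e.g. ref="tx:externalCode"."""
    prefix, _, local = qname.rpartition(":")
    # No colon means no prefix, so the default namespace ('') is used.
    return f"{{{prefixes[prefix]}}}{local}"


prefixes = prefix_map(SCHEMA)
ref_name = resolve_qname("tx:externalCode", prefixes)
type_name = resolve_qname("token", prefixes)

print(ref_name)
print(type_name)
```

The reference tx:externalCode ends up in the Taxonomy namespace while the unprefixed token ends up in the XML Schema namespace, which is the distinction the tx declaration exists to make.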

There is one more attribute to explain: targetNamespace. Its value is http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd. This tells XML Schema processors that the elements and attributes defined in this XML Schema definition file belong to that namespace.
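
As a rough illustration of what targetNamespace means for an instance document, the following Python sketch reads the top-level element declarations from a cut-down schema, qualifies their names with the target namespace, and checks that an element from an instance carries one of those names. This is nowhere near real schema validation. It only shows how the names on both sides line up, and the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Two of the element declarations from the schema excerpt above.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:tx="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd"
  targetNamespace="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd"
  elementFormDefault="qualified">
  <element name="createdDate" type="date"/>
  <element name="definition" type="string"/>
</schema>"""

# A single element as it would appear in an instance document.
INSTANCE = """<createdDate
  xmlns="http://www.211taxonomy.org/resources/xml_schema/taxonomy.xsd">1992-03-10</createdDate>"""

schema_root = ET.fromstring(SCHEMA)
target = schema_root.get("targetNamespace")

# Top-level <element name="..."> declarations define names in the target namespace.
declared = [
    f"{{{target}}}{element.get('name')}"
    for element in schema_root.findall(f"{{{XSD_NS}}}element")
]

instance_tag = ET.fromstring(INSTANCE).tag
is_declared = instance_tag in declared

print(declared)
print(instance_tag, "declared in schema:", is_declared)
```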

For further documentation on the Taxonomy XML schema, see the Schema documentation.
