 
Another important part of XML is the Document Type Definition (DTD), a file associated with SGML and XML documents that defines which markup tags a document may use and how those tags may be combined, so that an application reading the document knows what structure to expect. A DTD is what turns XML from a metalanguage into a true language designed for a specific task.
A DTD is a text document that contains a set of rules, formally known as "entity, element, and att-list (attribute) declarations," that define an XML markup language. It names new elements and describes the type of data or other elements that an element may contain. It also lists attributes for each element.
For example, if you were creating recipes to be accessed over the Web, you might create your own language called RML, or Recipe Markup Language. RML would have tags like <title> and <body>, but also RML-specific tags such as <ingredients>, <prep-time>, and <nutritionalInformation>.
These tags would be established in a DTD for the new language. The DTD imparts detailed information about what data should be found in each tag. A DTD for Recipe Markup Language might have a line like this:
<!ELEMENT ingredients ( li+, text? )>
This line declares an element called ingredients. An ingredients element contains li elements, optionally followed by a text element. The plus sign (+) after li indicates that an ingredients element will have one or more li elements within it. The question mark after text shows that the text element is optional, and the comma between the two means they must appear in that order. The Recipe Markup Language DTD would also specify the li element:
<!ELEMENT li (#PCDATA)>
This element contains text only. PCDATA stands for "parsed character data," meaning text that the parser scans for markup such as entity references. An element declared as (#PCDATA) cannot contain other tagged elements.
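To see how these two declarations fit into a larger whole, here is a minimal sketch of a recipe document that carries its DTD in an internal subset. Only the ingredients and li declarations come from the text above. The recipe root element, the units attribute, the tbsp entity, and the content models for title, prep-time, and text are assumptions made up for illustration, not part of any published RML.

```xml
<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!-- Hypothetical root element: a title, a prep time, then the ingredients -->
  <!ELEMENT recipe (title, prep-time, ingredients)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT prep-time (#PCDATA)>
  <!-- Attribute-list declaration: prep-time must carry a units attribute -->
  <!ATTLIST prep-time units (minutes | hours) #REQUIRED>
  <!-- The two element declarations discussed above -->
  <!ELEMENT ingredients (li+, text?)>
  <!ELEMENT li (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
  <!-- Entity declaration: &tbsp; is replaced by the quoted text -->
  <!ENTITY tbsp "tablespoon">
]>
<recipe>
  <title>Vinaigrette</title>
  <prep-time units="minutes">5</prep-time>
  <ingredients>
    <li>3 &tbsp;s olive oil</li>
    <li>1 &tbsp; vinegar</li>
    <text>Whisk together just before serving.</text>
  </ingredients>
</recipe>
```

The sketch shows one of each kind of declaration (element, att-list, and entity). In practice the declarations would more likely live in a separate DTD file shared by many recipe documents.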
The complete set of rules for declaring entities and elements in a DTD is fairly complex and beyond the scope of this chapter. Refer to one of the sources listed at the end of this chapter for more information.
When an XML document conforms to all the rules established in the DTD, it is said to be valid, meaning that all the elements are used correctly.
TIP
A well-formed document is not necessarily valid, but if a document proves to be valid it follows that it is also well-formed.
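The difference is easiest to see with a document that parses cleanly but breaks its DTD. The following sketch reuses the ingredients declaration shown earlier and, purely for brevity, treats ingredients as the root element; the text declaration is an assumption. Every tag is properly nested and closed, so the document is well-formed. It is not valid, because the content model requires the li elements to come before text.

```xml
<?xml version="1.0"?>
<!DOCTYPE ingredients [
  <!ELEMENT ingredients (li+, text?)>
  <!ELEMENT li (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<!-- Well-formed, but not valid: text appears before li -->
<ingredients>
  <text>Chill the bowl first.</text>
  <li>2 cups heavy cream</li>
</ingredients>
```

Swapping the two child elements so that li comes first would make the document valid as well as well-formed.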
When your document uses a DTD, you can check it for mistakes using a validating parser. The parser checks the document against the DTD for contextual errors, such as missing elements or improper order of elements. Most of the best validating parsers are free. Some common parsers are Xerces from the Apache XML Project (available at http://xml.apache.org) and Microsoft MSXML (http://msdn.microsoft.com/xml/default.asp). A full list of validating parsers is provided by Web Developer's Virtual Library at http://wdvl.com/Software/XML/parsers.html.
DTDs are not required, and they come with disadvantages as well as benefits. A DTD is useful when you have specific markup requirements to apply across a large number of documents. A DTD can ensure that certain data fields are present or delivered in a particular format. You may also want to spend the time preparing a DTD if you need to coordinate content from various sources and authors. Having a DTD makes it easier to find mistakes in your markup.
The disadvantages of DTDs are that they require time and effort to develop and are inconvenient to maintain (particularly while the XML language is in flux). Validating against a DTD slows down processing, and a DTD may be too restrictive on the user's end. Another problem is that DTDs are not namespace-aware: a DTD matches element and attribute names literally, prefix included. Elements and attributes from another namespace won't validate under a DTD unless the DTD explicitly declares them (which defeats the purpose of namespaces in the first place). If you are creating just a few XML documents or if you are using namespaces, a DTD is usually more trouble than it is worth.
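The namespace problem can be illustrated with a short sketch. The namespace URI and the rml prefix below are invented for this example and are not defined anywhere else. Note that the DTD has to spell out the prefixed names, and even has to declare the xmlns:rml attribute, before the document will validate.

```xml
<?xml version="1.0"?>
<!DOCTYPE rml:recipe [
  <!-- The DTD knows only the literal name "rml:recipe", not the namespace -->
  <!ELEMENT rml:recipe (rml:title)>
  <!-- The namespace declaration is just another attribute to the DTD -->
  <!ATTLIST rml:recipe xmlns:rml CDATA #FIXED "http://example.com/ns/rml">
  <!ELEMENT rml:title (#PCDATA)>
]>
<rml:recipe xmlns:rml="http://example.com/ns/rml">
  <rml:title>Vinaigrette</rml:title>
</rml:recipe>
```

This document is valid as written. If the prefix in the document body were changed from rml to r while the namespace URI stayed the same, a namespace-aware application would see the same elements, but validation would fail because the DTD declares no element named r:recipe.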
 
Copyright © 2002 O'Reilly & Associates. All rights reserved.