The purpose of a DTD (Document Type Definition) is to provide a parser with the necessary rules to confirm that a particular document is valid.
SGML has a rich and complex DTD syntax; XML has a much simpler one. However, it is likely to be replaced by XML Schema, which uses the same syntax as XML itself. There are a number of syntax elements in the DTD, as follows.
!DOCTYPE
An XML DTD is referenced from an XML document using a "!DOCTYPE" tag. This takes one of two forms:
- Internal
<!DOCTYPE dtd-name [
...declarations...
]>
- External
<!DOCTYPE dtd-name SYSTEM "filename">
In both forms, dtd-name must match the name of the document's root element. Notice that this doesn't follow XML syntax rules.
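As a minimal sketch of the internal form in context, the document below carries its own declarations. The element name "note" and the file name "note.dtd" are invented for illustration; the external form is shown in a trailing comment, assuming the same declaration has been saved to that file.

```xml
<?xml version="1.0"?>
<!-- Internal form: the declarations travel inside the document itself -->
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Remember the milk.</note>
<!-- External form, assuming the declaration above is saved as note.dtd:
     <!DOCTYPE note SYSTEM "note.dtd">
-->
```

In either case the name after !DOCTYPE ("note") is the same as the root element.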
!ELEMENT
Each element used in the XML grammar described by this DTD is defined in an "!ELEMENT" tag. This has the following format:
<!ELEMENT element-name (content-model)>
The content-model describes the content of this element. The syntax elements are:
- Grouping
- A content-model definition enclosed in parentheses "()" can be treated as syntactically equivalent to a single element.
- Ordering
- A sequence of elements separated by commas must appear in the indicated order.
- Alternatives
- If a number of elements are separated by "|", any one (but only one) of them may appear.
- Occurrence
- If an element is suffixed by "*", it may occur any number of times (zero or more).
- If an element is suffixed by "+", it must occur one or more times.
- If an element is suffixed by "?", it must occur either zero times or once.
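A number of predefined content-models exist that have special meanings:
- EMPTY
- This indicates that there must be no content for this element. It is written without the surrounding parentheses.
- ANY
- This indicates that any declared element, or text, may form the content of this element. It is also written without parentheses.
- #PCDATA
- This indicates that text (parsed character data) may form the content of this element. It is written inside the parentheses, i.e. "(#PCDATA)".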
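The content-model syntax above can be sketched with a small set of declarations. The element names (book, chapter and so on) are invented for illustration and are not taken from any real DTD.

```xml
<!-- Ordering and occurrence: a title, then one or more chapters,
     then any number of appendices -->
<!ELEMENT book (title, chapter+, appendix*)>

<!-- Grouping and alternatives: after the title, any mixture of
     para and list elements, in any order, zero or more times -->
<!ELEMENT chapter (title, (para | list)*)>

<!-- "+" requires at least one item -->
<!ELEMENT list (item+)>

<!-- "?" makes the trailing para optional -->
<!ELEMENT appendix (title, para?)>

<!-- Text-only elements -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
<!ELEMENT item (#PCDATA)>

<!-- Predefined content models, written without parentheses -->
<!ELEMENT pagebreak EMPTY>
<!ELEMENT notes ANY>
```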
!ATTLIST
XML elements can have attributes. These must be declared in the DTD using the "!ATTLIST" tag. This has the following format:
<!ATTLIST element-name attribute-name attribute-type keyword>
or
<!ATTLIST element-name attribute-name attribute-type default>
Either a keyword or a default must be given. Multiple attributes may be declared for the same element by repeating the attribute-name, attribute-type and keyword (or default) as many times as is required.
The following values are valid for attribute-type:
- CDATA
- The value can be any character data.
- ID
- The value must be an identifier that is unique within the document.
- IDREF
- The value must be an existing identifier - i.e. this is a reference to an element with the matching value in an ID attribute.
- NMTOKEN
- The value may contain only letters, digits, hyphens, periods, underscores and colons - i.e. valid characters for constructing names or tokens.
- ENTITY
- The value must be the name of an unparsed entity declared in the DTD.
- enumerated values
- A bracketed, |-separated list of valid values, e.g. (red|green|blue). The value may be any one of the listed values.
The following values are valid for the keyword:
- #REQUIRED
- This attribute must be specified on this element.
- #IMPLIED
- This attribute may, optionally, be specified on this element. If omitted, the parser supplies no value and the application reading the document decides what to do.
- #FIXED value
- This attribute of this element always has the value stated. If it is specified in the document, it must match.
Finally, a default value, written in quotes, may be supplied instead. This is mutually exclusive with the keyword; an attribute omitted from the document then takes the default.
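The attribute types and keywords above can be sketched in a single declaration. Note that the XML 1.0 keyword for attribute declarations is ATTLIST. The element and attribute names here (product, sku and so on) are hypothetical.

```xml
<!-- One ATTLIST may declare several attributes for the same element.
     sku:      unique identifier, must always be present
     replaces: optional reference to the sku of another product
     code:     optional name token
     status:   one of two listed values, defaulting to "active"
     currency: fixed value, cannot be overridden in the document -->
<!ELEMENT product (#PCDATA)>
<!ATTLIST product
  sku       ID                  #REQUIRED
  replaces  IDREF               #IMPLIED
  code      NMTOKEN             #IMPLIED
  status    (active | retired)  "active"
  currency  CDATA               #FIXED "USD">
```

Given these declarations, a parser that reads the DTD would be expected to report an element written as <product sku="p2">Widget</product> with status "active" and currency "USD" filled in.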
!ENTITY
The XML DTD syntax also allows for "entities" to be defined. Essentially, these represent textual substitutions at one level or another. They exist in two forms: those that are substituted in the document (such as &amp; in HTML) and those that can be referenced elsewhere in the DTD itself. They are defined using the "!ENTITY" tag:
<!ENTITY entity-name entity-def>
where:
- entity-name
- is the name that will be expanded (e.g. "amp", which is referenced in the document as &amp;). For use in the DTD, the name is preceded by a "%" and a space, and is then referenced as %entity-name;.
- entity-def
- is the value that will replace the name. This can either be supplied directly as a quoted string or, if preceded by the keyword "SYSTEM", by reference to a URL.
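Both forms of entity can be sketched together in a small external DTD. The file names "memo.dtd" and "legal.xml", the entity names and the company name are all invented for illustration.

```xml
<!-- Contents of a hypothetical external DTD file, memo.dtd -->

<!-- Parameter entity: for use inside the DTD only, referenced as %text; -->
<!ENTITY % text "(#PCDATA)">

<!-- General entities: referenced in the document as &company; and &legal; -->
<!ENTITY company "Example Widgets Ltd">
<!ENTITY legal SYSTEM "legal.xml">

<!ELEMENT memo (from, body)>
<!ELEMENT from %text;>
<!ELEMENT body %text;>
```

A document using this DTD could then contain <body>Issued by &company;.</body>, which expands to "Issued by Example Widgets Ltd.". The sketch is written as an external file because XML 1.0 does not allow parameter-entity references inside a declaration (as in the last two lines) within an internal subset.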
I was going to supply a DTD describing the XML DTD grammar. However, I don't believe this is possible given what I've described above. Instead, here's a DTD for holloway's customer record file:
<!ELEMENT customer-file (customer-details*)>
<!ELEMENT customer-details (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, state, postal?)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ATTLIST customer-details id ID #REQUIRED>
<!ATTLIST address country CDATA "US">
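I've decided that every customer-details element must carry a unique id, that the postal element is optional, and that the country of an address defaults to "US".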
Of course, other DTDs could be written against which the example would be valid.
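For illustration, here is a hypothetical customer file of the kind this DTD is meant to describe. It assumes the declarations are saved in a file named customer.dtd (an invented name) in XML 1.0 form, i.e. with parenthesised content models and ATTLIST attribute declarations. The customer data is made up and is not taken from holloway's file.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer-file SYSTEM "customer.dtd">
<customer-file>
  <!-- country is omitted here, so the declared default "US" applies -->
  <customer-details id="c001">
    <name>Jane Example</name>
    <address>
      <street>1 Sample Street</street>
      <city>Springfield</city>
      <state>IL</state>
      <postal>62701</postal>
    </address>
  </customer-details>
  <!-- postal is optional and left out; country is given explicitly -->
  <customer-details id="c002">
    <name>John Placeholder</name>
    <address country="CA">
      <street>2 Demo Avenue</street>
      <city>Toronto</city>
      <state>ON</state>
    </address>
  </customer-details>
</customer-file>
```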
A tutorial lives here: http://zvon.org/xxl/DTDTutorial/General/book.html - there are also some references. The W3C definitions can be found here: http://www.w3.org/TR/REC-xml#sec-logical-struct.