How to choose between DTD and XSD

I want to use either a DTD or an XSD to describe my XML document. I've read that XSDs are better than DTDs since they support namespaces and data types, and that DTDs are older.

Does this mean that I should only use XSDs for all future needs and totally ignore DTD as an option? Should I even bother learning the structure of DTDs?

What factors should I consider when choosing between XSD and DTD?

Answers


It's probably important to learn DTDs as a separate exercise, just for the knowledge of how they work in case you encounter them somewhere else, and so that you can appreciate some of the things that XSD was trying to solve.

However, for your current purposes of describing an XML document, indeed stick to XSDs.

In addition to having a far richer feature set (as you mention, including data types and namespaces), XSDs are also XML documents themselves, which can be really useful. Because they are XML, you can check their well-formedness and validity much more easily, and you can write code that works with them like regular XML files (for instance, if you wanted to autogenerate code classes from a schema).
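To illustrate the point that a schema is just another XML file, here is a minimal sketch that reads an XSD with Python's standard library and lists the elements it declares. The schema itself (`book`, `title`, `year`, `author`) and the helper name `declared_elements` are made up for this example; a real code generator would need to handle far more of the XSD vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema vocabulary, in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, invented for illustration only.
SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year" type="xs:gYear"/>
      </xs:sequence>
      <xs:attribute name="author" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_elements(schema_text):
    """Return (name, type) pairs for every xs:element in the schema."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]


# The schema is parsed with the same tools as any other XML document.
print(declared_elements(SCHEMA))
```

The same approach would not work for a DTD, which has its own non-XML syntax and needs a dedicated parser.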


It really depends on how complicated the structure is that you need to set up.

If you need things like namespaces and data types, definitely go with XSD. If you just need a quick little schema to check against, a DTD may be faster to process, since the schema itself does not have to be parsed as XML.

As I understand it, XSD was designed as a successor to DTD, so understanding DTD will give you a solid foundation for learning XSD and will point out some of DTD's shortcomings.


It wouldn't hurt to understand the structure of a DTD (it'll help you better understand an XSD in the long run)...but you should use XSDs moving forward.


No harm in learning DTD, but be sure to use XSD, because XSD is more powerful.

With XSD you can not only validate the structure/hierarchy of the XML tags, but also:

  1. You can define the data type of node values (date, number, string, etc.).
  2. You can define custom data types. For example, for a node that holds a month, you can define a new data type that lists all 12 month names as enumeration values; validation then reports an error if the input XML contains any other value.
  3. You can restrict how often an element occurs, using minOccurs and maxOccurs. Both default to 1.

.. and many more ...
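As a rough sketch of what the three features above look like in a schema, the XSD below declares a custom enumeration type and occurrence limits. The names `MonthName`, `report`, `month` and `day` are hypothetical. Python's standard library has no XSD validator, so the snippet only parses the schema and reads the constraints back; actually validating a document against it would need a third-party library.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: names and limits are invented for illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="MonthName">
    <xs:restriction base="xs:string">
      <xs:enumeration value="January"/>
      <xs:enumeration value="February"/>
      <!-- remaining ten months omitted for brevity -->
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <!-- custom data type; minOccurs/maxOccurs default to 1 -->
        <xs:element name="month" type="MonthName"/>
        <!-- built-in data type with explicit occurrence limits -->
        <xs:element name="day" type="xs:date" minOccurs="0" maxOccurs="31"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# Values the custom type accepts.
allowed_months = [e.get("value") for e in root.iter(XS + "enumeration")]

# Occurrence limits of the elements inside each xs:sequence,
# falling back to the XSD default of 1 when the attribute is absent.
occurs = {
    el.get("name"): (el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
    for seq in root.iter(XS + "sequence")
    for el in seq.findall(XS + "element")
}

print(allowed_months)
print(occurs)
```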

There are some restrictions, such as:

  1. An element(name) defined in XSD file must be defined with only one data-type.
  2. You can't validate a node/attribute using the value of another node/attribute.


There is, IMHO, a very important reason to use a DTD (maybe together with an XSD if you need in-depth validation):

In a DTD you can define your own entities, e.g.:

<!ENTITY MyName "DrDr.Hannibal Xerxes Utah,MBA and CEO">

In your document you can then simply write &MyName; wherever needed, instead of typing all this stuff.

Furthermore, assume you have an XML-like file (maybe produced by some other application) that consists of a lot of similar tags but no root tag, e.g.:
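A quick way to see the expansion in action, assuming Python and its standard-library parser: the sketch below declares the entity in an internal DTD subset and checks that the parser substitutes it. The `Letter` and `Signature` element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Document with an internal DTD subset that declares the entity.
# The element names Letter and Signature are hypothetical.
DOC = """<!DOCTYPE Letter [
  <!ENTITY MyName "DrDr.Hannibal Xerxes Utah,MBA and CEO">
]>
<Letter>
  <Signature>&MyName;</Signature>
</Letter>"""

root = ET.fromstring(DOC)

# The parser replaces &MyName; with the declared text while parsing.
signature = root.find("Signature").text
print(signature)
```

Note that this only covers internal entities; whether external entities (SYSTEM "...") are loaded depends on the parser and its security settings.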

<?xml version="1.0" encoding="ISO-8859-1"?> <!-- you need this when using foreign characters like 'ü' -->
<Book Author="Author1">
  <Titel>Erstes Buch</Titel>
</Book>
...
<Book Author="Author5">
  <Titel>Fünftes Buch</Titel>
</Book>

Assume this file is named "Booklist.TXT".

Now you can code your master-xml:

<?xml version="1.0" encoding="ISO-8859-1"?> <!-- you need this when using foreign characters like 'ü' -->
<!DOCTYPE MyRoot [
<!ENTITY AllBooks SYSTEM "Booklist.TXT">
]>

<MyRoot>
... some prefix-stuff as needed ...
&AllBooks; <!-- here are all the Books -->
... some post stuff as needed ...
</MyRoot>

Whenever you need the books in another context, you only have to write the surrounding XML and do not have to touch or copy the book list itself. Furthermore, you can maintain it in one single place and have all changes show up in every document that includes it.


This is an old thread, BUT in case anyone else comes across it... from what I can tell, DTD still has two benefits over XSD. The first is ENTITY declarations, which do not exist in XSD. This is a pretty awesome feature: besides reusable text, unparsed entities (together with NOTATION declarations) can tell the processor how to handle potentially unfamiliar file types by identifying which programs should process them.

Also, DTDs are part of the XML spec, so they can be written directly into XML documents, whereas an XSD has to exist as a separate file and be referenced from the document. Not a big deal, especially with bigger documents.

I think XSD is still far better and more natural since it uses XML syntax, just wanted to play devil's advocate :)


XML Schema can perform more complex validations. A DTD can only check structure and treats element content as plain text, whereas XML Schema can check that an element is, for example, a string starting with an uppercase letter or a positive integer. Finally, XML Schema uses XML syntax, and it's a natural choice for the development of web services.

