So, XML. It’s actually very easy. It’s a lot like HTML, except you get to make up your own tags. The rules are:
- Any text ought to be surrounded by tags: <my_tag>text</my_tag>
- Nest tags properly; do not overlap them. <my_tag><bold_text>text</my_tag></bold_text> is not allowed. Use <my_tag><bold_text>text</bold_text></my_tag> instead.
- There needs to be exactly one tag that starts and ends the document. This is generally called the root element (or root node).
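If you would rather have a machine check those rules for you, here is a minimal sketch in Python. It assumes nothing beyond the standard library's xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for overlapping tags, a missing root, stray text, and so on.
        return False


# Hypothetical sample documents, one per rule.
good = "<my_tag><bold_text>text</bold_text></my_tag>"
overlapping = "<my_tag><bold_text>text</my_tag></bold_text>"
no_root = "<a>one</a><b>two</b>"

print(is_well_formed(good))         # True
print(is_well_formed(overlapping))  # False
print(is_well_formed(no_root))      # False
```

Note that this only tests well-formedness. This parser does not validate a document against a set of tag definitions.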
That’s it, mostly.* Follow those rules and you will have well-formed XML. Valid XML, however, requires a set of tag definitions to validate against. Those definitions are written in a DTD (Document Type Definition) file. Like so:
<!ELEMENT my_tag (tags and stuff my_tag can contain) >
A DTD needs an ELEMENT declaration for every tag in your XML file. Note that in my previous post, I showed two DTDs for two XML files. A line like so:
<!ELEMENT english_term (#PCDATA)>
means that the tag <english_term>...</english_term> contains text but no other tags. A line like so:
<!ELEMENT english_entry (english_term, pronunciation, definitions+, subentry*)>
lists the tags that <english_entry>...</english_entry> contains, in the order they must appear. Note that tags in such a list can have characters after them: +, *, ?. Those have a special meaning:
- + means that one or more tags of this type must exist within the defined tag
- * means that zero or more tags of this type can exist within the defined tag
- ? means that zero or one tag of this type can exist within the defined tag
- no mark means that exactly one tag of this type must exist within the defined tag
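Those marks behave just like the quantifiers in a regular expression, which suggests a quick way to see them in action. The sketch below is not a real DTD validator. It is a hand-translation of the english_entry line into a Python regex over the child tag names, and the names ENGLISH_ENTRY_MODEL and matches_model, along with the sample entries, are invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written regex equivalent of:
#   (english_term, pronunciation, definitions+, subentry*)
# Each child tag name is followed by one space, so the regex matches
# against a string such as "english_term pronunciation definitions ".
ENGLISH_ENTRY_MODEL = re.compile(
    r"(english_term )(pronunciation )(definitions )+(subentry )*"
)


def matches_model(entry_xml: str) -> bool:
    """Check the direct children of one <english_entry> against the model."""
    entry = ET.fromstring(entry_xml)
    children = "".join(child.tag + " " for child in entry)
    return ENGLISH_ENTRY_MODEL.fullmatch(children) is not None


# Hypothetical entries: one complete, one missing the required definitions.
complete = (
    "<english_entry>"
    "<english_term>cat</english_term>"
    "<pronunciation>kat</pronunciation>"
    "<definitions>a small furry animal</definitions>"
    "<definitions>a jazz fan</definitions>"
    "</english_entry>"
)
no_definitions = (
    "<english_entry>"
    "<english_term>cat</english_term>"
    "<pronunciation>kat</pronunciation>"
    "</english_entry>"
)

print(matches_model(complete))        # True
print(matches_model(no_definitions))  # False
```

The first entry passes because definitions+ allows two definitions and subentry* allows none. The second fails because definitions+ demands at least one.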
There. That is basic (very basic) XML. Easy!
*Then there are attributes and namespaces and other optional complications.


