Now that you have a basic understanding of XML, it makes sense to get a high-level overview of the various XML-related acronyms and what they mean. There is a lot of work going on around XML, so there is a lot to learn.
The current APIs for accessing XML documents either serially or in random access mode are, respectively, SAX and DOM. The specifications for ensuring the validity of XML documents are DTD (the original mechanism, defined as part of the XML specification) and various schema proposals (newer mechanisms that use XML syntax to do the job of describing validation criteria).
Other future standards that are nearing completion include the XSL standard -- a mechanism for setting up translations of XML documents (for example to HTML or other XML) and for dictating how the document is rendered. The transformation part of that standard, XSLT, is completed and covered in this tutorial. Another effort nearing completion is the XML Link Language specification (XLL), which enables links between XML documents.
Those are the major initiatives you will want to be familiar with. This section also surveys a number of other interesting proposals, including the HTML-lookalike standard, XHTML, and the meta-standard for describing the information an XML document contains, RDF. There are also standards efforts that aim to extend XML, including XLink and XPointer.
Finally, there are a number of interesting standards and standards-proposals that build on XML, including Synchronized Multimedia Integration Language (SMIL), Mathematical Markup Language (MathML), Scalable Vector Graphics (SVG), and DrawML, as well as a number of eCommerce standards.
The remainder of this section gives you a more detailed description of these initiatives.
Skim the terms once, so you know what's here, and keep a copy of this document handy so you can refer to it whenever you see one of these terms in something you're reading. Pretty soon, you'll have them all committed to memory, and you'll be at least "conversant" with XML!
SAX
Simple API for XML
This API was actually a product of collaboration on the XML-DEV mailing list, rather than a product of the W3C. It's included here because it has the same "final" characteristics as a W3C recommendation.
You can also think of this standard as the "serial access" protocol for XML. This is the fast-to-execute mechanism you would use to read and write XML data in a server, for example. This is also called an event-driven protocol, because the technique is to register your handler with a SAX parser, after which the parser invokes your callback methods whenever it sees a new XML tag (or encounters an error, or wants to tell you anything else).
For more information on the SAX protocol, see Serial Access with the Simple API for XML.
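The event-driven style described above can be sketched in a few lines. This is only an illustration, written in Python with the standard library's xml.sax module; the TagLogger class, the log_events function, and the sample document are hypothetical names invented for the example.

```python
import xml.sax


class TagLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # The parser calls this back each time it sees a start tag.
        self.events.append(("start", name))

    def endElement(self, name):
        # The parser calls this back each time it sees an end tag.
        self.events.append(("end", name))


def log_events(xml_text):
    handler = TagLogger()
    # Register the handler with the parser; the parser then drives
    # the callbacks as it reads the document from front to back.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = log_events("<order><item>bolt</item></order>")
```

Note that the handler never sees the document as a whole. It only sees one event at a time, in document order, which is why this style is called serial access.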
DOM
Document Object Model
The Document Object Model protocol converts an XML document into a collection of objects in your program. You can then manipulate the object model in any way that makes sense. This mechanism is also known as the "random access" protocol, because you can visit any part of the data at any time. You can then modify the data, remove it, or insert new data.
For more information on the DOM specification, see Manipulating Document Contents with the Document Object Model.
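As a rough sketch of random access, the following Python fragment uses the standard library's xml.dom.minidom module to modify, remove, and insert data in an object model. The edit_parts function and the parts document are made up for the illustration.

```python
from xml.dom import minidom


def edit_parts(xml_text):
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    parts = root.getElementsByTagName("part")

    # Random access: go straight to the second element and modify its text.
    parts[1].firstChild.data = "washer"

    # Remove the first element from the tree.
    root.removeChild(parts[0])

    # Insert new data at the end.
    new_part = doc.createElement("part")
    new_part.appendChild(doc.createTextNode("screw"))
    root.appendChild(new_part)

    return root.toxml()


result = edit_parts("<parts><part>bolt</part><part>nut</part></parts>")
```

The whole document is held in memory here, which is what makes it possible to visit the nodes in any order.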
DTD
Document Type Definition
The DTD specification is actually part of the XML specification, rather than a separate entity. On the other hand, it is optional -- you can write an XML document without it. And there are a number of schema proposals that offer more flexible alternatives. So it is treated here as though it were a separate specification.
A DTD specifies the kinds of tags that can be included in your XML document, and the valid arrangements of those tags. You can use the DTD to make sure you don't create an invalid XML structure. You can also use it to make sure that the XML structure you are reading (or that got sent over the net) is indeed valid.
Unfortunately, it is difficult to specify a DTD for a complex document in such a way that it prevents all invalid combinations and allows all the valid ones. So constructing a DTD is something of an art. The DTD can exist at the front of the document, as part of the prolog. It can also exist as a separate entity, or it can be split between the document prolog and one or more additional entities.
However, while the DTD mechanism was the first method defined for specifying valid document structure, it was not the last. Several newer schema specifications have been devised. You'll learn about those momentarily.
For more information, see Defining a Document Type.
Namespaces
The namespace standard lets you write an XML document that uses two or more sets of XML tags in modular fashion. Suppose for example that you created an XML-based parts list that uses XML descriptions of parts supplied by other manufacturers (online!). The "price" data supplied by the subcomponents would be amounts you want to total up, while the "price" data for the structure as a whole would be something you want to display. The namespace specification defines mechanisms for qualifying the names so as to eliminate ambiguity. That lets you write programs that use information from other sources and do the right things with it.
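To make the parts-list scenario concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The two namespace URIs, the element names, and the prices function are assumptions invented for this example; they do not come from any real supplier vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, made up for this illustration.
ASSEMBLY_NS = "http://example.com/assembly"
SUPPLIER_NS = "http://example.com/supplier"

PARTS_LIST = f"""
<a:assembly xmlns:a="{ASSEMBLY_NS}" xmlns:s="{SUPPLIER_NS}">
  <a:price>19.99</a:price>
  <s:part><s:price>4.50</s:price></s:part>
  <s:part><s:price>7.25</s:price></s:part>
</a:assembly>
""".strip()


def prices(xml_text):
    root = ET.fromstring(xml_text)
    # ElementTree writes a qualified name as {namespace-uri}local-name,
    # so the two kinds of "price" can no longer be confused.
    display_price = root.find(f"{{{ASSEMBLY_NS}}}price").text
    component_total = sum(
        float(p.text) for p in root.iter(f"{{{SUPPLIER_NS}}}price")
    )
    return display_price, component_total


display_price, component_total = prices(PARTS_LIST)
```

Both elements are called "price" in the document, but the program totals only the supplier prices and displays only the assembly price.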
The latest information on namespaces can be found at http://www.w3.org/TR/REC-xml-names.
XSL
Extensible Stylesheet Language
The XML standard specifies how to identify data, not how to display it. HTML, on the other hand, told how things should be displayed without identifying what they were. The XSL standard has two parts, XSLT (the transformation standard, described next) and XSL-FO (the part that covers formatting objects, also known as flow objects). XSL-FO gives you the ability to define multiple areas on a page and then link them together. When a text stream is directed at the collection, it fills the first area and then "flows" into the second when the first area is filled. Such objects are used by newsletters, catalogs, and periodical publications.
The latest W3C work on XSL is at http://www.w3.org/TR/WD-xsl.
XSLT (+XPATH)
Extensible Stylesheet Language for Transformations
The XSLT transformation standard is essentially a translation mechanism that lets you specify what to convert an XML tag into so that it can be displayed -- for example, in HTML. Different XSL formats can then be used to display the same data in different ways, for different uses. (The XPATH standard is an addressing mechanism that you use when constructing transformation instructions, in order to specify the parts of the XML structure you want to transform.)
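The addressing idea behind XPATH can be tried out without an XSLT processor. The sketch below is in Python and relies on the standard library's xml.etree.ElementTree module, which supports only a limited subset of the XPATH syntax; it performs no transformation. The book document and the select_titles function are invented for the example.

```python
import xml.etree.ElementTree as ET

BOOK = """
<book>
  <title>XML Overview</title>
  <chapter id="1"><title>SAX</title></chapter>
  <chapter id="2"><title>DOM</title></chapter>
</book>
""".strip()


def select_titles(xml_text, path):
    root = ET.fromstring(xml_text)
    # findall evaluates the path expression relative to the root element.
    return [node.text for node in root.findall(path)]


# Titles that are children of a chapter child of the root.
chapter_titles = select_titles(BOOK, "./chapter/title")
# Every title anywhere below the root, in document order.
all_titles = select_titles(BOOK, ".//title")
# The title of the chapter whose id attribute is "2".
second_title = select_titles(BOOK, "./chapter[@id='2']/title")
```

In a stylesheet, path expressions of this kind are what select the parts of the structure that a transformation instruction applies to.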
For more information, see Using XSLT.
A DTD makes it possible to validate the structure of relatively simple XML documents, but that's as far as it goes.
A DTD can't restrict the content of elements, and it can't specify complex relationships. For example, it is impossible to specify with a DTD that a <heading> for a <book> must have both a <title> and an <author>, while a <heading> for a <chapter> only needs a <title>. In a DTD, you only get to specify the structure of the <heading> element one time. There is no context-sensitivity.
This issue stems from the fact that a DTD specification is not hierarchical. For a mailing address that contained several "parsed character data" (PCDATA) elements, for example, the DTD might look something like this:
<!ELEMENT mailAddress (name, address, zipcode)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT zipcode (#PCDATA)>
As you can see, the specifications are linear. That fact forces you to come up with new names for similar elements in different settings. So if you wanted to add another "name" element to the DTD that contained the <firstName>, <middleInitial>, and <lastName>, then you would have to come up with another identifier. You could not simply call it "name" without conflicting with the <name> element defined for use in a <mailAddress>.
Another problem with the nonhierarchical nature of DTD specifications is that it is not clear what comments are meant to explain. A comment at the top like <!-- Address used for mailing via the postal system --> would apply to all of the elements that constitute a mailing address. But a comment like <!-- Addressee --> would apply to the name element only. On the other hand, a comment like <!-- A 5-digit string --> would apply specifically to the #PCDATA part of the zipcode element, to describe the valid formats. In addition, DTDs do not allow you to formally specify field-validation criteria, such as the 5-digit (or 5 and 4) limitation for the zipcode field.
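Since a DTD cannot express such a rule, an application has to enforce it separately. The following Python sketch shows one way that check might look. It assumes the "5 and 4" form is written with a hyphen between the two groups, which the text above does not specify; the is_valid_zipcode name is also made up.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
# The hyphen separator is an assumption made for this example.
ZIPCODE_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_valid_zipcode(text):
    # fullmatch succeeds only if the whole string fits the pattern.
    return ZIPCODE_PATTERN.fullmatch(text) is not None
```

For instance, is_valid_zipcode("94043") and is_valid_zipcode("94043-1234") both pass under this assumption, while a four-digit string does not.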
Finally, a DTD uses syntax that is substantially different from XML, so it can't be processed with a standard XML parser. That means you can't read a DTD into a DOM, for example, modify it, and then write it back out again.
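A quick experiment illustrates the point. This Python sketch hands DTD declarations to the standard library's xml.dom.minidom parser as if they were a document; the loads_into_dom helper is a hypothetical name used only here.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

DTD_TEXT = """<!ELEMENT mailAddress (name, address, zipcode)>
<!ELEMENT name (#PCDATA)>"""


def loads_into_dom(text):
    # Returns True if the text parses as an XML document, False otherwise.
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


dtd_loads = loads_into_dom(DTD_TEXT)
xml_loads = loads_into_dom("<mailAddress/>")
```

The declarations are rejected because they are not an XML document on their own, while the one-element document loads without complaint.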
To remedy these shortcomings, a number of proposals have been made for a more database-like, hierarchical "schema" that specifies validation criteria. The major proposals are shown below.
XML Schema
A large, complex standard that has two parts. One part specifies structure relationships. (This is the largest and most complex part.) The other part specifies mechanisms for validating the content of XML elements by specifying a (potentially very sophisticated) datatype for each element. The good news is that XML Schema for Structures lets you specify any kind of relationship you can conceive of. The bad news is that it takes a lot of work to implement, and it takes a bit of learning to use. Most of the alternatives provide for simpler structure definitions, while incorporating the XML Schema datatype standard.
For more information on the XML Schema proposal, see the W3C specs XML Schema (Structures) and XML Schema (Datatypes).
RELAX
Regular Language description for XML
Simpler than XML Structure Schema, RELAX uses XML syntax to express the structure relationships that are present in a DTD, and adds the XML Datatype Schema mechanisms, as well. It includes a DTD-to-RELAX converter.
For more information on RELAX, see http://www.xml.gr.jp/relax/.
SOX
Schema for Object-oriented XML
SOX is a schema proposal that includes extensible data types, namespaces, and embedded documentation.
For more information on SOX, see http://www.w3.org/TR/NOTE-SOX.
TREX
Tree Regular Expressions for XML
A means of expressing validation criteria by describing a pattern for the structure and content of an XML document. It includes a RELAX-to-TREX converter.
For more information on TREX, see http://www.thaiopensource.com/trex/.
Schematron
An assertion-based schema mechanism that allows for sophisticated validation.
For more information on Schematron, see http://www.ascc.net/xml/resource/schematron/schematron.html.
XML Linking
These specifications provide a variety of powerful linking mechanisms, and are sure to have a big impact on how XML documents are used.
XLink: The XLink protocol is a proposed specification to handle links between XML documents. This specification allows for some pretty sophisticated linking, including two-way links, links to multiple documents, "expanding" links that insert the linked information into your document rather than replacing your document with a new page, links between two documents that are created in a third, independent document, and indirect links (so you can point to an "address book" rather than directly to the target document -- updating the address book then automatically changes any links that use it).
XML Base: This standard defines an attribute for XML documents that specifies a "base" address, which is used when evaluating a relative address specified in the document. (So, for example, a simple file name would be found in the base-address directory.)
XPointer: In general, the XLink specification targets a document or document-segment using its ID. The XPointer specification defines mechanisms for "addressing into the internal structures of XML documents", without requiring the author of the document to have defined an ID for that segment. To quote the spec, it provides for "reference to elements, character strings, and other parts of XML documents, whether or not they bear an explicit ID attribute".
For more information on the XML Linking standards, see http://www.w3.org/XML/Linking.
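The base-address idea from XML Base can be sketched as follows, in Python with the standard library. This is a simplification: it reads the xml:base attribute from the root element only and ignores the scoping rules for nested elements. The catalog document, the entry element, its href attribute, and the resolved_links function are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes the xml:base attribute under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

CATALOG = (
    '<catalog xml:base="http://example.com/docs/">'
    '<entry href="chapter1.xml"/>'
    "</catalog>"
)


def resolved_links(xml_text):
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    # Each relative address is evaluated against the base address.
    return [urljoin(base, entry.get("href")) for entry in root.iter("entry")]


links = resolved_links(CATALOG)
```

Here the simple file name chapter1.xml is found in the base-address directory, as the description above suggests.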
XHTML
The XHTML specification is a way of making XML documents that look and act like HTML documents. Since an XML document can contain any tags you care to define, why not define a set of tags that look like HTML? That's the thinking behind the XHTML specification, at any rate. The result of this specification is a document that can be displayed in browsers and also treated as XML data. The data may not be quite as identifiable as "pure" XML, but it will be a heck of a lot easier to manipulate than standard HTML, because XML specifies a good deal more regularity and consistency.
For example, every tag in a well-formed XML document must either have an end-tag associated with it or it must end in />. So you might see <p>...</p>, or you might see <p/>, but you will never see <p> standing by itself. The upshot of that requirement is that you never have to program for the weird kinds of cases you see in HTML where, for example, a <dt> tag might be terminated by </DT>, by another <DT>, by <dd>, or by </dl>. That makes it a lot easier to write code!
The XHTML specification is a reformulation of HTML 4.0 into XML. The latest information is at http://www.w3.org/TR/xhtml1.
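The well-formedness rule is easy to check with any XML parser. As an illustration, this Python sketch uses the standard library's xml.etree.ElementTree module; the is_well_formed helper and the sample markup are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    # The parser raises ParseError as soon as a tag is left unterminated.
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


paired = is_well_formed("<body><p>text</p></body>")
empty = is_well_formed("<body><p/></body>")
dangling = is_well_formed("<body><p>text</body>")
```

The first two forms are accepted, while the <p> standing by itself is rejected, so code that consumes the document never has to guess where an element ends.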
RDF
Resource Description Framework
RDF is a proposed standard for defining data about data. Used in conjunction with the XHTML specification, for example, or with HTML pages, RDF could be used to describe the content of the pages. For example, if your browser stored your ID information as FIRSTNAME, LASTNAME, and EMAIL, an RDF description could make it possible to transfer data to an application that wanted NAME and EMAILADDRESS. Just think: One day you may not need to type your name and address at every web site you visit!
For the latest information on RDF, see http://www.w3.org/TR/REC-rdf-syntax.
RDF Schema
The RDF Schema proposal allows the specification of consistency rules and additional information that describe how the statements in a Resource Description Framework (RDF) should be interpreted.
For more information on the RDF Schema recommendation, see http://www.w3.org/TR/rdf-schema.
XTM
XML Topic Maps
In many ways a simpler, more readily usable knowledge-representation than RDF, the topic maps standard is one worth watching. So far, RDF is the W3C standard for knowledge representation, but topic maps could possibly become the "developer's choice" among knowledge representation standards.
For more information on XML Topic Maps, see http://www.topicmaps.org/xtm/index.html. For information on topic maps and the web, see http://www.topicmaps.org/.
The following standards and proposals build on XML. Since XML is basically a language-definition tool, these specifications use it to define standardized languages for specialized purposes.
SMIL
Synchronized Multimedia Integration Language
SMIL is a W3C recommendation that covers audio, video, and animations. It also addresses the difficult issue of synchronizing the playback of such elements.
For more information on SMIL, see http://www.w3.org/TR/REC-smil.
MathML
Mathematical Markup Language
MathML is a W3C recommendation that deals with the representation of mathematical formulas.
For more information on MathML, see http://www.w3.org/TR/REC-MathML.
SVG
Scalable Vector Graphics
SVG is a W3C working draft that covers the representation of vector graphic images. (Vector graphic images are built from commands that say things like "draw a line (square, circle) from point x,y to point m,n" rather than encoding the image as a series of bits. Such images are more easily scalable, although they typically require more processing time to render.)
For more information on SVG, see http://www.w3.org/TR/WD-SVG.
DrawML
Drawing Meta Language
DrawML is a W3C note that covers 2D images for technical illustrations. It also addresses the problem of updating and refining such images.
For more information on DrawML, see http://www.w3.org/TR/NOTE-drawml.
ICE
Information and Content Exchange
ICE is a protocol for use by content syndicators and their subscribers. It focuses on "automating content exchange and reuse, both in traditional publishing contexts and in business-to-business relationships".
For more information on ICE, see http://www.w3.org/TR/NOTE-ice.
ebXML
Electronic Business with XML
This standard aims at creating a modular electronic business framework using XML. It is the product of a joint initiative by the United Nations (UN/CEFACT) and the Organization for the Advancement of Structured Information Systems (OASIS).
For more information on ebXML, see http://www.ebxml.org/.
cxml
Commerce XML
cxml is a RosettaNet (www.rosettanet.org) standard for setting up interactive online catalogs for different buyers, where the pricing and product offerings are company specific. It includes mechanisms to handle purchase orders, change orders, status updates, and shipping notifications.
For more information on cxml, see http://www.cxml.org/.
CBL
Common Business Library
CBL is a library of element and attribute definitions maintained by CommerceNet (www.commerce.net).
For more information on CBL and a variety of other initiatives that work together to enable eCommerce applications, see http://www.commerce.net/projects/currentprojects/eco/wg/eCo_Framework_Specifications.html.