JAVA JOLT
XML DOM API

John Hunt

1   Introduction
In this Java Jolt column we will concentrate on the DOM (Document Object Model) API. The DOM API has been defined by the World Wide Web Consortium (known as the W3C; see http://www.w3c.org/). It is a standard API specified in terms of interfaces, which can be implemented in a particular language to provide a set of concrete classes for creating, manipulating, searching and loading XML documents. Implementations may be provided in languages such as Java, Python and C#.

Any DOM implementation takes an XML document and builds a tree-like structure that represents that XML document (as illustrated in Figure 1). It is then possible to get hold of any element in the tree (which is represented by a node object) and ask that element for its attributes, its child nodes, its value etc. This is done through a standard set of interfaces: the W3C specification for the DOM is actually a set of interfaces specifying what the various elements of the DOM API should do, and each language (for example Java) implements those interfaces to create concrete classes. This means that moving from one DOM implementation to another should be straightforward.

1.1   The Java org.w3c.dom package
As may be expected, the standard Java platform (J2SE 1.4 and later, and therefore J2EE) comes with an implementation of the W3C DOM API. These interfaces and classes can be found in the org.w3c.dom package.



Figure 1: Creating a DOM from an XML document

There are in fact three levels of the DOM standard: DOM Level 1, DOM Level 2 and DOM Level 3 Core. Together these define interfaces with methods and properties used to manipulate XML (see org.w3c.dom). Each level builds on the previous one: DOM Level 1 provides the core elements of the DOM, and Level 3 adds numerous nice-to-haves (loading and saving documents are covered by the companion DOM Level 3 Load and Save specification).

The Java 1.5 implementation of the DOM API supports all of DOM Level 2 and DOM Level 3 Core. One aspect of the DOM that was not specified until Level 3 was how an XML document should be loaded. Support for this was added in Java SDK 1.5 (the org.w3c.dom.ls package). Prior to this, the JAXP API could be used: its DocumentBuilderFactory and DocumentBuilder classes together allow a parser-independent approach to loading an XML file and creating a DOM tree. The approach is similar to that used with the SAX API. First a DocumentBuilderFactory object is obtained using the static newInstance() method on the class. This factory object (so called because its role is to create other objects) can then be configured as required, and used to create instances of DocumentBuilder via its newDocumentBuilder() instance method.
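For comparison, the Level 3 loading mechanism added in Java 5 can be sketched as follows. This is a minimal example, assuming a Java 5 or later runtime whose default DOMImplementationRegistry supplies an implementation with the "LS" feature (the JDK's built-in one does); the class name Level3Load is purely illustrative.

```java
import java.io.StringReader;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class Level3Load {

    // Parses XML text using the DOM Level 3 Load and Save (org.w3c.dom.ls) API
    public static Document parse(String xml) throws Exception {
        // Ask the bootstrap registry for an implementation with the "LS" feature
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls =
            (DOMImplementationLS) registry.getDOMImplementation("LS");

        // A synchronous parser returns the finished Document from parse()
        LSParser parser =
            ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        // Wrap the text in an LSInput (parser.parseURI(uri) would load from a URI instead)
        LSInput input = ls.createLSInput();
        input.setCharacterStream(new StringReader(xml));
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<paper><heading>The main event</heading></paper>");
        System.out.println(doc.getDocumentElement().getNodeName()); // paper
    }
}
```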

Of course the JAXP API is merely a façade (or layer) on top of an actual DOM parser (such as Crimson or Xerces). In some cases these parsers provide additional (but non-standard) facilities for reading and writing XML documents. In order to support older versions of Java, this Java Jolt column will stick with the DocumentBuilder approach.

Before DOM Level 3, the DOM API also did not say how an in-memory XML DOM tree should be written out to a file. In the case of Crimson, an additional write method has been added to the root document node object that can be used to write the XML out to a file. This works but is not portable between parsers. To overcome this, the JAXP API provides a generic approach which is a little more convoluted but works with any parser. We shall look at this later.
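As a preview of that generic approach, the sketch below serialises a DOM tree by handing it to an identity Transformer (one created without a stylesheet). It assumes only the standard JAXP classes; DomWriter and toXml are illustrative names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomWriter {

    // Serialises any DOM node to a String using an identity Transformer (no stylesheet)
    public static String toXml(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Build a one-element document in memory
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        System.out.println(toXml(doc)); // <greeting>hello</greeting>
    }
}
```

Swapping the StringWriter for a FileOutputStream is all it takes to write the same tree to a file.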

Figure 2: The node types in the DOM

There are a variety of node types in the DOM that represent the different kinds of item within an XML document. The class hierarchy for these is presented in Figure 2 and the key node types are described briefly below:

• Node - root of the node type interface hierarchy
• Document - root of the tree structure (one per XML document)
• Element - represents an XML element
• Text - represents text within an element
• Attr - an attribute of an element
• CDATASection - represents a CDATA section
• NodeList - collection of child nodes (not itself a node type, but used throughout the tree)
• ProcessingInstruction - represents a processing instruction
  – e.g. <?xml-stylesheet type="text/xml" href="prf.xsl"?>
• Comment - contains the text of a comment
• DocumentFragment - cut-down version of a Document node, used for moving nodes around the tree
• DocumentType - represents (a subset of) the document type definition. In DOM 2, DocumentType provides the DTD's entities and notations (getEntities(), getNotations()) and little else; DOM 3 will probably cover other DTDs
• Entity - represents an entity declaration in a DTD
• EntityReference - represents a reference to an entity in an XML document
• Notation - represents a notation declaration in a DTD
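In code, the type of a node is usually discovered with getNodeType(), which returns one of the short constants defined on Node. The sketch below maps those constants back to the interface names in the list above; NodeTypes, typeName and childTypes are illustrative names, and it relies on the default JAXP parser settings (comments kept, CDATA sections not merged into text).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeTypes {

    // Maps the numeric code returned by getNodeType() to the interface name
    public static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.TEXT_NODE:                   return "Text";
            case Node.ATTRIBUTE_NODE:              return "Attr";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DocumentFragment";
            case Node.DOCUMENT_TYPE_NODE:          return "DocumentType";
            case Node.ENTITY_NODE:                 return "Entity";
            case Node.ENTITY_REFERENCE_NODE:       return "EntityReference";
            case Node.NOTATION_NODE:               return "Notation";
            default:                               return "unknown";
        }
    }

    // Lists the type names of a node's direct children, separated by commas
    public static String childTypes(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(typeName(n));
        }
        return sb.toString();
    }

    // Parses XML text with the default DocumentBuilder
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml-stylesheet type=\"text/xml\" href=\"prf.xsl\"?>"
                + "<doc><!--note--><![CDATA[a<b]]>text</doc>");
        System.out.println(childTypes(doc));                      // ProcessingInstruction,Element
        System.out.println(childTypes(doc.getDocumentElement())); // Comment,CDATASection,Text
    }
}
```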

The way in which the nodes in the DOM tree actually map onto an XML document is illustrated by example in Figure 3.

Figure 3: Mapping an XML document to a DOM tree

There are a couple of things to note about this tree. The first is that it is a little more complex than might at first be expected. For example, the node representing the element <NAME>Denise Cooke</NAME> actually holds a NodeList object, which in turn contains a Text node that holds the text "Denise Cooke". This is because, in general, an element may have multiple children, and all of these must be held in something: a NodeList. A NodeList is therefore somewhat similar to an ArrayList. For example, if you wanted to represent a one-to-many relationship between a father and two children, you might use an ArrayList to hold the references to each of the children in a "children" instance variable of the father object. This is exactly what is happening within a DOM tree.
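The following sketch shows that structure from Java: the NAME element's own value is null, and its text is reached through the NodeList of its children. It assumes the default JAXP parser; NodeListDemo and parseRoot are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeListDemo {

    // Parses XML text and returns its root (document) element
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element name = parseRoot("<NAME>Denise Cooke</NAME>");

        // An Element has no value of its own
        System.out.println(name.getNodeValue());                  // null

        // Its text lives in a child Text node, held in the NodeList
        NodeList children = name.getChildNodes();
        System.out.println(children.getLength());                 // 1
        Node text = children.item(0);
        System.out.println(text.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(text.getNodeValue());                  // Denise Cooke
    }
}
```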

Secondly, note that the XML declaration <?xml version="1.0"?> is not represented as a node in the tree. Although it looks like a processing instruction, it is not one: it is directed at the parser rather than at the application that will use the XML!

This particular example does not contain any attributes; however, they are included in the DOM if present. Attributes are not children in the tree (an Attr is not returned by getChildNodes(), and its parent node is null); rather, they are internal to elements. Each element contains a NamedNodeMap (probably a hash table) holding Attr nodes, and each Attr node contains the value of its attribute.
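The sketch below reads attributes through an element's NamedNodeMap and confirms that they sit outside the child list. The PERSON element and its id attribute are made up for illustration, as are the class and method names; note that the DOM does not guarantee any particular attribute order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class AttributeDemo {

    // Parses XML text and returns its root element
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Builds "name=value" pairs from an element's NamedNodeMap of Attr nodes
    public static String describeAttributes(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(attr.getName()).append('=').append(attr.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Element person = parseRoot("<PERSON id=\"p1\"><NAME>Denise Cooke</NAME></PERSON>");
        System.out.println(describeAttributes(person));         // id=p1
        // Attributes are not children: only the NAME element is in the child list
        System.out.println(person.getChildNodes().getLength()); // 1
        // An Attr has no parent; getOwnerElement() leads back to its element
        Attr id = person.getAttributeNode("id");
        System.out.println(id.getParentNode() == null);         // true
        System.out.println(id.getOwnerElement() == person);     // true
    }
}
```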

2   Loading an XML document
This section presents a Java program that loads an XML file into a DOM and then traverses the DOM tree, printing out the nodes and their values. Note that this program can either perform validation or not. Remember that an XML file must be well-formed, but whether it is validated against a DTD is optional. The Domifier program presented below can be run either in validating mode or in standard mode. In validating mode it checks the XML file against the DTD. Figure 4 presents an XML file that contains an internal DTD (the Domifier program can work with external DTDs as well, but we are keeping things simple here). If the Domifier is run with validation enabled, the internal DTD is used to check the XML in the file.

Figure 4: An XML file containing an internal DTD

The Domifier constructor first initialises the uri string as appropriate. The uri string represents a URI specification; in this case the protocol used is "file:", indicating that we are accessing a local file, although other protocols (such as http) could be used. The constructor also sets the validating flag.

The main method (at the bottom of the listing) then calls the load method and the displayDOM method. The load method actually loads the document into memory. The displayDOM method merely traverses the DOM tree and prints out the results.

To load an XML file into the Document object, the Domifier uses two classes from the JAXP API: DocumentBuilderFactory and DocumentBuilder. They work together in a two-step process to load a document.

A DocumentBuilderFactory is an object that can supply an appropriate DocumentBuilder on demand. An instance of the DocumentBuilderFactory is obtained using the newInstance() method. This new instance can then be configured to supply an appropriate DocumentBuilder (in this case we configure it to generate a validating document builder).
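One practical point when configuring a validating builder: with many parsers (including, in practice, the JDK's default), setValidating(true) on its own only reports validity errors rather than stopping the parse. The sketch below therefore installs an ErrorHandler that turns them into exceptions. The class name and the tiny internal DTD are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatingLoad {

    // Parses XML text, optionally validating it against its DTD
    public static Document parse(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(validating);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Treat validity errors as fatal instead of just reporting them
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Returns true if the text parses (and, when validating, is valid)
    public static boolean isAccepted(String xml, boolean validating) {
        try {
            parse(xml, validating);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>";
        String invalid = dtd + "<note><extra/></note>"; // well-formed but not valid
        System.out.println(isAccepted(invalid, false)); // true
        System.out.println(isAccepted(invalid, true));  // false
    }
}
```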

In turn a DocumentBuilder is an object that can load an XML file and create an in memory DOM structure. This object is obtained from the DocumentBuilderFactory instance using the newDocumentBuilder() method.

To actually load a document into memory we use the parse method on the DocumentBuilder. This overloaded method can take a number of different parameters, including a File object, an InputStream, an InputSource and, as in this case, a URI (Uniform Resource Identifier) string.
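The sketch below loads the same file through three of those overloads. It also uses File.toURI(), which produces a well-formed file: URI and is a more portable alternative to prefixing "file:" to an absolute path. ParseOverloads and rootNames are illustrative names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseOverloads {

    // Loads the same file three ways and returns the three root element names
    public static String[] rootNames(File file) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Document fromFile = db.parse(file);                   // parse(File)
        Document fromUri = db.parse(file.toURI().toString()); // parse(String uri)
        InputStream in = new FileInputStream(file);
        Document fromStream;
        try {
            fromStream = db.parse(in);                        // parse(InputStream)
        } finally {
            in.close();
        }
        return new String[] {
            fromFile.getDocumentElement().getNodeName(),
            fromUri.getDocumentElement().getNodeName(),
            fromStream.getDocumentElement().getNodeName()
        };
    }

    public static void main(String[] args) throws Exception {
        // Write a tiny temporary XML file to load
        File file = File.createTempFile("paper", ".xml");
        file.deleteOnExit();
        Writer w = new FileWriter(file);
        try {
            w.write("<paper/>");
        } finally {
            w.close();
        }
        String[] names = rootNames(file);
        System.out.println(names[0] + " " + names[1] + " " + names[2]); // paper paper paper
    }
}
```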

package jdt;

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class Domifier {
    private String uri;          // location of the XML file to load
    private Document doc;        // the DOM tree, once loaded
    private boolean validating;  // whether to validate against the DTD

    public Domifier(String file) { this(file, false); }

    public Domifier(String file, boolean validating) {
        // Simple "file:" URI built from the absolute path
        // (new File(file).toURI().toString() is a more portable alternative)
        uri = "file:" + new File(file).getAbsolutePath();
        this.validating = validating;
    }

    // Loads the XML file into memory as a DOM tree
    public void load() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(validating);
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(uri);
            // Merge adjacent Text nodes so each run of text is a single node
            doc.getDocumentElement().normalize();
        } catch (Exception exp) {
            exp.printStackTrace();
        }
    }

    // Prints the whole tree, starting at the root element
    public void displayDOM() {
        if (doc == null) return;  // nothing to show if load() failed
        println("Domifying " + uri);
        displayDOM(doc.getDocumentElement(), 0);
    }

    // Recursively prints a node and, for elements, all of its children
    private void displayDOM(Node node, int level) {
        // Every node visited is first listed by name (e.g. "paper", "#text")
        println(node.getNodeName());
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Elements: print the name again, indented, then visit each child
            indent(level);
            println(node.getNodeName());
            NodeList children = node.getChildNodes();
            level++;
            for (int i = 0; i < children.getLength(); i++)
                displayDOM(children.item(i), level);
        } else {
            // Non-element nodes (text here): print the node's value
            System.out.println(": " + node.getNodeValue());
        }
    }

    //===========================================
    // Utility methods
    //===========================================
    private void println(String s) {
        System.out.println(s);
    }

    private void print(String s) {
        System.out.print(s);
    }

    // Prints one space per level followed by a "+-" marker
    private void indent(int number) {
        StringBuffer sb = new StringBuffer(number + 2);
        for (int i = 0; i < number; i++) {
            sb.append(" ");
        }
        sb.append("+-");
        print(sb.toString());
    }

    //===========================================
    // Main method
    //===========================================
    public static void main(String[] args) {
        Domifier dom;
        if (args.length < 1) {
            System.err.println("Usage: java jdt.Domifier xml-file <validating>");
            System.exit(1);
        }
        if (args.length == 1)
            dom = new Domifier(args[0]);
        else
            dom = new Domifier(args[0], Boolean.valueOf(args[1]).booleanValue());
        dom.load();
        dom.displayDOM();
    }
}

In the above listing, the recursive displayDOM method does the work of traversing the DOM tree and displaying the results. For every node it visits, it first prints the node's name. It then checks the node type: if the node is an element node (as opposed to a text node, for example), it prints the element name again, indented and prefixed with "+-", and calls itself recursively on each of the element's children; otherwise it prints the node's value. Note that an XML element that contains some text will be represented in the DOM as a node (of type element) with at least one child (of type text) that represents the text from the XML.

The effect of compiling and running this program is presented below (many of the #text entries are simply the whitespace between elements in the source file):

C:\jdt\java\xml\domifier>javac -d . Domifier.java
C:\jdt\java\xml\domifier>java jdt.Domifier paper.xml true
Domifying file:C:\jdt\java\xml\domifier\paper.xml
paper
+-paper
#text
:
introduction
+-introduction
#text
:
The background to XML is interesting
#text
:
section
+-section
#text
:
heading
+-heading
#text
: The main event
#text
:
body
+-body
#text
:
So what is XML all about
#text
:
#text
:
section
+-section
#text
:
heading
+-heading
#text
: The Conclusion
#text
:
body
+-body
#text
:
Where is XML heading
#text
:
#text
:
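Most of the #text entries above hold nothing but the line breaks and indentation between elements. The sketch below is a variation on displayDOM (not the listing's own code) that prints each element once and skips text nodes that are only whitespace. CompactDomifier and its methods are illustrative names, and it builds a String so that the result is easy to check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CompactDomifier {

    // Parses XML text and renders its tree, one line per element or non-blank text node
    public static String render(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    // Recursively visits a node, indenting by one space per level;
    // node types other than elements and text (comments, CDATA...) are ignored
    private static void walk(Node node, int level, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(level, out);
            out.append("+-").append(node.getNodeName()).append('\n');
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), level + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Skip text nodes that contain only whitespace between tags
            String text = node.getNodeValue().trim();
            if (text.length() > 0) {
                indent(level, out);
                out.append(": ").append(text).append('\n');
            }
        }
    }

    private static void indent(int level, StringBuilder out) {
        for (int i = 0; i < level; i++) out.append(' ');
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render("<section>\n  <heading>The Conclusion</heading>\n</section>"));
        // +-section
        //  +-heading
        //   : The Conclusion
    }
}
```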

3   Creating an XML document in Java
The following Java program, called DomBuilder, uses the DOM API to create a (very) simple XML document, which is then saved to a file. The XML file created is illustrated in Figure 5. Notice that the program takes the name of the file to save the XML to as a command line parameter. Also notice that the Document object (held in the DomBuilder's document instance variable) is used to create the elements contained within the XML document.
The use of the document object to create the nodes that are then added to the DOM tree may at first seem confusing. However, the document object is playing the role of a factory object here: it produces DOM nodes that can be used to construct the DOM tree. To do this, the Document interface provides a host of creation methods. For example:

–createAttribute(String name) Creates an Attr of the given name. Note that the Attr instance can then be set on an Element using the setAttributeNode method.
–createComment(String data) Creates a Comment node given the specified string.
–createElement(String tagName) Creates an element of the type specified. Note that the instance returned implements the Element interface, so attributes can be specified directly on the returned object. In addition, if there are known attributes with default values, Attr nodes representing them are automatically created and attached to the element.
–createTextNode(String text) Creates a Text node given the specified string.
–createCDATASection(String text) Creates a CDATASection node whose value is the specified string.
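Several of these creation methods can be seen working together in the sketch below. The employee record, its id and grade attributes and the notes text are invented for illustration, as are the class and method names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class FactoryMethods {

    // Uses the Document as a factory for several kinds of node
    public static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // A comment may sit before the root element
        doc.appendChild(doc.createComment(" employee record "));

        Element employee = doc.createElement("employee");
        doc.appendChild(employee);

        // Create an Attr explicitly and attach it with setAttributeNode...
        Attr id = doc.createAttribute("id");
        id.setValue("42");
        employee.setAttributeNode(id);
        // ...or use the Element shortcut, which creates the Attr for us
        employee.setAttribute("grade", "senior");

        // When serialised, a CDATA section lets < and & appear unescaped
        Element notes = doc.createElement("notes");
        notes.appendChild(doc.createCDATASection("x < y & z"));
        employee.appendChild(notes);

        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        Element employee = doc.getDocumentElement();
        System.out.println(employee.getAttribute("id"));    // 42
        System.out.println(employee.getAttribute("grade")); // senior
        Node cdata = employee.getElementsByTagName("notes").item(0).getFirstChild();
        System.out.println(cdata.getNodeType() == Node.CDATA_SECTION_NODE); // true
    }
}
```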

Once a node has been obtained, it can be added to the document object, or to any node below it, using the appendChild method, which is defined in the Node interface and inherited by every type of node.

–appendChild(Node newChild) Adds the node newChild to the end of the list of children of this node. If newChild is already in the tree, it is first removed.
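The "first removed" rule means appendChild moves a node rather than copying it, as this sketch shows. The team, first, second and member elements are invented for illustration, as is the class name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MoveDemo {

    // Appends the same node to two parents in turn and reports their child counts
    public static int[] childCountsAfterMove() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element team = doc.createElement("team");
        doc.appendChild(team);
        Element first = doc.createElement("first");
        Element second = doc.createElement("second");
        team.appendChild(first);
        team.appendChild(second);

        Element member = doc.createElement("member");
        first.appendChild(member);   // member is now a child of first
        second.appendChild(member);  // already in the tree, so it is moved, not copied

        return new int[] {
            first.getChildNodes().getLength(),  // 0
            second.getChildNodes().getLength()  // 1
        };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = childCountsAfterMove();
        System.out.println(counts[0] + " " + counts[1]); // 0 1
    }
}
```

To keep the node in both places, append a copy made with cloneNode(true) instead. A node created by a different Document must first be brought across with importNode(); otherwise appendChild throws a DOMException (WRONG_DOCUMENT_ERR).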

If you wish to work with namespaces (namespaces are a little like Java packages: they provide scoping for element names), then there are versions of the create methods that take a namespace parameter. These methods all include NS (for namespace) in their names. For example:

–createElementNS(String namespaceUri, String qualifiedName) Creates an element of the given qualified name and namespace URI.
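The sketch below creates two namespaced elements with createElementNS and then inspects them. The namespace URI http://example.com/hr and the hr prefix are made up for illustration, as is the class name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NamespaceDemo {

    // Hypothetical namespace URI used only for this example
    static final String HR_NS = "http://example.com/hr";

    public static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        // Qualified name = prefix:localName; the URI is what identifies the namespace
        Element employee = doc.createElementNS(HR_NS, "hr:employee");
        doc.appendChild(employee);

        Element name = doc.createElementNS(HR_NS, "hr:name");
        name.appendChild(doc.createTextNode("John"));
        employee.appendChild(name);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element employee = build().getDocumentElement();
        System.out.println(employee.getLocalName());    // employee
        System.out.println(employee.getPrefix());       // hr
        System.out.println(employee.getNamespaceURI()); // http://example.com/hr
        // Look elements up by (namespace URI, local name) rather than by prefix
        System.out.println(employee.getElementsByTagNameNS(HR_NS, "name").getLength()); // 1
    }
}
```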

The following program illustrates how some of these methods are used to construct an in memory DOM tree.

package jdt;

import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class DomBuilder {
    private Document document;  // the in-memory DOM tree (also the node factory)
    private String filename;    // file the XML will be written to

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java jdt.DomBuilder output-file");
            System.exit(1);
        }
        try {
            DomBuilder db = new DomBuilder(args[0]);
            db.create();
            db.save();
        } catch (ParserConfigurationException exp) {
            exp.printStackTrace();
        }
    }

    public DomBuilder(String file) throws ParserConfigurationException {
        filename = file;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.newDocument(); // Create a new, empty XML document
    }

    public void create() {
        // The Document object acts as the factory for every node in the tree
        Element root = document.createElement("employee");
        document.appendChild(root);

        Element name = document.createElement("name");
        name.appendChild(document.createTextNode("John"));
        root.appendChild(name);

        Element dept = document.createElement("dept");
        root.appendChild(dept);
        dept.appendChild(document.createTextNode("Support"));

        Element manager = document.createElement("manager");
        manager.appendChild(document.createTextNode("Andy"));
        dept.appendChild(manager);
    }

    public void save() {
        OutputStream out = null;
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer(); // identity transform
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(document);
            // A byte stream lets the transformer control the output encoding
            out = new FileOutputStream(filename);
            StreamResult result = new StreamResult(out);
            transformer.transform(source, result);
        } catch (TransformerException te) {
            te.printStackTrace();
        } catch (IOException exp) {
            exp.printStackTrace();
        } finally {
            // Always close the file, even if the transformation failed
            if (out != null) {
                try { out.close(); } catch (IOException ignored) { }
            }
        }
    }
}

For the moment we will skip over the save method other than to say that this writes the in memory structure out to a file. We shall look at the use of Transformers more in the next section.

The result of compiling this program and running it with the following statements is the XML file displayed in Figure 5:

javac jdt/DomBuilder.java
java jdt.DomBuilder employee.xml

Figure 5: The XML file created by the DomBuilder

4   Performing XSLT in JAXP
In this final section of the column we will briefly examine the XSLT transformations supported by the javax.xml.transform package. JAXP 1.1 introduced a vendor-neutral XML document transformation API. This is very useful, as previously there was great variation in the APIs provided by XSLT processors. It also means that JAXP 1.1 is more than just a parser API.

The API for transformations in JAXP is modeled on the TrAX (Transformation API for XML) specification and may in time adopt TrAX directly. As of JAXP 1.1, XSL transformation support is provided by Apache’s Xalan implementation of XSLT 1.0. The JAXP façade to this lower-level implementation is provided by the classes and interfaces in the javax.xml.transform package and its sub-packages.

Performing XML transformations requires three basic steps:
1. Obtain a TransformerFactory
2. Retrieve a Transformer
3. Perform operations (transformations) on an XML file in line with the rules in an XSL file.
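The three steps map onto code roughly as follows. This sketch keeps the XML and the stylesheet in strings so that it is self-contained; the tiny stylesheet (which outputs the paper's heading as plain text) and the class name are illustrative and are not the paper.xsl used later.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSteps {

    public static String transform(String xml, String xsl) throws Exception {
        // 1. Obtain a TransformerFactory
        TransformerFactory factory = TransformerFactory.newInstance();
        // 2. Retrieve a Transformer compiled from the stylesheet
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        // 3. Apply the stylesheet's rules to the XML, collecting the output in a String
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/\">"
          + "<xsl:value-of select=\"/paper/heading\"/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        String xml = "<paper><heading>The main event</heading></paper>";
        System.out.println(transform(xml, xsl)); // The main event
    }
}
```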

The TransformerFactory class in the javax.xml.transform package acts as a factory for Transformers (and operates in a similar manner to the SAX and DOM equivalents). Thus to obtain a new TransformerFactory you use the static method newInstance(). You can then configure the TransformerFactory object obtained with various attributes used to set up the chosen XSL processor (by default Xalan, but others could include SAXON, Oracle's XSL processor or any TrAX-compliant processor).

You can then use the newTransformer() instance method on the TransformerFactory to create a new instance of the Transformer class to perform the actual transformation. A difference here is that the newTransformer() method can take a Source (such as a StreamSource), from which it obtains the contents of an XSL file that provides the processing rules to use. If no stylesheet is provided, the XML is passed through the transformer without modification.

A StreamSource is an object that acts as a holder for a transformation Source in the form of a stream of XML markup. It is defined in the javax.xml.transform.stream package and provides constructors that take a system ID (a URI string), a File object, an InputStream or a Reader object.

There is a corresponding StreamResult class that receives the results of the transformation and writes them to a file, an output stream or a writer object. It is this facility that was used in the save method in the previous section to write the contents of a DOM tree out to a file. In that case no modification of the XML took place; instead, an in-memory structure was "transformed" into a file.

However, the most common use of the transformation API in JAXP is to apply an XSL stylesheet to an XML document. The Transformer application presented in Figure 6 does exactly this.

Figure 6: The Transformer application

The Transformer class has a main method that drives the application and a constructor that sets up the names of the XML file, the XSL file and the output file. The translate method does the work. It first loads the specified XML document into memory (as described in the section on loading an XML document), and then creates a TransformerFactory object.

The TransformerFactory is then used to create a new Transformer object using the specified xslfile name.

A StreamResult object is then created that points to the output file via a PrintWriter wrapped around a FileOutputStream. The in-memory document is then transformed into the output file using the transform instance method of the Transformer class.
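The Figure 6 listing itself is not reproduced here, but the steps just described can be sketched as follows. This is a reconstruction from the description, not the original code: the class name TranslatorSketch is hypothetical, and a FileOutputStream is handed to StreamResult directly (instead of the PrintWriter mentioned above) so that the transformer controls the output encoding.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class TranslatorSketch {
    private final String xmlFile, xslFile, outputFile;

    public TranslatorSketch(String xmlFile, String xslFile, String outputFile) {
        this.xmlFile = xmlFile;
        this.xslFile = xslFile;
        this.outputFile = outputFile;
    }

    public void translate() throws Exception {
        // Load the XML document into an in-memory DOM tree
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(xmlFile));

        // Build a Transformer from the XSL file
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new File(xslFile)));

        // Wrap the DOM in a DOMSource and write the result to the output file
        OutputStream out = new FileOutputStream(outputFile);
        try {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: java TranslatorSketch xml-file xsl-file output-file");
            System.exit(1);
        }
        new TranslatorSketch(args[0], args[1], args[2]).translate();
    }
}
```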

Note that the DOM tree was wrapped inside a DOMSource object. This is a class in the javax.xml.transform.dom package. This class acts as a wrapper around the DOM tree that allows it to be processed by the transformer. There is also a DOMResult class that allows the result of the transformation to be another in-memory DOM tree rather than an external file. There are also SAXSource and SAXResult classes if you wish to use the SAX API with the transformer class (see javax.xml.transform.sax).
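The DOMResult option can be sketched as below: with no node supplied, the transformer creates a new Document to hold its output, which can then be navigated like any other DOM tree. The stylesheet (which wraps the paper's heading in an h1 element) and the class name are illustrative.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class DomResultDemo {

    // Applies a stylesheet and returns the output as a new in-memory DOM tree
    public static Document toDom(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        // With no node supplied, the transformer creates a Document to hold the result
        DOMResult result = new DOMResult();
        transformer.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"/\">"
          + "<h1><xsl:value-of select=\"/paper/heading\"/></h1>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        Document out = toDom("<paper><heading>The main event</heading></paper>", xsl);
        System.out.println(out.getDocumentElement().getNodeName());    // h1
        System.out.println(out.getDocumentElement().getTextContent()); // The main event
    }
}
```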
To illustrate the result of this process we will apply the Transformer application to the XML file presented in Figure 7.

Figure 7: The paper.xml XML file

To do this we will of course need an XSL file. A sample XSL stylesheet is illustrated in Figure 8.

Figure 8: The paper.xsl XSL style sheet

The paper.xsl stylesheet merely extracts the text held in the section of the paper and places horizontal lines around it.

The Transformer application can be applied to the paper.xml file using the paper.xsl stylesheet in the following manner:

java jdt.Transformer paper.xml paper.xsl paper.html

The end result of this is an HTML file, illustrated in Figure 9.

Figure 9: The paper.html file generated from the paper.xml file using the paper.xsl stylesheet




   



ADA Communications, Charwell House,
Wilsom Road, Alton, Hampshire, GU34 2PP, UK.
Tel: +44 (0)1420 594200
www.adacom.co.uk

© Copyright 2001 - 2005 by ADA Communications Ltd. All rights reserved. Statements of opinion and fact are made on the responsibility of the authors alone and do not imply an opinion on the part of ADA Communications Ltd or the editorial staff. Registered in England No. 04843018