JAVA JOLT
XML DOM API
John Hunt
1 Introduction
In this Java Jolt column we will concentrate on the DOM (Document Object
Model) API. The DOM API has been defined by the World Wide Web Consortium
(known as the W3C; see http://www.w3c.org/).
It is a standard API specified in terms of interfaces that can be
implemented in a particular language to provide a set of concrete classes
for creating, manipulating, searching and loading XML documents. Implementations
may be provided in languages such as Java, Python and C#.
Any DOM implementation takes an XML document and builds a tree-like structure
that represents that XML document (as illustrated in Figure 1). It is
then possible to get hold of any element in the tree (each is represented
by a node object) and ask that element for its attributes, its child nodes,
its value etc. The way this is done is standardised: the W3C specification
for the DOM is actually a set of interfaces specifying what the various
elements of the DOM API should do. In a particular language (for example
Java) these interfaces are then implemented to create concrete classes.
This means that moving from one DOM implementation to another should be
straightforward.
1.1 The Java org.w3c.dom package
As may be expected J2EE comes with an implementation of the W3C DOM API.
These interfaces and classes can be found in the org.w3c.dom package.

Figure 1: Creating a DOM from an XML document
There are in fact three levels of the DOM standard: DOM Level 1, DOM Level 2
and DOM Level 3 Core. Together these define interfaces with methods and
properties used to manipulate XML (see org.w3c.dom). Each level builds on
the previous one, with DOM Level 1 providing the core elements of the DOM
and Level 3 adding numerous nice-to-haves such as loading and saving.
The Java 1.5 implementation of the DOM API supports all of DOM Level 2 and
Level 3 Core. One aspect of the DOM that was not specified until Level 3 was
how an XML document should be loaded. This was added to Java in Java
SDK 1.5. Prior to this, the JAXP API could be used: its DocumentBuilderFactory
and DocumentBuilder classes together allow a parser-independent approach
to loading an XML file and creating a DOM tree. The approach is similar
to that used with the SAX API. First a DocumentBuilderFactory object is
obtained using the static newInstance() method on the class. This factory
object can then be configured as required (here "factory object" indicates
that the role of this object is to create other objects). It can then
be used to create an instance of DocumentBuilder. This is done
using the newDocumentBuilder() instance method of DocumentBuilderFactory.
Of course the JAXP API is merely a façade (or layer) on top of an
actual DOM parser (such as Crimson or Xerces). In some cases these parsers
provide additional (but non-standard) facilities for reading and writing
XML documents. In order to support older versions of Java this Java Jolt
column will stick with the DocumentBuilder approach.
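The factory and builder steps described above can be sketched as follows. This is a minimal illustration rather than part of the Domifier program: the class name JaxpLoadSketch, the sample XML and the particular factory settings are assumptions chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this illustration.
public class JaxpLoadSketch {

    // Parses XML text into a DOM Document using the JAXP factory/builder pair.
    public static Document parse(String xml) {
        try {
            // Step 1: obtain a factory and configure it as required.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(true);
            // Step 2: ask the configured factory for a builder.
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Step 3: parse; the builder hides which parser does the work.
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<paper><!-- draft --><title>XML DOM API</title></paper>");
        // The comment is dropped because of setIgnoringComments(true),
        // so the root element is left with a single child.
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Because only javax.xml.parsers and org.w3c.dom types appear in the code, the underlying parser can be swapped without changing it.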
Before Level 3, the DOM API also did not say how an in-memory XML DOM tree
should be written out to a file. In the case of Crimson an additional method,
write, has been added to the root document node object that can be used to
write the XML out to a file. This works but is not transferable between
parsers. To overcome this a generic approach has been provided by the JAXP
API which is a little more convoluted but works across parsers. We shall look
at this later.
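For readers on Java 1.5 or later, the DOM Level 3 Load and Save interfaces (package org.w3c.dom.ls) offer another parser-neutral way of writing a tree out. The following is a minimal sketch only, assuming that the DOM implementation behind the document exposes the "LS" feature; the class name Level3SaveSketch and the sample XML are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class Level3SaveSketch {

    // Serialises a DOM tree to a String using DOM Level 3 Load and Save.
    public static String serialize(Document doc) {
        // Ask the DOM implementation behind this document for its "LS" feature.
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("Load and Save is not supported here");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        return serializer.writeToString(doc);
    }

    // Helper that builds a DOM tree from XML text.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(parse("<note><to>Ann</to></note>")));
    }
}
```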

Figure 2: The node types in the DOM
There are a variety of node types in the DOM that represent different types
of element within an XML document. The class hierarchy for these is presented
in Figure 2 and the key nodes are described briefly below:
•Node - root of the node type interface hierarchy
•Document - root of the tree structure (one per XML document)
•Element - represents an XML element
•Text - represents text within an element
•Attr - represents an attribute of an element
•CDATASection - represents a CDATA section
•NodeList - collection of child nodes
•ProcessingInstruction - represents a processing instruction
–e.g. <?xml-stylesheet type="text/xml" href="prf.xsl"?>
•Comment - contains information from a comment
•DocumentFragment - cut down version of a Document node,
used for moving nodes around the tree
•DocumentType - represents (a subset of) a document type definition.
In DOM 2, DocumentType has lists of entities and notations (getEntities(),
getNotations()) and little else; DOM 3 will probably cover other aspects of DTDs
•Entity - represents an entity declaration in a DTD
•EntityReference - represents a reference to an entity in an
XML document
•Notation - represents a notation declaration in a DTD
The way in which the nodes in the DOM tree actually map onto an XML document
is illustrated by example in Figure 3.

Figure 3: Mapping an XML document to a DOM tree
There are a couple of things to note about this tree. The first is that it is
a little more complex than might at first be expected. For example, a
node representing the element <NAME>Denise Cooke</NAME> actually
contains a NodeList object which in turn contains a Text node that actually
holds the text "Denise Cooke". This is because, in general, an element may
have multiple sub elements, and all these sub elements must be
held in something. They are therefore held in a NodeList. A NodeList is
thus somewhat similar to an ArrayList. For example, if you wanted
to represent a one to many relationship between a father and two children,
you might use an ArrayList to hold the references to each of the children
in a "children" instance variable in the father object. This
is exactly what is happening within a DOM tree.
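The element, NodeList and Text relationship can be seen in a few lines of code. This is a small sketch using the NAME element from the text; the class name NodeListSketch and the helper parseRoot are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class NodeListSketch {

    // Parses XML text and returns the root element of the resulting DOM tree.
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element name = parseRoot("<NAME>Denise Cooke</NAME>");
        // The element itself has no value; its text lives in a child node.
        System.out.println(name.getNodeValue());
        NodeList children = name.getChildNodes();
        Node first = children.item(0);
        // The single child is a Text node holding the character data.
        System.out.println(first.getNodeType() == Node.TEXT_NODE);
        System.out.println(first.getNodeValue());
    }
}
```

Note that a NodeList is indexed with item(int) and getLength() rather than the get and size methods of an ArrayList.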
Secondly, note that the XML declaration <?xml version="1.0"?> (which
looks like a processing instruction) is not represented explicitly in the
tree. This is because it is considered an instruction directed at the
parser and not at the application that will use the XML!
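A short sketch of this behaviour follows. It parses a made-up document that has both an XML declaration and a genuine processing instruction (the xml-stylesheet example from the node list above), then lists the children of the Document node. The class name DeclarationSketch is hypothetical, and getXmlVersion() is a DOM Level 3 method, so it assumes Java 1.5 or later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class DeclarationSketch {

    // Helper that builds a DOM tree from XML text.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet type=\"text/xml\" href=\"prf.xsl\"?>"
                + "<PERSON><NAME>Denise Cooke</NAME></PERSON>");
        // Only the stylesheet instruction and the root element are listed;
        // the XML declaration is not a node in the tree.
        NodeList top = doc.getChildNodes();
        for (int i = 0; i < top.getLength(); i++) {
            Node n = top.item(i);
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
        // The declared version is still available as a document property.
        System.out.println(doc.getXmlVersion());
    }
}
```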
This particular example does not contain any attributes; however, they are
included in the DOM if present. Attributes are not distinct parts of the tree
(that is, they are not child nodes of an element); rather they are internal to
elements. Each element contains a NamedNodeMap (probably a hash table)
holding Attr nodes. Each Attr node contains the value of its attribute.
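As an illustration, the following sketch reads the attributes of an element through its NamedNodeMap and confirms that they do not show up as child nodes. The employee element, its attributes and the class name AttributeSketch are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class AttributeSketch {

    // Parses XML text and returns the root element of the resulting DOM tree.
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element employee = parseRoot("<employee id=\"42\" grade=\"senior\"/>");
        // Attributes are reached through the element, not through getChildNodes().
        NamedNodeMap attributes = employee.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attribute = (Attr) attributes.item(i);
            System.out.println(attribute.getName() + " = " + attribute.getValue());
        }
        // Convenience lookup by name, and proof that there are no child nodes.
        System.out.println(employee.getAttribute("id"));
        System.out.println(employee.getChildNodes().getLength());
    }
}
```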
2 Loading an XML document
This section presents a Java program that loads an XML file into a DOM
and then traverses the DOM tree printing out the elements, their values
and their attributes etc. Note that this program can either perform validation
or not. Remember that an XML file must be well-formed; whether it is also
validated against a DTD is optional. The Domifier program presented
below can either be run in validating mode or in standard mode. If it
is run in validating mode then it will check the XML file against the
DTD. Figure 4 presents an XML file that contains an internal DTD (the
Domifier program can work with external DTDs as well but we are keeping
things simple here). If the Domifier is run indicating that validation
should be performed, then the internal DTD will be used to check the XML
in the file.

Figure 4: An XML file containing an internal DTD
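To make the effect of validating mode concrete, here is a minimal sketch that checks two small documents against an internal DTD. The DTD, the documents and the class name ValidationSketch are invented for the example and are not the file from Figure 4. The sketch also installs an ErrorHandler: with the default handler, validation errors are typically only reported on the console and parsing carries on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this illustration.
public class ValidationSketch {

    // A made-up internal DTD: a paper contains exactly one introduction.
    public static final String DTD = "<!DOCTYPE paper [\n"
            + "<!ELEMENT paper (introduction)>\n"
            + "<!ELEMENT introduction (#PCDATA)>\n"
            + "]>\n";

    // Returns true if the XML parses with no validation errors.
    public static boolean isValid(String xml) {
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored here */ }
                // Validity problems arrive here; parsing continues afterwards.
                public void error(SAXParseException e) { valid[0] = false; }
                // Well-formedness problems cannot be recovered from.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false;
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid(DTD + "<paper><introduction>Hello</introduction></paper>"));
        System.out.println(isValid(DTD + "<paper><section>Oops</section></paper>"));
    }
}
```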
The Domifier application first initialises the uri string as appropriate.
The uri string is a string that represents a URI specification. In this
case the protocol used is "file:" to indicate that we are accessing
a local file. However other protocols could be used (such as http). We
also set the validating flag as appropriate.
The main method (at the bottom of the listing) then calls the load method
and the displayDOM method. The load method actually loads the document
into memory. The displayDOM method merely traverses the DOM tree and prints
out the results.
To load an XML file into the Document object the Domifier uses two classes
from the JAXP API. These are the DocumentBuilderFactory and the DocumentBuilder.
They work together in a two step process to load a document.
A DocumentBuilderFactory is an object that can supply an appropriate DocumentBuilder
on demand. An instance of the DocumentBuilderFactory is obtained using
the newInstance() method. This new instance can then be configured to
supply an appropriate DocumentBuilder (in this case we configure it to
generate a validating document builder).
In turn a DocumentBuilder is an object that can load an XML file and create
an in-memory DOM structure. This object is obtained from the DocumentBuilderFactory
instance using the newDocumentBuilder() method.
To actually load a document into memory we use the parse method on the DocumentBuilder.
This overloaded method can take a number of different parameters including
a File object, an InputStream and, as in this case, a URI (Uniform Resource
Identifier) string.
package jdt;

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class Domifier {
    private String uri;
    private Document doc;
    private boolean validating;

    public Domifier(String file) { this(file, false); }

    public Domifier(String file, boolean validating) {
        // Build a "file:" URI string from the absolute path of the file
        uri = "file:" + new File (file).getAbsolutePath();
        this.validating = validating;
    }

    // Loads the XML document into an in-memory DOM tree
    public void load() {
        try {
            DocumentBuilderFactory dbf =
                DocumentBuilderFactory.newInstance();
            dbf.setValidating(validating);
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(uri);
            // Merge adjacent text nodes below the root element
            doc.getDocumentElement().normalize();
        } catch (Exception exp) { exp.printStackTrace(); }
    }

    public void displayDOM() {
        println("Domifying " + uri);
        displayDOM(doc.getDocumentElement(), 0);
    }

    // Recursively prints the node and, for elements, all of its children
    private void displayDOM(Node node, int level) {
        println(node.getNodeName());
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(level);
            println(node.getNodeName());
            NodeList children = node.getChildNodes();
            level++;
            for (int i=0; i<children.getLength(); i++)
                displayDOM(children.item(i), level);
        } else
            System.out.println(": " + node.getNodeValue() );
    }

    //===========================================
    // Utility methods
    //===========================================
    private void println (String s) {
        System.out.println(s);
    }

    public void print(String s) {
        System.out.print(s);
    }

    // Prints one space per level followed by a "+-" marker
    public void indent(int number) {
        StringBuffer sb = new StringBuffer(number);
        for (int i=0; i < number; i++) {
            sb.append(" ");
        }
        sb.append("+-");
        print(sb.toString());
    }

    //===========================================
    // Main method
    //===========================================
    public static void main(String [] args) {
        Domifier dom;
        if (args.length < 1) {
            System.err.println (
                "Usage: java Domifier xml-file <validating>");
            System.exit (1);
        }
        if (args.length == 1)
            dom = new Domifier(args[0]);
        else
            dom = new Domifier(args[0],
                (Boolean.valueOf(
                    args[1])).booleanValue());
        dom.load();
        dom.displayDOM();
    }
}
In the above listing the recursive displayDOM method actually does the work
of traversing the DOM tree and displaying the results. It first prints the
name of the node currently being visited and then checks the type of that
node. If it is an element node (as opposed to a text node, for example) it
prints the name again, indented to show the depth in the tree, and then calls
itself recursively on each of the element's children. Otherwise it prints the
value of the node. Note that an XML element that contains some text
will be represented in the DOM as a node (of type element) with at least
one child (of type text) that represents the text from the XML. This is also
why the output below contains many #text entries with no visible value: the
line breaks and indentation between elements in the XML file are themselves
text nodes.
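The same recursive idea can be taken a little further. The sketch below is a variation rather than the Domifier itself: it skips whitespace-only text nodes and also reports attributes, and it collects the result in a String so that it is easy to inspect. The class name TreeWalkSketch, the output layout and the sample XML are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class TreeWalkSketch {

    // Returns an indented, one-node-per-line description of the tree below node.
    public static String describe(Node node) {
        StringBuilder out = new StringBuilder();
        walk(node, 0, out);
        return out.toString();
    }

    private static void walk(Node node, int level, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(level, out);
            out.append(node.getNodeName());
            // Attributes hang off the element rather than appearing as children.
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                out.append(" @").append(attribute.getNodeName())
                   .append('=').append(attribute.getNodeValue());
            }
            out.append('\n');
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), level + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Ignore text nodes that only hold line breaks and indentation.
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                indent(level, out);
                out.append('"').append(text).append("\"\n");
            }
        }
        // Other node types (comments, processing instructions) are skipped.
    }

    private static void indent(int level, StringBuilder out) {
        for (int i = 0; i < level; i++) {
            out.append("  ");
        }
    }

    // Helper that builds a DOM tree from XML text.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<paper id=\"p1\">\n  <heading>The main event</heading>\n</paper>");
        System.out.print(describe(doc.getDocumentElement()));
    }
}
```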
The effect of compiling and running this program is presented below:
C:\jdt\java\xml\domifier>javac -d . Domifier.java
C:\jdt\java\xml\domifier>java jdt.Domifier paper.xml true
Domifying file:C:\jdt\java\xml\domifier\paper.xml
paper
+-paper
#text
:
introduction
+-introduction
#text
:
The background to XML is interesting
#text
:
section
+-section
#text
:
heading
+-heading
#text
: The main event
#text
:
body
+-body
#text
:
So what is XML all about
#text
:
#text
:
section
+-section
#text
:
heading
+-heading
#text
: The Conclusion
#text
:
body
+-body
#text
:
Where is XML heading
#text
:
#text
:
3 Creating an XML document in Java
The following Java program, called DomBuilder, uses the DOM API to create
a (very) simple XML document. This document is then saved to file. The
XML file created is illustrated in Figure 5. Notice that the program takes
the name of the file to save the XML to as a command line parameter. Also
notice that the Document object (held in the document instance variable of
DomBuilder) is used to create the elements contained within the XML document.
The use of the document object to create the nodes that can then be added
to the DOM tree may at first seem confusing. However the document object
is playing the role of a factory object here. That is, it acts as a factory
that produces DOM nodes that can be used to construct the DOM tree. To
do this the Document interface provides a host of creation methods that allow
appropriate nodes to be created. For example:
–createAttribute(String name) Creates an Attr of the given name. Note that
the Attr instance can then be set on an Element using the setAttributeNode
method.
–createComment(String data) Creates a Comment node given the specified
string.
–createElement(String tagName) Creates an element of the type specified.
Note that the instance returned implements the Element interface, so attributes
can be specified directly on the returned object. In addition, if there are
known attributes with default values, Attr nodes representing them are
automatically created and attached to the element.
–createTextNode(String text) Creates a Text node given the specified
string.
–createCDATASection(String text) Creates a CDATASection node whose
value is the specified string.
Once a node has been obtained it can then be added to the document object or
any node below that using the appendChild method, defined in the Node
interface and inherited by every type of node.
–appendChild(Node newChild) Adds the node newChild to the end of the list
of children of this node. If the newChild is already in the tree, it is
first removed.
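The creation methods and appendChild can be tried out in isolation. The following sketch builds a tiny tree and then appends a node that is already in the tree, which moves it rather than copying it. The element names and the class name CreateNodesSketch are invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
public class CreateNodesSketch {

    // Builds a small in-memory document using the Document as a node factory.
    public static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a document", e);
        }
        Element root = doc.createElement("employee");
        doc.appendChild(root);

        // An attribute created explicitly and then attached to the element.
        Attr grade = doc.createAttribute("grade");
        grade.setValue("senior");
        root.setAttributeNode(grade);

        root.appendChild(doc.createComment("created in memory"));

        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("John"));
        root.appendChild(name);

        Element dept = doc.createElement("dept");
        dept.appendChild(doc.createTextNode("Support"));
        root.appendChild(dept);

        // name is already a child of root, so this moves it to the end.
        root.appendChild(name);
        return doc;
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        System.out.println(root.getChildNodes().getLength());
        System.out.println(root.getLastChild().getNodeName());
    }
}
```

After the final appendChild call the root still has three children (the comment, dept and name), with name now last.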
If you wish to work with namespaces (namespaces are a little bit like Java
packages in that they provide scoping of element definitions) then
there are versions of the create methods that take a namespace parameter.
These methods all include NS (for namespace) in their name. For example:
–createElementNS(String namespaceURI, String qualifiedName) Creates an
element of the given qualified name and namespace URI.
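A brief sketch of the difference between the two styles of creation method follows. The namespace URI http://example.com/hr, the hr prefix and the class name NamespaceSketch are placeholders chosen for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
public class NamespaceSketch {

    // Creates an empty document to act as the node factory.
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();

        // Namespace-aware creation: the qualified name carries a prefix.
        Element qualified = doc.createElementNS("http://example.com/hr", "hr:employee");
        System.out.println(qualified.getNamespaceURI());
        System.out.println(qualified.getPrefix());
        System.out.println(qualified.getLocalName());

        // Plain creation: the element has no namespace information at all.
        Element plain = doc.createElement("employee");
        System.out.println(plain.getNamespaceURI());
        System.out.println(plain.getLocalName());
    }
}
```

Elements made with the plain createElement method report null for both the namespace URI and the local name, so it is usually best not to mix the two styles within one tree.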
The following program illustrates how some of these methods are used to construct
an in-memory DOM tree.
package jdt;

import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;

public class DomBuilder {
    private Document document;
    private String filename;

    public static void main (String [] args) {
        if (args.length < 1) {
            System.err.println("Usage: java jdt.DomBuilder xml-file");
            System.exit(1);
        }
        try {
            DomBuilder db = new DomBuilder(args[0]);
            db.create();
            db.save();
        } catch (ParserConfigurationException exp) {
            exp.printStackTrace();
        }
    }

    public DomBuilder(String file) throws ParserConfigurationException {
        filename = file;
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.newDocument(); // Create a new XML document
    }

    // Builds the DOM tree, using the document as a factory for nodes
    public void create() {
        Element root = document.createElement("employee");
        document.appendChild (root);
        Element name = document.createElement("name");
        name.appendChild(document.createTextNode("John"));
        root.appendChild(name);
        Element dept = document.createElement("dept");
        root.appendChild(dept);
        dept.appendChild(document.createTextNode("Support"));
        Element manager = document.createElement("manager");
        manager.appendChild(document.createTextNode("Andy"));
        dept.appendChild(manager);
    }

    // Writes the DOM tree to the file using an identity transformation
    public void save() {
        PrintWriter pw = null;
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            DOMSource source = new DOMSource(document);
            transformer.setOutputProperty(OutputKeys.INDENT,
                "yes");
            pw = new PrintWriter(new FileOutputStream (filename));
            StreamResult result = new StreamResult(pw);
            transformer.transform(source, result);
        } catch (TransformerException te) {
            te.printStackTrace();
        } catch (IOException exp) {
            exp.printStackTrace();
        } finally {
            // Close the writer so that the output is flushed to the file
            if (pw != null) pw.close();
        }
    }
}
For the moment we will skip over the save method other than to say that this
writes the in-memory structure out to a file. We shall look at the use
of Transformers more in the next section.
The result of compiling this program and running it with the following statements
is the XML file displayed in Figure 5:
javac jdt/DomBuilder.java
java jdt.DomBuilder employee.xml

Figure 5: The XML file created by the DomBuilder
4 Performing XSLT in JAXP
In this final section of this column we will briefly examine the XSLT
transformations supported by the javax.xml.transform package. JAXP 1.1
introduced a vendor neutral XML document transformation API. This is very
useful as previously there was great variation in the APIs provided by
XSLT processors. This also means that JAXP 1.1 is more than a parser
API.
The API for transformations in JAXP is modeled after the TrAX (Transformation
API for XML) specification and may in time adopt TrAX directly. As
of JAXP 1.1, the XSL transformation support is provided by Apache’s
Xalan implementation for XSLT 1.0. The actual JAXP façade to this
lower level implementation is provided by the classes and interfaces in
the javax.xml.transform package and its sub packages.
Performing XML transformations requires 3 basic steps:
1. Obtain a Transformer factory
2. Retrieve a Transformer
3. Perform operations (transformations) on an XML file in line with
the rules in an XSL file.
The TransformerFactory in the javax.xml.transform package acts as a factory
for Transformers (and operates in a similar manner to the SAX and DOM
equivalents). Thus to obtain a new TransformerFactory you use the static
method newInstance(). You can then configure the TransformerFactory object
obtained with various attributes used to set up the XSL processor chosen
(by default Xalan, but others could include SAXON, Oracle's XSL processor
or any TrAX-compliant processor).
You can then use the newTransformer() instance method on the TransformerFactory
to create a new instance of the Transformer class to perform the actual
transformation. A difference here is that the newTransformer() method
can take a StreamSource, which allows it to obtain the contents of an XSL
file that will provide the processing rules to use (if one is not provided
then the XML is passed through the transformer without modification).
A StreamSource is an object that can act as a holder for a transformation
Source in the form of a stream of XML markup. It is a class defined in
the javax.xml.transform.stream package. It provides constructors that
can take a system identifier (a URI string such as a file URL), a File
object, an InputStream or a Reader.
There is a corresponding StreamResult class that can receive the results of
the transformation. This class can write the new XML to a file, to an
output stream or to a writer object. It is this facility
that was used in the save method in the last section to write the contents
of a DOM tree out to a file. In that case no modification of the XML took
place; instead an in-memory structure was "transformed" into
a file.
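The same "identity" transformation can just as easily target a Writer. The sketch below, which is an illustration and not part of the DomBuilder program, serialises a small DOM tree into a String by wrapping a StringWriter in a StreamResult. The class name IdentityTransformSketch and the sample tree are assumptions made for the example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
public class IdentityTransformSketch {

    // Serialises a DOM tree to a String with a stylesheet-free Transformer.
    public static String toXml(Document doc) {
        try {
            // No stylesheet is supplied, so the XML passes through unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialise the document", e);
        }
    }

    // Builds a tiny sample tree: <employee><name>John</name></employee>
    public static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("employee");
            doc.appendChild(root);
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("John"));
            root.appendChild(name);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(sample()));
    }
}
```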
However the most common use of the Transform API in the JAXP is to apply some
XSL transformation file to an XML document. In the Transformer application
presented in Figure 6 we do exactly this.

Figure 6: The Transformer application
The Transformer class has a main method that drives the application and
a constructor that sets up the name of the XML file, the XSL file and
the output file. The translate method is the method that does the work.
This method first loads the specified XML document into memory (as described
in the section relating to the DOM). It then creates a TransformerFactory
object.
The TransformerFactory is then used to create a new Transformer object using
the specified XSL file name.
A StreamResult object is then created that points to the output file via
a PrintWriter wrapped around a FileOutputStream. The in-memory document
is then transformed into the output file using the transform instance
method of the Transformer class.
Note that the DOM tree was wrapped inside a DOMSource object. This is a class
in the javax.xml.transform.dom package. This class acts as a wrapper around
the DOM tree that allows it to be processed by the transformer. There
is also a DOMResult class that allows the result of the transformation
to be another in-memory DOM tree rather than an external file. There are
also SAXSource and SAXResult classes if you wish to use the SAX API with
the transformer class (see javax.xml.transform.sax).
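The steps just described can be sketched in a self-contained form. The following is not the listing from Figure 6: the class name StylesheetSketch, the inline stylesheet and the inline XML are all invented for the example, and the result is written to a String rather than to a file so that the sketch needs no external files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class StylesheetSketch {

    // A made-up stylesheet: wraps the body of each section in horizontal rules.
    public static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/paper'>"
        + "<html><body><xsl:for-each select='section'>"
        + "<hr/><p><xsl:value-of select='body'/></p><hr/>"
        + "</xsl:for-each></body></html>"
        + "</xsl:template></xsl:stylesheet>";

    // A made-up input document.
    public static final String XML =
          "<paper><section><heading>The main event</heading>"
        + "<body>So what is XML all about</body></section></paper>";

    // Loads the XML into a DOM tree and applies the stylesheet to it.
    public static String apply(String xml, String xsl) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // recommended when feeding a DOM to XSLT
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Step 1: obtain a factory. Step 2: retrieve a Transformer for the XSL.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xsl)));

            // Step 3: transform the wrapped DOM tree into the result.
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(XML, XSL));
    }
}
```

Swapping the StreamResult for a DOMResult would leave the transformed output as another in-memory tree instead of text.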
To illustrate the result of this process we will apply the Transformer
application to the XML file presented in Figure 7.

Figure 7: The paper.xml XML file
To do this we will of course need an XSL file. A sample XSL stylesheet is
illustrated in Figure 8.

Figure 8: The paper.xsl XSL style sheet
The paper.xsl stylesheet merely extracts the text held in the section of the
paper and places horizontal lines around it.
The Transformer application can be applied to the paper.xml file using the
paper.xsl stylesheet in the following manner:
java jdt.Transformer paper.xml paper.xsl paper.html
The end result of this is an HTML file illustrated in Figure 9.

Figure 9: The paper.html file generated from the paper.xml file using the
paper.xsl stylesheet