.NET XML Best Practices
Part III: Writing XML Documents
by Aaron Skonnard
In the first part of this series, Choosing
an XML API, I covered the numerous XML-related classes in .NET for
reading and writing XML documents and discussed the characteristics of
each. In the second part, Reading
XML Documents, I walked you through several practical examples of
reading XML documents to help you better understand the trade-offs. Although
reading XML involves a clear trade-off between performance/efficiency
and productivity, the trade-off is less significant for writing XML documents:
there are fewer options to consider, and all of them are quite easy
to use.
As you learned in the first part, you'll write your XML-producing code against
one of two APIs: XmlTextWriter or the DOM.
The following are some key points to consider when deciding which to use:
Use XmlTextWriter if:
- Performance and efficiency are your highest priority and…
- You don't need to work with the document in memory
Use the DOM if:
- You need to manipulate the document in-memory before writing it to
XML 1.0 (e.g., using DOM, XPath, XSLT) or…
- You wish to validate the document before writing it to XML 1.0
- The document isn't very large and you're already used to the DOM
Surprisingly, many developers find XmlTextWriter more
intuitive than the DOM. Since
there isn't a significant productivity trade-off and it's the more efficient
solution, it's usually the best choice.
XmlTextWriter
As with XmlTextReader, XmlTextWriter
is as close to the XML 1.0 byte stream as you can get in .NET. You can
instantiate an XmlTextWriter over a Stream or a
TextWriter. Then, you can begin writing XML into the object by making
one of several available method calls that represent different parts of
the logical XML document, as modeled by the XmlWriter
abstract base class.
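As a minimal sketch of the two construction styles, the following program layers one XmlTextWriter over a TextWriter and another over a Stream. The class and method names (WriterSetup, OverTextWriter, OverStream) are illustrative assumptions and are not part of the article's sample code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class WriterSetup
{
    // Writes one empty element through a writer layered over a TextWriter.
    public static string OverTextWriter()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("name");
        tw.WriteEndElement();
        tw.Close();              // also closes the underlying StringWriter
        return sw.ToString();
    }

    // Writes the same element over a Stream with an explicit encoding.
    public static byte[] OverStream()
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter tw = new XmlTextWriter(ms, Encoding.UTF8);
        tw.WriteStartElement("name");
        tw.WriteEndElement();
        tw.Flush();              // push buffered characters into the stream
        byte[] bytes = ms.ToArray();
        tw.Close();
        return bytes;
    }

    static void Main()
    {
        Console.WriteLine(OverTextWriter());   // <name />
        Console.WriteLine(OverStream().Length > 0);
    }
}
```

In both cases the writer buffers output, so it should be flushed or closed before the result is read.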
How XmlTextWriter Works
Once you've instantiated an XmlTextWriter, you
can begin writing nodes into the document's logical stream. This is much
like XmlTextReader except instead of pulling the
nodes from the stream in one at a time, now you're pushing the nodes into
the stream one at a time. Since this approach also requires flattening
the XML tree structure to a linear stream of nodes, end element method
calls must be made at the appropriate time to enable proper interpretation
of the document structure. Here is a simple example:
tw.WriteStartDocument();
tw.WriteComment(" Aaron Skonnard's name structure ");
tw.WriteStartElement("x", "name", "http://example.org/name");
tw.WriteStartElement("first");
tw.WriteString("Aaron");
tw.WriteEndElement();
tw.WriteStartElement("last");
tw.WriteString("Skonnard");
tw.WriteEndElement();
tw.WriteEndElement();
tw.WriteEndDocument();
This sequence of method calls generates the following XML 1.0 byte stream
(shown without the XML declaration that WriteStartDocument emits, and
with indentation added for human readability):
<!-- Aaron Skonnard's name structure -->
<x:name xmlns:x='http://example.org/name'>
<first>Aaron</first>
<last>Skonnard</last>
</x:name>
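To illustrate what flattening a tree into a linear stream of nodes means in practice, here is a small sketch that walks an in-memory tree recursively and pairs every WriteStartElement with a WriteEndElement. The Item class and the TreeWriterDemo helper are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical tree node used only for this illustration.
class Item
{
    public string Name;
    public Item[] Children;

    public Item(string name, params Item[] children)
    {
        Name = name;
        Children = children;
    }
}

class TreeWriterDemo
{
    // Pushes one node into the stream: start tag, children, end tag.
    static void WriteItem(XmlWriter w, Item item)
    {
        w.WriteStartElement(item.Name);
        foreach (Item child in item.Children)
        {
            WriteItem(w, child);
        }
        w.WriteEndElement();   // closes the element opened above
    }

    public static string Serialize(Item root)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        WriteItem(tw, root);
        tw.Close();
        return sw.ToString();
    }

    static void Main()
    {
        Item root = new Item("a", new Item("b"), new Item("c", new Item("d")));
        Console.WriteLine(Serialize(root));   // <a><b /><c><d /></c></a>
    }
}
```

The recursion depth mirrors the element nesting, which is why each end-element call lands at the right point in the stream.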
After calling WriteStartElement to write out
the start of an element, you can write child text nodes through calls
to WriteString. If you wish to write out a CLR
value as an XML string, you should use XmlConvert
to perform the appropriate translation between the CLR
and XML Schema type systems as shown here:
double age = 33.3;
...
tw.WriteStartElement("age");
tw.WriteString(XmlConvert.ToString(age));
tw.WriteEndElement();
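The reason for going through XmlConvert rather than ToString can be seen in a short sketch: XmlConvert produces the XML Schema lexical forms regardless of the current culture, while the default CLR formatting differs for some types. The ConvertDemo class name is an assumption made for this example.

```csharp
using System;
using System.Xml;

class ConvertDemo
{
    static void Main()
    {
        // XmlConvert emits XML Schema lexical forms, independent of the
        // current thread culture.
        Console.WriteLine(XmlConvert.ToString(true));                    // true
        Console.WriteLine(XmlConvert.ToString(double.PositiveInfinity)); // INF
        Console.WriteLine(XmlConvert.ToString(33.3));                    // 33.3

        // The default CLR formatting is not the same for every type.
        Console.WriteLine(true.ToString());                              // True

        // The reverse direction, for use on the reading side.
        double age = XmlConvert.ToDouble("33.3");
        Console.WriteLine(age == 33.3);                                  // True
    }
}
```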
You can write out attributes by calling WriteAttributeString
immediately after calling WriteStartElement.
You should write out all attributes before writing out child content.
The following example illustrates how to write out a few attributes on
the name element:
tw.WriteStartDocument();
tw.WriteComment(" Aaron Skonnard's name structure ");
tw.WriteStartElement("x", "name", "http://example.org/name");
tw.WriteAttributeString("", "ssnum", "", "555-55-1212");
tw.WriteAttributeString("", "dl", "", "43532325");
tw.WriteStartElement("first");
tw.WriteString("Aaron");
tw.WriteEndElement();
tw.WriteStartElement("last");
tw.WriteString("Skonnard");
tw.WriteEndElement();
tw.WriteEndElement();
tw.WriteEndDocument();
This sequence of method calls generates the following XML 1.0 document:
<!-- Aaron Skonnard's name structure -->
<x:name ssnum="555-55-1212" dl="43532325"
xmlns:x="http://example.org/name">
<first>Aaron</first>
<last>Skonnard</last>
</x:name>
Again, if you need to write a CLR value into an attribute value, you
should use XmlConvert as shown here:
long key = 43532325;
tw.WriteAttributeString("", "dl", "",
XmlConvert.ToString(key));
Hence, the model for working with XmlTextWriter
consists of generating a logical stream of nodes through a sequence of
method calls. This model is very similar to the model for generating XML
documents with the SAX API.
Simplifying Text-Only Elements
Every time you write out a text-only element, you have to make the following
three method calls: WriteStartElement, WriteString,
and WriteEndElement, as shown here:
tw.WriteStartElement("first");
tw.WriteString("Aaron");
tw.WriteEndElement();
To simplify the process of working with text-only elements, XmlTextWriter
also provides a WriteElementString method, which
encapsulates these three calls:
tw.WriteElementString("first", "Aaron");
This is the writing-side counterpart of the simplification that ReadElementString
provides on the reading side. Take a look at how WriteElementString
can simplify the original writing example shown above:
tw.WriteStartDocument();
tw.WriteComment(" Aaron Skonnard's name structure ");
tw.WriteStartElement("x", "name", "http://example.org/name");
tw.WriteElementString("first", "Aaron");
tw.WriteElementString("last", "Skonnard");
tw.WriteEndElement();
tw.WriteEndDocument();
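A quick way to convince yourself that the two forms are interchangeable is to write the same element both ways and compare the results. The sketch below does this against an in-memory StringWriter; the ElementStringDemo class and its method names are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;

class ElementStringDemo
{
    // The long form: three separate calls.
    public static string ThreeCalls()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("first");
        tw.WriteString("Aaron");
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    // The short form: one call that wraps the three above.
    public static string OneCall()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteElementString("first", "Aaron");
        tw.Close();
        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ThreeCalls());               // <first>Aaron</first>
        Console.WriteLine(ThreeCalls() == OneCall());  // True
    }
}
```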
Writing Qualified-Name Values
Since more and more XML developers are working with XML
Schema and namespace-aware documents, XmlTextWriter
also contains some enhancements for working with namespace declarations
and qualified names (QNames).
One such enhancement has to do with looking up the namespace prefix currently
in-scope for a given namespace name. For example, suppose that you need
to generate a title element that contains a QName
value like this:
<title xmlns:x='http://example.org/name'>x:Mr</title>
The problem with generating this XML is that you need to make sure that
you use the correct prefix in the text node. Hence, you can look up the
namespace prefix currently in-scope for a given namespace as shown here:
tw.WriteString(
tw.LookupPrefix("http://example.org/name") + ":" + "Mr");
There's also another method called WriteQualifiedName
that simplifies the process of writing QNames
into element content. This method takes a local name and a namespace name
and generates the correct prefixed name in the XML 1.0 document (according
to the in-scope namespace declarations):
tw.WriteStartElement("title");
tw.WriteQualifiedName("Mr", "http://example.org/name");
tw.WriteEndElement();
Although XmlTextWriter does a good job of hiding
all details related to producing namespace declarations, these additional
helper methods are necessary to query in-scope namespace declarations.
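Both helpers only work when the namespace is already declared in scope at the point of the call. The following sketch, which assumes the declaration is placed on the title element itself, shows one way to produce the title element shown earlier. The QNameDemo class and WriteTitle method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

class QNameDemo
{
    const string NameNs = "http://example.org/name";

    public static string WriteTitle()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("title");
        // Declare the x prefix on title so the namespace is in scope.
        tw.WriteAttributeString("xmlns", "x", null, NameNs);
        // Writes the in-scope prefix plus the local name as element content.
        tw.WriteQualifiedName("Mr", NameNs);
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(WriteTitle());
        // <title xmlns:x="http://example.org/name">x:Mr</title>
    }
}
```

If the title element were nested inside x:name instead, the explicit declaration would be unnecessary because the prefix would already be in scope.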
Controlling Other Serialization Details
Since XmlTextWriter is just a thin veneer on top
of the resulting XML 1.0 byte stream, the class provides several configurable
properties related to the serialization process such as:
Character encoding
Indentation (pretty-printing)
Character to use for indentation (space vs. tab, etc.)
Character to use for quoting attribute values (' vs. ")
The character encoding you wish to use is supplied when you instantiate
the XmlTextWriter over a file name or a Stream (or, if you layer it over
a TextWriter, when you instantiate that TextWriter
object). The rest are configurable properties of XmlTextWriter.
The following example illustrates how they may be used:
XmlTextWriter tw = new XmlTextWriter ("aaron1.xml",
Encoding.UTF8);
tw.Formatting = Formatting.Indented;
tw.Indentation = 5;
tw.IndentChar = ' ';
tw.QuoteChar = '\'';
tw.WriteStartDocument();
... // omitted for brevity
This example produces an XML document that uses a UTF-8
encoding and contains the following XML declaration:
<?xml version='1.0' encoding='utf-8'?>
It also indents each nested element by five space characters per level and
encloses all attribute values and namespace declarations in single quotes.
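To see the formatting properties at work without creating a file, the settings can be applied to a writer layered over a StringWriter. This is only a sketch: it skips the XML declaration and the encoding (which applies to file and Stream output), and the FormattingDemo class name is an assumption.

```csharp
using System;
using System.IO;
using System.Xml;

class FormattingDemo
{
    public static string Write()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);

        // Serialization settings must be in place before writing begins.
        tw.Formatting = Formatting.Indented;
        tw.Indentation = 5;
        tw.IndentChar = ' ';
        tw.QuoteChar = '\'';

        tw.WriteStartElement("x", "name", "http://example.org/name");
        tw.WriteAttributeString("ssnum", "555-55-1212");
        tw.WriteElementString("first", "Aaron");
        tw.WriteElementString("last", "Skonnard");
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    static void Main()
    {
        // Each child element appears on its own line, indented five spaces,
        // and the ssnum attribute value is wrapped in single quotes.
        Console.WriteLine(Write());
    }
}
```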
Building Documents with the DOM
The DOM is what most XML developers use to generate
XML 1.0 documents today. As you learned in the first two parts of this
series, the DOM is intuitive and easy to use but it comes with overhead.
The same is true for generating XML documents with the DOM. In my opinion,
using the DOM to generate XML documents is no easier than using XmlTextWriter
and in many cases it's a bit more cumbersome.
The following code fragment illustrates how to build the first XML document
shown above in the XmlTextWriter examples:
XmlDocument doc = new XmlDocument();
XmlNode comment, name, first, last;
comment=doc.CreateComment("Aaron Skonnard's name structure");
doc.AppendChild(comment);
name = doc.CreateElement("x", "name",
"http://example.org/name");
doc.AppendChild(name);
first = doc.CreateElement("first");
first.InnerText = "Aaron";
name.AppendChild(first);
last = doc.CreateElement("last");
last.InnerText = "Skonnard";
name.AppendChild(last);
At this point, the logical XML document is loaded into memory as a tree
of objects. You can serialize the tree out as a string of XML 1.0 through
the XmlNode InnerXml
property as shown here:
Console.WriteLine(doc.InnerXml);
XmlDocument also provides a Save
method that you can use to serialize the tree out to a Stream,
a TextWriter, or an XmlTextWriter.
The following illustrates how to save the tree to the Console:
doc.Save(Console.Out);
If you want more control over the serialization details, you can instantiate
an XmlTextWriter and specify those details prior
to calling Save as shown here:
XmlTextWriter tw = new XmlTextWriter("aaron5.xml",
Encoding.UTF8);
tw.Formatting = Formatting.Indented;
doc.Save(tw);
tw.Close(); // release the file once the tree has been written
With the DOM, you can accomplish all of the same things that I illustrated
with XmlTextWriter. For example, you can simplify
dealing with text-only elements through XmlNode's
InnerText property (see example above). You can
query the in-scope namespace declarations through XmlNode's
GetNamespaceOfPrefix and GetPrefixOfNamespace
methods. And as I just demonstrated, you can control the serialization
details through the XmlDocument Save mechanism.
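The DOM counterpart of the LookupPrefix example can be sketched as follows: the prefix bound to the namespace is queried from the tree and used to build the QName text of a title element. The DomNamespaceDemo class and Build method are hypothetical, and nesting title under x:name is an assumption made so that the prefix is in scope.

```csharp
using System;
using System.Xml;

class DomNamespaceDemo
{
    const string NameNs = "http://example.org/name";

    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement name = doc.CreateElement("x", "name", NameNs);
        doc.AppendChild(name);

        XmlElement title = doc.CreateElement("title");
        name.AppendChild(title);

        // Ask the tree which prefix is bound to the namespace at this node,
        // then use it to build the QName value.
        string prefix = title.GetPrefixOfNamespace(NameNs);
        title.InnerText = prefix + ":Mr";
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = Build();
        XmlNode title = doc.DocumentElement.FirstChild;
        Console.WriteLine(title.InnerText);
        Console.WriteLine(title.GetNamespaceOfPrefix("x"));
        Console.WriteLine(doc.OuterXml);
    }
}
```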
Because most of you are already familiar with the DOM API,
I won't bore you with the remaining details.
Since DOM serialization is actually performed through the XmlTextWriter
layer, the only real difference is that the whole document is built in memory ahead
of time. Although there are some situations when this is necessary, it's
generally better practice to generate XML documents via XmlTextWriter
directly.
Conclusion
Deciding which API to use for writing XML documents in .NET isn't as
complicated as deciding which to use for reading. In general, unless you
absolutely need to have the document in-memory prior to serialization,
you're better off using XmlTextWriter directly.
See the guidelines at the beginning of this piece for more on the determining
factors.
Sample Code
Download the sample code, writingxml.zip, at the bottom of this page.
About The Author
Aaron Skonnard is a consultant, instructor, and author specializing
in Windows technologies and Web applications. Aaron teaches courses for
DevelopMentor and is a columnist for Microsoft Internet Developer. He
is the author of Essential WinInet, and co-author of Essential XML: Beyond
MarkUp (Addison Wesley Longman). Contact him at http://staff.develop.com/aarons.