.NET XML Best Practices

Support Knowledge Base, Article 675

Product

General

Title

.NET XML Best Practices - Writing XML Documents

Solution

Part III: Writing XML Documents

by Aaron Skonnard

In the first part of this series, Choosing an XML API, I covered the numerous XML-related classes in .NET for reading and writing XML documents and discussed the characteristics of each. In the second part, Reading XML Documents, I walked you through several practical examples of reading XML documents to help you better understand the trade-offs. For reading XML there is a clear trade-off between performance/efficiency and productivity. The trade-off isn't as significant for writing XML documents, because there are fewer options to consider and all of them are quite easy to use.

As you learned earlier in this series, you'll write your XML-producing code against one of two APIs: XmlTextWriter or the DOM. The following are some key points to consider when deciding which to use:

Use XmlTextWriter if:

Performance and efficiency are your highest priority and…
You don't need to work with the document in memory

Use the DOM if:

You need to manipulate the document in-memory before writing it to XML 1.0 (e.g., using DOM, XPath, XSLT) or…
You wish to validate the document before writing it to XML 1.0
The document isn't very large and you're already used to the DOM

Surprisingly, many developers find XmlTextWriter more intuitive than the DOM. Since there isn't a significant productivity trade-off and it's the most efficient solution, it's usually the best choice.

XmlTextWriter

As with XmlTextReader, XmlTextWriter is as close to the XML 1.0 byte stream as you can get in .NET. You can instantiate an XmlTextWriter over a Stream or a TextWriter. Then, you can begin writing XML into the object by making one of several available method calls that represent different parts of the logical XML document, as modeled by the XmlWriter abstract base class.
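As a minimal sketch of the two construction options just mentioned, the following program layers an XmlTextWriter over a TextWriter and over a Stream. The class and method names (WriterSetupSketch, OverTextWriter, OverStream) and the greeting element are illustrative assumptions, and in-memory targets are used only so the result can be printed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class; the names are assumptions made for illustration.
public static class WriterSetupSketch
{
    // Layers the writer over a TextWriter (here, an in-memory StringWriter).
    public static string OverTextWriter()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteElementString("greeting", "hello");
        tw.Close(); // flushes the writer and closes the StringWriter
        return sw.ToString();
    }

    // Layers the writer over a Stream, encoding the characters as UTF-8.
    // UTF8Encoding(false) suppresses the byte order mark.
    public static string OverStream()
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter tw = new XmlTextWriter(ms, new UTF8Encoding(false));
        tw.WriteElementString("greeting", "hello");
        tw.Close(); // flushes the buffered bytes into the stream
        return Encoding.UTF8.GetString(ms.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(OverTextWriter());
        Console.WriteLine(OverStream());
    }
}
```

Both methods should produce the same markup; only the destination and the point at which the encoding is chosen differ.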

How XmlTextWriter Works

Once you've instantiated an XmlTextWriter, you can begin writing nodes into the document's logical stream. This is much like XmlTextReader, except that instead of pulling the nodes from the stream one at a time, you're now pushing the nodes into the stream one at a time. Since this approach also requires flattening the XML tree structure to a linear stream of nodes, end element method calls must be made at the appropriate time to enable proper interpretation of the document structure. Here is a simple example:

    tw.WriteStartDocument();
    tw.WriteComment(" Aaron Skonnard's name structure ");
    tw.WriteStartElement("x", "name", "http://example.org/name");
    tw.WriteStartElement("first");
    tw.WriteString("Aaron");
    tw.WriteEndElement();
    tw.WriteStartElement("last");
    tw.WriteString("Skonnard");
    tw.WriteEndElement();
    tw.WriteEndElement();
    tw.WriteEndDocument();

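This sequence of method calls generates the following XML 1.0 byte stream (shown with indentation added for human readability and with the XML declaration written by WriteStartDocument omitted):

    <!-- Aaron Skonnard's name structure -->
    <x:name xmlns:x='http://example.org/name'>
      <first>Aaron</first>
      <last>Skonnard</last>
    </x:name>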
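To try the sequence above end to end, a sketch along the following lines may help. It assumes an in-memory StringWriter as the destination, and the class name FirstExampleSketch is hypothetical. Because no formatting is requested, the markup comes out on a single line, and attribute values use the writer's default double quotes.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper around the article's first example.
public static class FirstExampleSketch
{
    public static string Build()
    {
        StringWriter sw = new StringWriter(); // assumption: in-memory target
        XmlTextWriter tw = new XmlTextWriter(sw);

        tw.WriteStartDocument();                   // emits the XML declaration
        tw.WriteComment(" Aaron Skonnard's name structure ");
        tw.WriteStartElement("x", "name", "http://example.org/name");
        tw.WriteStartElement("first");             // open <first>
        tw.WriteString("Aaron");
        tw.WriteEndElement();                      // close </first>
        tw.WriteStartElement("last");              // open <last>
        tw.WriteString("Skonnard");
        tw.WriteEndElement();                      // close </last>
        tw.WriteEndElement();                      // close </x:name>
        tw.WriteEndDocument();

        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

Each WriteStartElement call is paired with exactly one WriteEndElement call, which is how the tree structure survives being flattened into a linear sequence of calls.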

After calling WriteStartElement to write out the start of an element, you can write child text nodes through calls to WriteString. If you wish to write out a CLR value as an XML string, you should use XmlConvert to perform the appropriate translation between the CLR and XML Schema type systems as shown here:

    double age = 33.3;
    ...
    tw.WriteStartElement("age");
    tw.WriteString(XmlConvert.ToString(age));
    tw.WriteEndElement();
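The reason for going through XmlConvert is that the CLR's own ToString methods do not always produce the lexical forms that XML Schema defines. The sketch below compares a few values; the class name XmlConvertSketch and the AgeElement helper are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class comparing CLR and XML Schema string forms.
public static class XmlConvertSketch
{
    // Writes <age>...</age> using the XML Schema lexical form of the value.
    public static string AgeElement(double age)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("age");
        tw.WriteString(XmlConvert.ToString(age));
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        // xs:boolean is lower case, while bool.ToString() capitalizes.
        Console.WriteLine(XmlConvert.ToString(true)); // true
        Console.WriteLine(true.ToString());           // True

        // xs:double spells positive infinity as INF.
        Console.WriteLine(XmlConvert.ToString(double.PositiveInfinity)); // INF

        // XmlConvert also converts in the other direction.
        Console.WriteLine(XmlConvert.ToDouble("33.3") == 33.3);

        Console.WriteLine(AgeElement(33.3));
    }
}
```

XmlConvert is also independent of the current culture, so a decimal value is written with a period even on systems whose regional settings use a comma.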

You can write out attributes by calling WriteAttributeString immediately after calling WriteStartElement. You should write out all attributes before writing out child content. The following example illustrates how to write out a few attributes on the name element:

    tw.WriteStartDocument();
    tw.WriteComment(" Aaron Skonnard's name structure ");
    tw.WriteStartElement("x", "name", "http://example.org/name");
    tw.WriteAttributeString("", "ssnum", "", "555-55-1212");
    tw.WriteAttributeString("", "dl", "", "43532325");
    tw.WriteStartElement("first");
    tw.WriteString("Aaron");
    tw.WriteEndElement();
    tw.WriteStartElement("last");
    tw.WriteString("Skonnard");
    tw.WriteEndElement();
    tw.WriteEndElement();
    tw.WriteEndDocument();

This sequence of method calls generates the following XML 1.0 document:

    <x:name ssnum="555-55-1212" dl="43532325" xmlns:x="http://example.org/name">
      <first>Aaron</first>
      <last>Skonnard</last>
    </x:name>

Again, if you need to write a CLR value into an attribute value, you should use XmlConvert as shown here:

    long key = 43532325;
    tw.WriteAttributeString("", "dl", "", XmlConvert.ToString(key));

Hence, the model for working with XmlTextWriter consists of generating a logical stream of nodes through a sequence of method calls. This model is very similar to the model for generating XML documents with the SAX API.
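The attribute calls described above can be combined into one runnable sketch. The class name AttributeSketch and the use of a StringWriter are assumptions; the attribute names and values follow the article's example, with the dl value passed in as a CLR long and converted through XmlConvert.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper around the article's attribute example.
public static class AttributeSketch
{
    public static string Build(long key)
    {
        StringWriter sw = new StringWriter(); // assumption: in-memory target
        XmlTextWriter tw = new XmlTextWriter(sw);

        tw.WriteStartDocument();
        tw.WriteStartElement("x", "name", "http://example.org/name");

        // Attributes are written right after the start element call,
        // before any child content is pushed into the stream.
        tw.WriteAttributeString("", "ssnum", "", "555-55-1212");
        tw.WriteAttributeString("", "dl", "", XmlConvert.ToString(key));

        tw.WriteStartElement("first");
        tw.WriteString("Aaron");
        tw.WriteEndElement();
        tw.WriteStartElement("last");
        tw.WriteString("Skonnard");
        tw.WriteEndElement();

        tw.WriteEndElement();
        tw.WriteEndDocument();
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build(43532325));
    }
}
```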

Simplifying Text-Only Elements

Every time you write out a text-only element, you have to make three method calls: WriteStartElement, WriteString, and WriteEndElement, as shown here:

    tw.WriteStartElement("first");
    tw.WriteString("Aaron");
    tw.WriteEndElement();

To simplify the process of working with text-only elements, XmlTextWriter also provides a WriteElementString method, which encapsulates these three calls:

    tw.WriteElementString("first", "Aaron");

This is equivalent to the simplification produced by ReadElementString on the reading side. Take a look at how WriteElementString can simplify the original writing example shown above:

    tw.WriteStartDocument();
    tw.WriteComment(" Aaron Skonnard's name structure ");
    tw.WriteStartElement("x", "name", "http://example.org/name");
    tw.WriteElementString("first", "Aaron");
    tw.WriteElementString("last", "Skonnard");
    tw.WriteEndElement();
    tw.WriteEndDocument();
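A quick way to convince yourself that the two forms are interchangeable is to write the same element both ways and compare the results. This is only a sketch; the class name ElementStringSketch and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper comparing the long and short forms.
public static class ElementStringSketch
{
    // The long form: three separate calls per text-only element.
    public static string ThreeCalls()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("first");
        tw.WriteString("Aaron");
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    // The short form: one WriteElementString call.
    public static string OneCall()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteElementString("first", "Aaron");
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ThreeCalls());
        Console.WriteLine(OneCall());
        Console.WriteLine(ThreeCalls() == OneCall());
    }
}
```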

Writing Qualified-Name Values

Since more and more XML developers are working with XML Schema and namespace-aware documents, XmlTextWriter also contains some enhancements for working with namespace declarations and qualified names (QNames).

One such enhancement has to do with looking up the namespace prefix currently in-scope for a given namespace name. For example, suppose that you need to generate a title element that contains a QName value like this:

<title xmlns:x='http://example.org/name'>x:Mr</title>

The problem with generating this XML is that you need to make sure that you use the correct prefix in the text node. To do so, you can look up the namespace prefix currently in scope for a given namespace, as shown here:

    tw.WriteString(tw.LookupPrefix("http://example.org/name") + ":" + "Mr");

There's also another method called WriteQualifiedName that simplifies the process of writing QNames into element content. This method takes a local name and a namespace name and generates the correct prefixed name in the XML 1.0 document (according to the in-scope namespace declarations):

    tw.WriteStartElement("title");
    tw.WriteQualifiedName("Mr", "http://example.org/name");
    tw.WriteEndElement();

Although XmlTextWriter does a good job of hiding all details related to producing namespace declarations, these additional helper methods are necessary to query in-scope namespace declarations.
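Both helpers rely on the namespace already being in scope when they are called. The following sketch declares the x prefix on the title element itself and then writes the QName value both ways. The class name QNameSketch is hypothetical, and declaring the prefix with WriteAttributeString("xmlns", "x", null, ...) is one possible way of bringing the namespace into scope, not something the fragments above prescribe.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper showing two ways to write a QName value.
public static class QNameSketch
{
    private const string Ns = "http://example.org/name";

    // Looks up the in-scope prefix and concatenates the QName by hand.
    public static string WithLookupPrefix()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("title");
        tw.WriteAttributeString("xmlns", "x", null, Ns); // declares xmlns:x
        tw.WriteString(tw.LookupPrefix(Ns) + ":" + "Mr");
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    // Lets the writer resolve the prefix through WriteQualifiedName.
    public static string WithWriteQualifiedName()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartElement("title");
        tw.WriteAttributeString("xmlns", "x", null, Ns); // declares xmlns:x
        tw.WriteQualifiedName("Mr", Ns);
        tw.WriteEndElement();
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithLookupPrefix());
        Console.WriteLine(WithWriteQualifiedName());
    }
}
```

If the prefix bound to the namespace later changes, neither method needs to be edited, because the prefix is resolved from the in-scope declarations at write time.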

Controlling Other Serialization Details

Since XmlTextWriter is just a thin veneer on top of the resulting XML 1.0 byte stream, the class provides several configurable properties related to the serialization process such as:

Character encoding

Indentation (pretty-printing)

Character to use for indentation (space vs. tab, etc.)

Character to use for quoting attribute values (' vs. ")

The character encoding you wish to use needs to be supplied when instantiating the XmlTextWriter over a file name or Stream, or when instantiating the TextWriter that it wraps. The rest are configurable properties of XmlTextWriter. The following example illustrates how they may be used:

    XmlTextWriter tw = new XmlTextWriter("aaron1.xml", Encoding.UTF8);
    tw.Formatting = Formatting.Indented;
    tw.Indentation = 5;
    tw.IndentChar = ' ';
    tw.QuoteChar = '\'';
    tw.WriteStartDocument();
    ... // omitted for brevity

This example produces an XML document that uses a UTF-8 encoding and contains the following XML declaration:

<?xml version='1.0' encoding='utf-8'?>

It also indents each element tag by five space characters per nesting level, and it encloses all attribute values and namespace declarations in single quotes.
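To see these settings take effect without creating a file, the same properties can be applied to a writer layered over a StringWriter, as in the sketch below. The class name FormattingSketch and the ssnum attribute are illustrative assumptions. Note that the XML declaration in this variant reports the StringWriter's encoding rather than utf-8, since the encoding comes from the underlying TextWriter.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper applying the serialization properties in memory.
public static class FormattingSketch
{
    public static string Build()
    {
        StringWriter sw = new StringWriter(); // assumption: in-memory target
        XmlTextWriter tw = new XmlTextWriter(sw);

        tw.Formatting = Formatting.Indented; // turn on pretty-printing
        tw.Indentation = 5;                  // five IndentChars per level
        tw.IndentChar = ' ';
        tw.QuoteChar = '\'';                 // single-quoted attribute values

        tw.WriteStartDocument();
        tw.WriteStartElement("x", "name", "http://example.org/name");
        tw.WriteAttributeString("ssnum", "555-55-1212");
        tw.WriteElementString("first", "Aaron");
        tw.WriteElementString("last", "Skonnard");
        tw.WriteEndElement();
        tw.WriteEndDocument();

        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```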

Building Documents with the DOM

The DOM is what most XML developers use to generate XML 1.0 documents today. As you learned in the first two parts of this series, the DOM is intuitive and easy to use but it comes with overhead. The same is true for generating XML documents with the DOM. In my opinion, using the DOM to generate XML documents is no easier than using XmlTextWriter and in many cases it's a bit more cumbersome.

The following code fragment illustrates how to build the first XML document shown above in the XmlTextWriter examples:

    XmlDocument doc = new XmlDocument();
    XmlNode comment, name, first, last;
    comment = doc.CreateComment("Aaron Skonnard's name structure");
    doc.AppendChild(comment);
    name = doc.CreateElement("x", "name", "http://example.org/name");
    doc.AppendChild(name);
    first = doc.CreateElement("first");
    first.InnerText = "Aaron";
    name.AppendChild(first);
    last = doc.CreateElement("last");
    last.InnerText = "Skonnard";
    name.AppendChild(last);

At this point, the logical XML document is loaded into memory as a tree of objects. You can serialize the tree out as a string of XML 1.0 through the XmlNode InnerXml property as shown here:

Console.WriteLine(doc.InnerXml);

XmlDocument also provides a Save method that you can use to serialize the tree out to a Stream, a TextWriter, or an XmlTextWriter. The following illustrates how to save the tree to the Console:

doc.Save(Console.Out);

If you want more control over the serialization details, you can instantiate an XmlTextWriter and specify those details prior to calling Save as shown here:

    XmlTextWriter tw = new XmlTextWriter("aaron5.xml", Encoding.UTF8);
    tw.Formatting = Formatting.Indented;
    doc.Save(tw);
    tw.Close(); // flush the output and release the file

With the DOM, you can accomplish all of the same things that I illustrated with XmlTextWriter. For example, you can simplify dealing with text-only elements through XmlNode's InnerText property (see example above). You can query the in-scope namespace declarations through XmlNode's GetNamespaceOfPrefix and GetPrefixOfNamespace methods. And as I just demonstrated, you can control the serialization details through the XmlDocument Save mechanism. Since most of you are already familiar with the DOM API, I won't bore you with the remaining details.

Since DOM serialization is actually performed through the XmlTextWriter layer, the main difference is that the DOM holds the entire document in memory before it is written out. Although there are some situations when this is necessary, it's generally better practice to generate XML documents via XmlTextWriter directly.
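The two routes can be placed side by side in a short sketch: one builds the tree with the DOM and saves it through an XmlTextWriter, and the other pushes the same nodes straight into the writer. The class name DomVersusWriterSketch and its method names are hypothetical, and a StringWriter stands in for the output file.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper contrasting the DOM route with direct writing.
public static class DomVersusWriterSketch
{
    private const string Ns = "http://example.org/name";

    // Builds the whole tree in memory, then serializes it through a writer.
    public static string ViaDom()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement name = doc.CreateElement("x", "name", Ns);
        doc.AppendChild(name);
        XmlElement first = doc.CreateElement("first");
        first.InnerText = "Aaron";
        name.AppendChild(first);
        XmlElement last = doc.CreateElement("last");
        last.InnerText = "Skonnard";
        name.AppendChild(last);

        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        doc.Save(tw); // the DOM pushes its nodes into the writer
        tw.Close();
        return sw.ToString();
    }

    // Pushes the same nodes into the writer without building a tree first.
    public static string ViaWriter()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteStartDocument();
        tw.WriteStartElement("x", "name", Ns);
        tw.WriteElementString("first", "Aaron");
        tw.WriteElementString("last", "Skonnard");
        tw.WriteEndElement();
        tw.WriteEndDocument();
        tw.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ViaDom());
        Console.WriteLine(ViaWriter());
    }
}
```

The direct route never allocates the node objects, which is where the efficiency advantage discussed above comes from.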

Conclusion

Deciding which API to use for writing XML documents in .NET isn't as complicated as deciding which to use for reading. In general, unless you absolutely need to have the document in-memory prior to serialization, you're better off using XmlTextWriter directly. See the guidelines at the beginning of this piece for more on the determining factors.

Sample Code

Download the sample code, writingxml.zip, at the bottom of this page.

About The Author

Aaron Skonnard is a consultant, instructor, and author specializing in Windows technologies and Web applications. Aaron teaches courses for DevelopMentor and is a columnist for Microsoft Internet Developer. He is the author of Essential WinInet, and co-author of Essential XML: Beyond MarkUp (Addison Wesley Longman). Contact him at http://staff.develop.com/aarons.