Parsing and Producing XML Documents

06 May

The XmlReader and XmlWriter abstract classes are at the heart of the XML object model in the .NET Framework. XmlReader provides the API for reading XML documents, while XmlWriter provides the complementary API for producing XML documents that comply with the W3C standards. In designing these classes, Microsoft borrowed concepts from both the DOM and SAX. In the end, the classes follow neither model but strike a compromise between the two.

Like SAX, XmlReader and XmlWriter use a streaming model: XmlReader consumes the data that forms an XML document one piece at a time and can skip pieces of no interest, while XmlWriter emits a document piece by piece. Unlike SAX, which pushes events to callbacks, the API is based on a more developer-friendly pull model in which, as with the DOM, your code asks for each node when it is ready for it. This stream-based, pull-model API gives developers an efficient and easy-to-use means of reading and producing XML documents.

Because both XmlReader and XmlWriter are abstract classes, you cannot instantiate them directly. Instead, .NET provides concrete implementations of these classes. For XmlReader, these are XmlTextReader, XmlNodeReader, and XmlValidatingReader. For XmlWriter, the concrete class is XmlTextWriter. From .NET 2.0 onward, the static XmlReader.Create( ) and XmlWriter.Create( ) factory methods are the recommended way to obtain instances.

We will first look at how we can use the XmlReader to read and process XML documents.

XmlReader

The XmlReader abstract class provides a fast, read-only, and forward-only cursor for reading XML documents. Using XmlReader is much like using the DOM, where you read and work with one node at a time. The following snippet shows a C# method that traverses an XML document and displays the names of all the elements in the document.

public void DisplayElements( XmlReader reader )
{
  /* read the next node in document order */
  while ( reader.Read() )
  {
    /* if this is an element node, display its name */
    if ( reader.NodeType == XmlNodeType.Element )
    {
      Console.WriteLine( reader.Name );
    }
  }
}

XmlReader has an associated cursor that defines the notion of a current node in the document stream. Methods are provided that allow the user to traverse the document one node at a time, moving the current node along the way. XmlReader also provides several methods for inspecting the type and value of the current node.
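The cursor does not have to stop at every node. As a minimal sketch, assuming .NET 2.0 or later, the following program uses Skip( ) to jump over whole subtrees of no interest. The class name SkipDemo, the method name, and the inline document are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class SkipDemo
{
    // Returns the value of every text node that is not inside a <firstname> element.
    public static List<string> TextOutsideFirstNames( string xml )
    {
        List<string> texts = new List<string>();
        using ( XmlReader reader = XmlReader.Create( new StringReader( xml ) ) )
        {
            reader.Read();   // move from the initial state onto the first node
            while ( !reader.EOF )
            {
                if ( reader.NodeType == XmlNodeType.Element && reader.Name == "firstname" )
                {
                    // Skip() leaves the cursor on the node after </firstname>,
                    // so do not call Read() again in this iteration.
                    reader.Skip();
                    continue;
                }
                if ( reader.NodeType == XmlNodeType.Text )
                {
                    texts.Add( reader.Value );
                }
                reader.Read();
            }
        }
        return texts;
    }

    public static void Main()
    {
        string xml = "<persons><person><firstname>Albert</firstname>" +
                     "<lastname>Einstein</lastname></person></persons>";
        foreach ( string text in TextOutsideFirstNames( xml ) )
        {
            Console.WriteLine( text );
        }
    }
}
```

Because Skip( ) already advances the cursor, the loop tests EOF rather than calling Read( ) unconditionally; otherwise the node that follows a skipped element would be lost.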

Reading Nodes

The Read( ) method is the fundamental method in XmlReader for moving the cursor through an XML document. Every time you call Read( ), XmlReader moves the cursor to the next node in document order; once it reaches the end of the stream, Read( ) returns false. For example, running the previous sample DisplayElements( ) method against the sample1.xml file in Listing 16-7 produces the following output:

persons
person
firstname
lastname
person
firstname
lastname

Listing 16-7 sample1.xml: A sample XML document.

<persons>
  <person>
    <firstname>Albert</firstname>
    <lastname>Einstein</lastname>
  </person>
  <person>
    <firstname>Niels</firstname>
    <lastname>Bohr</lastname>
  </person>
</persons>
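To try this without a file on disk, here is a small sketch that feeds the same document to an XmlReader from a string. It assumes .NET 2.0 or later for XmlReader.Create( ); the class name ElementNames and the Sample constant are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class ElementNames
{
    // Inline copy of sample1.xml (whitespace removed) so no file is needed.
    public const string Sample =
        "<persons>" +
        "<person><firstname>Albert</firstname><lastname>Einstein</lastname></person>" +
        "<person><firstname>Niels</firstname><lastname>Bohr</lastname></person>" +
        "</persons>";

    // Collects the name of every element start tag, in document order.
    public static List<string> Collect( string xml )
    {
        List<string> names = new List<string>();
        using ( XmlReader reader = XmlReader.Create( new StringReader( xml ) ) )
        {
            while ( reader.Read() )
            {
                // End tags arrive as XmlNodeType.EndElement, so they are not counted here.
                if ( reader.NodeType == XmlNodeType.Element )
                {
                    names.Add( reader.Name );
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        foreach ( string name in Collect( Sample ) )
        {
            Console.WriteLine( name );
        }
    }
}
```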

Once you have landed on a node through the Read( ) method, or any other XmlReader method that moves the cursor, you can inspect its contextual information or value. For example, the NodeType property returns a type identifier similar to those used by the DOM. Another property, Value, gives you the node’s value as a string. XmlReader also provides methods for reading typed values (for example, ReadContentAsInt( ), ReadContentAsDouble( ), and ReadElementContentAsString( ) in .NET 2.0 and later).
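As a rough illustration of typed reading in the released framework (.NET 2.0 and later), the sketch below sums integer element content with ReadElementContentAsInt( ). The qty element name and the TypedRead class are hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public class TypedRead
{
    // Sums the integer content of every <qty> element (hypothetical element name).
    public static int SumQuantities( string xml )
    {
        int total = 0;
        using ( XmlReader reader = XmlReader.Create( new StringReader( xml ) ) )
        {
            reader.Read();   // move onto the first node
            while ( !reader.EOF )
            {
                if ( reader.NodeType == XmlNodeType.Element && reader.Name == "qty" )
                {
                    // Parses the content as an int and moves the cursor past </qty>,
                    // so no extra Read() is needed in this branch.
                    total += reader.ReadElementContentAsInt();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine( SumQuantities( "<order><qty>2</qty><qty>3</qty></order>" ) );
    }
}
```

The typed method both converts the value and advances the cursor, which is why the loop only calls Read( ) when it did not consume an element.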

The following snippet is a C# method that uses the various properties exposed by XmlReader to display the textual representation of the current node.

public void DisplayNode( XmlReader node )
{
  switch ( node.NodeType )
  {
    case XmlNodeType.Element:
      Console.Write( "<" + node.Name + ">" );
      break;
    case XmlNodeType.Text:
      Console.Write( node.Value );
      break;
    case XmlNodeType.CDATA:
      Console.Write( node.Value );
      break;
    case XmlNodeType.ProcessingInstruction:
      Console.Write( "<?" + node.Name + " " + node.Value + "?>" );
      break;
    case XmlNodeType.Comment:
      Console.Write( "<!--" + node.Value + "-->" );
      break;
    case XmlNodeType.XmlDeclaration:
      /* Value holds the declaration's content, e.g. version="1.0" */
      Console.Write( "<?xml " + node.Value + "?>" );
      break;
    case XmlNodeType.Whitespace:
      Console.Write( node.Value );
      break;
    case XmlNodeType.SignificantWhitespace:
      Console.Write( node.Value );
      break;
    case XmlNodeType.EndElement:
      Console.Write( "</" + node.Name + ">" );
      break;
  }
}

Reading Attributes

Because attributes are not regarded as part of a document’s hierarchical structure, Read( ) will not encounter attribute nodes. To access the attributes of the current element, you can use the GetAttribute( ) method, which looks up a specific attribute either by name or by index. The following code snippet shows one way to iterate through and display the values of all attributes in the current node:

for ( int i = 0; i < node.AttributeCount; i++ )
{
  Console.WriteLine( node.GetAttribute( i ) );
}

XmlReader also provides methods to let you traverse the attributes using the current node cursor. The MoveToAttribute( ), MoveToFirstAttribute( ), and MoveToNextAttribute( ) methods can be used to move sideways through the attributes attached to the current node. The following code snippet displays the names and values of all the attributes in the current node:

while ( node.MoveToNextAttribute() )
{
  Console.Write( " " + node.Name + "=\"" + node.Value + "\"" );
} 

Notice that, unlike GetAttribute( ), you can retrieve both the name and the value of an attribute using this method.
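The two approaches can be combined. The following sketch, which assumes .NET 2.0 or later and uses a made-up person element with id and role attributes, looks up one attribute by name, walks all attributes with the cursor, and then calls MoveToElement( ) to return the cursor to the owning element.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class AttributeDemo
{
    // Describes the root element of the given document and its attributes.
    public static string Describe( string xml )
    {
        StringBuilder sb = new StringBuilder();
        using ( XmlReader reader = XmlReader.Create( new StringReader( xml ) ) )
        {
            reader.MoveToContent();   // position the cursor on the root element
            sb.Append( reader.Name );

            // Lookup by name; GetAttribute returns null when the attribute is absent.
            sb.Append( " id=" + reader.GetAttribute( "id" ) );

            // Walk the attributes with the cursor to see names as well as values.
            if ( reader.MoveToFirstAttribute() )
            {
                do
                {
                    sb.Append( " [" + reader.Name + "=" + reader.Value + "]" );
                }
                while ( reader.MoveToNextAttribute() );

                reader.MoveToElement();   // return the cursor to the owning element
            }
            sb.Append( " back on " + reader.NodeType );
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine( Describe( "<person id=\"7\" role=\"physicist\"/>" ) );
    }
}
```

Returning to the element matters when later code inspects element-level properties such as IsEmptyElement, which are not meaningful while the cursor sits on an attribute.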

A Sample XML Reader

ReadXML.cs, in Listing 16-8, shows a C# console program that uses all the methods and properties described in this section to display the complete textual representation of the book.xml XML file.

Listing 16-8 ReadXML.cs: A C# console program using the discussed methods and properties.

using System;
using System.IO;
using System.Xml;
public class ReadXML
{
  public void DisplayNode( XmlReader node )
  {
    switch ( node.NodeType )
    {
      case XmlNodeType.Element:
        /* IsEmptyElement must be read before moving onto the attributes */
        bool isEmpty = node.IsEmptyElement;
        Console.Write( "<" + node.Name );
        while ( node.MoveToNextAttribute() )
        {
          Console.Write( " " + node.Name + "=\"" + node.Value + "\"" );
        }
        Console.Write( isEmpty ? " />" : ">" );
        break;
      case XmlNodeType.Text:
        Console.Write( node.Value );
        break;
      case XmlNodeType.CDATA:
        Console.Write( node.Value );
        break;
      case XmlNodeType.ProcessingInstruction:
        Console.Write( "<?" + node.Name + " " + node.Value + "?>" );
        break;
      case XmlNodeType.Comment:
        Console.Write( "<!--" + node.Value + "-->" );
        break;
      case XmlNodeType.XmlDeclaration:
        /* Value holds the declaration's content, e.g. version="1.0" */
        Console.Write( "<?xml " + node.Value + "?>" );
        break;
      case XmlNodeType.Whitespace:
        Console.Write( node.Value );
        break;
      case XmlNodeType.SignificantWhitespace:
        Console.Write( node.Value );
        break;
      case XmlNodeType.EndElement:
        Console.Write( "</" + node.Name + ">" );
        break;
    }
  }
  public void DisplayElements( XmlReader reader )
  {
    /* read the next node in document order */
    while ( reader.Read() )
    {
      DisplayNode( reader );
    }
  }
  public static void Main( String[] args )
  {
    XmlTextReader reader = new XmlTextReader( "book.xml" );
    ReadXML tr = new ReadXML();
    tr.DisplayElements( reader );
    /* release the underlying file */
    reader.Close();
  }
}

Note that we have used the XmlTextReader concrete implementation of XmlReader in ReadXML.cs.

To compile ReadXML.cs, open a console window and type in the following command line:

csc /r:System.dll /r:System.Xml.dll ReadXML.cs

Running the resultant ReadXML.exe program displays the content of book.xml on the console.

Validation

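One of the features the XmlReader family provides is validation. XmlTextReader itself does not validate; validation is performed by a reader obtained from XmlReader.Create( ) with a suitably configured XmlReaderSettings object (in .NET 1.x, by wrapping an XmlTextReader in an XmlValidatingReader). Validation is supported against DTDs and XSD schemas, and the older XmlValidatingReader also accepts XDR schemas. Validation is off by default; to turn it on, set the ValidationType property to either DTD or Schema, depending on the validation method required, and register a ValidationEventHandler. For schema validation, you must also set the ValidationFlags that allow the reader to use schemas referenced by or embedded in the document, or add the schemas to the settings yourself. With validation turned on, the handler is called whenever a validation error occurs; if no handler is registered, the reader throws an XmlSchemaException at the first error instead. For DTD validation, an error is any violation of a rule that the W3C XML Recommendation labels a “Validity Constraint”. ValidationEventHandler, defined in the System.Xml.Schema namespace, has the following signature:

public delegate void ValidationEventHandler( object sender, ValidationEventArgs args );

The ValidationEventArgs class provides the severity, the error message, and the underlying XmlSchemaException for the event handler. Listing 16-9 shows a C# program that validates a user-specified XML file using schema validation.

Listing 16-9 Validation.cs: Code that validates an XML file using schema validation.

using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
public class Validation
{
  static bool gotError = false;
  public static void Main( String[] args )
  {
    /* make sure we were passed a filename */
    if ( args.Length == 0 )
    {
      Console.WriteLine( "Usage: validation filename" );
      return;
    }
    try
    {
      /* Specify the validation method */
      XmlReaderSettings settings = new XmlReaderSettings();
      settings.ValidationType = ValidationType.Schema;
      /* Use schemas referenced by or embedded in the document, and
         report a warning when no schema can be found */
      settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation
                                | XmlSchemaValidationFlags.ProcessInlineSchema
                                | XmlSchemaValidationFlags.ReportValidationWarnings;
      /* Register event handler for reporting validation errors */
      settings.ValidationEventHandler += new ValidationEventHandler( OnValidateError );
      /* Create a validating XmlReader using the filename passed
         in from the command line */
      using ( XmlReader reader = XmlReader.Create( args[ 0 ], settings ) )
      {
        /* traverse the whole document */
        while ( reader.Read() ) { }
      }
      if ( !gotError )
      {
        Console.WriteLine( "Document is valid" );
      }
    }
    catch ( Exception e )
    {
      Console.WriteLine( "Error: " + e.Message );
    }
  }
  public static void OnValidateError( Object sender, ValidationEventArgs args )
  {
    gotError = true;
    /* Severity is either Error or Warning */
    Console.WriteLine( args.Severity + ": " + args.Message );
  }
}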

To compile Validation.cs, type in the following command line in a console window:

csc /r:System.dll /r:System.Xml.dll Validation.cs
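When the document does not point at its own schema, the schema can be supplied in code. The sketch below is one way to do that in .NET 2.0 and later, using XmlReaderSettings and its Schemas collection; the tiny schema, the class name InlineValidation, and the sample documents are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class InlineValidation
{
    // Hypothetical schema: a <persons> root holding any number of <person> strings.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='persons'><xs:complexType><xs:sequence>" +
        "<xs:element name='person' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns how many validation errors the reader reported for the document.
    public static int CountErrors( string xml )
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // A null target namespace means "use the one declared in the schema" (none here).
        settings.Schemas.Add( null, XmlReader.Create( new StringReader( Xsd ) ) );
        settings.ValidationEventHandler +=
            delegate( object sender, ValidationEventArgs e ) { errors++; };

        using ( XmlReader reader = XmlReader.Create( new StringReader( xml ), settings ) )
        {
            // Validation happens as a side effect of reading.
            while ( reader.Read() ) { }
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine( CountErrors( "<persons><person>Bohr</person></persons>" ) );
        Console.WriteLine( CountErrors( "<persons><planet>Mars</planet></persons>" ) );
    }
}
```

The first document conforms to the schema, while the second contains an element the schema does not allow, so only the second should produce validation events.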

XmlWriter

The XmlWriter abstract class is used to produce document streams conforming to the W3C’s XML 1.0 and Namespaces in XML Recommendations. XmlWriter handles many of the complexities of producing XML documents automatically, such as making sure that elements are properly closed and attribute values are quoted. XmlWriter also provides methods for writing typed data (for example, WriteString( ) and, from .NET 2.0 onward, the WriteValue( ) overloads for int, double, DateTime, and so forth).

WriteXML.cs in Listing 16-10 is a C# console program that generates an XML document to the console using XmlWriter.

Listing 16-10 WriteXML.cs: Code that generates an XML document to the console.

using System;
using System.IO;
using System.Xml;
public class WriteXML
{
  public static void Main( String[] args )
  {
    /* Instantiates an XmlTextWriter that writes to the console */
    XmlTextWriter writer = new XmlTextWriter( Console.Out );
    /* Use indenting for readability */
    writer.Formatting = Formatting.Indented;
    writer.Indentation = 4;
    /* Now write out our XML document */
    writer.WriteStartDocument();
      writer.WriteStartElement( "Persons" );
        writer.WriteStartElement( "Person" );
          writer.WriteStartElement( "Name" );
            writer.WriteStartAttribute( null, "firstName", null );
              writer.WriteString( "Albert" );
            writer.WriteEndAttribute();
            writer.WriteStartAttribute( null, "lastName", null );
              writer.WriteString( "Einstein" );
            writer.WriteEndAttribute();
          writer.WriteEndElement();
        writer.WriteEndElement();
        writer.WriteStartElement( "Person" );
          writer.WriteStartElement( "Name" );
            writer.WriteStartAttribute( null, "firstName", null );
              writer.WriteString( "Niels" );
            writer.WriteEndAttribute();
            writer.WriteStartAttribute( null, "lastName", null );
              writer.WriteString( "Bohr" );
            writer.WriteEndAttribute();
          writer.WriteEndElement();
        writer.WriteEndElement();
      writer.WriteEndElement();
    writer.WriteEndDocument();
    /* Don't forget to close the writer */
    writer.Close();
  }
}

Note that we have used the concrete class XmlTextWriter in WriteXML.cs. XmlTextWriter is a concrete implementation of XmlWriter for writing out character streams. Its constructors accept a file name, a Stream, or a TextWriter as the output target. It also provides pretty printing and properties for characteristics such as indentation (Formatting, Indentation, IndentChar), namespace support (Namespaces), and the attribute quote character (QuoteChar).
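The attribute-writing calls in WriteXML.cs can be shortened with WriteAttributeString( ), which writes a complete attribute in one call. The following sketch writes the same Persons structure into a StringWriter so the result can be inspected as a string; the CompactWrite class and WritePerson helper are names made up for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public class CompactWrite
{
    // Builds the Persons document in memory and returns it as a string.
    public static string Build()
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter( buffer );
        writer.WriteStartElement( "Persons" );
        WritePerson( writer, "Albert", "Einstein" );
        WritePerson( writer, "Niels", "Bohr" );
        writer.WriteEndElement();   // </Persons>
        writer.Close();             // flushes pending output into the buffer
        return buffer.ToString();
    }

    // Hypothetical helper: writes one Person element with a Name child.
    static void WritePerson( XmlWriter writer, string first, string last )
    {
        writer.WriteStartElement( "Person" );
        writer.WriteStartElement( "Name" );
        // One call per attribute instead of Start/String/End triples.
        writer.WriteAttributeString( "firstName", first );
        writer.WriteAttributeString( "lastName", last );
        writer.WriteEndElement();   // closes Name
        writer.WriteEndElement();   // closes Person
    }

    public static void Main()
    {
        Console.WriteLine( Build() );
    }
}
```

Taking an XmlWriter parameter in the helper, rather than an XmlTextWriter, keeps it usable with any concrete writer.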

Code demo: http://www.mediafire.com/?nrpz2bu1okl

 
Posted by on 06/05/2008 in XML

 

2 responses to “Parsing and Producing XML Documents”

  1. Lav

    05/06/2008 at 12:46

    Nice Blog.
    Check out mine on XSD Schema Validation http://lavbox.blogspot.com/2007/07/xsd-schema-validation.html

     
  2. anon

    02/12/2009 at 09:46

    you may also want to check out vtd-xml, the latest and most advanced xml processing model

    vtd-xml

     
