Q&A

Track Changes With XML DataSets

Track changes in files using XML datasets with the DiffGram format. This format lets you track what has changed, and what hasn't.

Technology Toolbox: C#, XML, ADO.NET

Q:
Serialize a DataSet
I'm working on a project that needs to store different versions of the same XML document. My approach has been to read the XML document into a DataSet, then let the user modify the DataSet. Once the user finishes modifying the file, I use the GetChanges method to retrieve the changes and drop them into another DataSet. I store the second DataSet as another XML file, so I can use the DataSet.Merge method to merge it later with the original and get the modified version.

However, I'm having trouble with the GetChanges method. It's not returning the deleted rows, even though it's working fine for the added or the updated rows. GetChanges returns an empty DataSet when I delete any row from the DataSet.

For reference, here's the code I'm using:

ds.ReadXml("http://localhost/Proto/Props.xml");
ds.AcceptChanges();
ds.Tables["LineDetail"].Rows[2].Delete();
DataSet ds1;
ds1 = ds.GetChanges(DataRowState.Deleted);
ds1.WriteXmlSchema("c:\\ChangedSchema.xml");
ds1.WriteXml("c:\\ChangedDoc.xml");
// ChangedDoc.xml is empty  :-(

A:
There are many ways to serialize a DataSet; sometimes it's simply a matter of determining the best one for what you're trying to do. The approach in your sample, WriteXml, writes the current version of each data element. The DataSet returned by GetChanges() contains only one deleted row. That row has no current version. You get no output.
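The missing "current version" can be checked directly on the row. The following is a minimal sketch, not the reader's actual code: it builds a small in-memory table (the table contents and the DeletedRowVersions class name are assumptions made for illustration) and asks a deleted row which versions it still holds.

```csharp
using System;
using System.Data;

static class DeletedRowVersions
{
    // Builds a one-table DataSet, deletes a row, and reports which
    // versions of that row still exist. Sample data is hypothetical.
    public static (bool hasCurrent, bool hasOriginal, string originalName) Inspect()
    {
        var ds = new DataSet("Props");
        DataTable table = ds.Tables.Add("LineDetail");
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("first");
        table.Rows.Add("second");
        ds.AcceptChanges();

        DataRow row = table.Rows[1];
        row.Delete();   // RowState becomes Deleted; the row stays in the table

        return (row.HasVersion(DataRowVersion.Current),
                row.HasVersion(DataRowVersion.Original),
                (string)row["Name", DataRowVersion.Original]);
    }

    public static void Main()
    {
        var result = Inspect();
        Console.WriteLine("Has current version:  " + result.hasCurrent);
        Console.WriteLine("Has original version: " + result.hasOriginal);
        Console.WriteLine("Original Name value:  " + result.originalName);
    }
}
```

The deleted row still carries its original values, which is why the data is recoverable; WriteXml in its default mode simply has nothing current to write for it.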

You've got two different options, depending on your needs. Let's assume you want to capture those rows that are about to be deleted. You could simply reject the changes before you write the file:

ds.ReadXml("http://localhost/Proto/Props.xml");
ds.AcceptChanges();
ds.Tables["LineDetail"].Rows[2].Delete();
DataSet ds1;
ds1 = ds.GetChanges(DataRowState.Deleted);
ds1.RejectChanges();
ds1.WriteXmlSchema("c:\\ChangedSchema.xml");
ds1.WriteXml("c:\\ChangedDoc.xml");
// ChangedDoc.xml now contains the restored row.

Rejecting the changes restores the current version of Row 2 in the LineDetail table. It's current and it exists, so it gets written to the file. This is the preferred way to make a backup of the deleted rows before you delete them. A single call to DataSet.Merge restores the deleted rows.
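As a rough illustration of that backup-and-restore cycle, the sketch below uses an in-memory table instead of the XML file. The sample rows and the DeletedRowBackup class name are assumptions; the table has no primary key, so Merge appends the backed-up row rather than matching it against an existing one.

```csharp
using System;
using System.Data;

static class DeletedRowBackup
{
    // Returns the row count after the deletion is committed and
    // again after the backup is merged back in.
    public static (int afterDelete, int afterMerge) Run()
    {
        var ds = new DataSet("Props");
        DataTable table = ds.Tables.Add("LineDetail");
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("first");
        table.Rows.Add("second");
        table.Rows.Add("third");
        ds.AcceptChanges();

        table.Rows[2].Delete();

        // Copy the deleted row, then reject the deletion in the copy
        // so the row has a current version again.
        DataSet backup = ds.GetChanges(DataRowState.Deleted);
        backup.RejectChanges();

        ds.AcceptChanges();                  // commit the deletion in the original
        int afterDelete = table.Rows.Count;

        ds.Merge(backup);                    // bring the backed-up row back
        int afterMerge = table.Rows.Count;

        return (afterDelete, afterMerge);
    }

    public static void Main()
    {
        var counts = Run();
        Console.WriteLine("Rows after delete: " + counts.afterDelete);
        Console.WriteLine("Rows after merge:  " + counts.afterMerge);
    }
}
```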

Unfortunately, calling RejectChanges leaves you with other problems. If your changes include added rows, those rows are lost. It's the inverse of the problem you're observing: After you reject changes, the deleted rows are back, but the new rows no longer exist. The same is true of rows you change. Rejecting changes reverts those rows to their previous values, losing all the work.
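A small sketch makes the trade-off concrete. It assumes a hypothetical three-row table, applies one edit, one insert, and one delete, then calls RejectChanges on the whole DataSet and reports what is left.

```csharp
using System;
using System.Data;

static class RejectChangesDemo
{
    // Applies one change of each kind, rejects them all, and returns
    // the resulting row count and the first row's Name value.
    public static (int rowCount, string firstName) Run()
    {
        var ds = new DataSet("Props");
        DataTable table = ds.Tables.Add("LineDetail");
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("first");
        table.Rows.Add("second");
        table.Rows.Add("third");
        ds.AcceptChanges();

        table.Rows[0]["Name"] = "edited";   // modified row
        table.Rows.Add("fourth");           // added row
        table.Rows[2].Delete();             // deleted row

        // Undo everything since the last AcceptChanges call.
        ds.RejectChanges();

        return (table.Rows.Count, (string)table.Rows[0]["Name"]);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine("Rows after RejectChanges: " + result.rowCount);
        Console.WriteLine("First row Name:           " + result.firstName);
    }
}
```

The deleted row comes back, but the added row and the edit are both gone, so this route only suits the case where deletions are all you need to capture.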

A different approach lets you see the rows that have changed, regardless of what kind of changes the user has made. The DiffGram format stores two partial copies of the DataSet so you can track changes. One copy of the DataSet stores the current version of all records in each table, including any modified rows, newly added rows, and unmodified rows (see Figure 1).

The second copy of the DataSet stores the previous version of any rows that have changed. These rows are stored in the diffgr:before element of the serialized DataSet. The only copy of any deleted rows is also stored in this section. Deleted rows existed in the "before" picture but not in the "after" picture of the DataSet. Using this approach enables you to save the history of changes to the DataSet, not just a snapshot of its current or previous contents. By saving both versions, the DiffGram format lets you see exactly what changes have been made:

ds.ReadXml("http://localhost/TMP/Props.xml");
ds.AcceptChanges();
ds.Tables["LineDetail"].Rows[2].Delete();
DataSet ds1;
ds1 = ds.GetChanges(DataRowState.Deleted);
ds1.WriteXmlSchema("c:\\ChangedSchema.xml");
ds1.WriteXml("c:\\ChangedDoc.xml", XmlWriteMode.DiffGram);
// ChangedDoc.xml has a diffgr:before record.

The DiffGram format preserves all the information you need to transfer the set of changes to another machine, merge them into another DataSet, undo them, or save them for later processing (see Listing 1).
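To see that the change information survives the trip, you can read a DiffGram back into a second DataSet and check the row state. This is a sketch under assumptions: it writes to a string instead of a file, uses made-up sample rows, and gives the target DataSet its schema with Clone (a DiffGram carries no schema of its own, which is why the earlier sample also calls WriteXmlSchema).

```csharp
using System;
using System.Data;
using System.IO;

static class DiffGramRoundTrip
{
    // Serializes a deleted row as a DiffGram and loads it into a
    // fresh DataSet that has the same schema.
    public static DataSet RoundTrip()
    {
        var ds = new DataSet("Props");
        DataTable table = ds.Tables.Add("LineDetail");
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("first");
        table.Rows.Add("second");
        table.Rows.Add("third");
        ds.AcceptChanges();

        table.Rows[2].Delete();
        DataSet changes = ds.GetChanges(DataRowState.Deleted);

        var writer = new StringWriter();
        changes.WriteXml(writer, XmlWriteMode.DiffGram);

        // Clone copies the structure only, no rows.
        DataSet restored = ds.Clone();
        restored.ReadXml(new StringReader(writer.ToString()),
                         XmlReadMode.DiffGram);
        return restored;
    }

    public static void Main()
    {
        DataSet restored = RoundTrip();
        foreach (DataRow row in restored.Tables["LineDetail"].Rows)
        {
            // A deleted row exposes only its original version.
            Console.WriteLine(row.RowState + ": " +
                row["Name", DataRowVersion.Original]);
        }
    }
}
```

Because the row arrives still marked as deleted, the receiving side can decide whether to merge the change, undo it with RejectChanges, or store it for later.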

The DiffGram format has some extra attributes and elements in addition to the normal DataSet information. For example, note that a diffgr:id attribute is associated with each element. This attribute matches the current and previous version of any changed rows. The first record in the DataSet has been modified: I've changed the last name. You can see the previous last name in the diffgr:before section, in the record that has the matching diffgr:id, 1. You can also see that the first record has a diffgr:hasChanges attribute. The first record was modified, so this attribute has the "modified" value. Now look for the inserted record. It has a value of "inserted" for diffgr:hasChanges. The absence of this attribute indicates that the record wasn't changed.

Next, look at the bottom of the file in the diffgr:before section. Here you find a deleted record's original value, with a diffgr:id = Employees9. There is no diffgr:hasChanges attribute in the diffgr:before section. You can tell it's a deleted record because there's no record in the current version with the diffgr:id value of Employees9.
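The same reading of the file can be done in code. The sketch below is an assumption-laden illustration, not part of the listing: it generates its own DiffGram from a hypothetical Employees table, then uses LINQ to XML to collect the diffgr:hasChanges values from the current section and to find the diffgr:before records whose diffgr:id has no match among the current rows.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Xml.Linq;

static class DiffGramInspector
{
    // Namespace that DataSet uses for the diffgr: prefix.
    static readonly XNamespace Diffgr =
        "urn:schemas-microsoft-com:xml-diffgram-v1";

    // Builds a sample DataSet with one modified, one deleted, and one
    // inserted row, and returns it serialized as a DiffGram.
    public static string WriteSampleDiffGram()
    {
        var ds = new DataSet("Staff");
        DataTable t = ds.Tables.Add("Employees");
        t.Columns.Add("LastName", typeof(string));
        t.Rows.Add("Adams");
        t.Rows.Add("Baker");
        t.Rows.Add("Clark");
        ds.AcceptChanges();

        t.Rows[0]["LastName"] = "Allen";   // modified
        t.Rows[2].Delete();                // deleted
        t.Rows.Add("Davis");               // inserted

        var writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.DiffGram);
        return writer.ToString();
    }

    public static (string[] changeKinds, string[] deletedIds)
        Summarize(string diffGramXml)
    {
        XElement root = XDocument.Parse(diffGramXml).Root;

        // The current data is the child element outside the diffgr namespace.
        XElement current = root.Elements()
            .FirstOrDefault(e => e.Name.Namespace != Diffgr);
        XElement before = root.Element(Diffgr + "before");

        List<XElement> currentRows = current == null
            ? new List<XElement>()
            : current.Elements().ToList();

        string[] kinds = currentRows
            .Select(r => (string)r.Attribute(Diffgr + "hasChanges"))
            .Where(v => v != null)
            .ToArray();

        var currentIds = new HashSet<string>(
            currentRows.Select(r => (string)r.Attribute(Diffgr + "id")));

        // A "before" record with no matching current row was deleted.
        string[] deleted = before == null
            ? new string[0]
            : before.Elements()
                .Select(r => (string)r.Attribute(Diffgr + "id"))
                .Where(id => !currentIds.Contains(id))
                .ToArray();

        return (kinds, deleted);
    }

    public static void Main()
    {
        var summary = Summarize(WriteSampleDiffGram());
        Console.WriteLine("hasChanges values: " +
            string.Join(", ", summary.changeKinds));
        Console.WriteLine("Deleted row ids:   " +
            string.Join(", ", summary.deletedIds));
    }
}
```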

The ADO.NET DataSet is a powerful container. You can, and should, become familiar with the different ways you can use it to transfer information. The DiffGram format is especially useful to indicate all the changes that have been made to a set of records. It provides a convenient way to track any modifications made to your data.

About the Author

Bill Wagner, author of Effective C#, has been a commercial software developer for the past 20 years. He is a Microsoft Regional Director and a Visual C# MVP. His interests include the C# language, the .NET Framework and software design.
