DevSmart

To Parse or Not To Parse

How to live a C++ life in an XML world.

Extensible Markup Language (XML) is everywhere, from configuration files for everyday software applications to advanced information-exchange technologies. Even car navigation systems use it. So how can native coders exist in this world and leverage the data in XML documents?

Parsers have allowed developers to read and write XML from their preferred programming languages. These access methods are the strength of XML because they spare developers from writing their own code to read serialized XML data into object-oriented applications. In my career, I've used them all: DOM, SAX and Pull parsers, among others.

Slow Process
These low-level XML access methods work to a limited extent, but parsers can be tedious and prone to error.

In my experience, I've always found it labor-intensive to teach a DOM parser how to process a document. I liken it to teaching kindergarteners how to read, except for the fact that every time you begin a new book you have to start over from scratch. The same applies to a SAX parser. I've avoided it by refraining from creating or using large XML documents.

As a result, XML parsers have always given me an uneasy feeling because of the work involved in putting them into operation in C++. On the proverbial Friday night, a few hours from weekend freedom, I tend to give up on XML and a parser for my I/O work and find instant salvation in a few STL streaming operators or, worse, fprintf() calls.

As developers know, writing a program is like writing a murder mystery. When the book is finished, if the publisher wants to change who the murderer is, the author has to rewrite the whole thing to ensure story and plot lines logically lead the reader to the ending of the book.

Software development follows a similar pattern. If the demands of customers and requirements change, the C++ data structures must reflect those changes. In this demanding marketplace with cost and efficiency pressures, how can the C++ developer keep up?

Mapping Documents to Classes
The answer is XML binding, which maps XML documents to C++ objects created to represent the elements or data in the document, as well as the reverse: converting C++ objects to XML. XML binding is usually thought of as one more way to read or write XML. While true, that's a very limiting definition. For the C++ developer, XML binding can enable development efficiency and modernization. It allows the C++ developer to cope with constantly evolving requirements, create C++ classes quickly and expand the range of problems C++ can solve.

XML binding, based on XML schemas, marshals C++ classes to XML.

XML binding is based on XML schemas, simple file specifications that identify "objects" and their contents. If XML is a prominent part of your C++ class development (that is, if you specify your classes using XML schemas), XML binding will allow you to convert your C++ class to XML, a process called marshalling. Unmarshalling is the reverse: mapping XML back to C++ classes.
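To make the terminology concrete, here is a minimal hand-written sketch of the kind of class a binder might generate from a schema. The schema fragment, the Holding and Portfolio names and the marshal() signature are all assumptions made for illustration; they are not the output of any particular tool, and the code skips details such as escaping special characters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical schema fragment a binder might start from:
//   <xs:element name="holding">
//     <xs:complexType>
//       <xs:attribute name="symbol" type="xs:string"/>
//       <xs:attribute name="shares" type="xs:int"/>
//     </xs:complexType>
//   </xs:element>

// Hand-written stand-in for a generated class (illustration only).
struct Holding {
    std::string symbol;
    int shares;

    // Marshalling: serialize this object as one XML element.
    // No escaping of special characters is attempted here.
    std::string marshal() const {
        std::ostringstream out;
        out << "<holding symbol=\"" << symbol
            << "\" shares=\"" << shares << "\"/>";
        return out.str();
    }
};

// A container element maps naturally onto an STL container.
struct Portfolio {
    std::string owner;
    std::vector<Holding> holdings;

    // Marshal the portfolio and each nested holding in order.
    std::string marshal() const {
        std::ostringstream out;
        out << "<portfolio owner=\"" << owner << "\">";
        for (const Holding& h : holdings) {
            out << h.marshal();
        }
        out << "</portfolio>";
        return out.str();
    }
};
```

The point of the sketch is the shape of the interface: schema elements become plain classes, repeated elements become an std::vector, and the XML plumbing hides behind a single marshal() call.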

For example, suppose your tech lead comes to you wanting to save or load data from XML. Regardless of when XML becomes a requirement, a single call marshals your C++ class to XML:

// myPortfolio is an instance of a binder-generated class.
std::ofstream ostrm("C:\\temp\\portfolio.xml");
ostrm << myPortfolio.marshal();  // serialize the object to XML text
ostrm.close();

If "read XML" is also a requirement, the code may look like this:

tns::Portfolio myPortfolioReloaded;
std::ifstream istrm("C:\\temp\\portfolio.xml");
// Read the whole file into a string.
std::stringstream buffer;
buffer << istrm.rdbuf();
std::string xmlContents(buffer.str());
// Populate the object from the XML text.
myPortfolioReloaded.unmarshal(xmlContents);

With these very simple steps, C++ becomes XML-enabled and can now fit many programming patterns. It's an alternative to the traditional parsing methods; you can produce more powerful code with less-tedious bridging, and you have more flexibility and speed in what you can offer when app requirements change.
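The snippets above rely on a tns::Portfolio class that a binder would generate. As a rough idea of what sits behind those two calls, here is a hand-written stand-in that round-trips two fields. Everything in it is an assumption for illustration: the owner and shares members, the bool return value of unmarshal() and the naive tag search, which is nowhere near a real XML parser.

```cpp
#include <sstream>
#include <string>

namespace tns {

// Hypothetical stand-in for a binder-generated class (illustration only).
class Portfolio {
public:
    std::string owner;
    int shares = 0;

    // Marshal: write the members out as XML text (no escaping).
    std::string marshal() const {
        std::ostringstream out;
        out << "<portfolio><owner>" << owner << "</owner>"
            << "<shares>" << shares << "</shares></portfolio>";
        return out.str();
    }

    // Unmarshal: populate the members from XML text.
    // Returns false, leaving the object unchanged, if an expected
    // element is missing or the share count is not a number.
    bool unmarshal(const std::string& xml) {
        std::string ownerText;
        std::string sharesText;
        if (!textOf(xml, "owner", ownerText) ||
            !textOf(xml, "shares", sharesText)) {
            return false;
        }
        std::istringstream in(sharesText);
        int parsed = 0;
        if (!(in >> parsed)) {
            return false;
        }
        owner = ownerText;
        shares = parsed;
        return true;
    }

private:
    // Naive helper: copy the text between <tag> and </tag> into out.
    static bool textOf(const std::string& xml, const std::string& tag,
                       std::string& out) {
        const std::string open = "<" + tag + ">";
        const std::string close = "</" + tag + ">";
        const std::string::size_type begin = xml.find(open);
        if (begin == std::string::npos) return false;
        const std::string::size_type start = begin + open.size();
        const std::string::size_type end = xml.find(close, start);
        if (end == std::string::npos) return false;
        out = xml.substr(start, end - start);
        return true;
    }
};

}  // namespace tns
```

With a class like this in place, the calling code stays exactly as short as the snippets above: stream marshal() out, read the text back, and hand it to unmarshal(). A generated class would do the same job with schema-driven validation instead of string searches.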

Automation
XML binders can help you automate code generation and develop C++ classes with a powerful programming interface. This type of tooling allows you to build complex business logic in C++ much more easily. However, some XML-processing solutions can get you lost in the weeds by generating C++ code with three levels of C pointer intricacy and no documentation. You should look for binders that provide documentation, STL-based interfaces, makefiles, Visual Studio projects and extremely readable code.

To be clear, I'm not suggesting automating a full development process, or specifying a whole project with XML schemas. C++ is cool, too. The focus is to use XML binding as another way to easily specify and develop C++ classes.

About the Author

Stephane Raynaud is a senior architect at Rogue Wave Software Inc.
