Using XSLT when only a negligible number of changes to the XML is required

I have an XML document, ranging in size from 50k to about 300k, on which I need to make some transformations. The transformations include moving some elements to a different location and merging the attributes of different elements.

As far as I know, I should ideally use XSLT to perform this transformation, but I'm afraid it will create tons of unnecessary work constructing the output in XSL, as the number of required changes is negligible compared to the amount of data that stays exactly the same.

My questions:

  1. Would you recommend trying to do this raw, using just DOM capabilities (I work in .NET 3.5) and some XPath? The downside is that if the number of required changes increases, it may add unnecessary complexity to the code.

  2. If you still recommend XSLT, is there some way in which I can just copy chunks from the input XML as is instead of recreating them from scratch?

Thank you!

Answers


Excellent question.

It's certainly true that XSLT can be expensive when you only want to make a very small change to a document. The cost is especially noticeable if you want to make lots of iterative transformations, which sometimes happens in optimization use-cases where each transformation creates a dataset that is a small improvement on the previous one.

In many cases, however, the cost of making a small transformation is dominated by the parsing and serialization costs (converting lexical XML to a tree and back), so other approaches (e.g. DOM update or XQuery update) that also involve parsing and serialization are not going to be any better. So in answer to your question 1, I don't think using DOM would be any better. The only real way to achieve an improvement is to use an XML database, which allows you to avoid the parsing and serialization cost because the document is held persistently in tree form rather than lexical form.

On your question 2, yes, you can copy chunks of the document unchanged using xsl:copy-of, and on any decent XSLT processor this should be very efficient. Certainly the cost of doing this copy from source tree to result tree is likely to be much less than the cost of parsing to construct the source tree, or serialization to dispose of the result tree. (Actual results, of course, will vary from one XSLT processor to another.)
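To make the copying concrete, here is a minimal sketch of a stylesheet that leaves almost everything untouched and only describes the changes. It combines xsl:copy-of with the common identity-template pattern, which is my addition here and only one way to structure this. The element names catalog, item, note and meta are hypothetical stand-ins for your own vocabulary. It is written as XSLT 1.0, which is the version XslCompiledTransform in .NET 3.5 supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies any node or attribute that no other
       template claims, so unchanged data needs no further code. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Move, part 1: suppress each <note> at its original location
       (hypothetical element names throughout). -->
  <xsl:template match="item/note"/>

  <!-- Move, part 2: after processing the rest of <catalog>, copy all
       notes wholesale into a new <notes> element at the end. -->
  <xsl:template match="catalog">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <notes>
        <xsl:copy-of select="item/note"/>
      </notes>
    </xsl:copy>
  </xsl:template>

  <!-- Merge: put the attributes of a child <meta> onto its parent
       <item> and drop the <meta> element itself. If both carry an
       attribute with the same name, the one from <meta> wins because
       it is added last. -->
  <xsl:template match="item">
    <xsl:copy>
      <xsl:copy-of select="@*|meta/@*"/>
      <xsl:apply-templates select="node()[not(self::meta)]"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this shape, the size of the stylesheet grows with the number of changes rather than with the size of the document, which addresses the worry about having to recreate the unchanged parts.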

