XSL Remove preceding sibling based on Element value

Hi, I need help transforming the following XML.

<xmeml>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test> 
</Doc>
</xmeml>

so that I end up with the following:

<xmeml>
<Doc>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test> 
</Doc>
</xmeml>

I am trying to write an XSLT stylesheet to do this but have not yet found an approach that works. I should note that the values to match within <Unit>, in this case "abc", vary and will never be a fixed, searchable string.

So in English, my XSL would do this: for any parent 'Test' containing a matching 'Unit' value, delete all preceding 'Test' parents containing the same value in 'Unit', keeping only the last.

All help is much appreciated. Thanks.

Answers


You can use the identity template to copy the whole document and override that template with an empty template for elements that you want to delete. For checking whether a <Test> element should be deleted, you can compare its <Unit> value to that of following siblings.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/xmeml/Doc/Test[(Unit = following-sibling::Test/Unit) or (Unit = ../following-sibling::Doc/Test/Unit)][descendant::Unit2[starts-with(.,'1234')]]"/>

</xsl:stylesheet>

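Because a later matching <Unit> value can occur in two places, both conditions are written out explicitly and joined with the or operator. The second predicate, descendant::Unit2[starts-with(.,'1234')], further limits deletion to <Test> elements whose <Unit2> value starts with 1234, which is why the abc/3456 entry in the second <Doc> is kept.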

The two possibilities along with their respective XPath conditions are:

  • in a succeeding <Test> element in the same parent element as the current one: Unit = following-sibling::Test/Unit
  • in a <Test> element in a succeeding <Doc> sibling of the current <Doc> element: Unit = ../following-sibling::Doc/Test/Unit
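
The stylesheet above hard-codes '1234' in the match pattern. XSLT 1.0 does not allow variable references in a match pattern, so if that prefix has to be supplied at run time, one option is to move the test into the template body. The following is only a sketch under that assumption; the parameter name prefix and the variable name hasLaterDuplicate are made up for illustration and do not come from the original answer.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

    <!-- Hypothetical parameter: Unit2 prefix that marks a duplicate as removable -->
    <xsl:param name="prefix" select="'1234'"/>

    <!-- Identity template: copy everything by default -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/xmeml/Doc/Test">
        <!-- True when a later Test, in this Doc or in a following Doc, has the same Unit -->
        <xsl:variable name="hasLaterDuplicate"
            select="(Unit = following-sibling::Test/Unit) or (Unit = ../following-sibling::Doc/Test/Unit)"/>
        <!-- Copy the Test unless it is a removable duplicate -->
        <xsl:if test="not($hasLaterDuplicate and descendant::Unit2[starts-with(., $prefix)])">
            <xsl:copy>
                <xsl:apply-templates select="node()|@*"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

With the default value of the parameter, this should behave like the stylesheet above on the sample input. How a different value is passed in depends on the XSLT processor being used.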

Assuming this input:

<xmeml>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>bcd</Unit>
        <Unit2>2345</Unit2>
    </Test>
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>3456</Unit2>
    </Test>
    <Test>
        <Unit>cde</Unit>
        <Unit2>3456</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>def</Unit>
        <Unit2>4567</Unit2>
    </Test> 
</Doc>
<Doc>
    <Test>
        <Unit>abc</Unit>
        <Unit2>1234</Unit2>
    </Test>
    <Test>
        <Unit>efg</Unit>
        <Unit2>2345</Unit2>
    </Test> 
</Doc>
</xmeml>

The XSLT creates this output:

<xmeml>
  <Doc>
    <Test>
      <Unit>bcd</Unit>
      <Unit2>2345</Unit2>
    </Test>
  </Doc>
  <Doc>
    <Test>
      <Unit>abc</Unit>
      <Unit2>3456</Unit2>
    </Test>
    <Test>
      <Unit>cde</Unit>
      <Unit2>3456</Unit2>
    </Test>
  </Doc>
  <Doc>
    <Test>
      <Unit>def</Unit>
      <Unit2>4567</Unit2>
    </Test>
  </Doc>
  <Doc>
    <Test>
      <Unit>abc</Unit>
      <Unit2>1234</Unit2>
    </Test>
    <Test>
      <Unit>efg</Unit>
      <Unit2>2345</Unit2>
    </Test>
  </Doc>
</xmeml>

There are two ways to perform grouping in XSLT 1.0: a simple one that uses the preceding:: or following:: axis, and the more efficient Muenchian grouping method, which uses keys.
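
For comparison, the simple axis-based method could be sketched roughly as follows. This is an illustration rather than part of the original answer. It uses the following:: axis to find any later <Test> with the same <Unit> anywhere in the document, and it rescans the rest of the document for every <Test>, which is why it tends to scale worse than keys.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <!-- Identity template: copy everything by default -->
     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>

     <!-- Drop a Test when any later Test in document order has the same Unit -->
     <xsl:template match="Test[Unit = following::Test/Unit]"/>
</xsl:stylesheet>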

Here is the more efficient, Muenchian method:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:key name="kParentByUnit" match="Test[Unit]" use="Unit"/>

     <xsl:template match="node()|@*" name="identity">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>

     <xsl:template match="Test[Unit]">
       <xsl:if test=
       "generate-id()
       =
        generate-id(key('kParentByUnit', Unit)[position()=last()])">

        <xsl:call-template name="identity"/>
       </xsl:if>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<xmeml>
    <Doc>
        <Test>
            <Unit>abc</Unit>
            <Unit2>1234</Unit2>
        </Test>
        <Test>
            <Unit>bcd</Unit>
            <Unit2>2345</Unit2>
        </Test>
    </Doc>
    <Doc>
        <Test>
            <Unit>abc</Unit>
            <Unit2>3456</Unit2>
        </Test>
        <Test>
            <Unit>cde</Unit>
            <Unit2>3456</Unit2>
        </Test>
    </Doc>
    <Doc>
        <Test>
            <Unit>abc</Unit>
            <Unit2>1234</Unit2>
        </Test>
        <Test>
            <Unit>def</Unit>
            <Unit2>4567</Unit2>
        </Test>
    </Doc>
    <Doc>
        <Test>
            <Unit>abc</Unit>
            <Unit2>1234</Unit2>
        </Test>
        <Test>
            <Unit>efg</Unit>
            <Unit2>2345</Unit2>
        </Test>
    </Doc>
</xmeml>

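the following result is produced, in which only the last <Test> for each <Unit> value is kept (so, unlike the sample output in the question, the abc/3456 entry is removed as well):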

<xmeml>
   <Doc>
      <Test>
         <Unit>bcd</Unit>
         <Unit2>2345</Unit2>
      </Test>
   </Doc>
   <Doc>
      <Test>
         <Unit>cde</Unit>
         <Unit2>3456</Unit2>
      </Test>
   </Doc>
   <Doc>
      <Test>
         <Unit>def</Unit>
         <Unit2>4567</Unit2>
      </Test>
   </Doc>
   <Doc>
      <Test>
         <Unit>abc</Unit>
         <Unit2>1234</Unit2>
      </Test>
      <Test>
         <Unit>efg</Unit>
         <Unit2>2345</Unit2>
      </Test>
   </Doc>
</xmeml>
