XSLT: split one HTML file into n HTML files

I have a large number of files structured like the following a.html:

<html>
  <body>
    <div class="a">aaa
      <div class="b">bbb</div>
      <div class="c">ccc1
        <div class="d">ddd11
          <div class="e">eee11</div>
          <div class="f">fff11
            <div class="g">ggg111</div>
            <div class="g">ggg112</div>
            <div class="g">ggg113</div>
            <div class="g">ggg114</div>
            <div class="g">ggg115</div>
            <div class="g">ggg116</div>
          </div>
        </div>
      </div>
      <div class="c">ccc2
        <div class="d">ddd21
          <div class="e">eee21</div>
          <div class="f">fff21
            <div class="g">ggg211</div>
            <div class="g">ggg212</div>
            <div class="g">ggg213</div>
            <div class="g">ggg214</div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>

The number of div class="c" elements is a known single-digit integer; in this case it is 2.

I would like to generate the files a_1.html and a_2.html, containing the 1st and the 2nd occurrence of div class="c" respectively.

In this example, the two output files should look as follows:

a_1.html

<html>
  <body>
    <div class="a">aaa
      <div class="b">bbb</div>
      <div class="c">ccc1
        <div class="d">ddd11
          <div class="e">eee11</div>
          <div class="f">fff11
            <div class="g">ggg111</div>
            <div class="g">ggg112</div>
            <div class="g">ggg113</div>
            <div class="g">ggg114</div>
            <div class="g">ggg115</div>
            <div class="g">ggg116</div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>

a_2.html

<html>
  <body>
    <div class="a">aaa
      <div class="b">bbb</div>
      <div class="c">ccc2
        <div class="d">ddd21
          <div class="e">eee21</div>
          <div class="f">fff21
            <div class="g">ggg211</div>
            <div class="g">ggg212</div>
            <div class="g">ggg213</div>
            <div class="g">ggg214</div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>

I have a shell script like the following:

#!/bin/bash
for i in {1..2}
do
  xsltproc --param occurrence ${i} a.xslt a.html > a_${i}.html
done

However, my a.xslt does not extract only the i-th (here the first or second) occurrence of div class="c":

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="occurrence"/>

 <xsl:template match="@* | node()">
  <xsl:copy>
   <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="div[@class='a']">
  <xsl:copy>
   <xsl:apply-templates select="div[@class='a']" />
   <xsl:apply-templates select="@* | div[@class='b']  | text()" />
   <xsl:apply-templates select="div[@class='c']" />
  </xsl:copy>
 </xsl:template>

</xsl:stylesheet>

How could I modify it to get the correct result?

Thank you in advance for your help.

Answers


If you want to stay with your current approach, you only need to change select="div[@class='c']" to:

<xsl:apply-templates select="div[@class='c'][position()=$occurrence]" />

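Note, however, that the <xsl:apply-templates select="div[@class='a']" /> placed before the apply-templates for the attributes (@*) is wrong: attributes have to be added to the copied element before any child nodes, and in this input that select matches nothing anyway, because div class="a" has no child div class="a". Therefore try: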

<xsl:template match="div[@class='a']">
    <xsl:copy>
        <xsl:apply-templates select="@* | div[@class='b']  | text()" />
        <xsl:apply-templates select="div[@class='c'][position()=$occurrence]" />
    </xsl:copy>
</xsl:template>
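
Put together with the identity template and output settings from the question, the whole stylesheet might look like the sketch below. It assumes the parameter is still called occurrence and is supplied by xsltproc as in the question's script. The explicit position() = $occurrence comparison is kept on purpose: the shorthand [$occurrence] only selects by position when the parameter is a number, and would be true for every div if the value were passed as a string (for example with xsltproc's --stringparam).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- 1-based index of the div class="c" to keep, supplied by the caller. -->
 <xsl:param name="occurrence"/>

 <!-- Identity template: copy everything that no other template handles. -->
 <xsl:template match="@* | node()">
  <xsl:copy>
   <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
 </xsl:template>

 <!-- For the outer div: copy attributes, text and div class="b" first,
      then only the requested div class="c". -->
 <xsl:template match="div[@class='a']">
  <xsl:copy>
   <xsl:apply-templates select="@* | div[@class='b'] | text()"/>
   <xsl:apply-templates select="div[@class='c'][position() = $occurrence]"/>
  </xsl:copy>
 </xsl:template>

</xsl:stylesheet>
```

Saved as a.xslt, this can be run with the unchanged loop from the question, which calls xsltproc once per value of occurrence.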

Alternatively, keep the identity template and filter div class="c" in a template of its own:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:param name="occurrence"/>

 <xsl:template match="@* | node()">
  <xsl:copy>
   <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="div[@class='c']">
   <xsl:variable name="pos">
     <xsl:number count="div[@class = 'c']"/>
   </xsl:variable>
   <xsl:if test="$pos = $occurrence">
     <xsl:copy-of select="."/>
   </xsl:if>
 </xsl:template>

</xsl:stylesheet>
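
With its default settings, xsl:number here counts the current div class="c" plus its preceding siblings that match the count pattern, so $pos is the element's 1-based rank among its sibling div class="c" elements. As a sketch of an equivalent formulation (not part of the original answer), the same rank can be computed directly with count() and the preceding-sibling axis. The test stays inside the template body because XSLT 1.0 does not allow variable or parameter references in a match pattern.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- 1-based index of the div class="c" to keep, supplied by the caller. -->
 <xsl:param name="occurrence"/>

 <!-- Identity template: copy everything that no other template handles. -->
 <xsl:template match="@* | node()">
  <xsl:copy>
   <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
 </xsl:template>

 <!-- Copy a div class="c" only if its rank among sibling div class="c"
      elements equals the parameter; otherwise output nothing for it. -->
 <xsl:template match="div[@class='c']">
  <xsl:if test="count(preceding-sibling::div[@class='c']) + 1 = $occurrence">
   <xsl:copy-of select="."/>
  </xsl:if>
 </xsl:template>

</xsl:stylesheet>
```

Because the filtering happens in the template for div class="c" itself, this version does not depend on the elements sitting directly under div class="a".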
