How Do I Remove All Forward Slashes Between XML Tags While Leaving the Text?

I have some VERY large XML files. I need to remove all forward slashes that appear between the opening and closing XML tags; the forward slashes can be replaced with spaces. I need to do this without removing the forward slashes from the closing XML tags. Any help is greatly appreciated!

This:

<XML>
<REDACTED27> CT LSPINE W/O CONT XR29 </REDACTED27>
<sampletag>str1/str2/str3</sampletag>
</XML>

Becomes This:

<XML>
<REDACTED27> CT LSPINE W O CONT XR29 </REDACTED27>
<sampletag>str1 str2 str3</sampletag>
</XML>

Answers


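Lacking an XML-aware tool, this works for simple structures. Because each match consumes the character on either side of the slash, only the first slash is replaced in a sequence of one-character segments such as a/b/c. The test file here also contained a self-closing <test/> element, which is left alone: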

$ sed -r 's_([^<])/([^>])_\1 \2_g' xml

<XML>
<REDACTED27> CT LSPINE W O CONT XR29 </REDACTED27>
<sampletag>str1 str2 str3</sampletag>
</XML>
<test/>
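The "simple structures" limitation is easier to see with a minimal Python translation of the same regex. sed_style is a hypothetical helper name. Python's re.sub, like sed's g flag, replaces non-overlapping matches from left to right, so these inputs behave the same way. A one-character segment between two slashes is consumed by the first match, and the second slash is then skipped:

```python
import re

# Python translation of the sed pattern ([^<])/([^>]) -> \1 \2.
# Each match consumes one character on each side of the slash, and
# re.sub (like sed's g flag) never revisits characters already consumed.
SED_PATTERN = re.compile(r"([^<])/([^>])")


def sed_style(line: str) -> str:
    """Hypothetical helper: apply the sed substitution to one line."""
    return SED_PATTERN.sub(r"\1 \2", line)


print(sed_style("<sampletag>str1/str2/str3</sampletag>"))
# <sampletag>str1 str2 str3</sampletag>
print(sed_style("<test/>"))
# <test/>   (the slash is followed by '>', so it is kept)
print(sed_style("<a>x/y/z</a>"))
# <a>x y/z</a>   (the 'y' was consumed by the first match)
```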

Use an XML-aware tool that parses the actual XML. For example, in xsh you can just write:

open file.xml ;
for //text() set . xsh:subst(., '/', ' ', 'g') ;
save :b ;
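For comparison, here is a minimal sketch of the same text-node-only substitution using Python's standard-library ElementTree. This is a swapped-in tool, not xsh, and replace_slashes_in_text and process_file are hypothetical names. ElementTree has no separate text-node objects. The text that //text() selects is stored in each element's .text (after the start tag) and .tail (after the end tag), so the sketch rewrites both. Keep these assumptions in mind:

- ET.parse loads the whole document into memory.
- The default parser drops comments and processing instructions.
- Namespace prefixes may be renamed on output.

```python
import xml.etree.ElementTree as ET


def replace_slashes_in_text(root: ET.Element) -> None:
    """Replace '/' with ' ' in every text node; tags and attributes are untouched."""
    for elem in root.iter():
        # .text is the text right after the start tag,
        # .tail is the text right after the end tag.
        if elem.text:
            elem.text = elem.text.replace("/", " ")
        if elem.tail:
            elem.tail = elem.tail.replace("/", " ")


def process_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper for real files (loads the whole tree into memory)."""
    tree = ET.parse(src_path)
    replace_slashes_in_text(tree.getroot())
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)


sample = """<XML>
<REDACTED27> CT LSPINE W/O CONT XR29 </REDACTED27>
<sampletag>str1/str2/str3</sampletag>
</XML>"""

root = ET.fromstring(sample)
replace_slashes_in_text(root)
result = ET.tostring(root, encoding="unicode")
print(result)
# <XML>
# <REDACTED27> CT LSPINE W O CONT XR29 </REDACTED27>
# <sampletag>str1 str2 str3</sampletag>
# </XML>
```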

It's best not to do this with sed, awk, or any other plain-text editing utility.

Use an XML processing tool instead, for example an XSLT processor.

The following transformation copies the input unchanged (except for indenting it nicely; to keep the original layout, set indent="no" and remove the xsl:strip-space line) and only modifies text nodes (i.e. the stuff "between the opening and closing XML tags"):

<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*" />

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:value-of select="translate(., '/', ' ')" />
    </xsl:template>
</xsl:transform>

Save it as, e.g., removeslashes.xsl and run it with xsltproc on the command line:

xsltproc -o outputfile.xml removeslashes.xsl inputfile.xml

Alternatively, install xmlstarlet and run:

xmlstarlet pyx source.xml | perl -pe 'm/^-/ && s/\// /g' | xmlstarlet p2x > target.xml

In PYX, lines starting with "-" denote text nodes. The Perl one-liner therefore replaces slashes only on those lines, and xmlstarlet p2x assembles the XML back without touching anything else.
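The same "only touch text events" idea can be sketched in Python with the standard-library SAX parser. SAX streams the input instead of loading it into memory, which matters for very large files. This is a swapped-in technique, not part of the xmlstarlet answer, and SlashToSpaceGenerator and replace_slashes_stream are hypothetical names. XMLGenerator re-serializes every parse event, and only characters() is overridden. Verify these assumptions on your data:

- Comments and the DOCTYPE are not delivered to a plain content handler, so they are dropped.
- CDATA sections come out as ordinary escaped text.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SlashToSpaceGenerator(XMLGenerator):
    """Re-serializes SAX events, replacing '/' with ' ' in character data only."""

    def characters(self, content):
        # Only text reaches this method; tag names and the '/' of closing
        # or self-closing tags are written by startElement/endElement.
        super().characters(content.replace("/", " "))


def replace_slashes_stream(source, out) -> None:
    """source: filename or binary file object; out: text file object."""
    handler = SlashToSpaceGenerator(out, encoding="utf-8",
                                    short_empty_elements=True)
    xml.sax.parse(source, handler)


sample = b"""<XML>
<REDACTED27> CT LSPINE W/O CONT XR29 </REDACTED27>
<sampletag>str1/str2/str3</sampletag>
<test/>
</XML>"""

buf = io.StringIO()
replace_slashes_stream(io.BytesIO(sample), buf)
result = buf.getvalue()
print(result)
```

For real files, you would open the input in binary mode and the output as UTF-8 text and pass both file objects to replace_slashes_stream.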


Since you included the notepad++ tag, I suggest a Replace All (Ctrl+H) with the search mode set to "Regular expression", this regex in "Find what", and a single space in "Replace with":

(?<!<)/(?!>)

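The negative lookbehind (?<!<) makes sure that no < comes directly before the /, and the negative lookahead (?!>) makes sure that no > comes directly after it. The slashes of closing tags (</tag>) and self-closing tags (<tag/>) are therefore kept. This assumes that "</" and "/>" never appear outside tags, which keeps the pattern simple and fast for your "VERY large XML files". Note that, like the sed answer, the pattern cannot tell text from markup. Slashes inside attribute values (e.g. href="http://...") and comments are replaced too.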
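The Notepad++ pattern also works outside the editor. Below is a minimal Python sketch that applies it line by line, so memory use stays flat however large the file is. Line-by-line processing is safe for this pattern because the "</" or "/>" of a tag is never split across a line break. replace_slashes and process_file are hypothetical names, and the sketch inherits the pattern's limitations described above. The lookarounds match without consuming characters, so unlike the sed version, a/b/c is fully handled:

```python
import re
from typing import Iterable, Iterator

# The Notepad++ pattern: a '/' not preceded by '<' and not followed by '>'.
# Lookarounds consume nothing, so consecutive slashes are all replaced.
SLASH_PATTERN = re.compile(r"(?<!<)/(?!>)")


def replace_slashes(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with eligible slashes replaced by a space."""
    for line in lines:
        yield SLASH_PATTERN.sub(" ", line)


def process_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: stream a large file line by line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(replace_slashes(src))


sample = ["<sampletag>a/b/c</sampletag>\n", "<test/>\n"]
print("".join(replace_slashes(sample)), end="")
# <sampletag>a b c</sampletag>
# <test/>
```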

