Global RegEx search and replace to fix HTML in an XML export file?

I have a fairly large WordPress .XML export file from a blog that I am going to migrate to Drupal. One glaring issue with the export file is that it's missing <p> tags for any paragraph breaks. However, the tags are present on the actual site.

From what I can see from the raw text in the XML file, there are multiple line breaks between paragraphs where there should have been a single <p> tag. I was hoping to globally add in a <p> tag where there's a line break and a capital letter using RegEx but I don't have a working knowledge of how that works. A sample XML tag in the export file that contains the text in question is:

<content:encoded><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur gravida risus at sem interdum iaculis. Curabitur eget est tellus, quis viverra arcu. 


Cras posuere turpis imperdiet odio aliquet sollicitudin. Maecenas et neque eget quam fringilla tempor. Vivamus sodales vulputate consectetur. 


Sed ullamcorper elementum est, at dapibus orci fermentum vitae. Vivamus nisi turpis, pretium sed tincidunt et, dapibus at eros. Quisque neque magna, posuere eget eleifend ut.

As you can see from the above, there are multiple line breaks in between what should be paragraphs. I was thinking of the line break / capital letter combo for the RegEx so as to only put in one <p> tag and also target specifically the <content:encoded> XML tag so that I don't add tags elsewhere in the XML file. One other issue to make things more complicated is that some paragraphs already have <p> tags where the editor added in a custom class like <p class="myclass">.

Answers


This issue has been discussed on Stack Overflow before. The problem is that WordPress doesn't store the <p> tags in its database (if you use its WYSIWYG editor); they are generated from the line breaks at render time by the wpautop() function. So I edited the export.php file (running WP 3.4.1) and added the function call there. You can see the result on Pastebin (the changes are on lines 375 and 376).

<content:encoded><?php echo wxr_cdata( apply_filters( 'the_content_export', wpautop( $post->post_content ) ) ); ?></content:encoded>
<excerpt:encoded><?php echo wxr_cdata( apply_filters( 'the_excerpt_export', wpautop( $post->post_excerpt ) ) ); ?></excerpt:encoded>

You can copy and paste the whole code into [root]/wp-admin/includes/export.php and run the export again. Don't forget to back up the file first. I can't guarantee it will work on other versions, but it should give you an idea of how to edit the export.
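
If re-running the export isn't an option, the existing file can also be post-processed along the lines the question describes. Below is a rough sketch in Python, not a port of wpautop(): it only touches text inside <content:encoded> CDATA sections, treats any run of blank lines as a paragraph break, and leaves chunks that already start with a <p> tag (such as <p class="myclass">) alone. The function names are made up for illustration. It assumes each post body sits in a single CDATA section as in the sample above, and it ignores single line breaks and other block-level elements such as lists or <pre>.

```python
import re

# Matches the CDATA body of each <content:encoded> element, so text
# elsewhere in the export file is never modified.
CONTENT_RE = re.compile(
    r"(<content:encoded><!\[CDATA\[)(.*?)(\]\]></content:encoded>)",
    re.DOTALL,
)
# A paragraph break: two or more newlines, with optional spaces/tabs around them.
BREAK_RE = re.compile(r"[ \t]*(?:\r?\n[ \t]*){2,}")
# A chunk that already begins with <p> or <p ...attributes...>.
HAS_P_RE = re.compile(r"<p[\s>]", re.IGNORECASE)


def add_paragraph_tags(body: str) -> str:
    """Wrap each blank-line-separated chunk in <p>...</p> unless it already has one."""
    wrapped = []
    for chunk in BREAK_RE.split(body.strip()):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces, e.g. from an empty post body
        if HAS_P_RE.match(chunk):
            wrapped.append(chunk)  # keep editor-added <p class="..."> as is
        else:
            wrapped.append(f"<p>{chunk}</p>")
    return "\n\n".join(wrapped)


def fix_export(xml_text: str) -> str:
    """Apply add_paragraph_tags to every <content:encoded> CDATA body."""
    return CONTENT_RE.sub(
        lambda m: m.group(1) + add_paragraph_tags(m.group(2)) + m.group(3),
        xml_text,
    )
```

To use it, read the export file as UTF-8 text, pass the contents to fix_export(), and write the result to a new file rather than overwriting the original, so the two can be compared before importing into Drupal.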

