Java: Splitting a string into an array while keeping the matched patterns

I'm trying to pseudo-translate the text embedded within HTML in a string. I don't want to touch the actual HTML tags or their attributes, just the content.

So for example, if I have something like:

<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>

I want this to eventually be modified into:

<td colspan='2'><a>Thìs ís à Tèxt îñ <b>Bòlð</b></a></td>

1) I can't use any third-party libraries, so I'm using standard regex to parse the HTML.
2) I tried both pattern.matcher() and pattern.split(), but both seem to have limitations. pattern.split() splits the string on a regex pattern, but the matched text (the tags) is lost in the process. pattern.matcher() retains the matched text, but I can't guarantee that the surrounding markup is kept intact.
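On the point that split() loses the delimiter: one workaround is to split on zero-width lookarounds instead of on the tags themselves, so the split positions sit just before each '<' and just after each '>' and no characters are consumed. A minimal sketch, assuming Java 8 or later (earlier versions add an empty leading element when a zero-width match occurs at index 0) and markup with no stray '<' or '>' inside attribute values or text; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitKeepTags {

    // Splits at zero-width positions: right before every '<' and right after every '>'.
    // The lookarounds consume no characters, so the tags survive in the result array.
    // Any trailing empty string (after the final '>') is dropped by split() itself.
    static String[] splitKeepingTags(String html) {
        return html.split("(?=<)|(?<=>)");
    }

    public static void main(String[] args) {
        String html = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
        // Prints: [<td colspan='2'>, <a>, This is a Text in , <b>, Bold, </b>, </a>, </td>]
        System.out.println(Arrays.toString(splitKeepingTags(html)));
    }
}
```

Tags and text come out as separate elements in document order; telling them apart afterwards is just a check for a leading '<'.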

So ideally, I want something that takes the HTML string and breaks it into an array like:

array[0]: HTML Tag
array[1]: Plain Text
array[2]: HTML Tag
array[3]: Plain Text
array[4]: HTML Tag
array[5]: Plain Text
array[6]: HTML Tag
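A minimal sketch of such a tokenizer, assuming Java 8+ and reasonably well-formed markup (no '<' or '>' inside attribute values, comments, or scripts). The names HtmlTokenizer, Kind, Token, and tokenize are hypothetical. Instead of splitting, it walks the string with Matcher.find() using an alternation with one branch for a tag and one for a run of text, so nothing between matches is lost and each piece is labeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTokenizer {

    // Hypothetical token type: either a markup tag or a run of plain text.
    enum Kind { TAG, TEXT }

    static final class Token {
        final Kind kind;
        final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + ": " + value;
        }
    }

    // Group 1: a tag, from '<' to the next '>'. Group 2: text up to the next '<'.
    private static final Pattern TOKEN = Pattern.compile("(<[^>]*>)|([^<]+)");

    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add(new Token(Kind.TAG, m.group(1)));
            } else {
                tokens.add(new Token(Kind.TEXT, m.group(2)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
        for (Token t : tokenize(html)) {
            System.out.println(t);
        }
    }
}
```

Note that adjacent tags such as `<td colspan='2'><a>` come out as two consecutive TAG entries, so the sequence is not strictly alternating as in the array above; only TEXT entries need to be passed to the translation step.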

Any ideas?

Answers


As a regex, you could use this one:

(?<=>)[^>]+(?=<)

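I'm assuming here that you have a replace function that can take the matched text and transform it. Because the regex uses lookarounds, the whole match is the text between tags, so no capturing group is needed: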

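String str = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
// replaceAll returns a new String; "" is only a placeholder (it deletes the matched text).
String result = str.replaceAll("(?<=>)[^>]+(?=<)", "");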

However, without knowing how you intend to pseudo-translate a string, we can't really help you further. For custom replacement logic, this answer may be useful.
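For the custom replacement step, a minimal sketch using Matcher.appendReplacement() and appendTail() with the regex above. The pseudoTranslate mapping (a few vowels swapped for accented look-alikes) and the class and method names are hypothetical stand-ins for whatever pseudo-translation you actually use; the accented characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PseudoTranslate {

    // The regex from the answer: text preceded by '>' and followed by '<'.
    private static final Pattern TEXT_NODE = Pattern.compile("(?<=>)[^>]+(?=<)");

    // Hypothetical pseudo-translation: swap some lowercase vowels for accented look-alikes.
    // Replace this with whatever mapping your pseudo-localization actually needs.
    static String pseudoTranslate(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case 'a': sb.append('\u00e0'); break; // a with grave
                case 'e': sb.append('\u00e8'); break; // e with grave
                case 'i': sb.append('\u00ed'); break; // i with acute
                case 'o': sb.append('\u00f2'); break; // o with grave
                case 'u': sb.append('\u00f9'); break; // u with grave
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Rewrites only the text between tags; tags and attributes are copied through unchanged.
    static String translateTextNodes(String html) {
        Matcher m = TEXT_NODE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement stops '$' or '\' in the text from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(pseudoTranslate(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
        System.out.println(translateTextNodes(html));
    }
}
```

On the question's example this yields `<td colspan='2'><a>Thís ís à Tèxt ín <b>Bòld</b></a></td>`. The StringBuffer overload of appendReplacement is used because the StringBuilder overload only exists from Java 9 onward.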

