Java: Splitting a string into an array while keeping the matched patterns
I'm trying to pseudo-translate the text embedded within HTML in a string. I don't want to touch the actual HTML tags or their attributes, just the content.
So for example, if I have something like:
<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>
I want this to be eventually modified into
<td colspan='2'><a>Thìs ís à Tèxt îñ <b>Bòlð</b></a></td>
1) I can't use any third-party libraries, so I'm using standard regex to parse the HTML.
2) I tried both Pattern.matcher() and Pattern.split(), but both seem to have limitations. Pattern.split() splits the string on a regex pattern, but I lose the matched pattern itself in the process. Matching retains the pattern, but I can't guarantee the markup.
So ideally I would want something to take the string with HTML and break it into an array like
array: HTML Tag
array: Plain Text
array: HTML Tag
array: Plain Text
array: HTML Tag
array: Plain Text
array: HTML Tag
Any ideas?
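As a regex, you could use this one: (?<=>)[^>]+(?=<)
It matches a run of text that sits between a > and a following <, without consuming either bracket.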
I'm assuming here that you have a replace function that can take a captured group and mangle its text:
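String str = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
// Strings are immutable, so keep the return value of replaceAll.
// The empty replacement is a placeholder for the pseudo-translated text.
String result = str.replaceAll("(?<=>)[^>]+(?=<)", "");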
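If you do want the array of tags and text from the question, one option is to split on zero-width lookarounds so that nothing is consumed. The following is only a sketch: the class and method names are made up for illustration, it assumes Java 8 or later (earlier versions return a leading empty string), and it assumes no literal < or > appears inside text or attribute values.

```java
import java.util.Arrays;

class TagSplitter {
    // Split at every position that follows '>' or precedes '<'.
    // Both lookarounds are zero-width, so the tags themselves are kept.
    static String[] splitKeepingTags(String html) {
        return html.split("(?<=>)|(?=<)");
    }

    public static void main(String[] args) {
        String str = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
        // Tags and text runs come back as alternating array elements.
        System.out.println(Arrays.toString(splitKeepingTags(str)));
    }
}
```

Elements that start with < are tags; everything else is text you can modify before joining the array back together.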
However, without knowing how you intend to "pseudotranslate" a string, we can't really help you further. For custom replacement methods, this answer may be useful.
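As a rough sketch of such a custom replacement using only the standard library, Matcher.appendReplacement lets you compute each replacement in code. The accent() mapping below is purely hypothetical (it only swaps lowercase vowels for grave-accented ones and will not reproduce the exact output in the question), and the class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PseudoTranslator {
    // Text runs between a '>' and a following '<' (same regex as above).
    private static final Pattern TEXT = Pattern.compile("(?<=>)[^>]+(?=<)");

    // Hypothetical mapping: replace lowercase vowels with grave-accented ones.
    static String accent(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case 'a': out.append('\u00e0'); break; // a with grave
                case 'e': out.append('\u00e8'); break; // e with grave
                case 'i': out.append('\u00ec'); break; // i with grave
                case 'o': out.append('\u00f2'); break; // o with grave
                case 'u': out.append('\u00f9'); break; // u with grave
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    static String pseudoTranslate(String html) {
        Matcher m = TEXT.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(accent(m.group())));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "<td colspan='2'><a>This is a Text in <b>Bold</b></a></td>";
        System.out.println(pseudoTranslate(str));
    }
}
```

Only the matched text runs are passed through accent(); the tags and attributes are copied through unchanged by appendReplacement and appendTail.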