Java regex split with tag

text1 = Java programming #data#2016#/data#.
text2 = Java programming #core#2016#/core#.
text3 = Java programming #year#2016#/year#.
text4 = Java programming #data#2016.
text5 = Java programming #core#2016.
        or another combination.

I want the following result for all five texts (important: some tags are not closed):

Split[0] : Java programming 
Split[1] : 2016 

How can I solve this problem with a regex, or in some other way?

Answers


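The regex (.*) (.*) .*#(\d+)# will work for the inputs whose tag is closed. It requires a closing # right after the digits, so it does not match text4 or text5.

Note that this assumes the first and second fields do not contain any spaces.

You can use this regex in Java with Pattern and Matcher. Here is an example from http://www.ocpsoft.org/opensource/guide-to-regular-expressions-in-java-part-1/: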

import java.util.regex.*;

public class ReplaceDemo {
    public static void main(String[] args) {
        String input = 
                  "User clientId=23421. Some more text clientId=33432. This clientNum=100";

        Pattern p = Pattern.compile("(clientId=)(\\d+)");
        Matcher m = p.matcher(input);

        StringBuffer result = new StringBuffer();
        while (m.find()) {
            System.out.println("Masking: " + m.group(2));
            m.appendReplacement(result, m.group(1) + "***masked***");
        }
        m.appendTail(result);
        System.out.println(result);
    }
}
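
Applied to the inputs from the question, the same Pattern/Matcher approach might look like the sketch below. The class name TagMatchDemo and the helper extract are made up for illustration; only the regex itself comes from this answer. It returns null for an unclosed tag, which shows the limitation of this pattern.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMatchDemo {
    // Regex from the answer: two space-separated fields, then digits between # tokens.
    private static final Pattern P = Pattern.compile("(.*) (.*) .*#(\\d+)#");

    // Hypothetical helper: returns {text, number}, or null when the pattern does not match.
    static String[] extract(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1) + " " + m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Closed tag: both parts are found.
        System.out.println(Arrays.toString(extract("Java programming #data#2016#/data#.")));
        // Unclosed tag: there is no # after the digits, so nothing matches.
        System.out.println(Arrays.toString(extract("Java programming #data#2016.")));
    }
}
```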

EDIT: You can also split on the tags, which are delimited by # tokens. Here is an example:

public class RegexTest {
    public static void main(String[] args) {
        // Input text
        String text1 = "Java programming #data#2016#/data#.";

        // Split on every #...# tag (opening or closing)
        String[] text1Split = text1.split("#[^#]*#");

        // Print result
        System.out.println(text1 + ": ");
        for (int i = 0; i < text1Split.length; ++i) {
            System.out.println("Split[" + i + "] : " + text1Split[i]);
        }
    }
}

This will print:

Java programming #data#2016#/data#.:
Split[0] : Java programming 
Split[1] : 2016
Split[2] : .

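If you want to drop the dot, you can change the regex to "#[^#]*#\\.?" (written here as a Java string literal). This only removes a dot that directly follows a tag, so for an unclosed tag such as text4 the second part is still "2016.".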

In case you need something more intricate, more sophisticated parsing would be needed, given that some tags are not even closed. There is no generic solution for this problem, but you can write a simple parser that suits your needs.
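
As one possible starting point for such a parser, here is a minimal sketch that handles both closed and unclosed tags for inputs shaped like the five in the question. The class name TagSplitSketch, the method splitTagged, and the pattern are assumptions made for illustration, not part of the original answer. The pattern captures the text before the first opening tag, skips the tag, and then captures everything up to the next '#', the next '.', or the end of the input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSplitSketch {
    // Group 1: text before the first '#'. Then one opening tag such as #data#.
    // Group 2: the value, ending at the next '#', the next '.', or end of input.
    private static final Pattern TAGGED = Pattern.compile("^([^#]*)#[^#/]+#([^#.]*)");

    // Hypothetical helper: returns {leadingText, value}, or {input} if no tag is present.
    static String[] splitTagged(String input) {
        Matcher m = TAGGED.matcher(input);
        if (!m.find()) {
            return new String[] { input };
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Java programming #data#2016#/data#.",
            "Java programming #core#2016#/core#.",
            "Java programming #year#2016#/year#.",
            "Java programming #data#2016.",
            "Java programming #core#2016."
        };
        for (String input : inputs) {
            System.out.println(Arrays.toString(splitTagged(input)));
        }
    }
}
```

This sketch assumes a single tag per line and a value that contains neither '#' nor '.'. The leading text keeps its trailing space, as in the split example above; call trim() on it if that is not wanted.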

Keep in mind that regular expressions are not well suited to parsing HTML-like markup in general, so they may not be the best option if your tags can nest or vary more than in these examples.

