Unwanted regex capturing

The following regex is supposed to match a date of the form YYYY-MM-DD sandwiched between two non-alphanumeric characters. It's supposed to extract only the date and not the two non-alphanumeric characters, but it does the opposite. What am I doing wrong? PS: I already tried wrapping the delimiter classes in a non-capturing group (?:), but it didn't work.

regex exp1("[^[:alnum:]]([1-9][0-9]{3}(?:-[0-9][1-9]){2})[^[:alnum:]]")
//or
regex exp1("[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9][1-9]){2})[^a-zA-Z0-9]")

You can also go to this website to try my regex without having to write C++ code for it. Copy and paste the non-POSIX bracket expression (without the quotation marks) if you choose to use the site:

regex online tester

#include <regex>
#include <string>
#include <iostream>
#include <vector>

#define isthirty(x) for (int i = 0; i < 4; i++) {if (days[i] == x[1]) {thirty = true;break;}}
using namespace std;

int main() {
    vector<string> words;
    string str;
    getline(cin, str);
    int N = stoi(str);
    int days[] = { 4,6,9,11 };
    regex exp1("[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9][1-9]){2})[^a-zA-Z0-9]");
    for (int i = 0; i < N; i++) {
        getline(cin, str);
        sregex_iterator it(str.cbegin(), str.cend(), exp1);
        sregex_iterator end;
        for (; it != end; it++) {
            words.push_back(it->str(0));
        }
    }

    regex exp2("([0-9])+");
    for (auto &it : words) {
        int dates[3] = {};
        sregex_iterator pos(it.cbegin(), it.cend(), exp2);
        sregex_iterator end;
        str = it.substr(1,10);
        for (int i = 0; pos != end; pos++, i++) {
            dates[i] = stoi(pos->str(0));
        }
        if (dates[0] > 2016 || dates[1] > 12 || dates[2] > 31) {
            continue;
        }
        bool thirty = false;
        isthirty(dates);
        if (thirty && dates[2] <= 30) {
            cout << str << "\n";
        }
        else if(dates[1] == 2) {
            if (dates[0] % 4 == 0 && dates[2] <= 29) {
                cout << str << "\n";
            }
            else if (dates[0] % 4 != 0 && dates[2] <= 28) {
                cout << str << "\n";
            }
        }
        else if (dates[2] <= 31) {
            cout << str << "\n";
        }
    }
    return 0;
}

Answers


Try a simpler regexp:

[^0-9]([0-9]{4}-[0-9]{2}-[0-9]{2})[^0-9]

It looks for a non-digit, then the YYYY-MM-DD date, then a non-digit. It captures the date. Works for almost all regexp flavours.
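As a minimal sketch of how that pattern might be used from C++, the snippet below runs it through std::regex_search and reads the date from capture group 1. The helper name first_date is hypothetical, and returning an empty string for "no match" is just a convention chosen for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first YYYY-MM-DD that sits between two
// non-digit characters, or an empty string if the line contains none.
std::string first_date(const std::string& line) {
    static const std::regex pattern("[^0-9]([0-9]{4}-[0-9]{2}-[0-9]{2})[^0-9]");
    std::smatch m;
    if (std::regex_search(line, m, pattern)) {
        // m[0] is the whole match, including both delimiters;
        // m[1] is only the parenthesised date.
        return m[1].str();
    }
    return std::string();
}
```

Because the pattern requires a non-digit on both sides, a date at the very start or end of the line would not match with this sketch.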


In the regex you've provided, the overall regex (a.k.a. group 0) will include the two non-alphanum characters, but capture group 1 should only contain the date you're interested in. So, you could just use your regex as-is and then extract the information from group 1.
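A rough sketch of that approach: iterate over the matches as the question does, but store it->str(1) instead of it->str(0). The function name extract_dates is made up for this example. The pattern is a variant of the question's regex with [0-9]{2} for month and day, an assumption made here because [0-9][1-9] would reject values such as 10, 20, and 30.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every date in `line` that is surrounded by
// non-alphanumeric characters, without the surrounding characters.
std::vector<std::string> extract_dates(const std::string& line) {
    static const std::regex pattern(
        "[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9]{2}){2})[^a-zA-Z0-9]");
    std::vector<std::string> dates;
    std::sregex_iterator end;
    for (std::sregex_iterator it(line.cbegin(), line.cend(), pattern); it != end; ++it) {
        // str(0) would include both delimiters; str(1) is just the date.
        dates.push_back(it->str(1));
    }
    return dates;
}
```

With group 1 stored directly, a later step such as substr(1, 10) to trim the delimiters should no longer be needed.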

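If you actually want the overall match to exclude the non-alphanumeric characters, look into a "positive lookbehind assertion" for the leading delimiter and a "positive lookahead assertion" for the trailing one. Although assertions look like groups, they don't include what they matched in the result. Be aware that std::regex's default ECMAScript grammar supports lookahead but not lookbehind, so in C++ the leading delimiter still has to be matched and skipped via group 1 (or you need a library such as Boost.Regex that supports lookbehind).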
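The sketch below shows the lookahead half only, since std::regex's default ECMAScript grammar has no lookbehind. The trailing delimiter is checked with (?=...) and therefore not consumed, while the leading delimiter is still matched normally and skipped by reading group 1. The function name dates_with_lookahead and the [0-9]{2} month/day pattern are assumptions for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: like a plain capture-group search, but the trailing
// delimiter is only asserted, not consumed, so it can also serve as the
// leading delimiter of the next date.
std::vector<std::string> dates_with_lookahead(const std::string& line) {
    static const std::regex pattern(
        "[^a-zA-Z0-9]([1-9][0-9]{3}(?:-[0-9]{2}){2})(?=[^a-zA-Z0-9])");
    std::vector<std::string> dates;
    std::sregex_iterator end;
    for (std::sregex_iterator it(line.cbegin(), line.cend(), pattern); it != end; ++it) {
        dates.push_back(it->str(1));  // group 1: the date without delimiters
    }
    return dates;
}
```

One practical effect: for input like " 2015-01-02 2016-03-04 ", where a single space separates two dates, both dates are found. A pattern that consumes the trailing delimiter would use up that space in the first match and miss the second date.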

