Matching math expression with regular expression?

For example, these are valid math expressions:

a * b + c
-a * (b / 1.50)
(apple + (-0.5)) * (boy - 1)

And these are invalid math expressions:

--a *+ b @ 1.5.0  // two consecutive signs, two consecutive operators, invalid operator, invalid number
-a * b + 1)  // unmatched parentheses
a) * (b + c) / (d  // unmatched parentheses

I have no problem matching floating-point numbers, but I have difficulty matching parentheses. Any ideas? If there is a better solution than a regular expression, I'll accept that as well, but a regex is preferred.

========

Edit:

I want to make some comments on my choice of the “accepted answer”, hoping that people who have the same question and find this thread will not be misled.

There are several answers I would consider “accepted”, and I have no idea which one is the best, so I chose the accepted answer (almost) at random. Besides the accepted answer, I recommend reading Guillaume Malartre’s answer as well. All of them give practical solutions to my question. For a somewhat rigorous/theoretical answer, please read David Thornley’s comments under the accepted answer. As he mentioned, Perl’s extensions to regular expressions (which originated from regular languages) make them “irregular”. (I mentioned no language in my question, so most answerers assumed the Perl implementation of regular expressions – probably the most popular implementation. So did I when I posted my question.)

Please correct me if I said something wrong above.

Answers


Matching parens with a regex is quite possible.

Here is a Perl script that will parse arbitrarily deep matching parens. While it will throw out the non-matching parens outside, I did not design it specifically to validate parens. It will parse arbitrarily deep parens so long as they are balanced. This will get you started, however.

The key is recursion, both in the regex and in the way it is used. Play with it, and I am sure you can get it to flag non-matching parens as well. I think that if you capture what this regex throws away and look for parens there (i.e. test for stray parens in the non-matched text), you have found invalid, unbalanced parens.

#!/usr/bin/perl
use strict;
use warnings;

# The (?1) recursion below requires Perl 5.10 or later.
my $re = qr/
     (                      # start capture buffer 1
        \(                  #   match an opening paren
        (                   # capture buffer 2
        (?:                 #   match one of:
            (?>             #     don't backtrack over the inside of this group
                [^()]+      #       one or more non-paren characters
            )               #     end non-backtracking group
        |                   #     ... or ...
            (?1)            #     recurse to buffer 1 and try it again
        )*                  #   0 or more times
        )                   # end of buffer 2
        \)                  #   match a closing paren
     )                      # end capture buffer 1
    /x;


# Print every balanced group found in $str, then descend into its contents.
sub strip {
    my ($str) = @_;
    while ($str =~ /$re/g) {
        my ($match, $stripped) = ($1, $2);
        print "$match\n";
        # Keep looping afterwards so sibling groups such as "(a)(b)" are all visited.
        strip($stripped) if $stripped =~ /\(/;
    }
}

while(<DATA>) {
    print "start pattern: $_";
    while (/$re/g) { 
        strip($1) ;
    }
}   

__DATA__
"(apple + (-0.5)) * (boy - 1)"
"((((one)two)three)four)x(one(two(three(four))))"
"a) * (b + c) / (d"
"-a * (b / 1.50)"

Output:

start pattern: "(apple + (-0.5)) * (boy - 1)"
(apple + (-0.5))
(-0.5)
(boy - 1)
start pattern: "((((one)two)three)four)x(one(two(three(four))))"
((((one)two)three)four)
(((one)two)three)
((one)two)
(one)
(one(two(three(four))))
(two(three(four)))
(three(four))
(four)
start pattern: "a) * (b + c) / (d"
(b + c)
start pattern: "-a * (b / 1.50)"
(b / 1.50)

Use a pushdown automaton to match parentheses: http://en.wikipedia.org/wiki/Pushdown_automaton (or just a stack ;-) )

Details for the stack solution:

while (chr available)
    read next chr
    if chr == '(' then
        push '('
    else if chr == ')' then
        if stack.elements == 0 then
            print('too many or misplaced )')
            exit
        else
            pop  // from stack
        end if
    end if
end while
if stack.elements != 0 then
    print('too many or misplaced (')
end if

Even simpler: just keep a counter instead of a stack.
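
For what it's worth, the counter idea could be sketched in Python roughly like this. The function name parens_balanced is just something made up for illustration, and it only checks parentheses, nothing else about the expression:

```python
def parens_balanced(text):
    """Return True if every ')' closes an earlier '(' and none stay open."""
    depth = 0  # number of '(' currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False  # too many or misplaced ')'
            depth -= 1
    return depth == 0  # anything left open means too many or misplaced '('


for expr in ["(apple + (-0.5)) * (boy - 1)", "-a * b + 1)", "a) * (b + c) / (d"]:
    print(expr, "->", parens_balanced(expr))
```

The counter works because there is only one kind of bracket; with several kinds, for example ( and [, you would need the real stack again to remember which one has to close next.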


Regular expressions can only be used to recognize regular languages. The language of mathematical expressions is not regular; you'll need to implement an actual parser (e.g. LR) in order to do this.


I believe you will be better off implementing a real parser to accomplish what you're after.

A parser for simple mathematical expressions is "Parsing 101", and there are several examples to be found online.

Some examples include:

Note that the grammar you will need for validating expressions is simpler than the examples above, since the examples also implement evaluation of the expression.
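
To give a rough idea of how small a validation-only parser can be, here is a recursive-descent sketch in Python. The grammar is my own assumption based on the examples in the question: a sign is only allowed at the very start of an expression or right after an opening paren, and the operators are + - * /. The names tokenize and is_valid_expression are made up for this sketch.

```python
import re

# A token is a number, an identifier, or any other single non-space character.
TOKEN = re.compile(r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<id>[A-Za-z_]\w*)|(?P<sym>\S))")
OPERATORS = set("+-*/")


def tokenize(text):
    """Turn the text into a list: 'operand' for numbers/identifiers, else the symbol."""
    tokens = []
    for m in TOKEN.finditer(text):
        tokens.append("operand" if m.group("sym") is None else m.group("sym"))
    return tokens


def is_valid_expression(text):
    """Assumed grammar:
         expr    := [sign] operand (operator operand)*
         operand := number | identifier | '(' expr ')'
    """
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_operand():
        if peek() == "operand":
            advance()
            return True
        if peek() == "(":
            advance()
            if not parse_expr() or peek() != ")":
                return False  # bad contents or missing ')'
            advance()
            return True
        return False

    def parse_expr():
        if peek() in ("+", "-"):  # optional leading sign
            advance()
        if not parse_operand():
            return False
        while peek() in OPERATORS:
            advance()
            if not parse_operand():
                return False
        return True

    # Valid only if one expression consumes every token.
    return parse_expr() and pos == len(tokens)


for expr in ["a * b + c", "-a * (b / 1.50)", "--a *+ b @ 1.5.0", "a) * (b + c) / (d"]:
    print(expr, "->", "ok" if is_valid_expression(expr) else "invalid")
```

Since nothing is evaluated, operator precedence does not matter here, which is why a single parse_expr level is enough. If you want to allow things like a * -b, moving the optional sign into parse_operand would be the place to change it.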


You can't use regex to do things like balance parentheses.


This is tricky with a single regular expression, but quite easy using a mixed regexp/procedural approach. The idea is to construct a regexp for a simple expression (one without parentheses) and then repeatedly replace ( simple-expression ) with some atomic string (e.g. an identifier). If the final reduced expression matches the same "simple" pattern, the original expression is considered valid.

Illustration (in PHP):

function check_syntax($str) {

    // define the grammar
    $number = "\d+(\.\d+)?";
    $ident  = "[a-z]\w*";
    $atom   = "[+-]?($number|$ident)";
    $op     = "[+*/-]";
    $sexpr  = "$atom($op$atom)*"; // simple expression

    // step1. remove whitespace
    $str = preg_replace('~\s+~', '', $str);

    // step2. repeatedly replace parenthetic expressions with 'x'
    $par = "~\($sexpr\)~";
    while(preg_match($par, $str))
        $str = preg_replace($par, 'x', $str);

    // step3. no more parens, the string must be simple expression
    return preg_match("~^$sexpr$~", $str);
}


$tests = array(
    "a * b + c",
    "-a * (b / 1.50)",
    "(apple + (-0.5)) * (boy - 1)",
    "--a *+ b @ 1.5.0",
    "-a * b + 1)",
    "a) * (b + c) / (d",
);

foreach($tests as $t)
    echo $t, "=", check_syntax($t) ? "ok" : "nope", "\n";

The above only validates the syntax, but the same technique can also be used to construct a real parser.
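
In case it helps readers who don't use PHP, here is a hedged Python sketch of the same reduce-and-recheck idea. It is a variation rather than a line-by-line port: as assumptions of mine, it uses '#' as the placeholder (a character that cannot be part of an identifier, so that something like a(b) does not collapse into one name) and it keeps whitespace in the grammar instead of deleting it first.

```python
import re

# Grammar pieces, modelled on the approach described above.
NUMBER = r"\d+(?:\.\d+)?"
IDENT = r"[A-Za-z_]\w*"
ATOM = rf"[+-]?(?:{NUMBER}|{IDENT}|#)"   # '#' stands for an already reduced group
OP = r"[+*/-]"
SIMPLE = rf"{ATOM}(?:\s*{OP}\s*{ATOM})*"  # simple expression, no parentheses

PAREN = re.compile(rf"\(\s*{SIMPLE}\s*\)")
WHOLE = re.compile(SIMPLE)


def check_syntax(text):
    """Return True if text reduces to a single simple expression."""
    if "#" in text:
        return False  # the placeholder must not appear in the input itself
    text = text.strip()
    # Repeatedly replace innermost "( simple-expression )" groups with '#'.
    while True:
        text, count = PAREN.subn("#", text)
        if count == 0:
            break
    # No reducible parens are left; the rest must be a simple expression.
    return WHOLE.fullmatch(text) is not None


tests = [
    "a * b + c",
    "-a * (b / 1.50)",
    "(apple + (-0.5)) * (boy - 1)",
    "--a *+ b @ 1.5.0",
    "-a * b + 1)",
    "a) * (b + c) / (d",
]
for t in tests:
    print(t, "=", "ok" if check_syntax(t) else "nope")
```

Each pass only removes the innermost groups, so the loop runs once per nesting level plus a final pass that finds nothing.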


For parenthesis matching, and implementing other expression validation rules, it is probably easiest to write your own little parser. Regular expressions are no good in this kind of situation.


OK, here's my version of parenthesis finding in ActionScript 3. This approach gives you a lot of leverage to analyse the part before the parentheses, the part inside them, and the part after them. If any parentheses remain at the end, you can raise a warning or refuse to pass the string on to a final eval function.

package {
import flash.display.Sprite;
import mx.utils.StringUtil;
public class Stackoverflow_As3RegexpExample extends Sprite
{
    private var tokenChain:String = "2+(3-4*(4/6))-9(82+-21)"
    //Constructor
    public function Stackoverflow_As3RegexpExample() {
        // remove the "\" that just escape the following "\" if you want to test outside of flash compiler.
        var getGroup:RegExp = new RegExp("((?:[^\\(\\)]+)?)   (?:\\()       (  (?:[^\\(\\)]+)? )    (?:\\))        ((?:[^\\(\\)]+)?)", "ix")   //removed g flag
        while (true) {
            tokenChain = replace(tokenChain,getGroup)
            if (tokenChain.search(getGroup) == -1) break; 
        }
        trace("cummulativeEvaluable="+cummulativeEvaluable)
    }
    private var cummulativeEvaluable:Array = new Array()
    protected function analyseGrammar(matchedSubstring:String, capturedMatch1:String, capturedMatch2:String,  capturedMatch3:String, index:int, str:String):String {
        trace("\nanalyseGrammar str:\t\t\t\t'"+str+"'")
        trace("analyseGrammar matchedSubstring:'"+matchedSubstring+"'")
        trace("analyseGrammar capturedMatchs:\t'"+capturedMatch1+"'  '("+capturedMatch2+")'   '"+capturedMatch3+"'")
        trace("analyseGrammar index:\t\t\t'"+index+"'") 
        var blank:String = buildBlank(matchedSubstring.length)
        cummulativeEvaluable.push(StringUtil.trim(matchedSubstring))
        // I could do so much right here!
        return str.substr(0,index)+blank+str.substr(index+matchedSubstring.length,str.length-1)
    }
    private function replace(str:String,regExp:RegExp):String {
        var result:Object = regExp.exec(str)
        if (result)
            return analyseGrammar.apply(null,objectToArray(result)) 
        return str
    }
    private function objectToArray(value:Object):Array {
        var array:Array = new Array()
        var i:int = 0
        while (true) {
            if (value.hasOwnProperty(i.toString())) {
                array.push(value[i])
            } else {
                break;
            }
            i++
        }
        array.push(value.index)
        array.push(value.input)
        return array
    }
    protected function buildBlank(length:uint):String {
        var blank:String = ""
        while (blank.length != length)
            blank = blank+" "
        return blank
    }
}

}

It should trace this:

analyseGrammar str:             '2+(3-4*(4/6))-9(82+-21)'
analyseGrammar matchedSubstring:'3-4*(4/6)'
analyseGrammar capturedMatchs:  '3-4*'  '(4/6)'   ''
analyseGrammar index:           '3'

analyseGrammar str:             '2+(         )-9(82+-21)'
analyseGrammar matchedSubstring:'2+(         )-9'
analyseGrammar capturedMatchs:  '2+'  '(         )'   '-9'
analyseGrammar index:           '0'

analyseGrammar str:             '               (82+-21)'
analyseGrammar matchedSubstring:'               (82+-21)'
analyseGrammar capturedMatchs:  '               '  '(82+-21)'   ''
analyseGrammar index:           '0'
cummulativeEvaluable=3-4*(4/6),2+(         )-9,(82+-21)
