Regex - Match an end html tag if start tag is not present

I want to match a closing HTML tag such as </EM> only if there is no matching opening <EM> tag somewhere before it, i.e. before any previous tags or text. My sample string is:

ddd d<STRONG>dfdsdsd dsdsddd<EM>ss</EM>r and</EM>and strong</STRONG>

For this string the output should be the second </EM> only, because it lacks an opening <EM>. I have tried:

(?!=<EM>.*)</EM>

but it doesn't seem to work. Please help, thanks.

Answers


I am not sure regex is best suited for this kind of task, since tags can always be nested.

Anyhow, a C# regex like:

(?<!<EM>[^<]+)</EM>

would match only the second </EM> tag.

Note that:

  • ?! is a negative lookahead, which explains why both </EM> tags are found: (?!=<EM>.*)xxx means "match xxx only if the text starting at that position does not match =<EM>.*". Since </EM> starts with < and never with =, the assertion always succeeds. I am not sure you meant to include the = in there.
  • ?<! is a negative lookbehind, better suited to what you want to do, but it would not work with the Java regex engine, since this lookbehind pattern does not have an obvious maximum length.

However, with a .NET regex engine, as tested on RETester, it does work.
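
To make the two notes above concrete, here is a rough Python sketch (Python is my choice for illustration, not something the question asked for). Python's re module only accepts fixed-width lookbehind, so the .NET pattern cannot be used there directly; the helper lookbehind_emulation is a hypothetical stand-in that applies the same check by hand.

```python
import re

sample = "ddd d<STRONG>dfdsdsd dsdsddd<EM>ss</EM>r and</EM>and strong</STRONG>"

# The original attempt: the lookahead is tested where "</EM>" starts, and the
# text there begins with "<", never "=", so the assertion always succeeds.
attempt = [m.start() for m in re.finditer(r"(?!=<EM>.*)</EM>", sample)]
plain = [m.start() for m in re.finditer(r"</EM>", sample)]

# Hypothetical emulation of (?<!<EM>[^<]+)</EM>: keep a "</EM>" only when the
# text before it does not end with "<EM>" plus one or more non-"<" characters.
def lookbehind_emulation(text):
    preceded = re.compile(r"<EM>[^<]+\Z")
    return [m.start() for m in re.finditer(r"</EM>", text)
            if not preceded.search(text[:m.start()])]

kept = lookbehind_emulation(sample)

print(attempt == plain)  # the lookahead filters nothing
print(kept)              # offsets of the "</EM>" tags that survive the check
```

Keep in mind this check only looks at the text directly before the closing tag, so something like <EM>a<B>b</B></EM> would still be flagged, which is the nesting problem mentioned at the top.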


You need a pushdown automaton here. Regular expressions aren't powerful enough to capture this concept, since they are equivalent to finite-state automata, so a regex solution is strictly speaking a no-go.

That said, .NET regular expressions do have a pushdown automaton behind them so they can theoretically cope with such cases. If you really feel you need to do this with regular expressions rather than a formal HTML parser, take a glimpse here.
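
If a full HTML parser feels like overkill, the counting idea behind a pushdown automaton can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the TAG tokenizer and the unmatched_closing_tags function are made-up names, the tokenizer only understands simple <NAME ...> and </NAME> tags, and self-closing tags, comments and attribute values containing > are not handled.

```python
import re
from collections import Counter

# Hypothetical tokenizer: matches simple opening and closing tags only.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def unmatched_closing_tags(html, tag_name=None):
    """Return (offset, name) for each closing tag that has no unused opener before it."""
    open_counts = Counter()  # number of still-open tags per tag name
    unmatched = []
    for m in TAG.finditer(html):
        is_closing = m.group(1) == "/"
        name = m.group(2).upper()
        if not is_closing:
            open_counts[name] += 1   # remember the opener
        elif open_counts[name] > 0:
            open_counts[name] -= 1   # this closer uses up one opener
        elif tag_name is None or name == tag_name.upper():
            unmatched.append((m.start(), name))
    return unmatched

sample = "ddd d<STRONG>dfdsdsd dsdsddd<EM>ss</EM>r and</EM>and strong</STRONG>"
result = unmatched_closing_tags(sample, "EM")
print(result)  # only the second </EM> is reported for the sample string
```

A per-name counter is enough for the question as asked (is there an unused opening tag somewhere before this closing tag?). Checking that tags are also properly nested would need a real stack, and for anything beyond toy input a proper HTML parser is the safer route.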


You should see the top answer to this other Stack Overflow question, because it gives the perfect answer. In short, don't use regular expressions to try to parse HTML - it's a really bad idea.

