Regex in pandas to find a match based on string in another column

I have a dataframe; this is part of it:

   CodeID    Codes
0  'code1'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
1  'code2'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
2  'code3'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
...

What I'm trying to do is extract, for each row, the part of the string in column Codes that matches the pattern r"\[<code in CodeID column>[^][]*\]", where the code is taken from the CodeID column of the same row.

Something like:

df['Code'] = df['Codes'].str.find(r"\[<code in CodeID column>[^][]*\]")

This recent question seems to imply it's not possible in a vectorised way but it's not exactly the same situation.

Answers


We can certainly use the string in one column to build a pattern for another by applying a function row-wise, as below.

In the lambda expression, x[0] is the CodeID value and x[1] is the Codes value for that row.

import re
import pandas as pd

Out[20]: 
    CodeID                                         Codes
0  'code1'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
1  'code2'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
2  'code3'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'

df[['CodeID','Codes']].apply(lambda x: re.match(r"\[%s[^][]*\]"%x[0], x[1]),axis=1)
Out[21]: 
0    None
1    None
2    None
dtype: object

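Well, it returns None for every row because of my bad regex skills :) Specifically, re.match only matches at the very start of the string, and the bracketed group is not at position 0; re.search looks for the pattern anywhere in the string. Judging by the output above, the values also appear to contain literal quote characters, which would stop the quoted CodeID from ever following a "[".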
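A minimal sketch of a version that does return the matching group, assuming the quotes shown in the output are not actually part of the stored values. The helper name extract_code and the sample frame are illustrative only, not taken from the original post. It builds the pattern per row, escapes the code with re.escape, and uses re.search so the group can be found anywhere in Codes:

```python
import re
import pandas as pd

# Sample data modelled on the question (assumption: no literal quotes in the values).
codes = "[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"
df = pd.DataFrame({"CodeID": ["code1", "code2", "code3"], "Codes": [codes] * 3})


def extract_code(code_id, codes):
    """Return the bracketed group in `codes` that starts with `code_id`, or None."""
    # re.escape guards against regex metacharacters in the code itself.
    pattern = r"\[" + re.escape(code_id) + r"[^][]*\]"
    # re.search scans the whole string; re.match would only try position 0.
    m = re.search(pattern, codes)
    return m.group(0) if m else None


# Still a row-by-row loop, not a vectorised operation.
df["Code"] = [extract_code(c, s) for c, s in zip(df["CodeID"], df["Codes"])]
print(df["Code"].tolist())
```

If the values really do contain the quote characters, they would need to be stripped first (for example with .str.strip("'")) before building the pattern.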

