How to find all words followed by a symbol using Python regex?

I need re.findall to detect words that are followed by a "="

So it works for an example like

re.findall(r'\w+(?=[=])', "I think Python=amazing")

but it won't work for "I think Python = amazing" or "Python =amazing". I do not know how to allow for the optional whitespace around the "=" properly.

Thanks a bunch!

Answers


import re

# Pattern: r'(\w+)\s*=\s*'
re.findall(r'(\w+)\s*=\s*', 'I think Python=amazing')    # returns ['Python']
re.findall(r'(\w+)\s*=\s*', 'I think Python = amazing')  # returns ['Python']
re.findall(r'(\w+)\s*=\s*', 'I think Python =amazing')   # returns ['Python']

You said "Again stuck in the regex", probably in reference to your earlier question, "Looking for a way to identify and replace Python variables in a script". You got answers to the question you asked there, but I don't think you asked the question you really wanted answered.

You are looking to refactor Python code, and unless your tool understands Python, it will generate false positives and false negatives; that is, finding instances of variable = that aren't assignments and missing assignments that aren't matched by your regexp.

There is a partial list of tools at "What refactoring tools do you use for Python?", and a more general search for "refactoring Python your_editing_environment" will turn up more.
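To illustrate the false-positive point, here is a minimal sketch that compares a regex with Python's standard `ast` module on a made-up snippet. The `source` string is an assumption chosen for illustration, and the `ast` walk only handles plain `name = value` assignments, not augmented or annotated ones.

```python
import ast
import re

# Hypothetical input: one real assignment, one comparison, one keyword argument.
source = "total = 1\nif total == 1:\n    print(end='')\n"

# The regex matches every word followed by "=", whatever its role in the code.
regex_hits = re.findall(r'(\w+)\s*=', source)

# The parser knows which "=" signs are assignments.
assigned = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name):
                assigned.append(target.id)

print(regex_hits)  # ['total', 'total', 'end']
print(assigned)    # ['total']
```

The regex reports the comparison `total == 1` and the keyword argument `end=''` as if they were assignments, while the parser-based version reports only the real one.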


Just add some optional whitespace before the =:

\w+(?=\s*=)
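As a quick check, the pattern above can be run against the three spacings from the question. This is only a sketch; the names `pattern`, `samples` and `results` are mine.

```python
import re

# Lookahead that tolerates optional whitespace before the "=".
pattern = re.compile(r'\w+(?=\s*=)')

samples = [
    "I think Python=amazing",
    "I think Python = amazing",
    "Python =amazing",
]

results = [pattern.findall(s) for s in samples]
for sample, found in zip(samples, results):
    print(repr(sample), '->', found)  # each sample yields ['Python']
```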

Use this instead

re.findall(r'^(.+)(?=[=])', "I think Python=amazing")  # returns ['I think Python'], i.e. everything before the =

Explanation

# ^(.+)(?=[=])
# 
# Options: case insensitive
# 
# Assert position at the beginning of the string «^»
# Match the regular expression below and capture its match into backreference number 1 «(.+)»
#    Match any single character that is not a line break character «.+»
#       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=[=])»
#    Match the character “=” «[=]»

You need to allow for whitespace between the word and the =:

re.findall(r'\w+(?=\s*[=])', "I think Python = amazing")

You can also simplify the expression by using a capturing group around the word, instead of a lookahead around the equals sign:

re.findall(r'(\w+)\s*=', "I think Python = amazing")
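The two forms give the same `findall` result, but they differ in what the match itself covers: the lookahead leaves the whitespace and `=` unconsumed, while the capturing-group version consumes them and exposes the word as group 1. A small sketch using `re.search` (the variable names are mine):

```python
import re

text = "I think Python = amazing"

lookahead = re.search(r'\w+(?=\s*=)', text)
captured = re.search(r'(\w+)\s*=', text)

print(lookahead.group())   # prints: Python
print(captured.group())    # prints: Python =
print(captured.group(1))   # prints: Python
```

This matters mostly if you go on to use `re.sub` or match positions; for `findall` alone, either works.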

r'(.*)=.*' would do it as well.

The pattern reads as: anything (#1), followed by a =, followed by anything (#2). The capturing group returns anything #1, that is, everything before the =.

>>> re.findall(r'(.*)=.*', "I think Python=amazing")
['I think Python']
>>> re.findall(r'(.*)=.*', "  I think Python =    amazing oh yes very amazing   ")
['  I think Python ']
>>> re.findall(r'(.*)=.*', "=  crazy  ")
['']

Then you can strip() the string that is in the list returned.
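A sketch of that last step, wrapped in a hypothetical helper (`last_word_before_equals` is my name, not part of `re`). It assumes you want only the final word before the `=`, as in the question, and returns `None` when there is nothing to report.

```python
import re

def last_word_before_equals(text):
    """Return the last word before the '=', or None if there is none."""
    found = re.findall(r'(.*)=.*', text)
    if not found:
        return None  # no "=" in the text at all
    # strip() removes the surrounding whitespace, split() breaks into words.
    words = found[0].strip().split()
    return words[-1] if words else None

print(last_word_before_equals("I think Python=amazing"))  # prints: Python
print(last_word_before_equals("=  crazy  "))              # prints: None
```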


re.split(r'\s*=', "I think Python=amazing")[0].split() # returns ['I', 'think', 'Python']
