How to remove DataFrame rows where a column's values are in a set?

I have a set called remove_set.

I want to remove all rows in a dataframe where a column value is in that set.

df = df[df.column_in_set not in remove_set]

This gives me the error:

'Series' objects are mutable, thus they cannot be hashed. 

What is the most pandas/pythonic way to solve this problem? I could iterate through the rows and work out the iloc positions to exclude, but that seems a little inelegant.

Here is some sample input and the expected output.

Input:

column_in_set value_2 value_3
1             'a'      3
2             'b'      4
3             'c'      5
4             'd'      6

remove = set([2,4])

Output:

column_in_set value_2 value_3
1             'a'      3
3             'c'      5

Answers


To make the selection you can write:

df[~df['column_in_set'].isin(remove)]

isin() checks whether each value in the column (a Series) is contained in the given set (or list, or other list-like), and returns a boolean Series of the same length, with True where the value is present.
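To see the intermediate boolean Series, here is a short sketch that rebuilds the sample input from the question as a DataFrame and prints the mask produced by isin(). A plain Python set is passed directly, since isin() accepts a set as well as any list-like:

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    "column_in_set": [1, 2, 3, 4],
    "value_2": ["a", "b", "c", "d"],
    "value_3": [3, 4, 5, 6],
})
remove = {2, 4}

# One boolean per row: True where the value is in `remove`
mask = df["column_in_set"].isin(remove)
print(mask.tolist())  # [False, True, False, True]
```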

In this case, we want to keep only the rows of the DataFrame whose value is not in remove, so we invert the boolean values with ~ and then use the result to index the DataFrame.
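Putting the two steps together on the sample data gives the expected output from the question. This is a self-contained sketch that reuses the question's column names:

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    "column_in_set": [1, 2, 3, 4],
    "value_2": ["a", "b", "c", "d"],
    "value_3": [3, 4, 5, 6],
})
remove = {2, 4}

# isin() marks rows to drop; ~ flips the mask so True marks rows to keep
result = df[~df["column_in_set"].isin(remove)]
print(result)
```

Boolean indexing keeps the original row labels, so result has index 0 and 2. If you want a fresh 0..n-1 index instead, chain .reset_index(drop=True) onto the result.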

