How do I filter a pandas DataFrame based on value counts?
I'm working in Python with a pandas DataFrame of video games, each with a genre. I'm trying to remove any video game whose genre appears fewer than some number of times in the DataFrame, but I have no clue how to go about this. I did find a Stack Overflow question that seems to be related, but I can't decipher the solution at all (possibly because I've never heard of R and my memory of functional programming is rusty at best).
Use groupby filter:
    In : df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

    In : df
    Out:
       A  B
    0  1  2
    1  1  4
    2  5  6

    In : df.groupby("A").filter(lambda x: len(x) > 1)
    Out:
       A  B
    0  1  2
    1  1  4
I recommend reading the split-apply-combine section of the docs.
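Applied to the question's case, the same idea might look like the sketch below. The sample data, the column names "title" and "genre", and the threshold min_count are made up for illustration, not taken from the asker's DataFrame.

```python
import pandas as pd

# Hypothetical sample data; "title" and "genre" are assumed column names.
games = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "genre": ["RPG", "RPG", "RPG", "Puzzle", "Racing", "Racing"],
})

min_count = 2  # assumed threshold: keep genres that appear at least this often

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True.
filtered = games.groupby("genre").filter(lambda g: len(g) >= min_count)
print(filtered)
```

Here the single "Puzzle" row is dropped, while the "RPG" and "Racing" rows are kept.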
Alternatively, build a boolean mask from the per-row group size with transform('size'):
df1 = df[df.groupby("A")['A'].transform('size') > 1]
Or map each value in the column to its count from value_counts and compare:
df1 = df[df['A'].map(df['A'].value_counts()) > 1]
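To see how the two mask-based one-liners behave, here is a small sketch that runs both on the same made-up data (the "genre" column and min_count threshold are assumptions) and checks that they select the same rows. Both build a boolean Series aligned with the original index, so no Python function is called per group.

```python
import pandas as pd

# Hypothetical sample data; "title" and "genre" are assumed column names.
games = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "genre": ["RPG", "RPG", "RPG", "Puzzle", "Racing", "Racing"],
})

min_count = 2  # assumed threshold

# transform('size') gives each row the size of the group it belongs to.
group_sizes = games.groupby("genre")["genre"].transform("size")
by_transform = games[group_sizes >= min_count]

# value_counts() gives one count per distinct genre; map() looks up that
# count for every row.
counts = games["genre"].value_counts()
by_map = games[games["genre"].map(counts) >= min_count]

# Both approaches keep the same rows in the same order.
print(by_transform.equals(by_map))
```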