pandas count over multiple columns

I have a DataFrame that looks like this:

Measure1 Measure2 Measure3 ...
0        1         3
1        3         2
3        0        

I'd like to count the occurrences of the values over the columns to produce:

Measure Count Percentage
0       2     0.25
1       2     0.25
2       1     0.125
3       3     0.375

With

outcome_measure_count = cdss_data.groupby(key_columns=['Measure1'],operations={'count': agg.COUNT()}).sort('count', ascending=True)

I only get the counts for the first column. (I'm actually using the graphlab package, but I'd prefer a pandas solution.)

Could someone help me?

Answers


You can generate the counts by flattening the df with ravel and then calling value_counts; from the result you can construct the final df:

In [230]:
import io
import pandas as pd

t="""Measure1 Measure2 Measure3
0        1         3
1        3         2
3        0        0"""

df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df

Out[230]:
   Measure1  Measure2  Measure3
0         0         1         3
1         1         3         2
2         3         0         0

In [240]:    
count = pd.Series(df.values.ravel()).value_counts()
pd.DataFrame({'Measure': count.index, 'Count':count.values, 'Percentage':(count/count.sum()).values})

Out[240]:
   Count  Measure  Percentage
0      3        3    0.333333
1      3        0    0.333333
2      2        1    0.222222
3      1        2    0.111111

I inserted a 0 in the empty cell just to make the df shape correct, which is why the counts differ slightly from yours, but you should get the point.
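If the empty cell in the question is meant to be a missing value rather than a 0, the same flatten-and-count idea still works as long as the NaN is dropped before counting. The following is only a minimal sketch under that assumption; the names `values`, `counts` and `result` are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with the empty cell kept as NaN (assumption)
df = pd.DataFrame({
    'Measure1': [0, 1, 3],
    'Measure2': [1, 3, 0],
    'Measure3': [3, 2, np.nan],
})

# Flatten all columns into one 1-D Series and drop the missing cell
values = pd.Series(df.to_numpy().ravel()).dropna()

# Count each distinct value, ordered by the value itself
counts = values.value_counts().sort_index()

result = pd.DataFrame({
    'Measure': counts.index.astype(int),
    'Count': counts.to_numpy(),
    'Percentage': (counts / counts.sum()).to_numpy(),
})
print(result)
```

With the NaN dropped, the percentages are computed over the 8 cells that actually hold a value, which matches the table requested in the question.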


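Alternatively, apply value_counts to each column and sum the results row-wise (this assumes `import numpy as np` and `from pandas import DataFrame, Series`):

In [68]: df=DataFrame({'m1':[0,1,3], 'm2':[1,3,0], 'm3':[3,2, np.nan]})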

In [69]: df
Out[69]:
   m1  m2   m3
0   0   1  3.0
1   1   3  2.0
2   3   0  NaN

In [70]: df=df.apply(Series.value_counts).sum(1).to_frame(name='Count')

In [71]: df
Out[71]:
     Count
0.0    2.0
1.0    2.0
2.0    1.0
3.0    3.0

In [72]: df.index.name='Measure'

In [73]: df
Out[73]:
         Count
Measure
0.0        2.0
1.0        2.0
2.0        1.0
3.0        3.0

In [74]: df['Percentage']=df.Count.div(df.Count.sum())

In [75]: df
Out[75]:
         Count  Percentage
Measure
0.0        2.0       0.250
1.0        2.0       0.250
2.0        1.0       0.125
3.0        3.0       0.375
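For reference, the session above can be collected into a single script. This is a sketch rather than a definitive version: the explicit imports, the `sort_index()` call, the integer casts and the names `per_column` and `summary` are my additions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'m1': [0, 1, 3], 'm2': [1, 3, 0], 'm3': [3, 2, np.nan]})

# One value_counts per column; a value absent from a column shows up as NaN
per_column = df.apply(pd.Series.value_counts).sort_index()

# Sum across columns (NaN is skipped) to get the overall count per value
summary = per_column.sum(axis=1).astype(int).to_frame(name='Count')
summary.index = summary.index.astype(int)
summary.index.name = 'Measure'

# Share of each value among all non-missing cells
summary['Percentage'] = summary['Count'] / summary['Count'].sum()
print(summary)
```

The intermediate `per_column` frame is also useful on its own when you want to see how often each value occurs in each individual column.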
