Summarizing CSV data into a data frame by averaging over two header rows in Pandas

I have the following CSV data:

id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88

Notice that it contains two header rows. I want to group the values from the third row onwards by organ and cell type and average them, producing a hierarchical data frame like this:

bm        stem        bcell       tcell
    foo   (20+10)/2   0           79/1=79
    bar   (17+13)/2   0           88/1=88

fl        stem        bcell       tcell
    foo   11/1=11     0           0
    bar   55/1=55     0           0

pt        stem        bcell       tcell
    foo   0           (23+22)/2   0
    bar   0           (12+13)/2   0

How can I achieve that?

I'm stuck with the following code:

import pandas as pd
df = pd.read_csv("http://dpaste.com/1X74TNP.txt")

Update

import pandas as pd
df = pd.read_csv("http://dpaste.com/1X74TNP.txt",header=None,index_col=[1,2]).iloc[:, 1:]
df.columns = pd.MultiIndex.from_arrays(df.ix[:2].values)
df = df.ix[2:]
df.index.names = ['cell', 'organ']
df = df.reset_index('organ', drop=True)
result = df.groupby(level=[0, 1], axis=1).mean().stack().replace(np.nan, 0).unstack().swaplevel(0,1, axis=1).sort_index(axis=1)

This raises:

DataError: No numeric types to aggregate

Answers

The error occurs because both header rows are read in as data, so every column holds strings (dtype object) and mean() has nothing numeric to aggregate. Casting the data rows with astype(int) after the header rows have been moved into the columns fixes it:


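import numpy as np
import pandas as pd

# Read both header rows as data; index by gene and the label column, drop the id column.
df = pd.read_csv("http://dpaste.com/1X74TNP.txt", header=None, index_col=[1, 2]).iloc[:, 1:]

# Promote the first two rows (cell type, organ) to a column MultiIndex.
df.columns = pd.MultiIndex.from_arrays(df.iloc[:2].values)
# Cast the string values to integers before aggregating.
df = df.iloc[2:].astype(int)
df.index.names = ['cell', 'organ']
df = df.reset_index('organ', drop=True)

# Average columns sharing the same (cell type, organ) pair; transposing avoids groupby(axis=1).
avg = df.T.groupby(level=[0, 1]).mean().T
result = avg.stack().replace(np.nan, 0).unstack()
result = result.swaplevel(0, 1, axis=1).sort_index(axis=1)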

        bm               fl               pt           
     bcell stem tcell bcell stem tcell bcell stem tcell
cell                                                   
foo      0   15    79     0   11     0  22.5    0     0
bar      0   15    88     0   55     0  12.5    0     0
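
For readers who want to try this without the dpaste URL, the following is a minimal self-contained sketch of the same idea. It embeds the CSV from the question as a string and is a variation on the code above rather than the same code: the names CSV_TEXT, raw, values, mean_by_pair and all_pairs are made up for illustration, the values are cast to float instead of int, and the missing organ/cell-type combinations are filled with reindex rather than the stack/unstack round trip.

```python
import io

import pandas as pd

# The CSV from the question, embedded so the example needs no network access.
CSV_TEXT = """id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88
"""

# Read everything as data (header=None) so both header rows stay available.
raw = pd.read_csv(io.StringIO(CSV_TEXT), header=None)

# Row 0 holds the cell types, row 1 the organs; columns 3 onward hold the values.
columns = pd.MultiIndex.from_arrays(
    [raw.iloc[1, 3:], raw.iloc[0, 3:]], names=["organ", "celltype"]
)
values = raw.iloc[2:, 3:].astype(float)  # the cells were parsed as strings
values.index = pd.Index(raw.iloc[2:, 1], name="cell")
values.columns = columns

# Group on the transposed frame, then transpose back, so that the
# (organ, celltype) pairs end up in the columns again.
mean_by_pair = values.T.groupby(level=["organ", "celltype"]).mean().T

# Add the organ/celltype combinations that never occur, filled with 0.
all_pairs = pd.MultiIndex.from_product(
    mean_by_pair.columns.levels, names=mean_by_pair.columns.names
)
result = mean_by_pair.reindex(columns=all_pairs, fill_value=0)
print(result)
```

Using reindex with an explicit product of the two levels makes the zero-filling step visible in one place; whether that reads better than stack/unstack is a matter of taste.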

To select the columns for a single organ, use:

print(result.loc[:, 'bm'])

      bcell  stem  tcell
cell                    
foo       0    15     79
bar       0    15     88
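
Selecting by the inner level (one cell type across all organs) or a single organ/cell-type pair works along the same lines. The sketch below rebuilds the result table by hand from the numbers shown above so that it runs on its own; the variable names bm, stem and foo_pt_bcell are only illustrative.

```python
import pandas as pd

# Rebuild the result table from the values printed above.
columns = pd.MultiIndex.from_product(
    [["bm", "fl", "pt"], ["bcell", "stem", "tcell"]]
)
result = pd.DataFrame(
    [[0, 15, 79, 0, 11, 0, 22.5, 0, 0],
     [0, 15, 88, 0, 55, 0, 12.5, 0, 0]],
    index=pd.Index(["foo", "bar"], name="cell"),
    columns=columns,
)

# Outer level: every cell type for one organ (same as result.loc[:, 'bm']).
bm = result["bm"]

# Inner level: one cell type across all organs.
stem = result.xs("stem", axis=1, level=1)

# A single value: one gene, one organ/cell-type pair.
foo_pt_bcell = result.loc["foo", ("pt", "bcell")]

print(stem)
print(foo_pt_bcell)
```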
