# Group values in Pandas DataFrame based on calculations

This follows on from my previous question, but I have included all the relevant details here as well.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[1,1,1,0,1],[2,1,1,0,1],[3,1,1,1,1],[4,0,1,0,1]])
    df.columns = ['session','p1','p2','p3','p4']
    df1 = df.set_index('session')
    common = df1.dot(df1.T)
    print(common / np.sqrt(np.outer(*[df1.sum(1)] * 2)))

    session         1         2         3         4
    session
    1        1.000000  1.000000  0.866025  0.816497
    2        1.000000  1.000000  0.866025  0.816497
    3        0.866025  0.866025  1.000000  0.707107
    4        0.816497  0.816497  0.707107  1.000000

Now I need to get the upper triangular part of the matrix.

    print(np.triu(common / np.sqrt(np.outer(*[df1.sum(1)] * 2))))

output:

    [[ 1.          1.          0.8660254   0.81649658]
     [ 0.          1.          0.8660254   0.81649658]
     [ 0.          0.          1.          0.70710678]
     [ 0.          0.          0.          1.        ]]

I don't need the values on the diagonal. From the remaining values in the upper triangle, I need to group the session pairs that share the same value. For the example above, the session groups would be as follows.

    session 1,3 and 2,3
    session 1,4 and 2,4
    session 3,4
    session 1,2

## Answers

Try this:

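    import itertools

    res = common / np.sqrt(np.outer(*[df1.sum(1)] * 2))

    for col in res.columns:
        # Group the rows by their value in this column
        for _, g in res.groupby(col):
            # Keep only rows above the diagonal (row label < column label)
            pairs = list(zip(list(g.index[g.index < col]), itertools.repeat(col)))
            if pairs:
                print(pairs)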
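Note that the loop above groups values within each column separately. If you would rather group equal values across the whole upper triangle in one pass, here is a rough sketch of another way to do it, using `np.triu_indices` with `k=1` to skip the diagonal. The helper name `group_pairs_by_value` and the `decimals` parameter are my own inventions for illustration, and rounding to 6 decimals is an assumption to keep floating-point noise from splitting values that should be equal.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 0, 1],
                   [2, 1, 1, 0, 1],
                   [3, 1, 1, 1, 1],
                   [4, 0, 1, 0, 1]],
                  columns=['session', 'p1', 'p2', 'p3', 'p4'])
df1 = df.set_index('session')
common = df1.dot(df1.T)
res = common / np.sqrt(np.outer(df1.sum(axis=1), df1.sum(axis=1)))


def group_pairs_by_value(sim, decimals=6):
    """Map each rounded value above the diagonal to its (row, column) label pairs.

    Hypothetical helper; `decimals` is an assumed tolerance for float comparison.
    """
    values = sim.to_numpy()
    # k=1 selects positions strictly above the diagonal
    rows, cols = np.triu_indices(len(sim), k=1)
    groups = defaultdict(list)
    for i, j in zip(rows, cols):
        key = round(float(values[i, j]), decimals)
        groups[key].append((int(sim.index[i]), int(sim.columns[j])))
    return dict(groups)


groups = group_pairs_by_value(res)
for value, pairs in sorted(groups.items(), reverse=True):
    print(value, pairs)
```

On the sample data this should give the four groups from the question: (1,2) alone, (1,3) with (2,3), (1,4) with (2,4), and (3,4) alone.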