Group values in Pandas DataFrame based on calculations

This follows on from my previous question, but all the relevant details are included here.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,1,1,0,1],[2,1,1,0,1],[3,1,1,1,1],[4,0,1,0,1]])
df.columns = ['session','p1','p2','p3','p4']
df1 = df.set_index('session')
# Number of parameters shared by each pair of sessions
common = df1.dot(df1.T)
print(common / np.sqrt(np.outer(*[df1.sum(1)] * 2)))

session         1         2         3         4
session                                        
1        1.000000  1.000000  0.866025  0.816497
2        1.000000  1.000000  0.866025  0.816497
3        0.866025  0.866025  1.000000  0.707107
4        0.816497  0.816497  0.707107  1.000000
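As a sanity check on the matrix above: for 0/1 data the dot product counts the parameters two sessions share, and each row sum equals the squared length of that row, so every entry is a cosine similarity. The sketch below recomputes the single entry for sessions 1 and 3 by hand; the names `a`, `b`, `shared` and `cos_13` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=['session', 'p1', 'p2', 'p3', 'p4'],
)
df1 = df.set_index('session')

a = df1.loc[1].to_numpy()  # parameters of session 1
b = df1.loc[3].to_numpy()  # parameters of session 3

shared = int(a @ b)                            # parameters set in both sessions
cos_13 = shared / np.sqrt(a.sum() * b.sum())   # row sum == squared norm for 0/1 rows
print(shared, round(float(cos_13), 6))
```

The result should agree with the value in row 1, column 3 of the printed matrix.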

Now I need to get the upper triangular part of the matrix.

print(np.triu(common / np.sqrt(np.outer(*[df1.sum(1)] * 2))))

output:

[[ 1.          1.          0.8660254   0.81649658]
 [ 0.          1.          0.8660254   0.81649658]
 [ 0.          0.          1.          0.70710678]
 [ 0.          0.          0.          1.        ]]

I don't need the values on the diagonal. From the remaining values in the upper triangle, I need to group the session pairs that have the same value. For the example above, the groups would be:

session 1,3 and 2,3
session 1,4 and 2,4
session 3,4
session 1,2
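For the part about skipping the diagonal, `np.triu` takes an offset `k`, and `k=1` starts one position above the main diagonal. A minimal sketch, with the similarity values typed in by hand from the output above (the names `sim` and `strict_upper` are just for illustration):

```python
import numpy as np

# Similarity matrix copied from the printed output above
sim = np.array([
    [1.000000, 1.000000, 0.866025, 0.816497],
    [1.000000, 1.000000, 0.866025, 0.816497],
    [0.866025, 0.866025, 1.000000, 0.707107],
    [0.816497, 0.816497, 0.707107, 1.000000],
])

# k=1 zeroes the diagonal as well as everything below it
strict_upper = np.triu(sim, k=1)
print(strict_upper)
```

This only blanks out the unwanted cells; the grouping itself still has to be done on the values that remain.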

Answers


Try this:

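import itertools

res = common / np.sqrt(np.outer(*[df1.sum(1)] * 2))
for col in res.columns:
    # Group the sessions that have the same similarity to session `col`
    for _, g in res.groupby(col):
        # Keep only earlier sessions so that each pair is listed once
        pairs = list(zip(g.index[g.index < col], itertools.repeat(col)))
        if pairs:
            print(pairs)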
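Note that the loop above groups within one column at a time, so equal values that sit in different columns would not end up together. A possible alternative, sketched below, walks the strict upper triangle with `np.triu_indices` and collects pairs in a dictionary keyed by the similarity value. Rounding to 6 decimals is my own assumption to absorb floating-point noise, and the names `groups`, `rows` and `cols` are illustrative.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=['session', 'p1', 'p2', 'p3', 'p4'],
)
df1 = df.set_index('session')
common = df1.dot(df1.T)
sums = df1.sum(axis=1)
res = common / np.sqrt(np.outer(sums, sums))

# Positions of every cell strictly above the diagonal (k=1 skips the diagonal)
rows, cols = np.triu_indices(len(res), k=1)

groups = defaultdict(list)
for i, j in zip(rows, cols):
    # Assumption: values equal to 6 decimals count as "the same"
    key = round(float(res.iat[i, j]), 6)
    groups[key].append((int(res.index[i]), int(res.columns[j])))

for value, pairs in sorted(groups.items(), reverse=True):
    print(value, pairs)
```

For the sample data this should give the four groups listed in the question: (1,2), then (1,3) with (2,3), then (1,4) with (2,4), then (3,4).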
