Assigning weights to a multilabel SVM to balance classes

How is this done? I am using scikit-learn to train an SVM, and my classes are unbalanced. My problem is multiclass and multilabel, so I am using OneVsRestClassifier:

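from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Convert the label lists to a binary indicator matrix
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(y_train)

# Fit one RBF-kernel SVM per label
clf = OneVsRestClassifier(svm.SVC(kernel='rbf'))
clf = clf.fit(x, y)
pred = clf.predict(x_test)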

Can I add a 'sample_weight' parameter somewhere to account for the unbalanced classes?


When I pass a class_weight dict to the SVM, I get the error:

ValueError: Class label 2 not present

This is because I have converted my labels to binary using the MultiLabelBinarizer. However, if I do not convert the labels, I get:

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead. 

class_weight is a dict mapping each class label to its weight: {1: 1, 2: 1, 3: 3...}

Here are the details of x and y:

print(X[0])  
[ 0.76625633  0.63062721  0.01954162 ...,  1.1767817   0.249034    0.23544988]
print(type(X))
<type 'numpy.ndarray'>

print(y[0])
print(type(y))
[1, 2, 3, 4, 5, 6, 7]
<type 'numpy.ndarray'>

Note that mlb = MultiLabelBinarizer(); y = mlb.fit_transform(y_train) converts y to a binary array.


The suggested answer produces the error:

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

So the problem reduces to converting the labels (a NumPy array) to a sparse matrix.

from scipy import sparse
y_sp = sparse.csr_matrix(y) 

This produces the error:

TypeError: no supported conversion for types: (dtype('O'),)

I will open a new question for this.
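
One possible way around the TypeError, offered as a sketch rather than a confirmed fix: instead of passing the object array of label lists to csr_matrix, MultiLabelBinarizer can return a sparse binary matrix directly through its sparse_output parameter. The label lists below are invented for illustration and only stand in for y_train.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label lists standing in for y_train (not the asker's data)
y_train = [[1, 2, 3], [2], [1, 3, 4]]

# sparse_output=True makes fit_transform return a SciPy sparse matrix
mlb = MultiLabelBinarizer(sparse_output=True)
y_sp = mlb.fit_transform(y_train)

print(sparse.issparse(y_sp))  # whether the result is a sparse matrix
print(y_sp.shape)             # (n_samples, n_distinct_labels)
print(mlb.classes_)           # the label behind each column
```

The resulting matrix is a binary indicator matrix, which is the representation the ValueError above asks for.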

Answers


You could use the class_weight parameter:

class_weight : {dict, ‘balanced’}, optional

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

clf = OneVsRestClassifier(svm.SVC(kernel='rbf', class_weight='balanced'))
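
To make the 'balanced' formula quoted above concrete, here is a small pure-Python sketch that applies n_samples / (n_classes * count) to a single binarized label column. The helper name balanced_weights and the toy column are made up for illustration; this mirrors the quoted formula and is not scikit-learn's own code.

```python
from collections import Counter

def balanced_weights(y):
    """Return {class: n_samples / (n_classes * class_count)} for the labels in y."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# One binarized label column: three negatives, one positive (toy data)
column = [0, 0, 0, 1]
weights = balanced_weights(column)
print(weights)
```

The rare positive class receives the larger weight, which is the effect class_weight='balanced' is meant to have on each one-vs-rest sub-problem.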



This code works fine with the class_weight parameter set to 'balanced':

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> from sklearn.svm import SVC
>>> from sklearn.multiclass import OneVsRestClassifier

>>> mlb = MultiLabelBinarizer()
>>> x = [[0,1,1,1],[1,0,0,1]]
>>> y = mlb.fit_transform([['sci-fi', 'thriller'], ['comedy']])

>>> print(y)
[[0 1 1]
 [1 0 0]]
>>> print(mlb.classes_)
['comedy' 'sci-fi' 'thriller']

>>> OneVsRestClassifier(SVC(random_state=0, class_weight='balanced')).fit(x, y).predict(x)
array([[0, 1, 1],
       [1, 0, 0]])
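
On the "Class label 2 not present" error from the question: inside OneVsRestClassifier each per-label SVC is trained on a binary column, so the only class labels it sees are 0 and 1. A dict keyed by the original labels therefore fails, while a dict keyed by 0 and 1 should be accepted. The sketch below uses invented toy data and an arbitrary weight of 3 for the positive class; as far as I can tell, the same 0/1 weights are applied to every label, since the base estimator is cloned for each one.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Toy data, invented for illustration only
X = [[0, 1, 1, 1], [1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
labels = [[1, 2], [3], [1], [2, 3]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # binary indicator matrix, one column per label

# Keys are 0 (label absent) and 1 (label present), not the original labels;
# the weight 3 for the positive class is an arbitrary example value
clf = OneVsRestClassifier(SVC(kernel='rbf', class_weight={0: 1, 1: 3}))
clf.fit(X, Y)
pred = clf.predict(X)
print(pred.shape)
```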
