# Assigning weights to a multilabel SVM to balance classes

How is this done? I am using scikit-learn to train an SVM, and my classes are unbalanced. Note that my problem is both multiclass and multilabel, so I am using `OneVsRestClassifier`:

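    from sklearn import svm
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(y_train)  # convert label lists to a binary indicator matrix
    clf = OneVsRestClassifier(svm.SVC(kernel='rbf'))
    clf = clf.fit(x, y)
    pred = clf.predict(x_test)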

Can I add a 'sample_weight' parameter somewhere to account for the unbalanced classes?

When I add a `class_weight` dict to the SVM I get the error:

    ValueError: Class label 2 not present

This is because I have converted my labels to binary using the mlb. However, if I do not convert the labels, I get:

    ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

`class_weight` is a dict mapping each class label to its weight: `{1: 1, 2: 1, 3: 3...}`

Here are the details of x and y:

    print(X[0])
    [ 0.76625633 0.63062721 0.01954162 ..., 1.1767817 0.249034 0.23544988]
    print(type(X))
    <type 'numpy.ndarray'>
    print(y[0])
    [1, 2, 3, 4, 5, 6, 7]
    print(type(y))
    <type 'numpy.ndarray'>

Note that `mlb = MultiLabelBinarizer(); y = mlb.fit_transform(y_train)` converts `y` to a binary array.

The suggested answer produces the error:

    ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

**So, the problem reduces to converting the labels (a np.array) to a sparse matrix.**

    from scipy import sparse
    y_sp = sparse.csr_matrix(y)

This produces the error:

    TypeError: no supported conversion for types: (dtype('O'),)

I will open a new query for this.
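For reference, the `TypeError` is consistent with `y` being an object array of label lists, which `csr_matrix` cannot convert directly. A minimal sketch, assuming that layout and using made-up labels: `MultiLabelBinarizer` can produce a sparse indicator matrix itself when constructed with `sparse_output=True`.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up labels, stored as an object array of lists (assumed layout).
raw_labels = [[1, 2, 3], [2], [1, 7]]
y_train = np.empty(len(raw_labels), dtype=object)
for i, labels in enumerate(raw_labels):
    y_train[i] = labels

# sparse_output=True returns a SciPy sparse matrix instead of a dense array.
mlb = MultiLabelBinarizer(sparse_output=True)
y_sp = mlb.fit_transform(y_train)

print(sparse.issparse(y_sp))  # whether the result is a sparse matrix
print(mlb.classes_)           # column order of the indicator matrix
print(y_sp.toarray())         # dense view, for inspection only
```

Each row of `y_sp` corresponds to one sample and each column to one entry of `mlb.classes_`.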

## Answers

You could use the `class_weight` parameter of `SVC`:

class_weight : {dict, ‘balanced’}, optional

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`.
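To make the quoted formula concrete, here is a small worked sketch for a single binary label column, which is what each one-vs-rest estimator is trained on. The label values are made up for illustration, and only NumPy is used.

```python
import numpy as np

# Hypothetical binary labels for one column of the indicator matrix:
# 8 negatives (0) and 2 positives (1).
y_col = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

n_samples = len(y_col)
n_classes = len(np.unique(y_col))

# 'balanced' weights: n_samples / (n_classes * np.bincount(y))
weights = n_samples / (n_classes * np.bincount(y_col))

print(weights)  # first entry is the weight for class 0, second for class 1
```

With 8 negatives and 2 positives, the weights come out as 0.625 for class 0 and 2.5 for class 1, so the rarer positive class is penalized four times as heavily.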

    clf = OneVsRestClassifier(svm.SVC(kernel='rbf', class_weight='balanced'))

This code works fine with the *'balanced'* value of the **class_weight** parameter:

    >>> from sklearn.preprocessing import MultiLabelBinarizer
    >>> from sklearn.svm import SVC
    >>> from sklearn.multiclass import OneVsRestClassifier
    >>> mlb = MultiLabelBinarizer()
    >>> x = [[0,1,1,1],[1,0,0,1]]
    >>> y = mlb.fit_transform([['sci-fi', 'thriller'], ['comedy']])
    >>> print(y)
    [[0 1 1]
     [1 0 0]]
    >>> print(mlb.classes_)
    ['comedy' 'sci-fi' 'thriller']
    >>> OneVsRestClassifier(SVC(random_state=0, class_weight='balanced')).fit(x, y).predict(x)
    array([[0, 1, 1],
           [1, 0, 0]])
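On the `Class label 2 not present` error from the question: inside `OneVsRestClassifier`, each underlying `SVC` is trained on one binary column, so it only ever sees the labels 0 and 1. A `class_weight` dict therefore needs the keys 0 and 1 rather than the original label values. A minimal sketch with toy data, assuming one positive-class weight shared by all labels (the value 3 is arbitrary):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Toy data, made up for illustration.
X = np.array([[0, 1, 1, 1], [1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]])
labels = [[1, 2], [3], [1], [2, 3]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # binary indicator matrix, one column per label

# Keys refer to the binary targets (0 = label absent, 1 = label present),
# not to the original label values 1, 2, 3.
clf = OneVsRestClassifier(SVC(kernel='rbf', class_weight={0: 1, 1: 3}))
clf.fit(X, Y)
pred = clf.predict(X)

print(pred.shape)  # same shape as Y: one column per label
```

Note that this applies the same weighting to every label; `class_weight='balanced'` instead adapts the weights to each column's own class frequencies.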