SVM with scikit-learn: a practical example

SVM (Support Vector Machine) is a widely used classification method. It can classify both linearly and non-linearly separable data. SVM was originally created for binary classification. In this post you will learn to implement SVM with scikit-learn in Python.

SVM Working and Objective

Its objective is to find a hyperplane that separates the tuples of the different classes. The SVM classifier identifies the data points closest to the hyperplane (in the linear case, a line) and measures the distance between these support vectors and the hyperplane. This distance is called the margin. The job of the SVM classifier is to find the hyperplane with the maximum margin.
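For reference, the standard hard-margin formulation of this objective is

\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \;\; \text{for all } i

where w is the normal vector of the hyperplane and b its offset; the resulting margin width works out to 2/\lVert w \rVert.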

Working of SVM

  1. A non-linear mapping is used to transform the data into a higher dimension where it becomes separable (see the RBF-kernel sketch after this list)
  2. An optimal separating hyperplane is searched for in that higher-dimensional space
  3. The hyperplane is found using the support vectors in an N-dimensional space
  4. For a linear decision function, an output of +1 or more is assigned to one class, while an output of -1 or less is assigned to the other
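As a quick sketch of step 1, the RBF kernel can separate data that no straight line can split. The make_circles data below is a hypothetical example chosen purely for illustration:

from sklearn.svm import SVC
from sklearn.datasets import make_circles

# hypothetical data: two concentric circles, not linearly separable
X_c, y_c = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

# the RBF kernel implicitly maps the points into a higher-dimensional space
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_c, y_c)
print(clf.score(X_c, y_c))  # should be close to 1.0 on this toy data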

A hyperplane's form depends on the number of input features: with 2 input features the hyperplane is a line, and with 3 features it is a 2-D plane.

Support Vectors: Support vectors are the data points used to find the optimal hyperplane. They lie closest to the hyperplane, and altering them can change the hyperplane's location, as the sketch below demonstrates.
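Here is a minimal sketch of this property, reusing the same make_blobs setup as the example later in this post: refitting on the support vectors alone typically recovers the same hyperplane, because the remaining points have no influence on the solution.

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X_b, y_b = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)
clf = SVC(kernel='linear', C=1.0).fit(X_b, y_b)

# refit using only the support vectors
sv = clf.support_  # indices of the support vectors
clf_sv = SVC(kernel='linear', C=1.0).fit(X_b[sv], y_b[sv])
print(np.allclose(clf.coef_, clf_sv.coef_))  # typically True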

Advantages of SVM with scikit-learn

  • Can also be used for regression (predicting continuous values), as sketched below
  • Highly accurate when the classes are well separated
  • Less prone to overfitting than many other classification methods
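For the first point, scikit-learn ships an SVR class that applies the same idea to continuous targets. A minimal sketch on made-up toy data:

import numpy as np
from sklearn.svm import SVR

# hypothetical toy data: samples from a sine curve
rng = np.random.RandomState(0)
X_r = np.sort(5 * rng.rand(40, 1), axis=0)
y_r = np.sin(X_r).ravel()

reg = SVR(kernel='rbf', C=1.0, epsilon=0.1)
reg.fit(X_r, y_r)
print(reg.predict([[2.5]]))  # should land near sin(2.5)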

Disadvantages of SVM with scikit-learn

  • Training can be slow
  • Not recommended for noisy data
  • Very large datasets usually call for a specialised variant (such as a clustering-based SVM) or a linear solver; see the sketch below
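For the last point, a common workaround in scikit-learn is LinearSVC, a liblinear-based solver that scales much better than the kernelised SVC. A minimal sketch on hypothetical synthetic data:

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# hypothetical large synthetic dataset
X_l, y_l = make_classification(n_samples=50000, n_features=20, random_state=0)

clf = LinearSVC(C=1.0, max_iter=5000)  # linear solver; avoids the kernel SVC's quadratic training cost
clf.fit(X_l, y_l)
print(clf.score(X_l, y_l))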

Below is an example of linear classification with SVM in scikit-learn. It implements a simple classifier for 2 classes whose data points are well separated.

# import all the necessary libraries as shown below:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
sns.set()

# generate 100 well-separated points in 2 clusters
inputs, targets = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)

# plot the raw data points, colored by class
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')
plt.title('Data points of the 2 classes')
plt.show()
[Figure: data points of the 2 classes]
import numpy as np

# plot the data again before computing the decision boundary
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='copper')
plt.show()
[Figure: scatter plot of the data points, copper colormap]
# fit a linear-kernel SVM classifier to the data
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(inputs, targets)
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
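With the model fitted, it can classify new points right away. The two test points below are made up purely for illustration; predict returns the class label for each row:

# classify a couple of hypothetical new points
print(svm_model.predict([[0.5, 3.0], [2.5, 1.0]]))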
w = svm_model.coef_[0]  # weight vector of the fitted hyperplane

# slope of the boundary: w[0]*x + w[1]*y + b = 0 rearranges to y = -(w[0]/w[1])*x - b/w[1]
a = -w[0] / w[1]

# make the x-axis space for the line (set the limits according to the plot axes)
X = np.linspace(-1, 5)

# hyperplane formula
y = a * X - svm_model.intercept_[0] / w[1]

# plot the decision boundary
plt.plot(X, y, label='decision boundary')  # plotting the hyperplane line
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')  # c = color sequence
plt.legend()
plt.xlim([-1, 4])
plt.title('Plot with optimal decision boundary')
plt.show()
[Figure: plot with the optimal decision boundary]
# the training points the model selected as support vectors
svm_model.support_vectors_
array([[0.35482006, 2.9172298 ],
       [2.56509832, 3.28573136],
       [0.40706768, 3.09538951],
       [1.23408114, 2.25819849],
       [1.70664481, 2.2483361 ],
       [1.45240954, 2.23470913]])
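To visualise the maximum margin itself, the dashed lines in the sketch below sit at a distance of 1/||w|| on either side of the hyperplane. This reuses X, y, a and svm_model from the code above:

# half-margin width is 1/||w||
margin = 1 / np.sqrt(np.sum(svm_model.coef_ ** 2))

# the margin lines run parallel to the boundary, shifted vertically by margin*sqrt(1 + a^2)
y_down = y - np.sqrt(1 + a ** 2) * margin
y_up = y + np.sqrt(1 + a ** 2) * margin

plt.plot(X, y, 'k-')        # decision boundary
plt.plot(X, y_down, 'k--')  # lower margin line
plt.plot(X, y_up, 'k--')    # upper margin line
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')
plt.scatter(svm_model.support_vectors_[:, 0], svm_model.support_vectors_[:, 1],
            s=120, facecolors='none', edgecolors='k')  # circle the support vectors
plt.xlim([-1, 4])
plt.show()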

As you can see, it is easy to implement SVM with scikit-learn. Pick your data set and give it a try.
