SVM with scikit-learn: a practical example

SVM (Support Vector Machine) is a widely used classification method. It can classify both linearly and non-linearly separable data. SVM was originally created for binary classification. In this post you will learn to implement SVM with scikit-learn in Python.

SVM Working and Objective

Its objective is to find a hyperplane that separates the tuples of the different classes. The SVM classifier identifies the data points closest to the hyperplane (in the linear case, a line) and measures the distance between these support vectors and the hyperplane. This distance is called the margin. The job of the SVM classifier is to find the hyperplane with the maximum margin.
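For reference, the standard hard-margin formulation of this objective is

\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \;\; \text{for all } i

where w is the normal vector of the hyperplane and b its offset; the resulting margin width works out to 2/\lVert w \rVert.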

Working of SVM

  1. A non-linear mapping is used to transform the data into a higher dimension where it becomes separable (see the RBF-kernel sketch after this list)
  2. An optimal separating hyperplane is searched for in that higher-dimensional space
  3. The hyperplane is found using the support vectors in an N-dimensional space
  4. For a linear decision function, an output of +1 or more is assigned to one class, while an output of -1 or less is assigned to the other
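As a quick sketch of step 1, the RBF kernel can separate data that no straight line can split. The make_circles data below is a hypothetical example chosen purely for illustration:

from sklearn.svm import SVC
from sklearn.datasets import make_circles

# hypothetical data: two concentric circles, not linearly separable
X_c, y_c = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

# the RBF kernel implicitly maps the points into a higher-dimensional space
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_c, y_c)
print(clf.score(X_c, y_c))  # should be close to 1.0 on this toy data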

A hyperplane's form depends on the number of input features: with 2 input features the hyperplane is a line, and with 3 features it is a 2-D plane.

Support Vectors: Support vectors are the data points used to find the optimal hyperplane. They lie closest to the hyperplane, and altering them can change the hyperplane's location, as the sketch below demonstrates.
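Here is a minimal sketch of this property, reusing the same make_blobs setup as the example later in this post: refitting on the support vectors alone typically recovers the same hyperplane, because the remaining points have no influence on the solution.

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X_b, y_b = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)
clf = SVC(kernel='linear', C=1.0).fit(X_b, y_b)

# refit using only the support vectors
sv = clf.support_  # indices of the support vectors
clf_sv = SVC(kernel='linear', C=1.0).fit(X_b[sv], y_b[sv])
print(np.allclose(clf.coef_, clf_sv.coef_))  # typically True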

Advantages of SVM with scikit-learn

  • Can also be used for regression (predicting continuous values), as sketched below
  • Highly accurate when the classes are well separated
  • Less prone to overfitting than many other classification methods
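For the first point, scikit-learn ships an SVR class that applies the same idea to continuous targets. A minimal sketch on made-up toy data:

import numpy as np
from sklearn.svm import SVR

# hypothetical toy data: samples from a sine curve
rng = np.random.RandomState(0)
X_r = np.sort(5 * rng.rand(40, 1), axis=0)
y_r = np.sin(X_r).ravel()

reg = SVR(kernel='rbf', C=1.0, epsilon=0.1)
reg.fit(X_r, y_r)
print(reg.predict([[2.5]]))  # should land near sin(2.5)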

Disadvantages of SVM with scikit-learn

  • Training can be slow
  • Not recommended for noisy data
  • Very large datasets usually call for a specialised variant (such as a clustering-based SVM) or a linear solver; see the sketch below
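For the last point, a common workaround in scikit-learn is LinearSVC, a liblinear-based solver that scales much better than the kernelised SVC. A minimal sketch on hypothetical synthetic data:

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# hypothetical large synthetic dataset
X_l, y_l = make_classification(n_samples=50000, n_features=20, random_state=0)

clf = LinearSVC(C=1.0, max_iter=5000)  # linear solver; avoids the kernel SVC's quadratic training cost
clf.fit(X_l, y_l)
print(clf.score(X_l, y_l))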

Below is an example of linear classification with SVM in scikit-learn. It implements a simple classifier for 2 classes whose data points are well separated.

# import all the necessary libraries as shown below:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
sns.set()

# generate 100 well-separated points in 2 clusters
inputs, targets = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)

# plot the raw data points, colored by class
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')
plt.title('Data points of the 2 classes')
plt.show()
[Figure: data points of the 2 classes]
import numpy as np

# plot the data again before computing the decision boundary
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='copper')
plt.show()
[Figure: scatter plot of the data points, copper colormap]
# fit a linear-kernel SVM classifier to the data
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(inputs, targets)
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
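With the model fitted, it can classify new points right away. The two test points below are made up purely for illustration; predict returns the class label for each row:

# classify a couple of hypothetical new points
print(svm_model.predict([[0.5, 3.0], [2.5, 1.0]]))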
w = svm_model.coef_[0]  # weight vector of the fitted hyperplane

# slope of the boundary: w[0]*x + w[1]*y + b = 0 rearranges to y = -(w[0]/w[1])*x - b/w[1]
a = -w[0] / w[1]

# make the x-axis space for the line (set the limits according to the plot axes)
X = np.linspace(-1, 5)

# hyperplane formula
y = a * X - svm_model.intercept_[0] / w[1]

# plot the decision boundary
plt.plot(X, y, label='decision boundary')  # plotting the hyperplane line
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')  # c = color sequence
plt.legend()
plt.xlim([-1, 4])
plt.title('Plot with optimal decision boundary')
plt.show()
[Figure: plot with the optimal decision boundary]
# the training points the model selected as support vectors
svm_model.support_vectors_
array([[0.35482006, 2.9172298 ],
       [2.56509832, 3.28573136],
       [0.40706768, 3.09538951],
       [1.23408114, 2.25819849],
       [1.70664481, 2.2483361 ],
       [1.45240954, 2.23470913]])
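To visualise the maximum margin itself, the dashed lines in the sketch below sit at a distance of 1/||w|| on either side of the hyperplane. This reuses X, y, a and svm_model from the code above:

# half-margin width is 1/||w||
margin = 1 / np.sqrt(np.sum(svm_model.coef_ ** 2))

# the margin lines run parallel to the boundary, shifted vertically by margin*sqrt(1 + a^2)
y_down = y - np.sqrt(1 + a ** 2) * margin
y_up = y + np.sqrt(1 + a ** 2) * margin

plt.plot(X, y, 'k-')        # decision boundary
plt.plot(X, y_down, 'k--')  # lower margin line
plt.plot(X, y_up, 'k--')    # upper margin line
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')
plt.scatter(svm_model.support_vectors_[:, 0], svm_model.support_vectors_[:, 1],
            s=120, facecolors='none', edgecolors='k')  # circle the support vectors
plt.xlim([-1, 4])
plt.show()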

As you can see, it is easy to implement SVM with scikit-learn. Pick your data set and give it a try.
