SVM: Support Vector Machine is a widely used method for classification. It can classify both linearly and non-linearly separable data. SVM was originally designed for binary classification. In this post you will learn to implement SVM with scikit-learn in Python.
SVM Working and Objective
Its objective is to find a hyperplane that separates the tuples of the different classes. The SVM classifier finds the data points closest to the hyperplane (in the linear case, a line) and measures the distance between these support vectors and the hyperplane. This distance is called the margin. The job of the SVM classifier is to find the hyperplane with the maximum margin.
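For a fitted linear model, this margin can be computed directly from the learned weights. The snippet below is a minimal sketch of my own (it reuses the make_blobs data from the example later in this post); for a linear SVM with weight vector w, the margin width is 2/||w||:

# A minimal sketch (not from the original post): the margin width of a
# fitted linear SVC is 2 / ||w||, where w = coef_[0].
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)
model = SVC(kernel='linear', C=1.0).fit(X, y)

w = model.coef_[0]                          # weight vector of the separating hyperplane
print('margin width:', 2 / np.linalg.norm(w))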
Working of SVM
- A non-linear mapping is used to transform the data into a higher dimension for the purpose of classification
- An optimal linear hyperplane is searched for
- The hyperplane is found using support vectors in an N-dimensional space
- In the case of a linear function, an output of 1 or more assigns the point to one class, while an output of -1 or less assigns it to the other (see the sketch just after this list)
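To make the last point concrete, here is a minimal sketch of my own (reusing the make_blobs data from the example below): the sign of decision_function decides the class, and it agrees with predict:

# My own illustrative sketch, not from the original post.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)
model = SVC(kernel='linear', C=1.0).fit(X, y)

scores = model.decision_function(X[:5])   # signed scores for 5 points
print((scores > 0).astype(int))           # 1 where positive, 0 where negative
print(model.predict(X[:5]))               # the predictions agree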
A hyperplane can be of 2 types depending on the number of input features: if there are 2 input features the hyperplane is a line, and if there are 3 features it is a 2-D plane.
Support Vectors: Support vectors are the data points used to find the optimal hyperplane. These points lie closest to the hyperplane, and any alteration of the support vectors can change the location of the hyperplane, as the sketch below illustrates.
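This dependence on the support vectors alone can be checked directly. Below is a sketch of my own (not from the original post): refitting the model on only the support vectors should reproduce essentially the same hyperplane, since the remaining points do not constrain it:

# My own illustrative sketch: retrain using only the support vectors.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)
full_model = SVC(kernel='linear', C=1.0).fit(X, y)

sv = full_model.support_                          # indices of the support vectors
sv_model = SVC(kernel='linear', C=1.0).fit(X[sv], y[sv])

print(full_model.coef_, full_model.intercept_)    # near-identical coefficients
print(sv_model.coef_, sv_model.intercept_)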
Advantages of SVM with scikit-learn
- Can be used for regression (SVR) as well as classification (see the sketch after this list)
- Highly accurate
- Less prone to overfitting than many other classification methods
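On the first point, scikit-learn exposes the regression variant as SVR. A minimal sketch of my own, with synthetic data (not from the original post):

# My own illustrative sketch: support vector regression on a noisy sine curve.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(40)

svr_model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr_model.fit(X, y)
print(svr_model.predict([[2.5]]))   # predicted value at a new input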
Disadvantages of SVM with scikit-learn
- Training is slow
- Not recommended for noisy data
- When the dataset is very large, a specialized variant such as a clustering-based SVM or a dedicated linear solver is typically used instead (see the sketch after this list)
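On the last point, a practical option within scikit-learn itself is LinearSVC, which uses a specialized linear solver rather than the kernel machinery of SVC. A minimal sketch of my own (not from the original post):

# My own illustrative sketch: LinearSVC scales to large sample counts far
# better than the kernel-based SVC.
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=100000, centers=2, random_state=0, cluster_std=0.7)
fast_model = LinearSVC(C=1.0, dual=False)   # primal solver suits n_samples >> n_features
fast_model.fit(X, y)
print(fast_model.score(X, y))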
Below is an example of linear classification with SVM in scikit-learn. It implements a simple classifier for 2 classes whose data points are well separated.
#import all the necessary libraries as shown below:
from sklearn.svm import SVC
import seaborn as sns
sns.set()
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
inputs, targets = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.7)
plt.scatter(inputs[:,0], inputs[:,1], c=targets, cmap='winter')
plt.title('Data points of the 2 classes')
plt.show()
import numpy as np

#fit a linear SVM classifier to the data
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(inputs, targets)
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
w = svm_model.coef_[0]
# slope of the boundary line from the weight vector
a = -w[0] / w[1]
# make the x-axis space for the line (set the limits according to the plot axes)
X = np.linspace(-1, 5)
# hyperplane formula: w[0]*x + w[1]*y + intercept = 0  =>  y = a*x - intercept/w[1]
y = a * X - svm_model.intercept_[0] / w[1]
# plot the decision boundary
plt.plot(X, y, label='decision boundary')
# plot the data points (c = color sequence)
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')
plt.legend()
plt.xlim([-1, 4])
plt.title('Plot with optimal decision boundary')
plt.show()
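The two margin boundaries can be drawn with the same formula, offset by +/-1 in the decision function. This extension is a sketch of my own (it assumes the w, a, X and svm_model variables from the step above):

# My own extension of the plot above (assumes w, a, X, svm_model).
# The margins satisfy w.x + b = +/-1, i.e. y = a*x - (b -/+ 1)/w[1].
b = svm_model.intercept_[0]
plt.plot(X, a * X - b / w[1], label='decision boundary')
plt.plot(X, a * X - (b - 1) / w[1], 'k--', label='margin')
plt.plot(X, a * X - (b + 1) / w[1], 'k--')
plt.scatter(inputs[:, 0], inputs[:, 1], c=targets, cmap='winter')
plt.legend()
plt.xlim([-1, 4])
plt.title('Decision boundary with margins')
plt.show()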
#inspect the support vectors found by the model
svm_model.support_vectors_
array([[0.35482006, 2.9172298 ],
[2.56509832, 3.28573136],
[0.40706768, 3.09538951],
[1.23408114, 2.25819849],
[1.70664481, 2.2483361 ],
[1.45240954, 2.23470913]])
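These six points are what pin the hyperplane in place. As a final sanity check of my own (not in the original post), the trained model can classify new, unseen points with predict; the coordinates below are chosen arbitrarily for illustration:

# My own addition: classify two hypothetical new points.
new_points = np.array([[0.5, 4.0],
                       [2.5, 1.0]])
print(svm_model.predict(new_points))   # class label for each new point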
As you can see, it is easy to implement SVM with scikit-learn. Pick your own dataset and give it a try.