Support vector machines with scikit-learn

Sun, Mar 6, 2016

Support vector machines (SVMs) are supervised learning models used for classification and regression. For classification, the data is represented as points in space, and an SVM classifier (SVC) separates the classes by a gap that is as wide as possible. For this reason, SVMs are known as maximum margin classifiers.

To illustrate the SVC algorithm, we generate random points in two dimensions arranged in two clusters. The full example is in a Jupyter (IPython) notebook in this repository.

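from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Generate 50 points in two Gaussian clusters
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)
# Fit a maximum margin classifier with a linear decision boundary
clf = SVC(kernel='linear')
clf.fit(X, y)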

Multiple lines can be drawn to separate the clusters. The black line is preferred to the red line as there is a larger margin between it and the nearest points.
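The idea of preferring the line with the larger margin can be checked numerically. The sketch below is only an illustration: the helper `geometric_margin`, the rotation angle, and the very large `C` (used to approximate a hard margin) are assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the line w.x + b = 0.

    The value is negative if the line misclassifies at least one point.
    """
    signs = np.where(y == 1, 1.0, -1.0)
    return np.min(signs * (X @ w + b)) / np.linalg.norm(w)

# Assumption: a very large C approximates a hard-margin classifier.
clf = SVC(kernel='linear', C=1e10).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Hypothetical second candidate: the same line with its normal vector
# rotated by 0.2 radians.
theta = 0.2
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
w_alt = rotation @ w

svc_margin = geometric_margin(w, b, X, y)
alt_margin = geometric_margin(w_alt, b, X, y)
print("SVC line margin:      ", svc_margin)
print("rotated line margin:  ", alt_margin)
```

The line found by the SVC should have a margin at least as large as that of the rotated candidate.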

The points nearest the boundary, which lie on the margin, are known as support vectors. The margins and the support vectors are plotted below.
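A fitted `SVC` exposes its support vectors directly. The following is a minimal sketch that rebuilds the same classifier and inspects them. The margin width formula `2 / ||w||` applies to the linear kernel, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)
clf = SVC(kernel='linear').fit(X, y)

# Coordinates of the support vectors and their row indices in X
support_vectors = clf.support_vectors_
support_indices = clf.support_

# For a linear kernel the margin boundaries are w.x + b = +1 and -1,
# so the distance between them is 2 / ||w||.
w = clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)

print("support vectors per class:", clf.n_support_)
print("support vectors:\n", support_vectors)
print("margin width:", margin_width)
```

Only these points determine the boundary. Points farther from the margin can move without changing the fit, as long as they do not cross it.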

With a linear kernel, support vector classifiers are linear classifiers. They do a poor job on datasets that are not linearly separable.
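To see how a linear SVC behaves on such data, here is a small sketch. It assumes a stand-in dataset from `make_circles` (one cluster surrounded by a ring), and the parameters are chosen only for illustration and may differ from the data in the original figure.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed stand-in data: an inner cluster surrounded by an outer ring
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel='linear').fit(X, y)

# Training accuracy of the best straight-line boundary the SVC can find
linear_acc = linear_clf.score(X, y)
print("linear SVC training accuracy:", linear_acc)
```

No straight line can put the inner cluster on one side and the whole ring on the other, so the accuracy stays well below 1.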

To create non-linear boundaries, we could map this two-dimensional data set to a higher dimension. For example, we could add the distance of each point from the origin as a third dimension. The two clusters then become easily separable by a plane.
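The lifting step can be sketched as follows. The `make_circles` data is again an assumed stand-in for the data in the figure, and the names `r` and `X_lifted` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed stand-in data: an inner cluster surrounded by an outer ring
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Third dimension: distance of each point from the origin
r = np.sqrt((X ** 2).sum(axis=1))
X_lifted = np.column_stack([X, r])

flat_acc = SVC(kernel='linear').fit(X, y).score(X, y)
lifted_acc = SVC(kernel='linear').fit(X_lifted, y).score(X_lifted, y)

print("linear SVC on 2-D data:", flat_acc)
print("linear SVC on 3-D data:", lifted_acc)
```

In the lifted space the inner cluster sits at small values of the third coordinate and the ring at large values, so a flat plane can separate them.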

Another example with non-linearly separable data.
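For data like this, scikit-learn's `SVC` also accepts `kernel='rbf'` (radial basis function), which produces a non-linear boundary without constructing the extra dimension by hand. The sketch below assumes `make_moons` as a stand-in dataset, since the original data for this example is not given. `C=10` is likewise an illustrative choice.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Assumed stand-in data: two interleaving half-circles
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

linear_acc = SVC(kernel='linear').fit(X, y).score(X, y)
rbf_acc = SVC(kernel='rbf', C=10).fit(X, y).score(X, y)

print("linear kernel training accuracy:", linear_acc)
print("RBF kernel training accuracy:   ", rbf_acc)
```

These are training accuracies only. A held-out test set would be needed to judge how well either boundary generalises.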

Figures: Two lines separating two clusters; Support vectors; Non-linearly separable data; Radial basis functions for SVC.