Introduction to SVM
Support vector machines (SVMs) are powerful
yet flexible supervised machine learning algorithms used for both
classification and regression, though they are generally applied to classification
problems. SVMs were first introduced in the 1960s and later refined in the
1990s. They have a unique way of implementation compared to other machine
learning algorithms. Lately, they have become extremely popular because of their
ability to handle multiple continuous and categorical variables.
Working of SVM
An SVM model is essentially a representation of different classes separated by a hyperplane in a multidimensional space. SVM generates the hyperplane iteratively so that the classification error is minimized. The goal of SVM is to divide the datasets into classes by finding a maximum marginal hyperplane (MMH).
The following are important concepts in SVM −
- Support Vectors − Data points that are closest to
the hyperplane are called support vectors. The separating line is defined
with the help of these data points.
- Hyperplane − It is the decision plane or boundary that divides a set of
objects belonging to different classes.
- Margin − It may be defined as the gap between two lines drawn on the
closest data points of different classes. It can be calculated as the
perpendicular distance from the line to the support vectors. A large margin
is considered a good margin and a small margin is considered a bad
margin.
The main goal of SVM is to divide the datasets
into classes by finding a maximum marginal hyperplane (MMH), and it can be done in
the following two steps −
- First, SVM generates hyperplanes
iteratively that segregate the classes in the best way.
- Then, it chooses the hyperplane that separates the classes with the maximum margin, as shown in the sketch below.
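As a concrete illustration, here is a minimal sketch of fitting a linear SVM and reading out the support vectors that define the hyperplane. It assumes scikit-learn and NumPy (dependencies not mentioned in the original text), and the toy data points are made up for the example.

```python
# Minimal sketch: fit a linear SVM on toy data and inspect the support vectors
# that define the maximum marginal hyperplane (scikit-learn assumed available).
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)  # points closest to the hyperplane
print("Prediction for [4, 4]:", clf.predict([[4, 4]]))
```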
SVM Kernels
In practice, the SVM algorithm is implemented with a kernel that transforms an input data space into the required form. SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In simple words, the kernel converts a non-separable problem into a separable problem by adding more dimensions to it. This makes SVM more powerful, flexible and accurate. The following are some of the types of kernels used by SVM.
Linear Kernel: It is simply the dot product between any
two observations. The formula of the linear kernel is as below −
K(x, xi) = sum(x * xi)
From the above formula, we can see that the kernel between two vectors x and xi is the sum of the products of each pair of input values, i.e. their dot product.
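As a quick check of the formula, the following sketch (using NumPy, an assumed dependency) computes sum(x * xi) by hand and confirms it equals the dot product; the example vectors are arbitrary.

```python
# The linear kernel is just the dot product: sum of pairwise products.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
xi = np.array([4.0, 5.0, 6.0])

k_sum = np.sum(x * xi)   # sum(x * xi) as in the formula above
k_dot = np.dot(x, xi)    # equivalent dot product

print(k_sum, k_dot)      # both print 32.0
```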
Polynomial Kernel: It is a more generalized form of the linear kernel
and can distinguish curved or nonlinear input spaces. Following is the formula for
the polynomial kernel −
K(x, xi) = (1 + sum(x * xi))^d
Here d is the degree of the polynomial, which we need to specify manually in the learning algorithm.
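A small sketch of the formula above, with the exponent applied to the whole expression (1 + sum(x * xi))^d; the vectors and the degree d are made up for illustration, and NumPy is an assumed dependency.

```python
# Polynomial kernel: (1 + sum(x * xi)) raised to the degree d.
import numpy as np

def polynomial_kernel(x, xi, d=2):
    return (1.0 + np.dot(x, xi)) ** d

x = np.array([1.0, 2.0])
xi = np.array([0.5, -1.0])

print(polynomial_kernel(x, xi, d=2))  # (1 + (-1.5))**2 = 0.25
```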
Gaussian Kernel: It is used to perform transformation when
there is no prior knowledge about the data. Its formula is −
K(x, xi) = exp(-gamma * sum((x - xi)^2))
Here gamma is a parameter between 0 and 1 that controls the width of the kernel.
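A minimal sketch of computing this kernel by hand, assuming the form K(x, xi) = exp(-gamma * sum((x - xi)^2)) given above; the value of gamma and the example vectors are arbitrary, and NumPy is an assumed dependency.

```python
# Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance).
import numpy as np

def gaussian_kernel(x, xi, gamma=0.1):
    return np.exp(-gamma * np.sum((x - xi) ** 2))

x = np.array([1.0, 2.0])
xi = np.array([2.0, 4.0])

print(gaussian_kernel(x, xi))  # exp(-0.1 * 5) = exp(-0.5) ≈ 0.6065
```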
So far in this post we have been
discussing the hyperplane, so let's clarify its meaning before moving forward. The
hyperplane is the decision boundary used to separate the classes. In
2-D, the boundary that separates the classes is a line; in 3-D, the
boundary is called a plane; and similarly,
the boundary that separates points in higher dimensions is called a
hyperplane. Now that you know about the hyperplane, let's move back to SVM.
Let's say there are "m" dimensions:
thus the equation of the hyperplane in m
dimensions can be given as
y = W0 + W1·X1 + W2·X2 + … + Wm·Xm = b + sum(Wi * Xi)
where
- Wi = weight vector (W1, W2, W3, …, Wm)
- b = bias term (W0)
- Xi = input variables.
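A brief sketch of recovering these quantities from a fitted linear SVM, assuming scikit-learn's SVC: the learned weights Wi are exposed as coef_ and the bias b as intercept_, so y = b + sum(Wi * Xi) can be evaluated by hand. The toy data is made up for illustration.

```python
# Read the hyperplane parameters from a fitted linear SVM and evaluate
# y = b + sum(Wi * Xi) by hand; the sign of y gives the predicted class.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 1], [6, 5], [7, 7]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
w = clf.coef_[0]        # weights (W1, ..., Wm)
b = clf.intercept_[0]   # bias term (W0)

x_new = np.array([4.0, 4.0])
score = np.dot(w, x_new) + b
print("w =", w, "b =", b, "score =", score)  # same value as clf.decision_function
```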
Properties of SVM
- Flexibility in choosing a similarity (kernel) function
- Sparseness of the solution when dealing with large data sets: only the support vectors are used to specify the separating hyperplane
- Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space
- Overfitting can be controlled by the soft margin approach (see the sketch after this list)
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
- Feature selection
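The soft margin mentioned above is controlled in scikit-learn's SVC (an assumed choice of library) by the C parameter; the following sketch simply shows how the number of support vectors changes as C varies on made-up data.

```python
# Soft-margin trade-off: smaller C tolerates more margin violations (wider
# margin), larger C penalizes them harder (tighter fit, risk of overfitting).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) - [2, 2], rng.randn(20, 2) + [2, 2]]
y = np.array([0] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_vectors_)} support vectors")
```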
Disadvantages:
- The SVM algorithm is not suitable for large
data sets.
- SVM does not perform very well when the
data set has more noise, i.e. when the target classes overlap.
- In cases where the number of features for
each data point exceeds the number of training data samples, the SVM will
underperform.
- As the support vector classifier works by
placing data points above and below the classifying hyperplane, there is
no direct probabilistic explanation for the classification.