You’ll Finally Understand What a Support Vector Machine Is and How It Works

Lucasvittal
5 min read · Jun 10, 2022

Today's subject is the support vector machine (SVM), a widely used machine learning algorithm whose flexibility makes it fit very well in a lot of problems. Soon we will see why it is so powerful, but for now, let's start with the fundamentals to build a good understanding of the subject.

Getting the Fundamentals

The fundamental idea of support vector machine classification is maximizing the width of a linear street. Yes, the fundamental object of this algorithm is a linear region that we want to make as wide as possible, and we call it the street.

As a first problem, assume we are solving a linearly separable classification task. Mathematically speaking, the main goal of the algorithm is to find a hyperplane positioned so that the street width is as large as possible.

A general hyperplane (the line in the middle of the street) can be expressed as:
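In standard notation, with weight vector w and bias term b, that is:

\mathbf{w}^T \mathbf{x} + b = 0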

Considering a hard margin, a criterion that does not allow any instance inside the street, the bounds of the street can be defined as:
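These are the two margin lines, one unit away from the decision boundary on each side (the scale of w is what fixes the "one unit"):

\mathbf{w}^T \mathbf{x} + b = 1 \qquad \text{and} \qquad \mathbf{w}^T \mathbf{x} + b = -1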

Therefore, we have a model we can visualize: the decision boundary running down the middle of the street, with one margin line on each side.

In this manner, any instance on one side of the street is classified as 1, and any instance on the other side is classified as 0. To simplify the math, we treat

\mathbf{w}^T \mathbf{x} + b \ge 1

as a positive classification and

\mathbf{w}^T \mathbf{x} + b \le -1

as a negative classification, and therefore we have constrained our solution:
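Defining a target t^{(i)} that equals +1 for positive instances and −1 for negative ones (the usual SVM convention), both inequalities collapse into a single constraint:

t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) \ge 1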

And finally, we get the model of our optimization problem:
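Since the street width is 2/‖w‖, maximizing it is equivalent to minimizing ½‖w‖², which gives the classic hard-margin formulation:

\min_{\mathbf{w},\, b} \ \frac{1}{2} \mathbf{w}^T \mathbf{w} \quad \text{subject to} \quad t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) \ge 1, \quad i = 1, \dots, m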

Not every problem has its instances arranged so that none falls inside the street, so a soft-margin approach fits better in those cases. We therefore need to add parameters to our optimization problem that say how much each instance may violate the street boundaries. These variables are called slack variables, and with them our optimization problem becomes:
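Each instance gets a slack ζ^{(i)} ≥ 0 measuring its margin violation, and a hyperparameter C trades street width against the total amount of violation:

\min_{\mathbf{w},\, b,\, \boldsymbol{\zeta}} \ \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum_{i=1}^{m} \zeta^{(i)} \quad \text{subject to} \quad t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) \ge 1 - \zeta^{(i)}, \quad \zeta^{(i)} \ge 0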

If you are a little more familiar with math, you will recognize that the optimization problem above matches the shape of a Lagrangian formulation:
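For the hard-margin case, the generalized Lagrangian attaches one multiplier α^{(i)} ≥ 0 to each constraint:

\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2} \mathbf{w}^T \mathbf{w} - \sum_{i=1}^{m} \alpha^{(i)} \left[ t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) - 1 \right]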

So, applying this to our problem, and remembering that we want a minimization, the final constraints (the KKT conditions) are defined as:
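At the solution, the multipliers must be non-negative, the original constraints must hold, and a multiplier can be non-zero only for instances sitting exactly on the margin (the support vectors):

\alpha^{(i)} \ge 0, \qquad t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) \ge 1, \qquad \alpha^{(i)} \left[ t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) - 1 \right] = 0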

As a consequence, the w and b values that satisfy our optimization conditions are given by:
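Setting the partial derivatives of the Lagrangian with respect to w and b to zero gives:

\hat{\mathbf{w}} = \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} \mathbf{x}^{(i)}, \qquad \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} = 0

and b can then be recovered from any support vector k (any instance with α^{(k)} > 0) as \hat{b} = t^{(k)} - \hat{\mathbf{w}}^T \mathbf{x}^{(k)}.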

This gives the final form of the function that needs to be minimized in order to define the street's middle line and its margins:
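Substituting ŵ back into the Lagrangian eliminates w and b and leaves the dual problem, which depends on the training instances only through dot products:

\min_{\boldsymbol{\alpha}} \ \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} \, \mathbf{x}^{(i)T} \mathbf{x}^{(j)} - \sum_{i=1}^{m} \alpha^{(i)} \quad \text{subject to} \quad \alpha^{(i)} \ge 0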

A General Form of SVM

Linear models fit only a restricted set of problems well. In the cases where a linear model does not fit, it is necessary to apply a transformation to the instances' numerical data so that the problem becomes solvable. This must be done in an intelligent manner; otherwise, the transformation can make things intractable because of high computational complexity. A naive way would be to transform each instance one by one, which would be tremendously expensive computationally. Therefore, the instances are not transformed one by one; instead, the transformed dot product is computed directly on pairs of instances, as in the example below:
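The key observation from the dual above is that the instances appear only inside dot products x^{(i)T} x^{(j)}. A kernel K computes the dot product of the transformed pair directly, without ever building φ(x):

K(\mathbf{a}, \mathbf{b}) = \phi(\mathbf{a})^T \phi(\mathbf{b})

Two common examples:

\text{Polynomial:} \quad K(\mathbf{a}, \mathbf{b}) = \left( \gamma\, \mathbf{a}^T \mathbf{b} + r \right)^d \qquad \text{Gaussian RBF:} \quad K(\mathbf{a}, \mathbf{b}) = \exp\!\left( -\gamma \left\| \mathbf{a} - \mathbf{b} \right\|^2 \right)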

This is called the kernel trick, and it is how a general SVM works. The kernel is what you choose when an SVM instance is created in your Python program, for example:
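Here is a minimal sketch using scikit-learn (one common choice of library; the dataset and hyperparameter values are purely illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A toy dataset that is not linearly separable
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# The kernel is chosen when the SVC instance is created:
# 'linear', 'poly', 'rbf' (Gaussian), 'sigmoid', ...
svm_clf = make_pipeline(
    StandardScaler(),  # SVMs are sensitive to feature scales
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
svm_clf.fit(X, y)
print(svm_clf.score(X, y))  # training accuracy
```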

And then the SVM optimization in its general form is defined through the minimization of the function below:
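It is the same Lagrangian as before, but with every instance passed through the (implicit) transformation φ:

\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2} \mathbf{w}^T \mathbf{w} - \sum_{i=1}^{m} \alpha^{(i)} \left[ t^{(i)} \left( \mathbf{w}^T \phi(\mathbf{x}^{(i)}) + b \right) - 1 \right]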

As previously considered, we use the fact that the minimum is a point where the derivatives are 0:
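\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} \phi(\mathbf{x}^{(i)}) = 0, \qquad \frac{\partial \mathcal{L}}{\partial b} = -\sum_{i=1}^{m} \alpha^{(i)} t^{(i)} = 0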

That way, we get the conditions for minimizing L(w, b, α):
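\mathbf{w} = \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} \phi(\mathbf{x}^{(i)}), \qquad \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} = 0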

and the final form of the optimization problem is finally defined:
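Substituting back and writing every dot product φ(x^{(i)})^T φ(x^{(j)}) as K(x^{(i)}, x^{(j)}) gives the kernelized dual (the upper bound C on the multipliers comes from the soft margin):

\min_{\boldsymbol{\alpha}} \ \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} K(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) - \sum_{i=1}^{m} \alpha^{(i)} \quad \text{subject to} \quad 0 \le \alpha^{(i)} \le C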

And voilà: different kernels produce visibly different decision boundaries on the same multiclass classification problem.
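You can reproduce this kind of comparison yourself; here is a quick sketch with scikit-learn's SVC (which handles multiclass problems internally via one-vs-one):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A small 3-class problem
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same data, different kernels
for kernel in ("linear", "poly", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    print(f"{kernel}: {clf.score(X_test, y_test):.3f}")
```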

One last thing: everything presented here was about classification. But an SVM can be used as a regressor as well; in that case, you just reverse the optimization objective. In other words, instead of fitting the widest possible street between two classes, the goal is to fit as many instances as possible on the street itself, whose width is set by a hyperparameter ε; the rest is exactly the same.
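In scikit-learn this is the SVR class (again, one common implementation; the data below is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.svm import SVR

# Noisy 1-D regression target
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# epsilon sets the street width: instances inside the street
# contribute no loss, so a wider street gives a flatter model.
svm_reg = SVR(kernel="rbf", epsilon=0.1, C=1.0)
svm_reg.fit(X, y)
print(svm_reg.score(X, y))  # R^2 on the training data
```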

I hope this gave you a good understanding of support vector machines, and that you enjoyed it.

See you in the next lecture.
