A no-nonsense, 30,000-foot overview of Support Vector Machines, concisely explained with some great diagrams. We will follow a similar process to our recent post, Naive Bayes for Dummies: A Simple Explanation, by keeping it short and not overly technical. The aim is to give those of you who are new to machine learning a basic understanding of the key concepts of this algorithm.
A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and as such, that is what we will focus on in this post. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes. Support vectors are the data points nearest to the hyperplane: the points of a dataset that, if removed, would alter the position of the dividing hyperplane.
Because of this, they can be considered the critical elements of a dataset. As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data. Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
So when new testing data is added, whichever side of the hyperplane it lands on decides the class we assign to it. The distance between the hyperplane and the nearest data point from either set is known as the margin.
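To make this concrete, here's a minimal sketch in scikit-learn (the toy points and variable names are my own, not from any of the posts referenced here) that fits a linear SVM, pulls out the support vectors, and recovers the margin from the learned weights:

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters of 2-D points (made up for illustration)
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# The support vectors are the points nearest the dividing hyperplane
print(clf.support_vectors_)

# For a linear SVM, the margin width is 2 / ||w||
w = clf.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
```

New points are then classified by which side of the hyperplane they fall on, i.e. the sign of the decision function.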
Suppose you're a farmer with a pasture full of cows and a wolf problem: where do you build your fence? Well, if you're a really data-driven farmer, one way you could do it would be to build a classifier based on the position of the cows and wolves in your pasture. Racehorsing a few different types of classifiers, we see that SVM does a great job at separating your cows from the packs of wolves. These plots also do a nice job of illustrating the benefits of using a non-linear classifier: you can see that the logistic regression and decision tree models both only make use of straight lines.
Want to create these plots for yourself? You can run the code in your terminal or in an IDE of your choice, but, big surprise, I'd recommend Rodeo. It has a great pop-out plot feature that comes in handy for this type of analysis.
It also ships with Python already included for Windows machines. Besides that, it's now lightning fast thanks to the hard work of TakenPilot. Make sure you've set your working directory to where you saved the data file. Alright, now just copy and paste the code below into Rodeo and run it, either line by line or as the entire script. Don't forget, you can pop out your plots tab, move your windows around, or resize them.
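The original script didn't survive the excerpt, but a minimal sketch in the same spirit looks like the following. Note that the synthetic cow and wolf positions below are a made-up stand-in for the original data file, and the model settings are illustrative defaults rather than the post's exact choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the cows-and-wolves data:
# two clusters of 2-D positions in the "pasture"
rng = np.random.RandomState(0)
cows = rng.normal(loc=[2.0, 2.0], scale=0.8, size=(40, 2))
wolves = rng.normal(loc=[5.0, 5.0], scale=0.8, size=(40, 2))
X = np.vstack([cows, wolves])
y = np.array([0] * 40 + [1] * 40)

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma="auto"),
}

# Draw each model's decision regions over a grid of pasture coordinates
xx, yy = np.meshgrid(np.linspace(-1, 8, 300), np.linspace(-1, 8, 300))
for i, (name, model) in enumerate(models.items(), start=1):
    model.fit(X, y)
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.subplot(1, 3, i)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.title(name)
plt.show()
```

The boundary between the coloured regions in each panel is where you'd build your fence.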
In the event that the relationship between the dependent variable and the independent variables is non-linear, a straight-line model like logistic regression is not going to be nearly as accurate as an SVM. If you're still having trouble picturing this, see if you can follow along with this example.
Let's say we have a dataset that consists of green and red points. When plotted with their coordinates, the points make the shape of a red circle with a green outline and look an awful lot like Bangladesh's flag. But what type of model do we use? Let's try out a few, sketched below, and then take a look at what our predicted shapes look like. Follow along in Rodeo by copying and running the code!
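Again, the original code isn't reproduced here; this is a stand-in that builds a comparable circle-in-a-ring dataset with scikit-learn's make_circles and scores the same three kinds of models:

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# A filled circle of one class surrounded by a ring of the other:
# deliberately not linearly separable
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

for name, model in [("logistic regression", LogisticRegression()),
                    ("decision tree", DecisionTreeClassifier()),
                    ("SVM (RBF kernel)", SVC(kernel="rbf", gamma=2.0))]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

On data like this, the straight-line boundary of logistic regression hovers near chance, the decision tree approximates the circle with axis-aligned boxes, and the RBF-kernel SVM traces it almost exactly.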
From the plots, it's pretty clear that SVM is the winner. But why? The trick is the kernel: by adding a third feature such as z = x² + y², the SVM lifts the 2-D points into 3-D space, where the circular data becomes linearly separable, and it can then divide the dataset into classes with a flat boundary. Since we are in 3-D space, the separating hyperplane looks like a plane parallel to the x-y plane at some height z. A quick demonstration of this lifting is sketched below.
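This sketch performs the lift by hand so you can see it working (make_circles again stands in for the red-and-green dataset):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Lift the 2-D points into 3-D with z = x^2 + y^2.
# Inner-circle points get a small z, outer-ring points a large z,
# so a flat plane at some constant height separates the classes.
z = (X ** 2).sum(axis=1)
X3 = np.column_stack([X, z])

# A plain linear SVM now separates what was circular in 2-D
clf = SVC(kernel="linear")
clf.fit(X3, y)
print("training accuracy:", clf.score(X3, y))
```

In practice you don't do this mapping yourself: choosing kernel='rbf' (or 'poly') lets the SVM work in such a lifted space implicitly.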
Now we will implement the SVM algorithm using Python. The training set will be fitted to the SVM classifier; a sketch of the code is below. We use a linear kernel here, though we can change it for non-linear data, and the model's performance can be altered by changing the value of C (the regularization factor), gamma, and the kernel. The resulting output appears similar to the logistic regression output, but the classifications are slightly more accurate, so we can say that our SVM model improved on the logistic regression model.
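Here's the usual scikit-learn pattern for that step. The variable names x_train, y_train, and x_test are assumptions on my part, standing in for whatever the pre-processing step (shown later in this post) produced:

```python
# Fitting the SVM classifier to the training set
from sklearn.svm import SVC

classifier = SVC(kernel='linear', random_state=0)  # linear kernel, as in the text
classifier.fit(x_train, y_train)                   # x_train / y_train assumed from pre-processing

# Predicting the test set results
y_pred = classifier.predict(x_test)
```

For non-linear data, you would swap in something like SVC(kernel='rbf', C=1.0, gamma='scale') and tune C and gamma.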
In the output, we get a straight line as the hyperplane because we used a linear kernel in the classifier. And as we discussed above, for 2-D space the hyperplane in SVM is a straight line.
As we can see in the output image, the SVM classifier has divided the users into two regions, Purchased and Not Purchased. Users who purchased the SUV are in the red region with the red scatter points, and users who did not purchase the SUV are in the green region with the green scatter points. The hyperplane has thus separated the two classes, Purchased and Not Purchased.
For reference, the implementation begins with the data pre-processing step:

```python
# Data Pre-processing Step
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
```
```python
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
```

The test set result is then visualized with matplotlib; a sketch of that pattern is below.
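The visualization code itself didn't survive the excerpt, but the standard meshgrid pattern that produces the red and green regions described above looks roughly like this (x_test, y_test, and classifier are assumed to come from the earlier steps, and the axis labels are placeholders):

```python
# Visualizing the test set result as coloured decision regions
import numpy as nm
import matplotlib.pyplot as mtp
from matplotlib.colors import ListedColormap

x_set, y_set = x_test, y_test  # assumed NumPy arrays from the pre-processing step

# Evaluate the classifier over a fine grid covering the test points
x1, x2 = nm.meshgrid(
    nm.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
    nm.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01))
grid_preds = classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape)

# Colour each region by its predicted class, then overlay the actual points
# (swap the colour order if it doesn't match your label encoding)
mtp.contourf(x1, x2, grid_preds, alpha=0.75, cmap=ListedColormap(('red', 'green')))
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Test set)')
mtp.xlabel('feature 1')  # placeholder axis labels
mtp.ylabel('feature 2')
mtp.legend()
mtp.show()
```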