July 16, 2022 05:40 am GMT

A foray into SVM w/ Scikit-Learn

Hey, Josiah here. Today I'll be discussing SVMs (Support Vector Machines), a supervised machine learning method commonly used for classification and regression analysis. First off, we must discuss how an SVM works from a theoretical standpoint.

[Figure: A simple example of an SVM classifying data, showing three candidate hyperplanes H1, H2, and H3 separating two classes of points]
Suppose some given data points each belong to one of two classes (Black & White), and the goal is to decide which class a new data point will be in.

The three lines seen (H1, H2, and H3) are known as Hyperplanes, which divide our data into classes. To define a hyperplane, one must follow a simple procedure:

Draw a line between the two classes such that the closest data points from each class (known as the support vectors) are equidistant from it. Any such line defines a hyperplane.

However, it soon becomes apparent that an infinite number of hyperplanes can be derived. The obvious question arises: how do we determine which hyperplane to use? The answer is that we use the hyperplane which maximizes the distance to the support vectors. This maximized gap between the two classes is called the margin; the hyperplane that achieves it is known as the maximum-margin hyperplane, and it is the ideal hyperplane for predicting the class of future data points.
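To make this concrete, here is a minimal sketch of the idea in Scikit-learn. The six 2-D points and the choice of a linear kernel are my own illustrative assumptions, not from the article; the point is that after fitting, the classifier exposes the support vectors and the hyperplane coefficients directly:

```python
import numpy as np
from sklearn import svm

# Two tiny, linearly separable classes (illustrative points only)
X = np.array([[1, 1], [2, 1], [1, 2],   # class 0 ("Black")
              [4, 4], [5, 4], [4, 5]])  # class 1 ("White")
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel keeps the decision boundary a straight line
clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# The points nearest the boundary are the support vectors
print(clf.support_vectors_)

# The hyperplane is the set of points x where w.x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)
```

Points on either side of w·x + b = 0 are assigned to opposite classes; clf.predict applies exactly this rule to new points.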

Now that we have discussed the basic theoretical concepts of SVM classification, let us get into a simple example involving the breast-cancer dataset from Scikit-learn. (Relatively trivial code snippets will be captioned)

import sklearn
from sklearn import datasets, metrics, model_selection, svm

cn = datasets.load_breast_cancer()  # load the data set
x = cn.data    # set x to the entries (features)
y = cn.target  # set y to the labels

Here, we set two variables x and y equal to our data's entries and labels, respectively. We will need these in the next snippet to separate the data further.

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.2)
We use Scikit-learn's model_selection.train_test_split method to take our entries and labels and split them into training and testing subsets. By setting test_size to 0.2, we allocate 20% of our data for testing purposes, and the other 80% for training purposes. Note that it's not recommended to set test_size to be greater than 0.3.
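As a quick sanity check, the split sizes can be verified directly. The random_state argument here is my own addition, used only to make the split reproducible; it is not part of the article's code:

```python
from sklearn import datasets, model_selection

cn = datasets.load_breast_cancer()
x, y = cn.data, cn.target

# random_state is optional; it just makes the 80/20 split reproducible
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x, y, test_size=0.2, random_state=42)

# The dataset has 569 rows; 20% (rounded up) go to the test set
print(x_train.shape[0], x_test.shape[0])  # 455 training rows, 114 testing rows
```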

# Create an SVC
clf = svm.SVC()

# train the model on the training set
clf.fit(x_train, y_train)

Here, we create a Support Vector Classifier (SVC) and assign it to the variable clf. Next, we train the model on the training entries and labels (the 80% of the dataset left over after test_size was set to 0.2 previously).
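For what it's worth, svm.SVC() with no arguments defaults to an RBF kernel with C=1.0. Trying a different kernel is a one-line change; the sketch below (my own experiment, not part of the article's walkthrough, with random_state added for reproducibility) fits a linear kernel instead and scores it:

```python
from sklearn import datasets, model_selection, svm

cn = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    cn.data, cn.target, test_size=0.2, random_state=0)

# SVC() defaults to kernel="rbf", C=1.0; a linear kernel is a
# common alternative worth trying on this dataset
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(x_train, y_train)

# score() reports mean accuracy on the held-out 20%
print(clf.score(x_test, y_test))
```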

# Predict the labels for the testing data
y_predict = clf.predict(x_test)

# compare the predicted labels to the actual labels and convert to a percentage for output
acc = metrics.accuracy_score(y_test, y_predict)*100
print(f"{acc}% accuracy")

We set a variable y_predict equal to the results of the classifier predicting the labels of the testing entries. We then use the sklearn.metrics.accuracy_score() method to compare the actual testing labels to those in y_predict, which was the machine's attempt at classifying them. We convert this to a percentage by multiplying by 100 and print the result. The accuracy you should get is ~96%, which is very good!
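If you want more detail than a single accuracy number, Scikit-learn's metrics module can break the result down per class. This sketch repeats the article's steps (random_state is my own addition for reproducibility) and adds a confusion matrix and classification report:

```python
from sklearn import datasets, metrics, model_selection, svm

cn = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    cn.data, cn.target, test_size=0.2, random_state=0)

clf = svm.SVC()
clf.fit(x_train, y_train)
y_predict = clf.predict(x_test)

# Rows = actual class, columns = predicted class
print(metrics.confusion_matrix(y_test, y_predict))

# Per-class precision, recall, and f1-score
print(metrics.classification_report(y_test, y_predict,
                                    target_names=cn.target_names))
```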

If you enjoy articles like this, find them useful, or have any feedback, message me on Discord @Necloremius#6412


Original Link: https://dev.to/omhaisoj/a-foray-into-svm-w-scikit-learn-32oh
