November 21, 2020 11:21 pm GMT

Your first Machine Learning REST API with Python/FastAPI

You will learn...

a basic workflow for creating a machine learning service: stating the problem space, cleaning the data, choosing and training a model, and finally deploying it to the web.

Requirements

Basic bash
Basic use of git/github
Basic Python 3.7

Data Science

You've probably seen this image before:

Data science is a hot topic these days and sits at the intersection of math, computing and business. That sounds nice and all, but how do you do data science?

Much like normal science has the scientific method, data science has a couple of methodologies that guide a project. One of the most widely used is CRISP-DM (Cross-Industry Standard Process for Data Mining).

Said process is described below:

We will be roughly going through each and every single one of these steps today.

Business Understanding

What do you want to achieve? Think of keywords that can help you find the data for your problem. Maybe you want to solve a global warming problem, so you look for data on floods in a certain region. Or maybe you want to speed up the identification of mushrooms at a biology lab, so you look for data describing the characteristics of different mushroom species. For now, don't think about the data; just think about what you want to solve.

For our use case, the objective will be: reducing school dropouts through a tool that can predict a student's grade based on a short survey.

Data Understanding

Before we go on we must first understand what kind of data we should look for.
There are three types of data:

Note: Unstructured data can also be images, videos and audio

For classic machine learning (which is the scope of this tutorial) the easiest data type to use is structured data, so that's what we'll be working with.
One of the best places to look for data is Kaggle, and since you can start a notebook from the same place, it makes everything a bit easier.

Kaggle is a social network for data scientists where you can find data, competitions, courses and the work of others.

Once in Kaggle you can go to the datasets section.

Here you can enter the keywords you came up with in the Business Understanding section.

I used the keyword education and found this data set. One of the most important things to check is the description of the dataset: make sure that it describes the columns well. For example:

Now click the **New Notebook** button, choose Python and Notebook in the options that follow, and finally click Create.

You will be redirected to a Notebook where we can start understanding our data.

Notebooks let you run code in cells alongside cells that render Markdown. This allows us to experiment with code easily while documenting our thought process. On Kaggle these are also known as Kernels.
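A first look at the data in a notebook cell might go roughly like this. It is only a sketch: the tiny inline CSV and the column names school, sex, age and absences are hypothetical stand-ins for the Kaggle file, and only G3 comes from the dataset discussed here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV: a few rows mixing
# categorical columns (school, sex) with numerical ones (age, absences, G3).
csv_text = """school,sex,age,absences,G3
GP,F,18,6,6
GP,M,17,4,10
MS,F,15,10,15
MS,M,16,2,11
"""

# In a Kaggle notebook you would pass the dataset's file path instead.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows of the table
print(df.dtypes)      # which columns were parsed as numbers and which as text
print(df.describe())  # summary statistics of the numerical columns

# Columns that pandas did not parse as numbers are our categorical candidates.
categorical_columns = df.select_dtypes(exclude="number").columns.tolist()
print(categorical_columns)
```

Checking the column types early tells you how much cleaning and encoding the next step will need.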

Data Preparation

Here is where we would clean our data, normalize it, and encode categorical data. For the time being, though, we'll just drop all of our categorical data, which is the most naive strategy.

First we select our target variable, G3 (the final grade), which is a numerical column. The rest of the table, without G3, gives us our input columns.

Then, using the previously imported train_test_split function, we split our table into two: a training dataset (70%) and a test dataset (30%).

The test rows are held back so that we can evaluate how well our model learned from data it has not seen.

Next we drop all of the categorical data from our training and test data, as well as G1 and G2 (we don't want to use past grades to predict the future grade).

With the columns attribute we see which columns are left.
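The preparation steps above can be sketched as follows. This is a minimal illustration rather than the original notebook code: the DataFrame is randomly generated, the columns school, age and absences are hypothetical, and random_state is set only to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the student table (random values, not real data).
rng = np.random.default_rng(0)
n_rows = 20
df = pd.DataFrame({
    "school": rng.choice(["GP", "MS"], size=n_rows),  # categorical column
    "age": rng.integers(15, 20, size=n_rows),
    "absences": rng.integers(0, 15, size=n_rows),
    "G1": rng.integers(0, 21, size=n_rows),
    "G2": rng.integers(0, 21, size=n_rows),
    "G3": rng.integers(0, 21, size=n_rows),
})

# The target is the final grade; everything else is a candidate input.
y = df["G3"]
X = df.drop(columns=["G3"])

# 70% of the rows for training, 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def keep_numeric_without_past_grades(frame):
    """Naive preparation: keep numerical columns only, then drop G1 and G2."""
    return frame.select_dtypes(include="number").drop(columns=["G1", "G2"])


X_train = keep_numeric_without_past_grades(X_train)
X_test = keep_numeric_without_past_grades(X_test)

print(X_train.columns.tolist())
```

Applying the same helper to both splits keeps the training and test tables aligned on exactly the same columns.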

Modeling and Evaluation

Time for the fun part!
We now instantiate a RandomForestRegressor and fit it to our training data with the fit method.

After that we'll predict on our test data, which gives us predicted grades for the test rows.

In order to evaluate our model we'll compare our actual test grades with our predicted grades using the mean absolute error (MAE), the average of the absolute differences between actual and predicted values:
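As a small illustration of the metric, the mean absolute error is the average of |actual - predicted| over all test rows. The grades below are made up for the example; the sketch computes the value by hand and with scikit-learn's mean_absolute_error to show they agree.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up grades, purely for illustration.
actual_grades = np.array([10, 12, 8, 15])
predicted_grades = np.array([11, 10, 8, 18])

# Absolute differences are 1, 2, 0 and 3, so the mean is 6 / 4.
manual_mae = np.mean(np.abs(actual_grades - predicted_grades))

# scikit-learn provides the same metric.
sklearn_mae = mean_absolute_error(actual_grades, predicted_grades)

print(manual_mae, sklearn_mae)
```

Because the error is expressed in grade points, it is easy to interpret: on average, these predictions are off by one and a half points.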

All of these steps are put together in the following lines of code.
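A sketch of how the split, fit, predict and evaluate steps might fit together is shown here. It assumes synthetic data in place of the student table: the columns studytime, failures and absences and the formula generating the grades are invented for the example, so the resulting error says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the prepared student table.
rng = np.random.default_rng(0)
n_rows = 200
X = pd.DataFrame({
    "studytime": rng.integers(1, 5, size=n_rows),
    "failures": rng.integers(0, 4, size=n_rows),
    "absences": rng.integers(0, 30, size=n_rows),
})
# Invented relationship so that the model has something to learn.
y = (10 + 2 * X["studytime"] - 3 * X["failures"]
     + rng.normal(0, 1, n_rows)).clip(0, 20)

# Hold back 30% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the model on the training rows only.
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict grades for the unseen test rows and score them.
predicted_grades = model.predict(X_test)
mae = mean_absolute_error(y_test, predicted_grades)
print(f"MAE: {mae:.2f}")
```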

We see that we have an MAE of 3.15. Given the few steps taken to process our data, this is a good enough result.

Back to Business Understanding

Since we want to make a simple form for our students, we'll reduce the number of input variables needed for the prediction.
Using code copied from some obscure Stack Overflow post, we can see a list of the features used in the model, ordered by importance.
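The original snippet is not reproduced here, but one common way to get such a ranking is the feature_importances_ attribute of a fitted random forest. The sketch below assumes synthetic data with hypothetical columns, where only failures actually drives the target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data: only "failures" influences the target.
rng = np.random.default_rng(0)
n_rows = 300
X = pd.DataFrame({
    "failures": rng.integers(0, 4, size=n_rows),
    "absences": rng.integers(0, 30, size=n_rows),
    "noise": rng.normal(size=n_rows),
})
y = 15 - 3 * X["failures"] + rng.normal(0, 0.5, n_rows)

model = RandomForestRegressor(random_state=42).fit(X, y)

# Pair each importance with its column name and sort from most to least important.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

The importances of a scikit-learn forest sum to 1, so each value can be read as a share of the model's total reliance on that column.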

We'll keep the ten most important features and drop the rest of the columns, then train our model again and see how well it fits our new data.
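Keeping only the top-ranked columns and retraining could look something like this. It is a sketch on invented data with four hypothetical columns, so k is set to 2 here where the tutorial keeps ten.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: two informative columns, two pure-noise columns.
rng = np.random.default_rng(1)
n_rows = 300
X = pd.DataFrame({
    "failures": rng.integers(0, 4, size=n_rows),
    "studytime": rng.integers(1, 5, size=n_rows),
    "noise_a": rng.normal(size=n_rows),
    "noise_b": rng.normal(size=n_rows),
})
y = 12 - 3 * X["failures"] + 2 * X["studytime"] + rng.normal(0, 0.5, n_rows)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Rank the features with a model trained on every column.
full_model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
ranked = pd.Series(full_model.feature_importances_, index=X_train.columns)

k = 2  # the tutorial keeps ten; this toy table only has four columns
top_columns = ranked.nlargest(k).index.tolist()

# Retrain on the reduced table and evaluate on the same reduced test columns.
small_model = RandomForestRegressor(random_state=42)
small_model.fit(X_train[top_columns], y_train)
small_mae = mean_absolute_error(y_test, small_model.predict(X_test[top_columns]))
print(top_columns, round(small_mae, 2))
```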

Apparently this is an even better model than the previous one while using fewer input columns.
We accept our results and finish this process by saving the model to a file with the pickle module. We'll download this file and keep it for later.
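Saving and reloading a fitted model with the standard-library pickle module can be sketched as below. The file name model.pkl and the toy training data are assumptions made for the example; a temporary directory is used so the sketch leaves nothing behind.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data and a small model, purely for illustration.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(50, 3))
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name

    # Serialize the fitted model to disk.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back, as the web service would do at startup.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same predictions as the original.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

Two caveats apply: only unpickle files you trust, since loading a pickle can execute arbitrary code, and load the model with the same scikit-learn version that saved it.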

Deployment

We now have a working model, but it is useless unless somebody uses it. Deploying a model is its own challenge. In this workshop we'll do a MacGyver-like deploy using FastAPI and Deta.

Create a free account with Deta
Create a directory for your new project

mkdir grading_prediction_service
cd grading_prediction_service