April 12, 2022 10:48 am GMT

Serving Python Machine Learning Models With Ease

Ever trained a new model and just wanted to use it through an API straight away? Sometimes you don't want to bother writing Flask code or containerizing your model and running it in Docker. If that sounds like you, you definitely want to check out MLServer. It's a Python-based inference server that recently reached general availability (GA), and what's really neat about it is that it's a high-performance server designed for production environments too. That means that, by serving models locally, you are running them in the same environment they will run in when they get to production.

This blog walks you through how to use MLServer by using a couple of image models as examples...

Dataset

The dataset we're going to work with is the Fashion MNIST dataset. It contains 70,000 greyscale images of clothing, each 28x28 pixels, across 10 different classes (top, dress, coat, trouser, etc.).

If you want to reproduce the code from this blog, make sure you download the files and extract them into a folder named data. They have been omitted from the GitHub repo because they are quite large.

Training the Scikit-learn Model

First up, we're going to train a support vector machine (SVM) model using the scikit-learn framework. We'll then save the model to a file named Fashion_MNIST.joblib.

import pandas as pd
from sklearn import svm
import time
import joblib

# Load training data
train = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)
y_train = train['label']
X_train = train.drop(['label'], axis=1)
classifier = svm.SVC(kernel="poly", degree=4, gamma=0.1)

# Train model
start = time.time()
classifier.fit(X_train.values, y_train.values)
end = time.time()
exec_time = end - start
print(f'Execution time: {exec_time} seconds')

# Save model (the file name must match the uri in model-settings.json)
joblib.dump(classifier, "Fashion_MNIST.joblib")

Note: The SVM algorithm is not particularly well suited to large datasets because its training time grows at least quadratically with the number of samples. Depending on your hardware, the model in this example will take a couple of minutes to train.

Serving the Scikit-learn Model

Ok, so we've now got a saved model file, Fashion_MNIST.joblib. Let's take a look at how we can serve that using MLServer...

First up, we need to install MLServer.

pip install mlserver

The additional runtimes are optional but make life really easy when serving models. We'll install the Scikit-Learn and XGBoost ones too:

pip install mlserver-sklearn mlserver-xgboost

You can find details on all of the inference runtimes in the MLServer documentation.

Once we've done that, all we need to do is add two configuration files:

settings.json - This contains the configuration for the server itself.
model-settings.json - As the name suggests, this file contains configuration for the model we want to run.

For our settings.json file it's enough to define a single parameter:

{
    "debug": "true"
}

The model-settings.json file requires a few more bits of info as it needs to know about the model we're trying to serve:

{
    "name": "fashion-sklearn",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {
        "uri": "./Fashion_MNIST.joblib",
        "version": "v1"
    }
}

The name parameter should be self-explanatory. It gives MLServer a unique identifier which is particularly useful when serving multiple models (we'll come to that in a bit). The implementation defines which pre-built server, if any, to use. It is heavily coupled to the machine learning framework used to train your model. In our case we trained the model using scikit-learn so we're going to use the scikit-learn implementation for MLServer. For model parameters we just need to provide the location of our model file as well as a version number.
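Since the uri has to match the saved file name exactly, a quick check before starting the server can catch a mismatch early. The sketch below is only an illustration and not part of MLServer: check_model_uri is a hypothetical helper, and it assumes the uri is a local path relative to the folder that holds model-settings.json.

```python
import json
from pathlib import Path


def check_model_uri(model_dir):
    """Return the resolved model path if the file named by `uri` exists, else None.

    Hypothetical helper for illustration; MLServer does its own loading.
    Assumes `uri` is a local path relative to the folder that holds
    model-settings.json.
    """
    settings_path = Path(model_dir) / "model-settings.json"
    if not settings_path.is_file():
        return None
    settings = json.loads(settings_path.read_text())
    uri = settings.get("parameters", {}).get("uri")
    if uri is None:
        return None
    model_path = (settings_path.parent / uri).resolve()
    return model_path if model_path.is_file() else None
```

Calling check_model_uri(".") from the folder that contains the config should return the path to Fashion_MNIST.joblib, or None if the file name in the config and the saved file have drifted apart.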

That's it, two small config files and we're ready to serve our model using the command:

mlserver start .

Boom, we've now got our model running on a production-ready server locally. It's now ready to accept requests over HTTP and gRPC (default ports 8080 and 8081 respectively).
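Before sending predictions it can help to confirm that the server is actually up. MLServer implements the V2 inference protocol, which defines a readiness endpoint at /v2/health/ready. The following is a minimal sketch using only the standard library; server_is_ready is a name chosen here for illustration, and the base URL assumes the default HTTP port.

```python
import urllib.error
import urllib.request


def server_is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if GET {base_url}/v2/health/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False
```

With the server running locally, server_is_ready() should return True; if nothing is listening on the port it returns False instead of raising.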

Testing the Model

Now that our model is up and running, let's send some requests to see it in action.

To make predictions on our model, we need to send a POST request to the following URL:

http://localhost:8080/v2/models/<MODEL_NAME>/versions/<VERSION>/infer

That means to access our scikit-learn model that we trained earlier, we need to replace the MODEL_NAME with fashion-sklearn and VERSION with v1.
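If you query more than one model, building the URL by hand gets repetitive. Here is a small sketch of a helper that fills in the template; infer_url is a hypothetical name, and the default base URL assumes the server runs locally on the default port. In the V2 inference protocol the /versions/<VERSION> segment is optional, so the helper lets you leave it out.

```python
from urllib.parse import quote


def infer_url(model_name, version=None, base_url="http://localhost:8080"):
    """Build the V2 inference URL for a model, optionally pinned to a version."""
    path = f"/v2/models/{quote(model_name, safe='')}"
    if version is not None:
        path += f"/versions/{quote(version, safe='')}"
    return f"{base_url}{path}/infer"


print(infer_url("fashion-sklearn", "v1"))
# http://localhost:8080/v2/models/fashion-sklearn/versions/v1/infer
```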

The code below shows how to import the test data, make a request to the model server and then compare the result with the actual label:

import pandas as pd
import requests

# Import test data, grab the first row and corresponding label
test = pd.read_csv('../../data/fashion-mnist_test.csv', header=0)
y_test = test['label'][0:1]
X_test = test.drop(['label'], axis=1)[0:1]

# Prediction request parameters
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": X_test.shape,
            "datatype": "FP64",
            "data": X_test.values.tolist()
        }
    ]
}
endpoint = "http://localhost:8080/v2/models/fashion-sklearn/versions/v1/infer"

# Make request and print response
response = requests.post(endpoint, json=inference_request)
print(response.text)
print(y_test.values)

When running the test.py code above we get the following response from MLServer:

{
  "model_name": "fashion-sklearn",
  "model_version": "v1",
  "id": "31c3fa70-2e56-49b1-bcec-294452dbe73c",
  "parameters": null,
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1
      ],
      "datatype": "INT64",
      "parameters": null,
      "data": [
        0
      ]
    }
  ]
}

You'll notice that MLServer has generated a request id and automatically added metadata about the model and version that was used to serve our request. Capturing this kind of metadata is super important once our model gets to production; it allows us to log every request for audit and troubleshooting purposes.

You might also notice that MLServer has returned an array for outputs. In our request we only sent one row of data but MLServer also handles batch requests and returns them together. You can even use a technique called adaptive batching to optimise the way multiple requests are handled in production environments.

In our example above, the model's prediction can be found in outputs[0].data, which shows that the model has labeled this sample with category 0 (the value 0 corresponds to the t-shirt/top category). The true label for that sample was also 0, so the model got this prediction correct!
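To turn the numeric labels into something readable, you can map each value in outputs[0].data to its class name. The sketch below uses the standard Fashion MNIST label order; decode_predictions is a name chosen for illustration, and the sample string is a trimmed copy of the response shown earlier in this section.

```python
import json

# Standard Fashion MNIST class names, indexed by label id.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def decode_predictions(response_text):
    """Map every value in outputs[0].data to its class name."""
    body = json.loads(response_text)
    # int() also handles labels that come back as floats such as 0.0.
    return [CLASS_NAMES[int(label)] for label in body["outputs"][0]["data"]]


sample = (
    '{"model_name": "fashion-sklearn", "model_version": "v1", '
    '"outputs": [{"name": "predict", "shape": [1], "datatype": "INT64", "data": [0]}]}'
)
print(decode_predictions(sample))  # ['T-shirt/top']
```

Because the function walks the whole data array, it works the same way for a batch response that contains many predictions.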

Training the XGBoost Model

Now that we've seen how to create and serve a single model using MLServer, let's take a look at how we'd handle multiple models trained in different frameworks.

We'll be using the same Fashion MNIST dataset but, this time, we'll train an XGBoost model instead.

import pandas as pd
import xgboost as xgb
import time

# Load training data
train = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)
y_train = train['label']
X_train = train.drop(['label'], axis=1)
dtrain = xgb.DMatrix(X_train.values, label=y_train.values)

# Train model
params = {
    'max_depth': 5,
    'eta': 0.3,
    'verbosity': 1,
    'objective': 'multi:softmax',
    'num_class': 10
}
num_round = 50
start = time.time()
bstmodel = xgb.train(params, dtrain, num_round, evals=[(dtrain, 'label')], verbose_eval=10)
end = time.time()
exec_time = end - start
print(f'Execution time: {exec_time} seconds')

# Save model
bstmodel.save_model('Fashion_MNIST.json')

The code above, used to train the XGBoost model, is similar to the code we used earlier to train the scikit-learn model but this time our model has been saved in an XGBoost-compatible format as Fashion_MNIST.json.

Serving Multiple Models

One of the cool things about MLServer is that it supports multi-model serving. This means that you don't have to create or run a new server for each ML model you want to deploy. Using the models we built above, we'll use this feature to serve them both at once.

When MLServer starts up, it will search the directory (and any subdirectories) for model-settings.json files. If you've got multiple model-settings.json files then it'll automatically serve them all.

Note: You still only need a single settings.json (server config) file in the root directory.
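To preview which models a given folder contains, you can mimic that search yourself. This is a rough sketch of the idea and not MLServer's own discovery code: find_model_settings is a hypothetical helper that walks a directory tree, reads each model-settings.json and falls back to the folder name if no name is set (the fallback is an assumption made here).

```python
import json
from pathlib import Path


def find_model_settings(root="."):
    """Return {model name: settings file path} for every model-settings.json under root."""
    found = {}
    for path in sorted(Path(root).rglob("model-settings.json")):
        settings = json.loads(path.read_text())
        # Fall back to the containing folder name when "name" is missing.
        name = settings.get("name", path.parent.name)
        found[name] = path
    return found
```

Running find_model_settings(".") from the project root should list one entry per model you expect the server to load.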

Here's a breakdown of my directory structure for reference:

.
├── data
│   ├── fashion-mnist_test.csv
│   └── fashion-mnist_train.csv
├── models
│   ├── sklearn
│   │   ├── Fashion_MNIST.joblib
│   │   ├── model-settings.json
│   │   ├── test.py
│   │   └── train.py
│   └── xgboost
│       ├── Fashion_MNIST.json
│       ├── model-settings.json
│       ├── test.py
│       └── train.py
├── README.md
├── settings.json
└── test_models.py

Notice that there are two model-settings.json files - one for the scikit-learn model and one for the XGBoost model.
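The XGBoost model-settings.json is not shown in this post. As a hedged sketch, the script below writes one with values inferred from the scikit-learn config, the saved file name and the model name that appears in the server log; mlserver_xgboost.XGBoostModel is the implementation class provided by the mlserver-xgboost runtime. The write_model_settings helper is a name made up for this example.

```python
import json
from pathlib import Path

# Assumed settings for the XGBoost model: inferred from the scikit-learn
# config, the saved file name and the server log, not copied from the repo.
xgboost_settings = {
    "name": "fashion-xgboost",
    "implementation": "mlserver_xgboost.XGBoostModel",
    "parameters": {
        "uri": "./Fashion_MNIST.json",
        "version": "v1",
    },
}


def write_model_settings(model_dir, settings):
    """Write settings as model-settings.json inside model_dir and return the path."""
    target = Path(model_dir) / "model-settings.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(settings, indent=4) + "\n")
    return target
```

Calling write_model_settings("models/xgboost", xgboost_settings) from the project root would place the file next to the saved XGBoost model.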

We can now just run mlserver start . and it will start handling requests for both models.

[mlserver] INFO - Loaded model 'fashion-sklearn' succesfully.
[mlserver] INFO - Loaded model 'fashion-xgboost' succesfully.

Testing Accuracy of Multiple Models

With both models now up and running on MLServer, we can use the samples from our test set to validate how accurate each of our models is.

The following code sends a batch request (containing the full test set) to each of the models and then compares the predictions received to the true labels. Doing this across the whole test set gives us a reasonably good measure for each model's accuracy, which gets printed at the end.

import pandas as pd
import requests
import json

# Import the test data and split the data from the labels
test = pd.read_csv('./data/fashion-mnist_test.csv', header=0)
y_test = test['label']
X_test = test.drop(['label'], axis=1)

# Build the inference request
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": X_test.shape,
            "datatype": "FP64",
            "data": X_test.values.tolist()
        }
    ]
}

# Send the prediction request to the relevant model, compare responses to the test labels and calculate accuracy
def infer(model_name, version):
    endpoint = f"http://localhost:8080/v2/models/{model_name}/versions/{version}/infer"
    response = requests.post(endpoint, json=inference_request)

    # Calculate accuracy
    correct = 0
    for i, prediction in enumerate(json.loads(response.text)['outputs'][0]['data']):
        if y_test[i] == prediction:
            correct += 1
    accuracy = correct / len(y_test)
    print(f'Model Accuracy for {model_name}: {accuracy}')

infer("fashion-xgboost", "v1")
infer("fashion-sklearn", "v1")

The results show that the XGBoost model slightly outperforms the SVM scikit-learn one:

Model Accuracy for fashion-xgboost: 0.8953
Model Accuracy for fashion-sklearn: 0.864
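The accuracy figure is simply the share of test samples whose predicted label matches the true label. A minimal standalone sketch of that calculation follows; the function name accuracy and its input checks are additions made here rather than part of the original script.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for predicted, actual in zip(predictions, labels) if predicted == actual)
    return correct / len(labels)


print(accuracy([0, 1, 2, 2], [0, 1, 2, 3]))  # 0.75
```

Python treats 0.0 and 0 as equal, so the comparison still works if a model returns its labels as floats.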

Summary

Hopefully by now you've gained an understanding of how easy it is to serve models using MLServer. For further info it's worth reading the docs and taking a look at the examples for different frameworks.

If you're an MLflow user, you can now serve models directly in MLflow using MLServer. If you're a Kubernetes user, you should definitely check out Seldon Core, an open source tool that deploys models to Kubernetes (it uses MLServer under the covers).

All of the code from this example can be found here.

Original Link: https://dev.to/ukcloudman/serving-python-machine-learning-models-with-ease-37kh
