An Interest In:
Web News this Week
- April 27, 2024
- April 26, 2024
- April 25, 2024
- April 24, 2024
- April 23, 2024
- April 22, 2024
- April 21, 2024
Serving Python Machine Learning Models With Ease
Ever trained a new model and just wanted to use it through an API straight away? Sometimes you don't want to bother writing Flask code or containerizing your model and running it in Docker. If that sounds like you, you definitely want to check out MLServer. It's a python based inference server that recently went GA and what's really neat about it is that it's a highly-performant server designed for production environments too. That means that, by serving models locally, you are running in the exact same environment as they will be in when they get to production.
This blog walks you through how to use MLServer by using a couple of image models as examples...
Dataset
The dataset we're going to work with is the Fashion MNIST dataset. It contains 70,000 images of clothing in greyscale 28x28 pixels across 10 different classes (top, dress, coat, trouser etc...).
If you want to reproduce the code from this blog, make sure you download the files and extract them in to a folder named data
. They have been omitted from the github repo because they are quite large.
Training the Scikit-learn Model
First up, we're going to train a support vector machine (SVM) model using the scikit-learn framework. We'll then save the model to a file named Fashion-MNIST.joblib
.
import pandas as pdfrom sklearn import svmimport timeimport joblib#Load Training Datatrain = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)y_train = train['label']X_train = train.drop(['label'], axis=1)classifier = svm.SVC(kernel="poly", degree=4, gamma=0.1)#Train Modelstart = time.time()classifier.fit(X_train.values, y_train.values)end = time.time()exec_time = end-startprint(f'Execution time: {exec_time} seconds')#Save Modeljoblib.dump(classifier, "Fashion-MNIST.joblib")
Note: The SVM algorithm is not particularly well suited to large datasets because of it's quadratic nature. The model in this example will, depending on your hardware, take a couple of minutes to train.
Serving the Scikit-learn Model
Ok, so we've now got a saved model file Fashion-MNIST.joblib
. Let's take a look at how we can serve that using MLServer...
First up, we need to install MLServer.
pip install mlserver
The additional runtimes are optional but make life really easy when serving models, we'll install the Scikit-Learn and XGBoost ones too
pip install mlserver-sklearn mlserver-xgboost
You can find details on all of the inference runtimes here
Once we've done that, all we need to do is add two configuration files:
settings.json
- This contains the configuration for the server itself.model-settings.json
- As the name suggests, this file contains configuration for the model we want to run.
For our settings.json
file it's enough to just define single parameter:
{ "debug": "true"}
The model-settings.json
file requires a few more bits of info as it needs to know about the model we're trying to serve:
{ "name": "fashion-sklearn", "implementation": "mlserver_sklearn.SKLearnModel", "parameters": { "uri": "./Fashion_MNIST.joblib", "version": "v1" }}
The name
parameter should be self-explanatory. It gives MLServer a unique identifier which is particularly useful when serving multiple models (we'll come to that in a bit). The implementation
defines which pre-built server, if any, to use. It is heavily coupled to the machine learning framework used to train your model. In our case we trained the model using scikit-learn so we're going to use the scikit-learn implementation for MLServer. For model parameters
we just need to provide the location of our model file as well as a version number.
That's it, two small config files and we're ready to serve our model using the command:
mlserver start .
Boom, we've now got our model running on a production-ready server locally. It's now ready to accept requests over HTTP and gRPC (default ports 8080
and 8081
respectively).
Testing the Model
Now that our model is up and running. Let's send some requests to see it in action.
To make predictions on our model, we need to send a POST request to the following URL:
http://localhost:8080/v2/models/<MODEL_NAME>/versions/<VERSION>/infer
That means to access our scikit-learn model that we trained earlier, we need to replace the MODEL_NAME
with fashion-sklearn
and VERSION
with v1
.
The code below shows how to import the test data, make a request to the model server and then compare the result with the actual label:
import pandas as pdimport requests#Import test data, grab the first row and corresponding labeltest = pd.read_csv('../../data/fashion-mnist_test.csv', header=0)y_test = test['label'][0:1]X_test = test.drop(['label'],axis=1)[0:1]#Prediction request parametersinference_request = { "inputs": [ { "name": "predict", "shape": X_test.shape, "datatype": "FP64", "data": X_test.values.tolist() } ]}endpoint = "http://localhost:8080/v2/models/fashion-sklearn/versions/v1/infer"#Make request and print responseresponse = requests.post(endpoint, json=inference_request)print(response.text)print(y_test.values)
When running the test.py
code above we get the following response from MLServer:
{ "model_name": "fashion-sklearn", "model_version": "v1", "id": "31c3fa70-2e56-49b1-bcec-294452dbe73c", "parameters": null, "outputs": [ { "name": "predict", "shape": [ 1 ], "datatype": "INT64", "parameters": null, "data": [ 0 ] } ]}
You'll notice that MLServer has generated a request id and automatically added metadata about the model and version that was used to serve our request. Capturing this kind of metadata is super important once our model gets to production; it allows us to log every request for audit and troubleshooting purposes.
You might also notice that MLServer has returned an array for outputs
. In our request we only sent one row of data but MLServer also handles batch requests and returns them together. You can even use a technique called adaptive batching to optimise the way multiple requests are handled in production environments.
In our example above, the model's prediction can be found in outputs[0].data
which shows that the model has labeled this sample with the category 0
(The value 0 corresponds to the category t-shirt/top
). The true label for that sample was a 0
too so the model got this prediction correct!
Training the XGBoost Model
Now that we've seen how to create and serve a single model using MLServer, let's take a look at how we'd handle multiple models trained in different frameworks.
We'll be using the same Fashion MNIST dataset but, this time, we'll train an XGBoost model instead.
import pandas as pdimport xgboost as xgbimport time#Load Training Datatrain = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)y_train = train['label']X_train = train.drop(['label'], axis=1)dtrain = xgb.DMatrix(X_train.values, label=y_train.values)#Train Modelparams = { 'max_depth': 5, 'eta': 0.3, 'verbosity': 1, 'objective': 'multi:softmax', 'num_class' : 10}num_round = 50start = time.time()bstmodel = xgb.train(params, dtrain, num_round, evals=[(dtrain, 'label')], verbose_eval=10)end = time.time()exec_time = end-startprint(f'Execution time: {exec_time} seconds')#Save Modelbstmodel.save_model('Fashion_MNIST.json')
The code above, used to train the XGBoost model, is similar to the code we used earlier to train the scikit-learn model but this time our model has been saved in an XGBoost-compatible format as Fashion_MNIST.json
.
Serving Multiple Models
One of the cool things about MLServer is that it supports multi-model serving. This means that you don't have to create or run a new server for each ML model you want to deploy. Using the models we built above, we'll use this feature to serve them both at once.
When MLServer starts up, it will search the directory (and any subdirectories) for model-settings.json
files. If you've got multiple model-settings.json
files then it'll automatically serve them all.
Note: you still only need a single settings.json
(server config) file in the root directory
Here's a breakdown of my directory structure for reference:
. data fashion-mnist_test.csv fashion-mnist_train.csv models sklearn Fashion_MNIST.joblib model-settings.json test.py train.py xgboost Fashion_MNIST.json model-settings.json test.py train.py README.md settings.json test_models.py
Notice that there are two model-settings.json
files - one for the scikit-learn model and one for the XGBoost model.
We can now just run mlserver start .
and it will start handling requests for both models.
[mlserver] INFO - Loaded model 'fashion-sklearn' succesfully.[mlserver] INFO - Loaded model 'fashion-xgboost' succesfully.
Testing Accuracy of Multiple Models
With both models now up and running on MLServer, we can use the samples from our test set to validate how accurate each of our models is.
The following code sends a batch request (containing the full test set) to each of the models and then compares the predictions received to the true labels. Doing this across the whole test set gives us a reasonably good measure for each model's accuracy, which gets printed at the end.
import pandas as pdimport requestsimport json#Import the test data and split the data from the labelstest = pd.read_csv('./data/fashion-mnist_test.csv', header=0)y_test = test['label']X_test = test.drop(['label'],axis=1)#Build the inference requestinference_request = { "inputs": [ { "name": "predict", "shape": X_test.shape, "datatype": "FP64", "data": X_test.values.tolist() } ]}#Send the prediction request to the relevant model, compare responses to training labels and calculate accuracydef infer(model_name, version): endpoint = f"http://localhost:8080/v2/models/{model_name}/versions/{version}/infer" response = requests.post(endpoint, json=inference_request) #calculate accuracy correct = 0 for i, prediction in enumerate(json.loads(response.text)['outputs'][0]['data']): if y_test[i] == prediction: correct += 1 accuracy = correct / len(y_test) print(f'Model Accuracy for {model_name}: {accuracy}')infer("fashion-xgboost", "v1")infer("fashion-sklearn", "v1")
The results show that the XGBoost model slightly outperforms the SVM scikit-learn one:
Model Accuracy for fashion-xgboost: 0.8953Model Accuracy for fashion-sklearn: 0.864
Summary
Hopefully by now you've gained an understanding of how easy it is to serve models using MLServer. For further info it's worth reading the docs and taking a look at the examples for different frameworks.
For MLFlow users you can now serve models directly in MLFlow using MLServer and if you're a Kubernetes user you should definitely check out Seldon Core - an open source tool that deploys models to Kubernetes (it uses MLServer under the covers).
All of the code from this example can be found here.
Original Link: https://dev.to/ukcloudman/serving-python-machine-learning-models-with-ease-37kh
Dev To
An online community for sharing and discovering great ideas, having debates, and making friendsMore About this Source Visit Dev To