
How to Deploy ML Models Using Gravity AI and Meadowrun

Transform your containerized models-as-services into batch jobs

Gravity AI is a marketplace for ML models where data scientists can publish their models as containers, and consumers can subscribe to access those models. In this article, we'll talk about different options for deploying these containers and then walk through using them for batch jobs via Meadowrun, an open-source library for running your Python code and containers in the cloud.

[Image: a runner on a starting block. A library inside of a container. Photo by Manuel Palmeira on Unsplash]

Containers vs libraries

This section lays out some motivation for this post. If you're interested in just getting things working, skip ahead to the next section!

If you think of an ML model as a library, then it might seem more natural to publish it as a package, either on PyPI for use with pip or Anaconda.org for use with conda, rather than as a container. Hugging Face's transformers is a good example: you run pip install transformers, and then your Python interpreter can do things like:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("We are very happy to show you the Transformers library.")

There are a few disadvantages with this approach:

  • For pip specifically (not conda), there's no way for packages to express non-Python dependencies. With the transformers library, you'll usually depend on something like CUDA for GPU support, but users need to know to install that separately.
  • The transformers library also needs to download gigabytes of model weights in order to do anything useful. These get cached in a local directory so they don't need to be downloaded every time you run, but in some contexts you might want to make these model weights a fixed part of the deployment.
  • Finally, when publishing a pip or conda package, users expect you to specify your dependencies relatively flexibly. E.g. the transformers package specifies tensorflow>=2.3, which declares that the package works with any version of tensorflow 2.3 or higher (the snippet after this list shows how to inspect these specifiers yourself). This means that if the tensorflow maintainers introduce a bug or break backwards compatibility, they can effectively cause your package to stop working (welcome to dependency hell). Flexible dependencies are useful for libraries, because they mean more people can install your package into their environments. But for deployments, e.g. if you're deploying a model within your company, you're not taking advantage of that flexibility, and you'd rather have the certainty that it's going to work every time, plus reproducibility of the deployment.
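
To see those flexible specifiers for yourself, here's a minimal sketch (our addition, not part of the original workflow) using the standard library's importlib.metadata. It assumes transformers is installed in the active environment, and the exact output depends on the version you have installed.

# A minimal sketch: print the dependency specifiers that an installed package
# (here, transformers) declares. Assumes Python 3.8+ and that transformers is
# installed in the active environment.
from importlib.metadata import requires

for requirement in requires("transformers") or []:
    # Entries look like "tensorflow>=2.3; extra == 'tf'" -- version ranges, not exact pins
    print(requirement)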

One common way to solve these problems is to build a container. You can install CUDA into your container; if you include your model weights in the image, Docker will make sure you only have a single copy of that data on each machine (as long as you're managing your Docker layers correctly); and you'll be able to choose and test the specific version of tensorflow that goes into the container.

So we've solved a bunch of problems, but we've traded them for a new one: each container runs in its own little world, and we have to do some work to expose our API to our user. With a library, the consumer can just call a Python function, pass in some inputs, and get back some outputs. With a container, our model is more like an app or service, so there's no built-in way for a consumer to call the container with some input data and get back some output data.

One option is to create a command line interface, but that requires explicitly binding input and output files to the container. This feels a bit unnatural, but we can see an example of it in this container image of ImageMagick by dpokidov. In the Usage section, the author recommends binding a local folder into the container in order to run it as a command line app (the snippet below sketches what that pattern looks like).
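
For illustration only, here is roughly what that CLI-style pattern looks like when driven from Python. The image name comes from the linked example, but the paths are made up and the exact arguments depend on that image's entrypoint, so treat this as a sketch of the bind-mount idea rather than a tested invocation.

# A sketch of the CLI-style pattern: bind-mount a local folder into a container
# and run it as a one-shot command. Paths are hypothetical and the argument
# format depends on the image's entrypoint.
import subprocess

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", "/home/me/images:/imgs",   # expose local input/output files to the container
        "dpokidov/imagemagick",          # image from the example linked above
        "/imgs/input.jpg", "-resize", "100x100", "/imgs/output.jpg",
    ],
    check=True,
)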

The traditional answer to this question for Docker images is to expose an HTTP-based API, which is what Gravity AI's containers do. But this means turning our function (e.g. classifier()) into a service, which means we need to figure out where to put this service. To give the traditional answer again, we could deploy it to Kubernetes with an autoscaler and a load balancer in front of it, which can work well if e.g. you have a constant stream of processes that need to call classifier(). But you might instead have a use case where some data shows up from a vendor every few hours and triggers a batch processing job. In that case, things can get a bit awkward: the batch job is running, calls classifier(), and then has to wait for the autoscaler/Kubernetes to find a free machine that can spin up the service, while the machine running the batch job sits idle.

In other words, both of these options (library vs service) are reasonable, but they come with their own disadvantages.

As a bit of an aside, you could imagine getting the best of both worlds with an extension to Docker that would allow you to publish a container that exposes a Python API, so that someone could call sentiment = call_container_api(image="huggingface/transformers", "my input text") directly from their Python code. This would effectively be a remote procedure call into a container that is not running as a service but is instead spun up just for the purpose of executing a function on demand. This feels like a really heavyweight approach to solving dependency hell, but if your libraries are using a cross-platform memory format (hello Apache Arrow!) under the covers, you could imagine doing some fun tricks like giving the container a read-only view into the caller's memory space to reduce the overhead. It's a bit implausible, but sometimes it's helpful to sketch out these ideas to clarify the tradeoffs we're making with the more practical bits of technology we have available.

Using models-in-containers locally

In this article, we'll tackle the batch-jobs-with-containers scenario. To make this concrete, let's say that every morning a vendor gives us a relatively large dump of everything everyone has said on the internet (Twitter, Reddit, Seeking Alpha, etc.) about the companies in the S&P 500 overnight. We want to feed these pieces of text into FinBERT, which is a version of BERT that has been fine-tuned for financial sentiment analysis. BERT is a language model from Google that was state of the art when it was published in 2018.

We'll be using Gravity AI's container for FinBERT, but we'll also assume that we operate in a primarily batch process-oriented environment, so we don't have e.g. a Kubernetes cluster set up, and even if we did, it would probably be tricky to get the autoscaling right because of our usage pattern.

If we're just trying this out on our local machine, it's pretty straightforward to use, as per Gravity AI's documentation:

docker load -i Sentiment_Analysis_o_77f77f.docker.tar.gz
docker run -d -p 7000:80 gx-images:t-39c447b9e5b94d7ab75060d0a927807f

Sentiment_Analysis_o_77f77f.docker.tar.gz is the name of the file you download from Gravity AI, and gx-images:t-39c447b9e5b94d7ab75060d0a927807f is the name of the Docker image once it's loaded from the .tar.gz file, which will show up in the output of the first command.

And then we can write a bit of glue code:

import time
import requests


def upload_license_file(base_url: str) -> None:
    # Apply the Gravity AI license key to the running container
    with open("Sentiment Analysis on Financial Text.gravity-ai.key", "r") as f:
        response = requests.post(f"{base_url}/api/license/file", files={"License": f})
        response.raise_for_status()


def call_finbert_container(base_url: str, input_text: str) -> str:
    # Submit the text as a job
    add_job_response = requests.post(
        f"{base_url}/data/add-job",
        files={
            "CallbackUrl": (None, ""),
            "File": ("temp.txt", input_text, "text/plain"),
            "MimeType": (None, "text/plain"),
        }
    )
    add_job_response.raise_for_status()
    add_job_response_json = add_job_response.json()
    if (
        add_job_response_json.get("isError", False)
        or add_job_response_json.get("errorMessage") is not None
    ):
        raise ValueError(f"Error from server: {add_job_response_json.get('errorMessage')}")

    # Poll until the job completes
    job_id = add_job_response_json["data"]["id"]
    job_status = add_job_response_json["data"]["status"]
    while job_status != "Complete":
        status_response = requests.get(f"{base_url}/data/status/{job_id}")
        status_response.raise_for_status()
        job_status = status_response.json()["data"]["status"]
        time.sleep(1)

    # Fetch the result
    result_response = requests.get(f"{base_url}/data/result/{job_id}")
    return result_response.text


def process_data():
    base_url = "http://localhost:7000"
    upload_license_file(base_url)

    # Pretend to query input data from somewhere
    sample_data = [
        "Finnish media group Talentum has issued a profit warning",
        "The loss for the third quarter of 2007 was EUR 0.3 mn smaller than the loss of"
        " the second quarter of 2007"
    ]

    results = [call_finbert_container(base_url, line) for line in sample_data]

    # Pretend to store the output data somewhere
    print(results)


if __name__ == "__main__":
    process_data()

call_finbert_container calls the REST API provided by the container to submit a job, polls for the job's completion, and then returns the job result. process_data pretends to get some text data, processes it using our container, and then pretends to write the output somewhere else. We're also assuming you've downloaded the Gravity AI key to the current directory as Sentiment Analysis on Financial Text.gravity-ai.key.
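
If you've saved the glue code above as finbert_local_example.py (the file name we'll refer to later), you can also sanity-check it from a REPL against the locally running container. A minimal sketch, assuming the container from the docker run command above is still listening on localhost:7000:

# A quick smoke test of the glue code against the local container
from finbert_local_example import upload_license_file, call_finbert_container

base_url = "http://localhost:7000"
upload_license_file(base_url)
print(call_finbert_container(base_url, "Finnish media group Talentum has issued a profit warning"))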

Using models-in-containers on the cloud

This works great for playing around with our model locally, but at some point we'll probably want to run this in the cloud, either to access additional compute (whether that's CPU or GPU), or to run this as a scheduled job with e.g. Airflow. The usual thing to do is package up our code (finbert_local_example.py and its dependencies) as a container, which means we now have two containers: one containing our glue code, and the FinBERT container. We need to launch these together and coordinate them (i.e. our glue code container needs to know the address/name of the FinBERT container to access it). We might start reaching for Docker Compose, which works great for long-running services, but in the context of an ad-hoc distributed batch job or a scheduled job, it will be tricky to work with.

Instead, we'll use Meadowrun to do most of the heavy lifting. Meadowrun will not only take care of the usual difficulties of allocating instances, deploying our code, etc., but will also help launch an extra container and make it available to our code.

To follow along, you'll need to set up an environment. We'll show how this works with pip on Windows, but you should be able to follow along with the package manager of your choice (conda environments don't work across platforms, so conda will only work if you're on Linux).

python -m venv venv
venv\Scripts\activate.bat
pip install requests meadowrun
meadowrun-manage-ec2 install

This creates a new virtualenv, adds requests and meadowrun to it, and then installs Meadowrun into your AWS account.

When you download a container from Gravity AI, it comes as a .tar.gz file that needs to be uploaded to a container registry in order to work. There are slightly longer instructions in the Gravity AI documentation, but here's a short version of how to create an ECR (Elastic Container Registry) repository and then upload a container from Gravity AI to it:

aws ecr create-repository --repository-name mygravityai
aws ecr get-login-password | docker login --username AWS --password-stdin 012345678901.dkr.ecr.us-east-2.amazonaws.com
docker tag gx-images:t-39c447b9e5b94d7ab75060d0a927807f 012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai:finbert
docker push 012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai:finbert

012345678901 appears in a few places in this snippet and needs to be replaced with your account id. You'll see your account id in the output of the first command as registryId.
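
If you'd rather look up your account id from Python than fish it out of the CLI output, here's a small sketch using boto3 (an extra dependency we're assuming here, not something the rest of this walkthrough requires). It uses the same credentials as the AWS CLI.

# A minimal sketch: look up the AWS account id for your current credentials
# (pip install boto3), then build the ECR repository URI from it.
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
print(f"{account_id}.dkr.ecr.us-east-2.amazonaws.com/mygravityai")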

One more step before we can run some code: we'll need to give the Meadowrun role permissions to this ECR repository.

meadowrun-manage-ec2 grant-permission-to-ecr-repo mygravityai

Now we can run our code:

import asyncio

import meadowrun

from finbert_local_example import upload_license_file, call_finbert_container

# Normally this would live in S3 or a database somewhere
_SAMPLE_DATA = {
    "Company1": [
        "Commission income fell to EUR 4.6 mn from EUR 5.1 mn in the corresponding "
        "period in 2007",
        "The purchase price will be paid in cash upon the closure of the transaction, "
        "scheduled for April 1 , 2009"
    ],
    "Company2": [
        "The loss for the third quarter of 2007 was EUR 0.3 mn smaller than the loss of"
        " the second quarter of 2007",
        "Consolidated operating profit excluding one-off items was EUR 30.6 mn, up from"
        " EUR 29.6 mn a year earlier"
    ]
}


def process_one_company(company_name: str):
    # Each task talks to its own copy of the FinBERT container, which Meadowrun
    # makes available as container-service-0
    base_url = "http://container-service-0"
    data_for_company = _SAMPLE_DATA[company_name]
    upload_license_file(base_url)
    results = [call_finbert_container(base_url, line) for line in data_for_company]
    return results


async def process_data():
    results = await meadowrun.run_map(
        process_one_company,
        ["Company1", "Company2"],
        meadowrun.AllocCloudInstance("EC2"),
        meadowrun.Resources(logical_cpu=1, memory_gb=1.5, max_eviction_rate=80),
        await meadowrun.Deployment.mirror_local(working_directory_globs=["*.key"]),
        sidecar_containers=meadowrun.ContainerInterpreter(
            "012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai", "finbert"
        ),
    )
    print(results)


if __name__ == "__main__":
    asyncio.run(process_data())

Let's walk through the process_data function line by line.

  • Meadowrun's run_map function runs the specified function (in this case process_one_company) in parallel on the cloud. In this case, we provide two arguments (["Company1", "Company2"]), so we'll run these two tasks in parallel. The idea is that we're splitting up the workload so that we can finish the job quickly (the snippet after this list shows one way to pair the returned results back up with the company names).
  • AllocCloudInstance tells Meadowrun to launch an EC2 instance to run this job if we don't have one running already, and Resources tells Meadowrun what resources are needed to run this code. In this case we're requesting 1 CPU and 1.5 GB of RAM per task. We're also specifying that we're okay with spot instances up to an 80% eviction rate (aka probability of interruption).
  • mirror_local tells Meadowrun that we want to use the code in the current directory, which is important as we're reusing some code from finbert_local_example.py. Meadowrun only uploads .py files by default, but in our case we also need to include the .key file in our current working directory so that we can apply the Gravity AI license.
  • Finally, sidecar_containers tells Meadowrun to launch a container with the specified image for every task we have running in parallel. Each task can access its associated container as container-service-0, which you can see in the code for process_one_company. If you're following along, you'll need to edit the account id again to match your account id.
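
Because the results come back as a plain list, it can be handy to pair them back up with the companies they belong to. Here's a lightly modified process_data as a sketch (our addition, not from the original script), assuming run_map returns results in the same order as its arguments:

async def process_data():
    # Keep the argument list in a variable so we can zip it back up with the results
    company_names = list(_SAMPLE_DATA)
    results = await meadowrun.run_map(
        process_one_company,
        company_names,
        meadowrun.AllocCloudInstance("EC2"),
        meadowrun.Resources(logical_cpu=1, memory_gb=1.5, max_eviction_rate=80),
        await meadowrun.Deployment.mirror_local(working_directory_globs=["*.key"]),
        sidecar_containers=meadowrun.ContainerInterpreter(
            "012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai", "finbert"
        ),
    )
    # Assumes results come back in the same order as company_names
    results_by_company = dict(zip(company_names, results))
    print(results_by_company)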

Let's look at a few of the more important lines from the output:

Launched 1 new instance(s) (total $0.0209/hr) for the remaining 2 workers:
    ec2-13-59-48-22.us-east-2.compute.amazonaws.com: r5d.large (2.0 CPU, 16.0 GB), spot ($0.0209/hr, 2.5% chance of interruption), will run 2 workers

Here Meadowrun is telling us that it's launching the cheapest EC2 instance that can run our job. In this case, we're only paying about 2 cents per hour!

Next, Meadowrun will replicate our local environment on the EC2 instance by building a new container image, and then also pull the FinBERT container that we specified:

Building python environment in container 4e4e2c...
...
Pulling docker image 012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai:finbert

Our final result will be some output from the FinBERT model that looks like:

[['sentence,logit,prediction,sentiment_score
Commission income fell to EUR 4.6 mn from EUR 5.1 mn in the corresponding period in 2007,[0.24862055 0.44351986 0.30785954],negative,-0.1948993
',...
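
Each of these strings is the CSV text returned by the container's /data/result endpoint. If you want to work with it as structured data rather than raw text, here's a minimal sketch using the standard library; the column names are taken from the sample output above and are otherwise an assumption about the format.

# A minimal sketch: parse the CSV text returned for one input into a list of
# dicts keyed by the columns shown above (sentence, logit, prediction,
# sentiment_score). Assumes the output format matches the sample.
import csv
import io

def parse_finbert_result(result_text: str) -> list:
    return list(csv.DictReader(io.StringIO(result_text)))

# e.g. for the first company's first document:
# rows = parse_finbert_result(results[0][0])
# print(rows[0]["prediction"], rows[0]["sentiment_score"])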

Closing remarks

Gravity AI packages up ML models as containers which are really easy to use as services. Making these containers work naturally for batch jobs takes a bit of work, but Meadowrun makes it easy!


Original Link: https://dev.to/hrichardlee/how-to-deploy-ml-models-using-gravity-ai-and-meadowrun-1mjn
