meta_desc: "A short guided tutorial on starting a Pulumi infrastructure as code project to deploy Hugging Face LLMs on the Amazon SageMaker machine learning platform with Python"
Running models from Hugging Face on Amazon SageMaker is a popular deployment option for AI/ML services. While the SageMaker console allows you to provision these cloud resources, that deployment pattern is labor intensive to document and vulnerable to human error when reproduced as a regular operations practice. Infrastructure as Code (IaC) offers a reliable, easily repeatable alternative. By developing this IaC with Pulumi, practitioners can write their infrastructure code in Python and seamlessly develop both their AI application code and their infrastructure code in the same language.
In this short tutorial, we will deploy a publicly available [Meta AI LlaMa 2] based model from [Hugging Face] on [Amazon SageMaker]. Then we will test it with a [natural language prompt] using a short [Python] script.
We will use the `sagemaker-aws-python` [Pulumi Template] to bootstrap our Python SageMaker IaC. Templates let you quickly bootstrap new Pulumi projects from a working scaffold, which you can then customize to your needs. Out of the box, this template provisions AWS [IAM Roles] to assign SageMaker privileges, [CloudWatch alarms] to alert in case of latency or error spikes on the endpoint, and of course a [Meta AI LlaMa 2] based LLM ([NousResearch/Llama-2-7b-chat-hf]) hosted on [Hugging Face].
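To create a new project from this template, run `pulumi new` with the template name in an empty directory (the directory name here is just an example):

```bash
$ mkdir llm-sagemaker && cd llm-sagemaker
$ pulumi new sagemaker-aws-python
```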
While creating a new project from the `sagemaker-aws-python` Pulumi template, you will be prompted for a [project](https://www.pulumi.com/docs/concepts/projects/) name, description, [stack](https://www.pulumi.com/docs/concepts/stack/) name, and Amazon Web Services [Region](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/). You can proceed with the defaults or supply your own values.
After responding to all prompts, the `pulumi new` command will proceed to set up a [Python Virtual Environment] (venv) and download all dependencies into the [venv].
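With the project bootstrapped, deploy the stack:

```bash
$ pulumi up
```

Review the preview of resources to be created and select `yes` to confirm.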
The deployment may take between 10 and 20 minutes while Amazon builds your infrastructure and deploys the configured model. You can follow along in your terminal as resources are provisioned, or open the link displayed there to view the deployment status and other stack information in Pulumi Cloud.

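Once the deployment completes, we can test the endpoint with a short Python script. The following `test.py` is a minimal sketch using `boto3`: it sends a prompt to the endpoint named on the command line and prints the model's response. The payload shape shown here assumes a Hugging Face text generation container's JSON format, so adjust `inputs` and `parameters` to suit your model.

```python
# test.py -- a minimal sketch; the payload format assumes a Hugging Face
# text generation container and may need adjusting for your model.
import json
import sys

import boto3

# The endpoint name is passed as the first command-line argument.
endpoint_name = sys.argv[1]

# Change the region to match where you deployed the SageMaker endpoint.
client = boto3.client("sagemaker-runtime", region_name="us-east-1")

payload = {
    "inputs": "What is Infrastructure as Code and why is it useful?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
}

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(json.loads(response["Body"].read().decode("utf-8")))
```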
> NOTE: We are using `us-east-1` in this script. Be sure to change the region in the Python code to match the region you deployed the SageMaker endpoint into.
Once you have created the `test.py` script, activate the Python virtual environment named `venv` (created automatically by Pulumi) and run the script with the name of your new endpoint, taken directly from the Pulumi stack output. We assume here that the template exports the endpoint name as an output named `EndpointName`; run `pulumi stack output` with no arguments to list the outputs your stack actually exports:
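```bash
$ source venv/bin/activate
# "EndpointName" is an assumed output name; check `pulumi stack output`
# for the outputs your stack actually exports.
$ python test.py $(pulumi stack output EndpointName)
```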
To recap, in a few commands, we created a new Pulumi Python project from a ready-to-roll template, deployed an LLM endpoint on Amazon SageMaker, and tested it with a short Python script to generate a response from our model!
AI and ML are rapidly becoming a necessity with every passing day. The field may appear daunting or out of reach at first glance, but with the power of IaC written as Pulumi Python programs, getting started has never been easier.
If you followed along, tell us how it worked out for you! We would love to know what you are looking forward to, or if you have ideas for future installments of the Pulumi Python #PulumiMLOps series!
Join us on [Twitter](https://twitter.com/pulumicorp), and on the [Pulumi Community Slack](https://slack.pulumi.com) to decide what #PulumiMLOps we take on next!