iSharkFly-Docs/amazon-bedrock-agentcore-samples

mirror of https://github.com/awslabs/amazon-bedrock-agentcore-samples.git synced 2025-09-08 20:50:46 +00:00

Dheeraj Oruganty e346e83bf1

fix(02-use-cases): SRE-Agent Deployment (#179)


Co-authored-by: dheerajoruganty <dheo@amazon.com>
Co-authored-by: Amit Arora <aroraai@amazon.com>

2025-08-01 13:24:58 -04:00


SRE Agent Deployment Guide for Amazon Bedrock AgentCore Runtime

This guide walks you through the complete deployment process for the SRE Agent, from local testing to production deployment on Amazon Bedrock AgentCore Runtime.

Prerequisites

AWS CLI configured with appropriate permissions
Docker installed and running
UV package manager installed
Python 3.12+
Access to Amazon Bedrock AgentCore Runtime
IAM role with BedrockAgentCoreFullAccess policy and appropriate trust policy (see Authentication Setup)

Environment Configuration

The SRE Agent uses environment variables for configuration. These are read from .env files in the appropriate directories:

CLI Testing: Environment variables are read from sre_agent/.env
Container Building: Environment variables are read from deployment/.env
Docker Platform: Local builds use Dockerfile.x86_64 (linux/amd64); AgentCore deployments use Dockerfile (linux/arm64)

Required Environment Variables

Create the appropriate .env files with these variables:

For sre_agent/.env (CLI testing and local container runs):

GATEWAY_ACCESS_TOKEN=your_gateway_access_token
LLM_PROVIDER=bedrock
DEBUG=false
# If using Anthropic provider, also add:
# ANTHROPIC_API_KEY=sk-ant-your-key-here

For deployment/.env (container building and deployment):

GATEWAY_ACCESS_TOKEN=your_gateway_access_token
ANTHROPIC_API_KEY=sk-ant-your-key-here
# These can be overridden by environment variables during build/deploy

Note: When using --env-file, all required variables should be in the .env file. Use -e only to override specific variables from the .env file.
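The precedence rule in the note above (file values first, explicit `-e` overrides on top) can be illustrated with a short Python sketch. This is not Docker's parser: `parse_env_text` and `effective_env` are hypothetical helper names, and the parsing is deliberately simplified to `KEY=VALUE` lines and `#` comment lines, with no quoting rules.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comment lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_env(file_vars: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Overlay explicit overrides (like -e flags) on top of the file values."""
    return {**file_vars, **overrides}


env_text = """\
GATEWAY_ACCESS_TOKEN=your_gateway_access_token
LLM_PROVIDER=bedrock
DEBUG=false
# ANTHROPIC_API_KEY=sk-ant-your-key-here
"""

file_vars = parse_env_text(env_text)
merged = effective_env(file_vars, {"DEBUG": "true"})
print(merged["DEBUG"])         # the override wins over the file value
print(merged["LLM_PROVIDER"])  # untouched value from the file
```

Keeping the full set of variables in the file and overriding only one or two at run time keeps local runs reproducible.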

Deployment Sequence

Phase 1: Local Testing with CLI

First, test the SRE agent locally using the command-line interface to ensure it works correctly.

1.1 Setup Environment

Create and configure your environment files:

# Setup CLI environment file
cp sre_agent/.env.example sre_agent/.env
# Edit sre_agent/.env with your configuration

Note: Environment variables can be overridden at runtime, but having .env files ensures consistent configuration.

1.2 Test CLI with Bedrock (Default)

# Test with default Bedrock provider
uv run sre-agent --prompt "list the pods in my infrastructure"

# Test with debug output enabled
uv run sre-agent --prompt "list the pods in my infrastructure" --debug

# Test with specific provider
uv run sre-agent --prompt "list the pods in my infrastructure" --provider bedrock --debug

1.3 Test CLI with Anthropic Provider

# Ensure ANTHROPIC_API_KEY is set in your .env file, then:
uv run sre-agent --prompt "list the pods in my infrastructure" --provider anthropic --debug

Expected Output: You should see the agent processing your request, routing to appropriate specialized agents, and returning infrastructure information.

Phase 2: Local Container Testing

Once CLI testing is successful, build and test the agent as a container locally.

2.1 Build Local Container

The build script accepts an optional ECR repository name and uses different Dockerfiles based on the target platform:

Local builds (LOCAL_BUILD=true): Uses Dockerfile.x86_64 for linux/amd64 platform
AgentCore builds (default): Uses Dockerfile for linux/arm64 platform (required by AgentCore)

# Build container for local testing with custom name
LOCAL_BUILD=true ./deployment/build_and_deploy.sh my_custom_sre_agent

# View help for all options
./deployment/build_and_deploy.sh --help
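The Dockerfile selection described above can be sketched in Python as follows. This is only an illustration of the rule, not the contents of build_and_deploy.sh: `select_build_target` is a hypothetical helper, and treating only the literal value `true` as enabling a local build is an assumption.

```python
import os


def select_build_target(environ: dict[str, str]) -> tuple[str, str]:
    """Return (dockerfile, platform) for a local or an AgentCore build."""
    # Assumption: only the value "true" (case-insensitive) enables a local build.
    local_build = environ.get("LOCAL_BUILD", "false").strip().lower() == "true"
    if local_build:
        return "Dockerfile.x86_64", "linux/amd64"
    # Default: AgentCore builds target linux/arm64.
    return "Dockerfile", "linux/arm64"


dockerfile, platform = select_build_target(dict(os.environ))
print(f"Would build with {dockerfile} for {platform}")
```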

2.2 Test Local Container with Bedrock

Run the container locally with default Bedrock provider:

# Using .env file from sre_agent directory (recommended)
# Ensure LLM_PROVIDER=bedrock is set in sre_agent/.env
docker run -p 8080:8080 --env-file sre_agent/.env my_custom_sre_agent:latest

# Alternative: with explicit environment variables (if not using .env file)
docker run -p 8080:8080 \
  -v ~/.aws:/root/.aws:ro \
  -e AWS_PROFILE=default \
  -e GATEWAY_ACCESS_TOKEN=your_token \
  -e LLM_PROVIDER=bedrock \
  my_custom_sre_agent:latest

# With debug enabled (overrides DEBUG setting from .env file)
docker run -p 8080:8080 --env-file sre_agent/.env -e DEBUG=true my_custom_sre_agent:latest

Note: The image name matches the ECR repository name you specified during build.

2.3 Test Local Container with Anthropic

# Using .env file (ensure LLM_PROVIDER=anthropic is set in sre_agent/.env)
docker run -p 8080:8080 --env-file sre_agent/.env my_custom_sre_agent:latest

# With debug enabled (override DEBUG setting from .env file)
docker run -p 8080:8080 \
  --env-file sre_agent/.env \
  -e DEBUG=true \
  my_custom_sre_agent:latest

Note: Ensure both LLM_PROVIDER=anthropic and ANTHROPIC_API_KEY are set in your sre_agent/.env file when using the anthropic provider.

2.4 Test Container with curl

Test the running container:

# Basic test
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "list the pods in my infrastructure"
    }
  }'

# Health check
curl http://localhost:8080/ping

Expected Output: The container should respond with JSON containing the agent's response.
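For scripted checks, the same request as the basic curl test can be built with the Python standard library. This is a minimal sketch: the helper names and the 120-second timeout are assumptions, and `invoke` assumes the container replies with a top-level JSON object.

```python
import json
import urllib.request


def build_invocation_request(
    prompt: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(prompt: str, base_url: str = "http://localhost:8080", timeout: float = 120.0) -> dict:
    """Send the request to a running container and decode the JSON reply."""
    request = build_invocation_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; invoke() does.
request = build_invocation_request("list the pods in my infrastructure")
print(request.get_method(), request.full_url)
```

Call `invoke("list the pods in my infrastructure")` only while the container from the previous step is running.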

Phase 3: Amazon Bedrock AgentCore Runtime Deployment

Once local container testing is successful, deploy to AgentCore.

3.1 Deploy to AgentCore with Bedrock

# Deploy with custom repository name and default settings (reads from deployment/.env)
./deployment/build_and_deploy.sh my_custom_sre_agent

# Deploy with debug enabled (environment variable override)
DEBUG=true ./deployment/build_and_deploy.sh my_custom_sre_agent

# Deploy with specific provider
LLM_PROVIDER=bedrock DEBUG=true ./deployment/build_and_deploy.sh my_custom_sre_agent

3.2 Deploy to AgentCore with Anthropic

# Deploy with Anthropic provider (ensure ANTHROPIC_API_KEY is in deployment/.env)
LLM_PROVIDER=anthropic ./deployment/build_and_deploy.sh my_custom_sre_agent

# Deploy with Anthropic and debug enabled
DEBUG=true LLM_PROVIDER=anthropic ./deployment/build_and_deploy.sh my_custom_sre_agent

# Override API key via environment variable
LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=sk-ant-your-key ./deployment/build_and_deploy.sh my_custom_sre_agent

Build Script Usage:

# View all available options
./deployment/build_and_deploy.sh --help

# The script accepts one optional argument: ECR repository name
# Default repository name is 'sre_agent'
# Note: Use underscores (_) instead of hyphens (-) in repository names

Expected Output: The script will build, push to ECR, and deploy to AgentCore Runtime.
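The underscore rule for repository names can be checked before calling the script. The sketch below is a hypothetical pre-flight helper, not part of the build script, and the allowed character set (lowercase letters, digits, underscores) is an assumption made for illustration.

```python
import re

# Assumed character set for illustration only; check the script's --help
# output and the ECR documentation for the authoritative naming rules.
_ALLOWED = re.compile(r"^[a-z0-9_]+$")


def normalize_repo_name(name: str) -> str:
    """Replace hyphens with underscores and reject anything unexpected."""
    candidate = name.strip().lower().replace("-", "_")
    if not _ALLOWED.match(candidate):
        raise ValueError(f"unsupported repository name: {name!r}")
    return candidate


print(normalize_repo_name("my-custom-sre-agent"))  # prints my_custom_sre_agent
```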

3.3 Test AgentCore Deployment

Test the deployed agent using the invoke script:

# Test deployed agent
uv run python deployment/invoke_agent_runtime.py \
  --prompt "list the pods in my infrastructure"

# Test with custom runtime ARN
uv run python deployment/invoke_agent_runtime.py \
  --prompt "list the pods in my infrastructure" \
  --runtime-arn "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/your-runtime-id"
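Before passing a custom runtime ARN, it can help to confirm that it has the shape shown in the example above. The following sketch is a hypothetical validator based only on that example ARN; it is not taken from invoke_agent_runtime.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeArn:
    region: str
    account_id: str
    runtime_id: str


def parse_runtime_arn(arn: str) -> RuntimeArn:
    """Split an ARN shaped like the example above into its parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "bedrock-agentcore":
        raise ValueError(f"not an AgentCore ARN: {arn!r}")
    resource_type, _, runtime_id = parts[5].partition("/")
    if resource_type != "runtime" or not runtime_id:
        raise ValueError(f"not a runtime ARN: {arn!r}")
    return RuntimeArn(region=parts[3], account_id=parts[4], runtime_id=runtime_id)


arn = "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/your-runtime-id"
print(parse_runtime_arn(arn))
```

A region mismatch between the ARN and `AWS_REGION` is an easy mistake to catch this way.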

Environment Variables Reference

Core Configuration

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `GATEWAY_ACCESS_TOKEN` | Gateway authentication token | - | Yes |
| `BACKEND_API_KEY` | Backend API key for credential provider | - | Yes (gateway setup) |
| `LLM_PROVIDER` | Language model provider | `bedrock` | No |
| `ANTHROPIC_API_KEY` | Anthropic API key | - | Only for anthropic provider |
| `DEBUG` | Enable debug logging and traces | `false` | No |

AWS Configuration

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `AWS_REGION` | AWS region for deployment | `us-east-1` | No |
| `AWS_PROFILE` | AWS profile to use | - | No |
| `RUNTIME_NAME` | AgentCore runtime name | ECR repo name | No |

Build Script Configuration

| Variable | Description | Default | Notes |
|----------|-------------|---------|-------|
| `LOCAL_BUILD` | Build for local testing only | `false` | Uses Dockerfile.x86_64 when true |
| `PLATFORM` | Target platform | `arm64` | AgentCore requires arm64, use x86_64 for local |
| `ECR_REPO_NAME` | ECR repository name | `sre_agent` | Can be passed as command line argument |
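The defaults and required-ness rules in the tables above can be summarized in a small loader. This is a sketch under stated assumptions, not the agent's actual configuration code: `AgentSettings` and `load_settings` are hypothetical names, and parsing `DEBUG` as the literal string `true` is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentSettings:
    gateway_access_token: str
    llm_provider: str
    anthropic_api_key: Optional[str]
    debug: bool
    aws_region: str
    ecr_repo_name: str
    runtime_name: str


def load_settings(environ: Mapping[str, str]) -> AgentSettings:
    """Apply the defaults and required-ness rules from the tables above."""
    token = environ.get("GATEWAY_ACCESS_TOKEN", "")
    if not token:
        raise ValueError("GATEWAY_ACCESS_TOKEN is required")

    provider = environ.get("LLM_PROVIDER", "bedrock")
    api_key = environ.get("ANTHROPIC_API_KEY") or None
    if provider == "anthropic" and api_key is None:
        raise ValueError("ANTHROPIC_API_KEY is required for the anthropic provider")

    repo_name = environ.get("ECR_REPO_NAME", "sre_agent")
    return AgentSettings(
        gateway_access_token=token,
        llm_provider=provider,
        anthropic_api_key=api_key,
        # Assumption: only the string "true" (case-insensitive) enables debug.
        debug=environ.get("DEBUG", "false").lower() == "true",
        aws_region=environ.get("AWS_REGION", "us-east-1"),
        ecr_repo_name=repo_name,
        # RUNTIME_NAME falls back to the ECR repository name.
        runtime_name=environ.get("RUNTIME_NAME", repo_name),
    )


settings = load_settings({"GATEWAY_ACCESS_TOKEN": "your_gateway_access_token"})
print(settings.llm_provider, settings.aws_region, settings.runtime_name)
```

In real use, `os.environ` can be passed in place of the literal dictionary.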

Debug Mode Usage

CLI Debug Mode

# Enable debug with --debug flag
uv run sre-agent --prompt "your query" --debug

# Or with environment variable
DEBUG=true uv run sre-agent --prompt "your query"
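The two ways of enabling debug mode shown above (the `--debug` flag and the `DEBUG` environment variable) could be wired together as in the sketch below. It only reuses the flag names from this guide; `parse_cli` is a hypothetical function, and combining the flag and the variable with a logical OR is an assumption rather than the documented behavior of sre-agent.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_cli(
    argv: Sequence[str], environ: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Parse CLI flags; either --debug or DEBUG=true enables debug output."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="sre-agent")
    parser.add_argument("--prompt", required=True)
    parser.add_argument(
        "--provider",
        choices=["bedrock", "anthropic"],
        default=env.get("LLM_PROVIDER", "bedrock"),
    )
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    # Assumption: the flag and the environment variable are OR-ed together.
    args.debug = args.debug or env.get("DEBUG", "false").lower() == "true"
    return args


args = parse_cli(["--prompt", "your query", "--debug"], environ={})
print(args.provider, args.debug)
```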

Container Debug Mode

# Local container with debug (overrides DEBUG setting in .env file)
docker run -p 8080:8080 --env-file sre_agent/.env -e DEBUG=true my_custom_sre_agent:latest

# AgentCore deployment with debug
DEBUG=true ./deployment/build_and_deploy.sh my_custom_sre_agent

Debug Output Examples

Without Debug Mode:

🤖 Multi-Agent System: Processing...
🧭 Supervisor: Routing to kubernetes_agent
🔧 Kubernetes Agent:
   💡 Full Response: Here are the pods in your infrastructure...
💬 Final Response: I found 5 pods running in your infrastructure...

With Debug Mode:

🤖 Multi-Agent System: Processing...

MCP tools loaded: 12
  - kubernetes-list-pods: List all pods in the cluster...
  - kubernetes-get-pod: Get details of a specific pod...

🧭 Supervisor: Routing to kubernetes_agent
🔧 Kubernetes Agent:
   🔍 DEBUG: agent_messages = 3
   📋 Found 3 trace messages:
      1. AIMessage: I'll help you list the pods...
   📞 Calling tools:
      kubernetes-list-pods(
        namespace=None
      ) [id: call_123]
   🛠️  kubernetes-list-pods [id: call_123]:
      {"pods": [...]}
   💡 Full Response: Here are the pods in your infrastructure...
💬 Final Response: I found 5 pods running in your infrastructure...

Provider Configuration

Using Amazon Bedrock (Default)

# CLI (reads from sre_agent/.env)
uv run sre-agent --provider bedrock --prompt "your query"

# Container (reads LLM_PROVIDER=bedrock from sre_agent/.env)
docker run -p 8080:8080 --env-file sre_agent/.env my_custom_sre_agent:latest

# Deployment (reads from deployment/.env, can override via environment variable)
LLM_PROVIDER=bedrock ./deployment/build_and_deploy.sh my_custom_sre_agent

Using Anthropic Claude

# CLI (reads LLM_PROVIDER and ANTHROPIC_API_KEY from sre_agent/.env)
uv run sre-agent --provider anthropic --prompt "your query"

# Container (reads LLM_PROVIDER=anthropic and ANTHROPIC_API_KEY from sre_agent/.env)
docker run -p 8080:8080 --env-file sre_agent/.env my_custom_sre_agent:latest

# Deployment (reads from deployment/.env, can override via environment variable)
LLM_PROVIDER=anthropic ./deployment/build_and_deploy.sh my_custom_sre_agent

# Override API key via environment variable (if not in deployment/.env)
LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=sk-ant-xxx ./deployment/build_and_deploy.sh my_custom_sre_agent

Troubleshooting

Common Issues

Gateway Token Issues

# Verify token is set
echo $GATEWAY_ACCESS_TOKEN
# Or check .env file
cat sre_agent/.env

Provider Configuration

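# For Anthropic, ensure the API key is set
echo $ANTHROPIC_API_KEY
# Test the API key with a simple call (lists the models available to the key)
curl https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"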

Debug Information

# Enable debug mode to see detailed logs
DEBUG=true uv run sre-agent --prompt "test"

Container Issues

# Check container logs
docker logs <container_id>
# Run with debug
docker run -e DEBUG=true ... my_custom_sre_agent:latest

Verification Steps

CLI Working: Agent responds to queries locally
Container Working: Container responds to curl requests
AgentCore Working: Deployed agent responds via invoke script

Quick Start: Copy-Paste Command Sequence

For a complete deployment using my_custom_sre_agent, copy and paste these commands in sequence:

1. Build Local Container

LOCAL_BUILD=true ./deployment/build_and_deploy.sh my_custom_sre_agent

2. Test Local Container (Bedrock)

docker run -p 8080:8080 --env-file sre_agent/.env my_custom_sre_agent:latest

3. Test with curl

curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "list the pods in my infrastructure"
    }
  }'

4. Deploy to AgentCore

./deployment/build_and_deploy.sh my_custom_sre_agent

5. Test AgentCore Deployment

uv run python deployment/invoke_agent_runtime.py \
  --prompt "list the pods in my infrastructure"

Best Practices

Development: Always test locally first
Environment Files: Use .env files for consistent configuration
Debug Mode: Enable debug mode when troubleshooting
Provider Testing: Test both Bedrock and Anthropic providers if using both
Incremental Deployment: Deploy to a staging environment before production
