rohillasandeep 01246a98b2
Configuration Management Fixes (#223)
* feat: Add AWS Operations Agent with AgentCore Runtime

- Complete rewrite of AWS Operations Agent using Amazon Bedrock AgentCore
- Added comprehensive deployment scripts for DIY and SDK runtime modes
- Implemented OAuth2/PKCE authentication with Okta integration
- Added MCP (Model Context Protocol) tool support for AWS service operations
- Sanitized all sensitive information (account IDs, domains, client IDs) with placeholders
- Added support for 18 AWS services: EC2, S3, Lambda, CloudFormation, IAM, RDS, CloudWatch, Cost Explorer, ECS, EKS, SNS, SQS, DynamoDB, Route53, API Gateway, SES, Bedrock, SageMaker
- Includes chatbot client, gateway management scripts, and comprehensive testing
- Ready for public GitHub with security-cleared configuration files

Security: All sensitive values replaced with <YOUR_AWS_ACCOUNT_ID>, <YOUR_OKTA_DOMAIN>, <YOUR_OKTA_CLIENT_ID> placeholders

* Update AWS Operations Agent architecture diagram

* feat: Enhance AWS Operations Agent with improved testing and deployment

- Update README with new local container testing approach using run-*-local-container.sh scripts
- Replace deprecated SAM-based MCP Lambda deployment with ZIP-based deployment
- Add no-cache flag to Docker builds to ensure clean builds
- Update deployment scripts to use consolidated configuration files
- Add comprehensive cleanup scripts for all deployment components
- Improve error handling and credential validation in deployment scripts
- Add new MCP tool deployment using ZIP packaging instead of Docker containers
- Update configuration management to use dynamic-config.yaml structure
- Add local testing capabilities with containerized agents
- Remove outdated test scripts and replace with interactive chat client approach

* fix: Update IAM policy configurations

- Update bac-permissions-policy.json with enhanced permissions
- Update bac-trust-policy.json for improved trust relationships

* fix: Update Docker configurations for agent runtimes

- Update Dockerfile.diy with improved container configuration
- Update Dockerfile.sdk with enhanced build settings

* fix: Update OAuth iframe flow configuration

- Update iframe-oauth-flow.html with improved OAuth handling

* feat: Update AWS Operations Agent configuration and cleanup

- Update IAM permissions policy with enhanced access controls
- Update IAM trust policy with improved security conditions
- Enhance OAuth iframe flow with better UX and error handling
- Improve chatbot client with enhanced local testing capabilities
- Remove cache files and duplicate code for cleaner repository

* docs: Add architecture diagrams and update README

- Add architecture-2.jpg and flow.jpg diagrams for better visualization
- Update README.md with enhanced documentation and diagrams

* Save current work before resolving merge conflicts

* Keep AWS-operations-agent changes (local version takes precedence)

* Fix: Remove merge conflict markers from AWS-operations-agent files - restore clean version

* Fix deployment and cleanup script issues

Major improvements and fixes:

Configuration Management:
- Fix role assignment in gateway creation (use bac-execution-role instead of Lambda role)
- Add missing role_arn cleanup in MCP tool deletion script
- Fix OAuth provider deletion script configuration clearing
- Improve memory deletion script to preserve quote consistency
- Add Lambda invoke permissions to bac-permissions-policy.json

Script Improvements:
- Reorganize deletion scripts: 11-delete-oauth-provider.sh, 12-delete-memory.sh, 13-cleanup-everything.sh
- Fix interactive prompt handling in cleanup scripts (echo -e format)
- Add yq support with sed fallbacks for better YAML manipulation
- Remove obsolete 04-deploy-mcp-tool-lambda-zip.sh script

Architecture Fixes:
- Correct gateway role assignment to use runtime.role_arn (bac-execution-role)
- Ensure proper role separation between gateway and Lambda execution
- Fix configuration cleanup to clear all dynamic config fields consistently

Documentation:
- Update README with clear configuration instructions
- Maintain security best practices with placeholder values
- Add comprehensive deployment and cleanup guidance

These changes address systematic issues with cleanup scripts, role assignments,
and configuration management while maintaining security best practices.

* Update README.md with comprehensive documentation

Enhanced documentation includes:
- Complete project structure with 75 files
- Step-by-step deployment guide with all 13 scripts
- Clear configuration instructions with security best practices
- Dual agent architecture documentation (DIY + SDK)
- Authentication flow and security implementation details
- Troubleshooting guide and operational procedures
- Local testing and container development guidance
- Tool integration and MCP protocol documentation

The README now provides complete guidance for deploying and operating
the AWS Operations Agent with Amazon Bedrock AgentCore.

---------

Co-authored-by: name <alias@amazon.com>
2025-08-09 13:51:24 -07:00

234 lines
10 KiB
Python

#!/usr/bin/env python3
# ============================================================================
# IMPORTS
# ============================================================================
import boto3
import time
import sys
import os
import yaml
# ============================================================================
# CONFIGURATION
# ============================================================================
# Add project root to path for shared config manager
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(project_root)
from shared.config_manager import AgentCoreConfigManager
# ============================================================================
# HELPER FUNCTIONS
# ============================================================================
def update_config_with_arns(config_manager, runtime_arn, endpoint_arn):
    """Update dynamic configuration with new ARNs"""
    print(f"\n📝 Updating dynamic configuration with new DIY runtime ARN...")
    try:
        # Update dynamic configuration
        updates = {
            "runtime": {
                "diy_agent": {
                    "arn": runtime_arn
                }
            }
        }
        if endpoint_arn:
            updates["runtime"]["diy_agent"]["endpoint_arn"] = endpoint_arn
        config_manager.update_dynamic_config(updates)
        print(" ✅ Dynamic config updated with new DIY runtime ARN")
    except Exception as config_error:
        print(f" ⚠️ Error updating config: {config_error}")

# Initialize configuration manager
config_manager = AgentCoreConfigManager()
# Get configuration values
base_config = config_manager.get_base_settings()
merged_config = config_manager.get_merged_config() # For runtime values that may be dynamic
oauth_config = config_manager.get_oauth_settings()
# Extract configuration values
REGION = base_config['aws']['region']
ROLE_ARN = base_config['runtime']['role_arn']
AGENT_RUNTIME_NAME = base_config['runtime']['diy_agent']['name']
ECR_URI = merged_config['runtime']['diy_agent']['ecr_uri'] # ECR URI is dynamic
# Okta configuration
OKTA_DOMAIN = oauth_config['domain']
OKTA_AUDIENCE = oauth_config['jwt']['audience']
print("🚀 Creating or updating AgentCore Runtime for DIY agent...")
print(f" 📝 Name: {AGENT_RUNTIME_NAME}")
print(f" 📦 Container: {ECR_URI}")
print(f" 🔐 Role: {ROLE_ARN}")
control_client = boto3.client('bedrock-agentcore-control', region_name=REGION)
# Check if runtime already exists
runtime_exists = False
existing_runtime_arn = None
existing_runtime_id = None
try:
    # Try to list runtimes and find our DIY runtime
    runtimes_response = control_client.list_agent_runtimes()
    for runtime in runtimes_response.get('agentRuntimes', []):
        if runtime.get('agentRuntimeName') == AGENT_RUNTIME_NAME:
            runtime_exists = True
            existing_runtime_arn = runtime.get('agentRuntimeArn')
            existing_runtime_id = existing_runtime_arn.split('/')[-1] if existing_runtime_arn else None
            print(f"✅ Found existing runtime: {existing_runtime_arn}")
            break
except Exception as e:
    print(f"⚠️ Error checking existing runtimes: {e}")

try:
    if runtime_exists and existing_runtime_arn and existing_runtime_id:
        # Runtime exists - ECR image has been updated, runtime will use it automatically.
        # Note: no update call is made to the runtime here; only the local config is refreshed.
        print(f"\n🔄 Runtime exists, updating with new container image...")

        # Get existing endpoint ARN
        existing_endpoint_arn = None
        try:
            endpoints_response = control_client.list_agent_runtime_endpoints(
                agentRuntimeId=existing_runtime_id
            )
            for endpoint in endpoints_response.get('agentRuntimeEndpoints', []):
                if endpoint.get('name') == 'DEFAULT':
                    existing_endpoint_arn = endpoint.get('agentRuntimeEndpointArn')
                    print(f"✅ Found existing endpoint: {existing_endpoint_arn}")
                    break
        except Exception as e:
            print(f"⚠️ Error getting endpoint ARN: {e}")

        # Since ECR image is updated and runtime uses latest image,
        # we just need to update the config with current ARNs
        print(f"✅ ECR image updated - runtime will use new container on next invocation")

        # Update config with existing ARNs
        update_config_with_arns(config_manager, existing_runtime_arn, existing_endpoint_arn or "")

        print(f"\n🎉 DIY Agent Updated Successfully!")
        print(f"🏷️ Runtime ARN: {existing_runtime_arn}")
        print(f"💾 ECR URI: {ECR_URI}")
        print(f"🔗 Endpoint ARN: {existing_endpoint_arn or 'Not found'}")
        print(f" Runtime will use updated container image automatically")
    else:
        # Runtime doesn't exist - create new runtime
        print(f"\n🆕 Creating new runtime...")
        response = control_client.create_agent_runtime(
            agentRuntimeName=AGENT_RUNTIME_NAME,
            agentRuntimeArtifact={
                'containerConfiguration': {
                    'containerUri': ECR_URI
                }
            },
            networkConfiguration={"networkMode": "PUBLIC"},
            roleArn=ROLE_ARN,
            authorizerConfiguration={
                'customJWTAuthorizer': {
                    'discoveryUrl': oauth_config['jwt']['discovery_url'],
                    'allowedAudience': [OKTA_AUDIENCE]
                }
            }
        )

        runtime_arn = response['agentRuntimeArn']
        runtime_id = runtime_arn.split('/')[-1]
        print(f"✅ DIY AgentCore Runtime created!")
        print(f"🏷️ ARN: {runtime_arn}")
        print(f"🆔 Runtime ID: {runtime_id}")

        print(f"\n⏳ Waiting for runtime to be READY...")
        max_wait = 600  # 10 minutes
        wait_time = 0

        while wait_time < max_wait:
            try:
                status_response = control_client.get_agent_runtime(agentRuntimeId=runtime_id)
                status = status_response.get('status')
                print(f" 📊 Status: {status} ({wait_time}s)")

                if status == 'READY':
                    print(f"✅ DIY Runtime is READY!")

                    # Create DEFAULT endpoint
                    print(f"\n🔗 Creating DEFAULT endpoint...")
                    try:
                        endpoint_response = control_client.create_agent_runtime_endpoint(
                            agentRuntimeId=runtime_id,
                            name="DEFAULT"
                        )
                        print(f"✅ DEFAULT endpoint created!")
                        print(f"🏷️ Endpoint ARN: {endpoint_response['agentRuntimeEndpointArn']}")

                        # Update config with new ARNs
                        update_config_with_arns(config_manager, runtime_arn, endpoint_response['agentRuntimeEndpointArn'])
                    except Exception as ep_error:
                        if "already exists" in str(ep_error):
                            print(f" DEFAULT endpoint already exists, getting existing endpoint ARN...")
                            try:
                                # Get the existing endpoint ARN
                                endpoints_response = control_client.list_agent_runtime_endpoints(agentRuntimeId=runtime_id)
                                for endpoint in endpoints_response.get('agentRuntimeEndpoints', []):
                                    if endpoint.get('name') == 'DEFAULT':
                                        endpoint_arn = endpoint.get('agentRuntimeEndpointArn')
                                        print(f"🏷️ Found existing endpoint ARN: {endpoint_arn}")
                                        update_config_with_arns(config_manager, runtime_arn, endpoint_arn)
                                        break
                                else:
                                    # Fallback (no DEFAULT endpoint in the listing): construct the endpoint ARN
                                    endpoint_arn = f"{runtime_arn}/runtime-endpoint/DEFAULT"
                                    print(f"🔧 Constructed endpoint ARN: {endpoint_arn}")
                                    update_config_with_arns(config_manager, runtime_arn, endpoint_arn)
                            except Exception as list_error:
                                print(f"⚠️ Could not get endpoint ARN: {list_error}")
                                # Fallback: construct the endpoint ARN
                                endpoint_arn = f"{runtime_arn}/runtime-endpoint/DEFAULT"
                                print(f"🔧 Using constructed endpoint ARN: {endpoint_arn}")
                                update_config_with_arns(config_manager, runtime_arn, endpoint_arn)
                        else:
                            print(f"❌ Error creating endpoint: {ep_error}")
                            # Still update with just runtime ARN
                            update_config_with_arns(config_manager, runtime_arn, "")
                    break
                elif status in ['FAILED', 'DELETING']:
                    print(f"❌ Runtime creation failed with status: {status}")
                    break

                time.sleep(15)
                wait_time += 15
            except Exception as e:
                print(f"❌ Error checking status: {e}")
                break

        if wait_time >= max_wait:
            print(f"⚠️ Runtime creation taking longer than expected")

        print(f"\n🧪 Test with:")
        print(f" ARN: {runtime_arn}")
        print(f" ID: {runtime_id}")
except Exception as e:
    print(f"❌ Error creating/updating DIY runtime: {e}")
    sys.exit(1)
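
One caveat about the existence check in the script above: it calls `list_agent_runtimes()` once and inspects only that response. If the account holds more runtimes than fit in a single page, an existing runtime could be missed and the script would try to create a duplicate. The sketch below shows one way the lookup might follow pagination. It assumes the response carries a `nextToken` key when more pages exist and that the call accepts `nextToken` as a keyword argument; both should be checked against the boto3 documentation. `find_runtime_by_name` and `FakeControlClient` are hypothetical names introduced here for illustration and are not part of the repository.

```python
def find_runtime_by_name(client, name):
    """Return the runtime summary whose agentRuntimeName equals `name`, or None.

    Assumption: list_agent_runtimes accepts a `nextToken` keyword and returns
    a `nextToken` key in the response while further pages remain.
    """
    kwargs = {}
    while True:
        page = client.list_agent_runtimes(**kwargs)
        for runtime in page.get('agentRuntimes', []):
            if runtime.get('agentRuntimeName') == name:
                return runtime
        token = page.get('nextToken')
        if not token:
            # Last page reached without a match
            return None
        kwargs['nextToken'] = token


class FakeControlClient:
    """Hypothetical in-memory stand-in for the boto3 client (illustration only)."""

    def __init__(self, pages):
        # Maps a token (None for the first page) to a canned response dict
        self._pages = pages
        self.calls = 0

    def list_agent_runtimes(self, **kwargs):
        self.calls += 1
        return self._pages[kwargs.get('nextToken')]


if __name__ == "__main__":
    demo_pages = {
        None: {
            'agentRuntimes': [{'agentRuntimeName': 'other', 'agentRuntimeArn': 'arn:example/other-1'}],
            'nextToken': 'page-2',
        },
        'page-2': {
            'agentRuntimes': [{'agentRuntimeName': 'diy', 'agentRuntimeArn': 'arn:example/diy-2'}],
        },
    }
    match = find_runtime_by_name(FakeControlClient(demo_pages), 'diy')
    print(match['agentRuntimeArn'] if match else 'not found')
```

In the script, the helper's result could replace the single-page loop: take `agentRuntimeArn` from the returned summary and derive the runtime ID from it, as the existing code already does.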