#!/usr/bin/env python3

import logging
from functools import lru_cache
from pathlib import Path
from typing import Optional

# Configure logging with basicConfig
logging.basicConfig(
    level=logging.INFO,  # Set the log level to INFO
    # Define log message format
    format="%(asctime)s,p%(process)s,{%(filename)s:%(lineno)d},%(levelname)s,%(message)s",
)

logger = logging.getLogger(__name__)


class PromptLoader:
    """Utility class for loading and managing prompt templates."""

    def __init__(self, prompts_dir: Optional[str] = None):
        """Initialize the prompt loader.

        Args:
            prompts_dir: Directory containing prompt files. If None, uses default relative path.
        """
        if prompts_dir:
            self.prompts_dir = Path(prompts_dir)
        else:
            # Default to config/prompts relative to this file
            self.prompts_dir = Path(__file__).parent / "config" / "prompts"

        logger.debug(f"PromptLoader initialized with prompts_dir: {self.prompts_dir}")

    @lru_cache(maxsize=32)
    def _load_prompt_file(self, filename: str) -> str:
        """Load a prompt file with caching.

        The cache is keyed on (self, filename), so edits to a prompt file on
        disk are not picked up until the cache is cleared.

        Args:
            filename: Name of the prompt file to load

        Returns:
            Content of the prompt file

        Raises:
            FileNotFoundError: If the prompt file doesn't exist
            IOError: If there's an error reading the file
        """
        filepath = self.prompts_dir / filename

        if not filepath.exists():
            raise FileNotFoundError(f"Prompt file not found: {filepath}")

        try:
            with open(filepath, "r", encoding="utf-8") as f:
                content = f.read().strip()
            logger.debug(f"Loaded prompt file: {filename}")
            return content
        except Exception as e:
            logger.error(f"Error loading prompt file {filename}: {e}")
            # Chain the original exception so the root cause stays in the traceback
            raise IOError(f"Failed to read prompt file {filename}: {e}") from e

    def load_prompt(self, prompt_name: str) -> str:
        """Load a prompt by name.

        Args:
            prompt_name: Name of the prompt (without .txt extension)

        Returns:
            Content of the prompt file
        """
        filename = f"{prompt_name}.txt"
        return self._load_prompt_file(filename)

    def load_template(self, template_name: str, **kwargs) -> str:
        """Load a prompt template and substitute variables.

        Substitution uses str.format, so literal braces in a template file
        must be doubled ({{ and }}).

        Args:
            template_name: Name of the template (without .txt extension)
            **kwargs: Variables to substitute in the template

        Returns:
            Template content with variables substituted

        Raises:
            ValueError: If a template variable is missing or formatting fails
        """
        template_content = self.load_prompt(template_name)

        try:
            return template_content.format(**kwargs)
        except KeyError as e:
            logger.error(f"Missing template variable {e} in template {template_name}")
            raise ValueError(f"Missing required template variable: {e}") from e
        except Exception as e:
            logger.error(f"Error formatting template {template_name}: {e}")
            raise ValueError(f"Error formatting template {template_name}: {e}") from e

    def get_agent_prompt(
        self, agent_type: str, agent_name: str, agent_description: str
    ) -> str:
        """Combine base agent prompt with agent-specific prompt.

        Args:
            agent_type: Type of agent (kubernetes, logs, metrics, runbooks)
            agent_name: Display name of the agent
            agent_description: Description of the agent's capabilities

        Returns:
            Complete system prompt for the agent
        """
        try:
            # Load base prompt template
            base_prompt = self.load_template(
                "agent_base_prompt",
                agent_name=agent_name,
                agent_description=agent_description,
            )

            # Load agent-specific prompt if it exists
            try:
                agent_specific_prompt = self.load_prompt(f"{agent_type}_agent_prompt")
                combined_prompt = f"{base_prompt}\n\n{agent_specific_prompt}"
            except FileNotFoundError:
                logger.warning(f"No specific prompt found for agent type: {agent_type}")
                combined_prompt = base_prompt

            return combined_prompt

        except Exception as e:
            logger.error(f"Error building agent prompt for {agent_type}: {e}")
            raise

    def get_supervisor_aggregation_prompt(
        self,
        is_plan_based: bool,
        query: str,
        agent_results: str,
        auto_approve_plan: bool = False,
        **kwargs,
    ) -> str:
        """Get supervisor aggregation prompt based on context.

        Args:
            is_plan_based: Whether this is a plan-based aggregation
            query: Original user query
            agent_results: JSON string of agent results
            auto_approve_plan: Whether to include auto-approve instruction
            **kwargs: Additional template variables (e.g., current_step, total_steps, plan)

        Returns:
            Formatted aggregation prompt
        """
        try:
            # Determine auto-approve instruction
            auto_approve_instruction = ""
            if auto_approve_plan:
                auto_approve_instruction = "\n\nIMPORTANT: Do not ask any follow-up questions or suggest that the user can ask for more details. Provide a complete, conclusive response."

            template_vars = {
                "query": query,
                "agent_results": agent_results,
                "auto_approve_instruction": auto_approve_instruction,
                **kwargs,
            }

            if is_plan_based:
                return self.load_template(
                    "supervisor_plan_aggregation", **template_vars
                )
            else:
                return self.load_template(
                    "supervisor_standard_aggregation", **template_vars
                )

        except Exception as e:
            logger.error(f"Error building supervisor aggregation prompt: {e}")
            raise

    def get_executive_summary_prompts(
        self, query: str, results_text: str
    ) -> tuple[str, str]:
        """Get system and user prompts for executive summary generation.

        Args:
            query: Original user query
            results_text: Formatted investigation results

        Returns:
            Tuple of (system_prompt, user_prompt)
        """
        try:
            system_prompt = self.load_prompt("executive_summary_system")
            user_prompt = self.load_template(
                "executive_summary_user_template",
                query=query,
                results_text=results_text,
            )
            return system_prompt, user_prompt

        except Exception as e:
            logger.error(f"Error building executive summary prompts: {e}")
            raise

    def list_available_prompts(self) -> list[str]:
        """List all available prompt files.

        Returns:
            List of prompt names (without .txt extension)
        """
        try:
            prompt_files = list(self.prompts_dir.glob("*.txt"))
            return [f.stem for f in prompt_files]
        except Exception as e:
            logger.error(f"Error listing prompt files: {e}")
            return []
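
# Illustrative sketch, not part of this module: how functools.lru_cache
# behaves when it decorates an instance method, as _load_prompt_file does.
# TinyLoader, read() and demo.txt are hypothetical names invented for this
# example. The sketch assumes only the standard library and shows that a
# cached read keeps returning the old file content until the cache is
# cleared explicitly.

```python
import tempfile
from functools import lru_cache
from pathlib import Path


class TinyLoader:
    """Hypothetical stand-in for a loader that caches file reads."""

    def __init__(self, prompts_dir):
        self.prompts_dir = Path(prompts_dir)

    @lru_cache(maxsize=32)
    def read(self, filename):
        # The cache key includes self, so each instance has its own entries
        return (self.prompts_dir / filename).read_text(encoding="utf-8").strip()


with tempfile.TemporaryDirectory() as tmp:
    prompt_path = Path(tmp) / "demo.txt"
    prompt_path.write_text("version 1\n", encoding="utf-8")

    loader = TinyLoader(tmp)
    first = loader.read("demo.txt")

    # Change the file on disk; the cached value is returned unchanged
    prompt_path.write_text("version 2\n", encoding="utf-8")
    cached = loader.read("demo.txt")

    # Clearing the cache forces the next call to re-read the file
    TinyLoader.read.cache_clear()
    refreshed = loader.read("demo.txt")

print(first, cached, refreshed, sep=" | ")
```

# If prompts may be edited while the process is running, the same
# cache_clear() call on the cached method is one way to pick up the changes.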


# Convenience instance for easy import
prompt_loader = PromptLoader()


# Convenience functions for backward compatibility
def load_prompt(prompt_name: str) -> str:
    """Load a prompt by name using the default loader."""
    return prompt_loader.load_prompt(prompt_name)


def load_template(template_name: str, **kwargs) -> str:
    """Load and format a template using the default loader."""
    return prompt_loader.load_template(template_name, **kwargs)


def get_agent_prompt(agent_type: str, agent_name: str, agent_description: str) -> str:
    """Get complete agent prompt using the default loader."""
    return prompt_loader.get_agent_prompt(agent_type, agent_name, agent_description)
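
# Illustrative sketch, not part of this module: the str.format behaviour that
# load_template relies on. The template strings below are made up for this
# example and are not the contents of the real files under config/prompts.
# The sketch shows a successful substitution, the KeyError raised for a
# missing variable (which load_template converts to ValueError), and the
# doubled braces needed for literal braces such as JSON.

```python
# A template with two named placeholders
template = "You are {agent_name}. {agent_description}"

rendered = template.format(
    agent_name="Kubernetes Agent",
    agent_description="Inspects cluster state.",
)

# Omitting a placeholder raises KeyError naming the missing variable
try:
    template.format(agent_name="Kubernetes Agent")
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# Literal braces must be doubled, otherwise format() treats them as fields
json_template = 'Reply as {{"status": "{status}"}}'
json_rendered = json_template.format(status="ok")

print(rendered)
print(missing)
print(json_rendered)
```

# Prompt files that embed JSON examples therefore need {{ and }} around the
# literal parts when they are loaded through load_template.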