Dheeraj Oruganty e346e83bf1
fix(02-use-cases): SRE-Agent Deployment (#179)
* Add missing credential_provider_name parameter to config.yaml.example

* Fix get_config function to properly parse YAML values with inline comments

* Enhanced get_config to prevent copy-paste whitespace errors in AWS identifiers

* Improve LLM provider configuration and error handling with bedrock as default

* Add OpenAPI templating system and fix hardcoded regions

* Add backend template build to Readme

* delete old yaml files

* Fix Cognito setup with automation script and missing domain creation steps

* docs: Add EC2 instance port configuration documentation

- Document required inbound ports (443, 8011-8014)
- Include SSL/TLS security requirements
- Add AWS security group best practices
- Provide port usage summary table

* docs: Add hyperlinks to prerequisites in README

- Link EC2 port configuration documentation
- Link IAM role authentication setup
- Improve navigation to detailed setup instructions

* docs: Add BACKEND_API_KEY to configuration documentation

- Document gateway environment variables section
- Add BACKEND_API_KEY requirement for credential provider
- Include example .env file format for gateway directory
- Explain usage in create_gateway.sh script

* docs: Add BACKEND_API_KEY to deployment guide environment variables

- Include BACKEND_API_KEY in environment variables reference table
- Mark as required for gateway setup
- Provide quick reference alongside other required variables

* docs: Add BedrockAgentCoreFullAccess policy and trust policy documentation

- Document AWS managed policy BedrockAgentCoreFullAccess
- Add trust policy requirements for bedrock-agentcore.amazonaws.com
- Reorganize IAM permissions for better clarity
- Remove duplicate trust policy section
- Add IAM role requirement to deployment prerequisites

* docs: Document role_name field in gateway config example

- Explain that role_name is used to create and manage the gateway
- Specify BedrockAgentCoreFullAccess policy requirement
- Note trust policy requirement for bedrock-agentcore.amazonaws.com
- Improve clarity for gateway configuration setup

* docs: Add AWS IP address ranges for production security enhancement

- Document AWS IP ranges JSON download for restricting access
- Reference official AWS documentation for IP address ranges
- Provide security alternatives to 0.0.0.0/0 for production
- Include examples of restricted security group configurations
- Enable egress filtering and region-specific access control

* style: Format Python code with black

- Reformat 14 Python files for consistent code style
- Apply PEP 8 formatting standards
- Improve code readability and maintainability

* docs: Update SRE agent prerequisites and setup documentation

- Convert prerequisites section to markdown table format
- Add SSL certificate provider examples (no-ip.com, letsencrypt.org)
- Add Identity Provider (IDP) requirement with setup_cognito.sh reference
- Clarify that all prerequisites must be completed before setup
- Add reference to domain name and cert paths needed for BACKEND_DOMAIN
- Remove Managing OpenAPI Specifications section (covered in use-case setup)
- Add Deployment Guide link to Development to Production section

Addresses issues #171 and #174

* fix: Replace 'AWS Bedrock' with 'Amazon Bedrock' in SRE agent files

- Updated error messages in llm_utils.py
- Updated comments in both .env.example files
- Ensures consistent naming convention across SRE agent codebase

---------

Co-authored-by: dheerajoruganty <dheo@amazon.com>
Co-authored-by: Amit Arora <aroraai@amazon.com>
2025-08-01 13:24:58 -04:00

Backend Demo Infrastructure

This directory contains the complete demo backend infrastructure for SRE Agent testing and development.

📁 Structure

backend/
├── config_utils.py               # Configuration utilities
├── data/                         # Organized fake data
│   ├── k8s_data/                # Kubernetes mock data
│   ├── logs_data/               # Application logs
│   ├── metrics_data/            # Performance metrics
│   └── runbooks_data/           # Operational procedures
├── openapi_specs/               # API specifications
│   ├── k8s_api.yaml            # Kubernetes API spec
│   ├── logs_api.yaml           # Logs API spec
│   ├── metrics_api.yaml        # Metrics API spec
│   └── runbooks_api.yaml       # Runbooks API spec
├── servers/                     # Mock API implementations
│   ├── k8s_server.py           # Kubernetes API server
│   ├── logs_server.py          # Logs API server
│   ├── metrics_server.py       # Metrics API server
│   ├── runbooks_server.py      # Runbooks API server
│   ├── run_all_servers.py      # Start all servers
│   └── stop_servers.py         # Stop all servers
└── scripts/                    # Operational scripts
    ├── start_demo_backend.sh   # Simplified startup
    └── stop_demo_backend.sh    # Simplified shutdown

🚀 Quick Start

# Start all demo servers with simple Python HTTP servers
./scripts/start_demo_backend.sh

Advanced Startup (Full FastAPI servers)

# Start full-featured servers with FastAPI
cd servers
python run_all_servers.py

🌐 API Endpoints

When running, the demo backend provides these endpoints:

📊 Data Organization

K8s Data (data/k8s_data/)

  • deployments.json - Deployment status and configurations
  • pods.json - Pod states and resource usage
  • events.json - Cluster events and warnings

Logs Data (data/logs_data/)

  • application_logs.json - Application log entries
  • error_logs.json - Error-specific log entries

Metrics Data (data/metrics_data/)

  • performance_metrics.json - Response times, throughput
  • resource_metrics.json - CPU, memory, disk usage

Runbooks Data (data/runbooks_data/)

  • incident_playbooks.json - Incident response procedures
  • troubleshooting_guides.json - Step-by-step guides
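
Before starting the servers, it can help to confirm that the data files listed above are present and parse as JSON. The following is a minimal sketch, not part of the repository: the `find_problems` helper and `EXPECTED_FILES` mapping are hypothetical names, and the file list is simply copied from this README.

```python
import json
from pathlib import Path

# File names as listed in this README; the layout is data/<group>/<file>.
EXPECTED_FILES = {
    "k8s_data": ["deployments.json", "pods.json", "events.json"],
    "logs_data": ["application_logs.json", "error_logs.json"],
    "metrics_data": ["performance_metrics.json", "resource_metrics.json"],
    "runbooks_data": ["incident_playbooks.json", "troubleshooting_guides.json"],
}


def find_problems(data_root: str) -> list[str]:
    """Return human-readable problems; an empty list means every file is usable."""
    problems = []
    for group, names in EXPECTED_FILES.items():
        for name in names:
            path = Path(data_root) / group / name
            if not path.is_file():
                problems.append(f"missing: {group}/{name}")
                continue
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON: {group}/{name} ({exc.msg})")
    return problems
```

Calling `find_problems("data")` from the `backend/` directory would return an empty list when all nine files are in place and well-formed.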

🔧 Server Implementations

Simple HTTP Servers (Default)

Basic Python http.server implementations that serve JSON data directly from files.
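
As an illustration of the idea, here is a minimal sketch of an `http.server`-based handler that serves JSON files from a directory. It is not the repository's implementation: the `make_handler` and `serve` names, the `/<name>` to `<name>.json` routing, and the `/health` response body are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def make_handler(data_dir: Path):
    """Build a handler class that serves <name>.json from data_dir at /<name>."""

    class JsonFileHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Drop the query string and surrounding slashes: "/pods?x=1" -> "pods".
            name = self.path.split("?")[0].strip("/")
            if name == "health":
                return self._send(200, {"status": "ok"})
            target = data_dir / f"{name}.json"
            # Only plain file names inside data_dir are served.
            if not name or "/" in name or ".." in name or not target.is_file():
                return self._send(404, {"error": "not found"})
            self._send(200, json.loads(target.read_text(encoding="utf-8")))

        def _send(self, status, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the demo output quiet

    return JsonFileHandler


def serve(data_dir: str, port: int) -> HTTPServer:
    """Create (but do not start) a server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), make_handler(Path(data_dir)))
```

For example, `serve("data/k8s_data", 8001).serve_forever()` would expose `pods.json` at `/pods` under these assumptions.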

FastAPI Servers (Advanced)

Full-featured FastAPI servers with:

  • OpenAPI documentation
  • Request validation
  • Response schemas
  • Health endpoints

📋 OpenAPI Specifications

Complete OpenAPI 3.0 specifications for all APIs:

  • Endpoint definitions
  • Request/response schemas
  • Authentication requirements
  • Example data
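
A quick structural check can catch a broken spec before it is handed to other tooling. The sketch below works on a spec that has already been parsed into a dictionary (loading the YAML files would need a parser such as PyYAML, which is assumed here and not shown). The `spec_problems` helper is a hypothetical name; it checks only the top-level fields that OpenAPI 3.0 requires and that each operation defines `responses`, so it is no substitute for a full validator.

```python
# Fields that the OpenAPI 3.0 specification requires at the top level.
REQUIRED_TOP_LEVEL = ("openapi", "info", "paths")
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def spec_problems(spec: dict) -> list[str]:
    """Return a list of structural problems found in a parsed OpenAPI 3.0 document."""
    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in spec]

    version = str(spec.get("openapi", ""))
    if version and not version.startswith("3.0"):
        problems.append(f"unexpected OpenAPI version: {version}")

    # Every operation object must carry a "responses" entry in OpenAPI 3.0.
    for path, item in (spec.get("paths") or {}).items():
        for method, operation in (item or {}).items():
            if method in HTTP_METHODS and "responses" not in (operation or {}):
                problems.append(f"{method.upper()} {path}: no responses defined")
    return problems
```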

🛑 Stopping Services

# Simple method
./scripts/stop_demo_backend.sh

# Advanced method  
cd servers
python stop_servers.py

🧪 Testing

Test individual APIs:

# Test K8s API
curl http://localhost:8001/health

# Test with specific endpoints
curl http://localhost:8001/api/v1/namespaces/production/pods
curl "http://localhost:8002/api/v1/logs/search?query=error"
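
The same health check can be scripted. The following is a minimal sketch using only the standard library; `check_health`, `report`, and `DEMO_SERVICES` are hypothetical names, and the service-to-port mapping is inferred from the curl examples above rather than taken from the server code.

```python
import urllib.error
import urllib.request

# Assumed mapping, based on the curl examples in this section.
DEMO_SERVICES = {
    "k8s": "http://localhost:8001",
    "logs": "http://localhost:8002",
}


def check_health(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all count as down.
        return False


def report(services: dict[str, str] = DEMO_SERVICES) -> dict[str, bool]:
    """Check every service and return a name -> is_healthy mapping."""
    return {name: check_health(url) for name, url in services.items()}
```

Calling `report()` after startup gives a quick view of which servers are reachable, and an unreachable server shows up as `False` instead of raising an exception.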

⚙️ Configuration

The backend uses realistic data scenarios including:

  • Failed database pods
  • Memory pressure warnings
  • Performance degradation patterns
  • Common troubleshooting procedures

Together, these scenarios give the SRE Agent system a realistic environment to test against.