Compliance & Disaster Recovery Labs

Master cloud compliance frameworks, implement disaster recovery strategies, and ensure business continuity with hands-on labs covering HIPAA, PCI-DSS, SOC 2, and multi-region DR solutions.

Cloud Compliance & DR Labs - Module 4

Build compliant cloud environments and implement robust disaster recovery solutions.

Lab 10: HIPAA-Compliant Healthcare Cloud
AWS Compliance / Expert
Scenario: Healthcare Data Platform Compliance
MedTech Solutions is building a cloud platform to store and process Protected Health Information (PHI). Implement a HIPAA-compliant AWS architecture including encryption at rest and in transit, access controls with detailed audit logging, Business Associate Agreement (BAA) eligible services only, data backup and recovery procedures, and incident response automation. The platform must support 10,000 healthcare providers while maintaining strict compliance.

Learning Objectives:

  • HIPAA Architecture: Design compliant infrastructure
  • Encryption: Implement end-to-end encryption
  • Access Control: Configure detailed IAM policies
  • Audit Logging: Enable comprehensive logging

📋 Step-by-Step Instructions

  1. Step 1: Configure HIPAA-Eligible Services
    🎯 Goal: Deploy EC2 instances with dedicated tenancy for HIPAA compliance

    📝 What is Dedicated Tenancy?
    Dedicated tenancy runs your EC2 instances on hardware that is not shared with other AWS customers. AWS no longer requires dedicated tenancy for PHI workloads under its BAA, so treat it as an optional layer of physical isolation rather than a HIPAA requirement.

    💻 Launch Dedicated Instance:
    aws ec2 run-instances --placement Tenancy=dedicated --image-id ami-0c55b159cbfafe1f0 --instance-type t3.medium

    🔍 What happens:
    • AWS launches instance on dedicated hardware
    • Physical isolation from other customers
    • Supports HIPAA physical safeguards, although AWS does not require dedicated tenancy for PHI
    • The AWS BAA covers EC2
    💡 Pro Tip: Always verify services are BAA-eligible before using them for PHI. Check the AWS HIPAA Eligible Services Reference.
    📖 HIPAA Note: Dedicated tenancy costs more (~10% premium). AWS no longer requires it for workloads handling Protected Health Information (PHI), so use it only when your own risk analysis calls for physical isolation.
  2. Step 2: Implement End-to-End Encryption
    🎯 Goal: Enable encryption at rest and in transit for all PHI data

    📝 Why Encryption Matters:
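    The HIPAA Security Rule lists encryption of PHI at rest and in transit as "addressable" specifications: you either encrypt or document an equivalent safeguard, and in practice encryption is the expected choice. AWS KMS protects keys in FIPS 140-validated hardware security modules.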

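    💻 Create KMS Key for PHI (do this first; the later commands reference its alias):
    aws kms create-key --description "HIPAA PHI Encryption Key" --origin AWS_KMS
    aws kms create-alias --alias-name alias/hipaa-kms-key --target-key-id <key-id-from-create-key>

    💻 Enable S3 Bucket Encryption:
    aws s3api put-bucket-encryption --bucket healthcare-phi-data --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/hipaa-kms-key"}}]}'

    💻 Enable RDS Encryption (engine, instance class, storage, and username are example values):
    aws rds create-db-instance --db-instance-identifier patient-db --engine mysql --db-instance-class db.r5.large --allocated-storage 100 --master-username admin --manage-master-user-password --storage-encrypted --kms-key-id alias/hipaa-kms-key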

    🔍 Encryption Coverage:
    • S3: AES-256 encryption with KMS
    • RDS: Encryption at rest for all databases
    • EBS: All new volumes encrypted once account-level default encryption is turned on (aws ec2 enable-ebs-encryption-by-default)
    • In Transit: TLS 1.2+ enforced on all endpoints
    💡 Best Practice: Use separate KMS keys for different data classifications (PHI, PII, internal). Enable automatic key rotation annually.
    Critical: Never store unencrypted PHI. Even temporary files and logs must be encrypted; leaving them in plaintext can constitute a HIPAA violation.
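    💻 Sketch: Enforcing TLS on the PHI Bucket
    The "In Transit" bullet above needs a control behind it. One common option is a bucket policy that denies any request made without TLS or with a TLS version below 1.2. The following is a minimal sketch under assumptions: the function name and the output file name are illustrative, and the policy uses the aws:SecureTransport and s3:TlsVersion condition keys.

```python
import json


def tls_only_bucket_policy(bucket: str, min_tls: float = 1.2) -> dict:
    """Build a bucket policy that denies plaintext and old-TLS requests."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any request that did not arrive over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Deny requests negotiated with a TLS version below min_tls.
                "Sid": "DenyOldTls",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"NumericLessThan": {"s3:TlsVersion": min_tls}},
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved, e.g. as tls-only-policy.json.
    print(json.dumps(tls_only_bucket_policy("healthcare-phi-data"), indent=2))
```

    Saved to a file, the output could then be applied with aws s3api put-bucket-policy --bucket healthcare-phi-data --policy file://tls-only-policy.json.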
  3. Step 3: Configure Strict Access Controls
    🎯 Goal: Implement least-privilege access with MFA enforcement

    📝 Access Control Requirements:
    HIPAA requires unique user identification and audit controls, and lists automatic logoff as addressable; role-based access control (RBAC) is the usual way to meet these requirements. MFA is generally treated as "addressable" but is highly recommended.

    💻 Create Healthcare IAM Role:
    aws iam create-role --role-name HealthcareProviderRole --assume-role-policy-document file://trust-policy.json

    💻 Attach Least-Privilege Policy:
    aws iam put-role-policy --role-name HealthcareProviderRole --policy-name PHI-ReadOnly --policy-document file://phi-policy.json

    💻 Enforce MFA for Admin Access:
    aws iam create-policy --policy-name Require-MFA --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"BoolIfExists":{"aws:MultiFactorAuthPresent":"false"}}}]}'

    💻 Configure SAML SSO:
    aws iam create-saml-provider --name HealthcareSSO --saml-metadata-document file://saml-metadata.xml

    🔍 Access Control Features:
    • Role-based access (doctors, nurses, billing)
    • MFA required for all admin operations
    • SSO integration with hospital directory
    • Automatic session timeout after 15 minutes
    💡 Security Tip: Use temporary credentials via STS AssumeRole instead of long-lived access keys. Rotate any remaining long-lived credentials at least every 90 days.
    🎓 Exam Tip: Know the difference between "required" and "addressable" HIPAA specifications. Addressable does not mean optional: you must implement the safeguard or document an equivalent alternative. MFA is treated as addressable but often becomes required by risk analysis.
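    💻 Sketch: Generating trust-policy.json and phi-policy.json
    The commands in this step reference two policy files that the lab does not show. The sketch below is one plausible shape for them, not the lab's actual files: it assumes the role is assumed through the HealthcareSSO SAML provider, that PHI lives in the healthcare-phi-data bucket, and that 123456789012 is a placeholder account ID.

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder account ID (assumption)


def saml_trust_policy(account_id: str, provider: str = "HealthcareSSO") -> dict:
    """Trust policy letting users federated through the SAML provider assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:saml-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithSAML",
            "Condition": {
                "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
            },
        }],
    }


def phi_read_only_policy(bucket: str = "healthcare-phi-data") -> dict:
    """Least-privilege policy: list the bucket and read objects, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ListPhiBucket", "Effect": "Allow",
             "Action": "s3:ListBucket", "Resource": f"arn:aws:s3:::{bucket}"},
            {"Sid": "ReadPhiObjects", "Effect": "Allow",
             "Action": "s3:GetObject", "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(saml_trust_policy(ACCOUNT_ID), indent=2))
    print(json.dumps(phi_read_only_policy(), indent=2))
```

    Because the bucket is encrypted with a KMS key, readers would also need kms:Decrypt on that key, granted through the key policy or an additional statement.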
  4. Step 4: Enable Comprehensive Audit Logging
    🎯 Goal: Implement audit logging for all PHI access and system activities

    📝 Why Audit Logs are Critical:
    HIPAA requires audit controls to record and examine activity in systems containing PHI. HIPAA documentation must be retained for six years, and audit logs are commonly kept for at least that long. Tampering with audit logs can carry federal penalties.

    💻 Configure CloudTrail with Validation:
    aws cloudtrail create-trail --name hipaa-audit-trail --s3-bucket-name hipaa-audit-logs --enable-log-file-validation --is-multi-region-trail

    💻 Start Logging:
    aws cloudtrail start-logging --name hipaa-audit-trail

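    💻 Enable VPC Flow Logs (FlowLogsRole is a placeholder for an IAM role allowed to write to CloudWatch Logs):
    aws ec2 create-flow-logs --resource-type VPC --resource-ids vpc-xxxxx --traffic-type ALL --log-destination-type cloud-watch-logs --log-group-name /aws/vpc/flowlogs --deliver-logs-permission-arn arn:aws:iam::account:role/FlowLogsRole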

    💻 Configure CloudWatch Log Group:
    aws logs create-log-group --log-group-name /aws/hipaa/application-logs --kms-key-id arn:aws:kms:region:account:key/key-id

    💻 Set Retention Policy (7 years, exceeding the 6-year HIPAA minimum):
    aws logs put-retention-policy --log-group-name /aws/hipaa/application-logs --retention-in-days 2557

    🔍 What Gets Logged:
    • All API calls via CloudTrail
    • Network traffic via VPC Flow Logs
    • PHI access attempts (successful + failed)
    • Configuration changes
    • Authentication events
    💡 Best Practice: Send all logs to a separate "audit" AWS account with restricted access. Use S3 Object Lock for immutability.
    Compliance: Log file validation does not prevent tampering, but it makes tampering detectable. CloudTrail digest files prove log integrity using cryptographic hashing.
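    💻 Sketch: How Hash Chaining Makes Tampering Detectable
    To illustrate the idea behind digest files, the sketch below hashes each log file and links every record to the previous one, so an edited, removed, or reordered file breaks the chain. This is a simplified teaching model, not CloudTrail's actual digest format (real digest files are also digitally signed); all function and field names are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_digest_chain(log_files: list[bytes]) -> list[dict]:
    """Return one digest record per log file, each linked to the previous record."""
    chain = []
    previous = ""  # the first record has no predecessor
    for content in log_files:
        log_hash = sha256_hex(content)
        # The record hash covers both this log and the previous record.
        record_hash = sha256_hex((previous + log_hash).encode())
        chain.append({
            "log_hash": log_hash,
            "previous": previous,
            "record_hash": record_hash,
        })
        previous = record_hash
    return chain


def verify_chain(log_files: list[bytes], chain: list[dict]) -> bool:
    """Recompute every hash and compare against the stored chain."""
    return build_digest_chain(log_files) == chain


if __name__ == "__main__":
    logs = [b"event-1", b"event-2", b"event-3"]
    stored = build_digest_chain(logs)
    print(verify_chain(logs, stored))
    print(verify_chain([b"event-1", b"tampered", b"event-3"], stored))
```

    For real trails, the aws cloudtrail validate-logs command performs the equivalent check against the delivered digest files.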
  5. Step 5: Implement Backup & Disaster Recovery
    🎯 Goal: Configure automated backups with cross-region replication

    📝 HIPAA Contingency Planning:
    HIPAA requires a contingency plan covering data backup, disaster recovery, and emergency mode operations. Test recovery procedures on a regular schedule, at least annually.

    💻 Enable Automated RDS Backups:
    aws rds modify-db-instance --db-instance-identifier patient-db --backup-retention-period 35 --preferred-backup-window "03:00-04:00"

    💻 Create Manual Snapshot:
    aws rds create-db-snapshot --db-instance-identifier patient-db --db-snapshot-identifier phi-backup-2024-11

    💻 Enable S3 Cross-Region Replication:
    aws s3api put-bucket-replication --bucket healthcare-phi-data --replication-configuration file://replication-config.json

    💻 Configure AWS Backup:
    aws backup create-backup-plan --backup-plan file://hipaa-backup-plan.json

    💻 Test Point-in-Time Recovery:
    aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier patient-db --target-db-instance-identifier patient-db-restore --restore-time 2024-11-01T12:00:00Z

    🔍 Backup Strategy:
    • Automated daily backups retained 35 days
    • Manual snapshots for compliance milestones
    • Cross-region replication to DR region
    • Point-in-time recovery to any moment in the retention window, up to roughly the last 5 minutes
    • Quarterly restore testing documented
    💡 Pro Tip: Use S3 Glacier for long-term retention of backups (7+ years). Costs pennies per GB but meets HIPAA retention requirements.
    🏗️ Architecture: Keep backups in a separate AWS account and region. Use service control policies (SCPs) to prevent deletion. Test recovery in an isolated environment.
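    💻 Sketch: Generating replication-config.json and hipaa-backup-plan.json
    The S3 replication and AWS Backup commands in this step read JSON files that the lab does not show. The sketch below builds one plausible version of each. The rule names, vault names, and ARNs passed in are hypothetical, and the field names follow the S3 replication and AWS Backup request shapes as best understood here, so check them against the current API reference before use.

```python
import json


def replication_config(role_arn: str, dest_bucket_arn: str, replica_kms_key_arn: str) -> dict:
    """S3 replication rule that copies KMS-encrypted objects to the DR bucket."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "phi-to-dr-region",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter: replicate every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "SourceSelectionCriteria": {"SseKmsEncryptedObjects": {"Status": "Enabled"}},
            "Destination": {
                "Bucket": dest_bucket_arn,
                "EncryptionConfiguration": {"ReplicaKmsKeyID": replica_kms_key_arn},
            },
        }],
    }


def hipaa_backup_plan(vault: str, dr_vault_arn: str) -> dict:
    """Daily backup rule with a cross-region copy and 7-year retention."""
    return {
        "BackupPlanName": "hipaa-daily",
        "Rules": [{
            "RuleName": "daily-phi-backup",
            "TargetBackupVaultName": vault,
            "ScheduleExpression": "cron(0 3 * * ? *)",  # every day at 03:00 UTC
            "Lifecycle": {"MoveToColdStorageAfterDays": 35, "DeleteAfterDays": 2557},
            "CopyActions": [{"DestinationBackupVaultArn": dr_vault_arn}],
        }],
    }


def lifecycle_is_valid(rule: dict) -> bool:
    """AWS Backup expects deletion at least 90 days after the move to cold storage."""
    lifecycle = rule["Lifecycle"]
    return lifecycle["DeleteAfterDays"] - lifecycle["MoveToColdStorageAfterDays"] >= 90


if __name__ == "__main__":
    plan = hipaa_backup_plan(
        "hipaa-vault",  # hypothetical vault name
        "arn:aws:backup:us-west-2:123456789012:backup-vault:hipaa-dr-vault",
    )
    print(json.dumps(plan, indent=2))
```

    S3 replication also requires versioning to be enabled on both the source and destination buckets before put-bucket-replication succeeds.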
  6. Step 6: Configure Security Monitoring & Incident Response
    🎯 Goal: Implement real-time monitoring and automated incident response

    📝 HIPAA Security Incident Procedures:
    HIPAA requires identifying and responding to security incidents, mitigating harmful effects, and documenting outcomes. Must have incident response team and procedures.

    💻 Create Security Alarm for Unauthorized Access (assumes a CloudWatch Logs metric filter already publishes UnauthorizedAPICalls to the CloudTrailMetrics namespace):
    aws cloudwatch put-metric-alarm --alarm-name unauthorized-phi-access --alarm-description "Alert on unauthorized PHI access attempts" --metric-name UnauthorizedAPICalls --namespace CloudTrailMetrics --statistic Sum --period 300 --evaluation-periods 1 --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold

    💻 Configure SNS Topic for Security Alerts:
    aws sns create-topic --name hipaa-security-alerts

    💻 Subscribe Security Team:
    aws sns subscribe --topic-arn arn:aws:sns:region:account:hipaa-security-alerts --protocol email --notification-endpoint security@healthcare.com

    💻 Enable GuardDuty (Threat Detection):
    aws guardduty create-detector --enable --finding-publishing-frequency FIFTEEN_MINUTES

    💻 Configure EventBridge for Automated Response:
    aws events put-rule --name isolate-compromised-instance --event-pattern '{"source":["aws.guardduty"],"detail-type":["GuardDuty Finding"],"detail":{"severity":[{"numeric":[">=",8]}]}}'

    💻 Create Lambda for Auto-Remediation:
    aws lambda create-function --function-name IsolateCompromisedInstance --runtime python3.11 --role arn:aws:iam::account:role/IncidentResponseRole --handler index.handler --zip-file fileb://function.zip

    🔍 Monitoring Coverage:
    • Real-time alerts for security events
    • Failed authentication attempts tracked
    • GuardDuty for threat detection
    • Automated instance isolation on compromise
    • Security team notified within 1 minute
    • All incidents logged and tracked
    💡 Incident Response: Create runbooks for common scenarios (ransomware, data breach, insider threat). Practice tabletop exercises quarterly.
    ⚠️ Breach Notification: HIPAA requires notifying affected individuals without unreasonable delay, and no later than 60 calendar days after a breach is discovered. Have legal counsel and PR team contacts ready. Document everything!
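    💻 Sketch: Handler for IsolateCompromisedInstance
    The Lambda created in this step needs code in function.zip, which the lab does not show. Below is a minimal sketch of what index.py might contain, assuming isolation means swapping the instance's security groups for a single quarantine group. The security group ID is a hypothetical placeholder, and the EC2 client is passed in as an optional argument so the logic can be exercised without AWS access.

```python
QUARANTINE_SG = "sg-00000000000000000"  # hypothetical isolation security group


def extract_instance_id(event: dict):
    """Return the EC2 instance ID from a GuardDuty finding event, or None."""
    resource = event.get("detail", {}).get("resource", {})
    return resource.get("instanceDetails", {}).get("instanceId")


def handler(event, context, ec2=None):
    """Replace the instance's security groups with the quarantine group."""
    instance_id = extract_instance_id(event)
    if instance_id is None:
        # Not every finding concerns an EC2 instance (e.g. IAM or S3 findings).
        return {"isolated": False, "reason": "finding has no EC2 instance"}
    if ec2 is None:
        import boto3  # provided by the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")
    # Setting Groups replaces every security group currently attached.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])
    return {"isolated": True, "instance_id": instance_id}
```

    The quarantine group would typically have no inbound or outbound rules. Snapshotting the instance's volumes before or after isolation preserves evidence for the incident record.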
  7. Step 7: Review Your Results & Dashboard
    🎯 Goal: Validate your implementation and review the compliance metrics

    📝 Final Validation:
    After completing all steps, it's important to validate your configuration and review the compliance metrics to ensure everything is properly implemented.

    💻 Complete These Actions:
    1. Click "Validate Compliance" button to check all your configurations
    2. Review the validation feedback - fix any missing or incorrect settings
    3. Click "View Dashboard" button to see your HIPAA compliance metrics
    4. Examine the dashboard charts showing:
       • Compliance Score breakdown by category
       • Security controls implementation status
       • Encryption coverage across services
       • Audit logging status
    5. Click "Compliance Report" to generate a detailed report
    6. Optionally, click "Export Audit Log" to download logs


    🔍 What to Look For:
    • Overall compliance score should be 100%
    • All 6 safeguard categories should show green checkmarks
    • Encryption should cover all PHI storage
    • Audit logging should be active for all services
    • Backup status should show cross-region replication enabled
    Congratulations! You've implemented a HIPAA-compliant AWS architecture. In a real environment, you would also need to document your security controls and prepare for a HIPAA audit.

Lab 11: Multi-Region DR with Automated Failover
Multi-Cloud DR / Advanced
Scenario: Global Financial Services DR
Global Finance Corp requires a robust disaster recovery solution with RPO < 1 minute and RTO < 5 minutes. Implement multi-region DR across AWS and Azure with automated failover, real-time data replication, traffic management with Route 53 and Traffic Manager, automated health checks and failover triggers, and runbook automation. The solution must handle 100,000 transactions per second and keep data loss within the RPO window.

Learning Objectives:

  • Multi-Region Setup: Configure active-passive DR
  • Data Replication: Implement real-time sync
  • Automated Failover: Configure health checks
  • Testing: Validate DR procedures

📋 Step-by-Step Instructions

  1. Step 1: Deploy Primary Region Infrastructure (AWS us-east-1)
    🎯 Goal: Deploy production infrastructure in AWS us-east-1 as primary region

    📝 Active-Passive DR Strategy:
    Primary region handles all traffic. Secondary (standby) region stays warm with replicated data, ready to take over in disaster. This achieves low RTO/RPO at reasonable cost.

    💻 Deploy Primary Infrastructure:
    1. 🗺️ First: Click "Network Diagram" button below to view architecture and IP ranges
    2. Keep AWS us-east-1 region (pre-selected)
    3. Enter VPC CIDR from diagram (hint: primary network is 10.0.x.x/16)
    4. Keep defaults: EC2 Instance Type (t3.medium), Aurora Engine (MySQL 8.0), DB Instance (db.r5.large)
    5. ✅ Enable "Aurora Global Database" checkbox (required)
    6. Keep Backup Retention (7 days) and Encryption (enabled)
    7. Click "Deploy Primary Region" button


    🔍 What Gets Deployed:
    • VPC with public/private subnets across 3 AZs
    • Aurora Global Database (MySQL-compatible)
    • Application Load Balancer with auto-scaling
    • ElastiCache for session management
    • S3 buckets with versioning enabled
    • CloudFront CDN for global distribution
    💡 Best Practice: Use Infrastructure as Code (CloudFormation/Terraform) for consistent deployment. Tag all resources with disaster-recovery=primary.
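    High Availability: Aurora Global Database typically replicates to secondary AWS regions with under 1 second of lag and can promote a secondary in about a minute. Cross-region failover is not automatic by default, so it must be triggered manually or by automation.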
  2. Step 2: Deploy Secondary Region (Azure West Europe)
    🎯 Goal: Set up standby infrastructure in Azure for multi-cloud resilience

    📝 Multi-Cloud Benefits:
    Using multiple cloud providers protects against provider-specific outages. Azure West Europe provides geographic diversity from AWS us-east-1. Reduces vendor lock-in risk.

    💻 Deploy Secondary Infrastructure:
    1. 🗺️ Review: Check Network Diagram for Azure VNet range (hint: secondary is 10.1.x.x/16)
    2. Switch to Azure Portal tab in GUI console
    3. Select Resource Group: rg-dr-secondary
    4. Keep West Europe region (pre-selected)
    5. Enter VNet Address Space from diagram (must not overlap with AWS)
    6. Subnet auto-calculates to 10.1.0.0/24 (default subnet)
    7. Keep defaults: App Service (S1), SQL Tier (S3), Storage Redundancy (GRS)
    8. ✅ Enable "SQL Database Geo-Replication" checkbox (required)
    9. Azure Monitor is auto-enabled
    10. Click "Deploy Secondary Region" button


    🔍 Azure Resources Deployed:
    • Virtual Network with gateway subnet
    • Azure SQL Database with Geo-replication
    • Azure App Service (scaled down to save costs)
    • Azure Cache for Redis (read replicas)
    • Azure Storage with RA-GRS replication
    • Azure Front Door for traffic management
    💡 Cost Optimization: Keep secondary at 25% capacity in standby. Use Azure Reserved Instances for the baseline capacity. Scale up automatically during failover (the lab estimates savings of about 60% on DR costs; actual savings depend on the workload).
    ☁️ Multi-Cloud Tip: Use Terraform/Pulumi for consistent IaC across clouds. Maintain separate state files per cloud provider.
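    💻 Sketch: Checking That the VNet Range Does Not Overlap
    Step 2 requires an Azure address space that does not overlap the AWS VPC. A quick way to check candidate ranges before typing them in is Python's standard ipaddress module. The sketch assumes the exact ranges are 10.0.0.0/16 for AWS and 10.1.0.0/16 for Azure, based on the hints in the instructions; the function names are illustrative.

```python
import ipaddress


def networks_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR ranges share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def first_subnet(cidr: str, new_prefix: int = 24) -> str:
    """Return the first subnet of the given prefix length inside cidr."""
    return str(next(ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix)))


if __name__ == "__main__":
    aws_vpc = "10.0.0.0/16"     # assumed from the Step 1 hint
    azure_vnet = "10.1.0.0/16"  # assumed from the Step 2 hint
    print(networks_overlap(aws_vpc, azure_vnet))
    print(first_subnet(azure_vnet))
```

    Non-overlapping ranges matter because the two networks are later connected for replication, and overlapping addresses cannot be routed between them.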
  3. Step 3: Configure Real-Time Data Replication
    🎯 Goal: Enable continuous data replication to achieve RPO <1 minute

    📝 RPO vs RTO:
    RPO (Recovery Point Objective) = Maximum acceptable data loss. RTO (Recovery Time Objective) = Maximum acceptable downtime. Financial services typically require RPO <1min, RTO <5min.

    💻 Enable Cross-Cloud Replication:
    1. Scroll to Replication Configuration panel (unlocks after Step 1)
    2. ✅ Toggle "Enable Data Replication" checkbox
    3. Enter Replication Lag Alert: 10 seconds (alert threshold)
    4. Click "Start Replication" button
    5. Wait for success message (Aurora Global DB + S3 CRR enabled)


    🔍 Replication Strategy:
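    Database: Aurora → Azure SQL (async, target <1s lag; needs change-data-capture tooling, because Aurora Global Database replicates only between AWS regions)
    Storage: S3 → Azure Blob (scheduled or event-driven copy; S3 CRR covers only S3-to-S3 replication)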
    Cache: ElastiCache → Azure Redis (write-through)
    Messages: SQS → Service Bus (message forwarding)
    Monitoring: CloudWatch → Azure Monitor (metrics streaming)
    💡 Monitoring: Set up CloudWatch alarms for replication lag. Alert if lag exceeds 10 seconds. Track replication health 24/7.
    ⚠️ Bandwidth Cost: Cross-cloud replication incurs data transfer charges. Estimate $0.09/GB. Compress data and use incremental replication.
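    💻 Sketch: Lag Alerts and Transfer Cost Arithmetic
    The 10-second lag threshold and the $0.09/GB estimate from this step can be turned into two small helpers. This is a rough sketch: the sample values are made up, the rate is the lab's estimate rather than a quoted price, and compression_ratio is an illustrative parameter (3.0 means the data shrinks to a third of its size).

```python
def lag_breaches(lag_seconds: list[float], threshold: float = 10.0) -> list[int]:
    """Return the indexes of samples whose replication lag exceeds the threshold."""
    return [i for i, lag in enumerate(lag_seconds) if lag > threshold]


def monthly_transfer_cost(
    gb_per_day: float,
    rate_per_gb: float = 0.09,
    days: int = 30,
    compression_ratio: float = 1.0,
) -> float:
    """Estimate monthly cross-cloud transfer cost in dollars."""
    return gb_per_day / compression_ratio * days * rate_per_gb


if __name__ == "__main__":
    samples = [0.4, 12.0, 9.9, 10.0, 31.5]  # made-up lag samples in seconds
    print(lag_breaches(samples))
    print(round(monthly_transfer_cost(500), 2))
    print(round(monthly_transfer_cost(500, compression_ratio=3.0), 2))
```

    At 500 GB per day the uncompressed estimate is 500 × 30 × 0.09 = $1,350 per month, which shows why compression and incremental replication are worth the effort.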
  4. Step 4: Configure Global Traffic Management
    🎯 Goal: Set up intelligent traffic routing with automated failover detection

    📝 DNS-Based Failover:
    Route 53 health checks monitor primary region. On failure, automatically update DNS to point to secondary. Azure Traffic Manager provides additional layer with priority routing.

    💻 Configure Traffic Management:
    1. Scroll to Route 53 Traffic Management panel (unlocks after Step 3)
    2. Select Check Interval: 30 seconds (Standard)
    3. Enter Failure Threshold: 3 consecutive failures
    4. Enter DNS TTL: 60 seconds (low TTL for fast propagation)
    5. Click "Apply Traffic Rules" button
    6. Azure Traffic Manager auto-configures with priority routing


    🔍 Traffic Management Setup:
    Route 53 Health Check: HTTPS endpoint check every 30s
    Failover Policy: Primary → Secondary on 3 consecutive failures
    TTL Settings: Set to 60s for faster DNS propagation
    Azure Traffic Manager: Priority method (Primary=1, Secondary=2)
    CloudFront: Multiple origin groups with failover
    Global Accelerator: Static anycast IPs for instant failover
    💡 Pro Tip: Use low TTL values (60s) during normal operations. This allows faster DNS propagation during failover but increases DNS query costs.
    🎓 Exam Tip: Route 53 health checks can monitor CloudWatch alarms, not just endpoints. Use this for comprehensive health monitoring.
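    💻 Sketch: Does the Health Check Timing Fit the RTO?
    The settings in this step imply a rough upper bound on failover time: detection takes the check interval times the failure threshold, and clients may keep the old DNS answer for up to one TTL. The sketch below adds those up, with an optional allowance for database promotion and scale-up. It is a back-of-the-envelope model that ignores resolvers that disregard TTLs; the promotion_seconds parameter is an assumption, not a lab setting.

```python
def worst_case_failover_seconds(
    check_interval: int = 30,
    failure_threshold: int = 3,
    dns_ttl: int = 60,
    promotion_seconds: int = 0,
) -> int:
    """Approximate worst-case seconds from outage to clients reaching the secondary."""
    detection = check_interval * failure_threshold  # time to declare the primary unhealthy
    return detection + dns_ttl + promotion_seconds


def fits_rto(total_seconds: int, rto_seconds: int = 300) -> bool:
    """Return True if the estimate is within the RTO target (default 5 minutes)."""
    return total_seconds < rto_seconds


if __name__ == "__main__":
    dns_only = worst_case_failover_seconds()
    with_promotion = worst_case_failover_seconds(promotion_seconds=120)
    print(dns_only, fits_rto(dns_only))
    print(with_promotion, fits_rto(with_promotion))
```

    With the lab's values, detection (90 s) plus TTL (60 s) uses 150 of the 300 available seconds, leaving about 150 seconds for promotion and scale-up.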
  5. Step 5: Implement Automated Failover Orchestration
    🎯 Goal: Automate the entire failover process to achieve RTO <5 minutes

    📝 Failover Automation Flow:
    1. Health check detects failure → 2. EventBridge triggers Lambda → 3. Promote secondary database → 4. Scale up Azure resources → 5. Update DNS records → 6. Notify team → 7. Update status dashboards

    💻 Enable Failover Automation:
    1. Switch to Azure Portal tab (if not already there)
    2. Scroll to Failover Automation panel (unlocks after Step 2)
    3. Keep Automation Account (aa-dr-automation) - default
    4. Keep Lambda Function (FailoverOrchestrator) - default
    5. Keep EventBridge Rule (health-check-failure) - default
    6. ⚠️ SNS Topic: Click code below to copy, then paste into SNS field:
    arn:aws:sns:us-east-1:123456789012:dr-failover-alerts (click to copy)
    7. Alert Channels are pre-configured (Email, Slack checked)
    8. ✅ Toggle "Enable Automated Failover" checkbox (required)
    9. Click "Save Automation" button


    🔍 Automation Components:
    AWS Lambda: Orchestration logic (Python/Node.js)
    EventBridge Rules: Trigger on health check failures
    Step Functions: Coordinate multi-step failover workflow
    Azure Automation: Scale up standby resources
    SNS/Email: Notify on-call team immediately
    PagerDuty: Create high-priority incident
    Slack/Teams: Post to #incidents channel
    💡 Testing is Critical: Run failover drills monthly. Measure actual RTO/RPO. Update runbooks. Practice makes perfect - don't wait for real disaster!
    Automation Benefits: Manual failover typically takes 30-60 minutes. Automated failover can complete in 3-5 minutes and removes most of the manual steps where human error creeps in.
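    💻 Sketch: Ordered Failover Runner
    The automation flow in this step is a sequence of actions that must run in order and stop if one fails. The sketch below shows that control flow in plain Python, roughly what the FailoverOrchestrator Lambda or a Step Functions workflow would coordinate. The step names and placeholder actions are illustrative; real steps would call AWS and Azure APIs.

```python
import time
from typing import Callable


def run_failover(steps: list[tuple[str, Callable[[], None]]], clock=time.monotonic) -> dict:
    """Run steps in order, stopping at the first failure, and report elapsed time."""
    started = clock()
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Stop immediately so later steps do not act on a half-failed state.
            return {
                "ok": False,
                "failed_step": name,
                "error": str(exc),
                "completed": completed,
                "elapsed": clock() - started,
            }
        completed.append(name)
    return {
        "ok": True,
        "failed_step": None,
        "error": None,
        "completed": completed,
        "elapsed": clock() - started,
    }


if __name__ == "__main__":
    # Placeholder actions standing in for real API calls.
    plan = [
        ("promote_secondary_database", lambda: None),
        ("scale_up_azure_resources", lambda: None),
        ("update_dns_records", lambda: None),
        ("notify_team", lambda: None),
    ]
    print(run_failover(plan))
```

    In practice the notification step would usually run even when an earlier step fails, so the on-call team learns about a stalled failover; the elapsed value gives a first approximation of measured RTO.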
  6. Step 6: Execute DR Test & Validation
    🎯 Goal: Validate DR procedures and measure actual RPO/RTO metrics

    📝 DR Testing Types:
    Tabletop Exercise: Walk through scenarios on paper (quarterly)
    Parallel Test: Bring up secondary without shutting down primary (monthly)
    Full Failover: Complete production failover during maintenance window (semi-annually)


    💻 Initiate Failover Test:
    1. Ensure all previous steps (1-5) are completed
    2. Click "Test Failover" button below progress bar
    3. Confirm failover in popup dialog
    4. Watch real-time animation in "Failover Progress Monitor"
    5. Wait for test to complete (animation shows RTO/RPO)
    6. After completion, click "DR Report" to view full results
    7. Step 6 completes ONLY when failover animation finishes


    🔍 Test Validation Checklist:
    • ☑️ Measure actual RTO (target: <5 minutes)
    • ☑️ Verify RPO (target: <1 minute, check data loss)
    • ☑️ Test application functionality in secondary
    • ☑️ Verify all integrations still work
    • ☑️ Validate monitoring and alerting
    • ☑️ Document issues and lessons learned
    • ☑️ Practice failback to primary region
    • ☑️ Update runbooks with findings
    💡 Documentation: Create detailed runbooks with screenshots, commands, and decision trees. Include rollback procedures. Update contact lists quarterly.
    🚨 Real-World Tip: Many DR plan failures trace back to lack of testing. Untested DR = No DR. Schedule tests and make them non-negotiable!
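    💻 Sketch: Computing Actual RTO and RPO from Test Timestamps
    The first two checklist items come down to subtracting timestamps recorded during the drill. The sketch below assumes RPO is measured as the gap between the last write that reached the secondary and the start of the failure, and RTO as the time from failure start to restored service. The function name and sample timestamps are illustrative.

```python
from datetime import datetime, timedelta


def measure_dr_test(
    last_replicated_write: datetime,
    failure_start: datetime,
    service_restored: datetime,
    rpo_target: timedelta = timedelta(minutes=1),
    rto_target: timedelta = timedelta(minutes=5),
) -> dict:
    """Compute achieved RPO/RTO for one drill and compare them with the targets."""
    rpo = failure_start - last_replicated_write  # data written in this gap is lost
    rto = service_restored - failure_start       # downtime seen by users
    return {
        "rpo_seconds": rpo.total_seconds(),
        "rto_seconds": rto.total_seconds(),
        "rpo_met": rpo < rpo_target,
        "rto_met": rto < rto_target,
    }


if __name__ == "__main__":
    # Made-up timestamps from a hypothetical drill.
    result = measure_dr_test(
        last_replicated_write=datetime(2024, 11, 1, 11, 59, 58),
        failure_start=datetime(2024, 11, 1, 12, 0, 0),
        service_restored=datetime(2024, 11, 1, 12, 3, 40),
    )
    print(result)
```

    Recording these three timestamps for every drill gives a history of measured RTO and RPO to include in the runbook updates.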
  7. Step 7: Review Your Results & Dashboard
    🎯 Goal: Validate your DR implementation and review the metrics

    📝 Final Validation:
    After completing all steps and the failover test, review your results to understand how well your DR solution meets the requirements.

    💻 Complete These Actions:
    1. Click "Validate DR Config" button to check all your configurations
    2. Review validation feedback - ensure all components are properly configured
    3. Click "View Dashboard" button to see your DR metrics
    4. Examine the dashboard charts showing:
       • Infrastructure deployment status (AWS/Azure)
       • Replication lag metrics over time
       • Health check status for both regions
       • RTO/RPO actual vs target comparison
    5. Click "DR Report" to generate a detailed disaster recovery report
    6. Review the report for compliance with requirements (RPO <1min, RTO <5min)


    🔍 What to Look For:
    • Overall DR readiness score should be 100%
    • Both primary (AWS) and secondary (Azure) regions show "Deployed"
    • Replication status shows "Active" with lag <1 minute
    • Health checks show "Healthy" for both regions
    • Failover test completed successfully
    • Actual RTO met the <5 minute target
    • Actual RPO met the <1 minute target
    Congratulations! You've implemented a multi-cloud disaster recovery solution with automated failover. In production, schedule monthly DR drills and update runbooks based on lessons learned.