Multi-Cloud Disaster Recovery Mastery

Expert-level disaster recovery labs across AWS, Azure, and GCP. Build enterprise-grade DR solutions with automated failover, cross-cloud replication, and comprehensive monitoring.

Advanced DR Labs - Module 9

Master disaster recovery with real-world cloud console interfaces and comprehensive configuration options.

Lab 25: AWS Multi-Region Disaster Recovery
AWS / Expert
Scenario: Global E-Commerce DR Strategy
GlobalShop, a Fortune 500 e-commerce platform handling a high volume of daily transactions, requires a comprehensive disaster recovery solution across multiple AWS regions. Design and implement a multi-region DR architecture with automated failover, cross-region database replication, S3 cross-region replication, and Route 53 health checks. The solution must achieve RPO < 1 minute and RTO < 15 minutes while maintaining 99.99% availability.

Learning Objectives:

  • Multi-Region Architecture: Design active-passive DR topology
  • Database Replication: Configure RDS cross-region replication
  • Traffic Management: Implement Route 53 failover policies
  • Automation: Build Lambda-based failover orchestration
  • Monitoring: Set up CloudWatch cross-region monitoring

Step-by-Step Instructions

  1. Create RDS Multi-AZ Database in Primary Region
    The Multi-AZ RDS database provides automatic failover within a region. It synchronously replicates data to a standby instance in a different Availability Zone.
    In AWS Console (right panel):
    1. Click the "RDS Database" tab at the top
    2. In the "Database name" field, type: globalshop-prod-primary
    3. In the "Instance class" dropdown, select "db.r5.xlarge"
    4. For "Storage type", select "Provisioned IOPS SSD (io1)"
    5. For "Allocated storage", type: 500 GB
    6. Under "Multi-AZ deployment":
        Select the radio button: "Yes - Create a standby instance"
    7. For "Backup retention", select "7 days"
    8. Check the box: "Enable automatic minor version upgrades"
    9. Click orange "Create Database" button at bottom
    Why Multi-AZ? Provides 99.95% SLA with automatic failover in ~60 seconds if primary AZ fails. The standby replica stays in sync via synchronous replication.
  2. Configure Cross-Region Read Replica
    Read replicas in a secondary region provide disaster recovery capability. During a regional outage, you can promote the replica to become a standalone database.
    Step-by-step:
    1. Still in the "RDS Database" tab, scroll to the "Read Replicas" section
    2. In the "Replica name" field, type: globalshop-prod-replica
    3. In the "Replica region" dropdown, select "us-west-2"
    4. Check the boxes:
       ✓ "Publicly accessible" (for testing)
       ✓ "Auto-promote to primary on failure"
    5. For "Replication lag alert threshold", type: 60 seconds
    6. Click the "Create Read Replica" button
    Replication Lag: Typically <1 second under normal load. Monitor with CloudWatch metric "ReplicaLag" - alerts trigger if it exceeds 60 seconds.
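The ReplicaLag check described above can be sketched in a few lines of Python. This is a minimal illustration, not a real CloudWatch integration: the datapoints are hard-coded in the shape `get_metric_statistics` returns, and the threshold matches the 60-second alert configured in this step.

```python
# Hedged sketch: evaluate CloudWatch-style ReplicaLag datapoints against the
# 60-second alert threshold configured above. A real check would fetch these
# datapoints via boto3's get_metric_statistics instead of hard-coding them.

REPLICA_LAG_THRESHOLD_SECONDS = 60  # matches the alert threshold set in step 2

def breaches(datapoints, threshold=REPLICA_LAG_THRESHOLD_SECONDS):
    """Return the datapoints whose ReplicaLag average exceeds the threshold."""
    return [dp for dp in datapoints if dp["Average"] > threshold]

# Example: three 1-minute samples; only the middle one breaches.
samples = [
    {"Timestamp": "2024-01-01T00:00:00Z", "Average": 0.8},
    {"Timestamp": "2024-01-01T00:01:00Z", "Average": 95.0},
    {"Timestamp": "2024-01-01T00:02:00Z", "Average": 1.2},
]
print(len(breaches(samples)))  # 1 breaching sample -> would trigger the SNS alarm
```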
  3. Set Up Route 53 Failover Routing
    Route 53 health checks monitor your primary region and automatically route traffic to the secondary region if primary becomes unhealthy.
    Route 53 Configuration:
    1. Click "Route 53" tab
    2. In the "Record name" field, type: www.globalshop.com
    3. In the "Routing policy" dropdown, select "Failover"
    4. Configure PRIMARY record:
        Record type: "Primary"
        Value: ALB DNS in us-east-1 (e.g., primary-alb-123.us-east-1.elb.amazonaws.com)
        Health check ID: "primary-health-check"
    5. Configure SECONDARY record:
        Record type: "Secondary"
        Value: ALB DNS in us-west-2
        Failover: "Evaluate target health"
    6. Click "Create records"
    Important: Health checks run every 30 seconds from multiple global locations. Failover occurs within 60-120 seconds after primary becomes unhealthy.
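The failover routing policy above boils down to a simple decision, sketched below. The secondary ALB name is hypothetical (the lab only specifies "ALB DNS in us-west-2"); the primary name is the example given in this step.

```python
# Illustrative sketch of the decision Route 53's failover policy makes: answer
# with the PRIMARY record while its health check passes, otherwise the
# SECONDARY. The secondary ALB DNS name below is a made-up placeholder.

RECORDS = {
    "PRIMARY": "primary-alb-123.us-east-1.elb.amazonaws.com",
    "SECONDARY": "secondary-alb-456.us-west-2.elb.amazonaws.com",  # hypothetical
}

def resolve(primary_healthy: bool) -> str:
    """Pick the record set Route 53 would answer with."""
    return RECORDS["PRIMARY"] if primary_healthy else RECORDS["SECONDARY"]

print(resolve(True))   # healthy: traffic stays in us-east-1
print(resolve(False))  # unhealthy: traffic fails over to us-west-2
```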
  4. Configure S3 Cross-Region Replication
    Replicate static assets, user uploads, and backups to the secondary region. CRR provides near real-time replication backed by S3's 99.999999999% (11 nines) durability.
    S3 Replication Setup:
    1. Click "S3 Replication" tab
    2. In the "Source bucket" dropdown, select "globalshop-prod-assets-us-east-1"
    3. For "Destination region", select "us-west-2"
    4. In the "Destination bucket" field, type: globalshop-prod-assets-us-west-2
    5. Replication options:
       ✓ "Replicate objects encrypted with AWS KMS"
       ✓ "Replicate delete markers"
       ✓ "Replication metrics & notifications"
    6. For "Replication time control (RTC)", check "Enable"
        This guarantees 99.99% of objects replicated within 15 minutes
    7. Click "Enable Replication"
    Cost Optimization: Use S3 Intelligent-Tiering for infrequently accessed objects. CRR costs ~$0.02/GB transferred.
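For teams scripting this instead of clicking through the console, the same replication rule can be expressed as the payload boto3's `put_bucket_replication` expects. The IAM role ARN and account ID below are placeholders, and the API call itself is left commented out.

```python
# The CRR rule from step 4 in the shape boto3's put_bucket_replication expects.
# The role ARN / account ID are hypothetical placeholders.

replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-crr-role",  # hypothetical role
    "Rules": [{
        "ID": "globalshop-crr",
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},  # empty filter -> replicate all objects
        "DeleteMarkerReplication": {"Status": "Enabled"},
        "Destination": {
            "Bucket": "arn:aws:s3:::globalshop-prod-assets-us-west-2",
            # RTC: 99.99% of objects replicated within 15 minutes
            "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
            "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
        },
    }],
}

# To apply it for real (requires versioning enabled on both buckets):
# import boto3
# boto3.client("s3").put_bucket_replication(
#     Bucket="globalshop-prod-assets-us-east-1",
#     ReplicationConfiguration=replication_config,
# )
print(replication_config["Rules"][0]["Destination"]["ReplicationTime"]["Time"]["Minutes"])
```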
  5. Set Up CloudWatch Cross-Region Monitoring
    Centralized monitoring dashboard shows health of both regions. Alarms trigger automatic failover and notify ops team.
    CloudWatch Configuration:
    1. Click "CloudWatch" tab
    2. Click "Create Dashboard" and name it: GlobalShop-DR-Dashboard
    3. Add widgets:
        RDS: CPUUtilization, DatabaseConnections, ReplicaLag
        ALB: TargetResponseTime, HealthyHostCount, HTTPCode_Target_5XX_Count
        Route 53: HealthCheckStatus
        S3: ReplicationLatency, BytesPendingReplication
    4. Create alarms:
       Critical Alarms:
        RDS ReplicaLag > 60 seconds → notify SNS topic "DR-Ops-Team"
        Route 53 health check fails → auto-trigger the failover Lambda
        ALB HealthyHostCount < 2 → page the on-call engineer
    5. Click "Save Dashboard"
    Best Practice: Use CloudWatch Anomaly Detection to automatically detect unusual patterns like sudden traffic spikes or replication lag increases.
  6. Create Lambda Failover Automation
    Automated failover Lambda function executes when primary region fails. It promotes RDS replica, updates DNS, scales secondary region, and notifies team.
    Automation Steps:
    1. In AWS Console, navigate to Lambda (use search bar if needed)
    2. Click "Create function"
    3. Function name: GlobalShop-DR-Failover
    4. Runtime: Python 3.11
    5. Add trigger: CloudWatch Event (alarm state change)
    6. Function logic (pseudo-code you'll implement):
       def lambda_handler(event, context):
          # 1. Promote RDS read replica to primary
          # 2. Update Route 53 to point to us-west-2
          # 3. Scale Auto Scaling group in us-west-2 to 100%
          # 4. Send notification to Slack/PagerDuty
          # 5. Log failover event to DynamoDB
    7. Set timeout to 5 minutes
    8. Attach IAM role with RDS, Route53, EC2, SNS permissions
    9. Click "Deploy"
    Critical: Test failover monthly during maintenance windows. Document RTO (target: <15 min) and RPO (target: <1 min) for each drill.
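A hedged Python sketch of the failover function outlined in step 6: the pure `plan_failover` helper builds the ordered action list from a CloudWatch alarm event, while the actual AWS API calls are only indicated in comments. The Auto Scaling group and DynamoDB table names are assumptions for illustration.

```python
# Sketch of the failover Lambda from step 6. plan_failover() is pure and
# testable; the boto3 calls that would execute each action are commented out.
# The ASG and DynamoDB table names are hypothetical.

def plan_failover(event):
    """Build the ordered failover actions for an alarm entering ALARM state."""
    if event.get("detail", {}).get("state", {}).get("value") != "ALARM":
        return []  # only act when the alarm transitions into ALARM
    return [
        ("promote_replica", "globalshop-prod-replica"),
        ("update_dns", "www.globalshop.com"),
        ("scale_asg", "globalshop-asg-us-west-2"),   # hypothetical ASG name
        ("notify", "DR-Ops-Team"),
        ("log_event", "dr-failover-log"),            # hypothetical DynamoDB table
    ]

def lambda_handler(event, context):
    actions = plan_failover(event)
    for action, target in actions:
        pass
        # e.g. boto3.client("rds").promote_read_replica(DBInstanceIdentifier=target)
        #      boto3.client("route53").change_resource_record_sets(...)
        #      boto3.client("autoscaling").update_auto_scaling_group(...)
    return {"actions": [a for a, _ in actions]}

result = lambda_handler({"detail": {"state": {"value": "ALARM"}}}, None)
print(result["actions"][0])  # promote_replica runs first
```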
  7. Configure DR Drill and Validate
    Simulate complete regional failure, measure actual RTO/RPO, verify data consistency, and document lessons learned.
    DR Drill Procedure:
    1. Schedule drill during low-traffic period (notify stakeholders 48hrs ahead)
    2. Simulate primary region failure:
        Manually mark Route 53 health check as "unhealthy"
        Or: Shut down primary ALB target group
    3. Monitor automatic failover:
        Watch CloudWatch logs for Lambda execution
        Verify RDS replica promotion (5-10 minutes)
        Confirm DNS propagation (1-2 minutes)
        Check Auto Scaling scaling activity
    4. Validate application functionality:
        Test user login, checkout, database writes
        Verify static assets loading from S3
        Check recent transactions (data loss?)
    5. Measure metrics:
        RTO: Time from failure to full restoration
        RPO: Amount of data lost (in seconds)
        Document any issues or gaps
    6. Failback to primary region:
        Reverse replication direction
        Restore Route 53 to primary
        Scale down secondary region
    Post-Drill: Hold retrospective meeting within 24 hours. Update runbooks based on findings. Typical first-drill issues: forgotten IAM permissions, DNS TTL too high, monitoring gaps.
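The RTO/RPO metrics measured during the drill are simple timestamp arithmetic, sketched below against the lab's targets (RTO < 15 min, RPO < 1 min). The timestamps are made-up drill data for illustration.

```python
# Compute drill metrics from three timestamps: when the failure started, when
# service was restored, and the last successfully replicated write.
from datetime import datetime

def drill_metrics(failure_at, restored_at, last_replicated_at):
    rto = (restored_at - failure_at).total_seconds()        # time to restore
    rpo = (failure_at - last_replicated_at).total_seconds()  # data-loss window
    return {"rto_seconds": rto, "rpo_seconds": rpo,
            "rto_met": rto < 15 * 60, "rpo_met": rpo < 60}

# Hypothetical drill: restored after 11.5 minutes, 20 seconds of writes lost.
m = drill_metrics(
    failure_at=datetime(2024, 5, 1, 2, 0, 0),
    restored_at=datetime(2024, 5, 1, 2, 11, 30),
    last_replicated_at=datetime(2024, 5, 1, 1, 59, 40),
)
print(m)
```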
  8. Set Up Secondary Region (us-west-2)
    Mirror the primary region architecture in us-west-2 as a standby environment. This region will be in "warm standby" mode with reduced capacity that can scale up during failover.
    Your Tasks:
      • Create an identical VPC in us-west-2
      • Deploy an RDS Read Replica from the primary
      • Configure a standby ALB and Auto Scaling group
      • Set minimum capacity to 25% of primary
      • Enable cross-region VPC peering
    Cost Optimization: Use t3.medium instances in standby region to reduce costs. Scale to r5.large during failover.
  9. Configure S3 Cross-Region Replication
    Enable automatic replication of all S3 objects (user uploads, static assets, logs) from primary to secondary region. This ensures data consistency across regions.
    Your Tasks:
      • Enable versioning on the source and destination buckets
      • Create a replication rule for all objects
      • Configure Replication Time Control (RTC) for predictable timing
      • Set up replication metrics and notifications
      • Enable delete marker replication
    Best Practice: Use S3 Intelligent-Tiering to automatically optimize storage costs while maintaining replication performance.
  10. Implement Route 53 Health Checks & Failover
    Configure Route 53 to automatically detect primary region failures and route traffic to the secondary region. Health checks monitor both endpoint availability and application-level health.
    Your Tasks:
      • Create health checks for the primary ALB endpoint
      • Set up an application-level health check (HTTP /health endpoint)
      • Configure a failover routing policy
      • Set the evaluation interval to 10 seconds
      • Configure SNS notifications for health check failures
    Critical: Test your health check endpoint thoroughly! A misconfigured health check can cause unnecessary failovers.
  11. Build Automated Failover Lambda Function
    Create a Lambda function that orchestrates the failover process: promoting RDS read replica to master, scaling up secondary region capacity, updating security groups, and sending notifications.
    Your Tasks:
      • Write a Lambda function in Python to handle failover
      • Promote the RDS read replica to a standalone instance
      • Scale the Auto Scaling group to production capacity
      • Update Route 53 DNS records
      • Send notifications to the operations team (SNS, Slack, PagerDuty)
    Testing: Create a failback function too! You'll need to restore primary region as primary after issues are resolved.
  12. Set Up CloudWatch Cross-Region Monitoring
    Implement comprehensive monitoring across both regions with unified dashboards, custom metrics, and automated alerting for DR-specific metrics (replication lag, health check status, etc.).
    Your Tasks:
      • Create a CloudWatch dashboard showing both regions
      • Configure alarms for RDS replication lag (> 60 seconds)
      • Set up custom metrics for application health
      • Enable cross-region CloudWatch Logs aggregation
      • Configure automated runbooks in Systems Manager
  13. Configure DR Drill & Validate RPO/RTO
    Perform a full disaster recovery test to validate your architecture. Measure actual RPO and RTO, identify bottlenecks, and refine your runbook. Document everything!
    Your Tasks:
      • Simulate a primary region failure
      • Trigger the automated failover process
      • Measure time to restore service (RTO)
      • Validate data consistency (RPO)
      • Test failback to the primary region
      • Generate a DR drill report with metrics and lessons learned
    Compliance: Most regulations require quarterly DR drills. Schedule these proactively and keep detailed records.


Lab 26: Azure Site Recovery Implementation
Azure / Expert
Scenario: Enterprise Azure DR with Site Recovery
TechCorp Enterprise runs mission-critical applications on Azure VMs processing $100M annual revenue. Configure Azure Site Recovery (ASR) for automated VM replication from East US to West US 2, implement Recovery Plans with sequenced failover, configure Network Security Groups for DR network, and ensure RPO < 5 minutes and RTO < 30 minutes with automated runbooks.

Learning Objectives:

  • Azure Site Recovery: Configure ASR vault and replication
  • Recovery Plans: Build multi-tier application recovery sequences
  • Network Configuration: Set up DR network topology
  • Automation: Create Azure Automation runbooks for failover
  • Testing: Execute DR drills without impacting production

Detailed Step-by-Step Instructions

  1. Create Recovery Services Vault in Primary Region
    The Recovery Services vault is the central repository for all backup and replication data. You must create this in the PRIMARY region (where your VMs currently run).
    In Azure Portal (right panel):
    1. In the "Resource Group" dropdown, select "RG-Production-EastUS"
    2. In the "Vault Name" field, type: ASR-Vault-EastUS-Primary
    3. In the "Region" dropdown, select "East US"
    4. Check the box: "Enable Cross Region Restore"
    5. Click the blue "Create Vault" button at bottom
    Why Primary Region? In this lab's simulated console, the vault coordinates replication FROM this region TO the DR region - think of it as the "control center" for your DR operations. (Note that in real Azure-to-Azure Site Recovery, Microsoft requires the vault to be in a region other than the source VMs' region.)
  2. Configure Replication for Production VMs
    Enable replication for each production VM. ASR will create a replica in the target region and continuously sync changes with <5 minute RPO.
    Step-by-step clicks:
    1. Click the "Replication" tab in the left navigation
    2. Click "+ Enable Replication" button (top left)
    3. Source VMs section:
        Check boxes for: "web-vm-01", "app-vm-01", "db-vm-01"
    4. In the "Target Location" dropdown, select "West US 2"
    5. For "Target Resource Group", select "RG-DR-WestUS2" (or click "Create New")
    6. In the "Replication Policy" dropdown, select "24-hour-retention"
    7. For "Recovery Point Objective", keep the default "5 minutes"
    8. Click the "Enable Replication" button
    Initial Sync Time: First-time replication takes 2-4 hours depending on VM size. Subsequent syncs are delta-based (only changes) and happen every 5 minutes.
  3. Configure DR Network Topology
    Set up the network configuration VMs will use after failover. This includes VNets, subnets, NSGs, and load balancers in the DR region.
    Network Configuration:
    1. Click "Network Mapping" tab
    2. Click "+ Add Network Mapping"
    3. For "Source VNet", select "VNet-EastUS-Prod"
    4. For "Target VNet", select "VNet-WestUS2-DR" (if it doesn't exist, click "Create")
    5. For each VM, configure:
       • web-vm-01 → Subnet: "public-subnet", IP: Dynamic
       • app-vm-01 → Subnet: "app-subnet", IP: Dynamic
       • db-vm-01 → Subnet: "data-subnet", IP: Static (preserve IP)
    6. For the NSG, attach "NSG-DR-WestUS2"
    7. Click "Save Mapping"
    IP Strategy: Web/App tiers use dynamic IPs (can change on failover). Database uses static IP to avoid connection string changes in application code.
  4. Create Recovery Plan with Sequenced Failover
    Recovery Plans define the ORDER in which VMs failover. Critical for multi-tier apps: Database must start BEFORE app servers, app servers BEFORE web servers.
    Building Recovery Plan:
    1. Click the "Recovery Plans" tab, then click "+ Recovery Plan"
    2. In the "Name" field, type: Production-App-Failover
    3. Source: "East US" | Target: "West US 2"
    4. Select VMs: add all 3 VMs
    5. Configure Boot Order (CRITICAL!):
       Group 1 (Database Tier): db-vm-01, Priority 1
       Group 2 (App Tier): app-vm-01, Priority 2, wait 5 min after Group 1
       Group 3 (Web Tier): web-vm-01, Priority 3, wait 3 min after Group 2
    6. Add a pre-failover script: select the "Update-DNS-Records" runbook
    7. Add a post-failover script: select the "Health-Check-Validation" runbook
    8. Click "Create"
    Common Mistake: Failing over all VMs simultaneously causes app errors because web servers try to connect to database before it's ready. Always use sequenced groups!
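The sequenced boot order can be sanity-checked with a few lines of Python: each group starts after the previous group plus its configured wait. This is a simplified model (it ignores actual VM boot durations) using the waits from the recovery plan above.

```python
# Model the sequenced boot order from step 4: (VM, minutes to wait after the
# previous group). Waits match the recovery plan; boot times are ignored.

GROUPS = [
    ("db-vm-01",  0),   # Group 1: database tier, starts immediately
    ("app-vm-01", 5),   # Group 2: waits 5 min after Group 1
    ("web-vm-01", 3),   # Group 3: waits 3 min after Group 2
]

def start_times(groups):
    """Minutes after failover begins at which each group starts booting."""
    t, schedule = 0, {}
    for vm, wait_minutes in groups:
        t += wait_minutes
        schedule[vm] = t
    return schedule

print(start_times(GROUPS))  # web tier starts 8 minutes in
```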
  5. Configure Azure Automation for Failover
    Create automation runbooks that execute before/after failover to update DNS, load balancers, and validate application health.
    Automation Setup:
    1. Click "Automation" tab
    2. Select Automation Account "ASR-Automation-Account"
    3. Pre-Failover Runbook Configuration:
        Name: "Pre-Failover-Tasks"
        Action: "Update Azure Traffic Manager to DR region"
        Action: "Snapshot current database state"
        Action: "Send notification to ops team"
    4. Post-Failover Runbook Configuration:
        Name: "Post-Failover-Validation"
        Action: "Verify all VMs are running"
        Action: "Test application endpoints (HTTP 200)"
        Action: "Update monitoring dashboards"
        Action: "Generate DR drill report"
    5. Click "Save Runbooks"
  6. Execute a Test Failover (Non-Disruptive DR Drill)
    A test failover creates isolated VMs in the DR region WITHOUT affecting production. This validates that your DR plan works before a real disaster.
    Running the test failover:
    1. Navigate to Recovery Plans and select "Production-App-Failover"
    2. Click the "Test Failover" button (NOT "Failover"!)
    3. For "Recovery Point", select "Latest (lowest RPO)"
    4. For "Test Network", select "VNet-WestUS2-Test" (an isolated network)
    5. Check the box: "Create separate test VMs"
    6. Check the box: "Run post-failover scripts"
    7. Click "OK" to start the test
    8. Monitor progress: Database → App → Web servers boot in sequence
    9. After completion, click "Cleanup test failover" to delete the test VMs
    Best Practice: Run test failover MONTHLY. Azure Site Recovery allows this without impacting production - there's no excuse not to test regularly!
  7. Configure Monitoring and Alerts
    Set up comprehensive monitoring to track replication health, lag, and get alerted to any issues before they become critical.
    Monitoring Configuration:
    1. Click "Monitoring" tab
    2. Enable Metrics:
       ✓ Replication Health
       ✓ RPO Breach (alert if >5 minutes)
       ✓ Test Failover Success Rate
       ✓ Replication Data Transfer Rate
    3. Configure Alerts:
        Alert: "Critical - Replication Health Unhealthy"
         Severity: Critical → Action Group: "DR-Ops-Team"
        Alert: "Warning - RPO > 10 minutes"
         Severity: Warning → notify via Email + SMS
    4. Enable Azure Monitor integration
    5. Create dashboard with all DR metrics
    6. Click "Save Monitoring Config"
    Pro Tip: Integrate alerts with PagerDuty or OpsGenie for 24/7 on-call coverage. DR issues can't wait until business hours!


Lab 27: GCP Multi-Region Disaster Recovery
GCP / Expert
Scenario: Global GCP DR with Cloud SQL and GKE
GlobalMedia streams video content to 50M users worldwide. Design and implement a multi-region DR solution on GCP with Cloud SQL cross-region replication, GKE cluster failover, Cloud CDN configuration, and Global Load Balancing. Achieve RPO < 1 minute for databases and RTO < 20 minutes for the full stack, minimizing data loss during regional failures.

Learning Objectives:

  • Cloud SQL HA: Configure cross-region replicas with automated promotion
  • GKE Multi-Region: Deploy GKE clusters in multiple regions
  • Global Load Balancing: Configure GCLB with health checks
  • Cloud Storage Replication: Set up multi-region buckets
  • Disaster Recovery Testing: Execute automated DR drills

Detailed GCP DR Instructions

  1. Configure Cloud SQL with High Availability
    Cloud SQL HA configuration ensures your database survives zonal failures and can quickly promote cross-region replicas during regional disasters.
    Configuration Steps:
    1. In GCP Console (right panel), click "Cloud SQL" tab
    2. Click "Create Instance" button
    3. For "Database Type", select "PostgreSQL 15"
    4. For "Instance ID", type: globalmedia-primary-us-central1
    5. For "Password", enter a strong password (save it!)
    6. For "Region", select "us-central1 (Iowa)"
    7. For "Zonal Availability", select "Multiple zones (Highly available)"
    8. For "Machine Type", select "db-n1-standard-4" (4 vCPU, 15 GB RAM)
    9. Storage: 500 GB SSD; enable automatic storage increase
    10. Backups: enable automated daily backups with 30-day retention
    11. Click "Create Instance"
    HA vs Read Replicas: HA provides automatic failover within same region (99.95% SLA). Cross-region replicas provide DR across regions. You need BOTH for comprehensive protection!
  2. Create Cross-Region Read Replica for DR
    The cross-region replica continuously replicates data from primary. During disaster, you promote it to standalone instance with one click.
    Creating Replica:
    1. Select your primary instance "globalmedia-primary-us-central1"
    2. Click "Create Read Replica" button
    3. For "Replica ID", type: globalmedia-replica-us-east1
    4. For "Region", select "us-east1 (South Carolina)" (a different region!)
    5. For "Machine Type", match the primary: "db-n1-standard-4"
    6. High Availability: enable it (the replica can also be HA!)
    7. Replication Options:
        Enable: "Automatic Failover to this replica"
        Promote on: "Primary instance failure"
        Promotion Priority: "High"
    8. Click "Create Replica"
    9. Wait 10-15 minutes for initial data sync
    Important: After promotion, replica becomes standalone instance and stops replicating. You must manually failback to restore replication!
  3. Deploy GKE Clusters in Multiple Regions
    For application layer DR, deploy identical GKE clusters in primary and secondary regions. Use multi-cluster Ingress to route traffic.
    GKE Cluster Setup:
    1. Click "Kubernetes Engine" > "Clusters"
    2. Click "Create" button

    Primary Cluster (us-central1):
    3. Name: globalmedia-primary-cluster
    4. Location type: "Zonal", Zone: "us-central1-a"
    5. Master Version: select "Stable channel" (latest)
    6. Node Pools:
        Machine type: "n1-standard-4"
        Number of nodes: 3 (minimum for HA)
        Enable autoscaling: Min 3, Max 10
    7. Networking:
        VPC: "default"
        Enable HTTP load balancing
        Enable Cloud Monitoring
    8. Click "Create"

    Secondary Cluster (us-east1):
    9. Repeat steps 2-8 with:
        Name: globalmedia-secondary-cluster
        Zone: "us-east1-b"
        Same node configuration
    Cost Optimization: Keep secondary cluster at minimum 1-2 nodes during normal operations. Scale up automatically during failover using Cluster Autoscaler.
  4. Configure Global Load Balancer
    GCP Global Load Balancer routes users to nearest healthy backend. During regional failure, it automatically redirects to secondary region within seconds.
    Load Balancer Configuration:
    1. Navigate to "Network Services" > "Load Balancing"
    2. Click "Create Load Balancer"
    3. For "Type", select "HTTP(S) Load Balancing"
    4. Internet facing: Yes
    5. Backend Configuration:
       Backend Service 1 (Primary):
        Name: "backend-us-central1"
        Backend type: "Instance group" (GKE nodes)
        Instance group: "gke-primary-cluster-default-pool"
        Port: 80, Protocol: HTTP
        Health Check: "/healthz" every 10 seconds
        Timeout: 5 seconds, Unhealthy threshold: 2
       Backend Service 2 (Secondary):
        Name: "backend-us-east1"
        Same configuration for secondary cluster
    6. Routing Rules:
        Traffic Split: 100% to primary (normal)
        Failover: Automatic on health check failure
    7. Frontend Configuration:
        Protocol: HTTPS
        IP: Reserve static IP "globalmedia-lb-ip"
        Certificate: Upload SSL cert or use Google-managed
    8. Click "Create"
    Health Checks: Configure aggressive health checks (10s interval, 2 failures = unhealthy) to detect issues quickly and failover faster.
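The "aggressive health checks" advice is simple arithmetic: worst-case detection time is roughly the check interval times the unhealthy threshold, plus one timeout. A tiny sketch using the values configured in this step:

```python
# Back-of-the-envelope detection time for the health check in step 4.
# This is a rough model, not an exact reproduction of GCLB's internals.

CHECK_INTERVAL_S = 10    # "/healthz" every 10 seconds
UNHEALTHY_THRESHOLD = 2  # 2 consecutive failures mark the backend unhealthy
TIMEOUT_S = 5            # per-probe timeout

def worst_case_detection_seconds(interval, threshold, timeout):
    """Approximate worst-case time to mark a backend unhealthy."""
    return interval * threshold + timeout

print(worst_case_detection_seconds(CHECK_INTERVAL_S, UNHEALTHY_THRESHOLD, TIMEOUT_S))  # 25
```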
  5. Set Up Multi-Region Cloud Storage
    Store user uploads, videos, and static assets in multi-region buckets for automatic geo-redundancy without manual replication.
    Storage Configuration:
    1. Navigate to "Cloud Storage" > "Buckets"
    2. Click "Create Bucket"
    3. For "Name", type: globalmedia-videos-prod
    4. For "Location type", select "Multi-region"
    5. For "Multi-region", select "US" (covers central and east)
    6. Storage class: "Standard"
    7. Access control: "Uniform"
    8. Protection tools:
       ✓ Enable versioning (for accidental deletes)
       ✓ Enable object lifecycle management
    9. Encryption: Google-managed encryption key
    10. Click "Create"
    Multi-Region Benefits: Data is automatically stored in at least 2 geographically separated regions. If one region fails, data remains accessible from other region with same URL!
  6. Deploy Application with Automated Failover
    Deploy your containerized application to both GKE clusters with identical configuration. Use ConfigMaps to point to local database replica.
    Application Deployment:
    1. Click "Workloads" tab
    2. Select primary cluster
    3. Click "Deploy"
    4. Container Image: gcr.io/globalmedia/app:v2.1
    5. Environment Variables:
        DB_HOST: "globalmedia-primary-us-central1.db"
        DB_REPLICA_HOST: "globalmedia-replica-us-east1.db"
        STORAGE_BUCKET: "globalmedia-videos-prod"
    6. Replicas: 3 (for HA)
    7. Repeat for secondary cluster (change DB_HOST to replica)
    8. Configure liveness probe: GET /health every 10s
    9. Configure readiness probe: GET /ready every 5s
    10. Click "Deploy"
  7. Configure DR Drill and Validate RTO/RPO
    Test the entire DR process: simulate primary region failure, measure failover time, validate data consistency, and document results.
    DR Drill Procedure:
    1. Navigate to "Monitoring" > "DR Testing"
    2. Click "New DR Drill"
    3. For "Scope", select "Full Stack Failover"
    4. Test Actions:
        Simulate: "us-central1 region outage"
        Promote: Cloud SQL replica to primary
        Redirect: Global LB to us-east1
        Scale: GKE secondary cluster to full capacity
    5. Success Criteria:
        All health checks pass
        Zero data loss (RPO check)
        Complete within 20 minutes (RTO check)
        Application fully functional
    6. Click "Start DR Drill"
    7. Monitor progress and record metrics
    8. After validation, click "Failback to Primary"
    Critical: Always perform DR drills during low-traffic periods (e.g., 2-4 AM local time) and notify all stakeholders beforehand!
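The success criteria listed above can be captured in a small validator, sketched below. The result-dict shape is an assumption for illustration; a real drill harness would populate it from monitoring data.

```python
# Validator for the drill success criteria in step 7: all health checks pass,
# zero data loss (RPO), and restoration within the 20-minute RTO target.

def validate_drill(result):
    checks = {
        "health_checks_pass": all(result["health_checks"].values()),
        "rpo_met": result["data_loss_seconds"] == 0,   # zero data loss
        "rto_met": result["rto_minutes"] <= 20,        # within RTO target
    }
    checks["drill_passed"] = all(checks.values())
    return checks

# Hypothetical drill outcome: everything green, restored in 17 minutes.
outcome = validate_drill({
    "health_checks": {"backend-us-east1": True, "cloud-sql-replica": True},
    "data_loss_seconds": 0,
    "rto_minutes": 17,
})
print(outcome["drill_passed"])  # True
```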
