Master cutting-edge cloud technologies with hands-on labs covering chaos engineering, serverless architectures, FinOps, and cloud-native observability at scale.
Explore advanced cloud concepts and emerging technologies with expert-level hands-on scenarios.
Explanation: Install the Litmus Chaos framework, then configure your first chaos experiment using the GUI panel.
Part A - Terminal Installation:
1. Install Litmus Chaos operator in your Kubernetes cluster
2. Verify installation by checking pod status
💡 Tip: Make sure you have cluster-admin permissions before installing
Command to run: kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator.yaml
Part B - GUI Configuration:
3. In the Experiment Configuration Panel below, configure the following:
Experiment Name: network-latency-test
Target Type: Select "Kubernetes Pods"
Chaos Type: Select "Network Latency"
Blast Radius: Enter 10 (affects 10% of pods)
Duration: Enter 5 minutes
Schedule: Select "Run Once"
Required Checkboxes (all must be checked):
✓ Check "Auto-rollback on SLO violation"
✓ Check "Send Slack notifications"
✓ Check "Generate detailed report"
Click the copiable values above to copy them, then paste into the GUI fields
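The GUI settings above correspond to a Litmus ChaosEngine resource. Below is a sketch of what that manifest might look like; the field names follow the litmuschaos.io/v1alpha1 schema, but the target namespace and app label are placeholders for your own workload, and the latency value is illustrative.

```yaml
# Sketch of a ChaosEngine mirroring the GUI configuration above.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: network-latency-test
  namespace: default               # placeholder namespace
spec:
  appinfo:
    appns: default
    applabel: "app=my-service"     # placeholder target selector
    appkind: deployment
  engineState: active
  experiments:
    - name: pod-network-latency
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "300"         # 5 minutes, in seconds
            - name: PODS_AFFECTED_PERC
              value: "10"          # 10% blast radius
            - name: NETWORK_LATENCY
              value: "2000"        # injected latency in ms (illustrative)
```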
Explanation: Define Service Level Objectives to establish baseline metrics, then execute your configured experiment.
Part A - Terminal (Define SLOs):
1. Set baseline availability, latency, and error rate thresholds
💡 Tip: Document your steady state criteria - this becomes your hypothesis for chaos experiments
Command to run: chaos define-slo --availability 99.9 --latency-p99 100ms --error-rate 0.1
Part B - GUI (Run Experiment):
2. Click the "Create Experiment" button in the Experiment Configuration Panel above
3. Verify that your configuration matches instructions from Step 1
💡 Tip: The system will validate all fields before starting the experiment
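The SLO thresholds defined in Part A amount to a steady-state check: if all three hold during chaos, your hypothesis survives. A minimal sketch of that check, where the `metrics` dict stands in for whatever your monitoring stack actually reports:

```python
# Minimal steady-state check against the SLOs from Part A.
SLOS = {
    "availability_pct": 99.9,   # minimum acceptable availability
    "latency_p99_ms": 100,      # maximum acceptable p99 latency
    "error_rate_pct": 0.1,      # maximum acceptable error rate
}

def check_slos(metrics: dict) -> list[str]:
    """Return the list of violated SLOs (empty means steady state holds)."""
    violations = []
    if metrics["availability_pct"] < SLOS["availability_pct"]:
        violations.append("availability")
    if metrics["latency_p99_ms"] > SLOS["latency_p99_ms"]:
        violations.append("latency_p99")
    if metrics["error_rate_pct"] > SLOS["error_rate_pct"]:
        violations.append("error_rate")
    return violations

# A healthy system keeps this list empty throughout the experiment:
print(check_slos({"availability_pct": 99.95, "latency_p99_ms": 80, "error_rate_pct": 0.05}))
```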
Explanation: Monitor the chaos experiment in real-time and observe system behavior.
Part A - Terminal (Monitor):
1. Watch the experiment progress and track injected failures
2. Observe how your services respond to network latency
💡 Tip: Keep monitoring dashboards open (Grafana/Datadog) to visualize impact
Command to run: chaos experiment monitor --watch --experiment-id network-latency-test
Part B - GUI (View Live Metrics):
3. Observe the System Health Metrics panel updating in real-time
4. Watch for SLO violations (availability drop, latency increase)
💡 Tip: Healthy systems should maintain SLOs even with 10% of pods experiencing latency
Explanation: After the experiment completes, generate a comprehensive report and identify improvements.
Part A - Terminal (Generate Report):
1. Export detailed experiment data in JSON format
2. Include all metrics, failures, and recovery times
💡 Tip: Save reports for trend analysis across multiple game days
Command to run: chaos report generate --experiment-id latest --format json --output chaos-report.json
Part B - GUI (Review Results):
3. Review the final metrics in the System Health Metrics panel
4. Identify services that violated SLOs or exhibited unexpected behavior
5. Document findings for system improvements
💡 Success Criteria: You should identify at least one area for resilience improvement
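Once the JSON report is exported, the review in Part B can be partly automated. The sketch below flags services that violated an SLO or recovered slowly; the report schema (a `services` list with `slo_violations` and `recovery_seconds` fields) is hypothetical, so adapt the keys to whatever your chaos tool actually emits.

```python
import json

def find_improvement_areas(report: dict) -> list[str]:
    """Flag services that violated an SLO or took over 60 s to recover."""
    flagged = []
    for svc in report["services"]:
        if svc["slo_violations"] > 0 or svc["recovery_seconds"] > 60:
            flagged.append(svc["name"])
    return flagged

# Illustrative report contents (schema is hypothetical):
report = json.loads("""
{"services": [
  {"name": "checkout", "slo_violations": 2, "recovery_seconds": 45},
  {"name": "catalog",  "slo_violations": 0, "recovery_seconds": 12}
]}
""")
print(find_improvement_areas(report))
```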
Once you've completed all terminal commands AND configured the GUI:
💡 Success: The dashboards show real-time changes as you configure chaos experiments!
* All checkboxes are required for lab completion
Chaos engineering implemented successfully!
Explanation: Initialize SAM project and configure your Lambda function using the GUI.
Part A - Terminal (Initialize Project):
1. Create new SAM project with Python 3.9 runtime
💡 Tip: SAM simplifies Lambda deployment by managing CloudFormation templates
Command to run:
sam init --runtime python3.9 --name data-processor
Part B - GUI (Configure Function):
2. In the Function Configuration Panel below, configure:
Function Name: data-processor
Runtime: Select "Python 3.9"
Memory: Select "512 MB"
Timeout: Enter 30 seconds
Execution Role: Select "lambda-execution-role"
Architecture: Select "x86_64"
Click copiable values to copy, then paste into GUI fields
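For reference, the GUI settings above map onto the `template.yaml` that `sam init` generates, roughly as sketched below. The handler path and IAM role ARN are placeholders; your generated template will differ in layout.

```yaml
# Sketch of how the Function Configuration settings map to a SAM template.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  DataProcessor:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: data-processor
      Handler: app.lambda_handler          # placeholder handler path
      Runtime: python3.9
      MemorySize: 512
      Timeout: 30
      Architectures: [x86_64]
      Role: arn:aws:iam::123456789012:role/lambda-execution-role  # placeholder account ID
```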
Explanation: Build your Lambda functions with shared dependencies and orchestration logic.
Steps:
1. Create Lambda layer for shared utilities (boto3, requests libraries)
2. Implement Step Functions state machine for workflow orchestration
3. Configure Dead Letter Queue (DLQ) for failed invocations
4. Set up error handling with exponential backoff
💡 Tip: Lambda layers reduce deployment package size and enable code reuse across functions
Command to run:
sam build
This builds all Lambda functions and prepares them for deployment
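The retry-with-exponential-backoff pattern from step 4 can be sketched as below. In a real deployment you would let the failing event fall through to the DLQ (step 3) once retries are exhausted; `fn` here stands in for any transient-failure-prone call.

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.5):
    """Retry fn(), doubling the delay (plus jitter) after each failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted: let the invocation fail over to the DLQ
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter term spreads out retries from concurrent invocations so they don't hammer a recovering dependency in lockstep.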
Explanation: Connect your Lambda functions to various AWS event sources for event-driven processing.
Steps:
1. Set up EventBridge rule to trigger Lambda on custom events
2. Configure Kinesis stream as trigger for real-time data processing (batch size: 100)
3. Implement SQS queue for asynchronous message processing
4. Add S3 event notification trigger for file processing
💡 Tip: Use SQS for workloads that can tolerate latency; use Kinesis for real-time streaming
Command to run:
sam deploy --guided
Follow the prompts to configure your deployment settings
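A single function can distinguish the four event sources wired up above by inspecting the event shape. The dispatch below follows the documented AWS event structures (Kinesis and S3 records carry `kinesis`/`s3` keys, SQS records carry `eventSource: aws:sqs`, EventBridge events have no `Records`); the per-source processing is left as a placeholder.

```python
def lambda_handler(event, context):
    """Identify which configured event source delivered this invocation."""
    records = event.get("Records", [])
    if records and "kinesis" in records[0]:
        source = "kinesis"       # real-time stream batch (up to 100 records)
    elif records and records[0].get("eventSource") == "aws:sqs":
        source = "sqs"           # asynchronous queue message
    elif records and "s3" in records[0]:
        source = "s3"            # file-created notification
    else:
        source = "eventbridge"   # custom event delivered directly
    return {"source": source, "count": max(len(records), 1)}
```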
Explanation: Reduce cold starts and improve response times through performance optimizations.
Steps:
1. Configure provisioned concurrency (100 concurrent executions) for critical functions
2. Implement connection pooling for database connections (reuse across invocations)
3. Optimize package size by removing unused dependencies
4. Use Lambda Powertools for structured logging and tracing
💡 Tip: Provisioned concurrency keeps functions warm but increases cost - use for latency-critical APIs only
Command to run:
aws lambda put-provisioned-concurrency-config --function-name data-processor --qualifier <alias-or-version> --provisioned-concurrent-executions 100
This keeps 100 execution environments warm for your main function; note that provisioned concurrency applies to a published version or alias, hence the required --qualifier flag
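Connection pooling (step 2) relies on the fact that anything created at module scope survives between warm invocations of the same execution environment. A sketch, where `make_connection` stands in for an expensive client setup such as a database driver or boto3 client:

```python
_connection = None  # module scope: initialized once per execution environment

def make_connection():
    """Placeholder for an expensive client handshake (DB driver, boto3, ...)."""
    return object()

def get_connection():
    global _connection
    if _connection is None:      # only pay the setup cost on cold start
        _connection = make_connection()
    return _connection

def lambda_handler(event, context):
    conn = get_connection()      # warm invocations reuse the same handle
    return {"reused": conn is get_connection()}
```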
Explanation: Enable comprehensive monitoring and tracing to understand function behavior.
Steps:
1. Enable AWS X-Ray active tracing for all Lambda functions
2. Configure CloudWatch Insights queries for log analysis
3. Set up custom CloudWatch metrics (invocation count, duration, errors)
4. Create CloudWatch dashboards with key performance indicators
5. Configure SNS alerts for error rates > 1%
💡 Tip: X-Ray shows end-to-end request flow across your distributed serverless architecture
Command to run:
sam logs --name data-processor --tail
Monitor function logs in real-time
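Lambda Powertools provides structured logging out of the box; the stdlib sketch below shows the underlying idea: emit one JSON object per log line so CloudWatch Logs Insights can filter on fields such as `level` and `request_id`. The field names here are illustrative, not the Powertools schema.

```python
import json
import logging
import time

logger = logging.getLogger("data-processor")
logger.setLevel(logging.INFO)

def log_event(level: str, message: str, **fields) -> str:
    """Emit (and return) a single JSON-formatted log line."""
    record = {"level": level, "message": message, "timestamp": time.time(), **fields}
    line = json.dumps(record)
    logger.info(line)
    return line

log_event("INFO", "record processed", request_id="abc-123", duration_ms=42)
```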
Explanation: Create active-active multi-region deployment for high availability and low latency.
Steps:
1. Deploy your SAM application to us-east-1 and eu-west-1 regions
2. Configure Route 53 with geoproximity routing policy
3. Set up DynamoDB Global Tables for cross-region data replication
4. Implement S3 Cross-Region Replication for static assets
5. Test failover by simulating regional outage
💡 Tip: Multi-region adds cost but provides disaster recovery and better user experience globally
Commands to run:
sam deploy --region us-east-1
sam deploy --region eu-west-1
Deploy to multiple regions for global availability
Explanation: Configure the Function Configuration panel with required Lambda settings.
GUI Configuration (Required):
Function Name: data-processor
Runtime: Select "Python 3.9"
Memory: Select "512 MB"
Timeout: Enter 30 seconds
Execution Role: Select "lambda-execution-role"
Architecture: Select "x86_64"
Advanced Settings (Expand & Configure):
✓ Check "Enable X-Ray Tracing"
Reserved Concurrency: Enter 100
Provisioned Concurrency: Enter 10
Dead Letter Queue: Select "SQS Queue"
Click "Deploy Function" to deploy your Lambda configuration.
Once you've completed all terminal commands AND configured the GUI:
💡 Success: The dashboards show real-time Lambda metrics as you configure and deploy functions!
* Required fields for lab completion
Enterprise serverless architecture deployed!
Explanation: Establish tagging standards to track cloud costs by department, project, and environment.
Steps:
1. Create tagging policy requiring: Environment, Project, Owner, CostCenter tags
2. Deploy AWS Organizations SCPs to enforce tagging on resource creation
3. Configure cost allocation tags in AWS Billing console
4. Run tag compliance scan and remediate non-compliant resources
💡 Tip: Without proper tagging, you can't accurately allocate costs to teams - this is the foundation of FinOps
In the FinOps Terminal below, type: enforce-tagging-policy
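The compliance scan in step 4 boils down to set arithmetic over each resource's tags. A sketch, where the inventory would come from your cloud provider's API rather than a literal list:

```python
REQUIRED_TAGS = {"Environment", "Project", "Owner", "CostCenter"}

def missing_tags(resource: dict) -> set:
    """Return the mandatory tags this resource is missing."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))

def compliance_report(resources: list) -> dict:
    """Map each non-compliant resource ID to its missing tags."""
    return {r["id"]: missing_tags(r) for r in resources if missing_tags(r)}

# Illustrative inventory: i-001 is compliant, i-002 is not.
resources = [
    {"id": "i-001", "tags": {"Environment": "prod", "Project": "web",
                             "Owner": "alice", "CostCenter": "eng"}},
    {"id": "i-002", "tags": {"Environment": "dev"}},
]
print(compliance_report(resources))
```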
Explanation: Set up native and third-party tools to visualize and analyze cloud spending.
Steps:
1. Enable AWS Cost Explorer with granular data (hourly breakdowns)
2. Configure Azure Cost Management + Billing with custom views
3. Integrate CloudHealth or CloudCheckr for multi-cloud visibility
4. Set up Cost & Usage Reports (CUR) to S3 with Athena queries
💡 Tip: The Cost Explorer API charges $0.01 per request - use saved reports instead of ad-hoc queries
In the FinOps Terminal, type: deploy-cost-tools
Explanation: Automatically identify and resize underutilized resources to reduce waste.
Steps:
1. Enable AWS Compute Optimizer (analyzes CloudWatch metrics)
2. Create Lambda function to schedule start/stop of dev/test instances
3. Implement auto-scaling policies with CPU/Memory targets (70% utilization)
4. Set up weekly rightsizing review with stakeholders
💡 Tip: Rightsizing typically saves 20-30% of compute costs with minimal effort
In the FinOps Terminal, type: enable-rightsizing
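Against the 70% utilization target from step 3, a simple rightsizing pass flags instances whose average CPU sits well below target as candidates for a smaller size. The cutoff below (half the target) is illustrative; real tools like Compute Optimizer weigh memory, network, and burst patterns too.

```python
TARGET_UTILIZATION = 70.0   # percent, from the auto-scaling target

def rightsizing_candidates(instances: list, slack: float = 0.5) -> list:
    """Instances running below half the target utilization are likely oversized."""
    cutoff = TARGET_UTILIZATION * slack   # 35% with the defaults
    return [i["id"] for i in instances if i["avg_cpu_pct"] < cutoff]

# Illustrative fleet metrics (avg CPU over the review window):
fleet = [
    {"id": "i-web-1", "avg_cpu_pct": 12.0},   # downsizing candidate
    {"id": "i-db-1",  "avg_cpu_pct": 64.0},   # near target: leave alone
]
print(rightsizing_candidates(fleet))
```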
Explanation: Commit to 1-year or 3-year terms to save up to 72% on steady-state workloads.
Steps:
1. Analyze RI recommendations in Cost Explorer (look for 60%+ utilization)
2. Purchase Compute Savings Plans for flexibility across instance families
3. Buy specific RIs for predictable workloads (databases, always-on services)
4. Implement RI/SP tracking dashboard to monitor coverage and utilization
💡 Tip: Start with 1-year terms; move to 3-year only for very stable workloads
In the FinOps Terminal, type: purchase-savings-plans
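The step-4 dashboard tracks two ratios: coverage (what share of eligible spend is under a commitment) and utilization (what share of the purchased commitment is actually consumed). The arithmetic is simple enough to sketch directly; the dollar figures are illustrative.

```python
def coverage_pct(committed_spend: float, total_spend: float) -> float:
    """Share of eligible spend covered by RIs/Savings Plans."""
    return 100.0 * committed_spend / total_spend

def utilization_pct(used_commitment: float, purchased_commitment: float) -> float:
    """Share of the purchased commitment actually being used."""
    return 100.0 * used_commitment / purchased_commitment

# e.g. $6k of a $10k bill runs under a Savings Plan (60% coverage),
# and $5.4k of that $6k commitment was consumed (90% utilization):
print(coverage_pct(6000, 10000), utilization_pct(5400, 6000))
```

Low utilization means you over-committed; low coverage with high steady-state usage means you're leaving the discount on the table.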
Explanation: Catch unexpected cost spikes before they become budget disasters.
Steps:
1. Configure AWS Cost Anomaly Detection with ML-based alerting
2. Set alert thresholds ($500 for services, $5000 for total spend)
3. Create SNS topic to notify FinOps team via email/Slack
4. Build automated response: Lambda to snapshot resources on anomaly
💡 Tip: Most runaway costs are from forgotten resources or misconfigurations
In the FinOps Terminal, type: setup-anomaly-detection
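Conceptually, step 1's anomaly detection compares today's spend against a recent baseline and alerts on large deviations. The 3-sigma rule below is an illustrative stand-in, not AWS's actual ML model:

```python
import statistics

def is_anomaly(history: list, today: float, sigmas: float = 3.0) -> bool:
    """Flag spend that sits more than `sigmas` std devs above the baseline mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return today > mean + sigmas * stdev

# Seven days of normal daily spend (illustrative figures):
baseline = [510.0, 495.0, 505.0, 500.0, 490.0, 500.0, 498.0]
print(is_anomaly(baseline, 505.0))   # ordinary day
print(is_anomaly(baseline, 900.0))   # forgotten-resource spike
```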
Explanation: Build dashboards that communicate cost trends to leadership and enable chargeback.
Steps:
1. Create QuickSight dashboards showing: month-over-month costs, forecast, savings opportunities
2. Implement chargeback reports per cost center/project
3. Configure automated monthly reports emailed to stakeholders
4. Build "showback" views for teams (visibility without billing)
💡 Tip: Use simple, executive-friendly visuals - show trends and actions, not raw data
In the FinOps Terminal, type: create-dashboards
Explanation: Configure the Cost Optimization Settings panel to apply FinOps policies.
GUI Configuration (Required):
Cost Center: Select "Engineering"
Cloud Provider: Select "AWS"
Environment: Select "Production"
Budget Alert Threshold: Enter 80 (80%)
Rightsizing Aggressiveness: Select "Moderate"
RI/SP Purchase Strategy: Select "Auto-purchase (1-year)"
Advanced Settings (Expand & Configure):
✓ Check "Auto-shutdown idle resources"
✓ Check "Schedule dev/test instances"
Click "Save Configuration" to apply your FinOps settings.
Once you've completed all terminal commands AND configured the GUI:
💡 Success: The dashboards show real-time cost savings as you implement FinOps practices!
FinOps practices successfully implemented!