AI & Machine Learning Labs

Optimize token usage, configure intelligent model routing, and evaluate AI system performance through hands-on labs.

GenAI Production Labs - Module 4

Master cost optimization, routing strategies, and evaluation frameworks.

Lab 10: Token Usage Optimizer
Cost / Production
Scenario: API Cost Reduction
CloudAI Corp is spending $50,000/month on LLM API calls. Your task is to optimize prompts and implement token reduction strategies to cut costs by at least 40% while maintaining output quality.

Learning Objectives:

  • Token Counting: Understand how prompts translate to tokens
  • Prompt Compression: Reduce tokens without losing meaning
  • Caching Strategies: Identify cacheable content
  • Cost Calculation: Estimate API costs accurately
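
The token and cost arithmetic behind this lab can be sketched in a few lines of Python. The ~1.3 tokens-per-word ratio and the helper names are illustrative assumptions; real measurements should use the provider's tokenizer (e.g. tiktoken for OpenAI models):

```python
# Rough token-count and cost estimation for a fixed prompt.
# NOTE: ~1.3 tokens per word is a common English rule of thumb,
# not an exact figure; use the provider's tokenizer in production.

def estimate_tokens(text: str) -> int:
    """Approximate token count from word count."""
    return round(len(text.split()) * 1.3)

def monthly_cost(prompt: str, calls_per_month: int,
                 price_per_million_input: float) -> float:
    """Estimated monthly input-token spend for one prompt template."""
    tokens = estimate_tokens(prompt)
    return tokens * calls_per_month * price_per_million_input / 1_000_000

# Example: 10K calls/month at $10/1M input tokens.
prompt = "You are a highly skilled and experienced customer service representative."
print(estimate_tokens(prompt), round(monthly_cost(prompt, 10_000, 10.0), 2))
```

At this rate the input-token bill scales linearly with prompt length, which is why halving the prompt roughly halves the cost.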

Prompt Optimization Studio

📋 Task: Optimize the Prompt
Write an optimized version of the prompt that conveys the same meaning with fewer tokens. Enable optimization techniques and see the real-time cost impact.
Live counters (all start at 0): Original Tokens | Optimized Tokens | Reduction % | Monthly Savings
You are a highly skilled and experienced customer service representative for our technology company. Your primary responsibility is to assist customers with their inquiries and problems in a friendly, professional, and helpful manner. Please analyze the following customer message and provide a comprehensive, detailed response that addresses all of their concerns while maintaining a positive and empathetic tone throughout your response.
Required Optimization Checks (Must Pass All 4)
① Remove Filler Words: remove at least 2 of "highly", "very", "really", "just", "actually", "basically", "certainly", "definitely"
② Compress Redundant Phrases: simplify "friendly, professional, and helpful"; don't use all three together
③ Shorten Instruction Phrases: shorten at least 2 long phrases such as "Your primary responsibility is to" and "please analyze the following"
④ Preserve Core Meaning: keep at least 2 core concepts: customer, assist, help, respond, support, service
Target: 50%+ token reduction while passing all checks
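
The four checks above can be approximated mechanically. A minimal sketch, assuming simple substring and word-count tests (the grader's actual logic is unknown; the word lists mirror the check descriptions):

```python
FILLERS = {"highly", "very", "really", "just", "actually",
           "basically", "certainly", "definitely"}
CORE = {"customer", "assist", "help", "respond", "support", "service"}

def run_checks(original: str, optimized: str) -> dict:
    """Approximate the four lab checks plus the 50% reduction target."""
    orig_words = original.lower().split()
    opt_words = optimized.lower().split()
    opt_text = optimized.lower()
    fillers_removed = (sum(w in FILLERS for w in orig_words)
                       - sum(w in FILLERS for w in opt_words))
    return {
        "fillers_removed": fillers_removed >= 2,
        "phrases_compressed":
            "friendly, professional, and helpful" not in opt_text,
        "instructions_shortened":
            "your primary responsibility is to" not in opt_text
            and "please analyze the following" not in opt_text,
        "meaning_preserved": sum(c in opt_text for c in CORE) >= 2,
        "target_50pct": len(opt_words) <= len(orig_words) / 2,
    }
```

All five values must be true for the optimization to count as a pass.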
Cost Comparison
Original Cost (10K calls/month): $0
Optimized Cost: $0
Progress: 0/4 tasks completed | Score: 0/100


Lab 11: Intelligent Model Routing
Routing / Advanced
Scenario: Multi-Model Architecture
TechFlow Inc. uses multiple AI models for different tasks. Design a routing system that sends queries to the optimal model based on complexity, cost, and latency requirements. Configure rules that balance performance and budget.

Learning Objectives:

  • Query Classification: Categorize requests by complexity
  • Model Selection: Match models to task requirements
  • Rule Configuration: Create routing conditions
  • Cost-Performance Balance: Optimize for budget constraints

Model Routing Designer

📋 Task: Write Routing Logic Code
Write pseudocode or JavaScript-style routing logic that selects the optimal model based on query type and priority. Then test your logic against 4 real scenarios.
Routing Logic Code
Write code that routes queries to the optimal model. Must include: conditional logic, query type check, priority check, model selection, return statement.
Model Specs Reference
GPT-4 Turbo
Quality: ★★★★★ Speed: Medium
Cost: $10/1M input, $30/1M output
Latency: 800-1200ms
Best for: reasoning, code, analysis
GPT-3.5 Turbo
Quality: ★★★☆☆ Speed: Fast
Cost: $0.50/1M input, $1.50/1M output
Latency: 200-400ms
Best for: simple Q&A, creative
Claude 3 Sonnet
Quality: ★★★★☆ Speed: Medium
Cost: $3/1M input, $15/1M output
Latency: 500-900ms
Best for: analysis, creative, reasoning
Llama 3 70B
Quality: ★★★★☆ Speed: Fast
Cost: $0.59/1M input, $0.79/1M output
Latency: 300-500ms
Best for: simple Q&A, code
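
Putting the specs table into code, the routing logic the task asks for might look like this. It is one reasonable reading of the table, not the only valid policy; the function name and model-name strings are illustrative:

```python
def route(query_type: str, priority: str) -> str:
    """Select a model from the specs above by query type and priority."""
    if priority == "quality":
        # Quality-critical reasoning, code, and analysis go to the strongest model.
        if query_type in ("code", "reasoning", "analysis"):
            return "gpt-4-turbo"
        return "claude-3-sonnet"
    if priority == "speed":
        # Latency-sensitive traffic goes to the fast, cheap tier.
        if query_type in ("simple_qa", "creative"):
            return "gpt-3.5-turbo"
        return "llama-3-70b"
    if priority == "cost":
        # Cheapest model that still lists the task as a strength.
        if query_type in ("simple_qa", "code"):
            return "llama-3-70b"
        return "gpt-3.5-turbo"
    return "gpt-3.5-turbo"  # safe, cheap default
```

For example, `route("code", "quality")` returns `"gpt-4-turbo"`, matching that model's "Best for" entry.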
Test Scenarios - Select the Correct Model for Each
Based on your routing logic, select which model should handle each scenario. You must pass all 4 to complete the lab.
Scenario 1: Code Generation + Quality Priority
"Write a Python function to implement binary search with detailed comments"
Scenario 2: Simple Q&A + Speed Priority
"What is the capital of France?"
Scenario 3: Complex Reasoning + Quality Priority
"Analyze the pros and cons of microservices vs monolithic architecture"
Scenario 4: Creative Writing + Cost Priority
"Write a short poem about technology"
Progress: 0/5 tasks completed | Score: 0/100


Lab 12: Evaluation Metrics Designer
Evaluation / Critical
Scenario: QA System Evaluation
DataSmart Inc. needs to evaluate their question-answering AI before deployment. Create test cases with expected outputs, run evaluations, and analyze the model's performance using standard metrics like accuracy, relevance, and coherence.

Learning Objectives:

  • Test Case Design: Create comprehensive evaluation cases
  • Metric Selection: Choose appropriate evaluation metrics
  • Result Analysis: Interpret evaluation outputs
  • Quality Gates: Set pass/fail thresholds
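
A toy way to score a model answer against an expected output is keyword overlap; real evaluators use semantic similarity or LLM judges. A sketch (these formulas are illustrative assumptions, not the dashboard's actual metrics):

```python
def overlap_score(expected: str, actual: str) -> float:
    """Percentage of expected keywords found in the actual answer."""
    expected_words = set(expected.lower().split())
    actual_words = set(actual.lower().split())
    if not expected_words:
        return 0.0
    return 100.0 * len(expected_words & actual_words) / len(expected_words)

def evaluate(cases: list[dict], threshold: float) -> dict:
    """Score each test case and apply an overall pass threshold."""
    scores = [overlap_score(c["expected"], c["actual"]) for c in cases]
    overall = sum(scores) / len(scores)
    return {"scores": scores, "overall": overall, "passed": overall >= threshold}
```

Each test case is a dict with `"expected"` and `"actual"` keys; the threshold plays the role of the lab's pass/fail quality gate.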

Evaluation Dashboard

📋 Task: Design Evaluation Framework
Create at least 3 test cases with inputs and expected outputs. Configure evaluation metrics with appropriate thresholds, then run the evaluation to see real-time results.
Test Cases
No test cases yet. Click "Add Case" to create one.
Evaluation Metrics
Metric gauges (shown as --% until you run an evaluation): Accuracy | Relevance | Coherence | Overall Score
Pass Thresholds
Run evaluation to see results
Progress: 0/4 tasks completed | Score: 0/100


Lab 10: Token Optimizer Instructions

Objective

Optimize a verbose prompt to reduce token usage by 50%+ while maintaining its core meaning. Pass all 4 optimization checks to complete the lab.

Optimization Steps

  1. Read Original: Study the verbose prompt shown in the "Given" section.
  2. Write Optimized: Rewrite it concisely in the text area below.
  3. Remove Filler Words: Cut words like "highly", "very", "really", "just".
  4. Compress Phrases: Don't use "friendly, professional, and helpful" together.
  5. Shorten Instructions: Replace long phrases with concise alternatives.
  6. Preserve Meaning: Keep core concepts: customer, assist, help, service.
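
The steps above can be applied mechanically. A sketch that compresses this style of prompt with a small substitution table (the table is an illustrative starting point, not a complete optimizer; it also lowercases text to keep matching simple):

```python
import re

# Long instruction phrases -> concise replacements (illustrative).
REWRITES = {
    "your primary responsibility is to assist": "assist",
    "friendly, professional, and helpful": "professional",
}
FILLERS = ["highly", "very", "really", "just", "actually", "basically"]

def compress(prompt: str) -> str:
    """Apply phrase rewrites, then strip filler words."""
    out = prompt.lower()
    for long_phrase, short in REWRITES.items():
        out = out.replace(long_phrase, short)
    for filler in FILLERS:
        out = re.sub(rf"\b{filler}\b\s*", "", out)
    return re.sub(r"\s+", " ", out).strip()
```

Note the ordering: phrase rewrites run before filler removal so that multi-word patterns still match intact.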
Pro Tips

Focus on what the AI should DO, not on elaborate descriptions of HOW. "Help customers professionally" conveys the same meaning as a 50-word explanation.

Example Optimization

Before: "Your primary responsibility is to assist customers"
After: "Assist customers" (saves 5 words)

Common Mistakes

Over-simplifying until meaning is lost. Keep words like "customer" and "assist/help", and retain some indicator of professional tone.

Lab 11: Model Routing Instructions

Objective

Write routing logic code and correctly route 4 test scenarios to the optimal model based on query type and priority requirements.

Configuration Steps

  1. Write Routing Code: Create a function that takes queryType and priority as inputs and returns the optimal model.
  2. Use Conditionals: Include if/else statements to handle different scenarios.
  3. Analyze Code: Click "Analyze Code" to verify your logic structure.
  4. Test Scenarios: Select the correct model for each of the 4 test scenarios.
Pro Tips

Use GPT-4 for quality-critical tasks (code, analysis) and GPT-3.5 or Llama for speed-critical simple tasks. For everything in between, weigh cost against quality.

Model Selection Guide
  • Code + Quality: GPT-4 (best reasoning)
  • Simple Q&A + Speed: GPT-3.5 or Llama (fast, cheap)
  • Analysis + Cost: Claude (good balance)
  • Creative + Speed: GPT-3.5 (fast creative)
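
The guide above is small enough to encode as a lookup table with a cheap fallback. A sketch (the key pairs mirror the guide; the fallback choice and names are assumptions):

```python
# (query type, priority) -> model, taken from the selection guide above.
GUIDE = {
    ("code", "quality"): "GPT-4",
    ("simple_qa", "speed"): "GPT-3.5",      # Llama 3 70B also fits here
    ("analysis", "cost"): "Claude 3 Sonnet",
    ("creative", "speed"): "GPT-3.5",
}

def select_model(query_type: str, priority: str,
                 default: str = "GPT-3.5") -> str:
    """Look up the guide; fall back to a cheap general-purpose model."""
    return GUIDE.get((query_type, priority), default)
```

A table like this is easy to audit and extend, which is often preferable to a long if/else chain for a small, fixed rule set.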
Common Mistakes

Defaulting to the most expensive model for everything. Match model capabilities to actual requirements; simple tasks don't need GPT-4.

Lab 12: Evaluation Instructions

Objective

Design an evaluation framework with test cases, configure metric thresholds, and run evaluations to assess AI system quality.

Evaluation Steps

  1. Create Test Cases: Click "Add Case" to create at least 3 test cases with input questions and expected outputs.
  2. Set Thresholds: Configure minimum acceptable percentages for Accuracy, Relevance, Coherence, and Overall score.
  3. Run Evaluation: Click "Run Evaluation" to test the simulated AI against your cases.
  4. Analyze Results: Review metric gauges and pass/fail status for each threshold.
Pro Tips

Use diverse test cases covering different question types. Set realistic thresholds (70-85% is typical for production systems).

Good Test Case Examples
  • Factual Q&A: "What is X?" → Expected: Accurate definition
  • Reasoning: "Why does X happen?" → Expected: Logical explanation
  • Comparison: "Compare X and Y" → Expected: Balanced analysis
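
A quality gate over the dashboard's four metrics can be sketched as follows. The threshold numbers follow the 70-85% guidance above; the exact values and function names are assumptions:

```python
THRESHOLDS = {"accuracy": 80, "relevance": 75, "coherence": 75, "overall": 78}

def quality_gate(scores: dict, thresholds: dict = THRESHOLDS) -> dict:
    """Per-metric pass/fail plus a deploy/no-deploy verdict."""
    results = {m: scores.get(m, 0) >= t for m, t in thresholds.items()}
    results["deploy"] = all(results.values())
    return results
```

Any single failing metric blocks deployment, which is the gate behavior the objectives describe.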
Common Mistakes

Setting thresholds too high (100%) or too low (below 50%), and writing test cases that are too vague or have ambiguous expected outputs.