AI & Machine Learning Labs

Build RAG pipelines, configure fine-tuning workflows, and implement AI safety guardrails through hands-on exercises.

GenAI Advanced Labs - Module 3

Master RAG architecture, model fine-tuning, and responsible AI practices.

Lab 7: RAG Pipeline Builder
RAG / Advanced
Scenario: Enterprise Knowledge Assistant
TechCorp needs a Retrieval-Augmented Generation (RAG) system to answer questions from their 10,000+ document knowledge base. Build a complete RAG pipeline by arranging components in the correct order and configuring each stage properly.

Learning Objectives:

  • Pipeline Architecture: Understand RAG component flow
  • Document Processing: Configure chunking and embedding
  • Vector Search: Set up retrieval parameters
  • Context Injection: Optimize prompt augmentation

RAG Pipeline Builder

Components

  • Document Loader
  • Text Chunker
  • Embedding Model
  • Vector Store
  • Retriever
  • LLM Generator


Lab 8: Model Fine-Tuning Studio
Fine-Tuning / Advanced
Scenario: Custom Model Training
MedTech Solutions needs to fine-tune GPT-4 for medical terminology extraction. Configure the fine-tuning job including dataset preparation, hyperparameter selection, and validation settings. Create a properly formatted training dataset and optimize for their specific use case.

Learning Objectives:

  • Dataset Format: Structure JSONL training data correctly
  • Hyperparameters: Select appropriate learning rate, epochs, batch size
  • Validation: Configure evaluation metrics and checkpoints
  • Cost Estimation: Calculate training costs and tokens
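
To make the cost-estimation objective concrete, here is a minimal Python sketch. It uses the rough heuristic of ~4 characters per token for English text, and the per-million-token price is a placeholder variable, not real provider pricing; check your provider's current rates.

```python
def estimate_training_tokens(examples, epochs=3):
    """Rough token estimate for a chat-format dataset.

    Uses the common ~4 characters per token heuristic; real token
    counts depend on the model's tokenizer.
    """
    chars = sum(len(m["content"]) for ex in examples for m in ex["messages"])
    tokens_per_epoch = chars // 4
    return tokens_per_epoch * epochs

def estimate_cost(total_tokens, price_per_million=25.0):
    # price_per_million is a placeholder, not actual pricing
    return total_tokens / 1_000_000 * price_per_million
```

Because training passes over the dataset once per epoch, the token count scales linearly with the epoch setting, which is why more epochs directly raise the cost estimate.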

Fine-Tuning Configuration

📋 Task: Create Training Dataset
Write at least 3 valid JSONL training examples in the editor below. Each line must be valid JSON with a "messages" array containing system, user, and assistant roles.
Required Format (each line):
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
📋 Task: Configure Hyperparameters
Set all hyperparameters within the valid ranges. Hover over fields for guidance.
  • Base model: the model to fine-tune
  • Epochs: 1-10, with 3-4 typical
  • Batch size: 1-32; larger is faster but uses more memory
  • Learning rate multiplier: 0.1-2.0; the default is 1.0
📋 Task: Configure Validation
Set up validation data split and checkpoint frequency.
  • Validation split: 5-30%, typically 10-20%
  • Checkpoint frequency: how often to save the model
  • Early stopping: stop if there is no improvement for N checks
  • Model suffix: a custom name suffix (max 40 chars)
📋 Review Configuration
Verify all settings before launching the fine-tuning job.


Lab 9: AI Safety & Guardrails
Safety / Critical
Scenario: Content Moderation System
SafeAI Inc. needs to implement guardrails for their customer-facing chatbot. Configure input/output filters to block harmful content, PII leakage, prompt injections, and jailbreak attempts. Test your guardrails against various attack scenarios.

Learning Objectives:

  • Input Validation: Detect and block malicious prompts
  • Output Filtering: Prevent harmful content generation
  • PII Protection: Redact sensitive information
  • Jailbreak Defense: Resist manipulation attempts
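
The PII-protection objective can be sketched with simple regex redaction. These two patterns (US-style SSN and 16-digit card numbers) are illustrative only; a production system needs broader coverage and validation such as checksum tests.

```python
import re

# Hypothetical patterns for illustration; real PII detection needs many more
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact_pii(text):
    """Replace each detected PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text
```

Redacting on the output side complements input blocking: even if a prompt slips past the input filter, sensitive strings never reach the user.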

Guardrail Rule Editor

Write rules to block attacks
📋 Task: Write Guardrail Detection Rules
For each attack type, write regex patterns or keyword lists that will detect and block malicious inputs. Your rules must successfully block the test attack to complete each scenario.
How it works: select an attack scenario to reveal the malicious input (shown read-only), enter keywords or a regex pattern that indicates that attack type (e.g., \bignore\s+.*instructions\b), then run the test to see whether your rule blocks the attack.


Lab 7: RAG Pipeline Instructions

Objective

Build a complete RAG (Retrieval-Augmented Generation) pipeline by placing components in the correct order and configuring all parameters properly.

Step-by-Step Guide

  1. Drag Components: Drag components from the palette to the pipeline slots in the correct order: Document Loader → Text Chunker → Embedding Model → Vector Store → Retriever.
  2. Configure Parameters: After placing all components, configure chunk size (100-2000), chunk overlap (0-500), embedding model, top-K results, similarity threshold, and vector store type.
  3. Test Pipeline: Write a test query (20+ characters) and click "Test Pipeline" to verify your configuration works.
Pro Tips

Chunk overlap should be ~10-20% of chunk size for better context preservation. Use text-embedding-3-small for cost efficiency or text-embedding-3-large for better accuracy.
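
The overlap rule above can be sketched in a few lines of Python. This is a character-based chunker for illustration; real pipelines usually chunk by tokens or by semantic boundaries (sentences, paragraphs).

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping chunks.

    overlap should be ~10-20% of chunk_size and must be strictly
    smaller than it, or the loop would never advance.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be less than chunk_size")
    chunks = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

With chunk_size=800 and overlap=100, each chunk repeats the last 100 characters of its predecessor, so a sentence cut at a chunk boundary still appears whole in the next chunk.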

Optimal Parameter Ranges
  • Chunk size: 500-1000 tokens is typical for most documents
  • Chunk overlap: 50-200 tokens prevents context loss
  • Top-K: 3-5 results balances relevance and noise
  • Similarity threshold: 0.7-0.8 filters low-quality matches
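
A minimal sketch of how top-K and the similarity threshold interact at retrieval time, using plain cosine similarity over toy vectors (a real system would query a vector store with learned embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, top_k=3, threshold=0.7):
    """store: list of (doc_id, vector) pairs.

    Drops matches below the threshold first, then returns the
    top_k highest-scoring documents.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store]
    scored = [s for s in scored if s[1] >= threshold]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:top_k]
```

Note the ordering: the threshold filters noise before top-K is applied, so a query may legitimately return fewer than K results when few documents are relevant.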
Common Mistakes

Placing components in the wrong order (e.g., the Retriever before the Vector Store). Make sure chunk overlap is less than chunk size, and don't forget to select an embedding model and a vector store.

Lab 8: Fine-Tuning Instructions

Objective

Configure a complete fine-tuning job for a language model, including dataset preparation, hyperparameter selection, and validation settings.

Configuration Steps

  1. Dataset Tab: Create valid JSONL training examples with system, user, and assistant messages. Minimum 10 examples required.
  2. Hyperparameters Tab: Set learning rate multiplier (0.1-2.0), batch size (1-32), and number of epochs (1-10).
  3. Validation Tab: Configure validation split percentage and evaluation frequency.
  4. Review & Submit: Review your configuration and verify the cost estimate before submitting.
Pro Tips

Start with a learning rate multiplier of 1.0 and adjust based on results. More epochs with smaller learning rates often yield better results than fewer epochs with higher rates.
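
The ranges from the Hyperparameters tab can be checked with a small validator. This is a sketch; the parameter names are illustrative and mirror the ranges stated in this lab, not any particular provider's API.

```python
# Valid ranges as stated in the lab instructions
RANGES = {
    "learning_rate_multiplier": (0.1, 2.0),
    "batch_size": (1, 32),
    "n_epochs": (1, 10),
}

def validate_hyperparams(config):
    """Return a list of human-readable errors; empty list means valid."""
    errors = []
    for key, (lo, hi) in RANGES.items():
        value = config.get(key)
        if value is None:
            errors.append(f"{key}: missing")
        elif not (lo <= value <= hi):
            errors.append(f"{key}: {value} outside [{lo}, {hi}]")
    return errors
```

Running the validator before submitting a job catches out-of-range values (such as a learning rate multiplier above 2.0) before any training cost is incurred.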

JSONL Format Example

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
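
The format above can be checked programmatically. Here is a minimal sketch of a line-by-line validator like the one the lab's editor implies: each line must parse as JSON and contain a "messages" array with the system, user, and assistant roles in order.

```python
import json

REQUIRED_ROLES = ["system", "user", "assistant"]

def count_valid_lines(jsonl_text):
    """Count lines that parse as JSON and match the expected chat format."""
    valid = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # invalid JSON never counts
        messages = obj.get("messages")
        if (isinstance(messages, list)
                and [m.get("role") for m in messages] == REQUIRED_ROLES
                and all(isinstance(m.get("content"), str) for m in messages)):
            valid += 1
    return valid
```

Note that json.loads rejects single quotes and trailing commas, which is exactly why those are the most common JSONL mistakes.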

Common Mistakes

Invalid JSONL syntax (missing commas, single quotes instead of double quotes); each line must be valid JSON. A learning rate multiplier that is too high (>2.0) can cause unstable training.

Lab 9: AI Safety Instructions

Objective

Write guardrail rules to detect and block 4 types of attacks: Prompt Injection, Jailbreak Attempts, PII Extraction, and Harmful Content requests.

For Each Attack Scenario

  1. Select Scenario: Click on an attack type to see the malicious input.
  2. Write Rule Name: Give your detection rule a descriptive name.
  3. Add Keywords: Enter comma-separated keywords that indicate this attack type.
  4. Optional Regex: Add a regex pattern for advanced detection.
  5. Select Action: Choose "Block Request" for security-critical attacks.
  6. Test Rule: Click "Test Rule Against Attack" to verify it blocks the attack.
Pro Tips

Look for manipulation phrases like "ignore previous", "pretend you are", and "disregard instructions". For PII attacks, detect requests for SSNs, credit card numbers, and passwords.

Keywords by Attack Type
  • Injection: ignore, disregard, forget, new instructions, override
  • Jailbreak: pretend, roleplay, DAN, bypass, no restrictions
  • PII: SSN, credit card, password, social security, bank account
  • Harmful: hack, exploit, weapon, illegal, bypass security
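
The keyword lists above translate directly into a detector. A minimal sketch: each rule combines a lowercase keyword match with an optional regex, and a hit on either blocks the request. The regex shown is the illustrative injection pattern from the rule editor; real guardrails need far more variations.

```python
import re

# Keywords taken from the lists above; regexes are illustrative additions
RULES = {
    "injection": {
        "keywords": ["ignore", "disregard", "forget", "new instructions", "override"],
        "regex": re.compile(r"\bignore\s+.*\binstructions\b", re.IGNORECASE),
    },
    "jailbreak": {
        "keywords": ["pretend", "roleplay", "dan", "bypass", "no restrictions"],
        "regex": None,
    },
}

def detect(text, rule):
    """Return True if the input matches any keyword or the rule's regex."""
    lowered = text.lower()
    if any(kw in lowered for kw in rule["keywords"]):
        return True
    return bool(rule["regex"] and rule["regex"].search(text))
```

Matching on the lowercased input is what catches case variations like "IGNORE all previous instructions", one of the simplest evasions attackers try.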
Common Mistakes

Writing rules that are too narrow (missing common variations of an attack). Always use "Block Request" for security threats, and don't forget to test each rule before moving to the next scenario.