AI & Machine Learning Labs

Build RAG pipelines, configure fine-tuning workflows, and implement AI safety guardrails through hands-on exercises.

GenAI Advanced Labs - Module 3

Master RAG architecture, model fine-tuning, and responsible AI practices.

Lab 7: RAG Pipeline Builder
RAG / Advanced
Scenario: Enterprise Knowledge Assistant
TechCorp needs a Retrieval-Augmented Generation (RAG) system to answer questions from their 10,000+ document knowledge base. Build a complete RAG pipeline by arranging components in the correct order and configuring each stage properly.

Learning Objectives:

  • Pipeline Architecture: Understand RAG component flow
  • Document Processing: Configure chunking and embedding
  • Vector Search: Set up retrieval parameters
  • Context Injection: Optimize prompt augmentation

RAG Pipeline Builder

Components

  • Document Loader
  • Text Chunker
  • Embedding Model
  • Vector Store
  • Retriever
  • LLM Generator


Lab 8: Model Fine-Tuning Studio
Fine-Tuning / Advanced
Scenario: Custom Model Training
MedTech Solutions needs to fine-tune GPT-4 for medical terminology extraction. Configure the fine-tuning job including dataset preparation, hyperparameter selection, and validation settings. Create a properly formatted training dataset and optimize for their specific use case.

Learning Objectives:

  • Dataset Format: Structure JSONL training data correctly
  • Hyperparameters: Select appropriate learning rate, epochs, batch size
  • Validation: Configure evaluation metrics and checkpoints
  • Cost Estimation: Calculate training costs and tokens
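
To make the cost-estimation objective concrete, here is a minimal Python sketch. It uses the rough heuristic of ~4 characters per token for English text, and the per-million-token price is a placeholder variable, not real provider pricing; check your provider's current rates.

```python
def estimate_training_tokens(examples, epochs=3):
    """Rough token estimate for a chat-format dataset.

    Uses the common ~4 characters per token heuristic; real token
    counts depend on the model's tokenizer.
    """
    chars = sum(len(m["content"]) for ex in examples for m in ex["messages"])
    tokens_per_epoch = chars // 4
    return tokens_per_epoch * epochs

def estimate_cost(total_tokens, price_per_million=25.0):
    # price_per_million is a placeholder, not actual pricing
    return total_tokens / 1_000_000 * price_per_million
```

Because training passes over the dataset once per epoch, the token count scales linearly with the epoch setting, which is why more epochs directly raise the cost estimate.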

Fine-Tuning Configuration

📋 Task: Create Training Dataset
Write at least 3 valid JSONL training examples in the editor below. Each line must be valid JSON with a "messages" array containing system, user, and assistant roles.
Required Format (each line):
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
📋 Task: Configure Hyperparameters
Set all hyperparameters within the valid ranges. Hover over fields for guidance.
  • Base model: the model to fine-tune
  • Epochs: 1-10, with 3-4 typical
  • Batch size: 1-32; larger is faster but uses more memory
  • Learning rate multiplier: 0.1-2.0; the default is 1.0
📋 Task: Configure Validation
Set up validation data split and checkpoint frequency.
  • Validation split: 5-30%, typically 10-20%
  • Checkpoint frequency: how often to save the model
  • Early stopping: stop if there is no improvement for N checks
  • Model suffix: a custom name suffix (max 40 chars)
📋 Review Configuration
Verify all settings before launching the fine-tuning job.


Lab 9: AI Safety & Guardrails
Safety / Critical
Scenario: Content Moderation System
SafeAI Inc. needs to implement guardrails for their customer-facing chatbot. Configure input/output filters to block harmful content, PII leakage, prompt injections, and jailbreak attempts. Test your guardrails against various attack scenarios.

Learning Objectives:

  • Input Validation: Detect and block malicious prompts
  • Output Filtering: Prevent harmful content generation
  • PII Protection: Redact sensitive information
  • Jailbreak Defense: Resist manipulation attempts
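
The PII-protection objective can be sketched with simple regex redaction. These two patterns (US-style SSN and 16-digit card numbers) are illustrative only; a production system needs broader coverage and validation such as checksum tests.

```python
import re

# Hypothetical patterns for illustration; real PII detection needs many more
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact_pii(text):
    """Replace each detected PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text
```

Redacting on the output side complements input blocking: even if a prompt slips past the input filter, sensitive strings never reach the user.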

Guardrail Rule Editor

Write rules to block attacks
📋 Task: Write Guardrail Detection Rules
For each attack type, write regex patterns or keyword lists that will detect and block malicious inputs. Your rules must successfully block the test attack to complete each scenario.
How it works: select an attack scenario to reveal the malicious input (shown read-only), enter keywords or a regex pattern that indicates that attack type (e.g., \bignore\s+.*instructions\b), then run the test to see whether your rule blocks the attack.


Lab 7: RAG Pipeline Instructions

Objective

Build a complete RAG (Retrieval-Augmented Generation) pipeline by placing components in the correct order and configuring all parameters properly.

Step-by-Step Guide

  1. Drag Components: Drag components from the palette to the pipeline slots in the correct order: Document Loader → Text Chunker → Embedding Model → Vector Store → Retriever.
  2. Configure Parameters: After placing all components, configure chunk size (100-2000), chunk overlap (0-500), embedding model, top-K results, similarity threshold, and vector store type.
  3. Test Pipeline: Write a test query (20+ characters) and click "Test Pipeline" to verify your configuration works.
Pro Tips

Chunk overlap should be ~10-20% of chunk size for better context preservation. Use text-embedding-3-small for cost efficiency or text-embedding-3-large for better accuracy.
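
The overlap rule above can be sketched in a few lines of Python. This is a character-based chunker for illustration; real pipelines usually chunk by tokens or by semantic boundaries (sentences, paragraphs).

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping chunks.

    overlap should be ~10-20% of chunk_size and must be strictly
    smaller than it, or the loop would never advance.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be less than chunk_size")
    chunks = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

With chunk_size=800 and overlap=100, each chunk repeats the last 100 characters of its predecessor, so a sentence cut at a chunk boundary still appears whole in the next chunk.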

Optimal Parameter Ranges
  • Chunk size: 500-1000 tokens is typical for most documents
  • Chunk overlap: 50-200 tokens prevents context loss
  • Top-K: 3-5 results balances relevance and noise
  • Similarity threshold: 0.7-0.8 filters low-quality matches
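
A minimal sketch of how top-K and the similarity threshold interact at retrieval time, using plain cosine similarity over toy vectors (a real system would query a vector store with learned embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, top_k=3, threshold=0.7):
    """store: list of (doc_id, vector) pairs.

    Drops matches below the threshold first, then returns the
    top_k highest-scoring documents.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store]
    scored = [s for s in scored if s[1] >= threshold]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:top_k]
```

Note the ordering: the threshold filters noise before top-K is applied, so a query may legitimately return fewer than K results when few documents are relevant.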
Common Mistakes

Placing components in the wrong order (e.g., the Retriever before the Vector Store). Make sure chunk overlap is less than chunk size, and don't forget to select an embedding model and a vector store.

Lab 8: Fine-Tuning Instructions

Objective

Configure a complete fine-tuning job for a language model, including dataset preparation, hyperparameter selection, and validation settings.

Configuration Steps

  1. Dataset Tab: Create valid JSONL training examples with system, user, and assistant messages. Minimum 10 examples required.
  2. Hyperparameters Tab: Set learning rate multiplier (0.1-2.0), batch size (1-32), and number of epochs (1-10).
  3. Validation Tab: Configure validation split percentage and evaluation frequency.
  4. Review & Submit: Review your configuration and verify the cost estimate before submitting.
Pro Tips

Start with a learning rate multiplier of 1.0 and adjust based on results. More epochs with smaller learning rates often yield better results than fewer epochs with higher rates.
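
The ranges from the Hyperparameters tab can be checked with a small validator. This is a sketch; the parameter names are illustrative and mirror the ranges stated in this lab, not any particular provider's API.

```python
# Valid ranges as stated in the lab instructions
RANGES = {
    "learning_rate_multiplier": (0.1, 2.0),
    "batch_size": (1, 32),
    "n_epochs": (1, 10),
}

def validate_hyperparams(config):
    """Return a list of human-readable errors; empty list means valid."""
    errors = []
    for key, (lo, hi) in RANGES.items():
        value = config.get(key)
        if value is None:
            errors.append(f"{key}: missing")
        elif not (lo <= value <= hi):
            errors.append(f"{key}: {value} outside [{lo}, {hi}]")
    return errors
```

Running the validator before submitting a job catches out-of-range values (such as a learning rate multiplier above 2.0) before any training cost is incurred.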

JSONL Format Example

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
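
The format above can be checked programmatically. Here is a minimal sketch of a line-by-line validator like the one the lab's editor implies: each line must parse as JSON and contain a "messages" array with the system, user, and assistant roles in order.

```python
import json

REQUIRED_ROLES = ["system", "user", "assistant"]

def count_valid_lines(jsonl_text):
    """Count lines that parse as JSON and match the expected chat format."""
    valid = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # invalid JSON never counts
        messages = obj.get("messages")
        if (isinstance(messages, list)
                and [m.get("role") for m in messages] == REQUIRED_ROLES
                and all(isinstance(m.get("content"), str) for m in messages)):
            valid += 1
    return valid
```

Note that json.loads rejects single quotes and trailing commas, which is exactly why those are the most common JSONL mistakes.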

Common Mistakes

Invalid JSONL syntax (missing commas, single quotes instead of double quotes); each line must be valid JSON. A learning rate multiplier that is too high (>2.0) can cause unstable training.

Lab 9: AI Safety Instructions

Objective

Write guardrail rules to detect and block 4 types of attacks: Prompt Injection, Jailbreak Attempts, PII Extraction, and Harmful Content requests.

For Each Attack Scenario

  1. Select Scenario: Click on an attack type to see the malicious input.
  2. Write Rule Name: Give your detection rule a descriptive name.
  3. Add Keywords: Enter comma-separated keywords that indicate this attack type.
  4. Optional Regex: Add a regex pattern for advanced detection.
  5. Select Action: Choose "Block Request" for security-critical attacks.
  6. Test Rule: Click "Test Rule Against Attack" to verify it blocks the attack.
Pro Tips

Look for manipulation phrases like "ignore previous", "pretend you are", and "disregard instructions". For PII attacks, detect requests for SSNs, credit card numbers, and passwords.

Keywords by Attack Type
  • Injection: ignore, disregard, forget, new instructions, override
  • Jailbreak: pretend, roleplay, DAN, bypass, no restrictions
  • PII: SSN, credit card, password, social security, bank account
  • Harmful: hack, exploit, weapon, illegal, bypass security
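
The keyword lists above translate directly into a detector. A minimal sketch: each rule combines a lowercase keyword match with an optional regex, and a hit on either blocks the request. The regex shown is the illustrative injection pattern from the rule editor; real guardrails need far more variations.

```python
import re

# Keywords taken from the lists above; regexes are illustrative additions
RULES = {
    "injection": {
        "keywords": ["ignore", "disregard", "forget", "new instructions", "override"],
        "regex": re.compile(r"\bignore\s+.*\binstructions\b", re.IGNORECASE),
    },
    "jailbreak": {
        "keywords": ["pretend", "roleplay", "dan", "bypass", "no restrictions"],
        "regex": None,
    },
}

def detect(text, rule):
    """Return True if the input matches any keyword or the rule's regex."""
    lowered = text.lower()
    if any(kw in lowered for kw in rule["keywords"]):
        return True
    return bool(rule["regex"] and rule["regex"].search(text))
```

Matching on the lowercased input is what catches case variations like "IGNORE all previous instructions", one of the simplest evasions attackers try.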
Common Mistakes

Writing rules that are too narrow (missing common variations of an attack). Always use "Block Request" for security threats, and don't forget to test each rule before moving to the next scenario.