Data Analytics Intermediate Labs

Level up your data skills with visualization dashboards, data quality frameworks, and NoSQL databases. Build production-ready data solutions.

Visualization & NoSQL - Module 2

Intermediate-level labs covering data visualization, quality assurance, and NoSQL database operations.

Lab 4: Data Visualization Dashboard
Visualization / Intermediate
Scenario: Executive Sales Dashboard
SalesForce Global needs a real-time executive dashboard to visualize sales performance across regions. Using the dashboard builder, you'll create interactive charts, configure KPIs, and design a presentation-ready analytics view for the C-suite.

Learning Objectives:

  • Chart Selection: Choose appropriate visualization types for different data
  • KPI Design: Create meaningful key performance indicators
  • Interactivity: Add filters and drill-down capabilities
  • Dashboard Layout: Design effective visual hierarchy

📋 Step-by-Step Instructions

  1. Step 1: Configure Data Source
    đŸŽ¯ Goal: Connect to the sales database

    📝 Why Data Source First?
    Before visualizing anything, you must connect to your data source. Dashboards are only as good as the data they display.

    đŸ’ģ Configuration:
    Enter database: sales_db
    Enter table: orders
    Click "Connect"
    💡 Pro Tip: Always test connections before building visualizations to avoid errors.
  2. Step 2: Create Bar Chart with Aggregation
    đŸŽ¯ Goal: Visualize sales by region with proper aggregation

    📝 Configuration Required:
    You must specify which columns to use and how to aggregate the data.

    đŸ’ģ Steps:
    1. Select chart type: bar
    2. X-axis: region
    3. Y-axis: SUM(amount)
    4. Click "Add Chart"
    📖 Best Practice: Use SUM for totals, AVG for averages, COUNT for quantities.
  3. Step 3: Add Line Chart with Time Grouping
    đŸŽ¯ Goal: Show monthly sales trend

    📝 Time-Series Configuration:
    Line charts for time data require date grouping (daily, monthly, yearly).

    đŸ’ģ Configuration:
    1. Chart type: line
    2. X-axis: date (grouped by MONTH)
    3. Y-axis: SUM(amount)
    4. Click "Add Chart"
    💡 Exam Tip: Always group date fields - raw dates create too many data points.
  4. Step 4: Configure Filter
    đŸŽ¯ Goal: Add date range filter for interactivity

    📝 Why Filters?
    Filters let users explore different time periods or segments without rebuilding the dashboard.

    đŸ’ģ Configuration:
    1. Filter field: date
    2. Filter type: range
    3. Default: Last 6 months
    4. Click "Add Filter"
    âš ī¸ Warning: Filters affect ALL charts on the dashboard - configure them carefully.
  5. Step 5: Set Refresh Rate
    đŸŽ¯ Goal: Configure auto-refresh for live data

    📝 Refresh Strategy:
    Real-time dashboards need periodic refresh. Balance freshness with database load.

    đŸ’ģ Configuration:
    Enter refresh interval: 300
    Unit: seconds
    Click "Set Refresh"
  6. Step 6: Configure Alerts
    đŸŽ¯ Goal: Set threshold alert for revenue drops

    đŸ’ģ Alert Configuration:
    1. Metric: daily_revenue
    2. Condition: less than
    3. Threshold: 50000
    4. Click "Add Alert"
    🎓 Learning Checkpoint: You've built a production-ready dashboard with data source, visualizations, filters, and alerts!

Dashboard Builder - SalesForce Global

1. Data Source
2. Charts
3. Filters
4. Settings
Preview
Configure Data Source
Progress: 0/6 tasks completed
Score: 0/100
0%

Lab Completed!

Excellent dashboard design!

Lab 5: Data Quality Framework
Quality / Intermediate
Scenario: Data Quality Audit
DataTrust Corp's data warehouse has quality issues affecting downstream analytics. Using the Data Quality Console, you'll configure validation rules, run quality checks, and generate compliance reports. Your goal is to achieve 95%+ data quality score.

Learning Objectives:

  • Validation Rules: Configure completeness, uniqueness, and format checks
  • Quality Metrics: Understand DQ dimensions (accuracy, consistency, timeliness)
  • Remediation: Apply fixes for common data quality issues
  • Reporting: Generate compliance and audit reports

📋 Step-by-Step Instructions

  1. Step 1: Create NULL Check Rule
    đŸŽ¯ Goal: Write rule to detect missing critical fields

    📝 What is Completeness?
    Completeness measures if all required data is present. NULL values in critical fields like customer_id or email break downstream processes.

    đŸ’ģ Configuration:
    1. Field: customer_id
    2. Rule: NOT NULL
    3. Threshold: 100%
    4. Click "Add Rule"
    💡 Pro Tip: Critical fields should have 100% completeness. Optional fields may have lower thresholds like 90%.
  2. Step 2: Create Uniqueness Rule
    đŸŽ¯ Goal: Configure duplicate detection on primary key

    📝 Why Uniqueness Matters?
    Duplicate records cause inflated metrics, double billing, and analytics errors. Primary keys MUST be unique.

    đŸ’ģ Configuration:
    1. Field: customer_id
    2. Rule: UNIQUE
    3. Action: flag
    4. Click "Add Rule"
  3. Step 3: Write Email Validation Regex
    đŸŽ¯ Goal: Create pattern to validate email formats

    📝 Regex Pattern Required:
    You must write a regular expression that matches valid email addresses (user@domain.com).

    đŸ’ģ Configuration:
    1. Field: email
    2. Rule: REGEX
    3. Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
    4. Click "Add Rule"
    📖 Regex Tip: Test your pattern on sample data before deploying to production.
  4. Step 4: Configure Range Validation
    đŸŽ¯ Goal: Set min/max bounds for numeric fields

    📝 Range Configuration:
    Numeric fields should have logical bounds. Prices can't be negative, ages can't exceed 120.

    đŸ’ģ Configuration:
    1. Field: age
    2. Rule: BETWEEN
    3. Min: 0, Max: 120
    4. Click "Add Rule"
  5. Step 5: Set Quality Thresholds
    đŸŽ¯ Goal: Define acceptable quality scores per dimension

    📝 Threshold Configuration:
    Set minimum acceptable scores. Below these thresholds, data quality fails audit.

    đŸ’ģ Configuration:
    1. Completeness: 95%
    2. Uniqueness: 99%
    3. Validity: 90%
    4. Click "Set Thresholds"
    ✅ Target: All dimensions must meet thresholds to pass audit.
  6. Step 6: Configure Alert Notifications
    đŸŽ¯ Goal: Set up alerts for quality failures

    📝 Alert Configuration:
    When quality checks fail, stakeholders must be notified immediately.

    đŸ’ģ Configuration:
    1. Alert type: email
    2. Recipients: dq-team@company.com
    3. Trigger: any rule fails
    4. Click "Save Alert"
    🎓 Learning Checkpoint: You've configured a complete data quality framework with custom rules, thresholds, and alerting!

Data Quality Console - DataTrust Corp

Create Rules
Thresholds
Alerts
Active Rules
Results
Create Validation Rule
Progress: 0/6 tasks completed
Score: 0/100
0%

Lab Completed!

Excellent data quality skills!

Lab 6: NoSQL with MongoDB
NoSQL / Intermediate
Scenario: Document Database Operations
StreamFlix needs to migrate their user profiles and viewing history to MongoDB for better scalability. You'll perform CRUD operations, create indexes for performance, and write aggregation pipelines to analyze viewing patterns.

Learning Objectives:

  • CRUD Operations: Insert, find, update, delete documents
  • Query Operators: Use $gt, $in, $regex for complex queries
  • Aggregation: Build pipelines with $match, $group, $sort
  • Indexing: Create indexes for query optimization

📋 Step-by-Step Instructions

  1. Step 1: Connect & Show Collections
    đŸŽ¯ Goal: Connect to database and explore collections

    📝 MongoDB vs SQL:
    â€ĸ Database = Database
    â€ĸ Collection = Table
    â€ĸ Document = Row
    â€ĸ Field = Column


    đŸ’ģ Command:
    show collections
    💡 Pro Tip: Use 'use dbname' to switch databases, 'show dbs' to list all databases.
  2. Step 2: Insert Documents
    đŸŽ¯ Goal: Add user profile documents

    📝 Document Structure:
    MongoDB stores data as BSON (Binary JSON). Documents can have nested objects and arrays - no schema required!

    đŸ’ģ Command:
    db.users.insertOne
  3. Step 3: Query with Operators
    đŸŽ¯ Goal: Find users matching criteria

    📝 Query Operators:
    â€ĸ $gt/$gte: Greater than
    â€ĸ $lt/$lte: Less than
    â€ĸ $in: Match array values
    â€ĸ $regex: Pattern matching


    đŸ’ģ Command:
    db.users.find
  4. Step 4: Update Documents
    đŸŽ¯ Goal: Modify existing documents

    📝 Update Operators:
    â€ĸ $set: Set field value
    â€ĸ $inc: Increment number
    â€ĸ $push: Add to array
    â€ĸ $unset: Remove field


    đŸ’ģ Command:
    db.users.updateOne
  5. Step 5: Aggregation Pipeline
    đŸŽ¯ Goal: Analyze viewing patterns

    📝 Pipeline Stages:
    â€ĸ $match: Filter documents
    â€ĸ $group: Group and aggregate
    â€ĸ $sort: Order results
    â€ĸ $project: Shape output


    đŸ’ģ Command:
    db.users.aggregate
  6. Step 6: Create Index
    đŸŽ¯ Goal: Optimize query performance

    📝 Why Indexes?
    Without indexes, MongoDB scans every document (collection scan). Indexes enable O(log n) lookups on indexed fields.

    đŸ’ģ Command:
    db.users.createIndex
    🎓 Learning Checkpoint: You've mastered MongoDB CRUD, queries, aggregations, and indexing!

MongoDB Shell - StreamFlix Database

MongoDB shell version v6.0.0
connecting to: mongodb://localhost:27017/streamflix
Implicit session: session { "id" : UUID("...") }
MongoDB server version: 6.0.0
---
streamflix>
Progress: 0/6 tasks completed
Score: 0/100
0%

Lab Completed!

Excellent NoSQL skills!