Advanced Database Systems

Master Cassandra distributed systems, Elasticsearch full-text search, and Neo4j graph databases. Work with cutting-edge NoSQL technologies used in production environments.

Distributed & Graph Databases - Module 9

Advanced NoSQL systems requiring hands-on CQL, REST API, and Cypher query skills!

Lab 25: Cassandra Distributed Database
NoSQL / Expert
Scenario: IoT Sensor Data Platform
TechSensors needs a highly available, distributed database for billions of IoT sensor readings. You must write CQL commands to create keyspaces, design wide-row tables with proper partitioning, and configure replication strategies. Each command is validated for correct syntax.

Learning Objectives:

  • Keyspace Creation: Define replication strategy and consistency levels
  • Table Design: Create wide-row tables with partition and clustering keys
  • Data Modeling: Write time-series data models with TTL
  • Queries: Use WHERE clauses with partition keys properly

πŸ“‹ Step-by-Step Instructions

  1. Step 1: Create Keyspace with Replication
    Create a keyspace with NetworkTopologyStrategy and RF=3.
    Required Syntax:
    CREATE KEYSPACE iot_sensors WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

    β€’ Keyspace name: iot_sensors
    β€’ Strategy: NetworkTopologyStrategy
    β€’ RF: 3
    πŸ’‘ Tip: NetworkTopologyStrategy is production-ready, SimpleStrategy is for dev only.
  2. Step 2: Switch to Keyspace
    Use the keyspace to work with tables.
    Required Command:
    USE iot_sensors;
    πŸ’‘ Tip: Always USE keyspace before creating tables.
  3. Step 3: Create Time-Series Table
    Design a wide-row table for sensor readings with proper keys.
    Required Elements:
    β€’ Table name: sensor_data
    β€’ Partition key: sensor_id
    β€’ Clustering key: timestamp (DESC order)
    β€’ Columns: temperature, humidity, battery_level
    β€’ Must use: PRIMARY KEY ((sensor_id), timestamp)
    πŸ’‘ Tip: Partition by sensor_id, cluster by timestamp for efficient time-series queries.
  4. Step 4: Insert Sensor Reading
    Insert a sensor reading with all required fields.
    Required Fields:
    β€’ sensor_id: text (e.g., 'SENSOR-001')
    β€’ timestamp: timestamp
    β€’ temperature, humidity, battery_level
    πŸ’‘ Tip: Must include partition key (sensor_id) and clustering key (timestamp)!
  5. Step 5: Query with WHERE Clause
    Query sensor data using the partition key properly.
    Required Query:
    β€’ Must use: SELECT
    β€’ Must have: WHERE sensor_id = (partition key required!)
    β€’ Optional: LIMIT clause
    πŸ’‘ Tip: Cassandra requires WHERE clause to include partition key. Can't scan full table!
  6. Step 6: Create Table with TTL
    Create a table for temporary data with automatic expiration.
    Required Elements:
    β€’ Table name: alerts
    β€’ Must have: default_time_to_live = 86400 (24 hours)
    β€’ Primary key: alert_id
    πŸ’‘ Tip: TTL automatically deletes old data - perfect for temporary alerts!
Cassandra CQL Shell
cqlsh -- localhost:9042
Connected to Cassandra Cluster CQL Version: 3.4.5 [cqlsh 6.0.0 | Cassandra 4.0.0] cqlsh> Type CQL commands below.
cqlsh>
CQL Quick Reference
β€’ CREATE KEYSPACE name WITH replication = {...};
β€’ USE keyspace_name;
β€’ CREATE TABLE name (col type, PRIMARY KEY ((partition), clustering));
β€’ INSERT INTO table (cols) VALUES (vals);
β€’ SELECT * FROM table WHERE partition_key = value;
Progress: 0/6 tasks completed
Score: 0/100
0%

Lab Completed!

Excellent Cassandra mastery!

Lab 26: Elasticsearch Full-Text Search
Search / Expert
Scenario: News Article Search Engine
MediaCorp needs a powerful search engine for millions of news articles. You must write Elasticsearch REST API commands to create indices with mappings, configure analyzers for full-text search, and write complex queries with aggregations.

Learning Objectives:

  • Index Creation: Create indices with custom mappings and analyzers
  • Document Indexing: Index documents with proper field types
  • Full-Text Search: Write match, bool, and multi_match queries
  • Aggregations: Create bucket and metric aggregations

πŸ“‹ Step-by-Step Instructions

  1. Step 1: Create Index with Mappings
    Create an index with explicit field mappings and analyzers.
    Required API Call:
    PUT /news_articles

    β€’ Index name: news_articles
    β€’ Must define: mappings with properties
    β€’ Fields: title (text), content (text), author (keyword), published_date (date)
    πŸ’‘ Tip: Use 'text' for full-text search, 'keyword' for exact match.
  2. Step 2: Index a Document
    Add a news article document to the index.
    Required API Call:
    POST /news_articles/_doc

    β€’ Must include: title, content, author, published_date
    πŸ’‘ Tip: POST auto-generates ID, PUT with /_doc/id for specific ID.
  3. Step 3: Full-Text Match Query
    Search articles using full-text match query.
    Required Query:
    GET /news_articles/_search

    β€’ Must use: match query
    β€’ Search in: title or content field
    πŸ’‘ Tip: match query analyzes search terms and finds relevant documents.
  4. Step 4: Bool Query with Filters
    Write a compound query with must, should, filter clauses.
    Required Elements:
    β€’ Must use: bool query
    β€’ Include: must or should clause
    β€’ Include: filter for date range
    πŸ’‘ Tip: filter doesn't affect scoring - use for date ranges, exact matches.
  5. Step 5: Multi-Match Query
    Search across multiple fields at once.
    Required Query:
    β€’ Must use: multi_match query
    β€’ Search fields: ["title", "content"]
    β€’ Include: type parameter (best_fields or cross_fields)
    πŸ’‘ Tip: best_fields returns docs matching best in any field.
  6. Step 6: Aggregation Query
    Create aggregations to analyze article data.
    Required Elements:
    β€’ Must use: aggs or aggregations
    β€’ Include: terms aggregation on author
    β€’ Or: date_histogram on published_date
    πŸ’‘ Tip: terms agg creates buckets by field value - great for faceted search!
Elasticsearch REST Console
elasticsearch:9200
Elasticsearch 8.11.0 Cluster: news-cluster (green) Connected to localhost:9200 ES> Type REST commands (e.g., GET /_cluster/health)
ES>
Elasticsearch API Reference
β€’ PUT /index_name - Create index
β€’ POST /index/_doc - Index document
β€’ GET /index/_search - Search documents
β€’ GET /_cluster/health - Cluster status
Progress: 0/6 tasks completed
Score: 0/100
0%

Lab Completed!

Excellent Elasticsearch mastery!

Lab 27: Neo4j Graph Database
Graph / Expert
Scenario: Social Network Analysis
SocialNet needs a graph database to model user relationships, recommendations, and fraud detection. You must write Cypher queries to create nodes, relationships, and perform graph traversals and pattern matching.

Learning Objectives:

  • Node Creation: Create labeled nodes with properties
  • Relationships: Define typed relationships between nodes
  • Pattern Matching: Use MATCH to find graph patterns
  • Graph Algorithms: Find shortest paths and recommendations

πŸ“‹ Step-by-Step Instructions

  1. Step 1: Create User Nodes
    Create user nodes with the Person label and properties.
    Required Syntax:
    CREATE (n:Person {name: 'Alice', age: 30})

    β€’ Label: Person
    β€’ Properties: name, age
    πŸ’‘ Tip: Labels categorize nodes (Person, Product, etc). Properties store data.
  2. Step 2: Create Relationships
    Create FOLLOWS relationship between users.
    Required Syntax:
    MATCH (a:Person {name:'Alice'}), (b:Person {name:'Bob'}) CREATE (a)-[:FOLLOWS]->(b)

    β€’ Relationship type: FOLLOWS
    β€’ Direction matters: (a)-[:REL]->(b)
    πŸ’‘ Tip: Relationships are typed and directional. Use MERGE to avoid duplicates.
  3. Step 3: MATCH Pattern Query
    Query the graph to find all people someone follows.
    Required Query:
    β€’ Must use: MATCH
    β€’ Pattern: (p:Person)-[:FOLLOWS]->(followed)
    β€’ Must use: RETURN
    πŸ’‘ Tip: MATCH finds patterns in the graph. Always end with RETURN.
  4. Step 4: Variable-Length Paths
    Find friends-of-friends using variable-length paths.
    Required Query:
    β€’ Must use: *1..2 or similar variable length
    β€’ Pattern: -[:FOLLOWS*1..2]->
    β€’ Find 2nd degree connections
    πŸ’‘ Tip: *1..2 means 1 to 2 hops. *..3 means up to 3 hops.
  5. Step 5: Aggregation Query
    Count followers per user using aggregation.
    Required Query:
    β€’ Must use: COUNT() aggregation
    β€’ Must use: ORDER BY
    β€’ Find most followed users
    πŸ’‘ Tip: count() aggregates per group. ORDER BY DESC for top results.
  6. Step 6: Shortest Path Algorithm
    Find the shortest path between two users.
    Required Query:
    β€’ Must use: shortestPath function
    β€’ Pattern: shortestPath((a)-[*]-(b))
    β€’ Find connection between two people
    πŸ’‘ Tip: shortestPath finds minimum hops. Use [*..10] to limit depth.
Neo4j Cypher Shell
neo4j://localhost:7687
Neo4j 5.15.0 Community Edition Connected to neo4j://localhost:7687 Database: neo4j neo4j> Type Cypher queries below.
neo4j>
Cypher Quick Reference
β€’ CREATE (n:Label {prop: val}) - Create node
β€’ (a)-[:REL]->(b) - Relationship pattern
β€’ MATCH (n) RETURN n - Query pattern
β€’ WHERE n.prop = val - Filter results
Progress: 0/6 tasks completed
Score: 0/100
0%

Lab Completed!

Excellent Neo4j mastery!