Watching students struggle with “I don’t know what I don’t know” inspired me to build a system that makes learning gaps visible and actionable. This post chronicles the journey from concept to working prototype of a GraphRAG-powered student progression tracker.

The Problem: Invisible Knowledge Gaps

Real student conversations:

Student A (struggling with Neural Networks):

“I watched all the lectures but nothing makes sense. I don’t even know what to ask.”

Student B (failing Data Structures exam):

“I thought I understood everything. The test had topics I’ve never seen before.”

Student C (switching careers to ML):

“There are so many resources. Where should I even start?”

The pattern:

Students don’t fail because they’re incapable. They fail because:

  1. Hidden prerequisites: “Neural Networks” requires linear algebra, calculus, probability
  2. Disconnected knowledge: Know individual concepts but can’t connect them
  3. No feedback loop: Realize gaps too late (during exams)
  4. One-size-fits-all: Same curriculum for students with different backgrounds

Key insight: Learning is a graph, not a list. We need tools that respect this.

The Vision

Build a system that:

  1. Maps knowledge as a graph: Concepts and their prerequisites
  2. Tracks student progress: What they’ve mastered, what they’re learning, what they struggled with
  3. Identifies gaps: Missing prerequisites preventing progress
  4. Recommends paths: Personalized learning sequences
  5. Adapts: Gets better as students use it

Technical challenge: How do we represent and reason about knowledge graphs at scale?

Answer: GraphRAG - combining knowledge graphs with LLM reasoning.

Journey: From Idea to Prototype

Phase 1: Manual Knowledge Graph (Week 1-2)

Started simple: hand-crafted graph for “Introduction to Machine Learning” course.

Tool: Neo4j + Cypher

// Create concepts
CREATE (lr:Concept {name: "Linear Regression", difficulty: 2})
CREATE (calc:Concept {name: "Calculus", difficulty: 3})
CREATE (linalg:Concept {name: "Linear Algebra", difficulty: 3})
CREATE (prob:Concept {name: "Probability", difficulty: 3})

// Create prerequisites
CREATE (calc)-[:PREREQUISITE_OF]->(lr)
CREATE (linalg)-[:PREREQUISITE_OF]->(lr)

// Create resources
CREATE (vid:Resource {
    title: "3Blue1Brown Linear Algebra",
    type: "video",
    url: "..."
})
CREATE (vid)-[:TEACHES]->(linalg)

Result: 50 concepts, 120 prerequisite relationships, 80 resources

Manual effort: ~20 hours

Learning: This won’t scale. Need automation.

Phase 2: Automated Graph Construction (Week 3-4)

Used LLMs to extract knowledge graphs from course materials.

Approach:

import json

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")

def extract_concepts(course_material: str) -> list[dict]:
    """Ask the LLM for concepts, prerequisites, and difficulty as JSON."""
    # Literal braces inside an f-string must be doubled ({{ and }})
    prompt = f"""
    Analyze this course material and extract:
    1. Key concepts students must learn
    2. Prerequisites for each concept
    3. Difficulty level (1-5)

    Course material:
    {course_material}

    Format as JSON:
    [
      {{
        "concept": "Linear Regression",
        "difficulty": 2,
        "prerequisites": ["Calculus", "Linear Algebra"],
        "description": "..."
      }}
    ]
    """

    response = llm.invoke(prompt)
    # Raises json.JSONDecodeError if the model returns anything but valid JSON
    return json.loads(response.content)

Input sources:

  • Course syllabi
  • Lecture slides
  • Textbook chapters
  • YouTube video transcripts

Result: 500+ concepts extracted automatically

Accuracy: 85% of extracted items were correct on manual review (15% needed correction)

Time saved: 100+ hours vs. manual approach
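
With roughly 15% of extracted items needing correction, it can help to check the model's JSON before anything reaches the graph. The following is a minimal sketch, assuming the output follows the JSON format requested in the prompt above. `validate_concepts` and its messages are hypothetical and not part of the original pipeline.

```python
import json


def validate_concepts(raw: str) -> tuple[list[dict], list[str]]:
    """Split LLM output into usable records and a list of problems.

    Hypothetical helper; assumes the JSON shape requested in the prompt.
    """
    try:
        records = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(records, list):
        return [], ["top-level value must be a list"]

    valid, problems = [], []
    for index, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {index}: not an object")
            continue
        name = rec.get("concept")
        difficulty = rec.get("difficulty")
        prereqs = rec.get("prerequisites", [])
        prereqs_ok = isinstance(prereqs, list) and all(
            isinstance(p, str) for p in prereqs
        )
        if not isinstance(name, str) or not name.strip():
            problems.append(f"record {index}: missing concept name")
        elif type(difficulty) is not int or not 1 <= difficulty <= 5:
            problems.append(f"{name}: difficulty must be an integer from 1 to 5")
        elif not prereqs_ok or name in prereqs:
            problems.append(f"{name}: prerequisites must be a list of other concepts")
        else:
            valid.append(rec)

    # Prerequisites that no valid record defines may be missing concepts or
    # invented names, so surface them for human review instead of guessing.
    known = {rec["concept"] for rec in valid}
    for rec in valid:
        for prereq in rec.get("prerequisites", []):
            if prereq not in known:
                problems.append(f"{rec['concept']}: undefined prerequisite {prereq!r}")
    return valid, problems


raw = '[{"concept": "Linear Regression", "difficulty": 2, "prerequisites": ["Calculus"]}]'
valid, problems = validate_concepts(raw)
print(len(valid), problems)
```

Records that pass still need the manual review described above; the check only catches structural problems, not wrong prerequisites.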

Phase 3: Student Progress Tracking (Week 5-6)

Built system to track student interactions with concepts.

Data collected:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class StudentInteraction:
    student_id: str
    concept: str
    interaction_type: str  # watched_video, read_article, solved_problem, asked_question
    timestamp: datetime
    performance: Optional[float]  # for assessments
    time_spent: int  # seconds
    difficulty_rating: Optional[int]  # student's self-report

Mastery calculation:

from statistics import mean

def calculate_mastery(student_id: str, concept: str) -> float:
    """Combine four weighted signals into a mastery score from 0.0 to 1.0."""
    interactions = get_interactions(student_id, concept)

    if not interactions:
        return 0.0

    # Factors:
    # 1. Assessment scores (40%); attempts without a recorded score are skipped
    scores = [
        i.performance for i in interactions
        if i.interaction_type == "solved_problem" and i.performance is not None
    ]
    avg_score = mean(scores) if scores else 0.0

    # 2. Time spent (20%)
    total_time = sum(i.time_spent for i in interactions)
    time_score = min(total_time / EXPECTED_TIME[concept], 1.0)

    # 3. Spaced repetition (20%)
    days_active = count_unique_days(interactions)
    repetition_score = min(days_active / 7, 1.0)  # ideal: 7+ days

    # 4. Application in later concepts (20%), capped so mastery stays <= 1.0
    application_score = min(count_uses_in_later_concepts(student_id, concept) / 5, 1.0)

    mastery = (
        0.4 * avg_score +
        0.2 * time_score +
        0.2 * repetition_score +
        0.2 * application_score
    )

    return mastery

Key insight: Mastery isn’t binary. It’s a continuous score that evolves.
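
To make the weighting concrete, here is a worked example with made-up numbers. It is a sketch of the same formula as a pure function. `weighted_mastery` and the sample inputs are hypothetical, and the application score is capped at 1.0 as an assumption so the result stays between 0 and 1.

```python
def weighted_mastery(avg_score: float, total_time: float, expected_time: float,
                     days_active: int, later_uses: int) -> float:
    """Same 40/20/20/20 weighting as calculate_mastery, without data access."""
    time_score = min(total_time / expected_time, 1.0)
    repetition_score = min(days_active / 7, 1.0)
    application_score = min(later_uses / 5, 1.0)  # cap is an assumption
    return (
        0.4 * avg_score
        + 0.2 * time_score
        + 0.2 * repetition_score
        + 0.2 * application_score
    )


# Hypothetical student: 80% average on problems, 90 of an expected 120
# minutes spent, active on 3 separate days, concept reused once later on.
mastery = weighted_mastery(
    avg_score=0.8, total_time=5400, expected_time=7200, days_active=3, later_uses=1
)
print(round(mastery, 3))
```

The four terms contribute 0.32, 0.15, about 0.086, and 0.04, so this student lands just under 0.6 despite solid quiz scores.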

Phase 4: Gap Identification (Week 7-8)

The core innovation: using graph algorithms to find learning gaps.

Algorithm:

def identify_gaps(student_id: str, current_concept: str):
    # Get all prerequisites (recursive)
    prerequisites = get_all_prerequisites(current_concept)

    gaps = []
    for prereq in prerequisites:
        mastery = calculate_mastery(student_id, prereq)

        if mastery < 0.7:  # Threshold for "mastery"
            gaps.append({
                'concept': prereq,
                'current_mastery': mastery,
                'required_mastery': 0.7,
                'gap': 0.7 - mastery,
                'path_to_current': find_path(prereq, current_concept)
            })

    # Prioritize gaps
    gaps.sort(key=lambda g: (
        -len(g['path_to_current']),  # Deeper prerequisites first
        -g['gap']  # Larger gaps first
    ))

    return gaps
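
The two graph helpers used above, `get_all_prerequisites` and `find_path`, are not shown in this post. One way they could work, sketched against a plain dictionary instead of Neo4j: the `PREREQS` data below is a hypothetical stand-in built from the example output in this section, and both implementations are assumptions rather than the system's actual code.

```python
from collections import deque

# Hypothetical in-memory stand-in for the graph database:
# each concept maps to its direct prerequisites.
PREREQS = {
    "Neural Networks": ["Linear Regression", "Backpropagation"],
    "Linear Regression": ["Linear Algebra"],
    "Linear Algebra": ["Matrix Multiplication"],
    "Backpropagation": ["Calculus", "Chain Rule"],
    "Calculus": ["Partial Derivatives"],
}


def get_all_prerequisites(concept, graph=PREREQS):
    """Return every direct and transitive prerequisite of `concept`."""
    seen = set()
    stack = list(graph.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # also guards against accidental cycles
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def find_path(prereq, target, graph=PREREQS):
    """Shortest chain from `prereq` up to `target`, or [] if none exists."""
    # Breadth-first search from the target down through its prerequisites,
    # remembering which concept each prerequisite was reached from.
    reached_from = {target: None}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        if current == prereq:
            path = []
            while current is not None:
                path.append(current)
                current = reached_from[current]
            return path
        for parent in graph.get(current, []):
            if parent not in reached_from:
                reached_from[parent] = current
                queue.append(parent)
    return []


print(" → ".join(find_path("Chain Rule", "Neural Networks")))
```

In a Neo4j deployment the same traversal would more likely be a variable-length Cypher match; the dictionary version just makes the logic easy to test.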

Example output:

Student: Alice
Current goal: Neural Networks

Identified gaps:
1. Matrix Multiplication (mastery: 0.4, required: 0.7)
   Path: Matrix Mult → Linear Algebra → Linear Regression → Neural Networks
   Reason: Foundation for understanding weight matrices

2. Partial Derivatives (mastery: 0.5, required: 0.7)
   Path: Partial Derivatives → Calculus → Backpropagation → Neural Networks
   Reason: Essential for understanding gradient descent

3. Chain Rule (mastery: 0.6, required: 0.7)
   Path: Chain Rule → Backpropagation → Neural Networks
   Reason: Core concept in backpropagation

User feedback: “This is exactly what I needed! I didn’t realize matrix multiplication was my blocker.”

Phase 5: GraphRAG Integration (Week 9-12)

Added LLM reasoning on top of the knowledge graph.

Use case: Context-aware Q&A

Student question: “Why isn’t my neural network learning?”

Without GraphRAG: Generic answer about learning rates, overfitting, etc.

With GraphRAG:

def answer_question(student_id: str, question: str, current_concept: str = "Neural Networks"):
    # 1. Get student context from graph
    student_context = {
        'mastered': get_mastered_concepts(student_id),
        'in_progress': get_current_concepts(student_id),
        'struggled_with': get_struggled_concepts(student_id),
        'gaps': identify_gaps(student_id, current_concept=current_concept)
    }

    # 2. Retrieve relevant concepts
    relevant_concepts = semantic_search(question, knowledge_graph)

    # 3. Build context-aware prompt
    prompt = f"""
    Student question: {question}

    Student's background:
    - Mastered: {student_context['mastered']}
    - Currently learning: {student_context['in_progress']}
    - Previously struggled with: {student_context['struggled_with']}
    - Current gaps: {student_context['gaps']}

    Relevant concepts:
    {format_concepts(relevant_concepts)}

    Provide an answer that:
    1. Matches the student's knowledge level
    2. Addresses their specific gaps
    3. Uses concepts they've already mastered
    4. Suggests next steps for improvement
    """

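    # Chat models return a message object; completion models return a plain string
    response = llm.invoke(prompt)
    return getattr(response, "content", response)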
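
The `semantic_search` call in step 2 is not defined in this post. As a dependency-free stand-in, the sketch below ranks concepts by bag-of-words cosine similarity rather than the embedding search a vector store would provide. The `concepts` dictionary, the `top_k` parameter, and the function signature are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a crude substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_search(question: str, concepts: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the names of the `top_k` concepts most similar to the question."""
    query = bag_of_words(question)
    scored = [
        (cosine(query, bag_of_words(f"{name} {description}")), name)
        for name, description in concepts.items()
    ]
    # Drop concepts with no overlap, then sort by score (ties by name).
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical concept descriptions
concepts = {
    "Backpropagation": "how a neural network learns by propagating gradients",
    "Docker": "containers for packaging software",
    "Learning Rate": "step size used when a network is learning",
}
print(semantic_search("Why isn't my neural network learning?", concepts))
```

Word overlap misses synonyms and paraphrases, which is the main reason to use embeddings in a real deployment.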

Example response:

Based on your recent work, I notice you’ve mastered linear regression but struggled with matrix operations (mastery: 0.4). Neural networks heavily rely on matrix multiplication for computing weighted sums.

Your “network not learning” issue might be because:

  1. You’re implementing backpropagation, which requires the chain rule (your current gap)
  2. Without solid understanding of matrix operations, the weight updates might be incorrect

Recommended path:

  1. First, strengthen matrix multiplication (est. 3 hours) → Resource: [3Blue1Brown Matrix Visualization]
  2. Then, review the chain rule (est. 2 hours) → Resource: [Khan Academy Chain Rule]
  3. Finally, return to backpropagation with this foundation

Once you’re comfortable with these prerequisites, your neural network debugging will be much easier!

Impact: Students reported this felt like “having a personal tutor who knows my learning history.”

Phase 6: Personalized Learning Paths (Week 13-16)

Generated customized learning sequences based on:

  1. Student’s current knowledge
  2. Learning goals
  3. Available time
  4. Learning style preferences

Pathfinding algorithm:

def generate_learning_path(
    student_id: str,
    goal_concept: str,
    available_hours: int,
    learning_style: str  # "visual", "hands-on", "theoretical"
):
    # 1. Identify all required concepts, including the goal itself
    required = set(get_all_prerequisites(goal_concept)) | {goal_concept}

    # 2. Filter out already mastered (set difference)
    mastered = set(get_mastered_concepts(student_id))
    to_learn = required - mastered

    # 3. Build dependency graph
    subgraph = knowledge_graph.subgraph(to_learn)

    # 4. Topological sort (prerequisite order)
    ordered = topological_sort(subgraph)

    # 5. Estimate time for each concept
    estimated_times = {}
    for concept in ordered:
        base_time = CONCEPT_TIMES[concept]
        # Adjust for student's learning speed (based on history)
        speed_factor = get_learning_speed(student_id, concept_type(concept))
        estimated_times[concept] = base_time * speed_factor

    # 6. Select concepts that fit time budget
    selected = []
    total_time = 0
    for concept in ordered:
        if total_time + estimated_times[concept] <= available_hours:
            selected.append(concept)
            total_time += estimated_times[concept]
        else:
            break  # Time budget exhausted

    # 7. Select resources matching learning style
    path = []
    for concept in selected:
        resources = get_resources(concept, style=learning_style)
        path.append({
            'concept': concept,
            'estimated_time': estimated_times[concept],
            'resources': resources[:3]  # Top 3
        })

    return path, total_time
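
Steps 4 and 6 carry most of the logic, so here they are in isolation. This is a sketch using `graphlib.TopologicalSorter` from the Python standard library (3.9+). The prerequisite edges and hour estimates are hypothetical values loosely based on the example output below, and `order_within_budget` is an illustrative name.

```python
from graphlib import TopologicalSorter


def order_within_budget(prereqs: dict[str, set[str]], hours: dict[str, float],
                        available_hours: float) -> tuple[list[str], float]:
    """Order concepts so prerequisites come first, then cut at the time budget."""
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the prerequisites are circular.
    ordered = list(TopologicalSorter(prereqs).static_order())

    selected, total = [], 0.0
    for concept in ordered:
        if total + hours[concept] > available_hours:
            break  # stop here: later concepts may depend on this one
        selected.append(concept)
        total += hours[concept]
    return selected, total


# Hypothetical dependency data
prereqs = {
    "Collaborative Filtering": {"Matrix Multiplication", "Cosine Similarity"},
    "Matrix Factorization": {"Matrix Multiplication", "Collaborative Filtering"},
}
hours = {
    "Matrix Multiplication": 3,
    "Cosine Similarity": 2,
    "Collaborative Filtering": 5,
    "Matrix Factorization": 4,
}
path, total = order_within_budget(prereqs, hours, available_hours=20)
print(path, total)
```

Stopping at the first concept that does not fit is conservative. A looser variant could keep scanning for shorter concepts whose prerequisites are already selected.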

Example output:

Learning path to "Build a Recommender System"
Student: Bob
Available time: 20 hours
Learning style: Hands-on

Week 1 (10 hours):
├─ Matrix Multiplication (3 hours)
│  └─ [Coding Exercise] Implement matrix multiply from scratch
│  └─ [Video] 3Blue1Brown - Visualizing matrix operations
│
├─ Cosine Similarity (2 hours)
│  └─ [Interactive] Similarity playground
│  └─ [Project] Calculate document similarity
│
└─ Collaborative Filtering (5 hours)
   └─ [Tutorial] Build movie recommender step-by-step
   └─ [Dataset] MovieLens 100K

Week 2 (10 hours):
├─ Matrix Factorization (4 hours)
│  └─ [Coding Challenge] Implement SVD
│  └─ [Paper] Netflix Prize approach (simplified)
│
└─ Final Project (6 hours)
   └─ [Capstone] Build end-to-end recommender system
   └─ [Dataset] Your choice (books, music, products)

Total: 20 hours | Concepts: 5 | Projects: 3

Student feedback: “This is exactly my pace. The projects make it stick!”

Technical Architecture

System components:

┌─────────────────────────────────────────────────────────────┐
│                    Student Interface                         │
│                  (Web App - React)                           │
└──────────────────────┬──────────────────────────────────────┘
                       │ REST API
┌──────────────────────┴──────────────────────────────────────┐
│                    Backend (FastAPI)                         │
│  ┌──────────────┬────────────────┬─────────────────────────┐│
│  │ Progress     │ Gap Analysis   │ Path Generation         ││
│  │ Tracker      │ Engine         │ Engine                  ││
│  └──────────────┴────────────────┴─────────────────────────┘│
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                 GraphRAG Layer                               │
│  ┌─────────────────────┬─────────────────────────────────┐  │
│  │ LLM (GPT-4/Claude)  │ Vector Store (Chroma)           │  │
│  └─────────────────────┴─────────────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│              Knowledge Graph (Neo4j)                         │
│  Nodes: Concepts, Students, Resources                       │
│  Edges: PREREQUISITE_OF, MASTERED, TEACHES, etc.            │
└──────────────────────────────────────────────────────────────┘

Tech stack:

  • Frontend: React + D3.js (knowledge graph visualization)
  • Backend: FastAPI (Python)
  • Graph DB: Neo4j
  • Vector DB: Chroma (for semantic search)
  • LLM: GPT-4 (analysis), Claude (conversations)
  • Deployment: Docker + Kubernetes

Real-World Results

Pilot study: 150 students, 12-week Data Science bootcamp

Metrics:

Metric                     Before            With System       Improvement
Completion rate            62%               90%               +45%
Average time to complete   14 weeks          10.5 weeks        -25%
Final project quality      3.2/5             4.3/5             +34%
Student satisfaction       3.8/5             4.7/5             +24%
Instructor intervention    3.2 hrs/student   1.3 hrs/student   -59%
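
The Improvement column is a relative change, (after - before) / before, not a difference in percentage points. For example, the completion rate rose by 28 points, which is a 45% relative gain. A short check of the arithmetic, using only the figures from the table:

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change relative to the starting value."""
    return (after - before) / before * 100


metrics = {
    "Completion rate": (62, 90),
    "Average time to complete": (14, 10.5),
    "Final project quality": (3.2, 4.3),
    "Student satisfaction": (3.8, 4.7),
    "Instructor intervention": (3.2, 1.3),
}
for name, (before, after) in metrics.items():
    print(f"{name}: {relative_change(before, after):+.0f}%")
```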

Qualitative feedback:

“For the first time, I could SEE what I needed to learn. The path was clear.” - Sarah, Career Switcher

“The gap identification saved me weeks. I was about to waste time on advanced topics when I had basic gaps.” - Mike, CS Student

“My students are more self-directed now. They know what to work on without me constantly redirecting them.” - Dr. Chen, Instructor

Surprising insights:

  1. Students underestimate prerequisites: 73% had gaps they weren’t aware of
  2. Visual learners benefit most: 40% improvement for students with a visual learning style
  3. Spaced repetition works: Students who returned to a concept on 7+ separate days showed 2x retention
  4. Peer learning emerged: Students started sharing their knowledge graphs and helping each other close gaps

Challenges & Lessons

Challenge 1: Knowledge Graph Quality

Problem: LLM-extracted prerequisites weren’t always accurate

Example: Claimed “Docker” was prerequisite for “Neural Networks” (wrong!)

Solution:

  • Human review of critical paths
  • Community validation (instructors + students vote)
  • Confidence scores on relationships
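
One way votes and confidence scores could combine is to treat the LLM's own confidence as a handful of pseudo-votes that real votes gradually outweigh. This is a sketch under that assumption. The formula, the `prior_weight` of 5, the 0.5 cutoff, and the sample numbers are all hypothetical, not the system's actual scoring.

```python
def edge_confidence(llm_confidence: float, upvotes: int, downvotes: int,
                    prior_weight: float = 5.0) -> float:
    """Smoothed share of support for a prerequisite edge.

    The LLM's confidence counts as `prior_weight` pseudo-votes (must be > 0),
    so a brand-new edge starts at llm_confidence and moves as votes arrive.
    """
    return (llm_confidence * prior_weight + upvotes) / (
        prior_weight + upvotes + downvotes
    )


# (prerequisite, concept, LLM confidence, upvotes, downvotes)
edges = [
    ("Linear Algebra", "Neural Networks", 0.9, 12, 1),
    ("Docker", "Neural Networks", 0.6, 0, 8),
]
trusted = [
    (src, dst)
    for src, dst, conf, up, down in edges
    if edge_confidence(conf, up, down) >= 0.5
]
print(trusted)
```

With these numbers the Docker edge drops to about 0.23 after eight downvotes and falls out of the trusted set, while the Linear Algebra edge stays above 0.9.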

Challenge 2: Mastery Calculation

Problem: How do you measure “understanding”?

Initial approach: Quiz scores only

Issue: Students could game with memorization

Final approach: Multi-signal mastery, with weights revised from the Phase 3 formula:

  • Assessment scores (40%)
  • Application in projects (30%)
  • Spaced repetition (20%)
  • Peer teaching (10%)

Challenge 3: Over-reliance Risk

Problem: Students might blindly follow the system

Solution:

  • Explanation for every recommendation
  • Allow students to override/customize paths
  • Encourage exploration beyond recommended path
  • Regular reflection prompts

Challenge 4: Cold Start

Problem: New students have no interaction history

Solution:

  • Initial placement assessment (20 min)
  • Survey of prior knowledge
  • First few interactions carefully monitored
  • Rapid convergence (accurate within 5 hours of use)
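
A simple way to picture the hand-off from placement data to observed behavior is a linear blend that shifts weight toward observed mastery as usage accumulates. This is only a sketch. The linear schedule and the function name are assumptions, with the 5-hour convergence window taken from the figure above.

```python
def blended_mastery(placement_score: float, observed_mastery: float,
                    hours_of_use: float, convergence_hours: float = 5.0) -> float:
    """Shift from the placement estimate to observed mastery over time."""
    # Weight on observed data grows linearly from 0 to 1 over the window.
    weight = min(max(hours_of_use / convergence_hours, 0.0), 1.0)
    return (1 - weight) * placement_score + weight * observed_mastery


# Hypothetical student: placement test suggests 0.8, early interactions 0.4
for hours in (0, 2.5, 5, 10):
    print(hours, round(blended_mastery(0.8, 0.4, hours), 2))
```

At zero hours the estimate is the placement score alone, and from five hours onward it relies entirely on tracked interactions.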

Future Directions

Short-term (3-6 months):

  1. Multi-modal learning: Incorporate videos, interactive exercises, quizzes
  2. Collaborative filtering: “Students like you also learned…”
  3. Mobile app: On-the-go learning tracking
  4. Integration with LMS: Canvas, Moodle, Blackboard

Medium-term (6-12 months):

  1. Peer learning graph: Connect students with complementary knowledge
  2. Instructor dashboard: Class-wide gap analysis
  3. Adaptive assessments: Questions targeting individual gaps
  4. Career pathways: Map knowledge to job requirements

Long-term (1-2 years):

  1. Cross-domain transfer: Identify transferable knowledge across fields
  2. Lifelong learning: Track progression across years, multiple courses
  3. AI tutor: Fully automated personalized tutoring
  4. Open knowledge graph: Community-built, Wikipedia for learning paths

Open Questions

  1. Optimal graph granularity: How detailed should concept nodes be?
  2. Mastery threshold: Is 0.7 the right bar for “mastery”?
  3. Learning styles: Do they actually matter, or is it a myth?
  4. Motivation: How to keep students engaged with long learning paths?
  5. Privacy: How much student data is too much?

Conclusion

Building this system taught me that learning is fundamentally a graph problem. Linear curricula force students into paths that don’t match their knowledge.

Key insights:

  1. Gaps are invisible: Students don’t know what they don’t know
  2. Graphs reveal structure: Prerequisites make implicit knowledge explicit
  3. Personalization scales: AI + graphs enable individual learning paths
  4. Data drives improvement: The system gets better as students use it

My hope: Every student should have a personalized knowledge graph. Not in 10 years – now.

The technology exists. The pedagogy is sound. We just need to build it.


Want to collaborate? Reach out! Looking for:

  • Educators to pilot in your courses
  • Developers to contribute to open-source project
  • Researchers interested in learning analytics

Have you experienced invisible knowledge gaps? Share your stories!