Watching students struggle with “I don’t know what I don’t know” inspired me to build a system that makes learning gaps visible and actionable. This post chronicles the journey from concept to working prototype of a GraphRAG-powered student progression tracker.
The Problem: Invisible Knowledge Gaps
Real student conversations:
Student A (struggling with Neural Networks):
“I watched all the lectures but nothing makes sense. I don’t even know what to ask.”
Student B (failing Data Structures exam):
“I thought I understood everything. The test had topics I’ve never seen before.”
Student C (switching careers to ML):
“There are so many resources. Where should I even start?”
The pattern:
Students don’t fail because they’re incapable. They fail because:
- Hidden prerequisites: “Neural Networks” requires linear algebra, calculus, probability
- Disconnected knowledge: Know individual concepts but can’t connect them
- No feedback loop: Realize gaps too late (during exams)
- One-size-fits-all: Same curriculum for students with different backgrounds
Key insight: Learning is a graph, not a list. We need tools that respect this.
The Vision
Build a system that:
- Maps knowledge as a graph: Concepts and their prerequisites
- Tracks student progress: What they’ve mastered, what they’re learning, what they struggled with
- Identifies gaps: Missing prerequisites preventing progress
- Recommends paths: Personalized learning sequences
- Adapts: Gets better as students use it
Technical challenge: How do we represent and reason about knowledge graphs at scale?
Answer: GraphRAG - combining knowledge graphs with LLM reasoning.
Journey: From Idea to Prototype
Phase 1: Manual Knowledge Graph (Week 1-2)
Started simple: a hand-crafted graph for an “Introduction to Machine Learning” course.
Tool: Neo4j + Cypher
// Create concepts
CREATE (lr:Concept {name: "Linear Regression", difficulty: 2})
CREATE (calc:Concept {name: "Calculus", difficulty: 3})
CREATE (linalg:Concept {name: "Linear Algebra", difficulty: 3})
CREATE (prob:Concept {name: "Probability", difficulty: 3})
// Create prerequisites
CREATE (calc)-[:PREREQUISITE_OF]->(lr)
CREATE (linalg)-[:PREREQUISITE_OF]->(lr)
// Create resources
CREATE (vid:Resource {
  title: "3Blue1Brown Linear Algebra",
  type: "video",
  url: "..."
})
CREATE (vid)-[:TEACHES]->(linalg)
Result: 50 concepts, 120 prerequisite relationships, 80 resources
Manual effort: ~20 hours
Learning: This won’t scale. Need automation.
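To make the graph idea concrete outside Neo4j, here is a minimal in-memory sketch of the same prerequisite edges. The `PREREQS` dictionary and the `all_prerequisites` helper are illustrative names, not part of the prototype, and the "Matrix Multiplication" edge is borrowed from the gap example later in this post to show a two-level chain.

```python
from collections import deque

# Maps each concept to its direct prerequisites (illustrative data only).
PREREQS: dict[str, list[str]] = {
    "Linear Regression": ["Calculus", "Linear Algebra"],
    "Linear Algebra": ["Matrix Multiplication"],
    "Calculus": [],
    "Matrix Multiplication": [],
}


def all_prerequisites(concept: str, prereqs: dict[str, list[str]]) -> set[str]:
    """Return every direct and indirect prerequisite of `concept`."""
    seen: set[str] = set()
    queue = deque(prereqs.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(prereqs.get(current, []))
    return seen


print(sorted(all_prerequisites("Linear Regression", PREREQS)))
```

The traversal is the in-memory analogue of following `PREREQUISITE_OF` edges backwards from a concept until no new nodes appear.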
Phase 2: Automated Graph Construction (Week 3-4)
Used LLMs to extract knowledge graphs from course materials.
Approach:
import json

from langchain_openai import ChatOpenAI

# gpt-4 is a chat model, so it is wrapped in a chat client
llm = ChatOpenAI(model="gpt-4")

def extract_concepts(course_material: str) -> list[dict]:
    # Literal braces in the JSON example are doubled so the f-string keeps them
    prompt = f"""
Analyze this course material and extract:
1. Key concepts students must learn
2. Prerequisites for each concept
3. Difficulty level (1-5)

Course material:
{course_material}

Format as JSON:
[
  {{
    "concept": "Linear Regression",
    "difficulty": 2,
    "prerequisites": ["Calculus", "Linear Algebra"],
    "description": "..."
  }}
]
"""
    # invoke() returns a chat message; the JSON text is in .content
    response = llm.invoke(prompt)
    return json.loads(response.content)
Input sources:
- Course syllabi
- Lecture slides
- Textbook chapters
- YouTube video transcripts
Result: 500+ concepts extracted automatically
Accuracy: 85% as judged by manual review (15% of extractions needed correction)
Time saved: 100+ hours vs. manual approach
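Since roughly one in seven extractions needed correction, it helps to reject malformed output before it reaches the graph. The sketch below is one possible pre-check, not the prototype's actual pipeline; `validate_extraction` is a hypothetical helper, and it only assumes the JSON shape requested in the prompt above.

```python
import json


def validate_extraction(raw: str) -> tuple[list[dict], list[str]]:
    """Parse LLM output and separate usable concepts from problems."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(items, list):
        return [], ["top-level value is not a list"]

    valid: list[dict] = []
    problems: list[str] = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not an object")
            continue
        name = item.get("concept")
        difficulty = item.get("difficulty")
        prereqs = item.get("prerequisites", [])
        if not isinstance(name, str) or not name.strip():
            problems.append(f"item {index}: missing concept name")
        elif not isinstance(difficulty, int) or not 1 <= difficulty <= 5:
            problems.append(f"{name}: difficulty must be an integer from 1 to 5")
        elif not isinstance(prereqs, list) or name in prereqs:
            problems.append(f"{name}: bad or self-referential prerequisites")
        else:
            valid.append(item)
    return valid, problems
```

Entries that land in `problems` can be queued for the manual review step instead of being silently dropped.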
Phase 3: Student Progress Tracking (Week 5-6)
Built a system to track student interactions with concepts.
Data collected:
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class StudentInteraction:
    student_id: str
    concept: str
    interaction_type: str  # watched_video, read_article, solved_problem, asked_question
    timestamp: datetime
    performance: Optional[float]  # for assessments
    time_spent: int  # seconds
    difficulty_rating: Optional[int]  # student's self-report
Mastery calculation:
from statistics import mean

def calculate_mastery(student_id: str, concept: str) -> float:
    interactions = get_interactions(student_id, concept)
    if not interactions:
        return 0.0
    # Factors:
    # 1. Assessment scores (40%); skip interactions with no recorded score
    scores = [
        i.performance for i in interactions
        if i.interaction_type == "solved_problem" and i.performance is not None
    ]
    avg_score = mean(scores) if scores else 0.0
    # 2. Time spent (20%)
    total_time = sum(i.time_spent for i in interactions)
    time_score = min(total_time / EXPECTED_TIME[concept], 1.0)
    # 3. Spaced repetition (20%)
    days_active = count_unique_days(interactions)
    repetition_score = min(days_active / 7, 1.0)  # ideal: 7+ days
    # 4. Application in later concepts (20%), capped so mastery stays in [0, 1]
    application_score = min(count_uses_in_later_concepts(student_id, concept) / 5, 1.0)
    mastery = (
        0.4 * avg_score
        + 0.2 * time_score
        + 0.2 * repetition_score
        + 0.2 * application_score
    )
    return mastery
Key insight: Mastery isn’t binary. It’s a continuous score that evolves.
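A small worked example shows how the four signals combine. This is a sketch using the 40/20/20/20 weights from the formula above; the `weighted_mastery` helper and the sample student values are made up for illustration.

```python
# Weights from the Phase 3 formula above.
WEIGHTS = {"assessment": 0.4, "time": 0.2, "repetition": 0.2, "application": 0.2}


def weighted_mastery(signals: dict[str, float]) -> float:
    """Combine per-signal scores (each clamped to [0, 1]) into one mastery score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * score
    return total


# Hypothetical student: 75% average on problems, half the expected time,
# active on 3 of the ideal 7 days, concept reused in 1 of 5 later concepts.
example = {"assessment": 0.75, "time": 0.5, "repetition": 3 / 7, "application": 1 / 5}
print(round(weighted_mastery(example), 3))
```

For this student the terms are 0.30 + 0.10 + 0.086 + 0.04, so the score comes out near 0.53: decent quiz results alone are not enough to reach a 0.7 bar.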
Phase 4: Gap Identification (Week 7-8)
The core innovation: using graph algorithms to find learning gaps.
Algorithm:
def identify_gaps(student_id: str, current_concept: str):
    # Get all prerequisites (recursive)
    prerequisites = get_all_prerequisites(current_concept)
    gaps = []
    for prereq in prerequisites:
        mastery = calculate_mastery(student_id, prereq)
        if mastery < 0.7:  # Threshold for "mastery"
            gaps.append({
                'concept': prereq,
                'current_mastery': mastery,
                'required_mastery': 0.7,
                'gap': 0.7 - mastery,
                'path_to_current': find_path(prereq, current_concept)
            })
    # Prioritize gaps
    gaps.sort(key=lambda g: (
        -len(g['path_to_current']),  # Deeper prerequisites first
        -g['gap']  # Larger gaps first
    ))
    return gaps
Example output:
Student: Alice
Current goal: Neural Networks
Identified gaps:
1. Matrix Multiplication (mastery: 0.4, required: 0.7)
   Path: Matrix Multiplication → Linear Algebra → Linear Regression → Neural Networks
   Reason: Foundation for understanding weight matrices
2. Partial Derivatives (mastery: 0.5, required: 0.7)
   Path: Partial Derivatives → Calculus → Backpropagation → Neural Networks
   Reason: Essential for understanding gradient descent
3. Chain Rule (mastery: 0.6, required: 0.7)
   Path: Chain Rule → Backpropagation → Neural Networks
   Reason: Core concept in backpropagation
User feedback: “This is exactly what I needed! I didn’t realize matrix multiplication was my blocker.”
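The gap algorithm can be run end to end on a toy graph. The version below is a self-contained sketch, not the production code: the edges are inferred from the example paths above, the three gap scores match Alice's output, and the remaining mastery values are invented so that only those three concepts fall below the threshold.

```python
from collections import deque

# Concept -> direct prerequisites, inferred from the example paths above.
PREREQS = {
    "Neural Networks": ["Linear Regression", "Backpropagation"],
    "Linear Regression": ["Linear Algebra"],
    "Linear Algebra": ["Matrix Multiplication"],
    "Backpropagation": ["Calculus", "Chain Rule"],
    "Calculus": ["Partial Derivatives"],
}
# Mastery scores; values other than the three gaps are hypothetical.
MASTERY = {
    "Matrix Multiplication": 0.4, "Partial Derivatives": 0.5, "Chain Rule": 0.6,
    "Linear Algebra": 0.8, "Linear Regression": 0.9, "Calculus": 0.75,
    "Backpropagation": 0.7,
}
THRESHOLD = 0.7


def paths_to_prerequisites(goal: str) -> dict[str, list[str]]:
    """Breadth-first search from the goal; maps each prerequisite to its path to the goal."""
    paths = {goal: [goal]}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for prereq in PREREQS.get(node, []):
            if prereq not in paths:
                paths[prereq] = [prereq] + paths[node]
                queue.append(prereq)
    del paths[goal]
    return paths


def identify_gaps(goal: str) -> list[dict]:
    gaps = []
    for concept, path in paths_to_prerequisites(goal).items():
        mastery = MASTERY.get(concept, 0.0)
        if mastery < THRESHOLD:
            gaps.append({"concept": concept, "mastery": mastery, "path": path})
    # Deeper prerequisites first, then larger gaps (lower mastery).
    gaps.sort(key=lambda g: (-len(g["path"]), g["mastery"]))
    return gaps


for gap in identify_gaps("Neural Networks"):
    print(gap["concept"], gap["mastery"], " → ".join(gap["path"]))
```

With these inputs the ordering matches the example: Matrix Multiplication, then Partial Derivatives, then Chain Rule.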
Phase 5: GraphRAG Integration (Week 9-12)
Added LLM reasoning on top of the knowledge graph.
Use case: Context-aware Q&A
Student question: “Why isn’t my neural network learning?”
Without GraphRAG: Generic answer about learning rates, overfitting, etc.
With GraphRAG:
def answer_question(student_id: str, question: str, current_concept: str = 'Neural Networks'):
    # 1. Get student context from graph
    student_context = {
        'mastered': get_mastered_concepts(student_id),
        'in_progress': get_current_concepts(student_id),
        'struggled_with': get_struggled_concepts(student_id),
        'gaps': identify_gaps(student_id, current_concept)
    }
    # 2. Retrieve relevant concepts
    relevant_concepts = semantic_search(question, knowledge_graph)
    # 3. Build context-aware prompt
    prompt = f"""
    Student question: {question}
    Student's background:
    - Mastered: {student_context['mastered']}
    - Currently learning: {student_context['in_progress']}
    - Previously struggled with: {student_context['struggled_with']}
    - Current gaps: {student_context['gaps']}
    Relevant concepts:
    {format_concepts(relevant_concepts)}
    Provide an answer that:
    1. Matches the student's knowledge level
    2. Addresses their specific gaps
    3. Uses concepts they've already mastered
    4. Suggests next steps for improvement
    """
    # Assumes `llm` is a LangChain chat model; invoke() returns a message object
    answer = llm.invoke(prompt).content
    return answer
Example response:
Based on your recent work, I notice you’ve mastered linear regression but struggled with matrix operations (mastery: 0.4). Neural networks heavily rely on matrix multiplication for computing weighted sums.
Your “network not learning” issue might be because:
- You’re implementing backpropagation, which requires the chain rule (your current gap)
- Without solid understanding of matrix operations, the weight updates might be incorrect
Recommended path:
- First, strengthen matrix multiplication (est. 3 hours) → Resource: [3Blue1Brown Matrix Visualization]
- Then, review the chain rule (est. 2 hours) → Resource: [Khan Academy Chain Rule]
- Finally, return to backpropagation with this foundation
Once you’re comfortable with these prerequisites, your neural network debugging will be much easier!
Impact: Students reported this felt like “having a personal tutor who knows my learning history.”
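The retrieval step (`semantic_search`) is the part that picks which concepts go into the prompt. The prototype uses a vector store for this; the sketch below swaps in a plain bag-of-words cosine similarity so the idea can be run without any embedding model. The concept descriptions are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical concept descriptions; the real system would read these from the graph.
CONCEPTS = {
    "Backpropagation": "computing gradients of the loss through network layers with the chain rule",
    "Learning Rate": "step size used when gradient descent updates the weights during training",
    "Cosine Similarity": "angle based similarity between two vectors",
}


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(question: str, concepts: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank concepts by word-overlap similarity to the question (stand-in for embeddings)."""
    query = bag_of_words(question)
    scored = [
        (cosine(query, bag_of_words(name + " " + text)), name)
        for name, text in concepts.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


print(semantic_search("Why are my network weights not updating during training?", CONCEPTS))
```

Word overlap misses synonyms and paraphrases, which is the main reason to use embeddings in practice; the ranking and top-k selection work the same way either way.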
Phase 6: Personalized Learning Paths (Week 13-16)
Generated customized learning sequences based on:
- Student’s current knowledge
- Learning goals
- Available time
- Learning style preferences
Pathfinding algorithm:
def generate_learning_path(
    student_id: str,
    goal_concept: str,
    available_hours: int,
    learning_style: str  # "visual", "hands-on", "theoretical"
):
    # 1. Identify all required concepts
    required = set(get_all_prerequisites(goal_concept))
    # 2. Filter out already mastered (set difference)
    mastered = set(get_mastered_concepts(student_id))
    to_learn = required - mastered
    # 3. Build dependency graph
    subgraph = knowledge_graph.subgraph(to_learn)
    # 4. Topological sort (prerequisite order)
    ordered = topological_sort(subgraph)
    # 5. Estimate time for each concept
    estimated_times = {}
    for concept in ordered:
        base_time = CONCEPT_TIMES[concept]
        # Adjust for student's learning speed (based on history)
        speed_factor = get_learning_speed(student_id, concept_type(concept))
        estimated_times[concept] = base_time * speed_factor
    # 6. Select concepts that fit time budget
    selected = []
    total_time = 0
    for concept in ordered:
        if total_time + estimated_times[concept] <= available_hours:
            selected.append(concept)
            total_time += estimated_times[concept]
        else:
            break  # Time budget exhausted; later concepts may depend on this one
    # 7. Select resources matching learning style
    path = []
    for concept in selected:
        resources = get_resources(concept, style=learning_style)
        path.append({
            'concept': concept,
            'estimated_time': estimated_times[concept],
            'resources': resources[:3]  # Top 3
        })
    return path, total_time
Example output:
Learning path to "Build a Recommender System"
Student: Bob
Available time: 20 hours
Learning style: Hands-on

Week 1 (10 hours):
├─ Matrix Multiplication (3 hours)
│  ├─ [Coding Exercise] Implement matrix multiply from scratch
│  └─ [Video] 3Blue1Brown - Visualizing matrix operations
│
├─ Cosine Similarity (2 hours)
│  ├─ [Interactive] Similarity playground
│  └─ [Project] Calculate document similarity
│
└─ Collaborative Filtering (5 hours)
   ├─ [Tutorial] Build movie recommender step-by-step
   └─ [Dataset] MovieLens 100K

Week 2 (10 hours):
├─ Matrix Factorization (4 hours)
│  ├─ [Coding Challenge] Implement SVD
│  └─ [Paper] Netflix Prize approach (simplified)
│
└─ Final Project (6 hours)
   ├─ [Capstone] Build end-to-end recommender system
   └─ [Dataset] Your choice (books, music, products)

Total: 20 hours | Concepts: 5 | Projects: 3
Student feedback: “This is exactly my pace. The projects make it stick!”
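The ordering and time-budget steps of the pathfinding algorithm can be tried with Python's standard-library `graphlib`. This is a simplified sketch: the prerequisite edges between Bob's concepts are assumed for illustration, the hours come from the example above, and the learning-speed and resource-selection steps are left out.

```python
from graphlib import TopologicalSorter

# Concept -> direct prerequisites (hypothetical edges for the example above).
PREREQS = {
    "Matrix Multiplication": set(),
    "Cosine Similarity": {"Matrix Multiplication"},
    "Collaborative Filtering": {"Cosine Similarity"},
    "Matrix Factorization": {"Matrix Multiplication", "Collaborative Filtering"},
}
HOURS = {
    "Matrix Multiplication": 3,
    "Cosine Similarity": 2,
    "Collaborative Filtering": 5,
    "Matrix Factorization": 4,
}


def plan(mastered: set[str], budget_hours: float) -> tuple[list[str], float]:
    """Order unmastered concepts by prerequisite and keep those that fit the budget."""
    remaining = {
        concept: deps - mastered
        for concept, deps in PREREQS.items()
        if concept not in mastered
    }
    selected: list[str] = []
    used = 0.0
    for concept in TopologicalSorter(remaining).static_order():
        if used + HOURS[concept] > budget_hours:
            break  # later concepts may depend on this one, so stop here
        selected.append(concept)
        used += HOURS[concept]
    return selected, used


print(plan(mastered=set(), budget_hours=6))
```

Stopping at the first concept that does not fit, rather than skipping it, keeps the path valid: a skipped concept could be a prerequisite of something scheduled later.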
Technical Architecture
System components:
┌─────────────────────────────────────────────────────────────┐
│                      Student Interface                      │
│                      (Web App - React)                      │
└──────────────────────┬──────────────────────────────────────┘
                       │ REST API
┌──────────────────────┴──────────────────────────────────────┐
│                      Backend (FastAPI)                      │
│  ┌──────────────┬────────────────┬─────────────────────────┐│
│  │ Progress     │ Gap Analysis   │ Path Generation         ││
│  │ Tracker      │ Engine         │ Engine                  ││
│  └──────────────┴────────────────┴─────────────────────────┘│
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                       GraphRAG Layer                        │
│  ┌─────────────────────┬─────────────────────────────────┐  │
│  │ LLM (GPT-4/Claude)  │ Vector Store (Chroma)           │  │
│  └─────────────────────┴─────────────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                   Knowledge Graph (Neo4j)                   │
│            Nodes: Concepts, Students, Resources             │
│       Edges: PREREQUISITE_OF, MASTERED, TEACHES, etc.       │
└─────────────────────────────────────────────────────────────┘
Tech stack:
- Frontend: React + D3.js (knowledge graph visualization)
- Backend: FastAPI (Python)
- Graph DB: Neo4j
- Vector DB: Chroma (for semantic search)
- LLM: GPT-4 (analysis), Claude (conversations)
- Deployment: Docker + Kubernetes
Real-World Results
Pilot study: 150 students, 12-week Data Science bootcamp
Metrics:
| Metric | Before | With System | Improvement |
|---|---|---|---|
| Completion rate | 62% | 90% | +45% |
| Average time to complete | 14 weeks | 10.5 weeks | -25% |
| Final project quality | 3.2/5 | 4.3/5 | +34% |
| Student satisfaction | 3.8/5 | 4.7/5 | +24% |
| Instructor intervention | 3.2 hrs/student | 1.3 hrs/student | -59% |
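Note that the Improvement column is a relative change, not a percentage-point difference: completion rose by 28 points, which is +45% relative to the 62% baseline. The short check below recomputes each row from the Before and With System values in the table; `relative_change` is just an illustrative helper.

```python
def relative_change(before: float, after: float) -> int:
    """Percentage change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


# (before, with system) pairs copied from the table above.
rows = {
    "Completion rate": (62, 90),
    "Average time to complete": (14, 10.5),
    "Final project quality": (3.2, 4.3),
    "Student satisfaction": (3.8, 4.7),
    "Instructor intervention": (3.2, 1.3),
}
for metric, (before, after) in rows.items():
    print(f"{metric}: {relative_change(before, after):+d}%")
```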
Qualitative feedback:
“For the first time, I could SEE what I needed to learn. The path was clear.” - Sarah, Career Switcher
“The gap identification saved me weeks. I was about to waste time on advanced topics when I had basic gaps.” - Mike, CS Student
“My students are more self-directed now. They know what to work on without me constantly redirecting them.” - Dr. Chen, Instructor
Surprising insights:
- Students underestimate prerequisites: 73% had gaps they didn’t realize
- Visual learners benefit most: 40% improvement for visual learning style
- Spaced repetition works: Students who returned to a concept on 7+ separate days showed 2x retention
- Peer learning emerged: Students started sharing their knowledge graphs and helping each other close gaps
Challenges & Lessons
Challenge 1: Knowledge Graph Quality
Problem: LLM-extracted prerequisites weren’t always accurate
Example: The LLM claimed “Docker” was a prerequisite for “Neural Networks” (wrong!)
Solution:
- Human review of critical paths
- Community validation (instructors + students vote)
- Confidence scores on relationships
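One way to combine confidence scores with community votes is to treat the LLM's confidence as a weak prior that votes gradually override. The sketch below is a guess at how this could work, not the scheme the prototype uses; the `PrerequisiteEdge` class, the `prior_weight` of 2, and the 0.5 cutoff are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PrerequisiteEdge:
    source: str            # prerequisite concept
    target: str            # concept that depends on it
    llm_confidence: float  # 0..1, estimated at extraction time
    upvotes: int = 0
    downvotes: int = 0

    def confidence(self, prior_weight: float = 2.0) -> float:
        """Blend the LLM's confidence with votes; the prior counts as `prior_weight` votes."""
        total_votes = self.upvotes + self.downvotes
        return (prior_weight * self.llm_confidence + self.upvotes) / (prior_weight + total_votes)


def trusted_edges(edges: list[PrerequisiteEdge], threshold: float = 0.5) -> list[PrerequisiteEdge]:
    """Keep only edges whose blended confidence reaches the threshold."""
    return [edge for edge in edges if edge.confidence() >= threshold]


edges = [
    PrerequisiteEdge("Docker", "Neural Networks", 0.6, upvotes=0, downvotes=6),
    PrerequisiteEdge("Linear Algebra", "Neural Networks", 0.6, upvotes=5, downvotes=0),
]
print([edge.source for edge in trusted_edges(edges)])
```

Both edges start with the same LLM confidence, but six downvotes pull the Docker edge to 0.15 while five upvotes lift the Linear Algebra edge to about 0.89.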
Challenge 2: Mastery Calculation
Problem: How do you measure “understanding”?
Initial approach: Quiz scores only
Issue: Students could game quizzes through memorization
Final approach: Multi-signal mastery (weights revised from the Phase 3 formula, which counted time spent rather than peer teaching):
- Assessment scores (40%)
- Application in projects (30%)
- Spaced repetition (20%)
- Peer teaching (10%)
Challenge 3: Over-reliance Risk
Problem: Students might blindly follow the system
Solution:
- Explanation for every recommendation
- Allow students to override/customize paths
- Encourage exploration beyond recommended path
- Regular reflection prompts
Challenge 4: Cold Start
Problem: New students have no interaction history
Solution:
- Initial placement assessment (20 min)
- Survey of prior knowledge
- First few interactions carefully monitored
- Rapid convergence (accurate within 5 hours of use)
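A simple way to picture the cold-start handoff is a blend that starts at the placement estimate and shifts toward observed mastery as usage accumulates. This is a hypothetical sketch: the linear ramp and the `blended_mastery` name are assumptions, and the 5-hour default only mirrors the convergence figure quoted above.

```python
def blended_mastery(
    placement_score: float,
    observed_mastery: float,
    hours_of_use: float,
    convergence_hours: float = 5.0,
) -> float:
    """Shift weight from the placement estimate to observed mastery over time."""
    trust = min(max(hours_of_use / convergence_hours, 0.0), 1.0)
    return (1 - trust) * placement_score + trust * observed_mastery


# A student who placed at 0.8 but whose interactions suggest 0.4:
for hours in (0, 2.5, 5):
    print(hours, round(blended_mastery(0.8, 0.4, hours), 2))
```

At zero hours the placement score is used as-is, and after the convergence window the estimate relies entirely on tracked interactions.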
Future Directions
Short-term (3-6 months):
- Multi-modal learning: Incorporate videos, interactive exercises, quizzes
- Collaborative filtering: “Students like you also learned…”
- Mobile app: On-the-go learning tracking
- Integration with LMS: Canvas, Moodle, Blackboard
Medium-term (6-12 months):
- Peer learning graph: Connect students with complementary knowledge
- Instructor dashboard: Class-wide gap analysis
- Adaptive assessments: Questions targeting individual gaps
- Career pathways: Map knowledge to job requirements
Long-term (1-2 years):
- Cross-domain transfer: Identify transferable knowledge across fields
- Lifelong learning: Track progression across years, multiple courses
- AI tutor: Fully automated personalized tutoring
- Open knowledge graph: Community-built, Wikipedia for learning paths
Open Questions
- Optimal graph granularity: How detailed should concept nodes be?
- Mastery threshold: Is 0.7 the right bar for “mastery”?
- Learning styles: Do they actually matter, or is it a myth?
- Motivation: How to keep students engaged with long learning paths?
- Privacy: How much student data is too much?
Conclusion
Building this system taught me that learning is fundamentally a graph problem. Linear curricula force students into paths that don’t match their knowledge.
Key insights:
- Gaps are invisible: Students don’t know what they don’t know
- Graphs reveal structure: Prerequisites make implicit knowledge explicit
- Personalization scales: AI + graphs enable individual learning paths
- Data drives improvement: The system gets better as students use it
My hope: Every student should have a personalized knowledge graph. Not in 10 years – now.
The technology exists. The pedagogy is sound. We just need to build it.
Want to collaborate? Reach out! Looking for:
- Educators to pilot in your courses
- Developers to contribute to the open-source project
- Researchers interested in learning analytics
Have you experienced invisible knowledge gaps? Share your stories!