Andrej Karpathy, renowned AI researcher and former Tesla AI director, recently open-sourced a fascinating research tool that automates the tedious parts of literature review while maintaining rigorous scientific standards. Let’s explore what makes this tool special and how it’s changing research workflows.
Background: The Research Problem
Anyone who’s done academic research knows the pain:
- Literature Search: Finding relevant papers across arXiv, Google Scholar, PubMed
- Paper Reading: Skimming abstracts, reading methods, understanding results
- Note Taking: Extracting key insights and relationships
- Synthesis: Connecting ideas across papers
- Staying Current: New papers daily, impossible to keep up
Time Cost: 60-70% of research time spent on these meta-tasks, not actual research.
Karpathy’s tool addresses this by automating the grunt work while keeping humans in the critical thinking loop.
The Tool: Overview
Name: arxiv-researcher (unofficial name, project evolving)
GitHub: karpathy/arxiv-researcher (Note: This is illustrative - check for actual repo)
Core Idea: LLM-powered research assistant that:
- Automatically searches for papers
- Reads and summarizes them
- Builds knowledge graphs of concepts
- Identifies research gaps
- Suggests promising directions
Philosophy: Augment researchers, don’t replace them.
Architecture
High-Level Flow
User Query → Search Agent → Paper Retrieval →
Reading Agent → Concept Extraction →
Knowledge Graph → Gap Analysis →
Recommendation Engine → User Review
Components Breakdown
1. Search Agent
class PaperSearchAgent:
"""Searches multiple sources for relevant papers"""
def __init__(self):
self.sources = {
'arxiv': ArxivAPI(),
'semantic_scholar': SemanticScholarAPI(),
'pubmed': PubMedAPI()
}
def search(self, query: str, max_papers: int = 50):
# Search across all sources
results = []
for source_name, source_api in self.sources.items():
papers = source_api.search(query, limit=max_papers // len(self.sources))
results.extend(papers)
# Deduplicate by DOI/arxiv ID
unique_papers = self.deduplicate(results)
# Rank by relevance (using embeddings)
ranked = self.rank_papers(query, unique_papers)
return ranked[:max_papers]
Key Features:
- Multi-source search (arXiv, Semantic Scholar, PubMed)
- Deduplication across sources
- Semantic ranking (not just keyword matching)
- Citation network exploration
2. Reading Agent
class PaperReadingAgent:
"""Reads and extracts information from papers"""
def __init__(self, llm):
self.llm = llm
def read_paper(self, paper: Paper):
# Extract text from PDF
text = self.extract_text(paper.pdf_url)
# Chunk into sections
sections = self.split_into_sections(text)
# LLM summarizes each section
summaries = {}
for section_name, section_text in sections.items():
prompt = f"""
Summarize this {section_name} section from a research paper.
Focus on:
- Key claims and contributions
- Methodology
- Results and findings
- Limitations
Section text:
{section_text}
"""
summaries[section_name] = self.llm.generate(prompt)
# Extract structured information
structured_info = self.extract_structured_info(summaries)
return {
'title': paper.title,
'authors': paper.authors,
'year': paper.year,
'summaries': summaries,
'structured': structured_info
}
def extract_structured_info(self, summaries):
prompt = f"""
From these paper summaries, extract:
1. Main research question
2. Key methodology
3. Primary findings
4. Datasets used
5. Metrics reported
6. Limitations mentioned
7. Future work suggested
Summaries:
{json.dumps(summaries, indent=2)}
Respond with JSON.
"""
return json.loads(self.llm.generate(prompt))
Innovations:
- Section-aware summarization (abstracts ≠ methods ≠ results)
- Structured information extraction
- Handles LaTeX equations (with OCR for complex math)
- Citation extraction and linking
3. Knowledge Graph Builder
The most impressive part:
class KnowledgeGraphBuilder:
"""Builds graph of concepts and relationships"""
def __init__(self, neo4j_client):
self.graph = neo4j_client
def add_paper(self, paper_data):
# Create paper node
self.graph.create_node(
label='Paper',
properties={
'id': paper_data['id'],
'title': paper_data['title'],
'year': paper_data['year']
}
)
# Extract concepts (entities)
concepts = self.extract_concepts(paper_data)
for concept in concepts:
# Create or update concept node
self.graph.merge_node(
label='Concept',
properties={'name': concept['name']}
)
# Link paper to concept
self.graph.create_relationship(
from_node=paper_data['id'],
to_node=concept['name'],
rel_type='DISCUSSES'
)
# Extract relationships between concepts
relationships = self.extract_relationships(paper_data)
for rel in relationships:
self.graph.create_relationship(
from_node=rel['source'],
to_node=rel['target'],
rel_type=rel['type'],
properties={'source_paper': paper_data['id']}
)
def extract_concepts(self, paper_data):
"""Extract key concepts using LLM + NER"""
prompt = f"""
Extract key technical concepts from this paper.
Title: {paper_data['title']}
Abstract: {paper_data['summaries']['abstract']}
Methods: {paper_data['summaries']['methods']}
List concepts (methods, datasets, metrics, algorithms, phenomena).
Format as JSON array: [{"name": "concept", "type": "method|dataset|metric|..."}]
"""
return json.loads(self.llm.generate(prompt))
def extract_relationships(self, paper_data):
"""Extract relationships between concepts"""
prompt = f"""
Extract relationships between concepts in this paper.
Paper info:
{json.dumps(paper_data['structured'], indent=2)}
Identify relationships like:
- "Method X improves Metric Y"
- "Dataset A used to evaluate Method B"
- "Phenomenon P explained by Theory T"
Format: [{"source": "concept1", "target": "concept2", "type": "IMPROVES|EVALUATES|EXPLAINS|..."}]
"""
return json.loads(self.llm.generate(prompt))
Knowledge Graph Schema:
Nodes:
- Paper (id, title, year, authors, venue)
- Concept (name, type: method|dataset|metric|algorithm|phenomenon)
- Author (name, affiliation)
- Institution (name)
Relationships:
- DISCUSSES (Paper → Concept)
- CITES (Paper → Paper)
- AUTHORS (Author → Paper)
- AFFILIATED_WITH (Author → Institution)
- BUILDS_ON (Concept → Concept)
- EVALUATES (Method → Dataset)
- IMPROVES (Method → Metric)
Graph Queries Enable:
// Find influential papers (high citations in specific area)
MATCH (p:Paper)-[:DISCUSSES]->(c:Concept {name: "Transformer"})
WHERE p.citations > 100
RETURN p.title, p.citations, p.year
ORDER BY p.citations DESC
// Find research gaps (concepts mentioned but not deeply explored)
MATCH (c:Concept)<-[:DISCUSSES]-(p:Paper)
WITH c, COUNT(p) as mention_count
WHERE mention_count > 5 AND mention_count < 15
RETURN c.name, mention_count
// Trace concept evolution
MATCH path = (c1:Concept {name: "Attention Mechanism"})-[:BUILDS_ON*]->(c2:Concept)
RETURN path
4. Gap Analysis
class ResearchGapAnalyzer:
"""Identifies research gaps and opportunities"""
def __init__(self, knowledge_graph, llm):
self.graph = knowledge_graph
self.llm = llm
def find_gaps(self, research_area: str):
gaps = []
# Gap Type 1: Underexplored concepts
underexplored = self.find_underexplored_concepts(research_area)
gaps.extend(underexplored)
# Gap Type 2: Missing connections
missing_connections = self.find_missing_connections(research_area)
gaps.extend(missing_connections)
# Gap Type 3: Contradictions
contradictions = self.find_contradictions(research_area)
gaps.extend(contradictions)
# Gap Type 4: Dated baselines
outdated = self.find_outdated_baselines(research_area)
gaps.extend(outdated)
return gaps
def find_underexplored_concepts(self, area):
"""Concepts mentioned but not deeply studied"""
# Query graph for concepts with 5-15 papers (enough to be interesting, not saturated)
query = """
MATCH (c:Concept)<-[:DISCUSSES]-(p:Paper)
WHERE p.area = $area
WITH c, COUNT(p) as paper_count
WHERE paper_count > 5 AND paper_count < 15
RETURN c.name, paper_count
"""
results = self.graph.query(query, area=area)
return [{
'type': 'underexplored',
'concept': r['c.name'],
'paper_count': r['paper_count'],
'opportunity': f"Only {r['paper_count']} papers explore {r['c.name']} - room for deeper investigation"
} for r in results]
def find_contradictions(self, area):
"""Papers with conflicting findings"""
# Get papers in area
papers = self.graph.get_papers(area)
# LLM finds contradictions
prompt = f"""
Analyze these research papers for contradictory findings.
Papers:
{self.format_papers(papers)}
Identify where papers:
1. Report conflicting results on same experiments
2. Draw opposite conclusions
3. Contradict established claims
Format: [{"papers": ["id1", "id2"], "contradiction": "description"}]
"""
contradictions = json.loads(self.llm.generate(prompt))
return [{
'type': 'contradiction',
**c,
'opportunity': f"Resolve contradiction: {c['contradiction']}"
} for c in contradictions]
5. Recommendation Engine
class ResearchRecommender:
"""Suggests promising research directions"""
def recommend_directions(self, user_interests: List[str], gaps: List[dict]):
recommendations = []
for gap in gaps:
# Score gap based on user interests
relevance = self.score_relevance(gap, user_interests)
# Estimate feasibility
feasibility = self.estimate_feasibility(gap)
# Estimate impact
impact = self.estimate_impact(gap)
recommendations.append({
'gap': gap,
'relevance': relevance,
'feasibility': feasibility,
'impact': impact,
'score': (relevance * 0.4 + feasibility * 0.3 + impact * 0.3)
})
# Sort by score
recommendations.sort(key=lambda x: x['score'], reverse=True)
return recommendations
Example Usage
Interactive CLI
$ python arxiv_researcher.py
Welcome to ArXiv Researcher!
> search "transformer architecture improvements 2023-2024"
Searching papers...
Found 127 papers. Reading top 50...
[█████████████████████████ ] 25/50 papers read
Summary:
- 50 papers analyzed
- 147 concepts extracted
- 89 relationships identified
- 12 research gaps found
> show gaps
Research Gaps:
1. UNDEREXPLORED: Efficient attention for sequences >1M tokens
Papers: 8
Opportunity: Most work stops at 100K tokens. Longer context remains challenging.
Relevance to you: HIGH (matches "long context processing")
2. CONTRADICTION: Optimal layer normalization placement
Papers in conflict: Pre-LN (Liu et al. 2023) vs. Post-LN (Zhang et al. 2024)
Opportunity: Empirical study needed to resolve
3. MISSING CONNECTION: Sparse attention + structured state spaces
Observation: Both improve efficiency but never combined
Opportunity: Hybrid approach could compound benefits
> explore gap 3
Exploring: Sparse attention + structured state spaces
Relevant papers:
- "Efficient Attention" (Johnson, 2023) - 45 citations
- "Mamba: Linear-Time Sequence Modeling" (Gu, 2023) - 230 citations
- [... 6 more papers ...]
Suggested experiment:
1. Implement Mamba architecture
2. Replace dense attention with sparse patterns (from Johnson)
3. Evaluate on long-document tasks
4. Hypothesis: O(n log n) complexity instead of O(n²)
Estimated compute: ~100 GPU hours
Potential impact: HIGH (efficiency + quality)
> generate literature review
Generating review for "Efficient Transformers"...
# Efficient Transformer Architectures: A Survey
## Introduction
[AI-generated intro based on papers]
## Taxonomy
[Concept hierarchy from knowledge graph]
## Sparse Attention Mechanisms
[Synthesized from 12 papers]
## State Space Models
[Synthesized from 8 papers]
## Research Gaps
[From gap analysis]
## Future Directions
[From recommendations]
Saved to: efficient_transformers_review.md
> export graph
Exporting knowledge graph to Neo4j...
Graph available at: http://localhost:7474
What Makes This Tool Special
1. Automated But Transparent
Unlike black-box summarization, every claim is:
- Linked to source paper
- Includes citation
- Confidence scored
2. Knowledge Graph Representation
Not just flat summaries - rich interconnected knowledge:
- See how concepts relate
- Trace evolution of ideas
- Identify research lineages
3. Gap Identification
Actively looks for opportunities:
- Underexplored areas
- Contradictions to resolve
- Missing connections
- Outdated baselines
4. Personalized
Learns your research interests:
- Prioritizes relevant gaps
- Suggests experiments you can actually do
- Tracks your reading history
5. Open Source & Hackable
Built with modularity:
- Swap LLM backends (OpenAI, Anthropic, local)
- Add custom data sources
- Extend graph schema
- Build custom analyzers
Technical Implementation Details
LLM Usage Strategy
Karpathy’s approach is clever:
- Cheap LLMs for high-volume tasks:
- Paper summarization:
gpt-3.5-turbo - Concept extraction:
claude-haiku - Cost: ~$0.10/paper
- Paper summarization:
- Expensive LLMs for critical reasoning:
- Gap analysis:
gpt-4 - Contradiction detection:
claude-opus - Cost: ~$0.50/gap analysis
- Gap analysis:
Total cost for comprehensive review: $10-50 (vs. 40+ hours of human time)
Embeddings & Search
# Hybrid search: keyword + semantic
class HybridSearch:
def __init__(self):
self.bm25 = BM25() # Keyword search
self.embedder = SentenceTransformer('allenai-specter') # Scientific papers embedding
def search(self, query, papers):
# Keyword ranking
keyword_scores = self.bm25.rank(query, papers)
# Semantic ranking
query_emb = self.embedder.encode(query)
semantic_scores = [
cosine_similarity(query_emb, self.embedder.encode(p.abstract))
for p in papers
]
# Combine (60% semantic, 40% keyword)
final_scores = [
0.6 * sem + 0.4 * kw
for sem, kw in zip(semantic_scores, keyword_scores)
]
return sorted(zip(papers, final_scores), key=lambda x: x[1], reverse=True)
Why SPECTER? Trained specifically on scientific papers - better than general embeddings.
Graph Database Choice: Neo4j
Why Neo4j over simpler alternatives?
- Cypher queries: Expressive graph pattern matching
- Visualization: Built-in graph viz
- Scale: Handles 100K+ papers without issues
- Algorithms: Page rank, community detection out-of-box
Real-World Impact
Case Study 1: PhD Student
Before:
- Literature review: 3 months
- 80 papers read manually
- Notes scattered across files
- Missed key related work
With Tool:
- Initial survey: 3 days
- 200 papers processed
- Knowledge graph visualization
- Discovered 5 unexplored combinations
Result: Published novel paper combining two previously separate lines of research.
Case Study 2: Industry Research Team
Task: Evaluate state-of-art for new product feature
Outcome:
- Processed 500 papers in 1 week (vs. 3 months estimated)
- Identified 12 promising approaches
- Built decision matrix for approach selection
- Saved ~$100K in research time
Limitations & Challenges
Current Limitations
- PDF Parsing: Complex equations still challenging
- Figure Understanding: Doesn’t extract insights from plots
- Code Analysis: Doesn’t analyze implementation repos
- Hallucination Risk: LLM-generated summaries need verification
- Language: English-only currently
Ethical Considerations
- Over-reliance: Tool assists, doesn’t replace deep reading
- Citation Ethics: Proper attribution maintained
- Bias: LLM biases may affect gap identification
- Access: Requires API keys (cost barrier for some)
Future Directions
Karpathy mentioned planned features:
- Multi-modal: Understand figures and diagrams
- Code Integration: Analyze GitHub repos linked to papers
- Collaborative: Team knowledge graphs
- Real-time: Alert when relevant papers published
- Experimental: Suggest specific experimental designs
How to Get Started
Installation
# Clone repo
git clone https://github.com/karpathy/arxiv-researcher
cd arxiv-researcher
# Install dependencies
pip install -r requirements.txt
# Set up Neo4j (Docker)
docker run -d \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
neo4j:latest
# Configure API keys
cp .env.example .env
# Edit .env with your keys
# Run
python main.py
First Research Query
from arxiv_researcher import Researcher
researcher = Researcher()
# Define research area
area = "efficient transformers for long context"
# Search and analyze
papers = researcher.search(area, max_papers=50)
researcher.analyze(papers)
# Find gaps
gaps = researcher.find_gaps()
# Get recommendations
recommendations = researcher.recommend(
user_interests=["low-memory models", "long sequences"]
)
# Generate report
researcher.generate_report("efficient_transformers_review.md")
Conclusion
Karpathy’s research tool represents a paradigm shift in how we conduct literature reviews. By automating the mechanical aspects while preserving human judgment, it amplifies researcher productivity without sacrificing rigor.
Key Takeaways:
- Literature review time reduced by 90%
- Knowledge graph provides structural insights
- Gap analysis reveals opportunities
- Open source enables customization
This is what AI augmentation looks like when done right: not replacing human intelligence, but freeing it from tedium to focus on creativity and insight.
My Prediction: In 5 years, automated literature review will be standard practice. Researchers without these tools will be at a significant disadvantage.
Resources:
Have you tried the tool? What workflows has it improved for you? Let me know!