python my_analysis.py
Example session:
ArXiv Paper Analysis
==================================================
Enter your search query: quantum computing
How many papers to analyze? [50]: 25
Fetching papers...
Generating embeddings...
Clustering papers...
Found 25 papers in 4 clusters
Cluster 0 (8 papers):
  - Title: Quantum Computing with Neutral Atoms
    Authors: John Smith, Jane Doe
    Published: 2024-01-15
...
Example search queries:

- Basic topic search:
  machine learning
- Multiple topics:
  quantum computing AND artificial intelligence
- Title-specific search:
  ti:"deep learning"
- Author search:
  au:Smith
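Any of these query strings can be passed straight to the programmatic interface used later in this guide; a minimal sketch, assuming the ArxivAPI.fetch_papers_batch call shown in the Python example below:

from src.api.arxiv_api import ArxivAPI

api = ArxivAPI()

# Title-restricted search, capped at 25 results
dl_papers = api.fetch_papers_batch('ti:"deep learning"', max_papers=25)

# Author search combined with a topic
smith_papers = api.fetch_papers_batch('au:Smith AND quantum computing', max_papers=25)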
For a more comprehensive analysis:
python scripts/analyze_papers.py
This provides:
- Detailed clustering analysis
- Visualizations
- Results saved to the outputs directory
For a simple predefined analysis:
python quick_start.py
Results are saved in:
outputs/
├── papers_[timestamp]/
│   ├── papers.json          # Raw paper data
│   └── metadata.json        # Query information
├── clusters_[timestamp]/
│   └── cluster_data.json    # Clustering results
└── visualizations/
    └── clusters.html        # Interactive visualization
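Because the saved files are plain JSON, a previous run can be reloaded for further analysis. A minimal sketch, assuming the layout above; the exact fields inside each file may differ:

import json
from pathlib import Path

# Pick the most recent papers_[timestamp] directory
runs = sorted(Path("outputs").glob("papers_*"))
latest = runs[-1]

with open(latest / "papers.json") as f:
    papers = json.load(f)          # raw paper data
with open(latest / "metadata.json") as f:
    metadata = json.load(f)        # query information

print(f"Reloaded {len(papers)} papers from {latest}")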
To use the components directly from Python:

from src.api.arxiv_api import ArxivAPI
from src.embedding_generator import EnhancedEmbeddingGenerator

# Initialize the API client and the embedding generator
api = ArxivAPI()
generator = EnhancedEmbeddingGenerator()

# Fetch up to 50 papers matching the query
papers = api.fetch_papers_batch("your query", max_papers=50)

# Generate one embedding per paper abstract (the 'summary' field)
embeddings = generator.generate_embeddings([p['summary'] for p in papers])
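The snippet stops at the embedding step. The project's own clustering module isn't shown here, so as a stand-in the embeddings can be grouped with scikit-learn's KMeans, continuing from the variables above (KMeans and the 'title' field are assumptions, not the package's built-in clusterer):

from sklearn.cluster import KMeans

# Group papers into 4 clusters by abstract embedding (KMeans used as a stand-in)
labels = KMeans(n_clusters=4, random_state=42).fit_predict(embeddings)

for cluster_id in range(4):
    titles = [p['title'] for p, label in zip(papers, labels) if label == cluster_id]
    print(f"Cluster {cluster_id} ({len(titles)} papers):")
    for title in titles[:3]:
        print(f"  - {title}")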
Tips:

- Start Small
  - Begin with 25-50 papers
  - Increase the count if needed
- Refine Searches
  - Use specific terms
  - Add category filters (see the example query after this list)
  - Combine search terms
- Save Results
  - Results are saved automatically
  - Check the outputs directory
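As an example of a refined search, a category filter can be combined with the query forms listed earlier; a minimal sketch, assuming the arXiv API's cat: field prefix (cs.LG is the machine-learning category):

from src.api.arxiv_api import ArxivAPI

api = ArxivAPI()
# Restrict a title search to the cs.LG category via the cat: prefix
papers = api.fetch_papers_batch('cat:cs.LG AND ti:"deep learning"', max_papers=25)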
Troubleshooting:

- No Results
  - Broaden your search terms
  - Check your internet connection
  - Verify the query syntax
- Slow Performance
  - Reduce the number of papers (a batching sketch follows this list)
  - Close other applications
  - Check memory usage
- Installation Issues
  - Use Python 3.11
  - Create a fresh virtual environment
  - Update pip and the project dependencies
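If embedding generation is the slow step, one way to limit memory use is to embed the abstracts in small batches rather than all at once; a minimal sketch, assuming the generate_embeddings interface shown earlier returns a list of vectors (the batch size of 10 is illustrative):

from src.api.arxiv_api import ArxivAPI
from src.embedding_generator import EnhancedEmbeddingGenerator

api = ArxivAPI()
generator = EnhancedEmbeddingGenerator()

papers = api.fetch_papers_batch("quantum computing", max_papers=50)
summaries = [p['summary'] for p in papers]

# Embed 10 abstracts at a time instead of the whole set in one call
embeddings = []
for start in range(0, len(summaries), 10):
    embeddings.extend(generator.generate_embeddings(summaries[start:start + 10]))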