Skip to content

alicoding/claude-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Claude Parser v2.1.0 πŸš€

PyPI version Documentation License: MIT

Git-like disaster recovery for Claude Code conversations

Claude Parser treats every Claude API call as a git commit, enabling powerful recovery and analysis capabilities when things go wrong.

πŸŽ‰ What's New in v2.1.0

New Features

  • πŸ” Export Domain - Export conversations to different formats for indexing
  • πŸ“š LlamaIndex Export - export_for_llamaindex() for semantic search
  • πŸ› οΈ Fixed Discovery - discover_claude_files() now properly returns file paths
  • πŸ“¦ Complete API - All filtering functions now properly exported

πŸ“‹ v2.0.0 Major Changes

  • 🎯 Complete API Redesign - Clean, intuitive Python API with 30+ functions
  • πŸ“š 15 Domain Architecture - Organized into focused, composable modules
  • πŸ”§ CG Commands - Full Git-like CLI for disaster recovery
  • πŸͺ CH Hook System - Composable hooks for Claude Code integrations
  • πŸ“Š DuckDB Backend - Efficient JSONL querying without intermediate storage
  • πŸ—οΈ LNCA Compliant - Every file <80 LOC, 100% framework delegation
  • πŸ“– Full Documentation - Complete API reference, user guides, and examples

Breaking Changes from v1

  • New import structure: from claude_parser import load_session (not from claude_parser.main)
  • Removed god objects - now uses focused domain modules
  • All functions return plain dicts, not custom objects
  • Pydantic schema normalization handles all JSONL variations

πŸš€ Quick Start

Installation

pip install claude-parser

Basic Usage

from claude_parser import load_latest_session, analyze_session

# Load your most recent Claude session
session = load_latest_session()
print(f"Found session with {len(session['messages'])} messages")

# Analyze the session
analysis = analyze_session(session)
print(f"Total tokens: {analysis['total_tokens']}")
print(f"Estimated cost: ${analysis['estimated_cost']:.2f}")

Disaster Recovery with CG Commands

# Oh no! Claude deleted important files!
python -m claude_parser.cli.cg find "important_file.py"

# Found it! Now restore it
python -m claude_parser.cli.cg checkout /path/to/important_file.py

# See what happened
python -m claude_parser.cli.cg reflog

# Jump back to before the disaster
python -m claude_parser.cli.cg reset abc123

🎯 Core Features

Git-like Commands (CG)

  • cg status - Current session state
  • cg log - Conversation history
  • cg find <pattern> - Search across all sessions
  • cg blame <file> - Who last modified a file
  • cg checkout <file> - Restore deleted files
  • cg reflog - Operation history
  • cg show <uuid> - Operation details
  • cg reset <uuid> - Time travel to any point

Composable Hooks (CH)

# Run hooks with custom executors
python -m claude_parser.cli.ch run --executor my_executor

# Or set default executor
export CLAUDE_HOOK_EXECUTOR=my_executor

Python SDK

from claude_parser import (
    # Session Management
    load_session, load_latest_session, discover_all_sessions,

    # Analytics
    analyze_session, analyze_tool_usage, analyze_token_usage,
    calculate_session_cost, calculate_context_window,

    # File Operations
    restore_file_content, generate_file_diff, compare_files,

    # Navigation
    find_message_by_uuid, get_message_sequence, get_timeline_summary,

    # Discovery
    discover_claude_files, discover_current_project_files,

    # Filtering (NEW in v2!)
    filter_messages_by_type, filter_messages_by_tool,
    search_messages_by_content, exclude_tool_operations,

    # Export (NEW in v2.1!)
    export_for_llamaindex  # Export conversations for semantic search
)

πŸ“Š Real-World Use Cases

1. Disaster Recovery

from claude_parser import load_latest_session, restore_file_content

session = load_latest_session()
content = restore_file_content("/deleted/important.py", session)
with open("recovered.py", "w") as f:
    f.write(content)

2. Cost Analysis

from claude_parser import load_latest_session, calculate_session_cost

session = load_latest_session()
cost = calculate_session_cost(
    input_tokens=100000,
    output_tokens=50000,
    model="claude-3-5-sonnet-20241022"
)
print(f"This session cost: ${cost['total_cost']:.2f}")

3. Message Filtering

from claude_parser import load_latest_session, MessageType
from claude_parser.filtering import filter_messages_by_type

session = load_latest_session()
user_messages = filter_messages_by_type(session['messages'], MessageType.USER)
print(f"You sent {len(user_messages)} messages")

4. Real-time Monitoring

from claude_parser.watch import watch

def on_assistant(message):
    print(f"Claude says: {message['content'][:100]}...")

watch("~/.claude/projects/current/session.jsonl", on_assistant=on_assistant)

5. Export for Semantic Search (NEW in v2.1!)

from claude_parser import export_for_llamaindex

# Export conversations to LlamaIndex format
docs = export_for_llamaindex("session.jsonl")
# Returns: [{"text": "message", "metadata": {...}}, ...]

# Use with semantic search services
for doc in docs:
    print(f"Text: {doc['text'][:50]}...")
    print(f"Speaker: {doc['metadata']['speaker']}")

πŸ—οΈ Architecture

Clean Domain Organization (19 modules)

claude_parser/
β”œβ”€β”€ analytics/     # Session and tool analysis
β”œβ”€β”€ api/           # API utilities
β”œβ”€β”€ cli/           # CG and CH commands
β”œβ”€β”€ core/          # Core utilities
β”œβ”€β”€ discovery/     # File and project discovery
β”œβ”€β”€ export/        # Export formats (NEW in v2.1!)
β”œβ”€β”€ extensions/    # Extension system
β”œβ”€β”€ filtering/     # Message filtering
β”œβ”€β”€ hooks/         # Hook system and API
β”œβ”€β”€ loaders/       # Session loading
β”œβ”€β”€ messages/      # Message utilities
β”œβ”€β”€ models/        # Data models
β”œβ”€β”€ navigation/    # Timeline and UUID navigation
β”œβ”€β”€ operations/    # File operations
β”œβ”€β”€ queries/       # DuckDB SQL queries
β”œβ”€β”€ session/       # Session management
β”œβ”€β”€ storage/       # DuckDB engine
β”œβ”€β”€ tokens/        # Token counting and billing
└── watch/         # Real-time monitoring

LNCA Principles

  • <80 LOC per file - Optimized for LLM comprehension
  • 100% Framework Delegation - No custom loops or error handling
  • Single Source of Truth - One file per feature
  • Pydantic Everything - Schema normalization for all JSONL variations

πŸ“– Documentation

Full documentation available at: https://alicoding.github.io/claude-parser/

πŸ”„ Migration from v1

Old v1 Code

# v1 - Complex imports, god objects
from claude_parser.main import ClaudeParser
parser = ClaudeParser()
parser.load_and_analyze_everything()  # God object doing too much

New v2 Code

# v2 - Clean, focused functions
from claude_parser import load_latest_session, analyze_session
session = load_latest_session()
analysis = analyze_session(session)  # One function, one purpose

Key Differences

  1. No more god objects - Each function does one thing well
  2. Plain dicts everywhere - No custom classes to learn
  3. Explicit imports - Import only what you need
  4. Better error handling - Framework delegation (Pydantic/Typer)
  5. More features - Filtering, watching, complete hook system

🚒 Deployment

PyPI Release

# Build and upload to PyPI
python -m build
twine upload dist/*

Documentation

Documentation auto-deploys to GitHub Pages on every push to main.

πŸ—ΊοΈ Export Format Roadmap

Currently Available (v2.1)

  • βœ… LlamaIndex - export_for_llamaindex() - For semantic search indexing

Planned Export Formats

  • πŸ”œ Mem0 - Long-term memory for AI agents
  • πŸ”œ ChromaDB - Vector database format
  • πŸ”œ Pinecone - Cloud vector database
  • πŸ”œ Markdown - Human-readable conversation logs
  • πŸ”œ JSON-LD - Structured data with context
  • πŸ”œ OpenAI Messages - Direct OpenAI API format
  • πŸ”œ Anthropic Messages - Direct Anthropic API format
  • πŸ”œ LangChain Documents - LangChain document format
  • πŸ”œ Haystack Documents - Haystack NLP framework

Export Domain Architecture

claude_parser/export/
β”œβ”€β”€ __init__.py        # Export registry
β”œβ”€β”€ llamaindex.py      # LlamaIndex format (DONE)
β”œβ”€β”€ mem0.py           # Mem0 format (TODO)
β”œβ”€β”€ chroma.py         # ChromaDB format (TODO)
β”œβ”€β”€ markdown.py       # Markdown format (TODO)
└── ...               # More formats

πŸš€ Roadmap

v3.0.0 - UI-Ready API (Coming Soon)

Complete redesign for zero-boilerplate UI development.

Core/Feature Layer Separation

  • βœ… Audit complete - identified all internal vs public functions
  • βœ… Framework delegation mapped (humanize, babel, arrow, rich)
  • πŸ”œ claude_parser.core - Low-level utilities for advanced users
  • πŸ”œ claude_parser - High-level UI-ready functions

Display-Ready Functions

# Coming in v3.0.0
from claude_parser import (
    get_session_summary,         # "436 messages, 3 hours, $12.45"
    get_formatted_messages,       # Markdown-formatted conversation
    get_token_breakdown,          # "45,678 tokens ($12.45)"
    get_file_changes_display,     # Formatted diff with colors
    export_as_html,              # Complete HTML report
)

# One-liner, zero parsing needed:
print(get_session_summary(session))  # That's it!

Planned Features

  • Session Display: Pre-formatted messages with timestamps, roles, emojis
  • Analytics Dashboard: Human-readable metrics (not raw numbers)
  • File Operations: Formatted diffs, file lists with status icons
  • Export Formats: HTML, Markdown, JSON, PDF - all display-ready
  • Smart Defaults: "No messages found" instead of empty arrays
  • Number Formatting: "$12.45" not 0.01245, "45,678" not 45678
  • Time Formatting: "2:34 PM" not timestamps, "3 hours ago" not seconds

Framework Delegation

All formatting delegated to specialized libraries:

  • humanize - Number and size formatting
  • babel - Currency formatting
  • arrow - Time and date formatting
  • rich - Terminal colors and tables
  • emoji - Status indicators (βœ… ❌ ⚠️)
  • tabulate - Markdown/HTML tables
  • jinja2 - HTML report generation

v2.2.0 - Bug Fixes (Next Release)

  • βœ… Fixed token counting to match UI (v2.1.1)
  • βœ… Fixed None message field handling
  • πŸ”œ Additional message extraction improvements

🀝 Contributing

We welcome contributions! Please ensure:

  • Files stay under 80 lines of code
  • Use framework delegation (no custom loops)
  • Add tests for new features
  • Update documentation

πŸ“œ License

MIT License - See LICENSE file for details.

πŸ™ Acknowledgments

  • Built for the Claude Code community
  • Inspired by git's powerful recovery capabilities
  • Designed with LNCA principles for LLM-native development

πŸ“Š Stats

  • 19 specialized domains
  • 35+ public functions
  • <80 lines per file
  • 100% framework delegation
  • 0 custom error handling

Ready to never lose code again? Install v2.1.0 and experience the power of Git-like recovery for Claude Code!

pip install claude-parser==2.1.0

Documentation | GitHub | PyPI

About

No description, website, or topics provided.

Resources

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •  

Languages