Turn code repositories into AI-friendly Markdown documentation
Codefetch is a powerful tool that converts git repositories and local codebases into structured Markdown files optimized for Large Language Models (LLMs). It intelligently collects, processes, and formats code while respecting ignore patterns and providing token counting for various AI models.
- 📁 Local Codebase Processing - Convert entire codebases into AI-friendly Markdown
- 🔗 Git Repository Support - Fast GitHub API fetching with git clone fallback
- 🐙 Multi-Platform Support - Works with GitHub, GitLab, and Bitbucket
- 🎯 Smart Filtering - Respect .gitignore patterns and custom exclusions
- 📊 Token Counting - Track tokens for GPT-4, Claude, and other models
- 🚀 CLI & SDK - Use via command line or integrate programmatically
- 💾 Intelligent Caching - Speed up repeated fetches with smart caching
- 🌲 Project Structure Visualization - Generate tree views of your codebase
- ⚡ GitHub API Integration - Fetch repos without git using the GitHub API
# Analyze current directory
npx codefetch
# Analyze a GitHub repo (uses API - no git needed!)
npx codefetch --url github.com/facebook/react
# Analyze from GitLab or Bitbucket
npx codefetch --url gitlab.com/gitlab-org/gitlab
npx codefetch
npm install -g codefetch
codefetch --help
npm install --save-dev codefetch
# Add to package.json scripts
# Basic usage - outputs to codefetch/codebase.md
npx codefetch
# Include only TypeScript files with tree view
npx codefetch -e ts,tsx -t 3
# Generate with AI prompt template
npx codefetch -p improve --max-tokens 50000
# Analyze a GitHub repository
npx codefetch --url github.com/vuejs/vue --branch main -e js,ts
Include or exclude specific files and directories:
# Exclude test and public directories
npx codefetch --exclude-dir test,public
# Include only TypeScript files
npx codefetch --include-files "*.ts" -o typescript-only.md
# Include src directory, exclude test files
npx codefetch --include-dir src --exclude-files "*.test.ts" -o src-no-tests.md
Dry run (only output to console)
npx codefetch -d
Count tokens only (without generating markdown file)
# Count tokens with default encoder
npx codefetch -c
# Count tokens with specific encoder
npx codefetch -c --token-encoder cl100k
# Count tokens for specific file types
npx codefetch -c -e .ts,.js --token-encoder o200k
If no output file is specified (-o or --output), the output is written to codefetch/codebase.md.
Option | Description |
---|---|
-o, --output <file> | Specify output filename (defaults to codebase.md). Note: if you include "codefetch/" in the path, it is automatically stripped to avoid double-nesting |
--dir <path> | Specify the directory to scan (defaults to current directory) |
--max-tokens <number> | Limit output tokens (default: 500,000) |
-e, --extension <ext,...> | Filter by file extensions (e.g., .ts,.js) |
--token-limiter <type> | Token limiting strategy when using --max-tokens (sequential, truncated) |
--include-files <pattern,...> | Include specific files (supports patterns like *.ts) |
--exclude-files <pattern,...> | Exclude specific files (supports patterns like *.test.ts) |
--include-dir <dir,...> | Include specific directories |
--exclude-dir <dir,...> | Exclude specific directories |
-v, --verbose [level] | Show processing information (0=none, 1=basic, 2=debug) |
-t, --project-tree [depth] | Generate visual project tree (optional depth, default: 2) |
--token-encoder <type> | Token encoding method (simple, p50k, o200k, cl100k) |
--disable-line-numbers | Disable line numbers in output |
-d, --dry-run | Output markdown to stdout instead of file |
-c, --token-count-only | Output only the token count without generating markdown file |
All options that accept multiple values use comma-separated lists. File patterns support simple wildcards:
- * matches any number of characters
- ? matches a single character
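The wildcard rules above can be sketched as a small matcher. This is an illustration only, assuming patterns are compared against a bare file name; `wildcardToRegExp` and `matchesPattern` are hypothetical helpers, not part of codefetch, and the tool's real matching may differ (for example in how it treats path separators).

```typescript
// Hypothetical helper (not codefetch's API): translate a simple wildcard
// pattern into an anchored regular expression.
function wildcardToRegExp(pattern: string): RegExp {
  // Escape every regex metacharacter except `*` and `?`.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  // `*` matches any number of characters, `?` matches exactly one.
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp(`^${source}$`);
}

// Hypothetical helper: test a file name against a wildcard pattern.
function matchesPattern(fileName: string, pattern: string): boolean {
  return wildcardToRegExp(pattern).test(fileName);
}

console.log(matchesPattern("index.test.ts", "*.test.ts")); // true
console.log(matchesPattern("a.ts", "?.ts")); // true
console.log(matchesPattern("ab.ts", "?.ts")); // false
```

Anchoring the expression with `^` and `$` keeps `*.ts` from also matching a name such as `notes.tsx`.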
You can generate a visual tree representation of your project structure:
# Generate tree with default depth (2 levels)
npx codefetch --project-tree
# Generate tree with custom depth
npx codefetch -t 3
# Generate tree and save code to file
npx codefetch -t 2 -o output.md
Example output:
Project Tree:
└── my-project
    ├── src
    │   ├── index.ts
    │   ├── types.ts
    │   └── utils
    ├── tests
    │   └── index.test.ts
    └── package.json
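To make the depth option concrete, here is a minimal sketch of a depth-limited tree renderer. It works on a hypothetical in-memory `TreeNode` structure rather than the file system, and both the way depth levels are counted and the `format.ts` entry are assumptions made for illustration; codefetch's own output may differ in detail.

```typescript
// Hypothetical in-memory node; a real tool would build this from the file system.
interface TreeNode {
  name: string;
  children?: TreeNode[];
}

// Render the children of `node` with box-drawing characters,
// descending at most `maxDepth` levels below the root.
function renderChildren(node: TreeNode, maxDepth: number, prefix = "", depth = 1): string[] {
  const children = node.children;
  if (depth > maxDepth || !children) return [];
  const lines: string[] = [];
  children.forEach((child, i) => {
    const isLast = i === children.length - 1;
    lines.push(`${prefix}${isLast ? "└── " : "├── "}${child.name}`);
    // Continue the vertical guide only while siblings remain below.
    const childPrefix = prefix + (isLast ? "    " : "│   ");
    lines.push(...renderChildren(child, maxDepth, childPrefix, depth + 1));
  });
  return lines;
}

function renderTree(root: TreeNode, maxDepth: number): string {
  return [root.name, ...renderChildren(root, maxDepth)].join("\n");
}

const project: TreeNode = {
  name: "my-project",
  children: [
    {
      name: "src",
      children: [
        { name: "index.ts" },
        { name: "types.ts" },
        { name: "utils", children: [{ name: "format.ts" }] }, // hypothetical file
      ],
    },
    { name: "tests", children: [{ name: "index.test.ts" }] },
    { name: "package.json" },
  ],
};

// With a depth of 2, `utils` is listed but its contents are not.
console.log(renderTree(project, 2));
```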
You can add predefined or custom prompts to your output:
# Use default prompt (looks for codefetch/prompts/default.md)
npx codefetch -p
npx codefetch --prompt
# Use built-in prompts
npx codefetch -p fix # fixes codebase
npx codefetch -p improve # improves codebase
npx codefetch -p codegen # generates code
npx codefetch -p testgen # generates tests
# Use custom prompts
npx codefetch --prompt custom-prompt.md
npx codefetch -p my-architect.txt
Create custom prompts in the codefetch/prompts/ directory:
- Create a markdown file (e.g., codefetch/prompts/my-prompt.md)
- Use it with --prompt my-prompt.md
You can also set a default prompt in your codefetch.config.mjs, using either a built-in prompt:
export default {
  defaultPromptFile: "dev", // Use built-in prompt
}
or a custom prompt file:
export default {
  defaultPromptFile: "custom-prompt.md", // Use custom prompt file
}
The prompt resolution order is:
- CLI argument (-p or --prompt)
- Config file prompt setting
- No prompt if neither is specified
When using just -p or --prompt without a value, codefetch will look for codefetch/prompts/default.md.
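The resolution order above can be sketched as a single function. This is an assumption-laden illustration, not codefetch's internal code: `resolvePrompt`, `PromptArg`, and `DEFAULT_PROMPT_PATH` are hypothetical names, and a bare `-p` is modelled here as the value `true`.

```typescript
// Hypothetical model of the CLI value: a string when a prompt is named,
// `true` for a bare `-p` / `--prompt`, `undefined` when the flag is absent.
type PromptArg = string | true | undefined;

const DEFAULT_PROMPT_PATH = "codefetch/prompts/default.md";

function resolvePrompt(cliPrompt: PromptArg, configPrompt?: string): string | undefined {
  if (cliPrompt === true) return DEFAULT_PROMPT_PATH; // bare -p: use the default prompt file
  if (typeof cliPrompt === "string") return cliPrompt; // 1. CLI argument wins
  if (configPrompt) return configPrompt; // 2. config file prompt setting
  return undefined; // 3. no prompt
}

console.log(resolvePrompt("fix", "custom-prompt.md")); // "fix"
console.log(resolvePrompt(undefined, "custom-prompt.md")); // "custom-prompt.md"
console.log(resolvePrompt(true)); // "codefetch/prompts/default.md"
console.log(resolvePrompt(undefined)); // undefined
```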
When using --max-tokens, you can control how tokens are distributed across files using the --token-limiter option:
# Sequential mode - process files in order until reaching token limit
npx codefetch --max-tokens 500 --token-limiter sequential
# Truncated mode (default) - distribute tokens evenly across all files
npx codefetch --max-tokens 500 --token-limiter truncated
- sequential: Processes files in order until the total token limit is reached. Useful when you want complete content from the first files.
- truncated: Distributes tokens evenly across all files, showing partial content from each file. This is the default mode and is useful for getting an overview of the entire codebase.
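The difference between the two strategies is easiest to see as a token budget per file. The sketch below is not codefetch's implementation: `FileTokens`, `limitSequential`, and `limitTruncated` are hypothetical names, the sequential version assumes the file that crosses the limit is cut off rather than dropped, and the truncated version assumes a plain equal share with no redistribution of unused budget.

```typescript
// Hypothetical shape: a file path with its token count.
interface FileTokens {
  path: string;
  tokens: number;
}

// Sequential: take files in order until the budget is used up.
function limitSequential(files: FileTokens[], maxTokens: number): FileTokens[] {
  let remaining = maxTokens;
  const result: FileTokens[] = [];
  for (const file of files) {
    if (remaining <= 0) break;
    const tokens = Math.min(file.tokens, remaining);
    result.push({ path: file.path, tokens });
    remaining -= tokens;
  }
  return result;
}

// Truncated: give every file the same share of the budget.
function limitTruncated(files: FileTokens[], maxTokens: number): FileTokens[] {
  if (files.length === 0) return [];
  const perFile = Math.floor(maxTokens / files.length);
  return files.map((file) => ({ path: file.path, tokens: Math.min(file.tokens, perFile) }));
}

const files: FileTokens[] = [
  { path: "a.ts", tokens: 400 },
  { path: "b.ts", tokens: 300 },
  { path: "c.ts", tokens: 200 },
];

// a.ts keeps 400 tokens, b.ts is cut to 100, c.ts is omitted.
console.log(limitSequential(files, 500));
// Every file is cut to 166 tokens.
console.log(limitTruncated(files, 500));
```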
codefetch supports two ways to ignore files:
- .gitignore - Respects your project's existing .gitignore patterns
- .codefetchignore - Additional patterns specific to codefetch
The .codefetchignore file works exactly like .gitignore and is useful when you want to ignore files that aren't in your .gitignore.
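As a rough picture of how the two ignore files combine, the sketch below merges their patterns into one list. It is deliberately simplified and not codefetch's code: `parseIgnoreFile` and `combineIgnorePatterns` are hypothetical helpers that only skip blank lines and `#` comments, and they do not implement the full .gitignore matching rules such as negation with `!`.

```typescript
// Hypothetical helper: turn the text of an ignore file into a list of patterns,
// skipping blank lines and `#` comments.
function parseIgnoreFile(content: string): string[] {
  return content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && line.charAt(0) !== "#");
}

// Hypothetical helper: combine patterns from both files,
// dropping duplicates while keeping first-seen order.
function combineIgnorePatterns(gitignore: string, codefetchignore: string): string[] {
  const all = parseIgnoreFile(gitignore).concat(parseIgnoreFile(codefetchignore));
  return all.filter((pattern, index) => all.indexOf(pattern) === index);
}

const gitignoreText = "node_modules\n# build output\ndist\n";
const codefetchignoreText = "dist\n*.snap\n";

console.log(combineIgnorePatterns(gitignoreText, codefetchignoreText));
// [ 'node_modules', 'dist', '*.snap' ]
```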
Codefetch uses a set of default ignore patterns to exclude common files and directories that typically don't need to be included in code reviews or LLM analysis.
You can view the complete list of default patterns in default-ignore.ts.
Codefetch supports different token counting methods to match various AI models:
- simple: Basic word-based estimation (fastest, but not very accurate)
- p50k: GPT-3 style tokenization
- o200k: GPT-4o style tokenization
- cl100k: GPT-4 style tokenization
Select the appropriate encoder based on your target model:
# For GPT-4o
npx codefetch --token-encoder o200k
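The encoder list above can be read as a lookup from target model to encoder name. The sketch below shows that lookup plus a rough word-based count in the spirit of the simple encoder. Both functions are hypothetical illustrations: the model keys are assumptions taken from the descriptions above, and codefetch's actual simple estimate may count differently.

```typescript
type TokenEncoder = "simple" | "p50k" | "o200k" | "cl100k";

// Mapping taken from the encoder descriptions; the exact model keys are assumptions.
const ENCODER_BY_MODEL: Record<string, TokenEncoder> = {
  "gpt-3": "p50k",
  "gpt-4": "cl100k",
  "gpt-4o": "o200k",
};

// Hypothetical helper: fall back to the fast word-based estimate for unknown models.
function pickEncoder(model: string): TokenEncoder {
  return ENCODER_BY_MODEL[model.toLowerCase()] ?? "simple";
}

// Rough word-based estimate: count whitespace-separated words.
function estimateTokensSimple(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

console.log(pickEncoder("gpt-4o")); // "o200k"
console.log(pickEncoder("some-other-model")); // "simple"
console.log(estimateTokensSimple("const x = 1;")); // 4
```

A word count tends to undercount code, where punctuation and identifiers split into several tokens, so a model-specific encoder is the better choice when the limit matters.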
By default (unless using --dry-run), codefetch will:
- Create a codefetch/ directory in your project
- Store all output files in this directory
This ensures that:
- Your fetched code is organized in one place
- The output directory is excluded from later runs, so previously generated output is not fetched again
Add codefetch/ to your .gitignore file to avoid committing the fetched codebase.
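The output location rules (default file name, output directory, and stripping of a leading "codefetch/" from the -o value) can be sketched as follows. `resolveOutputPath` and `OUTPUT_DIR` are hypothetical names, and the sketch assumes forward-slash paths only; it illustrates the documented behaviour rather than reproducing codefetch's code.

```typescript
const OUTPUT_DIR = "codefetch";

// Hypothetical helper: compute where the markdown file ends up.
function resolveOutputPath(output = "codebase.md"): string {
  const normalized = output.replace(/^\.\//, ""); // drop a leading "./"
  const prefix = `${OUTPUT_DIR}/`;
  // Strip a leading "codefetch/" so the result is not double-nested.
  const fileName = normalized.indexOf(prefix) === 0 ? normalized.slice(prefix.length) : normalized;
  return `${OUTPUT_DIR}/${fileName}`;
}

console.log(resolveOutputPath()); // "codefetch/codebase.md"
console.log(resolveOutputPath("typescript-only.md")); // "codefetch/typescript-only.md"
console.log(resolveOutputPath("codefetch/typescript-only.md")); // "codefetch/typescript-only.md"
```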
You can run this command in environments such as bolt.new and cursor.com to convert your code to Markdown, then ask the AI chat for guidance about your codebase.
Codefetch is organized as a monorepo with multiple packages:
Command-line interface for Codefetch with web fetching capabilities.
npm install -g codefetch
Features:
- Full CLI with all options
- Website crawling and conversion
- Git repository cloning
- Built-in caching system
- Progress reporting
Read the full CLI documentation →
Core SDK for programmatic usage in your applications.
npm install codefetch-sdk@latest
Features:
- 🎯 Unified fetch() API - Single method for all sources
- 🚀 Zero-config defaults - Works out of the box
- 📦 Optimized bundle - Small footprint for edge environments
- 🔧 Full TypeScript support - Complete type safety
- 🌐 Enhanced web support - GitHub API integration
Quick Start:
import { fetch } from 'codefetch-sdk';
// Local codebase
const localResult = await fetch({
  source: './src',
  extensions: ['.ts', '.tsx'],
  maxTokens: 50000,
});
// GitHub repository
const repoResult = await fetch({
  source: 'https://github.com/facebook/react',
  branch: 'main',
  extensions: ['.js', '.ts'],
});
console.log(repoResult.markdown); // AI-ready markdown
Read the full SDK documentation →
Cloudflare Workers optimized build - Zero file system dependencies.
import { fetch } from 'codefetch-sdk/worker';
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await fetch({
      source: 'https://github.com/vercel/next.js',
      maxFiles: 50,
      extensions: ['.ts', '.tsx'],
    });
    return new Response(result.markdown, {
      headers: { 'Content-Type': 'text/markdown' }
    });
  }
};
Features:
- 🚀 Zero nodejs_compat required - Uses native Web APIs
- 📦 35.4KB bundle size - Optimized for edge performance
- 🔒 Private repo support - GitHub token authentication
- 🌊 Native streaming - Memory-efficient processing
Read the full Worker documentation →
codefetch-mcp-server (Coming soon)
Model Context Protocol server for AI assistants like Claude.
Features:
- MCP tools for codebase analysis
- Direct integration with Claude Desktop
- Token counting tools
- Configurable via environment variables
Read the full MCP documentation →
Initialize your project with codefetch:
npx codefetch init
This will:
- Create a .codefetchignore file for excluding files
- Generate a codefetch.config.mjs with your preferences
- Set up the project structure
Create a .codefetchrc file in your project root:
{
  "extensions": [".ts", ".tsx", ".js", ".jsx"],
  "excludeDirs": ["node_modules", "dist", "coverage"],
  "maxTokens": 100000,
  "outputFile": "codebase.md",
  "tokenEncoder": "cl100k"
}
Or use codefetch.config.mjs for more control:
export default {
  // Output settings
  outputPath: "codefetch",
  outputFile: "codebase.md",
  maxTokens: 999_000,
  // Processing options
  projectTree: 2,
  tokenEncoder: "cl100k",
  tokenLimiter: "truncated",
  // File filtering
  extensions: [".ts", ".js"],
  excludeDirs: ["test", "dist"],
  // AI/LLM settings
  trackedModels: ["gpt-4", "claude-3-opus", "gpt-3.5-turbo"],
};
- X/Twitter: @kregenrek
- Bluesky: @kevinkern.dev
- Learn Cursor AI: Ultimate Cursor Course
- Learn to build software with AI: AI Builder Hub
- aidex - AI model information
This project was inspired by