Generate an llms.txt that gives AI models a clean, structured summary of your website or docs. It works with any site and has first-class helpers for popular frameworks (Vite, Next.js, Nuxt, Astro, Remix, SvelteKit, Angular), plus a docs generator for Markdown/MDX.
Node.js 18+ is required.
- Clear signal for AI: Produce a compact, consistent llms.txt that lists your important pages with key metadata, headings, and structured data.
- Multiple input modes: Crawl a live site, read a sitemap, scan static builds, or run framework-aware adapters without extra setup.
- Docs-first: Generate llms.txt and llms-full.txt directly from Markdown/MDX, including optional sectioned link lists and concatenated context files.
- Robots made easy: Generate a robots.txt that explicitly allows popular search and LLM crawlers, and auto-includes your sitemap.
```bash
npm install --save-dev llmoptimizer
```
Pick the scenario that matches your project. All commands write llms.txt by default.
```bash
# 1) Crawl production
npx llmoptimizer generate --url https://example.com --out public/llms.txt --max-pages 200

# 2) Use a sitemap
npx llmoptimizer generate --sitemap https://example.com/sitemap.xml --out llms.txt

# 3) Scan a static export (e.g., Next.js out/)
npx llmoptimizer generate --root ./out --out ./out/llms.txt

# 4) Build-scan (no crawling): search common build dirs for HTML
npx llmoptimizer generate --build-scan --project-root . --out llms.txt

# 5) Docs (Markdown/MDX) → llms.txt + llms-full.txt + stats
npx llmoptimizer docs --docs-dir docs --out-dir build --site-url https://example.com --base-url /

# 6) Autodetect best mode (docs → build-scan → adapter → crawl)
npx llmoptimizer auto --url https://example.com

# 7) Generate robots.txt that allows search + LLM crawlers
npx llmoptimizer robots --out public/robots.txt --sitemap https://example.com/sitemap.xml
```
Common flags:
- `--format markdown|json` (default: markdown)
- `--include <glob...>` / `--exclude <glob...>` to filter routes/files
- `--concurrency <n>` and `--delay-ms <ms>` for performance/throttling
- `--no-robots` to skip robots.txt checks in network modes
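For example, a filtered, throttled crawl combining these flags (the glob patterns are illustrative):

```bash
npx llmoptimizer generate --url https://example.com \
  --include "/docs/*" --exclude "/admin/*" \
  --concurrency 4 --delay-ms 250 \
  --format markdown --out public/llms.txt
```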
llmoptimizer extracts and summarizes the signals that matter to AI and search.
- Site summary: base URL, generation time, totals
- Per page (varies by mode):
  - Basics: URL, title, description, canonical
  - Metadata: robots meta, keywords, social (OpenGraph/Twitter)
  - Structure: H1–H4 headings, snippets, estimated words/tokens
  - Links/media: internal/external link counts, images, missing alt counts
  - Structured data: schema.org JSON-LD types summary
Docs mode also emits:
- `llms.txt`: sectioned link list (or auto-grouped) with a short intro
- `llms-full.txt`: concatenated cleaned content for all docs
- `llms-stats.json`: headings, words, token estimates per doc + totals
- Optional: `llms-ctx.txt` and `llms-ctx-full.txt` context bundles
Use `--theme structured` (or `render.theme: 'structured'` in config) for a more LLM-friendly, categorized Markdown output. It includes:
- Site header with base URL, locales, page count, and totals.
- Categories (Home, Docs, Guides, API, Blog, etc.) with counts and an index.
- Per-page JSON metadata blocks (url/title/description/canonical/locale/metrics/alternates/OG/Twitter) followed by concise headings, links, and images samples.
Example:
```md
Base URL: https://example.com
Generated: 2025-08-27
Pages: 42
Totals: words=12345 images=120 missingAlt=3 internalLinks=420 externalLinks=88

- Docs: 20
- Guides: 8
- Blog: 5
- Other: 9

{
  "url": "https://example.com/docs/getting-started",
  "title": "Getting Started",
  "metrics": { "wordCount": 950 }
}

- Headings:
  - H1: Getting Started
  - H2: Installation
```
- Generate from a site/build
```bash
npx llmoptimizer generate [options]

# Modes
--url <https://...>          # crawl production (obeys robots by default)
--sitemap <url>              # seed from sitemap.xml
--root <dir>                 # scan a static export/build dir for HTML
--build-scan                 # scan common build dirs under --project-root
--adapter --project-root .   # framework-aware route fetch (when supported)

# Output & format
--out <file>                 # default: llms.txt
--format markdown|json
--theme default|compact|detailed|structured   # default: structured

# Filtering & perf
--include <glob...> --exclude <glob...>
--max-pages <n> --concurrency <n> --delay-ms <ms>
--no-robots
```
- Debug dump (routes/build/sample)
```bash
npx llmoptimizer dump \
  --project-root . \
  --base-url https://example.com --sample 5 \
  --scan-build --build-dirs dist .next/server/pages \
  --framework-details \
  --include "/docs/*" --exclude "/admin/*" \
  --out dump.json
```
Outputs JSON including:
- Adapter detection and basic routes/params
- Next.js extractor details (when applicable)
- Framework details (when `--framework-details`):
  - SvelteKit: filesystem-derived route patterns + param names + example blog slugs
  - Nuxt: pages/ routes (Nuxt 2 underscore + Nuxt 3 bracket), i18n locales (best-effort), content/blog slugs
  - Remix: app/routes routes (dotted segments, $params, pathless parentheses), param names
  - Angular: `angular.json` outputPath, extracted `path:` entries and `loadChildren` hints
- Optional build scan results
- Optional sample of fetched pages when `--base-url` is provided
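Since the dump is plain JSON, standard tools work for a quick look; for instance (making no assumptions about the exact keys):

```bash
# List the top-level sections present in this particular dump
jq 'keys' dump.json
```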
- Docs (Markdown/MDX) → llms files
```bash
npx llmoptimizer docs \
  --docs-dir docs --out-dir build --site-url https://example.com --base-url / \
  --include-blog --blog-dir blog \
  --ignore "advanced/*" "private/*" \
  --order "getting-started/*" "guides/*" "api/*" \
  --ignore-path docs --add-path api \
  --exclude-imports --remove-duplicate-headings \
  --generate-markdown-files \
  --emit-ctx --ctx-out llms-ctx.txt --ctx-full-out llms-ctx-full.txt \
  --llms-filename llms.txt --llms-full-filename llms-full.txt \
  --stats-file llms-stats.json \
  --title "Your Docs" --description "Great docs" --version 1.0.0 \
  --sections-file ./examples/sections.json \
  --optional-links-file ./examples/optional-links.json
```
What “sections” mean:
- You can provide explicit sections as JSON (see `examples/sections.json`).
- Or omit them and let auto-sections group content like Getting Started, Guides, API, Tutorials, Reference.
- “Optional” links are supported via a separate JSON file (see `examples/optional-links.json`); a sketch follows this list.
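The authoritative schemas are the files in `examples/`; purely as a sketch of the concept (these field names are guesses, not the documented format), a sections file maps section titles to doc globs and an optional-links file lists extra links:

```jsonc
// sections.json — hypothetical shape; see examples/sections.json for the real format
[
  { "title": "Getting Started", "patterns": ["getting-started/*"] },
  { "title": "API", "patterns": ["api/*"] }
]

// optional-links.json — hypothetical shape; see examples/optional-links.json
[
  { "title": "Community chat", "url": "https://example.com/chat" }
]
```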
- Autodetect best mode
```bash
npx llmoptimizer auto \
  --project-root . \
  --url https://example.com \
  --out llms.txt --format markdown --concurrency 8 --max-pages 200 --delay-ms 0
```
- Robots.txt generator
```bash
npx llmoptimizer robots \
  --out public/robots.txt \
  --sitemap https://example.com/sitemap.xml \
  --no-allow-all      # optional: do not add Allow: /
  --no-llm-allow      # optional: skip explicit LLM bot allow-list
  --no-search-allow   # optional: skip search bot allow-list
  --search-bot Googlebot --search-bot Bingbot   # override bots
```
It allows popular LLM crawlers (e.g., GPTBot, Google‑Extended, Claude, Perplexity, CCBot, Applebot‑Extended, Meta‑ExternalAgent, Amazonbot, Bytespider) and mainstream search bots (Googlebot, Bingbot, DuckDuckBot, Slurp, Baiduspider, YandexBot).
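The output is an ordinary robots.txt; a sketch of what the generated file might look like with defaults (the exact bot list and ordering are up to the generator, so treat this as illustrative):

```text
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: Googlebot
Allow: /

# ...one block per allowed LLM/search bot...

Sitemap: https://example.com/sitemap.xml
```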
Create `llmoptimizer.config.ts` if you prefer persistent defaults over repeating CLI flags. Structured is the default theme.
```ts
// llmoptimizer.config.ts
import { defineConfig } from 'llmoptimizer'

export default defineConfig({
  baseUrl: 'https://example.com',
  obeyRobots: true,
  maxPages: 200,
  concurrency: 8,
  network: { delayMs: 100, sitemap: { concurrency: 6, delayMs: 50 } },
  // Themes: 'default' | 'compact' | 'detailed' | 'structured'
  render: {
    theme: 'structured',
    // Optional: customize structured output
    structured: {
      limits: { headings: 16, links: 12, images: 8 },
      categories: {
        // Control section order
        order: ['Home', 'Products', 'Product Categories', 'Docs', 'Guides', 'API', 'Policies', 'Important', 'Blog', 'Company', 'Legal', 'Support', 'Examples', 'Other'],
        // Keyword mapping: match in URL path or H1
        keywords: {
          Products: ['product', 'pricing', 'features'],
          'Product Categories': ['category', 'categories', 'catalog', 'collection'],
          Policies: ['privacy', 'terms', 'cookies', 'policy', 'policies', 'security', 'gdpr'],
          Important: ['status', 'uptime', 'login', 'signup', 'contact'],
        },
      },
    },
  },
  output: { file: 'public/llms.txt', format: 'markdown' },
  robots: {
    outFile: 'public/robots.txt',
    allowAll: true,
    llmAllow: true,
    searchAllow: true,
    sitemaps: ['https://example.com/sitemap.xml'],
  },
})
```
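With the config file in place, the CLI commands can presumably be run without repeating those flags (assuming automatic config discovery, which “defaults on the CLI” above implies):

```bash
npx llmoptimizer generate   # picks up baseUrl, output, and theme from llmoptimizer.config.ts
npx llmoptimizer robots     # picks up the robots block
```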
All integrations default to writing llms.txt. You can switch to JSON via `format: 'json'`.
- Vite (React/Vue/Svelte/Solid/Preact)

```ts
// vite.config.ts
import { defineConfig } from 'vite'
import { llmOptimizer } from 'llmoptimizer/vite'

export default defineConfig({
  plugins: [
    llmOptimizer({
      mode: 'static', // or 'crawl' with baseUrl
      robots: { outFile: 'dist/robots.txt' },
    }),
  ],
})
```
- Next.js

```ts
// scripts/postbuild-llm.ts
import { runAfterNextBuild } from 'llmoptimizer/next'

await runAfterNextBuild({
  projectRoot: process.cwd(),
  baseUrl: process.env.NEXT_PUBLIC_SITE_URL || 'https://yourdomain.com',
  outFile: 'public/llms.txt',
  // Choose the strategy:
  // - static: build-scan (.next/server/*, out) with baseUrl mapping → adapter → crawl
  // - adapter: fetch detected routes from baseUrl → build-scan → crawl
  // - crawl: breadth-first crawl baseUrl
  mode: 'static',
  robots: true,
  log: true,
})

// package.json
// { "scripts": { "postbuild": "node scripts/postbuild-llm.ts" } }
```
- Nuxt 3 (Nitro)

```ts
// nuxt.config.ts
export default defineNuxtConfig({
  modules: [[
    'llmoptimizer/nuxt',
    {
      // static: build-scan on .output/public with baseUrl mapping → crawl fallback
      mode: 'static',
      baseUrl: process.env.NUXT_PUBLIC_SITE_URL || 'https://yourdomain.com',
      robots: true,
    },
  ]],
})
```
- Astro

```js
// astro.config.mjs
import { defineConfig } from 'astro/config'
import llm from 'llmoptimizer/astro'

export default defineConfig({
  integrations: [
    llm({
      // static: build-scan on dist with baseUrl mapping → crawl fallback
      mode: 'static',
      baseUrl: process.env.SITE_URL,
      robots: true,
    })
  ]
})
```
- Remix

```js
// scripts/postbuild-llm.mjs
import { runAfterRemixBuild } from 'llmoptimizer/remix'

await runAfterRemixBuild({
  // static: build-scan on public with baseUrl mapping → crawl fallback
  mode: 'static',
  baseUrl: process.env.SITE_URL || 'https://your.app',
  outFile: 'public/llms.txt',
  robots: true,
})
```
- SvelteKit

```js
// scripts/sveltekit-postbuild-llm.mjs
import { runAfterSvelteKitBuild } from 'llmoptimizer/sveltekit'

await runAfterSvelteKitBuild({
  // static: scan 'build' and map to URLs using baseUrl → crawl fallback if SSR-only
  mode: 'static',
  buildDir: 'build',
  baseUrl: process.env.SITE_URL || 'https://your.app',
  outFile: 'build/llms.txt',
  theme: 'structured',
  // Optional filters and structured theme options
  // include: ['/docs/*'], exclude: ['/admin/*'],
  // renderOptions: { limits: { headings: 12, links: 10, images: 6 } },
  robots: { outFile: 'build/robots.txt' },
})

// package.json → { "scripts": { "postbuild": "node scripts/sveltekit-postbuild-llm.mjs" } }
```
- Angular

```js
// scripts/angular-postbuild-llm.mjs
import { runAfterAngularBuild } from 'llmoptimizer/angular'

await runAfterAngularBuild({
  // static: scan Angular dist output; distDir auto-detected from angular.json when omitted
  mode: 'static',
  baseUrl: process.env.SITE_URL || 'https://your.app',
  theme: 'structured',
  // Optional: distDir: 'dist/your-project/browser'
  // include/exclude and renderOptions are supported
  robots: { outFile: 'dist/robots.txt' },
})

// package.json → { "scripts": { "postbuild": "node scripts/angular-postbuild-llm.mjs" } }
```
- Generic Node script

```ts
// scripts/postbuild-llm.ts
import { runAfterBuild } from 'llmoptimizer/node'

await runAfterBuild({
  // static: build-scan on dist with baseUrl mapping → crawl fallback
  mode: 'static',
  rootDir: 'dist',
  baseUrl: process.env.SITE_URL,
  robots: true,
})
```
- Generic Node/SSR

```js
// scripts/postbuild-llm.mjs
import { runAfterBuild } from 'llmoptimizer/node'

await runAfterBuild({ mode: 'crawl', baseUrl: 'https://yourdomain.com', outFile: 'llms.txt' })
```
Use the CLI or the API. The integration cleans content, removes duplicate headings, optionally inlines local partials, and can generate cleaned per-doc `.md` files.
Programmatic example:
```js
// scripts/generate-docs-llm.mjs
import { docsLLMs } from 'llmoptimizer/docs'

const plugin = docsLLMs({
  docsDir: 'docs',
  includeBlog: true,
  ignoreFiles: ['advanced/*', 'private/*'],
  includeOrder: ['getting-started/*', 'guides/*', 'api/*'],
  pathTransformation: { ignorePaths: ['docs'], addPaths: ['api'] },
  excludeImports: true,
  removeDuplicateHeadings: true,
  generateMarkdownFiles: true,
  autoSections: true,
  // Optional: explicit sections/links
  // sections: [...],
  // optionalLinks: [...],
})

await plugin.postBuild({
  outDir: 'build',
  siteConfig: { url: 'https://example.com', baseUrl: '/', title: 'Docs', tagline: 'Great docs' },
})
```
Outputs in `build/`:
- `llms.txt` and `llms-full.txt`
- `llms-stats.json` with word/token estimates
- Optionally `llms-ctx.txt` and `llms-ctx-full.txt` (when `emitCtx` is set)
- Optional cleaned per-doc `.md` files used for link targets

See `examples/sections.json` and `examples/optional-links.json` for input formats.
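To regenerate these on every build, the script can be wired into a postbuild hook, mirroring the framework helpers (the script name is just the example file above):

```json
{
  "scripts": {
    "postbuild": "node scripts/generate-docs-llm.mjs"
  }
}
```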
Prefer one helper that “just works”? Use the auto integration in a postbuild script. It picks from docs → build → adapter → crawl based on your repo and writes the right output.
```js
// scripts/auto-llm.mjs
import { autoPostbuild } from 'llmoptimizer/auto'

const res = await autoPostbuild({ baseUrl: 'https://example.com', log: true })
console.log(res) // { mode: 'docs'|'build'|'adapter'|'crawl', outPath: '...' }
```
Add to package.json: `{ "scripts": { "postbuild": "node scripts/auto-llm.mjs" } }`.
Notes
- Absolute links: internal links, canonical, hreflang, and images are resolved to absolute URLs using the page URL. Pass `baseUrl` in static/build-scan modes to avoid file:// URLs.
- Build-scan coverage: when `baseUrl` is provided, build-scan enriches routes using framework artifacts (e.g., Next prerender/routes manifests) and falls back to sitemap or crawl if empty.
- Adapter vs static: adapter fetches via HTTP from `baseUrl` (requires a reachable server). Static uses build output folders and does not require a running server.
Examples
- Next postbuild: `examples/next-postbuild-llm.mjs`
- Auto detection: `examples/auto-llm.mjs`
- Nuxt config: `examples/nuxt.config.ts`
- Astro config: `examples/astro.config.mjs`
- Remix postbuild: `examples/remix-postbuild-llm.mjs`
- Vite config: `examples/vite.config.mjs`
- Generic Node postbuild: `examples/node-postbuild-llm.mjs`
- SvelteKit postbuild: `examples/sveltekit-postbuild-llm.mjs`
- Angular postbuild: `examples/angular-postbuild-llm.mjs`
Content best practices
- Titles and descriptions: ensure every page has a good `<title>` and meta description.
- Structured data: use JSON-LD for key entities; we summarize types in the output (see the snippet after this list).
- Headings: keep H1–H3 clear and scannable; these are extracted.
- Internationalization: use `<html lang>` and `hreflang` alternates when applicable.
- Sitemaps: keep `sitemap.xml` fresh for coverage.
- Robots: use the robots generator to allow search + LLM crawlers on public content.
Troubleshooting
- Empty or few pages: check `--include`/`--exclude` filters and robots settings; try `--no-robots` for testing.
- Dynamic routes (adapter mode): provide sample params or ensure your framework exposes discoverable routes.
- Rate limits: lower `--concurrency` and add `--delay-ms` when crawling.
- Wrong links in docs mode: adjust `--ignore-path`/`--add-path` or provide `--site-url`/`--base-url`.
- Email: ihuzaifashoukat@gmail.com
- GitHub: https://github.com/ihuzaifashoukat
License: MIT