
Generate clean, AI-ready llms.txt files for your website or docs. Supports crawling, sitemaps, static builds, and framework-aware adapters (Next.js, Vite, Nuxt, Astro, Remix). Includes Markdown/MDX docs mode and robots.txt generator for LLM and search crawlers.


llmoptimizer

Generate an llms.txt that gives AI models a clean, structured summary of your website or docs. It works with any site and has first-class helpers for popular frameworks (Vite, Next.js, Nuxt, Astro, Remix), plus a docs generator for Markdown/MDX.

Node.js 18+ is required.


Why This Matters

  • Clear signal for AI: Produce a compact, consistent llms.txt that lists your important pages with key metadata, headings, and structured data.
  • Multiple input modes: Crawl a live site, read a sitemap, scan static builds, or run framework-aware adapters without extra setup.
  • Docs-first: Generate llms.txt and llms-full.txt directly from Markdown/MDX, including optional sectioned link lists and concatenated context files.
  • Robots made easy: Generate a robots.txt that explicitly allows popular search and LLM crawlers, and auto-includes your sitemap.

Install

npm install --save-dev llmoptimizer

Quick Starts

Pick the scenario that matches your project. All commands write llms.txt by default.

# 1) Crawl production
npx llmoptimizer generate --url https://example.com --out public/llms.txt --max-pages 200

# 2) Use a sitemap
npx llmoptimizer generate --sitemap https://example.com/sitemap.xml --out llms.txt

# 3) Scan a static export (e.g., Next.js out/)
npx llmoptimizer generate --root ./out --out ./out/llms.txt

# 4) Build-scan (no crawling): search common build dirs for HTML
npx llmoptimizer generate --build-scan --project-root . --out llms.txt

# 5) Docs (Markdown/MDX) → llms.txt + llms-full.txt + stats
npx llmoptimizer docs --docs-dir docs --out-dir build --site-url https://example.com --base-url /

# 6) Autodetect best mode (docs → build-scan → adapter → crawl)
npx llmoptimizer auto --url https://example.com

# 7) Generate robots.txt that allows search + LLM crawlers
npx llmoptimizer robots --out public/robots.txt --sitemap https://example.com/sitemap.xml

Common flags:

  • --format markdown|json (default markdown)
  • --include <glob...> / --exclude <glob...> to filter routes/files
  • --concurrency <n> and --delay-ms <ms> for performance/throttling
  • --no-robots to skip robots.txt checks in network modes

What llmoptimizer Generates

llmoptimizer extracts and summarizes the signals that matter to AI and search.

  • Site summary: base URL, generation time, totals
  • Per page (varies by mode):
    • Basics: URL, title, description, canonical
    • Metadata: robots meta, keywords, social (OpenGraph/Twitter)
    • Structure: H1–H4 headings, snippets, estimated words/tokens
    • Links/media: internal/external link counts, images, missing alt counts
    • Structured data: schema.org JSON‑LD types summary
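The word and token figures are estimates. llmoptimizer's exact formulas aren't documented here, but a common heuristic is roughly four characters per token; a minimal sketch of that mental model (illustrative only, not the tool's actual code):

```javascript
// Rough word/token estimators, assuming the common ~4-characters-per-token
// heuristic. llmoptimizer's internal formulas may differ.
function estimateWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const sample = 'Getting started with llmoptimizer is straightforward.';
console.log(estimateWords(sample));  // 6 words
console.log(estimateTokens(sample));
```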

Docs mode also emits:

  • llms.txt: Sectioned link list (or auto-grouped) with a short intro
  • llms-full.txt: Concatenated cleaned content for all docs
  • llms-stats.json: Headings, words, token estimates per doc + totals
  • Optional: llms-ctx.txt and llms-ctx-full.txt context bundles

Structured theme

Use --theme structured (or render.theme: 'structured' in config) for a more LLM-friendly, categorized Markdown output. It includes:

  • Site header with base URL, locales, page count, and totals.
  • Categories (Home, Docs, Guides, API, Blog, etc.) with counts and an index.
  • Per-page JSON metadata blocks (url/title/description/canonical/locale/metrics/alternates/OG/Twitter) followed by concise headings, links, and images samples.

Example:

llms.txt — Structured Site Summary

Base URL: https://example.com
Generated: 2025-08-27
Pages: 42
Totals: words=12345 images=120 missingAlt=3 internalLinks=420 externalLinks=88

Categories

  • Docs: 20
  • Guides: 8
  • Blog: 5
  • Other: 9

Docs (20)

Getting Started

{ "url": "https://example.com/docs/getting-started", "title": "Getting Started", "metrics": { "wordCount": 950 } }
  • Headings:
    • H1: Getting Started
    • H2: Installation

CLI Overview

  1. Generate from a site/build
npx llmoptimizer generate [options]

# Modes
  --url <https://...>           # crawl production (obeys robots by default)
  --sitemap <url>               # seed from sitemap.xml
  --root <dir>                  # scan a static export/build dir for HTML
  --build-scan                  # scan common build dirs under --project-root
  --adapter --project-root .    # framework-aware route fetch (when supported)

# Output & format
  --out <file>                  # default: llms.txt
  --format markdown|json
  --theme default|compact|detailed|structured   # default: structured

# Filtering & perf
  --include <glob...> --exclude <glob...>
  --max-pages <n> --concurrency <n> --delay-ms <ms>
  --no-robots
  2. Debug dump (routes/build/sample)
npx llmoptimizer dump \
  --project-root . \
  --base-url https://example.com --sample 5 \
  --scan-build --build-dirs dist .next/server/pages \
  --framework-details \
  --include "/docs/*" --exclude "/admin/*" \
  --out dump.json

Outputs JSON including:

  • Adapter detection and basic routes/params
  • Next.js extractor details (when applicable)
  • Framework details (when --framework-details):
    • SvelteKit: filesystem-derived route patterns + param names + example blog slugs
    • Nuxt: pages/ routes (Nuxt 2 underscore + Nuxt 3 bracket), i18n locales (best-effort), content/blog slugs
    • Remix: app/routes routes (dotted segments, $params, pathless parentheses), param names
    • Angular: angular.json outputPath, extracted path: entries and loadChildren hints
  • Optional build scan results
  • Optional sample of fetched pages when --base-url is provided
  3. Docs (Markdown/MDX) → llms files
npx llmoptimizer docs \
  --docs-dir docs --out-dir build --site-url https://example.com --base-url / \
  --include-blog --blog-dir blog \
  --ignore "advanced/*" "private/*" \
  --order "getting-started/*" "guides/*" "api/*" \
  --ignore-path docs --add-path api \
  --exclude-imports --remove-duplicate-headings \
  --generate-markdown-files \
  --emit-ctx --ctx-out llms-ctx.txt --ctx-full-out llms-ctx-full.txt \
  --llms-filename llms.txt --llms-full-filename llms-full.txt \
  --stats-file llms-stats.json \
  --title "Your Docs" --description "Great docs" --version 1.0.0 \
  --sections-file ./examples/sections.json \
  --optional-links-file ./examples/optional-links.json

What “sections” mean:

  • You can provide explicit sections as JSON (see examples/sections.json).
  • Or omit them and let auto-sections group content like Getting Started, Guides, API, Tutorials, Reference.
  • “Optional” links are supported via a separate JSON file (see examples/optional-links.json).
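The exact schema of examples/sections.json isn't reproduced here; a hypothetical sketch of the general shape such a file might take (section and field names are illustrative, not the tool's confirmed format — check examples/sections.json in the repo for the real schema):

```json
[
  {
    "name": "Getting Started",
    "links": [
      { "title": "Installation", "url": "/docs/installation" },
      { "title": "Quick Start", "url": "/docs/quick-start" }
    ]
  },
  {
    "name": "Guides",
    "links": [
      { "title": "Deploying", "url": "/guides/deploying" }
    ]
  }
]
```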
  4. Autodetect best mode
npx llmoptimizer auto \
  --project-root . \
  --url https://example.com \
  --out llms.txt --format markdown --concurrency 8 --max-pages 200 --delay-ms 0
  5. Robots.txt generator
npx llmoptimizer robots \
  --out public/robots.txt \
  --sitemap https://example.com/sitemap.xml \
  --no-allow-all        # optional: do not add Allow: /
  --no-llm-allow        # optional: skip explicit LLM bot allow-list
  --no-search-allow     # optional: skip search bot allow-list
  --search-bot Googlebot --search-bot Bingbot  # override bots

It allows popular LLM crawlers (e.g., GPTBot, Google‑Extended, Claude, Perplexity, CCBot, Applebot‑Extended, Meta‑ExternalAgent, Amazonbot, Bytespider) and mainstream search bots (Googlebot, Bingbot, DuckDuckBot, Slurp, Baiduspider, YandexBot).
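The emitted file follows standard robots.txt syntax; a truncated illustration of its general shape (the exact bot list and ordering in the real output may differ):

```
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```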


Configuration (optional)

Create llmoptimizer.config.ts to set project defaults instead of repeating CLI flags. structured is the default theme.

// llmoptimizer.config.ts
import { defineConfig } from 'llmoptimizer'

export default defineConfig({
  baseUrl: 'https://example.com',
  obeyRobots: true,
  maxPages: 200,
  concurrency: 8,
  network: { delayMs: 100, sitemap: { concurrency: 6, delayMs: 50 } },
  // Themes: 'default' | 'compact' | 'detailed' | 'structured'
  render: {
    theme: 'structured',
    // Optional: customize structured output
    structured: {
      limits: { headings: 16, links: 12, images: 8 },
      categories: {
        // Control section order
        order: ['Home', 'Products', 'Product Categories', 'Docs', 'Guides', 'API', 'Policies', 'Important', 'Blog', 'Company', 'Legal', 'Support', 'Examples', 'Other'],
        // Keyword mapping: match in URL path or H1
        keywords: {
          Products: ['product', 'pricing', 'features'],
          'Product Categories': ['category', 'categories', 'catalog', 'collection'],
          Policies: ['privacy', 'terms', 'cookies', 'policy', 'policies', 'security', 'gdpr'],
          Important: ['status', 'uptime', 'login', 'signup', 'contact'],
        },
      },
    },
  },
  output: { file: 'public/llms.txt', format: 'markdown' },
  robots: {
    outFile: 'public/robots.txt',
    allowAll: true,
    llmAllow: true,
    searchAllow: true,
    sitemaps: ['https://example.com/sitemap.xml'],
  },
})

Framework Integrations

All integrations default to writing llms.txt. You can swap to JSON via format: 'json'.

  • Vite (React/Vue/Svelte/Solid/Preact)

    // vite.config.ts
    import { defineConfig } from 'vite'
    import { llmOptimizer } from 'llmoptimizer/vite'
    
    export default defineConfig({
      plugins: [
        llmOptimizer({
          mode: 'static', // or 'crawl' with baseUrl
          robots: { outFile: 'dist/robots.txt' },
        }),
      ],
    })
  • Next.js

    // scripts/postbuild-llm.ts
    import { runAfterNextBuild } from 'llmoptimizer/next'
    await runAfterNextBuild({
      projectRoot: process.cwd(),
      baseUrl: process.env.NEXT_PUBLIC_SITE_URL || 'https://yourdomain.com',
      outFile: 'public/llms.txt',
      // Choose the strategy:
      // - static: build-scan (.next/server/*, out) with baseUrl mapping → adapter → crawl
      // - adapter: fetch detected routes from baseUrl → build-scan → crawl
      // - crawl: breadth-first crawl baseUrl
      mode: 'static',
      robots: true,
      log: true,
    })
    // package.json
    // { "scripts": { "postbuild": "node scripts/postbuild-llm.ts" } }
  • Nuxt 3 (Nitro)

    // nuxt.config.ts
    export default defineNuxtConfig({
      modules: [[
        'llmoptimizer/nuxt',
        {
          // static: build-scan on .output/public with baseUrl mapping → crawl fallback
          mode: 'static',
          baseUrl: process.env.NUXT_PUBLIC_SITE_URL || 'https://yourdomain.com',
          robots: true,
        },
      ]],
    })
  • Astro

    // astro.config.mjs
    import { defineConfig } from 'astro/config'
    import llm from 'llmoptimizer/astro'
    export default defineConfig({
      integrations: [
        llm({
          // static: build-scan on dist with baseUrl mapping → crawl fallback
          mode: 'static',
          baseUrl: process.env.SITE_URL,
          robots: true,
        })
      ]
    })
  • Remix

    // scripts/postbuild-llm.mjs
    import { runAfterRemixBuild } from 'llmoptimizer/remix'
    await runAfterRemixBuild({
      // static: build-scan on public with baseUrl mapping → crawl fallback
      mode: 'static',
      baseUrl: process.env.SITE_URL || 'https://your.app',
      outFile: 'public/llms.txt',
      robots: true,
    })
  • SvelteKit

    // scripts/sveltekit-postbuild-llm.mjs
    import { runAfterSvelteKitBuild } from 'llmoptimizer/sveltekit'
    await runAfterSvelteKitBuild({
      // static: scan 'build' and map to URLs using baseUrl → crawl fallback if SSR-only
      mode: 'static',
      buildDir: 'build',
      baseUrl: process.env.SITE_URL || 'https://your.app',
      outFile: 'build/llms.txt',
      theme: 'structured',
      // Optional filters and structured theme options
      // include: ['/docs/*'], exclude: ['/admin/*'],
      // renderOptions: { limits: { headings: 12, links: 10, images: 6 } },
      robots: { outFile: 'build/robots.txt' },
    })
    // package.json → { "scripts": { "postbuild": "node scripts/sveltekit-postbuild-llm.mjs" } }
  • Angular

    // scripts/angular-postbuild-llm.mjs
    import { runAfterAngularBuild } from 'llmoptimizer/angular'
    await runAfterAngularBuild({
      // static: scan Angular dist output; distDir auto-detected from angular.json when omitted
      mode: 'static',
      baseUrl: process.env.SITE_URL || 'https://your.app',
      theme: 'structured',
      // Optional: distDir: 'dist/your-project/browser'
      // include/exclude and renderOptions are supported
      robots: { outFile: 'dist/robots.txt' },
    })
    // package.json → { "scripts": { "postbuild": "node scripts/angular-postbuild-llm.mjs" } }
  • Generic Node script

    // scripts/postbuild-llm.ts
    import { runAfterBuild } from 'llmoptimizer/node'
    await runAfterBuild({
      // static: build-scan on dist with baseUrl mapping → crawl fallback
      mode: 'static',
      rootDir: 'dist',
      baseUrl: process.env.SITE_URL,
      robots: true,
    })
  • Generic Node/SSR

    // scripts/postbuild-llm.mjs
    import { runAfterBuild } from 'llmoptimizer/node'
    await runAfterBuild({ mode: 'crawl', baseUrl: 'https://yourdomain.com', outFile: 'llms.txt' })

Docs Integration Details (Markdown/MDX)

Use the CLI or the API. The integration cleans content, removes duplicate headings, optionally inlines local partials, and can generate cleaned per-doc .md files.

Programmatic example:

// scripts/generate-docs-llm.mjs
import { docsLLMs } from 'llmoptimizer/docs'

const plugin = docsLLMs({
  docsDir: 'docs',
  includeBlog: true,
  ignoreFiles: ['advanced/*', 'private/*'],
  includeOrder: ['getting-started/*', 'guides/*', 'api/*'],
  pathTransformation: { ignorePaths: ['docs'], addPaths: ['api'] },
  excludeImports: true,
  removeDuplicateHeadings: true,
  generateMarkdownFiles: true,
  autoSections: true,
  // Optional: explicit sections/links
  // sections: [...],
  // optionalLinks: [...],
})

await plugin.postBuild({
  outDir: 'build',
  siteConfig: { url: 'https://example.com', baseUrl: '/', title: 'Docs', tagline: 'Great docs' },
})

Outputs in build/:

  • llms.txt and llms-full.txt
  • llms-stats.json with word/token estimates
  • Optionally llms-ctx.txt and llms-ctx-full.txt (when emitCtx)
  • Optional cleaned per-doc .md files used for link targets

See examples/sections.json and examples/optional-links.json for input formats.


Smart Autoregistration (Auto)

Prefer one helper that “just works”? Use the auto integration in a postbuild script. It picks from docs → build → adapter → crawl based on your repo and writes the right output.

// scripts/auto-llm.mjs
import { autoPostbuild } from 'llmoptimizer/auto'
const res = await autoPostbuild({ baseUrl: 'https://example.com', log: true })
console.log(res) // { mode: 'docs'|'build'|'adapter'|'crawl', outPath: '...' }

Add to package.json: { "scripts": { "postbuild": "node scripts/auto-llm.mjs" } }.

Notes

  • Absolute links: Internal links, canonical, hreflang, and images are resolved to absolute URLs using the page URL. Pass baseUrl in static/build-scan modes to avoid file:// URLs.
  • Build-scan coverage: When baseUrl is provided, build-scan enriches routes using framework artifacts (e.g., Next prerender/routes manifests) and falls back to sitemap or crawl if empty.
  • Adapter vs static: Adapter fetches via HTTP from baseUrl (requires a reachable server). Static uses build output folders and does not require a running server.
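The absolute-link resolution described above follows standard URL semantics; a minimal illustration using the WHATWG URL class built into Node.js (not llmoptimizer's internal code):

```javascript
// Resolve a relative href found on a page against that page's URL,
// as any absolute-link pass must do.
const pageUrl = 'https://example.com/docs/getting-started';

const link = new URL('../guides/setup', pageUrl).toString();
console.log(link); // https://example.com/guides/setup

// Root-relative paths resolve against the origin — which is why passing
// baseUrl in static/build-scan modes avoids file:// URLs in the output.
const canonical = new URL('/docs/getting-started', pageUrl).toString();
console.log(canonical); // https://example.com/docs/getting-started
```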

Examples

  • Next postbuild: examples/next-postbuild-llm.mjs
  • Auto detection: examples/auto-llm.mjs
  • Nuxt config: examples/nuxt.config.ts
  • Astro config: examples/astro.config.mjs
  • Remix postbuild: examples/remix-postbuild-llm.mjs
  • Vite config: examples/vite.config.mjs
  • Generic Node postbuild: examples/node-postbuild-llm.mjs
  • SvelteKit postbuild: examples/sveltekit-postbuild-llm.mjs
  • Angular postbuild: examples/angular-postbuild-llm.mjs

Best Practices

  • Titles and descriptions: Ensure every page has good <title> and meta description.
  • Structured data: Use JSON‑LD for key entities; we summarize types in output.
  • Headings: Keep H1–H3 clear and scannable; these are extracted.
  • Internationalization: Use <html lang> and hreflang alternates when applicable.
  • Sitemaps: Keep sitemap.xml fresh for coverage.
  • Robots: Use the robots generator to allow search + LLM crawlers on public content.

Troubleshooting

  • Empty or few pages: Check --include/--exclude filters and robots settings; try --no-robots for testing.
  • Dynamic routes (adapter mode): Provide sample params or ensure your framework exposes discoverable routes.
  • Rate limits: Lower --concurrency and add --delay-ms when crawling.
  • Wrong links in docs mode: Adjust --ignore-path/--add-path or provide --site-url/--base-url.
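The crawl-throttling flags above have config-file equivalents (mirroring the options shown in the Configuration section):

```ts
// llmoptimizer.config.ts
import { defineConfig } from 'llmoptimizer'

export default defineConfig({
  concurrency: 2,            // fewer parallel requests
  network: { delayMs: 500 }, // pause between fetches
})
```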

Contact


License

MIT
