llm-core logo

@jasonnathan/llm-core

Lightweight, composable TypeScript tools for chunking, pipelining, and LLM orchestration.



llm-core is a lightweight, modular TypeScript library for building robust, production-ready data processing and Large Language Model (LLM) workflows. It provides a focused set of powerful tools designed to solve common but complex problems in preparing, processing, and orchestrating LLM-centric tasks.

It is unopinionated and designed to be composed into any existing application.

Why Use llm-core?

While many libraries can connect to LLM APIs, llm-core excels by providing solutions for the practical, real-world challenges that arise when building serious applications.

  • Advanced Semantic Chunking: Most libraries offer basic, fixed-size or recursive chunking. The CosineDropChunker is a significant step up, using semantic understanding to split content at natural topic boundaries. This is crucial for creating high-quality, contextually aware chunks for Retrieval-Augmented Generation (RAG) systems, leading to more accurate results.

  • Pragmatic Workflow Orchestration: The pipeline module provides a simple, powerful, and "no-frills" way to chain data processing steps. It avoids the complexity of heavier workflow frameworks while offering a flexible, type-safe structure to build and reuse complex sequences of operations.

  • Robust Local LLM Integration: The OllamaService is more than just a basic API client. It includes first-class support for structured JSON output, response sanitization, and custom validation, making it easy to get reliable, machine-readable data from local models.

  • Modular and Unopinionated: This library is not a monolithic framework. It's a toolkit. You can pick and choose the components you need - the chunker, the pipeline, the services - and integrate them into your existing application without being forced into a specific architecture.

Features

  • 🤖 Service Connectors: Type-safe clients for OpenAI and Ollama APIs with built-in retry logic.
  • 🧩 Smart Chunking: Advanced text and Markdown chunking based on semantic similarity (CosineDropChunker).
  • ✂️ Markdown Splitting: Intelligently splits Markdown content while preserving its structure, ideal for preprocessing.
  • ⛓️ Pipelining: A simple, generic, and powerful pipeline builder to chain data processing steps.
  • ✅ Type-Safe: Fully written in TypeScript to ensure type safety across your workflows.
  • ⚙️ Environment-Aware: Easily configured through environment variables.

Installation

Install the package using your preferred package manager:

# with bun
bun install @jasonnathan/llm-core

# or with npm
npm install @jasonnathan/llm-core

Quick Start

1. Set Up Environment

Create a .env file in your project root to configure the LLM services:

# For OpenAI
OPENAI_API_KEY="sk-..."
OPENAI_ENDPOINT="https://api.openai.com"

# For Ollama
OLLAMA_ENDPOINT="http://localhost:11434"

That’s it. Once your environment is configured, you’re ready to import only what you need from llm-core and start composing robust, production-ready LLM workflows.
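
As a quick orientation, the snippet below imports the building blocks used throughout this README (each of these named exports appears in the examples that follow):

import {
  pipeline,
  createLogger,
  OllamaService,
  CosineDropChunker,
  markdownSplitter,
} from "@jasonnathan/llm-core";

// pipeline + createLogger: compose and log multi-step workflows
// OllamaService: chat, structured JSON output, and embeddings for local models
// CosineDropChunker: semantic chunking for RAG
// markdownSplitter: structure-aware Markdown preprocessing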

Core Modules

pipeline

The pipeline module allows you to chain together a series of processing steps to create sophisticated, reusable workflows. Each step is a function that receives the output of the previous one, making it easy to compose complex logic. It's generic, type-safe, and includes logging for each stage.

Example: Building a Question Generation Pipeline

Here's a simplified pipeline that processes documents to generate questions.

import { pipeline, createLogger, PipelineStep } from "@jasonnathan/llm-core";

interface QuestionDoc {
  source: string;
  content: string;
  questions: string[];
}

const logger = createLogger();

const collectContentStep: PipelineStep<QuestionDoc[]> =
  (logger) => async (docs) => {
    logger.info("Collecting content...");
    const newDocs = [
      { source: "doc1.md", content: "Pipelines are great.", questions: [] },
      { source: "doc2.md", content: "They are easy to use.", questions: [] },
    ];
    return [...docs, ...newDocs];
  };

const generateQuestionsStep: PipelineStep<QuestionDoc[]> =
  (logger) => async (docs) => {
    logger.info("Generating questions...");
    return docs.map((doc) => ({
      ...doc,
      questions: [`What is the main point of ${doc.source}?`],
    }));
  };

const questionPipeline = pipeline<QuestionDoc[]>(logger)
  .addStep(collectContentStep)
  .addStep(generateQuestionsStep);

async function main() {
  const initialDocs: QuestionDoc[] = [];
  const result = await questionPipeline.run(initialDocs);
  console.log(JSON.stringify(result, null, 2));
}

main();

That's the whole abstraction: steps stay small and composable, and the shared logger (or any other effect) is injected across the entire workflow. For detailed usage and advanced examples, see the Pipeline Module Developer Guide.

OllamaService and OpenAIService

These services provide a consistent interface to the Ollama and OpenAI APIs, taking care of request construction, retries, and error handling. OllamaService is particularly powerful when paired with models that support structured JSON output.

Usage:

import { OllamaService } from "@jasonnathan/llm-core";

const ollama = new OllamaService("llama3:8b-instruct-q8_0");

async function getGreeting() {
  const response = await ollama.generatePromptAndSend(
    "You are a friendly assistant.",
    "Provide a one-sentence greeting to a new user.",
    {}
  );
  console.log(response);
}
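
The service also exposes embedTexts, which the CosineDropChunker example below relies on as its embedding function. Here is a minimal sketch of calling it directly (the exact return shape is an assumption; see the OllamaService Developer Guide):

import { OllamaService } from "@jasonnathan/llm-core";

// Use an embedding model rather than a chat model here.
const embedder = new OllamaService("mxbai-embed-large");

async function embedExample() {
  // Assumed shape: one embedding vector per input string.
  const vectors = await embedder.embedTexts([
    "Pipelines are great.",
    "They are easy to use.",
  ]);
  console.log(vectors.length);
}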

For detailed usage, including structured JSON responses and embeddings, see the OllamaService Developer Guide.

CosineDropChunker

The CosineDropChunker is a sophisticated tool for splitting text or markdown based on semantic similarity. Instead of using fixed sizes, it finds natural breaks in the content's topics, resulting in more contextually coherent chunks. This is ideal for preparing data for RAG systems.

Usage:

import { CosineDropChunker, OllamaService } from "@jasonnathan/llm-core";

const ollama = new OllamaService("mxbai-embed-large");
const embedFn = (texts: string[]) => ollama.embedTexts(texts);

const chunker = new CosineDropChunker(embedFn);

async function chunkMyMarkdown() {
  const markdown =
    "# Title\n\nThis is the first paragraph. A second paragraph discusses a new topic.";
  const chunks = await chunker.chunk(markdown, {
    type: "markdown",
    breakPercentile: 95,
  });
  console.log(chunks);
}

For a deep dive into semantic chunking and all configuration options, see the Semantic Chunker Developer Guide.
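
The chunker handles plain prose as well as Markdown. Below is a hedged variant of the example above, assuming the type option accepts "text" alongside "markdown" (check the Semantic Chunker Developer Guide for the exact option names and defaults):

import { CosineDropChunker, OllamaService } from "@jasonnathan/llm-core";

const embedService = new OllamaService("mxbai-embed-large");
const chunker = new CosineDropChunker((texts: string[]) =>
  embedService.embedTexts(texts)
);

async function chunkPlainText() {
  const article =
    "Sentence one about topic A. Sentence two about topic A. Now the subject shifts to topic B.";
  // type: "text" is assumed here to mirror the "markdown" option shown above.
  const chunks = await chunker.chunk(article, {
    type: "text",
    breakPercentile: 95,
  });
  console.log(chunks);
}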

markdownSplitter

This utility intelligently splits a Markdown document into smaller segments based on its structure (headings, paragraphs, code blocks, tables). It's useful for preprocessing documentation before embedding or analysis.

Usage:

import { markdownSplitter } from "@jasonnathan/llm-core";
import fs from "fs/promises";

async function splitMarkdown() {
  const markdownContent = await fs.readFile("my-doc.md", "utf-8");
  const chunks = markdownSplitter(markdownContent);
  console.log(chunks);
}
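
As a rough sketch of the preprocessing flow described above, the resulting segments can be passed straight to an embedding call (this assumes markdownSplitter returns an array of strings; see the guide for the exact return shape):

import { markdownSplitter, OllamaService } from "@jasonnathan/llm-core";
import fs from "fs/promises";

async function embedDocSegments() {
  const markdownContent = await fs.readFile("my-doc.md", "utf-8");

  // Split first, then embed each segment for search or RAG indexing.
  const segments = markdownSplitter(markdownContent);
  const embedder = new OllamaService("mxbai-embed-large");
  const vectors = await embedder.embedTexts(segments);

  console.log(`Embedded ${vectors.length} segments`);
}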

Development

Building the Project

To build the project from source, run:

bun run build

This command uses tsup to bundle the code and tsc to generate type declarations, placing the output in the dist directory.

Running Tests

To run the test suite:

bun test

Release and Publish

This project uses standard-version for versioning and changelog generation. To create a new release and publish to the configured NPM registry:

  1. Ensure your .npmrc and .env files are correctly configured.
  2. Run the release command, loading the environment variables:
# For a minor release
bun --env-file=.env release:minor

# For a patch release
bun --env-file=.env release:patch

This will bump the version, create a git tag, generate a changelog, and publish the package.
