NodeLLM

Introduction


The Provider-Agnostic LLM Runtime for Node.js.

NodeLLM is a backend orchestration layer designed for building reliable, testable, and provider-agnostic AI systems.

It is not a “simple API wrapper” or a “prompt engineering tool.” NodeLLM deals with the hard infrastructure problems: normalizing streaming across providers, managing tool execution loops, enforcing timeouts, and enabling first-class testing and telemetry.

Unified support for multiple providers behind a single interface:

                Your App
                   ↓
NodeLLM (Unified API + State + Security)
                   ↓
 OpenAI | Anthropic | Bedrock | Ollama

🛑 What NodeLLM is NOT

To understand NodeLLM, you must understand what it is NOT.

NodeLLM is NOT:

  • A thin wrapper around vendor SDKs (like openai or @anthropic-ai/sdk)
  • A UI streaming library (like Vercel AI SDK)
  • A prompt-only framework

NodeLLM IS:

  • A Backend Runtime: Designed for workers, cron jobs, agents, and API servers.
  • Provider Agnostic: Switches providers via config, not code rewrites.
  • Contract Driven: Guarantees identical behavior for Tools and Streaming across all models.
  • Infrastructure First: Built for evals, telemetry, retries, and circuit breaking.

🏗️ The “Infrastructure-First” Approach

Most AI SDKs optimize for “getting a response to the user fast” (Frontend/Edge). NodeLLM optimizes for system reliability (Backend).

It is designed for architects and platform engineers who need:

  • Strict Process Protection: Preventing hung requests from stalling event loops (see the timeout sketch after this list).
  • Normalized Persistence: Treating chat interactions as database records via @node-llm/orm.
  • Determinism: Testing your AI logic with VCR recordings and time-travel debugging.
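
As an illustration of process protection, you can enforce a hard deadline around any call with plain Node primitives. The helper below is a local sketch, not a NodeLLM API; the runtime's built-in timeout handling may differ:

import { createLLM } from "@node-llm/core";

const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");

// Local helper (not a NodeLLM API): reject if the provider hangs,
// instead of letting the request stall indefinitely.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

const response = await withTimeout(chat.ask("Summarize this incident"), 10_000);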

Strategic Goals

  • Decoupling: Isolate your business logic from the rapid churn of AI model versions.
  • Production Safety: Native support for circuit breaking, redaction, and audit logging.
  • Predictability: A unified Mental Model for streaming, structured outputs, and vision.

⚡ The 5-Minute Path

import { createLLM } from "@node-llm/core";

// 1. Explicit Initialization (Preferred)
const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");

// 2. Chat (High-level request/response)
const response = await chat.ask("Explain event-driven architecture");
console.log(response.content);

// 3. Streaming (Standard AsyncIterator)
for await (const chunk of chat.stream("Explain event-driven architecture")) {
  process.stdout.write(chunk.content);
}

🚀 Why Use This Over Official SDKs?

| Feature | NodeLLM | Official SDKs | Architectural Impact |
|---|---|---|---|
| Provider Logic | Transparently handled | Exposed to your code | Low coupling |
| Streaming | Standard AsyncIterator | Vendor-specific events | Predictable data flow |
| Tool Loops | Automated recursion | Manual implementation | Reduced boilerplate |
| Files/Vision | Intelligent path/URL handling | Base64/Buffer management | Cleaner service layer |
| Configuration | Centralized & global | Per-instance initialization | Easier lifecycle management |
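
The Files/Vision row refers to handling like the following sketch. The `with` option is an assumption here, modeled on RubyLLM (which the credits cite as NodeLLM's inspiration); check the docs for the actual signature:

import { createLLM } from "@node-llm/core";

const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");

// Hypothetical "with" option: local paths and remote URLs are detected
// and encoded for you, so your service layer never touches Base64.
const res = await chat.ask("What is in this image?", {
  with: ["./invoice.png", "https://example.com/chart.jpg"],
});
console.log(res.content);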

🔮 Capabilities

💬 Unified Chat

Stop rewriting code for every provider. NodeLLM normalizes inputs and outputs into a single, predictable mental model.

import { createLLM } from "@node-llm/core";

const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");
await chat.ask("Hello world");

🛠️ Auto-Executing Tools

Define tools once using our clean Class-Based DSL; NodeLLM manages the recursive execution loop for you.

import { Tool, z } from "@node-llm/core";

class WeatherTool extends Tool {
  name = "get_weather";
  description = "Get current weather";
  schema = z.object({ loc: z.string() });

  async handler({ loc }: { loc: string }) {
    return `Sunny in ${loc}`;
  }
}

await chat.withTool(WeatherTool).ask("Weather in Tokyo?");

💾 Persistence Layer

Automatically track chat history, tool executions, and API metrics with @node-llm/orm. Now with full support for Extended Thinking persistence.

import { createChat } from "@node-llm/orm/prisma";

// Chat state is automatically saved to your database (Postgres/MySQL/SQLite)
const chat = await createChat(prisma, llm, { model: "claude-3-7-sonnet" });

await chat.withThinking({ budget: 16000 }).ask("Develop a strategy");

🧪 Deterministic Testing

Validate your AI agents with VCR cassettes (record/replay) and a Fluent Mocker for unit tests. No more flaky or expensive test runs. Powered by @node-llm/testing.

import { vcr, Mocker } from "@node-llm/testing";

// 1. Integration Tests (VCR)
await vcr.useCassette("pricing_flow", async () => {
  const res = await chat.ask("How much?");
  expect(res.content).toContain("$20/mo");
});

// 2. Unit Tests (Mocker)
const mock = new Mocker()
  .chat("Next step?")
  .respond("Login User")
  .callsTool("getCurrentUser", { id: 1 });

🛡️ Security & Compliance

Implement custom security, PII detection, and compliance logic using pluggable asynchronous hooks (beforeRequest and afterResponse).
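
As a sketch, a PII-redaction hook might look like the following. The hook names beforeRequest and afterResponse come from above; the registration shape and the request/response field names are assumptions for illustration only:

import { createLLM } from "@node-llm/core";

// Assumed registration shape: hooks supplied at creation time.
const llm = createLLM({
  provider: "openai",
  hooks: {
    // Runs before every provider call: redact emails from outgoing prompts.
    // (Field names here are illustrative.)
    async beforeRequest(request) {
      request.messages = request.messages.map((m) => ({
        ...m,
        content: m.content.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED]"),
      }));
      return request;
    },
    // Runs after every response: emit an audit-log entry.
    async afterResponse(response) {
      console.log(`[audit] model=${response.model} tokens=${response.usage?.totalTokens}`);
      return response;
    },
  },
});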

🔧 Strategic Configuration

NodeLLM provides a flexible configuration system designed for enterprise usage:

// Switch providers at the framework level
const llm = createLLM({ provider: "anthropic" });

⚡ Scoped Parallelism

Run multiple providers in parallel safely without global configuration side effects using isolated contexts.

import { NodeLLM } from "@node-llm/core"; // assumed export of the scoped-context entry point

const prompt = "Explain event-driven architecture";

const [gpt, claude] = await Promise.all([
  NodeLLM.withProvider("openai").chat("gpt-4o").ask(prompt),
  NodeLLM.withProvider("anthropic").chat("claude-3-5-sonnet").ask(prompt)
]);

🧠 Extended Thinking

Direct access to the thought process of modern reasoning models like Claude 3.7, DeepSeek R1, or OpenAI o1/o3 using a unified interface.

const res = await chat
  .withThinking({ budget: 16000 })
  .ask("Solve this logical puzzle");

console.log(res.thinking.text); // Full chain-of-thought

📋 Supported Providers

| Provider | Supported Features |
|---|---|
| OpenAI | Chat, Streaming, Tools, Vision, Audio, Images, Transcription, Reasoning, Smart Developer Role |
| Gemini | Chat, Streaming, Tools, Vision, Audio, Video, Embeddings |
| Anthropic | Chat, Streaming, Tools, Vision, PDF, Structured Output, Extended Thinking (Claude 3.7) |
| DeepSeek | Chat (V3), Extended Thinking (R1), Tools, Streaming |
| Bedrock | Chat, Streaming, Tools, Image Gen (Titan/SD), Embeddings, Prompt Caching |
| OpenRouter | Aggregator, Chat, Streaming, Tools, Vision, Embeddings, Reasoning |
| Ollama | Local Inference, Chat, Streaming, Tools, Vision, Embeddings |

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for more details on how to get started.


🫶 Credits

Heavily inspired by the elegant design of RubyLLM.


Upgrading to v1.6.0? Read the Migration Guide to understand the new strict provider requirements and typed error hierarchy.