NodeLLM

Introduction


The Provider-Agnostic LLM Runtime for Node.js.

NodeLLM is a backend orchestration layer designed for building reliable, testable, and provider-agnostic AI systems.

It is not a “simple API wrapper” or a “prompt engineering tool.” NodeLLM deals with the hard infrastructure problems: normalizing streaming across providers, managing tool execution loops, enforcing timeouts, and enabling first-class testing and telemetry.

Unified support for multiple providers behind a single interface:

                Your App
                   ↓
NodeLLM (Unified API + State + Security)
                   ↓
 OpenAI | Anthropic | Bedrock | Ollama

🛑 What NodeLLM is NOT

To understand NodeLLM, you must understand what it is NOT.

NodeLLM is NOT:

  • A thin wrapper around vendor SDKs (like openai or @anthropic-ai/sdk)
  • A UI streaming library (like Vercel AI SDK)
  • A prompt-only framework

NodeLLM IS:

  • A Backend Runtime: Designed for workers, cron jobs, agents, and API servers.
  • Provider Agnostic: Switches providers via config, not code rewrites.
  • Contract Driven: Guarantees identical behavior for Tools and Streaming across all models.
  • Infrastructure First: Built for evals, telemetry, retries, and circuit breaking.

🏗️ The “Infrastructure-First” Approach

Most AI SDKs optimize for “getting a response to the user fast” (Frontend/Edge). NodeLLM optimizes for system reliability (Backend).

It is designed for architects and platform engineers who need:

  • Strict Process Protection: Preventing hung requests from stalling event loops (see the timeout sketch after this list).
  • Normalized Persistence: Treating chat interactions as database records via @node-llm/orm.
  • Determinism: Testing your AI logic with VCR recordings and time-travel debugging.
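
As an illustration of process protection, you can enforce a hard deadline around any call with plain Node primitives. The helper below is a local sketch, not a NodeLLM API; the runtime's built-in timeout handling may differ:

import { createLLM } from "@node-llm/core";

const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");

// Local helper (not a NodeLLM API): reject if the provider hangs,
// instead of letting the request stall indefinitely.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

const response = await withTimeout(chat.ask("Summarize this incident"), 10_000);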

Strategic Goals

  • Decoupling: Isolate your business logic from the rapid churn of AI model versions.
  • Production Safety: Native support for circuit breaking, redaction, and audit logging.
  • Predictability: A unified Mental Model for streaming, structured outputs, and vision.

⚡ The 5-Minute Path

import { createLLM } from "@node-llm/core";

// 1. Explicit Initialization (Preferred)
const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");

// 2. Chat (High-level request/response)
const response = await chat.ask("Explain event-driven architecture");
console.log(response.content);

// 3. Streaming (Standard AsyncIterator)
for await (const chunk of chat.stream("Explain event-driven architecture")) {
  process.stdout.write(chunk.content);
}

🚀 Why Use This Over Official SDKs?

| Feature | NodeLLM | Official SDKs | Architectural Impact |
|---|---|---|---|
| Provider Logic | Transparently handled | Exposed to your code | Low coupling |
| Streaming | Standard AsyncIterator | Vendor-specific events | Predictable data flow |
| Tool Loops | Automated recursion | Manual implementation | Reduced boilerplate |
| Files/Vision | Intelligent path/URL handling | Base64/Buffer management | Cleaner service layer |
| Configuration | Centralized & global | Per-instance initialization | Easier lifecycle management |
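
The Files/Vision row refers to handling like the following sketch. The `with` option is an assumption here, modeled on RubyLLM (which the credits cite as NodeLLM's inspiration); check the docs for the actual signature:

import { createLLM } from "@node-llm/core";

const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");

// Hypothetical "with" option: local paths and remote URLs are detected
// and encoded for you, so your service layer never touches Base64.
const res = await chat.ask("What is in this image?", {
  with: ["./invoice.png", "https://example.com/chart.jpg"],
});
console.log(res.content);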

🔮 Capabilities

💬 Unified Chat

Stop rewriting code for every provider. NodeLLM normalizes inputs and outputs into a single, predictable mental model.

import { createLLM } from "@node-llm/core";

const llm = createLLM({ provider: "openai" });
const chat = llm.chat("gpt-4o");
await chat.ask("Hello world");

🛠️ Auto-Executing Tools

Define tools once using our clean Class-Based DSL; NodeLLM manages the recursive execution loop for you.

import { Tool, z } from "@node-llm/core";

class WeatherTool extends Tool {
  name = "get_weather";
  description = "Get current weather";
  schema = z.object({ loc: z.string() });

  async handler({ loc }: { loc: string }) {
    return `Sunny in ${loc}`;
  }
}

await chat.withTool(WeatherTool).ask("Weather in Tokyo?");

💾 Persistence Layer

Automatically track chat history, tool executions, and API metrics with @node-llm/orm. Now with full support for Extended Thinking persistence.

import { createChat } from "@node-llm/orm/prisma";

// Chat state is automatically saved to your database (Postgres/MySQL/SQLite)
const chat = await createChat(prisma, llm, { model: "claude-3-7-sonnet" });

await chat.withThinking({ budget: 16000 }).ask("Develop a strategy");

🧪 Deterministic Testing

Validate your AI agents with VCR cassettes (record/replay) and a Fluent Mocker for unit tests. No more flaky or expensive test runs. Powered by @node-llm/testing.

import { vcr, Mocker } from "@node-llm/testing";

// 1. Integration Tests (VCR)
await vcr.useCassette("pricing_flow", async () => {
  const res = await chat.ask("How much?");
  expect(res.content).toContain("$20/mo");
});

// 2. Unit Tests (Mocker)
const mock = new Mocker()
  .chat("Next step?")
  .respond("Login User")
  .callsTool("getCurrentUser", { id: 1 });

🛡️ Security & Compliance

Implement custom security, PII detection, and compliance logic using pluggable asynchronous hooks (beforeRequest and afterResponse).
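
As a sketch, a PII-redaction hook might look like the following. The hook names beforeRequest and afterResponse come from above; the registration shape and the request/response field names are assumptions for illustration only:

import { createLLM } from "@node-llm/core";

// Assumed registration shape: hooks supplied at creation time.
const llm = createLLM({
  provider: "openai",
  hooks: {
    // Runs before every provider call: redact emails from outgoing prompts.
    // (Field names here are illustrative.)
    async beforeRequest(request) {
      request.messages = request.messages.map((m) => ({
        ...m,
        content: m.content.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED]"),
      }));
      return request;
    },
    // Runs after every response: emit an audit-log entry.
    async afterResponse(response) {
      console.log(`[audit] model=${response.model} tokens=${response.usage?.totalTokens}`);
      return response;
    },
  },
});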

🔧 Strategic Configuration

NodeLLM provides a flexible configuration system designed for enterprise usage:

// Switch providers at the framework level
const llm = createLLM({ provider: "anthropic" });

⚡ Scoped Parallelism

Run multiple providers in parallel safely without global configuration side effects using isolated contexts.

import { NodeLLM } from "@node-llm/core"; // assumed export of the scoped-context entry point

const prompt = "Explain event-driven architecture";

const [gpt, claude] = await Promise.all([
  NodeLLM.withProvider("openai").chat("gpt-4o").ask(prompt),
  NodeLLM.withProvider("anthropic").chat("claude-3-5-sonnet").ask(prompt)
]);

🧠 Extended Thinking

Direct access to the thought process of modern reasoning models like Claude 3.7, DeepSeek R1, or OpenAI o1/o3 using a unified interface.

const res = await chat
  .withThinking({ budget: 16000 })
  .ask("Solve this logical puzzle");

console.log(res.thinking.text); // Full chain-of-thought

📋 Supported Providers

| Provider | Supported Features |
|---|---|
| OpenAI | Chat, Streaming, Tools, Vision, Audio, Images, Transcription, Reasoning, Smart Developer Role |
| Gemini | Chat, Streaming, Tools, Vision, Audio, Video, Embeddings |
| Anthropic | Chat, Streaming, Tools, Vision, PDF, Structured Output, Extended Thinking (Claude 3.7) |
| DeepSeek | Chat (V3), Extended Thinking (R1), Tools, Streaming |
| Bedrock | Chat, Streaming, Tools, Image Gen (Titan/SD), Embeddings, Prompt Caching |
| OpenRouter | Aggregator, Chat, Streaming, Tools, Vision, Embeddings, Reasoning |
| Ollama | Local Inference, Chat, Streaming, Tools, Vision, Embeddings |

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for more details on how to get started.


🫶 Credits

Heavily inspired by the elegant design of RubyLLM.


Upgrading to v1.6.0? Read the Migration Guide to understand the new strict provider requirements and typed error hierarchy.