Testing

Deterministic testing infrastructure for NodeLLM applications. VCR integration and fluent mocking for reliable AI systems.

Table of contents

  1. Overview
  2. The Philosophy: Two-Tier Testing
    1. 1. VCR (Integration Testing) πŸ“Ό
    2. 2. Mocker (Unit Testing) 🎭
  3. πŸ“Ό VCR Usage
    1. Basic Interaction
    2. Hierarchical Organization (Convention-Based Mode) πŸ“‚
    3. Security & Scrubbing πŸ›‘οΈ
    4. Global Configuration 🌍
    5. Per-Test Overrides
  4. 🎭 Mocker Usage
    1. Fluent Mocking
    2. Streaming Mocks 🌊
    3. Multimodal Mocks 🎨
    4. Call Verification & History πŸ•΅οΈβ€β™€οΈ
  5. πŸ›£οΈ Decision Tree: VCR vs Mocker
    1. At-a-Glance Comparison
  6. βš™οΈ Configuration Contract
  7. πŸ›οΈ Integration with @node-llm/orm
    1. Pattern: Testing Database Persistence
    2. Pattern: Mocking Rare Logic
  8. πŸ§ͺ Framework Integration
    1. Vitest (Native Support)
    2. Jest Compatibility
    3. Framework Support Matrix
    4. Any Test Framework
  9. 🚨 Common Error Scenarios
    1. VCR: Missing Cassette
    2. VCR: Cassette Mismatch
    3. Mocker: Strict Mode Violation
    4. Mocker: Debug Information
  10. 🎯 Advanced Patterns
    1. Pattern: Parametrized Testing with VCR
    2. Pattern: Strict Mode for Safety
    3. Pattern: Testing Streaming
  11. πŸ›οΈ Architecture Contract

Overview

Deterministic testing infrastructure for NodeLLM-powered AI systems. Built for engineers who prioritize Boring Solutions, Security, and High-Fidelity Feedback Loops.

πŸ’‘ What is High-Fidelity? Your tests exercise the same execution path, provider behavior, and tool orchestration as production β€” without live network calls.

Framework Support: βœ… Vitest (native) βœ… Jest (compatible via core APIs) βœ… Any test framework

The Philosophy: Two-Tier Testing

We believe AI testing should never be flaky or expensive. We provide two distinct strategies:

1. VCR (Integration Testing) πŸ“Ό

When to use: To verify your system works with real LLM responses without paying for every test run.

  • High Fidelity: Captures the NodeLLM-normalized LLM execution (model, prompt, tools, retries, and final output), ensuring replay remains stable even if provider APIs change.
  • Security First: Automatically scrubs API keys and PII from “cassettes”.
  • CI Safe: Fails fast in CI if a cassette is missing, preventing accidental live API calls.

🚨 CI Safety Guarantee When CI=true, VCR will never record new cassettes. If a matching cassette is missing or mismatched, the test fails immediately.
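
The guarantee can be pictured as a small mode-resolution rule. This is a sketch of the assumed behavior, not the library's actual implementation; the function name `resolveEffectiveMode` is hypothetical:

```typescript
// Sketch of the assumed CI guard: when CI is true, recording is never allowed.
type VCRMode = "record" | "replay" | "auto" | "passthrough";

function resolveEffectiveMode(
  requested: VCRMode,
  isCI: boolean,
  cassetteExists: boolean
): "record" | "replay" | "passthrough" {
  if (requested === "passthrough") return "passthrough";
  if (isCI) {
    // CI=true: a missing cassette is a hard failure, never a re-record.
    if (!cassetteExists) {
      throw new Error("Cassette missing in CI; refusing to record.");
    }
    return "replay";
  }
  // Outside CI, "auto" records on the first run and replays afterward.
  if (requested === "auto") return cassetteExists ? "replay" : "record";
  return requested;
}
```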

2. Mocker (Unit Testing) 🎭

⚠️ Note The Mocker does not attempt to simulate model intelligence or reasoning. It deterministically simulates provider responses to validate application logic, error handling, and control flow.

When to use: To test application logic, edge cases (errors, rate limits), and rare tool-calling paths.

  • Declarative: Fluent, explicit API to define expected prompts and responses.
  • Multimodal: Native support for chat, embed, paint, transcribe, and moderate.
  • Streaming: Simulate token-by-token delivery to test real-time UI logic.

πŸ“Ό VCR Usage

Basic Interaction

Wrap your tests in withVCR to automatically record interactions the first time they run.

import { withVCR } from "@node-llm/testing";

it(
  "calculates sentiment correctly",
  withVCR(async () => {
    const result = await mySentimentAgent.run("I love NodeLLM!");
    expect(result.sentiment).toBe("positive");
  })
);

Hierarchical Organization (Convention-Based Mode) πŸ“‚

Organize your cassettes into nested subfolders to match your test suite structure.

import { describeVCR, withVCR } from "@node-llm/testing";

describeVCR("Authentication", () => {
  describeVCR("Login", () => {
    it(
      "logs in successfully",
      withVCR(async () => {
        // Cassette saved to: test/cassettes/authentication/login/logs-in-successfully.json
      })
    );
  });
});
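
The path mapping shown in the comment can be sketched as a slug join over the nested scopes. The kebab-case slug rules here are an assumption; only the nesting-to-folder behavior comes from the example above:

```typescript
// Hypothetical sketch: build a cassette path from nested describeVCR scopes
// plus the test name, kebab-casing each segment.
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // non-alphanumerics become hyphens
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}

function cassettePath(
  scopes: string[],
  testName: string,
  dir = "test/cassettes"
): string {
  return [dir, ...scopes.map(toSlug), `${toSlug(testName)}.json`].join("/");
}
```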

Security & Scrubbing πŸ›‘οΈ

The VCR automatically redacts api_key, authorization, and other sensitive headers. You can add custom redaction:

withVCR({
  // Redact by key name
  sensitiveKeys: ["user_ssn", "stripe_token"],
  
  // Redact by value pattern (Regex)
  sensitivePatterns: [/sk-test-[0-9a-zA-Z]+/g],
  
  // Advanced: Custom function hook
  scrub: (data) => data.replace(/SSN: \d+/g, "[REDACTED_SSN]")
}, async () => { ... });
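
Conceptually, the three options compose into one pass over the recorded payload: redact matching key names first, then apply value patterns to every string. A minimal self-contained sketch, assuming this ordering (it is not the library's actual scrubber):

```typescript
// Hypothetical scrubber sketch: redact by key name, then by value pattern.
function scrubValue(value: string, patterns: RegExp[]): string {
  return patterns.reduce((v, p) => v.replace(p, "[REDACTED]"), value);
}

function scrubObject(
  data: unknown,
  sensitiveKeys: string[],
  patterns: RegExp[]
): unknown {
  if (typeof data === "string") return scrubValue(data, patterns);
  if (Array.isArray(data)) {
    return data.map((v) => scrubObject(v, sensitiveKeys, patterns));
  }
  if (data && typeof data === "object") {
    return Object.fromEntries(
      Object.entries(data).map(([k, v]) =>
        sensitiveKeys.includes(k)
          ? [k, "[REDACTED]"] // key-name redaction wins outright
          : [k, scrubObject(v, sensitiveKeys, patterns)]
      )
    );
  }
  return data;
}
```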

Global Configuration 🌍

Instead of repeating configuration in every test, set global defaults in your test setup file:

import { configureVCR } from "@node-llm/testing";

configureVCR({
  cassettesDir: "test/__cassettes__", // Configurable global path
  sensitiveKeys: ["user_ssn", "stripe_token"],
  sensitivePatterns: [/sk-test-[0-9a-zA-Z]+/g]
});

Per-Test Overrides

You can still override defaults on a per-test basis:

withVCR({
  // Merged with global config
  sensitiveKeys: ["specific_secret"] 
}, async () => { ... });

🎭 Mocker Usage

Fluent Mocking

Define lightning-fast, zero-network tests for your agents.

import { mockLLM } from "@node-llm/testing";

const mocker = mockLLM();

// Exact match
mocker.chat("Ping").respond("Pong");

// Regex match
mocker.chat(/hello/i).respond("Greetings!");

// Simulate a Tool Call
mocker.chat("What's the weather?").callsTool("get_weather", { city: "London" });
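
Exact and regex matchers can be thought of as one predicate over the incoming prompt. A sketch of the assumed matching rule (not the library's internals):

```typescript
// Sketch: a mock matcher accepts either an exact string or a RegExp.
type PromptMatcher = string | RegExp;

function matchesPrompt(matcher: PromptMatcher, prompt: string): boolean {
  return typeof matcher === "string"
    ? matcher === prompt   // exact match
    : matcher.test(prompt); // regex match
}
```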

Streaming Mocks 🌊

Test your streaming logic by simulating token delivery.

mocker.chat("Tell a story").stream(["Once ", "upon ", "a ", "time."]);
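
Independent of the library, token-by-token delivery is naturally modeled as an async iterable, which is what your consuming code iterates with for await. A self-contained sketch:

```typescript
// Sketch: simulate streamed chunks as an async generator of tokens.
async function* streamTokens(tokens: string[]): AsyncGenerator<string> {
  for (const token of tokens) {
    yield token; // a real stream would await network chunks here
  }
}

// Example consumer: accumulate tokens into the full response text.
async function collect(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const token of stream) text += token;
  return text;
}
```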

Multimodal Mocks 🎨

mocker.paint(/a cat/i).respond({ url: "https://mock.com/cat.png" });
mocker.embed("text").respond({ vectors: [[0.1, 0.2, 0.3]] });

Call Verification & History πŸ•΅οΈβ€β™€οΈ

Inspect what requests were sent to your mock, enabling “spy”-style assertions.

// 1. Check full history
const history = mocker.history;
expect(history.length).toBe(1);

// 2. Filter by method
const chats = mocker.getCalls("chat");
expect(chats[0].args[0].messages[0].content).toContain("Hello");

// 3. Get the most recent call
const lastEmbed = mocker.getLastCall("embed");
expect(lastEmbed.args[0].input).toBe("text to embed");

// 4. Reset history (keep mocks)
mocker.resetHistory();

// 5. Snapshot your prompt structure
// Ensures your system prompts & tool definitions don't drift
expect(mocker.getLastCall().prompt).toMatchSnapshot();

πŸ›£οΈ Decision Tree: VCR vs Mocker

Choose the right tool for your test:

Does your test need to verify behavior against REAL LLM responses?
β”œβ”€ YES β†’ Use VCR (integration testing)
β”‚   β”œβ”€ Do you need to record the first time and replay afterward?
β”‚   β”‚   └─ YES β†’ Use VCR in "record" or "auto" mode
β”‚   β”œβ”€ Are you testing in CI/CD? (No live API calls allowed)
β”‚   β”‚   └─ YES β†’ Set VCR_MODE=replay in CI
β”‚   └─ Need custom scrubbing for sensitive data?
β”‚       └─ YES β†’ Use withVCR({ scrub: ... })
β”‚
└─ NO β†’ Use Mocker (unit testing)
    β”œβ”€ Testing error handling, edge cases, or rare paths?
    β”‚   └─ YES β†’ Mock the error with mocker.chat(...).respond({ error: ... })
    β”œβ”€ Testing streaming token delivery?
    β”‚   └─ YES β†’ Use mocker.chat(...).stream([...])
    └─ Testing tool-calling paths without real tools?
        └─ YES β†’ Use mocker.chat(...).callsTool(name, params)

Quick Reference:

  • VCR: Database queries, API calls, real provider behavior, network latency
  • Mocker: Business logic, UI interactions, error scenarios, tool orchestration

At-a-Glance Comparison

Use Case                  VCR                 Mocker
Real provider behavior    ✅                  ❌
CI-safe (no live calls)   ✅ (after record)   ✅
Zero network overhead     ❌ (first run)      ✅
Error simulation          ⚠️ (record real)    ✅
Tool orchestration        ✅                  ✅
Streaming tokens          ✅                  ✅

βš™οΈ Configuration Contract

Env Variable       Description                                                  Default
VCR_MODE           record, replay, auto, or passthrough                         auto
VCR_CASSETTE_DIR   Base directory for cassettes                                 test/cassettes
CI                 When true, VCR prevents recording and forces exact matches   (auto-detected)

πŸ›οΈ Integration with @node-llm/orm

The testing tools operate at the providerRegistry level. This means they automatically intercept LLM calls made by the ORM layer.

Pattern: Testing Database Persistence

When using @node-llm/orm, you can verify both the database state and the LLM response in a single test.

import { withVCR } from "@node-llm/testing";
import { createChat } from "@node-llm/orm/prisma";

it(
  "saves the LLM response to the database",
  withVCR(async () => {
    // 1. Setup ORM Chat
    const chat = await createChat(prisma, llm, { model: "gpt-4" });

    // 2. Interaction (VCR intercepts the LLM call)
    await chat.ask("Hello ORM!");

    // 3. Verify DB state (standard Prisma/ORM assertions)
    const messages = await prisma.message.findMany({
      where: { chatId: chat.id }
    });

    expect(messages).toHaveLength(2); // User + Assistant
    expect(messages[1].content).toBeDefined();
  })
);

Pattern: Mocking Rare Logic

Use the Mocker to test how your application handles complex tool results or errors without setting up a real LLM.

import { mockLLM } from "@node-llm/testing";

it("handles tool errors in ORM sessions", async () => {
  const mocker = mockLLM();
  mocker.chat("Search docs").respond({ error: new Error("DB Timeout") });

  const chat = await loadChat(prisma, llm, "existing-id");

  await expect(chat.ask("Search docs")).rejects.toThrow("DB Timeout");
});

πŸ§ͺ Framework Integration

Vitest (Native Support)

Vitest is the primary test framework with optimized helpers:

import { it, describe } from "vitest";
import { mockLLM, withVCR, describeVCR } from "@node-llm/testing";

describeVCR("Payments", () => {
  it(
    "processes successfully",
    withVCR(async () => {
      // ✨ withVCR auto-detects test name ("processes successfully")
      // ✨ describeVCR auto-manages scopes
    })
  );
});

Jest Compatibility

All core APIs work with Jest. The only difference: withVCR() relies on Vitest to auto-detect test names, so with Jest use setupVCR() and pass the cassette name explicitly:

import { describe, it } from "@jest/globals";
import { mockLLM, setupVCR, describeVCR } from "@node-llm/testing";

describeVCR("Payments", () => {
  it("processes successfully", async () => {
    // βœ… describeVCR works with Jest (framework-agnostic)
    // ⚠️ withVCR doesn't work here (needs Vitest's expect.getState())
    // βœ… Use setupVCR instead:
    const vcr = setupVCR("processes", { mode: "record" });

    const mocker = mockLLM();  // βœ… works with Jest
    mocker.chat("pay").respond("done");

    // Test logic here

    await vcr.stop();
  });
});

Framework Support Matrix

API             Vitest           Jest               Any Framework
mockLLM()       ✅               ✅                 ✅
describeVCR()   ✅               ✅                 ✅
setupVCR()      ✅               ✅                 ✅
withVCR()       ✅ (auto name)   ⚠️ (manual name)   ⚠️ (manual name)
Mocker class    ✅               ✅                 ✅
VCR class       ✅               ✅                 ✅

Only withVCR() is Vitest-specific because it auto-detects test names. All other APIs are framework-agnostic.

Any Test Framework

Using raw classes for maximum portability:

import { Mocker, VCR } from "@node-llm/testing";

// Mocker - works everywhere
const mocker = new Mocker();
mocker.chat("hello").respond("hi");

// VCR - works everywhere
const vcr = new VCR("test-name", { mode: "record" });
// ... run test ...
await vcr.stop();

🚨 Common Error Scenarios

VCR: Missing Cassette

Error: Error: Cassette file not found

Cause: VCR is in replay mode but the cassette doesn’t exist yet.

Solution:

# Record it first
VCR_MODE=record npm test

# Or use auto mode (records if missing, replays if exists)
VCR_MODE=auto npm test

VCR: Cassette Mismatch

Error: AssertionError: No interaction matched the request

Cause: Your code is making a request that doesn’t match any recorded interaction.

Solution:

# Re-record the cassette
rm -rf test/cassettes/your-test
VCR_MODE=record npm test -- your-test

Mocker: Strict Mode Violation

Error: Error: No mock defined for prompt: "unexpected question"

Cause: Your code asked a question you didn’t mock in strict mode.

Solution:

// Add the missing mock
mocker.chat("unexpected question").respond("mocked response");

// Or disable strict mode
const mocker = mockLLM({ strict: false });

Mocker: Debug Information

Get insight into what mocks are registered:

const mocker = mockLLM();
mocker.chat("hello").respond("hi");
mocker.embed("text").respond({ vectors: [[0.1, 0.2]] });

const debug = mocker.getDebugInfo();
console.log(debug);
// Output: { totalMocks: 2, methods: ["chat", "embed"] }

🎯 Advanced Patterns

Pattern: Parametrized Testing with VCR

Test the same logic against multiple scenarios by organizing cassettes hierarchically:

describeVCR("Payment Processing", () => {
  ["visa", "mastercard", "amex"].forEach((cardType) => {
    describeVCR(cardType, () => {
      it(
        "processes payment",
        withVCR(async () => {
          const result = await processor.pay({
            amount: 100,
            cardType
          });
          expect(result.status).toBe("success");
        })
      );
    });
  });
});

// Cassettes created at:
// test/cassettes/payment-processing/visa/processes-payment.json
// test/cassettes/payment-processing/mastercard/processes-payment.json
// test/cassettes/payment-processing/amex/processes-payment.json

Pattern: Strict Mode for Safety

Enforce that every expected interaction is mocked:

describe("Customer Service Bot", () => {
  it("responds to greeting", async () => {
    const mocker = mockLLM({ strict: true });
    mocker.chat("hello").respond("Hello! How can I help?");
    
    await bot.handle("hello");
    // Pass βœ…
  });

  it("fails if unmocked", async () => {
    const mocker = mockLLM({ strict: true });
    mocker.chat("hello").respond("Hello!");
    
    // This throws because "goodbye" wasn't mocked
    await expect(bot.handle("goodbye")).rejects.toThrow();
  });
});

Pattern: Testing Streaming

Simulate token delivery to verify UI updates correctly:

it("displays tokens as they arrive", async () => {
  const mocker = mockLLM();
  mocker.chat("Write a poem").stream([
    "Roses ",
    "are ",
    "red\n",
    "Violets ",
    "are ",
    "blue"
  ]);

  const tokens: string[] = [];
  const chat = NodeLLM.chat("gpt-4");
  for await (const chunk of chat.stream("Write a poem")) {
    tokens.push(chunk.content || "");
  }

  expect(tokens).toEqual([
    "Roses ",
    "are ",
    "red\n",
    "Violets ",
    "are ",
    "blue"
  ]);
});

πŸ›οΈ Architecture Contract

  • No Side Effects: Mocks and VCR interceptors are automatically cleared after each test.
  • Deterministic: The same input MUST always yield the same output in Replay mode.
  • Explicit > Implicit: We prefer explicit mock definitions over complex global state.

## πŸ›‘ When Not to Use @node-llm/testing

  • Do not use VCR for rapid prompt iteration β€” use live calls instead.
  • Do not use Mocker to validate response quality or correctness.
  • Do not commit cassettes for experimental or throwaway prompts.