# Middlewares
NodeLLM’s middleware system allows you to intercept, monitor, and modify LLM interactions at the infrastructure level. This is essential for building production-grade systems that require observability, auditing, cost tracking, and safety.
- Why Middlewares?
- Basic Usage
- The Middleware Interface (v1.10.0+)
- Common Use Cases
- Middleware Execution Order (v1.10.0+)
- Standard Middleware Library
- Global Middlewares
- Integration with @node-llm/orm
## Why Middlewares?
In a production environment, you rarely want to call an LLM directly without additional logic. Middlewares allow you to separate these cross-cutting concerns from your business logic:
- Observability: Log requests, responses, and errors to external systems.
- Cost Tracking: Calculate and record the token usage and cost of every request.
- Security & Compliance: Redact PII (Personally Identifiable Information) before it is sent to the LLM.
- Auditing: Maintain a permanent audit trail of all AI interactions.
- Performance: Track latency and success rates across different models.
- Quality: Automatically verify the integrity of the response.
## Basic Usage
Middlewares are passed when creating a chat instance. You can pass a single middleware or an array of middlewares.
```typescript
import { NodeLLM, Middleware, ChatResponseString } from "@node-llm/core";

const myMiddleware: Middleware = {
  name: "MyMiddleware",
  onRequest: async (context) => {
    console.log(`[Request] Sending to ${context.model}`);
  },
  onResponse: async (context, result) => {
    // result is NodeLLMResponse (union of Chat, Image, Transcription, etc.)
    if (result instanceof ChatResponseString) {
      console.log(`[Response] Received ${result.usage.total_tokens} tokens`);
    }
  }
};

const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [myMiddleware]
});

await chat.ask("Hello world");
```
## The Middleware Interface (v1.10.0+)
A middleware consists of a unique name and several optional hooks that cover the entire lifecycle of an LLM request, including tool execution.
```typescript
interface Middleware {
  name: string;
  onRequest?: (context: MiddlewareContext) => Promise<void> | void;
  onResponse?: (context: MiddlewareContext, result: NodeLLMResponse) => Promise<void> | void;
  onError?: (context: MiddlewareContext, error: Error) => Promise<void> | void;

  // Tool Execution Hooks
  onToolCallStart?: (context: MiddlewareContext, tool: ToolCall) => Promise<void> | void;
  onToolCallEnd?: (context: MiddlewareContext, tool: ToolCall, result: unknown) => Promise<void> | void;
  onToolCallError?: (context: MiddlewareContext, tool: ToolCall, error: Error) => Promise<ToolErrorDirective> | ToolErrorDirective;
}
```
> **Warning:** Since v1.10.0, the `onResponse` hook receives a `NodeLLMResponse` union. You must use a type guard (like `instanceof ChatResponseString`) before accessing chat-specific properties like `usage` or `content`.
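The tool hooks follow the same pattern. As a minimal sketch, you can pair `onToolCallStart` and `onToolCallEnd` to time tool executions via the per-request `state` record (described below). The `id` and `name` fields on `ToolCall` are assumptions here; check your actual type definitions:

```typescript
// Minimal sketch: timing tool executions with the tool hooks.
// Assumes ToolCall exposes `id` and `name` fields — verify against your types.
const toolTimer: Middleware = {
  name: "ToolTimer",
  onToolCallStart: (context, tool) => {
    // Stash a per-tool timestamp in the shared state record.
    context.state[`tool:${tool.id}`] = Date.now();
  },
  onToolCallEnd: (context, tool) => {
    const started = context.state[`tool:${tool.id}`] as number;
    console.log(`[${context.requestId}] ${tool.name} finished in ${Date.now() - started}ms`);
  }
};
```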
### MiddlewareContext

The context object persists across the lifecycle of a single request and provides deep access to the execution state:

- `requestId`: A unique UUID for tracing the request.
- `provider`: The provider name (e.g., "openai", "anthropic").
- `model`: The model identifier.
- `messages`: The conversation history (mutable in `onRequest`).
- `options`: The `ChatOptions` for the request (mutable in `onRequest`).
- `state`: A record for sharing data between hooks in the same middleware (e.g., storing a timer).
- `metadata`: Custom metadata passed to the request.
- Operation specifics: `input` (Embeddings), `imageOptions` (Paint), etc.
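For example, a sketch exercising the mutable fields in `onRequest` (the `temperature` field is an assumption about the shape of `ChatOptions`; substitute whatever option your requests actually use):

```typescript
// Sketch: reading and mutating the context in onRequest.
// `temperature` is an assumed ChatOptions field, shown for illustration.
const contextDemo: Middleware = {
  name: "ContextDemo",
  onRequest: (context) => {
    console.log(`[${context.requestId}] ${context.provider}/${context.model}`);
    context.options.temperature ??= 0.2;   // options are mutable in onRequest
    context.state.startedAt = Date.now();  // visible to later hooks of this middleware
  }
};
```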
## Common Use Cases
### 1. PII Redaction (Security)
Redact sensitive information before it reaches the provider.
```typescript
const piiMiddleware: Middleware = {
  name: "PIIRedactor",
  onRequest: async (context) => {
    context.messages.forEach(msg => {
      // Mask anything shaped like a 16-digit card number.
      msg.content = msg.content.replace(/\b\d{4}-\d{4}-\d{4}-\d{4}\b/g, "[REDACTED_CC]");
    });
  }
};
```
### 2. Cost & Performance Tracking
Use standard telemetry patterns to track your infrastructure.
```typescript
const perfMiddleware: Middleware = {
  name: "PerformanceTracker",
  onRequest: async (context) => {
    context.state.startTime = Date.now();
  },
  onResponse: async (context, result) => {
    const latency = Date.now() - (context.state.startTime as number);
    // Type guard: usage metrics only exist on chat responses
    if (result instanceof ChatResponseString) {
      // calculateCost and db are placeholders for your own pricing logic and datastore
      const cost = calculateCost(context.model, result.usage);
      await db.metrics.create({ model: context.model, latency, cost });
    }
  }
};
```
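The `calculateCost` helper above is not part of NodeLLM. A minimal sketch, assuming a price table you maintain yourself and OpenAI-style `prompt_tokens`/`completion_tokens` fields on the usage object (verify against your actual usage shape):

```typescript
// Sketch of a cost helper — NOT a NodeLLM API. Prices are illustrative,
// and the usage field names are assumptions; adjust both to your setup.
const PRICES_PER_1M: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 }, // USD per 1M tokens, illustrative only
};

function calculateCost(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
): number {
  const price = PRICES_PER_1M[model];
  if (!price) return 0; // unknown model: record zero rather than guessing
  return (usage.prompt_tokens * price.input + usage.completion_tokens * price.output) / 1_000_000;
}
```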
## Middleware Execution Order (v1.10.0+)
Middlewares are executed as a stack (Onion model). This ensures that outer middlewares (like loggers) can correctly wrap and observe the transformations made by inner middlewares (like security maskers).
- `onRequest`: Executed in order (first to last).
- `onToolCallStart`: Executed in order (first to last).
- `onToolCallEnd`: Executed in REVERSE order (last to first).
- `onToolCallError`: Executed in REVERSE order (last to first).
- `onResponse`: Executed in REVERSE order (last to first).
- `onError`: Executed in REVERSE order (last to first).
### Example Lifecycle
If you have two middlewares, `[Logger, Security]`, the execution order for a successful tool-calling request is:

1. `Logger.onRequest`
2. `Security.onRequest`
3. `Logger.onToolCallStart`
4. `Security.onToolCallStart`
5. … Tool Execution …
6. `Security.onToolCallEnd`
7. `Logger.onToolCallEnd`
8. `Security.onResponse`
9. `Logger.onResponse`
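A quick way to verify the ordering yourself is to stack two middlewares that only announce their hooks:

```typescript
// Two no-op middlewares that log which hook fired.
const logger: Middleware = {
  name: "Logger",
  onRequest: () => console.log("Logger.onRequest"),
  onResponse: () => console.log("Logger.onResponse"),
};
const security: Middleware = {
  name: "Security",
  onRequest: () => console.log("Security.onRequest"),
  onResponse: () => console.log("Security.onResponse"),
};

const chat = NodeLLM.chat("gpt-4o", { middlewares: [logger, security] });
await chat.ask("ping");
// Logger.onRequest
// Security.onRequest
// Security.onResponse   <- reverse order on the way out
// Logger.onResponse
```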
## Standard Middleware Library
NodeLLM includes a set of pre-built, production-ready middlewares that you can use out of the box.
### 1. PIIMaskMiddleware
Automatically redacts sensitive information like emails, phone numbers, and credit cards from user messages before they are sent to the LLM.
```typescript
import { NodeLLM, PIIMaskMiddleware } from "@node-llm/core";

const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [new PIIMaskMiddleware({ mask: "[SECRET]" })]
});
```
### 2. CostGuardMiddleware
Monitors accumulated cost during a session (especially useful for multi-turn tool calling loops) and throws an error if a defined budget is exceeded.
```typescript
import { NodeLLM, CostGuardMiddleware } from "@node-llm/core";

const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [
    new CostGuardMiddleware({
      maxCost: 0.05, // $0.05 budget
      onLimitExceeded: (ctx, cost) => console.log(`Budget blown for ${ctx.requestId}`)
    })
  ]
});
```
### 3. UsageLoggerMiddleware
Standardizes telemetry by logging token usage, request IDs, and calculated costs for every successful interaction.
```typescript
import { NodeLLM, UsageLoggerMiddleware } from "@node-llm/core";

const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [new UsageLoggerMiddleware({ prefix: "MY-APP" })]
});
```
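The standard middlewares compose like any others. A sketch stacking all three, using only the options documented above (with the onion model, the first entry is the outermost layer):

```typescript
import {
  NodeLLM,
  UsageLoggerMiddleware,
  PIIMaskMiddleware,
  CostGuardMiddleware,
} from "@node-llm/core";

const chat = NodeLLM.chat("gpt-4o", {
  middlewares: [
    new UsageLoggerMiddleware({ prefix: "MY-APP" }), // outermost layer
    new PIIMaskMiddleware({ mask: "[SECRET]" }),
    new CostGuardMiddleware({ maxCost: 0.05 }),      // throws when the budget is exceeded
  ]
});
```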
## Global Middlewares
You can also register middlewares at the global level when creating the LLM instance. These will be applied to every chat, embedding, or image generation call made from that instance.
```typescript
import { createLLM, UsageLoggerMiddleware } from "@node-llm/core";

const llm = createLLM({
  provider: "openai",
  middlewares: [new UsageLoggerMiddleware()]
});

// This chat will automatically use the global UsageLoggerMiddleware
const chat = llm.chat("gpt-4o");
```
## Integration with @node-llm/orm
When using the ORM, you can pass middlewares directly to the createChat call. They will be applied to the underlying chat instance but will NOT be persisted to the database.
```typescript
import { createChat } from "@node-llm/orm/prisma";

const chat = await createChat(prisma, llm, {
  model: "gpt-4o",
  middlewares: [new UsageLoggerMiddleware()]
});
```