Embeddings

Generate high-dimensional vector representations for semantic search, RAG, and clustering with single and batch embedding operations.

Table of contents

  1. Basic Usage
    1. Single Text
    2. Batch Embeddings
  2. Configuring Models
    1. Global Configuration
    2. Per-Request
    3. Custom Models
  3. Reducing Dimensions
  4. Best Practices
  5. Error Handling

Embeddings are vector representations of text used for semantic search, clustering, and similarity comparisons. `NodeLLM` provides a unified interface for generating embeddings across different providers.

Basic Usage

Single Text

import { NodeLLM } from "@node-llm/core";

const embedding = await NodeLLM.embed("Node is a programmer's best friend");

console.log(embedding.vector); // number[] (e.g., 1536 dimensions)
console.log(embedding.dimensions); // 1536
console.log(embedding.model); // "text-embedding-3-small" (default)
console.log(embedding.input_tokens); // Token count

Batch Embeddings

Always batch multiple texts in a single call when possible. This is much more efficient than calling `embed` in a loop.

const embeddings = await NodeLLM.embed(["First text", "Second text", "Third text"]);

console.log(embeddings.vectors.length); // 3
console.log(embeddings.vectors[0]); // Vector for "First text"
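
Vectors come back in input order (note the comment above), so pairing them with their source texts is a simple zip. A minimal sketch:

const texts = ["First text", "Second text", "Third text"];
const { vectors } = await NodeLLM.embed(texts);

// vectors[i] corresponds to texts[i].
const byText = new Map(texts.map((text, i) => [text, vectors[i]]));

console.log(byText.get("Second text")); // Vector for "Second text"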

Configuring Models

By default, `NodeLLM` uses `text-embedding-3-small`. You can change this globally or per request.

Global Configuration

import { createLLM } from "@node-llm/core";

const llm = createLLM({
  defaultEmbeddingModel: "text-embedding-3-large"
});
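
Requests made through this instance then default to the configured model. A quick sketch, assuming the instance exposes the same `embed` method as the global `NodeLLM` object:

const embedding = await llm.embed("Text");
console.log(embedding.model); // "text-embedding-3-large"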

Per-Request

const embedding = await NodeLLM.embed("Text", {
  model: "text-embedding-004" // Google Gemini model
});

Custom Models

For models not in the registry (e.g., Azure deployments or new releases), use `assumeModelExists`.

const embedding = await NodeLLM.embed("Text", {
  model: "new-embedding-v2",
  provider: "openai",
  assumeModelExists: true
});

Reducing Dimensions

Some models (like `text-embedding-3-large`) allow you to reduce the output dimensions to save on storage and compute, with minimal loss in accuracy.

const embedding = await NodeLLM.embed("Text", {
  model: "text-embedding-3-large",
  dimensions: 256
});

console.log(embedding.vector.length); // 256
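
Note that vectors are only comparable when they were produced by the same model with the same dimensions setting, so pick one configuration per index and stick with it.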

Best Practices

  1. Batching: Use `NodeLLM.embed(["text1", "text2"])` instead of serial calls.
  2. Caching: Embeddings are deterministic for a given model and text, so cache them in your database to avoid paying for the same text twice (see the sketch after this list).
  3. Cosine Similarity: To compare two vectors, compute their cosine similarity. `NodeLLM` does not include math utilities to keep the core light, but you can implement it easily:

    // Cosine similarity: dot product divided by the product of magnitudes, in [-1, 1].
    function cosineSimilarity(A: number[], B: number[]) {
      const dotProduct = A.reduce((sum, a, i) => sum + a * B[i], 0);
      const magnitudeA = Math.sqrt(A.reduce((sum, a) => sum + a * a, 0));
      const magnitudeB = Math.sqrt(B.reduce((sum, b) => sum + b * b, 0));
      return dotProduct / (magnitudeA * magnitudeB);
    }
    
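A sketch of the caching idea from item 2, keyed on model plus text. The in-memory Map here is a stand-in for your real store (database, Redis, etc.):

import { createHash } from "node:crypto";

// Stand-in cache; swap in your database or KV store.
const cache = new Map<string, number[]>();

async function cachedEmbed(text: string, model = "text-embedding-3-small") {
  // Embeddings are deterministic per (model, text), so this key is stable.
  const key = createHash("sha256").update(`${model}:${text}`).digest("hex");
  const cached = cache.get(key);
  if (cached) return cached;

  const { vector } = await NodeLLM.embed(text, { model });
  cache.set(key, vector);
  return vector;
}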

Error Handling

Wrap calls in try/catch blocks to handle API outages or rate limits.

try {
  await NodeLLM.embed("Text");
} catch (error) {
  console.error("Embedding failed:", (error as Error).message);
}
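
For transient failures such as rate limits, retrying with exponential backoff usually suffices. A sketch (the retry policy is an assumption, not something NodeLLM provides):

async function embedWithRetry(text: string, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await NodeLLM.embed(text);
    } catch (error) {
      if (i === attempts - 1) throw error;
      // Wait 1s, 2s, 4s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** i));
    }
  }
}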