Gemini v1.0.0+

Leverage Google’s powerful multimodal capabilities with native support for image, audio, and video processing alongside long-context reasoning.

Google’s Gemini provider offers multimodal capabilities including native video and audio understanding.

Configuration

import { createLLM } from "@node-llm/core";

const llm = createLLM({ 
  provider: "gemini", 
  geminiApiKey: process.env.GEMINI_API_KEY // Optional if GEMINI_API_KEY is set in the environment
});

Custom Endpoint

To route requests through a proxy or Gemini-compatible gateway, override the base URL via geminiApiBase (or the GEMINI_API_BASE environment variable):

const llm = createLLM({
  provider: "gemini",
  geminiApiBase: "https://my-proxy.example.com/v1beta"
});
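
When the base URL can come from more than one place, it can help to resolve and sanity-check it before calling createLLM. The sketch below is a hypothetical helper, not part of @node-llm/core. The name resolveGeminiBase, the precedence order (explicit value, then GEMINI_API_BASE, then a default), and the trailing-slash trimming are all assumptions for illustration, and the library may resolve these values differently. The default shown is Google's public v1beta endpoint.

```typescript
// Hypothetical helper (not part of @node-llm/core): picks a base URL and
// normalizes it before it is passed as geminiApiBase.
const DEFAULT_GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta";

function resolveGeminiBase(
  explicit: string | undefined,
  env: Record<string, string | undefined>
): string {
  // Assumed precedence: explicit option, then GEMINI_API_BASE, then the default.
  // Empty strings fall through to the next source.
  const raw = (explicit || env["GEMINI_API_BASE"] || DEFAULT_GEMINI_BASE).trim();

  // Reject anything that is not an absolute http(s) URL with a host.
  if (!/^https?:\/\/[^\s/]+\S*$/i.test(raw)) {
    throw new Error(`Invalid Gemini base URL: "${raw}"`);
  }

  // Drop trailing slashes so request paths can be appended consistently.
  return raw.replace(/\/+$/, "");
}
```

The result can then be passed as geminiApiBase, for example resolveGeminiBase(undefined, process.env).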

Specific Parameters

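Gemini-specific options are passed through withParams(): generationConfig controls sampling and output limits, and safetySettings sets the blocking threshold for each harm category.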

const chat = llm.chat("gemini-pro-latest").withParams({
  generationConfig: {
    topP: 0.8,
    topK: 40,
    maxOutputTokens: 8192
  },
  safetySettings: [
    {
      category: "HARM_CATEGORY_HARASSMENT",
      threshold: "BLOCK_LOW_AND_ABOVE"
    }
  ]
});
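
safetySettings takes one entry per harm category, so applying the same threshold across categories gets repetitive. The following is a minimal sketch, assuming the four harm categories and the threshold names used by the Gemini API. buildSafetySettings is a hypothetical helper, not a function provided by the library.

```typescript
// Harm categories and thresholds as named in the Gemini API.
const HARM_CATEGORIES = [
  "HARM_CATEGORY_HARASSMENT",
  "HARM_CATEGORY_HATE_SPEECH",
  "HARM_CATEGORY_SEXUALLY_EXPLICIT",
  "HARM_CATEGORY_DANGEROUS_CONTENT",
] as const;

type HarmCategory = (typeof HARM_CATEGORIES)[number];
type BlockThreshold =
  | "BLOCK_NONE"
  | "BLOCK_ONLY_HIGH"
  | "BLOCK_MEDIUM_AND_ABOVE"
  | "BLOCK_LOW_AND_ABOVE";

interface SafetySetting {
  category: HarmCategory;
  threshold: BlockThreshold;
}

// Hypothetical helper: one entry per category, using the default threshold
// unless an override is given for that category.
function buildSafetySettings(
  defaultThreshold: BlockThreshold,
  overrides: Partial<Record<HarmCategory, BlockThreshold>> = {}
): SafetySetting[] {
  return HARM_CATEGORIES.map((category) => ({
    category,
    threshold: overrides[category] ?? defaultThreshold,
  }));
}
```

The returned array can be passed as the safetySettings value in withParams(), for example buildSafetySettings("BLOCK_MEDIUM_AND_ABOVE", { HARM_CATEGORY_HARASSMENT: "BLOCK_LOW_AND_ABOVE" }).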

Features

Models: gemini-pro-latest, gemini-flash-latest, gemini-flash-lite-latest.
Multimodal: Supports images, audio, and video files directly.
Tools: Supported.
System Instructions: Supported.
Structured Output: Native JSON schema support.
Embeddings: Vector generation via text-embedding-004.
Image Generation: Imagen model support via llm.paint().
Transcription: Audio transcription via Gemini’s multimodal generateContent endpoint.

Video Support

Gemini can process video files natively. Pass the file path in the files option of ask():

await chat.ask("What happens in this video?", {
  files: ["./video.mp4"]
});

Embeddings

Generate vector embeddings using Gemini’s embedding models.

const embedding = await llm.embed("The concept of general relativity", {
  model: "text-embedding-004"
});

console.log(embedding.vector); // number[]
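
A common next step is comparing two embeddings. The sketch below computes cosine similarity over plain number[] values such as embedding.vector. cosineSimilarity is an illustrative helper written for this example, not a library function, and returning 0 for a zero-length norm is a choice made here.

```typescript
// Illustrative helper: cosine similarity between two embedding vectors.
// Returns a value in [-1, 1]; higher means more similar in direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }

  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Given two results from llm.embed(), the call would be cosineSimilarity(first.vector, second.vector). Both vectors should come from the same embedding model so their dimensions match.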

Image Generation

Use the paint() method to generate images with Imagen.

const response = await llm.paint("A futuristic city on Mars, high quality, 4k", {
  model: "imagen-4.0-generate-001"
});

await response.save("./mars-city.png");

Transcription

Transcribe audio files using Gemini’s native multimodal understanding.

const transcription = await llm.transcribe("./meeting.mp3", {
  model: "gemini-flash-latest"
});

console.log(transcription.text);