Gemini

Leverage Google’s powerful multimodal capabilities with native support for image, audio, and video processing alongside long-context reasoning.

Table of contents

  1. Configuration
  2. Specific Parameters
  3. Features
  4. Video Support

Google’s Gemini provider offers multimodal capabilities including native video and audio understanding.


Configuration

import { createLLM } from "@node-llm/core";

const llm = createLLM({
  provider: "gemini",
  geminiApiKey: process.env.GEMINI_API_KEY // Optional if GEMINI_API_KEY is set in the environment
});

Specific Parameters

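Gemini-specific options go in two parameters: generationConfig (sampling controls and output limits) and safetySettings (a list of content-filter thresholds, one per harm category). Pass both through withParams: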

const chat = llm.chat("gemini-1.5-pro").withParams({
  generationConfig: {
    topP: 0.8,
    topK: 40,
    maxOutputTokens: 8192
  },
  safetySettings: [
    {
      category: "HARM_CATEGORY_HARASSMENT",
      threshold: "BLOCK_LOW_AND_ABOVE"
    }
  ]
});
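
The example above sets a threshold for a single harm category. To apply one threshold across several categories, the array can be generated instead of written by hand. The following is a minimal sketch, not part of @node-llm/core: buildSafetySettings is a hypothetical helper, and the category and threshold strings are the ones the Gemini API documents at the time of writing, so check them against the current API reference.

```typescript
// Hypothetical helper (not part of @node-llm/core): builds a safetySettings
// array that applies one threshold to several harm categories.

// Category and threshold names as documented for the Gemini API; verify
// against the current API reference before relying on them.
const HARM_CATEGORIES = [
  "HARM_CATEGORY_HARASSMENT",
  "HARM_CATEGORY_HATE_SPEECH",
  "HARM_CATEGORY_SEXUALLY_EXPLICIT",
  "HARM_CATEGORY_DANGEROUS_CONTENT",
] as const;

type HarmCategory = (typeof HARM_CATEGORIES)[number];
type BlockThreshold =
  | "BLOCK_NONE"
  | "BLOCK_ONLY_HIGH"
  | "BLOCK_MEDIUM_AND_ABOVE"
  | "BLOCK_LOW_AND_ABOVE";

interface SafetySetting {
  category: HarmCategory;
  threshold: BlockThreshold;
}

// Returns one { category, threshold } entry per requested category.
function buildSafetySettings(
  threshold: BlockThreshold,
  categories: readonly HarmCategory[] = HARM_CATEGORIES
): SafetySetting[] {
  return categories.map((category) => ({ category, threshold }));
}

const safetySettings = buildSafetySettings("BLOCK_LOW_AND_ABOVE");
console.log(safetySettings);
```

The resulting array has the same shape as the hand-written safetySettings value, so it could be passed to withParams in its place.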

Features

  • Models: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash.
  • Multimodal: Supports images, audio, and video files directly.
  • Tools: Supported.
  • System Instructions: Supported.

Video Support

Gemini can process video files natively. Pass the file path in the files option of ask:

await chat.ask("What happens in this video?", {
  files: ["./video.mp4"]
});
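
Because ask takes plain file paths, it can be useful to filter paths before sending a request. The following is a minimal sketch under stated assumptions: looksLikeVideo is a hypothetical helper, not part of @node-llm/core, and the extension list is illustrative only, not the set of formats that node-llm or Gemini actually accepts.

```typescript
// Hypothetical pre-flight check (not part of @node-llm/core).
// The extension list is illustrative; consult the Gemini documentation for
// the video formats it actually accepts.
const VIDEO_EXTENSIONS = new Set(["mp4", "mov", "webm", "mpeg", "avi"]);

function looksLikeVideo(filePath: string): boolean {
  // Use only the final path segment so dots in directory names are ignored.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".env"
  return VIDEO_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}

// Keep only the paths that look like video files.
const candidates = ["./video.mp4", "./notes.txt"];
const videos = candidates.filter(looksLikeVideo);
console.log(videos);
```

A check like this only inspects the file name. It does not confirm that the file exists or that its contents are valid video.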