Gemini v1.0.0+

Leverage Google’s multimodal Gemini models, with native support for image, audio, and video input plus long-context reasoning.

Table of contents

  1. Configuration
  2. Specific Parameters
  3. Features
  4. Video Support
  5. Getting an API Key

Google’s Gemini provider offers multimodal capabilities including native video and audio understanding.


Configuration

import { createLLM } from "@node-llm/core";

const llm = createLLM({ 
  provider: "gemini", 
  geminiApiKey: process.env.GEMINI_API_KEY // Optional if GEMINI_API_KEY is set in the environment 
});

Specific Parameters

Gemini accepts its native generationConfig and safetySettings objects, which you pass through withParams:

const chat = llm.chat("gemini-1.5-pro").withParams({
  generationConfig: {
    topP: 0.8,
    topK: 40,
    maxOutputTokens: 8192
  },
  safetySettings: [
    {
      category: "HARM_CATEGORY_HARASSMENT",
      threshold: "BLOCK_LOW_AND_ABOVE"
    }
  ]
});

Features

  • Models: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash.
  • Multimodal: Supports images, audio, and video files directly.
  • Tools: Supported.
  • System Instructions: Supported.

Video Support

Unlike most providers, Gemini can process video files natively, without extracting frames yourself.

await chat.ask("What happens in this video?", {
  files: ["./video.mp4"]
});

Getting an API Key

Sign up and get your API key at aistudio.google.com/apikey.