235 Gemini CLI Tool Architecture

1. Overview

The Gemini CLI’s tool-use functionality separates tool definition, registration, and execution. It is built around a central ToolRegistry that manages all available tools, whether they are built-in “core” tools, tools dynamically discovered from the local project, or tools provided by remote Model Context Protocol (MCP) servers.

2. Core Components

The architecture is primarily composed of three key TypeScript files within the @google/gemini-cli-core package:

  • tools.ts: Defines the foundational interfaces and abstract classes, such as DeclarativeTool (the base class for all tools), ToolInvocation (a single, validated tool call), and ToolBuilder.
  • tool-registry.ts: Implements the ToolRegistry class, which acts as a central repository for all tool definitions. It handles both the programmatic registration of core tools and the dynamic discovery of custom tools.
  • config.ts: Implements the Config class, which acts as a dependency injector. It is responsible for instantiating and holding the single ToolRegistry instance, which is then provided to other parts of the application.
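How the three pieces relate can be sketched in a few simplified types. This is an illustration only, not the package's actual code: the class names follow the list above, but every signature, the EchoTool example, the registerTool()/getTool() method names, and the getToolRegistry() accessor are assumptions, and execution is synchronous here purely to keep the sketch short.

```typescript
// Simplified, hypothetical shapes. The real definitions in tools.ts,
// tool-registry.ts, and config.ts carry more fields and may differ.

/** A single, validated tool call that is ready to run. */
interface ToolInvocation<TResult> {
  execute(): TResult;
}

/** Base class for all tools: carries metadata and validates arguments. */
abstract class DeclarativeTool<TResult> {
  readonly name: string;
  readonly description: string;

  constructor(name: string, description: string) {
    this.name = name;
    this.description = description;
  }

  /** Validates raw arguments and returns an invocation, or throws. */
  abstract build(args: Record<string, unknown>): ToolInvocation<TResult>;
}

/** Central repository of tool definitions, keyed by tool name. */
class ToolRegistry {
  private readonly tools = new Map<string, DeclarativeTool<unknown>>();

  registerTool(tool: DeclarativeTool<unknown>): void {
    this.tools.set(tool.name, tool);
  }

  getTool(name: string): DeclarativeTool<unknown> | undefined {
    return this.tools.get(name);
  }
}

/** Hypothetical tool standing in for a core tool such as ReadFileTool. */
class EchoTool extends DeclarativeTool<string> {
  constructor() {
    super("echo", "Returns its 'message' argument unchanged.");
  }

  build(args: Record<string, unknown>): ToolInvocation<string> {
    const message = args["message"];
    if (typeof message !== "string") {
      throw new Error("echo: 'message' must be a string");
    }
    return { execute: () => message };
  }
}

/** Creates and holds the single registry, then hands it to callers. */
class Config {
  private toolRegistry: ToolRegistry | undefined;

  /** Builds the registry and registers built-in tools (discovery omitted). */
  createToolRegistry(): ToolRegistry {
    const registry = new ToolRegistry();
    registry.registerTool(new EchoTool());
    this.toolRegistry = registry;
    return registry;
  }

  /** Assumed accessor: returns the shared instance once it exists. */
  getToolRegistry(): ToolRegistry {
    if (!this.toolRegistry) {
      throw new Error("createToolRegistry() has not been called");
    }
    return this.toolRegistry;
  }
}

const config = new Config();
config.createToolRegistry();
const echo = config.getToolRegistry().getTool("echo");
console.log(echo?.build({ message: "hello" }).execute()); // prints "hello"
```

Keeping validation in build() and side effects in execute() means a caller can reject malformed arguments before anything runs.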

3. The Tool Lifecycle

Using a tool, from user prompt to final output, follows a six-step lifecycle:

  1. Initialization & Registration:

    • On application startup, a central Config object is created.
    • The Config object’s createToolRegistry() method is called.
    • This method instantiates the ToolRegistry.
    • It then programmatically registers all built-in core tools (e.g., ReadFileTool, ShellTool, EditTool).
    • Finally, it calls discoverAllTools() on the registry to find and register any project-specific tools, either by running a discovery shell command or by querying remote MCP servers.
  2. Schema Generation:

    • The GeminiClient (in core/client.ts), which manages the chat session, retrieves the populated ToolRegistry from the Config object.
    • It calls toolRegistry.getFunctionDeclarations() to get an array of all tool schemas in the format required by the Gemini API.
  3. API Request:

    • The user’s prompt, the conversation history, and the array of tool schemas are sent to the Gemini API.
  4. Tool Invocation:

    • The model processes the request and, if it decides to use a tool, responds with a functionCall containing the tool’s name and arguments.
    • The application layer (e.g., the UI’s useToolScheduler hook) receives this functionCall.
  5. Execution:

    • The application layer uses the Config object to get the ToolRegistry.
    • It looks up the requested tool by name in the registry to get the appropriate DeclarativeTool instance.
    • It calls the tool’s build() method with the arguments provided by the model. This step validates the parameters and returns a ToolInvocation object.
    • It calls the execute() method on the ToolInvocation object, which runs the actual tool logic.
  6. Response and Completion:

    • The output from the tool’s execution is formatted into a functionResponse part.
    • This functionResponse is sent back to the Gemini API in the next turn of the conversation.
    • The model uses the tool’s output to formulate its final, user-facing response.
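The lifecycle above can be traced end to end in a short sketch. It assumes simplified local stand-ins for the API shapes (FunctionDeclaration, FunctionCall, and the functionResponse part), a hypothetical word_count tool, a hard-coded model reply in place of a real API request, and synchronous execution. Only getFunctionDeclarations(), build(), and execute() are names taken from the description; the remaining names and the error-handling behavior are illustrative assumptions.

```typescript
// Local stand-ins for the API shapes; the field sets are simplified assumptions.
interface FunctionDeclaration {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}
interface FunctionResponsePart {
  functionResponse: { name: string; response: Record<string, unknown> };
}

interface ToolInvocation {
  execute(): string;
}
interface Tool {
  schema: FunctionDeclaration;
  build(args: Record<string, unknown>): ToolInvocation;
}

class ToolRegistry {
  private readonly tools = new Map<string, Tool>();

  registerTool(tool: Tool): void {
    this.tools.set(tool.schema.name, tool);
  }
  getTool(name: string): Tool | undefined {
    return this.tools.get(name);
  }
  /** Step 2: collect every registered schema for the API request. */
  getFunctionDeclarations(): FunctionDeclaration[] {
    return Array.from(this.tools.values()).map((tool) => tool.schema);
  }
}

// Hypothetical tool used only for this walkthrough.
const wordCountTool: Tool = {
  schema: {
    name: "word_count",
    description: "Counts whitespace-separated words in a string.",
    parameters: {
      type: "object",
      properties: { text: { type: "string" } },
      required: ["text"],
    },
  },
  build(args) {
    const text = args["text"];
    if (typeof text !== "string") {
      throw new Error("word_count: 'text' must be a string");
    }
    return { execute: () => String(text.split(/\s+/).filter(Boolean).length) };
  },
};

/** Steps 5 and 6: look up, build (validate), execute, wrap the result. */
function handleFunctionCall(
  registry: ToolRegistry,
  call: FunctionCall,
): FunctionResponsePart {
  const respond = (response: Record<string, unknown>): FunctionResponsePart => ({
    functionResponse: { name: call.name, response },
  });
  const tool = registry.getTool(call.name);
  if (!tool) {
    return respond({ error: `Unknown tool: ${call.name}` });
  }
  try {
    return respond({ output: tool.build(call.args).execute() });
  } catch (err) {
    // Reporting failures back to the model is an assumption of this sketch.
    return respond({ error: err instanceof Error ? err.message : String(err) });
  }
}

const registry = new ToolRegistry(); // step 1: initialization and registration
registry.registerTool(wordCountTool);
const declarations = registry.getFunctionDeclarations(); // step 2, sent in step 3
const modelCall: FunctionCall = {
  name: "word_count",
  args: { text: "tools are decoupled" },
}; // step 4: hard-coded stand-in for the model's functionCall
const part = handleFunctionCall(registry, modelCall); // steps 5 and 6
console.log(declarations.map((d) => d.name), JSON.stringify(part));
```

Note that handleFunctionCall() never names a specific tool: registering or removing tools changes what the registry returns without touching the dispatch code.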

This architecture decouples the chat logic from the tool implementation, so tools can be added, removed, or discovered without altering the core conversational flow.