⚙️ Core Tools
Terminal
Default On ★★★★★
Execute shell commands, manage background processes, run scripts. The primary way Hermes interacts with your system. Supports local, Docker, SSH, and Modal backends.
Shell execution · Process management · PTY mode · Multi-backend (local/docker/ssh/modal)
PTY mode has \r vs. \n handling issues with prompt_toolkit apps. Prefer tmux when spawning interactive programs.
⚙️ Core Tools
File System
Default On ★★★★★
Read, write, search, and patch files on the local filesystem. Replaces cat/grep/sed with agent-friendly structured operations.
File read/write · Content search (ripgrep-backed) · Find-and-replace patching · Syntax linting on write
write_file completely overwrites files. Use patch for targeted edits to avoid losing content.
⚙️ Core Tools
Web Search
Default On ★★★★★
Internet search and content extraction. The primary research tool — finds URLs, fetches content, and extracts structured data from web pages.
Web search · Content extraction · URL fetching · Multi-backend (Firecrawl, Tavily, SearXNG, etc.)
Requires: FIRECRAWL_API_KEY or TAVILY_API_KEY
Requires at least one search backend configured. Some sites block automated access.
⚙️ Core Tools
Browse, install, create, and manage skills. Skills are reusable procedure documents that teach the agent how to do specific tasks.
Skill search/install · Skill creation · Skill management · Hub publishing
None — fully self-contained.
⚙️ Core Tools
Persistent cross-session memory. Stores facts about the user, environment, and lessons learned. Pluggable backends (built-in SQLite, Honcho, Mem0).
Fact storage/retrieval · User preference learning · Cross-session persistence · Pluggable backends
Memory is bounded (~2KB). Old entries are evicted when full. Cloud backends need API keys.
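The size bound and eviction behaviour noted above can be pictured with a small sketch. This is not Hermes's memory backend: the `BoundedMemory` class, its method names, and the oldest-first eviction order are all assumptions made for illustration, and the 2048-byte default simply mirrors the "~2KB" figure.

```python
from collections import OrderedDict
from typing import Optional


class BoundedMemory:
    """Hypothetical fact store that evicts the oldest entries past a byte budget."""

    def __init__(self, max_bytes: int = 2048) -> None:
        self.max_bytes = max_bytes
        self._facts: "OrderedDict[str, str]" = OrderedDict()

    def _size(self) -> int:
        # Count UTF-8 bytes of every key and value currently stored.
        return sum(len(k.encode()) + len(v.encode()) for k, v in self._facts.items())

    def remember(self, key: str, value: str) -> None:
        # Re-inserting a key moves it to the "newest" end.
        self._facts.pop(key, None)
        self._facts[key] = value
        # Assumed policy: drop oldest entries until the budget is met,
        # but always keep the entry that was just written.
        while self._size() > self.max_bytes and len(self._facts) > 1:
            self._facts.popitem(last=False)

    def recall(self, key: str) -> Optional[str]:
        return self._facts.get(key)
```

With a budget this small, long free-form notes crowd out earlier facts quickly, which is one reason to keep stored entries short.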
⚙️ Core Tools
Session Search
Default On ★★★★☆
Search past conversations using FTS5 full-text search. Retrieves summaries of matching sessions. Essential for cross-session context.
FTS5 full-text search · Session summaries · Recent session browsing · Role-based filtering
Search uses OR between keywords by default (AND for phrases). Recent sessions mode has no LLM cost.
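To make the OR-by-default behaviour concrete, here is a rough sketch of how a list of keywords could be turned into an SQLite FTS5 `MATCH` expression. The `build_match_query` helper is hypothetical and is not taken from Hermes. It relies only on standard FTS5 query syntax, where a double-quoted string is a phrase and `OR` joins alternatives. Treating a multi-word entry as a quoted phrase is one possible reading of "AND for phrases", not a confirmed one.

```python
from typing import Iterable


def build_match_query(keywords: Iterable[str]) -> str:
    """Join keywords with OR; quote each one so FTS5 treats it as a string/phrase."""
    quoted = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue
        # FTS5 escapes a double quote inside a string by doubling it.
        quoted.append('"' + keyword.replace('"', '""') + '"')
    return " OR ".join(quoted)
```

For example, `build_match_query(["docker", "rate limit"])` yields `"docker" OR "rate limit"`, which matches sessions mentioning either the word `docker` or the adjacent words `rate limit`.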
⚙️ Core Tools
Delegation
Default On ★★★★☆
Spawn subagents with isolated contexts and terminal sessions. Supports parallel batch execution (up to 3 concurrent children).
Subagent spawning · Parallel batch execution · Isolated context/terminal · Leaf and orchestrator roles
Not durable — children are cancelled if parent is interrupted. Use cron jobs or background terminal for persistent work.
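The concurrency cap described above (at most 3 children at once) follows a common pattern that can be sketched with `asyncio`. This is only an illustration of the idea, not the delegation tool's implementation; `run_batch`, `worker`, and `MAX_CHILDREN` are names invented here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List

MAX_CHILDREN = 3  # limit taken from the description above


async def run_batch(
    tasks: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
) -> List[Any]:
    """Run worker(task) for every task, never more than MAX_CHILDREN at a time."""
    sem = asyncio.Semaphore(MAX_CHILDREN)

    async def guarded(task: Any) -> Any:
        async with sem:  # waits here while 3 children are already running
            return await worker(task)

    # gather preserves input order; cancelling the parent cancels the children,
    # which mirrors the "not durable" caveat.
    return await asyncio.gather(*(guarded(t) for t in tasks))
```

A caller would use it as `asyncio.run(run_batch(items, some_async_worker))`. Work that must survive an interrupted parent belongs in a cron job or a background terminal instead, as the note says.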
⚙️ Core Tools
Cron Jobs
Default On ★★★★★
Built-in scheduler for recurring tasks. Supports durations, cron expressions, and ISO timestamps. Jobs run autonomously with configurable model/skills/delivery.
Scheduled execution · Per-job model override · Script pre-run · Multi-platform delivery · Watchdog pattern (no_agent)
Schedule format: a duration (30m), a cron expression (0 9 * * *), or an ISO timestamp. Natural-language phrases such as "every sunday" are not supported.
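As a rough guide to telling the three accepted schedule formats apart, the sketch below classifies a schedule string. It is not the scheduler's actual parser: `classify_schedule` is a hypothetical helper, the duration units `s/m/h/d` are an assumption (the source only shows `30m`), and the cron check only verifies the five-field shape without validating ranges.

```python
import re
from datetime import datetime

# Assumed duration syntax: an integer followed by one unit letter.
DURATION_RE = re.compile(r"^\d+[smhd]$")
# Characters that may appear in a basic cron field (digits, *, /, comma, dash).
CRON_FIELD_RE = re.compile(r"^[\d*/,\-]+$")


def classify_schedule(spec: str) -> str:
    """Return 'duration', 'cron', or 'iso'; raise ValueError for anything else."""
    spec = spec.strip()
    if DURATION_RE.match(spec):
        return "duration"
    fields = spec.split()
    if len(fields) == 5 and all(CRON_FIELD_RE.match(f) for f in fields):
        return "cron"
    try:
        datetime.fromisoformat(spec)
    except ValueError:
        # Free-form phrases like "every sunday" end up here.
        raise ValueError(f"unsupported schedule: {spec!r}") from None
    return "iso"
```

Under these assumptions, a weekly Sunday job would be written as a cron expression such as `0 9 * * 0` rather than as a phrase.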
⚙️ Core Tools
Ask the user clarifying questions when a task is ambiguous. Supports multiple choice (up to 4 options) and open-ended modes.
Multiple choice prompts · Open-ended questions · Inline "Other" option
Overuse can be annoying. Prefer choosing a reasonable default when the decision is low-stakes.
⚙️ Core Tools
Task List
Default On ★★★★☆
In-session task tracking with priority ordering. Supports create, update, mark complete, cancel, and merge operations. One task in_progress at a time.
Task CRUD · Priority ordering · Merge/replace modes · Progress tracking
Tasks are session-scoped — not persistent across sessions. Use kanban for durable task management.
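The "one task in_progress at a time" rule can be illustrated with a tiny in-memory model. The sketch is an assumption-laden illustration, not the tool's code: the `TaskList` class and the `pending` and `completed` status names are invented here, and only `in_progress` comes from the description above.

```python
from typing import Dict


class TaskList:
    """Hypothetical session-scoped task list enforcing a single in_progress task."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Dict[str, str]] = {}

    def create(self, task_id: str, title: str) -> None:
        self.tasks[task_id] = {"title": title, "status": "pending"}

    def start(self, task_id: str) -> None:
        # Refuse to start a task while a different one is still in progress.
        for other_id, task in self.tasks.items():
            if other_id != task_id and task["status"] == "in_progress":
                raise ValueError(f"task {other_id!r} is already in progress")
        self.tasks[task_id]["status"] = "in_progress"

    def complete(self, task_id: str) -> None:
        self.tasks[task_id]["status"] = "completed"
```

Because the dictionary lives only in process memory, the state disappears with the session, which is the limitation the note above describes.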
⚙️ Core Tools
Code Execution
Default On ★★★★☆
Sandboxed Python execution with access to file/search/patch/terminal tools. Use for multi-step processing, data filtering, and conditional logic between tool calls.
Python execution · Tool library access · 5-minute timeout · 50KB stdout cap
Foreground-only (no background or PTY mode). Limited to 50 tool calls per script. Stdout is capped at 50KB.
🌐 Web & Browser
Full browser automation — navigate pages, click elements, type text, take screenshots, read console output, and execute JavaScript. Supports local Chromium, Browserbase, and Camofox backends.
Page navigation · Element interaction · Screenshot + vision analysis · Console output · JavaScript evaluation · Scroll/click/type
Requires: BROWSERBASE_API_KEY or local Chromium
Resource-heavy. Prefer web_search for simple lookups. Local Chromium must be installed separately.
🌐 Web & Browser
Image analysis — load and describe images from URLs, file paths, or data URIs. Falls back to an auxiliary vision model if the main model lacks vision capabilities.
Image loading (URL/file/data URI) · Visual description · Fallback to auxiliary model
Some models lack native vision — falls back to slower auxiliary model. File paths must be absolute.
🎵 Media
Image Generation
Opt-in ★★★★☆
AI image generation via multiple backends. Supports OpenAI gpt-image-2, xAI Grok-Imagine, and more via plugins.
Text-to-image · Multi-backend · Image caching
Requires: OPENAI_API_KEY or XAI_API_KEY
Requires a backend plugin with API key. Not available on all platforms.
🎵 Media
Video analysis and generation. Supports FAL.ai multi-model (Veo 3.1, Kling, Pixverse) and xAI Grok-Imagine backends.
Text-to-video · Image-to-video · Video analysis
Requires: FAL_KEY or XAI_API_KEY
Expensive API costs. FAL is the more mature backend.
🎵 Media
Text-to-Speech
Default On ★★★★☆
Convert text to spoken audio. Supports Edge TTS (free, default), ElevenLabs, OpenAI, MiniMax, Mistral, and local NeuTTS.
Text-to-audio · Multi-provider · Voice memo saving
Requires: Provider-dependent
Edge TTS works out of the box. Cloud providers need API keys. Character limits range from 4096 to 15000 depending on the provider.
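Because each provider caps the input length, long text has to be split before synthesis. The following is a minimal sketch of one way to do that on whitespace boundaries. `chunk_text` is a hypothetical helper, not part of the TTS tool, and the 4096 default is simply the lowest limit mentioned above.

```python
from typing import List


def chunk_text(text: str, limit: int = 4096) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent as a separate request. Note that this simple version collapses runs of whitespace and ignores sentence boundaries, so a real splitter might prefer to break at punctuation for more natural audio.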
⚡ Automation
Durable SQLite-based work queue for multi-agent coordination. Tasks have lifecycle (create → assign → complete/block), comments, and links. Dispatcher auto-assigns to worker profiles.
Task lifecycle · Multi-profile assignment · Comments and links · Auto-dispatch · Heartbeat monitoring
Best used with multi-profile setups. Single-user kanban adds overhead without benefit.
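The lifecycle in the description above (create → assign → complete/block) can be read as a small state machine. The sketch below encodes only the transitions named there. The state names and the `advance` helper are assumptions for illustration, and any transition out of a blocked state is left unmodeled because the source does not describe it.

```python
from typing import Dict, Set

# Transitions taken from "create → assign → complete/block"; nothing else is assumed.
ALLOWED: Dict[str, Set[str]] = {
    "created": {"assigned"},
    "assigned": {"completed", "blocked"},
    "completed": set(),
    "blocked": set(),
}


def advance(state: str, new_state: str) -> str:
    """Return new_state if the transition is allowed, otherwise raise ValueError."""
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```

A durable queue would persist the current state per task (the description mentions SQLite) and run a check like this before every update.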
💬 Messaging
Messaging
Default On ★★★★☆
Cross-platform message sending. Routes messages through the gateway to any connected platform — Telegram, Discord, Slack, Signal, and more.
Cross-platform send · Gateway routing · Platform-specific formatting
Depends on gateway being active. Not available in CLI-only mode.
💬 Messaging
Discord integration tools for the gateway. Enables the Hermes Discord bot to read and respond in channels and DMs.
Channel messaging · DM handling · Message history reading
Requires: Discord bot token
Requires Message Content Intent enabled in Discord Developer Portal.
💬 Messaging
Discord Admin
Opt-in ★★★☆☆
Discord admin and moderation tools — manage users, roles, channels, and server settings through the agent.
User management · Role management · Channel management · Moderation actions
Requires: Discord bot token with admin permissions
Requires elevated Discord permissions. Use with caution — moderation actions are irreversible.
🧠 AI / ML
Reinforcement Learning
Opt-in ★★☆☆☆
Reinforcement learning tools for training and evaluating AI models. Off by default — niche use case for ML researchers.
RL training loops · Model evaluation
Requires: ML framework dependencies
Experimental. Not recommended for general use.
🧠 AI / ML
Mixture of Agents
Opt-in ★★★☆☆
Mixture of Agents pattern — runs multiple model instances in parallel and aggregates their outputs for improved quality. Off by default.
Parallel model inference · Output aggregation · Quality improvement
Requires: Multiple API keys
Token cost multiplies by number of agents. Experimental feature.
🔧 Developer
Extra introspection and debugging tools. Adds verbose logging, state inspection, and diagnostic capabilities. Off by default.
Verbose logging · State inspection · Diagnostic output
Generates a lot of output. Enable only when debugging specific issues.
🔧 Developer
Minimal, low-risk toolset for locked-down sessions. Strips dangerous tools (terminal, browser, delegation) for safe exploration.
Read-only operations · Minimal tool surface · Reduced risk profile
Very limited functionality. Use only for untrusted or shared environments.
🔗 Integrations
Spotify playback control — play, pause, skip, queue, search, manage playlists and library. Uses Spotify Web API with PKCE OAuth via the Spotify plugin.
Playback control · Device management · Queue management · Search · Playlist/library management
Requires: Spotify Premium, hermes auth spotify
Requires Spotify Premium. One-time OAuth setup via hermes auth spotify.
🔗 Integrations
Home Assistant
Opt-in ★★★☆☆
Smart home control via Home Assistant integration. Control lights, switches, sensors, and automations through the agent.
Device control · State queries · Automation triggers
Requires: Home Assistant URL + token
Requires running Home Assistant instance. Off by default for security reasons.
🔗 Integrations
Feishu (Lark) document tools — create, read, and edit Feishu documents through the agent.
Document CRUD · Block operations
Requires: Feishu API credentials
Feishu-specific. Only useful if you use the Feishu/Lark platform.
🔗 Integrations
Feishu Drive
Opt-in ★★★☆☆
Feishu (Lark) drive tools — manage files and folders in Feishu Cloud Drive.
File management · Folder operations
Requires: Feishu API credentials
Feishu-specific. Requires separate API setup from Feishu Docs.
🔗 Integrations
Yuanbao (Tencent) integration — @mention users in groups, query member information and group details.
Group member queries · @mention support · Group information
Requires: Yuanbao API credentials
China-specific platform. Requires Yuanbao account.