Bellwether

Automated behavioral documentation for MCP servers through LLM-guided testing.

Bellwether is a CLI tool that generates comprehensive behavioral documentation for Model Context Protocol (MCP) servers. Instead of relying on manually written docs, Bellwether interviews your MCP server by:

  1. Discovering available tools, prompts, and resources
  2. Generating realistic test scenarios using an LLM
  3. Executing tests and analyzing actual responses
  4. Synthesizing findings into actionable documentation
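
The four steps above can be pictured as a small pipeline. The following is only an illustrative sketch, not Bellwether's implementation: `FAKE_SERVER`, `discover`, `generate_scenarios`, `execute`, and `synthesize` are hypothetical names, the server is an in-memory stand-in, and the scenario generator returns fixed inputs where the real tool would consult an LLM.

```python
from typing import Callable

# Hypothetical in-memory stand-in for an MCP server: tool name -> callable.
FAKE_SERVER: dict[str, Callable[[dict], str]] = {
    "echo": lambda args: str(args.get("text", "")),
    "upper": lambda args: str(args.get("text", "")).upper(),
}


def discover(server):
    """Step 1: list what the server exposes."""
    return sorted(server)


def generate_scenarios(tool):
    """Step 2: stand-in for the LLM; returns the same fixed inputs for every tool."""
    return [{"text": "hello"}, {"text": ""}, {}]


def execute(server, tool, scenarios):
    """Step 3: call the tool and record (input, output) pairs."""
    return [(args, server[tool](args)) for args in scenarios]


def synthesize(results):
    """Step 4: turn the recorded observations into Markdown-style notes."""
    lines = []
    for tool, observations in results.items():
        lines.append(f"### {tool}")
        for args, output in observations:
            lines.append(f"- input {args!r} -> output {output!r}")
    return "\n".join(lines)


def interview(server):
    """Run all four steps in order and return the generated notes."""
    results = {
        tool: execute(server, tool, generate_scenarios(tool))
        for tool in discover(server)
    }
    return synthesize(results)


report = interview(FAKE_SERVER)
print(report)
```

The point of the sketch is the ordering: documentation is derived from recorded responses, so it reflects what the server did rather than what it claims to do.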

Why Bellwether?

| Problem | Solution |
|---------|----------|
| Documentation says one thing, but what does the server actually do? | Trust but verify - Interview the server to document real behavior |
| Breaking changes slip into production unnoticed | Drift detection - Catch behavioral changes before they hit production |
| Security vulnerabilities are hard to discover manually | Security insights - Persona-based adversarial testing |
| Manual testing is slow and expensive | CI/CD integration - Automated regression testing for MCP servers |

Key Features

  • AGENTS.md Generation - Human-readable behavioral documentation generated automatically from actual server responses
  • Complete MCP Coverage - Test tools, prompts, and resources with content previews and access patterns
  • Drift Detection - Compare baselines to detect behavioral changes between versions with semantic diff analysis
  • Multi-Persona Testing - Security tester, QA engineer, technical writer, and novice user personas for comprehensive coverage
  • MCP Registry Integration - Search and discover servers from the official MCP Registry
  • Verification Program - Certify your server with Bronze, Silver, Gold, or Platinum tiers
  • GitHub Action - Official action for automated CI/CD integration
  • Multiple Output Formats - Markdown, JSON, JUnit XML, and SARIF for GitHub Code Scanning
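
To make the drift detection idea concrete, here is a minimal sketch of comparing two baselines. It rests on assumptions: the baseline layout (a `tools` mapping from tool name to a recorded description of its behavior) and the `diff_baselines` helper are hypothetical, and the comparison is a plain structural diff rather than the semantic diff analysis Bellwether describes.

```python
def diff_baselines(old: dict, new: dict) -> dict:
    """Report tools that were added, removed, or changed between two baselines.

    Assumes a hypothetical layout: {"tools": {tool_name: recorded_behavior}}.
    """
    old_tools, new_tools = old["tools"], new["tools"]
    shared = old_tools.keys() & new_tools.keys()
    return {
        "added": sorted(new_tools.keys() - old_tools.keys()),
        "removed": sorted(old_tools.keys() - new_tools.keys()),
        "changed": sorted(name for name in shared if old_tools[name] != new_tools[name]),
    }


# Two made-up baselines for illustration only.
baseline_v1 = {
    "tools": {
        "read_file": {"required": ["path"], "max_size": "10MB"},
        "write_file": {"required": ["path", "content"]},
    }
}
baseline_v2 = {
    "tools": {
        "read_file": {"required": ["path"], "max_size": "5MB"},
        "search_files": {"required": ["pattern"]},
    }
}

drift = diff_baselines(baseline_v1, baseline_v2)
print(drift)
```

A CI job could treat any non-empty `removed` or `changed` list as a reason to fail the build, which is the kind of check that catches a behavioral change before it reaches production.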

How It Works

   MCP Server           Bellwether                    Output
        |                   |                           |
        |    tools/list     |                           |
        |<------------------|                           |
        |                   |                           |
        |    tools/call     | LLM generates             |
        |<------------------| test scenarios            |
        |                   |                           |
        |    responses      |                           |
        |------------------>| Analyze behavior          |
        |                   |                           |
        |                   |--------------------------> AGENTS.md
        |                   |                            baseline.json
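
MCP messages are JSON-RPC 2.0, so the `tools/list` and `tools/call` arrows in the diagram correspond to requests shaped roughly like the ones built below. This is a sketch of the message bodies only: the `make_request` helper and the file path are made up for illustration, `read_file` is borrowed from the output example on this page, and a real session also performs an initialization handshake before these calls.

```python
import json
from itertools import count

# Monotonically increasing request ids, as JSON-RPC expects a unique id per request.
_ids = count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request body for the given MCP method."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discovery: ask the server which tools it exposes.
list_request = make_request("tools/list")

# Invocation: call one tool with arguments (here, a hypothetical test scenario).
call_request = make_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "/tmp/example.txt"}},
)

for message in (list_request, call_request):
    print(json.dumps(message))
```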

Output Example

Bellwether generates AGENTS.md files documenting observed server behavior:

# @modelcontextprotocol/server-filesystem

> Generated by Bellwether on 2026-01-12

## Overview

A file management server providing tools for reading, writing, and searching files.

## Tools

### read_file

Read contents of a file from the specified path.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| path | string | yes | Path to the file to read |

**Observed Behavior:**
- Returns file contents as UTF-8 text
- Binary files are returned as base64-encoded content
- Maximum file size: 10MB

**Limitations:**
- Cannot read files outside configured root directory

**Security Considerations:**
- Path traversal attempts (../) are normalized within root

Cost Efficiency

Bellwether uses LLMs for intelligent testing. Typical costs per interview (10 tools, 3 questions each):

| Model | Cost | Quality |
|-------|------|---------|
| gpt-5-mini | ~$0.02 | Good (recommended for CI) |
| claude-haiku-4-5 | ~$0.04 | Good |
| gpt-5.2 | ~$0.12 | Best |
| claude-sonnet-4-5 | ~$0.13 | Best |
| Ollama (local) | Free | Variable |

Use the --quick flag in CI for the fastest, cheapest runs (~$0.01 per run).
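
For budgeting, the per-interview figures above can be scaled to a different server size or run count. The sketch below assumes cost grows linearly with the number of questions asked, which is a simplification and not something Bellwether guarantees; `estimate_cost` is a hypothetical helper and the prices are the approximate values listed above.

```python
# Approximate cost (USD) of one reference interview: 10 tools, 3 questions each.
COST_PER_INTERVIEW = {
    "gpt-5-mini": 0.02,
    "claude-haiku-4-5": 0.04,
    "gpt-5.2": 0.12,
    "claude-sonnet-4-5": 0.13,
}
REFERENCE_QUESTIONS = 10 * 3


def estimate_cost(model, tools, questions_per_tool, runs=1):
    """Scale the reference price linearly by question count and number of runs."""
    per_question = COST_PER_INTERVIEW[model] / REFERENCE_QUESTIONS
    return per_question * tools * questions_per_tool * runs


# Example: a 25-tool server interviewed 200 times a month in CI.
monthly = estimate_cost("gpt-5-mini", tools=25, questions_per_tool=3, runs=200)
print(f"Estimated monthly cost: ${monthly:.2f}")
```

Under these assumptions the example comes to about $10 per month, since each run asks 75 questions instead of the reference 30.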

Quick Example

# Install
npm install -g @dotsetlabs/bellwether

# Set your API key (or use Ollama for free)
export OPENAI_API_KEY=sk-xxx

# Interview a local server during development
bellwether interview node ./src/mcp-server.js

# Or interview an npm package
bellwether interview npx @modelcontextprotocol/server-filesystem /tmp

# Output: AGENTS.md with behavioral documentation

Local Development Workflow

Bellwether integrates into your development workflow to catch behavioral drift before deployment:

# 1. Test your local server
bellwether interview node ./src/mcp-server.js

# 2. Save a baseline after initial development
bellwether interview --save-baseline node ./src/mcp-server.js

# 3. Use watch mode for continuous testing
bellwether watch node ./src/mcp-server.js --watch-path ./src

# 4. Before committing, check for drift
bellwether interview --compare-baseline ./baseline.json node ./src/mcp-server.js

Use Ollama for completely free testing during development.
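
Steps 2 and 4 of the workflow can be wrapped in a small pre-commit helper. The sketch below uses only the commands and flags shown above. It assumes that `--save-baseline` writes `baseline.json` in the current directory, which the text implies but does not state, and it does not interpret the exit code because its meaning is not documented here. The `build_command` helper is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

SERVER_CMD = ["node", "./src/mcp-server.js"]
BASELINE = "./baseline.json"  # assumed location of the saved baseline


def build_command(server_cmd, baseline, baseline_exists):
    """Compare against the baseline if one exists, otherwise save a new one."""
    if baseline_exists:
        return ["bellwether", "interview", "--compare-baseline", baseline, *server_cmd]
    return ["bellwether", "interview", "--save-baseline", *server_cmd]


def main():
    command = build_command(SERVER_CMD, BASELINE, Path(BASELINE).exists())
    if shutil.which("bellwether") is None:
        # Nothing to run; show what would have been executed.
        print("bellwether not found on PATH; would run:", " ".join(command))
        return
    result = subprocess.run(command, check=False)
    print("bellwether exited with code", result.returncode)


if __name__ == "__main__":
    main()
```

Keeping the command construction in a pure function makes the choice between saving and comparing easy to check without invoking the CLI.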

Next Steps