Bellwether

Automated behavioral documentation for MCP servers through LLM-guided testing.

Bellwether is a CLI tool that generates comprehensive behavioral documentation for Model Context Protocol (MCP) servers. Instead of relying on manually written docs, Bellwether interviews your MCP server by:

Discovering available tools, prompts, and resources
Generating realistic test scenarios using an LLM
Executing tests and analyzing actual responses
Synthesizing findings into actionable documentation

Why Bellwether?

| Problem | Solution |
|---------|----------|
| Documentation says one thing, but what does the server actually do? | Trust but verify - Interview the server to document real behavior |
| Breaking changes slip into production unnoticed | Drift detection - Catch behavioral changes before they hit production |
| Security vulnerabilities are hard to discover manually | Security insights - Persona-based adversarial testing |
| Manual testing is slow and expensive | CI/CD integration - Automated regression testing for MCP servers |

Key Features

AGENTS.md Generation - Human-readable behavioral documentation generated automatically from actual server responses
Complete MCP Coverage - Test tools, prompts, and resources with content previews and access patterns
Drift Detection - Compare baselines to detect behavioral changes between versions with semantic diff analysis
Multi-Persona Testing - Security tester, QA engineer, technical writer, and novice user personas for comprehensive coverage
MCP Registry Integration - Search and discover servers from the official MCP Registry
Verification Program - Certify your server with Bronze, Silver, Gold, or Platinum tiers
GitHub Action - Official action for automated CI/CD integration
Multiple Output Formats - Markdown, JSON, JUnit XML, and SARIF for GitHub Code Scanning

How It Works

   MCP Server           Bellwether                    Output
       |                   |                          |
       |  tools/list       |                          |
       |<------------------|                          |
       |                   |                          |
       |   tools/call      |   LLM generates          |
       |<------------------|   test scenarios         |
       |                   |                          |
       |   responses       |                          |
       |------------------>|   Analyze behavior       |
       |                   |                          |
       |                   |----------------------->  AGENTS.md
       |                   |                          baseline.json
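
The `tools/list` and `tools/call` arrows in the diagram are ordinary JSON-RPC 2.0 requests as defined by the Model Context Protocol. The sketch below builds those two payloads in Python to show the wire format. It is an illustration only, not Bellwether's internal code: the helper names `tools_list_request` and `tools_call_request` and the sample `read_file` arguments are assumptions made for this example.

```python
import json


def tools_list_request(request_id: int) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to list its tools."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def tools_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request invoking one tool with the given arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical scenario: one discovery call, then one generated test call.
    print(json.dumps(tools_list_request(1)))
    print(json.dumps(tools_call_request(2, "read_file", {"path": "/tmp/example.txt"})))
```

In a real interview the arguments of each `tools/call` would come from the LLM-generated test scenarios rather than being hard-coded.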

Output Example

Bellwether generates AGENTS.md files documenting observed server behavior:

# @modelcontextprotocol/server-filesystem

> Generated by Bellwether on 2026-01-12

## Overview

A file management server providing tools for reading, writing, and searching files.

## Tools

### read_file

Read contents of a file from the specified path.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| path | string | yes | Path to the file to read |

**Observed Behavior:**
- Returns file contents as UTF-8 text
- Binary files are returned as base64-encoded content
- Maximum file size: 10MB

**Limitations:**
- Cannot read files outside configured root directory

**Security Considerations:**
- Path traversal attempts (../) are normalized within root

Cost Efficiency

Bellwether uses LLMs for intelligent testing. Typical costs per interview (10 tools, 3 questions each):

| Model | Cost | Quality |
|-------|------|---------|
| `gpt-5-mini` | ~$0.02 | Good (recommended for CI) |
| `claude-haiku-4-5` | ~$0.04 | Good |
| `gpt-5.2` | ~$0.12 | Best |
| `claude-sonnet-4-5` | ~$0.13 | Best |
| Ollama (local) | Free | Variable |

Use the `--quick` flag in CI for the fastest, cheapest runs (~$0.01).
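
To get a rough figure for a server of a different size, the reference costs above can be scaled by the number of questions asked. The sketch below does that arithmetic in Python. It assumes cost grows linearly with the question count, which is a simplification (real cost depends on token usage per question), and the function name `estimate_cost` is hypothetical rather than part of Bellwether.

```python
# Approximate per-interview costs from the table above (USD), quoted for
# 10 tools x 3 questions = 30 questions per interview.
REFERENCE_QUESTIONS = 10 * 3
REFERENCE_COST_USD = {
    "gpt-5-mini": 0.02,
    "claude-haiku-4-5": 0.04,
    "gpt-5.2": 0.12,
    "claude-sonnet-4-5": 0.13,
}


def estimate_cost(model: str, tools: int, questions_per_tool: int = 3) -> float:
    """Scale the reference cost linearly with the number of questions.

    Linear scaling is an assumption; actual cost depends on token counts.
    """
    per_question = REFERENCE_COST_USD[model] / REFERENCE_QUESTIONS
    return round(per_question * tools * questions_per_tool, 4)


if __name__ == "__main__":
    # A hypothetical server with 25 tools, 3 questions each.
    print(estimate_cost("gpt-5-mini", tools=25))
```

Under this assumption, a 25-tool server costs about 2.5 times the reference interview for the same model.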

Quick Example

# Install
npm install -g @dotsetlabs/bellwether

# Set your API key (or use Ollama for free)
export OPENAI_API_KEY=sk-xxx

# Interview a local server during development
bellwether interview node ./src/mcp-server.js

# Or interview an npm package
bellwether interview npx @modelcontextprotocol/server-filesystem /tmp

# Output: AGENTS.md with behavioral documentation

Local Development Workflow

Bellwether integrates into your development workflow to catch behavioral drift before deployment:

# 1. Test your local server
bellwether interview node ./src/mcp-server.js

# 2. Save a baseline after initial development
bellwether interview --save-baseline node ./src/mcp-server.js

# 3. Use watch mode for continuous testing
bellwether watch node ./src/mcp-server.js --watch-path ./src

# 4. Before committing, check for drift
bellwether interview --compare-baseline ./baseline.json node ./src/mcp-server.js

Use Ollama for completely free testing during development.
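
Step 4 of the workflow compares a fresh interview against a saved baseline. As a rough mental model, the sketch below shows a purely structural comparison of two snapshots in Python. Everything in it is hypothetical: the snapshot layout (tool name mapped to a small summary dict) is not Bellwether's actual `baseline.json` schema, the function name `diff_baselines` is invented for this example, and the semantic diff analysis Bellwether performs is not reproduced here.

```python
def diff_baselines(old: dict, new: dict) -> dict:
    """Report tools that were added, removed, or changed between two snapshots."""
    old_names, new_names = set(old), set(new)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "changed": sorted(
            name for name in old_names & new_names if old[name] != new[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical snapshots: tool name -> recorded summary of its interface.
    before = {
        "read_file": {"required": ["path"]},
        "write_file": {"required": ["path", "content"]},
    }
    after = {
        "read_file": {"required": ["path", "encoding"]},
        "search_files": {"required": ["pattern"]},
    }
    print(diff_baselines(before, after))
```

A removed tool or a changed required parameter is the kind of difference that would break existing clients, which is why it is worth catching before committing.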

Next Steps

Installation - Install Bellwether and configure your LLM provider
Quick Start - Run your first interview in 5 minutes
Local Development - Test your server during development
CLI Reference - Full command documentation
MCP Registry - Discover servers to test
Verification - Certify your server
CI/CD Integration - Automate with the GitHub Action
