Frequently Asked Questions

General

What is Bellwether?

Bellwether is a CLI tool for structural drift detection and behavioral documentation of MCP (Model Context Protocol) servers. It has two main commands:

bellwether check - Free, fast, deterministic schema validation and drift detection
bellwether explore - LLM-powered persona-based exploration for deeper behavioral documentation

What is MCP?

Model Context Protocol is an open standard created by Anthropic for connecting AI assistants (Claude, GPT, Cursor) to external tools and data sources.

When you build an MCP server, you're creating capabilities that AI agents can call—reading files, querying databases, calling APIs, or running custom business logic. MCP is supported by Claude Desktop, Zed, Cursor, Cline, and other AI-powered tools.

What's the difference between check and explore?

	`bellwether check`	`bellwether explore`
Cost	Free	~$0.01-0.15 per run (cloud) or Free (local)
Speed	Seconds	Minutes
LLM Required	No	Yes
Output	Docs and/or JSON	Docs and/or JSON
Best For	CI/CD, drift detection	Deep analysis, documentation
Deterministic	Yes	No

check compares tool schemas against a baseline—fast, free, and deterministic. Perfect for CI/CD.

explore uses LLMs for persona-based probing. By default it runs technical_writer; you can enable additional personas (security_tester, qa_engineer, novice_user) in config.

Is Bellwether free?

Yes! Bellwether is completely free and open source (MIT license).

bellwether check requires no LLM and has zero costs
bellwether explore requires either an LLM API key (OpenAI or Anthropic, billed by the provider at the rates below) or a local Ollama model, which is free

What LLM providers are supported?

For bellwether explore:

Anthropic (recommended) - Claude Haiku 4.5 (default), Claude Sonnet 4.5 (premium)
OpenAI - GPT-4.1-nano (default), GPT-4.1 (premium)
Ollama - Local models (qwen3:8b default, Llama, Mistral, etc.)

How much does explore mode cost?

Typical costs per exploration (varies based on server complexity):

Model	Cost	Notes
Ollama (qwen3:8b)	Free	Local; a GPU is strongly recommended for usable speed
gpt-4.1-nano	~$0.01-0.02	Budget cloud option
claude-haiku-4-5	~$0.02-0.05	Recommended for quality/cost balance
gpt-4.1	~$0.04-0.08	Higher quality OpenAI
claude-sonnet-4-5	~$0.08-0.15	Premium quality

Note: Avoid GPT-5 series models for Bellwether—they use "reasoning tokens" that make costs unpredictable and significantly higher.
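To turn the per-run figures above into a rough budget, multiply the range by the number of explorations you expect to run. The helper below is a sketch, not a Bellwether command: the function name estimate_explore_cost is hypothetical, and the prices are the approximate ranges from the table, which vary with server complexity.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bellwether): estimate a spend range for
# repeated `bellwether explore` runs from approximate per-run costs.
# Usage: estimate_explore_cost RUNS LOW_PER_RUN HIGH_PER_RUN
estimate_explore_cost() {
  awk -v runs="$1" -v low="$2" -v high="$3" \
    'BEGIN { printf "%.2f-%.2f USD\n", runs * low, runs * high }'
}

# Example: 30 runs with claude-haiku-4-5 at roughly 0.02-0.05 USD per run
estimate_explore_cost 30 0.02 0.05
```

Treat the result as an order-of-magnitude estimate; actual token usage depends on how many tools and personas each run covers.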

Why not just write unit tests?

Unit tests verify YOUR expectations. Bellwether discovers UNEXPECTED behaviors.

Think of the difference:

Unit test: "Does get_weather('NYC') return weather data?"
Bellwether: "What happens when someone calls get_weather with a SQL injection string?"

They're complementary. Unit tests catch regressions in known behavior. Bellwether surfaces behaviors you haven't thought to test yet. Use both for broader coverage.

How reliable is drift detection?

Drift detection in bellwether check is 100% deterministic—it compares tool schemas, parameters, and descriptions against a saved baseline. No LLM involved.

This detects:

Tool additions and removals
Parameter changes (added, removed, type changes)
Schema modifications
Description changes
Tool annotation changes (readOnlyHint, destructiveHint, etc.)
Entity title changes (tool, prompt, resource, resource template)
Output schema changes
Execution/task support changes
Server instruction changes
Prompt and resource changes
Performance regression (P50/P95 latency, success rate)

Comparisons are protocol-version-aware — version-specific fields are only compared when both baselines support the relevant MCP protocol version.

For behavioral changes (how tools actually respond), use bellwether explore periodically for deeper analysis.
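The first item in the list above, tool additions and removals, comes down to a set comparison between the baseline and the current tool names. The sketch below illustrates that idea with standard Unix tools. It is not Bellwether's implementation: the function name, the one-name-per-line input files, and the example tool names are all assumptions made for the illustration.

```shell
#!/bin/sh
# Illustration only: compare two files that list one tool name per line
# (hypothetical inputs, not Bellwether's baseline format) and report which
# tools were removed or added. The same inputs always give the same output.
# Usage: diff_tool_names BASELINE_FILE CURRENT_FILE
diff_tool_names() {
  base_sorted=$(mktemp)
  curr_sorted=$(mktemp)
  sort -u "$1" > "$base_sorted"
  sort -u "$2" > "$curr_sorted"
  # comm -23 keeps lines found only in the first file (gone since baseline)
  comm -23 "$base_sorted" "$curr_sorted" | sed 's/^/removed: /'
  # comm -13 keeps lines found only in the second file (new since baseline)
  comm -13 "$base_sorted" "$curr_sorted" | sed 's/^/added: /'
  rm -f "$base_sorted" "$curr_sorted"
}

# Example with made-up tool lists
demo_dir=$(mktemp -d)
printf 'read_file\nwrite_file\n' > "$demo_dir/baseline.txt"
printf 'read_file\nlist_directory\n' > "$demo_dir/current.txt"
diff_tool_names "$demo_dir/baseline.txt" "$demo_dir/current.txt"
rm -rf "$demo_dir"
```

Because nothing here involves sampling or a model, the comparison is repeatable, which is the property that makes this kind of check suitable for CI.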

Is this project sustainable?

Bellwether is fully open source (MIT license). The project is designed for long-term sustainability:

Open Source: If development ever stops, the code is yours to fork and maintain
Community-Driven: Contributions are welcome
Simple Architecture: Minimal dependencies, easy to understand and extend

Installation

What are the system requirements?

Node.js 20 or later
npm or npx
For explore mode: one of an OpenAI API key, an Anthropic API key, or a local Ollama installation

Can I use Bellwether without an API key?

Yes! bellwether check works completely without any API key. It's free and deterministic.

For bellwether explore, you can use Ollama for free local LLM inference:

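# Start the Ollama server; it runs in the foreground, so leave it running
ollama serve
# In a second terminal: download the default model, then explore
ollama pull qwen3:8b
bellwether explore npx your-server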

How do I update Bellwether?

npm update -g @dotsetlabs/bellwether

Usage

How do I check an MCP server?

bellwether check npx @modelcontextprotocol/server-filesystem /tmp

How do I explore an MCP server with an LLM?

bellwether explore npx @modelcontextprotocol/server-filesystem /tmp

What output formats are supported?

CONTRACT.md - Structural documentation (from check)
AGENTS.md - Behavioral documentation (from explore)
JSON - Machine-readable data for programmatic analysis

How do I use different personas in explore mode?

Configure in bellwether.yaml:

explore:
  personas:
    - technical_writer
    - security_tester
    - qa_engineer
    - novice_user

How do I save a baseline?

bellwether check npx your-server
bellwether baseline save
# Creates (default): .bellwether/bellwether-baseline.json

How do I compare against a baseline?

Run check first, then compare:

bellwether check npx your-server
bellwether baseline compare ./bellwether-baseline.json --fail-on-drift

Or configure baseline comparison in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true

Then simply run:

bellwether check --fail-on-drift

CI/CD

How do I use Bellwether in CI?

# GitHub Actions (check mode - free, no API key needed)
- name: Run Bellwether
  run: |
    npx @dotsetlabs/bellwether check
    npx @dotsetlabs/bellwether baseline compare ./bellwether-baseline.json --fail-on-drift

What do exit codes mean?

Code	Meaning
0	Clean (no changes)
1	Info-level changes
2	Warning-level changes
3	Breaking changes
4	Runtime error
5	Low confidence metrics (when `check.sampling.failOnLowConfidence` is true)
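In a CI script it can help to translate the exit code into a readable message and an explicit pass/fail decision. The sketch below assumes the codes in the table above. The function names are hypothetical, and the policy of tolerating info-level changes is only one possible choice, not a Bellwether default.

```shell
#!/bin/sh
# Map a Bellwether exit code to the meaning listed in the table above.
describe_exit_code() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "info-level changes" ;;
    2) echo "warning-level changes" ;;
    3) echo "breaking changes" ;;
    4) echo "runtime error" ;;
    5) echo "low confidence metrics" ;;
    *) echo "unknown exit code" ;;
  esac
}

# Assumed policy for this sketch: tolerate codes 0 and 1, fail on 2 and up.
should_fail_build() {
  [ "$1" -ge 2 ]
}

# In a real job, capture the code from the command that reports drift,
# for example:  bellwether check npx your-server; code=$?
code=3
echo "bellwether: $(describe_exit_code "$code")"
if should_fail_build "$code"; then
  echo "failing the build"
fi
```

Keeping the threshold in one small function makes it easy to tighten later, for example by failing on any non-zero code.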

How do I minimize CI costs?

Use bellwether check, which is completely free. Run bellwether explore only periodically for deeper analysis, not in every CI run.
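One way to apply this is to let the CI trigger decide which command runs. The sketch below relies on GITHUB_EVENT_NAME, which GitHub Actions sets for every job. Running explore only on scheduled workflows is an assumed policy, and pick_bellwether_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Choose a Bellwether command from the CI trigger name.
# Assumed policy: scheduled runs get the LLM-backed explore, while every
# other trigger (push, pull_request, ...) gets the free check.
pick_bellwether_mode() {
  if [ "${1:-}" = "schedule" ]; then
    echo "explore"
  else
    echo "check"
  fi
}

# GITHUB_EVENT_NAME is provided by GitHub Actions; default to "push"
# so the snippet also runs outside CI.
mode=$(pick_bellwether_mode "${GITHUB_EVENT_NAME:-push}")
echo "Would run: bellwether $mode"
```

With this split, only the scheduled workflow needs an LLM API key configured as a secret.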

Security

Is my API key safe?

API keys are:

Never logged
Never sent to Bellwether servers
Only sent to your chosen LLM provider (for explore mode)

What data does Bellwether send to LLMs?

In explore mode:

Tool names and schemas
Test scenarios and responses
No source code unless included in tool responses

In check mode:

Nothing—check mode doesn't use LLMs

Can Bellwether damage my server?

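Bellwether only calls tools that exist on your server. It generates test scenarios but doesn't execute arbitrary code. Those calls are real, however, so any tool with side effects (writing files, modifying data, calling external APIs) will actually run with the generated inputs. Point Bellwether at a test environment rather than production.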

Troubleshooting

"API key not found"

This only applies to explore mode. Set up your API key:

# Interactive setup (recommended)
bellwether auth

# Or set environment variable
export OPENAI_API_KEY=sk-xxx
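Before re-running explore, a quick check of the current shell can confirm whether a key is visible at all. This is a diagnostic sketch only. OPENAI_API_KEY is the variable shown above. ANTHROPIC_API_KEY is an assumption (it is the name Anthropic's own SDKs read), so confirm it against the Bellwether documentation. The function name is hypothetical, and the check looks only at environment variables, not at anything stored by bellwether auth.

```shell
#!/bin/sh
# Report which provider key, if any, is set in the current environment.
# The order of the checks is arbitrary and says nothing about which
# provider Bellwether itself would prefer.
detect_llm_key() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then   # assumed variable name
    echo "anthropic"
  else
    echo "none"
  fi
}

echo "Provider key found in environment: $(detect_llm_key)"
```

If this prints "none" in the shell where you run Bellwether, the export was probably made in a different session or never added to your shell profile.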

"Connection refused"

Check that your server starts correctly:

bellwether discover npx your-server

"Timeout errors"

Increase timeout in bellwether.yaml:

server:
  timeout: 120000

Contributing
