Frequently Asked Questions

General

What is Bellwether?

Bellwether is a CLI tool for structural drift detection and behavioral documentation of MCP (Model Context Protocol) servers. It has two main commands:

  • bellwether check - Free, fast, deterministic schema validation and drift detection
  • bellwether explore - LLM-powered persona-based exploration for deeper behavioral documentation

What is MCP?

Model Context Protocol is an open standard created by Anthropic for connecting AI assistants (Claude, GPT, Cursor) to external tools and data sources.

When you build an MCP server, you're creating capabilities that AI agents can call—reading files, querying databases, calling APIs, or running custom business logic. MCP is supported by Claude Desktop, Zed, Cursor, Cline, and other AI-powered tools.

What's the difference between check and explore?

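|               | bellwether check       | bellwether explore                          |
|---------------|------------------------|---------------------------------------------|
| Cost          | Free                   | ~$0.01-0.15 per run (cloud) or Free (local) |
| Speed         | Seconds                | Minutes                                     |
| LLM Required  | No                     | Yes                                         |
| Output        | Docs and/or JSON       | Docs and/or JSON                            |
| Best For      | CI/CD, drift detection | Deep analysis, documentation                |
| Deterministic | Yes                    | No                                          |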

check compares tool schemas against a baseline—fast, free, and deterministic. Perfect for CI/CD.

explore uses LLMs for persona-based probing. By default it runs technical_writer; you can enable additional personas (security_tester, qa_engineer, novice_user) in config.

Is Bellwether free?

Yes! Bellwether is completely free and open source (MIT license).

  • bellwether check requires no LLM and has zero costs
  • bellwether explore requires an LLM API key (OpenAI, Anthropic) or local Ollama (also free)

What LLM providers are supported?

For bellwether explore:

  • Anthropic (recommended) - Claude Haiku 4.5 (default), Claude Sonnet 4.5 (premium)
  • OpenAI - GPT-4.1-nano (default), GPT-4.1 (premium)
  • Ollama - Local models (Qwen3:8b default, Llama, Mistral, etc.)

How much does explore mode cost?

Typical costs per exploration (varies based on server complexity):

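| Model             | Cost        | Notes                                |
|-------------------|-------------|--------------------------------------|
| Ollama (qwen3:8b) | Free        | Local, requires GPU                  |
| gpt-4.1-nano      | ~$0.01-0.02 | Budget cloud option                  |
| claude-haiku-4-5  | ~$0.02-0.05 | Recommended for quality/cost balance |
| gpt-4.1           | ~$0.04-0.08 | Higher quality OpenAI                |
| claude-sonnet-4-5 | ~$0.08-0.15 | Premium quality                      |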

Note: Avoid GPT-5 series models for Bellwether—they use "reasoning tokens" that make costs unpredictable and significantly higher.
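
To turn the per-run ranges above into a rough budget, multiply both ends of the range by the number of runs you expect. The helper below is a minimal sketch of that arithmetic; estimate_cost is a hypothetical name, not a Bellwether command, and the result is only as accurate as the approximate per-run figures in the table.

```shell
# estimate_cost RUNS LOW HIGH
# Prints the total cost range in dollars for RUNS explorations,
# given the low and high per-run cost taken from the table above.
estimate_cost() {
  awk -v runs="$1" -v low="$2" -v high="$3" \
    'BEGIN { printf "$%.2f-$%.2f\n", runs * low, runs * high }'
}

# Example: 30 runs with claude-haiku-4-5 (~$0.02-0.05 per run)
estimate_cost 30 0.02 0.05   # prints $0.60-$1.50
```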

Why not just write unit tests?

Unit tests verify YOUR expectations. Bellwether discovers UNEXPECTED behaviors.

Think of the difference:

  • Unit test: "Does get_weather('NYC') return weather data?"
  • Bellwether: "What happens when someone calls get_weather with a SQL injection string?"

They're complementary. Unit tests catch regressions in known behavior. Bellwether surfaces behaviors you haven't thought to test yet. Use both for broader coverage.

How reliable is drift detection?

Drift detection in bellwether check is 100% deterministic—it compares tool schemas, parameters, and descriptions against a saved baseline. No LLM involved.

This detects:

  • Tool additions and removals
  • Parameter changes (added, removed, type changes)
  • Schema modifications
  • Description changes
  • Tool annotation changes (readOnlyHint, destructiveHint, etc.)
  • Entity title changes (tool, prompt, resource, resource template)
  • Output schema changes
  • Execution/task support changes
  • Server instruction changes
  • Prompt and resource changes
  • Performance regression (P50/P95 latency, success rate)

Comparisons are protocol-version-aware — version-specific fields are only compared when both baselines support the relevant MCP protocol version.

For behavioral changes (how tools actually respond), use bellwether explore periodically for deeper analysis.
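
To make "deterministic" concrete, the simplest kind of structural drift (tool additions and removals) amounts to a set difference between two lists of names: the same inputs always give the same answer. The sketch below illustrates that idea only. It is not Bellwether's implementation, tool_name_drift is a hypothetical helper, and the one-name-per-line files are a stand-in for a real baseline.

```shell
# tool_name_drift BASELINE_FILE CURRENT_FILE
# Each file holds one tool name per line (hypothetical format).
# Prints "+ name" for tools only in CURRENT_FILE and
# "- name" for tools only in BASELINE_FILE.
tool_name_drift() {
  baseline_sorted=$(mktemp)
  current_sorted=$(mktemp)
  # comm requires sorted input; -u also drops duplicate names.
  sort -u "$1" > "$baseline_sorted"
  sort -u "$2" > "$current_sorted"
  # -13 keeps lines unique to the second file (additions).
  comm -13 "$baseline_sorted" "$current_sorted" | sed 's/^/+ /'
  # -23 keeps lines unique to the first file (removals).
  comm -23 "$baseline_sorted" "$current_sorted" | sed 's/^/- /'
  rm -f "$baseline_sorted" "$current_sorted"
}
```

An empty result means the two lists contain the same tool names. The real check also covers parameters, schemas, descriptions, and the other items listed above.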

Is this project sustainable?

Bellwether is fully open source (MIT license). The project is designed for long-term sustainability:

  1. Open Source: If development ever stops, the code is yours to fork and maintain
  2. Community-Driven: Contributions welcome from the community
  3. Simple Architecture: Minimal dependencies, easy to understand and extend

Installation

What are the system requirements?

  • Node.js 20 or later
  • npm or npx
  • For explore mode: one of an OpenAI API key, an Anthropic API key, or a local Ollama installation

Can I use Bellwether without an API key?

Yes! bellwether check works completely without any API key. It's free and deterministic.

For bellwether explore, you can use Ollama for free local LLM inference:

ollama serve
ollama pull qwen3:8b
bellwether explore npx your-server

How do I update Bellwether?

npm update -g @dotsetlabs/bellwether

Usage

How do I check an MCP server?

bellwether check npx @modelcontextprotocol/server-filesystem /tmp

How do I explore an MCP server with an LLM?

bellwether explore npx @modelcontextprotocol/server-filesystem /tmp

What output formats are supported?

  • CONTRACT.md - Structural documentation (from check)
  • AGENTS.md - Behavioral documentation (from explore)
  • JSON - Machine-readable data for programmatic analysis

How do I use different personas in explore mode?

Configure in bellwether.yaml:

explore:
  personas:
    - technical_writer
    - security_tester
    - qa_engineer
    - novice_user

How do I save a baseline?

bellwether check npx your-server
bellwether baseline save
# Creates (default): .bellwether/bellwether-baseline.json

How do I compare against a baseline?

Run check first, then compare:

bellwether check npx your-server
bellwether baseline compare ./bellwether-baseline.json --fail-on-drift

Or configure baseline comparison in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true

Then simply run:

bellwether check --fail-on-drift

CI/CD

How do I use Bellwether in CI?

# GitHub Actions (check mode - free, no API key needed)
- name: Run Bellwether
  run: |
    npx @dotsetlabs/bellwether check
    npx @dotsetlabs/bellwether baseline compare ./bellwether-baseline.json --fail-on-drift

What do exit codes mean?

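| Code | Meaning                                                                     |
|------|-----------------------------------------------------------------------------|
| 0    | Clean (no changes)                                                          |
| 1    | Info-level changes                                                          |
| 2    | Warning-level changes                                                       |
| 3    | Breaking changes                                                            |
| 4    | Runtime error                                                               |
| 5    | Low confidence metrics (when check.sampling.failOnLowConfidence is true)    |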
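
In a CI script, the exit codes above can be turned into a readable message and a pass/fail decision. The following is a minimal sketch: describe_exit_code and should_fail_build are hypothetical helper names, and treating info-level changes as acceptable is an assumed policy, not a Bellwether default.

```shell
# Map a Bellwether exit code to its documented meaning.
describe_exit_code() {
  case "$1" in
    0) echo "Clean (no changes)" ;;
    1) echo "Info-level changes" ;;
    2) echo "Warning-level changes" ;;
    3) echo "Breaking changes" ;;
    4) echo "Runtime error" ;;
    5) echo "Low confidence metrics" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Assumed policy: pass on clean runs and info-level changes (0, 1),
# fail the build on warnings, breaking changes, and errors (2 and up).
should_fail_build() {
  [ "$1" -ge 2 ]
}

# Usage in a CI step (commented out so the helpers can be sourced alone):
# code=0
# bellwether check npx your-server || code=$?
# echo "bellwether: $(describe_exit_code "$code")"
# if should_fail_build "$code"; then exit "$code"; fi
```

Capturing the code with `|| code=$?` keeps the script alive under `set -e`, so the message is printed before the job fails.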

How do I minimize CI costs?

Use bellwether check which is completely free. Only use bellwether explore periodically for deeper analysis (not in every CI run).

Security

Is my API key safe?

API keys are:

  • Never logged
  • Never sent to Bellwether servers
  • Only sent to your chosen LLM provider (for explore mode)

What data does Bellwether send to LLMs?

In explore mode:

  • Tool names and schemas
  • Test scenarios and responses
  • No source code unless included in tool responses

In check mode:

  • Nothing—check mode doesn't use LLMs

Can Bellwether damage my server?

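Bellwether only calls tools that your server exposes. It generates test scenarios but doesn't execute arbitrary code. Those tool calls are real, however, so tools with side effects (writing files, modifying data) may actually run. Point Bellwether at a test environment rather than production.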

Troubleshooting

"API key not found"

This only applies to explore mode. Set up your API key:

# Interactive setup (recommended)
bellwether auth

# Or set environment variable
export OPENAI_API_KEY=sk-xxx
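
If you rely on environment variables, a quick preflight check before running explore can catch a missing key early. This is a sketch under assumptions: has_cloud_api_key is a hypothetical helper, OPENAI_API_KEY comes from the example above, and ANTHROPIC_API_KEY is an assumed variable name for the Anthropic provider. It does not detect keys stored through bellwether auth.

```shell
# Returns success (0) when at least one cloud provider key is set and non-empty.
# ANTHROPIC_API_KEY is an assumed name; adjust it to match your setup.
has_cloud_api_key() {
  [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${ANTHROPIC_API_KEY:-}" ]
}

# Usage:
# if ! has_cloud_api_key; then
#   echo "No API key in environment; run 'bellwether auth' or export one." >&2
# fi
```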

"Connection refused"

Check your server starts correctly:

bellwether discover npx your-server

"Timeout errors"

Increase timeout in bellwether.yaml:

server:
  timeout: 120000

Contributing

How do I report bugs?

Open an issue at github.com/dotsetlabs/bellwether/issues.

How do I contribute?

See CONTRIBUTING.md.

Is there a community?