bellwether check

Validate an MCP server's schemas and detect drift. Free, fast, and deterministic.

Synopsis

bellwether check [server-command] [args...]
bellwether check --fail-on-drift
bellwether check --format junit
bellwether check --accept-drift --accept-reason "Added new feature"

Description

The check command is the core of Bellwether. It connects to an MCP server, discovers capabilities (tools, prompts, resources), validates schemas, and generates CONTRACT.md documentation—all without requiring an LLM.

This is the recommended command for CI/CD pipelines because it's:

  • Free - No API keys or LLM costs
  • Fast - Runs in seconds
  • Deterministic - Same input produces identical output

Config Required

check requires a config file. Run bellwether init once in your project.

Arguments

| Argument | Description |
| --- | --- |
| [server-command] | Command to start the MCP server (optional if set in config) |
| [args...] | Arguments to pass to the server command |

Options

Core Options

| Option | Description | Default |
| --- | --- | --- |
| -c, --config <path> | Path to config file | bellwether.yaml |
| --fail-on-drift | Exit with error if drift detected (overrides config) | From config |
| --accept-drift | Accept detected drift as intentional and update baseline | false |
| --accept-reason <reason> | Reason for accepting drift (used with --accept-drift) | - |

Output Options

| Option | Description | Default |
| --- | --- | --- |
| --format <format> | Output format: text, json, compact, github, markdown, junit, sarif | check.diffFormat |

Severity Options

| Option | Description | Default |
| --- | --- | --- |
| --min-severity <level> | Minimum severity to report (overrides config): none, info, warning, breaking | From config |
| --fail-on-severity <level> | Fail threshold (overrides config): none, info, warning, breaking | breaking |

Remote Auth Override

| Option | Description | Default |
| --- | --- | --- |
| -H, --header <header...> | Custom header(s) for remote MCP requests (for example Authorization: Bearer token) | From server.headers |

Config-First Design

Bellwether uses a config-first approach. All settings—including parallel testing, security testing, sampling, and output options—are configured in bellwether.yaml. CLI flags are minimal and primarily used for one-time overrides in CI/CD pipelines.

Run bellwether init to generate a comprehensive, well-documented configuration file.

Examples

Basic Check

# Check with server command
bellwether check npx @modelcontextprotocol/server-filesystem /tmp

# Or use server command from config
bellwether check

Drift Detection with Config

Configure baseline comparison in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json" # Compared under output.dir (cwd fallback for existing files)
  failOnDrift: true # Fail if drift detected

Then run:

bellwether check

CI/CD Pipeline

# Quick drift check in CI (--fail-on-drift overrides config)
bellwether check --fail-on-drift

# JUnit output for CI reporting (parallel testing is config default)
bellwether check --format junit > results.xml

# Fail on any warning or breaking change
bellwether check --fail-on-severity warning

Remote Server with Auth Header

bellwether check \
  --config bellwether.yaml \
  -H "Authorization: Bearer $MCP_SERVER_TOKEN"

Configure parallel testing, incremental checking, and security testing in bellwether.yaml:

# bellwether.yaml (CI preset example)
check:
  parallel: true
  parallelWorkers: 4
  incremental: true
  security:
    enabled: true
baseline:
  failOnDrift: true

Save Baseline Separately

Use the baseline command to save baselines:

bellwether check
bellwether baseline save

Accept Drift During Check

When you intentionally change your server, you can accept drift as part of the check:

# Accept drift in one command
bellwether check --accept-drift --accept-reason "Added new delete_file tool"

This updates the baseline and records acceptance metadata (timestamp and reason, if provided).

note

The --accepted-by option is only available in bellwether baseline accept. The --accept-drift flag records the reason, but does not set an acceptor by default.

Output Files

| File | Description |
| --- | --- |
| CONTRACT.md | Structural documentation of tool schemas (configurable via output.files.contractDoc) |
| bellwether-check.json | Machine-readable validation results (configurable via output.files.checkReport) |

Output locations are controlled by output.dir (JSON) and output.docsDir (docs). Which files are written is controlled by output.format (docs, json, or both; legacy alias: agents.md).

The JSON report embeds a $schema pointer and includes enriched fields such as semanticInferences, schemaEvolution, errorAnalysisSummaries, and documentationScore when available.

CONTRACT.md Contents

The generated documentation includes:

  • Server Information: Name, version, protocol version
  • Quick Reference Table: Tool names with parameters, success rates, descriptions
  • Performance Baseline: P50/P95 latency metrics, success rates, confidence levels
  • Tool Reference: Names, descriptions, parameters with full schema
  • Example Usage: Up to 2 successful interaction examples per tool
  • Error Patterns: Categorized errors with root cause and remediation suggestions
  • Error Summary: Aggregate error patterns across all tools
  • Security Baseline: Security findings and risk scores (when check.security.enabled is true)
  • Documentation Quality: Score breakdown and improvement suggestions
  • Custom Scenario Results: If bellwether-tests.yaml exists

Configuration

Check mode uses settings from bellwether.yaml. Run bellwether init to generate a comprehensive configuration file with all options documented.

Key Configuration Sections

server:
  command: "npx @mcp/your-server"
  args: ["/data"]
  # transport: sse
  # url: "https://api.example.com/mcp"
  # headers:
  #   Authorization: "Bearer ${MCP_SERVER_TOKEN}"
  timeout: 30000

output:
  dir: ".bellwether" # JSON output directory
  docsDir: "." # Documentation output (CONTRACT.md)
  format: both # docs, json, or both (legacy: agents.md)

  # Example output settings for documentation
  examples:
    full: true # Include full (non-truncated) examples
    maxLength: 5000 # Maximum example length (100-50000)
    maxPerTool: 5 # Maximum examples per tool (1-20)

baseline:
  comparePath: "./bellwether-baseline.json" # Compared under output.dir (cwd fallback for existing files)
  failOnDrift: false

  # Severity thresholds for CI
  severity:
    minimumSeverity: none # Filter: none, info, warning, breaking
    failOnSeverity: breaking # CI fail threshold
    suppressWarnings: false # Hide warning-level changes
    # aspectOverrides: # Custom severity per aspect type
    #   description: none # Ignore description-only changes

# Check command settings
check:
  incremental: false # Only test changed tools
  incrementalCacheHours: 168 # Cache age (1 week)
  parallel: true # Parallel tool testing (recommended)
  parallelWorkers: 4 # Concurrent workers (1-10)
  performanceThreshold: 10 # Regression threshold (%)

  # Security testing settings
  security:
    enabled: false # Enable security vulnerability testing
    categories: # Categories to test
      - sql_injection
      - xss
      - path_traversal
      - command_injection
      - ssrf
      - error_disclosure

  # Statistical sampling settings
  sampling:
    minSamples: 10 # Minimum samples per tool (1-50)
    targetConfidence: low # low, medium, or high
    failOnLowConfidence: false # Fail if confidence below target

# Workflow testing
workflows:
  autoGenerate: false # Auto-generate from tool patterns
  stepTimeout: 5000 # Timeout per step in ms

scenarios:
  path: "./bellwether-tests.yaml" # Custom test scenarios
  only: false

Output Formats

The --format option supports multiple output formats for CI integration:

| Format | Description | Use Case |
| --- | --- | --- |
| text | Human-readable colored output | Terminal, local dev |
| json | Machine-readable JSON | Scripting, automation |
| compact | Single-line summary | Log aggregation |
| github | GitHub Actions annotations | GitHub CI |
| markdown | Markdown-formatted report | PR comments |
| junit | JUnit XML format | Jenkins, GitLab CI, CircleCI |
| sarif | SARIF 2.1.0 format | GitHub Code Scanning |

JUnit Output

bellwether check --format junit > bellwether-results.xml

Generates standard JUnit XML that CI systems can parse for test reporting.

SARIF Output

bellwether check --format sarif > bellwether-results.sarif

Generates SARIF format for GitHub Code Scanning and other static analysis tools.

Parallel Testing

Speed up checks by testing tools concurrently. Configure in bellwether.yaml:

check:
  parallel: true # Enabled by default
  parallelWorkers: 4 # 1-10 concurrent workers

note

Parallel testing uses a mutex to serialize MCP client calls, ensuring stable results while maximizing throughput.

Incremental Checking

Only test tools with changed schemas, using cached results for unchanged tools. Configure in bellwether.yaml:

check:
  incremental: true
  incrementalCacheHours: 168 # Cache valid for 1 week

Incremental checking:

  • Compares current tool schemas against the baseline
  • Only tests tools with changed schemas or new tools
  • Reuses cached fingerprints for unchanged tools
  • Significantly faster for large servers with many tools

tip

Incremental checking requires an existing baseline (baseline.comparePath). On first run, all tools are tested.
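
The selection step described above can be sketched in shell. This is only an illustration, not Bellwether's implementation: it assumes each tool's schema has been exported as one JSON file per tool into a baseline directory and a current directory. The function name changed_tools and that directory layout are hypothetical.

```shell
# Hypothetical sketch: print the tools whose schema file is new or differs from the baseline copy.
# Assumes one <tool>.json file per tool in each directory (not Bellwether's real cache format).
changed_tools() {
  baseline_dir=$1
  current_dir=$2
  for schema in "$current_dir"/*.json; do
    [ -e "$schema" ] || continue   # glob matched nothing: no schema files to compare
    tool=$(basename "$schema" .json)
    # A tool needs a fresh test run if it has no baseline file or its schema bytes differ.
    if [ ! -f "$baseline_dir/$tool.json" ] || ! cmp -s "$schema" "$baseline_dir/$tool.json"; then
      echo "$tool"
    fi
  done
}

# Usage (hypothetical paths):
# changed_tools ./schemas-baseline ./schemas-current
```

Tools that are not printed would reuse their cached results. Byte comparison is the simplest stand-in for a schema fingerprint; a more careful comparison would normalize key order before comparing.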

Performance Regression Detection

Bellwether tracks tool latency and flags performance regressions. Configure the threshold in bellwether.yaml:

check:
  performanceThreshold: 10 # Flag if P50 latency increases by >10%

When comparing baselines, performance regressions are reported:

--- Performance Regressions ---
read_file: p50 45ms → 78ms (+73%)
write_file: p50 120ms → 145ms (+21%)

Performance metrics captured:

  • P50 latency - Median response time
  • P95 latency - 95th percentile response time
  • Success rate - Percentage of successful calls
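
The threshold comparison can be reproduced by hand. The sketch below assumes the regression test is a plain percentage increase in P50 latency relative to the baseline, with whole-millisecond inputs and a baseline greater than zero. The function names are hypothetical, and the rounding is chosen to match the sample report above rather than taken from Bellwether's source.

```shell
# Hypothetical sketch: percentage change in P50 latency, rounded to the nearest whole percent.
# $1 = baseline p50 in ms (must be > 0), $2 = current p50 in ms.
p50_change_pct() {
  _diff=$(( $2 - $1 ))
  if [ "$_diff" -ge 0 ]; then
    echo $(( (_diff * 100 + $1 / 2) / $1 ))
  else
    echo $(( (_diff * 100 - $1 / 2) / $1 ))
  fi
}

# Succeeds when the increase is strictly greater than the threshold ($3, in percent).
# Compares without rounding: (current - baseline) * 100 > threshold * baseline.
is_p50_regression() {
  [ $(( ($2 - $1) * 100 )) -gt $(( $3 * $1 )) ]
}

p50_change_pct 45 78     # prints 73, matching the read_file line above
p50_change_pct 120 145   # prints 21, matching the write_file line above
```

With performanceThreshold: 10, both sample tools would be flagged, since 73% and 21% both exceed 10%.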

Performance Confidence

Bellwether calculates statistical confidence for performance metrics:

─── Confidence Changes ───
↑ read_file: low → high (more samples collected)
↓ write_file: high → medium (increased variability)

Confidence levels:

  • High - 10+ samples, low variability (CV < 0.3)
  • Medium - 5+ samples or moderate variability
  • Low - Few samples or high variability

Tools with low confidence are flagged in reports:

Note: Some tools have low confidence metrics.
Run with more samples for reliable baselines: write_file, delete_file
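
To make the confidence levels concrete, here is a rough sketch that classifies a set of latency samples by count and coefficient of variation (CV, standard deviation divided by mean). Only the high rule (10+ samples, CV < 0.3) comes from the list above. The medium rule used here (5+ samples and CV < 0.6) is an assumption made to resolve the ambiguity in "moderate variability", as are the function name and the use of population variance.

```shell
# Hypothetical sketch: classify latency samples (ms, one per argument) as high, medium, or low.
# Only the "high" rule is documented; the 0.6 cutoff for "medium" is an assumed value.
confidence_level() {
  printf '%s\n' "$@" | awk '
    { n++; sum += $1; sumsq += $1 * $1 }
    END {
      mean = sum / n
      var = sumsq / n - mean * mean      # population variance
      if (var < 0) var = 0               # guard against floating-point cancellation
      cv = (mean > 0) ? sqrt(var) / mean : 0
      if (n >= 10 && cv < 0.3)      print "high"
      else if (n >= 5 && cv < 0.6)  print "medium"
      else                          print "low"
    }'
}

# Usage:
# confidence_level 100 102 98 101 99 100 103 97 100 100
```

Raising check.sampling.minSamples is the documented way to collect more samples per tool; the sketch only shows why a larger, steadier sample set moves a tool toward high.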

Security Testing

Enable security testing to detect vulnerabilities. Configure in bellwether.yaml:

check:
  security:
    enabled: true
    categories:
      - sql_injection
      - xss
      - path_traversal
      - command_injection
      - ssrf
      - error_disclosure

Security testing probes for:

  • SQL Injection - ' OR 1=1 --, ; DROP TABLE
  • Path Traversal - ../../../etc/passwd
  • Command Injection - ; rm -rf /, $(whoami)
  • XSS - <script>alert(1)</script>
  • SSRF - http://169.254.169.254/

When security issues are found:

─── Security Findings ───
Tool: execute_query
Category: sql_injection
Risk Level: critical
Finding: Tool accepted SQL injection payload without sanitization

Security findings are stored in the baseline and compared across runs to detect security regressions.

Documentation Quality

Bellwether scores documentation quality for all tools:

─── Documentation Quality ───
Score: 85/100 (B)
Grade: B → A
Issues fixed: 3

Documentation is scored on:

  • Description Coverage (30%) - Tools with descriptions
  • Description Quality (30%) - Length, clarity, examples
  • Parameter Documentation (25%) - Parameters with descriptions
  • Example Coverage (15%) - Tools with usage examples

Grades: A (90+), B (80+), C (70+), D (60+), F (below 60)
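
As a worked example of the weights and grade bands, the sketch below assumes each of the four components is itself scored from 0 to 100 and that the overall score is their weighted sum rounded to the nearest integer. That combination rule and the function names are assumptions for illustration; only the weights and grade cutoffs come from the lists above.

```shell
# Hypothetical sketch: combine four 0-100 component scores using the documented weights.
# $1 = description coverage, $2 = description quality,
# $3 = parameter documentation, $4 = example coverage.
doc_score() {
  echo $(( ($1 * 30 + $2 * 30 + $3 * 25 + $4 * 15 + 50) / 100 ))
}

# Map a 0-100 score to the documented letter grade.
doc_grade() {
  if   [ "$1" -ge 90 ]; then echo A
  elif [ "$1" -ge 80 ]; then echo B
  elif [ "$1" -ge 70 ]; then echo C
  elif [ "$1" -ge 60 ]; then echo D
  else echo F
  fi
}

score=$(doc_score 100 80 80 70)
echo "Score: $score/100 ($(doc_grade "$score"))"   # prints: Score: 85/100 (B)
```

Under these assumptions, a server with full description coverage but weaker examples still lands in the B band, because example coverage carries the smallest weight.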

Error Analysis

Bellwether performs enhanced error analysis with root cause inference:

─── Error Analysis ───
Tool: read_file
Category: NotFound (404)
Root Cause: File does not exist at specified path
Remediation: Verify the file path exists before calling read_file
Related Parameters: path

Error patterns are tracked across runs to detect changes in error behavior.

Custom Scenarios

Check mode supports custom YAML test scenarios for deterministic testing:

# bellwether-tests.yaml
scenarios:
  - tool: read_file
    description: "File reading works"
    args:
      path: "/tmp/test.txt"
    assertions:
      - path: "content"
        condition: "contains"
        value: "expected content"

See Custom Scenarios for full documentation.

Exit Codes

Bellwether uses granular exit codes for CI/CD integration:

| Code | Meaning | CI Behavior |
| --- | --- | --- |
| 0 | No changes detected | Pass |
| 1 | Info-level changes (non-breaking) | Exit code 1 (handle in CI as desired) |
| 2 | Warning-level changes | Exit code 2 (handle in CI as desired) |
| 3 | Breaking changes detected | Always fail |
| 4 | Runtime error (connection, config) | Fail |
| 5 | Low confidence metrics (when check.sampling.failOnLowConfidence is true) | Fail |

Using Exit Codes in CI

bellwether check npx @mcp/server
case $? in
  0) echo "No drift detected" ;;
  1) echo "Info-level changes (non-breaking additions)" ;;
  2) echo "Warning-level changes (review recommended)" ;;
  3) echo "Breaking changes detected!" && exit 1 ;;
  4) echo "Error: check failed" && exit 1 ;;
  5) echo "Low confidence metrics" && exit 1 ;;
esac
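
Many CI scripts run with set -e, which stops the script at the first non-zero exit and would skip a case $? block placed after the command. The sketch below shows one way around that: capture the status with || so codes 1 and 2 can still be treated as passing. The wrapper name gate_on_exit_code is hypothetical, and the pass/fail policy simply mirrors the exit code table above.

```shell
# Hypothetical wrapper: run any command and apply the exit code table to its status.
# The "|| _bw_status=$?" form keeps the script alive under `set -e`.
gate_on_exit_code() {
  _bw_status=0
  "$@" || _bw_status=$?
  case "$_bw_status" in
    0) echo "No drift detected" ;;
    1) echo "Info-level changes (non-breaking additions)" ;;
    2) echo "Warning-level changes (review recommended)" ;;
    3) echo "Breaking changes detected!"; return 1 ;;
    4) echo "Error: check failed"; return 1 ;;
    5) echo "Low confidence metrics"; return 1 ;;
    *) echo "Unexpected exit code: $_bw_status"; return 1 ;;
  esac
}

# Usage (same server command as the example above):
# gate_on_exit_code bellwether check npx @mcp/server
```

The catch-all branch fails the job on any code outside the documented range, which is a conservative choice rather than documented Bellwether behavior.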

See Also