Drift Detection

Drift detection identifies when MCP server behavior changes between versions, helping catch breaking changes before they reach production.

What Is Drift?

Drift occurs when your MCP server's behavior differs from its documented baseline. This can be:

Intentional - New features, bug fixes, refactoring
Unintentional - Regressions, breaking changes, bugs

How It Works

   Baseline (v1)              Current Behavior
        |                           |
        |                           |
   [read_file]                [read_file]
   - Returns UTF-8            - Returns UTF-8
   - Max 10MB                 - Max 50MB  <-- CHANGED
   - ENOENT on missing        - Different error message <-- CHANGED
        |                           |
        v                           v
              Drift Detection
                    |
                    v
            Changes Detected:
            - Max size: 10MB -> 50MB (warning)
            - Error message changed (info)
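
The comparison in the diagram can be sketched as a key-by-key diff of two property lists. This is only an illustration of the idea, not Bellwether's implementation: the `key=value` file format and the `diff_props` helper are hypothetical.

```shell
# Hypothetical helper: print every key whose value differs between two
# key=value files (baseline first, current second).
diff_props() {
  awk -F= '
    NR == FNR { base[$1] = $2; next }     # first file: remember baseline values
    ($1 in base) && base[$1] != $2 {      # second file: report changed values
      printf "%s: %s -> %s\n", $1, base[$1], $2
    }
  ' "$1" "$2"
}

# Example data mirroring the read_file diagram.
workdir=$(mktemp -d)
printf 'encoding=UTF-8\nmax_size=10MB\n' > "$workdir/baseline.txt"
printf 'encoding=UTF-8\nmax_size=50MB\n' > "$workdir/current.txt"

diff_props "$workdir/baseline.txt" "$workdir/current.txt"
# prints: max_size: 10MB -> 50MB

rm -rf "$workdir"
```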

Determinism and Reliability

Drift detection in bellwether check is deterministic and does not use an LLM.

The following categories of change are compared deterministically:

Tool added/removed
Parameter added/removed/renamed
Type changes
Required status changes
Description changes
Tool annotation changes (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
Entity title changes (tool, prompt, resource, and resource template titles)
Output schema changes
Execution/task support changes
Server instruction changes
Prompt added/removed/modified
Resource added/removed/modified
Resource template changes
Performance regression (P50/P95 latency, success rate)
Security finding deltas (when check.security.enabled is true)
Error trend changes
Response schema evolution changes
Documentation score changes

Given the same baseline and the same current snapshot, these comparisons always produce the same result.

Comparisons are protocol-version-aware: version-specific fields (annotations, titles, output schemas, etc.) are only compared when both baselines support the relevant MCP protocol version.

Achieving 100% Determinism

For CI/CD pipelines requiring deterministic results, use bellwether check:

# Initialize config (if not already done)
bellwether init npx your-server

# Run check (free, deterministic, no LLM)
bellwether check --fail-on-drift

Configure baseline path in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true

Check mode:

No LLM calls = no non-determinism
Consistent pass/fail results every time
Zero API costs
Fast execution

Optionally add custom scenarios for stricter deterministic coverage:

# bellwether.yaml - with custom scenarios
scenarios:
  path: "./bellwether-tests.yaml"
  only: true

Recommendations by Use Case

Use Case	Recommended Mode	Why
CI/CD deployment gates	`bellwether check` + baseline comparison	Deterministic, enforceable exit codes
PR review checks	`bellwether check --fail-on-severity warning`	Catches meaningful drift early
Initial documentation	`bellwether explore`	Rich behavioral docs (`AGENTS.md`)
Compliance environments	`bellwether check` (+ optional `scenarios.only: true`)	Auditable and reproducible

Drift Severity Levels

Level	Description	Examples	CI Behavior
`breaking`	Schema or critical behavior changes	Tool removed, required param added	Exit code `3` (always fails)
`warning`	Behavioral changes to investigate	Error messages, limits, side effects	Exit code `2` (handle in CI as desired)
`info`	Documentation-only changes	Wording improvements	Exit code `1` (handle in CI as desired)
`none`	No changes detected	-	Pass

Using Drift Detection

Local Development

# Initialize config (first time only)
bellwether init npx your-server

# Run check and save initial baseline
bellwether check
bellwether baseline save

# Make changes to server...

# Re-run check (uses baseline from config)
bellwether check

CI/CD Pipeline

Configure baseline comparison in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true

# CI command
bellwether check --fail-on-drift
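
A CI job may benefit from confirming that the committed baseline exists before running the check, so that a missing file produces a clear message. The sketch below assumes that behavior is desired; `require_baseline` is a hypothetical helper, not part of Bellwether, and the default path simply mirrors the `comparePath` value shown above.

```shell
# Hypothetical preflight: fail early if the baseline file is not present.
require_baseline() {
  baseline_path=${1:-./bellwether-baseline.json}
  if [ ! -f "$baseline_path" ]; then
    echo "baseline not found: $baseline_path" >&2
    return 1
  fi
}

# Usage in a CI step (commented out so the sketch runs without Bellwether installed):
#   require_baseline ./bellwether-baseline.json && bellwether check --fail-on-drift
```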

Check Mode (100% Deterministic)

Check mode (bellwether check) provides deterministic drift detection without any LLM involvement:

bellwether check --fail-on-drift

In check mode:

No LLM calls required
Results are reproducible across runs
Schema, capability, performance, security, and report-quality deltas are all compared deterministically
Free and fast

Use check mode for:

CI/CD deployment gates requiring determinism
Compliance environments with audit requirements
Detecting drift across contract, reliability, and security signals

Understanding Drift Output

Drift Detection Results
=======================

BREAKING (1):
  - Tool "legacy_read" was removed

WARNING (2):
  - read_file: Maximum file size changed from 10MB to 50MB
  - write_file: Error message format changed

INFO (1):
  - read_file: Documentation clarified for binary files

Summary: 1 breaking, 2 warnings, 1 info
Exit code: 3 (breaking changes)

Drift Categories

Schema Drift

Changes to tool definitions:

Change	Severity	Example
Tool added	info	New `delete_file` tool
Tool removed	breaking	`legacy_read` removed
Required param added	breaking	`path` now required
Optional param added	info	New `encoding` option
Type changed	breaking	`limit` string -> number

Behavioral Drift

Changes to how tools behave:

Change	Severity	Example
Return value format	warning	Date format changed
Error handling	warning	New error type
Performance	info	Faster response
Limits	warning	Max size changed
Side effects	warning	Now creates parent dirs

Security Drift

Changes affecting security:

Change	Severity	Example
New vulnerability	breaking	Path traversal found
Vulnerability fixed	info	Injection prevented
Permission change	warning	More restrictive

Performance Drift

Bellwether tracks tool latency and detects performance regressions:

Change	Severity	Example
P50 latency increased	warning	45ms → 78ms (+73%)
Success rate dropped	warning	98% → 85%
Timeout frequency	warning	More frequent timeouts
Confidence degraded	info	high → medium (more variability)

Configure the regression threshold:

check:
  performanceThreshold: 10  # Flag if P50 latency increases by >10%

This setting is configuration-only and applies to all check runs.
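
The arithmetic behind a percentage threshold can be sketched as follows. This assumes the regression is measured as the relative increase of the current P50 over the baseline P50; the helper names are hypothetical and the exact formula Bellwether uses may differ.

```shell
# Hypothetical helper: percentage change from a baseline P50 to the current P50.
p50_change_percent() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.0f\n", (cur - base) / base * 100 }'
}

# Hypothetical helper: succeed (exit 0) when the increase exceeds the threshold.
p50_regressed() {
  awk -v base="$1" -v cur="$2" -v limit="$3" \
    'BEGIN { exit !((cur - base) / base * 100 > limit) }'
}

p50_change_percent 45 78   # prints 73
if p50_regressed 45 78 10; then
  echo "regression: P50 rose by more than 10%"
fi
```

With the table's example of 45ms rising to 78ms, the increase is about 73%, well above a threshold of 10.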

Performance Confidence

Bellwether calculates statistical confidence for performance metrics based on sample count and variability:

Confidence	Criteria	Meaning
High	10+ samples, CV < 0.3	Reliable baseline
Medium	5+ samples, CV < 0.5	Somewhat reliable
Low	Few samples or high variability	Needs more data

Tools with low confidence are flagged in reports, and any regressions detected for them are marked as unreliable.
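
The criteria in the confidence table can be expressed as a small classifier. This is a sketch under the assumption that CV means the coefficient of variation (standard deviation divided by mean) and that the rows are evaluated top to bottom; `confidence_level` is a hypothetical helper.

```shell
# Hypothetical helper: map a sample count and a coefficient of variation
# to the confidence levels listed in the table.
confidence_level() {
  awk -v n="$1" -v cv="$2" 'BEGIN {
    if (n >= 10 && cv < 0.3)     print "high"
    else if (n >= 5 && cv < 0.5) print "medium"
    else                         print "low"
  }'
}

confidence_level 12 0.2   # prints high
confidence_level 6 0.4    # prints medium
confidence_level 3 0.1    # prints low
```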

Error Pattern Drift

Changes in error behavior across runs:

Change	Severity	Example
New error category	warning	VALIDATION errors appearing
Error resolved	info	TIMEOUT errors no longer occur
Error rate increased	warning	NotFound errors up 50%
Root cause changed	info	Different error messages

Response Schema Drift

Changes to response structure:

Change	Severity	Example
Fields added	info	New `metadata` field in response
Fields removed	warning	`timestamp` field removed
Type changed	breaking	`count` changed from number to string
Schema became unstable	warning	Response structure varies between calls
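
The added and removed rows of this table amount to a set difference over field names. The sketch below illustrates that idea with plain text files holding one field name per line; the file layout and the helper names are hypothetical and do not reflect how Bellwether stores response schemas.

```shell
# Hypothetical helpers: each file lists one response field name per line.
fields_added()   { grep -Fxv -f "$1" "$2"; }   # in current ($2) but not in baseline ($1)
fields_removed() { grep -Fxv -f "$2" "$1"; }   # in baseline ($1) but not in current ($2)

workdir=$(mktemp -d)
printf 'count\ntimestamp\n' > "$workdir/baseline-fields.txt"
printf 'count\nmetadata\n'  > "$workdir/current-fields.txt"

echo "added (info):      $(fields_added   "$workdir/baseline-fields.txt" "$workdir/current-fields.txt")"
echo "removed (warning): $(fields_removed "$workdir/baseline-fields.txt" "$workdir/current-fields.txt")"

rm -rf "$workdir"
```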

Documentation Drift

Changes to documentation quality:

Change	Severity	Example
Score degraded	warning	85 → 65 (B → D)
Score improved	info	70 → 90 (C → A)
New issues	info	Missing parameter descriptions
Issues fixed	info	Descriptions added

Handling Drift

Intentional Changes

When drift is expected (new features, bug fixes, refactoring), you can accept the changes and update the baseline:

Option 1: Accept command (recommended)

# Run check to detect drift
bellwether check

# Review and accept the drift with a reason
bellwether baseline accept --reason "Added new delete_file tool"

# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: added delete_file tool"

Option 2: Accept during check

# Accept drift in a single command
bellwether check --accept-drift --accept-reason "Improved error handling"

# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: improved error handling"

Option 3: Force save baseline

# Run check and review the changes
bellwether check

# Overwrite baseline without acceptance metadata
bellwether baseline save --force

# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: improved error handling"

Acceptance Metadata

When you use baseline accept or --accept-drift, the baseline records:

When the drift was accepted
Who accepted it (if --accepted-by provided)
Why it was accepted (the reason)
What changes were accepted (snapshot of the diff)

This creates an audit trail for intentional changes.

Unintentional Changes

When drift is unexpected (regressions, bugs):

Review the diff output
Identify the root cause
Fix the regression
Re-run check to verify the fix

Exit Codes

Bellwether uses granular exit codes for semantic CI/CD integration:

Code	Meaning	Description
`0`	Clean	No changes detected
`1`	Info	Non-breaking changes (new tools, optional params)
`2`	Warning	Behavioral changes to investigate
`3`	Breaking	Critical changes (tool removed, type changed)
`4`	Error	Runtime error (connection, config)
`5`	Low confidence	Metrics lack confidence (when `check.sampling.failOnLowConfidence` is true)

Bellwether always returns the severity-specific exit code; use your CI to decide which severities should fail a build.
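
One way to make that decision in a shell-based CI step is sketched below. The exit code values come from the table above; the function names and the policy of tolerating info-level drift are assumptions chosen for illustration.

```shell
# Translate a Bellwether exit code into the label used in the table.
bellwether_exit_label() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "info" ;;
    2) echo "warning" ;;
    3) echo "breaking" ;;
    4) echo "error" ;;
    5) echo "low-confidence" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical CI policy: tolerate clean and info results, fail on everything else.
ci_should_fail() {
  [ "$1" -ge 2 ]
}

# Usage in a CI step (commented out so the sketch runs without Bellwether installed).
# The "|| code=$?" form keeps the exit code even when the shell runs with "set -e".
#   code=0
#   bellwether check || code=$?
#   echo "bellwether result: $(bellwether_exit_label "$code")"
#   if ci_should_fail "$code"; then exit "$code"; fi
```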

Configurable Failure Threshold

You can configure which severity level you treat as a CI failure:

baseline:
  severity:
    failOnSeverity: breaking  # Only fail on breaking changes

Or via CLI flag:

# Fail on any drift (including info-level)
bellwether check --fail-on-severity info

# Fail only on warnings or breaking (default)
bellwether check --fail-on-severity warning

# Fail only on breaking changes
bellwether check --fail-on-severity breaking

Best Practices

Run drift detection in CI - Catch changes early
Review drift before merging - Understand what changed
Update baselines intentionally - Don't auto-update
Use appropriate severity - Configure baseline.severity.failOnSeverity and handle exit codes in CI
Commit baselines to git - Track history in version control
