Drift Detection

Drift detection identifies when MCP server behavior changes between versions, helping catch breaking changes before they reach production.

What Is Drift?

Drift occurs when your MCP server's behavior differs from its documented baseline. This can be:

  • Intentional - New features, bug fixes, refactoring
  • Unintentional - Regressions, breaking changes, bugs

How It Works

   Baseline (v1)              Current Behavior
        |                            |
        |                            |
   [read_file]                  [read_file]
   - Returns UTF-8              - Returns UTF-8
   - Max 10MB                   - Max 50MB                  <-- CHANGED
   - ENOENT on missing          - Different error message   <-- CHANGED
        |                            |
        v                            v
                 Drift Detection
                        |
                        v
               Changes Detected:
               - Max size: 10MB -> 50MB (warning)
               - Error message changed (info)
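
The comparison in the diagram can be illustrated with a minimal sketch. It is not Bellwether's implementation or its baseline format: the `key=value` property lines, the key names, and the `diff_properties` function are all hypothetical, chosen only to show a field-by-field comparison of a baseline against current behavior.

```shell
# Sketch only: compare two sets of hypothetical "key=value" properties and
# print every key whose value differs from the baseline.
# Assumes values contain no "=" and no line equals "---".
diff_properties() {
  # $1 = baseline properties, $2 = current properties (newline-separated)
  { printf '%s\n' "$1"; echo "---"; printf '%s\n' "$2"; } | awk -F '=' '
    $0 == "---" { second = 1; next }        # separator between the two sets
    !second     { base[$1] = $2; next }     # first set: remember baseline values
    # second set: report keys whose value changed (forced string comparison)
    ($1 in base) && (base[$1] "") != ($2 "") {
      printf "%s: %s -> %s\n", $1, base[$1], $2
    }
  '
}

baseline='encoding=UTF-8
max_size=10MB
missing_file_error=ENOENT'

current='encoding=UTF-8
max_size=50MB
missing_file_error=other'

diff_properties "$baseline" "$current"
# prints:
#   max_size: 10MB -> 50MB
#   missing_file_error: ENOENT -> other
```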

Determinism and Reliability

Drift detection in bellwether check is deterministic and does not use an LLM.

Schema and metadata changes are compared deterministically:

  • Tool added/removed
  • Parameter added/removed/renamed
  • Type changes
  • Required status changes
  • Description changes
  • Tool annotation changes (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
  • Entity title changes (tool, prompt, resource, and resource template titles)
  • Output schema changes
  • Execution/task support changes
  • Server instruction changes
  • Prompt added/removed/modified
  • Resource added/removed/modified
  • Resource template changes
  • Performance regression (P50/P95 latency, success rate)
  • Security finding deltas (when check.security.enabled is true)
  • Error trend changes
  • Response schema evolution changes
  • Documentation score changes

These detections are 100% reliable and consistent across runs.

Comparisons are protocol-version-aware: version-specific fields (annotations, titles, output schemas, etc.) are only compared when both baselines support the relevant MCP protocol version.
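
To see why a check such as "tool added/removed" gives the same answer on every run, consider a plain set comparison of tool names. The sketch below is an illustration under stated assumptions, not Bellwether's code: the `diff_tools` function and the newline-separated name lists are hypothetical stand-ins for whatever the baseline actually stores.

```shell
# Sketch only: report tool names that were removed from or added to a
# hypothetical newline-separated list. Same inputs always give the same output.
diff_tools() {
  # $1 = baseline tool names, $2 = current tool names
  # -F fixed strings, -x whole-line match, -v keep lines NOT in the other list
  printf '%s\n' "$1" | grep -Fxv -e "$2" | sed 's/^/removed: /'
  printf '%s\n' "$2" | grep -Fxv -e "$1" | sed 's/^/added: /'
}

baseline_tools='read_file
write_file
legacy_read'

current_tools='read_file
write_file
delete_file'

diff_tools "$baseline_tools" "$current_tools"
# prints:
#   removed: legacy_read
#   added: delete_file
```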

Achieving 100% Determinism

For CI/CD pipelines requiring deterministic results, use bellwether check:

# Initialize config (if not already done)
bellwether init npx your-server

# Run check (free, deterministic, no LLM)
bellwether check --fail-on-drift

Configure baseline path in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true

Check mode:

  • No LLM calls = no non-determinism
  • Consistent pass/fail results every time
  • Zero API costs
  • Fast execution

Optionally add custom scenarios for stricter deterministic coverage:

# bellwether.yaml - with custom scenarios
scenarios:
  path: "./bellwether-tests.yaml"
  only: true

Recommendations by Use Case

| Use Case | Recommended Mode | Why |
|---|---|---|
| CI/CD deployment gates | bellwether check + baseline comparison | Deterministic, enforceable exit codes |
| PR review checks | bellwether check --fail-on-severity warning | Catches meaningful drift early |
| Initial documentation | bellwether explore | Rich behavioral docs (AGENTS.md) |
| Compliance environments | bellwether check (+ optional scenarios.only: true) | Auditable and reproducible |

Drift Severity Levels

| Level | Description | Examples | CI Behavior |
|---|---|---|---|
| breaking | Schema or critical behavior changes | Tool removed, required param added | Always fails |
| warning | Behavioral changes to investigate | Error messages, limits, side effects | Exit code 2 (handle in CI as desired) |
| info | Documentation-only changes | Wording improvements | Exit code 1 (handle in CI as desired) |
| none | No changes detected | - | Pass |

Using Drift Detection

Local Development

# Initialize config (first time only)
bellwether init npx your-server

# Run check and save initial baseline
bellwether check
bellwether baseline save

# Make changes to server...

# Re-run check (uses baseline from config)
bellwether check

CI/CD Pipeline

Configure baseline comparison in bellwether.yaml:

baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true

# CI command
bellwether check --fail-on-drift

Check Mode (100% Deterministic)

Check mode (bellwether check) provides deterministic drift detection without any LLM involvement:

bellwether check --fail-on-drift

In check mode:

  • No LLM calls required
  • Results are reproducible across runs
  • Schema, capability, performance, security, and report-quality deltas are all compared deterministically
  • Free and fast

Use check mode for:

  • CI/CD deployment gates requiring determinism
  • Compliance environments with audit requirements
  • Detecting drift across contract, reliability, and security signals

Understanding Drift Output

Drift Detection Results
=======================

BREAKING (1):
- Tool "legacy_read" was removed

WARNING (2):
- read_file: Maximum file size changed from 10MB to 50MB
- write_file: Error message format changed

INFO (1):
- read_file: Documentation clarified for binary files

Summary: 1 breaking, 2 warnings, 1 info
Exit code: 3 (breaking changes)
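
The exit code in the sample output follows the most severe level present in the summary. The sketch below illustrates that rule as it reads from this page; `exit_code_for_counts` is a hypothetical helper, not part of the Bellwether CLI, and the assumption that the highest severity always wins is inferred from the example.

```shell
# Sketch only: derive the drift exit code from per-severity counts,
# assuming the most severe non-empty level determines the code.
exit_code_for_counts() {
  # $1 = breaking count, $2 = warning count, $3 = info count
  if [ "$1" -gt 0 ]; then
    echo 3
  elif [ "$2" -gt 0 ]; then
    echo 2
  elif [ "$3" -gt 0 ]; then
    echo 1
  else
    echo 0
  fi
}

# "Summary: 1 breaking, 2 warnings, 1 info"
exit_code_for_counts 1 2 1   # prints 3
```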

Drift Categories

Schema Drift

Changes to tool definitions:

| Change | Severity | Example |
|---|---|---|
| Tool added | info | New delete_file tool |
| Tool removed | breaking | legacy_read removed |
| Required param added | breaking | path now required |
| Optional param added | info | New encoding option |
| Type changed | breaking | limit string -> number |

Behavioral Drift

Changes to how tools behave:

| Change | Severity | Example |
|---|---|---|
| Return value format | warning | Date format changed |
| Error handling | warning | New error type |
| Performance | info | Faster response |
| Limits | warning | Max size changed |
| Side effects | warning | Now creates parent dirs |

Security Drift

Changes affecting security:

| Change | Severity | Example |
|---|---|---|
| New vulnerability | breaking | Path traversal found |
| Vulnerability fixed | info | Injection prevented |
| Permission change | warning | More restrictive |

Performance Drift

Bellwether tracks tool latency and detects performance regressions:

| Change | Severity | Example |
|---|---|---|
| P50 latency increased | warning | 45ms → 78ms (+73%) |
| Success rate dropped | warning | 98% → 85% |
| Timeout frequency | warning | More frequent timeouts |
| Confidence degraded | info | high → medium (more variability) |

Configure the regression threshold:

check:
  performanceThreshold: 10 # Flag if P50 latency increases by >10%

This setting is configuration-only and applies to all check runs.
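
As a worked example of the threshold, the sketch below computes the percentage increase in P50 latency and applies a strict "greater than" test. It only illustrates the arithmetic under stated assumptions: `p50_increase_pct` and `is_regression` are hypothetical helpers, latencies are whole milliseconds, and the baseline is non-zero. How Bellwether computes the comparison internally is not described here.

```shell
# Sketch only: percentage increase of P50 latency (integer, truncated).
p50_increase_pct() {
  # $1 = baseline P50 in ms (non-zero), $2 = current P50 in ms
  echo $(( ($2 - $1) * 100 / $1 ))
}

# Succeeds when the increase is strictly greater than the threshold percent.
# Cross-multiplies instead of dividing so truncation cannot hide a regression.
is_regression() {
  # $1 = baseline ms, $2 = current ms, $3 = threshold percent
  [ $(( ($2 - $1) * 100 )) -gt $(( $3 * $1 )) ]
}

p50_increase_pct 45 78        # prints 73, matching the "+73%" row above
if is_regression 45 78 10; then
  echo "regression"           # printed: 73% exceeds the 10% threshold
fi
```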

Performance Confidence

Bellwether calculates statistical confidence for performance metrics based on sample count and variability:

| Confidence | Criteria | Meaning |
|---|---|---|
| High | 10+ samples, CV < 0.3 | Reliable baseline |
| Medium | 5+ samples, CV < 0.5 | Somewhat reliable |
| Low | Few samples or high variability | Needs more data |

Tools with low confidence are flagged in reports, and any regressions detected for them are marked as unreliable.
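
The criteria in the table can be sketched as follows, assuming CV means the coefficient of variation (standard deviation divided by mean). The `confidence_level` function is hypothetical, and the use of population standard deviation is an assumption; this page does not say which estimator Bellwether uses.

```shell
# Sketch only: classify latency samples using the thresholds from the table.
# Assumes CV = population standard deviation / mean.
confidence_level() {
  # Arguments: latency samples in ms, one per argument
  printf '%s\n' "$@" | awk '
    { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0 || sum == 0) { print "low"; exit }   # no usable data
      mean = sum / n
      var = sumsq / n - mean * mean                   # population variance
      if (var < 0) var = 0                            # guard against rounding
      cv = sqrt(var) / mean
      if (n >= 10 && cv < 0.3)     print "high"
      else if (n >= 5 && cv < 0.5) print "medium"
      else                         print "low"
    }'
}

confidence_level 40 41 42 43 44 45 46 47 48 49   # prints high
confidence_level 40 42 44 46 48                  # prints medium (too few samples for high)
confidence_level 40 42 44                        # prints low
```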

Error Pattern Drift

Changes in error behavior across runs:

| Change | Severity | Example |
|---|---|---|
| New error category | warning | VALIDATION errors appearing |
| Error resolved | info | TIMEOUT errors no longer occur |
| Error rate increased | warning | NotFound errors up 50% |
| Root cause changed | info | Different error messages |

Response Schema Drift

Changes to response structure:

| Change | Severity | Example |
|---|---|---|
| Fields added | info | New metadata field in response |
| Fields removed | warning | timestamp field removed |
| Type changed | breaking | count changed from number to string |
| Schema became unstable | warning | Response structure varies between calls |

Documentation Drift

Changes to documentation quality:

| Change | Severity | Example |
|---|---|---|
| Score degraded | warning | 85 → 65 (B → D) |
| Score improved | info | 70 → 90 (C → A) |
| New issues | info | Missing parameter descriptions |
| Issues fixed | info | Descriptions added |

Handling Drift

Intentional Changes

When drift is expected (new features, bug fixes, refactoring), accept the changes and update the baseline using one of the options below.

Option 1: Accept with baseline accept

# Run check to detect drift
bellwether check

# Review and accept the drift with a reason
bellwether baseline accept --reason "Added new delete_file tool"

# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: added delete_file tool"

Option 2: Accept during check

# Accept drift in a single command
bellwether check --accept-drift --accept-reason "Improved error handling"

# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: improved error handling"

Option 3: Force save baseline

# Run check and review the changes
bellwether check

# Overwrite baseline without acceptance metadata
bellwether baseline save --force

# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: improved error handling"

Acceptance Metadata

When you use baseline accept or --accept-drift, the baseline records:

  • When the drift was accepted
  • Who accepted it (if --accepted-by provided)
  • Why it was accepted (the reason)
  • What changes were accepted (snapshot of the diff)

This creates an audit trail for intentional changes.

Unintentional Changes

When drift is unexpected (regressions, bugs):

  1. Review the diff output
  2. Identify the root cause
  3. Fix the regression
  4. Re-run check to verify the fix

Exit Codes

Bellwether uses granular exit codes for semantic CI/CD integration:

| Code | Meaning | Description |
|---|---|---|
| 0 | Clean | No changes detected |
| 1 | Info | Non-breaking changes (new tools, optional params) |
| 2 | Warning | Behavioral changes to investigate |
| 3 | Breaking | Critical changes (tool removed, type changed) |
| 4 | Error | Runtime error (connection, config) |
| 5 | Low confidence | Metrics lack confidence (when check.sampling.failOnLowConfidence is true) |

Bellwether always returns the severity-specific exit code; use your CI to decide which severities should fail a build.
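
One way to act on these codes in a CI script is sketched below. The exit code meanings come from the table above; the helper names `severity_for_exit_code` and `should_fail_build` are hypothetical, and treating codes 4, 5, and any unknown code as failures is a policy assumption of this sketch, not documented Bellwether behavior.

```shell
# Sketch only: translate an exit code from `bellwether check` into a label.
severity_for_exit_code() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "info" ;;
    2) echo "warning" ;;
    3) echo "breaking" ;;
    4) echo "error" ;;
    5) echo "low-confidence" ;;
    *) echo "unknown" ;;
  esac
}

# Succeeds (returns 0) when the build should fail.
# $1 = exit code from `bellwether check`
# $2 = lowest drift code that fails the build (1=info, 2=warning, 3=breaking)
should_fail_build() {
  case "$1" in
    0)     return 1 ;;               # clean: never fail
    1|2|3) [ "$1" -ge "$2" ] ;;      # drift: fail at or above the threshold
    *)     return 0 ;;               # assumption: errors and low confidence fail
  esac
}

# Example: a warning (2) with a "breaking only" policy (3) does not fail.
if should_fail_build 2 3; then
  echo "fail: $(severity_for_exit_code 2)"
else
  echo "pass: $(severity_for_exit_code 2)"   # prints "pass: warning"
fi
```

In a pipeline, the first argument would be the captured status of the check, for example by running `bellwether check` and then reading `$?` into a variable before calling `should_fail_build`.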

Configurable Failure Threshold

You can configure which severity level you treat as a CI failure:

baseline:
  severity:
    failOnSeverity: breaking # Only fail on breaking changes

Or via CLI flag:

# Fail on any drift (including info-level)
bellwether check --fail-on-severity info

# Fail only on warnings or breaking (default)
bellwether check --fail-on-severity warning

# Fail only on breaking changes
bellwether check --fail-on-severity breaking

Best Practices

  1. Run drift detection in CI - Catch changes early
  2. Review drift before merging - Understand what changed
  3. Update baselines intentionally - Don't auto-update
  4. Use appropriate severity - Configure baseline.severity.failOnSeverity and handle exit codes in CI
  5. Commit baselines to git - Track history in version control
