Drift Detection
Drift detection identifies when MCP server behavior changes between versions, helping catch breaking changes before they reach production.
What Is Drift?
Drift occurs when your MCP server's behavior differs from its documented baseline. This can be:
- Intentional - New features, bug fixes, refactoring
- Unintentional - Regressions, breaking changes, bugs
How It Works
Baseline (v1)              Current Behavior
      |                          |
      |                          |
 [read_file]                [read_file]
 - Returns UTF-8            - Returns UTF-8
 - Max 10MB                 - Max 50MB                  <-- CHANGED
 - ENOENT on missing        - Different error message   <-- CHANGED
      |                          |
      v                          v
              Drift Detection
                     |
                     v
              Changes Detected:
              - Max size: 10MB -> 50MB (warning)
              - Error message changed (info)
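As a rough illustration of the comparison pictured above, the following Python sketch diffs two snapshots of one tool's observed behavior. The dictionary keys and the replacement error message are hypothetical stand-ins, not Bellwether's actual baseline format.

```python
# Hypothetical, simplified snapshots of what was observed for read_file.
baseline = {"encoding": "UTF-8", "max_size_mb": 10, "missing_file_error": "ENOENT"}
current = {"encoding": "UTF-8", "max_size_mb": 50, "missing_file_error": "File not found"}


def diff_behavior(old: dict, new: dict) -> list[str]:
    """Return one line per key whose value differs, in sorted key order."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append(f"{key}: {before!r} -> {after!r}")
    return changes


changes = diff_behavior(baseline, current)
for line in changes:
    print(line)
```

Iterating over sorted keys keeps the output order stable, which is part of what makes a comparison like this repeatable.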
Determinism and Reliability
Drift detection in bellwether check is deterministic and does not use an LLM.
Schema and metadata changes are compared deterministically:
- Tool added/removed
- Parameter added/removed/renamed
- Type changes
- Required status changes
- Description changes
- Tool annotation changes (readOnlyHint, destructiveHint, idempotentHint, openWorldHint)
- Entity title changes (tool, prompt, resource, and resource template titles)
- Output schema changes
- Execution/task support changes
- Server instruction changes
- Prompt added/removed/modified
- Resource added/removed/modified
- Resource template changes
- Performance regression (P50/P95 latency, success rate)
- Security finding deltas (when check.security.enabled is true)
- Error trend changes
- Response schema evolution changes
- Documentation score changes
Given the same baseline and the same server responses, these comparisons produce the same result on every run.
Comparisons are protocol-version-aware: version-specific fields (annotations, titles, output schemas, etc.) are only compared when both baselines support the relevant MCP protocol version.
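One way to picture the version-aware rule is as a filter applied before any field is diffed. The sketch below is an assumption about how such a filter could look; the field-to-version table and the function name are illustrative and not taken from Bellwether's source.

```python
# Assumed mapping from a field to the first protocol version that defines it.
# The version strings are placeholders for illustration only.
FIELD_MIN_VERSION = {
    "annotations": "2025-03-26",
    "title": "2025-06-18",
    "outputSchema": "2025-06-18",
}


def comparable_fields(baseline_version: str, current_version: str) -> list[str]:
    """Fields that both baselines can express, so a diff on them is meaningful."""
    # Date-style version strings sort chronologically as plain strings.
    oldest = min(baseline_version, current_version)
    return sorted(
        name for name, introduced in FIELD_MIN_VERSION.items() if oldest >= introduced
    )


print(comparable_fields("2025-03-26", "2025-06-18"))
```

Fields outside the returned list would simply be skipped, so an older baseline does not report every newer field as "added".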
Achieving 100% Determinism
For CI/CD pipelines requiring deterministic results, use bellwether check:
# Initialize config (if not already done)
bellwether init npx your-server
# Run check (free, deterministic, no LLM)
bellwether check --fail-on-drift
Configure baseline path in bellwether.yaml:
baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true
Check mode:
- No LLM calls = no non-determinism
- Consistent pass/fail results every time
- Zero API costs
- Fast execution
Optionally add custom scenarios for stricter deterministic coverage:
# bellwether.yaml - with custom scenarios
scenarios:
  path: "./bellwether-tests.yaml"
  only: true
Recommendations by Use Case
| Use Case | Recommended Mode | Why |
|---|---|---|
| CI/CD deployment gates | bellwether check + baseline comparison | Deterministic, enforceable exit codes |
| PR review checks | bellwether check --fail-on-severity warning | Catches meaningful drift early |
| Initial documentation | bellwether explore | Rich behavioral docs (AGENTS.md) |
| Compliance environments | bellwether check (+ optional scenarios.only: true) | Auditable and reproducible |
Drift Severity Levels
| Level | Description | Examples | CI Behavior |
|---|---|---|---|
| breaking | Schema or critical behavior changes | Tool removed, required param added | Always fails |
| warning | Behavioral changes to investigate | Error messages, limits, side effects | Exit code 2 (handle in CI as desired) |
| info | Documentation-only changes | Wording improvements | Exit code 1 (handle in CI as desired) |
| none | No changes detected | - | Pass |
Using Drift Detection
Local Development
# Initialize config (first time only)
bellwether init npx your-server
# Run check and save initial baseline
bellwether check
bellwether baseline save
# Make changes to server...
# Re-run check (uses baseline from config)
bellwether check
CI/CD Pipeline
Configure baseline comparison in bellwether.yaml:
baseline:
  comparePath: "./bellwether-baseline.json"
  failOnDrift: true
# CI command
bellwether check --fail-on-drift
Check Mode (100% Deterministic)
Check mode (bellwether check) provides deterministic drift detection without any LLM involvement:
bellwether check --fail-on-drift
In check mode:
- No LLM calls required
- Results are reproducible across runs
- Schema, capability, performance, security, and report-quality deltas are all compared deterministically
- Free and fast
Use check mode for:
- CI/CD deployment gates requiring determinism
- Compliance environments with audit requirements
- Detecting drift across contract, reliability, and security signals
Understanding Drift Output
Drift Detection Results
=======================
BREAKING (1):
- Tool "legacy_read" was removed
WARNING (2):
- read_file: Maximum file size changed from 10MB to 50MB
- write_file: Error message format changed
INFO (1):
- read_file: Documentation clarified for binary files
Summary: 1 breaking, 2 warnings, 1 info
Exit code: 3 (breaking changes)
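The summary line and exit code in this output follow from the most severe finding. A minimal sketch, assuming findings are reduced to a list of severity names (the function and constant names are hypothetical):

```python
from collections import Counter

# Exit code per severity, matching the documented exit codes for drift results.
EXIT_CODES = {"none": 0, "info": 1, "warning": 2, "breaking": 3}


def summarize(severities: list[str]) -> tuple[str, int]:
    """Build the summary line and pick the exit code of the worst finding."""
    counts = Counter(severities)
    worst = max(severities, key=lambda s: EXIT_CODES[s], default="none")
    summary = (
        f"{counts['breaking']} breaking, "
        f"{counts['warning']} warnings, {counts['info']} info"
    )
    return summary, EXIT_CODES[worst]


summary, exit_code = summarize(["breaking", "warning", "warning", "info"])
print(f"Summary: {summary}")
print(f"Exit code: {exit_code}")
```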
Drift Categories
Schema Drift
Changes to tool definitions:
| Change | Severity | Example |
|---|---|---|
| Tool added | info | New delete_file tool |
| Tool removed | breaking | legacy_read removed |
| Required param added | breaking | path now required |
| Optional param added | info | New encoding option |
| Type changed | breaking | limit string -> number |
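The rows above can be read as a small rule set. The sketch below applies those rules to two tool listings; the data shape (tool name to parameter name to `type` and `required`) is an assumption made for the example, and only the changes listed in the table are classified.

```python
Params = dict[str, dict]


def classify_schema_drift(old: dict[str, Params], new: dict[str, Params]) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for the schema changes in the table."""
    findings = []
    for tool in sorted(new.keys() - old.keys()):
        findings.append(("info", f"tool added: {tool}"))
    for tool in sorted(old.keys() - new.keys()):
        findings.append(("breaking", f"tool removed: {tool}"))
    for tool in sorted(old.keys() & new.keys()):
        old_params, new_params = old[tool], new[tool]
        for param in sorted(new_params.keys() - old_params.keys()):
            required = new_params[param]["required"]
            kind = "required" if required else "optional"
            severity = "breaking" if required else "info"
            findings.append((severity, f"{tool}: {kind} param added: {param}"))
        for param in sorted(old_params.keys() & new_params.keys()):
            before, after = old_params[param]["type"], new_params[param]["type"]
            if before != after:
                findings.append(("breaking", f"{tool}: {param} type changed: {before} -> {after}"))
    return findings


old_tools = {
    "legacy_read": {},
    "read_file": {"limit": {"type": "string", "required": False}},
}
new_tools = {
    "delete_file": {},
    "read_file": {
        "limit": {"type": "number", "required": False},
        "encoding": {"type": "string", "required": False},
        "path": {"type": "string", "required": True},
    },
}
findings = classify_schema_drift(old_tools, new_tools)
for severity, message in findings:
    print(f"{severity}: {message}")
```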
Behavioral Drift
Changes to how tools behave:
| Change | Severity | Example |
|---|---|---|
| Return value format | warning | Date format changed |
| Error handling | warning | New error type |
| Performance | info | Faster response |
| Limits | warning | Max size changed |
| Side effects | warning | Now creates parent dirs |
Security Drift
Changes affecting security:
| Change | Severity | Example |
|---|---|---|
| New vulnerability | breaking | Path traversal found |
| Vulnerability fixed | info | Injection prevented |
| Permission change | warning | More restrictive |
Performance Drift
Bellwether tracks tool latency and detects performance regressions:
| Change | Severity | Example |
|---|---|---|
| P50 latency increased | warning | 45ms → 78ms (+73%) |
| Success rate dropped | warning | 98% → 85% |
| Timeout frequency | warning | More frequent timeouts |
| Confidence degraded | info | high → medium (more variability) |
Configure the regression threshold:
check:
  performanceThreshold: 10 # Flag if P50 latency increases by >10%
This setting is configuration-only and applies to all check runs.
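The threshold is a relative increase over the baseline P50. A minimal sketch of that arithmetic, assuming a positive baseline latency (the function name is hypothetical):

```python
def p50_regression(baseline_ms: float, current_ms: float, threshold_pct: float = 10.0) -> tuple[bool, int]:
    """Return (is_regression, rounded percent change) for a P50 latency pair."""
    # Assumes baseline_ms > 0; a zero baseline would need separate handling.
    change_pct = (current_ms - baseline_ms) / baseline_ms * 100
    return change_pct > threshold_pct, round(change_pct)


flagged, pct = p50_regression(45, 78)
print(f"flagged={flagged}, change=+{pct}%")
```

With the table's example of 45ms to 78ms, the change is about +73%, well past a 10% threshold.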
Performance Confidence
Bellwether calculates statistical confidence for performance metrics based on sample count and variability:
| Confidence | Criteria | Meaning |
|---|---|---|
| High | 10+ samples, CV < 0.3 | Reliable baseline |
| Medium | 5+ samples, CV < 0.5 | Somewhat reliable |
| Low | Few samples or high variability | Needs more data |
Tools with low confidence are flagged in reports, and regressions are marked as unreliable.
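CV here is the coefficient of variation, the standard deviation divided by the mean. The sketch below applies the criteria from the table; using the sample standard deviation is an assumption, as is the function name.

```python
from statistics import mean, stdev


def confidence(latencies_ms: list[float]) -> str:
    """Classify a latency sample as high, medium, or low confidence."""
    n = len(latencies_ms)
    if n < 2:
        return "low"  # variability cannot be estimated from a single sample
    # Assumes positive latencies, so the mean is never zero.
    cv = stdev(latencies_ms) / mean(latencies_ms)
    if n >= 10 and cv < 0.3:
        return "high"
    if n >= 5 and cv < 0.5:
        return "medium"
    return "low"


print(confidence([50, 52, 48, 51, 49]))
```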
Error Pattern Drift
Changes in error behavior across runs:
| Change | Severity | Example |
|---|---|---|
| New error category | warning | VALIDATION errors appearing |
| Error resolved | info | TIMEOUT errors no longer occur |
| Error rate increased | warning | NotFound errors up 50% |
| Root cause changed | info | Different error messages |
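To make the first three rows concrete, here is a sketch that compares error counts per category between two runs. It assumes both runs made the same number of calls, so counts stand in for rates, and the input shape is hypothetical.

```python
def error_pattern_drift(old_counts: dict[str, int], new_counts: dict[str, int]) -> list[tuple[str, str]]:
    """Compare per-category error counts and return (severity, message) pairs."""
    findings = []
    for category in sorted(new_counts.keys() - old_counts.keys()):
        findings.append(("warning", f"new error category: {category}"))
    for category in sorted(old_counts.keys() - new_counts.keys()):
        findings.append(("info", f"error resolved: {category}"))
    for category in sorted(old_counts.keys() & new_counts.keys()):
        before, after = old_counts[category], new_counts[category]
        # A category is only listed if it occurred, so before is at least 1.
        if after > before:
            pct = round((after - before) / before * 100)
            findings.append(("warning", f"{category} errors up {pct}%"))
    return findings


findings = error_pattern_drift(
    {"NotFound": 4, "TIMEOUT": 2},
    {"NotFound": 6, "VALIDATION": 1},
)
for severity, message in findings:
    print(f"{severity}: {message}")
```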
Response Schema Drift
Changes to response structure:
| Change | Severity | Example |
|---|---|---|
| Fields added | info | New metadata field in response |
| Fields removed | warning | timestamp field removed |
| Type changed | breaking | count changed from number to string |
| Schema became unstable | warning | Response structure varies between calls |
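A simple way to think about these rows is to infer a field-to-type map from sample responses and then compare the maps. The sketch below does that for top-level fields only; the helper names are hypothetical and Python type names (`int`, `str`) stand in for JSON types.

```python
def infer_schema(response: dict) -> dict[str, str]:
    """Map each top-level field to the name of its Python type."""
    return {key: type(value).__name__ for key, value in response.items()}


def is_unstable(samples: list[dict]) -> bool:
    """True when repeated calls did not all share one response shape."""
    shapes = {tuple(sorted(infer_schema(sample).items())) for sample in samples}
    return len(shapes) > 1


def response_drift(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Classify added, removed, and retyped fields per the table above."""
    findings = []
    for name in sorted(new.keys() - old.keys()):
        findings.append(("info", f"field added: {name}"))
    for name in sorted(old.keys() - new.keys()):
        findings.append(("warning", f"field removed: {name}"))
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            findings.append(("breaking", f"{name}: {old[name]} -> {new[name]}"))
    return findings


old_schema = infer_schema({"count": 3, "timestamp": "2024-01-01T00:00:00Z"})
new_schema = infer_schema({"count": "3", "metadata": {}})
findings = response_drift(old_schema, new_schema)
print(findings)
```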
Documentation Drift
Changes to documentation quality:
| Change | Severity | Example |
|---|---|---|
| Score degraded | warning | 85 → 65 (B → D) |
| Score improved | info | 70 → 90 (C → A) |
| New issues | info | Missing parameter descriptions |
| Issues fixed | info | Descriptions added |
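The letter grades in the examples are consistent with conventional cutoffs (A at 90, B at 80, C at 70, D at 60), though the exact boundaries Bellwether uses are an assumption here. A sketch of the score comparison under that assumption:

```python
# Assumed grade boundaries; chosen to match the examples in the table.
GRADE_CUTOFFS = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))


def grade(score: int) -> str:
    """Convert a 0-100 documentation score to a letter grade."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


def doc_score_drift(old: int, new: int) -> tuple[str, str]:
    """Return (severity, description) for a change in documentation score."""
    if new == old:
        return "none", "unchanged"
    severity = "warning" if new < old else "info"
    return severity, f"{old} -> {new} ({grade(old)} -> {grade(new)})"


print(doc_score_drift(85, 65))
print(doc_score_drift(70, 90))
```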
Handling Drift
Intentional Changes
When drift is expected (new features, bug fixes, refactoring), you can accept the changes and update the baseline:
Option 1: Accept command (recommended)
# Run check to detect drift
bellwether check
# Review and accept the drift with a reason
bellwether baseline accept --reason "Added new delete_file tool"
# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: added delete_file tool"
Option 2: Accept during check
# Accept drift in a single command
bellwether check --accept-drift --accept-reason "Improved error handling"
# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: improved error handling"
Option 3: Force save baseline
# Run check and review the changes
bellwether check
# Overwrite baseline without acceptance metadata
bellwether baseline save --force
# Commit updated baseline
git add .bellwether/bellwether-baseline.json
git commit -m "Update baseline: improved error handling"
Acceptance Metadata
When you use baseline accept or --accept-drift, the baseline records:
- When the drift was accepted
- Who accepted it (if --accepted-by is provided)
- Why it was accepted (the reason)
- What changes were accepted (snapshot of the diff)
This creates an audit trail for intentional changes.
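As a mental model, the acceptance record can be pictured as a small structure holding those four items. The class and field names below are hypothetical and do not reflect the actual layout of the baseline file.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class DriftAcceptance:
    """Illustrative shape of an acceptance record (not the real schema)."""
    reason: str                        # why the drift was accepted
    accepted_diff: List[str]           # snapshot of the accepted changes
    accepted_by: Optional[str] = None  # who accepted it, if provided
    accepted_at: str = field(default_factory=_utc_now)  # when it was accepted


record = DriftAcceptance(
    reason="Added new delete_file tool",
    accepted_diff=["tool added: delete_file"],
)
print(asdict(record))
```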
Unintentional Changes
When drift is unexpected (regressions, bugs):
- Review the diff output
- Identify the root cause
- Fix the regression
- Re-run check to verify the fix
Exit Codes
Bellwether uses granular exit codes for semantic CI/CD integration:
| Code | Meaning | Description |
|---|---|---|
| 0 | Clean | No changes detected |
| 1 | Info | Non-breaking changes (new tools, optional params) |
| 2 | Warning | Behavioral changes to investigate |
| 3 | Breaking | Critical changes (tool removed, type changed) |
| 4 | Error | Runtime error (connection, config) |
| 5 | Low confidence | Metrics lack confidence (when check.sampling.failOnLowConfidence is true) |
Bellwether always returns the severity-specific exit code; use your CI to decide which severities should fail a build.
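A CI step could map these codes to a pass/fail decision with a thin wrapper. The sketch below is one possible approach rather than an official integration: the helper names and the default set of failing codes are assumptions, and `run_check` requires `bellwether` to be on the PATH.

```python
import subprocess

# Meanings of the documented exit codes.
EXIT_MEANINGS = {
    0: "clean",
    1: "info",
    2: "warning",
    3: "breaking",
    4: "error",
    5: "low confidence",
}


def should_fail_build(exit_code: int, fail_on=frozenset({3, 4})) -> bool:
    """Fail on the chosen codes, and on any code this table does not know."""
    return exit_code in fail_on or exit_code not in EXIT_MEANINGS


def run_check() -> int:
    """Run `bellwether check` and return its exit code."""
    return subprocess.run(["bellwether", "check"]).returncode


for code in (0, 2, 3):
    print(code, EXIT_MEANINGS[code], "fail" if should_fail_build(code) else "pass")
```

In a pipeline script this might end with `sys.exit(1 if should_fail_build(run_check()) else 0)`. Treating unknown codes as failures is a conservative design choice, not documented behavior.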
Configurable Failure Threshold
You can configure which severity level you treat as a CI failure:
baseline:
  severity:
    failOnSeverity: breaking # Only fail on breaking changes
Or via CLI flag:
# Fail on any drift (including info-level)
bellwether check --fail-on-severity info
# Fail only on warnings or breaking (default)
bellwether check --fail-on-severity warning
# Fail only on breaking changes
bellwether check --fail-on-severity breaking
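The threshold check itself amounts to comparing severity ranks. A minimal sketch, assuming the ordering info < warning < breaking implied by the flags above (the names are hypothetical):

```python
# Severity ordering assumed from the documented levels.
SEVERITY_RANK = {"none": 0, "info": 1, "warning": 2, "breaking": 3}


def fails(detected: str, fail_on: str = "warning") -> bool:
    """True when the detected severity meets or exceeds the failure threshold."""
    return SEVERITY_RANK[detected] >= SEVERITY_RANK[fail_on]


for detected in ("none", "info", "warning", "breaking"):
    print(detected, fails(detected))
```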
Best Practices
- Run drift detection in CI - Catch changes early
- Review drift before merging - Understand what changed
- Update baselines intentionally - Don't auto-update
- Use appropriate severity - Configure baseline.severity.failOnSeverity and handle exit codes in CI
- Commit baselines to git - Track history in version control
See Also
- Baselines - Creating and managing baselines
- CI/CD Integration - Automated drift checking
- Configuration - Config file drift options
- check - Running drift detection