CI/CD Integration

Integrate Bellwether into your CI/CD pipeline for automated behavioral testing of MCP servers.

The easiest way to integrate Bellwether is with our official GitHub Action:

name: MCP Behavioral Testing
on: [push, pull_request]

jobs:
  bellwether:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Bellwether
        uses: dotsetlabs/bellwether/action@v1
        with:
          server-command: 'npx @modelcontextprotocol/server-filesystem'
          server-args: '/tmp'
          baseline-path: './bellwether-baseline.json'
          fail-on-drift: 'true'
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Action Features

  • Automatic setup - Installs Node.js and Bellwether
  • SARIF upload - Results appear in GitHub Security tab
  • Artifact upload - AGENTS.md and baseline files saved
  • All presets supported - ci, docs, security, thorough
  • Custom scenarios - Pass your bellwether-tests.yaml

Action Inputs

| Input | Description | Default |
|---|---|---|
| server-command | Command to start the MCP server | Required |
| server-args | Arguments for the server | '' |
| preset | Interview preset | - |
| quick | Quick mode for PR checks | false |
| baseline-path | Baseline for drift detection | - |
| fail-on-drift | Fail if drift detected | true |
| fail-on-security | Fail on security issues | true |
| output-format | sarif, junit, json, markdown | sarif |
| scenarios-path | Custom test scenarios file | - |
| scenarios-only | Run only custom scenarios | false |

See the full action documentation for all options.


Manual Setup

If you prefer not to use the action, you can run Bellwether directly:

GitHub Actions (Manual)

name: MCP Behavioral Testing
on: [push, pull_request]

jobs:
  bellwether:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Run Bellwether
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npx @dotsetlabs/bellwether interview \
            --preset ci \
            --compare-baseline ./bellwether-baseline.json \
            --fail-on-drift \
            npx your-mcp-server

GitLab CI

bellwether:
  image: node:20
  script:
    - |
      npx @dotsetlabs/bellwether interview \
        --preset ci \
        --compare-baseline ./bellwether-baseline.json \
        --fail-on-drift \
        npx your-mcp-server
  variables:
    OPENAI_API_KEY: $OPENAI_API_KEY

CI Mode

Enable CI mode with --ci for:

  • No colors - Clean output for log parsing
  • No progress bars - Streaming-friendly output
  • Machine-readable errors - Structured error messages
  • Proper exit codes - Pipeline gates

Exit Codes

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Pipeline passes |
| 1 | Drift or security issues | Pipeline fails |
| 2 | Interview error | Pipeline fails |
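
To make the exit codes in the table above easier to read in CI logs, a wrapper can translate them into a short message before failing the job. The following is a minimal bash sketch; the function name `describe_exit_code` and the message wording are illustrative and not part of Bellwether.

```shell
# Translate a Bellwether exit code into a short message for CI logs.
# The codes and meanings come from the exit code table; the function
# name and message wording are hypothetical.
describe_exit_code() {
  case "$1" in
    0) echo "success: pipeline passes" ;;
    1) echo "drift or security issues: pipeline fails" ;;
    2) echo "interview error: pipeline fails" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Example use in a CI step (commented out so the snippet has no side effects):
#   status=0
#   npx @dotsetlabs/bellwether interview --preset ci npx your-server || status=$?
#   describe_exit_code "$status"
#   exit "$status"
```

Re-exiting with the original status keeps the pipeline gate intact while the log line indicates whether to look at drift or at the interview setup.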

Workflow Patterns

PR Checks

Fast checks on every pull request using the ci preset (~$0.01/run):

name: PR Check
on: pull_request

jobs:
  bellwether:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Quick Behavioral Check
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npx @dotsetlabs/bellwether interview \
            --preset ci \
            --compare-baseline ./bellwether-baseline.json \
            --fail-on-drift \
            npx your-server

Nightly Full Tests

Comprehensive testing on a schedule using the thorough preset:

name: Nightly Tests
on:
  schedule:
    - cron: '0 0 * * *'

jobs:
  bellwether:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Full Interview
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npx @dotsetlabs/bellwether interview \
            --preset thorough \
            --compare-baseline ./bellwether-baseline.json \
            --fail-on-drift \
            npx your-server
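
If the PR check and the nightly run live in a single workflow, the preset could be chosen from the triggering event instead of being hard-coded. The sketch below assumes GitHub Actions, where `GITHUB_EVENT_NAME` is set to values such as `pull_request` or `schedule`; the helper name `preset_for_event` is hypothetical.

```shell
# Pick a Bellwether preset from the GitHub Actions event name.
# Mapping follows the patterns above: scheduled runs use "thorough",
# everything else uses the cheaper "ci" preset.
preset_for_event() {
  case "$1" in
    schedule) echo "thorough" ;;  # nightly, comprehensive run
    *)        echo "ci" ;;        # pull requests, pushes, anything else
  esac
}

# Example use in a run step (commented out so the snippet has no side effects):
#   npx @dotsetlabs/bellwether interview \
#     --preset "$(preset_for_event "$GITHUB_EVENT_NAME")" \
#     --compare-baseline ./bellwether-baseline.json \
#     --fail-on-drift \
#     npx your-server
```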

Security Scanning

Security-focused pipeline using the security preset:

name: Security Scan
on: pull_request

jobs:
  security:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by upload-sarif to publish results
    steps:
      - uses: actions/checkout@v4

      - name: Security Audit
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npx @dotsetlabs/bellwether interview \
            --preset security \
            --fail-on-security \
            --output-format sarif \
            -o ./security \
            npx your-server

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: ./security/bellwether.sarif

Baseline Management

Storing Baselines

Commit baselines to version control:

# Create baseline
bellwether interview --save-baseline npx your-server

# Commit
git add bellwether-baseline.json
git commit -m "Add behavioral baseline"
git push
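
On a fresh repository the baseline file does not exist yet, so a pipeline that always passes `--compare-baseline` has nothing to compare against. One possible approach is to choose the flags from whether the file is present. This is a sketch only: the helper name `baseline_flags` is hypothetical, and it assumes `--save-baseline` writes to `bellwether-baseline.json` as in the commands above.

```shell
# Print the interview flags to use, depending on whether a baseline exists.
# Uses only flags shown in this guide; the default path matches the examples.
baseline_flags() {
  local path="${1:-./bellwether-baseline.json}"
  if [ -f "$path" ]; then
    # A baseline is committed: compare against it and gate on drift.
    echo "--compare-baseline $path --fail-on-drift"
  else
    # No baseline yet: create one instead of comparing.
    echo "--save-baseline"
  fi
}
```

In a step, the flags could be spliced in as `npx @dotsetlabs/bellwether interview $(baseline_flags) npx your-server`. The unquoted expansion relies on word splitting, so it assumes a baseline path without spaces.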

Updating Baselines

Update when intentional changes are made:

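name: Update Baseline
on:
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # required so the job can push the updated baseline
    steps:
      - uses: actions/checkout@v4

      - name: Generate New Baseline
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npx @dotsetlabs/bellwether interview \
            --save-baseline \
            npx your-server

      - name: Commit Baseline
        run: |
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add bellwether-baseline.json
          # Commit and push only when the baseline actually changed;
          # a plain "git commit" exits non-zero when nothing is staged.
          if ! git diff --cached --quiet; then
            git commit -m "Update behavioral baseline"
            git push
          fi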

Cloud Integration

Sync with Bellwether Cloud for history and drift tracking:

- name: Upload to Cloud
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    BELLWETHER_SESSION: ${{ secrets.BELLWETHER_SESSION }}
  run: |
    npx @dotsetlabs/bellwether interview --save-baseline npx your-server
    npx @dotsetlabs/bellwether upload --ci --fail-on-drift

Output Formats

SARIF for GitHub

- name: Generate SARIF
  run: |
    npx @dotsetlabs/bellwether interview \
      --output-format sarif \
      -o ./results \
      npx your-server

- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: ./results/bellwether.sarif

JUnit for GitLab

bellwether:
  script:
    - npx @dotsetlabs/bellwether interview --output-format junit -o ./results npx your-server
  artifacts:
    reports:
      junit: results/junit.xml

Environment Variables

| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | Yes* |
| ANTHROPIC_API_KEY | Anthropic API key | Yes* |
| BELLWETHER_SESSION | Cloud session token | For cloud |
| CI | Auto-detected in CI | - |

*One LLM provider required
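
Because one of the two provider keys must be present, a pipeline can check for them before starting the interview and fail early with a clear message. The following bash sketch uses the variable names from the table above; the function name `has_llm_provider_key` is hypothetical.

```shell
# Succeed when at least one LLM provider key is set and non-empty.
# Variable names are the ones listed in the environment variable table.
has_llm_provider_key() {
  [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${ANTHROPIC_API_KEY:-}" ]
}

# Example use at the top of a CI script (commented out to avoid side effects):
#   if ! has_llm_provider_key; then
#     echo "Set OPENAI_API_KEY or ANTHROPIC_API_KEY before running Bellwether" >&2
#     exit 2
#   fi
```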

Cost Optimization

Use the CI Preset

The ci preset is optimized for fast, cheap CI runs:

- run: |
    npx @dotsetlabs/bellwether interview \
      --preset ci \
      npx your-server

Cost by Preset

| Preset | Approx. Cost | Use Case |
|---|---|---|
| ci | ~$0.01 | PR checks, fast validation |
| docs | ~$0.02 | Documentation generation |
| security | ~$0.05 | Security-focused testing |
| thorough | ~$0.10 | Comprehensive nightly tests |
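
The per-run figures above can be turned into a rough budget by multiplying by the expected number of runs. The sketch below hard-codes the approximate costs from the table, so the result is only an estimate; the function name `estimate_cost` is hypothetical.

```shell
# Estimate total cost in dollars for a number of runs of one preset.
# Per-run costs are the approximate values from the cost table.
estimate_cost() {
  local preset="$1" runs="$2" per_run
  case "$preset" in
    ci)       per_run="0.01" ;;
    docs)     per_run="0.02" ;;
    security) per_run="0.05" ;;
    thorough) per_run="0.10" ;;
    *) echo "unknown preset: $preset" >&2; return 1 ;;
  esac
  # awk handles the decimal arithmetic; LC_ALL=C keeps "." as the separator.
  LC_ALL=C awk -v c="$per_run" -v n="$runs" 'BEGIN { printf "%.2f\n", c * n }'
}
```

For example, `estimate_cost ci 200` prints `2.00`, and 30 nightly runs of the thorough preset (`estimate_cost thorough 30`) print `3.00`.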

Verification Badges

Display your server's verification status in your README:

Add Badge After CI Run

- name: Get Badge
  env:
    BELLWETHER_SESSION: ${{ secrets.BELLWETHER_SESSION }}
  run: |
    npx @dotsetlabs/bellwether badge --markdown >> $GITHUB_STEP_SUMMARY

Update README Automatically

name: Update Badge
on:
  workflow_run:
    workflows: ["MCP Behavioral Testing"]
    types: [completed]

jobs:
  badge:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - uses: actions/checkout@v4

      - name: Update Badge Status
        env:
          BELLWETHER_SESSION: ${{ secrets.BELLWETHER_SESSION }}
        run: |
          BADGE_URL=$(npx @dotsetlabs/bellwether badge --url)
          echo "Badge URL: $BADGE_URL"

Badge status reflects:

  • Verified (green): Server has been tested
  • Stable (green): No behavioral drift between versions
  • Drift detected (yellow): Behavioral changes found
  • Breaking changes (red): Significant breaking changes

Troubleshooting

Exit Code 2

Exit code 2 means the interview itself failed; it does not indicate drift. To diagnose:

  • Check that the API key is set
  • Verify that the server starts correctly
  • Check network connectivity

Timeout Errors

Increase timeout for slow servers:

- run: |
    npx @dotsetlabs/bellwether interview \
      --timeout 120000 \
      npx slow-server

Debug Logging

- run: |
    npx @dotsetlabs/bellwether interview \
      --ci \
      --log-level debug \
      --log-file ./bellwether-debug.log \
      npx your-server

- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: debug-logs
    path: ./bellwether-debug.log

See Also