Output Formats

Bellwether generates output in multiple formats to support different use cases: human-readable documentation, machine-readable reports, and CI integrations.

Available Formats

| Format | File | Use Case |
|--------|------|----------|
| Markdown | `CONTRACT.md` (check) / `AGENTS.md` (explore) | Human-readable documentation |
| JSON | `bellwether-check.json` / `bellwether-explore.json` | Machine-readable data |
| Baseline | Configured by `baseline.path` / `baseline.savePath` | Drift detection snapshots |
| JUnit | (stdout) | CI test reporting (`bellwether check --format junit`) |
| SARIF | (stdout) | GitHub Code Scanning (`bellwether check --format sarif`) |
| Compact | (stdout) | Single-line summary for log aggregation |
| GitHub | (stdout) | GitHub Actions annotations |

Markdown (Default)

Human-readable documentation. Check mode generates CONTRACT.md; explore mode generates AGENTS.md.

bellwether check npx your-server
# Output: CONTRACT.md

Example Output

# @modelcontextprotocol/server-filesystem

> Generated by Bellwether on 2026-01-12

## Overview

A file management server providing read/write access to the local filesystem.

## Tools

### read_file

Reads the contents of a file from the specified path.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| path | string | yes | Path to the file |

**Observed Behavior:**
- Returns file contents as UTF-8 text
- Binary files returned as base64
- Maximum file size: 10MB

**Error Handling:**
- `ENOENT`: File not found
- `EACCES`: Permission denied

**Limitations:**
- Cannot read outside root directory

**Security Considerations:**
- Path traversal normalized within root

## Quick Reference

| Tool | Signature |
|------|-----------|
| read_file | `read_file(path)` |
| write_file | `write_file(path, content)` |

## Performance

| Tool | Calls | Avg | P95 | Max | Errors |
|------|-------|-----|-----|-----|--------|
| read_file | 5 | 45ms | 120ms | 150ms | 0% |
| write_file | 3 | 89ms | 200ms | 250ms | 0% |

### Performance Insights
- All tools performing within acceptable limits

## Prompts

### summarize_file

Generates a summary of a file's contents.

| Argument | Type | Required | Description |
|----------|------|----------|-------------|
| path | string | yes | Path to the file to summarize |
| max_length | number | no | Maximum summary length |

**Expected Output:**
Returns a structured summary prompt message suitable for LLM processing.

**Behavior Notes:**
- Works best with text files under 50KB
- Returns error for binary files

The CONTRACT.md output includes:

Tool Profiles - Behavioral documentation for each tool
Prompt Profiles - Documentation for prompts (if the server exposes any)
Quick Reference - Tool signatures for easy lookup
Performance Metrics - Response times and error rates for each tool

JSON Report

Machine-readable format for programmatic access.

JSON reports are generated when output.format includes json. File names and locations are configurable in bellwether.yaml:

output:
  dir: ".bellwether"
  files:
    checkReport: "bellwether-check.json"
    exploreReport: "bellwether-explore.json"

Each JSON report embeds a $schema pointer for validation. The schemas live in the repo under schemas/ and are published for tooling, as the $schema URL in the example below shows.

Example Output

The example below is abbreviated for readability. Refer to the schema for the full structure.

{
  "$schema": "https://unpkg.com/@dotsetlabs/bellwether/schemas/bellwether-explore.schema.json",
  "version": 1,
  "timestamp": "2026-01-12T10:30:00Z",
  "server": {
    "name": "@modelcontextprotocol/server-filesystem",
    "version": "0.10.1"
  },
  "tools": [
    {
      "name": "read_file",
      "description": "Reads file contents",
      "schema": {
        "type": "object",
        "properties": {
          "path": { "type": "string" }
        },
        "required": ["path"]
      },
      "interview": {
        "questionsAsked": 3,
        "observations": [...],
        "errors": [...],
        "security": [...]
      }
    }
  ],
  "prompts": [
    {
      "name": "summarize_file",
      "description": "Generate a summary of file contents",
      "arguments": [
        { "name": "path", "required": true },
        { "name": "max_length", "required": false }
      ],
      "interview": {
        "questionsAsked": 2,
        "observations": [...],
        "errors": [...]
      }
    }
  ],
  "scenarioResults": [
    {
      "type": "tool",
      "name": "read_file",
      "description": "Read existing file",
      "passed": true,
      "assertions": [...]
    }
  ],
  "cost": {
    "tokens": 1234,
    "estimatedCost": 0.02
  }
}

The JSON report includes:

tools and prompts arrays with their respective interview results
scenarioResults array with custom scenario test results (if scenarios were run)
semanticInferences with inferred parameter types (check mode)
schemaEvolution tracking response schema stability (check mode)
errorAnalysisSummaries with root causes and remediation hints (check mode)
documentationScore with quality grading and suggestions (check mode)
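
As a minimal sketch of programmatic access, the Python snippet below summarizes a parsed report. It relies only on field names visible in the abbreviated example above (`tools`, `prompts`, `scenarioResults`, `passed`). The helper names `summarize_report` and `load_report` are hypothetical and not part of Bellwether, and the schema remains the authority on the full structure.

```python
import json
from pathlib import Path


def summarize_report(report: dict) -> dict:
    """Summarize a parsed Bellwether JSON report (hypothetical helper)."""
    # Use .get() throughout: optional sections may be absent from a report.
    scenarios = report.get("scenarioResults", [])
    return {
        "tools": [tool["name"] for tool in report.get("tools", [])],
        "prompts": [prompt["name"] for prompt in report.get("prompts", [])],
        "scenarios_passed": sum(1 for s in scenarios if s.get("passed")),
        "scenarios_total": len(scenarios),
    }


def load_report(path: str) -> dict:
    """Read a report file from disk; the location depends on output.dir."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With the configuration shown earlier, a call might look like `summarize_report(load_report(".bellwether/bellwether-explore.json"))`.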

Baseline Format

Save a baseline for drift detection:

bellwether check npx your-server
bellwether baseline save
# Output (default): .bellwether/bellwether-baseline.json

The baseline captures the server's behavior at a point in time. Later, compare against it:

bellwether check npx your-server
bellwether baseline compare ./bellwether-baseline.json

Baseline format versions follow the CLI package version; baselines are compatible when their major versions match.
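
The compatibility rule can be sketched in a few lines of Python. This assumes the top-level `version` field of a baseline is a dotted version string such as "2.1.1"; the function names are hypothetical, and Bellwether's own check may handle more cases (for example pre-release tags).

```python
def major_version(version: str) -> int:
    """Return the leading major component of a version string like '2.1.1'."""
    return int(version.lstrip("v").split(".")[0])


def baselines_compatible(a: dict, b: dict) -> bool:
    """True when both baselines report the same major format version."""
    return major_version(a["version"]) == major_version(b["version"])
```

Under this rule a 2.1.1 baseline can be compared with a 2.4.0 baseline, but not with a 3.0.0 one.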

Example Baseline

{
  "version": "2.1.1",
  "metadata": {
    "mode": "check",
    "generatedAt": "2026-01-25T10:30:00Z",
    "cliVersion": "2.1.1",
    "serverCommand": "npx @modelcontextprotocol/server-filesystem /tmp",
    "serverName": "@modelcontextprotocol/server-filesystem",
    "durationMs": 1823,
    "personas": [],
    "model": "none"
  },
  "server": {
    "name": "@modelcontextprotocol/server-filesystem",
    "version": "0.10.1",
    "protocolVersion": "2025-11-25",
    "capabilities": ["tools"]
  },
  "capabilities": {
    "tools": [
      {
        "name": "read_file",
        "description": "Read contents of a file",
        "inputSchema": { "type": "object", "properties": { "path": { "type": "string" } } },
        "schemaHash": "def456..."
      }
    ]
  },
  "interviews": [],
  "toolProfiles": [
    {
      "name": "read_file",
      "description": "Read contents of a file",
      "schemaHash": "def456...",
      "assertions": [],
      "securityNotes": [],
      "limitations": [],
      "behavioralNotes": []
    }
  ],
  "assertions": [],
  "summary": "Filesystem server with 1 tool",
  "hash": "a1b2c3d4e5f6..."
}
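
To illustrate how the per-tool `schemaHash` values could support drift detection, here is a minimal Python sketch that diffs two parsed baselines. It is not Bellwether's comparison logic (use `bellwether baseline compare` for that); it only assumes the `capabilities.tools[].name` and `schemaHash` fields shown in the example above, and the function names are hypothetical.

```python
def schema_hashes(baseline: dict) -> dict:
    """Map each tool name to its schemaHash from capabilities.tools."""
    tools = baseline.get("capabilities", {}).get("tools", [])
    return {tool["name"]: tool.get("schemaHash") for tool in tools}


def diff_tools(old: dict, new: dict) -> dict:
    """Report tools added, removed, or changed between two baselines."""
    before, after = schema_hashes(old), schema_hashes(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        # A tool present in both baselines counts as changed when its hash differs.
        "changed": sorted(
            name
            for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```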

Multiple Formats

Documentation and JSON reports are written based on output.format (docs, json, or both; legacy alias: agents.md).
Control their locations in bellwether.yaml:

output:
  dir: ".bellwether"   # JSON reports
  docsDir: "."         # CONTRACT.md / AGENTS.md

Custom Output Directory

Set output.dir for JSON files and output.docsDir for markdown docs.
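
The following Python sketch shows how the two directory settings map to output paths. It treats the values from the YAML examples in this document as defaults, which is an assumption, and it does not model how Bellwether resolves relative paths. The name `expected_outputs` is hypothetical.

```python
from pathlib import PurePosixPath

# Assumed defaults, taken from the YAML examples in this document.
ASSUMED_DEFAULTS = {
    "dir": ".bellwether",
    "docsDir": ".",
    "files": {
        "checkReport": "bellwether-check.json",
        "exploreReport": "bellwether-explore.json",
    },
}


def expected_outputs(output: dict) -> dict:
    """Map each output kind to the path it would be written to (hypothetical)."""
    json_dir = PurePosixPath(output.get("dir", ASSUMED_DEFAULTS["dir"]))
    docs_dir = PurePosixPath(output.get("docsDir", ASSUMED_DEFAULTS["docsDir"]))
    files = {**ASSUMED_DEFAULTS["files"], **output.get("files", {})}
    return {
        # JSON reports follow output.dir.
        "checkReport": str(json_dir / files["checkReport"]),
        "exploreReport": str(json_dir / files["exploreReport"]),
        # Markdown docs follow output.docsDir.
        "CONTRACT.md": str(docs_dir / "CONTRACT.md"),
        "AGENTS.md": str(docs_dir / "AGENTS.md"),
    }
```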

JUnit Format

Generate JUnit XML (stdout):

bellwether check --format junit > bellwether-results.xml

check --format junit works in both modes:

Check-only run (no baseline.comparePath): includes tool reliability and security findings from the current run.
Baseline comparison run (baseline.comparePath set): includes drift-focused test cases (schema drift, performance regression, security deltas, schema evolution, error trends, and documentation score changes).

JUnit output includes test cases for:

Schema changes (breaking, warning, info)
Performance regressions
Security findings
Documentation quality
Error pattern changes
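
For pipelines that need to inspect the redirected file, the Python sketch below counts test case outcomes. It assumes the conventional JUnit XML layout (`<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children); the exact suite and test case names Bellwether emits are not covered here, and `junit_counts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def junit_counts(xml_text: str) -> dict:
    """Count test case outcomes in a JUnit XML document."""
    root = ET.fromstring(xml_text)
    counts = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() matches <testcase> at any depth, so both a <testsuites>
    # wrapper and a bare <testsuite> root are handled.
    for case in root.iter("testcase"):
        counts["tests"] += 1
        if case.find("failure") is not None:
            counts["failures"] += 1
        elif case.find("error") is not None:
            counts["errors"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
    return counts
```

A CI step could read `bellwether-results.xml`, pass its text to `junit_counts`, and fail the job when `failures` or `errors` is non-zero.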

SARIF Format

Generate SARIF (stdout):

bellwether check --format sarif > bellwether.sarif

check --format sarif works in both modes:

Check-only run: emits reliability/security findings from the current run (for example BWH-REL, BWH-SEC/CWE-based IDs).
Baseline comparison run: emits drift-specific rules and findings (BWH001 and above).

SARIF rules include:

BWH001-004: Schema drift rules (breaking, warning, info)
BWH005-006: Response structure and error pattern drift rules
BWH007: Security finding rule
BWH008-009: Response schema evolution rules
BWH010-011: Error trend rules
BWH012-013: Performance regression and confidence rules
BWH014-015: Documentation quality rules
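
To see which rules fired in a given run, a SARIF file can be tallied by rule ID. The Python sketch below assumes only the standard SARIF 2.1.0 layout (`runs[].results[].ruleId`); `sarif_rule_counts` is a hypothetical helper, not a Bellwether API.

```python
from collections import Counter


def sarif_rule_counts(sarif: dict) -> Counter:
    """Count SARIF results per ruleId across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # ruleId is optional in SARIF, so fall back to a placeholder key.
            counts[result.get("ruleId", "<no ruleId>")] += 1
    return counts
```

After loading `bellwether.sarif` with `json.load`, `sarif_rule_counts(data).most_common()` lists the rules with the most findings first.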

Report Sections

Check mode reports include these sections:

Performance Metrics

─── Performance ───
  Tool: read_file
  P50: 45ms | P95: 120ms | Success: 98%
  Confidence: high (15 samples, CV: 0.28)
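
The figures in this block can be reproduced from raw latency samples. The Python sketch below assumes `CV` denotes the coefficient of variation (standard deviation divided by mean), uses nearest-rank percentiles, and uses the sample standard deviation. These are assumptions, so Bellwether's own numbers may differ slightly, and the thresholds behind labels such as "high" are not modeled.

```python
import math
import statistics


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def coefficient_of_variation(samples: list) -> float:
    """Sample standard deviation divided by the mean (needs two or more samples)."""
    return statistics.stdev(samples) / statistics.mean(samples)
```

A lower coefficient of variation means the latency samples are more tightly clustered around their mean.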

Security Findings (with `check.security.enabled`)

─── Security ───
  Tool: execute_query
  Category: sql_injection
  Risk: critical
  Finding: Tool accepted SQL injection payload

Documentation Quality

─── Documentation Quality ───
  Score: 85/100 (B)
  Coverage: 100% | Quality: 80% | Params: 85%
  Issues: 2 (1 warning, 1 info)

Error Analysis

─── Error Summary ───
  Category: NotFound (ENOENT)
  Root Cause: File does not exist
  Remediation: Verify path before calling
