ADR-0013 — sandbox verification framework (scratch/verification/)

§1. Context

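Before implementing the Phase X3 minimal plugin fixed in ADR-0003 (1 skill + 4 hooks + 6 scripts + 1 CLI), we need to eliminate the risk that the paper design diverges from actual behavior. The two existing research documents (plugin-architecture-research.html / spec-graph-management-research.html) rely only on information found on the web and include no verification on a real installation. Proceeding to implementation is dangerous while we have not confirmed actual behavior for ourselves, for example the group of permissionDecision: "deny" bugs in official Claude Code hooks (Issue #37210 / #33106 / #39344 / #18312, all Closed / not planned) or the count of 29 hook types.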

A survey by 5 parallel researchers across 87 sources (plugin-sandbox-verification-research.html) examined the test harnesses of comparable plugin ecosystems (VS Code / Neovim / JetBrains / MCP Inspector / mcp-recorder / MCPSpec / Continue.dev / Aider) and AI agent eval frameworks (Inspect AI / Promptfoo / Hypothesis / EARS→Gherkin). The design of a sandbox verification framework for folio was derived from that survey.

§2. Decision

§2.1 Framework adoption

Introduce a sandbox verification framework before starting folio Phase X3. It lives in scratch/verification/ (separate from the main spec, consistent with P-11). The framework's specification (scope / scenario format / use case mapping / implementation phase) is consolidated in scratch/specs/verification.html as the SSoT.

§2.2 Framework structure

scratch/
├── specs/verification.html         (spec SSoT, normative)
└── verification/                   (actual artifacts, prototype implementation)
    ├── README.html                 entry point + list of all scenarios
    ├── scenarios/                  YAML scenarios, one per use case
    ├── fixtures/                   test data
    ├── baselines/
    │   ├── reference/              tracked in VCS (golden, TypeScript 2-dir model)
    │   └── local/                  generated at run time (.gitignore)
    └── runner.sh                   lightweight bash runner
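
As an illustration of the reference/local split under baselines/, the following is a minimal sketch, not the framework's actual runner. The function name compare_with_reference, the file name deny.txt, and the rule that a missing golden file counts as a failure are all assumptions made for this example. Python is used only for illustration; this ADR leaves the runner's language open.

```python
import tempfile
from pathlib import Path


def compare_with_reference(baselines: Path, name: str, actual: str) -> bool:
    """Write `actual` under local/ and compare it with the golden copy in reference/.

    Hypothetical helper: the real runner interface is not defined by this ADR.
    """
    local = baselines / "local" / name
    reference = baselines / "reference" / name
    local.parent.mkdir(parents=True, exist_ok=True)
    # local/ holds run-time output and is expected to be git-ignored.
    local.write_text(actual, encoding="utf-8")
    if not reference.exists():
        # Assumption: no golden file yet means a human must review and promote it.
        return False
    return reference.read_text(encoding="utf-8") == actual


def demo() -> tuple[bool, bool]:
    """Build a throwaway baselines/ tree and compare matching and drifting output."""
    with tempfile.TemporaryDirectory() as tmp:
        baselines = Path(tmp) / "baselines"
        (baselines / "reference").mkdir(parents=True)
        (baselines / "reference" / "deny.txt").write_text("blocked\n", encoding="utf-8")
        same = compare_with_reference(baselines, "deny.txt", "blocked\n")
        drift = compare_with_reference(baselines, "deny.txt", "allowed\n")
        return same, drift
```

Calling demo() compares one matching and one drifting output against the same golden file. Keeping the generated copy in local/ leaves the drifted output on disk for inspection without touching the VCS-tracked reference.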

§2.3 Scenario format = YAML, 1 REQ = 1 scenario

Adopt YAML (in the style of MCPSpec / Promptfoo / pytest-regressions). Each scenario is a 1:1 conversion of an EARS REQ into Gherkin Given/When/Then (conductofcode.io / RequireKit pattern). For schema details, see verification.html §3 EARS Requirements.
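
The 1 REQ = 1 scenario rule and the EARS to Given/When/Then mapping can be sketched as follows. This is an illustrative model only: the Scenario fields, the from_ears helper, the REQ-001 style identifiers, and the mapping of precondition/trigger/response onto Given/When/Then are assumptions, and the normative schema remains the one in verification.html §3. The scenarios are modeled as Python objects rather than parsed from YAML to keep the sketch free of third-party dependencies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    req_id: str  # hypothetical EARS requirement identifier, e.g. "REQ-001"
    given: str
    when: str
    then: str


def from_ears(req_id: str, precondition: str, trigger: str, response: str) -> Scenario:
    """Map the parts of an EARS requirement
    ("While <precondition>, when <trigger>, the system shall <response>")
    onto Gherkin Given/When/Then. One plausible mapping, not a normative one."""
    return Scenario(req_id=req_id, given=precondition, when=trigger, then=response)


def duplicated_reqs(scenarios: list[Scenario]) -> list[str]:
    """Return REQ ids covered by more than one scenario (violating 1 REQ = 1 scenario)."""
    counts = Counter(s.req_id for s in scenarios)
    return sorted(req for req, n in counts.items() if n > 1)


scenarios = [
    from_ears("REQ-001", "a hook is registered", "a denied command is submitted",
              "the hook exits with code 2"),
    from_ears("REQ-002", "a hook is registered", "an allowed command is submitted",
              "the hook exits with code 0"),
    from_ears("REQ-001", "a hook is registered", "a denied command is resubmitted",
              "the hook exits with code 2"),
]
```

Here duplicated_reqs(scenarios) reports REQ-001, because two scenarios claim the same requirement. A check of this kind could run before any scenario executes, so that a broken 1:1 mapping is caught early.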

§2.4 Assertion strategy = centered on exit code + stderr

The Claude Code permissionDecision: "deny" JSON behaves unreliably according to Issue #37210 / #33106 / #39344 and others (Closed/not planned). Verification scenario assertions therefore adopt exit_code: 2 + stderr_contains as the reliable fallback. Detailed normative rules are planned for candidate ADR-0016.
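
The exit_code + stderr_contains assertion could be evaluated along these lines. This is a hedged sketch, not the framework's runner: run_and_check and FAKE_HOOK are hypothetical names, the stand-in command replaces a real hook script, and passing the event payload on stdin is an assumption about how the hook under test is invoked.

```python
import subprocess
import sys


def run_and_check(cmd: list[str], stdin_text: str,
                  exit_code: int, stderr_contains: str) -> bool:
    """Run a hook command and check its exit code and stderr.

    Hypothetical helper mirroring the exit_code / stderr_contains fields
    named in this section; stdout JSON is deliberately ignored.
    """
    proc = subprocess.run(
        cmd,
        input=stdin_text,       # assumed: the hook reads its event payload from stdin
        capture_output=True,
        text=True,
        timeout=30,
    )
    return proc.returncode == exit_code and stderr_contains in proc.stderr


# Stand-in for a real hook script: writes a reason to stderr and exits with code 2.
FAKE_HOOK = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('blocked: dangerous command'); sys.exit(2)",
]

blocked = run_and_check(FAKE_HOOK, "{}", exit_code=2, stderr_contains="blocked")
wrong_code = run_and_check(FAKE_HOOK, "{}", exit_code=0, stderr_contains="blocked")
```

With the stand-in hook, blocked is true and wrong_code is false. Asserting only on the process exit status and stderr keeps the check independent of the JSON decision output that the issues above report as unreliable.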

§2.5 Runner = lightweight prototype runner

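During the Phase X3 prototype stage, adopt a lightweight prototype runner (tentatively named runner.sh). This is consistent with prototype-driven development and avoids introducing a heavyweight framework. For the finished form (Phase X4+), Inspect AI / Promptfoo integration is a candidate. The concrete runner interface (YAML parsing method / language choice / assertion evaluation logic / exit code aggregation) is outside the WHAT specification (isolated in the binding as HOW, consistent with P-11). It will be fixed at the start of Phase X3, after research §10.1 Gap 2 is resolved.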

§2.6 Three-stage implementation phases

The decision (the Decision part of this ADR) is to introduce verification in three stages: Step 1 = hook script unit test (start of Phase X3, MUST); Step 2 = worktree-based integration (mid-phase, SHOULD); Step 3 = container-based isolation (Phase X4+, MAY). The detailed table (method / API call / applicable Phase) MUST be referenced from verification.html §4.1 as the SSoT (consistent with DRY; this ADR records only the decision trace).

§3. Consequences

Positive

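Reduces the risk of plugin code that was written on the assumption that it works but does not: actual hook behavior can be verified before Phase X3 starts
Consistent with P-3/P-11: only the WHAT of verification is consolidated in verification.html; the HOW (runner implementation / fixture contents) is isolated under scratch/verification/
Adopts industry best practices: TypeScript baseline 2-dir model + EARS→Gherkin + log4brains-style golden file
Incremental growth: start with Step 1 (unit tests that need no API), then add Steps 2-3 in stages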

Negative

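Maintenance cost of the verification framework itself (runner.sh / scenario YAML / baseline files)
The framework must be implemented as a prerequisite for starting Phase X3 (raises the implementation barrier)
Because confirmation of the Issue #39344 fix is incomplete (research §10.1 Gap 1), the premise of the assertion strategy (exit code centered) may no longer hold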

Neutral

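SHOULD-candidate ADRs (0016 exit code assertion / 0017 unit-vs-integration / 0018 baseline management) will be filed one by one in separate PRs
In the finished form, verification is also a candidate for integration under .claude-plugin/ (Phase X4+); for now it is isolated in scratch/verification/ as a prototype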

§4. Alternatives Considered

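Option	Reason not adopted
Option A: Wait for an official Claude Code sandbox	The BoxLite sandbox integration proposal (Issue #15888) was already rejected by Anthropic as "not planned". No official plugin isolation sandbox exists
Option B: twill experiment-verified approach (per unit of information)	Conflicts with P-3 WHAT-only / P-11 HOW prohibition. See ADR-0015 for details
Option C: Adopt Inspect AI / Promptfoo from the start	Heavyweight for the prototype stage, introduces Python / Node dependencies, and excessive for folio, which centers on hook scripts. Held as a Phase X4+ candidate
Option D: Start implementation without a verification framework	Leaves the risk of divergence between paper design and reality (the permissionDecision deny bug, etc.) unaddressed, so the cost of fixes after prototyping explodes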

§5. Trace

Research that produced this ADR: plugin-sandbox-verification-research.html (87 sources, 5 parallel researchers, critique PASS)
Related constitution principles: P-3 (WHAT-only), P-7 (Content domain exclusivity; verification = how to verify lives in a separate dir), P-11 (HOW prohibited), P-12 (Layer 0 distributed as one unit; candidate for future .claude-plugin/ integration)
Related ADRs: ADR-0003 (prerequisite; defines the plugin under verification), ADR-0015 (boundary with twill)
SHOULD-candidate ADRs (future separate PRs): ADR-0016 (exit code assertion), ADR-0017 (unit vs integration test), ADR-0018 (golden baseline management)
Open Questions (research §10.1): Gap 1 confirm the Issue #39344 fix, Gap 2 concrete runner.sh specification, Gap 3 full list of check items for claude plugin validate