Mark ba5474c54f docs(01-11): close as spike-pivot — SUMMARY landed, AMENDMENT-A deleted, pivots to 01-13
Closes Plan 01-11 honestly per GSD spike-pivot pattern. Original
Approach A (Puppeteer sw.evaluate per RESEARCH §1+6) empirically
falsified across Wave 3 execution + feasibility research. Approach B
(extension-internal-page harness + offscreen synthetic stream) proven
via c647f61 prototype; full implementation moves to Plan 01-13.

What this commit does:
- ADDS 01-11-SUMMARY.md (spike-then-pivot framing per GSD artifact-
  types.md PLAN→SUMMARY lifecycle; captures retained infrastructure,
  falsified hypotheses, working prototype, bridge to 01-13)
- REVERTS frontmatter amendment block in 01-11-PLAN.md; replaces with
  closed_as/pivoted_in/closure_note pointing at SUMMARY + 01-13
- DELETES 01-11-PLAN-AMENDMENT-A.md (improvised artifact type — not
  recognized in GSD artifact-types.md; content folded into SUMMARY)

Lesson for orchestrator (captured in SUMMARY §Architectural Notes):
when a plan attempts an approach that proves infeasible, the right
move is honest SUMMARY + new plan, NOT in-place rewrite + AMENDMENT
artifact. The project's own pattern (01-08, 01-09, 01-10, 01-11
added mid-phase as new work surfaced) confirms add-new-plan-when-
scope-shifts is the established pattern.

Plan 01-09 closure via harness PASS NOT achieved by 01-11; still
requires operator UAT pending Plan 01-13 landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:02:38 +02:00


closed_as: spike-pivot
closed_at: 2026-05-18
pivoted_in: 01-13
closure_note: Approach A (Puppeteer sw.evaluate per RESEARCH §1+6) empirically falsified; pivoted to Approach B (extension-internal-page harness, proven by c647f61 prototype) in Plan 01-13. See 01-11-SUMMARY.md for full pivot rationale + retained infrastructure.
phase: 01-stabilize-video-pipeline
plan: 11
type: tdd
wave: 4
depends_on:
  - 01-08
  - 01-09
files_modified:
  - package.json
  - package-lock.json
  - vite.test.config.ts
  - tsconfig.json
  - src/background/index.ts
  - src/offscreen/recorder.ts
  - src/test-hooks/sw-hooks.ts
  - src/test-hooks/offscreen-hooks.ts
  - src/test-hooks/types.ts
  - tests/uat/harness.test.ts
  - tests/uat/lib/launch.ts
  - tests/uat/lib/extension.ts
  - tests/uat/lib/sw.ts
  - tests/uat/lib/offscreen.ts
  - tests/uat/lib/assertions.ts
  - tests/uat/lib/zip.ts
  - tests/uat/lib/test-hook-contract.d.ts
  - tests/uat/README.md
  - tests/background/no-test-hooks-in-prod-bundle.test.ts
  - .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md
autonomous: false
requirements:
  - REQ-uat-harness-puppeteer
  - REQ-uat-bug-A-coverage
  - REQ-uat-bug-B-coverage
  - REQ-uat-two-bundle
  - REQ-uat-ci-friendly
  - REQ-uat-13-assertions
  - REQ-video-ring-buffer
tags:
  - puppeteer
  - uat
  - harness
  - e2e
  - mv3-extension
  - getDisplayMedia
  - bug-B
  - bug-A
  - tier-1
  - two-bundle
must_haves:
  truths:
    - `npm run build:test` produces `dist-test/` with `__mokoshTest` hook surfaces injected into SW + offscreen contexts; `npm run build` produces `dist/` with ZERO occurrences of `__mokoshTest` (grep-verifiable).
    - `npm run test:uat` orchestrates `build:test` + the Puppeteer harness end-to-end; exits 0 only when ALL 14 assertions pass (13 from the brief + assertion 0 = production-bundle hook-leak grep gate).
    - Bug B harness assertion (track.dispatchEvent('ended') → badge OFF + popup '' + isRecording=false + NO recovery notification) demonstrably catches a regression: rewinding the b9eeeeb conditional routing locally turns this assertion RED; reapplying turns it GREEN.
    - Bug A harness assertion (onStartup → chrome.notifications.create resolves cleanly with the manifest's icon48.png iconUrl) demonstrably catches a regression: stubbing the icon48 file to <100 bytes turns this assertion RED; restoring turns it GREEN.
    - Harness runs in `--headless=new` for CI portability; local-debug mode supported via `HEADLESS=0`; no Xvfb required (per RESEARCH §3 empirical probes against Chrome 148).
    - Test hooks live ONLY behind `import.meta.env.MODE === 'test'` guarded dynamic imports; Vite tree-shakes them from the production bundle; the no-test-hooks-in-prod-bundle.test.ts unit gate enforces this in the existing vitest suite (Tier-1 alongside sw-bundle-import.test.ts).
    - Existing 83 vitest tests remain GREEN after this plan lands (no regression to the unit test bed).
    - Plan 01-09 functional contract closes by harness PASS: its Task 5 operator-checkpoint amendment redirects to `npm run test:uat` for steps 4-13 + 15; operator retains only step 1 (build) + step 14 (brand/design check).
  artifacts:
    - path: vite.test.config.ts
      provides: Vite config extending the production config; sets `mode: 'test'`, `build.outDir: 'dist-test'`, `build.emptyOutDir: true`.
      contains: dist-test
    - path: src/test-hooks/types.ts
      provides: Shared TS type declaring `globalThis.__mokoshTest` shape (handlers, notificationCount, lastNotificationOptions, notificationIds, getCurrentStream). Single source of truth for SW + offscreen + harness.
      contains: __mokoshTest
    - path: src/test-hooks/sw-hooks.ts
      provides: SW-side test hook: captures chrome.action.onClicked / chrome.runtime.onStartup / chrome.notifications.onClicked handler refs; wraps chrome.notifications.create to record notificationCount + lastNotificationOptions. Imported dynamically from src/background/index.ts under `import.meta.env.MODE === 'test'` guard.
      contains: handlers
    - path: src/test-hooks/offscreen-hooks.ts
      provides: Offscreen-side test hook: exposes the current MediaStream via getter; provides simulateUserStop wrapping `track.dispatchEvent(new Event('ended'))` per RESEARCH §7. Imported dynamically from src/offscreen/recorder.ts under `import.meta.env.MODE === 'test'` guard.
      contains: simulateUserStop
    - path: src/background/index.ts
      provides: Adds a single `if (import.meta.env.MODE === 'test') { await import('../test-hooks/sw-hooks'); }` block at top-of-module so the hook registration runs BEFORE any production addListener calls (capturing every handler).
      contains: import.meta.env.MODE
    - path: src/offscreen/recorder.ts
      provides: Adds an `if (import.meta.env.MODE === 'test') { __sharedRefs.setMediaStreamGetter(() => mediaStream); }` block (the import itself is gated; the getter wires the runtime mediaStream reference into the hook surface). Same guard pattern as SW.
      contains: import.meta.env.MODE
    - path: tests/uat/harness.test.ts
      provides: Single Node script (run under tsx) implementing all 14 assertions sequentially. ~400 LoC. Top-to-bottom narrative: launch, click, assert, simulate Bug B, simulate Bug A, etc. Returns exit 0 on full pass, non-zero on any failure with structured diagnostic dump.
      min_lines: 350
    - path: tests/uat/lib/launch.ts
      provides: puppeteer.launch wrapper: builds args, sets enableExtensions to absolute dist-test path, chooses headless mode per CI env, configures downloads dir, exports a single launchHarnessBrowser() function.
    - path: tests/uat/lib/extension.ts
      provides: Helpers to resolve the extension id, attach to the SW target, attach to the offscreen target (background_page type per RESEARCH §4 / Pitfall 1).
    - path: tests/uat/lib/sw.ts
      provides: SW context helpers: getBadgeText, getPopup, getManifestIcons, fireOnStartup (via captured handler ref), sendSyntheticRecordingError, keepalivePing.
    - path: tests/uat/lib/offscreen.ts
      provides: Offscreen context helpers: waitForOffscreenTarget, getDisplaySurface, simulateUserStop (the dispatchEvent('ended') path per RESEARCH §7 BLOCKER finding).
    - path: tests/uat/lib/assertions.ts
      provides: Per-assertion helpers (assertEqual + structured diagnostic on failure); a runWithStartupDiagnostics wrapper that captures SW + offscreen console logs and dumps them on assertion failure for triage.
    - path: tests/uat/lib/zip.ts
      provides: jszip-based archive shape assertions; reads downloaded `session_report_*.zip`, asserts `video/last_30sec.webm` present + `meta.json` carries `version === chrome.runtime.getManifest().version` (extension-side version read passed in).
    - path: tests/uat/lib/test-hook-contract.d.ts
      provides: Mirror of src/test-hooks/types.ts in TS-declaration form for the harness side; documents the wire contract between hook injector and harness consumer.
    - path: tests/uat/README.md
      provides: How to run: `npm run test:uat`; local-debug headful mode via `HEADLESS=0`; CI semantics; troubleshooting (locale-specific picker string, Xvfb fallback if a future Chrome regresses headless, dev-dependency Chromium binary size note).
    - path: tests/background/no-test-hooks-in-prod-bundle.test.ts
      provides: Tier-1 unit-level grep gate (cousin of sw-bundle-import.test.ts): runs `npm run build` then asserts ZERO occurrences of `__mokoshTest` and ZERO occurrences of `simulateUserStop` in any file under `dist/`. GREEN from the moment Task 1 commits it (no hooks exist yet, so nothing can leak) and MUST stay GREEN after Task 2 lands the gated hooks; committing it first closes the window in which a leaked hook could ship unchecked.
    - path: package.json
      provides: Adds `puppeteer` ^25.0.2 + `tsx` ^4 to devDependencies; adds two npm scripts: `build:test` (`tsc && vite build --mode test --config vite.test.config.ts`) and `test:uat` (`npm run build:test && tsx tests/uat/harness.test.ts`).
      contains: test:uat
    - path: tsconfig.json
      provides: Includes `src/test-hooks/**/*` in compilation surface (so tsc validates the hook code). NO change to emit (vite handles bundling).
    - path: .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md
      provides: AMENDMENT block at the end of the file: redirects Plan 01-09 Task 5 operator-checkpoint steps 4-13 + 15 to `npm run test:uat` (this plan's harness). Operator retains step 1 (build) + step 14 (brand/design accept) only. Plan 01-09 closes when `npm run test:uat` exits 0 AND operator confirms brand/design step 14.
      contains: Plan 01-11 amendment
  key_links:
    - from: tests/uat/harness.test.ts
      to: tests/uat/lib/launch.ts:launchHarnessBrowser
      via: import
      pattern: import.*from.*lib/launch
    - from: tests/uat/lib/launch.ts
      to: puppeteer.launch
      via: enableExtensions + headless + autoSelect flag
      pattern: enableExtensions
    - from: src/background/index.ts
      to: src/test-hooks/sw-hooks.ts
      via: guarded dynamic import
      pattern: import.meta.env.MODE === ['"]test['"]
    - from: src/offscreen/recorder.ts
      to: src/test-hooks/offscreen-hooks.ts
      via: guarded dynamic import + setMediaStreamGetter wire
      pattern: import.meta.env.MODE === ['"]test['"]
    - from: tests/uat/lib/offscreen.ts:simulateUserStop
      to: track.dispatchEvent(new Event('ended'))
      via: evaluate-in-offscreen-page on __mokoshTest.getCurrentStream().getVideoTracks()[0]
      pattern: dispatchEvent(new Event(['"]ended['"]
    - from: tests/background/no-test-hooks-in-prod-bundle.test.ts
      to: dist/ artifact tree
      via: post-build grep for __mokoshTest + simulateUserStop
      pattern: grep.*__mokoshTest.*dist

Scope Sanity Note

5 waves (Wave 0 through Wave 4), 8 tasks, 18 file artifacts. This sits at the upper end of the "split signal" threshold, but consolidating is the right call:

  1. The test infrastructure (Wave 0), the hook gating (Wave 1), the harness scaffolding (Wave 2), and the 14 assertions (Wave 3) are tightly coupled at the contract level — splitting them into separate plans would force the harness contract (the __mokoshTest shape) to be re-derived in each plan's frontmatter must_haves, multiplying the duplication tax.
  2. Per RESEARCH §6, the two-bundle gate (__mokoshTest ABSENT in production) is the security-critical mitigation for shipping test hooks. That gate MUST be wired in the same plan that adds the hooks; splitting would create a window where the hooks exist but the gate doesn't.
  3. Wave 4 (closure) is a single checkpoint task — bundling it with Wave 3 wouldn't change context cost meaningfully, and separating it keeps the operator-checkpoint scope visible in the wave structure.
  4. Context budget: Wave 0 + Wave 1 + Wave 2 ~30%; Wave 3 ~35%; Wave 4 ~5% (checkpoint). Total ~70%. Above the 50% target — but the 14 assertions are deterministic and template-shaped, so per-assertion authoring cost is sub-linear once Wave 2 lands.

If a future revision DOES force a split, natural cut line: Plan 01-11A = Waves 0+1+2 (infrastructure + first 4 assertions as smoke); Plan 01-11B = Waves 3+4 (remaining 10 assertions + closure). This split incurs the contract-duplication tax and is NOT recommended absent a context-cost regression.

Build a Puppeteer-driven Node UAT harness that retires the operator-as-assertion-library role. Plan 01-09's Task 5 took 4-6 hours of operator empirical UAT cycles (Bug A icons + Bug B state routing both escaped vitest unit coverage); every "visual" check in that task has a CDP-callable equivalent. This plan automates them.

Three coordinated changes:

  1. Two-bundle separation via vite.test.config.ts extending the production config with mode: 'test' + outDir: 'dist-test'. Production builds stay hook-free.
  2. Test hooks in src/test-hooks/ consumed via guarded dynamic imports from SW + offscreen. The dynamic-import-inside-MODE-guard pattern (RESEARCH §6) lets Vite tree-shake the hook MODULES entirely from production, with a Tier-1 grep gate (tests/background/no-test-hooks-in-prod-bundle.test.ts) verifying the absence.
  3. Puppeteer harness at tests/uat/harness.test.ts (plus a lib/ helper split following MetaMask's POM shape per RESEARCH §5) implementing 14 assertions: assertion 0 (production-bundle hook-leak grep gate) + assertions 1-13 from the orchestrator brief. Bug B uses track.dispatchEvent(new Event('ended')) per RESEARCH §7 BLOCKER — NOT track.stop() which silently invalidates the assertion.

Operator role retirement: Plan 01-09's Task 5 is amended to redirect steps 4-13 + 15 to npm run test:uat. Operator retains only step 1 (build verification) + step 14 (brand/design acceptance). All functional gates move to CI-callable harness.

Output:

  • vite.test.config.ts — production config extension with mode: 'test' + outDir: 'dist-test'.
  • src/test-hooks/{sw-hooks,offscreen-hooks,types}.ts — gated hook modules.
  • src/background/index.ts + src/offscreen/recorder.ts — gated dynamic import block (one line each + a setMediaStreamGetter wire in offscreen).
  • tests/uat/harness.test.ts + tests/uat/lib/*.ts + tests/uat/README.md — harness + helpers.
  • tests/background/no-test-hooks-in-prod-bundle.test.ts — Tier-1 unit-level hook-leak gate.
  • package.json: puppeteer, tsx devDeps + build:test, test:uat scripts.
  • tsconfig.json — includes src/test-hooks/**/* for type-checking.
  • .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md — amendment block redirecting Task 5 functional steps to npm run test:uat.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/REQUIREMENTS.md @.planning/phases/01-stabilize-video-pipeline/01-CONTEXT.md @.planning/phases/01-stabilize-video-pipeline/01-08-PLAN.md @.planning/phases/01-stabilize-video-pipeline/01-08-SUMMARY.md @.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md @.planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md @.planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md @.planning/debug/resolved/01-09-recovery-flow.md @src/background/index.ts @src/offscreen/recorder.ts @manifest.json @vite.config.ts @tsconfig.json @package.json @tests/background/sw-bundle-import.test.ts

Puppeteer 25.0.2 extension API surface (RESEARCH §1, empirically verified)

import puppeteer, { Browser, Extension, Page, Target } from 'puppeteer';

const browser: Browser = await puppeteer.launch({
  pipe: true,
  enableExtensions: ['/abs/path/to/dist-test'],   // string[] or true
  headless: process.env.HEADLESS !== '0',          // default headless=true; local debug HEADLESS=0
  args: [
    '--no-sandbox',
    '--auto-select-desktop-capture-source=Entire screen',  // RESEARCH §9 — locale-specific
    // DO NOT add --use-fake-ui-for-media-stream (per RESEARCH §9 Pitfall, conflicts with auto-select)
  ],
});

const extensions = await browser.extensions();   // Map<id, Extension>
const [extId, ext] = [...extensions][0];

const swTarget = await browser.waitForTarget(
  (t: Target) => t.type() === 'service_worker',
  { timeout: 10_000 },
);
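const sw = await swTarget.worker();              // WebWorker | null; WebWorker has .evaluate()
if (sw === null) throw new Error('service worker target exposed no worker handle');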

const page = await browser.newPage();
await page.goto('about:blank');
await page.triggerExtensionAction(ext);          // simulates toolbar click (NEEDS popup === '')

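// Offscreen page (RESEARCH §4 / Pitfall 1): target type 'background_page', NOT 'page'.
// Wait for it rather than scanning browser.targets() once: the offscreen document
// is created asynchronously after the action click, so an immediate scan can miss it.
const offTarget = await browser.waitForTarget(
  (t: Target) => t.type() === 'background_page' && t.url().includes('offscreen'),
  { timeout: 10_000 },
);
const offPage = await offTarget.asPage();        // NOT .page(); only .asPage() works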
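The option-building half of launchHarnessBrowser() can be kept as a pure function. That makes the HEADLESS switch and the RESEARCH §9 flag conflict checkable without starting a browser. This is a minimal sketch. `buildLaunchOptions` and `HarnessLaunchOptions` are hypothetical names, and the object mirrors only the fields used in the launch call above, not Puppeteer's full option type.

```typescript
// Subset of the launch options used above; hypothetical, not Puppeteer's own type.
interface HarnessLaunchOptions {
  pipe: boolean;
  enableExtensions: string[];
  headless: boolean;
  args: string[];
}

const AUTO_SELECT_FLAG = "--auto-select-desktop-capture-source=Entire screen";
const CONFLICTING_FLAG = "--use-fake-ui-for-media-stream";

// Build options from an absolute dist-test path and an env map (process.env in practice).
function buildLaunchOptions(
  extensionDir: string,
  env: Record<string, string | undefined>,
  extraArgs: string[] = [],
): HarnessLaunchOptions {
  // RESEARCH §9: the fake-UI flag conflicts with auto-select, so refuse it outright.
  if (extraArgs.some((a) => a.startsWith(CONFLICTING_FLAG))) {
    throw new Error(`${CONFLICTING_FLAG} conflicts with ${AUTO_SELECT_FLAG}`);
  }
  return {
    pipe: true,
    enableExtensions: [extensionDir],
    headless: env.HEADLESS !== "0", // headless unless HEADLESS=0 (local debug)
    args: ["--no-sandbox", AUTO_SELECT_FLAG, ...extraArgs],
  };
}

const ciOptions = buildLaunchOptions("/abs/path/to/dist-test", {});
const debugOptions = buildLaunchOptions("/abs/path/to/dist-test", { HEADLESS: "0" });
```

In the real wrapper, the returned object would be handed to puppeteer.launch together with `process.env`. The guard makes an accidental reintroduction of the conflicting flag fail loudly instead of silently breaking auto-select.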

Chrome SW state surface (read via sw.evaluate)

// Read badge text
const badge = await sw.evaluate(() => chrome.action.getBadgeText({}));

// Read popup
const popup = await sw.evaluate(() => chrome.action.getPopup({}));

// Read manifest
const manifest = await sw.evaluate(() => chrome.runtime.getManifest());
// manifest.icons === { '16': 'icons/icon16.png', '48': '...', '128': '...' }
// manifest.permissions includes 'notifications', etc.

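// Synthesize RECORDING_ERROR (no hook needed; it goes through the SW's onMessage handler).
// Send it from the offscreen page, not via sw.evaluate: runtime.onMessage does not fire
// in the sender's own context, so a message the SW sends to itself never reaches its
// own listener.
await offPage.evaluate(() =>
  chrome.runtime.sendMessage({ type: 'RECORDING_ERROR', error: 'codec-unsupported' }),
);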

// Invoke onStartup via captured handler ref (needs hook — see sw-hooks.ts)
await sw.evaluate(() => globalThis.__mokoshTest!.handlers.onStartup?.());

// Fetch an extension file and measure its size. Read the body rather than trusting
// Content-Length, which may be absent on chrome-extension:// responses.
const iconSize = await sw.evaluate(async () => {
  const r = await fetch(chrome.runtime.getURL('icons/icon48.png'));
  return r.ok ? (await r.arrayBuffer()).byteLength : -1;
});

Offscreen surface (read via offPage.evaluate)

// Read displaySurface — RESEARCH §11 Req 3
const ds = await offPage.evaluate(() =>
  globalThis.__mokoshTest!.getCurrentStream!()?.getVideoTracks()[0]?.getSettings().displaySurface ?? null,
);

// Simulate user-stopped — RESEARCH §7 BLOCKER. MUST be dispatchEvent, NOT track.stop().
await offPage.evaluate(() => {
  const stream = globalThis.__mokoshTest!.getCurrentStream!();
  if (stream === null) throw new Error('no current stream — recording must be active');
  const track = stream.getVideoTracks()[0];
  track.dispatchEvent(new Event('ended'));
  // Track still readyState 'live' after dispatch; production handler will
  // call stream.getTracks().forEach(t => t.stop()) which DOES release the
  // capture (just doesn't refire 'ended' on the same track — spec).
});
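The BLOCKER above can be modeled without a browser. The sketch below uses a hypothetical stand-in (`FakeTrack` is not a real API) built on Node's global `EventTarget`. It hard-codes the spec behavior the comment describes: stop() ends the track without firing 'ended', while dispatchEvent delivers 'ended' synchronously to listeners. The model shows why a harness that calls stop() never reaches the recorder's user-stopped handler.

```typescript
// Stand-in for MediaStreamTrack that models only the two behaviors the harness relies on.
class FakeTrack extends EventTarget {
  readyState: "live" | "ended" = "live";
  stop(): void {
    this.readyState = "ended"; // no 'ended' event is dispatched, mirroring the spec
  }
}

// Path 1 (wrong for the assertion): stop() alone never reaches the listener.
const probe = new FakeTrack();
let probeCalls = 0;
probe.addEventListener("ended", () => {
  probeCalls += 1;
});
probe.stop();

// Path 2 (what the harness does): dispatchEvent reaches the listener synchronously.
const track = new FakeTrack();
let userStoppedCalls = 0;
// Plays the role of the recorder's onUserStoppedSharing listener.
track.addEventListener("ended", () => {
  userStoppedCalls += 1;
  track.stop(); // production-style teardown: releases capture, fires no second 'ended'
});
track.dispatchEvent(new Event("ended"));
```

After Path 1, the probe is 'ended' but its listener ran zero times. In the Bug B assertion, that would show up as a stale badge that has nothing to do with the routing fix.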

Test hook contract (NEW — src/test-hooks/types.ts)

// src/test-hooks/types.ts
// SINGLE SOURCE OF TRUTH for the __mokoshTest wire shape.
// Imported by sw-hooks.ts (registers), offscreen-hooks.ts (registers),
// and tests/uat/lib/test-hook-contract.d.ts (consumes — mirror).

export interface MokoshTestSurface {
  // SW handler refs (captured by sw-hooks.ts monkey-patching addListener)
  handlers: {
    onClicked: ((tab: chrome.tabs.Tab) => void | Promise<void>) | null;
    onStartup: (() => void | Promise<void>) | null;
    notificationOnClicked: ((notificationId: string) => void | Promise<void>) | null;
  };
  // SW notification observability
  notificationCount: number;
  lastNotificationOptions: chrome.notifications.NotificationOptions | null;
  notificationIds: ReadonlyArray<string>;
  // Offscreen-only stream accessor: undefined in the SW context, defined in offscreen.
  // Optional in the type for that reason; a runtime null return is the
  // "not currently recording" signal.
  getCurrentStream?: () => MediaStream | null;
}

declare global {
  // eslint-disable-next-line no-var
  var __mokoshTest: MokoshTestSurface | undefined;
}

export {};

Production hook-gate pattern (src/background/index.ts top-of-module)

// AT THE VERY TOP of src/background/index.ts, BEFORE any addListener calls.
// import.meta.env.MODE is statically replaced at build time by Vite (RESEARCH §6);
// the entire `if` block + its dynamic import are tree-shaken from production bundles
// because the literal === comparison resolves to `false` and Rollup deletes the
// unreachable branch.
if (import.meta.env.MODE === 'test') {
  await import('../test-hooks/sw-hooks');
}

CRITICAL ORDERING: the hook import MUST run BEFORE any production addListener calls so the monkey-patches catch the handlers as they register. Top-of-module placement satisfies this.
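The ordering requirement can be sketched with a chrome-like event object that exposes only addListener. `ChromeEventLike`, `captureListener` and `FakeEvent` are hypothetical stand-ins, not the project's sw-hooks.ts. The patch records each handler and still forwards it to the original, so production behavior is unchanged. It can only see handlers registered after it is installed.

```typescript
// Hypothetical minimal shape of a chrome.events.Event: only addListener matters here.
interface ChromeEventLike<H extends (...args: never[]) => unknown> {
  addListener(callback: H): void;
}

// Patch event.addListener so each handler is reported to onCapture, then forwarded.
function captureListener<H extends (...args: never[]) => unknown>(
  event: ChromeEventLike<H>,
  onCapture: (handler: H) => void,
): void {
  const original = event.addListener.bind(event);
  event.addListener = (callback: H): void => {
    onCapture(callback);
    original(callback); // production listener still registers normally
  };
}

// Fake event standing in for chrome.runtime.onStartup.
class FakeEvent<H extends (...args: never[]) => unknown> implements ChromeEventLike<H> {
  readonly listeners: H[] = [];
  addListener(callback: H): void {
    this.listeners.push(callback);
  }
}

type StartupHandler = () => void;
const onStartup = new FakeEvent<StartupHandler>();
const captured: { onStartup: StartupHandler | null } = { onStartup: null };

// Registered BEFORE the patch is installed: invisible to the hook.
const early: StartupHandler = () => {};
onStartup.addListener(early);

captureListener(onStartup, (h) => {
  captured.onStartup = h;
});

// Registered AFTER the patch: captured AND still delivered to the real event.
const late: StartupHandler = () => {};
onStartup.addListener(late);
```

Only `late` ends up in `captured`, while both handlers remain registered on the event. This is why the hook import has to sit above every production addListener call.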

Production hook-gate pattern (src/offscreen/recorder.ts)

// Top-of-module: register the hook.
if (import.meta.env.MODE === 'test') {
  await import('../test-hooks/offscreen-hooks');
}

// Later, INSIDE startRecording right after `mediaStream = stream;` (line ~247):
// hand the live stream to the hook so __mokoshTest.getCurrentStream() returns it.
// Gated identically, so the production bundle has zero hook reference at this site.
// The top-of-module import has already loaded the module, so this import()
// resolves from the module cache instead of loading it a second time.
if (import.meta.env.MODE === 'test') {
  const hooks = await import('../test-hooks/offscreen-hooks');
  hooks.setCurrentStream(stream);
}

(Note: the executor may flatten this — the simpler shape is to expose a setCurrentStream function from offscreen-hooks.ts that the recorder calls after assignment. The hook-side closes over a mutable currentStream variable. See Task 2 step 5.)
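The flattened shape from the note can be sketched as a closure over a single mutable cell. This is an illustration under assumptions. The stream type is generic because Node has no `MediaStream`, and `createStreamCell` is a hypothetical name. In offscreen-hooks.ts, the getter would be assigned to `globalThis.__mokoshTest.getCurrentStream` and the setter exported as `setCurrentStream`.

```typescript
// A mutable cell shared by the recorder (writer) and the harness (reader).
interface StreamCell<S> {
  setCurrentStream(stream: S | null): void;
  getCurrentStream(): S | null;
}

function createStreamCell<S>(): StreamCell<S> {
  // Closed-over variable: the only state the hook keeps.
  let currentStream: S | null = null;
  return {
    setCurrentStream(stream: S | null): void {
      currentStream = stream;
    },
    // Reads through the closure, so it always sees the latest assignment.
    getCurrentStream(): S | null {
      return currentStream;
    },
  };
}

// Usage with a stand-in stream object.
const cell = createStreamCell<{ id: string }>();
const before = cell.getCurrentStream(); // null: "not currently recording"
cell.setCurrentStream({ id: "stream-1" });
const during = cell.getCurrentStream();
// Assumption: the recorder clears the cell on teardown; the plan does not specify this.
cell.setCurrentStream(null);
const after = cell.getCurrentStream();
```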

Vite test config skeleton (vite.test.config.ts)

import { defineConfig, mergeConfig } from 'vite';
import baseConfig from './vite.config';

export default defineConfig(() =>
  mergeConfig(baseConfig, {
    mode: 'test',
    build: {
      outDir: 'dist-test',
      emptyOutDir: true,
    },
  }),
);

npm scripts to add (package.json)

{
  "scripts": {
    "dev": "vite",
    "build": "tsc && vite build",
    "build:test": "tsc && vite build --mode test --config vite.test.config.ts",
    "preview": "vite preview",
    "test": "vitest run",
    "test:uat": "npm run build:test && tsx tests/uat/harness.test.ts"
  }
}

Existing surfaces the executor must NOT alter (regression risk)

  • src/background/index.ts lines 725-778 (RECORDING_ERROR conditional routing) — Bug B fix landed at b9eeeeb; harness asserts this is intact.
  • src/offscreen/recorder.ts lines 451-480 (onUserStoppedSharing) — Bug B handler; harness assertion 6 verifies the dispatchEvent path reaches it.
  • tests/background/sw-bundle-import.test.ts — Tier-1 gate; the new no-test-hooks-in-prod-bundle.test.ts follows the same pattern but inspects the BUILT artifact for hook leaks.
  • manifest.json — already declares notifications permission + all 3 icon sizes; harness assertions 8, 9, 10 read these as-is.
  • ALL existing 83 vitest tests — must remain GREEN.

Resolved open questions from RESEARCH (5)

1. Where does simulateUserStop shim live?
   Resolution: src/test-hooks/offscreen-hooks.ts exports a setCurrentStream(stream: MediaStream) setter the recorder calls after assignment. The hook's __mokoshTest.getCurrentStream is a getter over the captured cell. simulateUserStop is harness-side (in tests/uat/lib/offscreen.ts), calling dispatchEvent directly on the track returned by getCurrentStream(); the offscreen-hooks side just exposes the stream.
   Rationale: Minimum surface in the production tree; the dispatchEvent invocation is harness-side, so it is never bundled.

2. Notification assertions: count vs set-membership?
   Resolution: Count + set-membership combined. notificationCount asserts on the TOTAL count (e.g. assertion 8: exactly 1 startup notification). notificationIds asserts on prefix membership (e.g. "an id starting 'mokosh-startup-' was created"). lastNotificationOptions asserts on iconUrl shape.
   Rationale: Pure count is brittle (retries inflate it); pure set-membership misses overcount regressions. Combined assertions catch both.

3. CI plumbing scope: include or defer?
   Resolution: Defer to Phase 5 (P1/P2 hardening) or its own Plan 01-12. This plan ships a CI-callable harness (npm run test:uat exits 0 on pass, non-zero on fail) but no GitHub Actions wiring. There is no existing CI infrastructure in the repo (verified: no .github/workflows/ directory), and adding CI here would force a CI-tool decision (Actions vs self-hosted) that is out of scope for Phase 1 stabilization.
   Rationale: Lowest-friction shipping; CI tool selection deserves its own plan.

4. Failure isolation: single browser vs per-assertion restart?
   Resolution: Single browser, serial assertions. Restarting between assertions costs ~3-5 s × 14 ≈ 42-70 s of overhead per run, while a single browser keeps total runtime under 60 s. Mitigation: structured diagnostic dump on first failure (SW console logs + offscreen console logs + screenshot) + --bail semantics (abort remaining assertions to keep the failure mode unambiguous).
   Rationale: RESEARCH §5 recommendation matches; the cost of state bleed is much lower than the cost of state-isolation overhead for 14 deterministic checks.

5. Test-hook contract location?
   Resolution: Both. Production-side canonical: src/test-hooks/types.ts (the file that ships with the test bundle and is type-checked by tsc). Harness-side mirror: tests/uat/lib/test-hook-contract.d.ts (decoupled from the production tree so the harness has no import reaching into src/). The mirror file's preamble cites the production-side file as the canonical source. Drift detection: a Tier-1-style test could later snapshot-diff the two; out of scope here, but documented as a follow-up note.
   Rationale: Type duplication is a small price for keeping tests/ and src/ import-separable. The drift risk is low because the shape is small (5 top-level fields).
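Resolution 2 combines a total count with prefix membership. A minimal sketch of that combined check follows. It assumes the harness first copies plain data out of `__mokoshTest`. The `NotificationSnapshot` type, the `checkStartupNotification` helper and the concrete ids are hypothetical; only the 'mokosh-startup-' prefix comes from the plan.

```typescript
// Plain-data snapshot of the SW-side notification counters.
interface NotificationSnapshot {
  notificationCount: number;
  notificationIds: ReadonlyArray<string>;
}

// Returns failure messages; an empty array means the check passed.
function checkStartupNotification(
  snap: NotificationSnapshot,
  expectedCount: number,
  idPrefix: string,
): string[] {
  const failures: string[] = [];
  // Count catches over-creation (e.g. a retry path firing twice).
  if (snap.notificationCount !== expectedCount) {
    failures.push(`expected ${expectedCount} notification(s), got ${snap.notificationCount}`);
  }
  // Prefix membership catches the right count with the wrong notification.
  if (!snap.notificationIds.some((id) => id.startsWith(idPrefix))) {
    failures.push(`no notification id starts with '${idPrefix}'`);
  }
  return failures;
}

const ok = checkStartupNotification(
  { notificationCount: 1, notificationIds: ["mokosh-startup-1"] },
  1,
  "mokosh-startup-",
);
const overcount = checkStartupNotification(
  { notificationCount: 2, notificationIds: ["mokosh-startup-1", "mokosh-startup-2"] },
  1,
  "mokosh-startup-",
);
const wrongId = checkStartupNotification(
  { notificationCount: 1, notificationIds: ["mokosh-error-1"] },
  1,
  "mokosh-startup-",
);
```

Each regression class trips exactly one of the two checks. That is the argument for keeping both.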

How to test Bug B without committing the revert

Per orchestrator brief ("rewinding the b9eeeeb conditional routing locally turns this assertion RED"):

  1. Locally apply: git apply <<'EOF' ... EOF containing a temporary patch that reverts the if (errorCode === 'user-stopped-sharing') branch (so all errors route through setErrorMode).
  2. Run npm run test:uat; assertion 6 (Bug B) MUST fail with a specific diagnostic (expected badge text '' but got 'ERR').
  3. Revert the local patch (git checkout -- src/background/index.ts).
  4. Re-run npm run test:uat; assertion 6 MUST pass.

This RED-on-known-broken / GREEN-on-known-good cycle is the TDD discipline for the harness ITSELF. Each assertion in Task 5/6/7 includes this self-verification step in its action block.

Task 1 (Wave 0): Install Puppeteer + tsx; add `vite.test.config.ts`; add `build:test` + `test:uat` npm scripts; commit the Tier-1 hook-leak grep gate (GREEN from day one).

Read first:
- package.json (existing scripts + devDeps; confirm puppeteer + tsx absent)
- vite.config.ts (the base config the new test config will merge over)
- tests/background/sw-bundle-import.test.ts (Tier-1 gate pattern to mirror)
- tsconfig.json (confirm `include` covers `src/**/*`, needed for src/test-hooks/)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §10 (two-bundle build orchestration)

Files: package.json, package-lock.json, vite.test.config.ts, tsconfig.json, tests/background/no-test-hooks-in-prod-bundle.test.ts

Behavior:
- `npm install --save-dev puppeteer@^25.0.2 tsx@^4` lands cleanly. Both are published to the npm registry with active maintenance windows (puppeteer is Apache-2.0 licensed, tsx is MIT; puppeteer 25.0.2 published 2025; tsx 4.x current). Pin both with caret ranges per project convention.
- `vite.test.config.ts` exists, extends `./vite.config.ts` via `mergeConfig`, sets `mode: 'test'` + `build.outDir: 'dist-test'` + `build.emptyOutDir: true`. Running `npx vite build --config vite.test.config.ts --mode test` produces `dist-test/` (verifiable via `test -d dist-test`).
- `package.json` `scripts` block adds `build:test` and `test:uat` per the interfaces block. `npm run build:test` exits 0 and produces `dist-test/`.
- `tsconfig.json` `include` covers `src/test-hooks/**/*`. No edit is needed if `include` is already the `src/**/*` wildcard; check first and only add an entry if it is absent.
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` exists with TWO `it` blocks:
  (a) After `npm run build`, ZERO occurrences of `__mokoshTest` in any file under `dist/`.
  (b) After `npm run build`, ZERO occurrences of `simulateUserStop` in any file under `dist/`.
  Polarity: unlike typical RED-then-GREEN TDD, both blocks are GREEN today (no hooks, so no leak) and must REMAIN GREEN after Task 2 lands the hooks. The test is committed in this task so the gate is operational BEFORE the hooks ship, eliminating the window of vulnerability in which the production bundle could contain leaked hooks. Document this polarity in the test file preamble.
- Both `it` blocks run a fresh `npm run build` as part of their setup (spawned via `child_process.execFile`, mirroring sw-bundle-import.test.ts's spawn pattern), then walk `dist/` with `readdir` + `readFileSync` and assert the grep counts are zero. Skip the build spawn if `process.env.SKIP_BUILD === '1'` (developer escape hatch for repeated runs while iterating on this task).
- The 83 baseline vitest tests + 2 new gate tests = 85 tests, ALL GREEN. (The Tier-1 gate is committed in a working state from day one.)

Action:
1. Read `package.json` to confirm `puppeteer` + `tsx` are absent.
2. Run `npm install --save-dev puppeteer@^25.0.2 tsx@^4` and observe that the versions resolve correctly. Document the actually resolved versions in the commit message body.
3. Update the `package.json` `scripts` block per the interfaces section: add `build:test` and `test:uat`. Leave existing scripts (`dev`, `build`, `preview`, `test`) untouched.
4. Create `vite.test.config.ts` at repo root per the interfaces skeleton.
5. Verify `tsconfig.json` `include` covers `src/test-hooks/**/*`. If `include` is `["src/**/*"]` and no `exclude` entry blocks the folder, no edit is needed. Document the actual `tsconfig.json` shape in the commit message body so reviewers see the verification ran.
6. Run `npm run build:test` → exit 0; `ls dist-test/` confirms emission. Run `npm run build` → exit 0; `ls dist/` confirms separate output.
7. Create `tests/background/no-test-hooks-in-prod-bundle.test.ts` with the two `it` blocks per behavior (a) + (b). Preamble docstring per project style: extensive (the Google Python style mandate carries over; keep mirroring sw-bundle-import.test.ts's docstring density). Cite that this is a Tier-1 gate per `feedback-pre-checkpoint-bundle-gates.md` (the auto-loaded memory item).
8. Run `npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts` → both GREEN (no hooks landed yet, nothing leaks).
9. Run `npx vitest run` (full suite) → 83 baseline + 2 new = 85 GREEN. Document the baseline + delta in the commit message body.
10. Run `npx tsc --noEmit` → exit 0.
11. Verify there is NO `npm test` regression: rerun `npm test` → 85 GREEN.

Per project style: extensive docstrings; absolute imports; no `as any`; no `@ts-ignore`. The new test file is the first one to touch `child_process.execFile` since `sw-bundle-import.test.ts`. Mirror that file's pattern verbatim (execFile + maxBuffer + timeout + stdout sentinel scheme); do NOT introduce a new pattern.

Verify: `npm run build:test && npm run build && test -d dist-test && test -d dist && npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts && npx tsc --noEmit`

Acceptance criteria:
- `package.json` devDeps include `puppeteer` + `tsx` at the pinned versions; `scripts` block carries `build:test` + `test:uat`.
- `vite.test.config.ts` exists, extends base config, emits to `dist-test/`.
- `npm run build:test` exits 0; `dist-test/` populated.
- `npm run build` exits 0; `dist/` populated separately (no clobber).
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` exists with 2 tests; both GREEN.
- Full vitest suite: 83 baseline + 2 new = 85 GREEN.
- `npx tsc --noEmit` exit 0.

Done: Two-bundle infrastructure landed; Tier-1 hook-leak gate operational (GREEN, will remain GREEN after Task 2 hooks land); npm scripts wired; baseline preserved.
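The walk-and-grep step of the gate can be sketched as follows. `findForbiddenLiterals` is a hypothetical helper, not an existing project function. The sketch assumes Node's synchronous `fs` API and reads every file as UTF-8; binary assets such as PNGs decode to noise but cannot produce false negatives for ASCII literals. The real test would call it on `dist/` after the `npm run build` spawn and expect an empty result.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface LeakHit {
  file: string;
  literal: string;
}

// Recursively walk `dir` and report each (file, literal) pair where a forbidden
// literal appears anywhere in the file's contents.
function findForbiddenLiterals(dir: string, literals: readonly string[]): LeakHit[] {
  const hits: LeakHit[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      hits.push(...findForbiddenLiterals(full, literals));
    } else if (entry.isFile()) {
      const text = readFileSync(full, "utf8");
      for (const literal of literals) {
        if (text.includes(literal)) hits.push({ file: full, literal });
      }
    }
  }
  return hits;
}

// Demo on a throwaway tree: one clean file, one leaking file in a subfolder.
const root = mkdtempSync(join(tmpdir(), "hook-gate-"));
mkdirSync(join(root, "assets"));
writeFileSync(join(root, "background.js"), "chrome.action.onClicked.addListener(() => {});");
writeFileSync(join(root, "assets", "offscreen.js"), "globalThis.__mokoshTest = {};");
const hits = findForbiddenLiterals(root, ["__mokoshTest", "simulateUserStop"]);
rmSync(root, { recursive: true, force: true });
```

Returning the offending file paths, rather than a bare count, gives the failing gate a diagnostic that points at the leaking chunk.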

Task 2 (Wave 1): Add gated test hooks to SW + offscreen; verify the production bundle remains hook-free (Tier-1 gate stays GREEN).

**Read first:**

- src/background/index.ts (top-of-module, where the import.meta.env.MODE guard lands; lines 1-50)
- src/offscreen/recorder.ts (top-of-module + line ~247, where mediaStream is assigned)
- tests/background/sw-bundle-import.test.ts (the Tier-1 SW-bundle-loadability gate; confirm it still passes after the hooks land in the test bundle)
- tests/background/no-test-hooks-in-prod-bundle.test.ts (the gate from Task 1)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §6 (Vite tree-shaking gotchas)
- vite.test.config.ts (from Task 1)

**Files:** src/test-hooks/types.ts, src/test-hooks/sw-hooks.ts, src/test-hooks/offscreen-hooks.ts, src/background/index.ts, src/offscreen/recorder.ts, tests/uat/lib/test-hook-contract.d.ts

**Behavior:**

- `src/test-hooks/types.ts` exports `MokoshTestSurface` and declares `globalThis.__mokoshTest` per the interfaces block.
- `src/test-hooks/sw-hooks.ts` registers the SW-side hook at module load: it monkey-patches `chrome.action.onClicked.addListener`, `chrome.runtime.onStartup.addListener`, and `chrome.notifications.onClicked.addListener` to capture handler refs while still calling the originals. It wraps `chrome.notifications.create` to increment `notificationCount`, push the id to `notificationIds`, and save `lastNotificationOptions`. It initializes `globalThis.__mokoshTest = { handlers: {...}, notificationCount: 0, lastNotificationOptions: null, notificationIds: [] }`. NO `getCurrentStream` in the SW (the field is optional per the type and is undefined in SW context).
- `src/test-hooks/offscreen-hooks.ts` registers the offscreen-side hook: it exposes a mutable `currentStream: MediaStream | null` cell, a `setCurrentStream(s)` setter, and a `__mokoshTest.getCurrentStream = () => currentStream` getter. The recorder calls `setCurrentStream` after the `mediaStream = stream` assignment (gated by the same MODE check).
- `src/background/index.ts` gets, at the top of the module:

  ```typescript
  if (import.meta.env.MODE === 'test') {
    await import('../test-hooks/sw-hooks');
  }
  ```

  Placement: BEFORE any `addListener` calls in the file, so the monkey-patches catch every handler. This is a top-level `await`, which the plan assumed crxjs/Vite's MV3 module emission supports in SW context. Caveat: the HTML spec disallows both top-level `await` and dynamic `import()` in service worker scripts, so check the emitted `dist-test/` SW chunk before relying on this.
- `src/offscreen/recorder.ts` gets the symmetric gated import at the top of the module; the `setCurrentStream` call lands inside `startRecording` right after `mediaStream = stream;` (line 247), also gated.
- `tests/uat/lib/test-hook-contract.d.ts` mirrors `MokoshTestSurface` for harness-side consumption (it is a declaration file: not bundled, used only when type-checking the harness).
- After all changes, `npm run build` exits 0 AND `tests/background/no-test-hooks-in-prod-bundle.test.ts` REMAINS GREEN (the literal `__mokoshTest` does NOT appear in any file under `dist/`). `npm run build:test` exits 0 AND ONE OR MORE files under `dist-test/` contain `__mokoshTest` (verifiable with `grep -rl __mokoshTest dist-test/`).
- `tests/background/sw-bundle-import.test.ts` REMAINS GREEN (Layer 1 + Layer 2; the gated dynamic import does not break the production bundle's module init).
- Full vitest suite: 85 GREEN (no regression).

**Action:**

1. Create `src/test-hooks/types.ts` per the interfaces block. Extensive JSDoc; cite this plan's Task 2 + RESEARCH §6 (gating mechanism) + RESEARCH §7 (Bug B BLOCKER context for the role of `getCurrentStream`).
2. Create `src/test-hooks/sw-hooks.ts`. The monkey-patch pattern follows RESEARCH §6 Pattern 1. Wrap `chrome.notifications.create` so all three recorded fields update (count, last options, ids array) while the call still chains to the original create. Use the fully qualified types from `@types/chrome`; no `as any`. Initialization at module load:
   ```typescript
   const handlers: MokoshTestSurface['handlers'] = {
     onClicked: null,
     onStartup: null,
     notificationOnClicked: null,
   };
   const notificationIds: string[] = [];

   const origActionAdd = chrome.action.onClicked.addListener.bind(chrome.action.onClicked);
   chrome.action.onClicked.addListener = (cb) => {
     handlers.onClicked = cb;
     return origActionAdd(cb);
   };
   // ... similarly for onStartup, notifications.onClicked ...

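   // @types/chrome types `create` as an overloaded namespace function, so it cannot be
   // reassigned through its own type; the `as unknown` views are the project-style
   // escape hatch for that overload variance (no `as any`).
   const origNotifCreate = chrome.notifications.create.bind(chrome.notifications) as unknown as (
     ...args: unknown[]
   ) => unknown;
   let notificationCount = 0;
   let lastNotificationOptions: chrome.notifications.NotificationOptions | null = null;
   (chrome.notifications as unknown as { create: unknown }).create = (
     idOrOptions: string | chrome.notifications.NotificationOptions,
     optionsOrCb?: chrome.notifications.NotificationOptions | ((id: string) => void),
     maybeCb?: (id: string) => void,
   ): unknown => {
     // Resolve the (id, options, cb) vs (options, cb) overloads.
     const hasId = typeof idOrOptions === 'string';
     const options = (hasId ? optionsOrCb : idOrOptions) as chrome.notifications.NotificationOptions;
     const userCb = (hasId ? maybeCb : optionsOrCb) as ((id: string) => void) | undefined;
     const leadingArgs: unknown[] = hasId ? [idOrOptions, options] : [options];
     notificationCount += 1; // counted at call time, before Chrome resolves or rejects
     lastNotificationOptions = options;
     if (typeof userCb === 'function') {
       // Callback form: record the id Chrome reports, then chain to the caller's callback.
       return origNotifCreate(...leadingArgs, (id: string) => {
         notificationIds.push(id);
         userCb(id);
       });
     }
     // Callback-less MV3 form returns a Promise<string>; record the id on resolution.
     return (origNotifCreate(...leadingArgs) as Promise<string>).then((id) => {
       notificationIds.push(id);
       return id;
     });
   };

   globalThis.__mokoshTest = {
     handlers,
     get notificationCount() { return notificationCount; },
     get lastNotificationOptions() { return lastNotificationOptions; },
     get notificationIds() { return notificationIds.slice(); },
   };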
   ```
   The `as unknown` cast in the `create` reassignment is unavoidable because `@types/chrome` types `create` as an overloaded function; document this explicitly with a comment citing the overload variance issue. NO `as any`: the `as unknown` + downstream typed body is the project-style escape hatch.
3. Create `src/test-hooks/offscreen-hooks.ts`:
   ```typescript
   let currentStream: MediaStream | null = null;
   export function setCurrentStream(stream: MediaStream | null): void {
     currentStream = stream;
   }
   globalThis.__mokoshTest = {
     // ...inherit SW's surface if it was set first; in offscreen context
     // sw-hooks.ts did NOT run because this is a different document.
     // So we initialize a fresh shape with only the offscreen-relevant fields:
     handlers: { onClicked: null, onStartup: null, notificationOnClicked: null },
     notificationCount: 0,
     lastNotificationOptions: null,
     notificationIds: [],
     getCurrentStream: () => currentStream,
   };
   ```
   Note: the SW and offscreen are DIFFERENT JS isolates with DIFFERENT `globalThis`. The harness reads each surface via the appropriate `sw.evaluate` or `offPage.evaluate`. No cross-context shared state.
4. Edit `src/background/index.ts` — add the gated dynamic import at the TOP of the file (after any necessary type imports but BEFORE the existing logger initialization + addListener calls). Document inline that the placement is load-order-critical: this MUST run before any addListener.
5. Edit `src/offscreen/recorder.ts`:
   (a) Top-of-module: gated dynamic import per the SW pattern.
   (b) Inside `startRecording`, immediately after `mediaStream = stream;` (line ~247): gated `setCurrentStream(stream)` call. Use a top-level captured reference to the hooks module (set during the top-of-module import via a module-scoped `let hooks: typeof import('../test-hooks/offscreen-hooks') | null = null;` plus assignment in the import block). This avoids re-import per startRecording call.
6. Create `tests/uat/lib/test-hook-contract.d.ts`. Mirror `MokoshTestSurface`. Add a preamble docstring citing `src/test-hooks/types.ts` as the canonical source AND noting the drift-risk (manual sync) + the rationale for decoupling (no `import` from `tests/` into `src/`).
7. Run `npx tsc --noEmit` → exit 0 (all hook code typechecks).
8. Run `npm run build` (production). Then check `grep -rln __mokoshTest dist/` → ZERO matches. The Tier-1 gate test `tests/background/no-test-hooks-in-prod-bundle.test.ts` MUST stay GREEN.
9. Run `npm run build:test`. Then check `grep -rln __mokoshTest dist-test/` → ONE OR MORE matches (the hook code is bundled into the test build).
10. Run `npx vitest run` (full suite). 85 GREEN. The SW-bundle-import test must also be GREEN — verifies the gated dynamic import does NOT break production module init.
11. Sanity-check: open one of the production bundle's chunk files (the SW chunk via `dist/service-worker-loader.js` → its imported chunk) and confirm by eye that no `__mokoshTest` string is present. The grep gate is authoritative, but a manual eyeball ensures the gate isn't fooled by some bundler renaming.
DESIGN NOTE: the gated dynamic import IS the tree-shake trigger. If Vite ever fails to tree-shake a dynamic import behind a literal-comparison guard (which it shouldn't per RESEARCH §6 — the literal `'test'` !== `'production'` comparison is a static dead branch in production), the Tier-1 gate fails LOUDLY at CI time. The gate is THE mitigation for assumption A3 in RESEARCH §6.
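Step 2's create wrapper is the fiddliest part of the SW hook, because it must normalize two call shapes and handle both callback and Promise results. One way to keep that logic unit-testable outside Chrome is to factor it into a pure function that receives the original `create`, as sketched below. `wrapNotificationCreate`, `CreateRecorder`, `RawCreate`, and the trimmed `NotificationOptionsLike` type are hypothetical names; the stand-in options type deliberately models only a few fields rather than the full `@types/chrome` shape.

```typescript
/** Trimmed stand-in for chrome.notifications.NotificationOptions (assumption). */
interface NotificationOptionsLike {
  type?: string;
  iconUrl?: string;
  title?: string;
  message?: string;
}

type CreateCallback = (id: string) => void;

/** Loose signature of the original create: void with a callback, Promise<string> without. */
type RawCreate = (...args: unknown[]) => void | Promise<string>;

/** Hypothetical sink; sw-hooks.ts would update __mokoshTest from these calls. */
interface CreateRecorder {
  onCall(options: NotificationOptionsLike): void;
  onId(id: string): void;
}

function wrapNotificationCreate(orig: RawCreate, rec: CreateRecorder) {
  return function create(
    idOrOptions: string | NotificationOptionsLike,
    optionsOrCb?: NotificationOptionsLike | CreateCallback,
    maybeCb?: CreateCallback,
  ): void | Promise<string> {
    // Resolve the (id, options, cb?) vs (options, cb?) overloads.
    const hasId = typeof idOrOptions === 'string';
    const options = (hasId ? optionsOrCb : idOrOptions) as NotificationOptionsLike;
    const userCb = (hasId ? maybeCb : optionsOrCb) as CreateCallback | undefined;
    const leadingArgs: unknown[] = hasId ? [idOrOptions, options] : [options];

    // Record at call time, before Chrome resolves or rejects the request.
    rec.onCall(options);

    if (typeof userCb === 'function') {
      // Callback form: capture the reported id, then chain to the caller's callback.
      return orig(...leadingArgs, (id: string) => {
        rec.onId(id);
        userCb(id);
      });
    }
    // Promise form: capture the id once it resolves; rejections propagate unchanged.
    return (orig(...leadingArgs) as Promise<string>).then((id) => {
      rec.onId(id);
      return id;
    });
  };
}
```

In `sw-hooks.ts`, the returned function would replace the original `create` through the same `as unknown` view, with the recorder incrementing `notificationCount`, saving `lastNotificationOptions`, and pushing to `notificationIds`. Counting in `onCall` rather than on resolution means the count reflects attempts, so a rejected create still increments it.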

**Verify:** `npx tsc --noEmit && npm run build && test "$(grep -rln __mokoshTest dist/ | wc -l)" -eq 0 && npm run build:test && test "$(grep -rln __mokoshTest dist-test/ | wc -l)" -ge 1 && npx vitest run --reporter=dot`

**Acceptance criteria:**

- `src/test-hooks/{types,sw-hooks,offscreen-hooks}.ts` exist with the contracts described.
- `src/background/index.ts` + `src/offscreen/recorder.ts` carry the gated dynamic import block; the offscreen recorder also carries the `setCurrentStream(stream)` call inside `startRecording`.
- `tests/uat/lib/test-hook-contract.d.ts` mirrors the type.
- `npm run build` exits 0; `grep -rln __mokoshTest dist/` → 0 matches.
- `npm run build:test` exits 0; `grep -rln __mokoshTest dist-test/` → ≥1 match.
- Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) GREEN.
- Tier-1 SW-bundle-import gate (`tests/background/sw-bundle-import.test.ts`) GREEN.
- Full vitest suite: 85 GREEN.
- `npx tsc --noEmit` exit 0.

**Done:** Hook surfaces live in the test bundle and are absent from the production bundle (the Tier-1 grep gate verifies this); SW + offscreen module init unchanged for production; baseline preserved.

Task 3 (Wave 2): Build harness scaffolding: `tests/uat/lib/{launch,extension,sw,offscreen,assertions,zip}.ts` + a `harness.test.ts` skeleton with assertion 0 wired and assertions 1-13 stubbed as failing.
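The `runAssertion` wrapper this task specifies for `tests/uat/lib/assertions.ts` can be sketched without Puppeteer, since it only needs the accumulating console buffers. This is a sketch under stated assumptions: the `ConsoleSinks` shape, the injectable `writeErr` sink, and the 20-line tail are hypothetical choices, and it dumps only lines logged since the current assertion started (a simplification of "since the last assertion").

```typescript
/** Accumulating console buffers; the harness would fill them from the SW + offscreen console listeners. */
interface ConsoleSinks {
  sw: string[];
  offscreen: string[];
}

async function runAssertion(
  name: string,
  fn: () => Promise<void>,
  sinks?: ConsoleSinks,
  writeErr: (line: string) => void = (line) => console.error(line),
  tail = 20,
): Promise<void> {
  // Remember where each buffer stood so only this assertion's logs are dumped.
  const swStart = sinks?.sw.length ?? 0;
  const offStart = sinks?.offscreen.length ?? 0;
  try {
    await fn();
  } catch (err) {
    writeErr(`FAIL ${name}: ${err instanceof Error ? err.message : String(err)}`);
    if (sinks !== undefined) {
      const swLines = sinks.sw.slice(swStart).slice(-tail);
      const offLines = sinks.offscreen.slice(offStart).slice(-tail);
      writeErr(`SW console (last ${swLines.length}):`);
      for (const line of swLines) writeErr(`  ${line}`);
      writeErr(`Offscreen console (last ${offLines.length}):`);
      for (const line of offLines) writeErr(`  ${line}`);
    }
    // Rethrow so the harness bails on the first failure.
    throw err;
  }
}
```

Keeping `writeErr` injectable lets the wrapper be exercised in isolation; the harness would simply rely on the `console.error` default.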

**Read first:**

- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §1 (Puppeteer extension API patterns)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §4 (target type quirk for offscreen)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §7 (Bug B dispatchEvent contract; BLOCKER)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints)
- src/test-hooks/types.ts (from Task 2)
- tests/uat/lib/test-hook-contract.d.ts (from Task 2)
- tests/background/sw-bundle-import.test.ts (execFile child-process pattern; relevant only to assertion 0, whose grep step walks dist/ with fs.readdir directly rather than through a spawned child)

**Files:** tests/uat/lib/launch.ts, tests/uat/lib/extension.ts, tests/uat/lib/sw.ts, tests/uat/lib/offscreen.ts, tests/uat/lib/assertions.ts, tests/uat/lib/zip.ts, tests/uat/harness.test.ts, tests/uat/README.md

**Behavior:**

- `tests/uat/lib/launch.ts` exports `launchHarnessBrowser(options?: HarnessOptions)`, which returns a Promise resolving to `{ browser, sw, ext, page, downloadsDir }`. It reads the `HEADLESS` env var (`'0'` = headful for debugging, anything else = headless) and wires the Chrome args per RESEARCH §1 + §9.
- `tests/uat/lib/extension.ts` exports `attachToSw`, `attachToOffscreen`, and `waitForOffscreen` per the RESEARCH §4 patterns. The offscreen attach uses the `background_page` target type + `.asPage()` (Pitfall 1).
- `tests/uat/lib/sw.ts` exports `getBadgeText(sw)`, `getPopup(sw)`, `getManifest(sw)`, `getIconSize(sw, path)`, `fireOnStartup(sw)`, `sendSyntheticRecordingError(sw, errorCode)`, `keepalivePing(sw)`, `getNotificationSnapshot(sw)`.
- `tests/uat/lib/offscreen.ts` exports `getDisplaySurface(offPage)` and `simulateUserStop(offPage)` (the dispatchEvent path per the RESEARCH §7 BLOCKER, with an inline comment block citing the BLOCKER reasoning so future readers don't refactor it to `track.stop()`).
- `tests/uat/lib/assertions.ts` exports `assertEqual(actual, expected, msg)`, `assertMatch(actual, regex, msg)`, `assertTrue(cond, msg)`, and a structured `runAssertion(name, fn)` wrapper that runs a single assertion, captures any SW/offscreen console logs since the last assertion, and dumps them to stderr on failure. Uses `node:assert/strict` per RESEARCH §4.
- `tests/uat/lib/zip.ts` exports `assertArchiveShape(zipBuf, expectedVersion)`: it opens the archive with jszip and asserts that `video/last_30sec.webm` is present and that `meta.json` carries `version === expectedVersion`. The meta.json shape follows Plan 01-07 (the existing archive contract; read it once at the start of the harness and pass it through).
- `tests/uat/harness.test.ts` is the single Node script (tsx-runnable). Top-to-bottom narrative:
  ```
  0. Pre-flight grep gate (filesystem readdir on dist/): assertion 0.
  1. launchHarnessBrowser → attachToSw → attachToOffscreen-when-ready.
  2. Assertion 1: SW bootstrap → setIdleMode (badge '', popup '', isRecording=false).
  3. Assertion 2: triggerExtensionAction → wait → badge 'REC' + popup === src/popup/index.html + isRecording=true.
  4. Assertion 3: offscreen track displaySurface === 'monitor'.
  5. Assertion 4: triggerExtensionAction (while recording) → popup opens, NO new offscreen target.
  6. Assertion 5: sendMessage SAVE_ARCHIVE → wait for download → check downloadsDir contains session_report_*.zip.
  7. Assertion 6 (BUG B): simulateUserStop → wait 300ms → badge '' + popup '' + isRecording=false + notificationCount delta = 0.
  8. Assertion 7 (ERROR path): sendSyntheticRecordingError('codec-unsupported') → badge 'ERR' + notificationCount delta = 1.
  9. Assertion 8 (BUG A + onStartup): fireOnStartup → notifications.create called once with iconUrl matching
     icons/icon48.png or icons/icon128.png (the badge_state_machine plan uses icon128, but the test asserts
     whichever icon the production code actually passes, per the lastNotificationOptions snapshot).
  10. Assertion 9: icon file sizes via sw.evaluate(fetch) ≥ floors (16: 200B, 48: 500B, 128: 1024B).
  11. Assertion 10: manifest has the 'notifications' permission + icons.16 + icons.48 + icons.128 declared.
  12. Assertion 11 (35s record): start a fresh recording, wait 35s, query the SW (runtime message → offscreen → segments count) → segments.length >= 3.
  13. Assertion 12 (ffprobe gate): trigger SAVE_ARCHIVE, extract video/last_30sec.webm, spawn ffprobe → exit 0.
  14. Assertion 13 (zip shape): assertArchiveShape on the latest session_report_*.zip.
  15. Final summary: console.log('UAT harness: 14/14 assertions passed'); exit 0.
  ```
  Assertions 1-13 are stubbed today as `runAssertion('N: title', async () => { throw new Error('NOT YET IMPLEMENTED — Task 5+ wires this'); });`, so the harness exits non-zero with a clear "N assertions failed" diagnostic. Assertion 0 (filesystem-only) is wired in this task.
- `tests/uat/README.md` documents:
  - How to run: `npm run test:uat` (build + harness).
  - Local-debug headful mode: `HEADLESS=0 npm run test:uat`.
  - Skipping the build (developer iteration): `SKIP_BUILD=1 npx tsx tests/uat/harness.test.ts` (the build lives in the npm-script wrapper; the harness itself can run against an existing `dist-test/`).
  - Locale gotcha: `--auto-select-desktop-capture-source="Entire screen"` works on en_US; other locales need the locale-equivalent string. Fallback to operator-pick + `KEEP_PROFILE=1` is documented as the Plan 01-09 fallback.
  - Dev-dep size: puppeteer pulls a ~150MB Chromium binary; CI must accept this. Production `npm install --omit=dev` skips it cleanly.
  - Xvfb is NOT required (per RESEARCH §3 empirical probes on Chrome 148).
  - Failure isolation choice: single browser, serial assertions, bail on first failure (RESEARCH §5 + open-question resolution 4).
- Running `npm run test:uat` exits NON-ZERO today (the 13 stubbed assertions all throw); the diagnostic clearly identifies which assertion failed AND why ("NOT YET IMPLEMENTED — Task 5+ wires this"). Assertion 0 (the grep gate) PASSES, confirming that the harness scaffolding is wired correctly and that the only failures are intentional stubs.

**Action:**

1. Create the `tests/uat/lib/` directory + all 6 helper files. Use absolute imports per project style. NO `as any`; type each helper's surface explicitly. Each helper file gets a top-of-file docstring per project style (extensive Google-style).
2. `launch.ts`: the implementation uses `puppeteer.launch({ enableExtensions: [absolutePath], headless: ..., args: [...] })`. Compute `absolutePath` via `path.resolve(__dirname, '../../../dist-test')` (`launch.ts` lives at `tests/uat/lib/launch.ts`, so `../../../` lands at the repo root). Use `fileURLToPath` + `import.meta.url` for the `__dirname` shim (the harness runs as ESM under tsx).
3. `extension.ts`: implementation per the RESEARCH §1 + §4 patterns. The offscreen attach uses `browser.waitForTarget(t => t.type() === 'background_page' && t.url().includes('offscreen'), { timeout: 5_000 })`. Once the target is found, `.asPage()` returns the Page.
4. `sw.ts`: each helper is one or two lines of `sw.evaluate(...)`. The `getNotificationSnapshot` helper returns a structured `{ count, lastOptions, ids }` to keep the harness's reasoning unified.
5. `offscreen.ts` `simulateUserStop`:
   ```typescript
   import type { Page } from 'puppeteer';

   export async function simulateUserStop(offPage: Page): Promise<void> {
     // RESEARCH §7 BLOCKER: DO NOT REFACTOR to track.stop().
     // track.stop() does NOT fire 'ended' per the W3C spec (verified by probe7);
     // dispatchEvent IS the only path that triggers our production
     // onUserStoppedSharing handler. A test that calls track.stop() would
     // silently pass while production reality fails: exactly the trap the
     // Bug B fix (commit b9eeeeb) addresses.
     await offPage.evaluate(() => {
       const stream = globalThis.__mokoshTest?.getCurrentStream?.();
       if (!stream) throw new Error('no current MediaStream — recording must be active');
       const track = stream.getVideoTracks()[0];
       if (!track) throw new Error('no video track in stream');
       track.dispatchEvent(new Event('ended'));
     });
   }
   ```
6. `assertions.ts`: `runAssertion(name, fn)` captures `console.log`/`console.error` from the harness's own process. For SW + offscreen console logs, it accepts an optional `consoleSinks` parameter: the harness wires SW.on('console', ...) + offPage.on('console', ...) listeners at launch and passes their accumulating buffers to `runAssertion`. On assertion failure, it dumps the buffers to stderr under structured "SW console (last N):" + "Offscreen console (last N):" preambles, then rethrows.
7. `zip.ts`: a jszip-based reader. `expectedVersion` comes from `chrome.runtime.getManifest().version` (queried once at the start of the harness via `sw.evaluate`). The assertion is exact equality.
8. `harness.test.ts`: the top-to-bottom narrative. Wrap the whole thing in a top-level `try/finally`; the `finally` always calls `browser.close()`. Assertions 1-13 are stubs that throw the "NOT YET IMPLEMENTED" Error. Assertion 0 is wired in this task:
   ```typescript
   await runAssertion('0: production bundle has no __mokoshTest leak', async () => {
     // Filesystem-only: does not require the browser.
     // `npm run test:uat` only runs `npm run build:test` first, and the
     // no-test-hooks-in-prod-bundle unit test already covers this in `npm test`.
     // We still rebuild the production bundle here for E2E robustness against a
     // unit-test pass that was recorded against a stale dist/.
     const { execFileSync } = await import('node:child_process');
     execFileSync('npm', ['run', 'build'], { stdio: 'inherit' });
     // __dirname comes from the fileURLToPath(import.meta.url) shim used in launch.ts.
     const distDir = path.resolve(__dirname, '../../dist');
     // grepRecursive: recursive readdir + readFile walk returning matching file paths.
     const matches = await grepRecursive(distDir, '__mokoshTest');
     assertEqual(matches.length, 0, 'production dist/ must not contain __mokoshTest');
   });
   ```
   NOTE: assertion 0 spawns `npm run build` from inside the harness, which costs ~10s. The unit test (Task 1) makes this somewhat redundant, but the unit test runs in the vitest pass while the harness runs separately: belt + suspenders. Alternative: skip the spawn when `process.env.SKIP_PROD_REBUILD === '1'` for developer iteration.
9. `README.md`: per the behavior list.
10. Run `npm run test:uat`. Expected output:
    - `npm run build:test` runs first (succeeds; emits dist-test/).
    - `tsx tests/uat/harness.test.ts` runs.
    - Assertion 0 PASSES (filesystem grep gate).
    - Assertions 1-13 all THROW "NOT YET IMPLEMENTED".
    - Exit code: non-zero.
    - Diagnostic line: "UAT harness: 1/14 assertions passed, 13 failed (first failure: Assertion 1)".
11. Run `npx tsc --noEmit` → exit 0 (all harness code type-clean against `@types/chrome` + puppeteer types + `tests/uat/lib/test-hook-contract.d.ts`).
12. Run `npx vitest run` (full suite) → 85 GREEN (no regression to unit tests). The harness is meant to live outside vitest's discovery; confirm that the vitest `include`/`exclude` config skips `tests/uat/`, since vitest's default include pattern matches `*.test.ts`.

Per project style: extensive docstrings; absolute imports; no `as any`; no `@ts-ignore`; named callbacks (the runAssertion lambdas are short enough to be acceptable as inline arrows). Use if-else chains over early returns where the assertion logic has multi-arm branching; guard-clause early returns are fine for null-checks per the established project exception.

**Verify:** `npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0) && npx vitest run --reporter=dot`

**Acceptance criteria:**

- All 6 helper files exist with the contracts described.
- `harness.test.ts` exists with assertion 0 wired (GREEN) + assertions 1-13 stubbed (RED).
- `README.md` documents the runtime + local-debug + CI semantics.
- `npm run test:uat` exits non-zero today; the diagnostic clearly identifies assertion 0 as PASS + assertions 1-13 as "NOT YET IMPLEMENTED".
- `npx tsc --noEmit` exit 0 across both the `src/` and `tests/` trees.
- Full vitest suite: 85 GREEN.
- No file under `src/` is modified by this task (the harness lives purely under `tests/`).

**Done:** Harness scaffolding live with assertion 0 wired GREEN; assertions 1-13 staged as RED stubs for Tasks 4-7; baseline preserved.

Task 4 (Wave 3 — bundle 1/4): Wire assertions 1, 2, 3, 4 (SW bootstrap + toolbar onClicked + displaySurface + popup-during-recording).

**Read first:**

- tests/uat/harness.test.ts (skeleton from Task 3)
- tests/uat/lib/{launch,extension,sw,offscreen}.ts (helpers from Task 3)
- src/background/index.ts lines 75-108 (setIdleMode/setRecordingMode state machine; the production code these assertions verify)
- src/background/index.ts lines 411-415 (setRecordingMode call site inside startVideoCapture)
- src/background/index.ts lines 844-858 (chrome.action.onClicked listener registration)
- src/offscreen/recorder.ts lines 241-247 (getDisplayMedia call + mediaStream assignment)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §1 (triggerExtensionAction + the popup-vs-onClicked MV3 contract)
- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md (the must-haves these assertions verify)

**Files:** tests/uat/harness.test.ts, tests/uat/lib/sw.ts

**Behavior:**

- Assertion 1 (SW bootstrap): after `launchHarnessBrowser` + SW attach, query `getBadgeText` (empty), `getPopup` (empty), and `getIsRecording` (false). `getIsRecording(sw)` is a new helper evaluated in the SW context; see action step 1 for how it reaches the production `isRecording`, which is a module-level `let` and therefore not a property of `globalThis`. PASSES today against the current bundle.
- Assertion 2 (onClicked-idle): `page.triggerExtensionAction(ext)` → `await waitFor(() => getBadgeText(sw), (text) => text === 'REC', 5_000)` (poll up to 5s; the picker auto-selects the screen, so getDisplayMedia resolves fast).
Then assert popup === 'src/popup/index.html' + getIsRecording === true. PASSES today.
- Assertion 3 (displaySurface): after assertion 2 leaves recording active, attach to offscreen via `waitForOffscreen` + `attachToOffscreen`. Then `offPage.evaluate(() => __mokoshTest.getCurrentStream().getVideoTracks()[0].getSettings().displaySurface)` === 'monitor'. PASSES today (per Plan 01-09 D-15-display-surface; the post-grant validation in recorder.ts enforces monitor-only).
- Assertion 4 (click-during-recording): record the current offscreen target count, then call `page.triggerExtensionAction(ext)` again. Assert: popup state unchanged (still 'src/popup/index.html'); NO new offscreen target spawned (count unchanged). With the popup set, the toolbar click opens the popup, which the harness can verify via `browser.targets().find(t => t.url().includes('popup/index.html'))` (the popup target briefly appears as a `page` type). PASSES today.
- All 4 assertions are wired; each carries an inline RED-on-regression demonstration step in its action block: the executor must locally demonstrate that the assertion CAN catch a regression before marking it GREEN.

**Action:**

1. Wire assertion 1: replace the "NOT YET IMPLEMENTED" stub with the real logic per behavior. Add a `getIsRecording(sw)` helper to `tests/uat/lib/sw.ts`:
   ```typescript
   import type { WebWorker } from 'puppeteer';

   export async function getIsRecording(sw: WebWorker): Promise<boolean | undefined> {
     // Typed view instead of `as any`; yields undefined if no such global property exists.
     return await sw.evaluate(
       () => (globalThis as unknown as { isRecording?: boolean }).isRecording,
     );
   }
   ```
   NOTE: the typed `as unknown as { isRecording?: boolean }` view keeps this site free of `as any`. The production code declares `isRecording` as a module-level `let` in `src/background/index.ts:36`, which is NOT exposed on globalThis directly: per ECMAScript, neither a top-level `let` nor a module-scoped binding becomes a property of the global object, so expect this read to return `undefined`. In that case, expose `isRecording` via a getter on `__mokoshTest` in `sw-hooks.ts` (because `sw-hooks.ts` cannot see another module's bindings, `index.ts` has to hand the getter a reader under the same MODE gate). Document the choice + rationale inline.
   (RESEARCH §6 reports, from probe2, that a SW module-level `let` IS readable as `globalThis.isRecording` in MV3 SW context. That conflicts with ECMAScript binding semantics, so if the executor sees `undefined` returned, fall back to exposing it via an `__mokoshTest.isRecording` getter from sw-hooks.ts and document the SW-isolation finding.)
2. Wire assertion 2: implementation per behavior. After `triggerExtensionAction`, poll `getBadgeText` for up to 5 seconds; the badge transition is async (offscreen creation + getDisplayMedia + post-grant validation + setRecordingMode all happen in sequence). Use a polling helper from `assertions.ts` or inline:
   ```typescript
   async function waitFor<T>(
     probe: () => Promise<T>,
     predicate: (v: T) => boolean,
     timeoutMs: number,
   ): Promise<T> {
     const start = Date.now();
     while (Date.now() - start < timeoutMs) {
       const v = await probe();
       if (predicate(v)) return v;
       await new Promise((r) => setTimeout(r, 100));
     }
     throw new Error(`waitFor timeout ${timeoutMs}ms`);
   }
   ```
   Use this in assertions 2, 3, and 4.
3. Wire assertion 3: per behavior. The `waitForOffscreen` helper already handles the target wait + asPage; attach once after assertion 2 sets recording=true, then `offPage.evaluate` the displaySurface read.
4. Wire assertion 4: per behavior. Count the `browser.targets()` whose URL contains the offscreen path BEFORE the second click, then AFTER; assert equality. Also assert that the popup state is unchanged.
5. RED-on-regression demonstration:
   - For assertion 2: locally add an early `return;` at the top of the production `chrome.action.onClicked` listener and rerun `npm run build:test`; assertion 2 should FAIL (badge stays empty). Registering an extra no-op listener ahead of the production one would not work, because Chrome invokes every registered listener. Revert the hack; assertion 2 PASSES.
   - For assertion 3: locally alter `recorder.ts` to call `getDisplayMedia({ video: true, audio: false })` (without the displaySurface constraint) and rebuild; assertion 3 should FAIL (displaySurface defaults to 'browser' OR is undefined, depending on Chrome behavior). Caveat: `--auto-select-desktop-capture-source="Entire screen"` may still pick the screen regardless of the constraint; if the assertion stays GREEN, the demo has not exercised a regression. Revert; PASSES.
   - The executor commits ONLY the working assertions; the RED demos are local-only verifications. Document each RED demo's outcome in the commit message body.
6. Run `npm run test:uat`: assertions 0+1+2+3+4 PASS; assertions 5-13 are still stubbed as RED. Exit non-zero. Diagnostic: "5/14 passed, 9 failed".
7. Run `npx tsc --noEmit` → exit 0.
8. Run the full vitest suite → 85 GREEN.

**Verify:** `npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0)`

**Acceptance criteria:**

- Assertions 0, 1, 2, 3, 4 all PASS in `npm run test:uat`.
- Assertions 5-13 still throw "NOT YET IMPLEMENTED".
- `npm run test:uat` exits non-zero (because 9 stubs remain).
- Diagnostic shows 5/14 passed.
- `npx tsc --noEmit` exit 0.
- Full vitest suite: 85 GREEN.
- For each wired assertion, the commit message body cites the RED-demonstration outcome.

**Done:** First 4 functional assertions live and GREEN; the harness proves it can verify toolbar + displaySurface + popup state via CDP.

Task 5 (Wave 3 — bundle 2/4): Wire assertions 5, 6, 7 (SAVE_ARCHIVE download + Bug B user-stopped routing + ERROR-path).

**Read first:**

- tests/uat/harness.test.ts (assertions 1-4 GREEN from Task 4)
- tests/uat/lib/{sw,offscreen,zip}.ts (helpers; especially simulateUserStop's BLOCKER-citing comment)
- src/background/index.ts lines 725-778 (RECORDING_ERROR handler; Bug B conditional routing)
- src/offscreen/recorder.ts lines 451-480 (onUserStoppedSharing; the handler simulateUserStop must trigger)
- .planning/debug/resolved/01-09-recovery-flow.md (Bug B debug record; the exact contract assertion 6 verifies)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §7 (BLOCKER analysis: track.dispatchEvent is the ONLY valid path)

**Files:** tests/uat/harness.test.ts, tests/uat/lib/sw.ts

**Behavior:**

- Assertion 5 (SAVE_ARCHIVE download): with recording active from prior assertions, `sw.evaluate(() => chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}))` triggers the save flow.
The download lands in `downloadsDir` (configured at launch via `--user-data-dir` + per-page download behavior, OR via `page._client().send('Browser.setDownloadBehavior', ...)`; RESEARCH didn't deep-dive this, so the executor researches the cleanest path). Poll downloadsDir for a `*session_report*.zip` for up to 15s. PASSES today.
- Assertion 6 (BUG B): snapshot `notificationCount` via `getNotificationSnapshot(sw)`. Then `simulateUserStop(offPage)`. Wait 300ms (offscreen handler → runtime message → SW handler → state transition is async). Assert: badge text === '' (NOT 'ERR'); popup === '' (NOT 'src/popup/index.html'); isRecording === false; notificationCount delta === 0 (no recovery notification fired for a deliberate stop). PASSES today against b9eeeeb.
- Assertion 7 (ERROR path preserved): start a fresh recording (since assertion 6 stopped it). Snapshot notificationCount. Then `sw.evaluate(() => chrome.runtime.sendMessage({type: 'RECORDING_ERROR', error: 'codec-unsupported'}))`. Wait 200ms. Assert: badge text === 'ERR'; notificationCount delta === 1; the last notification id starts with 'mokosh-recovery-'. PASSES today.
- Caveat for assertions 5 and 7: `chrome.runtime.sendMessage` does not deliver a message back to the context that sent it, so a message sent from inside `sw.evaluate` will not reach the SW's own `onMessage` listener. Send these messages from another extension context instead (for example via `offPage.evaluate`).
- Each assertion carries the RED-on-regression demonstration; assertion 6's RED demo is the canonical "rewinding b9eeeeb" cycle from the orchestrator brief.

**Action:**

1. Wire assertion 5. Investigate Puppeteer's download path config: `browser.defaultBrowserContext().overridePermissions(...)` for downloads OR CDP `Browser.setDownloadBehavior` with `behavior: 'allow'` + `downloadPath: downloadsDir`. The harness creates `downloadsDir` in the launch helper (e.g. `os.tmpdir() + '/mokosh-uat-downloads-' + Date.now()`). After `sendMessage({type:'SAVE_ARCHIVE'})`, poll the dir for ~15s for any `session_report_*.zip`. Save the path for assertion 13. PASS = the file appears with non-zero size.
2. Wire assertion 6 per behavior. Use the existing `simulateUserStop` helper (with its BLOCKER comment intact).
   The 300ms wait is the propagation budget; if assertions intermittently flake here, bump it to 500ms. The offscreen handler is synchronous into sendMessage and the SW handler is synchronous into setIdleMode, so 300ms is generous but not extravagant.
3. Wire assertion 7 per behavior. Read `lastNotificationOptions.title` (or similar) to verify the "Mokosh stopped" recovery copy AND check `notificationIds[notificationIds.length-1].startsWith('mokosh-recovery-')`.
4. RED-on-regression demonstrations (recorded in the commit body):
   - **Assertion 6 RED demo (THE canonical Bug B regression check)**: locally reverse the b9eeeeb change to the RECORDING_ERROR handler, restoring the unconditional setErrorMode (e.g. `git diff b9eeeeb~1 b9eeeeb -- src/background/index.ts | git apply -R`); do NOT commit. Rebuild the test bundle. Run `npm run test:uat`. Assertion 6 MUST FAIL with the diagnostic "expected badge text '' but got 'ERR'". Revert (`git checkout -- src/background/index.ts`). Rebuild. Re-run. Assertion 6 PASSES. This proves the harness assertion CAN catch a Bug B regression. **Document this end-to-end demo in the commit message body.**
   - Assertion 5 RED demo: locally comment out the `chrome.downloads.download(...)` call in `src/background/index.ts:saveArchive` and rebuild; assertion 5 should FAIL (timeout waiting for the zip). Revert; PASSES.
   - Assertion 7 RED demo: locally short-circuit the RECORDING_ERROR case so it returns without calling setErrorMode for codec-unsupported (e.g. an early return on case entry); assertion 7 should FAIL. Revert; PASSES.
5. Run `npm run test:uat`: 8/14 PASS, 6 stubs remain. Exit non-zero.
6. Run `npx tsc --noEmit` → exit 0. Vitest 85 GREEN.

**Verify:** `npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0)`

**Acceptance criteria:**

- Assertions 0-7 all PASS.
- Assertions 8-13 still stubbed RED.
- `npm run test:uat` exits non-zero; diagnostic 8/14 passed.
- Bug B RED-on-regression demo documented in the commit body (mandatory).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
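Assertion 5's "poll the downloads dir for ~15s" step can be sketched as a small polling helper. `waitForFile` and its parameters are hypothetical. The sketch assumes Chrome writes in-progress downloads under a temporary name (such as a `.crdownload` suffix) and renames the file on completion, so an anchored pattern like `/^session_report_.*\.zip$/` only matches finished archives. Avoid the `g` flag on the pattern, because `RegExp.test` with `g` is stateful across calls.

```typescript
import * as fs from 'node:fs/promises';
import * as path from 'node:path';

/**
 * Hypothetical helper: polls `dir` until a non-empty file whose name matches
 * `pattern` appears, then returns its full path; rejects after `timeoutMs`.
 */
async function waitForFile(
  dir: string,
  pattern: RegExp,
  timeoutMs: number,
  pollMs = 100,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    for (const name of await fs.readdir(dir)) {
      if (pattern.test(name)) {
        const fullPath = path.join(dir, name);
        const { size } = await fs.stat(fullPath);
        // PASS condition from the plan: the file exists AND has non-zero size.
        if (size > 0) {
          return fullPath;
        }
      }
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`waitForFile: no non-empty file matching ${pattern} in ${dir} within ${timeoutMs}ms`);
}
```

In assertion 5, a call such as `waitForFile(downloadsDir, /^session_report_.*\.zip$/, 15_000)` would return the path to keep for assertion 13.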
Bug B harness assertion live AND demonstrably catches regression; SAVE_ARCHIVE + ERROR-path coverage live; bug-class root cause (state-machine routing) now CI-callable.

Task 6 (Wave 3 — bundle 3/4): Wire assertions 8, 9, 10 (Bug A onStartup notification + icon file sizes + manifest shape).

- tests/uat/harness.test.ts (assertions 1-7 GREEN from Tasks 4-5)
- src/background/index.ts lines 860-881 (chrome.runtime.onStartup handler — the path Bug A's recovery notification was failing on before a881bf0)
- manifest.json (icons declared + notifications permission)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints)
- icons/icon{16,48,128}.png (verify presence + size — the floors are 200/500/1024 bytes from the orchestrator brief)

tests/uat/harness.test.ts

- Assertion 8 (BUG A + onStartup): snapshot notificationCount. Then `sw.evaluate(() => globalThis.__mokoshTest!.handlers.onStartup?.())`. Wait 100ms (synchronous handler, but allow microtask drain). Assert: notificationCount delta === 1; `lastNotificationOptions.iconUrl` matches `/icons\/icon(?:128|48)\.png$/` (the production code uses NOTIFICATION_ICON_PATH = 'icons/icon128.png'); `lastNotificationOptions.title === 'Mokosh ready'`; `notificationIds[notificationIds.length-1].startsWith('mokosh-startup-')`. The PASS condition implies chrome.notifications.create's promise resolved cleanly — if Bug A regressed (icon below floor), Chrome's imageUtil throws and the create call REJECTS, so notificationCount would NOT increment. PASSES today against a881bf0.
- Assertion 9 (icon files present + sized): for each of (16, 200), (48, 500), (128, 1024), `sw.evaluate` a fetch of `chrome.runtime.getURL('icons/icon{N}.png')` and read `content-length`. Assert >= floor. PASSES today.
- Assertion 10 (manifest shape): `getManifest(sw)`. Assert: `permissions.includes('notifications')`; `icons['16']`, `icons['48']`, `icons['128']` all defined and equal to expected paths. PASSES today.
- Each assertion's RED-on-regression demo documented in commit body.

1. Wire assertion 8 per behavior. The `onStartup` handler in production carries inline try/catch around the `chrome.notifications.create` call (per src/background/index.ts:868-877); the hook's notificationCount wrapper increments regardless of create's resolution path. To verify Bug A specifically, ALSO assert that the iconUrl in lastNotificationOptions points to a file that resolves to >= 1024 bytes (cross-check with assertion 9's floor). This catches the Bug A regression EVEN IF a future change wraps the create call in a swallowing try/catch.
2. Wire assertion 9 per behavior. The fetch via sw.evaluate is the cleanest path — Chrome serves extension files from `chrome-extension:///...` and fetch with a `chrome-extension://` URL works in SW context.
3. Wire assertion 10 per behavior. Direct `chrome.runtime.getManifest()` read.
4. RED-on-regression demos (commit body):
   - **Assertion 8 RED demo (Bug A canonical)**: locally `: > icons/icon128.png` (truncate to 0 bytes). Rebuild test bundle. Run `npm run test:uat`. Assertion 8 should FAIL — Chrome's imageUtil rejects the create call (or the wrapper's lastNotificationOptions snapshot has wrong shape). Restore (`git checkout -- icons/icon128.png`). Rebuild. Re-run. Assertion 8 PASSES. **Document in commit body.**
   - Assertion 9 RED demo: same truncate; rebuild; assertion 9 should FAIL with "content-length 0 < floor 1024". Restore; PASSES.
   - Assertion 10 RED demo: locally remove "notifications" from manifest.json permissions and rebuild test bundle; assertion 10 should FAIL. Restore; PASSES.
5. Run `npm run test:uat`: 11/14 PASS, 3 stubs remain (11, 12, 13).
6. `npx tsc --noEmit` exit 0; vitest 85 GREEN.

`npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0)`

- Assertions 0-10 all PASS.
- Assertions 11-13 still stubbed RED.
- `npm run test:uat` exits non-zero; diagnostic 11/14 passed.
- Bug A RED-on-regression demo documented in commit body (mandatory).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.

Bug A harness assertion live AND demonstrably catches regression; icon + manifest coverage live; both Phase-1-escapee bug classes (Bug A + Bug B) now CI-callable.

Task 7 (Wave 3 — bundle 4/4): Wire assertions 11, 12, 13 (35s buffer continuity + ffprobe gate + zip shape) — closes the 13-assertion charter.

- tests/uat/harness.test.ts (assertions 1-10 GREEN from Tasks 4-6)
- tests/uat/lib/zip.ts (the jszip-based archive shape helper)
- tests/offscreen/webm-playback.test.ts (the existing ffprobe pattern — FFPROBE_BIN constant, skip-gate helper)
- src/background/webm-remux.ts (Plan 01-08's remux helper — what the harness's ffprobe gate validates)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints for 11, 12, 13)

tests/uat/harness.test.ts, tests/uat/lib/zip.ts

- Assertion 11 (35s buffer continuity): start a fresh recording. Wait 35 seconds (with keepalive pings every 20s per RESEARCH §2). Query the offscreen segments count via offPage.evaluate (the offscreen recorder maintains a `segments` ring; expose it via a `__mokoshTest.getSegmentCount()` getter — ADD this to offscreen-hooks.ts in this task). Assert: segmentCount >= 3 (per D-13: 10s segments × MAX_SEGMENTS=3 = 30s window). PASSES today.
- Assertion 12 (ffprobe gate): trigger SAVE_ARCHIVE (reusing the assertion 5 helper). Extract `video/last_30sec.webm` from the produced zip via jszip. Write to a tmpfile. Spawn `ffprobe -v error -f matroska -i ` via execFileSync. Assert exit code 0. (Skip-gate this assertion with a clear "SKIPPED: ffprobe binary not available" diagnostic if `which ffprobe` fails — matches the existing webm-playback.test.ts pattern.)
- Assertion 13 (zip shape): jszip parse the same zip. Assert: `video/last_30sec.webm` entry exists + has non-zero size.
  Assert: `meta.json` entry exists + parsed JSON has `version === ` (read via sw.evaluate at the start of the harness or this assertion).
- The 35-second wait pushes the harness runtime past 60s. Add keepalive ping infrastructure (one ping every 20s during the wait) to avoid SW eviction per RESEARCH §2 / Pitfall 5.

1. ADD a `__mokoshTest.getSegmentCount()` getter to `src/test-hooks/offscreen-hooks.ts`. The offscreen recorder has a module-level `segments` array (from D-13 restart-segments); expose a function-level setter alongside `setCurrentStream`:

   ```typescript
   // src/test-hooks/offscreen-hooks.ts
   let currentStream: MediaStream | null = null;
   let segmentCountGetter: () => number = () => 0;

   export function setCurrentStream(s: MediaStream | null) {
     currentStream = s;
   }

   export function setSegmentCountGetter(g: () => number) {
     segmentCountGetter = g;
   }

   globalThis.__mokoshTest = {
     // ...
     getCurrentStream: () => currentStream,
     getSegmentCount: () => segmentCountGetter(),
   };
   ```

   Update `src/test-hooks/types.ts` to add `getSegmentCount?: () => number` to MokoshTestSurface. In `src/offscreen/recorder.ts`, after the existing `setCurrentStream(stream)` call, add (gated):

   ```typescript
   if (import.meta.env.MODE === 'test') {
     const hooks = await import('../test-hooks/offscreen-hooks');
     hooks.setSegmentCountGetter(() => segments.length);
   }
   ```

   (Where `segments` is the module-level array. If the variable name differs, adapt. Read the file to confirm; commonly named `videoSegments` or `segments`.)
2. Wire assertion 11 per behavior. The 35s wait uses `await new Promise(r => setTimeout(r, 35_000))` with intermittent `await keepalivePing(sw)` every 20s. Use `setInterval` or a polling loop; document the keepalive purpose per RESEARCH §2.
3. Wire assertion 12 per behavior. Reuse the `FFPROBE_BIN` constant pattern from `tests/offscreen/webm-playback.test.ts`. Skip-gate: `if (!existsSync(FFPROBE_BIN)) { console.warn('Assertion 12: ffprobe not available — SKIPPED'); return; }`.
   The skip-gate is acceptable for assertion 12 because the unit-level tests (Plan 01-08's `tests/background/webm-remux.test.ts`) also have ffprobe gates that cover the same contract — the harness's ffprobe assertion is end-to-end validation, not the primary gate.
4. Wire assertion 13. Pass `expectedVersion = await sw.evaluate(() => chrome.runtime.getManifest().version)` into `assertArchiveShape`.
5. Update Tier-1 grep gate test (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) to ALSO assert ZERO `getSegmentCount` in dist/ (new hook surface added in this task — confirm gate stays GREEN).
6. RED-on-regression demos (commit body):
   - Assertion 11 RED demo: locally hack `SEGMENT_DURATION_MS = 30_000` in recorder.ts so 35s yields only 1 segment; rebuild; assertion 11 should FAIL. Revert; PASSES.
   - Assertion 12 RED demo: locally inject a corrupted byte into the remux output (e.g. zero the EBML magic in webm-remux.ts before return); rebuild; assertion 12 should FAIL (ffprobe error). Revert; PASSES.
   - Assertion 13 RED demo: locally drop `version` from the `meta.json` writer in saveArchive; rebuild; assertion 13 should FAIL. Revert; PASSES.
7. Run `npm run test:uat`: ALL 14 assertions PASS. Exit 0. Diagnostic: "UAT harness: 14/14 assertions passed".
8. `npx tsc --noEmit` → exit 0. `npx vitest run` → 85 GREEN.
9. **Verify Tier-1 grep gate updates:** `npm run build && grep -rln 'getSegmentCount' dist/` → 0 matches.

`npx tsc --noEmit && npm run test:uat && npx vitest run --reporter=dot && test "$(grep -rln getSegmentCount dist/ 2>/dev/null | wc -l | tr -d ' ')" = "0"`

- All 14 assertions PASS in `npm run test:uat`; exit 0.
- `npm run test:uat` total runtime ~50-90s (dominated by the 35s assertion 11 wait + the harness setup ~10s + assertion 0's `npm run build` ~10s; skip with `SKIP_PROD_REBUILD=1` for ~70s).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
- Production bundle (`npm run build`): `grep -rln __mokoshTest dist/` → 0; `grep -rln simulateUserStop dist/` → 0; `grep -rln getSegmentCount dist/` → 0. Tier-1 gate remains GREEN.
- Each new assertion's RED-on-regression demo documented in commit body.

13-assertion charter complete; harness exits 0 against current Plan 01-09 bundle; Phase 1 functional contract fully CI-callable.

Task 8 (Wave 4): Amend Plan 01-09 Task 5 operator checkpoint to redirect functional steps to `npm run test:uat`; update STATE.md decisions; close Plan 01-09 via this plan's harness PASS.

- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md lines 519-549 (the operator checkpoint that gets amended)
- .planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md (current closure state)
- .planning/STATE.md (Decisions section + Phase 1 Closure Notes)
- tests/uat/harness.test.ts (the harness that NOW closes the functional contract)

.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md, .planning/STATE.md

- `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` gets an AMENDMENT block at the END of the file (does NOT rewrite the original Task 5 — preserves provenance per project convention from D-A1..D-A6 cascade pattern):
  ```
  ---
  ## Amendment (Phase 01-stabilize-video-pipeline, 2026-05-17) — Plan 01-11 harness retires operator functional steps

  Plan 01-11 (Puppeteer UAT harness) lands a CI-callable replacement for the
  functional verification work in this plan's Task 5. The operator's role is
  reduced to:

  - **Step 1 (build):** unchanged — `npm run build` must exit 0.
  - **Steps 2-13:** REDIRECTED — replaced by `npm run test:uat` exit 0. The
    Puppeteer harness implements 14 assertions (assertion 0 = production-
    bundle hook-leak grep; assertions 1-13 = the original Task 5
    functional checks).
  - **Step 14 (brand/design — implicit in steps 4, 5, 6 of original task):**
    RETAINED for operator. The harness verifies displaySurface === 'monitor'
    + notification fires; it does NOT verify the human-readable copy is
    aesthetically correct OR that the badge color reads cleanly against the
    operator's OS theme. Operator confirms.
  - **Step 15 (genuine error UX):** REDIRECTED — assertion 7 verifies the
    ERROR-path behavior.

  **New closure gate:** Plan 01-09 closes when `npm run test:uat` exits 0
  AND operator confirms step 14 (brand/design). The harness's 14/14 PASS
  against current bundle (verified by this plan's Task 7) supplies the
  first half today.
  ```
- `.planning/STATE.md` Decisions section gains a new entry (preserves the existing log; appends rather than rewriting):
  ```
  - [Phase 01-11]: Operator role retirement landed via Puppeteer UAT harness. 14 assertions cover Plan 01-08/01-09 functional contract; operator retained only for brand/design step. `npm run test:uat` = the new CI gate for any Phase 1 SW/offscreen/manifest change. Tier-1 grep gate `tests/background/no-test-hooks-in-prod-bundle.test.ts` enforces zero `__mokoshTest` / `simulateUserStop` / `getSegmentCount` in production `dist/`.
  ```
- This task does NOT modify Plan 01-09's status fields, frontmatter, or original Task 5 body. The amendment is appended after the original `<output>` block (mirroring the CONTEXT.md amendment-append pattern from 2026-05-16).
- Operator (in the closing checkpoint below) confirms brand/design step 14 manually and types "approved" — at which point Plan 01-09 + Plan 01-11 close together.
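Task 8's "append, never rewrite" rule can be checked mechanically before committing. This is a minimal sketch. It assumes both versions of each file are available as strings, for example the pre-edit text from `git show HEAD:.planning/STATE.md`. `insertedText` and `isAppendOnlyEdit` are hypothetical helpers. An edit passes only when the old text survives intact around one contiguous insertion and the marker string is new. This covers both the end-of-file amendment in 01-09-PLAN.md and the mid-file entry in the STATE.md Decisions list.

```typescript
/**
 * Returns the text inserted into `before` to produce `after`, or null when
 * `after` is not a single pure insertion (some original text was changed).
 * The returned slice always has length after.length - before.length.
 */
function insertedText(before: string, after: string): string | null {
  const max = Math.min(before.length, after.length);
  let prefix = 0;
  while (prefix < max && before[prefix] === after[prefix]) prefix++;
  // Cap the suffix so it never overlaps the common prefix in either string.
  let suffix = 0;
  while (
    suffix < max - prefix &&
    before[before.length - 1 - suffix] === after[after.length - 1 - suffix]
  ) {
    suffix++;
  }
  // Pure insertion: every character of `before` is in the prefix or the suffix.
  if (prefix + suffix !== before.length) return null;
  return after.slice(prefix, after.length - suffix);
}

/** True when `after` is `before` plus one insertion that introduces `marker`. */
function isAppendOnlyEdit(before: string, after: string, marker: string): boolean {
  return (
    insertedText(before, after) !== null &&
    !before.includes(marker) &&
    after.includes(marker)
  );
}
```

The marker strings would be the same ones the verify command greps for, such as `Plan 01-11 harness retires operator functional steps` and `Operator role retirement landed via Puppeteer UAT harness`.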
1. Read `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` to confirm the file structure ends with the `` block (line ~596 based on the file's current shape).
2. Append the amendment block per the behavior description, AFTER the closing `` tag. Use the same horizontal-rule + ## heading + AMENDED-BY metadata convention from CONTEXT.md amendments. Cite the harness path (`tests/uat/harness.test.ts`) and the npm script (`npm run test:uat`).
3. Read `.planning/STATE.md` Decisions section (lines 72-109).
4. Append the new entry to the Decisions list (after the most recent `[Phase 01-07-deferred-to-5]` entry per the convention). Do NOT modify any existing entry.
5. Verify both edits are content-only (no frontmatter changes; no status flips — those happen in the closing checkpoint).
6. Run `npx tsc --noEmit` → exit 0 (paranoia — neither edit touches TS, but baseline).
7. Run `npm run test:uat` → exit 0 (final smoke before the closing checkpoint).
8. Run `npx vitest run` → 85 GREEN.

`npx tsc --noEmit && grep -q 'Plan 01-11 harness retires operator functional steps' .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md && grep -q 'Operator role retirement landed via Puppeteer UAT harness' .planning/STATE.md && npm run test:uat && npx vitest run --reporter=dot`

- `01-09-PLAN.md` ends with the appended amendment block (no edits to the original Task 5 body).
- `STATE.md` Decisions section carries the new entry as the last item (no edits to prior entries).
- `npm run test:uat` exits 0 (14/14 GREEN).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.

Plan 01-09 functional contract redirected to harness; STATE.md decisions log updated; ready for closing checkpoint.

Task 9 (Wave 4): Operator confirms `npm run test:uat` exits 0 against current bundle AND confirms brand/design step 14 (Plan 01-09 Task 5 retained step) — closes Plan 01-09 + Plan 01-11.

(operator-driven; no files modified by this checkpoint)

See below — operator-driven empirical check.
The executor must NOT bypass this checkpoint by stubbing harness output.

`echo "checkpoint:human-verify — see how-to-verify section; resume signal is the gate"`

Operator types "approved" after running the how-to-verify steps. See for the exact gate.

Tasks 1-8 landed: Puppeteer + tsx installed, vite.test.config.ts produces dist-test/, gated test hooks in src/test-hooks/ ship in test bundle and NOT in production bundle (Tier-1 grep gate verifies), Puppeteer harness at tests/uat/harness.test.ts implements 14 assertions, all 14 GREEN against current Plan 01-09 bundle (b9eeeeb Bug B fix + a881bf0 Bug A fix both verified by Bug B + Bug A canonical RED-on-regression demos). Plan 01-09 Task 5 redirected to `npm run test:uat` for functional steps. This checkpoint validates the harness end-to-end against real Chrome AND captures operator's brand/design acceptance for Plan 01-09's retained step 14.

1. **Pre-flight cleanliness:** run `git status` — confirm working tree clean. Any uncommitted local hacks (RED-demo patches) MUST be reverted BEFORE this step.
2. **Build production:** `npm run build` (must exit 0; this is Plan 01-09 Task 5 step 1).
3. **Build test bundle:** `npm run build:test` (must exit 0).
4. **Run harness:** `npm run test:uat` (must exit 0; runtime ~70-90s). Final output line MUST be exactly `UAT harness: 14/14 assertions passed`. If exit non-zero, paste the structured diagnostic + harness console dump + relevant SW/offscreen console logs; the plan iterates (likely a real bug surfaced).
5. **Re-run for stability:** `npm run test:uat` a second time. Same outcome. (Eliminates first-run flakiness from cold Chrome / cold dist-test cache.)
6. **Tier-1 hook-leak verification:** `grep -rln __mokoshTest dist/` must return 0 matches. Same for `simulateUserStop`, `getSegmentCount`, `setCurrentStream`, `setSegmentCountGetter`. If ANY match, the gate failed silently — STOP and triage.
7. **Local-debug mode smoke:** `HEADLESS=0 npm run test:uat`.
   Watch the real Chrome window: see the toolbar icon, see the picker auto-accept, see the badge transitions. Same exit 0 outcome. (This is the operator's chance to spot any visual oddity the automated assertions miss.)
8. **Brand/design acceptance (Plan 01-09 Task 5 step 14 — retained for operator):**
   - (a) Badge color readability against your OS theme: red OFF, green REC, yellow ERR should each contrast clearly with the toolbar background. If any is hard to see in light AND dark mode, document for Phase 5 hardening (do NOT block closure on this — file as a deferred item).
   - (b) Notification copy: "Mokosh ready — Click here to start recording your session." reads naturally in en_US. Russian operators may want a localized variant — document for Phase 5 (do NOT block closure on this).
   - (c) Picker UX: confirm Chrome's screen-share picker still surfaces (in headful mode) at the expected moment + with the correct monitor-only options.
9. **If steps 4, 5, 6 all PASS:** Plan 01-09 + Plan 01-11 both close. Type "approved" with any brand/design notes appended.
10. **If step 4 OR 5 FAIL:** paste the failure diagnostic. Likely culprits: locale-specific picker string mismatch (RESEARCH §9 — operator's Chrome may need a different `--auto-select-desktop-capture-source` value); race window in assertion 6 / 11 (try bumping the wait in the relevant assertion).
11. **If step 6 FAILS:** STOP. The Tier-1 hook-leak gate failing means the production bundle contains test code — this is a security regression (T-1-11-01). Do NOT proceed to closure. Open a debug session.
12. **If step 7 surfaces a real UX issue (not just a deferral):** document as a P1/P2 item in STATE.md or a phase-5 backlog file; closure can still proceed IF the issue is non-blocking.

Type "approved" after step 9 lands (all gates GREEN + brand/design accepted).
If steps 10/11/12 hit, paste the failure mode + operator's Chrome version + locale + OS theme; the plan iterates on the failing piece (likely Task 4-7 for assertion-specific issues; Task 1-2 for hook-leak issues; a fresh debug session for novel failures).
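Step 6 checks five identifiers one grep at a time. Below is a Node sketch of the same check. It assumes the production build lives in `dist/`. `findHookLeaks` is a hypothetical helper, not the existing Tier-1 gate test. Like `grep -rl`, it scans every file's raw bytes, including non-JS assets.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hook identifiers that must never reach the production bundle (step 6).
const FORBIDDEN_HOOK_NAMES = [
  "__mokoshTest",
  "simulateUserStop",
  "getSegmentCount",
  "setCurrentStream",
  "setSegmentCountGetter",
];

/** Recursively scans `dir`; returns one "path: identifier" entry per hit. */
function findHookLeaks(dir: string): string[] {
  const leaks: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      leaks.push(...findHookLeaks(fullPath));
    } else if (entry.isFile()) {
      // Search raw bytes (Buffer#includes), matching grep's byte-level scan.
      const bytes = readFileSync(fullPath);
      for (const name of FORBIDDEN_HOOK_NAMES) {
        if (bytes.includes(name)) leaks.push(`${fullPath}: ${name}`);
      }
    }
  }
  return leaks;
}
```

In a gate script, `process.exit(findHookLeaks("dist").length === 0 ? 0 : 1)` would give the same pass/fail signal as step 6. The listed entries then serve as the starting point for triage.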

<threat_model>

## Trust Boundaries

| Boundary | Description |
|---|---|
| Puppeteer driver ↔ Chrome SW (via CDP) | The harness pipes CDP commands to the SW context via `sw.evaluate`. Trust boundary is unchanged at runtime (the SW only accepts the harness's commands because the harness runs inside the Puppeteer-launched Chrome process); but the harness CAN invoke any production SW code path via `sw.evaluate`, so a malicious or buggy harness could in principle exfiltrate buffered video. Mitigation: harness code is in-tree, code-reviewed via the same pipeline as production. |
| Test hook surface (`__mokoshTest`) in production bundle | NEW: if tree-shaking fails or the MODE guard is misconfigured, the hook surface ships to production — exposing `simulateUserStop`, `getCurrentStream`, captured handler refs to any page that can eval against the SW. THIS IS THE SECURITY-CRITICAL THREAT. Mitigation: Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) enforces zero `__mokoshTest` in `dist/`; runs as part of `npm test` so any CI pipeline picks it up. |
| dev-dependency Chromium binary | NEW: Puppeteer downloads ~150 MB Chromium binary at `npm install` time. Supply-chain compromise of the Chrome download endpoint would inject malicious code into developer machines. Mitigation: `package-lock.json` integrity check (Puppeteer pins the Chromium download hash via its `@puppeteer/browsers` dependency). Out of scope: separate SCA for Puppeteer itself. |
| `--auto-select-desktop-capture-source` flag in CI | NEW: in a CI container, the flag auto-accepts the "Entire screen" source — which is whatever Xvfb (or modern headless surface) presents. If a CI runner is shared with sensitive workloads, the 35-second recording assertion captures whatever is on screen during that window. Mitigation: document that CI MUST run the harness in an isolated container with no concurrent workload; local-dev runs capture the operator's real screen for 35s during assertion 11, documented in README.md. |
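The last boundary is created at browser launch. This is a minimal sketch of the launch switches. It assumes the unpacked test extension is loaded from `dist-test/` through Chromium's `--disable-extensions-except` / `--load-extension` switches. It also assumes `captureSource` is the picker's English "Entire screen" label, which RESEARCH §9 notes is locale-dependent. `buildChromeArgs` and `HarnessLaunchOptions` are hypothetical names, not existing harness code.

```typescript
/** Hypothetical options for launching the harness browser. */
interface HarnessLaunchOptions {
  /** Absolute path to the unpacked test build, e.g. dist-test/. */
  extensionDir: string;
  /** Title of the picker entry to auto-accept; locale-dependent. */
  captureSource: string;
}

/** Builds the Chromium command-line switches passed at launch. */
function buildChromeArgs(opts: HarnessLaunchOptions): string[] {
  return [
    // Load only the unpacked Mokosh test build.
    `--disable-extensions-except=${opts.extensionDir}`,
    `--load-extension=${opts.extensionDir}`,
    // Auto-accept the screen-share picker for the named source. In CI this
    // records whatever the container's display shows, hence the isolation rule.
    `--auto-select-desktop-capture-source=${opts.captureSource}`,
  ];
}
```

These switches would be passed as `puppeteer.launch({ headless: process.env.HEADLESS !== "0", args })`, which also honours the `HEADLESS=0` local-debug mode. Each switch is a single argv element, so the space in "Entire screen" needs no shell quoting.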

## STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|---|---|---|---|---|
| T-1-11-01 | Elevation of Privilege | `__mokoshTest` surface leaking into production `dist/` would expose `simulateUserStop`, captured `chrome.*` handler refs, and stream getter to any code with access to the SW context | mitigate | Two layers: (a) gated dynamic import per RESEARCH §6 (the literal `'test' !== 'production'` comparison is a static dead branch that Vite/Rollup tree-shake); (b) Tier-1 unit gate `tests/background/no-test-hooks-in-prod-bundle.test.ts` greps the BUILT artifact for `__mokoshTest` / `simulateUserStop` / `getSegmentCount` / `setCurrentStream` / `setSegmentCountGetter` — ZERO matches required for GREEN. Belt + suspenders catches both tree-shake regression AND new hook-name additions. |
| T-1-11-02 | Information Disclosure | 35-second recording assertion captures whatever is on the operator's screen during local-dev runs | accept | Operator-facing — local-dev runs are by definition under operator control; the recording is consumed only by ffprobe + jszip inside the harness process and is deleted with the temp downloads dir at process exit. CI runs document the isolated-container requirement in README.md. |
| T-1-11-03 | Tampering | Puppeteer downloads Chromium binary at `npm install`; supply-chain compromise of the download endpoint | accept | `package-lock.json` pins resolved hashes via Puppeteer's `@puppeteer/browsers` machinery. Same risk surface as any npm dependency. Phase 5 SCA work (out of scope here) covers periodic re-verification. |
| T-1-11-04 | Denial of Service | A pathological assertion 11 (35s wait) ties up CI runner time; combined with 14 sequential assertions, total runtime ~90s ties up a runner slot | accept | 90s is well within typical CI per-job budgets. Local-dev runs use `SKIP_PROD_REBUILD=1` to drop assertion 0's `npm run build` cost (~10s). Out of scope: parallelizing assertions (would require multi-browser instances, defeating the failure-isolation choice). |
| T-1-11-05 | Repudiation | The harness asserts the absence of recovery notification (Bug B path), but the assertion is a count-delta check — a notification fired BEFORE the snapshot would be invisible | mitigate | Each assertion snapshots `notificationCount` IMMEDIATELY before the trigger event AND immediately after the propagation wait. The delta is checked, not the absolute count. The `notificationIds` array is also asserted on for ID-prefix membership — even if delta counting were fooled by some interleaving, the absence of a `'mokosh-recovery-'` prefix in the post-snapshot ids array catches the same regression. |
| T-1-11-06 | Spoofing | Harness reads `__mokoshTest.handlers.onStartup` and invokes it; a hostile production change could swap in a no-op handler that registers AFTER the hook captures the real handler | mitigate | The hook monkey-patches `addListener` AT THE TOP OF THE MODULE (before any production `addListener` calls). Any later `addListener` invocation still goes through the patched function and would OVERWRITE `handlers.onStartup`, not bypass. A malicious bypass would require directly calling `chrome.runtime.onStartup.addListener.call(...)` via a saved bound reference — none exist in the production tree (verified by grep `addListener.call` |
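The T-1-11-05 mitigation reduces to a pure comparison of two snapshots. This is a sketch. It assumes the hook exposes an append-only `notificationIds` array alongside `notificationCount`, as assertions 7 and 8 read them. `NotificationSnapshot` and `checkNotificationDelta` are hypothetical names.

```typescript
/** Assumed shape of the SW notification observables read via sw.evaluate. */
interface NotificationSnapshot {
  notificationCount: number;
  notificationIds: string[]; // assumed append-only, oldest first
}

/**
 * Compares snapshots taken immediately before the trigger and after the
 * propagation wait. Returns a diagnostic string on failure, or null on PASS.
 */
function checkNotificationDelta(
  before: NotificationSnapshot,
  after: NotificationSnapshot,
  expectedDelta: number,
  idPrefix = "mokosh-recovery-",
): string | null {
  const delta = after.notificationCount - before.notificationCount;
  if (delta !== expectedDelta) {
    return `expected notification delta ${expectedDelta} but got ${delta}`;
  }
  // Second, independent check: only ids created after the first snapshot count.
  const newIds = after.notificationIds.slice(before.notificationIds.length);
  const prefixed = newIds.some((id) => id.startsWith(idPrefix));
  if (expectedDelta === 0 && prefixed) {
    return `unexpected '${idPrefix}' notification: [${newIds.join(", ")}]`;
  }
  if (expectedDelta > 0 && !prefixed) {
    return `no new '${idPrefix}' notification among [${newIds.join(", ")}]`;
  }
  return null;
}
```

With these semantics, assertion 6 corresponds to `expectedDelta = 0` and assertion 7 to `expectedDelta = 1`, both using the default recovery prefix. Assertion 8 would pass `'mokosh-startup-'`.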
</threat_model>
- `npm run test:uat` exits 0 against the current Plan 01-09 bundle; final line is exactly `UAT harness: 14/14 assertions passed`.
- `npm run build` exit 0; `grep -rln __mokoshTest dist/` returns 0 matches; `grep -rln simulateUserStop dist/` returns 0 matches; `grep -rln getSegmentCount dist/` returns 0 matches.
- `npm run build:test` exit 0; `dist-test/` populated; `grep -rln __mokoshTest dist-test/` returns ≥1 match.
- `npx vitest run` exit 0; 85 GREEN across all test files (83 baseline + 2 from Task 1's Tier-1 grep gate).
- `npx tsc --noEmit` exit 0 across `src/` + `tests/`.
- Tier-1 SW-bundle-import gate (`tests/background/sw-bundle-import.test.ts`) GREEN — verifies the gated dynamic import does not break production module init.
- Tier-1 hook-leak gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) GREEN — verifies the production bundle is hook-free.
- Bug B canonical RED-on-regression demo documented in Task 5's commit body (locally reverting b9eeeeb makes assertion 6 RED; re-applying makes GREEN).
- Bug A canonical RED-on-regression demo documented in Task 6's commit body (locally truncating icons/icon128.png makes assertions 8 + 9 RED; restoring makes GREEN).
- Plan 01-09 Task 5 amended at the end of its PLAN.md (no rewrite of the original body); STATE.md Decisions log carries the new Plan 01-11 entry.
- Operator confirms brand/design step 14 + types "approved" in Task 9.
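The final-line and exit-code contract above can be expressed in a few lines. This sketch assumes the harness collects one boolean per assertion, in index order 0..13. `summarize` is a hypothetical name, not existing harness code.

```typescript
/**
 * Reduces per-assertion results to the harness's final diagnostic line
 * and process exit code. Exit 0 only when every assertion passed.
 */
function summarize(passed: boolean[]): { line: string; exitCode: number; failed: number[] } {
  const failed = passed.map((ok, index) => (ok ? -1 : index)).filter((index) => index >= 0);
  const passCount = passed.length - failed.length;
  return {
    line: `UAT harness: ${passCount}/${passed.length} assertions passed`,
    // Any RED assertion, including an unwired stub, forces a non-zero exit.
    exitCode: failed.length === 0 ? 0 : 1,
    failed,
  };
}
```

Counting an unwired stub as RED is what makes the intermediate tasks report 8/14 and 11/14 with a non-zero exit. The exact `UAT harness: 14/14 assertions passed` line is reached only once all assertions pass.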

<success_criteria> Plan 01-11 is complete when:

  1. Two-bundle separation lives. npm run build produces hook-free dist/; npm run build:test produces hook-enabled dist-test/. The Tier-1 grep gate enforces the production bundle's hook absence.
  2. All 14 harness assertions pass against the current Plan 01-09 bundle. npm run test:uat exits 0; final line is UAT harness: 14/14 assertions passed.
  3. Both Phase-1-escapee bugs are now CI-callable. Assertion 6 (Bug B state-machine routing) and Assertion 8 (Bug A icon-promoted notification) each have a RED-on-regression demo documented in their respective task's commit body, proving the harness assertion CAN catch a regression — not just pass under current conditions.
  4. Operator role retired for functional verification. Plan 01-09 Task 5 steps 4-13 + 15 redirect to npm run test:uat; only step 1 (build) + step 14 (brand/design) retained. The amendment block in 01-09-PLAN.md preserves provenance (no rewrite of the original task).
  5. Existing 83 vitest tests remain GREEN. Plus the 2 new Tier-1 gate tests in this plan = 85 total. No regression.
  6. npx tsc --noEmit exit 0. All harness code + hook code type-clean.
  7. npm run build exit 0; npm run build:test exit 0. Both production and test bundles emit cleanly.
  8. Operator confirms Task 9 brand/design acceptance + types "approved". Plan 01-09 + Plan 01-11 close together. </success_criteria>
After completion, create `.planning/phases/01-stabilize-video-pipeline/01-11-SUMMARY.md` per the standard template. Cite:
- The 14 assertions landed GREEN (0: prod-bundle hook-leak grep gate; 1-13: functional contract from orchestrator brief).
- Both RED-on-regression canonical demos documented in commit bodies (Bug B for assertion 6; Bug A for assertion 8).
- The two-bundle separation (`dist/` vs `dist-test/`) verified by Tier-1 grep gate.
- npm script additions (`build:test` + `test:uat`); dev-dep additions (puppeteer + tsx) with resolved versions.
- Hook surface inventory (`__mokoshTest`: handlers, notification observables, getCurrentStream, getSegmentCount) + the gated dynamic import sites in `src/background/index.ts` + `src/offscreen/recorder.ts`.
- Plan 01-09 amendment block landed (Task 5 functional steps redirected; brand/design step retained).
- STATE.md decision log updated with the operator-retirement decision.
- Open questions resolved (5 from RESEARCH) + their resolutions; any new open questions surfaced during execution.
- Bundle-size delta (`dist/` before vs after; should be near-zero since gated dynamic imports tree-shake cleanly).
- Total harness runtime ranges observed (cold: ~90s including build steps; warm with SKIP_PROD_REBUILD=1: ~70s; the 35-second assertion 11 wait dominates).