---
amended_at: "2026-05-18"
amendment: "A"
amendment_summary: "Wave 1 T2 + Wave 2 T3 + Wave 3 method guidance superseded. MV3 SW blocks dynamic import (verified empirically); SW-side test hooks DROPPED. Replaced with extension-internal harness page architecture (proven by c647f61 prototype). See 01-11-PLAN-AMENDMENT-A.md."
phase: 01-stabilize-video-pipeline
plan: 11
type: tdd
wave: 4
depends_on:
- 01-08
- 01-09
files_modified:
- package.json
- package-lock.json
- vite.test.config.ts
- tsconfig.json
- src/background/index.ts
- src/offscreen/recorder.ts
- src/test-hooks/sw-hooks.ts
- src/test-hooks/offscreen-hooks.ts
- src/test-hooks/types.ts
- tests/uat/harness.test.ts
- tests/uat/lib/launch.ts
- tests/uat/lib/extension.ts
- tests/uat/lib/sw.ts
- tests/uat/lib/offscreen.ts
- tests/uat/lib/assertions.ts
- tests/uat/lib/zip.ts
- tests/uat/lib/test-hook-contract.d.ts
- tests/uat/README.md
- tests/background/no-test-hooks-in-prod-bundle.test.ts
- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md
autonomous: false
requirements:
- REQ-uat-harness-puppeteer
- REQ-uat-bug-A-coverage
- REQ-uat-bug-B-coverage
- REQ-uat-two-bundle
- REQ-uat-ci-friendly
- REQ-uat-13-assertions
- REQ-video-ring-buffer
tags:
- puppeteer
- uat
- harness
- e2e
- mv3-extension
- getDisplayMedia
- bug-B
- bug-A
- tier-1
- two-bundle
must_haves:
truths:
- "`npm run build:test` produces `dist-test/` with `__mokoshTest` hook surfaces injected into SW + offscreen contexts; `npm run build` produces `dist/` with ZERO occurrences of `__mokoshTest` (grep-verifiable)."
- "`npm run test:uat` orchestrates `build:test` + the Puppeteer harness end-to-end; exits 0 only when ALL 14 assertions pass (13 from the brief + assertion 0 = production-bundle hook-leak grep gate)."
- "Bug B harness assertion (track.dispatchEvent('ended') → badge OFF + popup '' + isRecording=false + NO recovery notification) demonstrably catches a regression: rewinding the b9eeeeb conditional routing locally turns this assertion RED; reapplying turns it GREEN."
- "Bug A harness assertion (onStartup → chrome.notifications.create resolves cleanly with the manifest's icon48.png iconUrl) demonstrably catches a regression: stubbing the icon48 file to <100 bytes turns this assertion RED; restoring turns it GREEN."
- "Harness runs in `--headless=new` for CI portability; local-debug mode supported via `HEADLESS=0`; no Xvfb required (per RESEARCH §3 empirical probes against Chrome 148)."
- "Test hooks live ONLY behind `import.meta.env.MODE === 'test'` guarded dynamic imports; Vite tree-shakes them from the production bundle; the no-test-hooks-in-prod-bundle.test.ts unit gate enforces this in the existing vitest suite (Tier-1 alongside sw-bundle-import.test.ts)."
- "Existing 83 vitest tests remain GREEN after this plan lands (no regression to the unit test bed)."
- "Plan 01-09 functional contract closes by harness PASS: its Task 5 operator-checkpoint amendment redirects to `npm run test:uat` for steps 4-13 + 15; operator retains only step 1 (build) + step 14 (brand/design check)."
artifacts:
- path: "vite.test.config.ts"
provides: "Vite config extending the production config; sets `mode: 'test'`, `build.outDir: 'dist-test'`, `build.emptyOutDir: true`."
contains: "dist-test"
- path: "src/test-hooks/types.ts"
provides: "Shared TS type declaring `globalThis.__mokoshTest` shape (handlers, getCurrentStream, simulateUserStop, notificationCount, lastNotificationOptions). Single source of truth for SW + offscreen + harness."
contains: "__mokoshTest"
- path: "src/test-hooks/sw-hooks.ts"
provides: "SW-side test hook: captures chrome.action.onClicked / chrome.runtime.onStartup / chrome.notifications.onClicked handler refs; wraps chrome.notifications.create to record notificationCount + lastNotificationOptions. Imported dynamically from src/background/index.ts under `import.meta.env.MODE === 'test'` guard."
contains: "handlers"
- path: "src/test-hooks/offscreen-hooks.ts"
provides: "Offscreen-side test hook: exposes the current MediaStream via getter; provides simulateUserStop wrapping `track.dispatchEvent(new Event('ended'))` per RESEARCH §7. Imported dynamically from src/offscreen/recorder.ts under `import.meta.env.MODE === 'test'` guard."
contains: "simulateUserStop"
- path: "src/background/index.ts"
provides: "Adds a single `if (import.meta.env.MODE === 'test') { await import('../test-hooks/sw-hooks'); }` block at top-of-module so the hook registration runs BEFORE any production addListener calls (capturing every handler)."
contains: "import.meta.env.MODE"
- path: "src/offscreen/recorder.ts"
provides: "Adds an `if (import.meta.env.MODE === 'test') { __sharedRefs.setMediaStreamGetter(() => mediaStream); }` block (the import itself is gated; the getter wires the runtime mediaStream reference into the hook surface). Same guard pattern as SW."
contains: "import.meta.env.MODE"
- path: "tests/uat/harness.test.ts"
provides: "Single Node script (run under tsx) implementing all 14 assertions sequentially. ~400 LoC. Top-to-bottom narrative — launch, click, assert, simulate Bug B, simulate Bug A, etc. Returns exit 0 on full pass, non-zero on any failure with structured diagnostic dump."
min_lines: 350
- path: "tests/uat/lib/launch.ts"
provides: "puppeteer.launch wrapper: builds args, sets enableExtensions to absolute dist-test path, chooses headless mode per CI env, configures downloads dir, exports a single launchHarnessBrowser() function."
- path: "tests/uat/lib/extension.ts"
provides: "Helpers to resolve the extension id, attach to the SW target, attach to the offscreen target (background_page type per RESEARCH §4 / Pitfall 1)."
- path: "tests/uat/lib/sw.ts"
provides: "SW context helpers: getBadgeText, getPopup, getManifestIcons, fireOnStartup (via captured handler ref), sendSyntheticRecordingError, keepalivePing."
- path: "tests/uat/lib/offscreen.ts"
provides: "Offscreen context helpers: waitForOffscreenTarget, getDisplaySurface, simulateUserStop (the dispatchEvent('ended') path per RESEARCH §7 BLOCKER finding)."
- path: "tests/uat/lib/assertions.ts"
provides: "Per-assertion helpers (assertEqual + structured diagnostic on failure); a runWithStartupDiagnostics wrapper that captures SW + offscreen console logs and dumps them on assertion failure for triage."
- path: "tests/uat/lib/zip.ts"
provides: "jszip-based archive shape assertions; reads downloaded `session_report_*.zip`, asserts `video/last_30sec.webm` present + `meta.json` carries `version === chrome.runtime.getManifest().version` (extension-side version read passed in)."
- path: "tests/uat/lib/test-hook-contract.d.ts"
provides: "Mirror of src/test-hooks/types.ts in TS-declaration form for the harness side; documents the wire contract between hook injector and harness consumer."
- path: "tests/uat/README.md"
provides: "How to run: `npm run test:uat`; local-debug headful mode via `HEADLESS=0`; CI semantics; troubleshooting (locale-specific picker string, Xvfb fallback if a future Chrome regresses headless, dev-dependency Chromium binary size note)."
- path: "tests/background/no-test-hooks-in-prod-bundle.test.ts"
provides: "Tier-1 unit-level grep gate (cousin of sw-bundle-import.test.ts): runs `npm run build` then asserts ZERO occurrences of `__mokoshTest` and ZERO occurrences of `simulateUserStop` in any file under `dist/`. RED today (the test runs before this plan lands its hook gating); GREEN after Task 1 verifies the gate AND the hook gating is correct."
- path: "package.json"
provides: "Adds `puppeteer` ^25.0.2 + `tsx` ^4 to devDependencies; adds two npm scripts: `build:test` (`tsc && vite build --mode test --config vite.test.config.ts`) and `test:uat` (`npm run build:test && tsx tests/uat/harness.test.ts`)."
contains: "test:uat"
- path: "tsconfig.json"
provides: "Includes `src/test-hooks/**/*` in compilation surface (so tsc validates the hook code). NO change to emit (vite handles bundling)."
- path: ".planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md"
provides: "AMENDMENT block at the end of the file: redirects Plan 01-09 Task 5 operator-checkpoint steps 4-13 + 15 to `npm run test:uat` (this plan's harness). Operator retains step 1 (build) + step 14 (brand/design accept) only. Plan 01-09 closes when `npm run test:uat` exits 0 AND operator confirms brand/design step 14."
contains: "Plan 01-11 amendment"
key_links:
- from: "tests/uat/harness.test.ts"
to: "tests/uat/lib/launch.ts:launchHarnessBrowser"
via: "import"
pattern: "import.*from.*lib/launch"
- from: "tests/uat/lib/launch.ts"
to: "puppeteer.launch"
via: "enableExtensions + headless + autoSelect flag"
pattern: "enableExtensions"
- from: "src/background/index.ts"
to: "src/test-hooks/sw-hooks.ts"
via: "guarded dynamic import"
pattern: "import\\.meta\\.env\\.MODE === ['\"]test['\"]"
- from: "src/offscreen/recorder.ts"
to: "src/test-hooks/offscreen-hooks.ts"
via: "guarded dynamic import + setMediaStreamGetter wire"
pattern: "import\\.meta\\.env\\.MODE === ['\"]test['\"]"
- from: "tests/uat/lib/offscreen.ts:simulateUserStop"
to: "track.dispatchEvent(new Event('ended'))"
via: "evaluate-in-offscreen-page on __mokoshTest.getCurrentStream().getVideoTracks()[0]"
pattern: "dispatchEvent\\(new Event\\(['\"]ended['\"]"
- from: "tests/background/no-test-hooks-in-prod-bundle.test.ts"
to: "dist/ artifact tree"
via: "post-build grep for __mokoshTest + simulateUserStop"
pattern: "grep.*__mokoshTest.*dist"
---
## Scope Sanity Note
**5 waves (0-4), 8 tasks, 18 file artifacts.** This sits at the upper end of the "split signal" threshold, but consolidating is the right call:
1. The test infrastructure (Wave 0), the hook gating (Wave 1), the harness scaffolding (Wave 2), and the 14 assertions (Wave 3) are tightly coupled at the contract level — splitting them into separate plans would force the harness contract (the `__mokoshTest` shape) to be re-derived in each plan's frontmatter `must_haves`, multiplying the duplication tax.
2. Per RESEARCH §6, the two-bundle gate (`__mokoshTest` ABSENT in production) is the security-critical mitigation for shipping test hooks. That gate MUST be wired in the same plan that adds the hooks; splitting would create a window where the hooks exist but the gate doesn't.
3. Wave 4 (closure) is a single checkpoint task — bundling it with Wave 3 wouldn't change context cost meaningfully, and separating it keeps the operator-checkpoint scope visible in the wave structure.
4. Context budget: Wave 0 + Wave 1 + Wave 2 ~30%; Wave 3 ~35%; Wave 4 ~5% (checkpoint). Total ~70%. Above the 50% target — but the 14 assertions are deterministic and template-shaped, so per-assertion authoring cost is sub-linear once Wave 2 lands.
**If a future revision DOES force a split,** natural cut line: Plan 01-11A = Waves 0+1+2 (infrastructure + first 4 assertions as smoke); Plan 01-11B = Waves 3+4 (remaining 10 assertions + closure). This split incurs the contract-duplication tax and is NOT recommended absent a context-cost regression.
Build a Puppeteer-driven Node UAT harness that retires the operator-as-assertion-library role. Plan 01-09's Task 5 took 4-6 hours of operator empirical UAT cycles (Bug A icons + Bug B state routing both escaped vitest unit coverage); every "visual" check in that task has a CDP-callable equivalent. This plan automates them.
Three coordinated changes:
1. **Two-bundle separation** via `vite.test.config.ts` extending the production config with `mode: 'test'` + `outDir: 'dist-test'`. Production builds stay hook-free.
2. **Test hooks** in `src/test-hooks/` consumed via guarded dynamic imports from SW + offscreen. The dynamic-import-inside-MODE-guard pattern (RESEARCH §6) lets Vite tree-shake the hook MODULES entirely from production, with a Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) verifying the absence.
3. **Puppeteer harness** at `tests/uat/harness.test.ts` (plus a `lib/` helper split following MetaMask's POM shape per RESEARCH §5) implementing 14 assertions: assertion 0 (production-bundle hook-leak grep gate) + assertions 1-13 from the orchestrator brief. Bug B uses `track.dispatchEvent(new Event('ended'))` per RESEARCH §7 BLOCKER — NOT `track.stop()` which silently invalidates the assertion.
Operator role retirement: Plan 01-09's Task 5 is amended to redirect steps 4-13 + 15 to `npm run test:uat`. Operator retains only step 1 (build verification) + step 14 (brand/design acceptance). All functional gates move to CI-callable harness.
Output:
- `vite.test.config.ts` — production config extension with `mode: 'test'` + `outDir: 'dist-test'`.
- `src/test-hooks/{sw-hooks,offscreen-hooks,types}.ts` — gated hook modules.
- `src/background/index.ts` + `src/offscreen/recorder.ts`: one gated dynamic-import block each, plus a gated `setCurrentStream` wire in the offscreen recorder.
- `tests/uat/harness.test.ts` + `tests/uat/lib/*.ts` + `tests/uat/README.md` — harness + helpers.
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` — Tier-1 unit-level hook-leak gate.
- `package.json` — `puppeteer`, `tsx` devDeps + `build:test`, `test:uat` scripts.
- `tsconfig.json` — includes `src/test-hooks/**/*` for type-checking.
- `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` — amendment block redirecting Task 5 functional steps to `npm run test:uat`.
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/REQUIREMENTS.md
@.planning/phases/01-stabilize-video-pipeline/01-CONTEXT.md
@.planning/phases/01-stabilize-video-pipeline/01-08-PLAN.md
@.planning/phases/01-stabilize-video-pipeline/01-08-SUMMARY.md
@.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md
@.planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md
@.planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md
@.planning/debug/resolved/01-09-recovery-flow.md
@src/background/index.ts
@src/offscreen/recorder.ts
@manifest.json
@vite.config.ts
@tsconfig.json
@package.json
@tests/background/sw-bundle-import.test.ts
### Puppeteer 25.0.2 extension API surface (RESEARCH §1, empirically verified)
```typescript
import puppeteer, { Browser, Page, Target } from 'puppeteer';

const browser: Browser = await puppeteer.launch({
  pipe: true,
  enableExtensions: ['/abs/path/to/dist-test'], // string[] or true
  headless: process.env.HEADLESS !== '0', // default headless=true; local debug HEADLESS=0
  args: [
    '--no-sandbox',
    '--auto-select-desktop-capture-source=Entire screen', // RESEARCH §9: locale-specific
    // DO NOT add --use-fake-ui-for-media-stream (per RESEARCH §9 Pitfall, conflicts with auto-select)
  ],
});

const extensions = await browser.extensions(); // Map keyed by extension id
const first = [...extensions][0];
if (first === undefined) throw new Error('dist-test extension did not load');
const [extId, ext] = first;

const swTarget = await browser.waitForTarget(
  (t: Target) => t.type() === 'service_worker',
  { timeout: 10_000 },
);
const sw = await swTarget.worker(); // WebWorker: has .evaluate()
if (sw === null) throw new Error('service worker target has no worker handle');

const page: Page = await browser.newPage();
await page.goto('about:blank');
await page.triggerExtensionAction(ext); // simulates toolbar click (NEEDS popup === '')

// Offscreen page (RESEARCH §4 / Pitfall 1): target type is 'background_page', NOT 'page'.
// The offscreen document is created asynchronously after the click, so wait for
// its target instead of scanning browser.targets() once.
const offTarget = await browser.waitForTarget(
  (t: Target) => t.type() === 'background_page' && t.url().includes('offscreen'),
  { timeout: 10_000 },
);
const offPage = await offTarget.asPage(); // NOT .page(); only .asPage() works
```
### Chrome SW state surface (read via sw.evaluate)
```typescript
// Read badge text
const badge = await sw.evaluate(() => chrome.action.getBadgeText({}));
// Read popup
const popup = await sw.evaluate(() => chrome.action.getPopup({}));
// Read manifest
const manifest = await sw.evaluate(() => chrome.runtime.getManifest());
// manifest.icons === { '16': 'icons/icon16.png', '48': '...', '128': '...' }
// manifest.permissions includes 'notifications', etc.

// Synthesize RECORDING_ERROR (no hook needed; goes through the SW onMessage handler).
// Send it from the offscreen page, not from the SW itself: runtime.sendMessage does
// not deliver a message back to the sending context, so an SW-side send never
// reaches the SW's own onMessage listener.
await offPage.evaluate(() =>
  chrome.runtime.sendMessage({ type: 'RECORDING_ERROR', error: 'codec-unsupported' }),
);

// Invoke onStartup via captured handler ref (needs hook; see sw-hooks.ts)
await sw.evaluate(() => globalThis.__mokoshTest!.handlers.onStartup?.());

// Fetch an extension file and check its size. Count the body bytes instead of
// trusting the Content-Length header, which a response is not required to carry.
const iconSize = await sw.evaluate(async () => {
  const r = await fetch(chrome.runtime.getURL('icons/icon48.png'));
  return r.ok ? (await r.arrayBuffer()).byteLength : -1;
});
```
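The Bug A regression (stubbing `icon48.png` to under 100 bytes) can be caught with a pure check on the fetched bytes. The sketch below assumes the harness pulls the bytes out of the SW as a plain number array, because `evaluate` results must be serializable, and rebuilds a `Uint8Array` on the Node side. `checkIconBytes` is a hypothetical helper. The PNG-signature test is an extra guard beyond the size threshold in the brief.

```typescript
/** Every PNG file starts with this fixed 8-byte signature. */
const PNG_SIGNATURE: readonly number[] = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

export interface IconCheck {
  ok: boolean;
  reason: string | null; // human-readable diagnostic when ok === false
}

export function checkIconBytes(bytes: Uint8Array, minBytes = 100): IconCheck {
  if (bytes.length < minBytes) {
    return { ok: false, reason: `icon is ${bytes.length} bytes, expected >= ${minBytes}` };
  }
  if (!PNG_SIGNATURE.every((byte, i) => bytes[i] === byte)) {
    return { ok: false, reason: 'missing PNG signature' };
  }
  return { ok: true, reason: null };
}
```

On the harness side, the bytes could come from `sw.evaluate(async () => Array.from(new Uint8Array(await (await fetch(chrome.runtime.getURL('icons/icon48.png'))).arrayBuffer())))`. The result is then passed to `checkIconBytes(Uint8Array.from(bytes))`.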
### Offscreen surface (read via offPage.evaluate)
```typescript
// Read displaySurface — RESEARCH §11 Req 3
const ds = await offPage.evaluate(() =>
globalThis.__mokoshTest!.getCurrentStream!()?.getVideoTracks()[0]?.getSettings().displaySurface ?? null,
);
// Simulate user-stopped — RESEARCH §7 BLOCKER. MUST be dispatchEvent, NOT track.stop().
await offPage.evaluate(() => {
const stream = globalThis.__mokoshTest!.getCurrentStream!();
if (stream === null) throw new Error('no current stream — recording must be active');
const track = stream.getVideoTracks()[0];
track.dispatchEvent(new Event('ended'));
// Track still readyState 'live' after dispatch; production handler will
// call stream.getTracks().forEach(t => t.stop()) which DOES release the
// capture (just doesn't refire 'ended' on the same track — spec).
});
```
### Test hook contract (NEW — src/test-hooks/types.ts)
```typescript
// src/test-hooks/types.ts
// SINGLE SOURCE OF TRUTH for the __mokoshTest wire shape.
// Imported by sw-hooks.ts (registers) and offscreen-hooks.ts (registers);
// mirrored by tests/uat/lib/test-hook-contract.d.ts (consumes).
export interface MokoshTestSurface {
  // SW handler refs (captured by sw-hooks.ts monkey-patching addListener)
  handlers: {
    onClicked: ((tab: chrome.tabs.Tab) => void | Promise<void>) | null;
    onStartup: (() => void | Promise<void>) | null;
    notificationOnClicked: ((notificationId: string) => void | Promise<void>) | null;
  };
  // SW notification observability
  notificationCount: number;
  lastNotificationOptions: chrome.notifications.NotificationOptions | null;
  notificationIds: ReadonlyArray<string>;
  // Offscreen getCurrentStream: undefined in SW context, defined in offscreen.
  // Declared (as optional) in the one shared type so the harness works against a
  // single shape; at runtime, a null return is the "not currently recording" signal.
  getCurrentStream?: () => MediaStream | null;
}

declare global {
  // eslint-disable-next-line no-var
  var __mokoshTest: MokoshTestSurface | undefined;
}

export {};
```
```
### Production hook-gate pattern (src/background/index.ts top-of-module)
```typescript
// AT THE VERY TOP of src/background/index.ts, BEFORE any addListener calls.
// import.meta.env.MODE is statically replaced at build time by Vite (RESEARCH §6);
// the entire `if` block + its dynamic import are tree-shaken from production bundles
// because the literal === comparison resolves to `false` and Rollup deletes the
// unreachable branch.
if (import.meta.env.MODE === 'test') {
await import('../test-hooks/sw-hooks');
}
```
**CRITICAL ORDERING:** the hook import MUST run BEFORE any production `addListener` calls so the monkey-patches catch the handlers as they register. Top-of-module placement satisfies this.
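The ordering requirement follows from how the capture works: a patched `addListener` only sees handlers that are registered after the patch is installed. Below is a self-contained sketch of that monkey-patch against a minimal stand-in for `chrome.events.Event`. `EventLike` and `captureListeners` are hypothetical names. The real sw-hooks.ts would apply the same pattern to `chrome.action.onClicked`, `chrome.runtime.onStartup`, and `chrome.notifications.onClicked`.

```typescript
/** Minimal stand-in for chrome.events.Event; only addListener matters here. */
export interface EventLike<H extends (...args: never[]) => unknown> {
  addListener(handler: H): void;
}

/**
 * Replace event.addListener with a wrapper that reports each handler to
 * onCapture and then forwards it to the original addListener.
 */
export function captureListeners<H extends (...args: never[]) => unknown>(
  event: EventLike<H>,
  onCapture: (handler: H) => void,
): void {
  const original = event.addListener.bind(event);
  event.addListener = (handler: H): void => {
    onCapture(handler); // keep a ref the harness can invoke later
    original(handler); // production registration still happens
  };
}
```

Any handler registered before `captureListeners` runs is invisible to the harness. This is why the gated import has to sit above every production `addListener` call.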
### Production hook-gate pattern (src/offscreen/recorder.ts)
```typescript
// Top-of-module: register the hook.
if (import.meta.env.MODE === 'test') {
  await import('../test-hooks/offscreen-hooks');
}

// Later, INSIDE startRecording after `mediaStream = stream;` (line ~247):
// wire the runtime mediaStream reference into the hook. offscreen-hooks.ts
// closes over a mutable `currentStream` cell; its getCurrentStream getter reads
// that cell and setCurrentStream mutates it. Gated identically so the
// production bundle has zero hook references at this site.
if (import.meta.env.MODE === 'test') {
  const hooks = await import('../test-hooks/offscreen-hooks');
  hooks.setCurrentStream(stream);
}
```
(Note: the executor may flatten this — the simpler shape is to expose a `setCurrentStream` function from offscreen-hooks.ts that the recorder calls after assignment. The hook-side closes over a mutable `currentStream` variable. See Task 2 step 5.)
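The `setCurrentStream` / `getCurrentStream` pair reduces to one mutable cell shared by two closures. The sketch below leaves the stream type generic so it runs outside a browser. `createStreamCell` is a hypothetical helper name, and offscreen-hooks.ts would instantiate it with `MediaStream`.

```typescript
export interface StreamCell<T> {
  /** Recorder-side setter; passing null marks "not currently recording". */
  set: (value: T | null) => void;
  /** Harness-side getter, exposed as __mokoshTest.getCurrentStream. */
  get: () => T | null;
}

export function createStreamCell<T>(): StreamCell<T> {
  let current: T | null = null; // the single mutable cell both closures share
  return {
    set: (value) => {
      current = value;
    },
    get: () => current,
  };
}
```

offscreen-hooks.ts could then export `cell.set` as `setCurrentStream` and assign `cell.get` to `globalThis.__mokoshTest.getCurrentStream`. Both are arrow functions, so they still read the live cell when detached from the object.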
### Vite test config skeleton (vite.test.config.ts)
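```typescript
import { defineConfig, mergeConfig } from 'vite';
import baseConfig from './vite.config';

// Assumes ./vite.config exports a plain config object (defineConfig({...})).
// mergeConfig cannot merge a config written in callback form; if the base
// config is a function, resolve it first and merge the result.
export default defineConfig(() =>
  mergeConfig(baseConfig, {
    mode: 'test',
    build: {
      outDir: 'dist-test',
      emptyOutDir: true,
    },
  }),
);
```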
### npm scripts to add (package.json)
```jsonc
{
"scripts": {
"dev": "vite",
"build": "tsc && vite build",
"build:test": "tsc && vite build --mode test --config vite.test.config.ts",
"preview": "vite preview",
"test": "vitest run",
"test:uat": "npm run build:test && tsx tests/uat/harness.test.ts"
}
}
```
### Existing surfaces the executor must NOT alter (regression risk)
- `src/background/index.ts` lines 725-778 (RECORDING_ERROR conditional routing) — Bug B fix landed at b9eeeeb; harness asserts this is intact.
- `src/offscreen/recorder.ts` lines 451-480 (`onUserStoppedSharing`) — Bug B handler; harness assertion 6 verifies the dispatchEvent path reaches it.
- `tests/background/sw-bundle-import.test.ts` — Tier-1 gate; the new `no-test-hooks-in-prod-bundle.test.ts` follows the same pattern but inspects the BUILT artifact for hook leaks.
- `manifest.json` — already declares `notifications` permission + all 3 icon sizes; harness assertions 8, 9, 10 read these as-is.
- ALL existing 83 vitest tests — must remain GREEN.
### Resolved open questions from RESEARCH (5)
| # | Question | Resolution | Rationale |
|---|----------|------------|-----------|
| 1 | Where does `simulateUserStop` shim live? | `src/test-hooks/offscreen-hooks.ts` exports a `setCurrentStream(stream: MediaStream)` setter the recorder calls after assignment. The hook's `__mokoshTest.getCurrentStream` is a getter over the captured cell. `simulateUserStop` is harness-side (in `tests/uat/lib/offscreen.ts`) calling `dispatchEvent` directly on the track returned by `getCurrentStream()` — the offscreen-hooks side just exposes the stream; the simulate function is harness-side. | Minimum surface in production tree; the dispatchEvent invocation is harness-side so it's never bundled. |
| 2 | Notification assertions: count vs set-membership? | **Count + set-membership combined.** notificationCount asserts on TOTAL count (e.g. assertion 8: exactly 1 startup notification). notificationIds asserts on prefix membership (e.g. "an id starting 'mokosh-startup-' was created"). lastNotificationOptions asserts on iconUrl shape. | Pure count is brittle (retries inflate); pure set-membership misses overcount regressions. Combined assertions catch both. |
| 3 | CI plumbing scope: include or defer? | **Defer to Phase 5 (P1/P2 hardening) or its own Plan 01-12.** This plan ships a CI-callable harness (`npm run test:uat` exits 0 on pass, non-zero on fail) but no GitHub Actions wiring. Rationale: no existing CI infrastructure in the repo (verified — no `.github/workflows/` directory); adding CI here would force a CI-tool decision (Actions vs self-hosted) that is out of scope for Phase 1 stabilization. | Lowest-friction shipping; CI tool selection deserves its own plan. |
| 4 | Failure isolation: single browser vs per-assertion restart? | **Single browser, serial assertions.** Restarting between assertions costs ~3-5 s × 14 ≈ 40-70 s of overhead per run; a single browser keeps total runtime under 60 s. Mitigation: structured diagnostic dump on first failure (SW console logs + offscreen console logs + screenshot) + `--bail` semantics (abort remaining assertions to keep the failure mode unambiguous). | RESEARCH §5 recommendation matches; for 14 deterministic checks, the cost of state bleed is much lower than the overhead of state isolation. |
| 5 | Test-hook contract location? | **Both.** Production-side canonical: `src/test-hooks/types.ts` (the file that ships with the test bundle and is type-checked by tsc). Harness-side mirror: `tests/uat/lib/test-hook-contract.d.ts` (decoupled from the production tree so the harness has no `import` reaching into `src/`). The mirror file's preamble cites the production-side file as the canonical source. Drift detection: a Tier-1-style test could later snapshot-diff the two; out of scope here, but documented as a follow-up note. | Type duplication is a small price for keeping `tests/` and `src/` import-separable. The drift risk is low because the shape is small (five top-level fields). |
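
Resolution 2 (count combined with set-membership) can be expressed as one check that reports every mismatch at once. The following is a sketch: `checkNotifications` and its expectation shape are hypothetical, and the snapshot fields mirror `MokoshTestSurface`. The example values (exactly one startup notification, the `mokosh-startup-` prefix, `icon48.png`) come from the assertions described above.

```typescript
export interface NotificationSnapshot {
  notificationCount: number;
  notificationIds: readonly string[];
  lastNotificationOptions: { iconUrl?: string } | null;
}

export interface NotificationExpectation {
  count: number; // exact total, so retries that over-notify are caught
  idPrefix: string; // at least one id must start with this
  iconUrlSuffix: string; // checked against lastNotificationOptions.iconUrl
}

export function checkNotifications(
  snapshot: NotificationSnapshot,
  expected: NotificationExpectation,
): string[] {
  const failures: string[] = [];
  if (snapshot.notificationCount !== expected.count) {
    failures.push(`expected ${expected.count} notification(s), got ${snapshot.notificationCount}`);
  }
  if (!snapshot.notificationIds.some((id) => id.startsWith(expected.idPrefix))) {
    failures.push(`no notification id starts with '${expected.idPrefix}'`);
  }
  const iconUrl = snapshot.lastNotificationOptions?.iconUrl ?? '';
  if (!iconUrl.endsWith(expected.iconUrlSuffix)) {
    failures.push(`iconUrl '${iconUrl}' does not end with '${expected.iconUrlSuffix}'`);
  }
  return failures;
}
```

An empty array means the assertion passes. A non-empty array becomes the structured diagnostic for that assertion.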
### How to test Bug B without committing the revert
Per orchestrator brief ("rewinding the b9eeeeb conditional routing locally turns this assertion RED"):
1. Locally apply: `git apply <<'EOF' ... EOF` containing a temporary patch that reverts the `if (errorCode === 'user-stopped-sharing')` branch (so all errors route through `setErrorMode`).
2. Run `npm run test:uat`; assertion 6 (Bug B) MUST fail with a specific diagnostic (`expected badge text '' but got 'ERR'`).
3. Revert the local patch (`git checkout -- src/background/index.ts`).
4. Re-run `npm run test:uat`; assertion 6 MUST pass.
This RED-on-known-broken / GREEN-on-known-good cycle is the TDD discipline for the harness ITSELF. Each assertion in Task 5/6/7 includes this self-verification step in its action block.
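The diagnostic in step 2 suggests a shape for the `assertEqual` helper in `lib/assertions.ts`. The sketch below is written under that assumption. `AssertionFailure` and the quoting rule are hypothetical. `Object.is` is used so that values such as `''` versus `'ERR'` or `NaN` compare strictly.

```typescript
export class AssertionFailure extends Error {
  readonly assertionId: number;
  constructor(assertionId: number, message: string) {
    super(message);
    this.name = 'AssertionFailure';
    this.assertionId = assertionId;
  }
}

/** Strings are single-quoted so an empty badge text renders as '' instead of vanishing. */
function show(value: unknown): string {
  return typeof value === 'string' ? `'${value}'` : JSON.stringify(value) ?? String(value);
}

export function assertEqual<T>(assertionId: number, label: string, actual: T, expected: T): void {
  if (!Object.is(actual, expected)) {
    throw new AssertionFailure(assertionId, `expected ${label} ${show(expected)} but got ${show(actual)}`);
  }
}
```

Keeping the assertion number on the error lets the bail logic report which of the 14 checks failed without parsing the message.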
Task 1 (Wave 0): Install Puppeteer + tsx; add `vite.test.config.ts`; add `build:test` + `test:uat` npm scripts; commit Tier-1 hook-leak grep gate as RED.
- package.json (existing scripts + devDeps — confirm puppeteer + tsx absent)
- vite.config.ts (the base config the new test config will merge over)
- tests/background/sw-bundle-import.test.ts (Tier-1 gate pattern to mirror)
- tsconfig.json (confirm `include` covers `src/**/*` — needed for src/test-hooks/)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §10 (two-bundle build orchestration)
package.json, package-lock.json, vite.test.config.ts, tsconfig.json, tests/background/no-test-hooks-in-prod-bundle.test.ts
- `npm install --save-dev puppeteer@^25.0.2 tsx@^4` lands cleanly. Both are actively maintained npm packages (puppeteer is Apache-2.0 licensed, tsx is MIT; puppeteer 25.0.2 published 2025; tsx 4.x current). Pin both with caret ranges per project convention.
- `vite.test.config.ts` exists, extends `./vite.config.ts` via `mergeConfig`, sets `mode: 'test'` + `build.outDir: 'dist-test'` + `build.emptyOutDir: true`. Running `npx vite build --config vite.test.config.ts --mode test` produces `dist-test/` (verifiable via `test -d dist-test`).
- `package.json` `scripts` block adds `build:test` and `test:uat` per the interfaces block. `npm run build:test` exits 0 and produces `dist-test/`.
- `tsconfig.json` `include` covers `src/test-hooks/**/*` (verify it does already via the `src/**/*` glob; no edit needed if `include` is already that wildcard — check first and only add if absent).
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` exists with TWO `it` blocks:
  (a) After `npm run build`, ZERO occurrences of `__mokoshTest` in any file under `dist/`. **Polarity note:** this inverts the usual RED-then-GREEN TDD order. The gate is GREEN today (no hooks exist, so nothing can leak) and must REMAIN GREEN after Task 2 lands the hooks. Committing it in this task makes the gate operational BEFORE the hooks ship, which eliminates any window in which the production bundle could contain leaked hooks. Document this polarity in the test file preamble.
(b) After `npm run build`, ZERO occurrences of `simulateUserStop` in any file under `dist/`. Same polarity: GREEN today, must remain GREEN after hooks land.
- Both `it` blocks run a fresh `npm run build` as part of their setup (spawned via `child_process.execFile`, mirroring sw-bundle-import.test.ts's spawn pattern). They then `readdir`+`readFileSync` walk `dist/` and assert grep counts are zero. Skip the build spawn if `process.env.SKIP_BUILD === '1'` (developer escape hatch when running the test repeatedly during this task's iteration).
- The 83 baseline vitest tests + 2 new gate tests = 85 tests, ALL GREEN. (The Tier-1 gate is committed in a working state from day one.)
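The `dist/` walk in behaviors (a) and (b) can be a small pure helper that the vitest file calls after the build step. The sketch below assumes Node's `fs`/`path` and UTF-8 decoding of every file; binary assets decode harmlessly for an ASCII token search. `findLeaks` and `FORBIDDEN_TOKENS` are hypothetical names.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

/** Identifiers that must never appear in the production bundle. */
export const FORBIDDEN_TOKENS: readonly string[] = ['__mokoshTest', 'simulateUserStop'];

export interface Leak {
  file: string; // path relative to the scanned root
  token: string;
  count: number;
}

function* walkFiles(dir: string): Generator<string> {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) yield* walkFiles(full);
    else if (entry.isFile()) yield full;
  }
}

function countOccurrences(text: string, token: string): number {
  if (token.length === 0) throw new Error('empty token would loop forever');
  let count = 0;
  for (let i = text.indexOf(token); i !== -1; i = text.indexOf(token, i + token.length)) count++;
  return count;
}

export function findLeaks(rootDir: string, tokens: readonly string[] = FORBIDDEN_TOKENS): Leak[] {
  const leaks: Leak[] = [];
  for (const file of walkFiles(rootDir)) {
    const text = fs.readFileSync(file, 'utf8');
    for (const token of tokens) {
      const count = countOccurrences(text, token);
      if (count > 0) leaks.push({ file: path.relative(rootDir, file), token, count });
    }
  }
  return leaks;
}
```

In the gate test this might read `expect(findLeaks('dist')).toEqual([])`. The inverse check that Task 2 requires for the test bundle is `findLeaks('dist-test', ['__mokoshTest']).length > 0`.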
1. Read `package.json` to confirm `puppeteer` + `tsx` absent.
2. `npm install --save-dev puppeteer@^25.0.2 tsx@^4` — observe versions resolve correctly. Document the actually-resolved versions in the commit message body.
3. Update `package.json` `scripts` block per the interfaces section — add `build:test` and `test:uat`. Leave existing scripts (`dev`, `build`, `preview`, `test`) untouched.
4. Create `vite.test.config.ts` at repo root per the interfaces skeleton.
5. Verify `tsconfig.json` `include` covers `src/test-hooks/**/*`: if `include` is `["src/**/*"]` and no `exclude` entry blocks `src/test-hooks/`, no edit is needed. Document the actual `tsconfig.json` shape in the commit message body so reviewers see the verification ran.
6. Run `npm run build:test` → exit 0; `ls dist-test/` confirms emission. Run `npm run build` → exit 0; `ls dist/` confirms separate output.
7. Create `tests/background/no-test-hooks-in-prod-bundle.test.ts` with the two `it` blocks per behavior (a) + (b). Preamble docstring per project style: extensive (Google Python style mandate carries over — keep mirroring sw-bundle-import.test.ts's docstring density). Cite that this is a Tier-1 gate per `feedback-pre-checkpoint-bundle-gates.md` (the auto-loaded memory item).
8. Run `npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts` → both GREEN (no hooks landed yet, nothing leaks).
9. Run `npx vitest run` (full suite) → 83 baseline + 2 new = 85 GREEN. Document the baseline + delta in the commit message body.
10. Run `npx tsc --noEmit` → exit 0.
11. Verify there is no `npm test` regression: rerun `npm test` → 85 GREEN.
Per project style: extensive docstrings; absolute imports; no `as any`; no `@ts-ignore`. The new test file is the first one to touch `child_process.execFile` since `sw-bundle-import.test.ts` — mirror that file's pattern verbatim (execFile + maxBuffer + timeout + stdout sentinel scheme). Do NOT introduce a new pattern.
npm run build:test && npm run build && test -d dist-test && test -d dist && npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts && npx tsc --noEmit
- `package.json` devDeps include `puppeteer` + `tsx` at the pinned versions; `scripts` block carries `build:test` + `test:uat`.
- `vite.test.config.ts` exists, extends base config, emits to `dist-test/`.
- `npm run build:test` exits 0; `dist-test/` populated.
- `npm run build` exits 0; `dist/` populated separately (no clobber).
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` exists with 2 tests; both GREEN.
- Full vitest suite: 83 baseline + 2 new = 85 GREEN.
- `npx tsc --noEmit` exit 0.
Two-bundle infrastructure landed; Tier-1 hook-leak gate operational (GREEN, will remain GREEN after Task 2 hooks land); npm scripts wired; baseline preserved.
Task 2 (Wave 1): Add gated test hooks to SW + offscreen; verify production bundle remains hook-free (Tier-1 gate stays GREEN).
- src/background/index.ts (top-of-module — where the import.meta.env.MODE guard lands; lines 1-50)
- src/offscreen/recorder.ts (top-of-module + line ~247 where mediaStream is assigned)
- tests/background/sw-bundle-import.test.ts (the Tier-1 SW-bundle-loadability gate — confirm it still passes after hooks land in test bundle)
- tests/background/no-test-hooks-in-prod-bundle.test.ts (the gate from Task 1)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §6 (Vite tree-shaking gotchas)
- vite.test.config.ts (from Task 1)
src/test-hooks/types.ts, src/test-hooks/sw-hooks.ts, src/test-hooks/offscreen-hooks.ts, src/background/index.ts, src/offscreen/recorder.ts, tests/uat/lib/test-hook-contract.d.ts
- `src/test-hooks/types.ts` exports `MokoshTestSurface` + declares `globalThis.__mokoshTest` per the interfaces block.
- `src/test-hooks/sw-hooks.ts` registers the SW-side hook at module-load: monkey-patches `chrome.action.onClicked.addListener`, `chrome.runtime.onStartup.addListener`, `chrome.notifications.onClicked.addListener` to capture handler refs while still calling the originals. Wraps `chrome.notifications.create` to increment `notificationCount`, push id to `notificationIds`, save `lastNotificationOptions`. Initializes `globalThis.__mokoshTest = { handlers: {...}, notificationCount: 0, lastNotificationOptions: null, notificationIds: [] }`. NO `getCurrentStream` in SW (the field is optional per type — undefined in SW context).
- `src/test-hooks/offscreen-hooks.ts` registers the offscreen-side hook: exposes a mutable `currentStream: MediaStream | null` cell + `setCurrentStream(s)` setter + `__mokoshTest.getCurrentStream = () => currentStream` getter. The recorder calls `setCurrentStream` after the `mediaStream = stream` assignment (gated by the same MODE check).
- `src/background/index.ts` top-of-module gets:
```typescript
if (import.meta.env.MODE === 'test') {
await import('../test-hooks/sw-hooks');
}
```
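Placement: BEFORE any `addListener` calls in the file so the monkey-patches catch every handler. This relies on a top-level `await` of a dynamic `import()` inside the SW module. Per Amendment A, MV3 service workers block dynamic `import()` (verified empirically), which is why the SW-side hooks were dropped in favor of the extension-internal harness page (see 01-11-PLAN-AMENDMENT-A.md).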
- `src/offscreen/recorder.ts` top-of-module gets the symmetric gated import; the `setCurrentStream` call lands inside `startRecording` right after `mediaStream = stream;` (line 247), also gated.
- `tests/uat/lib/test-hook-contract.d.ts` mirrors `MokoshTestSurface` for harness-side consumption (it's a declaration file; not bundled, only used at type-check time on the harness).
- After all changes, `npm run build` exits 0 AND `tests/background/no-test-hooks-in-prod-bundle.test.ts` REMAINS GREEN (the literal `__mokoshTest` does NOT appear in any file under `dist/`). `npm run build:test` exits 0 AND ONE OR MORE files under `dist-test/` contain `__mokoshTest` (verifiable by `grep -l __mokoshTest dist-test/`).
- `tests/background/sw-bundle-import.test.ts` REMAINS GREEN (Layer 1 + Layer 2; the gated dynamic import does not break the production bundle's module init).
- Full vitest suite: 85 GREEN (no regression).
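The handler-capture monkey-patch described for `sw-hooks.ts` can be illustrated without Chrome. This is a minimal sketch under stated assumptions: `EventLike` and `captureListener` are hypothetical names, and `EventLike` is only a structural stand-in for a `chrome.events.Event` such as `chrome.action.onClicked`. The property that matters is that the patched `addListener` records each callback and still forwards it, so production dispatch is unchanged.

```typescript
// Hypothetical structural stand-in for a chrome.events.Event such as
// chrome.action.onClicked; the real type comes from @types/chrome.
export interface EventLike<Cb extends (...args: never[]) => unknown> {
  addListener(cb: Cb): void;
}

/**
 * Replace `event.addListener` with a wrapper that hands each registered
 * callback to `onCapture`, then forwards it to the original so the real
 * dispatch path still sees every listener.
 */
export function captureListener<Cb extends (...args: never[]) => unknown>(
  event: EventLike<Cb>,
  onCapture: (cb: Cb) => void,
): void {
  // Bind before reassigning so the wrapper calls the real implementation.
  const original = event.addListener.bind(event);
  event.addListener = (cb: Cb): void => {
    onCapture(cb);
    original(cb);
  };
}
```

Inside the hook module this would be applied once per event, for example `captureListener(chrome.action.onClicked, (cb) => { handlers.onClicked = cb; })`; whether the `@types/chrome` event type fits `EventLike` without a cast is untested here.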
1. Create `src/test-hooks/types.ts` per the interfaces block. Extensive JSDoc; cite this plan's Task 2 + RESEARCH §6 (gating mechanism) + RESEARCH §7 (Bug B BLOCKER context for getCurrentStream's role).
2. Create `src/test-hooks/sw-hooks.ts`. Monkey-patch pattern follows RESEARCH §6 Pattern 1. Wrap `chrome.notifications.create` so the three surface fields update (count, last options, ids array) while every call still chains through to the original `create`. Use explicit Chrome types from `@types/chrome`; no `as any`. Initialization at module load:
```typescript
const handlers: MokoshTestSurface['handlers'] = {
onClicked: null, onStartup: null, notificationOnClicked: null,
};
const notificationIds: string[] = [];
const origActionAdd = chrome.action.onClicked.addListener.bind(chrome.action.onClicked);
chrome.action.onClicked.addListener = (cb) => {
handlers.onClicked = cb;
return origActionAdd(cb);
};
// ... similarly for onStartup, notifications.onClicked ...
const origNotifCreate = chrome.notifications.create.bind(chrome.notifications);
(chrome.notifications.create as unknown) = (idOrOptions: string | chrome.notifications.NotificationOptions, optionsOrCb?: chrome.notifications.NotificationOptions | ((id: string) => void), maybeCb?: (id: string) => void) => {
// Handle both (id, options, cb) and (options, cb) overloads;
// surface the resolved id in notificationIds.
// Call origNotifCreate with the same args; wrap the callback to push id.
// Increment notificationCount; save lastNotificationOptions.
// Return the original return value (Chrome 88+ also Promise-returning).
};
globalThis.__mokoshTest = {
handlers,
notificationCount: 0,
lastNotificationOptions: null,
get notificationIds() { return notificationIds.slice(); },
};
```
The `as unknown` cast in the `create` reassignment is unavoidable because Chrome's `create` is typed as overloaded callable; document this explicitly with a comment citing the overload variance issue. NO `as any` — the `as unknown` + downstream typed body is the project-style escape hatch.
3. Create `src/test-hooks/offscreen-hooks.ts`:
```typescript
let currentStream: MediaStream | null = null;
export function setCurrentStream(stream: MediaStream | null): void {
currentStream = stream;
}
globalThis.__mokoshTest = {
// No SW surface to inherit: the offscreen document is a different
// JS context, so sw-hooks.ts never ran here. Initialize a fresh shape
// in which only getCurrentStream carries offscreen-relevant data:
handlers: { onClicked: null, onStartup: null, notificationOnClicked: null },
notificationCount: 0,
lastNotificationOptions: null,
notificationIds: [],
getCurrentStream: () => currentStream,
};
```
Note: the SW and offscreen are DIFFERENT JS isolates with DIFFERENT `globalThis`. The harness reads each surface via the appropriate `sw.evaluate` or `offPage.evaluate`. No cross-context shared state.
4. Edit `src/background/index.ts` — add the gated dynamic import at the TOP of the file (after any necessary type imports but BEFORE the existing logger initialization + addListener calls). Document inline that the placement is load-order-critical: this MUST run before any addListener.
5. Edit `src/offscreen/recorder.ts`:
(a) Top-of-module: gated dynamic import per the SW pattern.
(b) Inside `startRecording`, immediately after `mediaStream = stream;` (line ~247): gated `setCurrentStream(stream)` call. Use a top-level captured reference to the hooks module (set during the top-of-module import via a module-scoped `let hooks: typeof import('../test-hooks/offscreen-hooks') | null = null;` plus assignment in the import block). This avoids re-import per startRecording call.
6. Create `tests/uat/lib/test-hook-contract.d.ts`. Mirror `MokoshTestSurface`. Add a preamble docstring citing `src/test-hooks/types.ts` as the canonical source AND noting the drift-risk (manual sync) + the rationale for decoupling (no `import` from `tests/` into `src/`).
7. Run `npx tsc --noEmit` → exit 0 (all hook code typechecks).
8. Run `npm run build` (production). Then check `grep -rln __mokoshTest dist/` → ZERO matches. The Tier-1 gate test `tests/background/no-test-hooks-in-prod-bundle.test.ts` MUST stay GREEN.
9. Run `npm run build:test`. Then check `grep -rln __mokoshTest dist-test/` → ONE OR MORE matches (the hook code is bundled into the test build).
10. Run `npx vitest run` (full suite). 85 GREEN. The SW-bundle-import test must also be GREEN — verifies the gated dynamic import does NOT break production module init.
11. Sanity-check: open one of the production bundle's chunk files (the SW chunk via `dist/service-worker-loader.js` → its imported chunk) and confirm by eye that no `__mokoshTest` string is present. The grep gate is authoritative, but a manual eyeball ensures the gate isn't fooled by some bundler renaming.
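Step 2 leaves the `chrome.notifications.create` wrapper body as comments. One way to fill it in is to normalize the two call shapes first and then update the snapshot. This is a chrome-free sketch, not the shipped hook: `NotificationOptionsLike`, `CreateFn`, `normalizeCreateArgs`, and `wrapCreate` are hypothetical names, and the sketch assumes the count is taken at call time while ids are recorded only once Chrome reports them (via callback or resolved promise).

```typescript
// Hypothetical stand-in for chrome.notifications.NotificationOptions.
export interface NotificationOptionsLike {
  title?: string;
  message?: string;
  iconUrl?: string;
}

export type CreateCallback = (id: string) => void;

// Loose callable covering both overloads:
//   create(id, options, callback?)  and  create(options, callback?)
export type CreateFn = (
  idOrOptions: string | NotificationOptionsLike,
  optionsOrCb?: NotificationOptionsLike | CreateCallback,
  maybeCb?: CreateCallback,
) => unknown;

export interface NotificationSnapshot {
  notificationCount: number;
  lastNotificationOptions: NotificationOptionsLike | null;
  notificationIds: string[];
}

/** Resolve either overload into an explicit id (if any), options, and callback. */
export function normalizeCreateArgs(
  idOrOptions: string | NotificationOptionsLike,
  optionsOrCb?: NotificationOptionsLike | CreateCallback,
  maybeCb?: CreateCallback,
): { id: string | undefined; options: NotificationOptionsLike; cb: CreateCallback | undefined } {
  if (typeof idOrOptions === 'string') {
    if (optionsOrCb === undefined || typeof optionsOrCb === 'function') {
      throw new TypeError('create(id, options): options object is required');
    }
    return { id: idOrOptions, options: optionsOrCb, cb: maybeCb };
  }
  const cb = typeof optionsOrCb === 'function' ? optionsOrCb : undefined;
  return { id: undefined, options: idOrOptions, cb };
}

/** Wrap `original` so every call updates `snapshot`, then delegates unchanged. */
export function wrapCreate(original: CreateFn, snapshot: NotificationSnapshot): CreateFn {
  return (idOrOptions, optionsOrCb, maybeCb) => {
    const { id, options, cb } = normalizeCreateArgs(idOrOptions, optionsOrCb, maybeCb);
    // Counted at call time: a count of 1 does not prove Chrome accepted it.
    snapshot.notificationCount += 1;
    snapshot.lastNotificationOptions = options;
    const recordId = (resolved: string): void => {
      snapshot.notificationIds.push(resolved);
    };
    if (cb !== undefined) {
      // Callback form: record the resolved id, then call the caller's callback.
      const wrappedCb: CreateCallback = (resolved) => {
        recordId(resolved);
        cb(resolved);
      };
      return id === undefined ? original(options, wrappedCb) : original(id, options, wrappedCb);
    }
    const result = id === undefined ? original(options) : original(id, options);
    if (result instanceof Promise) {
      // Promise form: record the id on a side branch. The no-op rejection
      // handler only silences that branch; callers awaiting `result` still
      // see the rejection.
      void result.then(
        (resolved: unknown) => {
          if (typeof resolved === 'string') {
            recordId(resolved);
          }
        },
        () => undefined,
      );
    }
    return result;
  };
}
```

Installing it in `sw-hooks.ts` would go through the `as unknown` reassignment shown in step 2.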
DESIGN NOTE: the gated dynamic import IS the tree-shake trigger. If Vite ever fails to tree-shake a dynamic import behind a literal-comparison guard (which it shouldn't per RESEARCH §6 — the literal `'test'` !== `'production'` comparison is a static dead branch in production), the Tier-1 gate fails LOUDLY at CI time. The gate is THE mitigation for assumption A3 in RESEARCH §6.
npx tsc --noEmit && npm run build && test "$(grep -rln __mokoshTest dist/ | wc -l)" = "0" && npm run build:test && test "$(grep -rln __mokoshTest dist-test/ | wc -l)" -ge "1" && npx vitest run --reporter=dot
- `src/test-hooks/{types,sw-hooks,offscreen-hooks}.ts` exist with the contracts described.
- `src/background/index.ts` + `src/offscreen/recorder.ts` carry the gated dynamic import block; in offscreen, also the `setCurrentStream(stream)` call inside `startRecording`.
- `tests/uat/lib/test-hook-contract.d.ts` mirrors the type.
- `npm run build` exits 0; `grep -rln __mokoshTest dist/` → 0 matches.
- `npm run build:test` exits 0; `grep -rln __mokoshTest dist-test/` → ≥1 match.
- Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) GREEN.
- Tier-1 SW-bundle-import gate (`tests/background/sw-bundle-import.test.ts`) GREEN.
- Full vitest suite: 85 GREEN.
- `npx tsc --noEmit` exit 0.
Hook surfaces live in test bundle; absent in production bundle (Tier-1 grep gate verifies); SW + offscreen module init unchanged for production; baseline preserved.
Task 3 (Wave 2): Build harness scaffolding — `tests/uat/lib/{launch,extension,sw,offscreen,assertions,zip}.ts` + `harness.test.ts` skeleton with all 14 assertions stubbed as failing.
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §1 (Puppeteer extension API patterns)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §4 (target type quirk for offscreen)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §7 (Bug B dispatchEvent contract — BLOCKER)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints)
- src/test-hooks/types.ts (from Task 2)
- tests/uat/lib/test-hook-contract.d.ts (from Task 2)
- tests/background/sw-bundle-import.test.ts (execFile child-process pattern; relevant only to assertion 0, which spawns `npm run build` via `execFileSync` and then scans `dist/` with `fs.readdir` rather than shelling out to grep)
tests/uat/lib/launch.ts, tests/uat/lib/extension.ts, tests/uat/lib/sw.ts, tests/uat/lib/offscreen.ts, tests/uat/lib/assertions.ts, tests/uat/lib/zip.ts, tests/uat/harness.test.ts, tests/uat/README.md
- `tests/uat/lib/launch.ts` exports `launchHarnessBrowser(options?: HarnessOptions)`, which resolves to `{ browser, sw, ext, page, downloadsDir }`. Reads the `HEADLESS` env var (`'0'` = headful for debugging, anything else = headless). Wires Chrome args per RESEARCH §1 + §9.
- `tests/uat/lib/extension.ts` exports `attachToSw`, `attachToOffscreen`, `waitForOffscreen` per the RESEARCH §4 patterns. The offscreen attach uses the `background_page` target type + `.asPage()` (Pitfall 1).
- `tests/uat/lib/sw.ts` exports `getBadgeText(sw)`, `getPopup(sw)`, `getManifest(sw)`, `getIconSize(sw, path)`, `fireOnStartup(sw)`, `sendSyntheticRecordingError(sw, errorCode)`, `keepalivePing(sw)`, `getNotificationSnapshot(sw)`.
- `tests/uat/lib/offscreen.ts` exports `getDisplaySurface(offPage)`, `simulateUserStop(offPage)` (the dispatchEvent path per RESEARCH §7 BLOCKER — with an inline comment block citing the BLOCKER reasoning so future readers don't refactor it to `track.stop()`).
- `tests/uat/lib/assertions.ts` exports `assertEqual(actual, expected, msg)` + `assertMatch(actual, regex, msg)` + `assertTrue(cond, msg)` + a structured `runAssertion(name, fn)` wrapper that runs a single assertion, captures any SW/offscreen console logs since the last assertion, and dumps them to stderr on failure. Uses `node:assert/strict` per RESEARCH §4.
- `tests/uat/lib/zip.ts` exports `assertArchiveShape(zipBuf, expectedVersion)` — opens with jszip, asserts `video/last_30sec.webm` present + `meta.json` carries `version === expectedVersion`. The meta.json shape is per Plan 01-07 (existing archive contract — read once at the start of the harness and pass through).
- `tests/uat/harness.test.ts` is the single Node script (tsx-runnable). Top-to-bottom narrative:
```
0. Pre-flight grep gate (filesystem readdir on dist/) — assertion 0.
1. launchHarnessBrowser → attachToSw → attachToOffscreen-when-ready.
2. Assertion 1: SW bootstrap → setIdleMode (badge '', popup '', isRecording=false).
3. Assertion 2: triggerExtensionAction → wait → badge 'REC' + popup === src/popup/index.html + isRecording=true.
4. Assertion 3: offscreen track displaySurface === 'monitor'.
5. Assertion 4: triggerExtensionAction (while recording) → popup opens, NO new offscreen target.
6. Assertion 5: sendMessage SAVE_ARCHIVE → wait for download → check downloadsDir contains session_report_*.zip.
7. Assertion 6 (BUG B): simulateUserStop → wait 300ms → badge '' + popup '' + isRecording=false + notificationCount delta = 0.
8. Assertion 7 (ERROR path): sendSyntheticRecordingError('codec-unsupported') → badge 'ERR' + notificationCount delta = 1.
9. Assertion 8 (BUG A + onStartup): fireOnStartup → notifications.create called once with iconUrl matching icons/icon48.png (or icon128.png — verify which one the production code uses; the badge_state_machine plan uses icon128, but the test asserts whichever the production code actually invokes per the lastNotificationOptions snapshot).
10. Assertion 9: icon file sizes via sw.evaluate(fetch) ≥ floors (16: 200B, 48: 500B, 128: 1024B).
11. Assertion 10: manifest has 'notifications' permission + icons.16 + icons.48 + icons.128 declared.
12. Assertion 11 (35s record): start a fresh recording, wait 35s, query SW (via runtime message → offscreen → segments count) → segments.length >= 3.
13. Assertion 12 (ffprobe gate): trigger SAVE_ARCHIVE, extract video/last_30sec.webm, spawn ffprobe → exit 0.
14. Assertion 13 (zip shape): assertArchiveShape on the latest session_report_*.zip.
15. Final summary: `console.log('UAT harness: 14/14 assertions passed')`; exit 0.
```
Assertions 1-13 are stubbed today as `runAssertion('N: title', async () => { throw new Error('NOT YET IMPLEMENTED — Task 4+ wires this'); });`, so the harness exits non-zero with a clear "N assertions failed" diagnostic. Assertion 0 (filesystem-only) is wired in this task.
- `tests/uat/README.md` documents:
- How to run: `npm run test:uat` (build + harness).
- Local-debug headful mode: `HEADLESS=0 npm run test:uat`.
- Skipping the build (developer iteration): `SKIP_BUILD=1 npx tsx tests/uat/harness.test.ts` (the build is the npm-script wrapper; the harness itself can run against an existing `dist-test/`).
- Locale gotcha: `--auto-select-desktop-capture-source="Entire screen"` works on en_US; other locales need the locale-equivalent string. Fallback to operator-pick + `KEEP_PROFILE=1` documented as the Plan 01-09 fallback.
- dev-dep size: puppeteer pulls ~150MB Chromium binary; CI must accept this. Production `npm install --omit=dev` skips it cleanly.
- Xvfb is NOT required (per RESEARCH §3 empirical probes on Chrome 148).
- Failure isolation choice: single browser, serial assertions, bail on first failure (RESEARCH §5 + open-question resolution 4).
- Running `npm run test:uat` exits NON-ZERO today because the 13 stubbed assertions all throw; the diagnostic clearly identifies which assertion failed AND why ("NOT YET IMPLEMENTED — Task 4+ wires this"). Assertion 0 (the grep gate) PASSES, confirming that the harness scaffolding is wired correctly and the only failures are intentional stubs.
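The `runAssertion` contract (console sinks, stderr dump, rethrow) and the final summary line can be sketched as a small class. This is an illustrative sketch: `AssertionRunner`, `ConsoleSinks`, and the `tail` parameter are hypothetical names, and counting never-run assertions as failures is an assumption chosen so the `N/14` diagnostics stay stable when the harness bails on the first failure.

```typescript
import assert from 'node:assert/strict';

/** Accumulating console buffers fed by sw.on('console') / offPage.on('console'). */
export interface ConsoleSinks {
  sw: string[];
  offscreen: string[];
}

export interface AssertionResult {
  name: string;
  passed: boolean;
  error?: string;
}

/** Thin wrapper over node:assert/strict, matching the helper name in the plan. */
export function assertEqual<T>(actual: T, expected: T, msg: string): void {
  assert.strictEqual(actual, expected, msg);
}

export class AssertionRunner {
  readonly results: AssertionResult[] = [];
  private readonly sinks: ConsoleSinks;
  private readonly tail: number;

  constructor(sinks: ConsoleSinks, tail = 20) {
    this.sinks = sinks;
    this.tail = tail;
  }

  /** Run one assertion; on failure dump recent console output and rethrow. */
  async run(name: string, fn: () => Promise<void>): Promise<void> {
    // Remember buffer positions so only this assertion's logs are dumped.
    const swStart = this.sinks.sw.length;
    const offStart = this.sinks.offscreen.length;
    try {
      await fn();
      this.results.push({ name, passed: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      this.results.push({ name, passed: false, error: message });
      const swLogs = this.sinks.sw.slice(swStart).slice(-this.tail);
      const offLogs = this.sinks.offscreen.slice(offStart).slice(-this.tail);
      process.stderr.write(`FAIL ${name}: ${message}\n`);
      process.stderr.write(`SW console (last ${swLogs.length}):\n${swLogs.join('\n')}\n`);
      process.stderr.write(`Offscreen console (last ${offLogs.length}):\n${offLogs.join('\n')}\n`);
      throw err;
    }
  }

  /** Format the final diagnostic; assertions that never ran count as failed. */
  summary(total: number): string {
    const passed = this.results.filter((r) => r.passed).length;
    if (passed === total) {
      return `UAT harness: ${total}/${total} assertions passed`;
    }
    const firstFailure = this.results.find((r) => !r.passed);
    // Names follow the 'N: title' convention, so the prefix before ':' is N.
    const label = firstFailure === undefined ? 'not run' : `Assertion ${firstFailure.name.split(':')[0]}`;
    return `UAT harness: ${passed}/${total} assertions passed, ${total - passed} failed (first failure: ${label})`;
  }
}
```

With assertion 0 passing and assertion 1 stubbed, `summary(14)` reproduces the Task 3 diagnostic line.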
1. Create the `tests/uat/lib/` directory + all 6 helper files. Use absolute imports per project style. NO `as any`; type each helper's surface explicitly. Each helper file gets a top-of-file docstring per project style (extensive Google-style).
2. `launch.ts`: implementation uses `puppeteer.launch({ enableExtensions: [absolutePath], headless: ..., args: [...] })`. The absolutePath is computed via `path.resolve(__dirname, '../../../dist-test')` (this helper lives at `tests/uat/lib/launch.ts`, so `../../../` lands at the repo root). Use `fileURLToPath` + `import.meta.url` for the `__dirname` shim (the harness runs as ESM under tsx).
3. `extension.ts`: implementation per RESEARCH §1 + §4 patterns. The offscreen attach uses `browser.waitForTarget(t => t.type() === 'background_page' && t.url().includes('offscreen'), { timeout: 5_000 })`. After getting the target, `.asPage()` returns the Page.
4. `sw.ts`: each helper is one or two lines of `sw.evaluate(...)`. The `getNotificationSnapshot` helper returns a structured `{ count, lastOptions, ids }` to keep the harness's reasoning unified.
5. `offscreen.ts` `simulateUserStop`:
```typescript
import type { Page } from 'puppeteer';

export async function simulateUserStop(offPage: Page): Promise<void> {
// RESEARCH §7 BLOCKER — DO NOT REFACTOR to track.stop().
// track.stop() does NOT fire 'ended' per W3C spec (verified probe7);
// dispatchEvent IS the only path that triggers our production
// onUserStoppedSharing handler. A test that calls track.stop() would
// silently pass while production reality fails — exactly the trap
// Bug B fix (commit b9eeeeb) addresses.
await offPage.evaluate(() => {
const stream = globalThis.__mokoshTest?.getCurrentStream?.();
if (!stream) throw new Error('no current MediaStream — recording must be active');
const track = stream.getVideoTracks()[0];
if (!track) throw new Error('no video track in stream');
track.dispatchEvent(new Event('ended'));
});
}
```
6. `assertions.ts`: `runAssertion(name, fn)` captures `console.log`/`console.error` from the harness's own process. For SW + offscreen console logs, it accepts an optional `consoleSinks` parameter: the harness wires `sw.on('console', ...)` + `offPage.on('console', ...)` listeners at launch and passes their accumulating buffers to runAssertion. On assertion failure, it dumps the buffers to stderr under structured "SW console (last N):" + "Offscreen console (last N):" preambles, then rethrows.
7. `zip.ts`: jszip-based reader. The `expectedVersion` comes from `chrome.runtime.getManifest().version` (queried once at the start of the harness via `sw.evaluate`). Assertion is exact equality.
8. `harness.test.ts`: the top-to-bottom narrative. Wrap the whole thing in a top-level `try/finally`; the `finally` always calls `browser.close()`. The 13 stubs (assertions 1-13) all throw the "NOT YET IMPLEMENTED" Error. Assertion 0 is wired in this task:
```typescript
await runAssertion('0: production bundle has no __mokoshTest leak', async () => {
// Filesystem check; does not require the browser.
// `npm run test:uat` only runs `npm run build:test`, so dist/ may be stale.
// The no-test-hooks-in-prod-bundle unit test already covers this gate as part
// of `npm test`, but we rebuild production here so the E2E run cannot pass
// against a stale dist/.
const { execFileSync } = await import('node:child_process');
execFileSync('npm', ['run', 'build'], { stdio: 'inherit' });
const distDir = path.resolve(__dirname, '../../dist');
const matches = await grepRecursive(distDir, '__mokoshTest');
assertEqual(matches.length, 0, 'production dist/ must not contain __mokoshTest');
});
```
NOTE: assertion 0 spawns `npm run build` from inside the harness, which costs ~10s. The unit test (Task 1) makes this somewhat redundant — but the unit test runs in the vitest pass; the harness runs separately. Belt + suspenders. Alternative: skip the spawn if `process.env.SKIP_PROD_REBUILD === '1'` for developer iteration.
9. `README.md`: per the behavior list.
10. Run `npm run test:uat`. Expected output:
- `npm run build:test` runs first (succeeds; emits dist-test/).
- `tsx tests/uat/harness.test.ts` runs.
- Assertion 0 PASSES (filesystem grep gate).
- Assertions 1-13 all THROW "NOT YET IMPLEMENTED".
- Exit code: non-zero.
- Diagnostic line: "UAT harness: 1/14 assertions passed, 13 failed (first failure: Assertion 1)".
11. Run `npx tsc --noEmit` → exit 0 (all harness code type-clean against `@types/chrome` + puppeteer types + `tests/uat/lib/test-hook-contract.d.ts`).
12. Run `npx vitest run` (full suite) → 85 GREEN (no regression to unit tests; the harness lives outside vitest's discovery).
Per project style: extensive docstrings; absolute imports; no `as any`; no `@ts-ignore`; named callbacks (the runAssertion lambdas are short enough to be acceptable as inline arrows). Use if-else chains over early returns where the assertion logic has multi-arm branching; guard-clause early returns are fine for null-checks per established project exception.
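Assertion 0 calls a `grepRecursive` helper that the plan does not define. A minimal sketch using `node:fs/promises`, assuming the helper should return matching file paths (so `matches.length` works as written) and should throw rather than return `[]` when `dist/` is missing, so a skipped build cannot masquerade as a clean bundle:

```typescript
import { readdir, readFile } from 'node:fs/promises';
import path from 'node:path';

/**
 * Return every file under `root` whose contents contain `needle`, like
 * `grep -rl needle root`. Files are decoded as UTF-8; an ASCII needle still
 * matches inside binary files because its bytes survive decoding unchanged.
 * Throws (ENOENT) if `root` does not exist.
 */
export async function grepRecursive(root: string, needle: string): Promise<string[]> {
  const matches: string[] = [];
  const entries = await readdir(root, { withFileTypes: true });
  for (const entry of entries) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      matches.push(...(await grepRecursive(full, needle)));
    } else if (entry.isFile()) {
      const content = await readFile(full, 'utf8');
      if (content.includes(needle)) {
        matches.push(full);
      }
    }
  }
  return matches;
}
```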
npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0) && npx vitest run --reporter=dot
- All 6 helper files (`launch`, `extension`, `sw`, `offscreen`, `assertions`, `zip`) exist with the contracts described.
- `harness.test.ts` exists with assertion 0 wired (GREEN) + assertions 1-13 stubbed (RED).
- `README.md` documents the runtime + local-debug + CI semantics.
- `npm run test:uat` exits non-zero today; diagnostic clearly identifies assertion 0 as PASS + assertions 1-13 as "NOT YET IMPLEMENTED".
- `npx tsc --noEmit` exit 0 across both `src/` and `tests/` trees.
- Full vitest suite: 85 GREEN.
- No file under `src/` modified by this task (the harness is purely under `tests/`).
Harness scaffolding live with assertion 0 wired GREEN; assertions 1-13 staged as RED stubs for Tasks 4-7; baseline preserved.
Task 4 (Wave 3 — bundle 1/4): Wire assertions 1, 2, 3, 4 (SW bootstrap + toolbar onClicked + displaySurface + popup-during-recording).
- tests/uat/harness.test.ts (skeleton from Task 3)
- tests/uat/lib/{launch,extension,sw,offscreen}.ts (helpers from Task 3)
- src/background/index.ts lines 75-108 (setIdleMode/setRecordingMode state machine — the production code these assertions verify)
- src/background/index.ts lines 411-415 (setRecordingMode call site inside startVideoCapture)
- src/background/index.ts lines 844-858 (chrome.action.onClicked listener registration)
- src/offscreen/recorder.ts lines 241-247 (getDisplayMedia call + mediaStream assignment)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §1 (triggerExtensionAction + the popup-vs-onClicked MV3 contract)
- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md (the must-haves these assertions are verifying)
tests/uat/harness.test.ts, tests/uat/lib/sw.ts
- Assertion 1 (SW bootstrap): after `launchHarnessBrowser` + attach SW, query `getBadgeText` (empty), `getPopup` (empty), and `getIsRecording` (false). `getIsRecording` is a new helper that reads `isRecording` from the SW context via `sw.evaluate`; the production code declares `isRecording` as a module-level `let`, and how the harness reaches that binding is settled in action step 1. PASSES today against the current bundle.
- Assertion 2 (onClicked-idle): `page.triggerExtensionAction(ext)` → `await waitFor(() => getBadgeText(sw), (text) => text === 'REC', 5_000)` (poll up to 5s; the picker auto-selects the screen, so getDisplayMedia resolves fast). Then assert popup === 'src/popup/index.html' and getIsRecording === true. PASSES today.
- Assertion 3 (displaySurface): after assertion 2 leaves recording active, attach to offscreen via `waitForOffscreen` + `attachToOffscreen`. Then assert `offPage.evaluate(() => __mokoshTest.getCurrentStream().getVideoTracks()[0].getSettings().displaySurface)` === 'monitor'. PASSES today (per Plan 01-09 D-15-display-surface; the post-grant validation in recorder.ts ensures monitor-only).
- Assertion 4 (click-during-recording): record the current offscreen target count, then `page.triggerExtensionAction(ext)` again. Assert: popup state unchanged (still 'src/popup/index.html'); NO new offscreen target spawned (count unchanged). The toolbar click with popup set opens the popup (which the harness can verify via `browser.targets().find(t => t.url().includes('popup/index.html'))` — the popup target appears as a `page` type briefly). PASSES today.
- All 4 assertions wired; each carries an inline RED-on-regression demonstration step in its action block: the executor must locally demonstrate the assertion CAN catch a regression before marking the assertion GREEN.
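The `getIsRecording` read hinges on whether a module-level `let` is visible from `sw.evaluate`. In an ES module, top-level `let` bindings are never properties of `globalThis`, so the `__mokoshTest.isRecording` getter fallback may well be needed. Such a getter also has to be installed by code that can see the binding; another module (such as `sw-hooks.ts`) cannot. This sketch uses hypothetical names (`setRecording`, `exposeGetter`, `surface`) to show a live getter over module state:

```typescript
// Module-scoped state, standing in for `let isRecording` in
// src/background/index.ts. It is a module binding, not a globalThis property.
let isRecording = false;

export function setRecording(value: boolean): void {
  isRecording = value;
}

/**
 * Define a live, read-only getter on `surface`. Each read calls `read`, so
 * the harness sees the binding's current value rather than a copy taken
 * when the hook was installed.
 */
export function exposeGetter<T>(surface: object, key: string, read: () => T): void {
  Object.defineProperty(surface, key, {
    get: read,
    enumerable: true,
    configurable: false,
  });
}

// Hypothetical surface standing in for globalThis.__mokoshTest.
export const surface: { readonly isRecording?: boolean } = {};
exposeGetter(surface, 'isRecording', () => isRecording);
```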
1. Wire assertion 1: replace the "NOT YET IMPLEMENTED" stub with the real logic per behavior. Add a `getIsRecording(sw)` helper to `tests/uat/lib/sw.ts`:
```typescript
import type { WebWorker } from 'puppeteer';

export async function getIsRecording(sw: WebWorker): Promise<boolean> {
return await sw.evaluate(() => (globalThis as any).isRecording as boolean);
}
```
NOTE: this is the ONE site where `as any` is tolerated. The production code declares `isRecording` as a module-level `let` in `src/background/index.ts:36`, which is NOT exposed on `globalThis` directly. In an ES module, top-level `let` bindings are never properties of `globalThis`, and `sw.evaluate` runs in global scope rather than module scope, so treat the direct read as unproven for the bundled SW. Document the choice + rationale inline.
(RESEARCH §6 reports, via probe2, that the SW module-level `let` IS readable as `globalThis.isRecording` in MV3 SW context. If the executor sees `undefined` returned, fall back to exposing it via an `__mokoshTest.isRecording` getter installed from `src/background/index.ts` (the module that owns the binding) and document the SW-isolation finding.)
2. Wire assertion 2: implementation per behavior. After `triggerExtensionAction`, poll `getBadgeText` for up to 5 seconds — the badge transition is async (offscreen creation + getDisplayMedia + post-grant validation + setRecordingMode all happen in sequence). Use a polling helper from `assertions.ts` or inline:
```typescript
async function waitFor<T>(probe: () => Promise<T>, predicate: (v: T) => boolean, timeoutMs: number): Promise<T> {
const start = Date.now();
while (Date.now() - start < timeoutMs) {
const v = await probe();
if (predicate(v)) return v;
await new Promise(r => setTimeout(r, 100));
}
throw new Error(`waitFor timeout ${timeoutMs}ms`);
}
```
Use this in assertion 2 + 3 + 4.
3. Wire assertion 3: per behavior. The `waitForOffscreen` helper already handles the target wait + asPage; attach once after assertion 2 sets recording=true, then offPage.evaluate the displaySurface read.
4. Wire assertion 4: per behavior. Count `browser.targets()` filtered to offscreen-url-containing BEFORE the second click, then AFTER; assert equality. Also assert popup state unchanged.
5. RED-on-regression demonstration:
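- For assertion 2: locally comment out the production `chrome.action.onClicked.addListener(...)` registration (or make its handler return immediately) and re-run build:test; assertion 2 should FAIL (badge stays empty). Adding a second no-op listener is not enough, because Chrome dispatches the event to every registered listener. Revert the hack; assertion 2 PASSES.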
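- For assertion 3: locally alter `recorder.ts` to call `getDisplayMedia({ video: true, audio: false })` (without displaySurface constraint) and rebuild; assertion 3 should FAIL (displaySurface defaults to 'browser' OR is undefined depending on Chrome behavior). Revert; PASSES. If assertion 3 stays GREEN, `--auto-select-desktop-capture-source` is likely picking the screen regardless of the constraint, and this demo does not exercise the regression; record that outcome instead.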
- The executor commits ONLY the working assertions; the RED demos are local-only verifications. Document each RED demo's outcome in the commit message body.
6. Run `npm run test:uat`: assertions 0+1+2+3+4 PASS; assertions 5-13 still stubbed as RED. Exit non-zero. Diagnostic: "5/14 passed, 9 failed".
7. Run `npx tsc --noEmit` → exit 0.
8. Run full vitest suite → 85 GREEN.
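The before/after target comparison in step 4 reduces to two small predicates over Puppeteer targets. This sketch assumes only that targets expose `url()` and `type()` (as Puppeteer's `Target` does); `TargetLike`, `countOffscreenTargets`, and `hasPopupTarget` are hypothetical names, and matching on the substrings `offscreen` and `popup/index.html` is an assumption about the bundle's URLs.

```typescript
// Minimal structural slice of Puppeteer's Target.
interface TargetLike {
  url(): string;
  type(): string;
}

/** Count targets whose URL points at the extension's offscreen document. */
export function countOffscreenTargets(targets: readonly TargetLike[]): number {
  return targets.filter((t) => t.url().includes('offscreen')).length;
}

/** True when a popup page target is present (toolbar click with popup set). */
export function hasPopupTarget(targets: readonly TargetLike[]): boolean {
  return targets.some((t) => t.type() === 'page' && t.url().includes('popup/index.html'));
}
```

In the harness, `countOffscreenTargets(browser.targets())` would be taken before and after the second click and compared for equality; `hasPopupTarget` is best polled with `waitFor`, since the popup target may be short-lived.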
npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0) && npx vitest run --reporter=dot
- Assertions 0, 1, 2, 3, 4 all PASS in `npm run test:uat`.
- Assertions 5-13 still throw "NOT YET IMPLEMENTED".
- `npm run test:uat` exits non-zero (because 9 stubs remain).
- Diagnostic shows 5/14 passed.
- `npx tsc --noEmit` exit 0.
- Full vitest suite: 85 GREEN.
- Each wired assertion's commit message body cites the RED-demonstration outcome.
First 4 functional assertions live and GREEN; harness proves it can verify toolbar + displaySurface + popup-state via CDP.
Task 5 (Wave 3 — bundle 2/4): Wire assertions 5, 6, 7 (SAVE_ARCHIVE download + Bug B user-stopped routing + ERROR-path).
- tests/uat/harness.test.ts (assertions 1-4 GREEN from Task 4)
- tests/uat/lib/{sw,offscreen,zip}.ts (helpers; especially simulateUserStop's BLOCKER-citing comment)
- src/background/index.ts lines 725-778 (RECORDING_ERROR handler — Bug B conditional routing)
- src/offscreen/recorder.ts lines 451-480 (onUserStoppedSharing — the handler simulateUserStop must trigger)
- .planning/debug/resolved/01-09-recovery-flow.md (Bug B debug record — the exact contract assertion 6 verifies)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §7 (BLOCKER analysis — track.dispatchEvent is the ONLY valid path)
tests/uat/harness.test.ts, tests/uat/lib/sw.ts
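- Assertion 5 (SAVE_ARCHIVE download): with recording active from prior assertions, send `{type: 'SAVE_ARCHIVE'}` via `chrome.runtime.sendMessage` to trigger the save flow. Send it from an extension page such as the offscreen document (`offPage.evaluate(() => chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}))`) rather than from `sw.evaluate`: `runtime.sendMessage` does not deliver a message back to the sender's own context, so a message sent from the SW would not reach the SW's `onMessage` handler. The download lands in `downloadsDir` (configured at launch via `--user-data-dir` + per-page download behavior, OR via `page._client().send('Browser.setDownloadBehavior', ...)`; RESEARCH didn't deep-dive this, so the executor researches the cleanest path). Poll for `*session_report*.zip` appearance in downloadsDir for up to 15s. PASSES today.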
- Assertion 6 (BUG B): snapshot `notificationCount` via `getNotificationSnapshot(sw)`. Then `simulateUserStop(offPage)`. Wait 300ms (offscreen handler → runtime message → SW handler → state transition is async). Assert: badge text === '' (NOT 'ERR'); popup === '' (NOT 'src/popup/index.html'); isRecording === false; notificationCount delta === 0 (no recovery notification fired for deliberate stop). PASSES today against b9eeeeb.
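- Assertion 7 (ERROR-path preserved): start a fresh recording (since assertion 6 stopped it), re-attaching `offPage` if the offscreen document was recreated. Snapshot notificationCount. Then `offPage.evaluate(() => chrome.runtime.sendMessage({type: 'RECORDING_ERROR', error: 'codec-unsupported'}))`; send from the offscreen page because production RECORDING_ERROR messages originate there, and a `runtime.sendMessage` issued from the SW would not reach the SW's own listener. Wait 200ms. Assert: badge text === 'ERR'; notificationCount delta === 1; last notification id starts with 'mokosh-recovery-'. PASSES today.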
- Each assertion carries the RED-on-regression demonstration; assertion 6's RED demo is the canonical "rewinding b9eeeeb" cycle from the orchestrator brief.
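Assertion 6's four-part contract can be expressed as a pure function over two state snapshots, which keeps the failure message aligned with the diagnostic the RED demo expects. `SwState` and `bugBViolations` are hypothetical names; the harness would fill the snapshots from `getBadgeText`, `getPopup`, `getIsRecording`, and `getNotificationSnapshot`.

```typescript
/** SW-observable state captured before/after an action (field names assumed). */
export interface SwState {
  badgeText: string;
  popup: string;
  isRecording: boolean;
  notificationCount: number;
}

/**
 * List contract violations for a deliberate user stop (Bug B): idle badge,
 * no popup, not recording, and zero new notifications. An empty array
 * means the assertion passes.
 */
export function bugBViolations(before: SwState, after: SwState): string[] {
  const problems: string[] = [];
  if (after.badgeText !== '') {
    problems.push(`expected badge text '' but got '${after.badgeText}'`);
  }
  if (after.popup !== '') {
    problems.push(`expected popup '' but got '${after.popup}'`);
  }
  if (after.isRecording) {
    problems.push('expected isRecording=false');
  }
  const delta = after.notificationCount - before.notificationCount;
  if (delta !== 0) {
    problems.push(`expected notificationCount delta 0 but got ${delta}`);
  }
  return problems;
}
```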
1. Wire assertion 5. Investigate Puppeteer's download path config. `browser.defaultBrowserContext().overridePermissions(...)` covers web permissions (geolocation, notifications, and so on) rather than download behavior, so the likely path is CDP `Browser.setDownloadBehavior` with `behavior: 'allow'` + `downloadPath: downloadsDir`. The harness creates `downloadsDir` in the launch helper (e.g. `os.tmpdir() + '/mokosh-uat-downloads-' + Date.now()`). After `sendMessage({type:'SAVE_ARCHIVE'})`, poll the dir for ~15s for any `session_report_*.zip`. Save the path for assertion 13. PASS = file appears + non-zero size.
2. Wire assertion 6 per behavior. Use the existing `simulateUserStop` helper (with its BLOCKER comment intact). The 300ms wait is the propagation budget; if assertions intermittently flake here, bump to 500ms — the offscreen handler is synchronous-into-sendMessage, the SW handler is synchronous-into-setIdleMode, so 300ms is generous but not extravagant.
3. Wire assertion 7 per behavior. Reads `lastNotificationOptions.title` or similar to verify "Mokosh stopped" recovery copy AND `notificationIds[notificationIds.length-1].startsWith('mokosh-recovery-')`.
4. RED-on-regression demonstrations (recorded in commit body):
- **Assertion 6 RED demo (THE canonical Bug B regression check)**: locally reverse b9eeeeb's change to `src/background/index.ts` to restore the pre-b9eeeeb RECORDING_ERROR handler (unconditional setErrorMode), e.g. `git diff b9eeeeb~1 b9eeeeb -- src/background/index.ts | git apply -R` (resolve by hand if later commits touched the same hunk). Do NOT commit. Rebuild the test bundle. Run `npm run test:uat`. Assertion 6 MUST FAIL with diagnostic: "expected badge text '' but got 'ERR'". Revert (`git checkout -- src/background/index.ts`). Rebuild. Re-run. Assertion 6 PASSES. This proves the harness assertion CAN catch a Bug B regression. **Document this end-to-end demo in the commit message body.**
- Assertion 5 RED demo: locally comment out the `chrome.downloads.download(...)` call in `src/background/index.ts:saveArchive` and rebuild; assertion 5 should FAIL (timeout waiting for zip). Revert; PASSES.
- Assertion 7 RED demo: locally short-circuit the RECORDING_ERROR case to return without calling setErrorMode for codec-unsupported (e.g. early-return on case entry); assertion 7 should FAIL. Revert; PASSES.
5. Run `npm run test:uat`: 8/14 PASS, 6 stubs remain. Exit non-zero.
6. Run `npx tsc --noEmit` → exit 0. Vitest 85 GREEN.
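Step 1's download poll can be sketched with `node:fs/promises`. Assumptions: the zip lands directly in `downloadsDir` (not a subfolder), and Chrome writes in-progress downloads under a temporary `.crdownload` name, so requiring the final `session_report_*.zip` name plus a non-zero size avoids reading a half-written file. `waitForSessionZip` is a hypothetical name.

```typescript
import { readdir, stat } from 'node:fs/promises';
import path from 'node:path';

/**
 * Poll `dir` until a non-empty file named session_report_*.zip appears and
 * return its path, or throw after `timeoutMs`.
 */
export async function waitForSessionZip(dir: string, timeoutMs = 15_000, intervalMs = 250): Promise<string> {
  const pattern = /^session_report_.*\.zip$/;
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    for (const name of await readdir(dir)) {
      if (pattern.test(name)) {
        const full = path.join(dir, name);
        const info = await stat(full);
        if (info.size > 0) {
          return full;
        }
      }
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`no non-empty session_report_*.zip in ${dir} after ${timeoutMs}ms`);
}
```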
npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0) && npx vitest run --reporter=dot
- Assertions 0-7 all PASS.
- Assertions 8-13 still stubbed RED.
- `npm run test:uat` exits non-zero; diagnostic 8/14 passed.
- Bug B RED-on-regression demo documented in commit body (mandatory).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
Bug B harness assertion live AND demonstrably catches regression; SAVE_ARCHIVE + ERROR-path coverage live; bug-class root cause (state-machine routing) now CI-callable.
Task 6 (Wave 3 — bundle 3/4): Wire assertions 8, 9, 10 (Bug A onStartup notification + icon file sizes + manifest shape).
- tests/uat/harness.test.ts (assertions 1-7 GREEN from Tasks 4-5)
- src/background/index.ts lines 860-881 (chrome.runtime.onStartup handler — the path Bug A's recovery notification was failing on before a881bf0)
- manifest.json (icons declared + notifications permission)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints)
- icons/icon{16,48,128}.png (verify presence + size — the floors are 200/500/1024 bytes from the orchestrator brief)
tests/uat/harness.test.ts
- Assertion 8 (BUG A + onStartup): snapshot notificationCount. Then `sw.evaluate(() => globalThis.__mokoshTest!.handlers.onStartup?.())`. Wait 100ms (synchronous handler, but allow microtask drain). Assert: notificationCount delta === 1; `lastNotificationOptions.iconUrl` matches `/icons\/icon(?:128|48)\.png$/` (the production code uses NOTIFICATION_ICON_PATH = 'icons/icon128.png'); `lastNotificationOptions.title === 'Mokosh ready'`; `notificationIds[notificationIds.length-1].startsWith('mokosh-startup-')`. The hook's wrapper increments notificationCount when `create` is called, not when it resolves, so the delta alone does not prove the create succeeded: if Bug A regressed (icon below floor), Chrome's imageUtil throws and the create call REJECTS after the count has already moved. The icon-size cross-check in action step 1 closes that gap. PASSES today against a881bf0.
- Assertion 9 (icon files present + sized): for each of (16, 200), (48, 500), (128, 1024), `sw.evaluate` a fetch of `chrome.runtime.getURL('icons/icon{N}.png')` and read `content-length`. Assert >= floor. PASSES today.
- Assertion 10 (manifest shape): `getManifest(sw)`. Assert: `permissions.includes('notifications')`; `icons['16']`, `icons['48']`, `icons['128']` all defined and equal to expected paths. PASSES today.
- Each assertion's RED-on-regression demo documented in commit body.
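
A minimal sketch of assertion 8's snapshot comparison, assuming the hook exposes `notificationCount`, `notificationIds`, and `lastNotificationOptions` as plain serializable fields (those names come from this plan; the `NotificationSnapshot` type and `checkStartupNotification` function are hypothetical). The harness would read one snapshot via `sw.evaluate` before invoking `onStartup` and one after the 100ms wait, then fail on any returned message. The iconUrl byte-size cross-check is a separate step and is not shown here.

```typescript
// Hypothetical snapshot shape mirroring the hook fields named in this plan.
interface NotificationSnapshot {
  notificationCount: number;
  notificationIds: string[];
  lastNotificationOptions?: { iconUrl?: string; title?: string };
}

// Returns failure messages; an empty array means assertion 8's checks pass.
function checkStartupNotification(
  before: NotificationSnapshot,
  after: NotificationSnapshot,
): string[] {
  const failures: string[] = [];

  // Delta, not absolute count, so notifications fired earlier are ignored.
  const delta = after.notificationCount - before.notificationCount;
  if (delta !== 1) failures.push(`expected notificationCount delta 1, got ${delta}`);

  const opts = after.lastNotificationOptions ?? {};
  if (!/icons\/icon(?:128|48)\.png$/.test(opts.iconUrl ?? '')) {
    failures.push(`unexpected iconUrl: ${String(opts.iconUrl)}`);
  }
  if (opts.title !== 'Mokosh ready') failures.push(`unexpected title: ${String(opts.title)}`);

  const lastId = after.notificationIds[after.notificationIds.length - 1] ?? '';
  if (!lastId.startsWith('mokosh-startup-')) failures.push(`unexpected id: ${lastId}`);

  return failures;
}
```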
1. Wire assertion 8 per behavior. The `onStartup` handler in production carries inline try/catch around the `chrome.notifications.create` call (per src/background/index.ts:868-877); the hook's notificationCount wrapper increments regardless of create's resolution path. To verify Bug A specifically, ALSO assert that the iconUrl in lastNotificationOptions points to a file that resolves to >= 1024 bytes (cross-check with assertion 9's floor). This catches the Bug A regression EVEN IF a future change wraps the create call in a swallowing try/catch.
2. Wire assertion 9 per behavior. Fetching via `sw.evaluate` is the cleanest path: Chrome serves extension files from `chrome-extension://` URLs, and `fetch` of such a URL (built with `chrome.runtime.getURL`) works in the SW context.
3. Wire assertion 10 per behavior. Direct `chrome.runtime.getManifest()` read.
4. RED-on-regression demos (commit body):
- **Assertion 8 RED demo (Bug A canonical)**: locally truncate the icon to 0 bytes (`: > icons/icon128.png`; note that `echo "" >` would leave a 1-byte newline). Rebuild the test bundle. Run `npm run test:uat`. Assertion 8 should FAIL: the iconUrl cross-check sees a file below the 1024-byte floor (and Chrome's imageUtil also rejects the create call). Restore (`git checkout -- icons/icon128.png`). Rebuild. Re-run. Assertion 8 PASSES. **Document in commit body.**
- Assertion 9 RED demo: same truncate; rebuild; assertion 9 should FAIL with "content-length 0 < floor 1024". Restore; PASSES.
- Assertion 10 RED demo: locally remove "notifications" from manifest.json permissions and rebuild test bundle; assertion 10 should FAIL. Restore; PASSES.
5. Run `npm run test:uat`: 11/14 PASS, 3 stubs remain (11, 12, 13).
6. `npx tsc --noEmit` exit 0; vitest 85 GREEN.
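
The icon-floor and manifest checks for assertions 9 and 10 can be sketched as pure functions so they are unit-testable outside Chrome. This is an illustration under stated assumptions: `checkIconFloors`, `checkManifestShape`, `ManifestLike`, and the injected `fetchSize` callback are hypothetical, and the expected manifest icon paths are assumed to be `icons/icon{N}.png`. The sketch measures the body's `byteLength` rather than the `content-length` header, on the assumption that a `chrome-extension://` response may not carry that header.

```typescript
// Byte floors per icon size (px -> minimum bytes), from the orchestrator brief.
const ICON_FLOORS: ReadonlyArray<readonly [number, number]> = [
  [16, 200],
  [48, 500],
  [128, 1024],
];

// Assertion 9. `fetchSize` is injected so the check runs outside Chrome. In the
// harness it might wrap Puppeteer's WebWorker.evaluate, for example:
//   (p) => sw.evaluate(async (path) => {
//     const res = await fetch(chrome.runtime.getURL(path));
//     return (await res.arrayBuffer()).byteLength;
//   }, p)
async function checkIconFloors(
  fetchSize: (path: string) => Promise<number>,
): Promise<string[]> {
  const failures: string[] = [];
  for (const [px, floor] of ICON_FLOORS) {
    const path = `icons/icon${px}.png`;
    const size = await fetchSize(path);
    if (size < floor) failures.push(`${path}: ${size} < floor ${floor}`);
  }
  return failures;
}

// Subset of chrome.runtime.getManifest() that assertion 10 inspects.
interface ManifestLike {
  permissions?: string[];
  icons?: Record<string, string>;
}

// Assertion 10 (expected icon paths assumed to be icons/icon{N}.png).
function checkManifestShape(manifest: ManifestLike): string[] {
  const failures: string[] = [];
  if (!manifest.permissions?.includes('notifications')) {
    failures.push('permissions lacks "notifications"');
  }
  for (const px of ['16', '48', '128']) {
    const expected = `icons/icon${px}.png`;
    if (manifest.icons?.[px] !== expected) failures.push(`icons['${px}'] !== ${expected}`);
  }
  return failures;
}
```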
npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0)
- Assertions 0-10 all PASS.
- Assertions 11-13 still stubbed RED.
- `npm run test:uat` exits non-zero; diagnostic 11/14 passed.
- Bug A RED-on-regression demo documented in commit body (mandatory).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
Bug A harness assertion live AND demonstrably catches regression; icon + manifest coverage live; both Phase-1-escapee bug classes (Bug A + Bug B) now CI-callable.
Task 7 (Wave 3 — bundle 4/4): Wire assertions 11, 12, 13 (35s buffer continuity + ffprobe gate + zip shape) — closes the 13-assertion charter.
- tests/uat/harness.test.ts (assertions 1-10 GREEN from Tasks 4-6)
- tests/uat/lib/zip.ts (the jszip-based archive shape helper)
- tests/offscreen/webm-playback.test.ts (the existing ffprobe pattern — FFPROBE_BIN constant, skip-gate helper)
- src/background/webm-remux.ts (Plan 01-08's remux helper — what the harness's ffprobe gate validates)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints for 11, 12, 13)
tests/uat/harness.test.ts, tests/uat/lib/zip.ts
- Assertion 11 (35s buffer continuity): start a fresh recording. Wait 35 seconds (with keepalive pings every 20s per RESEARCH §2). Query the offscreen segments count via offPage.evaluate (the offscreen recorder maintains a `segments` ring; expose it via a `__mokoshTest.getSegmentCount()` getter — ADD this to offscreen-hooks.ts in this task). Assert: segmentCount >= 3 (per D-13: 10s segments × MAX_SEGMENTS=3 = 30s window). PASSES today.
- Assertion 12 (ffprobe gate): trigger SAVE_ARCHIVE (reusing the assertion 5 helper). Extract `video/last_30sec.webm` from the produced zip via jszip. Write to a tmpfile. Spawn `ffprobe -v error -f matroska -i ` via execFileSync. Assert exit code 0. (Skip-gate this assertion with a clear "SKIPPED: ffprobe binary not available" diagnostic if `which ffprobe` fails — matches the existing webm-playback.test.ts pattern.)
- Assertion 13 (zip shape): jszip parse the same zip. Assert: `video/last_30sec.webm` entry exists + has non-zero size. Assert: `meta.json` entry exists + parsed JSON has `version === ` (read via sw.evaluate at the start of the harness or this assertion).
- The 35-second wait pushes the harness runtime past 60s. Add keepalive ping infrastructure (one ping every 20s during the wait) to avoid SW eviction per RESEARCH §2 / Pitfall 5.
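
The keepalive wait described above can be sketched as a chunked sleep that pings between chunks. `waitWithKeepalive` is a hypothetical helper, and `ping` stands in for the plan's `keepalivePing(sw)`, whose return type is assumed here to be a promise. Awaiting each ping serially avoids overlapping pings, at the cost of stretching wall time by the ping latency.

```typescript
// Sleeps for `totalMs`, calling `ping` after every `intervalMs` of sleep except
// at the very end. Returns the number of pings sent. Pings are awaited
// serially, so wall time is totalMs plus the summed ping latency.
async function waitWithKeepalive(
  totalMs: number,
  intervalMs: number,
  ping: () => Promise<unknown>,
): Promise<number> {
  if (intervalMs <= 0) throw new RangeError('intervalMs must be positive');
  let pings = 0;
  let remaining = totalMs;
  while (remaining > 0) {
    const chunk = Math.min(intervalMs, remaining);
    await new Promise<void>((resolve) => setTimeout(resolve, chunk));
    remaining -= chunk;
    if (remaining > 0) {
      await ping(); // keep the SW from being evicted before the next chunk
      pings += 1;
    }
  }
  return pings;
}
```

In the harness this might be called as `await waitWithKeepalive(35_000, 20_000, () => keepalivePing(sw))`, which pings once, at the 20s mark.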
1. ADD a `__mokoshTest.getSegmentCount()` getter to `src/test-hooks/offscreen-hooks.ts`. The offscreen recorder has a module-level `segments` array (from D-13 restart-segments); export a setter, `setSegmentCountGetter`, alongside `setCurrentStream` so the recorder can register a getter over that array:
```typescript
// src/test-hooks/offscreen-hooks.ts
let currentStream: MediaStream | null = null;
// Reports 0 until recorder.ts registers the real getter.
let segmentCountGetter: () => number = () => 0;

export function setCurrentStream(s: MediaStream | null): void { currentStream = s; }
export function setSegmentCountGetter(g: () => number): void { segmentCountGetter = g; }

globalThis.__mokoshTest = {
  // ...existing hook fields...
  getCurrentStream: () => currentStream,
  // Resolved on each call, so the harness always sees the live ring length.
  getSegmentCount: () => segmentCountGetter(),
};
```
Update `src/test-hooks/types.ts` to add `getSegmentCount?: () => number` to MokoshTestSurface.
In `src/offscreen/recorder.ts`, after the existing `setCurrentStream(stream)` call, add (gated):
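```typescript
// Test bundle only: Vite statically replaces import.meta.env.MODE, so this
// branch is dead code (and tree-shaken) in the production build.
// `await` requires this to sit inside the existing async recording-start path.
if (import.meta.env.MODE === 'test') {
  const hooks = await import('../test-hooks/offscreen-hooks');
  hooks.setSegmentCountGetter(() => segments.length);
}
```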
(`segments` stands for the module-level segment array; read recorder.ts to confirm its actual name, e.g. `videoSegments` or `segments`, and adapt.)
2. Wire assertion 11 per behavior. The 35s wait uses `await new Promise(r => setTimeout(r, 35_000))` with intermittent `await keepalivePing(sw)` every 20s. Use `setInterval` or a polling loop; document the keepalive purpose per RESEARCH §2.
3. Wire assertion 12 per behavior. Reuse the `FFPROBE_BIN` constant pattern from `tests/offscreen/webm-playback.test.ts`. Skip-gate: `if (!existsSync(FFPROBE_BIN)) { console.warn('Assertion 12: ffprobe not available — SKIPPED'); return; }`. The skip-gate is acceptable for assertion 12 because the unit-level tests (Plan 01-08's `tests/background/webm-remux.test.ts`) also have ffprobe gates that cover the same contract — the harness's ffprobe assertion is end-to-end validation, not the primary gate.
4. Wire assertion 13. Pass `expectedVersion = await sw.evaluate(() => chrome.runtime.getManifest().version)` into `assertArchiveShape`.
5. Update Tier-1 grep gate test (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) to ALSO assert ZERO `getSegmentCount` in dist/ (new hook surface added in this task — confirm gate stays GREEN).
6. RED-on-regression demos (commit body):
- Assertion 11 RED demo: locally hack `SEGMENT_DURATION_MS = 30_000` in recorder.ts so 35s yields only 1 segment; rebuild; assertion 11 should FAIL. Revert; PASSES.
- Assertion 12 RED demo: locally inject a corrupted byte into the remux output (e.g. zero the EBML magic in webm-remux.ts before return); rebuild; assertion 12 should FAIL (ffprobe error). Revert; PASSES.
- Assertion 13 RED demo: locally drop `version` from the `meta.json` writer in saveArchive; rebuild; assertion 13 should FAIL. Revert; PASSES.
7. Run `npm run test:uat`: ALL 14 assertions PASS. Exit 0. Diagnostic: "UAT harness: 14/14 assertions passed".
8. `npx tsc --noEmit` → exit 0. `npx vitest run` → 85 GREEN.
9. **Verify Tier-1 grep gate updates:** `npm run build && grep -rln 'getSegmentCount' dist/` → 0 matches.
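
A sketch of the end-to-end checks for assertions 12 and 13, assuming the zip has already been parsed (for example with jszip) into entry names and uncompressed sizes plus the raw `meta.json` text. `probeWebm`, `checkArchiveEntries`, and `ZipEntryInfo` are hypothetical names, distinct from the existing `assertArchiveShape` helper, whose signature this sketch does not assume. `execFileSync` throws when the child exits non-zero, which is how an ffprobe failure surfaces.

```typescript
import { execFileSync } from 'node:child_process';
import { existsSync } from 'node:fs';

type ProbeResult =
  | { status: 'skipped' }
  | { status: 'pass' }
  | { status: 'fail'; detail: string };

// Assertion 12: skip when the ffprobe binary is absent, else require a clean probe.
function probeWebm(ffprobeBin: string, webmPath: string): ProbeResult {
  if (!existsSync(ffprobeBin)) {
    console.warn('Assertion 12 SKIPPED: ffprobe binary not available');
    return { status: 'skipped' };
  }
  try {
    // Throws if ffprobe exits non-zero (e.g. a corrupted EBML header).
    execFileSync(ffprobeBin, ['-v', 'error', '-f', 'matroska', '-i', webmPath], {
      stdio: 'pipe',
    });
    return { status: 'pass' };
  } catch (err) {
    const stderr = (err as { stderr?: Buffer }).stderr;
    return { status: 'fail', detail: stderr ? stderr.toString() : String(err) };
  }
}

interface ZipEntryInfo {
  name: string;
  size: number; // uncompressed size in bytes
}

// Assertion 13: non-empty video entry; meta.json carries the manifest version.
function checkArchiveEntries(
  entries: ZipEntryInfo[],
  metaJsonText: string | undefined,
  expectedVersion: string,
): string[] {
  const failures: string[] = [];
  const video = entries.find((e) => e.name === 'video/last_30sec.webm');
  if (!video || video.size === 0) failures.push('video/last_30sec.webm missing or empty');

  if (!entries.some((e) => e.name === 'meta.json') || metaJsonText === undefined) {
    failures.push('meta.json missing');
    return failures;
  }
  try {
    const meta = JSON.parse(metaJsonText) as { version?: unknown };
    if (meta.version !== expectedVersion) {
      failures.push(`meta.json version ${String(meta.version)} !== ${expectedVersion}`);
    }
  } catch {
    failures.push('meta.json is not valid JSON');
  }
  return failures;
}
```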
npx tsc --noEmit && npm run test:uat && npx vitest run --reporter=dot && test -z "$(grep -rl getSegmentCount dist/ 2>/dev/null)"
- All 14 assertions PASS in `npm run test:uat`; exit 0.
- `npm run test:uat` total runtime ~50-90s (dominated by the 35s assertion 11 wait, ~10s of harness setup, and ~10s for assertion 0's `npm run build`; setting `SKIP_PROD_REBUILD=1` skips that rebuild and saves roughly 10s).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
- Production bundle (`npm run build`): `grep -rln __mokoshTest dist/` → 0; `grep -rln simulateUserStop dist/` → 0; `grep -rln getSegmentCount dist/` → 0. Tier-1 gate remains GREEN.
- Each new assertion's RED-on-regression demo documented in commit body.
13-assertion charter complete; harness exits 0 against current Plan 01-09 bundle; Phase 1 functional contract fully CI-callable.
Task 8 (Wave 4): Amend Plan 01-09 Task 5 operator checkpoint to redirect functional steps to `npm run test:uat`; update STATE.md decisions; close Plan 01-09 via this plan's harness PASS.
- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md lines 519-549 (the operator checkpoint that gets amended)
- .planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md (current closure state)
- .planning/STATE.md (Decisions section + Phase 1 Closure Notes)
- tests/uat/harness.test.ts (the harness that NOW closes the functional contract)
.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md, .planning/STATE.md
- `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` gets an AMENDMENT block at the END of the file (does NOT rewrite the original Task 5 — preserves provenance per project convention from D-A1..D-A6 cascade pattern):
```
---
## Amendment (Phase 01-stabilize-video-pipeline, 2026-05-17) — Plan 01-11 harness retires operator functional steps
Plan 01-11 (Puppeteer UAT harness) lands a CI-callable replacement for the
functional verification work in this plan's Task 5. The operator's role is
reduced to:
- **Step 1 (build):** unchanged — `npm run build` must exit 0.
- **Steps 2-13:** REDIRECTED — replaced by `npm run test:uat` exit 0. The
Puppeteer harness implements 14 assertions (assertion 0 = production-
bundle hook-leak grep; assertions 1-13 = the original Task 5
functional checks).
- **Step 14 (brand/design — implicit in steps 4, 5, 6 of original task):**
RETAINED for operator. The harness verifies displaySurface === 'monitor'
+ notification fires; it does NOT verify the human-readable copy is
aesthetically correct OR that the badge color reads cleanly against the
operator's OS theme. Operator confirms.
- **Step 15 (genuine error UX):** REDIRECTED — assertion 7 verifies the
ERROR-path behavior.
**New closure gate:** Plan 01-09 closes when `npm run test:uat` exits 0
AND operator confirms step 14 (brand/design). The harness's 14/14 PASS
against current bundle (verified by this plan's Task 7) supplies the
first half today.
```
- `.planning/STATE.md` Decisions section gains a new entry (preserves the existing log; appends rather than rewriting):
```
- [Phase 01-11]: Operator role retirement landed via Puppeteer UAT harness. 14 assertions cover Plan 01-08/01-09 functional contract; operator retained only for brand/design step. `npm run test:uat` = the new CI gate for any Phase 1 SW/offscreen/manifest change. Tier-1 grep gate `tests/background/no-test-hooks-in-prod-bundle.test.ts` enforces zero `__mokoshTest` / `simulateUserStop` / `getSegmentCount` in production `dist/`.
```
- This task does NOT modify Plan 01-09's status fields, frontmatter, or original Task 5 body. The amendment is appended after the original `
1. Read `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` to confirm the file structure ends with the `` tag.
2. Append the amendment block after that tag. Use the same horizontal-rule + ## heading + AMENDED-BY metadata convention from CONTEXT.md amendments. Cite the harness path (`tests/uat/harness.test.ts`) and the npm script (`npm run test:uat`).
3. Read `.planning/STATE.md` Decisions section (lines 72-109).
4. Append the new entry to the Decisions list (after the most recent `[Phase 01-07-deferred-to-5]` entry per the convention). Do NOT modify any existing entry.
5. Verify both edits are content-only (no frontmatter changes; no status flips — those happen in the closing checkpoint).
6. Run `npx tsc --noEmit` → exit 0 (paranoia — neither edit touches TS, but baseline).
7. Run `npm run test:uat` → exit 0 (final smoke before the closing checkpoint).
8. Run `npx vitest run` → 85 GREEN.
npx tsc --noEmit && grep -q 'Plan 01-11 harness retires operator functional steps' .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md && grep -q 'Operator role retirement landed via Puppeteer UAT harness' .planning/STATE.md && npm run test:uat && npx vitest run --reporter=dot
- `01-09-PLAN.md` ends with the appended amendment block (no edits to the original Task 5 body).
- `STATE.md` Decisions section carries the new entry as the last item (no edits to prior entries).
- `npm run test:uat` exits 0 (14/14 GREEN).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
Plan 01-09 functional contract redirected to harness; STATE.md decisions log updated; ready for closing checkpoint.
Task 9 (Wave 4): Operator confirms `npm run test:uat` exits 0 against current bundle AND confirms brand/design step 14 (Plan 01-09 Task 5 retained step) — closes Plan 01-09 + Plan 01-11.
(operator-driven; no files modified by this checkpoint)
See below — operator-driven empirical check. The executor must NOT bypass this checkpoint by stubbing harness output.
echo "checkpoint:human-verify — see how-to-verify section; resume signal is the gate"
Operator types "approved" after running the how-to-verify steps. See the resume signal for the exact gate.
Tasks 1-8 landed: Puppeteer + tsx installed, vite.test.config.ts produces dist-test/, gated test hooks in src/test-hooks/ ship in test bundle and NOT in production bundle (Tier-1 grep gate verifies), Puppeteer harness at tests/uat/harness.test.ts implements 14 assertions, all 14 GREEN against current Plan 01-09 bundle (b9eeeeb Bug B fix + a881bf0 Bug A fix both verified by Bug B + Bug A canonical RED-on-regression demos). Plan 01-09 Task 5 redirected to `npm run test:uat` for functional steps. This checkpoint validates the harness end-to-end against real Chrome AND captures operator's brand/design acceptance for Plan 01-09's retained step 14.
1. **Pre-flight cleanliness:** run `git status` — confirm working tree clean. Any uncommitted local hacks (RED-demo reverts) MUST be reverted BEFORE this step.
2. **Build production:** `npm run build` (must exit 0; this is Plan 01-09 Task 5 step 1).
3. **Build test bundle:** `npm run build:test` (must exit 0).
4. **Run harness:** `npm run test:uat` (must exit 0; runtime ~70-90s). Final output line MUST be exactly `UAT harness: 14/14 assertions passed`. If exit non-zero, paste the structured diagnostic + harness console dump + relevant SW/offscreen console logs; the plan iterates (likely a real bug surfaced).
5. **Re-run for stability:** `npm run test:uat` a second time. Same outcome. (Eliminates first-run flakiness from cold Chrome / cold dist-test cache.)
6. **Tier-1 hook-leak verification:** `grep -rln __mokoshTest dist/` must return 0 matches. Same for `simulateUserStop`, `getSegmentCount`, `setCurrentStream`, `setSegmentCountGetter`. If ANY match, the gate failed silently — STOP and triage.
7. **Local-debug mode smoke:** `HEADLESS=0 npm run test:uat`. Watch the real Chrome window: see the toolbar icon, see the picker auto-accept, see the badge transitions. Same exit 0 outcome. (This is the operator's chance to spot any visual oddity the automated assertions miss.)
8. **Brand/design acceptance (Plan 01-09 Task 5 step 14 — retained for operator):**
(a) Badge color readability against your OS theme: red OFF, green REC, yellow ERR should each contrast clearly with the toolbar background. If any is hard to see in light AND dark mode, document for Phase 5 hardening (do NOT block closure on this — file as a deferred item).
(b) Notification copy: "Mokosh ready — Click here to start recording your session." reads naturally in en_US. Russian operators may want a localized variant — document for Phase 5 (do NOT block closure on this).
(c) Picker UX: confirm Chrome's screen-share picker still surfaces (in headful mode) at the expected moment + with the correct monitor-only options.
9. **If steps 4, 5, 6 all PASS:** Plan 01-09 + Plan 01-11 both close. Type "approved" with any brand/design notes appended.
10. **If step 4 OR 5 FAIL:** paste the failure diagnostic. Likely culprits: locale-specific picker string mismatch (RESEARCH §9 — operator's Chrome may need a different `--auto-select-desktop-capture-source` value); race window in assertion 6 / 11 (try bumping the wait in the relevant assertion).
11. **If step 6 FAILS:** STOP. The Tier-1 hook-leak gate failing means the production bundle contains test code — this is a security regression (T-1-11-01). Do NOT proceed to closure. Open a debug session.
12. **If step 7 surfaces a real UX issue (not just a deferral):** document as a P1/P2 item in STATE.md or a phase-5 backlog file; closure can still proceed IF the issue is non-blocking.
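
Step 6's manual grep can also be expressed as a small Node script in the spirit of the Tier-1 gate. This is a sketch, not the contents of `tests/background/no-test-hooks-in-prod-bundle.test.ts`: `findHookLeaks` and `leaksInText` are hypothetical, and the identifier list is the five names from step 6.

```typescript
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { join } from 'node:path';

// The five identifiers step 6 requires to be absent from dist/.
const FORBIDDEN_IDENTIFIERS = [
  '__mokoshTest',
  'simulateUserStop',
  'getSegmentCount',
  'setCurrentStream',
  'setSegmentCountGetter',
] as const;

// Returns "<file>: <identifier>" for every forbidden identifier found in `text`.
function leaksInText(file: string, text: string): string[] {
  return FORBIDDEN_IDENTIFIERS.filter((id) => text.includes(id)).map((id) => `${file}: ${id}`);
}

// Recursively scans every file under `dir`, like `grep -rl` over the bundle.
function findHookLeaks(dir: string): string[] {
  const hits: string[] = [];
  for (const name of readdirSync(dir)) {
    const full = join(dir, name);
    if (statSync(full).isDirectory()) {
      hits.push(...findHookLeaks(full));
    } else {
      hits.push(...leaksInText(full, readFileSync(full, 'utf8')));
    }
  }
  return hits;
}
```

For example, `findHookLeaks('dist')` returning an empty array corresponds to step 6 passing; any entry is grounds to STOP per step 11.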
Type "approved" after step 9 lands (all gates GREEN + brand/design accepted). If steps 10/11/12 hit, paste the failure mode + operator's Chrome version + locale + OS theme; the plan iterates on the failing piece (likely Task 4-7 for assertion-specific issues; Task 1-2 for hook-leak issues; a fresh debug session for novel failures).
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| Puppeteer driver ↔ Chrome SW (via CDP) | The harness pipes CDP commands to the SW context via `sw.evaluate`. The runtime trust boundary is unchanged: the harness runs in a separate Node process and reaches the SW only through its CDP connection to the Chrome instance Puppeteer launched. However, the harness CAN invoke any production SW code path via `sw.evaluate`, so a malicious or buggy harness could in principle exfiltrate buffered video. Mitigation: harness code is in-tree, code-reviewed via the same pipeline as production. |
| Test hook surface (`__mokoshTest`) in production bundle | NEW: if tree-shaking fails or the MODE guard is misconfigured, the hook surface ships to production, exposing simulateUserStop, getCurrentStream, and captured handler refs to any code that can evaluate in the SW context. THIS IS THE SECURITY-CRITICAL THREAT. Mitigation: Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) enforces zero `__mokoshTest` in `dist/`; it runs as part of `npm test`, so any CI pipeline picks it up. |
| dev-dependency Chromium binary | NEW: Puppeteer downloads a ~150 MB browser binary at `npm install` time. Supply-chain compromise of the download endpoint would inject malicious code into developer machines. Mitigation: `package-lock.json` pins the Puppeteer and `@puppeteer/browsers` versions, and each Puppeteer release pins the browser build it downloads; note that the lockfile's integrity hashes cover the npm tarballs, not the downloaded browser binary. Out of scope: separate SCA for Puppeteer itself. |
| --auto-select-desktop-capture-source flag in CI | NEW: in a CI container, the flag auto-accepts the "Entire screen" source — which is whatever Xvfb (or modern headless surface) presents. If a CI runner is shared with sensitive workloads, the 35-second recording assertion captures whatever is on screen during that window. Mitigation: document that CI MUST run the harness in an isolated container with no concurrent workload; local-dev runs capture the operator's real screen for 35s during assertion 11, documented in README.md. |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-1-11-01 | Elevation of Privilege | `__mokoshTest` surface leaking into production `dist/` would expose simulateUserStop, captured chrome.* handler refs, and stream getter to any code with access to the SW context | mitigate | Two layers: (a) gated dynamic import per RESEARCH §6 (the literal `'test' !== 'production'` comparison is a static dead branch that Vite/Rollup tree-shake); (b) Tier-1 unit gate `tests/background/no-test-hooks-in-prod-bundle.test.ts` greps the BUILT artifact for `__mokoshTest` / `simulateUserStop` / `getSegmentCount` / `setCurrentStream` / `setSegmentCountGetter` — ZERO matches required for GREEN. Belt + suspenders catches both tree-shake regression AND new hook-name additions. |
| T-1-11-02 | Information Disclosure | 35-second recording assertion captures whatever is on the operator's screen during local-dev runs | accept | Operator-facing — local-dev runs are by definition under operator control; the recording is consumed only by ffprobe + jszip inside the harness process and is deleted with the temp downloads dir at process exit. CI runs document the isolated-container requirement in README.md. |
| T-1-11-03 | Tampering | Puppeteer downloads Chromium binary at `npm install`; supply-chain compromise of the download endpoint | accept | `package-lock.json` pins the Puppeteer and `@puppeteer/browsers` versions, which in turn pin the downloaded browser build; the lockfile's integrity hashes cover the npm tarballs, not the browser binary itself. Same risk surface as any npm dependency with an install-time download. Phase 5 SCA work (out of scope here) covers periodic re-verification. |
| T-1-11-04 | Denial of Service | Assertion 11's 35s wait, run alongside 13 other sequential assertions, keeps a full harness run at ~90s and occupies a CI runner slot for that time | accept | 90s is well within typical CI per-job budgets. Local-dev runs use `SKIP_PROD_REBUILD=1` to drop assertion 0's `npm run build` cost (~10s). Out of scope: parallelizing assertions (would require multi-browser instances, defeating the failure-isolation choice). |
| T-1-11-05 | Repudiation | The harness asserts the absence of recovery notification (Bug B path), but the assertion is a count-delta check — a notification fired BEFORE the snapshot would be invisible | mitigate | Each assertion snapshots `notificationCount` IMMEDIATELY before the trigger event AND immediately after the propagation wait. The delta is checked, not the absolute count. The `notificationIds` array is also asserted on for ID-prefix membership — even if delta counting were fooled by some interleaving, the absence of a 'mokosh-recovery-' prefix in the post-snapshot ids array catches the same regression. |
| T-1-11-06 | Spoofing | Harness reads `__mokoshTest.handlers.onStartup` and invokes it; a hostile production change could swap in a no-op handler that registers AFTER the hook captures the real handler | mitigate | The hook monkey-patches `addListener` AT THE TOP OF THE MODULE (before any production addListener calls). Any later addListener invocation still goes through the patched function and would OVERWRITE handlers.onStartup, not bypass it. A malicious bypass would require directly calling `chrome.runtime.onStartup.addListener.call(...)` via a saved bound reference; none exist in the production tree (verified by `grep -rE 'addListener\.call\|\.bind\(chrome\.runtime\.onStartup\)' src/` returning 0 matches). Defense in depth: the assertion verifies the captured handler actually fires the notification side-effect; a stub handler would fail assertion 8's notificationCount check. |
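
The capture-at-registration pattern that T-1-11-06 relies on can be sketched generically. `EventLike` is an assumed structural stand-in for a `chrome.events.Event`, and `captureListener` is a hypothetical helper, not the hook module's actual code. Because the wrapper replaces `addListener` on the event object itself, a later registration through the normal API overwrites the captured reference rather than bypassing it, which is the property the mitigation cites.

```typescript
// Assumed structural stand-in for a chrome.events.Event.
interface EventLike<F extends (...args: never[]) => unknown> {
  addListener(cb: F): void;
}

// Wraps `event.addListener` so every registration is reported to `onCapture`
// and then forwarded to the original implementation. Must run before any
// production code registers listeners on `event`.
function captureListener<F extends (...args: never[]) => unknown>(
  event: EventLike<F>,
  onCapture: (cb: F) => void,
): void {
  const original = event.addListener.bind(event);
  event.addListener = (cb: F) => {
    onCapture(cb); // latest registration wins, so it cannot be skipped
    original(cb);
  };
}
```

In an SW hook module this would run at the top of the file, e.g. `captureListener(chrome.runtime.onStartup, (cb) => { handlers.onStartup = cb; })`, where `handlers` is hypothetical.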
- `npm run test:uat` exits 0 against the current Plan 01-09 bundle; final line is exactly `UAT harness: 14/14 assertions passed`.
- `npm run build` exit 0; `grep -rln __mokoshTest dist/` returns 0; `grep -rln simulateUserStop dist/` returns 0; `grep -rln getSegmentCount dist/` returns 0.
- `npm run build:test` exit 0; `dist-test/` populated; `grep -rln __mokoshTest dist-test/` returns ≥1.
- `npx vitest run` exit 0; 85 GREEN across all test files (83 baseline + 2 from Task 1's Tier-1 grep gate).
- `npx tsc --noEmit` exit 0 across `src/` + `tests/`.
- Tier-1 SW-bundle-import gate (`tests/background/sw-bundle-import.test.ts`) GREEN — verifies the gated dynamic import does not break production module init.
- Tier-1 hook-leak gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) GREEN — verifies the production bundle is hook-free.
- Bug B canonical RED-on-regression demo documented in Task 5's commit body (locally reverting b9eeeeb makes assertion 6 RED; re-applying makes GREEN).
- Bug A canonical RED-on-regression demo documented in Task 6's commit body (locally truncating icons/icon128.png makes assertions 8 + 9 RED; restoring makes GREEN).
- Plan 01-09 Task 5 amended at the end of its PLAN.md (no rewrite of the original body); STATE.md Decisions log carries the new Plan 01-11 entry.
- Operator confirms brand/design step 14 + types "approved" in Task 9.
Plan 01-11 is complete when:
1. **Two-bundle separation lives.** `npm run build` produces hook-free `dist/`; `npm run build:test` produces hook-enabled `dist-test/`. The Tier-1 grep gate enforces the production bundle's hook absence.
2. **All 14 harness assertions pass against the current Plan 01-09 bundle.** `npm run test:uat` exits 0; final line is `UAT harness: 14/14 assertions passed`.
3. **Both Phase-1-escapee bugs are now CI-callable.** Assertion 6 (Bug B state-machine routing) and Assertion 8 (Bug A icon-promoted notification) each have a RED-on-regression demo documented in their respective task's commit body, proving the harness assertion CAN catch a regression — not just pass under current conditions.
4. **Operator role retired for functional verification.** Plan 01-09 Task 5 steps 2-13 + 15 redirect to `npm run test:uat`; only step 1 (build) + step 14 (brand/design) retained. The amendment block in 01-09-PLAN.md preserves provenance (no rewrite of the original task).
5. **Existing 83 vitest tests remain GREEN.** Plus the 2 new Tier-1 gate tests in this plan = 85 total. No regression.
6. **`npx tsc --noEmit` exit 0.** All harness code + hook code type-clean.
7. **`npm run build` exit 0; `npm run build:test` exit 0.** Both production and test bundles emit cleanly.
8. **Operator confirms Task 9 brand/design acceptance + types "approved".** Plan 01-09 + Plan 01-11 close together.