mokosh/.planning/phases/01-stabilize-video-pipeline/01-11-PLAN.md at main


Mark ba5474c54f docs(01-11): close as spike-pivot — SUMMARY landed, AMENDMENT-A deleted, pivots to 01-13

Closes Plan 01-11 honestly per GSD spike-pivot pattern. Original
Approach A (Puppeteer sw.evaluate per RESEARCH §1+6) empirically
falsified across Wave 3 execution + feasibility research. Approach B
(extension-internal-page harness + offscreen synthetic stream) proven
via c647f61 prototype; full implementation moves to Plan 01-13.

What this commit does:
- ADDS 01-11-SUMMARY.md (spike-then-pivot framing per GSD artifact-
  types.md PLAN→SUMMARY lifecycle; captures retained infrastructure,
  falsified hypotheses, working prototype, bridge to 01-13)
- REVERTS frontmatter amendment block in 01-11-PLAN.md; replaces with
  closed_as/pivoted_in/closure_note pointing at SUMMARY + 01-13
- DELETES 01-11-PLAN-AMENDMENT-A.md (improvised artifact type — not
  recognized in GSD artifact-types.md; content folded into SUMMARY)

Lesson for orchestrator (captured in SUMMARY §Architectural Notes):
when a plan attempts an approach that proves infeasible, the right
move is honest SUMMARY + new plan, NOT in-place rewrite + AMENDMENT
artifact. The project's own pattern (01-08, 01-09, 01-10, 01-11
added mid-phase as new work surfaced) confirms add-new-plan-when-
scope-shifts is the established pattern.

Plan 01-09 closure via harness PASS NOT achieved by 01-11; still
requires operator UAT pending Plan 01-13 landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 14:02:38 +02:00


closed_as: spike-pivot
closed_at: 2026-05-18
pivoted_in: 01-13
closure_note: Approach A (Puppeteer sw.evaluate per RESEARCH §1+6) empirically falsified; pivoted to Approach B (extension-internal-page harness, proven by c647f61 prototype) in Plan 01-13. See 01-11-SUMMARY.md for full pivot rationale + retained infrastructure.
phase: 01-stabilize-video-pipeline
plan:
type: tdd
wave:
depends_on:
- 01-08
- 01-09
files_modified:
- package.json
- package-lock.json
- vite.test.config.ts
- tsconfig.json
- src/background/index.ts
- src/offscreen/recorder.ts
- src/test-hooks/sw-hooks.ts
- src/test-hooks/offscreen-hooks.ts
- src/test-hooks/types.ts
- tests/uat/harness.test.ts
- tests/uat/lib/launch.ts
- tests/uat/lib/extension.ts
- tests/uat/lib/sw.ts
- tests/uat/lib/offscreen.ts
- tests/uat/lib/assertions.ts
- tests/uat/lib/zip.ts
- tests/uat/lib/test-hook-contract.d.ts
- tests/uat/README.md
- tests/background/no-test-hooks-in-prod-bundle.test.ts
- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md
autonomous: false
requirements:
- REQ-uat-harness-puppeteer
- REQ-uat-bug-A-coverage
- REQ-uat-bug-B-coverage
- REQ-uat-two-bundle
- REQ-uat-ci-friendly
- REQ-uat-13-assertions
- REQ-video-ring-buffer
tags:
- puppeteer
- uat
- harness
- e2e
- mv3-extension
- getDisplayMedia
- bug-B
- bug-A
- tier-1
- two-bundle
must_haves:

truths artifacts key_links

`npm run build:test` produces `dist-test/` with `__mokoshTest` hook surfaces injected into SW + offscreen contexts; `npm run build` produces `dist/` with ZERO occurrences of `__mokoshTest` (grep-verifiable).

`npm run test:uat` orchestrates `build:test` + the Puppeteer harness end-to-end; exits 0 only when ALL 14 assertions pass (13 from the brief + assertion 0 = production-bundle hook-leak grep gate).

Bug B harness assertion (track.dispatchEvent('ended') → badge OFF + popup '' + isRecording=false + NO recovery notification) demonstrably catches a regression: rewinding the b9eeeeb conditional routing locally turns this assertion RED; reapplying turns it GREEN.

Bug A harness assertion (onStartup → chrome.notifications.create resolves cleanly with the manifest's icon48.png iconUrl) demonstrably catches a regression: stubbing the icon48 file to <100 bytes turns this assertion RED; restoring turns it GREEN.

Harness runs in `--headless=new` for CI portability; local-debug mode supported via `HEADLESS=0`; no Xvfb required (per RESEARCH §3 empirical probes against Chrome 148).

Test hooks live ONLY behind `import.meta.env.MODE === 'test'` guarded dynamic imports; Vite tree-shakes them from the production bundle; the no-test-hooks-in-prod-bundle.test.ts unit gate enforces this in the existing vitest suite (Tier-1 alongside sw-bundle-import.test.ts).

Existing 83 vitest tests remain GREEN after this plan lands (no regression to the unit test bed).

Plan 01-09 functional contract closes by harness PASS: its Task 5 operator-checkpoint amendment redirects to `npm run test:uat` for steps 4-13 + 15; operator retains only step 1 (build) + step 14 (brand/design check).

path	provides	contains
vite.test.config.ts	Vite config extending the production config; sets `mode: 'test'`, `build.outDir: 'dist-test'`, `build.emptyOutDir: true`.	dist-test

path	provides	contains
src/test-hooks/types.ts	Shared TS type declaring `globalThis.__mokoshTest` shape (handlers, getCurrentStream, simulateUserStop, notificationCount, lastNotificationOptions). Single source of truth for SW + offscreen + harness.	__mokoshTest

path	provides	contains
src/test-hooks/sw-hooks.ts	SW-side test hook: captures chrome.action.onClicked / chrome.runtime.onStartup / chrome.notifications.onClicked handler refs; wraps chrome.notifications.create to record notificationCount + lastNotificationOptions. Imported dynamically from src/background/index.ts under `import.meta.env.MODE === 'test'` guard.	handlers

path	provides	contains
src/test-hooks/offscreen-hooks.ts	Offscreen-side test hook: exposes the current MediaStream via getter; provides simulateUserStop wrapping `track.dispatchEvent(new Event('ended'))` per RESEARCH §7. Imported dynamically from src/offscreen/recorder.ts under `import.meta.env.MODE === 'test'` guard.	simulateUserStop

path	provides	contains
src/background/index.ts	Adds a single `if (import.meta.env.MODE === 'test') { await import('../test-hooks/sw-hooks'); }` block at top-of-module so the hook registration runs BEFORE any production addListener calls (capturing every handler).	import.meta.env.MODE

path	provides	contains
src/offscreen/recorder.ts	Adds an `if (import.meta.env.MODE === 'test') { __sharedRefs.setMediaStreamGetter(() => mediaStream); }` block (the import itself is gated; the getter wires the runtime mediaStream reference into the hook surface). Same guard pattern as SW.	import.meta.env.MODE

path	provides	min_lines
tests/uat/harness.test.ts	Single Node script (run under tsx) implementing all 14 assertions sequentially. ~400 LoC. Top-to-bottom narrative — launch, click, assert, simulate Bug B, simulate Bug A, etc. Returns exit 0 on full pass, non-zero on any failure with structured diagnostic dump.	350

path	provides
tests/uat/lib/launch.ts	puppeteer.launch wrapper: builds args, sets enableExtensions to absolute dist-test path, chooses headless mode per CI env, configures downloads dir, exports a single launchHarnessBrowser() function.

path	provides
tests/uat/lib/extension.ts	Helpers to resolve the extension id, attach to the SW target, attach to the offscreen target (background_page type per RESEARCH §4 / Pitfall 1).

path	provides
tests/uat/lib/sw.ts	SW context helpers: getBadgeText, getPopup, getManifestIcons, fireOnStartup (via captured handler ref), sendSyntheticRecordingError, keepalivePing.

path	provides
tests/uat/lib/offscreen.ts	Offscreen context helpers: waitForOffscreenTarget, getDisplaySurface, simulateUserStop (the dispatchEvent('ended') path per RESEARCH §7 BLOCKER finding).

path	provides
tests/uat/lib/assertions.ts	Per-assertion helpers (assertEqual + structured diagnostic on failure); a runWithStartupDiagnostics wrapper that captures SW + offscreen console logs and dumps them on assertion failure for triage.

path	provides
tests/uat/lib/zip.ts	jszip-based archive shape assertions; reads downloaded `session_report_*.zip`, asserts `video/last_30sec.webm` present + `meta.json` carries `version === chrome.runtime.getManifest().version` (extension-side version read passed in).

path	provides
tests/uat/lib/test-hook-contract.d.ts	Mirror of src/test-hooks/types.ts in TS-declaration form for the harness side; documents the wire contract between hook injector and harness consumer.

path	provides
tests/uat/README.md	How to run: `npm run test:uat`; local-debug headful mode via `HEADLESS=0`; CI semantics; troubleshooting (locale-specific picker string, Xvfb fallback if a future Chrome regresses headless, dev-dependency Chromium binary size note).

path	provides
tests/background/no-test-hooks-in-prod-bundle.test.ts	Tier-1 unit-level grep gate (cousin of sw-bundle-import.test.ts): runs `npm run build` then asserts ZERO occurrences of `__mokoshTest` and ZERO occurrences of `simulateUserStop` in any file under `dist/`. RED today (the test runs before this plan lands its hook gating); GREEN after Task 1 verifies the gate AND the hook gating is correct.

path	provides	contains
package.json	Adds `puppeteer` ^25.0.2 + `tsx` ^4 to devDependencies; adds two npm scripts: `build:test` (`tsc && vite build --mode test --config vite.test.config.ts`) and `test:uat` (`npm run build:test && tsx tests/uat/harness.test.ts`).	test:uat

path	provides
tsconfig.json	Includes `src/test-hooks/**/*` in compilation surface (so tsc validates the hook code). NO change to emit (vite handles bundling).

path	provides	contains
.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md	AMENDMENT block at the end of the file: redirects Plan 01-09 Task 5 operator-checkpoint steps 4-13 + 15 to `npm run test:uat` (this plan's harness). Operator retains step 1 (build) + step 14 (brand/design accept) only. Plan 01-09 closes when `npm run test:uat` exits 0 AND operator confirms brand/design step 14.	Plan 01-11 amendment

from	to	via	pattern
tests/uat/harness.test.ts	tests/uat/lib/launch.ts:launchHarnessBrowser	import	import.from.lib/launch

from	to	via	pattern
tests/uat/lib/launch.ts	puppeteer.launch	enableExtensions + headless + autoSelect flag	enableExtensions

from	to	via	pattern
src/background/index.ts	src/test-hooks/sw-hooks.ts	guarded dynamic import	import.meta.env.MODE === ['"]test['"]

from	to	via	pattern
src/offscreen/recorder.ts	src/test-hooks/offscreen-hooks.ts	guarded dynamic import + setMediaStreamGetter wire	import.meta.env.MODE === ['"]test['"]

from	to	via	pattern
tests/uat/lib/offscreen.ts:simulateUserStop	track.dispatchEvent(new Event('ended'))	evaluate-in-offscreen-page on __mokoshTest.getCurrentStream().getVideoTracks()[0]	dispatchEvent(new Event(['"]ended['"]

from	to	via	pattern
tests/background/no-test-hooks-in-prod-bundle.test.ts	dist/ artifact tree	post-build grep for __mokoshTest + simulateUserStop	grep.__mokoshTest.dist

Scope Sanity Note

5 waves (Wave 0 through Wave 4), 8 tasks, 18 file artifacts. This sits at the upper end of the "split signal" threshold, but consolidating is the right call:

The test infrastructure (Wave 0), the hook gating (Wave 1), the harness scaffolding (Wave 2), and the 14 assertions (Wave 3) are tightly coupled at the contract level — splitting them into separate plans would force the harness contract (the __mokoshTest shape) to be re-derived in each plan's frontmatter must_haves, multiplying the duplication tax.
Per RESEARCH §6, the two-bundle gate (__mokoshTest ABSENT in production) is the security-critical mitigation for shipping test hooks. That gate MUST be wired in the same plan that adds the hooks; splitting would create a window where the hooks exist but the gate doesn't.
Wave 4 (closure) is a single checkpoint task — bundling it with Wave 3 wouldn't change context cost meaningfully, and separating it keeps the operator-checkpoint scope visible in the wave structure.
Context budget: Wave 0 + Wave 1 + Wave 2 ~30%; Wave 3 ~35%; Wave 4 ~5% (checkpoint). Total ~70%. Above the 50% target — but the 14 assertions are deterministic and template-shaped, so per-assertion authoring cost is sub-linear once Wave 2 lands.

If a future revision DOES force a split, natural cut line: Plan 01-11A = Waves 0+1+2 (infrastructure + first 4 assertions as smoke); Plan 01-11B = Waves 3+4 (remaining 10 assertions + closure). This split incurs the contract-duplication tax and is NOT recommended absent a context-cost regression.

Build a Puppeteer-driven Node UAT harness that retires the operator-as-assertion-library role. Plan 01-09's Task 5 took 4-6 hours of operator empirical UAT cycles (Bug A icons + Bug B state routing both escaped vitest unit coverage); every "visual" check in that task has a CDP-callable equivalent. This plan automates them.

Three coordinated changes:

Two-bundle separation via vite.test.config.ts extending the production config with mode: 'test' + outDir: 'dist-test'. Production builds stay hook-free.
Test hooks in src/test-hooks/ consumed via guarded dynamic imports from SW + offscreen. The dynamic-import-inside-MODE-guard pattern (RESEARCH §6) lets Vite tree-shake the hook MODULES entirely from production, with a Tier-1 grep gate (tests/background/no-test-hooks-in-prod-bundle.test.ts) verifying the absence.
Puppeteer harness at tests/uat/harness.test.ts (plus a lib/ helper split following MetaMask's POM shape per RESEARCH §5) implementing 14 assertions: assertion 0 (production-bundle hook-leak grep gate) + assertions 1-13 from the orchestrator brief. Bug B uses track.dispatchEvent(new Event('ended')) per RESEARCH §7 BLOCKER — NOT track.stop() which silently invalidates the assertion.
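A minimal sketch of the hook-leak grep gate named in the second change, assuming a plain recursive directory walk. The helper names `listFiles` and `filesContaining` are hypothetical, and a temporary directory stands in for `dist/`; the real gate would first spawn `npm run build` and then walk the actual output tree.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

/** Recursively lists every regular file under `root`. */
function listFiles(root: string): string[] {
  const out: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(full));
    else if (entry.isFile()) out.push(full);
  }
  return out;
}

/** Returns the files under `root` whose contents include `marker`. */
function filesContaining(root: string, marker: string): string[] {
  return listFiles(root).filter((file) =>
    fs.readFileSync(file, 'utf8').includes(marker),
  );
}

// Demo tree (hypothetical contents): one clean file, one file that leaks the hook.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'hook-leak-'));
fs.mkdirSync(path.join(tmp, 'assets'));
fs.writeFileSync(path.join(tmp, 'clean.js'), 'console.log("prod");');
fs.writeFileSync(path.join(tmp, 'assets', 'leaky.js'), 'globalThis.__mokoshTest = {};');

// A production gate would assert that this list is empty.
const leaks = filesContaining(tmp, '__mokoshTest').map((f) => path.relative(tmp, f));
```

Returning the offending file paths rather than a bare count keeps the failure diagnostic actionable: the gate can print which bundle chunk leaked.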

Operator role retirement: Plan 01-09's Task 5 is amended to redirect steps 4-13 + 15 to npm run test:uat. Operator retains only step 1 (build verification) + step 14 (brand/design acceptance). All functional gates move to CI-callable harness.

Output:

vite.test.config.ts — production config extension with mode: 'test' + outDir: 'dist-test'.
src/test-hooks/{sw-hooks,offscreen-hooks,types}.ts — gated hook modules.
src/background/index.ts + src/offscreen/recorder.ts — gated dynamic import block (one line each + a setMediaStreamGetter wire in offscreen).
tests/uat/harness.test.ts + tests/uat/lib/*.ts + tests/uat/README.md — harness + helpers.
tests/background/no-test-hooks-in-prod-bundle.test.ts — Tier-1 unit-level hook-leak gate.
package.json — puppeteer, tsx devDeps + build:test, test:uat scripts.
tsconfig.json — includes src/test-hooks/**/* for type-checking.
.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md — amendment block redirecting Task 5 functional steps to npm run test:uat.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/REQUIREMENTS.md @.planning/phases/01-stabilize-video-pipeline/01-CONTEXT.md @.planning/phases/01-stabilize-video-pipeline/01-08-PLAN.md @.planning/phases/01-stabilize-video-pipeline/01-08-SUMMARY.md @.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md @.planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md @.planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md @.planning/debug/resolved/01-09-recovery-flow.md @src/background/index.ts @src/offscreen/recorder.ts @manifest.json @vite.config.ts @tsconfig.json @package.json @tests/background/sw-bundle-import.test.ts

Puppeteer 25.0.2 extension API surface (RESEARCH §1, empirically verified)

import puppeteer, { Browser, Extension, Page, Target } from 'puppeteer';

const browser: Browser = await puppeteer.launch({
  pipe: true,
  enableExtensions: ['/abs/path/to/dist-test'],   // string[] or true
  headless: process.env.HEADLESS !== '0',          // default headless=true; local debug HEADLESS=0
  args: [
    '--no-sandbox',
    '--auto-select-desktop-capture-source=Entire screen',  // RESEARCH §9 — locale-specific
    // DO NOT add --use-fake-ui-for-media-stream (per RESEARCH §9 Pitfall, conflicts with auto-select)
  ],
});

const extensions = await browser.extensions();   // Map<id, Extension>
const [extId, ext] = [...extensions][0];

const swTarget = await browser.waitForTarget(
  (t: Target) => t.type() === 'service_worker',
  { timeout: 10_000 },
);
const sw = await swTarget.worker();              // WebWorker — has .evaluate()

const page = await browser.newPage();
await page.goto('about:blank');
await page.triggerExtensionAction(ext);          // simulates toolbar click (NEEDS popup === '')

// Offscreen page — RESEARCH §4 / Pitfall 1: target type 'background_page' NOT 'page'
const offTarget = browser.targets().find((t) =>
  t.type() === 'background_page' && t.url().includes('offscreen'),
);
const offPage = await offTarget.asPage();        // NOT .page() — only .asPage() works
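The `find` call above returns `undefined` if the offscreen document has not been created yet. A hedged sketch of the polling idea behind a `waitForOffscreenTarget` helper follows; `TargetLike` is a hypothetical minimal stand-in for Puppeteer's `Target`, and the timeout values are illustrative. In the real harness, `browser.waitForTarget` (already used above for the service worker) would likely serve the same purpose.

```typescript
/** Hypothetical minimal view of a target: only the two methods used here. */
interface TargetLike {
  type(): string;
  url(): string;
}

function isOffscreenTarget(t: TargetLike): boolean {
  return t.type() === 'background_page' && t.url().includes('offscreen');
}

/** Polls `listTargets` until an offscreen target appears or the timeout elapses. */
async function waitForOffscreenTarget(
  listTargets: () => TargetLike[],
  timeoutMs = 2_000,
  intervalMs = 20,
): Promise<TargetLike> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const hit = listTargets().find(isOffscreenTarget);
    if (hit !== undefined) return hit;
    if (Date.now() >= deadline) {
      throw new Error(`no offscreen target within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo: the offscreen document "appears" on the third poll.
const fake = (type: string, url: string): TargetLike => ({
  type: () => type,
  url: () => url,
});
let polls = 0;
const found = waitForOffscreenTarget(() => {
  polls += 1;
  const targets = [fake('service_worker', 'chrome-extension://id/sw.js')];
  if (polls >= 3) {
    targets.push(fake('background_page', 'chrome-extension://id/offscreen.html'));
  }
  return targets;
});

export {};
```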

Chrome SW state surface (read via sw.evaluate)

// Read badge text
const badge = await sw.evaluate(() => chrome.action.getBadgeText({}));

// Read popup
const popup = await sw.evaluate(() => chrome.action.getPopup({}));

// Read manifest
const manifest = await sw.evaluate(() => chrome.runtime.getManifest());
// manifest.icons === { '16': 'icons/icon16.png', '48': '...', '128': '...' }
// manifest.permissions includes 'notifications', etc.

// Synthesize RECORDING_ERROR (no hook needed — goes through onMessage handler)
await sw.evaluate(() =>
  chrome.runtime.sendMessage({ type: 'RECORDING_ERROR', error: 'codec-unsupported' }),
);

// Invoke onStartup via captured handler ref (needs hook — see sw-hooks.ts)
await sw.evaluate(() => globalThis.__mokoshTest!.handlers.onStartup?.());

// Fetch an extension file and check size
const iconSize = await sw.evaluate(async () => {
  const r = await fetch(chrome.runtime.getURL('icons/icon48.png'));
  return r.ok ? Number(r.headers.get('content-length') ?? '0') : -1;
});

Offscreen surface (read via offPage.evaluate)

// Read displaySurface — RESEARCH §11 Req 3
const ds = await offPage.evaluate(() =>
  globalThis.__mokoshTest!.getCurrentStream!()?.getVideoTracks()[0]?.getSettings().displaySurface ?? null,
);

// Simulate user-stopped — RESEARCH §7 BLOCKER. MUST be dispatchEvent, NOT track.stop().
await offPage.evaluate(() => {
  const stream = globalThis.__mokoshTest!.getCurrentStream!();
  if (stream === null) throw new Error('no current stream — recording must be active');
  const track = stream.getVideoTracks()[0];
  track.dispatchEvent(new Event('ended'));
  // Track still readyState 'live' after dispatch; production handler will
  // call stream.getTracks().forEach(t => t.stop()) which DOES release the
  // capture (just doesn't refire 'ended' on the same track — spec).
});
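Why `dispatchEvent` rather than `track.stop()`: a script-initiated `stop()` ends the track without firing `ended`, so a handler listening for `ended` never runs. The sketch below illustrates that difference with a hypothetical `FakeTrack` built on the standard `EventTarget`; it is a stand-in for illustration, not the browser's `MediaStreamTrack`.

```typescript
/** Hypothetical track stand-in: like a real track, stop() does NOT fire 'ended'. */
class FakeTrack extends EventTarget {
  readyState: 'live' | 'ended' = 'live';

  stop(): void {
    // Script-initiated stop: state changes, but no 'ended' event is dispatched.
    this.readyState = 'ended';
  }
}

/** Counts how many 'ended' events a listener observes for a given action. */
function countEndedEvents(action: (track: FakeTrack) => void): number {
  const track = new FakeTrack();
  let fired = 0;
  track.addEventListener('ended', () => {
    fired += 1;
  });
  action(track);
  return fired;
}

// stop() leaves the listener silent; dispatchEvent reaches it.
const viaStop = countEndedEvents((t) => t.stop());
const viaDispatch = countEndedEvents((t) => {
  t.dispatchEvent(new Event('ended'));
});

export {};
```

A harness that used `stop()` would observe no state change driven by the production handler, which is how the assertion would be silently invalidated.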

Test hook contract (NEW — src/test-hooks/types.ts)

// src/test-hooks/types.ts
// SINGLE SOURCE OF TRUTH for the __mokoshTest wire shape.
// Imported by sw-hooks.ts (registers), offscreen-hooks.ts (registers),
// and tests/uat/lib/test-hook-contract.d.ts (consumes — mirror).

export interface MokoshTestSurface {
  // SW handler refs (captured by sw-hooks.ts monkey-patching addListener)
  handlers: {
    onClicked: ((tab: chrome.tabs.Tab) => void | Promise<void>) | null;
    onStartup: (() => void | Promise<void>) | null;
    notificationOnClicked: ((notificationId: string) => void | Promise<void>) | null;
  };
  // SW notification observability
  notificationCount: number;
  lastNotificationOptions: chrome.notifications.NotificationOptions | null;
  notificationIds: ReadonlyArray<string>;
  // Offscreen getCurrentStream — undefined in SW context; defined in offscreen.
  // Declared optional so one type covers both contexts and the harness side
  // stays simple; in offscreen, a runtime null return is the
  // "not currently recording" signal.
  getCurrentStream?: () => MediaStream | null;
}

declare global {
  // eslint-disable-next-line no-var
  var __mokoshTest: MokoshTestSurface | undefined;
}

export {};
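The notification observability fields in the contract above could be populated by a recording wrapper along these lines. This is a sketch under stated assumptions: `OptionsLike` and `CreateFn` are hypothetical simplifications of the `chrome.notifications` types, and the notification title is invented for the demo.

```typescript
/** Hypothetical minimal options shape standing in for the chrome.notifications type. */
interface OptionsLike {
  iconUrl: string;
  title: string;
}
type CreateFn = (id: string, options: OptionsLike) => void;

interface NotificationLog {
  notificationCount: number;
  notificationIds: string[];
  lastNotificationOptions: OptionsLike | null;
}

/** Wraps `create` so every call is recorded, then forwarded unchanged. */
function wrapCreate(create: CreateFn, log: NotificationLog): CreateFn {
  return (id, options) => {
    log.notificationCount += 1;
    log.notificationIds.push(id);
    log.lastNotificationOptions = options;
    create(id, options);
  };
}

const log: NotificationLog = {
  notificationCount: 0,
  notificationIds: [],
  lastNotificationOptions: null,
};
const forwarded: string[] = [];
const create = wrapCreate((id) => {
  forwarded.push(id);
}, log);

create('mokosh-startup-1', { iconUrl: 'icons/icon48.png', title: 'Ready' });

// Combined check: total count AND prefix membership.
const exactlyOne = log.notificationCount === 1;
const hasStartupId = log.notificationIds.some((id) => id.startsWith('mokosh-startup-'));

export {};
```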

Production hook-gate pattern (src/background/index.ts top-of-module)

// AT THE VERY TOP of src/background/index.ts, BEFORE any addListener calls.
// import.meta.env.MODE is statically replaced at build time by Vite (RESEARCH §6);
// the entire `if` block + its dynamic import are tree-shaken from production bundles
// because the literal === comparison resolves to `false` and Rollup deletes the
// unreachable branch.
if (import.meta.env.MODE === 'test') {
  await import('../test-hooks/sw-hooks');
}

CRITICAL ORDERING: the hook import MUST run BEFORE any production addListener calls so the monkey-patches catch the handlers as they register. Top-of-module placement satisfies this.
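The ordering constraint can be illustrated with a small sketch. `FakeEvent` is a hypothetical stand-in for a `chrome.*` event object, and `captureListeners` is an illustrative name, not the actual `sw-hooks.ts` implementation. A listener registered before the patch is forwarded but never captured.

```typescript
type Listener = () => void;

/** Hypothetical stand-in for a chrome.* event object (illustration only). */
class FakeEvent {
  readonly listeners: Listener[] = [];

  addListener(listener: Listener): void {
    this.listeners.push(listener);
  }
}

/** Patches `event.addListener` so later registrations are captured AND forwarded. */
function captureListeners(event: FakeEvent): { captured: Listener | null } {
  const cell: { captured: Listener | null } = { captured: null };
  const original = event.addListener.bind(event);
  event.addListener = (listener: Listener): void => {
    cell.captured = listener; // remember the handler ref for the harness
    original(listener); // still register it with the underlying event
  };
  return cell;
}

const onStartup = new FakeEvent();
const early: Listener = () => {};
const late: Listener = () => {};

onStartup.addListener(early); // registered BEFORE the patch: not captured
const cell = captureListeners(onStartup);
const capturedBeforeLate = cell.captured; // still null at this point
onStartup.addListener(late); // registered AFTER the patch: captured

export {};
```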

Production hook-gate pattern (src/offscreen/recorder.ts)

// Top-of-module: register the hook.
if (import.meta.env.MODE === 'test') {
  await import('../test-hooks/offscreen-hooks');
}

// Later, INSIDE startRecording after `mediaStream = stream;` (line ~247):
// Wire the runtime mediaStream reference into the hook. The hook's
// getCurrentStream getter reads through this wire. Gated identically so
// production bundle has zero hook reference at this site.
if (import.meta.env.MODE === 'test') {
  // The hook installs its own getCurrentStream getter at registration time via a
  // closure over a `currentStream` cell; this call only updates that cell.
  // Re-importing an already-loaded module resolves to the same module instance.
  const hooks = await import('../test-hooks/offscreen-hooks');
  hooks.setCurrentStream(stream);
}

(Note: the executor may flatten this — the simpler shape is to expose a setCurrentStream function from offscreen-hooks.ts that the recorder calls after assignment. The hook-side closes over a mutable currentStream variable. See Task 2 step 5.)
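A minimal sketch of the flattened shape described in the note, assuming a module-level cell. `StreamLike` is a hypothetical stand-in for `MediaStream` so the sketch runs outside a browser; the real module would attach the getter to `globalThis.__mokoshTest`.

```typescript
/** Hypothetical stand-in for MediaStream so the sketch runs outside a browser. */
interface StreamLike {
  id: string;
}

// Module-level cell: the hook module closes over this variable.
let currentStream: StreamLike | null = null;

/** Called by the recorder right after it assigns its own stream reference. */
function setCurrentStream(stream: StreamLike): void {
  currentStream = stream;
}

/** Shape exposed on the test surface: a getter over the captured cell. */
const surface = {
  getCurrentStream: (): StreamLike | null => currentStream,
};

const before = surface.getCurrentStream(); // null: nothing recorded yet
setCurrentStream({ id: 'display-capture' });
const during = surface.getCurrentStream(); // the stream the recorder just set

export {};
```

Because the getter reads the cell on every call, the harness always sees the most recent stream without the recorder exporting any state of its own.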

Vite test config skeleton (vite.test.config.ts)

import { defineConfig, mergeConfig } from 'vite';
import baseConfig from './vite.config';

export default defineConfig(() =>
  mergeConfig(baseConfig, {
    mode: 'test',
    build: {
      outDir: 'dist-test',
      emptyOutDir: true,
    },
  }),
);

npm scripts to add (package.json)

{
  "scripts": {
    "dev": "vite",
    "build": "tsc && vite build",
    "build:test": "tsc && vite build --mode test --config vite.test.config.ts",
    "preview": "vite preview",
    "test": "vitest run",
    "test:uat": "npm run build:test && tsx tests/uat/harness.test.ts"
  }
}

Existing surfaces the executor must NOT alter (regression risk)

src/background/index.ts lines 725-778 (RECORDING_ERROR conditional routing) — Bug B fix landed at b9eeeeb; harness asserts this is intact.
src/offscreen/recorder.ts lines 451-480 (onUserStoppedSharing) — Bug B handler; harness assertion 6 verifies the dispatchEvent path reaches it.
tests/background/sw-bundle-import.test.ts — Tier-1 gate; the new no-test-hooks-in-prod-bundle.test.ts follows the same pattern but inspects the BUILT artifact for hook leaks.
manifest.json — already declares notifications permission + all 3 icon sizes; harness assertions 8, 9, 10 read these as-is.
ALL existing 83 vitest tests — must remain GREEN.

Resolved open questions from RESEARCH (5)

#	Question	Resolution	Rationale
1	Where does `simulateUserStop` shim live?	`src/test-hooks/offscreen-hooks.ts` exports a `setCurrentStream(stream: MediaStream)` setter the recorder calls after assignment. The hook's `__mokoshTest.getCurrentStream` is a getter over the captured cell. `simulateUserStop` is harness-side (in `tests/uat/lib/offscreen.ts`) calling `dispatchEvent` directly on the track returned by `getCurrentStream()` — the offscreen-hooks side just exposes the stream; the simulate function is harness-side.	Minimum surface in production tree; the dispatchEvent invocation is harness-side so it's never bundled.
2	Notification assertions: count vs set-membership?	Count + set-membership combined. notificationCount asserts on TOTAL count (e.g. assertion 8: exactly 1 startup notification). notificationIds asserts on prefix membership (e.g. "an id starting 'mokosh-startup-' was created"). lastNotificationOptions asserts on iconUrl shape.	Pure count is brittle (retries inflate); pure set-membership misses overcount regressions. Combined assertions catch both.
3	CI plumbing scope: include or defer?	Defer to Phase 5 (P1/P2 hardening) or its own Plan 01-12. This plan ships a CI-callable harness (`npm run test:uat` exits 0 on pass, non-zero on fail) but no GitHub Actions wiring. Rationale: no existing CI infrastructure in the repo (verified — no `.github/workflows/` directory); adding CI here would force a CI-tool decision (Actions vs self-hosted) that is out of scope for Phase 1 stabilization.	Lowest-friction shipping; CI tool selection deserves its own plan.
4	Failure isolation: single browser vs per-assertion restart?	Single browser, serial assertions. Restart between assertions = ~3-5 s × 14 ≈ 42-70 s overhead per run. Single browser keeps total runtime under 60 s. Mitigation: structured diagnostic dump on first failure (SW console logs + offscreen console logs + screenshot) + `--bail` semantics (abort remaining assertions to keep failure mode unambiguous).	RESEARCH §5 recommendation matches; cost of state bleed is much lower than cost of state isolation overhead for 14 deterministic checks.
5	Test-hook contract location?	Both. Production-side canonical: `src/test-hooks/types.ts` (the file that ships with the test bundle and is type-checked by tsc). Harness-side mirror: `tests/uat/lib/test-hook-contract.d.ts` (decoupled from the production tree so the harness has no `import` reaching into `src/`). The mirror file's preamble cites the production-side file as the canonical source. Drift detection: a Tier-1-style test could later snapshot-diff the two; out of scope here, but documented as a follow-up note.	Type duplication is a small price for keeping `tests/` and `src/` import-separable. The drift risk is low because the shape is small (4 required fields plus the optional `getCurrentStream`).
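Resolution 4 (single browser, serial assertions, bail on first failure) could be sketched as the runner below. The `Assertion` and `RunResult` shapes and the assertion names are hypothetical; the real harness would also dump SW and offscreen logs at the point of failure.

```typescript
interface Assertion {
  name: string;
  run: () => Promise<void>;
}

interface RunResult {
  passed: string[];
  failed: string | null;
  skipped: string[];
}

/** Runs assertions in order against shared state; stops at the first failure. */
async function runSerial(assertions: Assertion[]): Promise<RunResult> {
  const result: RunResult = { passed: [], failed: null, skipped: [] };
  for (let i = 0; i < assertions.length; i += 1) {
    const assertion = assertions[i];
    try {
      await assertion.run();
      result.passed.push(assertion.name);
    } catch {
      result.failed = assertion.name;
      // Bail: everything after the first failure is reported as skipped.
      result.skipped = assertions.slice(i + 1).map((a) => a.name);
      break;
    }
  }
  return result;
}

const demo: Assertion[] = [
  { name: 'badge-on', run: async () => {} },
  {
    name: 'bug-b',
    run: async () => {
      throw new Error("expected badge text '' but got 'ERR'");
    },
  },
  { name: 'zip-shape', run: async () => {} },
];
const outcome = runSerial(demo);

// Restart overhead the plan avoids: 14 restarts at 3 to 5 s each.
const restartOverheadSeconds = [3 * 14, 5 * 14]; // [42, 70]

export {};
```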

How to test Bug B without committing the revert

Per orchestrator brief ("rewinding the b9eeeeb conditional routing locally turns this assertion RED"):

Locally apply: git apply <<'EOF' ... EOF containing a temporary patch that reverts the if (errorCode === 'user-stopped-sharing') branch (so all errors route through setErrorMode).
Run npm run test:uat; assertion 6 (Bug B) MUST fail with a specific diagnostic (expected badge text '' but got 'ERR').
Revert the local patch (git checkout -- src/background/index.ts).
Re-run npm run test:uat; assertion 6 MUST pass.

This RED-on-known-broken / GREEN-on-known-good cycle is the TDD discipline for the harness ITSELF. Each assertion in Task 5/6/7 includes this self-verification step in its action block.
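The RED/GREEN cycle depends on the assertion producing a specific diagnostic. A hedged sketch of such a helper follows; `AssertionFailure`, `assertEqual`, and `check` are illustrative names and shapes, not the contents of `tests/uat/lib/assertions.ts`.

```typescript
/** Error carrying the fields a triage dump would need (hypothetical shape). */
class AssertionFailure extends Error {
  readonly label: string;
  readonly expected: unknown;
  readonly actual: unknown;

  constructor(label: string, expected: unknown, actual: unknown) {
    super(`${label}: expected ${JSON.stringify(expected)} but got ${JSON.stringify(actual)}`);
    this.name = 'AssertionFailure';
    this.label = label;
    this.expected = expected;
    this.actual = actual;
  }
}

function assertEqual<T>(label: string, actual: T, expected: T): void {
  if (!Object.is(actual, expected)) {
    throw new AssertionFailure(label, expected, actual);
  }
}

/** Returns the failure message, or null when the check passes. */
function check(badgeText: string): string | null {
  try {
    assertEqual('badge text after user stop', badgeText, '');
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const green = check(''); // known-good routing: badge cleared
const red = check('ERR'); // known-broken routing: error badge shown

export {};
```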

Task 1 (Wave 0): Install Puppeteer + tsx; add `vite.test.config.ts`; add `build:test` + `test:uat` npm scripts; commit Tier-1 hook-leak grep gate as RED.

Read first:
- package.json (existing scripts + devDeps; confirm puppeteer + tsx absent)
- vite.config.ts (the base config the new test config will merge over)
- tests/background/sw-bundle-import.test.ts (Tier-1 gate pattern to mirror)
- tsconfig.json (confirm `include` covers `src/**/*`; needed for src/test-hooks/)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §10 (two-bundle build orchestration)

Files: package.json, package-lock.json, vite.test.config.ts, tsconfig.json, tests/background/no-test-hooks-in-prod-bundle.test.ts

Behavior:
- `npm install --save-dev puppeteer@^25.0.2 tsx@^4` lands cleanly. Both are published on the npm registry under permissive licenses (puppeteer is Apache-2.0, tsx is MIT) with active maintenance windows (puppeteer 25.0.2 published 2025; tsx 4.x current). Pin both with caret ranges per project convention.
- `vite.test.config.ts` exists, extends `./vite.config.ts` via `mergeConfig`, sets `mode: 'test'` + `build.outDir: 'dist-test'` + `build.emptyOutDir: true`. Running `npx vite build --config vite.test.config.ts --mode test` produces `dist-test/` (verifiable via `test -d dist-test`).
- `package.json` `scripts` block adds `build:test` and `test:uat` per the interfaces block. `npm run build:test` exits 0 and produces `dist-test/`.
- `tsconfig.json` `include` covers `src/test-hooks/**/*` (verify it does already via the `src/**/*` glob; no edit needed if `include` is already that wildcard; check first and only add if absent).
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` exists with TWO `it` blocks:
  (a) After `npm run build`, ZERO occurrences of `__mokoshTest` in any file under `dist/`. RED today because the gate test is committed BEFORE the hooks land; the gate is asserting on a not-yet-extant invariant. **CORRECTION:** RED-then-GREEN polarity here is inverted vs typical TDD: the gate ITSELF is GREEN today (no hooks → no leak), but the GATE must REMAIN GREEN after Task 2 lands the hooks. The test is committed in this task so the gate is operational BEFORE the hooks ship, eliminating the window-of-vulnerability where the production bundle could contain leaked hooks. Document this polarity in the test file preamble.
  (b) After `npm run build`, ZERO occurrences of `simulateUserStop` in any file under `dist/`. Same polarity: GREEN today, must remain GREEN after hooks land.
- Both `it` blocks run a fresh `npm run build` as part of their setup (spawned via `child_process.execFile`, mirroring sw-bundle-import.test.ts's spawn pattern). They then `readdir`+`readFileSync` walk `dist/` and assert grep counts are zero. Skip the build spawn if `process.env.SKIP_BUILD === '1'` (developer escape hatch when running the test repeatedly during this task's iteration).
- The 83 baseline vitest tests + 2 new gate tests = 85 tests, ALL GREEN. (The Tier-1 gate is committed in a working state from day one.)

Action:
1. Read `package.json` to confirm `puppeteer` + `tsx` absent.
2. `npm install --save-dev puppeteer@^25.0.2 tsx@^4`; observe versions resolve correctly. Document the actually-resolved versions in the commit message body.
3. Update `package.json` `scripts` block per the interfaces section: add `build:test` and `test:uat`. Leave existing scripts (`dev`, `build`, `preview`, `test`) untouched.
4. Create `vite.test.config.ts` at repo root per the interfaces skeleton.
5. Verify `tsconfig.json` `include` covers `src/test-hooks/**/*`: if `include` is `["src/**/*"]` and no `exclude` entry blocks the directory, no edit is needed. Document the actual `tsconfig.json` shape in the commit message body so reviewers see the verification ran.
6. Run `npm run build:test` → exit 0; `ls dist-test/` confirms emission. Run `npm run build` → exit 0; `ls dist/` confirms separate output.
7. Create `tests/background/no-test-hooks-in-prod-bundle.test.ts` with the two `it` blocks per behavior (a) + (b). Preamble docstring per project style: extensive (Google Python style mandate carries over; keep mirroring sw-bundle-import.test.ts's docstring density). Cite that this is a Tier-1 gate per `feedback-pre-checkpoint-bundle-gates.md` (the auto-loaded memory item).
8. Run `npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts` → both GREEN (no hooks landed yet, nothing leaks).
9. Run `npx vitest run` (full suite) → 83 baseline + 2 new = 85 GREEN. Document the baseline + delta in the commit message body.
10. Run `npx tsc --noEmit` → exit 0.
11. Verify there is NO `npm test` regression: rerun `npm test` → 85 GREEN.

Per project style: extensive docstrings; absolute imports; no `as any`; no `@ts-ignore`. The new test file is the first one to touch `child_process.execFile` since `sw-bundle-import.test.ts`; mirror that file's pattern verbatim (execFile + maxBuffer + timeout + stdout sentinel scheme). Do NOT introduce a new pattern.

Verify: `npm run build:test && npm run build && test -d dist-test && test -d dist && npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts && npx tsc --noEmit`

Acceptance criteria:
- `package.json` devDeps include `puppeteer` + `tsx` at the pinned versions; `scripts` block carries `build:test` + `test:uat`.
- `vite.test.config.ts` exists, extends base config, emits to `dist-test/`.
- `npm run build:test` exits 0; `dist-test/` populated.
- `npm run build` exits 0; `dist/` populated separately (no clobber).
- `tests/background/no-test-hooks-in-prod-bundle.test.ts` exists with 2 tests; both GREEN.
- Full vitest suite: 83 baseline + 2 new = 85 GREEN.
- `npx tsc --noEmit` exit 0.

Done: Two-bundle infrastructure landed; Tier-1 hook-leak gate operational (GREEN, will remain GREEN after Task 2 hooks land); npm scripts wired; baseline preserved.

Task 2 (Wave 1): Add gated test hooks to SW + offscreen; verify production bundle remains hook-free (Tier-1 gate stays GREEN). - src/background/index.ts (top-of-module — where the import.meta.env.MODE guard lands; lines 1-50) - src/offscreen/recorder.ts (top-of-module + line ~247 where mediaStream is assigned) - tests/background/sw-bundle-import.test.ts (the Tier-1 SW-bundle-loadability gate — confirm it still passes after hooks land in test bundle) - tests/background/no-test-hooks-in-prod-bundle.test.ts (the gate from Task 1) - .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §6 (Vite tree-shaking gotchas) - vite.test.config.ts (from Task 1) src/test-hooks/types.ts, src/test-hooks/sw-hooks.ts, src/test-hooks/offscreen-hooks.ts, src/background/index.ts, src/offscreen/recorder.ts, tests/uat/lib/test-hook-contract.d.ts - `src/test-hooks/types.ts` exports `MokoshTestSurface` + declares `globalThis.__mokoshTest` per the interfaces block. - `src/test-hooks/sw-hooks.ts` registers the SW-side hook at module-load: monkey-patches `chrome.action.onClicked.addListener`, `chrome.runtime.onStartup.addListener`, `chrome.notifications.onClicked.addListener` to capture handler refs while still calling the originals. Wraps `chrome.notifications.create` to increment `notificationCount`, push id to `notificationIds`, save `lastNotificationOptions`. Initializes `globalThis.__mokoshTest = { handlers: {...}, notificationCount: 0, lastNotificationOptions: null, notificationIds: [] }`. NO `getCurrentStream` in SW (the field is optional per type — undefined in SW context). - `src/test-hooks/offscreen-hooks.ts` registers the offscreen-side hook: exposes a mutable `currentStream: MediaStream | null` cell + `setCurrentStream(s)` setter + `__mokoshTest.getCurrentStream = () => currentStream` getter. The recorder calls `setCurrentStream` after the `mediaStream = stream` assignment (gated by the same MODE check). 
- `src/background/index.ts` top-of-module gets: ```typescript if (import.meta.env.MODE === 'test') { await import('../test-hooks/sw-hooks'); } ``` Placement: BEFORE any `addListener` calls in the file so the monkey-patches catch every handler. This is a top-level `await` — supported in SW context per crxjs/Vite's MV3 module emission. - `src/offscreen/recorder.ts` top-of-module gets the symmetric gated import; the `setCurrentStream` call lands inside `startRecording` right after `mediaStream = stream;` (line 247), also gated. - `tests/uat/lib/test-hook-contract.d.ts` mirrors `MokoshTestSurface` for harness-side consumption (it's a declaration file; not bundled, only used at type-check time on the harness). - After all changes, `npm run build` exits 0 AND `tests/background/no-test-hooks-in-prod-bundle.test.ts` REMAINS GREEN (the literal `__mokoshTest` does NOT appear in any file under `dist/`). `npm run build:test` exits 0 AND ONE OR MORE files under `dist-test/` contain `__mokoshTest` (verifiable by `grep -l __mokoshTest dist-test/`). - `tests/background/sw-bundle-import.test.ts` REMAINS GREEN (Layer 1 + Layer 2; the gated dynamic import does not break the production bundle's module init). - Full vitest suite: 85 GREEN (no regression). 1. Create `src/test-hooks/types.ts` per the interfaces block. Extensive JSDoc; cite this plan's Task 2 + RESEARCH §6 (gating mechanism) + RESEARCH §7 (Bug B BLOCKER context for getCurrentStream's role). 2. Create `src/test-hooks/sw-hooks.ts`. Monkey-patch pattern follows RESEARCH §6 Pattern 1. Wrap `chrome.notifications.create` so all four shape fields update (count, last options, ids array, no-op chain to the original create). Use absolute Chrome types from `@types/chrome` — no `as any`. Initialization at module load: ```typescript const handlers: MokoshTestSurface['handlers'] = { onClicked: null, onStartup: null, notificationOnClicked: null, }; const notificationIds: string[] = [];

   const origActionAdd = chrome.action.onClicked.addListener.bind(chrome.action.onClicked);
   chrome.action.onClicked.addListener = (cb) => {
     handlers.onClicked = cb;
     return origActionAdd(cb);
   };
   // ... similarly for onStartup, notifications.onClicked ...

   const origNotifCreate = chrome.notifications.create.bind(chrome.notifications);
   (chrome.notifications.create as unknown) = (idOrOptions: string | chrome.notifications.NotificationOptions, optionsOrCb?: chrome.notifications.NotificationOptions | ((id: string) => void), maybeCb?: (id: string) => void) => {
     // Handle both (id, options, cb) and (options, cb) overloads;
     // surface the resolved id in notificationIds.
     // Call origNotifCreate with the same args; wrap the callback to push id.
     // Increment notificationCount; save lastNotificationOptions.
     // Return the original return value (Chrome 88+ also Promise-returning).
   };

   globalThis.__mokoshTest = {
     handlers,
     notificationCount: 0,
     lastNotificationOptions: null,
     get notificationIds() { return notificationIds.slice(); },
   };
   ```
   The `as unknown` cast in the `create` reassignment is unavoidable because Chrome's `create` is typed as overloaded callable; document this explicitly with a comment citing the overload variance issue. NO `as any` — the `as unknown` + downstream typed body is the project-style escape hatch.
3. Create `src/test-hooks/offscreen-hooks.ts`:
   ```typescript
   let currentStream: MediaStream | null = null;
   export function setCurrentStream(stream: MediaStream | null): void {
     currentStream = stream;
   }
   globalThis.__mokoshTest = {
     // ...inherit SW's surface if it was set first; in offscreen context
     // sw-hooks.ts did NOT run because this is a different document.
     // So we initialize a fresh shape with only the offscreen-relevant fields:
     handlers: { onClicked: null, onStartup: null, notificationOnClicked: null },
     notificationCount: 0,
     lastNotificationOptions: null,
     notificationIds: [],
     getCurrentStream: () => currentStream,
   };
   ```
   Note: the SW and offscreen are DIFFERENT JS isolates with DIFFERENT `globalThis`. The harness reads each surface via the appropriate `sw.evaluate` or `offPage.evaluate`. No cross-context shared state.
4. Edit `src/background/index.ts` — add the gated dynamic import at the TOP of the file (after any necessary type imports but BEFORE the existing logger initialization + addListener calls). Document inline that the placement is load-order-critical: this MUST run before any addListener.
5. Edit `src/offscreen/recorder.ts`:
   (a) Top-of-module: gated dynamic import per the SW pattern.
   (b) Inside `startRecording`, immediately after `mediaStream = stream;` (line ~247): gated `setCurrentStream(stream)` call. Use a top-level captured reference to the hooks module (set during the top-of-module import via a module-scoped `let hooks: typeof import('../test-hooks/offscreen-hooks') | null = null;` plus assignment in the import block). This avoids re-import per startRecording call.
6. Create `tests/uat/lib/test-hook-contract.d.ts`. Mirror `MokoshTestSurface`. Add a preamble docstring citing `src/test-hooks/types.ts` as the canonical source AND noting the drift-risk (manual sync) + the rationale for decoupling (no `import` from `tests/` into `src/`).
7. Run `npx tsc --noEmit` → exit 0 (all hook code typechecks).
8. Run `npm run build` (production). Then check `grep -rln __mokoshTest dist/` → ZERO matches. The Tier-1 gate test `tests/background/no-test-hooks-in-prod-bundle.test.ts` MUST stay GREEN.
9. Run `npm run build:test`. Then check `grep -rln __mokoshTest dist-test/` → ONE OR MORE matches (the hook code is bundled into the test build).
10. Run `npx vitest run` (full suite). 85 GREEN. The SW-bundle-import test must also be GREEN — verifies the gated dynamic import does NOT break production module init.
11. Sanity-check: open one of the production bundle's chunk files (the SW chunk via `dist/service-worker-loader.js` → its imported chunk) and confirm by eye that no `__mokoshTest` string is present. The grep gate is authoritative, but a manual eyeball ensures the gate isn't fooled by some bundler renaming.
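
Step 2 leaves the body of the `chrome.notifications.create` wrapper as comments. One way to handle the two call shapes is to normalize the arguments before recording anything. The sketch below is only an illustration: `NotificationOptionsLike`, `NormalizedCreateCall`, and `normalizeCreateArgs` are hypothetical names, and the options type is a minimal stand-in rather than the real `@types/chrome` definition.

```typescript
/** Minimal stand-in for chrome.notifications.NotificationOptions (assumption). */
interface NotificationOptionsLike {
  title?: string;
  message?: string;
  iconUrl?: string;
}

type CreateCallback = (id: string) => void;

/** Hypothetical normalized view of one create() call. */
interface NormalizedCreateCall {
  id: string | undefined; // undefined when the caller let Chrome generate the id
  options: NotificationOptionsLike;
  callback: CreateCallback | undefined;
}

/** Maps both (id, options, cb?) and (options, cb?) call shapes onto one record. */
function normalizeCreateArgs(
  idOrOptions: string | NotificationOptionsLike,
  optionsOrCb?: NotificationOptionsLike | CreateCallback,
  maybeCb?: CreateCallback,
): NormalizedCreateCall {
  if (typeof idOrOptions === 'string') {
    // (id, options, cb?) shape: the second argument must be the options object.
    if (optionsOrCb === undefined || typeof optionsOrCb === 'function') {
      throw new TypeError('create(id, options, cb?) requires an options object');
    }
    return { id: idOrOptions, options: optionsOrCb, callback: maybeCb };
  }
  // (options, cb?) shape: the second argument, if present, is the callback.
  const callback = typeof optionsOrCb === 'function' ? optionsOrCb : undefined;
  return { id: undefined, options: idOrOptions, callback };
}

// Usage sketch: record two calls the way the hook's counters would see them.
const seen: NormalizedCreateCall[] = [];
seen.push(normalizeCreateArgs('mokosh-recovery-1', { title: 'Mokosh stopped' }));
seen.push(normalizeCreateArgs({ title: 'Mokosh ready' }, () => {}));
console.log(seen.map((call) => call.id));
```

In the real hook, the normalized record would feed `notificationCount`, `lastNotificationOptions`, and `notificationIds` before delegating to the original `create`; when `id` is undefined, the resolved id is only known from the callback or the returned Promise.
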
DESIGN NOTE: the gated dynamic import IS the tree-shake trigger. If Vite ever fails to tree-shake a dynamic import behind a literal-comparison guard (which it shouldn't per RESEARCH §6 — the literal `'test'` !== `'production'` comparison is a static dead branch in production), the Tier-1 gate fails LOUDLY at CI time. The gate is THE mitigation for assumption A3 in RESEARCH §6.
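
The grep gate that backs this design note can be sketched as a small recursive directory walk. This is a minimal illustration, not the project's implementation: the plan names a `grepRecursive` helper without defining it, so the signature and synchronous style here are assumptions, and the temporary directory stands in for `dist/`.

```typescript
import { mkdtempSync, mkdirSync, readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/**
 * Recursively collects the paths of files under `dir` whose bytes contain `needle`.
 * Hypothetical helper: the plan references `grepRecursive` but does not define it.
 */
function grepRecursive(dir: string, needle: string): string[] {
  const matches: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      matches.push(...grepRecursive(full, needle));
    } else if (entry.isFile() && readFileSync(full).includes(needle)) {
      // Buffer.includes does a byte-level substring search, so binary assets are safe to scan.
      matches.push(full);
    }
  }
  return matches;
}

// Usage sketch against a throwaway directory (a stand-in for dist/).
const root = mkdtempSync(join(tmpdir(), 'grep-gate-'));
mkdirSync(join(root, 'assets'));
writeFileSync(join(root, 'assets', 'sw.js'), 'globalThis.__mokoshTest = {};');
writeFileSync(join(root, 'index.js'), 'console.log("clean");');

const leaked = grepRecursive(root, '__mokoshTest');
const clean = grepRecursive(root, 'simulateUserStop');
console.log(leaked.length, clean.length); // prints: 1 0
```

Because the helper returns an array synchronously, `await grepRecursive(...)` still works at call sites written in async style. The gate then reduces to asserting that the returned array is empty for `dist/` and non-empty for `dist-test/`.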

npx tsc --noEmit && npm run build && test "$(grep -rln __mokoshTest dist/ | wc -l)" = "0" && npm run build:test && test "$(grep -rln __mokoshTest dist-test/ | wc -l)" -ge "1" && npx vitest run --reporter=dot - `src/test-hooks/{types,sw-hooks,offscreen-hooks}.ts` exist with the contracts described. - `src/background/index.ts` + `src/offscreen/recorder.ts` carry the gated dynamic import block; in offscreen, also the `setCurrentStream(stream)` call inside `startRecording`. - `tests/uat/lib/test-hook-contract.d.ts` mirrors the type. - `npm run build` exits 0; `grep -rln __mokoshTest dist/` → 0 matches. - `npm run build:test` exits 0; `grep -rln __mokoshTest dist-test/` → ≥1 match. - Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) GREEN. - Tier-1 SW-bundle-import gate (`tests/background/sw-bundle-import.test.ts`) GREEN. - Full vitest suite: 85 GREEN. - `npx tsc --noEmit` exit 0. Hook surfaces live in test bundle; absent in production bundle (Tier-1 grep gate verifies); SW + offscreen module init unchanged for production; baseline preserved. Task 3 (Wave 2): Build harness scaffolding — `tests/uat/lib/{launch,extension,sw,offscreen,assertions,zip}.ts` + `harness.test.ts` skeleton with all 14 assertions stubbed as failing. 
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §1 (Puppeteer extension API patterns) - .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §4 (target type quirk for offscreen) - .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §7 (Bug B dispatchEvent contract — BLOCKER) - .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints) - src/test-hooks/types.ts (from Task 2) - tests/uat/lib/test-hook-contract.d.ts (from Task 2) - tests/background/sw-bundle-import.test.ts (execFile child-process pattern — only relevant for assertion 0 which uses fs.readdir directly, not a spawned child) tests/uat/lib/launch.ts, tests/uat/lib/extension.ts, tests/uat/lib/sw.ts, tests/uat/lib/offscreen.ts, tests/uat/lib/assertions.ts, tests/uat/lib/zip.ts, tests/uat/harness.test.ts, tests/uat/README.md - `tests/uat/lib/launch.ts` exports `launchHarnessBrowser(options?: HarnessOptions): Promise` returning `{ browser, sw, ext, page, downloadsDir }`. Reads `HEADLESS` env var (`'0'` = headful for debug, anything else = headless). Wires Chrome args per RESEARCH §1 + §9. - `tests/uat/lib/extension.ts` exports `attachToSw`, `attachToOffscreen`, `waitForOffscreen` per the RESEARCH §4 patterns. The offscreen attach uses the `background_page` target type + `.asPage()` (Pitfall 1). - `tests/uat/lib/sw.ts` exports `getBadgeText(sw)`, `getPopup(sw)`, `getManifest(sw)`, `getIconSize(sw, path)`, `fireOnStartup(sw)`, `sendSyntheticRecordingError(sw, errorCode)`, `keepalivePing(sw)`, `getNotificationSnapshot(sw)`. - `tests/uat/lib/offscreen.ts` exports `getDisplaySurface(offPage)`, `simulateUserStop(offPage)` (the dispatchEvent path per RESEARCH §7 BLOCKER — with an inline comment block citing the BLOCKER reasoning so future readers don't refactor it to `track.stop()`). 
- `tests/uat/lib/assertions.ts` exports `assertEqual(actual, expected, msg)` + `assertMatch(actual, regex, msg)` + `assertTrue(cond, msg)` + a structured `runAssertion(name, fn)` wrapper that runs a single assertion, captures any SW/offscreen console logs since the last assertion, and dumps them to stderr on failure. Uses `node:assert/strict` per RESEARCH §4. - `tests/uat/lib/zip.ts` exports `assertArchiveShape(zipBuf, expectedVersion)` — opens with jszip, asserts `video/last_30sec.webm` present + `meta.json` carries `version === expectedVersion`. The meta.json shape is per Plan 01-07 (existing archive contract — read once at the start of the harness and pass through). - `tests/uat/harness.test.ts` is the single Node script (tsx-runnable). Top-to-bottom narrative: ``` 0. Pre-flight grep gate (filesystem readdir on dist/) — assertion 0. 1. launchHarnessBrowser → attachToSw → attachToOffscreen-when-ready. 2. Assertion 1: SW bootstrap → setIdleMode (badge '', popup '', isRecording=false). 3. Assertion 2: triggerExtensionAction → wait → badge 'REC' + popup === src/popup/index.html + isRecording=true. 4. Assertion 3: offscreen track displaySurface === 'monitor'. 5. Assertion 4: triggerExtensionAction (while recording) → popup opens, NO new offscreen target. 6. Assertion 5: sendMessage SAVE_ARCHIVE → wait for download → check downloadsDir contains session_report_*.zip. 7. Assertion 6 (BUG B): simulateUserStop → wait 300ms → badge '' + popup '' + isRecording=false + notificationCount delta = 0. 8. Assertion 7 (ERROR path): sendSyntheticRecordingError('codec-unsupported') → badge 'ERR' + notificationCount delta = 1. 9. Assertion 8 (BUG A + onStartup): fireOnStartup → notifications.create called once with iconUrl matching icons/icon48.png (or icon128.png — verify which one the production code uses; the badge_state_machine plan uses icon128, but the test asserts whichever the production code actually invokes per the lastNotificationOptions snapshot). 10. 
Assertion 9: icon file sizes via sw.evaluate(fetch) ≥ floors (16: 200B, 48: 500B, 128: 1024B). 11. Assertion 10: manifest has 'notifications' permission + icons.16 + icons.48 + icons.128 declared. 12. Assertion 11 (35s record): start a fresh recording, wait 35s, query SW (via runtime message → offscreen → segments count) → segments.length >= 3. 13. Assertion 12 (ffprobe gate): trigger SAVE_ARCHIVE, extract video/last_30sec.webm, spawn ffprobe → exit 0. 14. Assertion 13 (zip shape): assertArchiveShape on the latest session_report_*.zip. 15. Final summary: `console.log('UAT harness: 14/14 assertions passed')`; exit 0. ``` ALL 14 assertions stubbed today as `runAssertion('N: title', async () => { throw new Error('NOT YET IMPLEMENTED — Task 5+ wires this'); });` so the harness exits non-zero with a clear "N assertions failed" diagnostic. Assertion 0 (filesystem-only) is wired in this task; assertions 1-13 are stubbed. - `tests/uat/README.md` documents: - How to run: `npm run test:uat` (build + harness). - Local-debug headful mode: `HEADLESS=0 npm run test:uat`. - Skipping the build (developer iteration): `SKIP_BUILD=1 npx tsx tests/uat/harness.test.ts` (the build is the npm-script wrapper; the harness itself can run against an existing `dist-test/`). - Locale gotcha: `--auto-select-desktop-capture-source="Entire screen"` works on en_US; other locales need the locale-equivalent string. Fallback to operator-pick + `KEEP_PROFILE=1` documented as the Plan 01-09 fallback. - dev-dep size: puppeteer pulls ~150MB Chromium binary; CI must accept this. Production `npm install --omit=dev` skips it cleanly. - Xvfb is NOT required (per RESEARCH §3 empirical probes on Chrome 148). - Failure isolation choice: single browser, serial assertions, bail on first failure (RESEARCH §5 + open-question resolution 4). 
- Running `npm run test:uat` exits NON-ZERO today (the 13 stubbed assertions all throw); the diagnostic clearly identifies which assertion failed AND why ("NOT YET IMPLEMENTED — Task 5+ wires this"). Assertion 0 (the grep gate) PASSES, confirming that the harness scaffolding is wired correctly and that the only failures are intentional stubs.

1. Create the `tests/uat/lib/` directory + all 6 helper files. Use absolute imports per project style. NO `as any`; type each helper's surface explicitly. Each helper file gets a top-of-file docstring per project style (extensive Google-style).
2. `launch.ts`: implementation uses `puppeteer.launch({ enableExtensions: [absolutePath], headless: ..., args: [...] })`. The absolutePath is computed via `path.resolve(__dirname, '../../../dist-test')` (the helper lives at `tests/uat/lib/launch.ts`, so `../../../` lands at repo root). Use `fileURLToPath` + `import.meta.url` for the `__dirname` shim (the harness runs as ESM under tsx).
3. `extension.ts`: implementation per RESEARCH §1 + §4 patterns. The offscreen attach uses `browser.waitForTarget(t => t.type() === 'background_page' && t.url().includes('offscreen'), { timeout: 5_000 })`. After getting the target, `.asPage()` returns the Page.
4. `sw.ts`: each helper is one or two lines of `sw.evaluate(...)`. The `getNotificationSnapshot` helper returns a structured `{ count, lastOptions, ids }` to keep the harness's reasoning unified.
5. `offscreen.ts` `simulateUserStop`:
   ```typescript
   export async function simulateUserStop(offPage: Page): Promise<void> {
     // RESEARCH §7 BLOCKER: DO NOT REFACTOR to track.stop().
     // track.stop() does NOT fire 'ended' per W3C spec (verified probe7);
     // dispatchEvent IS the only path that triggers our production
     // onUserStoppedSharing handler. A test that calls track.stop() would
     // silently pass while production reality fails, which is exactly the
     // trap the Bug B fix (commit b9eeeeb) addresses.
     await offPage.evaluate(() => {
       const stream = globalThis.__mokoshTest?.getCurrentStream?.();
       if (!stream) throw new Error('no current MediaStream — recording must be active');
       const track = stream.getVideoTracks()[0];
       if (!track) throw new Error('no video track in stream');
       track.dispatchEvent(new Event('ended'));
     });
   }
   ```
6. `assertions.ts`: `runAssertion(name, fn)` captures `console.log`/`console.error` from the harness's own process. For SW + offscreen console logs, it accepts an optional `consoleSinks` parameter: the harness wires `sw.on('console', ...)` + `offPage.on('console', ...)` listeners at launch and passes their accumulating buffers to runAssertion. On assertion failure: dump buffers to stderr with structured "SW console (last N):" + "Offscreen console (last N):" preambles; rethrow.
7. `zip.ts`: jszip-based reader. The `expectedVersion` comes from `chrome.runtime.getManifest().version` (queried once at the start of the harness via `sw.evaluate`). Assertion is exact equality.
8. `harness.test.ts`: the top-to-bottom narrative. Wrap the whole thing in a top-level `try/finally`; the `finally` always calls `browser.close()`. The 13 stubs for assertions 1-13 all throw the "NOT YET IMPLEMENTED" Error. Assertion 0 is wired in this task:
   ```typescript
   await runAssertion('0: production bundle has no __mokoshTest leak', async () => {
     // Filesystem-only: does not require the browser.
     // `npm run test:uat` only runs `npm run build:test` first, so the production
     // bundle is rebuilt here before grepping. The no-test-hooks-in-prod-bundle
     // unit test already covers this as part of `npm test`; re-verifying here
     // guards against that unit test having passed against a stale dist/.
     const { execFileSync } = await import('node:child_process');
     execFileSync('npm', ['run', 'build'], { stdio: 'inherit' });
     const distDir = path.resolve(__dirname, '../../dist');
     const matches = await grepRecursive(distDir, '__mokoshTest');
     assertEqual(matches.length, 0, 'production dist/ must not contain __mokoshTest');
   });
   ```
   NOTE: assertion 0 spawns `npm run build` from inside the harness, which costs ~10s. The unit test (Task 1) makes this somewhat redundant, but the unit test runs in the vitest pass while the harness runs separately. Belt + suspenders. Alternative: skip the spawn if `process.env.SKIP_PROD_REBUILD === '1'` for developer iteration.
9. `README.md`: per the behavior list.
10. Run `npm run test:uat`. Expected output:
    - `npm run build:test` runs first (succeeds; emits dist-test/).
    - `tsx tests/uat/harness.test.ts` runs.
    - Assertion 0 PASSES (filesystem grep gate).
    - Assertions 1-13 all THROW "NOT YET IMPLEMENTED".
    - Exit code: non-zero.
    - Diagnostic line: "UAT harness: 1/14 assertions passed, 13 failed (first failure: Assertion 1)".
11. Run `npx tsc --noEmit` → exit 0 (all harness code type-clean against `@types/chrome` + puppeteer types + `tests/uat/lib/test-hook-contract.d.ts`).
12. Run `npx vitest run` (full suite) → 85 GREEN (no regression to unit tests; the harness lives outside vitest's discovery).

Per project style: extensive docstrings; absolute imports; no `as any`; no `@ts-ignore`; named callbacks (the runAssertion lambdas are short enough to be acceptable as inline arrows). Use if-else chains over early returns where the assertion logic has multi-arm branching; guard-clause early returns are fine for null-checks per established project exception.

npx tsc --noEmit && npm run test:uat; test $? -ne 0 && npx vitest run --reporter=dot

- All 7 helper files exist with the contracts described.
- `harness.test.ts` exists with assertion 0 wired (GREEN) + assertions 1-13 stubbed (RED).
- `README.md` documents the runtime + local-debug + CI semantics.
- `npm run test:uat` exits non-zero today; the diagnostic clearly identifies assertion 0 as PASS + assertions 1-13 as "NOT YET IMPLEMENTED".
- `npx tsc --noEmit` exit 0 across both `src/` and `tests/` trees.
- Full vitest suite: 85 GREEN.
- No file under `src/` modified by this task (the harness is purely under `tests/`).

Harness scaffolding live with assertion 0 wired GREEN; assertions 1-13 staged as RED stubs for Tasks 4-7; baseline preserved.

Task 4 (Wave 3, bundle 1/4): Wire assertions 1, 2, 3, 4 (SW bootstrap + toolbar onClicked + displaySurface + popup-during-recording).

- tests/uat/harness.test.ts (skeleton from Task 3)
- tests/uat/lib/{launch,extension,sw,offscreen}.ts (helpers from Task 3)
- src/background/index.ts lines 75-108 (setIdleMode/setRecordingMode state machine; the production code these assertions verify)
- src/background/index.ts lines 411-415 (setRecordingMode call site inside startVideoCapture)
- src/background/index.ts lines 844-858 (chrome.action.onClicked listener registration)
- src/offscreen/recorder.ts lines 241-247 (getDisplayMedia call + mediaStream assignment)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §1 (triggerExtensionAction + the popup-vs-onClicked MV3 contract)
- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md (the must-haves these assertions are verifying)

tests/uat/harness.test.ts, tests/uat/lib/sw.ts

- Assertion 1 (SW bootstrap): after `launchHarnessBrowser` + attach SW, query `getBadgeText` (empty), `getPopup` (empty), `getIsRecording` (false; exposed via a new helper that reads `globalThis.isRecording` from the SW context via `sw.evaluate`; the SW production code has `isRecording` as a module-level let, accessible from the SW global). PASSES today against current bundle.
- Assertion 2 (onClicked-idle): `page.triggerExtensionAction(ext)` → `await waitFor(() => getBadgeText(sw), (text) => text === 'REC', 5_000)` (poll up to 5s; the picker auto-selects the screen so getDisplayMedia resolves fast). Then assert popup === 'src/popup/index.html' + getIsRecording === true. PASSES today.
- Assertion 3 (displaySurface): after assertion 2 leaves recording active, attach to offscreen via `waitForOffscreen` + `attachToOffscreen`. Then `offPage.evaluate(() => __mokoshTest.getCurrentStream().getVideoTracks()[0].getSettings().displaySurface)` === 'monitor'. PASSES today (per Plan 01-09 D-15-display-surface; the post-grant validation in recorder.ts ensures monitor-only).
- Assertion 4 (click-during-recording): record the current offscreen target count, then `page.triggerExtensionAction(ext)` again. Assert: popup state unchanged (still 'src/popup/index.html'); NO new offscreen target spawned (count unchanged). The toolbar click with popup set opens the popup, which the harness can verify via `browser.targets().find(t => t.url().includes('popup/index.html'))` (the popup target appears briefly as a `page` type). PASSES today.
- All 4 assertions wired; each carries an inline RED-on-regression demonstration step in its action block: the executor must locally demonstrate the assertion CAN catch a regression before marking the assertion GREEN.

1. Wire assertion 1: replace the "NOT YET IMPLEMENTED" stub with the real logic per behavior. Add a `getIsRecording(sw)` helper to `tests/uat/lib/sw.ts`:
   ```typescript
   export async function getIsRecording(sw: WebWorker): Promise<boolean> {
     return await sw.evaluate(() => (globalThis as any).isRecording as boolean);
   }
   ```
   NOTE: this is the ONE site where `as any` is unavoidable. The production code declares `isRecording` as a module-level `let` in `src/background/index.ts:36`, which is NOT exposed on globalThis directly. To read it, we need to evaluate in the SW context AS the SW (which has implicit globalThis access to module-top let-bindings; verify this is true in MV3 SW context; if not, expose `isRecording` via a getter on `__mokoshTest` in `sw-hooks.ts`). Document the choice + rationale inline. (Per RESEARCH §6 contract verification, probe2 reported that the SW module-level `let` is readable as `globalThis.isRecording` in the MV3 SW context. ECMAScript does not attach top-level `let` bindings to `globalThis`, so re-check this first; if the executor sees `undefined` returned, fall back to exposing via a `__mokoshTest.isRecording` getter from sw-hooks.ts and document the SW-isolation finding.)
2. Wire assertion 2: implementation per behavior. After `triggerExtensionAction`, poll `getBadgeText` for up to 5 seconds, because the badge transition is async (offscreen creation + getDisplayMedia + post-grant validation + setRecordingMode all happen in sequence). Use a polling helper from `assertions.ts` or inline:
   ```typescript
   async function waitFor<T>(
     probe: () => Promise<T>,
     predicate: (v: T) => boolean,
     timeoutMs: number,
   ): Promise<T> {
     const start = Date.now();
     while (Date.now() - start < timeoutMs) {
       const v = await probe();
       if (predicate(v)) return v;
       await new Promise((r) => setTimeout(r, 100));
     }
     throw new Error(`waitFor timeout ${timeoutMs}ms`);
   }
   ```
   Use this in assertion 2 + 3 + 4.
3. Wire assertion 3: per behavior. The `waitForOffscreen` helper already handles the target wait + asPage; attach once after assertion 2 sets recording=true, then offPage.evaluate the displaySurface read.
4. Wire assertion 4: per behavior. Count `browser.targets()` filtered to offscreen-url-containing BEFORE the second click, then AFTER; assert equality. Also assert popup state unchanged.
5. RED-on-regression demonstration:
   - For assertion 2: locally insert `chrome.action.onClicked.addListener(async () => { return; })` BEFORE the production listener and re-run build:test; assertion 2 should FAIL (badge stays empty). Revert the hack; assertion 2 PASSES.
   - For assertion 3: locally alter `recorder.ts` to call `getDisplayMedia({ video: true, audio: false })` (without displaySurface constraint) and rebuild; assertion 3 should FAIL (displaySurface defaults to 'browser' OR is undefined depending on Chrome behavior). Revert; PASSES.
   - The executor commits ONLY the working assertions; the RED demos are local-only verifications. Document each RED demo's outcome in the commit message body.
6. Run `npm run test:uat`: assertions 0+1+2+3+4 PASS; assertions 5-13 still stubbed as RED. Exit non-zero. Diagnostic: "5/14 passed, 9 failed".
7. Run `npx tsc --noEmit` → exit 0.
8. Run full vitest suite → 85 GREEN.

npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0)

- Assertions 0, 1, 2, 3, 4 all PASS in `npm run test:uat`.
- Assertions 5-13 still throw "NOT YET IMPLEMENTED".
- `npm run test:uat` exits non-zero (because 9 stubs remain).
- Diagnostic shows 5/14 passed.
- `npx tsc --noEmit` exit 0.
- Full vitest suite: 85 GREEN.
- Each wired assertion's commit message body cites the RED-demonstration outcome.

First 4 functional assertions live and GREEN; harness proves it can verify toolbar + displaySurface + popup-state via CDP.

Task 5 (Wave 3, bundle 2/4): Wire assertions 5, 6, 7 (SAVE_ARCHIVE download + Bug B user-stopped routing + ERROR-path).

- tests/uat/harness.test.ts (assertions 1-4 GREEN from Task 4)
- tests/uat/lib/{sw,offscreen,zip}.ts (helpers; especially simulateUserStop's BLOCKER-citing comment)
- src/background/index.ts lines 725-778 (RECORDING_ERROR handler; Bug B conditional routing)
- src/offscreen/recorder.ts lines 451-480 (onUserStoppedSharing; the handler simulateUserStop must trigger)
- .planning/debug/resolved/01-09-recovery-flow.md (Bug B debug record; the exact contract assertion 6 verifies)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §7 (BLOCKER analysis; track.dispatchEvent is the ONLY valid path)

tests/uat/harness.test.ts, tests/uat/lib/sw.ts

- Assertion 5 (SAVE_ARCHIVE download): with recording active from prior assertions, `sw.evaluate(() => chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}))` triggers the save flow.
The download lands in `downloadsDir` (configured at launch via `--user-data-dir` + per-page download behavior, OR via `page._client().send('Browser.setDownloadBehavior', ...)` — RESEARCH didn't deep-dive this; the executor researches the cleanest path). Poll for `*session_report*.zip` appearance in downloadsDir for up to 15s. PASSES today. - Assertion 6 (BUG B): snapshot `notificationCount` via `getNotificationSnapshot(sw)`. Then `simulateUserStop(offPage)`. Wait 300ms (offscreen handler → runtime message → SW handler → state transition is async). Assert: badge text === '' (NOT 'ERR'); popup === '' (NOT 'src/popup/index.html'); isRecording === false; notificationCount delta === 0 (no recovery notification fired for deliberate stop). PASSES today against b9eeeeb. - Assertion 7 (ERROR-path preserved): start a fresh recording (since assertion 6 stopped it). Snapshot notificationCount. Then `sw.evaluate(() => chrome.runtime.sendMessage({type: 'RECORDING_ERROR', error: 'codec-unsupported'}))`. Wait 200ms. Assert: badge text === 'ERR'; notificationCount delta === 1; last notification id starts with 'mokosh-recovery-'. PASSES today. - Each assertion carries the RED-on-regression demonstration; assertion 6's RED demo is the canonical "rewinding b9eeeeb" cycle from the orchestrator brief. 1. Wire assertion 5. Investigate Puppeteer's download path config: `browser.defaultBrowserContext().overridePermissions(...)` for downloads OR `CDP Browser.setDownloadBehavior` with `behavior: 'allow'` + `downloadPath: downloadsDir`. The harness creates `downloadsDir` in the launch helper (e.g. `os.tmpdir() + '/mokosh-uat-downloads-' + Date.now()`). After `sendMessage({type:'SAVE_ARCHIVE'})`, poll the dir for ~15s for any `session_report_*.zip`. Save the path for assertion 13. PASS = file appears + non-zero size. 2. Wire assertion 6 per behavior. Use the existing `simulateUserStop` helper (with its BLOCKER comment intact). 
The 300ms wait is the propagation budget; if assertions intermittently flake here, bump to 500ms — the offscreen handler is synchronous-into-sendMessage, the SW handler is synchronous-into-setIdleMode, so 300ms is generous but not extravagant. 3. Wire assertion 7 per behavior. Reads `lastNotificationOptions.title` or similar to verify "Mokosh stopped" recovery copy AND `notificationIds[notificationIds.length-1].startsWith('mokosh-recovery-')`. 4. RED-on-regression demonstrations (recorded in commit body): - **Assertion 6 RED demo (THE canonical Bug B regression check)**: locally `git diff HEAD~1 -- src/background/index.ts` to recover the pre-b9eeeeb shape of the RECORDING_ERROR handler (unconditional setErrorMode); APPLY the inverse patch locally (do NOT commit). Rebuild test bundle. Run `npm run test:uat`. Assertion 6 MUST FAIL with diagnostic: "expected badge text '' but got 'ERR'". Revert (`git checkout -- src/background/index.ts`). Rebuild. Re-run. Assertion 6 PASSES. This proves the harness assertion CAN catch a Bug B regression. **Document this end-to-end demo in the commit message body.** - Assertion 5 RED demo: locally comment out the `chrome.downloads.download(...)` call in `src/background/index.ts:saveArchive` and rebuild; assertion 5 should FAIL (timeout waiting for zip). Revert; PASSES. - Assertion 7 RED demo: locally short-circuit the RECORDING_ERROR case to return without calling setErrorMode for codec-unsupported (e.g. early-return on case entry); assertion 7 should FAIL. Revert; PASSES. 5. Run `npm run test:uat`: 8/14 PASS, 6 stubs remain. Exit non-zero. 6. Run `npx tsc --noEmit` → exit 0. Vitest 85 GREEN. npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0) - Assertions 0-7 all PASS. - Assertions 8-13 still stubbed RED. - `npm run test:uat` exits non-zero; diagnostic 8/14 passed. - Bug B RED-on-regression demo documented in commit body (mandatory). - `npx tsc --noEmit` exit 0; vitest 85 GREEN. 
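
Assertion 5's "poll the dir for ~15s" step could be sketched as a small file-polling helper. Everything named here is hypothetical (`waitForFile`, the interval, the demo file name); the anchored pattern is chosen so that Chrome's in-progress `.crdownload` partial files do not match, and the non-zero size check mirrors the "file appears + non-zero size" PASS condition.

```typescript
import { mkdtempSync, readdirSync, statSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/**
 * Polls `dir` until a non-empty file whose name matches `pattern` appears.
 * Hypothetical helper; resolves with the full path or rejects on timeout.
 */
async function waitForFile(
  dir: string,
  pattern: RegExp,
  timeoutMs: number,
  intervalMs = 100,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const hit = readdirSync(dir).find(
      (name) => pattern.test(name) && statSync(join(dir, name)).size > 0,
    );
    if (hit !== undefined) return join(dir, hit);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`no file matching ${pattern} in ${dir} after ${timeoutMs}ms`);
}

// Usage sketch: a timer stands in for the browser finishing the download.
const downloadsDir = mkdtempSync(join(tmpdir(), 'mokosh-uat-downloads-'));
const zipPattern = /^session_report_.*\.zip$/;
setTimeout(() => {
  writeFileSync(join(downloadsDir, 'session_report_demo.zip'), 'zip-bytes');
}, 150);
waitForFile(downloadsDir, zipPattern, 2_000).then((found) => {
  console.log('download landed at', found);
});
```

In the harness the resolved path would be saved for assertion 13, and the timeout would be the 15s budget named above.
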

Bug B harness assertion live AND demonstrably catches regression; SAVE_ARCHIVE + ERROR-path coverage live; bug-class root cause (state-machine routing) now CI-callable.

Task 6 (Wave 3, bundle 3/4): Wire assertions 8, 9, 10 (Bug A onStartup notification + icon file sizes + manifest shape).

- tests/uat/harness.test.ts (assertions 1-7 GREEN from Tasks 4-5)
- src/background/index.ts lines 860-881 (chrome.runtime.onStartup handler: the path Bug A's recovery notification was failing on before a881bf0)
- manifest.json (icons declared + notifications permission)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints)
- icons/icon{16,48,128}.png (verify presence + size; the floors are 200/500/1024 bytes from the orchestrator brief)

tests/uat/harness.test.ts

- Assertion 8 (BUG A + onStartup): snapshot notificationCount. Then `sw.evaluate(() => globalThis.__mokoshTest!.handlers.onStartup?.())`. Wait 100ms (synchronous handler, but allow microtask drain). Assert: notificationCount delta === 1; `lastNotificationOptions.iconUrl` matches `/icons\/icon(?:128|48)\.png$/` (the production code uses NOTIFICATION_ICON_PATH = 'icons/icon128.png'); `lastNotificationOptions.title === 'Mokosh ready'`; `notificationIds[notificationIds.length-1].startsWith('mokosh-startup-')`. The PASS condition implies chrome.notifications.create's promise resolved cleanly: if Bug A regressed (icon below floor), Chrome's imageUtil throws and the create call REJECTS, so notificationCount would NOT increment. PASSES today against a881bf0.
- Assertion 9 (icon files present + sized): for each of (16, 200), (48, 500), (128, 1024), `sw.evaluate` a fetch of `chrome.runtime.getURL('icons/icon{N}.png')` and read `content-length`. Assert >= floor. PASSES today.
- Assertion 10 (manifest shape): `getManifest(sw)`. Assert: `permissions.includes('notifications')`; `icons['16']`, `icons['48']`, `icons['128']` all defined and equal to expected paths. PASSES today.
- Each assertion's RED-on-regression demo documented in commit body.

1. Wire assertion 8 per behavior. The `onStartup` handler in production carries inline try/catch around the `chrome.notifications.create` call (per src/background/index.ts:868-877); the hook's notificationCount wrapper increments regardless of create's resolution path. To verify Bug A specifically, ALSO assert that the iconUrl in lastNotificationOptions points to a file that resolves to >= 1024 bytes (cross-check with assertion 9's floor). This catches the Bug A regression EVEN IF a future change wraps the create call in a swallowing try/catch.
2. Wire assertion 9 per behavior. The fetch via sw.evaluate is the cleanest path: Chrome serves extension files from `chrome-extension:///...` and fetch with a `chrome-extension://` URL works in SW context.
3. Wire assertion 10 per behavior. Direct `chrome.runtime.getManifest()` read.
4. RED-on-regression demos (commit body):
   - **Assertion 8 RED demo (Bug A canonical)**: locally `: > icons/icon128.png` (truncate to 0 bytes). Rebuild test bundle. Run `npm run test:uat`. Assertion 8 should FAIL: Chrome's imageUtil rejects the create call (or the wrapper's lastNotificationOptions snapshot has wrong shape). Restore (`git checkout -- icons/icon128.png`). Rebuild. Re-run. Assertion 8 PASSES. **Document in commit body.**
   - Assertion 9 RED demo: same truncate; rebuild; assertion 9 should FAIL with "content-length 0 < floor 1024". Restore; PASSES.
   - Assertion 10 RED demo: locally remove "notifications" from manifest.json permissions and rebuild test bundle; assertion 10 should FAIL. Restore; PASSES.
5. Run `npm run test:uat`: 11/14 PASS, 3 stubs remain (11, 12, 13).
6. `npx tsc --noEmit` exit 0; vitest 85 GREEN.

`npx tsc --noEmit && (set +e; npm run test:uat; test $? -ne 0)`

- Assertions 0-10 all PASS.
- Assertions 11-13 still stubbed RED.
- `npm run test:uat` exits non-zero; diagnostic 11/14 passed.
- Bug A RED-on-regression demo documented in commit body (mandatory).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.

Bug A harness assertion live AND demonstrably catches regression; icon + manifest coverage live; both Phase-1-escapee bug classes (Bug A + Bug B) now CI-callable.

Task 7 (Wave 3, bundle 4/4): Wire assertions 11, 12, 13 (35s buffer continuity + ffprobe gate + zip shape); closes the 13-assertion charter.

- tests/uat/harness.test.ts (assertions 1-10 GREEN from Tasks 4-6)
- tests/uat/lib/zip.ts (the jszip-based archive shape helper)
- tests/offscreen/webm-playback.test.ts (the existing ffprobe pattern: FFPROBE_BIN constant, skip-gate helper)
- src/background/webm-remux.ts (Plan 01-08's remux helper; what the harness's ffprobe gate validates)
- .planning/phases/01-stabilize-video-pipeline/01-11-RESEARCH.md §11 (per-assertion implementation hints for 11, 12, 13)

tests/uat/harness.test.ts, tests/uat/lib/zip.ts

- Assertion 11 (35s buffer continuity): start a fresh recording. Wait 35 seconds (with keepalive pings every 20s per RESEARCH §2). Query the offscreen segments count via offPage.evaluate (the offscreen recorder maintains a `segments` ring; expose it via a `__mokoshTest.getSegmentCount()` getter; ADD this to offscreen-hooks.ts in this task). Assert: segmentCount >= 3 (per D-13: 10s segments × MAX_SEGMENTS=3 = 30s window). PASSES today.
- Assertion 12 (ffprobe gate): trigger SAVE_ARCHIVE (reusing the assertion 5 helper). Extract `video/last_30sec.webm` from the produced zip via jszip. Write to a tmpfile. Spawn `ffprobe -v error -f matroska -i ` via execFileSync. Assert exit code 0. (Skip-gate this assertion with a clear "SKIPPED: ffprobe binary not available" diagnostic if `which ffprobe` fails; this matches the existing webm-playback.test.ts pattern.)
- Assertion 13 (zip shape): jszip parse the same zip. Assert: `video/last_30sec.webm` entry exists + has non-zero size.
  Assert: `meta.json` entry exists + parsed JSON has `version === ` (read via sw.evaluate at the start of the harness or this assertion).
- The 35-second wait pushes the harness runtime past 60s. Add keepalive ping infrastructure (one ping every 20s during the wait) to avoid SW eviction per RESEARCH §2 / Pitfall 5.

1. ADD a `__mokoshTest.getSegmentCount()` getter to `src/test-hooks/offscreen-hooks.ts`. The offscreen recorder has a module-level `segments` array (from D-13 restart-segments); expose a function-level setter alongside `setCurrentStream`:

   ```typescript
   // src/test-hooks/offscreen-hooks.ts
   let currentStream: MediaStream | null = null;
   let segmentCountGetter: () => number = () => 0;
   export function setCurrentStream(s: MediaStream | null) { currentStream = s; }
   export function setSegmentCountGetter(g: () => number) { segmentCountGetter = g; }
   globalThis.__mokoshTest = {
     // ...
     getCurrentStream: () => currentStream,
     getSegmentCount: () => segmentCountGetter(),
   };
   ```

   Update `src/test-hooks/types.ts` to add `getSegmentCount?: () => number` to MokoshTestSurface. In `src/offscreen/recorder.ts`, after the existing `setCurrentStream(stream)` call, add (gated):

   ```typescript
   if (import.meta.env.MODE === 'test') {
     const hooks = await import('../test-hooks/offscreen-hooks');
     hooks.setSegmentCountGetter(() => segments.length);
   }
   ```

   (Where `segments` is the module-level array. If the variable name differs, adapt. Read the file to confirm; commonly named `videoSegments` or `segments`.)
2. Wire assertion 11 per behavior. The 35s wait uses `await new Promise(r => setTimeout(r, 35_000))` with intermittent `await keepalivePing(sw)` every 20s. Use `setInterval` or a polling loop; document the keepalive purpose per RESEARCH §2.
3. Wire assertion 12 per behavior. Reuse the `FFPROBE_BIN` constant pattern from `tests/offscreen/webm-playback.test.ts`. Skip-gate: `if (!existsSync(FFPROBE_BIN)) { console.warn('Assertion 12: ffprobe not available — SKIPPED'); return; }`.
   The skip-gate is acceptable for assertion 12 because the unit-level tests (Plan 01-08's `tests/background/webm-remux.test.ts`) also have ffprobe gates that cover the same contract; the harness's ffprobe assertion is end-to-end validation, not the primary gate.
4. Wire assertion 13. Pass `expectedVersion = await sw.evaluate(() => chrome.runtime.getManifest().version)` into `assertArchiveShape`.
5. Update Tier-1 grep gate test (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) to ALSO assert ZERO `getSegmentCount` in dist/ (new hook surface added in this task; confirm gate stays GREEN).
6. RED-on-regression demos (commit body):
   - Assertion 11 RED demo: locally hack `SEGMENT_DURATION_MS = 30_000` in recorder.ts so 35s yields only 1 segment; rebuild; assertion 11 should FAIL. Revert; PASSES.
   - Assertion 12 RED demo: locally inject a corrupted byte into the remux output (e.g. zero the EBML magic in webm-remux.ts before return); rebuild; assertion 12 should FAIL (ffprobe error). Revert; PASSES.
   - Assertion 13 RED demo: locally drop `version` from the `meta.json` writer in saveArchive; rebuild; assertion 13 should FAIL. Revert; PASSES.
7. Run `npm run test:uat`: ALL 14 assertions PASS. Exit 0. Diagnostic: "UAT harness: 14/14 assertions passed".
8. `npx tsc --noEmit` → exit 0. `npx vitest run` → 85 GREEN.
9. **Verify Tier-1 grep gate updates:** `npm run build && grep -rln 'getSegmentCount' dist/` → 0 matches.

`npx tsc --noEmit && npm run test:uat && npx vitest run --reporter=dot && test "$(grep -rln getSegmentCount dist/ 2>/dev/null | wc -l)" = "0"`

- All 14 assertions PASS in `npm run test:uat`; exit 0.
- `npm run test:uat` total runtime ~50-90s (dominated by the 35s assertion 11 wait + the harness setup ~10s + assertion 0's `npm run build` ~10s; skip with `SKIP_PROD_REBUILD=1` for ~70s).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.
- Production bundle (`npm run build`): `grep -rln __mokoshTest dist/` → 0; `grep -rln simulateUserStop dist/` → 0; `grep -rln getSegmentCount dist/` → 0. Tier-1 gate remains GREEN.
- Each new assertion's RED-on-regression demo documented in commit body.

13-assertion charter complete; harness exits 0 against current Plan 01-09 bundle; Phase 1 functional contract fully CI-callable.

Task 8 (Wave 4): Amend Plan 01-09 Task 5 operator checkpoint to redirect functional steps to `npm run test:uat`; update STATE.md decisions; close Plan 01-09 via this plan's harness PASS.

- .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md lines 519-549 (the operator checkpoint that gets amended)
- .planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md (current closure state)
- .planning/STATE.md (Decisions section + Phase 1 Closure Notes)
- tests/uat/harness.test.ts (the harness that NOW closes the functional contract)

.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md, .planning/STATE.md

- `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` gets an AMENDMENT block at the END of the file (does NOT rewrite the original Task 5; preserves provenance per project convention from D-A1..D-A6 cascade pattern):
  ```
  ---

  ## Amendment (Phase 01-stabilize-video-pipeline, 2026-05-17) — Plan 01-11 harness retires operator functional steps

  Plan 01-11 (Puppeteer UAT harness) lands a CI-callable replacement for the
  functional verification work in this plan's Task 5. The operator's role is
  reduced to:

  - **Step 1 (build):** unchanged — `npm run build` must exit 0.
  - **Steps 2-13:** REDIRECTED — replaced by `npm run test:uat` exit 0. The
    Puppeteer harness implements 14 assertions (assertion 0 = production-
    bundle hook-leak grep; assertions 1-13 = the original Task 5
    functional checks).
  - **Step 14 (brand/design — implicit in steps 4, 5, 6 of original task):**
    RETAINED for operator. The harness verifies displaySurface === 'monitor'
    + notification fires; it does NOT verify the human-readable copy is
    aesthetically correct OR that the badge color reads cleanly against the
    operator's OS theme. Operator confirms.
  - **Step 15 (genuine error UX):** REDIRECTED — assertion 7 verifies the
    ERROR-path bandwidth.

  **New closure gate:** Plan 01-09 closes when `npm run test:uat` exits 0
  AND operator confirms step 14 (brand/design). The harness's 14/14 PASS
  against current bundle (verified by this plan's Task 7) supplies the
  first half today.
  ```
- `.planning/STATE.md` Decisions section gains a new entry (preserves the existing log; appends rather than rewriting):
  ```
  - [Phase 01-11]: Operator role retirement landed via Puppeteer UAT harness. 14 assertions cover Plan 01-08/01-09 functional contract; operator retained only for brand/design step. `npm run test:uat` = the new CI gate for any Phase 1 SW/offscreen/manifest change. Tier-1 grep gate `tests/background/no-test-hooks-in-prod-bundle.test.ts` enforces zero `__mokoshTest` / `simulateUserStop` / `getSegmentCount` in production `dist/`.
  ```
- This task does NOT modify Plan 01-09's status fields, frontmatter, or original Task 5 body. The amendment is appended after the original `<output>` block (mirroring the CONTEXT.md amendment-append pattern from 2026-05-16).
- Operator (in the closing checkpoint below) confirms brand/design step 14 manually and types "approved" — at which point Plan 01-09 + Plan 01-11 close together.

1. Read `.planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md` to confirm the file structure ends with the `<output>` block (line ~596 based on the file's current shape).
2. Append the amendment block per the behavior description, AFTER the closing `</output>` tag. Use the same horizontal-rule + ## heading + AMENDED-BY metadata convention from CONTEXT.md amendments. Cite the harness path (`tests/uat/harness.test.ts`) and the npm script (`npm run test:uat`).
3. Read `.planning/STATE.md` Decisions section (lines 72-109).
4. Append the new entry to the Decisions list (after the most recent `[Phase 01-07-deferred-to-5]` entry per the convention). Do NOT modify any existing entry.
5. Verify both edits are content-only (no frontmatter changes; no status flips, which happen in the closing checkpoint).
6. Run `npx tsc --noEmit` → exit 0 (paranoia: neither edit touches TS, but baseline).
7. Run `npm run test:uat` → exit 0 (final smoke before the closing checkpoint).
8. Run `npx vitest run` → 85 GREEN.

`npx tsc --noEmit && grep -q 'Plan 01-11 harness retires operator functional steps' .planning/phases/01-stabilize-video-pipeline/01-09-PLAN.md && grep -q 'Operator role retirement landed via Puppeteer UAT harness' .planning/STATE.md && npm run test:uat && npx vitest run --reporter=dot`

- `01-09-PLAN.md` ends with the appended amendment block (no edits to the original Task 5 body).
- `STATE.md` Decisions section carries the new entry as the last item (no edits to prior entries).
- `npm run test:uat` exits 0 (14/14 GREEN).
- `npx tsc --noEmit` exit 0; vitest 85 GREEN.

Plan 01-09 functional contract redirected to harness; STATE.md decisions log updated; ready for closing checkpoint.

Task 9 (Wave 4): Operator confirms `npm run test:uat` exits 0 against current bundle AND confirms brand/design step 14 (Plan 01-09 Task 5 retained step); closes Plan 01-09 + Plan 01-11.

(operator-driven; no files modified by this checkpoint)

See below: operator-driven empirical check.
The executor must NOT bypass this checkpoint by stubbing harness output.

`echo "checkpoint:human-verify — see how-to-verify section; resume signal is the gate"`

Operator types "approved" after running the how-to-verify steps. See for the exact gate.

Tasks 1-8 landed: Puppeteer + tsx installed, vite.test.config.ts produces dist-test/, gated test hooks in src/test-hooks/ ship in test bundle and NOT in production bundle (Tier-1 grep gate verifies), Puppeteer harness at tests/uat/harness.test.ts implements 14 assertions, all 14 GREEN against current Plan 01-09 bundle (b9eeeeb Bug B fix + a881bf0 Bug A fix both verified by Bug B + Bug A canonical RED-on-regression demos). Plan 01-09 Task 5 redirected to `npm run test:uat` for functional steps. This checkpoint validates the harness end-to-end against real Chrome AND captures operator's brand/design acceptance for Plan 01-09's retained step 14.

1. **Pre-flight cleanliness:** run `git status`; confirm working tree clean. Any uncommitted local hacks (RED-demo reverts) MUST be reverted BEFORE this step.
2. **Build production:** `npm run build` (must exit 0; this is Plan 01-09 Task 5 step 1).
3. **Build test bundle:** `npm run build:test` (must exit 0).
4. **Run harness:** `npm run test:uat` (must exit 0; runtime ~70-90s). Final output line MUST be exactly `UAT harness: 14/14 assertions passed`. If exit non-zero, paste the structured diagnostic + harness console dump + relevant SW/offscreen console logs; the plan iterates (likely a real bug surfaced).
5. **Re-run for stability:** `npm run test:uat` a second time. Same outcome. (Eliminates first-run flakiness from cold Chrome / cold dist-test cache.)
6. **Tier-1 hook-leak verification:** `grep -rln __mokoshTest dist/` must return 0 matches. Same for `simulateUserStop`, `getSegmentCount`, `setCurrentStream`, `setSegmentCountGetter`. If ANY match, the gate failed silently: STOP and triage.
7. **Local-debug mode smoke:** `HEADLESS=0 npm run test:uat`.
   Watch the real Chrome window: see the toolbar icon, see the picker auto-accept, see the badge transitions. Same exit 0 outcome. (This is the operator's chance to spot any visual oddity the automated assertions miss.)
8. **Brand/design acceptance (Plan 01-09 Task 5 step 14, retained for operator):**
   - (a) Badge color readability against your OS theme: red OFF, green REC, yellow ERR should each contrast clearly with the toolbar background. If any is hard to see in light AND dark mode, document for Phase 5 hardening (do NOT block closure on this; file as a deferred item).
   - (b) Notification copy: "Mokosh ready — Click here to start recording your session." reads naturally in en_US. Russian operators may want a localized variant; document for Phase 5 (do NOT block closure on this).
   - (c) Picker UX: confirm Chrome's screen-share picker still surfaces (in headful mode) at the expected moment + with the correct monitor-only options.
9. **If steps 4, 5, 6 all PASS:** Plan 01-09 + Plan 01-11 both close. Type "approved" with any brand/design notes appended.
10. **If step 4 OR 5 FAIL:** paste the failure diagnostic. Likely culprits: locale-specific picker string mismatch (RESEARCH §9: operator's Chrome may need a different `--auto-select-desktop-capture-source` value); race window in assertion 6 / 11 (try bumping the wait in the relevant assertion).
11. **If step 6 FAILS:** STOP. The Tier-1 hook-leak gate failing means the production bundle contains test code; this is a security regression (T-1-11-01). Do NOT proceed to closure. Open a debug session.
12. **If step 7 surfaces a real UX issue (not just a deferral):** document as a P1/P2 item in STATE.md or a phase-5 backlog file; closure can still proceed IF the issue is non-blocking.

Type "approved" after step 9 lands (all gates GREEN + brand/design accepted).
If steps 10/11/12 hit, paste the failure mode + operator's Chrome version + locale + OS theme; the plan iterates on the failing piece (likely Task 4-7 for assertion-specific issues; Task 1-2 for hook-leak issues; a fresh debug session for novel failures).
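
Step 4 gates on both the exit code and the exact final output line. A minimal sketch of how a harness might derive both from its per-assertion results is shown below; the `AssertionResult` shape and the `summarize` name are assumptions for illustration, not taken from `tests/uat/harness.test.ts`.

```typescript
// Hypothetical per-assertion result record; the real harness may track more fields.
type AssertionResult = { id: number; name: string; passed: boolean; detail?: string };

// Build the final diagnostic line and the process exit code from all results.
export function summarize(results: AssertionResult[]): { line: string; exitCode: number } {
  const passed = results.filter((r) => r.passed).length;
  const line = `UAT harness: ${passed}/${results.length} assertions passed`;
  // Exit 0 only when there is at least one assertion and every one passed.
  const allGreen = results.length > 0 && passed === results.length;
  return { line, exitCode: allGreen ? 0 : 1 };
}
```

With stubbed assertions counted as failures, the same function yields the intermediate "8/14" and "11/14" diagnostics with a non-zero exit code.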

<threat_model>

Trust Boundaries

| Boundary | Description |
|---|---|
| Puppeteer driver ↔ Chrome SW (via CDP) | The harness pipes CDP commands to the SW context via `sw.evaluate`. Trust boundary is unchanged at runtime (the SW only accepts the harness's commands because the harness runs inside the Puppeteer-launched Chrome process); but the harness CAN invoke any production SW code path via `sw.evaluate`, so a malicious or buggy harness could in principle exfiltrate buffered video. Mitigation: harness code is in-tree, code-reviewed via the same pipeline as production. |
| Test hook surface (`__mokoshTest`) in production bundle | NEW: if tree-shaking fails or the MODE guard is misconfigured, the hook surface ships to production, exposing simulateUserStop, getCurrentStream, captured handler refs to any page that can `eval` against the SW. THIS IS THE SECURITY-CRITICAL THREAT. Mitigation: Tier-1 grep gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) enforces zero `__mokoshTest` in `dist/`; runs as part of `npm test` so any CI pipeline picks it up. |
| dev-dependency Chromium binary | NEW: Puppeteer downloads ~150 MB Chromium binary at `npm install` time. Supply-chain compromise of the Chrome download endpoint would inject malicious code into developer machines. Mitigation: `package-lock.json` integrity check (Puppeteer pins the Chromium download hash via its `@puppeteer/browsers` dependency). Out of scope: separate SCA for Puppeteer itself. |
| --auto-select-desktop-capture-source flag in CI | NEW: in a CI container, the flag auto-accepts the "Entire screen" source, which is whatever Xvfb (or modern headless surface) presents. If a CI runner is shared with sensitive workloads, the 35-second recording assertion captures whatever is on screen during that window. Mitigation: document that CI MUST run the harness in an isolated container with no concurrent workload; local-dev runs capture the operator's real screen for 35s during assertion 11, documented in README.md. |
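
The hook-surface boundary above depends on a grep-style gate over the built artifact. A rough sketch of such a scan is given below, assuming the forbidden-token list named elsewhere in this plan; `findHookLeaks` is a hypothetical name and this is not the contents of `tests/background/no-test-hooks-in-prod-bundle.test.ts`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Token list taken from the plan text; extend it whenever a new hook name is added.
const FORBIDDEN_TOKENS = [
  "__mokoshTest",
  "simulateUserStop",
  "getSegmentCount",
  "setCurrentStream",
  "setSegmentCountGetter",
];

// Recursively scan `root` and return one "file: token" entry per leaked token.
export function findHookLeaks(root: string, tokens: string[] = FORBIDDEN_TOKENS): string[] {
  const leaks: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
        continue;
      }
      // latin1 decodes any byte sequence, so binary assets (icons, wasm) scan safely.
      const text = fs.readFileSync(full, "latin1");
      for (const token of tokens) {
        if (text.includes(token)) leaks.push(`${full}: ${token}`);
      }
    }
  };
  walk(root);
  return leaks;
}
```

A gate test would then require `findHookLeaks("dist")` to return an empty array, while the same call on `dist-test/` is expected to report at least one entry.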

STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|---|---|---|---|---|
| T-1-11-01 | Elevation of Privilege | `__mokoshTest` surface leaking into production `dist/` would expose simulateUserStop, captured chrome.* handler refs, and stream getter to any code with access to the SW context | mitigate | Two layers: (a) gated dynamic import per RESEARCH §6 (the literal `'test' !== 'production'` comparison is a static dead branch that Vite/Rollup tree-shake); (b) Tier-1 unit gate `tests/background/no-test-hooks-in-prod-bundle.test.ts` greps the BUILT artifact for `__mokoshTest` / `simulateUserStop` / `getSegmentCount` / `setCurrentStream` / `setSegmentCountGetter`; ZERO matches required for GREEN. Belt + suspenders catches both tree-shake regression AND new hook-name additions. |
| T-1-11-02 | Information Disclosure | 35-second recording assertion captures whatever is on the operator's screen during local-dev runs | accept | Operator-facing: local-dev runs are by definition under operator control; the recording is consumed only by ffprobe + jszip inside the harness process and is deleted with the temp downloads dir at process exit. CI runs document the isolated-container requirement in README.md. |
| T-1-11-03 | Tampering | Puppeteer downloads Chromium binary at `npm install`; supply-chain compromise of the download endpoint | accept | `package-lock.json` pins resolved hashes via Puppeteer's `@puppeteer/browsers` machinery. Same risk surface as any npm dependency. Phase 5 SCA work (out of scope here) covers periodic re-verification. |
| T-1-11-04 | Denial of Service | A pathological assertion 11 (35s wait) ties up CI runner time; combined with 14 sequential assertions, total runtime ~90s ties up a runner slot | accept | 90s is well within typical CI per-job budgets. Local-dev runs use `SKIP_PROD_REBUILD=1` to drop assertion 0's `npm run build` cost (~10s). Out of scope: parallelizing assertions (would require multi-browser instances, defeating the failure-isolation choice). |
| T-1-11-05 | Repudiation | The harness asserts the absence of recovery notification (Bug B path), but the assertion is a count-delta check, so a notification fired BEFORE the snapshot would be invisible | mitigate | Each assertion snapshots `notificationCount` IMMEDIATELY before the trigger event AND immediately after the propagation wait. The delta is checked, not the absolute count. The `notificationIds` array is also asserted on for ID-prefix membership; even if delta counting were fooled by some interleaving, the absence of a 'mokosh-recovery-' prefix in the post-snapshot ids array catches the same regression. |
| T-1-11-06 | Spoofing | Harness reads `__mokoshTest.handlers.onStartup` and invokes it; a hostile production change could swap in a no-op handler that registers AFTER the hook captures the real handler | mitigate | The hook monkey-patches `addListener` AT THE TOP OF THE MODULE (before any production addListener calls). Any later addListener invocation still goes through the patched function and would OVERWRITE handlers.onStartup, not bypass. A malicious bypass would require directly calling `chrome.runtime.onStartup.addListener.call(...)` via a saved bound reference; none exist in the production tree (verified by grep `addListener.call |
</threat_model>
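
The T-1-11-05 mitigation (count delta plus ID-prefix check) can be illustrated with a small pure function. This is a sketch under assumptions: the `NotificationSnapshot` shape is inferred from the `notificationCount` / `notificationIds` names used in this plan, `checkNoRecoveryNotification` is a hypothetical name, and only IDs added since the "before" snapshot are inspected.

```typescript
// Assumed snapshot shape, inferred from the field names used in the plan text.
type NotificationSnapshot = { notificationCount: number; notificationIds: string[] };

// Return a list of failure messages; an empty list means the Bug B check passes.
export function checkNoRecoveryNotification(
  before: NotificationSnapshot,
  after: NotificationSnapshot,
): string[] {
  const failures: string[] = [];
  // Layer 1: the delta between the two snapshots, not the absolute count.
  const delta = after.notificationCount - before.notificationCount;
  if (delta !== 0) failures.push(`expected notificationCount delta 0 but got ${delta}`);
  // Layer 2: no newly recorded ID may carry the recovery prefix.
  const freshIds = after.notificationIds.slice(before.notificationIds.length);
  for (const id of freshIds) {
    if (id.startsWith("mokosh-recovery-")) failures.push(`unexpected recovery notification: ${id}`);
  }
  return failures;
}
```

Keeping the two layers independent means a miscounted delta and a stray recovery ID each produce their own diagnostic.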

- `npm run test:uat` exits 0 against the current Plan 01-09 bundle; final line is exactly `UAT harness: 14/14 assertions passed`.
- `npm run build` exit 0; `grep -rln __mokoshTest dist/` returns 0; `grep -rln simulateUserStop dist/` returns 0; `grep -rln getSegmentCount dist/` returns 0.
- `npm run build:test` exit 0; `dist-test/` populated; `grep -rln __mokoshTest dist-test/` returns ≥1.
- `npx vitest run` exit 0; 85 GREEN across all test files (83 baseline + 2 from Task 1's Tier-1 grep gate).
- `npx tsc --noEmit` exit 0 across `src/` + `tests/`.
- Tier-1 SW-bundle-import gate (`tests/background/sw-bundle-import.test.ts`) GREEN: verifies the gated dynamic import does not break production module init.
- Tier-1 hook-leak gate (`tests/background/no-test-hooks-in-prod-bundle.test.ts`) GREEN: verifies the production bundle is hook-free.
- Bug B canonical RED-on-regression demo documented in Task 5's commit body (locally reverting b9eeeeb makes assertion 6 RED; re-applying makes GREEN).
- Bug A canonical RED-on-regression demo documented in Task 6's commit body (locally truncating icons/icon128.png makes assertions 8 + 9 RED; restoring makes GREEN).
- Plan 01-09 Task 5 amended at the end of its PLAN.md (no rewrite of the original body); STATE.md Decisions log carries the new Plan 01-11 entry.
- Operator confirms brand/design step 14 + types "approved" in Task 9.
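
The Bug A demo above hinges on the icon size floors (200/500/1024 bytes for the 16/48/128 px icons). As a disk-side illustration only, the following sketch checks those floors with `fs.statSync`; the harness itself is described as reading `content-length` through an `sw.evaluate` fetch, so this is a different technique, and `checkIconFloors` is a hypothetical name.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// [icon size in px, minimum file size in bytes], floors as quoted in the plan.
const ICON_FLOORS: ReadonlyArray<[number, number]> = [
  [16, 200],
  [48, 500],
  [128, 1024],
];

// Return one failure message per icon that is missing or below its floor.
export function checkIconFloors(iconsDir: string): string[] {
  const failures: string[] = [];
  for (const [size, floor] of ICON_FLOORS) {
    const file = path.join(iconsDir, `icon${size}.png`);
    if (!fs.existsSync(file)) {
      failures.push(`icon${size}.png: missing (floor ${floor})`);
      continue;
    }
    const bytes = fs.statSync(file).size;
    if (bytes < floor) failures.push(`icon${size}.png: ${bytes} bytes < floor ${floor}`);
  }
  return failures;
}
```

Running it against a tree where `icons/icon128.png` has been truncated should report exactly that file, which mirrors the RED half of the demo.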

<success_criteria>
Plan 01-11 is complete when:

1. **Two-bundle separation lives.** `npm run build` produces hook-free `dist/`; `npm run build:test` produces hook-enabled `dist-test/`. The Tier-1 grep gate enforces the production bundle's hook absence.
2. **All 14 harness assertions pass against the current Plan 01-09 bundle.** `npm run test:uat` exits 0; final line is `UAT harness: 14/14 assertions passed`.
3. **Both Phase-1-escapee bugs are now CI-callable.** Assertion 6 (Bug B state-machine routing) and Assertion 8 (Bug A icon-promoted notification) each have a RED-on-regression demo documented in their respective task's commit body, proving the harness assertion CAN catch a regression, not just pass under current conditions.
4. **Operator role retired for functional verification.** Plan 01-09 Task 5 steps 2-13 + 15 redirect to `npm run test:uat`; only step 1 (build) + step 14 (brand/design) retained. The amendment block in `01-09-PLAN.md` preserves provenance (no rewrite of the original task).
5. **Existing 83 vitest tests remain GREEN.** Plus the 2 new Tier-1 gate tests in this plan = 85 total. No regression.
6. **`npx tsc --noEmit` exit 0.** All harness code + hook code type-clean.
7. **`npm run build` exit 0; `npm run build:test` exit 0.** Both production and test bundles emit cleanly.
8. **Operator confirms Task 9 brand/design acceptance + types "approved".** Plan 01-09 + Plan 01-11 close together.
</success_criteria>

After completion, create `.planning/phases/01-stabilize-video-pipeline/01-11-SUMMARY.md` per the standard template. Cite:
- The 14 assertions landed GREEN (0: prod-bundle hook-leak grep gate; 1-13: functional contract from orchestrator brief).
- Both RED-on-regression canonical demos documented in commit bodies (Bug B for assertion 6; Bug A for assertion 8).
- The two-bundle separation (`dist/` vs `dist-test/`) verified by Tier-1 grep gate.
- npm script additions (`build:test` + `test:uat`); dev-dep additions (puppeteer + tsx) with resolved versions.
- Hook surface inventory (`__mokoshTest`: handlers, notification observables, getCurrentStream, getSegmentCount) + the gated dynamic import sites in `src/background/index.ts` + `src/offscreen/recorder.ts`.
- Plan 01-09 amendment block landed (Task 5 functional steps redirected; brand/design step retained).
- STATE.md decision log updated with the operator-retirement decision.
- Open questions resolved (5 from RESEARCH) + their resolutions; any new open questions surfaced during execution.
- Bundle-size delta (`dist/` before vs after; should be near-zero since gated dynamic imports tree-shake cleanly).
- Total harness runtime ranges observed (cold: ~90s including build steps; warm with SKIP_PROD_REBUILD=1: ~70s; the 35-second assertion 11 wait dominates).
