mokosh/.planning/phases/04-harden-clean-up-optional/04-04-PLAN.md
Mark 76fffb35b9 fix(04): revise plans per checker iter-1 — 2 BLOCKERS + 2 WARNINGS fixed
Plan-checker iter-1 found 2 BLOCKERS + 4 WARNINGS. Iter-2 revision applies
surgical fixes to 4 plans + VALIDATION:

BLOCKER 1 (Plan 04-06 Task 4): wrong SW chunk glob `dist/assets/index*-bg.js`
matched zero files → Gates 2/3/4 silently PASSED. Replaced with canonical
`dist/assets/index.ts-*.js` (verified empirically: index.ts-8LkXuqac.js
on disk; RESEARCH Q1). Added glob-existence pre-gate `ls | wc -l >= 1`
to fail-loudly on future Vite chunk-naming shift.

BLOCKER 2 (Plan 04-04 Task 1): spike called non-existent
__mokoshHarness.dispatchSaveArchive (verified: harness surface is
assertA1..A31 + getManifestVersion only). Applied Option B — spike
+ driveA33 now dispatch SAVE_ARCHIVE via chrome.runtime.sendMessage
inline in page.evaluate (matches 9 existing assertA* methods:
A5/A11/A12/A13/A26/A28/A29/A30/A31). No new harness helper introduced.

WARNING 1 (Plan 04-02 Task 2): verify omitted UAT harness run. Added
`HEADLESS=1 SKIP_PROD_REBUILD=0 npm run test:uat 2>&1 | grep -c 'UAT
harness: 33/33 assertions passed'` to verify command (stdout format
confirmed at tests/uat/harness.test.ts:537).

WARNING 4 (Plan 04-07 Task 1): weak operator-ack gate (placeholder would
pass). Added `grep -cE 'approved|All good|APPROVED|approved by|operator
ack|all good' 04-VERIFICATION.md` to verify command. Covers both
canonical Plan 04-06 resume-signal ("approved" lowercase) AND prior-art
Plan 01-10 cycle-2 ack ("All good" titlecase).

WARNINGS 2 + 3 left as-is (truly advisory: scope-sanity threshold +
conservative dependency without file overlap).

04-VALIDATION.md per-task map rows updated for the 5 revised task entries
(04-02 T2 + 04-04 T1 + 04-04 T2 + 04-06 T4 + 04-07 T1). Frontmatter
adds `revised: 2026-05-21` + iter-2 notes block.

3 plans unchanged on disk (04-01, 04-03, 04-05).

Empirical confirmations used in revision:
- Harness surface: grep extension-page-harness.ts:4018 confirms
  __mokoshHarness.{assertA1..A31, getManifestVersion}; no dispatchSaveArchive
- SW chunk filename: ls dist/assets/ shows index.ts-8LkXuqac.js;
  no index*-bg.js matches
- SAVE_ARCHIVE precedent count: 9 existing assertA* methods use the
  chrome.runtime.sendMessage pattern
- UAT harness stdout format: harness.test.ts:537 emits canonical
  "UAT harness: N/N assertions passed"

Ready for plan-checker iter-3 re-verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:00:07 +02:00

---
phase: 04
slug: harden-clean-up-optional
plan: 04
type: execute
wave: 3
depends_on:
- 01
- 02
- 03
files_modified:
- tests/uat/extension-page-harness.ts
- tests/uat/lib/harness-page-driver.ts
- tests/uat/harness.test.ts
autonomous: true
requirements: []
tags:
- uat-harness
- a33
- sw-state-persistence
- sw-eviction
- spike-first
- cdp-worker-close
- roadmap-sc-1
- charter-d-p4-01
user_setup: []
must_haves:
  truths:
    - "Wave 0 spike verifies empirically whether the offscreen document survives 5-min SW idle + worker.close() while MediaRecorder is actively recording — informs whether A33 is verification-only OR needs IndexedDB persistence work"
    - "stopServiceWorker(browser, extensionId) helper exists in tests/uat/lib/harness-page-driver.ts using Puppeteer CDP browser.waitForTarget + worker.close() per Chrome devrel canonical pattern"
    - "assertA33 / driveA33 land per the spike outcome: if PASS (offscreen survives) → verification-only A33 that does a real 5-min wall-clock idle + SW kill + SAVE → asserts archive's video/last_30sec.webm size > 100 KB"
    - "A33 is env-gated by SKIP_LONG_UAT (default: RUN for closure + alpha gate; SKIP_LONG_UAT=1 to skip for per-commit iteration)"
    - "UAT harness count flips from 33 → 34 (A33 added); 34/34 GREEN when SKIP_LONG_UAT unset"
    - "ROADMAP SC #1 (SW state persistence) GREEN — A33 empirical evidence that a real-world 5-min idle + SAVE produces a non-empty video buffer"
    - "SAVE_ARCHIVE dispatch reuses the canonical chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, ...) pattern from the harness page realm (NOT a new __mokoshHarness helper); matches the established assertA5/A11/A12/A13/A26/A28/A29/A30/A31 precedent"
  artifacts:
    - path: "tests/uat/extension-page-harness.ts"
      provides: "assertA33 page-side stub (or thin driver) for SAVE_ARCHIVE dispatch after host-side SW kill"
      contains: "assertA33"
    - path: "tests/uat/lib/harness-page-driver.ts"
      provides: "stopServiceWorker(browser, extensionId) NEW helper + driveA33 host-side CDP-kill + JSZip video-size check"
      contains: "worker.close()"
    - path: "tests/uat/harness.test.ts"
      provides: "driveA33 import + wrapped driver const (passes handles.browser + handles.extensionId + handles.downloadsDir) + drivers-array push + SKIP_LONG_UAT env-gate wrapper"
      contains: "driveA33Wrapped"
  key_links:
    - from: "tests/uat/harness.test.ts driveA33Wrapped"
      to: "tests/uat/lib/harness-page-driver.ts driveA33(page, browser, extensionId, downloadsDir)"
      via: "(page) => driveA33(page, handles.browser, handles.extensionId, handles.downloadsDir)"
      pattern: "handles\\.browser.*handles\\.extensionId"
    - from: "tests/uat/lib/harness-page-driver.ts stopServiceWorker"
      to: "Chrome MV3 SW target via Puppeteer CDP"
      via: "browser.waitForTarget(t => t.type()==='service_worker' && t.url().startsWith(`chrome-extension://${extensionId}`)) + target.worker().close()"
      pattern: "service_worker"
    - from: "tests/uat/lib/harness-page-driver.ts driveA33 video-size check"
      to: "zip.file('video/last_30sec.webm') → byteLength > 100_000"
      via: "JSZip.loadAsync + entry.async('uint8array')"
      pattern: "video/last_30sec\\.webm"
---
<objective>
Ship the A33 harness assertion that empirically verifies ROADMAP SC #1 (SW state persistence across the 30s idle unload edge cases). Per RESEARCH Q2: the current architecture stores segments only in offscreen-document RAM (src/offscreen/recorder.ts:91 `let segments: Blob[] = []`). The SW NEVER stores the buffer. So the actual question becomes: does the offscreen document survive 5 minutes of SW idle? Per Chrome docs, the offscreen has its own lifecycle independent of the SW, with active `MediaRecorder` being the canonical "compelling reason" to keep the offscreen alive.
This plan uses the SPIKE-FIRST approach to avoid over-engineering:
**Wave 0 (spike):** A 30-min empirical investigation. Start recording; wait 5 min real wall-clock; force-kill the SW via Puppeteer CDP `worker.close()` (Chrome devrel canonical pattern; Puppeteer ≥22.1.0 supports it — our pin ^25 is comfortably above); dispatch SAVE_ARCHIVE from page.evaluate; check the resulting archive's `video/last_30sec.webm` size.
**Wave 1 (impl):** Based on spike outcome:
- **If spike PASSES** (likely outcome per RESEARCH architecture analysis + Chrome docs): A33 is a VERIFICATION-ONLY harness assertion that wraps the spike methodology into a repeatable test. Ships the spike's exact pattern as `driveA33` + `assertA33` + orchestrator wiring + env-gate. ROADMAP SC #1 is satisfied by the CURRENT architecture; no persistence layer needed.
- **If spike FAILS** (the offscreen dies along with the SW, contrary to Chrome docs): A33 implementation expands per RESEARCH Q2 sub-question (b) recommendation (Option C: IndexedDB persistence in offscreen — Blobs serialize cleanly to IDB; structured-clone supports them natively; per-segment write ~3 MB; ~3 writes per 30s window). This is a wider plan rewrite; the plan-checker should flag for re-planning if it materializes. RESEARCH confidence on offscreen-surviving-SW-kill is MEDIUM; the spike-first approach is the canonical risk hedge per Plan 01-07 precedent.
Purpose: Provides the empirical evidence for ROADMAP SC #1 ("After running the extension idle for >5 minutes, then exporting, the archive still contains a non-empty video buffer"). The spike-first approach hedges against the RESEARCH MEDIUM-confidence assumption (A3 — offscreen survives SW eviction). If the assumption holds, A33 is a verification gate; if not, persistence work is clearly needed.
Output: 1 NEW assertion (A33; harness count 33→34); 1 NEW helper (`stopServiceWorker` CDP wrapper); 3-file lockstep update per the Approach B pattern (extension-page-harness.ts + harness-page-driver.ts + harness.test.ts); env-gate via SKIP_LONG_UAT (default = RUN; set to '1' to skip for per-commit iteration).
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-harden-clean-up-optional/04-CONTEXT.md
@.planning/phases/04-harden-clean-up-optional/04-RESEARCH.md
@.planning/phases/04-harden-clean-up-optional/04-PATTERNS.md
# Source files — locus of the harness extension
@tests/uat/extension-page-harness.ts
@tests/uat/lib/harness-page-driver.ts
@tests/uat/harness.test.ts
@tests/uat/lib/launch.ts
@src/offscreen/recorder.ts
@src/background/index.ts
# Prior plan SUMMARYs to mirror — Approach B harness extension precedent
@.planning/phases/02-stabilize-export-pipeline/02-04-SUMMARY.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-04-SUMMARY.md
<interfaces>
<!-- Key shapes the executor consumes directly. Extracted from codebase + RESEARCH 2026-05-21. -->
From tests/uat/lib/launch.ts:80-90 (HarnessHandles — already exposes browser + extensionId; no extension needed):
```typescript
export interface HarnessHandles {
  readonly browser: Browser; // ← already exposed; used by driveA33
  readonly extensionId: string; // ← already exposed; used by driveA33
  readonly harnessPage: Page;
  readonly victimPage: Page;
  readonly downloadsDir: string;
  readonly swConsole: string[];
  readonly offConsole: string[];
}
```
From RESEARCH Q2 Code Example Pattern 1 (stopServiceWorker — NEW helper; verbatim from Chrome devrel doc):
```typescript
import type { Browser } from 'puppeteer';

/**
 * Force-terminate the MV3 service worker via Puppeteer CDP. Required
 * because Puppeteer's persistent CDP attach keeps SWs alive indefinitely;
 * natural 30s idle eviction does NOT fire under test conditions per Chrome
 * docs (https://developer.chrome.com/docs/extensions/develop/concepts/service-workers/lifecycle).
 *
 * Reference: https://developer.chrome.com/docs/extensions/how-to/test/test-serviceworker-termination-with-puppeteer
 */
async function stopServiceWorker(browser: Browser, extensionId: string): Promise<void> {
  const host = `chrome-extension://${extensionId}`;
  const target = await browser.waitForTarget(
    (t) => t.type() === 'service_worker' && t.url().startsWith(host),
  );
  const worker = await target.worker();
  if (worker !== null) {
    await worker.close();
  }
}
```
SAVE_ARCHIVE dispatch (REVISION iter-2 — Option B per plan-checker BLOCKER 2):
The harness page realm has `chrome.runtime` available (it's an extension-page realm — `chrome-extension://<id>/tests/uat/extension-page-harness.html`). The canonical SAVE_ARCHIVE dispatch is `chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, callback)` — used by 9 existing assertions (A5/A11/A12/A13/A26/A28/A29/A30/A31; verified via `grep "type: 'SAVE_ARCHIVE'" tests/uat/extension-page-harness.ts`). The `__mokoshHarness` surface is `assertA1..A31 + getManifestVersion`; there is NO `dispatchSaveArchive` helper. We do NOT add one — driveA33 dispatches SAVE_ARCHIVE directly via `page.evaluate` + a promise-wrapped `chrome.runtime.sendMessage` callback. This matches Plan 04-05's approach (which dispatches SAVE_ARCHIVE from inside an existing assertA34 method that runs in the harness page realm).
From RESEARCH Q2 Code Example Pattern 4 (driveA33 — host-side body; REVISION iter-2 inline-dispatched SAVE):
```typescript
const A33_IDLE_WAIT_MS = 5 * 60 * 1000; // 300_000 — real wall-clock
const A33_NEW_SW_BOOT_MS = 500; // post-worker.close() settle
const A33_OVERALL_TIMEOUT_MS = A33_IDLE_WAIT_MS + 60_000; // 360_000
const A33_SAVE_ARCHIVE_TIMEOUT_MS = 15_000;

export async function driveA33(
  page: Page,
  browser: Browser,
  extensionId: string,
  downloadsDir: string,
): Promise<AssertionRecord> {
  const r: AssertionRecord = { name: 'A33', passed: false, checks: [], diagnostics: [] };

  // Step 1: prime recording on the probe tab via the existing harness primitive.
  // setupFreshRecording is a module-internal helper inside extension-page-harness.ts
  // and is reachable from page.evaluate only if exposed; in practice driveA33 calls
  // assertA1 (or an equivalent existing harness method that primes a fresh recording)
  // OR a thin Plan-04-04 page-side wrapper if the prior arts don't suffice. Verify
  // in Task 2 read_first which existing assertA* method delivers a fresh-recording
  // SUT and reuse it directly (no new harness method needed).
  await page.evaluate(async () => {
    // Reuse the same fresh-recording primitive that A5/A26/A30/A31 use as their Step 1.
    // The exact call depends on whether setupFreshRecording is exposed; if not, A33's
    // first step calls __mokoshHarness.assertA1 (which is the canonical "fresh
    // recording bootstrap" assertion in the harness surface).
    const harness = (window as { __mokoshHarness: { assertA1: () => Promise<unknown> } }).__mokoshHarness;
    await harness.assertA1();
  });

  // Step 2: 5-min wall-clock idle
  r.diagnostics.push(`waiting ${A33_IDLE_WAIT_MS}ms for SW idle window`);
  await new Promise((res) => setTimeout(res, A33_IDLE_WAIT_MS));

  // Step 3: force SW termination via CDP
  await stopServiceWorker(browser, extensionId);
  r.diagnostics.push('SW terminated via worker.close()');

  // Step 4: brief settle for SW teardown
  await new Promise((res) => setTimeout(res, A33_NEW_SW_BOOT_MS));

  // Step 5: dispatch SAVE_ARCHIVE via chrome.runtime.sendMessage from the harness
  // page realm — matches the established A5/A11/A12/A13/A26/A28/A29/A30/A31 pattern.
  // Wakes SW back up as an event (event-driven respawn is the canonical MV3 wakeup path).
  const saveResult = await page.evaluate(
    (timeoutMs: number) =>
      new Promise<{ success: boolean; error?: string }>((resolve) => {
        const timer = setTimeout(() => {
          resolve({ success: false, error: `SAVE_ARCHIVE timed out after ${timeoutMs}ms` });
        }, timeoutMs);
        chrome.runtime.sendMessage({ type: 'SAVE_ARCHIVE' }, (response: unknown) => {
          clearTimeout(timer);
          if (chrome.runtime.lastError !== undefined) {
            resolve({ success: false, error: String(chrome.runtime.lastError.message) });
            return;
          }
          resolve(response as { success: boolean; error?: string });
        });
      }),
    A33_SAVE_ARCHIVE_TIMEOUT_MS,
  );
  r.checks.push({
    name: 'A33.1: SAVE_ARCHIVE ack success after 5-min idle + SW kill',
    expected: true,
    actual: saveResult.success,
    passed: saveResult.success === true,
  });

  // Step 6: verify zip contains non-empty video buffer
  const zipPath = findLatestZip(downloadsDir);
  if (zipPath === null) {
    r.checks.push({ name: 'A33.0: zip present', expected: '>=1 zip', actual: 'none', passed: false });
    r.passed = false;
    return r;
  }
  const zip = await JSZip.loadAsync(readFileSync(zipPath));
  const videoEntry = zip.file('video/last_30sec.webm');
  const videoSize = videoEntry !== null
    ? (await videoEntry.async('uint8array')).byteLength
    : 0;
  r.checks.push({
    name: 'A33.2: video/last_30sec.webm size > 0 (buffer survived SW eviction)',
    expected: '>0',
    actual: String(videoSize),
    passed: videoSize > 0,
  });
  r.checks.push({
    name: 'A33.3: video size > 100 KB (sanity floor; real archives 1-3 MB)',
    expected: '>100000',
    actual: String(videoSize),
    passed: videoSize > 100_000,
  });
  r.passed = r.checks.every((c) => c.passed);
  return r;
}
```
From RESEARCH Q2 sub-question (c) env-gate recommendation:
```typescript
// In tests/uat/harness.test.ts drivers-array entry:
{
  name: 'A33',
  drive: process.env.SKIP_LONG_UAT === '1'
    ? async (): Promise<AssertionRecord> => ({
        name: 'A33',
        passed: true,
        checks: [],
        diagnostics: ['A33 SKIPPED (SKIP_LONG_UAT=1; unset to run 5-min idle test)'],
      })
    : driveA33Wrapped,
},
```
Default polarity: SKIP_LONG_UAT unset → RUN A33 (this matches the closure + alpha-gate semantics; per-commit dev iteration uses SKIP_LONG_UAT=1).
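Because the success criteria expect this env-gate to be reused by any future ≥5-min test, the ternary above could be factored into a small wrapper. The following is a minimal sketch under stated assumptions: `gateLongTest` and the `Driver` alias are hypothetical names that do not exist in the harness today, and the local `AssertionRecord` shape is simplified for illustration (the real type lives in the harness driver module).

```typescript
// Hypothetical helper: not part of the existing harness surface.
// Simplified local stand-in for the harness AssertionRecord type.
interface AssertionRecord {
  name: string;
  passed: boolean;
  checks: unknown[];
  diagnostics: string[];
}

type Driver<P> = (page: P) => Promise<AssertionRecord>;

/**
 * Returns the real driver unless SKIP_LONG_UAT is exactly '1', in which case
 * a placeholder driver is returned that reports a passing, skipped record.
 * Default polarity is RUN: an unset or any other value keeps the real driver.
 */
function gateLongTest<P>(
  name: string,
  drive: Driver<P>,
  env: Record<string, string | undefined>,
): Driver<P> {
  if (env.SKIP_LONG_UAT !== '1') {
    return drive;
  }
  return async () => ({
    name,
    passed: true,
    checks: [],
    diagnostics: [`${name} SKIPPED (SKIP_LONG_UAT=1; unset to run the long test)`],
  });
}
```

Under these assumptions the drivers-array entry would shrink to `{ name: 'A33', drive: gateLongTest('A33', driveA33Wrapped, process.env) }`. Whether the indirection is worth it for a single long test is a judgment call; the inline ternary remains the form this plan ships.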
From src/offscreen/recorder.ts:91 (architecture invariant — segments only in offscreen RAM):
```typescript
let segments: Blob[] = []; // module-level state; NO chrome.storage.local persistence; NO IndexedDB
```
The spike's job: verify this RAM-only design survives a 5-min SW idle. If it does, ROADMAP SC #1 is satisfied with ZERO source-code changes — just a new harness assertion that exercises the path.
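To make the RAM-only invariant concrete, here is a minimal sketch of a bounded segment buffer. It is an illustration only, not the code in `src/offscreen/recorder.ts`: the 10 s rotation interval and the trim-to-last-3 retention are taken from the threat model in this plan, and `pushSegment` and `segmentsProduced` are hypothetical names.

```typescript
// Assumptions (from this plan's threat model, not from recorder.ts itself):
// segments rotate every 10 s and only the newest 3 are retained.
const SEGMENT_MS = 10_000;
const MAX_SEGMENTS = 3;

/** Append a segment and keep only the newest `max` entries. */
function pushSegment<T>(segments: T[], next: T, max: number = MAX_SEGMENTS): T[] {
  const appended = [...segments, next];
  return appended.slice(Math.max(0, appended.length - max));
}

/** Number of whole segments the recorder emits during an idle window. */
function segmentsProduced(idleMs: number, segmentMs: number = SEGMENT_MS): number {
  return Math.floor(idleMs / segmentMs);
}
```

Under these assumptions a 5-min idle emits 30 segments while the buffer never holds more than 3, so at roughly 3 MB per segment the retained video is on the order of 9 MB. A33 checks that this retained window is still present after the SW is killed.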
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Wave 0 SPIKE — empirical verification that offscreen survives 5-min SW idle</name>
<files>tests/uat/lib/harness-page-driver.ts</files>
<read_first>tests/uat/lib/harness-page-driver.ts (full; ~2200 lines — read selectively: imports lines 1-40, findLatestZip ~1395, driveA30 host-side filter ~2039-2148), tests/uat/extension-page-harness.ts:3932-4021 (__mokoshHarness global registration block — confirm available surface BEFORE writing spike), tests/uat/extension-page-harness.ts:600-700 (setupFreshRecording helper context), src/offscreen/recorder.ts:80-100 (segments array context), .planning/phases/04-harden-clean-up-optional/04-RESEARCH.md Q2 sub-question (b), .planning/phases/01-stabilize-video-pipeline/01-07-SUMMARY.md (Plan 07 spike precedent)</read_first>
<action>
1. Add the `stopServiceWorker(browser, extensionId)` helper to `tests/uat/lib/harness-page-driver.ts` per the Code Example in `<interfaces>` above. Place it near the top of the file (after existing imports + before existing driveA-* functions). Add the `import type { Browser } from 'puppeteer';` if not already present.
2. Create a one-shot spike script `tests/uat/spike-a33-sw-persistence.ts` (NEW; treat as scratch file for this spike — delete after spike concludes; record outcome in plan SUMMARY). The script:
- Imports `launchHarnessBrowser` from `./lib/launch.ts`.
- Imports `stopServiceWorker` + `findLatestZip` from `./lib/harness-page-driver.ts`.
- Launches the harness browser.
- **Prime recording (REVISION iter-2 — Option B; no `dispatchSaveArchive` helper exists on `__mokoshHarness`):** call the existing fresh-recording primitive via `await handles.harnessPage.evaluate(() => (window as { __mokoshHarness: { assertA1: () => Promise<unknown> } }).__mokoshHarness.assertA1());`. The Task 1 read_first MUST verify that `__mokoshHarness.assertA1` is the canonical fresh-recording bootstrap (it is per the existing harness — `Harness ready. window.__mokoshHarness.{assertA1..A31, getManifestVersion} available.`); if a different assertA* method is more direct for "prime + leave recording active for 5 min", choose that instead and document in the spike script comment.
- `console.log('SPIKE: waiting 5 minutes for SW idle window...')`
- `await new Promise(r => setTimeout(r, 5 * 60 * 1000));`
- `await stopServiceWorker(handles.browser, handles.extensionId);`
- `await new Promise(r => setTimeout(r, 500));` (settle)
- **Dispatch SAVE_ARCHIVE (REVISION iter-2 — Option B; canonical chrome.runtime.sendMessage from harness page realm):**
```typescript
const saveResult = await handles.harnessPage.evaluate(
  (timeoutMs: number) =>
    new Promise<{ success: boolean; error?: string }>((resolve) => {
      const timer = setTimeout(() => {
        resolve({ success: false, error: `SAVE_ARCHIVE timed out after ${timeoutMs}ms` });
      }, timeoutMs);
      chrome.runtime.sendMessage({ type: 'SAVE_ARCHIVE' }, (response: unknown) => {
        clearTimeout(timer);
        if (chrome.runtime.lastError !== undefined) {
          resolve({ success: false, error: String(chrome.runtime.lastError.message) });
          return;
        }
        resolve(response as { success: boolean; error?: string });
      });
    }),
  15_000,
);
console.log(`SPIKE: SAVE_ARCHIVE ack -> ${JSON.stringify(saveResult)}`);
```
- `await new Promise(r => setTimeout(r, 5000));` (let download complete)
- `const zipPath = findLatestZip(handles.downloadsDir);`
- `const zip = await JSZip.loadAsync(readFileSync(zipPath));`
- `const videoEntry = zip.file('video/last_30sec.webm');`
- `const videoSize = videoEntry ? (await videoEntry.async('uint8array')).byteLength : 0;`
- `console.log(\`SPIKE RESULT: videoSize=${videoSize} bytes (>0 = OFFSCREEN SURVIVED; =0 = OFFSCREEN DIED)\`);`
- `await handles.browser.close();`
3. Run the spike: `tsx tests/uat/spike-a33-sw-persistence.ts` with HEADLESS=1 (so it runs in CI mode; ~5 min wall-clock).
4. Record the result. If videoSize > 100_000 → SPIKE PASSED (offscreen survives) → proceed to Task 2 with verification-only A33. If videoSize ≤ 100_000 OR throw → SPIKE FAILED → SUMMARY documents the failure mode + flag to plan-checker for re-planning (IndexedDB persistence work would expand Plan 04-04 substantially; that's a planning event, not an execution event).
5. Commit the `stopServiceWorker` helper (Task 1's persisting artifact). The spike script is OK to delete OR keep committed as `tests/uat/spike-*.ts` for future SW-lifecycle investigations.
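
The pass/fail rule in step 4 can be sketched as a small pure function, which keeps the threshold in one place if the spike script is kept. This is a sketch under stated assumptions: `classifySpike` and `VIDEO_SIZE_FLOOR_BYTES` are hypothetical names, and `null` is used here to represent "the spike threw or no zip was found".

```typescript
// Hypothetical helper for the spike script; names are illustrative.
type SpikeOutcome = 'PASSED' | 'FAILED';

// Sanity floor from step 4: strictly more than 100_000 bytes counts as a pass.
const VIDEO_SIZE_FLOOR_BYTES = 100_000;

/**
 * Classify the spike result. `null` stands for "spike threw or produced no
 * archive", which step 4 treats the same as an undersized video.
 */
function classifySpike(videoSize: number | null): SpikeOutcome {
  return videoSize !== null && videoSize > VIDEO_SIZE_FLOOR_BYTES ? 'PASSED' : 'FAILED';
}
```

A FAILED outcome does not trigger any fix inside this task; it only determines what the SUMMARY records and whether Task 2 is unblocked.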
</action>
<verify>
<automated>npx tsc --noEmit && HEADLESS=1 tsx tests/uat/spike-a33-sw-persistence.ts 2>&1 | tee /tmp/04-04-spike.log; grep -c 'SPIKE RESULT' /tmp/04-04-spike.log</automated>
</verify>
<acceptance_criteria>
- `stopServiceWorker(browser, extensionId)` helper exists at `tests/uat/lib/harness-page-driver.ts` with the canonical Chrome devrel signature (`Browser` + extensionId args; body awaits `target.worker()` and calls `close()` on the result when it is non-null).
- Spike script ran to completion (no Puppeteer throw).
- Spike result logged with explicit `videoSize=<N> bytes` line.
- Spike SAVE_ARCHIVE dispatch uses `chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, ...)` directly (NOT a non-existent `__mokoshHarness.dispatchSaveArchive()` call); verify by `grep -c 'dispatchSaveArchive' tests/uat/spike-a33-sw-persistence.ts` returns 0 AND `grep -c "type: 'SAVE_ARCHIVE'" tests/uat/spike-a33-sw-persistence.ts` returns ≥ 1.
- If videoSize > 100_000: spike PASSED; proceed to Task 2 with verification-only path.
- If videoSize ≤ 100_000: spike FAILED; pause the plan and flag it to the plan-checker for re-planning (the re-planning itself is out of scope for this task; the SUMMARY documents the failure mode).
- Total spike wall-clock: ~6-7 minutes (5 min idle + ~1-2 min orchestration).
</acceptance_criteria>
<done>Spike outcome recorded in plan SUMMARY; stopServiceWorker helper committed. Atomic commit: `feat(04-04): Wave 0 spike — stopServiceWorker helper + 5-min SW idle empirical result`.</done>
</task>
<task type="auto">
<name>Task 2: Wave 1 — A33 assertion + driveA33 + orchestrator wiring (assumes spike PASSED)</name>
<files>tests/uat/extension-page-harness.ts, tests/uat/lib/harness-page-driver.ts, tests/uat/harness.test.ts</files>
<read_first>tests/uat/extension-page-harness.ts:3517-3636 (assertA30 — canonical setupFreshRecording + SAVE pattern), tests/uat/extension-page-harness.ts:3878-3917 (assertA31 — most-recent chrome.runtime.sendMessage SAVE_ARCHIVE pattern; copy this), tests/uat/extension-page-harness.ts:3932-4021 (__mokoshHarness global registration block — confirm NO new method added per REVISION iter-2), tests/uat/lib/harness-page-driver.ts:2039-2148 (driveA30 — host-side filter pattern), tests/uat/harness.test.ts:100-110 (import block), tests/uat/harness.test.ts:340-360 (wrapped-driver block), tests/uat/harness.test.ts:459-486 (drivers-array push block), tests/uat/harness.test.ts:225-240 (SKIP_PROD_REBUILD env-gate pattern)</read_first>
<action>
**GATING CONDITION:** Task 1 spike produced videoSize > 100_000. (If FAILED, this task is BLOCKED and the plan must be re-planned to add IndexedDB persistence work.)
3-file lockstep update per the Approach B harness extension pattern:
**File 1: tests/uat/extension-page-harness.ts**
- REVISION iter-2 — Option B per plan-checker BLOCKER 2: the existing `__mokoshHarness` surface is `assertA1..A31 + getManifestVersion`; `dispatchSaveArchive` does NOT exist and we do NOT add it. SAVE_ARCHIVE dispatch happens directly via `chrome.runtime.sendMessage` inside driveA33's `page.evaluate` (matches the established assertA31 pattern at lines 3886-3890).
- Decision: NO new page-side function. driveA33 (host-side) drives Step 1 (prime) by calling an existing `__mokoshHarness.assertA<N>` method that bootstraps a fresh recording (confirm in read_first which existing assertA* is the canonical "prime fresh recording" entrypoint — `assertA1` is the leading candidate; falling back to `assertA5`/`assertA26` if a more direct method matches the spike's actual call site). Step 5 (SAVE) uses inline `chrome.runtime.sendMessage` per the `<interfaces>` block above.
- Verify no edits needed to `__mokoshHarness` registration block (lines 3932-4015): the surface stays at 31 assertA* + getManifestVersion. The Tier-1 FORBIDDEN_HOOK_STRINGS inventory stays at 12 entries (no new test-only symbol).
- If, during read_first, the planner determines that NONE of the existing assertA* methods deliver "prime + leave recording active for ≥5 min", THEN add a thin page-side primer `primeForA33` that calls existing production-surface APIs (REQUEST_PERMISSIONS → START_RECORDING via chrome.runtime.sendMessage); this is a deviation from Option B and must be flagged in the SUMMARY. Per RESEARCH note (FORBIDDEN_HOOK_STRINGS stays at 12): NO new test-only `__MOKOSH_UAT__`-gated symbol; any new page-side helper uses production APIs only.
**File 2: tests/uat/lib/harness-page-driver.ts**
- Append `driveA33` function per RESEARCH Code Example Pattern 4 (full body in `<interfaces>` above; REVISION iter-2 inline-dispatched SAVE).
- Place it after the existing driveA32 (which is the most-recent Phase 3 addition).
- Verify the `stopServiceWorker` helper from Task 1 is in scope (same file).
- Filter-pipeline form; no `continue`; typed function signature `(page, browser, extensionId, downloadsDir) => Promise<AssertionRecord>` per the new 4-arg shape.
- Add `import { readFileSync } from 'node:fs';` + `import JSZip from 'jszip';` if not already present (they should be — these are reused from driveA29/30/31).
- The Step-5 SAVE_ARCHIVE inline `page.evaluate` block uses `chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, callback)` per the `<interfaces>` Code Example; verify by `grep -c "type: 'SAVE_ARCHIVE'" tests/uat/lib/harness-page-driver.ts` increases by ≥ 1 vs pre-edit baseline (was 0 in driveA29/30/31 because those call sites are inside extension-page-harness.ts assertA* methods; A33 is unique in dispatching from the host-side via page.evaluate).
**File 3: tests/uat/harness.test.ts**
- Import: add `driveA33,` to the import block at ~line 101 (alongside `driveA29`-`driveA32`).
- Wrapped-driver: add at ~line 357 (after `driveA31Wrapped`):
```typescript
// Plan 04-04 — driveA33 needs Browser + extensionId for CDP-based SW kill
// AND downloadsDir for host-side JSZip parse of post-restart zip.
const driveA33Wrapped: (page: import('puppeteer').Page) => Promise<AssertionRecord> =
  (page) => driveA33(page, handles.browser, handles.extensionId, handles.downloadsDir);
```
- Drivers-array push: add at ~line 486 (after the existing A32 entry):
```typescript
// Plan 04-04 A33: SW state persistence 5-min idle (ROADMAP SC #1; RESEARCH Q2).
// Forces SW eviction via Puppeteer CDP worker.close() per the canonical
// Chrome devrel pattern (RESEARCH Pattern 1). Verifies offscreen-RAM
// segments survive SW restart. Env-gated by SKIP_LONG_UAT for fast
// per-commit iteration; defaults to RUN for Phase 4 closure + alpha gate.
{
  name: 'A33',
  drive: process.env.SKIP_LONG_UAT === '1'
    ? async (): Promise<AssertionRecord> => ({
        name: 'A33',
        passed: true,
        checks: [],
        diagnostics: ['A33 SKIPPED (SKIP_LONG_UAT=1; unset to run 5-min idle test)'],
      })
    : driveA33Wrapped,
},
```
Verify:
- `npx tsc --noEmit` exits 0.
- `npm run build:test` exits 0.
- Quick UAT: `HEADLESS=1 SKIP_PROD_REBUILD=1 SKIP_LONG_UAT=1 npm run test:uat` exits 0 with 34/34 GREEN (A33 SKIPPED message visible; preserves baseline + adds A33 skip placeholder).
- Full UAT: `HEADLESS=1 SKIP_PROD_REBUILD=1 npm run test:uat` exits 0 with 34/34 GREEN (A33 actually runs ~6 min wall-clock; A33.1 SAVE ack + A33.2 size > 0 + A33.3 size > 100 KB all PASS).
- Tier-1 FORBIDDEN_HOOK_STRINGS check: `grep -c 'FORBIDDEN_HOOK_STRINGS' tests/uat/harness.test.ts tests/background/no-test-hooks-in-prod-bundle.test.ts` — verify the inventory count in both files unchanged (preserves the 12-entry invariant per CONTEXT §"Claude's Discretion").
- REVISION iter-2 gate: `grep -c 'dispatchSaveArchive' tests/uat/lib/harness-page-driver.ts tests/uat/extension-page-harness.ts tests/uat/harness.test.ts` returns 0 (the non-existent helper is NOT introduced).
</action>
<verify>
<automated>npx tsc --noEmit && npm run build:test && HEADLESS=1 SKIP_PROD_REBUILD=1 SKIP_LONG_UAT=1 npm run test:uat 2>&1 | tail -5 | tee /tmp/04-04-task-2-skip.log; grep -c '34/34' /tmp/04-04-task-2-skip.log; grep -c 'dispatchSaveArchive' tests/uat/lib/harness-page-driver.ts tests/uat/extension-page-harness.ts tests/uat/harness.test.ts</automated>
</verify>
<acceptance_criteria>
- `npx tsc --noEmit` exits 0.
- `npm run build:test` exits 0.
- UAT harness count flips 33 → 34 (A33 added).
- Skip-mode run: `HEADLESS=1 SKIP_PROD_REBUILD=1 SKIP_LONG_UAT=1 npm run test:uat` GREEN 34/34 (A33 SKIPPED placeholder GREEN; total takes ~95s — unchanged).
- Full-mode run: `HEADLESS=1 SKIP_PROD_REBUILD=1 npm run test:uat` GREEN 34/34 (~6.5 min; A33 actually runs and passes A33.1 + A33.2 + A33.3).
- `grep -c 'A33' tests/uat/harness.test.ts` returns ≥ 4 (import + wrapped + push + comment banner).
- `grep -c 'SKIP_LONG_UAT' tests/uat/harness.test.ts` returns ≥ 2 (env-gate + comment).
- FORBIDDEN_HOOK_STRINGS count unchanged at 12 (no new test-only symbols introduced per CONTEXT §"Claude's Discretion"; verify by `wc -l` of the inventory arrays).
- REVISION iter-2 gate (Option B): `grep -rc 'dispatchSaveArchive' tests/uat/` reports 0 for every harness file (plain `grep -c` does not descend into a directory, so `-r` is required); SAVE_ARCHIVE dispatched via `chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, ...)` only.
</acceptance_criteria>
<done>A33 lands; UAT 33→34 GREEN; SW persistence empirically verified at 5-min idle scale. Atomic commit: `feat(04-04): Wave 1 — A33 SW state persistence harness assertion (34/34 GREEN)`.</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| Puppeteer CDP → Chrome MV3 SW realm | `worker.close()` terminates the running SW through Puppeteer's CDP connection; the extension's SW registration stays in place, so the next event (here SAVE_ARCHIVE) respawns the worker. This terminates the SW realm but does NOT touch the offscreen document's WebContents target. Native CDP surface; no untrusted input. |
| Test idle interval (5 min wall-clock) → MediaRecorder active segment buffer | the MediaRecorder is in `state === 'recording'` during the idle; segments rotate every 10s, so the recorder emits 30 segments over the idle (5 min × 60 sec / 10 sec per segment); trim-to-last-3 keeps only the newest 3 in the offscreen-RAM array, which bounds memory at ≤ 30 MB (well under CON-ram-ceiling) |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-04-04-01 | Tampering | a future architectural change might move the segments array to SW-side state, breaking the offscreen-survives-SW assumption A33 verifies | mitigate | A33 is a regression-catching gate; if a future PR moves segments off-offscreen, A33 fails fast (videoSize=0 after SW kill) |
| T-04-04-02 | DoS (CI) | A33's 5-min idle adds ~5 min to harness wall-clock (95s → 395s); per-commit CI lanes would suffer | mitigate | Env-gated by SKIP_LONG_UAT (default RUN for closure + alpha; documented per-commit SKIP_LONG_UAT=1 for dev iteration) |
| T-04-04-03 | Repudiation | natural 30s idle eviction does NOT fire under Puppeteer's CDP attach per Chrome docs; if a developer naively writes "wait 5 min and hope SW dies" the test silently passes via a SW that never died | mitigate | The CDP `worker.close()` call is explicit + cited in code comment; RESEARCH Pitfall 4 documents the misconception |
| T-04-04-04 | Spoofing | Puppeteer 25.x patch versions could in theory change `Worker.close()` semantics; the canonical Chrome devrel pattern is pinned at Puppeteer ≥22.1.0 | accept | The project pin `puppeteer: ^25.0.2` is comfortably past the 22.1.0 floor; minor patch drift expected to be backward-compatible per Puppeteer's semver discipline. If A33 ever fails post-Puppeteer-upgrade, the SUMMARY's commit ref provides the exact Puppeteer version where it was validated. |
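
For T-04-04-04, the version floor could also be checked mechanically rather than by inspection. The sketch below is an assumption-laden illustration: `parseVersion` and `meetsFloor` are hypothetical helpers, pre-release suffixes are ignored, and how the installed version string is obtained (for example from the installed package's `package.json`) is left to the caller.

```typescript
// Hypothetical helpers; not part of the harness. Pre-release tags are ignored.

/** Parse the leading MAJOR.MINOR.PATCH triple of a version string. */
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (m === null) {
    throw new Error(`unparseable version: ${v}`);
  }
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

/** True when `installed` is greater than or equal to `floor`. */
function meetsFloor(installed: string, floor: string): boolean {
  const [aMajor, aMinor, aPatch] = parseVersion(installed);
  const [bMajor, bMinor, bPatch] = parseVersion(floor);
  if (aMajor !== bMajor) return aMajor > bMajor;
  if (aMinor !== bMinor) return aMinor > bMinor;
  return aPatch >= bPatch;
}
```

With the values quoted in the register, `meetsFloor('25.0.2', '22.1.0')` holds. This only guards the floor; it says nothing about behavioral drift in later releases, which stays an accepted risk.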
</threat_model>
<verification>
- `npx tsc --noEmit` exits 0.
- `npm run build:test` exits 0.
- UAT harness count 33 → 34.
- Skip-mode: `HEADLESS=1 SKIP_LONG_UAT=1 npm run test:uat` GREEN 34/34 in ~95s.
- Full-mode: `HEADLESS=1 npm run test:uat` GREEN 34/34 in ~6.5 min.
- ROADMAP SC #1 GREEN — A33 empirical evidence: video buffer survived 5-min SW idle + worker.close().
- FORBIDDEN_HOOK_STRINGS count unchanged at 12.
- vitest baseline preserved (≥ 181 GREEN from Plans 04-01 + 04-02).
- A29 + A30 + A31 + A32 unchanged (no regression to existing assertions).
- REVISION iter-2 invariant: `grep -rc 'dispatchSaveArchive' tests/uat/` reports 0 for every file, covering the spike script and the harness files (`-r` is needed because the target is a directory).
</verification>
<success_criteria>
- Wave 0 spike PASSED — empirical evidence that offscreen survives 5-min SW idle (Task 1).
- assertA33 + driveA33 + stopServiceWorker helper + harness orchestrator wiring landed (Task 2).
- UAT harness count 33 → 34 GREEN.
- ROADMAP SC #1 (SW state persistence) GREEN.
- Env-gated long-test pattern established (SKIP_LONG_UAT) — pattern reused by any future ≥5-min test.
- Pre-checkpoint bundle gates 6/6 PASS unchanged (Plan 04-04 makes no source-code changes).
</success_criteria>
<output>
After completion, create `.planning/phases/04-harden-clean-up-optional/04-04-SUMMARY.md` capturing:
- Spike outcome (videoSize value + interpretation; SPIKE PASSED/FAILED tag)
- stopServiceWorker helper diff (full body)
- driveA33 diff (full body; inline chrome.runtime.sendMessage SAVE per REVISION iter-2 Option B)
- Orchestrator wiring diff (3 sites in harness.test.ts)
- SKIP_LONG_UAT env-gate decision (default RUN; rationale)
- UAT before/after (33/33 → 34/34)
- Full-mode wall-clock benchmark (e.g., ~6.5 min)
- ROADMAP SC #1 closure evidence
- Commit refs (Task 1 spike + Task 2 impl)
- If spike FAILED: detailed failure mode + flag for re-planning (this branch is unlikely per RESEARCH MEDIUM-confidence; document as ALPHA-PATH-NOT-TAKEN)
</output>
</content>