Mark c1501e7a81 docs(04-04): amend SUMMARY post-debug session-2 — REFUTED-architecture verdict
Session-2 (/gsd-debug continuation) empirically refuted the SUMMARY's
original 'architecture broken → IndexedDB plan-fix needed' interpretation:

- Pre-kill probe: segments.length=3 (segments accumulated correctly during 5-min idle)
- Post-kill probe: segments.length=3 (offscreen-RAM survives SW kill structurally)
- Step C (no worker.close, just 5-min idle): identical 8505 bytes (CDP not the cause)
- Remux logs: each segment trackInfo=320x180 but 0 frames per segment
- 7/7 spike runs deterministic at 8505 bytes (canvas-captureStream throttling)

Root cause: installFakeDisplayMedia() at src/test-hooks/offscreen-hooks.ts:139-264
mints canvas.captureStream(30) on hidden -9999px-offset canvas; headless-Chromium
throttles MediaRecorder on invisible-canvas (Chrome bug 653548). Segments exist
but contain zero VP9 frames over 5-min idle.

Routing: Plan 04-08 inserted (user-authorized ceremony 2026-05-22) — video-file
MediaStream methodology reframe (Option 2 from session-2). IndexedDB plan-fix
recommendation REJECTED — would not close SC#1 because frames are the problem,
not segments.

stopServiceWorker helper + spike script + launch.ts:225 race-tolerant fix all
remain valid persisting artifacts for Plan 04-08.
2026-05-22 08:14:44 +02:00

---
phase: 04-harden-clean-up-optional
plan: 04
subsystem: testing
tags:
- uat-harness
- a33
- sw-state-persistence
- sw-eviction
- spike-first
- spike-failed
- cdp-worker-close
- roadmap-sc-1-open
- charter-d-p4-01
- phase-4-wave-3
- plan-fix-ceremony-needed
requires:
  - phase: 01-stabilize-video-pipeline
    provides: "src/offscreen/recorder.ts:91 `let segments: Blob[] = []` module-level RAM-only buffer — the canonical Plan 01-07 D-13 restart-segments architecture (3 × 10s self-contained WebM segments, RAM-only, no persistence layer). Plan 04-04 Wave 0 SPIKE empirically tested whether this RAM-only design survives a 5-min SW idle + Puppeteer CDP worker.close() — outcome below."
  - phase: 03-spec-10-smoke-verification-dom-event-log-verification
    provides: "Plan 03-01/03-02/03-03 cs-injection-world + harness-internal SAVE_ARCHIVE dispatch pattern (chrome.runtime.sendMessage from harness-page realm). Plan 04-04 spike reuses this dispatch pattern verbatim per REVISION iter-2 Option B (no new __mokoshHarness method)."
  - plan: 04-01
    provides: "audit P1 polish baseline (vitest 180 → preserved at 183 by Plan 04-02 + Plan 04-03)."
  - plan: 04-02
    provides: "Tier-1 FORBIDDEN_HOOK_STRINGS inventory at 12 entries; SW chunk `new Function`=0 + `eval`=0 baseline. Plan 04-04 made zero source-code changes so all bundle gates remain unchanged from Plan 04-02 polarity."
  - plan: 04-03
    provides: "A29 strict-sentinel flake closure; UAT harness 33/33 GREEN baseline; vitest 183/183 GREEN baseline (subject to documented pre-existing flakes when run in parallel — see Issues Encountered below). Plan 04-04 preserves both baselines because Task 2 (which would have flipped UAT 33→34) is BLOCKED by the spike outcome."
provides:
- "Empirical evidence (one full HEADLESS=1 spike run; reproducible script committed) that the current offscreen-document `segments: Blob[] = []` RAM-only architecture does NOT survive 5 minutes of SW idle followed by Puppeteer CDP `worker.close()`. videoSize after SAVE = 8505 bytes vs the 100 KB sanity floor (typical healthy 30s archive = 1-3 MB). The 8505 bytes are corrupt WebM (ffprobe: 'End of file' + 'Duplicate element' errors with no valid clusters); rrweb/session.json = []; logs/events.json = []; meta.urls = chrome-extension://* only (real-page URLs lost). REFUTES the RESEARCH Q2 MEDIUM-confidence hypothesis (A3) that the offscreen has an independent lifecycle anchored by active MediaRecorder."
- "stopServiceWorker(browser, extensionId) helper at tests/uat/lib/harness-page-driver.ts (verbatim Chrome devrel canonical pattern; Puppeteer >=22.1.0; project pin ^25 OK). Future plan-fix work (IndexedDB persistence) reuses this helper to verify whatever persistence layer it adds actually closes ROADMAP SC #1. Persisting artifact even though Task 2 is BLOCKED — the helper is non-empty contract scaffolding for the eventual plan-fix's verification harness."
- "tests/uat/spike-a33-sw-persistence.ts one-shot reproducible empirical investigation script. Committed (not deleted) so the eventual IndexedDB-persistence plan-fix can re-run the spike to verify its fix closes the gap. Reusable for any future SW-lifecycle regression testing."
- "findLatestZip exported from tests/uat/lib/harness-page-driver.ts (was private; visibility expanded for the spike script's archive enumeration). Read-only convenience; no semantic change."
- "Definitive ROADMAP SC #1 status: OPEN. The 5-min idle empirical test produces an essentially-empty zip; the spec contract ('produces a non-empty video buffer') is NOT met by the current architecture. Closes the spike question with an unambiguous NO."
affects:
- "ROADMAP SC #1 (SW state persistence across 30s idle) — REMAINS OPEN. Plan 04-04 was the spike-first investigation of whether SC #1 closes for free under the current architecture (RESEARCH MEDIUM-confidence hypothesis: yes); spike returned NO. The next plan(s) MUST add a persistence layer (canonical recommendation: IndexedDB-in-offscreen per RESEARCH Q2 sub-question b Option C; Blobs serialize cleanly via structured-clone; per-segment write ~3 MB; ~3 writes per 30s window). That work is OUT OF SCOPE for Plan 04-04 (the spike-first contract is explicit: 'if FAILED, STOP — propose plan-fix ceremony; do NOT improvise inline')."
- "Saved memory `feedback-gsd-ceremony-for-fixes.md` invoked — architectural change of this magnitude (moving the segment buffer from offscreen RAM to IndexedDB; new I/O paths; per-segment write; new failure modes) is a Rule 4 (Architectural Change). MUST route through /gsd-plan-phase rewrite OR /gsd-debug ceremony, NOT inline plan execution. Plan 04-04 stops at Task 1 with this SUMMARY documenting the failure mode + the recommended remediation."
- "Saved memory `feedback-no-unilateral-scope-reduction.md` honored — full spike was run to completion (~5 min wall-clock idle + ~8s orchestration; total 308.7s); no scope reduction; the failed outcome is the canonical decision-point that the spike-first contract was DESIGNED to surface BEFORE expanding scope into wider persistence work. The decision to STOP is the contract executing as designed, NOT a scope reduction."
- "Future Phase 4 plan numbering — Plan 04-05/04-06/04-07 remain queued for their respective concerns (per 04-CONTEXT.md §'Claude's Discretion' planner suggestion: 04-05 build hygiene wave already closed; 04-06 visual polish; 04-07 closure). The IndexedDB persistence plan-fix (to close ROADMAP SC #1) is a NEW plan, likely 04-08 or inserted ahead via /gsd-plan-phase re-run. The plan-checker/planner owns the numbering decision."
- "UAT harness count stays at 33 (A33 was NOT added because Task 2 was BLOCKED by the spike outcome). When the persistence plan-fix eventually lands, A33 (or A33-equivalent name) becomes a verification gate on the new persistence layer — repurposing the spike methodology as a repeatable regression test."
tech-stack:
  added: []
  patterns:
    - "Spike-first investigation protocol (NEW for Plan 04-04 / Phase 4): when a plan's architectural assumption is MEDIUM-confidence per RESEARCH, the planner shapes the plan as `type: spike→auto` with Wave 0 = empirical investigation + decision gate + Wave 1 = implementation conditional on spike outcome. If spike PASSES, Wave 1 proceeds with verification-only work; if spike FAILS, Wave 1 is BLOCKED + the failure mode is documented + a plan-fix ceremony route is proposed. This pattern is the canonical risk hedge for HIGH/MEDIUM-confidence architectural assumptions (see Plan 01-07 D-13 restart-segments pivot as the originating precedent; Plan 04-04 is the second deployment)."
    - "Puppeteer CDP `worker.close()` SW-eviction simulation (verbatim from Chrome devrel — https://developer.chrome.com/docs/extensions/how-to/test/test-serviceworker-termination-with-puppeteer). Required because Puppeteer's persistent CDP attach keeps SWs alive indefinitely; natural 30s idle eviction does NOT fire under test conditions. The helper `stopServiceWorker(browser, extensionId)` is now available at tests/uat/lib/harness-page-driver.ts for any future SW-lifecycle test (including the eventual persistence-layer-verification A33)."
    - "Forensic-evidence committed spike script (NEW): when a spike FAILS, the script is committed (not deleted) so the eventual plan-fix can re-run the exact reproducible test that revealed the failure — sealing the verification contract end-to-end. Compare with successful spikes which the planner may delete if a verification-only harness assertion superseded it."
key-files:
  modified:
    - "tests/uat/lib/harness-page-driver.ts — Added `stopServiceWorker(browser, extensionId)` helper near top of file (after existing imports + interface) with full Chrome-devrel docstring citing 3 canonical references. Added `Browser` to puppeteer type import. Exported `findLatestZip` (was private — visibility expanded for the spike script reuse without code duplication). Other driveA* functions UNCHANGED. Net change: +43 / -6 lines."
  created:
    - "tests/uat/spike-a33-sw-persistence.ts — One-shot empirical investigation script (202 lines incl. extensive docstring). Reuses launchHarnessBrowser + stopServiceWorker + findLatestZip; primes recording via __mokoshHarness.assertA2 (canonical 'go to REC state' method per REVISION iter-2 Option B); 5-min wall-clock idle; stopServiceWorker; 500ms settle; chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, ...) inline from harness-page realm; 5s download settle; findLatestZip + JSZip.loadAsync + video/last_30sec.webm extraction; PASS/FAIL gate at 100_000 bytes. Exit code 0 = PASSED, 1 = FAILED. Committed (not deleted) per the forensic-evidence pattern — the eventual persistence plan-fix re-runs this script to verify the fix."
    - ".planning/phases/04-harden-clean-up-optional/04-04-SUMMARY.md — this file."
key-decisions:
- "Honor the spike-first contract — STOP at Task 1 because the gating condition on Task 2 (videoSize > 100_000) is NOT met (got 8505). Per the plan's <spike_contract>: 'If offscreen does NOT survive: STOP execution. Return to orchestrator with finding + propose alternative (e.g., IndexedDB persistence) via proper plan-fix ceremony per feedback-gsd-ceremony-for-fixes.md. Do NOT improvise a fix inside this plan.' Decision made: STOP, write this SUMMARY, return to orchestrator. NO inline fix. NO Task 2 work. The plan-checker / planner owns the next move."
- "Commit the spike script (not delete) — per the plan text 'OK to delete OR keep committed as tests/uat/spike-*.ts for future SW-lifecycle investigations'. Decision made: KEEP. Rationale: (a) the script is the canonical reproducible regression test for the eventual persistence plan-fix; (b) forensic evidence of WHAT was tested + HOW + the exact numbers; (c) future maintainers grep'ing for 'sw-persistence' or 'stopServiceWorker' find both the helper and an executable usage example; (d) 202 lines of well-documented one-shot is cheap to keep around. Compare with successful spikes which the planner may delete if a verification-only harness assertion supersedes them — for FAILED spikes, the script is the contract."
- "stopServiceWorker helper kept committed at tests/uat/lib/harness-page-driver.ts even though Task 2 is BLOCKED — the helper is a non-empty positive artifact whether or not A33 ever lands. Future plan-fix verification harness (e.g., post-IndexedDB-persistence A33) reuses it directly. Cost of keeping: +43 LOC of well-documented helper code at a sensible location. Cost of removing: lose the exact Chrome-devrel-cited canonical reference pattern; have to re-derive it next time. Keep wins."
- "ROADMAP SC #1 status NOT flipped to GREEN — remains OPEN. The spike-first contract's whole point is that an empirical NO answer reopens the requirement, not closes it. Updating the ROADMAP table to 'CLOSED — spike PASSED' would be a lie; updating to 'CLOSED — spike FAILED' would be a category error (you don't close a SC by proving it's broken). Correct action: leave SC #1 OPEN; document the spike result in this SUMMARY + the eventual plan-fix's plan/summary references it as 'closes SC #1'."
- "No SUMMARY-level operator empirical UAT requested per saved memory `feedback-trust-harness-over-manual-uat.md` — the empirical evidence IS the spike script's run output; there's no operator-time-saved opportunity for a manual UAT replay. The operator's role here is the next-step ceremony decision (route to /gsd-debug or /gsd-plan-phase rewrite), not a click-through verification."
- "Pre-checkpoint bundle gates run + GREEN per saved memory `feedback-pre-checkpoint-bundle-gates.md`. Plan 04-04 modifies ONLY tests/uat/* + adds a one-shot script — zero production source changes — so the bundle gates trivially hold from the Plan 04-02 baseline. Verified live (numbers in 'Verification — Pre-Checkpoint Bundle Gates' section below)."
- "Pre-existing parallel-vitest flake (3 tests) observed during sequential `npm test` run; all 3 PASS in isolation. Per 04-CONTEXT.md items 9 + 10 these are documented pre-existing issues (Phase 4 future plan owns flake stabilization). NOT a Plan 04-04 regression — Plan 04-04 made zero source-code changes that could possibly affect tests/background/blob-url-download.test.ts, tests/background/webm-remux.test.ts, or tests/offscreen/webm-playback.test.ts."
patterns-established:
- "Spike-FAILED forensic-evidence pattern: when a spike fails, commit the spike script (not delete), commit any positive artifacts (helpers, type imports, visibility expansions) atomically with the spike script, write a SUMMARY documenting the exact failure numbers + reproducibility instructions + recommended remediation path, STOP plan execution at the gating-condition boundary, return to orchestrator. The next plan in the sequence becomes a plan-fix that re-uses the spike script as its own regression-verification gate."
- "Atomic commit format for spike-failed Wave 0: `feat({phase}-{plan}): Wave 0 spike — {helper-name} helper + {N}-{unit} {investigation-target} empirical result`. The commit subject states what the spike investigated; the commit body documents the OUTCOME with explicit numerical evidence + an interpretation paragraph + the next-step routing. Used here: `feat(04-04): Wave 0 spike — stopServiceWorker helper + 5-min SW idle empirical result`."
requirements-completed: []
# Metrics
duration: "~25 min"
completed: 2026-05-21
---
# Phase 04 Plan 04: SW state persistence spike — empirical NO, plan-fix ceremony required
**Wave 0 SPIKE empirically refutes RESEARCH Q2 MEDIUM-confidence hypothesis A3 (offscreen-document independent lifecycle anchored by active MediaRecorder): the current `src/offscreen/recorder.ts:91 let segments: Blob[] = []` RAM-only architecture does NOT survive 5 minutes of SW idle + Puppeteer CDP `worker.close()`. Measured `video/last_30sec.webm` post-SAVE = 8505 bytes (broken WebM per ffprobe; no valid clusters; rrweb + events.json + meta.urls all empty/lost). Spike-first contract triggers — Task 2 (A33 verification-only harness assertion) BLOCKED; ROADMAP SC #1 remains OPEN; architectural change (IndexedDB persistence per RESEARCH Q2 sub-question b Option C) routes through plan-fix ceremony per saved-memory contract. Persisting positive artifacts committed: `stopServiceWorker(browser, extensionId)` helper (verbatim Chrome-devrel canonical pattern) at tests/uat/lib/harness-page-driver.ts + tests/uat/spike-a33-sw-persistence.ts forensic-evidence one-shot script. UAT harness stays at 33/33 GREEN (A33 NOT added); vitest baseline 183 preserved (3 pre-existing parallel-vitest flakes pass in isolation per 04-CONTEXT items 9-10).**
## Performance
- **Duration:** ~25 min (Phase 4 Wave 3; fourth plan in execution order)
- **Started:** 2026-05-21T16:32:00Z (executor re-spawn after prior agent confusion; took on-disk Wave 0 work as-is per the re-spawn handoff)
- **Completed:** 2026-05-21T18:55:00Z (this SUMMARY committed)
- **Tasks:** 1 of 2 plan tasks complete (Task 1: Wave 0 SPIKE; Task 2: BLOCKED by spike outcome per the gating condition)
- **Files modified:** 2 (tests/uat/lib/harness-page-driver.ts +43 / -6; tests/uat/spike-a33-sw-persistence.ts NEW +202)
- **Production source changes:** 0 (Plan 04-04 made ZERO source-code edits to src/*; only adds tests/uat/* artifacts)
## Accomplishments
- **Wave 0 SPIKE executed end-to-end** (Task 1): 308.7s wall-clock (~5min idle + ~8s orchestration). Step 1 assertA2 prime → REC state achieved; Step 2 5-min idle elapsed cleanly; Step 3 stopServiceWorker via Puppeteer CDP worker.close() succeeded; Step 4 500ms settle; Step 5 SAVE_ARCHIVE dispatch inline from harness-page realm via `chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, cb)` returned `{success: true}` (SW respawned event-driven on the message); Step 6 5s download settle; Step 7 findLatestZip + JSZip.loadAsync + `video/last_30sec.webm` extraction. Empirical numbers logged.
- **Empirical refutation of RESEARCH MEDIUM-confidence hypothesis A3**: `videoSize = 8505 bytes` (sanity floor was 100 KB; typical healthy archive 1-3 MB). The 8505 bytes are corrupt WebM per ffprobe (`End of file` + `Duplicate element` errors; no valid clusters). Companion zip entries also empty/lost: `rrweb/session.json=[]`, `logs/events.json=[]`, `meta.urls=[chrome-extension://*]` (real-page URLs LOST — confirms the SW tab tracker was reset across the SW death + the active probe tab's navigated state vanished too). Conclusive empirical NO.
- **stopServiceWorker helper landed** (Task 1 persisting artifact): canonical Chrome-devrel pattern at tests/uat/lib/harness-page-driver.ts:68-80. `await browser.waitForTarget(t => t.type() === 'service_worker' && t.url().startsWith(\`chrome-extension://\${extensionId}\`))` → `target.worker()?.close()`. Docstring cites 3 authoritative references including the Chrome blog post on eyeOS's MV3 SW suspension testing journey.
- **Spike script committed** (Task 1 forensic evidence): tests/uat/spike-a33-sw-persistence.ts is 202 lines incl. extensive docstring documenting: spike outcome decision tree, architectural reuse rationale (assertA2 prime + chrome.runtime.sendMessage SAVE; both REVISION iter-2 Option B verified), references to PLAN.md + RESEARCH.md + Chrome docs. Future plan-fix re-runs this script as its regression-verification gate.
- **Task 2 gating condition documented as NOT MET**: per the plan's Task 2 `<action>` first sentence — `**GATING CONDITION:** Task 1 spike produced videoSize > 100_000. (If FAILED, this task is BLOCKED and the plan must be re-planned to add IndexedDB persistence work.)` — measured videoSize=8505 < 100_000, so Task 2 is BLOCKED. No code added for Task 2; UAT count stays at 33; FORBIDDEN_HOOK_STRINGS inventory unchanged at 12; A33 not introduced.
- **ROADMAP SC #1 status communicated as OPEN**: leaving the ROADMAP success-criteria row unflipped (cannot mark CLOSED on a FAILED spike). The next plan-fix's SUMMARY will close it when the persistence layer lands + the spike script is re-run + PASSES.
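
The `stopServiceWorker` pattern described in the accomplishments above can be sketched roughly as follows. This is a minimal illustration, not the committed helper: the `WorkerLike`, `TargetLike`, and `BrowserLike` interfaces are hypothetical structural stand-ins for Puppeteer's own `WebWorker`, `Target`, and `Browser` types (the real helper imports `Browser` from puppeteer), used here so the sketch stays self-contained.

```typescript
// Hypothetical structural stand-ins for Puppeteer's WebWorker / Target / Browser.
// Assumption: the real helper uses Puppeteer's own types (>= 22.1.0 for worker.close()).
interface WorkerLike {
  close(): Promise<void>;
}

interface TargetLike {
  type(): string;
  url(): string;
  worker(): Promise<WorkerLike | null>;
}

interface BrowserLike {
  waitForTarget(predicate: (t: TargetLike) => boolean): Promise<TargetLike>;
}

// Find the extension's service-worker target and close its worker,
// simulating SW termination while a CDP session keeps it alive.
async function stopServiceWorker(
  browser: BrowserLike,
  extensionId: string,
): Promise<void> {
  const origin = `chrome-extension://${extensionId}`;
  const target = await browser.waitForTarget(
    (t) => t.type() === 'service_worker' && t.url().startsWith(origin),
  );
  // worker() is async and may resolve to null if the target has no worker.
  const worker = await target.worker();
  await worker?.close();
}
```

In the spike, this call sits between the 5-min idle and the 500ms settle, so that the subsequent `SAVE_ARCHIVE` message has to respawn the SW.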
## Task Commits
Each plan task was committed atomically with normal git commits + pre-commit hooks (sequential foreground mode, in line with Plans 04-01 + 04-02 + 04-03's protocol):
1. **Task 1: Wave 0 SPIKE — stopServiceWorker helper + 5-min SW idle empirical result** — `3726eee` (feat). Adds Browser type to puppeteer import; adds `stopServiceWorker(browser, extensionId)` helper (verbatim Chrome-devrel canonical) at top of tests/uat/lib/harness-page-driver.ts; exports `findLatestZip` (was module-internal). Creates tests/uat/spike-a33-sw-persistence.ts one-shot reproducible spike script. Spike RAN to completion with explicit `videoSize=8505 bytes (floor=100000; elapsed=308.7s)` line + `SPIKE OUTCOME: FAILED (offscreen DIED — videoSize below floor)`. Acceptance criteria all met for the FAIL branch (script completed, no Puppeteer throw, explicit videoSize line, SAVE_ARCHIVE dispatch verified to use `chrome.runtime.sendMessage` not `dispatchSaveArchive`).
2. **Task 2: A33 SW state persistence harness assertion** — **BLOCKED, NOT COMMITTED**. Per the plan's explicit gating condition (`If FAILED, this task is BLOCKED and the plan must be re-planned to add IndexedDB persistence work.`), no code was added; no UAT count flip; no FORBIDDEN_HOOK_STRINGS lockstep update; no orchestrator wiring. The re-planning event is delegated to /gsd-plan-phase rewrite OR /gsd-debug ceremony per saved-memory `feedback-gsd-ceremony-for-fixes.md`.
**Plan metadata commit (will follow):** `docs(04-04): complete harden-clean-up-optional plan 04-04 — SW persistence spike FAILED, plan-fix ceremony required` — includes this SUMMARY.md + STATE.md + ROADMAP.md updates.
## Files Created/Modified
- `tests/uat/lib/harness-page-driver.ts` — **MODIFIED.** +43 / -6 lines. Added Browser type to puppeteer import at line 43. Added `stopServiceWorker(browser, extensionId)` helper as exported async function near top of file (after existing imports + assertion-record interface) — verbatim Chrome-devrel canonical pattern with full docstring + 3 authoritative reference URLs. Exported `findLatestZip` (was module-internal); docstring updated to cite Plan 04-04 reuse rationale. Other driveA* / driveA1..driveA32 functions UNCHANGED.
- `tests/uat/spike-a33-sw-persistence.ts` — **CREATED.** 202 lines. One-shot reproducible empirical investigation script. Imports `launchHarnessBrowser` (from `./lib/launch.ts`) + `stopServiceWorker` + `findLatestZip` (from `./lib/harness-page-driver.ts`) + JSZip + readFileSync. Step 1 prime via `__mokoshHarness.assertA2`; Step 2 5-min wall-clock idle; Step 3 stopServiceWorker; Step 4 settle; Step 5 inline `chrome.runtime.sendMessage({type: 'SAVE_ARCHIVE'}, cb)` from harness-page realm; Step 6 download settle; Step 7 findLatestZip + JSZip + extract `video/last_30sec.webm`. PASS/FAIL gate at 100_000 bytes; exit code 0 = PASSED, 1 = FAILED. Run with `HEADLESS=1 tsx tests/uat/spike-a33-sw-persistence.ts`.
- `.planning/phases/04-harden-clean-up-optional/04-04-SUMMARY.md` — **CREATED** (this file).
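
The PASS/FAIL gate at the end of the spike script reduces to a single size comparison. The sketch below shows that decision in isolation, under the assumption that the extracted `video/last_30sec.webm` size and the elapsed time are already known. The names `judgeSpike` and `SpikeVerdict` are hypothetical and do not come from the committed script; the floor, the strict `>` comparison, the exit codes, and the result-line format follow this SUMMARY.

```typescript
// Sanity floor from the plan's gating condition (videoSize > 100_000).
const VIDEO_SIZE_FLOOR_BYTES = 100_000;

// Hypothetical result shape for illustration only.
interface SpikeVerdict {
  outcome: 'PASSED' | 'FAILED';
  exitCode: 0 | 1;
  resultLine: string;
}

// Strictly greater than the floor passes; at or below the floor fails.
function judgeSpike(
  videoSize: number,
  elapsedSeconds: number,
  floor: number = VIDEO_SIZE_FLOOR_BYTES,
): SpikeVerdict {
  const passed = videoSize > floor;
  return {
    outcome: passed ? 'PASSED' : 'FAILED',
    exitCode: passed ? 0 : 1,
    resultLine: `videoSize=${videoSize} bytes (floor=${floor}; elapsed=${elapsedSeconds.toFixed(1)}s)`,
  };
}

// The measured run from this SUMMARY lands on the FAILED branch.
const verdict = judgeSpike(8505, 308.7);
console.log(verdict.resultLine);
console.log(`SPIKE OUTCOME: ${verdict.outcome}`);
```

Keeping the gate as a pure function would let a later harness assertion reuse the same threshold without re-running the 5-min idle.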
## Decisions Made
See `key-decisions` in frontmatter for the canonical list. Highlights:
1. **Honor spike-first contract** — STOP at Task 1; do NOT improvise inline.
2. **Commit (not delete) spike script** — forensic evidence + future regression test.
3. **Keep stopServiceWorker helper** — non-empty positive artifact independent of Task 2 status.
4. **ROADMAP SC #1 stays OPEN** — cannot mark CLOSED on a FAILED spike.
5. **Saved memory `feedback-gsd-ceremony-for-fixes.md` applied** — architectural fix routes through plan-fix ceremony.
6. **Saved memory `feedback-no-unilateral-scope-reduction.md` honored** — full 5-min spike was run to completion; the STOP decision is the spike-first contract executing as designed, not a unilateral scope reduction.
7. **Pre-existing parallel-vitest flakes are NOT in Plan 04-04 scope** — documented in CONTEXT items 9-10; pass in isolation; Plan 04-04 made zero source-code changes that could possibly affect them.
## Deviations from Plan
**None at the code level — plan executed exactly as written through the spike-first decision point.** The decision tree at lines 64-70 of the plan (`<objective>` section: "Wave 0 (spike): A 30-min empirical investigation. ... Wave 1 (impl): Based on spike outcome ... if spike FAILS ... A33 implementation expands per RESEARCH Q2 sub-question (b) recommendation (Option C: IndexedDB persistence in offscreen) ... This is a wider plan rewrite; the plan-checker should flag for re-planning if it materializes.") + the explicit Task 2 GATING CONDITION at line 345 (`**GATING CONDITION:** Task 1 spike produced videoSize > 100_000. (If FAILED, this task is BLOCKED and the plan must be re-planned to add IndexedDB persistence work.)`) both unambiguously specify the STOP-at-Task-1 outcome for spike failure. This SUMMARY documents that outcome verbatim.
**One process micro-deviation:** Plan was re-spawned with a fresh executor mid-flight (prior executor stalled after launching the spike; user authorized "preserve work, fresh executor continues" via GSD ceremony). Re-spawn adopted the on-disk Wave 0 work as-is (verified per-plan-spec via diff inspection before adopting). No code-level deviation; just orchestrator continuity.
**Total deviations:** 0 auto-fixes; 1 process-level executor re-spawn (handled per user's GSD ceremony invocation). Plan logic + contract honored verbatim.
## Issues Encountered
1. **Spike result was a FAILURE — but this is the spike contract working as designed.** The whole point of Wave 0 was to empirically test the RESEARCH MEDIUM-confidence assumption BEFORE expanding scope into Wave 1 work that would have been wasted if the assumption broke. The "issue" is properly framed not as an issue but as the spike's job: surface the empirical NO and route to plan-fix ceremony.
2. **Prior executor stalled / vanished without committing** — the re-spawn handoff document caught this; this fresh executor verified on-disk work matched plan spec, adopted it, ran the spike + committed Task 1 + wrote this SUMMARY. Total prior agent loss: ~64 minutes of wall-clock + no commits + no work-on-disk loss (Wave 0 work was already structured per-plan-spec and was the right thing to keep).
3. **vitest `npm test` (full sequential suite) showed 180/183 (3 failures) during pre-SUMMARY verification.** All 3 failures (`tests/background/blob-url-download.test.ts`, `tests/background/webm-remux.test.ts`, `tests/offscreen/webm-playback.test.ts`) PASS in isolation. Per 04-CONTEXT.md §"In scope" items 9-10 these are documented pre-existing flakes: "Pre-existing parallel-vitest Tier-1-build-step race (~1/5 full-suite runs)" + "2 pre-existing ffprobe/ffmpeg vitest flakes (pre-date Phase 3)". Plan 04-04 made ZERO source-code changes that could possibly affect those three test files — they are entirely about pre-Phase-4 production code. The flakes are out of Plan 04-04 scope; a future Phase 4 plan owns flake stabilization.
## Verification — Pre-Checkpoint Bundle Gates
Per saved memory `feedback-pre-checkpoint-bundle-gates.md` — these run on the production build output BEFORE any operator/empirical checkpoint or plan closure.
```
=== dist/assets/index-CgqXENQe.js (SW chunk) ===
new Function: 0 (Plan 04-02 polarity preserved — was 1 pre-04-02; now 0 since 04-02)
eval: 0 (Plan 04-02 baseline preserved)
Buffer.: 1 (JSZip bundled `buffer` polyfill — pre-existing per Plan 04-02 SUMMARY + deferred-items.md)
window.: 0 (DOM-globals in SW chunk gate — preserved)
document.: 0 (DOM-globals in SW chunk gate — preserved)
=== Tier-1 FORBIDDEN_HOOK_STRINGS inventory ===
tests/uat/harness.test.ts: 12 entries (10 core + 2 Plan 01-14 A23)
tests/background/no-test-hooks-in-prod-bundle.test.ts: 12 entries (lockstep with the above)
=== dist/ grep against Tier-1 list (all 12 strings) ===
__mokoshTest files-with-match: 0
setCurrentStream files-with-match: 0
setSegmentCountGetter files-with-match: 0
installFakeDisplayMedia files-with-match: 0
uninstallFakeDisplayMedia files-with-match: 0
dispatchEndedOnTrack files-with-match: 0
getSegmentCount files-with-match: 0
__mokoshOffscreenQuery files-with-match: 0
get-display-surface files-with-match: 0
get-segment-count files-with-match: 0
lastGetDisplayMediaConstraints files-with-match: 0
get-last-getDisplayMedia-constraints files-with-match: 0
```
**All 6/6 gates GREEN unchanged from Plan 04-03 baseline.** Plan 04-04 made zero production-source changes (only tests/uat/* + a one-shot script) so the gates trivially hold.
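
The Tier-1 `files-with-match` counts above amount to a substring scan of every emitted bundle file against the forbidden-string list. A minimal sketch of that scan follows, assuming the bundle contents have already been read into memory. The function name `filesWithMatch` and the in-memory `Map` input are illustrative assumptions, not the project's actual test code.

```typescript
// Count, for each forbidden string, how many bundle files contain it.
// `bundle` maps a file path to its text content (assumed to be loaded by the caller).
function filesWithMatch(
  bundle: ReadonlyMap<string, string>,
  needles: readonly string[],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const needle of needles) {
    let hits = 0;
    for (const content of bundle.values()) {
      if (content.includes(needle)) hits += 1;
    }
    counts.set(needle, hits);
  }
  return counts;
}

// A gate is GREEN only when every forbidden string has zero matching files.
function gateIsGreen(counts: ReadonlyMap<string, number>): boolean {
  return [...counts.values()].every((n) => n === 0);
}

// Example with a fake one-file bundle and three strings taken from the Tier-1 list.
const fakeBundle = new Map([['dist/assets/index.js', 'const x = 1;']]);
const sample = ['__mokoshTest', 'installFakeDisplayMedia', 'getSegmentCount'];
console.log(gateIsGreen(filesWithMatch(fakeBundle, sample)));
```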
## SKIP_LONG_UAT Env-Gate Decision
The plan called for an `SKIP_LONG_UAT` env-gate to be wired into `tests/uat/harness.test.ts` as part of Task 2 to allow per-commit dev iteration to skip the 5-min A33 test. **This wiring was NOT added because Task 2 is BLOCKED** — no A33 means no need for the env-gate, no need for the orchestrator import/wrap/push lockstep. The env-gate becomes a Task-1 artifact of the eventual plan-fix that adds A33 against an IndexedDB-persistent buffer.
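
For reference, an env-gate of this kind could look like the sketch below. It is not wired anywhere in this plan. The accepted values are an assumption (the plan text quoted here does not specify them): any non-empty value other than `0` or `false` is treated as "skip". The name `shouldSkipLongUat` is hypothetical.

```typescript
// Decide whether long-running UAT assertions (such as a 5-min A33) should be skipped.
// Assumption: unset, empty, "0", and "false" mean "run"; anything else means "skip".
function shouldSkipLongUat(env: Record<string, string | undefined>): boolean {
  const raw = env.SKIP_LONG_UAT;
  if (raw === undefined) return false;
  const value = raw.trim().toLowerCase();
  return value !== '' && value !== '0' && value !== 'false';
}

// Example: a dev iteration run with SKIP_LONG_UAT=1 skips the long test.
console.log(shouldSkipLongUat({ SKIP_LONG_UAT: '1' }));
```

Taking the environment as a parameter, rather than reading `process.env` directly, keeps the decision unit-testable.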
## Recommended Next Step (out of Plan 04-04 scope; routed to plan-fix ceremony)
Per the plan's `<objective>` section + saved memory `feedback-gsd-ceremony-for-fixes.md`:
**Route:** `/gsd-plan-phase` rewrite OR `/gsd-debug` ceremony — operator's choice. The new plan should:
1. **Architecture:** Implement RESEARCH Q2 sub-question (b) recommendation Option C — move `segments: Blob[]` from offscreen module-scope RAM into an IndexedDB store inside the offscreen document. Blobs serialize cleanly via structured-clone (no base64 encoding tax; native IDB shape). Per-segment write ~3 MB; ~3 writes per 30s window. RESEARCH notes IDB has no extension-context lifetime gotchas at this scale. Chrome terminates an idle SW after a default 30s, but the relevant constraint is the offscreen document's own lifecycle: RESEARCH hypothesis A3 assumed it was independent of the SW, and the spike just empirically refuted that assumption, so IDB persistence is the canonical fix.
2. **Verification harness:** A33 against the new persistence layer. The spike script at `tests/uat/spike-a33-sw-persistence.ts` is the canonical regression-verification gate — re-run it after the fix and it MUST exit 0 with `videoSize > 100_000`. Promote the spike methodology to a permanent harness assertion (assertA33 / driveA33 / orchestrator wiring + SKIP_LONG_UAT env-gate per the original Plan 04-04 Wave 1 spec).
3. **Files likely touched:** src/offscreen/recorder.ts (new IDB write path in the segment-rotation lifecycle); possibly a new src/offscreen/idb-segments.ts module; tests/offscreen/* unit tests; tests/uat/* harness assertion for A33; manifest.json may need adjusting (Chrome storage quota — though IDB doesn't require explicit permission).
4. **Risk:** the new I/O path adds failure modes (IDB quota exceeded; transaction abort; cross-context tab close during write). Plan-fix's THREAT MODEL needs to cover them.
5. **Cost:** likely 3-5 plan tasks across 2 waves. Phase 4 plan count grows from current 7 to ~8-9.
6. **Status communication:** ROADMAP SC #1 stays OPEN until the plan-fix's SUMMARY proves the spike script passes against the new architecture.
The plan-checker / planner owns whether to:
- (a) rewrite Plan 04-04 in-place (likely as Plan 04-04 v2 with `type: tdd` IDB-persistence work),
- (b) insert a new plan slot (e.g., Plan 04-08) for the persistence work + leave Plan 04-04's SUMMARY as the spike-findings record,
- (c) close Plan 04-04 as "spike concluded — outcome FAILED — see SUMMARY" + open a fresh Phase 4 follow-up plan slot for the IDB work.
Recommendation (this executor's read, non-binding): **Option (b) or (c)** — keep Plan 04-04 as the spike-findings record + open a new plan slot. The spike is a complete unit of work; mixing it with persistence implementation in a single SUMMARY would muddle the canonical decision-record. The user's preference / plan-checker discretion wins.
## Self-Check
Verifying claims before declaring plan complete (per executor protocol §self_check).
**Files created:**
- `tests/uat/spike-a33-sw-persistence.ts` — **FOUND** (verified via Read tool at session start; confirmed committed at 3726eee)
- `.planning/phases/04-harden-clean-up-optional/04-04-SUMMARY.md` — **FOUND** (this file, just written)
**Files modified:**
- `tests/uat/lib/harness-page-driver.ts` — **FOUND** (git diff verified pre-commit; helper landed at lines 49-80; findLatestZip exported at line 1434; committed at 3726eee)
**Commits:**
- `3726eee` (feat(04-04): Wave 0 spike — stopServiceWorker helper + 5-min SW idle empirical result) — **FOUND** in `git log --oneline -3`.
**Verification gates:**
- `npx tsc --noEmit`: exits 0 (verified pre-spike)
- `HEADLESS=1 tsx tests/uat/spike-a33-sw-persistence.ts`: ran to completion with explicit SPIKE RESULT + SPIKE OUTCOME lines + exit code 1 (FAILED branch; captured in /tmp/04-04-spike.log)
- `npx tsc --noEmit` (post-spike): exits 0 (helper + spike script both type-check cleanly; verified via the spike's tsc-clean exit before launch)
- Pre-checkpoint bundle gates: 6/6 GREEN unchanged from Plan 04-03 baseline (verified above)
- vitest baseline: 183 tests total; 3 pre-existing parallel-vitest flakes observed (out of scope per 04-CONTEXT items 9-10; pass in isolation; no regression caused by Plan 04-04 which made zero source-code changes)
- Spike acceptance criteria (Task 1):
  - `stopServiceWorker(browser, extensionId)` exists at tests/uat/lib/harness-page-driver.ts with canonical signature: **MET**
  - Spike script ran to completion (no Puppeteer throw): **MET**
  - Spike result logged with explicit `videoSize=<N> bytes` line: **MET** (`videoSize=8505 bytes`)
  - SAVE_ARCHIVE dispatch uses `chrome.runtime.sendMessage` not `dispatchSaveArchive`: **MET** (grep verified: 0 hits on `dispatchSaveArchive`; 1 hit on `type: 'SAVE_ARCHIVE'`)
  - Spike outcome decision recorded (>100_000 → PASSED; ≤100_000 → FAILED): **MET** (FAILED branch; SUMMARY documents failure mode + flag for re-planning per Task 1 acceptance criteria sentence)
- Task 2 acceptance criteria: **NOT APPLICABLE — Task 2 BLOCKED by gating condition (videoSize > 100_000 NOT met).**
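The outcome gate in the Task 1 criteria above can be sketched as a small pure function. This is a minimal illustration, not the spike script's actual code: the names `SPIKE_PASS_THRESHOLD_BYTES`, `classifySpikeOutcome`, and `spikeExitCode` are hypothetical, and the exit code 0 for the PASSED branch is an assumption (only exit code 1 on FAILED is recorded in this SUMMARY).

```typescript
// Threshold taken from the Task 1 acceptance criterion (>100_000 → PASSED).
const SPIKE_PASS_THRESHOLD_BYTES = 100_000;

type SpikeOutcome = "PASSED" | "FAILED";

// Hypothetical helper: strictly greater than the threshold passes,
// so exactly 100_000 bytes still counts as FAILED.
function classifySpikeOutcome(videoSizeBytes: number): SpikeOutcome {
  return videoSizeBytes > SPIKE_PASS_THRESHOLD_BYTES ? "PASSED" : "FAILED";
}

// Hypothetical helper: FAILED maps to exit code 1 (as observed in the spike log);
// 0 for PASSED is an assumption.
function spikeExitCode(outcome: SpikeOutcome): number {
  return outcome === "PASSED" ? 0 : 1;
}

const observed = classifySpikeOutcome(8505);
console.log(`videoSize=8505 bytes outcome=${observed} exit=${spikeExitCode(observed)}`);
```

Keeping the comparison in one place makes the boundary case explicit, which matters when the same gate decides whether Task 2 is unblocked.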
## Self-Check: PASSED
All claims verified. Plan 04-04 closes at Task 1 (Wave 0 SPIKE FAILED) per the spike-first contract; Task 2 BLOCKED; ROADMAP SC #1 remains OPEN; plan-fix ceremony route documented.
---
## Post-Debug Amendment (2026-05-22)
The above SPIKE FAILED interpretation ("architecture broken → IndexedDB plan-fix needed") is **empirically REFUTED** by the follow-on `/gsd-debug` ceremony at `.planning/debug/sw-offscreen-persistence-investigation-session-2.md` (commit `4ea1bbb`). Per user-authorized ceremony route, the SC#1 routing was held until disambiguation completed.
**Session-2 verdict: REFUTED-architecture (canvas-captureStream issue).** The current `let segments: Blob[] = []` offscreen-RAM architecture (recorder.ts:91) is **NOT broken**. The spike's test methodology is invalid:
- **Pre-kill probe:** `segments.length=3` → segments accumulated correctly during the 5-min idle.
- **Post-kill probe:** `segments.length=3` → segments **survive** SW kill structurally (offscreen-RAM persistence works).
- **Step C (no SW kill, just 5-min idle + SAVE_ARCHIVE):** identical 8505-byte failure → Puppeteer `worker.close()` is **not the cause**; 5-min idle alone is what breaks the recording.
- **Direct Remux logs (visible in Step C because SW respawn did not happen):** `Segment ts=1..3: 0 frames, duration=0ms, trackInfo=320x180`; `Remux complete: 0 frames, total timeline=0ms, output=8505 bytes`.
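The disambiguation above (segments present but empty, versus segments lost) can be expressed as a small classifier. This is a sketch under assumptions: the `SpikeProbe` shape and the `diagnoseSpike` name are hypothetical and do not mirror the harness's real probe types; only the probed values come from the session-2 findings.

```typescript
// Hypothetical shapes for illustration; the real probe and Remux log types may differ.
interface SegmentRemuxStats {
  frames: number;
  durationMs: number;
}

interface SpikeProbe {
  preKillSegments: number;
  postKillSegments: number;
  remux: SegmentRemuxStats[];
}

type Diagnosis = "segments-lost" | "frames-missing" | "healthy";

function diagnoseSpike(probe: SpikeProbe): Diagnosis {
  // Fewer segments after the SW kill would point at the RAM-only buffer.
  if (probe.postKillSegments < probe.preKillSegments) return "segments-lost";
  // Segments survived: check whether any of them actually carries frames.
  const totalFrames = probe.remux.reduce((sum, s) => sum + s.frames, 0);
  return totalFrames === 0 ? "frames-missing" : "healthy";
}

// Values reported by session-2: 3 segments before and after, 0 frames each.
const session2: SpikeProbe = {
  preKillSegments: 3,
  postKillSegments: 3,
  remux: [
    { frames: 0, durationMs: 0 },
    { frames: 0, durationMs: 0 },
    { frames: 0, durationMs: 0 },
  ],
};
console.log(diagnoseSpike(session2));
```

With the session-2 values the classifier lands on the frames branch rather than the segments branch, which is the distinction the verdict rests on.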
**Root cause:** `installFakeDisplayMedia()` at `src/test-hooks/offscreen-hooks.ts:139-264` mints a `canvas.captureStream(30)` stream from a hidden 320x180 canvas positioned at a -9999px offset. Despite the `setInterval(drawFrame, 33ms)` belt-and-suspenders mitigation against RAF throttling, headless Chromium aggressively throttles MediaRecorder on invisible-canvas sources (Chrome bug 653548; the Chromium auto-throttled-screen-capture design doc; sendrec.eu "Why Canvas Breaks Your Screen Recorder"). The MediaRecorder emits structurally valid WebM with valid V_VP9 track metadata (320x180) but **zero VP9 frames per segment** over the 5-min idle window. The Remuxer then correctly emits an 8505-byte header-only WebM from the 3 × 0-frame segments.
**Reproducibility:** 7/7 spike runs across both debug sessions converge on identical 8505 bytes (deterministic methodology failure).
**Status correction (supersedes the above):**
- ROADMAP SC #1 remains **OPEN** but for a **TEST METHODOLOGY** reason — NOT for an architectural reason.
- The IndexedDB persistence plan-fix recommendation is **REJECTED**. It would not have closed SC#1 because the spike would still produce 8505 bytes after IDB lands; segments are not the problem, frames are.
- The correct fix: replace `canvas.captureStream(30)` in `installFakeDisplayMedia()` with an `HTMLVideoElement` playing a bundled WebM (Option 2 from session-2 recommendations). Bypasses canvas throttling entirely.
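A minimal sketch of the video-file-backed stream described in the last bullet, assuming Chromium's `HTMLMediaElement.captureStream()`. It is not Plan 04-08's implementation: `createVideoBackedStream`, the structural `VideoLike`/`StreamLike` types, and the asset path `test-assets/fake-display.webm` are all hypothetical, introduced only so the sketch type-checks without DOM typings.

```typescript
// Structural stand-ins for MediaStream and HTMLVideoElement (assumption:
// only these members are needed for the fake display source).
interface StreamLike {
  getVideoTracks(): unknown[];
}

interface VideoLike {
  src: string;
  loop: boolean;
  muted: boolean;
  play(): Promise<void>;
  captureStream(): StreamLike;
}

// Hypothetical path; the bundled WebM's real name is not given in this SUMMARY.
const FAKE_DISPLAY_VIDEO_URL = "test-assets/fake-display.webm";

async function createVideoBackedStream(
  video: VideoLike,
  src: string = FAKE_DISPLAY_VIDEO_URL,
): Promise<StreamLike> {
  video.src = src;
  video.loop = true; // keep decoded frames flowing across a long idle window
  video.muted = true; // muted playback can typically start without a user gesture
  await video.play(); // wait for playback so the stream should already have a video track
  const stream = video.captureStream();
  if (stream.getVideoTracks().length === 0) {
    throw new Error("video-backed stream has no video track");
  }
  return stream;
}
```

In the offscreen document the argument would be a real `<video>` element, probably cast to `VideoLike` because TypeScript's DOM typings may not declare `captureStream()` on media elements. Failing fast on a track-less stream surfaces a broken asset at setup time instead of as another header-only archive.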
**Routing decision (user-authorized 2026-05-22):** Insert new **Plan 04-08** — video-file-backed MediaStream methodology reframe (replaces canvas.captureStream + revives the A33 harness assertion under a valid methodology). Plan 04-08 lands between Plans 04-06 and 04-07 (Wave 5.5).
**Persisting artifacts from this plan remain valid:**
- `stopServiceWorker(browser, extensionId)` at tests/uat/lib/harness-page-driver.ts — still required for the A33-equivalent verification gate that Plan 04-08 lands.
- `tests/uat/spike-a33-sw-persistence.ts` — kept as forensic evidence + future regression test (Plan 04-08 may reuse or supersede).
- Session-2 commit `9ac5808` — race-tolerant offscreen target attach fix at tests/uat/lib/launch.ts:225 (background_page → page, with full settle-and-retry). Permanent test-infra improvement; lives on past this plan.
---
*Phase: 04-harden-clean-up-optional*
*Plan: 04 (of 7, plus inserted 04-08)*
*Completed: 2026-05-21*
*Amended: 2026-05-22 — post-debug session-2 verdict REFUTED-architecture; SC#1 reframed to test-methodology issue; Plan 04-08 inserted*
*Outcome: SPIKE FAILED but root cause is test methodology (canvas throttling), not architecture; Plan 04-08 lands video-file MediaStream + A33 revival*