diff --git a/.planning/phases/01-stabilize-video-pipeline/01-UAT.md b/.planning/phases/01-stabilize-video-pipeline/01-UAT.md new file mode 100644 index 0000000..645fb56 --- /dev/null +++ b/.planning/phases/01-stabilize-video-pipeline/01-UAT.md @@ -0,0 +1,269 @@ +--- +status: partial +phase: 01-stabilize-video-pipeline +source: + - 01-01-SUMMARY.md + - 01-02-SUMMARY.md + - 01-03-SUMMARY.md + - 01-04-SUMMARY.md + - 01-05-SUMMARY.md + - 01-06-SUMMARY.md + - 01-07-SUMMARY.md +verifier_residue: 01-VERIFICATION.md (status: human_needed, OPR-1/2/3) +started: 2026-05-16T11:14:00Z +updated: 2026-05-16T11:58:00Z +paused_for: /gsd-debug investigation of Test 3 port-reconnect uncaught-error finding +--- + +## Current Test + +[testing paused — 1 issue + 1 blocked outstanding; user routed to /gsd-debug + immediately after Test 3 surfaced 3× Uncaught Error post-reconnect. Resume + Test 3 save+play AND Test 4 SW Force-Stop after debug session lands a fix.] + +## Tests + +### 1. Cold Start Smoke Test +expected: | + Run `KEEP_PROFILE=0 ./smoke.sh` from a clean state. Chrome opens with a + fresh `/tmp/mokosh-smoke-profile`. You manually Load Unpacked the `dist/` + directory. Extension icon appears in the toolbar. No red errors in the SW + console. SW console shows `[Background] Service Worker initialized` and + an `OFFSCREEN_READY` handshake message. No "Receiving end does not exist" + errors. (This catches the bugs that only show on fresh start — orphan IDB + cleanup, missing onMessage listener, port handshake race CR-02/CR-03.) 
+result: pass +evidence: | + User pasted SW console logs from chrome://extensions → service worker: + - Service Worker initialized + initializing (MV3 install + startup, both clean) + - "Cleaned up orphaned VideoRecorderDB" — Plan 01-05 IDB hygiene fires + - "Received message: REQUEST_PERMISSIONS" — popup → SW message routing OK + - "Creating offscreen document at: chrome-extension://.../src/offscreen/index.html" + — Plan 01-06 crxjs build output path matches manifest + - "Offscreen port connected" — T-1-04 keepalive port host (CR-02 fix) live + - "Received message: OFFSCREEN_READY" → "OFFSCREEN_READY received" → CR-03 + handshake fix verified: SW waits for READY before sending START_RECORDING + - "Sending START_RECORDING to offscreen..." → "START_RECORDING sent successfully" + → "Video recording started successfully" + Zero errors. Zero "Receiving end does not exist". Critical handshake races + (CR-02 + CR-03) confirmed fixed at runtime. + +### 2. OPR-1: getDisplayMedia Picker + Stream Grant +expected: | + In the same smoke.sh-launched Chrome, click the extension icon. Chrome's + native screen-share picker appears. Because smoke.sh passes + `--auto-select-desktop-capture-source="Mokosh Smoke Test"`, the picker + auto-accepts the smoke tab — no manual pick needed. "Sharing your screen" + banner appears at the top of the Chrome window. SW console shows + `[Offscreen:Recorder] Stream created` (or equivalent). No errors from + `getDisplayMedia` rejection. +result: pass +evidence: | + SW console logs show end-to-end happy path with no errors: + - "Received message: REQUEST_PERMISSIONS" (popup → SW, click registered) + - "Starting video capture for tab 240122995: data:text/html,..." + (smoke tab targeted as expected) + - "Creating offscreen document at: chrome-extension://.../src/offscreen/index.html" + - "Offscreen port connected" + "OFFSCREEN_READY received" + - "Sending START_RECORDING to offscreen..." 
→ "START_RECORDING sent successfully" + - **"Video recording started successfully"** — this final log only fires if + MediaRecorder.start() in the offscreen doc completed without throwing, + which means getDisplayMedia resolved with a stream (i.e., picker + auto-accepted, stream granted, codec assertion passed). + Note: "[Offscreen:Recorder] Stream created" log lives in the offscreen + page's DevTools console (separate context from SW DevTools); user only + shared SW console output. The SW-side "Video recording started successfully" + is the equivalent positive signal. + +### 3. OPR-2: Continuous Recording Across Tab Switches +expected: | + With recording active (after Test 2), open 2-3 NEW tabs and switch + between them rapidly for ~30 seconds. Then click the extension icon + → "Сохранить отчёт об ошибке". smoke.sh detects the new + `session_report_*.zip` in ~/Downloads, runs ffprobe (exit 0 expected), + stages the WebM to `/tmp/mokosh-last_30sec.webm`, and opens it in + Chrome. The video plays continuously for the full ~30 s with no freeze, + no black frames, no missing seconds across the tab-switch moments. + (Under getDisplayMedia, the stream is screen-or-window-scoped — tab + switches should NOT interrupt capture. This is the empirical confirmation + of D-01/D-15 amended semantics.) +result: issue +reported: | + Three layers of evidence collected during this test: + + (a) Operator UX observation: "if i open the new tab, it suggest to + 'share this tab instead'" — Chrome's tab-share affordance. + (b) Console errors: 3× Uncaught Error: Attempting to use a disconnected + port object at offscreen:1:4644, starting at the 290 s pre-emptive + reconnect mark. + (c) **Silent data loss in the produced archive** (discovered after + Test 3 was paused — user completed the save and shared smoke.sh + output): smoke.sh waited 600 s before detecting the new zip; once + detected, smoke.sh reported `caution: filename not matched: + video/last_30sec.webm`. 
Inspecting the zip + (`session_report_2026-05-16_13-54-52.zip`, 88 254 bytes) confirms + it contains rrweb/session.json (empty), logs/events.json (empty), + screenshot.png, meta.json — **no video/ directory at all**. +severity: blocker +observed_evidence: | + Pre-bug (the GOOD news): + - 33+ segment rotations over ~6 minutes, "kept: 3/3" invariant holding + in the OS:Recorder logs (D-13 restart-segments lifecycle live and + healthy — sizes 472-799 KB per 10s segment) + + The actual bug chain (most severe → least severe): + + 1. **SILENT VIDEO DATA LOSS on save** (BLOCKER — defeats phase goal): + - Archive shipped with no video file. The phase's whole point is + producing a playable last_30sec.webm — and the operator-facing + save flow does NOT do that. + - Root cause located by code reading (NOT a fix attempt) at + `src/background/index.ts:346-352`: + if (videoBufferResponse.segments.length > 0) { + zip.file('video/last_30sec.webm', mergeVideoSegments(...)); + } else { + logger.warn('✗ No video segments to add'); // ← we hit this branch + } + The else-branch ships an archive missing the video file entirely, + with only a warn log. This is exactly the silent-failure mode the + phase set out to fix. + - Upstream proximate cause: REQUEST_BUFFER round-trip fails to + deliver a non-empty BUFFER response. Hypothesised pathway: + offscreen's encodeAndSendBuffer captures portAtRequest via the + CR-01 guard (recorder.ts:534), then awaits base64 encoding; if + the pre-emptive 290s reconnect lands during that await, the + guard at recorder.ts:597-601 correctly refuses to post on the + stale port BUT also doesn't retry on the fresh port. The SW + per-request listener (background/index.ts:130-167) times out + silently after BUFFER_FETCH_TIMEOUT_MS=2_000, sets + videoBufferResponse.segments = [], proceeds to build the + empty-video archive. 
The 600 s smoke.sh wait suggests the + operator clicked save multiple times until one + REQUEST_BUFFER round-trip happened to NOT collide with a + reconnect window. + + 2. **3× Uncaught Error after pre-emptive reconnect** (MAJOR — noise + + symptom of #1): + - Locus: `src/offscreen/recorder.ts:623-625` (ping interval) + + :626-630 (pre-emptive setTimeout). The setInterval ping callback + uses `keepalivePort?.postMessage({ type: 'PING' })` — optional + chaining only catches null, not a connected-but-disconnecting + Port. Microtask race between disconnect() at line 628 and the + teardownPortTimers() in onDisconnect at line 618 allows a queued + ping callback to execute against the now-disconnected port. + - Spacing matches PORT_PING_MS=25_000 across the post-reconnect + window (errors at 11:50:10, ~11:50:40, ~11:51:00). + - Same disconnect race almost certainly contributes to bug #1 by + leaking REQUEST_BUFFER / BUFFER on the dead port. + + 3. **Operator-side foot-gun: "Share this tab instead"** (MINOR — + Phase 5 hardening; documented separately as advisory gap): + - Chrome's platform UX surfaces this button on every other tab + when extension uses getDisplayMedia in tab-share mode. + - Mitigation options: (a) operator guidance "pick Entire Screen", + (b) nudge picker via `video: { displaySurface: 'monitor' }` + constraint in the getDisplayMedia call. + +### 4. OPR-3: SW Idle-Unload Survival +expected: | + Start a new recording (run smoke.sh again with KEEP_PROFILE=1 to reuse + the loaded extension; click icon to start recording). Let it record + ~15 s. Then: open `chrome://extensions` → find Mokosh → click the + "service worker" link → in the DevTools that opens, click "Force stop" + on the SW. Wait 60 s (the SW should stay dead — no auto-revival). Click + the extension icon → "Сохранить отчёт об ошибке". SW revives, OFFSCREEN_READY + handshake re-establishes, archive downloads. Open the WebM in Chrome → + video from BEFORE the Force Stop is present in the file. 
(Tests that + the offscreen-document buffer survives SW unload — D-16/D-17 + CR-03 + hasDocument fix. This is the most fragile path; if it fails, the bug + is likely in `getVideoBufferFromOffscreen` post-respawn or in the + OFFSCREEN_READY handshake re-fire.) +result: blocked +blocked_by: other +reason: | + Paused by user decision after Test 3 surfaced the port-reconnect + uncaught-error finding. Routed to /gsd-debug. Test 4 deliberately not + attempted yet because it would compound debug-session signal: SW + Force-Stop triggers exactly the disconnect→reconnect path that's + currently buggy. Re-run after debug fix lands to get a clean signal + on the SW-survival contract. + +## Summary + +total: 4 +passed: 2 +issues: 1 +pending: 0 +skipped: 0 +blocked: 1 + +## Gaps + +- truth: "When the operator clicks save, the produced archive contains + a playable video/last_30sec.webm derived from the ring buffer's most + recent 30 s. This is the entire goal of Phase 1; without it the + phase is not delivered to operators regardless of structural correctness." + status: failed + reason: | + User reported smoke.sh waited 600 s then detected the new zip with + `caution: filename not matched: video/last_30sec.webm`. Inspection: + archive `session_report_2026-05-16_13-54-52.zip` (88 254 bytes) + contains rrweb/session.json, logs/events.json, screenshot.png, + meta.json — NO video/ directory. Code at + src/background/index.ts:346-352 ships the archive without video + when `videoBufferResponse.segments.length === 0`, only logging a + warning. The empty-segments condition is reached when the + REQUEST_BUFFER → BUFFER port round-trip fails. Three concrete + failure modes contribute: + (1) Pre-emptive 290 s reconnect race in offscreen recorder leaves + encodeAndSendBuffer's portAtRequest stale; CR-01 guard at + recorder.ts:597-601 correctly refuses to post BUFFER on dead + port but doesn't retry on fresh port; SW listener times out + silently after BUFFER_FETCH_TIMEOUT_MS=2_000. 
+ (2) Ping setInterval at recorder.ts:623-625 surfaces 3× Uncaught + Error post-reconnect from the same race window — symptomatic + of the same root cause. + (3) saveArchive at background/index.ts:340+ has no retry-with- + backoff on empty BUFFER and no operator-visible error signal — + ships silently. + severity: blocker + test: 3 + artifacts: + - path: src/background/index.ts + lines: "346-352, 430-493, 130-167" + issue: "Empty videoBufferResponse.segments silently produces no-video archive; saveArchive has no retry/error-surface; per-request BUFFER listener times out without retry" + - path: src/offscreen/recorder.ts + lines: "522-545, 597-604, 606-630, 623-625" + issue: "Pre-emptive 290 s reconnect race window: encodeAndSendBuffer correctly drops stale post but never re-encodes/re-posts on fresh port; ping setInterval racing with reconnect's teardownPortTimers leaves callbacks executing on disconnected port" + missing: + - "REQUEST_BUFFER retry path on offscreen side after port reconnect (re-encode + re-post on fresh port)" + - "OR: saveArchive-side retry loop with backoff when videoBufferResponse.segments is empty" + - "Operator-visible error: surface 'no video data — try again' in popup instead of silently shipping no-video archive" + - "Atomic ping-callback guard (check port.connected before postMessage, or try/catch)" + - "Test coverage in tests/offscreen/port.test.ts: pre-emptive reconnect during in-flight REQUEST_BUFFER (currently only SW-side disconnect is pinned)" + - "Test coverage in tests/background/: saveArchive behavior under empty-BUFFER response (currently no test exists for this branch)" + debug_session: "" + +- truth: "Operator picking 'tab' at the getDisplayMedia picker should not + be one click away from accidentally redirecting the share target via + Chrome's 'Share this tab instead' affordance." 
+ status: advisory + reason: | + Chrome's platform UX: when an extension uses getDisplayMedia to share + a tab, every other tab shows a 'Share this tab instead' button in the + 'Sharing your screen' banner. Clicking it replaces the underlying + MediaStream, which our onUserStoppedSharing handler treats as + end-of-session. User flagged this during Test 3. + severity: minor + test: 3 + artifacts: + - path: src/offscreen/recorder.ts + lines: "ref D-01/D-15 amended getDisplayMedia call" + issue: "displaySurface preference not declared" + missing: + - "Optional: pass { video: { displaySurface: 'monitor' } } to nudge + Chrome's picker toward screen-share by default" + - "Operator documentation: 'pick Entire Screen, not This Tab'" + debug_session: "" + defer_to_phase: 5 +
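The "Atomic ping-callback guard" item in the first gap's `missing` list can be sketched roughly as follows. This is a minimal illustration of the try/catch variant only, not the project's actual code: `PortLike` is a structural stand-in for `chrome.runtime.Port`, and `safePing` and its `onDeadPort` callback are hypothetical names introduced here. In the real recorder the callback would presumably clear the ping interval, but that wiring is an assumption.

```typescript
// Structural stand-in for chrome.runtime.Port (assumption: only postMessage
// matters for the guard, so nothing else is modelled).
interface PortLike {
  postMessage(message: unknown): void;
}

// Hypothetical helper, not present in the source. Posts a PING and reports
// whether it went out. A null port or a throwing postMessage (for example
// "Attempting to use a disconnected port object") yields false instead of
// an uncaught error inside the setInterval callback.
function safePing(
  port: PortLike | null,
  onDeadPort: (err: unknown) => void,
): boolean {
  if (port === null) {
    return false;
  }
  try {
    port.postMessage({ type: 'PING' });
    return true;
  } catch (err) {
    // Hand the error to the caller so it can tear down timers or reconnect.
    onDeadPort(err);
    return false;
  }
}

// Usage with fake ports: one that accepts messages, one that throws the way
// a disconnected port does.
const sent: unknown[] = [];
const failures: unknown[] = [];

const livePort: PortLike = {
  postMessage: (message) => {
    sent.push(message);
  },
};
const deadPort: PortLike = {
  postMessage: () => {
    throw new Error('Attempting to use a disconnected port object');
  },
};

const liveResult = safePing(livePort, (err) => failures.push(err));
const deadResult = safePing(deadPort, (err) => failures.push(err));
const nullResult = safePing(null, (err) => failures.push(err));
```

The optional-chaining form `keepalivePort?.postMessage(...)` described in Test 3 covers only the null case; wrapping the call also covers a port that is non-null but already disconnected, which is the case the 3× Uncaught Error finding points at.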