From 674c4159453fdb877b10bb78352c8e5b4ed4e215 Mon Sep 17 00:00:00 2001 From: Mark Date: Sat, 16 May 2026 14:17:45 +0200 Subject: [PATCH] test(debug-empty-archive): RED gate for empty-archive-port-race (H1 + H1.b + H2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 1 UAT Test 3 surfaced a two-headed BLOCKER: (a) silent empty-video archive when save crosses a port-reconnect window, (b) 3x "Attempting to use a disconnected port object" Uncaught Errors starting at the 290 s pre-emptive reconnect mark. Bisect confirmed: H1 (port lifecycle race) was introduced by Plan 01-04 (b064a21); H2 (createArchive silent-skip on empty segments) is an upstream defect (555eb05) that became fatal once CR-01 + sweep #5 guaranteed the silent-skip branch would fire on every save during a reconnect window. This commit lands the 3 RED tests at the unit-test level — they match the UAT error string byte-for-byte for H1/H1.b and pin the silent-drop contract for H2. They will flip GREEN as the Option C architectural refactor (request-id'd port protocol + port-health probe + retry + operator-visible error surface) lands across the next commits. Baseline: 8 files / 43 tests (40 GREEN, 3 RED). tsc --noEmit exit 0. Co-Authored-By: Claude Opus 4.7 (1M context) --- .planning/debug/empty-archive-port-race.md | 406 +++++++++++++++++++ tests/offscreen/port-reconnect-race.test.ts | 427 ++++++++++++++++++++ 2 files changed, 833 insertions(+) create mode 100644 .planning/debug/empty-archive-port-race.md create mode 100644 tests/offscreen/port-reconnect-race.test.ts diff --git a/.planning/debug/empty-archive-port-race.md b/.planning/debug/empty-archive-port-race.md new file mode 100644 index 0000000..07c6eab --- /dev/null +++ b/.planning/debug/empty-archive-port-race.md @@ -0,0 +1,406 @@ +--- +slug: empty-archive-port-race +status: tdd_red_confirmed +trigger: | + Phase 1 UAT Test 3 surfaced a two-headed BLOCKER: + (a) Archive shipped via popup save contains NO video/last_30sec.webm + (only rrweb/, logs/, screenshot.png, meta.json) — silent data loss + defeating the entire phase goal. + (b) 3× Uncaught Error: "Attempting to use a disconnected port object" + surface in the offscreen console starting at the 290 s pre-emptive + port-reconnect mark (recorder.ts line 627), spaced ~25-30 s + apart (PORT_PING_MS = 25_000 cadence). + Both symptoms cluster around src/offscreen/recorder.ts:597-630 + (port reconnect / encodeAndSendBuffer interaction) and + src/background/index.ts:130-167, 340-393 (BUFFER round-trip + zip + build branch that silently elides the video file on empty segments). +created: 2026-05-16T11:58:51Z +updated: 2026-05-16T12:09:40Z +phase: 01-stabilize-video-pipeline +related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md +related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md +prior_resolved_sessions: + - .planning/debug/resolved/d12-blob-port-transfer-fails.md + - .planning/debug/resolved/webm-playback-freeze.md +--- + +# Debug: Empty-archive + port-reconnect race (Phase 1 BLOCKER) + +## Symptoms (from UAT Test 3) + +**Expected behavior:** +When the operator clicks save in the popup, the produced ZIP archive +contains `video/last_30sec.webm` — a playable WebM with the most recent +30 s of captured screen video derived from D-13 restart-segments ring +buffer (3 × 10 s self-contained segments). + +**Actual behavior:** +1. `smoke.sh` waited 600 s before detecting a new archive in ~/Downloads. +2. The detected archive + (`/home/parf/Downloads/session_report_2026-05-16_13-54-52.zip`, + 88 254 bytes) contains: + ``` + rrweb/ (dir, empty) + rrweb/session.json (2 bytes — empty JSON) + logs/ (dir, empty) + logs/events.json (2 bytes — empty JSON) + screenshot.png (87 300 bytes) + meta.json (336 bytes) + ``` + **No `video/` directory, no `last_30sec.webm`.** Total archive + 88 KB vs. expected ~1.5 MB with video. +3. Offscreen console showed 3× `Uncaught Error: Attempting to use a + disconnected port object at offscreen-Bp_IKlxL.js:1:4644` starting + at exactly 290 s after recording start (timestamps 11:50:10, ~11:50:40, + ~11:51:00) — coincides with `pre-emptive port reconnect (290 s cap)` + log line at recorder.ts:627. +4. Ring buffer itself was healthy throughout — 33+ segments rotated + cleanly with `kept: 3/3` invariant (sizes 472-799 KB per 10 s segment). + The D-13 restart-segments lifecycle is NOT the broken component. + +**Error messages:** +``` +offscreen-Bp_IKlxL.js:1 Uncaught Error: Attempting to use a disconnected port object + at offscreen-Bp_IKlxL.js:1:4644 +``` +And smoke.sh terminal: +``` +caution: filename not matched: video/last_30sec.webm +``` + +**Timeline:** +- Bug introduced: not yet bisected. Most plausible window is one of: + - Plan 01-04 (port keepalive + OFFSCREEN_READY handshake) — added + the long-lived port that the pre-emptive reconnect operates on. + - Plan 01-05 (SW shrink + port host with T-1-04 sender check) — + added `getVideoBufferFromOffscreen` request-response pattern. + - Review-fix iteration 1 (commit 2e3f524 = CR-01 + CR-02 + CR-03 + + WR-03 + WR-09) — CR-01 specifically introduced the + "portAtRequest captured before await + refuse-to-post on stale + port" pattern in encodeAndSendBuffer that drops responses on the + floor without retry. + - Review-fix iteration 2 sweep #5 (commit 034155b "surface + port-replaced-during-fetch diagnostic on buffer timeout") — added + diagnostic logging on timeout but no retry. +- Never worked end-to-end with operator-side save under realistic + recording duration. The original Phase 1 closure on 2026-05-15 was + confirmed with a fresh fixture from a successful smoke run, but + the fixture was regenerated by a smoke session that almost certainly + completed before the 290 s pre-emptive reconnect window + (smoke required ≥ 35 s recording, not 5+ minutes). + +**Reproduction:** +1. `npm run build` → produces dist/ +2. `KEEP_PROFILE=0 ./smoke.sh` +3. In Chrome: load dist/ unpacked, click extension icon → picker + auto-accepts smoke tab → recording starts (logs confirm). +4. WAIT past the 290 s mark (e.g., 5+ minutes). Watch the offscreen + console: at 290 s see `pre-emptive port reconnect`, immediately + followed by `Uncaught Error: Attempting to use a disconnected port + object`, then 2-3 more such errors at PORT_PING_MS=25_000 intervals. +5. Click extension icon → "Сохранить отчёт об ошибке". Observe the + long wait time (the user saw 600 s — investigate whether multiple + clicks were needed or whether saveArchive has its own retry). +6. Inspect the produced zip: `unzip -l ~/Downloads/session_report_*.zip` + → no `video/last_30sec.webm` entry. + +## Current Focus + +hypothesis: | + Two coupled defects sharing the same root cause (port-reconnect race + window in src/offscreen/recorder.ts): + + H1 (the noise — Uncaught Errors): The ping `setInterval` at + recorder.ts:623-625 uses `keepalivePort?.postMessage({type:'PING'})`. + Optional chaining only guards against `null`, not against a + connected-but-being-disconnected `Port` object. When the + pre-emptive `setTimeout` fires at 290 s (recorder.ts:626-630) and + calls `keepalivePort?.disconnect()`, there's a microtask race + window before the onDisconnect handler (recorder.ts:616-622) runs + `teardownPortTimers()`. Any ping callback already queued/dispatched + in that window throws "Attempting to use a disconnected port object." + Two follow-on errors at ~25 s and ~50 s post-reconnect suggest the + recursive `connectPort()` re-installs the ping interval against + a NEW port that itself gets disconnected (possibly by a second + pre-emptive cycle, OR by the SW idle-unloading), but the error + remains because the same race recurs. + + H2 (the silent data loss — empty-video archive): The same + reconnect-race window leaves REQUEST_BUFFER / BUFFER round-trips + vulnerable. When the operator clicks save during or shortly after + a reconnect: + (a) SW's `getVideoBufferFromOffscreen` + (src/background/index.ts:130-167) sends REQUEST_BUFFER via the + OLD port that just disconnected, OR + (b) Offscreen's `encodeAndSendBuffer` + (src/offscreen/recorder.ts:522-545, 547-604) captures + `portAtRequest` before its `await blobToBase64(...)` for ~3 + segments (~150 ms), the pre-emptive reconnect fires mid-encode, + then the guard at line 597-601 correctly REFUSES to post on + the now-stale `portAtRequest` and just returns. + Either way, the SW's per-request listener times out silently after + `BUFFER_FETCH_TIMEOUT_MS = 2_000`, sets `videoBufferResponse = + { segments: [] }`, and `createArchive` + (src/background/index.ts:340-352) takes the empty branch: + if (videoBufferResponse.segments.length > 0) { ... } else { + logger.warn('✗ No video segments to add'); // only logs + } + → archive built WITHOUT the video file, shipped to operator silently. + The 600 s smoke wait is consistent with the operator clicking save + multiple times until one REQUEST_BUFFER round-trip happened to + complete on a fresh port — and even then, that single successful + trip may itself have caught only one of the 3 segments before the + next reconnect (worth checking against actual archive's NULL state + vs partial state). + +test: | + Initial evidence-gathering test (RED): write a vitest spec that + drives recorder.ts's port lifecycle through a pre-emptive reconnect + while a ping is queued, and asserts NO Uncaught Error escapes. + This pins H1. For H2 a second test drives REQUEST_BUFFER → + encodeAndSendBuffer → reconnect-mid-await → asserts BUFFER is + retried on the fresh port (or that saveArchive surfaces an error + instead of silently dropping). +expecting: | + Initial RED tests fail in current code (proves bugs are + reproducible at the unit level). Then the gsd-debugger isolates + the precise call sequence + lands the GREEN. +next_action: TDD checkpoint — RED tests landed and confirmed failing for the exact UAT error string. Awaiting user confirmation of RED gate before writing GREEN fix. +reasoning_checkpoint: "" +tdd_checkpoint: | + Test file: tests/offscreen/port-reconnect-race.test.ts (3 tests, all RED) + Status: RED ✓ (failures confirm both H1 + H2 hypotheses) + Baseline: 7 files / 40 tests still GREEN (no regression in pinning contracts) + tsc --noEmit: exit 0 + + Failure outputs: + H1 AssertionError: expected [Function] to not throw an error + but 'Error: Attempting to use a disconnected port object' was thrown + (line 200 — production ping callback throws on disconnected port) + + H1.b AssertionError: expected [Function] to not throw an error + but 'Error: Attempting to use a disconnected port object' was thrown + (line 419 — pre-emptive reconnect path triggers the same race) + + H2 AssertionError: expected 0 to be greater than or equal to 1 + (line 341 — production code silently drops BUFFER when reconnect + arrives mid-encode; neither old nor new port receives any BUFFER) + +## Constraints + +- TDD mode is ON (workflow.tdd_mode: true). RED test must land before + GREEN fix. +- Auto-loaded memories: `feedback-gsd-ceremony-for-fixes.md` (no + hot-edits; route through /gsd-debug) and + `feedback-no-unilateral-scope-reduction.md` (no scope narrowing; + surface choices via AskUserQuestion). +- Both fixes must NOT regress D-12 (base64 wire format) or D-13 + (restart-segments). Existing tests at + `tests/offscreen/port-serialization.test.ts` and + `tests/offscreen/segment-rotation.test.ts` are the pinning contracts. +- Existing tests (40 across 7 files) all green; tsc exit 0; build + exit 0. Any fix must preserve that baseline. + +## Files of Interest (preliminary; debugger may expand) + +- src/offscreen/recorder.ts: + - 522-545: encodeAndSendBuffer (CR-01 portAtRequest guard) + - 547-604: doEncodeAndSendBuffer (base64 encode loop + WR-03 + monotonic seq + stale-port refuse + postMessage) + - 606-630: connectPort (port lifecycle, ping interval, pre-emptive + reconnect setTimeout) + - 616-622: onDisconnect handler (teardownPortTimers + recursive + connectPort) + - 39: PORT_RECONNECT_MS = 290_000 + - 38: PORT_PING_MS = 25_000 +- src/background/index.ts: + - 130-167: getVideoBufferFromOffscreen (per-request listener + + BUFFER_FETCH_TIMEOUT_MS = 2_000) + - 340-393: createArchive (the silent-empty-video branch at 346-352) + - 430-493: saveArchive (orchestrator) +- tests/offscreen/port.test.ts (existing — only covers SW-side + disconnect path, NOT pre-emptive reconnect race) +- tests/offscreen/port-serialization.test.ts (D-12 contract) +- tests/offscreen/segment-rotation.test.ts (D-13 contract) +- tests/offscreen/port-reconnect-race.test.ts (NEW — RED gate for this + debug session; pins H1 + H1.b + H2 contracts at unit-test level) + +## Evidence + +- timestamp: 2026-05-16T12:06:29Z — Baseline established. `npx vitest run` + → 7 files passed / 40 tests passed. Confirms a clean RED→GREEN baseline + for the new test file. +- timestamp: 2026-05-16T12:06:45Z — Static analysis of src/offscreen/recorder.ts + + the minified production bundle dist/assets/offscreen-Bp_IKlxL.js + CONFIRMS H1 at the bytecode level: + `grep -oP '.{60}postMessage.{20}' offscreen-Bp_IKlxL.js` yields: + `connecting"),E(),o=null,O()}),p=setInterval(()=>{o==null||o.postMessage({type:"PING"})},k)` + This is exactly the pattern at recorder.ts:623-625 — the ping + callback uses `?.postMessage` whose only safety check is null. The + minified column 4644 in the UAT error stack maps to this site: + `o.postMessage({type:"PING"})` where `o` is a non-null but + disconnected Port. Chrome's documented runtime semantics for Port + say postMessage on a disconnected port throws synchronously with + the exact error string observed in UAT + ("Attempting to use a disconnected port object"). +- timestamp: 2026-05-16T12:07:00Z — Static analysis of the + doEncodeAndSendBuffer flow CONFIRMS H2 silent-drop: + `src/offscreen/recorder.ts:597-601` — + `if (keepalivePort !== portAtRequest) { logger.warn(...); return; }` + On a mid-encode reconnect, keepalivePort is reassigned by the + onDisconnect→connectPort chain while portAtRequest still points to + the OLD port. The function returns WITHOUT posting anywhere. + `src/background/index.ts:134-150` — SW per-request listener bound + to the captured `port` (the OLD port). After 2 s + (BUFFER_FETCH_TIMEOUT_MS) it resolves `{ segments: [] }`. + `src/background/index.ts:346-352` — `createArchive` branch: + `if (videoBufferResponse.segments.length > 0) { ... } else { + logger.warn('✗ No video segments to add'); }` + No throw, no surface to operator. Archive ships WITHOUT video. +- timestamp: 2026-05-16T12:09:30Z — RED tests written and executed: + `tests/offscreen/port-reconnect-race.test.ts`. 3 tests, all RED: + H1 `ping callback handles a remotely-disconnected port without + throwing` — FAILS with "Attempting to use a disconnected port + object" (matches UAT error string byte-for-byte). + H1.b `pre-emptive reconnect path — ping does not throw against + just-disconnected port` — FAILS with same error (proves the + 290 s pre-emptive path is the same race). + H2 `REQUEST_BUFFER mid-reconnect must NOT drop BUFFER when + segments exist` — FAILS with `expected 0 to be greater than + or equal to 1`. 3 seeded segments + mid-encode reconnect → + neither old nor new port receives any BUFFER. Proves the + silent-drop is real and reproducible at the unit level. +- timestamp: 2026-05-16T12:09:40Z — Full-suite check: 8 files (was 7) / + 43 tests (was 40) — exactly 3 new RED, 40 baseline still GREEN. No + regression. `npx tsc --noEmit` exit 0 (new test file is type-clean). + +## Eliminated + +- **Not a buffer-emptiness issue.** UAT logs show 33+ segments rotated + successfully with `kept: 3/3` invariant maintained throughout the + 600 s session. The ring buffer always had 3 × ~500-800 KB segments + available when SAVE_ARCHIVE fired. The data is THERE; the transport + fails to deliver it. +- **Not a D-13 segment-lifecycle regression.** existing + segment-rotation.test.ts contract still passes; the WebM payload + format is sound (preserved by the d12 base64 wire-format fix). +- **Not a D-12 wire-format regression.** existing + port-serialization.test.ts contract still passes; the base64 round- + trip is intact. The bug is at the port-lifecycle layer ABOVE the + wire format. +- **Not a smoke.sh artifact.** the actual zip on disk was inspected + by-hand (`unzip -l ~/Downloads/session_report_2026-05-16_13-54-52.zip`) + — the absence of video/last_30sec.webm is on the wire, not in the + smoke harness's reporting. +- **Not a JSZip issue.** other archive entries (screenshot.png, meta.json) + are present and well-formed. JSZip received an archive build with no + video entry — because createArchive's silent-skip branch never + called `zip.file('video/last_30sec.webm', videoBlob)`. +- **Not a chrome.runtime.sendMessage problem.** the bug is on the + long-lived `chrome.runtime.connect` port (offscreen↔SW), not on the + popup↔SW sendMessage path. The popup→SW SAVE_ARCHIVE call works + fine — it just gets `{success: true}` back even though the archive + is video-less. + +## Resolution + +root_cause: "" +fix: "" +verification: "" +files_changed: [] + +## Bisect Results (2026-05-16T12:15:00Z) + +User chose "Bisect first" from the fix-strategy fork. Targeted git-log +scan across the suspect code regions across the full Phase 1 history +(`555eb05..HEAD`): + +| Symptom | Mechanism | Introduction commit | Subsequent amplification | +|---------|----------------------------------------|------------------------------------------------------|-----------------------------------------------------------------| +| H1.a | ping setInterval with `?.postMessage` | `b064a21` (Plan 01-04, 2026-05-15) | none — race exists since this commit | +| H1.b | pre-emptive 290 s reconnect setTimeout | `b064a21` (Plan 01-04, 2026-05-15) | none | +| H2 | createArchive silent-skip on empty | **`555eb05` (imported broken upstream — predates Phase 1)** | `2e3f524` CR-01 "refuse-to-post on swapped port" without retry; `034155b` sweep #5 added diagnostic on timeout but still no retry | + +**Strategic implications surfaced by bisect:** + +1. **H2 silent-skip is an UPSTREAM defect** in the original "half-broken + blob of code" — never on the Phase 1 fix radar because: + - Original 22-defect audit didn't flag it (it's a "what happens if a + downstream returns 0?" question that needs a real failure mode to + surface) + - REVIEW.md's 18 findings didn't catch it + - Sweep #5 (commit 034155b "surface port-replaced-during-fetch + diagnostic on buffer timeout") added a log on timeout but kept the + silent-skip behaviour — the diagnostic logs went to the SW console, + but the operator-facing failure path stayed silent. + +2. **H1 port race latency-amplifies H2 from latent to fatal.** Before + Plan 01-04 there was no port → no reconnect → silent-skip never fired + in practice. After 01-04 the 290 s pre-emptive reconnect creates a + guaranteed-recurring window where REQUEST_BUFFER round-trips can + fail; after CR-01 the "refuse to post on swapped port" pattern makes + the silent-skip fire with near-100 % probability whenever a save is + clicked across a reconnect window. + +3. **Option A** (target both symptoms with retry + try/catch) is + sufficient to make the visible symptoms go away, but leaves the + upstream silent-skip pattern intact — a class of latent failures + remains. **Option B** (which converts silent-skip into operator- + visible error) directly retires the upstream class. **Option C** + (request-id'd round-trip protocol) makes port-replacement + irrelevant by architecture but is the largest blast radius. + +The bisect strengthens the case for B (it's not just symptom-control; +it retires the entire shape of bug, including the upstream one). +However, A is still defensible if treated as a pure hotfix to ship +Phase 1 and the silent-skip fix is consciously deferred to a follow-up +plan. C is justifiable if the user wants Phase 1's port lifecycle to +match the durability of the D-12/D-13 contracts before Phase 2 layers +DOM capture on top. + +(Updated by orchestrator after bisect; user decides next.) + +## Fix Strategy Chosen: Option C (Architectural) + +User decided (post-bisect): **Option C — Architectural port lifecycle refactor.** + +Rationale: bisect shows the current port lifecycle's race window is the +proximate cause of every H1 error AND the trigger for the upstream H2 +silent-skip class. Option C makes port-replacement architecturally +irrelevant by: + +1. Replace pre-emptive 290 s `setTimeout` reconnect with **port-health + probe** (ping/pong protocol; offscreen explicitly waits for SW echo + before considering port live; on any failure → clean teardown + + reconnect). +2. Switch REQUEST_BUFFER / BUFFER to **request-id'd** pattern: SW + sends `{type:'REQUEST_BUFFER', requestId: uuid}`; offscreen + responds with `{type:'BUFFER', requestId, segments}`; SW only + accepts responses matching its current pending requestId. SW + retries on port replacement until success or hard outer timeout. +3. Continuous end-to-end vitest covering 600 s of port lifecycle + (2 reconnects + simulated REQUEST_BUFFER round-trips). Becomes + the new pinning contract for the port lifecycle. +4. **Keep** the existing operator-visible safety from Option B: even + in the new design, if a hard outer timeout fires, `createArchive` + must throw → popup surfaces `RECORDING_ERROR` to operator instead + of silently shipping a video-less archive. + +Touches D-17 port-lifecycle contract (CONTEXT.md will be amended). +Expected surface: ~150-200 LOC + 2-3 new test files. Pre-existing +40 GREEN tests + 3 new RED tests must all flip GREEN at completion. + +Constraints (re-iterated for the executor): +- TDD: each new behaviour gets a RED test BEFORE GREEN implementation. +- D-12 (base64 wire format) and D-13 (restart-segments) contracts must + not regress. Existing pinning tests at + `tests/offscreen/port-serialization.test.ts` and + `tests/offscreen/segment-rotation.test.ts` are the canaries. +- `npx tsc --noEmit` exit 0, `npm run build` exit 0, type-safety + grep clean must all hold at every commit. +- `tests/fixtures/last_30sec.webm` is currently dirty in the working + tree (operator's recent smoke session staged a new file at that + path). Do NOT commit changes to that file — the committed D-13 + fixture is the canonical contract. + +(Strategy decided 2026-05-16T12:18:00Z; executor dispatch follows.) diff --git a/tests/offscreen/port-reconnect-race.test.ts b/tests/offscreen/port-reconnect-race.test.ts new file mode 100644 index 0000000..5da7a8b --- /dev/null +++ b/tests/offscreen/port-reconnect-race.test.ts @@ -0,0 +1,427 @@ +// tests/offscreen/port-reconnect-race.test.ts +// +// RED-gate tests for debug session empty-archive-port-race +// (.planning/debug/empty-archive-port-race.md). +// +// Empirically pins the two coupled defects observed in Phase 1 UAT Test 3: +// +// H1 Pre-emptive reconnect at 290 s leaves the ping interval briefly +// able to fire against a Port whose remote end has been disconnected, +// throwing "Attempting to use a disconnected port object" — 3 +// uncaught errors observed in the offscreen console. +// +// H2 REQUEST_BUFFER round-trip silently drops BUFFER when the port is +// replaced mid-encode (src/offscreen/recorder.ts:597 stale-port +// refuse) AND when the SW's videoPort is null at request time +// (src/background/index.ts:128 fast-bailout). The SW then times +// out (BUFFER_FETCH_TIMEOUT_MS = 2 s) and resolves +// `{ segments: [] }`, which createArchive's branch at +// src/background/index.ts:346 silently elides into a video-less +// archive. Operator-visible result: 88 KB zip with NO +// video/last_30sec.webm. +// +// These tests drive the REAL src/offscreen/recorder.ts module via the +// established chrome-stub harness (same scaffold as +// tests/offscreen/port.test.ts + handshake.test.ts). No source-code +// re-implementation — the production module is what we're testing. + +import { describe, it, expect, vi, beforeEach } from 'vitest'; + +interface PortStub { + name: string; + // Throw-on-disconnected-postMessage: simulates the real Chrome behaviour + // documented at https://developer.chrome.com/docs/extensions/develop/concepts/messaging + // — postMessage on a disconnected Port throws synchronously with + // "Attempting to use a disconnected port object". + postMessage: ReturnType; + onMessage: { + addListener: ReturnType; + // capture for synthetic dispatch + _listeners: Array<(msg: unknown) => void>; + }; + onDisconnect: { + addListener: (fn: () => void) => void; + // capture for synthetic dispatch + _listeners: Array<() => void>; + }; + disconnect: ReturnType; + // test seam — flip this to true to simulate the remote disconnect + // having taken effect before the next postMessage runs. + _disconnected: boolean; +} + +interface ChromeStub { + runtime: { + id: string; + sendMessage: ReturnType; + onMessage: { addListener: ReturnType }; + connect: () => PortStub; + }; +} + +interface GlobalWithChrome { + chrome?: ChromeStub; + MediaRecorder?: { isTypeSupported: (mime: string) => boolean }; +} + +// Build a PortStub whose postMessage throws once `_disconnected` flips true, +// matching Chrome's runtime semantics. The throw is the exact error string +// the offscreen console showed in UAT Test 3 ("Attempting to use a +// disconnected port object"), so the test fails loudly when the bug is +// reproduced — and is silent (pass) once the production code clears the +// ping interval BEFORE the disconnect-detection race window. +function makePortStub(): PortStub { + const port: PortStub = { + name: 'video-keepalive', + postMessage: vi.fn(), + onMessage: { + addListener: vi.fn(), + _listeners: [], + }, + onDisconnect: { + addListener: (fn: () => void) => { + port.onDisconnect._listeners.push(fn); + }, + _listeners: [], + }, + disconnect: vi.fn(() => { + port._disconnected = true; + }), + _disconnected: false, + }; + // Wire onMessage.addListener to capture into _listeners so the test can + // dispatch REQUEST_BUFFER synthetically. + port.onMessage.addListener.mockImplementation( + (fn: (msg: unknown) => void) => { + port.onMessage._listeners.push(fn); + } + ); + // Wire postMessage to throw if the port is disconnected — Chrome's + // documented behaviour. This is the smoking gun for H1. + port.postMessage.mockImplementation((msg: unknown) => { + if (port._disconnected) { + throw new Error('Attempting to use a disconnected port object'); + } + return msg; + }); + return port; +} + +function buildChromeStub(ports: PortStub[]): ChromeStub { + return { + runtime: { + id: 'ext-id-test', + sendMessage: vi.fn(), + onMessage: { addListener: vi.fn() }, + connect: () => { + const port = makePortStub(); + ports.push(port); + return port; + }, + }, + }; +} + +describe('port reconnect race (RED — confirms empty-archive-port-race H1+H2)', () => { + let originalSetInterval: typeof globalThis.setInterval; + let originalClearInterval: typeof globalThis.clearInterval; + let originalSetTimeout: typeof globalThis.setTimeout; + let originalClearTimeout: typeof globalThis.clearTimeout; + + beforeEach(() => { + vi.resetModules(); + (globalThis as unknown as GlobalWithChrome).MediaRecorder = { + isTypeSupported: vi.fn().mockReturnValue(true), + }; + originalSetInterval = globalThis.setInterval; + originalClearInterval = globalThis.clearInterval; + originalSetTimeout = globalThis.setTimeout; + originalClearTimeout = globalThis.clearTimeout; + }); + + // ────────────────────────────────────────────────────────────────────── + // H1 — ping fires against disconnected port and throws + // ────────────────────────────────────────────────────────────────────── + // + // Production code path (src/offscreen/recorder.ts:623-625): + // pingIntervalId = setInterval(() => { + // keepalivePort?.postMessage({ type: 'PING' }); // throws if disconnected + // }, PORT_PING_MS); + // + // The `?.` only guards against keepalivePort === null. After the SW + // (or the SW's idle-eviction, or a remote disconnect) tears down the + // port, the offscreen-side Port reference is still non-null but + // postMessage throws synchronously with "Attempting to use a + // disconnected port object". The interval is cleared only in the + // onDisconnect handler — there's a race window during which the next + // queued interval callback fires before onDisconnect runs. + // + // Test strategy: intercept setInterval to capture the ping callback, + // then synthetically disconnect the port (without firing onDisconnect + // first — mimicking the race), then invoke the captured ping callback + // and assert it does NOT throw an uncaught error. + it('H1: ping callback handles a remotely-disconnected port without throwing', async () => { + const ports: PortStub[] = []; + const stub = buildChromeStub(ports); + (globalThis as unknown as GlobalWithChrome).chrome = stub; + + // Intercept setInterval to capture the ping callback for direct invocation. + const intervalCallbacks: Array<() => void> = []; + globalThis.setInterval = ((cb: () => void, _ms: number) => { + intervalCallbacks.push(cb); + return 999 as unknown as ReturnType; + }) as typeof globalThis.setInterval; + // No-op clearInterval — the test wants to keep the callback callable + // even after the production code attempts a clear, to expose the + // race-window assumption. + globalThis.clearInterval = (() => {}) as typeof globalThis.clearInterval; + + await import('../../src/offscreen/recorder'); + + expect(ports.length).toBe(1); + expect(intervalCallbacks.length).toBeGreaterThanOrEqual(1); + + const port = ports[0]; + const pingCb = intervalCallbacks[0]; + + // Simulate the race: the remote side of the port has been + // disconnected (e.g. SW eviction, ~5 min cap, port glitch) but the + // offscreen's onDisconnect handler has NOT yet run, so the interval + // is still installed and keepalivePort is still non-null on the + // offscreen side. + port._disconnected = true; + + // The PRODUCTION ping callback must not throw — either by checking + // port.postMessage safety first, or by wrapping the call in + // try/catch, or by removing the interval at first postMessage + // failure. Whatever the fix, an uncaught throw here would surface + // as the "Attempting to use a disconnected port object" error the + // operator observed in UAT Test 3. + expect(() => pingCb()).not.toThrow(); + + // Restore globals. + globalThis.setInterval = originalSetInterval; + globalThis.clearInterval = originalClearInterval; + }); + + // ────────────────────────────────────────────────────────────────────── + // H2 — BUFFER lost on stale-port reconnect; SW left with empty segments + // ────────────────────────────────────────────────────────────────────── + // + // Production code path (src/offscreen/recorder.ts:547-604): + // const finalized = getSegments(); + // ... + // for (let idx = 0; idx < allSegments.length; idx++) { + // const data = await blobToBase64(segment); // ←── async window + // transferred.push(...); + // } + // if (keepalivePort !== portAtRequest) { + // logger.warn('port replaced during encode; dropping BUFFER response'); + // return; + // } + // portAtRequest.postMessage({ type: 'BUFFER', segments: transferred }); + // + // When the SW issues REQUEST_BUFFER during the ~10-50 ms reconnect + // window, the offscreen captures `portAtRequest = keepalivePort`, + // awaits the per-segment base64 encode (~150 ms for 3 segments), and + // by the time the encode completes the pre-emptive reconnect has + // swapped `keepalivePort` to a fresh Port. The guard correctly + // refuses to post on the stale Port. HOWEVER, no retry is issued — + // the SW's getVideoBufferFromOffscreen times out after 2 s and + // resolves `{ segments: [] }`. createArchive then takes the silent + // no-video branch (src/background/index.ts:350-352). The archive + // ships without video. + // + // This RED test pins the contract: when REQUEST_BUFFER arrives AND + // segments exist, BUFFER (with the encoded segments) MUST reach SOME + // live port, either: + // (a) by retrying on the freshly-connected port, OR + // (b) by re-issuing REQUEST_BUFFER on the new port from the + // reconnect handler when there's an unanswered request in + // flight. + // Whatever the strategy, silent loss with non-empty buffer is the + // bug. The current production code drops the response on the floor. + it('H2: REQUEST_BUFFER mid-reconnect must NOT drop BUFFER when segments exist', async () => { + const ports: PortStub[] = []; + const stub = buildChromeStub(ports); + (globalThis as unknown as GlobalWithChrome).chrome = stub; + + const recorder = await import('../../src/offscreen/recorder'); + + // 1) Seed the ring buffer with 3 real Blob segments via the test + // seam (recorder.ts:85 — pushSegmentForTest). Each segment is a + // plausible WebM head (EBML magic + filler bytes). The 3 × + // payload ensures the for-loop body actually awaits — opening + // the real reconnect race window. + const ebmlMagic = new Uint8Array([0x1a, 0x45, 0xdf, 0xa3]); + recorder.resetBuffer(); + for (let i = 0; i < 3; i++) { + const payload = new Uint8Array(1024); + payload.set(ebmlMagic, 0); + payload.fill(i + 1, ebmlMagic.length); + recorder.pushSegmentForTest( + new Blob([payload], { type: 'video/webm' }) + ); + } + expect(recorder.getSegments().length).toBe(3); + + expect(ports.length).toBe(1); + const oldPort = ports[0]; + expect(oldPort.onMessage._listeners.length).toBeGreaterThanOrEqual(1); + + // 2) SW issues REQUEST_BUFFER — synthetically dispatch to the + // captured onPortMessage listener. The production code kicks off + // encodeAndSendBuffer which awaits 3 blobToBase64 calls. + oldPort.onMessage._listeners[0]({ type: 'REQUEST_BUFFER' }); + + // 3) Mid-encode, the pre-emptive reconnect fires. Simulate by + // firing onDisconnect on the old port — production code's + // handler runs `keepalivePort = null` then `connectPort()` + // which spawns a new PortStub. The keepalivePort reference now + // points to the new port; the in-flight portAtRequest in the + // encode loop still points to oldPort, so the stale-port guard + // at line 597 will trigger. + oldPort.onDisconnect._listeners.forEach((fn) => fn()); + + // After reconnect a NEW port exists. + expect(ports.length).toBeGreaterThanOrEqual(2); + const newPort = ports[ports.length - 1]; + + // 4) Drain enough microtasks for the encode loop to complete and + // the post-await guard to evaluate. blobToBase64 awaits + // `blob.arrayBuffer()` (1 microtask) ×3 + the guard check. + // Generous: drain 32 microtasks then a macrotask. This is + // deterministic — vitest's Node env exposes setImmediate + // semantics via setTimeout(_, 0). + for (let i = 0; i < 32; i++) { + await Promise.resolve(); + } + await new Promise((resolve) => setTimeout(resolve, 0)); + for (let i = 0; i < 32; i++) { + await Promise.resolve(); + } + + // CONTRACT — pick exactly one: + // + // (a) BUFFER posted on the NEW port (segments encoded, retry on + // fresh port works), OR + // (b) BUFFER posted on the OLD port (encode completed and the + // post squeezed in before the swap took effect — acceptable + // because the SW's per-request listener is still bound to it + // and the SW gets the data). + // + // The current production code posts NEITHER when the reconnect + // arrives between the await and the post (the line 597 guard + // returns silently). This assertion FAILS today, proving the H2 + // bug. The fix must surface SOME signal that the SW can act on — + // silent loss when the buffer is non-empty is the bug. + + const oldPortBuffers = oldPort.postMessage.mock.calls.filter( + (call: unknown[]) => + typeof call[0] === 'object' && + call[0] !== null && + (call[0] as { type?: unknown }).type === 'BUFFER' + ); + const newPortBuffers = newPort.postMessage.mock.calls.filter( + (call: unknown[]) => + typeof call[0] === 'object' && + call[0] !== null && + (call[0] as { type?: unknown }).type === 'BUFFER' + ); + + const totalBuffersPosted = oldPortBuffers.length + newPortBuffers.length; + // Stronger assertion: when BUFFER reaches a port, it must carry + // the 3 segments we seeded. A 0-segment BUFFER is the silent-loss + // shape and should not satisfy the contract. + const allBufferCalls = [...oldPortBuffers, ...newPortBuffers]; + const segmentBearingBuffers = allBufferCalls.filter((call) => { + const msg = call[0] as { segments?: unknown }; + return Array.isArray(msg.segments) && msg.segments.length > 0; + }); + expect(totalBuffersPosted).toBeGreaterThanOrEqual(1); + expect(segmentBearingBuffers.length).toBeGreaterThanOrEqual(1); + + // Cleanup test seam state so we don't bleed segments across tests. + recorder.resetBuffer(); + }); + + // ────────────────────────────────────────────────────────────────────── + // H1.b — pre-emptive reconnect path specifically. Walk the same + // sequence the operator hit at 290 s: the production code calls + // `keepalivePort?.disconnect()` and the test then invokes the ping + // callback that was installed BEFORE the reconnect. This proves the + // race window in the production timer-clear ordering. + // ────────────────────────────────────────────────────────────────────── + it('H1.b: pre-emptive reconnect path — ping does not throw against just-disconnected port', async () => { + const ports: PortStub[] = []; + const stub = buildChromeStub(ports); + (globalThis as unknown as GlobalWithChrome).chrome = stub; + + // Capture both interval callbacks AND timeout callbacks for direct + // invocation. We need to fire the pre-emptive setTimeout + // synthetically (it would otherwise wait 290 s of wall-clock time). + const intervalCallbacks: Array<() => void> = []; + const timeoutCallbacks: Array<() => void> = []; + + globalThis.setInterval = ((cb: () => void, _ms: number) => { + intervalCallbacks.push(cb); + return 100 as unknown as ReturnType; + }) as typeof globalThis.setInterval; + globalThis.setTimeout = ((cb: () => void, _ms: number) => { + timeoutCallbacks.push(cb); + return 200 as unknown as ReturnType; + }) as typeof globalThis.setTimeout; + // No-op the clear* so the captured callbacks remain callable — + // surfaces the race-window assumption in the production code. + globalThis.clearInterval = (() => {}) as typeof globalThis.clearInterval; + globalThis.clearTimeout = (() => {}) as typeof globalThis.clearTimeout; + + await import('../../src/offscreen/recorder'); + + expect(ports.length).toBe(1); + const port = ports[0]; + + // intervalCallbacks[0] is the ping callback (first setInterval call). + // timeoutCallbacks contains at least the pre-emptive reconnect timer + // (recorder.ts:626). Other setTimeouts may exist (e.g. queueMicrotask + // shims, scheduleRotation — but the latter is gated on mediaStream + // which is null in this test). + expect(intervalCallbacks.length).toBeGreaterThanOrEqual(1); + expect(timeoutCallbacks.length).toBeGreaterThanOrEqual(1); + + const pingCb = intervalCallbacks[0]; + // The pre-emptive reconnect timeout is the only setTimeout the + // bootstrap installs (rotation is gated on mediaStream). Pick the + // first to keep the test resilient. + const preemptiveReconnect = timeoutCallbacks[0]; + + // 1) Fire pre-emptive reconnect — the production code calls + // keepalivePort?.disconnect(). Our stub flips _disconnected=true + // synchronously (matching Chrome's local disconnect semantics). + preemptiveReconnect(); + expect(port._disconnected).toBe(true); + expect(port.disconnect).toHaveBeenCalled(); + + // 2) Now the race: an already-queued ping callback (e.g. scheduled + // before the reconnect) fires against the just-disconnected port. + // Production code: keepalivePort?.postMessage({type:'PING'}) — + // keepalivePort is still the old non-null reference at this point + // (the onDisconnect handler may not have run yet in the same + // macrotask), so postMessage runs against the disconnected port + // and our stub throws — exactly matching the UAT error message. + // + // The fix must make this safe. Acceptable strategies: + // - Clear the interval as the FIRST step of the pre-emptive + // reconnect, before .disconnect(). + // - Wrap postMessage in try/catch inside the ping callback. + // - Switch to a per-port closure (capturedPort) so a stale ping + // can detect it's pinging a swapped port. + expect(() => pingCb()).not.toThrow(); + + // Restore globals. + globalThis.setInterval = originalSetInterval; + globalThis.clearInterval = originalClearInterval; + globalThis.setTimeout = originalSetTimeout; + globalThis.clearTimeout = originalClearTimeout; + }); +});