Files
mokosh/.planning/debug/resolved/empty-archive-port-race.md
Mark f0871c0237 docs(option-c): archive empty-archive-port-race + amend CONTEXT.md D-17 port lifecycle
Two doc updates closing the debug session per the resolved pattern this
phase has established (cf. resolved/d12-blob-port-transfer-fails.md and
resolved/webm-playback-freeze.md):

1. **Move debug session to resolved/** with the Resolution section
   filled in (root_cause, fix, verification, files_changed). Status
   flipped tdd_red_confirmed -> resolved. Original investigation
   notes + bisect results + Option C strategy spec all preserved
   in-place — the file is the full provenance trail.

2. **Amend 01-CONTEXT.md D-17** with the new port lifecycle commitments.
   Append-only (D-17 itself untouched) per the doc cascade rule
   established earlier this phase ("amendments append, do not replace,
   to preserve SPEC provenance"). The amendment narrates:
   - What was Claude's-discretion at Phase 1 plan time has been
     specified by Option C.
   - The 290 s pre-emptive setTimeout reconnect (Pitfall 4) is RETIRED.
   - The architectural commitments added: PING/PONG health probe,
     request-id'd REQUEST_BUFFER/BUFFER, SW retry on port replacement,
     outer 10 s hard-timeout, operator-visible EmptyVideoBufferError
     surface.
   - The 4 pinning contracts added (port-health-probe,
     request-id-protocol, port-lifecycle-continuous, plus the
     refactored port-reconnect-race).

Suite remains 11 files / 53 tests, all GREEN. Quality gates intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:40:02 +02:00

26 KiB
Raw Blame History

slug, status, trigger, created, updated, phase, related_uat, related_review_fix, prior_resolved_sessions
slug status trigger created updated phase related_uat related_review_fix prior_resolved_sessions
empty-archive-port-race resolved Phase 1 UAT Test 3 surfaced a two-headed BLOCKER: (a) Archive shipped via popup save contains NO video/last_30sec.webm (only rrweb/, logs/, screenshot.png, meta.json) — silent data loss defeating the entire phase goal. (b) 3× Uncaught Error: "Attempting to use a disconnected port object" surface in the offscreen console starting at the 290 s pre-emptive port-reconnect mark (recorder.ts line 627), spaced ~25-30 s apart (PORT_PING_MS = 25_000 cadence). Both symptoms cluster around src/offscreen/recorder.ts:597-630 (port reconnect / encodeAndSendBuffer interaction) and src/background/index.ts:130-167, 340-393 (BUFFER round-trip + zip build branch that silently elides the video file on empty segments). 2026-05-16T11:58:51Z 2026-05-16T13:00:00Z 01-stabilize-video-pipeline .planning/phases/01-stabilize-video-pipeline/01-UAT.md .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md
.planning/debug/resolved/d12-blob-port-transfer-fails.md
.planning/debug/resolved/webm-playback-freeze.md

Debug: Empty-archive + port-reconnect race (Phase 1 BLOCKER)

Symptoms (from UAT Test 3)

Expected behavior: When the operator clicks save in the popup, the produced ZIP archive contains video/last_30sec.webm — a playable WebM with the most recent 30 s of captured screen video derived from D-13 restart-segments ring buffer (3 × 10 s self-contained segments).

Actual behavior:

  1. smoke.sh waited 600 s before detecting a new archive in ~/Downloads.
  2. The detected archive (/home/parf/Downloads/session_report_2026-05-16_13-54-52.zip, 88 254 bytes) contains:
    rrweb/                  (dir, empty)
    rrweb/session.json      (2 bytes — empty JSON)
    logs/                   (dir, empty)
    logs/events.json        (2 bytes — empty JSON)
    screenshot.png          (87 300 bytes)
    meta.json               (336 bytes)
    
    No video/ directory, no last_30sec.webm. Total archive 88 KB vs. expected ~1.5 MB with video.
  3. Offscreen console showed 3× Uncaught Error: Attempting to use a disconnected port object at offscreen-Bp_IKlxL.js:1:4644 starting at exactly 290 s after recording start (timestamps 11:50:10, ~11:50:40, ~11:51:00) — coincides with pre-emptive port reconnect (290 s cap) log line at recorder.ts:627.
  4. Ring buffer itself was healthy throughout — 33+ segments rotated cleanly with kept: 3/3 invariant (sizes 472-799 KB per 10 s segment). The D-13 restart-segments lifecycle is NOT the broken component.

Error messages:

offscreen-Bp_IKlxL.js:1 Uncaught Error: Attempting to use a disconnected port object
    at offscreen-Bp_IKlxL.js:1:4644

And smoke.sh terminal:

caution: filename not matched:  video/last_30sec.webm

Timeline:

  • Bug introduced: not yet bisected. Most plausible window is one of:
    • Plan 01-04 (port keepalive + OFFSCREEN_READY handshake) — added the long-lived port that the pre-emptive reconnect operates on.
    • Plan 01-05 (SW shrink + port host with T-1-04 sender check) — added getVideoBufferFromOffscreen request-response pattern.
    • Review-fix iteration 1 (commit 2e3f524 = CR-01 + CR-02 + CR-03 + WR-03 + WR-09) — CR-01 specifically introduced the "portAtRequest captured before await + refuse-to-post on stale port" pattern in encodeAndSendBuffer that drops responses on the floor without retry.
    • Review-fix iteration 2 sweep #5 (commit 034155b "surface port-replaced-during-fetch diagnostic on buffer timeout") — added diagnostic logging on timeout but no retry.
  • Never worked end-to-end with operator-side save under realistic recording duration. The original Phase 1 closure on 2026-05-15 was confirmed with a fresh fixture from a successful smoke run, but the fixture was regenerated by a smoke session that almost certainly completed before the 290 s pre-emptive reconnect window (smoke required ≥ 35 s recording, not 5+ minutes).

Reproduction:

  1. npm run build → produces dist/
  2. KEEP_PROFILE=0 ./smoke.sh
  3. In Chrome: load dist/ unpacked, click extension icon → picker auto-accepts smoke tab → recording starts (logs confirm).
  4. WAIT past the 290 s mark (e.g., 5+ minutes). Watch the offscreen console: at 290 s see pre-emptive port reconnect, immediately followed by Uncaught Error: Attempting to use a disconnected port object, then 2-3 more such errors at PORT_PING_MS=25_000 intervals.
  5. Click extension icon → "Сохранить отчёт об ошибке". Observe the long wait time (the user saw 600 s — investigate whether multiple clicks were needed or whether saveArchive has its own retry).
  6. Inspect the produced zip: unzip -l ~/Downloads/session_report_*.zip → no video/last_30sec.webm entry.

Current Focus

hypothesis: | Two coupled defects sharing the same root cause (port-reconnect race window in src/offscreen/recorder.ts):

H1 (the noise — Uncaught Errors): The ping setInterval at recorder.ts:623-625 uses keepalivePort?.postMessage({type:'PING'}). Optional chaining only guards against null, not against a connected-but-being-disconnected Port object. When the pre-emptive setTimeout fires at 290 s (recorder.ts:626-630) and calls keepalivePort?.disconnect(), there's a microtask race window before the onDisconnect handler (recorder.ts:616-622) runs teardownPortTimers(). Any ping callback already queued/dispatched in that window throws "Attempting to use a disconnected port object." Two follow-on errors at ~25 s and ~50 s post-reconnect suggest the recursive connectPort() re-installs the ping interval against a NEW port that itself gets disconnected (possibly by a second pre-emptive cycle, OR by the SW idle-unloading), but the error remains because the same race recurs.

H2 (the silent data loss — empty-video archive): The same reconnect-race window leaves REQUEST_BUFFER / BUFFER round-trips vulnerable. When the operator clicks save during or shortly after a reconnect: (a) SW's getVideoBufferFromOffscreen (src/background/index.ts:130-167) sends REQUEST_BUFFER via the OLD port that just disconnected, OR (b) Offscreen's encodeAndSendBuffer (src/offscreen/recorder.ts:522-545, 547-604) captures portAtRequest before its await blobToBase64(...) for ~3 segments (~150 ms), the pre-emptive reconnect fires mid-encode, then the guard at line 597-601 correctly REFUSES to post on the now-stale portAtRequest and just returns. Either way, the SW's per-request listener times out silently after BUFFER_FETCH_TIMEOUT_MS = 2_000, sets videoBufferResponse = { segments: [] }, and createArchive (src/background/index.ts:340-352) takes the empty branch: if (videoBufferResponse.segments.length > 0) { ... } else { logger.warn('✗ No video segments to add'); // only logs } → archive built WITHOUT the video file, shipped to operator silently. The 600 s smoke wait is consistent with the operator clicking save multiple times until one REQUEST_BUFFER round-trip happened to complete on a fresh port — and even then, that single successful trip may itself have caught only one of the 3 segments before the next reconnect (worth checking against actual archive's NULL state vs partial state).

test: | Initial evidence-gathering test (RED): write a vitest spec that drives recorder.ts's port lifecycle through a pre-emptive reconnect while a ping is queued, and asserts NO Uncaught Error escapes. This pins H1. For H2 a second test drives REQUEST_BUFFER → encodeAndSendBuffer → reconnect-mid-await → asserts BUFFER is retried on the fresh port (or that saveArchive surfaces an error instead of silently dropping). expecting: | Initial RED tests fail in current code (proves bugs are reproducible at the unit level). Then the gsd-debugger isolates the precise call sequence + lands the GREEN. next_action: TDD checkpoint — RED tests landed and confirmed failing for the exact UAT error string. Awaiting user confirmation of RED gate before writing GREEN fix. reasoning_checkpoint: "" tdd_checkpoint: | Test file: tests/offscreen/port-reconnect-race.test.ts (3 tests, all RED) Status: RED ✓ (failures confirm both H1 + H2 hypotheses) Baseline: 7 files / 40 tests still GREEN (no regression in pinning contracts) tsc --noEmit: exit 0

Failure outputs: H1 AssertionError: expected [Function] to not throw an error but 'Error: Attempting to use a disconnected port object' was thrown (line 200 — production ping callback throws on disconnected port)

H1.b AssertionError: expected [Function] to not throw an error
     but 'Error: Attempting to use a disconnected port object' was thrown
     (line 419 — pre-emptive reconnect path triggers the same race)

H2   AssertionError: expected 0 to be greater than or equal to 1
     (line 341 — production code silently drops BUFFER when reconnect
     arrives mid-encode; neither old nor new port receives any BUFFER)

Constraints

  • TDD mode is ON (workflow.tdd_mode: true). RED test must land before GREEN fix.
  • Auto-loaded memories: feedback-gsd-ceremony-for-fixes.md (no hot-edits; route through /gsd-debug) and feedback-no-unilateral-scope-reduction.md (no scope narrowing; surface choices via AskUserQuestion).
  • Both fixes must NOT regress D-12 (base64 wire format) or D-13 (restart-segments). Existing tests at tests/offscreen/port-serialization.test.ts and tests/offscreen/segment-rotation.test.ts are the pinning contracts.
  • Existing tests (40 across 7 files) all green; tsc exit 0; build exit 0. Any fix must preserve that baseline.

Files of Interest (preliminary; debugger may expand)

  • src/offscreen/recorder.ts:
    • 522-545: encodeAndSendBuffer (CR-01 portAtRequest guard)
    • 547-604: doEncodeAndSendBuffer (base64 encode loop + WR-03 monotonic seq + stale-port refuse + postMessage)
    • 606-630: connectPort (port lifecycle, ping interval, pre-emptive reconnect setTimeout)
    • 616-622: onDisconnect handler (teardownPortTimers + recursive connectPort)
    • 39: PORT_RECONNECT_MS = 290_000
    • 38: PORT_PING_MS = 25_000
  • src/background/index.ts:
    • 130-167: getVideoBufferFromOffscreen (per-request listener
      • BUFFER_FETCH_TIMEOUT_MS = 2_000)
    • 340-393: createArchive (the silent-empty-video branch at 346-352)
    • 430-493: saveArchive (orchestrator)
  • tests/offscreen/port.test.ts (existing — only covers SW-side disconnect path, NOT pre-emptive reconnect race)
  • tests/offscreen/port-serialization.test.ts (D-12 contract)
  • tests/offscreen/segment-rotation.test.ts (D-13 contract)
  • tests/offscreen/port-reconnect-race.test.ts (NEW — RED gate for this debug session; pins H1 + H1.b + H2 contracts at unit-test level)

Evidence

  • timestamp: 2026-05-16T12:06:29Z — Baseline established. npx vitest run → 7 files passed / 40 tests passed. Confirms a clean RED→GREEN baseline for the new test file.
  • timestamp: 2026-05-16T12:06:45Z — Static analysis of src/offscreen/recorder.ts
    • the minified production bundle dist/assets/offscreen-Bp_IKlxL.js CONFIRMS H1 at the bytecode level: grep -oP '.{60}postMessage.{20}' offscreen-Bp_IKlxL.js yields: connecting"),E(),o=null,O()}),p=setInterval(()=>{o==null||o.postMessage({type:"PING"})},k) This is exactly the pattern at recorder.ts:623-625 — the ping callback uses ?.postMessage whose only safety check is null. The minified column 4644 in the UAT error stack maps to this site: o.postMessage({type:"PING"}) where o is a non-null but disconnected Port. Chrome's documented runtime semantics for Port say postMessage on a disconnected port throws synchronously with the exact error string observed in UAT ("Attempting to use a disconnected port object").
  • timestamp: 2026-05-16T12:07:00Z — Static analysis of the doEncodeAndSendBuffer flow CONFIRMS H2 silent-drop: src/offscreen/recorder.ts:597-601if (keepalivePort !== portAtRequest) { logger.warn(...); return; } On a mid-encode reconnect, keepalivePort is reassigned by the onDisconnect→connectPort chain while portAtRequest still points to the OLD port. The function returns WITHOUT posting anywhere. src/background/index.ts:134-150 — SW per-request listener bound to the captured port (the OLD port). After 2 s (BUFFER_FETCH_TIMEOUT_MS) it resolves { segments: [] }. src/background/index.ts:346-352createArchive branch: if (videoBufferResponse.segments.length > 0) { ... } else { logger.warn('✗ No video segments to add'); } No throw, no surface to operator. Archive ships WITHOUT video.
  • timestamp: 2026-05-16T12:09:30Z — RED tests written and executed: tests/offscreen/port-reconnect-race.test.ts. 3 tests, all RED: H1 ping callback handles a remotely-disconnected port without throwing — FAILS with "Attempting to use a disconnected port object" (matches UAT error string byte-for-byte). H1.b pre-emptive reconnect path — ping does not throw against just-disconnected port — FAILS with same error (proves the 290 s pre-emptive path is the same race). H2 REQUEST_BUFFER mid-reconnect must NOT drop BUFFER when segments exist — FAILS with expected 0 to be greater than or equal to 1. 3 seeded segments + mid-encode reconnect → neither old nor new port receives any BUFFER. Proves the silent-drop is real and reproducible at the unit level.
  • timestamp: 2026-05-16T12:09:40Z — Full-suite check: 8 files (was 7) / 43 tests (was 40) — exactly 3 new RED, 40 baseline still GREEN. No regression. npx tsc --noEmit exit 0 (new test file is type-clean).

Eliminated

  • Not a buffer-emptiness issue. UAT logs show 33+ segments rotated successfully with kept: 3/3 invariant maintained throughout the 600 s session. The ring buffer always had 3 × ~500-800 KB segments available when SAVE_ARCHIVE fired. The data is THERE; the transport fails to deliver it.
  • Not a D-13 segment-lifecycle regression. existing segment-rotation.test.ts contract still passes; the WebM payload format is sound (preserved by the d12 base64 wire-format fix).
  • Not a D-12 wire-format regression. existing port-serialization.test.ts contract still passes; the base64 round- trip is intact. The bug is at the port-lifecycle layer ABOVE the wire format.
  • Not a smoke.sh artifact. the actual zip on disk was inspected by-hand (unzip -l ~/Downloads/session_report_2026-05-16_13-54-52.zip) — the absence of video/last_30sec.webm is on the wire, not in the smoke harness's reporting.
  • Not a JSZip issue. other archive entries (screenshot.png, meta.json) are present and well-formed. JSZip received an archive build with no video entry — because createArchive's silent-skip branch never called zip.file('video/last_30sec.webm', videoBlob).
  • Not a chrome.runtime.sendMessage problem. the bug is on the long-lived chrome.runtime.connect port (offscreen↔SW), not on the popup↔SW sendMessage path. The popup→SW SAVE_ARCHIVE call works fine — it just gets {success: true} back even though the archive is video-less.

Resolution

root_cause: | Two coupled defects sharing a single architectural failure mode.

H1 (Uncaught Errors): The PING setInterval at recorder.ts:623-625 used keepalivePort?.postMessage({type:'PING'}). Optional chaining only guards against null, NOT against a connected-but-being- disconnected Port object. The 290 s pre-emptive setTimeout (recorder.ts:626-630) called keepalivePort?.disconnect(), creating a microtask race window before the onDisconnect handler ran the cleanup. Any ping callback queued in that window threw "Attempting to use a disconnected port object" synchronously.

H2 (silent empty-video archive): The reconnect-race window left REQUEST_BUFFER round-trips vulnerable. The offscreen captured portAtRequest = keepalivePort BEFORE await blobToBase64() (CR-01 pattern); on mid-encode reconnect the post-await guard correctly REFUSED to post on the now-stale portAtRequest. The SW's per- request listener was bound to that same stale port, so it timed out (BUFFER_FETCH_TIMEOUT_MS = 2 s) and resolved {segments: []}. createArchive's branch at index.ts:346-352 silently elided the video file when segments.length === 0 — operator received an 88 KB zip with no last_30sec.webm. Bisect confirmed this silent- skip branch was an UPSTREAM defect (commit 555eb05) that the Plan 01-04 port lifecycle (b064a21) + CR-01 (2e3f524) amplified from latent to fatal.

fix: | Option C (Architectural) — request-id'd port protocol + port-health probe + retry on port replacement + operator-visible error surface.

  1. RETIRED the 290 s pre-emptive setTimeout reconnect. Replaced with PONG-based health probe: each PING expects a PONG echo from the SW; the offscreen counts missedPongs and reconnects via clean teardown when MAX_MISSED_PONGS=3 exceeded.

  2. H1 fix: wrapped PING postMessage in try/catch. On synchronous throw, route through reconnectPort() — no more Uncaught Error bubbles out to the offscreen console.

  3. Request-id'd REQUEST_BUFFER / BUFFER: SW generates a uuid per request (crypto.randomUUID), sends with {type:'REQUEST_BUFFER', requestId}. Offscreen echoes the same requestId on the BUFFER response. SW routes via module-level pendingBufferRequests Map keyed by requestId — port-agnostic, so port replacement does NOT lose the response.

  4. SW retry on port replacement: every onConnect post-bootstrap scans pendingBufferRequests and re-issues REQUEST_BUFFER on the fresh port with the SAME requestId. The offscreen posts BUFFER on the CURRENT keepalivePort (no more stale-port refuse-to-post). This retires the H2 silent-drop class architecturally.

  5. createArchive throws EmptyVideoBufferError on empty video buffer. saveArchive catches and emits RECORDING_ERROR to the popup AND returns {success:false, error} — operator sees a visible failure instead of silently shipping a video-less zip.

  6. PING → PONG echo: SW replies to every PING with PONG, closing the offscreen's health-probe loop.

Outer BUFFER_FETCH_TIMEOUT_MS bumped 2 s → 10 s to cover retries across multiple port replacements (inner round-trip still ~100-200 ms).

verification: | All 12 RED-gate tests across 3 files flipped GREEN: - tests/offscreen/port-reconnect-race.test.ts (H1 + H1.b + H2) - tests/offscreen/port-health-probe.test.ts (A, B, C, D — health probe + request-id echo) - tests/background/request-id-protocol.test.ts (1, 2, 3, 4, 5 — request-id protocol + retry + error surface + PONG echo)

New continuous-end-to-end pinning contract in tests/background/port-lifecycle-continuous.test.ts covers 600 s of port lifecycle (12 ping cycles + 2 forced reconnects + 3 SAVE_ARCHIVE round-trips). Asserts no Uncaught + every BUFFER carries segments + PONGs round-trip.

Pinning contracts preserved with no regression: - tests/offscreen/port-serialization.test.ts (D-12): GREEN - tests/offscreen/segment-rotation.test.ts (D-13): GREEN - tests/offscreen/webm-playback.test.ts (A3 ffmpeg dry-run): GREEN

Final suite: 11 files / 53 tests, all GREEN. Quality gates: npx tsc --noEmit exit 0; grep "as any|@ts-ignore" clean across src/offscreen + src/background/index.ts + src/shared; npm run build exit 0 (5.71 kB offscreen, 73.79 kB main).

CONTEXT.md D-17 amended (append, not replace) with the new port lifecycle commitments. Original D-17 left intact.

files_changed:

  • src/shared/types.ts (PortMessage.requestId + 'PONG')
  • src/offscreen/recorder.ts (health probe + request-id'd encode + H1 try/catch + reconnectPort helper)
  • src/background/index.ts (request-id protocol + retry + PONG echo + EmptyVideoBufferError surface + decodeBufferSegments extraction)
  • tests/offscreen/port-reconnect-race.test.ts (H1.b refactored to externally-disconnected path; H2 sends requestId)
  • tests/offscreen/port-health-probe.test.ts (new — 4 contract tests)
  • tests/background/request-id-protocol.test.ts (new — 5 contract tests)
  • tests/background/port-lifecycle-continuous.test.ts (new — 600 s e2e simulation)
  • .planning/phases/01-stabilize-video-pipeline/01-CONTEXT.md (D-17 amendment appended; original D-17 untouched)

Bisect Results (2026-05-16T12:15:00Z)

User chose "Bisect first" from the fix-strategy fork. Targeted git-log scan across the suspect code regions across the full Phase 1 history (555eb05..HEAD):

Symptom Mechanism Introduction commit Subsequent amplification
H1.a ping setInterval with ?.postMessage b064a21 (Plan 01-04, 2026-05-15) none — race exists since this commit
H1.b pre-emptive 290 s reconnect setTimeout b064a21 (Plan 01-04, 2026-05-15) none
H2 createArchive silent-skip on empty 555eb05 (imported broken upstream — predates Phase 1) 2e3f524 CR-01 "refuse-to-post on swapped port" without retry; 034155b sweep #5 added diagnostic on timeout but still no retry

Strategic implications surfaced by bisect:

  1. H2 silent-skip is an UPSTREAM defect in the original "half-broken blob of code" — never on the Phase 1 fix radar because:

    • Original 22-defect audit didn't flag it (it's a "what happens if a downstream returns 0?" question that needs a real failure mode to surface)
    • REVIEW.md's 18 findings didn't catch it
    • Sweep #5 (commit 034155b "surface port-replaced-during-fetch diagnostic on buffer timeout") added a log on timeout but kept the silent-skip behaviour — the diagnostic logs went to the SW console, but the operator-facing failure path stayed silent.
  2. H1 port race latency-amplifies H2 from latent to fatal. Before Plan 01-04 there was no port → no reconnect → silent-skip never fired in practice. After 01-04 the 290 s pre-emptive reconnect creates a guaranteed-recurring window where REQUEST_BUFFER round-trips can fail; after CR-01 the "refuse to post on swapped port" pattern makes the silent-skip fire with near-100 % probability whenever a save is clicked across a reconnect window.

  3. Option A (target both symptoms with retry + try/catch) is sufficient to make the visible symptoms go away, but leaves the upstream silent-skip pattern intact — a class of latent failures remains. Option B (which converts silent-skip into operator- visible error) directly retires the upstream class. Option C (request-id'd round-trip protocol) makes port-replacement irrelevant by architecture but is the largest blast radius.

The bisect strengthens the case for B (it's not just symptom-control; it retires the entire shape of bug, including the upstream one). However, A is still defensible if treated as a pure hotfix to ship Phase 1 and the silent-skip fix is consciously deferred to a follow-up plan. C is justifiable if the user wants Phase 1's port lifecycle to match the durability of the D-12/D-13 contracts before Phase 2 layers DOM capture on top.

(Updated by orchestrator after bisect; user decides next.)

Fix Strategy Chosen: Option C (Architectural)

User decided (post-bisect): Option C — Architectural port lifecycle refactor.

Rationale: bisect shows the current port lifecycle's race window is the proximate cause of every H1 error AND the trigger for the upstream H2 silent-skip class. Option C makes port-replacement architecturally irrelevant by:

  1. Replace pre-emptive 290 s setTimeout reconnect with port-health probe (ping/pong protocol; offscreen explicitly waits for SW echo before considering port live; on any failure → clean teardown + reconnect).
  2. Switch REQUEST_BUFFER / BUFFER to request-id'd pattern: SW sends {type:'REQUEST_BUFFER', requestId: uuid}; offscreen responds with {type:'BUFFER', requestId, segments}; SW only accepts responses matching its current pending requestId. SW retries on port replacement until success or hard outer timeout.
  3. Continuous end-to-end vitest covering 600 s of port lifecycle (2 reconnects + simulated REQUEST_BUFFER round-trips). Becomes the new pinning contract for the port lifecycle.
  4. Keep the existing operator-visible safety from Option B: even in the new design, if a hard outer timeout fires, createArchive must throw → popup surfaces RECORDING_ERROR to operator instead of silently shipping a video-less archive.

Touches D-17 port-lifecycle contract (CONTEXT.md will be amended). Expected surface: ~150-200 LOC + 2-3 new test files. Pre-existing 40 GREEN tests + 3 new RED tests must all flip GREEN at completion.

Constraints (re-iterated for the executor):

  • TDD: each new behaviour gets a RED test BEFORE GREEN implementation.
  • D-12 (base64 wire format) and D-13 (restart-segments) contracts must not regress. Existing pinning tests at tests/offscreen/port-serialization.test.ts and tests/offscreen/segment-rotation.test.ts are the canaries.
  • npx tsc --noEmit exit 0, npm run build exit 0, type-safety grep clean must all hold at every commit.
  • tests/fixtures/last_30sec.webm is currently dirty in the working tree (operator's recent smoke session staged a new file at that path). Do NOT commit changes to that file — the committed D-13 fixture is the canonical contract.

(Strategy decided 2026-05-16T12:18:00Z; executor dispatch follows.)