fix(01-review): CR-01+CR-02+CR-03+WR-03+WR-09 critical port + handshake race fixes

What was wrong:
- CR-01 (recorder.ts): encodeAndSendBuffer captured no port identity before
  awaiting Promise.all(blobToBase64). If the port disconnected mid-encode
  and onDisconnect synchronously reconnected (re-assigning keepalivePort
  to a fresh instance), the post-await null-check evaluated false and
  the BUFFER was posted on the NEW port — but the SW's per-request
  onMessage listener was still bound to the OLD port (captured at
  getVideoBufferFromOffscreen line 110). Result: SW timed out after
  2 s, SAVE_ARCHIVE produced an empty-segments zip, data-loss path
  masquerading as a benign timeout.
- CR-02 (background/index.ts): SW's onConnect handler attached
  ONLY onDisconnect — no permanent onMessage sink. PING traffic
  had no listener when getVideoBufferFromOffscreen wasn't running
  (the normal idle state of the port), and field reports note Chrome's
  SW idle-timer reset behaves inconsistently when no listener is
  attached. Risk: PINGs silently dropped, SW evicted ~30 s into
  recording, port torn down, next SAVE_ARCHIVE fails entirely.
- CR-03 (background/index.ts): offscreenReady is a one-shot Promise
  resolved on the FIRST OFFSCREEN_READY message. If the SW is evicted
  while the offscreen document persists, the next SW lifetime creates
  a fresh Promise and waits on it forever — the offscreen never
  re-emits OFFSCREEN_READY. startVideoCapture() hangs at
  `await offscreenReady` until Chrome restarts.
- WR-03 (recorder.ts): `baseTimestamp + idx` (Date.now() + idx) used
  millisecond resolution + array offset. Two REQUEST_BUFFER calls
  within the same millisecond would collide, breaking the sort-by-
  timestamp contract in SW-side mergeVideoSegments.
- WR-09 (recorder.ts): encodeAndSendBuffer always appended the
  unfinalized in-flight segment to the BUFFER. That segment lacks
  the Matroska SegmentSize and Cues that MediaRecorder.stop()
  writes — re-introducing the "File ended prematurely" symptom
  documented in debug session webm-playback-freeze.

What changed:
- recorder.ts encodeAndSendBuffer:
  - Capture `portAtRequest = keepalivePort` BEFORE the encode.
  - After the await, refuse to post if `keepalivePort !== portAtRequest`
    (port was replaced by reconnect). SW already times out cleanly
    after BUFFER_FETCH_TIMEOUT_MS = 2 s; the next SAVE_ARCHIVE
    re-issues REQUEST_BUFFER on the fresh port. Stale data
    NEVER reaches a stranger port.
  - Include the in-flight segment ONLY when finalized.length === 0
    (preserve the SAVE-within-first-10-s UX trade-off documented at
    the original comment) — otherwise drop the unfinalized tail.
  - Replace `baseTimestamp + idx` with module-level monotonic
    `++segmentSeq` counter (zero wall-clock dependency).
  - Switch from Promise.all/map+filter to a sequential for-loop
    because each iteration now mutates the shared `segmentSeq`;
    Promise.all timing would interleave assignments. Throughput
    impact negligible (3 segments × ~50 ms base64 each ≈ 150 ms
    vs ~50 ms parallel — still well under the 2 s SW budget).

- background/index.ts onConnect:
  - Install a permanent `port.onMessage.addListener` that
    explicitly drains PING and silently drops unknown traffic.
    Per-request BUFFER listener still wins because it's attached
    LATER in the listener chain when getVideoBufferFromOffscreen
    fires; this sink only catches the idle PING stream and
    guarantees the SW idle-timer reset is consumed by a real
    handler.

- background/index.ts initialize():
  - When `chrome.offscreen.hasDocument()` returns true on SW init,
    immediately resolve `offscreenReady` AND null
    `offscreenReadyResolve`. The offscreen MUST have completed its
    bootstrap before it was observable via hasDocument(); waiting
    for an OFFSCREEN_READY that will never come is a deadlock.

Why these fixes vs alternatives:
- CR-01: alternatives considered: (a) cancel encoding when port
  disconnects (requires AbortController plumbing into blobToBase64);
  (b) re-route the BUFFER through the new port via a per-port
  request-id correlation. Both add machinery for a case the SW
  already handles correctly (2 s timeout → retry). The capture-
  identity check is the minimum-mechanism fix and matches REVIEW.md
  CR-01 fix guidance exactly.
- CR-02: alternative considered: documenting "rely on kernel-level
  port-message side effect for idle-timer reset" — REJECTED, this
  is what the existing comment did and the field evidence shows
  it's unreliable. Explicit listener is the safe default.
- CR-03: alternative considered: (a) have offscreen re-emit
  OFFSCREEN_READY on every inbound SW message — adds noise to
  the message bus and races with the original-bootstrap-emit.
  Option (b) (resolve-on-hasDocument-true) is simpler, narrower,
  and was explicitly recommended by REVIEW.md.
- WR-03: alternative considered: keeping Date.now() and adding a
  microsecond-resolution offset via performance.now() — fragile
  across SW respawns where performance.now() resets. Module-level
  monotonic counter has zero wall-clock dependency.
- WR-09: alternative considered: forcing a synchronous rotation
  at REQUEST_BUFFER time. Rejected — adds ~50–200 ms latency to
  every save AND races with scheduleRotation()'s timer. The
  "exclude unless empty" trade-off matches REVIEW.md option (a)
  exactly and preserves the documented first-10-s UX path.

Validation evidence:
- npx tsc --noEmit: exit 0 (no type errors).
- npx vitest run --reporter=dot: 30/30 tests pass in 2.67 s
  (8 test files, including port.test.ts which pins the reconnect
  invariant and port-serialization.test.ts which pins the wire format).
- grep "as any\|@ts-ignore" src/offscreen/ src/background/index.ts
  src/shared/: no matches (type-safety gate stays clean).
This commit is contained in:
2026-05-16 09:21:34 +02:00
parent bf00bf65aa
commit 2e3f5248ce
2 changed files with 103 additions and 46 deletions

View File

@@ -87,12 +87,34 @@ chrome.runtime.onConnect.addListener((port) => {
}
logger.log('Offscreen port connected');
videoPort = port;
// CR-02 fix: install a permanent onMessage sink on every accepted port.
// Chrome 110+ resets the SW idle-timer on any inbound port message, BUT
// in the field, behaviour has been observed to differ subtly when no
// listener is attached at all — Chrome may skip the idle-timer reset
// path entirely on unrouted messages. A no-op listener guarantees the
// PING traffic is consumed and the timer reset is unconditional. The
// per-request listener installed by `getVideoBufferFromOffscreen` still
// handles BUFFER routing; this sink only drains PING and any unknown
// traffic so it doesn't accumulate or surprise us later.
port.onMessage.addListener((msg) => {
if (
typeof msg === 'object' &&
msg !== null &&
(msg as { type?: unknown }).type === 'PING'
) {
// Explicit drain — silences "no listener" semantics in Chrome's
// port-message dispatch and keeps the SW idle-timer reset reliable.
return;
}
// Unknown traffic — drop silently (T-1-04 defense-in-depth).
// BUFFER is routed by the per-request listener in
// getVideoBufferFromOffscreen; that listener fires first when
// attached, so this branch never observes BUFFER in practice.
});
port.onDisconnect.addListener(() => {
logger.log('Offscreen port disconnected; offscreen will reconnect');
videoPort = null;
});
// Inbound traffic is mostly PING (ignored) and BUFFER (handled by the
// per-request listener installed in getVideoBufferFromOffscreen).
});
// 2 s budget covers the worst-case round-trip: offscreen base64-encodes
@@ -517,6 +539,19 @@ async function initialize() {
if (exists) {
offscreenCreated = true;
logger.log('Existing offscreen document detected on SW init');
// CR-03 fix: handshake deadlock after SW respawn.
// OFFSCREEN_READY is fired by the offscreen exactly once at its
// bootstrap (recorder.ts line ~452). If the SW is evicted (~30 s
// idle) while the offscreen document persists, the next SW lifetime
// creates a fresh `offscreenReady` Promise (line 33) and waits on
// it forever — the offscreen has no signal to re-emit, and
// startVideoCapture() hangs at `await offscreenReady`.
// Resolve readiness immediately when we detect a pre-existing
// offscreen: it MUST have completed its bootstrap before being
// observable via hasDocument().
offscreenReadyResolve?.();
offscreenReadyResolve = null;
logger.log('Resolving offscreenReady immediately — offscreen survived a prior SW lifetime');
}
}
} catch (err) {