Implements the offscreen-side architectural refactor per
.planning/debug/empty-archive-port-race.md "Fix Strategy: Option C":
1. **Retired** the 290_000 ms pre-emptive reconnect setTimeout. Its race
window between the synchronous .disconnect() and the onDisconnect
handler running was the bisect-confirmed proximate cause of the H1
"Attempting to use a disconnected port object" Uncaught Errors.
2. **Added** PONG-based health probe: each ping increments missedPongs;
if MAX_MISSED_PONGS (3) consecutive PINGs go without echo, reconnect
via the same clean teardown path the onDisconnect handler uses.
PONG receipt resets the counter. Liveness-based replacement for the
time-based pre-emptive rotation.
3. **H1 fix** — wrap PING postMessage in try/catch. The port object can
transition to disconnected synchronously (SW eviction, port glitch)
between the interval-callback being queued and it running. The catch
absorbs the throw and routes through reconnectPort() — no more
uncaught throws bubble out to the offscreen console.
4. **Request-id'd protocol** — REQUEST_BUFFER carries the SW-generated
requestId; BUFFER response echoes it. The offscreen now posts on the
CURRENT keepalivePort (no more portAtRequest stale-port refuse-to-
post). The SW matches BUFFER → request by id, so port replacement
mid-encode no longer drops the response — the SW retries on the new
port and the matching BUFFER routes correctly.
5. **reconnectPort(reason)** — new helper consolidating the
teardown+disconnect+reconnect dance used by both the missed-PONG
path and the synchronous-throw path. Idempotent w.r.t. the chained
onDisconnect callback.
Test updates:
- H2 now sends REQUEST_BUFFER with a requestId (Option C contract).
- H1.b refactored to test the externally-disconnected path (since the
pre-emptive timeout path is gone): port._disconnected=true, fire
ping, assert no throw + a fresh port appears.
- Top-level snapshots of timer globals + afterEach restoration so a
failing test doesn't leak overridden globals into the next test.
Status: 48 GREEN, 4 RED (the remaining RED is all SW-side — addressed
in next commit). All H1 + H1.b + H2 contracts now GREEN. Pinning
contracts (D-12 port-serialization, D-13 segment-rotation, A3 webm-
playback) untouched. tsc --noEmit exit 0; type-safety grep clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What was wrong:
- CR-01 (recorder.ts): encodeAndSendBuffer captured no port identity before
awaiting Promise.all(blobToBase64). If the port disconnected mid-encode
and onDisconnect synchronously reconnected (re-assigning keepalivePort
to a fresh instance), the post-await null-check evaluated false and
the BUFFER was posted on the NEW port — but the SW's per-request
onMessage listener was still bound to the OLD port (captured at
getVideoBufferFromOffscreen line 110). Result: SW timed out after
2 s, SAVE_ARCHIVE produced an empty-segments zip, data-loss path
masquerading as a benign timeout.
- CR-02 (background/index.ts): SW's onConnect handler attached
ONLY onDisconnect — no permanent onMessage sink. PING traffic
had no listener when getVideoBufferFromOffscreen wasn't running
(the normal idle state of the port), and field reports note Chrome's
SW idle-timer reset behaves inconsistently when no listener is
attached. Risk: PINGs silently dropped, SW evicted ~30 s into
recording, port torn down, next SAVE_ARCHIVE fails entirely.
- CR-03 (background/index.ts): offscreenReady is a one-shot Promise
resolved on the FIRST OFFSCREEN_READY message. If the SW is evicted
while the offscreen document persists, the next SW lifetime creates
a fresh Promise and waits on it forever — the offscreen never
re-emits OFFSCREEN_READY. startVideoCapture() hangs at
`await offscreenReady` until Chrome restarts.
- WR-03 (recorder.ts): `baseTimestamp + idx` (Date.now() + idx) used
millisecond resolution + array offset. Two REQUEST_BUFFER calls
within the same millisecond would collide, breaking the sort-by-
timestamp contract in SW-side mergeVideoSegments.
- WR-09 (recorder.ts): encodeAndSendBuffer always appended the
unfinalized in-flight segment to the BUFFER. That segment lacks
the Matroska SegmentSize and Cues that MediaRecorder.stop()
writes — re-introducing the "File ended prematurely" symptom
documented in debug session webm-playback-freeze.
What changed:
- recorder.ts encodeAndSendBuffer:
- Capture `portAtRequest = keepalivePort` BEFORE the encode.
- After the await, refuse to post if `keepalivePort !== portAtRequest`
(port was replaced by reconnect). SW already times out cleanly
after BUFFER_FETCH_TIMEOUT_MS = 2 s; the next SAVE_ARCHIVE
re-issues REQUEST_BUFFER on the fresh port. Stale data
NEVER reaches a stranger port.
- Include the in-flight segment ONLY when finalized.length === 0
(preserve the SAVE-within-first-10-s UX trade-off documented at
the original comment) — otherwise drop the unfinalized tail.
- Replace `baseTimestamp + idx` with module-level monotonic
`++segmentSeq` counter (zero wall-clock dependency).
- Switch from Promise.all/map+filter to a sequential for-loop
because each iteration now mutates the shared `segmentSeq`;
Promise.all timing would interleave assignments. Throughput
impact negligible (3 segments × ~50 ms base64 each ≈ 150 ms
vs ~50 ms parallel — still well under the 2 s SW budget).
- background/index.ts onConnect:
- Install a permanent `port.onMessage.addListener` that
explicitly drains PING and silently drops unknown traffic.
Per-request BUFFER listener still wins because it's attached
LATER in the listener chain when getVideoBufferFromOffscreen
fires; this sink only catches the idle PING stream and
guarantees the SW idle-timer reset is consumed by a real
handler.
- background/index.ts initialize():
- When `chrome.offscreen.hasDocument()` returns true on SW init,
immediately resolve `offscreenReady` AND null
`offscreenReadyResolve`. The offscreen MUST have completed its
bootstrap before it was observable via hasDocument(); waiting
for an OFFSCREEN_READY that will never come is a deadlock.
Why these fixes vs alternatives:
- CR-01: alternatives considered: (a) cancel encoding when port
disconnects (requires AbortController plumbing into blobToBase64);
(b) re-route the BUFFER through the new port via a per-port
request-id correlation. Both add machinery for a case the SW
already handles correctly (2 s timeout → retry). The capture-
identity check is the minimum-mechanism fix and matches REVIEW.md
CR-01 fix guidance exactly.
- CR-02: alternative considered: documenting "rely on kernel-level
port-message side effect for idle-timer reset" — REJECTED, this
is what the existing comment did and the field evidence shows
it's unreliable. Explicit listener is the safe default.
- CR-03: alternative considered: (a) have offscreen re-emit
OFFSCREEN_READY on every inbound SW message — adds noise to
the message bus and races with the original-bootstrap-emit.
Option (b) (resolve-on-hasDocument-true) is simpler, narrower,
and was explicitly recommended by REVIEW.md.
- WR-03: alternative considered: keeping Date.now() and adding a
microsecond-resolution offset via performance.now() — fragile
across SW respawns where performance.now() resets. Module-level
monotonic counter has zero wall-clock dependency.
- WR-09: alternative considered: forcing a synchronous rotation
at REQUEST_BUFFER time. Rejected — adds ~50–200 ms latency to
every save AND races with scheduleRotation()'s timer. The
"exclude unless empty" trade-off matches REVIEW.md option (a)
exactly and preserves the documented first-10-s UX path.
Validation evidence:
- npx tsc --noEmit: exit 0 (no type errors).
- npx vitest run --reporter=dot: 30/30 tests pass in 2.67 s
(8 test files, including port.test.ts which pins the reconnect
invariant and port-serialization.test.ts which pins the wire format).
- grep "as any\|@ts-ignore" src/offscreen/ src/background/index.ts
src/shared/: no matches (type-safety gate stays clean).
Pure rename pass — zero behavioural change at any call site. The
prior "chunk" naming is a vestige of D-09..D-11's chunk-level
buffer; under D-13 the unit of transfer is a self-contained ~10 s
WebM segment, so the type name now matches the data shape.
Renames propagated atomically:
- src/shared/types.ts
* TransferredVideoChunk → TransferredVideoSegment
(also: the `isFirst?: boolean` field is dropped — D-13 segments
are all implicitly their own header, so the flag is meaningless
and only existed for the retired ring-buffer's pin semantics)
* VideoChunk → VideoSegment (drops `isFirst?` for the same reason)
* PortMessage.chunks? → PortMessage.segments?
* VideoBufferResponse.chunks → VideoBufferResponse.segments
- src/offscreen/recorder.ts
* import type rename
* encodeAndSendBuffer's per-element accumulator + filter type
* the port.postMessage payload field
- src/background/index.ts
* import type rename
* getVideoBufferFromOffscreen reads `(msg as {segments?}).segments`
(matching the new wire field name)
* empty-buffer returns `{ segments: [] }`
* mergeVideoSegments signature takes VideoSegment[]
* createArchive consumes videoBufferResponse.segments
* saveArchive log line says "segments"
Verification:
- npx tsc --noEmit clean.
- npx vitest run → 28 passed / 2 failed (the 2 fixture-empirical
ffmpeg dry-runs; unchanged — they're gated on ./smoke.sh regen).
- npm run build succeeds, all 60 modules transformed.
- grep predicates clean in src/:
* no addChunk / trimAged / firstChunkSaved / isFirst
* no TransferredVideoChunk / VideoChunk (old names)
* 21 occurrences of new names propagated correctly
- Pre-existing port-serialization.test.ts still GREEN: its `isFirst`
references are inside inline-test-scoped objects (not imports
from production types), and its tests assert JSON-roundtrip
behaviour rather than the production type shape.
Replaces the single-continuous-MediaRecorder + age-trim lifecycle
(D-09..D-11, retired) with the restart-segments lifecycle pre-staged
in CONTEXT.md D-13 and RESEARCH.md Pattern 3. Debug session
webm-playback-freeze proved empirically that the prior approach
orphans VP9 P-frame keyframe references when middle chunks are
trimmed — ffmpeg dry-run on last_30sec.webm produced 8 "Error
submitting packet to decoder" lines and the playback froze ~1 s in
Chrome (root-cause-confirmed in the debug session).
Architecture change:
- MediaRecorder is rotated every SEGMENT_DURATION_MS = 10_000 ms on
the SAME mediaStream (so no new getDisplayMedia user gesture).
- Each segment is started fresh, so the encoder seeds an EBML header
+ initial VP9 keyframe — segment is independently decodable.
- Up to MAX_SEGMENTS = 3 finalized segments are retained (3 × 10 s =
30 s, matching the legacy operator-facing window).
- MediaRecorder.start() runs without a timeslice — exactly one
ondataavailable per .stop() yields exactly one Blob per segment.
- onstop → onSegmentStopped finalizes currentChunks, evicts oldest
if over the cap, and immediately starts the next segment.
- ~50–200 ms recording gap at each rotation boundary is the
documented trade-off per CONTEXT.md D-13.
API surface delta:
- REMOVED: addChunk, trimAged, getBuffer, firstChunkSaved
(entire D-09..D-11 ring-buffer module retired)
- ADDED: getSegments, pushSegmentForTest (test seam),
MAX_SEGMENTS, SEGMENT_DURATION_MS exports
- KEPT: assertCodecSupported, resetBuffer (semantics updated to
clear segments + currentChunks + rotation timer),
VIDEO_BUFFER_DURATION_MS (now derived as
MAX_SEGMENTS * SEGMENT_DURATION_MS = 30 s)
- bootstrap(), port lifecycle, OFFSCREEN_READY handshake, T-1-04
sender-id check — all unchanged.
encodeAndSendBuffer iterates segments + the in-flight currentChunks
(if non-empty) so SAVE_ARCHIVE seconds after START_RECORDING still
returns the partial first segment instead of an empty buffer. Each
segment is base64-encoded per the D-12 wire-format contract.
Verification:
- npx vitest run → 28 passed / 2 failed; the 2 failures are exactly
the empirical ffmpeg dry-runs in tests/offscreen/webm-playback.test.ts
which stay RED until the operator regenerates the committed fixture
via ./smoke.sh (expected and documented).
- tests/offscreen/segment-keyframes.test.ts production-driven RED
block is now GREEN — getSegments exists and meets the cap contract.
- tests/offscreen/segment-rotation.test.ts (8 tests, added in the
prior commit) all GREEN.
- npx tsc --noEmit clean.
- Zero new `as any` / `@ts-ignore` introduced.
- Convert each VideoChunk's Blob to a TransferredVideoChunk via
blobToBase64 before keepalivePort.postMessage. JSON.stringify(blob)
=== "{}" across extension contexts, so the previous direct send of
VideoChunk[] was silently corrupting binary content on the wire.
- Move the work into encodeAndSendBuffer() to keep onPortMessage
synchronous-typed (chrome.runtime.Port.onMessage ignores return
values; the listener stays void-returning, the async work is
fire-and-forget).
- Defensive per-chunk encode: log + skip individual encoding failures
rather than crashing the whole BUFFER response. Operators get partial
video > no video.
- Re-check keepalivePort !== null AFTER the await: the port may have
disconnected during encoding (~100 ms for 15 chunks of ~100 KB each
per the d12 sizing estimate).
Refs: debug session d12-blob-port-transfer-fails, D-17 port lifecycle.
- Update module header to list port keepalive + OFFSCREEN_READY among
the module's owned responsibilities (no longer "wired by Plan 04")
- Replace 'Plan 04 owns the ping loop' on PORT_NAME with the actual
D-17 + Pattern 5 citation
- Replace 'Plan 04 fills the lifecycle' on keepalivePort with its
D-17 + Pattern 5 role
Pure comment cleanup — no behavior change. All 9 tests still GREEN.
- Add PORT_PING_MS (25s) and PORT_RECONNECT_MS (290s) constants
- Replace stub bootstrap with full long-lived port lifecycle:
- connectPort() registers onMessage/onDisconnect listeners, schedules
25s PING postMessages and a 290s pre-emptive reconnect (Pitfall 4
belt-and-braces against Chrome's ~5min port lifetime cap)
- onDisconnect handler synchronously calls connectPort() again
(Plan 02 port.test.ts pins this; flips reconnect test to GREEN)
- REQUEST_BUFFER over the port responds with { type: 'BUFFER',
chunks: getBuffer() } (Plan 05 SW-side will issue REQUEST_BUFFER
on export)
- Keep defensive guard on chrome.runtime sub-APIs so pure ring-buffer
and codec-check tests can import the module without a full chrome stub
- Remove the no-longer-needed 'void keepalivePort' workaround (the
variable is now used by onPortMessage + connectPort)
- T-1-04 mitigation: explicit message-shape switch in onPortMessage
(any unknown port message type silently dropped); comment block
documents the SW-side sender.id check contract for Plan 05
GREEN: all 4 test files in tests/offscreen/ pass (9 tests total —
ring-buffer 4 + codec-check 2 + handshake 1 + port 2).
npx tsc --noEmit exits 0. Zero 'as any' / '@ts-ignore' in recorder.ts.
- Add src/offscreen/recorder.ts (214 lines) — Phase 01 source of truth
owning getDisplayMedia capture, MediaRecorder lifecycle, in-memory ring
buffer with WebM header retention + 30 s age trim, codec strict-mode
(D-20), and track.onended cleanup.
- Add src/offscreen/index.html — crxjs-managed bundle entry referencing
./recorder.ts.
- Add OffscreenLogger class to src/shared/logger.ts (uses ...args:
unknown[] for strict-mode hygiene; legacy Logger / ContentLogger keep
...args: any[] per project provenance). Bundled into this commit
because recorder.ts cannot typecheck without the import (Rule 3 —
blocking dependency).
- Pre-stage D-13 restart-segments fallback as commented skeleton at
bottom of recorder.ts so Plan 07's fallback path needs no re-plan.
- Defensive bootstrap (typeof chrome guard) so the pure ring-buffer +
codec tests can import the module without stubbing the full chrome
surface (Rule 3 — Plan 02 ring-buffer test does not stub chrome).
Flips Plan 02's RED tests to GREEN:
- tests/offscreen/ring-buffer.test.ts — 4 tests passing
- tests/offscreen/codec-check.test.ts — 2 tests passing
Handshake test also passes (single OFFSCREEN_READY emission); port
reconnect test stays RED until Plan 04 wires the reconnect loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>