feat(option-c-offscreen): port health probe + request-id'd BUFFER + H1 try/catch
Implements the offscreen-side architectural refactor per
.planning/debug/empty-archive-port-race.md "Fix Strategy: Option C":
1. **Retired** the 290_000 ms pre-emptive reconnect setTimeout. Its race
window between the synchronous .disconnect() and the onDisconnect
handler running was the bisect-confirmed proximate cause of the H1
"Attempting to use a disconnected port object" Uncaught Errors.
2. **Added** PONG-based health probe: each ping increments missedPongs;
if MAX_MISSED_PONGS (3) consecutive PINGs go without echo, reconnect
via the same clean teardown path the onDisconnect handler uses.
PONG receipt resets the counter. Liveness-based replacement for the
time-based pre-emptive rotation.
3. **H1 fix** — wrap PING postMessage in try/catch. The port object can
transition to disconnected synchronously (SW eviction, port glitch)
between the interval-callback being queued and it running. The catch
absorbs the throw and routes through reconnectPort() — no more
uncaught throws bubble out to the offscreen console.
4. **Request-id'd protocol** — REQUEST_BUFFER carries the SW-generated
requestId; BUFFER response echoes it. The offscreen now posts on the
CURRENT keepalivePort (no more portAtRequest stale-port refuse-to-
post). The SW matches BUFFER → request by id, so port replacement
mid-encode no longer drops the response — the SW retries on the new
port and the matching BUFFER routes correctly.
5. **reconnectPort(reason)** — new helper consolidating the
teardown+disconnect+reconnect dance used by both the missed-PONG
path and the synchronous-throw path. Idempotent w.r.t. the chained
onDisconnect callback.
Test updates:
- H2 now sends REQUEST_BUFFER with a requestId (Option C contract).
- H1.b refactored to test the externally-disconnected path (since the
pre-emptive timeout path is gone): port._disconnected=true, fire
ping, assert no throw + a fresh port appears.
- Top-level snapshots of timer globals + afterEach restoration so a
failing test doesn't leak overridden globals into the next test.
Status: 48 GREEN, 4 RED (the remaining RED is all SW-side — addressed
in next commit). All H1 + H1.b + H2 contracts now GREEN. Pinning
contracts (D-12 port-serialization, D-13 segment-rotation, A3 webm-
playback) untouched. tsc --noEmit exit 0; type-safety grep clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -36,7 +36,15 @@ const VIDEO_BITRATE = 400_000; // CON-video-codec
|
||||
// контракт ротации.
|
||||
const PORT_NAME = 'video-keepalive'; // long-lived port name (D-17, Pattern 5)
|
||||
const PORT_PING_MS = 25_000; // < 30 s SW idle threshold
|
||||
const PORT_RECONNECT_MS = 290_000; // pre-empt the ~5 min port cap (Pitfall 4)
|
||||
// Option C health-probe: how many consecutive PINGs may go unanswered
|
||||
// (no PONG echo) before we declare the port dead and reconnect. With
|
||||
// PORT_PING_MS = 25_000 ms, 3 missed PONGs = 75 s of unresponsive SW —
|
||||
// well past the ~30 s Chrome SW idle threshold, so a real disconnect
|
||||
// already would have fired its onDisconnect handler by then. This
|
||||
// threshold catches the failure mode the old 290 s pre-emptive setTimeout
|
||||
// pretended to: the SW is up but the port has gone half-open.
|
||||
// See .planning/debug/empty-archive-port-race.md (Fix Strategy: Option C).
|
||||
const MAX_MISSED_PONGS = 3;
|
||||
|
||||
const logger = new OffscreenLogger('Recorder');
|
||||
|
||||
@@ -466,29 +474,56 @@ function isFromOwnExtension(sender: chrome.runtime.MessageSender | undefined): b
|
||||
return sender?.id === chrome.runtime.id;
|
||||
}
|
||||
|
||||
// Stable handles for the ping interval and the pre-emptive reconnect timer,
|
||||
// so we can clear them on disconnect / re-init.
|
||||
// Stable handle for the ping interval. The legacy 290 s pre-emptive
|
||||
// reconnect setTimeout was retired in the Option C refactor — its race
|
||||
// window between the synchronous .disconnect() and the onDisconnect
|
||||
// handler running was the proximate cause of the H1 "Attempting to use a
|
||||
// disconnected port object" Uncaught Errors per the bisect notes in
|
||||
// .planning/debug/empty-archive-port-race.md.
|
||||
let pingIntervalId: ReturnType<typeof setInterval> | null = null;
|
||||
let preemptiveReconnectId: ReturnType<typeof setTimeout> | null = null;
|
||||
|
||||
// Option C health-probe state. The offscreen pings every PORT_PING_MS and
|
||||
// expects a PONG echo from the SW. If MAX_MISSED_PONGS consecutive PINGs
|
||||
// go unanswered, the port is considered dead and a clean reconnect is
|
||||
// initiated via the same teardown path the onDisconnect handler uses.
|
||||
// This replaces the time-based pre-emptive reconnect with a liveness-
|
||||
// based one: the port lives until it stops responding, not until a wall
|
||||
// clock says it should rotate.
|
||||
let missedPongs = 0;
|
||||
|
||||
function teardownPortTimers(): void {
|
||||
if (pingIntervalId !== null) {
|
||||
clearInterval(pingIntervalId);
|
||||
pingIntervalId = null;
|
||||
}
|
||||
if (preemptiveReconnectId !== null) {
|
||||
clearTimeout(preemptiveReconnectId);
|
||||
preemptiveReconnectId = null;
|
||||
}
|
||||
}
|
||||
|
||||
function onPortMessage(message: unknown): void {
|
||||
// Defense-in-depth: explicit shape check before destructuring (T-1-04)
|
||||
// Defense-in-depth: explicit shape check before destructuring (T-1-04).
|
||||
if (typeof message !== 'object' || message === null) {
|
||||
return;
|
||||
}
|
||||
const type = (message as { type?: unknown }).type;
|
||||
if (type === 'PONG') {
|
||||
// Health-probe echo. Resetting missedPongs to 0 closes the probe loop
|
||||
// — the port is demonstrably alive.
|
||||
missedPongs = 0;
|
||||
return;
|
||||
}
|
||||
if (type === 'REQUEST_BUFFER') {
|
||||
// Option C: REQUEST_BUFFER carries a requestId the SW generated per
|
||||
// call. We echo it on the BUFFER response so the SW's per-request
|
||||
// listener can match by id (preventing cross-talk between concurrent
|
||||
// requests and between retries on port replacement).
|
||||
const requestId = (message as { requestId?: unknown }).requestId;
|
||||
if (typeof requestId !== 'string' || requestId.length === 0) {
|
||||
// Defense-in-depth: a REQUEST_BUFFER without a valid id is invalid
|
||||
// under the Option C protocol — drop with a warn. Legacy callers
|
||||
// are absent from the codebase; this is purely a guard against
|
||||
// future regressions or hostile peers.
|
||||
logger.warn('REQUEST_BUFFER without a valid requestId — dropping');
|
||||
return;
|
||||
}
|
||||
// Fire-and-forget: the chrome.runtime.Port.onMessage listener API
|
||||
// ignores the return value. We do the async base64 encoding inside
|
||||
// an IIFE so the listener stays synchronous-typed (void return).
|
||||
@@ -497,7 +532,7 @@ function onPortMessage(message: unknown): void {
|
||||
// because chrome.runtime.Port JSON-serializes across extension contexts,
|
||||
// and JSON.stringify(blob) === "{}". See src/shared/binary.ts and
|
||||
// tests/offscreen/port-serialization.test.ts for the contract.
|
||||
void encodeAndSendBuffer();
|
||||
void encodeAndSendBuffer(requestId);
|
||||
}
|
||||
// Any unknown port message type is silently dropped (T-1-04 defense-in-depth).
|
||||
}
|
||||
@@ -519,32 +554,24 @@ function onPortMessage(message: unknown): void {
|
||||
// post-completion state.
|
||||
let encodeInFlight = false;
|
||||
|
||||
async function encodeAndSendBuffer(): Promise<void> {
|
||||
async function encodeAndSendBuffer(requestId: string): Promise<void> {
|
||||
if (encodeInFlight) {
|
||||
logger.warn('encodeAndSendBuffer already running — dropping concurrent call');
|
||||
return;
|
||||
}
|
||||
// CR-01 fix: capture the port identity BEFORE the await. If `keepalivePort`
|
||||
// is replaced by a fresh reconnect during base64 encoding, posting on the
|
||||
// new port would silently leak the BUFFER to a stranger — the SW's
|
||||
// per-request listener is still bound to the OLD port. The SW already
|
||||
// times out cleanly (BUFFER_FETCH_TIMEOUT_MS = 2 s), so dropping a stale
|
||||
// response on the floor is the correct behaviour: the next SAVE_ARCHIVE
|
||||
// round-trip will REQUEST_BUFFER on the fresh port.
|
||||
const portAtRequest = keepalivePort;
|
||||
if (portAtRequest === null) {
|
||||
if (keepalivePort === null) {
|
||||
logger.warn('encodeAndSendBuffer called without an active port — drop');
|
||||
return;
|
||||
}
|
||||
encodeInFlight = true;
|
||||
try {
|
||||
await doEncodeAndSendBuffer(portAtRequest);
|
||||
await doEncodeAndSendBuffer(requestId);
|
||||
} finally {
|
||||
encodeInFlight = false;
|
||||
}
|
||||
}
|
||||
|
||||
async function doEncodeAndSendBuffer(portAtRequest: chrome.runtime.Port): Promise<void> {
|
||||
async function doEncodeAndSendBuffer(requestId: string): Promise<void> {
|
||||
// WR-09 fix: an in-flight segment lacks the Matroska finalization that
|
||||
// MediaRecorder.stop() performs (SegmentSize, Cues) — splicing it onto
|
||||
// a finalized tail re-introduces the "File ended prematurely" symptom
|
||||
@@ -589,18 +616,62 @@ async function doEncodeAndSendBuffer(portAtRequest: chrome.runtime.Port): Promis
|
||||
);
|
||||
}
|
||||
}
|
||||
// After the await: refuse to post on a port that has been REPLACED
|
||||
// by reconnect. The SW listens on the OLD port; posting on the NEW
|
||||
// port would silently lose the data. Letting the SW time out is
|
||||
// correct — the next SAVE_ARCHIVE will re-issue REQUEST_BUFFER on
|
||||
// the fresh port.
|
||||
if (keepalivePort !== portAtRequest) {
|
||||
// After the await: post the BUFFER on the CURRENT keepalivePort. Under
|
||||
// the Option C protocol the BUFFER carries the requestId the SW issued,
|
||||
// so even if the port was replaced mid-encode (and the SW's matching
|
||||
// listener migrated to the new port via the SW-side retry — see
|
||||
// src/background/index.ts:getVideoBufferFromOffscreen), the BUFFER
|
||||
// routes correctly by id. This retires the H2 silent-drop class that
|
||||
// the bisect (.planning/debug/empty-archive-port-race.md) traced to
|
||||
// the CR-01 "refuse-to-post on swapped port" pattern.
|
||||
//
|
||||
// Edge case: if the encode took so long that keepalivePort is now null
|
||||
// (e.g. mid-reconnect with no fresh port yet) we drop with a warn.
|
||||
// The SW's outer timeout will eventually fire and surface the error to
|
||||
// the operator (no more silent video-less archive).
|
||||
if (keepalivePort === null) {
|
||||
logger.warn(
|
||||
'port replaced during encode; dropping BUFFER response (SW will time out and retry)',
|
||||
'port unavailable at post time; dropping BUFFER (SW outer timeout will surface)',
|
||||
);
|
||||
return;
|
||||
}
|
||||
portAtRequest.postMessage({ type: 'BUFFER', segments: transferred });
|
||||
try {
|
||||
keepalivePort.postMessage({
|
||||
type: 'BUFFER',
|
||||
requestId,
|
||||
segments: transferred,
|
||||
});
|
||||
} catch (err) {
|
||||
// The current port may have been disconnected between the null-check
|
||||
// above and postMessage. Treat as a dropped response — the SW retries
|
||||
// on the fresh port.
|
||||
logger.warn('BUFFER post failed (port disconnected mid-post):', err);
|
||||
}
|
||||
}
|
||||
|
||||
// Triggers a clean reconnect: tear down timers, disconnect the current
|
||||
// port (if any), null the reference, and re-enter connectPort(). The
|
||||
// disconnect call fires the onDisconnect handler on the local side too
|
||||
// (Chrome's documented behaviour for explicit .disconnect()), and that
|
||||
// handler also calls connectPort() — but the missedPongs reset + the
|
||||
// teardownPortTimers idempotency keep the second pass a no-op.
|
||||
function reconnectPort(reason: string): void {
|
||||
logger.warn('Initiating port reconnect:', reason);
|
||||
teardownPortTimers();
|
||||
const stale = keepalivePort;
|
||||
keepalivePort = null;
|
||||
missedPongs = 0;
|
||||
if (stale !== null) {
|
||||
try {
|
||||
stale.disconnect();
|
||||
} catch (err) {
|
||||
// Disconnecting an already-disconnected port throws on some Chrome
|
||||
// versions; silently ignored — the local-side reference has already
|
||||
// been cleared so no further use is possible.
|
||||
logger.warn('stale.disconnect() during reconnect failed:', err);
|
||||
}
|
||||
}
|
||||
connectPort();
|
||||
}
|
||||
|
||||
function connectPort(): void {
|
||||
@@ -612,22 +683,47 @@ function connectPort(): void {
|
||||
keepalivePort = null;
|
||||
return;
|
||||
}
|
||||
missedPongs = 0;
|
||||
keepalivePort.onMessage.addListener(onPortMessage);
|
||||
keepalivePort.onDisconnect.addListener(() => {
|
||||
logger.warn('port disconnected — reconnecting');
|
||||
teardownPortTimers();
|
||||
keepalivePort = null;
|
||||
// Synchronous reconnect — tests/offscreen/port.test.ts pins this
|
||||
missedPongs = 0;
|
||||
// Synchronous reconnect — tests/offscreen/port.test.ts pins this.
|
||||
connectPort();
|
||||
});
|
||||
pingIntervalId = setInterval(() => {
|
||||
keepalivePort?.postMessage({ type: 'PING' });
|
||||
// H1 fix: wrap postMessage in try/catch. The port object can become
|
||||
// disconnected between the interval-callback being queued and it
|
||||
// running (race window between local .disconnect() and onDisconnect
|
||||
// firing). Chrome's documented runtime semantics make postMessage
|
||||
// throw synchronously with "Attempting to use a disconnected port
|
||||
// object". Without the catch, that became an Uncaught Error in the
|
||||
// offscreen console (3x at 290 s in UAT Test 3).
|
||||
//
|
||||
// Health probe: each ping costs us one missed-PONG slot. The SW
|
||||
// echoes PONG on PING (src/background/index.ts), and onPortMessage
|
||||
// resets missedPongs to 0 on receipt. If MAX_MISSED_PONGS pings go
|
||||
// without a PONG, the SW (or the port) is unresponsive and we
|
||||
// reconnect cleanly.
|
||||
if (keepalivePort === null) {
|
||||
return;
|
||||
}
|
||||
missedPongs++;
|
||||
if (missedPongs > MAX_MISSED_PONGS) {
|
||||
reconnectPort(`missed ${missedPongs} PONGs — port unresponsive`);
|
||||
return;
|
||||
}
|
||||
try {
|
||||
keepalivePort.postMessage({ type: 'PING' });
|
||||
} catch (err) {
|
||||
// The port went disconnected synchronously — initiate the same
|
||||
// clean reconnect path. The throw caught here is exactly the H1
|
||||
// "Attempting to use a disconnected port object" surface.
|
||||
reconnectPort(`PING postMessage threw: ${String(err)}`);
|
||||
}
|
||||
}, PORT_PING_MS);
|
||||
preemptiveReconnectId = setTimeout(() => {
|
||||
logger.log('pre-emptive port reconnect (290 s cap)');
|
||||
keepalivePort?.disconnect();
|
||||
// onDisconnect handler above triggers a fresh connectPort() call
|
||||
}, PORT_RECONNECT_MS);
|
||||
}
|
||||
|
||||
// Бутстрап (handshake + port lifecycle).
|
||||
|
||||
Reference in New Issue
Block a user