feat(option-c-offscreen): port health probe + request-id'd BUFFER + H1 try/catch
Implements the offscreen-side architectural refactor per
.planning/debug/empty-archive-port-race.md "Fix Strategy: Option C":
1. **Retired** the 290_000 ms pre-emptive reconnect setTimeout. Its race
window between the synchronous .disconnect() and the onDisconnect
handler running was the bisect-confirmed proximate cause of the H1
"Attempting to use a disconnected port object" Uncaught Errors.
2. **Added** PONG-based health probe: each ping increments missedPongs;
if MAX_MISSED_PONGS (3) consecutive PINGs go without echo, reconnect
via the same clean teardown path the onDisconnect handler uses.
PONG receipt resets the counter. Liveness-based replacement for the
time-based pre-emptive rotation.
3. **H1 fix** — wrap PING postMessage in try/catch. The port object can
transition to disconnected synchronously (SW eviction, port glitch)
between the interval-callback being queued and it running. The catch
absorbs the throw and routes through reconnectPort() — no more
uncaught throws bubble out to the offscreen console.
4. **Request-id'd protocol** — REQUEST_BUFFER carries the SW-generated
requestId; BUFFER response echoes it. The offscreen now posts on the
CURRENT keepalivePort (no more portAtRequest stale-port refuse-to-
post). The SW matches BUFFER → request by id, so port replacement
mid-encode no longer drops the response — the SW retries on the new
port and the matching BUFFER routes correctly.
5. **reconnectPort(reason)** — new helper consolidating the
teardown+disconnect+reconnect dance used by both the missed-PONG
path and the synchronous-throw path. Idempotent w.r.t. the chained
onDisconnect callback.
Test updates:
- H2 now sends REQUEST_BUFFER with a requestId (Option C contract).
- H1.b refactored to test the externally-disconnected path (since the
pre-emptive timeout path is gone): port._disconnected=true, fire
ping, assert no throw + a fresh port appears.
- Top-level snapshots of timer globals + afterEach restoration so a
failing test doesn't leak overridden globals into the next test.
Status: 48 GREEN, 4 RED (the remaining RED is all SW-side — addressed
in next commit). All H1 + H1.b + H2 contracts now GREEN. Pinning
contracts (D-12 port-serialization, D-13 segment-rotation, A3 webm-
playback) untouched. tsc --noEmit exit 0; type-safety grep clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -25,7 +25,7 @@
|
||||
// tests/offscreen/port.test.ts + handshake.test.ts). No source-code
|
||||
// re-implementation — the production module is what we're testing.
|
||||
|
||||
import { describe, it, expect, vi, beforeEach } from 'vitest';
|
||||
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
|
||||
|
||||
interface PortStub {
|
||||
name: string;
|
||||
@@ -122,21 +122,27 @@ function buildChromeStub(ports: PortStub[]): ChromeStub {
|
||||
};
|
||||
}
|
||||
|
||||
describe('port reconnect race (RED — confirms empty-archive-port-race H1+H2)', () => {
|
||||
let originalSetInterval: typeof globalThis.setInterval;
|
||||
let originalClearInterval: typeof globalThis.clearInterval;
|
||||
let originalSetTimeout: typeof globalThis.setTimeout;
|
||||
let originalClearTimeout: typeof globalThis.clearTimeout;
|
||||
// Top-level snapshots of the real timer APIs — captured at module load,
|
||||
// BEFORE any test runs, so they survive a test that fails before its
|
||||
// inline restore code. The afterEach below restores unconditionally.
|
||||
const ORIGINAL_SET_INTERVAL = globalThis.setInterval;
|
||||
const ORIGINAL_CLEAR_INTERVAL = globalThis.clearInterval;
|
||||
const ORIGINAL_SET_TIMEOUT = globalThis.setTimeout;
|
||||
const ORIGINAL_CLEAR_TIMEOUT = globalThis.clearTimeout;
|
||||
|
||||
describe('port reconnect race (RED — confirms empty-archive-port-race H1+H2)', () => {
|
||||
beforeEach(() => {
|
||||
vi.resetModules();
|
||||
(globalThis as unknown as GlobalWithChrome).MediaRecorder = {
|
||||
isTypeSupported: vi.fn().mockReturnValue(true),
|
||||
};
|
||||
originalSetInterval = globalThis.setInterval;
|
||||
originalClearInterval = globalThis.clearInterval;
|
||||
originalSetTimeout = globalThis.setTimeout;
|
||||
originalClearTimeout = globalThis.clearTimeout;
|
||||
});
|
||||
|
||||
afterEach(() => {
|
||||
globalThis.setInterval = ORIGINAL_SET_INTERVAL;
|
||||
globalThis.clearInterval = ORIGINAL_CLEAR_INTERVAL;
|
||||
globalThis.setTimeout = ORIGINAL_SET_TIMEOUT;
|
||||
globalThis.clearTimeout = ORIGINAL_CLEAR_TIMEOUT;
|
||||
});
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
@@ -199,9 +205,6 @@ describe('port reconnect race (RED — confirms empty-archive-port-race H1+H2)',
|
||||
// operator observed in UAT Test 3.
|
||||
expect(() => pingCb()).not.toThrow();
|
||||
|
||||
// Restore globals.
|
||||
globalThis.setInterval = originalSetInterval;
|
||||
globalThis.clearInterval = originalClearInterval;
|
||||
});
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
@@ -272,7 +275,16 @@ describe('port reconnect race (RED — confirms empty-archive-port-race H1+H2)',
|
||||
// 2) SW issues REQUEST_BUFFER — synthetically dispatch to the
|
||||
// captured onPortMessage listener. The production code kicks off
|
||||
// encodeAndSendBuffer which awaits 3 blobToBase64 calls.
|
||||
oldPort.onMessage._listeners[0]({ type: 'REQUEST_BUFFER' });
|
||||
//
|
||||
// Under the Option C protocol the request carries a requestId the
|
||||
// SW generates per call; the offscreen echoes it on the BUFFER
|
||||
// response. This test uses a literal id since it's not driving
|
||||
// the SW — only the recorder's onPortMessage path.
|
||||
const requestId = 'h2-test-id';
|
||||
oldPort.onMessage._listeners[0]({
|
||||
type: 'REQUEST_BUFFER',
|
||||
requestId,
|
||||
});
|
||||
|
||||
// 3) Mid-encode, the pre-emptive reconnect fires. Simulate by
|
||||
// firing onDisconnect on the old port — production code's
|
||||
@@ -346,82 +358,49 @@ describe('port reconnect race (RED — confirms empty-archive-port-race H1+H2)',
|
||||
});
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// H1.b — pre-emptive reconnect path specifically. Walk the same
|
||||
// sequence the operator hit at 290 s: the production code calls
|
||||
// `keepalivePort?.disconnect()` and the test then invokes the ping
|
||||
// callback that was installed BEFORE the reconnect. This proves the
|
||||
// race window in the production timer-clear ordering.
|
||||
// H1.b — externally-disconnected port path. The original 290 s
|
||||
// pre-emptive setTimeout is gone (Option C retired it — see the
|
||||
// "Bisect Results" section of empty-archive-port-race.md). What
|
||||
// remains is the same H1 race shape, just triggered by a different
|
||||
// path: the SW side (or Chrome's idle eviction, or a network glitch)
|
||||
// can mark the remote end of the port as disconnected; the local
|
||||
// postMessage then throws synchronously with the same Chrome error
|
||||
// string. The fix's try/catch + reconnect path must absorb that.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
it('H1.b: pre-emptive reconnect path — ping does not throw against just-disconnected port', async () => {
|
||||
it('H1.b: externally-disconnected port — ping path absorbs the throw and reconnects cleanly', async () => {
|
||||
const ports: PortStub[] = [];
|
||||
const stub = buildChromeStub(ports);
|
||||
(globalThis as unknown as GlobalWithChrome).chrome = stub;
|
||||
|
||||
// Capture both interval callbacks AND timeout callbacks for direct
|
||||
// invocation. We need to fire the pre-emptive setTimeout
|
||||
// synthetically (it would otherwise wait 290 s of wall-clock time).
|
||||
const intervalCallbacks: Array<() => void> = [];
|
||||
const timeoutCallbacks: Array<() => void> = [];
|
||||
|
||||
globalThis.setInterval = ((cb: () => void, _ms: number) => {
|
||||
intervalCallbacks.push(cb);
|
||||
return 100 as unknown as ReturnType<typeof setInterval>;
|
||||
}) as typeof globalThis.setInterval;
|
||||
globalThis.setTimeout = ((cb: () => void, _ms: number) => {
|
||||
timeoutCallbacks.push(cb);
|
||||
return 200 as unknown as ReturnType<typeof setTimeout>;
|
||||
}) as typeof globalThis.setTimeout;
|
||||
// No-op the clear* so the captured callbacks remain callable —
|
||||
// surfaces the race-window assumption in the production code.
|
||||
globalThis.clearInterval = (() => {}) as typeof globalThis.clearInterval;
|
||||
globalThis.clearTimeout = (() => {}) as typeof globalThis.clearTimeout;
|
||||
|
||||
await import('../../src/offscreen/recorder');
|
||||
|
||||
expect(ports.length).toBe(1);
|
||||
const port = ports[0];
|
||||
|
||||
// intervalCallbacks[0] is the ping callback (first setInterval call).
|
||||
// timeoutCallbacks contains at least the pre-emptive reconnect timer
|
||||
// (recorder.ts:626). Other setTimeouts may exist (e.g. queueMicrotask
|
||||
// shims, scheduleRotation — but the latter is gated on mediaStream
|
||||
// which is null in this test).
|
||||
expect(intervalCallbacks.length).toBeGreaterThanOrEqual(1);
|
||||
expect(timeoutCallbacks.length).toBeGreaterThanOrEqual(1);
|
||||
|
||||
const oldPort = ports[0];
|
||||
const pingCb = intervalCallbacks[0];
|
||||
// The pre-emptive reconnect timeout is the only setTimeout the
|
||||
// bootstrap installs (rotation is gated on mediaStream). Pick the
|
||||
// first to keep the test resilient.
|
||||
const preemptiveReconnect = timeoutCallbacks[0];
|
||||
|
||||
// 1) Fire pre-emptive reconnect — the production code calls
|
||||
// keepalivePort?.disconnect(). Our stub flips _disconnected=true
|
||||
// synchronously (matching Chrome's local disconnect semantics).
|
||||
preemptiveReconnect();
|
||||
expect(port._disconnected).toBe(true);
|
||||
expect(port.disconnect).toHaveBeenCalled();
|
||||
// Simulate an EXTERNAL disconnect: the remote end of the port has
|
||||
// been torn down (SW eviction, port glitch, network blip), but the
|
||||
// local onDisconnect handler hasn't fired yet — this is exactly the
|
||||
// half-open state that the original H1 race weaponised.
|
||||
oldPort._disconnected = true;
|
||||
|
||||
// 2) Now the race: an already-queued ping callback (e.g. scheduled
|
||||
// before the reconnect) fires against the just-disconnected port.
|
||||
// Production code: keepalivePort?.postMessage({type:'PING'}) —
|
||||
// keepalivePort is still the old non-null reference at this point
|
||||
// (the onDisconnect handler may not have run yet in the same
|
||||
// macrotask), so postMessage runs against the disconnected port
|
||||
// and our stub throws — exactly matching the UAT error message.
|
||||
//
|
||||
// The fix must make this safe. Acceptable strategies:
|
||||
// - Clear the interval as the FIRST step of the pre-emptive
|
||||
// reconnect, before .disconnect().
|
||||
// - Wrap postMessage in try/catch inside the ping callback.
|
||||
// - Switch to a per-port closure (capturedPort) so a stale ping
|
||||
// can detect it's pinging a swapped port.
|
||||
// Fire the ping. Under the legacy code this threw with "Attempting
|
||||
// to use a disconnected port object"; under the Option C fix the
|
||||
// ping callback wraps postMessage in try/catch and initiates a
|
||||
// clean reconnect via reconnectPort(). The test asserts BOTH that
|
||||
// no Uncaught Error escapes AND that a fresh port has been
|
||||
// requested (the visible side-effect of the reconnect path).
|
||||
expect(() => pingCb()).not.toThrow();
|
||||
expect(ports.length).toBeGreaterThanOrEqual(2);
|
||||
|
||||
// Restore globals.
|
||||
globalThis.setInterval = originalSetInterval;
|
||||
globalThis.clearInterval = originalClearInterval;
|
||||
globalThis.setTimeout = originalSetTimeout;
|
||||
globalThis.clearTimeout = originalClearTimeout;
|
||||
});
|
||||
});
|
||||
|
||||
Reference in New Issue
Block a user