Milestone v1 (v2.0.0): Mokosh — Session Capture #1
@@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: empty-archive-port-race
|
||||
status: tdd_red_confirmed
|
||||
status: resolved
|
||||
trigger: |
|
||||
Phase 1 UAT Test 3 surfaced a two-headed BLOCKER:
|
||||
(a) Archive shipped via popup save contains NO video/last_30sec.webm
|
||||
@@ -15,7 +15,7 @@ trigger: |
|
||||
src/background/index.ts:130-167, 340-393 (BUFFER round-trip + zip
|
||||
build branch that silently elides the video file on empty segments).
|
||||
created: 2026-05-16T11:58:51Z
|
||||
updated: 2026-05-16T12:09:40Z
|
||||
updated: 2026-05-16T13:00:00Z
|
||||
phase: 01-stabilize-video-pipeline
|
||||
related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md
|
||||
related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md
|
||||
@@ -304,10 +304,106 @@ tdd_checkpoint: |
|
||||
|
||||
## Resolution
|
||||
|
||||
root_cause: ""
|
||||
fix: ""
|
||||
verification: ""
|
||||
files_changed: []
|
||||
root_cause: |
|
||||
Two coupled defects sharing a single architectural failure mode.
|
||||
|
||||
H1 (Uncaught Errors): The PING setInterval at recorder.ts:623-625
|
||||
used `keepalivePort?.postMessage({type:'PING'})`. Optional chaining
|
||||
only guards against `null`, NOT against a connected-but-being-
|
||||
disconnected Port object. The 290 s pre-emptive setTimeout
|
||||
(recorder.ts:626-630) called `keepalivePort?.disconnect()`, creating
|
||||
a microtask race window before the onDisconnect handler ran the
|
||||
cleanup. Any ping callback queued in that window threw "Attempting
|
||||
to use a disconnected port object" synchronously.
|
||||
|
||||
H2 (silent empty-video archive): The reconnect-race window left
|
||||
REQUEST_BUFFER round-trips vulnerable. The offscreen captured
|
||||
`portAtRequest = keepalivePort` BEFORE await blobToBase64() (CR-01
|
||||
pattern); on mid-encode reconnect the post-await guard correctly
|
||||
REFUSED to post on the now-stale portAtRequest. The SW's per-
|
||||
request listener was bound to that same stale port, so it timed
|
||||
out (BUFFER_FETCH_TIMEOUT_MS = 2 s) and resolved {segments: []}.
|
||||
createArchive's branch at index.ts:346-352 silently elided the
|
||||
video file when segments.length === 0 — operator received an
|
||||
88 KB zip with no last_30sec.webm. Bisect confirmed this silent-
|
||||
skip branch was an UPSTREAM defect (commit 555eb05) that the
|
||||
Plan 01-04 port lifecycle (b064a21) + CR-01 (2e3f524) amplified
|
||||
from latent to fatal.
|
||||
|
||||
fix: |
|
||||
Option C (Architectural) — request-id'd port protocol + port-health
|
||||
probe + retry on port replacement + operator-visible error surface.
|
||||
|
||||
1. RETIRED the 290 s pre-emptive setTimeout reconnect. Replaced with
|
||||
PONG-based health probe: each PING expects a PONG echo from the
|
||||
SW; the offscreen counts missedPongs and reconnects via clean
|
||||
teardown when MAX_MISSED_PONGS=3 exceeded.
|
||||
|
||||
2. H1 fix: wrapped PING postMessage in try/catch. On synchronous
|
||||
throw, route through reconnectPort() — no more Uncaught Error
|
||||
bubbles out to the offscreen console.
|
||||
|
||||
3. Request-id'd REQUEST_BUFFER / BUFFER: SW generates a uuid per
|
||||
request (crypto.randomUUID), sends with {type:'REQUEST_BUFFER',
|
||||
requestId}. Offscreen echoes the same requestId on the BUFFER
|
||||
response. SW routes via module-level pendingBufferRequests Map
|
||||
keyed by requestId — port-agnostic, so port replacement does
|
||||
NOT lose the response.
|
||||
|
||||
4. SW retry on port replacement: every onConnect post-bootstrap
|
||||
scans pendingBufferRequests and re-issues REQUEST_BUFFER on the
|
||||
fresh port with the SAME requestId. The offscreen posts BUFFER
|
||||
on the CURRENT keepalivePort (no more stale-port refuse-to-post).
|
||||
This retires the H2 silent-drop class architecturally.
|
||||
|
||||
5. createArchive throws EmptyVideoBufferError on empty video
|
||||
buffer. saveArchive catches and emits RECORDING_ERROR to the
|
||||
popup AND returns {success:false, error} — operator sees a
|
||||
visible failure instead of silently shipping a video-less zip.
|
||||
|
||||
6. PING → PONG echo: SW replies to every PING with PONG, closing
|
||||
the offscreen's health-probe loop.
|
||||
|
||||
Outer BUFFER_FETCH_TIMEOUT_MS bumped 2 s → 10 s to cover retries
|
||||
across multiple port replacements (inner round-trip still
|
||||
~100-200 ms).
|
||||
|
||||
verification: |
|
||||
All 12 RED-gate tests across 3 files flipped GREEN:
|
||||
- tests/offscreen/port-reconnect-race.test.ts (H1 + H1.b + H2)
|
||||
- tests/offscreen/port-health-probe.test.ts (A, B, C, D — health
|
||||
probe + request-id echo)
|
||||
- tests/background/request-id-protocol.test.ts (1, 2, 3, 4, 5 —
|
||||
request-id protocol + retry + error surface + PONG echo)
|
||||
|
||||
New continuous-end-to-end pinning contract in
|
||||
tests/background/port-lifecycle-continuous.test.ts covers 600 s
|
||||
of port lifecycle (12 ping cycles + 2 forced reconnects + 3
|
||||
SAVE_ARCHIVE round-trips). Asserts no Uncaught + every BUFFER
|
||||
carries segments + PONGs round-trip.
|
||||
|
||||
Pinning contracts preserved with no regression:
|
||||
- tests/offscreen/port-serialization.test.ts (D-12): GREEN
|
||||
- tests/offscreen/segment-rotation.test.ts (D-13): GREEN
|
||||
- tests/offscreen/webm-playback.test.ts (A3 ffmpeg dry-run): GREEN
|
||||
|
||||
Final suite: 11 files / 53 tests, all GREEN.
|
||||
Quality gates: npx tsc --noEmit exit 0; grep "as any|@ts-ignore"
|
||||
clean across src/offscreen + src/background/index.ts + src/shared;
|
||||
npm run build exit 0 (5.71 kB offscreen, 73.79 kB main).
|
||||
|
||||
CONTEXT.md D-17 amended (append, not replace) with the new port
|
||||
lifecycle commitments. Original D-17 left intact.
|
||||
|
||||
files_changed:
|
||||
- src/shared/types.ts (PortMessage.requestId + 'PONG')
|
||||
- src/offscreen/recorder.ts (health probe + request-id'd encode + H1 try/catch + reconnectPort helper)
|
||||
- src/background/index.ts (request-id protocol + retry + PONG echo + EmptyVideoBufferError surface + decodeBufferSegments extraction)
|
||||
- tests/offscreen/port-reconnect-race.test.ts (H1.b refactored to externally-disconnected path; H2 sends requestId)
|
||||
- tests/offscreen/port-health-probe.test.ts (new — 4 contract tests)
|
||||
- tests/background/request-id-protocol.test.ts (new — 5 contract tests)
|
||||
- tests/background/port-lifecycle-continuous.test.ts (new — 600 s e2e simulation)
|
||||
- .planning/phases/01-stabilize-video-pipeline/01-CONTEXT.md (D-17 amendment appended; original D-17 untouched)
|
||||
|
||||
## Bisect Results (2026-05-16T12:15:00Z)
|
||||
|
||||
@@ -299,5 +299,74 @@ phase, so downstream phases see a consistent baseline:
|
||||
|
||||
---
|
||||
|
||||
## Amendment (Phase 01-stabilize-video-pipeline, 2026-05-16) — D-17 port lifecycle
|
||||
|
||||
- **AMENDED-BY:** debug session `empty-archive-port-race` (Option C, resolved 2026-05-16)
|
||||
- **Replaces nothing.** D-17 above stands as the original port-keepalive
|
||||
contract. This amendment narrows the port LIFECYCLE shape that was
|
||||
Claude's-discretion at Phase 1 plan time.
|
||||
- **Background.** UAT Test 3 surfaced two coupled defects: 3× Uncaught
|
||||
"Attempting to use a disconnected port object" in the offscreen console
|
||||
starting at the 290 s pre-emptive reconnect mark, AND a silent
|
||||
empty-video archive shipping to the operator. Bisect proved the H1
|
||||
port race was introduced by Plan 01-04 (commit `b064a21`) and the H2
|
||||
silent-skip in `createArchive` was an upstream defect from commit
|
||||
`555eb05` that the port race amplified from latent to fatal. Full
|
||||
lineage and strategy rationale: `.planning/debug/resolved/empty-archive-port-race.md`.
|
||||
- **Architectural commitments retired:**
|
||||
- The legacy 290 000 ms pre-emptive `setTimeout` reconnect (Pitfall 4)
|
||||
is RETIRED. Its race window between the synchronous `.disconnect()`
|
||||
and the `onDisconnect` handler firing was the proximate cause of
|
||||
H1 — see the bisect notes.
|
||||
- **Architectural commitments added:**
|
||||
- **Port health probe (PING ↔ PONG).** The offscreen `PORT_PING_MS`
|
||||
interval doubles as a liveness probe; each PING expects a PONG echo
|
||||
from the SW. The offscreen tracks `missedPongs` and reconnects
|
||||
cleanly when the count exceeds `MAX_MISSED_PONGS = 3` (~75 s of
|
||||
unresponsive SW — well past Chrome's ~30 s idle threshold, so a
|
||||
real disconnect would already have surfaced its own
|
||||
`onDisconnect`). The SW echoes PONG on every PING via the
|
||||
onConnect-level message sink.
|
||||
- **Request-id'd REQUEST_BUFFER / BUFFER.** Every `REQUEST_BUFFER`
|
||||
carries a SW-generated `requestId` (crypto.randomUUID with
|
||||
Math.random fallback). The offscreen echoes the same `requestId`
|
||||
on the BUFFER response. The SW routes BUFFER → pending request via
|
||||
a module-level `Map<requestId, PendingBufferRequest>` — port-
|
||||
agnostic, so port replacement does not lose the response.
|
||||
- **Retry on port replacement.** Every `onConnect` (post-bootstrap)
|
||||
scans `pendingBufferRequests` and re-issues REQUEST_BUFFER on the
|
||||
fresh port with the SAME requestId. The offscreen posts BUFFER on
|
||||
the CURRENT `keepalivePort`, the sink matches by id, and the
|
||||
request resolves. This retires the H2 silent-drop class
|
||||
architecturally.
|
||||
- **Outer hard-timeout bumped 2 s → 10 s.** The legacy per-port
|
||||
`BUFFER_FETCH_TIMEOUT_MS = 2000` was too tight to cover a retry
|
||||
across a reconnect. The new outer budget covers EVERY retry across
|
||||
port replacements; the inner per-port round-trip is still
|
||||
~100-200 ms.
|
||||
- **Operator-visible failure surface.** `createArchive` throws
|
||||
`EmptyVideoBufferError` when the video buffer is empty. `saveArchive`
|
||||
catches and emits `{type:'RECORDING_ERROR', error:'empty-video-buffer'}`
|
||||
via `chrome.runtime.sendMessage` for the popup, AND returns
|
||||
`{success:false, error}` in the direct-response path. Replaces the
|
||||
upstream silent-skip branch in `createArchive` that shipped a
|
||||
video-less archive in silence.
|
||||
- **Pinning contracts added:**
|
||||
- `tests/offscreen/port-health-probe.test.ts` — pins the PING/PONG +
|
||||
request-id'd encode contract on the offscreen side (4 tests).
|
||||
- `tests/background/request-id-protocol.test.ts` — pins the SW-side
|
||||
request-id routing + retry + error-surface contract (5 tests).
|
||||
- `tests/background/port-lifecycle-continuous.test.ts` — continuous
|
||||
600 s end-to-end simulation: 12 ping/pong cycles + 2 forced
|
||||
reconnects + 3 SAVE_ARCHIVE round-trips, asserts no Uncaught and
|
||||
every BUFFER carries segments.
|
||||
- `tests/offscreen/port-reconnect-race.test.ts` (refactored): H1.b
|
||||
no longer pins the retired 290 s setTimeout path — it now pins
|
||||
the externally-disconnected-port → ping try/catch → reconnect path
|
||||
that the H1 fix delivers.
|
||||
|
||||
---
|
||||
|
||||
*Phase: 01-stabilize-video-pipeline*
|
||||
*Context gathered: 2026-05-15*
|
||||
*Amended: 2026-05-16 (debug session empty-archive-port-race, Option C)*
|
||||
|
||||
Reference in New Issue
Block a user