docs(option-c): archive empty-archive-port-race + amend CONTEXT.md D-17 port lifecycle

Two doc updates close the debug session, following the resolved/
archival pattern this phase has established (cf.
resolved/d12-blob-port-transfer-fails.md and
resolved/webm-playback-freeze.md):

1. **Move debug session to resolved/** with the Resolution section
   filled in (root_cause, fix, verification, files_changed). Status
   flipped tdd_red_confirmed -> resolved. Original investigation
   notes + bisect results + Option C strategy spec all preserved
   in-place — the file is the full provenance trail.

2. **Amend 01-CONTEXT.md D-17** with the new port lifecycle commitments.
   Append-only (D-17 itself untouched) per the doc cascade rule
   established earlier this phase ("amendments append, do not replace,
   to preserve SPEC provenance"). The amendment narrates:
   - What was Claude's-discretion at Phase 1 plan time has been
     specified by Option C.
   - The 290 s pre-emptive setTimeout reconnect (Pitfall 4) is RETIRED.
   - The architectural commitments added: PING/PONG health probe,
     request-id'd REQUEST_BUFFER/BUFFER, SW retry on port replacement,
     outer 10 s hard-timeout, operator-visible EmptyVideoBufferError
     surface.
   - The 4 pinning contracts added (port-health-probe,
     request-id-protocol, port-lifecycle-continuous, plus the
     refactored port-reconnect-race).

Suite remains 11 files / 53 tests, all GREEN. Quality gates intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 15:40:02 +02:00
parent 246eadb2ef
commit f0871c0237
2 changed files with 171 additions and 6 deletions


@@ -299,5 +299,74 @@ phase, so downstream phases see a consistent baseline:
---
## Amendment (Phase 01-stabilize-video-pipeline, 2026-05-16) — D-17 port lifecycle
- **AMENDED-BY:** debug session `empty-archive-port-race` (Option C, resolved 2026-05-16)
- **Replaces nothing.** D-17 above stands as the original port-keepalive
contract. This amendment narrows the port LIFECYCLE shape that was
Claude's-discretion at Phase 1 plan time.
- **Background.** UAT Test 3 surfaced two coupled defects: 3× Uncaught
"Attempting to use a disconnected port object" in the offscreen console
starting at the 290 s pre-emptive reconnect mark, AND a silent
empty-video archive shipping to the operator. Bisect proved the H1
port race was introduced by Plan 01-04 (commit `b064a21`) and the H2
silent-skip in `createArchive` was an upstream defect from commit
`555eb05` that the port race amplified from latent to fatal. Full
lineage and strategy rationale: `.planning/debug/resolved/empty-archive-port-race.md`.
- **Architectural commitments retired:**
  - The legacy 290 000 ms pre-emptive `setTimeout` reconnect (Pitfall 4)
    is RETIRED. Its race window between the synchronous `.disconnect()`
    and the `onDisconnect` handler firing was the proximate cause of
    H1; see the bisect notes.
- **Architectural commitments added:**
  - **Port health probe (PING ↔ PONG).** The offscreen `PORT_PING_MS`
    interval doubles as a liveness probe; each PING expects a PONG echo
    from the SW. The offscreen tracks `missedPongs` and reconnects
    cleanly when the count exceeds `MAX_MISSED_PONGS = 3` (~75 s of
    unresponsive SW, well past Chrome's ~30 s idle threshold, so a
    real disconnect would already have surfaced its own
    `onDisconnect`). The SW echoes PONG on every PING via the
    onConnect-level message sink.
  - **Request-id'd REQUEST_BUFFER / BUFFER.** Every `REQUEST_BUFFER`
    carries a SW-generated `requestId` (crypto.randomUUID with
    Math.random fallback). The offscreen echoes the same `requestId`
    on the BUFFER response. The SW routes BUFFER → pending request via
    a module-level `Map<requestId, PendingBufferRequest>` that is
    port-agnostic, so port replacement does not lose the response.
  - **Retry on port replacement.** Every `onConnect` (post-bootstrap)
    scans `pendingBufferRequests` and re-issues REQUEST_BUFFER on the
    fresh port with the SAME requestId. The offscreen posts BUFFER on
    the CURRENT `keepalivePort`, the sink matches by id, and the
    request resolves. This retires the H2 silent-drop class
    architecturally.
  - **Outer hard-timeout bumped 2 s → 10 s.** The legacy per-port
    `BUFFER_FETCH_TIMEOUT_MS = 2000` was too tight to cover a retry
    across a reconnect. The new outer budget covers EVERY retry across
    port replacements; the inner per-port round-trip is still
    ~100-200 ms.
  - **Operator-visible failure surface.** `createArchive` throws
    `EmptyVideoBufferError` when the video buffer is empty. `saveArchive`
    catches and emits `{type:'RECORDING_ERROR', error:'empty-video-buffer'}`
    via `chrome.runtime.sendMessage` for the popup, AND returns
    `{success:false, error}` in the direct-response path. Replaces the
    upstream silent-skip branch in `createArchive` that shipped a
    video-less archive in silence.
- **Pinning contracts added:**
  - `tests/offscreen/port-health-probe.test.ts`: pins the PING/PONG +
    request-id'd encode contract on the offscreen side (4 tests).
  - `tests/background/request-id-protocol.test.ts`: pins the SW-side
    request-id routing + retry + error-surface contract (5 tests).
  - `tests/background/port-lifecycle-continuous.test.ts`: continuous
    600 s end-to-end simulation: 12 ping/pong cycles + 2 forced
    reconnects + 3 SAVE_ARCHIVE round-trips, asserts no Uncaught and
    every BUFFER carries segments.
  - `tests/offscreen/port-reconnect-race.test.ts` (refactored): H1.b
    no longer pins the retired 290 s setTimeout path; it now pins
    the externally-disconnected-port → ping try/catch → reconnect path
    that the H1 fix delivers.
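
For illustration only, the port health probe commitment above could be sketched roughly as follows. This is not the project's code: `PortHealthProbe`, `sendPing`, and `reconnect` are hypothetical names, and only `MAX_MISSED_PONGS` and `missedPongs` appear in the amendment. The sketch assumes the reconnect fires once the third consecutive PONG is missed, which lines up with the ~75 s figure if `PORT_PING_MS` is about 25 s (implied by that figure, not stated).

```typescript
// Hypothetical sketch; only MAX_MISSED_PONGS and missedPongs are named
// in the amendment. Everything else is an illustrative assumption.
const MAX_MISSED_PONGS = 3;

class PortHealthProbe {
  private missedPongs = 0;
  private awaitingPong = false;
  private readonly sendPing: () => void;
  private readonly reconnect: () => void;

  constructor(sendPing: () => void, reconnect: () => void) {
    this.sendPing = sendPing;
    this.reconnect = reconnect;
  }

  /** Call once per PORT_PING_MS interval. */
  tick(): void {
    // The previous PING was never answered: count it as a missed PONG.
    if (this.awaitingPong) {
      this.missedPongs += 1;
      if (this.missedPongs >= MAX_MISSED_PONGS) {
        this.replacePort();
        return;
      }
    }
    this.awaitingPong = true;
    try {
      this.sendPing();
    } catch {
      // Posting on an externally disconnected port throws; reconnect
      // here instead of letting the error surface as Uncaught.
      this.replacePort();
    }
  }

  /** Call when the SW echoes PONG. */
  onPong(): void {
    this.awaitingPong = false;
    this.missedPongs = 0;
  }

  get missed(): number {
    return this.missedPongs;
  }

  private replacePort(): void {
    this.awaitingPong = false;
    this.missedPongs = 0;
    this.reconnect();
  }
}
```

In this shape there is no pre-emptive timer: a reconnect happens only after observed silence or a thrown post, which is the property the retired 290 s path lacked.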
---
*Phase: 01-stabilize-video-pipeline*
*Context gathered: 2026-05-15*
*Amended: 2026-05-16 (debug session empty-archive-port-race, Option C)*
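
As a rough illustration of the request-id routing and retry-on-port-replacement commitments in this amendment, a minimal sketch might look like the following. It is written under stated assumptions rather than taken from the project: `PortLike`, `newRequestId`, `requestBuffer`, `reissuePending`, and `handleBuffer` are hypothetical names, the `segments` payload field is assumed, and only `pendingBufferRequests`, `PendingBufferRequest`, `requestId`, and the 10 s outer budget come from the text.

```typescript
// Hypothetical sketch of port-agnostic request routing.
interface PortLike {
  postMessage(message: unknown): void;
}

interface PendingBufferRequest {
  resolve: (segments: unknown[]) => void;
  reject: (error: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

// Module-level and keyed by id only, so entries survive port replacement.
const pendingBufferRequests = new Map<string, PendingBufferRequest>();

function newRequestId(): string {
  const c = (globalThis as { crypto?: { randomUUID?: () => string } }).crypto;
  if (c && typeof c.randomUUID === "function") return c.randomUUID();
  // Fallback when crypto.randomUUID is unavailable.
  return `req-${Date.now()}-${Math.random().toString(36).slice(2)}`;
}

function requestBuffer(port: PortLike, timeoutMs = 10_000): Promise<unknown[]> {
  const requestId = newRequestId();
  return new Promise<unknown[]>((resolve, reject) => {
    // Outer hard timeout: one budget covering every retry.
    const timer = setTimeout(() => {
      pendingBufferRequests.delete(requestId);
      reject(new Error(`REQUEST_BUFFER ${requestId} timed out`));
    }, timeoutMs);
    pendingBufferRequests.set(requestId, { resolve, reject, timer });
    try {
      port.postMessage({ type: "REQUEST_BUFFER", requestId });
    } catch {
      // Port already dead: stay pending and wait for the next onConnect.
    }
  });
}

// Call from onConnect with the fresh port: re-issue with the SAME ids.
function reissuePending(freshPort: PortLike): void {
  pendingBufferRequests.forEach((_pending, requestId) => {
    freshPort.postMessage({ type: "REQUEST_BUFFER", requestId });
  });
}

// Call from the message sink for every BUFFER, whichever port carried it.
function handleBuffer(message: { requestId: string; segments: unknown[] }): boolean {
  const pending = pendingBufferRequests.get(message.requestId);
  if (!pending) return false; // unknown or already settled id
  clearTimeout(pending.timer);
  pendingBufferRequests.delete(message.requestId);
  pending.resolve(message.segments);
  return true;
}
```

Because the lookup is by `requestId` alone, a duplicate BUFFER arriving after a retry is simply ignored, and the single timer set in `requestBuffer` plays the role of the outer budget rather than a per-port one.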