Two doc updates closing the debug session per the resolved pattern this
phase has established (cf. resolved/d12-blob-port-transfer-fails.md and
resolved/webm-playback-freeze.md):
1. **Move debug session to resolved/** with the Resolution section
filled in (root_cause, fix, verification, files_changed). Status
flipped tdd_red_confirmed -> resolved. Original investigation
notes + bisect results + Option C strategy spec all preserved
in-place — the file is the full provenance trail.
2. **Amend 01-CONTEXT.md D-17** with the new port lifecycle commitments.
Append-only (D-17 itself untouched) per the doc cascade rule
established earlier this phase ("amendments append, do not replace,
to preserve SPEC provenance"). The amendment narrates:
- What was Claude's-discretion at Phase 1 plan time has been
specified by Option C.
- The 290 s pre-emptive setTimeout reconnect (Pitfall 4) is RETIRED.
- The architectural commitments added: PING/PONG health probe,
request-id'd REQUEST_BUFFER/BUFFER, SW retry on port replacement,
outer 10 s hard-timeout, operator-visible EmptyVideoBufferError
surface.
- The 4 pinning contracts added (port-health-probe,
request-id-protocol, port-lifecycle-continuous, plus the
refactored port-reconnect-race).
Suite remains 11 files / 53 tests, all GREEN. Quality gates intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
503 lines
26 KiB
Markdown
503 lines
26 KiB
Markdown
---
|
||
slug: empty-archive-port-race
|
||
status: resolved
|
||
trigger: |
|
||
Phase 1 UAT Test 3 surfaced a two-headed BLOCKER:
|
||
(a) Archive shipped via popup save contains NO video/last_30sec.webm
|
||
(only rrweb/, logs/, screenshot.png, meta.json) — silent data loss
|
||
defeating the entire phase goal.
|
||
(b) 3× Uncaught Error: "Attempting to use a disconnected port object"
|
||
surface in the offscreen console starting at the 290 s pre-emptive
|
||
port-reconnect mark (recorder.ts line 627), spaced ~25-30 s
|
||
apart (PORT_PING_MS = 25_000 cadence).
|
||
Both symptoms cluster around src/offscreen/recorder.ts:597-630
|
||
(port reconnect / encodeAndSendBuffer interaction) and
|
||
src/background/index.ts:130-167, 340-393 (BUFFER round-trip + zip
|
||
build branch that silently elides the video file on empty segments).
|
||
created: 2026-05-16T11:58:51Z
|
||
updated: 2026-05-16T13:00:00Z
|
||
phase: 01-stabilize-video-pipeline
|
||
related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md
|
||
related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md
|
||
prior_resolved_sessions:
|
||
- .planning/debug/resolved/d12-blob-port-transfer-fails.md
|
||
- .planning/debug/resolved/webm-playback-freeze.md
|
||
---
|
||
|
||
# Debug: Empty-archive + port-reconnect race (Phase 1 BLOCKER)
|
||
|
||
## Symptoms (from UAT Test 3)
|
||
|
||
**Expected behavior:**
|
||
When the operator clicks save in the popup, the produced ZIP archive
|
||
contains `video/last_30sec.webm` — a playable WebM with the most recent
|
||
30 s of captured screen video derived from D-13 restart-segments ring
|
||
buffer (3 × 10 s self-contained segments).
|
||
|
||
**Actual behavior:**
|
||
1. `smoke.sh` waited 600 s before detecting a new archive in ~/Downloads.
|
||
2. The detected archive
|
||
(`/home/parf/Downloads/session_report_2026-05-16_13-54-52.zip`,
|
||
88 254 bytes) contains:
|
||
```
|
||
rrweb/ (dir, empty)
|
||
rrweb/session.json (2 bytes — empty JSON)
|
||
logs/ (dir, empty)
|
||
logs/events.json (2 bytes — empty JSON)
|
||
screenshot.png (87 300 bytes)
|
||
meta.json (336 bytes)
|
||
```
|
||
**No `video/` directory, no `last_30sec.webm`.** Total archive
|
||
88 KB vs. expected ~1.5 MB with video.
|
||
3. Offscreen console showed 3× `Uncaught Error: Attempting to use a
|
||
disconnected port object at offscreen-Bp_IKlxL.js:1:4644` starting
|
||
at exactly 290 s after recording start (timestamps 11:50:10, ~11:50:40,
|
||
~11:51:00) — coincides with `pre-emptive port reconnect (290 s cap)`
|
||
log line at recorder.ts:627.
|
||
4. Ring buffer itself was healthy throughout — 33+ segments rotated
|
||
cleanly with `kept: 3/3` invariant (sizes 472-799 KB per 10 s segment).
|
||
The D-13 restart-segments lifecycle is NOT the broken component.
|
||
|
||
**Error messages:**
|
||
```
|
||
offscreen-Bp_IKlxL.js:1 Uncaught Error: Attempting to use a disconnected port object
|
||
at offscreen-Bp_IKlxL.js:1:4644
|
||
```
|
||
And smoke.sh terminal:
|
||
```
|
||
caution: filename not matched: video/last_30sec.webm
|
||
```
|
||
|
||
**Timeline:**
|
||
- Bug introduced: not yet bisected. Most plausible window is one of:
|
||
- Plan 01-04 (port keepalive + OFFSCREEN_READY handshake) — added
|
||
the long-lived port that the pre-emptive reconnect operates on.
|
||
- Plan 01-05 (SW shrink + port host with T-1-04 sender check) —
|
||
added `getVideoBufferFromOffscreen` request-response pattern.
|
||
- Review-fix iteration 1 (commit 2e3f524 = CR-01 + CR-02 + CR-03 +
|
||
WR-03 + WR-09) — CR-01 specifically introduced the
|
||
"portAtRequest captured before await + refuse-to-post on stale
|
||
port" pattern in encodeAndSendBuffer that drops responses on the
|
||
floor without retry.
|
||
- Review-fix iteration 2 sweep #5 (commit 034155b "surface
|
||
port-replaced-during-fetch diagnostic on buffer timeout") — added
|
||
diagnostic logging on timeout but no retry.
|
||
- Never worked end-to-end with operator-side save under realistic
|
||
recording duration. The original Phase 1 closure on 2026-05-15 was
|
||
confirmed with a fresh fixture from a successful smoke run, but
|
||
the fixture was regenerated by a smoke session that almost certainly
|
||
completed before the 290 s pre-emptive reconnect window
|
||
(smoke required ≥ 35 s recording, not 5+ minutes).
|
||
|
||
**Reproduction:**
|
||
1. `npm run build` → produces dist/
|
||
2. `KEEP_PROFILE=0 ./smoke.sh`
|
||
3. In Chrome: load dist/ unpacked, click extension icon → picker
|
||
auto-accepts smoke tab → recording starts (logs confirm).
|
||
4. WAIT past the 290 s mark (e.g., 5+ minutes). Watch the offscreen
|
||
console: at 290 s see `pre-emptive port reconnect`, immediately
|
||
followed by `Uncaught Error: Attempting to use a disconnected port
|
||
object`, then 2-3 more such errors at PORT_PING_MS=25_000 intervals.
|
||
5. Click extension icon → "Сохранить отчёт об ошибке". Observe the
|
||
long wait time (the user saw 600 s — investigate whether multiple
|
||
clicks were needed or whether saveArchive has its own retry).
|
||
6. Inspect the produced zip: `unzip -l ~/Downloads/session_report_*.zip`
|
||
→ no `video/last_30sec.webm` entry.
|
||
|
||
## Current Focus
|
||
|
||
hypothesis: |
|
||
Two coupled defects sharing the same root cause (port-reconnect race
|
||
window in src/offscreen/recorder.ts):
|
||
|
||
H1 (the noise — Uncaught Errors): The ping `setInterval` at
|
||
recorder.ts:623-625 uses `keepalivePort?.postMessage({type:'PING'})`.
|
||
Optional chaining only guards against `null`, not against a
|
||
connected-but-being-disconnected `Port` object. When the
|
||
pre-emptive `setTimeout` fires at 290 s (recorder.ts:626-630) and
|
||
calls `keepalivePort?.disconnect()`, there's a microtask race
|
||
window before the onDisconnect handler (recorder.ts:616-622) runs
|
||
`teardownPortTimers()`. Any ping callback already queued/dispatched
|
||
in that window throws "Attempting to use a disconnected port object."
|
||
Two follow-on errors at ~25 s and ~50 s post-reconnect suggest the
|
||
recursive `connectPort()` re-installs the ping interval against
|
||
a NEW port that itself gets disconnected (possibly by a second
|
||
pre-emptive cycle, OR by the SW idle-unloading), but the error
|
||
remains because the same race recurs.
|
||
|
||
H2 (the silent data loss — empty-video archive): The same
|
||
reconnect-race window leaves REQUEST_BUFFER / BUFFER round-trips
|
||
vulnerable. When the operator clicks save during or shortly after
|
||
a reconnect:
|
||
(a) SW's `getVideoBufferFromOffscreen`
|
||
(src/background/index.ts:130-167) sends REQUEST_BUFFER via the
|
||
OLD port that just disconnected, OR
|
||
(b) Offscreen's `encodeAndSendBuffer`
|
||
(src/offscreen/recorder.ts:522-545, 547-604) captures
|
||
`portAtRequest` before its `await blobToBase64(...)` for ~3
|
||
segments (~150 ms), the pre-emptive reconnect fires mid-encode,
|
||
then the guard at line 597-601 correctly REFUSES to post on
|
||
the now-stale `portAtRequest` and just returns.
|
||
Either way, the SW's per-request listener times out silently after
|
||
`BUFFER_FETCH_TIMEOUT_MS = 2_000`, sets `videoBufferResponse =
|
||
{ segments: [] }`, and `createArchive`
|
||
(src/background/index.ts:340-352) takes the empty branch:
|
||
if (videoBufferResponse.segments.length > 0) { ... } else {
|
||
logger.warn('✗ No video segments to add'); // only logs
|
||
}
|
||
→ archive built WITHOUT the video file, shipped to operator silently.
|
||
The 600 s smoke wait is consistent with the operator clicking save
|
||
multiple times until one REQUEST_BUFFER round-trip happened to
|
||
complete on a fresh port — and even then, that single successful
|
||
trip may itself have caught only one of the 3 segments before the
|
||
next reconnect (worth checking against actual archive's NULL state
|
||
vs partial state).
|
||
|
||
test: |
|
||
Initial evidence-gathering test (RED): write a vitest spec that
|
||
drives recorder.ts's port lifecycle through a pre-emptive reconnect
|
||
while a ping is queued, and asserts NO Uncaught Error escapes.
|
||
This pins H1. For H2 a second test drives REQUEST_BUFFER →
|
||
encodeAndSendBuffer → reconnect-mid-await → asserts BUFFER is
|
||
retried on the fresh port (or that saveArchive surfaces an error
|
||
instead of silently dropping).
|
||
expecting: |
|
||
Initial RED tests fail in current code (proves bugs are
|
||
reproducible at the unit level). Then the gsd-debugger isolates
|
||
the precise call sequence + lands the GREEN.
|
||
next_action: TDD checkpoint — RED tests landed and confirmed failing for the exact UAT error string. Awaiting user confirmation of RED gate before writing GREEN fix.
|
||
reasoning_checkpoint: ""
|
||
tdd_checkpoint: |
|
||
Test file: tests/offscreen/port-reconnect-race.test.ts (3 tests, all RED)
|
||
Status: RED ✓ (failures confirm both H1 + H2 hypotheses)
|
||
Baseline: 7 files / 40 tests still GREEN (no regression in pinning contracts)
|
||
tsc --noEmit: exit 0
|
||
|
||
Failure outputs:
|
||
H1 AssertionError: expected [Function] to not throw an error
|
||
but 'Error: Attempting to use a disconnected port object' was thrown
|
||
(line 200 — production ping callback throws on disconnected port)
|
||
|
||
H1.b AssertionError: expected [Function] to not throw an error
|
||
but 'Error: Attempting to use a disconnected port object' was thrown
|
||
(line 419 — pre-emptive reconnect path triggers the same race)
|
||
|
||
H2 AssertionError: expected 0 to be greater than or equal to 1
|
||
(line 341 — production code silently drops BUFFER when reconnect
|
||
arrives mid-encode; neither old nor new port receives any BUFFER)
|
||
|
||
## Constraints
|
||
|
||
- TDD mode is ON (workflow.tdd_mode: true). RED test must land before
|
||
GREEN fix.
|
||
- Auto-loaded memories: `feedback-gsd-ceremony-for-fixes.md` (no
|
||
hot-edits; route through /gsd-debug) and
|
||
`feedback-no-unilateral-scope-reduction.md` (no scope narrowing;
|
||
surface choices via AskUserQuestion).
|
||
- Both fixes must NOT regress D-12 (base64 wire format) or D-13
|
||
(restart-segments). Existing tests at
|
||
`tests/offscreen/port-serialization.test.ts` and
|
||
`tests/offscreen/segment-rotation.test.ts` are the pinning contracts.
|
||
- Existing tests (40 across 7 files) all green; tsc exit 0; build
|
||
exit 0. Any fix must preserve that baseline.
|
||
|
||
## Files of Interest (preliminary; debugger may expand)
|
||
|
||
- src/offscreen/recorder.ts:
|
||
- 522-545: encodeAndSendBuffer (CR-01 portAtRequest guard)
|
||
- 547-604: doEncodeAndSendBuffer (base64 encode loop + WR-03
|
||
monotonic seq + stale-port refuse + postMessage)
|
||
- 606-630: connectPort (port lifecycle, ping interval, pre-emptive
|
||
reconnect setTimeout)
|
||
- 616-622: onDisconnect handler (teardownPortTimers + recursive
|
||
connectPort)
|
||
- 39: PORT_RECONNECT_MS = 290_000
|
||
- 38: PORT_PING_MS = 25_000
|
||
- src/background/index.ts:
|
||
- 130-167: getVideoBufferFromOffscreen (per-request listener
|
||
+ BUFFER_FETCH_TIMEOUT_MS = 2_000)
|
||
- 340-393: createArchive (the silent-empty-video branch at 346-352)
|
||
- 430-493: saveArchive (orchestrator)
|
||
- tests/offscreen/port.test.ts (existing — only covers SW-side
|
||
disconnect path, NOT pre-emptive reconnect race)
|
||
- tests/offscreen/port-serialization.test.ts (D-12 contract)
|
||
- tests/offscreen/segment-rotation.test.ts (D-13 contract)
|
||
- tests/offscreen/port-reconnect-race.test.ts (NEW — RED gate for this
|
||
debug session; pins H1 + H1.b + H2 contracts at unit-test level)
|
||
|
||
## Evidence
|
||
|
||
- timestamp: 2026-05-16T12:06:29Z — Baseline established. `npx vitest run`
|
||
→ 7 files passed / 40 tests passed. Confirms a clean RED→GREEN baseline
|
||
for the new test file.
|
||
- timestamp: 2026-05-16T12:06:45Z — Static analysis of src/offscreen/recorder.ts
|
||
+ the minified production bundle dist/assets/offscreen-Bp_IKlxL.js
|
||
CONFIRMS H1 at the bytecode level:
|
||
`grep -oP '.{60}postMessage.{20}' offscreen-Bp_IKlxL.js` yields:
|
||
`connecting"),E(),o=null,O()}),p=setInterval(()=>{o==null||o.postMessage({type:"PING"})},k)`
|
||
This is exactly the pattern at recorder.ts:623-625 — the ping
|
||
callback uses `?.postMessage` whose only safety check is null. The
|
||
minified column 4644 in the UAT error stack maps to this site:
|
||
`o.postMessage({type:"PING"})` where `o` is a non-null but
|
||
disconnected Port. Chrome's documented runtime semantics for Port
|
||
say postMessage on a disconnected port throws synchronously with
|
||
the exact error string observed in UAT
|
||
("Attempting to use a disconnected port object").
|
||
- timestamp: 2026-05-16T12:07:00Z — Static analysis of the
|
||
doEncodeAndSendBuffer flow CONFIRMS H2 silent-drop:
|
||
`src/offscreen/recorder.ts:597-601` —
|
||
`if (keepalivePort !== portAtRequest) { logger.warn(...); return; }`
|
||
On a mid-encode reconnect, keepalivePort is reassigned by the
|
||
onDisconnect→connectPort chain while portAtRequest still points to
|
||
the OLD port. The function returns WITHOUT posting anywhere.
|
||
`src/background/index.ts:134-150` — SW per-request listener bound
|
||
to the captured `port` (the OLD port). After 2 s
|
||
(BUFFER_FETCH_TIMEOUT_MS) it resolves `{ segments: [] }`.
|
||
`src/background/index.ts:346-352` — `createArchive` branch:
|
||
`if (videoBufferResponse.segments.length > 0) { ... } else {
|
||
logger.warn('✗ No video segments to add'); }`
|
||
No throw, no surface to operator. Archive ships WITHOUT video.
|
||
- timestamp: 2026-05-16T12:09:30Z — RED tests written and executed:
|
||
`tests/offscreen/port-reconnect-race.test.ts`. 3 tests, all RED:
|
||
H1 `ping callback handles a remotely-disconnected port without
|
||
throwing` — FAILS with "Attempting to use a disconnected port
|
||
object" (matches UAT error string byte-for-byte).
|
||
H1.b `pre-emptive reconnect path — ping does not throw against
|
||
just-disconnected port` — FAILS with same error (proves the
|
||
290 s pre-emptive path is the same race).
|
||
H2 `REQUEST_BUFFER mid-reconnect must NOT drop BUFFER when
|
||
segments exist` — FAILS with `expected 0 to be greater than
|
||
or equal to 1`. 3 seeded segments + mid-encode reconnect →
|
||
neither old nor new port receives any BUFFER. Proves the
|
||
silent-drop is real and reproducible at the unit level.
|
||
- timestamp: 2026-05-16T12:09:40Z — Full-suite check: 8 files (was 7) /
|
||
43 tests (was 40) — exactly 3 new RED, 40 baseline still GREEN. No
|
||
regression. `npx tsc --noEmit` exit 0 (new test file is type-clean).
|
||
|
||
## Eliminated
|
||
|
||
- **Not a buffer-emptiness issue.** UAT logs show 33+ segments rotated
|
||
successfully with `kept: 3/3` invariant maintained throughout the
|
||
600 s session. The ring buffer always had 3 × ~500-800 KB segments
|
||
available when SAVE_ARCHIVE fired. The data is THERE; the transport
|
||
fails to deliver it.
|
||
- **Not a D-13 segment-lifecycle regression.** existing
|
||
segment-rotation.test.ts contract still passes; the WebM payload
|
||
format is sound (preserved by the d12 base64 wire-format fix).
|
||
- **Not a D-12 wire-format regression.** existing
|
||
port-serialization.test.ts contract still passes; the base64 round-
|
||
trip is intact. The bug is at the port-lifecycle layer ABOVE the
|
||
wire format.
|
||
- **Not a smoke.sh artifact.** the actual zip on disk was inspected
|
||
by-hand (`unzip -l ~/Downloads/session_report_2026-05-16_13-54-52.zip`)
|
||
— the absence of video/last_30sec.webm is on the wire, not in the
|
||
smoke harness's reporting.
|
||
- **Not a JSZip issue.** other archive entries (screenshot.png, meta.json)
|
||
are present and well-formed. JSZip received an archive build with no
|
||
video entry — because createArchive's silent-skip branch never
|
||
called `zip.file('video/last_30sec.webm', videoBlob)`.
|
||
- **Not a chrome.runtime.sendMessage problem.** the bug is on the
|
||
long-lived `chrome.runtime.connect` port (offscreen↔SW), not on the
|
||
popup↔SW sendMessage path. The popup→SW SAVE_ARCHIVE call works
|
||
fine — it just gets `{success: true}` back even though the archive
|
||
is video-less.
|
||
|
||
## Resolution
|
||
|
||
root_cause: |
|
||
Two coupled defects sharing a single architectural failure mode.
|
||
|
||
H1 (Uncaught Errors): The PING setInterval at recorder.ts:623-625
|
||
used `keepalivePort?.postMessage({type:'PING'})`. Optional chaining
|
||
only guards against `null`, NOT against a connected-but-being-
|
||
disconnected Port object. The 290 s pre-emptive setTimeout
|
||
(recorder.ts:626-630) called `keepalivePort?.disconnect()`, creating
|
||
a microtask race window before the onDisconnect handler ran the
|
||
cleanup. Any ping callback queued in that window threw "Attempting
|
||
to use a disconnected port object" synchronously.
|
||
|
||
H2 (silent empty-video archive): The reconnect-race window left
|
||
REQUEST_BUFFER round-trips vulnerable. The offscreen captured
|
||
`portAtRequest = keepalivePort` BEFORE await blobToBase64() (CR-01
|
||
pattern); on mid-encode reconnect the post-await guard correctly
|
||
REFUSED to post on the now-stale portAtRequest. The SW's per-
|
||
request listener was bound to that same stale port, so it timed
|
||
out (BUFFER_FETCH_TIMEOUT_MS = 2 s) and resolved {segments: []}.
|
||
createArchive's branch at index.ts:346-352 silently elided the
|
||
video file when segments.length === 0 — operator received an
|
||
88 KB zip with no last_30sec.webm. Bisect confirmed this silent-
|
||
skip branch was an UPSTREAM defect (commit 555eb05) that the
|
||
Plan 01-04 port lifecycle (b064a21) + CR-01 (2e3f524) amplified
|
||
from latent to fatal.
|
||
|
||
fix: |
|
||
Option C (Architectural) — request-id'd port protocol + port-health
|
||
probe + retry on port replacement + operator-visible error surface.
|
||
|
||
1. RETIRED the 290 s pre-emptive setTimeout reconnect. Replaced with
|
||
PONG-based health probe: each PING expects a PONG echo from the
|
||
SW; the offscreen counts missedPongs and reconnects via clean
|
||
teardown when MAX_MISSED_PONGS=3 exceeded.
|
||
|
||
2. H1 fix: wrapped PING postMessage in try/catch. On synchronous
|
||
throw, route through reconnectPort() — no more Uncaught Error
|
||
bubbles out to the offscreen console.
|
||
|
||
3. Request-id'd REQUEST_BUFFER / BUFFER: SW generates a uuid per
|
||
request (crypto.randomUUID), sends with {type:'REQUEST_BUFFER',
|
||
requestId}. Offscreen echoes the same requestId on the BUFFER
|
||
response. SW routes via module-level pendingBufferRequests Map
|
||
keyed by requestId — port-agnostic, so port replacement does
|
||
NOT lose the response.
|
||
|
||
4. SW retry on port replacement: every onConnect post-bootstrap
|
||
scans pendingBufferRequests and re-issues REQUEST_BUFFER on the
|
||
fresh port with the SAME requestId. The offscreen posts BUFFER
|
||
on the CURRENT keepalivePort (no more stale-port refuse-to-post).
|
||
This retires the H2 silent-drop class architecturally.
|
||
|
||
5. createArchive throws EmptyVideoBufferError on empty video
|
||
buffer. saveArchive catches and emits RECORDING_ERROR to the
|
||
popup AND returns {success:false, error} — operator sees a
|
||
visible failure instead of silently shipping a video-less zip.
|
||
|
||
6. PING → PONG echo: SW replies to every PING with PONG, closing
|
||
the offscreen's health-probe loop.
|
||
|
||
Outer BUFFER_FETCH_TIMEOUT_MS bumped 2 s → 10 s to cover retries
|
||
across multiple port replacements (inner round-trip still
|
||
~100-200 ms).
|
||
|
||
verification: |
|
||
All 12 RED-gate tests across 3 files flipped GREEN:
|
||
- tests/offscreen/port-reconnect-race.test.ts (H1 + H1.b + H2)
|
||
- tests/offscreen/port-health-probe.test.ts (A, B, C, D — health
|
||
probe + request-id echo)
|
||
- tests/background/request-id-protocol.test.ts (1, 2, 3, 4, 5 —
|
||
request-id protocol + retry + error surface + PONG echo)
|
||
|
||
New continuous-end-to-end pinning contract in
|
||
tests/background/port-lifecycle-continuous.test.ts covers 600 s
|
||
of port lifecycle (12 ping cycles + 2 forced reconnects + 3
|
||
SAVE_ARCHIVE round-trips). Asserts no Uncaught + every BUFFER
|
||
carries segments + PONGs round-trip.
|
||
|
||
Pinning contracts preserved with no regression:
|
||
- tests/offscreen/port-serialization.test.ts (D-12): GREEN
|
||
- tests/offscreen/segment-rotation.test.ts (D-13): GREEN
|
||
- tests/offscreen/webm-playback.test.ts (A3 ffmpeg dry-run): GREEN
|
||
|
||
Final suite: 11 files / 53 tests, all GREEN.
|
||
Quality gates: npx tsc --noEmit exit 0; grep "as any|@ts-ignore"
|
||
clean across src/offscreen + src/background/index.ts + src/shared;
|
||
npm run build exit 0 (5.71 kB offscreen, 73.79 kB main).
|
||
|
||
CONTEXT.md D-17 amended (append, not replace) with the new port
|
||
lifecycle commitments. Original D-17 left intact.
|
||
|
||
files_changed:
|
||
- src/shared/types.ts (PortMessage.requestId + 'PONG')
|
||
- src/offscreen/recorder.ts (health probe + request-id'd encode + H1 try/catch + reconnectPort helper)
|
||
- src/background/index.ts (request-id protocol + retry + PONG echo + EmptyVideoBufferError surface + decodeBufferSegments extraction)
|
||
- tests/offscreen/port-reconnect-race.test.ts (H1.b refactored to externally-disconnected path; H2 sends requestId)
|
||
- tests/offscreen/port-health-probe.test.ts (new — 4 contract tests)
|
||
- tests/background/request-id-protocol.test.ts (new — 5 contract tests)
|
||
- tests/background/port-lifecycle-continuous.test.ts (new — 600 s e2e simulation)
|
||
- .planning/phases/01-stabilize-video-pipeline/01-CONTEXT.md (D-17 amendment appended; original D-17 untouched)
|
||
|
||
## Bisect Results (2026-05-16T12:15:00Z)
|
||
|
||
User chose "Bisect first" from the fix-strategy fork. Targeted git-log
|
||
scan across the suspect code regions across the full Phase 1 history
|
||
(`555eb05..HEAD`):
|
||
|
||
| Symptom | Mechanism | Introduction commit | Subsequent amplification |
|
||
|---------|----------------------------------------|------------------------------------------------------|-----------------------------------------------------------------|
|
||
| H1.a | ping setInterval with `?.postMessage` | `b064a21` (Plan 01-04, 2026-05-15) | none — race exists since this commit |
|
||
| H1.b | pre-emptive 290 s reconnect setTimeout | `b064a21` (Plan 01-04, 2026-05-15) | none |
|
||
| H2 | createArchive silent-skip on empty | **`555eb05` (imported broken upstream — predates Phase 1)** | `2e3f524` CR-01 "refuse-to-post on swapped port" without retry; `034155b` sweep #5 added diagnostic on timeout but still no retry |
|
||
|
||
**Strategic implications surfaced by bisect:**
|
||
|
||
1. **H2 silent-skip is an UPSTREAM defect** in the original "half-broken
|
||
blob of code" — never on the Phase 1 fix radar because:
|
||
- Original 22-defect audit didn't flag it (it's a "what happens if a
|
||
downstream returns 0?" question that needs a real failure mode to
|
||
surface)
|
||
- REVIEW.md's 18 findings didn't catch it
|
||
- Sweep #5 (commit 034155b "surface port-replaced-during-fetch
|
||
diagnostic on buffer timeout") added a log on timeout but kept the
|
||
silent-skip behaviour — the diagnostic logs went to the SW console,
|
||
but the operator-facing failure path stayed silent.
|
||
|
||
2. **H1 port race latency-amplifies H2 from latent to fatal.** Before
|
||
Plan 01-04 there was no port → no reconnect → silent-skip never fired
|
||
in practice. After 01-04 the 290 s pre-emptive reconnect creates a
|
||
guaranteed-recurring window where REQUEST_BUFFER round-trips can
|
||
fail; after CR-01 the "refuse to post on swapped port" pattern makes
|
||
the silent-skip fire with near-100 % probability whenever a save is
|
||
clicked across a reconnect window.
|
||
|
||
3. **Option A** (target both symptoms with retry + try/catch) is
|
||
sufficient to make the visible symptoms go away, but leaves the
|
||
upstream silent-skip pattern intact — a class of latent failures
|
||
remains. **Option B** (which converts silent-skip into operator-
|
||
visible error) directly retires the upstream class. **Option C**
|
||
(request-id'd round-trip protocol) makes port-replacement
|
||
irrelevant by architecture but is the largest blast radius.
|
||
|
||
The bisect strengthens the case for B (it's not just symptom-control;
|
||
it retires the entire shape of bug, including the upstream one).
|
||
However, A is still defensible if treated as a pure hotfix to ship
|
||
Phase 1 and the silent-skip fix is consciously deferred to a follow-up
|
||
plan. C is justifiable if the user wants Phase 1's port lifecycle to
|
||
match the durability of the D-12/D-13 contracts before Phase 2 layers
|
||
DOM capture on top.
|
||
|
||
(Updated by orchestrator after bisect; user decides next.)
|
||
|
||
## Fix Strategy Chosen: Option C (Architectural)
|
||
|
||
User decided (post-bisect): **Option C — Architectural port lifecycle refactor.**
|
||
|
||
Rationale: bisect shows the current port lifecycle's race window is the
|
||
proximate cause of every H1 error AND the trigger for the upstream H2
|
||
silent-skip class. Option C makes port-replacement architecturally
|
||
irrelevant by:
|
||
|
||
1. Replace pre-emptive 290 s `setTimeout` reconnect with **port-health
|
||
probe** (ping/pong protocol; offscreen explicitly waits for SW echo
|
||
before considering port live; on any failure → clean teardown +
|
||
reconnect).
|
||
2. Switch REQUEST_BUFFER / BUFFER to **request-id'd** pattern: SW
|
||
sends `{type:'REQUEST_BUFFER', requestId: uuid}`; offscreen
|
||
responds with `{type:'BUFFER', requestId, segments}`; SW only
|
||
accepts responses matching its current pending requestId. SW
|
||
retries on port replacement until success or hard outer timeout.
|
||
3. Continuous end-to-end vitest covering 600 s of port lifecycle
|
||
(2 reconnects + simulated REQUEST_BUFFER round-trips). Becomes
|
||
the new pinning contract for the port lifecycle.
|
||
4. **Keep** the existing operator-visible safety from Option B: even
|
||
in the new design, if a hard outer timeout fires, `createArchive`
|
||
must throw → popup surfaces `RECORDING_ERROR` to operator instead
|
||
of silently shipping a video-less archive.
|
||
|
||
Touches D-17 port-lifecycle contract (CONTEXT.md will be amended).
|
||
Expected surface: ~150-200 LOC + 2-3 new test files. Pre-existing
|
||
40 GREEN tests + 3 new RED tests must all flip GREEN at completion.
|
||
|
||
Constraints (re-iterated for the executor):
|
||
- TDD: each new behaviour gets a RED test BEFORE GREEN implementation.
|
||
- D-12 (base64 wire format) and D-13 (restart-segments) contracts must
|
||
not regress. Existing pinning tests at
|
||
`tests/offscreen/port-serialization.test.ts` and
|
||
`tests/offscreen/segment-rotation.test.ts` are the canaries.
|
||
- `npx tsc --noEmit` exit 0, `npm run build` exit 0, type-safety
|
||
grep clean must all hold at every commit.
|
||
- `tests/fixtures/last_30sec.webm` is currently dirty in the working
|
||
tree (operator's recent smoke session staged a new file at that
|
||
path). Do NOT commit changes to that file — the committed D-13
|
||
fixture is the canonical contract.
|
||
|
||
(Strategy decided 2026-05-16T12:18:00Z; executor dispatch follows.)
|