fix(debug): A33.1 SAVE-ack race — gate on race-free fresh-archive signal

Root cause: driveA33's A33.1 hard-gated on the chrome.runtime.sendMessage
SAVE_ARCHIVE callback ack. After the Puppeteer CDP worker.close() SW kill,
the SAVE_ARCHIVE message wakes a fresh SW instance; that instance runs the
multi-step saveArchive() pipeline (offscreen video-keepalive port
re-establishment + REQUEST_BUFFER round-trip + rrweb collection + zip
build). The harness's original sendMessage response port has its own MV3
lifetime — on a 5-min-aged SW the pipeline INTERMITTENTLY outruns it,
surfacing chrome.runtime.lastError "message port closed before a response
was received". The archive is still written correctly every time, which is
why A33.2/A33.3 always passed (Plan 04-05 full-mode UAT: A33.1 FAIL while
A33.2/A33.3 PASS at 1.56 MB). A33.1 was gating a CI assertion on a
best-effort transport ack with inherent MV3 non-determinism.
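
To make the failure mode concrete, here is a minimal TypeScript sketch of an ack that is recorded but can never fail the run. It is not the harness code: `sendWithSoftAck`, `Transport`, and `AckResult` are hypothetical names, and the transport is injected so the sketch runs outside a browser. In the harness the equivalent calls are chrome.runtime.sendMessage and chrome.runtime.lastError.

```typescript
// Hypothetical shapes for illustration only; not part of the harness.
type AckResult = { success: boolean; error?: string };

type Transport = {
  // Callback-style send, modelled on chrome.runtime.sendMessage.
  sendMessage: (msg: unknown, cb: (response?: { success?: boolean }) => void) => void;
  // Modelled on reading chrome.runtime.lastError inside the callback.
  lastError: () => { message?: string } | undefined;
};

// Resolves with the ack outcome and never rejects, so a lost ack
// becomes a diagnostic value instead of a thrown error.
function sendWithSoftAck(
  transport: Transport,
  msg: unknown,
  timeoutMs: number,
): Promise<AckResult> {
  return new Promise((resolve) => {
    let settled = false;
    const finish = (result: AckResult): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(result);
    };
    // If the response port dies silently, fall back to a timeout result.
    const timer = setTimeout(
      () => finish({ success: false, error: `no ack within ${timeoutMs} ms` }),
      timeoutMs,
    );
    transport.sendMessage(msg, (response) => {
      const err = transport.lastError();
      if (err !== undefined) {
        finish({ success: false, error: err.message ?? "unknown lastError" });
      } else {
        finish({ success: response?.success === true });
      }
    });
  });
}
```

A caller would log the returned `AckResult` and make its pass/fail decision on a separate, durable signal.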

Fix (harness-side only, Option A — race-free reframe): A33.1 now gates on
the durable race-free signal — a fresh archive on disk — via the canonical
snapshotExistingZips + pollForNewOrUpdatedZip helpers (also used by
driveA12/A13/A27). The sendMessage ack is demoted to a soft non-gating
diagnostic. This is exactly the signal the proven-reliable spike already
uses. A33.2/A33.3 substantive checks are intact and now read the verified
fresh zip. No new symbol; FORBIDDEN_HOOK_STRINGS unchanged at 12. The SW
SAVE_ARCHIVE handler is a correct MV3 async pattern — no production change.
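
The snapshot-then-poll idea behind the fix can be sketched as follows. This is an illustrative reimplementation under assumptions, not the repository's `snapshotExistingZips` / `pollForNewOrUpdatedZip` helpers, whose bodies are not shown in this commit. The names `snapshotZips` and `pollForFreshZip`, the default timeout and interval, and the "same size on two consecutive polls" stability rule are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// File name -> mtime (ms) for every .zip present before the SAVE.
type ZipSnapshot = Map<string, number>;

function snapshotZips(dir: string): ZipSnapshot {
  const snap: ZipSnapshot = new Map();
  if (!fs.existsSync(dir)) return snap;
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith(".zip")) continue;
    snap.set(name, fs.statSync(path.join(dir, name)).mtimeMs);
  }
  return snap;
}

// Returns the path of a zip that is new, or newer than in `before`,
// once its size is non-zero and unchanged across two consecutive polls.
// Returns null if no such file appears before the timeout.
async function pollForFreshZip(
  dir: string,
  before: ZipSnapshot,
  timeoutMs = 10_000,
  intervalMs = 100,
): Promise<string | null> {
  const deadline = Date.now() + timeoutMs;
  let lastPath: string | null = null;
  let lastSize = -1;
  while (Date.now() < deadline) {
    // A file counts as fresh if it was absent from the snapshot or its
    // mtime advanced (covers an overwritten, same-named download).
    const fresh = [...snapshotZips(dir)]
      .filter(([name, mtime]) => (before.get(name) ?? -Infinity) < mtime)
      .sort((a, b) => b[1] - a[1])[0];
    if (fresh !== undefined) {
      const full = path.join(dir, fresh[0]);
      const size = fs.statSync(full).size;
      if (full === lastPath && size === lastSize && size > 0) return full;
      lastPath = full;
      lastSize = size;
    }
    await new Promise((res) => setTimeout(res, intervalMs));
  }
  return null;
}
```

Taking the snapshot before dispatching the SAVE is what makes the check independent of the ack: a pre-existing archive can never satisfy it, and an overwritten file with the same name still counts because its mtime moves forward.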

Verified: full-mode A33 (genuine 5-min idle) 3/3 GREEN; skip-mode UAT
35/35 GREEN; tsc + build:test exit 0; vitest 184/184.

Debug session: .planning/debug/a33-save-ack-race.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 15:33:22 +02:00
parent 28ebc1fe4e
commit 7e0da63ff2
2 changed files with 263 additions and 24 deletions


@@ -2527,8 +2527,12 @@ export async function driveA32(page: Page): Promise<AssertionRecord> {
// `worker.close()` because Puppeteer's persistent CDP attach keeps
// SWs alive indefinitely; natural 30s idle eviction does NOT fire
// under test conditions per Chrome devrel.
// - `findLatestZip(downloadsDir)` — exported helper from Plan 04-04;
// mtime-sort archive selection.
// - `snapshotExistingZips` + `pollForNewOrUpdatedZip` — canonical
// race-free post-SAVE archive detection (also used by driveA12/A13/
// A27). A33.1 gates on a fresh zip appearing here. The debug session
// .planning/debug/a33-save-ack-race.md replaced an earlier
// `findLatestZip` + sendMessage-ack-gated A33.1 with this race-free
// signal (the ack is now a soft diagnostic only).
// - `__mokoshHarness.assertA2` — canonical "go to REC state" entrypoint
// per Plan 04-04 REVISION iter-2 Option B (read_first verified:
// __mokoshHarness has assertA1..A31 + getManifestVersion; A2 does
@@ -2567,16 +2571,35 @@ const A33_VIDEO_SIZE_FLOOR_BYTES = 100_000;
* 2. Waiting 5 min wall-clock for the SW idle window to elapse.
* 3. Force-terminating the SW via stopServiceWorker (Puppeteer CDP).
* 4. Settling for SW teardown.
- * 5. Dispatching SAVE_ARCHIVE inline via chrome.runtime.sendMessage
- * (wakes SW event-driven per the canonical MV3 wakeup path).
+ * 5. Snapshotting the pre-SAVE zip state, then dispatching SAVE_ARCHIVE
+ * inline via chrome.runtime.sendMessage (wakes SW event-driven per
+ * the canonical MV3 wakeup path).
* 6. Settling for chrome.downloads to finish writing.
- * 7. Locating the produced zip + measuring video/last_30sec.webm size.
+ * 7. Polling downloadsDir for a FRESH archive (race-free), then
+ * measuring video/last_30sec.webm size.
*
* Checks (3 total):
- * - A33.1: SAVE_ARCHIVE ack success after 5-min idle + SW kill
+ * - A33.1: a fresh archive appeared in downloadsDir within the poll
+ * timeout after SAVE_ARCHIVE dispatch (race-free durable
+ * signal — the SAVE actually produced an archive).
* - A33.2: video/last_30sec.webm size > 0 (buffer survived SW eviction)
* - A33.3: video size > 100 KB (sanity floor; real archives 1-3 MB)
*
+ * A33.1 design (debug session .planning/debug/a33-save-ack-race.md):
+ * The chrome.runtime.sendMessage callback ack is NOT a gating check. After
+ * worker.close() force-kills the SW, the SAVE_ARCHIVE message wakes a
+ * FRESH SW instance; that instance runs the multi-step saveArchive()
+ * pipeline (offscreen video-keepalive port re-establishment + REQUEST_BUFFER
+ * round-trip + rrweb collection + zip build). The harness's original
+ * sendMessage response port has its own MV3 lifetime — on a 5-min-aged SW
+ * the pipeline INTERMITTENTLY outruns it, surfacing chrome.runtime.lastError
+ * ("message port closed before a response was received"). The archive is
+ * still written correctly every time (saveArchive() + chrome.downloads
+ * complete regardless of whether the ack reaches the harness). So A33.1
+ * gates on the durable race-free signal — a fresh zip on disk — exactly
+ * as the spike (tests/uat/spike-a33-sw-persistence.ts) does; the ack is
+ * captured as a soft diagnostic only.
+ *
* Env-gating: when this driver runs, the orchestrator does NOT skip the
* 5-min wait — caller should wrap with SKIP_LONG_UAT env-gate at the
* harness.test.ts level. See harness.test.ts for the gate.
@@ -2586,6 +2609,7 @@ const A33_VIDEO_SIZE_FLOOR_BYTES = 100_000;
* References:
* - Plan 04-04 PLAN.md Pattern 4 (revived verbatim under valid methodology)
* - Plan 04-08 PLAN.md Task 2
+ * - .planning/debug/a33-save-ack-race.md (A33.1 race-free reframe)
* - .planning/debug/sw-offscreen-persistence-investigation-session-2.md
* - https://developer.chrome.com/docs/extensions/how-to/test/test-serviceworker-termination-with-puppeteer
*
@@ -2633,10 +2657,22 @@ export async function driveA33(
// Step 4 — brief settle for SW teardown.
await new Promise((res) => setTimeout(res, A33_NEW_SW_BOOT_MS));
- // Step 5 — SAVE_ARCHIVE inline dispatch from harness-page realm
- // (Plan 04-04 REVISION iter-2 Option B; wakes SW event-driven).
- // No dedicated dispatch-save-archive helper symbol is intentionally
- // introduced — see Plan 04-08 Task 2 Step 3 contract.
+ // Step 5 — snapshot the pre-SAVE zip state, then dispatch SAVE_ARCHIVE
+ // inline from the harness-page realm (Plan 04-04 REVISION iter-2
+ // Option B; wakes SW event-driven). No dedicated dispatch-save-archive
+ // helper symbol is intentionally introduced — see Plan 04-08 Task 2
+ // Step 3 contract.
+ //
+ // The sendMessage callback ack is captured as a SOFT DIAGNOSTIC only,
+ // NOT a gating check — see the function doc + debug session
+ // .planning/debug/a33-save-ack-race.md. The freshly-woken SW completes
+ // saveArchive() + writes the archive regardless of whether the original
+ // response port survives long enough for the ack to land; gating on it
+ // is a flaky-by-design test (the ack intermittently surfaces
+ // chrome.runtime.lastError "message port closed before a response was
+ // received" on the worker.close() -> respawn boundary). A33.1 instead
+ // gates on the durable race-free signal — a fresh zip on disk.
+ const preSnapshot = snapshotExistingZips(downloadsDir);
const saveResult = await page.evaluate(
(timeoutMs: number) =>
new Promise<{ success: boolean; error?: string }>((resolve) => {
@@ -2654,25 +2690,29 @@ export async function driveA33(
}),
A33_SAVE_ARCHIVE_TIMEOUT_MS,
);
- checks.push({
- name: 'A33.1: SAVE_ARCHIVE ack success after 5-min idle + SW kill',
- expected: true,
- actual: saveResult.success,
- passed: saveResult.success === true,
- });
+ diagnostics.push(
+ `A33 Step 5: SAVE_ARCHIVE sendMessage ack (soft diagnostic, non-gating) -> ` +
+ `success=${saveResult.success}` +
+ (saveResult.error !== undefined ? ` error="${saveResult.error}"` : ''),
+ );
// Step 6 — settle for chrome.downloads to finish writing.
await new Promise((res) => setTimeout(res, A33_DOWNLOAD_SETTLE_MS));
- // Step 7 — locate the produced zip + measure the video entry.
- const zipPath = findLatestZip(downloadsDir);
+ // Step 7 — poll downloadsDir for a FRESH archive (race-free). This is
+ // the canonical post-SAVE detection used by driveA12/A13/A27 — it
+ // tolerates the CDP `download.zip` overwrite pattern (mtime diff vs the
+ // pre-SAVE snapshot) and uses the stable-size protocol. A33.1 gates on
+ // this: the SAVE provably produced an archive after the 5-min idle +
+ // SW kill, independent of the best-effort sendMessage ack.
+ const zipPath = await pollForNewOrUpdatedZip(downloadsDir, preSnapshot);
+ checks.push({
+ name: 'A33.1: fresh archive written to downloadsDir after 5-min idle + SW kill (race-free; sendMessage ack is a soft diagnostic per .planning/debug/a33-save-ack-race.md)',
+ expected: 'fresh zip within poll timeout',
+ actual: zipPath !== null ? `fresh zip: ${zipPath}` : 'no fresh zip within poll timeout',
+ passed: zipPath !== null,
+ });
if (zipPath === null) {
- checks.push({
- name: 'A33.0: at least one zip present in downloadsDir',
- expected: '>=1 zip',
- actual: 'no zip in downloadsDir',
- passed: false,
- });
return {
passed: false,
name: 'A33 — SW state persistence (5-min idle + SW kill; ROADMAP SC #1)',