mokosh/tests/uat/harness.test.ts
Mark 47e9818cb1 feat(02-04): harness A25 — empirical <5s SAVE→zip latency (REQ-archive-export-latency, SPEC §10 #6)
Wire A25 into the UAT harness as the binding empirical gate for
REQ-archive-export-latency / SPEC §10 #6 (5000ms hard ceiling end-to-end
from SAVE_ARCHIVE dispatch to zip-on-disk).

Architecture:
- Page-side assertA25 records t0 (performance.now) + t0Wall (Date.now)
  + tAck bookends around the chrome.runtime.sendMessage(SAVE_ARCHIVE)
  call. Returns A25Result extending AssertionRecord with the 3 timing
  fields + ackSuccess flag.
- Host-side driveA25(page, downloadsDir) snapshots zip dir BEFORE
  page.evaluate dispatch, polls for new-or-overwritten .zip via mtime
  delta (mirrors A12/A13 overwrite-aware pattern), uses page-supplied
  t0Wall as the host anchor for the dispatch→file-on-disk latency
  check (NOT a host-side Date.now captured before page.evaluate, which
  would include setupFreshRecording + 11s segment-settle wall time and
  always fail the 5s budget).
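
The host-side "snapshot before dispatch, then poll for a new-or-overwritten .zip via mtime delta" step can be sketched roughly as below. This is a minimal illustration, not the actual driveA25 code: the names `snapshotZips`, `findNewOrOverwrittenZip`, `pollForZip` and `demoOverwriteDetection`, and the 50ms poll interval, are hypothetical and chosen only for this example.

```typescript
import { mkdtempSync, readdirSync, statSync, utimesSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/** File name -> mtime (ms since epoch) for every .zip directly under a directory. */
type ZipSnapshot = ReadonlyMap<string, number>;

/** Capture the current .zip files and their mtimes (taken BEFORE the dispatch). */
function snapshotZips(dir: string): ZipSnapshot {
  const snapshot = new Map<string, number>();
  for (const name of readdirSync(dir)) {
    if (name.toLowerCase().endsWith('.zip')) {
      snapshot.set(name, statSync(join(dir, name)).mtimeMs);
    }
  }
  return snapshot;
}

/** Return the first zip that is new, or whose mtime advanced, relative to `before`. */
function findNewOrOverwrittenZip(
  dir: string,
  before: ZipSnapshot,
): { name: string; mtimeMs: number } | null {
  for (const [name, mtimeMs] of snapshotZips(dir)) {
    const previous = before.get(name);
    if (previous === undefined || mtimeMs > previous) {
      return { name, mtimeMs };
    }
  }
  return null;
}

/** Poll until a new-or-overwritten zip appears or the timeout elapses (hypothetical helper). */
async function pollForZip(
  dir: string,
  before: ZipSnapshot,
  timeoutMs: number,
  intervalMs = 50, // assumed interval, not taken from the commit
): Promise<{ name: string; mtimeMs: number } | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const hit = findNewOrOverwrittenZip(dir, before);
    if (hit !== null) return hit;
    if (Date.now() >= deadline) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

/** Self-check against a throwaway temp directory: an overwrite must be detected. */
function demoOverwriteDetection(): string | null {
  const dir = mkdtempSync(join(tmpdir(), 'a25-sketch-'));
  const zipPath = join(dir, 'archive.zip');
  writeFileSync(zipPath, 'old');
  // Backdate the file so the later overwrite has a strictly newer mtime.
  const past = new Date(1_700_000_000_000);
  utimesSync(zipPath, past, past);
  const before = snapshotZips(dir);
  writeFileSync(zipPath, 'new'); // overwrite: same name, newer mtime
  return findNewOrOverwrittenZip(dir, before)?.name ?? null;
}
```

Comparing mtimes rather than only file names matters when a repeated SAVE reuses the same zip file name: a name-only diff would see no change, while the mtime delta still flags the overwrite.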

[Rule 1 - Bug] Initial implementation used host-side Date.now() captured
before page.evaluate as the latency anchor — this incorrectly included
the 11s segment-settle window in the budget. First run observed
A25.3=11188ms (FAIL). Fix: page-side captures Date.now() at the
SAVE_ARCHIVE dispatch instant (AFTER setupFreshRecording + segment-settle
complete) and returns it as t0Wall in A25Result; the driver uses this
as the canonical host anchor. Result on re-run: A25.3=61ms (GREEN, well
under 5s SLO). Documented per T-02-04-02 disposition (bracket only the
SAVE dispatch, not the broader test orchestration).
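
To make the anchor choice concrete, here is a small sketch of the latency arithmetic, assuming the three timing fields and the ackSuccess flag described above. The `SaveTimings` interface, `timeSaveDispatch`, `evaluateLatency` and the strict `<` comparison are assumptions made for illustration; the real A25Result shape and check logic may differ.

```typescript
/** Timing fields named after the commit's description of A25Result; the exact shape is assumed. */
interface SaveTimings {
  readonly t0: number; // performance.now() immediately before the dispatch
  readonly tAck: number; // performance.now() immediately after the ack
  readonly t0Wall: number; // Date.now() at the dispatch instant (host-comparable)
  readonly ackSuccess: boolean;
}

/** 5000ms hard ceiling quoted in the commit message. */
const LATENCY_CEILING_MS = 5000;

/**
 * Bracket ONLY the dispatch call. `dispatch` is a hypothetical stand-in for
 * the SAVE_ARCHIVE message send; any setup and segment-settle waiting must
 * already be finished before this function is called.
 */
async function timeSaveDispatch(dispatch: () => Promise<boolean>): Promise<SaveTimings> {
  const t0Wall = Date.now();
  const t0 = performance.now();
  const ackSuccess = await dispatch();
  const tAck = performance.now();
  return { t0, tAck, t0Wall, ackSuccess };
}

/** Compare both latencies against the ceiling, anchoring the file check at t0Wall. */
function evaluateLatency(
  timings: SaveTimings,
  zipMtimeMs: number,
): { elapsedAckMs: number; dispatchToDiskMs: number; passed: boolean } {
  const elapsedAckMs = timings.tAck - timings.t0;
  const dispatchToDiskMs = zipMtimeMs - timings.t0Wall;
  const passed =
    timings.ackSuccess &&
    elapsedAckMs < LATENCY_CEILING_MS &&
    dispatchToDiskMs < LATENCY_CEILING_MS;
  return { elapsedAckMs, dispatchToDiskMs, passed };
}
```

With the figures reported above, an anchor taken at the dispatch instant yields a 61ms dispatch-to-disk delta and passes, whereas an anchor taken before the segment-settle window yields 11188ms and fails the same check.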

Files modified:
- tests/uat/extension-page-harness.ts (+~115 lines): assertA25 +
  A25_* constants + A25Result interface
- tests/uat/lib/harness-page-driver.ts (+~95 lines): driveA25 +
  A25_HOST_POLL_TIMEOUT_MS const + A25_LATENCY_CEILING_MS const
- tests/uat/harness.test.ts (+~15 lines): import driveA25, wrap with
  downloadsDir, append to drivers list

Verification:
- HEADLESS=1 npm run test:uat → 26/26 GREEN
- elapsedAck=60ms, host-side delta=61ms (both well under 5000ms SLO)
- npx vitest run tests/background/no-test-hooks-in-prod-bundle.test.ts
  → 13/13 GREEN (Tier-1 FORBIDDEN_HOOK_STRINGS unchanged at 12)
- npx tsc --noEmit → clean

Plan 02-04 scope: 2/3 tasks landed (A24 + A25); Task 3 adds
A26 (meta.json 8-field) + A27 (multi-tab strict) + A28 (archive-layout strict).
2026-05-20 16:49:56 +02:00


// tests/uat/harness.test.ts — Plan 01-13 orchestrator (Wave 3A → Task 9).
//
// Top-level entry for the production UAT harness. Drives every registered
// assertion sequentially against a SINGLE launched Chrome instance with
// a SINGLE harness page; bails on the first failure with a structured
// diagnostic dump. Exits 0 only when all assertions are GREEN (26/26 as
// of Plan 02-04 Task 2: A0 plus A1..A25).
//
// Wave 3A scope — wires A0+A1+A2+A3+A4+A6 (A6 via the proven Wave-2
// driver). A5+A7..A13 threw `NOT YET IMPLEMENTED — Wave 3<X> wires this`
// from `tests/uat/lib/harness-page-driver.ts`; the bail-on-first-failure
// loop stopped at the first such throw.
//
// Wave 3B wires A5 (SAVE_ARCHIVE → zip on disk) + A7 (genuine
// RECORDING_ERROR → ERR + recovery notification). Wave 3C wires A8
// (Bug A canonical onStartup-notification regression rewind) + A9 (icon
// file sizes meet imageUtil floors) + A10 (manifest shape contract).
// Wave 3D wires A11+A12+A13 for 14/14 GREEN.
//
// Plan 01-13 Task 9 closure (debug 01-09-save-stops-recording) adds A14:
// post-SAVE auto-stop state check (badge='', popup='', no new
// mokosh-recovery-*). Chains off A13's SAVE_ARCHIVE — read-only
// observation, no new dispatch.
//
// Plan 01-14 adds A23 as the final functional assertion (post-A14 chain):
// read-only inspection of the last `getDisplayMedia` constraints from
// A2's setupFreshRecording; verifies the production call site passes
// `monitorTypeSurfaces: 'include'` (W3C Screen Capture spec §6.1; Chrome
// ≥ 119 picker-narrowing semantics — removes the Window + Chrome-Tab
// panes from the operator's picker dialog). A23 has no side effects
// (the constraints cell is populated by A2 and read by the bridge op);
// hence independent of A14's no-side-effects post-SAVE contract.
// Final target: 16/16 GREEN.
//
// The orchestrator structure is final from Wave 3A onward; future waves
// only fill in the assertion-driver stubs.
//
// Architectural commitments (per 01-11-SUMMARY.md, DO NOT REGRESS):
// - Single browser, single recording per run (state machine: idle →
// A1 reads idle → A2 transitions to REC → A3+A4 read REC →
// A5 saves archive → A6 simulates user-stop → A7 surfaces ERR → ...).
// - A0 (Tier-1 grep gate) runs PRE-FLIGHT before any Chrome launch.
// Mirrors `tests/background/no-test-hooks-in-prod-bundle.test.ts`
// FORBIDDEN_HOOK_STRINGS inventory. Belt-and-suspenders: the unit
// test gate runs in `npm test` (~15s); the UAT-level A0 runs in
// `npm run test:uat` (~60-90s). Same invariant; two independent
// verification paths.
// - Drive Chrome FROM INSIDE: each assertion is a single
// `page.evaluate(() => window.__mokoshHarness.assertXX())` call;
// no SW.evaluate, no popup-bridge (both falsified per 01-11-SUMMARY).
//
// References:
// - puppeteer.launch + extension loading:
// https://pptr.dev/api/puppeteer.launchoptions
// - Node fs.readdirSync recursive walk:
// https://nodejs.org/api/fs.html#fsreaddirsyncpath-options
// - Node child_process.execFileSync:
// https://nodejs.org/api/child_process.html#child_processexecfilesyncfile-args-options
import { execFileSync } from 'node:child_process';
import { existsSync, readFileSync, readdirSync, statSync } from 'node:fs';
import { dirname, resolve as resolvePath } from 'node:path';
import { fileURLToPath } from 'node:url';
import { launchHarnessBrowser } from './lib/launch';
import {
driveA1,
driveA2,
driveA3,
driveA4,
driveA5,
driveA6,
driveA7,
driveA8,
driveA9,
driveA10,
driveA11,
driveA12,
driveA13,
driveA14,
// Plan 01-10 Wave 3 — onboarding + design-swap-readiness
driveA15,
driveA16,
driveA17,
// Plan 01-12 Wave 6 — design integration assertions
driveA18,
driveA19,
driveA20,
driveA21,
driveA22,
// Plan 01-14 — picker-narrowing constraint
driveA23,
// Plan 02-04 Task 1 — D-P2-01 empirical Blob URL verification
driveA24,
// Plan 02-04 Task 2 — REQ-archive-export-latency (5s ceiling)
driveA25,
getManifestVersion,
} from './lib/harness-page-driver';
import {
printAssertionResult,
runAssertion,
type AssertionRecord,
} from './lib/assertions';
/**
* A0 forbidden-string inventory — mirrors
* `tests/background/no-test-hooks-in-prod-bundle.test.ts:FORBIDDEN_HOOK_STRINGS`.
* Keep in sync. The two lists serving the same invariant is intentional
* (belt-and-suspenders per `feedback-pre-checkpoint-bundle-gates.md`):
* unit-test gate catches at `npm test`, UAT gate catches at `npm run test:uat`.
*/
const FORBIDDEN_HOOK_STRINGS: ReadonlyArray<string> = [
'__mokoshTest',
'setCurrentStream',
'setSegmentCountGetter',
'installFakeDisplayMedia',
'uninstallFakeDisplayMedia',
'dispatchEndedOnTrack',
'getSegmentCount',
'__mokoshOffscreenQuery',
'get-display-surface',
'get-segment-count',
// Plan 01-14 A23 surface — lockstep with unit-gate inventory at
// tests/background/no-test-hooks-in-prod-bundle.test.ts:105.
'lastGetDisplayMediaConstraints',
'get-last-getDisplayMedia-constraints',
];
/** Build timeout for the pre-flight production rebuild (matches unit-gate value). */
const PROD_BUILD_TIMEOUT_MS = 60_000;
/** Resolve repo-root paths from this file's location. */
const HARNESS_FILE_DIR = dirname(fileURLToPath(import.meta.url));
const REPO_ROOT = resolvePath(HARNESS_FILE_DIR, '..', '..');
const DIST_DIR = resolvePath(REPO_ROOT, 'dist');
/** Binary extensions skipped during the grep walk (mirror of unit gate). */
const BINARY_EXTENSIONS: ReadonlySet<string> = new Set([
'.png', '.jpg', '.jpeg', '.gif', '.ico', '.webp', '.woff', '.woff2', '.ttf', '.otf',
]);
/**
 * Recursively collect every regular file under `root`. Returns absolute
 * paths sorted alphabetically for stable diagnostics.
 *
 * @param root - Absolute directory path to walk.
 * @returns Sorted list of absolute file paths under `root`.
 */
function listAllFilesRecursive(root: string): ReadonlyArray<string> {
  const accumulator: string[] = [];
  const stack: string[] = [root];
  while (stack.length > 0) {
    const dir = stack.pop()!;
    const entries = readdirSync(dir, { withFileTypes: true });
    for (const entry of entries) {
      const fullPath = resolvePath(dir, entry.name);
      if (entry.isSymbolicLink()) {
        continue;
      }
      if (entry.isDirectory()) {
        stack.push(fullPath);
      } else if (entry.isFile()) {
        accumulator.push(fullPath);
      }
    }
  }
  return accumulator.sort();
}
/**
 * Count occurrences of `needle` in the given file. Returns 0 for binary
 * file extensions (text matching against UTF-8 of a PNG would be
 * meaningless and could yield spurious matches).
 *
 * @param filePath - Absolute file path to scan.
 * @param needle - Literal substring to count.
 * @returns Total occurrences in the file's text.
 */
function countOccurrencesInFile(filePath: string, needle: string): number {
  const dotIdx = filePath.lastIndexOf('.');
  const ext = dotIdx >= 0 ? filePath.substring(dotIdx).toLowerCase() : '';
  if (BINARY_EXTENSIONS.has(ext)) {
    return 0;
  }
  const stat = statSync(filePath);
  if (stat.size === 0) {
    return 0;
  }
  const text = readFileSync(filePath, 'utf8');
  let count = 0;
  let from = 0;
  for (;;) {
    const idx = text.indexOf(needle, from);
    if (idx < 0) {
      break;
    }
    count += 1;
    from = idx + needle.length;
  }
  return count;
}
/**
* A0 — Tier-1 grep gate (UAT-level mirror of the unit-gate). Spawns
* `npm run build` if `SKIP_PROD_REBUILD !== '1'`, then walks `dist/`
* checking every forbidden string. Reports all matches in one pass
* (full enumeration, not bail-on-first) so the operator sees the entire
* leak surface in a single failure.
*
* @returns Structured A0 result: passed flag + list of (string, file) matches.
*/
async function assertA0_GrepGate(): Promise<{
passed: boolean;
matches: Array<{ needle: string; filePath: string; count: number }>;
}> {
if (process.env.SKIP_PROD_REBUILD !== '1') {
process.stdout.write('A0: running `npm run build` (set SKIP_PROD_REBUILD=1 to skip)...\n');
execFileSync('npm', ['run', 'build'], {
stdio: 'inherit',
timeout: PROD_BUILD_TIMEOUT_MS,
});
} else {
process.stdout.write('A0: SKIP_PROD_REBUILD=1 — using existing dist/\n');
}
if (!existsSync(DIST_DIR)) {
return {
passed: false,
matches: [
{
needle: '<missing dist/>',
filePath: DIST_DIR,
count: 0,
},
],
};
}
const files = listAllFilesRecursive(DIST_DIR);
const matches: Array<{ needle: string; filePath: string; count: number }> = [];
for (const needle of FORBIDDEN_HOOK_STRINGS) {
for (const filePath of files) {
const count = countOccurrencesInFile(filePath, needle);
if (count > 0) {
matches.push({ needle, filePath, count });
}
}
}
return { passed: matches.length === 0, matches };
}
/**
* Top-to-bottom orchestrator entry. Pre-flight A0 → launch browser →
* iterate driver list → bail on first failure → close browser → return
* exit code.
*
* Plan 01-13 Task 9 closure (debug 01-09-save-stops-recording) added A14
* after A13; later plans appended A15..A25. The orchestrator now drives
* 25 page-driven assertions (A1..A25) plus the host-side A0 grep gate =
* 26 total.
*
* @returns Process exit code: 0 on 26/26 GREEN, 1 on any failure.
*/
async function main(): Promise<number> {
process.stdout.write('\nMokosh Plan 01-13 + 01-14 + 02-04 — UAT harness orchestrator\n');
process.stdout.write('Architecture: A0 pre-flight + extension-internal page driver (A1..A14, A15..A17, A18..A22, A23, A24, A25)\n');
process.stdout.write('='.repeat(72) + '\n');
// A0 pre-flight (no Chrome launch needed; runs against built dist/).
const a0 = await assertA0_GrepGate();
if (!a0.passed) {
process.stderr.write('\nA0 FAIL: production bundle hook-string leak detected.\n');
for (const m of a0.matches) {
process.stderr.write(` - '${m.needle}' in ${m.filePath} (${m.count} occurrence${m.count === 1 ? '' : 's'})\n`);
}
process.stderr.write(
'\nThe Vite mode gate on the test-hook imports has regressed; verify\n' +
'src/background/index.ts + src/offscreen/recorder.ts still gate via `__MOKOSH_UAT__`.\n',
);
return 1;
}
process.stdout.write('A0: GREEN (production bundle hook-free)\n\n');
// Driver registry — execution order matters:
// A1 (idle) → A2 (REC start) → A3 (displaySurface) → A4 (popup pinned)
// → A5 (SAVE_ARCHIVE) → A6 (Bug B dispatch-ended) → A7 (genuine error)
// → A8 (Bug A onStartup) → A9 (icon sizes) → A10 (manifest)
// → A11 (35s segments) → A12 (ffprobe) → A13 (zip shape).
//
// A6 currently lives mid-list because the prototype's assertA6 does
// its own ensureOffscreen + START_RECORDING (idempotent w.r.t. A2's
// recording), then dispatch-ended. After A6 the recording is torn
// down — A7+ would need to re-start or test post-stop state.
//
// Historical note: at Wave 3C (A8 + A9 + A10 wired on top of A1..A7) the
// bail-on-first-failure loop stopped at the A11 stub, with the expected
// diagnostic
// "11/14 GREEN: A0+A1+A2+A3+A4+A5+A6+A7+A8+A9+A10; A11..A13 NOT YET IMPLEMENTED".
// Wave 3D has since wired A11..A13.
// The standalone `npx tsx tests/uat/a6.test.ts` entry remains the
// way to verify A6 in isolation for inner-loop iteration.
process.stdout.write('Launching Chrome + opening harness page...\n');
const handles = await launchHarnessBrowser();
process.stdout.write(`Extension id: ${handles.extensionId}\n`);
process.stdout.write(`Downloads dir: ${handles.downloadsDir}\n\n`);
// Adapter: driveA5 / driveA12 / driveA13 / driveA25 need
// `handles.downloadsDir` (host-side fs polling). driveA13 additionally
// needs the manifest version (read once at orchestrator startup via the
// page-side `getManifestVersion` helper). All other drivers take only
// `page`. The driver list is constructed AFTER `launchHarnessBrowser`
// returns so the closures can capture handles without a TDZ trap.
const expectedManifestVersion = await getManifestVersion(handles.harnessPage);
process.stdout.write(`Manifest version (for A13): ${expectedManifestVersion}\n\n`);
const driveA5Wrapped: (page: import('puppeteer').Page) => Promise<AssertionRecord> =
(page) => driveA5(page, handles.downloadsDir);
const driveA12Wrapped: (page: import('puppeteer').Page) => Promise<AssertionRecord> =
(page) => driveA12(page, handles.downloadsDir);
const driveA13Wrapped: (page: import('puppeteer').Page) => Promise<AssertionRecord> =
(page) => driveA13(page, handles.downloadsDir, expectedManifestVersion);
// Plan 02-04 Task 2 — driveA25 needs downloadsDir for the host-side
// dispatch→file-on-disk latency check (mirrors A5/A12/A13 wrapping).
const driveA25Wrapped: (page: import('puppeteer').Page) => Promise<AssertionRecord> =
(page) => driveA25(page, handles.downloadsDir);
const drivers: ReadonlyArray<{
readonly name: string;
readonly drive: (page: import('puppeteer').Page) => Promise<AssertionRecord>;
}> = [
{ name: 'A1', drive: driveA1 },
{ name: 'A2', drive: driveA2 },
{ name: 'A3', drive: driveA3 },
{ name: 'A4', drive: driveA4 },
{ name: 'A5', drive: driveA5Wrapped },
{ name: 'A6', drive: driveA6 },
{ name: 'A7', drive: driveA7 },
{ name: 'A8', drive: driveA8 },
{ name: 'A9', drive: driveA9 },
{ name: 'A10', drive: driveA10 },
{ name: 'A11', drive: driveA11 },
{ name: 'A12', drive: driveA12Wrapped },
{ name: 'A13', drive: driveA13Wrapped },
// Plan 01-13 Task 9 closure (debug 01-09-save-stops-recording): A14
// verifies that A13's SAVE_ARCHIVE auto-stopped the recording per
// SPEC one-shot intent. Read-only assertion on chrome.action +
// notification ids state; no new SAVE dispatch — A13's already
// exercised the SAVE path. Recording stays stopped after A14.
{ name: 'A14', drive: driveA14 },
// Plan 01-10 Wave 3: onboarding + design-swap-readiness (read-only).
// Chained AFTER A14 and before A18. A15/A16/A17 inspect the
// welcome-page artifacts whose absence previously made A22 (Plan 01-12
// Wave 6) skip-gate. With Plan 01-10 landed, A22 no longer skip-gates;
// it is a substantive token-usage check.
// A15 — chrome.storage.local 'onboarding-completed' + 'installed-at'
// A16 — 2s settle: no new welcome tabs spontaneously reappear
// A17 — welcome.html parse + .welcome-hero + ≥7 mokosh-keyed +
// welcome.css canonical @import or inlined tokens + zero hex
// (or canonical resolved) + ≥5 var(--mks-*) + bundled JS
// has COPY[ or chrome.i18n.getMessage(welcomeHero +
// getComputedStyle --mks-rec probe resolves
{ name: 'A15', drive: driveA15 },
{ name: 'A16', drive: driveA16 },
{ name: 'A17', drive: driveA17 },
// Plan 01-12 Wave 6 — design integration assertions (read-only;
// independent of A14). Chained here so they execute regardless of
// the recording state machine; they only inspect static brand /
// i18n / token / icon surfaces.
// A18 — Lora WOFF2 reachability + size floor
// A19 — icons NOT the Bug A placeholders
// A20 — manifest:name resolves via chrome i18n
// A21 — --mks-font-display resolves to Lora
// A22 — welcome page tokens.css adoption (CONDITIONAL on 01-10
// landing; with Plan 01-10 landed it executes the
// substantive token-usage path rather than skip-gating)
{ name: 'A18', drive: driveA18 },
{ name: 'A19', drive: driveA19 },
{ name: 'A20', drive: driveA20 },
{ name: 'A21', drive: driveA21 },
{ name: 'A22', drive: driveA22 },
// Plan 01-14 A23: read-only inspection of the last getDisplayMedia
// constraints object captured by A2's setupFreshRecording. Verifies
// the production call at src/offscreen/recorder.ts:270 passes
// `monitorTypeSurfaces: 'include'` (W3C Screen Capture spec §6.1;
// Chrome ≥ 119 picker-narrowing semantics). Independent of A14 —
// no new getDisplayMedia call, no new state change.
{ name: 'A23', drive: driveA23 },
// Plan 02-04 Task 1 A24: D-P2-01 empirical Blob URL verification.
// Installs chrome.downloads.onCreated listener cross-realm, dispatches
// SAVE_ARCHIVE, captures the download URL, asserts the `blob:` prefix
// (closes audit P0-6 end-to-end through a real Chrome instance +
// the offscreen mint round-trip + chrome.downloads platform call).
// A24 does its OWN setupFreshRecording + SAVE because the listener
// must be installed pre-dispatch. After A24 the recording stays alive
// for any chained Plan 02-04 Tasks 2-3 assertions (Phase 2 closure).
{ name: 'A24', drive: driveA24 },
// Plan 02-04 Task 2 A25: REQ-archive-export-latency / SPEC §10 #6.
// Page-side measures SAVE→ack via performance.now() bookends; host-side
// adds the dispatch→file-on-disk latency check via downloadsDir
// polling + mtime delta. Hard ceiling: 5000ms end-to-end. A25 owns
// its setupFreshRecording (clean latency measurement; not compounded
// with A24's still-pending state). The 11s segment-settle is NOT
// counted toward the 5s budget — only the SAVE dispatch.
{ name: 'A25', drive: driveA25Wrapped },
];
const buffers = { swConsole: handles.swConsole, offConsole: handles.offConsole };
const results: Array<{ name: string; passed: boolean; error?: string }> = [];
let bailReason: string | null = null;
try {
for (const { name, drive } of drivers) {
process.stdout.write(`--- ${name} ---\n`);
let driverErr: string | undefined;
let result: AssertionRecord | null = null;
try {
result = await runAssertion(
name,
() => drive(handles.harnessPage),
buffers,
);
printAssertionResult(result);
} catch (err) {
driverErr = err instanceof Error ? err.message : String(err);
// A throw here is either: (a) a Wave-3 stub firing
// (NOT YET IMPLEMENTED) — expected during incremental waves; OR
// (b) a CDP/Puppeteer-level error (e.g. page closed, timeout) —
// a genuine harness regression. Both bail uniformly.
process.stderr.write(`*** ${name} THREW: ${driverErr}\n`);
}
const passed = result !== null && result.passed && driverErr === undefined;
results.push({ name, passed, error: driverErr });
if (!passed) {
bailReason = driverErr ?? `${name} failed; see structured checks above`;
break;
}
}
} finally {
try {
await handles.browser.close();
} catch (closeErr) {
process.stderr.write(`(non-fatal: browser close threw: ${String(closeErr)})\n`);
}
}
const passedCount = results.filter((r) => r.passed).length;
// Total = 1 (A0) + drivers.length (A1..A25) = 26. Plan 01-14 appended
// A23 and Plan 02-04 appended A24 + A25; the running count adapts via
// `drivers.length`, so no manual update is needed when future plans
// extend the chain.
const total = drivers.length + 1;
const finalPassed = passedCount + 1; // +1 for A0 (we already passed it to reach here)
process.stdout.write('\n' + '='.repeat(72) + '\n');
process.stdout.write(
`UAT harness: ${finalPassed}/${total} assertions passed${bailReason !== null ? ` (bailed: ${bailReason})` : ''}\n`,
);
for (const r of results) {
const mark = r.passed ? '[PASS]' : '[FAIL]';
// Separate the error text from the assertion name in the summary line.
const tail = r.error !== undefined ? `: ${r.error}` : '';
process.stdout.write(` ${mark} ${r.name}${tail}\n`);
}
if (bailReason !== null) {
const remainingStart = results.length;
for (let i = remainingStart; i < drivers.length; i += 1) {
process.stdout.write(` [SKIP] ${drivers[i].name} (not reached — bailed at ${results[results.length - 1].name})\n`);
}
}
process.stdout.write('='.repeat(72) + '\n');
return finalPassed === total ? 0 : 1;
}
const code = await main();
process.exit(code);