mokosh/.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-04-PLAN.md
Mark b3bfbf4a8d feat(03): plans 01-05 — Phase 3 SPEC §10 smoke + DOM/event-log verification
5 plans across 5 waves (Wave 2 sequential per RESEARCH Pitfall 6 file overlap):
- 03-01 Wave 1: rrweb DOM verification harness extension (A29; REQ-rrweb-dom-buffer; §10 #4)
- 03-02 Wave 2: event-log verification harness extension (A30; REQ-user-event-log; §10 #5)
- 03-03 Wave 3: §10 #8 password-filter PARTIAL verification (A31; D-P3-02 charter)
- 03-04 Wave 4: §10 #9 RAM ceiling best-effort + Page.metrics scaffolding (A32; D-P3-04)
- 03-05 Wave 5: §10 sweep VERIFICATION.md + REQUIREMENTS/ROADMAP/STATE marker flips
  (REQ-install-clean + REQ-rrweb-dom-buffer + REQ-user-event-log)

Each plan has:
- frontmatter (wave + depends_on + files_modified + autonomous + requirements + tags + must_haves)
- tasks with mandatory <read_first> + <acceptance_criteria> + concrete <action>
- <threat_model> block per security gate
- Validation map row(s) added to 03-VALIDATION.md (10 tasks total)

Expected UAT growth: 29/29 → 33/33 GREEN (A29-A32 + 03-05 docs).
Expected vitest baseline preserved: 171/171.
Expected Tier-1 FORBIDDEN_HOOK_STRINGS: 12 (A29+ ride production surfaces only).

ROADMAP.md Phase 3 entry replaces "Plans: TBD" with full 5-plan list.
VALIDATION.md status: planner_filled (nyquist_compliant: true; wave_0_complete: true).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 19:01:21 +02:00

---
phase: 03
slug: spec-10-smoke-verification-dom-event-log-verification
plan: 04
type: execute
wave: 4
depends_on:
- 01
- 02
- 03
files_modified:
- tests/uat/lib/harness-page-driver.ts
- tests/uat/harness.test.ts
autonomous: true
requirements: []
tags:
- uat-harness
- a32
- ram-ceiling
- spec-10-9-best-effort
- approach-b
- page-metrics
- charter-d-p3-04
user_setup: []
must_haves:
  truths:
    - "puppeteer.Page.metrics() returns a JSHeapUsedSize value (>= 0) for the harness page realm"
    - "JSHeapUsedSize for the harness page realm is below 50 MB (page-realm only; SW context excluded per RESEARCH Pitfall 2)"
    - "Driver emits an explicit diagnostic line: 'NOTE: page-realm only; SW context excluded' (prevents operator misinterpretation)"
    - "UAT harness exits 0 with 32 + 1 = 33/33 assertions GREEN (A31 baseline preserved + new A32)"
  artifacts:
    - path: "tests/uat/lib/harness-page-driver.ts"
      provides: "driveA32 host-side Page.metrics scaffolding (best-effort; explicit page-realm-only diagnostic)"
      contains: "driveA32"
    - path: "tests/uat/harness.test.ts"
      provides: "driveA32 import + drivers-array push entry (no wrapped driver — Page.metrics needs only page, not downloadsDir)"
      contains: "driveA32"
  key_links:
    - from: "tests/uat/harness.test.ts"
      to: "tests/uat/lib/harness-page-driver.ts driveA32"
      via: "import + drivers-array push"
      pattern: "driveA32"
    - from: "tests/uat/lib/harness-page-driver.ts driveA32"
      to: "puppeteer.Page.metrics() CDP Performance.getMetrics"
      via: "await page.metrics()"
      pattern: "page.metrics\\(\\)"
---
<objective>
Extend the UAT harness with A32 — best-effort scaffolding for SPEC §10 #9
(extension background RAM ≤ 50 MB). Per D-P3-04 locked decision: this is
best-effort + operator-driven. The harness DOES NOT measure the MV3
service worker heap (RESEARCH Pitfall 2: Page.metrics is page-realm
only). The genuine binding §10 #9 gate is the operator's
`chrome://memory-internals` observation, recorded in Plan 03-05
VERIFICATION.md `human_verification` block.
A32 SHIPS the optional Page.metrics scaffolding per RESEARCH Open
Question 3 recommendation (~30 lines; cost-cheap; informational value).
Diagnostic output explicitly states the page-realm scope so the
operator never confuses an automation GREEN with full §10 #9 closure.
Purpose: Provides a low-cost informational floor for page-realm heap
usage and exercises the puppeteer.Page.metrics API end-to-end so Phase
4 (programmatic RAM measurement upgrade) inherits a working scaffold.
Output: A32 assertion with 2 host-side checks (Page.metrics returned
JSHeapUsedSize >= 0 + JSHeapUsedSize < 50 MB) + an explicit diagnostic
line about page-realm scope; UAT count 32 → 33 GREEN.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/REQUIREMENTS.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-CONTEXT.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-RESEARCH.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-PATTERNS.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-01-PLAN.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-02-PLAN.md
@.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-03-PLAN.md
<interfaces>
<!-- Key contracts the executor needs. -->
From puppeteer ^25.0.2 (Page.metrics):
interface Metrics {
  Timestamp?: number;
  Documents?: number;
  Frames?: number;
  JSEventListeners?: number;
  Nodes?: number;
  LayoutCount?: number;
  RecalcStyleCount?: number;
  LayoutDuration?: number;
  RecalcStyleDuration?: number;
  ScriptDuration?: number;
  TaskDuration?: number;
  JSHeapUsedSize?: number; // <- bytes; the field A32 reads
  JSHeapTotalSize?: number;
}
page.metrics(): Promise<Metrics>;
From RESEARCH.md §"Code Example A3X":
- Page.metrics is page-realm only — JSHeapUsedSize covers V8 isolate
of THIS Page, NOT the MV3 service worker (separate target).
- 50 MB threshold per SPEC §10 #9; treat as best-effort floor for the
page realm alone.
- Diagnostic copy gate: emit
'NOTE: page-realm only; SW context measurement requires
chrome://memory-internals operator verification per D-P3-04.'
From src/shared/types.ts: no UserEvent / type changes for A32.
</interfaces>
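
The unit handling described in the interface notes above can be sketched as a pure function. This is a minimal illustration only: `MetricsLike`, `HeapVerdict`, and `classifyPageHeap` are hypothetical names that do not exist in the harness, and the local type mirrors just the two heap fields of the Metrics interface.

```typescript
// Hypothetical sketch: none of these names are part of the harness.
interface MetricsLike {
  JSHeapUsedSize?: number; // bytes; optional, as in the Metrics interface
  JSHeapTotalSize?: number;
}

const CEILING_BYTES = 50 * 1024 * 1024; // SPEC §10 #9 threshold
const BYTES_PER_MB = 1024 * 1024;

interface HeapVerdict {
  usedBytes: number; // -1 when the field is missing
  usedMB: number; // -1 when the field is missing
  reported: boolean; // corresponds to check A32.1
  belowCeiling: boolean; // corresponds to check A32.2
}

function classifyPageHeap(metrics: MetricsLike): HeapVerdict {
  const usedBytes = metrics.JSHeapUsedSize ?? -1;
  const reported = usedBytes >= 0;
  return {
    usedBytes,
    usedMB: reported ? usedBytes / BYTES_PER_MB : -1,
    reported,
    // Strictly less than the ceiling: exactly 50 MB does not pass.
    belowCeiling: reported && usedBytes < CEILING_BYTES,
  };
}
```

Comparing in bytes rather than in megabytes keeps the threshold check free of floating-point division; the MB value is only needed for diagnostic copy.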
# Plan Anchors
- **Sequential wave assignment (per RESEARCH Pitfall 6 + file-overlap rule):** Plan 03-04 lives in wave 4
and modifies tests/uat/lib/harness-page-driver.ts + tests/uat/harness.test.ts
(the SAME files as Plans 03-01..03; depends_on enforces sequential execution).
- **NO page-side assertion needed.** Page.metrics is a host-side
puppeteer API. Unlike A24..A31, A32 does NOT call assertA32 inside
page.evaluate — there's no need for a window.__mokoshHarness method.
This is consistent with how the host-side latency portion of A25 is
computed; A32 is similar but skips the page-side entirely.
- **No setupFreshRecording, no SAVE, no zip read.** A32 measures the
current heap state of the harness page; no archive is produced.
- **RESEARCH Pitfall 2 mitigation (HARD):** the diagnostic line about
page-realm scope MUST be emitted regardless of pass/fail. This
prevents an operator from glancing at "A32 GREEN" and concluding §10
#9 is closed.
- **50 MB threshold:** SPEC §10 #9 + CON-ram-ceiling. Page-realm typical
values: a few MB (Plan 02-04 harness measurements show ~2-8 MB
page-realm heap during recording). Far below the 50 MB ceiling on
any reasonable run.
- **FORBIDDEN_HOOK_STRINGS lockstep:** A32 is host-side only; Page.metrics
is not bundled to the page. Tier-1 inventory stays at 12 entries.
- **A6 in the RESEARCH Assumptions Log (MEDIUM risk) notes:** "if Plan 03-04
scaffolding requires a new bridge op (e.g., `get-page-metrics` from
offscreen → harness), that would add 1-2 entries." This plan AVOIDS
that: Page.metrics is read from the host puppeteer object directly, so
no new bridge ops and no new __MOKOSH_UAT__ symbols are added.
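
The Pitfall 2 requirement above (the scope caveat is emitted regardless of pass or fail) can be sketched by pushing the note before the call that might throw. This is an illustration under stated assumptions: `PageLike` and `runHeapProbe` are hypothetical names, and the fake pages in the usage lines stand in for a real puppeteer `Page`.

```typescript
// Hypothetical sketch: PageLike and runHeapProbe are illustrative names only.
interface PageLike {
  metrics(): Promise<{ JSHeapUsedSize?: number }>;
}

const SCOPE_NOTE =
  'NOTE: page-realm only; SW context measurement requires chrome://memory-internals operator verification per D-P3-04.';

async function runHeapProbe(
  page: PageLike,
): Promise<{ passed: boolean; diagnostics: string[] }> {
  // The caveat goes in first, before anything that can fail, so it is
  // present on every code path.
  const diagnostics: string[] = [SCOPE_NOTE];
  let usedBytes = -1;
  try {
    usedBytes = (await page.metrics()).JSHeapUsedSize ?? -1;
  } catch (err) {
    diagnostics.push(
      `metrics threw: ${err instanceof Error ? err.message : String(err)}`,
    );
  }
  return {
    passed: usedBytes >= 0 && usedBytes < 50 * 1024 * 1024,
    diagnostics,
  };
}

// Usage with fake pages (assumed stand-ins, not puppeteer objects):
const failingPage: PageLike = {
  metrics: () => Promise.reject(new Error('target closed')),
};
const healthyPage: PageLike = {
  metrics: () => Promise.resolve({ JSHeapUsedSize: 3000000 }),
};
```

With `failingPage` the probe resolves to a failed result whose first diagnostic is still the scope note; with `healthyPage` it passes and the note still leads.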
</context>
<tasks>
<task type="auto" tdd="false">
<name>Task 1: Add driveA32 host-side (puppeteer.Page.metrics scaffolding) + orchestrator wiring</name>
<files>tests/uat/lib/harness-page-driver.ts, tests/uat/harness.test.ts</files>
<read_first>
- tests/uat/lib/harness-page-driver.ts (read the whole file; note in particular that driveA1 is a 1-line page.evaluate wrapper, whereas A32 is purely host-side)
- tests/uat/harness.test.ts where Plan 03-03 added driveA31 + driveA31Wrapped + drivers-array entry (study shape)
- .planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-RESEARCH.md §"Code Example A3X" (canonical scaffolding shape; verbatim copy)
- .planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-RESEARCH.md §"Pitfall 2" (diagnostic-copy gate)
</read_first>
<behavior>
Host-side (`tests/uat/lib/harness-page-driver.ts`):
- Adds `export async function driveA32(page: Page): Promise<AssertionRecord>`:
  - Pushes the mandatory diagnostic FIRST, before the metrics call and before any other diagnostic: `'NOTE: page-realm only; SW context measurement requires chrome://memory-internals operator verification per D-P3-04.'`
  - Calls `const metrics = await page.metrics();` inside try/catch; if the call throws, the heap values stay at -1 and the error message is recorded as a diagnostic.
  - Computes `jsHeapBytes = metrics.JSHeapUsedSize ?? -1;`
  - Computes `const jsHeapMB = jsHeapBytes >= 0 ? jsHeapBytes / (1024 * 1024) : -1;`
  - Pushes informational diagnostics: `JSHeapUsedSize=${jsHeapBytes} bytes` and `JSHeapTotalSize=${metrics.JSHeapTotalSize ?? -1} bytes`
  - Pushes A32.1 (Page.metrics returned JSHeapUsedSize): expected '>= 0', actual `jsHeapBytes`, passed `jsHeapBytes >= 0`
  - Pushes A32.2 (page-realm JS heap < 50 MB): expected '< 50 MB', actual `${jsHeapMB.toFixed(2)} MB`, passed `jsHeapBytes >= 0 && jsHeapBytes < A32_RAM_CEILING_BYTES`
  - Returns an AssertionRecord with `passed = checks.every(c => c.passed)`
- The new constant `A32_RAM_CEILING_BYTES = 50 * 1024 * 1024` makes the threshold readable.
Orchestrator (`tests/uat/harness.test.ts`):
- Adds `driveA32,` to the import block (after `driveA31,`).
- NO `driveA32Wrapped` const needed (driveA32 takes only `page`).
- Adds `{ name: 'A32', drive: driveA32 },` to the drivers array AFTER the A31 entry, with a banner comment citing D-P3-04 + Pitfall 2.
- Updates the orchestrator banner line to append `, A32`.
</behavior>
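
The record that the behavior list describes can be sketched as follows. The real `CheckRecord` and `AssertionRecord` types live in the harness and are not shown in this plan, so the shapes below are assumptions inferred from the fields the driver populates; `CheckRecordSketch`, `AssertionRecordSketch`, and `buildRecord` are hypothetical names.

```typescript
// Assumed shapes, inferred from the fields used in this plan; the real
// harness types may differ.
interface CheckRecordSketch {
  name: string;
  expected: string;
  actual: string | number;
  passed: boolean;
}

interface AssertionRecordSketch {
  name: string;
  passed: boolean;
  checks: CheckRecordSketch[];
  diagnostics: string[];
  error?: string;
}

function buildRecord(
  name: string,
  checks: CheckRecordSketch[],
  diagnostics: string[],
  error?: string,
): AssertionRecordSketch {
  const record: AssertionRecordSketch = {
    name,
    // Array.prototype.every returns true for an empty array, so a driver
    // that pushed no checks would count as passed.
    passed: checks.every((c) => c.passed),
    checks,
    diagnostics,
  };
  if (error !== undefined) record.error = error;
  return record;
}
```

Because `every` is vacuously true on an empty list, a driver of this shape should push both checks unconditionally, including on the error path, rather than returning early.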
<action>
1. Open `tests/uat/lib/harness-page-driver.ts`. At the end of the file (AFTER driveA31 added by Plan 03-03), append:
```typescript
/* ─── Plan 03-04 — driveA32 (RAM scaffolding best-effort) ──────────── */

/** RAM ceiling per SPEC §10 #9 + CON-ram-ceiling. */
const A32_RAM_CEILING_BYTES = 50 * 1024 * 1024;
/** Bytes-per-MB factor for diagnostic copy. */
const A32_BYTES_PER_MB = 1024 * 1024;

/**
 * Drive A32 (Plan 03-04 — SPEC §10 #9 RAM best-effort per D-P3-04).
 *
 * Reads puppeteer.Page.metrics() against the harness page and asserts
 * JSHeapUsedSize is below the 50 MB ceiling. This is informational
 * scaffolding ONLY:
 *
 *   - RESEARCH Pitfall 2: Page.metrics is page-realm only. The MV3
 *     service worker is a separate Puppeteer target with its own V8
 *     isolate; page.metrics() does not aggregate across workers/iframes.
 *   - The page-realm value reported here is NOT the operator-facing
 *     "extension background RAM" measurement that SPEC §10 #9 requires.
 *   - The binding §10 #9 gate lives in Plan 03-05 VERIFICATION.md
 *     `human_verification` block (operator runs chrome://memory-internals
 *     OR chrome://extensions service-worker memory display).
 *
 * Why ship this anyway (per RESEARCH Open Question 3):
 *   - Low cost (~30 lines; single API call; no new bundle surface).
 *   - Exercises the Page.metrics API end-to-end so Phase 4 (programmatic
 *     RAM measurement upgrade) inherits a working scaffold.
 *   - Provides a sanity floor — if the harness page-realm heap ever
 *     blows past 50 MB, something has gone catastrophically wrong in
 *     the test infrastructure itself (not necessarily a §10 #9 regression
 *     in production).
 *
 * The diagnostic line about page-realm scope MUST be emitted regardless
 * of pass/fail per Pitfall 2.
 *
 * @param page - The harness page from `launchHarnessBrowser`.
 * @returns AssertionRecord with 2 checks (heap returned + heap < 50 MB)
 *          + explicit page-realm-only diagnostic.
 */
export async function driveA32(page: Page): Promise<AssertionRecord> {
  const checks: CheckRecord[] = [];
  const diagnostics: string[] = [];

  // Pitfall 2 gate: emit the page-realm caveat BEFORE any other diagnostic
  // so it leads in the structured output (the operator sees it first).
  diagnostics.push(
    'NOTE: page-realm only; SW context measurement requires chrome://memory-internals operator verification per D-P3-04.',
  );

  let metricsErr: string | null = null;
  let jsHeapBytes = -1;
  let jsHeapTotal = -1;
  try {
    const metrics = await page.metrics();
    jsHeapBytes = metrics.JSHeapUsedSize ?? -1;
    jsHeapTotal = metrics.JSHeapTotalSize ?? -1;
  } catch (err) {
    metricsErr = err instanceof Error ? err.message : String(err);
  }

  const jsHeapMB = jsHeapBytes >= 0 ? jsHeapBytes / A32_BYTES_PER_MB : -1;
  diagnostics.push(`A32 JSHeapUsedSize=${jsHeapBytes} bytes (${jsHeapMB.toFixed(2)} MB)`);
  diagnostics.push(`A32 JSHeapTotalSize=${jsHeapTotal} bytes`);
  if (metricsErr !== null) {
    diagnostics.push(`A32 Page.metrics threw: ${metricsErr}`);
  }

  checks.push({
    name: 'A32.1: Page.metrics returned a JSHeapUsedSize value >= 0',
    expected: '>= 0',
    actual: jsHeapBytes,
    passed: jsHeapBytes >= 0,
  });
  checks.push({
    name: `A32.2: Page-realm JS heap < ${A32_RAM_CEILING_BYTES / A32_BYTES_PER_MB} MB (NOTE: scaffolding only; SW context excluded per D-P3-04)`,
    expected: `< ${A32_RAM_CEILING_BYTES / A32_BYTES_PER_MB} MB`,
    actual: jsHeapMB >= 0 ? `${jsHeapMB.toFixed(2)} MB` : 'unavailable',
    passed: jsHeapBytes >= 0 && jsHeapBytes < A32_RAM_CEILING_BYTES,
  });

  const passed = checks.every((c) => c.passed);
  return {
    passed,
    name: 'A32 — RAM scaffolding (best-effort; page-realm only per D-P3-04 / SPEC §10 #9)',
    checks,
    diagnostics,
    error: metricsErr ?? undefined,
  };
}
```
2. Open `tests/uat/harness.test.ts`. In the import block from `./lib/harness-page-driver`, AFTER `driveA31,` and BEFORE `getManifestVersion,` add:
```typescript
// Plan 03-04 — RAM scaffolding best-effort (SPEC §10 #9 per D-P3-04)
driveA32,
```
3. In the drivers array, AFTER the `{ name: 'A31', ... }` entry from Plan 03-03, add:
```typescript
// Plan 03-04 A32: RAM scaffolding (SPEC §10 #9 best-effort per D-P3-04).
// NOTE — Page.metrics is page-realm only; SW context is a separate
// Puppeteer target (RESEARCH Pitfall 2). A32 is informational
// scaffolding; the binding §10 #9 gate lives in Plan 03-05
// VERIFICATION.md `human_verification` block. No wrapped const
// needed — driveA32 takes only `page`.
{ name: 'A32', drive: driveA32 },
```
4. Update the orchestrator banner line (line 268) to append `, A32`:
```typescript
process.stdout.write('Architecture: A0 pre-flight + extension-internal page driver (A1..A14, A15..A17, A18..A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32)\n');
```
5. Run `npx tsc --noEmit`. Expected: clean.
6. Run `HEADLESS=1 SKIP_PROD_REBUILD=0 npm run test:uat`. Expected: `33/33 GREEN`.
</action>
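
The drivers-array wiring in steps 2 to 4 can be pictured with a small orchestrator loop. This is a sketch, not the contents of `tests/uat/harness.test.ts`: `DriverEntry` and `runDrivers` are hypothetical names, the sequential loop is an assumption, and only the summary line format is taken from this plan's acceptance criteria.

```typescript
// Hypothetical sketch of a drivers-array orchestrator.
interface DriverEntry<P> {
  name: string;
  drive: (page: P) => Promise<{ passed: boolean }>;
}

async function runDrivers<P>(
  page: P,
  drivers: DriverEntry<P>[],
): Promise<string> {
  let passed = 0;
  for (const entry of drivers) {
    // Assumption: drivers run one at a time because they share one page.
    const result = await entry.drive(page);
    if (result.passed) passed += 1;
  }
  return `UAT harness: ${passed}/${drivers.length} assertions passed`;
}

// A driver that needs only `page` slots in directly; one that needs an
// extra argument is wrapped in a closure first.
const needsOnlyPage = async (_page: object) => ({ passed: true });
const needsExtraArg = async (_page: object, dir: string) => ({
  passed: dir.length > 0,
});
const exampleDrivers: DriverEntry<object>[] = [
  { name: 'direct', drive: needsOnlyPage },
  { name: 'wrapped', drive: (p) => needsExtraArg(p, '/tmp/downloads') },
];
```

This is why a driver with the signature `(page) => Promise<...>` needs no wrapped const: its signature already matches the array entry's `drive` field.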
<verify>
<automated>npx tsc --noEmit &amp;&amp; D=$(grep -c "driveA32" tests/uat/lib/harness-page-driver.ts); test "$D" -ge 2 &amp;&amp; H=$(grep -c "driveA32" tests/uat/harness.test.ts); test "$H" -ge 2 &amp;&amp; grep -q "NOTE: page-realm only" tests/uat/lib/harness-page-driver.ts &amp;&amp; HEADLESS=1 SKIP_PROD_REBUILD=0 npm run test:uat</automated>
</verify>
<acceptance_criteria>
- `npx tsc --noEmit` exits 0.
- `grep -c 'driveA32' tests/uat/lib/harness-page-driver.ts` returns >=2.
- `grep -c 'driveA32' tests/uat/harness.test.ts` returns >=2 (import line + drivers-array push; no wrapped const).
- `grep -c 'NOTE: page-realm only' tests/uat/lib/harness-page-driver.ts` returns exactly 1.
- `grep -c 'await page.metrics()' tests/uat/lib/harness-page-driver.ts` returns exactly 1 (the driveA32 JSDoc also mentions `page.metrics()`, so the bare pattern would match 2 lines).
- `grep -c 'A32_RAM_CEILING_BYTES' tests/uat/lib/harness-page-driver.ts` returns >=2 (declaration + usage).
- `HEADLESS=1 SKIP_PROD_REBUILD=0 npm run test:uat` exits 0 with stdout containing `UAT harness: 33/33 assertions passed` AND the diagnostic line `NOTE: page-realm only; SW context measurement requires chrome://memory-internals operator verification per D-P3-04.` (printed by printAssertionResult on A32).
- `npm test -- --run tests/background/no-test-hooks-in-prod-bundle.test.ts` exits 0 (Tier-1 inventory stays at 12).
</acceptance_criteria>
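
The `grep -c` criteria above count matching lines, not occurrences, and they count comment lines as well as code. The sketch below mimics that behaviour on an in-memory string so the expected counts can be reasoned about before running the real command. `countMatchingLines` is a hypothetical helper and the sample text is invented for illustration; it uses literal substring matching, whereas grep treats `.` as a wildcard.

```typescript
// Hypothetical helper: approximates `grep -c` with literal substring matching.
function countMatchingLines(text: string, needle: string): number {
  return text
    .split('\n')
    .filter((line) => line.indexOf(needle) !== -1).length;
}

// Invented sample: a doc-comment mention, the real call, and one line that
// contains the pattern twice.
const sampleSource = [
  ' * isolate; page.metrics() does not aggregate across workers/iframes.',
  '    const metrics = await page.metrics();',
  '// page.metrics() vs page.metrics(): two occurrences, one line',
].join('\n');

const bareCount = countMatchingLines(sampleSource, 'page.metrics()');
const awaitCount = countMatchingLines(sampleSource, 'await page.metrics()');
```

Here `bareCount` is 3 and `awaitCount` is 1: a pattern that also appears in comments inflates the count, so an "exactly N" criterion is most robust when the pattern is specific to the code line being checked.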
<done>
UAT harness runs 33/33 GREEN. A32 emits the page-realm-only diagnostic
on EVERY run (pass or fail). FORBIDDEN_HOOK_STRINGS unchanged at 12.
Page.metrics scaffolding lives in the harness for Phase 4 to upgrade.
The binding §10 #9 gate remains operator-driven and is recorded as
human_verification in Plan 03-05.
</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| Puppeteer host ↔ CDP | Page.metrics is a thin wrapper over CDP Performance.getMetrics; runs in the puppeteer host process, no extension code path |
| Page realm ↔ host realm | A32 does NOT use page.evaluate; no new contract between page and host |
| dist-test/ ↔ dist/ | Two-bundle separation: Plan 03-04 adds NO test-only symbols; production bundle invariant unchanged |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-03-04-01 | Repudiation | Operator interprets A32 GREEN as full §10 #9 closure, skips chrome://memory-internals check | mitigate | Mandatory diagnostic line `'NOTE: page-realm only; SW context measurement requires chrome://memory-internals operator verification per D-P3-04.'` emitted on EVERY run; check name itself includes the caveat; Plan 03-05 VERIFICATION.md explicitly lists §10 #9 in `human_verification` block. Three layers of operator-visible signal. |
| T-03-04-02 | Information Disclosure | Test-only hook surface leaking to production bundle | mitigate | A32 is host-side only; Page.metrics is not bundled to the page realm. FORBIDDEN_HOOK_STRINGS unchanged at 12 entries. |
| T-03-04-03 | Denial of Service | Page.metrics returns 0 or throws on first call after browser launch | mitigate | A32 wraps the call in try/catch + falls through gracefully (jsHeapBytes stays -1; A32.1 RED with clear diagnostic). Per A3 in RESEARCH Assumptions Log, Page.metrics has been stable since Puppeteer 1.x; failure is extremely unlikely on 25.0.2. |
| T-03-04-04 | Elevation of Privilege | New chrome.* permission grant for measurement | accept | A32 uses zero chrome.* APIs. Page.metrics is a CDP call, not an extension API. No manifest delta. |
No new production surface; threat surface unchanged from Plan 03-03.
UAT harness extension is test-only and adds no bundle surface (Page.metrics
is host-side only).
</threat_model>
<verification>
- `npx tsc --noEmit` exits 0.
- `HEADLESS=1 SKIP_PROD_REBUILD=0 npm run test:uat` exits 0 with 33/33 GREEN.
- The diagnostic line `NOTE: page-realm only; SW context measurement requires chrome://memory-internals operator verification per D-P3-04.` appears in stdout from A32.
- `npm test -- --run tests/background/no-test-hooks-in-prod-bundle.test.ts` exits 0 (12 FORBIDDEN_HOOK_STRINGS × 0 hits each).
</verification>
<success_criteria>
- A32 GREEN with 2 checks (heap returned + heap < 50 MB).
- Pitfall 2 diagnostic emitted on every run.
- Page.metrics scaffolding in place for Phase 4 to upgrade.
- FORBIDDEN_HOOK_STRINGS unchanged at 12 entries.
- vitest baseline preserved (171/171 GREEN).
- Plan 03-05 will record §10 #9 as `human_verification` regardless of A32
status — A32 is informational scaffolding, NOT the binding gate.
</success_criteria>
<output>
After completion, create
`.planning/phases/03-spec-10-smoke-verification-dom-event-log-verification/03-04-SUMMARY.md`
documenting:
- A32 host-side-only scaffolding rationale (no page-side; Page.metrics is host)
- D-P3-04 + Pitfall 2 compliance (mandatory page-realm-only diagnostic)
- Phase 4 inheritance: programmatic RAM measurement upgrade path
- UAT 32 → 33 GREEN; Tier-1 inventory unchanged at 12
- Plan 03-05 wave dependency: VERIFICATION.md aggregator; depends on Plans 03-01..04 GREEN
</output>