mokosh/.planning/phases/04-harden-clean-up-optional/04-VALIDATION.md


| phase | slug | status | nyquist_compliant | wave_0_complete | created |
| --- | --- | --- | --- | --- | --- |
| 04 | harden-clean-up-optional | draft | false | false | 2026-05-21 |

Phase 04 — Validation Strategy

Per-phase validation contract for feedback sampling during execution.

Phase 4 character: Final v1 hardening phase. Mix of bug fixes + flake stabilization + audit P1 polish + visual polish + build hygiene + closure aggregator. Wave 0 RED test scaffolds for new audit-P1 fixes; existing harness extended for new A33+ assertions; spike-first for SW state persistence (Plan 04-03 per RESEARCH finding 2).


Test Infrastructure

| Property | Value |
| --- | --- |
| Framework | vitest 4.x (unit) + custom Puppeteer harness (UAT, run via npm run test:uat) |
| Config file | vitest.config.ts + tests/uat/harness.test.ts (orchestrator) |
| Quick run command | npm test -- --run tests/<focused-file>.test.ts |
| Full suite command | npm test -- --run (vitest) + HEADLESS=1 SKIP_PROD_REBUILD=1 npm run test:uat (UAT harness) |
| Estimated runtime | ~50s (vitest 171→~190 tests post Wave 0) + ~95-300s (UAT harness 33→~37 assertions; 5-min idle test adds ~300s to single run if not gated) ≈ 2.5-6 min full sweep |

Sampling Rate

  • After every task commit: Run focused test command (vitest single-file OR npm run test:uat -- --grep A<NN> for harness)
  • After every plan wave: Run full vitest + full UAT harness — both MUST be GREEN
  • Before /gsd-verify-work 4: Full suite GREEN + pre-checkpoint bundle gates 6/6 PASS (per saved memory feedback-pre-checkpoint-bundle-gates.md)
  • 5-min idle test (Plan 04-03): dedicated npm run test:uat:long lane with 6-min timeout; NOT in default sample
  • Max feedback latency: ~2.5 min (default full sweep); ~10s (focused vitest); ~25s (focused UAT assertion)
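
The cadence above amounts to a lookup from checkpoint to command. A minimal TypeScript sketch of that mapping follows. The names `Checkpoint` and `commandsFor` are hypothetical and not part of the repo; the command strings are copied from this document, and `test:uat:long` is the lane proposed for Plan 04-03, so it may not exist yet.

```typescript
// Hypothetical helper: maps a sampling checkpoint to the commands this
// document prescribes. Type and function names are illustrative only.
type Checkpoint =
  | { kind: "task-commit"; target: { vitestFile: string } | { assertionId: string } }
  | { kind: "plan-wave" }
  | { kind: "idle-lane" };

function commandsFor(cp: Checkpoint): string[] {
  switch (cp.kind) {
    case "task-commit":
      // Focused run: one vitest file, or one UAT harness assertion.
      return "vitestFile" in cp.target
        ? [`npm test -- --run ${cp.target.vitestFile}`]
        : [`npm run test:uat -- --grep ${cp.target.assertionId}`];
    case "plan-wave":
      // Full sweep: both suites must be GREEN.
      return [
        "npm test -- --run",
        "HEADLESS=1 SKIP_PROD_REBUILD=1 npm run test:uat",
      ];
    case "idle-lane":
      // Dedicated long lane, kept out of the default sample.
      return ["npm run test:uat:long"];
  }
}
```

Keeping the long lane as its own checkpoint kind makes it harder to fold the 5-min idle test into the default sweep by accident.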

Per-Task Verification Map

| Task ID | Plan | Wave | Requirement | Threat Ref | Secure Behavior | Test Type | Automated Command | File Exists | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Filled by planner during Plans 04-01..08 creation | via planner | via planner | via planner | via planner | via planner | via planner | via planner | via planner | pending |

Status: pending · green · red · ⚠️ flaky

Planner instructions: Populate one row per task. Per RESEARCH finding 5 (6 new Wave-0 test files anticipated), expect ~6 unit-test rows + ~4 harness-A33+ rows + ~4 bundle-gate rows + ~3 docs rows. Format per Phase 3 03-VALIDATION precedent.


Wave 0 Requirements

Per RESEARCH finding 5, 6 new test files anticipated as Wave 0 RED scaffolds for audit-P1 / ROADMAP-SC items:

  • tests/build/no-new-function-in-sw-chunk.test.ts — grep dist/assets/index.ts-*.js for new Function( count = 0 after setimmediate polyfill replacement
  • tests/build/dead-code-grep.test.ts — ROADMAP SC #4: rg permissions\.request + duplicate offscreen inline string in src/ = 0 hits
  • tests/content/fetch-interception.test.ts — audit P1 #11: Request-arg vs string-arg case for args[0]
  • tests/content/navigation-tracking.test.ts — audit P1 #14: module-level previousUrl tracking
  • tests/content/rrweb-timestamps.test.ts — audit P1 #15: epoch vs page-load-relative
  • tests/welcome/inline-svg.test.ts — UI-SPEC: ?raw import + DOMParser inline-SVG + currentColor resolves via CSS
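
To illustrate the kind of case the fetch-interception scaffold (audit P1 #11) is meant to pin down, here is a minimal TypeScript sketch of normalizing `args[0]` to a URL string. It assumes the interceptor only needs the URL. `FetchInput` and `resolveFetchUrl` are hypothetical names rather than the project's code, and the structural types stand in for the platform `URL` (which has `href`) and `Request` (which has `url`).

```typescript
// Structural stand-ins for the three shapes fetch() accepts as its first
// argument: a string, a URL-like object (href), or a Request-like object (url).
type FetchInput = string | { href: string } | { url: string };

// Hypothetical helper: returns the URL string regardless of the argument shape.
function resolveFetchUrl(input: FetchInput): string {
  if (typeof input === "string") return input; // fetch("https://…")
  if ("href" in input) return input.href;      // fetch(new URL(…))
  return input.url;                            // fetch(new Request(…))
}
```

A RED scaffold would presumably assert the Request-shaped case first, since a handler that treats `args[0]` as a string would log `[object Request]` there.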

Existing infrastructure already in place (inherited from Phases 1-3):

  • tests/uat/extension-page-harness.ts — page-side assertA* host (extend with assertA33+)
  • tests/uat/lib/harness-page-driver.ts — host-side driveA* host (extend with driveA33+ + improved driveA29)
  • tests/uat/harness.test.ts — orchestrator (extend)
  • tests/uat/lib/assertions.ts — shared helpers
  • tests/uat/lib/zip.ts — jszip-based archive parsing
  • tests/uat/lib/launch.ts — Puppeteer Chrome launch + extension load
  • tests/background/no-test-hooks-in-prod-bundle.test.ts — FORBIDDEN_HOOK_STRINGS lockstep
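
Both the existing forbidden-hook gate and the planned `new Function(` gate come down to scanning built bundle text for literal strings. A minimal TypeScript sketch of that check follows, operating on an in-memory string. `countOccurrences` and `findForbidden` are hypothetical names. The actual contents of FORBIDDEN_HOOK_STRINGS are not listed in this document, so only the `new Function(` literal from the Wave 0 list is exercised here.

```typescript
// Counts non-overlapping occurrences of `needle` in `source`.
function countOccurrences(source: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let from = source.indexOf(needle);
  while (from !== -1) {
    count += 1;
    from = source.indexOf(needle, from + needle.length);
  }
  return count;
}

// Returns the subset of `forbidden` literals that appear in the bundle text.
// An empty result means the gate passes.
function findForbidden(bundleText: string, forbidden: readonly string[]): string[] {
  return forbidden.filter((literal) => countOccurrences(bundleText, literal) > 0);
}

// Illustrative input only; a real gate would read the built chunk from dist/.
const sampleChunk = "var f = new Function('return this'); setTimeout(g, 0);";
const hits = findForbidden(sampleChunk, ["new Function("]);
```

Using a literal substring match instead of a regular expression avoids escaping mistakes with the trailing parenthesis.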

wave_0_complete: false until 6 new test files committed per planner Wave 0.


Manual-Only Verifications

| Behavior | Requirement | Why Manual | Test Instructions |
| --- | --- | --- | --- |
| Dark-surface logo visual contrast judgment (WCAG ratio not specified in UI-SPEC) | UI-SPEC acceptance criterion #2 + operator empirical | Aesthetic judgment of deep-indigo stroke on madder-orange wrapper in dark-OS context; UI-SPEC defers to operator empirical checkpoint | Load extension in OS dark mode; open welcome page; verify mark wrapper still madder, stroke is deep-indigo, contrast is acceptable. Plan 04-06 operator empirical UAT cycle covers this. |
| Cursor visibility verification | ROADMAP cursor visibility item per D-P4-03 | Already shipped at src/offscreen/recorder.ts:285 per Plan 01-09 (RESEARCH finding 4); Plan 04-06 downgrades to verification + stale-note correction; minor manual check of one captured frame to confirm cursor present | Load extension, start recording, click anywhere, SAVE archive; open video/last_30sec.webm in Chrome; verify cursor visible in playback. |

All other Phase 4 behaviors have automated verification via vitest (Wave 0 unit tests) + UAT harness (A33+) + bundle gates.


Validation Sign-Off

  • All tasks have <automated> verify or Wave 0 dependencies — pending planner fill-in
  • Sampling continuity: no 3 consecutive tasks without automated verify — verify after planner fills table
  • Wave 0 covers all MISSING references — 6 new test files anticipated; planner confirms count
  • No watch-mode flags — verify in planner output (focused commands use --run)
  • Feedback latency < ~2.5 min default (5-min idle test on dedicated lane) — confirmed by RESEARCH
  • nyquist_compliant: true set in frontmatter — pending sign-off after planner completes
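
Once the per-task map is filled in, the sampling-continuity item can be checked mechanically. A minimal TypeScript sketch follows, assuming each row reduces to a task ID and a flag for whether it has an automated verify command. `TaskRow` and `firstSamplingGap` are hypothetical names, and the task IDs used with it are placeholders, not real plan entries.

```typescript
// Hypothetical reduced view of one row in the Per-Task Verification Map.
interface TaskRow {
  id: string;
  hasAutomatedVerify: boolean;
}

// Returns the IDs of the first run of `maxRun` consecutive tasks that lack
// automated verification, or an empty array if no such run exists.
function firstSamplingGap(rows: readonly TaskRow[], maxRun = 3): string[] {
  let run: string[] = [];
  for (const row of rows) {
    // A verified task resets the run; an unverified one extends it.
    run = row.hasAutomatedVerify ? [] : [...run, row.id];
    if (run.length >= maxRun) return run;
  }
  return [];
}
```

An empty result corresponds to the "no 3 consecutive tasks without automated verify" item passing; a non-empty result names the tasks the planner would need to revisit.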

Approval: pending (planner fills per-task map; checker validates)