
Mark 526ac78046 docs(04): create phase plan — 7 plans for Phase 4 hardening (audit P1 polish + flake stabilization + SW persistence + visual polish + closure)

Wave structure:
- W1 (parallel): 04-01 (Audit P1 polish #11/#14/#15 TDD) + 04-02 (build/CSP hygiene: setimmediate polyfill + dead-code + generate-icons.cjs)
- W2: 04-03 (A29 cs-injection-world rewrite; closes flake)
- W3: 04-04 (A33 SW state persistence; spike-first + CDP worker.close())
- W4: 04-05 (A34 fetch+XHR network_error; ROADMAP SC #2 + validates Plan 04-01 P1 #11 end-to-end)
- W5: 04-06 (dark-logo currentColor + cursor verification + 01-07-SUMMARY back-patch; operator empirical)
- W6: 04-07 (04-VERIFICATION.md aggregator + ROADMAP backfill + v1 close prep)

Honors locked decisions D-P4-01..05 (full Phase 4 + all 3 P1 polish + both visual items + alpha-independent + ROADMAP backfill).
Implements RESEARCH Q1 (setimmediate option a), Q2 (spike-first SW persistence), Q3 (A29 cs-injection-world), Finding 4 (cursor already shipped — verification only).
UI-SPEC dark-logo currentColor strategy with inline-SVG injection landed per UI-SPEC §"Implementation amendment".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 09:30:49 +02:00

9.8 KiB


phase	slug	status	nyquist_compliant	wave_0_complete	created
04	harden-clean-up-optional	draft	false	false	2026-05-21

Phase 04 — Validation Strategy

Per-phase validation contract for feedback sampling during execution.

Phase 4 character: Final v1 hardening phase. Mix of bug fixes + flake stabilization + audit P1 polish + visual polish + build hygiene + closure aggregator. Wave 0 adds RED test scaffolds for the new audit-P1 fixes; the existing harness is extended with new A33+ assertions; SW state persistence is spike-first (Plan 04-04 per RESEARCH finding 2).

Test Infrastructure

Property	Value
Framework	vitest 4.x (unit) + custom Puppeteer harness (UAT — `npm run test:uat`)
Config file	`vitest.config.ts` + `tests/uat/harness.test.ts` (orchestrator)
Quick run command	`npm test -- --run tests/<focused-file>.test.ts`
Full suite command	`npm test -- --run` (vitest) + `HEADLESS=1 SKIP_PROD_REBUILD=1 npm run test:uat` (UAT harness)
Estimated runtime	~50s (vitest 171→~190 tests post Wave 0) + ~95-300s (UAT harness 33→~37 assertions; 5-min idle test adds ~300s to single run if not gated) ≈ 2.5-6 min full sweep

Sampling Rate

After every task commit: Run focused test command (vitest single-file OR npm run test:uat -- --grep A<NN> for harness)
After every plan wave: Run full vitest + full UAT harness — both MUST be GREEN
Before /gsd-verify-work 4: Full suite GREEN + pre-checkpoint bundle gates 6/6 PASS (per saved memory feedback-pre-checkpoint-bundle-gates.md)
5-min idle test (Plan 04-04): dedicated npm run test:uat:long lane with 6-min timeout; NOT in default sample
Max feedback latency: ~2.5 min (default full sweep); ~10s (focused vitest); ~25s (focused UAT assertion)
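The long-lane gating above can be sketched as a small pure function. This is a hypothetical helper, not the harness's actual code: it assumes the idle assertion is skipped when `SKIP_LONG_UAT=1` (the flag used by the skip-mode commands in the task map below) and otherwise fits the 5-min idle wait inside the long lane's 6-min timeout.

```typescript
// Hypothetical helper: planIdleAssertion and IdlePlan are illustrative names,
// not taken from the harness. SKIP_LONG_UAT mirrors the skip-mode commands.
const IDLE_WAIT_MS = 5 * 60_000; // 5-min service-worker idle window
const LONG_LANE_TIMEOUT_MS = 6 * 60_000; // 6-min timeout of the long lane

interface IdlePlan {
  runIdle: boolean; // whether the 5-min idle assertion executes at all
  idleWaitMs: number; // 0 when skipped
  timeoutMs?: number; // only set when the idle wait actually runs
}

function planIdleAssertion(env: Record<string, string | undefined>): IdlePlan {
  if (env.SKIP_LONG_UAT === "1") {
    // Skip-mode: keeps the default sweep near its ~2.5 min feedback budget.
    return { runIdle: false, idleWaitMs: 0 };
  }
  // Full mode: the idle wait must fit inside the lane timeout, leaving headroom for SAVE.
  return { runIdle: true, idleWaitMs: IDLE_WAIT_MS, timeoutMs: LONG_LANE_TIMEOUT_MS };
}

// Usage: const plan = planIdleAssertion(process.env);
```

Keeping the decision in one pure function means the skip-mode and full-mode paths can be unit-tested without launching Chrome.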

Per-Task Verification Map

Task ID	Plan	Wave	Requirement	Threat Ref	Secure Behavior	Test Type	Automated Command	File Exists	Status
04-01 T1 RED 3 tests	04-01	1	Audit P1 #11/#14/#15	T-04-01-01..03	URL extraction + previousUrl + epoch normalization	unit (vitest jsdom)	`npm test -- tests/content/ --run`	❌ NEW (Wave 0)	⬜ pending
04-01 T2 GREEN edits	04-01	1	Audit P1 #11/#14/#15	T-04-01-01..03	Same; src/content/index.ts edits	unit (vitest)	`npm test -- tests/content/ --run` (+8 GREEN) + `npx tsc --noEmit`	✗ EXISTS (modify)	⬜ pending
04-02 T1 RED build gates	04-02	1	SC #4 dead-code + setimmediate hygiene	T-04-02-01/03	grep gate	unit (vitest + execFile build)	`npm test -- tests/build/no-new-function-in-sw-chunk.test.ts tests/build/dead-code-grep.test.ts --run`	❌ NEW (Wave 0)	⬜ pending
04-02 T2 GREEN polyfill + rename + flip	04-02	1	SC #3 generate-icons + setimmediate Q1	T-04-02-01/02/04	queueMicrotask polyfill; .cjs rename	build-gate + unit	`npm run build && grep -c 'new Function' dist/assets/index.ts-*.js` -> 0 + `node generate-icons.cjs` exit 0	✗ EXISTS (modify + rename)	⬜ pending
04-03 T1 assertA29 rewrite	04-03	2	A29 flake stabilization	T-04-03-01/02	cs-injection-world ISOLATED + sentinel	UAT (page-side)	`npx tsc --noEmit && npm run build:test`	✗ EXISTS (modify)	⬜ pending
04-03 T2 driveA29 strict-sentinel	04-03	2	A29 sentinel filter	T-04-03-01	rrweb IncrementalSource.Mutation filter	UAT (host-side)	`HEADLESS=1 SKIP_PROD_REBUILD=1 npm run test:uat` 33/33 GREEN; 5/5 stress	✗ EXISTS (modify)	⬜ pending
04-04 T1 SPIKE	04-04	3	SC #1 SW state persistence empirical	T-04-04-01	offscreen survives SW idle	spike script	`HEADLESS=1 tsx tests/uat/spike-a33-sw-persistence.ts` -> videoSize > 100_000	❌ NEW (Wave 0 spike)	⬜ pending
04-04 T2 A33 + stopServiceWorker + orchestrator	04-04	3	SC #1 5-min idle harness	T-04-04-02/03/04	CDP worker.close() + 5-min wait + SAVE	UAT	`HEADLESS=1 SKIP_LONG_UAT=1 npm run test:uat` 34/34 GREEN (skip-mode); full-mode 34/34 ~6.5 min	✗ EXISTS (modify)	⬜ pending
04-05 T1 assertA34 fetch+XHR	04-05	4	SC #2 fetch+XHR network_error	T-04-05-01	cs-injection-world dual-trigger	UAT (page-side)	`npx tsc --noEmit && npm run build:test`	✗ EXISTS (modify)	⬜ pending
04-05 T2 driveA34 + orchestrator	04-05	4	SC #2 + P1 #11 end-to-end empirical	T-04-05-01	2 network_error entries with status===404	UAT	`HEADLESS=1 SKIP_LONG_UAT=1 npm run test:uat` 35/35 GREEN; full-mode ~7 min	✗ EXISTS (modify)	⬜ pending
04-06 T1 RED inline-SVG + cursor-pin	04-06	5	UI-SPEC dark-logo + RESEARCH Finding 4	T-04-06-01	DOMParser inline injection (no innerHTML); cursor: 'always' literal	unit (vitest jsdom + build-grep)	`npm test -- tests/welcome/ tests/build/cursor-visibility.test.ts --run`	❌ NEW (Wave 0)	⬜ pending
04-06 T2 GREEN SVG + welcome.ts + globals	04-06	5	UI-SPEC stroke recolor + ?raw import	T-04-06-01	currentColor + DOMParser inline	unit	`npm test -- tests/welcome/inline-svg.test.ts --run` 3/3 GREEN	✗ EXISTS (modify)	⬜ pending
04-06 T3 A17.8 + 01-07 back-patch	04-06	5	UI-SPEC harness invariant + docs hygiene	T-04-06-01	A17.8 raw-source grep	UAT + docs	`HEADLESS=1 SKIP_LONG_UAT=1 npm run test:uat` 35/35 + grep verify	✗ EXISTS (modify)	⬜ pending
04-06 T4 Operator empirical	04-06	5	UI-SPEC AC #6 aesthetic judgment	T-04-06-01	dark-mode visual contrast	manual	operator returns "approved" or describes issue	n/a	⬜ pending
04-07 T1 04-VERIFICATION.md	04-07	6	Phase 4 closure aggregator	T-04-07-01	scorecard + override notes + deferred items	docs aggregator	`test -f .planning/phases/04-harden-clean-up-optional/04-VERIFICATION.md && grep -cE '^## ' .planning/phases/04-harden-clean-up-optional/04-VERIFICATION.md` >= 5	❌ NEW	⬜ pending
04-07 T2 Marker flips	04-07	6	D-P4-05 + ROADMAP/STATE flips	T-04-07-02/03	Phase 4 [x] + completed_phases: 4	docs	`grep -c '\[x\] .*Phase 4' .planning/ROADMAP.md` >= 1	✗ EXISTS (modify)	⬜ pending

Status: ⬜ pending · ✅ green · ❌ red · ⚠️ flaky
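The two build-grep gates in the map (no `new Function` in the SW chunk, and a `cursor: 'always'` literal in the built JS) can be sketched as plain TypeScript over an assets directory. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the SW-chunk file-name pattern is inferred from `dist/assets/index.ts-*.js`, and the cursor regex accepts both the source spelling and a minified `cursor:"always"`.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper names; only the patterns mirror this document.
interface Asset {
  name: string;
  text: string;
}

// Read every regular file directly inside `dir` (e.g. dist/assets).
function loadAssets(dir: string): Asset[] {
  return readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => ({ name: entry.name, text: readFileSync(join(dir, entry.name), "utf8") }));
}

// Count regex hits across the assets whose file name matches `filePattern`.
function countMatches(assets: Asset[], filePattern: RegExp, needle: RegExp): number {
  // String.prototype.match only returns every hit when the regex is global.
  const re = needle.flags.includes("g") ? needle : new RegExp(needle.source, needle.flags + "g");
  return assets
    .filter((asset) => filePattern.test(asset.name))
    .reduce((sum, asset) => sum + (asset.text.match(re) ?? []).length, 0);
}

const SW_CHUNK = /^index\.ts-.*\.js$/; // assumed shape of dist/assets/index.ts-*.js
const CURSOR_ALWAYS = /cursor\s*:\s*["']always["']/; // source or minified spelling

// Returns human-readable failures; an empty array means both gates pass.
function checkBundleGates(assets: Asset[]): string[] {
  const failures: string[] = [];
  const newFunctionHits = countMatches(assets, SW_CHUNK, /new Function/);
  if (newFunctionHits !== 0) failures.push(`SW chunk has ${newFunctionHits} 'new Function' hit(s)`);
  if (countMatches(assets, /\.js$/, CURSOR_ALWAYS) < 1) failures.push("no cursor: 'always' literal in built JS");
  return failures;
}
```

Separating file loading from matching keeps the counting logic testable against in-memory fixtures; a vitest test could call `checkBundleGates(loadAssets("dist/assets"))` after `npm run build` and expect an empty array.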

Planner instructions: Populate one row per task. Per RESEARCH finding 5 (6 new Wave-0 test files anticipated), expect ~6 unit-test rows + ~4 harness-A33+ rows + ~4 bundle-gate rows + ~3 docs rows. Format per Phase 3 03-VALIDATION precedent.

Wave 0 Requirements

Per RESEARCH finding 5, 6 new test files anticipated as Wave 0 RED scaffolds for audit-P1 / ROADMAP-SC items:

⬜ tests/build/no-new-function-in-sw-chunk.test.ts — grep dist/assets/index.ts-*.js for new Function( count = 0 after setimmediate polyfill replacement
⬜ tests/build/dead-code-grep.test.ts — ROADMAP SC #4: rg permissions\.request + duplicate offscreen inline string in src/ = 0 hits
⬜ tests/content/fetch-interception.test.ts — audit P1 #11: Request-arg vs string-arg case for args[0]
⬜ tests/content/navigation-tracking.test.ts — audit P1 #14: module-level previousUrl tracking
⬜ tests/content/rrweb-timestamps.test.ts — audit P1 #15: epoch vs page-load-relative
⬜ tests/welcome/inline-svg.test.ts — UI-SPEC: ?raw import + DOMParser inline-SVG + currentColor resolves via CSS
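The three audit-P1 behaviors targeted by the content tests above can be illustrated with small pure functions. This is a hedged sketch, not the code in `src/content/index.ts`: the function names are hypothetical, the Request case is modelled structurally as any object with a `url` string, and the `EPOCH_FLOOR_MS` threshold for telling epoch timestamps from page-load-relative ones is an assumption.

```typescript
// Hypothetical helpers illustrating audit P1 #11, #14 and #15.

// P1 #11: fetch's first argument may be a URL string, a URL object, or a Request.
// Request is modelled structurally ({ url: string }) so the sketch runs outside a browser.
function extractFetchUrl(input: string | URL | { url: string }): string {
  if (typeof input === "string") return input;
  if (input instanceof URL) return input.href;
  return input.url;
}

// P1 #14: module-level previousUrl, so each navigation entry records its origin.
// In a content script this would be seeded from location.href at module load.
let previousUrl: string | undefined;

interface NavEntry {
  from: string | undefined;
  to: string;
}

function recordNavigation(currentUrl: string): NavEntry | null {
  if (currentUrl === previousUrl) return null; // assumption: ignore same-URL events
  const entry: NavEntry = { from: previousUrl, to: currentUrl };
  previousUrl = currentUrl;
  return entry;
}

// P1 #15: normalize timestamps to epoch milliseconds.
// Assumption: anything below EPOCH_FLOOR_MS (about 2001-09-09 as epoch ms) is
// page-load-relative, i.e. measured from performance.timeOrigin.
const EPOCH_FLOOR_MS = 1e12;

function toEpochMs(timestamp: number, timeOrigin: number): number {
  return timestamp < EPOCH_FLOOR_MS ? timeOrigin + timestamp : timestamp;
}
```

In a page context the second argument to `toEpochMs` would be `performance.timeOrigin`, which is itself expressed in milliseconds since the Unix epoch.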

Existing infrastructure already in place (inherited from Phases 1-3):

✅ tests/uat/extension-page-harness.ts — page-side assertA* host (extend with assertA33+)
✅ tests/uat/lib/harness-page-driver.ts — host-side driveA* host (extend with driveA33+ + improved driveA29)
✅ tests/uat/harness.test.ts — orchestrator (extend)
✅ tests/uat/lib/assertions.ts — shared helpers
✅ tests/uat/lib/zip.ts — jszip-based archive parsing
✅ tests/uat/lib/launch.ts — Puppeteer Chrome launch + extension load
✅ tests/background/no-test-hooks-in-prod-bundle.test.ts — FORBIDDEN_HOOK_STRINGS lockstep

wave_0_complete: false until 6 new test files committed per planner Wave 0.

Manual-Only Verifications

Behavior	Requirement	Why Manual	Test Instructions
Dark-surface logo visual contrast judgment (WCAG ratio not specified in UI-SPEC)	UI-SPEC acceptance criterion #2 + operator empirical	Aesthetic judgment of deep-indigo stroke on madder-orange wrapper in dark-OS context — UI-SPEC defers to operator empirical checkpoint	Load extension in OS dark mode; open welcome page; verify mark wrapper still madder, stroke is deep-indigo, contrast is acceptable. Plan 04-06 operator empirical UAT cycle covers this.
Cursor visibility verification	ROADMAP cursor visibility item per D-P4-03	Already shipped at src/offscreen/recorder.ts:285 per Plan 01-09 (RESEARCH finding 4) — Plan 04-06 downgrades to verification + stale-note correction; minor manual check of one captured frame to confirm cursor present	Load extension, start recording, click anywhere, SAVE archive; open `video/last_30sec.webm` in Chrome; verify cursor visible in playback.

All other Phase 4 behaviors have automated verification via vitest (Wave 0 unit tests) + UAT harness (A33+) + bundle gates.

Validation Sign-Off

All tasks have <automated> verify or Wave 0 dependencies — pending planner fill-in
Sampling continuity: no 3 consecutive tasks without automated verify — verify after planner fills table
Wave 0 covers all MISSING references — 6 new test files anticipated; planner confirms count
No watch-mode flags — verify in planner output (focused commands use --run)
Feedback latency < ~2.5 min default (5-min idle test on dedicated lane) — confirmed by RESEARCH
nyquist_compliant: true set in frontmatter — pending sign-off after planner completes
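Two of the sign-off checks above are mechanical enough to run over the filled-in task map. The sketch below is hypothetical (the row shape and function names are not part of the GSD tooling): `findSamplingGap` returns the first run of three consecutive tasks without automated verify, and `watchModeOffenders` flags vitest invocations missing `--run`, since vitest defaults to watch mode in an interactive, non-CI terminal.

```typescript
// Hypothetical checks for the "Sampling continuity" and "No watch-mode flags" items.
interface TaskRow {
  id: string;
  automated: boolean; // false for manual rows such as the operator empirical check
}

// Returns the ids of the first run of `limit` consecutive unverified tasks, or null.
function findSamplingGap(rows: TaskRow[], limit = 3): string[] | null {
  let run: string[] = [];
  for (const row of rows) {
    if (row.automated) {
      run = []; // an automated verify breaks the streak
      continue;
    }
    run.push(row.id);
    if (run.length >= limit) return run;
  }
  return null;
}

// Flags `npm test` / `vitest` commands that omit --run.
function watchModeOffenders(commands: string[]): string[] {
  return commands.filter((cmd) => /\b(npm test|vitest)\b/.test(cmd) && !/--run\b/.test(cmd));
}
```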

Approval: pending (planner fills per-task map; checker validates)
