mokosh/.planning/phases/02-stabilize-export-pipeline/02-CONTEXT.md
Mark cc042a5583 docs(02): capture phase context — discuss-phase complete
Phase 2 (Stabilize export pipeline) discuss-phase landed via inline canonical
workflow execution. 02-CONTEXT.md captures 3 locked decisions:

- **D-P2-01:** Offscreen-minted Blob URL pipeline replaces base64 data: URL
  download path (src/background/index.ts:709-710). SW Blob → offscreen
  URL.createObjectURL → SW chrome.downloads.download → URL.revokeObjectURL on
  onChanged. Closes audit P0-6. Unblocks real-archive size (>2 MB).

- **D-P2-02:** meta.json schema migrates singular `url: string` to plural
  `urls: string[]` capturing all tabs visible during the 30s recording window.
  Schema-breaking change requires REQUIREMENTS.md REQ-meta-json-schema
  amendment + SessionMetadata type update. Closes audit P1 #10 captured-URL bug.

- **D-P2-03:** Full Phase 2 scope = Blob URL migration + meta.urls schema
  migration + strict meta.json schema validation test + UAT harness A24+
  <5s latency assertion. ~3-4 plans expected.

Decision provenance:
- D-P2-01 rationale: user "up to you. If you think we need to migrate — good
  let's do it." Plus analysis: real archives EXCEED base64 cap.
- D-P2-02 rationale: user picked "All tabs' URLs as an array
  (meta.json.urls)" — highest informational fidelity for multi-tab bug
  reproduction. Privacy acceptable per "log is internal" v1 charter.
- D-P2-03 rationale: user picked "Full scope: bug fixes + schema + harness
  latency assertion".

Canonical references + code context + deferred items captured in CONTEXT.md.
Phase boundary explicit: not from-scratch; closes residual gaps after Plans
01-08/01-09/01-10/01-12 substantively shipped REQ-popup-ui + REQ-archive-
layout + REQ-screenshot-on-export.

Next: /gsd-plan-phase 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 13:41:08 +02:00


Phase 2: Stabilize export pipeline - Context

Gathered: 2026-05-20
Status: Ready for planning

## Phase Boundary

The export pipeline — saveArchive coordination of video buffer fetch, screenshot capture, content-script rrweb messaging, JSZip assembly, and Blob URL download. Phase 2 closes the audit's last functional export defect (P0-6: base64→Blob URL migration), corrects the meta.json captured-URL bug (P1 #10), and locks the meta.json schema with strict validation + harness <5s latency assertion.

Most of REQ-popup-ui + REQ-archive-layout + REQ-screenshot-on-export was substantively shipped via Plans 01-08 (webm-remux + JSZip), 01-09 (popup state machine + SAVE-only UI), and 01-12 (manifest i18n). Phase 2 scope is the residual gaps, NOT a from-scratch implementation.

2026-05-20 re-phasing context: Original ROADMAP Phase 3 (Stabilize export pipeline) renumbered to Phase 2 after the original Phase 2 (DOM/event-capture privacy) was removed entirely per the "log is internal" charter shift. REQ-rrweb-dom-buffer + REQ-user-event-log verification absorbed by new Phase 3 (smoke). REQ-password-confidentiality moved to Out of Scope (v1).

## Implementation Decisions

Blob URL migration architecture (P0-6 fix)

  • D-P2-01: Offscreen-minted Blob URL pipeline. Replace the current base64 data: URL download path (src/background/index.ts:709-710) with: SW packages zip via JSZip → SW sends Blob to offscreen via port → offscreen calls URL.createObjectURL(blob) → offscreen sends URL back to SW → SW calls chrome.downloads.download({ url, filename }) → SW onChanged listener triggers URL.revokeObjectURL after download completes. Matches the original audit P0-6 recommendation. Properly unblocks any archive size (real bug-report archives are ~5-10 MB; current base64 path hits Chrome's ~2 MB data-URL cap).
    • Rationale: user direction "up to you. If you think we need to migrate — good let's do it." Combined with analysis: real archives EXCEED base64 limit; without migration the canonical use case breaks; offscreen has URL.createObjectURL (SW does not — DEC-006 binding).
    • Port-transfer plumbing: existing offscreen↔SW port from Plan 01-04/07 (D-12 binary wire format at src/shared/binary.ts blobToBase64/base64ToBlob); needs new message types for Blob transfer (reverse direction from existing SW-bound video segments). Likely uses base64 wire-encoding for the Blob (same pattern as video segments) OR direct Blob via port (both contexts share the extension origin, but chrome.runtime port messages are JSON-serialized by default, so confirm that a Blob survives the transfer before choosing this option).
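
A minimal sketch of the SW-side half of D-P2-01, assuming the base64 wire-encoding option. `BlobUrlDownloads`, `DownloadDeps`, `mintUrl`, and `revokeUrl` are hypothetical names, not existing code; the Chrome calls are injected so the lifecycle (mint, download, revoke on a terminal `onChanged` state) can be exercised without a browser.

```typescript
// Only the fields of chrome.downloads.DownloadDelta that this sketch reads.
interface DownloadDelta {
  id: number;
  state?: { current?: string };
}

// Hypothetical seams; real code would back these with the offscreen port
// and chrome.downloads.download.
interface DownloadDeps {
  mintUrl(zipBase64: string): Promise<string>; // offscreen: URL.createObjectURL
  revokeUrl(url: string): void;                // offscreen: URL.revokeObjectURL
  download(options: { url: string; filename: string }): Promise<number>;
}

class BlobUrlDownloads {
  private readonly pending = new Map<number, string>(); // downloadId -> blob URL
  private readonly deps: DownloadDeps;

  constructor(deps: DownloadDeps) {
    this.deps = deps;
  }

  // Mint a URL, start the download, and remember the URL for later revocation.
  async start(zipBase64: string, filename: string): Promise<number> {
    const url = await this.deps.mintUrl(zipBase64);
    try {
      const id = await this.deps.download({ url, filename });
      this.track(id, url);
      return id;
    } catch (err) {
      this.deps.revokeUrl(url); // download never started, so release immediately
      throw err;
    }
  }

  track(id: number, url: string): void {
    this.pending.set(id, url);
  }

  // Intended wiring: chrome.downloads.onChanged.addListener((d) => downloads.onChanged(d)).
  onChanged(delta: DownloadDelta): void {
    const url = this.pending.get(delta.id);
    if (url === undefined) return; // not one of ours
    const state = delta.state ? delta.state.current : undefined;
    if (state !== "complete" && state !== "interrupted") return;
    this.pending.delete(delta.id);
    this.deps.revokeUrl(url);
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```

Revoking on "interrupted" as well as "complete" is a design choice of this sketch: it avoids leaking a Blob URL when a download fails. Nothing here touches recorder state, in line with the D-06 always-on charter.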

meta.json captured-URL fix (P1 #10)

  • D-P2-02: meta.json schema migrates from singular url: string to plural urls: string[]. Captures all tabs visible during the 30s recording window. Captures the operator's full multi-tab context, not just the active-at-save tab.
    • Schema-breaking change — CON-meta-json-schema currently mandates 7 fields including url: string. Phase 2 MUST amend REQUIREMENTS.md REQ-meta-json-schema to reflect the new shape (8 fields, urls: string[] replacing url: string).
    • Tab-tracking infrastructure: SW needs to maintain a Set of tab URLs seen during the 30s window. Simplest option that captures history: chrome.tabs.onUpdated + chrome.tabs.onActivated listeners maintain tabUrlsSeen: Set<string>, pruned alongside the video segment ring buffer. Alternative: query chrome.windows.getAll + iterate tabs at SAVE time (less code, but misses tabs that were closed or navigated away during the window).
    • Rationale: user direction "All tabs' URLs as an array (meta.json.urls)" — highest informational fidelity for multi-tab bug-reproduction workflows.
    • Privacy note: since "log is internal" per re-phasing, tab URLs may include sensitive operator state — acceptable per current charter.
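
One way the listener-based tracking could look, as a sketch only. `TabUrlWindow` is a hypothetical name, and the class is a variation on the plain `Set<string>` mentioned above: it keeps a last-seen time per URL so entries can be pruned against the 30s window. The `chrome.tabs` wiring is left to comments.

```typescript
// Tracks URLs seen within a sliding window. A Map keeps insertion order, so
// snapshot() returns URLs first-seen first, and re-recording a URL refreshes
// its timestamp without changing its position.
class TabUrlWindow {
  private readonly lastSeen = new Map<string, number>(); // url -> last-seen ms
  private readonly windowMs: number;

  constructor(windowMs: number = 30000) {
    this.windowMs = windowMs;
  }

  // Intended callers (assumption): chrome.tabs.onUpdated when changeInfo.url is
  // set, and chrome.tabs.onActivated after looking up the activated tab's URL.
  record(url: string | undefined, nowMs: number): void {
    if (!url) return; // tabs without a readable URL are skipped
    this.lastSeen.set(url, nowMs);
  }

  // Drop URLs not seen within the window; call alongside ring-buffer pruning.
  prune(nowMs: number): void {
    this.lastSeen.forEach((seenAt, url) => {
      if (nowMs - seenAt > this.windowMs) this.lastSeen.delete(url);
    });
  }

  // At SAVE time: make sure the active tab is present, then return the list.
  snapshot(nowMs: number, activeUrl?: string): string[] {
    this.record(activeUrl, nowMs);
    this.prune(nowMs);
    return Array.from(this.lastSeen.keys());
  }
}
```

Passing the active tab's URL into `snapshot` covers the case where the operator sat on one tab for the whole window and no tab event fired.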

Phase 2 scope tightness

  • D-P2-03: Full Phase 2 scope. Four workstreams:
    1. Blob URL migration (D-P2-01)
    2. meta.json urls array migration + REQUIREMENTS.md schema amendment (D-P2-02)
    3. meta.json strict schema validation (new test file: ISO-8601 timestamp regex, urls array non-empty, version semver, totalEvents non-negative integer, 8 fields exact)
    4. UAT harness <5s latency assertion (A24+ extension, consistent with Phase 1's Approach-B harness pattern)
    • Rationale: user direction "Full scope: bug fixes + schema + harness latency assertion". Most comprehensive Phase 2 closure.
    • Estimated plans: ~3-4 plans. Wave structure likely: Wave 0 RED tests → Wave 1 meta.urls + tab-tracking infra → Wave 2 Blob URL pipeline (offscreen + SW + port plumbing) → Wave 3 schema validation + harness extension → Wave 4 operator empirical checkpoint.
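
The strict validation workstream could start from something like the sketch below. It is an illustration under assumptions: the key names `timestamp`, `urls`, `version`, and `totalEvents` are inferred from the checks listed above, the full list of 8 keys is not given here and is therefore passed in by the caller, and both regexes are simplified (UTC-only ISO-8601, loose semver).

```typescript
// Simplified patterns; tighten against the amended REQ-meta-json-schema.
const ISO_8601_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;
const SEMVER_LOOSE = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Returns a list of human-readable violations; an empty list means valid.
function validateMeta(
  meta: Record<string, unknown>,
  expectedKeys: readonly string[], // the exact key set, supplied by the caller
): string[] {
  const errors: string[] = [];

  // Exact-key check: no missing and no extra fields.
  const actual = Object.keys(meta).sort().join(",");
  const expected = expectedKeys.slice().sort().join(",");
  if (actual !== expected) {
    errors.push("keys: expected [" + expected + "], got [" + actual + "]");
  }

  const timestamp = meta["timestamp"];
  if (typeof timestamp !== "string" || !ISO_8601_UTC.test(timestamp)) {
    errors.push("timestamp: not an ISO-8601 UTC string");
  }

  const urls = meta["urls"];
  if (
    !Array.isArray(urls) ||
    urls.length === 0 ||
    !urls.every((u) => typeof u === "string")
  ) {
    errors.push("urls: must be a non-empty array of strings");
  }

  const version = meta["version"];
  if (typeof version !== "string" || !SEMVER_LOOSE.test(version)) {
    errors.push("version: not semver");
  }

  const totalEvents = meta["totalEvents"];
  if (
    typeof totalEvents !== "number" ||
    !Number.isInteger(totalEvents) ||
    totalEvents < 0
  ) {
    errors.push("totalEvents: must be a non-negative integer");
  }

  return errors;
}
```

Returning every violation instead of throwing on the first one tends to give more useful failures in a RED-first test wave.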

Claude's Discretion

  • Wave structure + plan granularity (planner figures out)
  • Tab-tracking infrastructure exact implementation (researcher may scope; SW state vs onActivated-based)
  • Blob transfer wire format choice (base64 vs direct Blob port message) — both have precedent in Plan 01-07 D-12; planner picks
  • Specific harness assertion shapes for A24+ (planner figures out)
  • Whether to spawn researcher for Blob URL implementation patterns OR proceed directly to planner (likely direct: implementation is well-understood)
  • Whether to bundle the offscreen Blob URL change with the SW download change in a single commit OR atomic per-step
  • README.md updates if the archive size limit lifts (informational only)

<canonical_refs>

Canonical References

Downstream agents MUST read these before planning or implementing.

Roadmap + Requirements

  • .planning/ROADMAP.md §"Phase 2: Stabilize export pipeline" — phase boundary, depends_on Phase 1, scope note from 2026-05-20 re-phasing
  • .planning/REQUIREMENTS.md — REQ-popup-ui, REQ-screenshot-on-export, REQ-archive-layout, REQ-meta-json-schema (needs amendment for urls), REQ-archive-export-latency, REQ-manifest-permissions (Complete)
  • .planning/PROJECT.md — DEC-005 (JSZip), DEC-006 (chrome.downloads), DEC-008 (screenshot via captureVisibleTab)

Audit + Original Defects

  • /home/parf/.claude/plans/dear-claude-there-is-snazzy-fox.md (the original manifest.zip audit) — P0-6 (base64 download), P1 #10 (meta.json url + version), P1 #11 (fetch Request handling — moved to Phase 4 per re-phasing)

Source code surfaces

  • src/background/index.ts saveArchive (line 736), createArchive (line 608), downloadArchive (line 695), captureScreenshot (line 568), meta.json construction (line 674-682)
  • src/shared/binary.ts — base64ToBlob / blobToBase64; existing wire-format for offscreen→SW; Phase 2 may reuse for SW→offscreen reverse direction OR introduce a new message type
  • src/offscreen/recorder.ts — current offscreen context; Phase 2 adds Blob URL minting handler
  • src/shared/types.ts: SessionMetadata interface; Phase 2 amends url: string → urls: string[]

Plan precedents

  • .planning/phases/01-stabilize-video-pipeline/01-08-SUMMARY.md — webm-remux + JSZip integration
  • .planning/phases/01-stabilize-video-pipeline/01-09-SUMMARY.md — SAVE-only popup + SAVE_ARCHIVE message channel; Amendment 3 always-on charter
  • .planning/phases/01-stabilize-video-pipeline/01-07-SUMMARY.md — D-12 base64 wire-format precedent for port-transfer plumbing
  • .planning/phases/01-stabilize-video-pipeline/01-13-SUMMARY.md — Approach B UAT harness pattern for A24+ extension

Architectural constraints (from Phase 1 SUMMARYs)

  • Never await import(...) in src/background/index.ts (Plan 01-11 SUMMARY)
  • Test-mode symbols stay in dist-test/ only via __MOKOSH_UAT__ define-token (Plan 01-11 SUMMARY)
  • Tier-1 FORBIDDEN_HOOK_STRINGS gate (currently 12 entries; Phase 2 may add A24+ entries — must update lockstep)
  • Pre-checkpoint bundle gates per saved memory feedback-pre-checkpoint-bundle-gates.md before any operator checkpoint

</canonical_refs>

<code_context>

Existing Code Insights

Reusable Assets

  • chrome.tabs.query + chrome.tabs.captureVisibleTab: already wired in captureScreenshot (line 568). Tab-tracking for D-P2-02 can reuse the active-tab query pattern.
  • chrome.tabs.onUpdated + chrome.tabs.onActivated listeners: not currently used in SW; D-P2-02 introduces them for tab-URL set maintenance.
  • chrome.runtime.connect long-lived port (D-17 keepalive): existing offscreen↔SW connection; can carry new SW→offscreen Blob transfer message.
  • src/shared/binary.ts blobToBase64 / base64ToBlob: reusable wire-format for SW→offscreen Blob transfer (same encoding as Plan 01-07 D-12).
  • chrome.downloads.onChanged listener (NOT yet wired): Phase 2 adds for URL.revokeObjectURL lifecycle.
  • JSZip in createArchive (line 617): Blob output via zip.generateAsync({ type: 'blob' }) already produces the right type.

Established Patterns

  • D-06 always-on charter: SAVE creates a new zip; recorder stays in REC. Phase 2's Blob URL pipeline MUST preserve this (no recorder side-effects in downloadArchive).
  • Atomic commits per task per plan: Phase 1 precedent; Phase 2 follows.
  • UAT harness Approach B (Plan 01-13): page-side assertions + driver wrappers + harness.test.ts orchestrator. Phase 2's A24+ extends this pattern.
  • Pre-checkpoint bundle gates per feedback-pre-checkpoint-bundle-gates.md: SW CSP grep + Node-globals grep + DOM-globals grep + Tier-1 SW-bundle-import gate + manifest validation before operator checkpoints.

Integration Points

  • saveArchive → createArchive → downloadArchive chain (line 736 → 608 → 695): Blob URL migration changes downloadArchive but createArchive's interface (returns Blob) is preserved.
  • Offscreen recorder: existing message handler at src/offscreen/recorder.ts listens for START_RECORDING, STOP_RECORDING; Phase 2 adds a new handler for CREATE_DOWNLOAD_URL (or similar) that mints + returns Blob URL.
  • meta.json construction in createArchive (line 674-682): D-P2-02 changes url: ... to urls: ...; SessionMetadata type updates lockstep.
  • chrome.runtime.connect port: Plan 01-04/07 binary wire-format precedent; Phase 2's Blob → URL transfer rides this OR a new port (planner picks).
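
A possible shape for the new offscreen handler, sketched with injected dependencies. Only `CREATE_DOWNLOAD_URL` comes from the notes above (and even that is marked "or similar"); `REVOKE_DOWNLOAD_URL`, the reply types, `requestId`, and `UrlApi` are hypothetical. The `decode` slot stands in for whatever src/shared/binary.ts exposes, whose exact signature is not assumed here.

```typescript
// Messages the SW could send to the offscreen document (hypothetical names).
type OffscreenRequest =
  | { type: "CREATE_DOWNLOAD_URL"; requestId: number; zipBase64: string }
  | { type: "REVOKE_DOWNLOAD_URL"; url: string };

// Replies sent back over the same port (hypothetical names).
type OffscreenReply =
  | { type: "DOWNLOAD_URL_CREATED"; requestId: number; url: string }
  | { type: "DOWNLOAD_URL_REVOKED"; url: string };

// Injected so the handler can be tested outside a DOM context.
interface UrlApi<B> {
  decode(base64: string): B;          // e.g. a base64-to-Blob helper
  createObjectURL(blob: B): string;   // URL.createObjectURL in the offscreen doc
  revokeObjectURL(url: string): void; // URL.revokeObjectURL in the offscreen doc
}

// Pure request -> reply mapping; the port listener would post the result back.
function handleOffscreenRequest<B>(
  msg: OffscreenRequest,
  api: UrlApi<B>,
): OffscreenReply {
  switch (msg.type) {
    case "CREATE_DOWNLOAD_URL": {
      const url = api.createObjectURL(api.decode(msg.zipBase64));
      return { type: "DOWNLOAD_URL_CREATED", requestId: msg.requestId, url };
    }
    case "REVOKE_DOWNLOAD_URL": {
      api.revokeObjectURL(msg.url);
      return { type: "DOWNLOAD_URL_REVOKED", url: msg.url };
    }
  }
}
```

Echoing `requestId` lets the SW match replies to requests if two SAVE actions overlap on the same port.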

</code_context>

## Specific Ideas
  • Real-archive-size assumption: operators in production may capture sessions yielding 5-10 MB archives (~1.5-5 MB VP9 video + screenshot + rrweb + events + meta). Base64 data: URL has ~2 MB cap; this is the GATING constraint for Blob URL migration. Phase 2's success criterion includes empirically saving a real archive >2 MB without failure.

  • Tab-URL set semantics: D-P2-02's urls: string[] should be DEDUPLICATED + ORDERED (first-seen first). Empty array is acceptable for sessions where no tab events were observed (purely whole-desktop recording without browser-tab interaction); this conflicts with the "urls array non-empty" check listed under D-P2-03, so the planner must reconcile the two before the validation test is written. The operator's primary tab at SAVE time should always be in the array if it has a valid URL.

  • URL filter: chrome-extension://, chrome://, devtools://, about:, file:// URLs may or may not be included. Default: exclude chrome:// and about: (low diagnostic value); INCLUDE chrome-extension:// (so the welcome tab or popup URLs show up if the operator was there). Planner-determined.
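
The default filter and the dedup/ordering rule above can be sketched as a pure function. `toMetaUrls` and `EXCLUDED_PREFIXES` are hypothetical names. Because the notes leave devtools:// and file:// undecided, this sketch simply passes them through; that is an assumption, not a decision.

```typescript
// Default exclusions from the notes: chrome:// and about: URLs.
// "chrome://" does not match "chrome-extension://", so those are kept.
const EXCLUDED_PREFIXES: readonly string[] = ["chrome://", "about:"];

function isReportableUrl(url: string): boolean {
  if (url.length === 0) return false;
  return !EXCLUDED_PREFIXES.some((prefix) => url.startsWith(prefix));
}

// Filters, deduplicates, and keeps first-seen order.
function toMetaUrls(seen: readonly string[]): string[] {
  const out: string[] = [];
  for (const url of seen) {
    if (isReportableUrl(url) && out.indexOf(url) === -1) out.push(url);
  }
  return out;
}
```

Keeping the exclusion list as data makes the planner's final choice a one-line change.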

## Deferred Ideas
  • REQ-password-confidentiality — Out of Scope (v1) per 2026-05-20 re-phasing ("log is internal"). Phase 4 (optional) or v2 candidate.
  • rrweb 2.0.0-alpha.4 → stable v2 upgrade research — Phase 3 (smoke) absorbs this when DOM verification is planned; researcher spawn deferred until then.
  • Audit P1 #11 fetch Request handling — Phase 4 hardening (operator-event log polish; not export-pipeline scope).
  • Audit P1 #14 navigation URL fix — Phase 4 hardening.
  • Audit P1 #15 rrweb timestamp semantics — Phase 4 hardening.
  • Dark-surface logo contrast — Phase 4 hardening (Plan 01-10 operator observation 2026-05-20).
  • ROADMAP backfill for Plans 01-08..01-13 entries (Plan 01-13 plan-checker flag #4) — Phase 4 or housekeeping commit.
  • getDisplayMedia cursor visibility (video: { cursor: 'always' }) — Phase 4 hardening (Plan 01-07 operator observation 2026-05-15).
  • setimmediate polyfill new Function in SW chunk via vite-plugin-node-polyfills — Phase 4 hardening (verified pre-existing across Phase 1).

Reviewed Todos (not folded)

None — discussion stayed within phase scope.


Phase: 2-stabilize-export-pipeline
Context gathered: 2026-05-20