diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md index 7916262..1569a31 100644 --- a/.planning/REQUIREMENTS.md +++ b/.planning/REQUIREMENTS.md @@ -16,7 +16,7 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md. ### Video -- [x] **REQ-video-ring-buffer**: The extension maintains an in-memory ring +- [ ] **REQ-video-ring-buffer**: The extension maintains an in-memory ring buffer containing the most recent 30 seconds of captured video. AMENDED in Phase 01: video is acquired via `navigator.mediaDevices.getDisplayMedia()` invoked from the offscreen document (with `chrome.offscreen.Reason.DISPLAY_MEDIA`), @@ -35,9 +35,19 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md. CON-video-window, CON-video-codec, CON-display-capture-binding (replaces RETIRED CON-tab-capture-binding). CON-webm-header-retention RETIRED in favor of D-13 per-segment header isolation. - - SPEC §10 acceptance criteria: #2, #3, #7 — all green 2026-05-15 - (D-12 ffprobe gate + operator-confirmed Chrome playback + ffmpeg dry-run - exit 0 with zero decoder errors against `tests/fixtures/last_30sec.webm`). + - SPEC §10 acceptance criteria: #2, #3 — green 2026-05-15 (D-12 ffprobe + gate). #7 (last_30sec.webm plays back in a browser) — **REOPENED + 2026-05-16**: D-13's concat-of-self-contained-segments architecture + produces a multi-EBML-header file that standards-compliant Matroska + parsers (mpv, ffmpeg, Chrome's HTMLMediaElement) play only as the + first segment (~9.94 s) and silently drop segments 2 and 3. The + 2026-05-15 "operator-confirmed clean Chrome playback" assessment was + insufficient — it checked playback ran without freezing but did not + measure total duration. Plan 01-08 (WebM remux via ts-ebml + + webm-muxer) will replace `mergeVideoSegments`'s file-concat with a + real single-EBML-headered remux, restoring SPEC §10 #7. See + `.planning/debug/d13-multi-ebml-concat-unplayable.md` for the + byte-level root-cause evidence. 
### DOM Capture @@ -193,7 +203,7 @@ Which phase covers which requirement. See ROADMAP.md for phase details. | Requirement | Phase | Status | |-------------|-------|--------| -| REQ-video-ring-buffer | Phase 1 | Complete | +| REQ-video-ring-buffer | Phase 1 | In progress (reopened 2026-05-16: SPEC §10 #7 fails; Plan 01-08 WebM remux pending) | | REQ-rrweb-dom-buffer | Phase 2 | Pending | | REQ-user-event-log | Phase 2 | Pending | | REQ-password-confidentiality | Phase 2 | Pending | diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md index b4c6c7b..4720014 100644 --- a/.planning/ROADMAP.md +++ b/.planning/ROADMAP.md @@ -22,7 +22,7 @@ working export → green §10 smoke → harden + clean up**. Decimal phases appear between their surrounding integers in numeric order. -- [x] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15** — D-12 ffprobe gate + A3 empirical-playback gate both green against `tests/fixtures/last_30sec.webm` (1.6 MB VP9 1142×1038); D-13 restart-segments retired D-09..D-11 mid-phase; 30/30 vitest green, tsc clean. SPEC §10 #2, #3, #7 functionally satisfied (end-to-end Phase 4 smoke remains owner of §10). +- [ ] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15 then REOPENED 2026-05-16**: the 2026-05-15 closure was based on insufficient operator playback verification; D-13's concat-of-self-contained-segments architecture produces a multi-EBML WebM that plays only ~9 s instead of ~30 s in standards-compliant parsers (mpv, ffmpeg, Chrome HTMLMediaElement). UAT Test 3 retest on 2026-05-16 confirmed via byte-level EBML probe. SPEC §10 #7 not actually satisfied. 
Plan 01-08 (WebM remux via ts-ebml + webm-muxer) replaces `mergeVideoSegments`'s file-concat with a real single-EBML remux. See `.planning/debug/d13-multi-ebml-concat-unplayable.md`. Option C port-lifecycle refactor (debug session `empty-archive-port-race`) DID land cleanly and is retained. Phase 1 will additionally absorb whole-desktop + auto-start UX work (Plans 01-09/01-10) per the 2026-05-16 amended charter. - [ ] **Phase 2: Stabilize DOM + event capture privacy** — Migrate rrweb to v2 `maskInputFn`, plug `content/index.ts setupInputLogging` password leak - [ ] **Phase 3: Stabilize export pipeline** — Restore user-activation gesture in popup, delete dead `permissions.request`, replace base64 `data:` URL with Blob URL minted in offscreen - [ ] **Phase 4: SPEC §10 smoke verification** — End-to-end install-and-record-and-export pass against all 9 acceptance criteria diff --git a/.planning/STATE.md b/.planning/STATE.md index e333958..f0508a3 100644 --- a/.planning/STATE.md +++ b/.planning/STATE.md @@ -2,16 +2,16 @@ gsd_state_version: 1.0 milestone: v2.0.0 milestone_name: milestone -status: phase_complete -stopped_at: "Phase 1 closure 2026-05-15: D-12 ffprobe gate + A3 empirical-playback gate both green against tests/fixtures/last_30sec.webm (1.6 MB VP9 1142×1038, 3-segment multi-EBML-header concat). D-13 restart-segments retired D-09..D-11 mid-phase. 30/30 vitest green incl. empirical webm-playback dry-runs; tsc clean; ffmpeg -v warning -i fixture -f null - exit 0 with zero decoder errors (only expected muxer DTS-monotonicity warnings at segment join boundaries); operator confirmed clean Chrome playback end-to-end. REQ-video-ring-buffer marked Complete. Ready to plan Phase 2 (DOM + event-capture privacy)." 
-last_updated: "2026-05-15T21:42:00.000Z" -last_activity: "2026-05-15 — Phase 1 closure: D-12 + A3 gates green; REQ-video-ring-buffer complete; ready for Phase 2" +status: phase_reopened +stopped_at: "Phase 1 REOPENED 2026-05-16: D-13 multi-EBML-concat architecture confirmed broken via UAT Test 3 retest + byte-level EBML probe — produced WebM plays only ~9 s instead of ~30 s in mpv AND Chrome (the 2026-05-15 closure's operator playback check was insufficient). Phase 1's primary deliverable (REQ-video-ring-buffer, SPEC §10 #7) is NOT satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) will replace mergeVideoSegments file-concat with real single-EBML remux. RED test landed in tests/offscreen/webm-playback.test.ts (2 failures, 53 baseline GREEN). Option C port-lifecycle refactor (debug session empty-archive-port-race) DID land cleanly (commits 674c415..f0871c0). Phase 1 also absorbs whole-desktop + auto-start UX (Plans 01-09/01-10) per 2026-05-16 amended charter." +last_updated: "2026-05-16T17:35:00Z" +last_activity: "2026-05-16 — Phase 1 reopened: D-13 multi-EBML architecture confirmed broken by mpv/Chrome playback test; Plan 01-08 (ts-ebml + webm-muxer remux) pending; markers reverted" progress: total_phases: 5 - completed_phases: 1 - total_plans: 7 + completed_phases: 0 + total_plans: 10 completed_plans: 7 - percent: 100 + percent: 70 --- # Project State diff --git a/.planning/debug/d13-multi-ebml-concat-unplayable.md b/.planning/debug/d13-multi-ebml-concat-unplayable.md index 8cc2e98..7497e86 100644 --- a/.planning/debug/d13-multi-ebml-concat-unplayable.md +++ b/.planning/debug/d13-multi-ebml-concat-unplayable.md @@ -8,8 +8,8 @@ trigger: | 1675899 bytes, total segments merged: 3") but the resulting file plays ONLY ~9 s in Chrome AND in mpv. Cross-checking the canonical fixture committed at Phase 1 closure on 2026-05-15 (`tests/fixtures/last_30sec.webm`, - 1633459 bytes, 3 segments per architecture) reveals it ALSO plays only ~9 s - in mpv. 
Operator confirmed both via mpv playback test. + 1633459 bytes, 3 segments per architecture) reveals it ALSO plays only + ~9 s in mpv. Operator confirmed both via mpv playback test. This means D-13's "concat of self-contained WebM segments → playable 30 s WebM" architecture is fundamentally broken. The 2026-05-15 Phase 1 @@ -24,7 +24,7 @@ trigger: | though it was marked Complete in REQUIREMENTS.md/ROADMAP.md/STATE.md on 2026-05-15. created: 2026-05-16T16:56:41Z -updated: 2026-05-16T16:56:41Z +updated: 2026-05-16T17:25:00Z phase: 01-stabilize-video-pipeline related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md @@ -121,15 +121,19 @@ the segment boundary noise from concatenation, not playback failure. ## Current Focus hypothesis: | - **H4 confirmed by operator empirical test**: D-13's "concat of self- - contained WebM segments → produce playable 30 s WebM" architecture - does not work in practice because most Matroska/WebM players do not - implement the multi-segment Matroska feature. The Matroska spec - permits multiple segments in one file but most decoders read only - the first segment's EBML header and stop there. ffmpeg's behavior - (which mpv inherits) is to honor the first EBML's duration metadata. - Chrome's MSE implementation appears to do the same (per UAT operator - observation). + **H4 confirmed by byte-level EBML probe (2026-05-16T17:10Z, see + Evidence/H4 below)**: D-13's "concat of self-contained WebM + segments → produce playable 30 s WebM" architecture does not work + because standards-compliant Matroska parsers (mpv, mkvtoolnix, + Chrome's HTMLMediaElement, ffprobe's `format=duration` path) honor + the FIRST Segment element's Info.Duration EBML (~9_934 ms for the + fixture) and stop there. 
Even ffmpeg's matroska DEMUXER — which is + unusually liberal and reads through the second segment's EBML + header — collapses segments 2..N onto seg1's local timestamp axis + (verified empirically: 601 packets decoded from segs 1+2, ZERO + packets from seg3, output `time=00:00:09.96`). Multi-segment + Matroska is technically permitted by the spec but in practice + consumer-grade players do not implement it. **H3 confirmed by operator empirical test**: The 2026-05-15 Phase 1 closure's "operator-confirmed clean Chrome playback" check was @@ -145,49 +149,98 @@ hypothesis: | produces a file that any player can read end-to-end as one continuous ~30 s stream. - **Candidate implementations**: - - `webm-muxer` npm package (Vanilla. ~10 KB. Browser + Node support. - Single-segment output. Active maintenance.) - - `ts-ebml` (EBML parser + writer. Allows manual control over - structure. ~50 KB.) + **Candidate implementations** (researched 2026-05-16, see + Evidence/library-survey below for full table): + - `webm-muxer` 5.1.4 (Vanilagy, MIT, last release 2025-07-02, + gzipped ~12 KB, pure ESM/CJS no DOM globals). `addVideoChunkRaw(data, + type:'key'|'delta', timestamp, meta?)` accepts already-encoded VP9 + frames — exactly the shape produced by a stream of existing WebM + segments. SW-compatible. PRIMARY CANDIDATE for the write half. + - `ts-ebml` 3.0.2 (legokichi, MIT, last release 2025-09-28, gzipped + ~87 KB, UMD has a single `typeof window` check with self-fallback + so SW-compatible). Decoder+Encoder API. Needed for the parse half + (extract VP9 SimpleBlock payloads + cluster timecodes + keyframe + flags from each segment). + - `ebml` 3.0.0 (node-ebml, MIT, last release **2018-09-06** — dead + upstream). Smaller but unmaintained. + - `mp4-muxer` 5.2.2 (sibling of webm-muxer; not applicable — we need + WebM container output). 
- Custom EBML parser (full control, ~500-1000 LOC, no dep weight) - **Alternative path: MediaRecorder timeslice with cluster-aware trim**: revisit retired D-09..D-11 architecture but trim ONLY on keyframe boundaries (preserving every cluster from the most recent keyframe - onwards). This avoids the A3 orphan-P-frame freeze by guaranteeing - every kept cluster's references are present. ~200-400 LOC. The - risk: requires understanding EBML/Matroska cluster structure to - trim correctly. + onwards). See Evidence/cluster-aware-trim below — the DETERMINISTIC + floor on retained-content duration is much less than 30 s + (worst-case: keyframe just emitted → retain only the post-keyframe + sliver) because VP9 kf_max_dist under Chrome's MediaRecorder is + irregular (3-5 s typical, 26 s observed in the prior debug + session). This path produces a NON-DETERMINISTIC content window; + rejected as architecturally weaker than remux. - **Alternative path: WebCodecs API** (VideoEncoder + Muxer.js or similar): full control over container framing. Significant rewrite - (~1000-2000 LOC). Most flexible but heaviest. + (~1000-2000 LOC). Most flexible but heaviest. WebCodecs is + available in MV3 service workers per Chrome 94+ — viable but + over-engineered for the current need (we already have VP9 frames, + we just need to RE-CONTAIN them). - The remux approach (webm-muxer or equivalent) is likely the right - trade-off: small, well-tested library, preserves D-13's segment - lifecycle benefits (no orphan-P-frame freeze, ~10s rotation gap - acceptable), but produces a single-EBML output that all players - read correctly. + The recommendation (TIEBREAKER only — the user makes the call): + `ts-ebml` (parse) + `webm-muxer` (write) is the smallest fix that + matches the actual problem shape. Combined ~100 KB gzipped, both + MIT, both actively maintained, both verified SW-compatible. 
Net + source-edit LOC ~150-300 in `src/background/index.ts` + mergeVideoSegments() — we don't decode/re-encode VP9 frames, we + just parse them out of segments and re-emit with monotonic + timestamps. Preserves D-13's recorder-side lifecycle (which DID + fix the orphan-P-frame freeze) and adds a single new SW-side + remux pass on the save path. test: | - RED test: introduce a playable-duration assertion to - tests/offscreen/webm-playback.test.ts. Use ffprobe -count_frames - -show_streams to count VIDEO FRAMES (not just metadata duration), - then divide by reported frame rate to compute actual playable - content duration. Assert actual_duration > 25_000 ms for the - generated/committed fixture. This test should FAIL against the - current D-13 architecture and PASS after the remux fix lands. + RED test LANDED at tests/offscreen/webm-playback.test.ts. Two new + assertions in the new `describe('webm playable duration (RED — + confirms d13-multi-ebml-concat-unplayable bug)')` block: - Alternative RED test: ffprobe -read_intervals -i FILE - '0%+#90000' (seek to last 90s, read all packets). Count packets - read. Should be ~600 packets for 30s @ ~20fps, not ~200 for 9s. + 1. `container-level format=duration on last_30sec.webm exceeds 25 s` + — uses ffprobe to read `format=duration`. Asserts + `>= MIN_PLAYABLE_DURATION_MS = 25_000`. RED today + (actual: 9_934 ms). + + 2. `ffmpeg full decode of last_30sec.webm reaches at least 25 s of + timeline` — parses the last `time=HH:MM:SS.MS` from `ffmpeg -stats + -f null -` output. Asserts `>= 25_000 ms`. RED today + (actual: 9_960 ms). + + Both gate behind `it.skipIf(!ffprobeAvailable())` / + `it.skipIf(!ffmpegAvailable())` so CI environments without those + binaries auto-skip rather than hard-fail (matches the existing + webm-playback.test.ts skip discipline). The existing two structural- + validity tests in the same file (`...zero decoder packet errors` and + `...does not end prematurely`) remain GREEN and untouched. 
expecting: | RED test fails on current code (both fixture and freshly-recorded output should fail the duration assertion). Debugger then implements - the chosen fix path (webm-muxer remux most likely) and re-asserts - GREEN. -next_action: gather initial evidence from EBML parsing of both fixtures + research candidate JS remux libraries -reasoning_checkpoint: "" -tdd_checkpoint: "" + the chosen fix path (webm-muxer + ts-ebml remux most likely) and + re-asserts GREEN. RED confirmed 2026-05-16T17:20Z: 11 test files, + 53 passed + 2 failed (the two new assertions). All pre-existing + tests still GREEN; tsc clean (exit 0). +next_action: CHECKPOINT to orchestrator — root cause confirmed, RED test landed, fix-strategy options surfaced; awaiting user's chosen path via orchestrator routing. +reasoning_checkpoint: | + Why CHECKPOINT here rather than execute: the choice between + `ts-ebml + webm-muxer` vs `custom EBML parser` vs `cluster-aware + trim revisit of D-09..D-11` vs `WebCodecs rewrite` is architecturally + significant (it determines whether Phase 1's deliverable stays in + the debug-session hotfix lane OR escalates to a fresh Plan 01-08, + and whether the project gains two new runtime deps). Per the + feedback memory `feedback-no-unilateral-scope-reduction.md` the + debugger does not narrow this for the user — surface options and + let the user pick. +tdd_checkpoint: | + RED gate honored. Two new failing assertions in + tests/offscreen/webm-playback.test.ts pin the playable-duration + contract that the 2026-05-15 closure check missed. Existing + structural-validity tests remain GREEN. tsc clean. Full vitest run + reports `Test Files 1 failed | 10 passed (11) / Tests 2 failed | 53 + passed (55)` — exactly the expected RED-on-new shape, no collateral + regression. 
## Constraints @@ -255,6 +308,197 @@ tdd_checkpoint: "" - Both files report `duration=9.94 s` via ffprobe -show_entries format=duration - Decoder errors: zero (segments are individually valid) +### H4 byte-level EBML probe (2026-05-16T17:10Z) — confirms multi-EBML-concat is the root cause + +Probe target: `tests/fixtures/last_30sec.webm` (1_633_459 bytes, committed +fixture from Phase 1 closure 2026-05-15). + +**EBML structural scan** (raw byte search for element IDs per +[Matroska spec](https://www.matroska.org/technical/elements.html)): + +| EBML element | ID (hex) | Occurrences in file | Byte offsets | +|---|---|---|---| +| EBML header | `1A 45 DF A3` | **3** | `[0, 509038, 970967]` | +| Segment | `18 53 80 67` | **3** | `[36, 509074, 971003]` | +| Cluster | `1F 43 B6 75` | 13 | spread across all 3 segments | + +The file is THREE concatenated WebM files, each with its own EBML header ++ Segment element. mkvinfo (without `--all-elements`) reports only the +FIRST segment + its EBML header — two top-level elements visible — +confirming standards-compliant parsers stop at the first segment. + +**Per-segment isolated probes** (sliced via Python at the EBML offsets +above into `/tmp/d13-seg{1,2,3}.webm`): + +| Segment | Bytes | format=duration | -count_frames | +|---|---|---|---| +| seg1 | 509_038 | 9.934 s | 301 frames | +| seg2 | 461_929 | 9.963 s | 300 frames | +| seg3 | 662_492 | 9.958 s | 311 frames | +| **TOTAL** | **1_633_459** | (29.86 s of real content) | **912 frames** | + +Each segment is individually a valid, complete ~10 s WebM. The +underlying VP9 stream is intact across all three. The bug is purely the +multi-segment topology of the concatenated container. 
+ +**Concatenated file probe** (the actual fixture): + +| Probe command | Reported value | +|---|---| +| `ffprobe -show_entries format=duration` | **9.934024 s** (first segment's Info.Duration metadata only) | +| `ffprobe -count_frames` | **601 frames** (= 301 + 300 = segs 1+2 only) | +| `ffmpeg -f null -` decoder | **frame=601 time=00:00:09.96** + 299 non-monotonic-DTS warnings | +| Packets read from byte range `pos<509038` (seg1) | 301 | +| Packets read from byte range `509038 ≤ pos < 970967` (seg2) | 300 | +| Packets read from byte range `pos ≥ 970967` (seg3) | **0** | +| mkvinfo top-level elements visible | 2 (EBML head + Segment) — seg2 + seg3 invisible | + +Two observations both fatal to D-13: + +1. ffmpeg's matroska demuxer is the most-permissive parser in common + use and even IT silently drops segment 3 (zero packets from + `pos ≥ 970967`). +2. Even when ffmpeg DOES read segments 1+2 it does not offset seg2's + local timestamps onto the global timeline. The 299 non-monotonic-DTS + warnings are seg2's local timestamps (`tt < 9934 ms`) colliding with + seg1's end timestamp (`9934 ms`). Output `time=00:00:09.96` because + the muxer cannot grow the timeline past the maximum monotonic DTS + it has accepted. + +Conclusion: H4 confirmed at the byte level. The file is structurally +valid as a "concatenated archive of three WebMs" but is NOT a single +30-second playable WebM. To produce a 30-second playable WebM the +segments must be REMUXED (parse VP9 frames + keyframe flags + cluster +timestamps from each segment, then re-emit them inside a single +EBML-headered container with monotonically-adjusted timestamps). + +### Library survey (2026-05-16T17:15Z) — candidate JS WebM remux libraries + +All sizes are the bundled dist (no source-map, no tests, no docs). +Gzipped values measured locally via `gzip -c`. SW-compat verdict is +based on grep of dist for `window`/`document`/`navigator`/`XMLHttpRequest` +followed by manual inspection of any hits. 
+ +| Lib | Version | License | Last release | Dist size | Gzipped | SW-compat | API shape | Verdict | +|---|---|---|---|---|---|---|---|---| +| `webm-muxer` | 5.1.4 | MIT | 2025-07-02 | 69 KB | **~12 KB** | YES (zero DOM refs) | `addVideoChunkRaw(data, type:'key'\|'delta', ts, meta?)` accepts encoded VP9 frames | PRIMARY — write half | +| `ts-ebml` | 3.0.2 | MIT | 2025-09-28 | 356 KB | ~87 KB | YES (`typeof window` with `self` fallback in UMD wrapper) | `Decoder.decode(ArrayBuffer) → EBMLElementDetail[]` ; `Encoder.encode(elms) → ArrayBuffer` | PRIMARY — parse half | +| `ebml` | 3.0.0 | MIT | **2018-09-06** | 7.7 MB unpacked | n/a | uncertain | older streaming parser API | DEAD UPSTREAM — avoid | +| `mp4-muxer` | 5.2.2 | MIT | (active) | 70 KB | ~13 KB | YES | analogous to webm-muxer but MP4 | n/a — wrong container | +| Custom EBML parser | n/a | n/a | n/a | 0 KB | 0 KB | YES | hand-rolled per Matroska spec | ~500-1000 LOC, full ownership | + +Important note on the `webm-muxer` API: `addVideoChunkRaw()` takes +already-encoded VP9 frame bytes + a keyframe flag + a timestamp. We +do NOT need to decode/re-encode the VP9 stream — the existing +segments already contain valid VP9 frame payloads inside their +Cluster/SimpleBlock elements. The remux path is: + +1. For each segment blob, parse via `ts-ebml.Decoder` → walk the EBML + tree → for each Cluster's SimpleBlock children, extract the VP9 + frame bytes + keyframe flag (Matroska SimpleBlock bit 7 of the + first flag byte = "keyframe" per + [spec](https://www.matroska.org/technical/elements.html#SimpleBlock)) + + cluster Timestamp + local block offset. +2. Compute monotonic adjusted timestamp: `globalTs = segmentBaseMs + + clusterTsMs + blockOffsetMs` where `segmentBaseMs` accumulates the + prior segment's total content duration. +3. Stream all adjusted frames into a single `webm-muxer.Muxer` with + `addVideoChunkRaw(frameData, isKey ? 'key' : 'delta', globalUs)`. +4. 
`muxer.finalize()` → `ArrayBufferTarget.buffer` → single-EBML + WebM Blob. + +Combined dep weight: ~100 KB gzipped (`webm-muxer` ~12 KB + `ts-ebml` +~87 KB). Combined source edit estimate at `mergeVideoSegments()`: +~150-300 LOC including type defs. + +### Cluster-aware-trim alternative path (D-09..D-11 revisit, 2026-05-16T17:18Z) + +Path summary: keep MediaRecorder running continuously (the retired +D-09 lifecycle) but, on each periodic trim pass, scan the chunk buffer +for the OLDEST keyframe whose position would keep total duration ≤ 30 +s, then drop everything strictly before that keyframe. Preserves header +chunk + a contiguous run of keyframe-anchored clusters. + +Why this is architecturally weaker than remux: + +1. **Non-deterministic content window.** MediaRecorder/VP9 keyframe + cadence under Chrome's default `kf_max_dist=100` is irregular — + the prior `webm-playback-freeze` debug session observed a 26-second + keyframe gap empirically. If the latest keyframe was emitted 2 s + ago, cluster-aware trim retains only 2 s of content. The user's + `last_30sec.webm` would be anywhere in `[~few seconds .. ~30 s]` + depending on when SAVE landed in the keyframe cycle. That breaks + SPEC §10 #7's implicit "≥ 30 s of recent context" requirement. + +2. **Still need EBML parsing.** To find keyframe boundaries inside + the chunk buffer we still need to parse the WebM container for + SimpleBlock keyframe flags. So the dep weight is similar (`ts-ebml` + at minimum) but the output is worse. + +3. **Re-introduces the freeze-risk surface area.** The prior debug + session retired D-09..D-11 precisely because age-trim repeatedly + produced orphan-P-frame freezes. A "keyframe-aware" variant still + has to delete content; one bug in the keyframe-detection path and + the freeze returns. The risk surface is wider than the remux path, + which never deletes — it only re-containers what already exists. + +LOC estimate: ~200-400 LOC for keyframe parsing + buffer mutation + +tests. 
Net: similar dep weight, worse playable-duration guarantee, +re-opens the freeze regression surface. **REJECTED as inferior to +remux.** Documenting here only because the orchestrator brief +explicitly requested the comparison. + +### WebCodecs API path (2026-05-16T17:19Z) + +WebCodecs (`VideoEncoder` + `VideoDecoder`) is available in MV3 service +workers from Chrome 94+. The path would be: feed each segment's +clusters → `VideoDecoder` → emit `VideoFrame` objects → feed back into +`VideoEncoder` (re-encode VP9) → wrap output via `webm-muxer`. + +This works but adds a re-encode pass that: +- doubles CPU cost during the save flow +- introduces an additional quality loss (re-encoding lossy VP9) +- adds 500-1000 LOC of encoder/decoder lifecycle management +- requires Chrome 94+ exclusively (we already require modern Chrome, + so OK, but it tightens the version floor) + +There is no benefit over the `ts-ebml + webm-muxer` path for this +specific shape of problem — we already have encoded VP9 frames and +just need to put them in a different container. Re-encoding is +unnecessary work. **REJECTED as over-engineered.** + +### RED test landing evidence (2026-05-16T17:20Z) + +File edited: `tests/offscreen/webm-playback.test.ts` (preserved +existing 2 GREEN tests; appended new `describe` block with 2 new +assertions + supporting helpers). + +Test run scoped to file: +``` +$ npx vitest run tests/offscreen/webm-playback.test.ts + Test Files 1 failed (1) + Tests 2 failed | 2 passed (4) +``` + +Failures: +- `container-level format=duration on last_30sec.webm exceeds 25 s` + — `expected 9934 to be greater than or equal to 25000` +- `ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline` + — `expected 9960 to be greater than or equal to 25000` + +Full suite (proves zero collateral regression): +``` +$ npx vitest run + Test Files 1 failed | 10 passed (11) + Tests 2 failed | 53 passed (55) +``` + +All 53 pre-existing tests still GREEN. 
tsc: +``` +$ npx tsc --noEmit; echo exit=$? +exit=0 +``` + ## Eliminated (populated by debugger as hypotheses are ruled out) @@ -266,6 +510,13 @@ tdd_checkpoint: "" - (H5: defective committed fixture in storage): ruled out — file size matches expected (1.63 MB matches what was committed on 2026-05-15; not bit-rot) +- H6 (cluster-aware-trim revisit of D-09..D-11): rejected on architectural + weakness — non-deterministic content window (depends on keyframe + cadence), still needs EBML parsing, re-opens freeze-regression + surface area. See Evidence/cluster-aware-trim section. +- H7 (WebCodecs re-encode path): rejected as over-engineered — re-encodes + VP9 frames we already have. ~500-1000 LOC for zero quality/playability + benefit. See Evidence/WebCodecs section. ## Resolution diff --git a/tests/offscreen/webm-playback.test.ts b/tests/offscreen/webm-playback.test.ts index e82f373..871f491 100644 --- a/tests/offscreen/webm-playback.test.ts +++ b/tests/offscreen/webm-playback.test.ts @@ -33,6 +33,27 @@ // Skip discipline: if ffmpeg is missing from the environment the test // auto-skips rather than failing. CI ships ffmpeg per `smoke.sh` so this is // a developer-convenience fence, not a behavioural softening. +// +// --- 2026-05-16 amendment: D-13 architecture failure RED tests --- +// +// Debug session `.planning/debug/d13-multi-ebml-concat-unplayable.md` proved +// the existing two assertions ABOVE pass under D-13 only because they check +// structural validity (ffmpeg null-decode tolerates the multi-EBML-header +// concat by silently reading segments 1+2 and dropping segment 3, and by +// collapsing all segments onto seg1's local timestamp axis so no muxer +// "File ended prematurely" warning fires). Players that respect Matroska's +// segment-info Duration element (mpv, Chrome's HTMLMediaElement, ffprobe's +// `format=duration`) read 9.94 s — the FIRST segment's metadata duration — +// and stop. 
The committed 1.6 MB fixture contains ~30 s of valid VP9 frames +// but presents as ~10 s of content to operators and tests. +// +// The "container-level playable duration" describe block below adds the +// assertion the closure check missed on 2026-05-15: that ffprobe-reported +// format duration EXCEEDS 25_000 ms for the canonical fixture. This is +// RED today under D-13 and stays RED until the multi-EBML concat at +// src/background/index.ts mergeVideoSegments() is replaced with a true +// remux that writes a single EBML header whose Info.Duration covers the +// whole ~30 s span. import { describe, it, expect } from 'vitest'; import { existsSync, statSync } from 'node:fs'; @@ -43,12 +64,21 @@ import { dirname, resolve } from 'node:path'; const here = dirname(fileURLToPath(import.meta.url)); const FIXTURE_PATH = resolve(here, '..', 'fixtures', 'last_30sec.webm'); const FFMPEG_BIN = '/usr/bin/ffmpeg'; +const FFPROBE_BIN = '/usr/bin/ffprobe'; // Cap: a clean 30-second WebM decoded with `-f null` finishes well under // 10 s on commodity hardware. If we ever exceed this we want a hard failure, // not a hung CI job. const FFMPEG_TIMEOUT_MS = 30_000; +// Playable-duration floor. The recorder rotates every 10 s and keeps 3 +// segments (D-13 / SEGMENT_DURATION_MS × MAX_SEGMENTS = 30_000 ms). The +// rotation lifecycle can drop a partial sub-second at each boundary so the +// final remux file is bounded by [~27_000, ~30_000] ms in steady state. We +// gate at 25_000 ms to keep slack for boundary noise but still firmly above +// the broken-architecture failure mode (9_940 ms — first segment only). 
+const MIN_PLAYABLE_DURATION_MS = 25_000; + function ffmpegAvailable(): boolean { try { return existsSync(FFMPEG_BIN) && statSync(FFMPEG_BIN).isFile(); @@ -57,6 +87,14 @@ function ffmpegAvailable(): boolean { } } +function ffprobeAvailable(): boolean { + try { + return existsSync(FFPROBE_BIN) && statSync(FFPROBE_BIN).isFile(); + } catch { + return false; + } +} + interface DecodeResult { stderr: string; packetErrorCount: number; @@ -113,6 +151,44 @@ function decodeDryRunStrict(fixturePath: string): DecodeResult { }; } +/** + * Read the container-level `format=duration` value from a WebM file via + * ffprobe. This is the value that mpv, Chrome's HTMLMediaElement, and most + * Matroska parsers honor when deciding "how long is this file?" — they pick + * up the first Segment's Info.Duration EBML element and stop seeking past + * the EBML header's reported length. + * + * Returns NaN on parse failure (ffprobe missing input track, malformed + * float, etc.) so the assertion downstream can produce a precise error + * message rather than masking a probe-side failure as a duration check. + * + * @param fixturePath - Absolute path to the WebM file under test. + * @returns Container-level duration in milliseconds. + */ +function probeContainerDurationMs(fixturePath: string): number { + const proc = spawnSync( + FFPROBE_BIN, + [ + '-v', 'error', + '-show_entries', 'format=duration', + '-of', 'default=noprint_wrappers=1:nokey=1', + '-i', fixturePath, + ], + { + stdio: ['ignore', 'pipe', 'pipe'], + encoding: 'utf-8', + timeout: FFMPEG_TIMEOUT_MS, + maxBuffer: 1 * 1024 * 1024, + }, + ); + if (proc.signal !== null) { + throw new Error(`ffprobe was killed by signal ${proc.signal}`); + } + const stdout = (proc.stdout ?? '').trim(); + const seconds = parseFloat(stdout); + return Number.isFinite(seconds) ? 
Math.round(seconds * 1000) : Number.NaN; +} + describe('webm playback (RED — confirms webm-playback-freeze bug)', () => { it.skipIf(!ffmpegAvailable())( 'ffmpeg dry-run on last_30sec.webm produces zero decoder packet errors', @@ -151,3 +227,90 @@ describe('webm playback (RED — confirms webm-playback-freeze bug)', () => { }, ); }); + +describe('webm playable duration (RED — confirms d13-multi-ebml-concat-unplayable bug)', () => { + it.skipIf(!ffprobeAvailable())( + 'container-level format=duration on last_30sec.webm is at least 25 s', + () => { + // SPEC §10 #7 requires last_30sec.webm to "play back in a browser" + // covering the most recent ~30 s. Both mpv and Chrome's HTMLMediaElement + // honor the first Segment's Info.Duration EBML element — which under + // D-13's multi-EBML concat is hardcoded to the FIRST segment's local + // duration (~9.94 s for the canonical fixture). That bug means the + // canonical Phase 1 closure fixture (committed 2026-05-15) presents + // as ~10 s of content to any standards-compliant Matroska parser, + // even though segments 2+3 are physically present in the bytes. + // + // The fix is a true WebM REMUX of the concatenated segments: parse + // each segment's clusters via an EBML library, extract the VP9 + // frame payloads with their keyframe/delta flags, and re-mux into + // a single-EBML-header WebM whose clusters carry monotonically + // increasing timestamps. The resulting file's Info.Duration will + // span the full ~30 s window. + // + // Floor of MIN_PLAYABLE_DURATION_MS (25_000) accommodates the + // ~3 s boundary slack from segment rotation while remaining well + // above the broken-architecture failure mode (9_940 ms). + expect(existsSync(FIXTURE_PATH)).toBe(true); + const durationMs = probeContainerDurationMs(FIXTURE_PATH); + expect( + durationMs, + `ffprobe reported container duration=${durationMs} ms for ${FIXTURE_PATH}.
` + + `Under SPEC §10 #7 the file must present at least ${MIN_PLAYABLE_DURATION_MS} ms ` + + `of playable content to standards-compliant Matroska parsers (mpv, Chrome). ` + + `If this value is ~9_940 ms the file is a multi-EBML-header concat (D-13 raw output) ` + + `where players honor only the first segment's local Info.Duration metadata. ` + + `Fix: replace mergeVideoSegments() in src/background/index.ts with a true WebM remux ` + + `(parse + rewrite into a single-EBML-headered WebM with adjusted monotonic timestamps).`, + ).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS); + }, + ); + + it.skipIf(!ffmpegAvailable())( + 'ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline', + () => { + // Defense-in-depth: even if a future ffprobe quirk computes + // format=duration by summing all reachable cluster timestamps, + // ffmpeg's full null-decode of the concatenated file collapses + // segments 2..N onto the first segment's local timestamp axis + // (verified empirically 2026-05-16: 601 frames decoded, time=09.96) + // because the multi-EBML format provides no segment-level offset. + // The remux fix will produce a stream whose decoded `time=...` + // reaches at least 25 s end-to-end. + expect(existsSync(FIXTURE_PATH)).toBe(true); + const proc = spawnSync( + FFMPEG_BIN, + ['-nostdin', '-v', 'error', '-stats', '-i', FIXTURE_PATH, '-f', 'null', '-'], + { + stdio: ['ignore', 'ignore', 'pipe'], + encoding: 'utf-8', + timeout: FFMPEG_TIMEOUT_MS, + maxBuffer: 4 * 1024 * 1024, + }, + ); + if (proc.signal !== null) { + throw new Error(`ffmpeg was killed by signal ${proc.signal}`); + } + const stderr = proc.stderr ?? ''; + // ffmpeg's `-stats` line on the final frame looks like: + // frame= 601 fps=0.0 q=-0.0 Lsize=N/A time=00:00:09.96 bitrate=N/A ... + // We want the LAST time= match (subsequent stats lines overwrite the + // earlier ones with monotonically increasing time values). 
+ const timeMatches = [...stderr.matchAll(/time=(\d{2}):(\d{2}):(\d{2})\.(\d{2})/g)]; + const last = timeMatches[timeMatches.length - 1]; + const decodedMs = last + ? (parseInt(last[1], 10) * 3600 + parseInt(last[2], 10) * 60 + parseInt(last[3], 10)) * 1000 + + parseInt(last[4], 10) * 10 + : Number.NaN; + expect( + decodedMs, + `ffmpeg decoded only ${decodedMs} ms of timeline from ${FIXTURE_PATH}. ` + + `SPEC §10 #7 requires at least ${MIN_PLAYABLE_DURATION_MS} ms of decoded content. ` + + `If decoded duration is ~9_960 ms the multi-EBML concat is collapsing all segments ` + + `onto seg1's local timestamp axis (the timestamp-collision symptom). ` + + `Fix: real WebM remux per d13-multi-ebml-concat-unplayable debug session. ` + + `Full ffmpeg stderr:\n${stderr}`, + ).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS); + }, + ); +});
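
Reviewer note: the two duration tests above detect the symptom (short reported duration). A complementary byte-level check for the cause, more than one EBML header in the merged file, could look roughly like the sketch below. `findEbmlHeaderOffsets` is a hypothetical helper and not part of this diff. It scans for the EBML header element ID `0x1A45DFA3` and is only a heuristic: the same four bytes can in principle occur inside compressed frame data, so a count above 1 is a strong hint rather than proof.

```typescript
// Hypothetical helper (not part of this diff): locate every occurrence of the
// EBML header element ID (0x1A 0x45 0xDF 0xA3) in a byte buffer.
// Assumption: a correctly remuxed WebM contains this ID exactly once, at
// offset 0. A raw concat of three self-contained segments would contain three.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3] as const;

function findEbmlHeaderOffsets(bytes: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + EBML_MAGIC.length <= bytes.length; i++) {
    if (
      bytes[i] === EBML_MAGIC[0] &&
      bytes[i + 1] === EBML_MAGIC[1] &&
      bytes[i + 2] === EBML_MAGIC[2] &&
      bytes[i + 3] === EBML_MAGIC[3]
    ) {
      offsets.push(i);
      // Skip past the matched ID so overlapping bytes are not re-examined.
      i += EBML_MAGIC.length - 1;
    }
  }
  return offsets;
}

// Synthetic example: two fake "segments", each starting with the EBML ID.
const fakeConcat = new Uint8Array([
  0x1a, 0x45, 0xdf, 0xa3, 0x00, 0x00,
  0x1a, 0x45, 0xdf, 0xa3, 0x01, 0x02,
]);
const headerOffsets = findEbmlHeaderOffsets(fakeConcat);
console.log(`EBML headers found at offsets: ${headerOffsets.join(', ')}`);
```

In a test this could be fed the bytes of `FIXTURE_PATH` (for example via `readFileSync`) with an assertion that exactly one offset is returned. Treat that as a possible extra gate, not as a replacement for the ffprobe and ffmpeg checks.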