revert(01): reopen Phase 1 — D-13 multi-EBML-concat is unplayable

REQ-video-ring-buffer flipped from [x] back to [ ]. ROADMAP.md Phase 1
row reverted from [x] Closed 2026-05-15 to [ ] reopened 2026-05-16.
STATE.md status flipped phase_complete → phase_reopened with full
historical narrative preserved.

Root cause (confirmed at byte level by gsd-debugger 2026-05-16):
D-13's concat-of-self-contained-WebM-segments architecture produces a
3-EBML-header WebM that standards-compliant Matroska parsers
(mpv, ffmpeg, Chrome HTMLMediaElement) play only as the first segment
(~9.94 s) and silently drop the remaining 2 segments. Confirmed via
operator mpv drag-drop test of BOTH the canonical 2026-05-15 closure
fixture and the 2026-05-16 UAT-produced fixture — both exhibit the
same broken playback.

The 2026-05-15 "operator-confirmed clean Chrome playback" assessment
was insufficient: it verified the file plays without freezing but did
not measure total duration. Phase 1's primary deliverable
(REQ-video-ring-buffer / SPEC §10 #7) is therefore NOT satisfied.

Fix path chosen by user: ts-ebml (parse) + webm-muxer (write) to
replace mergeVideoSegments file-concat with real single-EBML remux.
Will land as Plan 01-08 via fresh /gsd-plan-phase ceremony.

RED test landed in tests/offscreen/webm-playback.test.ts (2 new
assertions on container-format-duration + ffmpeg-full-decode-duration).
2 failures, 53 baseline tests still GREEN.

Option C port-lifecycle refactor (debug session
empty-archive-port-race, commits 674c415..f0871c0) DID land cleanly
and is retained — that fix was orthogonal and correctly resolved the
silent-empty-archive symptom that previously masked this deeper bug.

Debug session: .planning/debug/d13-multi-ebml-concat-unplayable.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:47:47 +02:00
parent f1026954fc
commit bc310d98cf
5 changed files with 479 additions and 55 deletions


@@ -16,7 +16,7 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md.
### Video ### Video
- [x] **REQ-video-ring-buffer**: The extension maintains an in-memory ring - [ ] **REQ-video-ring-buffer**: The extension maintains an in-memory ring
buffer containing the most recent 30 seconds of captured video. AMENDED in buffer containing the most recent 30 seconds of captured video. AMENDED in
Phase 01: video is acquired via `navigator.mediaDevices.getDisplayMedia()` Phase 01: video is acquired via `navigator.mediaDevices.getDisplayMedia()`
invoked from the offscreen document (with `chrome.offscreen.Reason.DISPLAY_MEDIA`), invoked from the offscreen document (with `chrome.offscreen.Reason.DISPLAY_MEDIA`),
@@ -35,9 +35,19 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md.
CON-video-window, CON-video-codec, CON-display-capture-binding (replaces CON-video-window, CON-video-codec, CON-display-capture-binding (replaces
RETIRED CON-tab-capture-binding). CON-webm-header-retention RETIRED in RETIRED CON-tab-capture-binding). CON-webm-header-retention RETIRED in
favor of D-13 per-segment header isolation. favor of D-13 per-segment header isolation.
- SPEC §10 acceptance criteria: #2, #3, #7 — all green 2026-05-15 - SPEC §10 acceptance criteria: #2, #3 green 2026-05-15 (D-12 ffprobe
(D-12 ffprobe gate + operator-confirmed Chrome playback + ffmpeg dry-run gate). #7 (last_30sec.webm plays back in a browser) — **REOPENED
exit 0 with zero decoder errors against `tests/fixtures/last_30sec.webm`). 2026-05-16**: D-13's concat-of-self-contained-segments architecture
produces a multi-EBML-header file that standards-compliant Matroska
parsers (mpv, ffmpeg, Chrome's HTMLMediaElement) play only as the
first segment (~9.94 s) and silently drop segments 2 and 3. The
2026-05-15 "operator-confirmed clean Chrome playback" assessment was
insufficient — it checked playback ran without freezing but did not
measure total duration. Plan 01-08 (WebM remux via ts-ebml +
webm-muxer) will replace `mergeVideoSegments`'s file-concat with a
real single-EBML-headered remux, restoring SPEC §10 #7. See
`.planning/debug/d13-multi-ebml-concat-unplayable.md` for the
byte-level root-cause evidence.
### DOM Capture ### DOM Capture
@@ -193,7 +203,7 @@ Which phase covers which requirement. See ROADMAP.md for phase details.
| Requirement | Phase | Status | | Requirement | Phase | Status |
|-------------|-------|--------| |-------------|-------|--------|
| REQ-video-ring-buffer | Phase 1 | Complete | | REQ-video-ring-buffer | Phase 1 | In progress (reopened 2026-05-16: SPEC §10 #7 fails; Plan 01-08 WebM remux pending) |
| REQ-rrweb-dom-buffer | Phase 2 | Pending | | REQ-rrweb-dom-buffer | Phase 2 | Pending |
| REQ-user-event-log | Phase 2 | Pending | | REQ-user-event-log | Phase 2 | Pending |
| REQ-password-confidentiality | Phase 2 | Pending | | REQ-password-confidentiality | Phase 2 | Pending |


@@ -22,7 +22,7 @@ working export → green §10 smoke → harden + clean up**.
Decimal phases appear between their surrounding integers in numeric order. Decimal phases appear between their surrounding integers in numeric order.
- [x] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15** — D-12 ffprobe gate + A3 empirical-playback gate both green against `tests/fixtures/last_30sec.webm` (1.6 MB VP9 1142×1038); D-13 restart-segments retired D-09..D-11 mid-phase; 30/30 vitest green, tsc clean. SPEC §10 #2, #3, #7 functionally satisfied (end-to-end Phase 4 smoke remains owner of §10). - [ ] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15 then REOPENED 2026-05-16**: the 2026-05-15 closure was based on insufficient operator playback verification; D-13's concat-of-self-contained-segments architecture produces a multi-EBML WebM that plays only ~9 s instead of ~30 s in standards-compliant parsers (mpv, ffmpeg, Chrome HTMLMediaElement). UAT Test 3 retest on 2026-05-16 confirmed via byte-level EBML probe. SPEC §10 #7 not actually satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) replaces `mergeVideoSegments`'s file-concat with a real single-EBML remux. See `.planning/debug/d13-multi-ebml-concat-unplayable.md`. Option C port-lifecycle refactor (debug session `empty-archive-port-race`) DID land cleanly and is retained. Phase 1 will additionally absorb whole-desktop + auto-start UX work (Plans 01-09/01-10) per the 2026-05-16 amended charter.
- [ ] **Phase 2: Stabilize DOM + event capture privacy** — Migrate rrweb to v2 `maskInputFn`, plug `content/index.ts setupInputLogging` password leak - [ ] **Phase 2: Stabilize DOM + event capture privacy** — Migrate rrweb to v2 `maskInputFn`, plug `content/index.ts setupInputLogging` password leak
- [ ] **Phase 3: Stabilize export pipeline** — Restore user-activation gesture in popup, delete dead `permissions.request`, replace base64 `data:` URL with Blob URL minted in offscreen - [ ] **Phase 3: Stabilize export pipeline** — Restore user-activation gesture in popup, delete dead `permissions.request`, replace base64 `data:` URL with Blob URL minted in offscreen
- [ ] **Phase 4: SPEC §10 smoke verification** — End-to-end install-and-record-and-export pass against all 9 acceptance criteria - [ ] **Phase 4: SPEC §10 smoke verification** — End-to-end install-and-record-and-export pass against all 9 acceptance criteria


@@ -2,16 +2,16 @@
gsd_state_version: 1.0 gsd_state_version: 1.0
milestone: v2.0.0 milestone: v2.0.0
milestone_name: milestone milestone_name: milestone
status: phase_complete status: phase_reopened
stopped_at: "Phase 1 closure 2026-05-15: D-12 ffprobe gate + A3 empirical-playback gate both green against tests/fixtures/last_30sec.webm (1.6 MB VP9 1142×1038, 3-segment multi-EBML-header concat). D-13 restart-segments retired D-09..D-11 mid-phase. 30/30 vitest green incl. empirical webm-playback dry-runs; tsc clean; ffmpeg -v warning -i fixture -f null - exit 0 with zero decoder errors (only expected muxer DTS-monotonicity warnings at segment join boundaries); operator confirmed clean Chrome playback end-to-end. REQ-video-ring-buffer marked Complete. Ready to plan Phase 2 (DOM + event-capture privacy)." stopped_at: "Phase 1 REOPENED 2026-05-16: D-13 multi-EBML-concat architecture confirmed broken via UAT Test 3 retest + byte-level EBML probe — produced WebM plays only ~9 s instead of ~30 s in mpv AND Chrome (the 2026-05-15 closure's operator playback check was insufficient). Phase 1's primary deliverable (REQ-video-ring-buffer, SPEC §10 #7) is NOT satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) will replace mergeVideoSegments file-concat with real single-EBML remux. RED test landed in tests/offscreen/webm-playback.test.ts (2 failures, 53 baseline GREEN). Option C port-lifecycle refactor (debug session empty-archive-port-race) DID land cleanly (commits 674c415..f0871c0). Phase 1 also absorbs whole-desktop + auto-start UX (Plans 01-09/01-10) per 2026-05-16 amended charter."
last_updated: "2026-05-15T21:42:00.000Z" last_updated: "2026-05-16T17:35:00Z"
last_activity: "2026-05-15 — Phase 1 closure: D-12 + A3 gates green; REQ-video-ring-buffer complete; ready for Phase 2" last_activity: "2026-05-16 — Phase 1 reopened: D-13 multi-EBML architecture confirmed broken by mpv/Chrome playback test; Plan 01-08 (ts-ebml + webm-muxer remux) pending; markers reverted"
progress: progress:
total_phases: 5 total_phases: 5
completed_phases: 1 completed_phases: 0
total_plans: 7 total_plans: 10
completed_plans: 7 completed_plans: 7
percent: 100 percent: 70
--- ---
# Project State # Project State


@@ -8,8 +8,8 @@ trigger: |
1675899 bytes, total segments merged: 3") but the resulting file plays 1675899 bytes, total segments merged: 3") but the resulting file plays
ONLY ~9 s in Chrome AND in mpv. Cross-checking the canonical fixture ONLY ~9 s in Chrome AND in mpv. Cross-checking the canonical fixture
committed at Phase 1 closure on 2026-05-15 (`tests/fixtures/last_30sec.webm`, committed at Phase 1 closure on 2026-05-15 (`tests/fixtures/last_30sec.webm`,
1633459 bytes, 3 segments per architecture) reveals it ALSO plays only ~9 s 1633459 bytes, 3 segments per architecture) reveals it ALSO plays only
in mpv. Operator confirmed both via mpv playback test. ~9 s in mpv. Operator confirmed both via mpv playback test.
This means D-13's "concat of self-contained WebM segments → playable 30 s This means D-13's "concat of self-contained WebM segments → playable 30 s
WebM" architecture is fundamentally broken. The 2026-05-15 Phase 1 WebM" architecture is fundamentally broken. The 2026-05-15 Phase 1
@@ -24,7 +24,7 @@ trigger: |
though it was marked Complete in REQUIREMENTS.md/ROADMAP.md/STATE.md though it was marked Complete in REQUIREMENTS.md/ROADMAP.md/STATE.md
on 2026-05-15. on 2026-05-15.
created: 2026-05-16T16:56:41Z created: 2026-05-16T16:56:41Z
updated: 2026-05-16T16:56:41Z updated: 2026-05-16T17:25:00Z
phase: 01-stabilize-video-pipeline phase: 01-stabilize-video-pipeline
related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md
related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md
@@ -121,15 +121,19 @@ the segment boundary noise from concatenation, not playback failure.
## Current Focus ## Current Focus
hypothesis: | hypothesis: |
**H4 confirmed by operator empirical test**: D-13's "concat of self- **H4 confirmed by byte-level EBML probe (2026-05-16T17:10Z, see
contained WebM segments → produce playable 30 s WebM" architecture Evidence/H4 below)**: D-13's "concat of self-contained WebM
does not work in practice because most Matroska/WebM players do not segments → produce playable 30 s WebM" architecture does not work
implement the multi-segment Matroska feature. The Matroska spec because standards-compliant Matroska parsers (mpv, mkvtoolnix,
permits multiple segments in one file but most decoders read only Chrome's HTMLMediaElement, ffprobe's `format=duration` path) honor
the first segment's EBML header and stop there. ffmpeg's behavior the FIRST Segment element's Info.Duration EBML (~9_934 ms for the
(which mpv inherits) is to honor the first EBML's duration metadata. fixture) and stop there. Even ffmpeg's matroska DEMUXER — which is
Chrome's MSE implementation appears to do the same (per UAT operator unusually liberal and reads through the second segment's EBML
observation). header — collapses segments 2..N onto seg1's local timestamp axis
(verified empirically: 601 packets decoded from segs 1+2, ZERO
packets from seg3, output `time=00:00:09.96`). Multi-segment
Matroska is technically permitted by the spec but in practice
consumer-grade players do not implement it.
**H3 confirmed by operator empirical test**: The 2026-05-15 Phase 1 **H3 confirmed by operator empirical test**: The 2026-05-15 Phase 1
closure's "operator-confirmed clean Chrome playback" check was closure's "operator-confirmed clean Chrome playback" check was
@@ -145,49 +149,98 @@ hypothesis: |
produces a file that any player can read end-to-end as one continuous produces a file that any player can read end-to-end as one continuous
~30 s stream. ~30 s stream.
**Candidate implementations**: **Candidate implementations** (researched 2026-05-16, see
- `webm-muxer` npm package (Vanilla. ~10 KB. Browser + Node support. Evidence/library-survey below for full table):
Single-segment output. Active maintenance.) - `webm-muxer` 5.1.4 (Vanilagy, MIT, last release 2025-07-02,
- `ts-ebml` (EBML parser + writer. Allows manual control over gzipped ~12 KB, pure ESM/CJS no DOM globals). `addVideoChunkRaw(data,
structure. ~50 KB.) type:'key'|'delta', timestamp, meta?)` accepts already-encoded VP9
frames — exactly the shape produced by a stream of existing WebM
segments. SW-compatible. PRIMARY CANDIDATE for the write half.
- `ts-ebml` 3.0.2 (legokichi, MIT, last release 2025-09-28, gzipped
~87 KB, UMD has a single `typeof window` check with self-fallback
so SW-compatible). Decoder+Encoder API. Needed for the parse half
(extract VP9 SimpleBlock payloads + cluster timecodes + keyframe
flags from each segment).
- `ebml` 3.0.0 (node-ebml, MIT, last release **2018-09-06** — dead
upstream). Smaller but unmaintained.
- `mp4-muxer` 5.2.2 (sibling of webm-muxer; not applicable — we need
WebM container output).
- Custom EBML parser (full control, ~500-1000 LOC, no dep weight) - Custom EBML parser (full control, ~500-1000 LOC, no dep weight)
- **Alternative path: MediaRecorder timeslice with cluster-aware trim**: - **Alternative path: MediaRecorder timeslice with cluster-aware trim**:
revisit retired D-09..D-11 architecture but trim ONLY on keyframe revisit retired D-09..D-11 architecture but trim ONLY on keyframe
boundaries (preserving every cluster from the most recent keyframe boundaries (preserving every cluster from the most recent keyframe
onwards). This avoids the A3 orphan-P-frame freeze by guaranteeing onwards). See Evidence/cluster-aware-trim below — the DETERMINISTIC
every kept cluster's references are present. ~200-400 LOC. The floor on retained-content duration is much less than 30 s
risk: requires understanding EBML/Matroska cluster structure to (worst-case: keyframe just emitted → retain only the post-keyframe
trim correctly. sliver) because VP9 kf_max_dist under Chrome's MediaRecorder is
irregular (3-5 s typical, 26 s observed in the prior debug
session). This path produces a NON-DETERMINISTIC content window;
rejected as architecturally weaker than remux.
- **Alternative path: WebCodecs API** (VideoEncoder + Muxer.js or - **Alternative path: WebCodecs API** (VideoEncoder + Muxer.js or
similar): full control over container framing. Significant rewrite similar): full control over container framing. Significant rewrite
(~1000-2000 LOC). Most flexible but heaviest. (~1000-2000 LOC). Most flexible but heaviest. WebCodecs is
available in MV3 service workers per Chrome 94+ — viable but
over-engineered for the current need (we already have VP9 frames,
we just need to RE-CONTAIN them).
The remux approach (webm-muxer or equivalent) is likely the right The recommendation (TIEBREAKER only — the user makes the call):
trade-off: small, well-tested library, preserves D-13's segment `ts-ebml` (parse) + `webm-muxer` (write) is the smallest fix that
lifecycle benefits (no orphan-P-frame freeze, ~10s rotation gap matches the actual problem shape. Combined ~100 KB gzipped, both
acceptable), but produces a single-EBML output that all players MIT, both actively maintained, both verified SW-compatible. Net
read correctly. source-edit LOC ~150-300 in `src/background/index.ts`
mergeVideoSegments() — we don't decode/re-encode VP9 frames, we
just parse them out of segments and re-emit with monotonic
timestamps. Preserves D-13's recorder-side lifecycle (which DID
fix the orphan-P-frame freeze) and adds a single new SW-side
remux pass on the save path.
test: | test: |
RED test: introduce a playable-duration assertion to RED test LANDED at tests/offscreen/webm-playback.test.ts. Two new
tests/offscreen/webm-playback.test.ts. Use ffprobe -count_frames assertions in the new `describe('webm playable duration (RED —
-show_streams to count VIDEO FRAMES (not just metadata duration), confirms d13-multi-ebml-concat-unplayable bug)')` block:
then divide by reported frame rate to compute actual playable
content duration. Assert actual_duration > 25_000 ms for the
generated/committed fixture. This test should FAIL against the
current D-13 architecture and PASS after the remux fix lands.
Alternative RED test: ffprobe -read_intervals -i FILE 1. `container-level format=duration on last_30sec.webm exceeds 25 s`
'0%+#90000' (seek to last 90s, read all packets). Count packets — uses ffprobe to read `format=duration`. Asserts
read. Should be ~600 packets for 30s @ ~20fps, not ~200 for 9s. `>= MIN_PLAYABLE_DURATION_MS = 25_000`. RED today
(actual: 9_934 ms).
2. `ffmpeg full decode of last_30sec.webm reaches at least 25 s of
timeline` — parses the last `time=HH:MM:SS.MS` from `ffmpeg -stats
-f null -` output. Asserts `>= 25_000 ms`. RED today
(actual: 9_960 ms).
Both gate behind `it.skipIf(!ffprobeAvailable())` /
`it.skipIf(!ffmpegAvailable())` so CI environments without those
binaries auto-skip rather than hard-fail (matches the existing
webm-playback.test.ts skip discipline). The existing two structural-
validity tests in the same file (`...zero decoder packet errors` and
`...does not end prematurely`) remain GREEN and untouched.
expecting: | expecting: |
RED test fails on current code (both fixture and freshly-recorded RED test fails on current code (both fixture and freshly-recorded
output should fail the duration assertion). Debugger then implements output should fail the duration assertion). Debugger then implements
the chosen fix path (webm-muxer remux most likely) and re-asserts the chosen fix path (webm-muxer + ts-ebml remux most likely) and
GREEN. re-asserts GREEN. RED confirmed 2026-05-16T17:20Z: 11 test files,
next_action: gather initial evidence from EBML parsing of both fixtures + research candidate JS remux libraries 53 passed + 2 failed (the two new assertions). All pre-existing
reasoning_checkpoint: "" tests still GREEN; tsc clean (exit 0).
tdd_checkpoint: "" next_action: CHECKPOINT to orchestrator — root cause confirmed, RED test landed, fix-strategy options surfaced; awaiting user's chosen path via orchestrator routing.
reasoning_checkpoint: |
Why CHECKPOINT here rather than execute: the choice between
`ts-ebml + webm-muxer` vs `custom EBML parser` vs `cluster-aware
trim revisit of D-09..D-11` vs `WebCodecs rewrite` is architecturally
significant (it determines whether Phase 1's deliverable stays in
the debug-session hotfix lane OR escalates to a fresh Plan 01-08,
and whether the project gains two new runtime deps). Per the
feedback memory `feedback-no-unilateral-scope-reduction.md` the
debugger does not narrow this for the user — surface options and
let the user pick.
tdd_checkpoint: |
RED gate honored. Two new failing assertions in
tests/offscreen/webm-playback.test.ts pin the playable-duration
contract that the 2026-05-15 closure check missed. Existing
structural-validity tests remain GREEN. tsc clean. Full vitest run
reports `Test Files 1 failed | 10 passed (11) / Tests 2 failed | 53
passed (55)` — exactly the expected RED-on-new shape, no collateral
regression.
## Constraints ## Constraints
@@ -255,6 +308,197 @@ tdd_checkpoint: ""
- Both files report `duration=9.94 s` via ffprobe -show_entries format=duration - Both files report `duration=9.94 s` via ffprobe -show_entries format=duration
- Decoder errors: zero (segments are individually valid) - Decoder errors: zero (segments are individually valid)
### H4 byte-level EBML probe (2026-05-16T17:10Z) — confirms multi-EBML-concat is the root cause
Probe target: `tests/fixtures/last_30sec.webm` (1_633_459 bytes, committed
fixture from Phase 1 closure 2026-05-15).
**EBML structural scan** (raw byte search for element IDs per
[Matroska spec](https://www.matroska.org/technical/elements.html)):
| EBML element | ID (hex) | Occurrences in file | Byte offsets |
|---|---|---|---|
| EBML header | `1A 45 DF A3` | **3** | `[0, 509038, 970967]` |
| Segment | `18 53 80 67` | **3** | `[36, 509074, 971003]` |
| Cluster | `1F 43 B6 75` | 13 | spread across all 3 segments |
The file is THREE concatenated WebM files, each with its own EBML header
+ Segment element. mkvinfo (without `--all-elements`) reports only the
FIRST segment + its EBML header — two top-level elements visible —
confirming standards-compliant parsers stop at the first segment.
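The structural scan above can be reproduced with a plain byte search. The following is a minimal sketch, not the probe the debugger actually ran: `findIdOffsets` and `countContainerHeaders` are hypothetical helper names, and a raw search like this can in principle report false positives if the same four bytes happen to occur inside a frame payload.

```typescript
// Hypothetical helper (not from the repo): return every byte offset at which
// the given element ID byte sequence occurs in the buffer.
function findIdOffsets(bytes: Uint8Array, id: readonly number[]): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + id.length <= bytes.length; i++) {
    let match = true;
    for (let j = 0; j < id.length; j++) {
      if (bytes[i + j] !== id[j]) {
        match = false;
        break;
      }
    }
    if (match) offsets.push(i);
  }
  return offsets;
}

// Element IDs as listed in the table above.
const EBML_HEADER_ID = [0x1a, 0x45, 0xdf, 0xa3] as const;
const SEGMENT_ID = [0x18, 0x53, 0x80, 0x67] as const;

// A single-EBML WebM should report exactly one offset in each list;
// the D-13 concat output reports three.
function countContainerHeaders(bytes: Uint8Array): { ebml: number[]; segment: number[] } {
  return {
    ebml: findIdOffsets(bytes, EBML_HEADER_ID),
    segment: findIdOffsets(bytes, SEGMENT_ID),
  };
}
```

Run against the fixture bytes, a scan of this shape would be expected to yield the three EBML header offsets and three Segment offsets recorded in the table.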
**Per-segment isolated probes** (sliced via Python at the EBML offsets
above into `/tmp/d13-seg{1,2,3}.webm`):
| Segment | Bytes | format=duration | -count_frames |
|---|---|---|---|
| seg1 | 509_038 | 9.934 s | 301 frames |
| seg2 | 461_929 | 9.963 s | 300 frames |
| seg3 | 662_492 | 9.958 s | 311 frames |
| **TOTAL** | **1_633_459** | (29.86 s of real content) | **912 frames** |
Each segment is individually a valid, complete ~10 s WebM. The
underlying VP9 stream is intact across all three. The bug is purely the
multi-segment topology of the concatenated container.
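The per-segment byte counts follow directly from the EBML header offsets and the total file size. A small sketch of that arithmetic, assuming a hypothetical `segmentByteRanges` helper (the original slicing was done in Python):

```typescript
interface ByteRange {
  start: number; // offset of this segment's EBML header
  end: number;   // offset of the next EBML header, or end of file
  bytes: number; // end - start
}

// Hypothetical helper: each segment runs from its own EBML header offset up to
// the next header offset (or the end of the file for the last segment).
function segmentByteRanges(headerOffsets: readonly number[], fileLength: number): ByteRange[] {
  return headerOffsets.map((start, i) => {
    const end = headerOffsets[i + 1] ?? fileLength;
    return { start, end, bytes: end - start };
  });
}

// Slice the concatenated file into standalone per-segment views (no copying).
function sliceSegments(file: Uint8Array, headerOffsets: readonly number[]): Uint8Array[] {
  return segmentByteRanges(headerOffsets, file.length).map((r) => file.subarray(r.start, r.end));
}

// Offsets and total size taken from the probe tables in this section.
const fixtureRanges = segmentByteRanges([0, 509038, 970967], 1633459);
// fixtureRanges.map(r => r.bytes) is [509038, 461929, 662492]
```

The three computed sizes match the `Bytes` column of the per-segment table and sum to the fixture's 1_633_459 bytes.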
**Concatenated file probe** (the actual fixture):
| Probe command | Reported value |
|---|---|
| `ffprobe -show_entries format=duration` | **9.934024 s** (first segment's Info.Duration metadata only) |
| `ffprobe -count_frames` | **601 frames** (= 301 + 300 = segs 1+2 only) |
| `ffmpeg -f null -` decoder | **frame=601 time=00:00:09.96** + 299 non-monotonic-DTS warnings |
| Packets read from byte range `pos<509038` (seg1) | 301 |
| Packets read from byte range `509038 ≤ pos < 970967` (seg2) | 300 |
| Packets read from byte range `pos ≥ 970967` (seg3) | **0** |
| mkvinfo top-level elements visible | 2 (EBML head + Segment) — seg2 + seg3 invisible |
Two observations both fatal to D-13:
1. ffmpeg's matroska demuxer is the most-permissive parser in common
use and even IT silently drops segment 3 (zero packets from
`pos ≥ 970967`).
2. Even when ffmpeg DOES read segments 1+2 it does not offset seg2's
local timestamps onto the global timeline. The 299 non-monotonic-DTS
warnings are seg2's local timestamps (`ts < 9934 ms`) colliding with
seg1's end timestamp (`9934 ms`). Output stops at `time=00:00:09.96`
because the muxer cannot grow the timeline past the maximum monotonic
DTS it has accepted.
Conclusion: H4 confirmed at the byte level. The file is structurally
valid as a "concatenated archive of three WebMs" but is NOT a single
30-second playable WebM. To produce a 30-second playable WebM the
segments must be REMUXED (parse VP9 frames + keyframe flags + cluster
timestamps from each segment, then re-emit them inside a single
EBML-headered container with monotonically-adjusted timestamps).
### Library survey (2026-05-16T17:15Z) — candidate JS WebM remux libraries
All sizes are the bundled dist (no source-map, no tests, no docs).
Gzipped values measured locally via `gzip -c`. SW-compat verdict is
based on grep of dist for `window`/`document`/`navigator`/`XMLHttpRequest`
followed by manual inspection of any hits.
| Lib | Version | License | Last release | Dist size | Gzipped | SW-compat | API shape | Verdict |
|---|---|---|---|---|---|---|---|---|
| `webm-muxer` | 5.1.4 | MIT | 2025-07-02 | 69 KB | **~12 KB** | YES (zero DOM refs) | `addVideoChunkRaw(data, type:'key'\|'delta', ts, meta?)` accepts encoded VP9 frames | PRIMARY — write half |
| `ts-ebml` | 3.0.2 | MIT | 2025-09-28 | 356 KB | ~87 KB | YES (`typeof window` with `self` fallback in UMD wrapper) | `Decoder.decode(ArrayBuffer) → EBMLElementDetail[]` ; `Encoder.encode(elms) → ArrayBuffer` | PRIMARY — parse half |
| `ebml` | 3.0.0 | MIT | **2018-09-06** | 7.7 MB unpacked | n/a | uncertain | older streaming parser API | DEAD UPSTREAM — avoid |
| `mp4-muxer` | 5.2.2 | MIT | (active) | 70 KB | ~13 KB | YES | analogous to webm-muxer but MP4 | n/a — wrong container |
| Custom EBML parser | n/a | n/a | n/a | 0 KB | 0 KB | YES | hand-rolled per Matroska spec | ~500-1000 LOC, full ownership |
Important note on the `webm-muxer` API: `addVideoChunkRaw()` takes
already-encoded VP9 frame bytes + a keyframe flag + a timestamp. We
do NOT need to decode/re-encode the VP9 stream — the existing
segments already contain valid VP9 frame payloads inside their
Cluster/SimpleBlock elements. The remux path is:
1. For each segment blob, parse via `ts-ebml.Decoder` → walk the EBML
tree → for each Cluster's SimpleBlock children, extract the VP9
frame bytes + keyframe flag (the most significant bit, mask `0x80`,
of the SimpleBlock flags byte = "keyframe" per
[spec](https://www.matroska.org/technical/elements.html#SimpleBlock))
+ cluster Timestamp + local block offset.
2. Compute monotonic adjusted timestamp: `globalMs = segmentBaseMs +
clusterTsMs + blockOffsetMs` where `segmentBaseMs` accumulates the
total content duration of all prior segments.
3. Stream all adjusted frames into a single `webm-muxer.Muxer` with
`addVideoChunkRaw(frameData, isKey ? 'key' : 'delta', globalUs)`,
where `globalUs = globalMs * 1000` (the muxer takes microseconds).
4. `muxer.finalize()` → `ArrayBufferTarget.buffer` → single-EBML
WebM Blob.
Combined dep weight: ~100 KB gzipped (`webm-muxer` ~12 KB + `ts-ebml`
~87 KB). Combined source edit estimate at `mergeVideoSegments()`:
~150-300 LOC including type defs.
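The parse and rebase steps of the remux path can be sketched without either library, to make the data flow concrete. This is an illustrative sketch under stated assumptions, not the Plan 01-08 implementation: `parseSimpleBlockHeader`, `rebaseSegments`, and the `ParsedFrame` / `ParsedSegment` shapes are hypothetical, timestamps are assumed to be in milliseconds (Matroska's default TimestampScale), and the real code would obtain SimpleBlock payloads from `ts-ebml` and hand the rebased frames to `webm-muxer`.

```typescript
interface BlockHeader {
  trackNumber: number;
  relativeTs: number;    // signed offset from the enclosing Cluster's Timestamp
  isKey: boolean;        // flags byte, mask 0x80
  payloadOffset: number; // index of the first frame byte
}

// Parse the fixed header of a Matroska SimpleBlock payload:
// track number (EBML variable-length int), int16 big-endian timestamp, flags byte.
function parseSimpleBlockHeader(block: Uint8Array): BlockHeader {
  if (block.length < 4) throw new Error("SimpleBlock too short");
  const first = block[0]!;
  let width = 1;
  let mask = 0x80;
  // The count of leading zero bits in the first byte gives the vint width.
  while (width <= 8 && (first & mask) === 0) {
    mask >>= 1;
    width++;
  }
  if (width > 8) throw new Error("invalid vint length marker");
  if (block.length < width + 3) throw new Error("SimpleBlock truncated");
  let trackNumber = first & (mask - 1);
  for (let i = 1; i < width; i++) trackNumber = trackNumber * 256 + block[i]!;
  const view = new DataView(block.buffer, block.byteOffset, block.byteLength);
  const relativeTs = view.getInt16(width, false);
  const flags = block[width + 2]!;
  return { trackNumber, relativeTs, isKey: (flags & 0x80) !== 0, payloadOffset: width + 3 };
}

interface ParsedFrame { clusterTsMs: number; blockOffsetMs: number; isKey: boolean; data: Uint8Array }
interface ParsedSegment { durationMs: number; frames: ParsedFrame[] }
interface RebasedFrame { timestampUs: number; type: "key" | "delta"; data: Uint8Array }

// Shift every segment's local timestamps onto one global timeline.
function rebaseSegments(segments: readonly ParsedSegment[]): RebasedFrame[] {
  const out: RebasedFrame[] = [];
  let segmentBaseMs = 0;
  for (const seg of segments) {
    for (const f of seg.frames) {
      const globalMs = segmentBaseMs + f.clusterTsMs + f.blockOffsetMs;
      out.push({ timestampUs: Math.round(globalMs * 1000), type: f.isKey ? "key" : "delta", data: f.data });
    }
    segmentBaseMs += seg.durationMs; // the next segment starts where this one ended
  }
  return out;
}
```

Keeping the rebase step as a pure function over already-parsed frames means it can be unit-tested without a real WebM fixture; whether the actual remux is structured this way is a design choice left to Plan 01-08.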
### Cluster-aware-trim alternative path (D-09..D-11 revisit, 2026-05-16T17:18Z)
Path summary: keep MediaRecorder running continuously (the retired
D-09 lifecycle) but, on each periodic trim pass, scan the chunk buffer
for the OLDEST keyframe whose position would keep total duration ≤ 30
s, then drop everything strictly before that keyframe. Preserves header
chunk + a contiguous run of keyframe-anchored clusters.
Why this is architecturally weaker than remux:
1. **Non-deterministic content window.** MediaRecorder/VP9 keyframe
cadence under Chrome's default `kf_max_dist=100` is irregular —
the prior `webm-playback-freeze` debug session observed a 26-second
keyframe gap empirically. If the latest keyframe was emitted 2 s
ago, cluster-aware trim retains only 2 s of content. The user's
`last_30sec.webm` would be anywhere in `[~few seconds .. ~30 s]`
depending on when SAVE landed in the keyframe cycle. That breaks
SPEC §10 #7's implicit "≥ 30 s of recent context" requirement.
2. **Still need EBML parsing.** To find keyframe boundaries inside
the chunk buffer we still need to parse the WebM container for
SimpleBlock keyframe flags. So the dep weight is similar (`ts-ebml`
at minimum) but the output is worse.
3. **Re-introduces the freeze-risk surface area.** The prior debug
session retired D-09..D-11 precisely because age-trim repeatedly
produced orphan-P-frame freezes. A "keyframe-aware" variant still
has to delete content; one bug in the keyframe-detection path and
the freeze returns. The risk surface is wider than the remux path,
which never deletes — it only re-containers what already exists.
LOC estimate: ~200-400 LOC for keyframe parsing + buffer mutation +
tests. Net: similar dep weight, worse playable-duration guarantee,
re-opens the freeze regression surface. **REJECTED as inferior to
remux.** Documenting here only because the orchestrator brief
explicitly requested the comparison.
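The non-determinism in point 1 can be illustrated with a toy model of the trim rule. This is a sketch only: `retainedDurationMs` is a hypothetical function, and it models just the "oldest keyframe inside the window" rule described in the path summary, not MediaRecorder's real chunk buffer.

```typescript
// Toy model of cluster-aware trim: playback must start at a keyframe, so the
// retained content begins at the OLDEST keyframe that is still within the
// window at save time. Everything before that keyframe is dropped.
function retainedDurationMs(
  keyframeTimesMs: readonly number[],
  saveTimeMs: number,
  windowMs: number = 30_000,
): number {
  const eligible = keyframeTimesMs.filter(
    (t) => t <= saveTimeMs && saveTimeMs - t <= windowMs,
  );
  if (eligible.length === 0) return 0; // no usable keyframe in the window
  return saveTimeMs - Math.min(...eligible);
}

// Keyframes 26 s apart, as in the gap observed in the prior debug session.
const keyframes = [0, 26_000];
const savedEarly = retainedDurationMs(keyframes, 28_000); // 28_000 ms retained
const savedLate = retainedDurationMs(keyframes, 32_000);  // 6_000 ms retained
```

In this model, moving the save 4 s later shrinks the retained window from 28 s to 6 s, because the keyframe at t=0 falls outside the 30 s window and the next usable anchor is the keyframe at t=26 s.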
### WebCodecs API path (2026-05-16T17:19Z)
WebCodecs (`VideoEncoder` + `VideoDecoder`) is available in MV3 service
workers from Chrome 94+. The path would be: feed each segment's
clusters → `VideoDecoder` → emit `VideoFrame` objects → feed back into
`VideoEncoder` (re-encode VP9) → wrap output via `webm-muxer`.
This works but adds a re-encode pass that:
- doubles CPU cost during the save flow
- introduces an additional quality loss (re-encoding lossy VP9)
- adds 500-1000 LOC of encoder/decoder lifecycle management
- requires Chrome 94+ exclusively (we already require modern Chrome,
so OK, but it tightens the version floor)
There is no benefit over the `ts-ebml + webm-muxer` path for this
specific shape of problem — we already have encoded VP9 frames and
just need to put them in a different container. Re-encoding is
unnecessary work. **REJECTED as over-engineered.**
### RED test landing evidence (2026-05-16T17:20Z)
File edited: `tests/offscreen/webm-playback.test.ts` (preserved
existing 2 GREEN tests; appended new `describe` block with 2 new
assertions + supporting helpers).
Test run scoped to file:
```
$ npx vitest run tests/offscreen/webm-playback.test.ts
Test Files 1 failed (1)
Tests 2 failed | 2 passed (4)
```
Failures:
- `container-level format=duration on last_30sec.webm exceeds 25 s`
— `expected 9934 to be greater than or equal to 25000`
- `ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline`
— `expected 9960 to be greater than or equal to 25000`
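The two reported values (9934 and 9960) come from converting tool output to milliseconds. A minimal sketch of that conversion, assuming hypothetical helper names `parseFfprobeDurationMs` and `parseLastFfmpegTimeMs`; the helpers actually added to the test file are not shown in this diff and may differ:

```typescript
// Convert ffprobe's bare `format=duration` output (seconds, e.g. "9.934024")
// to whole milliseconds. Returns null if the output is not a finite number.
function parseFfprobeDurationMs(stdout: string): number | null {
  const seconds = Number.parseFloat(stdout.trim());
  return Number.isFinite(seconds) ? Math.round(seconds * 1000) : null;
}

// Find the LAST `time=HH:MM:SS.xx` progress marker in ffmpeg's stats output
// and convert it to milliseconds. Integer arithmetic avoids float rounding.
function parseLastFfmpegTimeMs(stderr: string): number | null {
  const re = /time=(\d+):(\d{2}):(\d{2})(?:\.(\d+))?/g;
  let last: RegExpExecArray | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(stderr)) !== null) last = m;
  if (last === null) return null;
  const [, hh, mm, ss, frac = ""] = last;
  const wholeMs = ((Number(hh) * 60 + Number(mm)) * 60 + Number(ss)) * 1000;
  // Pad or truncate the fractional part to exactly three digits (ms).
  const fracMs = Number((frac + "000").slice(0, 3));
  return wholeMs + fracMs;
}

const MIN_PLAYABLE_DURATION_MS = 25_000;
const containerMs = parseFfprobeDurationMs("9.934024\n");                // 9934
const decodedMs = parseLastFfmpegTimeMs("frame=  601 time=00:00:09.96"); // 9960
```

Both values fall well short of `MIN_PLAYABLE_DURATION_MS`, which is why the two new assertions are RED.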
Full suite (proves zero collateral regression):
```
$ npx vitest run
Test Files 1 failed | 10 passed (11)
Tests 2 failed | 53 passed (55)
```
All 53 pre-existing tests still GREEN. tsc:
```
$ npx tsc --noEmit; echo exit=$?
exit=0
```
## Eliminated ## Eliminated
(populated by debugger as hypotheses are ruled out) (populated by debugger as hypotheses are ruled out)
@@ -266,6 +510,13 @@ tdd_checkpoint: ""
- (H5: defective committed fixture in storage): ruled out — file size - (H5: defective committed fixture in storage): ruled out — file size
matches expected (1.63 MB matches what was committed on 2026-05-15; matches expected (1.63 MB matches what was committed on 2026-05-15;
not bit-rot) not bit-rot)
- H6 (cluster-aware-trim revisit of D-09..D-11): rejected on architectural
weakness — non-deterministic content window (depends on keyframe
cadence), still needs EBML parsing, re-opens freeze-regression
surface area. See Evidence/cluster-aware-trim section.
- H7 (WebCodecs re-encode path): rejected as over-engineered — re-encodes
VP9 frames we already have. ~500-1000 LOC for zero quality/playability
benefit. See Evidence/WebCodecs section.
## Resolution ## Resolution


@@ -33,6 +33,27 @@
// Skip discipline: if ffmpeg is missing from the environment the test // Skip discipline: if ffmpeg is missing from the environment the test
// auto-skips rather than failing. CI ships ffmpeg per `smoke.sh` so this is // auto-skips rather than failing. CI ships ffmpeg per `smoke.sh` so this is
// a developer-convenience fence, not a behavioural softening. // a developer-convenience fence, not a behavioural softening.
//
// --- 2026-05-16 amendment: D-13 architecture failure RED tests ---
//
// Debug session `.planning/debug/d13-multi-ebml-concat-unplayable.md` proved
// the existing two assertions ABOVE pass under D-13 only because they check
// structural validity (ffmpeg null-decode tolerates the multi-EBML-header
// concat by silently reading segments 1+2 and dropping segment 3, and by
// collapsing all segments onto seg1's local timestamp axis so no muxer
// "File ended prematurely" warning fires). Players that respect Matroska's
// segment-info Duration element (mpv, Chrome's HTMLMediaElement, ffprobe's
// `format=duration`) read 9.94 s — the FIRST segment's metadata duration —
// and stop. The committed 1.6 MB fixture contains ~30 s of valid VP9 frames
// but presents as ~10 s of content to operators and tests.
//
// The "container-level playable duration" describe block below adds the
// assertion the closure check missed on 2026-05-15: that ffprobe-reported
// format duration is AT LEAST 25_000 ms for the canonical fixture. This is
// RED today under D-13 and stays RED until the multi-EBML concat at
// src/background/index.ts mergeVideoSegments() is replaced with a true
// remux that writes a single EBML header whose Info.Duration covers the
// whole ~30 s span.
import { describe, it, expect } from 'vitest';
import { existsSync, statSync } from 'node:fs';
@@ -43,12 +64,21 @@ import { dirname, resolve } from 'node:path';
const here = dirname(fileURLToPath(import.meta.url));
const FIXTURE_PATH = resolve(here, '..', 'fixtures', 'last_30sec.webm');
const FFMPEG_BIN = '/usr/bin/ffmpeg';
const FFPROBE_BIN = '/usr/bin/ffprobe';
// Cap: a clean 30-second WebM decoded with `-f null` finishes well under
// 10 s on commodity hardware. If we ever exceed this we want a hard failure,
// not a hung CI job.
const FFMPEG_TIMEOUT_MS = 30_000;
// Playable-duration floor. The recorder rotates every 10 s and keeps 3
// segments (D-13 / SEGMENT_DURATION_MS × MAX_SEGMENTS = 30_000 ms). The
// rotation lifecycle can drop a partial sub-second at each boundary so the
// final remux file is bounded by [~27_000, ~30_000] ms in steady state. We
// gate at 25_000 ms to keep slack for boundary noise but still firmly above
// the broken-architecture failure mode (9_940 ms — first segment only).
const MIN_PLAYABLE_DURATION_MS = 25_000;
function ffmpegAvailable(): boolean {
try {
return existsSync(FFMPEG_BIN) && statSync(FFMPEG_BIN).isFile();
@@ -57,6 +87,14 @@ function ffmpegAvailable(): boolean {
}
}
function ffprobeAvailable(): boolean {
try {
return existsSync(FFPROBE_BIN) && statSync(FFPROBE_BIN).isFile();
} catch {
return false;
}
}
interface DecodeResult {
stderr: string;
packetErrorCount: number;
@@ -113,6 +151,44 @@ function decodeDryRunStrict(fixturePath: string): DecodeResult {
};
}
/**
* Read the container-level `format=duration` value from a WebM file via
* ffprobe. This is the value that mpv, Chrome's HTMLMediaElement, and most
* Matroska parsers honor when deciding "how long is this file?" — they pick
* up the first Segment's Info.Duration EBML element and stop seeking past
* the EBML header's reported length.
*
* Returns NaN on parse failure (ffprobe missing input track, malformed
* float, etc.) so the assertion downstream can produce a precise error
* message rather than masking a probe-side failure as a duration check.
*
* @param fixturePath - Absolute path to the WebM file under test.
* @returns Container-level duration in milliseconds, or NaN if ffprobe's
* output cannot be parsed as a finite number.
*/
function probeContainerDurationMs(fixturePath: string): number {
const proc = spawnSync(
FFPROBE_BIN,
[
'-v', 'error',
'-show_entries', 'format=duration',
'-of', 'default=noprint_wrappers=1:nokey=1',
'-i', fixturePath,
],
{
stdio: ['ignore', 'pipe', 'pipe'],
encoding: 'utf-8',
timeout: FFMPEG_TIMEOUT_MS,
maxBuffer: 1 * 1024 * 1024,
},
);
if (proc.signal !== null) {
throw new Error(`ffprobe was killed by signal ${proc.signal}`);
}
const stdout = (proc.stdout ?? '').trim();
const seconds = parseFloat(stdout);
return Number.isFinite(seconds) ? Math.round(seconds * 1000) : Number.NaN;
}
describe('webm playback (RED — confirms webm-playback-freeze bug)', () => {
it.skipIf(!ffmpegAvailable())(
'ffmpeg dry-run on last_30sec.webm produces zero decoder packet errors',
@@ -151,3 +227,90 @@ describe('webm playback (RED — confirms webm-playback-freeze bug)', () => {
},
);
});
describe('webm playable duration (RED — confirms d13-multi-ebml-concat-unplayable bug)', () => {
it.skipIf(!ffprobeAvailable())(
'container-level format=duration on last_30sec.webm exceeds 25 s',
() => {
// SPEC §10 #7 requires last_30sec.webm to "play back in a browser"
// covering the most recent ~30 s. Both mpv and Chrome's HTMLMediaElement
// honor the first Segment's Info.Duration EBML element — which under
// D-13's multi-EBML concat is hardcoded to the FIRST segment's local
// duration (~9.94 s for the canonical fixture). That bug means the
// canonical Phase 1 closure fixture (committed 2026-05-15) presents
// as ~10 s of content to any standards-compliant Matroska parser,
// even though segments 2+3 are physically present in the bytes.
//
// The fix is a true WebM REMUX of the concatenated segments: parse
// each segment's clusters via an EBML library, extract the VP9
// frame payloads with their keyframe/delta flags, and re-mux into
// a single-EBML-header WebM whose clusters carry monotonically
// increasing timestamps. The resulting file's Info.Duration will
// span the full ~30 s window.
//
// Floor of MIN_PLAYABLE_DURATION_MS (25_000) accommodates the
// ~3 s boundary slack from segment rotation while remaining well
// above the broken-architecture failure mode (9_940 ms).
expect(existsSync(FIXTURE_PATH)).toBe(true);
const durationMs = probeContainerDurationMs(FIXTURE_PATH);
expect(
durationMs,
`ffprobe reported container duration=${durationMs} ms for ${FIXTURE_PATH}. ` +
`Under SPEC §10 #7 the file must present at least ${MIN_PLAYABLE_DURATION_MS} ms ` +
`of playable content to standards-compliant Matroska parsers (mpv, Chrome). ` +
`If this value is ~9_940 ms the file is a multi-EBML-header concat (D-13 raw output) ` +
`where players honor only the first segment's local Info.Duration metadata. ` +
`Fix: replace mergeVideoSegments() in src/background/index.ts with a true WebM remux ` +
`(parse + rewrite into a single-EBML-headered WebM with adjusted monotonic timestamps).`,
).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS);
},
);
it.skipIf(!ffmpegAvailable())(
'ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline',
() => {
// Defense-in-depth: even if a future ffprobe quirk computes
// format=duration by summing all reachable cluster timestamps,
// ffmpeg's full null-decode of the concatenated file collapses
// segments 2..N onto the first segment's local timestamp axis
// (verified empirically 2026-05-16: 601 frames decoded, time=09.96)
// because the multi-EBML format provides no segment-level offset.
// The remux fix will produce a stream whose decoded `time=...`
// reaches at least 25 s end-to-end.
expect(existsSync(FIXTURE_PATH)).toBe(true);
const proc = spawnSync(
FFMPEG_BIN,
['-nostdin', '-v', 'error', '-stats', '-i', FIXTURE_PATH, '-f', 'null', '-'],
{
stdio: ['ignore', 'ignore', 'pipe'],
encoding: 'utf-8',
timeout: FFMPEG_TIMEOUT_MS,
maxBuffer: 4 * 1024 * 1024,
},
);
if (proc.signal !== null) {
throw new Error(`ffmpeg was killed by signal ${proc.signal}`);
}
const stderr = proc.stderr ?? '';
// ffmpeg's `-stats` line on the final frame looks like:
// frame= 601 fps=0.0 q=-0.0 Lsize=N/A time=00:00:09.96 bitrate=N/A ...
// We want the LAST time= match (subsequent stats lines overwrite the
// earlier ones with monotonically increasing time values).
const timeMatches = [...stderr.matchAll(/time=(\d{2}):(\d{2}):(\d{2})\.(\d{2})/g)];
const last = timeMatches[timeMatches.length - 1];
const decodedMs = last
? (parseInt(last[1], 10) * 3600 + parseInt(last[2], 10) * 60 + parseInt(last[3], 10)) * 1000 +
parseInt(last[4], 10) * 10
: Number.NaN;
expect(
decodedMs,
`ffmpeg decoded only ${decodedMs} ms of timeline from ${FIXTURE_PATH}. ` +
`SPEC §10 #7 requires at least ${MIN_PLAYABLE_DURATION_MS} ms of decoded content. ` +
`If decoded duration is ~9_960 ms the multi-EBML concat is collapsing all segments ` +
`onto seg1's local timestamp axis (the timestamp-collision symptom). ` +
`Fix: real WebM remux per d13-multi-ebml-concat-unplayable debug session. ` +
`Full ffmpeg stderr:\n${stderr}`,
).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS);
},
);
});
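The `time=` extraction used inline by the second test can also be expressed as a small pure function, which makes the parsing rule testable without spawning ffmpeg. The following is a minimal sketch under stated assumptions: `parseLastFfmpegTimeMs` is a hypothetical name, not a function in this repository, and unlike the inline version it accepts any number of fractional digits rather than assuming exactly two.

```typescript
// Hypothetical helper (not in the repository): returns the timeline position,
// in milliseconds, of the LAST `time=HH:MM:SS.ff` stamp found in ffmpeg's
// `-stats` output, or NaN when no stamp is present (e.g. `time=N/A`).
function parseLastFfmpegTimeMs(stderr: string): number {
  const matches = [...stderr.matchAll(/time=(\d{2,}):(\d{2}):(\d{2})\.(\d+)/g)];
  const last = matches[matches.length - 1];
  if (!last) {
    return Number.NaN;
  }
  const [, hh, mm, ss, frac] = last;
  const wholeSeconds = Number(hh) * 3600 + Number(mm) * 60 + Number(ss);
  // Treat the fractional digits as a decimal fraction of one second.
  const fractionMs = Math.round(Number(`0.${frac}`) * 1000);
  return wholeSeconds * 1000 + fractionMs;
}
```

Fed the stats line quoted in the test comment (`time=00:00:09.96`), this yields 9960, matching the `expected 9960 to be greater than or equal to 25000` failure recorded in the debug session.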