Milestone v1 (v2.0.0): Mokosh — Session Capture #1
@@ -16,7 +16,7 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md.
|
||||
|
||||
### Video
|
||||
|
||||
- [x] **REQ-video-ring-buffer**: The extension maintains an in-memory ring
|
||||
- [ ] **REQ-video-ring-buffer**: The extension maintains an in-memory ring
|
||||
buffer containing the most recent 30 seconds of captured video. AMENDED in
|
||||
Phase 01: video is acquired via `navigator.mediaDevices.getDisplayMedia()`
|
||||
invoked from the offscreen document (with `chrome.offscreen.Reason.DISPLAY_MEDIA`),
|
||||
@@ -35,9 +35,19 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md.
|
||||
CON-video-window, CON-video-codec, CON-display-capture-binding (replaces
|
||||
RETIRED CON-tab-capture-binding). CON-webm-header-retention RETIRED in
|
||||
favor of D-13 per-segment header isolation.
|
||||
- SPEC §10 acceptance criteria: #2, #3, #7 — all green 2026-05-15
|
||||
(D-12 ffprobe gate + operator-confirmed Chrome playback + ffmpeg dry-run
|
||||
exit 0 with zero decoder errors against `tests/fixtures/last_30sec.webm`).
|
||||
- SPEC §10 acceptance criteria: #2, #3 — green 2026-05-15 (D-12 ffprobe
|
||||
gate). #7 (last_30sec.webm plays back in a browser) — **REOPENED
|
||||
2026-05-16**: D-13's concat-of-self-contained-segments architecture
|
||||
produces a multi-EBML-header file that standards-compliant Matroska
|
||||
parsers (mpv, ffmpeg, Chrome's HTMLMediaElement) play only as the
|
||||
first segment (~9.94 s) and silently drop segments 2 and 3. The
|
||||
2026-05-15 "operator-confirmed clean Chrome playback" assessment was
|
||||
insufficient — it checked playback ran without freezing but did not
|
||||
measure total duration. Plan 01-08 (WebM remux via ts-ebml +
|
||||
webm-muxer) will replace `mergeVideoSegments`'s file-concat with a
|
||||
real single-EBML-headered remux, restoring SPEC §10 #7. See
|
||||
`.planning/debug/d13-multi-ebml-concat-unplayable.md` for the
|
||||
byte-level root-cause evidence.
|
||||
|
||||
### DOM Capture
|
||||
|
||||
@@ -193,7 +203,7 @@ Which phase covers which requirement. See ROADMAP.md for phase details.
|
||||
|
||||
| Requirement | Phase | Status |
|-------------|-------|--------|
| REQ-video-ring-buffer | Phase 1 | Complete |
| REQ-video-ring-buffer | Phase 1 | In progress (reopened 2026-05-16: SPEC §10 #7 fails; Plan 01-08 WebM remux pending) |
| REQ-rrweb-dom-buffer | Phase 2 | Pending |
| REQ-user-event-log | Phase 2 | Pending |
| REQ-password-confidentiality | Phase 2 | Pending |
|
||||
|
||||
@@ -22,7 +22,7 @@ working export → green §10 smoke → harden + clean up**.
|
||||
|
||||
Decimal phases appear between their surrounding integers in numeric order.
|
||||
|
||||
- [x] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15** — D-12 ffprobe gate + A3 empirical-playback gate both green against `tests/fixtures/last_30sec.webm` (1.6 MB VP9 1142×1038); D-13 restart-segments retired D-09..D-11 mid-phase; 30/30 vitest green, tsc clean. SPEC §10 #2, #3, #7 functionally satisfied (end-to-end Phase 4 smoke remains owner of §10).
|
||||
- [ ] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15 then REOPENED 2026-05-16**: the 2026-05-15 closure was based on insufficient operator playback verification; D-13's concat-of-self-contained-segments architecture produces a multi-EBML WebM that plays only ~9 s instead of ~30 s in standards-compliant parsers (mpv, ffmpeg, Chrome HTMLMediaElement). UAT Test 3 retest on 2026-05-16 confirmed via byte-level EBML probe. SPEC §10 #7 not actually satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) replaces `mergeVideoSegments`'s file-concat with a real single-EBML remux. See `.planning/debug/d13-multi-ebml-concat-unplayable.md`. Option C port-lifecycle refactor (debug session `empty-archive-port-race`) DID land cleanly and is retained. Phase 1 will additionally absorb whole-desktop + auto-start UX work (Plans 01-09/01-10) per the 2026-05-16 amended charter.
|
||||
- [ ] **Phase 2: Stabilize DOM + event capture privacy** — Migrate rrweb to v2 `maskInputFn`, plug `content/index.ts setupInputLogging` password leak
|
||||
- [ ] **Phase 3: Stabilize export pipeline** — Restore user-activation gesture in popup, delete dead `permissions.request`, replace base64 `data:` URL with Blob URL minted in offscreen
|
||||
- [ ] **Phase 4: SPEC §10 smoke verification** — End-to-end install-and-record-and-export pass against all 9 acceptance criteria
|
||||
|
||||
@@ -2,16 +2,16 @@
|
||||
gsd_state_version: 1.0
|
||||
milestone: v2.0.0
|
||||
milestone_name: milestone
|
||||
status: phase_complete
|
||||
stopped_at: "Phase 1 closure 2026-05-15: D-12 ffprobe gate + A3 empirical-playback gate both green against tests/fixtures/last_30sec.webm (1.6 MB VP9 1142×1038, 3-segment multi-EBML-header concat). D-13 restart-segments retired D-09..D-11 mid-phase. 30/30 vitest green incl. empirical webm-playback dry-runs; tsc clean; ffmpeg -v warning -i fixture -f null - exit 0 with zero decoder errors (only expected muxer DTS-monotonicity warnings at segment join boundaries); operator confirmed clean Chrome playback end-to-end. REQ-video-ring-buffer marked Complete. Ready to plan Phase 2 (DOM + event-capture privacy)."
|
||||
last_updated: "2026-05-15T21:42:00.000Z"
|
||||
last_activity: "2026-05-15 — Phase 1 closure: D-12 + A3 gates green; REQ-video-ring-buffer complete; ready for Phase 2"
|
||||
status: phase_reopened
|
||||
stopped_at: "Phase 1 REOPENED 2026-05-16: D-13 multi-EBML-concat architecture confirmed broken via UAT Test 3 retest + byte-level EBML probe — produced WebM plays only ~9 s instead of ~30 s in mpv AND Chrome (the 2026-05-15 closure's operator playback check was insufficient). Phase 1's primary deliverable (REQ-video-ring-buffer, SPEC §10 #7) is NOT satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) will replace mergeVideoSegments file-concat with real single-EBML remux. RED test landed in tests/offscreen/webm-playback.test.ts (2 failures, 53 baseline GREEN). Option C port-lifecycle refactor (debug session empty-archive-port-race) DID land cleanly (commits 674c415..f0871c0). Phase 1 also absorbs whole-desktop + auto-start UX (Plans 01-09/01-10) per 2026-05-16 amended charter."
|
||||
last_updated: "2026-05-16T17:35:00Z"
|
||||
last_activity: "2026-05-16 — Phase 1 reopened: D-13 multi-EBML architecture confirmed broken by mpv/Chrome playback test; Plan 01-08 (ts-ebml + webm-muxer remux) pending; markers reverted"
|
||||
progress:
|
||||
total_phases: 5
|
||||
completed_phases: 1
|
||||
total_plans: 7
|
||||
completed_phases: 0
|
||||
total_plans: 10
|
||||
completed_plans: 7
|
||||
percent: 100
|
||||
percent: 70
|
||||
---
|
||||
|
||||
# Project State
|
||||
|
||||
@@ -8,8 +8,8 @@ trigger: |
|
||||
1675899 bytes, total segments merged: 3") but the resulting file plays
|
||||
ONLY ~9 s in Chrome AND in mpv. Cross-checking the canonical fixture
|
||||
committed at Phase 1 closure on 2026-05-15 (`tests/fixtures/last_30sec.webm`,
|
||||
1633459 bytes, 3 segments per architecture) reveals it ALSO plays only ~9 s
|
||||
in mpv. Operator confirmed both via mpv playback test.
|
||||
1633459 bytes, 3 segments per architecture) reveals it ALSO plays only
|
||||
~9 s in mpv. Operator confirmed both via mpv playback test.
|
||||
|
||||
This means D-13's "concat of self-contained WebM segments → playable 30 s
|
||||
WebM" architecture is fundamentally broken. The 2026-05-15 Phase 1
|
||||
@@ -24,7 +24,7 @@ trigger: |
|
||||
though it was marked Complete in REQUIREMENTS.md/ROADMAP.md/STATE.md
|
||||
on 2026-05-15.
|
||||
created: 2026-05-16T16:56:41Z
|
||||
updated: 2026-05-16T16:56:41Z
|
||||
updated: 2026-05-16T17:25:00Z
|
||||
phase: 01-stabilize-video-pipeline
|
||||
related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md
|
||||
related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md
|
||||
@@ -121,15 +121,19 @@ the segment boundary noise from concatenation, not playback failure.
|
||||
## Current Focus
|
||||
|
||||
hypothesis: |
|
||||
**H4 confirmed by operator empirical test**: D-13's "concat of self-
|
||||
contained WebM segments → produce playable 30 s WebM" architecture
|
||||
does not work in practice because most Matroska/WebM players do not
|
||||
implement the multi-segment Matroska feature. The Matroska spec
|
||||
permits multiple segments in one file but most decoders read only
|
||||
the first segment's EBML header and stop there. ffmpeg's behavior
|
||||
(which mpv inherits) is to honor the first EBML's duration metadata.
|
||||
Chrome's MSE implementation appears to do the same (per UAT operator
|
||||
observation).
|
||||
**H4 confirmed by byte-level EBML probe (2026-05-16T17:10Z, see
|
||||
Evidence/H4 below)**: D-13's "concat of self-contained WebM
|
||||
segments → produce playable 30 s WebM" architecture does not work
|
||||
because standards-compliant Matroska parsers (mpv, mkvtoolnix,
|
||||
Chrome's HTMLMediaElement, ffprobe's `format=duration` path) honor
|
||||
the FIRST Segment element's Info.Duration EBML (~9_934 ms for the
|
||||
fixture) and stop there. Even ffmpeg's matroska DEMUXER — which is
|
||||
unusually liberal and reads through the second segment's EBML
|
||||
header — collapses segments 2..N onto seg1's local timestamp axis
|
||||
(verified empirically: 601 packets decoded from segs 1+2, ZERO
|
||||
packets from seg3, output `time=00:00:09.96`). Multi-segment
|
||||
Matroska is technically permitted by the spec but in practice
|
||||
consumer-grade players do not implement it.
|
||||
|
||||
**H3 confirmed by operator empirical test**: The 2026-05-15 Phase 1
|
||||
closure's "operator-confirmed clean Chrome playback" check was
|
||||
@@ -145,49 +149,98 @@ hypothesis: |
|
||||
produces a file that any player can read end-to-end as one continuous
|
||||
~30 s stream.
|
||||
|
||||
**Candidate implementations**:
|
||||
- `webm-muxer` npm package (Vanilla. ~10 KB. Browser + Node support.
|
||||
Single-segment output. Active maintenance.)
|
||||
- `ts-ebml` (EBML parser + writer. Allows manual control over
|
||||
structure. ~50 KB.)
|
||||
**Candidate implementations** (researched 2026-05-16, see
|
||||
Evidence/library-survey below for full table):
|
||||
- `webm-muxer` 5.1.4 (Vanilagy, MIT, last release 2025-07-02,
|
||||
gzipped ~12 KB, pure ESM/CJS no DOM globals). `addVideoChunkRaw(data,
|
||||
type:'key'|'delta', timestamp, meta?)` accepts already-encoded VP9
|
||||
frames — exactly the shape produced by a stream of existing WebM
|
||||
segments. SW-compatible. PRIMARY CANDIDATE for the write half.
|
||||
- `ts-ebml` 3.0.2 (legokichi, MIT, last release 2025-09-28, gzipped
|
||||
~87 KB, UMD has a single `typeof window` check with self-fallback
|
||||
so SW-compatible). Decoder+Encoder API. Needed for the parse half
|
||||
(extract VP9 SimpleBlock payloads + cluster timecodes + keyframe
|
||||
flags from each segment).
|
||||
- `ebml` 3.0.0 (node-ebml, MIT, last release **2018-09-06** — dead
|
||||
upstream). Smaller but unmaintained.
|
||||
- `mp4-muxer` 5.2.2 (sibling of webm-muxer; not applicable — we need
|
||||
WebM container output).
|
||||
- Custom EBML parser (full control, ~500-1000 LOC, no dep weight)
|
||||
- **Alternative path: MediaRecorder timeslice with cluster-aware trim**:
|
||||
revisit retired D-09..D-11 architecture but trim ONLY on keyframe
|
||||
boundaries (preserving every cluster from the most recent keyframe
|
||||
onwards). This avoids the A3 orphan-P-frame freeze by guaranteeing
|
||||
every kept cluster's references are present. ~200-400 LOC. The
|
||||
risk: requires understanding EBML/Matroska cluster structure to
|
||||
trim correctly.
|
||||
onwards). See Evidence/cluster-aware-trim below — the DETERMINISTIC
|
||||
floor on retained-content duration is much less than 30 s
|
||||
(worst-case: keyframe just emitted → retain only the post-keyframe
|
||||
sliver) because VP9 kf_max_dist under Chrome's MediaRecorder is
|
||||
irregular (3-5 s typical, 26 s observed in the prior debug
|
||||
session). This path produces a NON-DETERMINISTIC content window;
|
||||
rejected as architecturally weaker than remux.
|
||||
- **Alternative path: WebCodecs API** (VideoEncoder + Muxer.js or
|
||||
similar): full control over container framing. Significant rewrite
|
||||
(~1000-2000 LOC). Most flexible but heaviest.
|
||||
(~1000-2000 LOC). Most flexible but heaviest. WebCodecs is
|
||||
available in MV3 service workers per Chrome 94+ — viable but
|
||||
over-engineered for the current need (we already have VP9 frames,
|
||||
we just need to RE-CONTAIN them).
|
||||
|
||||
The remux approach (webm-muxer or equivalent) is likely the right
|
||||
trade-off: small, well-tested library, preserves D-13's segment
|
||||
lifecycle benefits (no orphan-P-frame freeze, ~10s rotation gap
|
||||
acceptable), but produces a single-EBML output that all players
|
||||
read correctly.
|
||||
The recommendation (TIEBREAKER only — the user makes the call):
|
||||
`ts-ebml` (parse) + `webm-muxer` (write) is the smallest fix that
|
||||
matches the actual problem shape. Combined ~100 KB gzipped, both
|
||||
MIT, both actively maintained, both verified SW-compatible. Net
|
||||
source-edit LOC ~150-300 in `src/background/index.ts`
|
||||
mergeVideoSegments() — we don't decode/re-encode VP9 frames, we
|
||||
just parse them out of segments and re-emit with monotonic
|
||||
timestamps. Preserves D-13's recorder-side lifecycle (which DID
|
||||
fix the orphan-P-frame freeze) and adds a single new SW-side
|
||||
remux pass on the save path.
|
||||
|
||||
test: |
|
||||
RED test: introduce a playable-duration assertion to
|
||||
tests/offscreen/webm-playback.test.ts. Use ffprobe -count_frames
|
||||
-show_streams to count VIDEO FRAMES (not just metadata duration),
|
||||
then divide by reported frame rate to compute actual playable
|
||||
content duration. Assert actual_duration > 25_000 ms for the
|
||||
generated/committed fixture. This test should FAIL against the
|
||||
current D-13 architecture and PASS after the remux fix lands.
|
||||
RED test LANDED at tests/offscreen/webm-playback.test.ts. Two new
|
||||
assertions in the new `describe('webm playable duration (RED —
|
||||
confirms d13-multi-ebml-concat-unplayable bug)')` block:
|
||||
|
||||
Alternative RED test: ffprobe -read_intervals -i FILE
|
||||
'0%+#90000' (seek to last 90s, read all packets). Count packets
|
||||
read. Should be ~600 packets for 30s @ ~20fps, not ~200 for 9s.
|
||||
1. `container-level format=duration on last_30sec.webm exceeds 25 s`
|
||||
— uses ffprobe to read `format=duration`. Asserts
|
||||
`>= MIN_PLAYABLE_DURATION_MS = 25_000`. RED today
|
||||
(actual: 9_934 ms).
|
||||
|
||||
2. `ffmpeg full decode of last_30sec.webm reaches at least 25 s of
|
||||
timeline` — parses the last `time=HH:MM:SS.MS` from `ffmpeg -stats
|
||||
-f null -` output. Asserts `>= 25_000 ms`. RED today
|
||||
(actual: 9_960 ms).
|
||||
|
||||
Both gate behind `it.skipIf(!ffprobeAvailable())` /
|
||||
`it.skipIf(!ffmpegAvailable())` so CI environments without those
|
||||
binaries auto-skip rather than hard-fail (matches the existing
|
||||
webm-playback.test.ts skip discipline). The existing two structural-
|
||||
validity tests in the same file (`...zero decoder packet errors` and
|
||||
`...does not end prematurely`) remain GREEN and untouched.
|
||||
expecting: |
|
||||
RED test fails on current code (both fixture and freshly-recorded
|
||||
output should fail the duration assertion). Debugger then implements
|
||||
the chosen fix path (webm-muxer remux most likely) and re-asserts
|
||||
GREEN.
|
||||
next_action: gather initial evidence from EBML parsing of both fixtures + research candidate JS remux libraries
|
||||
reasoning_checkpoint: ""
|
||||
tdd_checkpoint: ""
|
||||
the chosen fix path (webm-muxer + ts-ebml remux most likely) and
|
||||
re-asserts GREEN. RED confirmed 2026-05-16T17:20Z: 11 test files,
|
||||
53 passed + 2 failed (the two new assertions). All pre-existing
|
||||
tests still GREEN; tsc clean (exit 0).
|
||||
next_action: CHECKPOINT to orchestrator — root cause confirmed, RED test landed, fix-strategy options surfaced; awaiting user's chosen path via orchestrator routing.
|
||||
reasoning_checkpoint: |
|
||||
Why CHECKPOINT here rather than execute: the choice between
|
||||
`ts-ebml + webm-muxer` vs `custom EBML parser` vs `cluster-aware
|
||||
trim revisit of D-09..D-11` vs `WebCodecs rewrite` is architecturally
|
||||
significant (it determines whether Phase 1's deliverable stays in
|
||||
the debug-session hotfix lane OR escalates to a fresh Plan 01-08,
|
||||
and whether the project gains two new runtime deps). Per the
|
||||
feedback memory `feedback-no-unilateral-scope-reduction.md` the
|
||||
debugger does not narrow this for the user — surface options and
|
||||
let the user pick.
|
||||
tdd_checkpoint: |
|
||||
RED gate honored. Two new failing assertions in
|
||||
tests/offscreen/webm-playback.test.ts pin the playable-duration
|
||||
contract that the 2026-05-15 closure check missed. Existing
|
||||
structural-validity tests remain GREEN. tsc clean. Full vitest run
|
||||
reports `Test Files 1 failed | 10 passed (11) / Tests 2 failed | 53
|
||||
passed (55)` — exactly the expected RED-on-new shape, no collateral
|
||||
regression.
|
||||
|
||||
## Constraints
|
||||
|
||||
@@ -255,6 +308,197 @@ tdd_checkpoint: ""
|
||||
- Both files report `duration=9.94 s` via ffprobe -show_entries format=duration
|
||||
- Decoder errors: zero (segments are individually valid)
|
||||
|
||||
### H4 byte-level EBML probe (2026-05-16T17:10Z): confirms multi-EBML-concat is the root cause

Probe target: `tests/fixtures/last_30sec.webm` (1_633_459 bytes, committed
fixture from Phase 1 closure 2026-05-15).

**EBML structural scan** (raw byte search for element IDs per
[Matroska spec](https://www.matroska.org/technical/elements.html)):

| EBML element | ID (hex) | Occurrences in file | Byte offsets |
|---|---|---|---|
| EBML header | `1A 45 DF A3` | **3** | `[0, 509038, 970967]` |
| Segment | `18 53 80 67` | **3** | `[36, 509074, 971003]` |
| Cluster | `1F 43 B6 75` | 13 | spread across all 3 segments |

The file is THREE concatenated WebM files, each with its own EBML header
and Segment element. mkvinfo (without `--all-elements`) reports only the
FIRST segment and its EBML header (two top-level elements visible),
confirming standards-compliant parsers stop at the first segment.
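
The structural scan above can be reproduced with a plain byte search. The following is a minimal sketch, not the probe script actually used in this session: `findAllOffsets` is a hypothetical helper, and the demo buffer is synthetic rather than the real fixture. A raw ID search can in principle also match bytes inside frame payloads, so hits should be cross-checked against a real parser.

```typescript
// Hypothetical helper: return every byte offset at which `pattern` occurs in `buf`.
function findAllOffsets(buf: Uint8Array, pattern: readonly number[]): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + pattern.length <= buf.length; i++) {
    let match = true;
    for (let j = 0; j < pattern.length; j++) {
      if (buf[i + j] !== pattern[j]) {
        match = false;
        break;
      }
    }
    if (match) offsets.push(i);
  }
  return offsets;
}

// Element IDs taken from the table above.
const EBML_HEADER_ID = [0x1a, 0x45, 0xdf, 0xa3];
const SEGMENT_ID = [0x18, 0x53, 0x80, 0x67];

// Synthetic stand-in for a concatenated file: two header+segment pairs.
const demo = new Uint8Array([
  ...EBML_HEADER_ID, 0x00, ...SEGMENT_ID, 0x00,
  ...EBML_HEADER_ID, 0x00, ...SEGMENT_ID,
]);

const headerOffsets = findAllOffsets(demo, EBML_HEADER_ID);
const segmentOffsets = findAllOffsets(demo, SEGMENT_ID);
console.log({ headerOffsets, segmentOffsets });
```

More than one hit for the EBML header ID is the signature of the multi-EBML concat described here.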

**Per-segment isolated probes** (sliced via Python at the EBML offsets
above into `/tmp/d13-seg{1,2,3}.webm`):

| Segment | Bytes | format=duration | -count_frames |
|---|---|---|---|
| seg1 | 509_038 | 9.934 s | 301 frames |
| seg2 | 461_929 | 9.963 s | 300 frames |
| seg3 | 662_492 | 9.958 s | 311 frames |
| **TOTAL** | **1_633_459** | (29.86 s of real content) | **912 frames** |

Each segment is individually a valid, complete ~10 s WebM. The
underlying VP9 stream is intact across all three. The bug is purely the
multi-segment topology of the concatenated container.
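
As a cross-check on the byte counts in the per-segment table, the slicing step can be sketched in TypeScript. The original slicing was done in Python, so this is an illustrative equivalent only. `sliceAtOffsets` is a hypothetical helper, and a zero-filled buffer of the fixture's length stands in for the real file.

```typescript
// Hypothetical helper: cut `buf` into pieces that start at each offset in `starts`.
// The last piece runs to the end of the buffer.
function sliceAtOffsets(buf: Uint8Array, starts: readonly number[]): Uint8Array[] {
  return starts.map((start, i) => buf.subarray(start, starts[i + 1] ?? buf.length));
}

// Values from the probe tables: fixture length and EBML header offsets.
const fileLength = 1_633_459;
const ebmlOffsets = [0, 509_038, 970_967];

// Zero-filled stand-in with the same length as the fixture.
const standIn = new Uint8Array(fileLength);
const sizes = sliceAtOffsets(standIn, ebmlOffsets).map((s) => s.length);
const total = sizes.reduce((a, b) => a + b, 0);
console.log(sizes, total);
```

The three slice sizes come out as 509_038, 461_929 and 662_492 bytes, matching the table, and they sum to the full fixture length.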

**Concatenated file probe** (the actual fixture):

| Probe command | Reported value |
|---|---|
| `ffprobe -show_entries format=duration` | **9.934024 s** (first segment's Info.Duration metadata only) |
| `ffprobe -count_frames` | **601 frames** (= 301 + 300 = segs 1+2 only) |
| `ffmpeg -f null -` decoder | **frame=601 time=00:00:09.96** + 299 non-monotonic-DTS warnings |
| Packets read from byte range `pos < 509038` (seg1) | 301 |
| Packets read from byte range `509038 ≤ pos < 970967` (seg2) | 300 |
| Packets read from byte range `pos ≥ 970967` (seg3) | **0** |
| mkvinfo top-level elements visible | 2 (EBML head + Segment); seg2 + seg3 invisible |

Two observations, both fatal to D-13:

1. ffmpeg's matroska demuxer is the most permissive parser in common
   use, and even IT silently drops segment 3 (zero packets from
   `pos ≥ 970967`).
2. Even when ffmpeg DOES read segments 1+2, it does not offset seg2's
   local timestamps onto the global timeline. The 299 non-monotonic-DTS
   warnings are seg2's local timestamps (`ts < 9934 ms`) colliding with
   seg1's end timestamp (`9934 ms`). The output stops at
   `time=00:00:09.96` because the muxer cannot grow the timeline past
   the maximum monotonic DTS it has accepted.

Conclusion: H4 confirmed at the byte level. The file is structurally
valid as a "concatenated archive of three WebMs" but is NOT a single
30-second playable WebM. To produce a 30-second playable WebM the
segments must be REMUXED (parse VP9 frames + keyframe flags + cluster
timestamps from each segment, then re-emit them inside a single
EBML-headered container with monotonically-adjusted timestamps).

### Library survey (2026-05-16T17:15Z): candidate JS WebM remux libraries

All sizes are the bundled dist (no source-map, no tests, no docs).
Gzipped values measured locally via `gzip -c`. SW-compat verdict is
based on grep of dist for `window`/`document`/`navigator`/`XMLHttpRequest`
followed by manual inspection of any hits.

| Lib | Version | License | Last release | Dist size | Gzipped | SW-compat | API shape | Verdict |
|---|---|---|---|---|---|---|---|---|
| `webm-muxer` | 5.1.4 | MIT | 2025-07-02 | 69 KB | **~12 KB** | YES (zero DOM refs) | `addVideoChunkRaw(data, type:'key'\|'delta', ts, meta?)` accepts encoded VP9 frames | PRIMARY: write half |
| `ts-ebml` | 3.0.2 | MIT | 2025-09-28 | 356 KB | ~87 KB | YES (`typeof window` with `self` fallback in UMD wrapper) | `Decoder.decode(ArrayBuffer) → EBMLElementDetail[]` ; `Encoder.encode(elms) → ArrayBuffer` | PRIMARY: parse half |
| `ebml` | 3.0.0 | MIT | **2018-09-06** | 7.7 MB unpacked | n/a | uncertain | older streaming parser API | DEAD UPSTREAM: avoid |
| `mp4-muxer` | 5.2.2 | MIT | (active) | 70 KB | ~13 KB | YES | analogous to webm-muxer but MP4 | n/a (wrong container) |
| Custom EBML parser | n/a | n/a | n/a | 0 KB | 0 KB | YES | hand-rolled per Matroska spec | ~500-1000 LOC, full ownership |

Important note on the `webm-muxer` API: `addVideoChunkRaw()` takes
already-encoded VP9 frame bytes + a keyframe flag + a timestamp. We
do NOT need to decode/re-encode the VP9 stream: the existing
segments already contain valid VP9 frame payloads inside their
Cluster/SimpleBlock elements. The remux path is:

1. For each segment blob, parse via `ts-ebml.Decoder` → walk the EBML
   tree → for each Cluster's SimpleBlock children, extract the VP9
   frame bytes, the keyframe flag (the most significant bit, mask
   `0x80`, of the SimpleBlock flags byte, per
   [spec](https://www.matroska.org/technical/elements.html#SimpleBlock)),
   the cluster Timestamp, and the local block offset.
2. Compute the monotonic adjusted timestamp: `globalTs = segmentBaseMs +
   clusterTsMs + blockOffsetMs`, where `segmentBaseMs` accumulates the
   total content duration of all prior segments. Convert to
   microseconds for the muxer: `globalUs = globalTs * 1000`.
3. Stream all adjusted frames into a single `webm-muxer.Muxer` with
   `addVideoChunkRaw(frameData, isKey ? 'key' : 'delta', globalUs)`.
4. `muxer.finalize()` → `ArrayBufferTarget.buffer` → single-EBML
   WebM Blob.
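
Step 1's per-block extraction can be illustrated with a small SimpleBlock header reader. This is a sketch under stated assumptions, not the planned Plan 01-08 code. It assumes the caller already has the SimpleBlock payload bytes (for example from an EBML parser), that the track number fits in a 1 to 4 byte variable-length integer, and that the blocks use no lacing. `parseSimpleBlockHeader` and the sample byte arrays are hypothetical.

```typescript
interface SimpleBlockHeader {
  trackNumber: number;
  relativeTimestamp: number; // signed, relative to the enclosing Cluster's Timestamp
  isKeyframe: boolean;
  payloadOffset: number; // index of the first frame byte inside the block
}

// Reads: track number (EBML variable-length int), int16 big-endian relative
// timestamp, then one flags byte whose 0x80 bit marks a keyframe.
function parseSimpleBlockHeader(block: Uint8Array): SimpleBlockHeader {
  if (block.byteLength < 4) throw new Error('SimpleBlock too short');
  const view = new DataView(block.buffer, block.byteOffset, block.byteLength);
  const first = view.getUint8(0);

  // The position of the first set bit gives the width of the track number.
  let width = 1;
  let marker = 0x80;
  while (width <= 4 && (first & marker) === 0) {
    marker >>= 1;
    width++;
  }
  if (width > 4) throw new Error('unsupported track-number width');
  if (block.byteLength < width + 3) throw new Error('SimpleBlock too short');

  let trackNumber = first & (marker - 1);
  for (let i = 1; i < width; i++) {
    trackNumber = trackNumber * 256 + view.getUint8(i);
  }

  const relativeTimestamp = view.getInt16(width, false);
  const flags = view.getUint8(width + 2);
  return {
    trackNumber,
    relativeTimestamp,
    isKeyframe: (flags & 0x80) !== 0,
    payloadOffset: width + 3,
  };
}

// Hand-built sample blocks: track 1, +33 ticks, keyframe; track 1, -2 ticks, delta.
const keyBlock = parseSimpleBlockHeader(new Uint8Array([0x81, 0x00, 0x21, 0x80, 0xaa, 0xbb]));
const deltaBlock = parseSimpleBlockHeader(new Uint8Array([0x81, 0xff, 0xfe, 0x00, 0xcc]));
console.log(keyBlock, deltaBlock);
```

The bytes from `payloadOffset` onward would be the encoded frame handed to the muxer in step 3.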

Combined dep weight: ~100 KB gzipped (`webm-muxer` ~12 KB + `ts-ebml`
~87 KB). Combined source edit estimate at `mergeVideoSegments()`:
~150-300 LOC including type defs.
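
The timestamp rebasing in step 2 of the remux path is the part most likely to go wrong, so here is a minimal sketch of it in isolation. The types `ParsedFrame`, `ParsedSegment`, `RebasedFrame` and the function `rebaseSegments` are hypothetical names, not part of `ts-ebml` or `webm-muxer`. The sketch assumes each segment's content duration is already known in milliseconds and that the segments use the default 1 ms timestamp scale.

```typescript
interface ParsedFrame {
  clusterTsMs: number; // Timestamp of the enclosing Cluster
  blockOffsetMs: number; // SimpleBlock timestamp relative to the Cluster
  isKey: boolean;
}

interface ParsedSegment {
  durationMs: number; // total content duration of this segment
  frames: ParsedFrame[];
}

interface RebasedFrame {
  globalUs: number; // microseconds on the single output timeline
  type: 'key' | 'delta';
}

// Shift every segment's local timestamps by the summed duration of the
// segments before it, then convert milliseconds to microseconds.
function rebaseSegments(segments: readonly ParsedSegment[]): RebasedFrame[] {
  const out: RebasedFrame[] = [];
  let segmentBaseMs = 0;
  for (const segment of segments) {
    for (const frame of segment.frames) {
      const globalTsMs = segmentBaseMs + frame.clusterTsMs + frame.blockOffsetMs;
      out.push({
        globalUs: Math.round(globalTsMs * 1000),
        type: frame.isKey ? 'key' : 'delta',
      });
    }
    segmentBaseMs += segment.durationMs;
  }
  return out;
}

// Two toy segments using the seg1/seg2 durations from the probe table.
const rebased = rebaseSegments([
  { durationMs: 9934, frames: [
    { clusterTsMs: 0, blockOffsetMs: 0, isKey: true },
    { clusterTsMs: 0, blockOffsetMs: 33, isKey: false },
  ] },
  { durationMs: 9963, frames: [
    { clusterTsMs: 0, blockOffsetMs: 0, isKey: true },
    { clusterTsMs: 0, blockOffsetMs: 33, isKey: false },
  ] },
]);
console.log(rebased.map((f) => f.globalUs));
```

In the toy data the second segment's first frame lands at 9_934_000 µs instead of 0, which is the monotonic behaviour the concat approach lacks.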

### Cluster-aware-trim alternative path (D-09..D-11 revisit, 2026-05-16T17:18Z)

Path summary: keep MediaRecorder running continuously (the retired
D-09 lifecycle) but, on each periodic trim pass, scan the chunk buffer
for the OLDEST keyframe whose position would keep total duration ≤ 30
s, then drop everything strictly before that keyframe. Preserves header
chunk + a contiguous run of keyframe-anchored clusters.

Why this is architecturally weaker than remux:

1. **Non-deterministic content window.** MediaRecorder/VP9 keyframe
   cadence under Chrome's default `kf_max_dist=100` is irregular:
   the prior `webm-playback-freeze` debug session observed a 26-second
   keyframe gap empirically. If the latest keyframe was emitted 2 s
   ago and the one before it already falls outside the 30 s window,
   cluster-aware trim retains only 2 s of content. The user's
   `last_30sec.webm` would be anywhere in `[~few seconds .. ~30 s]`
   depending on when SAVE landed in the keyframe cycle. That breaks
   SPEC §10 #7's implicit "≥ 30 s of recent context" requirement.

2. **Still need EBML parsing.** To find keyframe boundaries inside
   the chunk buffer we still need to parse the WebM container for
   SimpleBlock keyframe flags. So the dep weight is similar (`ts-ebml`
   at minimum) but the output is worse.

3. **Re-introduces the freeze-risk surface area.** The prior debug
   session retired D-09..D-11 precisely because age-trim repeatedly
   produced orphan-P-frame freezes. A "keyframe-aware" variant still
   has to delete content; one bug in the keyframe-detection path and
   the freeze returns. The risk surface is wider than the remux path,
   which never deletes: it only re-containers what already exists.
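
The non-determinism in point 1 can be made concrete with a toy model of the trim rule from the path summary. This is an illustrative sketch only: `retainedWindowMs` is a hypothetical function, the keyframe times are invented (spaced 26 s apart to echo the observed gap), and the list is assumed to be sorted ascending.

```typescript
// Toy model: the trim keeps everything from the OLDEST keyframe that still
// fits inside the window, so the retained duration is saveTime - thatKeyframe.
function retainedWindowMs(
  keyframeTimesMs: readonly number[],
  saveTimeMs: number,
  maxWindowMs = 30_000,
): number {
  const anchor = keyframeTimesMs.find(
    (kf) => kf <= saveTimeMs && saveTimeMs - kf <= maxWindowMs,
  );
  return anchor === undefined ? 0 : saveTimeMs - anchor;
}

// Invented keyframe schedule with 26 s gaps.
const keyframes = [0, 26_000, 52_000];

const earlySave = retainedWindowMs(keyframes, 29_000); // keyframe at 0 still fits
const lateSave = retainedWindowMs(keyframes, 32_000); // keyframe at 0 just fell out
const laterSave = retainedWindowMs(keyframes, 57_000); // only the keyframe at 52 s fits
console.log({ earlySave, lateSave, laterSave });
```

Under this model a save at 29 s keeps 29 s of content, while a save only 3 s later keeps 6 s, because the anchor jumps forward a whole keyframe gap at once.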

LOC estimate: ~200-400 LOC for keyframe parsing + buffer mutation +
tests. Net: similar dep weight, worse playable-duration guarantee,
re-opens the freeze regression surface. **REJECTED as inferior to
remux.** Documenting here only because the orchestrator brief
explicitly requested the comparison.

### WebCodecs API path (2026-05-16T17:19Z)

WebCodecs (`VideoEncoder` + `VideoDecoder`) is available in MV3 service
workers from Chrome 94+. The path would be: feed each segment's
clusters → `VideoDecoder` → emit `VideoFrame` objects → feed back into
`VideoEncoder` (re-encode VP9) → wrap output via `webm-muxer`.

This works but adds a re-encode pass that:
- doubles CPU cost during the save flow
- introduces an additional quality loss (re-encoding lossy VP9)
- adds 500-1000 LOC of encoder/decoder lifecycle management
- requires Chrome 94+ exclusively (we already require modern Chrome,
  so OK, but it tightens the version floor)

There is no benefit over the `ts-ebml + webm-muxer` path for this
specific shape of problem: we already have encoded VP9 frames and
just need to put them in a different container. Re-encoding is
unnecessary work. **REJECTED as over-engineered.**

### RED test landing evidence (2026-05-16T17:20Z)

File edited: `tests/offscreen/webm-playback.test.ts` (preserved
existing 2 GREEN tests; appended new `describe` block with 2 new
assertions + supporting helpers).

Test run scoped to file:
```
$ npx vitest run tests/offscreen/webm-playback.test.ts
 Test Files  1 failed (1)
      Tests  2 failed | 2 passed (4)
```

Failures:
- `container-level format=duration on last_30sec.webm exceeds 25 s`:
  `expected 9934 to be greater than or equal to 25000`
- `ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline`:
  `expected 9960 to be greater than or equal to 25000`
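
The second failure's value (9960) comes from the last `time=HH:MM:SS.MS` token in ffmpeg's progress output. A minimal sketch of that parsing step follows. It is not the helper in `webm-playback.test.ts`, whose body is not shown here: `lastReportedTimeMs` is a hypothetical name and the sample stderr text is hand-written to resemble ffmpeg's progress lines.

```typescript
// Return the last `time=HH:MM:SS(.fraction)` value found in ffmpeg's stderr,
// in milliseconds, or NaN if no such token is present (e.g. only `time=N/A`).
function lastReportedTimeMs(stderr: string): number {
  const re = /time=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/g;
  let last: RegExpExecArray | null = null;
  for (let m = re.exec(stderr); m !== null; m = re.exec(stderr)) {
    last = m;
  }
  if (last === null) return NaN;
  const seconds = Number(last[1]) * 3600 + Number(last[2]) * 60 + Number(last[3]);
  return Math.round(seconds * 1000);
}

// Hand-written sample resembling two progress lines.
const sampleStderr = [
  'frame=  300 fps=0.0 q=-0.0 size=N/A time=00:00:04.98 bitrate=N/A speed=9.9x',
  'frame=  601 fps=0.0 q=-0.0 Lsize=N/A time=00:00:09.96 bitrate=N/A speed=12x',
].join('\n');

const reachedMs = lastReportedTimeMs(sampleStderr);
const missingMs = lastReportedTimeMs('no progress lines here');
console.log(reachedMs, Number.isNaN(missingMs));
```

Returning NaN for missing progress output keeps a probe-side failure distinguishable from a genuinely short timeline in the assertion message.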

Full suite (proves zero collateral regression):
```
$ npx vitest run
 Test Files  1 failed | 10 passed (11)
      Tests  2 failed | 53 passed (55)
```

All 53 pre-existing tests still GREEN. tsc:
```
$ npx tsc --noEmit; echo exit=$?
exit=0
```
|
||||
|
||||
## Eliminated
|
||||
|
||||
(populated by debugger as hypotheses are ruled out)
|
||||
@@ -266,6 +510,13 @@ tdd_checkpoint: ""
|
||||
- (H5: defective committed fixture in storage): ruled out — file size
|
||||
matches expected (1.63 MB matches what was committed on 2026-05-15;
|
||||
not bit-rot)
|
||||
- H6 (cluster-aware-trim revisit of D-09..D-11): rejected on architectural
|
||||
weakness — non-deterministic content window (depends on keyframe
|
||||
cadence), still needs EBML parsing, re-opens freeze-regression
|
||||
surface area. See Evidence/cluster-aware-trim section.
|
||||
- H7 (WebCodecs re-encode path): rejected as over-engineered — re-encodes
|
||||
VP9 frames we already have. ~500-1000 LOC for zero quality/playability
|
||||
benefit. See Evidence/WebCodecs section.
|
||||
|
||||
## Resolution
|
||||
|
||||
|
||||
@@ -33,6 +33,27 @@
|
||||
// Skip discipline: if ffmpeg is missing from the environment the test
|
||||
// auto-skips rather than failing. CI ships ffmpeg per `smoke.sh` so this is
|
||||
// a developer-convenience fence, not a behavioural softening.
|
||||
//
|
||||
// --- 2026-05-16 amendment: D-13 architecture failure RED tests ---
|
||||
//
|
||||
// Debug session `.planning/debug/d13-multi-ebml-concat-unplayable.md` proved
|
||||
// the existing two assertions ABOVE pass under D-13 only because they check
|
||||
// structural validity (ffmpeg null-decode tolerates the multi-EBML-header
|
||||
// concat by silently reading segments 1+2 and dropping segment 3, and by
|
||||
// collapsing all segments onto seg1's local timestamp axis so no muxer
|
||||
// "File ended prematurely" warning fires). Players that respect Matroska's
|
||||
// segment-info Duration element (mpv, Chrome's HTMLMediaElement, ffprobe's
|
||||
// `format=duration`) read 9.94 s — the FIRST segment's metadata duration —
|
||||
// and stop. The committed 1.6 MB fixture contains ~30 s of valid VP9 frames
|
||||
// but presents as ~10 s of content to operators and tests.
|
||||
//
|
||||
// The "container-level playable duration" describe block below adds the
|
||||
// assertion the closure check missed on 2026-05-15: that ffprobe-reported
|
||||
// format duration EXCEEDS 25_000 ms for the canonical fixture. This is
|
||||
// RED today under D-13 and stays RED until the multi-EBML concat at
|
||||
// src/background/index.ts mergeVideoSegments() is replaced with a true
|
||||
// remux that writes a single EBML header whose Info.Duration covers the
|
||||
// whole ~30 s span.
|
||||
|
||||
import { describe, it, expect } from 'vitest';
|
||||
import { existsSync, statSync } from 'node:fs';
|
||||
@@ -43,12 +64,21 @@ import { dirname, resolve } from 'node:path';
|
||||
const here = dirname(fileURLToPath(import.meta.url));
|
||||
const FIXTURE_PATH = resolve(here, '..', 'fixtures', 'last_30sec.webm');
|
||||
const FFMPEG_BIN = '/usr/bin/ffmpeg';
|
||||
const FFPROBE_BIN = '/usr/bin/ffprobe';
|
||||
|
||||
// Cap: a clean 30-second WebM decoded with `-f null` finishes well under
// 10 s on commodity hardware. If a decode ever exceeds the 30 s timeout
// below we want a hard failure, not a hung CI job.
|
||||
const FFMPEG_TIMEOUT_MS = 30_000;
|
||||
|
||||
// Playable-duration floor. The recorder rotates every 10 s and keeps 3
|
||||
// segments (D-13 / SEGMENT_DURATION_MS × MAX_SEGMENTS = 30_000 ms). The
|
||||
// rotation lifecycle can drop a partial sub-second at each boundary so the
|
||||
// final remux file is bounded by [~27_000, ~30_000] ms in steady state. We
|
||||
// gate at 25_000 ms to keep slack for boundary noise but still firmly above
|
||||
// the broken-architecture failure mode (9_940 ms — first segment only).
|
||||
const MIN_PLAYABLE_DURATION_MS = 25_000;
|
||||
|
||||
function ffmpegAvailable(): boolean {
|
||||
try {
|
||||
return existsSync(FFMPEG_BIN) && statSync(FFMPEG_BIN).isFile();
|
||||
@@ -57,6 +87,14 @@ function ffmpegAvailable(): boolean {
|
||||
}
|
||||
}
|
||||
|
||||
function ffprobeAvailable(): boolean {
|
||||
try {
|
||||
return existsSync(FFPROBE_BIN) && statSync(FFPROBE_BIN).isFile();
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
interface DecodeResult {
|
||||
stderr: string;
|
||||
packetErrorCount: number;
|
||||
@@ -113,6 +151,44 @@ function decodeDryRunStrict(fixturePath: string): DecodeResult {
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Read the container-level `format=duration` value from a WebM file via
|
||||
* ffprobe. This is the value that mpv, Chrome's HTMLMediaElement, and most
|
||||
* Matroska parsers honor when deciding "how long is this file?" — they pick
|
||||
* up the first Segment's Info.Duration EBML element and treat it as the end
* of the timeline, ignoring any later EBML headers in the file.
|
||||
*
|
||||
* Returns NaN on parse failure (ffprobe missing input track, malformed
|
||||
* float, etc.) so the assertion downstream can produce a precise error
|
||||
* message rather than masking a probe-side failure as a duration check.
|
||||
*
|
||||
* @param fixturePath - Absolute path to the WebM file under test.
|
||||
* @returns Container-level duration in milliseconds, or NaN if ffprobe's output cannot be parsed.
|
||||
*/
|
||||
function probeContainerDurationMs(fixturePath: string): number {
|
||||
const proc = spawnSync(
|
||||
FFPROBE_BIN,
|
||||
[
|
||||
'-v', 'error',
|
||||
'-show_entries', 'format=duration',
|
||||
'-of', 'default=noprint_wrappers=1:nokey=1',
|
||||
'-i', fixturePath,
|
||||
],
|
||||
{
|
||||
stdio: ['ignore', 'pipe', 'pipe'],
|
||||
encoding: 'utf-8',
|
||||
timeout: FFMPEG_TIMEOUT_MS,
|
||||
maxBuffer: 1 * 1024 * 1024,
|
||||
},
|
||||
);
|
||||
if (proc.signal !== null) {
|
||||
throw new Error(`ffprobe was killed by signal ${proc.signal}`);
|
||||
}
|
||||
const stdout = (proc.stdout ?? '').trim();
|
||||
const seconds = parseFloat(stdout);
|
||||
return Number.isFinite(seconds) ? Math.round(seconds * 1000) : Number.NaN;
|
||||
}
|
||||
|
||||
describe('webm playback (RED — confirms webm-playback-freeze bug)', () => {
|
||||
it.skipIf(!ffmpegAvailable())(
|
||||
'ffmpeg dry-run on last_30sec.webm produces zero decoder packet errors',
|
||||
@@ -151,3 +227,90 @@ describe('webm playback (RED — confirms webm-playback-freeze bug)', () => {
|
||||
},
|
||||
);
|
||||
});
|
||||
|
||||
describe('webm playable duration (RED — confirms d13-multi-ebml-concat-unplayable bug)', () => {
|
||||
it.skipIf(!ffprobeAvailable())(
|
||||
'container-level format=duration on last_30sec.webm is at least 25 s',
|
||||
() => {
|
||||
// SPEC §10 #7 requires last_30sec.webm to "play back in a browser"
|
||||
// covering the most recent ~30 s. Both mpv and Chrome's HTMLMediaElement
|
||||
// honor the first Segment's Info.Duration EBML element — which under
|
||||
// D-13's multi-EBML concat reflects only the FIRST segment's local
|
||||
// duration (~9.94 s for the canonical fixture). That bug means the
|
||||
// canonical Phase 1 closure fixture (committed 2026-05-15) presents
|
||||
// as ~10 s of content to any standards-compliant Matroska parser,
|
||||
// even though segments 2+3 are physically present in the bytes.
|
||||
//
|
||||
// The fix is a true WebM REMUX of the concatenated segments: parse
|
||||
// each segment's clusters via an EBML library, extract the VP9
|
||||
// frame payloads with their keyframe/delta flags, and re-mux into
|
||||
// a single-EBML-header WebM whose clusters carry monotonically
|
||||
// increasing timestamps. The resulting file's Info.Duration will
|
||||
// span the full ~30 s window.
|
||||
//
|
||||
// Floor of MIN_PLAYABLE_DURATION_MS (25_000) accommodates the
|
||||
// ~3 s boundary slack from segment rotation while remaining well
|
||||
// above the broken-architecture failure mode (9_940 ms).
|
||||
expect(existsSync(FIXTURE_PATH)).toBe(true);
|
||||
const durationMs = probeContainerDurationMs(FIXTURE_PATH);
|
||||
expect(
|
||||
durationMs,
|
||||
`ffprobe reported container duration=${durationMs} ms for ${FIXTURE_PATH}. ` +
|
||||
`Under SPEC §10 #7 the file must present at least ${MIN_PLAYABLE_DURATION_MS} ms ` +
|
||||
`of playable content to standards-compliant Matroska parsers (mpv, Chrome). ` +
|
||||
`If this value is ~9_940 ms the file is a multi-EBML-header concat (D-13 raw output) ` +
|
||||
`where players honor only the first segment's local Info.Duration metadata. ` +
|
||||
`Fix: replace mergeVideoSegments() in src/background/index.ts with a true WebM remux ` +
|
||||
`(parse + rewrite into a single-EBML-headered WebM with adjusted monotonic timestamps).`,
|
||||
).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS);
|
||||
},
|
||||
);
|
||||
|
||||
it.skipIf(!ffmpegAvailable())(
|
||||
'ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline',
|
||||
() => {
|
||||
// Defense-in-depth: even if a future ffprobe quirk computes
|
||||
// format=duration by summing all reachable cluster timestamps,
|
||||
// ffmpeg's full null-decode of the concatenated file collapses
|
||||
// segments 2..N onto the first segment's local timestamp axis
|
||||
// (verified empirically 2026-05-16: 601 frames decoded, time=09.96)
|
||||
// because the multi-EBML format provides no segment-level offset.
|
||||
// The remux fix will produce a stream whose decoded `time=...`
|
||||
// reaches at least 25 s end-to-end.
|
||||
expect(existsSync(FIXTURE_PATH)).toBe(true);
|
||||
const proc = spawnSync(
|
||||
FFMPEG_BIN,
|
||||
['-nostdin', '-v', 'error', '-stats', '-i', FIXTURE_PATH, '-f', 'null', '-'],
|
||||
{
|
||||
stdio: ['ignore', 'ignore', 'pipe'],
|
||||
encoding: 'utf-8',
|
||||
timeout: FFMPEG_TIMEOUT_MS,
|
||||
maxBuffer: 4 * 1024 * 1024,
|
||||
},
|
||||
);
|
||||
if (proc.signal !== null) {
|
||||
throw new Error(`ffmpeg was killed by signal ${proc.signal}`);
|
||||
}
|
||||
const stderr = proc.stderr ?? '';
|
||||
// ffmpeg's `-stats` line on the final frame looks like:
|
||||
// frame= 601 fps=0.0 q=-0.0 Lsize=N/A time=00:00:09.96 bitrate=N/A ...
|
||||
// We want the LAST time= match (subsequent stats lines overwrite the
|
||||
// earlier ones with monotonically increasing time values).
|
||||
const timeMatches = [...stderr.matchAll(/time=(\d{2}):(\d{2}):(\d{2})\.(\d{2})/g)];
|
||||
const last = timeMatches[timeMatches.length - 1];
|
||||
const decodedMs = last
|
||||
? (parseInt(last[1], 10) * 3600 + parseInt(last[2], 10) * 60 + parseInt(last[3], 10)) * 1000 +
|
||||
parseInt(last[4], 10) * 10
|
||||
: Number.NaN;
|
||||
expect(
|
||||
decodedMs,
|
||||
`ffmpeg decoded only ${decodedMs} ms of timeline from ${FIXTURE_PATH}. ` +
|
||||
`SPEC §10 #7 requires at least ${MIN_PLAYABLE_DURATION_MS} ms of decoded content. ` +
|
||||
`If decoded duration is ~9_960 ms the multi-EBML concat is collapsing all segments ` +
|
||||
`onto seg1's local timestamp axis (the timestamp-collision symptom). ` +
|
||||
`Fix: real WebM remux per d13-multi-ebml-concat-unplayable debug session. ` +
|
||||
`Full ffmpeg stderr:\n${stderr}`,
|
||||
).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS);
|
||||
},
|
||||
);
|
||||
});
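The second test parses ffmpeg's `-stats` output inline. As a rough sketch only, the same logic could be pulled into a small pure function so it can be exercised without spawning ffmpeg. The name `lastStatsTimeMs` is hypothetical and does not appear in this diff; the regex and arithmetic mirror the inline code above.

```typescript
// Hypothetical helper (not part of the diff): return the last
// `time=HH:MM:SS.cc` value found in ffmpeg `-stats` stderr, in milliseconds.
// Returns NaN when no such value is present (for example `time=N/A`).
function lastStatsTimeMs(stderr: string): number {
  const matches = Array.from(
    stderr.matchAll(/time=(\d{2}):(\d{2}):(\d{2})\.(\d{2})/g),
  );
  if (matches.length === 0) {
    return Number.NaN;
  }
  const last = matches[matches.length - 1];
  const hours = parseInt(last[1], 10);
  const minutes = parseInt(last[2], 10);
  const seconds = parseInt(last[3], 10);
  // ffmpeg prints two fractional digits, i.e. centiseconds.
  const centiseconds = parseInt(last[4], 10);
  return (hours * 3600 + minutes * 60 + seconds) * 1000 + centiseconds * 10;
}
```

With a helper like this, the test body would reduce to one call on `proc.stderr` followed by the existing `toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS)` assertion.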
|
||||
|
||||
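The test comments describe the fix as a remux whose clusters carry monotonically increasing timestamps. The timestamp part of that idea can be sketched as below. This is an illustration under stated assumptions, not the planned implementation: the `Frame` and `Segment` shapes and the `rebaseTimestamps` name are hypothetical, each segment's duration is assumed to be already known, and no EBML parsing or muxing library is involved.

```typescript
// Hypothetical shapes for illustration; the real remux would obtain these
// from an EBML parser rather than from plain objects.
interface Frame {
  timestampMs: number; // timestamp local to the frame's own segment
  isKeyframe: boolean;
}

interface Segment {
  frames: Frame[];
  durationMs: number; // assumed known per segment
}

// Shift every segment's local timestamps by the total duration of the
// segments before it, so the merged frame list shares one timeline.
function rebaseTimestamps(segments: Segment[]): {
  frames: Frame[];
  totalDurationMs: number;
} {
  const frames: Frame[] = [];
  let offsetMs = 0;
  for (const segment of segments) {
    for (const frame of segment.frames) {
      frames.push({ ...frame, timestampMs: frame.timestampMs + offsetMs });
    }
    offsetMs += segment.durationMs;
  }
  return { frames, totalDurationMs: offsetMs };
}
```

Under these assumptions, three segments of 9_940 ms each yield a `totalDurationMs` of 29_820, which is the kind of value a single Info.Duration element would need to carry for the duration assertion to pass.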