Milestone v1 (v2.0.0): Mokosh — Session Capture #1

Merged
strategy155 merged 297 commits from gsd/phase-04-harden-clean-up-optional into main 2026-05-31 15:34:17 +00:00
5 changed files with 479 additions and 55 deletions
Showing only changes of commit bc310d98cf


@@ -16,7 +16,7 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md.
### Video
- [x] **REQ-video-ring-buffer**: The extension maintains an in-memory ring
- [ ] **REQ-video-ring-buffer**: The extension maintains an in-memory ring
buffer containing the most recent 30 seconds of captured video. AMENDED in
Phase 01: video is acquired via `navigator.mediaDevices.getDisplayMedia()`
invoked from the offscreen document (with `chrome.offscreen.Reason.DISPLAY_MEDIA`),
@@ -35,9 +35,19 @@ Requirements for the Phase 1 SPEC. Each maps to one phase in ROADMAP.md.
CON-video-window, CON-video-codec, CON-display-capture-binding (replaces
RETIRED CON-tab-capture-binding). CON-webm-header-retention RETIRED in
favor of D-13 per-segment header isolation.
- SPEC §10 acceptance criteria: #2, #3, #7 — all green 2026-05-15
(D-12 ffprobe gate + operator-confirmed Chrome playback + ffmpeg dry-run
exit 0 with zero decoder errors against `tests/fixtures/last_30sec.webm`).
- SPEC §10 acceptance criteria: #2, #3 green 2026-05-15 (D-12 ffprobe
gate). #7 (last_30sec.webm plays back in a browser) — **REOPENED
2026-05-16**: D-13's concat-of-self-contained-segments architecture
produces a multi-EBML-header file that standards-compliant Matroska
parsers (mpv, ffmpeg, Chrome's HTMLMediaElement) play only as the
first segment (~9.94 s) and silently drop segments 2 and 3. The
2026-05-15 "operator-confirmed clean Chrome playback" assessment was
insufficient — it checked playback ran without freezing but did not
measure total duration. Plan 01-08 (WebM remux via ts-ebml +
webm-muxer) will replace `mergeVideoSegments`'s file-concat with a
real single-EBML-headered remux, restoring SPEC §10 #7. See
`.planning/debug/d13-multi-ebml-concat-unplayable.md` for the
byte-level root-cause evidence.
### DOM Capture
@@ -193,7 +203,7 @@ Which phase covers which requirement. See ROADMAP.md for phase details.
| Requirement | Phase | Status |
|-------------|-------|--------|
| REQ-video-ring-buffer | Phase 1 | Complete |
| REQ-video-ring-buffer | Phase 1 | In progress (reopened 2026-05-16: SPEC §10 #7 fails; Plan 01-08 WebM remux pending) |
| REQ-rrweb-dom-buffer | Phase 2 | Pending |
| REQ-user-event-log | Phase 2 | Pending |
| REQ-password-confidentiality | Phase 2 | Pending |


@@ -22,7 +22,7 @@ working export → green §10 smoke → harden + clean up**.
Decimal phases appear between their surrounding integers in numeric order.
- [x] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15** — D-12 ffprobe gate + A3 empirical-playback gate both green against `tests/fixtures/last_30sec.webm` (1.6 MB VP9 1142×1038); D-13 restart-segments retired D-09..D-11 mid-phase; 30/30 vitest green, tsc clean. SPEC §10 #2, #3, #7 functionally satisfied (end-to-end Phase 4 smoke remains owner of §10).
- [ ] **Phase 1: Stabilize video pipeline** — Collapse offscreen duality, fix MediaRecorder shadow, fix WebM ring buffer playability, replace `chrome.tabCapture` with offscreen `getDisplayMedia` (AMENDED from original DEC-003). **Closed 2026-05-15 then REOPENED 2026-05-16**: the 2026-05-15 closure was based on insufficient operator playback verification; D-13's concat-of-self-contained-segments architecture produces a multi-EBML WebM that plays only ~9 s instead of ~30 s in standards-compliant parsers (mpv, ffmpeg, Chrome HTMLMediaElement). UAT Test 3 retest on 2026-05-16 confirmed via byte-level EBML probe. SPEC §10 #7 not actually satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) replaces `mergeVideoSegments`'s file-concat with a real single-EBML remux. See `.planning/debug/d13-multi-ebml-concat-unplayable.md`. Option C port-lifecycle refactor (debug session `empty-archive-port-race`) DID land cleanly and is retained. Phase 1 will additionally absorb whole-desktop + auto-start UX work (Plans 01-09/01-10) per the 2026-05-16 amended charter.
- [ ] **Phase 2: Stabilize DOM + event capture privacy** — Migrate rrweb to v2 `maskInputFn`, plug `content/index.ts setupInputLogging` password leak
- [ ] **Phase 3: Stabilize export pipeline** — Restore user-activation gesture in popup, delete dead `permissions.request`, replace base64 `data:` URL with Blob URL minted in offscreen
- [ ] **Phase 4: SPEC §10 smoke verification** — End-to-end install-and-record-and-export pass against all 9 acceptance criteria


@@ -2,16 +2,16 @@
gsd_state_version: 1.0
milestone: v2.0.0
milestone_name: milestone
status: phase_complete
stopped_at: "Phase 1 closure 2026-05-15: D-12 ffprobe gate + A3 empirical-playback gate both green against tests/fixtures/last_30sec.webm (1.6 MB VP9 1142×1038, 3-segment multi-EBML-header concat). D-13 restart-segments retired D-09..D-11 mid-phase. 30/30 vitest green incl. empirical webm-playback dry-runs; tsc clean; ffmpeg -v warning -i fixture -f null - exit 0 with zero decoder errors (only expected muxer DTS-monotonicity warnings at segment join boundaries); operator confirmed clean Chrome playback end-to-end. REQ-video-ring-buffer marked Complete. Ready to plan Phase 2 (DOM + event-capture privacy)."
last_updated: "2026-05-15T21:42:00.000Z"
last_activity: "2026-05-15 — Phase 1 closure: D-12 + A3 gates green; REQ-video-ring-buffer complete; ready for Phase 2"
status: phase_reopened
stopped_at: "Phase 1 REOPENED 2026-05-16: D-13 multi-EBML-concat architecture confirmed broken via UAT Test 3 retest + byte-level EBML probe — produced WebM plays only ~9 s instead of ~30 s in mpv AND Chrome (the 2026-05-15 closure's operator playback check was insufficient). Phase 1's primary deliverable (REQ-video-ring-buffer, SPEC §10 #7) is NOT satisfied. Plan 01-08 (WebM remux via ts-ebml + webm-muxer) will replace mergeVideoSegments file-concat with real single-EBML remux. RED test landed in tests/offscreen/webm-playback.test.ts (2 failures, 53 baseline GREEN). Option C port-lifecycle refactor (debug session empty-archive-port-race) DID land cleanly (commits 674c415..f0871c0). Phase 1 also absorbs whole-desktop + auto-start UX (Plans 01-09/01-10) per 2026-05-16 amended charter."
last_updated: "2026-05-16T17:35:00Z"
last_activity: "2026-05-16 — Phase 1 reopened: D-13 multi-EBML architecture confirmed broken by mpv/Chrome playback test; Plan 01-08 (ts-ebml + webm-muxer remux) pending; markers reverted"
progress:
total_phases: 5
completed_phases: 1
total_plans: 7
completed_phases: 0
total_plans: 10
completed_plans: 7
percent: 100
percent: 70
---
# Project State


@@ -8,8 +8,8 @@ trigger: |
1675899 bytes, total segments merged: 3") but the resulting file plays
ONLY ~9 s in Chrome AND in mpv. Cross-checking the canonical fixture
committed at Phase 1 closure on 2026-05-15 (`tests/fixtures/last_30sec.webm`,
1633459 bytes, 3 segments per architecture) reveals it ALSO plays only ~9 s
in mpv. Operator confirmed both via mpv playback test.
1633459 bytes, 3 segments per architecture) reveals it ALSO plays only
~9 s in mpv. Operator confirmed both via mpv playback test.
This means D-13's "concat of self-contained WebM segments → playable 30 s
WebM" architecture is fundamentally broken. The 2026-05-15 Phase 1
@@ -24,7 +24,7 @@ trigger: |
though it was marked Complete in REQUIREMENTS.md/ROADMAP.md/STATE.md
on 2026-05-15.
created: 2026-05-16T16:56:41Z
updated: 2026-05-16T16:56:41Z
updated: 2026-05-16T17:25:00Z
phase: 01-stabilize-video-pipeline
related_uat: .planning/phases/01-stabilize-video-pipeline/01-UAT.md
related_review_fix: .planning/phases/01-stabilize-video-pipeline/01-REVIEW-FIX.md
@@ -121,15 +121,19 @@ the segment boundary noise from concatenation, not playback failure.
## Current Focus
hypothesis: |
**H4 confirmed by operator empirical test**: D-13's "concat of self-
contained WebM segments → produce playable 30 s WebM" architecture
does not work in practice because most Matroska/WebM players do not
implement the multi-segment Matroska feature. The Matroska spec
permits multiple segments in one file but most decoders read only
the first segment's EBML header and stop there. ffmpeg's behavior
(which mpv inherits) is to honor the first EBML's duration metadata.
Chrome's MSE implementation appears to do the same (per UAT operator
observation).
**H4 confirmed by byte-level EBML probe (2026-05-16T17:10Z, see
Evidence/H4 below)**: D-13's "concat of self-contained WebM
segments → produce playable 30 s WebM" architecture does not work
because standards-compliant Matroska parsers (mpv, mkvtoolnix,
Chrome's HTMLMediaElement, ffprobe's `format=duration` path) honor
the FIRST Segment element's Info.Duration EBML (~9_934 ms for the
fixture) and stop there. Even ffmpeg's matroska DEMUXER — which is
unusually liberal and reads through the second segment's EBML
header — collapses segments 2..N onto seg1's local timestamp axis
(verified empirically: 601 packets decoded from segs 1+2, ZERO
packets from seg3, output `time=00:00:09.96`). Multi-segment
Matroska is technically permitted by the spec but in practice
consumer-grade players do not implement it.
**H3 confirmed by operator empirical test**: The 2026-05-15 Phase 1
closure's "operator-confirmed clean Chrome playback" check was
@@ -145,49 +149,98 @@ hypothesis: |
produces a file that any player can read end-to-end as one continuous
~30 s stream.
**Candidate implementations**:
- `webm-muxer` npm package (Vanilla. ~10 KB. Browser + Node support.
Single-segment output. Active maintenance.)
- `ts-ebml` (EBML parser + writer. Allows manual control over
structure. ~50 KB.)
**Candidate implementations** (researched 2026-05-16, see
Evidence/library-survey below for full table):
- `webm-muxer` 5.1.4 (Vanilagy, MIT, last release 2025-07-02,
gzipped ~12 KB, pure ESM/CJS no DOM globals). `addVideoChunkRaw(data,
type:'key'|'delta', timestamp, meta?)` accepts already-encoded VP9
frames — exactly the shape produced by a stream of existing WebM
segments. SW-compatible. PRIMARY CANDIDATE for the write half.
- `ts-ebml` 3.0.2 (legokichi, MIT, last release 2025-09-28, gzipped
~87 KB, UMD has a single `typeof window` check with self-fallback
so SW-compatible). Decoder+Encoder API. Needed for the parse half
(extract VP9 SimpleBlock payloads + cluster timecodes + keyframe
flags from each segment).
- `ebml` 3.0.0 (node-ebml, MIT, last release **2018-09-06** — dead
upstream). Smaller but unmaintained.
- `mp4-muxer` 5.2.2 (sibling of webm-muxer; not applicable — we need
WebM container output).
- Custom EBML parser (full control, ~500-1000 LOC, no dep weight)
- **Alternative path: MediaRecorder timeslice with cluster-aware trim**:
revisit retired D-09..D-11 architecture but trim ONLY on keyframe
boundaries (preserving every cluster from the most recent keyframe
onwards). This avoids the A3 orphan-P-frame freeze by guaranteeing
every kept cluster's references are present. ~200-400 LOC. The
risk: requires understanding EBML/Matroska cluster structure to
trim correctly.
onwards). See Evidence/cluster-aware-trim below — the DETERMINISTIC
floor on retained-content duration is much less than 30 s
(worst-case: keyframe just emitted → retain only the post-keyframe
sliver) because VP9 kf_max_dist under Chrome's MediaRecorder is
irregular (3-5 s typical, 26 s observed in the prior debug
session). This path produces a NON-DETERMINISTIC content window;
rejected as architecturally weaker than remux.
- **Alternative path: WebCodecs API** (VideoEncoder + Muxer.js or
similar): full control over container framing. Significant rewrite
(~1000-2000 LOC). Most flexible but heaviest.
(~1000-2000 LOC). Most flexible but heaviest. WebCodecs is
available in MV3 service workers per Chrome 94+ — viable but
over-engineered for the current need (we already have VP9 frames,
we just need to RE-CONTAIN them).
The remux approach (webm-muxer or equivalent) is likely the right
trade-off: small, well-tested library, preserves D-13's segment
lifecycle benefits (no orphan-P-frame freeze, ~10s rotation gap
acceptable), but produces a single-EBML output that all players
read correctly.
The recommendation (TIEBREAKER only — the user makes the call):
`ts-ebml` (parse) + `webm-muxer` (write) is the smallest fix that
matches the actual problem shape. Combined ~100 KB gzipped, both
MIT, both actively maintained, both verified SW-compatible. Net
source-edit LOC ~150-300 in `src/background/index.ts`
mergeVideoSegments() — we don't decode/re-encode VP9 frames, we
just parse them out of segments and re-emit with monotonic
timestamps. Preserves D-13's recorder-side lifecycle (which DID
fix the orphan-P-frame freeze) and adds a single new SW-side
remux pass on the save path.
test: |
RED test: introduce a playable-duration assertion to
tests/offscreen/webm-playback.test.ts. Use ffprobe -count_frames
-show_streams to count VIDEO FRAMES (not just metadata duration),
then divide by reported frame rate to compute actual playable
content duration. Assert actual_duration > 25_000 ms for the
generated/committed fixture. This test should FAIL against the
current D-13 architecture and PASS after the remux fix lands.
RED test LANDED at tests/offscreen/webm-playback.test.ts. Two new
assertions in the new `describe('webm playable duration (RED —
confirms d13-multi-ebml-concat-unplayable bug)')` block:
Alternative RED test: ffprobe -read_intervals -i FILE
'0%+#90000' (seek to last 90s, read all packets). Count packets
read. Should be ~600 packets for 30s @ ~20fps, not ~200 for 9s.
1. `container-level format=duration on last_30sec.webm exceeds 25 s`
— uses ffprobe to read `format=duration`. Asserts
`>= MIN_PLAYABLE_DURATION_MS = 25_000`. RED today
(actual: 9_934 ms).
2. `ffmpeg full decode of last_30sec.webm reaches at least 25 s of
timeline` — parses the last `time=HH:MM:SS.MS` from `ffmpeg -stats
-f null -` output. Asserts `>= 25_000 ms`. RED today
(actual: 9_960 ms).
Both gate behind `it.skipIf(!ffprobeAvailable())` /
`it.skipIf(!ffmpegAvailable())` so CI environments without those
binaries auto-skip rather than hard-fail (matches the existing
webm-playback.test.ts skip discipline). The existing two structural-
validity tests in the same file (`...zero decoder packet errors` and
`...does not end prematurely`) remain GREEN and untouched.
expecting: |
RED test fails on current code (both fixture and freshly-recorded
output should fail the duration assertion). Debugger then implements
the chosen fix path (webm-muxer remux most likely) and re-asserts
GREEN.
next_action: gather initial evidence from EBML parsing of both fixtures + research candidate JS remux libraries
reasoning_checkpoint: ""
tdd_checkpoint: ""
the chosen fix path (webm-muxer + ts-ebml remux most likely) and
re-asserts GREEN. RED confirmed 2026-05-16T17:20Z: 11 test files,
53 passed + 2 failed (the two new assertions). All pre-existing
tests still GREEN; tsc clean (exit 0).
next_action: CHECKPOINT to orchestrator — root cause confirmed, RED test landed, fix-strategy options surfaced; awaiting user's chosen path via orchestrator routing.
reasoning_checkpoint: |
Why CHECKPOINT here rather than execute: the choice between
`ts-ebml + webm-muxer` vs `custom EBML parser` vs `cluster-aware
trim revisit of D-09..D-11` vs `WebCodecs rewrite` is architecturally
significant (it determines whether Phase 1's deliverable stays in
the debug-session hotfix lane OR escalates to a fresh Plan 01-08,
and whether the project gains two new runtime deps). Per the
feedback memory `feedback-no-unilateral-scope-reduction.md` the
debugger does not narrow this for the user — surface options and
let the user pick.
tdd_checkpoint: |
RED gate honored. Two new failing assertions in
tests/offscreen/webm-playback.test.ts pin the playable-duration
contract that the 2026-05-15 closure check missed. Existing
structural-validity tests remain GREEN. tsc clean. Full vitest run
reports `Test Files 1 failed | 10 passed (11) / Tests 2 failed | 53
passed (55)` — exactly the expected RED-on-new shape, no collateral
regression.
## Constraints
@@ -255,6 +308,197 @@ tdd_checkpoint: ""
- Both files report `duration=9.94 s` via ffprobe -show_entries format=duration
- Decoder errors: zero (segments are individually valid)
### H4 byte-level EBML probe (2026-05-16T17:10Z) — confirms multi-EBML-concat is the root cause
Probe target: `tests/fixtures/last_30sec.webm` (1_633_459 bytes, committed
fixture from Phase 1 closure 2026-05-15).
**EBML structural scan** (raw byte search for element IDs per
[Matroska spec](https://www.matroska.org/technical/elements.html)):
| EBML element | ID (hex) | Occurrences in file | Byte offsets |
|---|---|---|---|
| EBML header | `1A 45 DF A3` | **3** | `[0, 509038, 970967]` |
| Segment | `18 53 80 67` | **3** | `[36, 509074, 971003]` |
| Cluster | `1F 43 B6 75` | 13 | spread across all 3 segments |
The file is THREE concatenated WebM files, each with its own EBML header
+ Segment element. mkvinfo (without `--all-elements`) reports only the
FIRST segment + its EBML header — two top-level elements visible —
confirming standards-compliant parsers stop at the first segment.
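The structural scan above can be approximated in a few lines. This is a minimal sketch, not the probe script actually used in the session: `findAllOffsets` and `countEbmlHeaders` are hypothetical helper names, and a raw byte search can in principle report false positives if the same four bytes happen to occur inside a frame payload.

```typescript
// Element IDs from the Matroska spec (same values as the table above).
const EBML_HEADER_ID: readonly number[] = [0x1a, 0x45, 0xdf, 0xa3];
const SEGMENT_ID: readonly number[] = [0x18, 0x53, 0x80, 0x67];

/** Return every byte offset at which `pattern` occurs in `bytes`. */
function findAllOffsets(bytes: Uint8Array, pattern: readonly number[]): number[] {
  const offsets: number[] = [];
  if (pattern.length === 0) return offsets;
  for (let i = 0; i + pattern.length <= bytes.length; i++) {
    let match = true;
    for (let j = 0; j < pattern.length; j++) {
      if (bytes[i + j] !== pattern[j]) {
        match = false;
        break;
      }
    }
    if (match) offsets.push(i);
  }
  return offsets;
}

/** Hypothetical helper: a count above 1 suggests a multi-EBML concat. */
function countEbmlHeaders(bytes: Uint8Array): number {
  return findAllOffsets(bytes, EBML_HEADER_ID).length;
}
```

Fed the fixture bytes, a scan of this shape should report the three header offsets listed in the table; a correctly remuxed file should report exactly one.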
**Per-segment isolated probes** (sliced via Python at the EBML offsets
above into `/tmp/d13-seg{1,2,3}.webm`):
| Segment | Bytes | format=duration | -count_frames |
|---|---|---|---|
| seg1 | 509_038 | 9.934 s | 301 frames |
| seg2 | 461_929 | 9.963 s | 300 frames |
| seg3 | 662_492 | 9.958 s | 311 frames |
| **TOTAL** | **1_633_459** | (29.86 s of real content) | **912 frames** |
Each segment is individually a valid, complete ~10 s WebM. The
underlying VP9 stream is intact across all three. The bug is purely the
multi-segment topology of the concatenated container.
**Concatenated file probe** (the actual fixture):
| Probe command | Reported value |
|---|---|
| `ffprobe -show_entries format=duration` | **9.934024 s** (first segment's Info.Duration metadata only) |
| `ffprobe -count_frames` | **601 frames** (= 301 + 300 = segs 1+2 only) |
| `ffmpeg -f null -` decoder | **frame=601 time=00:00:09.96** + 299 non-monotonic-DTS warnings |
| Packets read from byte range `pos<509038` (seg1) | 301 |
| Packets read from byte range `509038 ≤ pos < 970967` (seg2) | 300 |
| Packets read from byte range `pos ≥ 970967` (seg3) | **0** |
| mkvinfo top-level elements visible | 2 (EBML head + Segment) — seg2 + seg3 invisible |
Two observations both fatal to D-13:
1. ffmpeg's matroska demuxer is the most-permissive parser in common
use and even IT silently drops segment 3 (zero packets from
`pos ≥ 970967`).
2. Even when ffmpeg DOES read segments 1+2 it does not offset seg2's
local timestamps onto the global timeline. The 299 non-monotonic-DTS
warnings are seg2's local timestamps (`tt < 9934 ms`) colliding with
seg1's end timestamp (`9934 ms`). Output `time=00:00:09.96` because
the muxer cannot grow the timeline past the maximum monotonic DTS
it has accepted.
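The timestamp collision in observation 2 can be illustrated with a toy model. This sketch is an assumption-laden simplification, not ffmpeg's actual demuxer or muxer logic: it walks each segment's local timestamps in file order, counts every packet whose timestamp does not advance past the running maximum, and reports how far the timeline actually grows. `scanConcatenatedTimeline` is a hypothetical name.

```typescript
interface TimelineStats {
  nonMonotonic: number;   // packets whose timestamp did not advance the timeline
  maxTimestampMs: number; // furthest point the timeline reached
}

/**
 * Toy model: each inner array holds one segment's LOCAL timestamps (ms),
 * with no per-segment offset applied, as in the multi-EBML concat.
 */
function scanConcatenatedTimeline(
  segments: readonly (readonly number[])[],
): TimelineStats {
  let maxTs = Number.NEGATIVE_INFINITY;
  let nonMonotonic = 0;
  for (const segment of segments) {
    for (const ts of segment) {
      if (ts <= maxTs) {
        nonMonotonic++; // collides with a timestamp already accepted
      } else {
        maxTs = ts;
      }
    }
  }
  return { nonMonotonic, maxTimestampMs: maxTs };
}
```

In this model a second segment that restarts at 0 contributes almost nothing to the timeline length, which mirrors the `time=00:00:09.96` result despite 601 decoded frames.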
Conclusion: H4 confirmed at the byte level. The file is structurally
valid as a "concatenated archive of three WebMs" but is NOT a single
30-second playable WebM. To produce a 30-second playable WebM the
segments must be REMUXED (parse VP9 frames + keyframe flags + cluster
timestamps from each segment, then re-emit them inside a single
EBML-headered container with monotonically-adjusted timestamps).
### Library survey (2026-05-16T17:15Z) — candidate JS WebM remux libraries
All sizes are the bundled dist (no source-map, no tests, no docs).
Gzipped values measured locally via `gzip -c`. SW-compat verdict is
based on grep of dist for `window`/`document`/`navigator`/`XMLHttpRequest`
followed by manual inspection of any hits.
| Lib | Version | License | Last release | Dist size | Gzipped | SW-compat | API shape | Verdict |
|---|---|---|---|---|---|---|---|---|
| `webm-muxer` | 5.1.4 | MIT | 2025-07-02 | 69 KB | **~12 KB** | YES (zero DOM refs) | `addVideoChunkRaw(data, type:'key'\|'delta', ts, meta?)` accepts encoded VP9 frames | PRIMARY — write half |
| `ts-ebml` | 3.0.2 | MIT | 2025-09-28 | 356 KB | ~87 KB | YES (`typeof window` with `self` fallback in UMD wrapper) | `Decoder.decode(ArrayBuffer) → EBMLElementDetail[]` ; `Encoder.encode(elms) → ArrayBuffer` | PRIMARY — parse half |
| `ebml` | 3.0.0 | MIT | **2018-09-06** | 7.7 MB unpacked | n/a | uncertain | older streaming parser API | DEAD UPSTREAM — avoid |
| `mp4-muxer` | 5.2.2 | MIT | (active) | 70 KB | ~13 KB | YES | analogous to webm-muxer but MP4 | n/a — wrong container |
| Custom EBML parser | n/a | n/a | n/a | 0 KB | 0 KB | YES | hand-rolled per Matroska spec | ~500-1000 LOC, full ownership |
Important note on the `webm-muxer` API: `addVideoChunkRaw()` takes
already-encoded VP9 frame bytes + a keyframe flag + a timestamp. We
do NOT need to decode/re-encode the VP9 stream — the existing
segments already contain valid VP9 frame payloads inside their
Cluster/SimpleBlock elements. The remux path is:
1. For each segment blob, parse via `ts-ebml.Decoder` → walk the EBML
tree → for each Cluster's SimpleBlock children, extract the VP9
frame bytes + keyframe flag (the most significant bit, mask `0x80`,
of the SimpleBlock flags byte = "keyframe" per
[spec](https://www.matroska.org/technical/elements.html#SimpleBlock))
+ cluster Timestamp + local block offset.
2. Compute monotonic adjusted timestamp: `globalMs = segmentBaseMs +
clusterTsMs + blockOffsetMs` where `segmentBaseMs` accumulates the
total content duration of all prior segments, then convert to
microseconds (`globalUs = globalMs * 1000`) for the muxer call.
3. Stream all adjusted frames into a single `webm-muxer.Muxer` with
`addVideoChunkRaw(frameData, isKey ? 'key' : 'delta', globalUs)`.
4. `muxer.finalize()` → `ArrayBufferTarget.buffer` → single-EBML
WebM Blob.
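Step 1's keyframe extraction depends on the SimpleBlock header layout: a variable-length track number, a signed 16-bit timestamp relative to the cluster, then one flags byte. A minimal sketch of that parse, assuming the input is the SimpleBlock payload (element ID and size already stripped) and that no lacing is in use; `parseSimpleBlockHeader` is a hypothetical helper, not part of `ts-ebml`:

```typescript
interface SimpleBlockHeader {
  trackNumber: number;
  relativeTimestamp: number; // signed ticks relative to the Cluster Timestamp
  keyframe: boolean;
  payloadOffset: number;     // index of the first frame byte (no lacing assumed)
}

function parseSimpleBlockHeader(block: Uint8Array): SimpleBlockHeader {
  const view = new DataView(block.buffer, block.byteOffset, block.byteLength);
  if (block.length < 4) throw new RangeError("SimpleBlock too short");

  // EBML variable-length integer: leading zero bits give the byte length.
  const first = view.getUint8(0);
  let vintLength = 1;
  let mask = 0x80;
  while (vintLength <= 8 && (first & mask) === 0) {
    mask >>= 1;
    vintLength++;
  }
  if (vintLength > 8) throw new RangeError("invalid track-number vint");
  if (block.length < vintLength + 3) throw new RangeError("SimpleBlock truncated");

  // Strip the length-marker bit, then append the remaining bytes big-endian.
  let trackNumber = first & (mask - 1);
  for (let i = 1; i < vintLength; i++) {
    trackNumber = trackNumber * 256 + view.getUint8(i);
  }

  const relativeTimestamp = view.getInt16(vintLength, false); // big-endian
  const flags = view.getUint8(vintLength + 2);
  return {
    trackNumber,
    relativeTimestamp,
    keyframe: (flags & 0x80) !== 0,
    payloadOffset: vintLength + 3,
  };
}
```

The bytes from `payloadOffset` onward would be the encoded VP9 frame handed to the write half.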
Combined dep weight: ~100 KB gzipped (`webm-muxer` ~12 KB + `ts-ebml`
~87 KB). Combined source edit estimate at `mergeVideoSegments()`:
~150-300 LOC including type defs.
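The timestamp rebasing in step 2 of the remux path is the core of the fix and is independent of either library. A minimal sketch under stated assumptions: the `LocalFrame`, `SegmentFrames`, and `GlobalFrame` shapes and the `rebaseSegments` name are hypothetical, each segment's content duration is taken as already known, and the output is in microseconds because step 3 passes a `globalUs` value to the muxer.

```typescript
interface LocalFrame {
  clusterTsMs: number;   // Cluster Timestamp within its own segment
  blockOffsetMs: number; // SimpleBlock offset relative to the cluster
  isKey: boolean;
}

interface SegmentFrames {
  frames: readonly LocalFrame[];
  durationMs: number; // total content duration of this segment
}

interface GlobalFrame {
  timestampUs: number; // position on the single remuxed timeline
  isKey: boolean;
}

/** Shift every segment's local timestamps onto one monotonic timeline. */
function rebaseSegments(segments: readonly SegmentFrames[]): GlobalFrame[] {
  const out: GlobalFrame[] = [];
  let segmentBaseMs = 0;
  for (const segment of segments) {
    for (const frame of segment.frames) {
      const globalMs = segmentBaseMs + frame.clusterTsMs + frame.blockOffsetMs;
      out.push({ timestampUs: Math.round(globalMs * 1000), isKey: frame.isKey });
    }
    // The next segment starts where this one's content ends.
    segmentBaseMs += segment.durationMs;
  }
  return out;
}
```

Each `GlobalFrame` would then be paired with its frame bytes for the `addVideoChunkRaw` call in step 3.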
### Cluster-aware-trim alternative path (D-09..D-11 revisit, 2026-05-16T17:18Z)
Path summary: keep MediaRecorder running continuously (the retired
D-09 lifecycle) but, on each periodic trim pass, scan the chunk buffer
for the OLDEST keyframe whose position would keep total duration ≤ 30
s, then drop everything strictly before that keyframe. Preserves header
chunk + a contiguous run of keyframe-anchored clusters.
Why this is architecturally weaker than remux:
1. **Non-deterministic content window.** MediaRecorder/VP9 keyframe
cadence under Chrome's default `kf_max_dist=100` is irregular:
the prior `webm-playback-freeze` debug session observed a 26-second
keyframe gap empirically. If the only keyframe inside the 30 s
window was emitted 2 s ago, cluster-aware trim retains only 2 s of
content. The user's `last_30sec.webm` would be anywhere in
`[~few seconds .. ~30 s]` depending on when SAVE landed in the
keyframe cycle. That breaks SPEC §10 #7's implicit "≥ 30 s of
recent context" requirement.
2. **Still need EBML parsing.** To find keyframe boundaries inside
the chunk buffer we still need to parse the WebM container for
SimpleBlock keyframe flags. So the dep weight is similar (`ts-ebml`
at minimum) but the output is worse.
3. **Re-introduces the freeze-risk surface area.** The prior debug
session retired D-09..D-11 precisely because age-trim repeatedly
produced orphan-P-frame freezes. A "keyframe-aware" variant still
has to delete content; one bug in the keyframe-detection path and
the freeze returns. The risk surface is wider than the remux path,
which never deletes — it only re-containers what already exists.
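The non-determinism in point 1 can be made concrete with a small model of the trim rule from the path summary. This is a sketch under stated assumptions, not project code: `retainedDurationMs` is a hypothetical name, keyframe times are given in milliseconds on the recording clock, and a window with no keyframe is modeled as retaining nothing.

```typescript
/**
 * Model of keyframe-anchored trim: keep everything from the OLDEST keyframe
 * that is still inside the window, and report how much content that leaves.
 */
function retainedDurationMs(
  keyframeTimesMs: readonly number[],
  nowMs: number,
  windowMs: number,
): number {
  let anchor: number | undefined;
  for (const t of keyframeTimesMs) {
    const insideWindow = t <= nowMs && nowMs - t <= windowMs;
    if (insideWindow && (anchor === undefined || t < anchor)) {
      anchor = t;
    }
  }
  return anchor === undefined ? 0 : nowMs - anchor;
}
```

With keyframes 26 s apart (the gap observed in the prior session), saving at two different moments of the same recording gives very different results in this model: `retainedDurationMs([0, 26000, 52000], 54000, 30000)` keeps 28 s, while the same call with `nowMs = 58000` keeps only 6 s.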
LOC estimate: ~200-400 LOC for keyframe parsing + buffer mutation +
tests. Net: similar dep weight, worse playable-duration guarantee,
re-opens the freeze regression surface. **REJECTED as inferior to
remux.** Documenting here only because the orchestrator brief
explicitly requested the comparison.
### WebCodecs API path (2026-05-16T17:19Z)
WebCodecs (`VideoEncoder` + `VideoDecoder`) is available in MV3 service
workers from Chrome 94+. The path would be: feed each segment's
clusters → `VideoDecoder` → emit `VideoFrame` objects → feed back into
`VideoEncoder` (re-encode VP9) → wrap output via `webm-muxer`.
This works but adds a re-encode pass that:
- doubles CPU cost during the save flow
- introduces an additional quality loss (re-encoding lossy VP9)
- adds 500-1000 LOC of encoder/decoder lifecycle management
- requires Chrome 94+ exclusively (we already require modern Chrome,
so OK, but it tightens the version floor)
There is no benefit over the `ts-ebml + webm-muxer` path for this
specific shape of problem — we already have encoded VP9 frames and
just need to put them in a different container. Re-encoding is
unnecessary work. **REJECTED as over-engineered.**
### RED test landing evidence (2026-05-16T17:20Z)
File edited: `tests/offscreen/webm-playback.test.ts` (preserved
existing 2 GREEN tests; appended new `describe` block with 2 new
assertions + supporting helpers).
Test run scoped to file:
```
$ npx vitest run tests/offscreen/webm-playback.test.ts
Test Files 1 failed (1)
Tests 2 failed | 2 passed (4)
```
Failures:
- `container-level format=duration on last_30sec.webm exceeds 25 s`
— `expected 9934 to be greater than or equal to 25000`
- `ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline`
— `expected 9960 to be greater than or equal to 25000`
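The second failure value comes from reading the last `time=HH:MM:SS.cc` stamp in ffmpeg's `-stats` output. A minimal standalone sketch of that parse, assuming the two-digit fractional field that ffmpeg prints; `lastFfmpegTimeMs` is a hypothetical name, and the test file does the equivalent inline:

```typescript
/**
 * Return the final `time=` stamp in ffmpeg stderr as milliseconds,
 * or NaN when no stamp is present.
 */
function lastFfmpegTimeMs(stderr: string): number {
  const pattern = /time=(\d{2}):(\d{2}):(\d{2})\.(\d{2})/g;
  let last: RegExpExecArray | null = null;
  let match: RegExpExecArray | null;
  // Later stats lines supersede earlier ones, so keep only the final match.
  while ((match = pattern.exec(stderr)) !== null) {
    last = match;
  }
  if (last === null) return Number.NaN;
  const hours = Number(last[1]);
  const minutes = Number(last[2]);
  const seconds = Number(last[3]);
  const centiseconds = Number(last[4]);
  return (hours * 3600 + minutes * 60 + seconds) * 1000 + centiseconds * 10;
}
```

Returning NaN for missing output keeps a probe-side failure distinguishable from a genuinely short file, since `expect(NaN).toBeGreaterThanOrEqual(...)` fails rather than passing silently.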
Full suite (proves zero collateral regression):
```
$ npx vitest run
Test Files 1 failed | 10 passed (11)
Tests 2 failed | 53 passed (55)
```
All 53 pre-existing tests still GREEN. tsc:
```
$ npx tsc --noEmit; echo exit=$?
exit=0
```
## Eliminated
(populated by debugger as hypotheses are ruled out)
@@ -266,6 +510,13 @@ tdd_checkpoint: ""
- (H5: defective committed fixture in storage): ruled out — file size
matches expected (1.63 MB matches what was committed on 2026-05-15;
not bit-rot)
- H6 (cluster-aware-trim revisit of D-09..D-11): rejected on architectural
weakness — non-deterministic content window (depends on keyframe
cadence), still needs EBML parsing, re-opens freeze-regression
surface area. See Evidence/cluster-aware-trim section.
- H7 (WebCodecs re-encode path): rejected as over-engineered — re-encodes
VP9 frames we already have. ~500-1000 LOC for zero quality/playability
benefit. See Evidence/WebCodecs section.
## Resolution


@@ -33,6 +33,27 @@
// Skip discipline: if ffmpeg is missing from the environment the test
// auto-skips rather than failing. CI ships ffmpeg per `smoke.sh` so this is
// a developer-convenience fence, not a behavioural softening.
//
// --- 2026-05-16 amendment: D-13 architecture failure RED tests ---
//
// Debug session `.planning/debug/d13-multi-ebml-concat-unplayable.md` proved
// the existing two assertions ABOVE pass under D-13 only because they check
// structural validity (ffmpeg null-decode tolerates the multi-EBML-header
// concat by silently reading segments 1+2 and dropping segment 3, and by
// collapsing all segments onto seg1's local timestamp axis so no muxer
// "File ended prematurely" warning fires). Players that respect Matroska's
// segment-info Duration element (mpv, Chrome's HTMLMediaElement, ffprobe's
// `format=duration`) read 9.94 s — the FIRST segment's metadata duration —
// and stop. The committed 1.6 MB fixture contains ~30 s of valid VP9 frames
// but presents as ~10 s of content to operators and tests.
//
// The "container-level playable duration" describe block below adds the
// assertion the closure check missed on 2026-05-15: that ffprobe-reported
// format duration EXCEEDS 25_000 ms for the canonical fixture. This is
// RED today under D-13 and stays RED until the multi-EBML concat at
// src/background/index.ts mergeVideoSegments() is replaced with a true
// remux that writes a single EBML header whose Info.Duration covers the
// whole ~30 s span.
import { describe, it, expect } from 'vitest';
import { existsSync, statSync } from 'node:fs';
@@ -43,12 +64,21 @@ import { dirname, resolve } from 'node:path';
const here = dirname(fileURLToPath(import.meta.url));
const FIXTURE_PATH = resolve(here, '..', 'fixtures', 'last_30sec.webm');
const FFMPEG_BIN = '/usr/bin/ffmpeg';
const FFPROBE_BIN = '/usr/bin/ffprobe';
// Cap: a clean 30-second WebM decoded with `-f null` finishes well under
// 10 s on commodity hardware. If a run ever exceeds the timeout below we
// want a hard failure, not a hung CI job.
const FFMPEG_TIMEOUT_MS = 30_000;
// Playable-duration floor. The recorder rotates every 10 s and keeps 3
// segments (D-13 / SEGMENT_DURATION_MS × MAX_SEGMENTS = 30_000 ms). The
// rotation lifecycle can drop a partial sub-second at each boundary so the
// final remux file is bounded by [~27_000, ~30_000] ms in steady state. We
// gate at 25_000 ms to keep slack for boundary noise but still firmly above
// the broken-architecture failure mode (9_940 ms — first segment only).
const MIN_PLAYABLE_DURATION_MS = 25_000;
function ffmpegAvailable(): boolean {
try {
return existsSync(FFMPEG_BIN) && statSync(FFMPEG_BIN).isFile();
@@ -57,6 +87,14 @@ function ffmpegAvailable(): boolean {
}
}
function ffprobeAvailable(): boolean {
try {
return existsSync(FFPROBE_BIN) && statSync(FFPROBE_BIN).isFile();
} catch {
return false;
}
}
interface DecodeResult {
stderr: string;
packetErrorCount: number;
@@ -113,6 +151,44 @@ function decodeDryRunStrict(fixturePath: string): DecodeResult {
};
}
/**
* Read the container-level `format=duration` value from a WebM file via
* ffprobe. This is the value that mpv, Chrome's HTMLMediaElement, and most
* Matroska parsers honor when deciding "how long is this file?": they read
* the first Segment's Info.Duration element and do not continue into any
* later Segment.
*
* Returns NaN on parse failure (ffprobe missing input track, malformed
* float, etc.) so the assertion downstream can produce a precise error
* message rather than masking a probe-side failure as a duration check.
*
* @param fixturePath - Absolute path to the WebM file under test.
* @returns Container-level duration in milliseconds, or NaN if the ffprobe
* output cannot be parsed.
*/
function probeContainerDurationMs(fixturePath: string): number {
const proc = spawnSync(
FFPROBE_BIN,
[
'-v', 'error',
'-show_entries', 'format=duration',
'-of', 'default=noprint_wrappers=1:nokey=1',
'-i', fixturePath,
],
{
stdio: ['ignore', 'pipe', 'pipe'],
encoding: 'utf-8',
timeout: FFMPEG_TIMEOUT_MS,
maxBuffer: 1 * 1024 * 1024,
},
);
if (proc.signal !== null) {
throw new Error(`ffprobe was killed by signal ${proc.signal}`);
}
const stdout = (proc.stdout ?? '').trim();
const seconds = parseFloat(stdout);
return Number.isFinite(seconds) ? Math.round(seconds * 1000) : Number.NaN;
}
describe('webm playback (RED — confirms webm-playback-freeze bug)', () => {
it.skipIf(!ffmpegAvailable())(
'ffmpeg dry-run on last_30sec.webm produces zero decoder packet errors',
@@ -151,3 +227,90 @@ describe('webm playback (RED — confirms webm-playback-freeze bug)', () => {
},
);
});
describe('webm playable duration (RED — confirms d13-multi-ebml-concat-unplayable bug)', () => {
it.skipIf(!ffprobeAvailable())(
'container-level format=duration on last_30sec.webm exceeds 25 s',
() => {
// SPEC §10 #7 requires last_30sec.webm to "play back in a browser"
// covering the most recent ~30 s. Both mpv and Chrome's HTMLMediaElement
// honor the first Segment's Info.Duration EBML element, which under
// D-13's multi-EBML concat reflects only the FIRST segment's local
// duration (~9.94 s for the canonical fixture). That bug means the
// canonical Phase 1 closure fixture (committed 2026-05-15) presents
// as ~10 s of content to any standards-compliant Matroska parser,
// even though segments 2+3 are physically present in the bytes.
//
// The fix is a true WebM REMUX of the concatenated segments: parse
// each segment's clusters via an EBML library, extract the VP9
// frame payloads with their keyframe/delta flags, and re-mux into
// a single-EBML-header WebM whose clusters carry monotonically
// increasing timestamps. The resulting file's Info.Duration will
// span the full ~30 s window.
//
// Floor of MIN_PLAYABLE_DURATION_MS (25_000) accommodates the
// ~3 s boundary slack from segment rotation while remaining well
// above the broken-architecture failure mode (9_940 ms).
expect(existsSync(FIXTURE_PATH)).toBe(true);
const durationMs = probeContainerDurationMs(FIXTURE_PATH);
expect(
durationMs,
`ffprobe reported container duration=${durationMs} ms for ${FIXTURE_PATH}. ` +
`Under SPEC §10 #7 the file must present at least ${MIN_PLAYABLE_DURATION_MS} ms ` +
`of playable content to standards-compliant Matroska parsers (mpv, Chrome). ` +
`If this value is ~9_940 ms the file is a multi-EBML-header concat (D-13 raw output) ` +
`where players honor only the first segment's local Info.Duration metadata. ` +
`Fix: replace mergeVideoSegments() in src/background/index.ts with a true WebM remux ` +
`(parse + rewrite into a single-EBML-headered WebM with adjusted monotonic timestamps).`,
).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS);
},
);
it.skipIf(!ffmpegAvailable())(
'ffmpeg full decode of last_30sec.webm reaches at least 25 s of timeline',
() => {
// Defense-in-depth: even if a future ffprobe quirk computes
// format=duration by summing all reachable cluster timestamps,
// ffmpeg's full null-decode of the concatenated file collapses
// segments 2..N onto the first segment's local timestamp axis
// (verified empirically 2026-05-16: 601 frames decoded, time=00:00:09.96)
// because the multi-EBML format provides no segment-level offset.
// The remux fix will produce a stream whose decoded `time=...`
// reaches at least 25 s end-to-end.
expect(existsSync(FIXTURE_PATH)).toBe(true);
const proc = spawnSync(
FFMPEG_BIN,
['-nostdin', '-v', 'error', '-stats', '-i', FIXTURE_PATH, '-f', 'null', '-'],
{
stdio: ['ignore', 'ignore', 'pipe'],
encoding: 'utf-8',
timeout: FFMPEG_TIMEOUT_MS,
maxBuffer: 4 * 1024 * 1024,
},
);
if (proc.signal !== null) {
throw new Error(`ffmpeg was killed by signal ${proc.signal}`);
}
const stderr = proc.stderr ?? '';
// ffmpeg's `-stats` line on the final frame looks like:
// frame= 601 fps=0.0 q=-0.0 Lsize=N/A time=00:00:09.96 bitrate=N/A ...
// We want the LAST time= match (subsequent stats lines overwrite the
// earlier ones with monotonically increasing time values).
const timeMatches = [...stderr.matchAll(/time=(\d{2}):(\d{2}):(\d{2})\.(\d{2})/g)];
const last = timeMatches[timeMatches.length - 1];
const decodedMs = last
? (parseInt(last[1], 10) * 3600 + parseInt(last[2], 10) * 60 + parseInt(last[3], 10)) * 1000 +
parseInt(last[4], 10) * 10
: Number.NaN;
expect(
decodedMs,
`ffmpeg decoded only ${decodedMs} ms of timeline from ${FIXTURE_PATH}. ` +
`SPEC §10 #7 requires at least ${MIN_PLAYABLE_DURATION_MS} ms of decoded content. ` +
`If decoded duration is ~9_960 ms the multi-EBML concat is collapsing all segments ` +
`onto seg1's local timestamp axis (the timestamp-collision symptom). ` +
`Fix: real WebM remux per d13-multi-ebml-concat-unplayable debug session. ` +
`Full ffmpeg stderr:\n${stderr}`,
).toBeGreaterThanOrEqual(MIN_PLAYABLE_DURATION_MS);
},
);
});