Files
mokosh/tests/uat
Mark f44ca3afba wip(01-11): wave-3 partial — A1+A4 attempted, popup-bridge SW state query unreliable
Task 4 of Plan 01-11 attempted A1-A4 wiring. Empirical run reveals an
architectural blocker that needs orchestrator-level decision.

Current state after this commit (SKIP_PROD_REBUILD=1 npx tsx tests/uat/harness.test.ts):
- A0 [PASS]: production bundle hook-leak grep gate (17ms)
- A1 [FAIL]: SW bootstrap → setIdleMode — popup state never transitions
  to '' despite keepalive ping + 3s waitFor. chrome.action.getPopup({})
  from the popup page consistently returns the manifest default
  (chrome-extension://<id>/src/popup/index.html), not the '' that
  setIdleMode's chrome.action.setPopup({popup:''}) should produce.
- A2 [FAIL]: toolbar onClicked — badge never transitions to "REC" after
  page.triggerExtensionAction(extension); 8s timeout. Either the
  toolbar action isn't reaching the SW listener, OR getDisplayMedia's
  picker isn't resolving in headless mode (despite the auto-select flag).
- A3 [FAIL]: offscreen target never appears (correlates with A2 — no
  recording started, no offscreen document spawned).
- A4 [PASS]: trivially passes (offscreen count is 0 → 0, both before
  + after the click). Not a true assertion of behavior; would also pass
  if the whole extension were broken.
- A5-A13: stubbed RED per plan.

Architectural blocker (Rule 4 — needs orchestrator decision):
- Puppeteer 25.0.2 + Chrome 148 + headless cannot reliably keep the MV3
  SW alive long enough OR expose its real chrome.* state to a popup
  page query. The popup-bridge architecture (Task 3 commit dbd977c)
  works for synchronous bridge queries (snapshot, fire-on-startup)
  but does NOT reliably reflect chrome.action.setPopup / setBadgeText
  state changes initiated by the SW.

Three plausible paths forward (need orchestrator pick):

  Option A — Content-script bridge: inject a content script that
    bridges chrome.* queries to a webpage's window.* RPC surface;
    harness uses page.evaluate against the content script instead of
    popup.evaluate. Pros: content scripts have stable lifetime tied to
    the page they're injected into. Cons: content scripts have
    DIFFERENT chrome.* surface (no chrome.action API surface — they
    can't read getBadgeText / getPopup at all). Likely DOESN'T solve
    the underlying problem.

  Option B — Headful with Xvfb on CI: relax the headless requirement;
    accept Xvfb dependency. Per Plan 01-11 RESEARCH §3, RESEARCH
    claimed headless works on Chrome 148 — empirical refutation here.
    Pros: SW lifetime is more stable in headful mode; setPopup
    propagation is reliable. Cons: introduces Xvfb dep that RESEARCH
    explicitly said wasn't needed; CI complication.

  Option C — Shrink harness scope to bridge-able assertions: A0 (grep
    gate), A8 (Bug A onStartup via bridge), A9 (icon sizes via popup
    fetch), A10 (manifest via popup), A13 (zip shape — operator runs
    SAVE_ARCHIVE manually + drops zip to a known path; harness reads
    it). Skip A1-A7, A11, A12 (the ones that require live SW state
    observation through chrome.action API). Pros: ships the
    bug-A-coverage portion of the harness today; keeps Plan 01-09's
    Task 5 operator-checkpoint partly automated. Cons: doesn't retire
    operator entirely; Plan 01-09 stays open on operator-empirical
    A1-A7.

  Option D — Switch to WebDriver BiDi (the Puppeteer 25 alternative
    backend): Puppeteer 25 supports BiDi via {protocol: 'webDriverBiDi'}.
    BiDi may handle extension SW evaluation differently (different
    isolation model). Speculative — no empirical evidence either way.

What landed cleanly:
- Tier-1 hook-leak grep gate (T-1-11-01) GREEN: dist/ has zero
  __mokoshTest / simulateUserStop / getSegmentCount / setCurrentStream
  / setSegmentCountGetter / __mokoshTestQuery / __mokoshKeepalive
  occurrences after npm run build.
- Two-bundle infrastructure (dist/ vs dist-test/) operational.
- Bridge handler in sw-hooks.ts works for snapshot + fire-on-startup
  + handler-types ops (verified by no-hang on keepalivePing call).
- Existing 89-test vitest baseline preserved (no regression from any
  Wave 0/1/2/3 work).

Verification:
- npx tsc --noEmit (src/): exit 0
- npx tsc --noEmit -p tests/uat: exit 0
- npm run build: exit 0; dist/ hook-free
- SKIP_BUILD=1 npx vitest run: 89/89 GREEN
- SKIP_PROD_REBUILD=1 npx tsx tests/uat/harness.test.ts:
  2/14 passed (A0 + A4-trivially), 12 FAIL — non-zero exit as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 09:24:06 +02:00
..

Mokosh UAT harness (Plan 01-11)

Puppeteer-driven Node script that runs 14 assertions end-to-end against a real Chrome instance loaded with the Mokosh extension. Replaces Plan 01-09 Task 5's operator-empirical functional verification (the operator retains only step 1 — build — and step 14 — brand/design acceptance).

Quick start

npm run test:uat

This builds dist-test/ (the hook-enabled bundle) and runs the harness. Exit 0 means all 14 assertions passed. Final line: UAT harness: 14/14 assertions passed.

Local-debug mode

HEADLESS=0 npm run test:uat

Opens a real Chrome window so you can watch the picker auto-accept, the badge transitions, the popup appear, etc.

Developer iteration tricks

# Skip the production build inside assertion 0 (uses existing dist/):
SKIP_PROD_REBUILD=1 npm run test:uat

# Run the harness against an existing dist-test/ (skip npm run build:test):
npx tsx tests/uat/harness.test.ts

Assertion catalog

# Title Bug class Hook used
0 Production bundle has no test-hook leaks T-1-11-01 filesystem grep
1 SW bootstrap → setIdleMode sw.evaluate
2 Toolbar onClicked-idle → REC + popup triggerExtensionAction
3 Offscreen displaySurface === monitor D-15 __mokoshTest.getCurrentStream
4 Toolbar onClicked-recording → popup, no new offscreen targets count
5 SAVE_ARCHIVE → download fires downloads polling
6 BUG B: simulateUserStop → badge OFF + no recovery notif b9eeeeb dispatchEvent('ended')
7 RECORDING_ERROR codec-unsupported → ERR + recovery notif sendMessage
8 BUG A: onStartup → mokosh-startup- notification creates a881bf0 __mokoshTest.handlers.onStartup
9 Icon file sizes meet floors Bug A precondition sw.evaluate(fetch)
10 Manifest has notifications + 3 icons Bug A precondition chrome.runtime.getManifest
11 35s recording → segments.length >= 3 D-13 __mokoshTest.getSegmentCount
12 ffprobe on extracted webm exits 0 Plan 01-08 jszip + execFile
13 Archive shape — video + meta.json version match Plan 01-07 jszip

Failure isolation

Single browser, serial assertions, bail on first failure for setup- dependent assertions (assertion 0 abort means refusing to launch a potentially-leaky bundle). Per-assertion bail keeps the diagnostic output unambiguous — see RESEARCH §5 + Plan 01-11 open-question resolution 4.

On failure, the harness dumps the last 30 lines of SW console + last 30 lines of offscreen console (captured live during the run) to stderr BEFORE rethrowing — gives you contextual triage without needing to re- run with debug logging.

Known gotchas

Locale-specific picker auto-accept

The --auto-select-desktop-capture-source=Entire screen Chrome flag auto-accepts the screen-share picker. The string "Entire screen" is en_US-specific. If your Chrome is set to a non-English locale, the picker option label will differ and the auto-accept will silently fail (picker stays open; assertion 2 times out).

Fallback: switch your Chrome user-data-dir's locale to en_US for harness runs, OR adjust the launch arg in tests/uat/lib/launch.ts to match your locale's equivalent string.

dev-dep Chromium binary size

puppeteer pulls a ~150 MB Chromium binary at npm install time. CI must accept this. Production npm install --omit=dev skips it cleanly.

Xvfb is NOT required

Per Plan 01-11 RESEARCH §3 empirical probes against Chrome 148, the --headless=new mode handles screen capture without Xvfb on Linux CI runners. If a future Chrome regresses this, Xvfb :99 & DISPLAY=:99 npm run test:uat is the fallback.

CI runner screen-capture concern

The 35s recording assertion (A11) captures whatever is on screen during that window. CI MUST run the harness in an isolated container with no concurrent workload — see T-1-11-02 in Plan 01-11's threat model.

Real Chrome download (assertion 5 → A12)

The harness configures per-page download behavior via CDP to a fresh os.tmpdir()/mokosh-uat-downloads-* directory; downloads are NOT written to your real ~/Downloads. The temp directory is deleted by OS tmpdir GC.