feat(02-03): tab-url-tracker — chrome.tabs.onActivated + onUpdated → urls[] with dedup + filter (D-P2-02)

- Add src/background/tab-url-tracker.ts: initTabUrlTracker, getTabUrlsSeen,
  snapshotOpenTabs, clearTabUrlsSeen.
- Filter: positive-allow regex ^(https?|chrome-extension):// — INCLUDE
  https + http + chrome-extension://; default-deny chrome://, about:,
  devtools://, file://, blob:, data: (per CONTEXT.md `<specifics>` URL
  filter clause).
- Dedup: Set membership gate + first-seen-ordered array; getTabUrlsSeen
  returns a slice so callers cannot mutate internal state.
- snapshotOpenTabs: defensive chrome.tabs.query({}) enumeration for SAVE-
  time augmentation (DEC-011 Amendment 1 capability). Captures tabs the
  operator opened but never activated.
- Module guards: initialized flag prevents double-listener registration;
  all chrome.tabs.* listener calls wrapped in defensive try/catch matching
  the src/background/index.ts:bootstrap pattern.
- Tier-1 grep-gate preserved (13 entries): NO `_resetForTesting` /
  `_observeForTesting` ergonomic test hooks exported (would have leaked
  into production bundles per tests/background/no-test-hooks-in-prod-
  bundle.test.ts). Tests drive chrome.tabs.onUpdated callbacks directly
  via the chrome stub — Plan 02-01 SUMMARY anticipated this option.

[Rule 3 - Blocking] tests/background/meta-json-urls-schema.test.ts Tests 3+4
extended to wire chrome.tabs.onUpdated callbacks directly (replaces the
optional `_resetForTesting` / `_observeForTesting` skeletons). Test 5
simplified (empty-tracker assertion needs no observation seeding on a
freshly-reset module graph). Test 5 F2 contract preserved verbatim.

Verification:
- npx tsc --noEmit → clean
- npx vitest run tests/background/meta-json-urls-schema.test.ts → 3/5 GREEN
  (Tests 3+4+5 the tracker-contract trio flipped; Tests 1+2 still RED as
  they pin the SessionMetadata + createArchive amendment — Task 2 territory)
This commit is contained in:
2026-05-20 16:06:06 +02:00
parent d3aa567a54
commit 7beb69059e
2 changed files with 302 additions and 85 deletions

View File

@@ -0,0 +1,246 @@
// src/background/tab-url-tracker.ts
//
// Phase 2 Plan 02-03 — D-P2-02 tab-URL tracker.
//
// Maintains an internal, deduplicated, first-seen-ordered Set of tab URLs
// observed during the SW's lifetime via chrome.tabs.onActivated +
// chrome.tabs.onUpdated listeners. Feeds the multi-tab `meta.urls` array
// in `createArchive()` (closes audit P1 #10).
//
// Architectural decisions (frozen by .planning/phases/02-stabilize-export-
// pipeline/02-03-PLAN.md):
//
// - The tracker is FED PASSIVELY by Chrome's tab events: every time the
// operator switches tabs OR a tab navigates (changeInfo.url present),
// we get a URL — pass it through the filter, dedup, append.
//
// - It is SAVE-time AUGMENTED via `snapshotOpenTabs()` which calls
// `chrome.tabs.query({})` (requires the `tabs` permission per
// DEC-011 Amendment 1, 2026-05-20) to catch tabs the operator opened
// during the 30 s window but never activated.
//
// - Always-on charter (Plan 01-09 Amendment 3): `clearTabUrlsSeen()` is
// NOT called by createArchive. The tracker keeps accumulating across
// saves so the next save captures any tabs activated AFTER the prior
// save fired.
//
// - F2 (plan-checker iteration 1): no sentinel-URL fallback. If the
// tracker is empty (whole-desktop-no-tab session — operator captured
// a non-Chrome surface via desktopCapture only), `getTabUrlsSeen()`
// returns `[]` faithfully. createArchive emits `urls: []`.
//
// URL filter (per .planning/phases/02-stabilize-export-pipeline/02-CONTEXT.md
// `<specifics>`):
//
// INCLUDE: https://, http://, chrome-extension://
// EXCLUDE: chrome://, about:, devtools://, file://, blob:, data:, edge://
//
// (Implemented via a single positive-allow regex; default-deny everything
// else.)
//
// Tier-1 grep-gate compliance: the module exposes NO `_resetForTesting`
// / `_observeForTesting` ergonomic test hooks (which would have leaked
// into production bundles and violated the
// `tests/background/no-test-hooks-in-prod-bundle.test.ts` 13-entry gate).
// Unit tests drive the registered chrome.tabs.onUpdated callbacks
// directly via the chrome stub's `_callbacks` array; module state is
// reset between tests via vitest's `vi.resetModules()` in beforeEach.
//
// References:
// - .planning/phases/02-stabilize-export-pipeline/02-CONTEXT.md
// <decisions> D-P2-02 + <specifics> URL filter clause
// - .planning/PROJECT.md DEC-011 Amendment 1 (`tabs` permission)
// - Chrome tabs API: https://developer.chrome.com/docs/extensions/reference/api/tabs
import { Logger } from '../shared/logger';
const logger = new Logger('TabUrlTracker');
// ─── Module state ───────────────────────────────────────────────────────
// `tabUrlsSeen` is the dedup Set (O(1) membership checks).
// `firstSeenOrder` is the append-only list preserving first-seen ordering.
// Both stay in lockstep: a URL is added to BOTH only on its first
// observation (the Set membership check gates the array push).
//
// `initialized` guards against double-listener-registration if some caller
// (a unit test, a future re-init path) calls `initTabUrlTracker()` twice.
let tabUrlsSeen: Set<string> = new Set();
let firstSeenOrder: string[] = [];
let initialized = false;
// ─── URL filter ─────────────────────────────────────────────────────────
// Positive-allow: anything matching the regex is INCLUDED; everything else
// is DROPPED. Equivalent to the long-form switch over schemes but
// preserves a single canonical pattern that's easy to grep for.
//
// Test 4 of tests/background/meta-json-urls-schema.test.ts asserts this
// exact filter behaviour (https://example.com + chrome-extension://abc/...
// IN; chrome://newtab + about:blank OUT).
const URL_SCHEME_ALLOW = /^(https?|chrome-extension):\/\//;
/**
* Whether a URL passes the inclusion filter.
*
* @param url - Candidate URL string.
* @returns true if the URL's scheme is in the allow-list; false otherwise.
*/
function passesFilter(url: string): boolean {
if (typeof url !== 'string' || url.length === 0) return false;
return URL_SCHEME_ALLOW.test(url);
}
/**
* Add a URL to the tracker if it passes the filter and is not already
* present. Idempotent: re-observation of an existing URL is a no-op.
*
* @param url - Candidate URL string (may be empty or malformed —
* defensively filtered).
*/
function addUrl(url: string): void {
if (!passesFilter(url)) return;
if (tabUrlsSeen.has(url)) return;
tabUrlsSeen.add(url);
firstSeenOrder.push(url);
}
/**
* Initialize the tab-URL tracker. Registers chrome.tabs.onActivated +
* chrome.tabs.onUpdated listeners that maintain an internal Set of URLs
* observed during the SW's lifetime. Must be called once at SW init.
*
* Idempotent — subsequent calls return early after logging a warning
* (defensive pattern matching src/background/index.ts:bootstrap try/catch
* wrappers around chrome.* listener registrations).
*
* D-P2-02 binding: captures the operator's multi-tab context, not just
* the active-at-save tab.
*/
export function initTabUrlTracker(): void {
if (initialized) {
logger.warn('initTabUrlTracker called twice — second call ignored');
return;
}
initialized = true;
// chrome.tabs.onActivated: fires when the user switches to a different
// tab. The activated tab's URL is fetched via chrome.tabs.get because
// onActivated's payload omits .url (it carries tabId + windowId only).
// DEC-011 Amendment 1: the `tabs` permission makes chrome.tabs.get
// reliably return the `.url` field for any tab in any window.
try {
chrome.tabs.onActivated.addListener((activeInfo: { tabId: number; windowId: number }) => {
const onTabResolved = (tab: { url?: string } | undefined): void => {
if (tab === undefined || tab === null) return;
if (typeof tab.url !== 'string' || tab.url.length === 0) {
logger.warn(`tabs.onActivated: tab ${activeInfo.tabId} has no .url (permission gap?)`);
return;
}
addUrl(tab.url);
};
const onTabFailed = (err: unknown): void => {
logger.warn(`tabs.onActivated: chrome.tabs.get(${activeInfo.tabId}) failed:`, err);
};
try {
const result = chrome.tabs.get(activeInfo.tabId);
// chrome.tabs.get returns a Promise in MV3. The Promise.resolve
// wrapper accepts both Promise and plain values (defensive against
// older stubs that might return synchronously).
Promise.resolve(result).then(onTabResolved).catch(onTabFailed);
} catch (err) {
onTabFailed(err);
}
});
} catch (err) {
logger.warn('chrome.tabs.onActivated.addListener failed:', err);
}
// chrome.tabs.onUpdated: fires on every tab state change. We only care
// about URL transitions (changeInfo.url present). NOT gated on
// changeInfo.status === 'complete' so SPA-style routing changes (which
// emit changeInfo.url WITHOUT a top-level load event) still get
// captured.
try {
chrome.tabs.onUpdated.addListener(
(_tabId: number, changeInfo: { url?: string }, tab: { url?: string }) => {
// Prefer changeInfo.url (the transition URL) but fall back to
// tab.url (the current resolved URL) for stubs that don't populate
// changeInfo.url consistently. Both are forwarded through the
// filter + dedup gate by addUrl.
const candidate = typeof changeInfo.url === 'string' && changeInfo.url.length > 0
? changeInfo.url
: (typeof tab.url === 'string' ? tab.url : '');
if (candidate.length > 0) addUrl(candidate);
},
);
} catch (err) {
logger.warn('chrome.tabs.onUpdated.addListener failed:', err);
}
}
/**
* Return the deduplicated, first-seen-ordered, filtered list of tab URLs
* observed since module init OR since the last `clearTabUrlsSeen()` call.
*
* Filter (per .planning/phases/02-stabilize-export-pipeline/02-CONTEXT.md
* `<specifics>`):
* - INCLUDE: https://, http://, chrome-extension://
* - EXCLUDE: chrome://, about:, devtools://, file://, blob:, data:
*
* Dedup: each URL appears exactly once.
* Order: first-seen-first.
*
* Always returns a NEW array (slice) — caller cannot mutate the internal
* state.
*
* @returns A copy of the observed URL list. Empty array IS the canonical
* representation of a whole-desktop-no-tab session (per F2
* resolution from plan-checker iteration 1).
*/
export function getTabUrlsSeen(): string[] {
return firstSeenOrder.slice();
}
/**
* Snapshot every currently-open tab's URL into the tracker. Called by
* `createArchive()` (src/background/index.ts) AT SAVE TIME as a defensive
* fallback: captures tabs the operator OPENED during the recording window
* but never activated (so the onActivated listener never fired). Dedup
* + first-seen ordering preserves invariants — tabs already in the Set
* stay where they are; new tabs append in chrome.tabs.query() order.
*
* Requires `tabs` permission (DEC-011 Amendment 1, 2026-05-20).
*
* Errors are caught + logged; the SAVE flow continues with whatever
* state the tracker already had. Per F2: if the snapshot leaves the
* tracker empty (e.g. operator captured a non-Chrome surface), the
* `meta.urls` array stays `[]` — no fake extension-origin URL inserted.
*
* @returns Promise that resolves once the snapshot completes (or fails
* softly via the catch path).
*/
export async function snapshotOpenTabs(): Promise<void> {
try {
const tabs = await chrome.tabs.query({});
if (!Array.isArray(tabs)) {
logger.warn('snapshotOpenTabs: chrome.tabs.query did not return an Array');
return;
}
for (const tab of tabs) {
const url = (tab as { url?: string }).url;
if (typeof url === 'string' && url.length > 0) addUrl(url);
}
} catch (err) {
logger.warn('snapshotOpenTabs: chrome.tabs.query failed:', err);
}
}
/**
* Clear the internal Set + first-seen array. NOT called by
* `createArchive()` — always-on charter (Plan 01-09 Amendment 3) keeps
* the tracker accumulating across saves. Reserved for future use
* (manual session-reset, test isolation outside vi.resetModules).
*/
export function clearTabUrlsSeen(): void {
tabUrlsSeen = new Set();
firstSeenOrder = [];
}