# Research Stack Roadmap

This roadmap records the measurement layers and experiment families that must
become part of the lab's standard test stack. It includes findings and
requirements discovered while running manual client tests, even when they were
not the intended result of those tests.

## Required Evidence Layers Per Test

Every repeatable test should compare three observation paths whenever the page
type allows it:

1. Server request log: the raw HTML request recorded before any browser code can
   run.
2. On-page analytics event: a privacy-minimal JavaScript event from the page
   runtime, proving that a browser-like environment executed the tracker.
3. Tracking pixel or beacon request: a passive subresource hit tied to the same
   attempt id, proving whether the client fetched page resources.

The comparison matters as much as the positive events: an absent event is itself
a finding. A user agent that fetches HTML but never triggers the analytics event
or tracking pixel should be recorded as an HTML-only retrieval path, not treated
as missing data.
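The classification rule above can be sketched as a small function. This is a minimal illustration, not the lab's implementation: the `AttemptEvidence` record and every label other than "html-only" are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptEvidence:
    """Which observation paths fired for one attempt (hypothetical record)."""
    html_requested: bool
    analytics_event: bool
    pixel_requested: bool


def classify_retrieval_path(evidence: AttemptEvidence) -> str:
    """Map the three evidence layers to a retrieval-path label."""
    if not evidence.html_requested:
        # Runtime or pixel hits without an HTML request are unexpected,
        # so flag them for review instead of guessing a path.
        if evidence.analytics_event or evidence.pixel_requested:
            return "inconsistent"
        return "no-retrieval"
    if evidence.analytics_event and evidence.pixel_requested:
        return "full-browser"
    if evidence.analytics_event:
        return "script-without-pixel"
    if evidence.pixel_requested:
        return "subresources-without-script"
    # HTML was fetched and nothing else fired: a finding, not missing data.
    return "html-only"
```

Keeping "html-only" as an explicit return value, rather than `None`, makes the negative result visible in any later report.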

## On-Page Analytics Script

Add a first-party lab analytics script for controlled pages. It should:

- send a single page-load event to a lab endpoint;
- include attempt id, prompt code, source prompt id, fixture id, timestamp, and
  a generated page-view id;
- record privacy-minimal capability facts such as JavaScript execution,
  viewport class, visibility state, and coarse timing;
- avoid cookies, personal-data collection, and any fingerprinting beyond what
  the test requires;
- be isolated to experiment pages where runtime behavior is part of the test.
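One way to enforce the field list on the receiving side is an allowlist check at the lab endpoint. The sketch below assumes the event arrives as a flat JSON object; the field names (`attempt_id`, `load_time_bucket`, and so on) are hypothetical spellings of the items listed above, not an existing schema.

```python
# Hypothetical field names derived from the requirements list.
IDENTITY_FIELDS = {
    "attempt_id", "prompt_code", "source_prompt_id",
    "fixture_id", "timestamp", "page_view_id",
}
CAPABILITY_FIELDS = {
    "js_executed", "viewport_class", "visibility_state", "load_time_bucket",
}


def validate_event(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the event is accepted."""
    problems = []
    present = set(payload)
    # Every identity field must be present so the event can be correlated.
    for name in sorted(IDENTITY_FIELDS - present):
        problems.append(f"missing field: {name}")
    # Anything outside the allowlist is rejected, which keeps the event
    # privacy-minimal even if the page script changes.
    for name in sorted(present - IDENTITY_FIELDS - CAPABILITY_FIELDS):
        problems.append(f"unexpected field: {name}")
    return problems
```

Rejecting unknown fields, instead of silently dropping them, surfaces accidental over-collection during fixture development.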

The dashboard should show the server request, analytics event, and pixel event
side by side for each attempt.

## Tracking Pixel Baseline

Add a transparent tracking-pixel fixture to all suitable manual and automated
prompt tests. The pixel should:

- carry the same attempt metadata as the HTML page;
- be logged separately from the HTML request;
- make it obvious when a client reads page text but does not fetch images or
  subresources;
- be compared against existing resource fixtures such as CSS, image, script,
  and JSON endpoints.

The ChatGPT manual baseline already showed why this is important: ChatGPT fetched
HTML but did not fetch the image resource in the alt-text mismatch test.
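As a rough sketch of how the pixel could carry attempt metadata, the helpers below encode it in the query string and recover it from a logged request URL. The base URL and parameter names are assumptions for illustration; `TRANSPARENT_GIF` is a commonly used 1x1 transparent GIF body that a pixel endpoint could return.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit

# A widely used 1x1 transparent GIF, decoded once at import time.
TRANSPARENT_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def pixel_url(base: str, attempt_id: str, prompt_code: str, fixture_id: str) -> str:
    """Build a pixel URL that carries the same metadata as the HTML page."""
    query = urlencode({
        "attempt_id": attempt_id,
        "prompt_code": prompt_code,
        "fixture_id": fixture_id,
    })
    return f"{base}?{query}"


def parse_pixel_hit(request_url: str) -> dict:
    """Recover attempt metadata from a logged pixel request URL."""
    params = parse_qs(urlsplit(request_url).query)
    # parse_qs returns a list per key; each key is sent once here.
    return {name: values[0] for name, values in params.items()}
```

Logging the parsed metadata in its own record, separate from the HTML request log, keeps the two layers independently comparable.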

## Cookie Consent Experiments

Add consent-popup fixtures to measure how AI search and assistant browsing
clients behave when content is gated or visually interrupted by a consent dialog.

Variants to test:

- visible content behind a consent popup;
- content hidden until accepting;
- reject and accept buttons implemented as ordinary links or forms;
- JavaScript-only consent handling;
- server-rendered consent wall;
- robots and noindex combinations with consent pages.
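The variant list can be expanded into a fixture matrix so that every consent variant is exercised under each indexing setup. This is a sketch under assumptions: the variant slugs, the indexing modes (including reading "robots" as a robots disallow rule), and the `consent/...` id format are all hypothetical.

```python
from itertools import product

# Hypothetical slugs for the consent variants listed above.
CONSENT_VARIANTS = [
    "visible-behind-popup",
    "hidden-until-accept",
    "link-form-buttons",
    "js-only-consent",
    "server-rendered-wall",
]

# Assumed indexing setups to combine with each consent variant.
INDEXING_MODES = ["default", "noindex", "robots-disallow", "noindex+robots-disallow"]


def consent_fixture_ids() -> list[str]:
    """Return one fixture id per (consent variant, indexing mode) pair."""
    return [
        f"consent/{variant}/{mode}"
        for variant, mode in product(CONSENT_VARIANTS, INDEXING_MODES)
    ]
```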

Questions to answer:

- Does the client ignore the popup and read underlying HTML?
- Does it click accept, reject, or neither?
- Does it report the consent wall instead of the page content?
- Does it execute JavaScript required to reveal content?
- Does the server see only the HTML request, or also analytics/pixel/subresource
  events after the consent UI loads?

## Cross-System Manual Packet Plan

The ChatGPT manual packet is only the first baseline. The same style of packet
must run against major AI systems with the same evidence layers:

- ChatGPT / OpenAI
- Claude / Anthropic
- Gemini / Google
- Perplexity
- Microsoft Copilot / Bing
- later: Apple, Meta, Brave, You.com, xAI, Mistral, Cohere, Amazon, and other
  observed or documented systems

For each system:

1. Use the standard dual-stack (IPv4 and IPv6) HTTPS origin.
2. Run the same prompt families with stable prompt codes.
3. Preserve model answers as answer artifacts.
4. Correlate answers with server logs, tracking-pixel hits, analytics events,
   subresource hits, and crawl-depth events.
5. Publish positive, negative, blocked, and unintended findings.
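The correlation in step 4 amounts to grouping records from every evidence layer by attempt id. A minimal sketch, assuming each layer is available as a list of dicts with an `attempt_id` key (the layer names and record shape are illustrative):

```python
from collections import defaultdict


def correlate_by_attempt(layers: dict[str, list[dict]]) -> dict[str, dict[str, int]]:
    """Count hits per evidence layer for each attempt id."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for layer_name, records in layers.items():
        for record in records:
            counts[record["attempt_id"]][layer_name] += 1
    # Convert nested defaultdicts to plain dicts so that a layer with no
    # hits is simply absent, not silently created on lookup.
    return {attempt: dict(per_layer) for attempt, per_layer in counts.items()}
```

A layer missing from an attempt's entry is then a negative finding to publish under step 5, alongside the positive ones.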

## Done Criteria For The Stack

This roadmap is implemented when the lab can show, for every standard prompt
attempt:

- whether the HTML page was requested;
- whether the tracking pixel was requested;
- whether the on-page analytics script executed;
- whether other subresources were fetched;
- whether crawl links were followed;
- whether consent UI changed retrieval behavior;
- how each major AI system differs from the ChatGPT baseline.
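The last criterion, comparison against the baseline, can be sketched as a per-criterion diff. The criterion keys below are hypothetical names for the checklist items, and the function assumes each system's result for an attempt is a dict of booleans.

```python
# Hypothetical keys, one per checklist item above (excluding the comparison).
CRITERIA = (
    "html_requested",
    "pixel_requested",
    "analytics_executed",
    "subresources_fetched",
    "crawl_links_followed",
    "consent_changed_retrieval",
)


def diff_from_baseline(baseline: dict, other: dict) -> dict:
    """Return {criterion: (baseline_value, other_value)} where they differ."""
    return {
        name: (baseline.get(name), other.get(name))
        for name in CRITERIA
        # A criterion missing on one side shows up as None and counts
        # as a difference, so unmeasured layers are not hidden.
        if baseline.get(name) != other.get(name)
    }
```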
