# Finding 013: Cross-client controlled-browser comparison — retrieval behavior across five AI clients

## Date

2026-06-28

## Status

Published

## Summary

After completing all 86 controlled-browser AI-client tests across ChatGPT,
Claude, Gemini, Perplexity, and Copilot/Bing (p01–p19 for every client except
ChatGPT, which currently covers p01–p10), a clear behavioral split emerges:
Claude and Gemini consistently fetch target URLs and produce matching
direct-origin server events, while Perplexity and Copilot/Bing consistently
do not fetch and produce no origin events. ChatGPT fetched in 9 of its 10
tests; the single bounded no-hit is p08. ChatGPT's p02–p07 runs were
initially executed before the lab server was collecting events, leaving those
six as bounded no-hits with a model-claim mismatch. All six have since been
re-run with the server live, and each produced a matching direct-origin event.

## Method

Each test was run from a prepared browser-task artifact in a fresh AI-client
chat or thread. The lab server independently logged all incoming requests.
After each run, the model answer was logged with `npm run manual-client:log`,
and direct-origin events were correlated by `promptCode`, `sourcePromptId`,
`confirmedHitFromPrompt`, `rawEventIds`, and bounded timestamp windows.
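
The correlation step can be sketched as a filter over logged events. This is a minimal illustration, not the lab's actual tooling: the event dictionary keys (`id`, `timestamp`, `path`, `prompt_code`) and the `correlate` helper are assumptions, while the sample values are taken from the Gemini p05 re-run reported in this finding.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' as a UTC datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def correlate(events, prompt_code, path, window_start, window_end):
    """Return IDs of events matching prompt code and path inside the bounded window."""
    start, end = parse_utc(window_start), parse_utc(window_end)
    return [
        e["id"]
        for e in events
        if e["prompt_code"] == prompt_code
        and e["path"] == path
        and start <= parse_utc(e["timestamp"]) <= end
    ]


# Hypothetical event record shape; values from the Gemini p05 re-run.
events = [
    {
        "id": "mqxckuf1-aynf88s3",
        "timestamp": "2026-06-28T05:27:42.636Z",
        "path": "/lab/reading/alt-mismatch",
        "prompt_code": "manual-client-gemini-20260625-001-p05-rerun",
    },
]

raw_event_ids = correlate(
    events,
    "manual-client-gemini-20260625-001-p05-rerun",
    "/lab/reading/alt-mismatch",
    "2026-06-28T05:26:00Z",
    "2026-06-28T05:28:42Z",
)
confirmed_hit = bool(raw_event_ids)
```

A run counts as a confirmed hit only when at least one event survives all three filters; an empty list inside the window corresponds to a bounded no-hit.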

All runs used the dedicated OpenClaw browser profile with a logged-in Google
account. Perplexity used native incognito mode. Copilot/Bing used temporary
chats. Claude used native incognito chats with AEO/readability framing after
the initial measurement-framed prompt was refused.

## Cross-client hit matrix

| Client | Total tests | Confirmed hits | Bounded no-hits | Hit rate |
|---|---|---|---|---|
| Claude | 19 | 19 | 0 | 100% |
| Gemini | 19 | 19 | 0 | 100% |
| ChatGPT¹ | 10 | 9 | 1 | 90% |
| Perplexity | 19 | 0 | 19 | 0% |
| Copilot/Bing | 19 | 0 | 19 | 0% |

¹ ChatGPT's p02–p07 were initially run before the lab server was collecting
events (2026-06-26 00:40–00:49 UTC, before the earliest collected event at
2026-06-27T05:53:27.395Z). All six were re-run on 2026-06-28 with the server
live, producing confirmed direct-origin hits for every prompt. That brings
p01–p07 to 7/7; with p08–p10 included, ChatGPT stands at 9/10 (90%).
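
The percentages in the matrix are plain hits-over-tests ratios, and the per-client totals sum to the 86 tests cited in the summary. A short sketch of that arithmetic, with the counts copied from the table (the `hit_rate` helper is illustrative only):

```python
# (total tests, confirmed hits) per client, copied from the hit matrix.
matrix = {
    "Claude": (19, 19),
    "Gemini": (19, 19),
    "ChatGPT": (10, 9),
    "Perplexity": (19, 0),
    "Copilot/Bing": (19, 0),
}


def hit_rate(total: int, hits: int) -> int:
    """Return the hit rate as a whole-number percentage."""
    return round(100 * hits / total)


rates = {client: hit_rate(total, hits) for client, (total, hits) in matrix.items()}
total_tests = sum(total for total, _ in matrix.values())
total_hits = sum(hits for _, hits in matrix.values())
```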

## Per-prompt hit detail

### Claude (19/19 hits)

Claude fetched the target URL in every test, producing matching direct-origin
events for all p01–p19 prompts. Claude initially refused measurement-framed
prompts but succeeded when reframed as short site-owner AEO/readability work.
Claude's retrieval is HTML/page-text only: no tracking-pixel loading, no
subresource fetching, no JavaScript execution, no browser-equivalent rendering.

### Gemini (19/19 hits)

Gemini fetched the target URL in all 19 tests. The original two no-hits
(p05 and p09) were both confirmed transient by re-runs in fresh Gemini
chats on 2026-06-28.

**Gemini p05 re-run result (2026-06-28 05:27 UTC):**

The p05 re-run in a fresh Gemini chat succeeded. Gemini returned
`fetched:true`, `pages_opened:1`, `confidence:High`, correctly reported the
alt text code `ALT-COPPER-11` and noted it could not read the image pixels.
One matching direct-origin event, `mqxckuf1-aynf88s3`, was logged at
`2026-06-28T05:27:42.636Z` from `108.177.76.167` (Google, forward-confirmed
DNS) on `/lab/reading/alt-mismatch`, carrying the correct
`prompt_code=manual-client-gemini-20260625-001-p05-rerun` query parameter
inside the bounded `05:26:00–05:28:42 UTC` window. This confirms the original
p05 failure was transient search index unavailability, not a structural or
content-specific issue.
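
Forward-confirmed DNS means the IP's reverse (PTR) hostname resolves back to the same IP. The sketch below shows one way to run that check with Python's standard `socket` module; it is an assumption about how such a check could be done, not a record of the lab's procedure. The resolver functions are injectable so the example runs offline, and the stub hostname and the `192.0.2.x` documentation addresses are hypothetical.

```python
import socket


def forward_confirmed(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the reverse-DNS hostname if it resolves back to `ip`, else None."""
    try:
        hostname = reverse(ip)[0]         # PTR lookup: IP -> hostname
        addresses = forward(hostname)[2]  # A lookup: hostname -> IPv4 address list
    except OSError:                       # covers socket.herror and socket.gaierror
        return None
    return hostname if ip in addresses else None


# Offline demonstration with stub resolvers (hypothetical hostname and IPs).
def stub_reverse(ip):
    return ("crawler.example.net", [], [ip])


def stub_forward(hostname):
    return (hostname, [], ["192.0.2.7"])


confirmed = forward_confirmed("192.0.2.7", stub_reverse, stub_forward)
unconfirmed = forward_confirmed("192.0.2.8", stub_reverse, stub_forward)
```

Calling `forward_confirmed("108.177.76.167")` with the default resolvers would perform live lookups. A passing check only shows that the PTR and A records agree; attributing the address to a specific operator still requires inspecting the returned hostname's domain.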

**Gemini p05/p09 failure analysis (2026-06-28):**

The two original Gemini no-hits had distinct root causes, neither of which
was path-specific or content-specific:

- **p05** (`/lab/reading/alt-mismatch`): Gemini returned
  `URL_FETCH_STATUS_NOT_IN_SEARCH_INDEX`, indicating the URL was not in
  Google's search index at the time of the request. This is a transient
  indexing-availability issue, not a content or robots policy rejection.
  Notably, p04 (`/lab/reading/html-hidden-links`) succeeded ~12 minutes
  later in the same session, confirming the `/lab/reading/` path prefix is
  not systematically blocked. **Confirmed transient by the p05 re-run on
  2026-06-28**, which succeeded in a fresh Gemini chat with a matching
  direct-origin hit.

- **p09** (`/lab/resources/css-stylesheet`): Gemini did not attempt to fetch
  the URL at all. Instead of following the JSON-only instruction, Gemini
  responded conversationally, asking for clarification about the test focus.
  This is a prompt-compliance failure, not a retrieval failure. Notably,
  p10 (`/lab/resources/png-image`) succeeded ~28 minutes later in the same
  session, confirming the `/lab/resources/` path prefix is not systematically
  blocked. **Confirmed transient by the p09 re-run on 2026-06-28**: the re-run
  in a fresh Gemini chat succeeded with `fetched:true`, `pages_opened:1`,
  `confidence:1.0`, reporting the `style.css` stylesheet resource. One
  matching direct-origin event, `mqxcsw5o-r0zvbc2x`, was logged at
  `2026-06-28T05:33:58.093Z` from `108.177.76.167` (Google, forward-confirmed
  DNS) on `/lab/resources/css-stylesheet` with the correct `p09-rerun` query
  parameter inside the bounded `05:31:00–05:34:30 UTC` window.
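
The two failure modes above can be told apart mechanically from the model's reply. The following is a rough sketch under stated assumptions: the prompts ask for a JSON-only answer containing a `fetched` field (as the reported `fetched:true` values suggest), and the status string appears somewhere in the reply text. The `classify_reply` helper, its labels, the `error` key, and the sample replies are hypothetical.

```python
import json

INDEX_STATUS = "URL_FETCH_STATUS_NOT_IN_SEARCH_INDEX"


def classify_reply(reply: str) -> str:
    """Classify a model reply into a coarse outcome label."""
    try:
        answer = json.loads(reply)
    except json.JSONDecodeError:
        # The model ignored the JSON-only instruction (the p09 pattern).
        return "prompt-compliance failure"
    if not isinstance(answer, dict):
        return "prompt-compliance failure"
    if answer.get("fetched") is True:
        return "claimed fetch"
    if INDEX_STATUS in reply:
        # The fetch tool reported the URL as missing from the index (the p05 pattern).
        return "search-index unavailability"
    return "other retrieval failure"


# Hypothetical replies illustrating each branch.
p05_like = classify_reply('{"fetched": false, "error": "URL_FETCH_STATUS_NOT_IN_SEARCH_INDEX"}')
p09_like = classify_reply("Could you clarify which aspect of the test you want me to focus on?")
rerun_like = classify_reply('{"fetched": true, "pages_opened": 1}')
```

A "claimed fetch" label is still only a model claim; it becomes a confirmed hit only once a matching direct-origin event is found.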

Conclusion: both original Gemini no-hits were transient. The p05 re-run
succeeded with a matching direct-origin hit, so the original
`URL_FETCH_STATUS_NOT_IN_SEARCH_INDEX` failure reflects temporary search
index unavailability. The p09 re-run also succeeded with a matching
direct-origin hit, so the original prompt-format non-compliance was a one-off
rather than a structural issue. Gemini's retrieval mechanism appears to
depend on Google Search index availability rather than direct URL fetching,
which is consistent with its behavior as a search-grounded assistant rather
than a direct browser. Gemini now achieves 19/19 hits (100%).

Gemini's retrieval is HTML/page-text only: no tracking-pixel loading, no
subresource fetching, no JavaScript execution.

### ChatGPT (9/10 hits, 1 bounded no-hit)

ChatGPT fetched the target URL in 9 of 10 tests, producing matching direct-origin
events for p01–p07, p09, and p10. The p01 run was the first confirmed hit with the
`ChatGPT-User/1.0` user-agent. The p02–p07 runs were initially executed before
the lab server was collecting events, so the model claimed `fetched:true` but no
events existed. They were re-run on 2026-06-28 with the server live, and all
produced confirmed direct-origin hits with `ChatGPT-User/1.0` from Microsoft
Azure IPs. ChatGPT's retrieval is HTML/page-text only: no tracking-pixel
loading, no subresource fetching, no JavaScript execution.

The p08 test (visible-link-follow) was run on 2026-06-28 in a fresh ChatGPT
chat. ChatGPT returned `fetched:false`, reporting the target URL as "not safe
to open (non-retryable error)". No direct-origin events were found in the
bounded 06:13:14–06:15:08 UTC window. This is ChatGPT's first bounded no-hit
in the controlled-browser series. The failure appears to be a URL-safety
guardrail, possibly triggered by the target URL's query parameters, rather
than a retrieval limitation: ChatGPT successfully fetched other lab URLs in
p01–p07, p09, and p10.

The p09 test (resource-css-stylesheet) was run on 2026-06-28 in a fresh
ChatGPT chat. ChatGPT returned `fetched:true`, `pages_opened:1`, reporting
that the page links one first-party stylesheet resource. One matching
direct-origin event, `mqxeieoi-9b8jzlla`, was logged at
`2026-06-28T06:21:48.127Z` from `20.215.220.177` (Microsoft Azure) with the
`ChatGPT-User/1.0` user-agent on `/lab/resources/css-stylesheet`, making this
a confirmed hit. It is the first ChatGPT resource-loading test with a
successful fetch, though ChatGPT noted it could not independently verify
stylesheet network loading, which is consistent with the HTML-only retrieval
pattern seen across all fetching clients.
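
One way to check the HTML-only pattern against server logs is to look for subresource requests from the same client IP inside the observation window. The sketch below assumes that subresource URLs end in a recognizable asset extension and that events carry `ip` and `path` keys; both are assumptions, as is the `asset_requests` helper. The single sample event is the p09 hit reported above.

```python
ASSET_SUFFIXES = (".css", ".png", ".gif", ".js")  # assumed asset extensions


def asset_requests(events, client_ip):
    """Return events from client_ip whose path looks like a subresource request."""
    return [
        e
        for e in events
        if e["ip"] == client_ip and e["path"].lower().endswith(ASSET_SUFFIXES)
    ]


# Events observed in the bounded window (hypothetical record shape).
window_events = [
    {"id": "mqxeieoi-9b8jzlla", "ip": "20.215.220.177", "path": "/lab/resources/css-stylesheet"},
]

# True when the client requested the page but none of its assets.
html_only = asset_requests(window_events, "20.215.220.177") == []
```

Note that the page path `/lab/resources/css-stylesheet` does not itself end in `.css`, so it is not mistaken for an asset request.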

The p10 test (resource-png-image) was run on 2026-06-28 in a fresh
ChatGPT chat. ChatGPT returned `fetched:true`, `pages_opened:1`, reporting
that the page embeds a first-party PNG image resource. One matching
direct-origin event, `mqxetryy-uhl48srv`, was logged at
`2026-06-28T06:30:38.578Z` from `20.215.220.178` (Microsoft Azure) with the
`ChatGPT-User/1.0` user-agent on `/lab/resources/png-image`, making this a
confirmed hit. ChatGPT noted it could not verify the actual PNG image pixels,
which is consistent with the HTML-only retrieval pattern.

Further p11–p19 ChatGPT tests are needed to determine whether the p08
no-hit is prompt-specific or a systematic guardrail change.

### Perplexity (0/19 hits)

Perplexity returned `fetched:false` for all 19 tests across reading,
directive, resource-loading, consent, discovery, and crawl-depth categories.
No matching direct-origin events in any bounded observation window. Perplexity
consistently states it cannot access target URLs, cannot execute JavaScript,
and cannot load or observe embedded resources. This pattern is consistent
across all Perplexity controlled-browser tests.

### Copilot/Bing (0/19 hits)

Copilot/Bing returned `fetched:false` for all 19 tests. No matching
direct-origin events in any bounded observation window. Copilot consistently
states it cannot open or fetch content from URLs, cannot access embedded
resources, and cannot execute JavaScript. This pattern is consistent across
all Copilot/Bing controlled-browser tests.

## Key observations

1. **Two-tier retrieval split**: Claude and Gemini reliably fetch and read
   target URLs through their web-browsing tools. ChatGPT also fetched in 9 of
   the 10 tests run so far. Perplexity and Copilot/Bing did not fetch target
   URLs in any test.

2. **HTML-only retrieval**: Even when AI clients fetch successfully, none
   execute JavaScript, load tracking pixels, fetch subresources, or perform
   browser-equivalent rendering. The retrieval is page-text/HTML only.

3. **Model-claim vs. evidence gap resolved**: ChatGPT's p02–p07 initial runs
   lacked server evidence due to timing, but re-runs confirmed all claims.
   The lesson stands: model claims of `fetched:true` require matching
   direct-origin events for confirmation.

4. **No directive compliance observed**: No fetching client referenced or
   visibly honored robots.txt or robots meta directives when fetching target
   URLs. Claude and Gemini fetched noindex pages without acknowledging the
   directive.

5. **AEO framing requirement for Claude**: Claude initially refused
   measurement-framed prompts. Reframing as short site-owner AEO/readability
   work enabled successful retrieval. This is a prompt-framing dependency,
   not a retrieval limitation.

6. **Consent fixture behavior**: Claude and Gemini fetched consent fixture
   pages and reported visible markers, but neither interacted with consent
   dialogs, executed consent JavaScript, or changed consent state.
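
The rule behind observation 3 can be written as a small decision function: server evidence decides the outcome, and the model's own claim only annotates it. This is a minimal sketch; the `classify_run` helper is hypothetical, and its labels reuse the terms from this finding.

```python
def classify_run(model_fetched: bool, matching_event_ids: list) -> str:
    """Classify a run from the model's claim and the correlated origin events."""
    if matching_event_ids:
        return "confirmed hit"
    if model_fetched:
        # The model claims a fetch that the origin logs cannot confirm.
        return "bounded no-hit (model-claim mismatch)"
    return "bounded no-hit"


# ChatGPT p09: fetched:true with one matching event.
p09 = classify_run(True, ["mqxeieoi-9b8jzlla"])
# ChatGPT p08: fetched:false and no events in the window.
p08 = classify_run(False, [])
# ChatGPT p02-p07 initial runs: fetched:true but no collected events.
p02_initial = classify_run(True, [])
```

Placing the evidence check first means a `fetched:true` claim can never produce a confirmed hit on its own.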

## Limitations

- ChatGPT coverage is incomplete. It was previously limited to 7 prompts
  (p01–p07); the prompt packet now spans p01–p19 (all 19 source prompts), but
  only p01–p10 have been run. p08 produced a bounded no-hit (URL-safety
  guardrail). p09 and p10 produced confirmed hits with matching direct-origin
  evidence. The remaining 9 ChatGPT controlled-browser tests (p11–p19) are
  pending execution.
- All Perplexity and Copilot/Bing tests used the free/basic tier; paid tiers
  might behave differently.
- Gemini p05 and p09 original no-hits were transient: p05 failed due to
  search index unavailability (`URL_FETCH_STATUS_NOT_IN_SEARCH_INDEX`) and
  p09 due to prompt-format non-compliance (Gemini responded conversationally
  instead of fetching). Both re-runs succeeded with matching direct-origin
  hits. Gemini now 19/19 (100%).
- All tests used a single logged-in browser profile; different account states,
  cookies, or browser configurations could produce different results.
- The lab server is a single origin; behavior might differ for larger or more
  prominent sites.

## Publication Thesis Verification

- Thesis: AI clients split into reliable URL fetchers (Claude, Gemini) and
  non-fetchers (Perplexity, Copilot/Bing), with all successful retrieval being
  HTML/page-text only without browser-equivalent resource loading or JavaScript
  execution.
- Source: Direct-origin server logs from lab infrastructure across 86
  controlled-browser tests.
- Method: Each test was run in a fresh AI-client chat with bounded timestamp
  windows and independent server-side event correlation.
- Bias: Tests cover one lab origin with one account per client; ChatGPT
  p02–p07 have been re-run with server live and all confirmed; Perplexity and
  Copilot/Bing use free tiers.
- Consensus: Consistent with Finding 002 (model claims require origin evidence)
  and Finding 007 (per-client retrieval patterns).
- Invalidation: Test paid tiers of Perplexity and Copilot/Bing; test
  additional origins to check generalizability.
- Verdict: Thesis is well-supported by 86 controlled-browser tests with
  direct-origin evidence. ChatGPT p02–p07 re-runs resolved the prior gap;
  9 of 10 ChatGPT tests now have confirmed direct-origin hits. ChatGPT p08
  produced the first ChatGPT bounded no-hit, possibly due to a URL-safety
  guardrail rather than a retrieval limitation. ChatGPT p09 and p10 produced
  confirmed hits, showing the p08 no-hit is not systematic. The two-tier
  split and HTML-only retrieval pattern remain consistent across all
  clients, though the p08 result suggests ChatGPT's guardrails may block
  specific URLs.
- Additional tests suggested: (1) Test paid Perplexity Pro and Copilot Pro
  for retrieval behavior changes. (2) Run the same prompt set against a
  different origin domain to test generalizability. (3) ~~Re-run Gemini p05
  and p09 in fresh chats to test whether the failures are reproducible or
  were truly transient~~ — both re-runs succeeded with matching direct-origin
  hits, confirming both original failures were transient.

## Follow-up tasks

1. Test paid tiers of Perplexity and Copilot/Bing to check whether retrieval
   behavior differs from free/basic tiers.
2. Run cross-origin comparison tests to check whether retrieval behavior
   generalizes beyond the lab domain.
3. ~~Test Gemini p05/p09 failure cases with alternative URL formats.~~
   **Completed 2026-06-28:** p05 failure attributed to transient search index
   unavailability; p09 failure attributed to prompt-format non-compliance
   (Gemini responded conversationally instead of fetching). Neither is
   path-specific or content-specific. See Gemini failure analysis above.
4. ~~Re-run Gemini p05 and p09 in fresh chats to test whether the failures are
   reproducible or were truly transient.~~
   **Both completed 2026-06-28:** p05 re-run succeeded with `fetched:true`,
   matching direct-origin event `mqxckuf1-aynf88s3` at `05:27:42 UTC` from
   `108.177.76.167` (Google). p09 re-run succeeded with `fetched:true`,
   matching direct-origin event `mqxcsw5o-r0zvbc2x` at `05:33:58 UTC` from
   `108.177.76.167` (Google). Both original failures confirmed transient.
   Gemini now achieves 19/19 hits (100%).
