How do major AI assistants retrieve, read, and interact with web pages? The findings below are based on 95 controlled-browser tests with direct-origin server evidence, not just on what the models claim.
| Vendor | Client | Observed user agent |
|---|---|---|
| Anthropic | claude.ai | Claude-User/1.0 |
| Google | gemini.google.com | Google |
| OpenAI | chatgpt.com | ChatGPT-User/1.0 |
| Perplexity AI | perplexity.ai | none (no hits) |
| Microsoft | copilot.microsoft.com | none (no hits) |
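To make the per-client attribution concrete, here is a minimal sketch of how origin requests could be labeled by user-agent token. The mapping and the function name `classify_user_agent` are illustrative assumptions, not the lab's actual tooling. Only the two tokens reported above are included; a bare "Google" token is left out because it would also match unrelated Google traffic.

```python
from typing import Optional

# Hypothetical mapping from user-agent tokens to client labels.
# Only the tokens reported in the results are listed here.
UA_TOKENS = {
    "Claude-User": "Claude",
    "ChatGPT-User": "ChatGPT",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the client label whose token appears in the header, else None."""
    for token, client in UA_TOKENS.items():
        if token in user_agent:
            return client
    return None
```

A request whose header contains `ChatGPT-User/1.0` would be labeled `ChatGPT`, while an ordinary browser header would return `None` and stay unattributed.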
Each cell is backed by controlled lab tests with direct-origin server evidence.
| Capability | Claude | Gemini | ChatGPT | Perplexity | Copilot |
|---|---|---|---|---|---|
| Fetches target URL | ✓ | ✓ | ✓ 89% | ✗ | ✗ |
| Reads visible HTML text | โ | โ | โ | โ | โ |
| Reads JS-rendered content | โ | โ | โ | โ | โ |
| Reads image alt text | โ | โ | โ | โ | โ |
| Reads image pixels | โ | โ | โ | โ | โ |
| Follows visible links (depth-1) | โ | โ claimed | โ | โ | โ |
| Exposes hidden/comment hrefs | โ | โ | โ | โ | โ |
| Fetches subresources (CSS, JS, fonts) | ✗ | ✗ | ✗ | - | - |
| Executes JavaScript | ✗ | ✗ | ✗ | - | - |
| Loads tracking pixels | ✗ | ✗ | ✗ | - | - |
| Respects robots.txt | ✗ fetched | ✗ fetched | ✗ fetched | - | - |
| Respects meta noindex | ✗ fetched | ✗ fetched | ✗ fetched | - | - |
| Reads consent banners | โ | โ | โ | โ | โ |
| Interacts with consent | โ | โ | โ | โ | โ |
| Finds sitemap-only pages | โ | โ | โ | โ | โ |
| Finds robots-only pages | โ | โ | โ | โ | โ |
"Fetched" means the AI retrieved the page despite the directive: none of the tested clients respected robots.txt or meta noindex. A dash ("-", not tested) means the client never successfully fetched any URL, so downstream capabilities could not be measured.
Claude, Gemini, and ChatGPT reliably fetch target URLs. Perplexity and Copilot/Bing reliably cannot or do not. This split was consistent across all 95 tests.
Even when AI clients fetch successfully, none execute JavaScript, load tracking pixels, fetch subresources, or perform browser-equivalent rendering. Retrieval is page-text/HTML only.
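One way to derive these findings from origin logs is to check which kinds of requests arrived during a run. The sketch below is an illustration under stated assumptions: the beacon path, the pixel path, and the extension list are hypothetical and are not taken from the lab setup.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lab conventions (assumptions, not from the source):
SUBRESOURCE_EXTS = {".css", ".js", ".woff", ".woff2", ".ttf"}
JS_BEACON_PATH = "/beacon/js-executed"  # only requested by a script that actually runs
PIXEL_PATH = "/pixel.gif"               # 1x1 tracking image embedded in the page


@dataclass
class Evidence:
    fetched_page: bool = False
    fetched_subresources: bool = False
    executed_js: bool = False
    loaded_pixel: bool = False


def summarize(target_path: str, requested_urls: list[str]) -> Evidence:
    """Classify the URLs one client requested during a single run."""
    ev = Evidence()
    for url in requested_urls:
        path = urlparse(url).path
        if path == target_path:
            ev.fetched_page = True
        elif path == JS_BEACON_PATH:
            ev.executed_js = True
        elif path == PIXEL_PATH:
            ev.loaded_pixel = True
        elif PurePosixPath(path).suffix.lower() in SUBRESOURCE_EXTS:
            ev.fetched_subresources = True
    return ev
```

Under this scheme, a client that retrieves only page text would produce an `Evidence` record with `fetched_page` set and every other flag left `False`, which matches the pattern described above.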
No AI client respected or referenced robots.txt or meta noindex directives when fetching target URLs. Claude and Gemini fetched noindex pages without acknowledging the directive.
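A check of this kind can be sketched with Python's standard library. The function name `directive_violations` and the flag names are assumptions for illustration, and the example treats a page as "fetched despite noindex" whenever it was retrieved and carried the tag, since a client can only see a meta directive after fetching the page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Detects <meta name="robots" content="...noindex...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots" and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def directive_violations(robots_txt: str, html: str, user_agent: str,
                         url: str, was_fetched: bool) -> dict:
    """Flag fetches of URLs that robots.txt disallows or that carry noindex."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    meta = _RobotsMetaParser()
    meta.feed(html)
    disallowed = not robots.can_fetch(user_agent, url)
    return {
        "fetched_despite_robots": was_fetched and disallowed,
        "fetched_despite_noindex": was_fetched and meta.noindex,
    }
```

Both flags are derived from the origin log (`was_fetched`) rather than from what the model says, which is the distinction the findings above rely on.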
Gemini depends on Google Search index availability rather than direct URL fetching. Pages not in the index return NOT_IN_SEARCH_INDEX errors, even for robots-allowed URLs.
Claude refused measurement-framed prompts but successfully fetched the same URLs when reframed as site-owner AEO/readability work. This is a prompt-framing dependency, not a retrieval limitation.
ChatGPT's two no-hits out of 19 tests were caused by a URL-safety guardrail (p08) and a fetch-depth limitation (p17), not by systematic retrieval failures. The other 17 tests produced confirmed hits.
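The 89% shown for ChatGPT in the capability table is consistent with 17 confirmed hits out of 19 tests. A minimal sketch of that arithmetic, with `hit_rate_percent` as an illustrative helper name:

```python
def hit_rate_percent(confirmed_hits: int, total_tests: int) -> int:
    """Return the share of tests with a confirmed origin hit, as a whole percent."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return round(100 * confirmed_hits / total_tests)


chatgpt_rate = hit_rate_percent(17, 19)  # 17 / 19 = 0.8947..., which rounds to 89
```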
Each test was run from a prepared browser-task artifact in a fresh AI-client chat. The lab server independently logged all incoming requests with full headers, timing, IP, DNS, and user-agent. After each run, model answers were correlated with direct-origin events by prompt code, source prompt ID, and bounded timestamp windows.
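The correlation step can be sketched as follows. This is an illustration only: it assumes the prompt code (for example `p08`) is embedded in the requested path, and it uses a hypothetical 30-second slack on each side of the run. It does not model the source prompt ID match, and the record types are not the lab's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    prompt_code: str       # e.g. "p08"
    started_at: datetime
    finished_at: datetime


@dataclass
class OriginEvent:
    path: str
    user_agent: str
    received_at: datetime


def correlate(run: Run, events: list[OriginEvent],
              slack: timedelta = timedelta(seconds=30)) -> list[OriginEvent]:
    """Return origin events that match the run's prompt code and time window."""
    earliest = run.started_at - slack
    latest = run.finished_at + slack
    return [
        event for event in events
        if run.prompt_code in event.path
        and earliest <= event.received_at <= latest
    ]
```

A run with an empty result is a no-hit: the model may still have described the page, but nothing reached the origin within the window.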