# Finding 017: Perplexity "search for" vs "open" — search-grounded retrieval succeeds but no direct-origin hit

## Date

2026-06-28

## Status

Published

## Summary

A follow-up test investigated whether Perplexity's retrieval behavior changes
when asked to "search for" a page rather than "open" it directly. This
mirrors the Copilot test in Finding 016.

Two tests were run in the same Perplexity session:

1. **"Open URL" variant (p01 re-test):** Perplexity was asked to open the
   target URL and summarize it. Perplexity returned `fetched: false` with
   "The fetch attempt failed (network or server-side restriction)." No
   matching direct-origin events were observed. This is consistent with the
   original 0/19 no-hit result from Finding 013.

2. **"Search for" variant:** Perplexity was asked to "search for the website
   ai-crawler-lab.kaistone.ai" and summarize what the page says it is
   measuring. Perplexity performed a search, found results, and returned a
   detailed summary of the lab's purpose with direct quotes from the site.
   Confidence was 0.86. However, no matching direct-origin events were
   observed on the lab server during the bounded window.

This outcome differs from Copilot's (Finding 016). Copilot searched Bing and
found no results because the lab domain is not in Bing's index. Perplexity's
search did return results: it could describe the site's purpose, list its
measured dimensions, and quote text from the page. But Perplexity's
search-grounded retrieval did not produce a direct-origin request to the lab
server.

This indicates that Perplexity had access to indexed or cached information
about the lab domain, whether through its own earlier crawling, a homepage
fetch that the instrumented event store does not log, or a third-party search
index. When it surfaced that information in this chat response, it made no
live request to the instrumented lab endpoints. The content appears to have
come from a cached or indexed copy rather than a fresh fetch during the
bounded test window.

## Method

Both tests were run from the dedicated OpenClaw browser profile in a
Perplexity free-tier chat on 2026-06-28.

### Test 1: "Open URL" variant

- **Prompt submitted:** ~13:56 UTC
- **Bounded window:** 13:55:00–13:58:00 UTC
- **Prompt:** Standard p01 open-target-summarize format with fresh run ID
  (`manual-client-perplexity-20260628-002-p01`)
- **Target URL:**
  `https://ai-crawler-lab.kaistone.ai/lab/root?id=manual-client-perplexity-20260628-002-p01&...`

### Test 2: "Search for" variant

- **Prompt submitted:** ~14:02 UTC
- **Bounded window:** 14:02:00–14:06:00 UTC
- **Prompt:** "Search for the website ai-crawler-lab.kaistone.ai and tell me
  what the page says it is measuring. Use your web search to find information
  about this site, then summarize the content you find."

The lab server was live and collecting events throughout both windows. The
"Copy" button was used to extract full JSON responses. The lab server's
`/api/hits` endpoint was queried for all events in each bounded window.
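
The bounded-window check can be sketched as below. This is a minimal illustration, not the lab's actual tooling: the response shape of `/api/hits` is not documented in this finding, so the sketch assumes the events are already loaded as a list of dicts with an ISO 8601 timestamp under a hypothetical `ts` key. The sample events are made up.

```python
from datetime import datetime, timezone

def in_window(events, start, end, ts_field="ts"):
    """Return the events whose timestamp falls inside [start, end]."""
    selected = []
    for event in events:
        # Assumption: timestamps are ISO 8601 strings with an explicit offset.
        ts = datetime.fromisoformat(event[ts_field])
        if start <= ts <= end:
            selected.append(event)
    return selected

# Test 2 window from this finding: 14:02:00-14:06:00 UTC on 2026-06-28.
WINDOW_START = datetime(2026, 6, 28, 14, 2, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 6, 28, 14, 6, 0, tzinfo=timezone.utc)

# Made-up sample events for illustration only (not lab data).
sample_events = [
    {"ts": "2026-06-28T14:01:30+00:00", "path": "/"},
    {"ts": "2026-06-28T14:03:10+00:00", "path": "/lab/root"},
    {"ts": "2026-06-28T14:07:00+00:00", "path": "/"},
]

window_events = in_window(sample_events, WINDOW_START, WINDOW_END)
```

Comparing timezone-aware datetimes rather than raw strings avoids mistakes when the log and the test notes use different offsets.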

## Raw Evidence

### Test 1: "Open URL" response

```json
{
  "attempt_id": "manual-client-perplexity-20260628-002-p01",
  "source_prompt_id": "open-target-summarize",
  "prompt_code": "manual-client-perplexity-20260628-002-p01",
  "ai_system": "Perplexity",
  "model_id": "perplexity-manual",
  "fetched": false,
  "pages_opened": [],
  "answer": "I could not fetch the target page: the URL returned an error when attempting to retrieve its content from my browsing interface. Therefore I cannot summarize what the page says it is measuring.",
  "evidence_quote": "",
  "limitations": "The fetch attempt failed (network or server-side restriction). I followed the instruction to open the target URL but the content could not be retrieved; I did not attempt to bypass access controls or robots restrictions. If you can provide the page content (HTML or a screenshot) or allow access, I can summarize precisely what the page states it is measuring.",
  "confidence": 0.45
}
```

### Test 1: Origin evidence

No matching direct-origin events in the 13:55:00–13:58:00 UTC window.

### Test 2: "Search for" response

```json
{
  "search_performed": true,
  "search_query": "ai-crawler-lab.kaistone.ai",
  "results_found": true,
  "answer": "The Kaistone AI Crawler Lab describes itself as a research harness that measures how AI crawlers, search agents, and browser-backed assistants fetch, read, and interact with web pages, recording server-side request metadata, subresource fetches, and minimal browser capability events for research; it lists measured dimensions including server-side observation, resource fetch tracking, reading/comprehension tests, consent behavior, directive (robots) compliance, and crawl depth.",
  "evidence_quote": "The AI Crawler Lab is a research harness that measures how AI crawlers, search agents, and browser-backed assistants fetch, read, and interact with web pages — using real server-side evidence, not just user-agent strings. ... Every page request is logged with full headers, timing, IP, DNS, and user-agent — the foundation for classification. ... Images, CSS, scripts, fonts, and JSON endpoints — does the visitor fetch subresources, or just the HTML? ... robots.txt, crawl-delay, meta noindex, nosnippet, and nofollow — what directives does the visitor honor?",
  "source_urls": [
    "https://ai-crawler-lab.kaistone.ai",
    "https://kaistone-radar.netlify.app",
    "https://kaistone.ai"
  ],
  "confidence": 0.86
}
```

### Test 2: Origin evidence

No matching direct-origin events in the 14:02:00–14:06:00 UTC window. All
events in that window originated from the local lab browser or prior test
sessions; none came from Perplexity IPs, and none carried a PerplexityBot
user-agent string.
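
A user-agent screen of this kind might look like the sketch below. It is an illustration under assumptions: the `user_agent` field name is hypothetical, the sample strings are made up, and the token list pairs `PerplexityBot` (named in this finding) with `Perplexity-User`, which is assumed here to be Perplexity's user-initiated fetch agent. It does not cover the IP-range check, and a user-agent match alone is not proof of origin.

```python
# Lowercased substrings to look for in the User-Agent header.
# "perplexity-user" is an assumption, not taken from this finding.
PERPLEXITY_UA_TOKENS = ("perplexitybot", "perplexity-user")

def perplexity_hits(events, ua_field="user_agent"):
    """Return events whose user-agent contains a Perplexity token."""
    hits = []
    for event in events:
        # Treat a missing or null user-agent as an empty string.
        ua = (event.get(ua_field) or "").lower()
        if any(token in ua for token in PERPLEXITY_UA_TOKENS):
            hits.append(event)
    return hits

# Made-up sample events for illustration only (not lab data).
sample_events = [
    {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) Chrome/126.0 Safari/537.36"},
    {"user_agent": "Mozilla/5.0 (compatible; PerplexityBot/1.0)"},
    {"user_agent": None},
]

matches = perplexity_hits(sample_events)
```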

### Comparison table

| Variant | Search performed | Results found | Answer quality | Origin hits | Confidence |
|---------|-----------------|---------------|----------------|-------------|------------|
| "Open URL" (p01 re-test) | Attempted fetch | N/A (fetch failed) | Could not summarize | 0 | 0.45 |
| "Search for" | Yes | Yes | Detailed summary with quotes | 0 | 0.86 |

### Cross-client comparison

| Client | "Open URL" | "Search for" | Origin hits (either) |
|--------|-----------|-------------|---------------------|
| Copilot/Bing | Tool rejected ("Invalid tool invocation") | Search OK, no results (not indexed) | 0 |
| Perplexity | Fetch failed ("network or server-side restriction") | Search OK, results found, detailed summary | 0 |

## Interpretation

Perplexity exhibits a split behavior that is distinct from both Copilot and
the direct-fetch clients (Claude, ChatGPT, Gemini):

1. **Direct URL fetch fails.** When asked to "open" a specific URL, Perplexity
   reports a failed fetch attempt. This is consistent with the original 0/19
   result from Finding 013. The error message ("network or server-side
   restriction") differs from Copilot's ("Invalid tool invocation"), which
   suggests that Perplexity's fetch tool exists and attempts the request but
   fails for a different reason, possibly a URL-safety check, a robots
   restriction, or a server-side block.

2. **Search-grounded retrieval succeeds without a live fetch.** When asked to
   "search for" the domain, Perplexity finds indexed information about the
   site and returns a detailed, accurate summary with direct quotes. Its
   self-reported confidence is high (0.86). But no live request hit the
   instrumented origin during the window, which suggests the content came
   from a cached or indexed copy rather than a fresh fetch.

This has important implications for AEO/SEO:

- **Perplexity can surface your content even if it can't fetch your URL.**
  If Perplexity has indexed your site (through its own crawler or a
  third-party index), users who "search for" your domain can get a
  meaningful summary. In this test, that summary appears to have come from
  cached content, not a live fetch.

- **No origin signal.** When Perplexity answers from a cached or indexed
  copy, server-side logging for that moment shows nothing. The lab server
  saw zero matching requests in the bounded window even while Perplexity was
  summarizing the page's content. This is a blind spot for analytics.

- **The "open" vs "search" distinction matters.** Asking Perplexity to
  "open" a URL triggers a (failed) fetch attempt. Asking it to "search for"
  a domain triggers search-grounded retrieval, apparently from cache. These
  appear to be different code paths with different outcomes.
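
The reasoning above combines two signals per test: whether the response contained a substantive answer, and whether the origin saw a matching request. A minimal sketch of that decision logic follows. The `Observation` type and the label strings are hypothetical; the two records mirror the comparison table in this finding.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    variant: str
    answered: bool    # did the response contain a substantive summary?
    origin_hits: int  # matching direct-origin events in the bounded window

def classify(obs):
    """Map one observation to a coarse retrieval outcome label."""
    if obs.origin_hits > 0:
        return "live-fetch evidence"
    if obs.answered:
        # Content surfaced with no origin request: cache or index suspected.
        return "answered without origin hit"
    return "no answer, no origin hit"

# Values taken from the comparison table in this finding.
OBSERVATIONS = [
    Observation('"Open URL" (p01 re-test)', answered=False, origin_hits=0),
    Observation('"Search for"', answered=True, origin_hits=0),
]

for obs in OBSERVATIONS:
    print(obs.variant, "->", classify(obs))
```

The middle label deliberately says nothing about where the content came from, since the acquisition channel is unresolved.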

## Limitations

- The "search for" test was conducted in the same Perplexity session as the
  "open URL" test. The prior prompt may have influenced the search results
  or context.
- Only the free tier was tested. Perplexity Pro may have different fetch
  capabilities.
- The cached content Perplexity served may have been indexed by PerplexityBot
  at an earlier date. The lab server logs were not checked for historical
  PerplexityBot visits.
- The "search for" prompt did not include a unique target URL with a tracking
  parameter, so it is not possible to correlate the response with a specific
  server-side event even if one had occurred.
- Perplexity cited 10 sources for the search-for response, but only 3
  source_urls were included in the JSON. The full source list was not
  captured.
- The quoted content in the "search for" response matches public homepage text
  at `/`. If the lab does not log plain homepage requests, then "no origin
  hit" is proven for the instrumented event store and bounded windows, not
  for every possible prior homepage crawl.
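
A future "search for" run could carry a unique target URL so that any origin event can be tied back to the prompt. The sketch below is one way to do that, under assumptions: the run-ID pattern is inferred from the single ID shown in the Method section, only the `id` query parameter is reproduced, and the additional parameters elided in the original target URL are left out because they are not known here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

LAB_ROOT = "https://ai-crawler-lab.kaistone.ai/lab/root"

def make_run_id(client, date, seq, prompt_code):
    """Build a run ID in the pattern inferred from this finding's example."""
    return f"manual-client-{client}-{date}-{seq:03d}-{prompt_code}"

def tagged_url(run_id):
    """Attach the run ID as the `id` query parameter (other params omitted)."""
    return f"{LAB_ROOT}?{urlencode({'id': run_id})}"

def run_id_from_url(url):
    """Recover the run ID from a logged request URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("id", [])
    return values[0] if values else None

# Reproduces the run ID used for Test 1 in this finding.
run_id = make_run_id("perplexity", "20260628", 2, "p01")
url = tagged_url(run_id)
```

With a tagged URL in the prompt, a logged request whose `id` matches the run ID can be attributed to that specific prompt instead of only to a time window.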

## Publication Thesis Verification

- Thesis: Perplexity's "search for" variant succeeds where "open URL" fails,
  and produced no direct-origin hits in the instrumented bounded window. The
  response appears to have been served from cache/index rather than a live
  fetch, but the acquisition channel remains unresolved.
- Source: Direct-origin server logs from two bounded test windows, Perplexity
  response JSON for both variants, comparison with Copilot Finding 016.
- Method: Controlled-browser tests with fresh Perplexity chat, bounded
  timestamp windows, and independent server-side event correlation.
- Bias: Single lab domain, single account, free tier. The lab domain may
  have been crawled by PerplexityBot previously, which would explain the
  cached content availability.
- Consensus: Consistent with Findings 013 (Perplexity 0/19 no-hits on
  "open" variant) and 016 (Copilot search-for variant). The pattern differs
  from Copilot in that Perplexity's search found results while Copilot's
  did not, but both produced zero origin hits.
- Invalidation: Check historical lab server logs for PerplexityBot visits
  to determine when/how the content was indexed. Test with a domain that
  Perplexity has never crawled to verify whether "search for" still works.
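- Verdict: Thesis is well-supported for the recorded runs. The split between
  "open" (fail) and "search for" (succeed, apparently from cache, no origin
  hit) is clear in this data. The "open" failure replicates Finding 013; the
  "search for" result rests on a single run.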
- Confidence: high for the recorded test behavior; medium for generalization
  to all Perplexity surfaces and to domains with different indexing status.
- Additional tests suggested: test "search for" variant with a domain that
  is not in Perplexity's cache to verify whether the cached-retrieval
  behavior is index-dependent; test "open" variant after the domain is
  indexed to check whether direct retrieval becomes possible; compare
  Perplexity Pro vs free tier for both variants.

## Follow-up tasks

1. Check historical lab server logs for PerplexityBot visits to determine
   when the content was indexed.
2. Test whether Perplexity Pro (paid tier) can successfully "open" a URL
   that the free tier cannot.
3. Test with a freshly created, never-crawled page to verify whether
   Perplexity's "search for" can find content that has never been indexed.
4. Test whether the "search for" response includes real-time content updates
   or is strictly from a cached snapshot.
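
For task 4, one possible approach (an assumption, not a method used in this finding) is to publish a unique canary string on the page just before the test and check whether the response text contains it. A response that includes the canary cannot come from a snapshot taken before the canary existed. The helper names and the `lab-canary` prefix are hypothetical.

```python
import secrets

def new_canary(prefix="lab-canary"):
    """Generate a unique, unguessable marker to publish on the page."""
    return f"{prefix}-{secrets.token_hex(8)}"

def response_reflects_update(response_text, canary):
    """True if the response quotes the canary, i.e. it saw the updated page."""
    return canary in response_text

canary = new_canary()

# Made-up response texts for illustration only.
stale_response = "The AI Crawler Lab is a research harness."
fresh_response = f"The page now includes the marker {canary}."
```

A missing canary is weaker evidence than a present one: the model may have seen the updated page and simply not quoted the marker, so the prompt should ask for it explicitly.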
