# Finding 044: Perplexity incognito missed the structured-data conflict target but touched robots and root

## Date

2026-06-30

## Status

Published

## Summary

Perplexity was run against p20 (attempt id
`manual-client-perplexity-20260625-001-p20`) in a fresh Perplexity native
incognito thread. Before prompt submission, the exact public target URL passed
preflight with `HTTP 200` and contained `VISIBLE-SILVER-30`.

Perplexity returned `fetched:false`, `pages_opened:0`, and
`"Failed to fetch url content"`. The bounded origin review found no matching
direct-origin event for the exact attempt id or
`/lab/reading/structured-data-conflict` during the prompt window. It did,
however, record PerplexityBot requests for `/robots.txt` and `/` inside the
same window, so this run is a target-page no-hit with prompt-window
PerplexityBot discovery activity rather than a clean absence of all origin
behavior.

## What does this mean?

For site owners and researchers, this run shows that an assistant can report that it failed to fetch a specific supplied URL while still causing crawler traffic to the site's root and robots.txt. That distinction matters: origin logs may show that a platform touched the site during an attempt, but not necessarily that it opened or read the target page being tested.

## Method

- Browser task:
  `research/manual-client-runs/browser-tasks/manual-client-perplexity-20260625-001-p20.browser-task.json`
- Prompt packet:
  `research/manual-client-runs/manual-client-perplexity-20260625-001.prompts.json`
- Answer artifact:
  `research/manual-client-runs/manual-client-perplexity-20260625-001.answers.json`
- Response file:
  `research/manual-client-runs/browser-tasks/responses/manual-client-perplexity-20260625-001-p20.response.json`

Before opening Perplexity, the exact target URL was checked with
`npm run manual-client:browser-preflight -- --task research/manual-client-runs/browser-tasks/manual-client-perplexity-20260625-001-p20.browser-task.json --expect-text VISIBLE-SILVER-30 --json`.
The preflight returned `HTTP 200` and found the expected visible code.
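
The preflight condition can be illustrated with a minimal Python sketch. This
is not the implementation behind `manual-client:browser-preflight`; the
function names `preflight_passes` and `run_preflight` are hypothetical and only
restate the two checks described above (status `200` and presence of the
expected visible code).

```python
from urllib.request import urlopen


def preflight_passes(status: int, body: str, expected_text: str) -> bool:
    """Return True only if the page answered 200 and shows the expected code."""
    return status == 200 and expected_text in body


def run_preflight(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Fetch the target URL once and apply the preflight condition.

    Hypothetical helper: urlopen raises HTTPError for most non-2xx responses,
    so a failed fetch surfaces as an exception rather than a False result.
    """
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return preflight_passes(resp.status, body, expected_text)
```

Keeping the pure check separate from the network call makes the condition easy
to verify offline against a saved page body.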

Perplexity was opened at `https://www.perplexity.ai/` in the
OpenClaw-controlled Chrome profile. The UI showed `kai bot`, Free plan,
Search enabled, the native incognito banner, and an `Exit incognito` control
before prompt submission. The prompt was submitted in a fresh incognito
thread at `/search/new/c93507d5-e65d-4ce5-88a5-d3a9909fe0e5`, not an old
conversation.

Direct-origin review checked `data/events.json` for the exact attempt id, the
fixture path, `/robots.txt`, and root requests during the bounded prompt
window.
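
The bounded review described above can be sketched as follows. This is a
minimal illustration, not the project's tooling. It assumes that events are
plain objects with the `id`, `timestamp`, and `path` fields shown in the origin
evidence, and that an attempt id, if present, would appear somewhere in the
request path or query string. The function names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is normalized to +00:00."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def review_window(events, started_at, ended_at, attempt_id, fixture_path):
    """Group in-window events into target, robots, and root requests."""
    start, end = parse_ts(started_at), parse_ts(ended_at)
    in_window = [e for e in events if start <= parse_ts(e["timestamp"]) <= end]

    def is_target(event):
        # Assumption: the attempt id would show up in the path or query string.
        path = event["path"]
        return attempt_id in path or path.split("?")[0] == fixture_path

    return {
        "target": [e["id"] for e in in_window if is_target(e)],
        "robots": [e["id"] for e in in_window if e["path"] == "/robots.txt"],
        "root": [e["id"] for e in in_window if e["path"] == "/"],
    }


# Sample input: the two events recorded for this run (fields trimmed).
sample_events = [
    {"id": "mr0htd31-mb1rh80i", "timestamp": "2026-06-30T10:17:36.698Z",
     "path": "/robots.txt"},
    {"id": "mr0htdsl-pnl356x7", "timestamp": "2026-06-30T10:17:37.749Z",
     "path": "/"},
]

result = review_window(
    sample_events,
    "2026-06-30T10:17:22.618Z",
    "2026-06-30T10:17:50.274Z",
    "manual-client-perplexity-20260625-001-p20",
    "/lab/reading/structured-data-conflict",
)
```

For this run's two events, `result["target"]` is empty while `result["robots"]`
and `result["root"]` each hold one event id.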

## Result

| Field | Value |
|---|---|
| Attempt ID | manual-client-perplexity-20260625-001-p20 |
| Client | Perplexity (perplexity-web) |
| Visible model/context | Free plan, Search enabled, native incognito thread |
| Source prompt | reading-structured-data-conflict |
| Fetched | false |
| Pages opened | 0 |
| Direct target-page events | 0 inside prompt window |
| Prompt-window PerplexityBot events | `/robots.txt`, `/` |
| Raw event ids | `mr0htd31-mb1rh80i`, `mr0htdsl-pnl356x7` |
| Bounded window | 2026-06-30T10:17:22.618Z - 2026-06-30T10:17:50.274Z |
| Confirmed target hit from prompt | false |
| Confirmation status | registered_no_hit |

## Raw response

```json
{
  "attempt_id": "manual-client-perplexity-20260625-001-p20",
  "source_prompt_id": "reading-structured-data-conflict",
  "prompt_code": "manual-client-perplexity-20260625-001-p20",
  "ai_system": "Perplexity",
  "fetched": false,
  "pages_opened": 0,
  "answer": "I could not access the target page, so I could not compare visible text against structured data or metadata.",
  "evidence_quote": "Failed to fetch url content",
  "limitations": "The page fetch failed, so no visible page text, structured data, or metadata could be inspected. No conflict analysis was possible.",
  "confidence": 0.98
}
```

## Origin evidence

```json
{
  "timestampWindow": {
    "startedAt": "2026-06-30T10:17:22.618Z",
    "endedAt": "2026-06-30T10:17:50.274Z"
  },
  "sourcePromptId": "reading-structured-data-conflict",
  "promptCode": "manual-client-perplexity-20260625-001-p20",
  "confirmedHitFromPrompt": false,
  "rawEventIds": [
    "mr0htd31-mb1rh80i",
    "mr0htdsl-pnl356x7"
  ],
  "events": [
    {
      "id": "mr0htd31-mb1rh80i",
      "timestamp": "2026-06-30T10:17:36.698Z",
      "eventType": "discovery_file",
      "path": "/robots.txt",
      "userAgent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
      "ip": "::ffff:18.97.9.99",
      "dnsStatus": "forward_confirmed"
    },
    {
      "id": "mr0htdsl-pnl356x7",
      "timestamp": "2026-06-30T10:17:37.749Z",
      "eventType": "static_page",
      "path": "/",
      "userAgent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
      "ip": "::ffff:18.97.9.102",
      "dnsStatus": "forward_confirmed"
    }
  ],
  "resourceCounts": {
    "targetPageRequests": 0,
    "robotsRequests": 1,
    "rootRequests": 1,
    "trackingPixelFetches": 0,
    "nonPixelSubresources": 0,
    "clientCapabilityEvents": 0
  }
}
```

## Interpretation

This run does not show Perplexity reading the p20 target page. The response
reported a fetch failure, and the origin logs had no exact attempt id or
fixture-path request in the bounded prompt window.

The run is also not a pure no-hit result. PerplexityBot requested
`/robots.txt` and `/` shortly after the prompt was submitted. Because those
requests lacked the attempt id and did not open the target path, they are
evidence of prompt-window site contact, not evidence that the target page or
its visible/metadata conflict was inspected.
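
The distinction drawn here can be expressed as a small classifier over the
`resourceCounts` object from the origin evidence. This is a sketch only: the
state names `target_hit`, `no_target_hit_with_ancillary_activity`, and
`clean_no_hit` are hypothetical labels, not existing values in the project's
artifacts.

```python
def classify_origin_evidence(counts: dict) -> str:
    """Map resource counts to one of three hypothetical evidence states."""
    if counts.get("targetPageRequests", 0) > 0:
        # The fixture path itself was requested inside the window.
        return "target_hit"
    ancillary = counts.get("robotsRequests", 0) + counts.get("rootRequests", 0)
    if ancillary > 0:
        # The site was contacted, but the target page was not opened.
        return "no_target_hit_with_ancillary_activity"
    # No relevant origin activity of any kind inside the window.
    return "clean_no_hit"


# Counts recorded for this run.
run_counts = {
    "targetPageRequests": 0,
    "robotsRequests": 1,
    "rootRequests": 1,
    "trackingPixelFetches": 0,
    "nonPixelSubresources": 0,
    "clientCapabilityEvents": 0,
}

state = classify_origin_evidence(run_counts)
```

With this run's counts, `state` is `no_target_hit_with_ancillary_activity`.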

## Limitations

- This finding covers one Perplexity run, one account/session, one native
  incognito thread, and one fixture.
- The prompt supplied the exact target URL, so the run tests direct opening,
  not independent discovery.
- The Perplexity UI exposed a generic `Model` selector but not a precise model
  name in the captured snapshot.
- The recorded start and end times are operator-side bounds around submission
  and final answer observation, not service-internal fetch timestamps.
- Origin review used local `data/events.json` during the bounded window; it
  would not capture a delayed target fetch outside that window.

## Publication Thesis Verification

- Thesis: Perplexity native incognito did not fetch the p20 target page during
  this controlled-browser attempt, although PerplexityBot did request
  `/robots.txt` and `/` inside the same prompt window.
- Source: Fresh Perplexity incognito response, generated response and answer
  artifacts, browser-task artifact, preflight output, and bounded
  `data/events.json` review.
- Method: Exact public target preflight, controlled-browser use of a fresh
  Perplexity incognito thread, prompt submission, exact attempt-id review,
  fixture-path review, `/robots.txt` review, root-path review, and bounded
  timestamp-window correlation.
- Bias: Single run on the Free plan with only a generic model selector
  visible. Perplexity's retrieval implementation may vary by account, region,
  product surface, or time.
- Consensus: Consistent with prior Perplexity controlled-browser runs that
  produced no target-page retrieval, but this run adds a narrower distinction:
  prompt-window crawler activity can happen without target-page access.
- Invalidation: A raw event for the exact attempt id or fixture path inside
  the same prompt window, a response artifact from the same attempt showing
  retrieved page content, or a fixture/preflight mismatch would weaken this
  result.
- Verdict: Supported for this run. The model response, empty target-path
  review, and two prompt-window PerplexityBot discovery events align with a
  target-page no-hit plus root/robots contact.
- Additional tests suggested: run p20 for Copilot/Bing and add a follow-up
  selector that separates target-page no-hits with ancillary root/robots
  activity from clean bounded no-hits.

## Next steps

- Continue the remaining p20 controlled-browser task for Copilot/Bing.
- Consider adding a browser-task origin evidence state for ancillary
  root/robots activity without target-page hits.
