# Finding 020: Perplexity directed poem prompt produced no page hit but triggered discovery crawl

## Date

2026-06-29

## Status

Published

## Summary

In a fresh incognito web thread, Perplexity was given a prompt containing a
never-before-published, high-entropy poem URL:
`/lab/perplexity-poem/harbor-ledger-9f3c2a17`. The page was enabled only for
the test window and then disabled. Perplexity answered that it could not
reliably open the page, and its answer reported none of the page's identifying
features: the poem marker, the 9 visible rows, the 71 visible words, the
`DIRECTLOG` acrostic, or the afterword marker.

No direct-origin event reached the unique poem URL, afterword URL, resource
URLs, or JavaScript beacon for the exact test id. However, official
`PerplexityBot` fetched `/` and `/robots.txt` during the same prompt window.
This is evidence of related discovery behavior, not evidence that Perplexity
retrieved the directed poem page.

## Hypothesis

A direct Perplexity prompt containing a unique, non-discoverable URL might
produce a fetch to that exact page, enabling clean attribution because no other
system had prior knowledge of the slug.

## Test Setup

- Run id: `perplexity-poem-20260629-001`
- Attempt id: `perplexity-poem-20260629-001-p01`
- Prompt id: `harbor-ledger-poem`
- Target path: `/lab/perplexity-poem/harbor-ledger-9f3c2a17`
- Test state enabled: `2026-06-29T02:53:18.833Z`
- Prompt submitted: approximately `2026-06-29T02:54:00.000Z`
- Test state disabled: `2026-06-29T02:55:50.297Z`
- Surface: Perplexity web, free plan, native incognito thread
- Fixture marker: `POEM-LIVE-HARBOR-71`
- Visible poem rows: 9
- Visible poem words: 71
- Acrostic: `DIRECTLOG`

The fixture route is intentionally absent from public navigation,
`robots.txt`, and `sitemap.xml`. It is gated by
`data/perplexity-poem-active.json` and returns HTTP 410 after disablement.
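
The gating described above can be sketched as follows. This is a minimal illustration, not the lab's actual route handler: the `enabled_at` and `disabled_at` field names, the `fixture_status` helper, and the 404 returned before enablement are all assumptions. Only the HTTP 410 after disablement comes from the setup notes.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-06-29T02:53:18.833Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def fixture_status(state: dict, now: datetime) -> int:
    """Return the HTTP status the fixture route would serve at `now`.

    `state` stands in for the parsed contents of the gate file; its field
    names are hypothetical.
    """
    enabled_at = state.get("enabled_at")
    disabled_at = state.get("disabled_at")
    if enabled_at is None or now < parse_ts(enabled_at):
        return 404  # assumption: the route is simply unknown before enablement
    if disabled_at is not None and now >= parse_ts(disabled_at):
        return 410  # per the setup notes: Gone after disablement
    return 200


if __name__ == "__main__":
    state = {
        "enabled_at": "2026-06-29T02:53:18.833Z",
        "disabled_at": "2026-06-29T02:55:50.297Z",
    }
    print(fixture_status(state, parse_ts("2026-06-29T02:54:00.000Z")))
    print(fixture_status(state, parse_ts("2026-06-29T02:56:00.000Z")))
```

With the timestamps from this run, a request at the approximate prompt time would be served normally, while any request after `02:55:50.297Z` would receive 410.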

## Raw Evidence

Prompt and answer artifacts:

- Prompt packet:
  `research/manual-client-runs/perplexity-poem-20260629-001.prompts.json`
- Answer packet:
  `research/manual-client-runs/perplexity-poem-20260629-001.answers.json`

Perplexity response excerpt:

> I couldn’t reliably open that specific page, so I can’t truthfully describe
> its exact poem, line count, acrostic, or afterword from the page itself.

No events matched:

- `test_id = perplexity-poem-20260629-001-p01`
- path containing `/lab/perplexity-poem/harbor-ledger-9f3c2a17`

Related events observed during the active prompt window:

| Event ID | Timestamp | Path | IP | User-Agent |
|---|---:|---|---|---|
| `mqyml1zs-g3bkrjta` | `2026-06-29T02:55:34.815Z` | `/` | `::ffff:18.97.9.102` | `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)` |
| `mqyml1zs-mypp3hxl` | `2026-06-29T02:55:34.820Z` | `/robots.txt` | `::ffff:18.97.9.99` | `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)` |

A ClickHouse query for the active window returned only those two Perplexity
events:

```text
2026-06-29 02:55:34.815  static_page     /           Perplexity  search_index_crawler
2026-06-29 02:55:34.820  discovery_file  /robots.txt Perplexity  search_index_crawler
```
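
The window check behind this evidence can be reproduced with a short script. The sketch below is illustrative only: the event field names (`timestamp`, `path`, `user_agent`, `test_id`) and the `summarize` helper are assumptions about the NDJSON shape, not the lab's actual schema. The timestamps, paths, and user-agent string are copied from the table above.

```python
from datetime import datetime

TARGET_PATH = "/lab/perplexity-poem/harbor-ledger-9f3c2a17"
TEST_ID = "perplexity-poem-20260629-001-p01"
UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)")


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp with a trailing Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


WINDOW_START = parse_ts("2026-06-29T02:53:18.833Z")  # test state enabled
WINDOW_END = parse_ts("2026-06-29T02:55:50.297Z")    # test state disabled

# The two related events from the table; field names are hypothetical.
EVENTS = [
    {"event_id": "mqyml1zs-g3bkrjta", "timestamp": "2026-06-29T02:55:34.815Z",
     "path": "/", "user_agent": UA},
    {"event_id": "mqyml1zs-mypp3hxl", "timestamp": "2026-06-29T02:55:34.820Z",
     "path": "/robots.txt", "user_agent": UA},
]


def summarize(events, start, end):
    """Split in-window events into direct poem hits and related bot hits."""
    windowed = [e for e in events if start <= parse_ts(e["timestamp"]) <= end]
    direct = [e for e in windowed
              if TARGET_PATH in e["path"] or e.get("test_id") == TEST_ID]
    related = [e for e in windowed
               if e not in direct and "PerplexityBot" in e.get("user_agent", "")]
    return direct, related


if __name__ == "__main__":
    direct, related = summarize(EVENTS, WINDOW_START, WINDOW_END)
    print("direct hits:", len(direct))
    print("related paths:", [e["path"] for e in related])
```

Both PerplexityBot events fall inside the enabled window, and neither matches the poem path or test id, which is the distinction the rest of this finding relies on.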

## Expected Result

If Perplexity could live-fetch the supplied URL, the lab should have recorded at
least one `server_page` event for the exact poem path and test id. A
browser-like fetch could also have produced stylesheet, image, script, JSON,
tracking-pixel, afterword, hidden-link, or JavaScript beacon events.

## Observed Result

- Perplexity did not answer from the poem page.
- No event reached the unique poem path.
- No event carried the exact poem `test_id`.
- No afterword, resource, tracking-pixel, or JavaScript beacon event appeared.
- Official `PerplexityBot` did fetch `/` and `/robots.txt` within the prompt
  window.

## Interpretation

The direct prompt did not cause Perplexity to retrieve the unique page content.
The official PerplexityBot activity suggests the prompt or Perplexity search
flow triggered some discovery/indexing behavior against the domain, but that
behavior stopped at the homepage and robots file during the active window.

This result narrows the earlier Perplexity blind-spot thesis: Perplexity may
react to a direct URL prompt with discovery crawling, yet still fail to fetch or
surface a non-discoverable high-entropy page in the answer.

## Limitations

- Single run on the free Perplexity web surface.
- The prompt used a native Perplexity incognito thread, but the account and
  browser profile were still the lab's normal controlled-browser environment.
- Delayed crawler visits after disablement would be separate post-window
  observations.
- The official IP range cache was refreshed after this run, so the stored raw
  events preserve UA/IP/DNS evidence but carry no precomputed official-range
  match; the "official" label for these events rests on that raw evidence.
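
Because the raw events keep the source IPs, an official-range match can be recomputed after the fact. The following is a minimal sketch using Python's standard `ipaddress` module. The `PLACEHOLDER_RANGES` value is a made-up CIDR chosen only so the example runs; it is not Perplexity's published range list, which would have to be loaded from the vendor's own source. The helper names are likewise hypothetical.

```python
import ipaddress

# PLACEHOLDER ONLY: not Perplexity's published ranges. Substitute the
# vendor-published CIDR list before drawing any conclusion from a match.
PLACEHOLDER_RANGES = ["18.97.9.96/29"]


def normalize_ip(raw: str):
    """Unwrap IPv4-mapped IPv6 addresses such as ::ffff:18.97.9.102."""
    addr = ipaddress.ip_address(raw)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr


def in_ranges(raw: str, cidrs) -> bool:
    """Return True if the (normalized) address falls inside any CIDR."""
    addr = normalize_ip(raw)
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        if addr.version == net.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    for raw in ("::ffff:18.97.9.102", "::ffff:18.97.9.99"):
        print(raw, "->", normalize_ip(raw), in_ranges(raw, PLACEHOLDER_RANGES))
```

The normalization step matters here because the logged addresses are IPv4-mapped IPv6 strings, which would never match an IPv4 CIDR if compared without unwrapping.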

## Publication Thesis Verification

- Thesis: Perplexity did not fetch the unique short-lived poem page during the
  active prompt window, but official PerplexityBot fetched the homepage and
  robots.txt during that same window.
- Reviewer: pending separate-agent verification.
- Source evaluation: Primary source artifacts are local direct-origin
  append-only NDJSON, ClickHouse rows, prompt packet, answer packet, and the
  controlled-browser Perplexity response.
- Method check: Strong for the absence of a direct poem-page hit in the bounded
  window; limited by single-run sample size and an approximate prompt
  submission time.
- Bias or funding check: Lab-owned infrastructure and operator-authored prompt;
  model response is treated as a claim, not evidence of retrieval.
- Consensus or triangulation: Triangulated across answer artifact, NDJSON, and
  ClickHouse. Needs repetition and paid-tier comparison.
- Retraction or invalidation check: Later delayed crawler events for the same
  path could change the post-window interpretation but would not prove
  retrieval during the active content window unless timestamps and fixture
  state agree.
- Verdict: `supported`
- Confidence: medium-high for bounded no-hit; medium for prompt-triggered
  discovery interpretation.
- Additional tests suggested: repeat with paid Perplexity, a shorter URL, a URL
  without query parameters, and a URL added to robots/sitemap only after prompt
  submission.
