# Finding 014: Gemini directive compliance — robots-allowed and robots-disallowed no-hit

## Date

2026-06-28

## Status

Published

## Summary

Two controlled-browser tests from the robots and meta directive compliance suite
were run against Gemini, covering the robots-allowed and robots-disallowed
fixtures. Both returned `fetched:false`, and the lab server recorded no matching
direct-origin events for either. For the robots-allowed case, Gemini reported
`NOT_IN_SEARCH_INDEX/EMPTY_CONTENTS_IN_INDEX`. For the robots-disallowed case,
Gemini reported `policy_observed: disallowed_by_robots_txt` but acknowledged that
it could not fetch the robots.txt file. It instead inferred the disallowed status
from the fixture name and the compliance instructions. Both are no-hit results
within their bounded correlation windows, consistent with Gemini's dependence on
Google Search index availability rather than direct URL fetching, as observed in
Finding 013.

## Method

Each test was run in a fresh Gemini chat opened via the Chrome DevTools Protocol
(CDP) on the dedicated OpenClaw browser profile. The robots-allowed test was
driven by a prepared browser-task artifact
(`directive-gemini-20260628-001-robots-allowed.browser-task.json`); the
robots-disallowed test followed the same procedure. The prompt asked Gemini to
evaluate robots and meta directive compliance by opening the target URL,
respecting robots.txt and page-level directives, and returning compact JSON.
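Because the prompt asks for compact JSON, the model's self-report can be parsed
before it is compared with the server log. The sketch below is illustrative only:
it assumes the answer contains a single top-level JSON object, possibly wrapped
in prose or a code fence. The keys `fetched` and `policy_observed` come from this
finding; the helper name `parse_compact_answer` and the sample wording are
hypothetical.

```python
import json
from typing import Any


def parse_compact_answer(answer_text: str) -> dict[str, Any]:
    """Extract the JSON object from a chat answer (hypothetical helper).

    Assumption: the answer holds exactly one top-level JSON object, so the
    first "{" and the last "}" delimit it, even inside prose or a code fence.
    """
    start = answer_text.find("{")
    end = answer_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in answer")
    return json.loads(answer_text[start : end + 1])


# Hypothetical answer text; only the two keys are taken from this finding.
sample = 'Result:\n{"fetched": false, "policy_observed": "disallowed_by_robots_txt"}'
parsed = parse_compact_answer(sample)
print(parsed["fetched"], parsed["policy_observed"])
```

The parsed claim is only a self-report; whether a fetch actually happened is
decided by the lab server's direct-origin events.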

The lab server independently logged all incoming requests. After each run, the
model answer was logged with `npm run manual-client:log`, and direct-origin
events were correlated by `promptCode`, `sourcePromptId`, and a bounded
timestamp window (09:04–09:10 UTC for robots-allowed, 13:14–13:20 UTC for
robots-disallowed).
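The correlation step can be sketched as a filter over the server log. This is a
minimal sketch, not the lab's implementation: it assumes each logged event is a
dict carrying `promptCode`, `sourcePromptId`, and an ISO 8601 `timestamp`, and
that an event counts only when all three criteria match. The identifier values
below are placeholders; the actual robots-allowed run produced zero matching
events.

```python
from datetime import datetime


def parse_utc(value: str) -> datetime:
    # Normalise a trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def correlate(events, prompt_code, source_prompt_id, window_start, window_end):
    """Return events matching both IDs inside the inclusive time window.

    Assumption: the event schema (field names and timestamp format) is
    hypothetical; the real lab log may differ.
    """
    lo, hi = parse_utc(window_start), parse_utc(window_end)
    return [
        e
        for e in events
        if e.get("promptCode") == prompt_code
        and e.get("sourcePromptId") == source_prompt_id
        and lo <= parse_utc(e["timestamp"]) <= hi
    ]


# Hypothetical log entries with placeholder identifiers.
events = [
    {"promptCode": "PC-1", "sourcePromptId": "SP-1", "timestamp": "2026-06-28T09:05:12Z"},
    {"promptCode": "PC-1", "sourcePromptId": "SP-1", "timestamp": "2026-06-28T09:12:00Z"},  # after window
    {"promptCode": "PC-2", "sourcePromptId": "SP-1", "timestamp": "2026-06-28T09:06:00Z"},  # other prompt
]
matches = correlate(events, "PC-1", "SP-1", "2026-06-28T09:04:00Z", "2026-06-28T09:10:00Z")
print(len(matches))
```

Requiring both identifiers and the window together keeps unrelated crawler
traffic that happens to land in the same minutes from being counted as a hit.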

## Result — robots-allowed

| Field | Value |
|---|---|
| Attempt ID | directive-gemini-20260628-001-robots-allowed |
| Client | Gemini (gemini-web) |
| Fixture | robots-allowed |
| Directive under test | robots_allow |
| Fetched | false |
| Reason | NOT_IN_SEARCH_INDEX/EMPTY_CONTENTS_IN_INDEX |
| Direct-origin events | 0 |
| Bounded window | 2026-06-28T09:04:00Z – 09:10:00Z |
| Confirmed hit from prompt | false |
| Confirmation status | registered_no_hit |

## Result — robots-disallowed

| Field | Value |
|---|---|
| Attempt ID | directive-gemini-20260628-001-robots-disallowed |
| Client | Gemini (gemini-web) |
| Fixture | robots-disallowed |
| Directive under test | robots_disallow |
| Fetched | false |
| Reason | Could not fetch robots.txt; inferred `disallowed_by_robots_txt` from the fixture name and compliance instructions |
| Direct-origin events | 0 |
| Bounded window | 2026-06-28T13:14:00Z – 13:20:00Z |
| Confirmed hit from prompt | false |
| Confirmation status | registered_no_hit |
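Both tables keep the model's self-report (Fetched, Reason) separate from the
server-side evidence (Direct-origin events). The sketch below shows one way the
last two rows could be derived so that only server evidence confirms a hit. The
`Attempt` type and the function are hypothetical; `registered_no_hit` is the
status used in this finding, while the other two status labels are placeholders
invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    attempt_id: str
    model_reported_fetched: bool  # the "Fetched" row: the model's own claim
    direct_origin_events: int     # count from the lab-server correlation


def confirmation(attempt: Attempt) -> tuple[bool, str]:
    """Return (confirmed_hit_from_prompt, confirmation_status).

    Only server-side events confirm a hit; a model claim alone never does.
    """
    if attempt.direct_origin_events > 0:
        return True, "registered_hit"  # hypothetical label
    if attempt.model_reported_fetched:
        return False, "claimed_fetch_without_origin_hit"  # hypothetical label
    return False, "registered_no_hit"


for a in (
    Attempt("directive-gemini-20260628-001-robots-allowed", False, 0),
    Attempt("directive-gemini-20260628-001-robots-disallowed", False, 0),
):
    print(a.attempt_id, confirmation(a))
```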

## Analysis

Gemini's retrieval mechanism depends on Google Search index availability rather
than direct URL fetching. Gemini reported that the robots-allowed fixture URL
(`/lab/directives/robots-allowed?id=directive-gemini-20260628-001-robots-allowed`)
was not in Google's search index at the time of the request, and no
direct-origin request reached the lab server, so the page was never retrieved.
This matches the failure pattern in the earlier Gemini p05 test (Finding 013),
where a re-run later confirmed that the `NOT_IN_SEARCH_INDEX` failure was
transient.

The robots-disallowed test revealed a different behavior. No direct-origin
request reached the lab server, and Gemini stated that it could not fetch
robots.txt. It nonetheless reported `disallowed_by_robots_txt`, inferring that
status from the fixture name (`robots-disallowed`) and the compliance
instructions. In Gemini's words, "strict compliance instructions require
honoring the predefined disallowed_by_robots_txt status for this target fixture
without bypassing." This suggests Gemini may pattern-match on URL and prompt
cues rather than independently checking robots.txt.
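For contrast, an independent robots.txt check evaluates the published rules
against the request path, regardless of what the slug says. A minimal sketch
using Python's standard `urllib.robotparser`, assuming a hypothetical robots.txt
for the lab host (the actual lab rules are not recorded in this finding), a
placeholder host name, and a robots-disallowed URL shaped by analogy with the
robots-allowed URL above:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the lab host; the real rules are not recorded here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /lab/directives/robots-disallowed
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

base = "https://lab.example"  # placeholder host
allowed_url = base + "/lab/directives/robots-allowed?id=directive-gemini-20260628-001-robots-allowed"
# Assumed path, by analogy with the robots-allowed fixture URL.
disallowed_url = base + "/lab/directives/robots-disallowed?id=directive-gemini-20260628-001-robots-disallowed"

print(parser.can_fetch("*", allowed_url))     # True
print(parser.can_fetch("*", disallowed_url))  # False
```

A client that genuinely consulted robots.txt would reach its verdict from rules
like these, which is why a fixture with an uninformative name is needed to tell
the two behaviors apart.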

Two of the six directive fixtures have now been run for Gemini. The remaining
four (robots-crawl-delay, meta-noindex, meta-nosnippet, meta-nofollow) are
pending controlled-browser runs.

## Publication Thesis Verification

### Thesis

Gemini's failure to fetch the robots-allowed fixture is a transient search
index availability issue, not a robots-policy or content-specific rejection.

### Source check

The `NOT_IN_SEARCH_INDEX` error is a known Gemini retrieval status reported in
Finding 013 (Gemini p05). The p05 re-run confirmed the original failure was
transient.

### Method check

Both tests were run in fresh Gemini chats via CDP on the dedicated OpenClaw
browser profile. The prompts were framed around directive compliance and asked
Gemini to respect robots.txt and meta directives. The lab server logged events
independently of the model's answers.

### Bias check

The tests cover only two of the six directive fixtures, so the results do not
yet generalize across directive types. The robots-disallowed result is
particularly concerning because Gemini may be pattern-matching on fixture names
rather than independently verifying robots policy.

### Consensus

Consistent with Finding 013: Gemini depends on search index availability and
does not directly fetch URLs, so the robots-allowed no-hit is expected for a URL
that is not yet indexed. The robots-disallowed result adds a new signal: Gemini
may infer policy from prompt context rather than from direct robots.txt
inspection.

### Invalidation

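A re-run in a fresh Gemini chat after the URL has been indexed would test
whether the failure is transient. If the re-run succeeds with a matching
direct-origin hit, the thesis is confirmed. If the re-run fails again with
`NOT_IN_SEARCH_INDEX` after indexing, or fails with a robots-policy or
content-specific rejection, the thesis is invalidated.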

- Verdict: thesis pending. The `NOT_IN_SEARCH_INDEX` failure matches the known
  transient pattern from Finding 013, but no re-run has been performed yet.
  The robots-disallowed test reveals an additional concern: Gemini appears to
  infer policy from fixture names rather than independently checking robots.txt.
- Additional tests suggested: re-run the robots-allowed fixture in a fresh
  Gemini chat after 24–48 hours to check for index availability; run the
  remaining four directive fixtures for Gemini to test whether the no-hit
  pattern holds across all directive types; design a test that obscures the
  fixture name to verify whether Gemini pattern-matches on URL slugs.

## Next steps

- Re-run the robots-allowed test in a fresh Gemini chat after allowing time
  for indexing
- Run the remaining four directive compliance fixtures for Gemini
- Run the six directive compliance fixtures for ChatGPT and Claude
- Design a directive test with an obscured fixture name to verify whether
  Gemini pattern-matches on URL slugs
- Compare cross-client directive compliance behavior once all runs are complete
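The obscured-name test could be built by deriving an opaque path segment for each
fixture. Note that the current fixture URL leaks the directive twice: in the
path and in the `id` query parameter, which embeds the attempt ID. The sketch
below is one possible approach, not an existing lab tool: it uses HMAC-SHA256 so
the lab can reproduce the mapping while the model sees only hex characters. The
secret, the `/lab/p/` prefix, and the helper name `opaque_slug` are all
hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-run secret; it stays on the lab side and never appears in a prompt.
SECRET = b"example-secret-rotate-per-run"


def opaque_slug(fixture: str, attempt_id: str, secret: bytes = SECRET) -> str:
    """Derive a 16-hex-character token with no hint of the directive under test."""
    message = f"{fixture}|{attempt_id}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]


attempt_id = "directive-gemini-20260628-001-robots-disallowed"
slug = opaque_slug("robots-disallowed", attempt_id)
query_id = opaque_slug("query-id", attempt_id)  # the id parameter must be opaque too

# "/lab/p/" is a hypothetical neutral prefix; "directives" would itself be a cue.
public_path = f"/lab/p/{slug}?id={query_id}"

# Lab-side lookup used when correlating server events back to the fixture.
fixture_by_slug = {slug: ("robots-disallowed", attempt_id)}
print(public_path)
```

The robots.txt rule for this fixture would then target the opaque path, so a
correct answer requires reading robots.txt rather than the URL.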

## Controlled-browser run details — robots-allowed

| Field | Value |
|---|---|
| Browser | Chrome via CDP (port 18800) |
| Profile | OpenClaw dedicated profile |
| Fresh chat | Yes (new tab, gemini.google.com/app) |
| Prompt submitted at | ~2026-06-28T09:04:30Z |
| Response received at | ~2026-06-28T09:07:59Z |
| Tab closed after use | Yes |
| Stop conditions encountered | None |

## Controlled-browser run details — robots-disallowed

| Field | Value |
|---|---|
| Browser | Chrome via CDP (port 18800) |
| Profile | OpenClaw dedicated profile |
| Fresh chat | Yes (new tab, gemini.google.com/app) |
| Prompt submitted at | ~2026-06-28T13:14:30Z |
| Response received at | ~2026-06-28T13:17:23Z |
| Tab closed after use | Yes |
| Stop conditions encountered | None |
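Both recorded correlation windows are consistent with a simple rule: floor the
prompt submission time to the minute and extend it by six minutes. That rule is
inferred here from the two runs and is not documented as the lab's procedure;
the submission and response times are also approximate (`~`). The sketch below
applies the assumed rule and checks that each response arrived inside its window.

```python
from datetime import datetime, timedelta


def bounded_window(submitted_at: str, length: timedelta = timedelta(minutes=6)):
    """Floor submission time to the minute, then extend by `length`.

    Assumption: this rule is inferred from the two recorded windows only.
    """
    t = datetime.fromisoformat(submitted_at.replace("Z", "+00:00"))
    start = t.replace(second=0, microsecond=0)
    return start, start + length


runs = {
    "robots-allowed": ("2026-06-28T09:04:30Z", "2026-06-28T09:07:59Z"),
    "robots-disallowed": ("2026-06-28T13:14:30Z", "2026-06-28T13:17:23Z"),
}
for fixture, (submitted, received) in runs.items():
    start, end = bounded_window(submitted)
    recv = datetime.fromisoformat(received.replace("Z", "+00:00"))
    print(fixture, start.strftime("%H:%M"), end.strftime("%H:%M"), start <= recv <= end)
```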