# Finding 005: Local HTTP Baseline Fetches Directive Fixtures Without Policy Enforcement

## Status

reproduced

## Summary

The local directive compliance baseline runner successfully fetched all five
directive fixtures and recorded robots.txt plus page-level robots meta signals.
This is a plain HTTP-client control, not evidence about an AI crawler or
assistant browsing surface. The runner did not enforce robots policy: it fetched
the `robots-disallowed` fixture and extracted its marker just like the allowed
and meta-directive fixtures.

## Hypothesis

A policy-naive local HTTP client should retrieve any reachable route, while
preserving enough robots and meta directive evidence to compare against later
policy-aware crawler or assistant runs.

## Test Setup

- Public lab URL: `http://ai-crawler-lab.kaistone.ai:8787/`
- Local baseline base URL: `http://127.0.0.1:8788/`
- Run id: `directive-local-20260624-1000`
- Surface label: `local-http`
- Plan artifact:
  `research/directive-compliance-runs/directive-local-20260624-1000.prompts.json`
- Baseline artifact:
  `research/directive-compliance-runs/baselines/directive-local-20260624-1000.local-baseline.json`
- Timestamp window: `2026-06-24T08:13:59.646Z` to
  `2026-06-24T08:13:59.697Z`
- Environment: local lab server and local HTTP baseline runner

## Raw Evidence

The baseline first fetched `robots.txt`:

```text
User-agent: *
Allow: /
Allow: /lab/directives/robots-allowed
Disallow: /lab/directives/robots-disallowed
Crawl-delay: 7

Sitemap: /sitemap.xml
```
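To make explicit what a policy-aware client would conclude from this file, the following is a minimal sketch of longest-match rule evaluation over the recorded `robots.txt`. It is not part of the baseline runner. The helper names `parse_rules` and `is_allowed` are hypothetical, and the parser is deliberately simplified: it reads only the `User-agent: *` group and does not handle `*` or `$` wildcards.

```python
# Recorded robots.txt from this run, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Allow: /lab/directives/robots-allowed
Disallow: /lab/directives/robots-disallowed
Crawl-delay: 7

Sitemap: /sitemap.xml
"""


def parse_rules(robots_txt):
    """Collect (kind, path) rules from the 'User-agent: *' group only."""
    rules = []
    in_star_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif in_star_group and field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best = None
    for kind, prefix in rules:
        if path.startswith(prefix):
            key = (len(prefix), kind == "allow")
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


rules = parse_rules(ROBOTS_TXT)
for path in (
    "/lab/directives/robots-allowed",
    "/lab/directives/robots-disallowed",
    "/lab/directives/meta-noindex",
):
    print(path, is_allowed(rules, path))
```

Under longest-match evaluation the `Disallow` rule outranks the shorter `Allow: /`, so only the `robots-disallowed` path is refused. An evaluator that applies the first matching rule in file order would answer differently here, because `Allow: /` is listed first.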

The local runner then fetched every fixture URL and extracted the visible
directive marker plus robots meta value:

```json
[
  {
    "fixture": "robots-allowed",
    "expectedPolicy": "allowed_by_robots_txt",
    "status": 200,
    "fetched": true,
    "metaRobots": "index, follow",
    "directiveMarker": "ROBOTS-ALLOWED"
  },
  {
    "fixture": "robots-disallowed",
    "expectedPolicy": "disallowed_by_robots_txt",
    "status": 200,
    "fetched": true,
    "metaRobots": "index, follow",
    "directiveMarker": "ROBOTS-DISALLOWED"
  },
  {
    "fixture": "meta-noindex",
    "expectedPolicy": "fetchable_not_indexable",
    "status": 200,
    "fetched": true,
    "metaRobots": "noindex, follow",
    "directiveMarker": "META-NOINDEX"
  },
  {
    "fixture": "meta-nosnippet",
    "expectedPolicy": "indexable_without_snippet",
    "status": 200,
    "fetched": true,
    "metaRobots": "index, nosnippet, follow",
    "directiveMarker": "META-NOSNIPPET"
  },
  {
    "fixture": "meta-nofollow",
    "expectedPolicy": "indexable_links_not_followed",
    "status": 200,
    "fetched": true,
    "metaRobots": "index, nofollow",
    "directiveMarker": "META-NOFOLLOW"
  }
]
```
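The recorded `metaRobots` strings can be reduced to per-page permissions for later comparison. The sketch below is illustrative only: `parse_meta_robots` and `meta_policy` are hypothetical helpers, and the three permission keys are an assumed summary format, not fields the runner emits.

```python
# metaRobots values as recorded in the baseline artifact above.
BASELINE_META = {
    "robots-allowed": "index, follow",
    "robots-disallowed": "index, follow",
    "meta-noindex": "noindex, follow",
    "meta-nosnippet": "index, nosnippet, follow",
    "meta-nofollow": "index, nofollow",
}


def parse_meta_robots(content):
    """Split a robots meta value into a set of lowercase directive tokens."""
    return {token.strip().lower() for token in content.split(",") if token.strip()}


def meta_policy(content):
    """Summarize what the page-level directives permit (assumed summary shape)."""
    directives = parse_meta_robots(content)
    return {
        "indexable": "noindex" not in directives,
        "snippet_allowed": "nosnippet" not in directives,
        "links_followable": "nofollow" not in directives,
    }


for fixture, content in BASELINE_META.items():
    print(fixture, meta_policy(content))
```

The `robots-allowed` and `robots-disallowed` fixtures carry the same meta value, so they produce identical summaries. Only the `robots.txt` rule distinguishes them.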

Each fixture also exposed a normal control link and a `rel="nofollow"` control
link to the allowed directive fixture. The local baseline recorded those links
but did not crawl them.
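Recording links without crawling them could look roughly like the sketch below, which uses Python's standard `html.parser`. The `LinkRecorder` class, the `record_links` helper, and the `SAMPLE_HTML` markup are all assumptions for illustration; the actual fixture markup and the runner's extraction code are not shown in this finding.

```python
from html.parser import HTMLParser


class LinkRecorder(HTMLParser):
    """Collect anchor hrefs and whether each carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated token list; a missing rel means no tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append({"href": href, "nofollow": "nofollow" in rel_tokens})


# Hypothetical markup standing in for a fixture's two control links.
SAMPLE_HTML = """
<p>
  <a href="/lab/directives/robots-allowed">control</a>
  <a rel="nofollow" href="/lab/directives/robots-allowed">nofollow control</a>
</p>
"""


def record_links(html):
    """Return the recorded links; nothing is fetched."""
    parser = LinkRecorder()
    parser.feed(html)
    parser.close()
    return parser.links


for link in record_links(SAMPLE_HTML):
    print(link)
```

Keeping the `nofollow` flag alongside each `href` preserves the signal a later traversal test would need, without performing any traversal here.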

## Expected Result

For a local HTTP-client control, every directly requested route should be
retrieved if the server returns `200`, regardless of robots.txt or robots meta
policy. The run should preserve the policy signals so a policy-aware system can
be compared later.

## Observed Result

The runner retrieved all five target pages with `200` status and captured the
expected marker from each page. It also captured the robots.txt disallow rule
for `/lab/directives/robots-disallowed`, which confirms that the policy signal
was present when the local runner fetched the route anyway. The recorded
timestamp window spans about 51 ms, so the runner evidently did not honor
`Crawl-delay: 7` either.

## Interpretation

This finding establishes a baseline for route availability and fixture content.
It should not be interpreted as crawler compliance behavior. Later AI search,
assistant, or crawler tests should be judged against both pieces of evidence:
whether the fixture was technically reachable, and whether the tested surface
respected the relevant robots or meta directive.
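One way to keep the two pieces of evidence separate when judging a later run is sketched below. The `judge` function, its arguments, and its result labels are hypothetical; only the `expectedPolicy` value `disallowed_by_robots_txt` comes from the baseline artifact. It covers the fetch-level robots.txt question only, since meta directives such as `noindex` can be seen only after a page is fetched.

```python
def judge(baseline_fetched, expected_policy, surface_fetched):
    """Classify a later surface's fetch behavior against the local baseline.

    baseline_fetched: whether the local HTTP control retrieved the fixture.
    expected_policy:  the fixture's expectedPolicy label from the baseline.
    surface_fetched:  whether an origin hit from the tested surface was seen.
    """
    if not baseline_fetched:
        # Without a reachable route, a missing fetch says nothing about policy.
        return "unreachable_in_baseline"
    if expected_policy == "disallowed_by_robots_txt":
        if surface_fetched:
            return "ignored_robots_txt"
        # Absence of a fetch is consistent with compliance but does not prove it.
        return "consistent_with_robots_txt"
    return "fetched" if surface_fetched else "not_fetched_though_permitted"


# The local baseline itself, judged as if it were the tested surface.
print(judge(True, "disallowed_by_robots_txt", True))
print(judge(True, "allowed_by_robots_txt", True))
```

Applied to the local runner, the disallowed fixture is classified as `ignored_robots_txt`, which matches its role as a policy-naive control.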

## Limitations

- This is not a browser baseline and does not test JavaScript, subresources, or
  client capability beacons.
- This is not an external crawler or ChatGPT browsing result.
- The local runner records links but does not perform a crawl-depth or nofollow
  traversal test.
- Robots compliance was not evaluated. The run records the observed policy
  signals and whether the local runner fetched each URL; no policy engine was
  applied.

## Proposed Next Test

Run the same directive attempt packet through a confirmed ChatGPT browsing or
web-search surface, then compare model answers and direct-origin hits against
this local baseline.
