About

Search engines rank pages.
AI engines cite facts.

The AI crawler landscape is changing fast. AEO Pugmill exists to track it honestly and help WordPress sites prepare for it practically.

What the plugin does
Structuring content for AI crawlers

AEO Pugmill adds structured data and machine-readable endpoints to WordPress posts. Some outputs are served as separate URLs that bots can request independently. Others are embedded in the HTML of the page itself. The distinction matters for tracking — and for understanding the limits of what bot analytics can tell us.

Trackable endpoints

Outputs served at their own URLs

Each of these is a distinct resource a bot can request. When a crawler fetches one, the plugin logs the bot name, the resource type, and the date. That per-resource granularity is what makes the network dashboard possible — it shows which bots are requesting which content formats.

llms.txt
/llms.txt

A plain-text index of the site: title, description, and a list of posts and pages with summaries and links to their Markdown versions. It follows the llms.txt specification. AI crawlers can use it as a table of contents to decide which pages to fetch in full.

# My Site

> https://example.com

A site about distributed systems and infrastructure.

## Posts

- [Circuit Breakers in Practice](https://example.com/circuit-breakers): How circuit breaker patterns prevent cascading failures in microservice architectures.
  Markdown: https://example.com/circuit-breakers/?aeopugmill_llm=1

## Pages

- [About](https://example.com/about): Background on the author and site focus.
  Markdown: https://example.com/about/?aeopugmill_llm=1
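
To make the layout above concrete, here is a minimal Python sketch of how such an index could be assembled. It is not the plugin's own code (the plugin runs inside WordPress); the Entry type and the function names are hypothetical, and the Markdown link is assumed to be the page URL plus the aeopugmill_llm=1 parameter, as in the sample.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One post or page in the index (hypothetical shape)."""
    title: str
    url: str
    summary: str


def markdown_url(url: str) -> str:
    # Assumption: the Markdown rendering lives at <url>/?aeopugmill_llm=1.
    return url.rstrip("/") + "/?aeopugmill_llm=1"


def render_section(heading: str, entries: list[Entry]) -> list[str]:
    """Render one '## Heading' block with a two-line item per entry."""
    lines = [f"## {heading}", ""]
    for entry in entries:
        lines.append(f"- [{entry.title}]({entry.url}): {entry.summary}")
        lines.append(f"  Markdown: {markdown_url(entry.url)}")
    lines.append("")
    return lines


def render_llms_txt(title: str, site_url: str, description: str,
                    posts: list[Entry], pages: list[Entry]) -> str:
    """Build the full llms.txt body: header, then Posts, then Pages."""
    lines = [f"# {title}", "", f"> {site_url}", "", description, ""]
    lines += render_section("Posts", posts)
    lines += render_section("Pages", pages)
    return "\n".join(lines).rstrip() + "\n"
```

Calling render_llms_txt with one post and one page yields text in the same shape as the sample.
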
Tracked as: llms_txt. Each bot request is counted separately from HTML page visits.

Post Markdown
/your-post/?aeopugmill_llm=1

A structured Markdown rendering of a single post. Includes metadata (publish date, modified date, featured image), the AEO summary, entity list, Q&A pairs, keywords, and the full post content converted to Markdown. Gives AI crawlers a clean, parse-ready version of the content without HTML markup or theme chrome.

# Circuit Breakers in Practice

URL: https://example.com/circuit-breakers
Published: 2026-01-15T10:30:00Z
Modified: 2026-03-10T14:22:15Z

## Summary

Circuit breaker patterns prevent cascading failures by wrapping
remote calls in a state machine that trips open after repeated errors.

**Keywords:** circuit breaker, microservices, fault tolerance

## Entities

- Martin Fowler (Person) — Software author and ThoughtWorks chief scientist
- Hystrix (Technology) — Netflix's circuit breaker library

## Q&A

**Q: When should a circuit breaker trip open?**
After a configurable threshold of consecutive failures within
a rolling time window.

## Content

The full post body in Markdown...
Tracked as: post_markdown

Site summary
/?aeopugmill_llm=1

A Markdown overview of the site, served at the home URL with the aeopugmill_llm=1 parameter. It lists the five most recent posts with their summaries, and links to the full content indexes at /llms.txt and /llms-full.txt.

Tracked as: site_summary

Standalone JSON-LD
/aeo/your-post.jsonld

A standalone JSON-LD file containing the FAQPage schema, entity mentions, citations, and an associatedMedia link to the Markdown endpoint. Served only for posts that have AEO data. Gives bots direct access to the structured data without parsing the HTML page.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "When should a circuit breaker trip open?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "After a configurable threshold of consecutive failures..."
        }
      }]
    },
    {
      "@type": "Article",
      "headline": "Circuit Breakers in Practice",
      "mentions": [{
        "@type": "Person",
        "name": "Martin Fowler",
        "sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
      }],
      "citation": [{
        "@type": "WebPage",
        "url": "https://martinfowler.com/bliki/CircuitBreaker.html",
        "name": "CircuitBreaker — Martin Fowler"
      }],
      "associatedMedia": {
        "@type": "MediaObject",
        "encodingFormat": "text/markdown",
        "contentUrl": "https://example.com/circuit-breakers/?aeopugmill_llm=1"
      }
    }
  ]
}
Tracked as: aeo_jsonld

XML sitemap
/sitemap.xml

A standard XML sitemap with one addition: each post entry includes an xhtml:link alternate pointing to its Markdown endpoint. Bots that understand alternate links can discover the structured version without a separate crawl of /llms.txt.

<url>
  <loc>https://example.com/circuit-breakers</loc>
  <lastmod>2026-03-10</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
  <xhtml:link rel="alternate" type="text/markdown"
    href="https://example.com/circuit-breakers/?aeopugmill_llm=1"/>
</url>
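
For the xhtml:link element to be valid XML, the enclosing urlset has to declare the xhtml namespace alongside the sitemap namespace. The Python sketch below shows one way to emit such an entry with the standard library. The function name is hypothetical, the Markdown URL pattern is assumed from the sample above, and changefreq and priority are left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and xhtml with its usual prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs. Returns the sitemap as a string."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # Alternate link to the Markdown rendering (assumed URL pattern).
        ET.SubElement(url, f"{{{XHTML_NS}}}link", {
            "rel": "alternate",
            "type": "text/markdown",
            "href": loc.rstrip("/") + "/?aeopugmill_llm=1",
        })
    return ET.tostring(urlset, encoding="unicode")
```
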
Tracked as: sitemap

robots.txt additions
/robots.txt

The plugin appends a Sitemap directive and an LLMs-Txt directive to the WordPress-generated robots.txt. The LLMs-Txt line signals to AI crawlers that a structured content index is available.

Sitemap: https://example.com/sitemap.xml

# AI content index
LLMs-Txt: https://example.com/llms.txt
Tracked as: robots_txt

Embedded in HTML

Outputs that live inside the page

These outputs are injected into the HTML <head> or <body> of each post. They are present when any bot (or person) loads the page. There is no separate URL to request — the data rides along with the HTML.

This means bot analytics cannot distinguish whether a crawler looked at the FAQPage schema, the entity mentions, or the post text. The visit registers as a single HTML page request. The plugin tracks these visits under the html resource type (or aeo_post if the post has AEO metadata), but cannot attribute the visit to any specific embedded element.

The data is embedded this way because that is where it belongs. FAQPage schema is most effective when it is part of the page's own <script type="application/ld+json"> block, where search engines and AI crawlers expect to find it. Separating it into standalone files would reduce its utility for traditional search while providing no clear benefit for AI crawlers, which already parse the full page.

FAQPage JSON-LD
Embedded in <head>

Generated from the Q&A pairs stored in the post's AEO metadata. Each question becomes a Question node with an acceptedAnswer. This is the same schema format that Google uses for FAQ rich results, and AI crawlers can also parse it to extract question-answer pairs as citable facts.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [{
    "@type": "FAQPage",
    "@id": "https://example.com/circuit-breakers/#faqpage",
    "mainEntity": [{
      "@type": "Question",
      "name": "When should a circuit breaker trip open?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "After a configurable threshold of consecutive failures within a rolling time window."
      }
    }]
  }]
}
</script>
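
The mapping from stored Q&A pairs to this block is mechanical. As a rough illustration only, a Python sketch of it might look like the following. The function names are hypothetical, and the @id pattern is taken from the sample above rather than from the plugin source.

```python
import json


def faq_jsonld(post_url, qa_pairs):
    """Build the FAQPage graph from (question, answer) pairs."""
    node = {
        "@type": "FAQPage",
        "@id": post_url.rstrip("/") + "/#faqpage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return {"@context": "https://schema.org", "@graph": [node]}


def faq_script_tag(post_url, qa_pairs):
    """Wrap the JSON-LD in a script element suitable for the page head."""
    payload = json.dumps(faq_jsonld(post_url, qa_pairs), indent=2, ensure_ascii=False)
    # Escape "</" so text inside an answer cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```
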
Tracked as: html or aeo_post (not separately distinguishable from the page visit).

Entity mentions with sameAs
Embedded in <head>

Each entity stored in the post's AEO metadata becomes a typed mentions entry in the Article JSON-LD node. The sameAs URL links to an authoritative reference (typically Wikipedia or an official site), giving AI systems a way to disambiguate the entity — "Martin Fowler the software author" vs. any other Martin Fowler.

"mentions": [
  {
    "@type": "Person",
    "name": "Martin Fowler",
    "description": "Software author and ThoughtWorks chief scientist",
    "sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
  },
  {
    "@type": "SoftwareApplication",
    "name": "Hystrix",
    "sameAs": "https://github.com/Netflix/Hystrix"
  }
]
Tracked as: html or aeo_post

Citation JSON-LD
Embedded in <head>

External links in the post content are extracted and added as citation entries in the Article JSON-LD node. Each citation includes the URL and the link's anchor text. This gives AI systems a structured list of the post's sources without parsing the HTML body.

"citation": [
  {
    "@type": "WebPage",
    "url": "https://martinfowler.com/bliki/CircuitBreaker.html",
    "name": "CircuitBreaker — Martin Fowler"
  },
  {
    "@type": "WebPage",
    "url": "https://netflix.github.io/Hystrix/",
    "name": "Hystrix Wiki"
  }
]
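
A minimal Python sketch of the extraction step follows, using only the standard library. It assumes "external" means an absolute http(s) link whose host differs from the site's own host; the class and function names are hypothetical, and the plugin's actual rules (for example around subdomains or duplicate links) may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkCollector(HTMLParser):
    """Collects external <a href> links with their anchor text."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host.lower()
        self.citations = []
        self._href = None   # href of the external link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        host = (parsed.hostname or "").lower()
        # Assumption: external = absolute http(s) URL on a different host.
        if parsed.scheme in ("http", "https") and host and host != self.site_host:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = " ".join("".join(self._text).split())
            self.citations.append({"@type": "WebPage", "url": self._href, "name": name})
            self._href = None


def extract_citations(html, site_host):
    collector = ExternalLinkCollector(site_host)
    collector.feed(html)
    collector.close()
    return collector.citations
```
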
Tracked as: html or aeo_post

Meta tags
Embedded in <head>

The plugin injects description, Open Graph, and Twitter Card meta tags derived from the post's AEO summary (falling back to the WordPress excerpt if no summary exists). These are suppressed when an active SEO plugin is detected and the conflict toggle is enabled.

<meta name="description" content="Circuit breaker patterns prevent cascading failures by wrapping remote calls in a state machine.">
<meta property="og:title" content="Circuit Breakers in Practice">
<meta property="og:description" content="Circuit breaker patterns prevent cascading failures...">
<meta property="og:type" content="article">
<meta name="twitter:card" content="summary_large_image">
Tracked as: html or aeo_post

Bot analytics
How the data is captured and reported

On every incoming request, the plugin checks the user-agent string against a list of 25 known bot signatures — AI answer engines (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), training crawlers (CCBot, Bytespider, DeepSeekBot), search engines (Googlebot, Bingbot), and SEO tools (SemrushBot, AhrefsBot). Matching is case-insensitive substring comparison.
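
The matching step can be pictured with a short Python sketch. The table below holds only the eleven names mentioned above, not the plugin's full list of 25, and the function name is hypothetical.

```python
# Lowercased signature -> canonical bot name (partial list, for illustration).
BOT_SIGNATURES = {
    "gptbot": "GPTBot",
    "claudebot": "ClaudeBot",
    "perplexitybot": "PerplexityBot",
    "google-extended": "Google-Extended",
    "ccbot": "CCBot",
    "bytespider": "Bytespider",
    "deepseekbot": "DeepSeekBot",
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
    "semrushbot": "SemrushBot",
    "ahrefsbot": "AhrefsBot",
}


def match_bot(user_agent):
    """Return the canonical bot name for a user-agent string, or None."""
    ua = (user_agent or "").lower()
    for signature, name in BOT_SIGNATURES.items():
        # Case-insensitive substring comparison: both sides are lowercase here.
        if signature in ua:
            return name
    return None
```

Substring matching is cheap and tolerant of version suffixes, but it trusts whatever the client sends: a user-agent string is easy to spoof, so counts reflect claimed identity rather than verified identity.
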

When a match is found, the plugin records three things: the canonical bot name, the resource type requested (one of html, llms_txt, post_markdown, site_summary, sitemap, robots_txt, aeo_jsonld, or aeo_post), and the date. Counts are stored locally in a daily summary table — no per-request log is kept.
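
As a sketch of how a request could be mapped to one of those resource types and folded into a daily count, consider the Python below. The URL rules are inferred from the endpoint descriptions on this page, and the in-memory Counter stands in for the plugin's summary table; all names are hypothetical.

```python
from collections import Counter
from datetime import date
from urllib.parse import parse_qs, urlparse


def classify_resource(url, has_aeo_data=False):
    """Map a requested URL to a resource type (rules assumed from this page)."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    query = parse_qs(parsed.query)
    if path == "/llms.txt":
        return "llms_txt"
    if path == "/robots.txt":
        return "robots_txt"
    if path == "/sitemap.xml":
        return "sitemap"
    if path.startswith("/aeo/") and path.endswith(".jsonld"):
        return "aeo_jsonld"
    if query.get("aeopugmill_llm") == ["1"]:
        return "site_summary" if path == "/" else "post_markdown"
    # Plain page visit: embedded JSON-LD and meta tags are not separable here.
    return "aeo_post" if has_aeo_data else "html"


# (ISO date, bot name, resource type) -> visit count. No per-request rows.
daily_counts = Counter()


def record_visit(bot, url, day, has_aeo_data=False):
    key = (day.isoformat(), bot, classify_resource(url, has_aeo_data))
    daily_counts[key] += 1
```

Because only the aggregated key is stored, two visits by the same bot to different posts on the same day collapse into a single row with a count of two.
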

For content-bearing requests (HTML pages), the plugin also records content signals: word count bucket, content freshness, fact density, and URL depth. These are the metrics displayed in the Crawl Intelligence section of the network dashboard.

Network participation is opt-in. The plugin functions without contributing to the network. Enabling network intelligence sends a daily count summary — bot name, resource type, visit total, and content signal distributions. No URLs, no content, no user data.

The site identifier sent with each payload is a one-way SHA-256 hash of the site URL and an instance ID. It cannot be reversed to recover the domain. It exists to deduplicate contributions from the same site across days.
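
A Python sketch of such an identifier is shown below. The normalization, the separator, and the hex encoding are assumptions made for illustration; only the use of SHA-256 over the site URL and an instance ID comes from the description above.

```python
import hashlib


def site_identifier(site_url, instance_id):
    """Derive a stable, non-reversible identifier for a site."""
    # Assumed normalization: trim, lowercase, drop the trailing slash.
    normalized = site_url.strip().lower().rstrip("/")
    # Assumed layout: URL and instance ID joined with a separator.
    material = f"{normalized}|{instance_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

The instance ID matters here: domain names are guessable, so a hash of the URL alone could be checked against a list of candidate domains. Mixing in a value that only the site knows means the identifier cannot be confirmed that way.
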

Discovery
IndexNow pings

When a post is published or updated, the plugin sends an IndexNow ping to notify search engines that the URL has changed. The ping includes a verification key stored as a static file at the site root. Pings are throttled to one burst per 30 minutes to avoid rate-limiting.
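
The throttling can be sketched in Python as a small queue that releases at most one batch per 30-minute window. The JSON fields (host, key, keyLocation, urlList) and the shared endpoint follow the public IndexNow protocol; the class itself, the injectable clock, and the key file name are illustrative assumptions rather than the plugin's implementation. The sketch builds the request body only and leaves sending it to the caller.

```python
import json
import time
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"
THROTTLE_SECONDS = 30 * 60


class IndexNowQueue:
    """Collects changed URLs and releases them as one burst per window."""

    def __init__(self, key, clock=time.time):
        self.key = key
        self.clock = clock      # injectable so the throttle can be tested
        self.pending = []
        self.last_sent = None

    def enqueue(self, url):
        if url not in self.pending:
            self.pending.append(url)

    def next_burst(self):
        """Return the JSON body for one burst, or None if throttled or empty."""
        if not self.pending:
            return None
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < THROTTLE_SECONDS:
            return None
        host = urlparse(self.pending[0]).netloc
        body = {
            "host": host,
            "key": self.key,
            # Assumption: the key file sits at the site root as <key>.txt.
            "keyLocation": f"https://{host}/{self.key}.txt",
            "urlList": list(self.pending),
        }
        self.pending.clear()
        self.last_sent = now
        return json.dumps(body)
```

A caller would POST the returned body to INDEXNOW_ENDPOINT with a JSON content type. URLs queued during a throttled window are not lost; they go out together in the next burst.
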

IndexNow is supported by Bing, Yandex, and other participating search engines. Google does not currently participate in the IndexNow protocol but receives a separate sitemap ping.

Contact
Built by Janzen Works

AEO Pugmill is built by Janzen Works. The plugin is free and available to download directly. The network intelligence dashboard is open to anyone at aeopugmill.com.

Feedback, bug reports, and data questions: support@aeopugmill.com