# Structured Data & Technical GEO

> The technical side of GEO: JSON-LD schemas, llms.txt, AI crawler management and the Markdown mirror — the last one as this site's lived experiment.

Source: https://www.jpkc.com/db/en/blog/structured-data-technical-geo/

Technical GEO makes sure a machine can read, attribute and cite your content cleanly, before phrasing even enters the picture. AI engines extract and attribute structured, machine-readable content far more easily than unstructured blocks of text. I cover two levels here: the data markup (JSON-LD, tables, `llms.txt`) and the technical foundation (crawler management, Markdown alternate, server-side rendering). The Markdown part comes from first-hand experience, because this very site is the experiment. The framing comes from the GEO pillar [What is GEO?](https://www.jpkc.com/db/en/blog/was-ist-geo/).

## Structured data: JSON-LD and machine-readable formats

Structured data tells a machine explicitly what content means — instead of forcing it to guess from prose. For GEO, these building blocks matter most. An important caveat: Google declares Schema *not required* for its own AI search (see [Google's AI Optimization Guide](https://www.jpkc.com/db/en/blog/google-ai-optimization/)), but for other AI engines it remains a useful signal.

| Building block | Purpose |
| --- | --- |
| `FAQPage`, `HowTo` | Make question-and-answer and step-by-step content directly extractable |
| `Article` | Mark up articles with author, date and `dateModified` |
| `Review`, `Product` | Give reviews and product data explicit attributes |
| `Organization`, `Person` | Make visible who is behind the content — the basis for E-E-A-T and attribution |
| Tables with `<th>` | Correctly marked-up HTML tables are machine-readable and highly quotable |
| Definition lists | `<dl>`/`<dt>`/`<dd>` for glossaries and key-value explanations |
| `llms.txt` | A Markdown file at the domain root describing the site's purpose and structure for AI |

The `Organization` and `Person` schemas are the underrated part: AI engines can use them to check experience, expertise, authoritativeness and trustworthiness (E-E-A-T) and to attribute a citation to the right source. Publish anonymously and you forfeit exactly that attribution.
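
To make the attribution idea concrete, here is a minimal sketch that builds an `Article` object with nested `Person` and `Organization` entries and wraps it in a JSON-LD script tag. The function names and the sample values in the usage note (author, publisher, dates) are placeholders I made up for illustration; the property names are standard schema.org vocabulary.

```python
import json


def article_jsonld(headline, url, author_name, publisher_name,
                   date_published, date_modified):
    """Build a schema.org Article with Person and Organization attribution."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": date_published,
        "dateModified": date_modified,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


def to_script_tag(data):
    """Serialize a JSON-LD object for embedding in the page head."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape, so the payload still parses unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

A call such as `to_script_tag(article_jsonld("Structured Data & Technical GEO", "https://www.jpkc.com/db/en/blog/structured-data-technical-geo/", "Example Author", "Example Publisher", "2025-01-01", "2025-02-01"))` yields a tag you can paste into a template. Only the headline and URL come from this page; the other values are hypothetical.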

## The Markdown mirror: this site's experiment

The most powerful technical lever is a clean Markdown version of every page — and that is precisely what I run here. AI agents extract content far more reliably from Markdown than from rendered HTML, because navigation, ads and styling noise are gone. There are two ways to offer it, and this site uses both:

1. **Static `.md` mirror plus a `<head>` link** — every page here has a Markdown counterpart and a `<link rel="alternate" type="text/markdown" href="…">` in the head. Crawlers that don't negotiate still find the content this way.
2. **Content negotiation over HTTP** — when an agent sends `Accept: text/markdown`, the server responds with `Content-Type: text/markdown` instead of HTML.
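
The negotiation step can be sketched as a small, framework-independent function. This is an illustration under my own assumptions, not the code running on this site: it serves Markdown only when the client names `text/markdown` explicitly and weights it at least as high as HTML, and it ignores finer points of `Accept` parsing such as media-type parameters other than `q`.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    return entries


def choose_content_type(accept_header):
    """Prefer Markdown only if it is requested explicitly and not ranked below HTML."""
    weights = dict(parse_accept(accept_header or ""))
    md_q = weights.get("text/markdown", 0.0)
    html_q = weights.get("text/html",
                         weights.get("text/*", weights.get("*/*", 0.0)))
    if md_q > 0 and md_q >= html_q:
        return "text/markdown"
    return "text/html"


def response_headers(accept_header):
    """Headers for the negotiated response, including Vary for caches."""
    content_type = choose_content_type(accept_header)
    return {
        "Content-Type": f"{content_type}; charset=utf-8",
        "Vary": "Accept",
    }
```

The `Vary: Accept` header matters as soon as a cache or CDN sits in front of the server: without it, a cached Markdown response could be handed to a browser, or the other way round.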

I run this on this knowledge platform under `/db/` as a deliberate experiment: every article, including this one, is also stored natively as Markdown so AI systems can access the content without noise. You can try it yourself by opening the Markdown variant of this page. For me this is not a theoretical tip but lived practice, and the basis on which I judge what works.
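
If you want to check programmatically that a page advertises its Markdown variant, the following sketch shows how a client might discover the `<link rel="alternate" type="text/markdown">` element using only the Python standard library. The class and function names are mine, and the example URL in the usage note is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkdownAlternateFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        is_markdown = attributes.get("type", "").lower() == "text/markdown"
        if "alternate" in rels and is_markdown and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_markdown_alternates(html, base_url):
    """Return absolute URLs of all Markdown alternates declared in the HTML."""
    finder = MarkdownAlternateFinder()
    finder.feed(html)
    # Resolve relative hrefs against the URL the page was fetched from.
    return [urljoin(base_url, href) for href in finder.hrefs]
```

Feed it the HTML of any page together with the URL it was fetched from, for example `find_markdown_alternates(html, "https://example.com/blog/post/")`. An empty list means the page declares no Markdown alternate in its markup.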

## Technical foundation: crawlers, rendering, URLs

For AI to reach your content at all, the technical setup has to be right. Four points are decisive here.

- **AI crawler management**: the common AI user-agent tokens are `GPTBot`, `OAI-SearchBot`, `Google-Extended`, `ClaudeBot`, `PerplexityBot`, `CCBot` and `Bytespider`. Note that `Google-Extended` is a `robots.txt` control token rather than a separate crawler. You allow or block each token selectively in `robots.txt`. A deliberate decision matters more here than blanket blocking or allowing.
- **Server-side rendering**: AI crawlers struggle with JavaScript-only pages. Critical content must be in the initial HTML response.
- **Fast load times**: slow sites risk being skipped or crawled only partially. Optimize Time to First Byte, enable compression, use a CDN.
- **Clean URLs and internal linking**: logical, descriptive URLs aid topical categorization, and a strong internal link structure makes your topical authority visible.
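
A selective crawler policy is easy to get wrong, so it helps to test it before deploying. The sketch below embeds one possible `robots.txt` and checks it with Python's standard `urllib.robotparser`. The split into "search" and "training" bots is my illustrative assumption, not a recommendation from this article; verify each vendor's documentation before adopting it.

```python
from urllib.robotparser import RobotFileParser

# Example policy (an assumption for illustration): allow search-oriented
# bots, block crawlers that are mainly associated with training data.
ROBOTS_TXT = """\
# Search and answer bots: allowed
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

# Training-oriented crawlers: blocked
User-agent: GPTBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /

# Everyone else: allowed
User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Map each user-agent token to whether it may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

With this policy, `allowed_agents(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "ClaudeBot"], "https://example.com/")` reports `GPTBot` as blocked, while `OAI-SearchBot` and the unlisted `ClaudeBot` are allowed through the wildcard group.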

`llms.txt` rounds this off: a Markdown file at the domain root describing your site, its key content areas and how you'd prefer to be cited. It is still experimental but gaining adoption. The proposal and its current state are documented at llmstxt.org.
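
As a rough sketch of what such a file can look like, the helper below renders the layout described in the llmstxt.org proposal as I understand it: an H1 title, a blockquote summary, optional free text, and H2 sections containing link lists. The function name and its parameters are my own, and any title or summary you pass in is your own choice.

```python
def render_llms_txt(title, summary, sections, notes=""):
    """Render a minimal llms.txt document as a Markdown string.

    sections maps a heading to a list of (name, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    if notes:
        # Free-text paragraph, e.g. a note on preferred citation.
        lines += [notes, ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, description in links:
            lines.append(f"- [{name}]({url}): {description}")
        lines.append("")
    # Exactly one trailing newline at the end of the file.
    return "\n".join(lines).rstrip() + "\n"
```

For example, `render_llms_txt("Example Site", "Articles on GEO.", {"Blog": [("What is GEO?", "https://www.jpkc.com/db/en/blog/was-ist-geo/", "The GEO pillar article")]})` produces a file you could serve at `/llms.txt`. The title and summary in that call are placeholders.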

## FAQ

### Do I need Schema markup for GEO?

It depends on the engine. Google explicitly declares Schema not required for its own AI search, because that search draws on the classic search index. For Perplexity, ChatGPT, Claude and other engines, structured markup remains a useful signal, and correct markup does no harm. My recommendation: use `Organization`/`Person` and matching content types like `Article` or `FAQPage` wherever they correctly describe the content anyway.

### What is a Markdown mirror and do I need one?

A Markdown mirror is a clean Markdown version of your page without navigation, ads and styling noise — offered via `<link rel="alternate" type="text/markdown">` or HTTP content negotiation. AI agents extract from it more reliably than from HTML. It isn't mandatory, but it's a clear advantage. I run it on this site for every article and can confirm the difference.

### Should I block or allow AI crawlers?

That's a deliberate decision, not a default. If you want to be cited in AI answers, allow the relevant crawlers like `GPTBot`, `OAI-SearchBot`, `Google-Extended`, `ClaudeBot` and `PerplexityBot`. If you want to protect content from training but stay visible in search, differentiate between training and search bots in `robots.txt`. Blanket blocking costs visibility.

## Further reading

The framing is the GEO pillar [What is GEO?](https://www.jpkc.com/db/en/blog/was-ist-geo/). How Google views Schema and `llms.txt` is clarified in [Google's AI Optimization Guide](https://www.jpkc.com/db/en/blog/google-ai-optimization/). How to phrase the content to be cited is in [Writing for AI](https://www.jpkc.com/db/en/blog/schreiben-fuer-ki/); how AI breaks questions into sub-queries is in [Multi-Turn and Query Fan-Out](https://www.jpkc.com/db/en/blog/multi-turn-query-fan-out/). Check the technical state of your site with the [SEO & GEO Analyzer](https://www.jpkc.com/db/en/tools/seo/).

