# robots.txt & Sitemap — Examples

> Hands-on walkthroughs with robots.txt & Sitemap: build a WordPress robots.txt, block AI crawlers, generate a sitemap, and check an existing site's files live.

Source: https://www.jpkc.com/db/en/tools/robots-sitemap/examples/

Back to the overview: [robots.txt & Sitemap](https://www.jpkc.com/db/en/tools/robots-sitemap/) · Open the tool: [www.jpkc.com/tools/robots-sitemap/](https://www.jpkc.com/tools/robots-sitemap/)

The [manual](https://www.jpkc.com/db/en/tools/robots-sitemap/manual/) explains every tab and option in detail. This page adds **concrete workflows**: typical tasks, walked through step by step. The tool's interface is in English, so tab and button names appear exactly as you'll see them.

## Example 1: Build a robots.txt for WordPress

The classic — a clean `robots.txt` for a typical WordPress site.

1. Open the [tool](https://www.jpkc.com/tools/robots-sitemap/), go to the **Examples** tab, and click **Load into Generator** on *WordPress*. The form jumps to the **robots.txt** tab and is filled with the WordPress template.
2. Look at the `User-agent: *` block: it blocks `/wp-admin/` but explicitly frees `/wp-admin/admin-ajax.php` via `Allow` (WordPress needs that file for front-end features), and it blocks `/wp-includes/`, `/wp-content/plugins/`, `/wp-content/cache/`, and the query URLs `/?s=` (internal search) and `/?p=` (post-ID links).
3. **Adjust the sitemap URL.** The template sets `https://example.com/sitemap_index.xml`, the file name the major WordPress SEO plugins use. Replace `example.com` with your real domain.
4. **Check the preview on the right**, then **Copy** or **Download**. Upload the file so it's reachable at `https://your-domain.com/robots.txt`.

Important: never block `/wp-content/` wholesale. It also holds your images and often the CSS/JS files Google needs to render your pages. The template deliberately blocks only `plugins/` and `cache/`.
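To sanity-check rules like these outside the tool, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The file content is our reconstruction of the rules listed in step 2, not the tool's verbatim output, and `example.com` remains a placeholder. One caveat: Python's parser applies the *first* matching rule in a group, while Google applies the most specific (longest) one, so the sketch lists the `Allow` line before `Disallow: /wp-admin/`; with that order both interpretations agree. The checks stick to plain path rules.

```python
from urllib.robotparser import RobotFileParser

# Our reconstruction of the rules from step 2 (not the tool's verbatim output).
# Allow is listed before Disallow: RobotFileParser applies the first matching
# rule, Google the longest one, and with this order both give the same answer.
WORDPRESS_ROBOTS = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /?s=
Disallow: /?p=

Sitemap: https://example.com/sitemap_index.xml
"""


def load(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() also marks the rules as loaded, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = load(WORDPRESS_ROBOTS)
    for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php",
                 "/wp-content/uploads/logo.png", "/wp-content/plugins/x/app.js"):
        ok = rp.can_fetch("Googlebot", "https://example.com" + path)
        print(f"{path:32} {'allowed' if ok else 'blocked'}")
    print("Sitemaps:", rp.site_maps())
```

Uploads stay crawlable while plugin assets are blocked, which is exactly the distinction the note above draws.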

## Example 2: Block AI crawlers, allow search engines

You want to stay visible in classic search engines, but not have AI crawlers scrape your content for training.

1. In the **Examples** tab click **Load into Generator** on *Block AI & Scraper Bots*.
2. The template adds an open `User-agent: *` block with an empty `Disallow:` (everyone allowed) — and below it a dedicated block with `Disallow: /` for the AI crawlers `GPTBot`, `ChatGPT-User`, `OAI-SearchBot`, `CCBot`, `anthropic-ai`, `Claude-Web`, `Bytespider`, and `PerplexityBot`.
3. **Add missing bots as needed.** Via **Add User-agent Block** you can add more AI crawlers — the autocomplete field suggests, among others, `ClaudeBot`, `Google-Extended`, `Applebot-Extended`, `meta-externalagent`, and `cohere-ai`. Set a `Disallow: /` rule in each new block.
4. **Copy/Download**, upload, done.

Two honest notes: first, not every bot respects `robots.txt` — it's a voluntary policy, not a technical barrier. Second, blocking `Google-Extended` (Gemini training) does **not** exclude Google's normal search; that keeps running via `Googlebot`. Which bots serve which purpose is in the tool's **Reference** table.
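Step 3 is mechanical enough to script. A minimal Python sketch, assuming you keep your own list of bot tokens (below: the template's eight plus `ClaudeBot` and `Google-Extended` from the autocomplete, an illustrative choice), builds the same two-group structure and checks it with the standard library's `urllib.robotparser`. `build_robots` and `is_allowed` are hypothetical names. The stdlib parser matches user-agent names loosely (by substring), so treat it as a quick sanity check rather than a reproduction of any search engine's logic.

```python
from urllib.robotparser import RobotFileParser

# The eight tokens from the "Block AI & Scraper Bots" template, plus two of the
# autocomplete suggestions from step 3 (illustrative; extend as needed).
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "CCBot", "anthropic-ai",
    "Claude-Web", "Bytespider", "PerplexityBot", "ClaudeBot", "Google-Extended",
]


def build_robots(ai_bots: list[str]) -> str:
    """Open wildcard group first, then one group that shuts out every listed bot."""
    lines = ["User-agent: *", "Disallow:", ""]          # empty Disallow = nothing blocked
    lines += [f"User-agent: {bot}" for bot in ai_bots]  # consecutive agents share a group
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = build_robots(AI_BOTS)
    print(robots)
    for bot in ("Googlebot", "Bingbot", "GPTBot", "Google-Extended"):
        print(bot, is_allowed(robots, bot, "https://example.com/blog/"))
```

The last two checks mirror the second note: `Google-Extended` is shut out while `Googlebot` keeps full access.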

## Example 3: Build an XML sitemap by hand

For a smaller site without a sitemap plugin, you build the file right in the tool.

1. Switch to the **Sitemap** tab. Enter your domain at the top in **Base URL** (e.g. `https://your-domain.com`).
2. Via **Add URL** add one row per page. Per row: the **Path** (e.g. `/`, `/services/`, `/contact/`), an optional **Last Modified** date, a **Freq.**, and a **Priority**.
3. **Set priorities relative to each other.** A common scheme: `1.0` for the homepage, `0.8`–`0.9` for important section pages, `0.5` for static subpages. Priority ranks pages *within* your site; it is not a ranking factor against other websites.
4. **Only set `lastmod` when you actually know the date.** A made-up modification date does more harm than good, so if you don't know it, leave the field empty.
5. Drag rows by their grip handle to put them in the order you want. Then **Copy** or **Download** and place the file as `sitemap.xml` in your web root.
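If you'd rather script the same file, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. `Page` and `build_sitemap` are hypothetical names, and the dates and paths are placeholders. The output follows the sitemaps.org `urlset` format with the four fields the tool's rows offer, and it omits `lastmod` whenever you leave it empty, as step 4 recommends. `ET.indent` needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    path: str
    lastmod: Optional[str] = None      # W3C date such as "2024-01-15"; None if unknown
    changefreq: Optional[str] = None   # e.g. "monthly"
    priority: Optional[float] = None   # 0.0 to 1.0, relative within the site


def build_sitemap(base_url: str, pages: list[Page]) -> str:
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + page.path
        if page.lastmod:                   # leave lastmod out rather than guess
            ET.SubElement(url, "lastmod").text = page.lastmod
        if page.changefreq:
            ET.SubElement(url, "changefreq").text = page.changefreq
        if page.priority is not None:
            ET.SubElement(url, "priority").text = f"{page.priority:.1f}"
    ET.indent(urlset)                      # pretty-print the nested elements
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap("https://your-domain.com", [
        Page("/", priority=1.0),
        Page("/services/", changefreq="monthly", priority=0.8),
        Page("/contact/", priority=0.5),
    ]))
```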

Tip: submit the sitemap afterwards in the [Google Search Console](https://search.google.com/search-console) and the [Bing Webmaster Tools](https://www.bing.com/webmasters) — the note under the editor reminds you of this too.

## Example 4: Check a foreign domain's robots.txt

You want to know how an existing site (your own or a competitor's) steers its crawlers.

1. Go to the **Check robots.txt** tab, enter a domain (e.g. `example.com` — `/robots.txt` is appended automatically), and click **Check**.
2. **Read the Per-Bot Access table first.** It shows `Allowed` or `Blocked` at a glance for 40-plus known bots. The **Source** column reveals where the decision comes from: `Specific` (a dedicated block for exactly this bot), `Wildcard *` (via the `User-agent: *` block), or `Default` (no matching rule — so allowed).
3. **Watch the AI rows** (type `AI`): does the site lock out `GPTBot`, `ClaudeBot`, `PerplexityBot`, and similar crawlers? That's exactly the information you also need for your own GEO (generative engine optimization) strategy.
4. **Test a specific URL.** In the *Test a URL against these rules* block you enter a path (e.g. `/blog/my-article`) and a user-agent (e.g. `Googlebot`) — the tool tells you whether the file allows or blocks access and which rule applies.
5. **Sitemaps declared** lists the sitemaps referenced in the file. The **check** button next to each one takes you straight to the sitemap check (see the next example).
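The per-URL verdict in step 4 rests on matching logic you can sketch in a few lines of Python. This follows RFC 9309 as Google applies it: use the group for the exact user-agent token if there is one, else the `*` group; among the matching rules, the longest pattern wins, and `Allow` wins a tie. That is why, in Example 1, `Allow: /wp-admin/admin-ajax.php` beats the shorter `Disallow: /wp-admin/`. It is an illustration, not the tool's actual code: `check_url`, `parse_groups`, and `Rule` are hypothetical names, and the returned group only roughly corresponds to the tool's **Source** column. Unlike the standard library's `urllib.robotparser`, it understands the `*` and `$` wildcards.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    allow: bool
    pattern: str


def parse_groups(robots_txt: str) -> dict[str, list[Rule]]:
    """Map each lower-cased user-agent token to its rules (simplified parsing)."""
    groups: dict[str, list[Rule]] = {}
    agents: list[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:                          # agent line after rules opens a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])   # repeated groups for one agent merge
        elif key in ("allow", "disallow") and agents:
            seen_rule = True
            if value:                              # "Disallow:" with no path restricts nothing
                for agent in agents:
                    groups[agent].append(Rule(key == "allow", value))
    return groups


def pattern_matches(pattern: str, path: str) -> bool:
    """'*' matches any character sequence; a trailing '$' anchors the end of the path."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, path) is not None


def check_url(robots_txt: str, user_agent: str, path: str) -> tuple[bool, Optional[str], Optional[Rule]]:
    """Return (allowed, group used, deciding rule); path may include a query string."""
    groups = parse_groups(robots_txt)
    token = user_agent.lower()
    group = token if token in groups else ("*" if "*" in groups else None)
    if group is None:
        return True, None, None                    # no group applies: allowed
    matches = [r for r in groups[group] if pattern_matches(r.pattern, path)]
    if not matches:
        return True, group, None                   # group applies, but no rule matches
    # Most specific (longest) pattern wins; on a tie, Allow beats Disallow.
    best = max(matches, key=lambda r: (len(r.pattern), r.allow))
    return best.allow, group, best


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /wp-admin/\nAllow: /wp-admin/admin-ajax.php\n"
    print(check_url(robots, "Googlebot", "/wp-admin/admin-ajax.php"))
```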

## Example 5: Validate a sitemap and check spec limits

You want to make sure your `sitemap.xml` is clean and complete.

1. In the **Check Sitemap** tab, enter a domain or full sitemap URL and click **Check**. (Or you arrived here directly via the **check** button from the robots.txt check.)
2. **Watch the spec warnings.** The tool warns as soon as the file exceeds **50,000 URLs** or **50 MB**. At that point you have to split it into multiple sitemaps tied together by an index file.
3. **Read the metadata coverage table.** Its progress bars show what percentage of URLs set `lastmod`, `changefreq`, and `priority`. Low `lastmod` coverage isn't inherently bad, but if you want to signal freshness, there's room for improvement here.
4. The **URLs** table shows the first 100 entries. **Copy all URLs** puts the complete list (including everything past the first 100) on your clipboard, which is handy for reconciling it with your CMS.
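To reproduce these checks on a local copy, here is a minimal Python sketch. It assumes an uncompressed `urlset` file already loaded as bytes (a `.xml.gz` sitemap would need `gzip.decompress` first); `analyze_sitemap` is a hypothetical name, and the limits are the 50,000-URL and 50 MB figures from step 2. For files from sites you don't control, the Python documentation's notes on XML vulnerabilities are worth reading before parsing with `xml.etree`.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000                  # sitemaps.org limit per file
MAX_BYTES = 50 * 1024 * 1024       # 50 MB = 52,428,800 bytes, uncompressed


def analyze_sitemap(data: bytes) -> dict:
    """Count URLs, flag spec-limit violations, and measure metadata coverage."""
    root = ET.fromstring(data)
    urls = root.findall(f"{NS}url")
    warnings = []
    if len(urls) > MAX_URLS:
        warnings.append(f"{len(urls):,} URLs exceed the {MAX_URLS:,} limit; split the file")
    if len(data) > MAX_BYTES:
        warnings.append(f"{len(data):,} bytes exceed the 50 MB limit; split the file")
    coverage = {}
    for field in ("lastmod", "changefreq", "priority"):
        # Count a field only if the element exists and has non-empty text.
        present = sum(1 for url in urls if url.findtext(f"{NS}{field}"))
        coverage[field] = round(100 * present / len(urls), 1) if urls else 0.0
    locs = [(url.findtext(f"{NS}loc") or "").strip() for url in urls]
    return {"count": len(urls), "warnings": warnings, "coverage": coverage, "locs": locs}
```

The `locs` list plays the role of **Copy all URLs**: it holds every entry, not just the first 100.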

## Example 6: Break down a sitemap index file

Large sites spread their URLs across multiple sitemaps bundled by an index file.

1. In the **Check Sitemap** tab, enter the URL of the index file (often `…/sitemap_index.xml`) and click **Check**.
2. The tool automatically detects that this is a **sitemap index file** and lists the **child sitemaps** with their optional `lastmod`.
3. **Drill into individual child sitemaps.** Next to each is a **Check** button that loads that file and analyzes it in detail (URL count, metadata coverage, spec warnings).

That way you work from the index down to the individual URL — without having to open the files yourself. (To build a sitemap, by the way, use the generator from [Example 3](https://www.jpkc.com/db/en/tools/robots-sitemap/examples/#example-3-build-an-xml-sitemap-by-hand); the generator's import button deliberately handles only individual sitemaps, not index files.)
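The same top-down walk can be sketched in Python. The root element tells the two file types apart (`sitemapindex` versus `urlset`), and drilling down means fetching each `<loc>` from the index. `classify_sitemap` and `url_counts` are hypothetical names; `fetch` is any callable that returns the raw bytes for a URL, for example a small wrapper around `urllib.request.urlopen`. The sketch goes exactly one level deep and does not follow a child that is itself an index.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def classify_sitemap(data: bytes):
    """Return ("index", [(loc, lastmod), ...]) or ("urlset", url_count)."""
    root = ET.fromstring(data)
    if root.tag == f"{NS}sitemapindex":
        children = [((sm.findtext(f"{NS}loc") or "").strip(), sm.findtext(f"{NS}lastmod"))
                    for sm in root.findall(f"{NS}sitemap")]
        return "index", children
    if root.tag == f"{NS}urlset":
        return "urlset", len(root.findall(f"{NS}url"))
    raise ValueError(f"not a sitemap: root element is {root.tag!r}")


def url_counts(url: str, fetch: Callable[[str], bytes]) -> dict[str, Optional[int]]:
    """Drill from an index one level down and count the URLs in each child sitemap."""
    kind, info = classify_sitemap(fetch(url))
    if kind == "urlset":
        return {url: info}
    counts: dict[str, Optional[int]] = {}
    for child_url, _lastmod in info:
        child_kind, child_info = classify_sitemap(fetch(child_url))
        # A child that is itself an index is not followed further in this sketch.
        counts[child_url] = child_info if child_kind == "urlset" else None
    return counts
```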

---

There's more depth: the [overview](https://www.jpkc.com/db/en/tools/robots-sitemap/) for the big picture, the [manual](https://www.jpkc.com/db/en/tools/robots-sitemap/manual/) for every option, and the [tips & tricks](https://www.jpkc.com/db/en/tools/robots-sitemap/tips/) for strategy and pitfalls. You can try all of it directly in the [tool](https://www.jpkc.com/tools/robots-sitemap/).

