robots.txt & Sitemap — Examples

Hands-on walkthroughs with robots.txt & Sitemap: build a WordPress robots.txt, block AI crawlers, generate a sitemap, and check any live site's file.

Back to the overview: robots.txt & Sitemap · Open the tool: www.jpkc.com/tools/robots-sitemap/

The manual explains every tab and option in detail. This page adds concrete workflows: typical tasks, played through step by step. The tool's interface is in English, so tab and button names appear exactly as you'll see them.

Example 1: Build a robots.txt for WordPress

The classic — a clean robots.txt for a typical WordPress site.

  1. Open the tool, go to the Examples tab, and click Load into Generator on WordPress. The form jumps to the robots.txt tab and is filled with the WordPress template.
  2. Look at the User-agent: * block: it blocks /wp-admin/ but explicitly re-allows /wp-admin/admin-ajax.php via Allow (WordPress needs that for front-end features). It also blocks /wp-includes/, /wp-content/plugins/, /wp-content/cache/, and the query-string URLs /?s= (internal search) and /?p= (post-ID links).
  3. Adjust the sitemap URL. The template sets https://example.com/sitemap_index.xml (the typical name with the major WordPress SEO plugins). Enter your real domain.
  4. Check the preview on the right, then Copy or Download. Upload the file so it's reachable at https://your-domain.com/robots.txt.

Important: never block /wp-content/ wholesale — it also holds your images and often CSS/JS that Google needs to render. The template deliberately blocks only plugins/ and cache/.
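
For orientation, here is a minimal Python sketch of the kind of file this template produces. The rule list follows the description above; the directive order, the function name build_robots_txt, and the exact layout of the tool's output are assumptions made for illustration.

```python
# Hypothetical rule set mirroring the WordPress template described above.
# The order and formatting of the real tool's output are assumptions.
WORDPRESS_RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/wp-includes/"),
    ("Disallow", "/wp-content/plugins/"),
    ("Disallow", "/wp-content/cache/"),
    ("Disallow", "/?s="),
    ("Disallow", "/?p="),
]


def build_robots_txt(rules, sitemap_url, user_agent="*"):
    """Render one user-agent group plus a Sitemap line as robots.txt text."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"{directive}: {path}" for directive, path in rules]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Replace the placeholder domain with your real one.
    print(build_robots_txt(WORDPRESS_RULES, "https://your-domain.com/sitemap_index.xml"))
```

Note that /wp-content/ itself never appears as a rule, only its plugins/ and cache/ subfolders.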

Example 2: Block AI crawlers, allow search engines

You want to stay visible in classic search engines, but not have AI crawlers scrape your content for training.

  1. In the Examples tab click Load into Generator on Block AI & Scraper Bots.
  2. The template adds an open User-agent: * block with an empty Disallow: (everyone allowed) — and below it a dedicated block with Disallow: / for the AI crawlers GPTBot, ChatGPT-User, OAI-SearchBot, CCBot, anthropic-ai, Claude-Web, Bytespider, and PerplexityBot.
  3. Add missing bots as needed. Via Add User-agent Block you can add more AI crawlers — the autocomplete field suggests, among others, ClaudeBot, Google-Extended, Applebot-Extended, meta-externalagent, and cohere-ai. Set a Disallow: / rule in each new block.
  4. Copy/Download, upload, done.

Two honest notes: first, not every bot respects robots.txt — it's a voluntary policy, not a technical barrier. Second, blocking Google-Extended (Gemini training) does not exclude Google's normal search; that keeps running via Googlebot. Which bots serve which purpose is in the tool's Reference table.
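
The effect of this template can be checked offline with Python's standard-library urllib.robotparser. The sketch below is an illustration only: the bot names come from the list above, while the layout (one shared group with several User-agent lines) and the helper names are assumptions, not the tool's actual output.

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from the template description; grouping them under a
# single Disallow rule is an assumption about how the file is laid out.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "CCBot",
    "anthropic-ai", "Claude-Web", "Bytespider", "PerplexityBot",
]


def build_ai_block_robots(bots):
    """Open wildcard group first, then one group that blocks every listed bot."""
    lines = ["User-agent: *", "Disallow:", ""]
    lines += [f"User-agent: {bot}" for bot in bots]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Evaluate a URL against robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = build_ai_block_robots(AI_BOTS)
    for agent in ("Googlebot", "GPTBot", "ClaudeBot"):
        print(agent, is_allowed(robots, agent, "https://example.com/blog/"))
```

ClaudeBot is included in the loop on purpose: it is not part of the template's list, so it falls through to the open wildcard group until you add a block for it as described in step 3.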

Example 3: Build an XML sitemap by hand

For a smaller site without a sitemap plugin, you build the file right in the tool.

  1. Switch to the Sitemap tab. Enter your domain at the top in Base URL (e.g. https://your-domain.com).
  2. Via Add URL add one row per page. Per row: the Path (e.g. /, /services/, /contact/), an optional Last Modified date, a Freq., and a Priority.
  3. Set priority relatively. Common: 1.0 for the homepage, 0.8–0.9 for important section pages, 0.5 for static subpages. It's a ranking within your site, not a ranking factor against other websites.
  4. Only set lastmod when you actually know the date; otherwise leave the field empty. A made-up modification date does more harm than good.
  5. Drag rows into the order you want by the grip handle. Then Copy or Download and place the file as sitemap.xml in your web root.

Tip: submit the sitemap afterwards in the Google Search Console and the Bing Webmaster Tools — the note under the editor reminds you of this too.
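
If you would rather script the same file, the rows from the steps above map directly onto the sitemap XML format. This is a minimal sketch using Python's standard library; the function name build_sitemap, the row layout, and the sample date are placeholders for illustration, not something the tool prescribes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, rows):
    """Build a <urlset> document from rows of (path, lastmod, changefreq, priority).

    lastmod may be None; the element is then left out entirely.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path, lastmod, changefreq, priority in rows:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder rows; the date is an example value, not a recommendation.
ROWS = [
    ("/", "2024-05-01", "weekly", 1.0),
    ("/services/", None, "monthly", 0.8),
    ("/contact/", None, "yearly", 0.5),
]

if __name__ == "__main__":
    print(build_sitemap("https://your-domain.com", ROWS))
```

Rows without a known modification date simply get no lastmod element, which matches the advice in step 4.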

Example 4: Check any domain's robots.txt

You want to know how an existing site (your own or a competitor's) directs crawlers.

  1. Go to the Check robots.txt tab, enter a domain (e.g. example.com; /robots.txt is appended automatically), and click Check.
  2. Read the Per-Bot Access table first. It shows Allowed or Blocked at a glance for 40-plus known bots. The Source column reveals where the decision comes from: Specific (a dedicated block for exactly this bot), Wildcard * (via the User-agent: * block), or Default (no matching rule — so allowed).
  3. Watch the AI rows (type AI): does the site lock out GPTBot, ClaudeBot, PerplexityBot and co.? That's exactly the information you also need for your own GEO strategy.
  4. Test a specific URL. In the Test a URL against these rules block you enter a path (e.g. /blog/my-article) and a user-agent (e.g. Googlebot) — the tool tells you whether the file allows or blocks access and which rule applies.
  5. The Sitemaps declared section lists the sitemaps linked in the file. The check button next to each one takes you straight to the sitemap check (see the next example).
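
The Source column and the URL test can be approximated in a few lines of Python. This is a rough sketch, not the tool's implementation: decision_source and check_path are hypothetical helpers, the sample file is invented, and the bot-name matching is deliberately simple (exact, case-insensitive). Also note that urllib.robotparser applies rules in file order, which can differ from the longest-match precedence that Google documents.

```python
from urllib.robotparser import RobotFileParser


def decision_source(robots_txt, bot):
    """Classify where a bot's rules come from: Specific, Wildcard * or Default."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    if bot.lower() in agents:
        return "Specific"
    if "*" in agents:
        return "Wildcard *"
    return "Default"  # no matching group, so access is allowed


def check_path(robots_txt, user_agent, path):
    """Report whether the rules allow or block a path for one user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return "Allowed" if parser.can_fetch(user_agent, path) else "Blocked"


# Invented sample file for illustration.
SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    for bot in ("Googlebot", "GPTBot"):
        print(bot, decision_source(SAMPLE, bot), check_path(SAMPLE, bot, "/blog/my-article"))
```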

Example 5: Validate a sitemap and check spec limits

You want to make sure your sitemap.xml is clean and complete.

  1. In the Check Sitemap tab, enter a domain or full sitemap URL and click Check. (Or you arrived here directly via the check button from the robots.txt check.)
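  2. Watch the spec warnings. The tool warns as soon as the file exceeds 50,000 URLs or 50 MB, the per-file limits of the sitemap protocol (the size limit applies to the uncompressed file). Beyond that you have to split the sitemap into multiple files plus an index file.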
  3. Read the metadata coverage table. Progress bars show what percentage of URLs have lastmod, changefreq, and priority set. Low lastmod coverage isn't inherently bad, but if you want to signal freshness, this is where there is room to improve.
  4. In the URLs table you see the first 100 entries. Via Copy all URLs you get the complete list (beyond 100 too) into your clipboard — handy for reconciling with your CMS.
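
The checks from steps 2 and 3 are easy to reproduce on a local copy of a sitemap. The following Python sketch is one possible way to do it; analyze_sitemap, the report layout, and the sample data are assumptions for illustration and say nothing about how the tool computes its numbers.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def analyze_sitemap(xml_bytes):
    """Return URL count, coverage percentage per optional tag, and warnings."""
    root = ET.fromstring(xml_bytes)
    urls = root.findall("sm:url", NS)
    total = len(urls)
    coverage = {}
    for tag in ("lastmod", "changefreq", "priority"):
        present = sum(1 for url in urls if url.find(f"sm:{tag}", NS) is not None)
        coverage[tag] = round(100 * present / total, 1) if total else 0.0
    warnings = []
    if total > MAX_URLS:
        warnings.append(f"{total} URLs exceeds the limit of {MAX_URLS}")
    if len(xml_bytes) > MAX_BYTES:
        warnings.append(f"{len(xml_bytes)} bytes exceeds the limit of {MAX_BYTES}")
    return {"urls": total, "coverage": coverage, "warnings": warnings}


# Invented sample data: four URLs, two with lastmod, one with priority.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-01</lastmod><priority>1.0</priority></url>
  <url><loc>https://example.com/services/</loc><lastmod>2024-04-12</lastmod></url>
  <url><loc>https://example.com/contact/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(analyze_sitemap(SAMPLE))
```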

Example 6: Break down a sitemap index file

Large sites spread their URLs across multiple sitemaps bundled by an index file.

  1. In the Check Sitemap tab, enter the URL of the index file (often …/sitemap_index.xml) and click Check.
  2. The tool automatically detects that this is a sitemap index file and lists the child sitemaps with their optional lastmod.
  3. Drill into individual child sitemaps. Next to each is a Check button that loads that file and analyzes it in detail (URL count, metadata coverage, spec warnings).

That way you work from the index down to the individual URL — without having to open the files yourself. (To build a sitemap, by the way, use the generator from Example 3; the generator's import button deliberately handles only individual sitemaps, not index files.)
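
Telling an index file apart from a regular sitemap comes down to the root element: sitemapindex versus urlset. The Python sketch below shows one way to make that distinction and list the children; classify_sitemap and the sample file are hypothetical and only illustrate the idea, not the tool's own detection logic.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_bytes):
    """Return ('index', children) or ('urlset', urls) based on the root element."""
    root = ET.fromstring(xml_bytes)
    local_name = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
    if local_name == "sitemapindex":
        children = [
            (sm.findtext("sm:loc", default="", namespaces=NS).strip(),
             sm.findtext("sm:lastmod", default=None, namespaces=NS))
            for sm in root.findall("sm:sitemap", NS)
        ]
        return "index", children
    if local_name == "urlset":
        urls = [u.findtext("sm:loc", default="", namespaces=NS).strip()
                for u in root.findall("sm:url", NS)]
        return "urlset", urls
    raise ValueError(f"unexpected root element: {local_name}")


# Invented sample index with one dated and one undated child sitemap.
INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc><lastmod>2024-05-01</lastmod></sitemap>
  <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    kind, entries = classify_sitemap(INDEX)
    print(kind)
    for loc, lastmod in entries:
        print(loc, lastmod)
```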


There's more depth: the overview for the big picture, the manual for every option, and the tips & tricks for strategy and pitfalls. You can try all of it directly in the tool.