robots.txt & Sitemap — Examples
Hands-on walkthroughs with robots.txt & Sitemap: build a WordPress robots.txt, block AI crawlers, generate a sitemap, and check a foreign file live.
Back to the overview: robots.txt & Sitemap · Open the tool: www.jpkc.com/tools/robots-sitemap/
The manual explains every tab and option in detail. This page adds concrete workflows: typical tasks, played through step by step. The tool's interface is in English, so tab and button names appear exactly as you'll see them.
Example 1: Build a robots.txt for WordPress
The classic — a clean robots.txt for a typical WordPress site.
- Open the tool, go to the Examples tab, and click Load into Generator on WordPress. The form jumps to the robots.txt tab and is filled with the WordPress template.
- Look at the User-agent: * block: it blocks /wp-admin/ but explicitly frees /wp-admin/admin-ajax.php via Allow (WordPress needs that for front-end features), and blocks /wp-includes/, /wp-content/plugins/, /wp-content/cache/, and the query-string URLs /?s= (site search) and /?p= (posts addressed by ID).
- Adjust the sitemap URL. The template sets https://example.com/sitemap_index.xml (the typical name with the major WordPress SEO plugins). Enter your real domain.
- Check the preview on the right, then Copy or Download. Upload the file so it's reachable at https://your-domain.com/robots.txt.
Important: never block /wp-content/ wholesale — it also holds your images and often CSS/JS that Google needs to render. The template deliberately blocks only plugins/ and cache/.
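For orientation, here is a minimal Python sketch of the kind of file these steps lead to. It is not the tool's code: the Group class and the render_robots function are made up for illustration, and the rule list simply follows the description above, so the tool's actual template may order or word things differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One User-agent block: the agents it names and its (directive, path) rules."""
    agents: list
    rules: list = field(default_factory=list)


def render_robots(groups: list, sitemap: Optional[str] = None) -> str:
    """Render the groups as robots.txt text, one blank line between blocks."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {agent}" for agent in group.agents]
        lines += [f"{directive}: {path}" for directive, path in group.rules]
        blocks.append("\n".join(lines))
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    return "\n\n".join(blocks) + "\n"


# Rules as described for the WordPress template (assumed order).
wordpress = Group(
    agents=["*"],
    rules=[
        ("Disallow", "/wp-admin/"),
        ("Allow", "/wp-admin/admin-ajax.php"),
        ("Disallow", "/wp-includes/"),
        ("Disallow", "/wp-content/plugins/"),
        ("Disallow", "/wp-content/cache/"),
        ("Disallow", "/?s="),
        ("Disallow", "/?p="),
    ],
)

# example.com is the template's placeholder; replace it with your real domain.
robots_txt = render_robots([wordpress], "https://example.com/sitemap_index.xml")
print(robots_txt)
```

Note that /wp-content/ itself never appears as a Disallow path here, only its plugins/ and cache/ subfolders.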
Example 2: Block AI crawlers, allow search engines
You want to stay visible in classic search engines, but not have AI crawlers scrape your content for training.
- In the Examples tab click Load into Generator on Block AI & Scraper Bots.
- The template adds an open User-agent: * block with an empty Disallow: (everyone allowed), and below it a dedicated block with Disallow: / for the AI crawlers GPTBot, ChatGPT-User, OAI-SearchBot, CCBot, anthropic-ai, Claude-Web, Bytespider, and PerplexityBot.
- Add missing bots as needed. Via Add User-agent Block you can add more AI crawlers; the autocomplete field suggests, among others, ClaudeBot, Google-Extended, Applebot-Extended, meta-externalagent, and cohere-ai. Set a Disallow: / rule in each new block.
- Copy/Download, upload, done.
Two honest notes: first, not every bot respects robots.txt — it's a voluntary policy, not a technical barrier. Second, blocking Google-Extended (Gemini training) does not exclude Google's normal search; that keeps running via Googlebot. Which bots serve which purpose is in the tool's Reference table.
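If you want to double-check such a file before uploading it, Python's standard urllib.robotparser module can evaluate the rules locally. The following is a sketch under assumptions: the file content is a shortened stand-in for the template (only three of the listed AI bots), and example.com is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical version of the "open for all, closed for AI bots" layout.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: GPTBot
User-agent: CCBot
User-agent: PerplexityBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network request


def is_allowed(bot: str, url: str = "https://example.com/blog/") -> bool:
    """Ask the parsed rules whether this user-agent may fetch the URL."""
    return parser.can_fetch(bot, url)


results = {bot: is_allowed(bot) for bot in ("Googlebot", "Bingbot", "GPTBot", "CCBot")}
print(results)
```

This only tells you what the file says. Whether a given crawler actually honors it is a separate question, as noted above.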
Example 3: Build an XML sitemap by hand
For a smaller site without a sitemap plugin, you build the file right in the tool.
- Switch to the Sitemap tab. Enter your domain at the top in Base URL (e.g. https://your-domain.com).
- Via Add URL add one row per page. Per row: the Path (e.g. /, /services/, /contact/), an optional Last Modified date, a Freq., and a Priority.
- Set priority relatively. Common: 1.0 for the homepage, 0.8–0.9 for important section pages, 0.5 for static subpages. It's a ranking within your site, not a ranking factor against other websites.
- Only set lastmod when you actually know the date; otherwise leave the field empty. A made-up modification date does more harm than good.
- Drag rows into the order you want by the grip handle. Then Copy or Download and place the file as sitemap.xml in your web root.
Tip: submit the sitemap afterwards in Google Search Console and Bing Webmaster Tools; the note under the editor reminds you of this too.
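The same rows can also be turned into a sitemap with a few lines of Python, which helps to see what the generated file contains. This is a minimal sketch using the standard xml.etree.ElementTree module, not the tool's implementation; build_sitemap, the row dictionaries, and the sample date are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, rows: list) -> str:
    """Build sitemap XML from rows with 'path' plus optional metadata fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for row in rows:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + row["path"]
        for tag in ("lastmod", "changefreq", "priority"):
            value = row.get(tag)
            if value not in (None, ""):  # fields left empty are simply omitted
                ET.SubElement(url, tag).text = str(value)
    ET.indent(urlset)  # pretty-printing, needs Python 3.9+
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True).decode("utf-8")


# Hypothetical rows; lastmod is only set where the date is actually known.
rows = [
    {"path": "/", "lastmod": "2024-05-01", "changefreq": "weekly", "priority": "1.0"},
    {"path": "/services/", "changefreq": "monthly", "priority": "0.8"},
    {"path": "/contact/", "priority": "0.5"},
]

sitemap_xml = build_sitemap("https://your-domain.com", rows)
print(sitemap_xml)
```

Rows without a known modification date simply get no lastmod element, which matches the advice above.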
Example 4: Check a foreign domain's robots.txt
You want to know how an existing site (your own or a competitor's) steers its crawlers.
- Go to the Check robots.txt tab, enter a domain (e.g. example.com; /robots.txt is appended automatically), and click Check.
- Read the Per-Bot Access table first. It shows Allowed or Blocked at a glance for 40-plus known bots. The Source column reveals where the decision comes from: Specific (a dedicated block for exactly this bot), Wildcard * (via the User-agent: * block), or Default (no matching rule, so allowed).
- Watch the AI rows (type AI): does the site lock out GPTBot, ClaudeBot, PerplexityBot and co.? That's exactly the information you also need for your own GEO strategy.
- Test a specific URL. In the Test a URL against these rules block you enter a path (e.g. /blog/my-article) and a user-agent (e.g. Googlebot); the tool tells you whether the file allows or blocks access and which rule applies.
- Under Sitemaps declared are the sitemaps linked in the file. Via the check button next to each you jump straight into sitemap checking (see the next example).
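To make the Source column and the URL test less abstract, here is a rough Python sketch of how such a decision can be derived. It is an illustration, not the tool's logic: it assumes plain prefix matching (no * or $ patterns inside paths), picks the longest matching rule with Allow winning ties as RFC 9309 describes, and it reports Default only when no block applies to the bot at all. The names parse_groups, rules_for, and check are invented here.

```python
def parse_groups(robots_txt: str) -> list:
    """Split robots.txt text into groups: {'agents': [...], 'rules': [(directive, path)]}."""
    groups, current = [], None
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current["rules"].append((key, value))
    return groups


def rules_for(groups: list, bot: str) -> tuple:
    """Return (source, rules): the bot's own block, else the * block, else nothing."""
    own = [g for g in groups if bot.lower() in g["agents"]]
    if own:
        return "Specific", [r for g in own for r in g["rules"]]
    wild = [g for g in groups if "*" in g["agents"]]
    if wild:
        return "Wildcard *", [r for g in wild for r in g["rules"]]
    return "Default", []


def check(groups: list, bot: str, path: str) -> tuple:
    """Return (verdict, source, deciding rule or None) for a bot and a path."""
    source, rules = rules_for(groups, bot)
    best = None  # (match length, is_allow, rule text)
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):  # an empty path matches nothing
            candidate = (len(pattern), directive == "allow",
                         f"{directive.capitalize()}: {pattern}")
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return "Allowed", source, None
    return ("Allowed" if best[1] else "Blocked"), source, best[2]


# Hypothetical file for demonstration.
SAMPLE = """\
User-agent: *
Disallow: /private/
Allow: /private/press/

User-agent: GPTBot
Disallow: /
"""

groups = parse_groups(SAMPLE)
for bot, path in [("GPTBot", "/blog/my-article"),
                  ("Googlebot", "/blog/my-article"),
                  ("Googlebot", "/private/press/2024"),
                  ("Googlebot", "/private/notes")]:
    print(bot, path, check(groups, bot, path))
```

For the sample file, GPTBot is blocked by its own Disallow: / rule, while Googlebot falls back to the wildcard block, where the longer Allow: /private/press/ beats the shorter Disallow: /private/.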
Example 5: Validate a sitemap and check spec limits
You want to make sure your sitemap.xml is clean and complete.
- In the Check Sitemap tab, enter a domain or full sitemap URL and click Check. (Or you arrived here directly via the check button from the robots.txt check.)
- Watch the spec warnings. The tool warns as soon as the file exceeds 50,000 URLs or 50 MB (the sitemap protocol counts the uncompressed size); then you have to split it into multiple files plus an index file.
- Read the metadata coverage table. It shows with progress bars at what percentage of URLs lastmod, changefreq, and priority are set. Low lastmod coverage isn't inherently bad, but if you want to signal freshness, there's room here.
- In the URLs table you see the first 100 entries. Via Copy all URLs you get the complete list (beyond 100 too) into your clipboard, handy for reconciling with your CMS.
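The two checks in this example, spec limits and metadata coverage, boil down to simple counting. A minimal Python sketch, assuming the sitemap is already available as bytes; analyze_sitemap and the sample data are hypothetical, and the tool may compute or round its percentages differently.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def analyze_sitemap(xml_bytes: bytes, max_urls: int = MAX_URLS,
                    max_bytes: int = MAX_BYTES) -> dict:
    """Count URLs, compute metadata coverage in percent, and collect limit warnings."""
    root = ET.fromstring(xml_bytes)
    urls = root.findall("sm:url", NS)
    total = len(urls)

    coverage = {}
    for tag in ("lastmod", "changefreq", "priority"):
        filled = sum(
            1 for u in urls
            if (u.findtext(f"sm:{tag}", default="", namespaces=NS) or "").strip()
        )
        coverage[tag] = round(100 * filled / total) if total else 0

    warnings = []
    if total > max_urls:
        warnings.append(f"{total} URLs exceed the limit of {max_urls}")
    if len(xml_bytes) > max_bytes:
        warnings.append(f"{len(xml_bytes)} bytes exceed the limit of {max_bytes}")

    locs = [u.findtext("sm:loc", default="", namespaces=NS).strip() for u in urls]
    return {"urls": total, "coverage": coverage, "warnings": warnings, "locs": locs}


# Hypothetical four-URL sitemap for demonstration.
SAMPLE = b"""\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-01</lastmod><priority>1.0</priority></url>
  <url><loc>https://example.com/services/</loc><lastmod>2024-04-12</lastmod><changefreq>monthly</changefreq></url>
  <url><loc>https://example.com/contact/</loc><lastmod>2024-01-20</lastmod><priority>0.5</priority></url>
  <url><loc>https://example.com/imprint/</loc></url>
</urlset>
"""

report = analyze_sitemap(SAMPLE)
print(report["urls"], report["coverage"], report["warnings"])
```

The locs list is the scripted counterpart to Copy all URLs: the full set of addresses, ready to be compared with an export from your CMS.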
Example 6: Break down a sitemap index file
Large sites spread their URLs across multiple sitemaps bundled by an index file.
- In the Check Sitemap tab, enter the URL of the index file (often …/sitemap_index.xml) and click Check.
- The tool automatically detects that this is a sitemap index file and lists the child sitemaps with their optional lastmod.
- Drill into individual child sitemaps. Next to each is a Check button that loads that file and analyzes it in detail (URL count, metadata coverage, spec warnings).
That way you work from the index down to the individual URL — without having to open the files yourself. (To build a sitemap, by the way, use the generator from Example 3; the generator's import button deliberately handles only individual sitemaps, not index files.)
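How a checker can tell an index file from a regular sitemap is easy to show: the root element is sitemapindex instead of urlset. The Python sketch below illustrates that idea with the standard library. It is an assumption about the general approach rather than a description of the tool's internals, and the function name and sample URLs are made up.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def describe_sitemap(xml_bytes: bytes) -> dict:
    """Report whether the document is an index or a URL set, with its entries."""
    root = ET.fromstring(xml_bytes)
    kind = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
    if kind == "sitemapindex":
        children = [
            (s.findtext("sm:loc", namespaces=NS), s.findtext("sm:lastmod", namespaces=NS))
            for s in root.findall("sm:sitemap", NS)
        ]
        return {"type": "index", "children": children}
    if kind == "urlset":
        return {"type": "urlset", "urls": len(root.findall("sm:url", NS))}
    raise ValueError(f"not a sitemap document: <{kind}>")


# Hypothetical index file; lastmod is optional per child.
INDEX = b"""\
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc><lastmod>2024-05-01</lastmod></sitemap>
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
</sitemapindex>
"""

result = describe_sitemap(INDEX)
print(result)
```

Each child URL would then be fetched and passed through the same function, which mirrors the drill-down from index to individual sitemap described above.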
There's more depth: the overview for the big picture, the manual for every option, and the tips & tricks for strategy and pitfalls. You can try all of it directly in the tool.