Making-of: My SEO and GEO Tool

I always wanted an SEO tool that gets to the essentials fast and without detours — no subscription trap, no login, no twelve dashboards to click through just to answer a simple question: is this page technically clean, and is it understood by classic search engines and by AI systems? So I built it. This article is the making-of: the motivations, the architecture, and above all the real calculation logic behind it — with code you can lift verbatim. The finished tool runs for free at jpkc.com/tools/seo/.

Why build my own tool at all

After a good 25 years on the web I know all the SEO suites. They can do a lot, cost money every month, and hide the two or three numbers that actually matter behind mountains of features, 90 % of which I never touch. For a quick look at a single URL ("is the foundation solid?") that's like renting an excavator to dig a hole for a tomato plant.

What I was missing was a tool with three properties:

Instant and account-free. URL in, analysis out. No registration, no tracking, no "14 days free, then €99/month".
Honestly weighted. Not 200 micro-signals all equally loud, but the things up front that I judge — from experience — to matter for ranking and citation.
SEO and GEO. Classic search engine optimization and Generative Engine Optimization in one pass, because in 2026 both decide visibility. I covered what GEO is in depth in What is GEO?.

The decisive advantage of "build it yourself": I get to expose how the scoring works. With the big tools the score is a black box. Here every point is traceable — and that's exactly what I'll show you now.

The architecture: proxy plus browser

The first fundamental decision was technical. An SEO analysis needs to see a page's raw HTTP traffic: status code, redirect chain, response headers, SSL certificate, timing. You can't do that from the browser: the same-origin policy stops a script from reading a foreign page's response, headers included, unless that server opts in via CORS. So it needed a server-side proxy in PHP that fetches the target page and hands the result back to the front end.

A proxy that fetches arbitrary URLs is, however, a classic SSRF risk (server-side request forgery): someone could request http://127.0.0.1/ or internal cloud metadata endpoints and reach things that were never meant to be public. That's why the proxy validates every URL, including every single redirect hop, before it touches it: only http and https are let through, and every resolved address must be publicly routable.

function isPublicUrl( string $url, ?string &$error = null ): bool {

    if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
        $error = 'Invalid URL format';
        return false;
    }

    // http/https only — no file://, gopher://, dict:// etc.
    if ( ! str_starts_with( $url, 'http://' ) && ! str_starts_with( $url, 'https://' ) ) {
        $error = 'Only http/https URLs are allowed';
        return false;
    }

    $host     = parse_url( $url, PHP_URL_HOST );
    $bareHost = trim( (string) $host, '[]' );   // IPv6 arrives in [brackets]

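    // Resolve hostname and check EVERY returned address.
    // gethostbynamel() returns IPv4 addresses only; a host without any
    // ends up with an empty list and is rejected by the final return.
    $ips = @gethostbynamel( $bareHost ) ?: [];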

    foreach ( $ips as $ip ) {
        if ( ! isPublicIpAddr( $ip ) ) {       // blocks 127.0.0.0/8, 10/8, 192.168/16, ::1 …
            $error = 'IP address is not publicly routable';
            return false;
        }
    }

    return ! empty( $ips );
}

The important part isn't the happy path but that hostname resolution happens before the request and that all resolved IPs are checked. Otherwise a harmless-looking hostname whose DNS record points at 10.0.0.5 would slip past the block. The redirect chain is re-validated hop by hop, because a harmless public URL must not lead to something internal via a 302 redirect.
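
The range check itself is not shown above. As a rough illustration of the logic only, here is a minimal sketch in JavaScript, the language of the analyzer code later in this article. The tool's real isPublicIpAddr is PHP, so the names isPublicIpv4, ipv4ToInt and BLOCKED_V4 are hypothetical, the block list is not exhaustive, and IPv6 is left out for brevity.

```javascript
// Hypothetical sketch of an IPv4 range check; NOT the tool's PHP implementation.
// Each entry is [ network base, prefix length ]. The list is illustrative, not exhaustive.
const BLOCKED_V4 = [
	[ "0.0.0.0", 8 ],        // "this network"
	[ "10.0.0.0", 8 ],       // private
	[ "100.64.0.0", 10 ],    // carrier-grade NAT
	[ "127.0.0.0", 8 ],      // loopback
	[ "169.254.0.0", 16 ],   // link-local, incl. cloud metadata at 169.254.169.254
	[ "172.16.0.0", 12 ],    // private
	[ "192.168.0.0", 16 ],   // private
	[ "224.0.0.0", 3 ],      // multicast and reserved
];

// Dotted quad → unsigned integer, or null if the string is not a valid IPv4 address
function ipv4ToInt( ip ) {
	const parts = String( ip ).split( "." );
	if ( parts.length !== 4 ) return null;
	let n = 0;
	for ( const part of parts ) {
		if ( ! /^\d{1,3}$/.test( part ) || Number( part ) > 255 ) return null;
		n = n * 256 + Number( part );
	}
	return n;
}

function isPublicIpv4( ip ) {
	const n = ipv4ToInt( ip );
	if ( n === null ) return false;   // fail closed on anything unparseable
	return ! BLOCKED_V4.some( ( [ base, bits ] ) => {
		const size = 2 ** ( 32 - bits );   // number of addresses in the block
		return Math.floor( n / size ) === Math.floor( ipv4ToInt( base ) / size );
	} );
}

console.log( isPublicIpv4( "10.0.0.5" ) );   // false
console.log( isPublicIpv4( "8.8.8.8" ) );    // true
```

The sketch uses division instead of bit shifts because JavaScript's bitwise operators work on signed 32-bit integers, which would misbehave for addresses above 127.255.255.255.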

Everything that comes after — parsing the HTML, counting headings, reading schema — then happens client-side in the browser again, via the native DOMParser. That keeps the server lean: it only fetches bytes, the actual analysis engine is JavaScript. This split — a minimal, hardened PHP proxy plus a thick JS analyzer — is the backbone of the whole tool.

The core principle: a check is just four values

Before a single SEO value gets computed, I needed a shape that every check fits into — from the HTTPS tick to readability. I settled on four values, and that's perhaps the most important design decision in the whole tool, because it keeps everything else simple:

// A check always produces four values under its name: max points, points earned, a status, a hint.
function scoreCheck( name, maxPoints, earned, status, hint ) {
	return { name, maxPoints, earned, status, hint: hint || "" };
}
// status ∈ { "pass", "partial", "fail" }

status is deliberately three-valued, not binary. The half step partial is the reason the score feels "fair": a page with a 65-character title hasn't "failed", it's just not optimal — so it earns partial points, not zero. That's exactly what most pass/fail checkers get wrong.

With this building block every single check becomes a small, isolated function I can understand and test independently. Let me show you four of them in detail.
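
To make "test independently" concrete, here is a minimal sketch. The check checkHttps and its 5-point weight are hypothetical stand-ins for illustration, not the tool's actual HTTPS check; only scoreCheck is taken from above.

```javascript
// scoreCheck as defined earlier in the article
function scoreCheck( name, maxPoints, earned, status, hint ) {
	return { name, maxPoints, earned, status, hint: hint || "" };
}

// Hypothetical check for illustration; the 5-point weight is an assumption
function checkHttps( url ) {
	const secure = new URL( url ).protocol === "https:";
	return scoreCheck( "HTTPS", 5, secure ? 5 : 0, secure ? "pass" : "fail",
		secure ? "Served over HTTPS" : "Served over plain HTTP" );
}

// A check is a pure function: input in, plain object out, nothing else touched
const result = checkHttps( "http://example.com/" );
console.log( result.status, result.earned );   // fail 0
```

Because a check neither reads global state nor touches the DOM, a test needs no setup beyond calling it with a value.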

SEO score: four examples from practice

The SEO score consists of around 40 such checks across eight categories, among them Technical, On-Page, Crawling, Social, Performance, Accessibility and Content. Here are four that illustrate the principle well.

Title length: partial credit instead of black and white

The page title is one of the strongest on-page signals. Too short wastes context, too long gets truncated in search. I score it in three tiers — optimal, acceptable, off:

function checkTitleLength( title ) {
	const len = title ? title.length : 0;

	const optimal    = len >= 50 && len <= 60;   // ideal display length
	const acceptable = len >= 30 && len <= 70;   // usable, but not ideal

	return scoreCheck(
		"Title (50–60 chars)",
		8,
		optimal ? 8 : acceptable ? 4 : 0,
		optimal ? "pass" : acceptable ? "partial" : "fail",
		title ? `${len} characters` : "No <title> tag found"
	);
}

Title length	Status	Points (of 8)
50–60 chars	pass	8
30–49 or 61–70 chars	partial	4
anything else / missing	fail	0

Eight points for the title, because it genuinely contributes a lot to click-through. The weight is an experience call, not a universal truth — but those are exactly the calls I wanted to make myself instead of leaving them to someone else's algorithm.

TTFB: tiered scoring instead of one threshold

Time to first byte shows the same idea even more clearly. A single cutoff would be brutal: 199 ms full, 201 ms nothing. Instead there are two thresholds and three outcomes:

function checkTtfb( ttfbMs ) {
	if ( ttfbMs == null ) {   // loose comparison also catches undefined
		return scoreCheck( "TTFB < 500 ms", 4, 0, "fail", "No timing data" );
	}

	const earned = ttfbMs < 200 ? 4 : ttfbMs < 500 ? 2 : 0;
	const status = ttfbMs < 200 ? "pass" : ttfbMs < 500 ? "partial" : "fail";

	return scoreCheck( "TTFB < 500 ms", 4, earned, status, `${ttfbMs} ms` );
}

Under 200 ms is excellent (full 4 points), under 500 ms still fine (2 points), beyond that too slow. Why TTFB matters at all and how it connects to the Core Web Vitals is in Core Web Vitals & Performance.

Security headers: count instead of checking one by one

Some checks are a question of quantity. With security headers, what matters is less which one exactly is missing and more whether the page takes the topic seriously at all. So I count how many out of a list of relevant headers are set:

const SECURITY_HEADERS = [
	"Strict-Transport-Security", "Content-Security-Policy",
	"X-Frame-Options", "X-Content-Type-Options",
	"Referrer-Policy", "Permissions-Policy",
	"Cross-Origin-Opener-Policy", "Cross-Origin-Resource-Policy",
];

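function checkSecurityHeaders( headers ) {
	// HTTP header names are case-insensitive (HTTP/2 sends them lowercase), so compare in lowercase
	const sent = new Set( Object.keys( headers )
		.filter( name => Boolean( headers[ name ] ) )
		.map( name => name.toLowerCase() ) );
	const present = SECURITY_HEADERS.filter( h => sent.has( h.toLowerCase() ) ).length;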

	const earned = present >= 4 ? 3 : present >= 2 ? 1 : 0;
	const status = present >= 4 ? "pass" : present >= 2 ? "partial" : "fail";

	return scoreCheck( "Security headers ≥ 4", 3, earned, status,
		`${present}/8 security headers present` );
}

Four or more out of eight: all three points. Two or three: one point. Below that: room to improve. This "count the good signals" logic shows up in several places in the tool: for Open Graph tags, for semantic HTML elements, for the AI crawlers.
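
Since the count-then-tier pattern recurs, it could be factored into one helper. The following is a sketch under that assumption; scoreByCount and its tiers parameter are hypothetical names, and nothing here claims the tool is structured this way.

```javascript
// scoreCheck as defined earlier in the article
function scoreCheck( name, maxPoints, earned, status, hint ) {
	return { name, maxPoints, earned, status, hint: hint || "" };
}

// Hypothetical helper: tiers are [ minCount, points ] pairs, ordered highest first
function scoreByCount( name, maxPoints, present, total, tiers ) {
	const hit = tiers.find( ( [ minCount ] ) => present >= minCount );
	const earned = hit ? hit[ 1 ] : 0;
	const status = earned === maxPoints ? "pass" : earned > 0 ? "partial" : "fail";
	return scoreCheck( name, maxPoints, earned, status, `${present}/${total} present` );
}

// Same thresholds as the security header check: 4+ → 3 points, 2–3 → 1 point
const check = scoreByCount( "Security headers ≥ 4", 3, 3, 8, [ [ 4, 3 ], [ 2, 1 ] ] );
console.log( check.status, check.earned );   // partial 1
```

The status is derived from the points rather than from a second set of thresholds, so the two cannot drift apart.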

Readability: the language-aware Flesch formula

This is my favorite part, because there's real linguistics in the code here. Readability can be measured — via the Flesch Reading Ease. The formula relates average sentence length to average syllables per word. The catch: the constants only hold for English. German words are longer and richer in syllables; with the English formula every German text would look artificially "hard".

That's why the tool detects the language from <html lang> and computes with the matching variant. For German that's the Amstad formula:

// Average sentence length (ASL) and syllables per word (ASW)
const asl = wordCount / Math.max( 1, sentenceCount );
const asw = syllableCount / Math.max( 1, wordCount );

let flesch;
if ( lang === "de" ) {
	flesch = 180 - asl - 58.5 * asw;            // Amstad (German)
} else if ( lang === "fr" ) {
	flesch = 207 - 1.015 * asl - 73.6 * asw;    // Kandel-Moles (French)
} else if ( lang === "es" ) {
	flesch = 206.84 - 1.02 * asl - 60.0 * asw;  // Fernández Huerta (Spanish)
} else {
	flesch = 206.835 - 1.015 * asl - 84.6 * asw; // Original Flesch (English)
}

const score = Math.round( Math.min( 100, Math.max( 0, flesch ) ) );
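
To see how much the choice of formula matters, the snippet above can be wrapped into a function and fed the same counts in different languages. This is a sketch: fleschScore, primaryLang and the FLESCH table are hypothetical names, and reducing a tag such as "de-DE" to "de" is an assumption about how the value from <html lang> would be normalized. The coefficients are the ones from the formulas above.

```javascript
// Hypothetical helper: reduce a tag like "de-DE" or "EN_us" to its primary subtag
function primaryLang( htmlLang ) {
	return String( htmlLang || "" ).trim().toLowerCase().split( /[-_]/ )[ 0 ];
}

// Coefficients from the formulas above: [ base, ASL factor, ASW factor ]
const FLESCH = new Map( [
	[ "de", [ 180, 1, 58.5 ] ],           // Amstad
	[ "fr", [ 207, 1.015, 73.6 ] ],       // Kandel-Moles
	[ "es", [ 206.84, 1.02, 60.0 ] ],     // Fernández Huerta
	[ "en", [ 206.835, 1.015, 84.6 ] ],   // original Flesch
] );

function fleschScore( htmlLang, wordCount, sentenceCount, syllableCount ) {
	const asl = wordCount / Math.max( 1, sentenceCount );
	const asw = syllableCount / Math.max( 1, wordCount );
	// Unknown or missing languages fall back to the English constants
	const [ base, aslFactor, aswFactor ] = FLESCH.get( primaryLang( htmlLang ) ) || FLESCH.get( "en" );
	const raw = base - aslFactor * asl - aswFactor * asw;
	return Math.round( Math.min( 100, Math.max( 0, raw ) ) );
}

// 100 words, 5 sentences, 150 syllables → ASL 20, ASW 1.5
console.log( fleschScore( "en", 100, 5, 150 ) );      // 60
console.log( fleschScore( "de-DE", 100, 5, 150 ) );   // 72
```

Worked by hand: English gives 206.835 − 20.3 − 126.9 = 59.635, rounded to 60; Amstad gives 180 − 20 − 87.75 = 72.25, rounded to 72. Identical counts land twelve points apart, which is the gap the language switch exists to close.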

Counting syllables is itself language-dependent. For German words, umlauts and ß are phonetically replaced first, then vowel groups are counted:

function countSyllablesDE( word ) {
	word = word.toLowerCase();
	if ( word.length <= 3 ) return 1;

	// Resolve umlauts/ß phonetically so vowel groups are correct
	word = word.replace( /ä/g, "ae" ).replace( /ö/g, "oe" )
		.replace( /ü/g, "ue" ).replace( /ß/g, "ss" )
		.replace( /[^a-z]/g, "" );

	const groups = word.match( /[aeiouy]{1,2}/g );   // a run of up to two vowels ≈ one syllable
	return groups ? Math.max( 1, groups.length ) : 1;
}

It's a heuristic, not a perfect hyphenator — but for a relative readability assessment across a whole text it's entirely sufficient. The resulting value is translated into tiers (90+ "Very Easy", 60–69 "Standard", below 30 "Very Difficult") and feeds two points into the content score. Why readable writing matters specifically for AI systems is something I go into in Writing for AI.
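
The mapping from score to tier is a simple lookup. In the sketch below the band edges follow Flesch's classic seven-band table. The article names only "Very Easy", "Standard" and "Very Difficult", so the four labels in between, like the names READABILITY_TIERS and readabilityTier, are assumptions.

```javascript
// [ minimum score, label ], ordered highest first.
// Only three of these labels are named in the article; the rest are assumed
// from Flesch's classic table.
const READABILITY_TIERS = [
	[ 90, "Very Easy" ],
	[ 80, "Easy" ],
	[ 70, "Fairly Easy" ],
	[ 60, "Standard" ],
	[ 50, "Fairly Difficult" ],
	[ 30, "Difficult" ],
	[ 0, "Very Difficult" ],
];

function readabilityTier( score ) {
	const tier = READABILITY_TIERS.find( ( [ min ] ) => score >= min );
	return tier ? tier[ 1 ] : "Very Difficult";   // negative input cannot occur after clamping
}

console.log( readabilityTier( 65 ) );   // Standard
```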

In the same step the top keywords with density fall out as a byproduct — after subtracting stopwords in several languages:

const freq = {};
for ( const w of words ) {
	const lower = w.toLowerCase();
	if ( lower.length >= 3 && ! STOPWORDS[ lower ] ) {
		freq[ lower ] = ( freq[ lower ] || 0 ) + 1;
	}
}

const topKeywords = Object.keys( freq )
	.sort( ( a, b ) => freq[ b ] - freq[ a ] )
	.map( word => ( {
		word,
		count: freq[ word ],
		density: Math.round( freq[ word ] / wordCount * 1000 ) / 10,   // percent, 1 decimal
	} ) );
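
The snippet above ranks every remaining word; to show only the top few, a cut-off is needed. Here is a self-contained sketch of the same logic with such a limit. The function name topKeywordsOf, the default limit of 10 and the two-word stopword set are assumptions for illustration. It uses a Map and a Set rather than plain objects so that a word like "constructor" cannot collide with an inherited object property.

```javascript
// Hypothetical wrapper around the keyword logic above; the limit is an assumption
function topKeywordsOf( words, stopwords, limit = 10 ) {
	const freq = new Map();
	for ( const w of words ) {
		const lower = w.toLowerCase();
		if ( lower.length >= 3 && ! stopwords.has( lower ) ) {
			freq.set( lower, ( freq.get( lower ) || 0 ) + 1 );
		}
	}
	return [ ...freq.entries() ]
		.sort( ( a, b ) => b[ 1 ] - a[ 1 ] )   // most frequent first
		.slice( 0, limit )
		.map( ( [ word, count ] ) => ( {
			word,
			count,
			// Share of ALL words (stopwords included), in percent with one decimal
			density: Math.round( count / words.length * 1000 ) / 10,
		} ) );
}

const sample = [ "SEO", "tool", "seo", "the", "and", "Tool", "seo" ];
console.log( topKeywordsOf( sample, new Set( [ "the", "and" ] ), 2 ) );
// [ { word: 'seo', count: 3, density: 42.9 }, { word: 'tool', count: 2, density: 28.6 } ]
```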

From check to score: the sum

Once all checks have run, the overall score is trivial — and that's the point. Because every check knows its own maxPoints and earned, the aggregation is a single loop:

function totalScore( checks ) {
	let earned = 0, max = 0;
	for ( const c of checks ) {
		earned += c.earned;
		max    += c.maxPoints;
	}
	if ( max === 0 ) return 0;                 // no checks ran: avoid 0 / 0 = NaN
	return Math.round( earned / max * 100 );   // 0–100
}

From that, finally, comes a letter grade: A from 90, B from 80, C from 60, D from 40, otherwise F. No magic, no secret weighting in the background — all the "intelligence" lives in the individual, readable check functions. That's exactly how I wanted it: traceable down to the single point.
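
The grade mapping is short enough to write out. A minimal sketch with the thresholds stated above; the function name letterGrade is a hypothetical choice.

```javascript
// Thresholds as stated in the article: A from 90, B from 80, C from 60, D from 40
function letterGrade( score ) {
	if ( score >= 90 ) return "A";
	if ( score >= 80 ) return "B";
	if ( score >= 60 ) return "C";
	if ( score >= 40 ) return "D";
	return "F";
}

console.log( letterGrade( 79 ) );   // C
```

Note that the bands are not evenly wide: C and D each span 20 points, A and B only 10.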

GEO score: why a second, independent value

This is where it gets interesting, because it's the actual reason I picked the tool up again in 2026. Classic SEO tells you whether Google likes your page. But whether Claude, ChatGPT or Perplexity can cleanly read, attribute and cite your content is a different question — with different signals. That's why there's a second, entirely independent GEO score that never influences the SEO score, and vice versa. A page can be technically excellent and AI-blind, or the other way around.

The GEO score uses the same scoreCheck building block but inspects different things.

Schema variety: recursing through the @graph

Structured data is gold for AI citations (background: Structured Data & Technical GEO). But it's not enough to know whether JSON-LD is there; what's interesting is how many different types are marked up. Modern pages nest this in a @graph, so I collect the @type values recursively:

function collectSchemaTypes( node, types = new Set() ) {
	if ( ! node || typeof node !== "object" ) return types;

	const t = node[ "@type" ];
	if ( Array.isArray( t ) ) t.forEach( x => types.add( x ) );
	else if ( t ) types.add( t );

	for ( const key of Object.keys( node ) ) {
		if ( key === "@context" ) continue;
		const val = node[ key ];
		if ( val && typeof val === "object" ) collectSchemaTypes( val, types );  // dig deeper
	}
	return types;
}

function checkSchemaVariety( jsonLdBlocks ) {
	const types = new Set();
	jsonLdBlocks.forEach( block => collectSchemaTypes( block, types ) );
	const n = types.size;

	return scoreCheck( "Schema variety", 3,
		n >= 3 ? 3 : n >= 2 ? 2 : n >= 1 ? 1 : 0,
		n >= 3 ? "pass" : n >= 1 ? "partial" : "fail",
		n > 0 ? `${n} type(s): ${[ ...types ].slice( 0, 5 ).join( ", " )}` : "No schema types" );
}

Three or more types — say Article plus Person plus BreadcrumbList — speak for a page that's been deliberately marked up to be machine-readable. That's exactly what the check rewards.
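
The check expects jsonLdBlocks to be parsed objects already. How the raw script contents become objects is not shown above; one plausible way, sketched here as an assumption, is to parse each block separately and skip the ones that are not valid JSON. The name parseJsonLdBlocks is hypothetical.

```javascript
// Hypothetical helper: takes the raw text of each <script type="application/ld+json">
// element and returns the blocks that parse to an object or array.
function parseJsonLdBlocks( rawScripts ) {
	const blocks = [];
	for ( const raw of rawScripts ) {
		try {
			const parsed = JSON.parse( raw );
			if ( parsed && typeof parsed === "object" ) blocks.push( parsed );
		} catch ( err ) {
			// Malformed JSON-LD: skip this block instead of aborting the whole analysis
		}
	}
	return blocks;
}

const blocks = parseJsonLdBlocks( [
	'{"@context":"https://schema.org","@graph":[{"@type":"Article"},{"@type":"Person"}]}',
	'{ "@type": "Broken" ',   // truncated on purpose
] );
console.log( blocks.length );   // 1
```

Parsing block by block means one broken snippet, for example from a plugin, does not hide the valid markup next to it.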

llms.txt and the AI crawlers

Two GEO checks revolve around whether you let AI systems in at all. The tool automatically fetches the llms.txt from the domain root (the Markdown file that explains a site's purpose and structure to an AI) and checks the robots.txt specifically against the common AI crawlers, not just against the generic User-agent: *:

const AI_CRAWLERS = [
	"GPTBot", "ChatGPT-User", "OAI-SearchBot", "Google-Extended",
	"ClaudeBot", "anthropic-ai", "PerplexityBot", "CCBot",
];

function checkAiCrawlers( robotsTxt, pageUrl ) {
	const blocked = AI_CRAWLERS.filter(
		bot => ! isAllowedForAgent( robotsTxt, pageUrl, bot )
	);

	const earned = blocked.length === 0 ? 3
		: blocked.length <= 3 ? 2
		: blocked.length <= 6 ? 1 : 0;

	return scoreCheck( "AI crawlers allowed", 3, earned,
		earned === 3 ? "pass" : earned >= 1 ? "partial" : "fail",
		blocked.length ? `blocked: ${blocked.join( ", " )}` : "All major AI crawlers allowed" );
}

The check deliberately takes a side: it scores allowing them positively, because the tool is built from the "I want to appear in AI answers" perspective. Anyone who wants to protect content from training still reads, in black and white, which bots they're currently locking out, and that's just as valuable.
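
The helper isAllowedForAgent is used above but not shown. The following is a simplified sketch of what such a matcher can look like, not the tool's implementation. Its assumptions: user-agent names are compared exactly (case-insensitive), a bot without its own group falls back to the * group, the longest matching path wins with Allow winning ties, and the wildcards * and $ inside paths are not supported. The helper parseGroups is a hypothetical name.

```javascript
// Split robots.txt into groups of { agents, rules }. Simplified sketch.
function parseGroups( robotsTxt ) {
	const groups = [];
	let current = null;
	let lastWasAgent = false;
	for ( const raw of robotsTxt.split( /\r?\n/ ) ) {
		const line = raw.replace( /#.*$/, "" ).trim();   // strip comments
		const sep = line.indexOf( ":" );
		if ( sep === -1 ) continue;
		const field = line.slice( 0, sep ).trim().toLowerCase();
		const value = line.slice( sep + 1 ).trim();
		if ( field === "user-agent" ) {
			if ( ! lastWasAgent ) {   // consecutive User-agent lines share one group
				current = { agents: [], rules: [] };
				groups.push( current );
			}
			current.agents.push( value.toLowerCase() );
			lastWasAgent = true;
		} else if ( ( field === "allow" || field === "disallow" ) && current ) {
			current.rules.push( { allow: field === "allow", path: value } );
			lastWasAgent = false;
		}
	}
	return groups;
}

function isAllowedForAgent( robotsTxt, pageUrl, bot ) {
	const path = new URL( pageUrl ).pathname;
	const groups = parseGroups( robotsTxt );
	const name = bot.toLowerCase();
	// A group naming the bot wins; "*" is only the fallback
	let matching = groups.filter( g => g.agents.includes( name ) );
	if ( matching.length === 0 ) matching = groups.filter( g => g.agents.includes( "*" ) );
	let best = null;
	for ( const group of matching ) {
		for ( const rule of group.rules ) {
			// An empty path (e.g. "Disallow:") matches nothing
			if ( rule.path === "" || ! path.startsWith( rule.path ) ) continue;
			const longer = ! best || rule.path.length > best.path.length;
			const tieForAllow = best && rule.path.length === best.path.length && rule.allow;
			if ( longer || tieForAllow ) best = rule;
		}
	}
	return best ? best.allow : true;   // no matching rule means allowed
}

const robots = "User-agent: *\nDisallow: /private/\n\nUser-agent: GPTBot\nDisallow: /\n";
console.log( isAllowedForAgent( robots, "https://example.com/blog/", "GPTBot" ) );   // false
console.log( isAllowedForAgent( robots, "https://example.com/blog/", "CCBot" ) );    // true
```

A production matcher would follow the Robots Exclusion Protocol more closely, including path wildcards and percent-encoding.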

Short paragraphs: what LLMs prefer to read

One last, small but telling example: AI systems extract from short, focused paragraphs more reliably than from walls of text. So the tool measures the average paragraph length:

function checkParagraphLength( wordCount, paragraphCount ) {
	if ( ! paragraphCount ) {
		return scoreCheck( "Short paragraphs (avg < 150 words)", 2, 0, "fail", "No paragraphs detected" );
	}
	const avg = Math.round( wordCount / paragraphCount );

	return scoreCheck( "Short paragraphs (avg < 150 words)", 2,
		avg < 150 ? 2 : avg < 250 ? 1 : 0,
		avg < 150 ? "pass" : avg < 250 ? "partial" : "fail",
		`${paragraphCount} paragraphs, avg ${avg} words` );
}

Under 150 words per paragraph: both points. From 150 to 249 words there is still one point. It's the same recommendation I make in Writing for AI, just cast here as a measurable number.

What's deliberately missing

Just as important as what's in it is what I left out. There is no account, no tracking, no crawl-budget monitoring across thousands of URLs, no rank-tracking history. That's intent, not laziness. The tool answers one question — "is this one page clean for search and AI?" — and does it in seconds. Anything that doesn't directly serve that question would only have made it slower, more complicated and more in need of explanation. YAGNI as a feature: the omission is the product.

FAQ

Does the tool cost anything?

No. The SEO & GEO analyzer is free and usable without an account. There's no registration, no subscription and no tracking. Enter a URL, read the analysis — that was the whole point of the exercise from the start.

What's the difference between the SEO and the GEO score?

The SEO score rates classic search engine fitness: HTTPS, title, meta description, headings, load time, security headers and the like. The GEO score rates AI fitness: structured data, llms.txt, whether AI crawlers are allowed, clean heading hierarchy and short paragraphs. The two values are independent — a page can shine at one and fail the other.

How accurate is the score?

Every point is traceably tied to a concrete, tested rule — no black box. Thresholds like "title 50–60 chars" or "TTFB < 200 ms" are established best practices, and the weighting rests on 25 years of practice. The score is a well-founded assessment, not a law of nature: it reliably shows where a page stands and what to tackle next — but it doesn't replace editorial judgment about the content itself.

Why is there no login and no saved history?

Because the point of the tool is speed and honesty. A login would be a hurdle in front of the actual task, and saved history would have meant tracking and data retention, both of which I explicitly did not want. Anyone who wants to keep an analysis can export the result as JSON.