Inside Quarkdown: The Compilation Pipeline in Six Stages

This part is for the curious. You don't need any of it to use Quarkdown, but if you want to understand how a .qd file ends up as HTML, here is a look behind the scenes. After the AI part, things get technical: Quarkdown processes your source in a sequential pipeline of six stages, where the output of each stage is the input of the next. (The architecture comes from the author's bachelor's thesis; this is the short version.)

1. Lexing: text becomes tokens

Just as you would first break a sentence into individual words, the lexer scans the source, a plain sequence of characters, and splits it into tokens: small pieces that each carry a type, a position and their content. Markdown has two categories of tokens: block tokens (paragraph, list, heading, code, quote; the outer structure) and inline tokens (bold, italic, links, images; formatting within a block). Each category has its own lexer; function calls are recognized by both, as block tokens and as inline tokens. At first only the block lexer runs, and the resulting outer blocks then go to the parser.
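
To make the idea of a block lexer concrete, here is a minimal sketch in Python. It is not Quarkdown's code (the engine is written in Kotlin); the Token fields follow the description above, while BLOCK_PATTERNS, lex_blocks and the three simplified patterns are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str      # e.g. "heading", "list_item", "paragraph"
    position: int  # character offset of the token in the source
    content: str   # raw text matched by the token


# Hypothetical, heavily simplified block patterns, tried in order.
# The last one is the fallback and matches any non-empty line.
BLOCK_PATTERNS = [
    ("heading", re.compile(r"#{1,6} [^\n]*")),
    ("list_item", re.compile(r"- [^\n]*")),
    ("paragraph", re.compile(r"[^\n]+")),
]


def lex_blocks(source: str) -> list[Token]:
    """Split the source into block tokens; inline markup is left untouched."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos] == "\n":  # newlines only separate blocks
            pos += 1
            continue
        for type_, pattern in BLOCK_PATTERNS:
            match = pattern.match(source, pos)
            if match:
                tokens.append(Token(type_, pos, match.group()))
                pos = match.end()
                break
    return tokens
```

Note that the paragraph token still contains its raw inline markup: handling that is left to a later inline pass.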

2. Parsing: tokens become a tree

The parser organizes the tokens into an Abstract Syntax Tree (AST) — a tree structure whose elements are called nodes. A heading, a paragraph with bold text and a list become:

AstRoot
├─ Heading(depth=1) → Text("Title")
├─ Paragraph → Text("This is ") + Strong("bold") + Text(" text")
└─ UnorderedList → ListItem → Paragraph → Text("Item 1")

The trick is recursive parsing: for each block token the parser re-triggers lexing on its inner content, parses the inner tokens again, and so on — until nothing nested remains.
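
The recursion can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Quarkdown's parser: the Node class, the function names and the tiny grammar (one block per line, only quotes, headings, paragraphs and bold) are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


STRONG = re.compile(r"\*\*(.+?)\*\*")


def parse_inline(content: str) -> list[Node]:
    """Lex and parse the inline content of a block."""
    nodes, pos = [], 0
    for match in STRONG.finditer(content):
        if match.start() > pos:
            nodes.append(Node("Text", content[pos:match.start()]))
        # Recursion: the inside of a bold span is inline content again.
        nodes.append(Node("Strong", children=parse_inline(match.group(1))))
        pos = match.end()
    if pos < len(content):
        nodes.append(Node("Text", content[pos:]))
    return nodes


def parse_block(line: str) -> Node:
    """Parse one block and hand its inner content to the matching parser."""
    if line.startswith("> "):
        # A quote contains blocks, so its content is parsed as a block again.
        return Node("BlockQuote", children=[parse_block(line[2:])])
    if line.startswith("# "):
        return Node("Heading", children=parse_inline(line[2:]))
    return Node("Paragraph", children=parse_inline(line))


def parse(source: str) -> Node:
    """Build the tree; blank lines are skipped, every other line is one block."""
    lines = [line for line in source.split("\n") if line.strip()]
    return Node("AstRoot", children=[parse_block(line) for line in lines])
```

The recursion ends on its own: plain text contains no further markup, so parse_inline returns a single Text node without calling itself again.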

3. Function-call expansion: where the magic happens

Among the nodes there's a special one: the FunctionCallNode. It's the only mutable node — initially without children, later populated by the function-call expander. Quarkdown functions always return a type-checked Value, and a value-node mapper translates it into a renderable node: a StringValue becomes text, a BooleanValue a checkbox, a DictionaryValue a table, a collection a list.

For each call, the function is looked up in the loaded libraries, the arguments are bound to its parameters, and the function is executed. This is where Quarkdown's dynamic typing meets the statically typed Kotlin stdlib: during binding, a ValueFactory converts each dynamic argument into the static parameter type, and if the conversion fails, an error is raised. This is exactly the boundary between "your" Markdown script and the compiled engine.
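
The three steps (look up, bind, execute) plus the value-to-node mapping can be sketched as follows. This is a Python analogy, not the Kotlin implementation: LIBRARY, convert, to_node, expand and the two sample functions are hypothetical stand-ins for the libraries, the ValueFactory, the value-node mapper and the expander described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


# Hypothetical library: function name -> (parameter types, implementation).
LIBRARY = {
    "repeat": ((str, int), lambda text, times: text * times),
    "iseven": ((int,), lambda n: n % 2 == 0),
}


def convert(raw: str, target: type):
    """Turn a raw (dynamic) argument into the declared parameter type."""
    try:
        return target(raw)
    except ValueError as err:
        raise TypeError(f"cannot convert {raw!r} to {target.__name__}") from err


def to_node(value) -> Node:
    """Map a returned value to a renderable node."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return Node("Checkbox", text=str(value))
    if isinstance(value, dict):
        rows = [Node("Row", f"{k}: {v}") for k, v in value.items()]
        return Node("Table", children=rows)
    if isinstance(value, (list, tuple)):
        return Node("List", children=[to_node(v) for v in value])
    return Node("Text", text=str(value))


def expand(call: Node, name: str, raw_args: list[str]) -> None:
    """Look up the function, bind the arguments, execute, fill the call node."""
    param_types, implementation = LIBRARY[name]
    if len(raw_args) != len(param_types):
        raise TypeError(f"{name} expects {len(param_types)} argument(s)")
    args = [convert(raw, t) for raw, t in zip(raw_args, param_types)]
    call.children = [to_node(implementation(*args))]
```

In this sketch, calling expand with the arguments "ab" and "3" for repeat fills the call node with a single text node, while passing "x" as the count fails at binding time, before the function ever runs.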

4. Tree traversal: understanding the document

Now the finished tree is traversed depth-first once to gather cross-cutting information: the heading hierarchy for the table of contents, the numbering of each element, and the binding of link references to their definitions. For performance there's only one pass — each task attaches its own hook to the iterator, which fires on matching node types.
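
A single pass with per-task hooks might look like the sketch below. It is written in Python for illustration only; TreeIterator, on, traverse, analyze and the "Figure" node kind are hypothetical names, not Quarkdown's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


class TreeIterator:
    """Walks the tree once, depth-first, and fires hooks by node kind."""

    def __init__(self) -> None:
        self.hooks: dict[str, list[Callable[[Node], None]]] = {}

    def on(self, kind: str, hook: Callable[[Node], None]) -> None:
        self.hooks.setdefault(kind, []).append(hook)

    def traverse(self, node: Node) -> None:
        for hook in self.hooks.get(node.kind, []):
            hook(node)
        for child in node.children:
            self.traverse(child)


def analyze(root: Node) -> list[str]:
    """Collect the headings and number the figures in one and the same pass."""
    toc: list[str] = []
    figure_count = 0

    def number_figure(node: Node) -> None:
        nonlocal figure_count
        figure_count += 1
        node.text = f"Figure {figure_count}: {node.text}"

    iterator = TreeIterator()
    iterator.on("Heading", lambda node: toc.append(node.text))
    iterator.on("Figure", number_figure)
    iterator.traverse(root)
    return toc
```

Both tasks share one traversal: adding a third task means registering another hook, not walking the tree again.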

5. Rendering: the tree becomes the target format

The enriched AST is translated — again traversed depth-first — into the target format. Each node produces its output, ideally one to one:

<h1>Title</h1>
<p>This is <strong>bold</strong> text</p>
<ul><li>Item 1</li></ul>

To make this scale, each render target lives in its own module (quarkdown-html, quarkdown-plaintext) that plugs into the core — that's how further target formats are added down the line.
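
As a rough illustration of how one tree can feed several targets, here are two tiny renderers over the same nodes, written in Python. The Node class, the TAGS table and both function names are assumptions made for this sketch; the real renderers live in the Kotlin modules named above.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    kind: str
    text: str = ""
    depth: int = 0  # only meaningful for headings
    children: list["Node"] = field(default_factory=list)


# Hypothetical mapping from node kind to HTML tag.
TAGS = {"Paragraph": "p", "Strong": "strong", "UnorderedList": "ul", "ListItem": "li"}
BLOCKS = {"Heading", "Paragraph", "ListItem"}


def render_html(node: Node) -> str:
    """Depth-first: render the children first, then wrap them in this node's tag."""
    if node.kind == "Text":
        return escape(node.text)
    inner = "".join(render_html(child) for child in node.children)
    if node.kind == "AstRoot":
        return inner
    if node.kind == "Heading":
        return f"<h{node.depth}>{inner}</h{node.depth}>"
    tag = TAGS[node.kind]
    return f"<{tag}>{inner}</{tag}>"


def render_plaintext(node: Node) -> str:
    """Same traversal, different target: drop formatting, end blocks with a newline."""
    if node.kind == "Text":
        return node.text
    inner = "".join(render_plaintext(child) for child in node.children)
    return inner + "\n" if node.kind in BLOCKS else inner
```

Neither renderer changes the tree, which is what makes it possible to add a target without touching the earlier stages.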

6. Post-rendering: the whole document

The render result is only the <body> content — metadata, styling and runtime are still missing. The post-renderer (for HTML the HtmlDocumentBuilder via the kotlinx.html DSL) builds the full document around it: content into the <body>, the right scripts and stylesheets depending on the document type, title, page format and fonts — and loads KaTeX only if a formula occurs at all. In the end the pipeline returns a bundle of HTML, stylesheets (global + layout theme + color theme) and runtime scripts, which the CLI writes to files.
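
The wrapping step can be pictured with a small Python function. It only mirrors the idea and is not the HtmlDocumentBuilder: build_document, its parameters and the file names are hypothetical, and the KaTeX path is a placeholder.

```python
from html import escape


def build_document(body: str, title: str, has_math: bool, stylesheets: list[str]) -> str:
    """Wrap rendered body content in a complete HTML document."""
    head = [f"<title>{escape(title)}</title>"]
    head += [f'<link rel="stylesheet" href="{escape(href)}">' for href in stylesheets]
    if has_math:
        # Only documents that contain a formula pay for the math runtime.
        head.append('<script src="katex.min.js"></script>')  # placeholder path
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        + "\n".join(head)
        + "\n</head>\n<body>\n"
        + body
        + "\n</body>\n</html>"
    )
```

Whether a formula occurs is something the earlier stages can record while walking the tree, so the post-renderer here only has to check a flag.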

Bonus: how live preview works

The live preview from part 8 is a neat piece of engineering. The built-in web server holds a WebSocket connection (/reload) to the browser; after each compilation the CLI sends a message, the server broadcasts it, and the browser reloads. To keep the reload from flickering, Quarkdown uses double buffering with two iframes: while A stays visible, the update renders into B; once B is ready, the two are swapped and the scroll position is restored. It's the same principle your GPU uses to draw frames.
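
The swap logic is easy to model outside the browser. The following Python sketch simulates the two iframes as plain objects; Frame, DoubleBufferedPreview and reload are hypothetical names, and the real preview of course does this with iframes in the browser.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    content: str = ""
    scroll_y: int = 0


class DoubleBufferedPreview:
    """Two frames: one is shown, the other receives the next update."""

    def __init__(self) -> None:
        self.frames = [Frame(), Frame()]
        self.visible = 0  # index of the frame currently on screen

    @property
    def shown(self) -> Frame:
        return self.frames[self.visible]

    def reload(self, new_content: str) -> None:
        hidden_index = 1 - self.visible
        hidden = self.frames[hidden_index]
        hidden.content = new_content           # render off-screen
        hidden.scroll_y = self.shown.scroll_y  # carry over the scroll position
        self.visible = hidden_index            # swap only once the frame is ready
```

The viewer never sees a half-rendered page, because the visible frame is only replaced after the hidden one has been fully prepared.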

FAQ

Do I need to know the pipeline to use Quarkdown?

No, not at all. This part is pure background for the technically interested. For everyday work, everything from parts 1 through 9 is enough.

Why is the `FunctionCallNode` the only mutable node?

Because its content only comes into existence after the tree is built, namely when the function is executed. All other nodes are fixed on creation; a function call, by contrast, only receives its children in the expansion stage.

Can Quarkdown output formats other than HTML?

Yes. Rendering and post-rendering live in interchangeable modules. Besides HTML there's already a plaintext renderer, and the architecture is deliberately designed so that further targets can be plugged in.