Generative DOM's Markdown Subset — What's In, What's Out

Generative DOM is not a CommonMark implementation. It is a streaming markdown renderer built for LLM output, which means it optimises for a specific subset of the spec — the features LLMs actually emit and humans actually read — and intentionally omits the rest.

As of the current release, Generative DOM passes 316 / 652 CommonMark spec cases (~48%). The skipped 336 cases are deliberate design decisions, not bugs. The CommonMark handbook at generativeui.ru/solutions/generative-dom/handbook/ shows the exact test-by-test breakdown.

This page summarises what that means in practice.

Supported

Everything in this table ships with the core + one of the seven standard @generative-dom/plugin-markdown-* packages. If you register all seven, you have the full supported surface.

| Feature | Syntax | Plugin |
| --- | --- | --- |
| Paragraphs | Plain text separated by blank lines | @generative-dom/plugin-markdown-base |
| Line breaks | Newline inside a paragraph → soft break | @generative-dom/plugin-markdown-base |
| Hard breaks | \ at end of line → <br> | @generative-dom/plugin-markdown-base |
| Horizontal rule | ---, ***, ___ on their own line | @generative-dom/plugin-markdown-base |
| ATX headings | # through ###### at line start | @generative-dom/plugin-markdown-heading |
| Fenced code blocks | ```lang ... ``` with optional language tag | @generative-dom/plugin-markdown-code |
| Blockquotes | > at line start | @generative-dom/plugin-markdown-quote |
| Unordered lists | -, +, or * at line start | @generative-dom/plugin-markdown-list |
| Ordered lists | 1., 2., ... at line start | @generative-dom/plugin-markdown-list |
| Pipe tables | GFM pipe-table rows (cells separated by pipe characters) | @generative-dom/plugin-markdown-table |
| Bold | **text** or __text__ | @generative-dom/plugin-markdown-inline |
| Italic | *text* or _text_ | @generative-dom/plugin-markdown-inline |
| Strikethrough | ~~text~~ | @generative-dom/plugin-markdown-inline |
| Inline code | `code` | @generative-dom/plugin-markdown-inline |
| Links | [text](https://url) | @generative-dom/plugin-markdown-link |
| Images | ![alt](https://url) | @generative-dom/plugin-markdown-link |
| Autolinks | <https://example.com> | @generative-dom/plugin-markdown-link |

Link URLs are checked against a scheme whitelist: https:, http:, and mailto:. Image URLs are limited to https: and http:. Any other scheme (javascript:, data:, file:, custom schemes) causes the link to render as plain text with no href.
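
The scheme check described above can be sketched roughly as follows. This is an illustration only, not Generative DOM's actual code: the names `schemeOf` and `isAllowedUrl` are hypothetical, and rejecting URLs that have no explicit scheme is an assumption of the sketch, since this page does not say how relative URLs are treated.

```typescript
// Hypothetical sketch of a scheme whitelist; not Generative DOM's real API.
const LINK_SCHEMES: ReadonlySet<string> = new Set(["https:", "http:", "mailto:"]);
const IMAGE_SCHEMES: ReadonlySet<string> = new Set(["https:", "http:"]);

// Returns the lowercased scheme including the trailing colon, or null if the
// string does not start with a syntactically valid scheme.
function schemeOf(url: string): string | null {
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*:)/.exec(url.trim());
  return match?.[1]?.toLowerCase() ?? null;
}

// Reject by default: anything without a whitelisted scheme is refused.
// Assumption: scheme-less (relative) URLs are refused as well.
function isAllowedUrl(url: string, kind: "link" | "image"): boolean {
  const scheme = schemeOf(url);
  if (scheme === null) return false;
  const allowed = kind === "link" ? LINK_SCHEMES : IMAGE_SCHEMES;
  return allowed.has(scheme);
}
```

Comparing the parsed scheme against an allow-list, rather than searching for known-bad schemes, means an unfamiliar or obfuscated scheme fails closed.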

Intentionally omitted

These are CommonMark features Generative DOM does not parse. Each has a reason; several are on the roadmap but none are blocking.

| Feature | Why it's omitted |
| --- | --- |
| Setext headings (=== / --- under text) | Ambiguous mid-stream: a line of text could be a paragraph or a heading until the next line arrives, and --- is also the horizontal-rule syntax. ATX headings disambiguate immediately. |
| Indented code blocks (4-space indent) | Conflicts with list-item indentation in ways that hurt more than they help. Fenced code blocks are always unambiguous. |
| Reference-style links ([text][id] + [id]: url) | Require forward lookups to a definition block that may arrive later, so they don't stream. LLMs rarely use them. |
| HTML passthrough (raw <div>, <span>, <p>, etc.) | Architectural: Generative DOM is innerHTML-free by design. See Emitting HTML / Custom Elements for the whitelisted custom-element alternative. |
| Raw HTML paragraphs (HTML blocks per CommonMark §4.6) | Same as above. All 64 HTML-block spec cases are deliberately skipped. |
| <details> / <summary> | Not on the custom-element whitelist. A heading + short paragraph is usually better streaming UX than a collapsible block. |
| Footnotes ([^1] + [^1]: text) | Forward-lookup problem. Also vanishingly rare in LLM output. |
| Definition lists (term\n: definition) | Not in CommonMark or the GFM spec; only third-party extensions (for example PHP Markdown Extra and Pandoc) define them. Use a two-column table. |
| GFM task lists (- [ ] / - [x]) | Planned. Track progress on the roadmap. |
| HTML entities beyond a small whitelist | The entities.ts module supports the common named entities (&amp;, &lt;, &gt;, &quot;) and &#x-prefixed hexadecimal character references. Rare entities render as text. |
| Emphasis with complex flanking rules | Partially supported: basic **bold** and *italic* work, but CommonMark's full left-flanking/right-flanking rules (132 emphasis spec cases) are not completely implemented. |
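
To make the entity row concrete, here is a minimal sketch of a whitelist-based decoder. It is not the real entities.ts, whose contents are not shown on this page; the function name `decodeEntities`, the exact entity set, and the handling of out-of-range code points are all assumptions made for illustration.

```typescript
// Hypothetical sketch; the real entities.ts module may differ.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const NAMED_ENTITIES: ReadonlyMap<string, string> = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
]);

function decodeEntities(text: string): string {
  return text.replace(
    /&(?:#[xX]([0-9a-fA-F]{1,6})|([a-zA-Z]+));/g,
    (whole: string, hex: string | undefined, name: string | undefined): string => {
      if (hex !== undefined) {
        const codePoint = parseInt(hex, 16);
        // Assumption: NUL, surrogates, and out-of-range values become U+FFFD.
        const invalid =
          codePoint === 0 ||
          codePoint > 0x10ffff ||
          (codePoint >= 0xd800 && codePoint <= 0xdfff);
        return invalid ? "\uFFFD" : String.fromCodePoint(codePoint);
      }
      // Entities outside the whitelist are left untouched, i.e. render as text.
      const named = name !== undefined ? NAMED_ENTITIES.get(name) : undefined;
      return named ?? whole;
    },
  );
}
```

Because the replacement is a single pass, an input such as `&amp;lt;` decodes to the text `&lt;` rather than being decoded twice.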

Added beyond CommonMark

Things Generative DOM does that CommonMark doesn't.

| Addition | What it does | Plugin |
| --- | --- | --- |
| Strikethrough | ~~text~~ (GFM) | @generative-dom/plugin-markdown-inline |
| Pipe tables | GFM tables, with streaming row-by-row rendering | @generative-dom/plugin-markdown-table |
| Custom elements (md-*) | Whitelisted md-clock, md-plot, md-counter, user-extensible | @generative-dom/plugin-custom-elements |
| Interactive elements | md-button, md-toggle, md-input with event emission | @generative-dom/plugin-interactive |
| Event-only tags | <progress>, <status>, <milestone>: fire events, produce no DOM | @generative-dom/plugin-events |
| Syntax highlighting | Tokenises fenced-code content by language (no runtime dep on highlight.js or Prism) | @generative-dom/plugin-highlight |
| Streaming-safe incremental parsing | The entire raison d'être: push(chunk) + incremental AST diff + DOM patching | core |

Why these specific exclusions

Why no raw HTML

Two reasons, neither negotiable.

Security. Once you accept raw HTML from a model, you accept the possibility of <script>, <iframe src="evil.com">, <img onerror="...">, and a hundred other XSS vectors. The standard defence is a sanitizer (DOMPurify, sanitize-html). Sanitizers are correct most of the time, but the attack surface is a moving target (mutation XSS, parser differentials between sanitizer and browser, newly-spec'd elements) and sanitizer bugs are a regular source of CVEs. Generative DOM avoids this whole class of bug by never having HTML to sanitize. The worst a model can do is produce literal text that looks like a tag.

Performance. The naive streaming markdown approach — concat tokens, innerHTML = parse(buffer) on every chunk — is O(N²) because the entire buffer gets re-parsed every time. Generative DOM's AST-diff-and-patch machinery only works on a structured token stream with stable positions. Raw HTML blocks, which can contain arbitrary nested markdown, break the tokenizer's stability assumption.
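
The quadratic cost is easy to see with a little arithmetic. The sketch below only counts characters handed to a parser under two simplified strategies; it does not call Generative DOM, and the function names are hypothetical. It assumes the incremental strategy touches each incoming chunk once, which ignores diffing overhead.

```typescript
// Characters parsed when the whole buffer is re-parsed on every chunk.
function reparsedChars(chunkLengths: readonly number[]): number {
  let bufferLength = 0;
  let total = 0;
  for (const len of chunkLengths) {
    bufferLength += len;   // buffer grows by one chunk
    total += bufferLength; // and is parsed again from the start
  }
  return total;
}

// Characters parsed when each chunk is consumed exactly once (simplified).
function incrementalChars(chunkLengths: readonly number[]): number {
  return chunkLengths.reduce((sum, len) => sum + len, 0);
}

// 1000 chunks of 4 characters each, i.e. a 4000-character response.
const chunks: number[] = new Array<number>(1000).fill(4);
console.log(reparsedChars(chunks));    // 2002000
console.log(incrementalChars(chunks)); // 4000
```

For this 4000-character response the re-parse strategy processes about 500 times more input, and the gap widens as the response gets longer.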

The whitelisted custom-element mechanism is the escape hatch. It gives you real web components without opening the HTML floodgate.

Why no reference-style links

LLMs almost never emit them. Check a hundred ChatGPT or Claude responses; you'll find inline links throughout and zero reference-style links. Supporting a feature the target workload doesn't use is complexity for no benefit.

Reference-style links also don't stream well. An input such as "See the [spec][1] for details." leaves [spec][1] in an ambiguous state until the [1]: https://... definition arrives, which might be on the next line, or at the bottom of the document, or never. Generative DOM renders tokens the moment they're complete; waiting on a forward reference breaks that model.

Why no setext headings

```md
My heading
==========
```

is ambiguous mid-stream. When Generative DOM sees My heading\n, it has to decide: render this as a paragraph (because there's no #, so it's not an ATX heading) or wait to see if the next line is ===== (in which case it's a setext heading)?

Waiting creates a visible lag. Rendering as a paragraph and then retroactively upgrading to a heading creates a visible jump.

ATX headings (# My heading) commit immediately. No ambiguity, no lag, no flicker. For an LLM target, the tradeoff is obvious.
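
As a rough illustration of why ATX headings can commit immediately, the sketch below classifies a single completed line with no lookahead. It is not Generative DOM's heading plugin: `parseAtxHeading` and `AtxHeading` are hypothetical names, and the pattern is deliberately simplified (no leading indentation, no closing # sequence, no empty headings).

```typescript
interface AtxHeading {
  level: number; // 1 through 6
  text: string;
}

// Decides from the line alone: one to six # characters, then whitespace, then text.
// Returns null for anything else, including a setext underline such as "=====".
function parseAtxHeading(line: string): AtxHeading | null {
  const match = /^(#{1,6})[ \t]+(.*)$/.exec(line);
  const hashes = match?.[1];
  const text = match?.[2];
  if (hashes === undefined || text === undefined) return null;
  return { level: hashes.length, text: text.trim() };
}

console.log(parseAtxHeading("# My heading")); // { level: 1, text: 'My heading' }
console.log(parseAtxHeading("My heading"));   // null
```

The second call is the setext problem in miniature: "My heading" on its own gives the function nothing to commit to, so a setext-aware renderer would have to hold the line back or rewrite it later.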

Why incomplete emphasis support

CommonMark's emphasis rules are famously intricate — 132 spec cases covering left-flanking delimiters, right-flanking delimiters, Unicode whitespace classes, punctuation handling, and overlapping * / _ patterns. The Generative DOM inline matcher implements the common cases (**bold**, *italic*, nested with plain text) and punts on the corners.

This is an honest gap. Some pathological inputs render differently from CommonMark reference implementations. For typical LLM output this never comes up; for documents written by humans who lean on *foo *bar* style patterns, it occasionally does. If you need full CommonMark emphasis compliance, Generative DOM is the wrong tool.

For the full picture

  • The authoritative in/out breakdown is the CommonMark handbook at generativeui.ru/solutions/generative-dom/handbook/. Every one of the 652 spec cases is run; you can see which pass, which skip, and what Generative DOM does with each.
  • The per-section counts live in docs/commonmark.md and docs/commonmark-audit.md in the repo.
  • If you hit a missing feature that matters for your use case, the plugin system is how you fix it. Every piece of syntax in Generative DOM is a plugin — paragraphs, headings, everything — and nothing prevents you from writing a reference-link or setext-heading plugin on top.
