Generative DOM's Markdown Subset — What's In, What's Out
Generative DOM is not a CommonMark implementation. It is a streaming markdown renderer built for LLM output, which means it optimises for a specific subset of the spec — the features LLMs actually emit and humans actually read — and intentionally omits the rest.
As of the current release, Generative DOM passes 316 / 652 CommonMark spec cases (~48%). The skipped 336 cases are deliberate design decisions, not bugs. The CommonMark handbook at generativeui.ru/solutions/generative-dom/handbook/ shows the exact test-by-test breakdown.
This page summarises what that means in practice.
Supported
Everything in this table ships with the core + one of the seven standard @generative-dom/plugin-markdown-* packages. If you register all seven, you have the full supported surface.
| Feature | Syntax | Plugin |
|---|---|---|
| Paragraphs | Plain text separated by blank lines | @generative-dom/plugin-markdown-base |
| Line breaks | Newline inside a paragraph → soft break | @generative-dom/plugin-markdown-base |
| Hard breaks | \ at end of line → <br> | @generative-dom/plugin-markdown-base |
| Horizontal rule | ---, ***, ___ on their own line | @generative-dom/plugin-markdown-base |
| ATX headings | # through ###### at line start | @generative-dom/plugin-markdown-heading |
| Fenced code blocks | ```lang ... ``` with optional language tag | @generative-dom/plugin-markdown-code |
| Blockquotes | > at line start | @generative-dom/plugin-markdown-quote |
| Unordered lists | -, +, or * at line start | @generative-dom/plugin-markdown-list |
| Ordered lists | 1., 2., ... at line start | @generative-dom/plugin-markdown-list |
| Pipe tables | `\| col \| col \|` rows under a `\|---\|---\|` separator line | @generative-dom/plugin-markdown-table |
| Bold | **text** or __text__ | @generative-dom/plugin-markdown-inline |
| Italic | *text* or _text_ | @generative-dom/plugin-markdown-inline |
| Strikethrough | ~~text~~ | @generative-dom/plugin-markdown-inline |
| Inline code | `code` | @generative-dom/plugin-markdown-inline |
| Links | [text](https://url) | @generative-dom/plugin-markdown-link |
| Images | `![alt](https://url)` | @generative-dom/plugin-markdown-link |
| Autolinks | <https://example.com> | @generative-dom/plugin-markdown-link |
Link URLs are checked against a scheme whitelist: https:, http:, and mailto:. Image URLs are limited to https: and http:. Any other scheme (javascript:, data:, file:, custom schemes) causes the link to render as plain text with no href.
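The scheme check can be sketched in a few lines. This is an illustration only, not Generative DOM's own code: the names `LINK_SCHEMES`, `IMAGE_SCHEMES`, `schemeOf`, and `isAllowedUrl` are hypothetical, and the sketch assumes that input without a recognisable scheme is rejected.

```typescript
// Hypothetical sketch of a scheme whitelist; not the library's implementation.
const LINK_SCHEMES: ReadonlySet<string> = new Set(["https:", "http:", "mailto:"]);
const IMAGE_SCHEMES: ReadonlySet<string> = new Set(["https:", "http:"]);

// Extracts a lowercased scheme such as "https:" or returns null if there is none.
function schemeOf(raw: string): string | null {
  const scheme = /^([A-Za-z][A-Za-z0-9+.-]*):/.exec(raw)?.[1];
  return scheme !== undefined ? scheme.toLowerCase() + ":" : null;
}

// True only when the URL starts with a scheme that is on the given whitelist.
// Assumption: anything without a scheme (relative paths, garbage) is rejected.
function isAllowedUrl(raw: string, allowed: ReadonlySet<string>): boolean {
  const scheme = schemeOf(raw);
  return scheme !== null && allowed.has(scheme);
}

console.log(isAllowedUrl("https://example.com", LINK_SCHEMES)); // true
console.log(isAllowedUrl("javascript:alert(1)", LINK_SCHEMES)); // false
console.log(isAllowedUrl("mailto:a@example.com", IMAGE_SCHEMES)); // false
```

Because the check is a whitelist, anything that fails to match falls through to the safe outcome: no href is emitted.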
Intentionally omitted
These are CommonMark features Generative DOM does not parse. Each has a reason; several are on the roadmap but none are blocking.
| Feature | Why it's omitted |
|---|---|
| Setext headings (=== / --- under text) | Ambiguous mid-stream: a line of --- could be a heading underline or a horizontal rule until the next line arrives. ATX headings disambiguate immediately. |
| Indented code blocks (4-space indent) | Conflicts with list-item indentation in ways that hurt more than they help. Fenced code blocks are always unambiguous. |
| Reference-style links ([text][id] + [id]: url) | Require forward lookups to a definition block that may arrive later. Doesn't stream. LLMs rarely use them. |
| HTML passthrough (raw <div>, <span>, <p>, etc.) | Architectural: Generative DOM is innerHTML-free by design. See Emitting HTML / Custom Elements for the whitelisted custom-element alternative. |
| Raw HTML paragraphs (HTML blocks per CommonMark §4.6) | Same as above. All 64 HTML-block spec cases are deliberately skipped. |
| <details> / <summary> | Not on the custom-element whitelist. A heading + short paragraph is usually better streaming UX than a collapsible block. |
| Footnotes ([^1] + [^1]: text) | Forward-lookup problem. Also vanishingly rare in LLM output. |
| Definition lists (term\n: definition) | Not in CommonMark core; GFM and some other extensions define them differently. Use a two-column table. |
| GFM task lists (- [ ] / - [x]) | Planned. Track progress on the roadmap. |
| HTML entities beyond a small whitelist | The entities.ts module supports the common named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`) and `&#x` hex references. Rare entities render as text. |
| Emphasis with complex flanking rules | Partially supported: basic **bold** and *italic* work, but CommonMark's full left-flanking/right-flanking rules (132 emphasis spec cases) are only partly implemented. |
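To make the entity row concrete, here is a rough sketch of what decoding against a small whitelist could look like. It is not the contents of entities.ts: the `NAMED` table and `decodeEntities` function are hypothetical, and the exact set of supported entities is an assumption based on the row above.

```typescript
// Hypothetical whitelist; the real entities.ts module may differ.
const NAMED: Readonly<Record<string, string>> = { amp: "&", lt: "<", gt: ">", quot: '"' };

// Decodes whitelisted named entities and &#x hex references; leaves everything else as text.
function decodeEntities(text: string): string {
  return text.replace(
    /&(?:#[xX]([0-9a-fA-F]{1,6})|([A-Za-z][A-Za-z0-9]*));/g,
    (whole: string, hex: string | undefined, name: string | undefined): string => {
      if (hex !== undefined) {
        const codePoint = parseInt(hex, 16);
        // Out-of-range or zero code points are left untouched in this sketch.
        return codePoint > 0 && codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole;
      }
      if (name !== undefined && Object.prototype.hasOwnProperty.call(NAMED, name)) {
        return NAMED[name] ?? whole;
      }
      return whole; // entity not on the whitelist: render as literal text
    }
  );
}

console.log(decodeEntities("a &amp; b"));  // a & b
console.log(decodeEntities("&#x41;"));     // A
console.log(decodeEntities("&hearts;"));   // &hearts;
```

The decoded result is meant to be used as text content, so a decoded `<` never becomes markup.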
Added beyond CommonMark
Things Generative DOM does that CommonMark doesn't.
| Addition | What it does | Plugin |
|---|---|---|
| Strikethrough | ~~text~~ (GFM) | @generative-dom/plugin-markdown-inline |
| Pipe tables | GFM tables, with streaming row-by-row rendering | @generative-dom/plugin-markdown-table |
| Custom elements (md-*) | Whitelisted md-clock, md-plot, md-counter, user-extensible | @generative-dom/plugin-custom-elements |
| Interactive elements | md-button, md-toggle, md-input with event emission | @generative-dom/plugin-interactive |
| Event-only tags | <progress>, <status>, <milestone> — fire events, produce no DOM | @generative-dom/plugin-events |
| Syntax highlighting | Tokenises fenced-code content by language (no runtime dep on highlight.js or Prism) | @generative-dom/plugin-highlight |
| Streaming-safe incremental parsing | The entire raison d'être — push(chunk) + incremental AST diff + DOM patching | core |
Why these specific exclusions
Why no raw HTML
Two reasons, neither negotiable.
Security. Once you accept raw HTML from a model, you accept the possibility of <script>, <iframe src="evil.com">, <img onerror="...">, and a hundred other XSS vectors. The standard defence is a sanitizer (DOMPurify, sanitize-html). Sanitizers are correct most of the time, but the attack surface is a moving target (mutation XSS, parser differentials between sanitizer and browser, newly-spec'd elements), and sanitizer bugs are a regular source of CVEs. Generative DOM skips the class of bug entirely by never having HTML to sanitize. The worst a model can do is produce literal text that looks like a tag.
Performance. The naive streaming markdown approach — concat tokens, innerHTML = parse(buffer) on every chunk — is O(N²) because the entire buffer gets re-parsed every time. Generative DOM's AST-diff-and-patch machinery only works on a structured token stream with stable positions. Raw HTML blocks, which can contain arbitrary nested markdown, break the tokenizer's stability assumption.
The whitelisted custom-element mechanism is the escape hatch. It gives you real web components without opening the HTML floodgate.
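The O(N²) claim in the performance argument can be checked with simple arithmetic. The sketch below counts only how many characters a parser has to scan; it does not model Generative DOM's real diffing cost. The function names and chunk sizes are made up for illustration.

```typescript
// Characters scanned when the whole buffer is re-parsed on every chunk.
function reparseCost(chunkSizes: readonly number[]): number {
  let buffered = 0;
  let scanned = 0;
  for (const size of chunkSizes) {
    buffered += size;
    scanned += buffered; // a full re-parse touches everything received so far
  }
  return scanned;
}

// Characters scanned in the idealised case where each chunk is processed once.
function incrementalCost(chunkSizes: readonly number[]): number {
  return chunkSizes.reduce((sum, size) => sum + size, 0);
}

// 1000 chunks of 4 characters each (illustrative numbers, not measurements).
const chunks = new Array<number>(1000).fill(4);
console.log(reparseCost(chunks));     // 2002000
console.log(incrementalCost(chunks)); // 4000
```

For k equal-sized chunks the re-parse total grows as k(k+1)/2 chunk lengths, which is where the quadratic term comes from.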
Why no reference-style links
LLMs almost never emit them. Check a hundred ChatGPT or Claude responses; you'll find inline links throughout and zero reference-style links. Supporting a feature the target workload doesn't use is complexity for no benefit.
Reference-style links also don't stream well. A sentence such as "See the [spec][1] for details." leaves [spec][1] in an ambiguous state until the [1]: https://... definition arrives, which might be on the next line, at the bottom of the document, or never. Generative DOM renders tokens the moment they're complete; waiting on a forward reference breaks that model.
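The forward-lookup problem can be illustrated by measuring how long a reference stays unresolved. The sketch below is a simplified model written for this page, not a plugin: `resolutionDelays` is a hypothetical helper, labels are matched case-sensitively (CommonMark normalises them), and each array element stands for one streamed line.

```typescript
// For each reference label, reports how many lines passed between first use
// and definition, or null if the definition never arrived.
function resolutionDelays(lines: readonly string[]): Map<string, number | null> {
  const firstUse = new Map<string, number>();
  const defined = new Set<string>();
  const delay = new Map<string, number | null>();
  const usePattern = /\[[^\]]+\]\[([^\]]+)\]/g; // [text][label]
  const defPattern = /^\[([^\]]+)\]:\s*\S+/;    // [label]: url

  lines.forEach((line, index) => {
    const defLabel = defPattern.exec(line)?.[1];
    if (defLabel !== undefined) {
      defined.add(defLabel);
      const usedAt = firstUse.get(defLabel);
      if (usedAt !== undefined && delay.get(defLabel) === null) {
        delay.set(defLabel, index - usedAt);
      }
      return;
    }
    usePattern.lastIndex = 0;
    let use: RegExpExecArray | null;
    while ((use = usePattern.exec(line)) !== null) {
      const label = use[1];
      if (label !== undefined && !firstUse.has(label)) {
        firstUse.set(label, index);
        // A label defined earlier resolves immediately; otherwise it is pending.
        delay.set(label, defined.has(label) ? 0 : null);
      }
    }
  });
  return delay;
}

const delays = resolutionDelays([
  "See the [spec][1] for details.",
  "Another paragraph.",
  "[1]: https://example.com/spec",
]);
console.log(delays.get("1")); // 2
```

A streaming renderer would have to hold or later rewrite the link for those two lines, and indefinitely when the result is null.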
Why no setext headings
```md
My heading
==========
```

is ambiguous mid-stream. When Generative DOM sees My heading\n, it has to decide: render this as a paragraph (because there's no #, so it's not an ATX heading) or wait to see if the next line is ===== (in which case it's a setext heading)?
Waiting creates a visible lag. Rendering as a paragraph and then retroactively upgrading to a heading creates a visible jump.
ATX headings (# My heading) commit immediately. No ambiguity, no lag, no flicker. For an LLM target, the tradeoff is obvious.
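The difference in commit timing can be sketched as a small classifier for a hypothetical setext-aware parser. This is not how Generative DOM is implemented; `classify` and the two patterns are assumptions for illustration, and the sketch ignores blank lines, lists, and other block types.

```typescript
type Kind = "heading" | "paragraph" | "undecided";

// Up to three spaces of indent, one to six #, then a space, tab, or end of line.
const ATX = /^ {0,3}#{1,6}(?:[ \t]|$)/;
// A run of = or - on its own line.
const SETEXT_UNDERLINE = /^ {0,3}(?:=+|-+)[ \t]*$/;

// Classifies a completed line, optionally with the line that follows it.
function classify(line: string, nextLine?: string): Kind {
  if (ATX.test(line)) return "heading";           // decided from the line itself
  if (nextLine === undefined) return "undecided"; // setext needs the next line
  return SETEXT_UNDERLINE.test(nextLine) ? "heading" : "paragraph";
}

console.log(classify("# My heading"));             // heading
console.log(classify("My heading"));               // undecided
console.log(classify("My heading", "=========="));  // heading
console.log(classify("My heading", "More text"));  // paragraph
```

Dropping setext support removes the "undecided" state: a line without a leading # is a paragraph as soon as it ends.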
Why incomplete emphasis support
CommonMark's emphasis rules are famously intricate — 132 spec cases covering left-flanking delimiters, right-flanking delimiters, Unicode whitespace classes, punctuation handling, and overlapping * / _ patterns. The Generative DOM inline matcher implements the common cases (**bold**, *italic*, nested with plain text) and punts on the corners.
This is an honest gap. Some pathological inputs render differently from CommonMark reference implementations. For typical LLM output this never comes up; for documents written by humans who lean on *foo *bar* style patterns, it occasionally does. If you need full CommonMark emphasis compliance, Generative DOM is the wrong tool.
For the full picture
- The authoritative in/out breakdown is the CommonMark handbook at generativeui.ru/solutions/generative-dom/handbook/. Every one of the 652 spec cases is run; you can see which pass, which skip, and what Generative DOM does with each.
- The per-section counts live in docs/commonmark.md and docs/commonmark-audit.md in the repo.
- If you hit a missing feature that matters for your use case, the plugin system is how you fix it. Every piece of syntax in Generative DOM is a plugin (paragraphs, headings, everything), and nothing prevents you from writing a reference-link or setext-heading plugin on top.
Back to the guide
- LLM Integration Guide — index.
- Writing System Prompts for Generative DOM — teaching the model the supported subset.
- Example System Prompts — six worked examples.
- Emitting HTML / Custom Elements — the custom-element whitelist in depth.