Appendix B — Divergences from CommonMark
This appendix enumerates every deliberate difference between MdFlow v1.0 and CommonMark v0.31.2. MdFlow's policy is "aspire-identical plus explicit divergence": wherever no divergence is listed here, conforming output MUST match CommonMark. Every divergence has a rationale rooted in one of the MdFlow pillars: streaming, safety, or scope.
Format: each entry cites the CommonMark § it diverges from, the MdFlow behavior, and the rationale.
B.1 Raw HTML passthrough
- CommonMark §6.6 (Raw HTML): permits `<tag>...</tag>` and HTML comments to pass through verbatim to output.
- MdFlow §8.1: not permitted. Such sequences are literal text.
- Rationale: Safety. Raw HTML is incompatible with the security guarantees of §10. Users seeking HTML features MUST use custom elements (`md-*`), which have a safety grammar.
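A minimal sketch of what "literal text" means for an HTML renderer, assuming text nodes are escaped with Python's standard `html.escape`. The function name `render_text_node` is hypothetical and not taken from the MdFlow specification.

```python
import html

def render_text_node(text: str) -> str:
    """Escape a text node so tag-like sequences are displayed, not interpreted.

    Hypothetical helper: MdFlow's real renderer interface is not shown in
    this appendix.
    """
    # quote=False escapes only &, <, and >, which is enough for text nodes.
    return html.escape(text, quote=False)

print(render_text_node("<b>bold</b> <!-- note -->"))
```

Under this sketch the tag and the comment reach the output as visible characters rather than as markup.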
B.2 Setext headings
- CommonMark §4.3 (Setext headings): supports underline-style headings (`===` / `---` under text).
- MdFlow §6: not supported. Only ATX (`#` prefix) headings.
- Rationale: Streaming. Setext headings require look-ahead beyond the current line, complicating pending-token management. Community usage of ATX dominates.
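The look-ahead argument can be illustrated with a simplified line classifier. This is a sketch under stated assumptions, not MdFlow's actual parser: the name `classify_line` and the regular expression are illustrative, and optional closing `#` sequences are ignored. The point is that an ATX heading is decided from the current line alone, while an underline such as `===` carries no special meaning.

```python
import re

# Simplified ATX opener: up to 3 spaces, 1-6 '#', then whitespace or end of line.
ATX = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)(.*)$")

def classify_line(line: str):
    """Classify one line without looking at any following line."""
    m = ATX.match(line)
    if m:
        return ("heading", len(m.group(1)), m.group(2).strip())
    # With no setext support, a line of '=' or '-' under text is not a
    # heading underline, so the previous line never has to be held back.
    return ("text", line)

print(classify_line("## Title"))
print(classify_line("==="))
```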
B.3 Reference-style link definitions
- CommonMark §4.7 + §6.3 (Link reference definitions): supports `[label]: url title` definitions and `[text][label]` references.
- MdFlow §7.6: not supported in v1.0. Inline links only.
- Rationale: Streaming. Forward references (text `[x]` before definition of `[x]: url`) are unsound in a streaming model without rewriting. Deferred to v1.1 with a scoped rewriting rule.
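The forward-reference problem can be shown with a toy single-pass resolver. This sketch models the CommonMark-style feature that MdFlow omits; `stream_resolve` and both regular expressions are hypothetical simplifications (no titles, no multi-line definitions). A reference emitted before its definition arrives has nothing to resolve against unless earlier output is rewritten.

```python
import re

DEF = re.compile(r"^\[([^\]]+)\]:\s*(\S+)")        # [label]: url
REF = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")      # [text][label]

def stream_resolve(lines):
    """Resolve references in one forward pass, emitting (text, url-or-None)."""
    seen = {}
    events = []
    for line in lines:
        d = DEF.match(line)
        if d:
            seen[d.group(1).lower()] = d.group(2)
            continue
        for text, label in REF.findall(line):
            # Only definitions already seen are available at emission time.
            events.append((text, seen.get(label.lower())))
    return events

print(stream_resolve(["[docs][x]", "[x]: https://example.com"]))
print(stream_resolve(["[x]: https://example.com", "[docs][x]"]))
```

The first call leaves the link unresolved because the definition arrives after the reference has been emitted; the second resolves it.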
B.4 Numeric HTML entities
- CommonMark §2.5 (Entity and numeric character references): supports `&#xhh;` and `&#nnn;`.
- MdFlow §7.11: only a fixed named set is recognized.
- Rationale: Scope. The entity table is large and its semantics deserve a dedicated working group. The fixed named set covers 95%+ of practical use; extensions MAY register more.
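A minimal sketch of fixed-set entity handling. The contents of `NAMED` are an assumption made for illustration; the actual set is defined in §7.11 and is not reproduced in this appendix. The sketch also assumes that unrecognized and numeric references are left as literal text.

```python
import re

# Hypothetical fixed set; the real list lives in MdFlow §7.11.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "nbsp": "\u00a0"}

# Matches named references only; '&#65;' and '&#x41;' never match.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def decode_entities(text: str) -> str:
    """Replace references in the fixed set; leave everything else untouched."""
    return ENTITY.sub(lambda m: NAMED.get(m.group(1), m.group(0)), text)

print(decode_entities("a &amp; b &#65; &#x41; &bogus;"))
```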
B.5 Emphasis algorithm
- CommonMark §6.2 (Emphasis and strong emphasis): uses a delimiter-stack algorithm with complex flanking and punctuation rules.
- MdFlow §7.4: uses a greedy-matcher algorithm with a simplified underscore-flanking rule.
- Rationale: Streaming & performance. The delimiter stack requires buffering delimiter runs until the paragraph closes; the greedy matcher decides at each position. Divergences manifest only in highly nested corner cases (e.g., `*foo**bar*baz**`). These cases are enumerated in the conformance suite's `commonmark-overlap/` subdirectory.
B.6 Hard breaks in headings
- CommonMark §6.7 (Hard line breaks): permits hard breaks inside headings.
- MdFlow §7.9: hard breaks are suppressed in `Heading` content; the source `\n` within a heading is literal whitespace.
- Rationale: Safety & simplicity. Headings as semantic navigation anchors should not contain hard breaks; a rendered `<br>` inside `<h1>` is a screen-reader annoyance.
B.7 Code block info string
- CommonMark §4.5 (Fenced code blocks): the info string may contain any characters, except that the info string of a backtick-fenced block may not contain backticks.
- MdFlow §6.5: the info string is further trimmed; only the first whitespace-delimited token is stored in `CodeBlock.lang`. The remainder is discarded (not stored as `meta`).
- Rationale: Scope. Meta-string semantics are controversial and deferred; extensions wishing to carry metadata should use custom-element wrappers.
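The trimming rule can be sketched in a few lines. The function name `parse_info_string` is hypothetical, and returning an empty string for an empty info string is an assumption; the appendix does not say what `CodeBlock.lang` holds in that case.

```python
def parse_info_string(info: str) -> str:
    """Return the first whitespace-delimited token of a fence info string.

    Everything after the first token is discarded, mirroring the rule that
    no `meta` value is stored. The empty-string fallback is an assumption.
    """
    tokens = info.split()
    return tokens[0] if tokens else ""

print(parse_info_string('  python title="demo.py" linenos '))
```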
B.8 Table support
- CommonMark: no table support in core (v0.31.2). GFM tables are an extension.
- MdFlow §6.8: pipe tables are core.
- Rationale: Usage. Tables are ubiquitous in LLM output and technical documentation; excluding them forces every implementation to re-invent the feature.
B.9 Strikethrough
- CommonMark: no strikethrough in core. GFM extension.
- MdFlow §7.5: `~~text~~` is core inline syntax.
- Rationale: Same as tables; universal in modern markdown-producing systems.
B.10 Autolink scheme filter
- CommonMark §6.5: autolinks accept any scheme matching `[a-zA-Z][a-zA-Z0-9+.-]{1,31}` (2 to 32 characters).
- MdFlow §7.8: autolinks additionally apply the URL scheme filter of §10.3.1. `<javascript:alert(1)>` renders as an inert anchor.
- Rationale: Safety.
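A sketch of how a scheme filter might sit on top of CommonMark's scheme syntax. The `ALLOWED` set and the function name `autolink_href` are assumptions for illustration; the real filter is defined in §10.3.1, which this appendix does not reproduce. Returning `None` stands in for "render the anchor without a usable target".

```python
import re

# CommonMark scheme syntax: a letter followed by 1-31 letters, digits, '+', '.', '-'.
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]{1,31}):")

# Hypothetical allowlist; the normative filter is MdFlow §10.3.1.
ALLOWED = {"http", "https", "mailto"}

def autolink_href(target: str):
    """Return the target if its scheme passes the filter, otherwise None."""
    m = SCHEME.match(target)
    if m is None:
        return None
    if m.group(1).lower() not in ALLOWED:
        return None  # inert: no navigable href is emitted
    return target

print(autolink_href("https://example.com"))
print(autolink_href("javascript:alert(1)"))
```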
B.11 HTML entities in URLs
- CommonMark §6.3: entity references inside link destinations are decoded before output.
- MdFlow §7.6: no entity decoding in URLs. Raw bytes only (plus the percent-decoding described in §10.3.1.1 for evasion resistance).
- Rationale: Safety & predictability.
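One way to see the safety angle, offered as an illustration rather than as the stated rationale: decoding entities in a destination can change what the URL means after any check made on the raw text. The sketch uses Python's standard `html.unescape` to stand in for CommonMark-style decoding; both function names are hypothetical.

```python
import html

def commonmark_style_destination(raw: str) -> str:
    """Decode entity references before output, as CommonMark does."""
    return html.unescape(raw)

def mdflow_style_destination(raw: str) -> str:
    """Keep the destination exactly as written (no entity decoding)."""
    return raw

raw = "javascript&#58;alert(1)"
print(commonmark_style_destination(raw))  # the scheme separator appears only after decoding
print(mdflow_style_destination(raw))
```

Without decoding, the string that is inspected is the string that is emitted, which keeps filtering predictable.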
B.12 Tab expansion
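- CommonMark §2.2: tabs are not expanded to spaces in the output; where whitespace helps define block structure, they behave as if replaced by spaces with a tab stop of 4 characters.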
- MdFlow §4.4.1: tab expansion applies only at block-start contexts (indent determination). Tabs elsewhere are preserved.
- Rationale: Simplification; tabs inside text are rare and expansion affects only visual layout.
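Indent determination with 4-column tab stops can be sketched as follows. The function name `leading_indent_columns` is hypothetical; the sketch measures only leading whitespace and leaves every tab after the first non-whitespace character untouched.

```python
def leading_indent_columns(line: str, tab_stop: int = 4) -> int:
    """Count the indent width of a line, advancing tabs to the next tab stop."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            # Jump to the next multiple of tab_stop.
            col += tab_stop - (col % tab_stop)
        else:
            break  # tabs beyond this point are preserved, not measured
    return col

print(leading_indent_columns("\tcode"))
print(leading_indent_columns("  \tcode\twith\ttabs"))
```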
B.13 Unicode case folding
- CommonMark §2.2: no general case folding of markdown syntax.
- MdFlow: identical — no Unicode case folding of markdown syntax. Noted here for explicit parity.
B.14 Minor formatting divergences
The following are cosmetic, not semantic:
- MdFlow canonical HTML uses double-quoted attributes; CommonMark's reference also uses double quotes — parity.
- MdFlow does not emit a trailing newline after block elements; the CommonMark dingus does. Canonical form is normalized for vector comparison.
B.15 Not a divergence: custom elements
MdFlow's `md-*` custom elements are a capability not present in CommonMark, not a divergence. See §8.
B.16 Not a divergence: streaming
MdFlow's streaming model is additive over CommonMark's batch model. Class S implementations can always answer "what would a CommonMark parser produce for the full input?" by implementing Class B atop them.