Appendix B — Divergences from CommonMark

This appendix enumerates every deliberate difference between MdFlow v1.0 and CommonMark v0.31.2. MdFlow's policy is to aspire to identical behavior and to diverge only explicitly: wherever this appendix lists no divergence, conforming output MUST match CommonMark. Every divergence has a rationale rooted in one of the MdFlow pillars: streaming, safety, or scope.

Format: each entry cites the CommonMark § it diverges from, the MdFlow behavior, and the rationale.

B.1 Raw HTML passthrough

  • CommonMark §6.6 (Raw HTML): permits <tag>...</tag> and HTML comments to pass through verbatim to output.
  • MdFlow §8.1: not permitted. Such sequences are literal text.
  • Rationale: Safety. Raw HTML is incompatible with the security guarantees of §10. Users seeking HTML features MUST use custom elements (md-*), which have a safety grammar.
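
As an illustration of what "literal text" can mean for an HTML renderer, the following is a minimal sketch, assuming the output target is HTML and that text nodes are escaped on emission. The function name render_text is hypothetical and is not defined by MdFlow.

```python
import html


def render_text(source: str) -> str:
    """Escape a text node so tag-like sequences are displayed, not interpreted."""
    # html.escape replaces &, < and >; quote=True also escapes both quote characters.
    return html.escape(source, quote=True)


print(render_text("<script>alert(1)</script>"))
# &lt;script&gt;alert(1)&lt;/script&gt;
```

Under this assumption, tags and HTML comments reach the reader as visible characters rather than as markup.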

B.2 Setext headings

  • CommonMark §4.3 (Setext headings): supports underline-style headings (=== / --- under text).
  • MdFlow §6: not supported. Only ATX (# prefix) headings.
  • Rationale: Streaming. Setext headings require look-ahead beyond the current line, complicating pending-token management. Community usage of ATX dominates.

B.3 Link reference definitions

  • CommonMark §4.7 + §6.3 (Link reference definitions): supports [label]: url title definitions and [text][label] references.
  • MdFlow §7.6: not supported in v1.0. Inline links only.
  • Rationale: Streaming. Forward references (text [x] before definition of [x]: url) are unsound in a streaming model without rewriting. Deferred to v1.1 with a scoped rewriting rule.

B.4 Numeric HTML entities

  • CommonMark §2.5 (Entity and numeric character references): supports &#xhh; and &#nnn;.
  • MdFlow §7.11: only a fixed named set is recognized.
  • Rationale: Scope. The entity table is large and its semantics deserve a dedicated working group. The fixed named set covers 95%+ of practical use; extensions MAY register more.
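
The following is a minimal sketch of decoding against a fixed named set. The four-entry table is a hypothetical stand-in for the normative set in §7.11, and leaving unrecognized references (including numeric ones) untouched is an assumption of this sketch.

```python
import re

# Hypothetical stand-in for the fixed named set; the real list is defined in §7.11.
NAMED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# Matches only named references such as &amp;. Numeric forms start with '#'
# after the ampersand and therefore never match.
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_entities(text: str) -> str:
    """Replace recognized named references; leave everything else as written."""
    def repl(m: re.Match) -> str:
        return NAMED_ENTITIES.get(m.group(1), m.group(0))
    return _ENTITY.sub(repl, text)


print(decode_entities("a &amp; b &#65; &#x41; &bogus;"))
# a & b &#65; &#x41; &bogus;
```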

B.5 Emphasis algorithm

  • CommonMark §6.2 (Emphasis and strong emphasis): uses a delimiter-stack algorithm with complex flanking and punctuation rules.
  • MdFlow §7.4: uses a greedy-matcher algorithm with a simplified underscore-flanking rule.
  • Rationale: Streaming & performance. The delimiter stack requires buffering delimiter runs until the paragraph closes; the greedy matcher decides at each position. Divergences manifest only in highly-nested corner cases (e.g., *foo**bar*baz**). These cases are enumerated in the conformance suite's commonmark-overlap/ subdirectory.

B.6 Hard breaks in headings

  • CommonMark §6.7 (Hard line breaks): permits hard breaks inside headings.
  • MdFlow §7.9: hard breaks are suppressed in Heading content; the source \n within a heading is literal whitespace.
  • Rationale: Safety & simplicity. Headings as semantic navigation anchors should not contain hard breaks; rendered <br> inside <h1> is a screen-reader annoyance.

B.7 Code block info string

  • CommonMark §4.5 (Fenced code blocks): info string may contain any characters except the fence character.
  • MdFlow §6.5: info string is further trimmed; only the first whitespace-delimited token is stored in CodeBlock.lang. The remainder is discarded (not stored as meta).
  • Rationale: Scope. Meta-string semantics are controversial and deferred; extensions wishing to carry metadata should use custom-element wrappers.
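
The trimming rule can be sketched as follows. The function name code_block_lang is hypothetical, and returning an empty string for an empty info string is an assumption of this sketch.

```python
def code_block_lang(info_string: str) -> str:
    """Return the first whitespace-delimited token of a fence info string, or ''."""
    # str.split() with no argument drops leading and trailing whitespace
    # and splits on any run of whitespace.
    tokens = info_string.split()
    return tokens[0] if tokens else ""


print(code_block_lang('  python title="demo.py" {1,3}  '))
# python
```

Everything after the first token, here title="demo.py" {1,3}, is dropped rather than kept as metadata.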

B.8 Table support

  • CommonMark: no table support in core (v0.31.2). GFM tables are an extension.
  • MdFlow §6.8: pipe tables are core.
  • Rationale: Usage. Tables are ubiquitous in LLM output and technical documentation; excluding them forces every implementation to re-invent the feature.

B.9 Strikethrough

  • CommonMark: no strikethrough in core. GFM extension.
  • MdFlow §7.5: ~~text~~ is core inline syntax.
  • Rationale: Same as tables; universal in modern markdown-producing systems.

B.10 Autolinks

  • CommonMark §6.5: autolinks accept any scheme of 2 to 32 characters matching [a-zA-Z][a-zA-Z0-9+.-]*.
  • MdFlow §7.8: autolinks additionally apply the URL scheme filter of §10.3.1. <javascript:alert(1)> renders as an inert anchor.
  • Rationale: Safety.
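
The two-stage check can be sketched as follows: first the CommonMark autolink grammar (a scheme of 2 to 32 characters), then a scheme filter. The allowlist below is hypothetical; the normative filter is the one in §10.3.1. The name classify_autolink is also hypothetical.

```python
import re

# CommonMark-style absolute URI in angle brackets: a scheme, a colon, then any
# characters other than ASCII control characters, space, '<' and '>'.
_AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]{1,31}):([^\x00-\x20<>\x7f]*)>")

# Hypothetical allowlist standing in for the scheme filter of §10.3.1.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def classify_autolink(source: str):
    """Return (uri, is_live) for an autolink, or None if source is not one."""
    m = _AUTOLINK.fullmatch(source)
    if m is None:
        return None
    scheme, rest = m.group(1), m.group(2)
    # Schemes are compared case-insensitively against the allowlist.
    is_live = scheme.lower() in ALLOWED_SCHEMES
    return f"{scheme}:{rest}", is_live


print(classify_autolink("<https://example.com>"))
print(classify_autolink("<javascript:alert(1)>"))
```

A renderer built on this sketch would emit a normal link when is_live is true and an inert anchor otherwise.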

B.11 HTML entities in URLs

  • CommonMark §6.3: entity references inside link destinations are decoded before output.
  • MdFlow §7.6: no entity decoding in URLs. Raw bytes only (plus the percent-decoding described in §10.3.1.1 for evasion resistance).
  • Rationale: Safety & predictability.

B.12 Tab expansion

  • CommonMark §2.2: tabs are expanded to 4-column stops in all contexts.
  • MdFlow §4.4.1: tab expansion applies only at block-start contexts (indent determination). Tabs elsewhere are preserved.
  • Rationale: Simplification; tabs inside text are rare and expansion affects only visual layout.
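
Indent determination with 4-column tab stops can be sketched as follows. The function name leading_indent_columns is hypothetical. Only leading whitespace is examined, so tabs after the first non-whitespace character are never expanded.

```python
TAB_STOP = 4


def leading_indent_columns(line: str) -> int:
    """Count indentation columns, advancing each leading tab to the next tab stop."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            # Jump to the next multiple of TAB_STOP.
            col += TAB_STOP - (col % TAB_STOP)
        else:
            break  # first non-whitespace character ends the indent
    return col


print(leading_indent_columns("\tfoo"))    # 4
print(leading_indent_columns("  \tfoo"))  # 4
print(leading_indent_columns("a\tb"))     # 0
```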

B.13 Unicode case folding

  • CommonMark §2.2: no general case folding of markdown syntax.
  • MdFlow: identical — no Unicode case folding of markdown syntax. Noted here for explicit parity.

B.14 Minor formatting divergences

The following are cosmetic, not semantic:

  • MdFlow canonical HTML uses double-quoted attributes; CommonMark's reference also uses double quotes — parity.
  • MdFlow does not emit a trailing newline after block elements; the CommonMark dingus does. Canonical form is normalized for vector comparison.

B.15 Not a divergence: custom elements

MdFlow's md-* custom elements are a capability not present in CommonMark, not a divergence. See §8.

B.16 Not a divergence: streaming

MdFlow's streaming model is additive over CommonMark's batch model. Class S implementations can always answer "what would a CommonMark parser produce for the full input?" by implementing Class B atop them.