3. Document Model
3.1 Overview
An MdFlow document is modeled as an ordered tree of nodes. A conforming implementation MUST produce a tree semantically equivalent to the one defined in this chapter, though it MAY use any internal representation.
The pipeline from source to output is:

```
source (UTF-8 bytes)
  → chunks (§5)
  → tokens (§3.3)
  → AST nodes (§3.4)
  → DOM nodes (§3.5)
```

All four stages are specified so that intermediate representations can be inspected for testing and for plugin integration.
3.2 Definitions
- A node is a tree vertex with a kind, an optional content payload, and an ordered sequence of child nodes.
- The root is the topmost node. Its kind is `Document`. It has no parent.
- A block is any node whose kind is listed in §6.1.
- An inline is any node whose kind is listed in §7.1.
- A leaf is a node with no children. Every `Text` node is a leaf.
- A container is a non-leaf node. `Document` and every block that accepts children are containers.
3.3 Token layer
During parsing, source is first reduced to a linear sequence of tokens. A token has:
- a kind (one of the block or inline kinds),
- a slice: the half-open byte range `[start, end)` in the source,
- a state: one of `pending` or `complete` (§5),
- zero or more sub-tokens (used for streaming; e.g., a table row contains a sub-token for each cell).
The token layer is the unit of incremental processing. Re-parsing a chunk MUST produce a token sequence that is a suffix-addition of the previous sequence: no previously complete token may change its kind, its slice start, or its existing sub-token structure (see §5.4).
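The re-parse rule can be checked mechanically. The TypeScript sketch below models a token with the four fields listed above and tests whether a new token sequence respects the rule. The field names (`start`, `end`, `subTokens`), the position-by-position comparison, and the reading that new sub-tokens may only be appended after existing ones are assumptions made for illustration; §5.4 remains authoritative.

```typescript
// Sketch of a §3.3 token. Field names are illustrative assumptions;
// the spec fixes the information carried, not the representation.
type TokenState = "pending" | "complete";

interface Token {
  kind: string;       // a block or inline kind
  start: number;      // slice start (inclusive byte offset)
  end: number;        // slice end (exclusive byte offset)
  state: TokenState;
  subTokens: Token[]; // e.g. the cells of a table row
}

// Existing sub-tokens must keep their kind, slice start, and own structure.
// ASSUMPTION: new sub-tokens may be appended after the existing ones.
function keepsSubTokens(prev: Token[], next: Token[]): boolean {
  if (next.length < prev.length) return false;
  return prev.every(
    (old, i) =>
      next[i].kind === old.kind &&
      next[i].start === old.start &&
      keepsSubTokens(old.subTokens, next[i].subTokens),
  );
}

// Every token that was complete must still sit at the same position with
// the same kind, slice start, and sub-token structure. Pending tokens may change.
function isStableReparse(prev: Token[], next: Token[]): boolean {
  return prev.every((old, i) => {
    if (old.state !== "complete") return true;
    const cur = next[i];
    return (
      cur !== undefined &&
      cur.kind === old.kind &&
      cur.start === old.start &&
      keepsSubTokens(old.subTokens, cur.subTokens)
    );
  });
}
```

A harness could run such a check after every chunk to catch parsers that rewrite tokens they have already reported as complete.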
3.4 AST layer
Tokens are assembled into an Abstract Syntax Tree (AST) whose nodes are defined by this specification.
3.4.1 Document root

```
Document {
  children: Block*
}
```

3.4.2 Block node types
See §6.1 for the authoritative catalogue. Summary:
- `Paragraph { children: Inline* }`
- `Heading { level: 1–6, children: Inline* }`
- `ThematicBreak {}` (leaf)
- `CodeBlock { lang: string?, content: string }` (leaf)
- `BlockQuote { children: Block* }`
- `List { ordered: boolean, start: integer?, tight: boolean, children: ListItem* }`
- `ListItem { children: Block* }`
- `Table { align: Align[], header: TableRow, rows: TableRow* }`
- `TableRow { children: TableCell* }`
- `TableCell { children: Inline* }`
- `CustomBlock { tag: string, attrs: Attr[], children: Block* }` (see §8)
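As an illustration of how the block catalogue might be typed, here is a TypeScript sketch using a `kind` discriminant. The discriminant, the field spellings, the `Align` values, and the opaque `InlineNode` placeholder are assumptions, not part of the spec. The hypothetical `childrenOf` helper shows one way to traverse `Table`, whose children are carried in `header` and `rows` rather than in a `children` field.

```typescript
// Inline nodes are covered in §3.4.3; kept opaque in this sketch.
interface InlineNode { kind: string }

// §3.4.4 Attr, renamed MdAttr to avoid clashing with the DOM's global Attr type.
interface MdAttr { name: string; value: string }

// ASSUMPTION: Align is not defined in this chapter; these values are illustrative.
type Align = "left" | "center" | "right" | null;

interface Paragraph { kind: "Paragraph"; children: InlineNode[] }
interface Heading { kind: "Heading"; level: 1 | 2 | 3 | 4 | 5 | 6; children: InlineNode[] }
interface ThematicBreak { kind: "ThematicBreak" }
interface CodeBlock { kind: "CodeBlock"; lang?: string; content: string }
interface BlockQuote { kind: "BlockQuote"; children: Block[] }
interface List { kind: "List"; ordered: boolean; start?: number; tight: boolean; children: ListItem[] }
interface ListItem { kind: "ListItem"; children: Block[] }
interface Table { kind: "Table"; align: Align[]; header: TableRow; rows: TableRow[] }
interface TableRow { kind: "TableRow"; children: TableCell[] }
interface TableCell { kind: "TableCell"; children: InlineNode[] }
interface CustomBlock { kind: "CustomBlock"; tag: string; attrs: MdAttr[]; children: Block[] }

type Block =
  | Paragraph | Heading | ThematicBreak | CodeBlock | BlockQuote
  | List | ListItem | Table | TableRow | TableCell | CustomBlock;

// Uniform child access for tree walkers.
function childrenOf(b: Block): (Block | InlineNode)[] {
  switch (b.kind) {
    case "ThematicBreak":
    case "CodeBlock":
      return []; // leaves: CodeBlock holds raw text, not child nodes
    case "Table":
      return [b.header, ...b.rows]; // header row first, then body rows
    default:
      return b.children;
  }
}
```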
3.4.3 Inline node types
- `Text { content: string }` (leaf)
- `Emphasis { children: Inline* }`
- `Strong { children: Inline* }`
- `Strikethrough { children: Inline* }`
- `Code { content: string }` (leaf)
- `Link { url: URL, title: string?, children: Inline* }`
- `Image { url: URL, alt: string, title: string? }` (leaf)
- `LineBreak { hard: boolean }` (leaf)
- `CustomInline { tag: string, attrs: Attr[], children: Inline* }`
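The inline kinds can be typed the same way. The sketch below assumes a `kind` discriminant and represents `URL` as a plain string. The `plainText` helper is a hypothetical utility (for example, for building a heading outline) that flattens inline content to text; its treatment of images and line breaks is an arbitrary choice, not normative behavior.

```typescript
// §3.4.4 Attr, renamed MdAttr to avoid clashing with the DOM's global Attr type.
interface MdAttr { name: string; value: string }

// ASSUMPTION: URL is modeled as a string here.
type Inline =
  | { kind: "Text"; content: string }
  | { kind: "Emphasis"; children: Inline[] }
  | { kind: "Strong"; children: Inline[] }
  | { kind: "Strikethrough"; children: Inline[] }
  | { kind: "Code"; content: string }
  | { kind: "Link"; url: string; title?: string; children: Inline[] }
  | { kind: "Image"; url: string; alt: string; title?: string }
  | { kind: "LineBreak"; hard: boolean }
  | { kind: "CustomInline"; tag: string; attrs: MdAttr[]; children: Inline[] };

// Hypothetical helper: concatenates the text of an inline sequence.
// Images contribute their alt text; line breaks become a single space.
function plainText(nodes: Inline[]): string {
  return nodes
    .map((n) => {
      switch (n.kind) {
        case "Text":
        case "Code":
          return n.content;
        case "Image":
          return n.alt;
        case "LineBreak":
          return " ";
        default:
          return plainText(n.children); // all remaining kinds are containers
      }
    })
    .join("");
}
```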
3.4.4 Attributes

```
Attr { name: string, value: string }
```

Attribute values are always strings; coercion to boolean, number, or enum types is consumer-specific and out of scope for the AST layer.
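Because coercion is left to consumers, each consumer needs its own conventions. The following TypeScript helpers are one hypothetical set: the function names, the accepted boolean spellings, and the integer syntax are assumptions, and the type is named `MdAttr` only to avoid clashing with the DOM's global `Attr`.

```typescript
interface MdAttr { name: string; value: string }

// Hypothetical consumer helper: first attribute with the given name, if any.
function getAttr(attrs: MdAttr[], name: string): string | undefined {
  return attrs.find((a) => a.name === name)?.value;
}

// ASSUMPTION: only "true" and "false" are accepted; anything else is undefined.
function getBoolAttr(attrs: MdAttr[], name: string): boolean | undefined {
  const v = getAttr(attrs, name);
  if (v === "true") return true;
  if (v === "false") return false;
  return undefined;
}

// ASSUMPTION: decimal integers with an optional leading minus sign.
function getIntAttr(attrs: MdAttr[], name: string): number | undefined {
  const v = getAttr(attrs, name);
  if (v === undefined || !/^-?\d+$/.test(v)) return undefined;
  return Number(v);
}

// Returns the value only if it is one of the allowed enum members.
function getEnumAttr<T extends string>(attrs: MdAttr[], name: string, allowed: readonly T[]): T | undefined {
  const v = getAttr(attrs, name);
  return allowed.find((a) => a === v);
}
```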
3.4.5 Invariants
The following invariants MUST hold in every AST produced by a conforming parser:
- INV-1. Blocks contain only blocks, inlines, or (for `CodeBlock`) raw text. A block whose children are declared as `Inline*` (such as `Paragraph`, `Heading`, or `TableCell`) MUST NOT contain a block.
- INV-2. The `Heading.level` field is in the range 1–6.
- INV-3. Every `Link.url` and `Image.url` has a scheme admitted by §10.3.1 or is the empty URL `""` (which MUST render as an inert anchor).
- INV-4. `CustomBlock` and `CustomInline` tags MUST be in the registered whitelist for their context (§8.2).
- INV-5. Trees are well-formed: every non-root node has exactly one parent, and there are no cycles.
- INV-6. Inside `CodeBlock` and `Code`, the `content` field is treated as opaque text: no inline parsing, and no escape interpretation beyond the escaping performed at the lexical layer.
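Some of these invariants are easy to verify in a test harness. This TypeScript sketch checks INV-2 and INV-5 on a loosely typed tree, detecting shared nodes and cycles by object identity. The `AnyNode` shape and the error messages are assumptions; the remaining invariants depend on sections (§8.2, §10.3.1) not reproduced here.

```typescript
// Loose node shape for checking (an assumption; real AST types are stricter).
interface AnyNode {
  kind: string;
  level?: number;
  children?: AnyNode[];
}

// Returns violation messages for INV-2 and INV-5; an empty array means none found.
function checkInvariants(root: AnyNode): string[] {
  const errors: string[] = [];
  const seen = new Set<AnyNode>();

  const visit = (node: AnyNode, path: string): void => {
    // INV-5: reaching the same object twice means it is shared by two
    // parents or is part of a cycle.
    if (seen.has(node)) {
      errors.push(`INV-5: ${node.kind} at ${path} is reachable more than once`);
      return; // do not descend again, so cycles terminate
    }
    seen.add(node);

    // INV-2: heading levels are integers in 1..6.
    if (node.kind === "Heading") {
      const lvl = node.level;
      if (lvl === undefined || !Number.isInteger(lvl) || lvl < 1 || lvl > 6) {
        errors.push(`INV-2: Heading at ${path} has level ${lvl}`);
      }
    }

    (node.children ?? []).forEach((child, i) => visit(child, `${path}/${i}`));
  };

  visit(root, "root");
  return errors;
}
```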
3.5 DOM layer
When rendering to a DOM, each AST node maps to a DOM subtree by the rules in the block and inline chapters. The renderer MUST NOT introduce any DOM node, attribute, or text content beyond what the AST specifies, with two exceptions:
- The renderer MAY add a `data-mdflow-plugin="<name>"` attribute to the root element of each plugin-rendered subtree, for debugging and for ownership tracking in incremental updates.
- The renderer MAY add a `data-mdflow-token-id="<id>"` attribute to elements corresponding to a token, for diff tracking.
Both data attributes are informational and are not part of the normative output. Test vectors that compare DOM output MUST ignore all attributes in the `data-mdflow-*` namespace.
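A vector runner might apply this rule by stripping the namespace before comparing trees. The sketch below uses a simplified element model (`El`, a hypothetical stand-in for real DOM nodes) and compares attributes without regard to order; it is an illustration, not a prescribed comparison algorithm.

```typescript
// Simplified DOM-like element (an assumption; a real runner would walk DOM nodes).
interface El {
  tag: string;
  attrs: Record<string, string>;
  children: (El | string)[]; // strings are text nodes
}

// Returns a copy of the tree with every data-mdflow-* attribute removed.
function stripMdflowData(node: El | string): El | string {
  if (typeof node === "string") return node;
  const attrs: Record<string, string> = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (!name.toLowerCase().startsWith("data-mdflow-")) attrs[name] = value;
  }
  return { tag: node.tag, attrs, children: node.children.map(stripMdflowData) };
}

// Structural equality; attribute order does not matter.
function sameTree(a: El | string, b: El | string): boolean {
  if (typeof a === "string" || typeof b === "string") return a === b;
  const ka = Object.keys(a.attrs).sort();
  const kb = Object.keys(b.attrs).sort();
  return (
    a.tag === b.tag &&
    ka.length === kb.length &&
    ka.every((k, i) => k === kb[i] && a.attrs[k] === b.attrs[k]) &&
    a.children.length === b.children.length &&
    a.children.every((c, i) => sameTree(c, b.children[i]))
  );
}

function equivalentForVectors(a: El, b: El): boolean {
  return sameTree(stripMdflowData(a), stripMdflowData(b));
}
```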
3.6 Serialization to HTML
A renderer targeting HTML string output MUST produce markup semantically equivalent to the DOM output. Insignificant whitespace SHOULD be kept to a minimum. Attribute values MUST be enclosed in double quotes, and attribute value escaping MUST follow §10.3.2.
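A minimal sketch of double-quoted attribute serialization follows. The escaping set (`&`, `"`, `<`, `>`) is a conservative placeholder assumed here because §10.3.2 is not reproduced in this chapter; a conforming renderer must use the rules defined there. The function names are hypothetical.

```typescript
// ASSUMPTION: placeholder escaping; §10.3.2 defines the authoritative rules.
// "&" is replaced first so that the entities introduced afterwards are not re-escaped.
function escapeAttrValue(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Each attribute is emitted as ` name="value"`, with no other whitespace.
function serializeAttrs(attrs: [string, string][]): string {
  return attrs.map(([name, value]) => ` ${name}="${escapeAttrValue(value)}"`).join("");
}

function openTag(tag: string, attrs: [string, string][]): string {
  return `<${tag}${serializeAttrs(attrs)}>`;
}
```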
3.7 Canonical form
For comparison purposes (in vectors and in diffs), the canonical HTML form of an AST is defined by the block-by-block and inline-by-inline mapping in this specification. Two renderers that produce the same canonical form are considered to produce equivalent output.
The canonical form does not include `data-mdflow-*` attributes.