14. Grammar Appendix

This chapter provides a partial formal grammar for constructs whose prose definition benefits from a machine-checkable form. MdFlow does not ship a complete formal grammar; where a PEG fragment is given, the fragment is normative and the prose is explanatory.

The full collection of PEG fragments lives in src/grammar/index.peg as a single file.

14.1 Notation

PEG rules follow the conventions of Bryan Ford's original PEG paper (2004), using:

  • A / B — ordered choice.
  • A B — sequence.
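  • A*, A+, A? — zero-or-more, one-or-more, and optional.
  • &A, !A — positive / negative lookahead.
  • [...] — character class.
  • "..." or '...' — literal string.
  • . — any single character.

The fragments also use three extensions beyond Ford's notation: A{m,n} for bounded repetition, name:A to label a match, and { ... } / &{ ... } for semantic actions and semantic predicates.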

14.2 Lexical fragments

space       = ' ' / '\t'
newline     = '\r\n' / '\n' / '\r'
eol         = newline / !.
blank_line  = space* newline
indent      = space{0,3}

14.3 URL filter

Normative for §10.3.1.

url             = url_with_scheme / relative_url
url_with_scheme = scheme ':' path
scheme          = [a-zA-Z] [a-zA-Z0-9+.-]*
path            = (![\s<>] .)*
relative_url    = path

Host post-processing (strip NULs, fold scheme case, single-layer percent-decode of the first :) is prose-normative per §10.3.1.1 and cannot be expressed as a pure PEG rule. Implementations MUST apply this post-processing to the parsed url_with_scheme.scheme before matching it against the whitelist.
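
The post-processing steps can be sketched in Python as follows. This is a minimal illustration, not the reference implementation. The ALLOWED_SCHEMES set, both function names, and the reading of "single-layer percent-decode of the first :" (decode a leading %3A exactly once, never recursively) are assumptions. So is the policy that scheme-less URLs bypass the whitelist. The sketch works on the raw URL string so that an encoded colon can still delimit a scheme.

```python
import re

# Hypothetical whitelist for illustration; the real list is defined by §10.3.1.
ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})

# Mirrors the `scheme` rule followed by the ':' of `url_with_scheme`.
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")
# A literal colon or its percent-encoded form, whichever occurs first.
_FIRST_COLON = re.compile(r":|%3[Aa]")


def folded_scheme(url: str):
    """Return the lower-cased scheme of `url`, or None if it has no scheme."""
    url = url.replace("\x00", "")                 # strip NULs
    m = _FIRST_COLON.search(url)
    if m and m.group() != ":":                    # decode one layer only
        url = url[:m.start()] + ":" + url[m.end():]
    sm = _SCHEME.match(url)
    return sm.group(1).lower() if sm else None    # fold scheme case


def passes_whitelist(url: str) -> bool:
    """Assumed policy: scheme-less (relative) URLs are not whitelist-checked."""
    scheme = folded_scheme(url)
    return scheme is None or scheme in ALLOWED_SCHEMES
```

Because only one layer is decoded, a doubly encoded colon such as %253A is left alone and the input is treated as a relative URL.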

14.4 Emphasis

Normative for §7.4. This rule captures MdFlow's greedy-matcher semantics; it intentionally diverges from CommonMark's delimiter-stack algorithm.

emphasis      = strong / emph
strong        = "**" !space strong_body "**"
              / "__" !space &underscore_open strong_body "__" &underscore_close
emph          = "*" !space emph_body "*"
              / "_" !space &underscore_open emph_body "_" &underscore_close
strong_body   = (!("**") inline)+
emph_body     = (!("*"/"**") inline)+

underscore_open  = ![A-Za-z0-9] .        // previous char not alphanumeric
underscore_close = ![A-Za-z0-9]          // next char not alphanumeric

The underscore_open and underscore_close lookarounds use host-language context (the char before the opening _ and after the closing _); implementations MAY encode this as a pre-check.
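
One way to encode that pre-check is sketched below in Python. The function name and its index parameters are hypothetical. The sketch assumes the caller already knows where the opening underscore run starts and where the closing run ends, and that the start and end of the text count as non-alphanumeric.

```python
def _is_ascii_alnum(ch: str) -> bool:
    # Matches the [A-Za-z0-9] class; "" (a text boundary) is not alphanumeric.
    return ch.isascii() and ch.isalnum()


def underscore_context_ok(text: str, open_start: int, close_end: int) -> bool:
    """Check the characters around an underscore-delimited span.

    open_start: index of the first '_' of the opening delimiter.
    close_end:  index just past the last '_' of the closing delimiter.
    """
    prev_ch = text[open_start - 1] if open_start > 0 else ""
    next_ch = text[close_end] if close_end < len(text) else ""
    return not _is_ascii_alnum(prev_ch) and not _is_ascii_alnum(next_ch)
```

With this check, the underscores in snake_case_name do not open emphasis, while those in "a _word_ b" do.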

14.5 Code span

Normative for §7.3.

code_span   = run:backticks space_trim content:(!same_run .)* same_run
            { return Code(strip_edges(content)) }
backticks   = "`"+
same_run    = &{ same_length(run) } "`"+
space_trim  = // single leading and trailing space stripped if both present
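
The matching of equal-length backtick runs and the edge stripping can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch makes one assumption the fragment does not spell out: backtick runs are treated as maximal, so a longer run inside the content never closes a shorter opener.

```python
import re

_BACKTICKS = re.compile(r"`+")


def strip_edges(content: str) -> str:
    # Strip one leading and one trailing space, but only if both are present.
    if len(content) >= 2 and content[0] == " " and content[-1] == " ":
        return content[1:-1]
    return content


def parse_code_span(text: str, pos: int = 0):
    """Return (content, end_index) for a code span starting at pos, else None."""
    opener = _BACKTICKS.match(text, pos)
    if opener is None:
        return None
    run_len = len(opener.group())
    # Scan later backtick runs for the first one with the same length.
    for closer in _BACKTICKS.finditer(text, opener.end()):
        if len(closer.group()) == run_len:
            content = text[opener.end():closer.start()]
            return strip_edges(content), closer.end()
    return None  # no closing run of the same length
```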

14.6 Link

Normative for §7.6.

link          = "[" label:inline_body "]" "(" dest:link_dest title:link_title? ")"
link_dest     = "<" bracketed_dest ">" / bare_dest
bracketed_dest = (!("<"/">") .)*
bare_dest     = bare_dest_char+
bare_dest_char = "\\" [!-~]                    // escape; tried first so the backslash is not consumed as an ordinary char
              / "(" bare_dest_char* ")"       // balanced parens
              / !space !["()"] .
link_title    = space+ quoted
quoted        = '"' (!'"' .)* '"'
              / "'" (!"'" .)* "'"
              / "(" (!")" .)* ")"

14.7 Custom-element attribute

Normative for §8.4.

attributes    = (space+ attribute)*
attribute     = name:attr_name (space* "=" space* value:attr_value)?
attr_name     = [a-z_:] [a-z0-9_:.-]*
attr_value    = dquoted / squoted / unquoted
dquoted       = '"' (![\"] .)* '"'
squoted       = "'" (![\'] .)* "'"
unquoted      = (![ \t\n\r\"\'=<>`] .)+

Attribute names that begin with on (that is, names matching ^on) MUST be rejected after parsing. This is a post-filter: expressing it in the grammar would require a negative lookahead in front of attr_name.
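
A minimal Python sketch of this post-filter follows. The function name is hypothetical, and the sketch only answers whether a name is acceptable. What an implementation does with a rejected name (drop the attribute or reject the whole tag) is not decided here.

```python
import re

# Mirrors the attr_name rule: [a-z_:] [a-z0-9_:.-]*
_ATTR_NAME = re.compile(r"[a-z_:][a-z0-9_:.\-]*")


def is_acceptable_attr_name(name: str) -> bool:
    """True if `name` matches attr_name and does not begin with "on"."""
    if _ATTR_NAME.fullmatch(name) is None:
        return False
    return not name.startswith("on")
```

The filter is a prefix test, so it also rejects names such as one or online, not only event handlers like onclick.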

14.8 Custom element start tag

Normative for §8.3.

tag_name    = [a-z] [a-z0-9]* "-" [a-z0-9-]*
void_tag    = "<" tag:tag_name attrs:attributes space* "/>"
open_tag    = "<" tag:tag_name attrs:attributes space* ">"
close_tag   = "</" tag:tag_name space* ">"

The parsed tag name MUST match the whitelist; this check is applied after parsing.

14.9 Streaming boundary

Normative for §5 Streaming Model.

The streaming boundary is not a syntactic construct but a state transition. A token's PEG rule is classified as streamable if evaluating it on a truncated input can produce a well-formed partial result with a pending marker. The following table summarizes which block rules are streamable:

Block kind      Streamable?   Pending while…
Paragraph       yes           no blank line seen
Heading         no            always complete at LF
ThematicBreak   no            always complete at LF
CodeBlock       yes           closing fence not seen
BlockQuote      yes           children still pending
List            yes           next line could extend
Table           yes           next line could be row
CustomBlock     yes           closing tag not seen

An inline's streamable-ness is inherited from its containing block.
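
As a sketch, the classification can be held in a lookup table. The Python below is illustrative only: the dictionary and function names are hypothetical, and None stands for "not streamable".

```python
# Block kind -> condition under which the block is still pending,
# or None when the block is always complete at end of line.
PENDING_WHILE = {
    "Paragraph": "no blank line seen",
    "Heading": None,
    "ThematicBreak": None,
    "CodeBlock": "closing fence not seen",
    "BlockQuote": "children still pending",
    "List": "next line could extend",
    "Table": "next line could be row",
    "CustomBlock": "closing tag not seen",
}


def is_streamable(block_kind: str) -> bool:
    """True if a truncated block of this kind can yield a partial result."""
    return PENDING_WHILE[block_kind] is not None


def inline_is_streamable(containing_block_kind: str) -> bool:
    # An inline inherits streamable-ness from its containing block.
    return is_streamable(containing_block_kind)
```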

14.10 Full grammar file

The above fragments are collected in src/grammar/index.peg, which is the normative artifact. The prose in this chapter is explanatory.

14.11 Grammar maintenance

Grammar fragments MUST stay in sync with the prose and with the implementation. A disagreement between prose and grammar is an editorial error, and one of the two MUST be corrected in a subsequent draft. Per §2.4, when a vector disagrees with both, the vector is the authority and both MUST be updated to match it.