8. HTML & Custom Elements

8.1 Scope of HTML in MdFlow

MdFlow does not permit raw HTML passthrough. Any sequence that resembles an HTML tag outside the custom-element grammar of this chapter MUST be treated as literal text subject to the inline rules of §7.

In particular, the sequences <script>, <style>, <iframe>, <embed>, <object>, <form>, and <input> are NEVER recognized as elements. They render as escaped text (&lt;script&gt; etc.) in HTML output, or as Text nodes in the AST.

This is an intentional divergence from CommonMark, which permits raw HTML passthrough; see Appendix B.
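
Non-normative sketch (Python) of the literal-text fallback described above. The helper name render_unrecognized_tag is hypothetical, and the escaping shown is only the minimum needed for HTML text content; the normative escaping rules are those of §10.3.2.

```python
import html


def render_unrecognized_tag(source: str) -> str:
    """Emit a tag-like sequence as escaped literal text (hypothetical helper)."""
    # quote=False escapes only &, < and >, which is sufficient for text content.
    return html.escape(source, quote=False)


print(render_unrecognized_tag("<script>alert(1)</script>"))
```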

8.2 Tag whitelist

A custom element has a tag matching the pattern:

[a-z][a-z0-9]* '-' [a-z0-9-]*

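i.e., a leading ASCII lowercase letter, followed by zero or more ASCII lowercase letters or digits, then a hyphen, then zero or more ASCII lowercase letters, digits, or hyphens. This is an ASCII-only subset of the tag-name syntax of the HTML Custom Elements specification, which additionally admits ., _, and certain non-ASCII characters.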
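
Non-normative sketch (Python): the pattern transcribed as a regular expression. The names TAG_NAME and is_custom_tag_name are illustrative only. Note that the pattern as written accepts a tag ending in a hyphen, such as md-.

```python
import re

# Direct transcription of the tag pattern above (illustrative name).
TAG_NAME = re.compile(r"[a-z][a-z0-9]*-[a-z0-9-]*")


def is_custom_tag_name(tag: str) -> bool:
    """Return True if the whole string matches the custom-element tag pattern."""
    return TAG_NAME.fullmatch(tag) is not None
```

A matching name is only a candidate; recognition additionally requires the whitelist lookup described next.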

Additionally, a custom element is recognized by MdFlow only if its tag is registered in the implementation's tag whitelist. Implementations MUST maintain:

  • A block whitelist: tags admissible at block level.
  • An inline whitelist: tags admissible inside inline content.
  • An event whitelist: tags that produce event records instead of DOM (§12.1).

A tag MAY appear in multiple whitelists. Tags not in any whitelist are literal text.
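
Non-normative sketch (Python) of the three whitelists and the lookup they imply. The class name, method names, and the tags md-note, md-badge, and md-ping are hypothetical; the core specification defines no specific tags.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TagWhitelists:
    """The three whitelists an implementation maintains (illustrative shape)."""
    block: Set[str] = field(default_factory=set)
    inline: Set[str] = field(default_factory=set)
    event: Set[str] = field(default_factory=set)

    def contexts(self, tag: str) -> Set[str]:
        """Return every whitelist the tag appears in; a tag may be in several."""
        found = set()
        if tag in self.block:
            found.add("block")
        if tag in self.inline:
            found.add("inline")
        if tag in self.event:
            found.add("event")
        return found

    def is_literal_text(self, tag: str) -> bool:
        """Tags in no whitelist are treated as literal text."""
        return not self.contexts(tag)
```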

The core specification defines no specific custom-element tags; all specific tags are defined in extension specifications (e.g., spec/extensions/md-clock.md). This chapter specifies the mechanism only.

8.3 Element forms

Three forms are recognized:

8.3.1 Void (self-closing)

<tag attr1="value" attr2="value"/>

The trailing / MUST be present in source. Produces either a leaf CustomBlock / CustomInline (no children) or, for event-whitelisted tags, an event record.

8.3.2 Open/close

<tag attr="value">content</tag>

Content is parsed per the tag's content model. Custom blocks contain block children; custom inlines contain inline children.

Tags MUST be case-matched: <md-foo></md-foo> matches; <md-foo></MD-FOO> does NOT (MdFlow tag matching is case-sensitive, lowercase-only).

8.3.3 Open without close

An open tag with no matching close tag is a parse error; the implementation MUST emit the opening tag as literal text via the fallback of §7.14.
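
Non-normative sketch (Python) of how the three forms might be told apart at the start of a string. It is deliberately simplified and rests on several assumptions: whitelist checks are omitted, attributes are skipped rather than parsed, a > inside a quoted attribute value is not handled, and nested elements with the same tag are not handled. The function name recognize and its return shape are hypothetical.

```python
import re

# Opening tag: name, optional attribute text, optional trailing slash.
OPEN = re.compile(r"<([a-z][a-z0-9]*-[a-z0-9-]*)((?:\s[^<>]*?)?)(/?)>")


def recognize(source: str):
    """Classify the element at the start of `source`.

    Returns ("void", tag), ("pair", tag, content) or ("text", literal).
    """
    m = OPEN.match(source)
    if m is None:
        return ("text", source)
    tag, _attrs, slash = m.groups()
    if slash:
        # Void form: the trailing slash is present in source.
        return ("void", tag)
    # Open/close form: the close tag must match case-sensitively.
    close = "</" + tag + ">"
    end = source.find(close, m.end())
    if end == -1:
        # Open without close: fall back to the opening tag as literal text.
        return ("text", m.group(0))
    return ("pair", tag, source[m.end():end])
```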

8.4 Attribute grammar

An attribute has the form:

name ('=' value)?
value = quoted | unquoted
quoted = '"' [^"]* '"' | "'" [^']* "'"
unquoted = [^\s"'=<>`]+

PEG sketch (see also §14):

attr      = space+ name ('=' value)?
name      = [a-z_:] [a-z0-9_:.-]*
value     = '"' dqchar* '"' / "'" sqchar* "'" / unquoted
dqchar    = ![\"] .
sqchar    = !['] .
unquoted  = [^ \t\n\r\"\'=<>`]+
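
Non-normative sketch (Python): the attr rule transcribed as a regular expression and applied repeatedly to the text that follows a tag name. The names ATTR and parse_attrs are illustrative. A bare attribute (no value) is reported with the value None, which is an assumption of this sketch rather than a requirement of the grammar.

```python
import re
from typing import List, Optional, Tuple

ATTR = re.compile(
    r"\s+([a-z_:][a-z0-9_:.-]*)"        # space+ name
    r"(?:=(?:\"([^\"]*)\"|'([^']*)'"    # double- or single-quoted value
    r"|([^\s\"'=<>`]+)))?"              # unquoted value
)


def parse_attrs(source: str) -> List[Tuple[str, Optional[str]]]:
    """Parse consecutive attributes from the start of `source`."""
    attrs = []
    pos = 0
    while True:
        m = ATTR.match(source, pos)
        if m is None:
            break
        name, dq, sq, uq = m.groups()
        # At most one of the three value groups is set.
        value = next((v for v in (dq, sq, uq) if v is not None), None)
        attrs.append((name, value))
        pos = m.end()
    return attrs
```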

8.4.1 Name filter

Attribute names MUST satisfy:

  • Match ^[a-z_:][a-z0-9_:.-]*$ (ASCII lowercase after any case-folding).
  • Not match ^on (event-handler attributes such as onclick are REJECTED).
  • Not begin with data-mdflow- (reserved namespace, §3.5).

An attribute whose name fails the filter is silently dropped in its entirety (name and value). Parsing of the surrounding element continues.
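
Non-normative sketch (Python) of the name filter. It assumes any case-folding has already happened upstream, so names are tested as given. The function names are hypothetical. Note that the ^on rule as written also rejects names such as once.

```python
import re
from typing import List, Optional, Tuple

NAME_RE = re.compile(r"[a-z_:][a-z0-9_:.-]*")


def is_allowed_attr_name(name: str) -> bool:
    """Apply the three name-filter conditions to an attribute name."""
    return (
        NAME_RE.fullmatch(name) is not None
        and not name.startswith("on")            # event handler attributes
        and not name.startswith("data-mdflow-")  # reserved namespace
    )


def filter_attrs(
    attrs: List[Tuple[str, Optional[str]]]
) -> List[Tuple[str, Optional[str]]]:
    """Silently drop attributes whose names fail the filter; keep the rest."""
    return [(n, v) for n, v in attrs if is_allowed_attr_name(n)]
```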

8.4.2 Value filter

Attribute values MUST be escaped per §10.3.2 when emitted in HTML output. Text attribute values may contain any UTF-8 characters; the filter is applied at emission time and does not reject values at parse time.

8.4.3 URL-valued attributes

Extensions MAY declare that specific attribute names hold URLs (e.g., src, href, data). For declared-URL attributes, the value MUST pass the URL scheme filter of §10.3.1. Non-declared attributes are treated as opaque text.

The core specification declares no URL-valued custom-element attributes; all such declarations live in extension specifications.

8.4.4 Duplicate attributes

If the same name appears more than once on a single element, the last occurrence wins. No error is raised.
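
Non-normative sketch (Python) of last-occurrence-wins resolution over a list of (name, value) pairs. The function name collapse_duplicates is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def collapse_duplicates(
    attrs: List[Tuple[str, Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Collapse repeated attribute names; the last occurrence wins."""
    merged: Dict[str, Optional[str]] = {}
    for name, value in attrs:
        merged[name] = value  # a later occurrence overwrites an earlier one
    return merged
```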

8.5 Content model

The content model of a custom block is the block content model: children may be blocks of any kind (recursive).

The content model of a custom inline is the inline content model: children may be inlines of any kind.

Extensions MAY restrict the content model further (e.g., <md-plot> content may be only <md-plot-series/>); such restrictions are per-extension and are enforced by the extension's parser, not the core.

8.6 Whitespace within elements

Leading and trailing whitespace inside an open/close element is preserved in inline contexts, where it becomes part of the first or last child's text. In block contexts, whitespace between the opening tag and the first block child is normalized: a whitespace run of any length simply starts a new block context.

8.7 Parsing ordering

Custom elements compete with inline and block parsers. Typical plugin priorities:

  • Custom block plugin: 50
  • Custom inline plugin: 50 (inline priority context)
  • Event element plugin: 40

Because these are lower than core block/inline priorities, a sequence like [link](url)<md-x/> parses the link first, then the custom inline.

8.8 Nested custom elements

Custom elements MAY contain other custom elements, subject to each outer element's content model. Nesting depth is not bounded by this specification; implementations MAY impose a safe upper bound (e.g., 64) and treat nesting beyond that bound as a parse error per §11.4.1.
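
Non-normative sketch (Python) of a depth check against such a bound. Elements are modelled as (tag, children) tuples, and the outermost element is counted as depth 1; both choices are assumptions of this sketch. The traversal is iterative so that the check itself cannot overflow the call stack on deeply nested input.

```python
from typing import List, Tuple

Node = Tuple[str, List["Node"]]  # (tag, children); illustrative shape


def exceeds_depth(root: Node, limit: int = 64) -> bool:
    """Return True if any element is nested deeper than `limit`."""
    stack = [(root, 1)]
    while stack:
        (_tag, children), depth = stack.pop()
        if depth > limit:
            return True
        stack.extend((child, depth + 1) for child in children)
    return False
```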

8.9 Errors during custom-element parsing

Malformed custom elements (unclosed, mismatched, invalid attributes) MUST be handled per §11. They MUST NOT crash the pipeline and MUST NOT produce unsafe DOM.

8.10 Extension registration

A tag is registered by a plugin declaring the tag in its blockTags, inlineTags, or eventTags arrays at plugin creation time. The core merges all registered tag sets into the three whitelists of §8.2. Duplicate registrations across plugins MUST resolve by plugin priority order: the registration made by the highest-priority plugin stands, and lower-priority plugins processed later do not override it.
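
Non-normative sketch (Python) of the merge. It assumes that a larger priority number means higher priority and that ties keep declaration order; neither is stated in this chapter. The Plugin class, the merge_whitelists function, and the tags used in the usage note are hypothetical; only the blockTags, inlineTags, and eventTags field names come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plugin:
    name: str
    priority: int
    blockTags: List[str] = field(default_factory=list)
    inlineTags: List[str] = field(default_factory=list)
    eventTags: List[str] = field(default_factory=list)


def merge_whitelists(plugins: List[Plugin]) -> Dict[str, Dict[str, str]]:
    """Merge plugin tag sets into three whitelists mapping tag -> owning plugin."""
    owners: Dict[str, Dict[str, str]] = {"block": {}, "inline": {}, "event": {}}
    # Highest priority first; sorted() is stable, so ties keep input order.
    for plugin in sorted(plugins, key=lambda p: p.priority, reverse=True):
        for kind, tags in (
            ("block", plugin.blockTags),
            ("inline", plugin.inlineTags),
            ("event", plugin.eventTags),
        ):
            for tag in tags:
                # setdefault keeps the earlier (higher-priority) registration.
                owners[kind].setdefault(tag, plugin.name)
    return owners
```

For example, if two plugins both register md-note as a block tag, the whitelist entry is owned by whichever of them has the higher priority, regardless of the order in which they were passed in.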