5. Streaming Model

This chapter defines the contract between an incremental source producer (e.g., an LLM, a network stream, a file reader) and a Class S implementation. Class B implementations MAY skip this chapter except where cross-referenced.

5.1 Source model

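An MdFlow source is an immutable UTF-8 byte sequence. A chunk is a contiguous byte sequence of the source that begins where the previous chunk ended, so the concatenation of all chunks pushed so far is always a prefix of the source. The source producer emits chunks by calling: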

push(chunk: bytes) → backpressure: boolean

The producer MAY call push zero or more times, then calls:

flush() → void

to signal end-of-input. After flush(), the producer MUST NOT call push again on the same instance.
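The call sequence above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names RecordingSink and feed are hypothetical, the sink only records bytes, and raising on a push after flush() is one possible way to surface a contract violation that this chapter does not itself prescribe.

```python
from typing import Iterable


class RecordingSink:
    """Hypothetical stand-in for a Class S implementation; it only records bytes."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.flushed = False

    def push(self, chunk: bytes) -> bool:
        # Assumption: this sketch raises when the producer breaks the contract.
        if self.flushed:
            raise RuntimeError("push() called after flush()")
        self.buffer += chunk
        return False  # this sketch never signals back-pressure

    def flush(self) -> None:
        self.flushed = True


def feed(sink: RecordingSink, chunks: Iterable[bytes]) -> None:
    """Producer side: zero or more push calls, then flush."""
    for chunk in chunks:
        sink.push(chunk)
    sink.flush()


sink = RecordingSink()
feed(sink, [b"# Ti", b"tle\n", b"body"])
```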

5.2 Chunk boundary independence

A conforming implementation's final output MUST NOT depend on the chunk boundaries chosen by the producer. Formally: for any source S split into chunk sequences C1 and C2, the output after consuming C1 followed by flush() MUST be byte-identical in canonical form to the output after consuming C2 followed by flush(), provided concat(C1) = concat(C2) = S.

Rationale. Producers should not be burdened with knowing Markdown lexical boundaries. An implementation MUST therefore handle a chunk split mid-word, mid-token, mid-UTF-8-codepoint, or between any two bytes.
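One way to exercise this property is to replay the same source under every possible two-chunk split and compare the results against the single-chunk run. The sketch below does that for a toy consumer; LineCollector, split_points, and run are hypothetical names, and the "canonical output" here is simply a list of lines, not MdFlow's canonical form.

```python
from typing import Iterable, List


def split_points(source: bytes) -> Iterable[List[bytes]]:
    """Yield every two-chunk split of source, including empty-chunk edges."""
    for i in range(len(source) + 1):
        yield [source[:i], source[i:]]


class LineCollector:
    """Hypothetical toy consumer whose output is the list of source lines."""

    def __init__(self) -> None:
        self._tail = b""
        self.lines: List[bytes] = []

    def push(self, chunk: bytes) -> bool:
        data = self._tail + chunk
        # Everything before the last newline is final; the rest is held back.
        *done, self._tail = data.split(b"\n")
        self.lines.extend(done)
        return False

    def flush(self) -> None:
        if self._tail:
            self.lines.append(self._tail)
            self._tail = b""


def run(chunks: List[bytes]) -> List[bytes]:
    consumer = LineCollector()
    for chunk in chunks:
        consumer.push(chunk)
    consumer.flush()
    return consumer.lines


source = "# Títle\n\ntext\n".encode("utf-8")
reference = run([source])
# Includes splits that fall inside the two-byte encoding of "í".
mismatches = [c for c in split_points(source) if run(c) != reference]
```

An empty mismatches list means no two-chunk split changed the output. A fuller test would also try splits into three or more chunks.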

5.3 Pending vs. complete tokens

During processing, a token exists in one of two states:

  • complete — the token's slice is final; no further source byte can change its kind or slice start.
  • pending — the token's slice end MAY still grow; its kind MAY still change subject to the constraints in §5.4.

The implementation MUST classify every token. A complete token MUST NOT transition back to pending.

5.3.1 Examples

  • A Heading token for # Title\n is complete once the terminating newline has been consumed.
  • A CodeBlock token that has seen ```js\n and some body bytes but no closing ``` is pending; its slice end and its content grow with every new chunk.
  • A Paragraph token is pending until a blank line or a higher-priority block opener is seen.
  • A Table token is pending as long as subsequent non-blank lines could extend it with additional rows.
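The Heading and Paragraph examples above can be made concrete with a toy scanner. This is a sketch under simplifying assumptions: Token and scan are hypothetical names, only "# " headings and plain paragraphs are recognized, and the rules are far simpler than the MdFlow grammar.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    kind: str
    start: int
    end: int
    state: str  # "complete" or "pending"


def scan(buffer: bytes) -> List[Token]:
    """Toy block scanner for headings and paragraphs only (hypothetical)."""
    tokens: List[Token] = []
    para: Optional[Token] = None  # currently open Paragraph, if any
    pos = 0
    while pos < len(buffer):
        nl = buffer.find(b"\n", pos)
        terminated = nl != -1
        end = nl + 1 if terminated else len(buffer)
        line = buffer[pos:end]
        if line.startswith(b"# "):
            if para is not None:
                para.state = "complete"  # a block opener closes the paragraph
                para = None
            state = "complete" if terminated else "pending"
            tokens.append(Token("Heading", pos, end, state))
        elif line.strip() == b"":
            if para is not None and terminated:
                para.state = "complete"  # a blank line closes the paragraph
                para = None
        else:
            if para is None:
                para = Token("Paragraph", pos, end, "pending")
                tokens.append(para)
            para.end = end  # the pending slice end grows with each line
        pos = end
    return tokens


early = scan(b"# Title\nsome te")
later = scan(b"# Title\nsome text\n\n# Ne")
```

In early, the Heading is complete and the Paragraph is pending. In later, the blank line has completed the Paragraph and a new pending Heading has opened.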

5.4 Incremental idempotency

For any split of source S into a prefix P and a suffix Q (S = P ++ Q), let T(X) denote the token sequence produced after consuming X followed by flush(). Let T'(P) denote the token sequence after consuming P without flushing. A conforming implementation MUST satisfy:

REQ-IDEMP-1. Every token in T'(P) that is classified complete MUST appear in T(S) with the same kind, the same slice start, the same set of complete sub-tokens (kinds and slice starts), and a slice end that is either identical to the one in T'(P) or that differs only by subsequent appended bytes that belong to still-pending sub-tokens. Informally: complete tokens are frozen.

REQ-IDEMP-2. The order of complete tokens in T'(P) MUST be a prefix of the order of tokens in T(S).

REQ-IDEMP-3. After flush(), every pending token MUST be finalized (transitioned to complete) according to the end-of-input rules of its kind.

Rationale. These requirements permit renderers to commit DOM for complete tokens incrementally without rework. Only pending tokens' DOM MAY be replaced on subsequent chunks.
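A test harness might check these requirements by comparing a snapshot taken before flush() with the final token sequence. The sketch below is a simplified checker for flat tokens only: Tok and frozen_violations are hypothetical names, sub-tokens are not modeled, and so a complete token's slice end is required to stay identical.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    kind: str
    start: int
    end: int
    complete: bool


def frozen_violations(partial: List[Tok], final: List[Tok]) -> List[str]:
    """Check simplified REQ-IDEMP-1..3 for flat tokens (no sub-tokens)."""
    problems: List[str] = []
    done = [t for t in partial if t.complete]
    # REQ-IDEMP-2: complete tokens must form a prefix of the final sequence.
    for i, tok in enumerate(done):
        if i >= len(final):
            problems.append(f"{tok.kind}@{tok.start} missing from final output")
            continue
        other = final[i]
        # REQ-IDEMP-1: kind and slice must be unchanged once complete.
        if (other.kind, other.start, other.end) != (tok.kind, tok.start, tok.end):
            problems.append(f"{tok.kind}@{tok.start} changed after completion")
    # REQ-IDEMP-3: nothing may remain pending after flush().
    problems += [f"{t.kind}@{t.start} still pending" for t in final if not t.complete]
    return problems


partial = [Tok("Heading", 0, 8, True), Tok("Paragraph", 8, 15, False)]
good_final = [Tok("Heading", 0, 8, True), Tok("Paragraph", 8, 18, True)]
bad_final = [Tok("Paragraph", 0, 18, True)]

ok = frozen_violations(partial, good_final)
bad = frozen_violations(partial, bad_final)
```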

5.5 Flush semantics

flush() MUST:

  1. Mark end-of-input. Any pending token is finalized per its kind's finalization rule.
  2. Emit all DOM operations for newly-complete tokens.
  3. Be idempotent: a second flush() call is a no-op.
  4. Not consume additional source (no push may occur after flush).

Implementations MAY additionally expose an explicit flush bypass that forces the scheduler to emit outstanding DOM operations without finalizing pending tokens. This bypass is OPTIONAL and has no normative output semantics.
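The first three flush() obligations can be sketched as simple bookkeeping. FlushOnce and its members are hypothetical, DOM operations are represented as plain strings, and finalization is reduced to a single "commit" per pending token.

```python
from typing import List


class FlushOnce:
    """Hypothetical skeleton showing only the flush() bookkeeping."""

    def __init__(self) -> None:
        self.pending: List[str] = []  # kinds of tokens still pending
        self.ops: List[str] = []      # emitted DOM operations, as strings
        self.ended = False

    def open_token(self, kind: str) -> None:
        self.pending.append(kind)

    def flush(self) -> None:
        if self.ended:
            return  # step 3: a repeated flush() is a no-op
        self.ended = True  # step 1: mark end-of-input
        for kind in self.pending:
            # Step 2: emit operations for tokens that just became complete.
            self.ops.append(f"commit {kind}")
        self.pending.clear()


impl = FlushOnce()
impl.open_token("Paragraph")
impl.flush()
impl.flush()  # no additional operations are emitted
```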

5.6 Back-pressure

push() MAY return a boolean back-pressure indicator. When true, the producer SHOULD wait before pushing more data. The specification does not mandate a particular signaling mechanism (e.g., Promise, callback, event), only that:

REQ-BP-1. If back-pressure is signaled, the implementation MUST continue to accept further push() calls without loss; back-pressure is advisory, not mandatory.

REQ-BP-2. Back-pressure state MUST NOT affect the canonical output.
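The advisory nature of back-pressure can be illustrated with a sink that signals above a byte threshold but never rejects data. BoundedSink, drain, and produce are hypothetical names, and the threshold policy is an assumption of this sketch, not something the specification defines.

```python
from typing import List


class BoundedSink:
    """Hypothetical sink that signals back-pressure above a byte threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.queued = bytearray()

    def push(self, chunk: bytes) -> bool:
        # REQ-BP-1: always accept the chunk, even while signaling.
        self.queued += chunk
        return len(self.queued) > self.threshold

    def drain(self) -> bytes:
        """Stand-in for the implementation catching up on queued work."""
        data = bytes(self.queued)
        self.queued.clear()
        return data


def produce(sink: BoundedSink, chunks: List[bytes], honor_signal: bool) -> bytes:
    consumed = bytearray()
    for chunk in chunks:
        if sink.push(chunk) and honor_signal:
            consumed += sink.drain()  # a real producer would wait here instead
    consumed += sink.drain()
    return bytes(consumed)


chunks = [b"abc", b"def", b"ghi"]
polite = produce(BoundedSink(4), chunks, honor_signal=True)
eager = produce(BoundedSink(4), chunks, honor_signal=False)
```

Both producers end up with the same bytes consumed, which is the point of REQ-BP-2: honoring or ignoring the signal changes timing, not output.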

5.7 Scheduling

Implementations MAY batch DOM operations across multiple push() calls using a scheduler (e.g., requestAnimationFrame in browsers or a microtask queue). Scheduling MUST satisfy the following requirements:

REQ-SCHED-1. The order of DOM operations MUST match the order of completed tokens.

REQ-SCHED-2. Between two flush() calls, observable DOM state is monotonic — nodes already rendered for a complete token MAY be appended to by subsequent operations for later tokens, but MUST NOT be removed or reordered except as permitted by INV-5.

REQ-SCHED-3. An explicit flush() MUST cause all outstanding scheduled operations to be drained synchronously before returning.
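A first-in, first-out queue is one simple way to satisfy the ordering and drain requirements. In the sketch below, BatchingScheduler, tick, and drain are hypothetical names, DOM operations are plain strings, and tick stands in for whatever frame or microtask callback a host environment provides.

```python
from typing import Callable, List


class BatchingScheduler:
    """Hypothetical scheduler: queues DOM ops and applies them in FIFO order."""

    def __init__(self, apply: Callable[[str], None]) -> None:
        self._apply = apply
        self._queue: List[str] = []

    def enqueue(self, op: str) -> None:
        self._queue.append(op)

    def tick(self, budget: int) -> None:
        """Apply at most `budget` ops, e.g. once per animation frame."""
        batch, self._queue = self._queue[:budget], self._queue[budget:]
        for op in batch:
            self._apply(op)

    def drain(self) -> None:
        """Called from flush(): apply everything before returning (REQ-SCHED-3)."""
        self.tick(len(self._queue))


applied: List[str] = []
sched = BatchingScheduler(applied.append)
for op in ["h1", "p1", "p2", "code1"]:
    sched.enqueue(op)

sched.tick(budget=1)
after_tick = list(applied)  # only the first op has been applied so far
sched.drain()               # the remaining ops are applied synchronously
```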

5.8 Sub-token patching

For pending tokens whose content grows monotonically (e.g., CodeBlock, Paragraph), implementations SHOULD emit append-at-tail DOM patches rather than replacing the entire node. This is a RECOMMENDED performance optimization, not a conformance requirement.

Vectors that test streaming correctness compare canonical output, not DOM operation sequences.
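Choosing between an append-at-tail patch and a full replacement can be sketched as a prefix check on the rendered text. The function name tail_patch and the ("append" | "replace", text) patch shape are assumptions made for illustration only.

```python
from typing import Tuple


def tail_patch(rendered: str, current: str) -> Tuple[str, str]:
    """Return ("append", suffix) if current extends rendered, else ("replace", current)."""
    if current.startswith(rendered):
        return ("append", current[len(rendered):])
    return ("replace", current)


grow = tail_patch("let x", "let x = 1")   # content grew at the tail
rewrite = tail_patch("*bold", "bold")     # earlier content changed
```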

5.9 Byte-boundary and UTF-8 handling

A chunk MAY end mid-codepoint. The implementation MUST buffer incomplete UTF-8 byte sequences internally and MUST NOT emit a token whose slice includes an incomplete codepoint.

After flush(), any remaining incomplete codepoint MUST be treated as the Unicode replacement character U+FFFD in the output, consistent with the Unicode Standard's recommended practice for substituting ill-formed sequences.
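Python's standard incremental decoder shows the buffering behavior described here. This is an illustration using codecs.getincrementaldecoder with errors="replace", not a statement about how an MdFlow implementation must decode. With this setting, byte sequences that are invalid (rather than merely incomplete) are also replaced as soon as they are seen.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

euro = "€".encode("utf-8")         # three bytes: E2 82 AC
first = decoder.decode(euro[:2])   # incomplete sequence: held back, nothing emitted
second = decoder.decode(euro[2:])  # the next chunk completes the codepoint

partial = decoder.decode(b"ok" + euro[:1])  # chunk ends mid-codepoint
tail = decoder.decode(b"", final=True)      # end-of-input: truncated bytes are replaced
```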

5.10 Concurrency

A single MdFlow instance is not required to be thread-safe. The producer SHOULD serialize its push() and flush() calls on any given instance.

Multiple independent instances MAY process unrelated sources concurrently; the specification places no constraint on inter-instance behavior.

5.11 Memory bounds

An implementation MUST NOT retain a complete copy of consumed source beyond what is required to resolve pending tokens. In particular:

REQ-MEM-1. Bytes that belong to complete tokens with no pending ancestors MAY be released.

REQ-MEM-2. The buffered pending prefix is bounded by the maximum length a single pending token may attain under any plugin mix.

Implementations MAY expose a high-water mark for diagnostic purposes.
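Releasing consumed bytes while tracking a high-water mark might look like the sketch below. SourceWindow and its members are hypothetical, and the caller is assumed to know the start offset of the earliest pending token.

```python
class SourceWindow:
    """Hypothetical buffer that keeps only bytes still needed by pending tokens."""

    def __init__(self) -> None:
        self.base = 0         # absolute source offset of self.data[0]
        self.data = bytearray()
        self.high_water = 0   # largest number of bytes retained at once

    def append(self, chunk: bytes) -> None:
        self.data += chunk
        self.high_water = max(self.high_water, len(self.data))

    def release_before(self, offset: int) -> None:
        """Drop bytes below `offset`, the start of the earliest pending token."""
        drop = max(0, min(offset - self.base, len(self.data)))
        del self.data[:drop]
        self.base += drop

    def slice(self, start: int, end: int) -> bytes:
        if start < self.base:
            raise ValueError("bytes already released")
        return bytes(self.data[start - self.base:end - self.base])


window = SourceWindow()
window.append(b"# Title\n")  # the Heading completes at offset 8
window.append(b"some te")    # a Paragraph is pending from offset 8
window.release_before(8)     # the Heading's bytes are no longer needed
```

Slices are still addressed by absolute source offsets after a release, so token slice starts recorded earlier remain valid.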

5.12 Error recovery during streaming

See §11 Error Handling & Recovery. In summary: a plugin error during token processing MUST NOT terminate the stream; the pipeline MUST recover by treating the offending input as opaque text and MUST continue processing subsequent chunks.
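The recovery rule summarized above can be sketched as a guarded plugin call with a text fallback. The names process_blocks and shouting_plugin, the (kind, content) tuple shape, and the "Text" kind are all hypothetical. §11 remains the normative description.

```python
from typing import Callable, List, Tuple

Plugin = Callable[[str], Tuple[str, str]]


def process_blocks(blocks: List[str], plugin: Plugin) -> List[Tuple[str, str]]:
    """Run a plugin per block; on failure keep the block as opaque text and go on."""
    out: List[Tuple[str, str]] = []
    for block in blocks:
        try:
            out.append(plugin(block))
        except Exception:
            # Fallback: the offending input becomes opaque text.
            out.append(("Text", block))
    return out


def shouting_plugin(block: str) -> Tuple[str, str]:
    """Hypothetical plugin that fails on one kind of input."""
    if block.startswith("!!"):
        raise ValueError("unsupported directive")
    return ("Paragraph", block.upper())


result = process_blocks(["hello", "!!boom", "world"], shouting_plugin)
```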