# LLM Integration Guide
A practical guide to wiring Generative DOM into LLM-driven user interfaces — chat, copilots, streaming dashboards, and anything else that pipes tokens from a model into a DOM container.
## Why streaming markdown is its own problem
Rendering a finished markdown string is easy. Rendering markdown that's still arriving, one chunk at a time, while the user is reading it, is not. Three failure modes dominate the naive approach (concatenate tokens, call `container.innerHTML = parse(buffer)` on every chunk; a sketch follows the list):
- Flicker. Replacing `innerHTML` tears down and rebuilds every child node on every chunk. Cursors blink. Transitions restart. Images re-fetch.
- Lost selection and scroll. If the user highlighted a sentence or scrolled up to re-read something, the DOM swap wipes that state. On long responses this is the difference between "I can actually read this" and "I'll wait for it to finish."
- O(N²) parse cost. Every chunk re-parses the entire accumulated buffer. By chunk #500, the renderer is slower than the model.
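For concreteness, here's a minimal sketch of that naive loop. `stream` stands in for any async iterable of text deltas and `parse` for any markdown-to-HTML function — both are placeholders, not Generative DOM APIs:

```ts
// The naive baseline: accumulate the full buffer, then re-parse and
// re-render all of it on every chunk.
let buffer = '';
for await (const delta of stream) {
  buffer += delta;                      // buffer grows toward N characters
  container.innerHTML = parse(buffer);  // full re-parse + DOM rebuild per chunk → O(N²) total work
}
```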
Generative DOM's benchmarks use a 55 KB / 1,001-line corpus, streamed as 564 chunks of 100 characters:
| Scenario | Generative DOM | Naive innerHTML | Ratio |
|---|---|---|---|
| Full stream | 49 ms | 4,793 ms | ~98x faster |
| Per-chunk latency (median, mid-stream) | 0.10 ms | 9.22 ms | ~89x faster |
The gap grows with response length, because the naive baseline's cost is quadratic and Generative DOM's is linear. Full methodology in the repo's BENCHMARKS.md.
### One-shot renders are a different story
If you render markdown once, from a complete string, on page load — Generative DOM is slower than `innerHTML` (~2.6x). The incremental machinery is pure overhead when there is nothing to be incremental about. Use marked + DOMPurify for static content, as in the sketch below. Generative DOM earns its weight from chunk #2 onward.
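A minimal one-shot version, assuming the `marked` and `dompurify` npm packages and that `completeMarkdown` holds the finished string:

```ts
import { marked } from 'marked';
import DOMPurify from 'dompurify';

// Parse once, sanitize once, assign once — fine when nothing else will arrive.
const html = marked.parse(completeMarkdown) as string; // sync unless async extensions are enabled
document.getElementById('static-output')!.innerHTML = DOMPurify.sanitize(html);
```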
## Who this guide is for
- Developers building chat UIs (user sends a message, model streams a reply).
- Developers building AI copilots embedded in larger apps (IDE sidebars, writing assistants, support widgets).
- Developers building live dashboards where an agent continuously streams updates (status boards, log summarisers, trading desks).
You already know how to call an LLM and get a token stream out. This guide is about what happens between that stream and the screen.
## How this guide is organised
This is the index page. The four companion pages go deeper on specific aspects:
- Writing System Prompts for Generative DOM — the handful of rules an LLM needs to know about the target renderer. Token-shape discipline, the supported markdown subset, the custom element whitelist, what the sanitizer strips.
- Example System Prompts — six worked examples (docs assistant, code tutor, live dashboard, data report, interactive tutorial, minimal chat) with the full prompt, a sample exchange, and annotations.
- Emitting HTML / Custom Elements from LLMs — how the `@generative-dom/plugin-custom-elements` whitelist works, every tag and its attributes, and the prompt fragments that get models to use them correctly.
- Generative DOM's Markdown Subset — What's In, What's Out — the delta between CommonMark and what Generative DOM actually parses, with the rationale for each omission and addition.
Read them in order the first time. After that they're a reference — jump to the page that matches your current problem.
## The 30-second integration
Here's the complete pattern for streaming an OpenAI-style endpoint into Generative DOM. Plain `fetch` + `ReadableStream`, no SDK required.
```ts
import { GenerativeDom } from '@generative-dom/core';
import { markdownBase } from '@generative-dom/plugin-markdown-base';
import { markdownInline } from '@generative-dom/plugin-markdown-inline';
import { markdownHeading } from '@generative-dom/plugin-markdown-heading';
import { markdownCode } from '@generative-dom/plugin-markdown-code';
import { markdownList } from '@generative-dom/plugin-markdown-list';
import { markdownLink } from '@generative-dom/plugin-markdown-link';

const md = new GenerativeDom({
  container: document.getElementById('output')!,
  plugins: [
    markdownBase(),
    markdownInline(),
    markdownHeading(),
    markdownCode(),
    markdownList(),
    markdownLink(),
  ],
});

const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    stream: true,
    messages: [
      { role: 'system', content: 'You render into a streaming markdown UI. Use short paragraphs, fenced code blocks with language tags, and tables for structured data. Do not emit raw HTML.' },
      { role: 'user', content: 'Explain how TCP handshakes work.' },
    ],
  }),
});

const reader = response.body!.pipeThrough(new TextDecoderStream()).getReader();
let leftover = '';

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  const chunk = leftover + value;
  const lines = chunk.split('\n');
  leftover = lines.pop() ?? '';
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') continue;
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) md.push(delta);
  }
}

md.flush();
```

Four things to notice:
- One `push()` per delta. Generative DOM batches internally via rAF + debounce — you do not need to throttle at the call site.
- `flush()` at the end. Forces a final render pass and commits any buffered-but-not-yet-drawn tokens. The scheduler will also flush on its own timer, but calling it explicitly on stream close keeps the "last line" from landing a frame late.
- System prompt matters. That one sentence in the `system` role is what keeps the model on the supported subset. The next page expands this into a full template.
- SSE line buffering. The `leftover` variable handles the case where a chunk boundary splits an SSE line in half. Forget this and you'll occasionally JSON-parse `{"choices":[{"delta":{"content":` and crash.
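A convenient refactor of the loop above is to package the SSE parsing as an async generator, so consumers can just `for await` over deltas. Same logic, reshaped — `openAIDeltas` is a name invented for this sketch:

```ts
// The SSE loop from the integration example, repackaged as an async generator.
async function* openAIDeltas(response: Response): AsyncGenerator<string> {
  const reader = response.body!.pipeThrough(new TextDecoderStream()).getReader();
  let leftover = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) return;
    const lines = (leftover + value).split('\n'); // re-join any partial SSE line
    leftover = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice(6).trim();
      if (payload === '[DONE]') continue;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) yield delta;
    }
  }
}

// Usage:
// for await (const delta of openAIDeltas(response)) md.push(delta);
// md.flush();
```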
## Framework wrappers
The code above targets the core API directly. If you're in React, Vue, Svelte, Angular, Lit, or Astro, the wrapper packages (`@generative-dom/react`, `@generative-dom/vue`, etc.) hide the ref and lifecycle plumbing — `push()` and `flush()` still look the same. See the per-framework READMEs under `packages/adapters/`.
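For a sense of what the wrappers save you, here's a hand-rolled React version built only on the core API from this page — a sketch of the plumbing, not the adapter's actual component or hook surface (check its README for that):

```tsx
import { useEffect, useRef } from 'react';
import { GenerativeDom } from '@generative-dom/core';
import { markdownBase } from '@generative-dom/plugin-markdown-base';

// `stream` is an async iterable of text deltas, e.g. openAIDeltas() above.
function StreamingReply({ stream }: { stream: AsyncIterable<string> }) {
  const containerRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    const md = new GenerativeDom({
      container: containerRef.current!,
      plugins: [markdownBase()], // add the other markdown plugins as needed
    });
    let cancelled = false;
    (async () => {
      for await (const delta of stream) {
        if (cancelled) return;
        md.push(delta);
      }
      md.flush();
    })();
    return () => {
      cancelled = true; // stop pushing if the component unmounts mid-stream
    };
  }, [stream]);

  return <div ref={containerRef} />;
}
```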
## What to read next
Start with Writing System Prompts for Generative DOM. The quality of a Generative DOM-rendered response is bounded by how well the model understands the target format, and the model only knows what you tell it.