live

Live transcription provides near-real-time partial and final transcript segments for an active meeting.

Use cases:

show live captions/subtitles in UI

power live summaries and highlights

enable “search while meeting” and live QA bots

feed AI agents or sidecars (note-taking bots)

Key design principles:

Partial-first: show partial text immediately; rely on final segments for persistence.

Ordered by time: use segmentId and timestamps to resolve ordering.

Resumable: reconnecting clients can request a replay of recent segments.

Lightweight control channels: send client annotations (key moments) or request speaker mapping.

Auth & tokens

joinToken is a short-lived JWT (recommended lifetime 5–15 minutes) scoped to meeting:{meetingId} and carrying a transcribe capability.

Web clients may pass the token in the query string, provided the token is short-lived and the connection uses TLS. If your client can set headers (Node, some libraries), prefer the Authorization: Bearer <token> header.

The server validates aud, exp, meetingId, and token scope on connect.
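The two ways of passing the token can be sketched as a small helper. This is an illustration only: the endpoint URL and the query parameter name `token` are assumptions, since neither is specified here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real URL is not given in this document.
BASE_URL = "wss://example.invalid/v1/meetings/{meeting_id}/live"


def connection_params(meeting_id: str, join_token: str, can_set_headers: bool):
    """Return (url, headers) for opening the live transcription socket."""
    url = BASE_URL.format(meeting_id=meeting_id)
    if can_set_headers:
        # Preferred: keeps the token out of the URL.
        return url, {"Authorization": f"Bearer {join_token}"}
    # Browser fallback: short-lived token in the query string, TLS (wss) only.
    # The parameter name "token" is an assumption.
    return f"{url}?{urlencode({'token': join_token})}", {}
```

A Node-style client would call `connection_params(meeting_id, token, True)`; a browser client, which cannot set headers on a WebSocket, would pass `False`.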

Handshake

After WS open, send a handshake:

{
  "type": "handshake",
  "clientId": "ui-abc-123",
  "capabilities": ["partial","final","diarization","punctuation"],
  "lastSeenSegmentId": null  // set to resume from this segment
}

The server replies with a hello message that includes the negotiated features and the server time.
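A client might build the handshake frame as follows. This is a minimal sketch; the function name `build_handshake` is illustrative, and the field names are taken from the example above.

```python
import json
from typing import Optional


def build_handshake(client_id: str, last_seen_segment_id: Optional[str] = None) -> str:
    """Serialize the handshake frame sent right after the socket opens."""
    return json.dumps({
        "type": "handshake",
        "clientId": client_id,
        "capabilities": ["partial", "final", "diarization", "punctuation"],
        # None serializes to JSON null (fresh session); pass a segment id to resume.
        "lastSeenSegmentId": last_seen_segment_id,
    })
```

On reconnect, pass the id of the last segment the client processed, for example `build_handshake("ui-abc-123", "seg_42")`.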

Message schemas

All messages are JSON. Below are the canonical schemas (consumer view).

Partial transcript (interim)

{
  "type": "partial_transcript",
  "segmentId": "string",        // unique id for the segment (stable)
  "isFinal": false,
  "text": "string",              // interim unfinalized text
  "speakerId": "string|null",   // may be null until diarization
  "confidence": 0.0,             // optional
  "startTime": 1625.23,         // seconds from meeting start
  "endTime": 1625.50,           // current end estimate
  "timestamp": "2025-10-15T09:05:12Z"
}

Final transcript (authoritative)

{
  "type": "final_transcript",
  "segmentId": "string",
  "isFinal": true,
  "text": "We will refactor the API and push changes by Friday.",
  "speakerId": "user_1",
  "confidence": 0.94,
  "startTime": 1625.23,
  "endTime": 1628.01,
  "metadata": { "punctuated": true, "model":"faster-whisper-v2" },
  "source": { "recording_position_ms": 162523 } // optional mapping
}

Speaker map / diarization update

{
  "type": "speaker_map",
  "mappings": [
    { "speakerId":"spk_1", "participantId":"p_12", "displayName":"Jane" },
    { "speakerId":"spk_2", "participantId":"p_4", "displayName":"Abdo" }
  ],
  "timestamp":"2025-10-15T09:06:00Z"
}
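One way a UI could consume speaker_map frames is to keep a small lookup table and fall back to a placeholder until a mapping arrives. The class name and the label used for a null speakerId are assumptions for illustration.

```python
from typing import Dict, Optional


class SpeakerDirectory:
    """Resolves speakerId values to display names as speaker_map frames arrive."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def apply_speaker_map(self, frame: dict) -> None:
        # Later mappings for the same speakerId overwrite earlier ones.
        for mapping in frame.get("mappings", []):
            self._names[mapping["speakerId"]] = mapping["displayName"]

    def label_for(self, speaker_id: Optional[str]) -> str:
        if speaker_id is None:
            # Assumed label for segments that have no speaker yet.
            return "Unknown speaker"
        # Placeholder until a mapping for this speaker arrives.
        return self._names.get(speaker_id, f"Speaker {speaker_id}")
```

Because labels are resolved at render time, captions already on screen pick up the real name on the next render after the mapping arrives.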

Annotation confirmation (server ack)

{
  "type": "ack",
  "ackType": "annotation",
  "clientMsgId": "uuid",
  "serverId": "ann_123",
  "timestamp": "2025-10-15T09:07:12Z"
}

Error frame

{
  "type": "error",
  "code": "MODEL_OVERLOAD",
  "message": "STT node overloaded; partials may be delayed",
  "retryAfterSeconds": 5
}
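When the server reports MODEL_OVERLOAD, a client can reduce rendering work by coalescing buffered partials so that only the newest partial per segment is drawn. The sketch below assumes the client buffers incoming frames in a list before rendering; that buffering strategy is not prescribed by this document.

```python
from typing import Dict, List


def coalesce_partials(frames: List[dict]) -> List[dict]:
    """Keep only the newest partial per segmentId; pass other frames through."""
    latest_partial: Dict[str, int] = {}
    for index, frame in enumerate(frames):
        if frame.get("type") == "partial_transcript":
            latest_partial[frame["segmentId"]] = index
    return [
        frame
        for index, frame in enumerate(frames)
        if frame.get("type") != "partial_transcript"
        or latest_partial[frame["segmentId"]] == index
    ]
```

Final transcripts, speaker maps, and error frames are never dropped; only superseded partials are.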

Control messages

Annotation (mark a key moment)

{ "type": "annotation", "clientMsgId":"uuid","annotationType":"keyMoment","note":"Decision: freeze scope","timestamp":"2025-10-15T09:10:02Z" }

Request speaker mapping (ask STT to map speaker audio to participants)

{ "type":"requestSpeakerMap", "clientMsgId":"uuid", "hints": { "participantAudioRefs":["p_1","p_2"] } }
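A client could pair each annotation with its server ack through clientMsgId. The following sketch uses a hypothetical `AnnotationTracker` class; the message fields follow the annotation and ack examples in this document.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional


class AnnotationTracker:
    """Tracks annotations that have been sent but not yet acknowledged."""

    def __init__(self) -> None:
        self.pending: Dict[str, dict] = {}

    def build_annotation(self, note: str, annotation_type: str = "keyMoment") -> str:
        msg = {
            "type": "annotation",
            "clientMsgId": str(uuid.uuid4()),
            "annotationType": annotation_type,
            "note": note,
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        self.pending[msg["clientMsgId"]] = msg
        return json.dumps(msg)

    def handle_ack(self, frame: dict) -> Optional[str]:
        """Return the serverId if the ack matches a pending annotation, else None."""
        if frame.get("type") != "ack" or frame.get("ackType") != "annotation":
            return None
        if self.pending.pop(frame.get("clientMsgId"), None) is None:
            return None  # unknown or duplicate ack
        return frame.get("serverId")
```

Anything still in `pending` after a reconnect is a candidate for resending; whether the server deduplicates on clientMsgId is not stated here.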

Consumer-focused behavior and semantics

Partial messages are ephemeral. Show them as “live captioning” but only persist after receiving the same segmentId with isFinal: true.

Segment identity: segmentId is stable across partial → final updates. Use it to merge text in UI.

Ordering: order segments by startTime and break ties by server timestamp. Do not rely on arrival order.

Speaker IDs: speakerId may be null initially; a later speaker_map message maps each speakerId to a participant. UIs should display a “Speaker X” placeholder until the mapping arrives.

Replay after reconnect: send lastSeenSegmentId in the handshake; the server replays the last N segments (configurable; typically the last 30–120 seconds).

Backpressure: the server may send an error frame with code MODEL_OVERLOAD. Apply UI rate-limiting (drop or coalesce intermediate partials) to maintain responsiveness.

Acks: client can ack final segments to confirm delivery; server retries events for unacked clients (optional).
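The merge, ordering, and persistence rules above can be sketched as a small in-memory store. `SegmentStore` is a hypothetical name. The sketch assumes timestamps are ISO-8601 UTC strings in a single format, so they compare correctly as text, and it treats a missing timestamp as the empty string.

```python
from typing import Dict, List


class SegmentStore:
    """Merges partial and final frames by segmentId and orders them for display."""

    def __init__(self) -> None:
        self.segments: Dict[str, dict] = {}

    def apply(self, frame: dict) -> None:
        if frame.get("type") not in ("partial_transcript", "final_transcript"):
            return
        current = self.segments.get(frame["segmentId"])
        # A late partial must never overwrite a segment that is already final.
        if current is not None and current.get("isFinal") and not frame.get("isFinal"):
            return
        self.segments[frame["segmentId"]] = frame

    def ordered(self) -> List[dict]:
        # Order by startTime; break ties by server timestamp, not arrival order.
        return sorted(
            self.segments.values(),
            key=lambda s: (s["startTime"], s.get("timestamp", "")),
        )

    def persistable(self) -> List[dict]:
        # Only final segments are safe to persist.
        return [s for s in self.ordered() if s.get("isFinal")]
```

Render `ordered()` for live captions and write only `persistable()` to storage. Replayed segments after a reconnect go through `apply` like any other frame, so duplicates collapse on segmentId.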
