The `joinToken` is a short-lived JWT (recommended lifetime 5–15 minutes) scoped to `meeting:{meetingId}` with the `transcribe` capability. Send it in an `Authorization: Bearer <token>` header.

Handshake (client → server):

```jsonc
{
  "type": "handshake",
  "clientId": "ui-abc-123",
  "capabilities": ["partial", "final", "diarization", "punctuation"],
  "lastSeenSegmentId": null // set to a segmentId to resume from that segment
}
```

The server replies with a `hello` message that includes the negotiated features and the server time.
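Here is a minimal TypeScript sketch of preparing the connection. It assumes a client that can set request headers. The names `authHeader`, `buildHandshake`, and `Capability` are hypothetical helpers, and `seg_42` is a made-up segment id. Only the message fields come from the handshake above.

```typescript
// Hypothetical client helpers; only the message fields come from the protocol.
type Capability = "partial" | "final" | "diarization" | "punctuation";

interface HandshakeMessage {
  type: "handshake";
  clientId: string;
  capabilities: Capability[];
  lastSeenSegmentId: string | null;
}

// Header carrying the short-lived joinToken (JWT).
function authHeader(joinToken: string): Record<string, string> {
  return { Authorization: `Bearer ${joinToken}` };
}

// Pass the last segmentId the UI recorded to request a replay after a reconnect.
function buildHandshake(
  clientId: string,
  lastSeenSegmentId: string | null = null,
  capabilities: Capability[] = ["partial", "final", "diarization", "punctuation"],
): HandshakeMessage {
  return { type: "handshake", clientId, capabilities, lastSeenSegmentId };
}

// First connection: nothing to resume.
const first = JSON.stringify(buildHandshake("ui-abc-123"));
// Reconnection: resume from a previously seen (hypothetical) segment id.
const resumed = JSON.stringify(buildHandshake("ui-abc-123", "seg_42"));
console.log(first);
console.log(resumed);
```

Because the token lifetime is only 5–15 minutes, a reconnect will usually need a fresh token as well as a handshake carrying `lastSeenSegmentId`.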
Partial transcript (server → client):

```jsonc
{
  "type": "partial_transcript",
  "segmentId": "string",      // unique, stable id for the segment
  "isFinal": false,
  "text": "string",           // interim, unfinalized text
  "speakerId": "string|null", // may be null until diarization completes
  "confidence": 0.0,          // optional
  "startTime": 1625.23,       // seconds from meeting start
  "endTime": 1625.50,         // current end estimate
  "timestamp": "2025-10-15T09:05:12Z"
}
```

Final transcript (server → client):

```jsonc
{
  "type": "final_transcript",
  "segmentId": "string",
  "isFinal": true,
  "text": "We will refactor the API and push changes by Friday.",
  "speakerId": "user_1",
  "confidence": 0.94,
  "startTime": 1625.23,
  "endTime": 1628.01,
  "metadata": { "punctuated": true, "model": "faster-whisper-v2" },
  "source": { "recording_position_ms": 1625230 } // optional mapping to the recording position
}
```

Speaker map (server → client):

```jsonc
{
  "type": "speaker_map",
  "mappings": [
    { "speakerId": "spk_1", "participantId": "p_12", "displayName": "Jane" },
    { "speakerId": "spk_2", "participantId": "p_4", "displayName": "Abdo" }
  ],
  "timestamp": "2025-10-15T09:06:00Z"
}
```

Acknowledgement of a client message (server → client):

```jsonc
{
  "type": "ack",
  "ackType": "annotation",
  "clientMsgId": "uuid",
  "serverId": "ann_123",
  "timestamp": "2025-10-15T09:07:12Z"
}
```

Error (server → client):

```jsonc
{
  "type": "error",
  "code": "MODEL_OVERLOAD",
  "message": "STT node overloaded; partials may be delayed",
  "retryAfterSeconds": 5
}
```

Annotation (client → server):

```jsonc
{
  "type": "annotation",
  "clientMsgId": "uuid",
  "annotationType": "keyMoment",
  "note": "Decision: freeze scope",
  "timestamp": "2025-10-15T09:10:02Z"
}
```

Speaker map request (client → server):

```jsonc
{
  "type": "requestSpeakerMap",
  "clientMsgId": "uuid",
  "hints": { "participantAudioRefs": ["p_1", "p_2"] }
}
```

Client behavior:

- A segment is complete when a `final_transcript` arrives for its `segmentId` with `isFinal: true`.
- `segmentId` is stable across partial → final updates. Use it to merge text in the UI.
- Order segments by `startTime`, breaking ties by server `timestamp`. Do not rely on arrival order.
- The server may map a `speakerId` to a participant later (via `speaker_map`). UIs should display a “Speaker X” placeholder until the mapping arrives.
- To resume after a disconnect, send `lastSeenSegmentId` in the handshake. The server replays the last N segments (configurable; usually covering the last 30–120 s).
- Under load, the server may send an `error` with code `MODEL_OVERLOAD`. Apply UI rate-limiting (drop or coalesce intermediate partials) to keep the UI responsive.
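The merge, ordering, and placeholder rules above could be implemented in a UI roughly as follows. This is a sketch under assumptions. `TranscriptStore`, `apply`, `render`, and `compareTimestamps` are hypothetical names, and the segment ids are invented. Rendering `Speaker <speakerId>` (or `Speaker ?` when `speakerId` is null) is just one way to show the “Speaker X” placeholder.

```typescript
// Hypothetical in-memory store: merges by segmentId, orders by startTime then timestamp.
interface TranscriptMessage {
  type: "partial_transcript" | "final_transcript";
  segmentId: string;
  isFinal: boolean;
  text: string;
  speakerId: string | null;
  startTime: number; // seconds from meeting start
  endTime: number;
  timestamp?: string; // the final_transcript example omits it, so treat as optional
}

interface SpeakerMapMessage {
  type: "speaker_map";
  mappings: { speakerId: string; participantId: string; displayName: string }[];
  timestamp: string;
}

// Assumes same-format UTC ISO 8601 strings, which sort correctly as plain strings.
function compareTimestamps(a = "", b = ""): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

class TranscriptStore {
  private segments = new Map<string, TranscriptMessage>();
  private speakers = new Map<string, string>(); // speakerId -> displayName

  apply(msg: TranscriptMessage | SpeakerMapMessage): void {
    if (msg.type === "speaker_map") {
      for (const m of msg.mappings) this.speakers.set(m.speakerId, m.displayName);
      return;
    }
    const existing = this.segments.get(msg.segmentId);
    // A late partial must not overwrite a segment that is already final.
    if (existing?.isFinal && !msg.isFinal) return;
    this.segments.set(msg.segmentId, msg); // stable segmentId: replace in place
  }

  // Timeline order, independent of arrival order.
  render(): string[] {
    const ordered = [...this.segments.values()].sort(
      (a, b) => a.startTime - b.startTime || compareTimestamps(a.timestamp, b.timestamp),
    );
    return ordered.map((s) => `${this.label(s.speakerId)}: ${s.text}`);
  }

  private label(speakerId: string | null): string {
    if (speakerId === null) return "Speaker ?";
    return this.speakers.get(speakerId) ?? `Speaker ${speakerId}`;
  }
}

// Usage: arrival order deliberately differs from timeline order.
const store = new TranscriptStore();
store.apply({ type: "partial_transcript", segmentId: "s2", isFinal: false, text: "push changes",
  speakerId: "spk_1", startTime: 1629.0, endTime: 1630.0, timestamp: "2025-10-15T09:05:20Z" });
store.apply({ type: "partial_transcript", segmentId: "s1", isFinal: false, text: "We will refac",
  speakerId: "spk_1", startTime: 1625.23, endTime: 1625.5, timestamp: "2025-10-15T09:05:12Z" });
store.apply({ type: "final_transcript", segmentId: "s1", isFinal: true, text: "We will refactor the API.",
  speakerId: "spk_1", startTime: 1625.23, endTime: 1628.01 });
// render() returns ["Speaker spk_1: We will refactor the API.", "Speaker spk_1: push changes"]
console.log(store.render());
store.apply({ type: "speaker_map", timestamp: "2025-10-15T09:06:00Z",
  mappings: [{ speakerId: "spk_1", participantId: "p_12", displayName: "Jane" }] });
// render() returns ["Jane: We will refactor the API.", "Jane: push changes"]
console.log(store.render());
```

Keying the map by `segmentId` makes a final replace its partials without any text diffing. The sort in `render()` keeps the output independent of network arrival order.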
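One way to coalesce intermediate partials is sketched below. It assumes the UI can also call a flush function from a timer so buffered partials are not stranded. `PartialCoalescer`, `offer`, `flushIfDue`, and `setIntervalMs` are hypothetical names. Widening the interval while `MODEL_OVERLOAD` is reported is an illustrative policy, not something the protocol prescribes.

```typescript
// Hypothetical UI-side rate limiter: newest partial per segmentId, released at most
// once per intervalMs. Finals are never delayed.
interface SegmentUpdate {
  segmentId: string;
  isFinal: boolean;
  text: string;
}

class PartialCoalescer {
  private pending = new Map<string, SegmentUpdate>();
  private lastFlushMs = -Infinity;

  constructor(private intervalMs: number) {}

  // Returns the updates the UI should render now; nowMs is injected for testability.
  offer(update: SegmentUpdate, nowMs: number): SegmentUpdate[] {
    if (update.isFinal) {
      // A final supersedes any buffered partial for the same segment.
      this.pending.delete(update.segmentId);
      return [update];
    }
    this.pending.set(update.segmentId, update); // drop the older partial, keep the newest
    return this.flushIfDue(nowMs);
  }

  // Also call this from a timer so the last buffered partial eventually renders.
  flushIfDue(nowMs: number): SegmentUpdate[] {
    if (nowMs - this.lastFlushMs < this.intervalMs) return [];
    this.lastFlushMs = nowMs;
    const out = [...this.pending.values()];
    this.pending.clear();
    return out;
  }

  // E.g. widen the interval while the server reports MODEL_OVERLOAD.
  setIntervalMs(ms: number): void {
    this.intervalMs = ms;
  }
}

// Usage with a 200 ms window.
const coalescer = new PartialCoalescer(200);
console.log(coalescer.offer({ segmentId: "s1", isFinal: false, text: "We" }, 0).length);       // 1 (first flush)
console.log(coalescer.offer({ segmentId: "s1", isFinal: false, text: "We will" }, 50).length); // 0 (buffered)
console.log(coalescer.offer({ segmentId: "s1", isFinal: false, text: "We will refactor" }, 250)[0].text);
```

The third call prints `We will refactor`, because the buffered `We will` was replaced before the window elapsed.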