1. Transcription

live

Developing
wss://transcribe.cognita.ai/v1/live
Live transcription provides near real-time partial & final transcript segments for an active meeting.

Use cases:

show live captions/subtitles in UI
power live summaries and highlights
enable “search while meeting” and live QA bots
feed AI agents or sidecars (note-taking bots)
Key design principles:
Partial-first: show partial text immediately; rely on final segments for persistence.
Ordered by time: use segmentId and timestamps to resolve ordering.
Resumable: reconnecting clients can request a replay of recent segments.
Lightweight control channels: send client annotations (key moments) or request speaker mapping.

Auth & tokens

joinToken is a short-lived JWT (recommended lifetime: 5–15 minutes) scoped to meeting:{meetingId} and carrying a transcribe capability.
For web clients, passing the token in the query string is supported; such tokens must be short-lived and sent only over TLS. If your client can set headers (Node, some libraries), prefer the Authorization: Bearer <token> header.
The server validates aud, exp, meetingId, and the token scope on connect.
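
The two ways of presenting the token can be sketched in Python as follows. This is a minimal illustration, not the official client: `build_connection` is a hypothetical helper, and the query parameter name `token` is an assumption, since this page does not name it. Only the endpoint URL and the Authorization: Bearer header come from the text above.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

LIVE_URL = "wss://transcribe.cognita.ai/v1/live"


def build_connection(join_token: str, use_header: bool = True,
                     query_param: str = "token") -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for opening the live transcription socket.

    query_param is an ASSUMED name; confirm it against the endpoint's
    query parameters before relying on it.
    """
    if use_header:
        # Preferred whenever the WebSocket library lets you set headers.
        return LIVE_URL, {"Authorization": f"Bearer {join_token}"}
    # Browser-style fallback: the token travels in the query string,
    # so it must be short-lived and the connection must use TLS (wss).
    return f"{LIVE_URL}?{urlencode({query_param: join_token})}", {}
```

The returned pair can be handed to whichever WebSocket library the client uses; how headers are passed differs between libraries.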

Handshake

After the WebSocket opens, send a handshake:
{
  "type": "handshake",
  "clientId": "ui-abc-123",
  "capabilities": ["partial","final","diarization","punctuation"],
  "lastSeenSegmentId": null  // set to resume from this segment
}
The server replies with a hello message that includes the negotiated features and the server time.
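
A sketch of building the handshake frame in Python, assuming the field names shown in the example above. `build_handshake` is a hypothetical helper name. The inline `//` comment in the sample is treated here as an annotation only, so the serialized frame is plain JSON.

```python
import json
from typing import Optional


def build_handshake(client_id: str,
                    last_seen_segment_id: Optional[str] = None) -> str:
    """Serialize the handshake frame sent right after the socket opens."""
    frame = {
        "type": "handshake",
        "clientId": client_id,
        "capabilities": ["partial", "final", "diarization", "punctuation"],
        # None serializes to JSON null, as in the sample for a first
        # connection; pass a segment id to ask for a resume.
        "lastSeenSegmentId": last_seen_segment_id,
    }
    return json.dumps(frame)
```

On reconnect, the same helper would be called with the id of the last segment the client processed.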

Message schemas

All messages are JSON. Below are the canonical schemas (consumer view).

Partial transcript (interim)

{
  "type": "partial_transcript",
  "segmentId": "string",        // unique id for the segment (stable)
  "isFinal": false,
  "text": "string",              // interim unfinalized text
  "speakerId": "string|null",   // may be null until diarization
  "confidence": 0.0,             // optional
  "startTime": 1625.23,         // seconds from meeting start
  "endTime": 1625.50,           // current end estimate
  "timestamp": "2025-10-15T09:05:12Z"
}

Final transcript (authoritative)

{
  "type": "final_transcript",
  "segmentId": "string",
  "isFinal": true,
  "text": "We will refactor the API and push changes by Friday.",
  "speakerId": "user_1",
  "confidence": 0.94,
  "startTime": 1625.23,
  "endTime": 1628.01,
  "metadata": { "punctuated": true, "model":"faster-whisper-v2" },
  "source": { "recording_position_ms": 162523 } // optional mapping
}
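
Because partial and final frames share a segmentId, a client can keep one entry per segment and let the final frame replace the interim text. The sketch below assumes frames have already been parsed into dictionaries. `apply_transcript` is a hypothetical helper, and ignoring a partial that arrives after its final is a design choice of this sketch rather than documented server behavior.

```python
from typing import Any, Dict


def apply_transcript(segments: Dict[str, Dict[str, Any]],
                     msg: Dict[str, Any]) -> bool:
    """Merge a partial_transcript or final_transcript frame into segments.

    Returns True when the frame is final and is safe to persist.
    """
    if msg.get("type") not in ("partial_transcript", "final_transcript"):
        return False
    seg_id = msg["segmentId"]
    current = segments.get(seg_id)
    if current is not None and current.get("isFinal") and not msg.get("isFinal"):
        # A late partial must not overwrite the authoritative final text.
        return False
    segments[seg_id] = msg
    return bool(msg.get("isFinal"))
```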

Speaker map / diarization update

{
  "type": "speaker_map",
  "mappings": [
    { "speakerId":"spk_1", "participantId":"p_12", "displayName":"Jane" },
    { "speakerId":"spk_2", "participantId":"p_4", "displayName":"Abdo" }
  ],
  "timestamp":"2025-10-15T09:06:00Z"
}
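
One way to consume a speaker_map frame is to fold it into a lookup table and fall back to a placeholder label until a mapping exists. The helper names and the exact placeholder strings below are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def update_speaker_names(names: Dict[str, str], msg: Dict[str, Any]) -> None:
    """Fold a speaker_map frame into a speakerId -> displayName lookup."""
    for mapping in msg.get("mappings", []):
        names[mapping["speakerId"]] = mapping["displayName"]


def display_name(names: Dict[str, str], speaker_id: Optional[str]) -> str:
    """Resolve a UI label, using a placeholder until a mapping arrives."""
    if speaker_id is None:
        # Diarization has not assigned a speaker to this segment yet.
        return "Unknown speaker"
    return names.get(speaker_id, f"Speaker {speaker_id}")
```

Since later speaker_map frames simply overwrite earlier entries, segments that are already on screen can be relabeled by resolving their speakerId again.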

Annotation confirmation (server ack)

{
  "type": "ack",
  "ackType": "annotation",
  "clientMsgId": "uuid",
  "serverId": "ann_123",
  "timestamp": "2025-10-15T09:07:12Z"
}

Error frame

{
  "type": "error",
  "code": "MODEL_OVERLOAD",
  "message": "STT node overloaded; partials may be delayed",
  "retryAfterSeconds": 5
}
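
This page does not state how a client should act on retryAfterSeconds. The sketch below simply treats it as a pause hint and falls back to a default when the field is absent. `retry_delay` and `DEFAULT_RETRY_SECONDS` are hypothetical names, and the one-second default is an assumption.

```python
from typing import Any, Dict

DEFAULT_RETRY_SECONDS = 1.0  # assumed fallback; not specified by the API


def retry_delay(error_frame: Dict[str, Any]) -> float:
    """Return how long to wait, honoring retryAfterSeconds when present."""
    if error_frame.get("type") != "error":
        raise ValueError("not an error frame")
    value = error_frame.get("retryAfterSeconds")
    if isinstance(value, (int, float)) and not isinstance(value, bool) and value >= 0:
        return float(value)
    # Missing or malformed hint: use the assumed default.
    return DEFAULT_RETRY_SECONDS
```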

Control messages

Send an annotation (for example, to mark a key moment):
{ "type": "annotation", "clientMsgId":"uuid","annotationType":"keyMoment","note":"Decision: freeze scope","timestamp":"2025-10-15T09:10:02Z" }
Request speaker mapping (ask the speech-to-text service to map speaker audio to participants):
{ "type":"requestSpeakerMap", "clientMsgId":"uuid", "hints": { "participantAudioRefs":["p_1","p_2"] } }
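
Control frames carry a clientMsgId that the server echoes back in its ack, so a client can match the two. The following sketch tracks pending annotations until they are acknowledged. `PendingControls` is a hypothetical class, and the field names are taken from the samples on this page.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class PendingControls:
    """Track annotation frames until the matching ack arrives."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[str, Any]] = {}

    def annotation(self, note: str, annotation_type: str = "keyMoment") -> str:
        """Build an annotation frame and remember it by clientMsgId."""
        frame = {
            "type": "annotation",
            "clientMsgId": str(uuid.uuid4()),
            "annotationType": annotation_type,
            "note": note,
            # UTC timestamp in the same format as the samples.
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        self._pending[frame["clientMsgId"]] = frame
        return json.dumps(frame)

    def on_ack(self, ack: Dict[str, Any]) -> Optional[str]:
        """Return the serverId if the ack matches a pending frame, else None."""
        if ack.get("type") != "ack":
            return None
        if self._pending.pop(ack.get("clientMsgId"), None) is None:
            return None  # unknown or already acknowledged
        return ack.get("serverId")
```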

Consumer-focused behavior and semantics

Partial messages are ephemeral. Show them as “live captioning”, but persist a segment only after receiving the same segmentId with isFinal: true.
Segment identity: segmentId is stable across partial → final updates. Use it to merge text in the UI.
Ordering: order by startTime; break ties by server timestamp. Do not rely on arrival order.
Speaker IDs: speakerId may be null initially; a later speaker_map message maps each speakerId to a participant. UIs should display a “Speaker X” placeholder until the mapping arrives.
Replay after reconnect: send lastSeenSegmentId in the handshake; the server replays the last N segments (configurable; usually the last 30–120 seconds).
Backpressure: the server may send an error frame with code MODEL_OVERLOAD. Apply UI rate-limiting (drop or coalesce intermediate partials) to maintain responsiveness.
Acks: clients can ack final segments to confirm delivery; the server retries events for clients that have not acked (optional).
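
The ordering and backpressure rules above can be sketched as two small functions. Both names are hypothetical. The tie-break compares timestamp strings directly, which assumes every frame uses the same UTC ISO 8601 format as the samples; frames without a timestamp sort first among ties.

```python
from typing import Any, Dict, Iterable, List


def coalesce_partials(frames: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the newest partial per segmentId from a batch of partials."""
    latest: Dict[str, Dict[str, Any]] = {}
    for frame in frames:
        # Later frames in the batch replace earlier ones for the same segment.
        latest[frame["segmentId"]] = frame
    return list(latest.values())


def in_display_order(frames: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort by startTime, breaking ties with the server timestamp."""
    return sorted(frames, key=lambda f: (f["startTime"], f.get("timestamp", "")))
```

A UI under load could buffer partial frames for a short interval, coalesce them, and render the result once per tick instead of once per frame.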

Modified at 2025-10-11 11:34:20