Using real-time transcription
This section is part of the AI Live Streaming Manual. It explains how event organizers can make the AI text conversion and translation(s) available in the form of a real-time scrolling transcript without delay for on-site participants (or remote participants in virtual events).
Transcript Delivery
Since participants need to get the stream without delay, WebRTC is used instead of HTTP Live Streaming (HLS) to deliver the real-time transcript to on-site participants. This is supported by all modern (mobile) browsers. By default, the AI Multilingual plan allows for 5 simultaneous real-time transcription viewers, but you can contact us to add more viewers to your plan.
The look and feel of the scrolling transcript can be customized by the user, via the cog wheel in the top-right corner of the transcript page.
Event Configuration
Follow the normal workflow to create an event with AI captions. In the AI wizard, make sure to choose 'SRT' as broadcast protocol (step 1) and check 'Provide real-time transcription for event attendees' (step 2).
We recommend creating an AI vocabulary, as this will (also) improve the quality of the real-time transcription.
The further configuration of your event depends on whether it is also used for live video streaming or not.
1. Real-time transcription only
If the sole purpose of the event is to provide real-time transcription(s), the video stream plays no part. You will want to minimise the delay of the transcription(s) as much as possible.
Management
After the AI wizard finishes, you are directed to the 'Management' tab of the event. On this tab, scroll down and adjust the following settings:
- Resolutions: remove all resolutions except for 240p. Since you are not streaming video, transcoding to multiple video resolutions is pointless.
- Latency: select 'Low Latency' to avoid additional delay when generating the transcript text(s).
Broadcast
Currently, Clevercast still expects an incoming video and audio stream. But since the video isn't watched by anyone, you can simply broadcast a black screen.
Go to the 'Broadcast' tab of the event to configure your encoder. The SRT protocol should be selected (after you chose it in the AI wizard). Please note that using RTMP will introduce extra second(s) of delay.
To minimise delay, set your encoder to the minimum latency of 200 milliseconds for a broadcast from Europe (if you use OBS Studio, you can copy the SRT caller URL on the 'Broadcast' tab of the event page).
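As a minimal sketch of the latency setting described above, the snippet below sets a `latency` query parameter on an SRT caller URL. The host, port and stream ID are hypothetical placeholders; use the URL shown on the 'Broadcast' tab instead. It also assumes the encoder reads `latency` in microseconds, as FFmpeg's libsrt options do. Other encoders may expect milliseconds, so check your encoder's documentation before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_srt_latency(srt_url: str, latency_ms: int, unit_per_ms: int = 1000) -> str:
    """Return srt_url with its `latency` query parameter set.

    unit_per_ms=1000 assumes the encoder expects microseconds;
    pass unit_per_ms=1 for an encoder that expects milliseconds.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    parts = urlsplit(srt_url)
    # Keep existing parameters (mode, streamid, ...) and overwrite only latency.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["latency"] = str(latency_ms * unit_per_ms)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical caller URL; 200 ms is the minimum suggested for Europe.
url = with_srt_latency("srt://ingest.example.com:9000?mode=caller&streamid=abc", 200)
print(url)
```

Existing query parameters are preserved, so the helper can be applied to a URL that already carries a different latency value.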
Note: we hope to add support for sending audio-only through the browser soon, eliminating the need for an encoder if you only need real-time transcription.
2. Combination with live streaming
Management
On the 'Management' tab of the event, you can choose between 'Default Latency' and 'Low Latency'. If you are also live streaming, we recommend selecting Default Latency, as this will improve the quality of the closed captions.
Notes:
- Combining live streaming with real-time transcription may, to a limited extent, negatively affect the quality of the closed captions in the live stream. So if you don't have on-site users, we recommend leaving the 'Provide real-time transcription for event attendees' setting turned off.
- Because the transcripts are generated in real time, translated transcripts will be less accurate than translated closed captions in the live stream: the delay of the live stream allows more context to be provided to the AI models.
- Real-time transcripts are also supported for events with human correctors. However, corrections only apply to the closed captions and speech translations in the live stream, not to the real-time transcripts.
Broadcast
If you combine real-time transcription with live streaming for online viewers, the same broadcast settings apply to both. Test beforehand to determine what SRT latency to set on your encoder (as low as possible for the transcript, but sufficiently high to guarantee a stable live stream). For a broadcast from Europe, this will probably be between 400 and 800 milliseconds. For a broadcast from the US, 1 second is more appropriate.
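The testing procedure above can be sketched as a small helper: list candidate SRT latencies for a scenario, then keep the lowest one that proved stable. The millisecond values come from this manual; the region keys, the 100 ms step and the `is_stable` callback (which stands in for your own manual test broadcast) are assumptions made for illustration only.

```python
from typing import Callable, Optional

# (region, combined_with_live_streaming) -> (lowest, highest) latency in ms.
# Only the scenarios mentioned in this manual are listed.
STARTING_RANGES_MS = {
    ("europe", False): (200, 200),   # real-time transcription only
    ("europe", True): (400, 800),    # combined with live streaming
    ("us", True): (1000, 1000),      # combined with live streaming
}


def candidate_latencies(region: str, live_streaming: bool, step_ms: int = 100) -> list[int]:
    """Return the latencies to try, from lowest to highest."""
    low, high = STARTING_RANGES_MS[(region, live_streaming)]
    return list(range(low, high + 1, step_ms))


def lowest_stable(candidates: list[int], is_stable: Callable[[int], bool]) -> Optional[int]:
    """Return the first (lowest) latency for which the test stream was stable."""
    for latency_ms in candidates:
        if is_stable(latency_ms):
            return latency_ms
    return None  # nothing in the range was stable; test higher values


candidates = candidate_latencies("europe", True)
# Pretend the test broadcasts were only stable from 600 ms upwards.
chosen = lowest_stable(candidates, lambda ms: ms >= 600)
print(candidates, chosen)
```

Trying candidates in ascending order reflects the trade-off in the text: the transcript benefits from the lowest value, while the live stream sets the lower bound.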
Usage
On the 'Caption Languages' tab of the event, the real-time transcript(s) for each caption language are available. If you don't see the 'Real-time Transcription' panel, check the 'Provide real-time transcription for event attendees' setting at the top of the page (it should be set to 'Yes').
Distribute the transcript links to your on-site production team and/or event participants.
There are 2 kinds of transcription links:
- On-site participant links: intended for distribution to event participants, so they can read the transcription on mobile devices. The scrolling text is only displayed when the event status is 'Started'.
- On-site production links: intended for the event organizer to display the scrolling text on-site, for example on large screens. The scrolling text is also displayed when the event status is 'Preview' or 'Paused'.
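The difference between the two link types can be summarised as a lookup from link type to the event statuses in which the scrolling text is shown. This is only an illustration of the rule stated above, not Clevercast code; the enum and function names are hypothetical, and the 'Ended' status in the usage lines is merely an example of a status outside the list.

```python
from enum import Enum


class LinkKind(Enum):
    PARTICIPANT = "participant"  # on-site participant link
    PRODUCTION = "production"    # on-site production link


# Event statuses in which each link type displays the scrolling text.
VISIBLE_STATUSES = {
    LinkKind.PARTICIPANT: {"Started"},
    LinkKind.PRODUCTION: {"Started", "Preview", "Paused"},
}


def transcript_visible(kind: LinkKind, event_status: str) -> bool:
    """Return True if a link of this kind shows text for the given status."""
    return event_status in VISIBLE_STATUSES[kind]


print(transcript_visible(LinkKind.PARTICIPANT, "Preview"))
print(transcript_visible(LinkKind.PRODUCTION, "Preview"))
print(transcript_visible(LinkKind.PRODUCTION, "Ended"))
```

In practice this means production links can be used to check the transcript on the large screens while the event is still in preview, before participants see anything.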
Follow the normal workflow to test, start and stop your live stream, including the real-time transcription.