Real-Time Live Stream Audio Transcription with Deepgram API

Learn how to transcribe live stream audio in real time using the Deepgram API and Google Colab with this step-by-step tutorial.

October 18, 2024 at 11:34

Transcribing Live Stream Audio in Real Time with Deepgram API

Transcribing live stream audio in real time can be challenging, but the Deepgram API makes it straightforward. This tutorial walks through the process step by step, using the Deepgram API and Google Colab.

Prerequisites

Before we begin, make sure you have the following prerequisites:

  • Familiarity with Google Colab or similar notebook environments (Jupyter Notebooks or VS Code)
  • Deepgram API key

Step-by-Step Instructions

To get started, follow these steps:

  1. Open the notebook linked in the description and make your own copy of it.
  2. Run the first cell, which installs dependencies using pip. Note that you may need to use pip3 instead of pip depending on your setup.
  3. Run the second cell, which has a few variables to fill in:
    • Deepgram API key: The only required variable, which can be obtained by creating an account with Deepgram.
    • Optional variables: Leave these blank, as they are not essential for the transcription process.
  4. Fill in the Deepgram API key and run the cell again.

What to Expect

Once you have completed the above steps, the notebook starts transcribing the live audio feed in real time. The background music of the walkthrough is Frédéric Chopin's Fantaisie-Impromptu, which runs about five minutes, and you can expect to see transcription results well within that time frame.

Notes

  • This tutorial is designed to be quick and efficient, hence the fast-paced background music.
  • You can pause or stop the tutorial at any time to explore the notebook further.
  • The general instructions should be applicable to other notebook environments, such as Jupyter Notebooks or VS Code.

Using the Deepgram API and Transcription

To use the Deepgram API, sign up with your email to get an API key; the account comes with 12,000 minutes of free transcription, and no credit card information is required. Once you have the API key, you can plug it into the cell, run it immediately, and stream audio from a URL.
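As a rough sketch of what plugging in the API key and streaming audio from a URL involves, the snippet below reads a byte stream in fixed-size chunks and builds an authorization header. The helper names (`iter_audio_chunks`, `auth_headers`), the chunk size, and the placeholder key are illustrative assumptions rather than the notebook's actual code, and the `Token` authorization scheme should be checked against the Deepgram documentation.

```python
import io


def iter_audio_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a file-like audio stream until it ends."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        yield chunk


def auth_headers(api_key):
    """Build the header carrying the API key (assumed 'Token' scheme)."""
    if not api_key:
        raise ValueError("A Deepgram API key is required")
    return {"Authorization": f"Token {api_key}"}


# An in-memory buffer stands in for a live stream so the example runs offline.
fake_stream = io.BytesIO(b"\x00" * 10000)
chunks = list(iter_audio_chunks(fake_stream))
headers = auth_headers("YOUR_DEEPGRAM_API_KEY")
```

With a real stream, the same generator could wrap the response object returned by `urllib.request.urlopen(URL)`, with each chunk forwarded to Deepgram over a WebSocket connection.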

Variables

The main variables to consider are:

  • URL: This variable should be set to the URL of the audio stream you wish to transcribe. By default, the code streams from BBC Radio.
  • Params: This variable holds the parameters that configure your Deepgram model. The starter code provides defaults that should not need to be modified for this demo, but you can change them to suit your needs; refer to the Deepgram documentation for more information. The starter code sets:
    • Punctuation: Set to true, so the transcript is punctuated with periods, commas, and so on.
    • Numerals: Set to true, so numbers are written as digits instead of words.
    • Language: Set to English, because the audio stream is in English. The API also supports other languages.
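The parameters above might be turned into a request URL along these lines. This is a minimal sketch, not the notebook's code: the helper `build_listen_url` is hypothetical, and the endpoint and the query names `punctuate`, `numerals`, and `language` are assumptions based on Deepgram's public API that should be confirmed in the documentation.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; confirm against the Deepgram documentation.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(params, base_url=DEEPGRAM_LISTEN_URL):
    """Append the params dict to the base URL as a query string."""
    query = {}
    for name, value in params.items():
        # Booleans are written as lowercase "true"/"false" in the query string.
        if isinstance(value, bool):
            query[name] = "true" if value else "false"
        else:
            query[name] = str(value)
    return f"{base_url}?{urlencode(query)}"


# Starter-style parameters: punctuation on, numerals on, English audio.
params = {"punctuate": True, "numerals": True, "language": "en"}
listen_url = build_listen_url(params)
```

Keeping the parameters in a plain dictionary means switching the language or turning a feature off is a one-line change.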

Deepgram Parameters and Configuration

Deepgram offers multiple languages and models for different types of audio streams, including meetings, phone calls, voicemails, video streams, and conversational AI. For more information, refer to the Deepgram documentation.

The pre-written parameters for this demo are:

  • time_limit: an integer giving the number of seconds to transcribe for.
  • transcription_only: a Boolean. Set it to True to get the transcribed text only, or False to get the full JSON responses, including metadata, word-level timestamps, and confidence measurements.
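One way these two settings could be applied to incoming messages is sketched below. The helpers `format_result` and `time_is_up` are hypothetical, and the sample message only assumes the general shape of a Deepgram streaming response (a `channel` object holding a list of `alternatives`); the real payload carries more fields.

```python
import json
import time


def format_result(message, transcription_only=True):
    """Return just the transcript text, or the full parsed JSON response."""
    data = json.loads(message)
    if not transcription_only:
        return data
    alternatives = data.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    # An empty transcript (for example, silence) is reported as None.
    return alternatives[0].get("transcript") or None


def time_is_up(start, time_limit, now=None):
    """True once time_limit seconds have elapsed since start."""
    now = time.monotonic() if now is None else now
    return now - start >= time_limit


# Hand-written sample message, not real API output.
sample_message = json.dumps({
    "channel": {
        "alternatives": [
            {"transcript": "Hello, world.", "confidence": 0.98, "words": []}
        ]
    },
    "is_final": True,
})

text_only = format_result(sample_message)
full_json = format_result(sample_message, transcription_only=False)
```

A receive loop would typically record `start = time.monotonic()` once and stop reading messages as soon as `time_is_up(start, time_limit)` returns True.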

Demo Configuration

For this demo, we will create subtitles for a BBC Radio show. There are two latencies to consider:

  • Radio-to-speaker latency
  • Radio-to-AI latency

These latencies are independent of each other. As of today, the radio-to-speaker latency is larger than the radio-to-AI latency.

Example Subtitles

The resulting subtitles look like this: "A very short time and pulled me up, so I didn't go very far from that moment on, they are bound..."
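If the transcripts are to be saved as a subtitle file rather than printed, each result can be formatted as a timed cue. The sketch below uses the SubRip (SRT) layout and assumes each result provides a start time and a duration in seconds; the function names and the sample values are illustrative, and the notebook itself may simply print the text.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, start, duration, text):
    """Build one numbered SRT cue from a start time, a duration, and text."""
    end = start + duration
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


# Made-up timing values paired with a fragment of the example subtitle.
cue = srt_cue(1, 0.0, 2.5, "A very short time and pulled me up,")
```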

Deepgram Live Transcription Feature

Deepgram's real-time live stream transcription feature lends itself to a wide range of projects. The subtitles from the demo look good, and output like this can be used in various ways, such as:

  • Live conversation with a child
  • Real-time translation
  • Wearing live subtitles on the chest (some users have done this before)

Users have used Deepgram to:

  • Control a small car with voice commands
  • Create a Disney princess dress that lights up different colors based on the song being sung
  • Transcribe pre-recorded audio

Deepgram's language models offer additional features, including:

  • Summarizing long audio recordings
  • Diarizing audio with multiple speakers
  • Filtering profanity
  • And much more

A notebook is available for Deepgram's pre-recorded audio transcription feature. Deepgram also provides SDKs for Node, Python, Go, and more, as well as a quick and easy-to-use API with documentation written by humans for humans.