How to Get Nicely Formatted Human-Readable Transcripts for Your Podcast Using the Deepgram API

Learn how to generate nicely formatted, human-readable transcripts for your podcast using the Deepgram API and its Python SDK. This tutorial covers setting up the environment, installing dependencies, and transcribing audio files to get a readable transcript with paragraphs, speaker detection, and punctuation.

October 18, 2024 at 11:31

Getting Nicely Formatted Human-Readable Transcripts for Your Podcast

In this tutorial, we will explore how to get nicely formatted human-readable transcripts for our podcast using the Deepgram API. We will be guided through the process by Kevin Lewis, a developer advocate at Deepgram.

Setting Up the Environment

To begin, we need to set up our environment. Kevin shows three ways to get a transcript: from a direct link to a media file, from the latest episode in a podcast's RSS feed, and from a media file on your computer. This tutorial uses Python.

Prerequisites

Before we start, Kevin has created and activated a virtual environment called virtual_env. He has also created a .env file with one property, DEEPGRAM_API_KEY, which includes a free Deepgram API key available at console.deepgram.com. Finally, he has downloaded an episode of his favorite podcast.

Installing Dependencies

To proceed, we need to install the following dependencies:

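  • Deepgram SDK (deepgram-sdk)
  • asyncio (part of the Python standard library, so there is nothing to install)
  • python-dotenv package (to load the .env file)
  • feedparser package (only needed if transcribing from an RSS feed)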

Importing Dependencies

Next, we need to import all dependencies:

  • Deepgram SDK
  • asyncio
  • python-dotenv package
  • feedparser package (if needed)
  • Native os module

Setting Up the Main Function

Now, we create a main function to contain the project's logic. We initialize the Deepgram SDK:

deepgram = Deepgram(deepgram_api_key)
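
A minimal sketch of this setup, assuming the 2.x release of the Deepgram Python SDK (which exposes a `Deepgram` class) and assuming the `.env` file has already been loaded into the process environment, for example by python-dotenv. The helper name `read_api_key` is hypothetical and not part of the SDK.

```python
import os


def read_api_key(env=None):
    """Return the Deepgram API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return key


async def main():
    # Imported inside the function so the helper above works without the SDK installed.
    from deepgram import Deepgram  # assumption: deepgram-sdk 2.x

    deepgram = Deepgram(read_api_key())
    # The transcription logic from the following sections goes here.
    return deepgram
```

To run the project, call `asyncio.run(main())` at the bottom of the script after importing `asyncio`.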

Transcribing a Direct URL

To transcribe a direct URL, we follow these steps:

  • Store the URL in a variable.
  • Create a source object with one property, url, whose value is the URL string.
  • Set up transcription options with Deepgram.
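
The first two steps can be sketched as follows. The single-key dictionary follows the description above; the helper name `build_url_source` and the example URL are hypothetical and used only for illustration.

```python
def build_url_source(url: str) -> dict:
    """Wrap the URL of a hosted media file in a source dictionary with one `url` key."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL to a hosted media file")
    return {"url": url}


# Placeholder URL; substitute the direct link to your podcast episode.
source = build_url_source("https://example.com/podcast/episode.mp3")
```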

Setting Up Transcription Options

For the first run, we leave the transcription options at their defaults:

  • Start transcription with the default settings.

Getting a Transcript for the Latest Episode from RSS Feed URL

To get a transcript for the latest episode from the RSS feed URL, we follow these steps:

  • Store the RSS feed URL in a variable.
  • Use feedparser to parse the RSS feed.
  • Extract the URL of the latest episode from the parsed feed.
  • Use the steps above to transcribe the latest episode.

Getting a Transcript from a Media File on Your Computer

To get a transcript from a media file on your computer, we follow these steps:

  • Store the local media file path in a variable.
  • Open the media file in binary read mode.
  • Create a source object with two properties: buffer, holding the opened file, and mimetype (e.g. audio/mp3).
  • Use the steps above to transcribe the local media file.

Using Deepgram for Transcription and Post-Processing

Setting Up the Transcription

We use the Deepgram API to transcribe audio or video files. We pass the source to Deepgram's pre-recorded transcription request along with the transcription options object, await the response, and store it in a variable (e.g., response). We then print the response to see the output.
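
As a rough sketch of this step, assuming `client` is a `Deepgram(api_key)` instance from deepgram-sdk 2.x, where `client.transcription.prerecorded(source, options)` is awaitable. The function names `transcribe` and `transcribe_and_print` are hypothetical wrappers, not SDK functions.

```python
import json
from typing import Optional


async def transcribe(client, source: dict, options: Optional[dict] = None) -> dict:
    """Send a source to the pre-recorded transcription request and return the response."""
    # An empty options dictionary means Deepgram's default settings are used.
    response = await client.transcription.prerecorded(source, options or {})
    return response


async def transcribe_and_print(client, source: dict) -> dict:
    """Transcribe with default settings and pretty-print the raw JSON response."""
    response = await transcribe(client, source)
    print(json.dumps(response, indent=2))
    return response
```

Taking the client as a parameter keeps the sketch independent of how the API key is loaded, and makes it easy to try with a stand-in object before spending API credit.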

Understanding the Output

The output is a huge JSON object containing:

  • Words spoken
  • Metadata (confidence, punctuation, start and end times)

This is super useful for computers and applications, but not human-readable.

Enhancing the Transcription with Additional Features

We enable:

  • punctuate to add punctuation to the transcript
  • diarize (speaker detection) to identify speakers
  • paragraphs to break down the transcript into paragraphs
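
The three features above can be expressed as an options dictionary. The key names come from the list; passing them as a plain dictionary in the second argument of the transcription request is an assumption based on the 2.x Python SDK.

```python
# Feature flags named in the list above, all switched on.
options = {
    "punctuate": True,   # add punctuation and capitalization
    "diarize": True,     # label which speaker said what
    "paragraphs": True,  # group sentences into paragraphs
}
```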

Creating a Pretty Transcript

We extract the value from the response variable (e.g., response). We access the results object, then the first item in its channels array, then the first item in that channel's alternatives array. From that alternative we take the paragraphs object and print its transcript value, which holds the formatted transcript.
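
The lookup path described above can be sketched like this. The function name `pretty_transcript` is hypothetical, and `sample_response` is a heavily trimmed, made-up response that only illustrates the nesting; a real response contains far more fields.

```python
def pretty_transcript(response: dict) -> str:
    """Return the paragraph-formatted transcript from a Deepgram response."""
    # results -> first channel -> first alternative -> paragraphs -> transcript
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return alternative["paragraphs"]["transcript"]


# Made-up example data, not real API output.
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Hello. Hi there.",
                        "paragraphs": {
                            "transcript": "\nSpeaker 0: Hello.\n\nSpeaker 1: Hi there."
                        },
                    }
                ]
            }
        ]
    }
}

print(pretty_transcript(sample_response))
```

The paragraphs object is only present when the paragraphs option is enabled, so this lookup raises a `KeyError` on a response produced with default settings.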

Running the Code

We rerun the code with the additional features enabled. We see the output, which should be a nicely formatted transcript with paragraphs, speaker detection, and punctuation.

Output Example

The output should resemble a human-readable transcript with:

  • Correct punctuation
  • Speaker detection (if enabled)
  • Paragraph breaks (if enabled)
  • Easy to read and understand

Obtaining RSS Feed URL

Every podcast has an RSS feed that accompanies it. We can use the feedparser package to read that feed and find the media URL of an episode. Here's an example of how to use feedparser to get the latest episode and transcribe it:

  • Create a new variable called rss and use feedparser to parse the feed URL (e.g. the feed of an NPR podcast)
  • Get the latest episode by accessing the entries array and grabbing the first entry
  • Get the URL of the latest episode using entries[0].enclosures[0].href
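
A short sketch of these steps, assuming the feed lists its newest episode first, as podcast feeds usually do. `feedparser.parse` is the real feedparser entry point; the function names `latest_episode_url` and `fetch_latest_episode_url` are hypothetical.

```python
def latest_episode_url(feed) -> str:
    """Return the media URL of the first entry in a feed parsed by feedparser."""
    if not feed.entries:
        raise ValueError("the feed has no entries")
    # Same path as in the steps above: entries[0].enclosures[0].href
    return feed.entries[0].enclosures[0].href


def fetch_latest_episode_url(rss_url: str) -> str:
    """Download and parse the feed, then return the latest episode's media URL."""
    import feedparser  # third-party package; imported here so the helper above has no dependency

    rss = feedparser.parse(rss_url)
    return latest_episode_url(rss)
```

The returned URL can then be wrapped in a source object with a single url property, exactly as for a direct link.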

Transcribing a Podcast

To transcribe a podcast, we can use the Deepgram API to transcribe the audio file directly from our computer, without having to upload it to a server first. To do this, we'll need to:

  • Open the audio file in binary read mode as audio
  • Create a new source object with a buffer (the opened file) and a mimetype of audio/mp3
  • Send the source object to the Deepgram API

The transcript is returned pretty quickly, and we can save it to a file in the desired format (e.g. with paragraphs and speaker labels).
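
The local-file flow might look like the sketch below. It assumes `client` is a `Deepgram` instance from deepgram-sdk 2.x and that the source dictionary uses the keys `buffer` and `mimetype`; the function names `transcribe_local_file` and `save_transcript` are hypothetical.

```python
from pathlib import Path


async def transcribe_local_file(client, path: str, options: dict,
                                mimetype: str = "audio/mp3") -> dict:
    """Open a local media file and send it to the pre-recorded transcription request."""
    with open(path, "rb") as audio:
        source = {"buffer": audio, "mimetype": mimetype}
        # Await inside the `with` block so the file stays open while it is sent.
        return await client.transcription.prerecorded(source, options)


def save_transcript(text: str, out_path: str) -> None:
    """Write the formatted transcript to a UTF-8 text file."""
    Path(out_path).write_text(text, encoding="utf-8")
```

With the paragraphs option enabled, the text passed to `save_transcript` would be the transcript value taken from the paragraphs object of the response.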

Methods for Transcribing a Podcast

We can transcribe a podcast using three methods:

  • Method 1: Providing the URL Directly
    • Provide the URL of a hosted media file directly to the Deepgram API
    • This method allows us to transcribe any episode we have a direct media link for
  • Method 2: Using an RSS Feed
    • Parse the podcast's RSS feed with feedparser and take the media URL of the latest episode
    • Pass that URL to the Deepgram API, as in Method 1
  • Method 3: Uploading a Local Media File
    • Send a local media file (e.g. MP3) to the Deepgram API
    • The API will then transcribe the audio file and return the transcript to us

Using the Deepgram API

The Deepgram API allows us to transcribe audio files quickly and easily. We can use the API to transcribe podcasts, voice recordings, and other types of audio files. The API returns the transcript as JSON, with options for adding punctuation, speaker labels, and paragraphs; when paragraphs are enabled, the paragraphs object contains the formatted text.

Conclusion

Transcribing a podcast using the Deepgram API is a simple and efficient process. We can transcribe podcasts in different ways, including providing the URL directly, reading the latest episode from an RSS feed, and uploading a local media file. With punctuation, speaker detection, and paragraphs enabled, the response includes a formatted transcript that is easy to read and to use in our own applications.