Getting Nicely Formatted Human-Readable Transcripts for Your Podcast
In this tutorial, we will explore how to get nicely formatted human-readable transcripts for our podcast using the Deepgram API. We will be guided through the process by Kevin Lewis, a developer advocate at Deepgram.
Setting Up the Environment
To begin, we need to set up our environment. Kevin provides a link to the direct media file and shows how to get a transcript for the latest episode from the RSS feed URL. Additionally, he demonstrates how to get a transcript from a media file on your computer. Python is used in today's tutorial.
Prerequisites
Before we start, Kevin has created and activated a virtual environment called virtual_env. He has also created a .env file with one property, DEEPGRAM_API_KEY, which holds a free Deepgram API key available at console.deepgram.com. Finally, he has downloaded an episode of his favorite podcast.
Installing Dependencies
To proceed, we need to install the following dependencies:
- Deepgram SDK
- Async IO (asyncio ships with Python's standard library, so it needs no separate install)
- Python mediafile package
- feedparser package (only needed if transcribing from RSS feed)
Importing Dependencies
Next, we need to import all dependencies:
- Deepgram SDK
- Async IO
- Python mediafile package
- feedparser package (if needed)
- Native OS package
Setting Up the Main Function
Now, we create a main function to contain the project's logic. We initialize the Deepgram SDK:
deepgram = Deepgram(deepgram_api_key)
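As a rough sketch, the main function and the SDK initialization might look like the following. It assumes the v2-style Deepgram Python SDK (the one that exposes a Deepgram class) and assumes that python-dotenv's load_dotenv is what loads the .env file; the text does not name the loader. The helper get_api_key is a hypothetical name introduced here for illustration.

```python
import asyncio
import os


def get_api_key(env=os.environ):
    """Return the Deepgram API key from the environment, or raise if missing."""
    key = env.get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return key


async def main():
    # Assumption: third-party packages are imported here, inside main(),
    # so the module can be loaded even where they are not installed.
    from dotenv import load_dotenv  # python-dotenv (assumed .env loader)
    from deepgram import Deepgram   # Deepgram Python SDK, v2-style client

    load_dotenv()  # copies the entries of .env into os.environ
    deepgram = Deepgram(get_api_key())
    # ... build a source, set options, and request the transcript here ...
    return deepgram


# To run the script, call: asyncio.run(main())
```

Keeping the key lookup in its own function makes a missing key fail early with a clear message instead of failing later inside the API call.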
Transcribing a Direct URL
To transcribe a direct URL, we follow these steps:
- Store the URL in a variable.
- Create a source object with one property, url, with the value of the URL string.
- Set up transcription options with Deepgram.
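The steps above can be sketched as a small helper. The function name build_url_source and the example.com address are hypothetical; the only part taken from the tutorial is the shape of the source object, a dictionary with a single url property.

```python
def build_url_source(url):
    """Build the source object for a media file that is reachable by URL."""
    # Deepgram fetches the file itself, so the URL must be a web address.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not an http(s) URL: {url!r}")
    return {"url": url}


# Placeholder address used only for illustration.
source = build_url_source("https://example.com/episode.mp3")
print(source)
```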
Setting Up Transcription Options
We set up the transcription options:
- Start transcription with the default settings.
Getting a Transcript for the Latest Episode from RSS Feed URL
To get a transcript for the latest episode from the RSS feed URL, we follow these steps:
- Store the RSS feed URL in a variable.
- Use feedparser to parse the RSS feed.
- Extract the URL of the latest episode from the parsed feed.
- Use the steps above to transcribe the latest episode.
Getting a Transcript from a Media File on Your Computer
To get a transcript from a media file on your computer, we follow these steps:
- Store the local media file path in a variable.
- Use mediafile to read the media file.
- Create a source object with a buffer (the opened file) and a mimetype (e.g. audio/mp3).
- Use the steps above to transcribe the local media file.
Using Deepgram for Transcription and Post-Processing
Setting Up the Transcription
We use the Deepgram API to transcribe audio or video files. We pass in the pre-recorded audio source and the transcription options object. We await the response and store it in a variable (e.g., response). We print the response to see the output.
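A minimal sketch of the request, assuming the v2-style Deepgram Python SDK, whose client exposes an awaitable transcription.prerecorded(source, options) method. The wrapper name transcribe is hypothetical, and the client is passed in as a parameter so the function can be tried with any object that has the same shape.

```python
import asyncio


async def transcribe(client, source, options=None):
    """Request a transcript for a pre-recorded source and return the response.

    client is assumed to be a v2-style Deepgram SDK client, i.e. an object
    with an awaitable client.transcription.prerecorded(source, options).
    """
    response = await client.transcription.prerecorded(source, options or {})
    return response


# Inside an async main() this would be used roughly as:
#     response = await transcribe(deepgram, source, options)
#     print(response)
```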
Understanding the Output
The output is a huge JSON object containing:
- Words spoken
- Metadata (confidence, punctuation, start and end times)
This is very useful for computers and applications, but it is not human-readable.
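To get a feel for the raw output, the sketch below walks a trimmed response and lists each word with its timing and confidence. The sample data is made up, and the nesting (results, channels, alternatives, words) is assumed to follow Deepgram's pre-recorded response layout; list_words is a hypothetical helper name.

```python
def list_words(response):
    """Return (word, start, end, confidence) tuples from the first alternative."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return [
        (w["word"], w["start"], w["end"], w["confidence"])
        for w in alternative["words"]
    ]


# Made-up, heavily trimmed response used only for illustration.
sample_response = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "hello world",
        "words": [
            {"word": "hello", "start": 0.0, "end": 0.4, "confidence": 0.99},
            {"word": "world", "start": 0.5, "end": 0.9, "confidence": 0.97},
        ],
    }]}]}
}

for word, start, end, confidence in list_words(sample_response):
    print(f"{start:5.2f}-{end:5.2f}  {word}  ({confidence:.2f})")
```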
Enhancing the Transcription with Additional Features
We enable:
- punctuate to add punctuation to the transcript
- diarize (speaker detection) to identify speakers
- paragraphs to break down the transcript into paragraphs
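In code, these features are typically switched on through the options dictionary sent with the request. The sketch below uses a hypothetical build_options helper; the three option names are the ones listed above.

```python
def build_options(punctuate=True, diarize=True, paragraphs=True):
    """Build the transcription options, including only the enabled features."""
    flags = {
        "punctuate": punctuate,    # add punctuation and capitalization
        "diarize": diarize,        # label which speaker said what
        "paragraphs": paragraphs,  # split the transcript into paragraphs
    }
    return {name: True for name, enabled in flags.items() if enabled}


options = build_options()
print(options)
```

Leaving a disabled feature out of the dictionary, rather than sending it as False, keeps the request identical to the default settings for that feature.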
Creating a Pretty Transcript
We extract the value from the response variable (e.g., response). We access the results object and then the channels array. We take the first channel and then the first item in its alternatives array. From that alternative, we take the paragraphs object and print its transcript value.
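The lookup described above can be sketched as follows. The sample response is made up and trimmed to the fields that matter, and pretty_transcript is a hypothetical helper name; the paragraphs object is only expected to be present when the paragraphs option is enabled.

```python
def pretty_transcript(response):
    """Return the formatted transcript from the first channel's first alternative."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return alternative["paragraphs"]["transcript"]


# Made-up, trimmed response used only for illustration.
sample_response = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "Hello there. Hi.",
        "paragraphs": {
            "transcript": "Speaker 0: Hello there.\n\nSpeaker 1: Hi.",
        },
    }]}]}
}

print(pretty_transcript(sample_response))
```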
Running the Code
We rerun the code with the additional features enabled. We see the output, which should be a nicely formatted transcript with paragraphs, speaker detection, and punctuation.
Output Example
The output should resemble a human-readable transcript with:
- Correct punctuation
- Speaker detection (if enabled)
- Paragraph breaks (if enabled)
- Easy to read and understand
Obtaining RSS Feed URL
Every podcast has an RSS feed that accompanies it. We can use feedparser to parse the feed and find the media URL of an episode. Here's an example of how to use feedparser to get the latest episode and transcribe it:
- Create a new variable called rss and use feedparser to parse a feed URL (e.g. an NPR podcast feed)
- Get the latest episode by accessing the entries array and grabbing the first entry
- Get the URL of the latest episode using entries[0].enclosures[0].href
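A minimal sketch of these steps is shown below. It assumes the feed lists its newest episode first, which is common but not guaranteed, and it imports feedparser lazily so that the lookup logic can be tried on a plain dictionary. Both function names are hypothetical.

```python
def latest_episode_url(feed):
    """Return the media URL of the first entry in a parsed feed."""
    entries = feed["entries"]
    if not entries:
        raise ValueError("feed has no entries")
    enclosures = entries[0].get("enclosures", [])
    if not enclosures:
        raise ValueError("latest entry has no enclosure")
    # Same lookup as entries[0].enclosures[0].href, using key access.
    return enclosures[0]["href"]


def fetch_latest_episode_url(rss_url):
    """Parse the feed at rss_url and return the latest episode's media URL."""
    import feedparser  # third-party; imported lazily (assumed installed)

    rss = feedparser.parse(rss_url)
    return latest_episode_url(rss)
```

The returned URL can then be placed in a source object with a url property and transcribed like any other direct media URL.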
Transcribing a Podcast
To transcribe a podcast, we can use the Deepgram API to transcribe the audio file directly from our computer, without having to upload it to a server first. To do this, we'll need to:
- Open the audio file in binary read mode as audio
- Create a new source object with a buffer and a mimetype of audio/mp3
- Send the source object to the Deepgram API
The transcript is returned quickly, and we can save it to a file in the desired format (e.g. with paragraphs and speaker labels).
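The local-file steps can be sketched as below. The helper names and file names are hypothetical; the buffer and mimetype properties follow the description above, and the commented usage assumes the v2-style Deepgram Python SDK.

```python
from pathlib import Path


def build_file_source(audio, mimetype="audio/mp3"):
    """Build the source object for an already opened binary file."""
    return {"buffer": audio, "mimetype": mimetype}


def save_transcript(text, path):
    """Write the formatted transcript to a UTF-8 text file."""
    Path(path).write_text(text, encoding="utf-8")


# Inside an async main() this would be used roughly as follows
# (episode.mp3 and transcript.txt are placeholder names):
#
#     with open("episode.mp3", "rb") as audio:
#         source = build_file_source(audio)
#         response = await deepgram.transcription.prerecorded(source, options)
#     alternative = response["results"]["channels"][0]["alternatives"][0]
#     save_transcript(alternative["paragraphs"]["transcript"], "transcript.txt")
```

The request is awaited inside the with block so that the file is still open while its contents are being read and sent.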
Methods for Transcribing a Podcast
We can transcribe a podcast using three methods:
- Method 1: Providing the URL Directly
  - Provide the direct media file URL of an episode to the Deepgram API
  - This method works for any episode for which we have a direct media file URL
- Method 2: Using an RSS Feed
  - Parse the podcast's RSS feed with feedparser and take the media URL of the latest episode
  - Pass that URL to the Deepgram API, which transcribes the episode for us
- Method 3: Sending a Local Media File
  - Send a local media file (e.g. MP3) to the Deepgram API
  - The API will then transcribe the audio file and return the transcript to us
Using the Deepgram API
The Deepgram API allows us to transcribe audio files quickly and easily. We can use the API to transcribe podcasts, voice recordings, and other types of audio files. The API returns the transcript as part of a JSON response, with options for adding punctuation, speaker labels, and paragraphs.
Conclusion
Transcribing a podcast using the Deepgram API is a simple and efficient process. We can transcribe podcasts in different ways, including providing the URL directly, taking the latest episode from an RSS feed, and sending a local media file. With the paragraphs option enabled, the response includes a formatted transcript, making it easy to use the transcribed text in our own applications.