Deepgram Python SDK: Transcribe Audio Files with High Accuracy Using Deepgram and Google Colab

Deepgram Python SDK: Learn how to accurately transcribe audio files using Deepgram and Google Colab in this step-by-step tutorial. Get started with audio transcription and explore advanced features of the Deepgram SDK.

October 18, 2024 at 11:43

Quick Start Tutorial: Accurately Transcribing Audio with Deepgram

Transcribing audio files by hand is tedious and time-consuming, but with Deepgram and Google Colab you can transcribe audio files quickly and with high accuracy. In this tutorial, we walk you through the process step by step.

Prerequisites

Before we begin, make sure you have:

  • The tutorial notebook, opened in Google Colab (or in Jupyter Notebook or VS Code)
  • Familiarity with Python

Step 1: Open the Notebook and Make a Copy

Open the notebook and make a copy of it. This tutorial assumes you are using Google Colab, but the instructions also apply to Jupyter Notebook and VS Code.

Step 2: Run the First Cell

To run a cell, click the play button or press Shift+Enter (Shift+Return on a Mac keyboard). This first cell installs the dependencies with pip. It can take a few minutes, and you will see colorful text output while it runs.

Step 3: Install Dependencies

The cell installs the dependencies needed to call Deepgram from the notebook and transcribe audio. Be patient and wait for the installation to complete before proceeding.

Transcribing Audio Files with Google Colab and the Deepgram API

Preparation

  • Depending on your setup, you may need to use pip3 instead of pip
  • Grab an audio file (you can transcribe multiple files, but this tutorial uses the first chapter of the audiobook "Emma" by Jane Austen)
  • Download the audio file to your laptop and upload it to Google Colab (or use the equivalent upload feature in Jupyter or VS Code)
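Once the file is uploaded, it helps to confirm that the notebook can actually see it. The following is a minimal sketch in plain Python, not the notebook's own code. It assumes the file extension (for example mp3) is what identifies your audio files. The names MIMETYPE, DIRECTORY, filter_audio_names, and find_audio_files are illustrative only.

```python
import os

# Hypothetical settings for illustration; adjust them to match your upload.
MIMETYPE = "mp3"   # extension of the audio files to look for
DIRECTORY = "."    # "." is the notebook's current working directory


def filter_audio_names(names: list[str], extension: str) -> list[str]:
    """Keep only names ending in .<extension> (case-insensitive), sorted."""
    suffix = "." + extension.lower()
    return sorted(name for name in names if name.lower().endswith(suffix))


def find_audio_files(directory: str, extension: str) -> list[str]:
    """Return paths of regular files in `directory` with the given extension."""
    names = [
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    ]
    return [os.path.join(directory, name) for name in filter_audio_names(names, extension)]


if __name__ == "__main__":
    # An empty list means the upload is missing or DIRECTORY points elsewhere.
    print(find_audio_files(DIRECTORY, MIMETYPE))
```

If the list comes back empty, check the folder name and the extension before running the transcription cell.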

Transcription Setup

  • The next cell to run is the transcription code
  • Fill in the necessary variables:
    • API Key: sign up for Deepgram and create an API key in your own account (link provided in the description)
    • Mime Type: currently set to MP3; change it if you are working with MP4, M4A, or other audio formats
    • Directory: the name of the folder that contains the audio files
  • Make sure the audio file is in a supported format (the supported encodings are listed in the Deepgram documentation)
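Under the hood, the transcription cell sends the audio bytes to Deepgram's pre-recorded transcription endpoint. The sketch below uses only the standard library and not the Deepgram SDK, whose interface has changed between major versions. It assumes the following:

  • the REST endpoint is https://api.deepgram.com/v1/listen
  • authentication uses a "Token <key>" Authorization header
  • the Content-Type is audio/<MIMETYPE>
  • the query option punctuate=true is accepted

build_request and transcribe_file are hypothetical helper names. transcription.json matches the file name used later in this tutorial.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint for Deepgram's pre-recorded (file) transcription API.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(api_key: str, audio: bytes, mimetype: str = "mp3",
                  params: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST request that uploads raw audio bytes for transcription."""
    query = urllib.parse.urlencode(params or {})
    url = DEEPGRAM_URL + ("?" + query if query else "")
    return urllib.request.Request(
        url,
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",   # API key from your Deepgram account
            "Content-Type": f"audio/{mimetype}",  # e.g. audio/mp3
        },
        method="POST",
    )


def transcribe_file(api_key: str, path: str, mimetype: str = "mp3",
                    out_path: str = "transcription.json") -> dict:
    """Send one local audio file to Deepgram and save the JSON response to disk."""
    with open(path, "rb") as f:
        request = build_request(api_key, f.read(), mimetype, {"punctuate": "true"})
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
    return result
```

In a notebook, you would call transcribe_file(API_KEY, path, MIMETYPE) once for each audio file. Keep the key itself out of any copy of the notebook you share.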

Notes

  • The audio file used in this tutorial is the first chapter of the audiobook "Emma" by Jane Austen, in MP3 format
  • The API key is hidden in this tutorial for security reasons; keep yours private
  • The link for signing up for Deepgram and creating an API key can be found in the description
  • Since you placed your audio in the current directory, you can set the directory variable to the classic dot (.)
  • If you placed your audio in the default sample_data folder, or created a custom folder to hold it, set this variable to the name of that folder instead
  • Running the cell generates a JSON file, which appears in the file browser on the left. There may be a slight delay between the cell finishing and the file appearing, but it should show up within a minute.
  • The JSON file contains metadata, plus word-level timestamps and confidence scores for every word.
  • If you only need the transcript, the entire transcript is available as a single string inside the JSON file.
  • The line transcript = json.load(open('transcription.json')) loads the saved JSON response, from which the transcript is read.
  • Run the final cell to print the transcript.
  • The print_lines function prints the transcript one sentence per line.
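You could read the saved response and print it sentence by sentence along the lines of the following sketch. It assumes the usual shape of a Deepgram pre-recorded response, where the data is nested under results, then channels, then alternatives. Each alternative carries a transcript string and a words list whose entries have word, start, end, and confidence fields. load_response, get_transcript, get_words, and this version of print_lines are illustrative stand-ins, not the notebook's exact code.

```python
import json
import re


def load_response(path: str = "transcription.json") -> dict:
    """Load a Deepgram JSON response previously saved to disk."""
    with open(path) as f:
        return json.load(f)


def get_transcript(response: dict) -> str:
    """Return the full transcript string of the first channel's top alternative."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def get_words(response: dict) -> list[dict]:
    """Return per-word entries (word, start, end, confidence), or [] if absent."""
    return response["results"]["channels"][0]["alternatives"][0].get("words", [])


def print_lines(transcript: str) -> list[str]:
    """Print the transcript one sentence per line and return the sentences."""
    # Split after ., ! or ? whenever whitespace follows.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    for sentence in sentences:
        print(sentence)
    return sentences
```

In the notebook, print_lines(get_transcript(load_response())) would print the whole chapter. get_words(load_response()) exposes each word's start and end times and confidence score. Splitting on punctuation is deliberately naive: an abbreviation such as "Mr." in "Mr. Woodhouse" starts a new line, so consider a proper sentence tokenizer for long texts.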

Example Output

  • Excerpt from the transcript of the first chapter of "Emma":
    • An indulgent father with a bad consequence resulted in his daughter being the Mistress of the house from a very early period.
    • The transcript is a good match for the actual text, with only a few minor differences.

Using Deepgram

  • To get started with Deepgram as quickly as possible, feel free to adapt the code in the notebook as much as you like.
  • Alternatively, you can use a Deepgram SDK (software development kit) for Node, Python, Go, and more.
  • Other features of Deepgram include:
    • Transcribing audio straight from a URL
    • Live transcription
    • Managing projects and diarizing transcripts
    • Labeling every speaker
    • Using models optimized for other types of audio, such as:
      • Phone calls
      • Meetings
      • Voicemails
      • Conversations with AI
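Transcribing straight from a URL and turning on features such as speaker diarization are handled through request options. The following is a minimal standard-library sketch. It assumes the same /v1/listen endpoint accepts a JSON body of the form {"url": ...} together with query options such as diarize=true and punctuate=true. build_url_request and the example URL are hypothetical. To choose a model tuned for phone calls, meetings, voicemail, or similar audio, add a model option; check Deepgram's documentation for the exact model names, since none are assumed here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"  # assumed endpoint


def build_url_request(api_key: str, audio_url: str,
                      options: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST request asking Deepgram to fetch and transcribe a remote file."""
    query = urllib.parse.urlencode(options or {})
    url = DEEPGRAM_URL + ("?" + query if query else "")
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",  # the body is JSON, not audio
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_url_request(
        "YOUR_API_KEY",
        "https://example.com/emma-chapter-1.mp3",  # hypothetical URL
        # diarize=true asks for speaker labels; a "model" entry could be added
        # here to pick a specialized model (name deliberately not assumed).
        {"diarize": "true", "punctuate": "true"},
    )
    print(request.full_url)
```

You would send the request with urllib.request.urlopen exactly as for a local file. The only differences are the JSON body and its Content-Type.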

Deepgram SDK and Features

  • The Deepgram SDK is available for Node, Python, Go, and more.
  • The SDK allows you to:
    • Transcribe audio straight from a URL
    • Manage projects and diarize transcripts
    • Label every speaker
    • Use models optimized for other types of audio
  • The Deepgram documentation describes many other features; take a quick look to explore them.
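When diarization is enabled, each entry in the words list is expected to carry a speaker number. Labeling every speaker then amounts to grouping consecutive words by that field. The sketch below assumes the speaker field is present, plus the punctuated_word field when punctuation is turned on. speaker_turns is a hypothetical helper, and the sample words are made up for illustration.

```python
from itertools import groupby


def speaker_turns(words: list[dict]) -> list[tuple[int, str]]:
    """Group consecutive words by speaker into (speaker, text) turns."""
    turns = []
    # groupby merges only *adjacent* items, so each change of speaker starts a new turn.
    for speaker, group in groupby(words, key=lambda w: w.get("speaker", 0)):
        # Prefer the punctuated form of each word when it is available.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns


if __name__ == "__main__":
    sample = [  # made-up example data
        {"word": "hello", "punctuated_word": "Hello.", "speaker": 0},
        {"word": "hi", "punctuated_word": "Hi,", "speaker": 1},
        {"word": "there", "punctuated_word": "there.", "speaker": 1},
    ]
    for speaker, text in speaker_turns(sample):
        print(f"Speaker {speaker}: {text}")
```

Grouping only adjacent words preserves the order of the conversation. A speaker who talks twice therefore gets two separate turns rather than one merged block.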

Cool Demos and Applications

  • Users can create cool demos with Deepgram, such as:
    • Driving a car with your voice
    • Creating a live-subtitles badge
    • Much more
  • The Deepgram API can power many other applications as well.

Deepgram API

  • The Deepgram API is quick and easy to use, with intuitive documentation written by humans for humans. With it, you can build a wide range of applications and demos that change the way we interact with audio.