Speech Transcription using Deepgram API: A Step-by-Step Guide with Node.js

Learn how to use the Deepgram API for speech transcription with Node.js, including setting up an API key, making API calls, and handling responses. Get started with live transcription and speaker diarization.

October 18, 2024 at 11:32

Speech Transcription using Deepgram API

Introduction

Deepgram is a speech recognition API that transcribes both pre-recorded audio files and live audio streams. In this tutorial, we will use it to transcribe audio files and streams with Node.js and in the browser.

Deepgram API

Deepgram offers a REST API for pre-recorded audio and a WebSocket interface for live audio. Both can be called from any language, and Deepgram publishes SDKs for several, including Node.js, Python, and .NET. New accounts also come with a free trial of $150 in credit, which makes it easy to get started.

Getting Started with Deepgram

To get started, sign up for a Deepgram account, which includes the free trial. In the Deepgram console, create a new API key and choose its permissions. Finally, create a new code project and store the key in its environment variables file (the .env file on Glitch).
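
The server code reads the key from an environment variable named DG_API_KEY. A minimal sketch of that lookup, which fails early with a clear message when the variable is missing; getDeepgramKey is a hypothetical helper name, and it assumes (as on Glitch) that values from .env are exposed on process.env:

```javascript
// Hypothetical helper: read the Deepgram API key from the environment.
// Assumes values from the .env file are exposed on process.env (as on Glitch).
function getDeepgramKey(env = process.env) {
  const key = env.DG_API_KEY;
  if (!key || key.trim() === '') {
    // Fail at startup instead of sending unauthenticated requests later
    throw new Error('DG_API_KEY is not set; add it to your .env file');
  }
  return key.trim();
}
```

Calling it once at startup, for example const apiKEY = getDeepgramKey();, surfaces a missing key immediately rather than as a failed API call.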

Glitch Environment

Glitch is an online code editor and runtime environment that allows you to write and run code in the cloud. You can remix a Glitch project to create a new copy with boilerplate code already set up. The Glitch interface allows you to edit code and run applications.

Code Setup

To set up the code, create a server.js file as the hub of the application. In it, require and initialize the Deepgram Node.js SDK, and set up a small Express.js application that makes the API calls to Deepgram.

Challenges

For Local Hack Day, we have three challenges:

  • Getting started with Deepgram
  • Mashing up Deepgram with another API
  • Accessibility challenge

Notes

  • Environment variables are used to store sensitive information, such as API keys.
  • API keys should not be shared with others, since anyone who holds a key can use it to interact with your Deepgram account.
  • The Deepgram API requires an internet connection to make API calls and establish WebSocket connections.

Requiring the Node.js SDK and Initializing the Deepgram API

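To use the Deepgram API from Node.js, require the SDK (published on npm as @deepgram/sdk) and initialize it with the API key stored in an environment variable. The examples in this tutorial use the SDK's v1/v2 interface; v3 and later replace the Deepgram class with a createClient() function. Here's how to do it:

// Deepgram's Node.js SDK is published on npm as @deepgram/sdk
const { Deepgram } = require('@deepgram/sdk');
const apiKEY = process.env.DG_API_KEY; // loaded from the .env file
const deepgramVariable = new Deepgram(apiKEY);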

Getting the First Transcription

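To get the first transcription, call the transcription.preRecorded() method. Its first argument is a source object whose url property points to a hosted audio file; the optional second argument holds feature options such as punctuate. The method returns a promise, so handle the result with then() and errors with catch().

// Request a transcript of a hosted audio file; the second argument holds feature options
deepgramVariable.transcription.preRecorded(
  { url: 'https://example.com/audiofile.wav' },
  { punctuate: true }
)
  .then((data) => {
    console.dir(data, { depth: null }); // print the full nested response
  })
  .catch((err) => {
    console.error(err);
  });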
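
The SDK call wraps Deepgram's REST API. For reference, a minimal sketch of an equivalent request using the fetch API built into Node.js 18 and later; it assumes the https://api.deepgram.com/v1/listen endpoint with "Authorization: Token <key>" authentication, and the names buildListenRequest and transcribeUrl are hypothetical:

```javascript
// Hypothetical helper: build the request for transcribing a hosted audio file.
function buildListenRequest(apiKey, audioUrl, options = {}) {
  // Feature flags such as punctuate=true travel as query parameters
  const query = new URLSearchParams(options).toString();
  const endpoint = 'https://api.deepgram.com/v1/listen' + (query ? `?${query}` : '');
  return {
    endpoint,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Token ${apiKey}`,
        'Content-Type': 'application/json',
      },
      // Deepgram downloads the audio itself from this URL
      body: JSON.stringify({ url: audioUrl }),
    },
  };
}

// Send the request and return the parsed JSON response (needs network access)
async function transcribeUrl(apiKey, audioUrl, options) {
  const { endpoint, init } = buildListenRequest(apiKey, audioUrl, options);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`Deepgram request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping the request construction in its own function makes it easy to inspect exactly what is sent without making a network call.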

Understanding the Response

The response object contains a metadata property, which includes a unique request_id along with details such as the audio duration, and a results property with an array of channels. Each channel has an alternatives array of possible transcriptions. Each alternative holds a transcript string, an overall confidence score, and a words array that lists every word with its start and end time and its own confidence.
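
Most of the time you only need the text. A minimal sketch that walks the structure described above, assuming the path results.channels[i].alternatives[0].transcript and taking the first (by default, only) alternative; getTranscript is a hypothetical helper name:

```javascript
// Hypothetical helper: pull the first transcript out of a pre-recorded response.
// Assumed path: results.channels[channelIndex].alternatives[0]
function getTranscript(response, channelIndex = 0) {
  const channel = response?.results?.channels?.[channelIndex];
  const best = channel?.alternatives?.[0];
  if (!best) {
    throw new Error(`No transcript found for channel ${channelIndex}`);
  }
  return { transcript: best.transcript, confidence: best.confidence };
}
```

Inside the then() handler, getTranscript(data).transcript gives the plain text for the first channel.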

Features of the Deepgram API

The Deepgram API has several features, including:

  • Punctuate: adds punctuation and capitalization to the transcript
  • Utterances: segments the transcript into utterances (phrases), each with its own start and end time, in addition to individual words
  • Diarize: detects speaker changes and labels each word with the speaker who said it
  • Keyword Boosting: makes the API more likely to recognize specific words that you list
  • Other features: see the Deepgram documentation for the complete list
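
These features are switched on through request options. A minimal sketch that turns a feature object into query parameters for the REST endpoint; it assumes each boosted keyword is sent as its own keywords parameter in a word:intensifier form such as Deepgram:2 (worth checking against the current docs), and buildFeatureParams is a hypothetical name:

```javascript
// Hypothetical helper: turn feature options into /v1/listen query parameters.
function buildFeatureParams({ punctuate, utterances, diarize, keywords = [] } = {}) {
  const params = new URLSearchParams();
  // Boolean features are only sent when enabled
  if (punctuate) params.set('punctuate', 'true');
  if (utterances) params.set('utterances', 'true');
  if (diarize) params.set('diarize', 'true');
  // Each boosted keyword becomes a separate parameter, e.g. keywords=Deepgram:2
  for (const keyword of keywords) params.append('keywords', keyword);
  return params;
}
```

For example, 'https://api.deepgram.com/v1/listen?' + buildFeatureParams({ punctuate: true, diarize: true }) yields a URL with both features enabled.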

Live Transcription

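Live audio uses the transcription.live() method rather than preRecorded(). It opens a WebSocket connection to Deepgram and returns an object that you send audio to with send() and that reports results through a transcriptReceived event. To transcribe a live stream hosted at a URL, your code fetches the stream itself and forwards each chunk of audio.

// Open a live connection; options work the same way as for pre-recorded audio
const deepgramLive = deepgramVariable.transcription.live({ punctuate: true });

// Each message is a JSON string; metadata messages have no channel
deepgramLive.addListener('transcriptReceived', (message) => {
  const data = JSON.parse(message);
  if (data.channel) console.log(data.channel.alternatives[0].transcript);
});

deepgramLive.addListener('error', (err) => console.error(err));

// Forward audio with deepgramLive.send(chunk); call deepgramLive.finish() when done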
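
Forwarding a stream is a loop over its chunks. A minimal sketch, assuming the object returned by transcription.live() in this SDK version exposes send() and finish(); forwardStream is a hypothetical name, and any async iterable of audio chunks (such as a Node.js Readable) can serve as the source:

```javascript
// Hypothetical helper: pipe audio chunks from an async iterable into a live connection.
// `live` only needs send(chunk) and finish(), like the SDK's live transcription object.
async function forwardStream(source, live) {
  let chunks = 0;
  try {
    for await (const chunk of source) {
      live.send(chunk); // forward each chunk as soon as it arrives
      chunks += 1;
    }
  } finally {
    live.finish(); // signal that no more audio is coming, even if reading failed
  }
  return chunks;
}
```

Because finish() runs in a finally block, Deepgram is told the audio has ended even when the source stream errors partway through.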

Transcribing Conversations

To tell apart the people in a conversation, enable the diarize feature. Deepgram then labels each word in the transcript with a number identifying its speaker, so you can see who said what.
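
With diarize enabled, the speaker label appears on each entry of an alternative's words array. A minimal sketch that folds those words into speaker turns, assuming each word object has word, speaker, start, and end fields; groupBySpeaker is a hypothetical name:

```javascript
// Hypothetical helper: merge consecutive words from the same speaker into turns.
function groupBySpeaker(words) {
  const turns = [];
  for (const { word, speaker, start, end } of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === speaker) {
      // Same speaker keeps talking: extend the current turn
      last.text += ` ${word}`;
      last.end = end;
    } else {
      // First word or a speaker change: start a new turn
      turns.push({ speaker, start, end, text: word });
    }
  }
  return turns;
}
```

Passing response.results.channels[0].alternatives[0].words produces one entry per uninterrupted stretch of speech.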

Browser Transcription

To transcribe live audio directly in the browser, add the code to index.html. Live transcription in the browser takes four steps:

  1. Get access to the user's microphone
  2. Create a two-way connection with Deepgram using websockets
  3. Prepare the data from the mic and send it to Deepgram
  4. Display the transcription in the browser

Step 1: Get Access to the User's Microphone

Call navigator.mediaDevices.getUserMedia() to request access to the user's microphone, passing { audio: true } to ask for an audio input device. The returned promise resolves with a MediaStream object that carries the raw audio from the microphone. Browsers only offer this API on secure (HTTPS) pages.

navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    console.log(stream);
  })
  .catch((err) => {
    console.error(err);
  });

Step 2: Create a Persistent Two-Way Connection with Deepgram

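Use the WebSocket client built into the browser to open a connection to Deepgram's live transcription endpoint, wss://api.deepgram.com/v1/listen. Browsers cannot attach an Authorization header to a WebSocket, so pass the API key as a subprotocol in the second argument: ['token', apiKEY]. Then register handlers for the open, close, message, and error events.

// apiKEY must be defined in the page script; the key is sent as a subprotocol
const socket = new WebSocket('wss://api.deepgram.com/v1/listen', ['token', apiKEY]);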

socket.onopen = () => {
  console.log('Connected to Deepgram');
};

socket.onclose = () => {
  console.log('Disconnected from Deepgram');
};

socket.onmessage = (message) => {
  console.log(message);
};

socket.onerror = (err) => {
  console.error(err);
};
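
Query parameters on the WebSocket URL switch on features for live audio just as they do for pre-recorded audio. A minimal sketch of building the constructor arguments; liveSocketArgs is a hypothetical name, and the ['token', key] subprotocol form is the one used above and should be checked against Deepgram's current documentation:

```javascript
// Hypothetical helper: build the arguments for `new WebSocket(url, protocols)`.
function liveSocketArgs(apiKey, options = {}) {
  const query = new URLSearchParams(options).toString();
  const url = 'wss://api.deepgram.com/v1/listen' + (query ? `?${query}` : '');
  // Browsers cannot add custom headers to the WebSocket handshake,
  // so the key is offered as the second subprotocol entry
  return { url, protocols: ['token', apiKey] };
}
```

In the page: const { url, protocols } = liveSocketArgs(apiKEY, { punctuate: true }); const socket = new WebSocket(url, protocols);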

Step 3: Prepare the Data from the Mic and Send it to Deepgram

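Create a MediaRecorder from the stream returned in Step 1 and listen for its dataavailable event, sending each chunk of audio to Deepgram with socket.send(). Then start the recorder, ideally once the socket has opened. Pass a timeslice in milliseconds to start(): without one, the recorder only emits data when it is stopped, so nothing would stream in real time.

// `stream` comes from Step 1 and `socket` from Step 2
const mediaRecorder = new MediaRecorder(stream);

mediaRecorder.ondataavailable = (event) => {
  // Forward non-empty chunks only while the socket is open
  if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
    socket.send(event.data);
  }
};

// A 250 ms timeslice emits a chunk every 250 ms instead of only on stop()
mediaRecorder.start(250);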
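
Browsers differ in which audio formats MediaRecorder can produce, so it can help to ask before creating the recorder. A minimal sketch; pickMimeType and its candidate list are hypothetical, the support check is injected so the logic runs outside a browser, and which containers Deepgram accepts for live audio should be confirmed in its documentation:

```javascript
// Hypothetical helper: choose the first format the browser reports it can record.
// In a page, pass (type) => MediaRecorder.isTypeSupported(type).
function pickMimeType(
  isTypeSupported,
  candidates = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4']
) {
  for (const type of candidates) {
    if (isTypeSupported(type)) return type;
  }
  return ''; // empty string: let MediaRecorder use its default format
}
```

In the page: const mimeType = pickMimeType((type) => MediaRecorder.isTypeSupported(type)); const mediaRecorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});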

Step 4: Listen for a Response from Deepgram

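Listen for the message event on the WebSocket object to receive the transcription from Deepgram. The handler receives a MessageEvent whose data property holds Deepgram's response; log it to the console or display it in the browser.

socket.onmessage = (message) => {
  // message is a MessageEvent; the response from Deepgram is in message.data
  console.log(message.data);
};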

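Converting the Message to JSON

Inside the onmessage handler, message.data holds the response as a JSON string. Store it in a variable called data by parsing it into an object with JSON.parse():

// Inside socket.onmessage: message.data holds the response as a JSON string
const data = JSON.parse(message.data);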

Handling Reconnecting and Final Transcripts

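Browser WebSockets have no on() method, so handle transcripts in the onmessage handler. Each live response carries its text at channel.alternatives[0].transcript and an is_final flag. When interim results are enabled (interim_results=true), earlier messages for the same audio arrive with is_final set to false and may still be revised. Log the transcript only when it is non-empty and is_final is true. If the connection closes unexpectedly, the onclose handler from Step 2 is the place to reconnect.

socket.onmessage = (message) => {
  const received = JSON.parse(message.data);
  // Metadata messages have no channel, so guard with optional chaining
  const transcript = received.channel?.alternatives[0]?.transcript;
  // Skip empty transcripts (silence) and interim results
  if (transcript && received.is_final) {
    console.log(transcript);
  }
};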
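
Moving the checks into a plain function keeps the handler short and makes the logic testable outside a browser. A minimal sketch, assuming the live response shape { is_final, channel: { alternatives: [{ transcript }] } }; extractFinalTranscript is a hypothetical name:

```javascript
// Hypothetical helper: return the final transcript from a raw message, or null.
function extractFinalTranscript(raw) {
  let received;
  try {
    received = JSON.parse(raw);
  } catch {
    return null; // not JSON: ignore it rather than crash the handler
  }
  const transcript = received?.channel?.alternatives?.[0]?.transcript;
  // Skip metadata messages, silence (empty strings), and interim results
  return transcript && received.is_final ? transcript : null;
}
```

The handler then becomes: socket.onmessage = (message) => { const t = extractFinalTranscript(message.data); if (t) console.log(t); };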

Displaying Transcripts on the Page

Inside the same if block, use document.querySelector to select a paragraph element and append the transcript to it, adding a space first to separate phrases.

// `transcript` is the final transcript string from the onmessage handler
const paragraph = document.querySelector('p');
paragraph.textContent += ' ' + transcript;
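
Adding a space unconditionally leaves a leading space before the first phrase. A small variation that inserts the separator only when the paragraph already has text; appendTranscript is a hypothetical name:

```javascript
// Hypothetical helper: join a new phrase onto the text already on the page.
function appendTranscript(existing, phrase) {
  const next = phrase.trim();
  if (!next) return existing; // nothing to add
  // Only insert a separating space when there is already text
  return existing ? `${existing} ${next}` : next;
}
```

In the page, with transcript holding the final transcript string: paragraph.textContent = appendTranscript(paragraph.textContent, transcript);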

Troubleshooting and Handling Errors

If no transcript appears, check the browser's developer console for errors. A WebSocket that fails to connect or closes right away can point to a missing or invalid API key, so make sure the key is up to date and handle it securely.

Additional Notes

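  • MediaRecorder is not available in every browser; older versions of Safari, for example, do not support it.
  • WebSocket failures are reported through the error and close events rather than thrown as exceptions, so a try...catch block around the socket code will not catch them. Handle those events instead, and open a new connection if the socket closes unexpectedly.
  • Handle API keys securely: rather than embedding your main key in browser code, generate short-lived keys on the server side.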
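
Because socket failures arrive as events, retry logic belongs in the close handler. A minimal sketch of reconnecting with exponential backoff; keepConnected and reconnectDelay are hypothetical names, and the delays and attempt limit are arbitrary defaults:

```javascript
// Hypothetical helper: wait time before reconnect attempt N (doubles each time, capped).
function reconnectDelay(attempt, baseMs = 500, maxMs = 10000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `createSocket` returns a new WebSocket; `onSocket` attaches your usual handlers.
// Returns a stop() function to call before closing the socket on purpose.
function keepConnected(createSocket, onSocket, maxAttempts = 5) {
  let attempt = 0;
  let stopped = false;
  const connect = () => {
    if (stopped) return;
    const socket = createSocket();
    // addEventListener leaves any onopen/onclose handlers set by onSocket intact
    socket.addEventListener('open', () => { attempt = 0; }); // healthy connection resets the count
    socket.addEventListener('close', () => {
      if (stopped || attempt >= maxAttempts) return;
      setTimeout(connect, reconnectDelay(attempt));
      attempt += 1;
    });
    onSocket(socket);
  };
  connect();
  return () => { stopped = true; };
}
```

For example: const stop = keepConnected(() => new WebSocket('wss://api.deepgram.com/v1/listen', ['token', apiKEY]), (socket) => { socket.onmessage = (message) => console.log(message.data); });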

Challenge

Complete the missions on the dashboard by signing up for an account and completing the challenges. Then submit a screenshot of your completed challenges to Devpost to get credit.