Speech Transcription using Deepgram API
Introduction
Deepgram is a speech recognition API that provides a way to transcribe audio files. It can be used to transcribe pre-recorded audio and live audio streams. In this tutorial, we will explore how to use Deepgram to transcribe audio files and streams using Node.js.
Deepgram API
Deepgram offers a REST API and WebSocket interface that supports multiple programming languages, including Node.js, Python, and .NET. It also provides a free trial with $150 of free credit, making it easy to get started with the API.
Getting Started with Deepgram
To get started with Deepgram, sign up for a Deepgram account, which starts the free trial. After signing up, use the Deepgram console to create a new API key and set its permissions. Create a new project and add the API key to the environment variables file (the .env file in Glitch).
Glitch Environment
Glitch is an online code editor and runtime environment that allows you to write and run code in the cloud. You can remix a Glitch project to create a new copy with boilerplate code already set up. The Glitch interface allows you to edit code and run applications.
Code Setup
To set up the code, create a server.js file as the hub of the application. Require and initialize the Deepgram Node.js SDK, and set up a little Express.js application to make API calls to Deepgram.
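As a rough sketch of a minimal server.js skeleton (the static route and port handling here are assumptions for illustration, not the exact workshop boilerplate; the Deepgram SDK setup itself is covered in the next section):
// server.js: a small Express.js app that will host the Deepgram calls
const express = require('express');
const app = express();

// Serve the index.html used later for browser transcription
app.get('/', (req, res) => res.sendFile(__dirname + '/index.html'));

app.listen(process.env.PORT || 3000, () => console.log('Server running'));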
Challenges
For Local Hack Day, we have three challenges:
- Getting started with Deepgram
- Mashing up Deepgram with another API
- Accessibility challenge
Notes
- Environment variables are used to store sensitive information, such as API keys (an example entry is shown after this list).
- API keys should not be shared with others, as they can be used to interact with the Deepgram account.
- The Deepgram API requires an internet connection to make API calls and establish WebSocket connections.
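For example, the .env file would contain a single line mapping the variable name used later in this tutorial to your key (the value below is a placeholder):
DG_API_KEY=your-deepgram-api-key-here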
Requiring the Node.js SDK and Initializing the Deepgram API
To use the Deepgram API, you need to require the Node.js SDK and initialize it with an API key stored in an environment variable. Here's how to do it:
const { Deepgram } = require('@deepgram/sdk');
const apiKey = process.env.DG_API_KEY;
const deepgram = new Deepgram(apiKey);
Getting the First Transcription
To get the first transcription, use the deepgram.transcription.prerecorded() method to make a request for a transcription. Pass in an object with a url property containing the URL of the audio file, and use the then() method to handle the promise returned by the method.
deepgram.transcription.prerecorded({
  url: 'https://example.com/audiofile.wav'
})
  .then((data) => {
    console.log(data);
  })
  .catch((err) => {
    console.error(err);
  });
Understanding the Response
The response object contains a metadata property with a unique request ID and other metadata, and a results property containing an array of channels. Each channel has an alternatives property with an array of possible transcriptions, and each alternative is an object with transcript, confidence, and words properties.
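As a quick illustration, here is how you would read the first transcript out of that structure:
// data is the response object returned by the prerecorded transcription call
const { transcript, confidence } = data.results.channels[0].alternatives[0];
console.log(transcript, confidence);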
Features of the Deepgram API
The Deepgram API has several features (an example of enabling them follows this list), including:
- Punctuate: adds punctuation to the transcript
- Utterances: returns phrases (utterances) instead of individual words
- Diarize: separates the audio file into individual speakers
- Keyword Boosting: makes the API more likely to hear specific keywords
- Other features: see the documentation for a complete list
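As a sketch of how these features are enabled, assuming (as in the Deepgram SDK documentation) that options are passed as a second argument to the transcription call:
// Enable punctuation and utterances for this request
deepgram.transcription.prerecorded(
  { url: 'https://example.com/audiofile.wav' },
  { punctuate: true, utterances: true }
)
  .then((data) => console.log(data))
  .catch((err) => console.error(err));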
Live Transcription
Live transcription does not use the prerecorded endpoint. Instead, audio is streamed to Deepgram over a persistent WebSocket connection to the live transcription endpoint, and interim and final transcripts are returned as messages on that same connection. The Browser Transcription section below walks through this step by step.
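For completeness, here is a rough sketch of opening that connection from Node.js. It assumes the third-party ws package and the Authorization: Token header described in Deepgram's documentation; you would still need to stream raw audio into the socket with send() for transcripts to come back.
const WebSocket = require('ws');

// Open a streaming connection to Deepgram's live transcription endpoint
const dgSocket = new WebSocket('wss://api.deepgram.com/v1/listen', {
  headers: { Authorization: `Token ${process.env.DG_API_KEY}` },
});

dgSocket.on('open', () => console.log('Connected to Deepgram'));
dgSocket.on('message', (msg) => console.log(JSON.parse(msg.toString())));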
Conversation API
The Conversation API allows you to differentiate between two people having a conversation. The Diarize feature can be used to separate the audio file into individual speakers.
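As a hedged sketch of using diarization to tell speakers apart (per the Deepgram documentation, each word in the response carries a numeric speaker label when diarize is enabled):
deepgram.transcription.prerecorded(
  { url: 'https://example.com/conversation.wav' },
  { punctuate: true, diarize: true }
)
  .then((data) => {
    const words = data.results.channels[0].alternatives[0].words;
    // Each word object includes a "speaker" field when diarize is on
    words.forEach((w) => console.log(`Speaker ${w.speaker}: ${w.word}`));
  })
  .catch((err) => console.error(err));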
Browser Transcription
To perform browser transcription, set up the browser transcription in index.html. There are four steps to achieve live transcription:
- Get access to the user's microphone
- Create a two-way connection with Deepgram using websockets
- Prepare the data from the mic and send it to Deepgram
- Display the transcription in the browser
Step 1: Get Access to the User's Microphone
Use the navigator.mediaDevices.getUserMedia() method to request access to the user's microphone. Specify audio: true in the constraints object to request an audio device (microphone). The returned promise resolves with a MediaStream object that gives access to the raw data from the microphone.
navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    console.log(stream);
  })
  .catch((err) => {
    console.error(err);
  });
Step 2: Create a Persistent Two-Way Connection with Deepgram
Use the WebSocket client built into the browser to create a connection to Deepgram's live transcription endpoint. Pass the endpoint URL as the first argument to the WebSocket constructor and the API key in the protocols argument. Handle the connection's open and close events, and listen for message events to receive data.
const socket = new WebSocket('wss://api.deepgram.com/v1/listen', ['token', apiKey]);
socket.onopen = () => {
  console.log('Connected to Deepgram');
};
socket.onclose = () => {
  console.log('Disconnected from Deepgram');
};
socket.onmessage = (message) => {
  console.log(message);
};
socket.onerror = (err) => {
  console.error(err);
};
Step 3: Prepare the Data from the Mic and Send it to Deepgram
Create a MediaRecorder object from the microphone stream and add an event listener for its dataavailable event. Send the data to Deepgram using the socket.send() method, then start the MediaRecorder with a timeslice so that data becomes available at a regular interval.
const mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = (event) => {
  // Only forward audio when there is data and the socket is open
  if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
    socket.send(event.data);
  }
};
// Pass a timeslice (in milliseconds) so dataavailable fires repeatedly
mediaRecorder.start(250);
Step 4: Listen for a Response from Deepgram
Listen for the message event on the WebSocket object to receive the transcription from Deepgram. Handle the response by logging the message to the console or displaying it in the browser.
socket.onmessage = (message) => {
  console.log(message);
};
Creating a Variable and Converting JSON Data
Create a variable called data and assign it the parsed message. The message event delivers a MessageEvent object whose data property is a JSON string, so convert that string into an object:
const data = JSON.parse(message.data);
Handling Reconnecting and Final Transcripts
Inside the onmessage handler, parse the message and check that the transcript is not empty and that is_final is true. Interim results arrive with is_final set to false and are later replaced by a final version; when is_final is true, log the transcript to the console.
socket.onmessage = (message) => {
  const received = JSON.parse(message.data);
  const transcript = received.channel.alternatives[0].transcript;
  if (transcript && received.is_final) {
    console.log(transcript);
  }
};
Displaying Transcripts on the Page
Use document.querySelector to select a paragraph element and append the transcript to it. Add a space before the transcript to keep phrases separated.
const paragraph = document.querySelector('p');
paragraph.innerText += ' ' + transcript;
Troubleshooting and Handling Errors
If something isn't working, check the browser console for errors; the messages usually point to the problem. Make sure the API key is up to date and handled securely.
Additional Notes
- The MediaRecorder API is not supported in every browser, including Safari.
- Use a try-catch block to handle errors, and reconnect if the socket connection is lost (see the sketch after this list).
- Use a secure method to handle API keys, such as generating temporary keys on the server-side.
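As a rough sketch of that reconnect logic (connectToDeepgram() here is a hypothetical helper standing in for whatever function creates your socket and attaches its handlers):
socket.onclose = () => {
  console.log('Connection lost, retrying in one second...');
  // connectToDeepgram() is a hypothetical wrapper that recreates the
  // WebSocket and re-attaches the event handlers shown above
  setTimeout(connectToDeepgram, 1000);
};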
Challenge
Complete the missions on the dashboard by signing up for an account and completing the four challenges. Submit a screenshot of the completed challenges to Dev Post to get credit.