
openai-chatterbox

A sample Nuxt 3 application that listens to chatter in the background and transcribes it using the powerful OpenAI Whisper, an automatic speech recognition (ASR) system.




Motivation

This is the Nuxt.js version of the openai-whisper project I built using Next.js, and it is part of a series of projects to learn more about Vue 3 and Nuxt 3. I made some improvements to audio data capture and simplified the user interface.

Audio Capture

To start audio capture, press the Start button.

Note, however, that recording will not begin immediately. It starts automatically only when sound is detected.

There is a threshold setting to prevent background noise from triggering the audio capture. By default it is set to -45dB (0dB is the loudest sound). Adjust the MIN_DECIBELS variable to raise or lower the threshold depending on your needs.
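As a rough sketch, a gate like this can be built with the Web Audio API's AnalyserNode; the code below is illustrative only, and the app's actual capture code may differ.

// detect whether the microphone input rises above MIN_DECIBELS
const MIN_DECIBELS = -45

async function listenForSound() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const context = new AudioContext()
  const analyser = context.createAnalyser()
  analyser.minDecibels = MIN_DECIBELS
  context.createMediaStreamSource(stream).connect(analyser)

  const bins = new Uint8Array(analyser.frequencyBinCount)

  const check = () => {
    analyser.getByteFrequencyData(bins)
    // values at or below minDecibels map to 0,
    // so any non-zero bin means the input rose above MIN_DECIBELS
    const soundDetected = bins.some((v) => v > 0)
    if (soundDetected) {
      // start (or keep) recording here
    }
    requestAnimationFrame(check)
  }
  check()
}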

In normal human conversation, we are said to pause, on average, around 2 seconds between sentences. With this in mind, if no sound is detected for more than 2 seconds, recording stops and the audio data is sent to the backend for transcribing. You can change this by editing the value of MAX_PAUSE, which is set to 2500ms by default.
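Continuing the sketch above, the pause timeout could be wired in like this (again illustrative; MediaRecorder is assumed as the recording mechanism):

const MAX_PAUSE = 2500 // ms of silence before recording stops

let silenceTimer: ReturnType<typeof setTimeout> | null = null

// called from the detection loop above on every frame
function onAudioLevel(soundDetected: boolean, recorder: MediaRecorder) {
  if (soundDetected && silenceTimer) {
    // sound resumed within MAX_PAUSE: keep recording
    clearTimeout(silenceTimer)
    silenceTimer = null
  } else if (!soundDetected && !silenceTimer && recorder.state === 'recording') {
    // silence began: stop after MAX_PAUSE unless sound returns first
    silenceTimer = setTimeout(() => recorder.stop(), MAX_PAUSE)
  }
}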

Press the Stop button to stop recording. This will also abort any transcription that has not yet finished. If you do not want this behavior, edit the code around the AbortController.
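In outline, the cancellation works by attaching an AbortController's signal to the upload request; a minimal sketch (the /api/transcribe endpoint name is illustrative):

const controller = new AbortController()
const formData = new FormData() // holds the recorded audio blob

// the upload carries the controller's signal
fetch('/api/transcribe', {
  method: 'POST',
  body: formData,
  signal: controller.signal,
})

// pressing Stop aborts the in-flight request;
// remove (or skip) this call to let the transcription finish
controller.abort()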

Not all uploaded audio will contain voice data, and only recordings that are successfully transcribed are shown in the list. You can verify the accuracy of a transcription by pressing the Play button to play back the recorded audio.

OpenAI Whisper

Transcription of audio data is done by OpenAI Whisper, and it takes time, so do not expect real-time transcription or translation. I have set the model to tiny to suit my development machine; if yours is faster, switch to one of the larger models (base, small, medium, large) for better transcription quality.

If the audio source may contain languages other than English, you need to set the language option and set the task option to translate:

$ whisper audio.ogg --language Japanese --task translate --model tiny --output_dir './public/upload'

The output will consist of 3 files (srt, txt, vtt) saved in the output directory. If you use the app for a long time, these files will steadily pile up in the output directory. The app does not actually need them, but there seems to be no option to prevent Whisper from writing these files.
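For context, the backend presumably invokes this same command for each uploaded recording. A hypothetical Nuxt server route doing so might look like the sketch below; the endpoint name, file paths, and transcript filename are all illustrative, and the app's actual code may differ.

import { execFile } from 'node:child_process'
import { promisify } from 'node:util'
import { readFile } from 'node:fs/promises'

const run = promisify(execFile)

// hypothetical Nuxt server route, e.g. server/api/transcribe.post.ts
// (defineEventHandler is auto-imported by Nuxt in server routes)
export default defineEventHandler(async () => {
  // the uploaded audio is assumed to be saved as ./public/upload/audio.ogg
  await run('whisper', [
    './public/upload/audio.ogg',
    '--language', 'Japanese',
    '--task', 'translate',
    '--model', 'tiny',
    '--output_dir', './public/upload',
  ])
  // Whisper writes the srt/txt/vtt transcripts into output_dir;
  // the exact transcript filename depends on the Whisper version
  return { text: await readFile('./public/upload/audio.txt', 'utf8') }
})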

You might also be interested in Whisper's other configuration options, so please check whisper --help.

Nuxt.js/Vue.js

Currently, I am using a basic fetch to send audio data to the API endpoint, which can block. I am planning to switch to useLazyFetch later on to see if there is any improvement.
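A minimal sketch of what that change might look like inside a component's setup (the /api/transcribe endpoint name is illustrative; useLazyFetch is auto-imported by Nuxt):

const formData = new FormData() // holds the recorded audio blob

// useLazyFetch resolves immediately and exposes a pending flag,
// so the UI is not blocked while the request is in flight
const { data, pending, error } = useLazyFetch('/api/transcribe', {
  method: 'POST',
  body: formData,
})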

Installation

First, you need to install Whisper and its Python dependencies

$ pip install git+https://github.com/openai/whisper.git

You also need ffmpeg installed on your system

# macOS
$ brew install ffmpeg

# Windows using Chocolatey
$ choco install ffmpeg

# Windows using Scoop
$ scoop install ffmpeg

At this point, you can test Whisper from the command line

$ whisper myaudiofile.ogg --language English --task translate

You can find sample audio files for testing here.

If that is successful, you can proceed to install this app.

Clone the repository and install the dependencies

$ git clone https://github.com/supershaneski/openai-chatterbox.git myproject

$ cd myproject

$ npm install

To run the app

$ npm run dev

Open your browser to http://localhost:5000/ (port number depends on availability) to load the application page.

Using HTTPS

You might want to run this app over HTTPS. This is needed to enable audio capture from a separate device, such as a smartphone, since browsers require a secure context for microphone access.

To do so, prepare the proper certificate and key files and edit server.mjs in the root directory.
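For reference, the relevant part might look something like this (the file paths are placeholders; check the actual server.mjs for details):

import https from 'node:https'
import { readFileSync } from 'node:fs'

// replace these paths with your own certificate and key files
const options = {
  key: readFileSync('./cert/localhost-key.pem'),
  cert: readFileSync('./cert/localhost-cert.pem'),
}

// ...pass `options` to https.createServer(...) where the server is created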

Then build the project

$ npm run build

Finally, run the app

$ node server.mjs

Now, open your browser to https://localhost:3000/ (port number depends on availability) or use your local IP address to load the page.

Known Issues

If you encounter an error saying __dirname is not found (which seems to be related to formidable) when you run npm run dev, please run the build command.

$ npm run build

Then try to run the app again

$ npm run dev
