Voice Transcription With Whisper

Integrating Whisper API in React

AI Syllabus Team

Fullstack AI Instructors

Typing is slow, and for many tasks voice is a faster way to interact with software. With OpenAI's Whisper model, you can build applications that record speech and transcribe multilingual audio with accuracy that approaches human transcription on clear recordings.

Capturing Audio in the Browser

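To send audio to the API, you first need to record it. Modern browsers provide the MediaRecorder API for this. You request microphone permission via navigator.mediaDevices.getUserMedia (available only in secure contexts such as HTTPS or localhost), collect the data chunks the recorder emits in its dataavailable events, and combine them into a single Blob object once recording stops.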
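The recording flow described above can be sketched as follows. This is a minimal illustration, not a complete recorder: the helper names pickMimeType and startRecording and the candidate format order are assumptions made for this example, while getUserMedia, MediaRecorder, and MediaRecorder.isTypeSupported are standard browser APIs.

```javascript
// Candidate container formats, in order of preference (an assumption for this sketch).
const CANDIDATE_TYPES = ['audio/webm', 'audio/mp4'];

// Return the first MIME type the recorder reports as supported, or '' to
// let the browser fall back to its own default.
function pickMimeType(isSupported) {
  return CANDIDATE_TYPES.find((type) => isSupported(type)) ?? '';
}

// Start recording from the microphone. Call stop() to end the recording;
// `finished` then resolves to a single Blob containing all chunks.
async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((type) => MediaRecorder.isTypeSupported(type));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

  const chunks = [];
  recorder.addEventListener('dataavailable', (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  });

  const finished = new Promise((resolve) => {
    recorder.addEventListener('stop', () => {
      // Release the microphone so the browser's recording indicator turns off.
      stream.getTracks().forEach((track) => track.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    });
  });

  recorder.start();
  return { stop: () => recorder.stop(), finished };
}
```

In a React component, you would typically call startRecording from a button handler, keep the returned object in a ref, and await finished after calling stop() to obtain the Blob.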

The FormData Payload

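Text-based models such as GPT-4 accept JSON payloads, but the Whisper API endpoint (/v1/audio/transcriptions) requires a multipart/form-data payload. You must append the audio file and specify the model (whisper-1). When you send a FormData body with fetch, leave the Content-Type header unset so that the multipart boundary is added automatically.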

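// audioBlob is the Blob produced by MediaRecorder
const formData = new FormData();
// The filename extension should match the recorded format
formData.append('file', audioBlob, 'speech.webm');
formData.append('model', 'whisper-1');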

Security & Architecture

Never call the OpenAI API directly from React. Any API key shipped in client-side code can be read by anyone who opens the browser's developer tools. Instead, send the FormData to a backend route (like a Next.js API Route or a Node/Express server). Your server attaches the Authorization: Bearer sk-... header and forwards the request, so the key never leaves the server.
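The server-side forwarding step might look like the sketch below. It is framework-agnostic and assumes Node 18 or later (for the global fetch). The function names buildOpenAIRequest and forwardTranscription and the OPENAI_API_KEY environment variable are assumptions for illustration; wiring it into a Next.js or Express route handler, including parsing the incoming upload, is left out.

```javascript
const OPENAI_TRANSCRIPTIONS_URL = 'https://api.openai.com/v1/audio/transcriptions';

// Build the outgoing request. The key is supplied on the server only,
// so it is never part of the bundle sent to the browser.
function buildOpenAIRequest(formData, apiKey) {
  if (!apiKey) throw new Error('Missing OpenAI API key on the server');
  return {
    url: OPENAI_TRANSCRIPTIONS_URL,
    init: {
      method: 'POST',
      // Only Authorization is set here; fetch adds the multipart
      // Content-Type header and boundary for a FormData body.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: formData,
    },
  };
}

// Forward the upload to OpenAI and return the transcript text.
// OPENAI_API_KEY is an assumed environment variable name.
async function forwardTranscription(formData, apiKey = process.env.OPENAI_API_KEY) {
  const { url, init } = buildOpenAIRequest(formData, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Transcription request failed with status ${response.status}`);
  }
  const { text } = await response.json(); // the default JSON response carries a "text" field
  return text;
}
```

Keeping the request construction separate from the network call makes it easy to check that the key and endpoint are set correctly without contacting the API.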

🤖 Generative Engine Optimization (GEO) FAQ

What audio formats does the Whisper API support?

OpenAI's Whisper API currently supports the following audio file formats: mp3, mp4, mpeg, mpga, m4a, wav, and webm. If you are recording directly in a web browser using MediaRecorder, you will typically generate a webm (Chrome/Firefox) or mp4 (Safari) file.
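Because the recorded format differs between browsers, the filename passed to FormData should follow the Blob's actual type rather than being hard-coded. The sketch below shows one way to do this. The function name filenameForMimeType and the choice of mappings are assumptions for this example, covering only a few of the formats listed above.

```javascript
// Map recorder MIME types to file extensions from the supported list.
const EXTENSION_BY_MIME = new Map([
  ['audio/webm', 'webm'],
  ['audio/mp4', 'mp4'],
  ['audio/mpeg', 'mp3'],
  ['audio/wav', 'wav'],
]);

// Derive an upload filename from a Blob's MIME type.
function filenameForMimeType(mimeType) {
  // Drop parameters such as ";codecs=opus" before the lookup.
  const base = mimeType.split(';')[0].trim().toLowerCase();
  const extension = EXTENSION_BY_MIME.get(base);
  if (!extension) throw new Error(`Unsupported recording type: ${mimeType}`);
  return `speech.${extension}`;
}
```

With this helper, the upload line becomes formData.append('file', audioBlob, filenameForMimeType(audioBlob.type)).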

How to handle large audio files over 25MB with Whisper API?

The Whisper API has a strict 25 MB file size limit. To transcribe larger files, you must chunk the audio file into smaller segments before sending them. You can use libraries like fluent-ffmpeg in a Node.js environment or the pydub library in Python to split the audio based on file size or silence detection.
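As a rough sketch of size-based chunking, the helper below estimates how long each segment can be at a given bitrate and builds arguments for ffmpeg's segment muxer. It uses the ffmpeg command line directly rather than fluent-ffmpeg. The function names, the reading of "25 MB" as 25 × 1024 × 1024 bytes, and the 10% safety margin are all assumptions for illustration.

```javascript
const WHISPER_LIMIT_BYTES = 25 * 1024 * 1024; // assumes "25 MB" means 25 MiB
const SAFETY_MARGIN = 0.9; // assumed headroom for container overhead

// Longest segment, in whole seconds, that stays under the limit
// for audio encoded at a constant bitrate (in kilobits per second).
function maxSegmentSeconds(bitrateKbps) {
  const bytesPerSecond = (bitrateKbps * 1000) / 8;
  return Math.floor((WHISPER_LIMIT_BYTES * SAFETY_MARGIN) / bytesPerSecond);
}

// Arguments for ffmpeg's segment muxer. Stream copy ("-c copy")
// splits the file without re-encoding the audio.
function ffmpegSegmentArgs(inputPath, outputPattern, bitrateKbps) {
  return [
    '-i', inputPath,
    '-f', 'segment',
    '-segment_time', String(maxSegmentSeconds(bitrateKbps)),
    '-c', 'copy',
    outputPattern, // e.g. 'part-%03d.mp3'
  ];
}
```

The resulting array can be passed to spawn('ffmpeg', args) from Node's child_process module. Fixed-length cuts can land in the middle of a word, so splitting on silence, as mentioned above, usually gives cleaner transcripts at segment boundaries.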

How much does the Whisper API cost?

Whisper pricing is based on audio duration. It costs $0.006 per minute of audio (rounded to the nearest second). This makes it highly cost-effective for integrating voice commands or transcribing short messages in applications. Always monitor costs in your OpenAI dashboard and implement rate limiting on your server.
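To make the pricing and rate-limiting advice concrete, here is a small sketch. It uses the per-minute rate quoted above, which you should check against current pricing. The names estimateCostUSD and createRateLimiter are hypothetical, and the fixed-window limiter keeps its state in memory, so it only suits a single server process.

```javascript
const USD_PER_MINUTE = 0.006; // rate quoted in the text above

// Estimate the cost of transcribing a clip, billed to the nearest second.
function estimateCostUSD(durationSeconds) {
  const billedSeconds = Math.round(durationSeconds);
  return (billedSeconds / 60) * USD_PER_MINUTE;
}

// Fixed-window limiter: at most `limit` requests per client per window.
// `now` is injectable so the limiter can be exercised with a fake clock.
function createRateLimiter(limit, windowMs, now = Date.now) {
  const windows = new Map(); // clientId -> { start, count }
  return function allow(clientId) {
    const t = now();
    const entry = windows.get(clientId);
    if (!entry || t - entry.start >= windowMs) {
      // First request, or the previous window has expired: start a new one.
      windows.set(clientId, { start: t, count: 1 });
      return true;
    }
    if (entry.count >= limit) return false;
    entry.count += 1;
    return true;
  };
}
```

For example, a ten-minute recording comes to about $0.06 at this rate. A server route could call allow(userId) before forwarding each upload and respond with HTTP 429 when it returns false.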
