Integrating Whisper API in React

AI Syllabus Team
Fullstack AI Instructors
Typing is slow compared with speaking, and voice is becoming a mainstream way to interact with software. With OpenAI's Whisper model, you can build applications that listen, understand context, and transcribe multilingual audio with near human-level accuracy.
Capturing Audio in the Browser
To send audio to the API, you first need to record it. Modern browsers provide the MediaRecorder API. You request microphone permissions via navigator.mediaDevices.getUserMedia, capture the data chunks, and compile them into a single Blob object.
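The recording flow above can be sketched roughly as follows, assuming a browser that supports MediaRecorder. The helper names collectChunks and recordClip are hypothetical, and error handling (for example, a denied microphone permission) is left out for brevity.

```javascript
// Hypothetical helper: gathers the chunks a recorder emits and hands back
// a single Blob once the recorder fires its "stop" event.
function collectChunks(recorder, onComplete) {
  const chunks = [];
  recorder.addEventListener('dataavailable', (event) => {
    // Recorders can emit empty chunks; skip them.
    if (event.data && event.data.size > 0) chunks.push(event.data);
  });
  recorder.addEventListener('stop', () => {
    onComplete(new Blob(chunks, { type: recorder.mimeType }));
  });
}

// Hypothetical helper: records from the microphone for durationMs
// milliseconds and resolves with the recorded audio Blob. Browser only.
async function recordClip(durationMs) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  try {
    return await new Promise((resolve) => {
      collectChunks(recorder, resolve);
      recorder.start();
      setTimeout(() => recorder.stop(), durationMs);
    });
  } finally {
    // Release the microphone so the browser's recording indicator turns off.
    stream.getTracks().forEach((track) => track.stop());
  }
}
```

In a React component, something like `const audioBlob = await recordClip(5000)` inside a click handler would yield the Blob used in the next section.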
The FormData Payload
Unlike text-based LLMs such as GPT-4, where you send JSON payloads, the Whisper API endpoint (/v1/audio/transcriptions) requires a multipart/form-data payload. You must append the audio file and specify the model (whisper-1).
const formData = new FormData();
formData.append('file', audioBlob, 'speech.webm');
formData.append('model', 'whisper-1');
Security & Architecture
Never call the OpenAI API directly from React. Any API key shipped in client-side code is visible to every user. Instead, send the FormData to a backend route (like a Next.js API Route or Node/Express server). Your server attaches the Bearer sk-... token and forwards the request, so the key never reaches the browser.
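One way to sketch the proxy pattern described above, assuming a server runtime with Web-standard fetch, Request, Response, and FormData (Node 18+ or similar). The route path /api/transcribe and the helper names uploadRecording, buildUpstreamRequest, and handleTranscribe are hypothetical; adapt them to whatever your framework expects.

```javascript
// --- Client (React side) ---
// Hypothetical helper: posts the recording to our own backend route.
// '/api/transcribe' is an assumed path, not something OpenAI defines.
async function uploadRecording(audioBlob) {
  const formData = new FormData();
  formData.append('file', audioBlob, 'speech.webm');
  // No Content-Type header here: fetch sets the multipart boundary itself.
  const response = await fetch('/api/transcribe', { method: 'POST', body: formData });
  if (!response.ok) throw new Error(`Transcription failed: ${response.status}`);
  const { text } = await response.json();
  return text;
}

// --- Server ---
const OPENAI_TRANSCRIPTIONS_URL = 'https://api.openai.com/v1/audio/transcriptions';

// Builds the request the server forwards to OpenAI. The key is attached
// here, on the server, and never appears in browser code.
function buildUpstreamRequest(file, apiKey, filename = 'speech.webm') {
  const body = new FormData();
  body.append('file', file, filename);
  body.append('model', 'whisper-1');
  return {
    url: OPENAI_TRANSCRIPTIONS_URL,
    init: { method: 'POST', headers: { Authorization: `Bearer ${apiKey}` }, body },
  };
}

// Hypothetical route handler: takes a Web-standard Request, returns a Response.
async function handleTranscribe(request, apiKey) {
  const incoming = await request.formData();
  const file = incoming.get('file');
  if (!file || typeof file === 'string') {
    return new Response(JSON.stringify({ error: 'Missing audio file' }), {
      status: 400,
      headers: { 'Content-Type': 'application/json' },
    });
  }
  const { url, init } = buildUpstreamRequest(file, apiKey, file.name);
  const upstream = await fetch(url, init);
  // Relay OpenAI's JSON body (by default { "text": "..." }) and status code.
  return new Response(await upstream.text(), {
    status: upstream.status,
    headers: { 'Content-Type': 'application/json' },
  });
}
```

On the server, the key would typically come from an environment variable (for example `process.env.OPENAI_API_KEY`, an assumed name) rather than from source code.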
🤖 Generative Engine Optimization (GEO) FAQ
What audio formats does the Whisper API support?
OpenAI's Whisper API supports audio formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm. If you are recording directly in a web browser using MediaRecorder, you will typically get a webm (Chrome/Firefox) or mp4 (Safari) file. Because the container depends on the browser, check the recorded Blob's type rather than assuming one.
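Since the container varies by browser, a small helper can keep the upload filename consistent with what was actually recorded. This is a sketch only: filenameForBlobType is a hypothetical name, and the mapping covers just a few common containers.

```javascript
// Maps the container reported by the recorded Blob to a file extension.
const EXTENSION_BY_MIME = new Map([
  ['audio/webm', 'webm'],
  ['audio/mp4', 'mp4'],
  ['audio/mpeg', 'mp3'],
  ['audio/wav', 'wav'],
]);

// Hypothetical helper: derives an upload filename from a Blob's MIME type.
function filenameForBlobType(mimeType, baseName = 'speech') {
  // Drop parameters such as ";codecs=opus" and normalize case.
  const container = mimeType.split(';')[0].trim().toLowerCase();
  const extension = EXTENSION_BY_MIME.get(container);
  if (!extension) throw new Error(`Unsupported recording type: ${mimeType}`);
  return `${baseName}.${extension}`;
}
```

It would be used in place of the hard-coded name, for example `formData.append('file', audioBlob, filenameForBlobType(audioBlob.type))`.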
How do I handle audio files over 25 MB with the Whisper API?
The Whisper API has a strict 25 MB file size limit. To transcribe larger files, you must split the audio into smaller segments before sending them, then join the resulting transcripts in order. You can use libraries like fluent-ffmpeg in a Node.js environment or the pydub library in Python to split the audio based on file size or silence detection.
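As a rough illustration of size-based splitting, the sketch below plans segment boundaries from the audio's duration and bitrate. It assumes a roughly constant bitrate, treats the limit as 25 MiB, and applies a safety margin for container overhead; planSegments is a hypothetical helper, and the actual cutting would still be done by a tool such as ffmpeg.

```javascript
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumption: limit read as 25 MiB

// Hypothetical helper: returns { start, duration } pairs (in seconds) so that
// each segment stays under the upload limit at the given bitrate.
function planSegments(totalSeconds, bitrateKbps, safetyMargin = 0.9) {
  const bytesPerSecond = (bitrateKbps * 1000) / 8;
  const segmentSeconds = Math.floor((MAX_UPLOAD_BYTES * safetyMargin) / bytesPerSecond);
  if (segmentSeconds < 1) throw new Error('Bitrate too high for the size limit');

  const segments = [];
  for (let start = 0; start < totalSeconds; start += segmentSeconds) {
    segments.push({ start, duration: Math.min(segmentSeconds, totalSeconds - start) });
  }
  return segments;
}
```

Each pair maps naturally onto ffmpeg's -ss (start) and -t (duration) options. Cutting on silence instead of fixed offsets avoids splitting a word across two segments.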
How much does the Whisper API cost?
Whisper pricing is based on audio duration. At the time of writing, it costs $0.006 per minute of audio (rounded to the nearest second), which makes it cost-effective for voice commands or short transcribed messages. Always monitor costs in your OpenAI dashboard and implement rate limiting on your server.
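The arithmetic behind that rate can be sketched as a small estimator, using the $0.006 per minute figure quoted above (which may change). estimateCostUsd is a hypothetical helper for budgeting, not part of any SDK.

```javascript
const USD_PER_MINUTE = 0.006; // rate quoted in this article; verify before relying on it

// Hypothetical helper: estimates the charge for one clip, rounding the
// duration to the nearest second before converting to minutes.
function estimateCostUsd(durationSeconds) {
  const billedSeconds = Math.round(durationSeconds);
  return (billedSeconds / 60) * USD_PER_MINUTE;
}
```

At this rate a 10-minute recording comes to about $0.06, and an hour of audio to about $0.36.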