
Voice Transcription With Whisper

Integrate human-level speech recognition into your React applications using OpenAI's powerful Whisper API endpoint.


Integrating the Whisper API in React

Author: AI Syllabus Team, Fullstack AI Instructors

Text input is slow. Voice is the future of human-computer interaction. By leveraging OpenAI's Whisper model, you can build applications that listen, understand context, and transcribe multilingual audio with near human-level accuracy.

Capturing Audio in the Browser

To send audio to the API, you first need to record it. Modern browsers provide the MediaRecorder API. You request microphone permissions via navigator.mediaDevices.getUserMedia, capture the data chunks, and compile them into a single Blob object.
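A minimal sketch of that flow, assuming it runs in response to user actions such as start/stop buttons (the startRecording and stopRecording names are illustrative, not from the article):

let mediaRecorder;
let chunks = [];

async function startRecording() {
  // getUserMedia prompts the user for microphone permission
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  chunks = [];
  mediaRecorder = new MediaRecorder(stream);
  mediaRecorder.ondataavailable = (event) => chunks.push(event.data);
  mediaRecorder.start();
}

function stopRecording() {
  // Resolves with a single Blob compiled from the recorded chunks
  return new Promise((resolve) => {
    mediaRecorder.onstop = () => resolve(new Blob(chunks, { type: 'audio/webm' }));
    mediaRecorder.stop();
  });
}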

The FormData Payload

Unlike text-based LLMs such as GPT-4, where you send JSON payloads, the Whisper API endpoint (/v1/audio/transcriptions) requires a multipart/form-data payload. You must append the audio file and specify the model (whisper-1).

const formData = new FormData();
formData.append('file', audioBlob, 'speech.webm');
formData.append('model', 'whisper-1');
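From there, a hedged client-side sketch might post that payload to your own backend rather than to OpenAI directly (the /api/transcribe route and the transcribe function are illustrative assumptions; the reasoning follows in the next section):

async function transcribe(audioBlob) {
  const formData = new FormData();
  formData.append('file', audioBlob, 'speech.webm');
  formData.append('model', 'whisper-1');

  // The browser sets the multipart/form-data boundary automatically
  const response = await fetch('/api/transcribe', { method: 'POST', body: formData });
  const { text } = await response.json();
  return text;
}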

Security & Architecture

Never call the OpenAI API directly from your React client. Your API key would be exposed to anyone inspecting the network traffic. Instead, send the FormData to a backend route (such as a Next.js API Route or a Node/Express server). Your server attaches the Bearer sk-... token and forwards the request safely.
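As one possible shape for that proxy, here is a minimal sketch assuming an Express server with multer for multipart parsing and Node 18+ (which provides fetch, FormData, and Blob globally); the route name, port, and error handling are illustrative:

const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

app.post('/api/transcribe', upload.single('file'), async (req, res) => {
  try {
    // Rebuild the multipart payload server-side and attach the secret key
    const formData = new FormData();
    formData.append(
      'file',
      new Blob([req.file.buffer], { type: req.file.mimetype }),
      req.file.originalname
    );
    formData.append('model', 'whisper-1');

    const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
      method: 'POST',
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      body: formData,
    });

    const data = await response.json();
    res.json(data); // default json response: { "text": "..." }
  } catch (err) {
    res.status(500).json({ error: 'Transcription failed' });
  }
});

app.listen(3001);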

Frequently Asked Questions

What audio formats does the Whisper API support?

OpenAI's Whisper API currently supports the following audio file formats: mp3, mp4, mpeg, mpga, m4a, wav, and webm. If you are recording directly in a web browser using MediaRecorder, you will typically generate a webm (Chrome/Firefox) or mp4 (Safari) file.
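As a small sketch of handling that browser difference (assuming an async context; variable names are illustrative):

// Pick a container the current browser can actually record, so the uploaded
// filename extension matches the real format (webm vs. mp4)
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mimeType = MediaRecorder.isTypeSupported('audio/webm')
  ? 'audio/webm' // Chrome, Firefox
  : 'audio/mp4'; // Safari
const recorder = new MediaRecorder(stream, { mimeType });
const filename = `speech.${mimeType.split('/')[1]}`; // e.g. speech.webm or speech.mp4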

How do I handle audio files larger than 25 MB with the Whisper API?

The Whisper API has a strict 25 MB file size limit. To transcribe larger files, you must chunk the audio file into smaller segments before sending them. You can use libraries like fluent-ffmpeg in a Node.js environment or the pydub library in Python to split the audio based on file size or silence detection.
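For the Node.js route, a rough sketch with fluent-ffmpeg might look like this (it assumes an ffmpeg binary is installed; the 10-minute segment length and file names are illustrative and should be tuned so each chunk stays under 25 MB):

const ffmpeg = require('fluent-ffmpeg');

ffmpeg('long_recording.mp3')
  .outputOptions([
    '-f segment',        // split the input into sequential files
    '-segment_time 600', // roughly 10-minute chunks
    '-c copy',           // copy the audio stream without re-encoding
  ])
  .output('chunk_%03d.mp3')
  .on('end', () => console.log('Chunks ready for /v1/audio/transcriptions'))
  .on('error', (err) => console.error(err))
  .run();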

How much does the Whisper API cost?

Whisper pricing is based on audio duration. It costs $0.006 per minute of audio (rounded to the nearest second). This makes it highly cost-effective for integrating voice commands or transcribing short messages in applications. Always monitor costs in your OpenAI dashboard and implement rate limiting on your server.

API Parameter Glossary

MediaRecorder
A browser API that captures audio/video streams into Blob objects.
FormData
An interface providing a way to construct a set of key/value pairs representing form fields and their values.
whisper-1
The model identifier passed in the model field when calling the OpenAI audio transcription endpoint.
response_format
Optional parameter to specify output. Supports json, text, srt, verbose_json, or vtt.