
OpenAI has launched new AI models for voice generation and transcription, promising major improvements in realism, accuracy, and usability.
The updated models, now available via OpenAI's API, are part of the company's push to build AI-powered agents capable of handling complex user interactions on their own.
One of the key advancements is the new text-to-speech model, gpt-4o-mini-tts, which OpenAI claims can produce more natural-sounding speech with better emotional range.
Developers can now instruct the AI on how to express itself, such as speaking in a calming tone or even mimicking the enthusiasm of a sports commentator.
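As a rough illustration, a request like the following sketch shows how that style control might look through OpenAI's Python SDK; the delivery direction rides in the instructions parameter, while the voice choice, input text, and output filename here are placeholder assumptions, not details from the announcement.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request speech with an explicit delivery instruction.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # placeholder; any built-in voice works
    input="Your maintenance request has been received.",
    instructions="Speak in a calm, reassuring tone.",
)

# Save the returned audio bytes to disk.
with open("reply.mp3", "wb") as f:
    f.write(response.content)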
Jeff Harris, a product team member at OpenAI, highlighted the significance of this development, saying, "Our big belief here is that developers and users want to really control not just what is spoken, but how things are spoken."
According to TechCrunch, this could reshape industries such as customer service, where the AI can adjust its tone to match the context of a conversation.
In addition to its voice generation improvements, OpenAI has introduced gpt-4o-transcribe and gpt-4o-mini-transcribe, which are designed to replace the older Whisper model.
These transcription models are trained on diverse, high-quality audio datasets, enabling them to accurately capture spoken words—even in noisy environments or with strong accents.
Harris noted that these models significantly reduce errors and hallucinations—where AI generates false words or phrases.
This was a major issue with Whisper, which sometimes inserted incorrect details into transcripts. He explained, "Making sure the models are accurate is completely essential to getting a reliable voice experience."
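For developers, the new transcription models are a drop-in sketch away from the familiar audio API pattern Whisper used; in the minimal example below, the audio filename is a placeholder.

```python
from openai import OpenAI

client = OpenAI()

# "call_recording.wav" is a placeholder audio file.
with open("call_recording.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```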
OpenAI Keeps Latest AI Transcription Models Proprietary, Citing Complexity
Unlike previous OpenAI transcription models, these new versions will not be openly available.
While Whisper was accessible under an MIT license, OpenAI has decided to keep gpt-4o-transcribe and gpt-4o-mini-transcribe proprietary due to their complexity and scale.
"They're not the kind of model that you can just run locally on your laptop," Harris stated. He emphasized that OpenAI is being cautious about releasing models as open source, prioritizing thoughtful deployment.
The improvements in AI voice and transcription models open doors for various industries. Call centers, virtual assistants, and interactive voice response (IVR) systems can now offer more personalized and human-like interactions.
According to Inc., OpenAI also introduced an integration with its Agents SDK, allowing developers to convert text-based AI agents into voice-driven assistants with minimal additional code.
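A hedged sketch of that integration, following the pattern in the Agents SDK voice documentation (installed via pip install 'openai-agents[voice]'): the agent name and instructions below are illustrative, and a silent audio buffer stands in for real microphone input.

```python
import asyncio
import numpy as np
from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# A plain text-based agent; the voice pipeline wraps it without code changes.
agent = Agent(
    name="Leasing Assistant",
    instructions="Answer tenant questions about leasing and maintenance.",
)

async def main() -> None:
    # VoicePipeline handles transcription, the agent turn, and speech output.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

    # Placeholder input: three seconds of silence at 24 kHz.
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    result = await pipeline.run(AudioInput(buffer=buffer))

    # Stream synthesized audio chunks as they arrive.
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            pass  # play or save event.data here

asyncio.run(main())
```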
Companies like EliseAI have already tested the new voice technology, using it to enhance tenant communication in the housing sector.
Co-founder Tony Stoyanov praised the AI's effectiveness, stating that its "emotionally rich voices encourage callers to comfortably engage with AI for leasing inquiries, maintenance requests, and tour scheduling."