OpenAI has made its Realtime API generally available and launched gpt-realtime, a next-generation speech-to-speech model with new voices, image and phone support, stronger benchmarks, lower pricing, and enterprise-ready safeguards—bringing natural, production-ready voice agents to real-world deployment. (Source: Image by RR)

Zillow, T-Mobile, and StubHub Already Using Realtime API for Customer Experience

OpenAI has made its Realtime API generally available and released gpt-realtime, its most advanced speech-to-speech model yet. The upgrades mark a major step toward making production-ready voice agents practical for enterprises, adding support for remote MCP servers, image input, and phone calling over Session Initiation Protocol (SIP). According to an article on openai.com, these features let developers build AI systems that interact more naturally with users while integrating into existing business infrastructure.

The new gpt-realtime model delivers substantial improvements in intelligence, naturalness, and instruction following. It can handle complex requests, repeat disclaimers word-for-word, read alphanumeric sequences across multiple languages, and even switch tone or accent mid-sentence. Two new voices, Marin and Cedar, debut with the launch, bringing more human-like expressiveness. Benchmarks highlight the leap in performance over OpenAI's previous realtime model from December 2024: 82.8% accuracy on the Big Bench Audio reasoning test (up from 65.6%), 30.5% on the MultiChallenge instruction-following benchmark (up from 20.6%), and 66.5% on ComplexFuncBench for function calling (up from 49.7%).

Unlike traditional voice pipelines that chain separate speech-to-text and text-to-speech models, the Realtime API processes and generates audio directly through a single model, reducing latency and preserving nuance in speech. Early enterprise partners, including Zillow, T-Mobile, StubHub, Oscar Health, and Lemonade, are already using the technology to power real-world use cases such as guiding home searches, handling customer support, and managing insurance or healthcare conversations. Zillow’s head of AI called the improvements a way to make digital interactions feel “as natural as a conversation with a friend.”

OpenAI is also lowering costs and expanding safety controls. Pricing for gpt-realtime is 20% lower than for the earlier gpt-4o-realtime-preview model, at $32 per million audio input tokens and $64 per million audio output tokens. Developers can reuse prompts across sessions and use new tools for managing long conversations cost-effectively. On the safety front, the system integrates active classifiers to halt misuse, preset voices to avoid impersonation, and enterprise-level privacy commitments including EU data residency. Together, these updates make gpt-realtime and the Realtime API a foundation for scaling conversational AI in production environments.
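
To make the pricing concrete, here is a minimal sketch in Python that estimates audio costs from the two rates quoted above and backs out the earlier price implied by a 20% cut. The function names and the token counts in the usage note are hypothetical, chosen only for illustration. The sketch ignores text tokens, cached-input discounts, and any other billing details not covered in this article.

```python
from decimal import Decimal

# Rates quoted in the article, in USD per one million audio tokens.
AUDIO_INPUT_PER_MILLION = Decimal("32")
AUDIO_OUTPUT_PER_MILLION = Decimal("64")
MILLION = Decimal(1_000_000)


def estimate_audio_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the audio cost in USD (hypothetical helper, audio tokens only)."""
    total = (
        Decimal(input_tokens) * AUDIO_INPUT_PER_MILLION
        + Decimal(output_tokens) * AUDIO_OUTPUT_PER_MILLION
    ) / MILLION
    # Round to four decimal places for readability.
    return total.quantize(Decimal("0.0001"))


def price_before_cut(current_price: Decimal, cut_percent: Decimal) -> Decimal:
    """Back out the earlier price implied by a percentage cut."""
    earlier = current_price / (Decimal(1) - cut_percent / Decimal(100))
    return earlier.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical workload: 250,000 audio input and 100,000 audio output tokens.
    print(estimate_audio_cost(250_000, 100_000))
    # Earlier per-million rates implied by the 20% reduction.
    print(price_before_cut(AUDIO_INPUT_PER_MILLION, Decimal("20")))
    print(price_before_cut(AUDIO_OUTPUT_PER_MILLION, Decimal("20")))
```

By this arithmetic, a 20% cut to $32 and $64 implies earlier rates of $40 and $80 per million audio tokens, and the hypothetical workload above would cost $14.40 in audio tokens.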

Read more at openai.com