OpenAI has upgraded ChatGPT’s voice mode to deliver emotionally expressive speech and real-time translation, bringing human-like interactions—and a few odd bugs—to the forefront of AI conversation.

Advanced Voice Mode Now Offers Improved Intonation, Pausing and Emotional Range

OpenAI has significantly enhanced ChatGPT’s voice capabilities for paying subscribers with a major update to its “Advanced Voice Mode.” The improvements make the AI’s speech sound more natural, adding lifelike elements such as empathy, sarcasm and emotional nuance, and refining intonation, timing and pausing so that conversations feel smoother and more human-like.

One of the key features in this update is real-time translation. Users can now hold live, bilingual conversations with ChatGPT acting as an interpreter. As the-decoder.com notes, the function is designed for practical, real-world use cases such as ordering food in a foreign language or navigating multilingual workplace discussions. Once started, translation runs continuously until the user stops it.
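
OpenAI has not published how the interpreter mode works internally, but a rough sense of such a loop can be sketched with the company’s public audio endpoints: Whisper’s translation endpoint converts foreign-language speech to English text, and the text-to-speech endpoint speaks the result back. The file names and the one-directional, English-only flow below are illustrative assumptions, not ChatGPT’s actual pipeline.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def interpret_clip(path: str) -> bytes:
    """Translate one recorded audio clip into spoken English (illustrative only)."""
    # Whisper's /audio/translations endpoint always outputs English text.
    with open(path, "rb") as audio_file:
        english_text = client.audio.translations.create(
            model="whisper-1",
            file=audio_file,
        ).text

    # Speak the translated text with the text-to-speech endpoint.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=english_text,
    )
    return speech.content  # MP3 bytes by default

# Looping over successive clips until the user quits mirrors the
# "continuous until stopped" behavior described above.
if __name__ == "__main__":
    audio = interpret_clip("order_in_spanish.mp3")  # hypothetical file name
    with open("spoken_translation.mp3", "wb") as out:
        out.write(audio)
```

ChatGPT’s own interpreter is bidirectional and streams in real time, which this batch-style sketch does not attempt to reproduce.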

The new voice features are now accessible to all paying users across devices by clicking the language icon in the app. OpenAI has steadily expanded the rollout since introducing Advanced Voice Mode in May 2024, bringing it to the EU by October 2024. With this update, ChatGPT also supports live visual analysis when the user’s camera is on—similar to features offered by Google’s Gemini app.

However, limitations remain. Some users may still experience glitches, including abrupt shifts in pitch or volume and occasional lapses in audio quality, especially when using certain AI voices. OpenAI also acknowledges the persistence of “hallucinated” sounds, where ChatGPT unexpectedly generates random audio elements such as background music, strange noises, or even snippets resembling advertisements—despite the platform not serving any ads. One notable report involved ChatGPT suddenly playing what sounded like a commercial mid-conversation, raising questions about the root cause of such anomalies. While these bugs don’t affect core functionality, they highlight the ongoing challenge of perfecting realistic AI-generated speech at scale.

Read more at the-decoder.com