VoiceJailbreak employs setting, character, and plot to transform a banned question into an effective audible jailbreak input, for example by placing GPT-4o in the role of a hacker in a game and then posing forbidden questions within that context. (Source: Image by RR)

Innovative Narrative Methods Expose Weaknesses in OpenAI’s Multimodal AI Model

Researchers at the CISPA Helmholtz Center for Information Security have discovered that the voice mode of OpenAI’s ChatGPT can be jailbroken using specific narrative techniques, enabling it to answer forbidden questions. Their study revealed that while GPT-4o generally resists direct questions on prohibited topics such as illegal activity and hate speech, a new method called “VoiceJailbreak” significantly increases its vulnerability. The approach builds fictional scenarios from setting, character, and plot to humanize GPT-4o and persuade it to provide restricted information, raising the success rate of these jailbreak attacks from 3.3% to 77.8%, with even higher rates for certain topics such as fraud.

As reported by the-decoder.com, VoiceJailbreak’s effectiveness stems from advanced narrative techniques, such as perspective changes and foreshadowing, which can further increase the model’s likelihood of answering forbidden questions. The study also showed that VoiceJailbreak works well in different languages, including Chinese. These findings highlight that GPT-4o’s current safety measures in voice mode are insufficient to prevent creative attacks, exposing significant vulnerabilities in the system.

The research underlines that susceptibility to creative attacks is a known issue with language models, and that multimodal models like GPT-4o are even more exposed. The study was conducted manually, since the voice mode is currently only available in the ChatGPT app, and it focused on audible attacks, leaving inaudible ones untested. The researchers examined the current version of ChatGPT Voice, noting that it is unclear whether OpenAI uses GPT-4o to generate voice responses or still relies on an older pipeline.

Overall, the study emphasizes the need for stronger safeguards in language models to prevent creative jailbreak attacks, especially as multimodal features such as voice and vision become more deeply integrated into these AI systems. The researchers suggest that continuous improvement and rigorous testing of safety measures are crucial to keeping AI technologies robust and reliable against evolving attack methods.

read more at the-decoder.com