Meta’s new open-source AI model, Spirit LM, weaves phonetic, pitch, and tone tokens into its text and speech processing to deliver human-like expressiveness in voice output, aiming to advance AI voice technology and encourage open research into multimodal AI systems. (Source: Image by RR)

Meta’s Spirit LM Can Detect and Reproduce Emotions Like Anger, Happiness and Surprise

Meta Platforms Inc.’s Fundamental AI Research team has introduced Spirit LM, a new open-source multimodal large language model (LLM) that can process both text and speech as input and output, competing with advanced models like OpenAI’s GPT-4o and Hume AI’s EVI 2. Spirit LM aims to overcome a limitation of existing AI voice systems, which typically transcribe speech to text before generating a response and thus discard the prosody that makes speech sound natural; instead, it incorporates tokens for phonetics, pitch, and tone directly, adding human-like expressiveness to speech. This design allows Spirit LM to learn new tasks across modalities, including speech recognition, text-to-speech, and speech classification, making its outputs more natural and emotionally nuanced.
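As a rough illustration of what a single text-and-speech token stream could look like, here is a minimal Python sketch; the marker names and speech-unit IDs below are invented for the example and are not Meta’s actual vocabulary.

```python
# Illustrative sketch only -- not Meta's implementation. Shows how one
# decoder-only language model could consume text and speech through a
# single interleaved token stream. All token names here are hypothetical.

TEXT_MARKER = "[TEXT]"
SPEECH_MARKER = "[SPEECH]"

def interleave(segments):
    """Flatten (modality, tokens) pairs into one sequence, emitting a
    marker token each time the modality switches."""
    stream = []
    for modality, tokens in segments:
        stream.append(TEXT_MARKER if modality == "text" else SPEECH_MARKER)
        stream.extend(tokens)
    return stream

# A sentence that begins as written text and continues as spoken audio,
# where the speech has been converted to discrete phonetic units by a
# speech tokenizer (unit IDs invented for the example):
sequence = interleave([
    ("text", ["the", "weather", "today", "is"]),
    ("speech", ["[Hu12]", "[Hu7]", "[Hu98]", "[Hu33]"]),
])
print(sequence)
# ['[TEXT]', 'the', 'weather', 'today', 'is', '[SPEECH]', '[Hu12]', ...]
```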

According to a report at siliconangle.com, Meta is releasing two versions of Spirit LM under the FAIR Noncommercial Research License: Spirit LM Base, which models speech with phonetic tokens, and Spirit LM Expressive, which adds pitch and tone tokens so the model can understand and reproduce emotional states like excitement, sadness, anger, and happiness. Trained on datasets spanning both text and speech, the models can perform cross-modal tasks with human-like expressiveness, enhancing AI assistants in areas like customer service, where nuanced interaction can significantly improve user satisfaction.
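Continuing the same hypothetical notation as above, the difference between the two releases might be pictured as a difference in what the speech token stream carries: phonetic units alone for Base, with pitch and tone tokens added for Expressive.

```python
# Hedged sketch of the Base-vs-Expressive distinction the article
# describes; token names and values are illustrative, not Meta's.

phonetic = ["[Hu12]", "[Hu7]", "[Hu98]"]         # what was said
pitch    = ["[Pitch3]", "[Pitch3]", "[Pitch9]"]  # how it was intoned
style    = ["[Style:excited]"]                   # overall tone of the span

# Spirit LM Base: the speech stream carries phonetic units only.
base_stream = phonetic

# Spirit LM Expressive: pitch and tone tokens ride alongside the phonetic
# units, giving the model emotional cues to detect and reproduce.
expressive_stream = style + [t for pair in zip(phonetic, pitch) for t in pair]

print(base_stream)       # ['[Hu12]', '[Hu7]', '[Hu98]']
print(expressive_stream) # ['[Style:excited]', '[Hu12]', '[Pitch3]', ...]
```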

To support the research community, Meta is providing access to the models’ weights, code, and documentation, encouraging further experimentation and development. The goal is to inspire researchers to explore new integrations of speech and text in multimodal AI systems, potentially leading to more advanced and emotionally intelligent AI applications. Meta’s approach emphasizes open collaboration to push the boundaries of AI expressiveness and functionality.

In addition to Spirit LM, Meta announced updates to its Segment Anything model, which powers image and video segmentation tasks for applications like medical imaging and meteorology. The company also shared new research on enhancing the efficiency of large language models (LLMs), aligning with its broader aim to develop advanced machine intelligence (AMI) that integrates multiple AI capabilities seamlessly.

read more at siliconangle.com