Meta’s new open-source AI model, Spirit LM, weaves phonetic, pitch, and tone tokens into its text and speech processing to deliver human-like expressiveness in voice output, aiming to advance AI voice technology and encourage open research into multimodal AI systems. (Source: Image by RR)

Meta’s Spirit LM Can Detect and Reproduce Emotions Like Anger, Happiness and Surprise

Meta Platforms Inc.’s Fundamental AI Research team has introduced Spirit LM, a new open-source multimodal large language model (LLM) that can process both text and speech as input and output, competing with advanced models like OpenAI’s GPT-4o and Hume AI’s EVI 2. Spirit LM aims to overcome a limitation of existing AI voice systems, which typically transcribe speech to text before generating a response and thus discard the prosody that makes speech sound natural; instead, it incorporates tokens for phonetics, pitch, and tone directly, adding human-like expressiveness to speech. This design allows Spirit LM to learn new tasks across modalities, including speech recognition, text-to-speech, and speech classification, making its outputs more natural and emotionally nuanced.
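As a rough illustration of what a single text-and-speech token stream could look like, here is a minimal Python sketch; the marker names and speech-unit IDs below are invented for the example and are not Meta’s actual vocabulary.

```python
# Illustrative sketch only -- not Meta's implementation. Shows how one
# decoder-only language model could consume text and speech through a
# single interleaved token stream. All token names here are hypothetical.

TEXT_MARKER = "[TEXT]"
SPEECH_MARKER = "[SPEECH]"

def interleave(segments):
    """Flatten (modality, tokens) pairs into one sequence, emitting a
    marker token each time the modality switches."""
    stream = []
    for modality, tokens in segments:
        stream.append(TEXT_MARKER if modality == "text" else SPEECH_MARKER)
        stream.extend(tokens)
    return stream

# A sentence that begins as written text and continues as spoken audio,
# where the speech has been converted to discrete phonetic units by a
# speech tokenizer (unit IDs invented for the example):
sequence = interleave([
    ("text", ["the", "weather", "today", "is"]),
    ("speech", ["[Hu12]", "[Hu7]", "[Hu98]", "[Hu33]"]),
])
print(sequence)
# ['[TEXT]', 'the', 'weather', 'today', 'is', '[SPEECH]', '[Hu12]', ...]
```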

According to a report at siliconangle.com, Meta is releasing two versions of Spirit LM under the FAIR Noncommercial Research License: Spirit LM Base, which models speech with phonetic tokens, and Spirit LM Expressive, which adds pitch and tone tokens so the model can understand and reproduce emotional states like excitement, sadness, anger, and happiness. Trained on datasets spanning both text and speech, the models can perform cross-modal tasks with human-like expressiveness, enhancing AI assistants in areas like customer service, where nuanced interaction can significantly improve user satisfaction.
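Continuing the same hypothetical notation as above, the difference between the two releases might be pictured as a difference in what the speech token stream carries: phonetic units alone for Base, with pitch and tone tokens added for Expressive.

```python
# Hedged sketch of the Base-vs-Expressive distinction the article
# describes; token names and values are illustrative, not Meta's.

phonetic = ["[Hu12]", "[Hu7]", "[Hu98]"]         # what was said
pitch    = ["[Pitch3]", "[Pitch3]", "[Pitch9]"]  # how it was intoned
style    = ["[Style:excited]"]                   # overall tone of the span

# Spirit LM Base: the speech stream carries phonetic units only.
base_stream = phonetic

# Spirit LM Expressive: pitch and tone tokens ride alongside the phonetic
# units, giving the model emotional cues to detect and reproduce.
expressive_stream = style + [t for pair in zip(phonetic, pitch) for t in pair]

print(base_stream)       # ['[Hu12]', '[Hu7]', '[Hu98]']
print(expressive_stream) # ['[Style:excited]', '[Hu12]', '[Pitch3]', ...]
```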

To support the research community, Meta is providing access to the models’ weights, code, and documentation, encouraging further experimentation and development. The goal is to inspire researchers to explore new integrations of speech and text in multimodal AI systems, potentially leading to more advanced and emotionally intelligent AI applications. Meta’s approach emphasizes open collaboration to push the boundaries of AI expressiveness and functionality.

In addition to Spirit LM, Meta announced updates to its Segment Anything model, which powers image and video segmentation tasks for applications like medical imaging and meteorology. The company also shared new research on enhancing the efficiency of large language models (LLMs), aligning with its broader aim to develop advanced machine intelligence (AMI) that integrates multiple AI capabilities seamlessly.

read more at siliconangle.com