Fugatto, a groundbreaking generative AI model, revolutionizes audio creation by enabling users to generate, transform and customize music, voices and soundscapes through text prompts, offering unparalleled versatility for industries like music, gaming and advertising. (Source: Image by RR)

Generate, Transform, and Evolve Soundscapes With Text Prompts Using Fugatto

Fugatto, a generative AI model for audio developed by NVIDIA researchers, serves as a “Swiss Army knife” for sound: users can generate or transform music, voices and soundscapes through text prompts. It can create music snippets, modify songs by adding or removing instruments, change the accent or emotion of a voice, and produce sounds that have never been heard before. These capabilities open up new possibilities for creative industries, from music production to advertising and gaming.

Fugatto has 2.5 billion parameters and was trained on NVIDIA DGX systems. It shows emergent capabilities that let it blend free-form instructions into highly customizable outputs. It also handles more structured tasks, such as creating soundscapes that evolve over time or transforming audio along several attributes at once. Its temporal interpolation feature lets users generate sounds that change over time, such as a thunderstorm giving way to birdsong at dawn, with fine-grained control over how the soundscape evolves. According to blogs.nvidia.com, these features make Fugatto a foundational model for audio synthesis and transformation.
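
To make the idea of temporal interpolation concrete, here is a minimal Python sketch of a linear crossfade between two sound descriptions over time. It is an illustration only and not Fugatto's actual method or API: the function names and the two toy "conditioning vectors" standing in for "thunderstorm" and "birdsong" are assumptions made up for this example.

```python
def interpolation_weights(num_steps):
    """Return (storm_weight, birds_weight) pairs that shift linearly over time."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    weights = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
        weights.append((1.0 - t, t))
    return weights


def blend(storm_vec, birds_vec, num_steps):
    """Blend two hypothetical conditioning vectors, one blended vector per time step."""
    schedule = interpolation_weights(num_steps)
    return [
        [w_s * s + w_b * b for s, b in zip(storm_vec, birds_vec)]
        for w_s, w_b in schedule
    ]


# Toy vectors (assumed): the first stands for "thunderstorm", the second for "birdsong".
timeline = blend([1.0, 0.0], [0.0, 1.0], num_steps=3)
```

In this sketch the first step is pure "thunderstorm", the last is pure "birdsong", and the steps in between are weighted mixes. A real system could use a different schedule or blend in a different space entirely.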

The potential applications for Fugatto span multiple industries. Music producers can rapidly prototype new ideas, try different instruments or styles, and enhance audio quality. Advertisers can create localized campaigns by adjusting accents or tones for different audiences, while game developers can modify sound assets to align with real-time gameplay. The model’s ability to synthesize personalized voices for language learning tools or tailor audio for professional use cases, such as scientific or legal applications, further highlights its versatility.

Developed by a diverse, global team, Fugatto represents a milestone in generative AI for audio. The project required over a year of collaboration and a multifaceted approach to compiling millions of audio samples for training. The researchers’ innovative methods enabled Fugatto to perform new tasks and improve its accuracy without needing additional data. From its first demonstration of generating music to creating electronic beats synchronized with barking dogs, Fugatto has set the stage for the future of creative and technical sound applications.

Read more at blogs.nvidia.com