V2A technology can be paired with video generation models like Veo to create soundtracks for generated, archival, and other traditional footage. (Source: Image by RR)

V2A AI Innovation Brings Creative Control Over Sound to Filmmakers and Content Creators

Video generation models have advanced significantly, but many still produce silent videos. To address this, new video-to-audio (V2A) technology can create synchronized soundtracks using video pixels and text prompts. As noted on deepmind.google, V2A can generate a wide range of soundscapes, such as dramatic scores, realistic sound effects, or dialogue that matches the characters and tone of a video, and it works with both generated and traditional footage, like silent films or archival material.

V2A offers enhanced creative control, allowing users to generate an unlimited number of soundtracks for any video. Users can define ‘positive’ or ‘negative’ prompts to steer the generated audio toward or away from specific sounds, enabling rapid experimentation and selection of the best match for their needs. This flexibility makes the technology adaptable to a wide range of creative requirements.
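The article does not describe how positive and negative prompts are combined internally. One common way such steering is implemented in diffusion systems is classifier-free-guidance-style blending of the model's conditioned and unconditioned predictions; the sketch below illustrates that idea only. The function name, the guidance weights, and the assumption that V2A works this way are all hypothetical, not confirmed by the source.

```python
import numpy as np

def guided_prediction(pred_uncond: np.ndarray,
                      pred_pos: np.ndarray,
                      pred_neg: np.ndarray,
                      w_pos: float = 7.5,
                      w_neg: float = 2.0) -> np.ndarray:
    """Blend denoiser outputs so generation is pushed toward the
    'positive' prompt and away from the 'negative' one.

    pred_uncond: model prediction with no prompt conditioning
    pred_pos / pred_neg: predictions conditioned on each prompt
    w_pos / w_neg: guidance strengths (illustrative values)
    """
    return (pred_uncond
            + w_pos * (pred_pos - pred_uncond)   # pull toward positive prompt
            - w_neg * (pred_neg - pred_uncond))  # push away from negative prompt

# Toy usage on small arrays standing in for denoiser outputs:
uncond = np.zeros(4)
pos = np.ones(4)
neg = -np.ones(4)
steered = guided_prediction(uncond, pos, neg, w_pos=1.0, w_neg=1.0)
```

With equal unit weights, the result moves one step toward the positive direction and one step away from the negative one, which is why a larger `w_neg` suppresses unwanted sounds more aggressively.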

The V2A system employs a diffusion-based approach for audio generation, which starts by encoding video input into a compressed format. The diffusion model iteratively refines audio from random noise, guided by visual input and natural language prompts, to produce synchronized and realistic sound. Additional training data, such as AI-generated annotations and dialogue transcripts, enhances the model’s ability to associate specific audio events with visual scenes, improving the overall quality and accuracy of the generated audio.
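The pipeline described above — encode the video, then iteratively refine audio from random noise under visual and text guidance — can be sketched as a toy loop. This is a minimal illustration of the diffusion idea, not DeepMind's implementation: the encoder, the denoising rule, and all names and dimensions here are invented for demonstration.

```python
import numpy as np

def encode_video(frames: np.ndarray, dim: int = 16) -> np.ndarray:
    """Toy stand-in for the video encoder: project raw frames
    into a small compressed conditioning vector."""
    rng = np.random.default_rng(0)  # fixed projection for reproducibility
    proj = rng.standard_normal((frames.size, dim)) / np.sqrt(frames.size)
    return frames.reshape(-1) @ proj

def denoise_step(audio: np.ndarray, video_code: np.ndarray,
                 prompt_code: np.ndarray) -> np.ndarray:
    """Hypothetical denoiser: nudge the noisy audio a little toward a
    target derived from the visual and text conditioning."""
    target = np.tanh(video_code.mean() + prompt_code.mean())
    return audio - 0.1 * (audio - target)

def generate_audio(frames: np.ndarray, prompt_code: np.ndarray,
                   steps: int = 50, n_samples: int = 256,
                   seed: int = 1) -> np.ndarray:
    """Diffusion-style generation: start from pure noise and refine it
    step by step, guided by the encoded video and the prompt."""
    rng = np.random.default_rng(seed)
    video_code = encode_video(frames)
    audio = rng.standard_normal(n_samples)  # start from random noise
    for _ in range(steps):
        audio = denoise_step(audio, video_code, prompt_code)
    return audio

# Toy usage: 4 tiny 8x8 "frames" and an 8-dim "prompt embedding".
frames = np.random.default_rng(2).standard_normal((4, 8, 8))
prompt_code = np.random.default_rng(3).standard_normal(8)
waveform = generate_audio(frames, prompt_code)
```

Each step shrinks the gap between the current sample and the conditioned target, mirroring how a real diffusion model removes a little noise per iteration while staying anchored to the video and prompt.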

Despite its progress, V2A technology faces challenges, including maintaining audio quality when video input is distorted and improving lip synchronization for speech videos. The development team is committed to addressing these issues and ensuring the technology’s responsible use. They are gathering feedback from creators and filmmakers and incorporating safeguards like the SynthID toolkit to watermark AI-generated content. Before public release, the technology will undergo rigorous safety assessments and testing to ensure its positive impact on the creative community.

read more at deepmind.google