By teaching a robot to learn lip movements through observation rather than programming, Columbia researchers have taken a major step toward emotionally expressive humanoids that can communicate more naturally with humans. (Source: Image by RR)

Columbia Researchers Use Observational Learning to Teach Robot Lip Motion

Researchers at Columbia University’s School of Engineering and Applied Science have developed a robot that can learn realistic lip movements for speech and singing—by watching itself and humans on video rather than following preprogrammed rules. Published in Science Robotics, the study demonstrates a major advance in humanoid facial expression, an area long considered one of robotics’ most difficult challenges, since nearly-realistic faces that move imperfectly tend to fall into the uncanny valley.

The robot, according to an article in techxplore.com, first learned how its own face worked by observing its reflection in a mirror. Equipped with 26 facial motors and flexible skin, it generated thousands of random expressions to understand how motor actions translated into visible facial movements. This process created a foundational “vision-to-action” model, allowing the robot to associate specific motor activations with particular facial appearances, much like a child experimenting with expressions for the first time.
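The babbling-then-inversion idea in that paragraph can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the landmark count, the linear face simulator, and the least-squares fit are stand-ins for the robot's real camera pipeline and learned model; only the 26-motor count comes from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 26      # the robot's 26 facial motors (from the article)
N_LANDMARKS = 40   # hypothetical count of tracked facial landmark coordinates

# Hypothetical stand-in for the physical face: an unknown linear mapping
# from motor activations to observed landmark positions. Babbling is how
# the robot discovers this relationship without being told it.
TRUE_MAP = rng.normal(size=(N_LANDMARKS, N_MOTORS))

def observe_face(motor_cmd):
    """Simulate what the camera sees for a given motor command."""
    return TRUE_MAP @ motor_cmd

# 1) Motor babbling: issue thousands of random expressions and record
#    (motor command, observed face) pairs, as the robot did with a mirror.
commands = rng.uniform(-1.0, 1.0, size=(5000, N_MOTORS))
observations = commands @ TRUE_MAP.T

# 2) Fit an inverse "vision-to-action" model mapping appearance back to
#    motor activations. A least-squares fit stands in for the learned model.
inverse_model, *_ = np.linalg.lstsq(observations, commands, rcond=None)

# 3) Given a target facial appearance, recover the motors that produce it.
target_cmd = rng.uniform(-1.0, 1.0, size=N_MOTORS)
target_face = observe_face(target_cmd)
recovered_cmd = target_face @ inverse_model
```

Because the simulated face is linear, the fit recovers the commanded expression exactly; the real robot has to learn a nonlinear model from pixels, but the self-supervised loop (act randomly, watch the result, invert the mapping) is the same.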

Next, the researchers exposed the robot to hours of YouTube videos showing humans talking and singing. By observing how mouths moved in response to different sounds and phonemes, the robot learned to translate audio directly into coordinated lip movements—without understanding the meaning of the speech itself. The result was a system capable of lip-syncing across languages and even performing songs, including tracks from its AI-generated album titled hello world.
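The meaning-free audio-to-mouth mapping described above can be sketched as a simple phoneme-to-viseme lookup. This table and the phoneme labels are illustrative assumptions, not the paper's actual learned model, which maps raw audio to coordinated motor trajectories end to end.

```python
# Hypothetical phoneme-to-viseme table: each sound unit maps to a mouth
# shape, with no representation of what the words mean.
PHONEME_TO_VISEME = {
    "AA": "open",       # as in "father"
    "B":  "closed",     # bilabial stop: lips pressed together
    "P":  "closed",
    "M":  "closed",
    "F":  "lip_teeth",  # labiodental fricative
    "V":  "lip_teeth",
    "W":  "rounded",    # rounded lips (a sound the article notes is hard)
    "UW": "rounded",
    "IY": "wide",       # as in "see"
}

def lip_sync(phonemes):
    """Translate a phoneme sequence into a sequence of mouth shapes,
    falling back to a neutral pose for sounds not in the table."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# "hello world" as a rough, assumed phoneme transcription.
frames = lip_sync(["HH", "EH", "L", "OW", "W", "ER", "L", "D"])
```

A lookup like this also makes the language-independence point concrete: any phoneme stream produces lip motion, whatever language it came from.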

While the lip-syncing is not yet perfect—certain sounds like “B” and “W” remain challenging—the researchers see this work as a crucial step toward more emotionally resonant robots. They argue that facial affect is the missing link in human–robot interaction, especially as humanoids move into education, healthcare, entertainment and elder care. Combined with conversational AI systems, realistic facial gestures could dramatically deepen human–robot connection, though the team cautions that such powerful social technologies must be developed carefully.

Read more at techxplore.com