LIBS makes lip reading more accurate.

Some Pose Questions about Ethics, Profitability of Lip Reading

Major League Baseball and the Houston Astros are investigating new allegations that the Texas team illegally used electronic equipment to steal opponents’ pitching signs during their championship season two years ago. Sign stealing has been a problem ever since baseball started using signals to tell pitchers whether to throw a fastball or a curveball. Now it may become even easier to see what the manager or base coach is saying.

Kyle Wiggers of venturebeat.com wrote about AI’s ability to read lips from videos, which he explains is not a brand new idea:

“AI and machine-learning algorithms capable of reading lips from videos aren’t anything out of the ordinary, in truth. Back in 2016, researchers from Google and the University of Oxford detailed a system that could annotate video footage with 46.8% accuracy, outperforming a professional human lip-reader’s 12.4% accuracy.”

In pursuit of a higher-performing system, researchers at Alibaba, Zhejiang University, and the Stevens Institute of Technology devised a method dubbed Lip by Speech (LIBS), which uses features extracted from speech recognizers as complementary clues. They say it achieves industry-leading accuracy on two benchmarks, besting the baseline by margins of 7.66% and 2.75% in character error rate.
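Character error rate, the metric those improvements are reported in, is simply the edit distance between a predicted transcript and the reference transcript, divided by the length of the reference. Here is a minimal sketch of the calculation (illustrative only, not the authors’ evaluation code; the example strings are made up):

```python
# Character error rate (CER): Levenshtein edit distance between the predicted
# and reference transcripts, normalized by the reference length.

def edit_distance(ref: str, hyp: str) -> int:
    """Single-row dynamic-programming Levenshtein distance over characters."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion (reference char missing)
                        dp[j - 1] + 1,      # insertion (extra char predicted)
                        prev + (r != h))    # substitution, or free match
            prev = cur
    return dp[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

ref = "place the blue block there now"
hyp = "place the blu block their no"
print(f"CER = {character_error_rate(ref, hyp):.3f}")
```

A lower CER means fewer character-level insertions, deletions, and substitutions are needed to turn the model’s prediction into the ground-truth transcript.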

Wiggers describes how this could help a wide range of people. LIBS and other solutions like it could help people who are hard of hearing follow videos that lack subtitles. An estimated 466 million people worldwide have disabling hearing loss, about 5% of the world’s population, and the World Health Organization projects that the number could rise to more than 900 million by 2050.

LIBS distills useful audio information from videos of human speakers at multiple scales: the sequence level, the context level, and the frame level. It then aligns this information with the video data by identifying the correspondence between the two (because of different sampling rates, and blanks that sometimes appear at the beginning or end, the audio and video sequences have inconsistent lengths), and it applies a filtering technique to refine the distilled features.
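As a rough illustration of that alignment-and-filtering idea (this is not the authors’ implementation; the shapes, the dot-product attention, and the sigmoid gate below are all assumptions made for the sketch), one can picture each video frame attending over the audio-derived features and then gating how much of the result it absorbs:

```python
# Illustrative sketch: audio features distilled from a speech recognizer and
# video features have different lengths, so each video frame attends over the
# audio sequence to find the frames that correspond to it, and a simple gate
# "filters" how much of the distilled feature is mixed back in.
import numpy as np

rng = np.random.default_rng(0)
T_video, T_audio, d = 75, 120, 8                 # mismatched lengths, feature dim
video_feats = rng.normal(size=(T_video, d))
audio_feats = rng.normal(size=(T_audio, d))      # stand-in for distilled audio features

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Frame-level alignment: attention weights of every video frame over audio frames.
scores = video_feats @ audio_feats.T / np.sqrt(d)        # (T_video, T_audio)
weights = softmax(scores, axis=-1)
aligned_audio = weights @ audio_feats                    # (T_video, d)

# "Filtering": a sigmoid gate decides how much of the aligned audio feature
# to add to each video frame's representation.
gate = 1.0 / (1.0 + np.exp(-(video_feats * aligned_audio).sum(-1, keepdims=True)))
enhanced = video_feats + gate * aligned_audio            # (T_video, d)
print(enhanced.shape)
```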

Both the speech recognizer and lip reader components of LIBS are based on an attention-based sequence-to-sequence architecture, an approach originally developed for machine translation that maps an input sequence (e.g., audio or video frames) to an output sequence, using an attention mechanism to decide which parts of the input inform each output step.
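A bare-bones sketch of a single attention-based decoding step may make that concrete. Everything here, weights included, is a random placeholder rather than anything from LIBS; a trained model would learn these parameters and run one such step per output character:

```python
# One attention-based sequence-to-sequence decoding step (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
T, d, vocab = 75, 8, 30                      # encoder length, hidden size, character set
encoder_states = rng.normal(size=(T, d))     # one vector per video (or audio) frame
decoder_state = rng.normal(size=(d,))        # decoder hidden state at this step
W_out = rng.normal(size=(2 * d, vocab))      # output projection (placeholder weights)

# Attention: score each encoder state against the decoder state, normalize,
# and take the weighted sum as the context vector for this output step.
scores = encoder_states @ decoder_state / np.sqrt(d)
weights = np.exp(scores - scores.max()); weights /= weights.sum()
context = weights @ encoder_states

# Predict a distribution over the next character from [decoder state; context].
logits = np.concatenate([decoder_state, context]) @ W_out
probs = np.exp(logits - logits.max()); probs /= probs.sum()
print("predicted character id:", int(probs.argmax()))
```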

It’s a technical article containing complex information, but it could be of use, especially for scouts seeking the Washington Nationals’ pitch signs from this year’s World Series.

read more at venturebeat.com