The fake human faces were created by algorithms after real people’s faces and bodies were scanned in for training purposes. (Source: Datagen)

Sometimes It’s Better to Be Fake Than Real in AI Profiles

According to a story on MIT’s, a growing number of AI companies are creating synthetic human data to train deep learning models for clients ranging from finance and insurance firms to health care companies.

Synthesis AI, based in San Francisco, creates fake humans, as does Datagen, an Israeli company. One reason for the strategy is that real data is flawed and riddled with privacy issues. As the story notes:

“You can produce perfectly labeled faces, say, of different ages, shapes, and ethnicities to build a face-detection system that works across populations.”

The drawback is that synthetic data needs to be based on reality, or it could cause problems. According to Cathy O’Neil, a data scientist and founder of the algorithmic auditing firm ORCAA:

“What I don’t want to do is give the thumbs up to this paradigm and say, ‘Oh, this will solve so many problems,’ because it will also ignore a lot of things.”

Datagen tries to avoid that kind of problem by giving detailed instructions for scanning the people whose bodies are put into the system. The scans are then altered by age, BMI and ethnicity so that a range of humans is represented, and the company has them perform a range of actions so that their physical avatars seem real. Datagen’s algorithms then expand the data into hundreds of thousands of combinations. The output is then checked again: fake faces are plotted against real faces to check realism.
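
To give a rough sense of how a handful of scans can grow into so many combinations, here is a minimal sketch in Python. It is not Datagen’s method. The attribute values, the action names and the function expand_scan are all hypothetical, chosen only to show how crossing a few attributes multiplies the number of variants.

```python
from itertools import product

# Hypothetical attribute values. The article names age, BMI and ethnicity
# as the altered attributes but gives no actual values or categories.
AGES = [20, 35, 50, 65]
BMIS = [18.5, 24.0, 30.0]
ETHNICITY_LABELS = ["group_a", "group_b", "group_c"]
ACTIONS = ["walk", "sit", "reach"]  # hypothetical action names


def expand_scan(scan_id):
    """Return one variant description per attribute combination for one scan."""
    return [
        {"scan_id": scan_id, "age": age, "bmi": bmi, "ethnicity": eth, "action": act}
        for age, bmi, eth, act in product(AGES, BMIS, ETHNICITY_LABELS, ACTIONS)
    ]


variants = expand_scan("subject_001")
print(len(variants))  # 4 * 3 * 3 * 3 = 108 variants from a single scan
```

With these made-up values, one scan yields 108 variants, so a few thousand scans would already reach the hundreds of thousands of combinations the article mentions.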

“Datagen is now generating facial expressions to monitor driver alertness in smart cars, body motions to track customers in cashier-free stores, and irises and hand motions to improve the eye- and hand-tracking capabilities of VR headsets. The company says its data has already been used to develop computer-vision systems serving tens of millions of users.”

Synthetic people are being used for everything from inspecting cars to simulating the data that a diverse population would generate and predicting whether fraudulent activity is occurring.