Nvidia Says Synthetic Data Will Be the Wave of the Future for Machine Learning
Zuckerberg may have the metaverse, but Nvidia has created the Omniverse. They are two very different worlds, of course, and Omniverse will probably be far more useful to programmers, researchers, and the like. Omniverse generates synthetic data that can be used to train AI algorithms.
At spectrum.ieee.org this week we found an intriguing, clearly stated explanation of the advantages of synthetic data from Rev Lebaredian, vice president of simulation technology and Omniverse engineering at Nvidia.
Some say synthetic data is what will unlock the true potential of AI. Synthetic data is generated rather than collected, and the consultancy Gartner has estimated that by 2024, 60 percent of the data used to train AI systems will be synthetic. But its use is controversial: questions remain about whether synthetic data can accurately mirror real-world data and prepare AI systems for real-world situations.
Eliza Strickland interviewed the Nvidia VP for her article, beginning with this question:
Q 1: The Omniverse Replicator is described as “a powerful synthetic data generation engine that produces physically simulated synthetic data for training neural networks.” Can you explain what that means, and especially what you mean by “physically simulated”?
Rev Lebaredian: “Video games are essentially simulations of fantastic worlds. There are attempts to make the physics of games somewhat realistic: When you blow up a wall or a building, it crumbles. But for the most part, games aren’t trying to be truly physically accurate, because that’s computationally very expensive. So it’s always about: What approximations are you willing to do in order to make it tractable as a computing problem? A video game typically has to run on a small computer, like a console or even on a phone. So you have those severe constraints. The other thing with games is that they’re fantasy worlds and they’re meant to be fun, so real-world physics and accuracy is not necessarily a great thing.
With Omniverse, our goal is to do something that really hasn’t been done before in real-time world simulators. We’re trying to make a physically accurate simulation of the world. And when we say physically accurate, we mean all aspects of physics that are relevant. How things look in the physical world is the physics of how light interacts with matter, so we simulate that. We simulate how atoms interact with each other with rigid-body physics, soft-body physics, fluid dynamics, and whatever else is relevant. Because we believe that if you can simulate the real world closely enough, then you gain superpowers.”
In other words, where video games cut corners on physics to stay cheap and fun, Omniverse aims to simulate the real world faithfully enough that the synthetic data it produces can stand in for data that would otherwise have to be collected from the real world.
Q 2: Okay, so that’s what you’re trying to build with Omniverse. How does all this help with AI?
Lebaredian: “In this new era of AI, developing advanced software is no longer something that just a grad student with a laptop can do. It requires serious investment. All the most advanced algorithms that mankind will develop in the future are going to be trained by systems that require a lot of data. That’s why people say data is the new oil. And it seems like the big tech companies that collect data have a natural advantage. But the truth is that for most of the AI that we’re going to create in the future, none of the data we have collected is that useful.
What we’re building with Omniverse is a very general development platform that anyone can take and customize for their particular needs. Out of the box, you get multiple renderers, which are simulators of the physics of light and matter. You get a spectrum of them that let you trade off accuracy for speed.”
Lebaredian has a way of explaining things that helps even laypeople understand synthetic data and the role it will play in all of our futures. The data it generates will likely find its way into all manner of AI-driven products you use. It is already being used to train autonomous vehicles and robots, and it is expected to help address algorithmic bias inherited from real-world datasets.
Read more at spectrum.ieee.org