AI Research SuperCluster Will Encrypt Data, Stay Off the Internet in Powering Metaverse
Mark Zuckerberg and his team at Meta have built a supercomputer. It's easy to be skeptical, but it looks like the CEO of the platform, with its roughly 3 billion users, has a winner.
By the middle of this year, when an expansion of the system is complete, it will be the fastest AI supercomputer around, Meta researchers Kevin Lee and Shubho Sengupta predicted in a recent blog post. The AI Research SuperCluster (RSC) will one day work with neural networks that have trillions of parameters, they write. The number of parameters in neural network models has been growing rapidly. The natural language processing model GPT-3, for example, has 175 billion parameters, and such sophisticated AIs are only expected to grow.
Samuel K. Moore, writing for IEEE Spectrum (spectrum.ieee.org), says that the RSC could be one of the fastest computers on the planet, if not the fastest. Because it is made up of racks and racks of processors and takes up most of a warehouse, it's hard to call it a computer: it's really a network of computers focused on the same goal, creating and maintaining the Metaverse.
At its simplest, training a neural network is a matter of repetition. The algorithm is shown example after example, such as photos or math formulas, and is told which of its answers are correct and which are incorrect. That takes a lot of time and energy. Meta thinks its RSC will drastically cut the time, energy, and cost of doing Meta business.
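To make "training by repetition" concrete, here is a toy sketch in Python. It is not how Meta trains its models; it assumes the simplest possible "network", a single linear neuron y = w*x + b, and hypothetical data generated from the rule y = 2x + 1. Each pass (epoch) shows the model every example again, measures how wrong its guesses are, and nudges the two parameters to reduce that error using plain gradient descent.

```python
# Toy illustration of training by repetition (hypothetical example, not Meta's code):
# a single linear neuron y = w*x + b is shown the same labeled examples many
# times and adjusts its parameters after each pass to shrink its error.

def train(examples, epochs=2000, lr=0.01):
    """Fit w and b with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):  # each epoch is one more pass over every example
        grad_w = grad_b = 0.0
        for x, y in examples:
            error = (w * x + b) - y  # how far off the current guess is
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= lr * grad_w  # move the parameters in the direction that lowers error
        b -= lr * grad_b
    return w, b

# Hypothetical training data drawn from the rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b = train(data)
print(f"w={w:.3f}, b={b:.3f}")  # prints w=2.000, b=1.000
```

Real models repeat the same loop with billions of parameters and enormous data sets, which is why the number of passes and the hardware running them dominate training time.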
And there are others with the same idea in mind. Start-ups such as Cerebras and SambaNova were launched in part to address training times.
The idea of merging computers to work on a single project is certainly not new. But building a new type of network to coordinate these synchronized computers is another story. Here are a few of the improved numbers from Moore's article.
Compared with the AI research cluster Meta uses today, which was designed in 2017, RSC changes the number of GPUs involved, how they communicate, and the storage attached to them.
“In early 2020, we decided the best way to accelerate progress was to design a new computing infrastructure from a clean slate to take advantage of new GPU and network fabric technology. We wanted this infrastructure to be able to train models with more than a trillion parameters on data sets as large as an exabyte—which, to provide a sense of scale, is the equivalent of 36,000 years of high-quality video.”
The old system connected 22,000 Nvidia V100 Tensor Core GPUs. The new one switches to Nvidia's latest data-center GPU, the A100, which has dominated recent benchmark tests of AI systems. At present, the new system is a cluster of 760 Nvidia DGX A100 computers with a total of 6,080 GPUs, bound together by an Nvidia 200-gigabit-per-second InfiniBand network. Storage includes 46 petabytes (46 million billion bytes) of cache storage and 175 petabytes of bulk flash storage.
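The figures above hang together with a little arithmetic. This Python sketch checks them, assuming eight A100 GPUs per DGX A100 system (the standard configuration of that machine) and decimal units (1 petabyte = 10^15 bytes, 1 byte = 8 bits). The per-link bandwidth line treats the quoted 200 Gb/s as the rate of a single network link, which is an assumption about how the figure is meant.

```python
# Back-of-the-envelope check of the RSC figures quoted in the article.
DGX_SYSTEMS = 760
GPUS_PER_DGX_A100 = 8  # assumption: standard DGX A100 holds eight A100 GPUs

total_gpus = DGX_SYSTEMS * GPUS_PER_DGX_A100
print(total_gpus)  # prints 6080

PETABYTE = 10**15  # decimal petabyte, in bytes
cache_bytes = 46 * PETABYTE
# "46 million billion bytes": 46 * 10^6 * 10^9
print(cache_bytes == 46 * 10**6 * 10**9)  # prints True

bulk_pb = 175
print(f"Bulk flash is about {bulk_pb / 46:.1f}x the cache tier")  # about 3.8x

# Assumption: 200 Gb/s is the rate of one InfiniBand link; 8 bits per byte.
link_gbytes_per_s = 200 / 8
print(link_gbytes_per_s)  # prints 25.0 (gigabytes per second per link)
```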
For laypeople, those are intimidating numbers. For Meta, they mean major computing power for the next phase of the company's growth into the Metaverse.
“Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform—the metaverse, where AI-driven applications and products will play an important role,” they write.
The amount of power this project will bring is jaw-dropping. Compared with the old cluster, Meta says RSC runs computer vision workflows up to 20 times faster and trains large natural language processing models three times faster.
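Speedup factors are easy to misstate as percentages. The small Python sketch below shows the conversion: a job that runs N times faster takes 1/N of the time, which is an (N - 1) x 100 percent increase in speed, so "three times faster" means 200% faster. The job lengths used here are hypothetical and chosen only to illustrate the arithmetic.

```python
# Converting "N times faster" into new run time and percent faster.
def speedup_summary(old_time: float, factor: float) -> tuple:
    """Return (new_time, percent_faster) for a job sped up by `factor`."""
    new_time = old_time / factor
    percent_faster = (factor - 1) * 100  # 3x faster is 200% faster, not 300%
    return new_time, percent_faster

# Hypothetical job lengths, for illustration only.
print(speedup_summary(9, 3))    # 9-week NLP job at 3x: (3.0, 200)
print(speedup_summary(20, 20))  # 20-hour vision job at 20x: (1.0, 1900)
```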
“The experiences we’re building for the metaverse require enormous compute power (quintillions of operations / second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more,” Meta CEO and cofounder Mark Zuckerberg said in a statement.
Moore’s article is detailed and exciting to read. It sure looks like Zuck hit it out of the park, again.
Read more at spectrum.ieee.org