The use of synthetic data is expected to skyrocket in the next two years.

Synthetic Data Expected to Transform Programming by Generating Precise Data

Today’s story is about data, and in particular synthetic data. It’s a little bit like clothing material. We used to use ground-grown, hand-processed cotton or wool to make our clothing, and then came synthetic materials. That changed the whole picture for apparel. According to experts, synthetic data will do the same thing to the digital world by 2024.

Here is what an article from Forbes had to say about synthetic data: Synthetic data is an elegantly simple concept, one of those ideas that seems almost too good to be true. In a nutshell, synthetic data technology enables practitioners to simply digitally generate the data that they need, on-demand, in whatever volume they require, tailored to their precise specifications.

On the surface, that all seems on the up and up. But what if you are building a program that won’t produce the exact results you were shooting for? Do you then create the data synthetically? Is that ethical?

According to a widely referenced Gartner study, 60% of all data used in the development of AI will be synthetic rather than real by 2024. Take a moment to digest this. This is a striking prediction.

Data is the foundation of the modern economy. It is, in the words of The Economist, “the world’s most valuable resource.” And within a few short years, the majority of the data used for AI may come from a disruptive new source—one that few companies today understand or even know about.

“We can simply say that the total addressable market of synthetic data and the total addressable market of data will converge,” said Ofir Zuk, CEO/cofounder of synthetic data startup Datagen.

Yes, synthetic data, and the ability to produce and use it, will mean big money for investors and big profits from the results. But will it be ethical?

The autonomous vehicle sector was where the technology first found serious commercial adoption, starting in the mid-2010s. It was used as a way to train autonomous vehicles. The AV sector had the capital and the machine learning talent to apply to developing it.

“Collecting real-world driving data for every conceivable scenario an autonomous vehicle might encounter on the road is simply not possible. Given how unpredictable and unbounded the world is, it would take literally hundreds of years of real-world driving to collect all the data required to build a truly safe autonomous vehicle.

So instead, AV companies developed sophisticated simulation engines to synthetically generate the requisite volume of data and efficiently expose their AI systems to the “long tail” of driving scenarios. These simulated worlds make it possible to automatically produce thousands or millions of permutations of any imaginable driving scenario—e.g., changing the locations of other cars, adding or removing pedestrians, increasing or decreasing vehicle speeds, adjusting the weather, and so on.”
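To make the idea of scenario permutations concrete, here is a minimal Python sketch. It is not how any AV company's simulation engine actually works; the `Scenario` class, the parameter names, and the values in `PARAMETER_GRID` are all hypothetical, chosen only to mirror the examples in the quote above (other cars, pedestrians, vehicle speed, weather).

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One synthetic driving scenario (hypothetical fields, for illustration)."""
    lead_car_distance_m: int
    pedestrians: int
    ego_speed_kph: int
    weather: str


# Hypothetical parameter grid: every value here is an assumption.
PARAMETER_GRID = {
    "lead_car_distance_m": [5, 20, 50],
    "pedestrians": [0, 1, 3],
    "ego_speed_kph": [30, 60, 90],
    "weather": ["clear", "rain", "fog", "snow"],
}


def all_scenarios(grid):
    """Yield every combination of the parameter values in the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield Scenario(**dict(zip(keys, values)))


def sample_scenarios(grid, count, seed=0):
    """Draw `count` random scenarios, useful when the full grid is too large."""
    rng = random.Random(seed)
    return [
        Scenario(**{key: rng.choice(values) for key, values in grid.items()})
        for _ in range(count)
    ]


scenarios = list(all_scenarios(PARAMETER_GRID))
print(len(scenarios))  # 3 * 3 * 3 * 4 = 108 combinations
```

Even this toy grid yields 108 distinct scenarios from four short lists. Adding a fifth parameter with ten values would give 1,080, which hints at how a real simulator could reach thousands or millions of variations.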

The first batch of synthetic data startups that emerged thus targeted the autonomous vehicle end market. This included companies like Applied Intuition (most recently valued at $3.6 billion), Parallel Domain, and Cognata.

The Forbes columnist predicts that synthetic data will transform the economics, ownership, strategic dynamics, and even (geo)politics of data. And there’s hope it will create data that’s ethical and honest.