OpenAI’s DALL-E Signals a Paradigm Shift

AI Gets Interpretive with the DALL-E Image-Generating Program

Rob Toews of Forbes.com wrote a column on AI last week that ended with this statement: “Things are only going to get more amazing from here.”

Judging by OpenAI’s new model, that is an understatement.

Earlier this month, OpenAI—the research organization behind last summer’s much-hyped language model GPT-3—released a new AI model named DALL-E. While it has generated less buzz than GPT-3 did, DALL-E has even more profound implications for the future of AI.

DALL-E uses text captions as input and produces original images as output. (The name is a tribute to the surrealist artist Salvador Dalí and the adorable Pixar robot WALL-E.)

For instance, when fed phrases as diverse as “a pentagonal green clock,” “a sphere made of fire” or “a mural of a blue pumpkin on the side of a building,” DALL-E is able to generate shockingly accurate visual renderings. (It is worth taking a few minutes to play around with some examples yourself.)

Why is DALL-E important?

As an early step toward a new AI paradigm known as “multimodal AI,” DALL-E represents the future of AI. Multimodal AI systems can interpret, synthesize, and translate between multiple informational modalities; in DALL-E’s case, language and imagery. DALL-E is not the first multimodal system, but it is the most advanced to date.

OpenAI co-founder Ilya Sutskever summed it up well:

“The world isn’t just text. Humans don’t just talk: we also see. A lot of important context comes from looking.”

Most AI systems in existence today deal with only one type of data. NLP models (e.g., GPT-3) handle only text; computer vision models (e.g., facial recognition systems) handle only images. This is a far less rich form of intelligence than what the human brain achieves effortlessly.

Humans continuously receive and integrate information from not one but five senses—we understand the world around us through a combination of sight, sound, touch, smell and taste. And we communicate information back to the world in a variety of ways—speech, text, body language, facial expression, music.

By pairing an understanding of natural language with an ability to generate corresponding visual representations—in other words, by being able to both “read” and “see”—DALL-E is a powerful demonstration of multimodal AI’s potential.

Toews predicts that future AI systems will “engage seamlessly across audio, video, speech, images, written text, haptics, and beyond.”

About the Author: Paul Morris

Morris has a background in multimedia and is a published author. He has also spent years in radio broadcasting and the music industry, and is now a contributor to Seeflection.com.
