ChatGPT and other large language models perform better when given optimization prompts written in natural language, DeepMind researchers found. (Source: Adobe Stock)

DeepMind Researchers Succeed with Model to Teach AI How to Perform Better

Under the headline “Tell AI Model to ‘Take a Deep Breath,’” a recent arstechnica.com story shows that sometimes the solution to poor AI performance is to leverage another AI whose job is to improve it. In DeepMind’s experiments, ChatGPT and PaLM 2 solved math problems more accurately when an optimizer AI told them what to do in natural language.

“In a paper called ‘Large Language Models as Optimizers’ listed this month on arXiv.org, DeepMind scientists introduced Optimization by PROmpting (OPRO), a method to improve the performance of large language models (LLMs) such as OpenAI’s ChatGPT and Google’s PaLM 2. This new approach sidesteps the limitations of traditional math-based optimizers by using natural language to guide LLMs in problem-solving. ‘Natural language’ is a fancy way of saying everyday human speech.”

OPRO works through natural-language “meta-prompts” rather than traditional mathematical optimization. The large language model proposes candidate solutions, and OPRO “tests them by assigning each a quality score.” The meta-prompt contains a natural-language description of the task along with previously generated solutions ranked by their accuracy scores; the best solutions are added back into the meta-prompt for the next round. Through this trial-and-error loop, the optimizer keeps generating higher-scoring prompts until it settles on instructions that work well. Natural-language prompts like “let’s think step-by-step” help the system become more accurate. It’s almost as if OPRO is a coach for ChatGPT and other LLMs like it.
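To make that loop concrete, here is a minimal sketch of how an OPRO-style optimizer could be wired up in Python. This is not DeepMind’s code: the call_llm() helper is a hypothetical stand-in for whatever LLM API is available, and the scoring rule and meta-prompt wording are simplified assumptions. Only the structure follows the description above: score candidate instructions, feed the ranked history back into the meta-prompt, and ask the optimizer for a better instruction.

```python
# Hedged sketch of an OPRO-style optimization loop.
# call_llm() is a hypothetical placeholder, not a real library function.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (e.g., to ChatGPT or PaLM 2)."""
    raise NotImplementedError("Connect this to an actual LLM endpoint.")

def score_instruction(instruction: str, eval_set: list[tuple[str, str]]) -> float:
    """Score an instruction by how many small math problems the scorer LLM answers correctly."""
    correct = 0
    for question, answer in eval_set:
        reply = call_llm(f"{instruction}\n\nQ: {question}\nA:")
        if answer in reply:
            correct += 1
    return correct / len(eval_set)

def build_meta_prompt(scored: list[tuple[str, float]]) -> str:
    """Meta-prompt: a task description plus past instructions with their scores, best last."""
    history = "\n".join(
        f"text: {text}\nscore: {score:.2f}"
        for text, score in sorted(scored, key=lambda pair: pair[1])
    )
    return (
        "Your task is to write an instruction that helps a language model "
        "solve grade-school math word problems.\n"
        "Below are previous instructions and their accuracy scores (higher is better):\n"
        f"{history}\n"
        "Write a new instruction that is different from the ones above and scores higher."
    )

def opro_loop(eval_set, seed="Let's think step by step.", rounds=5):
    """Trial-and-error loop: propose, score, and fold the results back into the meta-prompt."""
    scored = [(seed, score_instruction(seed, eval_set))]
    for _ in range(rounds):
        candidate = call_llm(build_meta_prompt(scored)).strip()
        scored.append((candidate, score_instruction(candidate, eval_set)))
    return max(scored, key=lambda pair: pair[1])  # best instruction found so far
```

In the paper, the optimizer model and the scorer model can be different LLMs, and each round can propose several candidate instructions at once; the sketch collapses that to one model and one candidate per round to keep the loop readable.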

“What ‘reasoning’ they do (and ‘reasoning’ is a contentious term among some, though it is readily used as a term of art in AI) is borrowed from a massive data set of language phrases scraped from books and the web. That includes things like Q&A forums, which include many examples of ‘let’s take a deep breath’ or ‘think step by step’ before showing more carefully reasoned solutions. Those phrases may help the LLM tap into better answers or produce better examples of reasoning or problem-solving from the data set it absorbed into its neural network during training.”

Because they are trained on text from the internet, the models “understand” human language and its implications, and they may become more accurate as optimizers like OPRO find the magic words that get them to work better.

read more at arstechnica.com