Researchers from Carnegie Mellon and Tel Aviv University found that ‘in-context learning’ (ICL), which provides a large language model with many examples directly in the prompt, can be more effective than traditional fine-tuning, particularly as context windows expand to hold hundreds or thousands of examples across diverse tasks. (Source: Image by RR)

AI’s New Frontier: Employing Thousands of Examples for Enhanced Machine Learning

Researchers from Carnegie Mellon and Tel Aviv University have found that “in-context learning” (ICL) with large language models (LLMs), where many examples are included directly in the prompt, can be more effective than the traditional method of fine-tuning the models. The approach benefits in particular from the larger context windows of recent LLMs, which allow hundreds or even thousands of examples in a single prompt. This is especially useful for tasks with a wide range of possible answers. Using a retrieval algorithm to select the most relevant examples from a dataset further enhances ICL’s effectiveness, because only the most pertinent examples are presented to the model.
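To make the retrieval step concrete, here is a minimal sketch of how relevant examples might be selected and assembled into a prompt. It is not the study's implementation: the article does not name the retrieval algorithm, so a simple bag-of-words cosine similarity stands in for it, and the dataset, the `retrieve` and `build_prompt` helpers, and the "Input/Label" prompt layout are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, examples, k):
    """Return the k labeled examples most similar to the query.

    Assumption: similarity is plain bag-of-words cosine, used here only
    as a stand-in for whatever retriever a real system would use.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        examples,
        key=lambda ex: cosine(q, Counter(tokenize(ex[0]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, demos):
    """Format the demonstrations, then the unanswered query, as one prompt."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Hypothetical toy dataset of (text, label) pairs.
EXAMPLES = [
    ("How do I reset my password?", "account"),
    ("My card was charged twice this month", "billing"),
    ("The app crashes when I open settings", "bug"),
    ("Please refund the duplicate charge on my card", "billing"),
    ("I forgot my password and cannot log in", "account"),
]

query = "I was charged twice on my card"
demos = retrieve(query, EXAMPLES, k=2)
prompt = build_prompt(query, demos)
print(prompt)
```

With a long-context model, `k` could be raised from a handful to hundreds or thousands, at which point, according to the study, the exact selection matters less.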

The study also examined how the gains from additional examples diminish as their number grows. With larger sets, the specific examples chosen and the order in which they appear become less critical. This suggests that longer prompts are inherently more robust, which reduces the dependency on fine-tuning, a process that traditionally requires extensive datasets and can be time-consuming. The finding highlights the potential for ICL to achieve high performance even without the model learning the task in the traditional sense; instead, the model draws on the provided examples to generate answers.

Further experiments with variants of the Llama-2-7B and Mistral-7B language models that can process very long texts showed that ICL with a large number of examples can effectively replace both retrieval methods and fine-tuning. The choice between ICL and fine-tuning hinges on cost: fine-tuning incurs a higher one-time cost, while ICL demands more computing power at inference time to process the numerous examples. The researchers suggest that it may be more cost-effective to use many-shot prompts until results are reliable, and only then commit to fine-tuning.
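The cost trade-off can be illustrated with a simple break-even calculation. This is a sketch under stated assumptions rather than the researchers' cost model: it assumes fine-tuning has a fixed up-front cost plus a small per-query cost, that many-shot ICL has only a (larger) per-query cost, and every number below is hypothetical and in arbitrary cost units.

```python
import math


def icl_total_cost(num_queries, icl_cost_per_query):
    """Total cost of many-shot ICL: the long prompt is paid for on every query."""
    return num_queries * icl_cost_per_query


def finetune_total_cost(num_queries, finetune_cost, ft_cost_per_query):
    """Total cost of fine-tuning: one-time training cost plus short-prompt queries."""
    return finetune_cost + num_queries * ft_cost_per_query


def break_even_queries(finetune_cost, icl_cost_per_query, ft_cost_per_query):
    """Smallest query count at which fine-tuning is no more expensive than ICL.

    Returns None when ICL is not more expensive per query, in which case
    fine-tuning never pays for itself under this simplified model.
    """
    extra_per_query = icl_cost_per_query - ft_cost_per_query
    if extra_per_query <= 0:
        return None
    return math.ceil(finetune_cost / extra_per_query)


# Hypothetical figures, chosen only for illustration.
FINETUNE_COST = 50.0        # one-time training cost
ICL_COST_PER_QUERY = 0.02   # long many-shot prompt on every query
FT_COST_PER_QUERY = 0.001   # short prompt against the fine-tuned model

n = break_even_queries(FINETUNE_COST, ICL_COST_PER_QUERY, FT_COST_PER_QUERY)
print(f"Fine-tuning breaks even after about {n} queries")
```

Below the break-even point, many-shot prompting is the cheaper option under this model, which matches the suggestion to stay with ICL until the task and its expected query volume are well understood.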

The findings align with similar research from Google DeepMind, which also found that using a large number of examples in prompts substantially improves the performance of LLMs. As models continue to improve at handling longer inputs, the study predicts that long-context ICL will become an increasingly powerful method for a broad range of AI tasks, offering a feasible alternative to fine-tuning that trades training cost against inference cost.