Improving LLMs when scaling is infeasible

Blog by Ai4Value software developer Otto Westerlund

There’s no doubt that LLM technology, led by the efforts of OpenAI, has revolutionized how many people think about AI and how it will impact our future world. The transformer architecture invented by Google, which underpins foundation models like GPT, marks a significant breakthrough: it allows massive amounts of data to be processed and encoded into neural networks. But simply scaling up the models has started to show diminishing returns, in the sense that it is becoming economically infeasible to just make the neural networks bigger and bigger. AI labs are already looking at new techniques to enhance model capabilities. A promising avenue is using machine learning to enhance classic algorithmic techniques such as search.

Machine learning models, including the neural networks behind GPT, fundamentally work in a way that makes it unwise to use them as-is for every problem out there. The reason is that machine learning models essentially “guess” answers. These guesses are often good enough, and sometimes they become far better than what a human is capable of. But in some cases an approximate answer does not make sense (for example, it makes little sense to train a neural network to perform basic arithmetic), and there traditional algorithmic techniques are better suited to providing a precise, verified solution to the problem at hand.
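As a toy illustration of exact versus approximate answers, here is a minimal Python sketch of my own (not from any real model): a plain arithmetic function whose output can be verified exactly, next to a hypothetical `guess_sum` stand-in that mimics how a learned model only approximates the mapping it was trained on.

```python
import random


def exact_sum(a: int, b: int) -> int:
    """Classic algorithmic route: exact, cheap, and trivially verifiable."""
    return a + b


def guess_sum(a: int, b: int) -> float:
    """Stand-in for a learned model: returns an approximation of the sum.

    A real regressor would be fit on (a, b) -> a + b examples; here we just
    add small noise to make the point that the output is a guess, not a proof.
    """
    return a + b + random.gauss(0.0, 0.5)


if __name__ == "__main__":
    a, b = 123_456, 654_321
    print(exact_sum(a, b))  # always exactly 777777
    print(guess_sum(a, b))  # close to 777777, but not guaranteed to be exact
```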

Classic methods such as search algorithms come with their own tradeoffs. One is that the space of solutions to search through can be intractably large, so finding the correct solution may take anywhere from hours to years. Another is that it can be very difficult to define the rules the algorithm should follow to evaluate a solution. In these scenarios machine learning methods may be preferable.
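To make the “intractably large search space” point concrete, here is a small illustrative example of my own: a brute-force search over travelling-salesman tours. The evaluation rule is easy to write down, but the number of candidate tours grows factorially, so exhaustive search stops being feasible very quickly.

```python
from itertools import permutations
from math import dist, factorial

# A handful of 2D "cities"; the solution space is every ordering of them.
cities = [(0, 0), (2, 1), (5, 3), (1, 4), (6, 0), (3, 5)]


def tour_length(tour):
    """Evaluation rule: total distance visiting the cities in this order and returning home."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


# Exhaustive search is guaranteed to find the optimum, but it checks n! candidates.
best = min(permutations(cities), key=tour_length)
print(f"checked {factorial(len(cities))} tours, best length {tour_length(best):.2f}")
# With 6 cities that is 720 tours; with 20 cities it would be roughly 2.4e18.
```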

Combining these paradigms may lead to even better results in the future. Giant foundation models such as GPT already provide decent answers, getting us 80-90% of the way there. Newer techniques have the LLM generate a large number of candidate answers and then use a search algorithm to evaluate them and find the best one, with the LLM acting as a sort of heuristic that drastically reduces the search space of possible solutions. There is also growing interest in going back to smaller specialized models, rather than trying to have one giant “god” model know and do everything.
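One concrete shape this combination can take is a best-of-N loop: sample several candidate answers from the model, then let a cheap deterministic check (the search-and-evaluate side) pick the winner. The sketch below is a minimal, hypothetical version; `ask_llm` stands in for whatever completion API you actually use, and `score` is a domain-specific verifier you would have to supply yourself.

```python
import random
from typing import Callable


def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to your LLM provider of choice.

    It just returns a canned guess so the sketch runs offline."""
    return str(random.randint(0, 100))


def best_of_n(prompt: str, score: Callable[[str], float], n: int = 16) -> str:
    """Sample n candidate answers and keep the one the verifier scores highest.

    The LLM acts as a heuristic proposing plausible answers; the scoring
    function does the exact, algorithmic part of the work."""
    candidates = [ask_llm(prompt) for _ in range(n)]
    return max(candidates, key=score)


# Example verifier: prefer candidates that parse as an integer close to a known target.
def score(answer: str) -> float:
    try:
        return -abs(int(answer) - 42)  # higher is better; an exact match scores 0
    except ValueError:
        return float("-inf")


print(best_of_n("What is 6 * 7?", score))
```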

While fully delivering on the current promises of AI may be a breakthrough or two away, there are already many ways to deliver value with the methods available today. In the meantime, interesting things are happening in the background, and progress is being made toward even bigger rewards.