Artificial Intelligence – Reasoning Debate pt. V

It is still the simple stuff that causes all the problems.

  • Despite large improvements in general performance, LLM-based systems are still making the same types of mistakes, which continues to underpin my long-held view that the machines are incapable of true reasoning.
  • A new research paper published in Transactions on Machine Learning Research from Stanford University pulls together all of the instances where LLMs have failed to reason and presents them within a framework that greatly aids in the understanding of this topic (see here).
  • RFM Research has long concluded that the weaknesses demonstrated by LLMs are due to their lack of understanding of the causality of the task that they are being asked to perform.
  • This is inherent in their design, as they are probabilistic rather than deterministic systems, which is why one gets a slightly different answer every time one asks the same question (see the first sketch after this list).
  • This is the weakness in their nature, which, in my opinion, will prevent any system purely based on an LLM-like architecture from achieving AGI (artificial general intelligence), the point at which the machines become more intelligent than humans.
  • Reasoning is the crucial test because true reasoning can only really occur when causal understanding is present, which is why RFM Research pays particular attention to LLMs’ ability to reason.
  • Song et al. define three different types of reasoning and the failure modes within each category, creating an easy-to-use taxonomy of where LLMs go wrong (fig. 1).
  • The paper also presents an extensive bibliography of research references for each failure, as well as a detailed explanation of each failure mode and how it manifests, starting on page 47.
  • The paper finds a number of common threads in why the models fail across different reasoning tasks, which enables the authors to put forward some ideas on how these failures can be minimised.
  • The key point here is that the paper highlights far more than just the simplest flaw (the reversal curse, caricatured in the second sketch below), which RFM Research has long used as its main example of how the machines are unable to reason.
  • Furthermore, the paper does not offer a solution for how reasoning in LLMs can be fixed, but rather a series of workarounds to try to minimise the problems so that they become less of an issue.
  • Hence, this paper is a further clear indication that the machines are still incapable of true reasoning from first principles despite hundreds of billions of dollars of investment and model sizes that boggle the mind.
  • Consequently, there remains no evidence that a purely LLM-based system will ever achieve a human-like intelligence, no matter how big the model becomes or how much data is thrown at it.
  • Hence, I continue to believe that while this avenue of AI development will produce apps and services that have plenty of economic value, it will not produce AGI.
  • Consequently, any company that has a valuation that is predicated on the machines becoming more intelligent than humans is heading for a major reset.
  • I would put both OpenAI and Anthropic in this category, which is why I continue to think that the public markets may think twice when it comes to paying the massive valuations that are going to be demanded when these companies go public.
  • This could be the catalyst for a general reset where hopes and dreams give way to a focus on revenues and cash flow.
  • However, in the interim, the industry will continue to spend everything it has, and more, on AI compute capacity simply so as not to fall behind competitors.
  • Hence, a reset is also likely to reduce the competitive pressure and allow the industry to spend much more slowly, which I suspect it would be very happy to do if it no longer feared missing out.
  • Once again, it is clear that this environment benefits the providers of the silicon chips and equipment that power the AI data centres, and it is here where I have positions.
  • Nvidia is the best direct investment, but the valuations of the memory makers look more attractive, and here I have a position in Samsung Electronics.
  • The adjacencies of inference at the edge, where I hold Qualcomm, and nuclear power, where I have a range of uranium companies, are also good ways to play the trend without losing one’s shirt (hopefully).
  • The net result is that it is clear that the machines are still unable to reason, meaning that an LLM-only approach is not a winning solution.
  • Hence, I continue to think that a combination of software (which can reason) and deep learning systems (which can learn and include LLMs) is how a more advanced machine intelligence will be achieved.
  • However, at the moment, this view is held by only a small minority, meaning that very little attention and few resources are being devoted to it.
  • Everything is still pointing towards a reset at some point.
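
To make the probabilistic-output point above concrete, here is a minimal sketch of temperature-based sampling over next-token scores. The prompt, token names and logit values are invented purely for illustration and are not taken from any particular model; real LLMs sample from distributions over tens of thousands of tokens at every step.

```python
import math
import random

# Hypothetical next-token scores (logits) for a single prompt.
# These values are made up purely for illustration.
logits = {"Canberra": 4.0, "Sydney": 3.2, "Melbourne": 2.1, "Perth": 0.5}

def sample_next_token(logits, temperature=0.8):
    """Pick one token from a softmax over the logits.

    With temperature > 0 the pick is random, so repeated calls on the
    identical prompt can return different tokens; with temperature = 0
    (greedy decoding) the highest-scoring token is always returned.
    """
    if temperature == 0:
        return max(logits, key=logits.get)
    exps = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(exps.values())
    r = random.random() * total
    cumulative = 0.0
    for tok, weight in exps.items():
        cumulative += weight
        if r <= cumulative:
            return tok
    return tok  # guard against floating-point rounding at the boundary

# "Asking the same question" five times can yield five different answers.
print([sample_next_token(logits) for _ in range(5)])
```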
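
For readers new to the reversal curse referred to above, the toy sketch below caricatures it: a fact learned only in the direction "A is B" does not make the logically equivalent reverse question answerable. The dictionary merely stands in for a model's stored associations, and the name and title are hypothetical; this is an analogy for the failure pattern, not an implementation of the paper's experiments.

```python
# A deliberately crude stand-in for an LLM: facts are stored only in the
# direction they were "seen in training". Names and titles are invented.
forward_facts = {
    "Who wrote 'Example Novel'?": "Jane Placeholder",
}

def answer(question):
    # The forward question matches a stored association; the reversed
    # question does not, even though the two are logically equivalent.
    return forward_facts.get(question, "I don't know")

print(answer("Who wrote 'Example Novel'?"))               # -> Jane Placeholder
print(answer("Which novel did Jane Placeholder write?"))  # -> I don't know
```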

RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience working in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on equity coverage of the Global Technology sector.
