The so-called Chain of Thought (CoT) or Tree of Thought reasoning with LLMs has very little to do with reasoning. In CoT the problem is broken down into smaller reasoning problems to guide the LLM's response. All CoT does is trigger the right sequence of latent states in the Transformer. When a new, similar problem is entered as a prompt, the same sequence of hidden states is triggered. Those hidden states generate the right token based on the earlier occurrence of a similar token from a broken-down problem and what appeared after that token. A latent state is a representation of a token and its relationships with earlier tokens. The same relationship chain is reused for a new, similar problem. The only mechanism at play is co-occurrence pattern learning and matching. There are no mystical neural circuits for reasoning formed inside the Transformer. This is precisely why LLMs fail badly at complex and compositional reasoning tasks. But the narrative you will hear from LLM gurus is that a problem is broken down into smaller problems because that's how humans reason and solve problems. #ai #llm #cot #reasoning https://lnkd.in/gjshg9Ag
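To make the decomposition the post is talking about concrete, here is a minimal sketch of the two prompt styles — plain strings only, no particular LLM API assumed, and the shop question is an invented example:

```python
# Direct prompt: the model must map question -> answer in one shot.
direct_prompt = (
    "Q: A shop sells pens at $3 each and pads at $5 each. "
    "How much do 4 pens and 2 pads cost? A:"
)

# CoT prompt: the same question, but the exemplar spells out the
# intermediate sub-problems, nudging the model to emit similar
# intermediate tokens before committing to a final answer.
cot_prompt = (
    "Q: A shop sells pens at $3 each and pads at $5 each. "
    "How much do 4 pens and 2 pads cost?\n"
    "A: Let's think step by step.\n"
    "Step 1: 4 pens cost 4 * 3 = 12 dollars.\n"
    "Step 2: 2 pads cost 2 * 5 = 10 dollars.\n"
    "Step 3: 12 + 10 = 22 dollars.\n"
    "The answer is 22."
)
```

On the post's own account, the CoT prompt works not by "reasoning" but by steering the model through a token sequence it has seen co-occur before; on the standard account, the intermediate steps let the model condition each decision on the previous ones.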
I feel like when we are talking about models of a sufficient size, this myopic view isn't really accurate at all. CoT and similar methods work primarily because of two factors: 1. Memoization of intermediate decisions/state. If you need to make several decisions to output a correct response, then by outputting intermediate tokens the model can apply more computation to each intermediate decision. Going forward, the attention layers can then focus on the pre-determined intermediate results on the following inference passes. This gets less useful (and less necessary) as the amount of state a model can hold in flight during a single inference pass increases. 2. Increasing raw computation. This is a secondary effect: as you pad the length of the outputs, you are also increasing the width of your active context window for models that use masking. This effect can be seen in the recent paper https://arxiv.org/abs/2310.02226 Larger models tend to operate at higher levels of abstraction anyway, though certainly some completion biases are still present. It isn't simply looking at token probabilities; it is looking at high-level concept relations.
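Factor 2 above — longer outputs buying more raw computation — can be sketched with a back-of-the-envelope count. Assuming (as a simplification) a decoder-only model with causal attention, each newly generated token attends to the whole context so far, so total attention work grows roughly quadratically with output length; the prompt and output lengths below are made-up numbers for illustration:

```python
def attention_cost(prompt_len: int, output_len: int) -> int:
    """Relative attention work to generate `output_len` tokens after a
    prompt of `prompt_len` tokens: token t attends to prompt_len + t
    prior positions, so we sum that over all generated tokens."""
    return sum(prompt_len + t for t in range(output_len))

# Same question, two answer styles: a terse answer vs. a step-by-step one.
direct = attention_cost(prompt_len=50, output_len=5)   # "The answer is 22."
cot = attention_cost(prompt_len=50, output_len=80)     # worked-out steps

assert cot > direct  # more output tokens = more forward passes = more compute
```

This is only a proportionality argument, not a FLOP count for any real model, but it captures why padding the output with intermediate tokens gives the model strictly more computation to spend on the same problem.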
There are roughly two schools of thought when it comes to explaining LLM performance. One draws an analogy with the human thought process and uses lingo from cognitive science to explain the model's behavior. The other treats it as a "stochastic parrot" and thus uses lingo from statistics. The truth is we don't know enough to say which is right, but obviously the first approach tends to hype digital intelligence while the second downplays some of the amazing things transformers can do that were not possible with any previous generation of statistical models. I do want to point out that it is not fair to say that prompt engineering pushes the model into a set of "hidden states". In fact, we should not think of the Transformer as a representation learning model. Instead, it calculates token interactions. I am not sure we can get a meaningful interpretation of "latent space" from the layers of a transformer (mechanistic interpretation of Transformers has been far less successful than for CNNs). Transformers likely don't have the concept of latent states; they are more like calculations across the whole input.
This is debatable...
Humans don't have to break down higher-level tasks once they have learned them. They learn them by combining lower-level tasks, but once the complex task is established it functions like another low-level task.
Simply put: it's correlation (tokens) mistaken for causation (meanings)
How does this compare with reductionist cognitive-science theories in which no free will exists, i.e., a deterministic, reductionist account of human cognition?
A must read. Thanks Pranab Ghosh for sharing
A better name for LLM Chain of Thought is Chain of Co-occurrence.