Pranab Ghosh’s Post


AI Consultant || MIT Alumni || Entrepreneur || Open Source Project Owner || Blogger

The so-called Chain of Thought (CoT) or Tree of Thought reasoning with LLMs has very little to do with reasoning. In CoT the problem is broken down into smaller reasoning problems to guide the LLM's response. All CoT does is trigger the right sequence of latent states in the Transformer. When a new, similar problem is entered as a prompt, the same sequence of hidden states is triggered. Those hidden states generate the right token based on the earlier occurrence of a similar token from a broken-down problem and what appeared after that token. A latent state is a representation of the token and its relationship with earlier tokens. The same relationship chain is reused for a new, similar problem. The only mechanism at play is co-occurrence pattern learning and matching. No mystical neural circuits for reasoning form inside the Transformer. That is precisely why LLMs fail badly at complex and compositional reasoning tasks. But the narrative you will hear from LLM gurus is that a problem is broken down into smaller problems because that's how humans reason and solve problems. #ai #llm #cot #reasoning https://lnkd.in/gjshg9Ag

Something-of-Thought in LLM Prompting: An Overview of Structured LLM Reasoning

towardsdatascience.com
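
For readers who have not used the technique being critiqued, a minimal sketch of what a Chain-of-Thought prompt looks like in practice (plain Python string construction; the example questions, and the idea of sending these strings to whatever LLM API you use, are illustrative assumptions, not from the post or the linked article):

```python
# A minimal sketch of the prompting pattern discussed in the post: a worked
# exemplar whose answer is broken into smaller steps, followed by a new,
# similar question. The strings are illustrative; you would send them to
# whatever LLM API you use.

exemplar = (
    "Q: A shop sells pencils at 3 for $1. How much do 12 pencils cost?\n"
    "A: 12 pencils is 12 / 3 = 4 groups of 3. Each group costs $1, "
    "so the total is 4 * $1 = $4.\n"
)

new_question = "Q: A shop sells erasers at 4 for $2. How much do 20 erasers cost?"

# Few-shot Chain-of-Thought prompt: the exemplar's step-by-step decomposition
# guides the completion for the structurally similar new question.
few_shot_cot = exemplar + "\n" + new_question + "\nA:"

# Zero-shot variant: no exemplar, just an instruction that elicits the steps.
zero_shot_cot = new_question + "\nA: Let's think step by step."

print(few_shot_cot)
print(zero_shot_cot)
```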

Pranab Ghosh

AI Consultant || MIT Alumni || Entrepreneur || Open Source Project Owner || Blogger

9mo

A better name for LLM Chain of Thought is Chain of Co-occurrence.

Jeremy Owen

Staff Software Engineer @ LinkedIn | Machine Learning Expert

9mo

I feel like when we are talking about models of a sufficient size, this myopic view isn't really accurate at all. CoT and similar methods work primarily because of two factors:

1. Memoization of intermediate decisions/state. If you need to make several decisions to output a correct response, then by outputting intermediate tokens the model can apply more computation to each intermediate decision. On the following inference passes, the attention layers can then focus on the pre-determined intermediate results. This gets less useful (and less necessary) as the amount of state a model can hold in flight during a single inference pass increases.

2. Increasing raw computation. This is a secondary effect where, as you pad the length of the outputs, you are also increasing the width of your active context window for models that use masking. This effect can be seen in the recent paper https://arxiv.org/abs/2310.02226

When dealing with the larger models, they tend to operate at higher levels of abstraction anyway, though certainly some completion biases are still present. It isn't simply looking at token probabilities; it is looking at high-level concept relations.
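
A back-of-the-envelope sketch of the comment's two factors, under the simplifying assumptions that decoding costs one forward pass per generated token and that causal attention runs over the prompt plus everything generated so far (the token counts are made up for illustration):

```python
# Toy accounting of the extra computation CoT buys, assuming one forward pass
# per generated token and causal attention over prompt + prior output.

def decode_passes(generated_tokens: int) -> int:
    """Forward passes spent at decode time: one per generated token."""
    return generated_tokens

def attended_positions(prompt_len: int, generated_tokens: int) -> int:
    """Token positions attended over, summed across all decode steps."""
    return sum(prompt_len + t for t in range(generated_tokens))

prompt_len = 100

# Direct answer: the model emits, say, 5 answer tokens.
print(decode_passes(5), attended_positions(prompt_len, 5))    # 5 passes, 510 positions

# CoT answer: 60 intermediate "reasoning" tokens plus 5 answer tokens, so
# every later decision gets to attend over the memoized intermediate results.
print(decode_passes(65), attended_positions(prompt_len, 65))  # 65 passes, 8580 positions
```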

Eric X.

Strive to be a Renaissance man everyday

9mo

There are roughly two schools of thought when it comes to LLM performance. One is to draw an analogy with the human thought process and use lingo from human cognitive science to explain the model's behavior. The other is to treat it as a "stochastic parrot" and thus use lingo from statistics. The truth is we don't know enough to say which is right, but obviously the first approach tends to hype up digital intelligence while the second downplays some of the amazing things transformers can do that were not possible in any previous generation of statistical models. I do want to point out one thing: it is not fair to say that prompt engineering pushes the model toward a set of "hidden states". In fact we should not think of the Transformer as a representation learning model. Instead, it calculates token interactions. I am not sure we can get a meaningful interpretation of a "latent space" from the layers of a transformer (mechanistic interpretation of Transformers has been far less successful than for CNNs). Transformers likely don't have the concept of latent states. They are more like calculations across the whole input.

Li Deng

Chief AI Officer and Global Head of Machine Learning at @Vatic Investments

9mo

This is debatable...

Gabriele Scheler

Computational Neuroscience and Theoretical Biology

9mo

Humans don't have to break down higher-level tasks once they have learned them. They learn them by combining lower-level tasks, but once the complex task is established it functions like another low-level task.

Rémy Fannader

Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao

9mo

Simply put: it's correlation (tokens) mistaken for causation (meanings)

Mark Spivey

Helping us all "Figure It Out" (Explore, Describe, Explain), many Differentiations + Integrations at any time.

9mo

How does this compare with reductionist cognitive science theories where no free will exists, as in deterministic, reductionist cognition of humans?

Vasily Orlov

I help companies realise the full potential of their investment in data and AI

9mo

A must-read. Thanks, Pranab Ghosh, for sharing.
