We all have the habit of trying to guess the killer in a movie before the big reveal. That’s us making inferences. It’s what happens when your brain connects the dots without being told everything ...
Having read the paper, I am not sure how applicable these results are. The model they used was a version of GPT-2, a tiny, six-year-old LLM with only 774 million parameters. It appears that CoT ...