LLM Reasoning: Key Ideas and Limitations

Reasoning is pivotal for advancing LLM capabilities

Introduction

Expectations for AI: solving complex math problems, discovering scientific theories, and achieving AGI.
Baseline expectation: AI should learn from just a few examples, as humans do.

What is Missing in ML?
- Reasoning: The ability to logically derive answers from minimal examples.

Problem: Extract the last letter of each word and concatenate the letters.
- Example: "Elon Musk" → "nk".
Traditional ML: Requires a large amount of labeled data.
LLMs: Achieve 100% accuracy from a single demonstration when that demonstration includes the reasoning steps.

Humans solve problems through reasoning and intermediate steps.
Example:
- Input: "Elon Musk"
- Reasoning: Last letter of "Elon" = "n", of "Musk" = "k".
- Output: "nk".
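
The task is trivial as ordinary code. The sketch below, a hypothetical `last_letter_concat` helper, is only meant to make the human-style procedure concrete by recording each intermediate step alongside the final answer.

```python
def last_letter_concat(name: str) -> tuple[list[str], str]:
    """Return the intermediate steps and the final answer for last-letter concatenation."""
    steps = []
    letters = []
    for word in name.split():
        letter = word[-1]  # last character of the current word
        letters.append(letter)
        steps.append(f'Last letter of "{word}" = "{letter}"')
    return steps, "".join(letters)


steps, answer = last_letter_concat("Elon Musk")
for step in steps:
    print(step)
print("Output:", answer)
```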

Chain-of-Thought (CoT) Prompting
- Show the model worked examples that break the problem into intermediate steps before giving the final answer.
- On math word problems, such step-by-step demonstrations markedly improve accuracy.
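
A few-shot CoT prompt can be assembled as plain text. The sketch below assumes a simple "Q:/A:" layout and a closing "The answer is X." sentence; both are our own formatting choices, and `Demo` and `build_cot_prompt` are hypothetical names. The resulting string would be sent to whatever model API you use (not shown).

```python
from dataclasses import dataclass


@dataclass
class Demo:
    question: str
    rationale: str  # the intermediate reasoning steps
    answer: str


def build_cot_prompt(demos: list[Demo], question: str) -> str:
    """Format worked examples (with rationales), then the new question."""
    parts = [f"Q: {d.question}\nA: {d.rationale} The answer is {d.answer}." for d in demos]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)


demo = Demo(
    question='Take the last letters of the words in "Elon Musk" and concatenate them.',
    rationale=(
        'The last letter of "Elon" is "n". The last letter of "Musk" is "k". '
        'Concatenating "n" and "k" gives "nk".'
    ),
    answer="nk",
)
prompt = build_cot_prompt(
    [demo], 'Take the last letters of the words in "Bill Gates" and concatenate them.'
)
print(prompt)
```

The model is expected to continue after the final "A:" with its own steps, mirroring the demonstration.
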
Least-to-Most Prompting
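- Decompose a problem into a sequence of easier sub-questions and solve them in order, each building on earlier answers; this lets the model generalize to problems harder than its demonstrations.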
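
Last-letter concatenation over a longer list of words shows the least-to-most structure: each sub-question adds one word, and its answer extends the previous answer instead of starting over. In an actual least-to-most prompt the model itself produces the decomposition and the sub-answers; the sketch below computes them directly only to show the shape of the process. The word list and function name are illustrative.

```python
def least_to_most_last_letters(words: list[str]) -> list[tuple[str, str]]:
    """Solve last-letter concatenation by building up from shorter sub-problems.

    Each sub-question covers one more word than the previous one, and its
    answer reuses the previous answer.
    """
    solved = []
    answer = ""
    for i, word in enumerate(words):
        answer = answer + word[-1]  # previous answer + last letter of the new word
        sub_question = ", ".join(words[: i + 1])
        solved.append((sub_question, answer))
    return solved


for sub_question, answer in least_to_most_last_letters(["think", "machine", "learning"]):
    print(f'"{sub_question}" -> "{answer}"')
```
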
Analogical Reasoning
- Adapting solutions from related problems.
- Example: to find the area of a square from its vertex coordinates, the model first recalls a related problem that uses the distance formula.
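
The square example can be worked through in a few lines. The vertex coordinates below are illustrative and the helper names are hypothetical; the code assumes the vertices are listed in order around the square and does not verify that they actually form one.

```python
Point = tuple[float, float]


def squared_distance(p: Point, q: Point) -> float:
    """The recalled related problem: d^2 = (x2 - x1)^2 + (y2 - y1)^2."""
    return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2


def square_area(vertices: list[Point]) -> float:
    """Area of a square whose vertices are given in order around the square.

    The side is the distance between two adjacent vertices, so the area
    equals the squared distance between them (no square root needed).
    """
    return squared_distance(vertices[0], vertices[1])


vertices = [(-2, 2), (2, -2), (-2, -6), (-6, -2)]
print(square_area(vertices))  # 32
```
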
Zero-Shot and Few-Shot CoT
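- Zero-shot CoT triggers reasoning without any worked examples, e.g., by appending "Let's think step by step." to the question; few-shot CoT supplies worked examples instead.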
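
A zero-shot CoT prompt needs no demonstrations at all. The sketch below places the trigger phrase at the start of the answer; the "Q:/A:" layout and the function name are our own assumptions.

```python
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no worked examples, just a trigger phrase that opens the answer."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


print(zero_shot_cot_prompt(
    'Take the last letters of the words in "Elon Musk" and concatenate them.'
))
```
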
Self-Consistency in Decoding
- Sample multiple reasoning paths and select the most frequent final answer (majority vote), which is more accurate than relying on a single decoded response.
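
Self-consistency needs only answer extraction and a vote. The sketch assumes each sampled completion ends with "The answer is X."; the sample completions are made-up placeholders, not real model outputs, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Assumes completions end with "The answer is X." (an assumed output format).
ANSWER_RE = re.compile(r"The answer is\s+(.+?)\.?\s*$")


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of one reasoning path, or None if absent."""
    match = ANSWER_RE.search(completion.strip())
    return match.group(1) if match else None


def self_consistency(completions: list[str]) -> Optional[str]:
    """Majority vote over the final answers of independently sampled paths."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


samples = [
    "She has 3 + 2 = 5 apples. The answer is 5.",
    "3 apples plus 2 more is 5. The answer is 5.",
    "3 times 2 is 6. The answer is 6.",
]
print(self_consistency(samples))  # 5
```

Ties go to the answer seen first, since `Counter.most_common` orders equal counts by first occurrence.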

Distraction by Irrelevant Context
- Adding irrelevant details to a problem significantly lowers performance.
- Mitigation: explicitly instructing the model to ignore irrelevant information helps.
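
Applying the mitigation is a prompt change. The instruction wording below is our own placeholder, not necessarily the exact phrasing studied in the literature, and `with_ignore_instruction` is a hypothetical helper.

```python
# Placeholder wording (an assumption); published studies may phrase it differently.
IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information in the problem description."


def with_ignore_instruction(question: str) -> str:
    """Prepend an instruction telling the model to disregard distracting details."""
    return f"{IGNORE_INSTRUCTION}\n\nQ: {question}\nA:"


# The sentence about Lucy's brother is an irrelevant distractor.
question = (
    "Lucy has 5 apples. Her brother is 12 years old. "
    "She buys 3 more apples. How many apples does Lucy have now?"
)
print(with_ignore_instruction(question))
```
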
Challenges in Self-Correction
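- Without external feedback, LLMs often fail to correct their own errors and can even turn correct answers into incorrect ones.
- Oracle feedback (e.g., being told whether an answer is correct) is essential for self-correction to help.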
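
The role of the oracle can be shown with a small loop. In the sketch below, `is_correct` stands in for oracle feedback (for example, a check against a ground-truth label) and `revise` stands in for a model call that produces a new answer; both are hypothetical placeholders, as is the stub "model" in the usage lines.

```python
from typing import Callable


def correct_with_oracle(
    answer: str,
    is_correct: Callable[[str], bool],
    revise: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Revise an answer only while the oracle reports it as wrong.

    Returns the final answer and the number of revisions performed.
    """
    rounds = 0
    while rounds < max_rounds and not is_correct(answer):
        answer = revise(answer)  # in practice: ask the model to try again
        rounds += 1
    return answer, rounds


# Hypothetical stand-ins: the oracle knows the label; the "model" fixes one known mistake.
def is_correct(answer: str) -> bool:
    return answer == "nk"


def revise(answer: str) -> str:
    return {"ne": "nk"}.get(answer, answer)


print(correct_with_oracle("ne", is_correct, revise))  # ('nk', 1)
print(correct_with_oracle("nk", is_correct, revise))  # ('nk', 0)
```

Because the oracle ends the loop as soon as the answer is right, a correct answer is never revised into a wrong one. Without it, the model must judge for itself when to stop, which is where self-correction tends to break down.
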
Premise Order Matters
- Performance drops when a problem's premises are re-ordered, even though the problem is logically unchanged; models do best when premises appear in the order the reasoning uses them.
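
To measure this effect, one can generate variants of the same problem with the premises in different orders and compare accuracy across them. The toy propositional premises and the helper name below are illustrative assumptions; only the order changes, never the content.

```python
import random


def premise_order_variants(premises: list[str], seed: int = 0) -> dict[str, list[str]]:
    """Build logically equivalent problems that differ only in premise order."""
    shuffled = list(premises)  # copy so the caller's list is not mutated
    random.Random(seed).shuffle(shuffled)
    return {
        "forward": list(premises),  # the order in which the reasoning uses them
        "reversed": list(reversed(premises)),
        "shuffled": shuffled,
    }


premises = ["A is true.", "If A, then B.", "If B, then C."]
for name, order in premise_order_variants(premises).items():
    print(f"{name}: {' '.join(order)} Is C true?")
```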

Intermediate reasoning steps are crucial for solving inherently serial problems, where each step depends on the result of the previous one.
Techniques such as self-debugging with unit tests are a promising direction for future improvements.
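
A self-debugging loop generates code, runs unit tests, and feeds the failure messages back into the next attempt. In the sketch below, `generate` is a hypothetical stand-in for an LLM call that receives the previous round's failures; the stub generator, test cases, and helper names are all illustrative.

```python
from typing import Callable, Optional

Case = tuple[tuple, object]  # (arguments, expected result)


def run_tests(fn: Callable, cases: list[Case]) -> list[str]:
    """Run unit tests and return readable failure messages (empty if all pass)."""
    failures = []
    for args, expected in cases:
        try:
            got = fn(*args)
        except Exception as exc:  # a crash is also useful feedback
            failures.append(f"{args!r}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{args!r}: expected {expected!r}, got {got!r}")
    return failures


def self_debug(
    generate: Callable[[list[str]], Callable],
    cases: list[Case],
    max_rounds: int = 3,
) -> Optional[Callable]:
    """Generate a candidate, test it, and feed failures back until tests pass."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        candidate = generate(feedback)
        feedback = run_tests(candidate, cases)
        if not feedback:
            return candidate
    return None  # gave up after max_rounds


def fake_generate(feedback: list[str]) -> Callable:
    # Stub "model": the first attempt has a bug; given feedback, it is fixed.
    if not feedback:
        return lambda name: "".join(w[0] for w in name.split())  # bug: first letters
    return lambda name: "".join(w[-1] for w in name.split())


cases = [(("Elon Musk",), "nk"), (("Bill Gates",), "ls")]
fixed = self_debug(fake_generate, cases)
print(fixed("Larry Page"))  # ye
```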

Defining the right problem is critical for progress.
The goal: overcome these reasoning limitations by developing models that address them autonomously.
