Improving complex reasoning in large language models
dc.contributor.advisor
Lapata, Maria
dc.contributor.advisor
Titov, Ivan
dc.contributor.author
Fu, Yao
dc.date.accessioned
2025-06-11T12:09:27Z
dc.date.available
2025-06-11T12:09:27Z
dc.date.issued
2025-06-11
dc.description.abstract
This thesis studies complex reasoning in language models. We use the term reasoning to refer to tasks that would require a human to perform slow, deliberate, step-by-step thinking (instead of providing an intuitive and instantaneous response), such as mathematical and scientific reasoning, commonsense reasoning, logical reasoning, and strategic reasoning. We use the term reasoning capability to refer collectively to the ability to solve tasks that require complex sub-problem decomposition and detailed step-by-step analysis.
Our motivation for studying reasoning in language models stems from both its intriguing theoretical properties (e.g., how scaling laws relate to emergent abilities) and its vast application potential. From an application perspective, we envisage large language models (LLMs) becoming the next-generation computational platforms, much like operating systems, and aim to build a new application ecosystem upon them. This vision naturally requires the underlying base model to reason over a wide range of complex real-world scenarios. From a modeling perspective, complex reasoning is viewed as a typical ability that emerges with scale: provided other conditions are met (e.g., clean data and a stable training process), the more compute one spends, the more likely the model is to have stronger reasoning capability.
We start by reviewing the learning paradigms of large language models, and then discuss fundamental methods for improving reasoning at multiple stages of the model development pipeline. Typically, modern language model development consists of four stages: pretraining, instruction finetuning, reinforcement learning from human feedback, and in-context learning after model deployment. This thesis discusses improving reasoning through in-context learning, finetuning, and learning from feedback. For in-context learning, we propose complexity-based prompting, and demonstrate that the model's scientific and logical reasoning performance consistently improves as the complexity of in-context demonstrations increases. This work achieved state-of-the-art performance on the GSM8K [Cobbe et al., 2021] and MATH [Hendrycks et al.] datasets at the time it was proposed and has influenced follow-on work by highlighting the importance of data complexity. For instruction tuning, we devise a detailed recipe for specializing smaller language models on mathematical reasoning tasks. We highlight the importance of chain-of-thought formatted data, the use of a finetuned checkpoint, and the balance between capabilities in different directions. This work significantly improved small models' performance on GSM8K and other math benchmarks at the time it was proposed and has consistently influenced follow-on work by highlighting the importance of capability balancing. For learning from AI feedback, we show that it is possible to construct a self-improving agent on strategic reasoning tasks by letting agents play against and criticize each other, and we show that the ability to self-improve is strongly correlated with the choice of base model and how well it aligns with human instructions. Finally, we review the current state-of-the-art models, highlighting the benchmark saturation problem and the importance of constructing new, challenging datasets. We further discuss future directions in multimodal scaling and in iterative learning from human, environment, and AI feedback.
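The abstract describes complexity-based prompting only at a high level. The Python sketch below illustrates the demonstration-selection idea under stated assumptions: a demonstration's complexity is approximated by the number of non-empty lines in its chain-of-thought rationale (one reasoning step per line), and the k most complex demonstrations are placed in the prompt. The names `Demo`, `num_steps`, `select_complex_demos`, and `build_prompt`, as well as the prompt template, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    # Hypothetical container for one in-context demonstration.
    question: str
    rationale: str  # chain-of-thought; assumed to hold one reasoning step per line
    answer: str


def num_steps(demo: Demo) -> int:
    """Assumed complexity proxy: count the non-empty lines in the rationale."""
    return sum(1 for line in demo.rationale.splitlines() if line.strip())


def select_complex_demos(pool: list[Demo], k: int) -> list[Demo]:
    """Return the k demonstrations with the most reasoning steps."""
    # sorted() stays stable with reverse=True, so ties keep their pool order.
    return sorted(pool, key=num_steps, reverse=True)[:k]


def build_prompt(demos: list[Demo], test_question: str) -> str:
    """Concatenate the chosen demonstrations, then append the test question."""
    blocks = [
        f"Question: {d.question}\n{d.rationale}\nThe answer is {d.answer}."
        for d in demos
    ]
    blocks.append(f"Question: {test_question}\n")
    return "\n\n".join(blocks)


# Usage on a toy pool of three demonstrations.
pool = [
    Demo("q1", "Step A.\nStep B.", "3"),
    Demo("q2", "Step A.\nStep B.\nStep C.\nStep D.", "7"),
    Demo("q3", "Step A.", "1"),
]
chosen = select_complex_demos(pool, k=2)
print([d.question for d in chosen])  # prints ['q2', 'q1']
```

Counting lines is only one possible proxy for reasoning complexity; other measures (for example, question length or the number of equations in the rationale) would slot into `num_steps` without changing the selection logic.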
en
dc.identifier.uri
https://hdl.handle.net/1842/43549
dc.identifier.uri
http://dx.doi.org/10.7488/era/6083
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-Based Prompting for Multi-step Reasoning. ICLR 2023
en
dc.relation.hasversion
Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, and Tushar Khot. Specializing Smaller Language Models towards Multi-Step Reasoning. ICML 2023
en
dc.relation.hasversion
Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback. arXiv 2023
en
dc.relation.hasversion
Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, and Ashish Sabharwal. Decomposed Prompting: A Modular Approach for Solving Complex Tasks. ICLR 2023
en
dc.relation.hasversion
Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning. ICLR 2024
en
dc.relation.hasversion
Yao Fu, Hao Peng, and Tushar Khot. How Does GPT Obtain Its Ability? Tracing Emergent Abilities of Language Models to Their Sources. 2022
en
dc.relation.hasversion
Yao Fu. Towards Complex Reasoning: The Polaris of Large Language Models. 2023
en
dc.relation.hasversion
Yao Fu. A Stage Review of Instruction Tuning. 2023
en
dc.relation.hasversion
Yao Fu, Litu Ou, Mingyu Chen, and Yuhao Wan. Chain-of-Thought Hub: Measuring LLMs' Reasoning Performance. ICML 2023, Deployable GenAI Workshop
en
dc.subject
large language models
en
dc.subject
machine reasoning
en
dc.subject
in-context learning
en
dc.subject
finetuning
en
dc.subject
self-improvement
en
dc.subject
AI feedback
en
dc.subject
agent
en
dc.title
Improving complex reasoning in large language models
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Fu2025.pdf
- Size: 4.84 MB
- Format: Adobe Portable Document Format