Improving complex reasoning in large language models

Fu, Yao

dc.contributor.advisor

Lapata, Maria

dc.contributor.advisor

Titov, Ivan

dc.contributor.author

Fu, Yao

dc.date.accessioned

2025-06-11T12:09:27Z

dc.date.available

2025-06-11T12:09:27Z

dc.date.issued

2025-06-11

dc.description.abstract

This thesis studies complex reasoning in language models. We use the term reasoning to refer to tasks that would require a human to perform slow, deliberate, step-by-step thinking (instead of providing an intuitive and instantaneous response), such as mathematical and scientific reasoning, commonsense reasoning, logical reasoning, and strategic reasoning. We use reasoning capability to collectively refer to the ability to solve tasks requiring complex sub-problem decomposition and detailed step-by-step analysis. Our motivation for studying reasoning in language models stems from its intriguing theoretical properties (e.g., how scaling laws relate to emergent abilities) and its vast application potential. From an application perspective, we envisage large language models (LLMs) becoming the next-generation computational platforms, much like operating systems, and aim to build a new application ecosystem upon them. This vision naturally requires the underlying base model to be able to reason over diverse, complex real-world scenarios. From a modeling perspective, complex reasoning is viewed as a typical ability that emerges with scaling: provided other conditions are favorable (e.g., clean data and a stable training process), the more compute one spends, the more likely the model is to have stronger reasoning capability. We start by reviewing the learning paradigms of large language models, and then discuss fundamental methods for improving reasoning along multiple stages of the model development pipeline. Typically, modern language model development consists of four stages: pretraining, instruction finetuning, reinforcement learning from human feedback, and in-context learning after model deployment. This thesis discusses improving reasoning by in-context learning, finetuning, and learning from feedback.
For in-context learning, we propose complexity-based prompting and demonstrate that the model’s scientific and logical reasoning performance consistently improves as the complexity of in-context demonstrations increases. This work achieved state-of-the-art performance on the GSM8K [Cobbe et al., 2021] and MATH [Hendrycks et al.] datasets at the time it was proposed and has influenced follow-on work by highlighting the importance of data complexity. For instruction tuning, we devise a detailed recipe for specializing smaller language models on mathematical reasoning tasks. We highlight the importance of chain-of-thought formatted data, the use of a finetuned checkpoint, and the balance between different capabilities. This work significantly improved small models’ performance on GSM8K and other math benchmarks at the time it was proposed and has consistently influenced follow-on work by highlighting the importance of capability balancing. For learning from AI feedback, we show that it is possible to construct a self-improving agent on strategic reasoning tasks by letting agents play against and criticize each other, and we show that the ability to self-improve is strongly correlated with the choice of base model and how well it is aligned with human instructions. Finally, we review the current state-of-the-art models, highlighting the benchmark saturation problem and the importance of constructing new, challenging datasets. We further discuss future directions in multimodal scaling and iterative learning from human, environment, and AI feedback.

en
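The complexity-based prompting idea described in the abstract can be illustrated with a short sketch of the demonstration-selection step. This is a minimal Python example under loudly stated assumptions: each demonstration's rationale is written one reasoning step per line, complexity is approximated by counting those lines, and the helper names (`reasoning_steps`, `select_complex_demos`, `build_prompt`), the prompt template, and the toy examples are all hypothetical rather than taken from the thesis. It does not call any language model; it only assembles the prompt that would be sent to one.

```python
from typing import Dict, List

# Hypothetical demonstration format (an assumption, not the thesis's format):
# a question, a step-by-step rationale with one reasoning step per line,
# and a final answer.
Demo = Dict[str, str]


def reasoning_steps(rationale: str) -> int:
    """Approximate complexity as the number of non-empty lines in the rationale."""
    return sum(1 for line in rationale.splitlines() if line.strip())


def select_complex_demos(pool: List[Demo], k: int) -> List[Demo]:
    """Keep the k demonstrations with the most reasoning steps.

    sorted() is stable even with reverse=True, so ties keep their pool order.
    """
    ranked = sorted(pool, key=lambda d: reasoning_steps(d["rationale"]), reverse=True)
    return ranked[:k]


def build_prompt(demos: List[Demo], question: str) -> str:
    """Concatenate the chosen demonstrations, then append the test question."""
    blocks = [
        f"Question: {d['question']}\n{d['rationale']}\nThe answer is {d['answer']}."
        for d in demos
    ]
    blocks.append(f"Question: {question}\n")
    return "\n\n".join(blocks)


# Toy pool, invented for illustration (not taken from GSM8K or MATH).
pool: List[Demo] = [
    {"question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
     "rationale": "Tom starts with 3 apples.\nHe buys 2 more, so 3 + 2 = 5.",
     "answer": "5"},
    {"question": "A pen costs $2. How much do 4 pens cost?",
     "rationale": "4 * 2 = 8.",
     "answer": "8"},
    {"question": "Sara reads 10 pages a day for 3 days, then 5 more. How many pages in total?",
     "rationale": "She reads 10 pages per day.\nOver 3 days that is 10 * 3 = 30 pages.\nAdding 5 more gives 30 + 5 = 35.",
     "answer": "35"},
]

# Pick the two most complex demonstrations and build a prompt for a new question.
demos = select_complex_demos(pool, k=2)
prompt = build_prompt(demos, "A box holds 6 eggs. How many eggs are in 4 boxes?")
print(prompt)
```

Counting lines is only one possible proxy for complexity; the abstract does not state how complexity is measured, so the line-count heuristic here should be read as an assumption. Swapping in a different scoring function only requires changing `reasoning_steps`.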

dc.identifier.uri

https://hdl.handle.net/1842/43549

dc.identifier.uri

http://dx.doi.org/10.7488/era/6083

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-Based Prompting for Multi-Step Reasoning. ICLR 2023

en

dc.relation.hasversion

Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, and Tushar Khot. Specializing Smaller Language Models towards Multi-Step Reasoning. ICML 2023

en

dc.relation.hasversion

Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata. Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback. arXiv 2023

en

dc.relation.hasversion

Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, and Ashish Sabharwal. Decomposed Prompting: A Modular Approach for Solving Complex Tasks. ICLR 2023

en

dc.relation.hasversion

Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning. ICLR 2024

en

dc.relation.hasversion

Yao Fu, Hao Peng, and Tushar Khot. How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources. 2022

en

dc.relation.hasversion

Yao Fu. Towards Complex Reasoning: the Polaris of Large Language Models. 2023

en

dc.relation.hasversion

Yao Fu. A Stage Review of Instruction Tuning. 2023

en

dc.relation.hasversion

Yao Fu, Litu Ou, Mingyu Chen, and Yuhao Wan. Chain-of-Thought Hub: Measuring LLMs’ Reasoning Performance. ICML 2023 Deployable GenAI Workshop

en

dc.subject

large language models

en

dc.subject

machine reasoning

en

dc.subject

in-context learning

en

dc.subject

finetuning

en

dc.subject

self-improvement

en

dc.subject

AI feedback

en

dc.subject

agent

en

dc.title

Improving complex reasoning in large language models

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Fu2025.pdf
Size: 4.84 MB
Format: Adobe Portable Document Format
This item appears in the following Collection(s)

Informatics thesis and dissertation collection