Verification using formalised mathematics and theorem proving of reinforcement and deep learning
dc.contributor.advisor
Fleuriot, Jacques
dc.contributor.advisor
Jackson, Paul
dc.contributor.author
Chevallier, Mark
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2024-08-05T10:44:41Z
dc.date.available
2024-08-05T10:44:41Z
dc.date.issued
2024-08-05
dc.description.abstract
In modern artificial intelligence research there is frequently little emphasis on mathematical certainty: results are often demonstrated by experimentation, and understanding of precisely why a particular method works, or of the guarantees that it will be effective, is often confined to speculation and discussion.
Formal mathematics via theorem proving brings a precision of explanation and
certainty that can be missing in this field. We present work that applies the benefits
of formal mathematics to two different fields of artificial intelligence, in two different
ways.
Using the Isabelle theorem prover, we formalise Markov Decision Processes (MDPs)
with rewards, fundamental to reinforcement learning, and use this as the basis for a
formalisation of Q learning, a significant reinforcement learning algorithm. Q learning
attempts to learn the optimal action-value (Q) function of an unknown MDP by estimation, correcting its estimates as it navigates the MDP repeatedly. We also formalise the Dvoretzky stochastic approximation theorem, a result fundamental to many stochastic processes.
It is especially relevant to our work, as it is needed to prove that, under certain assumptions, the estimates produced by the Q learning algorithm converge to the true values of the Q function.
Secondly, we use theorem proving to integrate a formalised logical system with deep learning in a neurosymbolic process. We formalise Linear Temporal Logic over finite paths (LTLf) and develop a loss function (and its derivative) over it that returns a real value corresponding to the satisfaction of a given LTLf constraint over a given path. We prove that this loss function is sound with respect to the semantics of LTLf. We then use the code generation capabilities of Isabelle to integrate it into a PyTorch deep learning process designed to learn trajectories. Lastly, we demonstrate experimentally that the resulting neurosymbolic process can learn from LTLf constraints on the trajectories as well as by imitation of a demonstrator.
en
dc.identifier.uri
https://hdl.handle.net/1842/42053
dc.identifier.uri
http://dx.doi.org/10.7488/era/4775
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Mark Chevallier, Matthew Whyte, and Jacques D. Fleuriot. Constrained training of neural networks via theorem proving. In Short Paper Proceedings of the 4th Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis, volume 3311, pages 7–12. CEUR-WS.org, 2022.
en
dc.subject
Markov Decision Processes
en
dc.subject
Q learning
en
dc.subject
Dvoretzky Stochastic Approximation theorem
en
dc.subject
LTLf
en
dc.subject
PyTorch deep learning process
en
dc.subject
neurosymbolic processes
en
dc.title
Verification using formalised mathematics and theorem proving of reinforcement and deep learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Chevallier2024.pdf
- Size: 2.1 MB
- Format: Adobe Portable Document Format