Edinburgh Research Archive

Verification using formalised mathematics and theorem proving of reinforcement and deep learning

Authors

Chevallier, Mark

Abstract

In modern artificial intelligence research there is frequently little emphasis on mathematical certainty: results are often demonstrated by experimentation, and understanding precisely why a particular method works, or whether it is guaranteed to be effective, is often left to speculation and discussion. Formal mathematics via theorem proving brings a precision of explanation and a certainty that can be missing from this field. We present work that applies the benefits of formal mathematics to two different areas of artificial intelligence, in two different ways.

First, using the Isabelle theorem prover, we formalise Markov Decision Processes (MDPs) with rewards, which are fundamental to reinforcement learning, and use this as the basis for a formalisation of Q learning, a significant reinforcement learning algorithm. Q learning attempts to learn the expected reward function of an unknown MDP by estimation, correcting its estimates as it navigates the MDP repeatedly. We also formalise the Dvoretzky stochastic approximation theorem, a result fundamental to many stochastic processes. It is especially relevant to our work because it is needed to prove that, under certain assumptions, the estimates produced by the Q learning algorithm converge to the true values of the reward function.

Secondly, we use theorem proving to integrate a formalised logical system with deep learning in a neurosymbolic process. We formalise Linear Temporal Logic over finite paths (LTLf) and develop a loss function (and its derivative) over it that returns a real value corresponding to the satisfaction of a given LTLf constraint over a given path. We prove that this loss is sound with respect to the semantics of LTLf. We then use the code generation capabilities of Isabelle to integrate it into a PyTorch deep learning process designed to learn trajectories. Lastly, we demonstrate experimentally that the resulting neurosymbolic process can learn from LTLf constraints on the trajectories as well as by imitation of a demonstrator.
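For readers unfamiliar with the algorithm, the classic tabular Q learning update that the formalisation covers has the following shape. This is an illustrative Python sketch of the textbook update rule, not the thesis's Isabelle formalisation; the function and variable names are hypothetical.

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q learning update.

    Moves the estimate Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a'); under suitable conditions on the
    learning rate (the kind established via stochastic approximation
    results such as Dvoretzky's theorem), these estimates converge.
    """
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Example: Q as a defaultdict so unseen (state, action) pairs start at 0.
Q = defaultdict(float)
Q = q_learning_step(Q, s=0, a="left", r=1.0, s_next=1,
                    actions=["left", "right"])
```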

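The abstract does not spell out how the LTLf loss is constructed, but its general shape can be sketched: per-step predicate scores along a finite path are combined with smooth analogues of the temporal operators, so the degree of satisfaction of a constraint becomes a differentiable real value that PyTorch can backpropagate through. The sketch below shows one common way to build such a loss (soft max/min via logsumexp); it is an assumed illustration, not the thesis's proven-sound definition, and all names are hypothetical.

```python
import torch

def soft_eventually(phi_vals: torch.Tensor) -> torch.Tensor:
    # Smooth analogue of "eventually phi" over a finite path:
    # logsumexp is a differentiable upper bound on the max over steps.
    return torch.logsumexp(phi_vals, dim=0)

def soft_always(phi_vals: torch.Tensor) -> torch.Tensor:
    # Smooth analogue of "always phi": a differentiable lower bound on min.
    return -torch.logsumexp(-phi_vals, dim=0)

def ltlf_loss(phi_vals: torch.Tensor) -> torch.Tensor:
    # Penalise a trajectory in proportion to how far it falls short of
    # satisfying "eventually phi"; the loss is zero once the constraint
    # is (softly) met.
    return torch.relu(-soft_eventually(phi_vals))

# Example: per-step scores for phi along a 5-step path
# (positive = phi holds at that step).
scores = torch.tensor([-1.0, -0.5, 0.2, 0.8, -0.3], requires_grad=True)
loss = ltlf_loss(scores)
loss.backward()  # gradients flow back to the trajectory, as training needs
```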