Edinburgh Research Archive

Verification using formalised mathematics and theorem proving of reinforcement and deep learning

Authors

Chevallier, Mark

Abstract

In modern artificial intelligence research there is frequently little emphasis on mathematical certainty: results are often demonstrated by experimentation, and understanding precisely why a particular method works, or whether it is guaranteed to be effective, is often left to speculation and discussion. Formal mathematics via theorem proving brings a precision of explanation and a certainty that can be missing from this field. We present work that applies the benefits of formal mathematics to two different areas of artificial intelligence, in two different ways.

First, using the Isabelle theorem prover, we formalise Markov Decision Processes (MDPs) with rewards, which are fundamental to reinforcement learning, and use this as the basis for a formalisation of Q learning, a significant reinforcement learning algorithm. Q learning attempts to learn the expected reward function of an unknown MDP by estimation, correcting its estimates as it navigates the MDP repeatedly. We also formalise the Dvoretzky stochastic approximation theorem, a result fundamental to many stochastic processes. It is especially relevant to our work because it is needed to prove that, under certain assumptions, the estimates produced by the Q learning algorithm converge to the true values of the reward function.

Secondly, we use theorem proving to integrate a formalised logical system with deep learning in a neurosymbolic process. We formalise Linear Temporal Logic over finite paths (LTLf) and develop a loss function (and its derivative) over it that returns a real value corresponding to the satisfaction of a given LTLf constraint over a given path. We prove that this loss is sound with respect to the semantics of LTLf. We then use the code generation capabilities of Isabelle to integrate it into a PyTorch deep learning process designed to learn trajectories. Lastly, we demonstrate experimentally that the resulting neurosymbolic process can learn from LTLf constraints on the trajectories as well as by imitation of a demonstrator.
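For readers unfamiliar with the algorithm, the classic tabular Q learning update that the formalisation covers has the following shape. This is an illustrative Python sketch of the textbook update rule, not the thesis's Isabelle formalisation; the function and variable names are hypothetical.

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q learning update.

    Moves the estimate Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a'); under suitable conditions on the
    learning rate (the kind established via stochastic approximation
    results such as Dvoretzky's theorem), these estimates converge.
    """
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Example: Q as a defaultdict so unseen (state, action) pairs start at 0.
Q = defaultdict(float)
Q = q_learning_step(Q, s=0, a="left", r=1.0, s_next=1,
                    actions=["left", "right"])
```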

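The abstract does not spell out how the LTLf loss is constructed, but its general shape can be sketched: per-step predicate scores along a finite path are combined with smooth analogues of the temporal operators, so the degree of satisfaction of a constraint becomes a differentiable real value that PyTorch can backpropagate through. The sketch below shows one common way to build such a loss (soft max/min via logsumexp); it is an assumed illustration, not the thesis's proven-sound definition, and all names are hypothetical.

```python
import torch

def soft_eventually(phi_vals: torch.Tensor) -> torch.Tensor:
    # Smooth analogue of "eventually phi" over a finite path:
    # logsumexp is a differentiable upper bound on the max over steps.
    return torch.logsumexp(phi_vals, dim=0)

def soft_always(phi_vals: torch.Tensor) -> torch.Tensor:
    # Smooth analogue of "always phi": a differentiable lower bound on min.
    return -torch.logsumexp(-phi_vals, dim=0)

def ltlf_loss(phi_vals: torch.Tensor) -> torch.Tensor:
    # Penalise a trajectory in proportion to how far it falls short of
    # satisfying "eventually phi"; the loss is zero once the constraint
    # is (softly) met.
    return torch.relu(-soft_eventually(phi_vals))

# Example: per-step scores for phi along a 5-step path
# (positive = phi holds at that step).
scores = torch.tensor([-1.0, -0.5, 0.2, 0.8, -0.3], requires_grad=True)
loss = ltlf_loss(scores)
loss.backward()  # gradients flow back to the trajectory, as training needs
```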