Verification using formalised mathematics and theorem proving of reinforcement and deep learning
Authors
Chevallier, Mark
Abstract
In modern artificial intelligence research, there is frequently little emphasis on mathematical certainty; results are often demonstrated by experimentation, and understanding precisely why a particular method works, or what guarantees exist that it will be effective, is often confined to speculation and discussion.
Formal mathematics via theorem proving brings a precision of explanation and
certainty that can be missing in this field. We present work that applies the benefits
of formal mathematics to two different fields of artificial intelligence, in two different
ways.
Using the Isabelle theorem prover, we formalise Markov Decision Processes (MDPs) with rewards, which are fundamental to reinforcement learning, and use this as the basis for a formalisation of Q-learning, a significant reinforcement learning algorithm. Q-learning attempts to learn the reward function of an unknown MDP by estimation, correcting its estimates as it repeatedly navigates the MDP. We also formalise the Dvoretzky stochastic approximation theorem, a result fundamental to many stochastic processes. It is especially relevant to our work because it is needed to prove that, under certain assumptions, the Q-learning algorithm's estimates converge to the true values of the reward function.
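To make the estimation-and-correction loop described above concrete, here is a minimal sketch of tabular Q-learning on a toy chain MDP. The MDP, hyperparameters, and variable names are illustrative assumptions, not the thesis's Isabelle formalisation; the update rule is the standard one whose convergence the Dvoretzky-style argument addresses.

```python
import random

random.seed(0)

# Hypothetical toy chain MDP: states 0..3, goal state 3, actions 0 (left) / 1 (right).
N_STATES, ACTIONS, GOAL = 4, (0, 1), 3
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1      # learning rate, discount, exploration rate

def step(s, a):
    """Deterministic transition: action 1 moves right, 0 moves left (clamped at 0).
    Reward 1.0 on entering the goal state, 0.0 otherwise."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Q-value estimates, initialised to zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                   # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # stochastic-approximation update: move the estimate toward the target
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

On this deterministic chain the estimates settle to the discounted values one would compute by hand (e.g. Q[(2, 1)] tends to 1.0 and Q[(1, 1)] to 0.9), illustrating the convergence that the formalised theorem establishes under its stated assumptions.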
Secondly, we use theorem proving to integrate a formalised logical system with deep learning into a neurosymbolic process. We formalise Linear Temporal Logic over finite paths (LTLf) and develop a loss function (and its derivative) over it that returns a real value corresponding to the satisfaction of a given LTLf constraint over a given path. We prove that this loss is sound with respect to the semantics of LTLf. We then use the code generation capabilities of Isabelle to integrate it into a PyTorch deep learning process designed to learn trajectories. Lastly, we demonstrate experimentally that the resulting neurosymbolic process can learn using LTLf constraints on the trajectories as well as by imitation of a demonstrator.
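The idea of a real-valued satisfaction score for a temporal constraint can be sketched as follows. This is an illustrative soft semantics for a small LTLf fragment over a finite path, using min/max in place of Boolean connectives; the formula encoding, the `score` function, and the atom scores are assumptions for the sketch, not the thesis's formalised definitions or its generated code.

```python
# Formulas as tuples: ('atom', name) | ('not', f) | ('and', f, g)
#                     | ('next', f) | ('eventually', f) | ('always', f)

def score(formula, path, t=0):
    """Real-valued satisfaction of `formula` over the suffix path[t:].
    `path` is a list of dicts mapping atom names to scores in [0, 1];
    a score near 1 means the atom holds strongly at that step."""
    tag = formula[0]
    if tag == 'atom':
        return path[t][formula[1]]
    if tag == 'not':
        return 1.0 - score(formula[1], path, t)
    if tag == 'and':                      # conjunction as minimum
        return min(score(formula[1], path, t), score(formula[2], path, t))
    if tag == 'next':                     # false past the end of a finite path
        return score(formula[1], path, t + 1) if t + 1 < len(path) else 0.0
    if tag == 'eventually':               # best score over all suffixes
        return max(score(formula[1], path, u) for u in range(t, len(path)))
    if tag == 'always':                   # worst score over all suffixes
        return min(score(formula[1], path, u) for u in range(t, len(path)))
    raise ValueError(f'unknown connective: {tag}')

# Example: a path that eventually reaches the goal strongly.
path = [{'goal': 0.1}, {'goal': 0.2}, {'goal': 0.9}]
eventually_goal = score(('eventually', ('atom', 'goal')), path)   # 0.9
```

A training loss can then be taken as, say, `1.0 - score(...)`, so that minimising it pushes trajectories toward satisfying the constraint; min and max are subdifferentiable, which is what makes such a score usable inside a gradient-based PyTorch pipeline.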