Verification using formalised mathematics and theorem proving of reinforcement and deep learning
dc.contributor.advisor
Fleuriot, Jacques
dc.contributor.advisor
Jackson, Paul
dc.contributor.author
Chevallier, Mark
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2024-08-05T10:44:41Z
dc.date.available
2024-08-05T10:44:41Z
dc.date.issued
2024-08-05
dc.description.abstract
In modern artificial intelligence research there is frequently little emphasis on mathematical certainty: results are often demonstrated by experimentation, and understanding of precisely why a particular method works, or of the guarantees that it will be effective, is often confined to speculation and discussion.
Formal mathematics via theorem proving brings a precision of explanation and
certainty that can be missing in this field. We present work that applies the benefits
of formal mathematics to two different fields of artificial intelligence, in two different
ways.
Using the Isabelle theorem prover, we formalise Markov Decision Processes (MDPs)
with rewards, fundamental to reinforcement learning, and use this as the basis for a
formalisation of Q learning, a significant reinforcement learning algorithm. Q learning
attempts to learn the optimal action-value (Q) function of an unknown MDP by estimation, correcting its estimates as it navigates the MDP repeatedly. We also formalise the Dvoretzky stochastic approximation theorem, a result fundamental to many stochastic processes.
It is especially relevant to our work, as it is needed to prove that, under certain assumptions, the estimates produced by the Q learning algorithm converge to the true values of the Q function.
Secondly, we use theorem proving to integrate a formalised logical system with deep learning in a neurosymbolic process. We formalise Linear Temporal Logic over finite paths (LTLf) and develop a loss function (and its derivative) over it that returns a real value corresponding to the satisfaction of a given LTLf constraint over a given path. We prove that this loss function is sound with respect to the semantics of LTLf. We then use the code generation capabilities of Isabelle to integrate it into a PyTorch deep learning process designed to learn trajectories. Lastly, we demonstrate experimentally that the resulting neurosymbolic process can learn from LTLf constraints on the trajectories as well as by imitation of a demonstrator.
en
dc.identifier.uri
https://hdl.handle.net/1842/42053
dc.identifier.uri
http://dx.doi.org/10.7488/era/4775
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Mark Chevallier, Matthew Whyte, and Jacques D. Fleuriot. Constrained training of neural networks via theorem proving. In Short Paper Proceedings of the 4th Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis, volume 3311, pages 7–12. CEUR-WS.org, 2022.
en
dc.subject
Markov Decision Processes
en
dc.subject
Q learning
en
dc.subject
Dvoretzky Stochastic Approximation theorem
en
dc.subject
LTLf
en
dc.subject
PyTorch deep learning process
en
dc.subject
neurosymbolic processes
en
dc.title
Verification using formalised mathematics and theorem proving of reinforcement and deep learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Chevallier2024.pdf
- Size: 2.1 MB
- Format: Adobe Portable Document Format