An Investigation of Supervised Learning in Genetic Programming
This thesis is an investigation into Supervised Learning (SL) in Genetic Programming (GP). With its flexible tree-structured representation, GP is a type of Genetic Algorithm, using the Darwinian idea of natural selection and genetic recombination, evolving populations of solutions over many generations to solve problems. SL is a common approach in Machine Learning where the problem is presented as a set of examples. A good or fit solution is one which can successfully deal with all of the examples.In common with most Machine Learning approaches, GP has been used to solve many trivial problems. When applied to larger and more complex problems, however, several difficulties become apparent. When focusing on the basic features of GP, this thesis highlights the immense size of the GP search space, and describes an approach to measure this space. A stupendously flexible but frustratingly useless representation, Anarchically Automatically Defined Functions, is described. Some difficulties associated with the normal use of the GP operator Crossover (perhaps the most common method of combining GP trees to produce new trees) are demonstrated in the simple MAX problem. Crossover can lead to irreversible sub-optimal GP performance when used in combination with a restriction on tree size. There is a brief study of tournament selection which is a common method of selecting fit individuals from a GP population to act as parents in the construction of the next generation.The main contributions of this thesis however are two approaches for avoiding the fitness evaluation bottleneck resulting from the use of SL in GP. to establish the capability of a GP individual using SL, it must be tested or evaluated against each example in the set of training examples.