The aim of supervised learning is to approximate an unknown target function
by adjusting the parameters of a learning model in response to possibly noisy
examples generated by the target function. The performance of the learning model
at this task can be quantified by examining its generalization ability. Initially the
concept of generalization is reviewed, and various methods of measuring it, such as
generalization error, prediction error, PAC learning and the evidence, are discussed
and the relations between them examined. Some of these relations are dependent
on the architecture of the learning model.
Two architectures are prevalent in practical supervised learning: the multi-layer
perceptron (MLP) and the radial basis function network (RBF). The RBF has
previously been examined from a worst-case perspective, but such analysis gives
little insight into the performance and phenomena that can be expected in the
typical case.
This thesis focusses on the properties of learning and generalization that can be
expected on average in the RBF.
There are two methods in use for training the RBF. The basis functions can be
fixed in advance, utilising an unsupervised learning algorithm, or can adapt during
the training process. For the case in which the basis functions are fixed, the
typical generalization error for a data set of given size is calculated within
the Bayesian framework. The effects of noisy data and regularization are
examined, the optimal settings of the parameters that control the learning
process are calculated, and the consequences of a mismatch between the learning
model and the data-generating mechanism are demonstrated.
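As an empirical illustration of the fixed-basis case (not the analytical calculation of the thesis), the following Python sketch fits the output weights of a one-dimensional Gaussian RBF by regularized least squares. Under Gaussian output noise and a Gaussian prior on the weights, this is the maximum a posteriori weight vector, with the regularization coefficient `lam` equal to the ratio of noise variance to prior variance. The target `sin`, the centre grid (used here in place of an unsupervised placement algorithm), the width, the noise level and all function names are illustrative assumptions. The generalization error is estimated as the mean squared deviation from the noise-free target on a test grid.

```python
import math
import random

def gaussian_basis(x, centre, width):
    """Gaussian basis function phi(x) = exp(-(x - c)^2 / (2 s^2)) in one dimension."""
    return math.exp(-(x - centre) ** 2 / (2.0 * width ** 2))

def solve(a, b):
    """Solve the square linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][k] * w[k] for k in range(i + 1, n))
        w[i] = s / m[i][i]
    return w

def fit_weights(xs, ys, centres, width, lam):
    """Regularized least squares: (Phi^T Phi + lam I) w = Phi^T y.
    With Gaussian noise and a Gaussian weight prior this is the MAP weight vector;
    lam = noise variance / prior variance (an assumed parameterization)."""
    phi = [[gaussian_basis(x, c, width) for c in centres] for x in xs]
    k = len(centres)
    a = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve(a, b)

def predict(x, w, centres, width):
    """Output of the RBF network with fixed centres and fitted weights w."""
    return sum(wi * gaussian_basis(x, c, width) for wi, c in zip(w, centres))

def estimated_generalization_error(lam, n_train=200, noise_sd=0.1, seed=0):
    """Train on noisy samples of sin(x), then measure mean squared error
    against the noise-free target on a dense test grid over [-3, 3]."""
    rng = random.Random(seed)
    centres = [0.5 * i for i in range(-6, 7)]  # hypothetical: 13 fixed centres on [-3, 3]
    width = 0.5
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n_train)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    w = fit_weights(xs, ys, centres, width, lam)
    test_xs = [-3.0 + 6.0 * i / 999 for i in range(1000)]
    return sum((predict(x, w, centres, width) - math.sin(x)) ** 2
               for x in test_xs) / len(test_xs)

if __name__ == "__main__":
    for lam in (1e-3, 1.0, 100.0):
        print(f"lam={lam:g}  estimated generalization error={estimated_generalization_error(lam):.5f}")
```

Evaluating `estimated_generalization_error` over a range of `lam` values gives an empirical picture of how regularization trades off fitting the noise against over-shrinking the weights.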
The second case, in which the basis functions are adapted, is studied within the
on-line learning paradigm. The average evolution of the generalization error is
calculated in a manner that allows the phenomena of the learning process, such as
the specialization of the basis functions, to be elucidated. The three most
important stages of training, namely the symmetric phase, the symmetry-breaking
phase and the convergence phase, are analyzed in detail; the analysis of the
convergence phase yields the maximal and optimal learning rates. Noise on both
the inputs and the outputs of the data-generating mechanism is introduced, and
its consequences are examined. Regularization via weight decay is also studied,
as are the effects of the learning model being poorly matched to the data
generator.
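The adaptive case can likewise be sketched as on-line gradient descent, in which each example is used once to update both the centres and the output weights of a Gaussian RBF student that learns from a fixed RBF teacher. Everything concrete here is an assumption for illustration: the two-dimensional Gaussian inputs, the hypothetical two-unit teacher, the shared fixed width, the learning rate `eta`, the weight decay coefficient `decay` (applied to the output weights only), and the noise model in which the student sees inputs corrupted with standard deviation `in_noise_sd` while targets carry additive noise with standard deviation `out_noise_sd`. The generalization error is estimated by Monte Carlo against the noise-free teacher; this is a simulation, not the average-case calculation described above.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def rbf(x, centres, weights, width):
    """Gaussian RBF output f(x) = sum_b w_b exp(-|x - m_b|^2 / (2 s^2))."""
    return sum(w * math.exp(-sq_dist(x, m) / (2.0 * width ** 2))
               for m, w in zip(centres, weights))

def online_step(x, y, centres, weights, width, eta, decay):
    """One on-line gradient step on E = 0.5 (f(x) - y)^2, plus weight decay on
    the output weights. Both centres m_b and weights w_b are updated in place."""
    phis = [math.exp(-sq_dist(x, m) / (2.0 * width ** 2)) for m in centres]
    delta = sum(w * p for w, p in zip(weights, phis)) - y
    for b, (m, p) in enumerate(zip(centres, phis)):
        # dE/dm_b = delta * w_b * phi_b * (x - m_b) / s^2, using the old w_b
        grad_scale = delta * weights[b] * p / width ** 2
        centres[b] = [mi - eta * grad_scale * (xi - mi) for mi, xi in zip(m, x)]
        # dE/dw_b = delta * phi_b, plus the weight decay term decay * w_b
        weights[b] -= eta * (delta * p + decay * weights[b])

def generalization_error(teacher, student, width, test_xs):
    """Monte Carlo estimate of 0.5 * <(f_student(x) - f_teacher(x))^2> over test inputs."""
    (t_c, t_w), (s_c, s_w) = teacher, student
    return 0.5 * sum((rbf(x, s_c, s_w, width) - rbf(x, t_c, t_w, width)) ** 2
                     for x in test_xs) / len(test_xs)

def train(steps=20000, eta=0.1, decay=1e-4, out_noise_sd=0.1, in_noise_sd=0.0, seed=0):
    """Run on-line learning and record the generalization error every 1000 steps."""
    rng = random.Random(seed)
    width, dim = 1.0, 2
    teacher = ([[1.0, 0.0], [-1.0, 0.0]], [1.0, 1.0])  # hypothetical teacher network
    # Student centres start close together near the origin (an almost symmetric start).
    centres = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(2)]
    weights = [rng.gauss(0.0, 0.1) for _ in range(2)]
    test_xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2000)]
    history = [generalization_error(teacher, (centres, weights), width, test_xs)]
    for t in range(1, steps + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = rbf(x, *teacher, width) + rng.gauss(0.0, out_noise_sd)  # noisy target
        x_seen = [xi + rng.gauss(0.0, in_noise_sd) for xi in x]     # noisy input
        online_step(x_seen, y, centres, weights, width, eta, decay)
        if t % 1000 == 0:
            history.append(generalization_error(teacher, (centres, weights), width, test_xs))
    return history, centres, weights

if __name__ == "__main__":
    history, centres, weights = train()
    print("generalization error every 1000 steps:", [round(e, 4) for e in history])
```

Calling `train()` returns the recorded errors with the final centres and weights. Because the student centres start close together, the early part of `history` is intended to resemble the symmetric starting condition. Rerunning with different `eta`, `decay` or noise settings gives a rough empirical counterpart to the effects listed above.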