Using Interior Point Methods for Large-scale Support Vector Machine training
Date: 2010
Author: Woodsend, Kristian

Abstract
Support Vector Machines (SVMs) are powerful machine learning techniques for classification
and regression, but the training stage involves a convex quadratic optimization problem
that is most often computationally expensive. Traditionally, active-set methods have been
used rather than interior point methods, because the Hessian in the standard dual formulation
is completely dense. But as active-set methods are essentially sequential, they may not
be adequate for the machine learning challenges of the future. Additionally, training time may be
limited, or data may grow so large that cluster-computing approaches need to be considered.
Interior point methods have the potential to answer these concerns directly. They scale
efficiently, they can provide good early approximations, and they are suitable for parallel
and multi-core environments. To apply them to SVM training, it is necessary to address
directly the most computationally expensive aspect of the algorithm. We therefore present an
exact reformulation of the standard linear SVM training optimization problem that exploits
separability of terms in the objective. By so doing, per-iteration computational complexity
is reduced from O(n³) to O(n). We show how this reformulation can be applied to many
machine learning problems in the SVM family.
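As an illustration of the idea (a sketch only, using standard linear-SVM notation that does not appear in the abstract: X is the n × m data matrix, Y = diag(y) carries the ±1 labels, e is the all-ones vector and C the penalty parameter), the dense dual Hessian can be replaced by a diagonal one through an auxiliary variable w = XᵀYα:

\[
\begin{aligned}
\text{standard dual:}\quad & \max_{\alpha}\; e^{\top}\alpha \;-\; \tfrac{1}{2}\,\alpha^{\top} Y X X^{\top} Y \alpha
 && \text{s.t. } y^{\top}\alpha = 0,\ 0 \le \alpha \le C e,\\
\text{separable form:}\quad & \min_{\alpha,\,w}\; \tfrac{1}{2}\, w^{\top} w \;-\; e^{\top}\alpha
 && \text{s.t. } w = X^{\top} Y \alpha,\ y^{\top}\alpha = 0,\ 0 \le \alpha \le C e.
\end{aligned}
\]

The n × n Hessian YXXᵀY of the standard dual is dense, whereas the Hessian of the separable form is diagonal; with m features the interior point linear algebra per iteration then costs O(nm²) rather than O(n³), i.e. linear in the number of samples.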
Implementation issues relating to specializing the algorithm are explored through extensive
numerical experiments. They show that the performance of our algorithm for large dense
or noisy data sets is consistent and highly competitive, and in some cases can outperform all
other approaches by a large margin. Unlike active-set methods, performance is largely unaffected
by noisy data. We also show how, by exploiting the block structure of the augmented
system matrix, a hybrid MPI/OpenMP implementation of the algorithm enables data and
linear algebra computations to be efficiently partitioned amongst parallel processing nodes
in a clustered computing environment.
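As a hedged sketch of the parallelisation pattern (not the thesis implementation, which combines MPI with OpenMP; the variable names below are hypothetical and only the MPI part is shown), each node holds a horizontal slice of the data, forms its small local contribution to the m × m block of the augmented system, and a single reduction combines the blocks:

# Hedged sketch, not the thesis code: partition samples across MPI ranks,
# form each rank's m x m contribution X_i^T Theta_i X_i, and Allreduce them.
# Requires numpy and mpi4py; run with e.g. "mpiexec -n 4 python sketch.py".
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

m = 20                                         # number of features (illustrative)
rows = 25_000                                  # samples held by this rank
rng = np.random.default_rng(rank)
X_local = rng.standard_normal((rows, m))       # local horizontal slice of the data
theta_local = rng.uniform(0.1, 1.0, rows)      # local diagonal IPM scaling terms

# Local m x m contribution; its size is independent of the total sample count.
M_local = X_local.T @ (theta_local[:, None] * X_local)
M_global = np.empty_like(M_local)
comm.Allreduce(M_local, M_global, op=MPI.SUM)  # sum the per-node blocks

if rank == 0:
    # The combined m x m system is small and can be factorised on one node.
    L = np.linalg.cholesky(M_global)

The point of the block structure is visible in the shapes: the expensive work involving all n samples stays local to each node, and only m × m matrices ever cross the network.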
The applicability of our technique is extended to nonlinear SVMs by low-rank approximation
of the kernel matrix. We develop a heuristic designed to represent clusters using a
small number of features. Additionally, an early approximation scheme reduces the number of samples that need to be considered. Both elements improve the computational efficiency
of the training phase.
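To make the low-rank idea concrete, here is a minimal sketch (an illustration only, not the heuristic developed in the thesis): a plain Nyström approximation with randomly chosen landmark points yields a factor V with K ≈ VVᵀ, which can then stand in for the feature matrix in the separable linear formulation. The function names and the RBF kernel choice are assumptions for the example.

# Hedged sketch: generic Nystrom low-rank factorisation K ~= V V^T.
# The thesis instead selects representative points with a purpose-built heuristic.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def nystrom_factor(X, landmark_idx, gamma=0.5):
    # Returns V (n x k) such that the full kernel matrix is approximated by V V^T.
    C = rbf_kernel(X, X[landmark_idx], gamma)                # n x k cross-kernel
    W = rbf_kernel(X[landmark_idx], X[landmark_idx], gamma)  # k x k landmark kernel
    vals, vecs = np.linalg.eigh(W)                           # symmetric inverse sqrt of W
    vals = np.clip(vals, 1e-12, None)
    return C @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
landmarks = rng.choice(len(X), size=30, replace=False)       # random here; a heuristic in the thesis
V = nystrom_factor(X, landmarks)                             # low-rank stand-in for the feature matrix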
Taken as a whole, this thesis shows that with suitable problem formulation and efficient
implementation techniques, interior point methods are a viable optimization technology to
apply to large-scale SVM training, and are able to provide state-of-the-art performance.