Efficient Learning and Feature Selection in High Dimensional Regression
We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization (EM) algorithm, together with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance determination of the input features. This variational Bayesian least squares (VBLS) approach retains the simplicity of a linear model, yet offers a novel, statistically robust, black-box approach to generalized linear regression with high-dimensional inputs. It extends easily to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, the relevance vector machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. The iterative nature of VBLS makes it particularly suitable for real-time incremental learning, which is especially crucial in application domains such as robotics, brain-machine interfaces, and neural prosthetics, where models for control must be learned in real time. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques.
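To illustrate the kind of automatic relevance determination for linear regression that the abstract describes, the following is a minimal sketch using the classic MacKay/Tipping evidence-based EM updates. It is an illustrative stand-in, not the paper's VBLS algorithm; the function name, update schedule, and synthetic data are all assumptions made for the example.

```python
import numpy as np

def ard_regression(X, y, n_iter=100):
    """EM-style updates for Bayesian linear regression with automatic
    relevance determination (classic MacKay/Tipping evidence updates;
    an illustrative stand-in for VBLS, not the paper's exact algorithm)."""
    n, d = X.shape
    alpha = np.ones(d)   # per-feature weight precisions (relevance)
    beta = 1.0           # noise precision
    for _ in range(n_iter):
        # Gaussian posterior over the weights given current hyperparameters.
        Sigma = np.linalg.inv(np.diag(alpha) + beta * X.T @ X)
        mu = beta * Sigma @ X.T @ y
        # gamma_i measures how well-determined weight i is by the data;
        # irrelevant inputs get gamma_i -> 0 and hence large precision alpha_i.
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-10, 1e10)
        resid = y - X @ mu
        beta = (n - gamma.sum()) / (resid @ resid + 1e-12)
    return mu, alpha

# Synthetic check: 10 inputs, only the first 3 relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(200)
mu, alpha = ard_regression(X, y)
# Irrelevant features are pruned: their weights shrink toward zero
# while their precisions alpha grow large.
```

The precisions `alpha` play the role of the relevance parameters: a feature whose precision diverges is effectively switched off, which is the mechanism behind the feature selection and the sparse Bayesian learning (relevance vector machine) connection mentioned above.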