In Advances in Neural Information Processing Systems (NIPS) 25, pp. 1430-1438. 2012.
Multiple Kernel Learning (MKL) generalizes SVMs to the setting where one simultaneously trains a linear classifier and chooses an optimal combination of given base kernels. Model complexity is typ- ically controlled using various norm regularizations on the base kernel mixing coefficients. Existing methods neither regularize nor exploit potentially useful information pertaining to how kernels in the input set ‘interact’; that is, higher order kernel-pair relationships that can be easily obtained via unsupervised (similarity, geodesics), supervised (correlation in errors), or domain knowledge driven mechanisms (which features were used to construct the kernel?). We show that by substituting the norm penalty with an arbitrary quadratic function Q ≽ 0, one can impose a desired covariance structure on mixing weights, and use this as an inductive bias when learning the concept. This for- mulation significantly generalizes the widely used 1- and 2-norm MKL objectives. We explore the model’s utility via experiments on a challenging Neuroimaging problem, where the goal is to pre- dict a subject’s conversion to Alzheimer’s Disease (AD) by exploiting aggregate information from many distinct imaging modalities. Here, our new model outperforms the state of the art (p-values ≪ 10−3). We briefly discuss ramifications in terms of learning bounds (Rademacher complexity).