JCSE, vol. 7, no. 2, pp. 99-111, June 2013

DOI: http://dx.doi.org/10.5626/JCSE.2013.7.2.99

Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning

Masashi Sugiyama, Song Liu, Marthinus Christoffel du Plessis, Masao Yamanaka, Makoto Yamada, Taiji Suzuki, and Takafumi Kanamori
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan; Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan; NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan; Department of Mathematical Informatics, The University of Tokyo, Tokyo, Japan; Department of Computer Science and Mathematical Informatics, Nagoya University, Nagoya, Japan

Abstract: Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes, such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications, including feature selection and extraction, clustering, object matching, independent component analysis, and causal direction estimation. In this paper, we review recent advances in divergence approximation. We emphasize that directly approximating the divergence without estimating the probability distributions is more sensible than the naive two-step approach of first estimating the probability distributions and then approximating the divergence. Furthermore, despite the overwhelming popularity of the Kullback-Leibler divergence as a divergence measure, we argue that alternatives such as the Pearson divergence, the relative Pearson divergence, and the L2-distance are more useful in practice because of their computationally efficient approximability, high numerical stability, and superior robustness against outliers.

Keywords: Machine learning; Probability distributions; Kullback-Leibler divergence; Pearson divergence; L2-distance
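
The following is a minimal sketch of the direct-approximation idea for the Pearson divergence, PE(p||q) = (1/2) ∫ q(x) (p(x)/q(x) - 1)^2 dx, in the spirit of the least-squares density-ratio methods reviewed in the paper (e.g., uLSIF). It is an illustrative example, not the authors' reference implementation: the Gaussian kernel width, regularization strength, and number of kernel centers are assumed defaults that would normally be chosen by cross-validation. The key point is that the density ratio r(x) = p(x)/q(x) is fitted directly, so neither p nor q is estimated separately.

    # Illustrative sketch: direct Pearson-divergence approximation via
    # least-squares density-ratio fitting (uLSIF-style). Hyperparameters
    # (sigma, lam, n_centers) are assumptions, not values from the paper.
    import numpy as np

    def gaussian_kernel(x, centers, sigma):
        """Gaussian kernel matrix between samples x (n, d) and centers (b, d)."""
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def pearson_divergence(x_p, x_q, sigma=1.0, lam=1e-3, n_centers=100, seed=0):
        """Approximate PE(p||q) directly from samples x_p ~ p and x_q ~ q.

        The density ratio r(x) = p(x)/q(x) is modelled as a Gaussian-kernel
        expansion and fitted by regularized least squares, which admits an
        analytic solution.
        """
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(x_p), size=min(n_centers, len(x_p)), replace=False)
        centers = x_p[idx]

        Phi_p = gaussian_kernel(x_p, centers, sigma)  # basis functions on p-samples
        Phi_q = gaussian_kernel(x_q, centers, sigma)  # basis functions on q-samples

        H = Phi_q.T @ Phi_q / len(x_q)                # empirical E_q[phi phi^T]
        h = Phi_p.mean(axis=0)                        # empirical E_p[phi]
        theta = np.linalg.solve(H + lam * np.eye(len(h)), h)  # ridge-regularized fit

        r_p = Phi_p @ theta                           # fitted ratio on p-samples
        r_q = Phi_q @ theta                           # fitted ratio on q-samples
        # PE(p||q) = E_p[r] - (1/2) E_q[r^2] - 1/2 when r = p/q
        return r_p.mean() - 0.5 * (r_q ** 2).mean() - 0.5

    # Example usage: two Gaussians with shifted means yield a positive estimate,
    # which could feed a two-sample test or a change-point score.
    x_p = np.random.default_rng(1).normal(0.0, 1.0, size=(500, 1))
    x_q = np.random.default_rng(2).normal(0.5, 1.0, size=(500, 1))
    print(pearson_divergence(x_p, x_q))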

